How can I use a black list to automatically block wiki spammers who try to use my site to increase their Google PageRank?


Category: Security

The attached mt-blacklist.php script uses a black list to help control wiki spam. The list contains hundreds (now thousands) of partial URLs that can be blocked automatically. To use the script, copy it into the cookbook/ directory (or local/ in PmWiki 1) and add the line

    if ($action=='edit') include_once('cookbook/mt-blacklist.php');

to the local/config.php file.

When activated, mt-blacklist.php checks how old its copies of the black list files are and automatically downloads newer versions if they are out of date. It then scans the posted text for matches against all of the regular expressions found in those files.
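The "refresh if out of date" step can be sketched in PHP roughly as follows. This is a hypothetical illustration, not code from mt-blacklist.php: the function names blacklist_is_stale() and refresh_blacklist(), the $maxAge parameter, and the use of file_get_contents() to fetch the list are all assumptions.

```php
<?php
// Hypothetical sketch of the refresh step; names and parameters are
// illustrative assumptions, not taken from mt-blacklist.php.

/**
 * Return true if $path is missing or older than $maxAge seconds.
 */
function blacklist_is_stale(string $path, int $maxAge, ?int $now = null): bool
{
    $now = $now ?? time();
    clearstatcache(true, $path);       // filemtime() results are cached by PHP
    if (!is_file($path)) {
        return true;                   // never downloaded yet
    }
    return ($now - filemtime($path)) > $maxAge;
}

/**
 * Download a fresh copy of a list when the local copy is stale.
 * Keeps the old copy if the download fails. Fetching http:// URLs
 * with file_get_contents() requires allow_url_fopen to be enabled.
 */
function refresh_blacklist(string $path, string $url, int $maxAge): bool
{
    if (!blacklist_is_stale($path, $maxAge)) {
        return true;                   // local copy is recent enough
    }
    $data = @file_get_contents($url);
    if ($data === false || $data === '') {
        return is_file($path);         // fall back to the old copy, if any
    }
    // Write to a temp file first so a partial write never replaces a good list.
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $data) === false) {
        return is_file($path);
    }
    return rename($tmp, $path);
}
```

Falling back to the old copy when a download fails is one way to keep the script working if a list stops being served.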

If you do not modify mt-blacklist.php, it will try to put the blacklist files in the directory where PHP was started (usually the directory containing pmwiki.php). To choose a different directory, find the $arcDir variable and set it to point to the cookbook directory (or the directory containing mt-blacklist.php, or any other directory you want). Remember that this variable is set inside the function check_Blacklist, so you must declare any global variable you use in its value, or it will not work:

        global $FarmD;
        $arcDir = "$FarmD/cookbook/";

Black Lists Supported

Notes and Comments

Version 1.0 - Initial Release

Version 1.1 - 04/28/2005

  • Changed to case-insensitive search (this should catch a BUNCH more links)
  • Added E-mail notification of blocks
  • Added compatibility with Cookbook.Blocklist
  • Added time delay for "bad" links (tie up the sender for x seconds)

Version 1.2 - 07/19/2005

  • Added support for adding other black lists
  • Added support for the blacklist

Version 1.21 - 07/20/2005

  • Modified report to include number of matched expressions and number of matches.

Version 1.22 - 08/01/2005

  • Added archive file path configuration.

Version 1.3 - 08/23/2005

  • Improved compatibility with PmWiki 2
  • Fixed "Save and Edit" bypass of script
  • Fixed error where OK URLs were blocked

Version 1.31 - 08/29/2005

  • Added improved detection of encrypted strings

Version 1.4 - 10/24/2005

  • Removed link to MT-Blacklist

Version 1.5 - 08/21/2006

  • Added support for pmwiki-2.1.14 (thanks to John Bittner)
  • Added the MoinMaster Block List


2005/07/15: I have contacted Wendell about this. I was getting wikispam, so I installed MT-Blacklist and turned on UrlApprovals at the same time. When I tried to save an edited page, some PHP messages about "headers" flashed by, though things seemed to work. Wendell couldn't diagnose the problem without digging deeper into what I had done, so I'm going to stay with UrlApprovals for now and will check back later to see if anyone else has run into similar issues. David Ing

David, I have seen a case where removing the trailing "?>" (i.e., the last line of the file) corrected a SIMILAR issue. I'm still not sure what's going on. Sorry! Wendell

2005/08/01: Please note that the archive files are saved in the PARENT of the cookbook directory. This means you need to make sure that this directory is writable by PHP (or change the base path of the files to a directory that IS writable). Use the $arcDir variable to change the default directory.
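One way to catch a permissions problem early is to check the archive directory before the script tries to write there. This is a minimal sketch: archive_dir_ok() is a hypothetical helper, and treating an empty $arcDir as PHP's current working directory is an assumption about the script's default behavior.

```php
<?php
// Hypothetical helper, not part of mt-blacklist.php.
// Returns null if the archive directory can be used, or an error message.
function archive_dir_ok(string $arcDir): ?string
{
    // Assumption: an empty $arcDir means "PHP's current working directory",
    // which is usually the directory containing pmwiki.php.
    $dir = ($arcDir === '') ? getcwd() : $arcDir;
    if (!is_dir($dir)) {
        return "Archive directory does not exist: $dir";
    }
    if (!is_writable($dir)) {
        return "Archive directory is not writable: $dir";
    }
    return null;
}
```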

2005/08/02: This script seems to check only links inserted into your pages, whereas Blocklist2 checks words and IP addresses.

That is EXACTLY the point! The only way spamming a wiki is profitable is IF you post a URL; plain text (while sometimes inappropriate) doesn't generate money (in the sense of links) for the spammer. To catch these links, the script uses the regular expressions from the files listed. Almost all of those regular expressions ARE URLs, but some are MUCH more generic. As noted, it does not use IP addresses (they are TOO easy to change).
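The scan itself can be sketched as follows, reporting the number of matched expressions and the number of matches, as in version 1.21. This sketch rests on assumptions not confirmed by the script: each non-blank line of a list file holds one regular expression, optionally followed by whitespace and a "#" comment, and lines starting with "#" are comments. parse_blacklist() and scan_text() are hypothetical names.

```php
<?php
// Hypothetical sketch; the list-file format here is an assumption.

/** Turn the raw contents of a list file into an array of regex bodies. */
function parse_blacklist(string $contents): array
{
    $patterns = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        // Drop a trailing "  # comment", then surrounding whitespace.
        $line = trim(preg_replace('/\s+#.*$/', '', $line));
        if ($line === '' || $line[0] === '#') {
            continue;                  // blank line or whole-line comment
        }
        $patterns[] = $line;
    }
    return $patterns;
}

/**
 * Return [number of expressions that matched, total number of matches].
 * Matching is case-insensitive (the "i" modifier), as in version 1.1.
 */
function scan_text(string $text, array $patterns): array
{
    $matchedExprs = 0;
    $totalMatches = 0;
    foreach ($patterns as $p) {
        // Wrap in "~" delimiters, naively escaping any "~" in the pattern.
        $re = '~' . str_replace('~', '\~', $p) . '~i';
        $n = @preg_match_all($re, $text);
        if ($n === false) {
            continue;                  // skip patterns PCRE cannot compile
        }
        if ($n > 0) {
            $matchedExprs++;
            $totalMatches += $n;
        }
    }
    return [$matchedExprs, $totalMatches];
}
```

A post would then be blocked whenever the first element of the result is greater than zero.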

2005/08/02: It is not clear what is meant by '$lockTime = 10' and its note "Change this to the number of seconds to wait". Wait for what?

The $lockTime variable sets the number of seconds the script waits before returning the error message to the spammer. The goal is to tie up the spammer (or at least ONE of his tasks) for that many seconds. BTW, this seems to work in at least some cases: I have a couple of "regular" spammers who hit 5 or 6 pages one right after another. When I set it to 0, the hits come seconds apart; when I set it to 10 minutes (600 seconds), the hits come 10 minutes apart.
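The delay described above amounts to sleeping before sending the rejection. A minimal sketch, where block_post() is a hypothetical name and the script's own code may differ:

```php
<?php
// Hypothetical sketch of the $lockTime delay; not code from mt-blacklist.php.
function block_post(int $lockTime, string $message = 'This post has been blocked by the administrator'): string
{
    if ($lockTime > 0) {
        // On platforms where max_execution_time counts wall-clock time
        // (e.g. Windows), raise the limit so the delay is not cut short.
        set_time_limit($lockTime + 30);
        sleep($lockTime);              // tie up the spammer's request
    }
    return $message;
}
```

Note the trade-off: each blocked request also holds a PHP process on your server for $lockTime seconds, so very long delays cost server capacity as well as the spammer's time.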

2005/10/16: In order to get this to work, I had to remove the following line:

I guess the MT blacklist is out of service, and this gives the script hiccups. The other blacklist is still working, so the script is worth fixing. Jason Grossman
2005/10/24 - Thanks for the feedback. I've corrected this in version 1.4 - Wendell

2006/08/20: Using pmwiki-2.1.14, the message "This post has been blocked by the administrator" was not displaying. To fix this, I put

    global $MessagesFmt;
    $MessagesFmt[] = $EditMessageFmt;
under the $EditMessageFmt assignment near the end of the file.
The $EditMessageFmt variable for displaying messages to authors is now the $MessagesFmt array, which can be displayed using the (:messages:) markup. The transition script takes care of moving messages between $EditMessageFmt and $MessagesFmt as needed to preserve correct operation. John Bittner 08/21/2006 1:10 am CST
2006/08/21 - Thanks for the feedback. I've corrected this in version 1.5 - Wendell

See Also


Wendell Brown