Blocklist2

Summary: Block vandals and spammers by IP or phrase.
Version: 2006-04-20
Prerequisites: PmWiki 2.0.x or newer
Status: Deprecated
Maintainer: Crisses XES
Categories: Obsolete

This recipe is deprecated, and will no longer be supported. Please see the built-in blocklist features that come pre-packaged in PmWiki. Thank you!!

This recipe lets you block wiki vandals and spammers who are trying to use your site in a vain, futile attempt to increase their Google pagerank. The "blocklist" is like a blacklist for IP numbers or phrases.

Everyone should upgrade to blocklist2.php. There is also a sample blocklist, Attach:sampleblocklist.txt, that you can copy into your Site.Blocklist page (note that the file contains content for blocking purposes that may be unsuitable for children). Be warned that the file is rather long, and you may want or need to pare it down. Blocklist2 is based on Blocklist: it cleans up the code, runs more smoothly, and incorporates additional features, such as the ability to block phrases.

Blocklist2 lets you create a list of words or phrases that may not appear in posts, and IP addresses that may not post to the site. See BlocklistHelperScripts for scripts that will help parse emails or blocked content for relevant terminology you may want to block on your site.

  • Blocklist phrases are case insensitive.
  • Blocklist phrases may include spaces, e.g. block:short phrase
  • You can use the wildcard (*) to cover larger groups of IP addresses, e.g. block:123.124.*
  • Multiple spaces are significant, so a href=, a   href=, and a     href= each need their own entry.
  • Consider also using the approved URLs feature that comes with PmWiki. It has a post-blocking feature that you can set to a number of attempted URL posts, so that you can block someone attempting to post 5 or more URLs.

Installation on an individual wiki

  1. copy blocklist2.php into the cookbook/ directory
  2. add the following line to your "local/config.php" file, preferably near the top.
    if ($action=='edit') include_once('cookbook/blocklist2.php');
(Note that people using the CommentBox cookbook module should use
if ($action == 'edit' || $action == 'comment') {
  include_once("cookbook/blocklist2.php"); }
instead).
  3. create a page called Site.Blocklist (see the item below about Main.Blocklist) on your site, and seed it with some of the words, phrases or IP addresses from the posts that have been plaguing you.
  4. if you like it, take a look at the file attachment Attach:sampleblocklist.txt and decide how much of that you would like to add to your blocklist. It is a list of obscure drug names, known words and phrases in use by vandals, IP addresses of repeat offenders, etc.
  5. decide whether you would like any of the Options listed below to be added to your configuration. In particular, you should check the Blocklist Security items.

Installation on a wiki farm

  • copy blocklist2.php into the farm's cookbook/ directory
  • add the following line to the farm's "local/farmconfig.php" file, preferably near the top
    if ($action=='edit') include_once($FarmD.'/cookbook/blocklist2.php');
(Note that people using the CommentBox cookbook module should use
if ($action == 'edit' || $action == 'comment') {
  include_once($FarmD.'/cookbook/blocklist2.php'); }
instead).
  • create a page called Site.Blocklist on each wiki in your farm you want to protect - each wiki can have its own entries
  • seed Site.Blocklist with some of the words, phrases or IP addresses from the posts that have been plaguing you

Spelling counts

Be careful when typing the page name "Blocklist". On some OSes (Windows) you can use the wrong case but still get the Blocklist page displayed. However, when you try to save your edits, the script will reject them because the page name has the wrong case.

Site.Blocklist versus Main.Blocklist

The original version of the Blocklist script looked up the blocklist terms on a page called Main.Blocklist. Later, this was changed to a page called Site.Blocklist (which makes more sense because the blocklist works site-wide). To maintain backwards compatibility, the script also looks for a page called Main.Blocklist. Until the script is changed to eliminate the check on Main.Blocklist, admins should create and edit-protect both the Site and Main blocklist pages. Otherwise, a vandal can create the missing page and populate it with terms that would make your wiki unusable.

How it Works

When activated, blocklist2.php scans the Site.Blocklist and Main.Blocklist pages looking for strings of the form block:something where something is a sequence of characters (including spaces to end-of-line) to be excluded from posting. Thus,

block:spam.com

would prevent any edits containing "spam.com".

In addition, the Site.Blocklist page can contain IP addresses or ranges of the form a.b.c.d or a.b.c.*; any postings from a listed address are also blocked.
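The matching described above can be pictured with a small sketch. This is not the code from blocklist2.php: the function names (parse_block_terms, find_blocked_term, ip_is_blocked) are made up for illustration, and it assumes plain case-insensitive substring matching plus a trailing-wildcard check for IP addresses.

```php
<?php
// Illustrative sketch only; not the code from blocklist2.php.
// All function names here are hypothetical.

// Collect everything after each "block:" marker, up to end of line.
// Spaces are kept, because spacing inside a phrase is significant.
function parse_block_terms(string $pageText): array {
    preg_match_all('/block:([^\r\n]+)/', $pageText, $matches);
    return $matches[1];
}

// Return the first term found in the post (case-insensitive), or null.
function find_blocked_term(string $postText, array $terms): ?string {
    foreach ($terms as $term) {
        if (stripos($postText, $term) !== false) {
            return $term;
        }
    }
    return null;
}

// Check a poster's IP against entries such as "1.2.3.4" or "123.124.*".
function ip_is_blocked(string $ip, array $patterns): bool {
    foreach ($patterns as $pattern) {
        // Quote the entry, then let "*" stand for the rest of the address.
        $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/';
        if (preg_match($regex, $ip) === 1) {
            return true;
        }
    }
    return false;
}

$terms = parse_block_terms("Known spam\nblock:spam.com\nblock:short phrase\n");
var_dump(find_blocked_term('Visit SPAM.COM today', $terms));
var_dump(ip_is_blocked('123.124.5.6', ['123.124.*']));
```

The real script may parse the pages differently; the sketch is only meant to show why a short entry such as spam.com catches every post that contains it.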

Options

Blocklist Security

In general the administrator will also want to edit-protect the Site.Blocklist page to prevent arbitrary additions/deletions from the blocklist (see PmWiki.Passwords). It's also possible to read-protect the Site.Blocklist page so that others do not know the exact phrases and/or IP addresses that are being blocked.

Because of the way the script checks the Blocklist pages you should create and edit-protect both the Main.Blocklist and Site.Blocklist pages. If you do not, spammers can populate the missing one with their trash. In the long run, Main.Blocklist will be abandoned, so put your blocklist terms on Site.Blocklist.

Blocked Post Notifications & Scores

Default behavior is not to let the poster know why they are being blocked, because most of the time the poster of a post that should be blocked is a program or spammer, not a legitimate author. Live people really trying to post are the ones most likely to read the explanation, so if you would like the author to see the reasons and the content that is being blocked, set the following:

$EnableWhyBlocked = 1;
  • This line needs to be included before the blocklist2.php file is included, otherwise the explanations won't show up.
  • With $EnableWhyBlocked = 0; the "score" ($Blocklisted) will always be either 0 or 1, because the program stops checking as soon as the first block criterion is met. This setting may be necessary on older/slower servers, or if your blocklist is very long.
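As a sketch of the ordering described above, a local/config.php fragment for a single wiki might look like this. It only combines lines already shown on this page and is not meant to run outside PmWiki.

```php
<?php
// local/config.php (fragment): set the option first ...
$EnableWhyBlocked = 1;   // the number 1, not the string "1"

// ... and only then load the recipe, so that it sees the setting.
if ($action == 'edit') include_once('cookbook/blocklist2.php');
```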

Regular Expressions - aka "regex" (Expert)

Advanced admins may need a little more flexibility for blocklist criteria, but this comes at a cost in web server resources. It's highly advisable to use "regex" blocklisting for only a small number of entries, and only when straight text matching will not work. Enable regex checking by adding:

$EnableBlocklistRegex = 1;

to your config.php (or farmconfig.php) file, before the blocklist2.php include. Format for regex is PCRE (Perl Compatible Regular Expressions) described in depth at http://us2.php.net/manual/en/ref.pcre.php. Because of the complexity of regexes, the entire regex including delimiters must be included after the blocklist term "regex:" -- for example:

regex:/Snow\s*White\s*(&|and|And)\s*[tT]he\s*(Seven|7)\s*Dwarves/

The "reason" for the regex match is the matched text, not the regex pattern. In this case the viewer or email would see that the pattern matched "Snow White & the 7 Dwarves" not the regex pattern.
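A minimal sketch of that behavior, assuming the script simply strips the "regex:" prefix and hands the rest to preg_match(). The function name regex_block_reason is hypothetical and not part of blocklist2.php.

```php
<?php
// Illustrative sketch only; regex_block_reason() is a made-up helper.
// Given a blocklist entry such as "regex:/.../" and the post text,
// return the text that matched (the "reason"), or null if nothing matched.
function regex_block_reason(string $entry, string $postText): ?string {
    if (strpos($entry, 'regex:') !== 0) {
        return null;                      // not a regex entry
    }
    $pattern = substr($entry, strlen('regex:'));
    if (@preg_match($pattern, $postText, $match) === 1) {
        return $match[0];                 // the matched text, not the pattern
    }
    return null;
}

$entry = 'regex:/Snow\s*White\s*(&|and|And)\s*[tT]he\s*(Seven|7)\s*Dwarves/';
var_dump(regex_block_reason($entry, 'Buy Snow White & the 7 Dwarves here'));
```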

Note that ^ and & will not behave in their usual manner until I get a bug fixed.

Please use this feature wisely and sparingly. Let me know if you have problems, and make sure to test your regular expressions with $EnableWhyBlocked = 1 so you can see whether they are working. Clever and judicious use of this feature can be more economical than the normal text matching, but thorough understanding of regular expressions is worthy of a dedicated upper level college course. I may put real-world examples on this page, after experimenting with some patterns.

Quickie useful example patterns:

regex:/[^(spe)]cialis/
regex:/ortho[^pd]/
regex:/soma[^t]/
regex:/retin[^a]/

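Respectively, these regexes block "cialis" but not "specialist", "ortho" but not "orthodontist" or "orthopedic", "soma" but not "somatic" or "somatoform", and "retin" but not "retina". Note that each [^...] is a negated character class that matches exactly one character, so ortho[^pd] will not match "ortho" at the very end of a post, and [^(spe)] excludes the single characters (, s, p, e and ), not the string "spe".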
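To check patterns like these before adding them to a blocklist, a throwaway test along these lines may help. The helper name pattern_blocks is made up for illustration, and only the three simpler patterns are exercised.

```php
<?php
// Illustrative sketch only; pattern_blocks() is a made-up helper.
function pattern_blocks(string $pattern, string $text): bool {
    return preg_match($pattern, $text) === 1;
}

// Sample texts to try against each pattern.
$samples = [
    '/ortho[^pd]/' => ['cheap ortho tri-cyclen', 'my orthodontist', 'buy ortho'],
    '/soma[^t]/'   => ['soma pills', 'somatic'],
    '/retin[^a]/'  => ['retin-a cream', 'retina scan'],
];

foreach ($samples as $pattern => $texts) {
    foreach ($texts as $text) {
        echo $pattern, '  ', $text, '  ',
             pattern_blocks($pattern, $text) ? 'blocked' : 'allowed', "\n";
    }
}
```

Note the "buy ortho" sample: because the negated class needs one more character, the bare word at the end of a post slips through.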

**Note that most posts using these patterns have so many other blockable items in them that these regexes are not generally necessary!**

  • To block "a href=" however it is spaced, use '/a(\s*)?href=/' or something like that. That is to say: pattern start /, match "a", which may be followed by any amount of white space or none (\s*)?, followed by "href=", pattern end /.

If anyone can think of worthy uses for the regex match but needs help with the regex syntax necessary, let me know. I'm a local regex maven.

Custom Options

The blocklist2.php script also sets the variable $Blocklisted to the number of reasons that a post was blocked, so that an administrator can perform other actions beyond simply disallowing a post. See the example configuration below for one complicated use of this variable, where it is also used to rate the severity of a post (in the email Subject line).

Example custom config.php alterations (optional)

This is a detailed set-up to perform custom actions when blocklisting. Add the following to your config.php file and alter it as required. (NOTE: This is not needed for standard installations of Blocklist2.)

  // Only bother with this info if this is an edit action
  if ($action=='edit') {

      // get the IP address of the poster
      $editip = $_SERVER["REMOTE_ADDR"];

      // Allows the poster to see a message on the edit page
      // about why they have been blocked.  Also allows the "score" feature.
      $EnableWhyBlocked = 1;

      // run the script
      include_once('cookbook/blocklist2.php');

      // if the script found blocked content
      if ($Blocklisted) {

          // Send an email to notify the site admins

          // compose the contents portion of the email to be sent:

          // give the page URL posted to
          $contents = "The poster was calling: $ScriptUrl/$pagename\n"; // page URL

          // give the IP address
          $contents .= "Blocked poster is at $editip.\n";

          // Let the admin know why the post is blocked
          $contents .= "$WhyBlockedFmt\n\n";

          // Give the original post contents, in case it's legit, has
          // additional content that ought to be blocked, or needs
          // to be reported to someone.
          $contents .= "The body of the attempted post was:\n\n" . @$_POST['text'];

          // If you want to have the end of the webserver logfile attached, use these
          // lines.  (If you don't have permission to read the web server log, comment
          // out these lines and report website abuse to your system's administrator.
          // This is configured for some types of Linux servers.  Your server's files
          // may vary.)
          $weblog = '/var/log/httpd/access_log';
          $contents .= "\n\n------Access Log Tail------\n" . `tail -n 20 $weblog`;

          // compose the email with the following attributes:

          // comma-separated addresses to send the mail to:
          $emailto = "myname@mydomain.tla";

          // the subject line to send it out with (includes a "score" for how bad it is):
          $subjectline = "Blocked $WikiTitle Post (Score $Blocklisted)";

          // email address the mail should be from (does not need to be a real address):
          $mailfrom = "myserver@mydomain.tla";

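          // This line sends the mail.  Do not alter (unless you know what you
          //  are doing).  The fourth argument of mail() holds extra headers,
          //  so the sender address is passed as a "From:" header.
          mail($emailto, $subjectline, $contents, "From: $mailfrom");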

      }
  }

Version History

Blocklist version 2.3.1 - Sept 18, 2006
bug fix as noted below.
Blocklist version 2.3.0 - April 20, 2006
Several changes - use $EnableWhyBlocked to tally score or reasons for output. Leaving the default of $EnableWhyBlocked = 0 will only check until the first block criterion is reached. Added regex enhancement -- see above. To be used SPARINGLY.
UPGRADE NOTE: Use $EnableWhyBlocked = 1; to turn on reason listing or tallies -- my error of putting "1" (quoted) is fixed, please enable your blocklist with the number instead of the string. Above config example is corrected. Failure to correct your config files when upgrading will disable your tallies and reasons!
Tested with PmWiki v 2.1.5
Blocklist version 2.2.2 - Aug 2, 2005
Added ability for the blocklist to apply to the CommentBox cookbook script. See notes above for how to invoke Blocklist2.php for CommentBox.
Blocklist version 2.2.1
Tested on PmWiki v 2.0 beta 50 -- Fixed a silly typo/bug -- Unsure why my test worked with it -- sorry!
Blocklist version 2.2
Tested on PmWiki v 2.0 beta 50 -- Added Site.Blocklist as the main page, checks for Main.Blocklist automatically to support earlier versions. Tweaked the comparison engine a little in hopes of a performance improvement by using an array_walk and changing a pattern match so that a trim() statement was eliminated. Made code a little more readable by changing variable names from the original PM code and added more comments. Added the version numbers per the cookbook module suggestions. More to come.

Suggested Uses

  • After you create your Site.Blocklist page, ask someone for a seed copy of their Blocklist, or copy & paste the text of the sample blocklist above into your Site.Blocklist page.
  • Block a word which is a prefix for other words without blocking other words by adding a space after the word prefix. This is not foolproof, but it will catch some instances of the word. A clean example is "block:car " (note the space) -- this won't block "carpooling". It also won't block "car." (at the end of a sentence).
  • Block phrases such as "block:no spam" which is one vandal's idea of a good time. This can work for bad words that are commonly broken apart to dodge spam detection such as "block:phar macy".
  • The #1 blocklist trick is to use domain names or domain name fragments. If you get a batch of spam links of the form "subdomain.domain.com" where every "subdomain" is different, but "domain.com" keeps repeating, then use "block:domain.com".
  • If you put the "dot" in front of short domains you can be sure you won't block longer domains -- if bm.com is spamming you as aliens.bm.com but you don't want to block ibm.com, use "block:.bm.com"
  • Because this is a wiki and doesn't use HTML markup, you can use block:href=http:. Also useful to block are block:[ url and block:[link.
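The trailing-space and leading-dot tricks above both rely on plain substring matching. A minimal sketch, assuming case-insensitive substring matching and using a made-up helper name (is_blocked):

```php
<?php
// Illustrative sketch only; is_blocked() is a made-up helper that
// assumes a term blocks a post when it occurs anywhere in the text.
function is_blocked(string $postText, string $term): bool {
    return stripos($postText, $term) !== false;
}

// "car " with a trailing space: catches the word, spares "carpooling",
// but also misses "car." at the end of a sentence.
var_dump(is_blocked('my car is red', 'car '));
var_dump(is_blocked('carpooling works', 'car '));
var_dump(is_blocked('I sold my car.', 'car '));

// ".bm.com" with a leading dot: catches aliens.bm.com, spares ibm.com.
var_dump(is_blocked('http://aliens.bm.com/', '.bm.com'));
var_dump(is_blocked('http://www.ibm.com/', '.bm.com'));
```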

Notes

Another version of the script is available that's a fork of an earlier Blocklist2 release. Some differences:

  • There's no regex matching.
  • The default blocklist pages are $SiteGroup.FarmBlocklist, $SiteGroup.Blocklist, and Main.Blocklist.
  • You can do whole-string matching by double-quoting a string (so block:"urge" doesn't match the word "burger").
  • There's Unblocklist capability so a wiki in a farm can override items in the farm-wide blocklist.
  • The script is capable of logging to a wiki page ($SiteGroup.Blocklog by default).

--Hagan

A re: blocking specification questions

  • I'll work on the leading zeros the next time I revise the script, but I'm not sure I'll be able to test that since my servers don't use leading zeros. If your server uses leading zeros, it might be easiest for me to strip the leading zeros out before doing the IP check. If you want to put leading zeros in the blocklist, but your server doesn't use leading zeros it's the opposite issue -- I need to know which it is. Are you looking to line up your blocked IPs in the list, or is your server different than mine?

--Crisses XES

Corollary
The reason for leading zeros is simply to allow me to textually sort the IP address list on the Site/Blocklist page.
thanks very much

it may be able to be done if you change from:

if (preg_match("/(\\D$ipa\\.$ipb\\.(\\D|$ipc)\\.($ipd)?\\D)/", $page['text'], $matchedip)) {

to

if (preg_match("/(\\D0*$ipa\\.0*$ipb\\.(\\D|0*$ipc)\\.(0*$ipd)?\\D)/", $page['text'], $matchedip)) {

I don't think I'll be porting that change to the code permanently: I forgot that the IP address is used to check the page, not the page to check the IP address, so it's somewhat sloppy. Let me know if this doesn't work, but be careful when you test it :) This adds the ability for the regular expression to match zero or more leading zeros before each section of the IP address. XES
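For anyone who would rather try the other idea mentioned in this thread, stripping leading zeros before the IP check, here is a minimal sketch. The function name strip_leading_zeros is hypothetical and none of this is in blocklist2.php.

```php
<?php
// Illustrative sketch only; strip_leading_zeros() is a made-up helper.
// Drops leading zeros from each section of a dotted IP address, so
// "010.001.002.003" and "10.1.2.3" compare as the same text.
function strip_leading_zeros(string $ip): string {
    // At the start of each number, remove zeros that precede another digit.
    return preg_replace('/\b0+(\d)/', '$1', $ip);
}

var_dump(strip_leading_zeros('010.001.002.003'));
var_dump(strip_leading_zeros('192.168.0.100'));
```

Applied to both the blocklist entries and the poster's address, this would let the list be written with leading zeros for sorting.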


There is an error on line 134. It only manifests itself if you define your own $BlocklistPages. Where it says:

foreach ($BlocklistPages as $key->$value) {

It should say:

foreach ($BlocklistPages as $key => $value) {

Other than that, a very nice script. Thanks! PiotrSzczepanski

Fixed - and Thank you for the bug notification!! - released Version 2.3.1 XES

Contributors

  • Crisses XES, 5-June-2005 (latest script 2.3.1 18-Sept-2006 & page updates 20-April-2006)
  • Pm, 29-Nov-2004 (the original code this file is based on, and help tweaking the current version)
