SpamFilters

Summary: Automatic blocking of some spambots
Version: 20170619
Prerequisites:
Status: beta
Maintainer: Petko
Users: +2
Discussion: SpamFilters-Talk
License: PD

Description

Automatic blocking of some spambots.

The recipe offers a way to block a number of spambots (programs posting spam on wikis and forums). Six methods are used: a honeypot, blocking of raw HTML links, analysis of the edit summaries, a post size check, an edit password for empty groups, and unlinking of recently deleted pages.

Honeypot: some spambots try to fill in every field of the edit form. We add two extra form fields that are invisible to a human user: one with a fixed value and one that is empty. When the form is submitted, if the first field was modified or the second was filled in, this is very likely a spambot trying to post, so we refuse to save the page.
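
To try the honeypot check outside PmWiki, it can be sketched as a standalone function. The helper name honeypot_tripped() is hypothetical and not part of the recipe; the field names code1 and code2 and the value 7264 are the ones used in the Installation section below.

```php
<?php
// Hypothetical helper, not part of the recipe: returns true when the
// submitted honeypot fields suggest an automated post.
// 'code1' must come back unchanged and 'code2' must come back empty.
function honeypot_tripped(array $request, string $expected = '7264'): bool {
  $code1 = $request['code1'] ?? '';
  $code2 = $request['code2'] ?? '';
  // The real form only ever sends plain strings for these fields.
  if (!is_string($code1) || !is_string($code2)) return true;
  return $code1 !== $expected || $code2 !== '';
}
```

A browser that leaves both fields alone passes the check; a bot that fills code2, changes code1, or omits code1 entirely does not.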

Blocking HTML links: if the posted text contains raw HTML links like <a href=...> that are not escaped with [=...=] or [@...@], the edit is blocked. Some spambots try to post raw HTML; even though it would not work in a wiki page, cleaning it up would be annoying, so we block it and show a message telling a human user how to escape the HTML in order to save the page.
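
The idea can be tested in isolation with a rough sketch like the following. The function has_unescaped_anchor() is hypothetical, and the regular expression that strips [=...=] and [@...@] spans is only an approximation; the recipe itself relies on PmWiki's MarkupEscape() for that step.

```php
<?php
// Hypothetical helper, not part of the recipe: detects a raw HTML anchor
// that is not wrapped in [=...=] or [@...@].
function has_unescaped_anchor(string $text): bool {
  // Assumption: dropping the escaped spans approximates what PmWiki's
  // MarkupEscape() does before the anchor test.
  $visible = preg_replace('/\\[([=@]).*?\\1\\]/s', '', $text) ?? $text;
  // Same pattern as in the recipe: "<a href=" or "&lt;a href=".
  return (bool) preg_match('/(&lt;|<)a +href=/i', $visible);
}
```

With this sketch, a post containing <a href="...">spam</a> is flagged, while the same markup wrapped in [@...@] is not.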

Edit summaries: some spambots fill the "edit summary" field with random uppercase and lowercase characters. We block most of these posts if the edit summary doesn't look like a word or a sentence: a single word in which lowercase letters are followed by two or more uppercase letters, or five or more consecutive consonants without a vowel (the letter y is not counted as a consonant). Note that a real user may occasionally be blocked, with a message asking to change the edit summary, and a spambot may occasionally post successfully, but this filter works in most cases. The code proposed below allows $MixedCaseVariables and `EscapedText: if the filter blocks your edit summary, insert a backtick ` before the words, functions or variables that do not look like language.
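
To see which summaries the heuristic rejects, the two patterns can be wrapped in a standalone function. This is a minimal sketch: the name summary_looks_random() is hypothetical, and the regular expressions are copied from the Installation code below.

```php
<?php
// Hypothetical helper, not part of the recipe: true when an edit summary
// would be rejected by the "doesn't look like language" heuristic.
function summary_looks_random(string $csum): bool {
  $csum = trim($csum);
  // $Variables and `Escaped words are removed before testing.
  $csum = preg_replace('/[$`]\\w+/', '', $csum) ?? $csum;
  if (!$csum) return false; // nothing left to test
  // A single word with lowercase letters followed by 2+ uppercase letters,
  // or five or more consecutive consonants anywhere in the summary.
  return preg_match('/^\\w*([a-z]+[A-Z]{2,})\\w*$/', $csum)
      || preg_match('/[bcdfghjklmnpqrstvwxz]{5,}/i', $csum);
}
```

Under these patterns "fixed typo" passes and "hgTRkwLq" is rejected. An ordinary word such as "strengths" is also rejected because of the five consonants in "ngths", which is the kind of false positive the backtick escape is meant for.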

Post size: some spambots deface a page by replacing its content with a short paragraph of links. We can block saving when the posted content is less than half the size of the previous content and more than 200 characters were removed. Note that this may be annoying for real users and admins who refactor, clean up or delete pages, so we enable it only for specific pages which are often defaced.
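
The size rule is simple arithmetic and can be checked on its own. In this sketch the function name removes_too_much() is hypothetical; the one-half ratio and the 200-character threshold are taken from the recipe's PageTextSize function.

```php
<?php
// Hypothetical helper, not part of the recipe: true when an edit shrinks
// the page to less than half AND removes more than 200 characters.
function removes_too_much(int $oldLen, int $newLen): bool {
  if ($oldLen === 0) return false; // page is new or was empty
  return $newLen / $oldLen < 0.5 && $oldLen - $newLen > 200;
}
```

For example, cutting a 1000-character page down to 400 characters is blocked, while cutting a 300-character page down to 120 is allowed because only 180 characters were removed.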

Empty groups: some spambots create pages in new wikigroups. To clean up the mess, one has to delete the spam pages, delete the group's RecentChanges page, and possibly clean up the Site.AllRecentChanges page. We can conditionally set an "edit" password, even an open or community-known one, for groups that do not have a *.RecentChanges page (usually empty groups).
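
The filter decides whether a group is empty by looking for its RecentChanges page. How that page name is derived from the current page name can be sketched as follows; the function name group_recent_changes() is hypothetical, and the regular expression is the one used in the Installation code below.

```php
<?php
// Hypothetical helper, not part of the recipe: maps a page name to the
// RecentChanges page of its group.
function group_recent_changes(string $pagename): string {
  // Replace everything from the first "." or "/" onward.
  return preg_replace('/[\\/\\.].*$/', '.RecentChanges', $pagename) ?? $pagename;
}
```

Both Main.HomePage and Main/HomePage map to Main.RecentChanges. A name without a group separator, such as Main, is returned unchanged.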

Unlink recently deleted pages: some spambots follow links from *.*RecentChanges pages, and by default, recently deleted pages listed there link directly to the edit form. This filter deactivates such links.

Variants of these filters have been used on pmwiki.org for several years.

Installation

To set the two hidden honeypot fields, edit the wikipage Site.EditForm and insert the following line before (:input end:)

(:input hidden code1 7264:)%comment%Enter code: (:input text code2:)%%

If your skin uses a different edit form, edit the skin's edit form instead. Do this before enabling the config.php code below; otherwise the honeypot check will reject every edit.

Place this near the beginning of your file local/config.php or local/farmconfig.php:

## if an edit form is posted
if ($action == 'edit' && preg_grep('/^post/', array_keys(@$_POST)) ) {
  $tmp_csum = trim(@$_POST['csum']);
  $tmp_csum = preg_replace('/[$`]\\w+/', '', $tmp_csum); # allow $Vars and `Text

  ## honeypot fields
  if (@$_REQUEST['code1']!='7264' || @$_REQUEST['code2'] > ''){
    $WhyBlockedFmt[] = 'Invalid code entered';
  }
  ## edit summary doesn't look like language
  elseif ($tmp_csum && ( preg_match("/^\\w*([a-z]+[A-Z]{2,})\\w*$/", $tmp_csum)
     || preg_match("/[bcdfghjklmnpqrstvwxz]{5,}/i", $tmp_csum) )
    ) {
    $WhyBlockedFmt[] = 'Invalid "edit summary" entered, please select a different one.';
  }
  ## raw HTML anchors
  elseif(@$_POST['text']>'' && preg_match("/(&lt;|[<])a +href=/i", MarkupEscape($_POST['text']))) {
    $WhyBlockedFmt[] = 'HTML code needs to be escaped with [=code=] or [&#64;code&#64;].';
  }
  ## if some of the above filters activated, block the post
  if(!empty($WhyBlockedFmt)) { # also safe when $WhyBlockedFmt was never set
    $EnablePost = 0;
    $IsBlocked = 1;
  }
}

If you want to install the Post size filter, add to the same file the following:

function PageTextSize($pagename, $page, $new) {
  global $EnablePost, $IsBlocked, $WhyBlockedFmt, $MessagesFmt;
  if (!$EnablePost) return;
  $L1 = strlen($new['text']);
  $L0 = strlen($page['text']);
  if(!$L0) return; # page is new or was empty
  if ( $L1/$L0 < .5 && $L0-$L1>200) { # more than half AND more than 200 characters removed
    $EnablePost = 0;
    $IsBlocked = 1;
    $WhyBlockedFmt[] = $MessagesFmt[] = 'You tried to remove a large part of the page content.';
  }
}
## to enable it on all pages, remove the # before the next line
#  array_unshift($EditFunctions, 'PageTextSize');

## we enable it on selected pages only
if(preg_match('/^PmWiki\\.(Questions|PmWikiUsers)$/', $pagename))
  array_unshift($EditFunctions, 'PageTextSize');

If you want to install the Empty groups filter, add near the end of config.php:

if ($action=='edit' && ! PageExists( preg_replace('/[\\/\\.].*$/', '.RecentChanges', $pagename)))
  $DefaultPasswords['edit'] = pmcrypt('PASSWORD');

To unlink recently deleted pages, add this to the bottom of local/config.php:

# spambots abuse recently deleted pages Cookbook:SpamFilters
if(preg_match('/RecentChanges/', $pagename)) {
  $LinkPageCreateFmt =
    "<a class='createlinktext' rel='nofollow' title='\$LinkAlt'
    href='#{\$FullName}'>\$LinkText</a>";
}

Configuration

Usage

Notes

Change log / Release notes

  • 20170619 - Added Backtick escape character to Summary filter.
  • 20150412 - Added "Empty groups" filter
  • 20121020 - first public release

See also

On pmwiki.org, we have also enabled UrlApprovals and Blocklist.

PmWiki /
Blocklist  Blocking IP addresses, phrases, and expressions to counteract spam and vandalism.
Security  Resources for securing your PmWiki installation
UrlApprovals  Require approval of Url links
Cookbook /
OpenPass  Set a global password which is openly displayed to reduce spam (Alpha)
OpenPass-Talk  Talk page for OpenPass.
RecentChangesDeletion  Allow authors to delete RecentChanges pages, thereby making it possible for authors to delete wiki groups.
Security  Security authentication and authorization methods and systems
TrackChanges  Ways to more easily detect and verify all recent edits

Contributors

  • Recipe written and maintained by Petko (5ko [snail] 5ko [period] fr). The honeypot code was written by Pm.

Comments

See discussion at SpamFilters-Talk
