QuickReplace

Summary: Quickly define replacement texts in wiki pages, and use them as markup or during page save.
Version: 2006-10-27
Prerequisites: pmwiki 2.1
Status: Testing
Maintainer: StirlingWestrup
Categories: Markup MarkupWriting
  • This recipe was last tested on PmWiki version: 2.1.18
  • This recipe requires at least PmWiki version: 2.1
  • This recipe has ONLY been tested on version 2.1.18 but is intended to be compatible with all versions from 2.1 onwards. If it fails to work with an earlier version, please inform the maintainer.

Questions answered by this recipe

  • How can I define simple patterns in a wiki-page that will be automatically replaced when I save the page, like the QuickWords feature of some word processors?
  • How can I define a bunch of abbreviations so that whenever I enter them on a page, they generate HTML like <abbr title="Abbreviation Explanation">abbrev</abbr>?

Description

The QuickReplace recipe lets you define a bunch of keys and values on wiki-pages using the standard 'key' => 'value' markup, and have them interpreted as patterns and replacements to apply either when displaying a page, or when saving a page.

Notes

To activate this script, copy it into the cookbook/ directory, and add a command to your local/config.php file to load the recipe, like this:

include_once("$FarmD/cookbook/quickreplace.php");

Without further configuration, the above will cause QuickReplace to read the Site.QuickReplace page looking for entries to turn into markup. If that page had the following entries:

 'axxb' => 'XX Replacement Text'
 'azzb' => 'ZZ Replacement Text'

Then wherever axxb and azzb appeared in the markup they would be replaced with XX Replacement Text and ZZ Replacement Text respectively. By default any regex characters in the keys will be escaped, as will any HTML markup (other than simple character entities) in the output. Thus the line:

 '/.*/' => '<i>foo</i>'

will replace the exact text '/.*/' with '&lt;i&gt;foo&lt;/i&gt;', which will display as the literal text <i>foo</i>, not as an italicized foo.
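The default escaping described above can be approximated in a few lines of plain PHP. This is a minimal sketch, not the recipe's own code: it assumes the conversion amounts to preg_quote on the key plus an entity substitution for '<' and '>' in the value, and the variable names are made up for illustration.

```php
<?php
// Illustration only: mimics the documented default conversion.
$key   = '/.*/';
$value = '<i>foo</i>';

// The key is regex-quoted so that it only matches itself literally.
// The second argument also escapes the '/' delimiter.
$pattern = '/' . preg_quote($key, '/') . '/';

// '<' and '>' in the replacement value become character entities.
$replacement = str_replace(array('<', '>'), array('&lt;', '&gt;'), $value);

$result = preg_replace($pattern, $replacement, 'Before /.*/ after');
echo $result, "\n";
```

Only the exact text /.*/ is replaced; the surrounding words are untouched.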

Note that you'll probably want to enclose the definitions inside of indented [@...@] tags on Site.QuickReplace so that they are readable, like this:

[@
'axxb' => 'XX Replacement Text'
'azzb' => 'ZZ Replacement Text'
@]

Please take care if using QuickReplace in this unconfigured way, as a simple replacement like:

  'i' => 'X'

can make it very difficult to use the wiki, as it converts ?action=edit links into ?actXon=edXt ones. If this happens, you may have to comment out the recipe in local/config.php while you correct the entry. Greater safety can be achieved by following some of the configuration advice below.
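One way to lower this risk, offered here as an assumption rather than as the maintainer's recommendation, is to give the default configuration a match wrapper so that a key is only replaced when it appears between delimiters. The sketch relies on the match parameter described under Configuration Parameters and assumes that the index 'QuickReplace' still maps to the Site.QuickReplace page.

```php
# In local/config.php, before loading the recipe.
# Assumption: the index 'QuickReplace' keeps the default page Site.QuickReplace.
$QuickReplace['QuickReplace'] = array
  ( 'match' => '{$1}'   # an entry 'i' => 'X' now only rewrites the text {i}
  );
include_once("$FarmD/cookbook/quickreplace.php");
```

With a wrapper like this, a stray single-letter key should no longer touch ?action=edit links, because those links do not contain {i}.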

The operation of QuickReplace can be modified by defining configuration parameters before loading quickreplace.php. Multiple sets of configuration parameters can exist at the same time, and all should be defined before loading quickreplace.php. Four usage scenarios and their accompanying configurations are presented, followed by a general description of all of the configuration parameters, for those who need greater customization.

Replace ISO-8859-1 characters with HTML entities

This configuration solves a problem reported by Hannes Korte on the PmWiki mailing list. He wanted certain German characters which were being entered in markup as 8859-1 characters to be displayed as the associated HTML entities.

He solved the problem himself, but here's how one would solve it using QuickReplace:

$QuickReplace['Entities'] = array
  ( 'replace' => array
      ( 'Ä' => '&Auml;'
      , 'Ö' => '&Ouml;'
      , 'Ü' => '&Uuml;'
      , 'ä' => '&auml;'
      , 'ö' => '&ouml;'
      , 'ü' => '&uuml;'
      , 'ß' => '&szlig;'
      )
  );
include_once("$FarmD/cookbook/quickreplace.php");

This configuration just tells QuickReplace to handle 'Entities' using a specific set of replacements. Note that it relies on the characters like 'ß' being stored in the config.php file as 8859-1 characters. If the file is stored as UTF-8, for instance, then this example would fail to work.
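The encoding caveat can be seen with a byte-level sketch. It is an illustration only, using plain str_replace rather than the recipe: 'ß' is a single byte in ISO-8859-1 but two bytes in UTF-8, so a key saved in one encoding never matches page text stored in the other.

```php
<?php
$latin1_eszett = "\xDF";      // 'ß' as one ISO-8859-1 byte
$utf8_eszett   = "\xC3\x9F";  // 'ß' as two UTF-8 bytes

// Page text stored as ISO-8859-1.
$page_text = "Stra" . $latin1_eszett . "e";

// Key saved as ISO-8859-1: the bytes match and the entity is inserted.
$ok = str_replace($latin1_eszett, '&szlig;', $page_text);

// Key saved as UTF-8: the bytes differ, so nothing is replaced.
$fail = str_replace($utf8_eszett, '&szlig;', $page_text);

echo $ok, "\n";
```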

By default, these replacements can be added to or overridden by entries on a page at Site.Entities. QuickReplace generates this page name by converting the array index into a page name in the $SiteGroup (which is usually Site), but this can be changed, as is shown below.
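If the replacements should not be editable from a wiki page at all, the page parameter (described under Configuration Parameters) can be set to the empty string. A hedged sketch, reusing two of the entries above:

```php
$QuickReplace['Entities'] = array
  ( 'page'    => ''            # do not load overrides from Site.Entities
  , 'replace' => array
      ( 'ä' => '&auml;'
      , 'ß' => '&szlig;'
      )
  );
include_once("$FarmD/cookbook/quickreplace.php");
```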

Replace the Cookbook-V1 Acronyms recipe

Jan Hegewald asked on the PmWiki mailing list for a replacement for the obsolete Acronyms recipe. The existing Markup Extensions recipe was inadequate because it did not allow one to define the abbreviations in place and simply refer to them later.

QuickReplace will behave like the old recipe if given this configuration:

$QuickReplace['Acronyms'] = array
  ( 'flags' => 'e'
  , 'page'  => array
      ( '{$SiteGroup}.Acronyms'
      , '{$Group}.Acronyms'
      , '{$FullName}-Acronyms'
      )
  , 'match' => '{$1}'
  , 'output' => 'Keep(PSS("<abbr title=\"$2\">$1</abbr>"))'
  );

include_once("$FarmD/cookbook/quickreplace.php");

The flags parameter of 'e' is passed on to PmWiki's call of the preg_replace function, so that the replacement text is the result of executing output, rather than just the content of output itself. Because this could lead to the potential execution of arbitrary code, whenever flags contains 'e', the default value of page is set empty so that loading replacements from a wiki-page is turned off, and 'output' is set to 'Keep(PSS(\'$2\'))' which turns the executable replacement back into a plain text string. Thus, if you really want to use the 'e' flag, you will need to explicitly set the output parameter, and probably the page parameter as well.

Since we have to explicitly set the page parameter when using 'e' in flags, we're going to set it to an array of places to load pages from. Entries from later pages in the list replace earlier ones, so you can set up both Group- and Page-specific sets of abbreviations. Abbreviations on these pages will have a form like:

  'CD'   => 'Compact Disk'
  'CDR'  => 'Compact Disk (Recordable)'
  'WORM' => 'Write Once, Read Many'
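The override order can be pictured with ordinary PHP arrays. This sketch is an illustration only: the two lists stand in for entries read from {$SiteGroup}.Acronyms and {$Group}.Acronyms, and array_merge models the described effect without claiming to be how the recipe combines pages internally. The 'Certificate of Deposit' entry is invented for the example.

```php
<?php
// Entries as they might be read from the site-wide page.
$site_entries = array
  ( 'CD'   => 'Compact Disk'
  , 'WORM' => 'Write Once, Read Many'
  );

// Entries from a group-specific page, loaded later in the list.
$group_entries = array
  ( 'CD' => 'Certificate of Deposit'
  );

// With string keys, array_merge lets later values replace earlier ones.
$merged = array_merge($site_entries, $group_entries);

print_r($merged);
```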

The match parameter tells QuickReplace to pretend that there is a pair of curly brackets around each key, as if they had been written as {CD}, {CDR}, and {WORM} on the Acronyms page.

The output parameter has a similar effect on the replacement text. The default output value is usually just '$2', which tells QuickReplace to output the replacement text as-is ($1 is the corresponding key that was found). When 'e' is set in flags then the default output value is Keep(PSS(\'$2\')) which has the same effect. Here we're explicitly setting the output parameter to an expression which would generate an HTML abbreviation if executed.

With this configuration in place, any time the text {CD} is entered into a wiki-page, it will be displayed as the HTML markup:

 <abbr title="Compact Disk">CD</abbr>
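The way match and output act as templates can be sketched without PmWiki. The helper expand_template below is hypothetical, and the sketch deliberately leaves out the 'e' flag, Keep() and PSS(); it uses plain string replacement to show only how '$1' and '$2' are filled in.

```php
<?php
// Hypothetical helper: fill '$1' (key) and '$2' (value) into a template.
function expand_template($template, $key, $value) {
    return strtr($template, array('$1' => $key, '$2' => $value));
}

$match  = '{$1}';                        // what to look for in the page text
$output = '<abbr title="$2">$1</abbr>';  // what to put in its place

$acronyms = array
  ( 'CD'   => 'Compact Disk'
  , 'WORM' => 'Write Once, Read Many'
  );

$text = 'A {CD} is a kind of {WORM} medium; CD without braces is left alone.';
foreach ($acronyms as $key => $value) {
    $search  = expand_template($match, $key, $value);   // e.g. {CD}
    $replace = expand_template($output, $key, $value);
    $text = str_replace($search, $replace, $text);
}

echo $text, "\n";
```

Because the search text includes the braces, a bare CD in running text is not affected.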

Define QuickWords which are replaced on save

This is the problem which this recipe was originally envisioned to solve. I wanted to be able to define a set of abbreviations and have them replaced by full text when I saved a page. I mainly plan to have abbreviations like \ae and \e' which will be replaced by æ and é when the page is saved, but there may be other uses I find as I go along.

$QuickReplace['QuickReplace'] = array
  ( 'mode'     => 'ROS'
  , 'match'    => '\$1'
  , 'sortfunc' => 'krsort'
  );

include_once("$FarmD/cookbook/quickreplace.php");

This configuration sets the match parameter so that all of the patterns generated will have a leading backslash (\). This causes a slight problem as I could easily end up defining both \a and \ab and when I entered \ab I might get the replacement for \a followed by a normal b.

To solve this problem, I've set the sortfunc parameter to the name of a sort function. I'm just using the standard PHP krsort function here, which sorts the replacement list in reverse order by key. This ensures that a longer key such as \ab is always tried before a shorter key that is a prefix of it, such as \a.
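The effect of the ordering can be reproduced with a small stand-alone sketch. The helper apply_in_order and the placeholder replacements [A] and [AB] are invented for the illustration; the recipe itself works through regular expressions rather than str_replace.

```php
<?php
// Hypothetical helper: apply each replacement in list order.
function apply_in_order(array $list, $text) {
    foreach ($list as $key => $value) {
        $text = str_replace($key, $value, $text);
    }
    return $text;
}

$list = array('\a' => '[A]', '\ab' => '[AB]');

// Unsorted: \a is tried first and consumes the start of \ab.
$unsorted = apply_in_order($list, 'x \ab y');

// krsort puts \ab ahead of its prefix \a.
krsort($list);
$sorted = apply_in_order($list, 'x \ab y');

echo $unsorted, "\n", $sorted, "\n";
```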

In order to have the replacements occur when the page is saved rather than when it is displayed (the default), I've set mode equal to 'ROS' (Replace On Save). Defining new operational modes for QuickReplace is intended to be relatively simple, but currently only 'markup' (the default) and 'ROS' are defined.
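For comparison, a single replace-on-save rule written by hand against PmWiki's $ROSPatterns array might look like the line below. This is a rough sketch of the kind of entry the 'ROS' mode would need to register for a \ae key; the exact pattern the recipe generates is an assumption.

```php
# In local/config.php: replace the typed text \ae with æ when a page is saved.
# The four backslashes in the PHP string become the regex \\, which matches
# one literal backslash.
$ROSPatterns['/\\\\ae/'] = 'æ';
```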

Convert HTML to PmWiki markup when a page is saved

This is a far less complete example than the previous ones, because HTML is a large and complex markup language and it would take many more entries to do a decent job of conversion. Still, these patterns would make it possible to perform much of the conversion automatically. They were taken from the ROSPatterns recipe, and modified somewhat to behave slightly better. Should anyone be interested in extending them further, they should contact me and we might put together a mini-recipe for it.

$QuickReplace['HTML'] = array
  ( 'mode'     => 'ROS'
  , 'match'    => '$1'
  , 'page'     => ''
  , 'flags'    => 'sim'
  , 'regex'    => true
  , 'ordered'  => true
  , 'ends'     => array('@','@')
  , 'replace'  => array
      ( '</?i>'      => "''"          # <i> and </i>
      , '</?b>'      => "'''"         # <b> and </b>
      , '<em>'       => "'~"          # <em>, markup must be enabled
      , '</em>'      => "~'"          # </em>, markup must be enabled
      , '<strong>'   => "'*"          # <strong>, markup must be enabled
      , '</strong>'  => "*'"          # </strong>, markup must be enabled
      , '<sup>'      => "'^"          # <sup>
      , '</sup>'     => "^'"          # </sup>
      , '<sub>'      => "'_"          # <sub>
      , '</sub>'     => "_'"          # </sub>
      , '<br\s*/?>'  => "[[<]]"       # <br> and <br />
      , '<\s*?a.+?href\s*?=\s*?["\'](.*?)["\'].*?>(.*?)</a>'
        => '[[\1|\2]]'                # <a href=...>
      , '<\s*?img.+?src\s*?=\s*?["\'](.*?)["\'].*?>'
        => '\1'                       # <img ...>
      , "([^\n])(?=<h[1-6])"
        => "$1\n"                     # newline before <h1>..<h6>
      , '<h1\s*>' => "!"              # <h1>
      , '<h2\s*>' => "!!"             # <h2>
      , '<h3\s*>' => "!!!"            # <h3>
      , '<h4\s*>' => "!!!!"           # <h4>
      , '<h5\s*>' => "!!!!!"          # <h5>
      , '<h6\s*>' => "!!!!!!"         # <h6>
      , '(</h[1-6]\s*>)([^\n])'
        => "$1\n$2"                   # newline after </h1>..</h6>
      , '</h[1-6]\s*>'  => ""         # </h1>..</h6>
      , '([^\n])(?=<p>)' => "$1\n"    # Ensure newline before <p>
      , '([^\n]\n)(?=<p>)' => "$1\n"  # Ensure 2 newlines before <p>
      , '(</p>)([^\n])' => "$1\n$2"   # Ensure newline after </p>
      , '(</p>\n)([^\n])' => "$1\n$2" # Ensure 2 newlines after </p>
      , '</?p>'      => ""            # <p> and </p>
      )
  );

include_once("$FarmD/cookbook/quickreplace.php");

To start with, this configuration sets the page variable to the null string to disable the loading of patterns from wiki-pages. There seems (to me) to be little benefit to managing these complex strings from a wiki-page.

The mode is set to 'ROS' so that the replacements will take place when the page is saved.

The flags parameter is set to 'sim' which tells preg_replace that in these patterns '.' can match a newline, case is ignored and that matching should take place in multi-line mode.
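The difference the three flags make can be checked directly with preg_replace. The sketch below reuses the <a href=...> pattern from the configuration above with '@' delimiters; the sample HTML is invented for the illustration.

```php
<?php
// Upper-case tags and a link text that spans two lines.
$html = "<A HREF='x.html'>multi\nline</A>";

$body = '<\s*?a.+?href\s*?=\s*?["\'](.*?)["\'].*?>(.*?)</a>';

// With 's' (dot matches newline), 'i' (ignore case) and 'm' (multi-line).
$with_flags = preg_replace('@' . $body . '@sim', '[[\1|\2]]', $html);

// Without flags the upper-case <A never matches, so the text is unchanged.
$without_flags = preg_replace('@' . $body . '@', '[[\1|\2]]', $html);

echo $with_flags, "\n";
```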

The regex flag tells QuickReplace that the keys that it finds should be treated as regular expressions and not as ordinary strings. Normally a '.' in a key will only match a period (.), but with regex equal to true, it will match any character.
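A short sketch of the two behaviours, assuming (as the description of convfunc states) that non-regex keys are passed through preg_quote. The key 'a.c' is invented for the example.

```php
<?php
$key = 'a.c';

// regex false: the key is quoted, so '.' only matches a period.
$literal = '/' . preg_quote($key, '/') . '/';

// regex true: the key is used as written, so '.' matches any character.
$as_regex = '/' . $key . '/';

$literal_hits_abc  = preg_match($literal, 'abc');    // no match
$literal_hits_a_c  = preg_match($literal, 'a.c');    // match
$regex_hits_abc    = preg_match($as_regex, 'abc');   // match

var_dump($literal_hits_abc, $literal_hits_a_c, $regex_hits_abc);
```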

The ordered flag tells QuickReplace that it should take care to apply the patterns in the order they are given, as it can affect the outcome. Normally QuickReplace doesn't try to ensure that replacements happen in order. Note that setting the sortfunc parameter to the name of a sort function will automatically cause the ordered flag to be set.

The ends parameter is an array that tells QuickReplace how to cap the ends of a regular expression. Normally, when given a key of 'KEY', it generates the regular expression /KEY/. This parameter tells QuickReplace which alternate characters to use at the beginning and end of the regular expression. To be valid they must either both be the same character, or they should be balanced brackets like '()', '{}' or '[]'. We're setting them to '@' here so that QuickReplace generates regular expressions like @KEY@. This is a convenience as the slash character (/) is common in HTML and would have to be escaped if it were also used to mark the beginning and end of the regular expression. Note that setting the ends parameter is not recommended when mode is 'markup' as PmWiki uses the first character of a markup pattern to determine how best to process it.
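How ends and flags wrap a key can be sketched as simple string concatenation. The function build_pattern is hypothetical and only mirrors the description above; it is not the recipe's QuickReplace_PatternList.

```php
<?php
// Hypothetical helper: cap a key with the two end markers and append flags.
function build_pattern($key, array $ends, $flags) {
    return $ends[0] . $key . $ends[1] . $flags;
}

// With '@' as the end markers, the '/' in </?i> needs no escaping.
$pattern = build_pattern('</?i>', array('@', '@'), 'sim');

$converted = preg_replace($pattern, "''", '<i>x</I>');

echo $pattern, "\n", $converted, "\n";
```

With the default '/' markers the same key would need its slash escaped, since an unescaped '/' would end the pattern early.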

Configuration Parameters

Here is the full set of configuration parameters that are currently defined, and how they control pattern processing:

action
A string containing the user action which triggers this configuration. When $action matches this string, the configuration is used. This is set to 'edit' when mode is set to 'ROS' and is empty when mode is set to 'markup' (which means the configuration is always active). This parameter can also be set to an array of strings, and the configuration is then active whenever any of the strings matches $action.
applyfunc
This is the name of a function which takes a list of pattern keys and replacement values, and a copy of the current configuration and applies them to PmWiki so that they are active. When mode is 'ROS' then this is set to 'QuickReplace_ApplyROS' which stores the patterns in the $ROSPatterns array, and when mode is 'markup' then this is set to 'QuickReplace_ApplyMarkup' which calls the PmWiki Markup() function to store the patterns. New modes can easily define new apply functions that are specific to their needs.
convfunc
This is the name of a function which is in charge of performing any value quoting or representation changes in a given list of replacement keys and values. By default it is set to 'QuickReplace_Convert', which regex-quotes (via PHP's preg_quote) all keys unless regex is set to true. It also converts '<' and '>' characters to '&lt;' and '&gt;' in replacement strings unless html is set to true.
ends
This stores an array containing the beginning and ending markers that are used to create a regular expression search string. By default it's set to array('/','/'). This parameter is only used by the QuickReplace_PatternList function which is called through the pattfunc parameter. If that parameter has been changed, then ends may be ignored.
flags
When QuickReplace_PatternList generates a search pattern to feed to preg_replace, it appends the flags string onto the end. Thus if it generates a regex of /key/ and flags are set to 'im' then /key/im will be the final regex handed to preg_replace. As with the ends parameter, if pattfunc has been changed from the default, then flags may be ignored.
html
This parameter controls whether the initial list of keys and replacements is considered to contain valid HTML. By default this value is false and the QuickReplace_Convert function quotes all '<' and '>' characters in replacement values so that they cannot be interpreted as parts of HTML tags. If this parameter is set to true, then replacements may contain HTML tags. Note that this opens the potential for a malicious user to add arbitrary pieces of JavaScript to a web page, so setting html to true changes the default value of the page parameter to empty, which turns off loading of replacements from a wiki-page. If the convfunc parameter has been changed, then html may be ignored.
key
The key parameter is the only one which cannot be changed by setting a value inside a QuickReplace configuration entry. It holds the index into the $QuickReplace table for the current configuration. In other words, for $QuickReplace['foo'], it's the string 'foo'. It's currently not used, but is provided as an aid to those writing customizations. It may be used at some point in the future to help provide for dynamic changes to the replacement lists in mid-page.
listfunc
This is the name of a function which takes a copy of the current configuration and generates a list of search keys and replacement values. By default it's set to the function 'QuickReplace_GetReplaceList' which reads in a list of keys and values from the pages specified in the page parameter and combines them with those defined by the replace parameter. It also ensures that the input keys are regex-quoted by preg_quote whenever regex is false. This can be changed to a different function to allow for different sources for the replacement lists.
match
This value is used by QuickReplace_PatternList to generate search patterns. The match parameter has its '$1' replaced by each search key in turn to generate patterns to look for. The value of match is always regex-quoted so that it cannot be used as a regular expression. (If you need to do that, you should change the pattfunc parameter instead.) match defaults to the string '$1'.
mode
The mode parameter controls which mode-specific defaults are loaded. The value of the mode parameter is looked up in the $QuickReplaceMode table and the corresponding array of parameters is loaded into the current configuration, so long as they are not already set. New sets of behaviors can be created for QuickReplace just by adding new entries to the $QuickReplaceMode table. By default, the mode is 'markup'.
name
This parameter holds the default name of this configuration, and it is generated by passing the key value through PmWiki's MakePageName function and extracting the $Name portion. It is mainly used as a basis for other default values.
ordered
When this parameter is true then QuickReplace_ApplyMarkup attempts to ensure that newly defined markup will be executed in the order it is stored in the pattern list. This parameter is ignored by QuickReplace_ApplyROS (in effect it is always treated as true there), as no special care need be taken to have ROS patterns executed in order. This parameter is automatically set to true whenever the sortfunc parameter is non-empty.
output
QuickReplace_PatternList generates replacement patterns by replacing all occurrences of '$1' in output with the current search key and '$2' with the current replacement value. If pattfunc has been set to another value, then output may be ignored. Normally this parameter defaults to the string '$2', which emits the replacement value unchanged. If flags contains 'e' then output defaults to the expression string Keep(PSS(\'$2\')) which simply outputs the replacement string as-is.
page
This parameter is used by the QuickReplace_GetReplaceList function to determine which wiki pages to load keys and values from. If page is a string, then it is assumed to be the name of the page to load. If page is an array, then the entire list of pages is loaded in such a way as to ensure that later values override earlier ones. If no page should be loaded then page should be set to ''. If left NULL, this parameter normally defaults to "{\$SiteGroup}.$name", which loads the page with the same name as the configuration, from the Site group. For safety reasons, if flags contains 'e' then this value defaults to empty and must be set explicitly. If listfunc has been changed, then this parameter may be ignored.
pattfunc
This parameter holds the name of the function to call in order to turn a list of search keys and replacement values into a list of regular expressions and replacements suitable for passing to preg_replace. Its default value is 'QuickReplace_PatternList'.
regex
This parameter controls whether the keys in the initial list are treated as regular expressions. By default this value is false and the QuickReplace_Convert function quotes all keys so that they cannot be interpreted as regular expressions. If this is set to true, then keys may contain regular expressions, and replacements may contain capture references ($1, $2, and so on). Note that $0 refers to the entire matching key (ignoring the contents of match) while $99 refers to the entire matching string, including the contents of match. If the convfunc parameter has been changed, then this flag may be ignored.
sortfunc
Normally QuickReplace generates a replacement list in no specific order. If this parameter is set, it names a sort routine that should be used to sort the list before patterns are generated. This helps to ensure that patterns are tried in some specific order. When this parameter is non-empty then the ordered flag is automatically set to true. By default, this parameter is empty.
tag
This is a minor parameter which is used in a few places when an arbitrary identifier is needed. It is used by QuickReplace_GetReplaceList to specify the 'language' that the source wiki pages are in when loading replacement lists via PmWiki's XLPage function and it is used by QuickReplace_ApplyMarkup as a prefix for forming unique names for the generated markup rules passed to PmWiki's Markup function. By default, tag is the string "QuickReplace:$name:", where $name is the value of the name parameter.
when
When QuickReplace_ApplyMarkup is generating markup, it needs to tell PmWiki's Markup function at which point in the display process the markup should be interpreted. The value of when is passed to Markup as its second parameter, unchanged. By default it is set to 'inline'. If applyfunc is not set to 'QuickReplace_ApplyMarkup' then this parameter is ignored.
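The rule under mode, that mode defaults are loaded only for parameters that are not already set, behaves like PHP's array union operator. The sketch below is an illustration under that assumption: the shape of the defaults array and the function name My_ApplyFunc are hypothetical, and only the 'edit' and 'QuickReplace_ApplyROS' values come from the descriptions above.

```php
<?php
// Assumed shape of the defaults for the 'ROS' mode.
$mode_defaults = array
  ( 'action'    => 'edit'
  , 'applyfunc' => 'QuickReplace_ApplyROS'
  );

// A configuration that sets its own apply function (hypothetical name).
$config = array
  ( 'mode'      => 'ROS'
  , 'applyfunc' => 'My_ApplyFunc'
  );

// Array union keeps values already present on the left-hand side
// and only fills in the keys that are missing.
$effective = $config + $mode_defaults;

print_r($effective);
```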

Release Notes

2006-10-27
Security update. This makes it slightly harder for a user of the recipe to shoot themselves in the foot. Now the default settings of page and output change when flags contains 'e'. The html and convfunc parameters have also been added.
2006-10-26
This is the first release of this recipe. It has been tested on my system but there are undoubtedly a few bugs left.

Comments

See Also

This recipe was created to supplement or supersede the following recipes:

Contributors
