TextExtract

Summary: search, grep, and extract text from other pages or groups with search terms and regular expressions, using search form or markup expression.

Version: 2025-01-30

Prerequisites: PmWiki 2.2.56 (compatible with PHP 8.1)

Status: stable

Maintainer: HansB

Users: +5

Download: extract.php

Discussion: TextExtract-Talk

Questions answered by this recipe

How can I do searches showing query results within context and not just a list of page links?
I'd like to offer more advanced search options like case-sensitive, whole words, regular expressions in addition to the standard search.
How can I show content from different pages if the content matches specific query terms?

Description

Text Extract provides a search form and a markup expression for extracting text sentences, lines or paragraphs from multiple pages, using search terms (including regular expressions), and wildcard pagename patterns.

Installation:

Download extract.php, copy it to the cookbook folder and install it in config.php with:

include_once("$FarmD/cookbook/extract.php");

Usage:

Markup syntax

As search form:

  (:extract <parameters> :)
  (:searchresults:)

As markup expression:

  {(extract Term1 [Term2] [-Term3] ... name=PageName group=GroupName \
        [keyword=value] [keyword=value] ...)}

With (:pagelist:):

  (:pagelist fmt=extract Term1 [Term2] [-Term3]  \
         <pagelist + textextract parameters>:)

With (:searchbox:):

  (:searchbox fmt=extract <parameters>:)
  (:searchresults:)

With PowerTools {(pagelist ...)} markup expression:

  {(pagelist fmt=extract <parameters>)}

Arguments / Search Terms

Text(Pattern) - search terms: display lines containing the string Text or matching the regular expression TextPattern. All single arguments are treated as search terms. Results are returned from pages that match all terms with no prefix or a + (plus) prefix, and none of the terms prefixed with - (minus). Arguments enclosed in double quotes are treated as a search phrase.

Optional Parameters

search control

case=1 - do a case-sensitive search. Default is 0 (case-insensitive search).
phrase=1 - search terms entered are treated as a phrase, same as if they were enclosed in double quotes. Default is 0.
word=1 - match whole words only, default is 0.
regex=1 - treat search term as regular expression (preg), default is 0 treat terms as text strings.
regexcut=1 - treat terms entered into a field named 'cut' as regular expression. Default 0, terms treated ordinarily, same as in query input field.
strict=1 - strict search mode. If more than one search term is used, results are shown only if all the terms occur within the same display unit (paragraph, sentence or line, depending on the unit= setting).
strict=0 - loose search mode. Results are shown for any of the search terms, provided all terms occur on the same page, though not necessarily in the same unit (paragraph, sentence, line). If UTF-8 is enabled, it will also search for variants of any search terms that include accented characters, with the accent replaced by an ASCII equivalent (like ä to ae, ß to ss, á to a, etc.). Uses the ISO-8859-1 character table. See the section about UTF-8 below.
acc-restore=1 - (default) for search terms containing strings that map to language-specific accented characters, additional terms are created with the accented replacements. Currently implemented for German umlauts and ß.
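
As a sketch of how the search control options combine (the group name Main and the terms are placeholders, not taken from a real wiki), the following expression should show only paragraphs that contain both whole words, matched case-sensitively:

  {(extract Cat Dog group=Main case=1 word=1 strict=1 unit=para)}

Going by the descriptions above, changing strict=1 to strict=0 would widen this to paragraphs containing either term, as long as both occur somewhere on the same page.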

page selection

group=GroupName - source pages from group GroupName. Allowed are Wiki wildcards '*' and '?'.
name=PageName - source pages from PageName or Group.PageName. Allowed are Wiki wildcards '*' and '?'. You can specify any number of pagenames, comma-separated, and each could contain wiki wildcards. Page names with a - (minus) prefix will be excluded. Note that wiki wildcard pagename patterns are not the same as regex patterns!
name=PageName#section - the text from anchored #section will be taken as source. Allowed are Wiki wildcards '*' and '?' in PageName, but not in #section.
name=PageName#sectA#sectB - the text from anchor #sectA to anchor #sectB will be taken as source. Allowed are Wiki wildcards '*' and '?' in PageName.
page=GROUP.NAME - full source pagename, can include wildcards * and ?
As a searchform parameter this will hide the page field.
defaultpage=GROUP.NAME - search form parameter to put initial value into page field.

display modification

pattern=SEARCHTERM - search form parameter for search term which will hide the search field.
cut=PATTERN - do not display rows (lines or paragraphs according to unit=) matching PATTERN.
count=n - include only n number of pages in the output.
lines=n - the text source is the first n lines of a page or page section.
lines=-n - the text source is the last n lines of a page or page section.
lines=n..m - the text source is the lines from line n to line m (including line m) of a page or page section.
lines=n.. - the text source is the lines from line n till end of a page or page section.
snip=PATTERN - do not display text matching PATTERN, remove it from the line
highlight=COLOR - highlight matches using COLOR for background, default is 'yellow' background.
highlight=bold - bold (strong) text highlight.
highlight=none - do not use match highlighting.
unit=sent - single sentence is shown.
unit=dsent - default: double sentence is shown: single sentence with result plus extra sentence if in paragraph.
unit=line - single text row (line) is shown.
unit=dline - double line: single line with result plus extra line if in paragraph.
unit=para - whole paragraph is shown (separated by empty lines or headings)
unit=page - the whole page text is shown (or a part of a source page specified by PageName#section or PageName#sectA#sectB).
markup=cut - default: directives and other invisible markup will be removed and ignored.
markup=code - lines including directives will be shown as source code.
markup=text - show only visible text without markup rendering, shortened by default.
markup=on - directives will be active, but only if pattern is '.' or unit=page or unit=para.
markup=source - display results as page source code.
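
A few hedged examples of how these display options might be combined; the page and group names are hypothetical, and the results have not been checked against a live installation:

  {(extract apple name=Main.Fruit lines=10 unit=line highlight=bold)}
  {(extract apple name=Main.Fruit lines=20.. unit=para markup=code)}
  {(extract apple group=Main count=5 highlight=none)}

The first restricts the source to the first 10 lines of the page and shows matching lines with bold highlighting. The second takes its source from line 20 to the end of the page and shows matching paragraphs, with directives displayed as source code. The third limits the output to 5 pages and switches highlighting off.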

result display of headers, footer and lines

header=STRING - display STRING on first line.
header=count - display results counter on first line.
header=full - display extended result count on first line plus a footer to mark end.
footer=STRING - display STRING at the end as a footer.
phead=link - display fullname page link above extract.
phead=STRING - display STRING above extract.
phead=linkmod - display line with page link and 'modified by' link and modified time above extract
phead=linktitle - display page link as 'title'.
phead=linkgrouptitle - display page link as 'group: title'.
pfoot=STRING - display STRING on line below text page extract
title=STRING - display STRING on left side in full header, default is 'Text Extract'.
timer=1 - display search time in full header.
linenum-color=COLOR - display color of line numbers given with COLOR (color code or recognised name).
matchnum-color=COLOR - display color of match numbers given with COLOR (color code or recognised name).
pagenum-color=COLOR - display color of page numbers given with COLOR (color code or recognised name).
linenum=1 - display line numbers, default is 0
matchnum=1 - display match numbers, default is 0
pagenum=1 - display page numbers, default is 0
linewrap=0 - prohibit automatic linewrapping of preformatted text. Default is 1 (linewrap true).
shorten=1 - shorten (truncate) output to 5 words left and 10 words right of terms.
shorten=7 (example) - output is shortened to 7 words left of the highlighted term, and 14 (double) words right of it.
lwords=n - markup=text output is shortened to n words left of term.
rwords=m - markup=text output is shortened to m words right of term.
linktext=COLOR - links in markup=text output shown with COLOR. Default is blue.
ellipsis=STRING - shortened markup=text output displayed with STRING at shortened end. Default is (ellipsis)
textlinks=1 - if set links will be rendered as text only. This is the default for markup=text, but not for markup='code', 'cut' and 'on'.
order=results - pages will be displayed in the order of match results per page, pages with most matches first.
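
The following is a minimal sketch that combines several of the result display options; the group name Main is a placeholder, and the exact rendering will depend on the wiki skin:

  {(extract apple group=Main unit=line header=count phead=linktitle linenum=1 shorten=7 order=results)}

As described above, this should put a results counter on the first line, show each page link by its title, number the lines, shorten each result to 7 words left and 14 words right of the term, and list the pages with the most matches first.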

Text(Pattern)

By default search terms entered are treated as strings. With regex=1 set, or the regular expression box ticked, the term entered is taken as a regular expression (Perl). With phrase=1 set terms entered are taken as a phrase, same as enclosing terms in double quotes, like "cat & dog".

'cat' will look for all occurrences of 'cat'. The default is a case-insensitive search, so any occurrence of 'Cat', 'CAT', 'cAt' etc. will also be returned.
'cat dog' will look for the strings 'cat' AND 'dog'. In strict=1 mode both strings need to be present in the page text portion set by the unit= parameter (paragraph, sentence, line, ...). In strict=0 mode they only need to be on the same page, possibly on different lines or paragraphs: 'cat' AND 'dog' on the page, but 'cat' OR 'dog' in the paragraph, sentence or line, depending on the unit= parameter.
'"cat and mouse" dog' will look for string 'cat and mouse' AND string 'dog'.
To look for matches of 'cat' OR 'dog' use 'cat|dog' and check 'Regular expression'.
To look for matches of 'cat' but NOT 'dog' use 'cat -dog'.
To match the word 'cat' and not 'catastrophe' tick the Match whole word box, or use parameter word=1 in the markup expression.
When using a regex search be aware that some characters are used as special control characters: the dot ., the star *, the question mark ?, the pipe |, the dollar $, and brackets. To use any of these as normal characters you need to escape them with a backslash in front.
The regex dot . character represents any character, so if you use a single dot as the text pattern the whole page content will be returned, as it matches everything. This also works for default (non-regex) searches.
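
Written as markup expressions, some of the cases above might look like the sketch below. The group and page names are placeholders; the regular expression case is left out here, since it is easiest to try through the search form with 'Regular expression' ticked.

  {(extract cat group=Main)}
  {(extract "cat and mouse" dog group=Main)}
  {(extract cat -dog group=Main)}
  {(extract cat group=Main word=1)}
  {(extract . name=Main.HomePage unit=page)}

In order: any occurrence of 'cat'; the phrase 'cat and mouse' together with 'dog'; 'cat' but not 'dog'; the whole word 'cat' only; and the full text of a single page.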

To specifically exclude lines matching some text(pattern) put it into the cut= option. With the snip= option on the other hand you can prevent certain words or phrases being shown in any matching lines, but still get the line. Input in cut= and snip= is treated as a regular expression pattern.
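
The difference between the two options can be sketched as follows (the terms and the group name are hypothetical). Plain words are used so that the outcome should not depend on whether the input is read as a regular expression:

  {(extract apple group=Main unit=line cut=draft snip=TODO)}

Here a line containing both 'apple' and 'draft' would be dropped from the results entirely, while a line containing 'apple' and 'TODO' would still be shown, with 'TODO' removed from it.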

PageName source lists

Page names or group names can be specified with name= and group= parameters, and can include wildcard characters star * and question mark ?, ? representing any valid single character, and * representing any string of valid characters. A page name with a minus - or ! in front will be excluded from the pages to be searched.

So name=Test* means all pages beginning with 'Test', group=PmWiki will be interpreted as all pages in group PmWiki, name=-*RecentChanges means no RecentChanges and no AllRecentChanges etc. pages.

If you use full page names like Group.Name note that the wildcard pagename pattern is not a regex pattern, and a dot here means just the separator between the Group and PageName component of a page name! When several expressions are given, they will be combined logically as AND conditions to arrive at a valid source pagelist.

Comma-separated lists of page names can also be given.

Instead of using all of a page as the source for the text extract, one can specify an anchor defined page section as source with Group.PageName#anchor, or a section between two anchors with Group.PageName#anchor1#anchor2. Within the anchor section part you cannot use wiki wildcards, but if the name contains wildcards, then pages matching the name will be searched, and results only taken from the specified anchor section. You cannot use several names with different anchor sections!
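
Some hedged examples of source page selection; every page, group and anchor name here is hypothetical:

  {(extract apple group=Main name=-*RecentChanges)}
  {(extract apple name=Main.Fruit*,Main.Recipes)}
  {(extract apple name=Main.Fruit#intro)}
  {(extract apple name=Main.Fruit*#intro#summary)}

The first searches all pages in group Main except the RecentChanges pages. The second searches a comma-separated list in which one entry uses a wildcard. The third takes only the section anchored at #intro as its source, and the fourth takes the text between #intro and #summary on every page matching Main.Fruit*.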

Search form markup

Markup (:extract <parameters>:) will produce a search form with a field for entering search terms and a field for entering a page name or pagename with wildcards.
Markup (:searchresults:) is used as marker for showing the results.

Note that in the standard PmWiki searchbox, entering Main/apple searches for apple in pages of group Main, but TextExtract will search for the string Main/apple in the pages or groups specified in the page name field. Leave a space between Main/ and apple to search for apple in group Main.

Default parameters for markup (:extract <parameters>:)

size=30
button='Search'
searchlabel='Search for'
pageslabel='On pages'
caselabel='Match case'
phraselabel='Match phrase'
wordlabel='Match whole word'
strictlabel='Match all'
regexlabel='Regular expression'
header='full'
phead='link'

Other optional parameters

case=1 - searches will be case-sensitive. The checkbox 'Match case' will not be shown.
phrase=1 - searches will look for the phrase when several words are entered, same as if these are entered in double quotes. The checkbox 'Match phrase' will not be shown.
word=1 - searches will look for whole words, not part of words. The checkbox 'Match whole word' will not be shown.
regex=1 - a checkbox 'Regular expression' will be shown, giving the option to enter a regular expression as search term.
Use group= and name= parameters as with pagelist and search markup.
page=GROUP.NAME - (you can use wildcards * and ?) this will hide the pagename field of the form, and pass on 'PageName' as source page parameter.
pattern=SEARCHTERM - this will hide the search field, and the search is always done with the search term thus set. Setting both the page= and pattern= options gives you a form with just the submit button, useful to let a user get information with a preprogrammed search.
defaultpage=GROUP.NAME to set initial value for page field.
All the other keyword=value options from the {(extract ....)} markup expression can be used.
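
A preprogrammed search of the kind described above might be sketched like this; the search term and button label are made up for illustration:

  (:extract page=PmWiki.* pattern=install button='Find install notes':)
  (:searchresults:)

With both page= and pattern= set, the form should consist of the submit button only, and pressing it runs the fixed search over the pages in group PmWiki.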

Notes on page field input

A single * will search all pages in group (if group= parameter is set), or all pages.
A group name plus an ending / will search pages in that group.
Names with wildcards will search corresponding pages, narrowed down by any page options, like group=.
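
To illustrate, a form with the page field pre-filled might look like the sketch below. The defaultpage value is only an example; a visitor could overwrite it with *, with a group name followed by / (such as Main/), or with a wildcard name such as Main.Fruit*:

  (:extract defaultpage=PmWiki.*:)
  (:searchresults:)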

Examples:

Default Search Form showing fields for search term and for page pattern input.

  (:extract:)
  (:searchresults:)

Search PmWiki Documentation (by paragraph, ignore hidden markup)

  (:extract page=* group=PmWiki name=-RecentChanges regex=1:)
  (:searchresults:)

Search PmWiki Documentation (by line, with code)

  (:extract page=* group=PmWiki name=-RecentChanges markup=code unit=line regex=1:)
  (:searchresults:)

Notes

Styling

You can change styling of results via css:

The results are wrapped in a div with class 'te-results'.
The header div has class 'te-header'.
The footer div has class 'te-footer'.
Each page link subheader div has class 'te-pageheader'.

Template variables

You can use some template variables within the values set with parameters header= footer= phead= pfoot=.
Useful for header=

{$$time} - search time
{$$pattern} - search term(s) from input.
{$$listcnt} - number of pages in source page list.
{$$pagecnt} - number of results pages.
{$$matchcnt} - number of matches (results).
{$$rowcnt} - number of result rows.

Useful for phead=

{$$pagenum} - consecutive number of source page.
{$$pmatchnum} - number of matches on the source page.
{$$source} - full source page name. Use as link like [[{$$source}]]
{$$pname} - name of source page
{$$ptitle} - title of source page
{$$ptitlespaced} - spaced title of source page
{$$pgroup} - group of source page

Example, imitating header=full (remove line break):

  header="%rfloat%{$$matchcnt} results from {$$pagecnt} pages,
 {$$listcnt} pages searched in {$$time} %%[+ '''$[Text Extract]''' +]"
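
In the same spirit, a custom page subheader could be sketched with the phead= variables. This is an untested illustration, and the group name Main is a placeholder:

  (:extract group=Main phead="[[{$$source}]]: {$$pmatchnum} matches":)
  (:searchresults:)

Each extract should then be preceded by a link to its source page, followed by the number of matches found on that page.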

PmWiki Search Form and pagelist directives

It is possible to use TextExtract with the PmWiki (:searchbox:) search form or with (:pagelist:) directives. This may be useful in situations where it is necessary to use some pagelist options which TextExtract does not supply. Add fmt=extract and any other TextExtract options within the markup.

Example 1:

Search the PmWiki Documentation

  (:searchbox group=PmWiki fmt=extract:)
  (:searchresults:)

Example 2

  (:pagelist Search Terms fmt=extract header=full phead=link:)

Target output to another page

With the TextExtract (:extract ...:) search form, as with the PmWiki (:searchbox:) search form, a target=Group.PageName parameter can be set, which directs the output to the specified target page.
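
A minimal sketch, assuming a hypothetical results page named Main.SearchResults. The form goes on any page:

  (:extract group=PmWiki target=Main.SearchResults:)

and, assuming the behaviour mirrors the PmWiki search form, the target page Main.SearchResults would contain the marker where the results should appear:

  (:searchresults:)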

Custom Search Form

This is a form built with Forms markup, using action='search' and fmt='extract' to make use of the PmWiki pagelist and TextExtract.

Example search form, not enabled here!

(:input form :)
(:input default request=1:) 
(:input default name 'PmWiki.*':)
||width=30em
||Search for ||(:input text q:) (:input submit post "Search":)||
||On pages ||(:input text name :) ||
|| ||(:input checkbox word 1:) Match whole word ||
|| ||(:input checkbox case 1:) Match case ||
|| ||(:input checkbox regex 1:) Regular expression ||
(:input hidden unit line:)
(:input hidden markup code:)
(:input hidden header full:)
(:input hidden title 'Search Results:':)
(:input hidden phead link:)
(:input hidden matchnum 1:)
(:input hidden timer 1:)
(:input hidden action search:)
(:input hidden fmt extract:)
(:input end:)
(:searchresults:)

Widening search with utf8 enabled

For a site which has UTF-8 enabled and with option strict=0 set (default), from version 2023-02-07 TextExtract can search with automatically generated additional search terms, in which accented letters from the Latin1 (ISO-8859-1) character set are replaced with unaccented ASCII characters. If acc-restore=1 is set (default), TE also searches with additional terms in which accented characters are restored, using a custom array of replacement patterns. For now this is implemented for German; for other languages you may wish to create your own custom array. Variables $TEAccReplacePatterns and $TEAccRestorePatterns can be set to point to custom pattern arrays.

This extra functionality works best if the recipe UnaccentUTF8 is installed, and for German if the German modifications for that recipe are enabled (uncommenting two code lines, see the recipe). UnaccentUTF8 changes the default PmWiki pagelist behaviour, creating a page index which is insensitive to accent characters, and also case-insensitive. This allows for more search hits for the page list created in a TextExtract search, and we can get results for terms with accented characters and their replacements (searching for "Fluß" will give results with "Fluß", "Fluss", "Einfluss" etc.). Additionally, if the $TEAccRestorePatterns array allows it, a search for "fluss" will give results for "fluss", "Fluß", "Einfluss" etc. If this is not wanted, one can restrict the search by setting option strict to 1, as well as doing a case sensitive search with case=1, a word specific search with word=1. Those options could be supplied via checkboxes in the search form.

Note on `$TERemoveMarkupPatterns`

$TERemoveMarkupPatterns is an array variable containing patterns and replacements, which are used during a search for a phrase (phrase=1) to remove inline markup during the search and in the results shown, to make search for text phrases possible and independent from whatever inline markup is part of the page source text.

Custom inline markup, which may interfere in a phrase search, may be added as pattern => replacement in config.php, before including the extract.php script.
Example:

//remove {gloss} markups
$TERemoveMarkupPatterns["/\\{(.*?)\\}/"] = "";

Note on `$TEPermittedDirectives`

Experimental! Allowing markup directives to stay active in search results can have undesired consequences. The highlight feature will certainly not work on terms matched with text rendered via active directives.
From version 2023-02-05 on the array variable $TEPermittedDirectives can be set as an array containing a simple list of keywords for markup directives which should stay in place. As default TE removes any markup of form (:key ...:). By setting $TEPermittedDirectives[] = 'key'; in config, such markup directive will stay active. You can add several key names into the array, you need to observe case-sensitivity. Example:

  $TEPermittedDirectives = array('Title','keywords');

Including 'Title' will allow matches to terms in a page title, even if such terms are not in the page text. If you permit in this way the (:title ...:) directive, please consider setting in config also $EnablePageTitlePriority = 1;, otherwise titles of page matches will override the page title of the search page!

Release Notes

2025-01-30: Fixed bug with footer for PHP 8 compatibility. Added default footer option.
2024-05-14a: Fixed 'strict' mode for more than two query terms.
2024-05-14: Fixed some bugs for PHP 8 compatibility. Improved query processing.
2024-05-07: Fixed bug for 'strict' matching, added as 'Match all' to extract standard form. Fixed some bugs for PHP 8 compatibility.
2023-02-16: Fixed bug in handling $TEPermittedDirectives.
2023-02-08: Fixed bug in word boundaries for Word search. Improved handling of exclusions. Fixed bugs in extract markup expression and pagelist with fmt=extract. Added option 'regexcut', so 'cut' option can have input ordinary terms or as regular expression. Fixed bug with option phrase=1.
2023-02-07: renamed $AccentReplPatterns to $TEAccReplacePatterns, added $TEAccRestorePatterns and $DeAccRestorePatterns for restoring German Umlauts and 'ß' for additional search terms and wider search results. This works best with recipe UnaccentUTF8 installed, including the modification for German Umlauts. Added stitlespaced pseudo-template variable for use in phead result section.
2023-02-05: Added $AccentReplPatterns, $ISOLatin1ReplPatterns. Added more options for display of headings, with additional pseudo template variables. Enabled easier style customisation. Fixed some search bugs for searches in quotes.
2023-02-04: Fixed vspace styling bug. Fixed errors in handling multiple search words. Improved handling of phrases.
2023-02-02: Fixed pagelist input for phrases. Fixed bug in indexing page hits.
2023-02-01: Fixed using InputValues for extract search form. Fixed use of custom $TERemoveMarkupPatterns. Fixed bug in multi-line table searches when using unit=para.
2023-01-31: Fixed bug selecting rows with unit=para. Added helper function 'is_countable' to make extract.php backwards compatible with PHP 5.6
2023-01-28: Fixed bug selecting rows with unit=para (tables were not shown whole).
2023-01-26: Further code fixes.
2023-01-25: Fixed more bugs for PHP 8.1 compatibility. Modified vertical spacing.
2023-01-24: Fixed some potential problems with keeptoken strings.
2022-10-31: Major revision of code to make script PHP 8.1 compatible. Changes to select page and match numbering
2022-01-22: fix Array and string offset access syntax with curly braces is deprecated warning
2017-06-16: changed extract form markup to use Markup(), not Markup_e().
2017-06-07: fixed bug to recognise word boundaries for UTF-8 characters. Fixed bugs to handle excluded terms (i.e. "-abc") correctly.
2017-06-06: added strict parameter, default is 1, for a stricter search by default, resulting in all terms being part of the search unit (line, paragraph, sentence etc.). Added serial parameter. Added config variable $TERemoveMarkupPatterns, to enable adding of markup patterns, for removing of markup during a phrase search.
2017-06-04: added phrase and text parameters, to enable search matches with inline markup removed during the search through the pages.
2016-04-23: convert source file encoding from UTF-8 to ANSI
2015-06-10: Added more unit= options, for results per sentence and double sentence and double line options. Added target=Group.PageName option for (:extract:) form, in line with the PmWiki Search form. Added result display of default list of matching pages if no search term is provided. Fixed bug with image display in results.
2015-06-06: Bug fix for when no search term was supplied. TextExtract always needs a specific search term, as it is meant to extract text portions. A '.' (dot) as search term will be treated as a regex universal character and will return all the text of all the matched pages.
2015-06-05: Bug fix for lines= parameter and result highlighting. Added phead=link as default when TextExtract is used with fmt=extract in searchbox or with pagelist directive, in order to display Page links as headers by default.
2014-02-22: Updated markup definitions for PHP 5.5 compatibility.
2009-10-15: Improved order=results sorting.
2009-10-15: Added order=results to show pages with highest number of matches first; fixed checkboxes to retain previous setting.
2009-10-02: Fixed count= option for page names with anchored section.
2009-09-28: Fixed quoted parameter handling in {(extract..)}. Fixed result count and output when snip= removes searchterm. Added single dot (.) input to return all page text (not just for regex=1).
2009-09-26: Added config variable $TEModeDefaults for setting markup mode specific default options. Made option shorten= available for all markup modes. Changed activelinks=0 to textlinks=1. Modified handling of vertical spacing, removed custom (:spacer:) markup. Fixed bug handling input of '/' when regex=1.
2009-09-25a: Simplified code for handling of input 'foo/bar' and '/'.
2009-09-25: Added activelinks=0 option (default for markup=text). Modified handling of input 'foo/bar' and '/'. Fixed markup=source output. Modified cleanup of directives. Added stripmagic() to input strings.
2009-09-23: Added markup=text option, including truncating by words.
2009-09-22: Fixed bug with handling escape markup and highlighting.
2009-09-21: please adjust your markup! I normalised the input syntax to correspond with pagelist syntax! Added #section for use with wildcard PageName pattern; removed action=extract; source pagelist is now always generated via MakePageList(); deprecated (:extractresults:) (use (:searchresults:) instead); deprecated prefix= option (use phead= instead); deprecated page2= option (use name= instead).
2009-09-18: improved timer for more accuracy; corrected {$$listcnt} for use with fmt=extract.
2009-09-17a: added FPL function for pagelist fmt=extract, no custom pagelist template needed when using fmt=extract in pagelist or searchbox.
2009-09-17: Integrated use of (:pagelist ..... fmt=#extract:). Fixed some vertical spacing bugs.
2009-09-16a: Fixed bug in form markup causing inline markup in parameters to be rendered.
2009-09-16: added {$$pattern} template variable; fixed some minor bugs.
2009-09-15: modified search term input to add inclusive and exclusive term options, similar to PmWiki searchbox input; split regex from normal search input; added Regular expression checkbox; added Match whole word checkbox; added template variables for header, footer, phead parameters; changed prefix and suffix to phead and pfoot.
2009-09-07: added wrapper div and style classes.
2009-09-06: large speed optimization; more argument tweaking.
2009-09-05: tweaked argument handling.
2009-09-04: reworked the way options are combined for making pagelist; fixed some form bugs; added 'defaultpage' form parameter; added 'pattern' as form option; fixed 'suffix' bug; silently drop pages for which no read permission exists; escaped markup expressions from output.
2009-09-03: fixed bug in line numbers; expanded line numbers; changed search form to use POST and retain input values; improved markup cleaning for better display; changed some defaults.
2009-09-01: Complete code overhaul for better text processing and maintenance. Added options markup=source, (match) numbers, linewrap, perpagenumbers, highlight styles.
2008-03-07: Added unit=para option to show whole paragraphs, separated by empty lines or headings.
2008-02-12: Changed extractresult markup so output does not get wrapped in <p>..</p> tags
2008-02-11: Added options group= name= for source pages (same as PageList directive). Improved handling of input from pagelist markup expression (PowerTools). Added option count= and prefix=linkmod.
2008-01-31: Added markup=on option for processing markup directives when pattern is '.' or unit=page. Fixed wrong line handling when unit=page. Added cleanup of form input options. Added qualifying of relative links.
2008-01-29: Added simple filter to suppress bad pattern input by disallowing input of single regex special characters. Added capability to receive input from PmWiki standard search form, with use of custom fmt template.
2008-01-28: Added search form with markup (:extract:) and (:extractresult:). Optimised code. Improved handling of directives and highlighting. Removed timer since results were not very meaningful. Added default option arrays. Added capability to handle comma-separated pagename lists.
2008-01-25a: Added error notice if no pages were found matching the PageName list. Changed full header to include number of pages searched.
2008-01-25: Minor fixes to handling of parameters supplied.
2008-01-24: Further improved highlighting. Added markup expressions to be rendered as source code rather than evaluated in output (same as directives). Improved vertical spacing for both nolinebreaks and linebreaks conditions, by adding custom (:spacer:) markup. Added markup expression {(cleanspacer ...)} as a wrapper for use in form templates to write output directly into a page, to remove the (:spacer:) markup.
2008-01-23: Added handling of -PageName for page exclusion from source list. Added results counter and timer for option 'header'. Added case sensitive and insensitive search option. Improved handling of directives and of highlighting. Renamed 'out' to 'markup'.
2008-01-22: Added 'highlight', 'unit' and 'out' options.
2008-01-21b: Renamed script. Renamed expression to 'extract'. Renamed 'hide' option to 'snip'.
2008-01-21a: Added suffix= option. Added handling of page section as source input. Added support for multiple PageNames, each can also have wiki wildcard characters, unless the pagename has a #section specified.
2008-01-21: Enhanced lines= option. Changed fmt= to prefix=
2008-01-20a: Added lines= parameter
2008-01-20: Initial release

Contributors

Comments

See discussion at TextExtract-Talk
