MarkupToUnstyled
Questions answered by this recipe
How can I extract just the pure unstyled text from a given string containing PmWiki markup? Like MarkupToHTML does, only without HTML tags.
Links should be converted to the usual link text PmWiki produces.
Description
While developing SlimTableOfContents and extending SectionEdit, I needed a recipe that returns the properly unformatted text of the headings.
I ended up with function MarkupToUnstyled($pagename, $markuptext), implemented in this cookbook.
SlimTableOfContents uses the text-only result as link text in the TOC, and SectionEdit creates the HTML title of the edit link from it.
Activation
- Download markuptounstyled.php into your cookbook folder
Cookbooks SlimTableOfContents and SectionEdit (since v 2.2.1-2009-02-26) include this script automatically.
When NOT using those cookbooks:
- activate the script as usual by adding the following line to your local/config.php:
include_once("$FarmD/cookbook/markuptounstyled.php");
- Customize the $MarkupToUnstyledIgnorePattern array depending on the recipes / markup your Wiki implements - see Customization
Usage
Whenever you need unstyled, text-only output, call the function MarkupToUnstyled():
$unstyledtext = MarkupToUnstyled($pagename, $markuptext);
The $unstyledtext will contain no markup, no links, no formatting and no HTML <tags>.
How it works
MarkupToUnstyled()
1. redirects all link functions to suppress the generation of <a href></a> tags and to produce only the regular PmWiki link text
   e.g. [[PageWithTitle|+]] becomes 'TitleOfPageWithTitle'
   e.g. [[PageNotYetCreated|+]] becomes 'PageNotYetCreated'
2. removes markup patterns from the input text which shouldn't be executed in step 4, i.e. removes markup that produces output we don't want in the unstyled text - see Customization
3. removes HTML tags BEFORE evaluating markup (e.g. [@..@] might already be wrapped with <code class='escaped'>)
4. evaluates markup by calling PmWiki's MarkupToHTML
5. removes newlines from the result
6. removes HTML tags from the result
7. replaces non-styling %...% - produced by $KeepTokens, which might be restored in step 4
8. restores the LinkFunctions to their original function calls
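The string-cleanup part of this sequence (removing ignored patterns, stripping tags, collapsing newlines) can be illustrated outside PmWiki. The following is only a minimal sketch, not the recipe's actual code: fakeMarkupToHTML() and sketchUnstyled() are hypothetical names, the stand-in handles nothing but '''bold''' markup, the link-function redirection and $KeepTokens handling are left out, and the use of '/' as regex delimiter is an assumption.

```php
<?php
// Hypothetical stand-in for PmWiki's MarkupToHTML(); the real function
// needs a running PmWiki. It only turns '''bold''' into <strong> tags.
function fakeMarkupToHTML(string $text): string {
    $html = preg_replace("/'''(.*?)'''/", '<strong>$1</strong>', $text);
    return "<p>" . $html . "\n</p>\n";
}

// Sketch of the cleanup sequence described above (assumed, simplified).
function sketchUnstyled(string $markup, array $ignorePatterns): string {
    // Remove markup that should not be evaluated at all.
    foreach ($ignorePatterns as $pattern) {
        // Assumption: '/' as delimiter; fine while no pattern contains '/'.
        $markup = preg_replace('/' . $pattern . '/', '', $markup);
    }
    // Remove HTML tags that are already present before evaluation.
    $markup = strip_tags($markup);
    // Evaluate the remaining markup (stand-in for MarkupToHTML).
    $html = fakeMarkupToHTML($markup);
    // Remove newlines, then the HTML tags generated by the evaluation.
    $html = str_replace(["\r", "\n"], '', $html);
    return trim(strip_tags($html));
}

$ignore = ['\\[\\[#([A-Za-z][-.:\\w]*)\\]\\]'];  // [[#anchor]]
echo sketchUnstyled("[[#intro]]A '''bold''' <em>heading</em>", $ignore), "\n";
// prints: A bold heading
```

Removing the ignored patterns before evaluation, rather than afterwards, means their output never has to be recognized in the generated HTML.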
Customization
The array $MarkupToUnstyledIgnorePattern holds regex patterns for markup that should be ignored in unstyled text.
These patterns are removed from the input before calling MarkupToHTML.
By default it holds the replace patterns for [[target |#]] reference links and [[#anchor]]s:
SDV($MarkupToUnstyledIgnorePattern, array(
  "(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]",  // [[target |#]] reference links
  "(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]"     // [[#anchor]]
));
Depending on the cookbooks / markups your Wiki uses you should extend the $MarkupToUnstyledIgnorePattern array - after including the script.
E.g. if you have cookbook Footnotes installed you should add the following to your config.php:
$MarkupToUnstyledIgnorePattern[] = '\\[\\^(.*?)\\^\\]';
Cookbook SectionEdit already adds the following pattern:
$MarkupToUnstyledIgnorePattern[] = '\\(:sectionedit.*:\\)';
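Before adding a pattern of your own, it can help to try it against sample text outside PmWiki. The sketch below is an illustration under assumptions, not part of the recipe: stripIgnored() is a hypothetical helper, and wrapping each entry in '/' delimiters is an assumption about how the patterns are applied. The three patterns themselves are the ones quoted on this page.

```php
<?php
// Patterns as quoted in this recipe; the entries carry no regex delimiters.
$patterns = [
    "(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]",  // [[target |#]] reference links
    "(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]",     // [[#anchor]]
    '\\[\\^(.*?)\\^\\]',                         // [^footnote^] (Footnotes cookbook)
];

// Hypothetical helper: strip every ignored pattern from $text.
function stripIgnored(string $text, array $patterns): string {
    foreach ($patterns as $p) {
        // Assumption: '/' as delimiter; works as no pattern here contains '/'.
        $text = preg_replace('/' . $p . '/', '', $text);
    }
    return $text;
}

$sample = 'Intro[^see note^] [[#top]]and [[Main.HomePage |#]]more';
echo stripIgnored($sample, $patterns), "\n";
// prints: Intro and more
```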
Notes
The default $MarkupToUnstyledIgnorePattern array will be extended in future versions. I'm no PmWiki expert, and there might be many more built-in PmWiki markups that should be ignored.
The recipe is required by cookbooks
- SlimTableOfContents - to extract the pure text for the TOC
- SectionEdit (since v 2.2.1-2009-02-26) to retrieve the edit link HTML title
Release Notes
See Also
Contributors
Comments
See discussion at MarkupToUnstyled-Talk