MarkupToUnstyled
Questions answered by this recipe
How can I extract just the plain, unstyled text from a given string containing PmWiki markup? Like MarkupToHTML does, only without HTML tags.
Links should be converted to the usual link text PmWiki produces.
Description
While developing SlimTableOfContents and extending SectionEdit, I needed a recipe that returns the properly unformatted text of the headings.
I ended up with function MarkupToUnstyled($pagename, $markuptext), implemented in this cookbook.
SlimTableOfContents uses the text-only result as link text in the TOC and SectionEdit creates the edit link html title from it.
Activation
- Download markuptounstyled.php into your cookbook folder
Cookbooks SlimTableOfContents and SectionEdit (since v 2.2.1-2009-02-26) include this script automatically.
When NOT using those cookbooks:
- activate the script as usual by adding the following line to your local/config.php:
include_once("$FarmD/cookbook/markuptounstyled.php");
- Customize the $MarkupToUnstyledIgnorePattern array depending on the recipes / markup your Wiki implements - see Customization
Usage
Whenever you need unstyled, text-only output, call the function MarkupToUnstyled():
$unstyledtext = MarkupToUnstyled($pagename, $markuptext);
$unstyledtext will then contain no markup, no links, no formatting and no HTML <tags>.
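As an illustration of the call above, here is a minimal sketch of building an HTML title attribute from a heading, similar in spirit to what SectionEdit is described as doing. The helper name editLinkTitle is hypothetical and not part of the recipe; the strip_tags() fallback is an assumption added only so the snippet also runs outside PmWiki, where MarkupToUnstyled() is not defined.

```php
<?php
// Hypothetical helper (not part of the recipe): returns text that is safe
// to place inside an HTML title="..." attribute.
function editLinkTitle(string $pagename, string $markuptext): string {
    // Inside PmWiki with the recipe included, use the real function.
    // Outside PmWiki, fall back to a crude tag strip (assumption, demo only).
    $text = function_exists('MarkupToUnstyled')
        ? MarkupToUnstyled($pagename, $markuptext)
        : strip_tags($markuptext);
    // Escape characters that would break the attribute value.
    return htmlspecialchars(trim($text), ENT_QUOTES);
}

echo editLinkTitle('Main.HomePage', '<b>Tom & Jerry</b>'), "\n";
```

The unstyled text is escaped separately because removing tags does not by itself make a string safe inside an attribute.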
How it works
MarkupToUnstyled()
1. redirects all link functions to suppress the generation of <a href></a> tags and to produce only the regular PmWiki link text
   e.g. [[PageWithTitle|+]] becomes 'TitleOfPageWithTitle'
   e.g. [[PageNotYetCreated|+]] becomes 'PageNotYetCreated'
2. removes markup patterns from the input text which shouldn't be executed in step 4., i.e. removes markup that produces output we don't want in the unstyled text - see Customization
3. removes html tags BEFORE evaluating markup (e.g. [@..@] might already be wrapped with <code class='escaped'>)
4. evaluates markup by calling PmWiki's MarkupToHTML
5. removes newlines from result
6. removes html tags from result
7. replaces non-styling %...% - produced by $KeepTokens which might be restored in step 4.
8. restores LinkFunctions back to their original function call
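The middle part of the sequence above (pattern removal, tag stripping, markup evaluation, newline and tag cleanup) can be sketched in plain PHP. This is not the recipe's actual code: fakeMarkupToHTML() is a hypothetical stand-in for PmWiki's MarkupToHTML that only understands ''emphasis'', unstyledSketch() is an illustrative name, and the use of strip_tags() is an assumption. The link-function redirection and the $KeepTokens handling are left out because they depend on PmWiki internals.

```php
<?php
// Hypothetical stand-in for PmWiki's MarkupToHTML(): handles ''emphasis'' only.
function fakeMarkupToHTML(string $text): string {
    return "<p>" . preg_replace("/''(.*?)''/", '<em>$1</em>', $text) . "</p>\n";
}

// Illustrative sketch of the cleanup pipeline (not the recipe's real code).
function unstyledSketch(string $text, array $ignorePatterns): string {
    // Remove markup that should never be evaluated.
    foreach ($ignorePatterns as $pattern) {
        $text = preg_replace('/' . $pattern . '/', '', $text);
    }
    // Remove HTML tags already present in the input.
    $text = strip_tags($text);
    // Evaluate the remaining markup.
    $html = fakeMarkupToHTML($text);
    // Remove newlines from the result.
    $html = str_replace(array("\r", "\n"), '', $html);
    // Remove the HTML tags produced by the evaluation.
    return strip_tags($html);
}

$patterns = array('\\[\\[#[A-Za-z][-.:\\w]*\\]\\]');  // [[#anchor]]
$input = "<code class='escaped'>My</code> ''fancy'' heading[[#sec1]]";
echo unstyledSketch($input, $patterns), "\n";
```

Stripping tags both before and after evaluation mirrors the order in the list: the first pass cleans up input that was already partially rendered, the second removes what the markup engine generated.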
Customization
The $MarkupToUnstyledIgnorePattern array holds regex patterns for markup that should be ignored in unstyled text.
These patterns are removed from the input before calling MarkupToHTML.
By default it holds the replace pattern for [[target |#]] reference links and [[#anchor]]s:
SDV($MarkupToUnstyledIgnorePattern, array(
"(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]", // [[target |#]] reference links
"(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]" // [[#anchor]]
));
Depending on the cookbooks / markups your Wiki uses you should extend the
$MarkupToUnstyledIgnorePattern array - after including the script.
E.g. if you have cookbook Footnotes installed you should add the following to your config.php:
$MarkupToUnstyledIgnorePattern[] = '\\[\\^(.*?)\\^\\]';
Cookbook SectionEdit already adds the following pattern:
$MarkupToUnstyledIgnorePattern[] = '\\(:sectionedit.*:\\)';
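To check what a pattern removes before adding it to config.php, it can be tried out in a standalone script. The sketch below assumes that each array entry is a regex body without delimiters and that matches are simply deleted; how the recipe wraps and applies the patterns internally is not shown on this page, so the '/' delimiter and the helper name stripIgnoredMarkup are assumptions.

```php
<?php
// Hypothetical helper: deletes every match of every ignore pattern.
// Assumes the entries are regex bodies without delimiters.
function stripIgnoredMarkup(string $text, array $patterns): string {
    foreach ($patterns as $pattern) {
        $text = preg_replace('/' . $pattern . '/', '', $text);
    }
    return $text;
}

$patterns = array(
    "(?>\\[\\[([^|\\]]+))\\|\\s*#\\s*\\]\\]",   // [[target |#]] reference links
    "(?>\\[\\[#([A-Za-z][-.:\\w]*))\\]\\]",     // [[#anchor]]
    '\\[\\^(.*?)\\^\\]',                        // [^footnote^] (Footnotes cookbook)
);

echo stripIgnoredMarkup("Heading[[#top]] with a note[^see below^]", $patterns), "\n";
// prints "Heading with a note"
```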
Notes
The default $MarkupToUnstyledIgnorePattern array will be extended in future versions - I'm no PmWiki expert and there might be a lot more
PmWiki builtin markups that should be ignored.
The recipe is required by cookbooks
- SlimTableOfContents - to extract the pure text for the TOC
- SectionEdit (since v 2.2.1-2009-02-26) to retrieve the edit link HTML title
Release Notes
See Also
Contributors
Comments
See discussion at MarkupToUnstyled-Talk