ConvertHTML-TalkSummary: Talk Page for ConvertHTML recipe
Maintainer: Eemeli Aro
Comments

Errors?

Version 2011-02-16 converts code included in
$LinkPageSelfFmt = "<span class='selflink'>\$LinkText</span>";
becomes
$LinkPageSelfFmt = "<span class='selflink'>$LinkText</span>";
OliverBetz 2011-05-14

The latest version is giving syntax errors for me when editing certain pages (that contain no HTML):

Parse error: syntax error, unexpected ':', expecting T_VARIABLE or '$' in /home/smspower/public_html/pmwiki.php(1691) : regexp code on line 18
Fatal error: preg_replace() [<a href='function.preg-replace'>function.preg-replace</a>]: Failed evaluating code: Keep(stripslashes("[@ ... in /home/smspower/public_html/pmwiki.php on line 1691

The markup snippet is part-way through my Site.LocalTemplates. I'm not sure what the problem is, it was fine with 2009-08-25.

That would be a bug in how I used 'quotes "inside" quotes' on a preg_replace call with the PREG_REPLACE_EVAL modifier. Fixed now with version 2011-02-16. --Eemeli Aro February 16, 2011, at 05:49 AM
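For anyone hitting similar errors in their own patterns: with the e (PREG_REPLACE_EVAL) modifier, the replacement is built into a string of PHP code and evaluated, so quotes or colons in the matched text can break the generated code. That modifier was deprecated in PHP 5.5 and removed in PHP 7.0. Below is a minimal sketch of the usual alternative, preg_replace_callback(). It is not the recipe's actual code: keep_block() is a hypothetical stand-in for PmWiki's Keep(), and the [@...@] pattern is only illustrative.

```php
<?php
// Hypothetical stand-in for PmWiki's Keep(); it only wraps the text so the
// effect is visible when the sketch is run on its own.
function keep_block($text) {
    return '<<' . $text . '>>';
}

// Illustrative only: protect [@...@] spans without evaluating generated code.
function protect_escaped_code($text) {
    return preg_replace_callback(
        '/\[@(.*?)@\]/s',
        function ($m) {
            // $m[0] is the whole match, handed over as a plain string, so
            // quotes and colons inside it are never parsed as PHP.
            return keep_block($m[0]);
        },
        $text
    );
}

$sample = "see [@ 'quotes \"inside\" quotes': ok @] here";
echo protect_escaped_code($sample), "\n";
// prints: see <<[@ 'quotes "inside" quotes': ok @]>> here
```

Because the callback receives the match as data rather than as source code, no stripslashes() or quote escaping is needed around it.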
font face not converted

I've just tried to convert some text that came from Google showing me a Word document. It took care of most issues, but I had to clean up a few hundred "<font face="Arial" size="5">DITA </font><font face="Arial" size="6">1</font>" type of things. Any possibility these could be included in the ROS patterns? Also & nbsp ; (ampersand-nbsp-semicolon) is left untranslated. --Peter Bowers May 11, 2010, at 08:19 AM

FONT tags I've left untouched for now. Yes, they're annoying, but they may also be necessary for the page layout. If you'd like to remove them on your own site, try adding the following to your config file:
$ROEPatterns['#</?font([^>]*)>#i'] = '';
&nbsp; is left as it is since it's valid PmWiki markup as well. To replace them with normal spaces, you could try adding the following to your config:
$ROEPatterns['#&nbsp;#'] = ' ';
--Eemeli Aro May 11, 2010, at 09:18 AM
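To check what these two patterns do before putting them in a config file, they can be tried in a standalone script. The sketch below assumes that each key is a PCRE pattern and each value its replacement, applied in order with preg_replace(). The loop is a stand-in for illustration, not PmWiki's own code, and the second pattern is written with the literal entity &nbsp;, which is an assumption about what the reply intended. The sample text is the one from Peter's comment, with the space swapped for an entity.

```php
<?php
// Assumption: patterns are applied one after another with preg_replace().
// $sample_patterns and apply_sample_patterns() are hypothetical names.
$sample_patterns = array(
    '#</?font([^>]*)>#i' => '',   // drop opening and closing FONT tags
    '#&nbsp;#'           => ' ',  // turn the entity into a normal space
);

function apply_sample_patterns(array $patterns, $text) {
    foreach ($patterns as $pattern => $replacement) {
        $text = preg_replace($pattern, $replacement, $text);
    }
    return $text;
}

$html = '<font face="Arial" size="5">DITA&nbsp;</font>'
      . '<font face="Arial" size="6">1</font>';
echo apply_sample_patterns($sample_patterns, $html), "\n"; // prints "DITA 1"
```

Note that the FONT pattern removes the tags only; the text between them is kept.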
redundant links

and one more - i was running into the issue that a link like
# convert [[http://blam.com|http://blam.com]] to http://blam.com
,'#\[\[(http[^\|]+)\s*\|\s*\1\]\]#i' => '$1'
thanks again! overtones99 August 26, 2009, at 01:28 AM

converting annoying tabs...

sorry - one more - it may just be a result of my own crappy first timer html coding efforts from several years ago, but i'm getting TONS of tabs everywhere in my output. i've found that adding the following very simple line is indispensable in my scenario: overtones99 August 25, 2009, at 03:31 PM

Archived comments

"title" error, additional tags

There seems to be an error in the title pattern: "\*s" should be "\s*". <HTML></HTML>, <HEAD></HEAD> and <BODY></BODY> should be removed. What about <FONT> tags? IMO annoying, shouldn't they be removed? What about converting character entities (e.g. "Umlauts") to searchable characters? Strings containing " OliverBetz 2010-01-24

I've updated the recipe to fix the title pattern error (thanks!) and to add the HTML/HEAD/BODY removal, provided that they don't have any parameters. FONT tags I've left untouched for now. Yes, they're annoying, but they may also be necessary for the page layout. If you'd like to remove them on your own site, try adding the following to your config file:
$ROEPatterns['#</?font([^>]*)>#i'] = '';
Converting character entities to characters or vice versa isn't a bad idea, but it's a different thing from what this recipe does: PmWiki will happily handle entities and characters, unlike HTML. —Eemeli Aro April 20, 2010, at 07:50 AM
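For sites that do want entities turned into characters (or back), PHP's built-in functions already cover it outside this recipe. A minimal sketch, assuming UTF-8 pages; the two wrapper function names are made up for the example:

```php
<?php
// Decode named and numeric entities into UTF-8 characters.
function entities_to_chars($text) {
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

// Encode characters back into named entities where one exists.
function chars_to_entities($text) {
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

echo entities_to_chars('M&uuml;nchen &amp; K&ouml;ln'), "\n"; // prints "München & Köln"
echo chars_to_entities("M\xC3\xBCller"), "\n";                // prints "M&uuml;ller"
```

The "\xC3\xBC" escape is the UTF-8 byte sequence for ü, used so the example does not depend on how the script file itself is encoded.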
converting underlines

hi eemeli. i just noticed that underlines
'#<u>(.*?)</u>#i' => "{+$1+}",
thanks. overtones99 October 02, 2009, at 09:18 PM

Added in version 2010-04-20. --Eemeli Aro April 20, 2010, at 07:50 AM
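The two patterns overtones99 suggested on this page (redundant links and underlines) can be tried outside PmWiki with a few lines of PHP. The patterns are quoted from the comments; the surrounding function is only a sketch with hypothetical names and is not part of convert-html.php.

```php
<?php
// Patterns as quoted in the comments, written with single quotes so PHP
// leaves the backslashes and $1 untouched.
$suggested = array(
    // [[http://blam.com|http://blam.com]] -> http://blam.com
    '#\[\[(http[^\|]+)\s*\|\s*\1\]\]#i' => '$1',
    // <u>text</u> -> {+text+}
    '#<u>(.*?)</u>#i' => '{+$1+}',
);

// Hypothetical helper: run all patterns over the text in array order.
function convert_sample(array $patterns, $text) {
    return preg_replace(array_keys($patterns), array_values($patterns), $text);
}

$in = 'See [[http://blam.com|http://blam.com]] for <u>details</u>.';
echo convert_sample($suggested, $in), "\n";
// prints: See http://blam.com for {+details+}.
```

A link whose text differs from its target, such as [[http://blam.com|home]], is left alone, because the \1 backreference requires the text after the bar to repeat the URL exactly.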
current content of convert-html file

The title field in the current convert-html file is Ubuntu Edgy on the Apple Macbook and the file contents have nothing to do with html2pmwiki markup conversion. Jean-Pierre Chrétien 2010-01-08

Thanks for letting us know, now fixed (copied from convert-html-2009-08-25.php). --Petko February 09, 2010, at 07:38 AM

Update of today?

Hi Eemeli, the convert-html.php script was today uploaded again with no author specified and without further information. Spammed or correct version? -- SchreyP January 19, 2010, at 05:05 PM

convert-html.php (upload date 2010-01-19) and convert-html-2009-08-25.php (old) are identical, so don't worry. OliverBetz 2010-01-24
converting links with