ISO8859MakePageNamePatterns
Summary: ISO 8859 character conversion for URL page names; strip accents etc. from characters for more readable page names
Version: 2007-11-20
Prerequisites:
Status:
Maintainer:
Questions answered by this recipe
How can I strip accents from characters for more readable page names?

Description
To convert ISO 8859 character input to unaccented equivalents, add the following to config.php. Page names will then be created automatically with the accents stripped from their characters. This adds a conversion mapping array to PmWiki's page-name patterns. Links like …

To convert existing page names you can use the script isorename.php. Read below!

For ISO 8859-1 (Latin-1 Western European)
NOTE: This recipe depends on the encoding in which the file is saved; see Page Encoding below.
# standard patterns from pmwiki.php
SDV($PageNameChars, '-[:alnum:]');
SDV($MakePageNamePatterns, array(
"/'/" => '',
"/[^$PageNameChars]+/" => ' ',
'/((^|[^-\\w])\\w)/e' => "strtoupper('$1')",
'/ /' => ''
));
# additional character conversion patterns for ISO 8859-1 character set
SDV($ISO88591MakePageNamePatterns, array(
'/À/' => 'A', '/Á/' => 'A', '/Â/' => 'A', '/Ã/' => 'A', '/Ä/' => 'Ae', '/Å/' => 'Ao', '/Æ/' => 'Ae', '/Ç/' => 'C',
'/È/' => 'E', '/É/' => 'E', '/Ê/' => 'E', '/Ë/' => 'E', '/Ì/' => 'I', '/Í/' => 'I', '/Î/' => 'I', '/Ï/' => 'I',
'/Ð/' => 'D', '/Ñ/' => 'N', '/Ò/' => 'O', '/Ó/' => 'O', '/Ô/' => 'O', '/Õ/' => 'O', '/Ö/' => 'Oe', '/Ø/' => 'Oe',
'/Ù/' => 'U', '/Ú/' => 'U', '/Û/' => 'U', '/Ü/' => 'Ue', '/Ý/' => 'Y', '/Þ/' => 'Th', '/ß/' => 'ss',
'/à/' => 'a', '/á/' => 'a', '/â/' => 'a', '/ã/' => 'a', '/ä/' => 'ae', '/å/' => 'ao', '/æ/' => 'ae', '/ç/' => 'c',
'/è/' => 'e', '/é/' => 'e', '/ê/' => 'e', '/ë/' => 'e', '/ì/' => 'i', '/í/' => 'i', '/î/' => 'i', '/ï/' => 'i',
'/ð/' => 'd', '/ñ/' => 'n', '/ò/' => 'o', '/ó/' => 'o', '/ô/' => 'o', '/õ/' => 'o', '/ö/' => 'oe', '/ø/' => 'oe',
'/ù/' => 'u', '/ú/' => 'u', '/û/' => 'u', '/ü/' => 'ue', '/ý/' => 'y', '/þ/' => 'th', '/ÿ/' => 'y'
));
# join to standard patterns
$MakePageNamePatterns = array_merge($ISO88591MakePageNamePatterns, $MakePageNamePatterns);
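Because array_merge() keeps the order of string keys, the conversion table ends up ahead of the standard patterns, so accented letters are replaced before the [^$PageNameChars]+ pattern runs. The standalone PHP sketch below (not PmWiki's own page-name code) illustrates that ordering with a subset of the table. It assumes the script and its input are both UTF-8, and it swaps the /e uppercase pattern for preg_replace_callback(), since PHP removed the /e modifier in 7.0. The helper name make_page_name() is hypothetical.

```php
<?php
// Standalone sketch, not PmWiki's own code: it mimics applying the merged
// patterns in array order. Assumes this file and the input are UTF-8.

$PageNameChars = '-[:alnum:]';

// A subset of the ISO 8859-1 table above; it comes first, as after array_merge().
$accentPatterns = array(
    '/É/' => 'E', '/Ä/' => 'Ae', '/Ö/' => 'Oe', '/Ü/' => 'Ue',
    '/é/' => 'e', '/è/' => 'e', '/ä/' => 'ae', '/ö/' => 'oe',
    '/ü/' => 'ue', '/ß/' => 'ss', '/ç/' => 'c', '/ñ/' => 'n',
);

// Hypothetical helper: turns free text into a CamelCase page name.
function make_page_name(string $text, array $accentPatterns, string $chars): string
{
    // 1. Replace accented characters with unaccented equivalents, in order.
    $text = preg_replace(array_keys($accentPatterns), array_values($accentPatterns), $text);
    // 2. Standard patterns: drop apostrophes, turn other character runs into a space.
    $text = preg_replace("/'/", '', $text);
    $text = preg_replace("/[^$chars]+/", ' ', $text);
    // 3. Capitalize the first letter of each word (stands in for the /e pattern).
    $text = preg_replace_callback(
        '/((^|[^-\\w])\\w)/',
        function (array $m): string { return strtoupper($m[1]); },
        $text
    );
    // 4. Remove the remaining spaces.
    return str_replace(' ', '', $text);
}

echo make_page_name("Café au lait", $accentPatterns, $PageNameChars), "\n"; // CafeAuLait
echo make_page_name("Über Straße", $accentPatterns, $PageNameChars), "\n";  // UeberStrasse
```

If the conversion table were merged after the standard patterns instead, the accented letters would already have been turned into separators by the time the table ran.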
For other ISO 8859 standards
Please add a suitable character conversion array.

Converting existing pagenames to unaccented equivalents
You can use the script isorename.php. Install it as usual, then, after you have installed the character conversion patterns above, run it with the action: …
Admin permission is necessary to run this action. You can do a test run without renaming anything with the parameter test=1 (…).

Preserving Original Characters in the Title
To preserve the original accented page name as a page title, you may want to add it to the page with the …

Avoiding CamelCase
by Roman, 2007-11-20
If you want to avoid CamelCase and convert spaces to hyphens (which is more SEO friendly), you can modify the recipe this way:
SDV($MakePageNamePatterns, array(
"/'/" => '',
"/[^$PageNameChars]+/" => '-',
'/((^|[^-\\w])\\w)/e' => "strtoupper('$1')"
));
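With this variant every run of characters outside $PageNameChars becomes a single hyphen. Since the hyphen is excluded from [^-\\w], the capitalization pattern then only fires at the start of the name, and the remaining words keep their original case. The standalone PHP sketch below (not PmWiki code; UTF-8 input assumed; the helper make_hyphen_name() is hypothetical) applies a subset of the accent table followed by these three patterns, again using preg_replace_callback() in place of /e.

```php
<?php
// Standalone sketch, not PmWiki's own code. Assumes UTF-8 source and input.

$PageNameChars = '-[:alnum:]';

// Subset of the accent table; applied before the hyphen patterns.
$accentPatterns = array(
    '/é/' => 'e', '/è/' => 'e', '/ä/' => 'ae', '/ö/' => 'oe',
    '/ü/' => 'ue', '/ß/' => 'ss', '/ç/' => 'c',
);

// Hypothetical helper: builds a hyphenated page name instead of CamelCase.
function make_hyphen_name(string $text, array $accentPatterns, string $chars): string
{
    $text = preg_replace(array_keys($accentPatterns), array_values($accentPatterns), $text);
    $text = preg_replace("/'/", '', $text);           // drop apostrophes
    $text = preg_replace("/[^$chars]+/", '-', $text); // each other run becomes one hyphen
    // Capitalize a letter at the start or after a non-hyphen, non-word
    // character; after the previous step only the start of the name qualifies.
    return preg_replace_callback(
        '/((^|[^-\\w])\\w)/',
        function (array $m): string { return strtoupper($m[1]); },
        $text
    );
}

echo make_hyphen_name("café au lait", $accentPatterns, $PageNameChars), "\n"; // Cafe-au-lait
```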
Page Encoding
To catch and convert characters of a given encoding type, config.php must be saved using that same encoding type, or PmWiki will be unable to find the characters to convert. Newer operating systems like GNU/Linux, FreeBSD and Apple's Mac OS X generally default to saving text files in Unicode/UTF-8; in most versions of Windows the default is CP1252, which is almost the same as Latin-1. The PmWiki default is Latin-1, which works fine for English and most Western European languages, but works neither for Central European languages (Czech, Polish) nor for other alphabets (Cyrillic, Greek, Arabic, Hebrew, Chinese, Korean...).

Using a Text Editor to Change Encoding Type
Since the encoding type of …

The Future of Encoding
Over time PmWiki will be updated to default to Unicode/UTF-8 encoding, which allows all possible alphabets and languages.

Release Notes
See Also
Contributors

Comments
User notes: If you use, used or reviewed this recipe, you can add your name. The following format is recognized:
* (+) Optional positive comment. Name, date
* (-) Optional negative comment. Name, date
These statistics appear in the Cookbook listings and will help newcomers browsing through the wiki.