UnaccentUTF8
Questions answered by this recipe
How to enable diacritics-insensitive search and pagelists?
Description
Diacritics-insensitive page index and searches.
When someone searches your wiki, the results should include pages that match either the accented or the plain variant of a character.
The function strips accents (diacritics) from letters. For example, searching for either "Māori" or "Maori" should find pages containing either variant. Matching is also case-insensitive.
This applies to text content and search terms only; it doesn't restrict or modify page names.
This currently works for Latin (Roman), Cyrillic, Greek, Arabic and Hebrew characters with diacritics.
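The folding can be tried outside PmWiki with a short command-line script. This is only a sketch: it reuses the rule string from the Installation section, and the name DemoFold is hypothetical, chosen so that it does not clash with the recipe's own function. It assumes the Intl extension is available.

```php
<?php
# Standalone demo for the command line; requires the Intl extension.
# DemoFold is a hypothetical name used only for this illustration.
function DemoFold($str) {
  static $t = null;
  if ($t === null) {
    # Same rule string as in the recipe's Installation section.
    $t = Transliterator::createFromRules(
      ':: Latin-ASCII ; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
      Transliterator::FORWARD);
  }
  return $t->transliterate($str);
}

# Accented and plain spellings should fold to the same key.
var_dump(DemoFold("Māori") === DemoFold("Maori"));
# Greek: the accent is decomposed by NFD and then removed.
var_dump(DemoFold("ά") === DemoFold("α"));
```

If both comparisons report true, a search term and a page text that differ only in diacritics or letter case are reduced to the same string before they are compared.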
Installation
Note: UTF-8 must be enabled (see PmWiki.UTF-8), and your config.php file must be saved in the UTF-8 encoding.
- Delete wiki.d/.pageindex.
- Add to config.php:
$StrFoldFunction = $PageIndexFoldFunction = 'UnaccentUTF8'; # See Cookbook:UnaccentUTF8
$PmTransliterator = Transliterator::createFromRules(
  ':: Latin-ASCII ; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;',
  Transliterator::FORWARD);
function UnaccentUTF8($str) {
  global $PmTransliterator;
  # for German language umlauts ü->ue, uncomment next 2 lines
  # $str = preg_replace("/ä|ö|ü|Ä|Ö|Ü/", '$0e', $str);
  # $str = str_replace("\xcc\x88", 'e', $str);
  return $PmTransliterator->transliterate($str);
}
This needs to be added before scripts/pagelist.php is loaded. Some recipes (for example SearchCloud) may load that script themselves, so they need to be included after this function is defined in order to use the new folding rules.
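The load order described above could look roughly like this in config.php. This is a sketch of the ordering only, not a complete configuration. The SearchCloud file name is an assumption; use the path your copy of that recipe actually ships with.

```php
<?php if (!defined('PmWiki')) exit();
# local/config.php (sketch of the load order only)

# 1. Enable UTF-8 first.
include_once("scripts/xlpage-utf-8.php");

# 2. Define the folding function from this recipe.
$StrFoldFunction = $PageIndexFoldFunction = 'UnaccentUTF8';
# ... $PmTransliterator and function UnaccentUTF8() from the Installation code go here ...

# 3. Only now include recipes that load scripts/pagelist.php themselves.
#    The file name below is an assumption, not taken from the recipe.
include_once("$FarmD/cookbook/searchcloud.php");
```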
Configuration, Internationalization
N/A
Usage
Just search as usual.
Notes
- This requires the PHP extension Intl to be enabled on the server.
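One way to check whether Intl is present is a small script run on the server. This is a sketch using the standard PHP functions extension_loaded() and class_exists(); the variable name $hasIntl is just for illustration.

```php
<?php
# Report whether the Intl extension and its Transliterator class are available.
$hasIntl = extension_loaded('intl') && class_exists('Transliterator');
echo $hasIntl
  ? "Intl is available\n"
  : "Intl is missing; the Transliterator class cannot be used\n";
```

If the extension is missing, the call to Transliterator::createFromRules() in config.php would fail, so it is worth checking before installing the recipe.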
Change log / Release notes
- 20230203 First public release after 2 months of use on 2 high-volume websites.
See also
- Cookbook / ISO8859MakePageNamePatterns: How to convert ISO 8859 character input for page names to unaccented ASCII equivalents
- PmWiki / UTF-8: Enabling UTF-8 Unicode language encoding in your wiki.
Contributors
Written and maintained by Petko.
Comments
See discussion at UnaccentUTF8-Talk.