Summary: How can I search my PmWiki content via the Sphinx engine?
Prerequisites: Sphinx, PmWiki, bash, sed
Questions answered by this recipe
Sphinx is a full-text search engine. PmWiki already provides search out of the box (see PmWiki:Search), and PmWiki:SearchImprovements discusses what has been done, what should be done and what should be avoided. However, going further (stemming, indexing multiple wikis, searching special fields, custom ranking, etc.) becomes problematic with the built-in search. Sphinx allows you to do such things efficiently.
- Have a PmWiki instance running
- Have a Sphinx instance running: searchd and a well-configured indexer (use search in the CLI to make sure this is the case)
- try the documented tests if it is your first time
- download the bash script sphinx_sources.txt to locate and parse your wikis
- rename to sphinx_sources
- fix the paths to your own directories
- download the sed script pmwiki-to-sphinxxml.txt into the same directory
- rename to pmwiki-to-sphinxxml
- run the script and test that the output you get is correct (proper XML files)
- make sure the output is in the right character encoding; if not, check the example given in the first lines with iconv
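The charset check above can be sketched in bash as follows. This is only an illustration: the function names is_valid_utf8 and to_utf8 are hypothetical and not part of the recipe's scripts, and the ISO-8859-1 default is an assumption taken from the xmlpipe_command example in this recipe.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of sphinx_sources.

# Return 0 if the file is valid UTF-8, non-zero otherwise.
# iconv fails on the first byte sequence that is not legal UTF-8.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Print the file as UTF-8, converting only when it is not valid UTF-8 already.
# The source charset (default ISO-8859-1) is an assumption; adjust to your wiki.
to_utf8() {
  local file=$1 from=${2:-ISO-8859-1}
  if is_valid_utf8 "$file"; then
    cat "$file"
  else
    iconv -f "$from" -t UTF-8 "$file"
  fi
}
```

For example, saving the script output to a file and running is_valid_utf8 on it tells you whether the iconv stage in the pipe is needed at all.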
- once you are satisfied with the output, add the xmlpipe2 command to your Sphinx configuration file
- if this is your first time, copy and adapt the following
source pmwikis
{
    type = xmlpipe2
    xmlpipe_command = /path/to/bash/script/sphinx_sources | iconv -f ISO-8859-1 -t utf-8
}

index pmwikis
{
    source = pmwikis
    path = /var/lib/sphinxsearch/data/pmwikis
    docinfo = extern
    mlock = 0
    morphology = stem_en
    min_word_len = 1
    charset_type = utf-8
}
- see also http://sphinxsearch.com/docs/current.html#xmlpipe2 for details
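For orientation, the stream that xmlpipe_command has to print looks roughly like what the following bash sketch emits. The field names title and content and all function names here are hypothetical; the schema actually produced by sphinx_sources and pmwiki-to-sphinxxml may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an xmlpipe2 emitter; not the recipe's actual script.

# Escape the three characters that would break XML character data.
xml_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Print the docset opening and a schema declaring two full-text fields.
emit_docset_header() {
  cat <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<sphinx:docset>
<sphinx:schema>
<sphinx:field name="title"/>
<sphinx:field name="content"/>
</sphinx:schema>
EOF
}

# Print one document. Arguments: numeric id, title, body text.
emit_document() {
  local id=$1 title=$2 body=$3
  printf '<sphinx:document id="%s">\n' "$id"
  printf '<title>%s</title>\n' "$(printf '%s' "$title" | xml_escape)"
  printf '<content>%s</content>\n' "$(printf '%s' "$body" | xml_escape)"
  printf '</sphinx:document>\n'
}

# Close the docset.
emit_docset_footer() {
  printf '</sphinx:docset>\n'
}
```

Calling emit_docset_header, then emit_document once per page, then emit_docset_footer gives a stream you can compare against the output of your own script.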
- run indexer pmwikis --rotate and correct whatever problems arise
- typically a character set mismatch, or documents too large for the 2 MB XML field limit (these are usually not actual documents)
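To spot the oversized files mentioned above before indexing, something like the following could help. It is a sketch: list_oversized is a hypothetical helper, and the 2048 KB default simply mirrors the 2 MB limit quoted in this recipe (in Sphinx this limit is, as far as I know, controlled by the max_xmlpipe2_field indexer setting).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the recipe's scripts.

# List regular files under a directory that are larger than a limit in KB.
# Arguments: directory, optional limit in kilobytes (default 2048).
list_oversized() {
  local dir=$1 limit_kb=${2:-2048}
  find "$dir" -type f -size +"${limit_kb}k"
}
```

Running it on your wiki.d directory lists the page files worth inspecting or excluding.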
- test your newly created index via search pmwikis mykeyword; also use indextool --dumpheader pmwikis to make sure you have indexed most, if not all, of the targeted documents
- use the Sphinx PHP API to integrate with your PmWiki instance
- download and install the official PHP API
- a lot of room for creativity here
- note that this is a database-free solution (to stick to PmWiki flat files), hence IDs are generated manually via hashing and are not stored as an attribute (they could, and probably should, be); consequently sphinx_pmwikis_doc_ids.php plays that role and should be included to get the paths back
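The ID-by-hashing idea can be illustrated in bash as below. This is a sketch under assumptions: the recipe does not say which hash it uses, so cksum is swapped in purely as an example, and doc_id, build_id_map and path_for_id are hypothetical names. Note that cksum does not produce the same value as PHP's crc32(), so the indexing side and the lookup side must use the same function.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the hash actually used by the recipe is not documented here.

# Derive a numeric document ID from a page path (cksum is an assumed stand-in).
doc_id() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# Print an "id<TAB>path" line for every regular file in a directory.
build_id_map() {
  local dir=$1 f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    printf '%s\t%s\n' "$(doc_id "$f")" "$f"
  done
}

# Look up the path for an ID. Arguments: id, map file written by build_id_map.
path_for_id() {
  awk -F'\t' -v id="$1" '$1 == id { print $2 }' "$2"
}
```

Since Sphinx only returns document IDs, a table like this (or storing the path as an attribute) is what lets the search results link back to wiki pages.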
See discussion at Sphinx-Talk.