SaintMagoo December 28, 2010, at 3:24 AM: Thanks, Peter. At the moment we would hate to have to usher-on a database. One of the reasons why we love PmWiki is that we did not have to mess with that.
If you're interested, I had a pretty good start towards replacing .pageindex with an SQLite database. The problem I ran into was simply a question of optimization. Every search I threw at it ran faster with a simple full scan of the text file (.pageindex) than with the indexed database... Let me know if you'd be interested in the code. Peter Bowers December 26, 2010, at 04:38 PM
SaintMagoo December 26, 2010, at 10:44 AM: Thanks, Peter. Impressive - most impressive :)
Christmas Idea: Like .pageinfo, how about creating another two files? One that is indexed: Containing only file names, it can be a way to register a page that has been processed. After that, .pageinfo2 could use the registered-page index-number in a sorted word-list dictionary. Naturally sorted and binary searched, imo such a new search-system could easily replace + hyper-speed up the present searching subsystem?
The only drawback here is the relative speed of growing the dictionary file: inserting new words could be slow. However, if the dictionary is also indexed, then the main dictionary need not be sorted per se. Only the dictionary's index file need be sorted. Thereafter, inserting a mere integer + offset into the dictionary's index when a new word is added would speed up dictionary growth a lot?
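The scheme described above can be sketched roughly as follows. This is only an illustration in Python, held in memory rather than in files; the class `WordIndex` and all of its attribute names are hypothetical and are not part of PmWiki. The dictionary stays in insertion order, and only a list of integer slots is kept sorted and binary searched.

```python
class WordIndex:
    """Toy index: page registry, append-only dictionary, sorted slot index."""

    def __init__(self):
        self.pages = []      # registry: page number -> page name
        self.page_ids = {}   # page name -> page number
        self.words = []      # dictionary, in insertion order (never re-sorted)
        self.postings = []   # postings[i]: page numbers containing words[i]
        self.order = []      # slots into self.words, kept sorted by word

    def register(self, name):
        """Give a page a registry number the first time it is seen."""
        if name not in self.page_ids:
            self.page_ids[name] = len(self.pages)
            self.pages.append(name)
        return self.page_ids[name]

    def _find(self, word):
        """Binary search self.order; return the position where word belongs."""
        lo, hi = 0, len(self.order)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.words[self.order[mid]] < word:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def _slot(self, word, create=False):
        pos = self._find(word)
        if pos < len(self.order) and self.words[self.order[pos]] == word:
            return self.order[pos]
        if not create:
            return None
        # New word: append to the dictionary, insert one integer into the index.
        self.words.append(word)
        self.postings.append(set())
        slot = len(self.words) - 1
        self.order.insert(pos, slot)
        return slot

    def add_page(self, name, text):
        pid = self.register(name)
        for word in text.lower().split():
            self.postings[self._slot(word, create=True)].add(pid)

    def search(self, word):
        slot = self._slot(word.lower())
        if slot is None:
            return []
        return sorted(self.pages[p] for p in self.postings[slot])
```

A file-based version would store offsets instead of list positions, and inserting into the sorted index would still shift every later entry, so whether it beats a full scan of the existing index file would have to be measured.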
I am considering writing this beastie. Even (worst case) for Cron use, when a page-file mtime is greater than the registry-file mtime, it is a sign that a page or three might need to be re-indexed. Should be fun.
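The mtime comparison mentioned above might look something like this for a cron job. It is a sketch only: the function name, the one-file-per-page directory layout, and the registry path are assumptions made for the example, not a description of existing PmWiki code.

```python
import os


def pages_needing_reindex(page_dir, registry_path):
    """Return names of page files modified after the registry was last written."""
    try:
        registry_mtime = os.path.getmtime(registry_path)
    except OSError:
        # No registry yet (assumption: treat every page as stale).
        registry_mtime = float("-inf")

    stale = []
    for entry in os.scandir(page_dir):
        # Skip dot-files so index files kept in the same directory are ignored.
        if not entry.is_file() or entry.name.startswith("."):
            continue
        if entry.stat().st_mtime > registry_mtime:
            stale.append(entry.name)
    return sorted(stale)
```

The cron job would re-index the returned pages and then rewrite the registry, which moves its mtime forward for the next run.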
SaintMagoo December 24, 2010, at 09:30 AM: We just uploaded 200,000 pages to our PmWiki. Not too surprisingly, searching is now glacial. After taking a peek at .pageinfo we see that it is a listing of pages, with keywords. Are there any plans afoot to do something a little more 'Googlie'?
lordmundi November 06, 2007, at 10:54 AM: Just to add to the discussion below, I thought I would put a link to a sample search result on Renato's site with Sphider integrated:
lordmundi November 05, 2007, at 09:05 AM: Wow... I really like the Sphider integration you did Renato!! It looks great. Looking at the sphider site, this looks like it could be a great cookbook recipe for pmwiki. I'm wondering how you or someone else might do the following:
All in all, it looks really nice. -- FG
08/31/07 - Renato - Okay, six months later, I think I've got an idea. I've been playing with Sphider since yesterday. I could implement the search feature in one night (I had some difficulty working out how to get the results INSIDE the main "window" on PmWiki - most of PmWiki code is Greek to me)...
Oh, anyone can take a look at it on my site, if you don't mind reading Portuguese. :P Good keyphrases are "guitarra elétrica", "symphony x", "steve howe". It will give you the idea.
Henning July 18, 2007, at 12:38 PM: It just occurred to me that it would be nice to have a search engine that on request excludes pages older than a certain date from the result (in order to concentrate on recent content). Just brainstorming ...
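As a rough illustration of the date cutoff idea above, assuming search results arrive as (page name, last-modified) pairs (a hypothetical shape chosen for this example, not how PmWiki represents results):

```python
from datetime import datetime


def exclude_older_than(results, cutoff):
    """Keep only (page_name, last_modified) pairs modified on or after cutoff."""
    return [(name, modified) for name, modified in results if modified >= cutoff]


# Made-up sample data for illustration only.
hits = [
    ("Main.HomePage", datetime(2007, 7, 1)),
    ("Main.OldNews", datetime(2005, 3, 9)),
]
recent = exclude_older_than(hits, datetime(2007, 1, 1))
```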
Henning February 22, 2007, at 10:40 AM: I'd be interested in a solution for multiple buttons, too. I've seen multiple search buttons used in a non-wiki CMS, and it looks like an efficient user interface device I'd like to copy.
02/07/07 - Renato - The tips on this thread (PmWikiUsers:2006-October/034807.html) are great for the ones willing to search only for titlenames. Is it possible to have two buttons (Go/Search, as in MediaWiki, for instance)? One for searching titles and the other one for searching content?
6/5/06 - I totally understand the frustration with PmWiki's search results... But perhaps the issue has come to enough of a head that it's time for me to go ahead and implement a valid way to excerpt (and possibly rank) search results, even if it's very suboptimal in a number of respects. Most notably, it will be suboptimal in terms of speed -- every task and option we add to searching/page lists makes it run even slower than it does now.
I think I need to remind the group that PmWiki is not a search engine, has never been designed to be a search engine, and I have no intent to make it one. My stance on searching continues to be that if a site wants fast searches with relevance ranking of results and excerpted text outputs, then get a "real" search engine that is designed for such tasks and let it index the PmWiki site. (Bonus: such an engine can index and search things that aren't wiki pages, such as attachments or other static pages on the site.)
I should also point out that any author can create a custom search page on pmwiki.org, it doesn't require me to do it. For example, to have a search page that defaults to fmt=#title for its output, just create a page that looks something like:
See, for example, http://www.pmwiki.org/wiki/Test/SearchByTitle . Then use that custom search page for searching instead of the PmWiki default.
Still, I'll see if I can write up a page variable in the very near future, as well as an order=rank option.
Also visit PmWiki.Search for a documented custom search page.
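To make the excerpting and ranking ideas above a little more concrete, here is a minimal sketch in Python. It is not PmWiki's implementation; the function names and the scoring rule (a plain count of matches) are assumptions chosen for illustration. It also shows where the speed cost comes from: every page's text is scanned again for each query.

```python
import re


def excerpt(text, term, width=40):
    """Return a snippet around the first case-insensitive match, or ''."""
    match = re.search(re.escape(term), text, re.IGNORECASE)
    if not match:
        return ""
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


def rank(pages, term):
    """Order page names by descending match count; drop pages with no match."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    scored = [(len(pattern.findall(text)), name) for name, text in pages.items()]
    ordered = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [name for count, name in ordered if count > 0]
```

Here `pages` is a plain dict mapping page names to their markup text; ties in the count fall back to alphabetical order so the output is stable.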
2/1/06 - A lot of people continue to ask for improvements to PmWiki's search capabilities. In the past I've essentially taken the position that "PmWiki is not a search engine", and that using another search engine package (one that is optimized for performing searches) would be much better than me trying to build one of my own.
The pmwiki.org site is starting to become so heavily used that I probably need to set up a search engine there, if only to help keep the server load down. Does anyone have any suggestions for a good, easy-to-install search engine package?
The two I've looked at in detail in the past include:
ht://Dig -- I've used this several times in the past for other projects, but it doesn't appear to be actively maintained anymore, and integrating it to PmWiki would be slightly kludgey.
swish-e -- I did a few experiments with this and concluded that it could be made to work, but curiously it seems to lack any sort of convenient "excerpting" capability. (I could probably live without this.)
I also briefly looked at mnoGoSearch, but for some reason I didn't think it was a good fit with what I'm trying to do.
6/13/05 - PmWiki's search engine scans the markup text directly, not the page's rendered output.
might make it possible for PmWiki's search to also scan the rendered version of the text, so maybe we could go that way... :-)
4/15/05 - I've always maintained that PmWiki *isn't* a search engine, and for advanced searches a site is much better off integrating an existing search engine package rather than us trying to reinvent that particular wheel.
Still, there are times when it may be useful to provide teasers to things that aren't "searches". Most search engines have no clue of PmWiki's structures such as groups, trails, or categories, and so being able to provide teaser information in the context of those structures still makes a lot of sense.
6/13/04 - But your point is well taken. I never really thought of searching for markup sequences. :-)
> I actually do that now and then, so I think we actually have to implement our own search engine.
Well, I wasn't planning to eliminate the search engine, either. I've just felt that once a basic search capability is available that meets the needs of most PmWiki users, my time and effort is better spent on other aspects of PmWiki and not reinventing search engines that already exist.
(Old content added to this page before Pm ever got a chance to write anything.)
After Pm made this empty entry, I shamelessly hijacked it to think aloud, maybe spark ideas in others :) I'll presently move these scribbles to PITS entries.
-Radu March 11, 2005, at 01:43 PM
Pico March 27, 2006, at 03:51 PM