00392: increase pmwiki responsiveness by hard-coding some lists

Summary: increase pmwiki responsiveness by hard-coding some lists
Created: 2005-03-16 10:19
Status: Closed - fixed for 2.0.4
Category: Feature
From: Radu
Assigned:
Priority: 55
Version: 2 and up
OS: all

Description: The current way of handling server-generated lists is too server-intensive, even for medium-sized wikis with hundreds of pages such as pmwiki.org. On shared servers, this disrupts everyone's web server processes (resulting in 404 page not found reports and increased sluggishness).

Features affected:

In decreasing order of obviousness (i.e., how slowly they currently render):

  
[[!Category]]
(:searchresults:)
(:pagelist:)

(others?)

Suggestion 1

Radu March 16, 2005, at 10:50 AM
Add handling for a 'rendered' argument (e.g. (:pagelist rendered:)) that would write the list out on the page itself, between the (:pagelist:) directive and a new (:endlist:) directive, and do the searching only when people edit the page (or when they click a 'refresh list' link rendered by the markup at the top and/or bottom of the list).

The markup might then look like this:

 
(:pagelist args :) (or searchresults)
*[[Result1]] title or text added by author (maybe config option)
which may also be a block, as per recent discussion on @pmwiki-users@
*[[Result2]]
(:endlist:)

On ?action=refresh (and/or after a page edit is saved), PmWiki would search the site as it normally does, and the list would be augmented with any newly found elements. Lines (or blocks) starting with list elements that are no longer found would be moved to the bottom and marked with something like %lightgrey%, so people can remove them by manually editing the page if they wish (or leave them there as a bit of on-the-spot list history, as in a ToDo list).
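A minimal sketch of the refresh step described above, written in Python for illustration (PmWiki itself is written in PHP). It assumes that each list element starts with *[[Target]], that any following lines up to the next element belong to the same block, and that a fresh search returns a plain list of page names. refresh_list and split_items are hypothetical helpers, and %lightgrey% is simply the placeholder marker from the suggestion.

```python
import re

GREY = "%lightgrey%"
# An element line looks like "*[[Main.Page]] comment" or "*%lightgrey%[[Main.Page]]";
# group 1 captures the link target (stopping at "]" or at a "|" alias).
ITEM_RE = re.compile(r"^\*(?:%lightgrey%)?\[\[([^\]|]+)")


def split_items(lines):
    """Group list lines into (target, block) pairs.

    Continuation lines stay attached to the element above them; lines
    before the first element get target None and are kept as-is.
    """
    items = []
    for line in lines:
        m = ITEM_RE.match(line)
        if m:
            items.append((m.group(1), [line]))
        elif items:
            items[-1][1].append(line)
        else:
            items.append((None, [line]))
    return items


def refresh_list(lines, results):
    """Merge fresh search results into a baked list (the lines between the directives)."""
    found = set(results)
    present, missing, seen = [], [], set()
    for target, block in split_items(lines):
        first, rest = block[0], block[1:]
        if target is None:
            present.append(block)
            continue
        seen.add(target)
        if target in found:
            # Still matches: keep it in place, un-greying it if it had been greyed out.
            if first.startswith("*" + GREY):
                first = "*" + first[len("*" + GREY):]
            present.append([first] + rest)
        else:
            # No longer matches: move it to the bottom and grey it out.
            if not first.startswith("*" + GREY):
                first = "*" + GREY + first[1:]
            missing.append([first] + rest)
    # Newly found pages go after the surviving elements, above the greyed ones.
    added = []
    for name in results:
        if name not in seen:
            seen.add(name)
            added.append(["*[[%s]]" % name])
    return [line for block in present + added + missing for line in block]
```

Author comments and continuation lines travel with their element, and running the refresh again with the same search results leaves the list unchanged, so repeated refreshes are harmless.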

Result: way faster page loads at the expense of a little extra overhead when building the list.
Note: I consider the fact that a list has not been refreshed recently (or how often it is refreshed) a positive rather than a negative feature. It may act as an indicator of how important that list is to authors, and by checking the page history an admin could also see which users are most interested.

As for categories, the [[!Keyword]] markup could render as <Keyword>(<->), where <Keyword> is a normal link to the category page and - is a link that removes the [[!Keyword]] markup from the current page, as well as the line starting with *[[$FullName]] from the Keyword category page.

Voilà. The category page is simply served as-is, only two pages are changed on request (to remove the category), and we get to put any comments we like on the category page, on the same line as the Group.Page link.
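The remove link could work roughly like this Python sketch, which operates on raw page text. remove_category is a hypothetical helper, and also matching an aliased *[[FullName|...]] line is an extra assumption beyond the suggestion.

```python
import re


def remove_category(page_text, category_text, keyword, full_name):
    """Return (new_page_text, new_category_text) after unlinking a page from a category."""
    # Drop every [[!Keyword]] marker from the page being uncategorized.
    new_page = page_text.replace("[[!%s]]" % keyword, "")
    # Drop category-page lines that start with *[[FullName]] (or an aliased *[[FullName|...]]),
    # without touching longer names that merely share the prefix.
    pattern = re.compile(r"^\*\[\[%s(\]\]|\|)" % re.escape(full_name))
    kept = [line for line in category_text.split("\n") if not pattern.match(line)]
    return new_page, "\n".join(kept)
```

Only the two affected pages are rewritten; every other view of the category page is served unchanged.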

Of course, dynamically generated lists could still be available where needed.

I started working on a recipe to do this, but unfortunately my knowledge of PmWiki's innards is still almost nonexistent. Besides, this feature looks core-like. :) And I may have missed something that already exists. Comments encouraged!

Suggestion 2

Mateusz: I also believe this is a problem, especially for Categories: at the moment they're virtually useless, and I tend to get impatient waiting for them to generate each time. My suggested solution (for categories) is below, and if I have some spare time I'll try to turn it into a recipe (but anyone who wants to do it sooner is encouraged to). Because the whole wiki is currently searched every time the index is generated, this solution would certainly make Categories much faster, so they might finally become usable:

  • every page should have a special list of Category-backlinks (either as a wiki.d/file attribute, or a separate file in a separate directory, possibly backlinks.d)
  • when a page is saved, the engine should step through every Category the page links to and add the page's name to the backlinks list of each one
    • (the point above is responsible for adding links to Category-index)
  • when a Category/Something page is read, only a special function should be invoked, which would sort and deduplicate the list created above, verify that each page in that list still links to the Category (by checking only whether the Category is present in the page's targets attribute), and remove the page from the list if it no longer does.
    • (the point above is responsible for removing links from the Category-index)
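A minimal in-memory sketch of the scheme above, in Python for illustration. The dictionaries stand in for the per-page targets attribute and for the backlinks storage (a wiki.d file attribute or a backlinks.d directory). BacklinkIndex, on_save and read_category are hypothetical names, and treating every target that starts with "Category." as a category link is an assumption.

```python
from collections import defaultdict


class BacklinkIndex:
    """Stand-in for per-category backlink lists plus each page's targets attribute."""

    def __init__(self):
        self.targets = {}                    # page name -> set of link targets
        self.backlinks = defaultdict(list)   # category page -> pages that linked to it at save time

    def on_save(self, page, targets):
        """On save: record the page's targets and append it to each linked category's list."""
        self.targets[page] = set(targets)
        for target in targets:
            if target.startswith("Category."):
                self.backlinks[target].append(page)  # duplicates are cleaned up on read

    def read_category(self, category):
        """On read: sort and deduplicate, then keep only pages whose targets still include the category."""
        candidates = sorted(set(self.backlinks[category]))
        still_linked = [p for p in candidates if category in self.targets.get(p, set())]
        self.backlinks[category] = still_linked  # prune stale entries
        return still_linked
```

Saving never removes entries; a page that drops its category link is pruned the next time the category page is read, so no full-wiki search is needed on either path.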

Mateusz: OK, I've written the recipe; see Cookbook.FastBacklinks. That might help you!

See also

Comments

Huh? - David A. Spitzley

It seems like this would defeat the purpose of categories being used as an index since it would become out of date anytime anyone added/deleted a page belonging to the category. Updating this list automatically probably has the same issues as caching pages. -Martin Fick

Not necessarily. Assuming that index pages are consulted much more often than pages are added to them, it makes sense to make (at least some) lists semiautomatic. When I add a [[!Keyword]] to a page, I always click on it to make sure I didn't misspell it. So I could simply click the refresh link, and the page would re-render with the updated list and still be up to date. Two extra clicks when editing something that belongs to a semiautomatic list would save a lot of server activity every time someone simply views the page with the list. -Radu

I have now run into category timeouts (PHP's default 30-second execution limit) on my experimental site, and would therefore like to see a solution. This is a real problem with categories that will affect anyone using them, and a proper fix is merited. The important thing to realize, though, is that a caching solution such as the one you are suggesting will not work, since things will still time out when regenerating the cache. What is needed is a quicker way to actually "refresh" the page, that is, a quicker way to search the pages. I think this is what pm was hinting at. -Martin Fick

Well, if the issue is server load on shared hosts, that suggests a problem with the default behavior. I would think a "rendered" option would only be used by advanced users; novice users would likely leave it off. Therefore, I suggest it be a setting in config.php, $BakeStaticLists or something similar. Further, identify which is the preferred behavior (dynamic or baked pages) and make that the default in pmwiki.php. ~ RyanVarick