[pmwiki-users] Blog proposal

Martin Fick fick at fgm.com
Fri Dec 16 10:50:59 CST 2005


On Fri, Dec 16, 2005 at 12:00:17AM -0600, Patrick R. Michaud wrote:
> On Fri, Dec 16, 2005 at 12:36:30AM -0500, Fick, Martin wrote:
> >    From: pmwiki-users-bounces at pmichaud.com on behalf of Patrick R. Michaud
> >    > I've been thinking about that.  Properties alone doesn't do it,
> >    > though, unless we have a way to build indexes for them.  The
> >    > nice thing about the current Category implementation is that the
> >    > linkindex speeds things up substantially.
> > 
> >      Uhmm, that's not actually always true.  I have done
> >    some benchmarks in the past and this is what I think
> >    I have noticed:
> > ...
> >      On the other hand, if you have say, a small group,
> >    with only about 10 pages in it, and you have specified
> >    group=smallgroup in your pagelist command, and your
> >    site has about 1000 files in it, it will actually be
> >    slower with the current indexing system.  
> 
> Oh, that makes sense.  But it's no problem for us to have 
> pagelist use the indexing system only if the number of pages 
> to be checked is large (and we can play a bit to figure out what
> "large" is).  Otherwise for small sets we can just scan the
> files directly.
> 
> But almost by definition, searching for page membership in
> a category involves most if not all pages on the site, so some
> sort of index is desirable.
> 
> Pm


  Well, in certain cases it does not.  If you have category
hierarchies set up, you might be on a Category page and only
want to display its subcategories; then you are only searching
the pages in the Category group.
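
  For example (Category.MyTopic is just a placeholder name here,
and the exact option spellings may vary between versions),
something like

    (:pagelist group=Category link=Category.MyTopic:)

in principle only needs to look at the pages of the Category
group to find the subcategories, rather than the whole site.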

  I know I do weird stuff :), but I had been playing around
with PageListTemplates quite a bit when I first developed
it.  I was attempting to create a recursive PageListTemplate
that could start at the Category.Category page and show a
tree of every Category on the site.  I don't think I ever
got it completely working, mostly because it was just too
slow.

  I am using a 200MHz Pentium and I have over 1000 photos,
each on a separate wiki page.  Trying to view the entire
Category tree (which is not really that big) meant having to
run a PageList for every Category on one page!  That meant
around 20 or so pagelists. :)

  I did, however, identify some simple optimizations that
still make pagelists quicker, but they would mostly only
make a difference for someone with more than one pagelist
per page.  I know most people will not have that, but since
I've babbled this much, I figured I should mention them.
Once the time to search for terms inside of pages is
reduced, the biggest time consumer is actually the PageStore
ls() function; who would have thought?  But again, mind you,
I have over 1000 small pages.

  In the ls() function there are two things that can speed it
up, each fairly dramatically.

  1) Cache the directory reads so that they do not get
     reread on consecutive pagelists (they will never
     change in a way that we would care about during page
     display, would they?).  I know, this will raise the
     memory consumption.


  2) Use preg_grep instead of individual preg_match calls
     (a tiny illustration follows below).

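  (In case preg_grep is unfamiliar: it applies one pattern to a
whole array in a single call.  A tiny illustration, with made-up
page names:)

    $pages = array('Main.HomePage', 'Category.Photos', 'Category.Trips');
    # keep the entries that match, in one call rather than one
    # preg_match per entry
    $cats = preg_grep('/^Category\./', $pages);
    # PREG_GREP_INVERT keeps the entries that do NOT match
    $rest = preg_grep('/^Category\./', $pages, PREG_GREP_INVERT);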

The ls() function from 2.0.0 is


  function ls($pats=NULL) {
    global $GroupPattern, $NamePattern;
    $pats=(array)$pats; 
    array_unshift($pats, "/^$GroupPattern\.$NamePattern$/");
    $dir = $this->pagefile('');
    $dirlist = array(preg_replace('!/*[^/]*\\$.*$!','',$dir));
    $out = array();
    while (count($dirlist)>0) {
      $dir = array_shift($dirlist);
      $dfp = opendir($dir); if (!$dfp) { continue; }
      while ( ($pagefile = readdir($dfp)) !== false) {
        if ($pagefile{0} == '.') continue;
        if (is_dir("$dir/$pagefile"))
          { array_push($dirlist,"$dir/$pagefile"); continue; }
        if (@$seen[$pagefile]++) continue;
        foreach($pats as $p) {
          if ($p{0} == '!') {
           if (preg_match($p,$pagefile)) continue 2;
          } else if (!preg_match($p,$pagefile)) continue 2;
        }
        $out[] = $pagefile;
      }
      closedir($dfp);
    }
    return $out;
  }


  For both of these it helps to split ls() into two separate,
non-nested main loops: one to read the filenames and one to
check them against the patterns (globbing).  This makes it
really easy to 1) cache arrays with the filename listings of
the different $dirs.  Then for 2) the globbing, you can run
preg_grep on the whole cached array instead of looping through
each entry and running preg_match on it.

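  Here is a rough sketch of what I mean.  This is not PmWiki
code; it is a standalone function made up for illustration (the
name ls_cached and the $topdir argument are mine), and the real
thing would live in the PageStore class, take its directory from
$this->pagefile(''), and prepend the usual
/^$GroupPattern\.$NamePattern$/ pattern the way ls() does:

  function ls_cached($topdir, $pats = NULL) {
    static $dircache = array();   # $topdir => array of page filenames

    # Loop 1: read the directory tree once and cache the raw name
    # list, so consecutive pagelists skip the opendir()/readdir() work.
    if (!isset($dircache[$topdir])) {
      $names = array(); $seen = array();
      $dirlist = array($topdir);
      while (count($dirlist) > 0) {
        $dir = array_shift($dirlist);
        $dfp = @opendir($dir); if (!$dfp) continue;
        while ( ($pagefile = readdir($dfp)) !== false) {
          if ($pagefile[0] == '.') continue;
          if (is_dir("$dir/$pagefile"))
            { array_push($dirlist,"$dir/$pagefile"); continue; }
          if (@$seen[$pagefile]++) continue;
          $names[] = $pagefile;
        }
        closedir($dfp);
      }
      $dircache[$topdir] = $names;
    }

    # Loop 2: glob against the whole cached array with preg_grep,
    # one call per pattern instead of one preg_match per file.
    # As in ls(), patterns delimited with '!' are exclusions.
    $out = $dircache[$topdir];
    foreach ((array)$pats as $p) {
      if ($p[0] == '!') $out = preg_grep($p, $out, PREG_GREP_INVERT);
      else $out = preg_grep($p, $out);
    }
    return array_values($out);
  }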

  Cheers,
  
  
  -Martin
