00966: Aggressive static page cache with mod_rewrite

Summary: Aggressive static page cache with mod_rewrite
Created: 2007-08-11 08:46
Status: Closed (Cookbook.FastCache)
Category: Feature
Assigned:
Priority: 544

Performance: Attach:with_static_cache.txt Attach:without_static_cache.txt

See Cookbook.FastCache for a possible implementation of this.
-- EemeliAro September 17, 2007, at 07:58 AM

Version: 2.2.0 beta
OS: Apache / mod_rewrite / PHP 5.2.3

Description:
Web servers perform best when they serve plain HTML without invoking PHP, Perl or Java.
A static cache driven by mod_rewrite bypasses PHP entirely, so cached pages are delivered much faster.
It also speeds up dynamic pages, because the server has more resources left for them.

Apache's mod_rewrite can apply rules conditionally, depending on whether a file exists.
E.g. when a file like Pagename,cache.html exists in a cache directory,
the request is rewritten to the cache file; otherwise it goes to "pmwiki.php".

Usually some pages are dynamic and others are static.
Caching could be switched on or off per page with a directive similar to (:title:):
(:static-cache on:)
or, if the static cache is on by default:
(:static-cache off:)

or, for more cache control, with an expiry:
(:static-cache active=on expires=day/hour/month:)

or, for more cache control, with dependency checking:
(:static-cache active=on dependencies=pagename1,pagename2,@group1,@group2:)

or with the dependencies given as a regular expression:
(:static-cache active=on dependencies_regexp=pagename?|group?\..+:)
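
A dependencies= list like the one above could be resolved roughly as follows. This is only a sketch: the function name selectDependentPages is hypothetical, and it assumes that "@group" means "every cached page in that group" and that page names have the Group.Name form.

```php
<?php
// Hypothetical helper (not part of PmWiki): pick the cached pages that a
// dependencies=... list refers to, so their cache files can be removed.
// Assumption: "@Group" selects every cached page named "Group.*".
function selectDependentPages($dependencies, array $cachedPages)
{
    // Split "a, b,@c" into trimmed, non-empty entries.
    $deps = array_filter(array_map('trim', explode(',', $dependencies)), 'strlen');
    $selected = array();
    foreach ($cachedPages as $page) {
        foreach ($deps as $dep) {
            if ($dep[0] === '@') {
                // Group entry: compare against the "Group." prefix.
                $prefix = substr($dep, 1) . '.';
                $match = strncmp($page, $prefix, strlen($prefix)) === 0;
            } else {
                // Page entry: the full page name must be identical.
                $match = ($dep === $page);
            }
            if ($match) {
                $selected[] = $page;
                break;
            }
        }
    }
    return $selected;
}
```

Each selected name would then map to a file such as cache.s/$pagename,cache.html that gets unlinked.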


I'm already using this on www.simple-groupware.de and it's working great.

Automatic cache expiry can also be done by adding time variables to the file name
in the mod_rewrite rule. The available variables are:
TIME_YEAR, TIME_MON, TIME_DAY, TIME_HOUR, TIME_MIN, TIME_SEC, TIME_WDAY, TIME
=> E.g. to cache for 1 day:
RewriteRule ^([^/a-z].*) cache.s/cms/$1_%{TIME_DAY}_%{TIME_MON}_%{TIME_YEAR}_.html [QSA,L]
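
For this to work, the PHP side would have to write the cache file under the same dated name the rule asks for. A minimal sketch, assuming Apache and PHP run in the same time zone; the function name dailyCacheFile is made up for illustration:

```php
<?php
// Hypothetical helper: build the file name that the rewrite rule above
// would request. %{TIME_DAY} and %{TIME_MON} are two digits and
// %{TIME_YEAR} is four, which corresponds to date('d'), date('m'), date('Y').
// Assumption: Apache and PHP use the same time zone.
function dailyCacheFile($pagename, $timestamp = null)
{
    if ($timestamp === null) {
        $timestamp = time();
    }
    return sprintf(
        'cache.s/cms/%s_%s_%s_%s_.html',
        $pagename,
        date('d', $timestamp),
        date('m', $timestamp),
        date('Y', $timestamp)
    );
}
```

Files from earlier days are never requested again under this scheme, but nothing deletes them, so they would need a separate cleanup.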


The code would be something like this:
function HandleBrowse($pagename, $auth = 'read') {
...
$FmtV['$PageText'] = MarkupToHTML($pagename, $text, $opt);

if ($EnablePathInfo) $pagename = str_replace(".","/",$pagename);

// Only cache plain GET requests: no POST data, no session,
// and no query parameters other than "n" (so no ?action=xyz).
$active = false;
if (empty($_POST) && empty($_SESSION) && (empty($_GET) || array_keys($_GET) == array("n")))
  $active = true;

// Rebuild the cache file when the page is newer than its cached copy.
if (<page_has_static_param_on> && $active && <page doesn't require authentication> &&
  <lastmod_pagename greater than lastmod_cache.s/$pagename,cache.html>
) {
  <foreach dependencies as $dep_pagename { unlink "cache.s/$dep_pagename,cache.html" }>
  if ($EnablePathInfo) mkdirp(dirname("cache.s/$pagename,new"));
  // Mode "x" fails if the ,new file already exists, so only one request writes at a time.
  if ($fp = @fopen("cache.s/$pagename,new", "x")) {
    fwrite($fp, <full_page_content>);
    fclose($fp);
    // Move the finished file into place in one step.
    rename("cache.s/$pagename,new", "cache.s/$pagename,cache.html");
  }
}
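
The write step above (create a temporary file exclusively, then rename it) can be tried on its own. The following is a stand-alone sketch, not PmWiki code: the function name writeStaticCache and its parameters are assumptions, and plain mkdir() stands in for PmWiki's mkdirp().

```php
<?php
// Hypothetical stand-alone version of the write step: create the
// temporary ",new" file exclusively, then rename it over the final file.
function writeStaticCache($cacheDir, $pagename, $html)
{
    $final = "$cacheDir/$pagename,cache.html";
    $tmp = "$cacheDir/$pagename,new";
    $dir = dirname($final);
    // Create missing directories (needed when page names contain "/").
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }
    // Mode "x" fails if the file already exists, so a second request
    // arriving at the same time gives up instead of writing in parallel.
    $fp = @fopen($tmp, 'x');
    if ($fp === false) {
        return false;
    }
    $ok = fwrite($fp, $html) !== false;
    fclose($fp);
    if (!$ok) {
        unlink($tmp);
        return false;
    }
    // Readers see either the old file or the complete new one.
    return rename($tmp, $final);
}
```

One caveat of this approach: if a request dies between fopen() and rename(), the leftover ,new file blocks later writes for that page until it is removed.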


# Some example mod_rewrite rules:
# If charset needs to be UTF-8
AddCharset UTF-8 .html

# using $EnablePathInfo = 0;
# don't cache if a query is given (e.g. action=edit), only cache get request
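RewriteCond %{REQUEST_METHOD} ^GET$
# RewriteRule patterns only see the URL path, so the page name is taken from the query string
RewriteCond %{QUERY_STRING} ^n=([^/a-z&][^&]*)$
RewriteCond %{DOCUMENT_ROOT}/cms/cache.s/%1,cache.html -f
RewriteRule ^pmwiki\.php$ cache.s/%1,cache.html [L]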

# use pmwiki.php if no cache is available / dynamic page
RewriteRule ^([^/a-z].*) pmwiki.php?n=$1  [QSA,L]


# using $EnablePathInfo = 1;
# don't cache if a query is given (e.g. action=edit), only cache get-request
RewriteCond %{REQUEST_METHOD} ^GET$
RewriteCond %{QUERY_STRING} ^$
RewriteCond %{DOCUMENT_ROOT}/cms/cache.s%{REQUEST_URI},cache.html -f
RewriteRule ^([^/a-z].*) cache.s/cms/$1,cache.html [QSA,L]

RewriteRule ^([^/a-z].*) pmwiki.php?n=$1  [QSA,L]


From martin:
An option to clear the static folder cache (cache.s/*) using: ?action=clearStaticPages
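
Such an action could be little more than a recursive delete of the cache directory. A rough sketch, with the function name clearStaticPages and the return value chosen only for illustration:

```php
<?php
// Hypothetical body for ?action=clearStaticPages: delete everything below
// the cache directory and return the number of files removed.
function clearStaticPages($cacheDir)
{
    if (!is_dir($cacheDir)) {
        return 0;
    }
    $removed = 0;
    // CHILD_FIRST visits files before their directory, so each
    // subdirectory is already empty when rmdir() is called on it.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cacheDir, FilesystemIterator::SKIP_DOTS),
        RecursiveIteratorIterator::CHILD_FIRST
    );
    foreach ($iterator as $entry) {
        if ($entry->isDir() && !$entry->isLink()) {
            rmdir($entry->getPathname());
        } elseif (unlink($entry->getPathname())) {
            $removed++;
        }
    }
    return $removed;
}
```

In PmWiki this would presumably be registered as a custom action through the $HandleActions array and restricted to admins, e.g. called as clearStaticPages('cache.s').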

Definition of the normal PageCache:
- should handle markup dealing with the user's identity ($Author, $Authid, ReadProtectedPage, etc.)
- should handle time- and date-dependent markup (now, today, tomorrow, etc.)
- should handle randomized markup (rand, captcha, etc.)
- should handle markup like (:title:) correctly
- always active unless mod_rewrite has already answered the request

Definition of the static PageCache proposed in this ticket:
- active if (:static-cache ... :) is used
- should only handle markup for anonymous (unauthenticated) users
- should evaluate time- and date-dependent markup once, when the cache is built
- should evaluate randomized markup once, when the cache is built (rand, captcha, etc.)
- should evaluate markup like (:title:) once, when the cache is built
- if a user doesn't want the static PageCache, they simply turn (:static-cache ... :) off

Regards,
Thomas