[pmwiki-users] Google local site search

Patrick R. Michaud pmichaud at pobox.com
Wed Dec 28 10:19:52 CST 2005


On Wed, Dec 28, 2005 at 05:00:15PM +0100, Joachim Durchholz wrote:
> >> I'm generally shy of doing pages differently depending
> >>on who visits it - what if there's a bug in the code that does the
> >>polymorphism? I'll never find out.)
> 
> I'd like to hear your view on this one. I think that's a relevant one - 
> with various parties writing code that transforms different aspects of 
> PmWiki, it could be difficult to reliably test whether Google&co are 
> really seeing the pages we think they see. Imagine a bug that mangles 
> the > in <a href=...> only when presenting oneself to Google - it will 
> go unnoticed for a long time. Worse, few people have the tools and 
> expertise to see that.

I'm generally shy of transforming pages as well.  However, the approach
I'm using doesn't modify the <a ...> tags themselves; it modifies the URL
that appears inside the tag.  Specifically, it removes any unpermitted
"?action=" that appears after "$ScriptUrl".  This isn't that
far-fetched, since PmWiki already rewrites what comes after
$ScriptUrl in order to handle $EnablePathInfo properly (i.e.,
converting "$ScriptUrl/Group/Name" into "$ScriptUrl?n=Group.Name").
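
To make the idea concrete, here is a rough sketch of that kind of URL
rewrite.  It is not the code PmWiki uses: the function name
stripRobotActions, its parameters, and the list of permitted actions are
all made up for illustration, and it only handles links where
"?action=..." is the sole query parameter.

```php
<?php
// Hypothetical helper (not PmWiki's actual code): remove "?action=..." from
// links under $scriptUrl unless the action is listed in $allowedActions.
function stripRobotActions(string $html, string $scriptUrl, array $allowedActions): string
{
    // Match $scriptUrl, an optional page path, then "?action=name" where the
    // action is the only query parameter (followed by a quote, space, "#", or end).
    $pattern = '!(' . preg_quote($scriptUrl, '!') . '[^\s"\'?#]*)'
             . '\?action=(\w+)(?=[\s"\'#]|$)!';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($allowedActions): string {
            // Permitted actions stay as they are; otherwise keep only the page URL.
            return in_array($m[2], $allowedActions, true) ? $m[0] : $m[1];
        },
        $html
    );
}
```

Only the URL text changes; the surrounding <a ...> markup is never
touched, which is why a failure here should leave valid HTML behind.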

And if it fails to do the transformation, that just means the robots
see the unwanted links.  There's almost no chance that the result
will be invalid HTML.

> >2.  I'm not convinced that adding rel="nofollow" means that the
> >    robot won't follow the link.  According to 
> >    http://googleblog.blogspot.com/2005/01/preventing-comment-spam.html
> >    and http://microformats.org/wiki/relnofollow, the rel="nofollow"
> >    attribute simply means that the search engine shouldn't give the
> >    link any credit when ranking sites in search results.  It doesn't
> >    mean that the robot doesn't follow the link.
> 
> I'd expect most search engines to honor "nofollow" by not following the 
> link anyway. [...]

In practice, for the first six months after rel="nofollow" was
introduced, Google followed such links anyway; it just didn't count
them toward ranking.  Google didn't stop following the links until
sometime in July 2005.  So whatever the reasons for expecting
otherwise, the attribute alone didn't keep Google's robot off those
links for quite some time.

> Um... which pattern? (I'd be hard pressed to come up with something 
> useful - robots identify themselves with a bewildering array of keywords.)

It's in scripts/robots.php, as $RobotPattern.  Currently $RobotPattern
is set to

    SDV($RobotPattern,'Googlebot|Slurp|msnbot|BecomeBot|HTTrack');

which catches the bots that regularly hit pmwiki.org.  I'll be adding
more bots as I come across them in the logs.
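
For anyone wondering how a pattern like this gets used, here is a
minimal sketch of testing the visitor's User-Agent against it.  The
pattern value is the one quoted above; the isRobot helper and the
variable names are illustrative assumptions, not necessarily what
scripts/robots.php does.

```php
<?php
// Pattern value quoted from the message; the rest is an illustrative assumption.
$RobotPattern = 'Googlebot|Slurp|msnbot|BecomeBot|HTTrack';

// Hypothetical helper: true when the User-Agent string contains any of the
// alternatives in $robotPattern (case-sensitive substring match).
function isRobot(string $userAgent, string $robotPattern): bool
{
    // "!" is the regex delimiter, so the pattern itself must not contain "!".
    return preg_match('!' . $robotPattern . '!', $userAgent) === 1;
}

// The header may be missing (e.g. on the command line), so fall back to "".
$agent = $_SERVER['HTTP_USER_AGENT'] ?? '';
$visitorIsRobot = isRobot($agent, $RobotPattern);
```

Because the alternatives are plain substrings, adding another bot is
just a matter of appending "|Name" to the pattern.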

Pm



