PITS.00358

Summary: Track pages of interest
Created: 2005-03-05 13:35
Status: Suspended - awaiting discussion
Category: Cookbook
From: Radu, Joachim Durchholz
Assigned:
Priority: 4433
Version: 2
OS: N/A
See also: PITS.00291

Description
Fast access to recent changes *of interest*.

Problem
Wiki visitors often want to be notified of changes to specific pages or page groups.

Existing solutions
The PmWiki/MailPosts system is an approximation of such a notification system. It is too limited for our purposes because users cannot express interest in changes to only some specific pages.

Design choice points
Caveats
Considerations for proposals

Mail address verification
The big issue here is how to verify that it's indeed the owner of a mail address who requested a subscription or unsubscription. The simplest mechanism seems to be this:

1. PmWiki generates an "authentication ID" (actually a random number from a large domain). It remembers that ID together with the (un)subscription request information.
2. PmWiki sends a mail to the given mail address, reading roughly as follows:

    Dear user,
    somebody (probably you) requested that this email address be (not) notified whenever page SomePage is changed. To prevent malicious changes, we need you to confirm this request by clicking on the following link:
    http://wiki.tld/pmwiki.php?action=confirm&code=01946502399643654
    If you didn't ask for this change in subscription, please ignore this message: obviously somebody mistyped the email address or tried to play a dumb joke on you. If you're being sent such unwanted mail more often than you'd like, please contact abuse@wiki.tld, and we'll stop our software from ever bothering you again. Please accept our apologies.
    Yours sincerely,
    wiki.tld

3. User clicks on the link.
4. PmWiki gets the

Cutting down on mail address verification
In its simplest form, whenever a visitor requests a subscription change, he'll get a confirmation email. This is tedious for the users. For a wiki with password-based authentication, there's no way around it. (Even people who know an authentication password may be mischievous.) For a wiki with account-based authentication, it's possible to store each user's mail address with the account data, and not request mail verification when a user subscribes to a page.

More on subscription handling
Since the solution should be scalable, we could implement a script that PmWiki would run once a day on the first 'read' call of the day (if access to the site crontab is not available).
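A minimal sketch of the once-a-day job, assuming page timestamps are available as a plain mapping from page name to modification time. It covers the first-read-of-the-day trigger and the timestamp comparison and subscriber lookup that the following paragraph describes. The stamp file name and all function names are hypothetical, not part of PmWiki, and the subscription line format is only one guess at the "group.page1:user1:email1,user2:email2" idea, which the proposal leaves TBD.

```python
import time

# Hypothetical stamp file; PmWiki has no such file, the name is illustrative.
STAMP_FILE = "wiki.d/.notify-lastrun"


def due_today(stamp_path, now):
    """Return True only for the first call on a new local calendar day.

    Not race-free: two requests arriving together at the start of a day could
    both see the old stamp. A real implementation would need a file lock.
    """
    today = time.strftime("%Y-%m-%d", time.localtime(now))
    try:
        with open(stamp_path) as f:
            if f.read().strip() == today:
                return False
    except FileNotFoundError:
        pass  # first run ever: no stamp yet
    with open(stamp_path, "w") as f:
        f.write(today)
    return True


def changed_pages(yesterday, today):
    """Pages that are new or whose modification timestamp changed since the last run.

    Both arguments map page name -> modification timestamp. A page missing from
    `yesterday` counts as changed, so new pages are reported on the day they appear.
    """
    return {page for page, ts in today.items() if yesterday.get(page) != ts}


def parse_subscriptions(lines):
    """Parse lines like 'Group.Page:user1:email1,user2:email2' into
    {page: [(user, email), ...]}. The proposal leaves the format TBD."""
    subs = {}
    for line in lines:
        page, _, rest = line.strip().partition(":")
        entries = []
        for item in rest.split(","):
            user, _, email = item.partition(":")
            if email:  # skip malformed entries without an address
                entries.append((user, email))
        subs[page] = entries
    return subs


def notifications(changed, subs):
    """Map each email address to the sorted list of changed pages it watches."""
    out = {}
    for page in sorted(changed):
        for _user, email in subs.get(page, []):
            out.setdefault(email, []).append(page)
    return out
```

For example, with yesterday's timestamps {"Main.Home": 100, "Main.Foo": 200} and today's {"Main.Home": 100, "Main.Foo": 250, "Main.New": 300}, changed_pages returns {"Main.Foo", "Main.New"}, and only addresses subscribed to one of those two pages receive a mail. Adding new pages to every watchlist is not modelled here.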
That script would compare yesterday's list of all pages on the site and their modification timestamps with the current timestamps, caching the current timestamps, and would use a list such as group.page1:user1:email1,user2:email2,user3:email3 (details of the format TBD) to determine who gets an email and/or which watchlists need to change. Then it would write the new list of pages and modification timestamps.

To simplify data entry, the email address could be stored in a cookie on the client side. The page:email list would be maintained with a click-to-toggle-registration mechanism, as described below under Proposals. That file would be invisible to browsers, since access to the wiki.d directory is denied by .htaccess.

New pages would be reported in the email only on the day they are created, and they would be added to all watchlists.

Proposals

Radu's version (modified March 18, 2005)
On the page template, add a (:track:) directive that checks
When the page is saved from an edit (this is the ugly part: each save from an edit could potentially process many pages; gotta think more on that),
track():
Dealing with the edit problem:
Pm's proposal
Well, I still think that having the "save page" action update authors' watchlists is the hard way to go about it. Here's my algorithm:
(:pagelist trail={$FullName} order=-date:)
to the top.)
That's it. Much simpler, because it doesn't involve adding a page attribute or updating lots of watchlists whenever a page is modified; one only modifies watchlists in response to add-page-to-watchlist or remove-page-from-watchlist actions.

Radu: I'd still like to see the pages either demoted or completely off the watchlist if the guy has seen the tracked page since its last modification time. According to your version, users will see the same list no matter how many times they've seen each page. It sort of takes away from the idea of a ToCheck list. Maybe add a button that temporarily hides a bullet until the next modification of that page? Or one that adds a checked icon in front of the bullet... Aw. Any of this would have the same effect as my version, since on each update all tracked pages would need updating (but it eliminates the delete problem). Dunno... what do other people think? My point is that external memory helpers (like the reference list we're designing here) are supposed to actually help remembering (in this case "remember what still needs my attention now" rather than "what am I generally interested in"). *shrug*

Pm's analysis
The mechanism I propose is *definitely* faster and involves less overall work and stress on the server, even in the scenario you propose. The trick is to realize that the watchlists are only "generated" when the watchlist page is viewed. Let's consider your scenario with 20 authors watching 200 pages each.

When someone edits a watched page, how many pages are written or rewritten?
RL: Up to 21: the edited page itself, plus each of the watchlists.
Pm: Just one: the edited page itself. The watchlist pages don't need updating because they hold a simple list of references to the watched pages. The "sorting" of this list takes place when the watchlist page is viewed.
Result: Updating/editing is definitely less expensive in Pm's approach.

What's the cost of an update?
RL: Very high.
In order to maintain page consistency, PmWiki uses exclusion locks to prevent any other process from accessing the database while files are being updated. Thus the entire wiki is blocked for the duration of up to twenty page updates. Things become much worse if we attempt to maintain any sort of page history on the watchlist pages.
Pm: Since an update only writes one page, it costs the same as in the existing environment.

How expensive is it to view a watchlist?
RL: Very inexpensive: simply display the watchlist page as a normal wiki page.
Pm: Somewhat more expensive, in that the system has to scan the contents of the watchlist in order to sort the pagelist into the correct order. However:
1. This is known to be relatively inexpensive: the (:pagelist:) and (:searchresults:) algorithms are known to work extremely well when dealing with lists of 200 pages. Furthermore, it's not like doing a search, where we have to find the pages meeting given criteria; the list of pages is already known (it's on the watchlist).
2. This expense is an "on-demand" expense: it's only incurred when someone actually views a watchlist. If nobody views a watchlist, the expense is never incurred. Consider what happens if ten pages on the site are updated but nobody checks their watchlist. Under the other approach, we will have updated as many as 10 × 20 = 200 pages even though none of those updated pages were actually viewed.
Result: Overall efficiency depends on the viewing pattern of the watchlists; however, both approaches are known to have adequate performance.

What happens if someone deletes a page and then adds new contents in the same location?
RL: The watchlist subscriptions are lost, since they're stored as attributes of the watched page.
Pm: Since the watchlists are held in separate watchlist pages, deleting a watched page and then adding a new one doesn't affect the subscriptions.
Result: The Pm algorithm is more robust in light of page deletions.
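To make the write counts in this comparison concrete, here is a toy model of both designs. It is not PmWiki code: the class and method names are hypothetical, it counts page files written and ignores locking, and "view" only imitates the ordering that (:pagelist ... order=-date:) would produce.

```python
class PmWatchlists:
    """Pm's model: a watchlist is just a list of page names, sorted when viewed."""

    def __init__(self):
        self.mtime = {}       # page name -> last modification time
        self.watchlists = {}  # user -> set of watched page names
        self.writes = 0       # number of page files written

    def watch(self, user, page):
        self.watchlists.setdefault(user, set()).add(page)
        self.writes += 1      # only that user's watchlist page changes

    def edit(self, page, t):
        self.mtime[page] = t
        self.writes += 1      # only the edited page itself is written

    def delete(self, page):
        self.mtime.pop(page, None)  # subscriptions live elsewhere and survive

    def view(self, user):
        """Watched pages that exist, newest first (cf. order=-date), ties by name."""
        pages = [p for p in self.watchlists.get(user, ()) if p in self.mtime]
        return sorted(pages, key=lambda p: (-self.mtime[p], p))


class RLWatchlists:
    """RL's model: each page stores its watchers; an edit rewrites their watchlists."""

    def __init__(self):
        self.mtime = {}
        self.watchers = {}    # page name -> set of users (a page attribute)
        self.writes = 0

    def watch(self, user, page):
        self.watchers.setdefault(page, set()).add(user)
        self.writes += 1      # the watched page's attribute changes

    def edit(self, page, t):
        self.mtime[page] = t
        # The edited page plus one watchlist page per watcher.
        self.writes += 1 + len(self.watchers.get(page, ()))

    def delete(self, page):
        self.mtime.pop(page, None)
        self.watchers.pop(page, None)  # the attribute goes with the page


def simulate(n_users, n_pages):
    """Everyone watches every page; then each page is edited once.
    Returns (pm_writes, rl_writes) for the edits alone."""
    pm, rl = PmWatchlists(), RLWatchlists()
    for u in range(n_users):
        for p in range(n_pages):
            pm.watch(f"user{u}", f"Page{p}")
            rl.watch(f"user{u}", f"Page{p}")
    pm.writes = rl.writes = 0  # ignore subscription cost, count edits only
    for p in range(n_pages):
        pm.edit(f"Page{p}", 1.0)
        rl.edit(f"Page{p}", 1.0)
    return pm.writes, rl.writes
```

With 20 users and 10 edited pages, simulate(20, 10) returns (10, 210): the 210 is the ten edited pages plus the 10 × 20 = 200 watchlist rewrites from the example above. Deleting and re-creating a page keeps it in every PmWatchlists view, while RLWatchlists loses its watchers, matching the deletion result.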
What happens as the users/pages ratio gets larger?
RL: The cost of updates increases substantially, as each edit requires updating a larger number of pages. If a page is watched by 1000 users, then an edit to that page will require updating 1000 pages.
Pm: It's not a substantial additional cost. The number of watchlists increases, which may mean more on-demand sorting of pagelists, but this is not onerous. If it becomes expensive, it's relatively easy to build optimizations into the watchlist algorithms such that the scanning/sorting is only performed the first time a page is viewed, and the results of the scan/sort are stored in the page until another update occurs.

Overall I'm fairly certain the approach I describe will be much more efficient and effective, will scale better, and is overall more flexible than the one you've described. But I'm open to hearing about any holes in my analysis.

Comments

A watchlist feature would be a great addition for using PmWiki as an ISO-9000-compliant document management system. 00386 (Upload Versioning) would be a prerequisite for that, though. --Henning March 16, 2005, at 03:56 AM

Please do not replace the current MailPosts functionality. As an admin I use it for monitoring all the wiki sites (now complemented by getting posts when someone is blocked as well); my users do not get MailPosts (especially as they would have to bother me to do so, because I would have to add them to the config files). I never considered the MailPosts feature to be a user-notification system. A system where users can subscribe to pages is great, though. -- Crisses

Hm, I didn't notice the MailPosts limitations because my administrator didn't get Apache to send any mails at all :-( I'm definitely looking for a user-specific notification system that can be managed by authors and by readers. In other words, an author should be able to add users to the notification list, and the readers should be able to add themselves to notification lists, too.
(Now that I think about it, the notification system should extend to uploads as well as to wiki pages.) Well, these are my personal requirements; maybe I'm lucky and someone else needs something similar :-) --Henning May 12, 2005, at 12:27 PM

I am in the process of evaluating PmWiki for use as an internal private wiki. A watchlist feature is critical for us: users need to be able to specify their pages of interest. Pm's design above is not only more performant but also more elegant and refined, in line with many PmWiki features; it gets my vote. This issue has been floating around for some time. Is there an expectation of when this feature will be implemented? -- SteveF May 22, 2006

Also, on a related note, in general I think it is important to remember that not all PmWiki installations are large or public. I would prefer that PmWiki features address security issues optionally. The idea of making someone respond to a verification email every time they add themselves to a page watchlist would make a highly collaborative system very difficult to use. I would like to see verification be an optional setting for the admin. -- SteveF May 22, 2006

Finally, a nice feature beyond the basic watchlist would be the ability for a user to automate the watchlist process; for example, if the user had a way to say "always notify me of changes to any page I have created". -- SteveF May 22, 2006

Quite by accident I found PetkoYotov-watchlist, which seems to fit this feature request. --Henning July 03, 2006, at 08:50 AM