01197: allow simple, secure auto-register with email confirmation

Summary: allow simple, secure auto-register with email confirmation
Created: 2010-05-23 15:30
Status: Open
Category: Feature
Assigned:
Priority: 55555444
Version: 2.2
OS: n/a

Description: On the strength of this pmwiki-users conversation: PmWikiUsers:2010-May/thread.html#57472...

That thread in turn refers to this earlier conversation: PmWikiUsers:2009-January/thread.html#53368...

This entry is to pull together some thoughts on a desired feature-set as well as (perhaps) possible implementation paths for a self-registration feature for pmwiki.


Features

I'll put out my list and others can comment. If there's a better way to do this, either let me know or just do it yourself...

I reorganised the list as headings, to enable commenting on specific topics. I also tagged a few comments as Peter's. —Eemeli Aro
Thanks, Eemeli - that's a better organization. —Peter Bowers

Necessary: Collect/store email address in addition to username & password

Necessary: Allow user to initiate and follow through with the registration process

Necessary: Require email validation step to ensure valid email address

Necessary: Allow user to change password without admin intervention (when they know the previous password)

For wikis where security is important but whose users aren't security-conscious, the admin should be able to restrict what passwords the user can self-create - probably by configuring the rule with a regular expression. For example, the admin might require the password to be at least n characters, with at least one special character, one number, etc. RandyB May 24, 2010, at 11:29 AM

See the capability of validating fields in forms, below. Peter Bowers May 24, 2010, at 01:51 PM
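
A minimal sketch of the regex-based password rule suggested above. The `$PasswordRules` array and the `CheckPasswordRules()` function are hypothetical names used only for illustration; nothing like them exists in PmWiki or in any recipe mentioned here.

```php
<?php
// Hypothetical admin configuration: each key is a PCRE pattern the
// password must match, each value the message shown when it does not.
$PasswordRules = array(
  // Length is counted in bytes here (no /u modifier).
  '/^.{8,}$/s'     => 'Password must be at least 8 characters long.',
  '/\d/'           => 'Password must contain at least one digit.',
  '/[^A-Za-z0-9]/' => 'Password must contain at least one special character.',
);

// Return the messages for every rule the password fails;
// an empty array means the password is acceptable.
function CheckPasswordRules($pw, $rules) {
  $errors = array();
  foreach ($rules as $pattern => $message) {
    if (!preg_match($pattern, $pw)) $errors[] = $message;
  }
  return $errors;
}
```

Keeping one message per pattern lets a signup form tell the user which rule failed, rather than only that the password was rejected.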

Necessary: Allow user to recover password by sending a new one to their email address

Passwords sent unencrypted by email are visible to the world. Even if the password is meant to be temporary, some admins don't want to allow the possibility of users leaving it insecure. Better alternatives, in my opinion, are (a) to ask the user admin-defined security questions, or (b) to send a link that allows the user to reset the password, where the link expires when the password has been reset, or after three days - whichever comes first. - RandyB May 23, 2010, at 05:50 PM

I'd prefer avoiding security questions if at all possible. Really they're just setting up a secondary, easily-guessable password. Reset links sent by email are good, though, especially if or as the email address will be confirmed by the registration. —Eemeli Aro
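
The expiring reset link discussed above could be sketched roughly as follows. This is an illustration only: `MakeResetToken()` and `CheckResetToken()` are hypothetical names, where the record is stored is left open, and `random_bytes()` assumes PHP 7 or later (newer than the PHP versions current when this thread was written).

```php
<?php
// Hypothetical helper: create a one-time reset token.
// Only a hash of the token is stored, so the stored record alone
// cannot be used to build a working reset link.
function MakeResetToken($lifetime = 259200, $now = null) {  // 259200 s = 3 days
  if ($now === null) $now = time();
  $token = bin2hex(random_bytes(16));      // goes into the emailed link
  $record = array(
    'hash'    => hash('sha256', $token),   // kept on the server
    'expires' => $now + $lifetime,
  );
  return array($token, $record);
}

// Check a token taken from a link against the stored record.
function CheckResetToken($token, $record, $now = null) {
  if ($now === null) $now = time();
  if (!$record || $now > $record['expires']) return false;
  // hash_equals() compares in constant time.
  return hash_equals($record['hash'], hash('sha256', $token));
}
```

To get the "expires when the password has been reset" behaviour, the caller would delete the stored record as soon as a reset succeeds, so that a missing record makes `CheckResetToken()` return false.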

Necessary: Allow user to change email address without admin intervention

Necessary: Simple installation & configuration

Desirable: Allow form-based user editing for admins

Desirable: Allow customizable fields in terms of what information is being collected

One admin wants only email while another wants full name and telephone number --Peter Bowers

Desirable: Allow customizable validation on the data according to admin's need

One site using AuthUserSignup needed email addresses to be validated from a given domain, some admins might have certain required and other optional fields, etc. --Peter Bowers

Desirable: Preferably built-for-purpose rather than built on top of other recipes

The one exception I'd raise is EditAttributes, as I wrote it in part to enable exactly this functionality (see its Notes section). If required, I see no problem with copying relevant parts of the code from it, though. —Eemeli Aro

My main thought was that a recipe which requires an admin to install several recipes as foundations before getting the desired functionality makes the installation hassle-full instead of hassle-free. However, the fact that I've listed it on my list of desirables doesn't mean it has to be there in the final analysis. Certainly if you packaged EditAttributes in the same zip, etc., to ease installation, then that wouldn't be too difficult. If you want to see what I'm trying to avoid, check the installation instructions for AuthUserSignup (my own recipe) and reference Hans's recent difficulties -- it's just too complicated with 4-5 different recipes needing to be separately installed...

Desirable: Expiration time periods for passwords / accounts


Implementation

There has been a lot of discussion about the much larger question as to where user attributes should be stored: should they be stored in (a) SiteAdmin.AuthUser (one central location) or (b) Profiles.X (invisible, protected page attributes on pages in the profiles group). I am openly biased towards the centralized solution (cards on the table), but I think PM's email re the need to abstract the form-based front-end from any back-end storage & authentication perhaps obviates the need to solve this problem up front. Here's what I mean: right now we have a working, robust system based on SiteAdmin.AuthUser. At some point in the future someone may want to set up an SQLite system or something based on profiles or whatever. If we've done a careful design we should be able to use the same front-end, simply accessing via a newly instantiated object.

And I'll take it one step further... While putting the whole authentication module into an OO class(es) (similar to pagestores) would be nice, it doesn't have to be all-or-nothing. A simple, recipe-based, built-for-purpose set of forms with some clear demarcation between data-access/modify-functionality (or classes) and UI-functionality (or classes) will be a big start down the road if PmWiki core someday decides to take authentication and split it off into a class as it has done for page store functionality.

The only *needed* core change (to get started!) is in scripts/authuser.php, where the regex "/^\\s*([@\\w][^\\s:]*):(.*)/m" is hardcoded. If that were parameterized with an SDV variable, a recipe could be constructed which could presumably be simple and secure, and give a reasonable level of functionality.
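
A rough sketch of what such a parameterization might look like. The variable name `$AuthUserPat` and the function `ParseAuthUserText()` are assumptions made for this example, not existing PmWiki names, and the parsing loop is a simplified stand-in for what scripts/authuser.php does. `SDV()` is redefined here only so the snippet runs outside PmWiki.

```php
<?php
// Stand-in for PmWiki's SDV() so this sketch runs on its own:
// give a variable a default value only if it is not set yet.
if (!function_exists('SDV')) {
  function SDV(&$var, $val) { if (!isset($var)) $var = $val; }
}

// Hypothetical variable name; a recipe or config.php could set it
// earlier to change how AuthUser lines are recognised.
SDV($AuthUserPat, "/^\\s*([@\\w][^\\s:]*):(.*)/m");

// Collect "name => rest of line" pairs from AuthUser-style text.
function ParseAuthUserText($text, $pat) {
  $auth = array();
  if (preg_match_all($pat, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) $auth[$m[1]] = trim($m[2]);
  }
  return $auth;
}

$text = "alice: HASH1 @writers\n@admins: alice, bob\n# not an entry\n";
$auth = ParseAuthUserText($text, $AuthUserPat);
```

With the default pattern, `$auth` holds an entry for `alice` and one for `@admins`; the `#` line is skipped because it does not start with `@` or a word character.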

On my personal assumption (which I recognize is not held by many others within the community) that SiteAdmin.AuthUser is the best place to hold the information, I refer to PM's suggestion of a possible slight modification to AuthUser data format as something to consider. Note that PM has carefully not expressed a clear preference for either (a) centralized or (b) profiles-based storage -- I'm not quoting him to claim his support for the AuthUser side of things but merely to show a possible concrete way the data might appear should things end up going that way.

May I hazard a guess, based on PmWiki Philosophy "5. Be easy to install, configure, and maintain" that even if Profiles-based security is implemented in the future probably AuthUser will continue to be supported? Maybe I'm wrong, but I see an awful lot of care being taken (both by PM and Petko) to make sure that existing sites not be broken by upgrades unless absolutely necessary. So if the Profiles-based authentication were implemented I'm guessing that it would be implemented as another alternative rather than breaking all the existing sites that are using AuthUser... So proceeding on the assumption that AuthUser will continue to exist in the future is probably not too bad of an assumption and it allows us to proceed meaningfully rather than being mired in that particular discussion...

So, what other features are needed? What other features are desired? Please comment. Do you see this as an important feature? If so, vote above...

--Peter Bowers


If you are going to force authorized users to have profiles pages, in order to store information there, there needs to be a way to keep selected profiles pages hidden. Login accounts don't necessarily correspond one-to-one to users: some may be only for testing purposes, while others may be meant to be temporary.

On the other hand, if you are going to centralize the information (as AuthUser currently does, or in a database implementation), it would be very useful to have an option to automatically generate a Profiles page from a template as part of the registration process.

- RandyB May 23, 2010, at 05:50 PM


I'm very much in favor of using profile page attributes for keeping this information, rather than complicating the SiteAdmin.AuthUser page further. Page attributes are exactly the right thing to use for this, instead of a single concatenated line of text for each user, which it would have to be otherwise (unless the authuser format is changed radically).

I'm working on the back-end code for user registration based on earlier stuff I've done, but can't commit to getting it finished in any specific time frame.

There's no need for changes in the authuser.php regex Peter refers to above, nor is there a need to force a decision between user accounts being listed in SiteAdmin.AuthUser or the Profiles group: you can have both. Here's a bit of code that can read user account info from pages in the Profiles group:

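// Register the handler: the values of any "userprofilegroup:" lines in
// SiteAdmin.AuthUser are passed to AuthUserProfiles() as $pwlist.
SDV($AuthUserFunctions['userprofilegroup'], 'AuthUserProfiles');

// Return true if $id/$pw match the userid and passwdhash attributes
// of the page named $id in one of the groups listed in $pwlist.
function AuthUserProfiles($pagename, $id, $pw, $pwlist) {
  foreach ((array)$pwlist as $pgroup) {
    $pn = MakePageName($pagename, "$pgroup.$id");
    $page = ReadPage($pn, READPAGE_CURRENT);
    if ($page
        && !empty($page['userid'])
        && ($id == $page['userid'])
        && !empty($page['passwdhash'])
        // crypt-style check: hashing $pw with the stored hash as salt
        // must reproduce the stored hash
        && (_crypt($pw, $page['passwdhash']) == $page['passwdhash'])
    ) return true;
  }
  return false;
}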

To use, you'll need to also have a line userprofilegroup: Profiles in SiteAdmin.AuthUser, and have the userid and passwdhash attributes on the profile page. The tricky part, indeed, is how to get those attributes there and maintain them.

Some changes to authuser.php are required to allow for defining a user's group(s) on their profile page, as the $AuthUserFunctions can't manipulate the $auth array directly at the moment.

Eemeli Aro

If we can have both without changing the authuser.php regex, how does this suggested format get read in authuser.php?
username:password email="somebody@example.org" fullname="John Smith"
Or is the assumption that any self-registered users will store their info in Profiles? I'm also playing around with an approach to this which does not make that assumption (I similarly cannot make time-based commitments), and I believe having a configurable regex is necessary for that...
True, that format will require a change in authuser.php, but not to that regex. Instead, you'll need to modify the contents of the following foreach (lines 60-64), replacing the preg_match_all with a call to ParseArgs. You can still make it work with non-named parameters (i.e. the current hash(es) and @groups) as well as the LDAP special case, but it'll be a bit slower than the current code.
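
To make the extended line format concrete, here is a small sketch that splits the part of a line after "username:" into positional values (hashes, @groups) and named key="value" fields. `ParseUserFields()` is a hypothetical, deliberately simplified stand-in written for this example; it is not PmWiki's ParseArgs and does not reproduce all of its behaviour.

```php
<?php
// Simplified stand-in for the kind of parsing ParseArgs would do.
// Positional values are collected under the '' key; key="value"
// pairs become named entries. Unquoted foo=bar stays positional.
function ParseUserFields($rest) {
  $fields = array('' => array());
  preg_match_all('/(\w+)="([^"]*)"|(\S+)/', $rest, $matches, PREG_SET_ORDER);
  foreach ($matches as $m) {
    if ($m[1] !== '') {
      // named field such as email="somebody@example.org"
      $fields[$m[1]] = isset($m[2]) ? $m[2] : '';
    } else {
      // positional value such as a password hash or an @group
      $fields[''][] = $m[3];
    }
  }
  return $fields;
}

$fields = ParseUserFields(
  'HASH @writers email="somebody@example.org" fullname="John Smith"');
```

Here `$fields['']` holds the hash and the group, while `$fields['email']` and `$fields['fullname']` hold the extra data that a signup front-end would read and write.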

However, this is really a fundamental question that needs to be answered before any implementations are finalised. We should not both extend the default authuser format to accept parameters and start using profile pages for keeping the same information. In the end, I'll be content either way, but I do believe profile page attributes provide a much more robust and extensible system. Keeping all the info on one page will cause problems later, especially as the amount of metadata increases.

For a practical example, I maintain a system where I'm using a custom EditAttributes-based solution to present a custom edit form for profile pages, with checkboxes and whatnot to edit a user's real name, phone number, email address, room number, job title as well as a dozen other pieces of data. Right now, all that is kept as page attributes and page text variables in the background. User control is separate. Now, if some of that data is moved to SiteAdmin.AuthUser the result will be a complex mess. Moving all of it there will give me a reeeaaally long page that'll be a pain to edit. On the other hand, if I can move the user control to the profile page as well, the system becomes much simpler. —Eemeli Aro
Actually if you change that regex to ignore anything after the password:
"^\\s*([@\\w][^\\s:]*):\s*((?:[^@]\S*?)?\s*(?:@[^@,\s]*(?:,\s*@[^@,\s]*)*)?)"
(for instance - although that is untested and may need tweaking) then anything after the password is ignored and the authorization side of things is still contained entirely within authuser.php with no further changes and without regard for whether a given user was added by an admin or by some sort of self-registration. The only other step is the front-end to populate/modify the other fields (which fields will be ignored and so I don't believe will have any impact or at worst a negligible impact on performance).
But that'll break the current ability of defining user groups on the same line, as in "user: HASH @group1 @group2".
Yes, you're right - it needed some more tweaking, as expected. I've "tweaked" above, now, to allow for this. (Incidentally, is this documented anywhere? I see that it works in the code, but I never saw the HASH with the @group in the documentation anywhere... Is there anyplace which gives kind of a "reference" to AuthUser syntax? AuthUser tends more towards examples, rather than reference, it seems...) Obviously we could simplify the regex simply by putting a character sequence that cannot appear in a hash ("==", for instance?) after the current specification and before any further data on the line.
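
The separator idea in the previous comment can be sketched as below. The pattern is an assumption written for this example (it is not the regex quoted earlier in this thread), and it relies on "==" never occurring in the hash or group part of a line.

```php
<?php
// Same name capture as the stock regex, but the second group stops at
// an optional "==" marker; anything after the marker is ignored.
$pat = '/^\s*([@\w][^\s:]*):(.*?)(?:==.*)?$/m';

$text = "alice: HASH1 @writers == email=\"alice@example.org\"\n"
      . "bob: HASH2\n";

preg_match_all($pat, $text, $matches, PREG_SET_ORDER);
$auth = array();
foreach ($matches as $m) $auth[$m[1]] = trim($m[2]);
```

Lines without the marker are read exactly as before, so existing SiteAdmin.AuthUser pages would not need to change.
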
That form of group definition is shown as the second example of Organizing accounts into groups on PmWiki.AuthUser, except that even there it's only implicit that you can also have a password hash on the same line. All I'm working from is the information there and the code. There isn't really much more syntax to AuthUser than that, mind you.

Also, your regex really is getting rather unwieldy, especially as ParseArgs can read the current format as well as your proposed extension. —Eemeli Aro
Yes, there will probably be simplifications I can build into the regex -- just haven't gone there yet. A few variables or even moving into /x with comments would clarify things. My point was more of a proof-of-concept that it could be done within a recipe with almost no core changes. Running ParseArgs() over it definitely doesn't fit in that category (besides the potential performance issues which are avoided via a regex and the efficiency of the PCRE engine). Being able to keep the same authentication engine that has been used for years is a huge advantage in my book, and more than makes up for the time required for some careful regex coding...
I think we're coming down to one of those differences of opinion where PmWiki shines. While I can appreciate the way you've got your site set up and I can see advantages (and I thought it was very elegant & laudable [bravo!] how you added profile authorization with virtually no core changes and still allowed SiteAdmin.AuthUser to be functional!), my preference as an administrator is still going to be to maintain it all in SiteAdmin.AuthUser. On the other hand, chances are you're pretty strongly persuaded that the Profiles solution has huge advantages. PmWiki shines in this type of different-admin-prefers-different-functionality situation because it keeps in the core what is necessary to allow recipes to do their thing, and then administrators choose their recipes based on their preferred way of doing things. So my vote at this point is not to try to make a decision where we end up with either a small majority (with a potentially large minority dissatisfied) or an interminable discussion with no forward progress (as demonstrated in the extended period of time between the discussions referenced at the top of this page). I think the core should be modified to facilitate the change you need (adding the parameter to the function) and also to parameterize the regex with a global variable for slightly increased flexibility to allow my preferred solution.
Then I'd like to engage in a little (potentially off-list) correspondence with you, Eemeli (and anyone else with a vested interest in the implementation?) to see if we can come up with a common interface or API or whatever, whether object-oriented or not, so that future efforts in this area will be able to build on a relatively solid foundation...
I think that will give us the best of both worlds. Currently if an admin has a preference for a given skin they install and configure it. So if an admin has a preference for a certain type of authorization control, why not leave the same flexibility? If we can get the necessary core changes to support both then it's just a question of admins choosing a recipe, which is a very pmwiki-way of doing things... What do you think? Peter Bowers
Ummm. So I kind of went and did a partial implementation of how I think the code should work at Cookbook.AuthUserProfiles. As the notes there say, it's nowhere near a complete recipe, but at least it's got partial functionality. As the name might suggest, it's primarily meant for the profile page attribute approach, but with a bit of work you should be able to generalise it to any storage solution — the data is read from or written to files in only three discrete places, the rest is framework, logic and input validation (argh, so far got 14 different status messages...).

The code is standalone so far, as it's only reading data or writing new data instead of modifying previous values. That's where EditAttributes will probably come into play, as I figure it'll be easiest for the user to add a few fields to the profile pages' edit form.

I'd be happy to talk more on this here or elsewhere; I'd suggest the pmwiki-devel list as the most appropriate. This page is getting a bit too full and non-linear for easy access. Also, this way we get to hold parts of the discussion at all reasonable places for talking about PmWiki development (both mailing lists, PITS, and Cookbook). —Eemeli Aro

I should add that all this can possibly be done today with a recipe without modifying the core (but could be a candidate). An "edit password/email form" could be done in ?action=attr (just add more $PageAttributes). Use the profile page name as an authid, and possibly the passwdattr hash as the user password field (a page with existing "attr" field cannot be deleted). A custom validation function could replace and call HandlePostAttr(). I think user groups are easier to manage in a central place like SiteAdmin.AuthUser, .htgroup or config.php, so I wouldn't bother changing that. --Petko May 24, 2010, at 02:15 AM

I don't think overloading ?action=attr is a good idea. Even though it's presented in the code as a very generic form, at least for now its only use is to control page or group access permissions, which is rather different from changing your email address or even your password. In other words, handling page access rights requires that you have some understanding of the wiki structure and e.g. user groups, while signing up as a new user is one of the first things you might do on a site; having them happen on the same page or even with the same action seems like a bad idea to me.

Re: user groups, I would really like the possibility of handling them in the same place as user accounts. When you get beyond just a few users and a few groups, it quickly becomes a chore to edit access rights if you have to do something to a profile page as well as the AuthUser page. I'm not saying we should take away functionality, but extend the current ability of saying either "@group: user1, user2" or "user1: HASH @group" to covering users defined on profile pages. The minimal change to the core that should allow this is the following: —Eemeli Aro

Index: scripts/authuser.php
===================================================================
--- scripts/authuser.php	(revision 2556)
+++ scripts/authuser.php	(working copy)
@@ -68,7 +68,7 @@
   if (func_num_args()==2) $authid = $id;
   else
     foreach($AuthUserFunctions as $k => $fn) 
-      if (@$auth[$k] && $fn($pagename, $id, $pw, $auth[$k])) 
+      if (@$auth[$k] && $fn($pagename, $id, $pw, $auth[$k], $authlist)) 
         { $authid = $id; break; }

   if (!$authid) { $GLOBALS['InvalidLogin'] = 1; return; }
Fair enough -- I can be convinced. --Petko May 24, 2010, at 10:44 AM
Added to subversion for 2.2.17. --Petko June 07, 2010, at 03:47 PM