This document describes the proposed and partially constructed structure of the new Subversion document and code repositories at NCEAS. These svn repositories would replace the CVS repositories that we now support. Current CVS repositories would be migrated to Subversion using the cvs2svn utility.
Current Subversion status: Production
Major remaining TODO items:
Authentication is handled using LDAP. Accounts are drawn only from the 'ou=Account,dc=ecoinformatics,dc=org' tree, which means accounts are shared with CVS. We should eventually determine how best to move to the regular ecoinformatics.org account tree, so that users would not need new accounts created and would only need to be added to the access control files for a particular module.
Authorization is done in two stages. First, some modules require authorization by Apache when accessing the Subversion repository. This is configured in the 'Location' directive of the Apache configuration file for each virtual host. This access needs to be protected by SSL, which is in progress. The downside of this type of configuration is that the web server configuration must be reloaded after every change, which is not something we want to do frequently, because errors can cause the web server to stop servicing requests.
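As a rough illustration of this first stage, a 'Location' block for one virtual host might look like the sketch below. It assumes Apache 2.2 or later with mod_dav_svn, mod_authnz_ldap, and mod_ssl loaded. The URL path '/code', the LDAP server name 'ldap.example.org', the 'uid' search attribute, and the realm name are assumptions for illustration, not values taken from the production configuration.

```apache
# Sketch only: URL path, LDAP host, search attribute, and realm name are assumptions.
<Location /code>
    DAV svn
    # Serve every repository found directly under this directory
    SVNParentPath /var/code/nceas

    # Refuse plain HTTP so that passwords are never sent in the clear
    SSLRequireSSL

    # Authenticate users against the shared LDAP account tree
    AuthType Basic
    AuthName "NCEAS Subversion"
    AuthBasicProvider ldap
    AuthLDAPURL "ldap://ldap.example.org/ou=Account,dc=ecoinformatics,dc=org?uid"
    Require valid-user
</Location>
```

Any change to a block like this only takes effect after Apache reloads its configuration, which is the drawback noted above.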
Second, Subversion provides mod_authz_svn, an Apache module that provides additional, path-specific authorization rules. This allows us to choose which repositories and directories within repositories are accessible to particular users and groups. The current configuration uses this feature extensively, as described in the next section. Configuration for these directives is in three files: /etc/httpd/conf/svn-nceas.authz, /etc/httpd/conf/svn-ecoinfo.authz, and /etc/httpd/conf/svn-kepler.authz. Further restrictions should generally be made by placing rules in the appropriate one of these files. Note that new repositories need to be added to the appropriate file, and until that is done, the new module may be anonymously readable and writable.
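For orientation, path-based rules in one of these authz files could look roughly like the following sketch. The section syntax ('[repository:/path]', '@group', and '*' for everyone) is the standard Subversion authz format. The group members 'alice' and 'bob' are placeholder names, and the rules are an illustration of the access described in the next section rather than a copy of the live file.

```ini
# Sketch only: group members are placeholder names.
[groups]
metacat = alice, bob

# Whole tree of the "metacat" repository:
# group members may read and write, everyone else may only read.
[metacat:/]
@metacat = rw
* = r

# To deny anonymous access to a repository, leave the right-hand side empty:
#   [some-repository:/]
#   * =
```

A file like this is referenced from the Apache configuration with the AuthzSVNAccessFile directive, and mod_authz_svn rereads it when it changes, so edits here do not require a web server reload.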
Some complications occur with the NCEAS projects hierarchy, which is meant to support the working groups, postdocs, sabbatical fellows, and visitors to NCEAS. Because these are far more dynamic, accounts for accessing them need to be added to and removed from LDAP dynamically by the AdminDB. A perl script that does most of this already exists, but it needs to be hooked up to the AdminDB so that changes there trigger changes in LDAP. In addition, access control rules for the 'nceasprojects' hierarchy will be controlled exclusively through LDAP, which allows the AdminDB to change users and group membership programmatically. Consequently, each WG, PD, SB, or VS repository will need to be set up as its own Location within the code.nceas.ucsb.edu virtual host, so that each location can have its own access control directives applied. Scripts will need to be written to accomplish this integration when triggered by changes in the AdminDB. Presumably, these would be tied to other changes that allow the AdminDB to create collaborative areas, etc.
Separate virtual hosts exist for the NCEAS, Ecoinformatics, and Kepler groups, and more can be added for new groups as needed, because the groups will likely have different authentication needs. Currently the vhosts are set up as code.nceas.ucsb.edu, code.ecoinformatics.org, and code.kepler-project.org.
Proposed structure for each of the virtual hosts is described here.
Apache access: on, requires LDAP “cn=nceas-staff” group
AuthzSVN access: used for controlling internal access
-- NCEAS Administrative Database, including:
     -- original code for version 1
     -- revised code for version 2 from contractors
     -- new drupal modules
     -- perl and other support code, such as NCEAS::AdminDB module
     -- ...
   Location: /var/code/nceas/admindb
   Access: r/w for authz nceas-staff directory
staff -- working directories for computing and development staff so that
 |       they need not maintain separate svn repositories
 |       Location: /var/code/nceas/staff
 |       Access: r/w for cn=nceas-staff,dc=ecoinformatics,dc=org
 |
 |-- jones     -- files for jones
 |-- walbridge -- files for walbridge
 |-- regetz    -- files for regetz
 |-- schild    -- files for schild
 |-- ...
website -- Text copy of code for the NCEAS web site, including
           drupal modules, css styles and other template info
           (content is not stored here)
           Location: /var/code/nceas/website
           Access: r/w for authz group nceas-staff
project -- working directories for NCEAS projects, including Working
 |         Group projects, postdoc and sabbatical projects, etc
 |         Each of these is a separate svn repository that is
 |         controlled by its own access directives in the apache config
 |         that point at the appropriate LDAP users and groups.
 |
 |-- wg -- working group repositories (accessible only by members of the
 |    |    group, LDAP group syncs with the admindb automatically)
 |    |
 |    |-- inference -- inference working group
 |    |                Location: /var/code/nceasprojects/wg/inference
 |    |-- recovery  -- recovery working group
 |    |-- ...
 |
 |-- pd -- postdoc repositories (accessible by pd only, via LDAP acct
 |    |    which is created by the admindb automatically)
 |    |
 |    |-- broitman -- Location: /var/code/nceasprojects/pd/broitman
 |    |-- rueda
 |    |-- ...
 |
 |-- sb -- sabbatical repositories (accessible by sb only, via LDAP acct
 |    |    which is created by the admindb automatically)
 |    |
 |    |-- martinez -- Location: /var/code/nceasprojects/sb/martinez
 |    |-- vieglais
 |    |-- ...
 |
 |-- vs -- visitor repositories (accessible by vs only, via LDAP acct
      |    which is created by the admindb automatically)
      |
      |-- alroy -- Location: /var/code/nceasprojects/vs/alroy
      |-- ...
Apache access: on, allows anonymous and valid-user access
AuthzSVN access: generally allows anon reads, requires authz group for writing
metacat – KNB Metacat
Location: /var/code/ecoinfo/metacat
Access: r/w for authz group metacat
        r for anonymous
morpho – KNB Morpho
Location: /var/code/ecoinfo/morpho
Access: r/w for authz group morpho
        r for anonymous
eml – Ecological Metadata Language
Location: /var/code/ecoinfo/eml
Access: r/w for authz group eml
        r for anonymous
Apache access: on, allows anonymous access, write requires cn=kepler group
AuthzSVN access: on, generally allows anon read, requires authz group for writing
kepler – Kepler development tree
Location: /var/code/kepler/kepler
Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
        requires authz group kepler for writing
        r for anonymous
kepler-docs – Kepler documentation
Location: /var/code/kepler/kepler-docs
Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
        requires authz group kepler for writing
        r for anonymous
kepler-core-pi – Kepler/CORE project
Location: /var/code/kepler/kepler-core-pi
Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
        requires authz group kepler-core-pi for writing
        no anonymous access
Derik Barsegian used the cvs2svn utility to convert the reap project to SVN from CVS. The utility has many options, but we only used a few to create a default SVN repository with the history from CVS intact. In this example, the top-level reap module is converted to a top-level SVN module, but in practice you can also move it to a subdirectory of a new SVN module.
One of the main challenges in repository migration is the handling of binary files. In many of our existing CVS repositories, the binary flags are not set correctly, and setting them within CVS still leaves files as 'application/octet-stream'. These instructions use two approaches to solve this problem: rely on a mimetype registry, and explicitly set mimetypes for other unknown types. A few support scripts for batch conversion are available at https://code.nceas.ucsb.edu/code/staff/walbridge/svn-migration/
First, check out the repository (in this example, 'expertise') and look at the file types:
$ export CVSROOT=:ext:email@example.com:/cvsnceas
$ cvs co expertise && cd expertise
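For a quick first look at which file types a checkout contains, a small shell function along these lines can count files per extension. This is only a sketch and is not the 'typer.sh' script from the support repository. The function name 'list_extensions' is made up for this example, and it assumes a POSIX shell with find, sed, sort, and uniq available.

```shell
#!/bin/sh
# Hypothetical helper (not typer.sh): count files per extension in a
# CVS working copy, skipping the CVS/ metadata directories.
list_extensions() {
    dir="${1:-.}"
    # Prune CVS/ directories, print every other regular file,
    # keep only the text after the last dot in the file name,
    # then count occurrences and sort by frequency.
    # Files without a dot in their name are ignored; dotfiles such as
    # .cvsignore are reported with their name as the "extension".
    find "$dir" -type d -name CVS -prune -o -type f -print \
        | sed -n 's/.*\.\([^./]*\)$/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage, from inside the checkout:
#   list_extensions .
```

Extensions that show up here but have no entry in /etc/mime.types are the ones most likely to need an explicit mapping.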
Generate a list of file extensions, using 'typer.sh' from the above repository (requires `my_autoprops' in the same folder):
This will output a list of the files found in the CVS repository, and whether we know what to do with the MIME type in question. The only types which need to be handled:
For both of these cases, determine the appropriate MIME type and add an entry to `my_autoprops'. Make sure that each file type which lacks an entry AND is binary has a mapping. Text files (e.g. .csv, .log) can safely be ignored, but mappings must be defined for other binary files. Add the unknown file types (default to application/octet-stream if unable to determine) to a new file, called here my_autoprops.
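As an illustration of the format, cvs2svn's --auto-props option reads a file in Subversion's config format and applies the '[auto-props]' section based on file names. A my_autoprops file could look roughly like the sketch below. The specific extensions and MIME types listed are examples chosen for illustration, not the mappings actually used for the NCEAS repositories.

```ini
# Sketch only: the extensions and MIME types are illustrative examples.
[auto-props]
*.png = svn:mime-type=image/png
*.pdf = svn:mime-type=application/pdf
# Unknown binary type: fall back to a generic binary MIME type
*.mdb = svn:mime-type=application/octet-stream
```

Each line maps a file name pattern to one or more properties, and several properties for the same pattern can be separated with semicolons.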
Once you've checked this list and ensured that all file types are accounted for, you can continue with the cvs2svn conversion:
$ cvs2svn --eol-from-mime-type --auto-props=my_autoprops --mime-types=/etc/mime.types --fallback-encoding=utf_8 --tmpdir=/tmp/walbridge --keywords-off --svnrepos ./expertise /var/cvsnceas/expertise
Check the resulting SVN repository by checking it out, and looking for the mime type attribute:
$ svn co file://$PWD/expertise/ expertise-checkout
$ svn propget svn:mime-type -R expertise-checkout
If this list looks good, you're all set.
Use this approach when you have a CVS repository that you want to import into a subdirectory of an existing SVN repository. cvs2svn won't do this directly. You first have to create an SVN dumpfile, then use svnadmin to load the dumpfile into the repository. Here are the commands: