SVN configuration
From NCEAS Knowledge Base
This document describes the proposed and partially constructed structure of the new Subversion document and code repositories at NCEAS. These svn repositories would replace the CVS repositories that we now support. Current CVS repositories would be migrated to Subversion using the cvs2svn utility.
Current Subversion status: Testing
Major remaining TODO items:
- Migrate existing CVS modules to SVN
- Determine whether mod_authz_svn can use LDAP groups (this appears to be a major missing feature)
- Determine the location and setup for the NCEAS projects repositories
- Link the AdminDB to create repositories for working groups automatically
Authentication and authorization
Authentication is handled using LDAP. Accounts are drawn only from the 'ou=Account,dc=ecoinformatics,dc=org' tree, which means accounts are shared with CVS. We should eventually determine how best to move to the regular ecoinformatics.org account tree, so that existing users need not have new accounts created but only need to be added to the access control files for a particular module.
Authorization is done in two stages. First, some modules require authorization by Apache when the Subversion repository is accessed. This is configured in the 'Location' directive of the Apache configuration file for each virtual host. This access still needs to be protected by SSL; that work is in progress. The downside of this type of configuration is that the webserver configuration must be reloaded after every change, which we do not want to do frequently, because a configuration error can stop the webserver from servicing requests.
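Because every change to the Apache-level rules means a reload, one way to reduce that risk is to validate the configuration before reloading. The following is a minimal POSIX sh sketch, assuming apachectl is on the PATH of the svn host; the function name safe_reload and the APACHECTL override variable are hypothetical.

```shell
# Reload Apache only if the edited configuration passes a syntax check.
# APACHECTL may point at another apachectl binary (hypothetical override).
safe_reload() {
    ctl=${APACHECTL:-apachectl}
    if "$ctl" configtest; then
        # graceful: open connections are allowed to finish before the reload
        "$ctl" graceful
    else
        echo "configtest failed; not reloading Apache" >&2
        return 1
    fi
}
```

After editing a Location block, run safe_reload as root. Using apachectl graceful rather than a hard restart lets open connections, such as a long checkout, finish instead of being dropped.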
Second, Subversion provides mod_authz_svn, an Apache module that applies additional, path-specific authorization rules. This allows us to choose which repositories, and which directories within repositories, are accessible to particular users and groups. The current configuration uses this feature extensively, as described in the next section. The rules are kept in three files: /etc/httpd/conf/svn-nceas.authz, /etc/httpd/conf/svn-ecoinfo.authz, and /etc/httpd/conf/svn-kepler.authz. Further restrictions should generally be made by placing rules in the appropriate one of these files. Note that each new repository must be added to the appropriate file; until that is done, the new repository may be anonymously readable and writable.
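A new repository's entry in one of the .authz files can be generated rather than typed by hand. This sketch prints one mod_authz_svn section that grants read/write to an authz group and either read-only or no access to everyone else, matching the "r/w for authz group, r for anonymous" pattern used for the repositories listed later. The function name authz_section is hypothetical, and it assumes repository-prefixed section names such as [metacat:/] are in effect for these files.

```shell
# Print a mod_authz_svn section for one repository.
#   $1 repository name, $2 authz group, $3 'read' or 'none' for everyone else
authz_section() {
    repo=$1
    group=$2
    anon=${3:-none}
    case $anon in
        read|none) ;;
        *) echo "third argument must be 'read' or 'none'" >&2; return 1 ;;
    esac
    printf '[%s:/]\n' "$repo"
    printf '@%s = rw\n' "$group"
    if [ "$anon" = read ]; then
        printf '* = r\n'
    else
        printf '* =\n'    # empty rule: no access for anyone not matched above
    fi
}
```

For example, authz_section metacat metacat read >> /etc/httpd/conf/svn-ecoinfo.authz appends a section for the metacat repository. The group itself (here metacat) still has to be defined under [groups] in the same file for @metacat to resolve.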
NCEAS projects and the AdminDB
Some complications arise with the NCEAS projects hierarchy, which is meant to support the working groups, postdocs, sabbatical fellows, and visitors to NCEAS. Because these are far more dynamic, the accounts that access them need to be added to and removed from LDAP dynamically by the AdminDB. A Perl script that does most of this has already been written, but it still needs to be hooked up to the AdminDB so that changes there trigger changes in LDAP. In addition, access control rules for the 'nceasprojects' hierarchy will be controlled exclusively through LDAP, which allows the AdminDB to change users and group membership programmatically. Consequently, each WG, PD, SB, or VS repository will need to be set up as its own Location within the code.nceas.ucsb.edu virtual host, so that each location can have its own access control directives applied. Scripts will need to be written to perform this integration when triggered by changes in the AdminDB. Presumably, these would be tied to other changes that allow the AdminDB to create collaborative areas, etc.
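The per-repository Location blocks that an AdminDB-triggered script would need to write might look roughly like the output of the following sketch. It assumes mod_dav_svn and mod_authnz_ldap with Apache 2.2 style directives; the URL prefix /code/nceasprojects, the LDAP server in the default LDAP_URL, and the function name project_location are all hypothetical, and the LDAP group DN is passed in by the caller.

```shell
# Print an Apache <Location> block for one NCEAS project repository.
#   $1 category (wg, pd, sb or vs), $2 repository name, $3 LDAP group DN
# LDAP_URL may override the (hypothetical) default LDAP server.
project_location() {
    category=$1
    name=$2
    group_dn=$3
    case $category in
        wg|pd|sb|vs) ;;
        *) echo "unknown project category: $category" >&2; return 1 ;;
    esac
    ldap_url="${LDAP_URL:-ldap://ldap.example.org/ou=Account,dc=ecoinformatics,dc=org?uid}"
    cat <<EOF
<Location /code/nceasprojects/$category/$name>
  DAV svn
  SVNPath /var/code/nceasprojects/$category/$name
  AuthType Basic
  AuthName "NCEAS project $category/$name"
  AuthBasicProvider ldap
  AuthLDAPURL "$ldap_url"
  Require ldap-group $group_dn
</Location>
EOF
}
```

An integration script would append the output, for example of project_location wg inference 'cn=inference,dc=ecoinformatics,dc=org' (a made-up group DN), to the code.nceas.ucsb.edu virtual host configuration and then reload Apache.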
Repository structure
Separate virtual hosts exist for the NCEAS, Ecoinformatics, and Kepler groups and any new groups as needed, as they will likely have different authentication needs. Currently the vhosts are set up as code.nceas.ucsb.edu, code.ecoinformatics.org, and code.kepler-project.org.
Proposed structure for each of the virtual hosts is described here.
code.nceas.ucsb.edu
Apache access: on, requires LDAP "cn=nceas-staff" group
AuthzSVN access: used for controlling internal access
Repositories:
admindb
-- NCEAS Administrative Database, including:
-- original code for version 1
-- revised code for version 2 from contractors
-- new drupal modules
-- perl and other support code, such as NCEAS::AdminDB module
-- ...
Location: /var/code/nceas/admindb
Access: r/w for authz group nceas-staff
staff
staff -- working directories for computing and development staff so that
|        they need not maintain separate svn repositories
|        Location: /var/code/nceas/staff
|        Access: r/w for cn=nceas-staff,dc=ecoinformatics,dc=org
|
|-- jones -- files for jones
|-- walbridge -- files for walbridge
|-- regetz -- files for regetz
|-- schild -- files for schild
|-- ...
website
-- Text copy of the code for the NCEAS web site, including drupal
   modules, CSS styles, and other template info
-- site content is not stored here
Location: /var/code/nceas/website
Access: r/w for authz group nceas-staff
project
project -- working directories for NCEAS projects, including Working
| Group projects, postdoc and sabbatical projects, etc.
| Each of these is a separate svn repository that is
| controlled by its own access directives in the Apache config
| that point at the appropriate LDAP users and groups.
|
|-- wg -- working group repositories (accessible only by members of the
| | group, LDAP group syncs with the
| | admindb automatically)
| |
| |-- inference -- inference working group
| |   Location: /var/code/nceasprojects/wg/inference
| |-- recovery -- recovery working group
| |-- ...
|
|-- pd -- postdoc repositories (accessible only to the postdoc, via an LDAP
| | account which is created by the admindb
| | automatically)
| |
| |-- broitman -- Location: /var/code/nceasprojects/pd/broitman
| |-- rueda
| |-- ...
|
|-- sb -- sabbatical repositories (accessible only to the sabbatical fellow,
| | via an LDAP account which is created by the admindb
| | automatically)
| |
| |-- martinez -- Location: /var/code/nceasprojects/sb/martinez
| |-- vieglais
| |-- ...
|
|-- vs -- visitor repositories (accessible only to the visitor, via an LDAP
| account which is created by the admindb
| automatically)
|
|-- alroy -- Location: /var/code/nceasprojects/vs/alroy
|-- ...
code.ecoinformatics.org
Apache access: on, allows anonymous and valid-user access
AuthzSVN access: generally allows anon reads, requires authz group for writing
Repositories:
metacat -- KNB Metacat
Location: /var/code/ecoinfo/metacat
Access: r/w for authz group metacat
r for anonymous
morpho -- KNB Morpho
Location: /var/code/ecoinfo/morpho
Access: r/w for authz group morpho
r for anonymous
eml -- Ecological Metadata Language
Location: /var/code/ecoinfo/eml
Access: r/w for authz group eml
r for anonymous
...
code.kepler-project.org
Apache access: on, allows anonymous access, write requires cn=kepler group
AuthzSVN access: on, generally allows anon read, requires authz group for writing
Repositories:
kepler -- Kepler development tree
Location: /var/code/kepler/kepler
Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
requires authz group kepler for writing
r for anonymous
kepler-docs -- Kepler documentation
Location: /var/code/kepler/kepler-docs
Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
requires authz group kepler for writing
r for anonymous
kepler-core-pi -- Kepler/CORE project
Location: /var/code/kepler/kepler-core-pi
Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
requires authz group kepler-core-pi for writing
no anonymous access
Web-based access
- SVNIndex.xslt
- Trac?
- Other?
Converting CVS modules to SVN
Derik Barseghian used the cvs2svn utility to convert the reap project from CVS to SVN. The utility has many options, but we used only a few, to create a default SVN repository with the CVS history intact. In this example, the top-level reap module is converted to a top-level SVN module, but you can also move it into a subdirectory of a new SVN module.
- Within CVS, mark every file that should be binary as such, e.g. with cvs admin -kb (I found a handful). This made it possible to use --default-eol=native (check the cvs2svn web page if you don't want to use that switch; there are other options).
- Restrict access to the target CVS module so that no further changes are made to it
- (as root) chmod -R 700 /var/cvs/reap
- (as root) chown -R barseghian /var/cvs/reap
- ssh barseghian@ceres
- umask 007
- cvs2svn --dry-run --default-eol=native --existing-svnrepos --svnrepos /var/code/ecoinfo/reap /cvs/reap
- cvs2svn --default-eol=native --existing-svnrepos --svnrepos /var/code/ecoinfo/reap /cvs/reap
- (as root) chown -R apache /var/cvs/reap
- (as root) chown -R apache /var/code/ecoinfo/reap
- test the new repository at: https://code.ecoinformatics.org/code/reap
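The binary-marking and conversion steps above can be scripted. The following POSIX sh sketch only prints commands, so they can be reviewed before being run; the function names are hypothetical, the NUL-byte test is a rough heuristic for "binary" (not what cvs2svn itself uses), and the root-only chmod/chown steps are left manual.

```shell
# Print a `cvs admin -kb` command for every file in a CVS working copy that
# contains a NUL byte (a rough "this is binary" heuristic).
# CVS/ metadata directories are skipped.
list_binary_candidates() {
    find "${1:-.}" -name CVS -prune -o -type f -print |
    while IFS= read -r f; do
        # tr deletes NUL bytes; if that changes the content, the file had NULs
        if ! tr -d '\000' < "$f" | cmp -s - "$f"; then
            printf "cvs admin -kb '%s'\n" "$f"
        fi
    done
}

# Print the dry run and the real conversion for one module, using the same
# umask and cvs2svn options as the reap conversion. The SVN repository must
# already exist, because --existing-svnrepos loads into it instead of creating it.
conversion_commands() {
    cvs_dir=$1
    svn_repo=$2
    opts="--default-eol=native --existing-svnrepos --svnrepos '$svn_repo' '$cvs_dir'"
    printf '%s\n' 'set -e' 'umask 007' \
        "cvs2svn --dry-run $opts" \
        "cvs2svn $opts"
}
```

From the top of a reap working copy, list_binary_candidates . > mark-binary.sh produces a file to review and then run with sh mark-binary.sh. The conversion itself could then be run as the converting user with conversion_commands /cvs/reap /var/code/ecoinfo/reap | sh; the set -e line stops the real conversion if the dry run fails.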
Converting CVS modules to SVN, Alternative Instructions
One of the main challenges in repository migration is the handling of binary files. In many of our existing CVS repositories, binary flags are not set correctly, and setting them within CVS still leaves files typed as 'application/octet-stream'. These instructions use two approaches to solve this problem: rely on a MIME type registry, and explicitly set MIME types for the remaining unknown types. A few support scripts for batch conversion are available at: https://code.nceas.ucsb.edu/code/staff/walbridge/svn-migration/
First, check out the repository (in this example 'expertise') and look at its file types:
- export CVSROOT=:ext:`whoami`@cvs.ecoinformatics.org:/cvsnceas
- cvs co expertise && cd expertise
Generate a list of file extensions, using 'typer.sh' from the above repository (requires `my_autoprops' in the same folder):
- typer.sh
This will output a list of the files found in the CVS repository, and whether we know what to do with each MIME type. Only two cases need to be handled:
- UNIDENTIFIED: MIME type matching has failed on this file, using both extensions and magic bytes.
- NO MAPPING: A MIME type was found, but isn't in /etc/mime.types or our my_autoprops file.
For both of these cases, determine the appropriate MIME type and add an entry to `my_autoprops'. Every file type that lacks an entry AND is binary must have a mapping; text files (e.g. .csv, .log) can safely be ignored. If you cannot determine a type, default to application/octet-stream. An example my_autoprops file:
https://code.nceas.ucsb.edu/code/staff/walbridge/svn-migration/my_autoprops
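typer.sh is the tool actually used for this step; as a rough illustration of the same idea, the following sketch lists file extensions in a checkout that have no mapping in either a mime.types file or the auto-props file. Unlike typer.sh it looks only at extensions, not magic bytes, and the function name unmapped_extensions is hypothetical.

```shell
# Print (lowercase, sorted) the file extensions found under a checkout that
# have no mapping in a mime.types file or a cvs2svn auto-props file.
#   $1 checkout directory, $2 mime.types file, $3 auto-props file
unmapped_extensions() {
    dir=$1
    mime_types=$2
    autoprops=$3
    {
        # mime.types lines are "type ext1 ext2 ..."; comment lines are skipped
        awk '$1 !~ /^#/ { for (i = 2; i <= NF; i++) print "known", $i }' "$mime_types"
        # auto-props lines look like "*.ext = svn:mime-type=..."
        sed -n 's/^\*\.\([^ =]*\).*/known \1/p' "$autoprops"
        # extensions actually present, ignoring CVS/ metadata directories
        find "$dir" -name CVS -prune -o -type f -name '*.*' -print | sed 's/.*\./seen /'
    } | awk '{ e = tolower($2) }
             $1 == "known" { known[e] = 1; next }
             { seen[e] = 1 }
             END { for (x in seen) if (!(x in known)) print x }' | sort
}
```

For example, unmapped_extensions expertise /etc/mime.types my_autoprops prints one extension per line; every binary type in that list still needs an entry in my_autoprops.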
Once you've checked this list and ensured that all file types are accounted for, you can continue with the cvs2svn conversion:
- cvs2svn --eol-from-mime-type --auto-props=my_autoprops --mime-types=/etc/mime.types --fallback-encoding=utf_8 --tmpdir=/tmp/walbridge --keywords-off --svnrepos ./expertise /var/cvsnceas/expertise
Check the resulting SVN repository by checking it out and listing the svn:mime-type properties:
- svn co file://$PWD/expertise expertise-checkout
- svn propget -R svn:mime-type expertise-checkout
If this list looks good, you're all set.
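For a large repository the propget listing is long; a count per MIME type makes unexpected values (for example, many application/octet-stream files) easier to spot. This sketch assumes the non-verbose "PATH - VALUE" output format of svn propget -R; the function name mime_summary is hypothetical.

```shell
# Read "PATH - VALUE" lines on stdin and print "COUNT VALUE" for each
# distinct svn:mime-type value, most frequent first.
mime_summary() {
    awk -F ' - ' 'NF >= 2 { count[$NF]++ } END { for (t in count) print count[t], t }' |
        sort -k1,1nr -k2,2
}
```

Usage: svn propget -R svn:mime-type expertise-checkout | mime_summary. Files without an svn:mime-type property do not appear in the propget output, so they are not counted.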
Importing a CVS repository into a subdirectory of an SVN repository
Use this procedure to import a CVS repository into a subdirectory of an existing SVN repository. cvs2svn won't do this directly: you first create an SVN dumpfile with cvs2svn, then use svnadmin to load the dumpfile. The target subdirectory must already exist in the SVN repository (it can be created with svn mkdir). Here are the commands:
- cvs2svn --dumpfile=/tmp/dumpfile.dump /cvs/cvsrepos/
- svnadmin load /var/code/ecoinfo/svnrepos/ --parent-dir yourSubDir < /tmp/dumpfile.dump
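The commands can be printed for review, together with an svn mkdir step that creates the target subdirectory, using the following sketch. The function name import_commands and the DUMPFILE override are hypothetical; the SVN repository path must be absolute, because it is also used to build a file:// URL.

```shell
# Print the commands that load a CVS repository into SUBDIR of an existing
# SVN repository.
#   $1 CVS repository, $2 absolute path of the SVN repository, $3 SUBDIR
# svn mkdir runs first because svnadmin load --parent-dir expects SUBDIR to exist.
import_commands() {
    cvs_repo=$1
    svn_repo=$2
    subdir=$3
    dump=${DUMPFILE:-/tmp/dumpfile.dump}
    printf '%s\n' \
        "cvs2svn --dumpfile='$dump' '$cvs_repo'" \
        "svn mkdir --parents -m 'Create $subdir for CVS import' 'file://$svn_repo/$subdir'" \
        "svnadmin load '$svn_repo' --parent-dir '$subdir' < '$dump'"
}
```

For example, import_commands /cvs/cvsrepos /var/code/ecoinfo/svnrepos yourSubDir | sh, run as a user who can write to the repository files (for example apache).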
Creating a new SVN repo in an existing folder
- run svnadmin create projectname
- chown -R apache projectname or chown -R apache:dev projectname
- chmod -R o-rwx projectname
- add access rules for the new repository to the matching svn-*.authz file under /etc/httpd/conf/
- change the post-commit.sh file in the hooks directory if needed
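The steps above can be collected into a small helper that prints them for a given repository path. This is a sketch; the function name new_repo_commands is hypothetical, and the optional second argument selects the group used in chown (as in apache:dev).

```shell
# Print the commands for creating a new repository at PATH.
#   $1 repository path, $2 optional group for chown (e.g. dev)
new_repo_commands() {
    path=$1
    group=${2:-}
    owner=apache
    if [ -n "$group" ]; then
        owner="apache:$group"
    fi
    printf '%s\n' \
        "svnadmin create '$path'" \
        "chown -R $owner '$path'" \
        "chmod -R o-rwx '$path'" \
        "# then add a [$(basename "$path"):/] section to the matching /etc/httpd/conf/svn-*.authz file"
}
```

As root: new_repo_commands /var/code/ecoinfo/projectname dev | sh. The last printed line is only a reminder comment; the authz rules and any post-commit.sh changes still have to be made by hand.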
