svn_configuration

This document describes the proposed and partially constructed structure of the new Subversion document and code repositories at NCEAS. These svn repositories would replace the CVS repositories that we now support. Current CVS repositories would be migrated to Subversion using the cvs2svn utility.

Current Subversion status: Production

Major remaining TODO items:

  • Migrate existing CVS modules to SVN
  • Determine whether mod_authz_svn can use LDAP groups (this appears to be a major missing feature)
  • Determine the location and setup for the NCEAS projects repositories
  • Link the AdminDB to create repositories for working groups automatically

Authentication and authorization

Authentication is handled using LDAP. Accounts are drawn only from the 'ou=Account,dc=ecoinformatics,dc=org' tree, which means accounts are shared with CVS. We should eventually determine how to best move to using the regular ecoinformatics.org account tree so that users need not have new accounts created, just be added to the access control files for a particular module.

Authorization is done in two stages. First, some modules require authorization by Apache when accessing the Subversion repository. This is configured in the 'Location' directive of the Apache configuration file for each virtual host. This access needs to be protected by SSL, which is in progress. The downside of this type of configuration is that every change requires the web server configuration to be reloaded, which we do not want to do frequently, because a configuration error can cause the web server to stop servicing requests.
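Because a bad edit can stop the web server, it helps to syntax-check the configuration before every reload. The following is a minimal sketch, assuming the apachectl control script is installed; the APACHECTL variable and the safe_reload function are illustrative names, not part of the current setup.

```shell
#!/bin/sh
# Sketch only: syntax-check the Apache configuration and reload only if it passes.
# APACHECTL and safe_reload are illustrative names; adjust for the local install.
APACHECTL="${APACHECTL:-apachectl}"

safe_reload() {
    # "configtest" parses the configuration without touching the running server.
    if "$APACHECTL" configtest; then
        # "graceful" reloads the configuration without dropping open connections.
        "$APACHECTL" graceful
    else
        echo "configuration test failed; server left running with the old config" >&2
        return 1
    fi
}
```

Calling safe_reload after editing a virtual host file leaves the running server untouched whenever the syntax check fails.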

Second, Subversion provides mod_authz_svn, an Apache module that supplies additional, path-specific authorization rules. This allows us to choose which repositories, and which directories within repositories, are accessible to particular users and groups. The current configuration uses this feature extensively, as described in the next section. Configuration for these directives is in three files: /etc/httpd/conf/svn-nceas.authz, /etc/httpd/conf/svn-ecoinfo.authz, and /etc/httpd/conf/svn-kepler.authz. Further restrictions should generally be made by placing rules in the file for the relevant virtual host. Note that each new repository must be added to the appropriate file; until that is done, the new module may be anonymously readable and writable.
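As an illustration of path-specific rules, the sketch below writes a small access file in the format read by the Subversion authorization module. The group members, the choice of the metacat repository, and the output path are assumptions for the example, not a copy of the live files. The "repository:/path" section names assume the repositories are served from a common parent directory.

```shell
#!/bin/sh
# Sketch only: write an example path-based access file.
# AUTHZ_FILE and the user names are illustrative assumptions.
AUTHZ_FILE="${AUTHZ_FILE:-./svn-example.authz}"

cat > "$AUTHZ_FILE" <<'EOF'
[groups]
metacat = jones, walbridge

# Default for every repository: no access unless a later section grants it.
[/]
* =

# Anonymous users may read the metacat repository; group members may write.
[metacat:/]
* = r
@metacat = rw
EOF
```

Starting from a deny-all default means a repository that has not yet been given its own section is closed rather than open.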

NCEAS projects and the AdminDB

Some complications occur with the NCEAS projects hierarchy, which is meant to support the working groups, postdocs, sabbatical fellows, and visitors to NCEAS. Because these are far more dynamic, accounts for accessing them need to be added to and removed from LDAP dynamically by the AdminDB. A Perl script already does most of this, but it still needs to be hooked up to the AdminDB so that changes there trigger changes in LDAP. In addition, access control rules for the 'nceasprojects' hierarchy will be controlled exclusively through LDAP, which allows the AdminDB to change users and group membership programmatically. Consequently, each WG, PD, SB, or VS repository will need to be set up as its own Location within the code.nceas.ucsb.edu virtual host, so that each location can have its own access control directives applied. Scripts will need to be written to accomplish this integration when triggered by changes in the AdminDB. Presumably, these would be tied to other changes that allow the AdminDB to create collaborative areas, etc.
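A per-repository Location could be generated by a small script along the following lines. This is a sketch under stated assumptions: the URL prefix, the LDAP host name, and the layout of the group DN are placeholders, and make_location is a hypothetical helper, not an existing script.

```shell
#!/bin/sh
# Sketch only: print an Apache <Location> block for one project repository.
# The URL prefix, LDAP host, and group DN layout are illustrative assumptions.
make_location() {
    kind="$1"   # wg, pd, sb, or vs
    name="$2"   # repository name, e.g. inference
    cat <<EOF
<Location /code/project/$kind/$name>
    DAV svn
    SVNPath /var/code/nceasprojects/$kind/$name
    AuthType Basic
    AuthName "NCEAS project: $name"
    AuthBasicProvider ldap
    AuthLDAPURL "ldap://ldap.example.org/dc=ecoinformatics,dc=org?uid"
    Require ldap-group cn=$name,dc=ecoinformatics,dc=org
</Location>
EOF
}

make_location wg inference
```

Writing each block to its own include file would let an AdminDB-triggered script add or remove a project without editing the main virtual host file.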

Repository structure

Separate virtual hosts exist for the NCEAS, Ecoinformatics, and Kepler groups and any new groups as needed, as they will likely have different authentication needs. Currently the vhosts are set up as code.nceas.ucsb.edu, code.ecoinformatics.org, and code.kepler-project.org.

Proposed structure for each of the virtual hosts is described here.

code.nceas.ucsb.edu

Apache access: on, requires LDAP “cn=nceas-staff” group
AuthzSVN access: used for controlling internal access

Repositories:

admindb

     -- NCEAS Administrative Database, including:
           -- original code for version 1
           -- revised code for version 2 from contractors
           -- new drupal modules
           -- perl and other support code, such as NCEAS::AdminDB module
           -- ...
           Location: /var/code/nceas/admindb
           Access: r/w for authz nceas-staff directory

staff

  staff -- working directories for computing and development staff so that 
  |        they need not maintain separate svn repositories
  |        Location: /var/code/nceas/staff
  |        Access: r/w for cn=nceas-staff,dc=ecoinformatics,dc=org
  |
  |-- jones -- files for jones
  |-- walbridge -- files for walbridge
  |-- regetz -- files for regetz
  |-- schild -- files for schild
  |-- ...

website

  website
        -- Text copy of code for the NCEAS web site, including drupal
           modules, CSS styles and other template info
           content is not stored here
           Location: /var/code/nceas/website
           Access: r/w for authz group nceas-staff

project

  project  -- working directories for NCEAS projects, including Working
  |           Group projects, postdoc and sabbatical projects, etc
  |           Each of these is a separate svn repository that is
  |           controlled by its own access directives in the apache config
  |           that point at the appropriate LDAP users and groups.
  |
  |-- wg   -- working group repositories (accessible only by members of the
  |   |                                   group, LDAP group syncs with the
  |   |                                   admindb automatically)
  |   |
  |   |-- inference   -- inference working group 
                      -- Location: /var/code/nceasprojects/wg/inference
  |   |-- recovery    -- recovery working group 
  |   |-- ...
  |
  |-- pd   -- postdoc repositories (accessible only by the pd, via LDAP acct
  |   |                             which is created by the admindb 
  |   |                             automatically)
  |   |
  |   |-- broitman -- Location: /var/code/nceasprojects/pd/broitman
  |   |-- rueda
  |   |-- ...
  |
  |-- sb   -- sabbatical repositories (accessible only by the sb, via LDAP acct
  |   |                                which is created by the admindb 
  |   |                                automatically)
  |   |
  |   |-- martinez -- Location: /var/code/nceasprojects/sb/martinez
  |   |-- vieglais
  |   |-- ...
  |
  |-- vs   -- visitor repositories (accessible only by the vs, via LDAP acct
      |                                which is created by the admindb 
      |                                automatically)
      |
      |-- alroy -- Location: /var/code/nceasprojects/vs/alroy
      |-- ...

code.ecoinformatics.org

Apache access: on, allows anonymous and valid-user access
AuthzSVN access: generally allows anon reads, requires authz group for writing

Repositories:

metacat – KNB Metacat

           Location: /var/code/ecoinfo/metacat
             Access: r/w for authz group metacat
                     r for anonymous

morpho – KNB Morpho

           Location: /var/code/ecoinfo/morpho
             Access: r/w for authz group morpho
                     r for anonymous

eml – Ecological Metadata Language

           Location: /var/code/ecoinfo/eml
             Access: r/w for authz group eml
                     r for anonymous

code.kepler-project.org

Apache access: on, allows anonymous access, write requires cn=kepler group
AuthzSVN access: on, generally allows anon reads, requires authz group for writing

Repositories:

kepler – Kepler development tree

                  Location: /var/code/kepler/kepler
                    Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
                            requires authz group kepler for writing
                            r for anonymous

kepler-docs – Kepler documentation

                  Location: /var/code/kepler/kepler-docs
                    Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
                            requires authz group kepler for writing
                            r for anonymous

kepler-core-pi – Kepler/CORE project

                  Location: /var/code/kepler/kepler-core-pi
                    Access: r/w for cn=kepler,dc=ecoinformatics,dc=org
                            requires authz group kepler-core-pi for writing
                            no anonymous access

Web-based access

  • SVNIndex.xslt
  • Trac?
  • Other?

Converting CVS modules to SVN

Derik Barseghian used the cvs2svn utility to convert the reap project from CVS to SVN. The utility has many options, but we used only a few to create a default SVN repository with the history from CVS intact. In this example, the top-level reap module is converted to a top-level SVN module, but in practice you can also move it to a subdirectory of a new SVN module.

  1. Within CVS, mark all files that should be binary as such (I found a handful). This is so I could use --default-eol=native (check the cvs2svn web page if you don't want to use that switch; there are other options).
  2. Restrict access to the target CVS module so that no further changes are made to it
    1. (as root) chmod -R 700 /var/cvs/reap
    2. (as root) chown -R barseghian /var/cvs/reap
  3. ssh barseghian@ceres
  4. umask 007
  5. cvs2svn --dry-run --default-eol=native --existing-svnrepos --svnrepos /var/code/ecoinfo/reap /cvs/reap
  6. cvs2svn --default-eol=native --existing-svnrepos --svnrepos /var/code/ecoinfo/reap /cvs/reap
  7. (as root) chown -R apache /var/cvs/reap
  8. (as root) chown -R apache /var/code/ecoinfo/reap
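The dry run and the real conversion can be chained so that the conversion starts only if the dry run succeeds. This is a minimal sketch using the same options as the steps above; the CVS2SVN variable and the convert_module function are illustrative names, and the ownership changes that need root are left out.

```shell
#!/bin/sh
# Sketch only: run the dry run first and convert only if it succeeds.
# CVS2SVN and convert_module are illustrative names.
CVS2SVN="${CVS2SVN:-cvs2svn}"

# The body runs in a subshell so the umask change does not leak to the caller.
convert_module() (
    cvs_path="$1"    # e.g. /cvs/reap
    svn_repos="$2"   # e.g. /var/code/ecoinfo/reap
    umask 007
    "$CVS2SVN" --dry-run --default-eol=native --existing-svnrepos \
        --svnrepos "$svn_repos" "$cvs_path" || exit 1
    "$CVS2SVN" --default-eol=native --existing-svnrepos \
        --svnrepos "$svn_repos" "$cvs_path"
)
```

For the reap example this would be invoked as convert_module /cvs/reap /var/code/ecoinfo/reap.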

Converting CVS modules to SVN, Alternative Instructions

One of the main challenges facing repository migration is the handling of binary files. On many of our existing CVS repositories, these are not set correctly, and setting them within CVS still leaves files as 'application/octet-stream'. These instructions use two approaches to solve this problem: rely on a mimetype registry, and explicitly set mimetypes for other unknown types. A few support scripts for batch-conversion: https://code.nceas.ucsb.edu/code/staff/walbridge/svn-migration/

First, check out the repository (in this example, 'expertise') and look at the file types:

$  export CVSROOT=:ext:`whoami`@cvs.ecoinformatics.org:/cvsnceas
$  cvs co expertise && cd expertise
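A quick way to see which file extensions occur in the checkout, using only standard tools, is sketched below. It is a rough stand-in and not the typer.sh script from the support repository; list_extensions is a hypothetical helper, and it reports only extensions, not MIME types.

```shell
#!/bin/sh
# Sketch only: count files per extension in a checkout, skipping CVS metadata.
# This is a rough stand-in, not the typer.sh script from the support repository.
list_extensions() {
    dir="${1:-.}"
    find "$dir" -type f ! -path '*/CVS/*' |
        # Keep only the text after the last dot of the file name, if any.
        sed -n 's|.*/[^/]*\.\([^./]*\)$|\1|p' |
        sort | uniq -c | sort -rn
}
```

Running list_extensions . inside the checkout prints one line per extension with its file count; files without an extension are not listed.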

Generate a list of file extensions, using 'typer.sh' from the above repository (requires `my_autoprops' in the same folder):

  • typer.sh

This will output a list of the files found in the CVS repository, and whether we know what to do with the MIME type in question. The only types that need to be handled are:

  • UNIDENTIFIED: MIME type matching has failed on this file, using both extensions and magic bytes.
  • NO MAPPING: A MIME type was found, but isn't in /etc/mime.types or our my_autoprops file.

For both of these cases, determine the appropriate MIME type and add an entry to `my_autoprops'. Make sure that every file type that lacks an entry AND is binary has a mapping. Text files (e.g. .csv, .log) can safely be ignored, but mappings must be defined for all other binary files. If the type cannot be determined, default to application/octet-stream. The my_autoprops file used here is:

https://code.nceas.ucsb.edu/code/staff/walbridge/svn-migration/my_autoprops
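For orientation, cvs2svn reads the auto-props file in the format of Subversion's own config file and uses its [auto-props] section. The sketch below writes two example entries; the extensions and the output file name are illustrative assumptions and do not reproduce the contents of the real my_autoprops file.

```shell
#!/bin/sh
# Sketch only: write example auto-props entries. The extensions chosen here
# are illustrative and are not the contents of the real my_autoprops file.
AUTOPROPS_FILE="${AUTOPROPS_FILE:-./my_autoprops.example}"

cat > "$AUTOPROPS_FILE" <<'EOF'
[auto-props]
*.sav = svn:mime-type=application/octet-stream
*.RData = svn:mime-type=application/octet-stream
EOF
```

Each line maps a file name pattern to a property; patterns are matched case-sensitively, so mixed-case extensions need their own entries.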

Once you've checked this list and ensured that all file types are accounted for, you can continue with the cvs2svn conversion:

$ cvs2svn --eol-from-mime-type --auto-props=my_autoprops --mime-types=/etc/mime.types --fallback-encoding=utf_8 --tmpdir=/tmp/walbridge --keywords-off --svnrepos ./expertise /var/cvsnceas/expertise

Check the resulting SVN repository by checking it out, and looking for the mime type attribute:

$ svn co file://$(pwd)/expertise expertise-checkout
$ svn propget svn:mime-type -R expertise-checkout

If this list looks good, you're all set.
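On a large repository the property listing is long, so a per-type summary can make problems easier to spot. The following is a minimal sketch that assumes the listing has the "path - value" form printed by a recursive propget; summarize_mime_types is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: count files per MIME type from recursive propget output.
# Input lines are expected to look like "path - mime/type".
summarize_mime_types() {
    # Drop everything up to the last " - " separator, leaving the MIME type.
    sed 's/^.* - //' | sort | uniq -c | sort -rn
}

# Usage, assuming a working copy in ./expertise-checkout:
#   svn propget svn:mime-type -R expertise-checkout | summarize_mime_types
```

An unexpectedly large count for application/octet-stream suggests that some file types still lack a mapping.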

Importing a CVS repository into a subdirectory of an SVN repository

Use this when you have a CVS repository that you want to import into a subdirectory of an existing SVN repository. cvs2svn won't do this directly. You have to first create an SVN dumpfile, then use svnadmin to load the dumpfile. Here are the commands:

  • cvs2svn --dumpfile=/tmp/dumpfile.dump /cvs/cvsrepos/
  • svnadmin load /var/code/ecoinfo/svnrepos/ --parent-dir yourSubDir < /tmp/dumpfile.dump

Creating a New SVN Repo on the SVN Server

  • SSH to saturn, cd to the appropriate dir under /var/code/
  • run sudo svnadmin create projectname
  • run sudo chown -R www-data projectname
  • add permissions to the svn-* file under /etc/apache2/authz/
  • change the post-commit script in the hooks directory to send emails (and optionally irc notifications). Use other post-commit files as examples.
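For the last step, a post-commit hook can assemble its notification from svnlook output. The following is a minimal sketch, not one of the existing hook scripts; the SVNLOOK variable, the commit_message function, and the address in the usage comment are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: build a commit notification for a post-commit hook.
# Subversion calls the hook with the repository path and the new revision.
# SVNLOOK, commit_message, and the address below are illustrative assumptions.
SVNLOOK="${SVNLOOK:-svnlook}"

commit_message() {
    repos="$1"
    rev="$2"
    echo "Revision: $rev"
    echo "Author: $("$SVNLOOK" author -r "$rev" "$repos")"
    echo "Log:"
    "$SVNLOOK" log -r "$rev" "$repos"
    echo "Changed paths:"
    "$SVNLOOK" changed -r "$rev" "$repos"
}

# In hooks/post-commit one might then send it with a local mailer, e.g.:
#   commit_message "$1" "$2" | mail -s "svn commit r$2" commits@example.org
```

The hook file must be executable by the web server user, or Subversion will not run it.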

Authz LDAP Sync

  • dataone authz files sync with ldap through ubuzilla
    • The script that syncs LDAP with the SVN authz file for DataONE still resides and runs on ubuzilla from my account's crontab. It takes the information from LDAP, generates a new authz file based on a template, and then uses scp (and my public key) to scp the new authz file to the right location on ceres. When the SVN server moves, this script could be simplified and move directly onto the new SVN server as we've done for the mailman sync. (from an email sent from Matt)
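The template step of such a sync can be sketched as follows. This is not the script that runs on ubuzilla: the memberUid attribute, the @MEMBERS@ placeholder, and the render_authz function are assumptions made for illustration, and user names are assumed to contain no characters that are special to sed.

```shell
#!/bin/sh
# Sketch only: fill a group line in an authz template from LDIF on stdin.
# The memberUid attribute, the @MEMBERS@ placeholder, and render_authz are
# illustrative assumptions, not the format used by the existing sync script.
render_authz() {
    template="$1"
    # Collect "memberUid: name" values into a comma-separated list.
    members=$(sed -n 's/^memberUid: *//p' | paste -s -d, -)
    # Substitute the list for the placeholder and print the finished file.
    sed "s/@MEMBERS@/$members/" "$template"
}
```

In a real sync the LDIF would come from an LDAP query and the rendered file would be copied into place, as the note above describes.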
svn_configuration.txt · Last modified: 2012/11/20 17:47 by brand