scicomp_knowledge_base

This page is for discussion of content and structure of the NCEAS Scientific Computing knowledge base. The current plan is to integrate this knowledge base (which may or may not be called a “knowledge base”) into the new NCEAS website, and continue to develop it within Drupal.

Much of the existing content is currently located on the Scientific Computing Services Home Page, with additional (mostly R-specific) resources linked to the Scientific Computing area of our help wiki. New content should be fairly easy to draft from personal notes and similar material.

Content areas

Computational tasks

  • GIS and spatial data analysis (click link for more discussion)
  • Ecological data modeling: Regression and friends; likelihood estimation; hierarchical models; Bayesian computation
  • Multivariate data analysis: PCA, CCA, NMDS and friends; distance metrics; ordination/classification
  • Data manipulation: Merging, aggregating, reshaping, sorting, filtering
  • Data management: File formats and conversion; consistency checking; common pitfalls
  • ?more?

Software and stuff

  • The Trinity: R, SAS, Matlab (maybe Octave?)
  • Desktop GIS: GRASS, ArcGIS, QGIS, PostGIS
  • RDBMS: PostgreSQL, MySQL; interfaces to both
  • Misc spatial apps: StarSpan, OpenEV, GDAL/OGR tools
  • Misc stats apps: WinBUGS, Metawin, Primer
  • Misc eco apps: Ecopath/Ecosim
  • Programming languages: C/C++, Java, Ruby, Python, PHP
  • ?more?

Structure

Types of resources

  • Concise How-to web pages
  • More detailed case study web pages
  • Portal pages to external resources
  • Portal pages to data
  • Downloadable local resources: PDFs, presentations, scripts & such
  • ?more?

Back-end software alternatives

To allow the Sci Comp knowledge base to scale up gracefully as more content is added, we need something more sophisticated than a simple menu-style listing of content pages. Moreover, users may prefer to search for information along one of several orthogonal axes: one user may want to find all geospatial knowledge resources regardless of software (including pages that make no reference to software whatsoever), whereas another user may want to find all Matlab How-tos regardless of the type of operation.

Option 1: Drupal taxonomy

Drupal supports flexible organization of content via its taxonomy (aka categories) module. Pages can be tagged with one or more terms, which themselves can come from one or more orthogonal vocabularies. To address the use case described above, it might be most useful to implement two vocabularies: (1) a task vocabulary (specific analytical, quantitative, and computational methods), and (2) an implementation vocabulary (specific software apps, programming languages, etc).
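To make the two-vocabulary idea concrete, here is a minimal sketch in Python. It is not Drupal code and does not use any Drupal API; it only models pages tagged with terms from a task vocabulary and an implementation vocabulary, and shows how either axis can be searched independently. The page titles, the `Page` class, and the `find_pages` helper are all hypothetical, and the terms are borrowed from the Content areas list above.

```python
from dataclasses import dataclass

# Terms borrowed from the "Content areas" lists; the page titles are made up.
GIS = "GIS and spatial data analysis"
MANIP = "Data manipulation"


@dataclass(frozen=True)
class Page:
    title: str
    tasks: frozenset = frozenset()     # terms from the task vocabulary
    software: frozenset = frozenset()  # terms from the implementation vocabulary


def find_pages(pages, task=None, software=None):
    """Return titles of pages that match every facet that was supplied."""
    return [
        p.title
        for p in pages
        if (task is None or task in p.tasks)
        and (software is None or software in p.software)
    ]


PAGES = [
    Page("Reprojecting rasters", frozenset({GIS}), frozenset({"GRASS"})),
    Page("Choosing a map projection", frozenset({GIS})),  # no software tag
    Page("Reshaping data in Matlab", frozenset({MANIP}), frozenset({"Matlab"})),
    Page("Kriging in Matlab", frozenset({GIS}), frozenset({"Matlab"})),
]

geospatial = find_pages(PAGES, task=GIS)
matlab = find_pages(PAGES, software="Matlab")
both = find_pages(PAGES, task=GIS, software="Matlab")
```

With this sample data, `geospatial` includes "Choosing a map projection" even though that page carries no software term, while `matlab` returns both Matlab pages regardless of task. Those are the two use cases described above.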

Key questions regarding Drupal taxonomy:

  • Can we create/apply new vocabularies without affecting the rest of the NCEAS site?
  • How easy is it to add/change/remove terms (and/or entire vocabularies) later?
  • Can we implement a single-level vocabulary now, but possibly introduce hierarchical structure later?
  • Can we agree on an initial set of terms? (Proposal: start with the bulleted Content areas above)
  • Can we implement a SciComp-specific search interface that takes advantage of our taxonomy?

Option 2: Semantic MediaWiki

The standard MediaWiki software already supports categories. Although this is a somewhat simplistic feature, it would at least enable the automated creation of simple (and potentially hierarchical) index pages.
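As a rough illustration of what an automatically generated hierarchical index could look like, the following Python sketch builds a nested listing from page-to-category and category-to-parent mappings. It does not call MediaWiki; the mappings, the category names, and the `build_index` function are assumptions made up for this example.

```python
from collections import defaultdict


def build_index(page_categories, parent_of):
    """Render a nested plain-text index: each category, its pages, then subcategories."""
    children = defaultdict(list)
    for category, parent in parent_of.items():
        children[parent].append(category)  # top-level categories have parent None

    pages_in = defaultdict(list)
    for page, categories in page_categories.items():
        for category in categories:
            pages_in[category].append(page)

    lines = []

    def walk(category, depth):
        lines.append("  " * depth + category)
        for page in sorted(pages_in[category]):
            lines.append("  " * (depth + 1) + "• " + page)
        for child in sorted(children[category]):
            walk(child, depth + 1)

    for root in sorted(children[None]):
        walk(root, 0)
    return "\n".join(lines)


# Hypothetical categories and pages, for illustration only.
parent_of = {"Spatial analysis": None, "Raster": "Spatial analysis"}
page_categories = {
    "Reprojecting rasters": ["Raster"],
    "Choosing a map projection": ["Spatial analysis"],
}
index_text = build_index(page_categories, parent_of)
```

Here `index_text` lists "Spatial analysis" at the top level, with its own page and the "Raster" subcategory indented beneath it.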

A richer option may be to use Semantic MediaWiki (SMW), a semantically enabled version of the wiki software we use for our NCEAS help site. As described in the online documentation, SMW supports annotation, semantically aware searching/browsing, and integration with the broader Semantic Web via mechanisms such as RDF export. SMW is currently used by the SIMILE project, among others. Note that SMW's semantic search feature currently requires users to construct query statements in a somewhat awkward and difficult syntax. Nevertheless, we could capitalize on the semantic information by embedding dynamic semantic queries within pages, and by creating more sophisticated dynamic index pages.

An intriguing complement to SMW is the Halo extension, developed in support of the Project Halo effort funded by Paul Allen's Vulcan Inc. Currently in beta status (1.0beta released on 2007-Oct-31), this extension provides some nifty GUI tools for leveraging the semantic content of the wiki. For example, the Ontology Browser provides a way to identify wiki content by navigating the ontology. Watch the Halo extension demo video (~10 min) for a nice overview of its capabilities.

scicomp_knowledge_base.txt · Last modified: 2014/05/02 16:25 by brand