Computing

Scientific Computing at NCEAS

At NCEAS, we regard modern ecology as a synthetic, integrative, and collaborative science. This means much more than simply communicating results via publications and presentations. The underlying data and analyses are themselves valuable products of scientific investigation, and their re-use and synthesis frequently open novel avenues of research. We recognize that this new paradigm places several unfamiliar demands on many ecologists, who often work exclusively with their own data. First, our scientists increasingly require informatics and analytical approaches that can effectively handle larger-scale, longer-term, and thematically diverse information as inputs. Second, scientists must be prepared to use computer applications that generate reusable workflows (e.g., analytical procedures) and replicable results that are well-documented. This is very different from the antiquated model of describing workflows only with terse text in publications, and archiving quantitative output only in the form of streamlined tables and figures. NCEAS provides both tools and training to facilitate this transition, helping ecologists not only to use existing technologies more effectively, but also to adopt new technologies where relevant. Our experiences have demonstrated that exposure to these methods can increase the productivity of individual ecologists who use them, allow NCEAS working groups to collaborate far more productively, and ultimately preserve the scientific value of all analytical and data products.

Our scientific computing services can be roughly broken into two major categories:

Analytics

Analytics includes statistical procedures, computational algorithms, numerical models, and other methods for framing and solving quantitative problems. We believe “one-off” solutions are of limited value to the research community, and indeed often of only short-term value to the researcher. Whenever possible, we actively seek and strongly advocate computational approaches that (1) can be easily shared among, and repeated by, collaborating scientists who may have different operating systems and limited access to software; (2) are transparent, reliable, and verifiable; (3) have reusable components that future researchers can implement to solve similar computational challenges; and (4) can be easily scaled to handle arbitrarily large problems of similar design.

The analytical software options available at NCEAS follow directly from these considerations. Although we occasionally provide specialty programs (upon request) that do not meet all of these criteria, we have otherwise carefully assembled a powerful lineup of scripted, cross-platform, scalable applications that are well-supported, generate robust numerical results, and permit batch processing. Although these packages require an initial learning investment, and may seem intimidating to scientists familiar with only “point-and-click” software, we strongly argue that the long-term payoff is significant. We also favor applications that are open-source, allowing researchers to inspect, customize, and ultimately better understand the underlying algorithms and procedures. Finally, although we do support several large, proprietary software applications, we seek solutions that are low-cost and lightweight whenever this can be done without compromising the analysis; this approach maximizes sharing among working group collaborators who may have limited computing resources at their home institutions.

General quantitative analysis: For virtually any type of statistical analysis, we recommend and support R. R is open-source, free of charge, highly flexible, widely used by academic scientists and statisticians, and supported by a remarkably extensive library of community-developed functions. We also support Matlab, which is often the preferred programming environment for ecologists developing simulation models and implementing other kinds of numerical analysis. Both of these applications are installed by default on NCEAS workstations and servers. Although the vast majority of analytical needs of NCEAS scientists are met by either R or Matlab, we can help you to develop custom solutions in other cases. If your computational demands exceed the capabilities of these applications and require finer-grained control of processing tasks and memory addressing, we can assist you with programming in lower-level languages such as C, C++, and Fortran.

Spatial analysis and mapping: For visualization of multi-layer maps and advanced GIS functionality, we offer both ESRI ArcGIS and the open-source GRASS GIS. However, ecologists often need to process and analyze spatial data in ways that do not actually require a GIS. In such cases, it is generally preferable to use simpler tools that can deliver results much more rapidly and with fewer data manipulations. If your project involves any mapping, spatial analysis, or geographic data manipulation, first check in with the scientific computing staff to identify optimal approaches that fit easily into your overall workflow.
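As one illustration of a spatial calculation that does not require a GIS, the following is a minimal sketch in Python of computing great-circle distances between field sites from their latitude and longitude using the haversine formula. The function names and the site coordinates are hypothetical and chosen only for illustration; this is not a description of any specific NCEAS tool, and the spherical-Earth assumption may not be accurate enough for every project.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def pairwise_distances(sites):
    """Return {(name_a, name_b): distance_km} for every unordered pair of sites."""
    return {
        (a, b): haversine_km(*sites[a], *sites[b])
        for a, b in combinations(sorted(sites), 2)
    }


if __name__ == "__main__":
    # Hypothetical site coordinates (latitude, longitude) in decimal degrees.
    sites = {
        "plot_a": (34.41, -119.85),
        "plot_b": (36.60, -121.89),
        "plot_c": (32.87, -117.25),
    }
    for pair, dist in pairwise_distances(sites).items():
        print(pair, round(dist, 1), "km")
```

Because the script uses only the standard library, collaborators can rerun it on any operating system without installing GIS software.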

High Performance Computing: Although many of our scientists find that their desktop computers are powerful enough to meet their needs, we also provide access to high performance computing resources. Our fast compute server has 44 processing cores that share 384 GB of memory; it provides an option for running parallelized code, and can also be very useful for any computational task that is especially processor- and/or memory-limited. Our computing experts can help you determine whether your desktop or the compute server is most appropriate, and help to get you up and running.
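To make "parallelized code" concrete, here is a minimal sketch in Python of spreading independent model runs across the cores of a multi-core machine. The logistic growth model, its parameter values, and the function names are assumptions made for illustration; they do not describe any particular NCEAS workflow or server configuration.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def simulate_growth(rate, steps=1000, start=10.0, capacity=500.0):
    """Run a discrete logistic growth model and return the final population size."""
    n = start
    for _ in range(steps):
        n += rate * n * (1 - n / capacity)
    return n


def run_parallel(rates, workers=None):
    """Run one simulation per growth rate, each in a separate worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the inputs in its results.
        return list(pool.map(simulate_growth, rates))


if __name__ == "__main__":
    rates = [0.1, 0.2, 0.3, 0.4]  # hypothetical growth rates to compare
    results = run_parallel(rates, workers=os.cpu_count())
    for rate, final_size in zip(rates, results):
        print(rate, final_size)
```

This pattern suits tasks that are independent of one another, such as parameter sweeps or bootstrap replicates. Tasks that must share large amounts of intermediate data between workers generally need a different design.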

Informatics

Informatics provides both a strategic framework and specific tools for acquiring, handling, interpreting, and storing data in a useful and efficient manner. Because of the extremely heterogeneous nature of ecological data, NCEAS scientists often face special informatics challenges. In addition to providing advice and ready-to-use tools for managing data, NCEAS is also a leader of novel research in this area.