:R offers a variety of functions for importing, manipulating, analyzing, and exporting spatial data. Although one might at first consider this to be the exclusive domain of GIS software, using R can frequently provide a much more lightweight, yet equally effective solution that embeds within a larger analytical workflow.
Many of the functions show below rely on add-on packages, which are loaded via the library(packagename) statement. If the package is not already installed on your machine, you can get it from within R by submitting the following statement (inserting the appropriate package name, of course):
:::rsplus
> install.packages("packagename")`</code>`
# Understanding spatial data formats in R
One of the tricky aspects of pulling spatial data into your analytical workflow is that there are numerous complicated data formats. In fact, even within R itself, functions from different user-contributed packages often require the data to be structured in very different ways. The good news is that efforts are underway to standardize spatial data classes in R. This movement is facilitated by sp, an important base package for spatial operations in R. It provides definitions for basic spatial classes (points, lines, polygons, pixels, and grids) in an attempt to unify the way R packages represent and manage these sorts of data. It also includes some core functions for creating and manipulating these data structures. The hope is that all spatial R packages will use (or at least provide conversions to) the 'Spatial' data class and it's derivatives, as now defined in the sp package. All else being equal, we favor R functions and packages that conform to the sp standard, as these are likely to provide the greatest future utility and durability.
In addition to the numerous spatial packages in the CRAN repository (documented in the http://cran.r-project.org/src/contrib/Views/Spatial.html Spatial task view], interim "wrapper" packages have been developed for interfacing sp with several non-sp packages that are built around their own (legacy) data structures. These wrapper packages are not available in CRAN, but can be obtained from the [[http://r-spatial.sourceforge.net|Sourceforge repository]]. As of May 2006, the available wrapper packages are:
* spmaps: Interface to the maps package
* spPBS: Interface to the PBSmapping package
* spgpc: Interface to the gpclib package
# How can I...
## Read ESRI shapefiles into R
Although all methods described below can be used to import a shapefile, they differ in terms of how they represent the data in R. This has implications for what you can do with the data, because different functions (as provided by different contributed packages) operate on different types of R objects. In addition, there are some subtle variations in how the data are imported. For example, are character variables in the attribute table converted to factors, or left as-is? Are blank cells in the attribute table represented as NA or as empty characters? Often these subtleties are unimportant or can be dealt with after you read in the data; furthermore, in some cases there exist functions that convert between the different types of R spatial formats. Nevertheless, it pays to be cognizant of these potential gotchas.
The 'rgdal' solution:
The rgdal package reads numerous spatial data formats using the standalone, open-source [[GDAL|Geospatial Data Abstraction Library (GDAL)]]. This is also the recommended package for performing map projections and datum transformations, using the external PROJ.4 library.
`<code rsplus>`
# Uses external OGR library (in GDAL) for reading vector formats.
library(rgdal) #Also requires sp package to be installed
# Read in shapefile, creating a SpatialPolygonsDataFrame object
WA.ogr `<- readOGR("WA.shp", "WA")</code>`
Linux and Mac users, take note: The 'rgdal' package depends on external libraries that must be installed on the local system. For Windows users, these libraries are included in the package binary, and thus no other action is required beyond installing rgdal itself. For Linux and Mac users, it may be necessary to obtain and install both the (free) GDAL and PROJ.4 libraries before installing rgdal. For more information and instructions, visit http://cran.r-project.org/src/contrib/Descriptions/rgdal.html.
The 'maptools' solution (two options):
The maptools package provides functions for reading, writing, and manipulating spatial data, especially shapefiles. The first option below also produces a 'SpatialPolygonDataFrame' object. However, because maptools includes some handy functions that operate on data stored as an older 'Map' object type, a second import function exists that will produce a 'Map' object.
`<code rsplus>`
library(maptools) #Also requires sp package to be installed
# OPTION 1: Read shapefile, creating a SpatialPolygonsDataFrame object
WA.maptools1 <- readShapePoly("WA")
# OPTION 2: Read in shapefile, creating a Map object
WA.maptools2 `<- read.shape("WA.shp")</code>`
The 'shapefiles' solution:
The shapefiles package provides functions for reading, writing, and constructing shapefiles within R. This method for reading a shapefile imports data in way that simply produces a list of lists, rather than creating a customized data type. This is nice and simple if you just want to extract some information from the shapefile (e.g. a column from the attribute table), but you will probably need to convert to an explicit spatial data class before doing more processing.
`<code rsplus>`
library(shapefiles)
# Read in shapefile, creating a list object
WA.shapefiles `<- read.shapefile("WA")</code>`
Reading Point Locations from a .csv file, converting to ESRI Shape File:
Often, scientists will have 'geocoded' field data (e.g., containing the location coordinates of each study area)
that they wish to import into a geospatial or GIS software environment. Typically, these data are stored in an
spreadsheet or data base software that generate comma-separated-value (.csv) files, in which each location is
represented by a multi-column row whose columns are descriptive attributes (including location coordinates).
Following is a brief R script that reads such records from a CSV file, converts them to the appropriate R
internal data format, and writes the location records as an ESRI Shape File. The Shape File format is an
industry-standard format for interchanging vector (point / line / polygon) geospatial data, and is compatible
with all major GIS systems.
`<code rsplus>`
#
# This example demonstrates one good way to read a set of georeferenced
# site locations (in this case, lakes), into the R environment, and then
# create a valid ESRI Point Shapefile:
#
# Here is a sample of the data file. For compatiblility with ArcMap GIS,
# Longitude must appear before Latitude.
#
# coordinates is important
# Column 1: LAKE_ID,
# Column 2: LAKENAME
# Column 3: Longitude
# Column 4: Latitude
#
LAKE_ID,LAKENAME,LON_DD,LAT_DD
CT002L,BISSONETTE POND,-72.21889,41.92417
CT004L,WYASSUP LAKE,-71.87278,41.48833
CT250L,CRANBERRY LAKE,-72.03389,41.67667
CT500L,OXOBOXO LAKE,-72.2015,41.4878
CT501L,WHEELER POND,-72.1489,41.4637
#
# Here is the script:
#
library(sp)
library(maptools)
#
# The Three Step Procedure: 1) Read the file into an R Data Frame,
2) convert to R Data Structure 'SpatialPointsDataFrame',
3) Write the converted data object to a standard Point Shape File.
#
LakePoints = read.csv("LakesRevCoord.csv")
LakePointsSPDF = SpatialPointsDataFrame(LakePoints[[,3:4]],data.frame(LakePoints[[,1:4]]))
write.pointShape(coordinates(LakePointsSPDF),data.frame(LakePointsSPDF),"LakePointsShapeRev")
#
# End of Example...
#`</code>`
## Match the map projections of two or more spatial data sets
More often than not, two (or more) spatial data sets of interest to the analyst will
be projected into different coordinate reference systems, which are implemented within geospatial data sets by specifying the data set's [[http://earth.google.com/support/bin/static.py?page=guide.cs&guide=22373&topic=23750&answer=148111|Map Projection]]. When this is the case, in order to display or analyze the group of data sets (data layers)
as a collection, all of the data sets must be transformed into the same map projection. The R Spatial classes
make this easy to do providing the analyst knows 1) each data layer's map projection, 2) which of the data sets' map projection will be used as the reference projection.
Solution:
The following example uses two spatial data sets: a satellite image, and a set of point locations from the area within the image. The map projections for both data sets are known; this information is contained within each files spatial metadata. Download this example, including sample data sets, [[http://www.nceas.ucsb.edu/files/scicomp/Dloads/ExamplePrograms/MatchSpatialDataProjections.zip|Here]].
`<code rsplus>`
MatchSpatialDataProjections <- function()
{
require(rgdal)
inPoints <- readOGR(".","SamplePointsGeogProj")
inGridAlbers <- readGDAL("SatImageAlbersEqualArea.tif")
#
# Assume that both data sets have valid projection strings in their spatial metadata.
# Given this, re-project the point file into the projection space of the satellite image.
#
projStringAlbers <- proj4string(inGridAlbers)
inPointsAlbers <- spTransform(inPoints, CRS=CRS(projStringAlbers))
pointsWithTempsAE <- overlay(inGridAlbers,inPointsAlbers)
#
# Create a custom color map after removing NAs from image
#
NAs <- (is.na(inGridAlbers@data))
inGridAlbers@data[[NAs]] <- 0
nonZero `<- (inGridAlbers@data>`0)
minMax <- range(inGridAlbers@data[[nonZero]])
nColors <- diff(minMax)
colorMap <- topo.colors(nColors+1)
breakPts <- c(0,minMax[[1]]:minMax[[2]])
# Display the Albers image as a 'pseudo-color' image,
# then overlay the point data set.
image(inGridAlbers, col=colorMap, breaks=breakPts)
plot(pointsWithTempsAE,add=TRUE,pch=20,col="red")
}
#
# End of Example...
#`</code>`
## Read a PostGIS layer into R
Solution:
The following example assumes you already have a local [[PostGIS]] database named mygeodb, containing a spatial table named mylayer.
`<code rsplus>`
library(rgdal)
mydf `<- readOGR("PG:dbname=mygeodb", "mylayer")</code>`
The requested table will then be represented as an sp object in R. For example, if mylayer is a point layer, then mydf will be a SpatialPointsDataFrame.
See another example [[http://wiki.intamap.org/index.php/PostGIS|here.]]
## Read HDF4 images into the R programming environment
Solution:
I use the Geographic Data Abstraction Library (GDAL), an open source software
package (available at: http://www.gdal.org/.) to extract the individual
image bands from the .hdf file. Then, I use the GDALinfo() and readGDAL()
methods from the R rgdal library to evaluate and read the image into an R
object, and the image() command to display the image.
From the (unix) command line:
`<code rsplus>`
% gdalinfo myimage.hdf - displays information on the image contents
% gdal_translate -help
% gdal_translate -ot Float32 -of JPEG -b 1 myimage.hdf band1.jpg # READS the first band of the hdf image
# into the file 'band1.jpg'. -b 2 reads second band, etc....`</code>`
Now, into the R environment....
`<code rsplus>`
%R
> library(rgdal)
> GDALinfo("band1.jpg") # displays information on the image contents
> image1 = readGDAL("band1.jpg") # read the image into R data SpatialGridDataFrame object
> class(image1) # confirm this
> image(image1) # display the image `</code>`
Once the image is stored in SpatialGridDataFrame format, it is available to all R analysis and display functions.
## Append a second set of attributes to a spatial data file's attribute table
The R Spatial(Point/Line/Polygon/Grid) data types store the location coordinates and descriptive attributes
(e.g., area, altitude, land use type) of the file's individual spatial objects in separate components; Descriptive attributes
are kept in the @data component. Among many other advantages, this makes it easy to append collections of NEW descriptive
attributes (for example, results of a recent data collection effort on a study area) to the existing Spatial data object.
And, as is typically the case with R, this can be performed in several ways, depending upon the analyst's specific needs.
Here are two R scripts that use the merge() command to append a new attribute collection to the @data component of a
SpatialPolygonDataFrame object which contains an ESRI Polygon Shape File read from disk.
Note: this particular approach assumes that there are no duplicate values in either of the join columns (HUC and HUC8).
Solution:
The first example appends a new attribute set read from a disk file, onto the Spatial data file's @data component,
and writes the modified attribute set/@data component to the Shape File's attribute component file (the .dbf file):
`<code rsplus>`
#
# foreign library supplies the read.dbf() command
# used to read the Shape File attribute component (.dbf) file
#
library(foreign)
hucs <- read.dbf("HUC8.dbf")
fourier <- read.csv("fourierStats8.csv", stringsAsFactors=FALSE)
#
# important: include in the merge command 'sort=FALSE', to prevent
# the data frame rows from being re-ordered on the merge (by) column
#
hucs.new <- merge(hucs, fourier, by.x="HUC", by.y="HUC8", all.x=TRUE, sort=FALSE)
#
# write the modified attribute set to disk, resulting in a modified ESRI Shape File
#
write.dbf(hucs.new, "HUC8.dbf")`</code>`
The second R script also appends a new attribute set onto an existing Spatial data file's @data component;
however, this script creates a NEW copy of the entire ESRI Shape File, leaving the original Shape File
components unmodified. This example also resolves the circumstance that the new set of statistical attributes
e.g., rows in the second input ".csv" file) does NOT contain a data row for each row (spatial object) in the
polygon Shape File: attribute columns for these "No Data" spatial object rows are set to a sentinal value (-9999).
`<code rsplus>`
#
# R script merges statistical attribute table with existing shape file attribute table,
# then writes out a new shape file containing the merged attribute table.
# Note: The statistical table does NOT contain a row for all Shape File attribute table
# rows. Structure the merge in such a way that resulting table does have one row
# per input (shape file) attribute table row, with the (missing) statistical columns
# set to NA.
# Rick Reeves, NCEAS 7/29/2010
#
require(maptools)
HucPolys = readShapeSpatial("HUC8")
FourierStats = read.csv("fourierStats8.csv")
#
# join the fourier stats table onto the attribute table component of shapefile
# accomodate different join column names (by.x/by.y), and assure that
# resulting table retains one row for each feature in shape file, even
# for features with no corresponding stats attributes (all.x=TRUE).
# The stats attribute columns for these features will be set to NA
#
MergeHucAttrs = merge(HucPolys@data,FourierStats,by.x="HUC",by.y="HUC8",all.x=TRUE,sort=FALSE)
#
# Optional step: if desired, assign to the NA columns a sentinal value -9999 before
# replacing the shape file attribute table. Reason: to explicitly mark columns with
# no data, preventing them from being interpreted as zero values by other software.
#
naflag = is.na(MergeHucAttrs)
MergeHucAttrs[[naflag]] = -9999
#
# write an updated shape file, with the new, merged table
#
HucPolys@data = MergeHucAttrs
writeSpatialShape(HucPolys,"HUC8_WithFourierStats")`</code>`
## Measure distances between longitude-latitude coordinates
Solution:
Use the spDistsN1() function from the sp package. In order to get "Earth" distance (in kilometers) rather than simple Euclidean distance, you must include the ''longlat=TRUE'' argument. This calculation is based on the standard WGS84 ellipsoid model of the shape of the Earth. Also be sure that your coordinate pairs are given in longitude-latitude order, not the reverse!
`<code rsplus>`
library(sp)
# Specify a point of interest as a long-lat pair
myPoint <- c(-119.84, 34.43)
# Create a matrix with three other points (one long-lat pair per row)
otherPoints <- matrix(c(-120.12, 34.41, -120.15, 34.43, -120.21, 34.39), nrow=3, byrow=TRUE)
# Calculate distance between myPoint and all otherPoints
spDistsN1(as.matrix(otherPoints), myPoint, longlat=TRUE)`</code>`
Extra tip:
To generate a matrix containing distances between all pairs of coordinates in a matrix, use the apply function:
`<code rsplus>`
someCoords <- data.frame(long=c(-120.12, -120.15, -120.21), lat=c(34.41, 34.43, 34.39))
apply(someCoords, 1, function(eachPoint) spDistsN1(as.matrix(someCoords), eachPoint, longlat=TRUE))`</code>`
## Merge two ESRI Shape Files of the same spatial type and attribute list
Sometimes your project data will contain two spatial data files with the same internal format,
but representing different locations or times. For example, a scientist might measure ten biological
attributes at 5 randomly selected locations within three different study areas. The measurements
at each location are represented by a single point with 10 associated attributes, resulting in
one spatial data file (stored as ESRI Point Shape Files) for each study area, each containing five
points (rows). Suppose that the scientist wishes to combine the three separate files into a single file
containing all measurements from the three study areas.
Solution:
Within R, this is done by: 1) Reading the three Shape Files into standard R 'Spatial Data Frame' data objects,
2) combining the components of the separate Spatial Data Frame into a single Spatial Data Frame,
3) Writing the merged Spatial Data Frame as a new Shape File.
Here is sample R code that merges separate ESRI Shape Files with the same internal format into a single
Shape file. The technique is demonstrated separately for both Point and Polygon spatial data types.
`<code rsplus>`
#
# load the required R library: maptools (which in turn loads the 'sp' library as a dependency)
#
library(maptools)
#
# Example 1: Merging two Point Shape files
#
InPoints1 = readShapePoints("Part1Dams.shp")
InPoints2 = readShapePoints("Part2Dams.shp")
#
# Data attributes: standard table, with one row per spatial object
#
# Combine the two data components by stacking them row-wise.
#
mergeData = rbind(InPoints1@data,InPoints2@data)
#
# Combine the spatial components of the two files
# (easy, in the case of Spatial Points, since the
# spatial component is 2-dim matrix of lat/long point coordinates)
#
mergePoints = rbind(InPoints1@coords,InPoints2@coords)
#
# Promote the combined list into a SpatialPoints object
#
mergePointsSP = SpatialPoints(mergePoints)
#
# Finally, create the SpatialPolygonsDataFrame
#
newSPDF = SpatialPointsDataFrame(mergePointsSP,data = mergeData)
#
# Write the shape file; supply the components of the SpatialPointsDataFrame to the routine
#
write.pointShape(coordinates=coordinates(newSPDF),df = as(newSPDF,"data.frame"),"TestMrgPtsFile2")
#
# Example 2: Merging two Polygon Shape files
#
# Merge two polygon shape files containing different collections of watershed regions.
# The two files were extracted from a file containing watersheds for the continental USA.
# readShapePoly() reads the polygon shape file into a SpatialPolygonsDataFrame object.
#
InPolys1 = readShapePoly("subbasin4.shp")
InPolys2 = readShapePoly("subbasin5.shp")
#
# The SpatialPolygonsDataFrame contains two main components:
#
# Polygon: A list in which each element is a set of parameters defining one
# spatial object (in this case, a single polygon)
# Data: A two-dimensional attribute table with one row for each spatial feature
#
# Combine the two data components from each file in two steps.
#
# First, 'stack' the attribute list rows using rbind()
#
# Note: This method only works if the two Shape Files have the same spatial data type
# and the IDENTICAL (in type, format, number) attribute table (Data component). If this is
# not the case, you will need to create a NEW 'merged' attribute table from the Data components
# of each input file.
#
mergeData = rbind(InPolys1@data,InPolys2@data)
#
# Next, combine the two polygon lists into a single list using c()
#
mergePolys = c(InPolys1@polygons,InPolys2@polygons)
#
# Next, generate a new polygon ID for the new SpatialPolygonDataFrame object,
#
offset = length(mergePolys)
browser()
for (i in 1: offset)
{
sNew = as.character(i)
mergePolys[[i]]@ID = sNew
}
#
# Create an identical ID field and append it to the merged Data component
#
ID = c(as.character(1:length(mergePolys)))
mergeDataWithID = cbind(ID,mergeData)
#
# Promote the merged list to a SpatialPolygons data object
#
mergePolysSP = SpatialPolygons(mergePolys,proj4string=CRS(proj4string(InPolys2)))
#
# Combine the merged Data and Polygon components into a new SpatialPolygonsDataFrame.
#
mySPDF = SpatialPolygonsDataFrame(mergePolysSP,data = mergeDataWithID,match.ID = FALSE)
#
# Finally, write the new Polygon Shape File
#
writePolyShape(mySPDF,"MergedSubbasins")
#`</code>`
## Read GPS Tracks and Waypoints from a GPX-format file using R
Suppose that you wish to import location data collected with a GPS
unit into an R program so that you can display/analyze the data.
This example demonstrates how to read track and waypoint data colleted with a GPS device.
The input data are stored in the widely used [http://www.topografix.com/gpx.asp
GPX GPS Exchange format], which is one of the many formats accessible to the readGPS() function.
This demonstration function separately reads GPX-format Track and Waypoint sets,
and then writes a subset of each input file's data to comma-separated variable files.
Solution:
In this Wiki example, we present only the R script; for a more complete description of the process, including
example plots of the input and output data sets, please see the NCEAS Scientific Programming Web site
[[http://www.nceas.ucsb.edu/scicomp/usecases/ReadAndDisplayGPSData|Read and Display Data from GPS Devices Using R]].
`<code rsplus>`
################################################################################
#
# Demonstration: Reading and converting GPS datasets
# using the R readGPS function (maptools package)
#
# Abstract: This program reads GPX-format files containing, respectively,
# waypoints and tracks collected with a GPS reciever, into
# an R Data Frame. Then, it extracts selected columns from
# each incoming data frame (date/time/latitude/longitude/altitude)
# into a new and separate data frame. The new data frame
# is then written to a comma-separated-value (CSV) file.
#
# Functions: TheDriver() - Manages execution of demonstration function
# ConvertGPXFiles() - Reads input .GPX- format file, writes
# a subset to an output .CSV file.
#
# Note: The R debug() and browser() statements are incorporated to allow the
# analysis to 'single-step' through the example:
# debug() : Sets up single-line execution of the function.
# browser(): Interrupts execution of function, allows inspection
# of R variable values.
#
# Author: Rick Reeves, NCEAS Scientific Programmer
#
# Copyright 2008 National Center for Ecological Anslysis and Synthesis
# All Rights Reserved
# Intended for use by the scientific community
################################################################################
TheDriver <-function()
{
library(maptools)
#
# debug(): single-step execution begins at first-line of function ConvertGPXFiles))
#
debug(ConvertGPXFiles)
setwd("C:/Projects/Condit/GPSData2008")
#
# convert a file containing Waypoints
#
browser()
ConvertGPXFiles("SampleWaypointsOnly.gpx","SampleWaypointsOut.csv","w")
#
# convert a file containing Tracks
#
browser()
print(sprintf("Convert tracks file..."))
ConvertGPXFiles("SampleTracksOnly.gpx","SampleTracksOut.csv","t")
#
print(sprintf("done with program"))
browser()
}
#
# Routine ConvertGPXFiles demonstrates 1) extraction of GPS data
# stored in the standard GPX format into an R data frame,
# using the R readGPS function; 2) extraction of key fields
# (Date/Time, Latitude/Longitude, Elevation) from the initial
# data frame into a new data frame.
#
# function arguments:
#
# inGPSFile (string): Input GPX file name
# outConvertFile (string): Output CSV file name
# FileType (string): Flag indicates input
# file data type: "w" for waypoints,
# "t" for tracks.
#
ConvertGPXFiles <- function(inGPSFile,outConvertFile,FileType)
{
browser()
if (FileType == "w")
{
#
# Read the GPX-format waypoints into a Data Frame
# Note here: readGPS converts waypoints and tracks into
# data frames with different column layouts.
# Columns are labeled V1 - Vn)
#
gRawWaypt = readGPS("gpx",inGPSFile,"w")
#
# Get number of observations (waypoints)
# gRawWaypoint data frame contains the attributes that
# we desire in the following columns:
#
# V3: Observation Date (factor)
# V4: Observation Time (factor)
# V10: Latitude (numeric)
# V11: Longitude (numeric)
# V21: Elevation (Factor, includes 'M' in last character position)
#
# Let's extract these columns, convert Date, Time to strings
# and elevation to numeric format, and construct a new data frame
# containing only the columns of interest.
#
nObs = length(gRawWaypt[[,"V3"]])
sDate = as.character(gRawWaypt[[,"V3"]])
sTime = as.character(gRawWaypt[[,"V4"]])
fLat = gRawWaypt[[,"V10"]]
fLong = gRawWaypt[[,"V11"]]
#
# Elevation is a factor with the letter 'M' appended.
# Remove this, and convert the elevation to numeric
# This formula extracted from the HTML help file for the R readGPS package
#
fAlt <- as.numeric(substring(as.character(gRawWaypt$V21), 1,(nchar(as.character(gRawWaypt$V21))-1)))
#
# Assemble the output data frame - Assign to global variable dfWaypoints
#
dfWaypoints <<- as.data.frame(cbind(sDate,sTime,fLat,fLong,fAlt))
write.csv(dfWaypoints,outConvertFile)
print(sprintf("done - waypoints"))
}
else if (FileType == "t")
{
gRawTracks = readGPS("gpx",inGPSFile,"t")
#
# The data frame generated by readGPS for Tracks has a different layout:
#
# V3: Latitude (numeric)
# V4: Longitude (numeric)
# V14: Elevation (Factor, includes 'M' in last character position)
# `<Note: Date not yet returned for Tracks by readGPS>`
#
# Let's extract these columns, convert elevation to numeric format,
# and construct a new data frame containing only the columns of interest.
#
fLat = gRawTracks[[,"V3"]]
fLong = gRawTracks[[,"V4"]]
#
# Elevation is a factor with the letter 'M' appended.
# Remove this, and convert the elevation to numeric
# This formula extracted from the HTML help file for the R readGPS package
#
fAlt <- as.numeric(substring(as.character(gRawTracks$V14), 1,(nchar(as.character(gRawTracks$V14))-1)))
#
# Assemble the output data frame - - Assign to global variable dfTracks
#
dfTracks <<- as.data.frame(cbind(fLat,fLong,fAlt))
write.csv(dfTracks,outConvertFile)
print(sprintf("done - tracks"))
}
}`</code>`
## Sample a Vector Polygon Map using a Point Grid Overlay
Imagine that you have a map,consisting of irregularly-shaped polygons, that represents a particular
geographic attribute. Suppose that you wish to: 1) extract sample values for this attribute at regular
intervals across the map, and then 2) write the sampling results to a standard-format spatial data
file. The R sp package includes methods that 1) define a spatial point grid for a region, 2) overlay the
grid on the polygon map, 3) extract values for one or more attributes assigned to the polygons at each
point in the grid, 4) write the point grid to an ESRI Shape File.
Solution:
The following code performs this operation on a vector polygon map depicting [[http://www.nationalatlas.gov/mld/ecoregp.html|Bailey's Ecoregions.]] In this
Wiki example, we present only the R script; for a more complete description of the process, including
example plots of the input and output data sets, please see the NCEAS Scientific Programming Web site
[[http://nceas.ucsb.edu/scicomp/GISSeminar/UseCases/SampleVectorPolygonsRasterGrid/SampleVectorPolysRastGrid.html|Sampling Vector Polygon using Point Grid Overlay ]]
`<code rsplus>`
TheDriver <- function()
{
library(sp)
library(maptools)
library(PBSmapping)
#
# set the working directory for your local machine...
#
setwd("e:/Projects/UseCase/SampleVectorPolygonsRasterGrid")
debug(BaileysTest)
BaileysTest()
print(sprintf("End of example!\n"))
}
#
# R script demonstrates overlay of regularly-spaced point grid
# at specified spatial resolution over a polygon-based data layer,
# AND the sampling of the polygon layer at each point location.
# Uses the R sp() package methods to perform the sampling, and
# some methods from PBSMapping to plot the data.
#
BaileysTest <- function()
{
browser()
#
# Test dataset: Continental US states with 50 attached demographic attributes
# This produces a SpatialPolygonsDataFrame.
#
EcoRegion = readShapePoly("BaileysEcoRegionsConusGP")
#
NewData = EcoRegion@data[[,3:5]]
EcoRegionTrunc = EcoRegion
EcoRegionTrunc@data = NewData
#
# Calculate the sampling grid parameters, get the boundaries from the new SPDF's
# bounding box element. Make the grid a bit bigger, span the entire shape file.
# Then, create a GridTopology object containing the parameters.
#
vals = EcoRegionTrunc@bbox
deltaLong = as.integer((vals[[1,2]] - vals[[1,1]]) + 1.5)
deltaLat = as.integer((vals[[2,2]] - vals[[2,1]]) + 1.5)
#
# grid resolution in fractional degress.
#
gridRes = 1.0
#
gridSizeX = deltaLong / gridRes
gridSizeY = deltaLat / gridRes
#
# GridTopology object forms the basis for the sampling grid
#
GridTopoOneDeg = GridTopology(vals[[,1]],c(gridRes,gridRes), c(gridSizeX,gridSizeY))
#
# Create the SpatialGrid object....
#
SamplingGridOneDeg = SpatialGrid(GridTopoOneDeg)
#
# for educational purposes, we could compare the two bounding boxes....
#
# EcoRegionTrunc@bbox
# SamplingGrid@bbox
#
# We get a more suitable output from the overlay() method
# if we use a SpatialPoint object as an input, so promote
# the SpatialGrid object to one.
#
SamplingPointsOneDeg = as(SamplingGridOneDeg,"SpatialPoints")
NumPoints = length(SamplingPointsOneDeg@coords[[,1]])
OneDegEvents <- as.EventData(data.frame(EID=1:NumPoints,X=SamplingPointsOneDeg@coords[[,1]],
Y=SamplingPointsOneDeg@coords[[,2]]),projection = "NA")
#
# Have a look at the sampling grid: plot polygons and then overplot the points
#
EcoRegionPS = (SpatialPolygons2PolySet(EcoRegionTrunc))
plotPolys(EcoRegionPS)
addPoints(OneDegEvents,col="red",pch=20)
#
# Generate and plot the two-degree sampling grid
#
# grid resolution in fractional degress.
#
gridRes = 2.0
#
gridSizeX = deltaLong / gridRes
gridSizeY = deltaLat / gridRes
#
# GridTopology object forms the basis for the sampling grid
#
GridTopoTwoDeg = GridTopology(vals[[,1]],c(gridRes,gridRes), c(gridSizeX,gridSizeY))
#
# Create the SpatialGrid object....
#
SamplingGridTwoDeg = SpatialGrid(GridTopoTwoDeg)
SamplingPointsTwoDeg = as(SamplingGridTwoDeg,"SpatialPoints")
#
# Define a new graphics device, generate the plot
#
dev.set(1)
plotPolys(EcoRegionPS, xlab = "W Longitude", ylab = "N Latitude")
NumPoints = length(SamplingPointsTwoDeg@coords[[,1]])
TwoDegEvents <- as.EventData(data.frame(EID=1:NumPoints,X=SamplingPointsTwoDeg@coords[[,1]],
Y=SamplingPointsTwoDeg@coords[[,2]]),projection = "NA")
#
# plot the two-degree grid by itself
#
dev.set(1)
plotPoints(TwoDegEvents, col = "green", pch = 20)
#
# plot points against map background
#
dev.set(1)
plotPolys(EcoRegionPS, xlab = "W Longitude", ylab = "N Latitude")
addPoints(TwoDegEvents,col="green",pch = 20)
#
# We promoted the SpatialGrid to SpatialPoints to get the proper case for 'overlay':
# The sampling output type depends on the order of the parameters: see 'overlay-methods' sp help page
#
# SamplingResultsDF is a data frame with one entry per sampling point (with sampled attribute values attached)
# ol2 is a vector of integers, each entry is the polygon ID beneath one sampling point.
#
# Use the overlay() method to extract a polygon attribute value
# at each point location within the polygon data set boundary
#
SamplingResultsDFOneDeg = overlay(EcoRegionTrunc,SamplingPointsOneDeg)
SamplingResultsDFTwoDeg = overlay(EcoRegionTrunc,SamplingPointsTwoDeg)
#
# Lets create a point shape file to contain the
# sampling results. The attribute table will include
# all three Bailey's measures along with the corresponding
# point location.
# Make some modifications to the overlay results data frame
# so that the shape file has usable attributes.
#
# First, add the sampling point coordinates
#
Temp1 = cbind(SamplingPointsOneDeg@coords,SamplingResultsDFOneDeg)
Temp2 = cbind(SamplingPointsTwoDeg@coords,SamplingResultsDFTwoDeg)
#
# Next, convert the ECOCODE attribute from a factor to a character string
#
ecoCode = as.character(Temp1$ECOCODE)
flags = is.na(ecoCode)
ecoCode[[flags]] = "NoMatch"
OutputDataFrame = cbind(Temp1[[1:4]],ecoCode)
#
# Finally, generate a point shape file that includes the EcoRegion attribute.
#
write.pointShape(SamplingPointsOneDeg@coords,OutputDataFrame,"BaileyEcoRegionGridOneDegJuly21")
#
# For the Two-Degree grid, ALSO create a 'clipped' dataset with ONLY points inside USA.
#
ecoCode = as.character( Temp2$ECOCODE)
flags = is.na(ecoCode)
ecoCode[[flags]] = "NoMatch"
OutputDataFrame = cbind(Temp2[[1:4]],ecoCode)
write.pointShape(SamplingPointsTwoDeg@coords,OutputDataFrame,"BaileyEcoRegionGridTwoDegJuly21")
#
# Create subset of the point grid consisting only of those within the USA boundaries
#
TwoDegEventsWithCodes = cbind(TwoDegEvents,ecoCode)
PointsInsideRegions = findPolys(TwoDegEventsWithCodes,EcoRegionPS)
PointIndexesInsideRegions = sort(unique(PointsInsideRegions$EID))
PointsInsideRegions = TwoDegEventsWithCodes[[PointIndexesInsideRegions,]]
#
# Add them to the plot
#
addPoints(PointsInsideRegions,col="green",pch = 20)
#
# Finally, Write the clipped point shapefile, read it back in, and and plot it.
#
write.pointShape(cbind(PointsInsideRegions$X,PointsInsideRegions$Y),PointsInsideRegions,"BaileyEcoRegionGridTwoDegClip")
#
print("hit key to process The FINAL plot..")
browser()
TwoDegInUSA = readShapePoints("BaileyEcoRegionGridTwoDegClip")
#
# look at the structure of the point data object
# before and after conversion to Events
#
str(TwoDegInUSA)
NumPoints = length(TwoDegInUSA@data[[,1]])
PointEvents <- as.EventData(data.frame(EID=1:NumPoints ,X=TwoDegInUSA@coords[[,1]],
Y=TwoDegInUSA@coords[[,2]]))# ,projection = "NA")
str(PointEvents)
#
plotPolys(EcoRegionPS, xlab = "W Longitude", ylab = "N Latitude")
addPoints(PointEvents,col="orange",pch = 20)
}`</code>`
# More R-Spatial resources on the web
## Available tools
* [[http://cran.r-project.org/src/contrib/Views/Spatial.html|CRAN Spatial task view]]
* [[http://www.sal.uiuc.edu/tools/tools-sum/rgeo/r-spatial-projects|R spatial projects page]]
* [[http://r-spatial.sourceforge.net/|R-spatial at Sourceforge]]
## Online tutorial materials
* [[http://www.geog.uu.nl/~pebesma/wun|WUN Global GIS Academy (Dec 2006)]]
* [[http://www.bias-project.org.uk/ASDARcourse|Imperial College London (Aug 2007)]]