2 Vector: Hello, world!

This chapter will introduce you to some of the key concepts surrounding the use of vector spatial data in R. You will learn the basics of how to import and manipulate vector data.

2.1 Importing vector data

2.1.1 Spatial data formats

The Geospatial Data Abstraction Library (GDAL) is a translator library for raster and vector geospatial data formats. It underpins most GIS software (both proprietary and open-source). We will start by loading the R package of this library and check all the vector data formats that can be imported into R.

##         name                     long_name write  copy isVector
## 1 AeronavFAA                   Aeronav FAA FALSE FALSE     TRUE
## 2 AmigoCloud                    AmigoCloud  TRUE FALSE     TRUE
## 3     ARCGEN             Arc/Info Generate FALSE FALSE     TRUE
## 4     AVCBin      Arc/Info Binary Coverage FALSE FALSE     TRUE
## 5     AVCE00 Arc/Info E00 (ASCII) Coverage FALSE FALSE     TRUE
## 6        BNA                     Atlas BNA  TRUE FALSE     TRUE

Here, vector_formats is a data frame containing names and information of various formats supported by GDAL. You can explore vector_formats. How many different formats are available? Are you familiar with any of them?

2.1.2 Import vector data: Shapefile

Shapefile is a popular format used in GIS. Typically shapefiles consist of four (and sometimes more) files. These have the following extensions: .shp, .shx, .dbf, and .prj. You are likely to encounter shapefiles a lot, especially if working with 3rd party data (e.g. point location data from an NGO).

We will import a polygon layer of the world countries extracted from Natural Earth Data. This data should be available in your data folder:

## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Git\spatial_data_analysis\data\ne\ne_110m_admin_0_countries", layer: "ne_110m_admin_0_countries"
## with 177 features
## It has 94 fields
## Integer64 fields read as strings:  POP_EST NE_ID

OGR is the vector part of the GDAL library. For future reference, any functions with OGR in its name will mostly likely be used to read, write or manipulate vector data.

It is also worth noting that the syntax used within readOGR can vary depending on the vector format being read. If you have trouble reading in other types vector data, you will likely be able to find examples online.

Lets make our first plot using base R graphics:

We can check the class and spatial attributes of this layer:

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"

We can see that this data is a SpatialPolygonsDataFrame. Within this object, there are multiple pieces of information stored in slots. Slots can be accesed with the @ symbol. Lets see what is contained within the slots of countries:

2.1.3 Import vector data: Geopackage

Despite the popularity of Shapefiles, we (well, mainly James) strongly encourage you to adopt the Geopackage format. One huge benefit of the .gpkg file type is that all data is stored in a single file, as opposed to the 4+ found in shapefiles.

We will import a modified version of the “Populated places of the world” dataset, also from from Natural Earth Data:

## OGR data source with driver: GPKG 
## Source: "C:\Git\spatial_data_analysis\data\ne\ne_110m_populated_places_simple.gpkg", layer: "places"
## with 243 features
## It has 38 fields
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

As you can see, the class of this data is a SpatialPointsDataFrame. It has similar properties to the SpatialPolygonsDataFrame seen with the countries object, except, that each feature is represented by a single point, rather than group of points making a polygon.

The plot of points is not very informative on its own. Let’s combine with the world map:

This plot demonstrates the key role that layering plays in GIS and spatial data visualisation. Some maps will contain many layers of information stacked on top of one another, each representing a different feature or set of features.

2.2 Coordinate Reference Systems (CRS)

You can check the CRS of your spatial object. This is stored as a character string:

## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

Sometimes, you may wish to reproject your data, either for visualisation or analysis purposes. In this case, we transform to the Mollweide projection. This is an accurate single global projection that preserves geographic area. You can see an example of application in March et al. 2019:

2.3 Geographic subset

Subsetting data is a useful way to select data only relevant to you. This can be done spatially, i.e. selecting data that falls within an area of interest, or by attributes. The latter utilises the attirubte table to filter the data, returning spatial objects that match the desired profile.

2.3.1 Geographic subset by setting a bounding box

First, create a bounding area for subsetting the data:

## [1] "Extent"
## attr(,"package")
## [1] "raster"

Then, you can use the extent object to subset the world map using the crop function (from the raster package, but also works on vector data):

Task
  • Make a plot of South America
  • Tip: you can use QGIS or Google Earth to search for coordinates to use in your extent object.

2.3.2 Geographic subset by data attributes

As we discovered earlier, the linked attribute table is stored in the data slot. In this case, the table contains specific information for each country. You can explore convert the data object from countries into a data.frame and then explore the attribute data more easily:

List of continents:

## [1] Oceania                 Africa                  North America          
## [4] Asia                    South America           Europe                 
## [7] Seven seas (open ocean) Antarctica             
## 8 Levels: Africa Antarctica Asia Europe North America ... South America

Subset countries from South America:

## class      : Extent 
## xmin       : -81.41094 
## xmax       : -34.72999 
## ymin       : -55.61183 
## ymax       : 12.4373

2.4 Spatial overlap

Determining the overlap between spatial layers in order to extract information from one to another is one of the most common tasks in GIS.

Here, we want to plot the populated places, but only in Africa. However, there is no information about the continent in the places layer We will use a spatial overlay to extract information from the countries layer using the over function:

## [1] "data.frame"

This operation returns a data.frame containing the same number of features as places, but with the spatially matching attributes from countries. We then can append the continent information to the attribute table of the places point layer:

Note that there are features with NA in the CONTINENT attribute. This is due to a spatial missmatch between layers in terms of resolution, more specifically because of the coarse resolution of the countries layer. There are several alternatives: 1) use a high-res countries map, 2) calculate the nearest polygon.

2.5 Interactive maps

Spatial datasets can sometimes require an interactive visualization of the data. In R, we can use the leaflet library to generate a dynamic viewer:

You can customize your map using different basemaps, add more spatial layers and much more. You can check the official package website for many examples.