
Shapefiles plot together in a GIS, but not in R



I tried plotting these two files and a set of points (dropbox folder) in QGIS and MapInfo. They align pretty well (though not perfectly).

But in R, only the data from www.naturalearthdata.com will plot together with other things like the points. My 'own' shapefiles plot nicely, but always alone (they neither draw on top of other layers nor allow points on top of them), and…

My own shapefiles are rotated (clipped from a larger file whose center has north up), and I can't figure out how to save the applied rotation in MapInfo (I've tried exporting numerous ways). Maybe this is causing my issue?

    library(rgdal)
    library(raster)

    # Loading "my" shape files:
    shp.WashingtonLand <- readOGR(".", "WLmap2_polyline")
    plot(shp.WashingtonLand)

    # Loading naturalearthdata.com data for only Washington Land
    shp.greenland <- readOGR(".", "GRL_adm2")
    ext.washington.land <- extent(-68, -58, 79.9, 81.2)
    shp.clip.WashingtonLand <- crop(shp.greenland, ext.washington.land)
    plot(shp.clip.WashingtonLand)

    # Data points
    SamplePoints <- read.csv2("SamplenumberAndCoordinates", header = TRUE)
    locs <- subset(SamplePoints, select = c("Latitude", "Longitude"))
    coordinates(locs) <- c("Longitude", "Latitude")
    plot(locs, col = "red", add = TRUE)

I hope this makes sense; it took a while to get all the things together and simplified appropriately.


For geoprocessing, I suggest turning on-the-fly reprojection in QGIS OFF to see whether your shapes align or not. In many cases, geoprocessing does not work when the shapes are in different CRSs. So save your polygon layer as WGS84 EPSG:4326 (do NOT use "Set CRS for Layer" for that!) to match the coordinates of the points layer.

The NaturalEarth dataset does not perfectly align for me either. Using QGIS, you can switch the project CRS to EPSG:3857 and add Google or OpenStreetMap imagery from the OpenLayers plugin as a reference background:

You can see that the points and polygons fit the background, but the green GADM layer does not. I don't know how they digitized that. If you just need the coastline, you can take it from OpenStreetMap:

http://openstreetmapdata.com/data/coastlines

It is the same source that is used for the Openstreetmap tiles.


The WLmap2_polyline data is using WGS 1984 UTM zone 24N, but the data is actually in zone 20N. Denmark uses a wide-area implementation of transverse Mercator for Greenland. I checked against ArcGIS, and our "complex math" version of TM doesn't improve the offset. Alternatively, the data was projected into UTM 24N using a more standard UTM implementation, and that's causing the offset.

The administrative boundary data is in WGS84, as is the csv (actually semicolon-separated values, since the lat/lon values use commas for the decimal point), so if R doesn't reproject the data automatically, you'll have to do that yourself.
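
In R, a minimal sketch of that reprojection might look like the following (not the asker's original code; it assumes the objects from the question above and that the polyline layer carries its UTM CRS from the .prj file):

    library(sp)
    library(rgdal)
    # give the points an explicit WGS84 CRS, then bring the UTM polylines
    # into the same CRS so all layers can be plotted together
    proj4string(locs) <- CRS("+proj=longlat +datum=WGS84")
    shp.WashingtonLand.wgs84 <- spTransform(shp.WashingtonLand,
                                            CRS("+proj=longlat +datum=WGS84"))
    plot(shp.WashingtonLand.wgs84)
    plot(locs, col = "red", add = TRUE)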


Data Inventory & Geographic Information Systems

North Carolina's archaeological resources represent over 12,000 years of culture and history. Today these resources are becoming increasingly rare as archaeological sites are lost to construction and urban expansion. Even worse, important archaeological sites are threatened by vandalism. Each year hundreds of sites in North Carolina (and thousands over the United States) are damaged or destroyed by unscrupulous collectors who dig for artifacts to sell or to add to their own collections. These activities destroy historic and scientific resources.

It is important that amateur archaeologists, who enjoy collecting Native American artifacts, understand the fragile nature of archaeological sites and practice proper techniques when investigating them. First and foremost, the collector must understand the difference between collecting artifacts from the ground surface and digging into a site. Digging an archaeological site without the supervision of a trained professional destroys most of the information that archaeologists need to interpret a site and should never be attempted. On the other hand, responsible amateur archaeologists can engage in surface collecting of sites and contribute to the knowledge of the prehistory of our state.

Help save our archaeological heritage by accomplishing the following, and together, we can save the past for the future.

Site Record Inventory

The OSA maintains a statewide, computer-based inventory of archaeological sites along with maps, photographs, artifact collections and other data sources that support the inventory. Comprehensive libraries of archaeological reports and publications are also housed at OSA facilities. Access to the site record inventory is limited to in-person visits. To conduct research or view site records please visit our site file search guidelines.

OSA is currently undertaking the laborious process of creating a GIS database of North Carolina's archaeological sites and systematically surveyed areas. This digitization effort has enabled staff to record sites and conduct environmental review within GIS. OSA now accepts recorded sites and surveyed areas in either shapefile or geodatabase formats. Consultants and researchers who visit our Raleigh office have access to the GIS in its current state; however, at this time, we do not offer web-based access.

Individuals seeking to do background research at an OSA facility must meet, or be under the supervision of an individual who meets, the Secretary of the Interior's Professional Qualification Standards as described in 36 CFR Part 61 (see OSA Standards and Guidelines Part 2b). It is expected that individuals doing background research will have been trained in how to conduct research at the OSA prior to scheduling an appointment.

To protect sites from looting, OSA's data and inventory are protected by state statute, and we reserve the right to restrict access to information when deemed necessary.


Using R — Working with Geospatial Data

GIS: an acronym that brings joy to some and strikes fear in the hearts of those not interested in buying expensive software. Luckily, fight or flight can be saved for another day, because you don't need to be a GIS jock with a wad of cash to work with spatial data and make beautiful plots. A computer and an internet connection should be all you need. In this post I will show how to:

  • Get a machine ready to use R to work with geospatial data
  • Describe what type of data can be used and some of the exciting sources of free GIS data
  • Use data from the Washington Department of Natural Resources to generate some pretty plots by working through an example script

Getting your Machine Ready

First, if you do not have R on your computer, you can download it here. Even better, download RStudio, an incredibly efficient and easy-to-use interface for working with R, available here. Working with RStudio is highly recommended and is what will be outlined in this post. R does not support working with spatial data straight out of the box, so there are a couple of packages that need to be downloaded to get R working with spatial data. The two packages required are 'sp' and 'rgdal'. We will also use a third package, 'rgeos', for some fancy geospatial tricks. Unfortunately, the latest release of the sp package is not compatible with the latest version of R (v 3.0 at this time). When the RStudio install prompts you to install R, download version 2.15.3.

First, install and open RStudio. To add the required packages, open RStudio and click on the "Packages" tab in the lower right-hand panel of the interface. The lower right window will show a list of packages that come with a standard download of RStudio. Some of the packages will have check marks next to them; this means that those libraries are loaded and ready to be used. If you just downloaded R for the first time, sp and rgdal will not be on that list, so click on the "Install Packages" button. Make sure the "Install from" option is set to "Repository (CRAN)" and type "sp" into the "Packages" space. Check the "Install Dependencies" option and download! By checking the "Install Dependencies" option, the packages sp needs to function properly will automatically be downloaded. Download rgdal in the same way and you have the tools needed to start! Download rgeos as well and you can run the portion of the example script that uses centroids.
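
If you prefer the console to the point-and-click route, the equivalent commands are:

    # install the spatial packages and their dependencies, then load them
    install.packages(c("sp", "rgdal", "rgeos"), dependencies = TRUE)
    library(sp)
    library(rgdal)
    library(rgeos)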

sp and rgdal Abilities and Sources of Data

The rgdal package is what allows R to understand the structure of shapefiles by providing functions to read and convert spatial data into easy-to-work-with R dataframes. The sp package enables transformations and projections of the data and provides functions for working with the loaded spatial polygon objects. The takeaway should be that these are powerful libraries that allow R to use .shp files. The US Geological Survey, the National Park Service, and the Washington State Department of Natural Resources are just a few examples of organizations that make enormous stockpiles of spatial data available to the public.

Example Script

The following code uses Watershed Resource Inventory Area (WRIA) spatial data from the Washington State Department of Ecology. This dataset contains information on Washington State's different water resource management areas. Exactly what information is stored in the shapefiles will be explored using R! If a function or any of the code looks mysterious, try "?mysteriousFunctionName" and the handy documentation will fill you in on what the function does. Let's start using R to investigate the data. Just cut and paste the following code into RStudio.

NOTE: Check the Department of Ecology GIS data page if any of the links are unavailable.
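
A minimal sketch of the loading step, assuming the WRIA shapefile has been downloaded and unzipped into the working directory (the layer name "WRIA_poly" is an assumption; use the actual name of the downloaded .shp file without its extension):

    library(sp)
    library(rgdal)
    # read the WRIA polygons from the current directory
    WRIA <- readOGR(dsn = ".", layer = "WRIA_poly")
    class(WRIA)       # what kind of spatial object did we get?
    slotNames(WRIA)   # which 'slots' does it contain?
    plot(WRIA)        # a first, primitive map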

We don’t know yet what the different ‘slots’ contain, but Holy Smokes! That plot looks like Washington. The plot is not pretty (yet), but in just a few lines of code you can already make a primitive map of the polygons in this dataset. The fact that this object is a SpatialPolygonsDataFrame suggests that we can treat it like an R dataframe. Let's find out what data it has, select the interesting columns, and work with only those. (Not surprisingly, the @data slot is where the actual dataframe lives, but many dataframe methods also work on the entire object.)
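
A sketch of that inspection, with illustrative column names (check names(WRIA) for the real ones):

    names(WRIA)       # columns stored in the @data slot
    head(WRIA@data)   # first rows of the attribute table
    # keep only the columns of interest (names here are assumptions)
    WRIA <- WRIA[, c("WRIA_ID", "WRIA_NM", "WRIA_AREA_")]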

We’ll go to the trouble of renaming variables, reprojecting the data to “longlat” and then saving the data to a .RData file for easy loading in the future.
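
A sketch of those steps under the same assumptions (the Puget Sound subset names are placeholders):

    # rename the kept columns, reproject to longlat, and save for later
    names(WRIA@data) <- c("id", "name", "area")
    WRIA <- spTransform(WRIA, CRS("+proj=longlat +datum=WGS84"))
    pugetSoundNames <- c("...")   # fill in the WRIA names of interest
    WRIAPugetSound <- WRIA[WRIA$name %in% pugetSoundNames, ]
    save(WRIA, WRIAPugetSound, file = "WRIA.RData")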

We have now saved both the WRIA and WRIAPugetSound data as R spatial objects that can easily be loaded! Now the real fun begins; those of you who have been waiting for pretty plots are about to be rewarded. The rest of the script is a walk-through of some of the fun analysis and basic figure creation that can easily be done with the converted WRIA data.

Who needs expensive GIS software?

Another useful package for plotting spatial data is the maptools library, available from the CRAN repository. The code below requires the maptools package, so make sure it is installed before running the code.
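
A small sketch of a labeled plot using maptools with the objects saved above (the centroid step assumes the rgeos package mentioned earlier):

    library(maptools)
    library(rgeos)
    load("WRIA.RData")
    plot(WRIA, col = "lightblue", border = "grey40")
    # label the Puget Sound watersheds at their polygon centroids
    centroids <- gCentroid(WRIAPugetSound, byid = TRUE)
    pointLabel(coordinates(centroids), labels = WRIAPugetSound$name, cex = 0.6)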

These examples only scratch the surface of the plotting and analysis potential available in R using the sp, rgdal, and maptools packages. Exploring the plot-methods in sp is highly recommended before investing a large amount of time into scripting a plot on your own.

Congratulations! You now know how to find out what is in geospatial data and how to make plots using R.


Installation

The package is available as a CRAN version, which is updated infrequently (a few times a year), and a GitHub version, which is updated whenever the author works with the package. Try the GitHub version if you encounter a bug in the CRAN version.

Due to package size limitations, ggOceanMaps requires the ggOceanMapsData package, which stores the shapefiles used in low-resolution maps. The ggOceanMapsData package is not available on CRAN but can be installed from a drat repository on GitHub. To install both packages, write:
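
A sketch following the package documentation; the drat repository URL is an assumption and may have changed:

    install.packages("ggOceanMapsData",
                     repos = c("https://cloud.r-project.org",
                               "https://mikkovihtakari.github.io/drat"))
    install.packages("ggOceanMaps")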

The (more) frequently updated GitHub version of ggOceanMaps can be installed using the devtools package.
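
For example (the GitHub repository name is an assumption):

    # install.packages("devtools")
    devtools::install_github("MikkoVihtakari/ggOceanMaps")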



Night light satellite data can be useful as a proxy for economic activity in regions for which no GDP data are available, for example at the sub-national level, or for regions in which GDP measurement is of poor quality, for example in some developing countries (see e.g. Henderson et al., 2012).

This package was built using the code of the paper "The Elusive Banker. Using Hurricanes to Uncover (Non-)Activity in Offshore Financial Centers" by Jakob Miethe, but while that paper focuses only on small island economies, this package allows the construction of night light statistics for most geospatial units on the planet, which is hopefully useful for researchers in other areas. The R package lets you perform calculations on night light satellite data and build databases for any given region using the function nightlight_calculate. Plots of the night lights in the desired area are also made very easy with nightlight_plot.

To install the package, run devtools::install_github("JakobMie/nightlightstats").

You can either work with yearly DMSP data ranging from 1992 to 2013 (https://www.ngdc.noaa.gov/eog/dmsp/downloadV4composites.html - Image and data processing by NOAA's National Geophysical Data Center, DMSP data collected by US Air Force Weather Agency) or monthly VIIRS data beginning in Apr 2012 (https://eogdata.mines.edu/products/vnl/ - Earth Observation Group, Payne Institute for Public Policy, Elvidge et al., 2013). The package (if desired) automatically downloads spatial data by country for any administrative level from GADM (https://gadm.org/data.html).

Yearly VIIRS data are now available at https://eogdata.mines.edu/nighttime_light/annual/v20/. The package does not yet process these data.

If at any point you experience trouble with the calculation capacities of your computer because the region you want to investigate is too large, you can take a look at the auxiliary function slice_shapefile which conveniently cuts a region into smaller pieces.

You can download DMSP night light satellite data with this function.

Note: as the Colorado School of Mines has changed the access to their VIIRS nighttime light data to require a free user account, you have to download the VIIRS data manually via their website. For further information, see https://payneinstitute.mines.edu/eog-2/transition-to-secured-data-access/.

A note about the disk space used by the files: The yearly data are of lower resolution than the monthly data and take up less space (1 yearly DMSP image for the whole world = 1/16 space of a monthly VIIRS image for the whole world). Hence, yearly data will probably be fine on your normal drive (all years together ca. 45 GB incl. quality files), but working with monthly data likely requires an external drive (about 1.5+ TB for all files incl. quality files). Quality files (recognizable by the ending cf_cvg) show how many observations went into the value of a pixel in the aggregated night light image in a given period.

The DMSP data are available per year in one image for the whole world. The only relevant information is the timespan and the location where you want to save the images on your drive. For example, to download all of the data ranging from 1992 to 2013 at once, you can input the following into the function:
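
For example, something along these lines; the argument names follow the package description but should be checked against the helpfile, and the drive path is illustrative:

    nightlight_download(
      time = c("1992", "2013"),
      light_location = "D:/nightlights")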

Monthly VIIRS images divide the whole world into 6 geographic tiles. It may happen that the region you want to analyze overlaps two or more of these tiles, in which case you would need to download all relevant tiles to analyze that region. You can check which tiles to download by using the coordinates of your area, or you can just download all tiles.

This function performs calculations on the night lights of a region in a given time interval. The output will be an aggregated dataframe in your environment or, if desired, an additional dataframe for each area provided in area_names. To get these single dataframes, you have to set the argument single_dataframes to TRUE.

For example, if you want to get a dataframe for the DMSP night lights in adm 1 regions of Luxembourg between 1992 and 1995, you can input:
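
For example (the paths are illustrative; check the helpfile for the exact argument names):

    nightlight_calculate(
      area_names = "Luxembourg",
      time = c("1992", "1995"),
      admlevel = 1,
      shapefile_location = "D:/shapefiles",
      light_location = "D:/nightlights")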

which will give you this dataframe called "lights" in the R environment:

You can see that there are some useful default output elements. Firstly, you get the name of the area you input in area_names. If this area is a country with an iso3c country code, the code will be registered and output as well. The area in square kilometers will automatically be calculated based on your shapefiles or coordinates, so you can easily integrate it into further calculations. NAME_1 indicates the names of the adm 1 regions of Luxembourg. Columns with lower-level administrative region names will only appear if you specify the administrative level in the argument admlevel (the default is 0, which refers to country borders for countries or does nothing in case your shapefile is e.g. a city or another region not included in a system of administrative districts). "mean_obs" refers to the mean number of observations per pixel that went into the aggregated image for a time period in a given area (taken from the quality files). Useful default calculations are the sum of the light values in your area and their mean. Outliers can be identified with the minimum and maximum light values.

You can, however, use any function for the calculations that you wish. If it is a user-written function, you have to load it into your environment first; existing functions from base R or from packages work as well. Then you can input the name of the R object as a string into a vector using the argument functions_calculate. The function has to accept an na.rm argument; if it does not, you have to wrap the function accordingly. If you encounter problems, check the documentation of raster::extract, into which all functions are fed; it sets the conditions under which functions other than the default settings work.
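
A sketch of a user-written function that satisfies the na.rm requirement (the function name and the chosen statistic are hypothetical):

    # a hypothetical custom statistic: the 90th-percentile light value;
    # raster::extract passes na.rm on to the summary function, so the
    # wrapper must accept it
    light_p90 <- function(x, na.rm = TRUE) {
      quantile(x, probs = 0.9, na.rm = na.rm, names = FALSE)
    }

    # then pass the object name as a string:
    # nightlight_calculate(..., functions_calculate = c("light_p90"))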

Other useful arguments in nightlight_calculate, for which you can consult the helpfiles for further details about their specific usage, are:

rawdata: This argument outputs a dataframe with simply the raw light pixels, their values and their coordinates for each region and time period, in addition to the standard processed dataframe.

cut_low, cut_high, cut_quality: These arguments allow you to exclude certain pixels from the calculation. If desired, any values smaller than cut_low, any values larger than cut_high, and any pixels with a number of observations less than or equal to cut_quality will be set to NA. The default setting for cut_quality is 0, which means that pixels with 0 observations in a time period will be set to NA.

rectangle_calculate: In case your shapefile does not feature an enclosed area with which calculations can be performed, the code will automatically transform your shapefile into a rectangle based on the minimum/maximum x- and y-coordinates. If this for some reason does not work, you can set this to TRUE or FALSE manually (the default is NULL, which activates automatic detection, but for non-standard shapefiles the detection might fail). Below you can see an illustration of what the code would do if you input a shapefile with a non-enclosed area (the railway system of Ulm, Germany).

This function plots a shapefile with its night lights for a given period of time. Note: even though it is possible to produce multiple plots by using multiple inputs for area_names and a timespan for time, you should pay attention to the number of plots that will be produced this way; all plots are loaded into your global environment as ggplot objects, so a large number of objects can accumulate quickly.

The basic input arguments are the same as for the other functions. For example, if you input:
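
For example (the paths are illustrative):

    nightlight_plot(
      area_names = "Germany",
      time = "1992",
      admlevel = 1,
      shapefile_location = "D:/shapefiles",
      light_location = "D:/nightlights")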

You get the following image, either by already having the shapefile for Germany adm1 stored in your shapefile_location or, if that is not the case, by the function automatically downloading the shapefile from GADM.

In case you want to plot a region that is not available on GADM (i.e. a region that is not a country), you must have the downloaded shapefile in your shapefile_location so the function can detect it according to the name you give in area_names. If this fails, there is always the option to use the shapefiles argument and just give the filenames of the shapefiles instead (you still have to set area_names for the naming of the output). This applies to nightlight_calculate as well.

In case you input a set of coordinates, you will get an image with a rectangular shapefile constructed from your coordinates.

Limitations and alternative data sources

The code is not explicitly written for fast performance.

The yearly DMSP data are of suboptimal quality. Problems include, for example, a lower resolution and more blooming/blurring of the lights compared to the VIIRS data. Moreover, the DMSP data feature a discrete scale that is top-coded at a digital number of 63, while the VIIRS data have a continuous scale and no top-coding. Detection in low illumination ranges is also better in VIIRS.

There is a Matlab code to de-blur the DMSP data (see Abrahams et al., 2018).

You could use a Pareto distribution to circumvent top-coding and extrapolate light values, e.g. in city centers (see Bluhm & Krause, 2018). There is also a version of the DMSP data without top-coding, although only for the years 1996, 1999, 2000, 2003, 2004, 2006 and 2010 (available at https://eogdata.mines.edu/dmsp/download_radcal.html). You can use these images by setting corrected_lights to TRUE. These are different data, so you have to download them first by setting this argument to TRUE in nightlight_download as well (unzipping these files takes quite long, and they are larger than the normal DMSP files, about 6 GB per yearly file incl. quality file). Note that these radiance-calibrated data come with their own problems (see again Bluhm and Krause, 2018).

Temporal consistency is not an issue with the VIIRS data, since the light values are consistently calibrated and standardized to describe a fixed unit of radiance (nano Watts / cm2 / steradian).

Temporal consistency can be an issue with the DMSP data. There are several versions of DMSP satellites. For some years these versions overlap, and two images are available for a year. nightlight_download always downloads both. Then, for nightlight_calculate and nightlight_plot, the versions to be used will be chosen based on the timespan you input into the functions. If possible, a consistent version for your timespan will be chosen. If two consistent versions are available for your timespan, the newer one will be selected. If no consistent version is available, the newest version for each year will be chosen. You will be notified about the selected DMSP versions. Using just one version is generally desirable, since the light intensity scale is not consistently calibrated and the measured values may thus not be perfectly comparable across satellite versions (see e.g. Doll, 2008); values might not even be fully comparable across years within a satellite version. In an econometric analysis using DMSP data, satellite version fixed effects and year fixed effects are advisable (see e.g. Gibson et al., 2020).

Natural factors influence data quality. Cloud coverage affects the number of observed nights that go into an aggregated image and can be especially strong in tropical regions. The data often lack values for summer months for regions close to the Poles due to stray light during summer nights. Aurora light and snowfall could influence observed lights for regions close to the Poles. These points can nicely be illustrated by looking at the mean of light values in Moscow, a bright city with a rather short distance to the North Pole. You can see that values for the summer months are not available and that values in the winter months fluctuate strongly, possibly due to disturbances caused by aurora light or varying reflection intensity of human-produced light due to variation in snowfall.

To increase coverage during summer months for the monthly VIIRS data, you can use a straylight-corrected VIIRS version (see Mills et al., 2013) by setting corrected_lights to TRUE. These are different images, so you have to download them first as well. The trade-off is that these data are of reduced quality.

You can also work with a harmonized DMSP-VIIRS yearly dataset spanning 1992 to 2018 (see Li et al., 2020, available at https://figshare.com/articles/Harmonization_of_DMSP_and_VIIRS_nighttime_light_data_from_1992-2018_at_the_global_scale/9828827/2). To use these data, you have to set harmonized_lights to TRUE in all functions. The harmonized dataset is built with the non-straylight-corrected VIIRS data. The VIIRS data are cleaned of disturbances due to aurora and temporal lights and then matched to the resolution and top-coding of the DMSP data. The DMSP data are temporally calibrated to ensure temporal consistency. The data are already produced with quality weights; separate quality files are not included in the dataset. Note that these data occupy far less space than the others, about 40 MB per yearly image.

Abrahams, A., Oram, C., & Lozano-Gracia, N. (2018). Deblurring DMSP nighttime lights: A new method using Gaussian filters and frequencies of illumination. Remote Sensing of Environment, 210, 242-258.

Bluhm, R. & Krause, M. (2018). Top lights - Bright cities and their contribution to economic development. CESifo Working Paper No. 7411.

Doll, C. (2008). CIESIN thematic guide to night-time light remote sensing and its applications. Center for International Earth Science Information Network, Columbia University, New York.

Elvidge, C. D., Baugh, K. E., Zhizhin, M., & Hsu, F.-C. (2013). Why VIIRS data are superior to DMSP for mapping nighttime lights. Asia-Pacific Advanced Network 35(35), 62.

Gibson, J., Olivia, S., & Boe-Gibson, G. (2020). Night lights in economics: Sources and uses. CSAE Working Paper Series 2020-01, Centre for the Study of African Economies, University of Oxford.

Henderson, J. V., Storeygard, A., & Weil, D. N. (2012). Measuring economic growth from outer space. American Economic Review, 102(2), 994–1028.

Li, X., Zhou, Y., Zhao, M., & Zhao, X. (2020). A harmonized global nighttime light dataset 1992–2018. Scientific Data, 7(1).

Mills, S., Weiss, S., & Liang, C. (2013). VIIRS day/night band (DNB) stray light characterization and correction. Earth Observing Systems XVIII.


Shapefiles to download

I use a lot of different shapefiles in this example. To save you from having to go find and download each individual one, you can download this zip file:

Unzip this and put all the contained folders in a folder named data if you want to follow along. You don’t need to follow along!

Your project should be structured like this:

These shapefiles all came from these sources:

  • World map: 110m “Admin 0 - Countries” from Natural Earth
  • US states: 20m 2018 state boundaries from the US Census Bureau
  • US counties: 5m 2018 county boundaries from the US Census Bureau
  • US states high resolution: 10m “Admin 1 – States, Provinces” from Natural Earth
  • Global rivers: 10m “Rivers + lake centerlines” from Natural Earth
  • North American rivers: 10m “Rivers + lake centerlines, North America supplement” from Natural Earth
  • Global lakes: 10m “Lakes + Reservoirs” from Natural Earth
  • Georgia K–12 schools, 2009: “Georgia K-12 Schools” from the Georgia Department of Education (you must be logged in to access this)

Abstract

Buildings are the major source of energy consumption in urban areas. Accurate modeling and forecasting of building energy use intensity (EUI) at the urban scale have many important applications, such as energy benchmarking and urban energy infrastructure planning. The use of Big Data technology is expected to have the capability of integrating a large number of predictors and giving an accurate prediction of the energy use intensity of buildings at the urban scale. However, past research has often used Big Data technology to estimate the energy consumption of a single building rather than the urban scale, due to several challenges such as data collection and feature engineering. This paper therefore proposes a geographic information system integrated data mining methodology framework for estimating building EUI at the urban scale, including preprocessing, feature selection, and algorithm optimization. Based on 216 prepared features, a case study on estimating the site EUI of 3640 multi-family residential buildings in New York City was tested and validated using the proposed methodology framework. A comparative study of the feature selection strategies and the commonly used regression algorithms was also included in the case study. The results show that the framework was able to produce lower estimation errors than previous research, and that the model built by the Support Vector Regression algorithm on the features selected by Elastic Net has the lowest cross-validation mean squared error.


3D surface produced from wind speed data at 100m altitude over the Continental US

I knew someone would call me out. Should have gone with Albers EAC. This was actually just a quick experiment that came out looking better than I expected so I posted as is.

Twitter and IG: @geo_spatialist for more info.

That looks awesome! I actually like how the lakes look. It represents the phenomenon that you are trying to represent, and cutting them out to me would suggest no wind and I would probably get upset lol. Amazing idea!

Feel like blue and white would have been more appropriate for the data, but damn if it ain't beautiful

What program/library did you use? Python? R? Very nice plot!

Cool that you can clearly see the Columbia plateau

Want this on my wall! Awesome job!

This is phenomenal, awesome job!

This is absolute garbage. Send it to me and I will get rid of it for you.

Edit: Wait, this is an image? I literally thought it was a 3D print hung on a wall. Damn dude. This is good stuff.

You had me in the first half, not gonna lie.

Very cool, I love how it shows the complexity of the Great Plains. I think I would go with a different color scale because the browns with the shadow diminish the detail in some areas. It would be really interesting to see wind energy projects mapped over this.


Project 2: Setting Up Your Analysis Environment in R

Many of the analyses you will be using require commands contained in different packages. You should have already loaded many of these packages, but just to ensure we document what packages we are using, let's add in the relevant information so that if the packages have not been switched on, they will be when we run our analysis.

Note that the lines that start with #install.packages will be ignored in this instance because of the # in front of the command. Lines of code starting with a # symbol are treated as comments and are not executed. If you do need to install a package (e.g., if you hadn't already installed all of the relevant packages), then remove the # when you run your code so the line will execute.

Note that for every package you install, you must also issue the library() function to tell R that you wish to access that package. Installing a package does not ensure that you have access to the functions that are a part of that package.
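
For example, with an illustrative package name:

    # uncomment the install line only if the package is not yet on your machine
    #install.packages("rgdal")
    library(rgdal)   # load the package for this session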

Set the working directory

Before we start writing R code, you should make sure that R knows where to look for the shapefiles and text files. First, you need to set the working directory. In your case, the working directory is the folder location where you placed the contents of the .zip file when you unzipped it. Make sure that the correct file location is specified. There are several ways you can do this. As shown below, you can set the directory path to a variable and then reference that variable wherever and whenever you need to. Notice that there is a slash at the end of the directory name. If you don't include this, the final folder name can get merged with a filename, meaning R cannot find your file.

Note that R is very sensitive to the use of "/" as a directory level indicator. R does not recognize "\" as a directory level indicator and will return an error. DO NOT COPY THE DIRECTORY PATH FROM FILE EXPLORER, as that path uses "\", while R only recognizes "/" for this purpose.
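
A sketch with an illustrative path (note the trailing slash and the forward slashes):

    myDirectory <- "C:/Users/yourname/Documents/Project2/data/"
    # build full paths by pasting the directory and a filename together
    somePath <- paste0(myDirectory, "somefile.csv")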

You can also issue the setwd() function that "sets" the working directory for your program.

Finally and most easily, you can also set the working directory through RStudio's interface: Session - Set Working Directory - Choose Directory.

Issue the getwd() function to confirm that R knows what the correct path is.

When you've finished replacing the filepaths above with the relevant ones on your own computer, check the results in the RStudio console with the list.files() command to make sure that R is seeing the files in the directories. You should see output listing the files in each of the directories. 90% of problems with R code come from either a library not being loaded correctly or a filepath problem, so it is worth taking some care here.
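
For example (the path is illustrative):

    setwd("C:/Users/yourname/Documents/Project2/data")  # set the working directory
    getwd()                  # confirm R is pointing at the right folder
    list.files(myDirectory)  # should print the shapefiles and text files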

Loading and viewing the data file (Two Code Options)

Add the crime data file that you need and view the contents of the file. Inspect the results to make sure that you are able to read the crime data.
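
A sketch of the two options, using the myDirectory variable from above and an illustrative filename:

    # Option 1: base R
    crime <- read.csv(paste0(myDirectory, "crime_data.csv"), header = TRUE)
    head(crime)   # view the first rows
    str(crime)    # inspect the column types

    # Option 2: the readr package
    library(readr)
    crime <- read_csv(paste0(myDirectory, "crime_data.csv"))
    crime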


Installing the Cartogram R packages

Two packages, Rcartogram and getcartr, make the functionality of the Gastner and Newman (2004) procedure available for working with objects of class Spatial*. Installing Rcartogram requires the fftw library to be installed. How best to do that depends on your system; on Mac OS X the Homebrew package system makes this installation easy.
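
A sketch of the installation, assuming a macOS system with Homebrew; the GitHub repository names are assumptions, so check the package documentation:

    # in a shell first:  brew install fftw
    # neither package is on CRAN, so install from GitHub:
    # install.packages("devtools")
    devtools::install_github("omegahat/Rcartogram")
    devtools::install_github("chrisbrunsdon/getcartr", subdir = "getcartr")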

We are now ready to compute our first cartogram using the getcartr::quick.carto function.
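
A minimal sketch, where 'berlin' stands for a SpatialPolygonsDataFrame and 'pop' for a numeric population column (both assumptions); see ?quick.carto for the exact arguments:

    # distort the polygons so each area is proportional to its population
    carto <- getcartr::quick.carto(berlin, berlin@data$pop)
    plot(carto)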

With the cartogram functionality now directly available through R, one can embed cartogram making in a full R pipeline. We illustrate this by generating a sequence of cartograms into an animated GIF file using the animation package. The animation below shows a cartogram of the population size for each of the 32 age groups in the Berlin data set. One observes that the 25-45 year olds tend to live in the city centre, while the 95-110 year olds seem to concentrate in the wealthy regions in the south west.
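
A sketch of the animation idea; the age-group column naming is an assumption:

    library(animation)
    # one cartogram per age-group column, rendered into an animated GIF
    age_cols <- grep("^age", names(berlin@data), value = TRUE)
    saveGIF({
      for (g in age_cols) {
        carto <- getcartr::quick.carto(berlin, berlin@data[[g]])
        plot(carto, main = g)
      }
    }, movie.name = "berlin_cartograms.gif")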

