Working with geopackages in R

What is a geopackage?

A geopackage (*.gpkg) is an open, platform-independent format for geospatial information, implemented as a SQLite database. It can contain vector features, tile matrix sets of imagery and raster maps at various scales, schema and metadata. Everything is collected in a single file that is ready to use, which makes the information easy to transfer and work with.


Using a geopackage file in R

To read geopackages in R we are going to use two libraries, rgdal and RSQLite, together with dplyr to query the tables. I will explain what I have done using forestry data from the Finnish Forest Centre (Metsäkeskus) as an example; this data is available for download. For this example I will use the mapsheet file MV_P4441E.gpkg. If you want to follow my example with the same data, please go ahead and download and unzip the file. If you are curious about the forestry data contained in it, the dataset's documentation describes the variables and the databases in more detail.

Let's start! A good first step is to explore the layers in the geopackage file:

  library(rgdal)
  library(RSQLite)

  # Explore the layers available
  ogrListLayers("Data/MV_P4441E.gpkg")

In this case we can see that there are 14 layers:

 [1] "stand"            "study_area_grid"  "stand_dissolved" 
 [4] "stand_clipped"    "stand_grid"       "restriction"     
 [7] "treestratum"      "treestandsummary" "operation"       
[10] "assortment"       "specialfeature"   "datasource"      
[13] "treestand"        "specification"  
attr(,"driver")
[1] "GPKG"
attr(,"nlayers")
[1] 14
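
A note for readers on a current R setup: rgdal was retired from CRAN in 2023, so ogrListLayers() may not be installable any more. The sketch below lists layers with the sf package instead, which is a swapped-in alternative and not what this post originally used. The helper name list_gpkg_layers and the throwaway demo file are made up for illustration; point the function at your own .gpkg path instead.

```r
library(sf)

# Hypothetical helper: return the layer names of a geopackage using sf.
list_gpkg_layers <- function(path) {
  stopifnot(file.exists(path))
  st_layers(path)$name
}

# Self-contained demonstration: write a tiny geopackage to a temporary
# file so the example runs without the Metsäkeskus data.
demo_gpkg <- tempfile(fileext = ".gpkg")
demo_points <- st_sf(
  id = 1:2,
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
)
st_write(demo_points, demo_gpkg, layer = "demo_points", quiet = TRUE)

layer_names <- list_gpkg_layers(demo_gpkg)
print(layer_names)
```

With the real data you would call list_gpkg_layers("Data/MV_P4441E.gpkg") and should see the same layer names as in the rgdal output above.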

If I am interested in loading only one of the layers, for example the layer “treestratum”, I can do so by using dplyr semantics to directly query the gpkg file:

library(dplyr) # provides src_sqlite() and tbl(); the dbplyr package must also be installed
dta <- src_sqlite("Data/MV_P4441E.gpkg") # Open the geopackage as a SQLite data source
tbldata <- tbl(dta, "treestratum") # Create a table from a data source
tbldf <- as.data.frame(tbldata) # Create a data frame
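
Since a geopackage is a SQLite database, the same table can also be read with DBI and RSQLite alone, without dplyr. This is only a minimal sketch under assumptions: read_gpkg_table is a hypothetical helper, and the demo file below is a plain SQLite file with made-up columns (standid, meanheight) that stands in for the real geopackage so the example can run anywhere.

```r
library(DBI)
library(RSQLite)

# Hypothetical helper: read one table from a geopackage (or any SQLite
# file) into a plain data frame, closing the connection afterwards.
read_gpkg_table <- function(path, table) {
  stopifnot(file.exists(path))
  con <- dbConnect(SQLite(), path)
  on.exit(dbDisconnect(con), add = TRUE)
  if (!table %in% dbListTables(con)) {
    stop("Table '", table, "' not found in ", path)
  }
  dbReadTable(con, table)
}

# Demonstration on a throwaway SQLite file with invented values.
demo_db <- tempfile(fileext = ".gpkg")
con <- dbConnect(SQLite(), demo_db)
dbWriteTable(con, "treestratum",
             data.frame(standid = 1:3, meanheight = c(12.5, 18.1, 9.7)))
dbDisconnect(con)

demo_df <- read_gpkg_table(demo_db, "treestratum")
print(demo_df)
```

The explicit check against dbListTables() gives a readable error when the layer name is misspelled, and on.exit() makes sure the file is not left open.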

I have also created a super simple function to save time when I load several files. It looks like this:

# Load one layer of a geopackage stored in the Data/ folder as a data frame
# e.g. GPKG = "MV_N5411E.gpkg", layer = "treestratum"
load_databasegpkg <- function(GPKG, layer) {
  GPKGpath <- paste0("Data/", GPKG)
  dta <- src_sqlite(GPKGpath)
  tbldata <- tbl(dta, layer)
  tbldf <- as.data.frame(tbldata)
  return(tbldf)
}
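
One caveat with reading layers through SQLite: the result is a plain data frame, and as far as I can tell the geometry column arrives as raw binary rather than as spatial objects. If you need the geometries, a possible alternative is sf::st_read(). The sketch below is an assumption-laden illustration, not part of my original workflow: load_layer_sf is a hypothetical helper and the two demo points are invented so the example is self-contained.

```r
library(sf)

# Hypothetical helper: read one layer as an sf object, keeping geometries.
load_layer_sf <- function(path, layer) {
  stopifnot(file.exists(path))
  st_read(path, layer = layer, quiet = TRUE)
}

# Demonstration on a temporary geopackage with two invented points.
demo_file <- tempfile(fileext = ".gpkg")
demo_stands <- st_sf(
  standid = 1:2,
  geometry = st_sfc(st_point(c(25.0, 62.0)), st_point(c(25.1, 62.1)),
                    crs = 4326)
)
st_write(demo_stands, demo_file, layer = "stand", quiet = TRUE)

stands <- load_layer_sf(demo_file, "stand")
print(class(stands))
```

With the forestry data the call would be load_layer_sf("Data/MV_P4441E.gpkg", "stand"), and the result can be plotted or joined spatially like any other sf object.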

That's all for now about the gpkg-R connection. Do you have a better approach? Please let me know!

Olalla Díaz Yáñez


#DataScience #ScientificCommunication #Science #Risk #ForestDynamics #ForestManagement #ForestBiomass #ForestEcology
