
This tutorial walks through the process of adding a new data source and variable to PRIOGRID. Contributions are welcome — report issues or suggest new sources via the Issue Tracker.

Package Layout for Contributors

priogrid/
├── R/
│   ├── data_{source}.R        # read_*() and gen_*() functions for each source
│   ├── config.R               # pg_config(), pg_set_config(), pg_set_rawfolder()
│   ├── build_priogrid.R       # calc_pg(), load_pgvariable(), read_pg_static(), etc.
│   ├── utility.R              # robust_transformation(), rast_to_df(), prio_blank_grid()
│   ├── references.R           # pgcitations(), get_bibliography()
│   ├── source_class.R         # Source R6 class (dev-only: used in devtools::load_all())
│   └── download_data.R        # pgsearch(), pg_rawfiles(), check_pgsourcefiles()
├── data_raw/
│   ├── sources.csv            # Add new source here (54 entries)
│   ├── variables.csv          # Add new variable here (32 entries)
│   ├── pgsources.R            # Regenerates data/pgsources.rda from sources.csv
│   └── pgvariables.R          # Regenerates data/pgvariables.rda from variables.csv
├── data/
│   ├── pgsources.rda          # Compiled metadata (auto-generated, do not edit)
│   └── pgvariables.rda        # Compiled metadata (auto-generated, do not edit)
├── inst/
│   ├── REFERENCES.bib         # Full bibliography — add new BibTeX entries here
│   └── extdata/urls/          # Multi-file URL lists (one .txt per source)
└── tests/testthat/
    └── test-build_priogrid.R  # Integration tests for gen_*() functions

Step 1: Register the Data Source

Add to data_raw/sources.csv

Every data source needs a row in data_raw/sources.csv. Required fields:

| Column | Description | Example |
|---|---|---|
| source_name | Full name of the dataset | "My New Dataset" |
| source_version | Version string | "1.0" |
| license | SPDX or common license name | "CC BY 4.0" |
| website_url | Landing page | "https://example.com/data" |
| spatial_extent | One of: "World", "Multiple continents", "Single continent", "Several countries" | "World" |
| temporal_resolution | One of: "Static", "Higher than monthly", "Monthly", "Quarterly", "Yearly", "Less than yearly" | "Yearly" |
| citation_keys | Semicolon-separated BibTeX keys | "doeNewDataset2025" |
| download_url | Direct download URL, or "urls/{uuid}.txt" for multi-file sources | "https://example.com/data.zip" |

Optional fields: aws_bucket, aws_region, prio_mirror, tags, reference_keys.
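
Before appending a row, it can help to check it against the rules in the table above. The following is a minimal base R sketch under stated assumptions: check_source_row() is a hypothetical helper and not part of PRIOGRID, and the allowed values are simply copied from the table.

```r
# Controlled vocabularies, copied from the table above.
allowed_extent <- c("World", "Multiple continents", "Single continent",
                    "Several countries")
allowed_resolution <- c("Static", "Higher than monthly", "Monthly",
                        "Quarterly", "Yearly", "Less than yearly")
required_fields <- c("source_name", "source_version", "license", "website_url",
                     "spatial_extent", "temporal_resolution", "citation_keys",
                     "download_url")

# Hypothetical helper (an assumption, not a PRIOGRID function).
# Takes a one-row data.frame and returns a character vector of problems;
# an empty vector means the row passed every check.
check_source_row <- function(row) {
  problems <- character(0)

  # Required columns must exist and be non-empty.
  for (field in required_fields) {
    value <- if (field %in% names(row)) as.character(row[[field]]) else NA_character_
    if (is.na(value) || !nzchar(trimws(value))) {
      problems <- c(problems, paste("missing or empty:", field))
    }
  }

  # Vocabulary columns must use one of the allowed strings exactly.
  if (!isTRUE(row[["spatial_extent"]] %in% allowed_extent)) {
    problems <- c(problems, "spatial_extent is not an allowed value")
  }
  if (!isTRUE(row[["temporal_resolution"]] %in% allowed_resolution)) {
    problems <- c(problems, "temporal_resolution is not an allowed value")
  }
  problems
}

candidate <- data.frame(
  source_name         = "My New Dataset",
  source_version      = "1.0",
  license             = "CC BY 4.0",
  website_url         = "https://example.com/data",
  spatial_extent      = "World",
  temporal_resolution = "Yearly",
  citation_keys       = "doeNewDataset2025",
  download_url        = "https://example.com/data.zip",
  stringsAsFactors    = FALSE
)
check_source_row(candidate)  # character(0): nothing to fix

typo <- candidate
typo$temporal_resolution <- "Annual"  # not in the vocabulary
check_source_row(typo)
```

Returning a vector of problems rather than stopping at the first one lets a contributor fix everything in a single pass.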

Each row also needs a unique identifier in the id column. Generate one with uuid::UUIDgenerate():

uuid::UUIDgenerate()
# "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

Use the Source Class (dev mode)

In development mode (devtools::load_all()), use the Source R6 class to validate a new source before adding it to the CSV:

new_source <- Source$new(
  source_name          = "My New Dataset",
  source_version       = "1.0",
  license              = "CC BY 4.0",
  website_url          = "https://example.com/data",
  spatial_extent       = "World",
  temporal_resolution  = "Yearly",
  citation_keys        = "doeNewDataset2025",
  download_url         = "https://example.com/data/dataset_v1.zip",
  tags                 = "climate, land cover"
)
new_source  # prints validation report; warns if citation key not in REFERENCES.bib

The Source class validates required fields, checks that spatial_extent and temporal_resolution use the allowed vocabulary, and verifies that citation keys exist in inst/REFERENCES.bib.

Add the Citation to inst/REFERENCES.bib

Add a BibTeX entry to inst/REFERENCES.bib for each citation_keys value:

@article{doeNewDataset2025,
  author  = {Doe, Jane},
  title   = {My New Dataset: A Global Resource},
  journal = {Scientific Data},
  year    = {2025},
  volume  = {12},
  pages   = {1--10},
  doi     = {10.1234/sdata.2025.001}
}
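
A quick way to confirm that every key in citation_keys has a matching entry is to scan the .bib file for entry keys. The sketch below uses only base R and a temporary file. bib_keys() and missing_citation_keys() are hypothetical helpers, not PRIOGRID functions, roeOther2024 is a made-up key, and the regular expression is a simplification that treats any "@type{key" line as an entry.

```r
# Hypothetical helper: extract entry keys such as "doeNewDataset2025"
# from lines that open a BibTeX entry, e.g. "@article{doeNewDataset2025,".
bib_keys <- function(path) {
  lines <- readLines(path, warn = FALSE)
  opener <- "^\\s*@\\w+\\s*\\{\\s*"
  hits <- regmatches(lines, regexpr(paste0(opener, "[^,\\s]+"), lines, perl = TRUE))
  sub(opener, "", hits, perl = TRUE)
}

# Hypothetical helper: split a semicolon-separated citation_keys value
# and return the keys that have no entry in the .bib file.
missing_citation_keys <- function(citation_keys, bib_path) {
  keys <- trimws(strsplit(citation_keys, ";", fixed = TRUE)[[1]])
  setdiff(keys, bib_keys(bib_path))
}

# Self-contained demo with a temporary file standing in for inst/REFERENCES.bib.
bib <- tempfile(fileext = ".bib")
writeLines(c(
  "@article{doeNewDataset2025,",
  "  author = {Doe, Jane},",
  "  year   = {2025}",
  "}"
), bib)

missing_citation_keys("doeNewDataset2025", bib)                # character(0)
missing_citation_keys("doeNewDataset2025; roeOther2024", bib)  # "roeOther2024"
```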

Regenerate pgsources.rda

After editing sources.csv, regenerate the .rda file:

source("data_raw/pgsources.R")

Verify the result:

pgsources[pgsources$source_name == "My New Dataset", ]

Step 2: Write read_*() and gen_*() Functions

Create a new file R/data_mynewsource.R.

read_*() — Load Raw Data

The read function fetches the raw file through get_pgfile() and returns the data as an sf or SpatRaster object:

#' Read My New Dataset
#'
#' Downloads and imports My New Dataset.
#'
#' @return An \code{sf} object
#' @export
#' @references
#' \insertRef{doeNewDataset2025}{priogrid}
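read_mynewsource <- function() {
  # Locate the raw file registered in sources.csv (downloaded if needed)
  f <- get_pgfile(
    source_name    = "My New Dataset",
    source_version = "1.0",
    id             = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  )
  # Assumes f is a single vector file that GDAL can open directly;
  # a zipped download may need to be unpacked first
  sf::st_read(f)
}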

get_pgfile() automatically downloads the file if it is not present locally (when automatic_download = TRUE in the config).

gen_*() — Generate the PRIOGRID Variable

The generate function maps the raw data onto the PRIOGRID grid and returns a SpatRaster. Its signature must accept a config argument:

#' My New Variable
#'
#' Computes my_new_variable for each PRIOGRID cell.
#'
#' @param config A \code{pg_config} object. Defaults to \code{\link{pg_current_config}()}.
#' @return A \code{SpatRaster} object
#' @export
#' @references
#' \insertRef{doeNewDataset2025}{priogrid}
gen_my_new_variable <- function(config = pg_current_config()) {
  raw <- read_mynewsource()

  # Reproject if needed
  pg_crs <- sf::st_crs(config$crs)
  if (sf::st_crs(raw) != pg_crs) {
    raw <- sf::st_transform(raw, pg_crs)
  }

  # Rasterize onto the blank PRIOGRID grid, averaging my_value_column per cell
  raw_rast <- terra::rasterize(terra::vect(raw), prio_blank_grid(config),
                               field = "my_value_column", fun = "mean")

  # Align the result with the target grid defined by config
  r <- robust_transformation(raw_rast, agg_fun = "mean", config = config)

  names(r) <- "my_new_variable"
  return(r)
}

Using robust_transformation()

robust_transformation() handles reprojection, cropping, aggregation, disaggregation, and final resampling in a single call. It is the standard way to re-grid any raster:

robust_transformation(
  r             = raw_rast,      # Any SpatRaster (any resolution / CRS / extent)
  agg_fun       = "mean",        # Aggregation function for higher-res inputs
  disagg_method = "near",        # Disaggregation for lower-res inputs
  config        = config         # Target config
)

Common agg_fun values: "mean", "sum", "max", "min", "modal" (for categorical).
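
The choice of agg_fun changes what a cell value means: "sum" preserves totals (suitable for counts), while "mean" preserves intensity (suitable for rates or averages). The toy example below illustrates this in base R on a 4 x 4 matrix. It does not call robust_transformation(), and block_aggregate() is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: aggregate a matrix into blocks of fact x fact cells.
block_aggregate <- function(m, fact, fun) {
  stopifnot(nrow(m) %% fact == 0, ncol(m) %% fact == 0)
  row_block <- (row(m) - 1) %/% fact  # block index of each cell's row
  col_block <- (col(m) - 1) %/% fact  # block index of each cell's column
  tapply(m, list(row_block, col_block), fun)
}

# A fine-resolution "raster" with values 1..16, filled column by column.
fine <- matrix(1:16, nrow = 4)

# The top-left 2 x 2 block holds the values 1, 2, 5, 6.
coarse_mean <- block_aggregate(fine, 2, mean)  # top-left cell: 3.5
coarse_sum  <- block_aggregate(fine, 2, sum)   # top-left cell: 14

# Summing keeps the grand total; averaging does not.
sum(coarse_sum) == sum(fine)   # TRUE
sum(coarse_mean) == sum(fine)  # FALSE
```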

Temporal Variables

For time-varying variables, iterate over pg_dates(config), build one layer per date, and combine the layers into a single SpatRaster:

gen_my_yearly_variable <- function(config = pg_current_config()) {
  dates <- pg_dates(config)
  layers <- lapply(dates, function(d) {
    raw <- read_mynewsource_for_year(lubridate::year(d))  # your reader
    r   <- robust_transformation(raw, agg_fun = "mean", config = config)
    names(r) <- as.character(d)
    r
  })
  do.call(c, layers)  # stack into a multi-layer SpatRaster
}
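
Because each layer is named after its date, the time dimension can be recovered from the layer names alone. The base R sketch below shows that round trip. The dates vector is a stand-in for the output of pg_dates(config), which is assumed here to be a Date vector, and the year is extracted with format() as a dependency-free alternative to lubridate::year().

```r
# Stand-in for pg_dates(config): three year-end dates (an assumption).
dates <- seq(as.Date("2010-12-31"), as.Date("2012-12-31"), by = "year")

# lapply() keeps the Date class of each element, so as.character(d)
# yields an ISO 8601 string for every layer name.
layer_names <- unlist(lapply(dates, function(d) as.character(d)))
layer_names  # "2010-12-31" "2011-12-31" "2012-12-31"

# Round trip: layer names parse back to the original dates.
parsed <- as.Date(layer_names)
all(parsed == dates)  # TRUE

# Year of each layer, without lubridate.
years <- as.integer(format(parsed, "%Y"))
years  # 2010 2011 2012
```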

Step 3: Register the Variable

Add a row to data_raw/variables.csv:

name,static,source_ids
my_new_variable,FALSE,a1b2c3d4-e5f6-7890-abcd-ef1234567890

  • name must match the layer name set with names(r) in gen_*(), and the generator must be called gen_{name} (here, gen_my_new_variable).
  • static is TRUE for variables without a time dimension, FALSE for time-varying.
  • source_ids is the id value of the source in sources.csv. Separate multiple sources with commas, and quote the field when you do, since the file itself is comma-delimited.
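
These rules can be checked mechanically before regenerating the metadata. The following is a base R sketch, not a PRIOGRID function: check_variable_row() is a hypothetical helper, gen_my_new_variable() is replaced by an empty stand-in so the example runs on its own, and known_source_ids stands in for the id column of sources.csv.

```r
# Stand-ins so the sketch runs on its own (assumptions, not PRIOGRID objects).
gen_my_new_variable <- function(config = NULL) NULL
known_source_ids <- "a1b2c3d4-e5f6-7890-abcd-ef1234567890"  # ids from sources.csv

variables <- read.csv(
  text = "name,static,source_ids
my_new_variable,FALSE,a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  stringsAsFactors = FALSE
)

uuid_pattern <- "^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"

# Hypothetical helper: one named TRUE/FALSE per rule for a single row.
check_variable_row <- function(row, known_source_ids) {
  ids <- trimws(strsplit(as.character(row[["source_ids"]]), ",", fixed = TRUE)[[1]])
  c(
    generator_exists    = exists(paste0("gen_", row[["name"]]), mode = "function"),
    static_is_logical   = is.logical(row[["static"]]) && !is.na(row[["static"]]),
    ids_look_like_uuids = all(grepl(uuid_pattern, ids, ignore.case = TRUE)),
    ids_are_registered  = all(ids %in% known_source_ids)
  )
}

result <- check_variable_row(variables[1, ], known_source_ids)
result  # all four checks are TRUE for this row
```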

Regenerate pgvariables.rda:

source("data_raw/pgvariables.R")

Verify:

pgvariables[pgvariables$name == "my_new_variable", ]

Step 4: Document with roxygen2

PRIOGRID uses roxygen2 with Markdown support and Rdpack for citation references.

Key tags:

  • @param config: always document the config parameter for gen_*() functions
  • @return: describe what the function returns
  • @export: all user-facing functions must be exported
  • @references \insertRef{key}{priogrid}: links to inst/REFERENCES.bib

Regenerate documentation:

devtools::document()

Step 5: Add Tests

Add tests to tests/testthat/. For a gen_*() function, a minimal test checks the return type and the layer name:

# tests/testthat/test-data_mynewsource.R
test_that("gen_my_new_variable returns a SpatRaster", {
  skip_if_not_installed("terra")
  cfg <- pg_config(
    nrow = 36, ncol = 72,   # 5-degree resolution for fast tests
    start_date = as.Date("2010-12-31"),
    end_date   = as.Date("2010-12-31")
  )
  r <- gen_my_new_variable(config = cfg)
  expect_s4_class(r, "SpatRaster")
  expect_equal(names(r), "my_new_variable")
})
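
The "5-degree" comment in the test follows from simple arithmetic, which the sketch below spells out. It assumes the grid spans the full globe (360 by 180 degrees), and cell_size_degrees() is a hypothetical helper, not a PRIOGRID function.

```r
# Cell size in degrees for a grid covering the full globe
# (assumes a 360 x 180 degree extent; hypothetical helper).
cell_size_degrees <- function(nrow, ncol) {
  c(lat = 180 / nrow, lon = 360 / ncol)
}

cell_size_degrees(nrow = 36, ncol = 72)    # lat 5, lon 5: the test grid
36 * 72                                    # 2592 cells

cell_size_degrees(nrow = 360, ncol = 720)  # lat 0.5, lon 0.5
360 * 720                                  # 259200 cells, 100 times as many
```

A coarse grid keeps the test fast because the number of cells to process grows with the square of the resolution.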

Run tests:

devtools::test()

Step 6: Verify End-to-End

Test the full pipeline:

cfg <- pg_config()

# 1. Calculate
calc_pg("my_new_variable", config = cfg)

# 2. Load as raster
r <- load_pgvariable("my_new_variable", config = cfg)
terra::plot(r)

# 3. Load as table
pg_tv <- read_pg_timevarying(config = cfg)
"my_new_variable" %in% names(pg_tv)

# 4. Check citations resolve
pgcitations("my_new_variable")

Summary Checklist