Installation

Install the development version from GitHub:

# install.packages("remotes")
remotes::install_github("dhis2-chap/chap_r_sdk")

Note: If prompted for GitHub authentication, you can skip it by pressing Enter (the repository is public).

Load the package:

library(chapr)

What is the Chap R SDK?

The chapr package provides infrastructure for developing disease forecasting models compatible with the Chap platform. Chap (Climate Health Analytics Platform) enables health ministries to run predictive models for disease surveillance.

This SDK simplifies model development by handling:

  • CLI creation: Command-line interfaces for train/predict workflows
  • File I/O: Automatic CSV loading, tsibble conversion, output formatting
  • Configuration: YAML/JSON config parsing with schema validation
  • Validation: Test suites to verify Chap compatibility

Quick Start

The recommended pattern uses create_chap_cli() to create a complete command-line interface:

library(chapr)
library(dplyr)

# Define training function - receives loaded tsibble, not file paths
train_my_model <- function(training_data, model_configuration = list(), run_info = list()) {
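  # Convert to a plain tibble first: summarise() on a tsibble keeps the time
  # index, which would yield one row per location and period
  means <- training_data |>
    tibble::as_tibble() |>
    group_by(location) |>
    summarise(mean_cases = mean(disease_cases, na.rm = TRUE))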

  return(list(means = means))
}

# Define prediction function - all inputs already loaded
predict_my_model <- function(historic_data, future_data, saved_model,
                              model_configuration = list(), run_info = list()) {
  predictions <- future_data |>
    left_join(saved_model$means, by = "location") |>
    mutate(samples = purrr::map(mean_cases, ~c(.x))) |>
    select(-mean_cases)

  return(predictions)
}

# Enable CLI with one function call
if (!interactive()) {
  create_chap_cli(train_my_model, predict_my_model)
}

Command Line Usage

Save the code above as model.R, then run it from a terminal:

# Train the model
Rscript model.R train --data training_data.csv

# Generate predictions
Rscript model.R predict --historic historic.csv --future future.csv \
    --output predictions.csv

# Display model information
Rscript model.R info

Model Function Interface

Your model needs two functions:

Training Function

train_fn <- function(training_data, model_configuration = list(), run_info = list()) {
  # training_data: tsibble with time_period index, location key, disease_cases
  # model_configuration: optional list of parameters from config file
  # run_info: runtime info from Chap (prediction_length, additional_continuous_covariates, etc.)
  # Returns: model object (saved as RDS)
}

Prediction Function

predict_fn <- function(historic_data, future_data, saved_model,
                       model_configuration = list(), run_info = list()) {
  # historic_data: tsibble with historical observations
  # future_data: tsibble with time periods to predict (no disease_cases)
  # saved_model: object returned by train_fn
  # model_configuration: optional list of parameters from config file
  # run_info: runtime info from Chap
  # Returns: tibble with samples list-column
}

Important: historic_data may contain more recent observations than the data used for training. Time series models should therefore refit to historic_data before forecasting.
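As a rough illustration of refitting, the sketch below recomputes per-location means from historic_data inside the prediction function instead of reusing the values stored at training time. The function name predict_with_refit and the toy values are hypothetical, and plain tibbles stand in for the tsibbles the SDK would pass.

```r
library(dplyr)

# Hypothetical predict function: refit on historic_data before forecasting.
# saved_model is ignored here because this toy model has no other state;
# a real model might still reuse hyperparameters stored in it.
predict_with_refit <- function(historic_data, future_data, saved_model,
                               model_configuration = list(), run_info = list()) {
  refit_means <- historic_data |>
    tibble::as_tibble() |>  # collapse over time, not per period
    group_by(location) |>
    summarise(mean_cases = mean(disease_cases, na.rm = TRUE))

  future_data |>
    left_join(refit_means, by = "location") |>
    mutate(samples = purrr::map(mean_cases, ~ c(.x))) |>
    select(-mean_cases)
}

# Toy inputs with made-up values
historic <- tibble::tibble(
  time_period = c("2023-01", "2023-02", "2023-03", "2023-04"),
  location = "LocationA",
  disease_cases = c(45, 52, 60, 63)
)
future <- tibble::tibble(
  time_period = c("2023-05", "2023-06"),
  location = "LocationA"
)
# Model trained earlier, when only the first two months were available
stale_model <- list(
  means = tibble::tibble(location = "LocationA", mean_cases = 48.5)
)

predictions <- predict_with_refit(historic, future, stale_model)
predictions$samples[[1]]
#> [1] 55
```

The prediction reflects all four months in historic_data (mean 55) rather than the stale training-time mean of 48.5.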

What the SDK Handles

You don’t need to write code for:

Task                      SDK handles it
------------------------  ------------------------------------
Loading CSV files         readr::read_csv()
Converting to tsibbles    tsibble::as_tsibble()
Detecting time columns    Finds time_period, date, week, etc.
Detecting key columns     Finds location, region, etc.
Loading/saving models     readRDS() / saveRDS()
Parsing configs           yaml::yaml.load_file()

Your functions contain only model logic, with no file I/O boilerplate.

Data Format

Training/Historic Data

A CSV file with time, location, target, and covariate columns:

time_period,location,disease_cases,population,rainfall
2023-01,LocationA,45,10000,120.5
2023-02,LocationA,52,10000,85.2
2023-01,LocationB,78,15000,130.1

Future Data

Same structure without the target variable:

time_period,location,population,rainfall
2023-05,LocationA,10000,95.0
2023-06,LocationA,10000,110.3
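To experiment with data in this format in an interactive session, the loading step can be approximated by hand. The sketch below is an illustration built on base R and the tsibble package, not the SDK's own loader; treating time_period as a monthly index and location as the key is an assumption that matches the training rows shown above.

```r
library(tsibble)

# The training CSV from the example above, inlined so the sketch is self-contained
csv_text <- "time_period,location,disease_cases,population,rainfall
2023-01,LocationA,45,10000,120.5
2023-02,LocationA,52,10000,85.2
2023-01,LocationB,78,15000,130.1"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert "YYYY-MM" strings to a monthly index (assumes monthly data)
raw$time_period <- yearmonth(as.Date(paste0(raw$time_period, "-01")))

# One series per location, indexed by month
training_data <- as_tsibble(raw, key = location, index = time_period)
training_data
```

An object built this way can be passed straight to a training function for a quick check before running the CLI.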

Prediction Output

A tibble with a samples list-column containing numeric vectors:

library(tibble)

# Deterministic: single value per row
tibble(
  time_period = "2023-05",
  location = "LocationA",
  samples = list(c(42))
)

# Probabilistic: multiple Monte Carlo samples
tibble(
  time_period = "2023-05",
  location = "LocationA",
  samples = list(rpois(1000, lambda = 42))
)
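Before handing predictions back, it can help to check that they have the shape described above. The helper below is a hypothetical sketch in base R (check_prediction_shape is not part of the SDK) and covers only the samples column; the SDK's own validation suite remains the authority on Chap compatibility.

```r
# Hypothetical helper: verifies that `samples` is a list-column whose
# elements are non-empty numeric vectors. Nothing else is checked.
check_prediction_shape <- function(predictions) {
  stopifnot(
    is.data.frame(predictions),
    "samples" %in% names(predictions),
    is.list(predictions$samples)
  )
  ok <- vapply(
    predictions$samples,
    function(s) is.numeric(s) && length(s) >= 1,
    logical(1)
  )
  if (!all(ok)) {
    stop("rows with invalid samples: ", paste(which(!ok), collapse = ", "))
  }
  invisible(TRUE)
}

# A well-formed row passes silently (a tibble behaves the same way here)
good <- data.frame(time_period = "2023-05", location = "LocationA")
good$samples <- list(c(41, 42, 43))
check_prediction_shape(good)

# A character value in the list-column is reported by row number
bad <- good
bad$samples <- list("forty-two")
result <- tryCatch(check_prediction_shape(bad),
                   error = function(e) conditionMessage(e))
result
#> [1] "rows with invalid samples: 1"
```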

Configuration

Reading Configuration Files

# Via the CLI (shell): the config is passed automatically to your functions
Rscript model.R train --data data.csv --config config.yaml

# Or read it manually (R):
config <- read_model_config("config.yaml")
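For a sense of what a parsed configuration looks like, the sketch below writes a small YAML file and reads it with yaml::yaml.load_file(), which the table above names as the parser. The file contents are hypothetical, and whether read_model_config() returns exactly this structure is an assumption.

```r
library(yaml)

# Write a hypothetical config file to a temporary location
config_path <- tempfile(fileext = ".yaml")
writeLines(c(
  "model:",
  "  params:",
  "    learning_rate: 0.01",
  "    epochs: 100"
), config_path)

# Nested YAML mappings become nested named lists in R
config <- yaml.load_file(config_path)
str(config)
```

Nested keys are then reachable as config$model$params$learning_rate, or through a helper that tolerates missing paths.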

Safe Parameter Extraction

config <- list(
  model = list(
    params = list(learning_rate = 0.01, epochs = 100)
  )
)

# Extract nested parameters with defaults
lr <- get_config_param(config, "model", "params", "learning_rate", .default = 0.001)
print(lr)
#> [1] 0.01

# Returns default if path not found
missing <- get_config_param(config, "model", "missing", .default = "default")
print(missing)
#> [1] "default"
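The behaviour shown above (walk a path of keys, fall back to a default when any step is missing) can be sketched in a few lines of base R. get_nested is a hypothetical stand-in written for illustration; it is not the SDK's implementation of get_config_param().

```r
# Hypothetical stand-in: follow the keys in `...` through nested lists and
# return `.default` as soon as one of them is missing.
get_nested <- function(config, ..., .default = NULL) {
  value <- config
  for (key in c(...)) {
    if (!is.list(value) || is.null(value[[key]])) {
      return(.default)
    }
    value <- value[[key]]
  }
  value
}

config <- list(
  model = list(
    params = list(learning_rate = 0.01, epochs = 100)
  )
)

get_nested(config, "model", "params", "learning_rate", .default = 0.001)
#> [1] 0.01

get_nested(config, "model", "missing", .default = "default")
#> [1] "default"
```

Using [[ rather than $ avoids partial name matching, so a key such as "param" does not silently match "params".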

Configuration Schema

Define a schema for validation and the info subcommand:

config_schema <- create_config_schema(
  title = "My Model Configuration",
  properties = list(
    n_samples = schema_integer(default = 100L, minimum = 1L),
    learning_rate = schema_number(default = 0.01, minimum = 0, maximum = 1)
  )
)

create_chap_cli(train_fn, predict_fn, model_config_schema = config_schema)
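Inside a model function, a schema property such as n_samples would typically be read from model_configuration. The sketch below assumes the configuration arrives as a named list keyed by the schema's property names, and it keeps its own fallback because it is not confirmed here whether the SDK fills in schema defaults. draw_samples is a hypothetical helper, and the Poisson draw follows the probabilistic output example earlier on this page.

```r
# Hypothetical helper: draw Poisson samples around each predicted mean,
# using n_samples from the configuration when it is present.
draw_samples <- function(mean_cases, model_configuration = list()) {
  n_samples <- model_configuration[["n_samples"]]
  if (is.null(n_samples)) {
    n_samples <- 100L  # same default as in the schema above
  }
  lapply(mean_cases, function(m) rpois(n_samples, lambda = m))
}

set.seed(1)
samples <- draw_samples(c(42, 17), list(n_samples = 250L))
lengths(samples)
#> [1] 250 250
```

The returned list has one numeric vector per predicted row, so it can be assigned directly to the samples list-column.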

Next Steps

Getting Help