Describing your model in our yaml-based format

To make your model CHAP-compatible, your train and predict endpoints (as discussed here) need to be formally defined in a YAML format that follows the popular MLflow standard. Your codebase needs to contain a file named MLproject that defines the following:

  • An entry point called train with parameters train_data and model
  • An entry point called predict with parameters historic_data, future_data, model and out_file

These should contain commands that can be run to train a model and predict the future using that model. The model parameter should be used to save a model in the train step that can be read and used in the predict step. CHAP will provide all the data (the other parameters) when running a model.

Here is an example of a valid MLproject file (taken from our minimalist_example).
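A minimal MLproject along those lines might look like the following sketch (the file names, environment file, and commands are illustrative, not the exact contents of minimalist_example):

```yaml
name: minimalist_example

python_env: python_env.yml

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python train.py {train_data} {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "python predict.py {model} {historic_data} {future_data} {out_file}"
```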

The MLproject file can specify a docker image, Python virtual environment, uv-managed environment, or renv environment (for R models) that will be used when running the commands. An example of this is the MLproject file contained within our minimalist_example_r.

Environment options

Docker environment

Use docker_env to specify a Docker image:

docker_env:
  image: python:3.11

MLflow/Conda environment

Use python_env to point to a conda/pip environment file, which MLflow manages:

python_env: python_env.yml
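The referenced file follows MLflow's python_env format; a sketch might look like this (the Python version and package names are placeholders):

```yaml
python: "3.11"
build_dependencies:
  - pip
dependencies:
  - pandas
  - scikit-learn
```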

uv environment

Use uv_env to specify a pyproject.toml for uv-managed environments. This is useful for models that use uv for dependency management:

uv_env: pyproject.toml

Commands will be executed via uv run, which automatically handles the virtual environment. Make sure your model directory contains a valid pyproject.toml with dependencies specified. See the example uv model for a complete example.
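A minimal pyproject.toml for this setup could look like the following (the project name and dependencies are placeholders):

```toml
[project]
name = "my-model"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "pandas",
    "scikit-learn",
]
```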

Example MLproject file with uv:

name: my_model
uv_env: pyproject.toml
entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python main.py train {train_data} {model}"
  predict:
    parameters:
      model: str
      historic_data: str
      future_data: str
      out_file: str
    command: "python main.py predict {model} {historic_data} {future_data} {out_file}"
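To sketch what the main.py referenced by these commands might look like, here is a minimal CLI skeleton using argparse. The file layout and the trivial "model" artifact are assumptions for illustration; a real model would fit and serialize actual parameters in train and produce real forecasts in predict:

```python
import argparse
import json
import sys

def train(train_data, model):
    # Hypothetical training step: record where the data came from so that
    # predict() can load the artifact later. A real model would fit and
    # serialize its parameters here.
    with open(model, "w") as f:
        json.dump({"trained_on": train_data}, f)

def predict(model, historic_data, future_data, out_file):
    # Load the artifact written by train() and write predictions to out_file.
    with open(model) as f:
        artifact = json.load(f)
    with open(out_file, "w") as f:
        f.write(f"predictions from model trained on {artifact['trained_on']}\n")

def main(argv=None):
    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="cmd", required=True)
    t = sub.add_parser("train")
    t.add_argument("train_data")
    t.add_argument("model")
    p = sub.add_parser("predict")
    p.add_argument("model")
    p.add_argument("historic_data")
    p.add_argument("future_data")
    p.add_argument("out_file")
    args = parser.parse_args(argv)
    if args.cmd == "train":
        train(args.train_data, args.model)
    else:
        predict(args.model, args.historic_data, args.future_data, args.out_file)

if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

The positional arguments mirror the parameter order in the MLproject commands above, so CHAP can substitute its file paths directly into {train_data}, {model}, and the other placeholders.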

renv environment (for R models)

Use renv_env to specify an renv.lock file for R models that use renv for dependency management:

renv_env: renv.lock

When CHAP runs your model, it will automatically:

  1. Look for the renv.lock file in your model directory
  2. Run renv::restore(prompt = FALSE) to install all required R packages
  3. Execute your R commands with the restored environment

Your model directory should contain:

  • renv.lock - The lockfile specifying exact package versions (generated by renv::snapshot())
  • renv/ directory - Contains renv activation scripts
  • .Rprofile - Auto-activates renv when R starts (typically contains source("renv/activate.R"))

Example MLproject file with renv:

name: my_r_model
renv_env: renv.lock
entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "Rscript main.R train --train_data {train_data} --model {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "Rscript main.R predict --model {model} --historic_data {historic_data} --future_data {future_data} --out_file {out_file}"

Setting up renv for your R model

  1. Initialize renv in your R project:

    renv::init()
    

  2. Install your required packages:

    renv::install("dplyr")
    renv::install("argparser")
    # ... other packages
    

  3. Create the lockfile:

    renv::snapshot()
    

This creates renv.lock with exact versions of all dependencies, ensuring reproducible environments.
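For reference, renv.lock is a JSON file along these lines (the R version, package versions, and repository details here are illustrative):

```json
{
  "R": {
    "Version": "4.3.2",
    "Repositories": [
      {"Name": "CRAN", "URL": "https://cloud.r-project.org"}
    ]
  },
  "Packages": {
    "dplyr": {
      "Package": "dplyr",
      "Version": "1.1.4",
      "Source": "Repository",
      "Repository": "CRAN"
    }
  }
}
```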

See the minimalist R model example for a complete working example.

Model configuration options

You can define configurable parameters in your MLproject file using user_options. This allows users to customize model behavior when running your model, without modifying the model code itself.

Schema structure

Each option in user_options has the following fields:

  • title: Display name for the parameter
  • type: One of string, integer, number, boolean, or array
  • description: What the parameter does
  • default: Optional default value. If omitted, the parameter is required

Example MLproject with user_options

name: my_model

docker_env:
  image: python:3.11

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python train.py {train_data} {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "python predict.py {model} {historic_data} {future_data} {out_file}"

user_options:
  n_lag_periods:
    title: n_lag_periods
    type: integer
    default: 3
    description: "Number of lag periods to include in the model"
  learning_rate:
    title: learning_rate
    type: number
    description: "Learning rate for training (required)"

Providing configuration values

Configuration values can be provided via the --model-configuration-yaml CLI flag when running evaluate2 or other commands:

chap evaluate2 my_model data.csv results.nc --model-configuration-yaml config.yaml

The configuration YAML file should contain the parameter values:

n_lag_periods: 5
learning_rate: 0.01

Validation rules

  • Options without a default value are required and must be provided
  • Only options defined in user_options are allowed in the configuration file
  • Values must match the specified type (e.g., integers for integer type)
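The rules above could be checked with logic along the following lines. This is a sketch, not CHAP's actual implementation; the function name and the mapping from option types to Python types are assumptions:

```python
# Hypothetical mapping from user_options types to Python types.
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
}

def validate_config(user_options, config):
    """Check a configuration dict against a user_options schema dict."""
    errors = []
    # Rule 1: options without a default value are required.
    for name, spec in user_options.items():
        if "default" not in spec and name not in config:
            errors.append(f"missing required option: {name}")
    # Rules 2 and 3: only declared options are allowed, and each value
    # must match the declared type.
    for name, value in config.items():
        if name not in user_options:
            errors.append(f"unknown option: {name}")
            continue
        declared = user_options[name]["type"]
        # bool is a subclass of int in Python, so reject it explicitly
        # for integer/number options.
        if isinstance(value, bool) and declared != "boolean":
            errors.append(f"{name}: expected {declared}, got boolean")
        elif not isinstance(value, TYPE_MAP[declared]):
            errors.append(f"{name}: expected {declared}")
    return errors
```

Run against the example schema above, `validate_config(schema, {"learning_rate": 0.01})` passes, while omitting learning_rate or passing a string for n_lag_periods produces errors.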

Examples in the codebase

See the following examples that use user_options: