Describing your model in our yaml-based format¶
To make your model CHAP-compatible, your train and predict entry points (as discussed here) need to be formally defined in a YAML format that follows the popular MLflow standard.
Your codebase needs to contain a file named MLproject that defines the following:
- An entry point in the MLproject file called train with parameters train_data and model
- An entry point in the MLproject file called predict with parameters historic_data, future_data, model and out_file
These entry points should contain commands that can be run to train a model and to make predictions with that model. The model parameter should be used in the train step to save a model that can then be read and used in the predict step. CHAP will provide all the data (the other parameters) when running a model.
Here is an example of a valid MLproject file (taken from our minimalist_example).
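The sketch below shows the general shape of such a file (the script name and commands are illustrative; the environment options are covered in the sections that follow):

```yaml
name: minimalist_example

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python main.py train {train_data} {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "python main.py predict {model} {historic_data} {future_data} {out_file}"
```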
The MLproject file can specify a docker image, Python virtual environment, uv-managed environment, or renv environment (for R models) that will be used when running the commands. An example of this is the MLproject file contained within our minimalist_example_r.
Environment options¶
Docker environment¶
Use docker_env to specify a Docker image:
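For example (the image name is illustrative):

```yaml
docker_env:
  image: python:3.11
```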
MLflow/Conda environment¶
Use python_env to specify a conda/pip environment file (uses MLflow to manage):
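For example (the file name is illustrative), pointing at an environment file in MLflow's python_env format:

```yaml
python_env: python_env.yaml
```

where python_env.yaml specifies a Python version and pip dependencies, e.g. a `python:` version string and a `dependencies:` list.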
uv environment¶
Use uv_env to specify a pyproject.toml for uv-managed environments. This is useful for models that use uv for dependency management:
Commands will be executed via uv run, which automatically handles the virtual environment. Make sure your model directory contains a valid pyproject.toml with dependencies specified. See the example uv model for a complete example.
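A minimal pyproject.toml for such a model might look like this (project name and dependencies are illustrative):

```toml
[project]
name = "my-model"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = ["pandas"]
```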
Example MLproject file with uv:
```yaml
name: my_model
uv_env: pyproject.toml

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python main.py train {train_data} {model}"
  predict:
    parameters:
      model: str
      historic_data: str
      future_data: str
      out_file: str
    command: "python main.py predict {model} {historic_data} {future_data} {out_file}"
```
renv environment (for R models)¶
Use renv_env to specify an renv.lock file for R models that use renv for dependency management:
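In the MLproject file this is a single line pointing at the lockfile:

```yaml
renv_env: renv.lock
```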
When CHAP runs your model, it will automatically:
- Look for the `renv.lock` file in your model directory
- Run `renv::restore(prompt = FALSE)` to install all required R packages
- Execute your R commands with the restored environment
Your model directory should contain:
- `renv.lock` - The lockfile specifying exact package versions (generated by `renv::snapshot()`)
- `renv/` directory - Contains renv activation scripts
- `.Rprofile` - Auto-activates renv when R starts (typically contains `source("renv/activate.R")`)
Example MLproject file with renv:
```yaml
name: my_r_model
renv_env: renv.lock

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "Rscript main.R train --train_data {train_data} --model {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "Rscript main.R predict --model {model} --historic_data {historic_data} --future_data {future_data} --out_file {out_file}"
```
Setting up renv for your R model¶
1. Initialize renv in your R project:
2. Install your required packages:
3. Create the lockfile:
This creates renv.lock with exact versions of all dependencies, ensuring reproducible environments.
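Run from your model directory, the three steps above might look like this (the installed package is illustrative; `renv::init()` and `renv::snapshot()` are standard renv functions):

```shell
# 1. Initialize renv in the project (creates renv/ and .Rprofile)
Rscript -e 'renv::init()'

# 2. Install the packages your model needs (example package)
Rscript -e 'install.packages("dplyr")'

# 3. Snapshot the installed packages into renv.lock
Rscript -e 'renv::snapshot()'
```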
See the minimalist R model example for a complete working example.
Model Configuration Options¶
You can define configurable parameters in your MLproject file using user_options. This allows users to customize model behavior when running your model, without modifying the model code itself.
Schema structure¶
Each option in user_options has the following fields:
- `title`: Display name for the parameter
- `type`: One of `string`, `integer`, `number`, `boolean`, or `array`
- `description`: What the parameter does
- `default`: Optional default value. If omitted, the parameter is required
Example MLproject with user_options¶
```yaml
name: my_model
docker_env:
  image: python:3.11

entry_points:
  train:
    parameters:
      train_data: str
      model: str
    command: "python train.py {train_data} {model}"
  predict:
    parameters:
      historic_data: str
      future_data: str
      model: str
      out_file: str
    command: "python predict.py {model} {historic_data} {future_data} {out_file}"

user_options:
  n_lag_periods:
    title: n_lag_periods
    type: integer
    default: 3
    description: "Number of lag periods to include in the model"
  learning_rate:
    title: learning_rate
    type: number
    description: "Learning rate for training (required)"
```
Providing configuration values¶
Configuration values can be provided via the --model-configuration-yaml CLI flag when running evaluate2 or other commands:
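A hypothetical invocation (only the `--model-configuration-yaml` flag and the `evaluate2` command are documented here; other arguments are elided):

```shell
chap evaluate2 ... --model-configuration-yaml my_config.yaml
```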
The configuration YAML file should contain the parameter values:
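For the user_options example above, the file could look like this (the values are illustrative):

```yaml
n_lag_periods: 5
learning_rate: 0.01
```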
Validation rules¶
- Options without a `default` value are required and must be provided
- Only options defined in `user_options` are allowed in the configuration file
- Values must match the specified type (e.g., integers for `integer` type)
Examples in the codebase¶
See the following examples that use user_options: