MLproject Configuration¶
Note: We have future plans of going away from using MLproject files for configuring models, and instead use the new chapkit framework. This document describes the current implementation using MLproject files.
MLproject files define model templates in CHAP. They specify the model name, execution environment, and entry points for training and prediction.
MLproject File Structure¶
MLproject files use YAML format with the following fields:
| Field | Required | Description |
|---|---|---|
name |
Yes | Model identifier |
| Environment | Yes (one of) | docker_env, python_env, uv_env, renv_env, or rest_api_url |
entry_points |
Yes | Train and predict commands |
user_options |
No | Configurable parameters exposed to users |
meta_data |
No | Display name, author, description, status |
required_covariates |
No | List of required covariate names |
min_prediction_length |
No | Minimum prediction horizon |
max_prediction_length |
No | Maximum prediction horizon |
Note that defining rest_api_url is experimental, and is used for using MLproject files to configure chapkit models that run via REST API calls.
Example¶
From external_models/naive_python_model_with_mlproject_file_and_docker/MLproject:
name: naive_python
docker_env:
image: python:3.13
entry_points:
train:
parameters:
train_data: str
model: str
command: "python train.py {train_data} {model}"
predict:
parameters:
historic_data: str
future_data: str
model: str
out_file: str
command: "python predict.py {model} {historic_data} {future_data} {out_file}"
user_options:
some_option:
title: some_option
type: integer
default: '10'
description: "Some option for the model"
Parsing Flow¶
The following describes how the chap-core codebase parses MLproject files from local paths or GitHub URLs, and how we internally represent the information.
Local Files¶
get_model_template_from_mlproject_file() in chap_core/models/utils.py:
This function validates against ModelTemplateConfigV2 using Pydantic and returns a ModelTemplate instance.
GitHub URLs¶
fetch_mlproject_content() in chap_core/external/github.py:
- Parses URL to extract owner, repo name, and commit/branch
- Constructs raw GitHub URL:
https://raw.githubusercontent.com/{owner}/{repo}/{commit}/MLproject - Fetches and returns the YAML content from the MLproject file.
Class Representation¶
Core Classes (chap_core/external/model_configuration.py)¶
-
ModelTemplateConfigV2- Main config class that combines all MLproject fields. Inherits fromModelTemplateConfigCommonandRunnerConfig. -
RunnerConfig- Environment settings. This is used to define the environment in which the model will run. It includes one of the following fields: entry_points: EntryPointConfigdocker_env: DockerEnvConfigpython_env: struv_env: str-
renv_env: str -
EntryPointConfig- Containstrainandpredictcommands asCommandConfigobjects -
CommandConfig- Single command withcommand: strand optionalparameters: dict
Metadata Classes (chap_core/database/model_templates_and_config_tables.py)¶
-
ModelTemplateMetaData- Display information:display_name,author,description,author_assessed_status,organization,contact_email,citation_info -
ModelTemplateInformation- Technical details:supported_period_type,user_options,required_covariates,min_prediction_length,max_prediction_length,target,allow_free_additional_continuous_covariates
Database Storage¶
ModelTemplateDB (chap_core/database/model_templates_and_config_tables.py:47)¶
Stores parsed MLproject data. Inherits from ModelTemplateMetaData and ModelTemplateInformation.
Key fields:
- name: str - Unique model identifier
- source_url: str - GitHub URL or local path
- version: str - Version string
- archived: bool - Whether the template is archived
ConfiguredModelDB (chap_core/database/model_templates_and_config_tables.py:65)¶
Stores configured model instances with specific parameter values.
Key fields:
- name: str - Unique configuration name
- model_template_id: int - Foreign key to ModelTemplateDB
- user_option_values: dict - User-specified option values
- additional_continuous_covariates: list - Extra covariates for this configuration
Runner Selection¶
get_train_predict_runner_from_model_template_config() in chap_core/runners/helper_functions.py:17-96 selects the appropriate runner based on environment configuration:
| Environment Field | Runner Class |
|---|---|
docker_env |
DockerTrainPredictRunner |
uv_env |
UvTrainPredictRunner |
renv_env |
RenvTrainPredictRunner |
python_env |
MlFlowTrainPredictRunner |
| None | CommandLineTrainPredictRunner |
The runner handles executing the train and predict commands in the appropriate environment.