MLproject Runner¶
`chapkit mlproject run` turns any directory containing an MLflow-style MLproject file into a running chapkit service, with no code generation and no changes to the MLproject repo. It is aimed at users coming to chapkit with an existing `train.r` / `predict.py` (or any shell-command-driven model) who want a chapkit HTTP API around their scripts in seconds.
Quick Start¶
Point it at a directory. All three forms work:
```bash
chapkit mlproject run                         # uses current directory
chapkit mlproject run .                       # same
chapkit mlproject run /path/to/my_mlproject
```
This parses the MLproject file, translates the entry-point commands to chapkit's workspace conventions, builds a FastAPI service with `/api/v1/ml/$train` and `/api/v1/ml/$predict` endpoints, and serves on `127.0.0.1:8000` by default. Override the host and port with `--host` and `--port`.
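For example, to listen on all interfaces on a non-default port:

```bash
chapkit mlproject run . --host 0.0.0.0 --port 8080
```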
A minimal R MLproject like dhis2-chap/minimalist_example_r:
```yaml
name: minimalist_r

renv_env: renv.lock

entry_points:
  train:
    parameters:
      train_data: path
      model: str
    command: "Rscript train.r {train_data} {model}"
  predict:
    parameters:
      historic_data: path
      future_data: path
      model: str
      out_file: path
    command: "Rscript predict.r {model} {historic_data} {future_data} {out_file}"
```
...becomes a service that accepts `POST /api/v1/ml/$train` (with your CSV as a DataFrame payload), runs `Rscript train.r data.csv model` in an isolated workspace, and stores the resulting workspace as an artifact. Predict works the same way, re-entering the training workspace.
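Once it's up, you can exercise the endpoints directly. The exact request schema is owned by chapkit's ML API (FastAPI publishes interactive docs at `/docs`), and `train_request.json` below is a hypothetical payload file:

```bash
# Browse the generated OpenAPI docs at http://127.0.0.1:8000/docs for the
# real payload shape, then kick off a training job with your DataFrame payload
curl -X POST 'http://127.0.0.1:8000/api/v1/ml/$train' \
  -H 'Content-Type: application/json' \
  -d @train_request.json
```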
Canonical Parameter Mapping¶
`chapkit mlproject run` recognises the canonical MLproject parameter names used across chap-core-compatible models. Each is substituted with a fixed filename that matches chapkit's `ShellModelRunner` workspace layout:

| MLproject parameter | Substitutes to | Notes |
|---|---|---|
| `train_data` | `data.csv` | Training data CSV written by chapkit |
| `historic_data` | `historic.csv` | Historic data CSV during predict |
| `future_data` | `future.csv` | Future-period data CSV during predict |
| `out_file` | `predictions.csv` | Where your script writes predictions |
| `model` | `model` | Literal path; your script saves to and loads from it. The file (or directory) persists across train → predict via workspace copy. |
| `model_config` | `config.yml` | Config YAML chapkit writes from `user_options` + `prediction_periods` |
| `polygons` | `geo.json` | Optional GeoJSON for spatial models |
These names are the lingua franca used by chap-core (`chap_core/runners/command_line_runner.py`), so any MLproject that already runs under chap-core is expected to run under `chapkit mlproject run` without change.
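Concretely, for the minimalist R example above, the translation is plain placeholder substitution into those fixed workspace filenames:

```
train:   Rscript train.r {train_data} {model}
    ->   Rscript train.r data.csv model

predict: Rscript predict.r {model} {historic_data} {future_data} {out_file}
    ->   Rscript predict.r model historic.csv future.csv predictions.csv
```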
Overriding the Map¶
If your MLproject uses a non-canonical placeholder (e.g. `{dataset}`), provide an override at launch, as in this illustrative invocation:
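```bash
# Illustrative: point {dataset} at chapkit's canonical training CSV, and
# re-point {model} at a filename your scripts expect (model.bin is made up)
chapkit mlproject run . \
  --param dataset=data.csv \
  --param model=model.bin
```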
The `--param NAME=FILENAME` flag is repeatable. Overrides win over the canonical map, which lets you re-point `{model}` or any other parameter if your scripts expect a different filename.
Dynamic Config from user_options¶
MLproject `user_options` become typed fields on a chapkit `BaseConfig` subclass, generated at startup with `pydantic.create_model`. Given:
```yaml
name: ewars_template

user_options:
  n_lags:
    type: integer
    default: 3
    description: Number of lags to include in the model.
  precision:
    type: number
    default: 0.01
    description: Prior on the precision of fixed effects.
```
`chapkit mlproject run` builds an `ewars_templateConfig` with `n_lags: int = 3`, `precision: float = 0.01`, and the standard `prediction_periods: int = 3` injected automatically. Scripts read these values from `config.yml`, which chapkit writes to the workspace root before invoking your train/predict command.
Supported `type` values: `integer`/`int`, `number`/`float`, `string`/`str`, `boolean`/`bool`, `path` (treated as string). Unknown types fall back to `str`. Options without a default become required fields.
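For the MLproject above, the `config.yml` chapkit writes into the workspace would look roughly like this (a sketch: key order and exact formatting are chapkit's to decide):

```yaml
n_lags: 3
precision: 0.01
prediction_periods: 3
```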
Environment Hints¶
If your MLproject declares a runtime environment (`docker_env`, `renv_env`, `python_env`, `uv_env`, `conda_env`), `chapkit mlproject run` warns about it on startup but does not auto-activate it:
```
WARNING: chapkit mlproject run does not auto-activate environments.
  - uv_env: pyproject.toml
Activate the right runtime (R/renv, conda, Docker image, etc.) before launching
chapkit mlproject run, or invoke chapkit mlproject run from inside it.
```
Because `chapkit mlproject run` shells out `python ...` / `Rscript ...` via `ShellModelRunner`, the subprocess inherits whatever is on `PATH` at launch time. Invoking the chapkit entry point directly (e.g. `./.venv/bin/chapkit mlproject run .`) does not put `.venv/bin` on the subprocess `PATH`, so `import pandas` (or similar) will fail.
Recommended invocations:

- Python MLproject: `uv run chapkit mlproject run .` (preferred) or `source .venv/bin/activate && chapkit mlproject run .`
- R MLproject: run `chapkit mlproject run .` from inside an R-capable container or an `renv`-activated shell.
The published container images (below) set `PATH` correctly, so this is a non-issue there.
Running in a Container¶
The Dockerfiles and publish workflow live in the companion dhis2-chap/chapkit-images repo, so image builds don't block chapkit's own CI. Two variants are published per language flavour:

- `chapkit-py` / `chapkit-r` / `chapkit-r-tidyverse` / `chapkit-r-inla`: runtime base only (Python + uv, plus the R toolchain / R + tidyverse / R + INLA where relevant). Chapkit is not installed. Use these as the `FROM` for your own Dockerfile when your project's `pyproject.toml` already pins chapkit and a `uv sync` step installs it. This is what `chapkit init` and `chapkit mlproject migrate` produce. See the deployment guide for that flow.
- `chapkit-py-cli` / `chapkit-r-cli` / `chapkit-r-tidyverse-cli` / `chapkit-r-inla-cli`: same runtime base, with chapkit pre-installed. Use these for the "no local chapkit needed" workflow: `docker run` an MLproject directly via the bundled `chapkit run .` CMD, or invoke any other `chapkit` subcommand (e.g. `chapkit mlproject migrate`) without installing chapkit on the host.
The two-variant split exists because pre-installing chapkit in the runtime base caused a measurable runtime memory overhead for projects that pinned a different chapkit version than the base image: `uv sync` had to take an uninstall-plus-reinstall path with a heavier profile. Splitting keeps the install path uniform for `FROM`-style use, while preserving the convenience of `docker run` for CLI-style use.
The `-cli` images, all built on `debian:trixie-slim`:

| Image | Contents | Architectures | Typical size (amd64) |
|---|---|---|---|
| `ghcr.io/dhis2-chap/chapkit-py-cli:latest` | Python 3.13, chapkit, uv | linux/amd64, linux/arm64 | ~220 MB |
| `ghcr.io/dhis2-chap/chapkit-r-cli:latest` | R 4.5 + renv + pak, Python 3.13, chapkit, uv | linux/amd64, linux/arm64 | ~400 MB |
| `ghcr.io/dhis2-chap/chapkit-r-tidyverse-cli:latest` | R 4.5 + tidyverse + Posit forecasting / ML stack (fable, tsibble, feasts, forecast, ranger, xgboost, glmnet, lubridate, janitor, ...), Python 3.13, chapkit | linux/amd64, linux/arm64 | ~600 MB |
| `ghcr.io/dhis2-chap/chapkit-r-inla-cli:latest` | R 4.5 + INLA + spatial / time-series R stack (sf, spdep, dlnm, tsModel, sn, xgboost, fmesher, ...), Python 3.13, chapkit | linux/amd64 (INLA x86_64 only) | ~570 MB |
Each `-cli` image also publishes a `:dev` tag, rebuilt nightly with chapkit installed from the `main` branch instead of PyPI; use it to test against unreleased chapkit changes.
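For example, to run an MLproject against last night's `main` build (same mount convention as the `:latest` examples below):

```bash
docker run --rm -p 8000:8000 \
  -v "$(pwd):/work" \
  ghcr.io/dhis2-chap/chapkit-py-cli:dev
```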
Which one to pick:

- Python MLproject? Use `chapkit-py-cli`: lean and multi-arch.
- R MLproject with no preinstalled stack (e.g. `minimalist_example_r`)? Use `chapkit-r-cli`: multi-arch, includes the R toolchain and `renv`/`pak` so you can install additional CRAN packages or restore a lockfile at runtime.
- R MLproject using the tidyverse / forecasting stack? Use `chapkit-r-tidyverse-cli`: multi-arch, ships tidyverse + fable/tsibble/feasts/forecast + ranger/xgboost/glmnet/lubridate/janitor and friends.
- R MLproject that uses INLA (e.g. EWARS-style)? Use `chapkit-r-inla-cli`: INLA + fmesher + the chap-core R-model parity set (sf, spdep, sn, dlnm, tsModel, xgboost, ...) are pre-installed. amd64 only; you will need Rosetta emulation on Apple Silicon.
All four set `WORKDIR /work` and default to `CMD ["chapkit", "run", ".", "--host", "0.0.0.0", "--port", "8000"]`, so mounting your MLproject into `/work` is enough:
```bash
# Python model
docker run --rm -p 8000:8000 \
  -v "$(pwd):/work" \
  ghcr.io/dhis2-chap/chapkit-py-cli:latest

# R model on a minimal R image (multi-arch)
docker run --rm -p 8000:8000 \
  -v "$(pwd):/work" \
  ghcr.io/dhis2-chap/chapkit-r-cli:latest

# R model with tidyverse / forecasting stack (multi-arch)
docker run --rm -p 8000:8000 \
  -v "$(pwd):/work" \
  ghcr.io/dhis2-chap/chapkit-r-tidyverse-cli:latest

# R model with INLA (amd64-only; Rosetta on Apple Silicon)
docker run --rm -p 8000:8000 --platform=linux/amd64 \
  -v "$(pwd):/work" \
  ghcr.io/dhis2-chap/chapkit-r-inla-cli:latest
```
The same images run any other chapkit subcommand without local install — for example, migrating an existing MLproject directory:
```bash
docker run --rm -v "$(pwd):/work" \
  ghcr.io/dhis2-chap/chapkit-py-cli:latest \
  chapkit mlproject migrate . --yes
```
Model-Level Dependencies¶
The `-cli` images contain chapkit, Python, and (for `chapkit-r-cli`) the R toolchain (`renv` + `pak`, no preinstalled R packages); `chapkit-r-tidyverse-cli` and `chapkit-r-inla-cli` add the tidyverse / forecasting and INLA / spatial stacks respectively. They do not ship your model's extra dependencies (pandas, scikit-learn, additional CRAN packages, etc.). Add them in one of two ways:
- Bake your own image (production-recommended): see the Dockerfile sketch after this list.
- Install at container start (quick dev): see the one-liner after this list.
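A minimal sketch of the bake route, assuming a Python model and using `uv pip install --system` to target the image's interpreter; the package names and the `my-model` tag are placeholders:

```bash
# Write an illustrative Dockerfile next to your MLproject
cat > Dockerfile <<'EOF'
FROM ghcr.io/dhis2-chap/chapkit-py-cli:latest
# Bake model-level Python deps into the image (placeholders)
RUN uv pip install --system pandas scikit-learn
EOF

docker build -t my-model .
docker run --rm -p 8000:8000 -v "$(pwd):/work" my-model
```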
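And a quick-dev sketch that installs at container start by overriding the default CMD (again, package names are placeholders):

```bash
# Install deps into the container's environment, then serve as the CMD would
docker run --rm -p 8000:8000 -v "$(pwd):/work" \
  ghcr.io/dhis2-chap/chapkit-py-cli:latest \
  sh -c 'uv pip install --system pandas scikit-learn && chapkit run . --host 0.0.0.0 --port 8000'
```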
Security¶
The `-cli` images currently run as root. Non-root hardening needs the usual volume-mapping dance (writable `/tmp`, per-user cache dirs, etc.; see chap-core/compose.yml for the reference pattern) and is a planned follow-up. The images are intended to sit in a trusted compose network behind chap-core.
Integration with chap-core¶
`chapkit mlproject run` is designed to sit on a docker compose network alongside chap-core. Self-registration with chap-core is handled by servicekit's `SERVICEKIT_ORCHESTRATOR_URL` environment variable (the same mechanism used by chap-core/compose.ewars.yml):
```yaml
services:
  my-model:
    image: ghcr.io/dhis2-chap/chapkit-r-cli:latest
    platform: linux/amd64
    volumes:
      - ./my_mlproject:/work
    environment:
      SERVICEKIT_ORCHESTRATOR_URL: http://chap:8000/v2/services/$$register
      # Optional shared secret, if chap has SERVICEKIT_REGISTRATION_KEY set:
      # SERVICEKIT_REGISTRATION_KEY: ${SERVICEKIT_REGISTRATION_KEY:-}
    depends_on:
      chap:
        condition: service_healthy
    networks:
      - chap
```
No chapkit-side configuration is needed: if `SERVICEKIT_ORCHESTRATOR_URL` is set, the service registers itself with chap-core on startup.
Limitations and When to Prefer chapkit mlproject migrate or chapkit init¶
chapkit mlproject run is a thin runtime wrapper: it does not generate, edit, or version any files in your MLproject. That keeps it ideal for:
- Rapid evaluation of an existing MLproject under chapkit.
- Running a model in a docker-compose network alongside chap-core without porting it to chapkit.
- Models whose train/predict logic is stable and already well-tested outside chapkit.
When you're ready to own the service code (commit it, extend it, ship it as your own image), reach for `chapkit mlproject migrate`: it's the code-generating sibling of `run` that adopts your MLproject in place and produces a committable `main.py`, `Dockerfile`, `pyproject.toml` (with your deps merged in), `compose.yml`, and `CHAPKIT.md`. Your train/predict scripts stay put; only chapkit-owned metadata and chaff move to `_old/`.
Use `chapkit init` instead when you want to start a greenfield chapkit project: no existing MLproject to adopt, full template choice (`fn-py`, `shell-py`, `shell-r`, `shell-r-tidyverse`, `shell-r-inla`), and validation-hook stubs.
| You have… | Use |
|---|---|
| An MLproject you want to quickly evaluate | `chapkit mlproject run` |
| An MLproject you want to own as a chapkit project | `chapkit mlproject migrate` |
| Nothing yet | `chapkit init` |