MLproject Migrate¶
chapkit mlproject migrate turns an existing MLflow-style MLproject directory into a first-class chapkit project in place. It's the code-generating sibling of chapkit mlproject run: where run adapts an MLproject at runtime without writing anything, migrate gives you a main.py, a Dockerfile pointing at the right chapkit-images base, a pyproject.toml, a compose.yml, and a CHAPKIT.md that you can commit, extend, and ship.
Your existing train/predict scripts, helpers, and arbitrary config / data files are not moved or modified. Project metadata that chapkit regenerates (README.md, .gitignore, pyproject.toml, Dockerfile, compose.yml, .python-version, Makefile, the MLproject itself), plus obvious chaff (input/, output/, example_data*/, isolated_run.*, stale data CSVs, serialised model files, renv/), are swept into _old/ so nothing is lost — your original repo state is fully recoverable from there.
Before and after migrate: the lifecycle checklist
The more your MLproject tells chapkit up front (required_covariates, user_options, pinned Python deps, service flags), the less hand-editing you need afterwards. The Migration Checklist walks through both the pre-migration prep and the post-migration smoke-test / iterate / ship steps — ten minutes there saves the "why did chapkit test fail" loop after.
When to use which¶
| Command | Writes files | Good for |
|---|---|---|
chapkit mlproject run |
No | Evaluating an MLproject against chapkit's API without changes to the repo. |
chapkit mlproject migrate |
Yes (at project root) | Turning an MLproject into a chapkit project you own, commit, and containerise. |
chapkit init |
Yes (creates a new dir) | Greenfield chapkit project from scratch. |
Quick start¶
cd /path/to/your/mlproject
chapkit mlproject migrate --dry-run # preview the plan without touching anything
chapkit mlproject migrate # execute with interactive prompts
chapkit mlproject migrate --yes # execute non-interactively (scripts / CI)
No local chapkit install? Run it via the -cli container (chapkit pre-installed):
docker run --rm -v "$(pwd):/work" \
ghcr.io/dhis2-chap/chapkit-py-cli:latest \
chapkit mlproject migrate . --yes
After a successful run, the project root holds your unchanged source files alongside the generated chapkit wrapper, with the original structure preserved under _old/.
What moves where¶
| Kind | Destination | Examples |
|---|---|---|
| Original MLproject definition | _old/ |
MLproject, MLproject.yaml |
| User's project metadata (chapkit regenerates these) | _old/ |
README.md, .gitignore, .dockerignore, .python-version, pyproject.toml, Makefile, Dockerfile, compose.yml, CHAPKIT.md |
| Built-for-dev runners | _old/ |
isolated_run.r, isolated_run.R, evaluate_one_step.py, slides.md |
| Example / demo data | _old/ |
input/, output/, example_data/, example_data_monthly/, root-level training_data*.csv, predictions*.csv, future_data*.csv, historic_data*.csv, *.pickle, *.rds, *.model |
| R environment-local state | _old/ |
renv/, .Rprofile |
| Source code, user data, lockfiles, LICENSE | kept at root | train.r, predict.r, train.py, predict.py, lib.R, transformations.py, Python packages, arbitrary *.yaml / *.json / *.toml config your scripts read, renv.lock, uv.lock, LICENSE, .github/ |
| Git / venv / caches | ignored | .git/, .venv/, __pycache__/, .pytest_cache/, .ruff_cache/ |
Arbitrary files at the project root that aren't in the chaff list stay put — scripts that read schema.yaml, model_params.json, etc. at startup continue to work. The user's original pyproject.toml lands at _old/pyproject.toml; merge its dependencies into the generated chapkit pyproject.toml manually.
Base image auto-detection¶
chapkit mlproject migrate picks the right base image from chapkit-images by scanning your repo:
- Python only (
.pyat root) →ghcr.io/dhis2-chap/chapkit-py:latest(multi-arch, lean). - R + INLA (
library(INLA)/library(fmesher)in any root R script, ordocker_r_inlain MLprojectdocker_env) →ghcr.io/dhis2-chap/chapkit-r-inla:latest(amd64only). - R + tidyverse (any of
library(tidyverse),dplyr,ggplot2,tidyr,readr,purrr,tibble,fable,tsibble,feasts,forecast,ranger,xgboost,glmnet,lubridate,janitorand no INLA) →ghcr.io/dhis2-chap/chapkit-r-tidyverse:latest(multi-arch). Detection coverslibrary/require/requireNamespace, single or double quotes, and trailing arguments —library(fable),library('fable'),library(fable, quietly = TRUE), andrequireNamespace("fable")all count. - Both R and Python with tidyverse hints → still
chapkit-r-inla(the only image bundling both runtimes), but migrate flags the missing tidyverse stack so you can bake it in viainstall_packages.Rorrenv.lock. - R, no INLA or tidyverse hints →
ghcr.io/dhis2-chap/chapkit-r:latest(multi-arch, minimal). - Both R and Python →
chapkit-r-inla(fat image covers both); migrate prompts to confirm unless--yesis set.
Override the detection with --base-image {chapkit-py,chapkit-r,chapkit-r-tidyverse,chapkit-r-inla}.
Dynamic config becomes a typed class¶
Whatever you declared in user_options in the MLproject is emitted as typed fields on a BaseConfig subclass in the generated main.py. For an EWARS-style MLproject:
...you get:
class ewars_templateConfig(BaseConfig):
"""Configuration for ewars_template."""
prediction_periods: int = 3
n_lags: int = 3
precision: float = 0.01
prediction_periods: int = 3 is injected automatically if your MLproject doesn't declare it — chapkit's ML runner requires it.
Interactive prompts¶
By default migrate asks for confirmation before executing the plan, asks for missing filename mappings if a placeholder like {dataset} isn't in the canonical set (train_data, historic_data, future_data, out_file, model, model_config, polygons), and asks before accepting an ambiguous base-image pick. --yes skips all prompts with the sensible defaults; --dry-run never prompts.
Use --param NAME=FILENAME (repeatable) to pre-answer the non-canonical-parameter prompt without going interactive.
Your own pyproject.toml¶
If your repo already has a pyproject.toml, migrate:
- Parses
[project.dependencies]from it (skipping anychapkitentry — we add our own). - Writes a fresh chapkit
pyproject.tomlat the project root that includes bothchapkitand the deps pulled from your original. - Moves your original
pyproject.tomlto_old/pyproject.tomlso it's still available for reference (e.g. if you had optional-dep groups,[tool.*]config, or environment markers you want to port over manually).
The migrate output summary tells you how many deps were merged, and points at the preserved original file.
Next steps after migrate¶
uv sync && uv run python main.py # local dev server
docker build -t my-model . && docker run --rm -p 9090:8000 my-model # containerised
uv run python main.py & # in background
uv run chapkit test # drive an end-to-end train+predict
git add -A && git commit -m "chore: migrate MLproject to chapkit"
CHAPKIT.md (generated at root) has a short version of this for the migrated repo itself.
What's deferred¶
--restructureflag for thescripts/+ top-level Python package layout manually applied by some existing chap-models repos. v1 leaves source files where they are.--with-validationmirroring thechapkit initflag.chapkit mlproject migrate --reverseto restore from_old/in one shot.git mvmode to preserve history for moved files.