# Testing ML Services
This guide covers how to test chapkit ML services during development.
## Using the `chapkit test` Command
The `chapkit test` command runs end-to-end tests against your ML service, verifying the complete workflow from config creation through training and prediction.

> **Note:** This command only appears when running `chapkit` from inside a chapkit project directory (a directory containing a `main.py` with chapkit imports).
### Basic Usage
First, start your service. The command below is a sketch assuming a FastAPI app exposed as `app` in `main.py`; adjust it to match your project:
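```bash
# Hypothetical launch command; substitute however your service starts
uv run uvicorn main:app --port 9090
```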
Then in another terminal, run the test:
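```bash
chapkit test
```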
### Auto-Starting the Service
Use `--start-service` to automatically start the service with an in-memory database:
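```bash
chapkit test --start-service
```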
This is the easiest way to test your service: it handles starting and stopping the service automatically.
### Command Options
| Option | Short | Default | Description |
|---|---|---|---|
| `--url` | `-u` | `http://localhost:9090` | Service URL |
| `--configs` | `-c` | `1` | Number of configs to create |
| `--trainings` | `-t` | `1` | Training jobs per config |
| `--predictions` | `-p` | `1` | Predictions per trained model |
| `--rows` | `-r` | `250` | Target rows in training data (locations x periods). Default is 50 periods x 5 locations (~4 years monthly, ~1 year weekly). Most epi / climate-health models expect >=2 years of training data; bump for weekly period types or long-context models. |
| `--predict-rows` | | `150` | Target rows in historic + future prediction data (each). Default is 30 periods x 5 locations (~2.5 years monthly). Enough headroom for lag-based models with up to ~6 target lags; bump for models with longer context windows or weekly period types. |
| `--timeout` | | `60.0` | Job completion timeout (seconds) |
| `--delay` | `-d` | `1.0` | Delay between job submissions (seconds) |
| `--verbose` | `-v` | `false` | Show detailed output |
| `--start-service` | | `false` | Auto-start service with in-memory DB |
| `--save-data` | | `false` | Save generated test data files |
| `--save-data-dir` | | `target` | Directory for saved test data |
| `--parallel` | | `1` | Number of jobs to run in parallel (experimental) |
| `--debug` | | `false` | Show full stack traces on errors |
| `--period-type` | | `monthly` | Period format: `monthly` (`YYYY-mm`) or `weekly` (`YYYY-Wxx`) |
| `--geo-type` | | `polygon` | Geometry type: `polygon` or `point` |
## Examples
Run a quick test with auto-start:
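```bash
chapkit test --start-service
```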
Run multiple configs, trainings, and predictions:
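```bash
chapkit test --start-service -c 2 -t 2 -p 3
```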
Test against a remote service:
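```bash
chapkit test --url http://remote-host:9090  # substitute your service's URL
```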
Save generated test data for inspection:
```bash
chapkit test --start-service --save-data
ls target/  # Contains JSON and CSV files for training/prediction data
```
The `--save-data` option creates:

- `config_*.json` - Configuration data
- `training_*.json` / `training_*.csv` - Training panel data
- `prediction_*_historic.json` / `.csv` - Historic data for prediction
- `prediction_*_future.json` / `.csv` - Future data for prediction
- `geo.json` - GeoJSON with polygon or point geometries (if the service requires geo)
Run jobs in parallel (experimental):
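```bash
chapkit test --start-service --parallel 4
```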
Use weekly periods instead of monthly:
```bash
chapkit test --start-service --period-type weekly --save-data
# Generates periods like 2020-W01, 2020-W02, etc.
```
For weekly models, bump the row counts so you still get >=2 years of context:
```bash
# 2 years weekly = 104 periods x 5 locations = 520 rows of training,
# with a 60-period (~1.15 year) historic+future window for prediction.
chapkit test --period-type weekly --rows 520 --predict-rows 300
```
Models that use lag-based feature transformations (e.g. multistep forecasters with `N_TARGET_LAGS > 6`) can fail at the predict stage if the historic data window is smaller than the training window, because the derived-feature column count won't match. If you see an error like `X in predict methods must have same columns as X in fit`, increase `--predict-rows` so the historic window has at least as many periods as the model's largest lag:
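```bash
# 300 rows = 60 periods x 5 locations; sized for lags well beyond 6 periods
chapkit test --start-service --predict-rows 300
```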
Use point geometries instead of polygons:
```bash
chapkit test --start-service --geo-type point --save-data
# Generates Point geometries instead of Polygon in geo.json
```
## Generated Data Structure
The test data generator creates panel data for climate-health correlation analysis:
```text
time_period, location, disease_cases, feature_0, feature_1, feature_2
2020-01, location_0, 42.0, 23.1, 45.2, 67.3
2020-01, location_1, 38.0, 25.3, 41.8, 62.1
2020-01, location_2, 51.0, 18.7, 52.1, 71.4
2020-02, location_0, 35.0, 21.4, 48.9, 65.8
...
```
- `time_period` - Monthly (`YYYY-mm`) or weekly (`YYYY-Wxx`) format
- `location` - Matches GeoJSON `properties.id` values
- `disease_cases` - Health outcome (positive integer as float)
- `feature_N` - Climate/covariate data
Training data uses periods starting from 2020; prediction future data uses periods starting from 2025.
## Manual Service Startup
For more control, you can start the service manually with specific configurations.
### Using In-Memory Database
For faster testing without persistent data, point `DATABASE_URL` at an in-memory SQLite database. The launch command below is a sketch assuming a uvicorn-served `main.py`; adjust it to match your project:
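```bash
# Hypothetical launch command; the DATABASE_URL value is the important part
DATABASE_URL="sqlite+aiosqlite:///:memory:" uv run uvicorn main:app --port 9090
```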
### Using a Test Database File
To persist test data for debugging, use a file-backed SQLite URL instead (same assumptions as above):
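```bash
# Hypothetical launch command; swap in a file-backed SQLite URL
DATABASE_URL="sqlite+aiosqlite:///test.db" uv run uvicorn main:app --port 9090
```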
## Testing with Docker

### Build and Run
```bash
docker build -t my-ml-service .
docker run -p 9090:8000 -e DATABASE_URL="sqlite+aiosqlite:///:memory:" my-ml-service
```
Then test from the host:
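```bash
chapkit test --url http://localhost:9090
```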
### Docker Compose
```yaml
# compose.test.yml
services:
  service:
    build: .
    environment:
      - DATABASE_URL=sqlite+aiosqlite:///:memory:
    ports:
      - "9090:8000"
```
## CI/CD Integration

### GitHub Actions Example
```yaml
name: Test ML Service

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - name: Install uv
        run: pip install uv
      - name: Install dependencies
        run: uv sync
      - name: Run ML service tests
        run: uv run chapkit test --start-service -c 2 -t 2 -p 5
```
## Troubleshooting

### Database Lock Errors
If you see SQLite "database is locked" errors when running many predictions:

- Use `--start-service`, which uses an in-memory database
- Or manually start with an in-memory database: `DATABASE_URL="sqlite+aiosqlite:///:memory:"`
- Increase the delay between jobs: `--delay 2`
### Service Not Ready
If the service takes a long time to start:

- The default wait timeout is 30 seconds
- Check service logs for startup errors
- Ensure all dependencies are installed
### Connection Refused
If you get "Cannot connect" errors:

- Verify the service is running
- Check the URL matches the service address
- Ensure no firewall is blocking the port