
Testing ML Services

This guide covers how to test chapkit ML services during development.

Using the chapkit test Command

The chapkit test command runs end-to-end tests against your ML service, verifying the complete workflow from config creation through training and prediction.

Note: This command only appears when running chapkit from inside a chapkit project directory (a directory containing main.py with chapkit imports).

Basic Usage

First, start your service:

uv run python main.py

Then in another terminal, run the test:

chapkit test

Auto-Starting the Service

Use --start-service to automatically start the service with an in-memory database:

chapkit test --start-service

This is the easiest way to test your service - it handles starting and stopping the service automatically.

Command Options

--url, -u (default: http://localhost:9090)
    Service URL
--configs, -c (default: 1)
    Number of configs to create
--trainings, -t (default: 1)
    Training jobs per config
--predictions, -p (default: 1)
    Predictions per trained model
--rows, -r (default: 250)
    Target rows in training data (locations x periods). The default is
    50 periods x 5 locations (~4 years monthly, ~1 year weekly). Most
    epi / climate-health models expect >=2 years of training data; bump
    this for weekly period types or long-context models.
--predict-rows (default: 150)
    Target rows in historic and future prediction data (each). The
    default is 30 periods x 5 locations (~2.5 years monthly). Enough
    headroom for lag-based models with up to ~6 target lags; bump this
    for models with longer context windows or weekly period types.
--timeout (default: 60.0)
    Job completion timeout in seconds
--delay, -d (default: 1.0)
    Delay between job submissions in seconds
--verbose, -v (default: false)
    Show detailed output
--start-service (default: false)
    Auto-start the service with an in-memory database
--save-data (default: false)
    Save generated test data files
--save-data-dir (default: target)
    Directory for saved test data
--parallel (default: 1)
    Number of jobs to run in parallel (experimental)
--debug (default: false)
    Show full stack traces on errors
--period-type (default: monthly)
    Period format: monthly (YYYY-mm) or weekly (YYYY-Wxx)
--geo-type (default: polygon)
    Geometry type: polygon or point

Examples

Run a quick test with auto-start:

chapkit test --start-service

Run multiple configs, trainings, and predictions:

chapkit test --start-service -c 2 -t 2 -p 5 -v

Test against a remote service:

chapkit test --url http://my-service:8000

Save generated test data for inspection:

chapkit test --start-service --save-data
ls target/  # Contains JSON and CSV files for training/prediction data

The --save-data option creates:

  • config_*.json - Configuration data
  • training_*.json / training_*.csv - Training panel data
  • prediction_*_historic.json / .csv - Historic data for prediction
  • prediction_*_future.json / .csv - Future data for prediction
  • geo.json - GeoJSON with polygon or point geometries (if the service requires geo)
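Once saved, you can group the files by kind for a quick overview. This is an illustrative sketch, not part of chapkit; it only assumes the filename patterns listed above:

```python
from pathlib import Path

def list_saved_data(directory: str = "target") -> dict[str, list[str]]:
    """Group files written by --save-data by their name prefix
    (config, training, prediction, geo)."""
    groups: dict[str, list[str]] = {}
    for path in sorted(Path(directory).glob("*")):
        # "config_<id>.json" -> "config"; "geo.json" -> "geo"
        prefix = path.name.split("_")[0].removesuffix(".json")
        groups.setdefault(prefix, []).append(path.name)
    return groups
```

Calling list_saved_data() after a --save-data run prints one bucket per artifact kind, which makes it easy to spot a missing geo.json or prediction file.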

Run jobs in parallel (experimental):

chapkit test --start-service -c 2 -t 4 -p 4 --parallel 4

Use weekly periods instead of monthly:

chapkit test --start-service --period-type weekly --save-data
# Generates periods like 2020-W01, 2020-W02, etc.

For weekly models, bump the row counts so you still get >=2 years of context:

# 2 years weekly = 104 periods x 5 locations = 520 rows of training,
# with a 60-period (~1.15 year) historic+future window for prediction.
chapkit test --period-type weekly --rows 520 --predict-rows 300
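The row targets above are just periods times locations. A small helper (illustrative only, not a chapkit function) makes the arithmetic explicit:

```python
def target_rows(n_periods: int, n_locations: int = 5) -> int:
    """Panel rows = periods per location times number of locations."""
    return n_periods * n_locations

# Defaults: 50 monthly periods x 5 locations = 250 training rows.
# Two years of weekly periods (104) x 5 locations = 520 training rows.
print(target_rows(50), target_rows(104))
```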

Models that use lag-based feature transformations (e.g. multistep forecasters with N_TARGET_LAGS > 6) can fail at the predict stage if the historic data window is smaller than the training window, because the derived-feature column counts won't match. If you see an error like "X in predict methods must have same columns as X in fit", increase --predict-rows so that the historic data has at least as many periods as the model's largest lag:

chapkit test --rows 500 --predict-rows 250  # longer-context model

Use point geometries instead of polygons:

chapkit test --start-service --geo-type point --save-data
# Generates Point geometries instead of Polygon in geo.json

Generated Data Structure

The test data generator creates panel data for climate-health correlation analysis:

time_period, location, disease_cases, feature_0, feature_1, feature_2
2020-01,     location_0, 42.0,        23.1,      45.2,      67.3
2020-01,     location_1, 38.0,        25.3,      41.8,      62.1
2020-01,     location_2, 51.0,        18.7,      52.1,      71.4
2020-02,     location_0, 35.0,        21.4,      48.9,      65.8
...
  • time_period: Monthly (YYYY-mm) or weekly (YYYY-Wxx) format
  • location: Matches GeoJSON properties.id values
  • disease_cases: Health outcome (positive integer as float)
  • feature_N: Climate/covariate data

Training data uses periods starting from 2020; future prediction data starts from 2025.
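Rows in this shape can be sanity-checked with a short validator. This is a sketch: the column names follow the table above, but the period regexes are my assumptions about the YYYY-mm and YYYY-Wxx formats:

```python
import re

# Assumed period formats: monthly YYYY-mm, weekly YYYY-Wxx (weeks 01-53).
PERIOD_PATTERNS = {
    "monthly": re.compile(r"^\d{4}-(0[1-9]|1[0-2])$"),
    "weekly": re.compile(r"^\d{4}-W(0[1-9]|[1-4]\d|5[0-3])$"),
}

def validate_panel_rows(rows, period_type="monthly"):
    """Return True if every row has the required columns, a valid
    time_period, and a non-negative disease_cases value."""
    pattern = PERIOD_PATTERNS[period_type]
    for row in rows:
        if not {"time_period", "location", "disease_cases"} <= row.keys():
            return False
        if not pattern.match(row["time_period"]):
            return False
        if row["disease_cases"] < 0:
            return False
    return True
```

Running this over rows loaded from a saved training CSV is a quick way to catch malformed periods before submitting a training job.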

Manual Service Startup

For more control, you can start the service manually with specific configurations.

Using In-Memory Database

For faster testing without persistent data:

DATABASE_URL="sqlite+aiosqlite:///:memory:" uv run python main.py

Using a Test Database File

To persist test data for debugging:

DATABASE_URL="sqlite+aiosqlite:///test_data/test.db" uv run python main.py

Testing with Docker

Build and Run

docker build -t my-ml-service .
docker run -p 9090:8000 -e DATABASE_URL="sqlite+aiosqlite:///:memory:" my-ml-service

Then test from the host:

chapkit test --url http://localhost:9090

Docker Compose

# compose.test.yml
services:
  service:
    build: .
    environment:
      - DATABASE_URL=sqlite+aiosqlite:///:memory:
    ports:
      - "9090:8000"

docker compose -f compose.test.yml up -d
chapkit test
docker compose -f compose.test.yml down

CI/CD Integration

GitHub Actions Example

name: Test ML Service

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.13'

      - name: Install uv
        run: pip install uv

      - name: Install dependencies
        run: uv sync

      - name: Run ML service tests
        run: uv run chapkit test --start-service -c 2 -t 2 -p 5

Troubleshooting

Database Lock Errors

If you see SQLite "database is locked" errors when running many predictions:

  1. Use --start-service which uses an in-memory database
  2. Or manually start with in-memory: DATABASE_URL="sqlite+aiosqlite:///:memory:"
  3. Increase the delay between jobs: --delay 2

Service Not Ready

If the service takes a long time to start:

  1. The default wait timeout is 30 seconds
  2. Check service logs for startup errors
  3. Ensure all dependencies are installed

Connection Refused

If you get "Cannot connect" errors:

  1. Verify the service is running
  2. Check the URL matches the service address
  3. Ensure no firewall is blocking the port
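To distinguish a refused connection from a service that is up but returning errors, a stdlib-only check like this can help (illustrative sketch; pass whatever URL you gave --url):

```python
import urllib.error
import urllib.request

def can_connect(url: str, timeout: float = 5.0) -> bool:
    """Return True if anything answers HTTP at url, even with an error status."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # got an HTTP response: the service is up
    except OSError:
        return False  # connection refused, DNS failure, or timeout
```

If can_connect("http://localhost:9090") is False, the problem is the service or the network, not chapkit test itself.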