Integrating external models

CHAP can run external models in two ways:

  • By specifying a path to a local code base

  • By specifying a GitHub URL to a Git repository. The URL needs to start with https://github.com/

In either case, the directory or repository should be a valid MLproject directory containing an MLproject file. See the specification in the MLflow documentation for details. In addition, we require the following:

  • An entry point in the MLproject file called train with parameters train_data and model

  • An entry point in the MLproject file called predict with parameters historic_data, future_data, model and out_file

These entry points should contain commands that train a model and make predictions with that model. The model parameter should be used in the train step to save a fitted model that can then be loaded and used in the predict step. CHAP provides all the data (the other parameters) when running a model.
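The sketch below illustrates what the commands behind these entry points might look like. It is only a hypothetical example: the script names (train.py and predict.py), the use of pickle for persistence, and the disease_cases column name are placeholders, and the exact input columns and expected output format depend on the dataset CHAP passes in and on your CHAP setup.

# train.py -- hypothetical training script behind the train entry point
import argparse
import pickle

import pandas as pd

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('train_data')  # CSV file with training data, provided by CHAP
    parser.add_argument('model')       # path where the trained model should be written
    args = parser.parse_args()

    train = pd.read_csv(args.train_data)
    # Placeholder "model": just remember the mean of the target column.
    fitted = {'mean_cases': train['disease_cases'].mean()}
    with open(args.model, 'wb') as f:
        pickle.dump(fitted, f)


# predict.py -- hypothetical prediction script behind the predict entry point
import argparse
import pickle

import pandas as pd

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('model')          # model file written by the train step
    parser.add_argument('historic_data')  # CSV with observed data up to the prediction start
    parser.add_argument('future_data')    # CSV describing the future periods to predict
    parser.add_argument('out_file')       # file where CHAP expects the predictions
    args = parser.parse_args()

    with open(args.model, 'rb') as f:
        fitted = pickle.load(f)

    future = pd.read_csv(args.future_data)
    # Placeholder prediction: the same value for every future row. Replace the
    # column name and output format with whatever your CHAP setup expects.
    future['prediction'] = fitted['mean_cases']
    future.to_csv(args.out_file, index=False)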

A valid model directory contains such scripts together with an MLproject file at its top level that defines the train and predict entry points.
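The MLproject file below is a rough sketch rather than an official CHAP example; the project name and script names are placeholders, and the syntax follows the standard MLflow MLproject format. An environment section, described just below, would normally be included as well:

name: my_chap_model

entry_points:
  train:
    parameters:
      train_data: path
      model: string
    command: "python train.py {train_data} {model}"
  predict:
    parameters:
      model: string
      historic_data: path
      future_data: path
      out_file: string
    command: "python predict.py {model} {historic_data} {future_data} {out_file}"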

The following shows how you can run models that follow the specification above. If you have your own model that you want to make compatible with CHAP, follow this guide.

The MLproject file can specify a docker image or Python virtual environment that will be used when running the commands.
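In standard MLflow syntax these two options look roughly as follows (the image name and file name are placeholders, and only one of the two would appear in a given MLproject file):

docker_env:
  image: ghcr.io/example/my-model-environment

python_env: python_env.yaml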

Running an external model on the command line

External models can be run on the command line using the chap evaluate command. See chap evaluate --help for details.

This example runs an auto EWARS R model on public ISIMIP dengue data for Brazil, using a public docker image with the R INLA package. After the run completes, a report file report.pdf will be generated.

chap evaluate --model-name https://github.com/dhis2-chap/chap_auto_ewars --dataset-name ISIMIP_dengue_harmonized --dataset-country brazil
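If your model is in a local directory rather than on GitHub, the same command should accept a path in place of the URL (the directory shown here is hypothetical):

chap evaluate --model-name /path/to/your_model_directory --dataset-name ISIMIP_dengue_harmonized --dataset-country brazil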

Running an external model in Python

CHAP contains a Python API for loading and evaluating models. The following example loads models by specifying local paths or GitHub URLs and evaluates each of them; two of the entries are commented out, but they illustrate the different ways a model can be referenced:

import logging

import pandas as pd

from chap_core.assessment.prediction_evaluator import evaluate_model
from chap_core.external.external_model import get_model_from_directory_or_github_url
from chap_core.file_io.example_data_set import datasets
from chap_core.file_io.file_paths import get_models_path

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)

    # Models can be referenced either by a local directory or by a GitHub URL.
    models_path = get_models_path()
    model_names = {
        # 'deepar': models_path / 'deepar',
        'naive_model': models_path / 'naive_python_model_with_mlproject_file',
        # 'ewars': 'https://github.com/sandvelab/chap_auto_ewars'
    }

    # Load the public ISIMIP dengue dataset and select the Vietnam subset.
    dataset = datasets['ISIMIP_dengue_harmonized'].load()
    dataset = dataset['vietnam']

    n_tests = 7
    prediction_length = 6
    all_results = {}
    for name, model_name in model_names.items():
        model = get_model_from_directory_or_github_url(model_name)
        # Evaluate the model on n_tests test sets, predicting prediction_length
        # time periods ahead, and write a PDF report for each model.
        results = evaluate_model(model, dataset,
                                 prediction_length=prediction_length,
                                 n_test_sets=n_tests,
                                 report_filename=f'{name}_{n_tests}_{prediction_length}_report.pdf')
        all_results[name] = results

    # Collect the results into a single CSV file with one row per model.
    report_file = 'evaluation_report.csv'
    df = pd.DataFrame([res[0] | {'model': name} for name, res in all_results.items()])
    df.to_csv(report_file, mode='w', header=True)