Code overview

The following is a brief overview of the main modules and parts of the chap-core codebase, meant as a starting point for getting to know the code:

The chap command line interface

  • The entry point can be found in cli.py. Note that there is also a file called chap_cli.py, an old entry point that is no longer used.

  • The cli.py file is where commands like chap evaluate are defined.

By reading the code in cli.py, you can see how the different commands are implemented and follow the function calls to see what code is used.
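As an illustration of this pattern, a command like chap evaluate might be wired up roughly as in the sketch below. This is a minimal, hypothetical sketch assuming a Typer-based CLI; the actual library and signatures in cli.py may differ, so treat the real file as the source of truth.

    import typer

    app = typer.Typer()

    @app.command()
    def evaluate(model_name: str, dataset_path: str):
        # Hypothetical command body: load the dataset, run the model and
        # report results. The real implementation in cli.py has its own
        # options and flow; follow its function calls to see the details.
        print(f"Evaluating {model_name} on {dataset_path}")

    if __name__ == "__main__":
        app()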

The REST API

The REST API is the main entry point for the Prediction app, and supports functionality such as training models, making predictions, and harmonizing data.

  • The main entry point for the API is in rest_api_src/v1/rest_api.py (newer versions will have a version number other than v1).

  • The API is built using the FastAPI library, and we currently use Celery to handle asynchronous tasks (like training a model).

  • Celery is currently abstracted away behind the CeleryPool and CeleryJob classes; the sketch below illustrates the general pattern.
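To make this concrete, a training endpoint built on these abstractions might look roughly like the sketch below. The import path and the queue method on CeleryPool are assumptions for illustration; see rest_api.py and the Celery-related modules for the real code.

    from fastapi import FastAPI

    # Hypothetical import path; CeleryPool exists in chap-core, but its
    # module location and method names may differ from this sketch.
    from chap_core.rest_api_src.celery_tasks import CeleryPool

    app = FastAPI()
    pool = CeleryPool()

    def train_model_task(model_name: str) -> None:
        # Placeholder for the actual (long-running) training logic.
        ...

    @app.post("/train")
    def train(model_name: str) -> dict:
        # Queue the training task on a Celery worker instead of blocking
        # the HTTP request; the client can poll using the returned job id.
        job = pool.queue(train_model_task, model_name)  # 'queue' is assumed
        return {"job_id": job.id}  # 'id' on CeleryJob is assumed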

External models

The codebase contains various abstractions for external models. The general idea is that an external model is defined by the commands it uses to train and predict, and the kind of environment (e.g. Docker) it needs to run these commands. CHAP then handles the necessary steps to call these commands in the given environment with the correct data files.

Runners

The TrainPredictRunner class defines an interface with methods for running a given model's train and predict commands. The DockerTrainPredictRunner class is a concrete implementation that runs these commands in a Docker environment.
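The relationship between the interface and the Docker implementation can be sketched as below. The method names and constructor arguments are illustrative assumptions; check the actual classes for the real signatures.

    from abc import ABC, abstractmethod

    class TrainPredictRunner(ABC):
        # Interface: how to run a model's train and predict commands.
        @abstractmethod
        def train(self) -> None: ...

        @abstractmethod
        def predict(self) -> None: ...

    class DockerTrainPredictRunner(TrainPredictRunner):
        # Concrete implementation: run the commands inside a Docker image.
        def __init__(self, docker_image: str, train_command: str, predict_command: str):
            self._image = docker_image
            self._train_command = train_command
            self._predict_command = predict_command

        def train(self) -> None:
            self._run_in_docker(self._train_command)

        def predict(self) -> None:
            self._run_in_docker(self._predict_command)

        def _run_in_docker(self, command: str) -> None:
            # E.g. shell out to 'docker run <image> <command>' with the
            # relevant data files mounted into the container.
            ...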

External model wrapping

The ExternalModel class is used to represent an external model and contains the information needed to run the model, such as the runner (an object of a subclass of TrainPredictRunner), the model name, etc.

This class is rarely used directly. The easiest way to parse a model specification and obtain an ExternalModel object is the get_model_from_directory_or_github_url function, which takes a local directory or a GitHub URL and parses the model specification to build an ExternalModel with a suitable runner. By following the code in this function, you can see how external models are loaded and run.
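Typical usage looks roughly like the sketch below. The function name comes from the codebase, but the import path and the train/predict method names on the returned objects are assumptions; consult the code for the exact signatures.

    # Hypothetical import path; the function exists in chap-core, but check
    # the codebase for its actual module.
    from chap_core.models import get_model_from_directory_or_github_url

    def run_external_model(model_source: str, train_data, historic_data, future_data):
        # model_source can be a local directory or a GitHub URL.
        model = get_model_from_directory_or_github_url(model_source)
        predictor = model.train(train_data)  # 'train' is assumed here
        return predictor.predict(historic_data, future_data)  # 'predict' is assumed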

Model evaluation and test/train splitting

A large and nontrivial part of CHAP is correctly splitting data into train and test sets and passing these to models for evaluation.

A good starting point for understanding this process is the evaluate_model function in the prediction_evaluator.py file; functions like train_test_generator are also relevant. Currently, the main evaluation flow does not compute metrics, but simply plots the predictions against the actual values (in the plot_forecasts function).
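Put together, the flow can be summarized roughly as below. The names evaluate_model, train_test_generator and plot_forecasts come from prediction_evaluator.py, but the import path and argument lists shown here are assumptions; read that file for the real signatures.

    # Hypothetical import path and arguments.
    from chap_core.assessment.prediction_evaluator import evaluate_model

    def run_evaluation(model, dataset):
        # evaluate_model splits 'dataset' into a training part and a number
        # of test windows (via helpers like train_test_generator), trains the
        # model, predicts each window, and plots forecasts against the
        # observed values with plot_forecasts.
        return evaluate_model(model, dataset, prediction_length=3, n_test_sets=4)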