chap_core.assessment package

Subpackages

Submodules

chap_core.assessment.data_representation_transforming module

class chap_core.assessment.data_representation_transforming.MAEonMeanPredictions[source]

Bases: Evaluator

evaluate(all_truths: MultiLocationDiseaseTimeSeries, all_forecasts: MultiLocationForecast) MultiLocationErrorTimeSeries[source]
chap_core.assessment.data_representation_transforming.convert_single_splitpoint_to_multi_location_forecast(backTestList: List[BackTestForecast]) MultiLocationForecast[source]
chap_core.assessment.data_representation_transforming.convert_to_multi_location_forecast(backTestList: List[BackTestForecast]) Dict[str, MultiLocationForecast][source]
chap_core.assessment.data_representation_transforming.convert_to_multi_location_timeseries(obs: List[ObservationBase]) MultiLocationDiseaseTimeSeries[source]
chap_core.assessment.data_representation_transforming.mean(samples)[source]

chap_core.assessment.dataset_splitting module

class chap_core.assessment.dataset_splitting.IsTimeDelta(*args, **kwargs)[source]

Bases: Protocol

chap_core.assessment.dataset_splitting.get_split_points_for_data_set(data_set: DataSet, max_splits: int, start_offset=1) list[TimePeriod][source]
chap_core.assessment.dataset_splitting.get_split_points_for_period_range(max_splits, periods, start_offset)[source]
chap_core.assessment.dataset_splitting.split_test_train_on_period(data_set: DataSet, split_points: Iterable[TimePeriod], future_length: IsTimeDelta | None = None, include_future_weather: bool = False, future_weather_class: Type[ClimateData] = <class 'bionumpy.bnpdataclass.bnpdataclass.ClimateData'>)[source]
chap_core.assessment.dataset_splitting.train_test_generator(dataset: DataSet, prediction_length: int, n_test_sets: int = 1, stride: int = 1, future_weather_provider: FutureWeatherFetcher | None = None) tuple[DataSet, Iterable[tuple[DataSet, DataSet, DataSet]]][source]

Generate a train set along with an iterator of test data. The iterator yields tuples of the full data up until a split point and the data without target variables for the remaining steps.

Parameters

dataset

The full dataset

prediction_length

How many periods to predict

n_test_sets

How many test sets to generate

stride

How many periods to stride between test sets

future_weather_provider

A function that can provide future weather data for the test sets

Returns

tuple[DataSet, Iterable[tuple[DataSet, DataSet, DataSet]]]

The train set and an iterator of test sets
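
The splitting scheme described above can be sketched with plain Python lists. This is a minimal illustration, not the chap_core implementation: rolling_splits and the dict-based rows are hypothetical, and two points are assumptions on our part, namely that the last test window ends at the final period (with earlier windows shifted back by stride) and that the third tuple element holds the held-back truth.

```python
from typing import Iterator

Row = dict  # one period, e.g. {"period": 3, "rainfall": 13.0, "disease_cases": 15}


def rolling_splits(rows: list[Row], prediction_length: int,
                   n_test_sets: int = 1, stride: int = 1):
    """Hypothetical stand-in for a rolling-origin train/test generator."""
    # Earliest split index such that the last test window ends at the final row.
    first_split = len(rows) - prediction_length - (n_test_sets - 1) * stride
    if first_split <= 0:
        raise ValueError("not enough periods for the requested test sets")
    train = rows[:first_split]

    def tests() -> Iterator[tuple[list[Row], list[Row], list[Row]]]:
        for i in range(n_test_sets):
            split = first_split + i * stride
            historic = rows[:split]  # full data up until the split point
            truth = rows[split:split + prediction_length]
            # Future rows with the target variable removed.
            masked = [{k: v for k, v in r.items() if k != "disease_cases"}
                      for r in truth]
            yield historic, masked, truth

    return train, tests()


rows = [{"period": t, "rainfall": 10.0 + t, "disease_cases": 5 * t}
        for t in range(10)]
train, test_sets = rolling_splits(rows, prediction_length=3, n_test_sets=2, stride=1)
test_sets = list(test_sets)
```

With ten periods, a prediction length of 3 and two test sets, the sketch trains on the first six periods and yields test windows starting at periods 6 and 7.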

chap_core.assessment.dataset_splitting.train_test_split(data_set: DataSet, prediction_start_period: TimePeriod, extension: IsTimeDelta | None = None, restrict_test=True)[source]
chap_core.assessment.dataset_splitting.train_test_split_with_weather(data_set: DataSet, prediction_start_period: TimePeriod, extension: IsTimeDelta | None = None, future_weather_class: Type[ClimateData] = <class 'bionumpy.bnpdataclass.bnpdataclass.ClimateData'>)[source]

chap_core.assessment.evaluator module

class chap_core.assessment.evaluator.ComponentBasedEvaluator(name, errorFunc, timeAggregationFunc, regionAggregationFunc)[source]

Bases: Evaluator

evaluate(all_truths: MultiLocationDiseaseTimeSeries, all_forecasts: MultiLocationForecast) MultiLocationErrorTimeSeries[source]
get_name()[source]
class chap_core.assessment.evaluator.Evaluator[source]

Bases: ABC

abstractmethod evaluate(all_truths: MultiLocationDiseaseTimeSeries, all_forecasts: MultiLocationForecast) MultiLocationErrorTimeSeries[source]
get_name() str[source]

chap_core.assessment.evaluator_suites module

chap_core.assessment.evaluator_suites.mae_error(truth: float, predictions: list[float])[source]
chap_core.assessment.evaluator_suites.mean_across_regions(errors)[source]
chap_core.assessment.evaluator_suites.mean_across_time(errors)[source]
chap_core.assessment.evaluator_suites.mse_error(truth: float, predictions: list[float])[source]
chap_core.assessment.evaluator_suites.sqrt_mean_across_time(errors)[source]
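
The function names in this module suggest the three components that ComponentBasedEvaluator accepts (errorFunc, timeAggregationFunc, regionAggregationFunc). The sketch below shows how such a composition could work on plain dictionaries. Everything in it is an assumption for illustration: the helper names, the nested-dict inputs, and the choice to compute each error against the mean of the sampled predictions are ours, not taken from the chap_core source.

```python
from math import sqrt
from statistics import mean


# Hypothetical error functions: one truth value against a list of sampled predictions.
def abs_error_on_mean(truth: float, predictions: list[float]) -> float:
    return abs(truth - mean(predictions))


def squared_error_on_mean(truth: float, predictions: list[float]) -> float:
    return (truth - mean(predictions)) ** 2


def evaluate(truths, forecasts, error_func, time_agg, region_agg):
    """Apply error_func per location and period, then aggregate twice.

    truths:    {location: {period: observed value}}
    forecasts: {location: {period: [sampled predictions]}}
    """
    per_location = {}
    for location, samples_by_period in forecasts.items():
        errors = [error_func(truths[location][period], samples)
                  for period, samples in sorted(samples_by_period.items())]
        per_location[location] = time_agg(errors)  # aggregate over time first
    return region_agg(list(per_location.values()))  # then over regions


truths = {"north": {"2023-01": 10.0, "2023-02": 14.0},
          "south": {"2023-01": 3.0, "2023-02": 5.0}}
forecasts = {"north": {"2023-01": [8.0, 12.0, 13.0], "2023-02": [10.0, 12.0, 14.0]},
             "south": {"2023-01": [3.0, 3.0, 3.0], "2023-02": [6.0, 7.0, 8.0]}}

mae = evaluate(truths, forecasts, abs_error_on_mean, mean, mean)
rmse = evaluate(truths, forecasts, squared_error_on_mean,
                lambda errors: sqrt(mean(errors)), mean)
```

Swapping the error function and the time aggregation is enough to move from a mean absolute error to a root mean squared error, which is the appeal of keeping the three components separate.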

chap_core.assessment.flat_representations module

class chap_core.assessment.flat_representations.DataDimension(*values)[source]

Bases: str, Enum

Enum for the possible dimensions that a metrics dataset can have

horizon_distance = 'horizon_distance'
location = 'location'
time_period = 'time_period'
class chap_core.assessment.flat_representations.FlatData(*args, **kwargs)[source]

Bases: DataFrameModel

Base class for data points that include location and time_period.

class Config

Bases: BaseConfig

name: str | None = 'FlatData'

name of schema

location: pa.typing.Series[str] = 'location'
time_period: pa.typing.Series[str] = 'time_period'
class chap_core.assessment.flat_representations.FlatDataWithHorizon(*args, **kwargs)[source]

Bases: FlatData

class Config

Bases: Config

name: str | None = 'FlatDataWithHorizon'

name of schema

horizon_distance: pa.typing.Series[int] = 'horizon_distance'
class chap_core.assessment.flat_representations.FlatForecasts(*args, **kwargs)[source]

Bases: FlatDataWithHorizon

Forecasted disease cases. Note that the cases are stored in the forecast field, and that the sample field lets the dataframe hold multiple samples per location/time_period/horizon_distance.

class Config

Bases: Config

name: str | None = 'FlatForecasts'

name of schema

forecast: pa.typing.Series[float] = 'forecast'
sample: pa.typing.Series[int] = 'sample'
class chap_core.assessment.flat_representations.FlatMetric(*args, **kwargs)[source]

Bases: FlatDataWithHorizon

class Config

Bases: Config

name: str | None = 'FlatMetric'

name of schema

metric: pa.typing.Series[float] = 'metric'
class chap_core.assessment.flat_representations.FlatObserved(*args, **kwargs)[source]

Bases: FlatData

Observed disease cases

class Config

Bases: Config

name: str | None = 'FlatObserved'

name of schema

disease_cases: pa.typing.Series[float] = 'disease_cases'
chap_core.assessment.flat_representations.convert_backtest_observations_to_flat_observations(observations: List[ObservationBase]) DataFrame[source]

Convert a list of ObservationBase objects to a flat DataFrame format conforming to ObservedFlatDataSchema.

Args:

observations: List of ObservationBase objects containing observations

reference_period: Optional reference period to calculate horizon_distance from. If provided, horizon_distance will be calculated relative to this. If None, horizon_distance will be set to 0 for all observations.

Returns:

pd.DataFrame with columns: location, time_period, horizon_distance, disease_cases

chap_core.assessment.flat_representations.convert_backtest_to_flat_forecasts(backtest_forecasts: List[BackTestForecast], *, validate: bool = True) DataFrame[source]
chap_core.assessment.flat_representations.group_flat_forecast_by_horizon(flat_forecast_df: DataFrame, aggregate_samples: bool = True) DataFrame[source]

Group flat forecast data by horizon distance for analysis.

Args:

flat_forecast_df: DataFrame conforming to ForecastFlatDataSchema

aggregate_samples: If True, average across samples to get mean forecast

Returns:

pd.DataFrame grouped by location and horizon_distance
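
To make the grouping concrete, here is a minimal sketch that uses plain Python rows as a stand-in for the DataFrame. The rows mirror the FlatForecasts columns listed above, but the values are made up, and mean_forecast_by_horizon is a hypothetical helper. It assumes that aggregating samples means averaging the forecast column within each (location, horizon_distance) group; the library function may behave differently.

```python
from collections import defaultdict
from statistics import mean

# Rows mirror the FlatForecasts columns; the values are invented for illustration.
rows = [
    {"location": "north", "time_period": "2023-02", "horizon_distance": 1, "sample": 0, "forecast": 10.0},
    {"location": "north", "time_period": "2023-02", "horizon_distance": 1, "sample": 1, "forecast": 14.0},
    {"location": "north", "time_period": "2023-03", "horizon_distance": 2, "sample": 0, "forecast": 20.0},
    {"location": "north", "time_period": "2023-03", "horizon_distance": 2, "sample": 1, "forecast": 22.0},
    {"location": "south", "time_period": "2023-02", "horizon_distance": 1, "sample": 0, "forecast": 3.0},
    {"location": "south", "time_period": "2023-02", "horizon_distance": 1, "sample": 1, "forecast": 5.0},
]


def mean_forecast_by_horizon(rows):
    """Average the forecast column within each (location, horizon_distance) group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["location"], row["horizon_distance"])].append(row["forecast"])
    return {key: mean(values) for key, values in groups.items()}


grouped = mean_forecast_by_horizon(rows)
```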

chap_core.assessment.flat_representations.horizon_diff(period: str, period2: str) int[source]

Calculate the difference between two time periods in terms of time units.
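
As an illustration of such a difference, the sketch below handles monthly periods only. It assumes period strings of the form "YYYY-MM" and returns the number of months from period2 to period; both the string format and the sign convention are assumptions, and month_diff is a hypothetical name rather than the library function.

```python
def month_diff(period: str, period2: str) -> int:
    """Months from period2 to period, for 'YYYY-MM' strings (assumed format)."""
    year, month = map(int, period.split("-"))
    year2, month2 = map(int, period2.split("-"))
    return (year - year2) * 12 + (month - month2)


two_ahead = month_diff("2023-03", "2023-01")
across_year = month_diff("2024-01", "2023-11")
```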

chap_core.assessment.forecast module

chap_core.assessment.forecast.forecast(model, dataset: DataSet, prediction_length: TimeDelta, graph=None)[source]

Forecast prediction_length into the future using the model

chap_core.assessment.forecast.forecast_ahead(estimator: Estimator, dataset: DataSet, prediction_length: int)[source]

Forecast prediction_length periods into the future using the estimator

chap_core.assessment.forecast.forecast_with_predicted_weather(predictor: Predictor, historic_data: DataSet, prediction_length: int)[source]
chap_core.assessment.forecast.multi_forecast(model, dataset: DataSet, prediction_lenght: TimeDelta, pre_train_delta: TimeDelta)[source]

Forecast into the future using the model, over the span given by the prediction_lenght parameter

chap_core.assessment.metric_table module

chap_core.assessment.metric_table.create_metric_table(metrics: list[BackTestMetric])[source]
chap_core.assessment.metric_table.horizon_diff(period: str, period2: str) int[source]

chap_core.assessment.prediction_evaluator module

class chap_core.assessment.prediction_evaluator.Estimator(*args, **kwargs)[source]

Bases: Protocol

train(data: DataSet) Predictor[source]
class chap_core.assessment.prediction_evaluator.Predictor(*args, **kwargs)[source]

Bases: Protocol

predict(historic_data: DataSet[FeatureType], future_data: DataSet[FeatureType]) Samples[source]
chap_core.assessment.prediction_evaluator.backtest(estimator: Estimator, data: DataSet, prediction_length, n_test_sets, stride=1, weather_provider=None) Iterable[DataSet][source]
chap_core.assessment.prediction_evaluator.create_multiloc_timeseries(truth_data)[source]
chap_core.assessment.prediction_evaluator.evaluate_model(estimator: Estimator, data: DataSet, prediction_length=3, n_test_sets=4, report_filename=None, weather_provider=None)[source]

Evaluate a model on a held-out test set taken from the dataset, making multiple predictions on the test set with the same trained model

Parameters

estimator : Estimator

The estimator to train and evaluate

data : DataSet

The data to train and evaluate on

prediction_length : int

The number of periods to predict ahead

n_test_sets : int

The number of test sets to evaluate on

The number of test sets to evaluate on

Returns

tuple

Summary and individual evaluation results
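
The train-once, predict-many pattern described above can be sketched with stand-in types. This is not the chap_core implementation: the simplified Estimator and Predictor protocols below work on lists of floats instead of DataSet objects, MeanEstimator and evaluate_sketch are hypothetical, and the mean absolute error on the sample mean is an assumed metric chosen for brevity.

```python
from statistics import mean
from typing import Protocol


class Predictor(Protocol):
    def predict(self, historic_data: list[float], future_data: list[dict]) -> list[list[float]]: ...


class Estimator(Protocol):
    def train(self, data: list[float]) -> Predictor: ...


class MeanPredictor:
    def __init__(self, level: float):
        self.level = level

    def predict(self, historic_data, future_data):
        # One list of samples per future period; here a single constant sample.
        return [[self.level] for _ in future_data]


class MeanEstimator:
    def train(self, data):
        return MeanPredictor(mean(data))


def evaluate_sketch(estimator: Estimator, series: list[float],
                    prediction_length: int = 3, n_test_sets: int = 4):
    first_split = len(series) - prediction_length - (n_test_sets - 1)
    predictor = estimator.train(series[:first_split])  # trained once, reused below
    errors = []
    for i in range(n_test_sets):
        split = first_split + i
        truth = series[split:split + prediction_length]
        future = [{} for _ in truth]  # placeholder for future covariates
        forecast = predictor.predict(series[:split], future)
        errors.append(mean([abs(t - mean(s)) for t, s in zip(truth, forecast)]))
    return mean(errors), errors  # summary and individual results


series = [4.0, 6.0, 5.0, 5.0, 7.0, 3.0, 5.0, 5.0, 6.0, 4.0]
summary, individual = evaluate_sketch(MeanEstimator(), series)
```

Because the protocols are structural, MeanEstimator and MeanPredictor do not need to inherit from them; providing train and predict with compatible signatures is enough.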

chap_core.assessment.prediction_evaluator.plot_forecasts(predictor, test_instance, truth, pdf_filename)[source]
chap_core.assessment.prediction_evaluator.plot_predictions(predictions: DataSet[Samples], truth: DataSet, pdf_filename)[source]
chap_core.assessment.prediction_evaluator.without_disease(t)[source]

chap_core.assessment.representations module

class chap_core.assessment.representations.DiseaseObservation(time_period: str, disease_cases: int)[source]

Bases: object

disease_cases: int
time_period: str
class chap_core.assessment.representations.DiseaseTimeSeries(observations: List[chap_core.assessment.representations.DiseaseObservation])[source]

Bases: object

observations: List[DiseaseObservation]
class chap_core.assessment.representations.Error(time_period: str, value: float)[source]

Bases: object

time_period: str
value: float
class chap_core.assessment.representations.ErrorTimeSeries(observations: List[chap_core.assessment.representations.Error])[source]

Bases: object

observations: List[Error]
class chap_core.assessment.representations.Forecast(predictions: List[chap_core.assessment.representations.Samples])[source]

Bases: object

predictions: List[Samples]
class chap_core.assessment.representations.MultiLocationDiseaseTimeSeries(timeseries_dict: Dict[str, chap_core.assessment.representations.DiseaseTimeSeries] = <factory>)[source]

Bases: object

filter_by_time_periods(time_periods: List[str]) MultiLocationDiseaseTimeSeries[source]
locations()[source]
timeseries()[source]
timeseries_dict: Dict[str, DiseaseTimeSeries]
class chap_core.assessment.representations.MultiLocationErrorTimeSeries(timeseries_dict: Dict[str, chap_core.assessment.representations.ErrorTimeSeries])[source]

Bases: object

get_all_timeperiods()[source]
get_the_only_location()[source]
get_the_only_timeseries()[source]
locations()[source]
locationvalues_per_timepoint() List[Dict[str, Error]][source]
num_locations()[source]
num_timeperiods()[source]
timeseries()[source]
timeseries_dict: Dict[str, ErrorTimeSeries]
timeseries_length()[source]
class chap_core.assessment.representations.MultiLocationForecast(timeseries: Dict[str, chap_core.assessment.representations.Forecast])[source]

Bases: object

time_periods() Set[str][source]
timeseries: Dict[str, Forecast]
class chap_core.assessment.representations.Samples(time_period: str, disease_case_samples: List[float])[source]

Bases: object

disease_case_samples: List[float]
time_period: str
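
The nesting of these containers is easier to see in a small example. The dataclasses below are stand-ins that copy the field names listed above so the snippet runs without chap_core installed. The body of time_periods is an assumption: it returns the union of periods over all locations, which may differ from the library's behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Samples:
    time_period: str
    disease_case_samples: List[float]


@dataclass
class Forecast:
    predictions: List[Samples]


@dataclass
class MultiLocationForecast:
    timeseries: Dict[str, Forecast]

    def time_periods(self) -> Set[str]:
        # Assumed behaviour: union of the periods forecast for any location.
        return {samples.time_period
                for forecast in self.timeseries.values()
                for samples in forecast.predictions}


forecast = MultiLocationForecast(timeseries={
    "north": Forecast(predictions=[
        Samples("2023-01", [8.0, 12.0, 13.0]),
        Samples("2023-02", [10.0, 12.0, 14.0]),
    ]),
    "south": Forecast(predictions=[
        Samples("2023-01", [3.0, 3.0, 3.0]),
    ]),
})
periods = forecast.time_periods()
```

Each location maps to one Forecast, each Forecast holds one Samples object per time period, and each Samples object carries the sampled case counts for that period.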

Module contents