chap_core.assessment package¶
Subpackages¶
- chap_core.assessment.metrics package
- Submodules
- chap_core.assessment.metrics.above_truth module
- chap_core.assessment.metrics.base module
- chap_core.assessment.metrics.crps module
- chap_core.assessment.metrics.crps_norm module
- chap_core.assessment.metrics.example_metric module
- chap_core.assessment.metrics.mae module
- chap_core.assessment.metrics.peak_diff module
- chap_core.assessment.metrics.percentile_coverage module
- chap_core.assessment.metrics.rmse module
- chap_core.assessment.metrics.test_metrics module
- Module contents
- CRPS
- CRPSNorm
- CRPSPerLocation
- DetailedCRPS
- DetailedCRPSNorm
- DetailedRMSE
- ExampleMetric
- IsWithin10th90thDetailed
- IsWithin25th75thDetailed
- MAE
- MetricBase
- MetricSpec
- PeakValueDiffMetric
- PeakWeekLagMetric
- RMSE
- RatioWithin10th90th
- RatioWithin10th90thPerLocation
- RatioWithin25th75th
- RatioWithin25th75thPerLocation
- SamplesAboveTruth
- TestMetric
- TestMetricDetailed
Submodules¶
chap_core.assessment.data_representation_transforming module¶
- class chap_core.assessment.data_representation_transforming.MAEonMeanPredictions[source]¶
Bases: Evaluator
- evaluate(all_truths: MultiLocationDiseaseTimeSeries, all_forecasts: MultiLocationForecast) → MultiLocationErrorTimeSeries[source]¶
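The class name suggests that this evaluator scores the mean of the forecast samples against the observed cases using mean absolute error. The sketch below illustrates that idea on plain dictionaries. It does not use chap_core; the function name `mae_on_mean_predictions` and the input layout are assumptions made for illustration, and the real class returns a MultiLocationErrorTimeSeries rather than one number per location.

```python
from statistics import mean


def mae_on_mean_predictions(truths, forecasts):
    """Mean absolute error between observed cases and the mean of forecast samples.

    Hypothetical layout (not the chap_core types):
      truths:    {location: {time_period: observed_cases}}
      forecasts: {location: {time_period: [sample, sample, ...]}}
    Returns {location: mae} over the periods present in both inputs.
    """
    result = {}
    for location, samples_by_period in forecasts.items():
        observed = truths.get(location, {})
        # Collapse the samples to their mean, then take the absolute error.
        abs_errors = [
            abs(mean(samples) - observed[period])
            for period, samples in samples_by_period.items()
            if period in observed
        ]
        if abs_errors:
            result[location] = mean(abs_errors)
    return result


truths = {"loc_a": {"2023-01": 10, "2023-02": 20}}
forecasts = {"loc_a": {"2023-01": [8, 12, 13], "2023-02": [18, 20, 16]}}
scores = mae_on_mean_predictions(truths, forecasts)
```

Here the sample means are 11 and 18, so the absolute errors are 1 and 2.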
- chap_core.assessment.data_representation_transforming.convert_single_splitpoint_to_multi_location_forecast(backTestList: List[BackTestForecast]) → MultiLocationForecast[source]¶
- chap_core.assessment.data_representation_transforming.convert_to_multi_location_forecast(backTestList: List[BackTestForecast]) → Dict[str, MultiLocationForecast][source]¶
- chap_core.assessment.data_representation_transforming.convert_to_multi_location_timeseries(obs: List[ObservationBase]) → MultiLocationDiseaseTimeSeries[source]¶
chap_core.assessment.dataset_splitting module¶
- chap_core.assessment.dataset_splitting.get_split_points_for_data_set(data_set: DataSet, max_splits: int, start_offset=1) → list[TimePeriod][source]¶
- chap_core.assessment.dataset_splitting.get_split_points_for_period_range(max_splits, periods, start_offset)[source]¶
- chap_core.assessment.dataset_splitting.split_test_train_on_period(data_set: DataSet, split_points: Iterable[TimePeriod], future_length: IsTimeDelta | None = None, include_future_weather: bool = False, future_weather_class: Type[ClimateData] = <class 'bionumpy.bnpdataclass.bnpdataclass.ClimateData'>)[source]¶
- chap_core.assessment.dataset_splitting.train_test_generator(dataset: DataSet, prediction_length: int, n_test_sets: int = 1, stride: int = 1, future_weather_provider: FutureWeatherFetcher | None = None) → tuple[DataSet, Iterable[tuple[DataSet, DataSet, DataSet]]][source]¶
Generate a train set along with an iterator of test data. Each test item contains the full data up to a split point and the data for the remaining steps without target variables.
Parameters¶
- dataset
The full dataset
- prediction_length
How many periods to predict
- n_test_sets
How many test sets to generate
- stride
How many periods to stride between test sets
- future_weather_provider
A function that can provide future weather data for the test sets
Returns¶
- tuple[DataSet, Iterable[tuple[DataSet, DataSet, DataSet]]]
The train set and an iterator of test sets
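The parameters above describe a rolling-origin split. A minimal sketch of that idea on a plain list of rows follows. It does not call chap_core. The function name `rolling_splits`, the row layout, the placement of the split points (last window ending at the end of the series), and the reading of each three-tuple as history, masked future, and unmasked future are all assumptions made for illustration.

```python
def rolling_splits(series, prediction_length, n_test_sets=1, stride=1):
    """Return (train, test_iterator) for a rolling-origin split of a list of rows.

    Assumption: the last test window ends at the end of the series and earlier
    windows start `stride` periods apart. chap_core may place splits differently.
    """
    first_split = len(series) - prediction_length - (n_test_sets - 1) * stride
    if first_split <= 0:
        raise ValueError("series too short for the requested test sets")
    # The train set is everything before the earliest split point.
    train = series[:first_split]

    def tests():
        for i in range(n_test_sets):
            split = first_split + i * stride
            history = series[:split]
            future = series[split:split + prediction_length]
            # Hide the target variable so a model cannot see it.
            masked_future = [{**row, "disease_cases": None} for row in future]
            yield history, masked_future, future

    return train, tests()


data = [
    {"time_period": f"2023-{m:02d}", "rainfall": 10.0 * m, "disease_cases": m}
    for m in range(1, 11)
]
train, tests = rolling_splits(data, prediction_length=3, n_test_sets=2, stride=2)
test_list = list(tests)
```

With ten periods, a prediction length of 3, two test sets and a stride of 2, the sketch trains on the first five periods and tests on periods 6 to 8 and 8 to 10.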
- chap_core.assessment.dataset_splitting.train_test_split(data_set: DataSet, prediction_start_period: TimePeriod, extension: IsTimeDelta | None = None, restrict_test=True)[source]¶
- chap_core.assessment.dataset_splitting.train_test_split_with_weather(data_set: DataSet, prediction_start_period: TimePeriod, extension: IsTimeDelta | None = None, future_weather_class: Type[ClimateData] = <class 'bionumpy.bnpdataclass.bnpdataclass.ClimateData'>)[source]¶
chap_core.assessment.evaluator module¶
- class chap_core.assessment.evaluator.ComponentBasedEvaluator(name, errorFunc, timeAggregationFunc, regionAggregationFunc)[source]¶
Bases: Evaluator
- evaluate(all_truths: MultiLocationDiseaseTimeSeries, all_forecasts: MultiLocationForecast) → MultiLocationErrorTimeSeries[source]¶
- class chap_core.assessment.evaluator.Evaluator[source]¶
Bases: ABC
- abstractmethod evaluate(all_truths: MultiLocationDiseaseTimeSeries, all_forecasts: MultiLocationForecast) → MultiLocationErrorTimeSeries[source]¶
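The constructor arguments of ComponentBasedEvaluator suggest an evaluator assembled from three pluggable functions: one for the point-wise error, one to aggregate over time, and one to aggregate over regions. The sketch below shows one way such a composition could work on plain dictionaries. The class `ComponentEvaluatorSketch`, its snake_case argument names, the order in which the functions are applied, and the return value are assumptions; this is not the chap_core implementation.

```python
from statistics import mean


class ComponentEvaluatorSketch:
    """Hypothetical stand-in that composes three functions into one evaluator."""

    def __init__(self, name, error_func, time_aggregation_func, region_aggregation_func):
        self.name = name
        self.error_func = error_func
        self.time_aggregation_func = time_aggregation_func
        self.region_aggregation_func = region_aggregation_func

    def evaluate(self, truths, forecasts):
        """truths and forecasts: {location: {time_period: value}} (assumed layout)."""
        per_location = {}
        for location, truth_by_period in truths.items():
            forecast_by_period = forecasts.get(location, {})
            # Step 1: point-wise error for every period present in both inputs.
            errors = [
                self.error_func(truth, forecast_by_period[period])
                for period, truth in truth_by_period.items()
                if period in forecast_by_period
            ]
            # Step 2: collapse the time axis for this location.
            if errors:
                per_location[location] = self.time_aggregation_func(errors)
        # Step 3: collapse the region axis.
        overall = self.region_aggregation_func(list(per_location.values()))
        return overall, per_location


evaluator = ComponentEvaluatorSketch(
    name="worst_region_mae",
    error_func=lambda truth, forecast: abs(truth - forecast),
    time_aggregation_func=mean,
    region_aggregation_func=max,
)
truths = {"a": {"t1": 10, "t2": 20}, "b": {"t1": 5, "t2": 5}}
forecasts = {"a": {"t1": 12, "t2": 17}, "b": {"t1": 5, "t2": 9}}
overall, per_location = evaluator.evaluate(truths, forecasts)
```

Swapping `max` for `mean` as the region aggregation would turn the same evaluator into an average over regions without touching the other two components.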
chap_core.assessment.evaluator_suites module¶
chap_core.assessment.flat_representations module¶
- class chap_core.assessment.flat_representations.DataDimension(*values)[source]¶
Bases: str, Enum
Enum of the dimensions that a metrics dataset can have.
- horizon_distance = 'horizon_distance'¶
- location = 'location'¶
- time_period = 'time_period'¶
- class chap_core.assessment.flat_representations.FlatData(*args, **kwargs)[source]¶
Bases: DataFrameModel
Base class for data points that include location and time_period.
- location: pa.typing.Series[str] = 'location'¶
- time_period: pa.typing.Series[str] = 'time_period'¶
- class chap_core.assessment.flat_representations.FlatDataWithHorizon(*args, **kwargs)[source]¶
Bases: FlatData
- horizon_distance: pa.typing.Series[int] = 'horizon_distance'¶
- class chap_core.assessment.flat_representations.FlatForecasts(*args, **kwargs)[source]¶
Bases: FlatDataWithHorizon
Forecasted disease cases. Note that the cases are in the forecast field, and that the sample field lets the dataframe hold multiple samples per location/time_period/horizon_distance.
- forecast: pa.typing.Series[float] = 'forecast'¶
- sample: pa.typing.Series[int] = 'sample'¶
- class chap_core.assessment.flat_representations.FlatMetric(*args, **kwargs)[source]¶
Bases: FlatDataWithHorizon
- metric: pa.typing.Series[float] = 'metric'¶
- class chap_core.assessment.flat_representations.FlatObserved(*args, **kwargs)[source]¶
Bases: FlatData
Observed disease cases
- disease_cases: pa.typing.Series[float] = 'disease_cases'¶
- chap_core.assessment.flat_representations.convert_backtest_observations_to_flat_observations(observations: List[ObservationBase]) → DataFrame[source]¶
Convert a list of ObservationBase objects to a flat DataFrame format conforming to ObservedFlatDataSchema.
- Args:
observations: List of ObservationBase objects containing observations
reference_period: Optional reference period to calculate horizon_distance from. If provided, horizon_distance will be calculated relative to this. If None, horizon_distance will be set to 0 for all observations.
- Returns:
pd.DataFrame with columns: location, time_period, horizon_distance, disease_cases
- chap_core.assessment.flat_representations.convert_backtest_to_flat_forecasts(backtest_forecasts: List[BackTestForecast], *, validate: bool = True) → DataFrame[source]¶
- chap_core.assessment.flat_representations.group_flat_forecast_by_horizon(flat_forecast_df: DataFrame, aggregate_samples: bool = True) → DataFrame[source]¶
Group flat forecast data by horizon distance for analysis.
- Args:
flat_forecast_df: DataFrame conforming to ForecastFlatDataSchema
aggregate_samples: If True, average across samples to get mean forecast
- Returns:
pd.DataFrame grouped by location and horizon_distance
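To make the flat layout concrete, the sketch below builds a few forecast rows with the FlatForecasts columns (location, time_period, horizon_distance, sample, forecast) and averages the samples per location and horizon. It uses plain Python dictionaries in place of a pandas DataFrame so that it runs without third-party packages; the function name `mean_forecast_by_horizon` and the shape of its return value are hypothetical and are not the library's implementation.

```python
from collections import defaultdict
from statistics import mean

# One row per location / time_period / horizon_distance / sample.
rows = [
    {"location": "loc_a", "time_period": "2023-01", "horizon_distance": 1, "sample": 0, "forecast": 10.0},
    {"location": "loc_a", "time_period": "2023-01", "horizon_distance": 1, "sample": 1, "forecast": 14.0},
    {"location": "loc_a", "time_period": "2023-02", "horizon_distance": 2, "sample": 0, "forecast": 20.0},
    {"location": "loc_a", "time_period": "2023-02", "horizon_distance": 2, "sample": 1, "forecast": 30.0},
]


def mean_forecast_by_horizon(flat_rows):
    """Average the forecast over samples for each (location, horizon_distance)."""
    grouped = defaultdict(list)
    for row in flat_rows:
        grouped[(row["location"], row["horizon_distance"])].append(row["forecast"])
    return {key: mean(values) for key, values in grouped.items()}


result = mean_forecast_by_horizon(rows)
```

Keeping one row per sample is what makes this kind of aggregation a simple group-and-reduce step.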
chap_core.assessment.forecast module¶
- chap_core.assessment.forecast.forecast(model, dataset: DataSet, prediction_length: TimeDelta, graph=None)[source]¶
Forecast prediction_length into the future using the model.
- chap_core.assessment.forecast.forecast_ahead(estimator: Estimator, dataset: DataSet, prediction_length: int)[source]¶
Forecast prediction_length periods into the future using the estimator.
chap_core.assessment.metric_table module¶
- chap_core.assessment.metric_table.create_metric_table(metrics: list[BackTestMetric])[source]¶
chap_core.assessment.prediction_evaluator module¶
- chap_core.assessment.prediction_evaluator.backtest(estimator: Estimator, data: DataSet, prediction_length, n_test_sets, stride=1, weather_provider=None) → Iterable[DataSet][source]¶
- chap_core.assessment.prediction_evaluator.evaluate_model(estimator: Estimator, data: DataSet, prediction_length=3, n_test_sets=4, report_filename=None, weather_provider=None)[source]¶
Evaluate a model on a held-out test set of the dataset, making multiple predictions on the test set with the same trained model.
Parameters¶
- estimator : Estimator
The estimator to train and evaluate
- data : DataSet
The data to train and evaluate on
- prediction_length : int
The number of periods to predict ahead
- n_test_sets : int
The number of test sets to evaluate on
Returns¶
- tuple
Summary and individual evaluation results
- chap_core.assessment.prediction_evaluator.plot_forecasts(predictor, test_instance, truth, pdf_filename)[source]¶
chap_core.assessment.representations module¶
- class chap_core.assessment.representations.DiseaseObservation(time_period: str, disease_cases: int)[source]¶
Bases: object
- disease_cases: int¶
- time_period: str¶
- class chap_core.assessment.representations.DiseaseTimeSeries(observations: List[chap_core.assessment.representations.DiseaseObservation])[source]¶
Bases: object
- observations: List[DiseaseObservation]¶
- class chap_core.assessment.representations.Error(time_period: str, value: float)[source]¶
Bases: object
- time_period: str¶
- value: float¶
- class chap_core.assessment.representations.ErrorTimeSeries(observations: List[chap_core.assessment.representations.Error])[source]¶
Bases: object
- class chap_core.assessment.representations.Forecast(predictions: List[chap_core.assessment.representations.Samples])[source]¶
Bases: object
- class chap_core.assessment.representations.MultiLocationDiseaseTimeSeries(timeseries_dict: Dict[str, chap_core.assessment.representations.DiseaseTimeSeries] = <factory>)[source]¶
Bases: object
- filter_by_time_periods(time_periods: List[str]) → MultiLocationDiseaseTimeSeries[source]¶
- timeseries_dict: Dict[str, DiseaseTimeSeries]¶
- class chap_core.assessment.representations.MultiLocationErrorTimeSeries(timeseries_dict: Dict[str, chap_core.assessment.representations.ErrorTimeSeries])[source]¶
Bases: object
- timeseries_dict: Dict[str, ErrorTimeSeries]¶
- class chap_core.assessment.representations.MultiLocationForecast(timeseries: Dict[str, chap_core.assessment.representations.Forecast])[source]¶
Bases: object
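The signatures in this module read like plain dataclasses nested by location. The sketch below mirrors three of them with the documented field names to show how the pieces fit together. The classes are local stand-ins, not imports from chap_core, and the body of `filter_by_time_periods` is an assumption about its behaviour (keep only observations whose time_period is listed).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiseaseObservation:
    time_period: str
    disease_cases: int


@dataclass
class DiseaseTimeSeries:
    observations: List[DiseaseObservation]


@dataclass
class MultiLocationDiseaseTimeSeries:
    timeseries_dict: Dict[str, DiseaseTimeSeries] = field(default_factory=dict)

    def filter_by_time_periods(self, time_periods: List[str]) -> "MultiLocationDiseaseTimeSeries":
        # Assumed behaviour: return a new object restricted to the given periods.
        wanted = set(time_periods)
        return MultiLocationDiseaseTimeSeries(
            {
                location: DiseaseTimeSeries(
                    [obs for obs in series.observations if obs.time_period in wanted]
                )
                for location, series in self.timeseries_dict.items()
            }
        )


truth = MultiLocationDiseaseTimeSeries(
    {
        "loc_a": DiseaseTimeSeries(
            [DiseaseObservation("2023-01", 4), DiseaseObservation("2023-02", 7)]
        ),
        "loc_b": DiseaseTimeSeries([DiseaseObservation("2023-02", 1)]),
    }
)
february = truth.filter_by_time_periods(["2023-02"])
```

In this sketch the filter returns a new object and leaves the original untouched, which keeps the full truth series available for later comparisons.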