Creating Custom Metrics¶
This guide explains how to create custom evaluation metrics for CHAP backtest results.
Overview¶
Metrics in CHAP measure how well a model's forecasts match observed values. The metrics system provides:
- **Single definition**: Each metric is defined once and supports multiple aggregation levels
- **Multi-level aggregation**: Get global values, per-location, per-horizon, or detailed breakdowns
- **Automatic registration**: Metrics are discovered and made available throughout CHAP
- **Two metric types**: Deterministic (point forecasts) and Probabilistic (all samples)
Quick Start¶
Here's a minimal deterministic metric:
```python
from chap_core.assessment.metrics.base import (
    AggregationOp,
    DeterministicMetric,
    MetricSpec,
)
from chap_core.assessment.metrics import metric


@metric()
class MyAbsoluteErrorMetric(DeterministicMetric):
    """Computes the absolute error between forecast and observation."""

    spec = MetricSpec(
        metric_id="my_absolute_error",
        metric_name="My Absolute Error",
        aggregation_op=AggregationOp.MEAN,
        description="Absolute difference between forecast and observation",
    )

    def compute_point_metric(self, forecast: float, observed: float) -> float:
        return abs(forecast - observed)
```
And a minimal probabilistic metric:
```python
import numpy as np

from chap_core.assessment.metrics.base import (
    AggregationOp,
    ProbabilisticMetric,
    MetricSpec,
)
from chap_core.assessment.metrics import metric


@metric()
class MySpreadMetric(ProbabilisticMetric):
    """Computes the spread (standard deviation) of forecast samples."""

    spec = MetricSpec(
        metric_id="my_spread",
        metric_name="My Spread",
        aggregation_op=AggregationOp.MEAN,
        description="Standard deviation of forecast samples",
    )

    def compute_sample_metric(self, samples: np.ndarray, observed: float) -> float:
        return float(np.std(samples))
```
Data Formats¶
Your metric receives data in standardized DataFrame formats:
Observations DataFrame (FlatObserved)¶
| Column | Type | Description |
|---|---|---|
| `location` | str | Location identifier |
| `time_period` | str | Time period (e.g., "2024-01" or "2024W01") |
| `disease_cases` | float | Observed disease cases |
Forecasts DataFrame (FlatForecasts)¶
| Column | Type | Description |
|---|---|---|
| `location` | str | Location identifier |
| `time_period` | str | Time period being forecasted |
| `horizon_distance` | int | How many periods ahead this forecast is |
| `sample` | int | Sample index (for probabilistic forecasts) |
| `forecast` | float | Forecasted value |
Output Format¶
Metrics return a pandas DataFrame with one column per requested dimension (e.g., `location`, `horizon_distance`) plus a column holding the aggregated metric value.
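As an illustration, a per-location MAE result might look like the following (the values are made up, and the exact name of the metric column is an assumption here, not something this guide pins down):

```
   location   mae
0     loc_A  11.2
1     loc_B  12.7
```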
Base Classes¶
DeterministicMetric¶
For metrics comparing point forecasts (median of samples) to observations:
```python
from chap_core.assessment.metrics.base import DeterministicMetric

# DeterministicMetric requires implementing:
# def compute_point_metric(self, forecast: float, observed: float) -> float
```
ProbabilisticMetric¶
For metrics that need all forecast samples:
```python
from chap_core.assessment.metrics.base import ProbabilisticMetric

# ProbabilisticMetric requires implementing:
# def compute_sample_metric(self, samples: np.ndarray, observed: float) -> float
```
MetricSpec Configuration¶
```python
from chap_core.assessment.metrics.base import AggregationOp, MetricSpec

spec = MetricSpec(
    metric_id="unique_id",              # Used in APIs and the registry
    metric_name="Display Name",         # Human-readable name
    aggregation_op=AggregationOp.MEAN,  # MEAN, SUM, or ROOT_MEAN_SQUARE
    description="What this metric measures",
)
```
Complete Examples¶
Example: RMSE-style Metric¶
```python
from chap_core.assessment.metrics.base import (
    AggregationOp,
    DeterministicMetric,
    MetricSpec,
)
from chap_core.assessment.metrics import metric


@metric()
class SquaredErrorMetric(DeterministicMetric):
    """
    Squared error metric.

    With ROOT_MEAN_SQUARE aggregation, this produces RMSE.
    """

    spec = MetricSpec(
        metric_id="squared_error",
        metric_name="Squared Error",
        aggregation_op=AggregationOp.ROOT_MEAN_SQUARE,
        description="Squared error with RMSE aggregation",
    )

    def compute_point_metric(self, forecast: float, observed: float) -> float:
        return abs(forecast - observed)  # Base class squares values for ROOT_MEAN_SQUARE
```
Example: Bias Detection Metric¶
```python
import numpy as np

from chap_core.assessment.metrics.base import (
    AggregationOp,
    ProbabilisticMetric,
    MetricSpec,
)
from chap_core.assessment.metrics import metric


@metric()
class ForecastBiasMetric(ProbabilisticMetric):
    """
    Measures forecast bias as the proportion of samples above the truth.

    Returns 0.5 for unbiased forecasts, >0.5 for over-prediction,
    and <0.5 for under-prediction.
    """

    spec = MetricSpec(
        metric_id="forecast_bias",
        metric_name="Forecast Bias",
        aggregation_op=AggregationOp.MEAN,
        description="Proportion of samples above observed (0.5 = unbiased)",
    )

    def compute_sample_metric(self, samples: np.ndarray, observed: float) -> float:
        return float(np.mean(samples > observed))
```
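To sanity-check the per-sample computation without running the full pipeline, you can call the method directly:

```python
import numpy as np

bias = ForecastBiasMetric()
samples = np.array([8.0, 9.0, 11.0, 12.0])  # two samples above 10, two below
print(bias.compute_sample_metric(samples, observed=10.0))  # 0.5 -> unbiased
```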
Example: Parameterized Metric with Subclasses¶
```python
import numpy as np

from chap_core.assessment.metrics.base import (
    AggregationOp,
    ProbabilisticMetric,
    MetricSpec,
)
from chap_core.assessment.metrics import metric


class IntervalCoverageMetric(ProbabilisticMetric):
    """Base class for interval coverage metrics (not registered directly)."""

    low_pct: int
    high_pct: int

    def compute_sample_metric(self, samples: np.ndarray, observed: float) -> float:
        low, high = np.percentile(samples, [self.low_pct, self.high_pct])
        return 1.0 if (low <= observed <= high) else 0.0


@metric()
class Coverage80Metric(IntervalCoverageMetric):
    """80% prediction interval coverage."""

    spec = MetricSpec(
        metric_id="coverage_80",
        metric_name="80% Coverage",
        aggregation_op=AggregationOp.MEAN,
        description="Proportion within 10th-90th percentile",
    )
    low_pct = 10
    high_pct = 90
```
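Adding another interval width is then just another small subclass. For example, a hypothetical 50% coverage metric (not part of CHAP, shown only to illustrate the pattern):

```python
@metric()
class Coverage50Metric(IntervalCoverageMetric):
    """50% prediction interval coverage (hypothetical example)."""

    spec = MetricSpec(
        metric_id="coverage_50",
        metric_name="50% Coverage",
        aggregation_op=AggregationOp.MEAN,
        description="Proportion within 25th-75th percentile",
    )
    low_pct = 25
    high_pct = 75
```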
Using Metrics¶
Creating Example Data¶
First, let's create sample data to demonstrate metric computation:
```python
import pandas as pd
import numpy as np

from chap_core.assessment.flat_representations import FlatObserved, FlatForecasts

# Create sample observations: 2 locations, 3 time periods
observations_df = pd.DataFrame({
    "location": ["loc_A", "loc_A", "loc_A", "loc_B", "loc_B", "loc_B"],
    "time_period": ["2024-01", "2024-02", "2024-03", "2024-01", "2024-02", "2024-03"],
    "disease_cases": [100.0, 120.0, 90.0, 200.0, 180.0, 220.0],
})
observations = FlatObserved(observations_df)

# Create sample forecasts: 10 samples per observation, horizons 1 and 2
forecast_rows = []
np.random.seed(42)
for loc in ["loc_A", "loc_B"]:
    base = 100 if loc == "loc_A" else 200
    for period in ["2024-01", "2024-02", "2024-03"]:
        for horizon in [1, 2]:
            for sample_id in range(10):
                forecast_rows.append({
                    "location": loc,
                    "time_period": period,
                    "horizon_distance": horizon,
                    "sample": sample_id,
                    "forecast": base + np.random.normal(0, 15),
                })
forecasts_df = pd.DataFrame(forecast_rows)
forecasts = FlatForecasts(forecasts_df)

print(f"Observations shape: {observations_df.shape}")
print(f"Forecasts shape: {forecasts_df.shape}")
```
Computing Metrics at Different Aggregation Levels¶
```python
from chap_core.assessment.metrics import get_metric
from chap_core.assessment.flat_representations import DataDimension

# Get the MAE metric
mae = get_metric("mae")()

# Global aggregate: a single value across all data
global_result = mae.get_global_metric(observations, forecasts)
print("Global MAE:")
print(global_result)
print()

# Detailed: one value per (location, time_period, horizon_distance)
detailed_result = mae.get_detailed_metric(observations, forecasts)
print("Detailed MAE (first 6 rows):")
print(detailed_result.head(6))
print()

# Per location only
per_location = mae.get_metric(observations, forecasts, dimensions=(DataDimension.location,))
print("MAE per location:")
print(per_location)
print()

# Per horizon only
per_horizon = mae.get_metric(observations, forecasts, dimensions=(DataDimension.horizon_distance,))
print("MAE per horizon:")
print(per_horizon)
```
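Because `dimensions` is a tuple, you can also ask for a breakdown over several dimensions at once. A sketch, assuming `get_metric` accepts more than one entry (the detailed level above already groups by all three, so this is a natural middle ground):

```python
# One value per (location, horizon_distance) pair
per_loc_horizon = mae.get_metric(
    observations,
    forecasts,
    dimensions=(DataDimension.location, DataDimension.horizon_distance),
)
print(per_loc_horizon)
```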
Getting Metrics from the Registry¶
```python
from chap_core.assessment.metrics import get_metric, list_metrics

# Get a specific metric class by ID
MAEClass = get_metric("mae")
mae_metric = MAEClass()
print(f"Metric: {mae_metric.get_name()} ({mae_metric.get_id()})")
print(f"Description: {mae_metric.get_description()}")
print()

# List all available metrics
print("Available metrics:")
for info in list_metrics():
    print(f"  {info['id']}: {info['name']}")
```
Registration and Discovery¶
The @metric() Decorator¶
The decorator registers your metric class when the module is imported:
```python
from chap_core.assessment.metrics import metric
from chap_core.assessment.metrics.base import DeterministicMetric, MetricSpec, AggregationOp


@metric()  # This registers the class in the global registry
class RegisteredMetric(DeterministicMetric):
    spec = MetricSpec(
        metric_id="registered_example",
        metric_name="Registered Example",
        aggregation_op=AggregationOp.MEAN,
        description="Example of a registered metric",
    )

    def compute_point_metric(self, forecast: float, observed: float) -> float:
        return abs(forecast - observed)
```
File Location¶
Place your metric file in `chap_core/assessment/metrics/` and add an import to `_discover_metrics()` in `chap_core/assessment/metrics/__init__.py`.
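A minimal sketch of that hook, assuming `_discover_metrics()` simply imports each metric module for its registration side effect (the module name `my_absolute_error` is hypothetical):

```python
# chap_core/assessment/metrics/__init__.py (sketch; the actual body may differ)
def _discover_metrics():
    ...
    # Importing the module runs its @metric() decorator, registering the class
    from chap_core.assessment.metrics import my_absolute_error  # noqa: F401
```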
Understanding Aggregation¶
AggregationOp Options¶
| Operation | Description | Use Case |
|---|---|---|
| `MEAN` | Average of values | MAE, coverage metrics |
| `SUM` | Sum of values | Count-based metrics |
| `ROOT_MEAN_SQUARE` | sqrt(mean(x^2)) | RMSE |
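To see why `ROOT_MEAN_SQUARE` turns per-point absolute errors into RMSE, here is a small arithmetic check:

```python
import numpy as np

errors = np.array([1.0, 2.0, 2.0])    # per-point absolute errors
rmse = np.sqrt(np.mean(errors ** 2))  # sqrt((1 + 4 + 4) / 3) = sqrt(3)
print(rmse)                           # ~1.732
```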
DataDimension Options¶
| Dimension | Description |
|---|---|
| `location` | Geographic location |
| `time_period` | Time period of the forecast |
| `horizon_distance` | How far ahead the forecast is |
Testing Your Metric¶
Use existing metrics as a pattern for testing:
```python
from chap_core.assessment.metrics import get_metric

# Verify your metric is registered
metric_cls = get_metric("mae")
assert metric_cls is not None

# Instantiate and check properties
mae_metric = metric_cls()
assert mae_metric.get_id() == "mae"
assert mae_metric.get_name() == "MAE"
```
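Beyond registration checks, it is worth asserting a hand-computed value on tiny data. A sketch, assuming the median point reduction described under DeterministicMetric above:

```python
import pandas as pd

from chap_core.assessment.flat_representations import FlatObserved, FlatForecasts
from chap_core.assessment.metrics import get_metric

# One location, one period, two samples symmetric around the observed value
obs = FlatObserved(pd.DataFrame({
    "location": ["loc_A"],
    "time_period": ["2024-01"],
    "disease_cases": [10.0],
}))
fcst = FlatForecasts(pd.DataFrame({
    "location": ["loc_A", "loc_A"],
    "time_period": ["2024-01", "2024-01"],
    "horizon_distance": [1, 1],
    "sample": [0, 1],
    "forecast": [8.0, 12.0],
}))

mae = get_metric("mae")()
# The sample median (10.0) equals the observation, so global MAE should be 0.0
print(mae.get_global_metric(obs, fcst))
```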
Reference¶
Existing Implementations¶
Study these files in `chap_core/assessment/metrics/`:

| File | Type | Description |
|---|---|---|
| `mae.py` | Deterministic | Simple absolute error |
| `rmse.py` | Deterministic | Uses ROOT_MEAN_SQUARE aggregation |
| `crps.py` | Probabilistic | Uses all samples |
| `percentile_coverage.py` | Probabilistic | Parameterized with subclasses |
| `above_truth.py` | Probabilistic | Bias detection |
API Summary¶
```python
from chap_core.assessment.metrics import (
    metric,                # Decorator to register metrics
    get_metric,            # Get a metric class by ID
    get_metrics_registry,  # Get all registered metrics
    list_metrics,          # List metrics with metadata
)
from chap_core.assessment.metrics.base import (
    Metric,               # Base class (abstract)
    DeterministicMetric,  # For point forecast comparison
    ProbabilisticMetric,  # For sample-based metrics
    MetricSpec,           # Configuration dataclass
    AggregationOp,        # MEAN, SUM, ROOT_MEAN_SQUARE
)
from chap_core.assessment.flat_representations import DataDimension
```