chap_core.data package

Subpackages

chap_core.data.gluonts_adaptor package

Submodules

chap_core.data.adaptors module

chap_core.data.datasets module

chap_core.data.open_dengue module

class chap_core.data.open_dengue.OpenDengueDataSet

Bases: object

as_dataset(country_name: str, spatial_resolution: Literal['Admin1', 'Admin2'] = 'Admin1', temporal_resolution='Week')

data_path = 'https://github.com/OpenDengue/master-repo/raw/main/data/releases/V1.2.2/Temporal_extract_V1_2_2.zip'

subset(country_name: str, spatial_resolution: Literal['Admin1', 'Admin2'] = 'Admin1', temporal_resolution='Week')
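The `data_path` attribute points at a zipped OpenDengue release. Its internal file layout is not documented on this page, so a minimal sketch using only the standard library might download the archive and list its members before deciding how to read them. The helper names `fetch_archive` and `list_archive_members` are hypothetical and not part of chap_core.

```python
import io
import zipfile
from urllib.request import urlopen

# Copied from OpenDengueDataSet.data_path (release V1.2.2).
DATA_PATH = (
    "https://github.com/OpenDengue/master-repo/raw/main/data/releases/"
    "V1.2.2/Temporal_extract_V1_2_2.zip"
)


def fetch_archive(url: str = DATA_PATH) -> bytes:
    """Download the release archive into memory (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def list_archive_members(archive_bytes: bytes) -> list[str]:
    """Return the names of the files stored in an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()


# Example (performs a download):
# print(list_archive_members(fetch_archive()))
```

Keeping the download separate from the inspection step means the listing logic can be exercised on any zip bytes without network access.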

Module contents

class chap_core.data.DataSet(data_dict: dict[str, FeaturesT], polygons=None, metadata=DataSetMetaData(name='dataset', filename=None, db_id=None))

Bases: Generic[FeaturesT]

Class representing several time series at different locations.

add_fields(new_type, **kwargs: dict[str, Callable])

aggregate_to_parent(field_name: str = 'disease_cases', nan_indicator='disease_cases')

data() → Iterable[FeaturesT]

classmethod df_from_pydantic_observations(observations: list[PeriodObservation]) → TimeSeriesData

property end_timestamp: Timestamp

field_names()

filter_locations(locations: Iterable[str]) → DataSet[FeaturesT]

classmethod from_csv(file_name: str, dataclass: Type[FeaturesT] | None = None) → DataSet[FeaturesT]

classmethod from_dict(data: dict, dataclass: type[TemporalDataclass])

classmethod from_fields(dataclass: type[TimeSeriesData], fields: dict[str, DataSet[TimeSeriesArray]])

classmethod from_file(file_name: str, dataclass: Type[FeaturesT]) → DataSet[FeaturesT]

classmethod from_pandas(df: DataFrame, dataclass: Type[FeaturesT] = None, fill_missing=False) → DataSet[FeaturesT]

Create a DataSet from a pandas DataFrame. The DataFrame must have a ‘location’ column and a ‘time_period’ column, and the ‘time_period’ column must contain strings that can be parsed into periods. Every field of the dataclass must be present as a column in the DataFrame. If ‘fill_missing’ is True, missing values are filled with np.nan; otherwise, every time series must be consecutive.

Parameters

df (pd.DataFrame): The DataFrame to convert.
dataclass (Type[FeaturesT]): The dataclass to use for the time series.
fill_missing (bool, optional): Whether missing values should be filled. Defaults to False.

Returns

DataSet[FeaturesT]: The resulting DataSet.

Examples

>>> import pandas as pd
>>> from chap_core.spatio_temporal_data.temporal_dataclass import DataSet
>>> from chap_core.datatypes import HealthData
>>> df = pd.DataFrame(
...     {
...         "location": ["Oslo", "Oslo", "Bergen", "Bergen"],
...         "time_period": ["2020-01", "2020-02", "2020-01", "2020-02"],
...         "disease_cases": [10, 20, 30, 40],
...     }
... )
>>> dataset = DataSet.from_pandas(df, HealthData)
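The `fill_missing` rule can be illustrated without chap_core. The following is a minimal sketch, not the library's implementation: it assumes monthly periods written as 'YYYY-MM' (as in the doctest above), represents the input as plain row dictionaries instead of a DataFrame, and uses `math.nan` in place of `np.nan`. The helpers `parse_month`, `month_index` and `group_by_location` are hypothetical.

```python
import math
from collections import defaultdict


def parse_month(period: str) -> tuple[int, int]:
    """Parse a 'YYYY-MM' string into (year, month); assumed format only."""
    year, month = period.split("-")
    return int(year), int(month)


def month_index(year: int, month: int) -> int:
    """Map a (year, month) pair to a consecutive integer index."""
    return year * 12 + (month - 1)


def group_by_location(rows, field, fill_missing=False):
    """Group long-format rows into one consecutive monthly series per location.

    rows: iterable of dicts with 'location', 'time_period' and `field` keys.
    Gaps are filled with NaN when fill_missing is True; otherwise a gap
    raises ValueError, since every series must then be consecutive.
    """
    by_location = defaultdict(dict)
    for row in rows:
        idx = month_index(*parse_month(row["time_period"]))
        by_location[row["location"]][idx] = row[field]

    result = {}
    for location, values in by_location.items():
        start, end = min(values), max(values)
        has_gap = any(i not in values for i in range(start, end + 1))
        if has_gap and not fill_missing:
            raise ValueError(f"{location}: time series is not consecutive")
        # Build the full series from the first to the last observed month.
        result[location] = [values.get(i, math.nan) for i in range(start, end + 1)]
    return result
```

Whether chap_core raises this exact error type for non-consecutive input is not stated in the docstring; the sketch only mirrors the documented requirement.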

classmethod from_period_observations(observation_dict: dict[str, list[PeriodObservation]]) → DataSet[TimeSeriesData]

Create a DataSet from a dictionary of PeriodObservations. The keys are location names, and the values are the lists of PeriodObservations for each location.

Parameters

observation_dict (dict[str, list[PeriodObservation]]): The dictionary of observations.

Returns

DataSet[TimeSeriesData]: The resulting DataSet.

Examples

>>> from chap_core.spatio_temporal_data.temporal_dataclass import DataSet
>>> from chap_core.api_types import PeriodObservation
>>> class HealthObservation(PeriodObservation):
...     disease_cases: int
>>> observations = {
...     "Oslo": [
...         HealthObservation(time_period="2020-01", disease_cases=10),
...         HealthObservation(time_period="2020-02", disease_cases=20),
...     ]
... }
>>> dataset = DataSet.from_period_observations(observations)
>>> df = dataset.to_pandas()

classmethod from_pickle(file_name: str, dataclass: Type[FeaturesT]) → DataSet[FeaturesT]

get_location(location: Location) → FeaturesT

get_locations(location: Iterable[Location]) → DataSet[FeaturesT]

get_parent_dict() → dict[str, str] | None

interpolate(field_names=None)

items() → Iterable[Tuple[str, FeaturesT]]

join_on_time(other: DataSet[FeaturesT]) → DataSet[Tuple[FeaturesT, FeaturesT]]

Join two DataSets on time and return a new DataSet. Assumes that other is later in time.
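The return type `DataSet[Tuple[FeaturesT, FeaturesT]]` suggests that each location ends up with a pair of series, one from each DataSet. A minimal sketch of that pairing on plain dictionaries is shown below, assuming both inputs share the same locations and that each series is a list of (period index, value) tuples. The explicit ordering check is an addition of this sketch; the docstring only says that other is assumed to be later in time. `join_on_time` here is a hypothetical stand-alone function, not the library method.

```python
# Hypothetical stand-in for one location's series: (period index, value) pairs.
Series = list[tuple[int, float]]


def join_on_time(first: dict[str, Series], later: dict[str, Series]) -> dict[str, tuple[Series, Series]]:
    """Pair each location's series in `first` with the same location in `later`.

    Locations are assumed to match; a missing location raises KeyError.
    The ordering check below is an assumption of this sketch.
    """
    joined = {}
    for location, series in first.items():
        other = later[location]
        # `later` must start strictly after `first` ends for this location.
        if series and other and other[0][0] <= series[-1][0]:
            raise ValueError(f"{location}: 'later' does not start after 'first' ends")
        joined[location] = (series, other)
    return joined
```

Keeping the two series as a pair, rather than concatenating them, matches the tuple-valued return type in the signature.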

keys() → Iterable[str]

locations() → Iterable[Location]

merge(other_dataset: DataSet, result_dataclass: type[TimeSeriesData]) → DataSet

model_dump()

property period_range: PeriodRange

plot()

plot_aggregate()

property polygons

remove_field(field_name, new_class=None)

resample(freq)

restrict_time_period(period_range: slice) → DataSet[FeaturesT]

set_polygons(polygons: FeatureCollectionModel, ignore_validation=False) → list[str]

property start_timestamp: Timestamp

to_csv(file_name: str, mode='w')

to_pandas() → DataFrame

Concatenate the pandas DataFrames for all locations into a single DataFrame, with the location stored as a column.
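To illustrate the long layout that to_pandas describes, here is a sketch using plain dictionaries instead of DataFrames. It assumes each location maps to a dictionary of equal-length field lists that includes 'time_period', and that the location column is named 'location' (the column name from_pandas expects). `to_long_rows` is a hypothetical helper, not part of chap_core.

```python
def to_long_rows(series_by_location: dict[str, dict[str, list]]) -> list[dict]:
    """Flatten {location: {field: values}} into one row per (location, period).

    Each row starts with a 'location' key followed by every field's value
    at that position; all field lists are assumed to have equal length.
    """
    rows = []
    for location, fields in series_by_location.items():
        for i in range(len(fields["time_period"])):
            row = {"location": location}
            row.update({name: values[i] for name, values in fields.items()})
            rows.append(row)
    return rows
```

If pandas is installed, passing the rows to `pd.DataFrame` gives a frame with one row per location and time period, which has the column layout that from_pandas expects.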

to_pickle(file_name: str)

to_report(pdf_filename: str)

values() → Iterable[FeaturesT]

class chap_core.data.PeriodObservation(*, time_period: str)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

time_period: str