chap_core.data package
Subpackages
Submodules
chap_core.data.adaptors module
chap_core.data.datasets module
chap_core.data.open_dengue module
- class chap_core.data.open_dengue.OpenDengueDataSet
Bases: object
- as_dataset(country_name: str, spatial_resolution: Literal['Admin1', 'Admin2'] = 'Admin1', temporal_resolution='Week')
- data_path = 'https://github.com/OpenDengue/master-repo/raw/main/data/releases/V1.2.2/Temporal_extract_V1_2_2.zip'
Module contents
- class chap_core.data.DataSet(data_dict: dict[str, FeaturesT], polygons=None, metadata=DataSetMetaData(name='dataset', filename=None, db_id=None))
Bases: Generic[FeaturesT]
Class representing several time series at different locations.
- classmethod df_from_pydantic_observations(observations: list[PeriodObservation]) -> TimeSeriesData
- property end_timestamp: Timestamp
- classmethod from_csv(file_name: str, dataclass: Type[FeaturesT] | None = None) -> DataSet[FeaturesT]
- classmethod from_dict(data: dict, dataclass: type[TemporalDataclass])
- classmethod from_fields(dataclass: type[TimeSeriesData], fields: dict[str, DataSet[TimeSeriesArray]])
- classmethod from_pandas(df: DataFrame, dataclass: Type[FeaturesT] = None, fill_missing=False) -> DataSet[FeaturesT]
Create a SpatioTemporalDict from a pandas dataframe. The dataframe needs to have a 'location' column and a 'time_period' column. The time_period column needs to contain strings that can be parsed into a period. All fields in the dataclass need to be present in the dataframe. If 'fill_missing' is True, missing values will be filled with np.nan. Otherwise, every time series must be consecutive.
Parameters
- df : pd.DataFrame
  The dataframe
- dataclass : Type[FeaturesT]
  The dataclass to use for the time series
- fill_missing : bool, optional
  If missing values should be filled, by default False
Returns
- DataSet[FeaturesT]
  The SpatioTemporalDict
Examples
>>> import pandas as pd
>>> from chap_core.spatio_temporal_data.temporal_dataclass import DataSet
>>> from chap_core.datatypes import HealthData
>>> df = pd.DataFrame(
...     {
...         "location": ["Oslo", "Oslo", "Bergen", "Bergen"],
...         "time_period": ["2020-01", "2020-02", "2020-01", "2020-02"],
...         "disease_cases": [10, 20, 30, 40],
...     }
... )
>>> DataSet.from_pandas(df, HealthData)
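The consecutiveness rule and the effect of fill_missing can be illustrated without chap_core. The sketch below is not the library's implementation: it uses plain Python, assumes monthly periods written as "YYYY-MM", and assumes each location is checked over its own first-to-last range. The helper names month_index, index_to_period and align_series are hypothetical.

```python
import math


def month_index(period: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = period.split("-")
    return int(year) * 12 + int(month) - 1


def index_to_period(index: int) -> str:
    """Inverse of month_index."""
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def align_series(rows, value_field, fill_missing=False):
    """Group rows by location into {location: [(period, value), ...]}.

    With fill_missing=False, a gap in the periods raises ValueError.
    With fill_missing=True, gaps are filled with NaN (an assumption
    modelled on the docstring, not on the chap_core source).
    """
    by_location = {}
    for row in rows:
        values = by_location.setdefault(row["location"], {})
        values[month_index(row["time_period"])] = row[value_field]

    result = {}
    for location, values in by_location.items():
        series = []
        for idx in range(min(values), max(values) + 1):
            if idx in values:
                series.append((index_to_period(idx), values[idx]))
            elif fill_missing:
                series.append((index_to_period(idx), math.nan))
            else:
                raise ValueError(f"{location}: missing period {index_to_period(idx)}")
        result[location] = series
    return result


# Oslo has no row for 2020-02, so the series is not consecutive.
rows = [
    {"location": "Oslo", "time_period": "2020-01", "disease_cases": 10},
    {"location": "Oslo", "time_period": "2020-03", "disease_cases": 20},
]
filled = align_series(rows, "disease_cases", fill_missing=True)
```

Calling align_series(rows, "disease_cases") with the default fill_missing=False raises ValueError for the same input, which mirrors the requirement that unfilled time series be consecutive.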
- classmethod from_period_observations(observation_dict: dict[str, list[PeriodObservation]]) -> DataSet[TimeSeriesData]
Create a SpatioTemporalDict from a dictionary of PeriodObservations. The keys are the location names, and the values are lists of PeriodObservations.
Parameters
- observation_dict : dict[str, list[PeriodObservation]]
  The dictionary of observations
Returns
- DataSet[TimeSeriesData]
  The SpatioTemporalDict
Examples
>>> from chap_core.spatio_temporal_data.temporal_dataclass import DataSet
>>> from chap_core.api_types import PeriodObservation
>>> class HealthObservation(PeriodObservation):
...     disease_cases: int
...
>>> observations = {
...     "Oslo": [
...         HealthObservation(time_period="2020-01", disease_cases=10),
...         HealthObservation(time_period="2020-02", disease_cases=20),
...     ]
... }
>>> dataset = DataSet.from_period_observations(observations)
>>> dataset.to_pandas()
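The shape of this conversion, from a list of per-period records to one column per field for each location, can be sketched in plain Python. This is an illustration only, not chap_core code: Observation is a hypothetical dataclass standing in for a PeriodObservation subclass, to_columns is a hypothetical helper, and sorting by the "YYYY-MM" string is an assumption made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class Observation:
    """Hypothetical stand-in for a PeriodObservation subclass."""
    time_period: str
    disease_cases: int


def to_columns(observation_dict, observation_cls=Observation):
    """Turn {location: [obs, ...]} into {location: {field: [values, ...]}}."""
    columns = {}
    for location, observations in observation_dict.items():
        # 'YYYY-MM' strings sort chronologically when compared as text.
        ordered = sorted(observations, key=lambda obs: obs.time_period)
        columns[location] = {
            f.name: [getattr(obs, f.name) for obs in ordered]
            for f in fields(observation_cls)
        }
    return columns


columns = to_columns(
    {"Oslo": [Observation("2020-02", 20), Observation("2020-01", 10)]}
)
```

Each location ends up with one list per field, all of the same length and in period order, which is the layout a time series container needs.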
- join_on_time(other: DataSet[FeaturesT]) -> DataSet[Tuple[FeaturesT, FeaturesT]]
Join two SpatioTemporalDicts on time. Returns a new SpatioTemporalDict. Assumes other is later in time.
- property period_range: PeriodRange
- property polygons
- set_polygons(polygons: FeatureCollectionModel, ignore_validation=False) -> list[str]
- property start_timestamp: Timestamp