chap_core.data package

Subpackages

Submodules

chap_core.data.adaptors module

chap_core.data.datasets module

chap_core.data.open_dengue module

class chap_core.data.open_dengue.OpenDengueDataSet

Bases: object

as_dataset(country_name: str, spatial_resolution: Literal['Admin1', 'Admin2'] = 'Admin1', temporal_resolution='Week')
data_path = 'https://github.com/OpenDengue/master-repo/raw/main/data/releases/V1.2.2/Temporal_extract_V1_2_2.zip'
subset(country_name: str, spatial_resolution: Literal['Admin1', 'Admin2'] = 'Admin1', temporal_resolution='Week')

Module contents

class chap_core.data.DataSet(data_dict: dict[str, FeaturesT], polygons=None, metadata=DataSetMetaData(name='dataset', filename=None, db_id=None))

Bases: Generic[FeaturesT]

Class representing several time series at different locations.

add_fields(new_type, **kwargs: dict[str, Callable])
aggregate_to_parent(field_name: str = 'disease_cases', nan_indicator='disease_cases')
data() -> Iterable[FeaturesT]
classmethod df_from_pydantic_observations(observations: list[PeriodObservation]) -> TimeSeriesData
property end_timestamp: Timestamp
field_names()
filter_locations(locations: Iterable[str]) -> DataSet[FeaturesT]
classmethod from_csv(file_name: str, dataclass: Type[FeaturesT] | None = None) -> DataSet[FeaturesT]
classmethod from_dict(data: dict, dataclass: type[TemporalDataclass])
classmethod from_fields(dataclass: type[TimeSeriesData], fields: dict[str, DataSet[TimeSeriesArray]])
classmethod from_file(file_name: str, dataclass: Type[FeaturesT]) -> DataSet[FeaturesT]
classmethod from_pandas(df: DataFrame, dataclass: Type[FeaturesT] = None, fill_missing=False) -> DataSet[FeaturesT]

Create a DataSet from a pandas dataframe. The dataframe must have a ‘location’ column and a ‘time_period’ column. The time_period column must contain strings that can be parsed into a period. All fields in the dataclass must be present in the dataframe. If ‘fill_missing’ is True, missing values are filled with np.nan; otherwise every time series must be consecutive.

Parameters

df : pd.DataFrame

The dataframe

dataclass : Type[FeaturesT]

The dataclass to use for the time series

fill_missing : bool, optional

Whether missing values should be filled, by default False

Returns

DataSet[FeaturesT]

The resulting DataSet

Examples

>>> import pandas as pd
>>> from chap_core.spatio_temporal_data.temporal_dataclass import DataSet
>>> from chap_core.datatypes import HealthData
>>> df = pd.DataFrame(
...     {
...         "location": ["Oslo", "Oslo", "Bergen", "Bergen"],
...         "time_period": ["2020-01", "2020-02", "2020-01", "2020-02"],
...         "disease_cases": [10, 20, 30, 40],
...     }
... )
>>> DataSet.from_pandas(df, HealthData)
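
The requirement that each time series be consecutive when fill_missing is False can be checked before calling from_pandas. The following is a minimal standard-library sketch, not chap_core's own validation: it assumes monthly ‘YYYY-MM’ period strings, and the helper names (month_index, format_month, missing_periods) are hypothetical. It lists, per location, the periods that would have to be filled with np.nan.

```python
from collections import defaultdict


def month_index(period: str) -> int:
    """Map a 'YYYY-MM' string to a running month count."""
    year, month = period.split("-")
    return int(year) * 12 + int(month) - 1


def format_month(index: int) -> str:
    """Inverse of month_index: running month count back to 'YYYY-MM'."""
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def missing_periods(rows):
    """Return {location: [missing 'YYYY-MM', ...]} for gaps inside each series."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["location"]].add(month_index(row["time_period"]))
    gaps = {}
    for location, months in seen.items():
        # Every month between the first and last observation should be present.
        full_range = range(min(months), max(months) + 1)
        gaps[location] = [format_month(m) for m in full_range if m not in months]
    return gaps


# Illustrative rows only; Oslo is missing 2020-02.
rows = [
    {"location": "Oslo", "time_period": "2020-01", "disease_cases": 10},
    {"location": "Oslo", "time_period": "2020-03", "disease_cases": 20},
    {"location": "Bergen", "time_period": "2020-01", "disease_cases": 30},
    {"location": "Bergen", "time_period": "2020-02", "disease_cases": 40},
]
gaps = missing_periods(rows)
```

A location with a non-empty list here has a gap, so it would presumably need fill_missing=True to load.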
classmethod from_period_observations(observation_dict: dict[str, list[PeriodObservation]]) -> DataSet[TimeSeriesData]

Create a DataSet from a dictionary of PeriodObservations. The keys are location names and the values are lists of PeriodObservations.

Parameters

observation_dict : dict[str, list[PeriodObservation]]

The dictionary of observations

Returns

DataSet[TimeSeriesData]

The resulting DataSet

Examples

>>> from chap_core.spatio_temporal_data.temporal_dataclass import DataSet
>>> from chap_core.api_types import PeriodObservation
>>> class HealthObservation(PeriodObservation):
...     disease_cases: int
>>> observations = {
...     "Oslo": [
...         HealthObservation(time_period="2020-01", disease_cases=10),
...         HealthObservation(time_period="2020-02", disease_cases=20),
...     ]
... }
>>> dataset = DataSet.from_period_observations(observations)
>>> dataset.to_pandas()
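
Observations often arrive as a flat list of records rather than already keyed by location. The sketch below shows one way to build the observation_dict shape described above using only the standard library. The HealthObservation dataclass is a hypothetical stand-in for a PeriodObservation subclass (the real class is a pydantic model), group_by_location is an illustrative helper, and sorting each list chronologically is a precaution assumed here, not a documented requirement.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HealthObservation:
    # Hypothetical stand-in for a PeriodObservation subclass.
    time_period: str
    disease_cases: int


def group_by_location(records):
    """Group (location, time_period, disease_cases) tuples into {location: [observations]}."""
    grouped = defaultdict(list)
    for location, time_period, disease_cases in records:
        grouped[location].append(HealthObservation(time_period, disease_cases))
    # Sort each list chronologically; 'YYYY-MM' strings sort correctly as text.
    for observations in grouped.values():
        observations.sort(key=lambda obs: obs.time_period)
    return dict(grouped)


# Illustrative records only, deliberately out of order.
records = [
    ("Oslo", "2020-02", 20),
    ("Bergen", "2020-01", 30),
    ("Oslo", "2020-01", 10),
]
observation_dict = group_by_location(records)
```

With real PeriodObservation subclasses in place of the stand-in, a dictionary of this shape is what from_period_observations takes as its argument.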
classmethod from_pickle(file_name: str, dataclass: Type[FeaturesT]) -> DataSet[FeaturesT]
get_location(location: Location) -> FeaturesT
get_locations(location: Iterable[Location]) -> DataSet[FeaturesT]
get_parent_dict() -> dict[str, str] | None
interpolate(field_names=None)
items() -> Iterable[Tuple[str, FeaturesT]]
join_on_time(other: DataSet[FeaturesT]) -> DataSet[Tuple[FeaturesT, FeaturesT]]

Join two DataSets on time and return a new DataSet. Assumes that other is later in time than this dataset.
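
The docstring does not spell out how the join is performed. One plausible reading is a per-location concatenation of an earlier and a later series; the sketch below illustrates only that reading, on plain dictionaries of (time_period, value) tuples. The function concat_on_time is hypothetical and is not the library's implementation. The return annotation above (Tuple[FeaturesT, FeaturesT]) may indicate different behaviour, so treat this as an illustration of the "other is later in time" assumption only.

```python
def concat_on_time(earlier, later):
    """Concatenate per-location series, requiring `later` to start after `earlier` ends."""
    joined = {}
    for location, first in earlier.items():
        second = later[location]
        # 'YYYY-MM' strings compare chronologically, so a plain comparison suffices.
        if first and second and second[0][0] <= first[-1][0]:
            raise ValueError(f"{location}: second series does not start after the first ends")
        joined[location] = first + second
    return joined


# Illustrative data only.
earlier = {"Oslo": [("2020-01", 10), ("2020-02", 20)]}
later = {"Oslo": [("2020-03", 30)]}
joined = concat_on_time(earlier, later)
```

Swapping the two arguments raises ValueError, which is the situation the "later in time" assumption rules out.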

keys() -> Iterable[str]
locations() -> Iterable[Location]
merge(other_dataset: DataSet, result_dataclass: type[TimeSeriesData]) -> DataSet
model_dump()
property period_range: PeriodRange
plot()
plot_aggregate()
property polygons
remove_field(field_name, new_class=None)
resample(freq)
restrict_time_period(period_range: slice) -> DataSet[FeaturesT]
set_polygons(polygons: FeatureCollectionModel, ignore_validation=False) -> list[str]
property start_timestamp: Timestamp
to_csv(file_name: str, mode='w')
to_pandas() -> DataFrame

Return a single pandas DataFrame that combines the per-location frames, with the location as a column.
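
The resulting layout is a long table in which every row carries its location. As a rough standard-library picture of that shape (not the library's code), the sketch below flattens a dictionary of per-location records into rows. The helper to_long_rows is hypothetical, and the column name ‘location’ is an assumption borrowed from the from_pandas requirements.

```python
def to_long_rows(data_dict):
    """Flatten {location: [record, ...]} into one list of rows with a 'location' key."""
    rows = []
    for location, series in data_dict.items():
        for record in series:
            # Put the location first, then copy the record's own fields.
            rows.append({"location": location, **record})
    return rows


# Illustrative data only.
data_dict = {
    "Oslo": [
        {"time_period": "2020-01", "disease_cases": 10},
        {"time_period": "2020-02", "disease_cases": 20},
    ],
    "Bergen": [
        {"time_period": "2020-01", "disease_cases": 30},
    ],
}
rows = to_long_rows(data_dict)
```

Each element of rows corresponds to one row of the combined frame.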

to_pickle(file_name: str)
to_report(pdf_filename: str)
values() -> Iterable[FeaturesT]
class chap_core.data.PeriodObservation(*, time_period: str)

Bases: BaseModel

model_config: ClassVar[ConfigDict] = {}

Configuration for the model. It should be a dictionary conforming to pydantic.config.ConfigDict.

time_period: str