synthcity.plugins.core.models.tabular_encoder module
TabularEncoder module.
- class BinEncoder(*args: Any, **kwargs: Any)
Bases: synthcity.plugins.core.models.tabular_encoder.TabularEncoder
Binary encoder (for SurvivalGAN).
Models continuous columns with a BayesianGMM and normalizes each value to a scalar in [0, 1] plus a vector with one entry per GMM cluster. Discrete columns are encoded using a scikit-learn OneHotEncoder.
- activation_layout(discrete_activation: str, continuous_activation: str) Sequence[Tuple[str, int]]
Get the layout of the activations.
- Returns a list of tuples, one per column, describing it as either:
continuous, with length 1 + the number of GMM clusters, or
discrete, with length <N>, the length of the one-hot encoding.
- cat_encoder_params: dict = {}
- categorical_encoder: Union[str, type] = 'passthrough'
- cont_encoder_params: dict = {'n_components': 2}
- continuous_encoder: Union[str, type] = 'bayesian_gmm'
- fit(raw_data: pandas.core.frame.DataFrame, discrete_columns: Optional[List] = None) Any
Fit the TabularEncoder. This step also counts the number of columns in the matrix data and records span information.
- get_column_info(name: str) synthcity.plugins.core.models.tabular_encoder.FeatureInfo
- inverse_transform(data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Take matrix data and output raw data.
Output uses the same type as input to the transform function.
- layout() Sequence[synthcity.plugins.core.models.tabular_encoder.FeatureInfo]
Get the layout of the encoded dataset.
- Returns a list of FeatureInfo entries, one per column, describing it as either:
continuous, with length 1 + the number of GMM clusters, or
discrete, with length <N>, the length of the one-hot encoding.
- n_features() int
- transform(raw_data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Take raw data and output matrix data.
- class FeatureInfo(*, name: str, feature_type: str, transform: Any = None, output_dimensions: int, transformed_features: List[str], trans_feature_types: List[str])
Bases:
pydantic.main.BaseModel
- Config
alias of pydantic.config.BaseConfig
- classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model
Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set, since it adds all passed values.
- copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model
Duplicate a model, optionally choose which fields to include, exclude and change.
- Parameters
include – fields to include in new model
exclude – fields to exclude from new model; as with values, this takes precedence over include
update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data
deep – set to True to make a deep copy of the model
- Returns
new model instance
- dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny
Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.
- feature_type: str
- classmethod from_orm(obj: Any) Model
- json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode
Generate a JSON representation of the model; the include and exclude arguments are as per dict().
encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().
- name: str
- output_dimensions: int
- classmethod parse_file(path: Union[str, pathlib.Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: pydantic.parse.Protocol = None, allow_pickle: bool = False) Model
- classmethod parse_obj(obj: Any) Model
- classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: pydantic.parse.Protocol = None, allow_pickle: bool = False) Model
- classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
- classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
- trans_feature_types: List[str]
- transform: Any
- transformed_features: List[str]
- classmethod update_forward_refs(**localns: Any) None
Try to update ForwardRefs on fields based on this Model, globalns and localns.
- classmethod validate(value: Any) Model
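To make the FeatureInfo fields listed above more concrete, here is a minimal sketch of what one record might hold. It deliberately uses a standard-library dataclass instead of pydantic or synthcity, so `FeatureInfoSketch`, the column name `age`, the encoded column names, and the per-column types are all hypothetical. The consistency check at the end is an assumption about how the fields relate, not a documented guarantee.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class FeatureInfoSketch:
    """Standard-library stand-in with the same field names as FeatureInfo."""

    name: str
    feature_type: str
    output_dimensions: int
    transformed_features: List[str]
    trans_feature_types: List[str]
    transform: Any = None  # the fitted per-column transformer, if any


# Hypothetical record for a continuous column encoded with 2 GMM clusters:
# one scalar plus one entry per cluster gives 1 + 2 = 3 output columns.
age_info = FeatureInfoSketch(
    name="age",
    feature_type="continuous",
    output_dimensions=3,
    transformed_features=["age.value", "age.component_0", "age.component_1"],
    trans_feature_types=["continuous", "discrete", "discrete"],
)

# Assumed invariant: one name and one type per encoded column.
consistent = (
    len(age_info.transformed_features) == age_info.output_dimensions
    and len(age_info.trans_feature_types) == age_info.output_dimensions
)
print(age_info.name, age_info.output_dimensions, consistent)
```

In the real module, get_column_info(name) is the documented way to obtain such a record for a fitted column.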
- class TabularEncoder(*args: Any, **kwargs: Any)
Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
Tabular encoder.
Models continuous columns with a BayesianGMM and normalizes each value to a scalar in [0, 1] plus a vector with one entry per GMM cluster. Discrete columns are encoded using a scikit-learn OneHotEncoder.
- activation_layout(discrete_activation: str, continuous_activation: str) Sequence[Tuple[str, int]]
Get the layout of the activations.
- Returns a list of tuples, one per column, describing it as either:
continuous, with length 1 + the number of GMM clusters, or
discrete, with length <N>, the length of the one-hot encoding.
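The lengths described above can be illustrated with a small standalone sketch. It does not call synthcity: `sketch_activation_layout`, its `(kind, size)` input format, and the activation names `"softmax"` and `"tanh"` are hypothetical, and the real method may group entries differently (for example, by splitting a continuous column into its scalar and its cluster vector).

```python
from typing import List, Sequence, Tuple


def sketch_activation_layout(
    columns: Sequence[Tuple[str, int]],
    discrete_activation: str,
    continuous_activation: str,
) -> List[Tuple[str, int]]:
    """Return one (activation, length) pair per column.

    `columns` holds (kind, size) pairs, where size is the number of GMM
    clusters for a continuous column and the number of categories for a
    discrete one.
    """
    out: List[Tuple[str, int]] = []
    for kind, size in columns:
        if kind == "continuous":
            # One normalized scalar plus one entry per GMM cluster.
            out.append((continuous_activation, 1 + size))
        elif kind == "discrete":
            # One entry per category of the one-hot encoding.
            out.append((discrete_activation, size))
        else:
            raise ValueError(f"unknown column kind: {kind!r}")
    return out


# A continuous column with 10 clusters and a discrete column with 3 categories.
example = sketch_activation_layout(
    [("continuous", 10), ("discrete", 3)],
    discrete_activation="softmax",
    continuous_activation="tanh",
)
print(example)  # [('tanh', 11), ('softmax', 3)]
```

The 10 clusters in the example match the default cont_encoder_params listed for TabularEncoder; a different n_components changes the continuous length accordingly.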
- cat_encoder_params: dict = {'handle_unknown': 'ignore', 'sparse_output': False}
- categorical_encoder: Union[str, type] = 'onehot'
- cont_encoder_params: dict = {'n_components': 10}
- continuous_encoder: Union[str, type] = 'bayesian_gmm'
- fit(raw_data: pandas.core.frame.DataFrame, discrete_columns: Optional[List] = None) Any
Fit the TabularEncoder. This step also counts the number of columns in the matrix data and records span information.
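As a rough illustration of what counting columns and recording span information could look like, the sketch below derives a total width and per-column (start, end) offsets from per-column output sizes. It is an assumption about the bookkeeping, not the library's code: `sketch_spans` and the column names are hypothetical.

```python
from itertools import accumulate
from typing import Dict, Tuple


def sketch_spans(
    output_dimensions: Dict[str, int],
) -> Tuple[int, Dict[str, Tuple[int, int]]]:
    """Return the total encoded width and a half-open (start, end) span per column."""
    names = list(output_dimensions)
    # Running totals give the end offset of each column's block.
    ends = list(accumulate(output_dimensions[name] for name in names))
    # Each block starts where the previous one ended.
    starts = [0] + ends[:-1]
    total = ends[-1] if ends else 0
    spans = {name: (start, end) for name, start, end in zip(names, starts, ends)}
    return total, spans


# Two continuous columns with 10 clusters each (1 + 10 = 11 columns)
# and one discrete column with 2 categories.
total, spans = sketch_spans({"age": 11, "sex": 2, "income": 11})
print(total)  # 24
print(spans)  # {'age': (0, 11), 'sex': (11, 13), 'income': (13, 24)}
```

With spans of this kind, the encoded block for a single column can be sliced out of the matrix data by position.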
- get_column_info(name: str) synthcity.plugins.core.models.tabular_encoder.FeatureInfo
- inverse_transform(data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Take matrix data and output raw data.
Output uses the same type as input to the transform function.
- layout() Sequence[synthcity.plugins.core.models.tabular_encoder.FeatureInfo]
Get the layout of the encoded dataset.
- Returns a list of FeatureInfo entries, one per column, describing it as either:
continuous, with length 1 + the number of GMM clusters, or
discrete, with length <N>, the length of the one-hot encoding.
- n_features() int
- transform(raw_data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
Take raw data and output matrix data.
- class TimeSeriesBinEncoder(*args: Any, **kwargs: Any)
Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
Time series Bin encoder.
Models continuous columns with a BayesianGMM and normalizes each value to a scalar in [0, 1] plus a vector. Discrete columns are encoded using a scikit-learn OneHotEncoder.
- fit(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List, discrete_columns: Optional[List] = None) synthcity.plugins.core.models.tabular_encoder.TimeSeriesBinEncoder
Fit the TimeSeriesBinEncoder.
- fit_transform(static: pandas.core.frame.DataFrame, temporal: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame
- transform(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame
Take raw data and output matrix data.
- class TimeSeriesTabularEncoder(*args: Any, **kwargs: Any)
Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator
TimeSeries Tabular encoder.
Models continuous columns with a BayesianGMM and normalizes each value to a scalar in [0, 1] plus a vector. Discrete columns are encoded using a scikit-learn OneHotEncoder.
- activation_layout(discrete_activation: str, continuous_activation: str) Tuple
- activation_layout_temporal(discrete_activation: str, continuous_activation: str) Any
- fit(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List, discrete_columns: Optional[List] = None) synthcity.plugins.core.models.tabular_encoder.TimeSeriesTabularEncoder
- fit_temporal(temporal_data: List[pandas.core.frame.DataFrame], observation_times: List, discrete_columns: Optional[List] = None) synthcity.plugins.core.models.tabular_encoder.TimeSeriesTabularEncoder
- fit_transform(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, List]
- fit_transform_temporal(temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, List]
- inverse_transform(static_encoded: pandas.core.frame.DataFrame, temporal_encoded: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame
- inverse_transform_observation_times(observation_times: List) pandas.core.frame.DataFrame
- inverse_transform_static(static_encoded: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
- inverse_transform_temporal(temporal_encoded: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame
- layout() Tuple[List, List]
- n_features() Tuple
- transform(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, List]
- transform_observation_times(observation_times: List) List
- transform_static(static_data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
- transform_temporal(temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, List]