synthcity.plugins.core.models.tabular_encoder module

TabularEncoder module.

class BinEncoder(*args: Any, **kwargs: Any)

Bases: synthcity.plugins.core.models.tabular_encoder.TabularEncoder

Binary encoder (for SurvivalGAN).

Continuous columns are modeled with a Bayesian GMM and normalized to a scalar in [0, 1] plus a vector of cluster indicators. Discrete columns are encoded using a scikit-learn OneHotEncoder.
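
A minimal usage sketch, assuming the default constructor arguments; the DataFrame and column names below are illustrative, not part of the API:

  import pandas as pd

  from synthcity.plugins.core.models.tabular_encoder import BinEncoder

  # Illustrative data: one continuous and one discrete column.
  raw = pd.DataFrame(
      {
          "age": [23, 45, 31, 62, 50, 29],
          "sex": ["M", "F", "F", "M", "F", "M"],
      }
  )

  encoder = BinEncoder()
  encoder.fit(raw, discrete_columns=["sex"])
  encoded = encoder.transform(raw)
  print(encoded.head())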

activation_layout(discrete_activation: str, continuous_activation: str) Sequence[Tuple[str, int]]

Get the layout of the activations.

Returns a list of tuples, describing each column as either:
  • continuous, with length 1 + the number of GMM clusters, or

  • discrete, with length <N>, the length of the one-hot encoding.
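
Continuing the sketch above, an example of the call; the exact tuples returned depend on the fitted GMM components and category counts:

  # The activation names are caller-supplied strings; "tanh"/"softmax" are
  # common choices, but any labels understood downstream can be passed.
  act_layout = encoder.activation_layout(
      discrete_activation="softmax",
      continuous_activation="tanh",
  )
  print(act_layout)  # sequence of (activation, length) pairs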

cat_encoder_params: dict = {}
categorical_encoder: Union[str, type] = 'passthrough'
cont_encoder_params: dict = {'n_components': 2}
continuous_encoder: Union[str, type] = 'bayesian_gmm'
fit(raw_data: pandas.core.frame.DataFrame, discrete_columns: Optional[List] = None) Any

Fit the TabularEncoder.

This step also computes the number of columns in the encoded matrix and the span information for each feature.
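
If discrete_columns is omitted, the encoder is expected to determine the discrete columns itself; a sketch, continuing the example above:

  encoder = BinEncoder()
  encoder.fit(raw)                                            # discrete columns detected from the data
  encoder = BinEncoder().fit(raw, discrete_columns=["sex"])   # or stated explicitly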

get_column_info(name: str) synthcity.plugins.core.models.tabular_encoder.FeatureInfo
inverse_transform(data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Take encoded matrix data and output raw data.

The output uses the same column types as the input to the transform function.
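
A round-trip sketch, continuing the example above; continuous values are reconstructed from their GMM representation, so they may differ slightly from the originals:

  encoded = encoder.transform(raw)
  decoded = encoder.inverse_transform(encoded)
  # The decoded frame is expected to match the original schema.
  print(list(raw.columns), "==", list(decoded.columns))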

layout() Sequence[synthcity.plugins.core.models.tabular_encoder.FeatureInfo]

Get the layout of the encoded dataset.

Returns a list of tuples, describing each column as either:
  • continuous, with length 1 + the number of GMM clusters, or

  • discrete, with length <N>, the length of the one-hot encoding.
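
For example, the per-feature layout of a fitted encoder can be inspected through the returned FeatureInfo objects (continuing the sketch above):

  for info in encoder.layout():
      print(info.name, info.feature_type, info.output_dimensions)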

n_features() int
transform(raw_data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Take raw data and output the encoded matrix data.
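
Continuing the sketch, the encoded frame is wider than the raw one, since each feature expands into its encoded block:

  encoded = encoder.transform(raw)
  print(raw.shape, "->", encoded.shape)
  print(encoded.columns.tolist())  # names of the transformed features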

class FeatureInfo(*, name: str, feature_type: str, transform: Any = None, output_dimensions: int, transformed_features: List[str], trans_feature_types: List[str])

Bases: pydantic.main.BaseModel
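
FeatureInfo instances are normally produced by a fitted encoder rather than constructed by hand; a sketch of reading one, continuing the BinEncoder example above (the column name is illustrative):

  info = encoder.get_column_info("age")
  print(info.name)                  # original column name
  print(info.feature_type)          # e.g. "continuous" or "discrete"
  print(info.output_dimensions)     # width of this feature's encoded block
  print(info.transformed_features)  # names of the encoded columns
  print(info.trans_feature_types)   # per-encoded-column feature types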

Config

alias of pydantic.config.BaseConfig

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = 'allow' was set, since it adds all passed values.

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model

Duplicate a model, optionally choosing which fields to include, exclude, and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from the new model; as with values, this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model; you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) DictStrAny

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

feature_type: str
classmethod from_orm(obj: Any) Model
json(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, by_alias: bool = False, skip_defaults: Optional[bool] = None, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False, encoder: Optional[Callable[[Any], Any]] = None, models_as_dict: bool = True, **dumps_kwargs: Any) unicode

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(); other arguments are as per json.dumps().

name: str
output_dimensions: int
classmethod parse_file(path: Union[str, pathlib.Path], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: pydantic.parse.Protocol = None, allow_pickle: bool = False) Model
classmethod parse_obj(obj: Any) Model
classmethod parse_raw(b: Union[str, bytes], *, content_type: unicode = None, encoding: unicode = 'utf8', proto: pydantic.parse.Protocol = None, allow_pickle: bool = False) Model
classmethod schema(by_alias: bool = True, ref_template: unicode = '#/definitions/{model}') DictStrAny
classmethod schema_json(*, by_alias: bool = True, ref_template: unicode = '#/definitions/{model}', **dumps_kwargs: Any) unicode
trans_feature_types: List[str]
transform: Any
transformed_features: List[str]
classmethod update_forward_refs(**localns: Any) None

Try to update ForwardRefs on fields based on this Model, globalns and localns.

classmethod validate(value: Any) Model
class TabularEncoder(*args: Any, **kwargs: Any)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Tabular encoder.

Continuous columns are modeled with a Bayesian GMM and normalized to a scalar in [0, 1] plus a vector of cluster indicators. Discrete columns are encoded using a scikit-learn OneHotEncoder.
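
An end-to-end sketch with the default settings; the data and column names are illustrative:

  import pandas as pd

  from synthcity.plugins.core.models.tabular_encoder import TabularEncoder

  raw = pd.DataFrame(
      {
          "age": [23, 45, 31, 62, 50, 29],
          "bmi": [21.4, 30.1, 25.3, 27.8, 24.9, 23.0],
          "smoker": ["yes", "no", "no", "yes", "no", "no"],
      }
  )

  encoder = TabularEncoder()
  encoder.fit(raw, discrete_columns=["smoker"])

  encoded = encoder.transform(raw)              # wide, fully numeric representation
  decoded = encoder.inverse_transform(encoded)  # back to the original schema

  print(encoder.n_features(), encoded.shape)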

activation_layout(discrete_activation: str, continuous_activation: str) Sequence[Tuple[str, int]]

Get the layout of the activations.

Returns a list of tuples, describing each column as either:
  • continuous, with length 1 + the number of GMM clusters, or

  • discrete, with length <N>, the length of the one-hot encoding.
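
The (activation, length) pairs can be used to build the output head of a generator network; a sketch that only groups the encoded column indices per activation, continuing the example above:

  act_layout = encoder.activation_layout(
      discrete_activation="softmax",
      continuous_activation="tanh",
  )

  start = 0
  for activation, length in act_layout:
      print(f"encoded columns [{start}:{start + length}) -> {activation}")
      start += length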

cat_encoder_params: dict = {'handle_unknown': 'ignore', 'sparse_output': False}
categorical_encoder: Union[str, type] = 'onehot'
cont_encoder_params: dict = {'n_components': 10}
continuous_encoder: Union[str, type] = 'bayesian_gmm'
fit(raw_data: pandas.core.frame.DataFrame, discrete_columns: Optional[List] = None) Any

Fit the TabularEncoder.

This step also computes the number of columns in the encoded matrix and the span information for each feature.

get_column_info(name: str) synthcity.plugins.core.models.tabular_encoder.FeatureInfo
inverse_transform(data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Take encoded matrix data and output raw data.

The output uses the same column types as the input to the transform function.

layout() Sequence[synthcity.plugins.core.models.tabular_encoder.FeatureInfo]

Get the layout of the encoded dataset.

Returns a list of tuples, describing each column as either:
  • continuous, with length 1 + the number of GMM clusters, or

  • discrete, with length <N>, the length of the one-hot encoding.

n_features() int
transform(raw_data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame

Take raw data and output the encoded matrix data.

class TimeSeriesBinEncoder(*args: Any, **kwargs: Any)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

Time series Bin encoder.

Continuous columns are modeled with a Bayesian GMM and normalized to a scalar in [0, 1] plus a vector of cluster indicators. Discrete columns are encoded using a scikit-learn OneHotEncoder.
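
A usage sketch, assuming the default constructor; the static frame, temporal frames, and observation times below are illustrative:

  import pandas as pd

  from synthcity.plugins.core.models.tabular_encoder import TimeSeriesBinEncoder

  static = pd.DataFrame({"sex": ["M", "F"], "age": [60, 42]})
  temporal = [
      pd.DataFrame({"heart_rate": [80, 82, 79]}),
      pd.DataFrame({"heart_rate": [65, 70, 68]}),
  ]
  observation_times = [[0, 1, 2], [0, 1, 2]]

  enc = TimeSeriesBinEncoder()
  bins = enc.fit_transform(static, temporal, observation_times)
  print(bins.shape)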

fit(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List, discrete_columns: Optional[List] = None) synthcity.plugins.core.models.tabular_encoder.TimeSeriesBinEncoder

Fit the TimeSeriesBinEncoder.

fit_transform(static: pandas.core.frame.DataFrame, temporal: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame
transform(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame

Take raw data and output the encoded matrix data.
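
Fitting and transforming can also be separated, e.g. to reuse a fitted encoder on further sequences; a sketch continuing the example above:

  enc = TimeSeriesBinEncoder().fit(static, temporal, observation_times)
  bins = enc.transform(static, temporal, observation_times)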

class TimeSeriesTabularEncoder(*args: Any, **kwargs: Any)

Bases: sklearn.base.TransformerMixin, sklearn.base.BaseEstimator

TimeSeries Tabular encoder.

Continuous columns are modeled with a Bayesian GMM and normalized to a scalar in [0, 1] plus a vector of cluster indicators. Discrete columns are encoded using a scikit-learn OneHotEncoder.
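
A usage sketch with illustrative data; the encoder returns separate encodings for the static frame, the temporal frames, and the observation times:

  import pandas as pd

  from synthcity.plugins.core.models.tabular_encoder import TimeSeriesTabularEncoder

  static = pd.DataFrame({"sex": ["M", "F"], "age": [60, 42]})
  temporal = [
      pd.DataFrame({"heart_rate": [80, 82, 79], "on_meds": [0, 0, 1]}),
      pd.DataFrame({"heart_rate": [65, 70, 68], "on_meds": [1, 1, 1]}),
  ]
  observation_times = [[0.0, 1.0, 2.0], [0.0, 0.5, 1.5]]

  enc = TimeSeriesTabularEncoder()
  static_enc, temporal_enc, times_enc = enc.fit_transform(
      static, temporal, observation_times
  )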

activation_layout(discrete_activation: str, continuous_activation: str) Tuple
activation_layout_temporal(discrete_activation: str, continuous_activation: str) Any
fit(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List, discrete_columns: Optional[List] = None) synthcity.plugins.core.models.tabular_encoder.TimeSeriesTabularEncoder
fit_temporal(temporal_data: List[pandas.core.frame.DataFrame], observation_times: List, discrete_columns: Optional[List] = None) synthcity.plugins.core.models.tabular_encoder.TimeSeriesTabularEncoder
fit_transform(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, List]
fit_transform_temporal(temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, List]
inverse_transform(static_encoded: pandas.core.frame.DataFrame, temporal_encoded: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame
inverse_transform_observation_times(observation_times: List) pandas.core.frame.DataFrame
inverse_transform_static(static_encoded: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
inverse_transform_temporal(temporal_encoded: List[pandas.core.frame.DataFrame], observation_times: List) pandas.core.frame.DataFrame
layout() Tuple[List, List]
n_features() Tuple
transform(static_data: pandas.core.frame.DataFrame, temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, List]
transform_observation_times(observation_times: List) List
transform_static(static_data: pandas.core.frame.DataFrame) pandas.core.frame.DataFrame
transform_temporal(temporal_data: List[pandas.core.frame.DataFrame], observation_times: List) Tuple[pandas.core.frame.DataFrame, List]
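
Continuing the sketch above, the static and temporal parts can also be encoded independently once the encoder is fitted:

  static_only = enc.transform_static(static)
  temporal_only, times_only = enc.transform_temporal(temporal, observation_times)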