synthcity.plugins.core.models.tabular_aim module

class TabularAIM(X: pandas.core.frame.DataFrame, epsilon: float = 1.0, delta: float = 1e-09, max_model_size: int = 80, degree: int = 2, num_marginals: Optional[int] = None, max_cells: int = 1000, encoder_max_clusters: int = 20, encoder_whitelist: list = [], device: Union[str, torch.device] = device(type='cpu'), learning_rate: float = 0.005, weight_decay: float = 0.001, logging_epoch: int = 100, random_state: int = 0, **kwargs: Any)

Bases: object


Adaptive and Iterative Mechanism (AIM) implementation, based on:
Parameters
  • X (pd.DataFrame) – Reference dataset, used for training the tabular encoder

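  • AIM parameters – Not documented individually; see the class signature above for names, types, and defaults.

  • Core plugin arguments – Listed in the entries that follow.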

  • encoder_max_clusters (int = 20) – The max number of clusters to create for continuous columns when encoding with TabularEncoder. Defaults to 20.

  • encoder_whitelist (list = []) – Ignore columns from encoding with TabularEncoder. Defaults to [].

  • device (Union[str, torch.device] = DEVICE) – Not used for this model, as it is built with sklearn, which is CPU-only.

  • random_state (int, optional) – Not used for this model. Defaults to 0.

  • **kwargs (Any) – Keyword arguments passed to a scikit-learn RandomForestClassifier (https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html).

fit(X: pandas.core.frame.DataFrame, **kwargs: Any) → Any
Parameters

X (pd.DataFrame) – Pandas DataFrame that contains the tabular data

Returns

AIMTrainer used for the fine-tuning process

generate(count: int, start_col: Optional[str] = '', start_col_dist: Optional[Union[dict, list]] = None, temperature: float = 0.7, k: int = 100, max_length: int = 100) → pandas.core.frame.DataFrame

Generates tabular data using the trained AIM model.

Parameters

count (int) – The number of samples to generate

Returns

count rows of generated data

Return type

pd.DataFrame
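
A minimal usage sketch, using only the constructor, fit, and generate signatures documented above. The toy DataFrame and the helper names make_toy_frame and fit_and_sample are hypothetical and exist only for illustration. The synthcity import sits inside the helper, so the data-building part runs even where synthcity is not installed.

```python
import numpy as np
import pandas as pd


def make_toy_frame(n_rows: int = 200, seed: int = 0) -> pd.DataFrame:
    """Build a small mixed-type DataFrame (hypothetical data, illustration only)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "age": rng.integers(18, 90, size=n_rows),
            "income": rng.normal(50_000.0, 15_000.0, size=n_rows),
            "smoker": rng.choice(["yes", "no"], size=n_rows),
        }
    )


def fit_and_sample(X: pd.DataFrame, count: int) -> pd.DataFrame:
    """Train TabularAIM on X and draw `count` synthetic rows."""
    # Imported lazily so make_toy_frame works without synthcity installed.
    from synthcity.plugins.core.models.tabular_aim import TabularAIM

    # Values below are the documented defaults, written out for visibility.
    model = TabularAIM(X, epsilon=1.0, delta=1e-9, degree=2)
    model.fit(X)
    return model.generate(count=count)


X = make_toy_frame()
# Uncomment to run end to end (requires synthcity):
# synthetic = fit_and_sample(X, count=50)
```

Calling fit_and_sample(X, count=50) is expected to return a pd.DataFrame, per the return type documented for generate.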