synthcity.plugins.time_series.plugin_timevae module
- class TimeVAEPlugin(n_iter: int = 1000, decoder_n_layers_hidden: int = 2, decoder_n_units_hidden: int = 150, decoder_nonlin: str = 'leaky_relu', decoder_nonlin_out_discrete: str = 'softmax', decoder_nonlin_out_continuous: str = 'tanh', decoder_batch_norm: bool = False, decoder_dropout: float = 0.01, decoder_residual: bool = True, encoder_n_layers_hidden: int = 3, encoder_n_units_hidden: int = 300, encoder_nonlin: str = 'leaky_relu', encoder_batch_norm: bool = False, encoder_dropout: float = 0.1, lr: float = 0.001, weight_decay: float = 0.001, batch_size: int = 64, n_iter_print: int = 10, clipping_value: int = 0, encoder_max_clusters: int = 20, encoder: Optional[Any] = None, device: Any = device(type='cpu'), mode: str = 'LSTM', gamma_penalty: float = 1, moments_penalty: float = 100, embedding_penalty: float = 10, random_state: int = 0, workspace: pathlib.Path = PosixPath('workspace'), compress_dataset: bool = False, sampling_patience: int = 500, **kwargs: Any)
Bases:
synthcity.plugins.core.plugin.Plugin
Synthetic time series generation using a Variational AutoEncoder.
- Parameters
n_iter – int Maximum number of iterations in the decoder.
n_units_in – int Number of features
decoder_n_layers_hidden – int Number of hidden layers in the decoder
decoder_n_units_hidden – int Number of hidden units in each layer of the decoder
decoder_nonlin – string, default 'leaky_relu' Nonlinearity to use in the decoder. Can be 'elu', 'relu', 'selu' or 'leaky_relu'.
decoder_batch_norm – bool Enable/disable batch norm for the decoder
decoder_dropout – float Dropout value. If 0, the dropout is not used.
decoder_residual – bool Use residuals for the decoder
encoder_n_layers_hidden – int Number of hidden layers in the encoder
encoder_n_units_hidden – int Number of hidden units in each layer of the encoder
encoder_nonlin – string, default 'leaky_relu' Nonlinearity to use in the encoder. Can be 'elu', 'relu', 'selu' or 'leaky_relu'.
encoder_n_iter – int Maximum number of iterations in the encoder.
encoder_batch_norm – bool Enable/disable batch norm for the encoder
encoder_dropout – float Dropout value for the encoder. If 0, the dropout is not used.
lr – float learning rate for optimizer.
weight_decay – float l2 (ridge) penalty for the weights.
batch_size – int Batch size
n_iter_print – int Number of iterations after which to print updates and check the validation loss.
random_state – int random_state used
clipping_value – int, default 0 Gradients clipping value
mode –
str = "LSTM" Core neural net architecture. Available models:
"LSTM"
"GRU"
"RNN"
"Transformer"
"MLSTM_FCN"
"TCN"
"InceptionTime"
"InceptionTimePlus"
"XceptionTime"
"ResCNN"
"OmniScaleCNN"
"XCM"
device – The device used by PyTorch. cpu/cuda
use_horizon_condition – bool. Default = True Whether to condition the covariate generation on the observation times or not.
encoder_max_clusters – int The max number of clusters to create for continuous columns when encoding
encoder – Pre-trained tabular encoder. If None, a new encoder is trained.
Core Plugin arguments:
workspace – Path. Optional Path for caching intermediary results.
compress_dataset – bool. Default = False. Drop redundant features before training the generator.
sampling_patience – int. Max inference iterations to wait for the generated data to match the training schema.
Example
>>> from synthcity.plugins import Plugins
>>> from synthcity.utils.datasets.time_series.google_stocks import GoogleStocksDataloader
>>> from synthcity.plugins.core.dataloader import TimeSeriesDataLoader
>>>
>>> plugin = Plugins().get("timevae")
>>> static, temporal, outcome = GoogleStocksDataloader(as_numpy=True).load()
>>> loader = TimeSeriesDataLoader(
>>>     temporal_data=temporal,
>>>     static_data=static,
>>>     outcome=outcome,
>>> )
>>> plugin.fit(loader)
>>> plugin.generate()
- fit(X: Union[synthcity.plugins.core.dataloader.DataLoader, pandas.core.frame.DataFrame], *args: Any, **kwargs: Any) Any
Training method for the synthetic data plugin.
- Parameters
X – DataLoader. The reference dataset.
cond –
Optional, Union[pd.DataFrame, pd.Series, np.ndarray] Optional training conditional. The training conditional can be used to control the output of some models, like GANs or VAEs. The content can be anything, as long as it maps to the training dataset X. Usage example:
>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from synthcity.plugins.core.dataloader import GenericDataLoader
>>> from synthcity.plugins.core.constraints import Constraints
>>>
>>> # Load in `test_plugin` the generative model of choice
>>> # ....
>>>
>>> X, y = load_iris(as_frame=True, return_X_y=True)
>>> X["target"] = y
>>>
>>> X = GenericDataLoader(X)
>>> test_plugin.fit(X, cond=y)
>>>
>>> count = 10
>>> X_gen = test_plugin.generate(count, cond=np.ones(count))
>>>
>>> # The conditional only optimizes the output generation
>>> # for GANs and VAEs, but does NOT guarantee the samples
>>> # are only from that condition.
>>> # If you want to guarantee that the output contains only
>>> # "target" == 1 samples, use Constraints.
>>>
>>> constraints = Constraints(
>>>     rules=[
>>>         ("target", "==", 1),
>>>     ]
>>> )
>>> X_gen = test_plugin.generate(
>>>     count,
>>>     cond=np.ones(count),
>>>     constraints=constraints,
>>> )
>>> assert (X_gen["target"] == 1).all()
- Returns
self
- classmethod fqdn() str
The Fully-Qualified name of the plugin.
- generate(count: Optional[int] = None, constraints: Optional[synthcity.plugins.core.constraints.Constraints] = None, random_state: Optional[int] = None, **kwargs: Any) synthcity.plugins.core.dataloader.DataLoader
Synthetic data generation method.
- Parameters
count – optional int. The number of samples to generate. If None, it generates len(reference_dataset) samples.
cond – Optional, Union[pd.DataFrame, pd.Series, np.ndarray]. Optional generation conditional. The conditional can be used only if the model was trained with a conditional. If provided, it must have length count. Not all models support conditionals. In VAEs or GANs, conditionals can speed up generation under some constraints. For model-agnostic solutions, check out the constraints parameter.
constraints –
optional Constraints. Optional constraints to apply on the generated data. If none, the reference schema constraints are applied. The constraints are model agnostic, and will filter the output of the generative model. The constraints are a list of rules. Each rule is a tuple of the form (<feature>, <operation>, <value>).
- Valid Operations:
"<", "lt": less than <value>
"<=", "le": less than or equal to <value>
">", "gt": greater than <value>
">=", "ge": greater than or equal to <value>
"==", "eq": equal to <value>
"in": valid for categorical features; <value> must be an array. For example, ("target", "in", [0, 1])
"dtype": <value> can be a data type. For example, ("target", "dtype", "int")
- Usage example:
>>> from synthcity.plugins.core.constraints import Constraints
>>> constraints = Constraints(
>>>     rules=[
>>>         ("InterestingFeature", "==", 0),
>>>     ]
>>> )
>>>
>>> syn_data = syn_model.generate(
>>>     count=count,
>>>     constraints=constraints,
>>> ).dataframe()
>>>
>>> assert (syn_data["InterestingFeature"] == 0).all()
random_state – optional int. Optional random seed to use.
- Returns
<count> synthetic samples
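The rule semantics listed under "Valid Operations" can be sketched as a plain-Python row filter. The helper below is illustrative only, not synthcity's Constraints implementation, and the `dtype` check is a loose simplification:

```python
import operator

# Map the documented operation tokens to Python predicates.
_OPS = {
    "<": operator.lt, "lt": operator.lt,
    "<=": operator.le, "le": operator.le,
    ">": operator.gt, "gt": operator.gt,
    ">=": operator.ge, "ge": operator.ge,
    "==": operator.eq, "eq": operator.eq,
}

def row_matches(row: dict, rules: list) -> bool:
    """Return True if the row satisfies every (feature, op, value) rule."""
    for feature, op, value in rules:
        if op == "in":
            if row[feature] not in value:
                return False
        elif op == "dtype":
            # Loose sketch: compare against the Python type name.
            if type(row[feature]).__name__ != value:
                return False
        elif not _OPS[op](row[feature], value):
            return False
    return True

rows = [{"target": 0, "age": 30}, {"target": 1, "age": 70}]
rules = [("target", "==", 1), ("age", ">=", 18)]
kept = [r for r in rows if row_matches(r, rules)]
```

Because constraints filter the generator's output, sampling continues until `count` matching rows are produced or `sampling_patience` is exhausted.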
- static hyperparameter_space(**kwargs: Any) List[synthcity.plugins.core.distribution.Distribution]
Returns the hyperparameter space for the derived plugin.
- static load(buff: bytes) Any
- static load_dict(representation: dict) Any
- static name() str
The name of the plugin.
- plot(plt: Any, X: synthcity.plugins.core.dataloader.DataLoader, count: Optional[int] = None, plots: list = ['marginal', 'associations', 'tsne'], **kwargs: Any) Any
Plot the real vs. synthetic distributions.
- Parameters
plt – The matplotlib plotting module (or compatible object) used for output.
X – DataLoader. The reference dataset.
- Returns
self
- classmethod sample_hyperparameters(*args: Any, **kwargs: Any) Dict[str, Any]
Sample values from the hyperparameter space for the current plugin.
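Conceptually, each call draws one value per named distribution in the plugin's hyperparameter space. The sketch below uses hypothetical candidate lists in place of synthcity's Distribution classes, purely to illustrate seeded, reproducible sampling:

```python
import random

# Hypothetical candidate values; the real space is defined by the
# Distribution objects returned from hyperparameter_space().
SPACE = {
    "decoder_n_layers_hidden": [1, 2, 3],
    "decoder_nonlin": ["elu", "relu", "selu", "leaky_relu"],
    "lr": [1e-4, 1e-3, 1e-2],
}

def sample_hyperparameters(seed: int = 0) -> dict:
    """Draw one value per named distribution, reproducibly."""
    rng = random.Random(seed)
    return {name: rng.choice(values) for name, values in SPACE.items()}

params = sample_hyperparameters(seed=42)
```

The sampled dict can then be passed as keyword arguments when constructing the plugin, e.g. `Plugins().get("timevae", **params)`.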
- classmethod sample_hyperparameters_optuna(trial: Any, *args: Any, **kwargs: Any) Dict[str, Any]
- save() bytes
- save_dict() dict
- save_to_file(path: pathlib.Path) bytes
- schema() synthcity.plugins.core.schema.Schema
The reference schema
- schema_includes(other: Union[synthcity.plugins.core.dataloader.DataLoader, pandas.core.frame.DataFrame]) bool
Helper method to test if the reference schema includes a Dataset
- Parameters
other – DataLoader. The dataset to test
- Returns
bool, whether the schema includes the dataset.
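A simplified way to think about schema inclusion: every feature of the candidate dataset must be present in the reference schema with a compatible domain. The pure-Python sketch below is illustrative only; synthcity's actual Schema also tracks value ranges and categories:

```python
# Represent a schema as a mapping from feature name to dtype string.
# Illustrative simplification, not synthcity's Schema logic.
def schema_includes(reference: dict, other: dict) -> bool:
    """True if every feature of `other` exists in `reference`
    with the same dtype."""
    return all(
        feature in reference and reference[feature] == dtype
        for feature, dtype in other.items()
    )

reference = {"age": "int", "income": "float", "target": "int"}
subset = {"age": "int", "target": "int"}
mismatched = {"age": "float"}
```

This is the check that `sampling_patience` relies on: generation retries until the synthetic output satisfies the training schema.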
- training_schema() synthcity.plugins.core.schema.Schema
The internal schema
- static type() str
The type of the plugin.
- static version() str
API version
- plugin
alias of synthcity.plugins.time_series.plugin_timevae.TimeVAEPlugin