Latent Space#
- pydantic model vision_architectures.layers.latent_space.LatentEncoderConfig[source]#
Bases:
CNNBlockConfig
Show JSON schema
{ "title": "LatentEncoderConfig", "type": "object", "properties": { "in_channels": { "description": "Number of input channels", "title": "In Channels", "type": "integer" }, "out_channels": { "description": "Number of output channels", "title": "Out Channels", "type": "integer" }, "kernel_size": { "anyOf": [ { "type": "integer" }, { "items": { "type": "integer" }, "type": "array" } ], "description": "Kernel size for the convolution", "title": "Kernel Size" }, "padding": { "anyOf": [ { "type": "integer" }, { "items": { "type": "integer" }, "type": "array" }, { "type": "string" } ], "default": "same", "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.", "title": "Padding" }, "stride": { "anyOf": [ { "type": "integer" }, { "items": { "type": "integer" }, "type": "array" } ], "default": 1, "description": "Stride for the convolution", "title": "Stride" }, "conv_kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments for the convolution layer", "title": "Conv Kwargs", "type": "object" }, "transposed": { "default": false, "description": "Whether to perform ConvTranspose instead of Conv", "title": "Transposed", "type": "boolean" }, "normalization": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "batchnorm3d", "description": "Normalization layer type.", "title": "Normalization" }, "normalization_pre_args": { "default": [], "description": "Arguments for the normalization layer before providing the dimension. 
Useful when GroupNorm layers are being used, to specify the number of groups.", "items": {}, "title": "Normalization Pre Args", "type": "array" }, "normalization_post_args": { "default": [], "description": "Arguments for the normalization layer after providing the dimension.", "items": {}, "title": "Normalization Post Args", "type": "array" }, "normalization_kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments for the normalization layer", "title": "Normalization Kwargs", "type": "object" }, "activation": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "relu", "description": "Activation function type.", "title": "Activation" }, "activation_kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments for the activation function.", "title": "Activation Kwargs", "type": "object" }, "sequence": { "default": "CNA", "description": "Sequence of operations in the block.", "enum": [ "C", "AC", "CA", "CD", "CN", "DC", "NC", "ACD", "ACN", "ADC", "ANC", "CAD", "CAN", "CDA", "CDN", "CNA", "CND", "DAC", "DCA", "DCN", "DNC", "NAC", "NCA", "NCD", "NDC", "ACDN", "ACND", "ADCN", "ADNC", "ANCD", "ANDC", "CADN", "CAND", "CDAN", "CDNA", "CNAD", "CNDA", "DACN", "DANC", "DCAN", "DCNA", "DNAC", "DNCA", "NACD", "NADC", "NCAD", "NCDA", "NDAC", "NDCA" ], "title": "Sequence", "type": "string" }, "drop_prob": { "default": 0.0, "description": "Dropout probability.", "title": "Drop Prob", "type": "number" }, "init_low_var": { "default": false, "description": "Whether to initialize weights such that output variance is low", "title": "Init Low Var", "type": "boolean" } }, "required": [ "in_channels", "out_channels", "kernel_size" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
init_low_var (bool)
- Validators:
validate_before » all fields
- field init_low_var: bool = False#
Whether to initialize weights such that output variance is low.
- Validated by:
validate_before
- validator validate_before » all fields[source]#
Base class method for validating data before creating the model.
- property dim: int#
- property latent_dim: int#
- pydantic model vision_architectures.layers.latent_space.LatentDecoderConfig[source]#
Bases:
CNNBlockConfig
Show JSON schema
{ "title": "LatentDecoderConfig", "type": "object", "properties": { "in_channels": { "description": "Number of input channels", "title": "In Channels", "type": "integer" }, "out_channels": { "description": "Number of output channels", "title": "Out Channels", "type": "integer" }, "kernel_size": { "anyOf": [ { "type": "integer" }, { "items": { "type": "integer" }, "type": "array" } ], "description": "Kernel size for the convolution", "title": "Kernel Size" }, "padding": { "anyOf": [ { "type": "integer" }, { "items": { "type": "integer" }, "type": "array" }, { "type": "string" } ], "default": "same", "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.", "title": "Padding" }, "stride": { "anyOf": [ { "type": "integer" }, { "items": { "type": "integer" }, "type": "array" } ], "default": 1, "description": "Stride for the convolution", "title": "Stride" }, "conv_kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments for the convolution layer", "title": "Conv Kwargs", "type": "object" }, "transposed": { "default": false, "description": "Whether to perform ConvTranspose instead of Conv", "title": "Transposed", "type": "boolean" }, "normalization": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "batchnorm3d", "description": "Normalization layer type.", "title": "Normalization" }, "normalization_pre_args": { "default": [], "description": "Arguments for the normalization layer before providing the dimension. 
Useful when GroupNorm layers are being used, to specify the number of groups.", "items": {}, "title": "Normalization Pre Args", "type": "array" }, "normalization_post_args": { "default": [], "description": "Arguments for the normalization layer after providing the dimension.", "items": {}, "title": "Normalization Post Args", "type": "array" }, "normalization_kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments for the normalization layer", "title": "Normalization Kwargs", "type": "object" }, "activation": { "anyOf": [ { "type": "string" }, { "type": "null" } ], "default": "relu", "description": "Activation function type.", "title": "Activation" }, "activation_kwargs": { "additionalProperties": true, "default": {}, "description": "Additional keyword arguments for the activation function.", "title": "Activation Kwargs", "type": "object" }, "sequence": { "default": "CNA", "description": "Sequence of operations in the block.", "enum": [ "C", "AC", "CA", "CD", "CN", "DC", "NC", "ACD", "ACN", "ADC", "ANC", "CAD", "CAN", "CDA", "CDN", "CNA", "CND", "DAC", "DCA", "DCN", "DNC", "NAC", "NCA", "NCD", "NDC", "ACDN", "ACND", "ADCN", "ADNC", "ANCD", "ANDC", "CADN", "CAND", "CDAN", "CDNA", "CNAD", "CNDA", "DACN", "DANC", "DCAN", "DCNA", "DNAC", "DNCA", "NACD", "NADC", "NCAD", "NCDA", "NDAC", "NDCA" ], "title": "Sequence", "type": "string" }, "drop_prob": { "default": 0.0, "description": "Dropout probability.", "title": "Drop Prob", "type": "number" } }, "required": [ "in_channels", "out_channels", "kernel_size" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
- Validators:
validate_before » all fields
- validator validate_before » all fields[source]#
Base class method for validating data before creating the model.
- property latent_dim#
- property dim#
- pydantic model vision_architectures.layers.latent_space.GaussianLatentSpaceConfig[source]#
Bases:
CustomBaseModel
Show JSON schema
{ "title": "GaussianLatentSpaceConfig", "type": "object", "properties": {} }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Validators:
- class vision_architectures.layers.latent_space.LatentEncoder3D(config={}, checkpointing_level=0, **kwargs)[source]#
Bases:
Module
Encodes input features into a latent space representation by predicting the mean and standard deviation of the latent space.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initializes the LatentEncoder3D.
- Parameters:
config (LatentEncoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
- forward(x, prior_mu=None, prior_log_var=None, return_log_var=False, max_mu=100.0, max_log_var=10.0, channels_first=True)[source]#
Get the latent space representation of the input by mapping it to the latent dimension and then extracting the mean and standard deviation of the latent space. If a prior distribution is provided, the input is expected to predict the deviation from the prior; if it is not provided, one can think of it as predicting the deviation from a standard normal distribution. The output is the mean and standard deviation of the latent space.
- Parameters:
x (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
prior_mu (Optional[Tensor]) – The mean of the prior distribution. If None, it is assumed to be the mean of a standard normal distribution. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
prior_log_var (Optional[Tensor]) – The log-variance of the prior distribution. If None, it is assumed to be the log-variance of a standard normal distribution. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
return_log_var (bool) – Whether to return the log-variance too.
max_mu (float) – Clamps mu to the minimum and maximum values allowed, i.e. to the range [-max_mu, max_mu].
max_log_var (float) – Clamps the log-variance to the minimum and maximum values allowed, i.e. to the range [-max_log_var, max_log_var]. Defaults to 10.0, which corresponds to a variance from 0.000045 (std=0.006737) to 22026.465 (std=148.413).
channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
- Returns:
z_mu – The mean of the latent space.
z_sigma – The standard deviation of the latent space.
- Return type:
tuple of (z_mu, z_sigma)
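The effect of the max_mu and max_log_var clamps can be sketched per element in plain Python. This is an illustration of the documented formulas only, not the library's internal code; to_mu_sigma and clamp are hypothetical helper names:

```python
import math

def clamp(value, low, high):
    """Restrict a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))

def to_mu_sigma(raw_mu, raw_log_var, max_mu=100.0, max_log_var=10.0):
    """Clamp the predicted mean and log-variance, then derive sigma.

    sigma = exp(0.5 * log_var), so max_log_var=10.0 bounds the variance
    to [exp(-10), exp(10)] ~= [0.000045, 22026.465] and the standard
    deviation to [exp(-5), exp(5)] ~= [0.006737, 148.413].
    """
    z_mu = clamp(raw_mu, -max_mu, max_mu)
    z_log_var = clamp(raw_log_var, -max_log_var, max_log_var)
    z_sigma = math.exp(0.5 * z_log_var)
    return z_mu, z_sigma

# An out-of-range prediction is clamped before sigma is derived.
mu, sigma = to_mu_sigma(250.0, 12.0)
```

The real module performs the same clamping on whole tensors before returning (z_mu, z_sigma).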
- class vision_architectures.layers.latent_space.LatentDecoder3D(config={}, checkpointing_level=0, **kwargs)[source]#
Bases:
Module
Decodes latent space representations back to the original feature space.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initializes the LatentDecoder3D.
- Parameters:
config (LatentDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
- forward(z, channels_first=True)[source]#
Decode latent space representation back to the original feature space.
- Parameters:
z (Tensor) – Sampled vector in the latent space. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
- Returns:
x – The decoded feature tensor. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
- Return type:
Tensor
- class vision_architectures.layers.latent_space.GaussianLatentSpace(config={}, **kwargs)[source]#
Bases:
Module
Implements a Gaussian latent space with sampling and KL divergence computation.
- __init__(config={}, **kwargs)[source]#
Initializes the GaussianLatentSpace.
- Parameters:
config (GaussianLatentSpaceConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
**kwargs – Additional keyword arguments for configuration.
- sample(z_mu, z_sigma, force_sampling=False)[source]#
Samples from the latent space using the reparameterization trick.
- Parameters:
z_mu (Tensor) – The mean of the latent space. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
z_sigma (Tensor) – The standard deviation of the latent space. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
force_sampling (bool) – Whether to force sampling even in evaluation mode.
- Returns:
z_vae – The sampled latent vector. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
- Return type:
Tensor
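The reparameterization trick behind sample can be sketched per element with the standard library. sample_latent is a hypothetical name; the real method operates on tensors:

```python
import random

def sample_latent(z_mu, z_sigma, training=True, force_sampling=False, rng=random):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).

    In evaluation mode the mean is returned directly unless sampling is
    forced, mirroring the documented force_sampling flag. Drawing eps
    outside the mu/sigma computation keeps the sample differentiable
    with respect to mu and sigma.
    """
    if not training and not force_sampling:
        return z_mu
    eps = rng.gauss(0.0, 1.0)  # noise drawn from a standard normal
    return z_mu + z_sigma * eps

# Eval mode without force_sampling returns the deterministic mean.
z_eval = sample_latent(0.5, 2.0, training=False)
```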
- static kl_divergence(z_mu, z_sigma, prior_mu=None, prior_sigma=None, reduction='allsum', channels_first=True)[source]#
Computes the KL divergence between the latent space distribution and a prior distribution.
- Parameters:
z_mu (Tensor) – The mean of the latent space. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
z_sigma (Tensor) – The standard deviation of the latent space. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
prior_mu (Optional[Tensor]) – The mean of the prior distribution. If None, it is assumed to be the mean of a standard normal distribution. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
prior_sigma (Optional[Tensor]) – The standard deviation of the prior distribution. If None, it is assumed to be the standard deviation of a standard normal distribution. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
reduction (Optional[Literal['allsum', 'channelssum']]) – The reduction method to apply to the KL divergence. If "allsum", sums over all dimensions except batch and averages over the batch. If "channelssum", sums over the channel dimension and averages over the batch. If None, no reduction is applied.
channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
- Returns:
kl_div – The KL divergence between the latent space distribution and the prior distribution.
- Return type:
Tensor
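The per-element quantity that the documented reductions then sum and average is the closed-form KL divergence between two univariate Gaussians. A stdlib sketch (illustrative, not the library's tensor implementation):

```python
import math

def kl_divergence(z_mu, z_sigma, prior_mu=0.0, prior_sigma=1.0):
    """Closed-form KL(N(z_mu, z_sigma^2) || N(prior_mu, prior_sigma^2)).

    With the default standard-normal prior this reduces to the familiar
    0.5 * (mu^2 + sigma^2 - 1 - log(sigma^2)) term of the VAE loss.
    """
    return (
        math.log(prior_sigma / z_sigma)
        + (z_sigma**2 + (z_mu - prior_mu) ** 2) / (2.0 * prior_sigma**2)
        - 0.5
    )

# The divergence vanishes when the two distributions coincide.
assert kl_divergence(0.0, 1.0) == 0.0
```

"allsum" would sum this value over every non-batch element and average over the batch; "channelssum" would sum over channels only.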
- forward(z_mu, z_sigma, prior_mu=None, prior_sigma=None, kl_divergence_reduction='allsum', force_sampling=False, channels_first=True)[source]#
Samples from the latent space and computes the KL divergence between the latent space distribution and a prior distribution.
- Parameters:
z_mu (Tensor) – The mean of the latent space. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
z_sigma (Tensor) – The standard deviation of the latent space. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
prior_mu (Optional[Tensor]) – The mean of the prior distribution. If None, it is assumed to be the mean of a standard normal distribution. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
prior_sigma (Optional[Tensor]) – The standard deviation of the prior distribution. If None, it is assumed to be the standard deviation of a standard normal distribution. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C).
kl_divergence_reduction (Optional[Literal['allsum', 'channelssum']]) – The reduction method to apply to the KL divergence. If "allsum", sums over all dimensions except batch and averages over the batch. If "channelssum", sums over the channel dimension and averages over the batch. If None, no reduction is applied.
force_sampling (bool) – Whether to force sampling even in evaluation mode.
channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
- Returns:
z_vae – The sampled latent vector. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
kl_div – The KL divergence between the latent space distribution and the prior distribution.
- Return type:
tuple of (z_vae, kl_div)
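Putting the pieces together, the forward pass (sample via the reparameterization trick, then compute the KL term against the prior) can be sketched per element in plain Python. latent_forward is a hypothetical name; the real method works on tensors and applies the documented reductions:

```python
import math
import random

def latent_forward(z_mu, z_sigma, prior_mu=0.0, prior_sigma=1.0,
                   training=True, force_sampling=False):
    """Sample a latent value and compute its per-element KL divergence."""
    if training or force_sampling:
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
        z_vae = z_mu + z_sigma * random.gauss(0.0, 1.0)
    else:
        z_vae = z_mu  # deterministic in evaluation mode
    # Closed-form KL(N(z_mu, z_sigma^2) || N(prior_mu, prior_sigma^2)).
    kl_div = (
        math.log(prior_sigma / z_sigma)
        + (z_sigma**2 + (z_mu - prior_mu) ** 2) / (2.0 * prior_sigma**2)
        - 0.5
    )
    return z_vae, kl_div

# In eval mode with a standard-normal posterior and prior, both outputs are zero.
z_vae, kl = latent_forward(0.0, 1.0, training=False)
```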