Transformer#
- pydantic model vision_architectures.blocks.transformer.Attention1DMLPConfig[source]#
Bases: CustomBaseModel
JSON schema:

    {
      "title": "Attention1DMLPConfig",
      "type": "object",
      "properties": {
        "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
        "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
        "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
        "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"}
      },
      "required": ["dim"]
    }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
  - field dim: int [Required] – Dimension of the input and output features.
  - field mlp_ratio: int = 4 – Ratio of the hidden dimension in the MLP to the input dimension.
  - field activation: str = 'gelu' – Activation function for the MLP.
  - field mlp_drop_prob: float = 0.0 – Dropout probability for the MLP.
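As a quick orientation, here is a minimal sketch of constructing this config directly. It uses only the fields and defaults listed above and assumes the package is importable under the path shown; the values are illustrative.

```python
from vision_architectures.blocks.transformer import Attention1DMLPConfig

# `dim` is the only required field; every other field falls back to its default.
config = Attention1DMLPConfig(dim=256)
print(config.mlp_ratio, config.activation, config.mlp_drop_prob)  # 4 gelu 0.0

# Defaults can be overridden, and assignments are re-validated because
# validate_assignment is enabled in the model config.
config.mlp_drop_prob = 0.1
```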
- pydantic model vision_architectures.blocks.transformer.Attention3DMLPConfig[source]#
Bases: Attention1DMLPConfig
JSON schema:

    {
      "title": "Attention3DMLPConfig",
      "type": "object",
      "properties": {
        "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
        "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
        "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
        "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"}
      },
      "required": ["dim"]
    }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: no additional fields; all are inherited from Attention1DMLPConfig.
- pydantic model vision_architectures.blocks.transformer.Attention1DWithMLPConfig[source]#
Bases: Attention1DMLPConfig, Attention1DConfig
JSON schema:

    {
      "title": "Attention1DWithMLPConfig",
      "type": "object",
      "properties": {
        "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
        "num_heads": {"description": "Number of query heads", "title": "Num Heads", "type": "integer"},
        "ratio_q_to_kv_heads": {"default": 1, "title": "Ratio Q To Kv Heads", "type": "integer"},
        "logit_scale_learnable": {"default": false, "title": "Logit Scale Learnable", "type": "boolean"},
        "attn_drop_prob": {"default": 0.0, "title": "Attn Drop Prob", "type": "number"},
        "proj_drop_prob": {"default": 0.0, "title": "Proj Drop Prob", "type": "number"},
        "max_attention_batch_size": {"default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference.", "title": "Max Attention Batch Size", "type": "integer"},
        "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
        "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
        "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"},
        "norm_location": {"default": "post", "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": ["pre", "post"], "title": "Norm Location", "type": "string"},
        "layer_norm_eps": {"default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number"}
      },
      "required": ["dim", "num_heads"]
    }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
  - field norm_location: Literal['pre', 'post'] = 'post' – Location of the normalization layer in the attention block. Pre-normalization applies the normalization before the attention operation, while post-normalization applies it after.
  - field layer_norm_eps: float = 1e-06 – Epsilon value for the layer normalization.
- pydantic model vision_architectures.blocks.transformer.Attention3DWithMLPConfig[source]#
Bases: Attention3DMLPConfig, Attention3DConfig
JSON schema:

    {
      "title": "Attention3DWithMLPConfig",
      "type": "object",
      "properties": {
        "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
        "num_heads": {"description": "Number of query heads", "title": "Num Heads", "type": "integer"},
        "ratio_q_to_kv_heads": {"default": 1, "title": "Ratio Q To Kv Heads", "type": "integer"},
        "logit_scale_learnable": {"default": false, "title": "Logit Scale Learnable", "type": "boolean"},
        "attn_drop_prob": {"default": 0.0, "title": "Attn Drop Prob", "type": "number"},
        "proj_drop_prob": {"default": 0.0, "title": "Proj Drop Prob", "type": "number"},
        "max_attention_batch_size": {"default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference.", "title": "Max Attention Batch Size", "type": "integer"},
        "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
        "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
        "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"},
        "norm_location": {"default": "post", "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": ["pre", "post"], "title": "Norm Location", "type": "string"},
        "layer_norm_eps": {"default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number"}
      },
      "required": ["dim", "num_heads"]
    }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
  - field norm_location: Literal['pre', 'post'] = 'post' – Location of the normalization layer in the attention block. Pre-normalization applies the normalization before the attention operation, while post-normalization applies it after.
  - field layer_norm_eps: float = 1e-06 – Epsilon value for the layer normalization.
- class vision_architectures.blocks.transformer.Attention1DMLP(config={}, checkpointing_level=0, **kwargs)[source]#
Bases:
Module
The MLP that is usually used after performing attention. This class is designed for 1D input, e.g. language.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention1DMLP block. Activation checkpointing level 2.
- Parameters:
  - config (Attention1DMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
  - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
  - **kwargs – Additional keyword arguments for configuration.
- class vision_architectures.blocks.transformer.Attention3DMLP(config={}, checkpointing_level=0, **kwargs)[source]#
Bases:
Attention1DMLP
The MLP that is usually used after performing attention. This class is designed for 3D input, e.g. medical images, videos.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention3DMLP block. Activation checkpointing level 2.
- Parameters:
config (
Attention3DMLPConfig
) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.checkpointing_level (
int
) – The level of checkpointing to use for activation checkpointing. Refer toActivationCheckpointing
for more details.**kwargs – Additional keyword arguments for configuration.
- forward(hidden_states, channels_first=True)[source]#
Forward pass of the Attention3DMLP block.
- Parameters:
  - hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
  - channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
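A minimal usage sketch of the block above, based only on the documented signatures; the tensor sizes are arbitrary and PyTorch is assumed to be installed.

```python
import torch
from vision_architectures.blocks.transformer import Attention3DMLP

# The config may be passed as a dict; it is validated into an Attention3DMLPConfig.
mlp = Attention3DMLP(config={"dim": 32, "mlp_ratio": 4}, checkpointing_level=0)

x = torch.randn(2, 32, 8, 8, 8)    # (B, C, Z, Y, X), channels-first input
out = mlp(x, channels_first=True)  # output has the same shape as the input
assert out.shape == x.shape
```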
- class vision_architectures.blocks.transformer.Attention1DWithMLP(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases:
Module
An attention block with an MLP. This class is designed for 1D input, e.g. language.
- __init__(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention1DWithMLP block. Activation checkpointing level 3.
- Parameters:
  - config (Attention1DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
  - relative_position_bias (Union[RelativePositionEmbeddings3D, RelativePositionEmbeddings3DMetaNetwork, None]) – Relative position embeddings for the attention mechanism.
  - logit_scale (Optional[float]) – Optional scaling factor for the attention logits.
  - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
  - **kwargs – Additional keyword arguments for configuration.
- forward(query, key, value)[source]#
Forward pass of the Attention1DWithMLP block.
- Parameters:
  - query (Tensor) – Tensor of shape (B, T, C) representing the input features.
  - key (Tensor) – Tensor of shape (B, T, C) representing the input features.
  - value (Tensor) – Tensor of shape (B, T, C) representing the input features.
- Return type:
Tensor
- Returns:
Tensor of shape (B, T, C) representing the output features.
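A minimal self-attention sketch for this block, assuming PyTorch is installed; `dim` and `num_heads` come from the config schema above, and the same tensor is passed as query, key, and value.

```python
import torch
from vision_architectures.blocks.transformer import Attention1DWithMLP

block = Attention1DWithMLP(config={"dim": 64, "num_heads": 8}, checkpointing_level=0)

tokens = torch.randn(2, 128, 64)     # (B, T, C)
out = block(tokens, tokens, tokens)  # self-attention: query = key = value
assert out.shape == tokens.shape
```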
- class vision_architectures.blocks.transformer.Attention3DWithMLP(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases:
Module
An attention block with an MLP. This class is designed for 3D input, e.g. medical images, videos.
- __init__(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention3DWithMLP block. Activation checkpointing level 3.
- Parameters:
  - config (Attention3DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
  - relative_position_bias (Union[RelativePositionEmbeddings3D, RelativePositionEmbeddings3DMetaNetwork, None]) – Relative position embeddings for the attention mechanism.
  - logit_scale (Optional[float]) – Optional scaling factor for the attention logits.
  - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
  - **kwargs – Additional keyword arguments for configuration.
- forward(query, key, value, channels_first=True)[source]#
Forward pass of the Attention3DWithMLP block.
- Parameters:
  - query (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
  - key (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
  - value (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
  - channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
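A similar sketch for the 3D variant, this time with channels-last input and pre-normalization; the field names come from Attention3DWithMLPConfig above, and the shapes are illustrative only.

```python
import torch
from vision_architectures.blocks.transformer import Attention3DWithMLP

# Pre-normalization variant; norm_location is a field of Attention3DWithMLPConfig.
block = Attention3DWithMLP(
    config={"dim": 48, "num_heads": 6, "norm_location": "pre"},
    checkpointing_level=0,
)

x = torch.randn(1, 8, 8, 8, 48)             # (B, Z, Y, X, C), channels-last input
out = block(x, x, x, channels_first=False)  # self-attention over the volume
assert out.shape == x.shape
```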
- class vision_architectures.blocks.transformer.TransformerEncoderBlock1D(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases:
Attention1DWithMLP
A self-attention transformer block. This class is designed for 1D input, e.g. language.
- forward(qkv, *args, **kwargs)[source]#
Forward pass of the TransformerEncoderBlock1D block. Activation checkpointing level 3.
- Parameters:
  - qkv (Tensor) – Tensor of shape (B, T, C) representing the input features. The same tensor is used for query, key, and value.
- Return type:
Tensor
- Returns:
Tensor of shape (B, T, C) representing the output features.
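A minimal encoder sketch; it constructs the shared Attention1DWithMLPConfig explicitly and relies only on the forward signature documented above. Shapes and values are illustrative.

```python
import torch
from vision_architectures.blocks.transformer import (
    Attention1DWithMLPConfig,
    TransformerEncoderBlock1D,
)

config = Attention1DWithMLPConfig(dim=64, num_heads=8, mlp_ratio=4)
encoder = TransformerEncoderBlock1D(config=config, checkpointing_level=0)

tokens = torch.randn(2, 128, 64)  # (B, T, C)
out = encoder(tokens)             # the single qkv tensor serves as query, key, and value
assert out.shape == tokens.shape
```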
- class vision_architectures.blocks.transformer.TransformerEncoderBlock3D(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases:
Attention3DWithMLP
A self-attention transformer block. This class is designed for 3D input, e.g. medical images, videos.
- forward(qkv, *args, **kwargs)[source]#
Forward pass of the TransformerEncoderBlock3D block. Activation checkpointing level 3.
- Parameters:
  - qkv (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The same tensor is used for query, key, and value.
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
- class vision_architectures.blocks.transformer.TransformerDecoderBlock1D(config={}, checkpointing_level=0, **kwargs)[source]#
Bases:
Module
A cross-attention transformer block. This class is designed for 1D input, e.g. language.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize a TransformerDecoderBlock1D block. Activation checkpointing level 3.
- Parameters:
config (
Attention1DWithMLPConfig
) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.checkpointing_level (
int
) – The level of checkpointing to use for activation checkpointing. Refer toActivationCheckpointing
for more details.**kwargs – Additional keyword arguments for configuration.
- forward(q, kv)[source]#
Forward pass of the TransformerDecoderBlock1D block.
- Parameters:
  - q (Tensor) – The query tensor. Tensor of shape (B, T, C) representing the input features.
  - kv (Tensor) – The key and value tensors. Tensor of shape (B, T, C) representing the input features.
- Return type:
Tensor
- Returns:
Tensor of shape (B, T, C) representing the output features.
- class vision_architectures.blocks.transformer.TransformerDecoderBlock3D(config={}, checkpointing_level=0, **kwargs)[source]#
Bases:
Module
A cross-attention transformer block. This class is designed for 3D input, e.g. medical images, videos.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize a TransformerDecoderBlock3D block. Activation checkpointing level 3.
- Parameters:
  - config (Attention3DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
  - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
  - **kwargs – Additional keyword arguments for configuration.
- forward(q, kv, channels_first=True)[source]#
Forward pass of the TransformerDecoderBlock3D block.
- Parameters:
  - q (Tensor) – The query tensor. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
  - kv (Tensor) – The key and value tensors. Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
  - channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
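Finally, a minimal cross-attention sketch for the 3D decoder block; both feature maps are given the same illustrative shape, and the config dict uses fields from Attention3DWithMLPConfig.

```python
import torch
from vision_architectures.blocks.transformer import TransformerDecoderBlock3D

decoder = TransformerDecoderBlock3D(config={"dim": 32, "num_heads": 4}, checkpointing_level=0)

q = torch.randn(1, 32, 8, 8, 8)   # query features from the decoder path, (B, C, Z, Y, X)
kv = torch.randn(1, 32, 8, 8, 8)  # key/value features, e.g. from an encoder, (B, C, Z, Y, X)
out = decoder(q, kv, channels_first=True)
assert out.shape == q.shape
```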