Transformer#
- vision_architectures.blocks.transformer.TransformerEncoderBlock1DConfig[source]#
alias of Attention1DWithMLPConfig
- vision_architectures.blocks.transformer.TransformerEncoderBlock3DConfig[source]#
alias of Attention3DWithMLPConfig
- vision_architectures.blocks.transformer.TransformerDecoderBlock1DConfig[source]#
alias of Attention1DWithMLPConfig
- vision_architectures.blocks.transformer.TransformerDecoderBlock3DConfig[source]#
alias of Attention3DWithMLPConfig
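Because these names are aliases, they construct the exact same config classes documented below. A minimal sketch (illustrative field values; assumes the package is importable via the dotted paths shown above):

```python
from vision_architectures.blocks.transformer import (
    Attention1DWithMLPConfig,
    TransformerEncoderBlock1DConfig,
)

# The alias refers to the same class object.
assert TransformerEncoderBlock1DConfig is Attention1DWithMLPConfig

# Therefore it accepts the same fields; dim and num_heads are the required ones.
config = TransformerEncoderBlock1DConfig(dim=384, num_heads=6)  # example values
```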
- pydantic model vision_architectures.blocks.transformer.Attention1DMLPConfig[source]#
Bases: CustomBaseModel
JSON schema:
```json
{
  "title": "Attention1DMLPConfig",
  "type": "object",
  "properties": {
    "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
    "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
    "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
    "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"}
  },
  "required": ["dim"]
}
```
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
- field dim: int [Required]# Dimension of the input and output features.
- field mlp_ratio: int = 4# Ratio of the hidden dimension in the MLP to the input dimension.
- field activation: str = 'gelu'# Activation function for the MLP.
- field mlp_drop_prob: float = 0.0# Dropout probability for the MLP.
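A minimal construction sketch (illustrative values): only dim is required; the remaining fields fall back to the defaults shown above and can be overridden at construction time.

```python
from vision_architectures.blocks.transformer import Attention1DMLPConfig

config = Attention1DMLPConfig(dim=256)  # example dimension
print(config.mlp_ratio, config.activation, config.mlp_drop_prob)  # 4 gelu 0.0

# Defaults can be overridden explicitly.
config = Attention1DMLPConfig(dim=256, mlp_ratio=2, mlp_drop_prob=0.1)
```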
- pydantic model vision_architectures.blocks.transformer.Attention3DMLPConfig[source]#
Bases: Attention1DMLPConfig
JSON schema:
```json
{
  "title": "Attention3DMLPConfig",
  "type": "object",
  "properties": {
    "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
    "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
    "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
    "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"}
  },
  "required": ["dim"]
}
```
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: all fields are inherited from Attention1DMLPConfig (see above).
- pydantic model vision_architectures.blocks.transformer.Attention1DWithMLPConfig[source]#
Bases: Attention1DMLPConfig, Attention1DConfig
JSON schema:
```json
{
  "title": "Attention1DWithMLPConfig",
  "type": "object",
  "properties": {
    "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
    "num_heads": {"description": "Number of query heads", "title": "Num Heads", "type": "integer"},
    "ratio_q_to_kv_heads": {"default": 1, "title": "Ratio Q To Kv Heads", "type": "integer"},
    "logit_scale_learnable": {"default": false, "title": "Logit Scale Learnable", "type": "boolean"},
    "attn_drop_prob": {"default": 0.0, "title": "Attn Drop Prob", "type": "number"},
    "proj_drop_prob": {"default": 0.0, "title": "Proj Drop Prob", "type": "number"},
    "max_attention_batch_size": {"default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference.", "title": "Max Attention Batch Size", "type": "integer"},
    "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
    "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
    "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"},
    "norm_location": {"default": "post", "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": ["pre", "post"], "title": "Norm Location", "type": "string"},
    "layer_norm_eps": {"default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number"}
  },
  "required": ["dim", "num_heads"]
}
```
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
- field norm_location: Literal['pre', 'post'] = 'post'# Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.
- field layer_norm_eps: float = 1e-06# Epsilon value for the layer normalization.
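A minimal construction sketch (illustrative values): dim and num_heads are the required fields, and norm_location switches between pre- and post-normalization.

```python
from vision_architectures.blocks.transformer import Attention1DWithMLPConfig

config = Attention1DWithMLPConfig(dim=512, num_heads=8)  # example values
print(config.norm_location, config.layer_norm_eps)       # post 1e-06

# Switch to pre-normalization and a smaller MLP expansion ratio.
config = Attention1DWithMLPConfig(dim=512, num_heads=8, norm_location="pre", mlp_ratio=2)
```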
- pydantic model vision_architectures.blocks.transformer.Attention3DWithMLPConfig[source]#
Bases: Attention3DMLPConfig, Attention3DConfig
JSON schema:
```json
{
  "title": "Attention3DWithMLPConfig",
  "type": "object",
  "properties": {
    "dim": {"description": "Dimension of the input and output features.", "title": "Dim", "type": "integer"},
    "num_heads": {"description": "Number of query heads", "title": "Num Heads", "type": "integer"},
    "ratio_q_to_kv_heads": {"default": 1, "title": "Ratio Q To Kv Heads", "type": "integer"},
    "logit_scale_learnable": {"default": false, "title": "Logit Scale Learnable", "type": "boolean"},
    "attn_drop_prob": {"default": 0.0, "title": "Attn Drop Prob", "type": "number"},
    "proj_drop_prob": {"default": 0.0, "title": "Proj Drop Prob", "type": "number"},
    "max_attention_batch_size": {"default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference.", "title": "Max Attention Batch Size", "type": "integer"},
    "mlp_ratio": {"default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer"},
    "activation": {"default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string"},
    "mlp_drop_prob": {"default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number"},
    "norm_location": {"default": "post", "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": ["pre", "post"], "title": "Norm Location", "type": "string"},
    "layer_norm_eps": {"default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number"}
  },
  "required": ["dim", "num_heads"]
}
```
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields:
- field norm_location: Literal['pre', 'post'] = 'post'# Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.
- field layer_norm_eps: float = 1e-06# Epsilon value for the layer normalization.
- class vision_architectures.blocks.transformer.Attention1DMLP(config={}, checkpointing_level=0, **kwargs)[source]#
Bases: Module
The MLP that is usually used after performing attention. This class is designed for 1D inputs, e.g. language.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention1DMLP block. Activation checkpointing level 2.
- Parameters:
config (Attention1DMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
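A construction-only sketch (the 1D forward signature is not reproduced on this page, so only the documented ways of passing the configuration are shown; the dim value and checkpointing level are illustrative):

```python
from vision_architectures.blocks.transformer import Attention1DMLP, Attention1DMLPConfig

mlp = Attention1DMLP(config=Attention1DMLPConfig(dim=256))  # config instance
mlp = Attention1DMLP(config={"dim": 256})                   # plain dict, converted automatically
mlp = Attention1DMLP(config={"dim": 256}, checkpointing_level=2)  # level passed to ActivationCheckpointing
```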
- class vision_architectures.blocks.transformer.Attention3DMLP(config={}, checkpointing_level=0, **kwargs)[source]#
Bases: Attention1DMLP
The MLP that is usually used after performing attention. This class is designed for 3D inputs, e.g. medical images, videos, etc.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention3DMLP block. Activation checkpointing level 2.
- Parameters:
config (Attention3DMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
- forward(hidden_states, channels_first=True)[source]#
Forward pass of the Attention3DMLP block.
- Parameters:
hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
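A minimal usage sketch (illustrative shapes and dim; assumes the standard nn.Module call convention, so calling the module invokes forward):

```python
import torch

from vision_architectures.blocks.transformer import Attention3DMLP

mlp = Attention3DMLP(config={"dim": 32})  # example feature dimension

x = torch.randn(2, 32, 4, 8, 8)           # (B, C, Z, Y, X), channels-first
y = mlp(x, channels_first=True)           # output has the same shape as the input
assert y.shape == x.shape

x_last = torch.randn(2, 4, 8, 8, 32)      # (B, Z, Y, X, C), channels-last
y_last = mlp(x_last, channels_first=False)
```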
- class vision_architectures.blocks.transformer.Attention1DWithMLP(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases: Module
An attention block with an MLP. This class is designed for 1D inputs, e.g. language.
- __init__(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention1DWithMLP block. Activation checkpointing level 3.
- Parameters:
config (Attention1DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
relative_position_bias (Union[RelativePositionEmbeddings3D, RelativePositionEmbeddings3DMetaNetwork, None]) – Relative position embeddings for the attention mechanism.
logit_scale (Optional[float]) – Optional scaling factor for the attention logits.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
- forward(query, key, value)[source]#
Forward pass of the Attention1DWithMLP block.
- Parameters:
query (Tensor) – Tensor of shape (B, T, C) representing the input features.
key (Tensor) – Tensor of shape (B, T, C) representing the input features.
value (Tensor) – Tensor of shape (B, T, C) representing the input features.
- Return type:
Tensor
- Returns:
Tensor of shape (B, T, C) representing the output features.
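A minimal usage sketch (illustrative values; num_heads is chosen to divide dim, and the cross-attention call assumes the key/value sequence may differ in length from the query, as is usual for attention):

```python
import torch

from vision_architectures.blocks.transformer import Attention1DWithMLP

block = Attention1DWithMLP(config={"dim": 64, "num_heads": 4})  # example values

x = torch.randn(2, 16, 64)       # (B, T, C)
out = block(x, x, x)             # self-attention: query = key = value
assert out.shape == x.shape

memory = torch.randn(2, 24, 64)  # a second sequence with a different length
out = block(x, memory, memory)   # cross-attention over `memory`
```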
- class vision_architectures.blocks.transformer.Attention3DWithMLP(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases: Module
An attention block with an MLP. This class is designed for 3D inputs, e.g. medical images, videos, etc.
- __init__(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Initialize an Attention3DWithMLP block. Activation checkpointing level 3.
- Parameters:
config (Attention3DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
relative_position_bias (Union[RelativePositionEmbeddings3D, RelativePositionEmbeddings3DMetaNetwork, None]) – Relative position embeddings for the attention mechanism.
logit_scale (Optional[float]) – Optional scaling factor for the attention logits.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
- forward(query, key, value, channels_first=True)[source]#
Forward pass of the Attention3DWithMLP block.
- Parameters:
query (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
key (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
value (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
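A minimal usage sketch (illustrative shapes and config values):

```python
import torch

from vision_architectures.blocks.transformer import Attention3DWithMLP

block = Attention3DWithMLP(config={"dim": 32, "num_heads": 4})  # example values

x = torch.randn(1, 32, 4, 8, 8)                 # (B, C, Z, Y, X)
out = block(x, x, x)                            # self-attention; channels_first=True by default
assert out.shape == x.shape

x_last = x.permute(0, 2, 3, 4, 1).contiguous()  # (B, Z, Y, X, C)
out_last = block(x_last, x_last, x_last, channels_first=False)
```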
- class vision_architectures.blocks.transformer.TransformerEncoderBlock1D(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases: Attention1DWithMLP
A self-attention transformer block. This class is designed for 1D inputs, e.g. language.
- forward(qkv=None, *args, q=None, k=None, v=None, **kwargs)[source]#
Forward pass of the TransformerEncoderBlock1D block. Activation checkpointing level 3.
- Parameters:
qkv (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. If provided, the same tensor is used for query, key, and value. Otherwise, q, k, and v are used.
q (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. This is used only if qkv is not provided. This represents queries and is required.
k (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. This is used only if qkv is not provided. This represents keys. If not provided, it is assumed to be the same as q.
v (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. This is used only if qkv is not provided. This represents values. If not provided, it is assumed to be the same as k.
- Return type:
Tensor
- Returns:
Tensor of shape (B, T, C) representing the output features.
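A minimal usage sketch (illustrative values) showing both the qkv shorthand and the explicit q/k/v form described above:

```python
import torch

from vision_architectures.blocks.transformer import TransformerEncoderBlock1D

block = TransformerEncoderBlock1D(config={"dim": 64, "num_heads": 4})  # example values

x = torch.randn(2, 16, 64)  # (B, T, C)

out = block(x)              # qkv: the same tensor is used as query, key, and value
assert out.shape == x.shape

out = block(q=x)            # equivalent explicit form; k and v default to q
```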
- class vision_architectures.blocks.transformer.TransformerEncoderBlock3D(config={}, relative_position_bias=None, logit_scale=None, checkpointing_level=0, **kwargs)[source]#
Bases: Attention3DWithMLP
A self-attention transformer block. This class is designed for 3D inputs, e.g. medical images, videos, etc.
- forward(qkv=None, *args, q=None, k=None, v=None, **kwargs)[source]#
Forward pass of the TransformerEncoderBlock3D block. Activation checkpointing level 3.
- Parameters:
qkv (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. If provided, the same tensor is used for query, key, and value. Otherwise, q, k, and v are used.
q (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. This is used only if qkv is not provided. This represents queries and is required.
k (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. This is used only if qkv is not provided. This represents keys. If not provided, it is assumed to be the same as q.
v (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. This is used only if qkv is not provided. This represents values. If not provided, it is assumed to be the same as k.
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
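A minimal usage sketch (illustrative values; channels-first input as documented above):

```python
import torch

from vision_architectures.blocks.transformer import TransformerEncoderBlock3D

block = TransformerEncoderBlock3D(config={"dim": 32, "num_heads": 4})  # example values

x = torch.randn(1, 32, 4, 8, 8)  # (B, C, Z, Y, X)
out = block(x)                   # self-attention over all positions
assert out.shape == x.shape
```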
- class vision_architectures.blocks.transformer.TransformerDecoderBlock1D(config={}, checkpointing_level=0, **kwargs)[source]#
Bases: Module
A cross-attention transformer block. This class is designed for 1D inputs, e.g. language.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize a TransformerDecoderBlock1D block. Activation checkpointing level 3.
- Parameters:
config (Attention1DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
- forward(q=None, kv=None, *, q1=None, k1=None, v1=None, k2=None, v2=None)[source]#
Forward pass of the TransformerDecoderBlock1D block.
- Parameters:
q (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. The query tensor used for self-attention. Either this or q1 should be provided.
kv (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. The key and value tensor used for cross-attention. Either this or k2 and/or v2 should be provided.
q1 (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. The query tensor used for self-attention. Either this or q should be provided.
k1 (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. The key tensor used for self-attention. If not provided, this defaults to q or q1.
v1 (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. The value tensor used for self-attention. If not provided, this defaults to q or q1.
k2 (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. The key tensor used for cross-attention. Either this or kv should be provided.
v2 (Optional[Tensor]) – Tensor of shape (B, T, C) representing the input features. The value tensor used for cross-attention. If not provided, this defaults to kv or k2.
- Return type:
Tensor
- Returns:
Tensor of shape (B, T, C) representing the output features.
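A minimal usage sketch (illustrative values; assumes the decoder queries and the encoder memory may have different sequence lengths, the usual cross-attention setup):

```python
import torch

from vision_architectures.blocks.transformer import TransformerDecoderBlock1D

block = TransformerDecoderBlock1D(config={"dim": 64, "num_heads": 4})  # example values

tgt = torch.randn(2, 10, 64)     # (B, T, C) decoder input: self-attention queries
memory = torch.randn(2, 20, 64)  # (B, T, C) encoder output: cross-attention keys/values

out = block(q=tgt, kv=memory)    # self-attention on tgt, cross-attention over memory
assert out.shape == tgt.shape
```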
- class vision_architectures.blocks.transformer.TransformerDecoderBlock3D(config={}, checkpointing_level=0, **kwargs)[source]#
Bases: Module
A cross-attention transformer block. This class is designed for 3D inputs, e.g. medical images, videos, etc.
- __init__(config={}, checkpointing_level=0, **kwargs)[source]#
Initialize a TransformerDecoderBlock3D block. Activation checkpointing level 3.
- Parameters:
config (Attention3DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
**kwargs – Additional keyword arguments for configuration.
- forward(q=None, kv=None, *, q1=None, k1=None, v1=None, k2=None, v2=None, channels_first=True)[source]#
Forward pass of the TransformerDecoderBlock3D block.
- Parameters:
q (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The query tensor used for self-attention. Either this or q1 should be provided.
kv (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The key and value tensor used for cross-attention. Either this or k2 and/or v2 should be provided.
q1 (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The query tensor used for self-attention. Either this or q should be provided.
k1 (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The key tensor used for self-attention. If not provided, this defaults to q or q1.
v1 (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The value tensor used for self-attention. If not provided, this defaults to q or q1.
k2 (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The key tensor used for cross-attention. Either this or kv should be provided.
v2 (Optional[Tensor]) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features. The value tensor used for cross-attention. If not provided, this defaults to kv or k2.
channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).
- Return type:
Tensor
- Returns:
Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
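A minimal usage sketch (illustrative values; channels-first inputs as documented above):

```python
import torch

from vision_architectures.blocks.transformer import TransformerDecoderBlock3D

block = TransformerDecoderBlock3D(config={"dim": 32, "num_heads": 4})  # example values

tgt = torch.randn(1, 32, 4, 8, 8)     # (B, C, Z, Y, X) decoder features
memory = torch.randn(1, 32, 4, 8, 8)  # (B, C, Z, Y, X) encoder features

out = block(q=tgt, kv=memory)         # channels_first=True by default
assert out.shape == tgt.shape
```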