CaiT3D#

pydantic model vision_architectures.nets.cait_3d.CaiTAttentionWithMLPConfig[source]#

Bases: Attention1DConfig, Attention1DMLPConfig

JSON schema:
{
   "title": "CaiTAttentionWithMLPConfig",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            }
         ],
         "description": "Dimension of the input features. If tuple, (dim_qk, dim_v). Otherwise it is assumed to be dim of both qk and v.",
         "title": "Dim"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings1DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings1DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            }
         },
         "title": "RotaryPositionEmbeddings1DConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
  • layer_norm_eps (float)

field layer_norm_eps: float = 1e-06#

Epsilon value for the layer normalization.

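
A minimal construction sketch (field names, defaults, and the required dim and num_heads follow the schema above; the values are illustrative):

from vision_architectures.nets.cait_3d import CaiTAttentionWithMLPConfig

# Only "dim" and "num_heads" are required; everything else falls back to
# the schema defaults (mlp_ratio=4, activation="gelu", dropout probs 0.0).
config = CaiTAttentionWithMLPConfig(dim=768, num_heads=12)

# "dim" may also be a (dim_qk, dim_v) pair when the query/key and value
# dimensions differ.
asymmetric = CaiTAttentionWithMLPConfig(dim=(768, 384), num_heads=12)

Since extra is set to ignore, unrecognized keys are silently dropped rather than raising a validation error.
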
pydantic model vision_architectures.nets.cait_3d.CaiTStage1Config[source]#

Bases: CaiTAttentionWithMLPConfig

JSON schema:
{
   "title": "CaiTStage1Config",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            }
         ],
         "description": "Dimension of the input features. If tuple, (dim_qk, dim_v). Otherwise it is assumed to be dim of both qk and v.",
         "title": "Dim"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings1DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "stage1_depth": {
         "description": "Number of layers in stage 1.",
         "minimum": 0,
         "title": "Stage1 Depth",
         "type": "integer"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings1DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            }
         },
         "title": "RotaryPositionEmbeddings1DConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "stage1_depth"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
  • stage1_depth (int)

field stage1_depth: int [Required]#

Number of layers in stage 1.

Constraints:
  • ge = 0

pydantic model vision_architectures.nets.cait_3d.CaiTStage2Config[source]#

Bases: CaiTAttentionWithMLPConfig

JSON schema:
{
   "title": "CaiTStage2Config",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            }
         ],
         "description": "Dimension of the input features. If tuple, (dim_qk, dim_v). Otherwise it is assumed to be dim of both qk and v.",
         "title": "Dim"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings1DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "num_class_tokens": {
         "default": 1,
         "description": "Number of class tokens to be added in stage 2.",
         "minimum": 0,
         "title": "Num Class Tokens",
         "type": "integer"
      },
      "stage2_depth": {
         "description": "Number of layers in stage 2.",
         "minimum": 0,
         "title": "Stage2 Depth",
         "type": "integer"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings1DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            }
         },
         "title": "RotaryPositionEmbeddings1DConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "stage2_depth"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
  • num_class_tokens (int)
  • stage2_depth (int)

field num_class_tokens: int = 1#

Number of class tokens to be added in stage 2.

Constraints:
  • ge = 0

field stage2_depth: int [Required]#

Number of layers in stage 2.

Constraints:
  • ge = 0

pydantic model vision_architectures.nets.cait_3d.CaiTConfig[source]#

Bases: CaiTStage1Config, CaiTStage2Config

JSON schema:
{
   "title": "CaiTConfig",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            }
         ],
         "description": "Dimension of the input features. If tuple, (dim_qk, dim_v). Otherwise it is assumed to be dim of both qk and v.",
         "title": "Dim"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings1DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "num_class_tokens": {
         "default": 1,
         "description": "Number of class tokens to be added in stage 2.",
         "minimum": 0,
         "title": "Num Class Tokens",
         "type": "integer"
      },
      "stage2_depth": {
         "description": "Number of layers in stage 2.",
         "minimum": 0,
         "title": "Stage2 Depth",
         "type": "integer"
      },
      "stage1_depth": {
         "description": "Number of layers in stage 1.",
         "minimum": 0,
         "title": "Stage1 Depth",
         "type": "integer"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings1DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            }
         },
         "title": "RotaryPositionEmbeddings1DConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "stage2_depth",
      "stage1_depth"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields: (all inherited from the base configs)

Validators:
validator validate  »  all fields[source]#

Base method for validating the model after creation.
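
A minimal construction sketch of the combined config (required fields per the schema above; values are illustrative):

from vision_architectures.nets.cait_3d import CaiTConfig

# CaiTConfig merges the stage 1 and stage 2 configs: dim, num_heads,
# stage1_depth, and stage2_depth are required; num_class_tokens defaults to 1.
config = CaiTConfig(
    dim=384,
    num_heads=6,
    stage1_depth=24,
    stage2_depth=2,
)

Because validate_assignment is enabled, later assignments are re-validated; for example, config.stage1_depth = -1 would fail the ge = 0 constraint.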

class vision_architectures.nets.cait_3d.CaiTAttentionWithMLP(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

Attention layer used in the CaiT 3D model. Introduces learnable gamma scaling of the hidden states after the self-attention and MLP layers. This class is designed for 1D input, e.g. language, patchified images, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the CaiT 3D attention layer.

Parameters:
  • config (CaiTAttentionWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(q, kv)[source]#

Project the input q and kv tensors through the query, key, and value matrices, then pass the results through the CaiT attention layer.

Parameters:
  • q (Tensor) – Tensor of shape (B, T, C) from which the queries are formed.

  • kv (Tensor) – Tensor of shape (B, T, C) from which the keys and values are formed.

Return type:

Tensor

Returns:

Tensor of shape (B, T, C) representing the output features.
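
A minimal usage sketch (shapes per the forward docs above; passing the config as a dict relies on the coercion documented in __init__):

import torch
from vision_architectures.nets.cait_3d import CaiTAttentionWithMLP

layer = CaiTAttentionWithMLP({"dim": 256, "num_heads": 8})

q = torch.randn(2, 64, 256)   # (B, T, C)
kv = torch.randn(2, 64, 256)  # (B, T, C)
out = layer(q, kv)            # (B, T, C), same shape as q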

class vision_architectures.nets.cait_3d.CaiTStage1(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module, PyTorchModelHubMixin

CaiT stage 1. Performs self-attention without class tokens, focusing on learning features among tokens. This class is designed for 1D input, e.g. language, patchified images, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the CaiTStage1.

Parameters:
  • config (CaiTStage1Config) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(embeddings, return_intermediates=False)[source]#

Pass the input embeddings through the CaiT stage 1 layers.

Parameters:
  • embeddings (Tensor) – Tensor of shape (B, T, C) representing the input features.

  • return_intermediates (bool) – Whether to return intermediate outputs such as layer/block/stage outputs.

Return type:

Tensor | tuple

Returns:

Tensor of shape (B, T, C) representing the output features. If return_intermediates is True, returns a tuple of the output embeddings and a list of intermediate layer outputs.
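
A minimal usage sketch (shapes per the forward docs above):

import torch
from vision_architectures.nets.cait_3d import CaiTStage1

stage1 = CaiTStage1({"dim": 256, "num_heads": 8, "stage1_depth": 4})

embeddings = torch.randn(2, 64, 256)  # (B, T, C) patch tokens
out = stage1(embeddings)              # (B, T, C)

# With return_intermediates=True, a (output, intermediates) tuple is
# returned instead.
out, intermediates = stage1(embeddings, return_intermediates=True)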

class vision_architectures.nets.cait_3d.CaiTStage2(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module, PyTorchModelHubMixin

CaiT stage 2. Performs cross-attention between the class tokens and the features learned in stage 1. This class is designed for 1D input, e.g. language, patchified images, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the CaiTStage2.

Parameters:
  • config (CaiTStage2Config) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(class_tokens, embeddings, return_intermediates=False)[source]#

Pass the input embeddings through the CaiT stage 2 layers.

Parameters:
  • class_tokens (Tensor) – Tensor of shape (B, T, C) representing the class tokens.

  • embeddings (Tensor) – Tensor of shape (B, T, C) representing the features learned in stage 1.

  • return_intermediates (bool) – Whether to return intermediate outputs such as layer/block/stage outputs.

Return type:

Tensor | tuple

Returns:

Tensor of shape (B, T, C) representing the output features. If return_intermediates is True, returns a tuple of the output embeddings and a list of intermediate layer outputs.
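
A minimal usage sketch (the class-token sequence length of 1 matches the num_class_tokens default but is an assumption here; the docs above only give a generic (B, T, C) shape):

import torch
from vision_architectures.nets.cait_3d import CaiTStage2

stage2 = CaiTStage2({"dim": 256, "num_heads": 8, "stage2_depth": 2})

class_tokens = torch.randn(2, 1, 256)  # (B, num_class_tokens, C), assumed
embeddings = torch.randn(2, 64, 256)   # (B, T, C) features from stage 1

out = stage2(class_tokens, embeddings)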

class vision_architectures.nets.cait_3d.CaiT1D(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module, PyTorchModelHubMixin

End-to-end CaiT model for classification. This class is designed for 1D input, e.g. language, patchified images, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the CaiT Model.

Parameters:
  • config (CaiTConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(tokens, return_intermediates=False)[source]#

Pass the input embeddings through the CaiT layers. Expects flattened input.

Parameters:
  • tokens (Tensor) – Tensor of shape (B, T, C) representing the input features.

  • return_intermediates (bool) – Whether to return intermediate outputs such as layer/block/stage outputs.

Return type:

Tensor | tuple

Returns:

Tensor of shape (B, T, C) representing the output features. If return_intermediates is True, returns a tuple of the output embeddings and a list of intermediate outputs.
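
An end-to-end sketch (config keys per CaiTConfig above; the input uses the flattened (B, T, C) layout the forward docs require):

import torch
from vision_architectures.nets.cait_3d import CaiT1D

model = CaiT1D({
    "dim": 256,
    "num_heads": 8,
    "stage1_depth": 6,
    "stage2_depth": 2,
})

tokens = torch.randn(2, 64, 256)  # (B, T, C), already flattened
out = model(tokens)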

class vision_architectures.nets.cait_3d.CaiT3D(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: CaiT1D

End-to-end CaiT model for classification. This class is designed for 3D input, e.g. medical images, videos, etc.
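
The forward signature of CaiT3D is not documented above; the sketch below assumes that, like its CaiT1D base, it consumes flattened (B, T, C) tokens, so a volumetric patch grid is flattened explicitly for illustration:

import torch
from vision_architectures.nets.cait_3d import CaiT3D

model = CaiT3D({"dim": 256, "num_heads": 8, "stage1_depth": 6, "stage2_depth": 2})

# Patch embeddings of a 3D volume, e.g. from a patch-embedding stem:
# (B, C, Z, Y, X) -> flatten the spatial axes into a single token axis.
features = torch.randn(2, 256, 4, 8, 8)
tokens = features.flatten(2).transpose(1, 2)  # (B, Z*Y*X, C) = (2, 256, 256)

out = model(tokens)  # assumption: inherited flattened-token interface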