MaxViT3D#

pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DStem0Config[source]#

Bases: CNNBlockConfig

Show JSON schema
{
   "title": "MaxViT3DStem0Config",
   "type": "object",
   "properties": {
      "in_channels": {
         "description": "Number of input channels",
         "title": "In Channels",
         "type": "integer"
      },
      "out_channels": {
         "default": null,
         "description": "This is defined by dim",
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": 3,
         "description": "Kernel size for the convolutional layers in the stem",
         "title": "Kernel Size",
         "type": "integer"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
         "title": "Padding"
      },
      "stride": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": 1,
         "description": "Stride for the convolution",
         "title": "Stride"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the convolution layer",
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "batchnorm3d",
         "description": "Normalization layer type.",
         "title": "Normalization"
      },
      "normalization_pre_args": {
         "default": [],
         "description": "Arguments for the normalization layer before providing the dimension. Useful for specifying the number of groups when using GroupNorm.",
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "description": "Arguments for the normalization layer after providing the dimension.",
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the normalization layer",
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "relu",
         "description": "Activation function type.",
         "title": "Activation"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the activation function.",
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "description": "Sequence of operations in the block.",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "description": "Dropout probability.",
         "title": "Drop Prob",
         "type": "number"
      },
      "dim": {
         "description": "Hidden dimension of the stem",
         "title": "Dim",
         "type": "integer"
      },
      "depth": {
         "default": 2,
         "description": "Number of convolutional layers in the stem",
         "minimum": 1,
         "title": "Depth",
         "type": "integer"
      }
   },
   "required": [
      "in_channels",
      "dim"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:

field in_channels: int [Required]#

Number of input channels

Validated by:
field kernel_size: int = 3#

Kernel size for the convolutional layers in the stem

Validated by:
field dim: int [Required]#

Hidden dimension of the stem

Validated by:
field depth: int = 2#

Number of convolutional layers in the stem

Constraints:
  • ge = 1

Validated by:
field out_channels: None = None#

This is defined by dim

Validated by:
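Per the schema above, only `in_channels` and `dim` are required for `MaxViT3DStem0Config`; the remaining fields carry defaults (`kernel_size=3`, `depth=2` with `depth >= 1`). The real class is a pydantic model, so the stdlib-only dataclass below is purely an illustrative sketch of that field contract, not the actual implementation:

```python
from dataclasses import dataclass


@dataclass
class Stem0ConfigSketch:
    """Illustrative stand-in for MaxViT3DStem0Config's field contract."""

    in_channels: int       # required: number of input channels
    dim: int               # required: hidden dimension of the stem
    kernel_size: int = 3   # kernel size for the stem's conv layers
    depth: int = 2         # number of conv layers; schema says minimum 1

    def __post_init__(self) -> None:
        # Mirrors the schema constraint "minimum": 1 on depth
        if self.depth < 1:
            raise ValueError("depth must be >= 1")


# Minimal valid configuration: only the two required fields
cfg = Stem0ConfigSketch(in_channels=1, dim=64)
```

The pydantic model additionally validates on assignment (`validate_assignment: True`), which a plain dataclass does not.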
pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DBlockConfig[source]#

Bases: MBConv3DConfig, Attention3DWithMLPConfig

Show JSON schema
{
   "title": "MaxViT3DBlockConfig",
   "type": "object",
   "properties": {
      "dim": {
         "description": "Input channel dimension of the block.",
         "title": "Dim",
         "type": "integer"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs, along the batch dimension, into chunks of this size; 0 means no chunking. Useful for large inputs during inference.",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "relu",
         "description": "Activation function to use in the block.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "norm_location": {
         "default": "post",
         "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "in_channels": {
         "default": null,
         "description": "Use dim instead",
         "title": "In Channels",
         "type": "null"
      },
      "out_channels": {
         "default": null,
         "description": "Use expansion_ratio instead",
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": 3,
         "description": "Kernel size for the convolutional layers.",
         "title": "Kernel Size",
         "type": "integer"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
         "title": "Padding"
      },
      "stride": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": 1,
         "description": "Stride for the convolution",
         "title": "Stride"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the convolution layer",
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "default": "batchnorm3d",
         "description": "Normalization layer to use in the block.",
         "title": "Normalization",
         "type": "string"
      },
      "normalization_pre_args": {
         "default": [],
         "description": "Arguments for the normalization layer before providing the dimension. Useful for specifying the number of groups when using GroupNorm.",
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "description": "Arguments for the normalization layer after providing the dimension.",
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the normalization layer",
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the activation function.",
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "description": "Sequence of operations in the block.",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "description": "Dropout probability.",
         "title": "Drop Prob",
         "type": "number"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Output channel dimension of the block. If None, it will be set to `dim`.",
         "title": "Out Dim"
      },
      "expansion_ratio": {
         "default": 6.0,
         "description": "Expansion ratio for the block.",
         "title": "Expansion Ratio",
         "type": "number"
      },
      "se_reduction_ratio": {
         "default": 4.0,
         "description": "Squeeze-and-excitation reduction ratio.",
         "title": "Se Reduction Ratio",
         "type": "number"
      },
      "window_size": {
         "description": "Size of the window to apply attention over",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "out_dim_ratio": {
         "default": 2,
         "description": "Ratio of the output dimension to the input dimension. Used only in the last block of stems",
         "title": "Out Dim Ratio",
         "type": "integer"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:

field window_size: tuple[int, int, int] [Required]#

Size of the window to apply attention over

Validated by:
field out_dim_ratio: int = 2#

Ratio of the output dimension to the input dimension. Used only in the last block of stems

Validated by:
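For `MaxViT3DBlockConfig`, the schema requires `dim`, `num_heads`, and a 3-tuple `window_size`; `out_dim` defaults to `None` and, per its description, falls back to `dim`. A hedged stdlib-only sketch of those key fields (the real class is a pydantic model inheriting MBConv3DConfig and Attention3DWithMLPConfig):

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BlockConfigSketch:
    """Illustrative stand-in for MaxViT3DBlockConfig's key fields."""

    dim: int                           # required: input channel dimension
    num_heads: int                     # required: number of query heads
    window_size: Tuple[int, int, int]  # required: attention window (3 ints)
    out_dim: Optional[int] = None      # schema: falls back to `dim` if None
    expansion_ratio: float = 6.0       # MBConv expansion ratio
    se_reduction_ratio: float = 4.0    # squeeze-and-excitation reduction

    @property
    def resolved_out_dim(self) -> int:
        # Schema: "If None, it will be set to `dim`."
        return self.dim if self.out_dim is None else self.out_dim


block = BlockConfigSketch(dim=96, num_heads=4, window_size=(2, 4, 4))
```

`resolved_out_dim` is a hypothetical helper name used here only to make the `out_dim -> dim` fallback explicit.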
pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DStemConfig[source]#

Bases: MaxViT3DBlockConfig

Show JSON schema
{
   "title": "MaxViT3DStemConfig",
   "type": "object",
   "properties": {
      "dim": {
         "description": "Input channel dimension of the block.",
         "title": "Dim",
         "type": "integer"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs, along the batch dimension, into chunks of this size; 0 means no chunking. Useful for large inputs during inference.",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "relu",
         "description": "Activation function to use in the block.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "norm_location": {
         "default": "post",
         "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "in_channels": {
         "default": null,
         "description": "Use dim instead",
         "title": "In Channels",
         "type": "null"
      },
      "out_channels": {
         "default": null,
         "description": "Use expansion_ratio instead",
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": 3,
         "description": "Kernel size for the convolutional layers.",
         "title": "Kernel Size",
         "type": "integer"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
         "title": "Padding"
      },
      "stride": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": 1,
         "description": "Stride for the convolution",
         "title": "Stride"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the convolution layer",
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "default": "batchnorm3d",
         "description": "Normalization layer to use in the block.",
         "title": "Normalization",
         "type": "string"
      },
      "normalization_pre_args": {
         "default": [],
         "description": "Arguments for the normalization layer before providing the dimension. Useful for specifying the number of groups when using GroupNorm.",
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "description": "Arguments for the normalization layer after providing the dimension.",
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the normalization layer",
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the activation function.",
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "description": "Sequence of operations in the block.",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "description": "Dropout probability.",
         "title": "Drop Prob",
         "type": "number"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Output channel dimension of the block. If None, it will be set to `dim`.",
         "title": "Out Dim"
      },
      "expansion_ratio": {
         "default": 6.0,
         "description": "Expansion ratio for the block.",
         "title": "Expansion Ratio",
         "type": "number"
      },
      "se_reduction_ratio": {
         "default": 4.0,
         "description": "Squeeze-and-excitation reduction ratio.",
         "title": "Se Reduction Ratio",
         "type": "number"
      },
      "window_size": {
         "description": "Size of the window to apply attention over",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "out_dim_ratio": {
         "default": 2,
         "description": "Ratio of the output dimension to the input dimension. Used only in the last block of stems",
         "title": "Out Dim Ratio",
         "type": "integer"
      },
      "depth": {
         "description": "Number of blocks in the stem",
         "title": "Depth",
         "type": "integer"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size",
      "depth"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:

field depth: int [Required]#

Number of blocks in the stem

Validated by:
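`MaxViT3DStemConfig` extends the block config with a required `depth`, and the `out_dim_ratio` field docs say the ratio is "used only in the last block of stems". Under that reading, the per-block output dimensions of a stem would look like the sketch below (the helper name is hypothetical; this is an interpretation of the field docs, not the library's code):

```python
from typing import List


def stem_block_out_dims(dim: int, depth: int, out_dim_ratio: int = 2) -> List[int]:
    """Per-block output dims for a stem, assuming out_dim_ratio
    applies only to the final block (as the field docs suggest)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # All blocks keep `dim` except the last, which expands by out_dim_ratio
    return [dim] * (depth - 1) + [dim * out_dim_ratio]


dims = stem_block_out_dims(dim=96, depth=3)  # -> [96, 96, 192]
```

With the default `out_dim_ratio=2`, each stem doubles the channel dimension at its final block, which is the usual stage-to-stage widening pattern in hierarchical encoders.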
pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DEncoderConfig[source]#

Bases: CustomBaseModel

Show JSON schema
{
   "title": "MaxViT3DEncoderConfig",
   "type": "object",
   "properties": {
      "stem0": {
         "$ref": "#/$defs/MaxViT3DStem0Config",
         "description": "Configuration for the stem0"
      },
      "stems": {
         "description": "Configurations for the remaining stems",
         "items": {
            "$ref": "#/$defs/MaxViT3DStemConfig"
         },
         "title": "Stems",
         "type": "array"
      }
   },
   "$defs": {
      "MaxViT3DStem0Config": {
         "properties": {
            "in_channels": {
               "description": "Number of input channels",
               "title": "In Channels",
               "type": "integer"
            },
            "out_channels": {
               "default": null,
               "description": "This is defined by dim",
               "title": "Out Channels",
               "type": "null"
            },
            "kernel_size": {
               "default": 3,
               "description": "Kernel size for the convolutional layers in the stem",
               "title": "Kernel Size",
               "type": "integer"
            },
            "padding": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "same",
               "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
               "title": "Padding"
            },
            "stride": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": 1,
               "description": "Stride for the convolution",
               "title": "Stride"
            },
            "conv_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the convolution layer",
               "title": "Conv Kwargs",
               "type": "object"
            },
            "transposed": {
               "default": false,
               "description": "Whether to perform ConvTranspose instead of Conv",
               "title": "Transposed",
               "type": "boolean"
            },
            "normalization": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "batchnorm3d",
               "description": "Normalization layer type.",
               "title": "Normalization"
            },
            "normalization_pre_args": {
               "default": [],
               "description": "Arguments for the normalization layer before providing the dimension. Useful when using GroupNorm layers are being used to specify the number of groups.",
               "items": {},
               "title": "Normalization Pre Args",
               "type": "array"
            },
            "normalization_post_args": {
               "default": [],
               "description": "Arguments for the normalization layer after providing the dimension.",
               "items": {},
               "title": "Normalization Post Args",
               "type": "array"
            },
            "normalization_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the normalization layer",
               "title": "Normalization Kwargs",
               "type": "object"
            },
            "activation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "relu",
               "description": "Activation function type.",
               "title": "Activation"
            },
            "activation_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the activation function.",
               "title": "Activation Kwargs",
               "type": "object"
            },
            "sequence": {
               "default": "CNA",
               "description": "Sequence of operations in the block.",
               "enum": [
                  "C",
                  "AC",
                  "CA",
                  "CD",
                  "CN",
                  "DC",
                  "NC",
                  "ACD",
                  "ACN",
                  "ADC",
                  "ANC",
                  "CAD",
                  "CAN",
                  "CDA",
                  "CDN",
                  "CNA",
                  "CND",
                  "DAC",
                  "DCA",
                  "DCN",
                  "DNC",
                  "NAC",
                  "NCA",
                  "NCD",
                  "NDC",
                  "ACDN",
                  "ACND",
                  "ADCN",
                  "ADNC",
                  "ANCD",
                  "ANDC",
                  "CADN",
                  "CAND",
                  "CDAN",
                  "CDNA",
                  "CNAD",
                  "CNDA",
                  "DACN",
                  "DANC",
                  "DCAN",
                  "DCNA",
                  "DNAC",
                  "DNCA",
                  "NACD",
                  "NADC",
                  "NCAD",
                  "NCDA",
                  "NDAC",
                  "NDCA"
               ],
               "title": "Sequence",
               "type": "string"
            },
            "drop_prob": {
               "default": 0.0,
               "description": "Dropout probability.",
               "title": "Drop Prob",
               "type": "number"
            },
            "dim": {
               "description": "Hidden dimension of the stem",
               "title": "Dim",
               "type": "integer"
            },
            "depth": {
               "default": 2,
               "description": "Number of convolutional layers in the stem",
               "minimum": 1,
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "in_channels",
            "dim"
         ],
         "title": "MaxViT3DStem0Config",
         "type": "object"
      },
      "MaxViT3DStemConfig": {
         "properties": {
            "dim": {
               "description": "Input channel dimension of the block.",
               "title": "Dim",
               "type": "integer"
            },
            "num_heads": {
               "description": "Number of query heads",
               "title": "Num Heads",
               "type": "integer"
            },
            "ratio_q_to_kv_heads": {
               "default": 1,
               "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
               "title": "Ratio Q To Kv Heads",
               "type": "integer"
            },
            "logit_scale_learnable": {
               "default": false,
               "description": "Whether the logit scale is learnable.",
               "title": "Logit Scale Learnable",
               "type": "boolean"
            },
            "attn_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for attention weights.",
               "title": "Attn Drop Prob",
               "type": "number"
            },
            "proj_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the projection layer.",
               "title": "Proj Drop Prob",
               "type": "number"
            },
            "max_attention_batch_size": {
               "default": -1,
               "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
               "title": "Max Attention Batch Size",
               "type": "integer"
            },
            "rotary_position_embeddings_config": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Config for rotary position embeddings"
            },
            "mlp_ratio": {
               "default": 4,
               "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
               "title": "Mlp Ratio",
               "type": "integer"
            },
            "activation": {
               "default": "relu",
               "description": "Activation function to use in the block.",
               "title": "Activation",
               "type": "string"
            },
            "mlp_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the MLP.",
               "title": "Mlp Drop Prob",
               "type": "number"
            },
            "norm_location": {
               "default": "post",
               "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
               "enum": [
                  "pre",
                  "post"
               ],
               "title": "Norm Location",
               "type": "string"
            },
            "layer_norm_eps": {
               "default": 1e-06,
               "description": "Epsilon value for the layer normalization.",
               "title": "Layer Norm Eps",
               "type": "number"
            },
            "in_channels": {
               "default": null,
               "description": "Use dim instead",
               "title": "In Channels",
               "type": "null"
            },
            "out_channels": {
               "default": null,
               "description": "Use expansion_ratio instead",
               "title": "Out Channels",
               "type": "null"
            },
            "kernel_size": {
               "default": 3,
               "description": "Kernel size for the convolutional layers.",
               "title": "Kernel Size",
               "type": "integer"
            },
            "padding": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "same",
               "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
               "title": "Padding"
            },
            "stride": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": 1,
               "description": "Stride for the convolution",
               "title": "Stride"
            },
            "conv_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the convolution layer",
               "title": "Conv Kwargs",
               "type": "object"
            },
            "transposed": {
               "default": false,
               "description": "Whether to perform ConvTranspose instead of Conv",
               "title": "Transposed",
               "type": "boolean"
            },
            "normalization": {
               "default": "batchnorm3d",
               "description": "Normalization layer to use in the block.",
               "title": "Normalization",
               "type": "string"
            },
            "normalization_pre_args": {
               "default": [],
               "description": "Arguments for the normalization layer before providing the dimension. Useful when using GroupNorm layers are being used to specify the number of groups.",
               "items": {},
               "title": "Normalization Pre Args",
               "type": "array"
            },
            "normalization_post_args": {
               "default": [],
               "description": "Arguments for the normalization layer after providing the dimension.",
               "items": {},
               "title": "Normalization Post Args",
               "type": "array"
            },
            "normalization_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the normalization layer",
               "title": "Normalization Kwargs",
               "type": "object"
            },
            "activation_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the activation function.",
               "title": "Activation Kwargs",
               "type": "object"
            },
            "sequence": {
               "default": "CNA",
               "description": "Sequence of operations in the block.",
               "enum": [
                  "C",
                  "AC",
                  "CA",
                  "CD",
                  "CN",
                  "DC",
                  "NC",
                  "ACD",
                  "ACN",
                  "ADC",
                  "ANC",
                  "CAD",
                  "CAN",
                  "CDA",
                  "CDN",
                  "CNA",
                  "CND",
                  "DAC",
                  "DCA",
                  "DCN",
                  "DNC",
                  "NAC",
                  "NCA",
                  "NCD",
                  "NDC",
                  "ACDN",
                  "ACND",
                  "ADCN",
                  "ADNC",
                  "ANCD",
                  "ANDC",
                  "CADN",
                  "CAND",
                  "CDAN",
                  "CDNA",
                  "CNAD",
                  "CNDA",
                  "DACN",
                  "DANC",
                  "DCAN",
                  "DCNA",
                  "DNAC",
                  "DNCA",
                  "NACD",
                  "NADC",
                  "NCAD",
                  "NCDA",
                  "NDAC",
                  "NDCA"
               ],
               "title": "Sequence",
               "type": "string"
            },
            "drop_prob": {
               "default": 0.0,
               "description": "Dropout probability.",
               "title": "Drop Prob",
               "type": "number"
            },
            "out_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Output channel dimension of the block. If None, it will be set to `dim`.",
               "title": "Out Dim"
            },
            "expansion_ratio": {
               "default": 6.0,
               "description": "Expansion ratio for the block.",
               "title": "Expansion Ratio",
               "type": "number"
            },
            "se_reduction_ratio": {
               "default": 4.0,
               "description": "Squeeze-and-excitation reduction ratio.",
               "title": "Se Reduction Ratio",
               "type": "number"
            },
            "window_size": {
               "description": "Size of the window to apply attention over",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Window Size",
               "type": "array"
            },
            "out_dim_ratio": {
               "default": 2,
               "description": "Ratio of the output dimension to the input dimension. Used only in the last block of stems",
               "title": "Out Dim Ratio",
               "type": "integer"
            },
            "depth": {
               "description": "Number of blocks in the stem",
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "dim",
            "num_heads",
            "window_size",
            "depth"
         ],
         "title": "MaxViT3DStemConfig",
         "type": "object"
      },
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      }
   },
   "required": [
      "stem0",
      "stems"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:
field stem0: MaxViT3DStem0Config [Required]#

Configuration for the stem0

Validated by:
field stems: list[MaxViT3DStemConfig] [Required]#

Configurations for the remaining stems

Validated by:
validator validate  »  all fields[source]#

Base method for validating the model after creation.

class vision_architectures.nets.maxvit_3d.MaxViT3DStem0(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

Stem0 for MaxViT3D. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the MaxViT3DStem0 block.

Parameters:
  • config (MaxViT3DStem0Config) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(x, channels_first=True)[source]#

Pass the input through the stem0. Downsamples the input 2x along each spatial dimension.

Parameters:
  • x (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
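As a quick sanity check of the 2x downsampling, here is a minimal stdlib-only sketch of the standard convolution output-size arithmetic. The kernel size, stride, and padding values below are assumptions for illustration (a stride-2 convolution with kernel size 3 and padding 1), not taken from the stem's actual internals:

```python
def conv_out_size(size, kernel_size=3, stride=2, padding=1):
    """Standard convolution output-size formula: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Hypothetical 3D input of spatial shape (Z, Y, X) = (64, 128, 128)
z, y, x = 64, 128, 128
print(conv_out_size(z), conv_out_size(y), conv_out_size(x))  # 32 64 64
```

With these parameters, every even spatial dimension is exactly halved, which is consistent with the 2x downsampling described above.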

class vision_architectures.nets.maxvit_3d.MaxViT3DBlockAttention(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: SwinV23DLayer

Perform windowed attention on the input tensor. This class is designed for 3D inputs, e.g. medical images, videos, etc.

class vision_architectures.nets.maxvit_3d.MaxViT3DGridAttention(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: SwinV23DLayer

Perform grid attention on the input tensor.

Note that this grid attention implementation differs from the paper: the image is partitioned based on the window size rather than on the number of windows. For example:

Let us say the input is

A1 A2 A3 A4 A5 A6
B1 B2 B3 B4 B5 B6
C1 C2 C3 C4 C5 C6
D1 D2 D3 D4 D5 D6

Let us say the window size is 2x2. The grid attention will be performed on the following 6 windows:

A1 A4  A2 A5  A3 A6
C1 C4  C2 C5  C3 C6

B1 B4  B2 B5  B3 B6
D1 D4  D2 D5  D3 D6

According to the paper, attention should instead have been applied on the following 4 windows:

A1 A3 A5  A2 A4 A6
B1 B3 B5  B2 B4 B6

C1 C3 C5  C2 C4 C6
D1 D3 D5  D2 D4 D6

i.e. the first token of every 2x2 block-attention window forms one grid window, the second token of every 2x2 block-attention window forms the next, and so on.

This has been implemented differently so as to bound the number of tokens attended to in each window: 3D inputs are usually very large, so following the paper would produce a very large number of block-attention windows, and therefore a very large number of tokens to attend to in each grid-attention window.

It would also cause problems when estimating the position embeddings, as their grid size would vary with every input size.
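The 2D example above can be reproduced with a short stdlib-only sketch of the window-size-based partitioning: tokens are grouped with a stride of `input_size // window_size` along each axis, so every window contains exactly `window_size` tokens per axis. The helper name `grid_partition` is hypothetical and for illustration only:

```python
def grid_partition(tokens, window_size):
    """Partition a 2D grid of token labels into strided grid-attention windows.

    tokens: 2D list of labels; window_size: (rows, cols) of tokens per window.
    """
    n_rows, n_cols = len(tokens), len(tokens[0])
    stride_r = n_rows // window_size[0]  # number of windows along rows
    stride_c = n_cols // window_size[1]  # number of windows along cols
    windows = []
    for r0 in range(stride_r):
        for c0 in range(stride_c):
            # Each window gathers tokens spaced `stride` apart, starting at (r0, c0)
            windows.append([tokens[r0 + a * stride_r][c0 + b * stride_c]
                            for a in range(window_size[0])
                            for b in range(window_size[1])])
    return windows

tokens = [[f"{row}{col}" for col in range(1, 7)] for row in "ABCD"]
windows = grid_partition(tokens, (2, 2))
print(len(windows))  # 6
print(windows[0])    # ['A1', 'A4', 'C1', 'C4'] — the first window shown above
```

Note that each window always holds 2x2 = 4 tokens regardless of the input size; only the number of windows grows with the input, which is exactly the property the implementation note above relies on.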

class vision_architectures.nets.maxvit_3d.MaxViT3DBlock(config={}, modify_dims=False, checkpointing_level=0, **kwargs)[source]#

Bases: Module

MaxViT3D block.

__init__(config={}, modify_dims=False, checkpointing_level=0, **kwargs)[source]#

Initialize MaxViT3D block.

Parameters:
  • config (MaxViT3DBlockConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(x, channels_first=True)[source]#

Pass the input through the block.

Parameters:
  • x (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.

class vision_architectures.nets.maxvit_3d.MaxViT3DStem(config={}, checkpointing_level=0, dont_downsample=False, **kwargs)[source]#

Bases: Module

Implementation of a group of MaxViT blocks forming a stem. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, dont_downsample=False, **kwargs)[source]#

Initialize the stem

Parameters:
  • config (MaxViT3DStemConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • dont_downsample (bool) – If True, skip the downsampling at the end of the stem.

  • **kwargs – Additional keyword arguments for configuration.

forward(x, channels_first=True)[source]#

Pass the input through the stem.

Parameters:
  • x (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.

class vision_architectures.nets.maxvit_3d.MaxViT3DEncoder(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

3D MaxViT encoder. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the 3D MaxViT encoder.

Parameters:
  • config (MaxViT3DEncoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(x, return_intermediates=False, channels_first=True)[source]#

Pass the input through the 3D MaxViT encoder.

Parameters:
  • x (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, returns a tuple of the output and a list of intermediate stem outputs. Note that the stem outputs are always in channels-last format.
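To get a feel for the spatial resolutions the encoder produces, here is a stdlib-only sketch of the shape walk-through. It assumes stem0 halves each spatial dimension (as documented above) and that each subsequent stem also halves it when downsampling is enabled (suggested by the `dont_downsample` flag, but an assumption here); the input size and stem count are hypothetical:

```python
def encoder_shapes(spatial, num_stems):
    """Spatial shapes after stem0 and each of `num_stems` downsampling stems."""
    shapes = []
    for _ in range(num_stems + 1):  # +1 accounts for stem0
        spatial = tuple(s // 2 for s in spatial)
        shapes.append(spatial)
    return shapes

# Hypothetical (Z, Y, X) input of (64, 128, 128) through stem0 + 3 stems
for shape in encoder_shapes((64, 128, 128), num_stems=3):
    print(shape)
```

Under these assumptions the overall reduction is 2^(num_stems+1) = 16x per axis, ending at (4, 8, 8); the intermediate shapes correspond to the stem outputs returned when `return_intermediates=True`.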