MaxViT3D#

pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DStem0Config[source]#

Bases: CNNBlockConfig

Show JSON schema
{
   "title": "MaxViT3DStem0Config",
   "type": "object",
   "properties": {
      "in_channels": {
         "title": "In Channels",
         "type": "integer"
      },
      "out_channels": {
         "default": null,
         "description": "This is defined by dim",
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": 3,
         "title": "Kernel Size",
         "type": "integer"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "title": "Padding"
      },
      "stride": {
         "default": 1,
         "title": "Stride",
         "type": "integer"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "batchnorm3d",
         "title": "Normalization"
      },
      "normalization_pre_args": {
         "default": [],
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "relu",
         "title": "Activation"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "title": "Drop Prob",
         "type": "number"
      },
      "dim": {
         "title": "Dim",
         "type": "integer"
      },
      "depth": {
         "default": 2,
         "title": "Depth",
         "type": "integer"
      }
   },
   "required": [
      "in_channels",
      "dim"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:

field in_channels: int [Required]#
Validated by:
field kernel_size: int = 3#
Validated by:
field dim: int [Required]#
Validated by:
field depth: int = 2#
Validated by:
field out_channels: None = None#

This is defined by dim

Validated by:
pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DBlockConfig[source]#

Bases: MBConv3DConfig, Attention3DWithMLPConfig

Show JSON schema
{
   "title": "MaxViT3DBlockConfig",
   "type": "object",
   "properties": {
      "dim": {
         "title": "Dim",
         "type": "integer"
      },
      "mlp_ratio": {
         "default": 4,
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "relu",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference.",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "norm_location": {
         "default": "post",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "in_channels": {
         "default": null,
         "title": "In Channels",
         "type": "null"
      },
      "out_channels": {
         "default": null,
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": 3,
         "title": "Kernel Size",
         "type": "integer"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "title": "Padding"
      },
      "stride": {
         "default": 1,
         "title": "Stride",
         "type": "integer"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "default": "batchnorm3d",
         "title": "Normalization",
         "type": "string"
      },
      "normalization_pre_args": {
         "default": [],
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "title": "Drop Prob",
         "type": "number"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Out Dim"
      },
      "expansion_ratio": {
         "default": 6.0,
         "title": "Expansion Ratio",
         "type": "number"
      },
      "se_reduction_ratio": {
         "default": 4.0,
         "title": "Se Reduction Ratio",
         "type": "number"
      },
      "window_size": {
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "modify_dims": {
         "default": false,
         "title": "Modify Dims",
         "type": "boolean"
      },
      "out_dim_ratio": {
         "default": 2,
         "title": "Out Dim Ratio",
         "type": "integer"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:

field window_size: tuple[int, int, int] [Required]#
Validated by:
field modify_dims: bool = False#
Validated by:
field out_dim_ratio: int = 2#
Validated by:
pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DStemConfig[source]#

Bases: MaxViT3DBlockConfig

Show JSON schema
{
   "title": "MaxViT3DStemConfig",
   "type": "object",
   "properties": {
      "dim": {
         "title": "Dim",
         "type": "integer"
      },
      "mlp_ratio": {
         "default": 4,
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "relu",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference.",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "norm_location": {
         "default": "post",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "in_channels": {
         "default": null,
         "title": "In Channels",
         "type": "null"
      },
      "out_channels": {
         "default": null,
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": 3,
         "title": "Kernel Size",
         "type": "integer"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "title": "Padding"
      },
      "stride": {
         "default": 1,
         "title": "Stride",
         "type": "integer"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "default": "batchnorm3d",
         "title": "Normalization",
         "type": "string"
      },
      "normalization_pre_args": {
         "default": [],
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "title": "Drop Prob",
         "type": "number"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "title": "Out Dim"
      },
      "expansion_ratio": {
         "default": 6.0,
         "title": "Expansion Ratio",
         "type": "number"
      },
      "se_reduction_ratio": {
         "default": 4.0,
         "title": "Se Reduction Ratio",
         "type": "number"
      },
      "window_size": {
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "modify_dims": {
         "default": false,
         "title": "Modify Dims",
         "type": "boolean"
      },
      "out_dim_ratio": {
         "default": 2,
         "title": "Out Dim Ratio",
         "type": "integer"
      },
      "depth": {
         "title": "Depth",
         "type": "integer"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size",
      "depth"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:

field depth: int [Required]#
Validated by:
pydantic model vision_architectures.nets.maxvit_3d.MaxViT3DEncoderConfig[source]#

Bases: CustomBaseModel

Show JSON schema
{
   "title": "MaxViT3DEncoderConfig",
   "type": "object",
   "properties": {
      "stem0": {
         "$ref": "#/$defs/MaxViT3DStem0Config"
      },
      "stems": {
         "items": {
            "$ref": "#/$defs/MaxViT3DStemConfig"
         },
         "title": "Stems",
         "type": "array"
      }
   },
   "$defs": {
      "MaxViT3DStem0Config": {
         "properties": {
            "in_channels": {
               "title": "In Channels",
               "type": "integer"
            },
            "out_channels": {
               "default": null,
               "description": "This is defined by dim",
               "title": "Out Channels",
               "type": "null"
            },
            "kernel_size": {
               "default": 3,
               "title": "Kernel Size",
               "type": "integer"
            },
            "padding": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "same",
               "title": "Padding"
            },
            "stride": {
               "default": 1,
               "title": "Stride",
               "type": "integer"
            },
            "conv_kwargs": {
               "additionalProperties": true,
               "default": {},
               "title": "Conv Kwargs",
               "type": "object"
            },
            "transposed": {
               "default": false,
               "description": "Whether to perform ConvTranspose instead of Conv",
               "title": "Transposed",
               "type": "boolean"
            },
            "normalization": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "batchnorm3d",
               "title": "Normalization"
            },
            "normalization_pre_args": {
               "default": [],
               "items": {},
               "title": "Normalization Pre Args",
               "type": "array"
            },
            "normalization_post_args": {
               "default": [],
               "items": {},
               "title": "Normalization Post Args",
               "type": "array"
            },
            "normalization_kwargs": {
               "additionalProperties": true,
               "default": {},
               "title": "Normalization Kwargs",
               "type": "object"
            },
            "activation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "relu",
               "title": "Activation"
            },
            "activation_kwargs": {
               "additionalProperties": true,
               "default": {},
               "title": "Activation Kwargs",
               "type": "object"
            },
            "sequence": {
               "default": "CNA",
               "enum": [
                  "C",
                  "AC",
                  "CA",
                  "CD",
                  "CN",
                  "DC",
                  "NC",
                  "ACD",
                  "ACN",
                  "ADC",
                  "ANC",
                  "CAD",
                  "CAN",
                  "CDA",
                  "CDN",
                  "CNA",
                  "CND",
                  "DAC",
                  "DCA",
                  "DCN",
                  "DNC",
                  "NAC",
                  "NCA",
                  "NCD",
                  "NDC",
                  "ACDN",
                  "ACND",
                  "ADCN",
                  "ADNC",
                  "ANCD",
                  "ANDC",
                  "CADN",
                  "CAND",
                  "CDAN",
                  "CDNA",
                  "CNAD",
                  "CNDA",
                  "DACN",
                  "DANC",
                  "DCAN",
                  "DCNA",
                  "DNAC",
                  "DNCA",
                  "NACD",
                  "NADC",
                  "NCAD",
                  "NCDA",
                  "NDAC",
                  "NDCA"
               ],
               "title": "Sequence",
               "type": "string"
            },
            "drop_prob": {
               "default": 0.0,
               "title": "Drop Prob",
               "type": "number"
            },
            "dim": {
               "title": "Dim",
               "type": "integer"
            },
            "depth": {
               "default": 2,
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "in_channels",
            "dim"
         ],
         "title": "MaxViT3DStem0Config",
         "type": "object"
      },
      "MaxViT3DStemConfig": {
         "properties": {
            "dim": {
               "title": "Dim",
               "type": "integer"
            },
            "mlp_ratio": {
               "default": 4,
               "title": "Mlp Ratio",
               "type": "integer"
            },
            "activation": {
               "default": "relu",
               "title": "Activation",
               "type": "string"
            },
            "mlp_drop_prob": {
               "default": 0.0,
               "title": "Mlp Drop Prob",
               "type": "number"
            },
            "num_heads": {
               "description": "Number of query heads",
               "title": "Num Heads",
               "type": "integer"
            },
            "ratio_q_to_kv_heads": {
               "default": 1,
               "title": "Ratio Q To Kv Heads",
               "type": "integer"
            },
            "logit_scale_learnable": {
               "default": false,
               "title": "Logit Scale Learnable",
               "type": "boolean"
            },
            "attn_drop_prob": {
               "default": 0.0,
               "title": "Attn Drop Prob",
               "type": "number"
            },
            "proj_drop_prob": {
               "default": 0.0,
               "title": "Proj Drop Prob",
               "type": "number"
            },
            "max_attention_batch_size": {
               "default": -1,
               "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference.",
               "title": "Max Attention Batch Size",
               "type": "integer"
            },
            "norm_location": {
               "default": "post",
               "enum": [
                  "pre",
                  "post"
               ],
               "title": "Norm Location",
               "type": "string"
            },
            "layer_norm_eps": {
               "default": 1e-06,
               "title": "Layer Norm Eps",
               "type": "number"
            },
            "in_channels": {
               "default": null,
               "title": "In Channels",
               "type": "null"
            },
            "out_channels": {
               "default": null,
               "title": "Out Channels",
               "type": "null"
            },
            "kernel_size": {
               "default": 3,
               "title": "Kernel Size",
               "type": "integer"
            },
            "padding": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "same",
               "title": "Padding"
            },
            "stride": {
               "default": 1,
               "title": "Stride",
               "type": "integer"
            },
            "conv_kwargs": {
               "additionalProperties": true,
               "default": {},
               "title": "Conv Kwargs",
               "type": "object"
            },
            "transposed": {
               "default": false,
               "description": "Whether to perform ConvTranspose instead of Conv",
               "title": "Transposed",
               "type": "boolean"
            },
            "normalization": {
               "default": "batchnorm3d",
               "title": "Normalization",
               "type": "string"
            },
            "normalization_pre_args": {
               "default": [],
               "items": {},
               "title": "Normalization Pre Args",
               "type": "array"
            },
            "normalization_post_args": {
               "default": [],
               "items": {},
               "title": "Normalization Post Args",
               "type": "array"
            },
            "normalization_kwargs": {
               "additionalProperties": true,
               "default": {},
               "title": "Normalization Kwargs",
               "type": "object"
            },
            "activation_kwargs": {
               "additionalProperties": true,
               "default": {},
               "title": "Activation Kwargs",
               "type": "object"
            },
            "sequence": {
               "default": "CNA",
               "enum": [
                  "C",
                  "AC",
                  "CA",
                  "CD",
                  "CN",
                  "DC",
                  "NC",
                  "ACD",
                  "ACN",
                  "ADC",
                  "ANC",
                  "CAD",
                  "CAN",
                  "CDA",
                  "CDN",
                  "CNA",
                  "CND",
                  "DAC",
                  "DCA",
                  "DCN",
                  "DNC",
                  "NAC",
                  "NCA",
                  "NCD",
                  "NDC",
                  "ACDN",
                  "ACND",
                  "ADCN",
                  "ADNC",
                  "ANCD",
                  "ANDC",
                  "CADN",
                  "CAND",
                  "CDAN",
                  "CDNA",
                  "CNAD",
                  "CNDA",
                  "DACN",
                  "DANC",
                  "DCAN",
                  "DCNA",
                  "DNAC",
                  "DNCA",
                  "NACD",
                  "NADC",
                  "NCAD",
                  "NCDA",
                  "NDAC",
                  "NDCA"
               ],
               "title": "Sequence",
               "type": "string"
            },
            "drop_prob": {
               "default": 0.0,
               "title": "Drop Prob",
               "type": "number"
            },
            "out_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "title": "Out Dim"
            },
            "expansion_ratio": {
               "default": 6.0,
               "title": "Expansion Ratio",
               "type": "number"
            },
            "se_reduction_ratio": {
               "default": 4.0,
               "title": "Se Reduction Ratio",
               "type": "number"
            },
            "window_size": {
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Window Size",
               "type": "array"
            },
            "modify_dims": {
               "default": false,
               "title": "Modify Dims",
               "type": "boolean"
            },
            "out_dim_ratio": {
               "default": 2,
               "title": "Out Dim Ratio",
               "type": "integer"
            },
            "depth": {
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "dim",
            "num_heads",
            "window_size",
            "depth"
         ],
         "title": "MaxViT3DStemConfig",
         "type": "object"
      }
   },
   "required": [
      "stem0",
      "stems"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:
field stem0: MaxViT3DStem0Config [Required]#
Validated by:
field stems: list[MaxViT3DStemConfig] [Required]#
Validated by:
validator validate  »  all fields[source]#

Base method for validating the model after creation.
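Putting the schemas above together, a minimal encoder configuration only needs the required fields: `in_channels` and `dim` for `stem0`; `dim`, `num_heads`, `window_size`, and `depth` for each stem. The dict below is an illustrative sketch with made-up values; every other field falls back to the defaults shown in the schemas.

```python
# Hypothetical values for illustration only. The required keys follow the
# JSON schemas above; all omitted fields use the documented defaults.
encoder_config = {
    # stem0: MaxViT3DStem0Config requires in_channels and dim
    "stem0": {"in_channels": 1, "dim": 32},
    # stems: each MaxViT3DStemConfig requires dim, num_heads,
    # window_size (a 3-tuple), and depth
    "stems": [
        {"dim": 32, "num_heads": 4, "window_size": (4, 4, 4), "depth": 2},
        {"dim": 64, "num_heads": 8, "window_size": (4, 4, 4), "depth": 2},
    ],
}
```

Assuming the package is installed, such a dict can be validated with pydantic's standard `MaxViT3DEncoderConfig.model_validate(encoder_config)`.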

class vision_architectures.nets.maxvit_3d.MaxViT3DStem0(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*args, **kwargs)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class vision_architectures.nets.maxvit_3d.MaxViT3DBlockAttention(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: SwinV23DLayer

class vision_architectures.nets.maxvit_3d.MaxViT3DGridAttention(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: SwinV23DLayer

Perform grid attention on the input tensor.

Note that this grid attention implementation differs from the paper: here the image is partitioned based on the window size rather than on the number of windows. For example:

Let us say the input is

A1 A2 A3 A4 A5 A6
B1 B2 B3 B4 B5 B6
C1 C2 C3 C4 C5 C6
D1 D2 D3 D4 D5 D6

Let us say the window size is 2x2. Grid attention will then be performed on the following 6 windows:

A1 A4  A2 A5  A3 A6
C1 C4  C2 C5  C3 C6

B1 B4  B2 B5  B3 B6
D1 D4  D2 D5  D3 D6

Per my understanding of the paper, attention should instead have been applied on the following 4 windows:

A1 A3 A5  A2 A4 A6
B1 B3 B5  B2 B4 B6

C1 C3 C5  C2 C4 C6
D1 D3 D5  D2 D4 D6

i.e. the first token of every 2x2 block-attention window grouped together, then the second token of every window, and so on.

This has been implemented differently in order to bound the number of tokens attended to in each window: 3D inputs are usually very large, so partitioning as in the paper would produce a very large number of block-attention windows, and therefore a very large number of tokens to attend to in each grid-attention window.

Following the paper would also complicate position embeddings, as their grid size would vary with every input size.
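The partitioning above can be sketched in plain Python on the 2D example (a simplified illustration of the 3D behaviour; `grid_partition` is a hypothetical helper, not the library's API): each grid-attention window holds window_h x window_w tokens sampled with strides (H // window_h, W // window_w).

```python
def grid_partition(tokens, window_h, window_w):
    """Group tokens into strided grid-attention windows.

    Tokens sharing the same offset within a stride cell land in the same
    window, so every window holds window_h * window_w tokens.
    """
    H, W = len(tokens), len(tokens[0])
    stride_h, stride_w = H // window_h, W // window_w
    windows = []
    for off_h in range(stride_h):        # one window per (row, col) offset
        for off_w in range(stride_w):
            windows.append([
                tokens[off_h + i * stride_h][off_w + j * stride_w]
                for i in range(window_h)
                for j in range(window_w)
            ])
    return windows

# The 4x6 input from the example above:
grid = [
    ["A1", "A2", "A3", "A4", "A5", "A6"],
    ["B1", "B2", "B3", "B4", "B5", "B6"],
    ["C1", "C2", "C3", "C4", "C5", "C6"],
    ["D1", "D2", "D3", "D4", "D5", "D6"],
]

windows = grid_partition(grid, 2, 2)
# windows[0] is ["A1", "A4", "C1", "C4"], matching the first window shown above.
```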

class vision_architectures.nets.maxvit_3d.MaxViT3DBlock(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*args, **kwargs)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class vision_architectures.nets.maxvit_3d.MaxViT3DStem(config={}, checkpointing_level=0, dont_downsample=False, **kwargs)[source]#

Bases: Module

__init__(config={}, checkpointing_level=0, dont_downsample=False, **kwargs)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*args, **kwargs)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class vision_architectures.nets.maxvit_3d.MaxViT3DEncoder(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(*args, **kwargs)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.