SwinV23D

pydantic model vision_architectures.nets.swinv2_3d.SwinV23DPatchMergingConfig

Bases: Swin3DPatchMergingConfig

JSON schema:
{
   "title": "SwinV23DPatchMergingConfig",
   "type": "object",
   "properties": {
      "in_dim": {
         "description": "Input dimension before merging",
         "title": "In Dim",
         "type": "integer"
      },
      "out_dim": {
         "description": "Output dimension after merging",
         "title": "Out Dim",
         "type": "integer"
      },
      "merge_window_size": {
         "description": "Size of the window for merging patches",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Merge Window Size",
         "type": "array"
      }
   },
   "required": [
      "in_dim",
      "out_dim",
      "merge_window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

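Per the schema above, all three fields are required. A minimal sketch of a valid payload (the values are illustrative assumptions, and the library itself is not imported here):

```python
# Illustrative payload for SwinV23DPatchMergingConfig (values are assumptions,
# not taken from the library). Merging a 2x2x2 window of 96-dim patches into
# a single 192-dim patch halves each spatial axis.
patch_merging = {
    "in_dim": 96,                    # dimension before merging
    "out_dim": 192,                  # dimension after merging
    "merge_window_size": (2, 2, 2),  # (D, H, W) window, exactly 3 ints
}

# The schema marks all three keys as required:
assert {"in_dim", "out_dim", "merge_window_size"} <= patch_merging.keys()
assert len(patch_merging["merge_window_size"]) == 3
```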

pydantic model vision_architectures.nets.swinv2_3d.SwinV23DPatchSplittingConfig

Bases: Swin3DPatchSplittingConfig

JSON schema:
{
   "title": "SwinV23DPatchSplittingConfig",
   "type": "object",
   "properties": {
      "in_dim": {
         "description": "Input dimension before splitting",
         "title": "In Dim",
         "type": "integer"
      },
      "out_dim": {
         "description": "Output dimension after splitting",
         "title": "Out Dim",
         "type": "integer"
      },
      "final_window_size": {
         "description": "Size of the window to split patches into",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Final Window Size",
         "type": "array"
      }
   },
   "required": [
      "in_dim",
      "out_dim",
      "final_window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

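Splitting mirrors merging on the decoder side: each patch is split into a `final_window_size` window of lower-dimensional patches. A sketch of a valid payload (illustrative values, not taken from the library):

```python
# Illustrative payload for SwinV23DPatchSplittingConfig (assumed values).
# The inverse of a 96 -> 192 merge: 192-dim patches are split back into
# 2x2x2 windows of 96-dim patches, doubling each spatial axis.
patch_splitting = {
    "in_dim": 192,                   # dimension before splitting
    "out_dim": 96,                   # dimension after splitting
    "final_window_size": (2, 2, 2),  # (D, H, W) window to split into
}
assert {"in_dim", "out_dim", "final_window_size"} <= patch_splitting.keys()
```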

pydantic model vision_architectures.nets.swinv2_3d.SwinV23DBlockConfig

Bases: Swin3DBlockConfig

JSON schema:
{
   "title": "SwinV23DBlockConfig",
   "type": "object",
   "properties": {
      "dim": {
         "description": "Dim at which attention is performed",
         "title": "Dim",
         "type": "integer"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "norm_location": {
         "default": "post",
         "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "window_size": {
         "description": "Size of the window to apply attention over",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "use_relative_position_bias": {
         "default": false,
         "description": "Whether to use relative position bias",
         "title": "Use Relative Position Bias",
         "type": "boolean"
      },
      "patch_merging": {
         "anyOf": [
            {
               "$ref": "#/$defs/SwinV23DPatchMergingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch merging config if desired. Patch merging is applied before attention."
      },
      "patch_splitting": {
         "anyOf": [
            {
               "$ref": "#/$defs/SwinV23DPatchSplittingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch splitting config if desired. Patch splitting is applied after attention."
      },
      "in_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
         "title": "In Dim"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
         "title": "Out Dim"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "SwinV23DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "SwinV23DPatchMergingConfig",
         "type": "object"
      },
      "SwinV23DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "SwinV23DPatchSplittingConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
field patch_merging: SwinV23DPatchMergingConfig | None = None

Patch merging config if desired. Patch merging is applied before attention.
field patch_splitting: SwinV23DPatchSplittingConfig | None = None

Patch splitting config if desired. Patch splitting is applied after attention.
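Only `dim`, `num_heads`, and `window_size` are required; everything else falls back to the defaults shown in the schema (post-norm, GELU MLP with ratio 4, no dropout, no relative position bias). A sketch of a block payload with optional patch merging (values are illustrative assumptions):

```python
# Illustrative SwinV23DBlockConfig payload (assumed values).
block = {
    "dim": 192,               # dim at which attention is performed
    "num_heads": 6,           # number of query heads
    "window_size": (4, 4, 4), # (D, H, W) attention window
    # Optional: merge patches before attention. When patch_merging is used,
    # "in_dim" describes the dimension entering the block (96 -> 192 here).
    "in_dim": 96,
    "patch_merging": {
        "in_dim": 96,
        "out_dim": 192,
        "merge_window_size": (2, 2, 2),
    },
}

# The merge's output dimension must line up with the attention dim:
assert block["patch_merging"]["out_dim"] == block["dim"]
```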
pydantic model vision_architectures.nets.swinv2_3d.SwinV23DStageConfig

Bases: SwinV23DBlockConfig, Swin3DStageConfig

JSON schema:
{
   "title": "SwinV23DStageConfig",
   "type": "object",
   "properties": {
      "dim": {
         "description": "Dim at which attention is performed",
         "title": "Dim",
         "type": "integer"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "norm_location": {
         "default": "post",
         "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "window_size": {
         "description": "Size of the window to apply attention over",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "use_relative_position_bias": {
         "default": false,
         "description": "Whether to use relative position bias",
         "title": "Use Relative Position Bias",
         "type": "boolean"
      },
      "patch_merging": {
         "anyOf": [
            {
               "$ref": "#/$defs/SwinV23DPatchMergingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch merging config if desired. Patch merging is applied before attention."
      },
      "patch_splitting": {
         "anyOf": [
            {
               "$ref": "#/$defs/SwinV23DPatchSplittingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch splitting config if desired. Patch splitting is applied after attention."
      },
      "in_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
         "title": "In Dim"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
         "title": "Out Dim"
      },
      "depth": {
         "description": "Number of transformer blocks in this stage",
         "title": "Depth",
         "type": "integer"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "SwinV23DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "SwinV23DPatchMergingConfig",
         "type": "object"
      },
      "SwinV23DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "SwinV23DPatchSplittingConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size",
      "depth"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

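A stage is a block config plus `depth`, the number of transformer blocks stacked in the stage, so `depth` joins `dim`, `num_heads`, and `window_size` in the required set. A minimal sketch (illustrative values):

```python
# Illustrative SwinV23DStageConfig payload (assumed values): a stage of two
# transformer blocks attending over 4x4x4 windows at dim 96.
stage = {
    "dim": 96,
    "num_heads": 3,
    "window_size": (4, 4, 4),
    "depth": 2,  # number of transformer blocks in this stage
}
assert {"dim", "num_heads", "window_size", "depth"} <= stage.keys()
```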

pydantic model vision_architectures.nets.swinv2_3d.SwinV23DEncoderDecoderConfig

Bases: Swin3DEncoderDecoderConfig

JSON schema:
{
   "title": "SwinV23DEncoderDecoderConfig",
   "type": "object",
   "properties": {
      "stages": {
         "items": {
            "$ref": "#/$defs/SwinV23DStageConfig"
         },
         "title": "Stages",
         "type": "array"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "SwinV23DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "SwinV23DPatchMergingConfig",
         "type": "object"
      },
      "SwinV23DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "SwinV23DPatchSplittingConfig",
         "type": "object"
      },
      "SwinV23DStageConfig": {
         "properties": {
            "dim": {
               "description": "Dim at which attention is performed",
               "title": "Dim",
               "type": "integer"
            },
            "num_heads": {
               "description": "Number of query heads",
               "title": "Num Heads",
               "type": "integer"
            },
            "ratio_q_to_kv_heads": {
               "default": 1,
               "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
               "title": "Ratio Q To Kv Heads",
               "type": "integer"
            },
            "logit_scale_learnable": {
               "default": false,
               "description": "Whether the logit scale is learnable.",
               "title": "Logit Scale Learnable",
               "type": "boolean"
            },
            "attn_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for attention weights.",
               "title": "Attn Drop Prob",
               "type": "number"
            },
            "proj_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the projection layer.",
               "title": "Proj Drop Prob",
               "type": "number"
            },
            "max_attention_batch_size": {
               "default": -1,
               "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
               "title": "Max Attention Batch Size",
               "type": "integer"
            },
            "rotary_position_embeddings_config": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Config for rotary position embeddings"
            },
            "mlp_ratio": {
               "default": 4,
               "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
               "title": "Mlp Ratio",
               "type": "integer"
            },
            "activation": {
               "default": "gelu",
               "description": "Activation function for the MLP.",
               "title": "Activation",
               "type": "string"
            },
            "mlp_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the MLP.",
               "title": "Mlp Drop Prob",
               "type": "number"
            },
            "norm_location": {
               "default": "post",
               "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
               "enum": [
                  "pre",
                  "post"
               ],
               "title": "Norm Location",
               "type": "string"
            },
            "layer_norm_eps": {
               "default": 1e-06,
               "description": "Epsilon value for the layer normalization.",
               "title": "Layer Norm Eps",
               "type": "number"
            },
            "window_size": {
               "description": "Size of the window to apply attention over",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Window Size",
               "type": "array"
            },
            "use_relative_position_bias": {
               "default": false,
               "description": "Whether to use relative position bias",
               "title": "Use Relative Position Bias",
               "type": "boolean"
            },
            "patch_merging": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/SwinV23DPatchMergingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch merging config if desired. Patch merging is applied before attention."
            },
            "patch_splitting": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/SwinV23DPatchSplittingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch splitting config if desired. Patch splitting is applied after attention."
            },
            "in_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
               "title": "In Dim"
            },
            "out_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
               "title": "Out Dim"
            },
            "depth": {
               "description": "Number of transformer blocks in this stage",
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "dim",
            "num_heads",
            "window_size",
            "depth"
         ],
         "title": "SwinV23DStageConfig",
         "type": "object"
      }
   },
   "required": [
      "stages"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:

field stages: list[SwinV23DStageConfig] [Required]#
Validated by:
pydantic model vision_architectures.nets.swinv2_3d.SwinV23DEncoderWithPatchEmbeddingsConfig[source]#

Bases: SwinV23DEncoderDecoderConfig, Swin3DEncoderWithPatchEmbeddingsConfig

Show JSON schema
{
   "title": "SwinV23DEncoderWithPatchEmbeddingsConfig",
   "type": "object",
   "properties": {
      "stages": {
         "items": {
            "$ref": "#/$defs/SwinV23DStageConfig"
         },
         "title": "Stages",
         "type": "array"
      },
      "in_channels": {
         "description": "Number of input channels in the input image/video",
         "title": "In Channels",
         "type": "integer"
      },
      "patch_size": {
         "description": "Size of the patches to be extracted from the input image/video",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Patch Size",
         "type": "array"
      },
      "image_size": {
         "anyOf": [
            {
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Size of the input image/video. Required if absolute position embeddings are learnable.",
         "title": "Image Size"
      },
      "use_absolute_position_embeddings": {
         "default": true,
         "description": "Whether to use absolute position embeddings.",
         "title": "Use Absolute Position Embeddings",
         "type": "boolean"
      },
      "learnable_absolute_position_embeddings": {
         "default": false,
         "description": "Whether to use learnable absolute position embeddings.",
         "title": "Learnable Absolute Position Embeddings",
         "type": "boolean"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "SwinV23DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "SwinV23DPatchMergingConfig",
         "type": "object"
      },
      "SwinV23DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "SwinV23DPatchSplittingConfig",
         "type": "object"
      },
      "SwinV23DStageConfig": {
         "properties": {
            "dim": {
               "description": "Dim at which attention is performed",
               "title": "Dim",
               "type": "integer"
            },
            "num_heads": {
               "description": "Number of query heads",
               "title": "Num Heads",
               "type": "integer"
            },
            "ratio_q_to_kv_heads": {
               "default": 1,
               "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
               "title": "Ratio Q To Kv Heads",
               "type": "integer"
            },
            "logit_scale_learnable": {
               "default": false,
               "description": "Whether the logit scale is learnable.",
               "title": "Logit Scale Learnable",
               "type": "boolean"
            },
            "attn_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for attention weights.",
               "title": "Attn Drop Prob",
               "type": "number"
            },
            "proj_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the projection layer.",
               "title": "Proj Drop Prob",
               "type": "number"
            },
            "max_attention_batch_size": {
               "default": -1,
               "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
               "title": "Max Attention Batch Size",
               "type": "integer"
            },
            "rotary_position_embeddings_config": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Config for rotary position embeddings"
            },
            "mlp_ratio": {
               "default": 4,
               "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
               "title": "Mlp Ratio",
               "type": "integer"
            },
            "activation": {
               "default": "gelu",
               "description": "Activation function for the MLP.",
               "title": "Activation",
               "type": "string"
            },
            "mlp_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the MLP.",
               "title": "Mlp Drop Prob",
               "type": "number"
            },
            "norm_location": {
               "default": "post",
               "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
               "enum": [
                  "pre",
                  "post"
               ],
               "title": "Norm Location",
               "type": "string"
            },
            "layer_norm_eps": {
               "default": 1e-06,
               "description": "Epsilon value for the layer normalization.",
               "title": "Layer Norm Eps",
               "type": "number"
            },
            "window_size": {
               "description": "Size of the window to apply attention over",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Window Size",
               "type": "array"
            },
            "use_relative_position_bias": {
               "default": false,
               "description": "Whether to use relative position bias",
               "title": "Use Relative Position Bias",
               "type": "boolean"
            },
            "patch_merging": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/SwinV23DPatchMergingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch merging config if desired. Patch merging is applied before attention."
            },
            "patch_splitting": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/SwinV23DPatchSplittingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch splitting config if desired. Patch splitting is applied after attention."
            },
            "in_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
               "title": "In Dim"
            },
            "out_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
               "title": "Out Dim"
            },
            "depth": {
               "description": "Number of transformer blocks in this stage",
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "dim",
            "num_heads",
            "window_size",
            "depth"
         ],
         "title": "SwinV23DStageConfig",
         "type": "object"
      }
   },
   "required": [
      "stages",
      "in_channels",
      "patch_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:

Validators:
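
The `patch_size` and `image_size` fields above imply simple shape arithmetic: patchification divides each spatial axis of the input by the corresponding patch size. The helper below is a hypothetical stdlib-only sketch of that arithmetic, not part of `vision_architectures` itself.

```python
# Hypothetical helper illustrating the patchification arithmetic implied by
# the schema above; it is NOT part of vision_architectures.

def token_grid(image_size, patch_size):
    """Elementwise image_size // patch_size for a 3D input (D, H, W)."""
    for i, p in zip(image_size, patch_size):
        if i % p != 0:
            raise ValueError(f"image size {i} is not divisible by patch size {p}")
    return tuple(i // p for i, p in zip(image_size, patch_size))

# A 96^3 volume split into 4x4x4 patches yields a 24^3 token grid.
print(token_grid((96, 96, 96), (4, 4, 4)))  # (24, 24, 24)
```

This is also why `image_size` is required when absolute position embeddings are learnable: the number of embeddings equals the product of the token grid dimensions.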

class vision_architectures.nets.swinv2_3d.SwinV23DLayerLogitScale(num_heads)[source]#

Bases: Module

__init__(num_heads)[source]#

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward()[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
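
SwinV2 replaces the fixed 1/sqrt(d) attention scaling with a learnable per-head logit scale applied to cosine similarities, clamped so that the effective multiplier never exceeds 100. The sketch below illustrates only that clamping behaviour with stdlib code; the parameter name and initial value are illustrative assumptions, not read from this module's source.

```python
import math

# Illustrative sketch of SwinV2-style logit scaling (NOT this module's code):
# the learnable per-head parameter lives in log space and is clamped so that
# the effective multiplier exp(param) never exceeds 100.
MAX_LOGIT_SCALE = math.log(100.0)

def effective_logit_scale(log_scale: float) -> float:
    return math.exp(min(log_scale, MAX_LOGIT_SCALE))

# An initial value of log(10) gives a multiplier of 10 ...
print(round(effective_logit_scale(math.log(10.0)), 6))  # 10.0
# ... while any larger parameter value saturates at 100.
print(round(effective_logit_scale(20.0), 6))  # 100.0
```

When `logit_scale_learnable` is enabled in the stage config, this scale is trained alongside the attention weights; otherwise it stays at its initial value.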

class vision_architectures.nets.swinv2_3d.SwinV23DLayer(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DLayer

SwinV2 3D layer applying windowed attention with optional relative position embeddings. This class is designed for 3D inputs, e.g. medical images, videos, etc.


__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the SwinV23DLayer.

Parameters:
  • config (RelativePositionEmbeddings3DConfig | Attention3DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swinv2_3d.SwinV23DBlock(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DBlock

SwinV2 3D block consisting of two SwinV23DLayers: one with regular windows and one with shifted windows. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the SwinV23DBlock.

Parameters:
  • config (SwinV23DBlockConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swinv2_3d.SwinV23DPatchMerging(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DPatchMerging

Patch merging layer for SwinV23D. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the SwinV23DPatchMerging layer.

Parameters:
  • config (SwinV23DPatchMergingConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.
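
Per the `SwinV23DPatchMergingConfig` fields above, merging with a `merge_window_size` reduces each spatial axis by that factor, concatenating the merged tokens (so channels grow to `in_dim * prod(merge_window_size)`) before projecting down to `out_dim`. The function below is a hypothetical shape-bookkeeping sketch under that reading, not the module's implementation.

```python
from math import prod

# Hypothetical shape bookkeeping for patch merging; illustrates the config
# fields (in_dim, out_dim, merge_window_size), NOT the module's code.
def merged_shape(grid, in_dim, out_dim, merge_window_size):
    for g, w in zip(grid, merge_window_size):
        if g % w != 0:
            raise ValueError(f"grid size {g} is not divisible by window {w}")
    new_grid = tuple(g // w for g, w in zip(grid, merge_window_size))
    concat_dim = in_dim * prod(merge_window_size)  # channels before projection
    return new_grid, concat_dim, out_dim

# Merging a 24^3 grid of 96-dim tokens with a (2, 2, 2) window:
print(merged_shape((24, 24, 24), 96, 192, (2, 2, 2)))
# ((12, 12, 12), 768, 192)
```

Patch splitting (below) is the mirror operation, expanding the grid by `final_window_size` instead.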

class vision_architectures.nets.swinv2_3d.SwinV23DPatchSplitting(config, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DPatchSplitting

Patch splitting layer for SwinV23D. This class is designed for 3D inputs, e.g. medical images, videos, etc.

This is a self-implemented class and is not part of the paper.

__init__(config, checkpointing_level=0, **kwargs)[source]#

Initialize the SwinV23DPatchSplitting layer.

Parameters:
  • config (SwinV23DPatchSplittingConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swinv2_3d.SwinV23DStage(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DStage

A single stage of the SwinV23D architecture, composed of SwinV23DBlocks with optional patch merging/splitting. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the SwinV23DStage.

Parameters:
  • config (SwinV23DStageConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.
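
A stage's `depth` counts transformer blocks, and each SwinV23DBlock (described above) pairs a regular-window layer with a shifted-window layer; each layer partitions the token grid into non-overlapping windows. Assuming that two-layers-per-block structure, the hypothetical helper below tallies the resulting bookkeeping with stdlib code only.

```python
from math import prod

# Hedged bookkeeping for a stage (hypothetical helper, NOT the module's code):
# with `depth` blocks, each holding a regular- and a shifted-window layer,
# a stage runs 2 * depth attention layers. Each layer partitions the token
# grid into prod(grid_i // window_i) windows of prod(window_size) tokens.
def stage_stats(grid, window_size, depth):
    num_windows = prod(g // w for g, w in zip(grid, window_size))
    return {
        "layers": 2 * depth,
        "windows": num_windows,
        "tokens_per_window": prod(window_size),
    }

# A 24^3 token grid with (6, 6, 6) windows and depth 2:
print(stage_stats((24, 24, 24), (6, 6, 6), 2))
# {'layers': 4, 'windows': 64, 'tokens_per_window': 216}
```

Because attention cost is quadratic only within each window, larger grids increase the window count rather than the per-window cost.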

class vision_architectures.nets.swinv2_3d.SwinV23DEncoderDecoderBase(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DEncoderDecoderBase, PyTorchModelHubMixin

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the SwinV23DEncoder/SwinV23DDecoder.

Parameters:
  • config (SwinV23DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swinv2_3d.SwinV23DEncoder(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: SwinV23DEncoderDecoderBase

3D SwinV2 transformer encoder. Assumes input has already been patchified/tokenized. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the SwinV23DEncoder/SwinV23DDecoder.

Parameters:
  • config (SwinV23DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swinv2_3d.SwinV23DDecoder(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: SwinV23DEncoderDecoderBase

3D SwinV2 transformer decoder. Assumes input has already been patchified/tokenized. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the SwinV23DEncoder/SwinV23DDecoder.

Parameters:
  • config (SwinV23DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swinv2_3d.SwinV23DEncoderWithPatchEmbeddings(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DEncoderWithPatchEmbeddings, PyTorchModelHubMixin

3D SwinV2 transformer with 3D patch embeddings. This class is designed for 3D inputs, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the SwinV23DEncoderWithPatchEmbeddings.

Parameters:
  • config (SwinV23DEncoderWithPatchEmbeddingsConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.