Swin3D#

pydantic model vision_architectures.nets.swin_3d.Swin3DPatchMergingConfig[source]#

Bases: CustomBaseModel

JSON schema:
{
   "title": "Swin3DPatchMergingConfig",
   "type": "object",
   "properties": {
      "in_dim": {
         "description": "Input dimension before merging",
         "title": "In Dim",
         "type": "integer"
      },
      "out_dim": {
         "description": "Output dimension after merging",
         "title": "Out Dim",
         "type": "integer"
      },
      "merge_window_size": {
         "description": "Size of the window for merging patches",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Merge Window Size",
         "type": "array"
      }
   },
   "required": [
      "in_dim",
      "out_dim",
      "merge_window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
  • in_dim (int)
  • out_dim (int)
  • merge_window_size (tuple[int, int, int])
Validators:
  • validate_before » all fields
field in_dim: int [Required]#

Input dimension before merging

Validated by:
  • validate_before
field out_dim: int [Required]#

Output dimension after merging

Validated by:
  • validate_before
field merge_window_size: tuple[int, int, int] [Required]#

Size of the window for merging patches

Validated by:
  • validate_before
property out_dim_ratio: float#
validator validate_before  »  all fields[source]#

Base class method for validating data before creating the model.
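To illustrate the shape arithmetic this config describes (a minimal sketch, not the library's implementation; the helper name `merged_shape` is hypothetical): merging each `merge_window_size` window of tokens shrinks every spatial axis by the window size and projects the concatenated features to `out_dim`, so `out_dim_ratio` presumably reports `out_dim / in_dim`.

```python
def merged_shape(spatial, in_dim, merge_window_size, out_dim):
    """Hypothetical helper: each spatial axis shrinks by the merge window,
    and the window's concatenated features are projected to out_dim."""
    assert all(s % w == 0 for s, w in zip(spatial, merge_window_size)), \
        "spatial extents must divide evenly by the merge window"
    new_spatial = tuple(s // w for s, w in zip(spatial, merge_window_size))
    return new_spatial, out_dim

# Merging (2, 2, 2) windows of a 32x32x32 grid of 96-dim tokens into 192-dim tokens
print(merged_shape((32, 32, 32), 96, (2, 2, 2), 192))  # ((16, 16, 16), 192)
```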

pydantic model vision_architectures.nets.swin_3d.Swin3DPatchSplittingConfig[source]#

Bases: CustomBaseModel

JSON schema:
{
   "title": "Swin3DPatchSplittingConfig",
   "type": "object",
   "properties": {
      "in_dim": {
         "description": "Input dimension before splitting",
         "title": "In Dim",
         "type": "integer"
      },
      "out_dim": {
         "description": "Output dimension after splitting",
         "title": "Out Dim",
         "type": "integer"
      },
      "final_window_size": {
         "description": "Size of the window to split patches into",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Final Window Size",
         "type": "array"
      }
   },
   "required": [
      "in_dim",
      "out_dim",
      "final_window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
  • in_dim (int)
  • out_dim (int)
  • final_window_size (tuple[int, int, int])
Validators:
  • validate_before » all fields
field in_dim: int [Required]#

Input dimension before splitting

Validated by:
  • validate_before
field out_dim: int [Required]#

Output dimension after splitting

Validated by:
  • validate_before
field final_window_size: tuple[int, int, int] [Required]#

Size of the window to split patches into

Validated by:
  • validate_before
property in_dim_ratio: float#
validator validate_before  »  all fields[source]#

Base class method for validating data before creating the model.
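Patch splitting is the inverse operation: each token expands into a `final_window_size` block of tokens with features projected from `in_dim` to `out_dim` (so `in_dim_ratio` presumably reports `in_dim / out_dim`). A sketch of the shape arithmetic, with a hypothetical helper name:

```python
def split_shape(spatial, in_dim, final_window_size, out_dim):
    """Hypothetical helper: each token is expanded into a final_window_size
    block of tokens, and features are projected from in_dim to out_dim."""
    new_spatial = tuple(s * w for s, w in zip(spatial, final_window_size))
    return new_spatial, out_dim

# Splitting a 16x16x16 grid of 192-dim tokens into (2, 2, 2) blocks of 96-dim tokens
print(split_shape((16, 16, 16), 192, (2, 2, 2), 96))  # ((32, 32, 32), 96)
```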

pydantic model vision_architectures.nets.swin_3d.Swin3DBlockConfig[source]#

Bases: Attention3DWithMLPConfig

JSON schema:
{
   "title": "Swin3DBlockConfig",
   "type": "object",
   "properties": {
      "dim": {
         "description": "Dim at which attention is performed",
         "title": "Dim",
         "type": "integer"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "norm_location": {
         "default": "post",
         "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "window_size": {
         "description": "Size of the window to apply attention over",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "use_relative_position_bias": {
         "default": false,
         "description": "Whether to use relative position bias",
         "title": "Use Relative Position Bias",
         "type": "boolean"
      },
      "patch_merging": {
         "anyOf": [
            {
               "$ref": "#/$defs/Swin3DPatchMergingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch merging config if desired. Patch merging is applied before attention."
      },
      "patch_splitting": {
         "anyOf": [
            {
               "$ref": "#/$defs/Swin3DPatchSplittingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch splitting config if desired. Patch splitting is applied after attention."
      },
      "in_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
         "title": "In Dim"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
         "title": "Out Dim"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "Swin3DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "Swin3DPatchMergingConfig",
         "type": "object"
      },
      "Swin3DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "Swin3DPatchSplittingConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
  • dim (int)
  • in_dim (int | None)
  • out_dim (int | None)
  • patch_merging (Swin3DPatchMergingConfig | None)
  • patch_splitting (Swin3DPatchSplittingConfig | None)
  • use_relative_position_bias (bool)
  • window_size (tuple[int, int, int])
Validators:
  • validate » all fields
field window_size: tuple[int, int, int] [Required]#

Size of the window to apply attention over

Validated by:
field use_relative_position_bias: bool = False#

Whether to use relative position bias

Validated by:
field patch_merging: Swin3DPatchMergingConfig | None = None#

Patch merging config if desired. Patch merging is applied before attention.

Validated by:
field patch_splitting: Swin3DPatchSplittingConfig | None = None#

Patch splitting config if desired. Patch splitting is applied after attention.

Validated by:
field in_dim: int | None = None#

Input dimension of the stage. Useful if patch_merging is used.

Validated by:
field dim: int [Required]#

Dim at which attention is performed

Validated by:
field out_dim: int | None = None#

Output dimension of the stage. Useful if patch_splitting is used.

Validated by:
property spatial_compression_ratio#
get_out_patch_size(in_patch_size)[source]#
get_in_patch_size(out_patch_size)[source]#
get_in_dim()[source]#
Return type:

int

get_out_dim()[source]#
Return type:

int

property out_dim_ratio: float#
populate()[source]#

Populate the in_dim and out_dim of patch_splitting and patch_merging based on the stage’s in_dim, dim, out_dim.

validator validate  »  all fields[source]#

Base method for validating the model after creation.
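A sketch of what `populate()` presumably wires up, assuming patch merging (applied before attention) maps the stage's `in_dim` to `dim` and patch splitting (applied after attention) maps `dim` to `out_dim`. The function below is a hypothetical stand-in that models the sub-configs as plain dicts rather than the library's pydantic objects:

```python
def populate_dims(in_dim, dim, out_dim, patch_merging, patch_splitting):
    """Hypothetical sketch of populate(): fill in the sub-configs' dims
    from the stage's in_dim, dim, and out_dim (dicts stand in for configs)."""
    if patch_merging is not None:
        patch_merging.setdefault("in_dim", in_dim)   # merging input = stage in_dim
        patch_merging.setdefault("out_dim", dim)     # merging output = attention dim
    if patch_splitting is not None:
        patch_splitting.setdefault("in_dim", dim)    # splitting input = attention dim
        patch_splitting.setdefault("out_dim", out_dim)
    return patch_merging, patch_splitting

merging = {"merge_window_size": (2, 2, 2)}
print(populate_dims(96, 192, None, merging, None)[0])
# {'merge_window_size': (2, 2, 2), 'in_dim': 96, 'out_dim': 192}
```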

pydantic model vision_architectures.nets.swin_3d.Swin3DStageConfig[source]#

Bases: Swin3DBlockConfig

JSON schema:
{
   "title": "Swin3DStageConfig",
   "type": "object",
   "properties": {
      "dim": {
         "description": "Dim at which attention is performed",
         "title": "Dim",
         "type": "integer"
      },
      "num_heads": {
         "description": "Number of query heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "ratio_q_to_kv_heads": {
         "default": 1,
         "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
         "title": "Ratio Q To Kv Heads",
         "type": "integer"
      },
      "logit_scale_learnable": {
         "default": false,
         "description": "Whether the logit scale is learnable.",
         "title": "Logit Scale Learnable",
         "type": "boolean"
      },
      "attn_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for attention weights.",
         "title": "Attn Drop Prob",
         "type": "number"
      },
      "proj_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the projection layer.",
         "title": "Proj Drop Prob",
         "type": "number"
      },
      "max_attention_batch_size": {
         "default": -1,
         "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
         "title": "Max Attention Batch Size",
         "type": "integer"
      },
      "rotary_position_embeddings_config": {
         "anyOf": [
            {
               "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Config for rotary position embeddings"
      },
      "mlp_ratio": {
         "default": 4,
         "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
         "title": "Mlp Ratio",
         "type": "integer"
      },
      "activation": {
         "default": "gelu",
         "description": "Activation function for the MLP.",
         "title": "Activation",
         "type": "string"
      },
      "mlp_drop_prob": {
         "default": 0.0,
         "description": "Dropout probability for the MLP.",
         "title": "Mlp Drop Prob",
         "type": "number"
      },
      "norm_location": {
         "default": "post",
         "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
         "enum": [
            "pre",
            "post"
         ],
         "title": "Norm Location",
         "type": "string"
      },
      "layer_norm_eps": {
         "default": 1e-06,
         "description": "Epsilon value for the layer normalization.",
         "title": "Layer Norm Eps",
         "type": "number"
      },
      "window_size": {
         "description": "Size of the window to apply attention over",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Window Size",
         "type": "array"
      },
      "use_relative_position_bias": {
         "default": false,
         "description": "Whether to use relative position bias",
         "title": "Use Relative Position Bias",
         "type": "boolean"
      },
      "patch_merging": {
         "anyOf": [
            {
               "$ref": "#/$defs/Swin3DPatchMergingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch merging config if desired. Patch merging is applied before attention."
      },
      "patch_splitting": {
         "anyOf": [
            {
               "$ref": "#/$defs/Swin3DPatchSplittingConfig"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Patch splitting config if desired. Patch splitting is applied after attention."
      },
      "in_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
         "title": "In Dim"
      },
      "out_dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
         "title": "Out Dim"
      },
      "depth": {
         "description": "Number of transformer blocks in this stage",
         "title": "Depth",
         "type": "integer"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "Swin3DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "Swin3DPatchMergingConfig",
         "type": "object"
      },
      "Swin3DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "Swin3DPatchSplittingConfig",
         "type": "object"
      }
   },
   "required": [
      "dim",
      "num_heads",
      "window_size",
      "depth"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
  • depth (int)
Validators:

field depth: int [Required]#

Number of transformer blocks in this stage

Validated by:
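A stage adds only `depth` on top of the block config, so its required keys are `dim`, `num_heads`, `window_size`, and `depth`. The sketch below shows a hypothetical two-stage configuration as plain dicts (illustrative values, not defaults from the library) and how the overall spatial downsampling per axis falls out as the product of the stages' merge windows, which is presumably what `spatial_compression_ratio` reports:

```python
from math import prod

# Hypothetical two-stage configuration, dicts standing in for Swin3DStageConfig
stages = [
    {"dim": 96, "num_heads": 3, "window_size": (4, 4, 4), "depth": 2,
     "patch_merging": {"merge_window_size": (2, 2, 2)}},
    {"dim": 192, "num_heads": 6, "window_size": (4, 4, 4), "depth": 2,
     "patch_merging": {"merge_window_size": (2, 2, 2)}},
]

# Overall downsampling per axis: product of merge window sizes across stages
compression = tuple(
    prod(s["patch_merging"]["merge_window_size"][axis] for s in stages)
    for axis in range(3)
)
print(compression)  # (4, 4, 4)
```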
pydantic model vision_architectures.nets.swin_3d.Swin3DEncoderDecoderConfig[source]#

Bases: CustomBaseModel

JSON schema:
{
   "title": "Swin3DEncoderDecoderConfig",
   "type": "object",
   "properties": {
      "stages": {
         "items": {
            "$ref": "#/$defs/Swin3DStageConfig"
         },
         "title": "Stages",
         "type": "array"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "Swin3DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "Swin3DPatchMergingConfig",
         "type": "object"
      },
      "Swin3DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "Swin3DPatchSplittingConfig",
         "type": "object"
      },
      "Swin3DStageConfig": {
         "properties": {
            "dim": {
               "description": "Dim at which attention is performed",
               "title": "Dim",
               "type": "integer"
            },
            "num_heads": {
               "description": "Number of query heads",
               "title": "Num Heads",
               "type": "integer"
            },
            "ratio_q_to_kv_heads": {
               "default": 1,
               "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
               "title": "Ratio Q To Kv Heads",
               "type": "integer"
            },
            "logit_scale_learnable": {
               "default": false,
               "description": "Whether the logit scale is learnable.",
               "title": "Logit Scale Learnable",
               "type": "boolean"
            },
            "attn_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for attention weights.",
               "title": "Attn Drop Prob",
               "type": "number"
            },
            "proj_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the projection layer.",
               "title": "Proj Drop Prob",
               "type": "number"
            },
            "max_attention_batch_size": {
               "default": -1,
               "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
               "title": "Max Attention Batch Size",
               "type": "integer"
            },
            "rotary_position_embeddings_config": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Config for rotary position embeddings"
            },
            "mlp_ratio": {
               "default": 4,
               "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
               "title": "Mlp Ratio",
               "type": "integer"
            },
            "activation": {
               "default": "gelu",
               "description": "Activation function for the MLP.",
               "title": "Activation",
               "type": "string"
            },
            "mlp_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the MLP.",
               "title": "Mlp Drop Prob",
               "type": "number"
            },
            "norm_location": {
               "default": "post",
               "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
               "enum": [
                  "pre",
                  "post"
               ],
               "title": "Norm Location",
               "type": "string"
            },
            "layer_norm_eps": {
               "default": 1e-06,
               "description": "Epsilon value for the layer normalization.",
               "title": "Layer Norm Eps",
               "type": "number"
            },
            "window_size": {
               "description": "Size of the window to apply attention over",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Window Size",
               "type": "array"
            },
            "use_relative_position_bias": {
               "default": false,
               "description": "Whether to use relative position bias",
               "title": "Use Relative Position Bias",
               "type": "boolean"
            },
            "patch_merging": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Swin3DPatchMergingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch merging config if desired. Patch merging is applied before attention."
            },
            "patch_splitting": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Swin3DPatchSplittingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch splitting config if desired. Patch splitting is applied after attention."
            },
            "in_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
               "title": "In Dim"
            },
            "out_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
               "title": "Out Dim"
            },
            "depth": {
               "description": "Number of transformer blocks in this stage",
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "dim",
            "num_heads",
            "window_size",
            "depth"
         ],
         "title": "Swin3DStageConfig",
         "type": "object"
      }
   },
   "required": [
      "stages"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:
field stages: list[Swin3DStageConfig] [Required]#
Validated by:
populate()[source]#

Populate the in_dim, dim, out_dim of each stage.

validator validate  »  all fields[source]#

Base method for validating the model after creation.

get_out_dim_ratios()[source]#
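For illustration, a hypothetical two-stage configuration shows how dimensions are expected to chain across stages (which ``populate()`` fills in automatically when omitted). All concrete values below are assumptions, not defaults from the library:

```python
# Hypothetical two-stage encoder config; field names follow the schema above.
# The second stage doubles the channel dim via patch merging, so its
# patch_merging.in_dim must equal the previous stage's dim and its
# patch_merging.out_dim must equal its own dim.
stages = [
    {
        "dim": 96,
        "num_heads": 3,
        "window_size": (4, 4, 4),
        "depth": 2,
    },
    {
        "dim": 192,
        "num_heads": 6,
        "window_size": (4, 4, 4),
        "depth": 2,
        "patch_merging": {
            "in_dim": 96,      # previous stage's dim
            "out_dim": 192,    # this stage's dim
            "merge_window_size": (2, 2, 2),
        },
    },
]

# Sanity-check the dimension chaining by hand.
for prev, curr in zip(stages, stages[1:]):
    merging = curr.get("patch_merging")
    if merging is not None:
        assert merging["in_dim"] == prev["dim"]
        assert merging["out_dim"] == curr["dim"]
```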
pydantic model vision_architectures.nets.swin_3d.Swin3DEncoderWithPatchEmbeddingsConfig[source]#

Bases: Swin3DEncoderDecoderConfig

Show JSON schema
{
   "title": "Swin3DEncoderWithPatchEmbeddingsConfig",
   "type": "object",
   "properties": {
      "stages": {
         "items": {
            "$ref": "#/$defs/Swin3DStageConfig"
         },
         "title": "Stages",
         "type": "array"
      },
      "in_channels": {
         "description": "Number of input channels in the input image/video",
         "title": "In Channels",
         "type": "integer"
      },
      "patch_size": {
         "description": "Size of the patches to be extracted from the input image/video",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Patch Size",
         "type": "array"
      },
      "image_size": {
         "anyOf": [
            {
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Size of the input image/video. Required if absolute position embeddings are learnable.",
         "title": "Image Size"
      },
      "use_absolute_position_embeddings": {
         "default": true,
         "description": "Whether to use absolute position embeddings.",
         "title": "Use Absolute Position Embeddings",
         "type": "boolean"
      },
      "learnable_absolute_position_embeddings": {
         "default": false,
         "description": "Whether to use learnable absolute position embeddings.",
         "title": "Learnable Absolute Position Embeddings",
         "type": "boolean"
      }
   },
   "$defs": {
      "RotaryPositionEmbeddings3DConfig": {
         "properties": {
            "dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Dimension of the position embeddings",
               "title": "Dim"
            },
            "base": {
               "default": 10000.0,
               "description": "Base value for the exponent.",
               "title": "Base",
               "type": "number"
            },
            "split": {
               "anyOf": [
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        },
                        {
                           "type": "number"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "maxItems": 3,
                     "minItems": 3,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  }
               ],
               "default": [
                  0.3333333333333333,
                  0.3333333333333333,
                  0.3333333333333333
               ],
               "description": "Split of the position embeddings. If float, converted to int based on self.dim",
               "title": "Split"
            }
         },
         "title": "RotaryPositionEmbeddings3DConfig",
         "type": "object"
      },
      "Swin3DPatchMergingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before merging",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after merging",
               "title": "Out Dim",
               "type": "integer"
            },
            "merge_window_size": {
               "description": "Size of the window for merging patches",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Merge Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "merge_window_size"
         ],
         "title": "Swin3DPatchMergingConfig",
         "type": "object"
      },
      "Swin3DPatchSplittingConfig": {
         "properties": {
            "in_dim": {
               "description": "Input dimension before splitting",
               "title": "In Dim",
               "type": "integer"
            },
            "out_dim": {
               "description": "Output dimension after splitting",
               "title": "Out Dim",
               "type": "integer"
            },
            "final_window_size": {
               "description": "Size of the window to split patches into",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Final Window Size",
               "type": "array"
            }
         },
         "required": [
            "in_dim",
            "out_dim",
            "final_window_size"
         ],
         "title": "Swin3DPatchSplittingConfig",
         "type": "object"
      },
      "Swin3DStageConfig": {
         "properties": {
            "dim": {
               "description": "Dim at which attention is performed",
               "title": "Dim",
               "type": "integer"
            },
            "num_heads": {
               "description": "Number of query heads",
               "title": "Num Heads",
               "type": "integer"
            },
            "ratio_q_to_kv_heads": {
               "default": 1,
               "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.",
               "title": "Ratio Q To Kv Heads",
               "type": "integer"
            },
            "logit_scale_learnable": {
               "default": false,
               "description": "Whether the logit scale is learnable.",
               "title": "Logit Scale Learnable",
               "type": "boolean"
            },
            "attn_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for attention weights.",
               "title": "Attn Drop Prob",
               "type": "number"
            },
            "proj_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the projection layer.",
               "title": "Proj Drop Prob",
               "type": "number"
            },
            "max_attention_batch_size": {
               "default": -1,
               "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).",
               "title": "Max Attention Batch Size",
               "type": "integer"
            },
            "rotary_position_embeddings_config": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Config for rotary position embeddings"
            },
            "mlp_ratio": {
               "default": 4,
               "description": "Ratio of the hidden dimension in the MLP to the input dimension.",
               "title": "Mlp Ratio",
               "type": "integer"
            },
            "activation": {
               "default": "gelu",
               "description": "Activation function for the MLP.",
               "title": "Activation",
               "type": "string"
            },
            "mlp_drop_prob": {
               "default": 0.0,
               "description": "Dropout probability for the MLP.",
               "title": "Mlp Drop Prob",
               "type": "number"
            },
            "norm_location": {
               "default": "post",
               "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.",
               "enum": [
                  "pre",
                  "post"
               ],
               "title": "Norm Location",
               "type": "string"
            },
            "layer_norm_eps": {
               "default": 1e-06,
               "description": "Epsilon value for the layer normalization.",
               "title": "Layer Norm Eps",
               "type": "number"
            },
            "window_size": {
               "description": "Size of the window to apply attention over",
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "title": "Window Size",
               "type": "array"
            },
            "use_relative_position_bias": {
               "default": false,
               "description": "Whether to use relative position bias",
               "title": "Use Relative Position Bias",
               "type": "boolean"
            },
            "patch_merging": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Swin3DPatchMergingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch merging config if desired. Patch merging is applied before attention."
            },
            "patch_splitting": {
               "anyOf": [
                  {
                     "$ref": "#/$defs/Swin3DPatchSplittingConfig"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Patch splitting config if desired. Patch splitting is applied after attention."
            },
            "in_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.",
               "title": "In Dim"
            },
            "out_dim": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.",
               "title": "Out Dim"
            },
            "depth": {
               "description": "Number of transformer blocks in this stage",
               "title": "Depth",
               "type": "integer"
            }
         },
         "required": [
            "dim",
            "num_heads",
            "window_size",
            "depth"
         ],
         "title": "Swin3DStageConfig",
         "type": "object"
      }
   },
   "required": [
      "stages",
      "in_channels",
      "patch_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

Fields:
Validators:
field in_channels: int [Required]#

Number of input channels in the input image/video

Validated by:
field patch_size: tuple[int, int, int] [Required]#

Size of the patches to be extracted from the input image/video

Validated by:
field image_size: tuple[int, int, int] | None = None#

Size of the input image/video. Required if absolute position embeddings are learnable.

Validated by:
field use_absolute_position_embeddings: bool = True#

Whether to use absolute position embeddings.

Validated by:
field learnable_absolute_position_embeddings: bool = False#

Whether to use learnable absolute position embeddings.

Validated by:
validator validate  »  all fields[source]#

Base method for validating the model after creation.

class vision_architectures.nets.swin_3d.Swin3DLayer(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

Swin 3D Layer applying windowed attention with optional relative position embeddings. This class is designed for 3D input, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the Swin3DLayer.

Parameters:
  • config (RelativePositionEmbeddings3DConfig | Attention3DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(hidden_states, channels_first=True)[source]#

Window the input features and apply self attention on each window.

Parameters:
  • hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
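As a rough sketch of the bookkeeping behind windowed attention (not this library's actual implementation), a (Z, Y, X) token grid with a given ``window_size`` yields the following number of non-overlapping windows and tokens per window, assuming each grid dimension divides evenly by the window:

```python
from math import prod

def num_windows(grid_size, window_size):
    """Number of non-overlapping attention windows covering a 3D token grid.

    Assumes each grid dimension is divisible by the corresponding window
    dimension (padding/shifting logic of the real layer is omitted here).
    """
    assert all(g % w == 0 for g, w in zip(grid_size, window_size))
    return prod(g // w for g, w in zip(grid_size, window_size))

grid = (8, 16, 16)    # Z, Y, X tokens after patch embedding (illustrative)
window = (4, 4, 4)    # window_size from the stage config
windows = num_windows(grid, window)   # 2 * 4 * 4 = 32 windows
tokens_per_window = prod(window)      # 64 tokens attend to each other
```

Attention cost then scales with ``windows * tokens_per_window**2`` rather than with the square of the full grid, which is the point of windowing.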

class vision_architectures.nets.swin_3d.Swin3DBlock(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

Swin 3D Block consisting of two Swin3DLayers: one with regular windows and one with shifted windows. This class is designed for 3D input, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the Swin3DBlock.

Parameters:
  • config (Swin3DBlockConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(hidden_states, channels_first=True, return_intermediates=False)[source]#

Apply window attention and shifted window attention on the input features.

Parameters:
  • hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

  • return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.

Return type:

Tensor | tuple[Tensor, list[Tensor]]

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns a list of intermediate layer outputs. Note that the intermediate layer outputs returned will always be in channels_last format.
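In the Swin design, the second layer of each block shifts the window grid by half a window along each axis so that tokens near window borders can mix across windows. A minimal sketch of that offset (the half-window shift is the convention from the original Swin paper, assumed here rather than read from this library's code):

```python
def shift_size(window_size):
    """Half-window offset applied before the second (shifted-window) layer,
    following the standard Swin convention."""
    return tuple(w // 2 for w in window_size)

# A (4, 4, 4) window is shifted by (2, 2, 2); odd sizes round down.
regular = shift_size((4, 4, 4))   # (2, 2, 2)
odd = shift_size((2, 7, 7))       # (1, 3, 3)
```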

class vision_architectures.nets.swin_3d.Swin3DPatchMerging(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

Patch merging layer for Swin3D. This class is designed for 3D input, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the Swin3DPatchMerging layer.

Parameters:
  • config (Swin3DPatchMergingConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(hidden_states, channels_first=True)[source]#

Merge multiple patches into a single patch.

Parameters:
  • hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
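Shape-wise, merging with ``merge_window_size`` (mz, my, mx) divides each spatial dimension by the window and concatenates the merged patches along channels before projecting to ``out_dim``. A sketch of the expected shapes, assuming the standard Swin patch-merging recipe (concatenate, then linearly project) and evenly divisible spatial dims:

```python
from math import prod

def merged_shape(spatial, in_dim, merge_window_size, out_dim):
    """Expected (spatial, pre-projection channels, final channels) after
    3D patch merging: each merge window of in_dim-channel patches is
    flattened to in_dim * prod(merge_window_size) channels, then projected
    to out_dim. Assumes spatial dims divide evenly."""
    assert all(s % m == 0 for s, m in zip(spatial, merge_window_size))
    new_spatial = tuple(s // m for s, m in zip(spatial, merge_window_size))
    concat_dim = in_dim * prod(merge_window_size)  # channels before projection
    return new_spatial, concat_dim, out_dim

spatial, concat_dim, out_dim = merged_shape((16, 32, 32), 96, (2, 2, 2), 192)
# spatial == (8, 16, 16); 96 * 8 = 768 channels before projecting down to 192
```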

class vision_architectures.nets.swin_3d.Swin3DPatchSplitting(config, checkpointing_level=0, **kwargs)[source]#

Bases: Module

Patch splitting layer for Swin3D. This class is designed for 3D input, e.g. medical images, videos, etc.

This is a self-implemented class and is not part of the paper.

__init__(config, checkpointing_level=0, **kwargs)[source]#

Initialize the Swin3DPatchSplitting layer.

Parameters:
  • config (Swin3DPatchSplittingConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(hidden_states, channels_first=True)[source]#

Split patches into multiple patches.

Parameters:
  • hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
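Patch splitting is the inverse of merging: each patch is expanded into a ``final_window_size`` block of patches, multiplying the spatial dims. Since this layer is the author's own addition rather than part of the Swin paper, the exact projection details are not shown here; the spatial arithmetic alone can be sketched as:

```python
def split_shape(spatial, final_window_size):
    """Expected spatial shape after 3D patch splitting: each patch becomes
    a final_window_size block of patches (i.e. spatial upsampling)."""
    return tuple(s * f for s, f in zip(spatial, final_window_size))

# Splitting with a (2, 2, 2) window undoes a (2, 2, 2) merge, spatially.
upsampled = split_shape((8, 16, 16), (2, 2, 2))   # (16, 32, 32)
```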

class vision_architectures.nets.swin_3d.Swin3DStage(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

A single stage of the Swin3D architecture. This class is designed for 3D input, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize the Swin3DStage.

Parameters:
  • config (Swin3DStageConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(hidden_states, channels_first=True, return_intermediates=False)[source]#

Merge patches if applicable (used by the encoder), perform a series of window and shifted window attention, and then split patches if applicable (used by the decoder).

Parameters:
  • hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

  • return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.

Return type:

Tensor | tuple[Tensor, list[Tensor]]

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns a list of intermediate layer outputs. Note that the intermediate layer outputs returned will always be in channels_last format.
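The stage ordering described above (optional merge, then ``depth`` shape-preserving attention blocks, then optional split) can be traced shape-wise with a small sketch. The config keys match the schema above, but the concrete values are illustrative only:

```python
def stage_flow(spatial, cfg):
    """Trace (spatial, channel) shapes through one Swin3D stage: optional
    patch merging, `depth` attention blocks (shape-preserving), then
    optional patch splitting. Illustrative, not the library's code."""
    dim = cfg["dim"]
    if cfg.get("patch_merging"):
        m = cfg["patch_merging"]
        spatial = tuple(s // w for s, w in zip(spatial, m["merge_window_size"]))
        dim = m["out_dim"]
    # `depth` Swin3DBlocks run here: attention + MLP keep both shapes fixed.
    if cfg.get("patch_splitting"):
        p = cfg["patch_splitting"]
        spatial = tuple(s * w for s, w in zip(spatial, p["final_window_size"]))
        dim = p["out_dim"]
    return spatial, dim

cfg = {
    "dim": 192,
    "depth": 2,
    "patch_merging": {"in_dim": 96, "out_dim": 192,
                      "merge_window_size": (2, 2, 2)},
}
shape, dim = stage_flow((16, 32, 32), cfg)   # ((8, 16, 16), 192)
```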

class vision_architectures.nets.swin_3d.Swin3DEncoderDecoderBase(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module, PyTorchModelHubMixin

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the Swin3DEncoder/Swin3DDecoder.

Parameters:
  • config (Swin3DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(hidden_states, channels_first=True, return_intermediates=False)[source]#

Runs the input features through the hierarchy of Swin3D stages.

Parameters:
  • hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

  • return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.

Return type:

Tensor | tuple[Tensor, list[Tensor]]

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns a list of intermediate layer outputs. Note that the intermediate layer outputs returned will always be in channels_last format.

class vision_architectures.nets.swin_3d.Swin3DEncoder(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DEncoderDecoderBase

3D Swin Transformer encoder. Assumes input has already been patchified/tokenized. This class is designed for 3D input, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the Swin3DEncoder/Swin3DDecoder.

Parameters:
  • config (Swin3DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swin_3d.Swin3DDecoder(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Swin3DEncoderDecoderBase

3D Swin Transformer decoder. Assumes input has already been patchified/tokenized. This class is designed for 3D input, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the Swin3DEncoder/Swin3DDecoder.

Parameters:
  • config (Swin3DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

class vision_architectures.nets.swin_3d.Swin3DEncoderWithPatchEmbeddings(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module, PyTorchModelHubMixin

3D Swin transformer with 3D patch embeddings. This class is designed for 3D input, e.g. medical images, videos, etc.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initializes the Swin3DEncoderWithPatchEmbeddings.

Parameters:
  • config (Swin3DEncoderWithPatchEmbeddingsConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(pixel_values, spacings=None, crop_offsets=None, channels_first=True, return_intermediates=False)[source]#

Patchify the input pixel values and then pass it through the Swin transformer.

Parameters:
  • pixel_values (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • spacings (Optional[Tensor]) – Spacing information of shape (B, 3) of the input features.

  • crop_offsets (Optional[Tensor]) – Used if the embeddings required are of a crop of a larger image. If provided, the grid coordinates will be offset accordingly.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

  • return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.

Return type:

Tensor | tuple[Tensor, list[Tensor], list[Tensor]]

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns the intermediate stage outputs and layer outputs.
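Putting it together, the encoder's spatial downsampling can be traced by hand: patch embedding divides the input size by ``patch_size``, and each stage with patch merging divides the token grid further. A sketch with hypothetical sizes (pure shape arithmetic, not a run of the actual model):

```python
def trace_shapes(image_size, patch_size, merge_windows):
    """Spatial token-grid sizes after patch embedding and after each
    patch-merging stage. Assumes every division is exact."""
    grid = tuple(i // p for i, p in zip(image_size, patch_size))
    shapes = [grid]
    for mw in merge_windows:
        grid = tuple(g // m for g, m in zip(grid, mw))
        shapes.append(grid)
    return shapes

# e.g. a 32x128x128 volume, 2x4x4 patches, two merging stages of 2x2x2
shapes = trace_shapes((32, 128, 128), (2, 4, 4), [(2, 2, 2), (2, 2, 2)])
# shapes == [(16, 32, 32), (8, 16, 16), (4, 8, 8)]
```

These are the grids on which ``window_size`` must tile, which is why window and merge sizes are usually chosen together.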