Swin3D
- pydantic model vision_architectures.nets.swin_3d.Swin3DPatchMergingConfig[source]
  Bases: CustomBaseModel
{ "title": "Swin3DPatchMergingConfig", "type": "object", "properties": { "in_dim": { "description": "Input dimension before merging", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after merging", "title": "Out Dim", "type": "integer" }, "merge_window_size": { "description": "Size of the window for merging patches", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Merge Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "merge_window_size" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: in_dim (int), out_dim (int), merge_window_size (tuple[int, int, int])
- Validators: validate_before » all fields
- field in_dim: int [Required]
  Input dimension before merging.
  Validated by: validate_before
- field out_dim: int [Required]
  Output dimension after merging.
  Validated by: validate_before
- field merge_window_size: tuple[int, int, int] [Required]
  Size of the window for merging patches.
  Validated by: validate_before
- property out_dim_ratio: float
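A minimal construction sketch for this config. The semantics of out_dim_ratio are an assumption here (out_dim / in_dim); the documentation above does not define it:

```python
from vision_architectures.nets.swin_3d import Swin3DPatchMergingConfig

# A typical downsampling step: merge 2x2x2 neighborhoods of patches while
# doubling the channel dimension.
merge_config = Swin3DPatchMergingConfig(
    in_dim=96,
    out_dim=192,
    merge_window_size=(2, 2, 2),  # (Z, Y, X) merge factor
)
print(merge_config.out_dim_ratio)  # presumably out_dim / in_dim == 2.0
```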
- pydantic model vision_architectures.nets.swin_3d.Swin3DPatchSplittingConfig[source]
  Bases: CustomBaseModel
{ "title": "Swin3DPatchSplittingConfig", "type": "object", "properties": { "in_dim": { "description": "Input dimension before splitting", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after splitting", "title": "Out Dim", "type": "integer" }, "final_window_size": { "description": "Size of the window to split patches into", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Final Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "final_window_size" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: in_dim (int), out_dim (int), final_window_size (tuple[int, int, int])
- Validators: validate_before » all fields
- field in_dim: int [Required]
  Input dimension before splitting.
  Validated by: validate_before
- field out_dim: int [Required]
  Output dimension after splitting.
  Validated by: validate_before
- field final_window_size: tuple[int, int, int] [Required]
  Size of the window to split patches into.
  Validated by: validate_before
- property in_dim_ratio: float
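Swin3DPatchSplittingConfig is the mirror image of the merging config, used for upsampling in the decoder. A minimal sketch with illustrative dimensions:

```python
from vision_architectures.nets.swin_3d import Swin3DPatchSplittingConfig

# A typical upsampling step: split each patch into a 2x2x2 block of patches
# while halving the channel dimension.
split_config = Swin3DPatchSplittingConfig(
    in_dim=192,
    out_dim=96,
    final_window_size=(2, 2, 2),  # (Z, Y, X) split factor
)
```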
- pydantic model vision_architectures.nets.swin_3d.Swin3DBlockConfig[source]
  Bases: Attention3DWithMLPConfig
{ "title": "Swin3DBlockConfig", "type": "object", "properties": { "dim": { "description": "Dim at which attention is performed", "title": "Dim", "type": "integer" }, "num_heads": { "description": "Number of query heads", "title": "Num Heads", "type": "integer" }, "ratio_q_to_kv_heads": { "default": 1, "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.", "title": "Ratio Q To Kv Heads", "type": "integer" }, "logit_scale_learnable": { "default": false, "description": "Whether the logit scale is learnable.", "title": "Logit Scale Learnable", "type": "boolean" }, "attn_drop_prob": { "default": 0.0, "description": "Dropout probability for attention weights.", "title": "Attn Drop Prob", "type": "number" }, "proj_drop_prob": { "default": 0.0, "description": "Dropout probability for the projection layer.", "title": "Proj Drop Prob", "type": "number" }, "max_attention_batch_size": { "default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).", "title": "Max Attention Batch Size", "type": "integer" }, "rotary_position_embeddings_config": { "anyOf": [ { "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig" }, { "type": "null" } ], "default": null, "description": "Config for rotary position embeddings" }, "mlp_ratio": { "default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer" }, "activation": { "default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string" }, "mlp_drop_prob": { "default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number" }, "norm_location": { "default": "post", "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": [ "pre", "post" ], "title": "Norm Location", "type": "string" }, "layer_norm_eps": { "default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number" }, "window_size": { "description": "Size of the window to apply attention over", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Window Size", "type": "array" }, "use_relative_position_bias": { "default": false, "description": "Whether to use relative position bias", "title": "Use Relative Position Bias", "type": "boolean" }, "patch_merging": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchMergingConfig" }, { "type": "null" } ], "default": null, "description": "Patch merging config if desired. Patch merging is applied before attention." }, "patch_splitting": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchSplittingConfig" }, { "type": "null" } ], "default": null, "description": "Patch splitting config if desired. Patch splitting is applied after attention." }, "in_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.", "title": "In Dim" }, "out_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Output dimension of the stage. 
Useful if ``patch_splitting`` is used.", "title": "Out Dim" } }, "$defs": { "RotaryPositionEmbeddings3DConfig": { "properties": { "dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Dimension of the position embeddings", "title": "Dim" }, "base": { "default": 10000.0, "description": "Base value for the exponent.", "title": "Base", "type": "number" }, "split": { "anyOf": [ { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "number" }, { "type": "number" }, { "type": "number" } ], "type": "array" }, { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "type": "array" } ], "default": [ 0.3333333333333333, 0.3333333333333333, 0.3333333333333333 ], "description": "Split of the position embeddings. If float, converted to int based on self.dim", "title": "Split" } }, "title": "RotaryPositionEmbeddings3DConfig", "type": "object" }, "Swin3DPatchMergingConfig": { "properties": { "in_dim": { "description": "Input dimension before merging", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after merging", "title": "Out Dim", "type": "integer" }, "merge_window_size": { "description": "Size of the window for merging patches", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Merge Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "merge_window_size" ], "title": "Swin3DPatchMergingConfig", "type": "object" }, "Swin3DPatchSplittingConfig": { "properties": { "in_dim": { "description": "Input dimension before splitting", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after splitting", "title": "Out Dim", "type": "integer" }, "final_window_size": { "description": "Size of the window to split patches into", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Final Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "final_window_size" ], "title": "Swin3DPatchSplittingConfig", "type": "object" } }, "required": [ "dim", "num_heads", "window_size" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: window_size (tuple[int, int, int]), use_relative_position_bias (bool), patch_merging (Swin3DPatchMergingConfig | None), patch_splitting (Swin3DPatchSplittingConfig | None), in_dim (int | None), dim (int), out_dim (int | None)
- Validators: validate » all fields
- field window_size: tuple[int, int, int] [Required]
  Size of the window to apply attention over.
  Validated by: validate
- field use_relative_position_bias: bool = False
  Whether to use relative position bias.
  Validated by: validate
- field patch_merging: Swin3DPatchMergingConfig | None = None
  Patch merging config if desired. Patch merging is applied before attention.
  Validated by: validate
- field patch_splitting: Swin3DPatchSplittingConfig | None = None
  Patch splitting config if desired. Patch splitting is applied after attention.
  Validated by: validate
- field in_dim: int | None = None
  Input dimension of the stage. Useful if patch_merging is used.
  Validated by: validate
- field dim: int [Required]
  Dim at which attention is performed.
  Validated by: validate
- field out_dim: int | None = None
  Output dimension of the stage. Useful if patch_splitting is used.
  Validated by: validate
- property spatial_compression_ratio
- property out_dim_ratio: float
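Since Swin3DBlockConfig inherits from Attention3DWithMLPConfig, the attention and MLP fields (num_heads, mlp_ratio, and so on) are set on the same object. A minimal sketch; only dim, num_heads, and window_size are required:

```python
from vision_architectures.nets.swin_3d import Swin3DBlockConfig

# One Swin block: 96-dim tokens, 3 query heads, attention within 4x4x4 windows.
block_config = Swin3DBlockConfig(
    dim=96,
    num_heads=3,                      # inherited from Attention3DWithMLPConfig
    window_size=(4, 4, 4),            # (Z, Y, X) attention window
    use_relative_position_bias=True,
)
```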
- pydantic model vision_architectures.nets.swin_3d.Swin3DStageConfig[source]
  Bases: Swin3DBlockConfig
{ "title": "Swin3DStageConfig", "type": "object", "properties": { "dim": { "description": "Dim at which attention is performed", "title": "Dim", "type": "integer" }, "num_heads": { "description": "Number of query heads", "title": "Num Heads", "type": "integer" }, "ratio_q_to_kv_heads": { "default": 1, "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.", "title": "Ratio Q To Kv Heads", "type": "integer" }, "logit_scale_learnable": { "default": false, "description": "Whether the logit scale is learnable.", "title": "Logit Scale Learnable", "type": "boolean" }, "attn_drop_prob": { "default": 0.0, "description": "Dropout probability for attention weights.", "title": "Attn Drop Prob", "type": "number" }, "proj_drop_prob": { "default": 0.0, "description": "Dropout probability for the projection layer.", "title": "Proj Drop Prob", "type": "number" }, "max_attention_batch_size": { "default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).", "title": "Max Attention Batch Size", "type": "integer" }, "rotary_position_embeddings_config": { "anyOf": [ { "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig" }, { "type": "null" } ], "default": null, "description": "Config for rotary position embeddings" }, "mlp_ratio": { "default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer" }, "activation": { "default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string" }, "mlp_drop_prob": { "default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number" }, "norm_location": { "default": "post", "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": [ "pre", "post" ], "title": "Norm Location", "type": "string" }, "layer_norm_eps": { "default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number" }, "window_size": { "description": "Size of the window to apply attention over", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Window Size", "type": "array" }, "use_relative_position_bias": { "default": false, "description": "Whether to use relative position bias", "title": "Use Relative Position Bias", "type": "boolean" }, "patch_merging": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchMergingConfig" }, { "type": "null" } ], "default": null, "description": "Patch merging config if desired. Patch merging is applied before attention." }, "patch_splitting": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchSplittingConfig" }, { "type": "null" } ], "default": null, "description": "Patch splitting config if desired. Patch splitting is applied after attention." }, "in_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.", "title": "In Dim" }, "out_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Output dimension of the stage. 
Useful if ``patch_splitting`` is used.", "title": "Out Dim" }, "depth": { "description": "Number of transformer blocks in this stage", "title": "Depth", "type": "integer" } }, "$defs": { "RotaryPositionEmbeddings3DConfig": { "properties": { "dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Dimension of the position embeddings", "title": "Dim" }, "base": { "default": 10000.0, "description": "Base value for the exponent.", "title": "Base", "type": "number" }, "split": { "anyOf": [ { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "number" }, { "type": "number" }, { "type": "number" } ], "type": "array" }, { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "type": "array" } ], "default": [ 0.3333333333333333, 0.3333333333333333, 0.3333333333333333 ], "description": "Split of the position embeddings. If float, converted to int based on self.dim", "title": "Split" } }, "title": "RotaryPositionEmbeddings3DConfig", "type": "object" }, "Swin3DPatchMergingConfig": { "properties": { "in_dim": { "description": "Input dimension before merging", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after merging", "title": "Out Dim", "type": "integer" }, "merge_window_size": { "description": "Size of the window for merging patches", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Merge Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "merge_window_size" ], "title": "Swin3DPatchMergingConfig", "type": "object" }, "Swin3DPatchSplittingConfig": { "properties": { "in_dim": { "description": "Input dimension before splitting", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after splitting", "title": "Out Dim", "type": "integer" }, "final_window_size": { "description": "Size of the window to split patches into", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Final Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "final_window_size" ], "title": "Swin3DPatchSplittingConfig", "type": "object" } }, "required": [ "dim", "num_heads", "window_size", "depth" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: depth (int)
- field depth: int [Required]
  Number of transformer blocks in this stage.
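A stage config is a block config plus depth. A brief sketch (all values illustrative):

```python
from vision_architectures.nets.swin_3d import Swin3DStageConfig

# A stage of two identical Swin blocks with no resolution change.
stage_config = Swin3DStageConfig(
    dim=96,
    num_heads=3,
    window_size=(4, 4, 4),
    depth=2,  # number of Swin3DBlocks stacked in this stage
)
```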
- pydantic model vision_architectures.nets.swin_3d.Swin3DEncoderDecoderConfig[source]
  Bases: CustomBaseModel
{ "title": "Swin3DEncoderDecoderConfig", "type": "object", "properties": { "stages": { "items": { "$ref": "#/$defs/Swin3DStageConfig" }, "title": "Stages", "type": "array" } }, "$defs": { "RotaryPositionEmbeddings3DConfig": { "properties": { "dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Dimension of the position embeddings", "title": "Dim" }, "base": { "default": 10000.0, "description": "Base value for the exponent.", "title": "Base", "type": "number" }, "split": { "anyOf": [ { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "number" }, { "type": "number" }, { "type": "number" } ], "type": "array" }, { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "type": "array" } ], "default": [ 0.3333333333333333, 0.3333333333333333, 0.3333333333333333 ], "description": "Split of the position embeddings. If float, converted to int based on self.dim", "title": "Split" } }, "title": "RotaryPositionEmbeddings3DConfig", "type": "object" }, "Swin3DPatchMergingConfig": { "properties": { "in_dim": { "description": "Input dimension before merging", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after merging", "title": "Out Dim", "type": "integer" }, "merge_window_size": { "description": "Size of the window for merging patches", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Merge Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "merge_window_size" ], "title": "Swin3DPatchMergingConfig", "type": "object" }, "Swin3DPatchSplittingConfig": { "properties": { "in_dim": { "description": "Input dimension before splitting", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after splitting", "title": "Out Dim", "type": "integer" }, "final_window_size": { "description": "Size of the window to split patches into", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Final Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "final_window_size" ], "title": "Swin3DPatchSplittingConfig", "type": "object" }, "Swin3DStageConfig": { "properties": { "dim": { "description": "Dim at which attention is performed", "title": "Dim", "type": "integer" }, "num_heads": { "description": "Number of query heads", "title": "Num Heads", "type": "integer" }, "ratio_q_to_kv_heads": { "default": 1, "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.", "title": "Ratio Q To Kv Heads", "type": "integer" }, "logit_scale_learnable": { "default": false, "description": "Whether the logit scale is learnable.", "title": "Logit Scale Learnable", "type": "boolean" }, "attn_drop_prob": { "default": 0.0, "description": "Dropout probability for attention weights.", "title": "Attn Drop Prob", "type": "number" }, "proj_drop_prob": { "default": 0.0, "description": "Dropout probability for the projection layer.", "title": "Proj Drop Prob", "type": "number" }, "max_attention_batch_size": { "default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. 
(This happens along batch dimension).", "title": "Max Attention Batch Size", "type": "integer" }, "rotary_position_embeddings_config": { "anyOf": [ { "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig" }, { "type": "null" } ], "default": null, "description": "Config for rotary position embeddings" }, "mlp_ratio": { "default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer" }, "activation": { "default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string" }, "mlp_drop_prob": { "default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number" }, "norm_location": { "default": "post", "description": "Location of the normalization layer in the attention block. Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": [ "pre", "post" ], "title": "Norm Location", "type": "string" }, "layer_norm_eps": { "default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number" }, "window_size": { "description": "Size of the window to apply attention over", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Window Size", "type": "array" }, "use_relative_position_bias": { "default": false, "description": "Whether to use relative position bias", "title": "Use Relative Position Bias", "type": "boolean" }, "patch_merging": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchMergingConfig" }, { "type": "null" } ], "default": null, "description": "Patch merging config if desired. Patch merging is applied before attention." }, "patch_splitting": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchSplittingConfig" }, { "type": "null" } ], "default": null, "description": "Patch splitting config if desired. Patch splitting is applied after attention." }, "in_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.", "title": "In Dim" }, "out_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.", "title": "Out Dim" }, "depth": { "description": "Number of transformer blocks in this stage", "title": "Depth", "type": "integer" } }, "required": [ "dim", "num_heads", "window_size", "depth" ], "title": "Swin3DStageConfig", "type": "object" } }, "required": [ "stages" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: stages (list[Swin3DStageConfig])
- Validators: validate » all fields
- field stages: list[Swin3DStageConfig] [Required]
  Validated by: validate
- pydantic model vision_architectures.nets.swin_3d.Swin3DEncoderWithPatchEmbeddingsConfig[source]
  Bases: Swin3DEncoderDecoderConfig
{ "title": "Swin3DEncoderWithPatchEmbeddingsConfig", "type": "object", "properties": { "stages": { "items": { "$ref": "#/$defs/Swin3DStageConfig" }, "title": "Stages", "type": "array" }, "in_channels": { "description": "Number of input channels in the input image/video", "title": "In Channels", "type": "integer" }, "patch_size": { "description": "Size of the patches to be extracted from the input image/video", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Patch Size", "type": "array" }, "image_size": { "anyOf": [ { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "type": "array" }, { "type": "null" } ], "default": null, "description": "Size of the input image/video. Required if absolute position embeddings are learnable.", "title": "Image Size" }, "use_absolute_position_embeddings": { "default": true, "description": "Whether to use absolute position embeddings.", "title": "Use Absolute Position Embeddings", "type": "boolean" }, "learnable_absolute_position_embeddings": { "default": false, "description": "Whether to use learnable absolute position embeddings.", "title": "Learnable Absolute Position Embeddings", "type": "boolean" } }, "$defs": { "RotaryPositionEmbeddings3DConfig": { "properties": { "dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Dimension of the position embeddings", "title": "Dim" }, "base": { "default": 10000.0, "description": "Base value for the exponent.", "title": "Base", "type": "number" }, "split": { "anyOf": [ { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "number" }, { "type": "number" }, { "type": "number" } ], "type": "array" }, { "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "type": "array" } ], "default": [ 0.3333333333333333, 0.3333333333333333, 0.3333333333333333 ], "description": "Split of the position embeddings. 
If float, converted to int based on self.dim", "title": "Split" } }, "title": "RotaryPositionEmbeddings3DConfig", "type": "object" }, "Swin3DPatchMergingConfig": { "properties": { "in_dim": { "description": "Input dimension before merging", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after merging", "title": "Out Dim", "type": "integer" }, "merge_window_size": { "description": "Size of the window for merging patches", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Merge Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "merge_window_size" ], "title": "Swin3DPatchMergingConfig", "type": "object" }, "Swin3DPatchSplittingConfig": { "properties": { "in_dim": { "description": "Input dimension before splitting", "title": "In Dim", "type": "integer" }, "out_dim": { "description": "Output dimension after splitting", "title": "Out Dim", "type": "integer" }, "final_window_size": { "description": "Size of the window to split patches into", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Final Window Size", "type": "array" } }, "required": [ "in_dim", "out_dim", "final_window_size" ], "title": "Swin3DPatchSplittingConfig", "type": "object" }, "Swin3DStageConfig": { "properties": { "dim": { "description": "Dim at which attention is performed", "title": "Dim", "type": "integer" }, "num_heads": { "description": "Number of query heads", "title": "Num Heads", "type": "integer" }, "ratio_q_to_kv_heads": { "default": 1, "description": "Ratio of query heads to key/value heads. Useful for MQA/GQA.", "title": "Ratio Q To Kv Heads", "type": "integer" }, "logit_scale_learnable": { "default": false, "description": "Whether the logit scale is learnable.", "title": "Logit Scale Learnable", "type": "boolean" }, "attn_drop_prob": { "default": 0.0, "description": "Dropout probability for attention weights.", "title": "Attn Drop Prob", "type": "number" }, "proj_drop_prob": { "default": 0.0, "description": "Dropout probability for the projection layer.", "title": "Proj Drop Prob", "type": "number" }, "max_attention_batch_size": { "default": -1, "description": "Runs attention by splitting the inputs into chunks of this size. 0 means no chunking. Useful for large inputs during inference. (This happens along batch dimension).", "title": "Max Attention Batch Size", "type": "integer" }, "rotary_position_embeddings_config": { "anyOf": [ { "$ref": "#/$defs/RotaryPositionEmbeddings3DConfig" }, { "type": "null" } ], "default": null, "description": "Config for rotary position embeddings" }, "mlp_ratio": { "default": 4, "description": "Ratio of the hidden dimension in the MLP to the input dimension.", "title": "Mlp Ratio", "type": "integer" }, "activation": { "default": "gelu", "description": "Activation function for the MLP.", "title": "Activation", "type": "string" }, "mlp_drop_prob": { "default": 0.0, "description": "Dropout probability for the MLP.", "title": "Mlp Drop Prob", "type": "number" }, "norm_location": { "default": "post", "description": "Location of the normalization layer in the attention block. 
Pre-normalization implies normalization before the attention operation, while post-normalization applies it after.", "enum": [ "pre", "post" ], "title": "Norm Location", "type": "string" }, "layer_norm_eps": { "default": 1e-06, "description": "Epsilon value for the layer normalization.", "title": "Layer Norm Eps", "type": "number" }, "window_size": { "description": "Size of the window to apply attention over", "maxItems": 3, "minItems": 3, "prefixItems": [ { "type": "integer" }, { "type": "integer" }, { "type": "integer" } ], "title": "Window Size", "type": "array" }, "use_relative_position_bias": { "default": false, "description": "Whether to use relative position bias", "title": "Use Relative Position Bias", "type": "boolean" }, "patch_merging": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchMergingConfig" }, { "type": "null" } ], "default": null, "description": "Patch merging config if desired. Patch merging is applied before attention." }, "patch_splitting": { "anyOf": [ { "$ref": "#/$defs/Swin3DPatchSplittingConfig" }, { "type": "null" } ], "default": null, "description": "Patch splitting config if desired. Patch splitting is applied after attention." }, "in_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Input dimension of the stage. Useful if ``patch_merging`` is used.", "title": "In Dim" }, "out_dim": { "anyOf": [ { "type": "integer" }, { "type": "null" } ], "default": null, "description": "Output dimension of the stage. Useful if ``patch_splitting`` is used.", "title": "Out Dim" }, "depth": { "description": "Number of transformer blocks in this stage", "title": "Depth", "type": "integer" } }, "required": [ "dim", "num_heads", "window_size", "depth" ], "title": "Swin3DStageConfig", "type": "object" } }, "required": [ "stages", "in_channels", "patch_size" ] }
- Config:
arbitrary_types_allowed: bool = True
extra: str = ignore
validate_default: bool = True
validate_assignment: bool = True
validate_return: bool = True
- Fields: in_channels (int), patch_size (tuple[int, int, int]), image_size (tuple[int, int, int] | None), use_absolute_position_embeddings (bool), learnable_absolute_position_embeddings (bool)
- Validators: validate » all fields
- field in_channels: int [Required]
  Number of input channels in the input image/video.
  Validated by: validate
- field patch_size: tuple[int, int, int] [Required]
  Size of the patches to be extracted from the input image/video.
  Validated by: validate
- field image_size: tuple[int, int, int] | None = None
  Size of the input image/video. Required if absolute position embeddings are learnable.
  Validated by: validate
- field use_absolute_position_embeddings: bool = True
  Whether to use absolute position embeddings.
  Validated by: validate
- field learnable_absolute_position_embeddings: bool = False
  Whether to use learnable absolute position embeddings.
  Validated by: validate
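Putting the configs together, a sketch of a two-stage encoder config over single-channel volumes (all dimensions are illustrative; nested configs may be given as dictionaries and pydantic coerces them into the sub-models):

```python
from vision_architectures.nets.swin_3d import Swin3DEncoderWithPatchEmbeddingsConfig

config = Swin3DEncoderWithPatchEmbeddingsConfig(
    in_channels=1,         # e.g. a single-channel CT volume
    patch_size=(4, 4, 4),  # (Z, Y, X) patch size
    stages=[
        {"dim": 48, "num_heads": 3, "window_size": (4, 4, 4), "depth": 2},
        {
            "in_dim": 48,
            "dim": 96,
            "num_heads": 6,
            "window_size": (4, 4, 4),
            "depth": 2,
            "patch_merging": {"in_dim": 48, "out_dim": 96, "merge_window_size": (2, 2, 2)},
        },
    ],
)
```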
- class vision_architectures.nets.swin_3d.Swin3DLayer(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Module
  Swin 3D layer applying windowed attention with optional relative position embeddings. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initializes the Swin3DLayer.
    - Parameters:
      - config (RelativePositionEmbeddings3DConfig | Attention3DWithMLPConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
- class vision_architectures.nets.swin_3d.Swin3DBlock(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Module
  Swin 3D block consisting of two Swin3DLayers: one with regular windows and one with shifted windows. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initializes the Swin3DBlock.
    - Parameters:
      - config (Swin3DBlockConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
  - forward(hidden_states, channels_first=True, return_intermediates=False)[source]
    Apply window attention and shifted window attention on the input features.
    - Parameters:
      - hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
      - channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
      - return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.
    - Return type: Tensor | tuple[Tensor, list[Tensor]]
    - Returns: Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns a list of intermediate layer outputs. Note that the intermediate layer outputs are always in channels_last format.
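A usage sketch with illustrative shapes (spatial dimensions should be compatible with the window size):

```python
import torch
from vision_architectures.nets.swin_3d import Swin3DBlock

block = Swin3DBlock(config={"dim": 96, "num_heads": 3, "window_size": (4, 4, 4)})

x = torch.randn(1, 96, 8, 8, 8)    # (B, C, Z, Y, X)
y = block(x, channels_first=True)  # output shape matches input: (1, 96, 8, 8, 8)
```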
- class vision_architectures.nets.swin_3d.Swin3DPatchMerging(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Module
  Patch merging layer for Swin3D. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initialize the Swin3DPatchMerging layer.
    - Parameters:
      - config (Swin3DPatchMergingConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
  - forward(hidden_states, channels_first=True)[source]
    Merge multiple patches into a single patch.
    - Parameters:
      - hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
      - channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
    - Return type: Tensor
    - Returns: Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
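A shape sketch, assuming merging reduces each spatial axis by the merge window factor and projects channels to out_dim (the expected output shape below follows from that assumption):

```python
import torch
from vision_architectures.nets.swin_3d import Swin3DPatchMerging

merge = Swin3DPatchMerging(
    config={"in_dim": 96, "out_dim": 192, "merge_window_size": (2, 2, 2)}
)

x = torch.randn(1, 96, 8, 8, 8)  # (B, C, Z, Y, X)
y = merge(x)                     # expected shape: (1, 192, 4, 4, 4)
```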
- class vision_architectures.nets.swin_3d.Swin3DPatchSplitting(config, checkpointing_level=0, **kwargs)[source]
  Bases: Module
  Patch splitting layer for Swin3D. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  This is a self-implemented class and is not part of the original paper.
  - __init__(config, checkpointing_level=0, **kwargs)[source]
    Initialize the Swin3DPatchSplitting layer.
    - Parameters:
      - config (Swin3DPatchSplittingConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
  - forward(hidden_states, channels_first=True)[source]
    Split each patch into multiple patches.
    - Parameters:
      - hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
      - channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
    - Return type: Tensor
    - Returns: Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
- class vision_architectures.nets.swin_3d.Swin3DStage(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Module
  A single stage of a Swin3D model. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initialize the Swin3DStage.
    - Parameters:
      - config (Swin3DStageConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
  - forward(hidden_states, channels_first=True, return_intermediates=False)[source]
    Merge patches if applicable (used by the encoder), perform a series of window and shifted-window attention operations, then split patches if applicable (used by the decoder).
    - Parameters:
      - hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
      - channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
      - return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.
    - Return type: Tensor
    - Returns: Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns a list of intermediate layer outputs. Note that the intermediate layer outputs are always in channels_last format.
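A sketch of an encoder-style stage that merges patches before attending (dimensions illustrative; the expected output shape assumes merging halves each spatial axis):

```python
import torch
from vision_architectures.nets.swin_3d import Swin3DStage

stage = Swin3DStage(config={
    "in_dim": 96,
    "dim": 192,
    "num_heads": 6,
    "window_size": (4, 4, 4),
    "depth": 2,
    "patch_merging": {"in_dim": 96, "out_dim": 192, "merge_window_size": (2, 2, 2)},
})

x = torch.randn(1, 96, 8, 8, 8)  # (B, C, Z, Y, X)
y = stage(x)                     # expected shape: (1, 192, 4, 4, 4)
```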
- class vision_architectures.nets.swin_3d.Swin3DEncoderDecoderBase(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Module, PyTorchModelHubMixin
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initializes the Swin3DEncoder/Swin3DDecoder.
    - Parameters:
      - config (Swin3DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
  - forward(hidden_states, channels_first=True, return_intermediates=False)[source]
    Encodes the input features using the Swin Transformer hierarchy.
    - Parameters:
      - hidden_states (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
      - channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
      - return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.
    - Return type: Tensor
    - Returns: Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns a list of intermediate layer outputs. Note that the intermediate layer outputs are always in channels_last format.
- class vision_architectures.nets.swin_3d.Swin3DEncoder(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Swin3DEncoderDecoderBase
  3D Swin Transformer encoder. Assumes the input has already been patchified/tokenized. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initializes the Swin3DEncoder/Swin3DDecoder.
    - Parameters:
      - config (Swin3DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
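A usage sketch for the encoder, which expects already-patchified features (all dimensions illustrative; the second stage's expected downsampling follows from its patch-merging config):

```python
import torch
from vision_architectures.nets.swin_3d import Swin3DEncoder

# Two stages; the second merges patches 2x along every spatial axis.
encoder = Swin3DEncoder(config={"stages": [
    {"dim": 48, "num_heads": 3, "window_size": (4, 4, 4), "depth": 2},
    {
        "in_dim": 48,
        "dim": 96,
        "num_heads": 6,
        "window_size": (4, 4, 4),
        "depth": 2,
        "patch_merging": {"in_dim": 48, "out_dim": 96, "merge_window_size": (2, 2, 2)},
    },
]})

x = torch.randn(1, 48, 16, 16, 16)  # already patchified: (B, C, Z, Y, X)
y = encoder(x)                      # expected shape: (1, 96, 8, 8, 8)
```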
- class vision_architectures.nets.swin_3d.Swin3DDecoder(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Swin3DEncoderDecoderBase
  3D Swin Transformer decoder. Assumes the input has already been patchified/tokenized. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initializes the Swin3DEncoder/Swin3DDecoder.
    - Parameters:
      - config (Swin3DEncoderDecoderConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
- class vision_architectures.nets.swin_3d.Swin3DEncoderWithPatchEmbeddings(config={}, checkpointing_level=0, **kwargs)[source]
  Bases: Module, PyTorchModelHubMixin
  3D Swin Transformer with 3D patch embeddings. This class is designed for 3D inputs, e.g. medical images, videos, etc.
  - __init__(config={}, checkpointing_level=0, **kwargs)[source]
    Initializes the Swin3DEncoderWithPatchEmbeddings.
    - Parameters:
      - config (Swin3DEncoderWithPatchEmbeddingsConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.
      - checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.
      - **kwargs – Additional keyword arguments for configuration.
  - forward(pixel_values, spacings=None, crop_offsets=None, channels_first=True, return_intermediates=False)[source]
    Patchify the input pixel values and then pass them through the Swin transformer.
    - Parameters:
      - pixel_values (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.
      - spacings (Optional[Tensor]) – Spacing information of shape (B, 3) for the input features.
      - crop_offsets (Optional[Tensor]) – Used if the embeddings required are of a crop of a larger image. If provided, the grid coordinates will be offset accordingly.
      - channels_first (bool) – Whether the inputs are in channels-first format (B, C, …) or not (B, …, C).
      - return_intermediates (bool) – Return intermediate outputs such as layer/block/stage outputs.
    - Return type: Tensor | tuple[Tensor, list[Tensor], list[Tensor]]
    - Returns: Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features. If return_intermediates is True, also returns the intermediate stage outputs and layer outputs.
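Finally, an end-to-end sketch that patchifies a raw volume and encodes it. Shapes are illustrative, and the expected output assumes patchification divides each spatial axis by the patch size; with the default non-learnable absolute position embeddings, image_size does not need to be set:

```python
import torch
from vision_architectures.nets.swin_3d import Swin3DEncoderWithPatchEmbeddings

model = Swin3DEncoderWithPatchEmbeddings(config={
    "in_channels": 1,
    "patch_size": (4, 4, 4),
    "stages": [{"dim": 48, "num_heads": 3, "window_size": (4, 4, 4), "depth": 2}],
})

volume = torch.randn(1, 1, 64, 64, 64)  # (B, C, Z, Y, X) raw volume
features = model(volume)                # expected shape: (1, 48, 16, 16, 16)
```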