Embeddings#

vision_architectures.layers.embeddings.get_absolute_position_embeddings_3d(dim, grid_size, spacing=(1.0, 1.0, 1.0), crop_offset=None, channels_first=True)[source]#

Get 3D sinusoidal position embeddings.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 6.

  • grid_size (tuple[int, int, int]) – Size of the patch grid (d, h, w).

  • spacing (tuple[float, float, float]) – Spacing between patches in each dimension. Useful for medical images.

  • crop_offset (Optional[tuple[int, int, int]]) – Used when the embeddings are required for a crop of a larger image. If provided, the grid coordinates will be offset accordingly.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
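
A minimal usage sketch against the documented signature; the grid and spacing values are illustrative, and a batch size of 1 in the returned tensor is an assumption:

from vision_architectures.layers.embeddings import get_absolute_position_embeddings_3d

# dim must be divisible by 6; grid_size is the patch grid (d, h, w).
pos = get_absolute_position_embeddings_3d(
    dim=96,
    grid_size=(4, 8, 8),
    spacing=(2.0, 1.0, 1.0),  # anisotropic spacing, as in many medical volumes
    channels_first=True,
)
print(pos.shape)  # expected: (1, 96, 4, 8, 8) in the channels-first layout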

vision_architectures.layers.embeddings.get_timestep_embeddings_1d(dim, indices)[source]#

Get 1D sinusoidal position embeddings for specific indices.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 2.

  • indices (Tensor) – Indices for which to get the embeddings. Shape: (length,).

Return type:

Tensor

Returns:

A tensor of shape (1, length, dim) containing the position embeddings.

vision_architectures.layers.embeddings.get_all_timestep_embeddings_1d(dim, length, device=device(type='cpu'))[source]#

Get 1D sinusoidal position embeddings.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 2.

  • length (int) – Length of the sequence.

  • device – Device to create the embeddings on.

Return type:

Tensor

Returns:

A tensor of shape (1, length, dim) containing the position embeddings.

vision_architectures.layers.embeddings.get_absolute_position_embeddings_1d(dim, length, device=device(type='cpu'))[source]#

Get 1D sinusoidal position embeddings.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 2.

  • length (int) – Length of the sequence.

  • device – Device to create the embeddings on.

Return type:

Tensor

Returns:

A tensor of shape (1, length, dim) containing the position embeddings.
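
All three 1D helpers above share the classic transformer sinusoidal construction. A self-contained sketch of that construction follows; the exact frequency layout and sin/cos interleaving are assumptions and may differ from the package's implementation:

import math

import torch


def sinusoidal_embeddings_1d_sketch(dim: int, length: int) -> torch.Tensor:
    # One frequency per channel pair; even channels get sin, odd channels get cos.
    position = torch.arange(length, dtype=torch.float32).unsqueeze(1)  # (length, 1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=torch.float32) * (-math.log(10000.0) / dim)
    )
    embeddings = torch.zeros(length, dim)
    embeddings[:, 0::2] = torch.sin(position * div_term)
    embeddings[:, 1::2] = torch.cos(position * div_term)
    return embeddings.unsqueeze(0)  # (1, length, dim), matching the documented shape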

pydantic model vision_architectures.layers.embeddings.RelativePositionEmbeddings3DConfig[source]#

Bases: CustomBaseModel

JSON schema:
{
   "title": "RelativePositionEmbeddings3DConfig",
   "type": "object",
   "properties": {
      "num_heads": {
         "description": "Number of query attention heads",
         "title": "Num Heads",
         "type": "integer"
      },
      "grid_size": {
         "description": "Size of entire patch matrix.",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Grid Size",
         "type": "array"
      }
   },
   "required": [
      "num_heads",
      "grid_size"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field num_heads: int [Required]#

Number of query attention heads

field grid_size: tuple[int, int, int] [Required]#

Size of entire patch matrix.

property num_patches: int#

Number of patches.

validator validate_before  »  all fields[source]#

Base class method for validating data before creating the model.

validator validate  »  all fields[source]#

Base method for validating the model after creation.

pydantic model vision_architectures.layers.embeddings.AbsolutePositionEmbeddings3DConfig[source]#

Bases: CustomBaseModel

JSON schema:
{
   "title": "AbsolutePositionEmbeddings3DConfig",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Dimension of the position embeddings",
         "title": "Dim"
      },
      "grid_size": {
         "anyOf": [
            {
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Size of entire patch matrix.",
         "title": "Grid Size"
      },
      "learnable": {
         "default": false,
         "description": "Whether the position embeddings are learnable.",
         "title": "Learnable",
         "type": "boolean"
      }
   }
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field dim: int | None = None#

Dimension of the position embeddings

field grid_size: tuple[int, int, int] | None = None#

Size of entire patch matrix.

field learnable: bool = False#

Whether the position embeddings are learnable.

property num_patches: int#

Number of patches.

validator validate_before  »  all fields[source]#

Base class method for validating data before creating the model.

validator validate  »  all fields[source]#

Base method for validating the model after creation.

pydantic model vision_architectures.layers.embeddings.AbsolutePositionEmbeddings1DConfig[source]#

Bases: CustomBaseModel

JSON schema:
{
   "title": "AbsolutePositionEmbeddings1DConfig",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Dimension of the position embeddings",
         "title": "Dim"
      },
      "length": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Length of the sequence.",
         "title": "Length"
      },
      "learnable": {
         "default": false,
         "description": "Whether the position embeddings are learnable.",
         "title": "Learnable",
         "type": "boolean"
      }
   }
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field dim: int | None = None#

Dimension of the position embeddings

field length: int | None = None#

Length of the sequence.

field learnable: bool = False#

Whether the position embeddings are learnable.

validator validate  »  all fields[source]#

Base method for validating the model after creation.

pydantic model vision_architectures.layers.embeddings.RotaryPositionEmbeddings1DConfig[source]#

Bases: CustomBaseModel

JSON schema:
{
   "title": "RotaryPositionEmbeddings1DConfig",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Dimension of the position embeddings",
         "title": "Dim"
      },
      "base": {
         "default": 10000.0,
         "description": "Base value for the exponent.",
         "title": "Base",
         "type": "number"
      }
   }
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field dim: int | None = None#

Dimension of the position embeddings

field base: float = 10000.0#

Base value for the exponent.

pydantic model vision_architectures.layers.embeddings.RotaryPositionEmbeddings3DConfig[source]#

Bases: RotaryPositionEmbeddings1DConfig

JSON schema:
{
   "title": "RotaryPositionEmbeddings3DConfig",
   "type": "object",
   "properties": {
      "dim": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Dimension of the position embeddings",
         "title": "Dim"
      },
      "base": {
         "default": 10000.0,
         "description": "Base value for the exponent.",
         "title": "Base",
         "type": "number"
      },
      "split": {
         "anyOf": [
            {
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "number"
                  },
                  {
                     "type": "number"
                  },
                  {
                     "type": "number"
                  }
               ],
               "type": "array"
            },
            {
               "maxItems": 3,
               "minItems": 3,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            }
         ],
         "default": [
            0.3333333333333333,
            0.3333333333333333,
            0.3333333333333333
         ],
         "description": "Split of the position embeddings. If float, converted to int based on self.dim",
         "title": "Split"
      }
   }
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field split: tuple[float, float, float] | tuple[int, int, int] = (0.3333333333333333, 0.3333333333333333, 0.3333333333333333)#

Split of the position embeddings across the three axes. If given as floats, they are converted to ints based on self.dim.

get_split_as_ints(dim)[source]#
validator validate  »  all fields[source]#

Base method for validating the model after creation.
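
A hypothetical example of how a fractional split could resolve; the exact rounding rule of get_split_as_ints is not documented here, so the values below are an assumption based on the field description:

from vision_architectures.layers.embeddings import RotaryPositionEmbeddings3DConfig

config = RotaryPositionEmbeddings3DConfig(dim=96, split=(1 / 3, 1 / 3, 1 / 3))
# With dim=96 and equal fractional splits, each of the three axes would be
# expected to receive 32 channels, i.e. config.get_split_as_ints(96) -> (32, 32, 32).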

pydantic model vision_architectures.layers.embeddings.PatchEmbeddings3DConfig[source]#

Bases: CNNBlockConfig

JSON schema:
{
   "title": "PatchEmbeddings3DConfig",
   "type": "object",
   "properties": {
      "in_channels": {
         "description": "Number of input channels.",
         "title": "In Channels",
         "type": "integer"
      },
      "out_channels": {
         "default": null,
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": null,
         "title": "Kernel Size",
         "type": "null"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
         "title": "Padding"
      },
      "stride": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": 1,
         "description": "Stride for the convolution",
         "title": "Stride"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the convolution layer",
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "batchnorm3d",
         "description": "Normalization layer type.",
         "title": "Normalization"
      },
      "normalization_pre_args": {
         "default": [],
         "description": "Arguments for the normalization layer before providing the dimension. Useful when using GroupNorm layers are being used to specify the number of groups.",
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "description": "Arguments for the normalization layer after providing the dimension.",
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the normalization layer",
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "relu",
         "description": "Activation function type.",
         "title": "Activation"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the activation function.",
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "description": "Sequence of operations in the block.",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "description": "Dropout probability.",
         "title": "Drop Prob",
         "type": "number"
      },
      "patch_size": {
         "description": "Size of the patches to extract from the input.",
         "maxItems": 3,
         "minItems": 3,
         "prefixItems": [
            {
               "type": "integer"
            },
            {
               "type": "integer"
            },
            {
               "type": "integer"
            }
         ],
         "title": "Patch Size",
         "type": "array"
      },
      "dim": {
         "description": "Dimension of the embeddings.",
         "title": "Dim",
         "type": "integer"
      },
      "norm_layer": {
         "default": "layernorm",
         "description": "Normalization layer to use.",
         "title": "Norm Layer",
         "type": "string"
      }
   },
   "required": [
      "in_channels",
      "patch_size",
      "dim"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field patch_size: tuple[int, int, int] [Required]#

Size of the patches to extract from the input.

field in_channels: int [Required]#

Number of input channels.

field dim: int [Required]#

Dimension of the embeddings.

field norm_layer: str = 'layernorm'#

Normalization layer to use.

field out_channels: None = None#
field kernel_size: None = None#
validator validate_before  »  all fields[source]#

Base class method for validating data before creating the model.

vision_architectures.layers.embeddings.get_coords_grid(grid_size)[source]#

Get a coordinate grid of shape (3, d, h, w) for a given grid size.

Parameters:

grid_size (tuple[int, int, int]) – Size of the grid (d, h, w).

Return type:

Tensor

Returns:

A tensor of shape (3, d, h, w) containing the coordinates.
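
A minimal equivalent sketch that matches the documented shape (not necessarily the package's internals):

import torch


def coords_grid_sketch(grid_size: tuple[int, int, int]) -> torch.Tensor:
    d, h, w = grid_size
    zz, yy, xx = torch.meshgrid(
        torch.arange(d), torch.arange(h), torch.arange(w), indexing="ij"
    )
    return torch.stack([zz, yy, xx])  # (3, d, h, w); grid[:, i, j, k] == (i, j, k)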

class vision_architectures.layers.embeddings.RelativePositionEmbeddings3D(config={}, **kwargs)[source]#

Bases: Module

Learnable 3D Relative Position Embeddings. These can be passed directly to the attention layers. This class is designed for 3D inputs such as medical images and videos.

__init__(config={}, **kwargs)[source]#

Initialize RelativePositionEmbeddings3D.

Parameters:
  • config (RelativePositionEmbeddings3DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • **kwargs – Additional keyword arguments for configuration.

forward()[source]#

Get relative position embeddings as specified by the config.

Return type:

Tensor

Returns:

A tensor of shape (1, num_heads, num_patches, num_patches) containing the relative position embeddings.
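
A minimal usage sketch (the config values are illustrative):

from vision_architectures.layers.embeddings import RelativePositionEmbeddings3D

rel_pos = RelativePositionEmbeddings3D(config={"num_heads": 8, "grid_size": (4, 8, 8)})
bias = rel_pos()  # (1, 8, 256, 256), since num_patches = 4 * 8 * 8 = 256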

class vision_architectures.layers.embeddings.RelativePositionEmbeddings3DMetaNetwork(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: Module

3D Relative Position Embeddings obtained from a meta network (inspired by SwinV2). These can be passed directly to the attention layers. This class is designed for 3D inputs such as medical images and videos.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize RelativePositionEmbeddings3DMetaNetwork.

Parameters:
  • config (RelativePositionEmbeddings3DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

get_relative_position_embeddings_table()[source]#

Get the relative position embeddings table from the meta network.

Return type:

Tensor

Returns:

A tensor of shape (num_patches, num_heads) containing the relative position embeddings table.

forward()[source]#

Get relative position embeddings as specified by the config.

Return type:

Tensor

Returns:

A tensor of shape (num_heads, num_patches, num_patches) containing the relative position embeddings.
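
A minimal usage sketch, analogous to the learnable variant above (config values are illustrative):

from vision_architectures.layers.embeddings import RelativePositionEmbeddings3DMetaNetwork

meta = RelativePositionEmbeddings3DMetaNetwork(config={"num_heads": 8, "grid_size": (4, 8, 8)})
bias = meta()  # (8, 256, 256); note there is no leading batch axis here, per the docstring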

vision_architectures.layers.embeddings.get_sinusoidal_embeddings_3d(dim, grid_size, spacing=(1.0, 1.0, 1.0), crop_offset=None, channels_first=True)[source]#

Get 3D sinusoidal position embeddings.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 6.

  • grid_size (tuple[int, int, int]) – Size of the patch grid (d, h, w).

  • spacing (tuple[float, float, float]) – Spacing between patches in each dimension. Useful for medical images.

  • crop_offset (Optional[tuple[int, int, int]]) – Used when the embeddings are required for a crop of a larger image. If provided, the grid coordinates will be offset accordingly.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.

class vision_architectures.layers.embeddings.AbsolutePositionEmbeddings3D(config={}, **kwargs)[source]#

Bases: Module

3D Absolute Position Embeddings. May or may not be learnable. These have to be applied to the input manually and cannot be passed to attention layers directly. This class is designed for 3D inputs such as medical images and videos.

__init__(config={}, **kwargs)[source]#

Initialize AbsolutePositionEmbeddings3D.

Parameters:
  • config (AbsolutePositionEmbeddings3DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • **kwargs – Additional keyword arguments for configuration.

forward(x, embedding_type='add', spacings=None, channels_first=True, crop_offsets=None)[source]#

Apply absolute position embeddings to the input tensor.

Parameters:
  • x (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • embedding_type (Literal['add', 'concat']) – Type of embedding to apply. ‘add’ to add the position embeddings to the input, ‘concat’ to concatenate them along the channel dimension.

  • spacings (Optional[Tensor]) – Spacing information of shape (B, 3) of the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

  • crop_offsets (Optional[Tensor]) – Used when the embeddings are required for crops of a larger image. If provided, the grid coordinates will be offset accordingly.

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
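
A minimal usage sketch (shapes and config values are illustrative):

import torch

from vision_architectures.layers.embeddings import AbsolutePositionEmbeddings3D

ape = AbsolutePositionEmbeddings3D(config={"dim": 96, "grid_size": (4, 8, 8)})
x = torch.randn(2, 96, 4, 8, 8)  # (B, C, Z, Y, X)
y = ape(x, embedding_type="add", channels_first=True)  # same shape as x
# embedding_type="concat" would instead append the embeddings along the channel axis.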

vision_architectures.layers.embeddings.get_specific_sinusoidal_embeddings_1d(dim, indices)[source]#

Get 1D sinusoidal position embeddings for specific indices.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 2.

  • indices (Tensor) – Indices for which to get the embeddings. Shape: (length,).

Return type:

Tensor

Returns:

A tensor of shape (1, length, dim) containing the position embeddings.

vision_architectures.layers.embeddings.get_sinusoidal_embeddings_1d(dim, length, device=device(type='cpu'))[source]#

Get 1D sinusoidal position embeddings.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 2.

  • length (int) – Length of the sequence.

  • device – Device to create the embeddings on.

Return type:

Tensor

Returns:

A tensor of shape (1, length, dim) containing the position embeddings.

class vision_architectures.layers.embeddings.AbsolutePositionEmbeddings1D(config={}, **kwargs)[source]#

Bases: Module

1D Absolute Position Embeddings. May or may not be learnable. These have to be applied to the input manually and cannot be passed to attention layers directly. This class is designed for 1D inputs such as language tokens or patchified images.

__init__(config={}, **kwargs)[source]#

Initialize AbsolutePositionEmbeddings1D.

Parameters:
  • config (AbsolutePositionEmbeddings1DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • **kwargs – Additional keyword arguments for configuration.

forward(x, embedding_type='add')[source]#

Apply absolute position embeddings to the input tensor.

Parameters:
  • x (Tensor) – Tensor of shape (B, T, C) representing the input features.

  • embedding_type (Literal['add', 'concat']) – Type of embedding to apply. ‘add’ to add the position embeddings to the input, ‘concat’ to concatenate them along the last dimension.

Return type:

Tensor

Returns:

Tensor of shape (B, T, C) representing the output features.
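
A minimal usage sketch (config values are illustrative):

import torch

from vision_architectures.layers.embeddings import AbsolutePositionEmbeddings1D

ape = AbsolutePositionEmbeddings1D(config={"dim": 64, "length": 128, "learnable": True})
x = torch.randn(2, 128, 64)  # (B, T, C)
y = ape(x)  # (2, 128, 64); position embeddings added token-wise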

vision_architectures.layers.embeddings.get_rope_rotation_coefficients_1d(dim, length, base=10000.0)[source]#

Get 1D RoPE cos and sin rotation coefficients.

Parameters:
  • dim (int) – Embedding dimension. Must be divisible by 2.

  • length (int) – Length of the sequence.

  • base (float) – Base value to use for the rotation coefficients.

Return type:

tuple[Tensor, Tensor]

Returns:

A tuple of tensors containing the cos and sin rotation coefficients.
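
A sketch of the standard construction of such coefficients; the channel-pairing convention is an assumption and may differ from the package's implementation:

import torch


def rope_coefficients_sketch(dim: int, length: int, base: float = 10000.0):
    # One rotation frequency per channel pair: theta_i = base ** (-2i / dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.outer(torch.arange(length, dtype=torch.float32), inv_freq)  # (length, dim/2)
    angles = angles.repeat_interleave(2, dim=-1)  # (length, dim)
    return angles.cos(), angles.sin()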

class vision_architectures.layers.embeddings.RotaryPositionEmbeddings1D(config={}, **kwargs)[source]#

Bases: Module

1D Rotary Position Embeddings. This class is designed for 1D inputs such as language tokens or patchified images.

__init__(config={}, **kwargs)[source]#

Initialize RotaryPositionEmbeddings1D.

Parameters:
  • config (RotaryPositionEmbeddings1DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • **kwargs – Additional keyword arguments for configuration.

static get_rotation_coefficients(dim, length, device, dtype=<class 'torch.dtype'>)#
Return type:

tuple[Tensor, Tensor]

static rearrange_for_sin_coefficients(x)[source]#

Split the tensor into pairs along the last axis, flip each pair’s order, and then negate the first element of each pair. That is, for an input tensor [a, b, c, d], the output will be [-b, a, -d, c].

Parameters:

x (Tensor) – Input tensor with last dimension dim

Return type:

Tensor

Returns:

Rearranged tensor
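
A self-contained sketch with the same effect as the documented [a, b, c, d] -> [-b, a, -d, c] behavior (not necessarily the package's implementation):

import torch


def rearrange_for_sin_sketch(x: torch.Tensor) -> torch.Tensor:
    x_even = x[..., 0::2]  # a, c, ...
    x_odd = x[..., 1::2]  # b, d, ...
    # Re-pair as (-odd, even) and flatten the pairs back into the last axis.
    return torch.stack((-x_odd, x_even), dim=-1).flatten(-2)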

apply_rope(x, cos, sin)[source]#

Apply 1D Rotary Position Embeddings to the given tensor.

Parameters:
  • x (Tensor) – Input tensor with last dimension dim

  • cos (Tensor) – Cosine rotation coefficients

  • sin (Tensor) – Sine rotation coefficients

Return type:

Tensor

Returns:

Tensor after applying 1D Rotary Position Embeddings

forward(x)[source]#

Apply 1D Rotary Position Embeddings.

Parameters:

x (Tensor) – Tensor of shape (B, T, C) representing the input features.

Return type:

Tensor

Returns:

Tensor of shape (B, T, C) representing the output features.
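
A minimal usage sketch; the final comment states the usual RoPE identity that the helpers above compose to, which is an assumption about the internals:

import torch

from vision_architectures.layers.embeddings import RotaryPositionEmbeddings1D

rope = RotaryPositionEmbeddings1D(config={"dim": 64})
q = torch.randn(2, 128, 64)  # (B, T, C); typically queries or keys before attention
q_rot = rope(q)  # (2, 128, 64)
# Conceptually: q_rot = q * cos + rearrange_for_sin_coefficients(q) * sin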

extra_repr()[source]#

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

class vision_architectures.layers.embeddings.RotaryPositionEmbeddings3D(config={}, **kwargs)[source]#

Bases: RotaryPositionEmbeddings1D

3D Rotary Position Embeddings. This class is designed for 3D inputs such as medical images and videos.

__init__(config={}, **kwargs)[source]#

Initialize RotaryPositionEmbeddings3D.

Parameters:
  • config (RotaryPositionEmbeddings3DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • **kwargs – Additional keyword arguments for configuration.

apply_rope(x, cos, sin, axis)[source]#

Apply 1D Rotary Position Embeddings to the given tensor along a particular axis.

Parameters:
  • x (Tensor) – Input tensor with last dimension dim

  • cos (Tensor) – Cosine rotation coefficients

  • sin (Tensor) – Sine rotation coefficients

  • axis (int) – Axis which corresponds to the current dimension

Return type:

Tensor

Returns:

Tensor after applying 1D Rotary Position Embeddings

forward(x, channels_first=True)[source]#

Apply 3D Rotary Position Embeddings.

Parameters:
  • x (Tensor) – Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the input features.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

Tensor

Returns:

Tensor of shape (B, C, Z, Y, X) or (B, Z, Y, X, C) representing the output features.
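
A minimal usage sketch (config values are illustrative; each axis receives the share of channels given by the config's split field):

import torch

from vision_architectures.layers.embeddings import RotaryPositionEmbeddings3D

rope3d = RotaryPositionEmbeddings3D(config={"dim": 96, "split": (1 / 3, 1 / 3, 1 / 3)})
x = torch.randn(2, 96, 4, 8, 8)  # (B, C, Z, Y, X)
y = rope3d(x, channels_first=True)  # (2, 96, 4, 8, 8); each channel group rotated along its own axis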

extra_repr()[source]#

Return the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

class vision_architectures.layers.embeddings.PatchEmbeddings3D(config={}, checkpointing_level=0, **kwargs)[source]#

Bases: CNNBlock3D

3D Patch Embeddings using a convolutional layer. This class is designed for 3D inputs such as medical images and videos.

__init__(config={}, checkpointing_level=0, **kwargs)[source]#

Initialize PatchEmbeddings3D.

Parameters:
  • config (PatchEmbeddings3DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.
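
A minimal usage sketch; the output shape in the comment assumes the convolution stride equals the patch size and a channels-first output, which the entries above do not state explicitly:

import torch

from vision_architectures.layers.embeddings import PatchEmbeddings3D

patchify = PatchEmbeddings3D(config={"in_channels": 1, "patch_size": (4, 16, 16), "dim": 96})
x = torch.randn(2, 1, 32, 128, 128)  # (B, C, Z, Y, X), e.g. a single-channel CT volume
tokens = patchify(x)  # expected: (2, 96, 8, 8, 8), i.e. one 96-dim embedding per patch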