UPerNet2D

pydantic model vision_architectures.nets.upernet_2d.UPerNet2DFusionConfig

Bases: CNNBlockConfig

JSON schema:
{
   "title": "UPerNet2DFusionConfig",
   "type": "object",
   "properties": {
      "in_channels": {
         "default": null,
         "description": "Calculated based on other parameters",
         "title": "In Channels",
         "type": "null"
      },
      "out_channels": {
         "default": null,
         "description": "Calculated based on other parameters",
         "title": "Out Channels",
         "type": "null"
      },
      "kernel_size": {
         "default": 3,
         "description": "Kernel size for the convolutional layers",
         "title": "Kernel Size",
         "type": "integer"
      },
      "padding": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            },
            {
               "type": "string"
            }
         ],
         "default": "same",
         "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
         "title": "Padding"
      },
      "stride": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "items": {
                  "type": "integer"
               },
               "type": "array"
            }
         ],
         "default": 1,
         "description": "Stride for the convolution",
         "title": "Stride"
      },
      "conv_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the convolution layer",
         "title": "Conv Kwargs",
         "type": "object"
      },
      "transposed": {
         "default": false,
         "description": "Whether to perform ConvTranspose instead of Conv",
         "title": "Transposed",
         "type": "boolean"
      },
      "normalization": {
         "default": "batchnorm2d",
         "title": "Normalization",
         "type": "string"
      },
      "normalization_pre_args": {
         "default": [],
         "description": "Arguments for the normalization layer before providing the dimension. Useful with GroupNorm layers to specify the number of groups.",
         "items": {},
         "title": "Normalization Pre Args",
         "type": "array"
      },
      "normalization_post_args": {
         "default": [],
         "description": "Arguments for the normalization layer after providing the dimension.",
         "items": {},
         "title": "Normalization Post Args",
         "type": "array"
      },
      "normalization_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the normalization layer",
         "title": "Normalization Kwargs",
         "type": "object"
      },
      "activation": {
         "anyOf": [
            {
               "type": "string"
            },
            {
               "type": "null"
            }
         ],
         "default": "relu",
         "description": "Activation function type.",
         "title": "Activation"
      },
      "activation_kwargs": {
         "additionalProperties": true,
         "default": {},
         "description": "Additional keyword arguments for the activation function.",
         "title": "Activation Kwargs",
         "type": "object"
      },
      "sequence": {
         "default": "CNA",
         "description": "Sequence of operations in the block.",
         "enum": [
            "C",
            "AC",
            "CA",
            "CD",
            "CN",
            "DC",
            "NC",
            "ACD",
            "ACN",
            "ADC",
            "ANC",
            "CAD",
            "CAN",
            "CDA",
            "CDN",
            "CNA",
            "CND",
            "DAC",
            "DCA",
            "DCN",
            "DNC",
            "NAC",
            "NCA",
            "NCD",
            "NDC",
            "ACDN",
            "ACND",
            "ADCN",
            "ADNC",
            "ANCD",
            "ANDC",
            "CADN",
            "CAND",
            "CDAN",
            "CDNA",
            "CNAD",
            "CNDA",
            "DACN",
            "DANC",
            "DCAN",
            "DCNA",
            "DNAC",
            "DNCA",
            "NACD",
            "NADC",
            "NCAD",
            "NCDA",
            "NDAC",
            "NDCA"
         ],
         "title": "Sequence",
         "type": "string"
      },
      "drop_prob": {
         "default": 0.0,
         "description": "Dropout probability.",
         "title": "Drop Prob",
         "type": "number"
      },
      "num_features": {
         "description": "Number of input feature maps",
         "title": "Num Features",
         "type": "integer"
      },
      "dim": {
         "description": "Dimension of the fused feature map",
         "title": "Dim",
         "type": "integer"
      },
      "fused_shape": {
         "anyOf": [
            {
               "maxItems": 2,
               "minItems": 2,
               "prefixItems": [
                  {
                     "type": "integer"
                  },
                  {
                     "type": "integer"
                  }
               ],
               "type": "array"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Shape of the fused feature map. It can also be provided during runtime. If None, highest input resolution is used.",
         "title": "Fused Shape"
      },
      "interpolation_mode": {
         "default": "bilinear",
         "description": "Interpolation mode for the FPN block.",
         "title": "Interpolation Mode",
         "type": "string"
      }
   },
   "required": [
      "num_features",
      "dim"
   ]
}
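
The sequence enum in the schema above looks like every ordering of the letters A, C, D and N that uses each letter at most once and always contains C. Reading the letters as Convolution, Activation, Dropout and Normalization is an assumption drawn from the neighbouring field names; the schema itself does not spell it out. A small sketch that reproduces the list under that reading:

```python
from itertools import combinations, permutations

REQUIRED_OP = "C"     # assumed meaning: Convolution
OPTIONAL_OPS = "ADN"  # assumed meaning: Activation, Dropout, Normalization


def valid_sequences() -> list[str]:
    """Enumerate every ordering of C plus any subset of A, D and N."""
    sequences = set()
    for k in range(len(OPTIONAL_OPS) + 1):
        for extra in combinations(OPTIONAL_OPS, k):
            for order in permutations(REQUIRED_OP + "".join(extra)):
                sequences.add("".join(order))
    # Shorter sequences first, then alphabetical, as in the schema listing.
    return sorted(sequences, key=lambda s: (len(s), s))


SEQUENCES = valid_sequences()
print(len(SEQUENCES))  # 49
print(SEQUENCES[:7])   # ['C', 'AC', 'CA', 'CD', 'CN', 'DC', 'NC']
```

The count of 49 matches the number of entries in the enum, and the default "CNA" is one of them.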

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field num_features: int [Required]

Number of input feature maps

field kernel_size: int = 3

Kernel size for the convolutional layers

field dim: int [Required]

Dimension of the fused feature map

field fused_shape: tuple[int, int] | None = None

Shape of the fused feature map. It can also be provided during runtime. If None, highest input resolution is used.

field interpolation_mode: str = 'bilinear'

Interpolation mode for the FPN block.

field normalization: str = 'batchnorm2d'

field in_channels: None = None

Calculated based on other parameters

field out_channels: None = None

Calculated based on other parameters
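
As a rough illustration, the UPerNet2DFusionConfig fields can be collected in a plain dictionary, since the class documentation further down states that a dictionary is accepted in place of a config instance. The values are arbitrary, and missing_required is a hypothetical helper written for this sketch, not part of the library.

```python
# Required keys, copied from the "required" list of the JSON schema.
REQUIRED_FUSION_KEYS = ("num_features", "dim")

# Illustrative values only: four input feature maps with 256 channels each.
fusion_config = {
    "num_features": 4,
    "dim": 256,
    "fused_shape": None,  # None: fall back to the highest input resolution
    "interpolation_mode": "bilinear",
    "kernel_size": 3,
    "normalization": "batchnorm2d",
    "activation": "relu",
    "sequence": "CNA",
}


def missing_required(config: dict, required=REQUIRED_FUSION_KEYS) -> list[str]:
    """Hypothetical helper: list the required keys absent from a config dict."""
    return [key for key in required if key not in config]


print(missing_required(fusion_config))  # []
print(missing_required({"dim": 256}))   # ['num_features']
```

in_channels and out_channels are left out on purpose: the schema types them as null and describes them as calculated from the other parameters.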
pydantic model vision_architectures.nets.upernet_2d.UPerNet2DConfig

Bases: FPN2DConfig

JSON schema:
{
   "title": "UPerNet2DConfig",
   "type": "object",
   "properties": {
      "blocks": {
         "description": "List of configs for the FPN blocks.",
         "items": {
            "$ref": "#/$defs/FPN2DBlockConfig"
         },
         "title": "Blocks",
         "type": "array"
      },
      "fusion": {
         "$ref": "#/$defs/UPerNet2DFusionConfig",
         "description": "Configuration for the UPerNet2D fusion block"
      },
      "enabled_outputs": {
         "default": [
            "object"
         ],
         "description": "Select which outputs to enable",
         "items": {
            "enum": [
               "object",
               "part",
               "scene",
               "material",
               "texture"
            ],
            "type": "string"
         },
         "title": "Enabled Outputs",
         "type": "array",
         "uniqueItems": true
      },
      "num_objects": {
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "null"
            }
         ],
         "default": null,
         "description": "Number of object classes",
         "title": "Num Objects"
      }
   },
   "$defs": {
      "FPN2DBlockConfig": {
         "properties": {
            "in_channels": {
               "default": null,
               "description": "Calculated based on other parameters",
               "title": "In Channels",
               "type": "null"
            },
            "out_channels": {
               "default": null,
               "description": "Calculated based on other parameters",
               "title": "Out Channels",
               "type": "null"
            },
            "kernel_size": {
               "default": 3,
               "description": "Kernel size for the convolutional layers in the FPN block.",
               "title": "Kernel Size",
               "type": "integer"
            },
            "padding": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "same",
               "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
               "title": "Padding"
            },
            "stride": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": 1,
               "description": "Stride for the convolution",
               "title": "Stride"
            },
            "conv_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the convolution layer",
               "title": "Conv Kwargs",
               "type": "object"
            },
            "transposed": {
               "default": false,
               "description": "Whether to perform ConvTranspose instead of Conv",
               "title": "Transposed",
               "type": "boolean"
            },
            "normalization": {
               "default": "batchnorm2d",
               "title": "Normalization",
               "type": "string"
            },
            "normalization_pre_args": {
               "default": [],
               "description": "Arguments for the normalization layer before providing the dimension. Useful with GroupNorm layers to specify the number of groups.",
               "items": {},
               "title": "Normalization Pre Args",
               "type": "array"
            },
            "normalization_post_args": {
               "default": [],
               "description": "Arguments for the normalization layer after providing the dimension.",
               "items": {},
               "title": "Normalization Post Args",
               "type": "array"
            },
            "normalization_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the normalization layer",
               "title": "Normalization Kwargs",
               "type": "object"
            },
            "activation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "relu",
               "description": "Activation function type.",
               "title": "Activation"
            },
            "activation_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the activation function.",
               "title": "Activation Kwargs",
               "type": "object"
            },
            "sequence": {
               "default": "CNA",
               "description": "Sequence of operations in the block.",
               "enum": [
                  "C",
                  "AC",
                  "CA",
                  "CD",
                  "CN",
                  "DC",
                  "NC",
                  "ACD",
                  "ACN",
                  "ADC",
                  "ANC",
                  "CAD",
                  "CAN",
                  "CDA",
                  "CDN",
                  "CNA",
                  "CND",
                  "DAC",
                  "DCA",
                  "DCN",
                  "DNC",
                  "NAC",
                  "NCA",
                  "NCD",
                  "NDC",
                  "ACDN",
                  "ACND",
                  "ADCN",
                  "ADNC",
                  "ANCD",
                  "ANDC",
                  "CADN",
                  "CAND",
                  "CDAN",
                  "CDNA",
                  "CNAD",
                  "CNDA",
                  "DACN",
                  "DANC",
                  "DCAN",
                  "DCNA",
                  "DNAC",
                  "DNCA",
                  "NACD",
                  "NADC",
                  "NCAD",
                  "NCDA",
                  "NDAC",
                  "NDCA"
               ],
               "title": "Sequence",
               "type": "string"
            },
            "drop_prob": {
               "default": 0.0,
               "description": "Dropout probability.",
               "title": "Drop Prob",
               "type": "number"
            },
            "dim": {
               "description": "Input channel dimension of the FPN block.",
               "title": "Dim",
               "type": "integer"
            },
            "skip_conn_dim": {
               "description": "Input channel dimension of the skip connection.",
               "title": "Skip Conn Dim",
               "type": "integer"
            },
            "interpolation_mode": {
               "default": "bilinear",
               "description": "Interpolation mode for the FPN block.",
               "title": "Interpolation Mode",
               "type": "string"
            },
            "merge_method": {
               "default": "add",
               "description": "Merge method for the FPN block.",
               "enum": [
                  "add",
                  "concat"
               ],
               "title": "Merge Method",
               "type": "string"
            }
         },
         "required": [
            "dim",
            "skip_conn_dim"
         ],
         "title": "FPN2DBlockConfig",
         "type": "object"
      },
      "UPerNet2DFusionConfig": {
         "properties": {
            "in_channels": {
               "default": null,
               "description": "Calculated based on other parameters",
               "title": "In Channels",
               "type": "null"
            },
            "out_channels": {
               "default": null,
               "description": "Calculated based on other parameters",
               "title": "Out Channels",
               "type": "null"
            },
            "kernel_size": {
               "default": 3,
               "description": "Kernel size for the convolutional layers",
               "title": "Kernel Size",
               "type": "integer"
            },
            "padding": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  },
                  {
                     "type": "string"
                  }
               ],
               "default": "same",
               "description": "Padding for the convolution. Can be 'same' or an integer/tuple of integers.",
               "title": "Padding"
            },
            "stride": {
               "anyOf": [
                  {
                     "type": "integer"
                  },
                  {
                     "items": {
                        "type": "integer"
                     },
                     "type": "array"
                  }
               ],
               "default": 1,
               "description": "Stride for the convolution",
               "title": "Stride"
            },
            "conv_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the convolution layer",
               "title": "Conv Kwargs",
               "type": "object"
            },
            "transposed": {
               "default": false,
               "description": "Whether to perform ConvTranspose instead of Conv",
               "title": "Transposed",
               "type": "boolean"
            },
            "normalization": {
               "default": "batchnorm2d",
               "title": "Normalization",
               "type": "string"
            },
            "normalization_pre_args": {
               "default": [],
               "description": "Arguments for the normalization layer before providing the dimension. Useful with GroupNorm layers to specify the number of groups.",
               "items": {},
               "title": "Normalization Pre Args",
               "type": "array"
            },
            "normalization_post_args": {
               "default": [],
               "description": "Arguments for the normalization layer after providing the dimension.",
               "items": {},
               "title": "Normalization Post Args",
               "type": "array"
            },
            "normalization_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the normalization layer",
               "title": "Normalization Kwargs",
               "type": "object"
            },
            "activation": {
               "anyOf": [
                  {
                     "type": "string"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": "relu",
               "description": "Activation function type.",
               "title": "Activation"
            },
            "activation_kwargs": {
               "additionalProperties": true,
               "default": {},
               "description": "Additional keyword arguments for the activation function.",
               "title": "Activation Kwargs",
               "type": "object"
            },
            "sequence": {
               "default": "CNA",
               "description": "Sequence of operations in the block.",
               "enum": [
                  "C",
                  "AC",
                  "CA",
                  "CD",
                  "CN",
                  "DC",
                  "NC",
                  "ACD",
                  "ACN",
                  "ADC",
                  "ANC",
                  "CAD",
                  "CAN",
                  "CDA",
                  "CDN",
                  "CNA",
                  "CND",
                  "DAC",
                  "DCA",
                  "DCN",
                  "DNC",
                  "NAC",
                  "NCA",
                  "NCD",
                  "NDC",
                  "ACDN",
                  "ACND",
                  "ADCN",
                  "ADNC",
                  "ANCD",
                  "ANDC",
                  "CADN",
                  "CAND",
                  "CDAN",
                  "CDNA",
                  "CNAD",
                  "CNDA",
                  "DACN",
                  "DANC",
                  "DCAN",
                  "DCNA",
                  "DNAC",
                  "DNCA",
                  "NACD",
                  "NADC",
                  "NCAD",
                  "NCDA",
                  "NDAC",
                  "NDCA"
               ],
               "title": "Sequence",
               "type": "string"
            },
            "drop_prob": {
               "default": 0.0,
               "description": "Dropout probability.",
               "title": "Drop Prob",
               "type": "number"
            },
            "num_features": {
               "description": "Number of input feature maps",
               "title": "Num Features",
               "type": "integer"
            },
            "dim": {
               "description": "Dimension of the fused feature map",
               "title": "Dim",
               "type": "integer"
            },
            "fused_shape": {
               "anyOf": [
                  {
                     "maxItems": 2,
                     "minItems": 2,
                     "prefixItems": [
                        {
                           "type": "integer"
                        },
                        {
                           "type": "integer"
                        }
                     ],
                     "type": "array"
                  },
                  {
                     "type": "null"
                  }
               ],
               "default": null,
               "description": "Shape of the fused feature map. It can also be provided during runtime. If None, highest input resolution is used.",
               "title": "Fused Shape"
            },
            "interpolation_mode": {
               "default": "bilinear",
               "description": "Interpolation mode for the FPN block.",
               "title": "Interpolation Mode",
               "type": "string"
            }
         },
         "required": [
            "num_features",
            "dim"
         ],
         "title": "UPerNet2DFusionConfig",
         "type": "object"
      }
   },
   "required": [
      "blocks",
      "fusion"
   ]
}

Config:
  • arbitrary_types_allowed: bool = True

  • extra: str = ignore

  • validate_default: bool = True

  • validate_assignment: bool = True

  • validate_return: bool = True

field fusion: UPerNet2DFusionConfig [Required]

Configuration for the UPerNet2D fusion block

field enabled_outputs: set[Literal['object', 'part', 'scene', 'material', 'texture']] = {'object'}

Select which outputs to enable

field num_objects: int | None = None

Number of object classes

validator validate_before » all fields

Base class method for validating data before creating the model.

validator validate » all fields

Base method for validating the model after creation.
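
A full UPerNet2DConfig nests a list of FPN block configs and one fusion config. The sketch below writes such a config as a plain dictionary using only field names from the schema. All numbers are arbitrary, the relationship between the number of blocks and num_features is not stated in this page and is not implied here, and unknown_outputs is a hypothetical helper.

```python
# Allowed values, copied from the enabled_outputs enum in the schema.
ALLOWED_OUTPUTS = {"object", "part", "scene", "material", "texture"}

upernet_config = {
    # One FPN2DBlockConfig-style dict per block; dim and skip_conn_dim are required.
    "blocks": [
        {"dim": 256, "skip_conn_dim": 512},
        {"dim": 256, "skip_conn_dim": 256},
        {"dim": 256, "skip_conn_dim": 128},
    ],
    # UPerNet2DFusionConfig-style dict; num_features and dim are required.
    "fusion": {"num_features": 4, "dim": 256},
    "enabled_outputs": {"object"},
    "num_objects": 150,  # arbitrary number of object classes
}


def unknown_outputs(config: dict) -> list[str]:
    """Hypothetical helper: enabled outputs that the schema enum does not allow."""
    enabled = set(config.get("enabled_outputs", {"object"}))
    return sorted(enabled - ALLOWED_OUTPUTS)


print(unknown_outputs(upernet_config))                  # []
print(unknown_outputs({"enabled_outputs": {"depth"}}))  # ['depth']
```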

class vision_architectures.nets.upernet_2d.UPerNet2DFusion(config={}, checkpointing_level=0, **kwargs)

Bases: Module

Fusion block for UPerNet2D. This class is designed for 2D inputs, e.g. natural images.

__init__(config={}, checkpointing_level=0, **kwargs)

Initialize the UPerNet2DFusion block.

Parameters:
  • config (UPerNet2DFusionConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

concat_features(features, fused_shape=None)

Concatenate features from different resolutions and interpolate them to the same size.

Parameters:
  • features (list[Tensor]) – A list of channels-first 2D multi-scale features of shapes [(b, dim, h1, w1), (b, dim, h2, w2), …] where h1 > h2 > …

  • fused_shape (Optional[tuple[int, int]]) – Shape to which all feature maps will be interpolated. If None, the value set in the config is used. If that is None too, the shape of the largest feature map is used.

Return type:

Tensor

Returns:

A feature map with spatial resolution of fused_shape and concatenated channels.
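
The behaviour described for concat_features can be approximated in a few lines of PyTorch. This is a minimal sketch under stated assumptions, not the library's implementation: concat_features_sketch is a hypothetical name, and the fallback simply takes the first map as the largest one, following the h1 > h2 > … ordering given above.

```python
import torch
import torch.nn.functional as F


def concat_features_sketch(features, fused_shape=None, mode="bilinear"):
    """Resize every (b, dim, h, w) map to one shape, then concatenate channels."""
    if fused_shape is None:
        # Fall back to the largest map, assumed to be the first in the list.
        fused_shape = tuple(features[0].shape[-2:])
    resized = [F.interpolate(f, size=fused_shape, mode=mode) for f in features]
    return torch.cat(resized, dim=1)


# Three maps with 8 channels each at decreasing resolution.
features = [torch.randn(2, 8, s, s) for s in (32, 16, 8)]
out = concat_features_sketch(features)
print(out.shape)  # torch.Size([2, 24, 32, 32])
```

With three input maps of 8 channels each, the result has 24 channels at the resolution of the largest map.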

fuse_features(concatenated_features)

Fuse features from different resolutions.

Parameters:

concatenated_features (Tensor) – A channels-first feature map with spatial resolution of fused_shape and concatenated channels.

Return type:

Tensor

Returns:

A fused 2D feature map.
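
One plausible shape for the fusion step is a single convolution block in the default CNA order. The sketch below assumes that the block maps num_features * dim input channels to dim output channels; the configuration only says that in_channels and out_channels are calculated from other parameters, so this mapping is an assumption, as are the concrete numbers.

```python
import torch
from torch import nn

num_features, dim = 3, 8  # illustrative values

# Convolution, then normalization, then activation ("CNA"), using the
# defaults listed in the config: kernel_size=3, padding="same",
# batchnorm2d and relu.
fuse = nn.Sequential(
    nn.Conv2d(num_features * dim, dim, kernel_size=3, padding="same"),
    nn.BatchNorm2d(dim),
    nn.ReLU(),
)

concatenated = torch.randn(2, num_features * dim, 32, 32)
fused = fuse(concatenated)
print(fused.shape)  # torch.Size([2, 8, 32, 32])
```

The spatial size is unchanged because the convolution uses stride 1 with "same" padding.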

forward(features, fused_shape=None, channels_first=True)

Collect and fuse all of the multi-scale features.

Parameters:
  • features (list[Tensor]) – A list of 2D multi-scale features of shapes [(b, [dim], h1, w1, [dim]), (b, [dim], h2, w2, [dim]), …] where h1 > h2 > …

  • fused_shape (Optional[tuple[int, int]]) – Shape to which all feature maps will be interpolated. If None, the value set in the config is used. If that is None too, the shape of the largest feature map is used.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C). This is assumed for all the features.

Return type:

Tensor

Returns:

A fused 2D feature map.
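
The fallback order documented for fused_shape (runtime argument first, then the config value, then the largest feature map) can be written as a small pure-Python function. resolve_fused_shape is a hypothetical name used only for this sketch.

```python
from typing import Optional, Sequence

Shape2D = tuple[int, int]


def resolve_fused_shape(
    runtime_shape: Optional[Shape2D],
    config_shape: Optional[Shape2D],
    feature_shapes: Sequence[Shape2D],
) -> Shape2D:
    """Pick the target shape: runtime argument, then config, then largest map."""
    if runtime_shape is not None:
        return runtime_shape
    if config_shape is not None:
        return config_shape
    # Largest map by area; with h1 > h2 > ... this is the first entry.
    return max(feature_shapes, key=lambda hw: hw[0] * hw[1])


shapes = [(64, 64), (32, 32), (16, 16)]
print(resolve_fused_shape(None, None, shapes))          # (64, 64)
print(resolve_fused_shape(None, (48, 48), shapes))      # (48, 48)
print(resolve_fused_shape((40, 40), (48, 48), shapes))  # (40, 40)
```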

class vision_architectures.nets.upernet_2d.UPerNet2D(config={}, checkpointing_level=0, **kwargs)

Bases: Module, PyTorchModelHubMixin

Implementation of the UPerNet2D architecture. This class is designed for 2D inputs, e.g. natural images.

__init__(config={}, checkpointing_level=0, **kwargs)

Initialize the UPerNet2D architecture.

Parameters:
  • config (UPerNet2DConfig) – An instance of the Config class that contains all the configuration parameters. It can also be passed as a dictionary and the instance will be created automatically.

  • checkpointing_level (int) – The level of checkpointing to use for activation checkpointing. Refer to ActivationCheckpointing for more details.

  • **kwargs – Additional keyword arguments for configuration.

forward(features, fusion_shape=None, channels_first=True)

Return different outputs from the UPerNet2D architecture as per the paper.

Parameters:
  • features (list[Tensor]) – List of feature maps from the FPN. Each is a tensor of shape (B, C, Y, X) or (B, Y, X, C).

  • fusion_shape (Optional[tuple[int, int]]) – Desired output shape for the feature fusion. If None and not specified in the config, the shape of the highest-resolution feature map is used.

  • channels_first (bool) – Whether the inputs are in channels first format (B, C, …) or not (B, …, C).

Return type:

dict[str, Tensor]

Returns:

A dictionary mapping each enabled output type to its output tensor. Each tensor has shape (B, C, Y, X) or (B, Y, X, C).
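
The two layouts named by channels_first differ only in where the channel axis sits. The sketch below converts between them with permute; whether the module performs the conversion this way internally is an assumption.

```python
import torch

channels_last = torch.randn(2, 32, 32, 8)           # (B, Y, X, C)
channels_first = channels_last.permute(0, 3, 1, 2)  # (B, C, Y, X)
restored = channels_first.permute(0, 2, 3, 1)       # back to (B, Y, X, C)

print(channels_first.shape)                  # torch.Size([2, 8, 32, 32])
print(torch.equal(restored, channels_last))  # True
```

permute returns a view with reordered strides, so calling .contiguous() afterwards may be needed before operations that require contiguous memory.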