Reference for `ultralytics/models/utils/ops.py`

Note

This file is available at https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/utils/ops.py. If you spot a problem please help fix it by contributing a Pull Request 🛠️. Thank you 🙏!

ultralytics.models.utils.ops.HungarianMatcher

HungarianMatcher(
    cost_gain=None,
    use_fl=True,
    with_mask=False,
    num_sample_points=12544,
    alpha=0.25,
    gamma=2.0,
)

Bases: Module

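A module implementing the HungarianMatcher, which solves the assignment problem between predictions and ground truths so that the detector can be trained end-to-end. The matching step itself is not differentiable: `forward` detaches its inputs and solves the assignment on the CPU with `linear_sum_assignment`.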

HungarianMatcher performs optimal assignment over the predicted and ground truth bounding boxes using a cost function that considers classification scores, bounding box coordinates, and optionally, mask predictions.
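To illustrate what "optimal assignment" means here, the sketch below solves a tiny instance by brute force over every one-to-one matching and keeps the one with the lowest total cost. It is only a stand-in for exposition: the `forward` source on this page calls `linear_sum_assignment` (SciPy's solver), which handles realistic sizes, whereas this exhaustive search grows factorially. `brute_force_assignment` is a hypothetical helper, not part of Ultralytics.

```python
from itertools import permutations


def brute_force_assignment(cost):
    """Return (rows, cols) minimizing the total cost of a one-to-one matching.

    Hypothetical helper for illustration only. `cost` is a list of lists with
    shape (num_queries, num_gt), assuming num_queries >= num_gt as in DETR-style
    matching. Exhaustive search is only usable for tiny inputs.
    """
    num_queries, num_gt = len(cost), len(cost[0])
    best_total, best_rows = float("inf"), None
    # Try every ordered choice of num_gt distinct queries, one per ground truth
    for rows in permutations(range(num_queries), num_gt):
        total = sum(cost[r][c] for c, r in enumerate(rows))
        if total < best_total:
            best_total, best_rows = total, rows
    # Sort by prediction index, mirroring the "(in order)" convention of forward()
    pairs = sorted(zip(best_rows, range(num_gt)))
    return [r for r, _ in pairs], [c for _, c in pairs]


# 3 predictions (rows) vs 2 ground truths (columns)
cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.5],
]
print(brute_force_assignment(cost))  # ([0, 1], [1, 0])
```

Prediction 2 stays unmatched: with more queries than targets, each image yields min(num_queries, num_gt) pairs.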

Attributes:

Name	Type	Description
`cost_gain`	`dict`	Dictionary of cost coefficients: 'class', 'bbox', 'giou', 'mask', and 'dice'.
`use_fl`	`bool`	Indicates whether to use Focal Loss for the classification cost calculation.
`with_mask`	`bool`	Indicates whether the model makes mask predictions.
`num_sample_points`	`int`	The number of sample points used in mask cost calculation.
`alpha`	`float`	The alpha factor in Focal Loss calculation.
`gamma`	`float`	The gamma factor in Focal Loss calculation.

Methods:

Name	Description
`forward`	Computes the assignment between predictions and ground truths for a batch.
`_cost_mask`	Computes the mask cost and dice cost if masks are predicted.

The HungarianMatcher uses a cost function that considers classification scores, bounding box coordinates, and optionally mask predictions to perform optimal bipartite matching between predictions and ground truths.

Parameters:

Name	Type	Description	Default
`cost_gain`	`dict`	Dictionary of cost coefficients for different components of the matching cost. Should contain keys 'class', 'bbox', 'giou', 'mask', and 'dice'. If None, {'class': 1, 'bbox': 5, 'giou': 2, 'mask': 1, 'dice': 1} is used.	`None`
`use_fl`	`bool`	Whether to use Focal Loss for the classification cost calculation.	`True`
`with_mask`	`bool`	Whether the model makes mask predictions.	`False`
`num_sample_points`	`int`	Number of sample points used in mask cost calculation.	`12544`
`alpha`	`float`	Alpha factor in Focal Loss calculation.	`0.25`
`gamma`	`float`	Gamma factor in Focal Loss calculation.	`2.0`
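The sketch below shows how the `cost_gain` coefficients weight the individual cost terms, reusing the default dictionary from `__init__` and the weighted sum from `forward`. It is a scalar simplification for a single prediction/ground-truth pair; `combine_costs` and `DEFAULT_COST_GAIN` are hypothetical names, and the 'mask' and 'dice' gains are only consumed by `_cost_mask` when `with_mask=True`.

```python
DEFAULT_COST_GAIN = {"class": 1, "bbox": 5, "giou": 2, "mask": 1, "dice": 1}


def combine_costs(cost_class, cost_bbox, cost_giou, cost_gain=None):
    """Weighted matching cost for one prediction/ground-truth pair (hypothetical helper)."""
    # Same fallback as HungarianMatcher.__init__ when cost_gain is None
    if cost_gain is None:
        cost_gain = DEFAULT_COST_GAIN
    # Only the class, bbox and giou terms enter the box cost in forward();
    # 'mask' and 'dice' would be added by the mask cost when with_mask=True
    return cost_gain["class"] * cost_class + cost_gain["bbox"] * cost_bbox + cost_gain["giou"] * cost_giou


print(combine_costs(-0.5, 0.1, 0.3))  # 0.6
```

With the defaults, the L1 box term carries the largest weight, so small coordinate differences can dominate the matching.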

Source code in ultralytics/models/utils/ops.py

def __init__(self, cost_gain=None, use_fl=True, with_mask=False, num_sample_points=12544, alpha=0.25, gamma=2.0):
    """
    Initialize a HungarianMatcher module for optimal assignment of predicted and ground truth bounding boxes.

    The HungarianMatcher uses a cost function that considers classification scores, bounding box coordinates,
    and optionally mask predictions to perform optimal bipartite matching between predictions and ground truths.

    Args:
        cost_gain (dict, optional): Dictionary of cost coefficients for different components of the matching cost.
            Should contain keys 'class', 'bbox', 'giou', 'mask', and 'dice'.
        use_fl (bool, optional): Whether to use Focal Loss for the classification cost calculation.
        with_mask (bool, optional): Whether the model makes mask predictions.
        num_sample_points (int, optional): Number of sample points used in mask cost calculation.
        alpha (float, optional): Alpha factor in Focal Loss calculation.
        gamma (float, optional): Gamma factor in Focal Loss calculation.
    """
    super().__init__()
    if cost_gain is None:
        cost_gain = {"class": 1, "bbox": 5, "giou": 2, "mask": 1, "dice": 1}
    self.cost_gain = cost_gain
    self.use_fl = use_fl
    self.with_mask = with_mask
    self.num_sample_points = num_sample_points
    self.alpha = alpha
    self.gamma = gamma

forward

forward(
    pred_bboxes,
    pred_scores,
    gt_bboxes,
    gt_cls,
    gt_groups,
    masks=None,
    gt_mask=None,
)

Forward pass for HungarianMatcher. Computes a matching cost between every prediction and every ground truth in the batch, then finds the optimal one-to-one matching for each image based on these costs.

Parameters:

Name	Type	Description	Default
`pred_bboxes`	`Tensor`	Predicted bounding boxes in (x, y, w, h) format with shape (batch_size, num_queries, 4).	required
`pred_scores`	`Tensor`	Predicted class logits with shape (batch_size, num_queries, num_classes); sigmoid (if use_fl) or softmax is applied internally.	required
`gt_bboxes`	`Tensor`	Ground truth bounding boxes in (x, y, w, h) format with shape (num_gts, 4), concatenated over the batch.	required
`gt_cls`	`Tensor`	Ground truth classes with shape (num_gts, ).	required
`gt_groups`	`List[int]`	List of length equal to batch size, containing the number of ground truths for each image.	required
`masks`	`Tensor`	Predicted masks with shape (batch_size, num_queries, height, width).	`None`
`gt_mask`	`List[Tensor]`	List of ground truth masks, each with shape (num_masks, height, width).	`None`

Returns:

Type	Description
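`List[Tuple[Tensor, Tensor]]`	A list of length batch_size; each element is a tuple (index_i, index_j), where index_i is the tensor of indices of the selected predictions (in order) and index_j is the tensor of indices of the corresponding selected ground truth targets (in order). The index_j values are offset by the number of ground truths in preceding images, so they index the concatenated gt_bboxes and gt_cls. For each batch element, len(index_i) = len(index_j) = min(num_queries, num_target_boxes). If the batch contains no ground truths, every tuple holds two empty tensors.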
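The per-image split and index offset described above can be sketched in plain Python, following the last lines of `forward`: the batched cost matrix has one column per ground truth across the whole batch, image k only competes for its own `gt_groups[k]` columns, and the matched column indices are shifted by the number of ground truths in preceding images. `split_and_offset` and `tiny_solve` are hypothetical helpers; the source uses `linear_sum_assignment` as the per-image solver.

```python
from itertools import accumulate, permutations


def tiny_solve(cost):
    """Exhaustive optimal matching for toy sizes (stand-in for linear_sum_assignment).

    Assumes num_queries >= number of ground-truth columns.
    """
    n_gt = len(cost[0])
    best = min(
        permutations(range(len(cost)), n_gt),
        key=lambda rows: sum(cost[r][c] for c, r in enumerate(rows)),
    )
    pairs = sorted(zip(best, range(n_gt)))  # order by prediction index
    return [r for r, _ in pairs], [c for _, c in pairs]


def split_and_offset(cost_per_image, gt_groups, solve=tiny_solve):
    """Match each image against its own ground truths, then make gt indices global.

    cost_per_image: bs matrices of shape (num_queries, sum(gt_groups)), i.e. the
        batched cost reshaped to (bs, nq, total_gt) as in forward().
    gt_groups: number of ground truths per image.
    """
    starts = [0, *accumulate(gt_groups)][:-1]  # first gt column of each image
    results = []
    for k, (start, n) in enumerate(zip(starts, gt_groups)):
        # Image k only competes for its own ground-truth columns
        sub = [row[start : start + n] for row in cost_per_image[k]]
        rows, cols = solve(sub) if n > 0 else ([], [])
        # Shift local gt indices so they index the concatenated gt_bboxes / gt_cls
        results.append((rows, [c + start for c in cols]))
    return results


# bs=2, num_queries=2, gt_groups=[2, 1] -> 3 gt columns in total
costs = [
    [[0.1, 0.9, 0.5], [0.8, 0.2, 0.5]],
    [[0.5, 0.5, 0.7], [0.5, 0.5, 0.3]],
]
print(split_and_offset(costs, [2, 1]))  # [([0, 1], [0, 1]), ([1], [2])]
```

In the second image, prediction 1 is matched to ground truth 2: its only target is local index 0, shifted by the two ground truths of the first image.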

Source code in ultralytics/models/utils/ops.py

def forward(self, pred_bboxes, pred_scores, gt_bboxes, gt_cls, gt_groups, masks=None, gt_mask=None):
    """
    Forward pass for HungarianMatcher. Computes costs based on prediction and ground truth and finds the optimal
    matching between predictions and ground truth based on these costs.

    Args:
        pred_bboxes (torch.Tensor): Predicted bounding boxes with shape (batch_size, num_queries, 4).
        pred_scores (torch.Tensor): Predicted scores with shape (batch_size, num_queries, num_classes).
        gt_cls (torch.Tensor): Ground truth classes with shape (num_gts, ).
        gt_bboxes (torch.Tensor): Ground truth bounding boxes with shape (num_gts, 4).
        gt_groups (List[int]): List of length equal to batch size, containing the number of ground truths for
            each image.
        masks (torch.Tensor, optional): Predicted masks with shape (batch_size, num_queries, height, width).
        gt_mask (List[torch.Tensor], optional): List of ground truth masks, each with shape (num_masks, Height, Width).

    Returns:
        (List[Tuple[torch.Tensor, torch.Tensor]]): A list of size batch_size, each element is a tuple (index_i, index_j), where:
            - index_i is the tensor of indices of the selected predictions (in order)
            - index_j is the tensor of indices of the corresponding selected ground truth targets (in order)
            For each batch element, it holds:
                len(index_i) = len(index_j) = min(num_queries, num_target_boxes)
    """
    bs, nq, nc = pred_scores.shape

    if sum(gt_groups) == 0:
        return [(torch.tensor([], dtype=torch.long), torch.tensor([], dtype=torch.long)) for _ in range(bs)]

    # We flatten to compute the cost matrices in a batch
    # (batch_size * num_queries, num_classes)
    pred_scores = pred_scores.detach().view(-1, nc)
    pred_scores = F.sigmoid(pred_scores) if self.use_fl else F.softmax(pred_scores, dim=-1)
    # (batch_size * num_queries, 4)
    pred_bboxes = pred_bboxes.detach().view(-1, 4)

    # Compute the classification cost
    pred_scores = pred_scores[:, gt_cls]
    if self.use_fl:
        neg_cost_class = (1 - self.alpha) * (pred_scores**self.gamma) * (-(1 - pred_scores + 1e-8).log())
        pos_cost_class = self.alpha * ((1 - pred_scores) ** self.gamma) * (-(pred_scores + 1e-8).log())
        cost_class = pos_cost_class - neg_cost_class
    else:
        cost_class = -pred_scores

    # Compute the L1 cost between boxes
    cost_bbox = (pred_bboxes.unsqueeze(1) - gt_bboxes.unsqueeze(0)).abs().sum(-1)  # (bs*num_queries, num_gt)

    # Compute the GIoU cost between boxes, (bs*num_queries, num_gt)
    cost_giou = 1.0 - bbox_iou(pred_bboxes.unsqueeze(1), gt_bboxes.unsqueeze(0), xywh=True, GIoU=True).squeeze(-1)

    # Final cost matrix
    C = (
        self.cost_gain["class"] * cost_class
        + self.cost_gain["bbox"] * cost_bbox
        + self.cost_gain["giou"] * cost_giou
    )
    # Compute the mask cost and dice cost
    if self.with_mask:
        C += self._cost_mask(bs, gt_groups, masks, gt_mask)

    # Set invalid values (NaNs and infinities) to 0 (fixes ValueError: matrix contains invalid numeric entries)
    C[C.isnan() | C.isinf()] = 0.0

    C = C.view(bs, nq, -1).cpu()
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(C.split(gt_groups, -1))]
    gt_groups = torch.as_tensor([0, *gt_groups[:-1]]).cumsum_(0)  # (idx for queries, idx for gt)
    return [
        (torch.tensor(i, dtype=torch.long), torch.tensor(j, dtype=torch.long) + gt_groups[k])
        for k, (i, j) in enumerate(indices)
    ]
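To make the three cost terms concrete, here is a scalar, pure-Python sketch of what `forward` computes for a single prediction/ground-truth pair: the focal-style classification cost (same alpha, gamma and 1e-8 epsilon as the source), the L1 box cost, and GIoU on (x, y, w, h) boxes, whose cost is 1 - GIoU. The helper names are hypothetical, and the GIoU is written out by hand (without an epsilon) instead of calling the library's `bbox_iou`.

```python
import math


def focal_class_cost(p, alpha=0.25, gamma=2.0, eps=1e-8):
    """Classification cost for the predicted probability p of the gt class."""
    neg = (1 - alpha) * (p**gamma) * -math.log(1 - p + eps)
    pos = alpha * ((1 - p) ** gamma) * -math.log(p + eps)
    # Lower (more negative) when the prediction is confident in the gt class
    return pos - neg


def l1_cost(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def giou(box_a, box_b):
    """Generalized IoU of two (x_center, y_center, w, h) boxes; assumes positive areas."""

    def to_xyxy(b):
        x, y, w, h = b
        return x - w / 2, y - h / 2, x + w / 2, y + h / 2

    ax1, ay1, ax2, ay2 = to_xyxy(box_a)
    bx1, by1, bx2, by2 = to_xyxy(box_b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Smallest enclosing box: penalizes predictions far from the target
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    return inter / union - (hull - union) / hull


pred, gt = (0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2)
print(focal_class_cost(0.9) < focal_class_cost(0.1))  # True
print(l1_cost(pred, gt), 1.0 - giou(pred, gt))  # 0.0 0.0
```

Unlike plain IoU, GIoU stays informative for non-overlapping boxes, so 1 - GIoU keeps growing as a prediction moves away from its target.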

ultralytics.models.utils.ops.get_cdn_group

get_cdn_group(
    batch,
    num_classes,
    num_queries,
    class_embed,
    num_dn=100,
    cls_noise_ratio=0.5,
    box_noise_scale=1.0,
    training=False,
)

Build the contrastive denoising (CDN) training group, with positive and negative samples generated from the ground truths.

Parameters:

Name	Type	Description	Default
`batch`	`dict`	A dict that includes 'cls' (torch.Tensor with shape (num_gts, )), 'bboxes' (torch.Tensor with shape (num_gts, 4), normalized (x, y, w, h)), 'batch_idx' (torch.Tensor with shape (num_gts, ), the image index of each ground truth), and 'gt_groups' (List[int]), a list of batch-size length giving the number of ground truths in each image.	required
`num_classes`	`int`	Number of classes.	required
`num_queries`	`int`	Number of queries.	required
`class_embed`	`Tensor`	Embedding weights to map class labels to embedding space.	required
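`num_dn`	`int`	Requested number of denoising queries. The number actually used is max(gt_groups) * 2 * num_group, where num_group = max(num_dn // max(gt_groups), 1).	`100`
`cls_noise_ratio`	`float`	Noise ratio for class labels; each label is replaced by a random class with probability cls_noise_ratio * 0.5.	`0.5`
`box_noise_scale`	`float`	Noise scale for bounding box coordinates, relative to half the box width and height.	`1.0`
`training`	`bool`	Whether the model is in training mode. If False, the function returns (None, None, None, None).	`False`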
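A small sketch of the bookkeeping at the start of `get_cdn_group`: the number of denoising groups is `num_dn // max(gt_groups)` (at least 1), each group holds a positive and a negative copy of every ground truth, and the final query count is `max(gt_groups) * 2 * num_group`, so it can differ from the requested `num_dn`. The label-noise part replaces each label with a uniformly random class with probability `cls_noise_ratio * 0.5`. The function names are hypothetical, and Python's `random` stands in for the torch calls.

```python
import random


def cdn_group_sizes(gt_groups, num_dn=100):
    """Return (num_group, total denoising queries) as computed in get_cdn_group."""
    max_nums = max(gt_groups)
    if max_nums == 0:
        return None  # the source returns (None, None, None, None) in this case
    num_group = max(num_dn // max_nums, 1)
    # Every group holds a positive and a negative copy of each (padded) ground truth
    return num_group, max_nums * 2 * num_group


def noisy_labels(labels, num_classes, cls_noise_ratio=0.5, rng=random):
    """Replace each label by a random class with probability cls_noise_ratio * 0.5."""
    return [
        rng.randrange(num_classes) if rng.random() < cls_noise_ratio * 0.5 else label
        for label in labels
    ]


print(cdn_group_sizes([3, 7]))  # (14, 196)
print(cdn_group_sizes([30]))  # (3, 180)
```

As in the source, a "noised" label may by chance be redrawn as the original class.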

Returns:

Name	Type	Description
`padding_cls`	`Optional[Tensor]`	The modified class embeddings for denoising.
`padding_bbox`	`Optional[Tensor]`	The modified bounding boxes for denoising.
`attn_mask`	`Optional[Tensor]`	The attention mask for denoising.
`dn_meta`	`Optional[Dict]`	Meta information for denoising.
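The noised boxes in `padding_bbox` come from jittering each ground-truth box, as in the `box_noise_scale > 0` branch of the source: in (x1, y1, x2, y2) form every coordinate moves by up to half the box width or height times `box_noise_scale`, with a relative magnitude drawn from [0, 1) for positive samples and [1, 2) for negative samples. The result is clipped to [0, 1], converted back to (x, y, w, h), and passed through an inverse sigmoid. The single-box sketch below uses Python's `random` in place of the torch calls; `jitter_box` is a hypothetical name.

```python
import math
import random


def jitter_box(box, negative, box_noise_scale=1.0, rng=random, eps=1e-6):
    """Noise one normalized (x, y, w, h) box as a positive or negative CDN sample.

    Returns the noised box in (x, y, w, h) form after an inverse sigmoid (logit).
    """
    x, y, w, h = box
    xyxy = [x - w / 2, y - h / 2, x + w / 2, y + h / 2]
    # Max shift per coordinate: half the width for x1/x2, half the height for y1/y2
    diff = [w * 0.5 * box_noise_scale, h * 0.5 * box_noise_scale] * 2
    noisy = []
    for coord, d in zip(xyxy, diff):
        magnitude = rng.random() + (1.0 if negative else 0.0)  # [0, 1) or [1, 2)
        sign = rng.choice((-1.0, 1.0))
        noisy.append(min(max(coord + sign * magnitude * d, 0.0), 1.0))  # clip to [0, 1]
    x1, y1, x2, y2 = noisy
    xywh = ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
    # Like torch.logit(t, eps): clamp to [eps, 1 - eps], then log(t / (1 - t))
    clamped = [min(max(v, eps), 1 - eps) for v in xywh]
    return [math.log(v / (1 - v)) for v in clamped]


print(jitter_box((0.5, 0.5, 0.2, 0.2), negative=True, rng=random.Random(0)))
```

Negative samples are always pushed at least one half-extent away on each coordinate, which is what lets the decoder learn to reject them.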

Source code in ultralytics/models/utils/ops.py

def get_cdn_group(
    batch, num_classes, num_queries, class_embed, num_dn=100, cls_noise_ratio=0.5, box_noise_scale=1.0, training=False
):
    """
    Get contrastive denoising training group with positive and negative samples from ground truths.

    Args:
        batch (dict): A dict that includes 'gt_cls' (torch.Tensor with shape (num_gts, )), 'gt_bboxes'
            (torch.Tensor with shape (num_gts, 4)), 'gt_groups' (List[int]) which is a list of batch size length
            indicating the number of gts of each image.
        num_classes (int): Number of classes.
        num_queries (int): Number of queries.
        class_embed (torch.Tensor): Embedding weights to map class labels to embedding space.
        num_dn (int, optional): Number of denoising queries.
        cls_noise_ratio (float, optional): Noise ratio for class labels.
        box_noise_scale (float, optional): Noise scale for bounding box coordinates.
        training (bool, optional): If it's in training mode.

    Returns:
        padding_cls (Optional[torch.Tensor]): The modified class embeddings for denoising.
        padding_bbox (Optional[torch.Tensor]): The modified bounding boxes for denoising.
        attn_mask (Optional[torch.Tensor]): The attention mask for denoising.
        dn_meta (Optional[Dict]): Meta information for denoising.
    """
    if (not training) or num_dn <= 0 or batch is None:
        return None, None, None, None
    gt_groups = batch["gt_groups"]
    total_num = sum(gt_groups)
    max_nums = max(gt_groups)
    if max_nums == 0:
        return None, None, None, None

    num_group = num_dn // max_nums
    num_group = 1 if num_group == 0 else num_group
    # Pad gt to max_num of a batch
    bs = len(gt_groups)
    gt_cls = batch["cls"]  # (bs*num, )
    gt_bbox = batch["bboxes"]  # bs*num, 4
    b_idx = batch["batch_idx"]

    # Each group has positive and negative queries.
    dn_cls = gt_cls.repeat(2 * num_group)  # (2*num_group*bs*num, )
    dn_bbox = gt_bbox.repeat(2 * num_group, 1)  # 2*num_group*bs*num, 4
    dn_b_idx = b_idx.repeat(2 * num_group).view(-1)  # (2*num_group*bs*num, )

    # Positive and negative mask
    # (bs*num*num_group, ), the second total_num*num_group part as negative samples
    neg_idx = torch.arange(total_num * num_group, dtype=torch.long, device=gt_bbox.device) + num_group * total_num

    if cls_noise_ratio > 0:
        # Half of bbox prob
        mask = torch.rand(dn_cls.shape) < (cls_noise_ratio * 0.5)
        idx = torch.nonzero(mask).squeeze(-1)
        # Randomly put a new one here
        new_label = torch.randint_like(idx, 0, num_classes, dtype=dn_cls.dtype, device=dn_cls.device)
        dn_cls[idx] = new_label

    if box_noise_scale > 0:
        known_bbox = xywh2xyxy(dn_bbox)

        diff = (dn_bbox[..., 2:] * 0.5).repeat(1, 2) * box_noise_scale  # 2*num_group*bs*num, 4

        rand_sign = torch.randint_like(dn_bbox, 0, 2) * 2.0 - 1.0
        rand_part = torch.rand_like(dn_bbox)
        rand_part[neg_idx] += 1.0
        rand_part *= rand_sign
        known_bbox += rand_part * diff
        known_bbox.clip_(min=0.0, max=1.0)
        dn_bbox = xyxy2xywh(known_bbox)
        dn_bbox = torch.logit(dn_bbox, eps=1e-6)  # inverse sigmoid

    num_dn = int(max_nums * 2 * num_group)  # total denoising queries
    # class_embed = torch.cat([class_embed, torch.zeros([1, class_embed.shape[-1]], device=class_embed.device)])
    dn_cls_embed = class_embed[dn_cls]  # bs*num * 2 * num_group, 256
    padding_cls = torch.zeros(bs, num_dn, dn_cls_embed.shape[-1], device=gt_cls.device)
    padding_bbox = torch.zeros(bs, num_dn, 4, device=gt_bbox.device)

    map_indices = torch.cat([torch.tensor(range(num), dtype=torch.long) for num in gt_groups])
    pos_idx = torch.stack([map_indices + max_nums * i for i in range(num_group)], dim=0)

    map_indices = torch.cat([map_indices + max_nums * i for i in range(2 * num_group)])
    padding_cls[(dn_b_idx, map_indices)] = dn_cls_embed
    padding_bbox[(dn_b_idx, map_indices)] = dn_bbox

    tgt_size = num_dn + num_queries
    attn_mask = torch.zeros([tgt_size, tgt_size], dtype=torch.bool)
    # Match query cannot see the reconstruct
    attn_mask[num_dn:, :num_dn] = True
    # Reconstruct cannot see each other
    for i in range(num_group):
        if i == 0:
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), max_nums * 2 * (i + 1) : num_dn] = True
        if i == num_group - 1:
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), : max_nums * i * 2] = True
        else:
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), max_nums * 2 * (i + 1) : num_dn] = True
            attn_mask[max_nums * 2 * i : max_nums * 2 * (i + 1), : max_nums * 2 * i] = True
    dn_meta = {
        "dn_pos_idx": [p.reshape(-1) for p in pos_idx.cpu().split(list(gt_groups), dim=1)],
        "dn_num_group": num_group,
        "dn_num_split": [num_dn, num_queries],
    }

    return (
        padding_cls.to(class_embed.device),
        padding_bbox.to(class_embed.device),
        attn_mask.to(class_embed.device),
        dn_meta,
    )
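The attention-mask loop in the source can be reproduced with nested lists to see its block structure. `True` means attention is blocked: the matching queries (the last `num_queries` rows) never see the denoising queries, each denoising group of `2 * max_nums` rows is blocked from every other group, and denoising queries may still attend to the matching queries. `build_attn_mask` is a hypothetical stand-alone rewrite of that loop, not a library function.

```python
def build_attn_mask(max_nums, num_group, num_queries):
    """Rebuild the CDN attention mask as nested lists (True = attention blocked)."""
    num_dn = max_nums * 2 * num_group
    size = num_dn + num_queries
    group = 2 * max_nums  # rows/columns per denoising group
    mask = [[False] * size for _ in range(size)]
    for r in range(size):
        for c in range(num_dn):
            if r >= num_dn:
                # Matching queries cannot see any denoising query
                mask[r][c] = True
            elif r // group != c // group:
                # A denoising group cannot see the other groups
                mask[r][c] = True
    return mask


for row in build_attn_mask(max_nums=1, num_group=2, num_queries=2):
    print("".join("x" if blocked else "." for blocked in row))
# ..xx..
# ..xx..
# xx....
# xx....
# xxxx..
# xxxx..
```

Blocking across groups keeps each group's reconstruction independent, and hiding the denoising queries from the matching queries prevents ground-truth information from leaking into the normal matching branch.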
