Yes, I am switching to PyTorch, and I am so far very happy with it.

Recently, I have been working on a multilabel classification problem where the evaluation metric is the macro F1 score. Ideally, we would want the loss function to be aligned with the evaluation metric, rather than using the standard BCE loss.
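For reference, here is a minimal sketch of what the macro F1 metric computes: one F1 score per class, averaged over classes. The `macro_f1` helper and the tiny tensors below are illustrative, not from this post, and assume hard 0/1 predictions (the loss below will relax that).

```python
import torch

def macro_f1(preds, target, epsilon=1e-7):
    # preds, target: (batch, n_classes) binary tensors
    TP = (preds * target).sum(dim=0)  # true positives, per class
    precision = TP / (preds.sum(dim=0) + epsilon)
    recall = TP / (target.sum(dim=0) + epsilon)
    f1 = 2 * precision * recall / (precision + recall + epsilon)
    return f1.mean()  # average over classes, not over samples

# Toy example: class 0 has one false positive, class 1 one false negative
preds = torch.tensor([[1., 0.], [1., 1.]])
target = torch.tensor([[1., 0.], [0., 1.]])
print(macro_f1(preds, target))  # (2/3 + 1) / 2 = 5/6
```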

Initially, I was using the following function:

import torch

def f1_loss(output, target, epsilon=1e-7):
    # Soft F1: use sigmoid probabilities in place of hard 0/1 predictions
    probas = torch.sigmoid(output)
    TP = (probas * target).sum(dim=1)
    precision = TP / (probas.sum(dim=1) + epsilon)
    recall = TP / (target.sum(dim=1) + epsilon)
    f1 = 2 * precision * recall / (precision + recall + epsilon)
    f1 = f1.clamp(min=epsilon, max=1 - epsilon)
    return 1 - f1.mean()
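A quick sanity check of the behavior (a sketch; `f1_loss` is repeated so the snippet runs on its own): confidently correct logits should give a loss near 0, and confidently wrong logits a loss near 1.

```python
import torch

def f1_loss(output, target, epsilon=1e-7):
    probas = torch.sigmoid(output)
    TP = (probas * target).sum(dim=1)
    precision = TP / (probas.sum(dim=1) + epsilon)
    recall = TP / (target.sum(dim=1) + epsilon)
    f1 = 2 * precision * recall / (precision + recall + epsilon)
    f1 = f1.clamp(min=epsilon, max=1 - epsilon)
    return 1 - f1.mean()

target = torch.tensor([[1., 0., 1.], [0., 1., 0.]])
good_logits = torch.tensor([[8., -8., 8.], [-8., 8., -8.]])  # match target
bad_logits = -good_logits                                    # flip every label

good = f1_loss(good_logits, target)  # close to 0
bad = f1_loss(bad_logits, target)    # close to 1
```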

It works perfectly well as a loss function in a typical training loop:

for batch_idx, (inp, target) in enumerate(dataloader):
    output = model(inp)
    optimizer.zero_grad()
    loss = f1_loss(output, target)
    loss.backward()
    optimizer.step()

Better yet, we can wrap it in a PyTorch module, so the usage matches a typical PyTorch loss:

import torch
import torch.nn as nn

class F1_Loss(nn.Module):
    def __init__(self, epsilon=1e-7):
        super().__init__()
        self.epsilon = epsilon

    def forward(self, output, target):
        probas = torch.sigmoid(output)
        TP = (probas * target).sum(dim=1)
        precision = TP / (probas.sum(dim=1) + self.epsilon)
        recall = TP / (target.sum(dim=1) + self.epsilon)
        f1 = 2 * precision * recall / (precision + recall + self.epsilon)
        f1 = f1.clamp(min=self.epsilon, max=1 - self.epsilon)
        return 1 - f1.mean()

That simply moves the original f1_loss computation into the forward pass of a small module. As a result, I can explicitly move the module to the GPU.

f1_loss = F1_Loss().cuda()

for batch_idx, (inp, target) in enumerate(dataloader):
    output = model(inp)
    optimizer.zero_grad()
    loss = f1_loss(output, target)
    loss.backward()
    optimizer.step()
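One point worth verifying: because the hard predictions are replaced with sigmoid probabilities, this relaxed F1 is differentiable (almost everywhere, aside from the clamp boundaries), so autograd can backpropagate through it. A minimal check (the module is repeated so the snippet is self-contained; the fixed target tensor is illustrative):

```python
import torch
import torch.nn as nn

class F1_Loss(nn.Module):
    def __init__(self, epsilon=1e-7):
        super().__init__()
        self.epsilon = epsilon

    def forward(self, output, target):
        probas = torch.sigmoid(output)
        TP = (probas * target).sum(dim=1)
        precision = TP / (probas.sum(dim=1) + self.epsilon)
        recall = TP / (target.sum(dim=1) + self.epsilon)
        f1 = 2 * precision * recall / (precision + recall + self.epsilon)
        f1 = f1.clamp(min=self.epsilon, max=1 - self.epsilon)
        return 1 - f1.mean()

logits = torch.zeros(4, 5, requires_grad=True)
target = torch.tensor([[1., 0., 1., 0., 1.],
                       [0., 1., 0., 1., 0.],
                       [1., 1., 0., 0., 1.],
                       [0., 0., 1., 1., 0.]])

loss = F1_Loss()(logits, target)
loss.backward()
# logits.grad is populated with finite, nonzero values:
# gradients flow through the relaxed F1
```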

Greg: Hey Ren, could you please explain why you clamp the output?

admin (post author): Hello Greg, I added the clamp and found it worked slightly better on my dataset at the time, at least in the first few epochs, compared to no clamp.

Greg: Thanks for replying, Ren. Is there any intuition behind adding it, though? You must have had some sort of gut feeling?

Georges: Hi Ren, are you sure F1 is differentiable?

admin (post author): No, this is an approximation; it works reasonably well.

Georges: Thank you for your reply, I will give it a shot in my experiments.