Looking through the documentation, I was not able to find the standard binary classification hinge loss function, like the one defined on the Wikipedia page: l(y) = max(0, 1 - t*y), where t ∈ {-1, 1}. A linear classifier is a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. Binary Classification from Positive Data with Skewed Confidence. The hinge loss function is calculated on the score \( f(\vx) \) of the class, as opposed to the final prediction \( \yhat \). On the ImageNet classification problem, our PyTorch implementation of binary ResNet-18 and AlexNet models provided the same state-of-the-art accuracy (Table 2) as the DoReFa-Nets with 4-bit activations: 59. BCELoss(); net_out = net(data); loss = criterion(net_out, target). This should work fine for you. In preparation for backpropagation, set the gradients to zero by calling zero_grad() on the optimizer. num_classes) """ Softmax: the final step of the softmax classifier, mapping the final hidden layer to class scores. Then the loss function for a single sample in the dataset is expressed as: \[ -y \log(p) - (1-y) \log(1-p), \] where \(y\) is the label of the sample, and \(p\) is the predicted probability of the sample belonging to class 1.
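To make the two loss formulas above concrete, here is a minimal sketch in plain Python (standard library only, not PyTorch). The function names `hinge_loss`, `binary_cross_entropy`, and `sigmoid`, as well as the clamping constant `eps`, are illustrative assumptions and do not come from any library mentioned above. Note that the hinge loss takes a raw score and a label in {-1, 1}, while the cross-entropy takes a probability and a label in {0, 1}; a {0, 1} label y maps to a hinge label via t = 2y - 1.

```python
import math


def hinge_loss(score, target):
    """Hinge loss max(0, 1 - t*y) for a raw score y and a label t in {-1, +1}."""
    if target not in (-1, 1):
        raise ValueError("target must be -1 or +1")
    return max(0.0, 1.0 - target * score)


def sigmoid(score):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def binary_cross_entropy(p, y, eps=1e-12):
    """Per-sample loss -y*log(p) - (1-y)*log(1-p) for a label y in {0, 1}."""
    # Clamp p away from 0 and 1 so log() stays finite; eps is an assumed value.
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1 - y) * math.log(1.0 - p)


if __name__ == "__main__":
    # A confidently correct score lies outside the margin: zero hinge loss.
    print(hinge_loss(2.0, 1))
    # A correct score inside the margin is still penalized.
    print(hinge_loss(0.5, 1))
    # Cross-entropy is computed on the probability, not on the raw score.
    print(binary_cross_entropy(sigmoid(0.5), 1))
```

The hinge loss becomes exactly zero once the score clears the margin, whereas the cross-entropy only approaches zero as the predicted probability approaches the label.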