f1_score: Classification F1 Score Metric

The f1_score function computes the F1 score for classification models, supporting both binary and multiclass settings. The F1 score is the harmonic mean of precision and recall, providing a balanced measure that is especially useful when classes are imbalanced.


Overview

The F1 score combines precision and recall into a single metric by taking their harmonic mean.

This metric is particularly important when you want to balance the trade-off between precision and recall, such as in information retrieval, medical diagnosis, and fraud detection.
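
Concretely, with precision P and recall R for the positive class, F1 = 2 · P · R / (P + R). The short sketch below works through this arithmetic by hand using the counts from Example 2 further down; it illustrates the standard definitions, not the library's internal code:

# Standard binary F1 arithmetic (illustrative; not machinegnostics internals).
tp, fp, fn = 2, 1, 0                                # positive-class counts from Example 2
precision = tp / (tp + fp)                          # 2/3: fraction of predicted positives that are correct
recall = tp / (tp + fn)                             # 1.0: fraction of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(f1)                                           # 0.8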


Parameters

  • y_true (array-like or pandas Series, required): Ground truth (correct) target values. Shape: (n_samples,).
  • y_pred (array-like or pandas Series, required): Estimated target values as returned by a classifier. Shape: (n_samples,).
  • average ({'binary', 'micro', 'macro', 'weighted', None}, default 'binary'): Determines the type of averaging performed on the data. See Averaging Options below.
  • labels (array-like or None, default None): List of labels to include. If None, the sorted unique labels from y_true and y_pred are used.

Averaging Options

  • 'binary': Only report results for the positive class (default for binary classification).
  • 'micro': Calculate metrics globally by counting the total true positives, false negatives, and false positives.
  • 'macro': Calculate metrics for each label, and find their unweighted mean.
  • 'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label).
  • None: Return the F1 score for each class as an array.
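
The sketch below contrasts these options on one multiclass example. It assumes f1_score follows the standard definitions described above; here the per-class F1 values are [0.6667, 0.8, 1.0] with class supports [2, 2, 1]:

from machinegnostics.metrics import f1_score

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]

# average=None: per-class F1 -> [0.6667, 0.8, 1.0]
print(f1_score(y_true, y_pred, average=None))

# 'macro': unweighted mean -> (0.6667 + 0.8 + 1.0) / 3 ≈ 0.8222
print(f1_score(y_true, y_pred, average='macro'))

# 'micro': global counts (TP=4, FP=1, FN=1) -> F1 = 0.8
print(f1_score(y_true, y_pred, average='micro'))

# 'weighted': per-class F1 weighted by support [2, 2, 1] -> ≈ 0.7867
print(f1_score(y_true, y_pred, average='weighted'))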

Returns

  • f1: float or array of floats
    F1 score(s). Returns a float if average is not None, otherwise returns an array of F1 values for each class.

Raises

  • ValueError, raised when:
      • y_true or y_pred is a pandas DataFrame (a single column must be selected, e.g. df['col']).
      • The shapes of y_true and y_pred do not match.
      • average='binary' but the problem is not binary classification.
      • average is not a recognized option.
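
For instance, mismatched input shapes should raise a ValueError. A minimal sketch (the exact error message may differ):

from machinegnostics.metrics import f1_score

try:
    f1_score([0, 1, 1], [0, 1])  # shapes (3,) and (2,) do not match
except ValueError as err:
    print(err)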

Example Usage

from machinegnostics.metrics import f1_score

# Example 1: Macro-averaged F1 for multiclass
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
print(f1_score(y_true, y_pred, average='macro'))  # Output: 0.6 (per-class F1: [0.8, 0.0, 1.0])

# Example 2: Binary F1 with pandas Series
import pandas as pd
df = pd.DataFrame({'true': [1, 0, 1], 'pred': [1, 1, 1]})
print(f1_score(df['true'], df['pred'], average='binary'))  # Output: 0.8
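
A third, illustrative example, assuming average=None returns one F1 value per class as described under Returns (standard definitions; the printed array format may vary):

# Example 3: Per-class F1 scores with average=None
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
print(f1_score(y_true, y_pred, average=None))  # class 0 -> 0.8, class 1 -> 0.0, class 2 -> 1.0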

Notes

  • The function supports input as numpy arrays, lists, or pandas Series.
  • If you pass a pandas DataFrame, you must select a column (e.g., df['col']), not the whole DataFrame.
  • For binary classification, by convention, the second label in sorted order is treated as the positive class (e.g., 1 for labels {0, 1}).
  • For imbalanced datasets, consider using average='weighted' to account for class support.