
classification_report: Classification Metrics Summary

The classification_report function generates a comprehensive summary of the key classification metrics (precision, recall, F1 score, and support) for each class in your dataset. It supports string and dictionary output formats, making it suitable for human-readable reports and programmatic analysis alike.


Overview

This function provides a detailed breakdown of classifier performance for each class, including:

  • Precision: Proportion of positive identifications that were actually correct.
  • Recall: Proportion of actual positives that were correctly identified.
  • F1 Score: Harmonic mean of precision and recall.
  • Support: Number of true instances for each class.

It also computes weighted averages across all classes.
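To make the definitions concrete, here is a minimal, illustrative sketch of how these per-class numbers and their support-weighted averages can be computed from scratch. It follows the standard formulas above and is not the library's implementation:

from collections import Counter

def per_class_metrics(y_true, y_pred):
    # Illustrative only: standard per-class precision, recall, and F1.
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    report = {}
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0  # zero_division=0 behavior
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {'precision': precision, 'recall': recall,
                         'f1-score': f1, 'support': support[label]}
    return report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
metrics = per_class_metrics(y_true, y_pred)

# Support-weighted average, as reported in the Avg/Total row.
weighted_f1 = sum(m['f1-score'] * m['support'] for m in metrics.values()) / len(y_true)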


Parameters

  • y_true (array-like or pandas Series, required): Ground truth (correct) target values. Shape: (n_samples,).
  • y_pred (array-like or pandas Series, required): Estimated target values as returned by a classifier. Shape: (n_samples,).
  • labels (array-like or None, default None): List of labels to include in the report. If None, the sorted unique labels from y_true and y_pred are used.
  • target_names (list of str or None, default None): Optional display names matching the labels (same order).
  • digits (int, default 2): Number of digits used when formatting the string output.
  • output_dict (bool, default False): If True, return the report as a dict; if False, return a formatted string.
  • zero_division ({0, 1, 'warn'}, default 0): Value to return when a division by zero occurs, e.g. when a class has no predicted samples.
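For instance, labels and target_names can be combined to restrict and rename the reported classes. This is a usage sketch based solely on the parameters documented above:

from machinegnostics.metrics import classification_report

y_true = ['cat', 'dog', 'dog', 'bird']
y_pred = ['cat', 'dog', 'cat', 'bird']

# Report only two of the three classes, with display names and 3-digit formatting.
print(classification_report(
    y_true, y_pred,
    labels=['cat', 'dog'],
    target_names=['Cat', 'Dog'],
    digits=3,
))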

Returns

  • report (str or dict): Text summary of the precision, recall, F1 score, and support for each class, or a dictionary with the same information when output_dict=True.

Example Usage

from machinegnostics.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]

# String report
print(classification_report(y_true, y_pred))

# Dictionary report
report_dict = classification_report(y_true, y_pred, output_dict=True)
print(report_dict)

Output Example

String Output:

Class             Precision    Recall   F1-score    Support
==========================================================
0                    0.67      1.00      0.80          2
1                    0.00      0.00      0.00          1
2                    1.00      1.00      1.00          2
==========================================================
Avg/Total            0.67      0.80      0.72          5

Dictionary Output:

{
  '0': {'precision': 0.67, 'recall': 1.0, 'f1-score': 0.8, 'support': 2},
  '1': {'precision': 0.0, 'recall': 0.0, 'f1-score': 0.0, 'support': 1},
  '2': {'precision': 1.0, 'recall': 1.0, 'f1-score': 1.0, 'support': 2},
  'avg/total': {'precision': 0.67, 'recall': 0.8, 'f1-score': 0.72, 'support': 5}
}
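With output_dict=True, individual metrics can be read programmatically. Assuming the key layout shown above:

from machinegnostics.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]

report = classification_report(y_true, y_pred, output_dict=True)

# Per-class entries are keyed by class label; averages sit under 'avg/total'.
print(report['1']['f1-score'])           # 0.0
print(report['avg/total']['precision'])  # support-weighted precision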

Notes

  • The function uses precision_score, recall_score, and f1_score from the Machine Gnostics metrics module for consistency.
  • If target_names is provided, its length must match the number of labels.
  • For imbalanced datasets, the weighted average provides a more informative summary than the unweighted mean.
  • The zero_division parameter controls the behavior when a class has no predicted samples; see the sketch below.
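A quick sketch of that zero_division behavior, using the running example (class 1 is never predicted, so its precision involves a 0/0):

from machinegnostics.metrics import classification_report

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]

# With zero_division=0 (the default), class 1's precision is reported as 0.00.
print(classification_report(y_true, y_pred, zero_division=0))

# With zero_division=1, the undefined precision is reported as 1.00 instead.
print(classification_report(y_true, y_pred, zero_division=1))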