
CrossValidator: Custom k-Fold Cross-Validation

The CrossValidator class provides a simple, flexible implementation of k-fold cross-validation for evaluating machine learning models. It is designed to work with any model that implements fit(X, y) and predict(X) methods, and supports custom scoring functions for regression or classification tasks.


Overview

Cross-validation is a robust technique for assessing the generalization performance of machine learning models. The CrossValidator class splits your dataset into k folds, trains the model on k-1 folds, and evaluates it on the remaining fold, repeating this process for each fold. The results are aggregated to provide a reliable estimate of model performance.
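For illustration, the splitting step can be pictured as below. This is a minimal sketch of the generic k-fold procedure using NumPy, not necessarily how CrossValidator implements it internally:

import numpy as np

n_samples, k = 100, 5
rng = np.random.default_rng(42)
indices = rng.permutation(n_samples)   # shuffle the sample indices
folds = np.array_split(indices, k)     # k roughly equal-sized folds

for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    # fit the model on train_idx, then score it on test_idx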


Parameters

Parameter     Type         Default   Description
model         object       -         A machine learning model with fit(X, y) and predict(X) methods.
X             array-like   -         Feature matrix of shape (n_samples, n_features).
y             array-like   -         Target labels of shape (n_samples,).
k             int          5         Number of folds for cross-validation.
shuffle       bool         True      Whether to shuffle the dataset before splitting into folds.
random_seed   int or None  None      Seed for reproducible shuffling (ignored if shuffle=False).

Attributes

  • folds (list of tuple): List of (train_indices, test_indices) pairs, one per fold.

Methods

split()

Splits the dataset into k folds.

  • Returns: folds (list of tuple): Each tuple contains (train_indices, test_indices) for one fold.
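As a sketch, assuming split() returns index arrays usable for NumPy fancy indexing (an assumption, not stated above), the folds can be iterated manually when more control than evaluate() is needed:

import numpy as np
from machinegnostics.models import CrossValidator, LinearRegressor

X = np.random.rand(100, 10)
y = np.random.rand(100)
cv = CrossValidator(LinearRegressor(), X, y, k=5, shuffle=True, random_seed=42)

for train_idx, test_idx in cv.split():
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # fit and score each fold by hand instead of calling evaluate()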

evaluate(scoring_func)

Performs k-fold cross-validation and returns evaluation scores.

  • Parameters: scoring_func (callable): A function that takes y_true and y_pred and returns a numeric score (e.g., mean_squared_error, accuracy_score).
  • Returns: scores (list of float): Evaluation scores, one per fold.
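Any callable with a (y_true, y_pred) signature can serve as scoring_func. As a sketch, the hypothetical rmse helper below (not part of the library) could be passed in place of a built-in metric:

import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error for one fold."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

scores = cv.evaluate(rmse)  # cv constructed as in the Example Usage below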

Example Usage

from machinegnostics.models import CrossValidator, LinearRegressor
from machinegnostics.metrics import mean_squared_error
import numpy as np

# Generate random data
X = np.random.rand(100, 10)
y = np.random.rand(100)

# Initialize model and cross-validator
model = LinearRegressor()
cv = CrossValidator(model, X, y, k=5, shuffle=True, random_seed=42)

# Evaluate using mean squared error
scores = cv.evaluate(mean_squared_error)
print("Cross-Validation Scores:", scores)
print("Mean Score:", np.mean(scores))

Notes

  • The model is re-initialized and trained from scratch for each fold.
  • Supports any model with fit and predict methods (see the sketch after these notes).
  • Works with any scoring function that accepts y_true and y_pred.
  • Shuffling with a fixed random_seed ensures reproducible splits.
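As a sketch of that fit/predict contract, the hypothetical MeanRegressor below (not part of the library; imports mirror the Example Usage above) is all CrossValidator needs from a model:

import numpy as np
from machinegnostics.models import CrossValidator
from machinegnostics.metrics import mean_squared_error

class MeanRegressor:
    """Toy baseline: always predicts the mean of the training targets."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

X = np.random.rand(100, 10)
y = np.random.rand(100)
cv = CrossValidator(MeanRegressor(), X, y, k=5, shuffle=True, random_seed=42)
print(cv.evaluate(mean_squared_error))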

Author: Nirmal Parmar
Date: 2025-05-01