Skip to content

make_regression_check_data: Synthetic Regression Dataset

The make_regression_check_data function generates synthetic regression datasets. It supports linear, polynomial, and sinusoidal relationships with additive Gaussian noise and optional outliers. This serves as a versatile testbed for validating regression models within the Machine Gnostics framework.


Overview

This utility creates regression datasets for testing:

  • Functions: Linear/Polynomial (\(y = \sum a_i x^i + c\)), Sinusoidal (\(y = A \sin(x) + c\)), or Cosine.
  • Outliers: Option to contaminate a percentage of data with large shifts (outlier_ratio), ideal for testing robust regression capabilities.
  • Purpose: Unit testing models, verifying pipeline integrity, and demonstrating regression capabilities.
  • Reproducibility: Uses a fixed seed (default 42).

Parameters

Parameter Type Description Default
n_samples int The number of data points to generate. 20
slope float True coefficient (linear/poly) or amplitude (sin/cos). 3.5
intercept float The true intercept (bias) of the underlying relationship. 10.0
noise_level float Standard deviation of Gaussian noise added to target. 2.0
degree int Degree of the polynomial. Used if function_type='poly'. 1
function_type str Component function type: 'poly', 'sin', 'cos'. 'poly'
outlier_ratio float Proportion of samples to contaminate with outliers (0.0 to 1.0). 0.0
seed int Random seed for reproducibility. 42

Returns

Return Type Description
X numpy.ndarray Input feature array of shape (n_samples,). Values uniform in [0, 10].
y numpy.ndarray Target array of shape (n_samples,).

Example Usage

from machinegnostics.datasets import make_regression_check_data
import numpy as np

# Example 1: Standard Linear Regression
X, y = make_regression_check_data(n_samples=50)

# Example 2: Sinusoidal data with outliers
X_sin, y_sin = make_regression_check_data(
    n_samples=100, 
    function_type='sin', 
    outlier_ratio=0.1
)

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")

Author: Nirmal Parmar