Machine Gnostics Polynomial Regression¶
The PolynomialRegressor is a robust polynomial regression model built on the principles of Mathematical Gnostics. It is designed to provide deterministic, interpretable, and resilient regression in the presence of outliers, noise, and non-Gaussian data distributions. Unlike traditional statistical models, this regressor leverages algebraic and geometric concepts from Mathematical Gnostics, focusing on event-level modeling and robust loss minimization.
Key Features:
- Robust to Outliers: Uses gnostic loss functions and adaptive weights to minimize the influence of outliers and corrupted samples.
- Polynomial Feature Expansion: Supports configurable polynomial degrees for flexible modeling.
- Iterative Optimization: Employs iterative fitting with early stopping and convergence checks.
- Custom Gnostic Loss: Minimizes a user-selected gnostic loss ('hi', 'hj', etc.) for event-level robustness.
- Detailed Training History: Optionally records loss, weights, entropy, and gnostic characteristics at each iteration.
- Easy Integration: Compatible with numpy arrays and supports model persistence (see the quick-start sketch after this list).
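The quick-start sketch below ties these features to the constructor arguments used throughout this page (degree, loss, and record_history). It is an illustrative starting point only; the full parameter list is in the API Reference.
import numpy as np
from machinegnostics.models.regression import PolynomialRegressor
# Small synthetic dataset
X_demo = np.linspace(0, 2, 20).reshape(-1, 1)
y_demo = 1.5 + 0.8 * X_demo.ravel() ** 2
# degree, loss ('hi', 'hj', ...) and record_history are the options used in this tutorial;
# see the API Reference for the remaining constructor arguments
model = PolynomialRegressor(degree=2, loss='hi', record_history=False)
model.fit(X_demo, y_demo)
y_demo_pred = model.predict(X_demo)
print(model.coefficients)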
1. Basic Usage: Robust Polynomial Regression¶
Let’s compare the Machine Gnostics PolynomialRegressor with standard polynomial regression on a dataset with outliers.
Basic Polynomial Regression
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from machinegnostics.models.regression import PolynomialRegressor
# Set random seed for reproducibility
np.random.seed(42)
# Generate data
X = np.linspace(0, 2, 10).reshape(-1, 1)
y = 2.0 * np.exp(1.8 * X.ravel()) + np.random.normal(0, 0.2, 10)
y[8:] += [80.0, -8.0] # Add outliers
# Create test points for smooth curve
X_test = np.linspace(0, 2, 100).reshape(-1, 1)
# Fit regular polynomial regression
degree = 2
poly_reg = make_pipeline(PolynomialFeatures(degree), LinearRegression())
poly_reg.fit(X, y)
y_pred_regular = poly_reg.predict(X)
y_pred_regular_test = poly_reg.predict(X_test)
# Fit robust Machine Gnostics regression
mg_model = PolynomialRegressor(degree=degree)
mg_model.fit(X, y.flatten())
y_pred_robust = mg_model.predict(X)
y_pred_robust_test = mg_model.predict(X_test)
print(f'model coeff: {mg_model.coefficients}')
# Calculate residuals
residuals_regular = y - y_pred_regular
residuals_robust = y - y_pred_robust
# Create figure with subplots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 15), height_ratios=[2, 1])
# Plot regression curves
ax1.scatter(X, y, color='gray', label='Data', zorder=2)
ax1.scatter(X[8:], y[8:], color='red', s=100, label='Outliers', zorder=3)
ax1.plot(X_test, y_pred_regular_test, 'b--', label='Regular Polynomial', zorder=1)
ax1.plot(X_test, y_pred_robust_test, 'r-', label='Robust MG Regression', zorder=1)
ax1.set_xlabel('X')
ax1.set_ylabel('y')
ax1.set_title('Comparison: Regular vs Robust Machine Gnostics Polynomial Regression')
ax1.legend()
ax1.grid(True, alpha=0.3)
# Plot residuals
ax2.scatter(X, residuals_regular, color='blue', label='Regular Residuals', alpha=0.6)
ax2.scatter(X, residuals_robust, color='red', label='Robust Residuals', alpha=0.6)
ax2.axhline(y=0, color='k', linestyle='--', alpha=0.3)
ax2.set_xlabel('X')
ax2.set_ylabel('Residuals')
ax2.set_title('Residual Plot')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# Print mean squared error for both methods (excluding outliers)
mse_regular = np.mean((y_pred_regular[:-2] - y[:-2])**2)
mse_robust = np.mean((y_pred_robust[:-2] - y[:-2])**2)
print(f"MSE (excluding outliers):")
print(f"Regular Polynomial: {mse_regular:.4f}")
print(f"Robust MG Regression: {mse_robust:.4f}")
# Print max absolute residuals (excluding outliers)
max_resid_regular = np.max(np.abs(residuals_regular[:-2]))
max_resid_robust = np.max(np.abs(residuals_robust[:-2]))
print(f"\nMax Absolute Residuals (excluding outliers):")
print(f"Regular Polynomial: {max_resid_regular:.4f}")
print(f"Robust MG Regression: {max_resid_robust:.4f}")
Output:
MSE (excluding outliers):
Regular Polynomial: 63.8383
Robust MG Regression: 1.0044
Max Absolute Residuals (excluding outliers):
Regular Polynomial: 17.5910
Robust MG Regression: 1.3305
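The fitted model can also be saved for later reuse (model persistence is listed under Key Features). The library's own persistence utilities are documented in the API Reference; the sketch below is only a generic fallback, assuming the fitted estimator can be serialized with Python's standard pickle module.
import pickle
# Save the fitted robust model (assumes the estimator is picklable;
# check the API Reference for dedicated save/load helpers)
with open('mg_poly_model.pkl', 'wb') as f:
    pickle.dump(mg_model, f)
# Reload and predict with the restored model
with open('mg_poly_model.pkl', 'rb') as f:
    mg_restored = pickle.load(f)
print(mg_restored.predict(X_test[:5]))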
2. Custom Gnostic Loss and Training History¶
For advanced users, the PolynomialRegressor supports custom gnostic loss functions, adaptive weighting, and detailed training history for analysis and visualization.
Advanced: Custom Loss and Training History
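A minimal, illustrative sketch follows. It assumes that loss='hj' selects one of the alternative gnostic losses mentioned above, that record_history=True stores per-iteration records on the fitted model under a history attribute, and that each record carries a 'loss' entry; these attribute and key names are assumptions, so check the API Reference for the exact names in your version.
# Advanced usage sketch: custom gnostic loss + training history (illustrative only)
import matplotlib.pyplot as plt
from machinegnostics.models.regression import PolynomialRegressor
# X, y and degree are reused from the basic example above
mg_adv = PolynomialRegressor(degree=degree, loss='hj', record_history=True)
mg_adv.fit(X, y)
print(f'model coeff: {mg_adv.coefficients}')
# Attribute name for the recorded history is assumed; consult the API Reference
history = getattr(mg_adv, 'history', None)
if history:
    print(f'recorded {len(history)} training iterations')
    # If each record is a dict with a 'loss' entry (assumed), plot convergence
    losses = [rec['loss'] for rec in history if isinstance(rec, dict) and 'loss' in rec]
    if losses:
        plt.plot(losses, marker='o')
        plt.xlabel('Iteration')
        plt.ylabel('Gnostic loss')
        plt.title('Training loss per iteration')
        plt.grid(True, alpha=0.3)
        plt.show()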
3. Cross-Validation and Gnostic Mean Squared Error¶
Cross-validation is essential for evaluating model generalization. Machine Gnostics provides a CrossValidator for robust, assumption-free validation, and a gnostic version of mean squared error (MSE) that uses the gnostic mean instead of the statistical mean.
The gnostic mean is a robust, assumption-free measure designed to provide deeper insight and reliability, especially in the presence of outliers or non-normal data. This ensures that error metrics reflect the true structure and diagnostic properties of your data, in line with the principles of Mathematical Gnostics.
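As a quick illustration before cross-validating, the gnostic mean_squared_error can be compared directly against a plain NumPy MSE on the robust fit from Section 1. The sketch below assumes only the (y_true, y_pred) call signature that the cross-validation example further down also relies on.
from machinegnostics.metrics import mean_squared_error
# Gnostic MSE (built on the gnostic mean) vs classical MSE (arithmetic mean)
gnostic_mse = mean_squared_error(y, y_pred_robust)
classical_mse = np.mean((y - y_pred_robust) ** 2)
print(f"Gnostic MSE:   {gnostic_mse:.4f}")
print(f"Classical MSE: {classical_mse:.4f}")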
Cross-Validation with Gnostic and Regular Metrics
# cross validation example (optional)
from machinegnostics.models import CrossValidator
from machinegnostics.metrics import mean_squared_error, root_mean_squared_error, mean_absolute_error
# classical (non-gnostic) mean squared error for comparison
def normal_mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)
# Define cross-validator
cv = CrossValidator(model=mg_model, X=X, y=y, k=5, random_seed=42)
# Cross-validation with gnostic mean absolute error
cv_results = cv.evaluate(mean_absolute_error)
print("\nCross-Validation Results (Gnostic Mean Absolute Error):")
for fold, mae in enumerate(cv_results, 1):
    print(f"Fold {fold}: {mae:.4f}")
# Cross-validation with gnostic root mean squared error
cv_rmse = CrossValidator(model=mg_model, X=X, y=y, k=5, random_seed=42)
cv_results_rmse = cv_rmse.evaluate(root_mean_squared_error)
print("\nCross-Validation Results (Root Mean Squared Error):")
for fold, rmse in enumerate(cv_results_rmse, 1):
    print(f"Fold {fold}: {rmse:.4f}")
# Cross-validation with gnostic mean squared error
cv_mse = CrossValidator(model=mg_model, X=X, y=y, k=5, random_seed=42)
cv_results_mse = cv_mse.evaluate(mean_squared_error)
print("\nCross-Validation Results (Mean Squared Error):")
for fold, mse in enumerate(cv_results_mse, 1):
    print(f"Fold {fold}: {mse:.4f}")
# Cross-validation with the classical (normal) MSE for comparison
cv_normal = CrossValidator(model=mg_model, X=X, y=y, k=5, random_seed=42)
cv_results_normal = cv_normal.evaluate(normal_mse)
print("\nCross-Validation Results (Regular MSE):")
for fold, mse in enumerate(cv_results_normal, 1):
    print(f"Fold {fold}: {mse:.4f}")
Note:
- The mean_squared_error function from Machine Gnostics computes MSE using the gnostic mean, which is more robust to outliers and non-Gaussian data than the traditional arithmetic mean. More gnostic metrics are described in the metrics documentation.
- Use gnostic metrics for deeper, more reliable diagnostics in challenging data scenarios.
Tips¶
- Use PolynomialRegressor for robust polynomial regression, especially when data may contain outliers or non-Gaussian noise.
- Adjust the degree parameter for higher-order polynomial fits.
- Use the loss parameter to select different gnostic loss functions for event-level robustness.
- Enable record_history=True to analyze training dynamics and convergence.
- For more advanced usage and parameter tuning, see the API Reference.
Next: Explore more tutorials and real-world examples in the Examples section!