DataHomogeneity: Homogeneity Analysis for EGDF (Machine Gnostics)
The DataHomogeneity class provides robust numerical homogeneity analysis for a fitted Estimating Global Distribution Function (EGDF) by examining the shape and characteristics of its probability density function (PDF). It is designed to detect outliers, clusters, and other non-homogeneous structure in data using the principles of gnostic theory.
Overview
DataHomogeneity analyzes the fitted EGDF's PDF to determine if the underlying data is homogeneous. Homogeneity is defined by the presence of a single global maximum (unimodal PDF) and the absence of negative density values. The class uses robust peak detection, configurable smoothing, and comprehensive diagnostics to provide reliable results.
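The two criteria above (a single maximum, no negative density) can be illustrated on a PDF sampled on a grid. The following is a minimal sketch under stated assumptions, not the class's actual implementation: the function names local_maxima and is_homogeneous_pdf are hypothetical, and strict inequalities are used, so flat-topped peaks are not handled.

```python
def local_maxima(pdf):
    """Return indices of strict interior local maxima of a sampled PDF."""
    return [
        i for i in range(1, len(pdf) - 1)
        if pdf[i] > pdf[i - 1] and pdf[i] > pdf[i + 1]
    ]


def is_homogeneous_pdf(pdf, min_height_ratio=0.01):
    """Hypothetical check: no negative density and exactly one significant maximum."""
    if any(v < 0 for v in pdf):
        return False  # negative density values rule out homogeneity
    threshold = min_height_ratio * max(pdf)
    significant = [i for i in local_maxima(pdf) if pdf[i] >= threshold]
    return len(significant) == 1


# Hand-made sample PDFs (illustrative values, not EGDF output)
unimodal = [0.0, 0.1, 0.4, 0.9, 0.4, 0.1, 0.0]
bimodal = [0.0, 0.5, 0.1, 0.05, 0.3, 0.9, 0.2, 0.0]
negative = [0.0, 0.2, 0.8, 0.2, -0.05, 0.0]

print(is_homogeneous_pdf(unimodal))  # True: one maximum, no negative values
print(is_homogeneous_pdf(bimodal))   # False: two maxima
print(is_homogeneous_pdf(negative))  # False: contains a negative value
```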
Gnostic vs. Statistical Homogeneity:
Gnostic homogeneity analysis is based on the algebraic and geometric properties of the data and the EGDF, not on statistical or probabilistic assumptions. It is deterministic, reproducible, and sensitive to both outliers and clusters, making it fundamentally different from classical statistical homogeneity tests.
- Assumption-Free: No parametric or probabilistic assumptions.
- Numerical: Decisions are made based on numerical analysis, not visual inspection.
- Robust: Detects outliers and clusters via PDF maxima.
- Diagnostic: Tracks errors, warnings, and analysis parameters.
- Memory-Efficient: Optional flushing of large arrays after analysis.
- Visualization: Built-in plotting for PDF and detected maxima.
Key Features
- Automatic EGDF validation and homogeneity testing
- Robust peak detection with configurable smoothing
- Comprehensive error and warning tracking
- Memory management with optional data flushing
- Detailed visualization of analysis results
- Integration with EGDF parameter systems
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| gdf | EGDF | required | Fitted EGDF object (must have catch=True and be fitted) |
| verbose | bool | True | Print detailed progress, warnings, and results |
| catch | bool | True | Store all analysis results and metadata |
| flush | bool | False | Clear large arrays after analysis to save memory |
| smoothing_sigma | float | 1.0 | Gaussian smoothing parameter for PDF preprocessing |
| min_height_ratio | float | 0.01 | Minimum relative height threshold for peak detection |
| min_distance | int or None | None | Minimum separation between detected peaks (auto if None) |
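To give a feel for what the three peak-detection parameters control, here is a rough pure-Python sketch. It is an illustration under assumptions, not the library's code: gaussian_smooth and detect_peaks are hypothetical helpers, the kernel is truncated at three sigma with edge clamping, and the automatic choice of min_distance (the None default) is not modeled.

```python
import math


def gaussian_smooth(values, sigma):
    """Smooth a sequence with a normalized Gaussian kernel truncated at 3*sigma."""
    if sigma <= 0:
        return list(values)
    radius = max(1, int(3 * sigma + 0.5))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(kernel)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # clamp indices at the array edges
            acc += (w / total) * values[j]
        smoothed.append(acc)
    return smoothed


def detect_peaks(values, min_height_ratio=0.01, min_distance=1):
    """Return sorted peak indices; taller peaks win within min_distance samples."""
    threshold = min_height_ratio * max(values)
    candidates = [
        i for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
        and values[i] >= threshold
    ]
    kept = []
    for i in sorted(candidates, key=lambda idx: values[idx], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


# A single bump with a small ripple on its left flank (illustrative values)
noisy = [0.0, 0.1, 0.3, 0.28, 0.6, 0.9, 0.6, 0.3, 0.1, 0.0]

print(detect_peaks(noisy))                        # ripple counted as a peak
print(detect_peaks(gaussian_smooth(noisy, 1.0)))  # smoothing removes the ripple
print(detect_peaks(noisy, min_height_ratio=0.5))  # height threshold removes it
print(detect_peaks(noisy, min_distance=4))        # distance rule removes it
```

In this sketch each parameter suppresses the spurious ripple in a different way, which is why raising any of them too far can also hide a genuine secondary cluster.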
Attributes
- is_homogeneous: bool or None. Primary analysis result (None before fit, True/False after analysis)
- picks: List[Dict]. Detected maxima with detailed information (index, position, value, global flag)
- z0: float or None. Global optimum value from EGDF or detected from PDF
- global_extremum_idx: int or None. Array index of the global maximum
- fitted: bool. Indicates if analysis has been completed
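The relationship between picks, z0, and global_extremum_idx can be sketched as follows. This is an assumption-laden illustration: the dictionary keys (index, position, value, is_global) are hypothetical names for the four pieces of information listed above, build_picks is not a library function, and z0 is taken here to be the position of the tallest maximum.

```python
def build_picks(positions, pdf, peak_indices):
    """Describe each maximum as a dict and flag the tallest one as global."""
    if not peak_indices:
        return [], None, None
    global_idx = max(peak_indices, key=lambda i: pdf[i])
    picks = [
        {
            "index": i,                    # array index of the maximum
            "position": positions[i],      # data-axis location
            "value": pdf[i],               # PDF height at the maximum
            "is_global": i == global_idx,  # True only for the tallest peak
        }
        for i in peak_indices
    ]
    return picks, positions[global_idx], global_idx


# Illustrative grid and PDF values with maxima at indices 1 and 4
positions = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
pdf = [0.0, 0.3, 0.1, 0.2, 0.9, 0.4, 0.0]

picks, z0, global_extremum_idx = build_picks(positions, pdf, [1, 4])
for pick in picks:
    print(pick)
print(z0, global_extremum_idx)
```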
Methods
fit(plot=False)
Performs comprehensive homogeneity analysis on the EGDF object.
- plot: bool (optional). If True, generates plots for visual inspection of the analysis results.
Returns: bool. True if data is homogeneous, False otherwise.
results()
Retrieves comprehensive homogeneity analysis results and metadata.
Returns: dict. Contains keys such as 'is_homogeneous', 'picks', 'z0', 'global_extremum_idx', 'analysis_parameters', 'gdf_parameters', 'errors', 'warnings'.
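A results dictionary of this shape can be condensed into a short report. The sketch below runs on a hand-written stand-in dictionary rather than real library output; summarize is a hypothetical helper, and it assumes that 'errors' and 'warnings' hold lists, which the documentation does not state explicitly.

```python
def summarize(results):
    """Build a one-line report from a results-style dictionary."""
    status = {True: "homogeneous", False: "not homogeneous"}.get(
        results.get("is_homogeneous"), "not analyzed"
    )
    n_maxima = len(results.get("picks", []))
    n_warn = len(results.get("warnings", []))
    n_err = len(results.get("errors", []))
    return f"{status}: {n_maxima} maxima, {n_warn} warnings, {n_err} errors"


# Stand-in for the dictionary returned by results(); all values are made up
mock_results = {
    "is_homogeneous": False,
    "picks": [{"index": 3}, {"index": 17}],
    "z0": 4.2,
    "global_extremum_idx": 17,
    "analysis_parameters": {"smoothing_sigma": 1.0},
    "gdf_parameters": {},
    "errors": [],
    "warnings": ["multiple maxima detected"],
}

print(summarize(mock_results))
```

Using .get with defaults keeps the helper usable even if some keys are absent, for example when the analysis has not been run yet.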
plot(figsize=(12, 8), title=None)
Visualizes the PDF, detected maxima, and homogeneity status.
- figsize: tuple (default: (12, 8)). Figure size in inches.
- title: str or None. Custom plot title.
Returns: None (displays plot)
Example Usage
import numpy as np
from machinegnostics.magcal import EGDF, DataHomogeneity
# Homogeneous data
data = np.array([ -13.5, 0, 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
egdf = EGDF(data=data, catch=True)
egdf.fit()
# Homogeneity analysis
homogeneity = DataHomogeneity(egdf, verbose=True)
is_homogeneous = homogeneity.fit(plot=True)
print(f"Data is homogeneous: {is_homogeneous}")
# Access results
results = homogeneity.results()
print(f"Number of maxima detected: {len(results['picks'])}")
Notes
- Only supports EGDF objects (not QGDF, ELDF, or QLDF)
- Homogeneity is defined by a single global maximum and no negative PDF values
- Outliers and clusters are detected as additional maxima
- Numerical analysis is preferred over visual inspection for reliability
- Use flush=True for large datasets to save memory
- All errors and warnings are tracked in the results dictionary
Author: Nirmal Parmar
Date: 2025-09-24