DataCluster: Advanced Cluster Boundary Detection for Gnostic Distribution Functions (Machine Gnostics)

The DataCluster class identifies the lower and upper boundaries of the main cluster (LCB and UCB) from the probability density function (PDF) of a fitted Gnostic Distribution Function (GDF): ELDF, EGDF, QLDF, or QGDF. It combines normalized PDF analysis, derivative-based methods, and shape detection algorithms to locate these boundaries robustly.


Overview

DataCluster provides automated, robust cluster boundary detection for GDFs. It adapts its algorithm based on the type of GDF, using derivative thresholds, valley detection, and slope analysis to find the main cluster region. The class is designed for scientific, engineering, and data science applications where interpretable cluster boundaries are needed.

  • Supports All GDF Types: ELDF, EGDF, QLDF, QGDF.
  • PDF Normalization: Ensures consistent analysis across distributions.
  • Shape Detection: W-shape/U-shape/heterogeneous detection for QLDF.
  • Derivative-Based Boundaries: Uses first and second derivatives for boundary detection.
  • Fallback Strategies: Falls back to data bounds if boundary detection fails.
  • Diagnostics: Tracks errors, warnings, and method details.
  • Visualization: Plots PDF, boundaries, and derivative analysis.
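
The PDF normalization step listed above can be pictured as ordinary min-max scaling. The following is a minimal sketch under that assumption, not the library's internal code; the helper name min_max_normalize and the sample values are made up for illustration.

```python
import numpy as np

def min_max_normalize(pdf):
    """Rescale PDF values to the range [0, 1] (hypothetical helper)."""
    pdf = np.asarray(pdf, dtype=float)
    span = pdf.max() - pdf.min()
    if span == 0:
        # A flat PDF carries no shape information; avoid dividing by zero.
        return np.zeros_like(pdf)
    return (pdf - pdf.min()) / span

# Made-up PDF samples, used only to show the effect of the scaling.
pdf_values = np.array([0.02, 0.10, 0.25, 0.40, 0.25, 0.10, 0.02])
pdf_norm = min_max_normalize(pdf_values)
flat_norm = min_max_normalize([0.3, 0.3, 0.3])
```

After scaling, the peak sits at 1 and the lowest point at 0 whatever the original PDF height, which is what lets a single threshold value be applied across different distributions.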

Key Features

  • Automated cluster boundary detection for GDFs
  • PDF normalization and robust derivative analysis
  • Shape-based valley detection for QLDF
  • Adaptive thresholding and slope analysis
  • Comprehensive error handling and diagnostics
  • Visualization of PDF, boundaries, and cluster regions
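
To make the idea of derivative-based boundary detection concrete, here is a rough sketch of one way such a search could work: start at the steepest flank on each side of the PDF peak and walk outward until the slope flattens below a threshold. This is an illustrative assumption, not DataCluster's actual algorithm; the function derivative_boundaries and the Gaussian test curve are hypothetical.

```python
import numpy as np

def derivative_boundaries(x, pdf_norm, threshold=0.01):
    """Illustrative boundary search on a normalized PDF (not the library's method)."""
    x = np.asarray(x, dtype=float)
    pdf_norm = np.asarray(pdf_norm, dtype=float)
    slope = np.gradient(pdf_norm, x)      # first derivative on the sample grid
    peak = int(np.argmax(pdf_norm))
    lower, upper = x[0], x[-1]            # fall back to the data bounds

    if peak > 0:
        # Start at the steepest rise left of the peak and walk outward.
        start = int(np.argmax(slope[:peak + 1]))
        for i in range(start, -1, -1):
            if abs(slope[i]) < threshold:
                lower = x[i]
                break

    if peak < len(x) - 1:
        # Start at the steepest fall right of the peak and walk outward.
        start = peak + int(np.argmin(slope[peak:]))
        for i in range(start, len(x)):
            if abs(slope[i]) < threshold:
                upper = x[i]
                break

    return float(lower), float(upper)

# Synthetic bell-shaped curve standing in for a fitted PDF.
x = np.linspace(-10, 10, 401)
pdf = np.exp(-0.5 * x**2)
pdf_norm = (pdf - pdf.min()) / (pdf.max() - pdf.min())
lcb, ucb = derivative_boundaries(x, pdf_norm)
```

In this sketch the threshold is a slope per unit of x, so its effect depends on the scale of the data. If no flat point is found, the search returns the first and last grid points, mirroring the fallback strategy described above.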

Parameters

Parameter             Type                  Default   Description
gdf                   ELDF/EGDF/QLDF/QGDF   required  Fitted GDF object with pdf_points available
verbose               bool                  False     Print detailed logs and diagnostics
catch                 bool                  True      Store errors, warnings, and results
derivative_threshold  float                 0.01      Threshold for ELDF/EGDF boundary detection
slope_percentile      int                   70        Percentile for QLDF/QGDF slope-based detection
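
As a rough illustration of what a slope percentile means, the sketch below turns a percentile into an absolute slope cutoff and flags the grid points that are steeper than it. How DataCluster uses slope_percentile internally is not specified here, so treat this as an assumption; steep_mask and the test curve are hypothetical.

```python
import numpy as np

def steep_mask(x, pdf_norm, slope_percentile=70):
    """Flag points whose absolute slope exceeds the given percentile (illustrative)."""
    abs_slope = np.abs(np.gradient(np.asarray(pdf_norm, dtype=float),
                                   np.asarray(x, dtype=float)))
    cutoff = np.percentile(abs_slope, slope_percentile)
    return abs_slope > cutoff, float(cutoff)

# Synthetic curve whose slope grows steadily from left to right.
x = np.linspace(0.0, 1.0, 101)
curve = x**2
mask, cutoff = steep_mask(x, curve, slope_percentile=70)
```

With a percentile of 70, roughly the steepest 30% of grid points are flagged. Raising the percentile makes the criterion stricter, lowering it makes it more permissive.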

Attributes

  • LCB: float or None. Cluster Lower Boundary (left boundary of main cluster)
  • UCB: float or None. Cluster Upper Boundary (right boundary of main cluster)
  • z0: float or None. Characteristic point of the distribution
  • S_opt: float or None. Optimal scale parameter from GDF
  • pdf_normalized: ndarray or None. Min-max normalized PDF values in [0, 1]
  • pdf_original: ndarray or None. Original PDF values
  • params: dict. Complete analysis results, boundaries, diagnostics, and method details
  • fitted: bool. Indicates whether clustering analysis has been completed

Methods

fit(plot=False)

Performs cluster boundary detection analysis.

  • plot: bool (optional) If True, generates a plot of the PDF, detected boundaries, and derivative analysis.

Returns: Tuple[float or None, float or None] — The detected LCB and UCB values. Returns None for a bound if it cannot be determined.
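
Because either returned bound may be None, calling code may want to guard against that before doing arithmetic with the values. A minimal sketch, assuming the caller prefers to substitute the data extremes; resolve_bounds is a hypothetical helper, not part of the library.

```python
def resolve_bounds(lcb, ucb, data):
    """Replace a missing boundary with the matching data extreme (hypothetical helper)."""
    lower = min(data) if lcb is None else lcb
    upper = max(data) if ucb is None else ucb
    return lower, upper

# Made-up case: the lower bound was not determined, the upper bound was.
sample = [-13.5, 0.0, 5.0, 10.0]
bounds = resolve_bounds(None, 8.5, sample)
```

Whether substituting the data extremes is appropriate depends on the application. Treating None as "no main cluster found" and stopping is an equally valid choice.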


results()

Returns a comprehensive cluster analysis results dictionary.

Returns: dict — Contains LCB, UCB, cluster width, GDF type, Z0, S_opt, method details, errors, and warnings.
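
The sketch below shows one defensive way to read such a dictionary. It relies only on the keys 'LCB', 'UCB', and 'cluster_width' that appear in the usage example on this page. The summarize helper and the mock dictionary are illustrative, and computing the width as UCB minus LCB when the key is missing is an assumption.

```python
def summarize(results):
    """Build a one-line summary from a results-like dict (hypothetical helper)."""
    lcb = results.get("LCB")
    ucb = results.get("UCB")
    if lcb is None or ucb is None:
        return "Main cluster boundaries could not be determined."
    width = results.get("cluster_width")
    if width is None:
        # Assumption: the width is the distance between the two boundaries.
        width = ucb - lcb
    return f"Main cluster: [{lcb:.3f}, {ucb:.3f}], width {width:.3f}"

# Mock dictionaries standing in for the output of cluster.results().
summary = summarize({"LCB": 0.0, "UCB": 10.0, "cluster_width": 10.0})
missing = summarize({"LCB": None, "UCB": 10.0})
```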


plot(figsize=(12, 8))

Creates a visualization of the PDF, detected boundaries, and derivative analysis.

  • figsize: tuple (default: (12, 8)) Figure size

Returns: None (displays plot)


Example Usage

import numpy as np
from machinegnostics.magcal import ELDF, DataCluster

# Sample data with one low outlier (-13.5)
data = np.array([-13.5, 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])

# Fit an ELDF to the data
eldf = ELDF()
eldf.fit(data)

# Detect the main cluster boundaries and plot the analysis
cluster = DataCluster(gdf=eldf, verbose=True)
LCB, UCB = cluster.fit(plot=True)

# Retrieve the full results dictionary
results = cluster.results()
print(f"Lower boundary: {results['LCB']}")
print(f"Upper boundary: {results['UCB']}")
print(f"Cluster width: {results['cluster_width']}")

Notes

  • Clustering works best with local distribution functions (ELDF, QLDF).
  • Global functions (EGDF, QGDF) have limited clustering effectiveness due to uniqueness constraints.
  • QLDF W-shape detection is effective for central clusters between outlying regions.
  • For heterogeneous data with multiple clusters, consider splitting the dataset before analysis.
  • Errors and warnings are tracked in the results dictionary.
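
Once boundaries are available, one simple way to split a dataset is to separate the points inside the main cluster from those outside it, so that each part can be analyzed on its own. The sketch below assumes inclusive boundaries; split_by_boundaries is a hypothetical helper and the boundary values are made up rather than produced by DataCluster.

```python
import numpy as np

def split_by_boundaries(data, lcb, ucb):
    """Split data into points inside [lcb, ucb] and points outside (illustrative)."""
    data = np.asarray(data, dtype=float)
    inside = (data >= lcb) & (data <= ucb)
    return data[inside], data[~inside]

data = np.array([-13.5, 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
# Made-up boundaries for illustration only.
main_cluster, outside = split_by_boundaries(data, lcb=-1.0, ucb=10.5)
```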

References

  • Gnostic Distribution Function theory and cluster analysis methods (see mathematical gnostics literature).

Author: Nirmal Parmar
Date: 2025-09-24