make_stackloss_check_data: Stack Loss Dataset¶

The make_stackloss_check_data function retrieves the classic Stack Loss dataset (Brownlee, 1965). This dataset describes the operation of a plant for the oxidation of ammonia to nitric acid and is a standard benchmark for robust regression due to the presence of well-known outliers.

Overview¶

The dataset consists of 21 operational days of a plant converting ammonia to nitric acid. The goal is typically to predict the Stack Loss (the amount of ammonia escaping unabsorbed) based on three operational variables.

Significance: Ideally suited for demonstrating robust regression methods because it contains several acknowledged outliers (observations 1, 2, 3, and 21) that can distort standard least-squares models.
Size: 21 samples, 4 variables.

Data Dictionary¶

The dataset is returned as a single matrix with the following columns:

Air Flow (Feature): Rate of operation of the plant.
Water Temp. (Feature): Cooling water inlet temperature.
Acid Conc. (Feature): Acid concentration (in per 1000 minus 500).
Stack.Loss (Target): Amount of ammonia escaping the absorption column.

Returns¶

Return	Type	Description
`data`	numpy.ndarray	The complete data array of shape `(21, 4)`.
`column_names`	list of str	The list of columns: `['Air Flow', 'Water Temp.', 'Acid Conc.', 'Stack.Loss']`

Example Usage¶

from machinegnostics.datasets import make_stackloss_check_data
import numpy as np

# Load the dataset
data, names = make_stackloss_check_data()

print(f"Data shape: {data.shape}")
print(f"Columns: {names}")

# Separate Features (X) and Target (y)
X = data[:, :3]
y = data[:, 3]

Source: Brownlee, K. A. (1965). Statistical Theory and Methodology in Science and Engineering. New York: John Wiley & Sons.

Author: Nirmal Parmar