In statistical process control (SPC), models are built from baseline data, that is, observations collected during successful production. Often the baseline data have to be extracted from a long stream of historical data that includes observations from both successful and unsuccessful production. Baseline periods have to be identified correctly to ensure that the SPC models are correct and, subsequently, that the on-line monitoring based on these models is effective.
This paper proposes a new method to identify baseline periods in a long historical dataset. The method identifies baseline periods where the quality is good, the quality variable has a stable distribution, and the time intervals are sufficiently long. The proposed method is tested on a real dataset from a melting process and yields a baseline that the process engineers consider reasonable and convincing. Simulation experiments also show that the proposed method is robust to the distribution of the quality variable: it consistently identifies the correct baseline periods across different distributions. In contrast, two existing methods of change-point identification are very sensitive to distribution assumptions.
Key Words: Agglomerative Hierarchical Clustering; Baseline; Change-Point Identification; Cluster Analysis; Data Mining; Likelihood Ratio Test; Statistical Process Control.
IN STATISTICAL PROCESS CONTROL (SPC), baseline data consist of observations of process and quality variables collected when the process is operating successfully; that is, the process is manufacturing properly and the product quality is good. The baseline data are used to construct a model (or models) of successful production. During on-line monitoring, the current observation is compared with the baseline models. If the observation is inconsistent with the models, this signals that the process needs adjustment or correction. To ensure that the model is correct and that the resulting on-line SPC monitoring system performs well, it is critical to choose the baseline data with care.
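As a rough illustration of how a baseline feeds on-line monitoring, the sketch below fits simple control limits to a baseline sample and checks new observations against them. It assumes a single quality variable and Shewhart-style limits at the baseline mean plus or minus three standard deviations; the function names, the multiplier k, and the sample values are hypothetical and are not taken from this paper.

```python
import statistics


def fit_baseline_limits(baseline, k=3.0):
    """Return (lower, upper) limits as mean +/- k sample standard deviations.

    Assumption: a univariate Shewhart-style model, used here only as an
    example of a 'baseline model'; the paper does not prescribe it.
    """
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - k * spread, center + k * spread


def is_out_of_control(observation, limits):
    """Flag an observation that falls outside the baseline limits."""
    lower, upper = limits
    return observation < lower or observation > upper


# Hypothetical baseline observations from a period of successful production.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = fit_baseline_limits(baseline)

for x in (10.05, 11.0):
    status = "signal" if is_out_of_control(x, limits) else "in control"
    print(f"observation {x}: {status}")
```

If the baseline sample were contaminated with observations from unsuccessful production, the estimated spread would widen and the limits would flag fewer genuine problems, which is why the choice of baseline periods matters.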
In practice, the baseline is selected from a historical dataset, which may consist of many observations over a long time period that includes intervals of both successful and unsuccessful production. The purpose of this paper is to devise a method to automatically extract baseline periods from a long stream of historical data to form the baseline.
Baseline periods should have a stable product-quality distribution and the most favorable quality. Also, a baseline period has to be long enough to exclude...