The traditional Mahalanobis distance is a generalized distance that can be considered a measure of the degree of similarity (or divergence) in the mean values of different characteristics of a population, taking into account the correlation among those characteristics. It has been used for many years in clustering, classification, and discriminant analysis. The Mahalanobis distance is attributable to Prof. P. C. Mahalanobis, founder of the Indian Statistical Institute, some 60 years ago. It has been used for various types of pattern recognition, e.g. inspection systems, face and voice recognition systems, counterfeit detection systems, etc. The figure below displays data published by Fisher (1936) and a cluster analysis in which classification into three predetermined categories is demonstrated.
Another generalized distance most engineers have encountered is the Euclidean distance between two multivariate points p and q. If p = (p1, p2, …, pn) and q = (q1, q2, …, qn) are two points in Euclidean n-space, then the distance from p to q, or from q to p, is given by:

$$d(p, q) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \cdots + (q_n - p_n)^2} = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$
No consideration is given to the correlation between characteristics in Euclidean distance calculations.
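The contrast between the two distances can be made concrete with a small sketch. It assumes the usual textbook form of the Mahalanobis distance, sqrt((p − q)ᵀ S⁻¹ (p − q)), where S is the covariance matrix of the characteristics. The covariance matrix, the centroid, and the two test points below are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical covariance of two strongly correlated characteristics.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
cov_inv = np.linalg.inv(cov)
centroid = np.array([0.0, 0.0])  # hypothetical centre of the reference data

def euclidean(p, q):
    """Euclidean distance: ignores correlation between characteristics."""
    d = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum(d ** 2)))

def mahalanobis(p, q, cov_inv):
    """Mahalanobis distance: weights the difference by the inverse covariance."""
    d = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    return float(np.sqrt(d @ cov_inv @ d))

along = np.array([1.5, 1.5])     # moves with the correlation pattern
against = np.array([1.5, -1.5])  # moves against the correlation pattern

for name, point in [("along", along), ("against", against)]:
    print(name,
          round(euclidean(point, centroid), 2),
          round(mahalanobis(point, centroid, cov_inv), 2))
# Both points have Euclidean distance of about 2.12, while the Mahalanobis
# distance is about 1.54 for "along" and about 6.71 for "against".
```

Both points are equally far from the centroid in the Euclidean sense, but the point that breaks the correlation pattern is much farther away in the Mahalanobis sense.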
Dr. G. Taguchi of Ohken Associates, Japan, developed an innovative method for determining the generalized distance from the centroid of a reference group (of multivariate data) to a multivariate point. For example, if a doctor had a group of very healthy patients whose vital characteristics (blood pressure, body temperature, skin color, heart rate, respiration rate, etc.) were all considered exemplary, then the doctor could define a Mahalanobis space, a reference space, with those healthy people, use the centroid as the zero point, and define a unit distance for a continuous degree-of-health scale. If a not-so-healthy person came to the same doctor and the same characteristics were measured, that person would have a Mahalanobis distance (MHD) number much higher than those of the reference group. The MHD number would indicate the patient's generalized distance from the centroid of the healthy group. As time passed, the MHD number for the not-so-healthy patient could increase or decrease, depending on whether the patient's health was failing or improving, respectively. In general, very healthy people tend to look quite similar, while unhealthy people tend to look quite different from one another (and from the healthy group). In addition, changes in the correlation structure among the unhealthy patients' characteristics strongly affect their MHD numbers. If a person's MHD number reached a predetermined high threshold value, for example, the doctor might recommend hospitalization. If the MHD became similar to those of the reference group, the patient could be advised to return only for occasional check-ups.
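The construction of a reference space with a zero point and a unit distance can be sketched as follows. This is a minimal illustration, not necessarily the exact procedure used by the author: it assumes the characteristics are standardized with the reference group's mean and standard deviation, that the inverse correlation matrix of the reference group is used, and that the squared distance is divided by the number of characteristics k so that the reference group averages 1. The function names, the three characteristics, and all numbers are hypothetical.

```python
import numpy as np

def build_reference_space(reference):
    """Return mean, standard deviation, and inverse correlation matrix
    of the reference (healthy) group; rows are people, columns are characteristics."""
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)                  # population standard deviation
    corr = np.corrcoef(reference, rowvar=False)  # correlation among characteristics
    return mean, std, np.linalg.inv(corr)

def scaled_md(samples, space):
    """Squared Mahalanobis distance divided by k (assumed scaling convention)."""
    mean, std, corr_inv = space
    z = (np.atleast_2d(np.asarray(samples, dtype=float)) - mean) / std
    k = z.shape[1]
    return np.einsum("ij,jk,ik->i", z, corr_inv, z) / k

# Hypothetical healthy group: systolic pressure, heart rate, body temperature.
rng = np.random.default_rng(1)
healthy = rng.multivariate_normal(
    mean=[120.0, 70.0, 36.8],
    cov=[[64.0, 20.0, 0.0], [20.0, 49.0, 0.0], [0.0, 0.0, 0.04]],
    size=200,
)

space = build_reference_space(healthy)
reference_md = scaled_md(healthy, space)

# Hypothetical not-so-healthy patient measured on the same characteristics.
patient_md = scaled_md([165.0, 60.0, 38.5], space)[0]

print("mean MD of reference group:", round(float(reference_md.mean()), 3))
print("patient MD is far above the reference group:", patient_md > reference_md.max())
```

With this scaling the reference group averages exactly 1, which serves as the unit distance, and the centroid itself has distance 0.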
From any number of multivariate characteristics measured, it is possible to readily identify those characteristics that are most important (in a Pareto sense). Reducing the cost of measurement is an important consideration for many enterprises. There is usually a subset of measurements that provides all the data necessary to make correct decisions. Strong correlations between measurements make it possible to eliminate measures that add little value. The information contained in a handful of multivariate measurements may be sufficient to identify abnormal conditions.
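One simple way to rank characteristics is sketched below. It is a swapped-in leave-one-out ranking, not necessarily the selection procedure the author has in mind. It assumes that a set of known abnormal samples is available, and it measures how much the average squared Mahalanobis distance of those samples drops when one characteristic is removed. The data and the four characteristics (a, b, c, d) are hypothetical.

```python
import numpy as np

def squared_md(samples, reference):
    """Squared Mahalanobis distance of each sample from the reference centroid."""
    reference = np.asarray(reference, dtype=float)
    diff = np.atleast_2d(np.asarray(samples, dtype=float)) - reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(reference, rowvar=False)))
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def leave_one_out_contribution(reference, abnormal):
    """Average loss in squared distance of the abnormal samples when each
    characteristic is dropped in turn; a larger loss means a more important one."""
    reference = np.asarray(reference, dtype=float)
    abnormal = np.asarray(abnormal, dtype=float)
    full = squared_md(abnormal, reference).mean()
    contributions = []
    for j in range(reference.shape[1]):
        keep = [c for c in range(reference.shape[1]) if c != j]
        reduced = squared_md(abnormal[:, keep], reference[:, keep]).mean()
        contributions.append(full - reduced)
    return np.array(contributions)

rng = np.random.default_rng(2)

def make_group(n, shift):
    """Hypothetical data: b nearly duplicates a, d is unrelated to the abnormality."""
    a = rng.normal(size=n) + shift
    b = a + 0.1 * rng.normal(size=n)
    c = rng.normal(size=n) + shift
    d = rng.normal(size=n)
    return np.column_stack([a, b, c, d])

reference = make_group(300, shift=0.0)
abnormal = make_group(50, shift=3.0)

contributions = leave_one_out_contribution(reference, abnormal)
ranking = np.argsort(contributions)[::-1]  # most important characteristic first
print("contribution per characteristic:", np.round(contributions, 2))
print("ranking (column indices):", ranking)
```

In this toy data, dropping either a or b costs little because each carries nearly the same information as the other, while dropping c removes most of the separation. That mirrors the point above about strongly correlated measurements.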
A medical trend chart of MHD illustrates the relative level of health of a person as a function of time. For example, daily collection of data for a patient, along with daily estimation of MHD, could be used to track overall improvement (or deterioration) in health. Increasing trends could be used for prognostics, to initiate preventive countermeasures before a threshold condition is reached. The corrective effect of the countermeasure could be captured in the MHD numbers from the following days. Multivariate process control charts, like Shewhart and CUSUM charts, are similar, but these are based on probabilistic control limits derived from various statistical distribution assumptions. No such assumptions are made with MHD. Rather, consideration of costs is used to set limits.
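The prognostic use of such a trend chart can be sketched with a straight-line fit over the most recent days. This is only an illustration under stated assumptions: the linear trend, the window length, the daily MHD values, and the threshold of 4.0 are all hypothetical. In practice the threshold would come from cost considerations, as described above.

```python
import numpy as np

def days_until_threshold(md_history, threshold, window=5):
    """Estimate how many days remain before the MHD trend reaches the threshold.

    Fits a straight line to the last `window` daily values. Returns 0.0 if the
    latest value is already at or above the threshold, and None if the recent
    trend is flat or decreasing.
    """
    recent = np.asarray(md_history[-window:], dtype=float)
    days = np.arange(len(recent))
    slope, intercept = np.polyfit(days, recent, 1)
    if recent[-1] >= threshold:
        return 0.0
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, float(crossing_day - days[-1]))

# Hypothetical daily MHD values for one patient.
worsening = [1.1, 0.9, 1.2, 1.6, 2.1, 2.4, 3.0]
recovering = [3.0, 2.5, 2.0, 1.5, 1.2]

print(days_until_threshold(worsening, threshold=4.0))   # roughly 2.4 days
print(days_until_threshold(recovering, threshold=4.0))  # None
```

A positive estimate could trigger a preventive countermeasure, and the following days' MHD values would then show whether the countermeasure worked.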
For manufactured products, multivariate measures from testing are typically collected following final assembly. If we assume that the health of a manufactured product is analogous to the health of a patient, we could use similar methods to identify abnormal conditions and calculate a continuous MHD number for the multivariate condition. By collecting a group of manufactured systems with exemplary performance, a Mahalanobis space could be constructed from their multivariate characteristics. A zero point and unit distance scale would be estimated as before. The system's health could be diagnosed at t=0, just after assembly, and again later at intervals dictated by a data collection schedule. The manufactured product could easily be classified as normal or abnormal at t=0, and its tendency to become abnormal could be tracked.
The MHD measure can be applied to many interesting industrial problems, including fault detection, fault isolation, degradation identification, and prognostics. For example, an air bag deployment decision relies on the ability first to establish a reference space for normal everyday driving, and then to release the air bags when multivariate shock loads and accelerations exceed a threshold value. This is fault detection. Fire alarms should actuate when fire conditions exist over and above those expected from simple kitchen cooking or cigarette smoking. A multivariate reference space would be built from normal cooking conditions, and an abnormal fire condition would be declared above some threshold value. The tendency to fail of a high-volume printer with multivariate sensor data could be inspected periodically, and a service agent could be dispatched, or an electronic countermeasure applied, before the customer ever noticed. Without the fault downtime, availability of the printer would be higher, and so would customer satisfaction.