Regression Analysis — Outliers and other Influential Observations


Exhibit 33.30 Box and Whiskers plot. This indicates whether the distribution is skewed and reveals outliers, i.e., values lying beyond the whiskers (data point #5).

An outlier is an observation that is distant from other observations. It may result from measurement error, in which case it should be discarded. Or it may indicate a heavy-tailed population distribution, which violates the assumption of normality.

Box and whisker plots, such as the one shown in Exhibit 33.30, reveal outliers in a univariate assessment. For pairs of variables, outliers appear as isolated points on the outskirts of scatterplots. For more than two variables, statistical techniques such as the Mahalanobis D² measure may be used to detect outliers.
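The univariate and multivariate checks described above can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the procedure behind Exhibit 33.30: the whisker length of 1.5 times the interquartile range is the common Tukey convention (the exhibit does not state its rule), the function names `iqr_outliers` and `mahalanobis_d2` are hypothetical, and all data values are made up.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Boolean mask of values lying beyond the box-plot whiskers.

    Assumption: whiskers extend k * IQR beyond the quartiles (k = 1.5 is
    the usual convention). Quartiles use NumPy's default interpolation;
    other quantile conventions shift the fences slightly.
    """
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def mahalanobis_d2(X):
    """Squared Mahalanobis distance (D^2) of each row from the sample mean."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))  # sample covariance
    # Row-wise quadratic form: diff_i^T * inv_cov * diff_i
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Made-up univariate sample: the fifth value sits far from the rest.
values = [10, 12, 11, 13, 40, 12, 11]
mask = iqr_outliers(values)
print(np.flatnonzero(mask) + 1)  # 1-based positions of flagged points

# Made-up three-variable sample: row index 5 is far from the cluster.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 1.5, 3.5],
    [1.5, 2.5, 2.8],
    [2.2, 2.1, 3.1],
    [1.8, 1.9, 3.3],
    [8.0, 0.5, 9.0],
    [1.2, 2.3, 2.9],
    [2.1, 1.7, 3.4],
])
d2 = mahalanobis_d2(X)
print(np.round(d2, 2))
```

Under multivariate normality, D² values are often compared against a chi-square quantile with degrees of freedom equal to the number of variables. With samples as small as this one, that cutoff is only a rough guide.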

Influential observations are any observations, outliers included, that have a disproportionate effect on the regression results. These need to be examined carefully and should be removed unless there is a rationale for retaining them.
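The text does not name a specific diagnostic for influence. One widely used measure is Cook's distance, which combines each observation's residual with its leverage. The sketch below computes it for a simple linear regression using NumPy only. The function name `cooks_distance` is hypothetical, the data are made up, and the "greater than 1" rule of thumb mentioned in the comments is a common convention rather than anything stated in the source.

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's distance for each observation in a simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # intercept + slope
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares fit
    resid = y - X @ beta
    # Leverage values: diagonal of the hat matrix X (X'X)^-1 X'
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    mse = resid @ resid / (n - p)
    return resid**2 / (p * mse) * h / (1.0 - h) ** 2

# Made-up data: the first seven points follow roughly y = 2x + 1;
# the last point has an extreme x and a y value far below that line.
x = [1, 2, 3, 4, 5, 6, 7, 20]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 15.0]

d = cooks_distance(x, y)
print(np.round(d, 3))
print("most influential observation (1-based):", int(np.argmax(d)) + 1)
# A common rule of thumb treats values above 1 as worth a closer look.
```

A flagged observation is a candidate for examination, not automatic deletion. Refitting the model without it and comparing the coefficients shows how much the results depend on that single point.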

