Several issues relating to the choice of variables must be addressed to ensure that the resulting
model is both statistically significant and theoretically sound.
Regression can ascertain relationships, but not the underlying causal mechanism. So,
first and foremost, models must remain true to their theoretical foundation. We must ensure that the
coefficients are meaningful, that is, that their sign and magnitude agree with what theory leads us to expect.
A subtle problem can arise from the stepwise estimation process. Because it considers one
variable at a time, a pair of variables (xi, xj) that jointly explain a significant
proportion of the variance will not be selected by the stepwise method if neither of them
is significant on its own.
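The effect can be reproduced on a small synthetic data set. The sketch below is illustrative only: the data, the helper names (r2_single, r2_pair, f_to_enter) and the simple F-to-enter calculation are assumptions made for this example, not the procedure of any particular stepwise implementation. The data are deliberately extreme: y is exactly x1 - x2, yet neither variable is related to y strongly enough on its own to enter the model.

```python
def center(v):
    """Subtract the mean from every value."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def r2_single(x, y):
    """R-squared of a simple regression of y on one variable x."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) ** 2 / (dot(xc, xc) * dot(yc, yc))

def r2_pair(x1, x2, y):
    """R-squared of a regression of y on x1 and x2 (normal equations)."""
    a, b, yc = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return (b1 * s1y + b2 * s2y) / dot(yc, yc)

def f_to_enter(r2, n):
    """F statistic for entering a single variable into an empty model."""
    return r2 * (n - 2) / (1 - r2)

# Hypothetical data: x1 and x2 share a large common component,
# and y depends only on the small difference between them.
base = [-10, -10, 10, 10] * 10
wiggle = [-1, 1, -1, 1] * 10
x2 = base
x1 = [b + w for b, w in zip(base, wiggle)]
y = [u - v for u, v in zip(x1, x2)]
n = len(y)

alone_1 = r2_single(x1, y)      # about 0.01
alone_2 = r2_single(x2, y)      # 0.0
together = r2_pair(x1, x2, y)   # 1.0

print("x1 alone:", alone_1, "F-to-enter:", f_to_enter(alone_1, n))
print("x2 alone:", alone_2, "F-to-enter:", f_to_enter(alone_2, n))
print("x1 and x2 together:", together)
```

With 40 observations, the 5% critical value of F for one variable is roughly 4.1, so neither variable would pass the entry test in a forward step even though the pair fits y perfectly.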
Another issue is multicollinearity, the correlation among independent variables, which
may make some variables redundant. It needs to be assessed because, once one of a set of correlated
variables has been included during the selection process, the others are unlikely to be included.
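One common way to assess multicollinearity is the variance inflation factor (VIF). For variable j it equals 1 / (1 - Rj^2), where Rj^2 is obtained by regressing xj on the other independent variables. The sketch below relies on the equivalent fact that the VIFs are the diagonal elements of the inverse of the correlation matrix of the independent variables. The data and function names are made up for illustration, and the cut-off of 10 is a widely quoted rule of thumb rather than something taken from this text.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each variable: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical independent variables: x2 is almost a copy of x1,
# while x3 is only weakly related to either.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 7.8]
x3 = [3, 1, 4, 1, 5, 9, 2, 6]

factors = vif([x1, x2, x3])
for name, value in zip(["x1", "x2", "x3"], factors):
    flag = "redundant?" if value > 10 else "ok"
    print(name, round(value, 2), flag)
```

Here x1 and x2 are flagged because each is almost fully explained by the other, whereas x3 has a VIF close to 1. Once a selection procedure has admitted x1, x2 has very little left to contribute.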
Although multicollinearity may keep out of the model variables that are not needed for predictive
purposes, their exclusion says nothing about their relationship with the dependent variable; an excluded
variable may still be strongly related to it. This is discussed in the next section.
Including variables that have low multicollinearity with the other independent
variables and high correlation with the dependent variable increases the overall predictive power of
a model.
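A short calculation makes this concrete for the case of two independent variables, where R-squared can be written in terms of the three pairwise correlations. The correlation values below are hypothetical and chosen only to contrast a second variable that is uncorrelated with the first against one that is highly correlated with it; the function name is likewise an assumption of this sketch.

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared of y on x1 and x2, from the pairwise correlations.

    r_y1, r_y2: correlation of each independent variable with y.
    r_12: correlation between the two independent variables.
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Both candidates correlate 0.6 with y, so x1 alone gives R^2 = 0.36.
r2_first_only = 0.6 ** 2

# Second variable uncorrelated with the first (low multicollinearity).
r2_low = r2_two_predictors(0.6, 0.6, 0.0)

# Second variable correlated 0.9 with the first (high multicollinearity).
r2_high = r2_two_predictors(0.6, 0.6, 0.9)

print("x1 only:", r2_first_only)
print("adding an uncorrelated x2:", r2_low)
print("adding a highly correlated x2:", r2_high)
```

Under these assumed correlations, the uncorrelated variable doubles R-squared from 0.36 to 0.72, while the highly correlated one raises it only to about 0.38, even though both have the same correlation with the dependent variable.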