What features are most important for a machine learning model? One of the best measures is the SHAP value (SHapley Additive exPlanations), which is based on cooperative game theory: it measures the impact or value of a single feature relative to all possible combinations of features. A SHAP value is not the same as a regression coefficient, so it cannot tell us the marginal effect of a change in an exogenous variable. It does tell us the importance of one feature relative to the others within a model, so it can focus attention on the most relevant features. SHAP tells us the importance of each feature for the model's prediction, but it provides no information on the quality of that prediction.
The cooperative game approach treats features as players and measures each feature's contribution to the overall prediction for every observation. For a given observation, the SHAP values sum to the difference between the model's prediction and the average prediction, so each feature's SHAP value can be read as its share of that deviation.
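To make the game-theoretic idea concrete, here is a small illustrative sketch that computes exact Shapley values for a toy three-feature model by enumerating every coalition. The model f, the observation, and the baseline used as the "absent feature" value are all hypothetical choices for illustration; this is the textbook Shapley formula, not the shap library's own algorithm.

```python
# Toy sketch of the cooperative-game idea: each feature is a "player", and its
# Shapley value averages its marginal contribution over every coalition of the
# other features.  The model f and the baseline are illustrative assumptions.
from itertools import combinations
from math import factorial

import numpy as np

def f(x):
    """Toy model: a simple nonlinear function of three features."""
    return 3.0 * x[0] + 2.0 * x[1] * x[2] - x[2]

x = np.array([1.0, 2.0, 0.5])          # the observation to explain
baseline = np.array([0.0, 0.0, 0.0])   # stand-in for the average feature values

def coalition_value(members):
    """Value of a coalition: features outside it are set to the baseline."""
    z = baseline.copy()
    for j in members:
        z[j] = x[j]
    return f(z)

n = len(x)
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            phi[i] += weight * (coalition_value(S + (i,)) - coalition_value(S))

print(phi)                               # per-feature contributions
print(phi.sum(), f(x) - f(baseline))     # contributions sum to f(x) - f(baseline)
```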
If there is an observation x, it has a prediction f(x), and E[f(x)] is the average prediction over the data. The average E[f(x)] plus the sum of the SHAP values equals f(x), which for any observation will vary around that average. The SHAP values change with each observation, which is an interesting characteristic because it tells us whether a given feature's value is impactful for that particular model output. High (low) values of a feature may carry high (low) positive contributions, which provides useful information on the non-linear sensitivity of the model to given features.
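This additivity can be checked directly with the shap package. In the sketch below, the synthetic regression data and the random-forest model are placeholder assumptions; the point is only that the base value plus the per-feature SHAP values recovers each prediction.

```python
# A minimal sketch of the additivity property, assuming scikit-learn and the
# shap package are installed; the data and model are placeholders.
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)   # dispatches to a tree explainer here
sv = explainer(X[:10])                 # Explanation object for 10 observations

# Base value (average prediction) plus the per-feature SHAP values
# reconstructs the model's prediction for each observation.
reconstructed = sv.base_values + sv.values.sum(axis=1)
print(np.abs(reconstructed - model.predict(X[:10])).max())  # ~0 up to float error
```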
There are several ways to look at SHAP values. Features can be rank ordered by their mean absolute SHAP value over a given period, so we know how important a feature is relative to all others. Also useful is a beeswarm or violin graph, which plots each observation's SHAP value (the impact on model output) and colors it by the feature value (high or low). This can be done for every feature, telling us how individual observations drive the model prediction. The decomposition for a single observation can also be viewed as a waterfall graph.
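Assuming the Explanation object `sv` from the previous sketch, these standard views are produced with the shap plotting functions:

```python
shap.plots.bar(sv)           # features rank ordered by mean absolute SHAP value
shap.plots.beeswarm(sv)      # per-observation SHAP values, colored by feature value
shap.plots.waterfall(sv[0])  # decomposition of a single observation's prediction
```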
These types of techniques provide a richness of output from an ML model that can generate insights that may not be possible through standard regression techniques.