Paper ID: 2409.02130

From Predictive Importance to Causality: Which Machine Learning Model Reflects Reality?

Muhammad Arbab Arshad, Pallavi Kandanur, Saurabh Sonawani, Laiba Batool, Muhammad Umar Habib

This study analyzes the Ames Housing Dataset using CatBoost and LightGBM models to explore feature importance and causal relationships in housing price prediction. The models achieve high accuracy in price forecasting, and we examine how closely SHAP values agree with causal effect estimates from EconML. Our analysis reveals a moderate Spearman rank correlation of 0.48 between SHAP-based feature importance and causally significant features, which shows that features a model relies on for prediction are not necessarily those that causally drive housing prices. Through causal analysis that includes heterogeneity exploration and policy tree interpretation, we provide insights into how specific features, such as porches, affect housing prices across different scenarios. This work underscores the need for integrated approaches that combine predictive power with causal insight in real estate valuation, and it offers guidance for industry stakeholders.
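
The Spearman rank correlation cited in the abstract compares two orderings of the same features: one by predictive importance and one by estimated causal effect. The following is a minimal sketch of that comparison in plain Python, not the authors' code. The function names (`average_ranks`, `spearman`) and the five scores per list are hypothetical placeholders chosen for illustration; they are not values from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest) to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five unnamed features (illustrative only, not from the paper).
shap_importance = [0.40, 0.25, 0.15, 0.12, 0.08]  # e.g. mean absolute SHAP value
causal_effect = [0.30, 0.05, 0.20, 0.02, 0.10]    # e.g. absolute estimated effect

rho = spearman(shap_importance, causal_effect)
print(f"Spearman rank correlation: {rho:.2f}")
```

A value near 1 would mean the two methods rank features almost identically, while a moderate value indicates that the orderings agree only partially.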

Submitted: Sep 1, 2024