Authors
Veronica Guidetti, Federica Mandreoli
Publication date
2024
Description
Machine learning (ML) has transformed healthcare, improving diagnostics, treatment, research, and patient care. However, clinical decision support (CDS) still relies heavily on classical statistical models and manual rules, which lack transparency and accuracy. Since the mid-20th century, scoring systems have offered a transparent approach to CDS development. Nevertheless, classical methods for building scoring systems, such as logistic regression, may lack predictive accuracy and struggle to handle complex, high-dimensional electronic health record data, while black-box ML models pose risks due to their lack of interpretability. To address these challenges, our group focuses on developing interpretable symbolic ML approaches, leveraging multi-objective symbolic regression (MOSR) to accelerate index development, mitigate human bias, and enable the exploration of new aggregation functions and weighting systems. MOSR optimizes multiple objectives simultaneously, distilling complex phenomena into non-linear yet understandable constructs, a crucial aspect for gaining the trust of healthcare professionals. Moreover, MOSR is highly flexible and can be extended to classical statistical models. This paper presents our experience in developing data-driven scoring systems, building on real-world applications such as COVID-19 mortality prediction and risk estimation after liver transplantation. Our methodology involves designing the entire data pipeline, from feature selection to scoring formula generation, highlighting the importance of developing data-centric and interpretable ML techniques for high-risk domains.