Evaluating Metrics for Fund Selection

Morningstar vs Lipper vs S&P Capital IQ…

June 2025. Reading Time: 10 Minutes. Author: Nicolas Rabener.

SUMMARY

  • The most popular fund selection metrics have no predictive power
  • Unsurprisingly, fees matter
  • Fund selection is challenging

INTRODUCTION

Fund databases such as Morningstar, Lipper, and S&P Capital IQ use broadly similar methodologies to rank investment funds. Morningstar’s quantitative Star Rating evaluates funds based on risk-adjusted returns relative to peers over multiple time horizons. Lipper’s Leader Ratings assess funds across five dimensions: total return, consistent return, capital preservation, expenses, and tax efficiency – all benchmarked against peer groups. S&P Capital IQ, meanwhile, uses a model that incorporates performance, volatility, and Sharpe ratios.

While these systems share a common goal – to identify funds likely to outperform or deliver superior risk-adjusted returns on a relative basis – their effectiveness remains an open question.

In this research article, we will analyze and compare the different fund ranking methodologies to assess their predictive value.

EVALUATION OF FUND RANKING METRICS 

Most fund databases evaluate performance relative to peer groups rather than benchmark indices – a practice that arguably lowers the performance bar for funds (pun intended). If benchmark indices were used instead, it would likely reveal that a relatively small number of funds consistently outperform, as evidenced by the research from S&P SPIVA. This insight, however, may not align with the commercial interests of firms that sell fund data.

In contrast, our research has no such limitations. We evaluate funds against their benchmark indices, all of which are investable through low-cost ETFs. Our analysis focuses on all U.S.-listed equity mutual funds and ETFs with at least a 10-year performance history. To ensure meaningful comparisons, we only include funds with an R² of at least 0.80 relative to their benchmarks – our benchmark selection achieves an average R² of 0.93. The resulting dataset includes approximately 2,500 funds.
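The R² screen described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual pipeline: the fund names, return series, and the 0.80 cutoff applied here to synthetic data are hypothetical, and R² is computed as the squared correlation of fund and benchmark returns (equivalent to the R² of a single-factor regression).

```python
import numpy as np

def r_squared(fund_returns: np.ndarray, benchmark_returns: np.ndarray) -> float:
    """R^2 of a one-factor regression of fund returns on benchmark returns.

    With a single regressor this equals the squared Pearson correlation.
    """
    corr = np.corrcoef(fund_returns, benchmark_returns)[0, 1]
    return corr ** 2

# Hypothetical screening step on synthetic daily returns (~10 years).
rng = np.random.default_rng(0)
benchmark = rng.normal(0.0005, 0.01, 2520)
tracker = benchmark + rng.normal(0.0, 0.002, 2520)   # fund that hugs its index
stray = rng.normal(0.0005, 0.01, 2520)               # fund unrelated to the index

universe = {"tracker": tracker, "stray": stray}
eligible = {name: r for name, r in universe.items()
            if r_squared(r, benchmark) >= 0.80}      # keep only close trackers
```

In this sketch only the index-hugging fund survives the screen; a fund whose returns are unrelated to the chosen benchmark is dropped, which is what keeps the benchmark comparison meaningful.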

We rank funds using several metrics and select the top and bottom 10% during an in-sample period (2015 – 2019), and then evaluate their behavior in the out-of-sample period.
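The decile-selection step can be sketched as follows. Again a hedged illustration rather than the study's code: the fund names and in-sample scores are invented, and any ranking metric (trailing return, Sharpe ratio, star rating) could stand in for the scores.

```python
def decile_buckets(scores: dict[str, float]) -> tuple[list[str], list[str]]:
    """Split fund names into top-10% and bottom-10% buckets by in-sample score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = max(1, len(ordered) // 10)   # size of one decile
    return ordered[:n], ordered[-n:]

# Hypothetical in-sample (2015-2019) scores, e.g. excess return vs benchmark.
scores = {f"fund_{i:02d}": s for i, s in enumerate(
    [0.8, -0.2, 1.5, 0.1, -1.1, 0.6, 2.3, -0.4, 0.0, 0.9,
     1.1, -0.7, 0.3, 1.9, -1.5, 0.5, 0.2, -0.9, 0.7, 1.2])}

top, bottom = decile_buckets(scores)
# The out-of-sample returns of `top` and `bottom` would then be compared
# to test whether the metric has any predictive power.
```

A metric with genuine predictive value should produce a persistent spread between the two buckets out of sample; the article's thesis is that for the popular rating metrics, no such spread materializes.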