
Unlocking Alpha from Textual Data
The competitive landscape for asset managers has intensified, and traditional investment factors – such as value, momentum, and quality – are increasingly well understood, resulting in diminished returns (e.g. McLean & Pontiff, 2015; Calluzzo, Moneta, & Topaloglu, 2019; Jacobs, Kenneth, & Lee, 2025).
One promising yet still underutilized source of alpha lies in textual data (see, e.g., the review by Sun et al., 2024). While traditional investment strategies rely on structured financial data, a vast amount of valuable and complementary information remains hidden within unstructured textual sources, such as annual reports, patents, earnings call transcripts, and employee reviews.
Until recently, technological and infrastructural limitations made it impractical to systematically leverage these textual sources for investment signals.
The rapid advancement of Large Language Models (LLMs) – AI systems capable of deeply understanding and interpreting human language – has changed this. LLMs now offer asset managers a powerful means of systematically extracting and quantifying insights from massive textual datasets. Leaders in the AI space, such as OpenAI, Alphabet, and Anthropic, offer the most advanced models through developer-friendly interfaces, making it straightforward to start experimenting with them. While the transformative potential of LLMs is often discussed in broad terms, practical implementation in investment processes requires clear understanding, specialized expertise, and robust infrastructure.
[....]