Empirical test of whether Bloomberg's proprietary news-sentiment signal can be replicated with open-source FinBERT — and what the gap tells us about quant strategies
Bloomberg sells a daily news-sentiment field called NEWS_SENTIMENT_DAILY_AVG that aggregates thousands of stories per stock per day from a curated, mostly paywalled corpus. Quants use it as a feature in systematic strategies. This project asked two questions. First, is the signal actually predictive on its own, or is its value purely a commercial moat? Second, can an open-source FinBERT pipeline applied to free Yahoo Finance headlines reproduce it?
I pulled 30 large-cap S&P 100 names balanced across 10 GICS sectors over 2018-2026 (~62,800 stock-day observations) directly from the Bloomberg Terminal via a VBA macro driving the Excel BDH API. I tested the signal at three holding horizons (1d, 5d, 21d), used it to build a long-short quintile portfolio rebalanced every 5 days, and regressed that portfolio's returns on the Fama-French three factors with Newey-West HAC standard errors. Then I applied ProsusAI/finbert to ~300 Yahoo headlines and Spearman-correlated its output against Bloomberg's signal on matched (date, ticker) cells.
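To make the long-short quintile step concrete, here is a minimal sketch of one rebalance date. It is an illustration under assumptions, not the project's actual code: the function name, equal weighting within each leg, the tie-breaking by ticker, and the toy tickers and returns are all hypothetical.

```python
from statistics import mean


def long_short_return(signal, fwd_ret, n_buckets=5):
    """Top-bucket minus bottom-bucket equal-weighted forward return.

    signal  : dict ticker -> sentiment score on the formation date
    fwd_ret : dict ticker -> return over the following holding period
    Equal weighting within each leg is an assumption of this sketch.
    """
    # Only tickers with both a signal and a forward return are usable.
    # Ties in the signal are broken by ticker so the sort is deterministic.
    names = sorted(set(signal) & set(fwd_ret), key=lambda t: (signal[t], t))
    size = len(names) // n_buckets
    if size == 0:
        raise ValueError("not enough names to fill every bucket")
    short_leg = names[:size]   # lowest-sentiment bucket
    long_leg = names[-size:]   # highest-sentiment bucket
    return mean(fwd_ret[t] for t in long_leg) - mean(fwd_ret[t] for t in short_leg)


# Toy cross-section: 10 hypothetical tickers, so each quintile holds 2 names.
signal = {f"T{i}": i / 10 for i in range(10)}
fwd_ret = {f"T{i}": 0.001 * i for i in range(10)}
spread = long_short_return(signal, fwd_ret)
```

With 30 names, each quintile would hold 6 stocks. Repeating this on every rebalance date gives the return series that would then go into the factor regression.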
The replication failed by enough that the failure became the most defensible finding: Bloomberg's signal cannot be reconstructed from free public news, which is exactly what makes it commercially valuable.
=BDH(ticker, {fields}, dates) per sheet covering PX_LAST, NEWS_SENTIMENT_DAILY_AVG, CUR_MKT_CAP at once, then pins to values so the workbook survives outside the live session.
ProsusAI/finbert applied to ~300 Yahoo headlines, daily aggregation, Spearman comparison against Bloomberg. Result: −0.26 (p=0.10, N=41). Manual inspection confirms FinBERT's polarity calls are reasonable per headline; the mismatch is that the two systems aggregate different news corpora.
python-docx, so the writeup is reproducible end-to-end.
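The daily aggregation and Spearman comparison on matched (date, ticker) cells can be sketched in plain Python as follows. This is a hedged illustration, not the project's pipeline: it assumes each headline has already been reduced to one number (for example FinBERT's positive probability minus its negative probability), that the daily aggregate is a simple mean, and all function names and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def daily_scores(headlines):
    """Average headline-level scores into one value per (date, ticker) cell.

    headlines: iterable of (date, ticker, score); score is assumed to be
    P(positive) - P(negative) for that headline.
    """
    cells = defaultdict(list)
    for date, ticker, score in headlines:
        cells[(date, ticker)].append(score)
    return {key: mean(vals) for key, vals in cells.items()}


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_on_matched_cells(finbert_daily, bloomberg_daily):
    """Spearman rho over the (date, ticker) cells present in both sources."""
    keys = sorted(set(finbert_daily) & set(bloomberg_daily))
    x = [finbert_daily[k] for k in keys]
    y = [bloomberg_daily[k] for k in keys]
    return pearson(average_ranks(x), average_ranks(y)), len(keys)


# Toy data with hypothetical tickers; the CCC cell has no Bloomberg match.
headlines = [
    ("2024-01-02", "AAA", 0.8), ("2024-01-02", "AAA", 0.4),
    ("2024-01-02", "BBB", -0.5),
    ("2024-01-03", "AAA", 0.1),
    ("2024-01-03", "CCC", 0.9),
]
bloomberg = {
    ("2024-01-02", "AAA"): 0.10,
    ("2024-01-02", "BBB"): 0.30,
    ("2024-01-03", "AAA"): 0.20,
}
rho, n = spearman_on_matched_cells(daily_scores(headlines), bloomberg)
```

The inner join on (date, ticker) is why the matched sample is much smaller than the headline count: a cell only counts when both sources have a value for that stock on that day.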