TFRBench: A Reasoning Benchmark for Evaluating Forecasting Systems

Domain	Dataset	Source	Total Samples	Reference Samples	Frequency	Window (in-out)	Channels	Series
Energy	Solar	GIFT-Eval	274	193	Daily	96-96	1	137
Energy	Electricity	GIFT-Eval	1346	353	Daily	96-96	1	81
Sales	Car-parts	GIFT-Eval	1037	692	Monthly	25-25	1	1037
Sales	Hierarchical Sales	GIFT-Eval	1370	1076	Daily	96-96	1	81
Web/CloudOps	Bitbrains Fast Storage	GIFT-Eval	1153	678	Hourly	96-96	2	197
Web/CloudOps	Web Traffic	Monash archive	1214	710	Daily	96-96	1	162
Transportation	Traffic	Autoformer et al.	1104	449	Hourly	96-96	1	6
Transportation	NYC Taxi	nyc.gov	1000	391	Hourly	96-96	1	1
Economics/Finance	Amazon pricing	Yahoo Finance/nasdaq	512	328	Daily	14-7	5	1
Economics/Finance	Apple pricing	Yahoo Finance/nasdaq	809	509	Daily	14-7	5	1
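
The "Window (in-out)" column gives the number of input steps followed by the number of steps to forecast (for example, 14-7 means 14 observed days and a 7-day horizon). A minimal sketch of how such a spec could be turned into (history, target) pairs is shown below. The helper names `parse_window` and `make_samples` and the `stride` parameter are hypothetical, and the sliding-window scheme is an assumption; the table does not say how TFRBench actually draws its samples.

```python
from typing import List, Sequence, Tuple


def parse_window(spec: str) -> Tuple[int, int]:
    """Parse a 'Window (in-out)' entry such as '96-96' or '14-7'."""
    n_in, n_out = spec.split("-")
    return int(n_in), int(n_out)


def make_samples(
    series: Sequence[float], spec: str, stride: int = 1
) -> List[Tuple[List[float], List[float]]]:
    """Slide over one series and cut (history, target) pairs.

    Assumption: samples come from a plain sliding window with a fixed
    stride. The benchmark's real sampling procedure may differ.
    """
    n_in, n_out = parse_window(spec)
    samples = []
    for start in range(0, len(series) - n_in - n_out + 1, stride):
        history = list(series[start:start + n_in])
        target = list(series[start + n_in:start + n_in + n_out])
        samples.append((history, target))
    return samples
```

For a 25-step series and the 14-7 spec, a stride of 4 yields two samples: one starting at step 0 and one starting at step 4.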

[Close]:

Cross-Channel Analysis: The Close price forecast must be logically consistent with the other price channels, falling strictly between the High and Low forecasts for the day. It is strongly influenced by the Open price and serves as the primary anchor for the next day's Open. Significant movements in the Close price should be accompanied by a corresponding increase in the Volume forecast, signaling market conviction.
Trend Component Plan: Following the recent sharp decline, model a flattening of the underlying trend. This represents a period of price stabilization. The trend should not be extrapolated downwards aggressively; instead, it should reflect the potential for the price to re-test recent low points as part of a bottoming process.
Seasonality Component Plan: Project the established weekly (day-of-the-week) seasonality pattern forward. This pattern captures the regular, recurring investor behaviors observed throughout a typical trading week.
Residuals & Event Plan: The "U.S. Senate Bombing (1983-11-07)" introduced significant market uncertainty.
- U.S. Senate Bombing (1983-11-07): Incorporate a sharp, negative residual shock (a downward adjustment of around 2-4% of the price) on and for a few days after this date to model the resulting fear and selling pressure.
- Non-Impactful Events: The "Martin Luther King, Jr. Day Bill Signed" and "Jesse Jackson Announces Presidential Candidacy" events are considered to have a negligible impact on market residuals.
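
The Close plan above combines three parts: a flattened trend, a day-of-the-week pattern, and a negative shock around the event date. One way to read it is as an additive composition, sketched below. The function name `compose_close_forecast`, the additive form, and all numeric inputs are illustrative assumptions, not the benchmark's method or data; the sketch also steps through calendar days rather than trading days for simplicity.

```python
from datetime import date, timedelta
from typing import Dict, List


def compose_close_forecast(
    start: date,
    trend: List[float],
    weekday_effect: Dict[int, float],
    event_date: date,
    shock_pct: float,
    shock_days: int,
) -> Dict[date, float]:
    """Additive sketch: trend + weekday seasonality + event residual.

    Assumptions (not from the source): one forecast step per calendar
    day, and the shock is a fixed fraction of the trend level applied
    on the event date and the following shock_days - 1 days.
    """
    forecast = {}
    for i, level in enumerate(trend):
        day = start + timedelta(days=i)
        # weekday() returns 0 for Monday through 6 for Sunday.
        seasonal = weekday_effect.get(day.weekday(), 0.0)
        in_shock = 0 <= (day - event_date).days < shock_days
        residual = -shock_pct * level if in_shock else 0.0
        forecast[day] = level + seasonal + residual
    return forecast
```

With a flat trend of 20.0, a 3% shock lasting two days from 1983-11-07, and a made-up Monday effect of +0.1, the first two days come out at about 19.5 and 19.4, and the third day returns to 20.0.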

[High]:

Cross-Channel Analysis: The High forecast must be the highest value of all price channels for any given day, acting as a ceiling for the Open, Low, and Close. The spread between the High and Low is a measure of volatility and is expected to widen on days with higher forecasted Volume, reflecting greater market activity and uncertainty.
Trend Component Plan: The trend for the High price will closely follow the stabilized trend modeled for the other price channels (Open, Close, Low), reflecting a pause in the recent sharp downtrend.
Seasonality Component Plan: Continue to apply the established weekly seasonality pattern, which reflects recurring intraday peaks in buying pressure that vary by the day of the week.
Residuals & Event Plan: The market uncertainty following the "U.S. Senate Bombing (1983-11-07)" will increase intraday volatility.
- U.S. Senate Bombing (1983-11-07): Introduce moderate positive residuals (upward adjustments of 1-3% of the price) on the days following the event to model higher-than-expected price spikes as the market reacts with choppiness.

[Low]:

Cross-Channel Analysis: The Low forecast must be the lowest value of all price channels for a given day, serving as the floor for the Open, High, and Close. A widening gap between High and Low, driven by a lower Low, indicates increased selling pressure and should be correlated with higher forecasted Volume.
Trend Component Plan: The trend for the Low price will move in lockstep with the other price channels. Model a stabilization of the trend, indicating that the intense downward pressure from the recent sell-off is subsiding but that the risk of re-testing recent lows remains.
Seasonality Component Plan: Project the existing weekly seasonality pattern forward, as intraday selling pressure often reaches its peak at predictable times during the trading week.
Residuals & Event Plan: The "U.S. Senate Bombing (1983-11-07)" is expected to heighten investor fear.
- U.S. Senate Bombing (1983-11-07): Apply a significant negative residual shock (a downward adjustment of 3-5% of the price) on and immediately after this date to capture the increased selling pressure pushing the daily low further down.

[Open]:

Cross-Channel Analysis: The Open price is fundamentally anchored to the previous day's Close. Forecasted gaps between the prior Close and the Open signal overnight sentiment shifts and suggest a more volatile session ahead, which should be reflected in a wider High-Low range and higher Volume for that day.
Trend Component Plan: The trend for the Open will be nearly identical to the Close price's trend. It should reflect the same stabilization pattern after the recent sharp price decline.
Seasonality Component Plan: Maintain the established day-of-the-week seasonality pattern, as certain days may systematically open higher or lower relative to the previous close due to recurring news cycles or investor behavior.
Residuals & Event Plan: The primary impact of the "U.S. Senate Bombing (1983-11-07)" would be felt at the market open on the following day.
- U.S. Senate Bombing (1983-11-07): Apply a large negative residual on the morning of 1983-11-08 to model the market gapping down as it digests the overnight news.

[Volume]:

Cross-Channel Analysis: Volume is positively correlated with volatility (the High-Low spread). A forecast for a stable or narrow trading range in the price channels should correspond with a forecast for lower Volume. Conversely, a large price movement or wide trading range implies higher Volume.
Trend Component Plan: The underlying trend for Volume should show a sharp decay from the recent massive spike. Model a mean-reversion process where trading activity returns toward more normal historical levels, though potentially remaining slightly elevated compared to the pre-spike baseline.
Seasonality Component Plan: Project the typical weekly seasonality forward. This often includes higher volume at the beginning and end of the week (Monday/Friday) and lower volume mid-week.
Residuals & Event Plan:
- U.S. Senate Bombing (1983-11-07): This event will drive a surge in trading activity. Incorporate a large positive residual shock (a spike of 1.5x to 2.5x the recent average) to Volume around this date to reflect fear-based trading.
- Veterans Day (1983-11-11): Apply a moderate negative residual to model the lighter-than-expected trading volume often associated with holidays.
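
The cross-channel rules stated in the five plans (High as the daily ceiling, Low as the daily floor) can be checked mechanically on a forecast. The sketch below is one possible check, assuming each forecast day is a dictionary keyed by the five channel names. `check_ohlc` and `repair_ohlc` are hypothetical helpers, the bounds are treated as inclusive, and the non-negative Volume check is an added assumption.

```python
from typing import Dict, List


def check_ohlc(bar: Dict[str, float]) -> List[str]:
    """Return the cross-channel rules that one forecast day violates."""
    problems = []
    if bar["High"] < max(bar["Open"], bar["Close"], bar["Low"]):
        problems.append("High is not the daily maximum")
    if bar["Low"] > min(bar["Open"], bar["Close"], bar["High"]):
        problems.append("Low is not the daily minimum")
    if bar["Volume"] < 0:
        problems.append("Volume is negative")
    return problems


def repair_ohlc(bar: Dict[str, float]) -> Dict[str, float]:
    """Widen High/Low just enough to contain the other price channels.

    Assumption: when channels disagree, Open and Close are trusted and
    the High/Low envelope is stretched around them.
    """
    fixed = dict(bar)
    fixed["High"] = max(bar["High"], bar["Open"], bar["Close"], bar["Low"])
    fixed["Low"] = min(bar["Low"], bar["Open"], bar["Close"], bar["High"])
    fixed["Volume"] = max(bar["Volume"], 0.0)
    return fixed
```

A day forecast with Open 20.0 and High 19.8 would be flagged because the High falls below the Open; the repair step raises the High to 20.0 and leaves the other channels unchanged.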

Abstract

TFRBench Leaderboard

Datasets

TFRBench provides a multi-domain suite with diverse temporal frequencies and dimensionalities, including univariate to multivariate series.

Reasoning Output Example

Reasoning Example (Economics/Finance Domain)

BibTeX
