arXiv submission date: 2026-04-13
📄 Abstract - TempusBench: An Evaluation Framework for Time-Series Forecasting

Foundation models have transformed natural language processing and computer vision, and a rapidly growing literature on time-series foundation models (TSFMs) seeks to replicate this success in forecasting. While recent open-source models demonstrate the promise of TSFMs, the field lacks a comprehensive and community-accepted model evaluation framework. We see at least four major issues impeding progress on the development of such a framework. First, current evaluation frameworks consist of benchmark forecasting tasks derived from often outdated datasets (e.g., M3), many of which lack clear metadata and overlap with the corpora used to pre-train TSFMs. Second, existing frameworks evaluate models along a narrowly defined set of benchmark forecasting tasks, such as forecast horizon length or domain, but overlook core statistical properties such as non-stationarity and seasonality. Third, domain-specific models (e.g., XGBoost) are often compared unfairly, as existing frameworks neglect a systematic and consistent hyperparameter tuning convention for all models. Fourth, visualization tools for interpreting comparative performance are lacking. To address these issues, we introduce TempusBench, an open-source evaluation framework for TSFMs. TempusBench consists of 1) new datasets which are not included in existing TSFM pretraining corpora, 2) a set of novel benchmark tasks that go beyond existing ones, 3) a model evaluation pipeline with a standardized hyperparameter tuning protocol, and 4) a TensorBoard-based visualization interface. We provide access to our code on GitHub: this https URL.
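The standardized hyperparameter tuning protocol described in item 3 can be sketched as follows. This is a minimal illustration, not TempusBench's actual API: the function names, the toy moving-average forecaster, and the train/validation/test split are all hypothetical, assuming only the general idea of tuning every model on a validation split under one shared protocol before scoring on held-out data.

```python
import itertools
import statistics

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def moving_average_forecast(series, horizon, window):
    """Toy forecaster: repeat the mean of the last `window` observations."""
    level = statistics.fmean(series[-window:])
    return [level] * horizon

def tune_and_evaluate(series, horizon, param_grid):
    """Standardized protocol: tune on a validation split, report on a test split.

    The same split and selection rule would be applied to every candidate
    model, so domain-specific baselines are compared on equal footing.
    """
    train = series[:-2 * horizon]
    valid = series[-2 * horizon:-horizon]
    test = series[-horizon:]
    best_params, best_score = None, float("inf")
    for values in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid, values))
        score = mae(valid, moving_average_forecast(train, horizon, **params))
        if score < best_score:
            best_params, best_score = params, score
    # Refit on train+valid with the chosen hyperparameters, score on held-out test.
    test_score = mae(test, moving_average_forecast(train + valid, horizon, **best_params))
    return best_params, test_score

series = [10.0, 11.0, 12.0, 11.5, 12.5, 13.0, 12.8, 13.2, 13.5, 13.1]
params, score = tune_and_evaluate(series, horizon=2, param_grid={"window": [2, 4]})
print(params, round(score, 3))  # → {'window': 2} 0.3
```

The key design point is that hyperparameter selection uses only the validation split, and the test split is touched exactly once per model, which prevents tuned baselines from being either over- or under-fit relative to zero-shot TSFMs.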

Top tags: model evaluation, benchmark, machine learning
Detailed tags: time-series forecasting, foundation models, evaluation framework, hyperparameter tuning, visualization

TempusBench: An Evaluation Framework for Time-Series Forecasting


1️⃣ One-Sentence Summary

This paper identifies four major problems in the current evaluation of time-series forecasting models and proposes TempusBench, an open-source evaluation framework that provides new datasets, more comprehensive benchmark tasks, a standardized model tuning pipeline, and visualization tools, aiming to measure and compare forecasting model performance more fairly and comprehensively.

Source: arXiv:2604.11529