Summary of Towards Neural Scaling Laws for Time Series Foundation Models, by Qingren Yao et al.
Towards Neural Scaling Laws for Time Series Foundation Models
by Qingren Yao, Chao-Han Huck Yang, Renhe Jiang, Yuxuan Liang, Ming Jin, Shirui Pan
First submitted to arXiv on: 16 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper investigates the scaling behavior of time series foundation models (TSFMs) on both in-distribution (ID) and out-of-distribution (OOD) data, focusing on two common architectures: encoder-only and decoder-only Transformers. The authors train and evaluate these models across varying parameter counts, compute budgets, and dataset sizes to understand how their log-likelihood loss scales. They find that the log-likelihood loss exhibits similar scaling behavior in both OOD and ID settings. Additionally, the authors compare the scaling properties of different architectures, including two state-of-the-art TSFMs as case studies, revealing that model architecture plays a significant role in scaling. The findings suggest that encoder-only Transformers are more scalable than decoder-only Transformers, while architectural enhancements primarily improve ID performance but reduce OOD scalability. |
Low | GrooveSquid.com (original content) | The paper looks at how models for time series data improve as they grow larger. The authors tested two kinds of models: ones that read the whole input at once (encoder-only) and ones that predict the data one step at a time (decoder-only). They found that these models improve in similar ways whether they work with data like what they were trained on (in-distribution) or with new, unfamiliar data (out-of-distribution). They also compared different model designs and found that some handle unfamiliar data better than others as they scale up. |
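The scaling behavior studied in the paper follows the standard neural scaling-law form, where loss falls as a power law of model size, L(N) ≈ a·N^(−b). A minimal sketch of fitting such a curve is shown below; the loss values are made up for illustration and are not measurements from the paper:

```python
import numpy as np

# Hypothetical (parameter count, validation loss) pairs; these numbers
# are illustrative only, not results from the paper.
params = np.array([1e6, 1e7, 1e8, 1e9])
loss = np.array([2.0, 1.5, 1.125, 0.84375])  # loss drops ~25% per decade

# Fit L(N) = a * N^(-b) via linear regression in log-log space:
#   log L = log a - b * log N
slope, log_a = np.polyfit(np.log(params), np.log(loss), 1)
a, b = np.exp(log_a), -slope
print(f"a = {a:.2f}, scaling exponent b = {b:.4f}")
```

A larger fitted exponent b means the loss falls faster as the model grows, which is the sense in which the paper calls one architecture "more scalable" than another.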
Keywords
» Artificial intelligence » Decoder » Encoder » Log likelihood » Time series