Time-Reversal Provides Unsupervised Feedback to LLMs

by Yerram Varun, Rahul Madhavan, Sravanti Addepalli, Arun Suggala, Karthikeyan Shanmugam, Prateek Jain

First submitted to arXiv on: 3 Dec 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper at different levels of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

This version is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper explores whether Large Language Models (LLMs) can be empowered to think backwards and provide unsupervised feedback that complements forward LLMs. To this end, the authors introduce Time Reversed Language Models (TRLMs), which score and generate queries when conditioned on responses, effectively operating in the reverse direction of time. One variant, TRLM-Ba, is pre-trained and fine-tuned from scratch in reverse token order so that it can infer queries from responses. Empirically, TRLM scores complement forward model predictions, yielding up to a 5% improvement on the AlpacaEval Leaderboard when reranking with self log-perplexity scores. The paper also shows that TRLM scoring outperforms conventional forward scoring in applications such as citation generation and passage retrieval. Finally, the generative ability of TRLMs is leveraged to provide unsupervised feedback to the input safety filters of LLMs, reducing false negative rates while maintaining low false positive rates against several attacks on JailbreakBench.
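To make the reverse-scoring idea concrete, here is a minimal sketch of reranking candidate responses by how probable the original query is when conditioned on each response, i.e. a proxy for P(query | response). It is only a sketch: an off-the-shelf forward causal LM (gpt2) stands in for a TRLM, whereas the paper's TRLM-Ba is pre-trained from scratch in reverse token order; the model choice, plain-concatenation prompt format, and mean log-probability scoring rule are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: rerank candidate responses by a reverse-direction score P(query | response).
# NOTE: an ordinary forward causal LM (gpt2) stands in for a TRLM here; the paper's
# TRLM-Ba is pre-trained from scratch in reverse token order, which this does not do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def conditional_logprob(condition: str, target: str) -> float:
    """Mean log-probability of `target` tokens with `condition` as the prefix."""
    cond_ids = tok(condition, return_tensors="pt").input_ids
    tgt_ids = tok(target, return_tensors="pt").input_ids
    ids = torch.cat([cond_ids, tgt_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # The logits at position t predict token t+1, so shift the indices by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = cond_ids.shape[1]                      # first target position in `ids`
    rows = torch.arange(start - 1, ids.shape[1] - 1)
    return logprobs[rows, ids[0, start:]].mean().item()

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Order candidates by how well each response 'explains' the query in reverse."""
    return sorted(candidates, key=lambda r: conditional_logprob(r, query), reverse=True)

best = rerank("How do I sort a list in Python?",
              ["Use the built-in sorted() function.", "Paris is the capital of France."])[0]
```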
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper shows that Large Language Models can be taught to look back and give helpful feedback. The authors create a new type of model, the Time Reversed Language Model (TRLM), that predicts and scores in the opposite direction of time: from a response back to the query that produced it. This lets TRLMs provide feedback that traditional forward models cannot. The results show that TRLMs improve performance on tasks like generating citations and retrieving passages, and they can make the input safety filters of LLMs more accurate by providing unsupervised feedback.
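As a rough illustration of the safety-filter application (not the authors' implementation): a TRLM reverse-generates plausible user queries for a given response, and the deployment's existing input safety filter is then run on those reconstructed queries, flagging the response if any of them trips the filter. Both callables below (`reverse_generate`, `input_filter_flags`) are hypothetical stand-ins.

```python
# Sketch of using reverse generation to give unsupervised feedback to an input filter.
# `reverse_generate` and `input_filter_flags` are hypothetical stand-ins, not real APIs.
from typing import Callable

def response_is_unsafe(
    response: str,
    reverse_generate: Callable[[str, int], list[str]],  # TRLM: response -> likely queries
    input_filter_flags: Callable[[str], bool],          # existing input safety filter
    num_queries: int = 8,                               # sample size is an assumption
) -> bool:
    """Flag a response if any query the TRLM infers for it trips the input filter."""
    candidate_queries = reverse_generate(response, num_queries)
    return any(input_filter_flags(q) for q in candidate_queries)
```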

Keywords

» Artificial intelligence  » Language model  » Perplexity  » Unsupervised