Time-Reversal Provides Unsupervised Feedback to LLMs

by Yerram Varun, Rahul Madhavan, Sravanti Addepalli, Arun Suggala, Karthikeyan Shanmugam, Prateek Jain

First submitted to arXiv on: 3 Dec 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper at different levels of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

This version is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper explores whether Large Language Models (LLMs) can be empowered to think backwards and provide unsupervised feedback that complements forward LLMs. To this end, the authors introduce Time Reversed Language Models (TRLMs), which score and generate queries when conditioned on responses, effectively operating in the reverse direction of time. One variant, TRLM-Ba, is pre-trained and fine-tuned from scratch in reverse token order so that it can infer queries from responses. Empirically, TRLM scores complement forward model predictions, yielding up to a 5% improvement on the AlpacaEval Leaderboard when reranking with self log-perplexity scores. The paper also shows that TRLM scoring outperforms conventional forward scoring in applications such as citation generation and passage retrieval. Finally, the generative ability of TRLMs is leveraged to provide unsupervised feedback to the input safety filters of LLMs, reducing false negative rates while maintaining low false positive rates against several attacks on JailbreakBench.
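To make the reverse-scoring idea concrete, here is a minimal sketch of reranking candidate responses by how probable the original query is when conditioned on each response, i.e. a proxy for P(query | response). It is only a sketch: an off-the-shelf forward causal LM (gpt2) stands in for a TRLM, whereas the paper's TRLM-Ba is pre-trained from scratch in reverse token order; the model choice, plain-concatenation prompt format, and mean log-probability scoring rule are illustrative assumptions, not the authors' exact setup.

```python
# Sketch: rerank candidate responses by a reverse-direction score P(query | response).
# NOTE: an ordinary forward causal LM (gpt2) stands in for a TRLM here; the paper's
# TRLM-Ba is pre-trained from scratch in reverse token order, which this does not do.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def conditional_logprob(condition: str, target: str) -> float:
    """Mean log-probability of `target` tokens with `condition` as the prefix."""
    cond_ids = tok(condition, return_tensors="pt").input_ids
    tgt_ids = tok(target, return_tensors="pt").input_ids
    ids = torch.cat([cond_ids, tgt_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    # The logits at position t predict token t+1, so shift the indices by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    start = cond_ids.shape[1]                      # first target position in `ids`
    rows = torch.arange(start - 1, ids.shape[1] - 1)
    return logprobs[rows, ids[0, start:]].mean().item()

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Order candidates by how well each response 'explains' the query in reverse."""
    return sorted(candidates, key=lambda r: conditional_logprob(r, query), reverse=True)

best = rerank("How do I sort a list in Python?",
              ["Use the built-in sorted() function.", "Paris is the capital of France."])[0]
```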
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper shows that Large Language Models can be taught to look back and give helpful feedback. The authors create a new type of model, the Time Reversed Language Model (TRLM), that predicts and scores in the opposite direction of time: from a response back to the query that produced it. This lets TRLMs provide feedback that traditional forward models cannot. The results show that TRLMs improve performance on tasks like generating citations and retrieving passages, and they can make the input safety filters of LLMs more accurate by providing unsupervised feedback.
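As a rough illustration of the safety-filter application (not the authors' implementation): a TRLM reverse-generates plausible user queries for a given response, and the deployment's existing input safety filter is then run on those reconstructed queries, flagging the response if any of them trips the filter. Both callables below (`reverse_generate`, `input_filter_flags`) are hypothetical stand-ins.

```python
# Sketch of using reverse generation to give unsupervised feedback to an input filter.
# `reverse_generate` and `input_filter_flags` are hypothetical stand-ins, not real APIs.
from typing import Callable

def response_is_unsafe(
    response: str,
    reverse_generate: Callable[[str, int], list[str]],  # TRLM: response -> likely queries
    input_filter_flags: Callable[[str], bool],          # existing input safety filter
    num_queries: int = 8,                               # sample size is an assumption
) -> bool:
    """Flag a response if any query the TRLM infers for it trips the input filter."""
    candidate_queries = reverse_generate(response, num_queries)
    return any(input_filter_flags(q) for q in candidate_queries)
```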

Keywords

» Artificial intelligence  » Language model  » Perplexity  » Unsupervised