Towards Exact Gradient-based Training on Analog In-memory Computing

by Zhaoxian Wu, Tayfun Gokmen, Malte J. Rasch, Tianyi Chen

First submitted to arXiv on: 18 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Hardware Architecture (cs.AR); Optimization and Control (math.OC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
This is the paper's original abstract. Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper investigates the feasibility of training large-scale AI models on analog in-memory accelerators, which offer a promising route to energy-efficient AI. The study takes a training perspective, whereas previous research has mainly focused on inference. The authors highlight the limitations of the stochastic gradient descent (SGD) algorithm when it is used to train models on non-ideal analog devices, where it converges only inexactly. To address this issue, they introduce a heuristic analog algorithm called Tiki-Taka, show that it empirically outperforms SGD, and rigorously show that it converges exactly to a critical point. (A toy simulation sketch of this contrast follows the summaries below.)

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about finding ways to train big AI models on special computers that use less energy. Right now, training these models takes a huge amount of energy, which is hard on the environment. The authors looked at how we currently train these models using something called SGD, and they found it is not very good here because the computers are not perfect. They then came up with a new way to train the models, called Tiki-Taka, which works better than what we are doing now.

Keywords

» Artificial intelligence  » Inference  » Stochastic gradient descent