


Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical

by Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati

First submitted to arXiv on: 10 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Distributed, Parallel, and Cluster Computing (cs.DC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper but are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Hierarchical Inference (HI) is a promising approach for edge ML systems: it enables more complex tasks, such as image classification, while reducing latency and energy consumption on the device. By offloading only selected samples to an edge server or the cloud, HI can improve accuracy compared to purely on-device inference. However, existing works do not account for the device’s hardware capabilities, network connectivity, or the type of model used. This paper addresses that gap by comparing HI against on-device inference using measurements of accuracy, latency, and energy consumption on five devices and three image classification datasets. The results show that HI can achieve up to 73% lower latency and up to 77% lower device energy consumption than on-device inference. A key challenge, however, is finding models small enough for the device yet accurate enough for HI to pay off compared with fully remote inference. To address this, the authors propose a hybrid system, Early Exit with HI (EE-HI), which reduces latency by up to 59.7% and device energy consumption by up to 60.4%, delivering the benefits of HI while mitigating its limitations.
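
The summary above describes HI as running a small model on the device and offloading only selected samples. The paper’s exact offloading policy is not given here; the sketch below illustrates one common realization, a confidence-threshold rule, where the model functions, the 10-class output, and the 0.8 threshold are placeholder assumptions rather than the authors’ actual setup.

```python
import numpy as np

# Illustrative sketch of a confidence-threshold Hierarchical Inference (HI) rule.
# All values and model functions below are placeholders, not the paper's setup.

OFFLOAD_THRESHOLD = 0.8  # assumed value; in practice tuned per device/dataset


def tiny_model_predict(image: np.ndarray) -> np.ndarray:
    """Placeholder for the small on-device model's softmax output."""
    # A real system would run a quantized tinyML model locally on `image`.
    logits = np.random.randn(10)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def remote_model_predict(image: np.ndarray) -> int:
    """Placeholder for inference by the larger edge-server / cloud model."""
    # A real system would transmit the sample over the network and wait for a label.
    return int(np.random.randint(10))


def hierarchical_inference(image: np.ndarray) -> int:
    """Run the tiny model locally; offload only low-confidence samples."""
    probs = tiny_model_predict(image)
    if probs.max() >= OFFLOAD_THRESHOLD:
        # Confident enough: accept the on-device prediction (low latency and energy).
        return int(probs.argmax())
    # Otherwise offload the sample for more accurate remote inference.
    return remote_model_predict(image)
```

Under this reading, the EE-HI variant mentioned above would apply a similar confidence test at early-exit classifiers inside the on-device model, so easy samples finish even earlier, before falling back to offloading.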
Low Difficulty Summary (written by GrooveSquid.com, original content)
Edge ML systems can be more efficient, responsive, and private if they do on-device inference instead of sending data to the cloud. However, this requires smaller, less capable models that fit in a device’s memory. Hierarchical Inference (HI) is a solution that offloads some samples to an edge server or the cloud, which can make the system faster and more accurate than doing everything on the device. But existing HI systems don’t consider how different devices, networks, and models affect performance. This paper tests HI against on-device inference using real devices and image classification datasets. It shows that HI can be much faster (up to 73% lower latency) and use less energy (up to 77% lower) than doing everything on the device.

Keywords

» Artificial intelligence  » Image classification  » Inference