Optimising TinyML with Quantization and Distillation of Transformer and Mamba Models for Indoor Localisation on Edge Devices

by Thanaphon Suwannaphong, Ferdian Jovan, Ian Craddock, Ryan McConville

First submitted to arXiv on: 12 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Software Engineering (cs.SE)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This research introduces small, efficient machine learning models (TinyML) for on-device indoor localisation on resource-constrained edge devices. The primary goal is to shift processing from centralised remote servers to the edge device itself, offering benefits such as longer battery life, enhanced privacy, reduced latency, and lower operational costs. To achieve this, model compression techniques such as quantization and knowledge distillation are employed to significantly reduce model size while maintaining high predictive performance. The study focuses on deploying a large state-of-the-art transformer-based model on low-power microcontrollers (MCUs) and proposes a state-space architecture based on Mamba as an alternative to the transformer. Experimental results demonstrate that the quantized transformer model performs well under a 64 KB RAM constraint, striking a balance between model size and localisation precision. Furthermore, the compact Mamba model performs strongly with only 32 KB of RAM, without the need for model compression.
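
To make the two compression techniques concrete, below is a minimal PyTorch sketch of a standard knowledge-distillation loss and post-training dynamic quantization. The tiny teacher and student networks, layer sizes, temperature, and blending weight are illustrative assumptions, not the paper's actual transformer or Mamba models.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative stand-ins only: the paper's actual teacher (transformer)
# and student networks are not reproduced here.
teacher = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 32))
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soften both output distributions with temperature T, then blend the
    # teacher-matching KL term with the ordinary hard-label cross-entropy.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Post-training dynamic quantization: Linear-layer weights are stored as
# int8, shrinking the model towards MCU-scale memory budgets.
quantized_student = torch.ao.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8
)

On a real microcontroller the compressed model would typically be exported to a C-friendly runtime (for example TensorFlow Lite for Microcontrollers) rather than executed through PyTorch, but the size-versus-accuracy trade-off sketched here is the one the paper tunes against its 64 KB and 32 KB RAM budgets.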

Low Difficulty Summary (original content by GrooveSquid.com)
TinyML models are tiny versions of big machine learning models that can run on devices like smartwatches or fitness trackers. These devices don't have a lot of power or storage, so we need to make the models smaller and more efficient. This paper shows how to do this using special techniques called quantization and knowledge distillation. The goal is to get these tiny models working well even with very limited resources, like 64 KB of RAM. The paper shows this can work, which could be really useful for things like tracking patient movement in hospitals.

Keywords

» Artificial intelligence  » Knowledge distillation  » Machine learning  » Model compression  » Precision  » Quantization  » Tracking  » Transformer