Summary of Knowledge Composition Using Task Vectors with Learned Anisotropic Scaling, by Frederic Z. Zhang et al.
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling
by Frederic Z. Zhang, Paul Albert, Cristian Rodriguez-Opazo, Anton van den Hengel, Ehsan Abbasnejad
First submitted to arXiv on: 3 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv |
Medium | GrooveSquid.com (original content) | A pre-trained model’s fine-tuning direction is characterized by the task vector, which represents the learned weight difference relative to the original model. This paper investigates whether individual components of these vectors, such as parameter blocks, exhibit similar properties and can be combined to enhance knowledge composition and transfer. The proposed algorithm, aTLAS, linearly combines parameter blocks with different learned coefficients, enabling anisotropic scaling at the task vector level. By exploiting the low intrinsic dimensionality of pre-trained models, aTLAS reduces the dependency on large amounts of data and improves generalizability in tasks such as few-shot recognition and test-time adaptation. A minimal code sketch of this block-wise composition follows the table. |
Low | GrooveSquid.com (original content) | Pre-trained models are great because they can be adapted to do new things with just a little extra training. The special sauce that makes this happen is called the task vector: it’s like a direction arrow that shows how the model needs to change to learn something new. This paper breaks the task vector into smaller pieces, called parameter blocks, and asks whether they can be combined in clever ways to make the model even better. The answer is yes! By doing this, the model can work with less data and be more flexible. The idea is tested on tricky tasks like recognizing things from just a few examples or adapting to new situations without lots of labeled training data. |
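To make the composition described above concrete, here is a minimal sketch of block-wise task-vector arithmetic in PyTorch. It is not the authors’ aTLAS implementation: the helper names (`task_vector`, `compose`), the toy two-layer model, and the hand-set block-wise coefficients are illustrative assumptions; in the paper the coefficients are learned rather than fixed.

```python
# Minimal sketch (assumed implementation, not the authors' code): a task vector is the
# per-block weight difference between a fine-tuned and a pre-trained model; composition
# adds several task vectors back, each parameter block scaled by its own coefficient.
import torch
import torch.nn as nn


def task_vector(pretrained: nn.Module, finetuned: nn.Module) -> dict:
    """Task vector: fine-tuned weights minus pre-trained weights, kept per parameter block."""
    ft_state = finetuned.state_dict()
    return {name: ft_state[name] - p for name, p in pretrained.state_dict().items()}


def compose(pretrained: nn.Module, task_vectors: list, coeffs: list) -> dict:
    """Add each task vector block by block, each block scaled by its own coefficient
    (anisotropic scaling), on top of the pre-trained weights."""
    merged = {name: p.clone() for name, p in pretrained.state_dict().items()}
    for tv, c in zip(task_vectors, coeffs):
        for name, delta in tv.items():
            merged[name] = merged[name] + c[name] * delta
    return merged


if __name__ == "__main__":
    def make_model():
        return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    torch.manual_seed(0)
    base = make_model()

    # Stand-ins for models fine-tuned on two different tasks (here just re-initialised).
    ft_a, ft_b = make_model(), make_model()
    tv_a, tv_b = task_vector(base, ft_a), task_vector(base, ft_b)

    # One scalar per parameter block and per task vector; aTLAS learns these values,
    # whereas here they are fixed by hand purely for illustration.
    coeffs = [
        {name: 0.7 for name in tv_a},
        {name: 0.3 for name in tv_b},
    ]

    base.load_state_dict(compose(base, [tv_a, tv_b], coeffs))
    print(base(torch.randn(2, 8)).shape)  # -> torch.Size([2, 4])
```

The key difference from plain task arithmetic with a single global scaling factor is that each parameter block gets its own coefficient; in aTLAS these coefficients would be treated as trainable (e.g., wrapped in `torch.nn.Parameter`) and optimized against a downstream objective.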
Keywords
* Artificial intelligence
* Few-shot
* Fine-tuning