Summary of DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging, by Matteo Pagliardini et al.
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging, by Matteo Pagliardini, Amirkeivan Mohtashami, Francois…