Summary of Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree, by Xiangxiang Gao, Weisheng Xie, Yiwei Xiang, and Feng Ji