Summary of Improving Rare Word Translation with Dictionaries and Attention Masking, by Kenneth J. Sible et al.
Improving Rare Word Translation With Dictionaries and Attention Masking, by Kenneth J. Sible, David Chiang. First submitted…
Selective Prompt Anchoring for Code Generation, by Yuan Tian, Tianyi Zhang. First submitted to arXiv on: 17…
Neighbor Overlay-Induced Graph Attention Network, by Tiqiao Wei, Ye Yuan. First submitted to arXiv on: 16 Aug…
GeoTransformer: Enhancing Urban Forecasting with Dependency Retrieval and Geospatial Attention, by Yuhao Jia, Zile Wu, Shengao…
Beam Prediction based on Large Language Models, by Yucheng Sheng, Kai Huang, Le Liang, Peng Liu,…
A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models, by Geonhee Kim, Marco Valentino, André…
RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction, by Xiucheng Wang, Keda…
Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention, by Zohaib Khan, Muhammad Khaquan, Omer Tafveez, Burhanuddin…
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts, by Qizhen Zhang,…
Analytical Uncertainty-Based Loss Weighting in Multi-Task Learning, by Lukas Kirchdorfer, Cathrin Elich, Simon Kutsche, Heiner Stuckenschmidt,…