Summary of "Derivational Morphology Reveals Analogical Generalization in Large Language Models," by Valentin Hofmann et al.
Derivational Morphology Reveals Analogical Generalization in Large Language Models
by Valentin Hofmann, Leonie Weissweiler, David Mortensen, Hinrich Schütze, Janet Pierrehumbert
First submitted to arXiv on: 12 Nov 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Large language models (LLMs) have been shown to generalize linguistically, but most studies ask only whether their behavior follows rules. It remains unclear whether analogical processes, which can be formalized as similarity operations over stored exemplars, also play a role. This study examines derivational morphology, specifically English adjective nominalization, which shows notable variability, to probe the mechanisms underlying linguistic generalization in LLMs. The authors introduce a new method for investigating this question: focusing on GPT-J, they fit cognitive models that instantiate rule-based and analogical learning to the LLM's training data and compare the models' predictions with the LLM's behavior (a rough code sketch of this setup appears below the table). The results show that while rule-based and analogical models explain GPT-J's predictions equally well for adjectives with regular nominalization patterns, the analogical model provides a much better fit for adjectives with variable nominalization patterns. Furthermore, GPT-J's behavior is sensitive to the frequencies of individual words, even for regular forms, suggesting that analogical processes play a larger role in its linguistic generalization than previously assumed. |
| Low | GrooveSquid.com (original content) | Large language models (LLMs) are super smart computers that can understand and generate human-like text. Researchers want to know how these computers learn new words and the rules of language. Most studies assume LLMs learn by following simple rules, but some scientists think they might also use a process called analogy, where new words are formed by comparison with words they already know. This study looks at a specific type of word formation in English, where adjectives become nouns. The researchers found that while rule-based and analogical models both do well on regular cases, the analogical model is much better at explaining cases where the pattern varies. This suggests that LLMs may be learning language by comparing new words with examples they have seen, rather than just following simple rules. |
Keywords
» Artificial intelligence » Generalization » GPT