HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment

by Yongqiang Chen, Quanming Yao, Juzheng Zhang, James Cheng, Yatao Bian

First submitted to arXiv on: 20 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high difficulty version is the paper's original abstract, available on the paper's arXiv page.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Recently, there has been a surge of interest in extending the success of large language models (LLMs) to the graph modality, covering data such as social networks and molecules. While existing approaches adopt graph neural networks to represent graphs as node tokens and feed them into LLMs for alignment, they overlook the hierarchical structures inherent in graph data. In molecular graphs, high-order structural information carries the rich semantics of functional groups, which encode biochemical functionality. We establish a benchmark showing that neglecting this hierarchy leads to subpar graph-language alignment and hallucination. To address it, we propose Hierarchical GrapH Tokenization (HIGHT), which extracts and encodes tokens at the node, motif, and graph levels to improve LLMs' graph perception, paired with a fine-tuning dataset augmented with hierarchical information. Extensive experiments on 7 molecule-centric benchmarks confirm that HIGHT reduces hallucination by 40% and improves performance on downstream tasks.
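
To make the tokenization idea concrete, below is a minimal Python sketch of node-, motif-, and graph-level tokens for a molecule. It is an illustration under stated assumptions, not the paper's implementation: the function name hierarchical_tokens is hypothetical, RDKit's BRICS decomposition stands in for whatever motif vocabulary HIGHT actually uses, and the tokens here are plain strings where the real method would feed GNN embeddings to the LLM.

    # Minimal sketch of hierarchical graph tokenization for one molecule.
    # Assumptions (illustrative, not from the paper): RDKit's BRICS
    # decomposition stands in for the motif extractor, and tokens are plain
    # strings where a real adapter would emit GNN embeddings for the LLM.
    from rdkit import Chem
    from rdkit.Chem import BRICS

    def hierarchical_tokens(smiles: str) -> dict:
        """Return node-, motif-, and graph-level tokens for one SMILES string."""
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            raise ValueError(f"could not parse SMILES: {smiles}")

        # Node level: one token per atom (per-node embeddings in practice).
        node_tokens = [atom.GetSymbol() for atom in mol.GetAtoms()]

        # Motif level: BRICS fragments approximate functional-group motifs.
        motif_tokens = sorted(BRICS.BRICSDecompose(mol))

        # Graph level: a single token summarizing the whole molecule.
        graph_token = Chem.MolToSmiles(mol, canonical=True)

        return {"node": node_tokens, "motif": motif_tokens, "graph": graph_token}

    if __name__ == "__main__":
        # Aspirin: the ester and carboxylic-acid motifs should show up.
        print(hierarchical_tokens("CC(=O)Oc1ccccc1C(=O)O"))

Running the sketch on aspirin prints per-atom tokens, BRICS fragments corresponding to its ester and carboxylic-acid groups, and a canonical SMILES string standing in for the graph-level token.
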
Low Difficulty Summary (written by GrooveSquid.com, original content)
Researchers have been trying to use big language models to understand graphs, like social networks or molecules. Right now, most approaches represent a graph as a series of node tokens and feed them into the language model for alignment. However, they don't take into account the hierarchy of information in these graphs. In molecular graphs, this hierarchy contains important clues about how molecules work. We showed that ignoring this hierarchy leads to bad results and "hallucinations" (the model making things up). To fix this, we developed a new way to represent graphs called HIGHT, which helps language models understand graphs better. We tested HIGHT on 7 different benchmarks and found that it reduces hallucination by 40% and improves performance on other tasks.

Keywords

» Artificial intelligence  » Alignment  » Fine tuning  » Hallucination  » Semantics  » Tokenization