
Summary of Isolating Authorship From Content with Semantic Embeddings and Contrastive Learning, by Javier Huertas-Tato et al.


Isolating authorship from content with semantic embeddings and contrastive learning

by Javier Huertas-Tato, Adrián Girón-Jiménez, Alejandro Martín, David Camacho

First submitted to arXiv on: 27 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors): the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a novel approach to reducing the correlation between content and authorship in neural models for authorship analysis. Traditional methods rely heavily on stylistic features that remain entangled with topical content, so models can be misled when different authors write about similar topics in a similar style. The proposed technique uses contrastive learning with additional hard negatives, generated using a semantic similarity model, to disentangle the style embedding space from the content embedding space. The goal is to produce embeddings informed by authorial style rather than content. The paper presents ablations on two different datasets and evaluates performance on out-of-domain challenges, reporting accuracy improvements of up to 10% on particularly difficult evaluations.
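The core idea above, contrastive learning where the hardest negatives are texts that are semantically close to the anchor but written by a different author, can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the function names, the InfoNCE-style loss, and the cosine-similarity mining heuristic are illustrative assumptions standing in for the paper's semantic similarity model.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor.

    Pulls the anchor toward the positive (same author) and pushes it
    away from the negatives (other authors). Hypothetical sketch, not
    the paper's exact objective.
    """
    logits = np.array(
        [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    ) / temperature
    # Cross-entropy with the positive at index 0 (log-sum-exp form).
    return -logits[0] + np.log(np.exp(logits).sum())


def mine_hard_negatives(anchor_content, candidate_contents, k=2):
    """Select the k candidates whose *content* embeddings are most
    similar to the anchor's, approximating the paper's idea of hard
    negatives: same topic, different author."""
    sims = np.array([cosine(anchor_content, c) for c in candidate_contents])
    top = np.argsort(sims)[::-1][:k]
    return [candidate_contents[i] for i in top]


# Toy usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
anchor = rng.normal(size=8)
positive = anchor + 0.1 * rng.normal(size=8)      # same author, perturbed
candidates = [rng.normal(size=8) for _ in range(6)]  # other authors

hard_negs = mine_hard_negatives(anchor, candidates, k=2)
loss = info_nce_loss(anchor, positive, hard_negs)
```

In a real training loop the mined hard negatives would be appended to the in-batch negatives, so the encoder is explicitly penalized for mapping topically similar texts by different authors to nearby points in style space.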
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about finding a way to tell who wrote something from how it is written, not from the topic it covers. Right now, machines can do this pretty well if the authors have different writing styles. But what if many authors write about the same things in the same way? That makes it much harder for machines to figure out who wrote what. The researchers wanted machines to focus on the style of the writer rather than the words about the topic, so they came up with a new training method that pushes the two apart. They tested this method on several datasets and showed that it works better than before, especially when it is really hard to tell who wrote what.

Keywords

» Artificial intelligence  » Embedding space