
Summary of AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports, by Lukas Lange et al.


AnnoCTR: A Dataset for Detecting and Linking Entities, Tactics, and Techniques in Cyber Threat Reports

by Lukas Lange, Marc Müller, Ghazaleh Haratinezhad Torbati, Dragan Milchevski, Patrick Grau, Subhash Pujari, Annemarie Friedrich

First submitted to arXiv on: 11 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The paper’s original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

This paper presents AnnoCTR, a new CC-BY-SA-licensed dataset of cyber threat reports annotated with named entities, temporal expressions, and cybersecurity-specific concepts. The annotations are linked to Wikipedia and to the widely used MITRE ATT&CK knowledge base. Previous datasets either provided a single label per document or annotated sentences out of context; AnnoCTR instead annotates entire documents at a much finer granularity. In an experimental study, the authors model these annotations with state-of-the-art neural models. The results show that concept descriptions from MITRE ATT&CK are an effective source of training data augmentation for identifying MITRE ATT&CK concepts that are mentioned explicitly or implicitly in a text.

Low Difficulty Summary (written by GrooveSquid.com, original content)

This paper helps keep the internet safe by making cyber threats easier to understand. Information about cyber attacks is usually shared as written reports, which makes it hard to pick out the important details. The authors created a dataset called AnnoCTR that marks up these reports with important information, such as the names of attackers and the types of attacks. They linked this information to two main sources: Wikipedia and the MITRE ATT&CK knowledge base. The paper also tests how well neural networks can learn from this dataset, and finds that the descriptions in MITRE ATT&CK make useful extra training data for spotting attack techniques in a report.
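The augmentation idea mentioned in the medium difficulty summary can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Example` class, the `augment_with_descriptions` function, the sample sentence, and the description strings are all hypothetical. The description strings are hand-written placeholders rather than official MITRE ATT&CK text; only the technique IDs (T1566 for Phishing, T1059 for Command and Scripting Interpreter) are real.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One training pair: a piece of text and the concept ID it maps to."""
    text: str
    concept_id: str
    source: str  # "annotation" (from a report) or "description" (from the knowledge base)


def augment_with_descriptions(annotated, descriptions):
    """Return the annotated examples plus one extra example per concept description.

    Assumption: each knowledge-base description is treated as an additional
    training text labeled with its own concept ID.
    """
    augmented = list(annotated)
    for concept_id, description in descriptions.items():
        augmented.append(Example(description, concept_id, "description"))
    return augmented


# Hypothetical annotated mention from a threat report.
annotated = [
    Example("The actors sent emails with malicious attachments.", "T1566", "annotation"),
]

# Placeholder descriptions (NOT the official ATT&CK wording).
descriptions = {
    "T1566": "Adversaries send deceptive messages to gain access to victim systems.",
    "T1059": "Adversaries abuse command and script interpreters to run commands.",
}

train = augment_with_descriptions(annotated, descriptions)
for example in train:
    print(example.concept_id, example.source)
```

In a sketch like this, a concept with no annotated mention (here T1059) still receives at least one training example, which is one plausible reason description-based augmentation could help when few labeled examples are available.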

Keywords

  • Artificial intelligence
  • Data augmentation
  • Knowledge base