Artificial Intelligence & Machine Learning Keywords

Browse over 300 keywords that organize our 40,000+ AI research paper summaries. This hub gives you quick access to models, methods, tasks, metrics, core concepts, data topics, and optimization techniques across modern machine learning. Use the table of contents to jump to the explanations for each category, or scroll to the complete keyword index. Each keyword links to its own archive page, which aggregates related paper summaries. We keep terminology consistent with current literature so researchers, practitioners, and learners can navigate quickly. Start with the category overviews to understand scope, then dive into the full list below.


Models & Architectures

This category covers the major neural network families and model blueprints that power modern AI systems. It includes transformer-based language models, convolutional and recurrent networks for perception, and graph neural networks for structured data. Generative architectures such as diffusion models, variational autoencoders, and GANs also live here, reflecting their central role in synthesis and representation learning. We highlight canonical variants (e.g., BERT, GPT, ResNet, U-Net, Vision Transformers) to anchor terminology to widely used designs. Understanding these architectures clarifies capability, compute requirements, and common failure modes. When you recognize the model class, you can predict training dynamics, data needs, and suitable evaluation strategies.
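
To make the category concrete, here is a minimal NumPy sketch of the residual connection at the heart of ResNet-style networks; the shapes, weights, and function names are purely illustrative assumptions, not a reference implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A minimal ResNet-style block: two linear transforms plus a skip connection.

    Adding the identity path (x) back onto the transformed path is the key idea
    that lets very deep networks train stably.
    """
    h = relu(x @ w1)        # first transformation
    h = h @ w2              # second transformation (no activation yet)
    return relu(h + x)      # skip connection, then nonlinearity

# Illustrative shapes: a batch of 4 feature vectors of width 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = rng.normal(size=(8, 8)) * 0.1
print(residual_block(x, w1, w2).shape)  # (4, 8)
```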

BERT, GPT, LLaMA
Claude, Gemini, PaLM
T5, ViT, Vision transformer
ResNet, Residual network, CNN
Convolutional network, RNN, Recurrent network
LSTM, GNN, GCN
Graph attention network, Transformer, Seq2seq
Sequence model, Siamese network, Autoencoder
Variational autoencoder, GAN, Generative adversarial network
Diffusion, Diffusion model, R-CNN
Fast R-CNN, Faster R-CNN, Masked language model
Causal language model, Encoder, Decoder
Encoder decoder, U-Net, YOLO
SAM, Retrieval augmented generation, RAG

Jump to full keyword list ↓

Methods & Training Techniques

Methods and training techniques describe how models learn from data and how we adapt them efficiently. This includes attention mechanisms, optimization routines, curriculum and continual learning, and regularization tools like dropout and batch normalization. Modern adaptation approaches—fine-tuning, instruction tuning, LoRA, quantization, pruning, and distillation—appear here because they change compute and data economics. Transfer learning, domain adaptation, and generalization strategies determine how knowledge moves across tasks and distributions. We also include supervision regimes (supervised, unsupervised, self-/semi-supervised, few/one/zero-shot) that dictate labeling needs. Mastering these techniques lets you scale models responsibly and make them practical under real-world constraints.
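
As one concrete example from this category, low-rank adaptation (LoRA) freezes a pretrained weight matrix and trains only a small low-rank update. The NumPy sketch below uses illustrative sizes and variable names; it is a schematic of the idea rather than any particular library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 64, 64, 4, 8      # illustrative sizes; rank r is much smaller than d
W = rng.normal(size=(d_in, d_out))        # frozen pretrained weight
A = rng.normal(size=(d_in, r)) * 0.01     # trainable low-rank factor
B = np.zeros((r, d_out))                  # initialized to zero, so the adapter starts as a no-op

def lora_forward(x):
    """Adapted layer output: frozen path plus a scaled low-rank update."""
    return x @ W + (alpha / r) * (x @ A @ B)

x = rng.normal(size=(2, d_in))
print(lora_forward(x).shape)              # (2, 64)
# Only A and B (d_in*r + r*d_out parameters) would be updated during fine-tuning,
# instead of the full d_in*d_out matrix W.
```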

Attention, Self attention, Multi head attention
Cross attention, Positional encoding, Dropout
Batch normalization, Backpropagation, Distillation
Knowledge distillation, Pruning, Quantization
LoRA, Low rank adaptation, Fine tuning
Instruction tuning, Domain adaptation, Domain generalization
Curriculum learning, Early stopping, Continual learning
Self supervised, Semi supervised, Semi supervision
Supervised, Unsupervised, Few shot
One shot, Zero shot, N shot
Online learning, Meta learning, Multi task
Transfer learning, Transferability, Representation learning
Pretraining, Prompt, Prompting
Teacher model, Student model, Model compression
Parameter efficient, Federated learning, Grid search
Region proposal, Anchor box, Bounding box
Feature pyramid, Mask

Jump to full keyword list ↓

Tasks & Applications

Tasks and applications map model capabilities to real problems across NLP, vision, speech, and multimodal settings. Classic tasks include classification, regression, clustering, detection, segmentation, and tracking. Application-specific goals like question answering, summarization, translation, image captioning, and speech recognition reflect end-user value. We also include advanced perception tasks such as optical flow, pose estimation, face recognition, and scene understanding. Organizing research by task clarifies datasets, metrics, baselines, and failure patterns. Picking the right task framing often matters as much as picking the right model.
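
For the detection and tracking tasks listed here, a recurring primitive is intersection over union (IoU) between a predicted and a ground-truth bounding box. The short sketch below is illustrative only; the box format (x1, y1, x2, y2) and the example values are assumptions made for the demonstration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prediction that overlaps half of the ground-truth box scores IoU = 1/3.
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # 0.333...
```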

Classification, Regression, Clustering
Object detection, Object tracking, Tracking
Image classification, Image segmentation, Instance segmentation
Image captioning, Image generation, Image inpainting
Image denoising, Image synthesis, Optical flow
Pose estimation, Face recognition, Scene understanding
Gesture recognition, Question answering, Summarization
Translation, Text generation, Text classification
Semantic segmentation, Semantic parsing, Named entity recognition
NER, Coreference, Entity linking
Event detection, Intent detection, Time series
Activity recognition, Depth estimation, Discourse

Jump to full keyword list ↓

Metrics & Evaluation

Metrics translate model behavior into quantitative evidence and enable rigorous comparisons. Classification metrics like precision, recall, F1, ROC, and AUC capture trade-offs under different thresholds. For generation and sequence tasks, measures such as BLEU, ROUGE, perplexity, and log-likelihood assess fluency, fidelity, and calibration. Ranking and detection rely on mean average precision and related area-based summaries. Understanding metric sensitivity, dataset bias, and statistical uncertainty prevents overclaiming and supports reproducible science. Robust evaluation is how we separate genuine progress from overfitting and hype.
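
As a quick worked example, precision, recall, and F1 all follow directly from confusion-matrix counts; the small function below is a minimal illustration with made-up counts.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 for one class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: 8 true positives, 2 false positives, 4 false negatives.
print(precision_recall_f1(tp=8, fp=2, fn=4))  # (0.8, 0.666..., 0.727...)
```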

Precision, Recall, F1 score
ROC curve, AUC, BLEU
ROUGE, Perplexity, Log likelihood
Cross entropy, CER, MAE
MSE, Mean average precision, Confusion matrix
Likelihood

Jump to full keyword list ↓

Core Concepts

Core concepts are the foundational ideas that appear across models, methods, and tasks. They include probabilistic and statistical viewpoints, representation learning, and the geometry of latent/vector spaces. We cover tokens and tokenization, similarity measures, and common mathematical operators found in deep networks. Generalization, scaling laws, under/overfitting, and regularization principles explain why models succeed—or fail—beyond the training set. Energy-based and discriminative/generative formulations provide complementary perspectives on learning. Grasping these concepts accelerates reading new papers and integrating results across subfields.
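
For instance, cosine similarity, a staple of latent and vector spaces, is simply the dot product of two vectors divided by the product of their norms. The sketch below uses toy embedding vectors whose names and values are purely illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embedding vectors; the numbers are made up for illustration.
king = np.array([0.9, 0.1, 0.4])
queen = np.array([0.8, 0.2, 0.5])
apple = np.array([0.1, 0.9, 0.0])
print(cosine_similarity(king, queen))  # close to 1: nearby in the vector space
print(cosine_similarity(king, apple))  # much smaller: less similar
```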

Deep learning, Machine learning, Neural network
Language model, Large language model, Autoregressive
Probabilistic model, Generative model, Discriminative model
Energy based model, Statistical model, Latent space
Vector space, Token, Tokenization
Tokenizer, Cosine similarity, Euclidean distance
Dot product, Inference, Context length
Context window, Grounding, Hallucination
Generalization, Scaling laws, Outlier detection
Anomaly detection, Novelty detection, Underfitting
Overfitting, Regularization, Alignment
Multi modal, Probability, Language understanding
Semantics, Syntax, Discourse

Jump to full keyword list ↓

Data & Features

Data and features determine the ceiling on model performance before any algorithmic tweaks. This category includes data augmentation, labeling quality, and dataset curation strategies that improve robustness and coverage. Feature engineering and extraction—classical and deep—shape what information is available to learners. We also include ensembles, bootstrapping, and bagging/boosting as data-centric stability techniques. Knowledge bases and graphs connect symbols with structure, enabling retrieval and reasoning. When data pipelines are healthy, models train faster, evaluate fairly, and transfer more reliably.
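
As a small illustration of classical feature extraction, a bag-of-words representation maps each document to a count vector over a shared vocabulary. The sketch below uses toy documents and a hypothetical helper function written for this example.

```python
from collections import Counter

def bag_of_words(docs):
    """Map each document to a count vector over the shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

docs = ["the cat sat", "the dog sat on the mat"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 0, 0, 1, 1], [0, 1, 1, 1, 1, 2]]
```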

Data augmentation, Data labeling, Feature engineering
Feature extraction, Feature selection, Feature map
Feature pyramid, Bag of words, Bagging
Boosting, Ensemble model, Bootstrapping
Doc2Vec, GloVe, Word2Vec
TF-IDF, Vectorization, Knowledge base
Knowledge graph, Pattern matching, Pattern recognition
One hot

Jump to full keyword list ↓

Optimization & Regularization

Optimization converts objectives into learned parameters using gradient-based and related methods. Stochastic gradient descent and its variants remain the workhorses, but practical training requires careful schedules and stability tricks. Loss functions, kernels, activations, and temperature scaling shape inductive biases and calibration. Regularization—explicit or implicit—controls complexity to improve generalization and safety under distribution shift. We also highlight parameter-efficient training that reduces compute without sacrificing performance. A solid optimization toolbox turns promising architectures into dependable systems.
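
To ground the idea, gradient descent repeatedly nudges parameters against the gradient of a loss function. The sketch below fits the slope of a noisy line by minimizing mean squared error; the data, learning rate, and step count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from y = 3x + noise; we fit the slope w by gradient descent on MSE.
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)

w, lr = 0.0, 0.1
for step in range(100):
    grad = np.mean(2 * (w * x - y) * x)   # d/dw of the mean squared error
    w -= lr * grad                        # gradient descent update
print(round(w, 2))                        # close to 3.0
```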

Gradient descent, Stochastic gradient descent, Optimization
Loss function, Objective function, Hyperparameter
Grid search, Kernel trick, Hinge loss
Logits, Temperature, ReLU
Softmax, Sigmoid, Tanh
Model compression, Parameter efficient

Jump to full keyword list ↓

Other Concepts

This catch-all gathers important adjacent methods from statistics, signal processing, and classical machine learning. Bayesian inference and graphical models offer principled uncertainty handling and structure. Traditional learners—trees, random forests, XGBoost, logistic/linear regression—remain strong baselines and production workhorses. Dimensionality reduction techniques like PCA, t-SNE, and UMAP aid visualization and preprocessing. We also include linguistic tools (syntax, semantics, stemming, lemmatization) and pattern matching for text pipelines. These ideas integrate with deep learning to deliver robust, interpretable, and efficient solutions.
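
As one example from this group, PCA can be computed from the singular value decomposition of the centered data matrix; the NumPy sketch below uses synthetic data and illustrative dimensions.

```python
import numpy as np

def pca(X, k):
    """Project data onto its top-k principal components via the SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)                 # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                    # coordinates along the top-k component directions

# Illustrative data: 200 points in 5 dimensions, reduced to 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
print(pca(X, 2).shape)  # (200, 2)
```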

Bayesian inference, Bayesian network, Hidden Markov model
Markov model, Decision tree, Random forest
XGBoost, Extreme gradient boosting, Naive Bayes
Linear regression, Logistic regression, PCA
Principal component analysis, t-SNE, UMAP
Dimensionality reduction, K means, Hierarchical clustering
Spectral clustering, Nearest neighbor, Signal processing
Spatiotemporal, Temporal model, Syntax
Semantics, Lemmatization, Stemming
Stopword, Template matching, Vector space
Manifold learning, Mixture model, Mixture of experts
Knowledge base, Knowledge graph, Synthetic data
Optimization, Parameter efficient, 1 shot
N gram, One hot

Jump to full keyword list ↓