Here is a comprehensive A-to-Z glossary of key machine learning terms and their definitions. It covers foundational and advanced concepts, including terms related to TensorFlow and large language models, explained in easy-to-understand language.
A
- Ablation: The process of removing components of a model to study their impact on performance.
- A/B Testing: A statistical method to compare two versions of a system to determine which performs better.
- Accelerator Chip: Specialized hardware (e.g., GPUs, TPUs) designed to speed up machine learning computations.
- Accuracy: The proportion of correct predictions made by a model.
- Activation Function: A function in neural networks that determines the output of a node (e.g., ReLU, Sigmoid); see the sketch after this list.
- AdaGrad: An optimization algorithm that adapts the learning rate for each parameter.
- Adversarial Attack: Deliberate manipulation of input data to deceive an AI model.
- Adversarial Training: Training models with adversarial examples to improve robustness.
- AR (Augmented Reality): Technology that overlays digital information onto the physical world, often enhanced by AI.
- Artificial General Intelligence (AGI): AI with human-like cognitive abilities, capable of performing any intellectual task.
- Attention Mechanism: A component in neural networks that focuses on specific parts of input data (e.g., in transformers).
- Autoencoder: A neural network used for unsupervised learning, often for dimensionality reduction or feature learning.
- AutoML: Automated Machine Learning, the process of automating the end-to-end process of applying machine learning to real-world problems.
- AUC (Area Under the ROC Curve): A metric that measures the performance of a classification model across all thresholds.
- Axis-Aligned Condition: A condition in decision trees where splits are made parallel to the feature axes.
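To make the Activation Function entry concrete, here is a minimal NumPy sketch of ReLU and sigmoid; the helper names and example values are purely illustrative.

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through, clips negatives to zero.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))     # [0.  0.  0.  1.5]
print(sigmoid(z))  # approximately [0.119, 0.378, 0.5, 0.818]
```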
B
- Backpropagation: A technique used in training neural networks to adjust weights by calculating the gradient of the loss function.
- Bagging (Bootstrap Aggregating): An ensemble technique that combines multiple models to reduce variance.
- Bag of Words: A text representation method where text is represented as a collection of word counts, ignoring grammar and word order (see the sketch after this list).
- Baseline: A simple model or heuristic used as a reference point for evaluating more complex models.
- Batch Normalization: A technique to normalize the inputs of each layer in a neural network to improve training stability.
- Bayesian Neural Network: A neural network that incorporates Bayesian inference to model uncertainty.
- Bayesian Optimization: A technique for optimizing expensive-to-evaluate functions using probabilistic models.
- Bellman Equation: A recursive equation used in reinforcement learning to compute the value of a state.
- Bias (Ethics/Fairness): Systematic errors in AI models that lead to unfair or inaccurate outcomes, often due to flawed training data.
- Bidirectional Language Model: A language model that uses both preceding and following context to represent or predict tokens (e.g., BERT).
- Bigram: A pair of consecutive words in a text.
- Binary Classification: A classification task with two possible outcomes.
- BLEU (Bilingual Evaluation Understudy): A metric for evaluating the quality of machine-translated text.
- BLEURT (Bilingual Evaluation Understudy with Representations from Transformers): A learned evaluation metric for generated text that uses transformer-based models and generally correlates better with human judgments than BLEU.
- Boosting: An ensemble technique that builds models sequentially to correct errors from previous models (e.g., AdaBoost, Gradient Boosting).
- Bounding Box: A rectangular box used in object detection to localize objects in an image.
- Broadcasting: A technique in numerical computing to perform operations on arrays of different shapes.
- Bucketing: A technique for grouping continuous data into discrete bins.
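As an illustration of the Bag of Words entry above, here is a minimal pure-Python sketch that turns sentences into word-count vectors; the toy documents and vocabulary-building step are simplified for clarity.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]

# Build a fixed vocabulary from all documents (order and grammar are ignored).
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc):
    # Count how often each vocabulary word appears in the document.
    counts = Counter(doc.split())
    return [counts.get(word, 0) for word in vocab]

print(vocab)                  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(bag_of_words(docs[0]))  # [1, 0, 1, 1, 1, 2]
print(bag_of_words(docs[1]))  # [0, 1, 0, 0, 1, 1]
```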
C
- Calibration Layer: A layer in a neural network that adjusts the output probabilities to better match the true distribution.
- Candidate Generation: The process of generating potential candidates for a recommendation system.
- Candidate Sampling: A technique to reduce computational cost by sampling a subset of candidates for training.
- Categorical Data: Data that represents categories or labels rather than numerical values.
- Centroid: The center point of a cluster in clustering algorithms.
- Centroid-Based Clustering: A clustering method that groups data points based on their distance to centroids (e.g., K-Means); a minimal sketch follows this list.
- Chain-of-Thought Prompting: A technique to improve reasoning in language models by prompting them to generate intermediate steps.
- Checkpoint: A saved state of a model during training, allowing for resuming training or inference.
- Cloud TPU: Google’s Tensor Processing Units offered through Google Cloud; hardware accelerators built for machine learning workloads.
- Clustering: An unsupervised learning technique that groups similar data points together (e.g., K-Means, DBSCAN).
- Convolutional Neural Network (CNN): A deep learning model commonly used for image and video analysis.
- Cross-Validation: A technique for evaluating model performance by splitting data into multiple subsets.
- Curriculum Learning: Training models on easier tasks first before progressing to harder ones.
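To make the centroid-based clustering entry concrete, here is a minimal NumPy sketch of one K-Means-style iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. The data and starting centroids are illustrative; real implementations iterate until convergence and handle initialization carefully.

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # illustrative starting centroids

# Assignment step: label each point with its nearest centroid.
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)

# Update step: move each centroid to the mean of its assigned points.
new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(len(centroids))])

print(labels)         # [0 0 1 1]
print(new_centroids)  # [[0.05 0.1 ] [5.1  4.9 ]]
```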
D
- Data Augmentation: Techniques to increase the diversity of training data (e.g., flipping images, adding noise).
- DataFrame: A tabular data structure used in data analysis (e.g., in Pandas).
- Data Parallelism: A technique to distribute data across multiple devices for parallel processing.
- Dataset API (tf.data): A TensorFlow API for building efficient data pipelines.
- Decision Forest: An ensemble of decision trees used for classification or regression.
- Deep Neural Network (DNN): A neural network with multiple layers, capable of learning complex patterns.
- Deep Q-Network (DQN): A reinforcement learning algorithm that uses a deep neural network to approximate the Q-function.
- Demographic Parity: A fairness metric that requires predictions to be independent of protected attributes.
- Denoising: The process of removing noise from data, often used in autoencoders.
- Dimensionality Reduction: Reducing the number of features in a dataset (e.g., PCA, t-SNE).
- Dropout: A regularization technique to prevent overfitting in neural networks by randomly deactivating nodes during training (a minimal sketch follows this list).
- Dynamic Programming: A method used in reinforcement learning to solve complex problems by breaking them into simpler subproblems.
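As a rough illustration of the Dropout entry above, here is a minimal NumPy sketch of "inverted dropout" applied to a vector of activations during training; the helper name and seed are ours, and frameworks such as TensorFlow and PyTorch provide this as a built-in layer.

```python
import numpy as np

def dropout(activations, rate, rng):
    # Randomly zero out roughly a fraction `rate` of the units and rescale the
    # survivors so the expected sum of activations stays the same.
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones(8)
print(dropout(h, rate=0.5, rng=rng))  # some entries 0.0, the rest scaled to 2.0
```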
E
- Eager Execution: A mode in TensorFlow where operations are executed immediately, rather than building a computational graph.
- Earth Mover’s Distance (EMD): A measure of the distance between two probability distributions.
- Edit Distance: A measure of the similarity between two strings, based on the number of edits required to transform one into the other.
- Einsum Notation: A compact notation for expressing tensor operations.
- Embedding Layer: A layer in neural networks that converts categorical data into dense vectors.
- Empirical Cumulative Distribution Function (eCDF or EDF): A function that estimates the cumulative distribution of a dataset.
- Empirical Risk Minimization (ERM): A principle in machine learning to minimize the error on the training data.
- Entropy: A measure of uncertainty or randomness, often used in decision trees and information theory.
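For the Entropy entry, here is a tiny worked example computing Shannon entropy (in bits) of a discrete distribution; a fair coin yields exactly 1 bit.

```python
import math

def entropy(probabilities):
    # Shannon entropy in bits: -sum(p * log2(p)), skipping zero-probability events.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0   (fair coin: maximum uncertainty)
print(entropy([0.9, 0.1]))  # ~0.469 (biased coin: less uncertainty)
print(entropy([1.0, 0.0]))  # 0.0   (certain outcome: no uncertainty)
```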
F
- Factuality: The degree to which a model’s outputs are factually correct.
- Fairness Constraint: A constraint applied to a model to ensure fairness in predictions.
- Fairness Metric: A metric used to evaluate the fairness of a model (e.g., demographic parity, equalized odds).
- False Negative (FN): A case where the model incorrectly predicts the negative class for an example whose actual class is positive.
- False Negative Rate: The proportion of actual positives incorrectly predicted as negatives.
- False Positive (FP): A case where the model incorrectly predicts the positive class for an example whose actual class is negative.
- False Positive Rate (FPR): The proportion of actual negatives incorrectly predicted as positives.
- Feature Engineering: The process of selecting, transforming, and creating features to improve model performance.
- Federated Learning: A decentralized approach to training AI models across multiple devices without sharing raw data.
- Few-Shot Learning: Training models to perform tasks with very few labeled examples.
- Fine-Tuning: Adapting a pre-trained model to a specific task by training it further on a smaller dataset.
- F1 Score: A metric that balances precision and recall, often used for classification tasks.
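Building on the false positive/negative and F1 Score entries above, here is a minimal sketch computing precision, recall, and F1 from raw confusion counts; the counts themselves are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    # Precision: of everything predicted positive, how much was truly positive.
    precision = tp / (tp + fp)
    # Recall: of everything truly positive, how much the model found.
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=40, fp=10, fn=20))  # (0.8, ~0.667, ~0.727)
```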
G
- GAN (Generative Adversarial Network): A framework where two neural networks (generator and discriminator) compete to generate realistic data.
- Gradient Boosted (Decision) Trees (GBT): An ensemble technique that builds decision trees sequentially to correct errors from previous models.
- Gradient Boosting: An ensemble technique that builds models sequentially to correct errors from previous models.
- Gradient Descent: An optimization algorithm that iteratively adjusts parameters to minimize the loss function (see the sketch after this list).
- Graph Convolutional Network (GCN): A neural network designed for graph-structured data.
- Graph Neural Network (GNN): A type of neural network that operates on graph data.
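To illustrate the Gradient Descent entry above, here is a minimal sketch that minimizes a simple one-dimensional loss, f(w) = (w - 3)^2, by repeatedly stepping against the gradient; the learning rate and step count are arbitrary.

```python
def loss_gradient(w):
    # Gradient of f(w) = (w - 3)^2 is 2 * (w - 3).
    return 2.0 * (w - 3.0)

w = 0.0               # initial parameter value
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * loss_gradient(w)  # step opposite to the gradient

print(round(w, 4))    # ~3.0, the minimizer of the loss
```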
H
- Hallucination: When a model generates plausible-sounding output that is not grounded in its input or training data (e.g., fabricated facts).
- Hashing: A technique to map data of arbitrary size to fixed-size values.
- Heuristic: A rule-of-thumb or shortcut used to solve problems more efficiently.
- Hierarchical Clustering: A clustering method that builds a hierarchy of clusters.
- Hill Climbing: An optimization technique that iteratively improves a solution by making small changes.
- Hinge Loss: A loss function used in support vector machines for classification tasks.
- Human-in-the-Loop (HITL): A system where humans are involved in training, validating, or improving AI models.
- Hyperparameter Tuning: The process of optimizing hyperparameters to improve model performance (e.g., grid search, random search).
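As a sketch of the Hyperparameter Tuning entry, here is a minimal grid search over two hypothetical hyperparameters; `train_and_evaluate` is a stand-in for whatever training-plus-validation routine you actually use, and the scoring formula inside it is fake.

```python
import itertools

def train_and_evaluate(learning_rate, num_layers):
    # Placeholder: pretend deeper models with moderate learning rates score best.
    return 1.0 - abs(learning_rate - 0.01) * 10 + 0.01 * num_layers

grid = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [2, 4, 8]}

best_score, best_params = float("-inf"), None
for lr, layers in itertools.product(grid["learning_rate"], grid["num_layers"]):
    score = train_and_evaluate(lr, layers)
    if score > best_score:
        best_score, best_params = score, {"learning_rate": lr, "num_layers": layers}

print(best_params, round(best_score, 3))  # {'learning_rate': 0.01, 'num_layers': 8} 1.08
```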
I
- Image Segmentation: Dividing an image into regions to identify objects or boundaries.
- Imputation: Techniques for filling in missing values in datasets (a mean-imputation sketch follows this list).
- Instance-Based Learning: A learning approach where predictions are made based on similar instances in the training data (e.g., KNN).
- Inverse Reinforcement Learning: Inferring the reward function from observed behavior.
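For the Imputation entry above, here is a minimal sketch of mean imputation on a single numeric column; libraries such as scikit-learn and Pandas offer more robust versions.

```python
import math

values = [3.0, math.nan, 7.0, 5.0, math.nan]

# Mean imputation: replace missing entries with the mean of the observed ones.
observed = [v for v in values if not math.isnan(v)]
column_mean = sum(observed) / len(observed)
imputed = [column_mean if math.isnan(v) else v for v in values]

print(imputed)  # [3.0, 5.0, 7.0, 5.0, 5.0]
```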
K
- Kernel Support Vector Machines (KSVMs): A type of SVM that uses kernel functions to handle non-linear data.
- K-Fold Cross Validation: A technique for evaluating model performance by splitting data into k subsets and, for each subset in turn, training on the other k-1 subsets while validating on the held-out one (see the sketch after this list).
- K-Means Clustering: An unsupervised learning algorithm used to group data into clusters.
- Kernel: A function used in machine learning to transform data into a higher-dimensional space (e.g., in SVMs).
- Knowledge Distillation: Transferring knowledge from a large model to a smaller one to improve efficiency.
- Knowledge Graph: A structured representation of knowledge that maps relationships between entities.
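To make the K-Fold Cross Validation entry concrete, here is a minimal sketch that splits a dataset of example indices into k folds and prints the train/validation split for each round; real code would shuffle first and call an actual training routine on each split.

```python
def k_fold_splits(num_examples, k):
    # Yield (train_indices, validation_indices) pairs, one per fold.
    indices = list(range(num_examples))
    fold_size = num_examples // k
    for fold in range(k):
        val = indices[fold * fold_size:(fold + 1) * fold_size]
        train = [i for i in indices if i not in val]
        yield train, val

for train_idx, val_idx in k_fold_splits(num_examples=6, k=3):
    print("train:", train_idx, "validate:", val_idx)
# train: [2, 3, 4, 5] validate: [0, 1]
# train: [0, 1, 4, 5] validate: [2, 3]
# train: [0, 1, 2, 3] validate: [4, 5]
```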
L
- L0 Regularization: A regularization technique that penalizes the number of non-zero parameters in a model.
- L1 Loss: A loss function that measures the absolute difference between predicted and actual values (a worked example follows this list).
- Label Encoding: Converting categorical labels into numerical values for machine learning.
- Latent Space: A compressed representation of data learned by a model.
- Learning Rate: A hyperparameter that controls the step size during model optimization.
- Long Short-Term Memory (LSTM): A type of recurrent neural network (RNN) used for sequence modeling.
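For the L1 Loss entry above, here is a tiny worked example on made-up predictions; the same code with squared differences would give mean squared error (L2) instead.

```python
predictions = [2.5, 0.0, 2.0]
targets     = [3.0, -0.5, 2.0]

# L1 loss: mean absolute difference between predictions and targets.
l1 = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
print(l1)  # (0.5 + 0.5 + 0.0) / 3 ≈ 0.333
```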
M
- Machine Learning (ML): A subset of AI that enables systems to learn from data and improve without explicit programming.
- Machine Translation: The use of AI to automatically translate text from one language to another.
- Majority Class: The most frequent class in a classification problem.
- Markov Decision Process (MDP): A mathematical framework for modeling decision-making in reinforcement learning.
- Markov Property: The property that the future state depends only on the current state, not on the sequence of preceding states.
- Mean Absolute Error (MAE): A metric that measures the average absolute difference between predicted and actual values.
- Meta-Learning: A technique where models learn how to learn, often used in few-shot learning.
- Mixture of Experts (MoE): A model architecture that combines specialized sub-models for different tasks.
- Model Drift: The degradation of model performance over time due to changes in data distribution.
- Model Interpretability: The ability to understand and explain how a model makes decisions.
- Multi-Agent System: A system where multiple AI agents interact to achieve a goal.
N
- NaN Trap: A situation where NaN (Not a Number) values propagate through a model, causing errors.
- Naive Bayes: A probabilistic classifier based on Bayes’ theorem, often used for text classification.
- Natural Language Processing (NLP): A field of AI focused on enabling machines to understand and generate human language.
- Natural Language Understanding (NLU): A subfield of NLP focused on understanding the meaning of text.
- Neural Architecture Search (NAS): Automating the design of neural network architectures.
- Neural Network: A computational model inspired by the human brain, consisting of interconnected layers of nodes.
- N-Gram: A contiguous sequence of n items (e.g., words or characters) in a text; see the sketch after this list.
- Node (TensorFlow Graph): A unit of computation in a TensorFlow computational graph.
- Normalization Layer: A layer in neural networks that standardizes inputs (e.g., BatchNorm, LayerNorm).
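As an illustration of the N-Gram and Bigram entries, here is a minimal sketch that extracts word n-grams from a sentence; the sentence is arbitrary.

```python
def ngrams(text, n):
    # Slide a window of length n over the word sequence.
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("the cat sat on the mat", 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```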
O
- Object Detection: A computer vision task that identifies and locates objects within an image.
- One-Hot Encoding: Representing categorical data as binary vectors with a single 1 marking the active category (see the sketch after this list).
- Ontology: A formal representation of knowledge in a domain, often used in AI systems.
- Optimization Algorithms: Techniques used to minimize or maximize a function (e.g., Gradient Descent, Adam).
- Overfitting: A modeling error where a machine learning model performs well on training data but poorly on new data.
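To illustrate the One-Hot Encoding entry above, here is a minimal pure-Python sketch with an invented three-category example; libraries such as scikit-learn and Pandas (`pd.get_dummies`) provide equivalent utilities.

```python
categories = ["red", "green", "blue"]
index = {category: i for i, category in enumerate(categories)}

def one_hot(category):
    # Binary vector with a single 1 at the category's position.
    vector = [0] * len(categories)
    vector[index[category]] = 1
    return vector

print(one_hot("green"))  # [0, 1, 0]
print(one_hot("blue"))   # [0, 0, 1]
```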
P
- Perceptron: The simplest type of neural network: a single layer of weights followed by a threshold activation (see the sketch after this list).
- Precision and Recall: Metrics used to evaluate classification models; precision measures accuracy of positive predictions, while recall measures the fraction of positives correctly identified.
- Pre-trained Model: A model trained on a large dataset and fine-tuned for specific tasks.
- Probabilistic Graphical Model (PGM): A model that represents probabilistic relationships between variables.
- Prompt Engineering: The process of designing effective inputs (prompts) to guide AI models’ outputs.
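For the Perceptron entry above, here is a minimal sketch of a single perceptron making a prediction and applying one learning update on a misclassified example; the weights, inputs, and learning rate are illustrative.

```python
def predict(weights, bias, inputs):
    # Weighted sum followed by a step (threshold) activation.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

weights, bias, lr = [0.0, 0.0], 0.0, 0.1
inputs, label = [1.0, 2.0], 1

prediction = predict(weights, bias, inputs)   # 0: misclassified
error = label - prediction                    # 1
weights = [w + lr * error * x for w, x in zip(weights, inputs)]
bias += lr * error

print(weights, bias)                          # [0.1, 0.2] 0.1
print(predict(weights, bias, inputs))         # 1: now classified correctly
```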
Q
- Quantum AI: The application of quantum computing to AI tasks for improved efficiency and capabilities.
- Quantization: Reducing the precision of model parameters to improve efficiency (e.g., for edge devices).
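As a rough sketch of the Quantization entry, here is a minimal example of symmetric 8-bit quantization of a float array with NumPy; the weights are made up, and production schemes (such as those in TensorFlow Lite) are more sophisticated.

```python
import numpy as np

weights = np.array([-0.8, -0.1, 0.0, 0.4, 0.95], dtype=np.float32)

# Symmetric int8 quantization: map the largest magnitude to 127.
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

print(quantized)    # e.g. [-107  -13    0   53  127]
print(dequantized)  # close to the original values, with small rounding error
```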
R
- Recurrent Neural Network (RNN): A neural network designed for sequential data, such as time series or text.
- Regression Analysis: A statistical method for modeling the relationship between a dependent variable and one or more predictors (a least-squares sketch follows this list).
- Reinforcement Learning with Human Feedback (RLHF): Training models using feedback from humans to improve alignment.
- Residual Network (ResNet): A deep neural network architecture with skip connections to improve training.
- Robotics: The field of designing and programming robots, often incorporating AI.
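To make the Regression Analysis entry concrete, here is a minimal least-squares linear regression fit with NumPy; the data is synthetic and purely illustrative.

```python
import numpy as np

# Synthetic data that roughly follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Least-squares fit of slope and intercept using the design matrix [x, 1].
design = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)

print(round(slope, 2), round(intercept, 2))  # close to 2 and 1
```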
S
- Self-Supervised Learning: A learning approach where models generate their own labels from unlabeled data.
- Sequence-to-Sequence (Seq2Seq): A model architecture for tasks like machine translation.
- Swarm Intelligence: Collective behavior of decentralized systems inspired by nature (e.g., ant colonies).
- Synthetic Data: Artificially generated data used to train AI models when real data is scarce or sensitive.
T
- TensorFlow/Keras: Popular frameworks for building and training machine learning and deep learning models.
- Transformer: A neural network architecture based on self-attention, widely used in NLP (e.g., GPT, BERT).
- Transfer Learning: Reusing a pre-trained model for a new task.
- Tree-Based Models: Models that use decision trees for predictions (e.g., Random Forest, Gradient Boosting).
- Triplet Loss: A loss function used in tasks like face recognition to learn embeddings.
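For the Triplet Loss entry, here is a minimal NumPy sketch of the standard formulation: it encourages an anchor embedding to be closer to a positive (same identity) than to a negative (different identity) by at least a margin. The embeddings and margin here are invented for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Loss is zero once the positive is closer than the negative by `margin`.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

anchor   = np.array([0.0, 1.0])
positive = np.array([0.1, 0.9])   # same identity: should be nearby
negative = np.array([1.0, 0.0])   # different identity: should be far away

print(round(triplet_loss(anchor, positive, negative), 3))  # 0.0 (constraint already satisfied)
```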
U
- Uncertainty Quantification: Measuring the uncertainty in model predictions.
- Universal Approximation Theorem: A theorem stating that a feedforward neural network with enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy.
- Unsupervised Learning: A machine learning approach where the model is trained on unlabeled data to find patterns.
V
- Variational Autoencoder (VAE): A generative model that learns latent representations of data.
- Vision-Language Model: A model that processes both visual and textual data (e.g., CLIP).
W
- Weak AI (Narrow AI): AI designed for specific tasks, as opposed to strong AI, which aims for general intelligence.
- Weight Initialization: Setting initial values for model weights to improve training.
- Word2Vec: A technique for learning word embeddings from text data.
X
- XAI (Explainable AI): AI systems designed to provide transparent and understandable explanations for their decisions.
- XGBoost: A scalable and efficient implementation of gradient boosting for supervised learning.
Y
- Yield Management: The use of AI and data analysis to optimize pricing and inventory decisions.
- YOLO (You Only Look Once): A real-time object detection algorithm.
Z
- Zero-Shot Learning: A model’s ability to perform tasks it was not explicitly trained on.
- Z-Score Normalization: Scaling data to have a mean of 0 and a standard deviation of 1.
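Finally, to illustrate the Z-Score Normalization entry, here is a tiny worked example with NumPy on made-up values.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Z-score: subtract the mean and divide by the standard deviation.
z_scores = (values - values.mean()) / values.std()

print(z_scores)                          # approximately [-1.414, -0.707, 0, 0.707, 1.414]
print(z_scores.mean(), z_scores.std())   # ~0.0 and ~1.0
```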