Here is a comprehensive A-to-Z glossary of key machine learning terms and their definitions. It covers foundational and advanced concepts, including terms related to TensorFlow and large language models, explained in easy-to-understand language.
A
- Ablation: The process of removing components of a model to study their impact on performance.
- A/B Testing: A statistical method to compare two versions of a system to determine which performs better.
- Accelerator Chip: Specialized hardware (e.g., GPUs, TPUs) designed to speed up machine learning computations.
- Accuracy: The proportion of correct predictions made by a model.
- Activation Function: A function in neural networks that determines the output of a node (e.g., ReLU, Sigmoid); see the sketch after this list.
- AdaGrad: An optimization algorithm that adapts the learning rate for each parameter.
- Adversarial Attack: Deliberate manipulation of input data to deceive an AI model.
- Adversarial Training: Training models with adversarial examples to improve robustness.
- AR (Augmented Reality): Technology that overlays digital information onto the physical world, often enhanced by AI.
- Artificial General Intelligence (AGI): AI with human-like cognitive abilities, capable of performing any intellectual task.
- Attention Mechanism: A component in neural networks that focuses on specific parts of input data (e.g., in transformers).
- Autoencoder: A neural network used for unsupervised learning, often for dimensionality reduction or feature learning.
- AutoML: Automated Machine Learning, the process of automating the end-to-end process of applying machine learning to real-world problems.
- AUC (Area Under the ROC Curve): A metric that measures the performance of a classification model across all thresholds.
- Axis-Aligned Condition: A condition in decision trees where splits are made parallel to the feature axes.
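To make the Activation Function entry concrete, here is a minimal NumPy sketch of ReLU and sigmoid; the helper names and example values are purely illustrative.

```python
import numpy as np

def relu(x):
    # ReLU: passes positive values through, clips negatives to zero.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))     # [0.  0.  0.  1.5]
print(sigmoid(z))  # approximately [0.119, 0.378, 0.5, 0.818]
```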
B
- Backpropagation: A technique used in training neural networks to adjust weights by calculating the gradient of the loss function.
- Bagging (Bootstrap Aggregating): An ensemble technique that combines multiple models to reduce variance.
- Bag of Words: A text representation method where text is represented as a collection of word counts, ignoring grammar and word order (see the sketch after this list).
- Baseline: A simple model or heuristic used as a reference point for evaluating more complex models.
- Batch Normalization: A technique to normalize the inputs of each layer in a neural network to improve training stability.
- Bayesian Neural Network: A neural network that incorporates Bayesian inference to model uncertainty.
- Bayesian Optimization: A technique for optimizing expensive-to-evaluate functions using probabilistic models.
- Bellman Equation: A recursive equation used in reinforcement learning to compute the value of a state.
- Bias (Ethics/Fairness): Systematic errors in AI models that lead to unfair or inaccurate outcomes, often due to flawed training data.
- Bidirectional Language Model: A language model that uses both preceding and following context to represent or predict tokens (e.g., BERT).
- Bigram: A pair of consecutive words in a text.
- Binary Classification: A classification task with two possible outcomes.
- BLEU (Bilingual Evaluation Understudy): A metric for evaluating the quality of machine-translated text.
- BLEURT (Bilingual Evaluation Understudy with Representations from Transformers): A learned evaluation metric for generated text that uses transformer-based models and generally correlates better with human judgments than BLEU.
- Boosting: An ensemble technique that builds models sequentially to correct errors from previous models (e.g., AdaBoost, Gradient Boosting).
- Bounding Box: A rectangular box used in object detection to localize objects in an image.
- Broadcasting: A technique in numerical computing to perform operations on arrays of different shapes.
- Bucketing: A technique for grouping continuous data into discrete bins.
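As an illustration of the Bag of Words entry above, here is a minimal pure-Python sketch that turns sentences into word-count vectors; the toy documents and vocabulary-building step are simplified for clarity.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]

# Build a fixed vocabulary from all documents (order and grammar are ignored).
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc):
    # Count how often each vocabulary word appears in the document.
    counts = Counter(doc.split())
    return [counts.get(word, 0) for word in vocab]

print(vocab)                  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(bag_of_words(docs[0]))  # [1, 0, 1, 1, 1, 2]
print(bag_of_words(docs[1]))  # [0, 1, 0, 0, 1, 1]
```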
C
- Calibration Layer: A layer in a neural network that adjusts the output probabilities to better match the true distribution.
- Candidate Generation: The process of generating potential candidates for a recommendation system.
- Candidate Sampling: A technique to reduce computational cost by sampling a subset of candidates for training.
- Categorical Data: Data that represents categories or labels rather than numerical values.
- Centroid: The center point of a cluster in clustering algorithms.
- Centroid-Based Clustering: A clustering method that groups data points based on their distance to centroids (e.g., K-Means); a minimal sketch follows this list.
- Chain-of-Thought Prompting: A technique to improve reasoning in language models by prompting them to generate intermediate steps.
- Checkpoint: A saved state of a model during training, allowing for resuming training or inference.
- Cloud TPU: Google’s Tensor Processing Units offered through Google Cloud; hardware accelerators built for machine learning workloads.
- Clustering: An unsupervised learning technique that groups similar data points together (e.g., K-Means, DBSCAN).
- Convolutional Neural Network (CNN): A deep learning model commonly used for image and video analysis.
- Cross-Validation: A technique for evaluating model performance by splitting data into multiple subsets.
- Curriculum Learning: Training models on easier tasks first before progressing to harder ones.
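To make the centroid-based clustering entry concrete, here is a minimal NumPy sketch of one K-Means-style iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. The data and starting centroids are illustrative; real implementations iterate until convergence and handle initialization carefully.

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])  # illustrative starting centroids

# Assignment step: label each point with its nearest centroid.
distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)

# Update step: move each centroid to the mean of its assigned points.
new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(len(centroids))])

print(labels)         # [0 0 1 1]
print(new_centroids)  # [[0.05 0.1 ] [5.1  4.9 ]]
```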
D
- Data Augmentation: Techniques to increase the diversity of training data (e.g., flipping images, adding noise).
- DataFrame: A tabular data structure used in data analysis (e.g., in Pandas).
- Data Parallelism: A technique to distribute data across multiple devices for parallel processing.
- Dataset API (tf.data): A TensorFlow API for building efficient data pipelines.
- Decision Forest: An ensemble of decision trees used for classification or regression.
- Deep Neural Network (DNN): A neural network with multiple layers, capable of learning complex patterns.
- Deep Q-Network (DQN): A reinforcement learning algorithm that uses a deep neural network to approximate the Q-function.
- Demographic Parity: A fairness metric that requires predictions to be independent of protected attributes.
- Denoising: The process of removing noise from data, often used in autoencoders.
- Dimensionality Reduction: Reducing the number of features in a dataset (e.g., PCA, t-SNE).
- Dropout: A regularization technique to prevent overfitting in neural networks by randomly deactivating nodes during training (a minimal sketch follows this list).
- Dynamic Programming: A method used in reinforcement learning to solve complex problems by breaking them into simpler subproblems.
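As a rough illustration of the Dropout entry above, here is a minimal NumPy sketch of "inverted dropout" applied to a vector of activations during training; the helper name and seed are ours, and frameworks such as TensorFlow and PyTorch provide this as a built-in layer.

```python
import numpy as np

def dropout(activations, rate, rng):
    # Randomly zero out roughly a fraction `rate` of the units and rescale the
    # survivors so the expected sum of activations stays the same.
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones(8)
print(dropout(h, rate=0.5, rng=rng))  # some entries 0.0, the rest scaled to 2.0
```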
E
- Eager Execution: A mode in TensorFlow where operations are executed immediately, rather than building a computational graph.
- Earth Mover’s Distance (EMD): A measure of the distance between two probability distributions.
- Edit Distance: A measure of the similarity between two strings, based on the number of edits required to transform one into the other.
- Einsum Notation: A compact notation for expressing tensor operations.
- Embedding Layer: A layer in neural networks that converts categorical data into dense vectors.
- Empirical Cumulative Distribution Function (eCDF or EDF): A function that estimates the cumulative distribution of a dataset.
- Empirical Risk Minimization (ERM): A principle in machine learning to minimize the error on the training data.
- Entropy: A measure of uncertainty or randomness, often used in decision trees and information theory.
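For the Entropy entry, here is a tiny worked example computing Shannon entropy (in bits) of a discrete distribution; a fair coin yields exactly 1 bit.

```python
import math

def entropy(probabilities):
    # Shannon entropy in bits: -sum(p * log2(p)), skipping zero-probability events.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0   (fair coin: maximum uncertainty)
print(entropy([0.9, 0.1]))  # ~0.469 (biased coin: less uncertainty)
print(entropy([1.0, 0.0]))  # 0.0   (certain outcome: no uncertainty)
```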
F
- Factuality: The degree to which a model’s outputs are factually correct.
- Fairness Constraint: A constraint applied to a model to ensure fairness in predictions.
- Fairness Metric: A metric used to evaluate the fairness of a model (e.g., demographic parity, equalized odds).
- False Negative (FN): A case where the model incorrectly predicts the negative class for an example whose actual class is positive.
- False Negative Rate: The proportion of actual positives incorrectly predicted as negatives.
- False Positive (FP): A case where the model incorrectly predicts the positive class for an example whose actual class is negative.
- False Positive Rate (FPR): The proportion of actual negatives incorrectly predicted as positives.
- Feature Engineering: The process of selecting, transforming, and creating features to improve model performance.
- Federated Learning: A decentralized approach to training AI models across multiple devices without sharing raw data.
- Few-Shot Learning: Training models to perform tasks with very few labeled examples.
- Fine-Tuning: Adapting a pre-trained model to a specific task by training it further on a smaller dataset.
- F1 Score: A metric that balances precision and recall, often used for classification tasks.
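Building on the false positive/negative and F1 Score entries above, here is a minimal sketch computing precision, recall, and F1 from raw confusion counts; the counts themselves are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    # Precision: of everything predicted positive, how much was truly positive.
    precision = tp / (tp + fp)
    # Recall: of everything truly positive, how much the model found.
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(precision_recall_f1(tp=40, fp=10, fn=20))  # (0.8, ~0.667, ~0.727)
```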
G
- GAN (Generative Adversarial Network): A framework where two neural networks (generator and discriminator) compete to generate realistic data.
- Gradient Boosted (Decision) Trees (GBT): An ensemble technique that builds decision trees sequentially to correct errors from previous models.
- Gradient Boosting: An ensemble technique that builds models sequentially to correct errors from previous models.
- Gradient Descent: An optimization algorithm that iteratively adjusts parameters to minimize the loss function (see the sketch after this list).
- Graph Convolutional Network (GCN): A neural network designed for graph-structured data.
- Graph Neural Network (GNN): A type of neural network that operates on graph data.
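To illustrate the Gradient Descent entry above, here is a minimal sketch that minimizes a simple one-dimensional loss, f(w) = (w - 3)^2, by repeatedly stepping against the gradient; the learning rate and step count are arbitrary.

```python
def loss_gradient(w):
    # Gradient of f(w) = (w - 3)^2 is 2 * (w - 3).
    return 2.0 * (w - 3.0)

w = 0.0               # initial parameter value
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * loss_gradient(w)  # step opposite to the gradient

print(round(w, 4))    # ~3.0, the minimizer of the loss
```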
H
- Hallucination: When a model generates plausible-sounding output that is not grounded in its input or training data (e.g., fabricated facts).
- Hashing: A technique to map data of arbitrary size to fixed-size values.
- Heuristic: A rule-of-thumb or shortcut used to solve problems more efficiently.
- Hierarchical Clustering: A clustering method that builds a hierarchy of clusters.
- Hill Climbing: An optimization technique that iteratively improves a solution by making small changes.
- Hinge Loss: A loss function used in support vector machines for classification tasks.
- Human-in-the-Loop (HITL): A system where humans are involved in training, validating, or improving AI models.
- Hyperparameter Tuning: The process of optimizing hyperparameters to improve model performance (e.g., grid search, random search).
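As a sketch of the Hyperparameter Tuning entry, here is a minimal grid search over two hypothetical hyperparameters; `train_and_evaluate` is a stand-in for whatever training-plus-validation routine you actually use, and the scoring formula inside it is fake.

```python
import itertools

def train_and_evaluate(learning_rate, num_layers):
    # Placeholder: pretend deeper models with moderate learning rates score best.
    return 1.0 - abs(learning_rate - 0.01) * 10 + 0.01 * num_layers

grid = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [2, 4, 8]}

best_score, best_params = float("-inf"), None
for lr, layers in itertools.product(grid["learning_rate"], grid["num_layers"]):
    score = train_and_evaluate(lr, layers)
    if score > best_score:
        best_score, best_params = score, {"learning_rate": lr, "num_layers": layers}

print(best_params, round(best_score, 3))  # {'learning_rate': 0.01, 'num_layers': 8} 1.08
```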
I
- Image Segmentation: Dividing an image into regions to identify objects or boundaries.
- Imputation: Techniques for filling in missing values in datasets (a mean-imputation sketch follows this list).
- Instance-Based Learning: A learning approach where predictions are made based on similar instances in the training data (e.g., KNN).
- Inverse Reinforcement Learning: Inferring the reward function from observed behavior.
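For the Imputation entry above, here is a minimal sketch of mean imputation on a single numeric column; libraries such as scikit-learn and Pandas offer more robust versions.

```python
import math

values = [3.0, math.nan, 7.0, 5.0, math.nan]

# Mean imputation: replace missing entries with the mean of the observed ones.
observed = [v for v in values if not math.isnan(v)]
column_mean = sum(observed) / len(observed)
imputed = [column_mean if math.isnan(v) else v for v in values]

print(imputed)  # [3.0, 5.0, 7.0, 5.0, 5.0]
```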
K
- Kernel Support Vector Machines (KSVMs): A type of SVM that uses kernel functions to handle non-linear data.
- K-Fold Cross Validation: A technique for evaluating model performance by splitting data into k subsets and, for each subset in turn, training on the other k-1 subsets while validating on the held-out one (see the sketch after this list).
- K-Means Clustering: An unsupervised learning algorithm used to group data into clusters.
- Kernel: A function used in machine learning to transform data into a higher-dimensional space (e.g., in SVMs).
- Knowledge Distillation: Transferring knowledge from a large model to a smaller one to improve efficiency.
- Knowledge Graph: A structured representation of knowledge that maps relationships between entities.
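To make the K-Fold Cross Validation entry concrete, here is a minimal sketch that splits a dataset of example indices into k folds and prints the train/validation split for each round; real code would shuffle first and call an actual training routine on each split.

```python
def k_fold_splits(num_examples, k):
    # Yield (train_indices, validation_indices) pairs, one per fold.
    indices = list(range(num_examples))
    fold_size = num_examples // k
    for fold in range(k):
        val = indices[fold * fold_size:(fold + 1) * fold_size]
        train = [i for i in indices if i not in val]
        yield train, val

for train_idx, val_idx in k_fold_splits(num_examples=6, k=3):
    print("train:", train_idx, "validate:", val_idx)
# train: [2, 3, 4, 5] validate: [0, 1]
# train: [0, 1, 4, 5] validate: [2, 3]
# train: [0, 1, 2, 3] validate: [4, 5]
```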
L
- L0 Regularization: A regularization technique that penalizes the number of non-zero parameters in a model.
- L1 Loss: A loss function that measures the absolute difference between predicted and actual values (a worked example follows this list).
- Label Encoding: Converting categorical labels into numerical values for machine learning.
- Latent Space: A compressed representation of data learned by a model.
- Learning Rate: A hyperparameter that controls the step size during model optimization.
- Long Short-Term Memory (LSTM): A type of recurrent neural network (RNN) used for sequence modeling.
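For the L1 Loss entry above, here is a tiny worked example on made-up predictions; the same code with squared differences would give mean squared error (L2) instead.

```python
predictions = [2.5, 0.0, 2.0]
targets     = [3.0, -0.5, 2.0]

# L1 loss: mean absolute difference between predictions and targets.
l1 = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
print(l1)  # (0.5 + 0.5 + 0.0) / 3 ≈ 0.333
```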
M
- Machine Learning (ML): A subset of AI that enables systems to learn from data and improve without explicit programming.
- Machine Translation: The use of AI to automatically translate text from one language to another.
- Majority Class: The most frequent class in a classification problem.
- Markov Decision Process (MDP): A mathematical framework for modeling decision-making in reinforcement learning.
- Markov Property: The property that the future state depends only on the current state, not on the sequence of preceding states.
- Mean Absolute Error (MAE): A metric that measures the average absolute difference between predicted and actual values.
- Meta-Learning: A technique where models learn how to learn, often used in few-shot learning.
- Mixture of Experts (MoE): A model architecture that combines specialized sub-models for different tasks.
- Model Drift: The degradation of model performance over time due to changes in data distribution.
- Model Interpretability: The ability to understand and explain how a model makes decisions.
- Multi-Agent System: A system where multiple AI agents interact to achieve a goal.
N
- NaN Trap: A situation where NaN (Not a Number) values propagate through a model, causing errors.
- Naive Bayes: A probabilistic classifier based on Bayes’ theorem, often used for text classification.
- Natural Language Processing (NLP): A field of AI focused on enabling machines to understand and generate human language.
- Natural Language Understanding (NLU): A subfield of NLP focused on understanding the meaning of text.
- Neural Architecture Search (NAS): Automating the design of neural network architectures.
- Neural Network: A computational model inspired by the human brain, consisting of interconnected layers of nodes.
- N-Gram: A contiguous sequence of n items (e.g., words or characters) in a text; see the sketch after this list.
- Node (TensorFlow Graph): A unit of computation in a TensorFlow computational graph.
- Normalization Layer: A layer in neural networks that standardizes inputs (e.g., BatchNorm, LayerNorm).
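As an illustration of the N-Gram and Bigram entries, here is a minimal sketch that extracts word n-grams from a sentence; the sentence is arbitrary.

```python
def ngrams(text, n):
    # Slide a window of length n over the word sequence.
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("the cat sat on the mat", 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
```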
O
- Object Detection: A computer vision task that identifies and locates objects within an image.
- One-Hot Encoding: Representing categorical data as binary vectors with a single 1 marking the active category (see the sketch after this list).
- Ontology: A formal representation of knowledge in a domain, often used in AI systems.
- Optimization Algorithms: Techniques used to minimize or maximize a function (e.g., Gradient Descent, Adam).
- Overfitting: A modeling error where a machine learning model performs well on training data but poorly on new data.
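To illustrate the One-Hot Encoding entry above, here is a minimal pure-Python sketch with an invented three-category example; libraries such as scikit-learn and Pandas (`pd.get_dummies`) provide equivalent utilities.

```python
categories = ["red", "green", "blue"]
index = {category: i for i, category in enumerate(categories)}

def one_hot(category):
    # Binary vector with a single 1 at the category's position.
    vector = [0] * len(categories)
    vector[index[category]] = 1
    return vector

print(one_hot("green"))  # [0, 1, 0]
print(one_hot("blue"))   # [0, 0, 1]
```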
P
- Perceptron: The simplest type of neural network: a single layer of weights followed by a threshold activation (see the sketch after this list).
- Precision and Recall: Metrics used to evaluate classification models; precision measures accuracy of positive predictions, while recall measures the fraction of positives correctly identified.
- Pre-trained Model: A model trained on a large dataset and fine-tuned for specific tasks.
- Probabilistic Graphical Model (PGM): A model that represents probabilistic relationships between variables.
- Prompt Engineering: The process of designing effective inputs (prompts) to guide AI models’ outputs.
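For the Perceptron entry above, here is a minimal sketch of a single perceptron making a prediction and applying one learning update on a misclassified example; the weights, inputs, and learning rate are illustrative.

```python
def predict(weights, bias, inputs):
    # Weighted sum followed by a step (threshold) activation.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

weights, bias, lr = [0.0, 0.0], 0.0, 0.1
inputs, label = [1.0, 2.0], 1

prediction = predict(weights, bias, inputs)   # 0: misclassified
error = label - prediction                    # 1
weights = [w + lr * error * x for w, x in zip(weights, inputs)]
bias += lr * error

print(weights, bias)                          # [0.1, 0.2] 0.1
print(predict(weights, bias, inputs))         # 1: now classified correctly
```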
Q
- Quantum AI: The application of quantum computing to AI tasks for improved efficiency and capabilities.
- Quantization: Reducing the precision of model parameters to improve efficiency (e.g., for edge devices).
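As a rough sketch of the Quantization entry, here is a minimal example of symmetric 8-bit quantization of a float array with NumPy; the weights are made up, and production schemes (such as those in TensorFlow Lite) are more sophisticated.

```python
import numpy as np

weights = np.array([-0.8, -0.1, 0.0, 0.4, 0.95], dtype=np.float32)

# Symmetric int8 quantization: map the largest magnitude to 127.
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

print(quantized)    # e.g. [-107  -13    0   53  127]
print(dequantized)  # close to the original values, with small rounding error
```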
R
- Recurrent Neural Network (RNN): A neural network designed for sequential data, such as time series or text.
- Regression Analysis: A statistical method for modeling the relationship between a dependent variable and one or more predictors (a least-squares sketch follows this list).
- Reinforcement Learning with Human Feedback (RLHF): Training models using feedback from humans to improve alignment.
- Residual Network (ResNet): A deep neural network architecture with skip connections to improve training.
- Robotics: The field of designing and programming robots, often incorporating AI.
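To make the Regression Analysis entry concrete, here is a minimal least-squares linear regression fit with NumPy; the data is synthetic and purely illustrative.

```python
import numpy as np

# Synthetic data that roughly follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Least-squares fit of slope and intercept using the design matrix [x, 1].
design = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)

print(round(slope, 2), round(intercept, 2))  # close to 2 and 1
```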
S
- Self-Supervised Learning: A learning approach where models generate their own labels from unlabeled data.
- Sequence-to-Sequence (Seq2Seq): A model architecture for tasks like machine translation.
- Swarm Intelligence: Collective behavior of decentralized systems inspired by nature (e.g., ant colonies).
- Synthetic Data: Artificially generated data used to train AI models when real data is scarce or sensitive.
T
- TensorFlow/Keras: Popular frameworks for building and training machine learning and deep learning models.
- Transformer: A neural network architecture based on self-attention, widely used in NLP (e.g., GPT, BERT).
- Transfer Learning: Reusing a pre-trained model for a new task.
- Tree-Based Models: Models that use decision trees for predictions (e.g., Random Forest, Gradient Boosting).
- Triplet Loss: A loss function used in tasks like face recognition to learn embeddings.
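For the Triplet Loss entry, here is a minimal NumPy sketch of the standard formulation: it encourages an anchor embedding to be closer to a positive (same identity) than to a negative (different identity) by at least a margin. The embeddings and margin here are invented for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Loss is zero once the positive is closer than the negative by `margin`.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

anchor   = np.array([0.0, 1.0])
positive = np.array([0.1, 0.9])   # same identity: should be nearby
negative = np.array([1.0, 0.0])   # different identity: should be far away

print(round(triplet_loss(anchor, positive, negative), 3))  # 0.0 (constraint already satisfied)
```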
U
- Uncertainty Quantification: Measuring the uncertainty in model predictions.
- Universal Approximation Theorem: A theorem stating that a feedforward neural network with enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy.
- Unsupervised Learning: A machine learning approach where the model is trained on unlabeled data to find patterns.
V
- Variational Autoencoder (VAE): A generative model that learns latent representations of data.
- Vision-Language Model: A model that processes both visual and textual data (e.g., CLIP).
W
- Weak AI (Narrow AI): AI designed for specific tasks, as opposed to strong AI, which aims for general intelligence.
- Weight Initialization: Setting initial values for model weights to improve training.
- Word2Vec: A technique for learning word embeddings from text data.
X
- XAI (Explainable AI): AI systems designed to provide transparent and understandable explanations for their decisions.
- XGBoost: A scalable and efficient implementation of gradient boosting for supervised learning.
Y
- Yield Management: The use of AI and data analysis to optimize pricing and inventory decisions.
- YOLO (You Only Look Once): A real-time object detection algorithm.
Z
- Zero-Shot Learning: A model’s ability to perform tasks it was not explicitly trained on.
- Z-Score Normalization: Scaling data to have a mean of 0 and a standard deviation of 1.
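Finally, to illustrate the Z-Score Normalization entry, here is a tiny worked example with NumPy on made-up values.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Z-score: subtract the mean and divide by the standard deviation.
z_scores = (values - values.mean()) / values.std()

print(z_scores)                          # approximately [-1.414, -0.707, 0, 0.707, 1.414]
print(z_scores.mean(), z_scores.std())   # ~0.0 and ~1.0
```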