A to Z Machine Learning Glossary

Here is a comprehensive A-to-Z glossary of key machine learning terms and their definitions, covering foundational and advanced concepts from across the field. Terms related to TensorFlow and large language models are explained in plain, easy-to-understand language.

A

  • Ablation: The process of removing components of a model to study their impact on performance.
  • A/B Testing: A statistical method to compare two versions of a system to determine which performs better.
  • Accelerator Chip: Specialized hardware (e.g., GPUs, TPUs) designed to speed up machine learning computations.
  • Accuracy: The proportion of correct predictions made by a model.
  • Activation Function: A function in neural networks that determines the output of a node (e.g., ReLU, Sigmoid); see the sketch after this list.
  • AdaGrad: An optimization algorithm that adapts the learning rate for each parameter.
  • Adversarial Attack: Deliberate manipulation of input data to deceive an AI model.
  • Adversarial Training: Training models with adversarial examples to improve robustness.
  • AR (Augmented Reality): Technology that overlays digital information onto the physical world, often enhanced by AI.
  • Artificial General Intelligence (AGI): AI with human-like cognitive abilities, capable of performing any intellectual task.
  • Attention Mechanism: A component in neural networks that focuses on specific parts of input data (e.g., in transformers).
  • Autoencoder: A neural network used for unsupervised learning, often for dimensionality reduction or feature learning.
  • AutoML: Automated Machine Learning; automating the end-to-end application of machine learning to real-world problems.
  • AUC (Area Under the ROC Curve): A metric that measures the performance of a classification model across all thresholds.
  • Axis-Aligned Condition: A condition in decision trees where splits are made parallel to the feature axes.
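
As a quick illustration of activation functions, here is a minimal NumPy sketch of ReLU and Sigmoid; the input values are made up for demonstration.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and zeroes out negatives
    return np.maximum(0, x)

def sigmoid(x):
    # Sigmoid squashes any real value into the range (0, 1)
    return 1 / (1 + np.exp(-x))

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])  # hypothetical pre-activation values
print(relu(z))     # [0.  0.  0.  1.5 3. ]
print(sigmoid(z))  # values between 0 and 1
```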

B

  • Backpropagation: A technique used in training neural networks to adjust weights by calculating the gradient of the loss function.
  • Bagging (Bootstrap Aggregating): An ensemble technique that combines multiple models to reduce variance.
  • Bag of Words: A text representation method where text is represented as a collection of word counts, ignoring grammar and word order (see the sketch after this list).
  • Baseline: A simple model or heuristic used as a reference point for evaluating more complex models.
  • Batch Normalization: A technique to normalize the inputs of each layer in a neural network to improve training stability.
  • Bayesian Neural Network: A neural network that incorporates Bayesian inference to model uncertainty.
  • Bayesian Optimization: A technique for optimizing expensive-to-evaluate functions using probabilistic models.
  • Bellman Equation: A recursive equation used in reinforcement learning to compute the value of a state.
  • Bias (Ethics/Fairness): Systematic errors in AI models that lead to unfair or inaccurate outcomes, often due to flawed training data.
  • Bidirectional Language Model: A language model that processes text in both forward and backward directions.
  • Bigram: A pair of consecutive words in a text.
  • Binary Classification: A classification task with two possible outcomes.
  • BLEU (Bilingual Evaluation Understudy): A metric for evaluating the quality of machine-translated text.
  • BLEURT (Bilingual Evaluation Understudy with Representations from Transformers): A learned successor to BLEU that scores generated text using transformer-based models.
  • Boosting: An ensemble technique that builds models sequentially to correct errors from previous models (e.g., AdaBoost, Gradient Boosting).
  • Bounding Box: A rectangular box used in object detection to localize objects in an image.
  • Broadcasting: A technique in numerical computing to perform operations on arrays of different shapes.
  • Bucketing: A technique for grouping continuous data into discrete bins.
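
To make the Bag of Words idea concrete, here is a minimal sketch using scikit-learn's CountVectorizer; the two-sentence corpus is purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",   # hypothetical documents
    "the dog sat on the log",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

# Each row is a document, each column a word count; order and grammar are ignored
print(vectorizer.get_feature_names_out())
print(X.toarray())
```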

C

  • Calibration Layer: A layer in a neural network that adjusts the output probabilities to better match the true distribution.
  • Candidate Generation: The process of generating potential candidates for a recommendation system.
  • Candidate Sampling: A technique to reduce computational cost by sampling a subset of candidates for training.
  • Categorical Data: Data that represents categories or labels rather than numerical values.
  • Centroid: The center point of a cluster in clustering algorithms.
  • Centroid-Based Clustering: A clustering method that groups data points based on their distance to centroids (e.g., K-Means).
  • Chain-of-Thought Prompting: A technique to improve reasoning in language models by prompting them to generate intermediate steps.
  • Checkpoint: A saved state of a model during training, allowing for resuming training or inference.
  • Cloud TPU: Google’s Tensor Processing Unit, a hardware accelerator for machine learning workloads, offered through Google Cloud.
  • Clustering: An unsupervised learning technique that groups similar data points together (e.g., K-Means, DBSCAN).
  • Convolutional Neural Network (CNN): A deep learning model commonly used for image and video analysis.
  • Cross-Validation: A technique for evaluating model performance by splitting data into multiple subsets (see the sketch after this list).
  • Curriculum Learning: Training models on easier tasks first before progressing to harder ones.
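
The sketch below shows cross-validation with scikit-learn; the dataset and classifier (Iris, logistic regression) are illustrative choices, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: train on 4 folds, evaluate on the held-out fold, repeat
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```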

D

  • Data Augmentation: Techniques to increase the diversity of training data (e.g., flipping images, adding noise).
  • DataFrame: A tabular data structure used in data analysis (e.g., in Pandas).
  • Data Parallelism: A technique to distribute data across multiple devices for parallel processing.
  • Dataset API (tf.data): A TensorFlow API for building efficient data pipelines.
  • Decision Forest: An ensemble of decision trees used for classification or regression.
  • Deep Q-Network (DQN): A reinforcement learning algorithm that uses a deep neural network to approximate the Q-function.
  • Deep Neural Network (DNN): A neural network with multiple layers, capable of learning complex patterns.
  • Demographic Parity: A fairness metric that requires predictions to be independent of protected attributes.
  • Denoising: The process of removing noise from data, often used in autoencoders.
  • Dimensionality Reduction: Reducing the number of features in a dataset (e.g., PCA, t-SNE).
  • Dropout: A regularization technique to prevent overfitting in neural networks by randomly deactivating nodes during training (see the sketch after this list).
  • Dynamic Programming: A method used in reinforcement learning to solve complex problems by breaking them into simpler subproblems.
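
Dropout is easiest to see in a model definition. Below is a minimal Keras sketch; the layer sizes, input shape, and 0.5 dropout rate are arbitrary, not a tuned architecture.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dropout(0.5),   # randomly zeroes 50% of activations during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```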

E

  • Eager Execution: A mode in TensorFlow where operations are executed immediately, rather than building a computational graph.
  • Earth Mover’s Distance (EMD): A measure of the distance between two probability distributions.
  • Edit Distance: A measure of the similarity between two strings, based on the number of edits required to transform one into the other.
  • Einsum Notation: A compact notation for expressing tensor operations.
  • Embedding Layer: A layer in neural networks that converts categorical data into dense vectors.
  • Empirical Cumulative Distribution Function (eCDF or EDF): A function that estimates the cumulative distribution of a dataset.
  • Empirical Risk Minimization (ERM): A principle in machine learning to minimize the error on the training data.
  • Entropy: A measure of uncertainty or randomness, often used in decision trees and information theory (see the sketch below).
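
As a worked example of entropy, this sketch computes Shannon entropy for a few made-up class distributions with NumPy.

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits: H = -sum(p * log2(p)), ignoring zero probabilities
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))   # 1.0 bit, maximum uncertainty for two classes
print(entropy([0.9, 0.1]))   # ~0.47 bits, much less uncertainty
print(entropy([1.0, 0.0]))   # 0.0, no uncertainty
```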

F

  • Factuality: The degree to which a model’s outputs are factually correct.
  • Fairness Constraint: A constraint applied to a model to ensure fairness in predictions.
  • Fairness Metric: A metric used to evaluate the fairness of a model (e.g., demographic parity, equalized odds).
  • False Negative (FN): A case where the model predicts the negative class for an example that is actually positive.
  • False Negative Rate: The proportion of actual positives incorrectly predicted as negatives.
  • False Positive (FP): A case where the model predicts the positive class for an example that is actually negative.
  • False Positive Rate (FPR): The proportion of actual negatives incorrectly predicted as positives.
  • Feature Engineering: The process of selecting, transforming, and creating features to improve model performance.
  • Federated Learning: A decentralized approach to training AI models across multiple devices without sharing raw data.
  • Few-Shot Learning: Training models to perform tasks with very few labeled examples.
  • Fine-Tuning: Adapting a pre-trained model to a specific task by training it further on a smaller dataset.
  • F1 Score: The harmonic mean of precision and recall, often used for classification tasks (see the sketch below).
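
The sketch below computes several of the metrics above (precision, recall, F1, false positive rate) with scikit-learn; the labels and predictions are hypothetical.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
print("FPR:      ", fp / (fp + tn))   # false positive rate
```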

G

  • Gradient Boosted (Decision) Trees (GBT): An ensemble technique that builds decision trees sequentially to correct errors from previous models.
  • GAN (Generative Adversarial Network): A framework where two neural networks (generator and discriminator) compete to generate realistic data.
  • Gradient Boosting: An ensemble technique that builds models sequentially to correct errors from previous models.
  • Gradient Descent: An optimization algorithm used to minimize the loss function in machine learning (see the sketch after this list).
  • Graph Convolutional Network (GCN): A neural network designed for graph-structured data.
  • Graph Neural Network (GNN): A type of neural network that operates on graph data.
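
A minimal sketch of gradient descent, minimizing the toy function f(w) = (w - 3)^2 in plain Python; the starting point, learning rate, and iteration count are arbitrary.

```python
# Minimize f(w) = (w - 3)^2; its gradient is f'(w) = 2 * (w - 3)
w = 0.0              # arbitrary starting point
learning_rate = 0.1  # hypothetical step size

for step in range(50):
    grad = 2 * (w - 3)          # gradient of the loss at the current w
    w -= learning_rate * grad   # step in the direction of steepest descent

print(w)  # converges toward the minimum at w = 3
```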

H

  • Hallucination: When a model generates outputs that are not grounded in the input data (e.g., false information).
  • Hashing: A technique to map data of arbitrary size to fixed-size values.
  • Heuristic: A rule-of-thumb or shortcut used to solve problems more efficiently.
  • Hierarchical Clustering: A clustering method that builds a hierarchy of clusters.
  • Hill Climbing: An optimization technique that iteratively improves a solution by making small changes.
  • Hinge Loss: A loss function used in support vector machines for classification tasks.
  • Human-in-the-Loop (HITL): A system where humans are involved in training, validating, or improving AI models.
  • Hyperparameter Tuning: The process of optimizing hyperparameters to improve model performance (e.g., grid search, random search); see the sketch below.
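
A short sketch of hyperparameter tuning with scikit-learn's GridSearchCV; the parameter grid and the SVM classifier are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try every combination of these hyperparameter values with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```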

I

  • Image Segmentation: Dividing an image into regions to identify objects or boundaries.
  • Imputation: Techniques for handling missing data in datasets, e.g., by filling in missing values (see the sketch after this list).
  • Instance-Based Learning: A learning approach where predictions are made based on similar instances in the training data (e.g., KNN).
  • Inverse Reinforcement Learning: Inferring the reward function from observed behavior.
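
A minimal imputation sketch using scikit-learn's SimpleImputer to fill missing values with the column mean; the tiny array is made up.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])   # hypothetical data with missing entries

imputer = SimpleImputer(strategy="mean")  # replace NaNs with each column's mean
print(imputer.fit_transform(X))
```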

K

  • Kernel Support Vector Machines (KSVMs): A type of SVM that uses kernel functions to handle non-linear data.
  • K-Fold Cross Validation: A technique for evaluating model performance by splitting data into k subsets and training on k-1 subsets while validating on the remaining subset.
  • K-Means Clustering: An unsupervised learning algorithm used to group data into clusters around centroids (see the sketch after this list).
  • Kernel: A function used in machine learning to transform data into a higher-dimensional space (e.g., in SVMs).
  • Knowledge Distillation: Transferring knowledge from a large model to a smaller one to improve efficiency.
  • Knowledge Graph: A structured representation of knowledge that maps relationships between entities.
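
A short K-Means sketch with scikit-learn; the 2-D points and the choice of two clusters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [1, 0.5],   # hypothetical points near one centroid
              [8, 8], [8.5, 9], [9, 8]])    # and another group far away

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # the learned centroids
```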

L

  • L0 Regularization: A regularization technique that penalizes the number of non-zero parameters in a model.
  • L1 Loss: A loss function that measures the absolute difference between predicted and actual values.
  • Label Encoding: Converting categorical labels into numerical values for machine learning (see the sketch after this list).
  • Latent Space: A compressed representation of data learned by a model.
  • Learning Rate: A hyperparameter that controls the step size during model optimization.
  • Long Short-Term Memory (LSTM): A type of recurrent neural network (RNN) used for sequence modeling.
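
A quick sketch of label encoding with scikit-learn; the category names are made up.

```python
from sklearn.preprocessing import LabelEncoder

colors = ["red", "green", "blue", "green", "red"]  # hypothetical categorical labels
encoder = LabelEncoder()
encoded = encoder.fit_transform(colors)

print(encoded)           # [2 1 0 1 2] (classes are sorted alphabetically)
print(encoder.classes_)  # mapping back to the original labels
```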

M

  • Machine Learning (ML): A subset of AI that enables systems to learn from data and improve without explicit programming.
  • Machine Translation: The use of AI to automatically translate text from one language to another.
  • Majority Class: The most frequent class in a classification problem.
  • Markov Decision Process (MDP): A mathematical framework for modeling decision-making in reinforcement learning.
  • Markov Property: The property that the future state depends only on the current state, not on the sequence of preceding states.
  • Mean Absolute Error (MAE): A metric that measures the average absolute difference between predicted and actual values (see the sketch after this list).
  • Meta-Learning: A technique where models learn how to learn, often used in few-shot learning.
  • Mixture of Experts (MoE): A model architecture that combines specialized sub-models for different tasks.
  • Model Drift: The degradation of model performance over time due to changes in data distribution.
  • Model Interpretability: The ability to understand and explain how a model makes decisions.
  • Multi-Agent System: A system where multiple AI agents interact to achieve a goal.
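
As a worked example of mean absolute error, the sketch below compares hypothetical predictions against hypothetical actual values with NumPy.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # hypothetical predictions

mae = np.mean(np.abs(y_true - y_pred))    # average absolute difference
print(mae)  # 0.75
```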

N

  • NaN Trap: A situation where NaN (Not a Number) values propagate through a model, causing errors.
  • Naive Bayes: A probabilistic classifier based on Bayes’ theorem, often used for text classification.
  • Natural Language Processing (NLP): A field of AI focused on enabling machines to understand and generate human language.
  • Natural Language Understanding (NLU): A subfield of NLP focused on understanding the meaning of text.
  • Neural Architecture Search (NAS): Automating the design of neural network architectures.
  • Neural Network: A computational model inspired by the human brain, consisting of interconnected layers of nodes.
  • N-Gram: A contiguous sequence of n items (e.g., words or characters) in a text (see the sketch after this list).
  • Node (TensorFlow Graph): A unit of computation in a TensorFlow computational graph.
  • Normalization Layer: A layer in neural networks that standardizes inputs (e.g., BatchNorm, LayerNorm).
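
A small sketch that extracts bigrams (2-grams) from a sentence with plain Python; the sentence is illustrative.

```python
def ngrams(tokens, n):
    # Slide a window of length n across the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the quick brown fox jumps".split()  # hypothetical sentence
print(ngrams(tokens, 2))  # bigrams: ('the', 'quick'), ('quick', 'brown'), ...
```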

O

  • Object Detection: A computer vision task that identifies and locates objects within an image.
  • One-Hot Encoding: Representing categorical data as binary vectors (see the sketch after this list).
  • Ontology: A formal representation of knowledge in a domain, often used in AI systems.
  • Optimization Algorithms: Techniques used to minimize or maximize a function (e.g., Gradient Descent, Adam).
  • Overfitting: A modeling error where a machine learning model performs well on training data but poorly on new data.
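
A minimal one-hot encoding sketch with pandas; the category values are made up.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})  # hypothetical column
print(pd.get_dummies(df, columns=["color"]))
# Each category becomes its own binary column: color_blue, color_green, color_red
```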

P

  • Perceptron: The simplest type of neural network, consisting of a single layer of weights feeding a threshold activation (see the sketch after this list).
  • Precision and Recall: Metrics used to evaluate classification models; precision measures accuracy of positive predictions, while recall measures the fraction of positives correctly identified.
  • Pre-trained Model: A model trained on a large dataset and fine-tuned for specific tasks.
  • Probabilistic Graphical Model (PGM): A model that represents probabilistic relationships between variables.
  • Prompt Engineering: The process of designing effective inputs (prompts) to guide AI models’ outputs.
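
A minimal perceptron sketch in NumPy, trained on the logical AND function; the learning rate and epoch count are arbitrary.

```python
import numpy as np

# Inputs and targets for the AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate

for _ in range(20):                                  # a few passes over the data
    for xi, target in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0     # threshold activation
        w += lr * (target - pred) * xi               # perceptron update rule
        b += lr * (target - pred)

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```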

Q

  • Quantum AI: The application of quantum computing to AI tasks for improved efficiency and capabilities.
  • Quantization: Reducing the precision of model parameters to improve efficiency (e.g., for edge devices).

R

  • Recurrent Neural Network (RNN): A neural network designed for sequential data, such as time series or text.
  • Regression Analysis: A statistical method for modeling the relationship between variables.
  • Reinforcement Learning with Human Feedback (RLHF): Training models using feedback from humans to improve alignment.
  • Residual Network (ResNet): A deep neural network architecture with skip connections to improve training.
  • Robotics: The field of designing and programming robots, often incorporating AI.

S

  • Self-Supervised Learning: A learning approach where models generate their own labels from unlabeled data.
  • Sequence-to-Sequence (Seq2Seq): A model architecture for tasks like machine translation.
  • Swarm Intelligence: Collective behavior of decentralized systems inspired by nature (e.g., ant colonies).
  • Synthetic Data: Artificially generated data used to train AI models when real data is scarce or sensitive.

T

  • TensorFlow/Keras: Popular frameworks for building and training machine learning and deep learning models.
  • Transformer: A neural network architecture based on self-attention, widely used in NLP (e.g., GPT, BERT).
  • Transfer Learning: Reusing a pre-trained model for a new task (see the sketch after this list).
  • Tree-Based Models: Models that use decision trees for predictions (e.g., Random Forest, Gradient Boosting).
  • Triplet Loss: A loss function used in tasks like face recognition to learn embeddings.
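
A rough sketch of transfer learning with Keras: load a pre-trained image backbone, freeze it, and add a new classification head. The backbone choice, input size, and five-class head are placeholders.

```python
import tensorflow as tf

# Pre-trained backbone (ImageNet weights), without its original classification head
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained weights

# New head for a hypothetical 5-class task
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # train only the new head on your own dataset
```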

U

  • Uncertainty Quantification: Measuring the uncertainty in model predictions.
  • Universal Approximation Theorem: A theorem stating that neural networks can approximate any function given sufficient capacity.
  • Unsupervised Learning: A machine learning approach where the model is trained on unlabeled data to find patterns.

V

  • Variational Autoencoder (VAE): A generative model that learns latent representations of data.
  • Vision-Language Model: A model that processes both visual and textual data (e.g., CLIP).

W

  • Weak AI (Narrow AI): AI designed for specific tasks, as opposed to strong AI, which aims for general intelligence.
  • Weight Initialization: Setting initial values for model weights to improve training.
  • Word2Vec: A technique for learning word embeddings from text data (see the sketch below).
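
A small Word2Vec sketch using the gensim library; the toy corpus, vector size, and epoch count are for illustration only, as real training needs far more text.

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],   # hypothetical tokenized corpus
    ["the", "dog", "sat", "on", "the", "log"],
]
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["cat"][:5])            # first few dimensions of the 'cat' embedding
print(model.wv.most_similar("cat"))   # nearest words in the learned vector space
```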

X

  • XAI (Explainable AI): AI systems designed to provide transparent and understandable explanations for their decisions.
  • XGBoost: A scalable and efficient implementation of gradient boosting for supervised learning (see the sketch below).
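
A brief XGBoost classification sketch; the dataset (breast cancer from scikit-learn) and hyperparameters are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on the held-out test set
```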

Y

  • Yield Management: The use of AI and data analysis to optimize pricing and inventory decisions.
  • YOLO (You Only Look Once): A real-time object detection algorithm.

Z

  • Zero-Shot Learning: A model’s ability to perform tasks it was not explicitly trained on.
  • Z-Score Normalization: Scaling data to have a mean of 0 and a standard deviation of 1 (see the sketch below).
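
Finally, a short sketch of z-score normalization with NumPy; the feature values are made up.

```python
import numpy as np

x = np.array([10.0, 12.0, 14.0, 16.0, 18.0])   # hypothetical feature values
z = (x - x.mean()) / x.std()                   # subtract the mean, divide by std dev

print(z)                  # centered around 0
print(z.mean(), z.std())  # ~0.0 and 1.0
```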
