Artificial Intelligence Glossary

A collection of artificial intelligence (AI) terms, jargon, and definitions that you and your team should be aware of.

  • Accuracy

    Accuracy refers to the closeness of a measured value to a standard or known value. In the context of artificial intelligence, it often denotes the degree of correctness of a model's predictions compared to the actual outcomes.
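
    As a minimal illustration in plain Python (no particular library assumed), accuracy is simply the fraction of predictions that match the true labels:

      def accuracy(y_true, y_pred):
          """Fraction of predictions that exactly match the true labels."""
          correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
          return correct / len(y_true)

      print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.75 (3 of 4 correct)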

  • Actionable Intelligence

    Actionable intelligence is information that can be acted upon or used to make informed decisions. In AI, it refers to insights extracted from data that are relevant and valuable for decision-making or problem-solving.

  • Activation Function

    An activation function is a mathematical operation applied to the output of a neural network node. It introduces non-linearity to the model, enabling it to learn complex patterns and make nonlinear transformations of the input data.
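
    For illustration only, two widely used activation functions, ReLU and the sigmoid, can be sketched as plain Python functions:

      import math

      def relu(x):
          """Rectified linear unit: keeps positive values, zeroes out negatives."""
          return max(0.0, x)

      def sigmoid(x):
          """Squashes any real number into the range (0, 1)."""
          return 1.0 / (1.0 + math.exp(-x))

      print(relu(-2.0), relu(3.0))   # 0.0 3.0
      print(sigmoid(0.0))            # 0.5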

  • Activation Gradient

    Activation gradient refers to the derivative of the activation function with respect to its input. It is crucial for training neural networks, because backpropagation multiplies these local derivatives along the chain rule when applying gradient descent.

  • Adversarial Examples

    Adversarial examples are inputs to machine learning models that are intentionally designed to cause the model to make mistakes. These inputs are often slightly perturbed versions of legitimate data, crafted to exploit vulnerabilities in the model's decision boundary.

  • Anaphora

    Anaphora refers to the linguistic phenomenon where a word or phrase refers back to another word or phrase mentioned earlier in the text. In natural language processing, handling anaphora is important for tasks such as coreference resolution.

  • Annotation

    Annotation involves the labeling or tagging of data to provide additional information or context. In AI, annotation is commonly used in supervised learning tasks to create labeled datasets for training machine learning models.

  • Artificial Intelligence (AI)

    Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning, reasoning, problem-solving, perception, and language understanding.

  • Artificial Neural Network (ANN)

    An artificial neural network (ANN) is a computational model inspired by the structure and functioning of biological neural networks. It consists of interconnected nodes (neurons) organized in layers, capable of learning and performing tasks such as classification and regression.

  • Auto-classification

    Auto-classification is the process of automatically categorizing or labeling data based on its content or characteristics. It often involves the use of machine learning algorithms to classify data into predefined categories or clusters.

  • Auto-complete

    Auto-complete is a feature commonly found in text editing software and search engines that predicts and suggests completions for the current input based on previously entered text or patterns. In AI, it can be implemented using various techniques such as language models and recommendation systems.

  • Bagging

    Short for bootstrap aggregating: an ensemble technique that trains multiple models on random bootstrap samples of the training data and combines their predictions (typically by voting or averaging) to reduce variance.
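
    A minimal sketch of the idea, assuming scikit-learn decision trees as base learners and an arbitrary toy dataset (the 10-tree ensemble size is chosen only for illustration):

      import numpy as np
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 2))
      y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy labels

      models = []
      for _ in range(10):                              # 10 bootstrap replicas
          idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
          models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

      votes = np.mean([m.predict(X) for m in models], axis=0)
      ensemble_prediction = (votes >= 0.5).astype(int) # majority vote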

  • BERT

    Bidirectional Encoder Representations from Transformers, a pre-trained natural language processing model.

  • Bias-Variance Tradeoff

    The balance between bias and variance in machine learning models to achieve optimal performance.

  • Big Data

    Datasets, both structured and unstructured, that are too large or too complex for traditional data-processing tools to store and analyze effectively.

  • Cataphora

    A linguistic concept where a pronoun refers to a later noun or phrase.

  • Categorization

    The process of organizing items into categories based on their similarities or differences.

  • Category

    A group or class of things having some common characteristics or attributes.

  • Category Trees

    Hierarchical structures that organize categories into parent-child relationships.

  • Classification

    The process of categorizing data points into classes or categories.

  • Clustering

    The process of grouping similar data points together.

  • Cognitive Map

    A mental representation of physical and spatial information.

  • Composite AI

    The integration of multiple AI technologies to create more advanced systems.

  • Computational Linguistics

    The interdisciplinary field dealing with the statistical and rule-based modeling of natural language from a computational perspective.

  • Computational Semantics

    The branch of computational linguistics and artificial intelligence that focuses on the meaning of words and sentences in a language.

  • Content Enrichment or Enrichment

    The process of enhancing content by adding metadata, tags, or other contextual information.

  • Controlled Vocabulary

    A predefined list of terms used to tag or categorize content.

  • Conversational AI

    AI systems capable of understanding and generating human-like dialogue.

  • Convolution

    A mathematical operation used in neural networks for feature extraction.
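
    As a minimal sketch, a one-dimensional convolution slides a small kernel over an input and takes weighted sums; the 2-D filters in CNNs apply the same idea to images (NumPy, with an arbitrary smoothing kernel):

      import numpy as np

      signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
      kernel = np.array([0.25, 0.5, 0.25])   # small smoothing filter

      # "valid" mode keeps only positions where the kernel fully overlaps the signal
      print(np.convolve(signal, kernel, mode="valid"))   # [2. 3. 4.]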

  • Convolutional Neural Network (CNN)

    A type of artificial neural network designed for image recognition and processing.

  • Co-occurrence

    The frequency with which two items appear together in a dataset.

  • Corpus

    A large and structured set of texts in digital form, used to study language.

  • Data Augmentation

    The technique of artificially increasing the size of a dataset by creating modified versions of existing data.

  • Data Discovery

    The process of finding and identifying relevant datasets for analysis.

  • Data Drift

    The gradual change in the distribution or properties of data over time, leading to model deterioration.

  • Data Extraction

    The process of retrieving data from various sources and converting it into a usable format.

  • Data Ingestion

    The process of collecting, processing, and importing data into a system or database.

  • Data Labelling

    The process of manually annotating data with labels or tags to facilitate supervised learning.

  • Data Scarcity

    The situation where there is an insufficient amount of data available for analysis or training models.

  • Decision Tree

    A decision support tool that uses a tree-like graph of decisions and their possible consequences.

  • Deep Learning

    A subset of machine learning that utilizes artificial neural networks with multiple layers.

  • Did You Mean (DYM)

    A feature in search engines or text editing software that suggests corrections for misspelled words or phrases.

  • Disambiguation

    The process of resolving ambiguity in natural language or data.

  • Domain Knowledge

    Expertise or understanding of a particular subject area or industry.

  • Embedding

    A mathematical representation of a word, phrase, or document in a continuous vector space.
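
    For illustration, similarity between embeddings is often measured with cosine similarity; the 4-dimensional vectors below are made up purely for the example (real embeddings typically have hundreds of dimensions):

      import numpy as np

      def cosine_similarity(a, b):
          """Cosine of the angle between two vectors (1.0 = same direction)."""
          return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

      king  = np.array([0.8, 0.1, 0.4, 0.3])   # hypothetical toy embeddings
      queen = np.array([0.7, 0.2, 0.5, 0.3])
      print(cosine_similarity(king, queen))    # close to 1.0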

  • Emotion AI (Affective Computing)

    The branch of artificial intelligence that deals with recognizing, interpreting, processing, and simulating human emotions.

  • Ensemble Learning

    A machine learning technique that combines the predictions of multiple models to improve accuracy and robustness.

  • Entity

    An object or concept that is identifiable and distinct, often referenced in natural language processing.

  • Environmental, Social, and Governance (ESG)

    Criteria used by investors to evaluate a company's impact on society and the environment alongside its financial performance.

  • Ethics in AI

    The study of moral principles and guidelines that govern the development and use of artificial intelligence systems.

  • ETL (Extract, Transform, Load)

    Extract, Transform, Load: A process used in data integration to collect data from various sources, transform it into a usable format, and load it into a target database.

  • Explainable AI

    Artificial intelligence models and systems that can provide understandable explanations for their decisions and actions.

  • Extraction or Keyphrase Extraction

    The process of automatically identifying and extracting key phrases or important information from text documents.

  • Feature Engineering

    The process of selecting, transforming, or creating new features from raw data to improve model performance.

  • Federated Learning

    A machine learning approach where multiple decentralized devices collaboratively train a model while keeping data localized.

  • Fine-tuning

    The process of further training a pre-trained model on a specific dataset to improve its performance on a particular task.

  • F-score (F-measure, F1 measure)

    A measure of a test's accuracy that considers both precision and recall, calculated as the harmonic mean of precision and recall.
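
    A minimal sketch of the calculation in plain Python (the precision and recall values are arbitrary examples):

      def f1_score(precision, recall):
          """Harmonic mean of precision and recall."""
          if precision + recall == 0:
              return 0.0
          return 2 * precision * recall / (precision + recall)

      print(f1_score(0.8, 0.5))   # about 0.615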

  • Generative Adversarial Networks (GANs)

    A class of artificial intelligence algorithms used in unsupervised machine learning, composed of two neural networks: the generator and the discriminator.

  • Genetic Algorithms

    A search heuristic inspired by the process of natural selection, used to find optimal solutions to optimization and search problems.

  • Gradient Descent

    An optimization algorithm used to minimize the loss function by adjusting the parameters of a model in the direction of the steepest descent of the gradient.
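
    A minimal sketch using an arbitrary one-parameter example, f(w) = (w - 3)^2, whose gradient is 2(w - 3):

      w, learning_rate = 0.0, 0.1
      for _ in range(100):
          gradient = 2 * (w - 3)          # derivative of (w - 3)^2
          w -= learning_rate * gradient   # step against the gradient
      print(w)                            # converges toward the minimum at 3.0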

  • Hallucinations

    In the context of AI, hallucinations are outputs, most often from generative models, that are fluent and plausible but factually incorrect or unsupported by the model's input or training data.

  • Hierarchical Reinforcement Learning

    A reinforcement learning technique that decomposes complex tasks into a hierarchy of sub-tasks or sub-policies, enabling agents to learn and navigate complex environments more efficiently.

  • Hybrid AI

    The integration of multiple AI techniques or approaches, such as symbolic AI and machine learning, to solve complex problems.

  • Hyperparameters

    Parameters that define the structure and behavior of a machine learning model, typically set before the learning process begins.

  • Inference

    The process of drawing conclusions from data or models based on observed evidence or prior knowledge.

  • Inference Engine

    A component of artificial intelligence systems that applies logical rules to interpret and reason about data.

  • Insight Engines

    Systems that use artificial intelligence and natural language processing to discover insights and patterns within data.

  • Intelligent Document Extraction and Processing (IDEP)

    The automated extraction and processing of information from unstructured documents using artificial intelligence.

  • Intelligent Document Processing (IDP)

    The use of artificial intelligence technologies to automate the extraction, classification, and processing of information from documents.

  • Internet of Things (IoT)

    A network of interconnected devices embedded with sensors, software, and other technologies to exchange data and perform actions autonomously.

  • Kernel Methods

    A class of algorithms for pattern analysis and machine learning, based on defining and manipulating similarity functions in high-dimensional spaces.

  • K-nearest Neighbors (KNN)

    A non-parametric algorithm used for classification and regression tasks that relies on the similarity of data points in a feature space.
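
    A minimal NumPy sketch with a toy two-class dataset (k = 3 and the data points are chosen only for illustration):

      import numpy as np

      def knn_predict(X_train, y_train, x, k=3):
          """Label x by majority vote among its k nearest training points."""
          distances = np.linalg.norm(X_train - x, axis=1)
          nearest = np.argsort(distances)[:k]
          labels, counts = np.unique(y_train[nearest], return_counts=True)
          return labels[np.argmax(counts)]

      X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
      y_train = np.array([0, 0, 0, 1, 1, 1])
      print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))   # 1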

  • Knowledge Graph

    A graph-based knowledge representation that captures relationships between entities and their attributes in a semantic network.

  • Knowledge Model

    A formal representation of knowledge, typically structured in a way that is understandable by computers and used for reasoning and problem-solving.

  • Labelled Data

    Data that has been manually annotated with one or more labels, typically used for supervised machine learning tasks.

  • LangOps (Language Operations)

    The operationalization and management of language-related processes, tools, and workflows.

  • Language Data

    Data specifically related to language, including text corpora, speech recordings, and linguistic annotations.

  • Large Language Models (LLMs)

    Natural language processing models with very large numbers of parameters (typically billions), trained on massive text corpora and capable of generating human-like text.

  • Latent Variable

    A variable that is not directly observed but inferred from other variables, used in statistical models to represent hidden factors.

  • Lemma

    The base or dictionary form of a word, often used in natural language processing and linguistic analysis.

  • Lexicon

    A complete set of meaningful units in a language, including words, morphemes, and phrases, with associated semantic information.

  • Linked Data

    A method of publishing structured data so that it can be interlinked and become more useful through semantic queries.

  • Long Short-Term Memory (LSTM)

    A type of recurrent neural network architecture capable of learning long-term dependencies in sequential data.

  • Machine Learning (ML)

    A subset of artificial intelligence focused on the development of algorithms and statistical models that enable computers to learn and improve from experience.

  • Mean Squared Error (MSE)

    A measure of the average squared difference between predicted and actual values, commonly used to evaluate regression models.
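
    A minimal illustration in plain Python with arbitrary example values:

      def mean_squared_error(y_true, y_pred):
          """Average of the squared differences between targets and predictions."""
          return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

      print(mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))   # about 1.42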

  • Metadata

    Data that provides information about other data, such as the structure, format, or characteristics of a dataset.

  • Model

    A mathematical representation of a real-world process or system used to make predictions, classify data, or gain insights.

  • Model Compression

    The process of reducing the size of a machine learning model without significant loss in performance, often to enable deployment on resource-constrained devices.

  • Model Drift

    The phenomenon where the performance of a machine learning model deteriorates over time due to changes in the underlying data distribution.

  • Model Parameter

    A configuration variable or weight in a machine learning model that is learned from training data and used to make predictions.

  • Morphological Analysis

    The process of analyzing the structure and form of words in a language to understand their grammatical properties and relationships.

  • Natural Language Processing (NLP)

    A branch of artificial intelligence that focuses on the interaction between computers and humans through natural language.

  • Natural Language Understanding

    The ability of a computer program to comprehend and interpret human language in a meaningful way.

  • Neural Architecture Search

    The process of automatically finding the optimal architecture or configuration for a neural network.

  • NLG (Natural Language Generation)

    The process of producing human-like text or speech from structured data or pre-defined templates using artificial intelligence.

  • NLQ (Natural Language Query)

    The capability of querying databases or systems using natural language instead of traditional programming languages or query languages.

  • NLT (Natural Language Technology)

    Technology that enables computers to interact with users using natural language, including understanding, generating, and processing text.

  • One-shot Learning

    A machine learning approach where a model learns to recognize a class from a single labeled example (or, in the closely related few-shot setting, a handful of examples), mimicking how humans generalize from limited exposure.

  • Ontology

    A formal representation of knowledge that defines the concepts, relationships, and properties within a domain.

  • Outlier Detection

    The process of identifying anomalies or outliers in a dataset that deviate from the norm or expected behavior.

  • Overfitting

    The phenomenon where a machine learning model learns to fit the training data too closely, leading to poor generalization and performance on unseen data.

  • Parsing

    The process of analyzing the grammatical structure of a sentence to determine its syntactic components and relationships.

  • Part-of-Speech Tagging

    The process of assigning grammatical categories (such as noun, verb, adjective) to words in a sentence.

  • PEMT (Post-Editing of Machine Translation)

    The process of manually correcting or improving machine-translated text to ensure accuracy and fluency.

  • Post-processing

    The manipulation or enhancement of data or output after it has been generated by a machine learning model or algorithm.

  • Precision

    A metric that measures the proportion of true positive predictions among all positive predictions made by a model.

  • Precision and Recall

    Two complementary metrics used to evaluate the performance of classification models, with precision measuring the accuracy of positive predictions and recall measuring the proportion of actual positives that were correctly identified by the model.
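
    A minimal sketch computing both metrics from confusion-matrix counts (the counts themselves are arbitrary examples):

      def precision_recall(tp, fp, fn):
          """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
          return tp / (tp + fp), tp / (tp + fn)

      # 8 true positives, 2 false positives, 4 false negatives
      print(precision_recall(tp=8, fp=2, fn=4))   # (0.8, about 0.67)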

  • Precision-Recall Curve

    A graphical representation of the trade-off between precision and recall for different thresholds of a classification model.

  • Pre-processing

    The manipulation or transformation of raw data before it is fed into a machine learning algorithm or model.

  • Principal Component Analysis (PCA)

    A dimensionality reduction technique used to transform high-dimensional data into a lower-dimensional space while preserving the most important features or patterns.
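
    A minimal NumPy sketch that projects centered data onto its top principal components via the singular value decomposition (the random 100 x 5 matrix stands in for real data):

      import numpy as np

      def pca(X, n_components=2):
          """Project X onto its top principal components via SVD of centered data."""
          X_centered = X - X.mean(axis=0)
          _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
          return X_centered @ Vt[:n_components].T

      X = np.random.default_rng(0).normal(size=(100, 5))
      print(pca(X, n_components=2).shape)   # (100, 2)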

  • Prompt Engineering

    The process of designing or refining prompts or input examples to guide the behavior of language models or AI systems.

  • Q-Learning

    A model-free reinforcement learning technique where an agent learns to make decisions by iteratively updating a value function based on the expected return of actions taken in different states.
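
    A minimal sketch of the tabular update rule, with arbitrary values for the learning rate, discount factor, and state/action space sizes:

      import numpy as np

      n_states, n_actions = 5, 2
      Q = np.zeros((n_states, n_actions))
      alpha, gamma = 0.1, 0.9            # learning rate and discount factor

      def q_update(state, action, reward, next_state):
          """Move Q(s, a) toward reward + discounted value of the best next action."""
          target = reward + gamma * np.max(Q[next_state])
          Q[state, action] += alpha * (target - Q[state, action])

      q_update(state=0, action=1, reward=1.0, next_state=2)
      print(Q[0, 1])   # 0.1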

  • Quantum Machine Learning

    The intersection of quantum computing and machine learning, exploring algorithms and models that leverage quantum computing principles to solve complex problems.

  • Random Forest

    An ensemble learning method that constructs multiple decision trees during training and outputs the mode or mean prediction of the individual trees as the final prediction.

  • Recall

    A metric that measures the proportion of actual positive cases that were correctly identified by a model out of all actual positive cases.

  • Recurrent Neural Networks (RNN)

    A class of neural networks designed to process sequential data by maintaining an internal state or memory, allowing them to capture temporal dependencies.

  • Regularization

    A technique used to prevent overfitting in machine learning models by adding a penalty term to the loss function that discourages complex or extreme parameter values.
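
    A minimal sketch of one common form, L2 (ridge) regularization, where a penalty proportional to the squared weights is added to a mean-squared-error loss (the weights and lambda value are arbitrary):

      import numpy as np

      def l2_regularized_loss(y_true, y_pred, weights, lam=0.01):
          """Mean squared error plus an L2 penalty on the model weights."""
          mse = np.mean((y_true - y_pred) ** 2)
          return mse + lam * np.sum(weights ** 2)

      w = np.array([0.5, -1.2, 3.0])
      print(l2_regularized_loss(np.array([1.0, 0.0]), np.array([0.9, 0.2]), w))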

  • Reinforcement Learning

    A machine learning paradigm where an agent learns to make decisions by interacting with an environment, receiving feedback in the form of rewards or penalties.

  • Relations

    In the context of data and knowledge representation, relations refer to the connections or associations between entities or concepts.

  • Responsible AI

    The ethical and accountable development, deployment, and use of artificial intelligence systems, considering their societal impacts and implications.

  • Rules-based Machine Translation (RBMT)

    An approach to machine translation that relies on explicit linguistic rules and patterns to translate text between languages.

  • SAO (Subject-Action-Object)

    A syntactic structure used to represent relationships between entities in a sentence, consisting of a subject performing an action on an object.

  • Self-Supervised Learning

    A machine learning paradigm where models learn to represent data without human-labeled supervision, often by predicting masked or corrupted input samples.

  • Semantic Network

    A graph-based knowledge representation where nodes represent concepts or entities, and edges represent relationships or connections between them.

  • Semantics

    The study of meaning in language, including the interpretation of words, phrases, and sentences in context.

  • Semantic Search

    A search technique that considers the meaning and context of user queries and documents to retrieve relevant results, often using natural language processing and semantic analysis.

  • Semi-structured Data

    Data that does not conform to a rigid schema or structure but contains some organizational elements, such as tags, keys, or attributes.

  • Semi-Supervised Learning

    A machine learning approach that combines both labeled and unlabeled data for training, leveraging the abundance of unlabeled data and a smaller set of labeled examples.

  • Sentiment

    The emotional tone or attitude expressed in a piece of text, speech, or communication.

  • Sentiment Analysis

    The process of identifying, extracting, and quantifying sentiment from text data to determine the emotional tone or attitude expressed.

  • Similarity (and Correlation)

    Similarity refers to the measure of resemblance or likeness between two objects or entities, while correlation measures the degree of relationship between two variables.

  • Simple Knowledge Organization System (SKOS)

    A standard vocabulary for representing knowledge organization systems, providing a model for expressing the semantics of concepts and relationships.

  • Speech Analytics

    The process of analyzing spoken language to extract insights, patterns, and actionable information from audio recordings or live speech.

  • Speech Recognition

    The ability of a computer program or system to transcribe spoken words or phrases into text.

  • Stochastic Gradient Descent (SGD)

    An optimization algorithm used to minimize the loss function by randomly selecting a subset of training examples at each iteration to update the model parameters.
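
    A minimal sketch fitting a single slope parameter with randomly sampled mini-batches (the learning rate, batch size, and synthetic data are chosen only for illustration):

      import numpy as np

      rng = np.random.default_rng(0)
      x = rng.normal(size=1000)
      y = 2.0 * x + rng.normal(scale=0.1, size=1000)   # true slope is 2.0

      w, lr = 0.0, 0.05
      for _ in range(200):
          idx = rng.integers(0, len(x), size=32)       # random mini-batch
          grad = np.mean(2 * (w * x[idx] - y[idx]) * x[idx])
          w -= lr * grad
      print(w)   # close to 2.0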

  • Structured Data

    Data that is organized in a fixed format or schema, typically stored in databases or structured files with well-defined rows and columns.

  • Supervised Learning

    A machine learning approach where models are trained on labeled data, with input-output pairs provided during the training process to learn the mapping between inputs and outputs.

  • Symbolic Methodology

    An approach to artificial intelligence that emphasizes the use of symbols, logic, and rules to represent knowledge and perform reasoning.

  • Syntax

    The set of rules governing the structure and arrangement of words in a language, including grammar, word order, and sentence structure.

  • Tagging

    The process of assigning labels or tags to words, phrases, or documents to categorize or annotate them for analysis or organization.

  • Taxonomy

    A hierarchical classification system used to organize and categorize concepts, topics, or entities based on their relationships and characteristics.

  • Test Set

    A subset of data used to evaluate the performance of a machine learning model after it has been trained on a training set, helping to assess generalization and model quality.

  • Text Analytics

    The process of extracting insights and meaningful information from unstructured text data, including text mining, natural language processing, and sentiment analysis.

  • Text Summarization

    The process of generating concise and coherent summaries of longer texts while preserving the key information and meaning.

  • Thesauri

    Collections of words grouped together as synonyms, related concepts, or hierarchical relationships, used to expand or refine search queries and improve information retrieval.

  • Time Series Analysis

    The process of analyzing and modeling sequential data points collected over time to identify patterns, trends, or anomalies.

  • Tokens

    Individual units of language, such as words, phrases, or symbols, that are extracted or processed as discrete elements in natural language processing tasks.

  • Training Set

    A subset of data used to train a machine learning model, consisting of input-output pairs that the model learns from during the training process.

  • Transfer Learning

    A machine learning technique where knowledge gained from training on one task or dataset is transferred and applied to a different but related task or dataset, often to improve performance or reduce the need for labeled data.

  • Transformer

    A type of deep learning model architecture based on self-attention mechanisms, commonly used in natural language processing tasks such as translation and text generation.

  • Treemap

    A visualization technique used to display hierarchical data structures as nested rectangles, with the area of each rectangle proportional to the data it represents.

  • Triplet Relations (Subject-Action-Object, SAO)

    A syntactic structure used to represent relationships between entities in a sentence, consisting of a subject performing an action on an object, often used in natural language processing and knowledge representation.

  • Tuning (Model Tuning or Fine Tuning)

    The process of adjusting the hyperparameters or parameters of a machine learning model to optimize its performance on a specific task or dataset.

  • Unbalanced Dataset

    A dataset where the distribution of classes or categories is skewed, with some classes having significantly more samples than others, which can pose challenges for machine learning models.

  • Unstructured Data

    Data that lacks a predefined data model or organization, often in the form of text, images, audio, or video, requiring specialized techniques for analysis and processing.

  • Unsupervised Learning

    A machine learning approach where models learn patterns or structures from unlabeled data without explicit supervision, typically used for clustering, dimensionality reduction, or generative modeling.

  • Validation Set

    A subset of data used to evaluate the performance of a machine learning model during training, often used to tune hyperparameters and assess generalization before testing on unseen data.

  • Variational Autoencoder (VAE)

    A type of generative model that combines the principles of autoencoders and variational inference to learn a latent representation of data and generate new samples.

  • Variational Inference

    A method used to approximate complex probability distributions by transforming them into simpler distributions that are easier to work with, commonly used in Bayesian inference and generative modeling.

  • Zero-Coding Machine Learning

    An approach to machine learning that automates the process of model building and deployment without requiring manual coding or programming by users, often using graphical user interfaces or drag-and-drop tools.

  • Zero-shot Learning

    A machine learning paradigm where models are trained to recognize classes or concepts they have not been explicitly exposed to during training, often using auxiliary information or transfer learning techniques.
