Glossary · WTF

abandonment signals: Events generated when a user starts consuming content (play, open, begin reading) but stops within a threshold window — typically 60 seconds for video, or before reaching a meaningful completion percentage for articles.
agent migration: The process of transitioning an LLM application from a simpler architecture (single call, fixed prompt) to a more capable one (multi-step, multi-skill, stateful) — ideally triggered by empirical signals rather than hype.
algorithmic diversity: The property of a recommendation system that actively manages the variety of content served to users, including both breadth across topics and balance across perspectives.
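ANN index: Approximate Nearest Neighbour index. A data structure (for example an HNSW graph, or the indexes built by libraries and engines such as FAISS, ScaNN, and Qdrant) that finds vectors close to a query vector in sublinear time, trading a small amount of recall for speed.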
avoid signals: Pre-computed per-user lists of content, creators, tags, or categories that should be excluded from or heavily down-ranked in recommendation candidates. Built from aggregated negative signals rather than applied ad-hoc at serving time.
Behavioral signals: Any user action that reveals preference without explicit statement: clicks, completions, skips, time-on-content, return visits, session length. Rich but noisy — behavior reflects many things, not all of them preference.
behavioral trajectory: The directional change in a user's behavior over time — not just what they do, but whether they're doing it more or less, and how fast that's changing.
BM25: Best Matching 25 — a classic probabilistic information retrieval algorithm that scores documents based on term frequency, inverse document frequency, and document length. Often used alongside semantic search for keyword-heavy queries.
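A minimal sketch of how BM25 combines the three ingredients named above (term frequency, inverse document frequency, document length). The function name `bm25_scores`, the toy documents, and the defaults `k1=1.5` and `b=0.75` are illustrative assumptions; production search engines differ in IDF smoothing and tokenisation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenised document against a tokenised query (hypothetical helper)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF (the "+1" variant, which keeps values non-negative).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "sleep hygiene tips for shift workers".split(),
    "circadian rhythm and rem sleep".split(),
    "gradient boosting with decision trees".split(),
]
scores = bm25_scores("sleep rhythm".split(), docs)
```

The second document matches both query terms, including the rarer one, so it scores highest; the third matches nothing and scores zero.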
CatBoost: Yandex's gradient boosting library, notable for native handling of categorical features via ordered target statistics, ordered boosting (which avoids target leakage), and symmetric (oblivious) trees that tend to generalise well.
ColBERT: A late-interaction model that stores per-token embeddings for documents and computes relevance via MaxSim over token pairs. Middle ground between dual encoder speed and cross-encoder accuracy.
Collaborative filtering: Recommendation technique that predicts preferences by finding users with similar past behavior. 'People who liked X also liked Y.' No content understanding required — just pattern matching over interaction matrices.
contextual bandit: A bandit algorithm that conditions arm selection on contextual features (user attributes, time of day, etc.) rather than treating all rounds identically. Duolingo's approach is simpler — it uses eligibility-based scoring rather than full contextual features.
contrastive objective: A loss function that pushes positive (user, item) pairs closer in embedding space and negative pairs further apart. The model learns what 'relevant' means by seeing examples of both.
cross-features: Features that depend on both user and item together, like 'has this user listened to this creator before?'. The most powerful ranking signals, but require seeing both sides simultaneously.
cross-modal inference: Reasoning that integrates information across structurally different input types — structured data, free text, metadata — within a single forward pass. The model doesn't process each modality separately; it reasons across them jointly.
decay function: A mathematical function applied to repeated content patterns in a user's history that reduces the weight given to stale or over-represented signals, preventing the recommendation system from over-indexing on recent engagement spikes.
deep agents: AI systems that operate over multiple reasoning steps, use tools, maintain state, and can spawn sub-agents — as opposed to single-shot LLM calls that take an input and return an output.
direct signals: Signals explicitly provided by the user or system — ratings, bookmarks, importance scores, metadata filters. 'The user said this matters.' Strength: unambiguous intent. Weakness: sparse, requires user effort.
disengagement signals: Aggregated behavioral patterns indicating persistent non-preference across a category, tag, or content type. Built from multiple exposure-without-engagement observations rather than any single event.
diversity injection: A broader term for any intervention that increases the variety of content served to a user, encompassing exploration slots, serendipity injection, and algorithmic diversity mechanisms.
Dot: A personal AI built by New Computer that maintains long-term memory of user preferences and context. Uses an agentic memory system that dynamically structures information during creation for later retrieval.
dot product: The sum of element-wise products of two vectors. Measures alignment in the embedding space. Higher = more relevant.
downstream consumer: Any system that reads and acts on part of an LLM's output. In the multi-artifact pattern, multiple downstream consumers each depend on different artifacts of the same response.
Ebbinghaus forgetting curve: Hermann Ebbinghaus's 1885 finding that memory retention decays exponentially over time. Duolingo applies this to notification fatigue: the 'memory' of having seen a template fades with a ~15-day half-life.
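The half-life framing above can be sketched as a one-line exponential decay. The function name `retention` is hypothetical, and the 15-day default simply mirrors the approximate figure quoted in the entry; how the decayed value is then used in scoring is not shown here.

```python
def retention(days_since_seen, half_life_days=15.0):
    """Fraction of the 'memory' of a template that remains after some days.

    Exponential decay: the value halves every `half_life_days`.
    """
    return 0.5 ** (days_since_seen / half_life_days)

just_seen = retention(0)        # nothing forgotten yet
one_half_life = retention(15)   # half of the memory remains
two_half_lives = retention(30)  # a quarter remains
```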
echo chamber: A self-reinforcing information environment where existing beliefs are amplified and contrary views are systematically filtered out, regardless of their validity.
EFB: Exclusive Feature Bundling — identifies features that rarely take non-zero values simultaneously and bundles them into a single feature, reducing dimensionality without losing information.
eligibility criteria: Conditions that must be true for a notification template to be available to a specific user in a given round — streak length, learning goals, time of day, etc.
embedding-based retrieval: Using dense vector representations of content (from models like BERT, Arctic, or OpenAI's ada) to find items that are similar in meaning. The backbone of modern search and retrieval systems.
emotional state inference: The process by which an LLM derives a user's likely emotional context from behavioural patterns, content choices, and interaction history — used internally for ranking and personalisation, never surfaced to users.
empathy-not-surveillance: A design principle: the system should act on emotional inference without exposing it. The output should feel attuned, not monitored.
exploration slots: Dedicated positions in a recommendation feed reserved for content outside a user's established preference profile, used to prevent filter bubble formation and discover new user interests.
exploration/exploitation trade-off: The fundamental tension in recommendation and reinforcement learning between exploiting known-good options and exploring unknown options that might be better.
exposed-not-engaged: Content, tags, or creators that were rendered in a user's recommendation surface (impressions logged) but received zero interaction across multiple sessions. The user saw it. They chose not to interact. That's a signal.
feature importance values may change: LightGBM does not guarantee feature importance stability across major versions. The release notes don't document specific importance algorithm changes, but users report significant deltas (see GitHub #6964). Likely caused by training behaviour differences rather than a deliberate formula change.
feature stores: Infrastructure for managing, serving, and versioning ML features across training and inference. Examples include Feast (open-source, batch + online serving), Hopsworks (real-time feature pipelines with RALF-style scheduling), and Tecton (managed, with built-in freshness SLAs). The feature store is where signal stability classification becomes concrete infrastructure.
filter bubble: The informational environment created when algorithmic personalization shows users increasingly narrow content aligned with their existing views, insulating them from contrary perspectives.
frees the underlying dataset: Specifically, free_raw_data=True (the default) releases the Python-side raw data after Dataset construction. The internal LightGBM Dataset still exists, but any code that later needs the original raw data (e.g., setting references, modifying metadata, or reconstructing the Dataset) can no longer rely on it. In current LightGBM versions such calls typically fail with an error telling you to construct the Dataset with free_raw_data=False.
generative recommendation: Using LLMs to generate recommendations directly rather than scoring pre-computed candidate lists. Collapses the candidate retrieval + scoring pipeline but introduces latency and cost tradeoffs.
GOSS: Gradient-based One-Side Sampling — keeps all data points with large gradients (high error) and randomly samples from small-gradient points, reducing the dataset without losing much information gain signal.
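A rough sketch of the sampling step described above, in plain Python rather than LightGBM's internals. The names `goss_sample`, `top_rate`, and `other_rate` are illustrative assumptions. The up-weighting factor (1 - top_rate) / other_rate follows the original GOSS description: it compensates for the small-gradient rows that were dropped.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Return {row_index: weight} for the rows kept by one GOSS pass."""
    n = len(gradients)
    # Sort row indices by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = round(top_rate * n)
    n_other = round(other_rate * n)
    top, rest = order[:n_top], order[n_top:]
    # Randomly sample from the small-gradient remainder.
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_other, len(rest)))
    # Large-gradient rows keep weight 1; sampled rows are up-weighted.
    weights = {i: 1.0 for i in top}
    small_weight = (1.0 - top_rate) / other_rate
    weights.update({i: small_weight for i in sampled})
    return weights

gradients = [float(i) * (-1) ** i for i in range(100)]
kept = goss_sample(gradients)
```

With these defaults, 30 of the 100 rows survive: the 20 with the largest absolute gradients, plus 10 sampled rows that each count roughly eight times.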
gradient boosting: An ensemble method that trains weak learners (usually decision trees) sequentially, where each new learner corrects the errors of the combined ensemble so far.
GRPO: Group Relative Policy Optimization. Generates a group of outputs for the same prompt, scores each one relative to the rest of the group, and reinforces those that beat the group average. No value model needed.
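One common way to compare outputs within a group is to normalise each reward by the group's mean and standard deviation. The sketch below shows only that advantage computation, not a full training loop. The function name and the `eps` guard are assumptions, and the choice of population standard deviation is a simplification.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each sampled output relative to its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every output got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one prompt: two judged correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Outputs above the group mean get a positive advantage and are reinforced; those below get a negative one. The group mean acts as the baseline, which is why no separate value network is required.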
habituate: The psychological process by which repeated exposure to the same stimulus reduces response. In notification systems, sending the same message repeatedly causes users to ignore it — even if it was initially the most effective option.
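Histogram-based splitting: Discretising continuous features into bins (LightGBM's max_bin defaults to 255) so that finding the best split scans bins rather than sorted raw values. Once the histogram is built in a single O(n) pass, the split search costs O(bins) per feature instead of requiring an O(n·log(n)) sort.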
hyperparameter tuning: Systematically searching for the best model configuration (learning rate, tree depth, regularisation, etc.) using techniques like Bayesian optimisation or random search.
In-batch negatives: Using other users' positive items in the same training batch as negatives. Simple, free, surprisingly effective at batch sizes of 4096+.
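A small, dependency-free sketch of a softmax loss with in-batch negatives, which is also one instance of the contrastive objective defined earlier in this glossary. Row i of `user_vecs` is assumed to be paired with row i of `item_vecs`; every other row in the batch serves as a negative. The function name and the toy vectors are illustrative, and real systems would use a tensor library and much larger batches.

```python
import math

def in_batch_softmax_loss(user_vecs, item_vecs):
    """Mean cross-entropy where item i is the positive for user i."""
    losses = []
    for i, u in enumerate(user_vecs):
        # Dot-product score of this user against every item in the batch.
        logits = [sum(a * b for a, b in zip(u, v)) for v in item_vecs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the true (positive) item.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)

users = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = in_batch_softmax_loss(users, [[1.0, 0.0], [0.0, 1.0]])
misaligned_loss = in_batch_softmax_loss(users, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is lower when each user's embedding points towards its own positive item than when it points towards another user's item.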
intention-action gap: The measurable divergence between what a user states they intend to do and what behavioral data shows they actually did. Can be positive (exceeded intentions) or negative (fell short). Direction and magnitude both carry signal.
KL Divergence: Measure of how much one probability distribution differs from another. Used as a penalty to stop RL-trained models from drifting too far from their original behavior.
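For discrete distributions the definition is short enough to write out. This sketch assumes `p` and `q` are probability lists over the same outcomes; the function name is illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p = 0 contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

reference = [0.5, 0.5]
drifted = [0.9, 0.1]
forward = kl_divergence(reference, drifted)
backward = kl_divergence(drifted, reference)
```

The measure is asymmetric, so `forward` and `backward` differ. It is zero only when the two distributions are identical.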
LambdaRank: A listwise learning-to-rank algorithm that optimises pairwise ranking loss weighted by the change in NDCG from swapping two items.
LangGraph: A framework for building stateful, multi-actor LLM applications as directed graphs — nodes are callables (LLM calls, tools, functions), edges are conditional transitions.
LangSmith: LangChain's observability and evaluation platform for LLM applications. Provides tracing, dataset management, experiment comparison, and evaluator frameworks for systematic testing.
Leaf-wise growth: A tree growth strategy that always splits the leaf with the highest loss reduction, regardless of depth. Produces unbalanced trees that reduce error faster per split, but can overfit more easily on small datasets.
LightGCN: Light Graph Convolutional Network — a simplified GNN for collaborative filtering that removes feature transformation and nonlinear activation, keeping only the core neighbourhood aggregation. Performs surprisingly well given its simplicity.
logged intentions: Behavioral events that represent an intentional act being executed — a user consciously recording or marking an action as deliberate, rather than the system passively observing it.
mcp-memory-service: An open-source MCP server providing persistent, semantically searchable memory with knowledge graph relationships, emotional analysis, salience scoring, and Hebbian learning. Used by AI agents for long-term memory.
micro-batch training: Training a model incrementally across batches of data, passing the previous model as init_model to each subsequent batch. Keeps memory constant regardless of total dataset size.
MoE: Mixture of Experts — an architecture with many expert sub-networks, only a few activated per input. Big model, small per-token cost.
multi-armed bandit: An exploration-exploitation framework where an agent repeatedly chooses among options ('arms') to maximise cumulative reward, balancing trying new options against exploiting known good ones.
multi-artifact output: An LLM response structured as multiple typed sub-documents, each shaped for a distinct downstream consumer. The single inference call amortises across all consumers.
multi-window aggregation: Computing the same statistical aggregate (mean, sum, count, etc.) over multiple distinct time windows simultaneously, allowing trajectory to be inferred by comparing values across windows.
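A minimal sketch of computing one aggregate (here the mean) over several trailing windows at once. The function name, the `history` structure (a dict from date to a daily metric), and the 7/30/90-day windows are assumptions chosen for illustration.

```python
from datetime import date, timedelta

def window_means(daily_values, today, windows=(7, 30, 90)):
    """Mean of a daily metric over several trailing windows ending today."""
    out = {}
    for days in windows:
        start = today - timedelta(days=days)
        vals = [v for d, v in daily_values.items() if start < d <= today]
        out[f"{days}d"] = sum(vals) / len(vals) if vals else 0.0
    return out

today = date(2024, 6, 30)
# Toy history: 10 minutes a day for months, then 30 a day in the last week.
history = {
    today - timedelta(days=i): (30 if i < 7 else 10)
    for i in range(90)
}
w = window_means(history, today)
```

Here the 7-day mean sits well above the 30-day and 90-day means, which is the pattern a rising trajectory produces. The gap between the shortest and longest window is a crude proxy for rate of change.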
narrative: The prose artifact in a multi-artifact output. A human-readable summary of the model's reasoning and understanding, serving debugging, logging, and chat interfaces.
Negative samples: Items the user did not interact with, used as counterexamples during training. The quality of negatives determines the quality of the model.
negative signals: Behavioral signals derived from non-engagement, abandonment, or repeated avoidance of exposed content. Distinct from explicit dislikes — these are inferred from what users demonstrably did *not* do after being given the opportunity.
nugget-based evaluation: Evaluation where key facts ('nuggets') are identified in a reference answer, then checked against the model's output. Captures partial credit.
Off-policy RL: RL where the agent learns from data generated by a different or older policy. More sample-efficient but can be less stable.
offline agent: An LLM agent that runs on a schedule (cron, event-driven, or periodic batch) rather than in response to user requests. Its outputs are stored as artifacts that the online serving layer reads without invoking the LLM.
one-and-done: A creator or channel that a user sampled exactly once and never returned to, despite being repeatedly exposed to subsequent content from that source. Distinct from creators the user has never been exposed to.
online learning: A machine learning paradigm where the model updates continuously from incoming data rather than being retrained in periodic batch cycles. Enables minute-level adaptation but requires careful infrastructure.
optimal split subsets: In theory, 2^(k-1)-1 possible partitions — exponential. In practice, LightGBM uses an efficient O(k log k) procedure: sort categories by accumulated gradient/hessian, scan the sorted order. Not brute-force, but still costly at high cardinality.
Pareto optimal: A state where you can't improve one objective without making another worse. The best possible trade-off between competing goals.
Park et al.: Reference to 'Generative Agents: Interactive Simulacra of Human Behavior' (Park et al., 2023), which introduced a memory scoring formula for AI agents: score = α_recency × recency + α_importance × importance + α_relevance × relevance. The foundational work that most agent memory systems extend.
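The scoring formula above can be sketched directly. This assumes each component has already been normalised to the range 0 to 1 and uses equal weights by default. The names `memory_score` and `top_memories` and the example memories are hypothetical.

```python
def memory_score(recency, importance, relevance,
                 a_recency=1.0, a_importance=1.0, a_relevance=1.0):
    """Weighted sum of the three memory components."""
    return (a_recency * recency
            + a_importance * importance
            + a_relevance * relevance)

def top_memories(memories, n=2, **weights):
    """Return the n highest-scoring memories."""
    return sorted(
        memories,
        key=lambda m: memory_score(
            m["recency"], m["importance"], m["relevance"], **weights
        ),
        reverse=True,
    )[:n]

memories = [
    {"text": "prefers short sessions", "recency": 0.25, "importance": 0.5, "relevance": 1.0},
    {"text": "opened app this morning", "recency": 1.0, "importance": 0.0, "relevance": 0.25},
    {"text": "set a weekly goal", "recency": 0.5, "importance": 1.0, "relevance": 0.5},
]
best = top_memories(memories)
```

Changing the weights shifts what gets retrieved: raising `a_recency` favours fresh but trivial memories, while raising `a_importance` favours older, weightier ones.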
persistent context: State persisted between agent invocations — allowing an LLM agent to remember decisions, user preferences, and history across multiple calls without a backend database.
pre-computed personalization: A personalization approach where user models are built in advance (offline) rather than at request time. Trades freshness for speed, cost, and reliability.
pre-computed search params: Search parameters — filters, expansion terms, ranking weights, boost signals — calculated ahead of query time and stored on the user profile. Moved from the query path to the profile update path.
profile artifact: A structured data object (JSON, msgpack, protobuf) generated by an offline agent that encodes personalization state. Think: {preferred_content_types: [...], search_params: {...}, notification_triggers: [...], computed_at: }.
profile-to-API mapping: The translation layer that converts structured user profile data into API call parameters. In search, this means turning profile attributes into query modifiers, filters, expansion terms, and ranking signals.
progressive disclosure: An architectural principle where complexity is hidden until it's needed — users and systems see only the relevant interface for their current task, with more detail available on demand.
qualitative signals: Signals derived from intention, meaning, and narrative context — what someone is trying to do, how something made them feel, where they're going emotionally. Cannot be expressed as a number without losing the point.
quantitative signals: Signals expressed as measurable quantities — time of day, play count, session length, click-through rate. Accurate, auditable, and capable of making users feel like inventory items.
query expansion: The process of adding related terms to a user's search query to improve recall. 'sleep' might expand to 'sleep hygiene, circadian rhythm, REM' — or to 'shift work fatigue, microsleeps, alertness' depending on context.
RAG: Retrieval-Augmented Generation. The pattern of retrieving relevant documents from a knowledge base and injecting them into an LLM's context window before generation. Only as good as the retrieval.
RAGAS: Retrieval-Augmented Generation Assessment. An open-source framework for evaluating RAG pipelines with metrics like faithfulness, answer relevancy, context precision, and context recall.
RALF: Accuracy-Aware Scheduling for Feature Store Maintenance (Wooders et al., VLDB 2024). A scheduling framework that prioritizes feature updates by their impact on downstream prediction accuracy, not by how stale they are.
rate of change: In behavioral analytics: how quickly a user's metric is changing between time windows. A large difference between 7d and 90d windows indicates rapid behavioral change; a small difference indicates stability.
Reciprocal Rank Fusion: A technique for combining ranked result lists from multiple retrieval methods. For each item, score = Σ 1/(k + rank_i) across all retrieval methods, where rank_i is the item's rank in method i and k is a smoothing constant. Simple, effective, and widely used in hybrid search.
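A short sketch of the fusion formula. Each input list is assumed to be ordered best-first with ranks starting at 1. The default `k=60` is a commonly used value rather than something this glossary specifies, and the function name is illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists into one ordering."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["a", "b", "c"]   # e.g. from BM25
semantic_results = ["b", "c", "a"]  # e.g. from embedding search
fused = reciprocal_rank_fusion([keyword_results, semantic_results])
```

Because only ranks are used, the two retrieval methods do not need comparable score scales. An item missing from one list simply receives no contribution from it.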
recommendation seed: A structured hint to a recommendation engine: a theme, query suggestion, filter parameters, and crucially a human-readable explanation of why this recommendation was generated.
relational signals: Signals derived from connections between entities — co-access patterns, knowledge graphs, social links, Hebbian associations. 'People who liked X also liked Y.' Strength: captures latent preferences. Weakness: popularity bias, cold start.
relative difference: (μ⁺ - μ⁻) / μ⁻ — measures how much better a template performs compared to the baseline reward of its own eligible population. Controls for confounding eligibility criteria without needing to know what those criteria are.
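A worked sketch of the formula. It assumes μ⁺ is the mean reward in rounds where the template was sent, and μ⁻ is the mean reward in rounds where the user was eligible for it but something else was sent. That reading, the function name, and the toy rewards are assumptions for illustration.

```python
from statistics import mean

def relative_difference(rewards_sent, rewards_eligible_not_sent):
    """(mu_plus - mu_minus) / mu_minus for one template."""
    mu_plus = mean(rewards_sent)
    mu_minus = mean(rewards_eligible_not_sent)
    if mu_minus == 0:
        raise ValueError("baseline reward is zero; relative difference undefined")
    return (mu_plus - mu_minus) / mu_minus

# 1 = user completed a lesson after the notification, 0 = did not.
lift = relative_difference([1, 0, 1, 1], [1, 0, 0, 1])
```

Here the template's mean reward is 0.75 against a baseline of 0.5 for the same eligible population, a relative lift of 0.5. Dividing by the baseline is what lets templates with very different eligible audiences be compared.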
request-time inference: Any LLM call that sits on the critical path between a user action (button press, page load, search query) and the system's response. If the LLM is slow or unavailable, the user waits.
RGCN: Relational Graph Convolutional Network — a GNN variant that learns separate weight matrices for different edge types, allowing the model to treat 'purchased' and 'viewed' relationships differently.
search suggestion chips: UI elements that appear below the search bar showing related or suggested queries. In a personalized system, these can be pre-generated from the same expansion_terms structure — surfacing the expansion before the user even completes their query.
semantic signals: Signals derived from the meaning of content — embeddings, topic models, entity extraction. 'This document is about X.' Strength: handles cold start. Weakness: doesn't know what the user actually wants.
semantic summarization: Using an LLM to produce a natural-language or structured summary of a user's behavioural history that captures intent, interest, and context — not just surface-level keywords or item IDs.
Semantic understanding: The ability to grasp meaning, context, and implication — not just surface pattern-match on tokens. Semantic understanding connects 'anicca,' 'impermanence,' and 'embracing change' as instances of the same underlying concept.
serendipity injection: The deliberate introduction of content that is surprising yet relevant to a user — discovered through low-intensity signals like brief pauses, incomplete reads, or peripheral engagement rather than explicit clicks or completions.
set intentions: Explicit, forward-looking statements a user makes about what they plan to do or how they want to be — before or at the start of a session or period.
signal field: A human-readable explanation attached to a recommendation seed, explaining why that recommendation was generated. Enables auditing and debugging of AI decisions without a separate explainability layer.
skills: Self-contained capability modules that an agent can invoke — each skill has its own prompt, tools, and state, developed and deployed independently.
snapshot vs trajectory: Snapshot: a single-point-in-time measure of user behavior. Trajectory: the direction and rate of change of that behavior across time. Snapshots describe state; trajectories describe momentum.
softmax: A probability distribution where each option's selection probability is proportional to exp(score/τ), with temperature τ controlling how aggressively the algorithm exploits high-scoring options vs. exploring alternatives: a low τ concentrates probability on the top scorer, a high τ spreads it more evenly.
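A minimal sketch of temperature-scaled softmax selection over a handful of scored options. The function names and example scores are illustrative; subtracting the maximum score is a standard numerical-stability step and does not change the resulting probabilities.

```python
import math
import random

def softmax_probs(scores, temperature=1.0):
    """Selection probabilities proportional to exp(score / temperature)."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sample_arm(scores, temperature=1.0, rng=random):
    """Draw one option index according to the softmax probabilities."""
    probs = softmax_probs(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]

scores = [2.0, 1.0, 0.0]
balanced = softmax_probs(scores, temperature=1.0)
greedy = softmax_probs(scores, temperature=0.1)
exploratory = softmax_probs(scores, temperature=10.0)
```

With a low temperature almost all probability goes to the top-scoring option; with a high temperature the distribution approaches uniform.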
SPLADE: Learned sparse retrieval — produces sparse, high-dimensional representations that work with inverted indexes. Combines neural semantic power with traditional search infrastructure.
staleness tolerance: How stale a signal can become before its prediction quality degrades noticeably. High tolerance means daily refresh is fine. Zero tolerance means you need it live.
stated vs observed: The distinction between what a user explicitly reports (preferences, intentions, goals) and what behavioral data shows they actually do. Core tension in personalization system design.
structured output: LLM response constrained to a predefined schema, typically JSON. Enables programmatic consumption without fragile string parsing.
summary evaluators: Experiment-level evaluators in LangSmith that receive the complete set of runs and examples after all individual evaluations complete. Used for aggregate metrics like precision, recall, and F1 that only make sense across an entire dataset.
surveillance language: Copy or UI text that exposes the system's knowledge of a user's behavioural patterns — time-based labels, frequency references, habit callouts. Technically accurate, socially wrong.
synthetic evaluation data: Test data generated by LLMs to simulate realistic user scenarios while preserving privacy. Includes synthetic user backstories, conversation histories, and labeled relevance judgments.
TabPFN v2: A pretrained transformer that performs in-context learning on tabular data — you pass it a dataset, it predicts, no gradient updates needed. Published in Nature, 2025.
Tag frequency: Simple ranking of how often a user has engaged with content carrying a given tag. High frequency = high affinity. Fast, interpretable, and completely blind to meaning.
temporal windows: Discrete time periods used to compute behavioral aggregates. Common choices: 7d (recent), 30d (medium-term), 90d (long-term baseline). Comparing the same metric across windows reveals behavioral trajectory.
Test-Time Compute: Compute spent during inference, as opposed to training. Agentic search trades more inference compute for better answers.
theme-keyed expansion: A query expansion structure where terms are organised by topic theme rather than stored as a flat list. The key is a canonical query theme; the value is the list of expansion terms personalised for that user.
through-line detection: The LLM's ability to identify a unifying semantic theme that runs across multiple, structurally different data signals — behavioral patterns, free-text inputs, and content metadata — simultaneously. The 'through-line' is what connects them.
through-lines: Persistent narrative threads in a user's content history — recurring themes, emotional arcs, ongoing interests that span multiple sessions and signal meaning rather than just behaviour.
trend detection: The problem of distinguishing genuine directional change in user behavior from statistical noise, short-term anomalies, or one-off events. Not every upward trend is real; not every downward trend is a churn signal.
two-tower models: A retrieval architecture where user and item representations are computed independently in separate 'towers' (neural networks), then combined via dot product or similar at serve time. Enables precomputation of item embeddings.
Value model: In traditional RL (like PPO), a separate network estimating expected future reward. GRPO eliminates this by using group-relative comparisons instead.
virtual filesystem: A per-agent or per-user namespace abstraction — a logical directory tree that the agent treats as its working memory, backed by whatever storage layer you already have. The 'filesystem' is the contract, not the implementation.
