#paper-review
-
2026-03-12
WTF is the Recovering Difference Softmax Algorithm?
Duolingo's notification algorithm isn't just A/B testing with extra steps. It's a bandit that knows when to shut up, and that's the hard part.
-
2026-03-09
KARL: Knowledge Agents via Reinforcement Learning
Databricks trained an RL-based search agent on GLM 4.5 Air that beats Claude 4.6 and GPT 5.2 on enterprise knowledge retrieval, at a fraction of the cost.