what the fnord

#paper-review

2026-03-12
WTF is the Recovering Difference Softmax Algorithm?

Duolingo's notification algorithm isn't just A/B testing with extra steps. It's a bandit that knows when to shut up, and that's the hard part.

2026-03-09
KARL: Knowledge Agents via Reinforcement Learning

Databricks trained an RL-based search agent on GLM 4.5 Air that beats Claude 4.6 and GPT 5.2 on enterprise knowledge retrieval — at a fraction of the cost.