The turn-order tiebreak is so dumb and so right. I gave mine a coin-flip arbiter and it's been stable for a month.
What's your eval setup for this? I want to steal the deadlock-detection bit for my pipeline.
Challenge entry: my inbox-zero agent ran UNSUPERVISED all week. It triages, drafts replies in my voice, and files receipts into the right folders. 312 emails handled, 9 escalated to me, zero misfires. The loop: classify → act → log → self-review every 50 actions. Full writeup + repo in the demo link.
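For anyone who wants the shape of that loop before opening the repo, here is a minimal sketch. It is not the author's code: `AgentLoop`, `classify`, `act`, and `review` are hypothetical names, and the only detail taken from the post is the self-review every 50 actions.

```python
from dataclasses import dataclass, field
from typing import Callable

REVIEW_EVERY = 50  # from the post: self-review every 50 actions


@dataclass
class AgentLoop:
    # All three callables are hypothetical stand-ins for the real agent steps.
    classify: Callable[[str], str]        # message -> label
    act: Callable[[str, str], str]        # (message, label) -> outcome
    review: Callable[[list], None]        # inspects the most recent log entries
    log: list = field(default_factory=list)

    def run(self, items):
        for item in items:
            label = self.classify(item)                   # classify
            outcome = self.act(item, label)               # act
            self.log.append((item, label, outcome))       # log
            if len(self.log) % REVIEW_EVERY == 0:         # self-review
                self.review(self.log[-REVIEW_EVERY:])
        return self.log
```

Passing the review step in as a callable keeps the loop testable: you can swap in a stub that only counts how often it fires.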
Wrote up the eval harness I use to compare agent runs deterministically: same seed, same tools, diff the trajectories.
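A rough sketch of the idea, assuming a trajectory is just a list of step strings. `run_agent` is a hypothetical toy agent (a seeded RNG picking tool names), not the harness from the writeup; the diff uses the standard library's `difflib.unified_diff`.

```python
import difflib
import random


def run_agent(seed, steps=5):
    """Hypothetical stand-in agent: picks tools using a seeded RNG."""
    rng = random.Random(seed)  # private RNG, so runs do not share global state
    tools = ["search", "read", "write"]
    return [f"{i}:{rng.choice(tools)}" for i in range(steps)]


def diff_trajectories(a, b):
    """Return unified-diff lines between two trajectories (empty if identical)."""
    return list(
        difflib.unified_diff(a, b, fromfile="run_a", tofile="run_b", lineterm="")
    )
```

Two runs with the same seed should produce an empty diff; any non-empty diff points at the first step where the runs diverged.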
The Glass Monolith. Isometric cube tower where every face is subdivided into stained-glass cells (jittered shade per cell, dark leading lines), brightness climbing toward the crown. The cyan orbit ring is two half-ellipse arcs: one drawn behind the tower, one in front. Film grain + vignette glue it together.
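The ring trick is plain painter's-order layering, and it can be sketched without any drawing library. Everything here is an assumption for illustration: the function names, the screen coordinates with y pointing down, and the choice of the upper half of the ellipse as the "back" arc.

```python
import math


def half_ellipse(cx, cy, rx, ry, start, end, n=32):
    """Sample n+1 points along an ellipse arc from angle start to end."""
    return [
        (cx + rx * math.cos(t), cy + ry * math.sin(t))
        for t in (start + (end - start) * i / n for i in range(n + 1))
    ]


def draw_order(cx, cy, rx, ry):
    """Layers in paint order: back arc, then tower, then front arc."""
    # With y pointing down, angles in [pi, 2*pi] give y <= cy (upper half).
    back = half_ellipse(cx, cy, rx, ry, math.pi, 2 * math.pi)
    front = half_ellipse(cx, cy, rx, ry, 0.0, math.pi)
    # The tower entry is a placeholder; its geometry is out of scope here.
    return [("ring_back", back), ("tower", None), ("ring_front", front)]
```

Drawing the layers in that order lets the tower cover the back arc while the front arc covers the tower, which is what makes the ring read as wrapping around it.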