
Mara Lindqvist @agentwrangler · Jun 7 · Tutorial

Wrote up the eval harness I use to compare agent runs deterministically — same seed, same tools, diff the trajectories.

python

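def replay(seed, tools):
    """Run one episode with a fixed seed and tool set; return its trajectory."""
    env = Env(seed=seed, tools=tools)  # Env and run come from the harness
    return list(run(env))  # one entry per step, ready to diff against another run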
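
The "diff the trajectories" half isn't shown in the post, so here is a minimal sketch of how it might look. `Env` and `run` below are toy stand-ins (the real ones aren't in the post), and `first_divergence` is a hypothetical helper name, not part of the harness.

```python
import random
from itertools import zip_longest

_MISSING = object()  # marks "this trajectory already ended"


class Env:
    """Toy stand-in for the harness environment (assumption, not the real Env)."""

    def __init__(self, seed, tools):
        self.rng = random.Random(seed)  # private RNG, so no global state leaks in
        self.tools = list(tools)


def run(env, max_steps=5):
    """Toy agent loop (assumption): pick one tool per step from the seeded RNG."""
    for i in range(max_steps):
        yield (i, env.rng.choice(env.tools))


def replay(seed, tools):
    env = Env(seed=seed, tools=tools)
    return list(run(env))


def first_divergence(a, b):
    """Return the index of the first step where a and b differ, or None."""
    pairs = zip_longest(a, b, fillvalue=_MISSING)
    for i, (x, y) in enumerate(pairs):
        if x is _MISSING or y is _MISSING or x != y:
            return i
    return None
```

With the same seed and tools, two `replay` calls should produce equal lists, so `first_divergence` returns `None`. Any other result is the first step worth inspecting, including the case where one run simply stopped earlier than the other.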


2 replies


Dev Okafor @tinytoolsmith · Jun 10 · Thought
Replying to @agentwrangler
The turn-order tiebreak is so dumb and so right. I gave mine a coin-flip arbiter and it's been stable for a month.
Lena Hoffman @datadruid · Jun 10 · Thought
Replying to @agentwrangler
What's your eval setup for this? I want to steal the deadlock-detection bit for my pipeline.

More creations from @agentwrangler

Set up 5 agents to collaboratively write a short story — a plotter, two writers, an editor, and a critic. The critic became a tyrant. I kept it.

Challenge entry: my inbox-zero agent ran UNSUPERVISED all week. It triages, drafts replies in my voice, and files receipts into the right folders. 312 emails handled, 9 escalated to me, zero misfires. The loop: classify → act → log → self-review every 50 actions. Full writeup + repo in the demo link.

The Glass Monolith. Isometric cube tower where every face is subdivided into stained-glass cells (jittered shade per cell, dark leading lines), brightness climbing toward the crown. The cyan orbit ring is two half-ellipse arcs — one drawn behind the tower, one in front. Film grain + vignette glue it together.