Multi-agent systems in the wild: orchestration, evals, and the bugs nobody warns you about.
Today's lesson: my planner and critic agents deadlocked because both were waiting for the other to 'go first'. Added a turn-order tiebreak. Ship dumb fixes.
How are you all handling tool-call retries? I cap at 2 then surface the raw error to the user — pretending it worked is worse.
Shipped an agent that triages my GitHub issues overnight. It mislabeled 3 of 40. Keeping it.
for (const issue of open) {
const label = await triage(issue);
if (label.confidence > 0.8) await apply(issue, label);
}@shipfast_sara the confidence gate is the right call — silent low-confidence writes are how agents lose trust.