[ Blog ]

Notes from the team building Swarmcheck.

Jun 15, 2026·8 min read·Quality Engineering
Your AI agent passed CI. It failed your users.
Vibe coding has made it possible to ship AI products in hours. The tests that come with them often pass. The quality that matters — whether the agent actually does the right thing — is the part nobody automated.
Jun 11, 2026·6 min read·Myth Busting
7 AI QA myths that make AI agents fail in production
A practical teardown of the AI QA myths that weaken release confidence, from prompt testing shortcuts to missing red teaming, voice QA, and LLM evaluation.
Jun 8, 2026·7 min read·Maturity Model
AI QA maturity model: from prompt tests to AI quality engineering
A practical maturity model for teams moving from ad hoc prompt testing to measurable AI quality engineering across agents, chat, voice, evals, and red teaming.
Jun 7, 2026·9 min read·Deep Dive
The art and science of evaluating AI agents
Our ability to measure AI agents has fallen behind our ability to build them. Here is a framework for what great evaluation actually looks like — and where the field needs to go next.
Jun 4, 2026·6 min read·Beginner's Guide
Beginner's guide to AI quality engineering: how to test intelligence before release
AI quality engineering is the discipline of testing whether AI systems are useful, grounded, safe, observable, and ready to ship. Here is where to start.
Jun 2, 2026·8 min read·Deep Dive
8 ways self-evolving AI agents are rewriting the QA playbook
AI systems that construct, improve, and reuse their own subagents are moving from research to engineering practice. Here is what that shift means for the teams responsible for verifying them.
Jun 2, 2026·8 min read·Deep Dive
The verification gap: AI builds faster than you can verify
Building with AI agents is no longer the hard part. Verifying what they built is. Here is why verification is the real bottleneck in AI-assisted development — and what closing that gap actually requires.
Jun 1, 2026·7 min read·Listicle
7 AI QA checks to run before shipping an AI agent
A practical AI QA checklist for teams testing chat agents, voice agents, LLM workflows, prompt changes, and agentic product behaviour before release.
May 31, 2026·6 min read·Beginner's Guide
Beginner's guide to AI QA: how to test AI products that do not behave twice
AI QA is not just automation with a model attached. It is a new quality discipline for non-deterministic products, agents, voice assistants, and LLM workflows.
May 31, 2026·5 min read·Comparison
Traditional QA is testing the wrong thing: 6 AI failures scripts miss
Your E2E suite can click checkout. It cannot tell whether your AI support agent invented a policy, ignored a guardrail, or called the wrong tool.
Apr 18, 2026·6 min read
Why a tuned QA agent beats a scripted suite (and where it doesn't)
Selectors are the worst part of E2E testing. Here's how we replaced them with intent-based agents — and the tradeoffs you should know going in.
Mar 30, 2026·7 min read
Killing the flake: how Swarmcheck handles waits, retries, and timing
Most 'flaky' tests are timing bugs in disguise. We rebuilt our wait strategy from the ground up. Here's what shipped.
Mar 12, 2026·9 min read
Evals for QA agents without losing your mind
A pragmatic guide to evaluating an autonomous tester at scale, with the tradeoffs we made along the way.
