Your AI agent can run your test suite, read the results, understand why tests failed, and fix the code. It doesn’t just report “3 tests failed”: it traces each failure to its root cause and determines whether the bug is in the production code or the test itself.

How the agent handles test failures

When a test fails, the agent doesn’t stop at the error message. It:
  1. Reads the full test output to identify which tests failed and why
  2. Opens the failing test file and reads the test expectations
  3. Opens the code under test and reads the implementation
  4. Determines whether the test expectation is wrong or the implementation has a bug
  5. Applies the fix and re-runs the tests to confirm
This means you can hand off a red test suite and come back to a green one.

What you can ask

Run all tests

Run my tests. If anything fails, fix it.
The agent will execute the full test suite, identify any failures, trace each one to its root cause, and fix the code. It re-runs to confirm everything passes.

Run a specific test class or method

Run only the CheckoutTests and fix whatever is failing.
The agent filters the run to just that class, identifies failures, and fixes them. This is useful when you’re working on a specific feature and don’t want to wait for the full suite.
Run CheckoutTests.testApplyDiscountCode. If it fails, figure out
whether the bug is in the test or the implementation and fix it.
You can go as specific as a single test method. The agent will run it, read the assertion error, compare the test expectation against the implementation, and fix whichever side has the bug.
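To make “fix whichever side has the bug” concrete, here is the kind of pair the agent compares for a discount-code test. Both the function and the test are hypothetical sketches, not a real API; a common bug in code like this is subtracting the percentage as a flat amount instead of multiplying:

```swift
// Hypothetical implementation under test (illustrative, not a real API).
func applyDiscountCode(_ code: String, to total: Double) -> Double {
    guard code == "SAVE10" else { return total }
    // Correct behavior: take 10% off. A typical bug the agent catches here
    // is `total - 0.10` (flat subtraction) instead of a percentage.
    return total * (1.0 - 0.10)
}

// The test expectation the agent reads alongside the implementation.
let discounted = applyDiscountCode("SAVE10", to: 200.0)
assert(abs(discounted - 180.0) < 1e-9,
       "SAVE10 should take 10% off 200.0: expected 180.0, got \(discounted)")
assert(applyDiscountCode("UNKNOWN", to: 200.0) == 200.0,
       "unknown codes should leave the total unchanged")
```

When the assertion fails, the agent decides which side is wrong: if the product rule really is “10% off,” the implementation gets fixed; if the rule changed, the expectation does.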

Fix failing tests

Run tests. If anything fails, figure out why and fix it.
The agent runs the suite, identifies failures, and for each one, reads the test and the production code to understand what went wrong. It applies fixes (to whichever side has the bug) and re-runs to confirm everything passes.

Run tests after a code change

I just refactored the PaymentService. Run the tests for that module
and make sure nothing broke. Fix any regressions.
The agent will find the relevant test files, run them, and if your refactor broke something, trace exactly which behavior changed and fix the code so the tests pass again.

Audit test coverage for a feature area

Run all tests related to authentication. Identify any gaps in coverage
and write tests for any untested paths.
The agent will discover test files matching the area (by name or by searching for relevant imports), run them, and analyze what’s covered. If it finds gaps - like missing tests for the password reset flow or error handling - it writes the missing tests and runs them to confirm they pass.

Write tests for new code

I just added a CartManager class. Write unit tests for the addItem,
removeItem, and calculateTotal methods. Make sure they all pass.
The agent will read the CartManager implementation, write tests covering the happy path and edge cases (empty cart, duplicate items, negative quantities), add them to the test target, and run the suite to verify they all pass.

Fix a flaky test

CheckoutTests.testConcurrentAddToCart fails intermittently.
Run it 5 times, find the root cause, and fix it.
The agent will run the test multiple times, compare the outputs, and look for patterns: race conditions, timing dependencies, shared state leaking between runs. Once it identifies the root cause, it fixes the underlying issue and re-runs to confirm the test is stable.

Run tests on a different simulator

Run the UI tests on iPad Pro. Make sure all layout assertions pass
on the larger screen size.
The agent will resolve the iPad simulator, boot it if needed, and run the UI test suite. This catches layout issues that only appear on different screen sizes. If any tests fail, it traces and fixes the layout problems.

Tips for better results

Name your tests clearly. Tests like testCalculateTotal_withDiscountCode_appliesPercentageOff give the agent immediate context about the expected behavior without reading the implementation.
Tell the agent which module or area to focus on. “Run the payment tests” is faster and more focused than “run all tests” when you’re iterating on a specific feature.
Ask the agent to explain, not just fix. “Explain why this test fails” often gives you better insight than “fix the failing test”, especially when the root cause might be a design issue rather than a simple bug.