TL-003 · Vendor decision aid

AI Leverage vs. AI Theater

Twelve weighted questions that separate AI testing tools that actually compound from ones that look modern but add to your babysitting load. Use it before your next vendor meeting.

  1. Does the tool see your application source code, not just your tests?

     Weight: 3 / 3

  2. Can it execute shell commands, run tests, and read the output directly?

     Weight: 3 / 3

  3. Does it support a pipeline you control, or is it a single closed agent?

     Weight: 3 / 3

  4. Can it run multiple sessions in parallel for batch generation?

     Weight: 2 / 3

  5. Does each run produce auditable artifacts (manifests, traces, decisions)?

     Weight: 2 / 3

  6. Can it work across multiple repos in one session?

     Weight: 2 / 3

  7. Does it integrate with your existing CI, rather than requiring a separate dashboard?

     Weight: 2 / 3

  8. Can it modify application code (add test IDs, fix testability) when needed?

     Weight: 2 / 3

  9. Does it run on real environments, not just demo apps?

     Weight: 2 / 3

  10. Can it read full app + test + network logs, not just stdout?

      Weight: 2 / 3

  11. Does it run without a human approval click on every routine operation?

      Weight: 1 / 3

  12. Can it learn your codebase's conventions, not just generic patterns?

      Weight: 1 / 3

Why these 12 questions

The pattern is access + auditability + control.

AI testing tools fail to compound for three reasons: they can't see your code, they can't run inside your pipeline, and they produce no auditable trail. The questions weight access, execution, and integration heavily because those are what determine whether the tool's output is durable or disposable.
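One way to turn the checklist into a number before a vendor meeting: answer each question yes/no and sum the weights of the yeses. The weights below are the ones from the checklist above; the question groupings in the comments and the example answers are illustrative, and the decision aid itself does not prescribe a pass/fail threshold.

```python
# Weights per question, as listed in the checklist above.
WEIGHTS = {
    1: 3, 2: 3, 3: 3,        # source access, execution, pipeline control
    4: 2, 5: 2, 6: 2,        # parallel sessions, auditable artifacts, multi-repo
    7: 2, 8: 2, 9: 2,        # CI integration, code modification, real environments
    10: 2,                   # full log access
    11: 1, 12: 1,            # routine autonomy, convention learning
}

def score(answers: dict[int, bool]) -> tuple[int, int]:
    """Return (earned, maximum) weighted points for yes/no answers."""
    earned = sum(w for q, w in WEIGHTS.items() if answers.get(q))
    return earned, sum(WEIGHTS.values())

# Hypothetical tool: passes every question except the two 1-point ones.
answers = {q: q <= 10 for q in WEIGHTS}
earned, maximum = score(answers)
print(f"{earned}/{maximum}")  # 23/25
```

Because the three 3-point questions cover access and execution, a tool can pass all nine lower-weight questions and still score only 16/25, which is the point of the weighting.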

Template

90-Day QA Leverage Plan

Coming soon