I have a set of prompts that are essentially “audit the current code changes for logic errors” (plus linting and testing, including double-checking test conditions), and I run them with GPT-5.x-Codex on Claude-generated code.
It’s surprising how often even Opus 4.5 still trips itself up on things like off-by-one errors or boundary logic, so another model (preferably in a fresh session) can be a very effective peer reviewer.
So my checks typically run lint -> test -> other model -> me, and relatively little simple code makes it all the way to me. Contrived logic or maths, though, still has to be all me.
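A minimal sketch of that pipeline, assuming a Python project linted with ruff and tested with pytest; the reviewer step and its `codex exec` invocation are placeholders for whatever CLI or API the second model exposes in your setup:

```python
import subprocess
import sys

AUDIT_PROMPT = (
    "Audit the following diff for logic errors, off-by-one mistakes, "
    "and boundary conditions. Double-check that the test conditions "
    "actually assert what they claim to."
)

def run(cmd: list[str]) -> None:
    """Run a gate and stop the pipeline if it fails."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        sys.exit(result.returncode)

def review_with_model(diff: str) -> None:
    """Hypothetical reviewer step: hand the diff plus the audit prompt to a
    second model in a fresh session. Swap the command for your own tooling."""
    subprocess.run(
        ["codex", "exec", f"{AUDIT_PROMPT}\n\n{diff}"],  # assumed invocation
        check=True,
    )

if __name__ == "__main__":
    run(["ruff", "check", "."])   # lint
    run(["pytest", "-q"])         # test
    diff = subprocess.run(
        ["git", "diff", "HEAD"], capture_output=True, text=True, check=True
    ).stdout
    review_with_model(diff)       # other model as peer reviewer
    # whatever survives all three gates gets the human read
```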