QPO Phase 2: from diagonal to off-diagonal QUBO (n=130)
Phase 1 ended with a hypothesis: the pre-filter was a greedy gate in front of the quantum stage. The pipeline pre-scored all candidates, took the top-20 by individual score, and handed those to QAOA. If a feature only scores well in combination with another, the pre-scorer discards it before QAOA can evaluate the combination. The question for Phase 2 was whether removing that ceiling would change the win rate on open-ended tasks.
Phase 2 has two parts. The first removes the pre-filter and tests the hypothesis directly. The second changes the cost function itself. They ran sequentially, each informing the next.
Part A: Removing the Pre-Filter (n=30)
Configuration: 24 qubits, circuit_depth=4, 3 reps per goal, 10 goals. The pre-filter cap was raised from 20 to 24 — QAOA now sees a larger candidate set before the greedy ranker prunes it.
| Goal | QAOA wins | Classical wins | Ties | Mean Δ |
|---|---|---|---|---|
| Git commit message | 0 | 1 | 2 | −0.010 |
| QUBO/QAOA explanation | 1 | 0 | 2 | +0.007 |
| Incident report summary | 0 | 0 | 3 | 0.000 |
| Python docstring | 0 | 0 | 3 | 0.000 |
| Cold outreach email | 0 | 0 | 3 | 0.000 |
| Paragraph compression | 0 | 1 | 2 | −0.017 |
| YAML CI config | 1 | 0 | 2 | +0.033 |
| Code diff security review | 0 | 0 | 3 | 0.000 |
| Research abstract | 0 | 0 | 3 | 0.000 |
| RFC tone rewrite | 1 | 0 | 2 | +0.017 |
| Total | 3 | 2 | 25 | +0.003 |
3 QAOA wins. 2 classical wins. 25 ties. Win rate dropped from 16% (Phase 1) to 10%. Ties climbed from 74% to 83%.
A diagonal cost function means features score independently — no cross-terms. Raising the filter cap gives QAOA access to more candidates evaluated by the same independent scoring function. There’s no new combinatorial structure to find. The greedy ranker and QAOA converge on the same shortlist because the cost landscape doesn’t reward anything else. Any wins are stochastic divergence, not structural advantage.
The pre-filter was not the ceiling. Diagonal QUBO is the ceiling.
The Phase 2 failure condition was explicit: if the git commit win rate stays flat, the pre-filter wasn’t the bottleneck. Git commit went 0 wins, 1 classical win, 2 ties. That’s flat.
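The claim that diagonal QUBO is the ceiling can be checked on a toy selection problem. The sketch below is not the QPO pipeline. It assumes a fixed subset size `k`, uses maximisation rather than the usual QUBO minimisation for readability, and every name and number in it (`subset_value`, `greedy_top_k`, `exhaustive_best`, the scores, the pair bonus) is hypothetical.

```python
from itertools import combinations

def subset_value(subset, diag, pair_bonus=None):
    """Value of a subset: individual scores plus any pairwise bonuses."""
    pair_bonus = pair_bonus or {}
    total = sum(diag[i] for i in subset)
    for i, j in combinations(sorted(subset), 2):
        total += pair_bonus.get((i, j), 0.0)
    return total

def greedy_top_k(diag, k):
    """Rank candidates by individual score and keep the best k."""
    ranked = sorted(range(len(diag)), key=lambda i: diag[i], reverse=True)
    return frozenset(ranked[:k])

def exhaustive_best(diag, k, pair_bonus=None):
    """Check every k-subset and return the one with the highest value."""
    return max(
        (frozenset(s) for s in combinations(range(len(diag)), k)),
        key=lambda s: subset_value(s, diag, pair_bonus),
    )

diag = [0.9, 0.8, 0.5, 0.4]   # hypothetical individual feature scores
k = 2                          # hypothetical subset size

greedy = greedy_top_k(diag, k)
diag_only = exhaustive_best(diag, k)                    # no cross-terms
with_cross = exhaustive_best(diag, k, {(2, 3): 0.9})    # one pair bonus

print("greedy:           ", sorted(greedy))
print("best, diagonal:   ", sorted(diag_only))
print("best, cross-terms:", sorted(with_cross))
```

With no cross-terms, the exhaustive search lands on the same subset as the greedy ranker, so any other optimiser has nothing extra to find. Once a single pair bonus is added, the best subset is one the greedy ranker never considers.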
Part B: Off-Diagonal QUBO (n=100)
Diagonal QUBO is the problem class where classical greedy search is provably optimal. Part B changes the cost function to encode cross-feature correlation terms — the problem class QAOA was built for.
For each pair of candidates (i, j) in the pre-filtered set, Q_ij is computed as the mean score premium of historical runs whose winning feature vector overlaps with the union of i and j’s active features, weighted by Jaccard similarity. Positive Q_ij: the combination resembles historically high-scoring outputs. Negative Q_ij: the combination resembles historically poor ones. Off-diagonal terms are blended at 0.3× the diagonal weight.
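One way to read that description is sketched below. It is an interpretation, not the repository's code: it assumes each candidate and each historical winner is represented as a set of active feature names, that "weighted by Jaccard similarity" means a Jaccard-weighted mean of score premiums, and that the 0.3× blend is a plain multiplier on the off-diagonal entries. The names `jaccard`, `cross_term`, and `build_qubo`, and the shape of `history`, are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cross_term(features_i, features_j, history):
    """Jaccard-weighted mean score premium for the pair (i, j).

    history: list of (winning_feature_set, score_premium) tuples.
    Runs with no overlap with the pair's combined features are ignored.
    """
    combined = features_i | features_j
    weighted_sum = 0.0
    weight_total = 0.0
    for winning_features, premium in history:
        w = jaccard(combined, winning_features)
        if w > 0.0:
            weighted_sum += w * premium
            weight_total += w
    return weighted_sum / weight_total if weight_total else 0.0

def build_qubo(diag_scores, features, history, blend=0.3):
    """Upper-triangular matrix: diagonal scores plus blended cross-terms."""
    n = len(diag_scores)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = diag_scores[i]
        for j in range(i + 1, n):
            Q[i][j] = blend * cross_term(features[i], features[j], history)
    return Q

# Hypothetical data: three candidates and two historical runs.
features = [{"a"}, {"b"}, {"c"}]
history = [({"a", "b"}, 0.2), ({"c"}, -0.1)]
Q = build_qubo([0.5, 0.4, 0.3], features, history)
```

In this toy data, candidates 0 and 1 together reproduce the feature set of a historically strong run, so `Q[0][1]` comes out positive. The sign convention follows the text (positive means the combination resembles high-scoring outputs); a minimising solver would need the matrix negated.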
Everything else matched Part A apart from the repetition count: 24 qubits, circuit_depth=4, the same 10 goals, and 10 reps per goal instead of 3. Prompt depth was uniform across all goals (single sentence to short paragraph).
| Goal | QAOA wins | Classical wins | Ties | Mean Δ |
|---|---|---|---|---|
| Cold outreach email | 3 | 1 | 6 | +0.004 |
| RFC tone rewrite | 3 | 2 | 5 | +0.003 |
| Research abstract | 3 | 2 | 5 | +0.002 |
| QUBO/QAOA explanation | 2 | 1 | 7 | +0.010 |
| Python docstring | 2 | 1 | 7 | +0.005 |
| Incident report summary | 2 | 1 | 7 | +0.007 |
| Paragraph compression | 2 | 1 | 7 | +0.003 |
| Git commit message | 2 | 1 | 7 | +0.003 |
| Code diff security review | 1 | 0 | 9 | +0.005 |
| YAML CI config | 1 | 3 | 6 | −0.013 |
| Total | 21 | 13 | 66 | +0.003 |
21 QAOA wins. 13 classical wins. 66 ties. Zero circuit fallbacks.
The Signal
Win rates across all phases:
| Phase | n | QAOA win% | Classical win% | Tie% |
|---|---|---|---|---|
| Phase 1 — diagonal, 20q | 50 | 16% | 10% | 74% |
| Phase 2a — diagonal, 24q | 30 | 10% | 7% | 83% |
| Phase 2b — off-diagonal, 24q | 100 | 21% | 13% | 66% |
The off-diagonal terms work. The QAOA win rate jumped from 10% in Phase 2a to 21%, higher than Phase 1’s 16%, while the classical win rate rose roughly in proportion, from 7% to 13%. The tie rate dropped from 83% to 66%. Fewer ties means the two selectors diverge more often, so the cost structure is creating genuine differentiation. The cross-terms are changing which feature sets get picked.
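The percentages in the table can be reproduced from the raw counts with a few lines of arithmetic. This is only a checking aid: the Phase 2 counts come from the totals rows above, while the Phase 1 counts (8, 5, 37) are inferred from the reported 16% / 10% / 74% of n=50 rather than taken from a results table. The `summarise` helper and the "share of decisive runs" figure are additions for illustration.

```python
# (QAOA wins, classical wins, ties) per phase.
# Phase 1 counts are inferred from the reported percentages of n=50.
PHASES = {
    "Phase 1 (diagonal, 20q)": (8, 5, 37),
    "Phase 2a (diagonal, 24q)": (3, 2, 25),
    "Phase 2b (off-diagonal, 24q)": (21, 13, 66),
}

def summarise(qaoa, classical, ties):
    """Return rounded percentages and QAOA's share of non-tied runs."""
    n = qaoa + classical + ties
    decisive = qaoa + classical
    return {
        "n": n,
        "qaoa_pct": round(100 * qaoa / n),
        "classical_pct": round(100 * classical / n),
        "tie_pct": round(100 * ties / n),
        "qaoa_share_of_decisive": qaoa / decisive,
    }

for name, counts in PHASES.items():
    print(name, summarise(*counts))
```

QAOA's share of the non-tied runs is 8/13, 3/5, and 21/34 across the three phases, which is the arithmetic behind classical wins rising in proportion: on these counts, the main shift in Phase 2b is fewer ties.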
9 out of 10 task types show a positive mean Δ for QAOA. The advantage isn’t concentrated in one task class the way it was in Phase 1, where git commit dominated the wins. It’s distributed across compositional, analytical, generative, and rewrite tasks. That’s a different character: Phase 1’s stochastic divergence looked like noise in one narrow regime, while Phase 2b looks more like a consistent, if modest, structural effect.
The exception is YAML CI config: 1 win, 3 losses, mean Δ −0.013. Under the diagonal cost function in Part A, YAML had the highest mean Δ of the batch (+0.033). The off-diagonal terms in Part B turn it negative. YAML is a highly constrained structured-output task with a narrow answer space, and arguably one that shouldn’t route through an LLM at all: a deterministic template would be both faster and more reliable. The co-occurrence cross-terms introduce noise into a landscape that was already well-characterised by individual feature scores. The pattern holds across both phases: QAOA adds value where the answer space is open and feature combinations are non-obvious. It loses ground where the answer space is narrow and greedy already finds the ceiling.
What Comes Next
The natural question is whether this signal survives on real hardware. Both phases ran on a CUDA-accelerated statevector simulator — a classical computation that exactly represents the quantum state. Physical QPU hardware introduces noise, gate errors, and decoherence.
Two outcomes are worth knowing. If the 21% win rate holds on hardware, QAOA offers a genuine advantage for prompt feature selection on tasks with open answer spaces. If noise washes it out, that’s also a finding: the signal is present in simulation but too weak to survive the transition to hardware at current qubit fidelity, and the right question becomes what noise threshold is required for it to emerge.
Either answer is worth publishing. The hardware experiment has not been run for prompt feature selection, and this problem class is novel enough that there’s no prior result to compare against.
Code: github.com/waratahlabs/qpo.