Best SWE-bench Verified Score
Highest share of real GitHub issues resolved autonomously, a core software-engineering capability
Key Insight
The best frontier model now resolves 93.9% of real GitHub issues on SWE-bench Verified, up from 1.96% in October 2023 (a figure that predates the Verified subset, released in August 2024), 49% in October 2024, and 80.9% as recently as March 2026. The benchmark has effectively saturated, and software-engineering capability is no longer the limiting factor in AI agent deployment.
93.9%
Trend YoY change is +17.2 percentage points, slowing by 2,354 bps per year over the last two years. The latest reading, +21.2 points, is 404 bps above trend (a 0.69σ deviation). If the level holds at 93.9%, YoY change would fall to +14.7 points by Oct '26 as the year-ago base rises.
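The arithmetic behind the trend summary can be sketched as follows. This is a minimal illustration, assuming the YoY figures are percentage-point differences (1 point = 100 bps) and that the Oct '26 comparison uses the Nov '25 reading of 79.2 as its year-ago base. Both are readings of the published numbers, not documented behaviour of the page, and the constant names are made up for the example.

```python
# Figures quoted in the trend summary; YoY values are treated as
# percentage-point changes (an interpretation, not a documented fact).
LATEST_YOY_PP = 21.2   # latest YoY change
TREND_YOY_PP = 17.2    # trend-implied YoY change
CURRENT_LEVEL = 93.9   # latest score, in percent
BASE_LEVEL = 79.2      # assumed year-ago base for Oct '26 (the Nov '25 reading)


def gap_in_bps(actual_pp: float, trend_pp: float) -> float:
    """Gap between two percentage-point figures, in basis points."""
    return (actual_pp - trend_pp) * 100


gap = gap_in_bps(LATEST_YOY_PP, TREND_YOY_PP)

# If the score stays flat, next year's YoY change is just level minus base.
flat_yoy = CURRENT_LEVEL - BASE_LEVEL

print(f"gap vs trend: {gap:.0f} bps")
print(f"YoY if level holds: {flat_yoy:+.1f} pp")
```

The rounded inputs give a gap of about 400 bps; the page reports 404 bps, presumably because it works from unrounded values.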
Chart panels: Level; YoY Change (trend line: y = 54.0% − 2354 bps/yr · t); Deviation from trend
Forecast
Projected value by forecast vintage (%)
Projected value (%)
| Forecast made in | Jan '25 | Feb '25 | Mar '25 | Apr '25 | May '25 | Jun '25 | Jul '25 | Aug '25 | Sep '25 | Oct '25 | Nov '25 | Dec '25 | Jan '26 | Feb '26 | Mar '26 | Apr '26 | May '26 | Jun '26 | Jul '26 | Aug '26 | Sep '26 | Oct '26 | Nov '26 | Dec '26 | Jan '27 | Feb '27 | Mar '27 | Apr '27 | May '27 | MAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan '25 | 71.7 | | | 70.9 | | 88.8 | | | 103.5 | 108.4 | 115.5 | | 134.4 | 134.2 | | 136.8 | 139.8 | 143.4 | | | 148.7 | 181.0 | 152.9 | | 210.4 | 211.2 | 159.0 | 173.1 | 220.1 | 147% |
| Feb '25 | | 70.3 | | 71.3 | | 89.4 | | | 104.3 | 109.3 | 116.5 | | 135.6 | 135.4 | | 138.7 | 141.3 | 145.0 | | | 150.6 | 183.8 | 154.9 | | 213.7 | 214.7 | 161.3 | 175.5 | 224.0 | 176% |
| May '25 | | | | | 72.7 | 79.6 | | | 90.0 | 93.5 | 99.2 | | 115.2 | 113.5 | | | 115.0 | 117.2 | | | 118.2 | 134.2 | 119.5 | | 154.9 | 152.9 | 120.0 | 132.6 | 153.4 | 108% |
| Jun '25 | | | | | | 75.2 | | | 87.2 | 90.4 | 95.8 | | 111.2 | 109.2 | | | 109.7 | 111.6 | | | 111.7 | 124.2 | 112.4 | | 143.1 | 140.5 | 111.6 | 124.0 | 139.2 | 110% |
| Sep '25 | | | | | | | | | 77.2 | 84.8 | 89.6 | | 103.8 | 101.2 | | | 100.0 | 101.3 | | | 99.6 | 106.0 | 99.2 | | 121.3 | 117.5 | 96.1 | 107.8 | 112.8 | 86% |
| Nov '25 | | | | | | | | | | | 79.2 | | 93.1 | 89.1 | | | 84.2 | 84.2 | | | 78.6 | | 75.6 | | 84.5 | 78.0 | 67.5 | 77.9 | 65.8 | 45% |
| Mar '26 | | | | | | | | | | | | | | | 80.9 | | 78.5 | 77.9 | | | 70.7 | | 66.6 | | | | 56.3 | 66.2 | 47.7 | 58% |
| Apr '26 | | | | | | | | | | | | | | | | 93.9 | 85.4 | 85.5 | | | 80.5 | | 77.8 | | | | 70.4 | 81.0 | 70.2 | |
| May '26 | | | | | | | | | | | | | | | | | 85.4 | 85.5 | | | 80.5 | | 77.8 | | | | 70.4 | 81.0 | 70.2 | |
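As a worked check on how the projected values are built, the sketch below rebuilds the Apr '26 row by adding that vintage's forecast YoY change to the level twelve months earlier, falling back to an earlier projection when no actual exists. The additive rule and the choice of months are inferred from the published figures, and the function and variable names are hypothetical.

```python
# Actual readings (percent), taken from the first value of each vintage row.
actual = {
    "May '25": 72.7, "Jun '25": 75.2, "Sep '25": 77.2,
    "Nov '25": 79.2, "Mar '26": 80.9, "Apr '26": 93.9,
}

# Apr '26 vintage: forecast YoY change (percentage points) by target month.
yoy_forecast = {
    "May '26": 12.7, "Jun '26": 10.3, "Sep '26": 3.3, "Nov '26": -1.3,
    "Mar '27": -10.5, "Apr '27": -12.9, "May '27": -15.2,
}


def year_before(month: str) -> str:
    """Map a label such as "May '26" to "May '25"."""
    name, year = month.split(" '")
    return f"{name} '{int(year) - 1:02d}"


def project(actual: dict, yoy: dict) -> dict:
    """Projected level = year-ago level + forecast YoY change (assumed rule)."""
    projected = {}
    for month, change in yoy.items():  # insertion order is chronological
        base_month = year_before(month)
        # Prefer a realized level; otherwise chain off an earlier projection.
        base = actual.get(base_month, projected.get(base_month))
        if base is None:
            continue  # no year-ago level available, so the cell stays blank
        projected[month] = round(base + change, 1)
    return projected


projection = project(actual, yoy_forecast)
for month, value in projection.items():
    print(month, value)
```

Under these assumptions the sketch returns 85.4 for May '26 and 70.2 for May '27, the latter chained off the May '26 projection. Nov '26 comes out at 77.9 against the table's 77.8, presumably because the page works from unrounded inputs.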
YoY change forecast
| Forecast made in | Jan '25 | Feb '25 | Mar '25 | Apr '25 | May '25 | Jun '25 | Jul '25 | Aug '25 | Sep '25 | Oct '25 | Nov '25 | Dec '25 | Jan '26 | Feb '26 | Mar '26 | Apr '26 | May '26 | Jun '26 | Jul '26 | Aug '26 | Sep '26 | Oct '26 | Nov '26 | Dec '26 | Jan '27 | Feb '27 | Mar '27 | Apr '27 | May '27 | MAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan '25 | +53.6% | +50.6% | +51.6% | +52.8% | +53.8% | +55.0% | +56.1% | +57.2% | +58.3% | +59.4% | +60.5% | +61.6% | +62.7% | +63.8% | +64.9% | +66.0% | +67.1% | +68.2% | +69.3% | +70.4% | +71.5% | +72.6% | +73.7% | +74.8% | +75.9% | +77.1% | +78.1% | +79.2% | +80.3% | 147% |
| Feb '25 | | +52.2% | +52.0% | +53.2% | +54.4% | +55.6% | +56.7% | +57.9% | +59.1% | +60.3% | +61.5% | +62.7% | +63.9% | +65.1% | +66.2% | +67.4% | +68.6% | +69.8% | +70.9% | +72.1% | +73.4% | +74.5% | +75.7% | +76.9% | +78.1% | +79.3% | +80.4% | +81.6% | +82.8% | 176% |
| May '25 | | | | | +38.9% | +45.8% | +45.5% | +45.2% | +44.8% | +44.5% | +44.2% | +43.9% | +43.5% | +43.2% | +42.9% | +42.6% | +42.3% | +41.9% | +41.6% | +41.3% | +41.0% | +40.7% | +40.3% | +40.0% | +39.7% | +39.4% | +39.1% | +38.7% | +38.4% | 108% |
| Jun '25 | | | | | | +41.4% | +43.3% | +42.6% | +42.0% | +41.4% | +40.7% | +40.1% | +39.5% | +38.8% | +38.3% | +37.6% | +37.0% | +36.4% | +35.8% | +35.1% | +34.5% | +33.9% | +33.2% | +32.6% | +32.0% | +31.3% | +30.7% | +30.1% | +29.5% | 110% |
| Sep '25 | | | | | | | | | +32.0% | +35.8% | +34.6% | +33.4% | +32.1% | +30.9% | +29.8% | +28.5% | +27.3% | +26.1% | +24.9% | +23.6% | +22.4% | +21.2% | +20.0% | +18.8% | +17.5% | +16.3% | +15.2% | +13.9% | +12.7% | 86% |
| Nov '25 | | | | | | | | | | | +24.2% | +23.9% | +21.4% | +18.8% | +16.5% | +14.0% | +11.5% | +9.0% | +6.5% | +4.0% | +1.4% | -1.1% | -3.6% | -6.1% | -8.6% | -11.1% | -13.4% | -16.0% | -18.5% | 45% |
| Mar '26 | | | | | | | | | | | | | | | +10.6% | +8.8% | +5.8% | +2.7% | -0.3% | -3.4% | -6.5% | -9.5% | -12.6% | -15.6% | -18.7% | -21.8% | -24.6% | -27.7% | -30.7% | 58% |
| Apr '26 | | | | | | | | | | | | | | | | +21.2% | +12.7% | +10.3% | +8.0% | +5.7% | +3.3% | +1.0% | -1.3% | -3.6% | -6.0% | -8.4% | -10.5% | -12.9% | -15.2% | |
| May '26 | | | | | | | | | | | | | | | | | +12.7% | +10.3% | +8.0% | +5.7% | +3.3% | +1.0% | -1.3% | -3.6% | -6.0% | -8.4% | -10.5% | -12.9% | -15.2% | |
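The MAPE column appears to score each vintage's YoY forecasts against the YoY changes that were later realized. The sketch below is one way to compute it, assuming the realized values are the first entry of each later vintage row and that the vintage's own month is excluded. Those choices are inferred from the numbers rather than documented, and `mape` is an illustrative helper.

```python
def mape(forecasts, actuals):
    """Mean absolute percentage error, in percent."""
    errors = [abs(f - a) / abs(a) for f, a in zip(forecasts, actuals)]
    return 100 * sum(errors) / len(errors)


# Realized YoY changes (percentage points) from later vintage rows.
realized = {"Nov '25": 24.2, "Mar '26": 10.6, "Apr '26": 21.2}

# Nov '25 vintage: forecasts for the two later months with a realized value.
nov25 = {"Mar '26": 16.5, "Apr '26": 14.0}

# Sep '25 vintage: forecasts for the three later months with a realized value.
sep25 = {"Nov '25": 34.6, "Mar '26": 29.8, "Apr '26": 28.5}

nov25_mape = mape(list(nov25.values()), [realized[m] for m in nov25])
sep25_mape = mape(list(sep25.values()), [realized[m] for m in sep25])

print(f"Nov '25 vintage MAPE: {nov25_mape:.0f}%")
print(f"Sep '25 vintage MAPE: {sep25_mape:.0f}%")
```

With these inputs the two vintages come out at roughly 45% and 86%, in line with the MAPE column. The Apr '26 and May '26 rows have no MAPE because no later realized value exists yet.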
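Forecasts fit an ordinary least-squares linear trend to the YoY change series over a rolling one-year window and extrapolate it forward. Each row is a vintage: the forecast as it would have appeared at that point in time. YoY change is the difference, in percentage points, from the level twelve months earlier, so each projected value is the prior year's level plus the forecast YoY change, chaining off earlier projections where actuals are unavailable. MAPE (mean absolute percentage error) scores each vintage's YoY forecasts against realized values. These are mechanical trend extrapolations, not economic models; because the extrapolation is unbounded, early vintages project values above 100%, which the benchmark cannot reach.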
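The trend-fitting step can be sketched as a plain least-squares line through the YoY readings inside the window. The example uses the six realized YoY changes from May '25 to Apr '26 with a hypothetical month index (months since Jan '25). The page's own series and window handling are not published, so this is an approximation of the approach rather than a reproduction of its forecasts.

```python
def fit_line(ts, ys):
    """Ordinary least squares for y = intercept + slope * t."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


# Month index (assumed: months since Jan '25) and realized YoY change (pp)
# for May '25, Jun '25, Sep '25, Nov '25, Mar '26 and Apr '26.
months = [4, 5, 8, 10, 14, 15]
yoy_pp = [38.9, 41.4, 32.0, 24.2, 10.6, 21.2]

intercept, slope = fit_line(months, yoy_pp)

# Extrapolate one month past the window (index 16 = May '26).
next_month_yoy = intercept + slope * 16

print(f"slope: {slope:.2f} pp per month")
print(f"May '26 YoY forecast: {next_month_yoy:+.1f} pp")
```

With these six readings the fitted slope is about −2.4 points per month, close to the roughly −2.3 point month-to-month step in the Apr '26 vintage row. The extrapolated May '26 value lands near, but not exactly on, the page's +12.7, which is expected if the page fits a fuller monthly series.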