Forecasting the death of StackOverflow
You've probably seen the news: StackOverflow is almost dead.
If nothing extraordinary happens, the site's run is basically over and it may end up as one of the first high-profile casualties of the LLM era.
The only real question is when.
It looks like the all-time high was reached during the COVID lockdowns. Then, after the release of ChatGPT (November 30, 2022), the decline accelerated sharply.

Here is the same graph, but relative to the all-time high.

The statistics are grim. The last positive YoY change came during the COVID lockdown spike. By 2023, the drop stopped looking like a slow decline and started looking like a collapse.
| Year | Avg questions per month | YoY % change |
|---|---|---|
| 2015 | 258,810 | +2.1% |
| 2016 | 266,787 | +3.1% |
| 2017 | 263,134 | -1.4% |
| 2018 | 240,740 | -8.5% |
| 2019 | 226,444 | -5.9% |
| 2020 | 256,441 | +13.2% |
| 2021 | 219,768 | -14.3% |
| 2022 | 196,603 | -10.5% |
| 2023 | 123,436 | -37.2% |
| 2024 | 65,788 | -46.7% |
| 2025 | 18,269 | -72.2% |
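For reference, the annual figures above can be derived from a monthly series roughly like this. This is a sketch rather than the exact code behind the table: `monthly` is assumed to be a pandas Series of question counts per month with a DatetimeIndex.

```python
import pandas as pd

def annual_summary(monthly: pd.Series) -> pd.DataFrame:
    """Average questions per month for each year, plus the year-over-year % change."""
    avg_per_month = monthly.groupby(monthly.index.year).mean()
    yoy_pct = avg_per_month.pct_change() * 100
    return pd.DataFrame({"avg_per_month": avg_per_month.round(0),
                         "yoy_pct": yoy_pct.round(1)})

# annual_summary(monthly).loc[2015:2025]
```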
The chart speaks for itself. Let's try our first model: linear regression on yearly average questions per month.
Well, it's not very plausible: the model suggests that StackOverflow won't even survive 2026.
The problem is that linear regression assumes constant absolute change, i.e., you lose the same number of questions each year.
A more natural assumption is multiplicative decline: the site shrinks by a percentage, not by a fixed amount. So let's take the logarithm of the question counts and run the same linear regression in log-space.
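Concretely, a minimal sketch of the log-space fit on the yearly averages (illustrative, not the exact code behind the charts):

```python
import numpy as np

# Yearly averages from the table above (2015 through 2025).
years = np.arange(2015, 2026)
counts = np.array([258_810, 266_787, 263_134, 240_740, 226_444, 256_441,
                   219_768, 196_603, 123_436, 65_788, 18_269], dtype=float)

# "Linear regression in log-space": fit a straight line to log(counts)...
slope, intercept = np.polyfit(years, np.log(counts), deg=1)

# ...and exponentiate to get back to questions per month. A straight line in
# log-space means multiplying by a fixed factor every year.
yearly_factor = np.exp(slope)          # e.g. 0.75 would mean "-25% per year"
predictions = np.exp(intercept + slope * years)

print("implied year-over-year factor:", round(yearly_factor, 2))
```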
From now on we will train on 2022-2024 data and test on 2025. At the end, we will try to predict beyond 2025.
Annual avg questions/month
The log-linear model implies a constant relative rate of change (exponential decay). The log-quadratic model allows the relative rate to drift over time, i.e., an accelerating or slowing decline.
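Here is a quick sketch of both fits under the 2022-2024 / 2025 split (again illustrative; the results should land close to the table below):

```python
import numpy as np

# Annual averages for the 2022-2024 training window, time axis centred on 2022
# to keep the polynomial fits well conditioned.
t = np.array([0, 1, 2])
train_counts = np.array([196_603, 123_436, 65_788], dtype=float)

# Degree 1 in log-space: a constant relative rate of decline (exponential decay).
log_linear = np.polyfit(t, np.log(train_counts), deg=1)

# Degree 2 in log-space: the relative rate itself is allowed to drift.
log_quadratic = np.polyfit(t, np.log(train_counts), deg=2)

for name, coeffs in (("log-linear", log_linear), ("log-quadratic", log_quadratic)):
    pred_2025 = np.exp(np.polyval(coeffs, 2025 - 2022))
    print(f"{name}: predicted 2025 avg/month ~ {pred_2025:,.0f} (actual 18,269)")
```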

| Model | Train window | Predicted 2025 (avg/mo) | Actual 2025 (avg/mo) | Error | Error % |
|---|---|---|---|---|---|
| Log-linear regression | 2022-2024 | 39,109 | 18,269 | 20,840 | +114.1% |
| Log-quadratic regression | 2022-2024 | 29,764 | 18,269 | 11,495 | +62.9% |
We can also look at the year-over-year (YoY) percentage change.

There's a clear trend here too. Instead of modeling the absolute volume, we can try modeling the rate of decline itself and see what happens.
Annual YoY% (and implied avg)
| Model | Train window | Pred YoY% 2025 | Actual YoY% 2025 | Pred 2025 (avg/mo) | Actual 2025 (avg/mo) | Error | Error % |
|---|---|---|---|---|---|---|---|
| Linear regression on YoY% | 2022-2024 | -67.6% | -72.2% | 21,283 | 18,269 | 3,014 | +16.5% |
| Log-linear factor model | 2022-2024 | -60.1% | -72.2% | 26,219 | 18,269 | 7,950 | +43.5% |
| Log-quadratic factor model | 2022-2024 | -45.3% | -72.2% | 36,001 | 18,269 | 17,732 | +97.1% |
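For reference, the first row of the table above can be sketched like this; the factor-model rows appear to do the same thing to the logarithm of the year-over-year factor (this year's volume divided by last year's) instead of the raw percentage:

```python
import numpy as np

# YoY % changes for the 2022-2024 training window (from the first table).
t = np.array([0, 1, 2])                      # 2022, 2023, 2024
train_yoy = np.array([-10.5, -37.2, -46.7])
level_2024 = 65_788                          # avg questions/month in 2024

# Fit a straight line to the YoY percentages and extrapolate one year ahead.
slope, intercept = np.polyfit(t, train_yoy, deg=1)
pred_yoy_2025 = intercept + slope * 3

# Turn the predicted rate of decline back into a question volume.
pred_2025 = level_2024 * (1 + pred_yoy_2025 / 100)
print(f"predicted YoY 2025: {pred_yoy_2025:.1f}%  ->  {pred_2025:,.0f} avg/month")
```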
Monthly questions
We also have monthly question counts. The hope is that finer-grained data gives the model more signal (and more training points), so forecasts might improve.
One natural tool for this is ARIMA, a classic time-series model that predicts the next value from past values (and, depending on the setup, past forecast errors). ARIMA stands for AutoRegressive Integrated Moving Average and is written as ARIMA(p, d, q):
- p: how many past values ("lags") the model uses (autoregressive part)
- d: how many times we difference the series to remove trend (integrated part)
- q: how many past forecast errors it uses (moving average part)
ARIMA hyperparameter search: p <= 5, q <= 5, selected by validation MAE (val_mae) on the last 6 months (val_months = 6).
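Roughly, the search looks like this (a sketch, not the exact code: it assumes d fixed at 1, a fit on the raw monthly counts, the last six months of the training window held out for validation, and the assumed `monthly` Series from earlier):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def search_arima(train: pd.Series, val_months: int = 6, max_p: int = 5, max_q: int = 5):
    """Pick (p, 1, q) by mean absolute error on the last `val_months` of `train`."""
    fit_part, val_part = train.iloc[:-val_months], train.iloc[-val_months:]
    best_order, best_mae = None, np.inf
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                res = ARIMA(fit_part, order=(p, 1, q)).fit()
            except Exception:
                continue  # some (p, q) combinations fail to converge; skip them
            mae = np.mean(np.abs(res.forecast(steps=val_months).values - val_part.values))
            if mae < best_mae:
                best_order, best_mae = (p, 1, q), mae
    return best_order, best_mae

# Train on 2022-01..2024-12, refit the winning order on the whole window,
# and forecast the twelve months of 2025:
# train = monthly["2022-01":"2024-12"]
# order, _ = search_arima(train)
# pred_2025 = ARIMA(train, order=order).fit().forecast(steps=12)
```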
| Model | Train window | Pred avg/mo (2025) | Actual avg/mo (2025) | Error | Error % | Pred total 2025 | Actual total 2025 |
|---|---|---|---|---|---|---|---|
| Log-linear regression | 2022-01..2024-12 | 35,869 | 18,269 | 17,600 | +96.3% | 430,434 | 219,229 |
| Log-quadratic regression | 2022-01..2024-12 | 20,951 | 18,269 | 2,682 | +14.7% | 251,414 | 219,229 |
| ARIMA(5,1,0) | 2022-01..2024-12 | 16,237 | 18,269 | -2,032 | -11.1% | 194,842 | 219,229 |
Now for the "predicting the future" part.
Annual avg questions/month (log-quadratic)
| Model | Train window | 2026 (avg/mo) | 2027 (avg/mo) | 2028 (avg/mo) | 2029 (avg/mo) | 2030 (avg/mo) |
|---|---|---|---|---|---|---|
| Log-quadratic regression | 2022-2025 | 3,812 | 516 | 46 | 3 | 0 |
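This is the same log-quadratic fit as before, just retrained on 2022-2025 and pushed further out. A sketch on the annual averages (time axis centred on 2022):

```python
import numpy as np

# Annual averages for 2022-2025 (from the first table).
t = np.arange(4)
counts = np.array([196_603, 123_436, 65_788, 18_269], dtype=float)

# Refit the log-quadratic model on all four years and extrapolate to 2026-2030.
coeffs = np.polyfit(t, np.log(counts), deg=2)
for year in range(2026, 2031):
    pred = np.exp(np.polyval(coeffs, year - 2022))
    print(f"{year}: ~{pred:,.0f} questions/month")
```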
Annual YoY% (and implied avg)
| Model | Train window | Metric | 2026 | 2027 | 2028 | 2029 | 2030 |
|---|---|---|---|---|---|---|---|
| Linear regression on YoY% | 2022-2025 | YoY% | -90.3% | -109.8% | -129.2% | -148.7% | -168.1% |
| Linear regression on YoY% | 2022-2025 | Implied avg/mo | 1,770 | 0 | 0 | 0 | 0 |
| Log-linear factor model | 2022-2025 | YoY% | -78.6% | -85.2% | -89.7% | -92.9% | -95.1% |
| Log-linear factor model | 2022-2025 | Implied avg/mo | 3,916 | 581 | 60 | 4 | 0 |
Monthly questions (log-quadratic + ARIMA)
ARIMA hyperparameter search: p <= 5, q <= 5, selected by validation MAE (val_mae) on the last 6 months (val_months = 6).
| Model (monthly) | Train window | 2026 (avg/mo) | 2027 (avg/mo) | 2028 (avg/mo) | 2029 (avg/mo) | 2030 (avg/mo) |
|---|---|---|---|---|---|---|
| Log-quadratic regression | 2022-01..2025-12 | 3,249 | 382 | 29 | 1 | 0 |
| ARIMA(2,1,5) trend=t | 2022-01..2025-12 | 3,132 | 387 | 33 | 2 | 0 |
| ARIMA(5,1,0) (trained on 2022-2024) | 2022-01..2024-12 | 3,676 | 1,168 | 562 | 350 | 258 |
The models predict that by 2028 StackOverflow will be practically dead.
No Stack Overflow answers were used in the making of this article :(
(ChatGPT replaced them)