📊 The data is ready — now it’s time to put it to work.
In the last post, I walked through how I scraped and cleaned 15 years of NFL stats into a structured dataset. That gave me a foundation to build on.
This time, I’m taking the first step into modeling: training machine learning algorithms to predict fantasy football points.
🧩 Feature Engineering: Turning Stats into Signals
Raw stats are helpful, but machine learning models perform best when we feed them richer features. From the cleaned dataset, I engineered new variables designed to capture both volume and efficiency:
- Passing Yards per Attempt – efficiency for quarterbacks
- Rushing Yards per Attempt – efficiency for running backs
- Receiving Yards per Reception – efficiency for wide receivers and tight ends
- Total Touches – combined rushing attempts + receptions (volume)
- Touchdowns per Touch – scoring efficiency
- Fumbles per Touch – risk adjustment
- Changed Teams – indicator for players who switched teams between seasons
- Position Indicators – one-hot encoding for QB, RB, WR, TE
These, combined with standard box score stats (yards, touchdowns, games played, etc.), gave the model a wide lens on each player’s performance profile.
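The post doesn't show the transformation code, but the features above are straightforward to derive with pandas. A minimal sketch (column names and the two-row sample are hypothetical, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical column names and toy rows; the real dataset may differ.
df = pd.DataFrame({
    "position": ["RB", "WR"],
    "team": ["PHI", "CIN"],
    "prev_team": ["NYG", "CIN"],
    "pass_yds": [0, 0], "pass_att": [0, 0],
    "rush_yds": [1200, 50], "rush_att": [280, 10],
    "rec_yds": [300, 1400], "receptions": [40, 100],
    "td": [12, 13], "fumbles": [2, 1],
})

# Efficiency ratios, guarding against division by zero.
df["pass_ypa"] = np.where(df["pass_att"] > 0, df["pass_yds"] / df["pass_att"], 0.0)
df["rush_ypa"] = np.where(df["rush_att"] > 0, df["rush_yds"] / df["rush_att"], 0.0)
df["rec_ypr"] = np.where(df["receptions"] > 0, df["rec_yds"] / df["receptions"], 0.0)

# Volume and per-touch rates.
df["touches"] = df["rush_att"] + df["receptions"]
df["td_per_touch"] = np.where(df["touches"] > 0, df["td"] / df["touches"], 0.0)
df["fumbles_per_touch"] = np.where(df["touches"] > 0, df["fumbles"] / df["touches"], 0.0)

# Team-change indicator and one-hot position columns.
df["changed_teams"] = (df["team"] != df["prev_team"]).astype(int)
df = pd.get_dummies(df, columns=["position"], prefix="pos")
```

The division-by-zero guards matter in practice: quarterbacks have zero receptions and most receivers have zero pass attempts, so naive ratios would produce NaNs or infinities.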
🧠 Trying Different Models
To see which algorithms best fit this problem, I tested four approaches:
- Linear Regression – a simple baseline that assumes linear relationships.
- K-Nearest Neighbors (KNN) – predicts a player's points from the most "similar" player seasons.
- Random Forest – an ensemble of decision trees that captures complex interactions.
- XGBoost – a gradient boosting method that often excels on structured tabular data.
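One nice thing about this comparison is that all four models share the same fit/predict interface, so swapping them in and out is trivial. A rough sketch with scikit-learn on synthetic data (I substitute sklearn's `GradientBoostingRegressor` for XGBoost here so the snippet has no extra dependency; `xgboost.XGBRegressor` drops in the same way):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

# Synthetic stand-in for the engineered feature matrix and fantasy-point target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 1.5, 0.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

models = {
    "Linear Regression": LinearRegression(),
    "KNN": KNeighborsRegressor(n_neighbors=10),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    # Stand-in for XGBoost; same sklearn-style API.
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

# Every model trains through the identical fit() call.
for name, model in models.items():
    model.fit(X, y)
```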
📊 Comparing Model Performance
I trained each model on historical data and evaluated predictions against actual fantasy points using R² (goodness of fit) and mean absolute error (MAE):
| Model | R² (higher better) | MAE (lower better) | Notes |
|---|---|---|---|
| Linear Regression | 0.42 | ~52 pts | Captures basic trends, misses complexity |
| KNN | 0.45 | ~50 pts | Finds patterns in “similar” seasons but inconsistent |
| Random Forest | 0.61 | ~42 pts | Strong balance of accuracy + interpretability |
| XGBoost | 0.64 | ~40 pts | Best raw performance, but more tuning required |
The clear takeaway: tree-based models (Random Forest, XGBoost) consistently outperformed the simpler methods.
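The evaluation itself follows the standard train/test-split pattern. A sketch of how the R² and MAE numbers in the table are computed, shown here for the Random Forest on synthetic data (the real pipeline uses the engineered features, not this toy matrix):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Toy data with a strong linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = X[:, 0] * 40 + X[:, 1] * 20 + rng.normal(scale=15, size=300)

# Hold out 20% of seasons for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

# The two metrics from the comparison table.
r2 = r2_score(y_test, pred)
mae = mean_absolute_error(y_test, pred)
```

In the real project a time-aware split (train on earlier seasons, test on later ones) is the safer choice, since a random split can leak a player's future into the training set.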
🔑 Feature Importance
One of the strengths of tree-based models like Random Forest is interpretability. By looking at feature importance, we can see which stats most influence predictions.
📊 Figure 1: Top 15 features ranked by importance in the Random Forest model

Not surprisingly, last season’s fantasy points, total touches, and efficiency metrics drive much of the signal.
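A ranking like Figure 1 comes straight from the fitted model's `feature_importances_` attribute. A minimal sketch (feature names are illustrative; the synthetic target is built so the first two features carry most of the signal):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
feature_names = ["prev_fantasy_pts", "touches", "td_per_touch", "rush_ypa", "noise"]
X = rng.normal(size=(250, 5))
# First two features dominate the target; the rest are noise.
y = X[:, 0] * 5 + X[:, 1] * 3 + rng.normal(scale=0.5, size=250)

rf = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)

# Importances sum to 1; sort to get the Figure-1-style ranking.
importance = (
    pd.Series(rf.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
```

Passing `importance.head(15)` to a horizontal bar plot reproduces the chart in the figure.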
📈 Actual vs. Predicted Fantasy Points
How well does the model line up with reality? Below is a scatter plot comparing predicted vs. actual fantasy points for the test set.
📊 Figure 2: Actual vs. predicted fantasy points (Random Forest model)

We can see that the model does a decent job of capturing overall trends, even if there’s natural variation and some outliers.
📉 Residuals: Where the Model Misses
Finally, it’s worth looking at the residuals (errors) — the difference between actual and predicted fantasy points.
📊 Figure 3: Distribution of residuals (prediction errors)


The majority of errors are within a reasonable range, but a few players swing well above or below expectations — often explained by injuries, breakout seasons, or unique circumstances that stats alone can’t fully capture.
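Flagging those outlier players is a one-liner once you have the residuals. A small sketch with hypothetical numbers (the third player is the kind of big miss described above):

```python
import numpy as np

# Hypothetical actual/predicted season totals for four players.
actual = np.array([310.0, 180.0, 95.0, 250.0])
predicted = np.array([280.0, 195.0, 160.0, 240.0])

# Positive residual => the model under-predicted the player.
residuals = actual - predicted

# Flag players more than 1.5 standard deviations from the mean residual.
z = (residuals - residuals.mean()) / residuals.std()
outliers = np.abs(z) > 1.5
```

Pulling up the flagged names is a quick way to check whether the misses line up with injuries or breakout seasons rather than model problems.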
📈 First Predictions for 2025
After training, I applied the models to the 2024 stats to generate projections for 2025.
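Mechanically, this step is just calling `predict()` on the latest season's feature rows and sorting. A sketch with toy data (player labels and numbers are placeholders, not real projections):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Toy stand-ins: train on prior seasons' features, then score the 2024 rows.
X_train = rng.normal(size=(150, 4))
y_train = X_train[:, 0] * 50 + X_train[:, 1] * 25 + rng.normal(scale=10, size=150)
X_2024 = rng.normal(size=(5, 4))  # engineered 2024 features (hypothetical)

rf = RandomForestRegressor(n_estimators=200, random_state=7).fit(X_train, y_train)

# Projected 2025 points per player, sorted best-first.
proj = pd.Series(rf.predict(X_2024), index=["P1", "P2", "P3", "P4", "P5"])
rankings = proj.sort_values(ascending=False)
```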
Some highlights from the Random Forest model:
- Josh Allen (BUF, QB) – projected QB1 overall
- Lamar Jackson (BAL, QB) – QB2
- Saquon Barkley (PHI, RB) – RB1
- Ja’Marr Chase (CIN, WR) – WR1
- Surprise: Baker Mayfield (TB, QB) cracked the top 5
These predictions aren’t final rankings — they’re a first iteration, but they already look competitive with industry projections.
📝 What I Learned
- Feature engineering matters – volume + efficiency stats improved accuracy noticeably.
- Random Forest strikes a great balance – interpretable, accurate, and robust.
- XGBoost may be the future – slightly better results, but requires careful tuning to avoid overfitting.
🚀 What’s Next
This is just the beginning. In the next post, I’ll refine the models further — tuning hyperparameters, comparing Random Forest with XGBoost head-to-head, and exploring how the models handle in-season weekly predictions.
The ultimate goal: predictions that aren’t just accurate, but actionable for drafts, trades, and weekly start/sit decisions.
🔗 Code and notebooks: GitHub Repo
🔗 Previous post: Cleaning the Data
