How to Use Machine Learning Models in Backtesting in 2024?
The integration of machine learning (ML) into trading strategies has gained significant traction over recent years. Backtesting these strategies allows traders to evaluate their performance using historical data before deploying them in real-time. In this guide, we'll explore how to use machine learning models in backtesting, specifically using Python, with examples of stock tickers like Apple (AAPL) and Tesla (TSLA).
By the end of this tutorial, you’ll have a foundational understanding of how to integrate machine learning models, such as decision trees, into your trading backtesting process.
Why Use Machine Learning in Trading?
Machine learning brings several advantages to trading strategies:
- Pattern Recognition: ML models excel at detecting complex patterns in historical data that might not be apparent using traditional methods.
- Data-Driven Decisions: ML models can provide a more data-driven approach by training on historical price movements and technical indicators.
- Adaptability: ML models can automatically adjust their parameters based on the data, allowing for more flexible strategies.
Tools and Libraries Required
To perform backtesting using machine learning, you’ll need the following Python libraries:
- scikit-learn: For building and training ML models.
- pandas: For data manipulation.
- yfinance: For fetching historical stock data.
- matplotlib: For visualizations.
- numpy: For numerical calculations.
You can install the necessary libraries via pip:
pip install scikit-learn pandas yfinance matplotlib numpy
Step 1: Define the Trading Strategy and Data
In this guide, we’ll use a Decision Tree Classifier to predict buy or sell signals based on technical indicators like moving averages and Relative Strength Index (RSI). We'll backtest the strategy on historical stock data of Apple (AAPL), but you can apply the same principles to other tickers like Tesla (TSLA).
Fetch Historical Data
Let’s begin by downloading historical stock data using yfinance.
import yfinance as yf
import pandas as pd
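# Fetch historical data for Apple (AAPL)
ticker = 'AAPL'
# auto_adjust=False keeps the 'Adj Close' column used below; newer yfinance
# versions adjust prices by default and omit that column
data = yf.download(ticker, start='2018-01-01', end='2023-01-01', auto_adjust=False)
# Some yfinance versions return (field, ticker) MultiIndex columns; keep only the field names
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)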
# Display the first few rows of data
print(data.head())
This code downloads daily historical stock data for Apple from the start of 2018 through the end of 2022 (the end date is exclusive). You can change the start and end dates as needed.
Step 2: Generate Features for the ML Model
Machine learning models need features (input variables) to train on. For this example, we’ll calculate technical indicators such as Simple Moving Averages (SMA), the Relative Strength Index (RSI), and daily returns.
# Calculate 50-day and 200-day SMAs
data['SMA_50'] = data['Adj Close'].rolling(window=50).mean()
data['SMA_200'] = data['Adj Close'].rolling(window=200).mean()
# Calculate daily returns
data['Returns'] = data['Adj Close'].pct_change()
# Calculate the Relative Strength Index (RSI)
def compute_rsi(data, window=14):
    # Simple-average variant of RSI: mean gain vs. mean loss over the window
    delta = data['Adj Close'].diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi
data['RSI'] = compute_rsi(data)
# Drop rows with NaN values
data.dropna(inplace=True)
# Display the data with new features
print(data[['Adj Close', 'SMA_50', 'SMA_200', 'RSI', 'Returns']].head())
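The compute_rsi function above averages gains and losses with a simple rolling mean. Many charting tools instead use Wilder's smoothing, which behaves like an exponential moving average with alpha = 1 / window. The sketch below shows one way to write that variant. The function name compute_rsi_wilder and the synthetic price series are illustrative assumptions, and because the average is seeded from the first observation rather than from a simple average of the first window, early values may differ slightly from other implementations.

```python
import numpy as np
import pandas as pd

def compute_rsi_wilder(prices: pd.Series, window: int = 14) -> pd.Series:
    """RSI with Wilder-style smoothing (an EMA with alpha = 1 / window)."""
    delta = prices.diff()
    gain = delta.clip(lower=0)      # positive moves, zero otherwise
    loss = -delta.clip(upper=0)     # magnitude of negative moves, zero otherwise
    avg_gain = gain.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    avg_loss = loss.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
    rs = avg_gain / avg_loss
    return 100 - (100 / (1 + rs))

# Synthetic random-walk prices, used only to exercise the function
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 300).cumsum())
rsi = compute_rsi_wilder(prices)
print(rsi.tail())
```

To try it on the downloaded data, you could pass data['Adj Close'] instead of the synthetic series.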
Step 3: Define the Target (Buy/Sell Signals)
For this strategy, we’ll predict whether the price will go up or down using past data. We'll use a binary target, where:
- 1 means we expect the stock price to go up (buy signal).
- 0 means we expect the stock price to go down or stay flat (sell signal).
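# Define the target: 1 for buy (next day's return is positive), 0 for sell
data['Target'] = (data['Returns'].shift(-1) > 0).astype(int)
# The last row has no next-day return, so its label is not meaningful; drop it
data = data.iloc[:-1].copy()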
# Display the updated data
print(data[['Adj Close', 'Returns', 'Target']].tail())
Here, the Target column indicates whether the price increases the next day. The Returns column is shifted back by one day with shift(-1) so that each row pairs the current day's indicators with the following day's price movement.
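To see how the shift lines up labels with rows, here is a small worked example on four made-up closing prices. The toy DataFrame and its values are assumptions for illustration only. Note that the final row has no next-day return, so the comparison silently yields 0 there; the sketch filters that row out.

```python
import pandas as pd

# Four made-up closing prices: up, down, up
toy = pd.DataFrame({'Close': [100.0, 102.0, 101.0, 105.0]})
toy['Returns'] = toy['Close'].pct_change()

# Next day's return, aligned to the current row
next_return = toy['Returns'].shift(-1)
toy['Target'] = (next_return > 0).astype(int)

# The last row has no next-day return (NaN > 0 evaluates to False), so drop it
toy = toy[next_return.notna()]
print(toy)
```

Row 0 is labeled 1 because the price rises from 100 to 102 on the following day, row 1 is labeled 0 because it then falls to 101, and row 2 is labeled 1 because it rises to 105.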
Step 4: Train a Machine Learning Model
We’ll use a Decision Tree Classifier from the scikit-learn library to predict buy and sell signals.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Define the feature columns (the two SMAs and RSI)
features = ['SMA_50', 'SMA_200', 'RSI']
X = data[features]
y = data['Target']
# Split chronologically into training and test sets (80% training, 20% testing);
# shuffle=False keeps the test set strictly after the training set in time
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Initialize and train the Decision Tree model (fixed random_state for reproducible results)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict buy/sell signals on the test set
y_pred = model.predict(X_test)
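A single 80/20 split gives only one out-of-sample estimate. One common complement is walk-forward validation, where the model is repeatedly trained on an expanding window of past data and scored on the block that follows. The sketch below uses scikit-learn's TimeSeriesSplit on synthetic stand-in data so that it runs on its own. The names X_demo and y_demo, the random labels, and max_depth=3 are assumptions for illustration, not settings from this tutorial.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for the feature matrix and target (pure noise)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 3))       # e.g. SMA_50, SMA_200, RSI
y_demo = rng.integers(0, 2, size=500)    # e.g. Target

tscv = TimeSeriesSplit(n_splits=5)
fold_scores = []
for train_idx, test_idx in tscv.split(X_demo):
    # Each fold trains only on rows that come before the test rows
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(X_demo[train_idx], y_demo[train_idx])
    preds = clf.predict(X_demo[test_idx])
    fold_scores.append(accuracy_score(y_demo[test_idx], preds))

print([round(score, 3) for score in fold_scores])
```

On pure noise like this, accuracy should hover around 50%, which is a useful reminder to compare any real result against a naive baseline. With the tutorial's data you would pass X.values and y.values instead.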
Step 5: Backtest the Strategy Using Predicted Signals
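Now that we have the predicted buy/sell signals, we’ll calculate the strategy's returns based on those predictions. A signal of 1 means holding the stock on the following day; a signal of 0 means staying out of the market.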
# Add predictions to the test data
data.loc[X_test.index, 'Predicted_Signal'] = y_pred
# Calculate returns from the strategy: the previous day's signal decides whether we hold today
data['Strategy_Returns'] = data['Returns'] * data['Predicted_Signal'].shift()
# Calculate cumulative returns over the test period only, so both curves start from the same date
in_test = data['Strategy_Returns'].notna()
data['Cumulative_Strategy_Returns'] = (1 + data['Strategy_Returns']).cumprod()
data['Cumulative_Market_Returns'] = (1 + data['Returns'].where(in_test)).cumprod()
# Display the final data
print(data[['Returns', 'Strategy_Returns', 'Cumulative_Strategy_Returns', 'Cumulative_Market_Returns']].tail())
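The calculation above assumes trades are free. As a rough sketch of how costs might be included, the helper below deducts a fixed proportional cost each time the position changes. The function name apply_transaction_costs and the 0.1% cost per trade are hypothetical choices for illustration; real costs depend on your broker and on the instrument traded.

```python
import pandas as pd

def apply_transaction_costs(returns: pd.Series, signals: pd.Series,
                            cost_per_trade: float = 0.001) -> pd.Series:
    """Net strategy returns for a long-or-flat strategy with a cost per position change."""
    # Yesterday's signal determines today's position (0 = flat, 1 = long)
    position = signals.shift().fillna(0)
    # A trade happens whenever the position changes; the first row starts from flat
    trades = position.diff().abs().fillna(position.abs())
    return returns * position - trades * cost_per_trade

# Made-up daily returns and signals
toy_returns = pd.Series([0.01, 0.02, -0.01, 0.03])
toy_signals = pd.Series([1, 1, 0, 1])
net_returns = apply_transaction_costs(toy_returns, toy_signals)
print(net_returns)
```

In this toy case the strategy enters on day 1 and exits on day 3, so it pays the cost twice. With the tutorial's data you could call it with data['Returns'] and data['Predicted_Signal'].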
Step 6: Visualize the Backtest Results
Now, let’s compare the performance of our ML-based strategy with a simple buy-and-hold strategy by plotting the cumulative returns.
import matplotlib.pyplot as plt
# Plot the cumulative returns of the strategy and the market
plt.figure(figsize=(12, 8))
plt.plot(data['Cumulative_Strategy_Returns'], label='Strategy Returns (ML)')
plt.plot(data['Cumulative_Market_Returns'], label='Market Returns (Buy & Hold)')
plt.title(f'{ticker} - ML Strategy vs Market Returns')
plt.legend()
plt.show()
Step 7: Evaluate Performance
A key part of backtesting is evaluating the performance of your strategy. Here are some common performance metrics you can calculate:
- Sharpe Ratio: Measures risk-adjusted returns.
- Max Drawdown: The largest peak-to-trough decline in cumulative returns.
- Total Return: Overall performance of the strategy.
# Sharpe Ratio calculation (annualized with 252 trading days, risk-free rate assumed to be zero)
sharpe_ratio = (data['Strategy_Returns'].mean() / data['Strategy_Returns'].std()) * (252 ** 0.5)
# Max Drawdown
max_drawdown = ((data['Cumulative_Strategy_Returns'].cummax() - data['Cumulative_Strategy_Returns']) / data['Cumulative_Strategy_Returns'].cummax()).max()
# Total return (.iloc[-1] selects the last row by position)
total_return = data['Cumulative_Strategy_Returns'].iloc[-1] - 1
# Print performance metrics
print(f'Sharpe Ratio: {sharpe_ratio}')
print(f'Max Drawdown: {max_drawdown}')
print(f'Total Return: {total_return * 100:.2f}%')
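To make the three metrics easier to check by hand, the sketch below wraps the same formulas in a small helper and runs it on three made-up daily returns. The function name performance_summary and the toy numbers are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def performance_summary(returns: pd.Series, periods_per_year: int = 252) -> dict:
    """Sharpe ratio (zero risk-free rate), max drawdown, and total return."""
    returns = returns.dropna()
    equity = (1 + returns).cumprod()                       # growth of 1 unit of capital
    drawdown = (equity.cummax() - equity) / equity.cummax()
    return {
        'sharpe_ratio': returns.mean() / returns.std() * np.sqrt(periods_per_year),
        'max_drawdown': drawdown.max(),
        'total_return': equity.iloc[-1] - 1,
    }

# Made-up returns: +10%, -20%, +5%
toy_returns = pd.Series([0.10, -0.20, 0.05])
summary = performance_summary(toy_returns)
print(summary)
```

Here the equity curve goes 1.10, 0.88, 0.924, so the maximum drawdown is 20% (from 1.10 to 0.88) and the total return is -7.6%. Because the running peak starts at the first equity value rather than at the initial capital, a loss on the very first day is not counted as drawdown by this formula.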
Conclusion
Machine learning models offer a powerful approach to predicting future price movements and backtesting trading strategies. By using models like the Decision Tree Classifier in combination with technical indicators such as SMAs and RSI, you can create and backtest sophisticated trading algorithms.
However, it’s essential to understand that backtesting on historical data does not guarantee future performance. Optimization and frequent validation are key components of building a successful ML-based trading strategy.
Start with the code provided, experiment with different tickers like AAPL, TSLA, and others, and explore different machine learning models like Random Forest, SVM, or even Neural Networks for more complex strategies.
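As a starting point for that experimentation, here is a minimal sketch that fits a decision tree and a random forest on the same chronological split and compares their test accuracy. It runs on synthetic data so it is self-contained. The names X_syn, y_syn, and candidates, along with the hyperparameters, are illustrative assumptions, and accuracy on synthetic data says nothing about performance on real prices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: the label depends weakly on the first feature plus noise
rng = np.random.default_rng(7)
X_syn = rng.normal(size=(400, 3))
y_syn = (X_syn[:, 0] + rng.normal(scale=1.0, size=400) > 0).astype(int)

# Chronological split, as in the tutorial (no shuffling)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, shuffle=False)

candidates = {
    'decision_tree': DecisionTreeClassifier(max_depth=3, random_state=0),
    'random_forest': RandomForestClassifier(n_estimators=100, max_depth=3, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    clf.fit(X_tr, y_tr)
    scores[name] = clf.score(X_te, y_te)   # mean accuracy on the held-out block

print(scores)
```

Because every scikit-learn classifier exposes the same fit and predict methods, swapping models in the tutorial code only requires changing the line that constructs the model.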
Full Code Summary
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
# Fetch historical data for AAPL
ticker = 'AAPL'
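# auto_adjust=False keeps the 'Adj Close' column; newer yfinance versions omit it by default
data = yf.download(ticker, start='2018-01-01', end='2023-01-01', auto_adjust=False)
# Some yfinance versions return (field, ticker) MultiIndex columns; keep only the field names
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)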
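# Calculate technical indicators
def compute_rsi(data, window=14):
    delta = data['Adj Close'].diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=window).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=window).mean()
    rs = gain / loss
    return 100 - (100 / (1 + rs))

data['SMA_50'] = data['Adj Close'].rolling(window=50).mean()
data['SMA_200'] = data['Adj Close'].rolling(window=200).mean()
data['Returns'] = data['Adj Close'].pct_change()
data['RSI'] = compute_rsi(data)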
# Drop NaN values
data.dropna(inplace=True)
# Define the target (1 for buy, 0 for sell) and drop the last row, which has no next-day return
data['Target'] = (data['Returns'].shift(-1) > 0).astype(int)
data = data.iloc[:-1].copy()
# Split the data into training and test sets
features = ['SMA_50', 'SMA_200', 'RSI']
X = data[features]
y = data['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
# Train the decision tree classifier
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Backtest the strategy (cumulative returns cover the test period only)
data.loc[X_test.index, 'Predicted_Signal'] = y_pred
data['Strategy_Returns'] = data['Returns'] * data['Predicted_Signal'].shift()
in_test = data['Strategy_Returns'].notna()
data['Cumulative_Strategy_Returns'] = (1 + data['Strategy_Returns']).cumprod()
data['Cumulative_Market_Returns'] = (1 + data['Returns'].where(in_test)).cumprod()
# Plot the performance
plt.figure(figsize=(12, 8))
plt.plot(data['Cumulative_Strategy_Returns'], label='Strategy Returns (ML)')
plt.plot(data['Cumulative_Market_Returns'], label='Market Returns (Buy & Hold)')
plt.title(f'{ticker} - ML Strategy vs Market Returns')
plt.legend()
plt.show()
# Evaluate performance
sharpe_ratio = (data['Strategy_Returns'].mean() / data['Strategy_Returns'].std()) * (252 ** 0.5)
max_drawdown = ((data['Cumulative_Strategy_Returns'].cummax() - data['Cumulative_Strategy_Returns']) / data['Cumulative_Strategy_Returns'].cummax()).max()
total_return = data['Cumulative_Strategy_Returns'].iloc[-1] - 1
print(f'Sharpe Ratio: {sharpe_ratio}')
print(f'Max Drawdown: {max_drawdown}')
print(f'Total Return: {total_return * 100:.2f}%')
With this guide, you now know how to use machine learning models in backtesting your trading strategy in 2024.