Hello Readers,
Today we will discuss regression with AdaBoost, part of the scikit-learn module for Python. We shall compare how this boosting technique allows the regressor to fit the data with less prediction error than a single decision tree.
We aim to create the graphic below, showing the fits of a single decision tree and of a tree boosted with AdaBoost.
Start Python (I am using 2.7.5) and let us begin!
The Modules
We require a few modules to run the script: numpy, pylab, sklearn.tree, and sklearn.ensemble. From sklearn.tree and sklearn.ensemble we will use the DecisionTreeRegressor and AdaBoostRegressor classes, respectively.
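A minimal import block might look like this (a sketch, assuming numpy, pylab, and scikit-learn are installed):

# numerical arrays and plotting
import numpy as np
import pylab as pl

# the single tree and the boosted ensemble regressors
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor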
Creating the Data
So we will create a sinusoidal dataset with the cos() function and add some Gaussian noise with the normal() function from numpy's random number generator.
Data Creation
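A sketch of this step; the 100 evenly spaced x values and the noise standard deviation of 0.1 are assumptions, not values taken from the original script:

# seed the random number generator so the noise is reproducible
rng = np.random.RandomState(1)

# 100 evenly spaced points on [0, 6], as a column vector for sklearn
x = np.linspace(0, 6, 100)[:, np.newaxis]

# cosine signal plus Gaussian noise from normal()
y = np.cos(x).ravel() + rng.normal(0, 0.1, x.shape[0])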
Regression Modeling, Fitting, and Predicting
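A sketch of the modeling step: fit a single decision tree and an AdaBoost ensemble of trees, then predict over the same x values. The max_depth of 4 is an assumption; n_estimators=299 matches the 299 boosts mentioned later in the post:

# a single decision tree, limited in depth to keep the fit simple
tree = DecisionTreeRegressor(max_depth=4)

# AdaBoost over the same kind of tree; n_estimators sets the number of boosts
boost = AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                          n_estimators=299)

# fit both regressors to the noisy cosine data
tree.fit(x, y)
boost.fit(x, y)

# predict y values over the same x inputs
y_tree = tree.predict(x)
y_boost = boost.predict(x)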
Plotting the Predicted Values
Naturally, to visualize the two sets of predicted y values, we plot them over the original y data. Using the pylab module, we plot the original y values with scatter() and the predicted y values with plot().
After adding x and y labels, a title, and a legend, we display the plot using show().
Plotting the Actual and Predicted Values
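A sketch of the plotting step, assuming the variable names from the sketches above (the colors match the graphic described below):

# original noisy data as black points
pl.scatter(x, y, c="k", label="training samples")

# predicted values from each regressor as colored lines
pl.plot(x, y_tree, c="g", label="decision tree", linewidth=2)
pl.plot(x, y_boost, c="r", label="AdaBoost", linewidth=2)

# x and y labels, a title, and a legend, then display the plot
pl.xlabel("data")
pl.ylabel("target")
pl.title("Decision Tree and AdaBoost Regression")
pl.legend()
pl.show()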
This yields the graphic below.
Note how the green line (the single decision tree) has a rough fit as it tries to regress along the noisy cos() points. See how the red AdaBoost regression with 299 boosts fits the cosine data more closely, because each boost adjusts the instance weights according to the error of the current prediction. Increasing the number of boosts refines the regression fit further. For more about AdaBoost in scikit-learn, see the documentation.
Thanks for reading,
Wayne
@beyondvalence