ML on GPU
Published 2025-05-12
I'm working on Predict Calorie Expenditure - Kaggle Playground Series, and the number of features is leading me to look at performance improvements, including running CV computations on the GPU. I'm adding this snippet to keep track of what I learn; it may grow into something bigger down the line.
xgboost has built-in GPU support using the following parameters:
```python
tree_method='hist',  # use histogram-based algorithm
device='cuda'
```

The problem is that scikit-learn outputs CPU-based numpy arrays, so xgboost internally converts them to `xgboost.DMatrix`, which is expensive. Overall it's still better, especially for training, but it doesn't really speed up CV benchmarking.
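For context, this is roughly the kind of CV setup I'm timing. It's only a sketch: the data here is a random placeholder rather than the competition's actual features.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Placeholder data standing in for the competition features
X = np.random.rand(20_000, 50)
y = np.random.rand(20_000)

model = XGBRegressor(
    n_estimators=2000,
    tree_method="hist",  # histogram-based algorithm
    device="cuda",       # train on the GPU
)

# Each fold hands XGBoost a CPU-resident numpy array, which it then has to
# convert/copy into its internal GPU representation before training
scores = cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error")
print(-scores.mean())
```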
Switching to RAPIDS: cuDF and cuML
Installation guide (WSL 2 in a conda environment):
- Install `rapids=25.04`, `python=3.11`, and `cudatoolkit` by searching the channels `nvidia`, `rapidsai`, and `conda-forge` (one possible conda command is sketched after this list)
- Add a new environment variable to `.bashrc` (this is due to some bug apparently, see here):

```bash
echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
```
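For reference, the first step could be collapsed into a single command like the one below. This is only a sketch built from the packages and channels listed above; the exact pins (and whether you want `cudatoolkit` or a `cuda-version` pin) depend on your setup.

```bash
# Packages and channels taken from the install step above; adjust pins as needed
conda create -n rapids-25.04 \
    -c rapidsai -c conda-forge -c nvidia \
    rapids=25.04 python=3.11 cudatoolkit
```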
Ensure that the following runs correctly:

```python
from numba import cuda
cuda.detect()
```

Running two XGBRegressor models (for preprocessing in Predict Podcast Listening Time - Kaggle Playground Series) with 2000 estimators each goes from 1 min 57 s on my 6900HS (CPU) to 58 s on a 3060 laptop GPU.
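The point of switching to cuDF is to keep the data on the GPU end to end instead of handing XGBoost CPU numpy arrays. A minimal sketch of what that could look like, with placeholder file name and target column rather than the actual competition schema:

```python
import cudf
import xgboost as xgb

# Read the data straight into GPU memory (placeholder path and column name)
train = cudf.read_csv("train.csv")
X = train.drop(columns=["Calories"])
y = train["Calories"]

# XGBoost can consume cuDF DataFrames directly, so the features should avoid
# the CPU-side numpy conversion on each fit
model = xgb.XGBRegressor(
    n_estimators=2000,
    tree_method="hist",
    device="cuda",
)
model.fit(X, y)
```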