ML on GPU

Published 2025-05-12

I’m working on Predict Calorie Expenditure - Kaggle Playground Series, and the number of features is pushing me to look at performance improvements, including running CV computations on the GPU. I’m adding this snippet to keep track of what I learn; it may grow into something bigger down the line.

xgboost has built-in GPU support using the following parameters:

tree_method='hist', # use histogram-based algorithm
device='cuda'
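
In the scikit-learn wrapper that amounts to something like this (a minimal sketch; the n_estimators value is just a placeholder):

from xgboost import XGBRegressor

model = XGBRegressor(
    n_estimators=2000,   # placeholder value
    tree_method='hist',  # use histogram-based algorithm
    device='cuda',       # run training on the GPU
)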

The problem is that scikit-learn outputs CPU-based NumPy arrays, so xgboost has to convert them internally to an xgboost.DMatrix and copy them to the GPU, which is expensive. Overall it’s still faster, especially for training, but it doesn’t do much to speed up CV benchmarking.
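
Concretely, the CV pattern I’m benchmarking looks roughly like this (random placeholder data); every fold hands xgboost a fresh CPU array, which gets wrapped in a DMatrix and copied to the GPU again:

import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# placeholder data; in practice X and y come out of a scikit-learn pipeline
X = np.random.rand(10_000, 50)
y = np.random.rand(10_000)

model = XGBRegressor(n_estimators=2000, tree_method='hist', device='cuda')

# each of the 5 fits converts and copies its CPU fold to the GPU
scores = cross_val_score(model, X, y, cv=5, scoring='neg_root_mean_squared_error')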

Switching to RAPIDS: cuDF and cuML

Installation guide (WSL 2 in a conda environment):

  1. Install rapids=25.04, python=3.11, and cudatoolkit from the nvidia, rapidsai, and conda-forge channels (see the command sketch after this list)
  2. Add a new environment variable to .bashrc (this is due to some bug apparently, see here):
echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
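
For step 1, the command looks roughly like this (a sketch, not the official install command; the environment name is arbitrary and the exact pins depend on your CUDA driver):

conda create -n rapids-env -c rapidsai -c conda-forge -c nvidia \
    rapids=25.04 python=3.11 cudatoolkit
conda activate rapids-env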

Ensure that the following runs correctly:

from numba import cuda
cuda.detect()
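
With the environment working, cuDF and cuML can be used as near drop-in replacements for pandas and scikit-learn. Here’s a minimal sketch with toy data (real code would load a DataFrame with cudf.read_csv):

import cupy as cp
import cudf
from cuml.linear_model import Ridge  # cuML mirrors the scikit-learn API

# toy data so the sketch is self-contained
X = cudf.DataFrame({'a': cp.random.rand(1000), 'b': cp.random.rand(1000)})
y = cudf.Series(cp.random.rand(1000))

model = Ridge()
model.fit(X, y)           # the fit runs entirely on the GPU
preds = model.predict(X)  # predictions stay on the GPU too

XGBoost also accepts cuDF DataFrames directly when device='cuda', which sidesteps the CPU-to-GPU copy mentioned above.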

Running two XGBRegressor models (for preprocessing in Predict Podcast Listening Time - Kaggle Playground Series) with 2000 estimators each goes from 1min57s on my 6900HS (CPU) to 58s on a 3060 laptop GPU.