GPU acceleration

To accelerate HyperGBM with NVIDIA GPU devices, install NVIDIA RAPIDS cuML and cuDF and enable GPU support for all estimators; see the Installation Guide for more details.
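
As a rough sketch, RAPIDS packages are typically installed from their conda channels; the exact channel names and version pins depend on your CUDA driver, so confirm them with the RAPIDS release selector before running anything like the following:

# illustrative only: verify channels and version pins against the
# RAPIDS release selector for your CUDA driver
conda install -c rapidsai -c conda-forge -c nvidia cudf cuml
pip install hypergbm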

Accelerate the experiment

To accelerate an experiment with a GPU, load your datasets as cudf.DataFrame and pass them as the train_data/eval_data/test_data arguments of the utility make_experiment; the utility will then configure the experiment to run on the GPU device.

Example:

import cudf

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils


def train():
    train_data = cudf.from_pandas(dsutils.load_blood())

    experiment = make_experiment(train_data, target='Class')
    estimator = experiment.run()
    print(estimator)


if __name__ == '__main__':
    train()

Outputs:

LocalizablePipeline(steps=[('data_clean',
                            DataCleanStep(cv=True,
                                          name='data_cle...
                            CumlGreedyEnsemble(weight=[...]))])

Note that the trained estimator is a LocalizablePipeline rather than a sklearn Pipeline. The LocalizablePipeline accepts a cudf DataFrame as the input X for prediction. When you deploy a LocalizablePipeline in a production environment, you need to install the same software as in the training environment, including cuML, cuDF, etc.
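
For instance, a minimal prediction sketch (continuing the training example above, where estimator is the fitted LocalizablePipeline and 'Class' is the target column):

import cudf

from hypernets.tabular.datasets import dsutils

# rebuild the feature frame on the GPU by dropping the target column
X = cudf.from_pandas(dsutils.load_blood()).drop(columns=['Class'])

# the LocalizablePipeline consumes cudf data directly
pred = estimator.predict(X)
print(pred[:5])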

If you want to deploy the trained estimator in an environment without cuML and cuDF, call estimator.as_local() to convert it into a sklearn Pipeline. An example:

import cudf

from hypergbm import make_experiment
from hypernets.tabular.datasets import dsutils


def train():
    train_data = cudf.from_pandas(dsutils.load_blood())

    experiment = make_experiment(train_data, target='Class')
    estimator = experiment.run()
    print(estimator)

    print('-' * 20)
    estimator = estimator.as_local()
    print('localized estimator:\n', estimator)


if __name__ == '__main__':
    train()

Outputs:

LocalizablePipeline(steps=[('data_clean',
                            DataCleanStep(cv=True,
                                          name='data_cle...
                            CumlGreedyEnsemble(weight=[...]))])
--------------------
localized estimator:
 Pipeline(steps=[('data_clean',
                 DataCleanStep(cv=True,
                               name='data_clean')),
                ('est...
                 GreedyEnsemble(weight=[...]))])
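
Once localized, the estimator behaves like a regular sklearn Pipeline, so one common deployment path is to pickle it on the training machine and load it for prediction with pandas data on a CPU-only machine. A minimal sketch (the file name is illustrative, and estimator is the localized pipeline from the example above):

import pickle

from hypernets.tabular.datasets import dsutils

# serialize the localized (CPU) pipeline on the training machine
with open('blood_pipeline.pkl', 'wb') as f:
    pickle.dump(estimator, f)

# later, in an environment without cuML/cuDF, load it and predict
# with plain pandas data
with open('blood_pipeline.pkl', 'rb') as f:
    pipeline = pickle.load(f)

X = dsutils.load_blood().drop(columns=['Class'])
print(pipeline.predict(X)[:5])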

Customize Search Space

When running an experiment on a GPU, all Transformers and Estimators used in the search space need to support both pandas/numpy and cuDF/cupy data types. Users can define a new search space based on search_space_general and CumlGeneralSearchSpaceGenerator from hypergbm.cuml.

Example:

import cudf

from hypergbm import make_experiment
from hypergbm.cuml import search_space_general
from hypernets.tabular.datasets import dsutils


def my_search_space():
    return search_space_general(n_estimators=100)


def train():
    train_data = cudf.from_pandas(dsutils.load_blood())

    experiment = make_experiment(train_data, target='Class', searcher='mcts', search_space=my_search_space)
    estimator = experiment.run()
    print(estimator)


if __name__ == '__main__':
    train()
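
Alternatively, a custom search space can be built directly from CumlGeneralSearchSpaceGenerator. A minimal sketch, assuming the generator accepts the same constructor options as its CPU counterpart GeneralSearchSpaceGenerator (e.g. n_estimators):

import cudf

from hypergbm import make_experiment
from hypergbm.cuml import CumlGeneralSearchSpaceGenerator
from hypernets.tabular.datasets import dsutils

# a reusable, callable search space object with a larger estimator
# budget; n_estimators=200 is an illustrative value
my_search_space = CumlGeneralSearchSpaceGenerator(n_estimators=200)


def train():
    train_data = cudf.from_pandas(dsutils.load_blood())

    experiment = make_experiment(train_data, target='Class', searcher='mcts', search_space=my_search_space)
    estimator = experiment.run()
    print(estimator)


if __name__ == '__main__':
    train()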