Use HyperGBM with Command Line

HyperGBM provides the command-line tool hypergbm for model training, evaluation, and prediction. To view the command-line help, run:

hypergbm -h

usage: hypergbm [-h] [--log-level LOG_LEVEL] [-error] [-warn] [-info] [-debug]
                [--verbose VERBOSE] [-v] [--enable-dask ENABLE_DASK] [-dask]
                [--overload OVERLOAD]
                {train,evaluate,predict} ...

hypergbm provides three commands: train, evaluate, and predict. For details on a command, run hypergbm <command> -h:

hypergbm train -h
usage: hypergbm train [-h] --train-data TRAIN_DATA [--eval-data EVAL_DATA]
                      [--test-data TEST_DATA]
                      [--train-test-split-strategy {None,adversarial_validation}]
                      [--target TARGET]
                      [--task {binary,multiclass,regression}]
                      [--max-trials MAX_TRIALS] [--reward-metric METRIC]
                      [--cv CV] [-cv] [-cv-] [--cv-num-folds NUM_FOLDS]
                      [--pos-label POS_LABEL]
                      ...
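
When the tool is driven from a script, the long options shown in the usage text can be assembled programmatically. The following is a minimal sketch, not part of HyperGBM: build_command is a hypothetical helper that assumes every option is passed as a "--name value" pair, and it only uses option names that appear in the usage text above.

```python
import shlex


def build_command(command, **options):
    """Build an argument list for `hypergbm <command>` (hypothetical helper).

    Keyword names are turned into long flags: train_data -> --train-data.
    """
    args = ["hypergbm", command]
    for name, value in options.items():
        args += ["--" + name.replace("_", "-"), str(value)]
    return args


cmd = build_command("train", train_data="bank_train.csv", target="y", max_trials=10)
print(shlex.join(cmd))
# prints: hypergbm train --train-data bank_train.csv --target y --max-trials 10
# To execute it (assumes hypergbm is installed and on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.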

Prepare the Data

When training a model from the command line, the training data must be saved in a CSV or Parquet file. The trained model is written as a pickle file with the .pkl extension.

For example, the Bank Marketing data can be prepared as follows:

from hypernets.tabular.datasets import dsutils
from sklearn.model_selection import train_test_split

# Load the first 10,000 rows of the Bank Marketing data and split them 70/30.
df = dsutils.load_bank().head(10000)
df_train, df_test = train_test_split(df, test_size=0.3, random_state=9527)
df_train.to_csv('bank_train.csv', index=False)
df_test.to_csv('bank_eval.csv', index=False)

# Drop the target column 'y' to produce the data to predict on.
df_test.pop('y')
df_test.to_csv('bank_to_pred.csv', index=False)

where

  • bank_train.csv is used for training

  • bank_eval.csv is used for evaluating the model

  • bank_to_pred.csv contains the evaluation rows without the target column and is used for prediction
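
Before training, it can be worth confirming that the target column sits where the list above says it should. The sketch below uses only the Python standard library; read_header and target_placement_ok are hypothetical helper names, and the default target name "y" matches the Bank Marketing example.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def target_placement_ok(train_path, pred_path, target="y"):
    """True if `target` is a column of the training file but not of the prediction file."""
    return target in read_header(train_path) and target not in read_header(pred_path)
```

With the files from the example, target_placement_ok('bank_train.csv', 'bank_to_pred.csv') is expected to return True.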

Train the Model

After preparing the data, train the model from the command line:

hypergbm train --train-data bank_train.csv --target y --model-file model.pkl

When training finishes, the model has been written to model.pkl:

ls -l model.pkl

-rw-rw-r-- 1 xx xx 9154959    17:09 model.pkl
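
Because the model file is a pickle, it can presumably also be loaded from Python with the standard pickle module. The sketch below makes two assumptions: HyperGBM and its dependencies are importable in the loading environment (pickle needs the original classes to rebuild the object), and the file comes from a trusted source, since unpickling can execute arbitrary code. load_model is a hypothetical helper name.

```python
import pickle


def load_model(path="model.pkl"):
    """Load a pickled object, such as the file written by `hypergbm train`."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The interface of the loaded object is not described here; consult the HyperGBM Python documentation before calling methods on it.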

Evaluate the Model

The trained model can be evaluated with the evaluation data:

hypergbm evaluate --model model.pkl --data bank_eval.csv --metric f1 recall auc

{'f1': 0.7993779160186626, 'recall': 0.7099447513812155, 'auc': 0.9705420982746849}
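
The metrics are printed in the form of a Python dictionary literal, so a script that captures the output can turn them back into numbers. This is a minimal sketch that assumes the dictionary is the last non-empty line of the captured text (log lines may precede it); parse_metrics is a hypothetical helper name.

```python
import ast


def parse_metrics(output):
    """Parse the last non-empty line of the evaluate output as a dict literal."""
    lines = [line for line in output.splitlines() if line.strip()]
    return ast.literal_eval(lines[-1].strip())


sample = "{'f1': 0.7993779160186626, 'recall': 0.7099447513812155, 'auc': 0.9705420982746849}"
metrics = parse_metrics(sample)
print(round(metrics["auc"], 4))  # prints 0.9705
```

ast.literal_eval is used rather than eval because it accepts only literals and does not execute code.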

Predict the Test Data

The trained model can be used to make predictions on new data:

hypergbm predict --model model.pkl --data bank_to_pred.csv --output bank_output.csv

The predictions are saved to bank_output.csv.

To copy columns from the input data into the output file, name them with the --with-data option:

hypergbm predict --model model.pkl --data bank_to_pred.csv --output bank_output.csv --with-data id
head bank_output.csv

id,y
1563,no
124,no
218,no
463,no
...

To include all columns of the input data alongside the predictions in bank_output.csv, set --with-data to '*':

hypergbm predict --model model.pkl --data bank_to_pred.csv --output bank_output.csv --with-data '*'
head bank_output.csv

id,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
1563,55,entrepreneur,married,secondary,no,204,no,no,cellular,14,jul,455,13,-1,0,unknown,no
124,51,management,single,tertiary,yes,-55,yes,no,cellular,11,may,281,2,266,6,failure,no
218,49,blue-collar,married,primary,no,305,yes,yes,telephone,10,jul,834,10,-1,0,unknown,no
463,35,blue-collar,divorced,secondary,no,3102,yes,no,cellular,20,nov,138,1,-1,0,unknown,no
2058,50,management,divorced,tertiary,no,201,yes,no,cellular,24,jul,248,1,-1,0,unknown,no
...
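
In both output layouts shown above, the predicted label is in the column named after the target (y). A quick way to summarize the predictions is to count the labels; the sketch below uses only the standard library, and count_predictions is a hypothetical helper name.

```python
import csv
from collections import Counter


def count_predictions(path, column="y"):
    """Count how often each predicted label occurs in the output CSV."""
    with open(path, newline="") as f:
        return Counter(row[column] for row in csv.DictReader(f))
```

For example, count_predictions('bank_output.csv') returns a Counter mapping each label to its number of rows.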