NSGA-II Searcher example

This example shows how to use NSGAIISearcher for multi-objective optimization.

1. Import modules and prepare data

[1]:
from hypernets.utils import logging as hyn_logging
from hypernets.searchers.nsga_searcher import RNSGAIISearcher

from hypergbm import make_experiment

from hypernets.tabular import get_tool_box
from hypernets.tabular.datasets import dsutils
from hypernets.core.random_state import get_random_state

#hyn_logging.set_level(hyn_logging.WARN)
random_state = get_random_state()

df = dsutils.load_bank().head(1000)
tb = get_tool_box(df)
df_train, df_test = tb.train_test_split(df, test_size=0.2, random_state=9527)

2. Run an experiment with NSGAIISearcher

[2]:
experiment = make_experiment(df_train,
                             eval_data=df_test.copy(),
                             callbacks=[],
                             random_state=1234,
                             search_callbacks=[],
                             target='y',
                             searcher='nsga2',  # available MOO searcher: moead, nsga2, rnsga2
                             searcher_options={'population_size': 10},
                             reward_metric='logloss',
                             objectives=['nf'],
                             early_stopping_rounds=10)

estimators = experiment.run(max_trials=10)
hyper_model = experiment.hyper_model_
hyper_model.searcher
[2]:
NSGAIISearcher(objectives=[PredictionObjective(name=logloss, scorer=make_scorer(log_loss, needs_proba=True), direction=min), NumOfFeatures(name=nf, sample_size=1000, direction=min)], recombination=SinglePointCrossOver(random_state=RandomState(MT19937)), mutation=SinglePointMutation(random_state=RandomState(MT19937), proba=0.7), survival=<hypernets.searchers.nsga_searcher._RankAndCrowdSortSurvival object at 0x000002177944F6A0>, random_state=RandomState(MT19937))
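NSGA-II ranks candidates by Pareto dominance: a candidate survives if no other candidate is at least as good on every objective and strictly better on at least one. As a minimal illustration in plain Python (a sketch, not the hypernets implementation), with all objectives minimized:

```python
def dominates(a, b):
    """Return True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# hypothetical candidates as (logloss, nf) pairs, both to be minimized
pop = [(0.26, 0.38), (0.54, 0.0), (0.22, 0.69), (0.30, 0.50)]

# the Pareto front: candidates not dominated by any other candidate
front = [p for p in pop if not any(dominates(q, p) for q in pop if q != p)]
print(front)  # (0.30, 0.50) is dominated by (0.26, 0.38) and drops out
```

The searcher's `survival` step additionally sorts by crowding distance within each front to keep the population spread out, which this sketch omits.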

3. Summarize trials

[3]:
df_trials = hyper_model.history.to_df().copy().drop(['scores', 'reward'], axis=1)
df_trials[df_trials['non_dominated'] == True]
[3]:
trial_no succeeded elapsed non_dominated model_index reward_logloss reward_nf
0 1 True 0.461761 True 0.0 0.256819 0.3750
3 4 True 3.794317 True 1.0 0.540632 0.0000
4 5 True 0.321852 True 2.0 0.217409 0.6875
7 8 True 1.844163 True 3.0 0.164959 0.9375
8 9 True 1.606298 True 4.0 0.190964 0.8750
9 10 True 0.716679 True 5.0 0.218916 0.4375

4. Plot Pareto front

We can pick a model from the Pareto plot according to the decision maker's preferences; the numbers in the figure indicate the indices of the pipeline models.

[4]:
fig, ax  = hyper_model.history.plot_best_trials()
fig.show()
../_images/examples_61.NSGAII_example_8_0.png
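Preferences can also be applied programmatically. For instance, a sketch that picks the Pareto-optimal model with the lowest logloss, using the `model_index`, `reward_logloss`, and `reward_nf` values from the table in step 3:

```python
# Pareto-optimal trials as (model_index, logloss, nf), copied from the step-3 table
pareto = [(0, 0.256819, 0.3750), (1, 0.540632, 0.0000), (2, 0.217409, 0.6875),
          (3, 0.164959, 0.9375), (4, 0.190964, 0.8750), (5, 0.218916, 0.4375)]

# prefer predictive quality: choose the model with the lowest logloss
best_by_logloss = min(pareto, key=lambda t: t[1])
print(best_by_logloss)  # model_index 3, at the cost of keeping most features
```

A decision maker preferring smaller models would instead minimize `nf`, trading some logloss for fewer features.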

5. Plot population

[5]:
fig, ax  = hyper_model.searcher.plot_population()
fig.show()
../_images/examples_61.NSGAII_example_10_0.png

6. Evaluate the selected model

[6]:
print(f"Number of pipelines: {len(estimators)}")

pipeline_model = estimators[0]  # select the first pipeline model
X_test = df_test.copy()
y_test = X_test.pop('y')

preds = pipeline_model.predict(X_test)
proba = pipeline_model.predict_proba(X_test)

tb.metrics.calc_score(y_test, preds, proba, metrics=['auc', 'accuracy', 'f1', 'recall', 'precision'], pos_label="yes")
Number of pipelines: 6
[6]:
{'auc': 0.826357886904762,
 'accuracy': 0.84,
 'f1': 0.23809523809523808,
 'recall': 0.15625,
 'precision': 0.5}
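As a sanity check on the scores above, f1 is the harmonic mean of precision and recall, which reproduces the reported value:

```python
precision, recall = 0.5, 0.15625  # from the calc_score output above
f1 = 2 * precision * recall / (precision + recall)
print(f1)  # 0.23809523809523808, matching the reported f1
```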