RNSGA-II Searcher example

This example shows how to use RNSGAIISearcher for multi-objective optimization.
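In multi-objective optimization there is usually no single best trial. Instead, the searcher keeps the trials that no other trial Pareto-dominates. Below is a minimal sketch of the dominance test. It assumes every objective is minimized, which holds for both objectives used later (logloss and nf). The helper names and the sample points are illustrative only and are not part of Hypernets.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse in every objective and strictly
    better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: Sequence[Sequence[float]]) -> list:
    """Return the indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# made-up (logloss, nf) pairs, for illustration only
points = [(0.20, 0.50), (0.25, 0.30), (0.22, 0.60), (0.30, 0.30)]
print(non_dominated(points))  # [0, 1]
```

Point 2 is dominated by point 0, which is better in both objectives. Point 3 is dominated by point 1, which has the same nf and a lower logloss.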

1. Import modules and prepare data

[1]:

# fix the global random seed so that the search is reproducible
from hypernets.core.random_state import set_random_state
set_random_state(1234)

from hypernets.utils import logging as hyn_logging
from hypergbm import make_experiment
from hypernets.tabular import get_tool_box
from hypernets.tabular.datasets import dsutils

# hyn_logging.set_level(hyn_logging.WARN)  # uncomment to reduce log output

# load the first 1000 rows of the bank dataset and hold out 20% for evaluation
df = dsutils.load_bank().head(1000)
tb = get_tool_box(df)
df_train, df_test = tb.train_test_split(df, test_size=0.2, random_state=9527)

2. Run an experiment with RNSGAIISearcher

[2]:

import numpy as np

experiment = make_experiment(df_train,
                             eval_data=df_test.copy(),
                             callbacks=[],
                             random_state=1234,
                             search_callbacks=[],
                             target='y',
                             searcher='rnsga2',  # available MOO searchers: moead, nsga2, rnsga2
                             # reference point and weights steer the search toward a preferred region
                             searcher_options=dict(ref_point=np.array([0.1, 2]), weights=np.array([0.1, 2]), population_size=10),
                             reward_metric='logloss',  # first objective: log loss (minimized)
                             objectives=['nf'],  # second objective: number of features (minimized)
                             early_stopping_rounds=30,
                             drift_detection=False)

estimators = experiment.run(max_trials=30)
hyper_model = experiment.hyper_model_
hyper_model.searcher

[2]:

RNSGAIISearcher(objectives=[PredictionObjective(name=logloss, scorer=make_scorer(log_loss, needs_proba=True), direction=min), NumOfFeatures(name=nf, sample_size=1000, direction=min)], recombination=SinglePointCrossOver(random_state=RandomState(MT19937))), mutation=SinglePointMutation(random_state=RandomState(MT19937), proba=0.7)), survival=_RDominanceSurvival(ref_point=[0.1 2. ], weights=[0.1 2. ], threshold=0.3, random_state=RandomState(MT19937))), random_state=RandomState(MT19937)
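The printed searcher shows that survival is handled by `_RDominanceSurvival` with `ref_point`, `weights` and `threshold=0.3`. Below is a minimal sketch of the r-dominance idea behind such a survival step. It assumes minimized objectives and a weighted Euclidean distance to the reference point, with each objective scaled by its range within the population. The helpers `weighted_distance`, `pareto_dominates` and `r_dominates` are hypothetical. Hypernets' own implementation may normalize objectives or break ties differently.

```python
import numpy as np

# Hypothetical helpers illustrating r-dominance; not part of the Hypernets API.


def weighted_distance(F, ref_point, weights):
    """Weighted Euclidean distance of each row of F to ref_point, after scaling
    every objective by its range within the population."""
    F = np.asarray(F, dtype=float)
    ref_point = np.asarray(ref_point, dtype=float)
    weights = np.asarray(weights, dtype=float)
    span = F.max(axis=0) - F.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero for constant objectives
    diff = (F - ref_point) / span
    return np.sqrt((weights * diff ** 2).sum(axis=1))


def pareto_dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return bool(np.all(a <= b) and np.any(a < b))


def r_dominates(i, j, F, ref_point, weights, threshold):
    """True if solution i r-dominates solution j in population F."""
    F = np.asarray(F, dtype=float)
    if pareto_dominates(F[i], F[j]):
        return True
    if pareto_dominates(F[j], F[i]):
        return False
    # mutually non-dominated: compare normalized distances to the reference point
    dist = weighted_distance(F, ref_point, weights)
    d_range = dist.max() - dist.min()
    if d_range == 0:
        return False
    return bool((dist[i] - dist[j]) / d_range < -threshold)


F = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
print(r_dominates(2, 0, F, ref_point=[0.0, 0.0], weights=[1.0, 1.0], threshold=0.3))  # True
print(r_dominates(0, 1, F, ref_point=[0.0, 0.0], weights=[1.0, 1.0], threshold=0.3))  # False
```

Two solutions that are mutually non-dominated only r-dominate one another when one is closer to the reference point by more than `threshold` times the population's distance range. In this sketch, a smaller threshold therefore pulls the surviving population harder toward `ref_point`.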

3. Summarize the trials

[3]:

df_trials = hyper_model.history.to_df().copy().drop(['scores', 'reward'], axis=1)
df_trials[df_trials['non_dominated'] == True]

[3]:

	trial_no	succeeded	elapsed	non_dominated	model_index	reward_logloss	reward_nf
4	5	True	0.386380	True	0.0	0.217409	0.625
9	10	True	0.798336	True	1.0	0.218916	0.4375
11	12	True	0.682108	True	2.0	0.287458	0.0
14	15	True	0.455569	True	3.0	0.221304	0.375
26	28	True	0.568367	True	4.0	0.045741	0.9375
28	30	True	0.407018	True	5.0	0.141022	0.6875
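With two minimized objectives, there is a quick sanity check for the table above. Sort the rows by `reward_logloss`. The rows are mutually non-dominated exactly when `reward_nf` strictly decreases along that order, assuming no two rows share the same logloss. The sketch below copies the values from the table. The function name `is_pareto_front_2d` is hypothetical.

```python
# (model_index, reward_logloss, reward_nf) copied from the non-dominated rows above
rows = [
    (0, 0.217409, 0.625),
    (1, 0.218916, 0.4375),
    (2, 0.287458, 0.0),
    (3, 0.221304, 0.375),
    (4, 0.045741, 0.9375),
    (5, 0.141022, 0.6875),
]


def is_pareto_front_2d(rows):
    """Check that (logloss, nf) pairs are mutually non-dominated (both minimized).

    Assumes no two rows share the same logloss value.
    """
    by_logloss = sorted(rows, key=lambda r: r[1])
    nf_values = [r[2] for r in by_logloss]
    # every step to a worse logloss must buy a strictly smaller nf
    return all(a > b for a, b in zip(nf_values, nf_values[1:]))


order = [r[0] for r in sorted(rows, key=lambda r: r[1])]
print(order)                     # [4, 5, 0, 1, 3, 2]
print(is_pareto_front_2d(rows))  # True
```

The sorted order makes the trade-off explicit. Model 4 has the lowest logloss but the largest nf. Model 2 has nf = 0.0 and the highest logloss.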

4. Plot the Pareto front

We can pick a model from the Pareto plot according to the decision maker's preferences. The number next to each point is the index of the corresponding pipeline model.

[4]:

fig, ax = hyper_model.history.plot_best_trials()
fig.show()

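[Figure: Pareto front of the non-dominated trials, each point labeled with its pipeline model index (../_images/examples_62.RNSGAII_example_8_0.png)]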
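As an illustration of turning a preference into a choice, take a purely hypothetical rule: use at most half of the features (nf <= 0.5) and, among those models, take the one with the lowest logloss. The sketch below applies this rule to the non-dominated trials from the summary table. The function `pick_model` and the `max_nf` cut-off are assumptions for illustration.

```python
# (model_index, reward_logloss, reward_nf) from the non-dominated trials above
pareto = [
    (0, 0.217409, 0.625),
    (1, 0.218916, 0.4375),
    (2, 0.287458, 0.0),
    (3, 0.221304, 0.375),
    (4, 0.045741, 0.9375),
    (5, 0.141022, 0.6875),
]


def pick_model(front, max_nf):
    """Hypothetical preference rule: lowest logloss among models with nf <= max_nf."""
    feasible = [row for row in front if row[2] <= max_nf]
    if not feasible:
        raise ValueError(f"no model with nf <= {max_nf}")
    return min(feasible, key=lambda row: row[1])


print(pick_model(pareto, max_nf=0.5))  # (1, 0.218916, 0.4375)
print(pick_model(pareto, max_nf=1.0))  # (4, 0.045741, 0.9375)
```

This example produced six non-dominated trials and six pipelines. Assuming `model_index` lines up with the list returned by `experiment.run()`, the first choice would correspond to `estimators[1]`.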

5. Plot the population

[5]:

fig, ax = hyper_model.searcher.plot_population()
fig.show()

[Figure: searcher population plot (../_images/examples_62.RNSGAII_example_10_0.png)]

6. Evaluate the selected model

[6]:

print(f"Number of pipelines: {len(estimators)}")

pipeline_model = estimators[0]  # select the first pipeline model
X_test = df_test.copy()
y_test = X_test.pop('y')  # separate the target column from the features

preds = pipeline_model.predict(X_test)
proba = pipeline_model.predict_proba(X_test)

# compute classification metrics, treating "yes" as the positive class
tb.metrics.calc_score(y_test, preds, proba, metrics=['auc', 'accuracy', 'f1', 'recall', 'precision'], pos_label="yes")

Number of pipelines: 6

[6]:

{'auc': 0.8417038690476191,
 'accuracy': 0.855,
 'f1': 0.17142857142857143,
 'recall': 0.09375,
 'precision': 1.0}
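The threshold-based scores above fit together through a small confusion matrix. The 200-row test split follows from `test_size=0.2` on 1000 rows. The notebook does not print the confusion counts, so those are inferred. Recall = 0.09375 = 3/32 and precision = 1.0 suggest 3 true positives, 0 false positives and 32 positive (`yes`) rows, which leaves 168 true negatives for an accuracy of 171/200. AUC depends on the predicted probabilities and cannot be recovered this way. The sketch below recomputes the other four metrics from these assumed counts.

```python
# Counts inferred from the printed metrics (assumption; not printed by the notebook):
# 200 test rows, 32 of them positive ("yes").
tp, fp, fn, tn = 3, 0, 29, 168

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(precision, recall, accuracy)  # 1.0 0.09375 0.855
print(round(f1, 6))                 # 0.171429
```

Under these counts, the selected pipeline labels very few rows as `yes`. Its high accuracy therefore mostly reflects the majority class, and recall and F1 are the more informative scores here.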