RNSGA-II Searcher example

This example shows how to use RNSGAIISearcher for multi-objective optimization.

1. Import modules and prepare data

[1]:
from hypernets.core.random_state import set_random_state
set_random_state(1234)



from hypernets.utils import logging as hyn_logging
from hypernets.examples.plain_model import PlainModel, PlainSearchSpace
from hypergbm import make_experiment

from hypernets.tabular import get_tool_box
from hypernets.tabular.datasets import dsutils
from hypernets.tabular.sklearn_ex import MultiLabelEncoder

#hyn_logging.set_level(hyn_logging.WARN)
df = dsutils.load_bank().head(1000)
tb = get_tool_box(df)
df_train, df_test = tb.train_test_split(df, test_size=0.2, random_state=9527)
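As an optional check (not part of the original notebook), the prepared data can be inspected with plain pandas before running the experiment; the bank dataset's target column is 'y', as used below:

# optional sanity check: class balance of the target and split sizes
print(df['y'].value_counts())
print(df_train.shape, df_test.shape)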

2. Run an experiment with RNSGAIISearcher

[2]:
import numpy as np


experiment = make_experiment(df_train,
                             eval_data=df_test.copy(),
                             callbacks=[],
                             random_state=1234,
                             search_callbacks=[],
                             target='y',
                             searcher='rnsga2',  # available MOO searchers: moead, nsga2, rnsga2
                             searcher_options=dict(ref_point=np.array([0.1, 2]), weights=np.array([0.1, 2]), population_size=10),
                             reward_metric='logloss',
                             objectives=['nf'],
                             early_stopping_rounds=30,
                             drift_detection=False)

estimators = experiment.run(max_trials=30)
hyper_model = experiment.hyper_model_
hyper_model.searcher
[2]:
RNSGAIISearcher(objectives=[PredictionObjective(name=logloss, scorer=make_scorer(log_loss, needs_proba=True), direction=min), NumOfFeatures(name=nf, sample_size=1000, direction=min)], recombination=SinglePointCrossOver(random_state=RandomState(MT19937)), mutation=SinglePointMutation(random_state=RandomState(MT19937), proba=0.7), survival=_RDominanceSurvival(ref_point=[0.1 2. ], weights=[0.1 2. ], threshold=0.3, random_state=RandomState(MT19937)), random_state=RandomState(MT19937))
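The ref_point and weights passed through searcher_options are aligned with the two minimized objectives: the 'logloss' reward metric and the 'nf' (number of features) objective. As a minimal sketch (assuming the plain NSGA-II searcher accepts population_size in the same way), switching searchers only requires changing the searcher argument and its options; this snippet is illustrative and was not run in the original notebook:

# sketch: the same experiment with the plain NSGA-II searcher,
# which does not need a reference point or weights
experiment_nsga2 = make_experiment(df_train,
                                   eval_data=df_test.copy(),
                                   target='y',
                                   searcher='nsga2',
                                   searcher_options=dict(population_size=10),
                                   reward_metric='logloss',
                                   objectives=['nf'],
                                   drift_detection=False)
# estimators_nsga2 = experiment_nsga2.run(max_trials=30)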

3. Summarize trials

[3]:
df_trials = hyper_model.history.to_df().copy().drop(['scores', 'reward'], axis=1)
df_trials[df_trials['non_dominated'] == True]
[3]:
    trial_no  succeeded   elapsed  non_dominated  model_index  reward_logloss  reward_nf
 4         5       True  0.386380           True          0.0        0.217409      0.625
 9        10       True  0.798336           True          1.0        0.218916     0.4375
11        12       True  0.682108           True          2.0        0.287458        0.0
14        15       True  0.455569           True          3.0        0.221304      0.375
26        28       True  0.568367           True          4.0        0.045741     0.9375
28        30       True  0.407018           True          5.0        0.141022     0.6875
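To rank the non-dominated trials by a single objective, ordinary pandas operations on the same DataFrame are enough (column names as shown in the table above):

# rank Pareto-optimal trials by the logloss objective (ascending, since it is minimized)
pareto_df = df_trials[df_trials['non_dominated'] == True]
pareto_df.sort_values('reward_logloss')[['model_index', 'reward_logloss', 'reward_nf']]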

4. Plot Pareto front

We can pick a model according to the decision maker's preferences from the Pareto plot; the numbers in the figure indicate the indices of the pipeline models.

[4]:
fig, ax = hyper_model.history.plot_best_trials()
fig.show()
../_images/examples_62.RNSGAII_example_8_0.png
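The model_index read off the figure maps to a position in the estimators list returned by experiment.run. For example (the index 2 here is only a hypothetical choice for illustration, not part of the original notebook):

# pick the pipeline whose index was read from the Pareto-front figure
chosen_index = 2  # hypothetical choice based on the plot
chosen_model = estimators[chosen_index]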

5. Plot population

[5]:
fig, ax = hyper_model.searcher.plot_population()
fig.show()
../_images/examples_62.RNSGAII_example_10_0.png

6. Evaluate the selected model

[6]:
print(f"Number of pipeline: {len(estimators)} ")

pipeline_model = estimators[0]  # select the first pipeline model
X_test = df_test.copy()
y_test = X_test.pop('y')

preds = pipeline_model.predict(X_test)
proba = pipeline_model.predict_proba(X_test)

tb.metrics.calc_score(y_test, preds, proba, metrics=['auc', 'accuracy', 'f1', 'recall', 'precision'], pos_label="yes")
Number of pipelines: 6
[6]:
{'auc': 0.8417038690476191,
 'accuracy': 0.855,
 'f1': 0.17142857142857143,
 'recall': 0.09375,
 'precision': 1.0}
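
To compare all returned pipelines rather than only the first, the same scoring call can be wrapped in a loop (a sketch reusing the objects defined above):

# score every returned pipeline on the hold-out set
for i, model in enumerate(estimators):
    preds = model.predict(X_test)
    proba = model.predict_proba(X_test)
    scores = tb.metrics.calc_score(y_test, preds, proba,
                                   metrics=['auc', 'accuracy', 'f1'],
                                   pos_label="yes")
    print(i, scores)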