MOEA/D Searcher example

This example shows how to use MOEADSearcher for multi-objective optimization.

1. Import modules and prepare data

[1]:
from hypernets.core.random_state import set_random_state
set_random_state(1234)

from hypernets.utils import logging as hyn_logging
from hypernets.examples.plain_model import PlainModel, PlainSearchSpace

from hypergbm import make_experiment

from hypernets.tabular import get_tool_box
from hypernets.tabular.datasets import dsutils
from hypernets.tabular.sklearn_ex import MultiLabelEncoder


hyn_logging.set_level(hyn_logging.WARN)

df = dsutils.load_bank().head(1000)
tb = get_tool_box(df)
df_train, df_test = tb.train_test_split(df, test_size=0.2, random_state=9527)

2. Run an experiment with MOEADSearcher

[2]:
experiment = make_experiment(df_train,
                             eval_data=df_test.copy(),
                             callbacks=[],
                             random_state=1234,
                             search_callbacks=[],
                             target='y',
                             searcher='moead',  # available MOO searchers: moead, nsga2, rnsga2
                             reward_metric='logloss',
                             objectives=['nf'],  # 'nf': number of features (NumOfFeatures)
                             drift_detection=False,
                             early_stopping_rounds=30)

estimators = experiment.run(max_trials=30)
hyper_model = experiment.hyper_model_
hyper_model.searcher
[2]:
MOEADSearcher(objectives=[PredictionObjective(name=logloss, scorer=make_scorer(log_loss, needs_proba=True), direction=min), NumOfFeatures(name=nf, sample_size=1000, direction=min)], n_neighbors=2, recombination=SinglePointCrossOver(random_state=RandomState(MT19937)), mutation=SinglePointMutation(random_state=RandomState(MT19937), proba=0.7), population_size=6)

3. Summarize trials

[3]:
df_trials = hyper_model.history.to_df().copy().drop(['scores', 'reward'], axis=1)
df_trials[df_trials['non_dominated'] == True]
[3]:
    trial_no  succeeded   elapsed  non_dominated  model_index  reward_logloss  reward_nf
4          5       True  0.446323           True          0.0        0.217409     0.625
6          7       True  4.100305           True          1.0        0.537368     0.0
8          9       True  4.796208           True          2.0        0.253515     0.125
9         10       True  1.060251           True          3.0        0.246395     0.5625
22        30       True  0.366623           True          4.0        0.177716     0.75
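
To inspect the trade-off numerically, the Pareto-optimal trials can also be ordered by one of the objectives, for example by logloss (a minimal sketch over the df_trials DataFrame built above):

# Sketch: order the Pareto-optimal trials by logloss (lower is better)
pareto = df_trials[df_trials['non_dominated'] == True]
pareto.sort_values('reward_logloss')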

4. Plot Pareto front

We can pick a model according to the decision maker's preferences from the Pareto plot; the numbers in the figure indicate the indices of the pipeline models.

[4]:
fig, ax = hyper_model.history.plot_best_trials()
fig.show()
../_images/examples_63.MOEAD_example_8_0.png
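
The model_index column of the trial history ties the plotted numbers to the pipelines returned by experiment.run. A minimal sketch, assuming model_index values map to positions in the estimators list:

# Sketch: pick the pipeline behind one of the plotted Pareto-optimal points
df_best = hyper_model.history.to_df()
df_best = df_best[df_best['non_dominated'] == True]
picked = estimators[int(df_best['model_index'].iloc[0])]  # the model labeled "0" in the figure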

5. Plot population

[5]:
fig, ax = hyper_model.searcher.plot_population()
fig.show()
../_images/examples_63.MOEAD_example_10_0.png

6. Evaluate the selected model

[6]:
print(f"Number of pipeline: {len(estimators)} ")

pipeline_model = estimators[0]  # select the first pipeline model
X_test = df_test.copy()
y_test = X_test.pop('y')

preds = pipeline_model.predict(X_test)
proba = pipeline_model.predict_proba(X_test)

tb.metrics.calc_score(y_test, preds, proba, metrics=['auc', 'accuracy', 'f1', 'recall', 'precision'], pos_label="yes")
Number of pipelines: 5
[6]:
{'auc': 0.8417038690476191,
 'accuracy': 0.855,
 'f1': 0.17142857142857143,
 'recall': 0.09375,
 'precision': 1.0}
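
If no single preference stands out, every returned pipeline can be scored on the hold-out set for comparison (a sketch reusing the calls from the cell above):

# Sketch: score each Pareto-optimal pipeline on the hold-out set
for i, model in enumerate(estimators):
    scores = tb.metrics.calc_score(y_test, model.predict(X_test),
                                   model.predict_proba(X_test),
                                   metrics=['auc', 'accuracy'], pos_label="yes")
    print(f"pipeline {i}: {scores}")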

7. Automatically convert metrics to negatives for minimization

The MOO searchers minimize every objective, so metrics that are normally maximized (such as accuracy and precision) are automatically negated in the trial history, as the next cell shows.

[7]:
experiment = make_experiment(df_train,
                             eval_data=df_test.copy(),
                             callbacks=[],
                             random_state=1234,
                             search_callbacks=[],
                             target='y',
                             pos_label="yes",
                             searcher='moead',
                             reward_metric='accuracy',
                             objectives=['precision'],
                             drift_detection=False,
                             early_stopping_rounds=30)

estimators = experiment.run(max_trials=30)
hyper_model = experiment.hyper_model_
hyper_model.history.to_df().copy().drop(['scores', 'reward'], axis=1)[:5]
[7]:
   trial_no  succeeded   elapsed  non_dominated  model_index  reward_accuracy  reward_precision
0         1       True  0.645476          False          NaN            -0.91         -0.777778
1         2       True  0.752835          False          NaN          -0.8975         -0.0
2         3       True  0.779298          False          NaN          -0.8975         -0.0
3         4       True  0.737511          False          NaN           -0.905         -0.625
4         5       True  0.498275          False          NaN         -0.90625         -0.733333
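
Because both rewards are stored negated, the original metric values can be recovered by flipping the sign (a minimal sketch over the history DataFrame shown above):

# Sketch: recover the positive metric values from the negated rewards
df_m = hyper_model.history.to_df().copy()
df_m['accuracy'] = -df_m['reward_accuracy']
df_m['precision'] = -df_m['reward_precision']
df_m[['trial_no', 'accuracy', 'precision']][:5]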