Classification and Regression
This is how you can move from one part to another in the Build and Evaluate section.
Model Training ------> Model Optimization
Model Training ------> Model Tuning ------> Model Optimization
Model Training ------> Model Tuning ------> Model Ensembling ------> Model Optimization
Model Training
To train the model, start by choosing how you want to train it: default or custom. In the default option, the system uses all available algorithms for training, and you can exclude some of them. In custom mode, you select one estimator or a list of estimators from the following lists (a code sketch follows the lists below).
For Classification:
‘lr’ - Logistic Regression
‘knn’ - K Neighbors Classifier
‘nb’ - Naive Bayes
‘dt’ - Decision Tree Classifier
‘svm’ - SVM - Linear Kernel
‘rbfsvm’ - SVM - Radial Kernel
‘gpc’ - Gaussian Process Classifier
‘mlp’ - MLP Classifier
‘ridge’ - Ridge Classifier
‘rf’ - Random Forest Classifier
‘qda’ - Quadratic Discriminant Analysis
‘ada’ - Ada Boost Classifier
‘gbc’ - Gradient Boosting Classifier
‘lda’ - Linear Discriminant Analysis
‘et’ - Extra Trees Classifier
‘xgboost’ - Extreme Gradient Boosting
‘lightgbm’ - Light Gradient Boosting Machine
‘catboost’ - CatBoost Classifier
For Regression:
‘lr’ - Linear Regression
‘lasso’ - Lasso Regression
‘ridge’ - Ridge Regression
‘en’ - Elastic Net
‘lar’ - Least Angle Regression
‘llar’ - Lasso Least Angle Regression
‘omp’ - Orthogonal Matching Pursuit
‘br’ - Bayesian Ridge
‘ard’ - Automatic Relevance Determination
‘par’ - Passive Aggressive Regressor
‘ransac’ - Random Sample Consensus
‘tr’ - TheilSen Regressor
‘huber’ - Huber Regressor
‘kr’ - Kernel Ridge
‘svm’ - Support Vector Regression
‘knn’ - K Neighbors Regressor
‘dt’ - Decision Tree Regressor
‘rf’ - Random Forest Regressor
‘et’ - Extra Trees Regressor
‘ada’ - AdaBoost Regressor
‘gbr’ - Gradient Boosting Regressor
‘mlp’ - MLP Regressor
‘xgboost’ - Extreme Gradient Boosting
‘lightgbm’ - Light Gradient Boosting Machine
‘catboost’ - CatBoost Regressor
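The estimator IDs above match the short codes used by PyCaret's compare_models function. Assuming the tool wraps a PyCaret-style workflow (an assumption, not something this guide states), the default and custom training modes correspond roughly to the exclude and include arguments in this minimal sketch:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, compare_models
from pycaret.datasets import get_data

data = get_data("juice")                         # sample dataset shipped with PyCaret
setup(data=data, target="Purchase", session_id=1)

# Default mode: use every available algorithm, optionally excluding a few.
best_default = compare_models(exclude=["gpc", "rbfsvm"])

# Custom mode: restrict training to an explicit list of estimators.
best_custom = compare_models(include=["lr", "knn", "rf"])
```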
After that, choose the Number of top n models to select. For example, to select the top 3 models, enter 3. In custom mode, this number must be less than or equal to the number of selected estimators, and it must be greater than 1 if you plan to use the Stacking or Blending method later in the Model Ensembling section.
Set the sort order of the score grid according to one of the following evaluation metrics. For Classification: Accuracy, AUC, Recall, Precision, Kappa. For Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE (default = Accuracy for Classification, R2 for Regression).
When cross validation is set to False, metrics are evaluated on the test set, and the fold parameter set up in the Data Processing section is ignored.
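Under the same PyCaret-style assumption as the previous sketch, the top-n, sort-metric, and cross-validation options would map to the n_select, sort, and cross_validation arguments (again an assumption about the underlying library, not a description of this tool's internals):

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, compare_models
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)

# Keep the top 3 models, ranked by AUC instead of the default Accuracy.
top3 = compare_models(n_select=3, sort="AUC")

# With cross_validation=False the fold setting is ignored and the metrics
# in the score grid are computed on the hold-out test set instead.
best_no_cv = compare_models(cross_validation=False)
```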
If Execution time is set to default, the system takes all the time it needs to process all selected estimators. If set to custom, execution terminates after the specified number of minutes have passed and the results obtained up to that point are returned. It is recommended to keep the default.
When Ignore estimators is set to True, estimators with longer training times are excluded.
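Assuming the same PyCaret-style backend, the custom Execution time and Ignore estimators switches would correspond to a time budget in minutes and a turbo flag; regression works the same way as classification, only the module changes:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.regression import setup, compare_models
from pycaret.datasets import get_data

setup(data=get_data("insurance"), target="charges", session_id=1)

# Custom execution time: stop comparing models after about 2 minutes
# and return the results obtained so far.
best_budgeted = compare_models(budget_time=2)

# turbo=True ignores estimators with longer training times.
best_fast = compare_models(turbo=True)
```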
When you have completed your setup for model training, click Train and Evaluate. A progress bar pops up indicating the progress of the training. When it finishes, the output is a list of trained models along with their evaluation metric scores; the model with the best score on the selected sort metric appears first in the list.
The best model from training is saved under the file name Trained_&lt;machine learning task&gt;_Model_&lt;sessionID&gt;.pkl (e.g. Trained_Classification_Model_1.pkl with sessionID=1).
Model training is mandatory before you move on to the other sections of Build and Evaluate.
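Under the same assumption, the ranked score grid and the saved .pkl file could be reproduced roughly as below. The file name is the one documented above; save_model is a PyCaret function, and whether the tool calls it internally is an assumption.

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, compare_models, save_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)

best = compare_models(sort="Accuracy")   # score grid is ranked by the sort metric

# File name follows the convention described above: Trained_<task>_Model_<sessionID>.pkl
save_model(best, "Trained_Classification_Model_1")
```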
Model Tuning
After training your model, you can go directly to the Model Optimization section, but if you want to experiment with more techniques and improve the accuracy of your model, go to Model Tuning, where you can tune the hyperparameters of the trained models.
The first thing to do in the Model Tuning section is to set the Number of iterations, or re-enter the default value, before you start tuning. This is the number of iterations in the grid search; increasing it may improve model performance but also increases training time. Then choose the Evaluation metric for hyperparameter tuning from the available metrics. For Classification: Accuracy, AUC, Recall, Precision, Kappa. For Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE. The model is evaluated during the tuning process according to the metric you select.
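If the backend is PyCaret-like (an assumption), the number of iterations and the tuning metric would map to tune_model's n_iter and optimize arguments:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, create_model, tune_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)
dt = create_model("dt")                  # a trained model to tune

# 50 search iterations instead of the default 10, scored by AUC.
tuned_dt = tune_model(dt, n_iter=50, optimize="AUC")
```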
Select which Search library for tuning hyperparameters you are going to use; each library supports different values for the Search algorithm for parameters field, as listed below (a code sketch follows the list).
scikit-learn - possible values: random (random grid search, default), grid (grid search)
scikit-optimize - possible values: bayesian (Bayesian search, default)
tune-sklearn - possible values: random (random grid search, default), grid (grid search), bayesian, hyperopt, optuna, bohb
optuna - possible values: random (randomized search), tpe (Tree-structured Parzen Estimator search, default)
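Under the same PyCaret-style assumption, these pairs would be passed as search_library and search_algorithm; a sketch of a few valid combinations (each non-default search library must be installed separately):

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, create_model, tune_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)
rf = create_model("rf")

# scikit-learn (default library): random or grid search
tuned = tune_model(rf, search_library="scikit-learn", search_algorithm="random")

# scikit-optimize: Bayesian search
tuned = tune_model(rf, search_library="scikit-optimize", search_algorithm="bayesian")

# optuna: Tree-structured Parzen Estimator search
tuned = tune_model(rf, search_library="optuna", search_algorithm="tpe")
```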
Use early stopping to stop fitting a hyperparameter configuration if it performs poorly. Early stopping is ignored when the search library is scikit-learn. It accepts the following values:
asha - Asynchronous Successive Halving Algorithm
hyperband - Hyperband
median - Median Stopping Rule
If set to False, early stopping is not used.
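Assuming the same backend, early stopping only applies to the non-scikit-learn search libraries; a sketch with the ASHA scheduler:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
# early_stopping is ignored when search_library is 'scikit-learn'.
from pycaret.classification import setup, create_model, tune_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)
lightgbm = create_model("lightgbm")

# Stop poorly performing configurations early with the ASHA scheduler.
tuned = tune_model(
    lightgbm,
    search_library="tune-sklearn",
    search_algorithm="random",
    early_stopping="asha",
    n_iter=25,
)
```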
Choose the model with better performance: if set to True, the returned model is always the better-performing one according to the evaluation metric.
Click the Train and Evaluate button; a progress bar pops up showing the progress of the tuning process. The output is a table showing the best model from the training part first, followed by another table with the best model from tuning.
You may need to repeat the tuning process a few times to get better results.
The best model from tuning is saved under the file name Tuned_&lt;machine learning task&gt;_Model_&lt;sessionID&gt;.pkl (e.g. Tuned_Classification_Model_1.pkl with sessionID=1).
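Still under the PyCaret-style assumption, "Choose the model with better performance" corresponds to choose_better, and the tuned model can be saved under the documented file name:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, create_model, tune_model, save_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)
rf = create_model("rf")

# choose_better=True returns the original model if tuning did not improve the metric.
tuned_rf = tune_model(rf, optimize="Accuracy", choose_better=True)

# File name follows the convention above: Tuned_<task>_Model_<sessionID>.pkl
save_model(tuned_rf, "Tuned_Classification_Model_1")
```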
Model Ensembling
Three ensembling methods are available: Ensembling, Blending, and Stacking.
Ensembling
After tuning, you can go directly to Model Optimization, but if you want to experiment with ensembling, first choose the model you want to ensemble, either from the training part or the tuning part; this is mandatory.
If you select the Ensembling method, the next step is to choose how the estimator(s) are ensembled: either Bagging or Boosting.
Bagging, also known as bootstrap aggregating, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression; it also reduces variance and helps to avoid overfitting. Boosting is an ensemble meta-algorithm primarily aimed at reducing bias and variance in supervised learning; it belongs to the family of machine learning algorithms that convert weak learners into strong ones.
For both methods, you need to specify the Number of base estimators in the ensemble. In case of a perfect fit, the learning procedure is stopped early.
When Choose to return the best performance is set to True, the returned model is always the better-performing one according to the evaluation metric you define.
Select the evaluation metric from the list and click the Ensemble and Evaluate button.
The model from Ensembling is saved under the file name Ensembled_&lt;machine learning task&gt;_Model_&lt;sessionID&gt;.pkl: e.g. Ensembled_Trained_Classification_Model_1.pkl (with sessionID=1) if you ensembled a model from the training section, or Ensembled_Tuned_Classification_Model_1.pkl if you ensembled a tuned model.
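Assuming the same PyCaret-style backend, Bagging/Boosting ensembling with a chosen number of base estimators and the "return best" switch would look roughly like this:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, create_model, ensemble_model, save_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)
dt = create_model("dt")                  # model chosen from training or tuning

# Bagging with 50 base estimators; choose_better keeps the original model
# if the ensemble does not improve the selected metric.
bagged_dt = ensemble_model(dt, method="Bagging", n_estimators=50,
                           choose_better=True, optimize="AUC")

# Boosting uses the same call with method="Boosting".
boosted_dt = ensemble_model(dt, method="Boosting", n_estimators=50)

save_model(bagged_dt, "Ensembled_Trained_Classification_Model_1")
```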
Blending
The Blending method can be used only if the Number of top n models to select in the Model Training part is greater than 1.
Select the method of blending the estimators: hard uses predicted class labels for majority-rule voting; soft predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. The default value, auto, tries to use soft and falls back to hard if the former is not supported.
Select the evaluation metric. For Classification: Accuracy, AUC, Recall, Precision, Kappa. For Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE. The model is evaluated during the blending process according to the metric you have selected.
When Choose to return the best performance is set to True, the returned model is always the better-performing one according to the evaluation metric you define.
The model from Blending is saved under the file name Blended_&lt;machine learning task&gt;_Model_&lt;sessionID&gt;.pkl: e.g. Blended_Trained_Classification_Model_1.pkl (with sessionID=1) if you blended models from the training section, or Blended_Tuned_Classification_Model_1.pkl if you blended tuned models.
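Under the same assumption, blending takes the list of top-n models from training (n greater than 1) and a hard/soft/auto voting method:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, compare_models, blend_models, save_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)

# Blending needs more than one model, hence n_select > 1 in Model Training.
top3 = compare_models(n_select=3)

# method can be "auto", "soft", or "hard" as described above.
blender = blend_models(top3, method="auto", optimize="Accuracy", choose_better=True)

save_model(blender, "Blended_Trained_Classification_Model_1")
```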
Stacking
For Stacking, select the evaluation metric and choose whether to return the best performance, or keep the defaults. Then click Ensemble and Evaluate.
The model from Stacking is saved under the file name Stacked_&lt;machine learning task&gt;_Model_&lt;sessionID&gt;.pkl: e.g. Stacked_Trained_Classification_Model_1.pkl (with sessionID=1) if you stacked models from the training section, or Stacked_Tuned_Classification_Model_1.pkl if you stacked tuned models.
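Stacking, under the same PyCaret-style assumption, also takes the list of top-n models and trains a meta-model on their predictions:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, compare_models, stack_models, save_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)

top3 = compare_models(n_select=3)

# A meta-model (logistic regression by default) is trained on the base models' predictions.
stacker = stack_models(top3, optimize="Accuracy", choose_better=True)

save_model(stacker, "Stacked_Trained_Classification_Model_1")
```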
Model Optimization
This section is mandatory to finalize your model and choose the best of the models built in the sections above.
Select whether you want to optimize using test data. When set to True, metrics are evaluated on the test set instead of cross-validation.
The model from Optimization is saved under the file name Best_&lt;machine learning task&gt;_Model_&lt;sessionID&gt;.pkl (e.g. Best_Trained_Classification_Model_1.pkl with sessionID=1).
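Assuming the same backend, picking the best candidate of the session (optionally ranked on the test set) and finalizing it could look like this; automl, finalize_model, and save_model are PyCaret functions, and whether the tool uses them internally is an assumption:

```python
# Minimal sketch, assuming a PyCaret-style backend (not confirmed by this guide).
from pycaret.classification import setup, compare_models, automl, finalize_model, save_model
from pycaret.datasets import get_data

setup(data=get_data("juice"), target="Purchase", session_id=1)
compare_models(n_select=3)

# use_holdout=True ranks the session's models on the test set instead of CV scores.
best = automl(optimize="Accuracy", use_holdout=True)

# finalize_model refits the chosen model on the full dataset before saving it.
final_best = finalize_model(best)
save_model(final_best, "Best_Trained_Classification_Model_1")
```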