Classification and Regression

This is how you can move from one part to another in the Build and Evaluate section:

  • Model Training ------> Model Optimization

  • Model Training ------> Model Tuning ------> Model Optimization

  • Model Training ------> Model Tuning ------> Model Ensembling ------> Model Optimization

Model Training

To train a model, start by choosing how you want to train it: default or custom. In the default option, the system uses all available algorithms for training, and you can exclude some of them. In custom mode, you select one estimator or a list of estimators from the lists below (a code sketch follows the two lists).

For Classification:

  • ‘lr’ - Logistic Regression

  • ‘knn’ - K Neighbors Classifier

  • ‘nb’ - Naive Bayes

  • ‘dt’ - Decision Tree Classifier

  • ‘svm’ - SVM - Linear Kernel

  • ‘rbfsvm’ - SVM - Radial Kernel

  • ‘gpc’ - Gaussian Process Classifier

  • ‘mlp’ - MLP Classifier

  • ‘ridge’ - Ridge Classifier

  • ‘rf’ - Random Forest Classifier

  • ‘qda’ - Quadratic Discriminant Analysis

  • ‘ada’ - Ada Boost Classifier

  • ‘gbc’ - Gradient Boosting Classifier

  • ‘lda’ - Linear Discriminant Analysis

  • ‘et’ - Extra Trees Classifier

  • ‘xgboost’ - Extreme Gradient Boosting

  • ‘lightgbm’ - Light Gradient Boosting Machine

  • ‘catboost’ - CatBoost Classifier

For Regression:

  • ‘lr’ - Linear Regression

  • ‘lasso’ - Lasso Regression

  • ‘ridge’ - Ridge Regression

  • ‘en’ - Elastic Net

  • ‘lar’ - Least Angle Regression

  • ‘llar’ - Lasso Least Angle Regression

  • ‘omp’ - Orthogonal Matching Pursuit

  • ‘br’ - Bayesian Ridge

  • ‘ard’ - Automatic Relevance Determination

  • ‘par’ - Passive Aggressive Regressor

  • ‘ransac’ - Random Sample Consensus

  • ‘tr’ - TheilSen Regressor

  • ‘huber’ - Huber Regressor

  • ‘kr’ - Kernel Ridge

  • ‘svm’ - Support Vector Regression

  • ‘knn’ - K Neighbors Regressor

  • ‘dt’ - Decision Tree Regressor

  • ‘rf’ - Random Forest Regressor

  • ‘et’ - Extra Trees Regressor

  • ‘ada’ - AdaBoost Regressor

  • ‘gbr’ - Gradient Boosting Regressor

  • ‘mlp’ - MLP Regressor

  • ‘xgboost’ - Extreme Gradient Boosting

  • ‘lightgbm’ - Light Gradient Boosting Machine

  • ‘catboost’ - CatBoost Regressor
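The estimator IDs above can also be used directly when scripting the same workflow. Here is a minimal sketch, assuming a PyCaret-style backend (the tool's actual engine is not named here, so the module, function names, and the my_dataframe/"label" placeholders are all assumptions; regression would use pycaret.regression analogously):

```python
# Hedged sketch assuming a PyCaret-style backend; names below are assumptions.
from pycaret.classification import setup, compare_models

# One-time session setup; my_dataframe and "label" are placeholders.
setup(data=my_dataframe, target="label", session_id=1)

# Default mode: train all available estimators, optionally excluding some.
best_default = compare_models(exclude=["gpc", "rbfsvm"])

# Custom mode: restrict training to an explicit list of estimator IDs.
best_custom = compare_models(include=["lr", "rf", "xgboost"])
```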

After that, choose the number of top N models to select. For example, to select the top 3 models, enter 3. In custom mode this number must be less than or equal to the number of selected estimators, and it must be greater than 1 if you plan to use the Stacking or Blending method later in the Model Ensembling section.

Set the sort order of the score grid according to one of the following evaluation metrics. For Classification: Accuracy, AUC, Recall, Precision, Kappa. For Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE. (Default = Accuracy for Classification, R2 for Regression.)

When cross validation is set to False, metrics are evaluated on the test set, and the fold parameter set up in the Data Processing section is ignored.

If Execution time is set to default, the system takes all the time it needs to process all available estimators. If set to custom, execution terminates after the specified number of minutes and results up to that point are returned. It is recommended to keep the default.

When Ignore estimators is set to True, estimators with longer training times are excluded.
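Continuing the same hedged sketch, the remaining training options (top-N selection, sort metric, cross validation, execution time, and ignoring slow estimators) map onto parameters like these:

```python
# Continuation of the sketch above; parameter names assume a PyCaret-style API.
from pycaret.classification import compare_models

top3 = compare_models(
    n_select=3,              # return the top 3 models instead of only the best one
    sort="AUC",              # sort the score grid by AUC instead of the default Accuracy
    cross_validation=False,  # evaluate on the test set; the fold setting is ignored
    budget_time=10,          # illustrative custom execution time of 10 minutes
    turbo=True,              # exclude estimators with longer training times
)
```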

When you complete your setup for model training, click Train and Evaluate. A progress bar pops up indicating the progress of the training. When done, the output is a list of trained models along with their evaluation metric scores; the model with the best score on the chosen sort metric appears first in the list.

The best model from training is saved under the file name Trained_<machine learning task>_Model_<sessionID>.pkl (e.g. Trained_Classification_Model_1.pkl with sessionID = 1).
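A .pkl saved this way can typically be reloaded with the same library. A hedged sketch, assuming PyCaret's save_model/load_model pair (the tool writes the file automatically; the explicit call here is only for illustration):

```python
from pycaret.classification import save_model, load_model

# Persist the best model under the naming scheme described above.
save_model(top3[0], "Trained_Classification_Model_1")  # ".pkl" is appended automatically

# Later, reload the fitted pipeline from disk.
model = load_model("Trained_Classification_Model_1")
```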

  • Model training is mandatory before you move on to the other sections in Build and Evaluate.

Model Tuning

After training your model, you can go directly to the optimization section; but if you want to experiment with more techniques and improve the model's accuracy, go to Model Tuning, where you can tune the hyperparameters of the trained models.

The first thing to do in the Model Tuning section is to set the number of iterations, or re-enter the default value, before you start tuning.

Enter the number of iterations for the grid search; note that increasing this number may improve model performance but also increases the training time. Then choose the evaluation metric for hyperparameter tuning from the following available metrics. For Classification: Accuracy, AUC, Recall, Precision, Kappa. For Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE. The model will be evaluated during the tuning process according to the metric you select.

Select which search library you are going to use for tuning hyperparameters; each one offers different values for the Search algorithm setting (a combined code sketch follows these options):

  • scikit-learn possible values:

    • random : random grid search (default)

    • grid : grid search

  • scikit-optimize possible values:

    • bayesian : Bayesian search (default)

  • tune-sklearn possible values:

    • random : random grid search (default)

    • grid : grid search

    • bayesian : Bayesian search

    • hyperopt : Tree-structured Parzen Estimator search

    • optuna : Tree-structured Parzen Estimator search

    • bohb : Bayesian Optimization HyperBand search

  • optuna possible values:

    • random : randomized search

    • tpe : Tree-structured Parzen Estimator search (default)

Use early stopping to stop fitting a hyperparameter configuration if it performs poorly. Early stopping is ignored when the search library is scikit-learn. It accepts the following values:

  • asha for Asynchronous Successive Halving Algorithm

  • hyperband for Hyperband

  • median for Median Stopping Rule

  • If set to False, early stopping is not used.

Choose whether to return the model with the better performance: if set to True, the returned object is always the better performer according to the selected evaluation metric.
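Putting the tuning options together, here is a minimal sketch under the same PyCaret-style assumption: n_iter is the number of search iterations, optimize the tuning metric, search_library/search_algorithm one of the pairs listed above, early_stopping one of the values just described, and choose_better the return-the-better-model switch.

```python
# Continuation of the earlier sketch; all parameter names assume a PyCaret-style API.
from pycaret.classification import tune_model

tuned = tune_model(
    top3[0],                     # one of the models returned by training
    n_iter=25,                   # number of search iterations
    optimize="AUC",              # evaluation metric for tuning
    search_library="tune-sklearn",
    search_algorithm="hyperopt", # one of the values listed for that library
    early_stopping="asha",       # Asynchronous Successive Halving
    choose_better=True,          # keep the original model if tuning does not beat it
)
```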

Click the Train and Evaluate button; a progress bar pops up showing the progress of the tuning process. The output is a table showing the best model from the training part first, followed by another table with the best model from tuning.

You may need to repeat the tuning process a few times to get better results.

The best model from tuning is saved under the file name Tuned_<machine learning task>_Model_<sessionID>.pkl (e.g. Tuned_Classification_Model_1.pkl with sessionID = 1).

Model Ensembling

Three ensembling methods are available: Ensembling, Blending, and Stacking.

Ensembling

After tuning you can go directly to Model Optimization, but if you want to experiment with ensembling, first choose the model you want to ensemble, either from the training part or from the tuning part; this is mandatory.

If you select the Ensembling method, the next step is to select how to ensemble the estimator(s): either Bagging or Boosting.

Bagging, also known as bootstrap aggregating, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression; it also reduces variance and helps to avoid overfitting. Boosting is an ensemble meta-algorithm primarily for reducing bias and variance in supervised learning; it belongs to the family of machine learning algorithms that convert weak learners into strong ones.

For both methods, you need to specify the number of base estimators in the ensemble. In case of a perfect fit, the learning procedure is stopped early.

When Choose to return the best performance is set to True, the returned object is always the better performer according to the evaluation metric you define.

Select the evaluation metric from the list, and click the Ensemble and Evaluate button.
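As a hedged sketch of this step under the same PyCaret-style assumption:

```python
# Continuation of the earlier sketch.
from pycaret.classification import ensemble_model

bagged = ensemble_model(
    tuned,                 # a model from the training or tuning part
    method="Bagging",      # or "Boosting"
    n_estimators=10,       # number of base estimators in the ensemble
    choose_better=True,    # return whichever object performs better
    optimize="AUC",        # evaluation metric used for that comparison
)
```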

The model from ensembling is saved under the file name Ensembled_<source>_<machine learning task>_Model_<sessionID>.pkl: Ensembled_Trained_Classification_Model_1.pkl (with sessionID = 1) if you ensembled models from the training section, or Ensembled_Tuned_Classification_Model_1.pkl if you ensembled tuned models.

Blending

The Blending method can be used only if the Number of top n models to select in the Model Training part is greater than 1.

Select the method of blending estimators: hard uses predicted class labels for majority-rule voting; soft predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers. The default value, auto, will try soft and fall back to hard if the former is not supported.

Select the evaluation metric. For Classification: Accuracy, AUC, Recall, Precision, Kappa. For Regression: MAE, MSE, RMSE, R2, RMSLE, MAPE. The model will be evaluated during the blending process according to the metric you select.

When Choose to return the best performance is set to True, the returned object is always the better performer according to the evaluation metric you define.
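Under the same assumption, blending the top-N models from training could look like this:

```python
# Continuation of the earlier sketch; requires more than one model in the list.
from pycaret.classification import blend_models

blended = blend_models(
    estimator_list=top3,  # the top-N models returned by training
    method="auto",        # "hard", "soft", or "auto" as described above
    choose_better=True,
    optimize="AUC",
)
```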

The model from blending is saved under the file name Blended_<source>_<machine learning task>_Model_<sessionID>.pkl: Blended_Trained_Classification_Model_1.pkl (with sessionID = 1) if you blended models from the training section, or Blended_Tuned_Classification_Model_1.pkl if you blended tuned models.

Stacking

For Stacking, select the evaluation metric and choose whether to return the best performance, or keep the defaults, then click Ensemble and Evaluate.
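A corresponding hedged sketch for Stacking:

```python
# Continuation of the earlier sketch; a meta-model learns from the base models' predictions.
from pycaret.classification import stack_models

stacked = stack_models(
    estimator_list=top3,  # the top-N models returned by training
    choose_better=True,
    optimize="AUC",
)
```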

The model from stacking is saved under the file name Stacked_<source>_<machine learning task>_Model_<sessionID>.pkl: Stacked_Trained_Classification_Model_1.pkl (with sessionID = 1) if you stacked models from the training section, or Stacked_Tuned_Classification_Model_1.pkl if you stacked tuned models.

Model Optimization

This section is mandatory to finalize your model and choose the best of the models built in the sections above.

Select whether you want to optimize using test data: when set to True, metrics are evaluated on the test set instead of cross-validation.
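In a PyCaret-style workflow (still an assumption), this step corresponds to picking the session's best model and refitting it on the full dataset before export:

```python
# Continuation of the earlier sketch.
from pycaret.classification import automl, finalize_model, save_model

# Pick the best model of the whole session; use_holdout=True evaluates
# the candidates on the test set instead of their cross-validation scores.
best = automl(optimize="AUC", use_holdout=True)

# Refit on the complete dataset and export it under the scheme described below.
final = finalize_model(best)
save_model(final, "Best_Trained_Classification_Model_1")
```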

The best model from optimization is saved under the file name Best_<machine learning task>_Model_<sessionID>.pkl (e.g. Best_Trained_Classification_Model_1.pkl with sessionID = 1).
