Clustering

Once you have set all the parameters needed for clustering, including data processing, you are ready to start training your model.

Train with a customised number of clusters

Use this section to train a model when the number of clusters is either set manually or determined automatically by one of the following algorithms:

Mean Shift Clustering
Density-Based Spatial Clustering (DBSCAN)
OPTICS Clustering
Affinity Propagation
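To illustrate what "determined automatically" means for a density-based algorithm, here is a minimal pure-Python sketch of DBSCAN. It is not the implementation used by the tool; the function name, the `eps` and `min_samples` values, and the sample points are assumptions chosen for illustration. The number of clusters is never passed in: it falls out of the data and the density settings.

```python
from math import dist

def dbscan(points, eps, min_samples):
    """Return one label per point; -1 marks noise. Illustrative sketch only."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is also a core point: keep expanding
    return labels

# Hypothetical data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
n_clusters = len(set(labels) - {-1})
print(labels, n_clusters)
```

Changing `eps` or `min_samples` changes how many clusters are found, which is why these algorithms do not ask for a cluster count up front.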

Step 1: Click the Train and Evaluate button. You get the following output:

Step 2: To assign clusters to your data, click Assign clusters. You get the following output:

The model will be saved as Clustering_Model_2.pkl
The assigned data will be saved as Predicted_data_Assigned_2.csv

In this case, 2 is the session ID number.
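As a rough sketch of how these outputs could be inspected afterwards, the snippet below builds the two file names from a session ID and counts rows per cluster in the assigned-data CSV. Both helper names are hypothetical, and the cluster column name `"Cluster"` is an assumption: check the header of your own file before relying on it.

```python
import csv
from collections import Counter

def output_names(session_id):
    """File names for the basic training flow, following the pattern above."""
    return (f"Clustering_Model_{session_id}.pkl",
            f"Predicted_data_Assigned_{session_id}.csv")

def cluster_sizes(csv_path, cluster_column="Cluster"):
    """Count rows per cluster label in the assigned-data CSV.

    The default column name is an assumption, not something the tool
    is known to guarantee.
    """
    with open(csv_path, newline="") as handle:
        return Counter(row[cluster_column] for row in csv.DictReader(handle))

model_name, data_name = output_names(2)
print(model_name, data_name)
```

Calling `cluster_sizes(data_name)` from the folder that holds the exported file would then give a quick check that every cluster received some rows.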

Retrain using the elbow method to determine an optimal number of clusters

Use this section to find an optimal number of clusters with the elbow method. Click the Get an optimal number of clusters button to produce the elbow graph.

From the image above, we can see that k = 4 is the optimal number of clusters.

Enter the "elbow at k" value shown in the image above (4 in our case) and click the Step 1 : Retrain and Evaluate button.
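The idea behind the elbow graph can be sketched in plain Python: fit k-means for several values of k, record the within-cluster sum of squares (inertia), and look for the point where the curve bends. This is an illustration under stated assumptions, not the tool's own code: the deterministic farthest-point seeding, the "largest gap below the chord" elbow rule, and the sample data are all choices made for this example.

```python
from math import dist

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centres chosen so far."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    return centres

def kmeans_inertia(points, k, iterations=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centres = farthest_point_init(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centres[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep it if the group is empty.
        new_centres = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sum(min(dist(p, c) for c in centres) ** 2 for p in points)

def elbow_k(ks, inertias):
    """Pick the k whose point lies farthest below the straight line joining
    the first and last points of the normalised inertia curve.
    Assumes the first and last inertia values differ."""
    span_k = ks[-1] - ks[0]
    span_i = inertias[0] - inertias[-1]

    def gap(k, inertia):
        x = (k - ks[0]) / span_k
        y = (inertia - inertias[-1]) / span_i
        return (1 - x) - y

    return max(zip(ks, inertias), key=lambda pair: gap(*pair))[0]

# Hypothetical data: four well-separated groups of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blobs = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx, dy in offsets]

ks = list(range(1, 7))
inertias = [kmeans_inertia(points, k) for k in ks]
best = elbow_k(ks, inertias)
print(best)  # 4 for these four well-separated groups
```

On real data the bend is often less sharp than in this toy example, so it is worth looking at the graph itself rather than trusting a single number.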

The model will be saved as Clustering_Model_2_elbow.pkl
The assigned data will be saved as Predicted_data_2_elbow.csv

In this case, 2 is the session ID number.

Train and tune the number of clusters with data containing a labelled target column

You can use this section if you already have labelled data and want to tune the number of clusters.

Only the following models can be tuned:

K-Means Clustering
Spectral Clustering
Agglomerative Clustering
Birch Clustering
K-Modes Clustering

If you have not selected one of the models above, you have to restart the experiment from the beginning: choose a suitable model, process the data, then come back to the training section. If you did choose one of them, go ahead and fill in the following multiselect boxes.

Select the target column containing labels : Name of the target column containing labels.

Select type of task (Automatically inferred when None): Choose from the list

If Classification:

Logistic Regression (Default)
K Nearest Neighbour
Naive Bayes
Decision Tree Classifier
SVM - Linear Kernel
SVM - Radial Kernel
Gaussian Process Classifier
Multi Level Perceptron
Ridge Classifier
Random Forest Classifier
Quadratic Discriminant Analysis
Ada Boost Classifier
Gradient Boosting Classifier
Linear Discriminant Analysis
Extra Trees Classifier
Extreme Gradient Boosting
Light Gradient Boosting
CatBoost Classifier

If Regression:

Linear Regression (Default)
Lasso Regression
Ridge Regression
Elastic Net
Least Angle Regression
Lasso Least Angle Regression
Orthogonal Matching Pursuit
Bayesian Ridge
Automatic Relevance Determ.
Passive Aggressive Regressor
Random Sample Consensus
TheilSen Regressor
Huber Regressor
Kernel Ridge
Support Vector Machine
K Neighbors Regressor
Decision Tree
Random Forest
Extra Trees Regressor
AdaBoost Regressor
Gradient Boosting
Multi Level Perceptron
Extreme Gradient Boosting
Light Gradient Boosting
CatBoost Regressor

Select the evaluation metric: For classification tasks: Accuracy, AUC, Recall, Precision, F1, Kappa (default = ‘Accuracy’). For regression tasks: MAE, MSE, RMSE, R2, RMSLE, MAPE (default = ‘R2’).
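For reference, a few of these metrics can be written out directly from their standard definitions. The sketch below covers Accuracy for classification and MAE, MSE, RMSE and R2 for regression; the function names and sample values are made up for illustration, and the tool computes its metrics internally.

```python
from math import sqrt

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R2 from their standard definitions.
    Assumes y_true is not constant (otherwise R2 is undefined)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variance
    ss_res = sum(e * e for e in errors)                  # unexplained variance
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": 1 - ss_res / ss_tot}

# Made-up values, for illustration only.
metrics = regression_metrics([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])
print(metrics)
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))
```

Note the direction of each metric: Accuracy and R2 are better when higher, while MAE, MSE and RMSE are better when lower.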

Select the type of your custom grid : By default, a pre-defined list of cluster counts is iterated over to optimize the supervised objective. To override the default, pass your own list of cluster counts in the Select a list of number of clusters to iterate over multiselect box.

Select the number of folds to be used in cross validation : Number of folds to be used in Kfold CV. Must be at least 2.
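As a reminder of what the fold count controls, here is a minimal sketch of K-fold splitting: the rows are divided into K parts, and each part takes one turn as the held-out test set. The helper name is hypothetical, and this version does not shuffle the rows, which the tool may well do.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for plain, unshuffled K-fold CV."""
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    if n_folds > n_samples:
        raise ValueError("n_folds cannot exceed the number of samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional row each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

folds = list(kfold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

With a single fold there would be no training rows left once the test part is held out, which is why the minimum is 2.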

Click the Step 1 : Tune num_clusters and Evaluate button to get the following output:

In this output, the index is the number of clusters, and 4 clusters is the best choice for this specific data.

Click Step 2 to assign clusters; you will get the following output:
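Reading such a results table comes down to picking the row with the best score, keeping in mind that "best" means highest for some metrics and lowest for others. The sketch below shows that selection step only; the function name and the scores are invented for illustration and are not output from the tool.

```python
# Metrics from the list above, grouped by the direction that counts as better.
HIGHER_IS_BETTER = {"Accuracy", "AUC", "Recall", "Precision", "F1", "Kappa", "R2"}
LOWER_IS_BETTER = {"MAE", "MSE", "RMSE", "RMSLE", "MAPE"}

def best_num_clusters(scores_by_k, metric):
    """Return the number of clusters with the best score for the given metric."""
    if metric in HIGHER_IS_BETTER:
        return max(scores_by_k, key=scores_by_k.get)
    if metric in LOWER_IS_BETTER:
        return min(scores_by_k, key=scores_by_k.get)
    raise ValueError(f"unknown metric: {metric}")

# Invented scores: number of clusters -> cross-validated Accuracy.
accuracy_by_k = {2: 0.81, 3: 0.84, 4: 0.90, 5: 0.88}
print(best_num_clusters(accuracy_by_k, "Accuracy"))
```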

The model will be saved as Clustering_Model_2_tuned.pkl
All your assigned data will be saved as Predicted_data_2_tuned.pkl

In this case, 2 is the session ID number.
