Clustering
Once you have set all the parameters needed for clustering, including data processing, you are ready to start training your model.
Use this section to train a model when the number of clusters is already fixed, either because you set it manually or because the algorithm determines it automatically. The following algorithms determine it automatically:
Mean Shift Clustering
Density Based Spatial Clustering
OPTICS Clustering
Affinity Propagation
Step 1: Click the Train and Evaluate button. You will get the following output:
Step 2: To assign clusters to your data, click the Assign clusters button. You will get the following output:
The model will be saved as Clustering_Model_2.pkl
The assigned data will be saved as Predicted_data_Assigned_2.csv
In this case, 2 is the session ID number.
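The two output files follow a naming pattern built from the session ID. As a rough sketch of what such a save step could look like, the Python snippet below writes a stand-in model with pickle and the assigned rows with csv, then reads both back. The helper name save_session_outputs, the column names, and the dictionary used in place of a model are hypothetical; only the file name patterns come from the text above, and the tool's own saving code may work differently.

```python
import csv
import pickle
import tempfile
from pathlib import Path


def save_session_outputs(model, header, rows, session_id, out_dir):
    """Write the model and the assigned data using the naming pattern above.

    Hypothetical helper: the real tool may save its files differently.
    """
    out_dir = Path(out_dir)
    model_path = out_dir / f"Clustering_Model_{session_id}.pkl"
    data_path = out_dir / f"Predicted_data_Assigned_{session_id}.csv"

    # Serialize the model object to a .pkl file.
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)

    # Write one CSV row per sample, with the cluster label as the last column.
    with open(data_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)

    return model_path, data_path


# Stand-in values for illustration only (not produced by the tool).
stand_in_model = {"algorithm": "meanshift"}
header = ["feature_1", "feature_2", "Cluster"]
rows = [[0.1, 0.2, "Cluster 0"], [5.0, 5.1, "Cluster 1"]]

with tempfile.TemporaryDirectory() as tmp:
    model_path, data_path = save_session_outputs(
        stand_in_model, header, rows, session_id=2, out_dir=tmp
    )
    saved_names = (model_path.name, data_path.name)

    # Read both files back to confirm the round trip.
    with open(model_path, "rb") as fh:
        reloaded_model = pickle.load(fh)
    with open(data_path, newline="") as fh:
        reloaded_rows = list(csv.reader(fh))
```

A .pkl file written by the tool itself will most likely need the same libraries installed to be loaded again, and pickle files should only be loaded from sources you trust.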
Use this section to find an optimal number of clusters with the elbow method. Click the Get an optimal number of clusters button, and you will get this graph.
From the image above, we see that k = 4 is the optimal number of clusters.
Enter the "elbow at k" value shown in the image above, which is 4 in our case, and click the Step 1 : Retrain and Evaluate button.
The model will be saved as Clustering_Model_2_elbow.pkl
The assigned data will be saved as Predicted_data_2_elbow.csv
In this case, 2 is the session ID number.
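To make the idea of reading an elbow off the graph more concrete, here is a minimal Python sketch of one simple heuristic: it picks the k whose point lies farthest below the straight line joining the first and last points of the curve. It assumes the curve plots a distortion score, such as the within-cluster sum of squares, against k. The function name elbow_at_k and the score values are made up for illustration, and the tool's own elbow detection may use a different rule.

```python
def elbow_at_k(ks, inertias):
    """Return the k that lies farthest below the line joining the end points.

    Simplified heuristic for illustration; the tool's own elbow detection
    may use a different rule.
    """
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs")

    x0, y0 = ks[0], inertias[0]
    x1, y1 = ks[-1], inertias[-1]
    slope = (y1 - y0) / (x1 - x0)

    best_k, best_gap = ks[0], 0.0
    for k, inertia in zip(ks, inertias):
        # Height of the straight line (chord) at this k.
        chord_height = y0 + slope * (k - x0)
        # How far the curve dips below the chord at this k.
        gap = chord_height - inertia
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Made-up distortion scores for k = 2..8, for illustration only.
ks = [2, 3, 4, 5, 6, 7, 8]
inertias = [1000.0, 600.0, 250.0, 220.0, 200.0, 185.0, 175.0]
best_k = elbow_at_k(ks, inertias)
```

With these illustrative values the curve drops steeply up to k = 4 and flattens afterwards, so the heuristic returns 4. On a real curve, treat the result as a suggestion and compare it with the graph.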
Use this section if you already have labeled data and want to tune the number of clusters.
Only the following models can be tuned:
K-Means Clustering
Spectral Clustering
Agglomerative Clustering
Birch Clustering
K-Modes Clustering
If you have not selected one of the models above, you have to restart the experiment from the beginning: choose a suitable model, process the data, then come back to the training section. If you did choose one of them, go ahead and fill in the following multiselect boxes.
Select the target column containing labels: Name of the column that holds the labels.
Select type of task (Automatically inferred when None): Choose from the list below that matches your task.
If Classification:
Logistic Regression (Default)
K Nearest Neighbour
Naive Bayes
Decision Tree Classifier
SVM - Linear Kernel
SVM - Radial Kernel
Gaussian Process Classifier
Multi Level Perceptron
Ridge Classifier
Random Forest Classifier
Quadratic Discriminant Analysis
Ada Boost Classifier
Gradient Boosting Classifier
Linear Discriminant Analysis
Extra Trees Classifier
Extreme Gradient Boosting
Light Gradient Boosting
CatBoost Classifier
If Regression:
Linear Regression (Default)
Lasso Regression
Ridge Regression
Elastic Net
Least Angle Regression
Lasso Least Angle Regression
Orthogonal Matching Pursuit
Bayesian Ridge
Automatic Relevance Determ.
Passive Aggressive Regressor
Random Sample Consensus
TheilSen Regressor
Huber Regressor
Kernel Ridge
Support Vector Machine
K Neighbors Regressor
Decision Tree
Random Forest
Extra Trees Regressor
AdaBoost Regressor
Gradient Boosting
Multi Level Perceptron
Extreme Gradient Boosting
Light Gradient Boosting
CatBoost Regressor
Select the evaluation metric: For Classification tasks: Accuracy, AUC, Recall, Precision, F1, Kappa (default = ‘Accuracy’). For Regression tasks: MAE, MSE, RMSE, R2, RMSLE, MAPE (default = ‘R2’).
Select the type of your custom grid: By default, a pre-defined list of cluster counts is iterated over to optimize the supervised objective. To override the default, choose the cluster counts to iterate over in the Select a list of number of clusters to iterate over multiselect box.
Select the number of folds to be used in cross validation: Number of folds to use in K-fold cross-validation. Must be at least 2.
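To picture what the fold setting controls: in K-fold cross-validation the rows are split into that many folds, and each fold is held out once for evaluation while the remaining folds are used for fitting. With fewer than 2 folds there would be nothing left to fit on, which is why 2 is the minimum. The sketch below is a minimal Python illustration with a hypothetical helper, kfold_indices, that splits row indices in order without shuffling; the tool may shuffle or stratify the rows.

```python
def kfold_indices(n_samples, n_folds):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    if n_folds > n_samples:
        raise ValueError("n_folds cannot exceed the number of samples")

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)

    folds = []
    start = 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        # Every row outside the held-out fold is used for fitting.
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


# Example: 10 rows split into 3 folds.
folds = kfold_indices(10, 3)
test_sizes = [len(test) for _, test in folds]
```

Each row appears in exactly one held-out fold, so every row contributes once to the evaluation score.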
Click the Step 1 : Tune num_clusters and Evaluate button to get the following output:
In the output, each index is a number of clusters; here, 4 clusters is the best choice for this specific dataset.
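Reading the result table comes down to picking the row with the best metric value. Keep in mind that "best" depends on the metric: for Accuracy or R2 a higher value is better, while for error metrics such as MAE a lower value is better. The Python sketch below illustrates that choice; the function name best_num_clusters and all score values are hypothetical and do not come from the tool.

```python
# Metrics from the list above where a smaller value is better.
LOWER_IS_BETTER = {"MAE", "MSE", "RMSE", "RMSLE", "MAPE"}


def best_num_clusters(scores, metric):
    """Pick the number of clusters with the best score for the given metric.

    scores maps each candidate number of clusters to its metric value.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    choose = min if metric in LOWER_IS_BETTER else max
    return choose(scores, key=scores.get)


# Made-up scores for illustration; real values come from the tool's output.
accuracy_by_k = {2: 0.71, 3: 0.74, 4: 0.79, 5: 0.77}
mae_by_k = {2: 3.4, 3: 2.9, 4: 2.1, 5: 2.6}

best_by_accuracy = best_num_clusters(accuracy_by_k, "Accuracy")
best_by_mae = best_num_clusters(mae_by_k, "MAE")
```

In both illustrative tables the chosen number of clusters is 4: it has the highest accuracy in the first and the lowest error in the second.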
If you click Step 2 to assign clusters, you will get the following output:
The model will be saved as Clustering_Model_2_tuned.pkl
All your assigned data will be saved as Predicted_data_2_tuned.pkl
In this case, 2 is the session ID number.