# Clustering

## Train with customised number of clusters

Use this section to train a model when the number of clusters is set manually, or when it is determined automatically by one of the following algorithms:

* Mean Shift Clustering
* Density Based Spatial Clustering
* OPTICS Clustering
* Affinity propagation

Step 1: Click the **`Train and Evaluate`** button. You get the following output:

![Clustering model training with Kmeans](/files/-MVaqXl9BdfIR92N-bur)

Step 2: To assign clusters to your data, click **`Assign clusters`**. You get the following output:

![Assign clusters](/files/-MVarVelci6Zs3EQCQ8_)

{% hint style="info" %}

* The model will be saved as Clustering\_Model\_2.pkl
* The assigned data will be saved as Predicted\_data\_Assigned\_2.csv

In this case, 2 is the session ID number.
{% endhint %}
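Once the assigned data is saved, a quick sanity check is to count how many rows landed in each cluster. The sketch below is only an illustration: the column name `Cluster`, the label values, and the sample rows are assumptions, not taken from MLBridge, so adjust them to match the header of your exported file.

```python
import csv
import io
from collections import Counter


def count_rows_per_cluster(handle, cluster_column="Cluster"):
    """Count rows per cluster label in an open CSV file.

    `cluster_column` is an assumed column name; check your CSV header.
    """
    reader = csv.DictReader(handle)
    return Counter(row[cluster_column] for row in reader)


# Made-up rows standing in for an exported file such as
# Predicted_data_Assigned_2.csv (the real columns will differ).
sample = io.StringIO(
    "age,income,Cluster\n"
    "25,30000,Cluster 0\n"
    "47,82000,Cluster 1\n"
    "31,35000,Cluster 0\n"
)
counts = count_rows_per_cluster(sample)
print(dict(counts))
```

For a real file, open it with `open(path, newline="")` and pass the handle to the function instead of the in-memory sample.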

## Retrain using the elbow method to determine an optimal number of clusters

To find an optimal number of clusters using the elbow method, click the **`Get an optimal number of clusters`** button. You will get this graph:

![Getting the number of clusters using elbow method](/files/-MVava3Il0qs2Sg4nQTj)

From the image above, we see that k = 4 is the optimal number of clusters.
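To make the idea of an "elbow" concrete, the sketch below applies one common heuristic: pick the k whose point on the (k, inertia) curve lies farthest from the straight line joining the first and last points. This is not necessarily the rule MLBridge uses internally; the function name `find_elbow` is hypothetical, and the inertia values are made up for illustration.

```python
def find_elbow(ks, inertias):
    """Return the k farthest from the chord of the (k, inertia) curve."""
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs")

    k_span = ks[-1] - ks[0]
    i_lo, i_hi = min(inertias), max(inertias)
    if k_span == 0 or i_hi == i_lo:
        raise ValueError("curve is flat; no elbow to find")

    # Normalise both axes to [0, 1] so the result does not depend on units.
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(v - i_lo) / (i_hi - i_lo) for v in inertias]

    # Chord from the first point to the last point of the curve.
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]

    def chord_distance(x, y):
        # Proportional to the perpendicular distance from the chord.
        return abs((y1 - y0) * x - (x1 - x0) * y + x1 * y0 - y1 * x0)

    distances = [chord_distance(x, y) for x, y in zip(xs, ys)]
    return ks[distances.index(max(distances))]


# Made-up inertia values for k = 1..8: steep drop, then a plateau.
candidate_ks = list(range(1, 9))
example_inertias = [1000, 520, 260, 90, 75, 64, 56, 50]
elbow_k = find_elbow(candidate_ks, example_inertias)
print(elbow_k)
```

For these illustrative values the heuristic selects k = 4, the point where adding more clusters stops reducing the inertia substantially.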

**`Enter the elbowatk number shown in the image above`**, which is 4 in our case, and click the **`Step 1 : Retrain and Evaluate`** button.

![Clustering using optimized number of clusters](/files/-MVaxrWsUyUL4LLeUz90)

{% hint style="info" %}

* The model will be saved as Clustering\_Model\_2\_elbow\.pkl
* The assigned data will be saved as Predicted\_data\_2\_elbow\.csv

In this case, 2 is the session ID number.
{% endhint %}

## Train and tune the number of clusters with data containing a labeled target column

Use this section if you already have labeled data and want to tune the number of clusters.

{% hint style="warning" %}
Only the following models can be tuned:

* K-Means Clustering
* Spectral Clustering
* Agglomerative Clustering
* Birch Clustering
* K-Modes Clustering

{% endhint %}

If you have not selected one of the above models, you have to restart the experiment from the beginning: choose the right model, process the data, then come back to the training section. If you did choose one of them, go ahead and fill in the following multiselect boxes.

**`Select the target column containing labels`** : Name of the target column containing labels.

**`Select type of task (Automatically inferred when None)`**: Choose from the list

If **Classification**:

* Logistic Regression (Default)
* K Nearest Neighbour
* Naive Bayes
* Decision Tree Classifier
* SVM - Linear Kernel
* SVM - Radial Kernel
* Gaussian Process Classifier
* Multi Level Perceptron
* Ridge Classifier
* Random Forest Classifier
* Quadratic Discriminant Analysis
* Ada Boost Classifier
* Gradient Boosting Classifier
* Linear Discriminant Analysis
* Extra Trees Classifier
* Extreme Gradient Boosting
* Light Gradient Boosting
* CatBoost Classifier

If **Regression**:

* Linear Regression (Default)
* Lasso Regression
* Ridge Regression
* Elastic Net
* Least Angle Regression
* Lasso Least Angle Regression
* Orthogonal Matching Pursuit
* Bayesian Ridge
* Automatic Relevance Determ.
* Passive Aggressive Regressor
* Random Sample Consensus
* TheilSen Regressor
* Huber Regressor
* Kernel Ridge
* Support Vector Machine
* K Neighbors Regressor
* Decision Tree
* Random Forest
* Extra Trees Regressor
* AdaBoost Regressor
* Gradient Boosting
* Multi Level Perceptron
* Extreme Gradient Boosting
* Light Gradient Boosting
* CatBoost Regressor

**`Select the evaluation metric`**: For classification tasks: Accuracy, AUC, Recall, Precision, F1, Kappa (default = ‘Accuracy’). For regression tasks: MAE, MSE, RMSE, R2, RMSLE, MAPE (default = ‘R2’).

**`Select the type of your custom grid`** : By default, a pre-defined set of cluster counts is iterated over to optimize the supervised objective. To override the default iteration, pass a list of cluster counts to iterate over in the **`Select a list of number of clusters to iterate over`** multiselect box.

**`Select the number of folds to be used in cross validation`** : Number of folds to use in K-fold cross-validation. Must be at least 2.
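To show what the fold count controls, here is a minimal sketch of how K-fold cross-validation partitions row indices into train and test sets, and why fewer than 2 folds is rejected. The helper `kfold_indices` is hypothetical and splits rows in order without shuffling; how MLBridge builds its folds internally is not documented here.

```python
def kfold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds (train, test) index pairs."""
    if n_folds < 2:
        # With one fold there would be no data left to train on.
        raise ValueError("n_folds must be at least 2")
    if n_folds > n_samples:
        raise ValueError("n_folds cannot exceed the number of samples")

    # The first `extra` folds get one additional sample each.
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


# Ten rows split into three folds: each row is tested exactly once.
folds = kfold_indices(10, 3)
for train_idx, test_idx in folds:
    print(len(train_idx), "train rows,", len(test_idx), "test rows")
```

Each row appears in exactly one test set, so every candidate number of clusters is scored on data the supervised model did not see during training.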

Click the **`Step 1 : Tune num_clusters and Evaluate`** button to get the following output:

![Tuning number of clusters using supervised learning ](/files/-MVbACSyoThAjQYroHkX)

In the output, the index is the number of clusters. Here, 4 clusters is the best choice for this specific dataset.
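Reading the result table amounts to picking the row with the best score for the chosen metric. The sketch below illustrates that selection, assuming the error metrics (MAE, MSE, RMSE, RMSLE, MAPE) are minimised and all other listed metrics are maximised. The function name `best_num_clusters` and the scores are hypothetical, not MLBridge output.

```python
# Error metrics: a lower value is better. All other metrics listed in
# this section (Accuracy, AUC, Recall, Precision, F1, Kappa, R2) are
# treated as higher-is-better.
LOWER_IS_BETTER = {"MAE", "MSE", "RMSE", "RMSLE", "MAPE"}


def best_num_clusters(scores_by_k, metric):
    """Return the number of clusters with the best cross-validated score.

    `scores_by_k` maps a candidate number of clusters to its score.
    """
    if not scores_by_k:
        raise ValueError("no scores to compare")
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(scores_by_k, key=scores_by_k.get)


# Made-up scores for a custom grid of cluster counts.
accuracy_by_k = {2: 0.81, 3: 0.84, 4: 0.90, 5: 0.88}
mae_by_k = {2: 4.2, 3: 3.1, 4: 3.6, 5: 3.9}

best_for_accuracy = best_num_clusters(accuracy_by_k, "Accuracy")
best_for_mae = best_num_clusters(mae_by_k, "MAE")
print(best_for_accuracy, best_for_mae)
```

The direction of the comparison matters: selecting the maximum of an error metric such as MAE would return the worst number of clusters rather than the best.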

If you click step 2 to assign clusters, you will get the following output:

![Assigning clusters to data.](/files/-MVbBVLpqxMYQo29E4qb)

{% hint style="info" %}

* The model will be saved as Clustering\_Model\_2\_tuned.pkl
* All your assigned data will be saved as Predicted\_data\_2\_tuned.pkl

In this case, 2 is the session ID number.
{% endhint %}


