# Anomaly\_Detection

## Train and Assign anomalies

This section is for training models where the fraction parameter (the expected proportion of outliers in the data) is already set.

Click **`Step 1 : Train`** to start training. You will get output like this:

![Training Anomaly\_Detection model](https://1577378233-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MT0DqusrMJSGtXvrnXz%2F-MVbm69d1ql9pXKr-scU%2F-MVbs-FMQG7QdnvSa4c0%2Fimage.png?alt=media\&token=e6486514-3c07-4cdf-8c25-b5a017c71383)

After the model is trained, click **`Step 2 : Assign anomalies`** to assign anomalies to the data. You will get the following output:

![Assigning anomalies to data](https://1577378233-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MT0DqusrMJSGtXvrnXz%2F-MVbm69d1ql9pXKr-scU%2F-MVbt5TTmErkj26fOnUi%2Fimage.png?alt=media\&token=0eddeff1-48ab-4905-9ba5-a118fad96aed)
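The fraction is the expected proportion of outliers. A common way detectors apply it when assigning labels is to flag the highest-scoring `fraction` of rows. The sketch below shows that idea in plain Python; `assign_by_fraction` and the example scores are hypothetical, and the library behind this app may compute its threshold differently.

```python
from typing import List, Sequence


def assign_by_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Label the top `fraction` of rows by score as anomalies (1), the rest 0.

    Assumption of this sketch: a higher score means more anomalous.
    """
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    n_outliers = round(fraction * len(scores))
    # Indices of the rows with the highest scores come first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [1 if i in flagged else 0 for i in range(len(scores))]


scores = [0.10, 0.20, 0.15, 3.50, 0.05, 0.12, 0.30, 0.18, 0.22, 2.90]
print(assign_by_fraction(scores, 0.2))  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
```

With 10 rows and a fraction of 0.2, exactly two rows (the two largest scores) are flagged.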

{% hint style="info" %}

* The model will be saved as `Anomaly_Detection_Model_15.pkl`.
* The assigned data will be saved as `predicted_data_Assigned_15.csv`.

Here, 15 is the session ID.
{% endhint %}
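To check how many rows were flagged, you can read the saved CSV with Python's standard `csv` module. This is a hypothetical sketch: it assumes the file contains a 0/1 column named `Anomaly` (1 = outlier), so check the header of your own file before relying on it.

```python
import csv
from typing import Iterable, Tuple


def count_anomalies(lines: Iterable[str], label_column: str = "Anomaly") -> Tuple[int, int]:
    """Return (rows flagged as anomalies, total rows).

    Assumption: `label_column` holds 0/1 values (written as "1" or "1.0").
    """
    flagged = total = 0
    for row in csv.DictReader(lines):
        total += 1
        if float(row[label_column]) == 1.0:
            flagged += 1
    return flagged, total


# Usage with the file saved for session 15:
# with open("predicted_data_Assigned_15.csv", newline="") as f:
#     print(count_anomalies(f))
```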

## Train and tune the fraction parameter with data containing a labeled target column

This section is for training a model when the data is already labeled, and for tuning the model's fraction parameter.

Fill in the following multiselect boxes:

**`Select the target column containing labels`** : Name of the target column containing labels.

**`Select type of task (Automatically inferred when None)`**: Choose a model from the list that matches the task type:

If **Classification**:

* Logistic Regression (Default)
* K Nearest Neighbour
* Naive Bayes
* Decision Tree Classifier
* SVM - Linear Kernel
* SVM - Radial Kernel
* Gaussian Process Classifier
* Multi Level Perceptron
* Ridge Classifier
* Random Forest Classifier
* Quadratic Discriminant Analysis
* Ada Boost Classifier
* Gradient Boosting Classifier
* Linear Discriminant Analysis
* Extra Trees Classifier
* Extreme Gradient Boosting
* Light Gradient Boosting
* CatBoost Classifier

If **Regression**:

* Linear Regression (Default)
* Lasso Regression
* Ridge Regression
* Elastic Net
* Least Angle Regression
* Lasso Least Angle Regression
* Orthogonal Matching Pursuit
* Bayesian Ridge
* Automatic Relevance Determ.
* Passive Aggressive Regressor
* Random Sample Consensus
* TheilSen Regressor
* Huber Regressor
* Kernel Ridge
* Support Vector Machine
* K Neighbors Regressor
* Decision Tree
* Random Forest
* Extra Trees Regressor
* AdaBoost Regressor
* Gradient Boosting
* Multi Level Perceptron
* Extreme Gradient Boosting
* Light Gradient Boosting
* CatBoost Regressor

**`Select the evaluation metric`**: For classification tasks, choose Accuracy, AUC, Recall, Precision, F1, or Kappa (default = Accuracy). For regression tasks, choose MAE, MSE, RMSE, R2, RMSLE, or MAPE (default = R2).
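The two default metrics, Accuracy and R2, follow standard definitions. A dependency-free sketch of both; the function names are illustrative and not taken from the app's code:

```python
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    return 1 - ss_res / ss_tot


print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))        # 0.75
print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```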

**`Select the method of labeling outliers (default = drop)`**: When set to **drop**, the outliers are dropped from the training dataset. When set to **surrogate**, the anomaly model's decision function and assigned label are used as features during training.
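The difference between the two methods is in how the training data for the supervised model is prepared. A minimal sketch, assuming you already have per-row anomaly labels (0/1) and decision-function scores; `prepare_training_data` is a hypothetical helper, not the app's API:

```python
from typing import List, Sequence, Tuple


def prepare_training_data(
    rows: Sequence[List[float]],
    targets: Sequence,
    labels: Sequence[int],
    scores: Sequence[float],
    method: str = "drop",
) -> Tuple[List[List[float]], List]:
    """Return (features, targets) for the supervised model."""
    if method == "drop":
        # Keep only the rows the anomaly model labeled as inliers (0).
        keep = [i for i, lab in enumerate(labels) if lab == 0]
        return [list(rows[i]) for i in keep], [targets[i] for i in keep]
    if method == "surrogate":
        # Keep every row and append the anomaly score and label as two extra features.
        features = [list(row) + [score, lab] for row, score, lab in zip(rows, scores, labels)]
        return features, list(targets)
    raise ValueError("method must be 'drop' or 'surrogate'")


rows = [[1.0], [2.0], [9.0]]
print(prepare_training_data(rows, [0, 0, 1], [0, 0, 1], [0.1, 0.2, 0.9], "drop"))
```

With `drop` the training set shrinks; with `surrogate` it keeps every row but gains two columns.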

**`Select the number of folds to be used in cross validation`**: Number of folds used in k-fold cross-validation. Must be at least 2.
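In k-fold cross-validation, the data is split into k parts and each part serves once as the validation set while the rest is used for training, which is why at least 2 folds are needed. A sketch of an unshuffled split, with any remainder spread over the first folds; `kfold_indices` is a hypothetical helper:

```python
from typing import List, Tuple


def kfold_indices(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train indices, validation indices) for each fold, without shuffling."""
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    if n_folds > n_samples:
        raise ValueError("n_folds cannot exceed the number of samples")
    # The first (n_samples % n_folds) folds get one extra sample.
    sizes = [n_samples // n_folds + (1 if k < n_samples % n_folds else 0) for k in range(n_folds)]
    splits = []
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        splits.append((train, validation))
        start = stop
    return splits


print(kfold_indices(5, 2))  # [([3, 4], [0, 1, 2]), ([0, 1, 2], [3, 4])]
```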

Click the **`Step 1 : Tune the fraction parameter and Evaluate`** button to get the following output:

![Tuning the fraction of the model](https://1577378233-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MT0DqusrMJSGtXvrnXz%2F-MVbm69d1ql9pXKr-scU%2F-MVbzgRYbFYDd6LKMKiA%2Fimage.png?alt=media\&token=35a3dd29-c66b-4a53-8d05-5b710e179ce6)
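One way to think about the tuning step: for each candidate fraction, flag that share of rows as outliers, drop them, train a supervised model on the labeled target, and keep the fraction with the best cross-validated score. The sketch below is a self-contained toy, not the app's implementation. The detector (distance from the median), the nearest-centroid classifier, the unshuffled folds, the accuracy metric, and the candidate fractions are all assumptions chosen to avoid dependencies.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def anomaly_scores(x: Sequence[float]) -> List[float]:
    """Toy detector: distance from the median (larger = more anomalous)."""
    m = median(x)
    return [abs(v - m) for v in x]


def keep_inliers(x: Sequence[float], y: Sequence[int], fraction: float) -> Tuple[List[float], List[int]]:
    """method = "drop": remove the top `fraction` of rows by anomaly score."""
    n_out = round(fraction * len(x))
    scores = anomaly_scores(x)
    ranked = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_out])
    keep = [i for i in range(len(x)) if i not in dropped]
    return [x[i] for i in keep], [y[i] for i in keep]


def cv_accuracy(x: Sequence[float], y: Sequence[int], n_folds: int) -> float:
    """Unshuffled k-fold accuracy of a nearest-centroid classifier on one feature."""
    n = len(x)
    sizes = [n // n_folds + (1 if k < n % n_folds else 0) for k in range(n_folds)]
    correct, start = 0, 0
    for size in sizes:
        stop = start + size
        train = [i for i in range(n) if not (start <= i < stop)]
        # Fit: one centroid (mean feature value) per class seen in the training folds.
        centroids: Dict[int, float] = {}
        for label in {y[i] for i in train}:
            members = [x[i] for i in train if y[i] == label]
            centroids[label] = sum(members) / len(members)
        # Predict: the class whose centroid is closest.
        for i in range(start, stop):
            pred = min(centroids, key=lambda lab: abs(x[i] - centroids[lab]))
            correct += int(pred == y[i])
        start = stop
    return correct / n


def tune_fraction(x: Sequence[float], y: Sequence[int], fractions: Sequence[float],
                  n_folds: int = 2) -> Tuple[float, Dict[float, float]]:
    """Score each candidate fraction and return (best fraction, all scores)."""
    results = {f: cv_accuracy(*keep_inliers(x, y, f), n_folds) for f in fractions}
    best = max(results, key=results.get)
    return best, results


# Two classes (around 1 and around 3) plus one extreme value labeled 0.
x = [1.0, 3.0, 1.2, 3.2, 0.8, 2.8, 50.0, 3.1]
y = [0, 1, 0, 1, 0, 1, 0, 1]
best, results = tune_fraction(x, y, [0.0, 0.125, 0.25])
print(best)  # 0.125
```

Dropping the single extreme row stops it from dragging the class-0 centroid away, so the fraction 0.125 scores best here; dropping more rows removes genuine inliers and the score falls again.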

Click **`Step 2 : Assign anomalies`** to assign anomalies to the data with the newly tuned model. The output will look like this:

![Assigning anomalies to labeled data.](https://1577378233-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MT0DqusrMJSGtXvrnXz%2F-MVbm69d1ql9pXKr-scU%2F-MVc-pKLCRA5SWA5ygF1%2Fimage.png?alt=media\&token=c97f39e7-06dc-45ae-b40e-25169991d3e0)

{% hint style="info" %}

* The model will be saved as `Anomaly_Detection_Model_16_tuned.pkl`.
* The assigned data will be saved as `predicted_data_16_tuned.csv`.

Here, 16 is the session ID.
{% endhint %}
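If you script around these outputs, the file names can be rebuilt from the session ID using the patterns listed in the two hints on this page. A hypothetical helper; it only builds names and does not check that the files exist:

```python
from typing import Tuple


def output_files(session_id: int, tuned: bool = False) -> Tuple[str, str]:
    """Return (model file, assigned-data file) for a session."""
    if tuned:
        # Pattern from the "tune the fraction parameter" section.
        return (f"Anomaly_Detection_Model_{session_id}_tuned.pkl",
                f"predicted_data_{session_id}_tuned.csv")
    # Pattern from the "Train and Assign anomalies" section.
    return (f"Anomaly_Detection_Model_{session_id}.pkl",
            f"predicted_data_Assigned_{session_id}.csv")


print(output_files(16, tuned=True))
```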
