Getting Started

Set the parameters and start building your model

Before building your model or making predictions, you first need to set a few parameters, as follows (the choices used in this walkthrough are also summarized in a short sketch after the list):

In the Side Bar:

  • Enter the experiment name, e.g. Demo.

  • Select the ML task you want to perform; here, choose NLP.

  • Specify the NLP model to use for training; five models are available. Choose Latent Dirichlet Allocation.

  • Enter the number of topics: the number of subjects you believe your corpus contains. Enter 2.

  • Import your data: you can either use the available data library, which contains ready-to-use datasets for getting familiar with the process, or import your own data as a CSV or Excel file. Choose Data library, then select kiva.

  • Session ID: to keep track of each model, it is highly recommended to change your session ID whenever you retrain a new model within the same experiment, e.g. 1.

  • Activities: four activities are available: Data Description, Statistic, AutoML, and AdvancedML. In this section we will choose AutoML.
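For reference, here is a hypothetical summary of the side-bar choices made in this walkthrough, written as a small Python dictionary. The tool collects these values through its UI; the key names are purely illustrative.

```python
# Hypothetical summary of the side-bar settings used in this walkthrough.
# The tool collects these through its UI; the key names are illustrative only.
sidebar_settings = {
    "experiment_name": "Demo",
    "ml_task": "NLP",
    "nlp_model": "Latent Dirichlet Allocation",
    "num_topics": 2,                      # number of subjects assumed in the corpus
    "data_source": "Data library: kiva",  # or a path to your own CSV / Excel file
    "session_id": 1,                      # change when retraining within the same experiment
    "activity": "AutoML",
}
```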

Once you have filled in all of the above, the Development Bar will be ready for you to start processing.

In the Development Bar:

You can now check the Show data box to display your data.

Three steps make up the main processing flow in the Development Bar, with an optional fourth step for data preparation:

Step 1: Build

Select your target from the list, which contains all of the data columns. Choose the column that contains the text you want to process; in our case we will select en (highlighted in red above), then click Build the model.

After training, which may take some time depending on the size of your data, the model name, the default parameters, and the location where the model is saved will be displayed.

Note that the number 1 in the model name nlp_model_1.pkl is the session ID number.
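The training pipeline is internal to the tool, but as a rough sketch, fitting an LDA model with 2 topics on the en column could look like the following. This assumes scikit-learn and pandas; the kiva.csv path and the pickle layout are assumptions for illustration, not the tool's actual code.

```python
# Minimal sketch of the Build step, assuming scikit-learn and pandas.
# The tool's internal pipeline may differ; this only illustrates the idea.
import pickle

import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_csv("kiva.csv")   # assumed path to the Kiva sample data
texts = df["en"].astype(str)   # "en" is the target text column selected above

vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(texts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)  # 2 topics, as set in the side bar
lda.fit(doc_term_matrix)

# Save the fitted model; the "1" stands in for the session ID.
with open("nlp_model_1.pkl", "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "lda": lda}, f)
```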

Step 2: Evaluate

Choose whether to evaluate the entire dataset, which returns a plot over the whole dataset instead of one at the topic level, or to evaluate by topic, which returns a plot for the topic you have selected. In our case we will choose Topic 1.

Choose the evaluation metric from the list; several metrics are available, and it is worth trying a few of them to get more insight into your data. In our case we will choose wordcloud, then click Plot.
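As an illustration of what a per-topic word cloud involves, here is a minimal sketch using the wordcloud and matplotlib packages and the hypothetical model bundle saved in the Step 1 sketch; the tool's own plotting code may differ.

```python
# Sketch of a word cloud for one topic, reusing the bundle saved above.
# Assumes the wordcloud and matplotlib packages; not the tool's plotting code.
import pickle

import matplotlib.pyplot as plt
from wordcloud import WordCloud

with open("nlp_model_1.pkl", "rb") as f:
    bundle = pickle.load(f)
vectorizer, lda = bundle["vectorizer"], bundle["lda"]

topic_id = 1                                  # "Topic 1", as selected in this step
words = vectorizer.get_feature_names_out()
weights = lda.components_[topic_id]           # word weights for the chosen topic
top = weights.argsort()[-50:]                 # indices of the 50 heaviest words
freqs = {words[i]: float(weights[i]) for i in top}

wc = WordCloud(width=800, height=400, background_color="white")
wc.generate_from_frequencies(freqs)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```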

Step 3: Predict

Now that we have our model and are satisfied with the evaluation, we move on to the next step: prediction.

First, choose either the model you have just built or one from the list of models built previously within the same experiment. We will choose Model built above, then click Assign data.

A table will be displayed containing your data plus four additional columns:

  • Topic_0: the probability of Topic 0 for the text in that row.

  • Topic_1: the probability of Topic 1 for the text in that row.

  • Dominant_Topic: the topic assigned to the text in that row.

  • Perc_Dominant_Topic: the percentage of the dominant topic.

A notice will also show the location where the predicted data is saved.

The number 1 in the saved file name Topic_model_data_prediction_1.csv is the session ID number.
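A minimal sketch of how these four columns could be derived from the fitted model, reusing the hypothetical bundle from the Step 1 sketch; this is not the tool's actual implementation.

```python
# Sketch of how the four prediction columns might be computed,
# reusing the bundle from the Step 1 sketch; file names mirror the example.
import pickle

import pandas as pd

with open("nlp_model_1.pkl", "rb") as f:
    bundle = pickle.load(f)
vectorizer, lda = bundle["vectorizer"], bundle["lda"]

df = pd.read_csv("kiva.csv")   # assumed path to the same data used for training
doc_topics = lda.transform(vectorizer.transform(df["en"].astype(str)))

df["Topic_0"] = doc_topics[:, 0]                    # probability of Topic 0 per row
df["Topic_1"] = doc_topics[:, 1]                    # probability of Topic 1 per row
df["Dominant_Topic"] = doc_topics.argmax(axis=1)    # topic with the highest probability
df["Perc_Dominant_Topic"] = doc_topics.max(axis=1)  # share of that dominant topic

df.to_csv("Topic_model_data_prediction_1.csv", index=False)  # "1" is the session ID
```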

Step 4: Data Preparation (Optional)

This step prepares and saves your data, making it ready for other machine learning tasks (e.g. classification, clustering, ...).

When you click Prepare my data, the unnecessary columns generated in Step 3 are dropped, keeping only the topic probability columns (Topic_0 and Topic_1) alongside your original data; the prepared data is then displayed and saved.
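As a sketch, the preparation step amounts to dropping the helper columns and saving the rest; the output file name below is hypothetical.

```python
# Sketch of the optional preparation step: drop the helper columns from Step 3,
# keep the original data plus the topic probabilities, display and save.
# The output file name is hypothetical.
import pandas as pd

pred = pd.read_csv("Topic_model_data_prediction_1.csv")
prepared = pred.drop(columns=["Dominant_Topic", "Perc_Dominant_Topic"])
print(prepared.head())
prepared.to_csv("prepared_topic_data_1.csv", index=False)
```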

For further processing and machine learning tasks, go to the side bar, select the ML task you want to perform (e.g. Regression), follow the steps, import the data prepared above, and proceed as usual.
