Automatically generate precision, recall and confusion matrix for your NLP/Chatbot training data

Published in Chatbots Magazine

2 min read · Feb 7, 2019

QBox is a free tool that provides a variety of visualisations and metrics to help novice users improve their training data. Those who come from a data science background may, however, prefer to work with established metrics such as precision, recall and F1, and to use a confusion matrix to visualise where different intents (classes) are confused with one another.

But how do you go about generating all of this with your Microsoft LUIS, Google Dialogflow or IBM Watson Assistant training data without having to write tons of custom code?

Thankfully, QBox makes this easy: it is just a few clicks from the test results page.

Simply select the download button at the top of the test results page and you’ll see several different options. Here’s what they do.

Training data file

This contains a copy of the training data used to run the test. Even if you modify your training data within your NLP provider, you can easily get back to the exact version you tested in QBox by downloading this file. Note: if you ran your test on multiple providers, you’ll also see the option to download the training data file for the other providers.

Confusion matrix

This is an Excel file containing a confusion matrix, along with the number of true and false positives and negatives for each intent, and the precision, recall and F1 metrics for each intent. Within the matrix itself, QBox colour codes each cell based on how much confusion there is between a pair of intents.
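If you'd like to see how these per-intent numbers relate to the matrix, here is a minimal Python sketch. It is not QBox's own code, and it does not read the Excel file: the intent names and counts are made up, and it assumes rows are the expected intent and columns are the predicted intent, which may not match the layout of the downloaded file.

```python
def per_intent_metrics(labels, matrix):
    """Derive per-intent counts and scores from a confusion matrix.

    Assumption: matrix[i][j] is the number of test utterances whose
    expected intent is labels[i] and whose predicted intent is labels[j].
    """
    total = sum(sum(row) for row in matrix)
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]                        # predicted correctly
        fn = sum(matrix[k]) - tp                 # expected this intent, got another
        fp = sum(row[k] for row in matrix) - tp  # got this intent, expected another
        tn = total - tp - fp - fn                # everything else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {
            "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1,
        }
    return metrics


# Hypothetical intents and counts, for illustration only.
labels = ["greeting", "book_flight", "cancel"]
matrix = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 3, 7],
]

metrics = per_intent_metrics(labels, matrix)
for label, m in metrics.items():
    print(f"{label}: precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

In this made-up example, "book_flight" has high recall but lower precision, because utterances from the other two intents are being pulled into it. That is the kind of pattern the colour coding in the matrix is meant to make visible.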

Raw test results

This is a list of every test that QBox ran, in Excel format. In theory, you could calculate the F1 score yourself from this data (though, as we’ve seen, QBox already does this for you, so there is no need). However, if you’re interested in what’s going on under the hood, or just want to check our maths, you may find this useful :-)
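For anyone who does want to check the maths, here is a rough Python sketch of the calculation. It assumes you have already reduced the raw results to (expected intent, predicted intent) pairs; the actual column names in the Excel file aren't covered here, so the data below is purely hypothetical.

```python
from collections import Counter


def confusion_counts(results):
    """Count how often each (expected, predicted) pair occurs."""
    return Counter(results)


def f1_for_intent(counts, intent):
    """F1 for one intent, using F1 = 2*TP / (2*TP + FP + FN)."""
    tp = counts[(intent, intent)]
    fp = sum(n for (expected, predicted), n in counts.items()
             if predicted == intent and expected != intent)
    fn = sum(n for (expected, predicted), n in counts.items()
             if expected == intent and predicted != intent)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical (expected, predicted) pairs, one per test utterance.
results = [
    ("greeting", "greeting"),
    ("greeting", "greeting"),
    ("greeting", "goodbye"),
    ("goodbye", "goodbye"),
    ("goodbye", "greeting"),
    ("goodbye", "goodbye"),
]

counts = confusion_counts(results)
greeting_f1 = f1_for_intent(counts, "greeting")
print(f"greeting F1: {greeting_f1:.3f}")
```

The counts of (expected, predicted) pairs are also exactly the cells of a confusion matrix, so the same pairs are enough to rebuild one by hand.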

We hope you’ll find these advanced features useful!

Written by Benoit Alvarez