Nov 2, 2020, Innovations, Web

Voicebots from the QA perspective – case study

Beata Słupek QA Team Leader

Our Hellobot app is an innovative bot driven entirely by voice: spoken commands are the user’s only means of communication, handled through a software voice user interface. The goal was to build a platform for launching voicebots and automating business processes in the voice channel. Because it was not a typical web or mobile app, its specific nature called for testing techniques and methods suited to it.

Voicebot tests overview

Voice command tests depend on factors such as the loudness of the voice, the choice of words, and background sounds. These factors carry a high risk of software failure. What’s more, we usually can’t use the common tools, techniques, and methods applied when testing backend or frontend applications, whether web or mobile.

We divided our solution into 4 basic parts:

  • ASR (automatic speech recognition) or STT (speech to text)
  • Artificial intelligence (neural network)
  • TTS (text to speech)
  • Conversation scenario

Apart from that, we also tested the entire telecommunications layer.


ASR is responsible for feeding data into the app: it recognizes speech and converts it into a string of characters that the bot can process. Testing this module is not easy because it is subject to many variables: differences in intonation, timbre and loudness of the voice, and the fact that conversations take place in environments with varying noise levels. Polish phonetics doesn’t make it any easier. A good example is the word “tak”, meaning “yes”. It contains two voiceless consonants, “t” and “k”. Pronouncing them doesn’t engage the vocal cords, so they are quieter than the “a” and can be drowned out by background noise.
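One common way to put a number on ASR quality is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words said. The sketch below is only an illustration of that metric, not a description of the tooling used in the project; the function name and the sample phrases are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the reference words processed so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up samples: (noise condition, what was said, what the ASR returned).
samples = [
    ("quiet", "tak potwierdzam wizytę", "tak potwierdzam wizytę"),
    ("street", "tak potwierdzam wizytę", "potwierdzam wizytę"),
]
for condition, said, recognized in samples:
    print(condition, round(word_error_rate(said, recognized), 2))
```

Grouping such scores by recording condition (quiet room, street, car) would show whether short words like “tak” are the ones being lost in noise.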


Artificial intelligence is responsible for interpreting the string of characters produced by the ASR. It lets the user hold the conversation in many different ways. The interlocutor decides what to say and, thanks to the rich vocabulary of Polish, can express the same meaning in many ways. Confirming a prearranged visit is one example: there are many synonymous phrases, and our neural network has to learn them. We are fairly sure that some of the hundreds of customers using our bot will be even more creative and use phrases we haven’t seen yet.
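A table of paraphrases per intent is a simple way to check this kind of coverage. The sketch below assumes the model can be called as a function that maps text to an intent label; `toy_classify`, the intent name `confirm_visit`, and the English phrases are all hypothetical stand-ins so that the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple


def find_misclassified(
    classify: Callable[[str], str],
    paraphrases: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Return (phrase, expected_intent, predicted_intent) for every miss."""
    misses = []
    for expected, phrases in paraphrases.items():
        for phrase in phrases:
            predicted = classify(phrase)
            if predicted != expected:
                misses.append((phrase, expected, predicted))
    return misses


def toy_classify(text: str) -> str:
    """Stand-in for the real model: a keyword lookup, used only for the demo."""
    text = text.lower()
    if any(key in text for key in ("yes", "confirm", "i'll be there")):
        return "confirm_visit"
    return "unknown"


# Hypothetical paraphrases that should all mean "I confirm the visit".
PARAPHRASES = {
    "confirm_visit": [
        "Yes",
        "I confirm",
        "I'll be there",
        "Sure, see you then",
    ],
}

misses = find_misclassified(toy_classify, PARAPHRASES)
for phrase, expected, predicted in misses:
    print(f"{phrase!r}: expected {expected}, got {predicted}")
```

Every phrase reported as a miss is a candidate for the training set, and new phrases heard from real users can be appended to the table over time.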


Our app enables both user-to-bot and bot-to-user communication. The latter is possible thanks to TTS, which converts the text of the bot’s statement into speech (an audio file). This lets the bot give the user all the necessary information by voice. Unfortunately, TTS has many pitfalls that need to be taken into account: how dates, sums of money, phone numbers, proper names, and surnames with their inflection are read aloud.
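Pitfalls like these are often handled by normalizing the text before it reaches the TTS engine. As a minimal sketch of that idea, and not the project’s actual implementation, the hypothetical helper below spells out a phone number digit by digit so that an engine cannot read it as one large number. It uses English digit names for readability; Polish output would additionally need correct inflection.

```python
import re

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def phone_to_speech(number: str, group_size: int = 3) -> str:
    """Spell a phone number as digit words, with commas as pauses between groups."""
    digits = re.sub(r"\D", "", number)  # drop spaces, dashes, plus signs
    groups = [digits[i:i + group_size]
              for i in range(0, len(digits), group_size)]
    return ", ".join(
        " ".join(DIGIT_WORDS[int(d)] for d in group) for group in groups
    )


print(phone_to_speech("600 700 800"))
```

Test cases for this kind of helper are cheap to write and can be checked as text, before anyone has to listen to the generated audio.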

Conversation scenario

This part of the app defines the course of the conversation so that the bot meets the business needs it is deployed for. This is where we decide whether the bot should, for example, confirm a visit or inform about promotions.

More information about possible scenarios served by our bot can be found here


Because the project was so innovative, all of the modules described above were new and challenging for us. Their complexity made testing quite difficult. We had to adapt the process to the unconventional character of the project, which allowed us to find defects that differ from those typically reported when verifying mobile or web apps.

An essential and specific part of testing was verifying the conversation scenario between the bot and a user. The results couldn’t be unequivocally assigned to one of two categories, “works” or “doesn’t work”. They were open questions that needed in-depth analysis. It came down to finding out why a user wasn’t able to carry out the scenario and finish it successfully. The tester’s task was to find the cause of the failure and to verify whether, and why, the user didn’t understand what to say. While testing the conversation scenario, we also analyzed the form it took. This meant judging whether the scenario was clear and transparent or, on the contrary, ambiguous and misleading to the user. If the bot explains step by step what the user should say, the scenario becomes too long and tiring. If it moves too quickly, some interlocutors still won’t know how to use it.

In the last phase of testing, we decided to apply an approach used mainly in the computer game industry. After testing within the project team and checking the bot’s stability, we ran alpha and beta tests. In the alpha phase, the testers knew only the basic business assumptions and had to go through the conversation scenarios, giving us their insights and suggestions about how the app performed. The beta phase was run on a larger scale and involved more testers.

An important assumption was that everyone who talks with the bot for the first time should finish the scenario successfully. In a conversation that lasts between 30 seconds and a minute, a user has little time or opportunity to learn how to communicate with a bot. That’s why it’s so important to collect and analyze the experience of people talking with the bot for the first time. A small team of testers couldn’t gather such varied data, because they couldn’t reproduce the ways of speaking and behaving of such a large number of users.
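The first-call assumption above lends itself to a simple measurement: take each tester’s first conversation only, compute how many of those finished the scenario, and count where the rest broke off. The sketch below shows one way this could be tallied; the `Call` record, its field names, and the step names are hypothetical and do not come from the project.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Call:
    tester_id: str
    completed: bool
    failed_step: Optional[str] = None  # scenario step where the call broke off


def summarize_first_calls(calls: Iterable[Call]) -> Tuple[float, Counter]:
    """Return (completion rate, failed-step counts) over each tester's first call.

    Calls are assumed to be in chronological order.
    """
    seen = set()
    first_calls = []
    for call in calls:
        if call.tester_id not in seen:
            seen.add(call.tester_id)
            first_calls.append(call)
    if not first_calls:
        return 0.0, Counter()
    completed = sum(1 for c in first_calls if c.completed)
    failures = Counter(c.failed_step for c in first_calls if not c.completed)
    return completed / len(first_calls), failures


# Made-up log: tester "a" calls twice, only the first call counts.
calls = [
    Call("a", True),
    Call("b", False, "confirm_date"),
    Call("a", False, "greeting"),
    Call("c", False, "confirm_date"),
]
rate, failures = summarize_first_calls(calls)
print(f"first-call completion rate: {rate:.0%}")
print(failures.most_common())
```

The failed-step counts point at the part of the scenario to analyze first, which matches the goal of finding out why a user couldn’t finish rather than only recording that they didn’t.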

Having a large group of independent testers run the tests gave us sufficiently representative evaluations and data, and let us keep statistics that told us what to improve in the product. The work doesn’t end there, however. We are still perfecting our solution. In the production environment, the bot can still be trained and tested on new phrases spoken by the system’s users.