Nov 2, 2020, Innovations, Web

Voicebots from the QA perspective – case study

Beata Słupek QA Team Leader

Our Hellobot app represents a pioneering approach to virtual agents: a software voice user interface that responds exclusively to voice commands. This innovative bot aims to transform customer experiences by making it easy to launch voicebots that automate business processes in voice channels such as contact centers. Unlike conventional web or mobile applications, the unique nature of Hellobot demanded specialized testing techniques to ensure reliability and effectiveness.

The core technology behind Hellobot combines advanced AI with voice recognition, which is critical for accurately interpreting and responding to user commands. This enables seamless interaction between the user and the bot, mimicking the conversational ease of a live agent. Such AI not only raises the level of customer service but also supports dialog management, letting the bot intelligently conclude an interaction once the user's need has been met.

In environments where complex queries arise, Hellobot is designed to smoothly transition the interaction to live agents. This hybrid model ensures that while the bot handles regular inquiries, more intricate issues are escalated to human agents, thereby ensuring a higher level of problem resolution and a better overall customer experience.

The ultimate goal of Hellobot is not just to automate tasks but to enhance the efficiency of business operations and improve the quality of interactions in contact centers, contributing to better customer service and satisfaction. By integrating voicebots into their systems, businesses can manage customer interactions more effectively, reduce the workload on live agents, and ultimately drive significant improvements in operational efficiency and customer engagement.

Voicebot tests overview

Voice-command tests depend on factors such as the volume of the speaker's voice, the choice of words, and accompanying sounds, so they carry a high risk of software failure. What's more, we usually can't use the common tools, techniques, and methods applied in testing backend or frontend applications, whether web or mobile.

We divided our solution into four basic parts:

  • ASR (automatic speech recognition), also known as STT (speech to text)
  • Artificial intelligence (neural network)
  • TTS (text to speech)
  • Conversation scenario

Apart from that, we tested the whole telecommunication part, too.
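To illustrate, the four parts listed above could be wired together roughly as in the minimal Python sketch below. Every function is a stub standing in for a real component; all names are hypothetical illustrations, not the actual Hellobot code.

```python
# Minimal sketch of the four-part voicebot pipeline described above.
# Every function is a stub standing in for a real component; names are
# hypothetical, not the actual Hellobot API.

def asr(audio: bytes) -> str:
    """ASR/STT: convert audio into a string of characters the bot understands."""
    return "tak"  # a real engine would run speech recognition here

def nlu(text: str) -> str:
    """AI layer: map free-form text to an intent (a neural network in production)."""
    return "confirm" if text.strip().lower() in {"tak", "potwierdzam"} else "unknown"

def scenario(intent: str) -> str:
    """Conversation scenario: pick the bot's next statement for a visit confirmation."""
    replies = {
        "confirm": "Dziękuję, wizyta potwierdzona.",
        "unknown": "Przepraszam, nie zrozumiałem.",
    }
    return replies[intent]

def tts(text: str) -> bytes:
    """TTS: convert the bot's statement back into audio (stubbed as raw bytes)."""
    return text.encode("utf-8")

def handle_turn(audio: bytes) -> bytes:
    """One full turn: user audio in, bot audio out."""
    return tts(scenario(nlu(asr(audio))))
```

Testing each stage in isolation, and then the chain end to end, mirrors how we split the testing effort across the four modules.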

ASR – Automatic Speech Recognition

ASR is responsible for feeding data into the app – it recognizes speech and converts it into a string of characters the bot understands. Testing this module is not easy and is subject to many variables: different tones of speech, timbre and volume of voice, and the fact that the conversation can be held in environments of varying noise intensity. Polish phonetics doesn't make it easier, either. A good example is the word “tak”, meaning “yes”. It consists of two voiceless consonants, “T” and “K”. When we say them, we don't use our vocal cords, so they are articulated more quietly than the “A” and can be drowned out by the surroundings.
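One way to make such variables testable is to enumerate them as a matrix of recording conditions and check that the ASR output matches the target word in every combination. In the sketch below, the condition labels and file names are hypothetical illustrations:

```python
# Sketch of an ASR test matrix: the same target word ("tak") recorded under
# different conditions that affect recognition. Labels and file names are
# hypothetical illustrations, not real test assets.
from itertools import product

speakers = ["male_low", "female_high"]
volumes = ["quiet", "normal", "loud"]
noise = ["silence", "street", "office"]

def test_cases():
    """Yield one case per combination of speaker, volume, and background noise."""
    for s, v, n in product(speakers, volumes, noise):
        yield {"sample": f"tak_{s}_{v}_{n}.wav", "expected": "tak"}

cases = list(test_cases())
```

Each case would be played against the ASR module, and every mismatch recorded together with its conditions, which makes it easy to see, for example, that quiet recordings in street noise fail most often.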

Implementing advanced solutions in ASR systems can significantly enhance customer experiences by ensuring more accurate recognition of voice commands and queries. This is particularly beneficial in contact center environments, where virtual agents need to handle a wide array of customer interactions efficiently. Better ASR means better understanding and responsiveness, reducing the need for escalation to live agents. And when a dialog with the virtual agent alone is not sufficient, a seamless transition to live agents ensures that customer inquiries are still resolved effectively, optimizing the overall customer experience.

AI – Artificial Intelligence

Artificial intelligence plays a crucial role in interpreting the string of characters produced by Automatic Speech Recognition (ASR). It enables virtual agents in contact centers to understand and process varied customer inputs, allowing for dynamic conversations. Thanks to the rich vocabulary of the Polish language, users can express the same idea in many different ways, and accommodating those diverse expressions improves the customer experience.

For instance, confirming a prearranged visit could be articulated through many synonymous phrases, all of which need to be recognized and understood by the neural network powering the virtual agent. This flexibility in voice recognition is crucial for handling the creativity of hundreds of customers who might use unique phrases to interact with the bot.
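As a test oracle for that flexibility, one can maintain a growing list of synonymous phrases that must all resolve to the same intent. The lookup below is a deliberately simplified stand-in for the neural network, with illustrative Polish phrases, used only to show what "many phrasings, one intent" means:

```python
# Sketch: many synonymous Polish phrases mapping to one "confirm_visit" intent.
# In production this is the job of a neural network; a keyword lookup stands in
# here to illustrate the test oracle, not the model itself.

CONFIRM_PHRASES = {
    "tak", "potwierdzam", "tak, potwierdzam wizytę",
    "oczywiście", "zgadza się", "będę na pewno",
}

def classify(utterance: str) -> str:
    """Normalize the utterance and map it to an intent label."""
    text = utterance.strip().lower().rstrip(".!")
    return "confirm_visit" if text in CONFIRM_PHRASES else "unknown"
```

During testing, every phrase that real users produced but the bot failed to classify became a candidate for the training set, which is how the list keeps growing with customer creativity.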

Incorporating AI not only supports better communication through virtual agents but also ensures that live agents in contact centers can step in to bring a dialog to a close when needed. This seamless handover between virtual and live agents keeps customer experiences smooth and satisfactory. By constantly learning from interactions, the AI gets better at handling an expanding array of customer expressions, ultimately leading to more efficient and effective customer service operations.

TTS – Text To Speech

Our app enables both user-to-bot and bot-to-user communication. The latter is possible thanks to TTS, which converts the string of characters in the bot's statement into speech (an audio file). This allows providing the user with all the necessary information in voice form. It's worth mentioning that, unfortunately, TTS is prone to many traps that need to be taken into consideration – the manner of reading dates, sums of money, phone numbers, proper names, and surnames and their inflection.
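Those traps are usually handled by a text-normalization step in front of the synthesizer. The sketch below shows two deliberately simplified, hypothetical rules; the real normalizer would also have to handle dates, proper names, and inflection:

```python
# Sketch of TTS pre-processing for the traps mentioned above: raw text must be
# expanded before synthesis so amounts and phone numbers are read naturally.
# Both rules are simplified illustrations, not the production normalizer.
import re

def normalize_for_tts(text: str) -> str:
    # Expand the currency symbol so "50 zł" is read as words.
    text = re.sub(r"(\d+)\s*zł\b", r"\1 złotych", text)
    # Split a 9-digit phone number into groups of three so it is read digit by digit.
    text = re.sub(r"\b(\d{3})(\d{3})(\d{3})\b", r"\1 \2 \3", text)
    return text
```

Note that even this toy rule is wrong for Polish grammar in some cases (e.g. "2 zł" should be read "dwa złote", not "dwa złotych"), which is exactly why TTS normalization needed its own test cases.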

Conversation scenario

This part of the app defines the course of the conversation, which allows the bot to meet the business needs it was built for. This is where we decide whether the bot should, for example, confirm a visit or inform about promotions.

More information about possible scenarios served by our bot can be found here.


Such an innovative project meant that all of the described modules were something new and challenging, and their complexity made the testing quite difficult. We had to adjust the process to the unconventional character of the project, which allowed us to find defects that differ from the typical issues reported during mobile or web app verification.

An essential and specific matter while testing was verifying the conversation scenario held between the bot and a user. The results couldn't be unequivocally assigned to one of the categories “works” or “doesn't work”. They were an open issue that needed in-depth analysis, which came down to finding out why a user wasn't able to complete the scenario successfully. The tester's task was to find the cause of failure and verify if, and why, the user didn't understand what to say. While testing the conversation scenario, we also analyzed the form of its course. This involved determining whether the scenario is clear and transparent or, on the contrary, confuses the user by being ambiguous and misleading. If the bot explains step by step what to say, the scenario gets too long and tiring; if it moves too quickly, some interlocutors still won't know how to use it.
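In practice, this meant tagging each test conversation with a cause of failure rather than a binary verdict, so the causes could be aggregated later. A sketch of such a taxonomy, where the category names are our illustrative assumptions:

```python
# Sketch of a result taxonomy richer than "works"/"doesn't work": each test
# conversation is tagged with an outcome for later analysis.
# Category names are illustrative assumptions.
from enum import Enum
from collections import Counter

class Outcome(Enum):
    SUCCESS = "scenario finished"
    ASR_MISS = "speech not recognized"
    INTENT_MISS = "phrase not understood by the model"
    USER_CONFUSED = "user did not know what to say"
    USER_GAVE_UP = "scenario too long, user dropped out"

def summarize(outcomes):
    """Return the share of each observed outcome across all test conversations."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {o.name: counts[o] / total for o in Outcome if counts[o]}
```

A report like this points directly at which module (ASR, the neural network, or the scenario design itself) needs attention, instead of a single pass/fail number.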

In the last phase of testing, we decided to apply solutions used mainly in the computer gaming industry. After performing tests within the project team and confirming the bot's stability, we ran alpha and beta tests. In the first phase, the testers knew only the basic business assumptions and had to go through the conversation scenarios, giving us their insights and suggestions about the app's behavior. The second phase, on the other hand, ran at a larger scale and engaged a bigger group of testers.

The important issue was fulfilling the assumption that everyone who talks with the bot for the first time should successfully finish the scenario. During a conversation that lasts between 30 seconds and a minute, a user has little time or opportunity to learn how to communicate with a bot. That's why it's so important to collect and analyze the experience of people who talked with the bot for the first time. A small team of testers wouldn't be able to gather such complex data, because they couldn't reproduce the ways of speaking and behaving of such a large number of users.
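Given call logs from the beta phase, that success criterion can be computed directly. A sketch, assuming a hypothetical log of (caller_id, finished) pairs, where only each caller's first conversation counts:

```python
# Sketch: first-call success rate from beta-test logs, matching the goal that
# every first-time caller should finish the scenario. The (caller_id, finished)
# record format is a hypothetical assumption.

def first_call_success_rate(calls):
    """Share of callers whose *first* conversation finished successfully."""
    first_seen = {}
    for caller_id, finished in calls:
        if caller_id not in first_seen:  # ignore repeat calls by the same caller
            first_seen[caller_id] = finished
    return sum(first_seen.values()) / len(first_seen)
```

Tracking this single number across beta releases shows at a glance whether scenario changes actually help first-time callers, which a small internal team could never measure on its own.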

Having the tests performed by a large group of independent testers gave us a sufficiently representative evaluation and data set, and let us keep statistics that provided the knowledge necessary for implementing product improvements. Nevertheless, this isn't the end of the maintenance phase. We still work on perfecting our solution: in the production environment, the bot can still be trained and tested on the newest phrases uttered by the system's users.