
Digital Voice Assistant for Data Retrieval in Reporting - a Practical Example

Digital voice assistants are conquering cars, smart homes and numerous service areas. With the right wake word, they already take over minor tasks. Their particular strength, however, lies in how accurately they can surface insights hidden in data. This can be particularly useful in the finance sector, as our practical example shows.

Since the invention of the first computers, people have looked for ways to make interacting with them more intuitive. Direct communication via speech has long been a particularly desirable form of exchange between humans and machines, but understanding and reproducing spoken language has been, and still is, a particular technical challenge because of its complexity.

In the following, we demonstrate the advantages of a digital voice assistant with the help of a practical example: the development of a prototype voice assistant for a company's ad-hoc reporting in the financial sector. The prototype simulates the "quick or spontaneous request" for certain data - in our example KPIs - with an immediate answer. This reduces the internal communication that would otherwise be needed and avoids long searches through reports with many pages, thus saving work and time.

The digital voice assistant can therefore be understood as a tool that gives managers easier and more intuitive access to data. It eliminates the need to route questions through various contact persons.

Reasons for investing in an in-house voice assistant

Beyond the benefits of increased efficiency, time savings and the ability to interact in real time, we have identified other advantages:

  1. Improve execution and accountability: customised design and programming cater to client-specific requirements. There are no licence fees.
  2. Rapid time-to-market: by using pioneering technology, required solutions can be developed and implemented within a short period of time, and voice bots can be adapted and managed quickly.
  3. Data sovereignty: with an in-house solution, the customer retains control over sensitive company data, since security and data protection are more difficult to guarantee with third parties.
     

Technical functioning of a digital voice assistant

Voice assistants use software that recognises speech and translates it into commands. The term "natural language" refers to the language in which people communicate with each other. In the context of speech-based AI, three terms are relevant: Natural Language Processing (NLP), Natural Language Generation (NLG) and Natural Language Understanding (NLU).

Figure 1: Context of Natural Language Processing

Natural Language Processing uses methods from different disciplines, such as AI, to enable computers to identify and understand human language in both written and spoken form. In the process, unstructured data is transformed into a structured data format.

Natural Language Understanding (NLU) deals with a machine's ability to grasp and understand text. In concrete terms, the grammar and the context of a statement are analysed in order to derive the meaning of a sentence. The counterpart to this is Natural Language Generation (NLG), which produces natural-language text automatically. Classic NLG applications are text generation (for example financial texts), text summarisation and the translation of texts.
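
The simplest form of NLG is template filling: a structured record goes in, a sentence comes out. The following Python sketch illustrates the idea only; the function name, the sentence template and the figure are assumptions made for this example and are not taken from the prototype.

```python
def render_kpi_answer(kpi: str, segment: str, period: str, value: float, unit: str) -> str:
    """Template-based NLG: turn one structured KPI record into a sentence."""
    # Thousands separators and two decimals keep written and spoken output consistent.
    return f"The {kpi} in segment {segment} for {period} was {value:,.2f} {unit}."


# Placeholder figure, invented purely for illustration.
sentence = render_kpi_answer("turnover", "Argentina", "April 2021", 1234567.5, "EUR")
print(sentence)
```

NLU is the reverse direction: it starts from a sentence and produces the structured record.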

Figure 2: Machine Learning Process

Development, implementation and application of the voice assistant

The first step requires basic software: Speech-To-Text (STT) and Text-To-Speech (TTS). STT converts spoken natural language into text, and TTS conversely converts text into spoken language. You can develop this basic software in-house or alternatively obtain it as a ready-made cloud service from providers such as Amazon (Polly), Google Cloud or IBM (Watson).

So what are the advantages of developing your own software over procuring third-party software? If you want to be ready for use quickly, ready-made software in the cloud is advantageous. If, however, you see several possible uses for the voice assistant in the company over the long term, in-house development is the more cost-effective solution, because each additional use case and each further development step becomes cheaper.

The software is then further developed in an agile project together with the internal addressee and internal IT. Ideally, an MVP (minimum viable product) is created first, so that the best possible solution can be worked out step by step with the developer. The needs of the internal addressee serve as the benchmark.

In our practical example, all conceivable questions for reporting had to be formulated in advance - for example, questions about the KPIs in the financial area such as "How high was the turnover in segment xxx last year?". The developer then integrated these into the system.

Figure 3: Voice Assistant functionality

Figure 3 illustrates the process of STT and TTS. A question is posed to the system in natural language. The system processes it with Natural Language Processing and answers in the form of text and speech.
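
The flow described above can be sketched as three stages chained together. This is a minimal Python illustration under stated assumptions: `make_assistant` and the three `fake_*` stand-ins are hypothetical names, and the stand-ins only pass text through so that the sketch runs without any speech engine.

```python
from typing import Callable, Tuple


def make_assistant(
    speech_to_text: Callable[[bytes], str],
    answer_question: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> Callable[[bytes], Tuple[str, bytes]]:
    """Wire the three stages into one function: audio in, (text, audio) out."""

    def handle(audio_in: bytes) -> Tuple[str, bytes]:
        question = speech_to_text(audio_in)    # STT: audio -> text
        answer = answer_question(question)     # NLP: text question -> text answer
        return answer, text_to_speech(answer)  # TTS: text -> audio

    return handle


# Stand-in stages (assumptions for this sketch, not real speech engines).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_nlp(question: str) -> str:
    return f"You asked: {question}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


assistant = make_assistant(fake_stt, fake_nlp, fake_tts)
text_answer, audio_answer = assistant(b"How high was the turnover last year?")
print(text_answer)
```

Passing the stages in as functions keeps the choice between an in-house engine and a cloud service open: either can be plugged in without changing the surrounding flow.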

Using the voice assistant - a practical example

In order to better understand the concept of a voice assistant in the form of an in-house solution, we show a practical example. The practical example serves to illustrate how such projects could be implemented in the area of finance/controlling and why in-house solutions are the better approach from our point of view.

The speech recognition software, developed and customised in-house, builds on the existing data infrastructure. It enables real-time queries of financial and non-financial KPIs (for example turnover) for a given segment (for example Argentina) and a given time period such as month, quarter or year. The queried information is returned as voice and text responses. The machine learning model is an "Acoustic & Language Model", which means that the algorithm was trained on the basis of speech and large amounts of text.
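
Once the question is available as text, a query of this kind boils down to three slots: KPI, segment and period. The Python sketch below shows one simple, rule-based way to fill them and look up a value. It is not the prototype's method; the vocabulary, the period format, the `KPI_STORE` dictionary and its value are all assumptions invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabulary; a real system would derive it from the data model.
KPI_SYNONYMS = {"turnover": "turnover", "sales": "turnover", "revenue": "turnover"}
SEGMENTS = {"argentina": "Argentina"}
MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")


@dataclass(frozen=True)
class KpiQuery:
    kpi: str
    segment: str
    period: str  # "2021", "2021-Q2" or "2021-04"


def parse_period(text: str) -> Optional[str]:
    """Extract a year, optionally narrowed to a quarter or month, from lowercase text."""
    year = re.search(r"\b(\d{4})\b", text)
    if year is None:
        return None
    month = next((m for m in MONTHS if re.search(rf"\b{m}\b", text)), None)
    if month is not None:
        return f"{year.group(1)}-{MONTHS.index(month) + 1:02d}"
    quarter = re.search(r"\bq([1-4])\b", text)
    if quarter is not None:
        return f"{year.group(1)}-Q{quarter.group(1)}"
    return year.group(1)


def parse_question(question: str) -> Optional[KpiQuery]:
    """Fill the three slots; return None if any of them is missing."""
    text = question.lower()
    kpi = next((canon for word, canon in KPI_SYNONYMS.items()
                if re.search(rf"\b{re.escape(word)}\b", text)), None)
    segment = next((name for key, name in SEGMENTS.items() if key in text), None)
    period = parse_period(text)
    if kpi is None or segment is None or period is None:
        return None
    return KpiQuery(kpi, segment, period)


# Placeholder store with an invented value, standing in for the data infrastructure.
KPI_STORE = {KpiQuery("turnover", "Argentina", "2021-04"): 1_000_000.0}


def lookup(question: str) -> Optional[float]:
    query = parse_question(question)
    return None if query is None else KPI_STORE.get(query)


print(lookup("How high were sales in April 2021 in the Argentina segment?"))
```

A trained language model replaces the hand-written rules in practice, but the result is the same kind of structured query that can be run against the existing data sources.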

The finance manager wants to be informed about specific KPIs. This is often associated with an increased need for communication and time. Although financial and non-financial KPIs are collected, they are often stored in different source systems, and the information sought cannot always be found immediately in the flood of data. In addition, operating the systems manually and searching for contact persons are inefficient and time-consuming.

To save time, the finance manager opens the specially developed KPI search tool. Now he clicks on the microphone symbol and begins with the voice input.

Figure 4: Voice input with chat function for KPI search

He asks for the amount of sales in April 2021 in the Argentina segment and receives an answer within a few seconds. He then wants more detail and asks how high the private customer share of sales is. The tool recognises and answers this follow-up question as well.
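
The follow-up question names neither a segment nor a period, so the assistant has to carry both over from the previous turn. The article does not describe how the prototype does this; the Python sketch below shows one common approach, slot carry-over, with hypothetical slot names and values.

```python
from typing import Dict, Optional

Slots = Dict[str, Optional[str]]


def merge_with_context(previous: Slots, current: Slots) -> Slots:
    """Fill slots missing from the follow-up with values from the previous turn."""
    merged = dict(previous)
    # Only slots the follow-up actually filled override the earlier context.
    merged.update({name: value for name, value in current.items() if value is not None})
    return merged


# Hypothetical slot values for the two questions in the example.
first_turn = {"kpi": "sales", "segment": "Argentina", "period": "2021-04"}
follow_up = {"kpi": "private customer share of sales", "segment": None, "period": None}

resolved = merge_with_context(first_turn, follow_up)
print(resolved)
```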

In addition to the voice input via chat introduced above, the voice assistant will soon be able to complement voice and text responses with a graphical report. Data from different sources and systems will be compared, summarised and clearly presented to the user.

Digital voice assistant in finance - it obeys every word

The developed voice assistant already answers the questions above accurately, and further features are being worked on. Open points include where it should be used (app, web, smart speaker) and how it can be meaningfully integrated into existing controlling systems. Even so, a functional voice assistant that can already fulfil important tasks was created within just a few weeks.

Many thanks to Keldan Basmacioglu for his contribution to this article.