What? So What? What now?

Amol Kumar
Nov 11, 2019

In my October SDA, I presented the findings from my earlier research: a general overview of how speech recognition works, followed by an explanation of my approach to the problem of recognizing spoken names. In that research, I found that many of the models used today rely on artificial intelligence methods that become confused when spelling patterns from different languages are introduced. A name can come from any language, and a first and last name together can combine more than one. This leads to seemingly infinite variability in the ways that names are pronounced.

To solve this problem, other developers have turned to artificial intelligence to personalize their software. By this, I mean that they have used artificial intelligence to have their software learn how one person pronounces every name. Upon reading this, I thought that such software would only really be useful to that one person and, therefore, not very useful in general. Granted, this method does have a very high success rate once it has learned how the user speaks. However, I don't feel that this approach functions as a permanent solution to the problem. Ideally, we would have software that anyone can use, so long as their pronunciation of a name is at least close to the correct one.

This is the core concept of my project, for which I hope to create a working model and demonstration. Going forward, I will assume a correct pronunciation for each name in my address book, but, to address the problem of variability, I will also make the recognition system probabilistic. This means that there will be room for error between the user's pronunciation and the pronunciation that I assume for the name.
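One way to picture the probabilistic matching described above is the minimal sketch below. It is not my actual implementation, and everything in it is an assumption made for illustration: the sample address book, the phoneme spellings, the helper names (`edit_distance`, `name_probabilities`, `recognize`), the use of edit distance as the measure of "closeness", and the `sharpness` and `threshold` values.

```python
import math

# Hypothetical address book: each name maps to one assumed pronunciation,
# written as a list of phoneme symbols (ARPAbet-like, chosen for illustration).
ADDRESS_BOOK = {
    "Amol": ["AH", "M", "OW", "L"],
    "Anil": ["AH", "N", "IY", "L"],
    "Maria": ["M", "AH", "R", "IY", "AH"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # drop a phoneme from a
                            curr[j - 1] + 1,      # add a phoneme from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def name_probabilities(heard, book=ADDRESS_BOOK, sharpness=1.0):
    """Turn distances into probabilities: closer pronunciations score higher."""
    scores = {
        name: math.exp(-sharpness * edit_distance(heard, phones))
        for name, phones in book.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def recognize(heard, book=ADDRESS_BOOK, threshold=0.5):
    """Return the most likely name, or None if no name is likely enough."""
    probs = name_probabilities(heard, book)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None


if __name__ == "__main__":
    # A slightly off pronunciation of "Amol" (one vowel differs).
    print(recognize(["AH", "M", "AO", "L"]))
```

Because every name receives a probability instead of a yes-or-no match, a speaker who is only close to the assumed pronunciation can still be recognized, while an utterance that is not clearly closer to one name than to the others falls below the threshold and is rejected. These probabilities are relative to the names in the address book, so a real system would likely also need a check on the absolute distance.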
