Essential Question Investigation
- Amol Kumar
- Nov 17, 2019
How can we create a program that recognizes names and proper nouns spoken by anyone? Is it possible for this program to be as accurate as one that uses personalized pronunciation? These are the essential questions from my previous blog that guide my research for this month.

Right now, the most advanced and successful models for recognizing spoken names rely on personalized pronunciation. As I mentioned in my last post, this method uses artificial intelligence to tune itself to one specific user. I also mentioned that this method is very accurate for that user; however, this accuracy is only achieved after some usage. The AI model starts with a basic speech recognition process but has the added advantage of being able to learn the specific tone, pitch, and inflections of the user. Along with this, it can also learn how the user pronounces entire names through the corrections the user is forced to make. Learning this new information is where the increased accuracy of these models comes from.

However, reading this, I am sure you have gathered that this would be quite the process for the user, and it could become more arduous if the user has an accent or a stutter. On top of this, the software would be a completely blank slate at the beginning, which would make it extremely prone to mistakes that the user would then have to correct manually. This could be especially dangerous if it were applied in the hands-free calling systems of cars.

To remedy this, I believe we need accurate software that relies solely on phonetics. The issue of differing pronunciations between users can be addressed by using a "probabilistic system," as I mentioned in the last post. This means that the user doesn't have to pronounce the name exactly as the program expects, only closely enough that the intended name is the most likely possibility. For example, let's say that there are two contacts in someone's phone named "Jason" and "Nathan".
For the sake of this thought experiment, let's also assume that this "someone" has a funky accent. Now, if they say the name "Nathan", the program will likely notice the similarities between this pronunciation and the pronunciation of "Jason". The probabilistic system would alleviate this confusion by taking into consideration the fundamental differences between the names that are independent of any awkward pronunciations. If the program recognizes that the spoken name starts with an "N", then it can rule out "Jason" and leave "Nathan" as the most probable answer, hence the "probabilistic system".

This process can also be used the opposite way: for each contact name, add up the probabilities that each spoken phoneme matches the corresponding phoneme of that name. The contact with the largest total will then, again, be taken as the most probable answer. This system will ideally be close to the accuracy of the AI model and, potentially, better, and I expect it to be much more consistent and safe.
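To make the "Jason" versus "Nathan" example a little more concrete, here is a minimal Python sketch of both ideas: ruling out contacts by their first sound, and adding up per-phoneme match probabilities. Everything in it is an assumption for illustration only. The phoneme spellings are rough ARPAbet-style guesses, the probability values (0.9, 0.4, 0.05) are made up rather than measured, the function names are hypothetical, and the sounds are compared position by position without any real alignment step.

```python
# Hypothetical ARPAbet-style phoneme spellings for the two contacts.
CONTACTS = {
    "Jason": ["JH", "EY", "S", "AH", "N"],
    "Nathan": ["N", "EY", "TH", "AH", "N"],
}

# Assumed pairs of sounds that an accent might swap (values are invented).
CONFUSABLE = {frozenset(("TH", "S")): 0.4}


def match_probability(heard, expected):
    """Made-up probability that a heard phoneme was meant as the expected one."""
    if heard == expected:
        return 0.9
    return CONFUSABLE.get(frozenset((heard, expected)), 0.05)


def score(heard, expected):
    """Add up the per-position match probabilities (no alignment, same order)."""
    return sum(match_probability(h, e) for h, e in zip(heard, expected))


def rule_out_by_first_sound(heard, contacts):
    """Keep only the contacts whose first phoneme matches the first heard phoneme."""
    return [name for name, phonemes in contacts.items() if phonemes[0] == heard[0]]


def most_probable_contact(heard, contacts):
    """Pick the contact whose phonemes give the largest summed probability."""
    return max(contacts, key=lambda name: score(heard, contacts[name]))


# "Nathan" spoken with an accent that turns the "TH" sound into an "S" sound.
heard = ["N", "EY", "S", "AH", "N"]

print(rule_out_by_first_sound(heard, CONTACTS))  # ['Nathan']
print(most_probable_contact(heard, CONTACTS))    # Nathan
```

Even though the accented pronunciation shares its middle sounds with "Jason", the leading "N" and the partial credit for the "TH"/"S" swap keep "Nathan" ahead. A real system would need measured probabilities and a way to handle names of different lengths, which this sketch leaves out.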
Sources:
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky & James H. Martin