May 7th Post

Nowadays, Artificial Intelligence and Machine Learning are all the rage, but people's faith in this technology may be misplaced when it comes to speech recognition. Many believe that this technology can extrapolate, or "think," beyond what it already knows. This idea is false; in fact, the very purpose of AI and Machine Learning is to interpolate, deriving general characteristics from a known dataset. When this idea is applied to speech recognition, there are two possible approaches.
The first is to accumulate a dataset large enough to accurately represent the speech patterns of every person not included in it (a dataset drawn from the entire human race would obviously be impossible). This approach is implausible because it is unrealistic to assume that any person's voice can be accounted for simply because it sounds like one in the dataset.
The second approach can be considered the opposite of the first. It assumes that the software will be used by only one person, so it can accumulate data from that single user and learn exactly how they speak. The issue with this approach, of course, is that the user would have to explicitly correct the software whenever their pronunciation does not match its prediction. On top of that, this approach sacrifices all versatility for accuracy. Still, it could prove useful on devices typically limited to one user, or even for security purposes.
Google, however, has developed a third approach that satisfies its needs for Search by Voice. Given the enormous number of possible searches, Google determined that it would be best to use AI to narrow down the possible queries based on a particular user's previous searches.
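I believe this approach is the best use of AI in speech recognition because it does not involve AI in the actual analysis of the speech data. Instead, AI only narrows down the likely queries, which plays to its real strength: interpolating from known data, in this case the user's own search history.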
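I don't know the details of how Google combines search history with recognition, so the following is only a minimal sketch of the general idea under my own assumptions. It assumes some recognizer (not shown) has already turned the audio into a few candidate transcriptions, each with an acoustic log-probability, and then re-ranks those candidates using a simple word-frequency prior built from the user's past searches. The function names, the `weight` and `vocab_size` parameters, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def build_history_prior(past_queries):
    """Count words across a user's past searches (hypothetical per-user history)."""
    counts = Counter()
    for query in past_queries:
        counts.update(query.lower().split())
    return counts


def history_log_prob(candidate, counts, vocab_size=10_000):
    """Add-one smoothed unigram log-probability of a candidate under the user's history.

    vocab_size is an assumed vocabulary size used only for smoothing.
    """
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in candidate.lower().split()
    )


def rerank(candidates, past_queries, weight=1.0):
    """Re-rank (text, acoustic_log_prob) pairs by acoustic score plus a weighted history prior."""
    counts = build_history_prior(past_queries)
    scored = [
        (text, acoustic + weight * history_log_prob(text, counts))
        for text, acoustic in candidates
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    history = ["weather in boston", "boston red sox score", "red sox tickets"]
    # The acoustic scores slightly prefer the wrong transcription.
    candidates = [("read socks score", -4.0), ("red sox score", -4.2)]
    print(rerank(candidates, history)[0][0])  # red sox score
```

The point of the sketch is that the history never touches the audio itself: it only reorders hypotheses the recognizer already produced. A real system would likely use a far richer language model than word counts, and a unigram prior like this one penalizes longer queries, which is why both candidates here have the same length.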

