Automatic speech recognition (deep learning)



Large-scale automatic speech recognition is the first and most convincing successful case of deep learning. LSTM RNNs can learn deep learning tasks that involve multi-second intervals containing speech events separated by thousands of discrete time steps, where one time step corresponds to about 10 ms. LSTM with forget gates is competitive with traditional speech recognizers on certain tasks.
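To make the role of the forget gate concrete, here is a minimal sketch of a single-unit (scalar) LSTM cell with a forget gate, written in plain Python. The gate weights and biases are hypothetical values chosen only for illustration, and the recurrent weights are set to zero to keep the arithmetic easy to follow. The input is one "event" followed by silence for a total of 1000 steps, which at roughly 10 ms per step is about 10 seconds of audio. With the forget gate close to 1 the cell state still remembers the event at the end; with the forget gate at 0.5 it is effectively erased.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h: float, c: float, p: dict) -> tuple:
    """One step of a scalar LSTM cell with a forget gate.

    p maps each gate name ('i', 'f', 'o', 'g') to (w, u, b): the input
    weight, recurrent weight and bias for that gate (hypothetical values).
    """
    def pre(gate: str) -> float:
        w, u, b = p[gate]
        return w * x + u * h + b

    i = sigmoid(pre("i"))    # input gate: how much new content to write
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much of the cell to expose
    g = math.tanh(pre("g"))  # candidate content to write
    c = f * c + i * g        # cell state update
    h = o * math.tanh(c)     # hidden output
    return h, c


def run(inputs: list, p: dict) -> float:
    """Feed a sequence through the cell and return the final cell state."""
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, p)
    return c


def params(forget_bias: float) -> dict:
    """Hypothetical parameters; only the forget-gate bias varies."""
    return {
        "i": (10.0, 0.0, -5.0),  # input gate opens only when x = 1
        "g": (2.0, 0.0, 0.0),    # candidate is 0 when x = 0
        "o": (0.0, 0.0, 0.0),    # output gate fixed at 0.5
        "f": (0.0, 0.0, forget_bias),
    }


FRAME_MS = 10                            # one time step ~ 10 ms of audio
n_steps = 10_000 // FRAME_MS             # 10 s of audio -> 1000 steps
inputs = [1.0] + [0.0] * (n_steps - 1)   # a single event, then silence

c_keep = run(inputs, params(10.0))  # forget gate ~ 1: memory survives
c_drop = run(inputs, params(0.0))   # forget gate = 0.5: memory halves each step
print(c_keep, c_drop)
```

In a trained recognizer these weights are learned and the state is a vector, but the mechanism is the same: the forget gate multiplies the previous cell state, so values near 1 let information bridge thousands of 10 ms steps.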


The initial success in speech recognition was based on small-scale recognition tasks using TIMIT. The data set contains 630 speakers from eight major dialects of American English, and each speaker reads 10 sentences. Its small size allows many configurations to be tried. More importantly, the TIMIT task concerns phone-sequence recognition, which, unlike word-sequence recognition, allows weak language models (without a strong grammar). This lets weaknesses in the acoustic-modeling aspects of speech recognition be analyzed more easily. The error rates listed below, including these results, are measured as percent phone error rate (PER) and have been summarized over the past 20 years:
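Phone error rate is computed by aligning the recognized phone sequence against the reference with a minimum edit distance, then dividing the number of substitutions, deletions and insertions by the number of reference phones. The sketch below assumes that standard Levenshtein definition; the phone strings are made-up TIMIT-style labels, and the function names are hypothetical. Real TIMIT scoring commonly also maps the phone set to a smaller set before alignment, which is omitted here.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,              # deletion of ref phone r
                cur[j - 1] + 1,           # insertion of hyp phone h
                prev[j - 1] + (r != h),   # substitution (or match if equal)
            )
        prev = cur
    return prev[-1]


def phone_error_rate(ref: list, hyp: list) -> float:
    """Percent phone error rate: edit distance over reference length, times 100."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


ref = "sil dh ax k ae t sil".split()  # hypothetical reference phones
hyp = "sil dh ah k ae sil".split()    # one substitution (ax -> ah), one deletion (t)
print(phone_error_rate(ref, hyp))
```

Here two errors over seven reference phones give a PER of about 28.6%. Because insertions are counted, PER can exceed 100% when the recognizer outputs many spurious phones.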



The debut of DNNs for speaker recognition in the late 1990s and for speech recognition around 2009-2011, and of LSTM around 2003-2007, accelerated progress in eight major areas:



Scale-up/out and accelerated DNN training and decoding
Sequence discriminative training
Feature processing by deep models with a solid understanding of the underlying mechanisms
Adaptation of DNNs and related deep models
Multi-task and transfer learning by DNNs and related deep models
CNNs and how to design them to best exploit domain knowledge of speech
RNNs and their rich LSTM variants
Other types of deep models, including tensor-based models and integrated deep generative/discriminative models.

All major commercial speech recognition systems (e.g., Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products) are based on deep learning.







