Deep learning: Automatic speech recognition



Large-scale automatic speech recognition is the first and most convincing successful case of deep learning. LSTM RNNs can learn deep learning tasks that involve multi-second intervals containing speech events separated by thousands of discrete time steps, where one time step corresponds to 10 ms. LSTM with forget gates is competitive with traditional speech recognizers on certain tasks.
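To make the role of the forget gate concrete, here is a minimal sketch of a single-unit LSTM cell with a forget gate, written in plain Python. The weights, the helper names (`lstm_step`, `run_lstm`), and the "pulse then silence" input are hypothetical and chosen only for illustration; real recognizers use vector-valued cells trained on acoustic features. The sketch shows that a forget gate held near 1 lets the cell state carry information across hundreds of 10 ms steps (3 seconds is 300 steps), while a forget gate near 0 erases it almost immediately.

```python
import math

FRAME_MS = 10  # one time step per 10 ms frame, as in the text


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float, w: dict) -> tuple:
    """One step of a scalar LSTM cell with a forget gate.

    w maps each gate name to (input weight, recurrent weight, bias);
    the values used below are illustrative assumptions, not trained weights.
    """
    def pre(gate: str) -> float:
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("i"))    # input gate: how much new input to write
    f = sigmoid(pre("f"))    # forget gate: how much old cell state to keep
    o = sigmoid(pre("o"))    # output gate: how much cell state to expose
    g = math.tanh(pre("g"))  # candidate value to write into the cell
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_lstm(xs: list, w: dict) -> tuple:
    """Run the cell over a sequence; return all outputs and the final cell state."""
    h, c = 0.0, 0.0
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, w)
        hs.append(h)
    return hs, c


def make_weights(forget_bias: float) -> dict:
    # Hypothetical weights: the input gate and candidate open only when x = 1.
    return {
        "i": (10.0, 0.0, -5.0),
        "f": (0.0, 0.0, forget_bias),
        "o": (0.0, 0.0, 0.0),
        "g": (10.0, 0.0, 0.0),
    }


n_frames = 3000 // FRAME_MS  # 3 seconds of audio -> 300 time steps
pulse = [1.0] + [0.0] * (n_frames - 1)  # one "speech event", then nothing

hs_long, c_long = run_lstm(pulse, make_weights(forget_bias=10.0))
hs_short, c_short = run_lstm(pulse, make_weights(forget_bias=-10.0))
print(len(hs_long), round(c_long, 3), c_short)
```

With the forget gate biased open, the cell state after 300 steps is still close to the value written at step 1; with it biased shut, the state decays to roughly zero after a single step.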


The initial success in speech recognition was based on small-scale recognition tasks using TIMIT. The data set contains 630 speakers from eight major dialects of American English, and each speaker reads 10 sentences. Its small size allows many configurations to be tried. More importantly, the TIMIT task concerns phone-sequence recognition, which, unlike word-sequence recognition, allows weak language models (without a strong grammar). This allows weaknesses in the acoustic-modeling aspects of speech recognition to be analyzed more easily. The error rates listed below, including these results and measured as percent phone error rates (PER), have been summarized over the past 20 years:
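Phone error rate is commonly computed as the minimum number of substitutions, deletions, and insertions needed to turn the recognized phone sequence into the reference, divided by the number of reference phones. The sketch below assumes that standard edit-distance definition; the function names and the phone labels in the example are hypothetical and are not taken from the TIMIT transcriptions.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference phone
                cur[j - 1] + 1,          # insertion of an extra phone
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = cur
    return prev[-1]


def phone_error_rate(ref: list, hyp: list) -> float:
    """PER in percent: edit operations / number of reference phones * 100."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


ref = "sh iy hh ae d".split()
print(phone_error_rate(ref, "sh iy ae d".split()))        # one deletion
print(phone_error_rate(ref, "s iy hh ae d ax".split()))   # one substitution, one insertion
```

Because insertions are counted, PER can exceed 100%. Over a whole test set, the usual convention is to sum the edit operations and the reference lengths across utterances before dividing, rather than averaging per-utterance rates.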



The debut of DNNs for speaker recognition in the late 1990s and for speech recognition around 2009–2011, and of LSTM around 2003–2007, accelerated progress in eight major areas:



Scale-up/out and accelerated DNN training and decoding
Sequence discriminative training
Feature processing by deep models with a solid understanding of the underlying mechanisms
Adaptation of DNNs and related deep models
Multi-task and transfer learning by DNNs and related deep models
CNNs and how to design them to best exploit domain knowledge of speech
RNNs and their rich LSTM variants
Other types of deep models, including tensor-based models and integrated deep generative/discriminative models

All major commercial speech recognition systems (e.g., Microsoft Cortana, Xbox, Skype Translator, Amazon Alexa, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products) are based on deep learning.







