Deep neural networks







A deep neural network (DNN) is an artificial neural network (ANN) with multiple hidden layers between the input and output layers. Like shallow ANNs, DNNs can model complex non-linear relationships. DNN architectures generate compositional models in which the object is expressed as a layered composition of primitives. The extra layers enable composition of features from lower layers, potentially modeling complex data with fewer units than a similarly performing shallow network.


Deep architectures include many variants of a few basic approaches. Each architecture has found success in specific domains. It is not always possible to compare the performance of multiple architectures unless they have been evaluated on the same data sets.


DNNs are typically feedforward networks in which data flows from the input layer to the output layer without looping back.
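As a minimal sketch of that idea, the forward pass of a small fully connected network is just repeated matrix multiplications and non-linearities; the layer sizes, activation, and random initialization below are illustrative assumptions, not taken from any particular source.

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    # Illustrative layer sizes: 4 inputs, two hidden layers, 1 output.
    rng = np.random.default_rng(0)
    layer_sizes = [4, 8, 8, 1]
    weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n) for n in layer_sizes[1:]]

    def forward(x):
        """Data flows from the input layer to the output layer with no cycles."""
        h = x
        for W, b in zip(weights[:-1], biases[:-1]):
            h = relu(h @ W + b)                  # hidden layers with non-linear activation
        return h @ weights[-1] + biases[-1]      # linear output layer

    print(forward(np.array([0.5, -1.0, 2.0, 0.1])))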


Recurrent neural networks (RNNs), in which data can flow in any direction, are used for applications such as language modeling. Long short-term memory is particularly effective for this use.
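For illustration, a single step of a plain (Elman-style) recurrent cell can be sketched as follows; an LSTM adds gating on top of this recurrence. All dimensions and names here are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    input_dim, hidden_dim = 3, 5               # illustrative sizes
    W_xh = rng.normal(size=(input_dim, hidden_dim))
    W_hh = rng.normal(size=(hidden_dim, hidden_dim))
    b_h = np.zeros(hidden_dim)

    def rnn_step(x_t, h_prev):
        """One time step: the hidden state feeds back into itself, so
        information flows forward in time as well as through layers."""
        return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

    h = np.zeros(hidden_dim)
    for x_t in rng.normal(size=(4, input_dim)):   # a toy sequence of length 4
        h = rnn_step(x_t, h)
    print(h)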


Convolutional deep neural networks (CNNs) are used in computer vision. CNNs have also been applied to acoustic modeling for automatic speech recognition (ASR).
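The core operation of a convolutional layer can be sketched directly, assuming a single-channel input and one small filter with no padding or stride; a real CNN stacks many learned filters of this kind. The example is purely illustrative.

    import numpy as np

    def conv2d(image, kernel):
        """Slide the kernel over the image and take local weighted sums,
        the operation at the heart of a convolutional layer."""
        kh, kw = kernel.shape
        out_h = image.shape[0] - kh + 1
        out_w = image.shape[1] - kw + 1
        out = np.empty((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
    edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)     # toy vertical-edge kernel
    print(conv2d(image, edge_filter))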


Challenges

As with ANNs, many issues can arise with naively trained DNNs. Two common issues are overfitting and computation time.


DNNs are prone to overfitting because of the added layers of abstraction, which allow them to model rare dependencies in the training data. Regularization methods such as Ivakhnenko's unit pruning or weight decay (ℓ2-regularization) or sparsity (ℓ1-regularization) can be applied during training to combat overfitting. Alternatively, dropout regularization randomly omits units from the hidden layers during training, which helps to exclude rare dependencies. Finally, data can be augmented via methods such as cropping and rotating, so that smaller training sets can be increased in size to reduce the chances of overfitting.
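These ideas can be sketched compactly on generic NumPy arrays; the penalty coefficients, dropout rate, and function names below are illustrative assumptions, not tied to any particular library.

    import numpy as np

    rng = np.random.default_rng(2)

    def l2_penalty(weights, lam=1e-4):
        """Weight decay: add lam * sum of squared weights to the training loss,
        discouraging large weights that fit rare dependencies."""
        return lam * sum(np.sum(W ** 2) for W in weights)

    def l1_penalty(weights, lam=1e-4):
        """Sparsity: penalize absolute values, pushing many weights toward zero."""
        return lam * sum(np.sum(np.abs(W)) for W in weights)

    def dropout(h, p=0.5, training=True):
        """Randomly zero out hidden units during training (inverted dropout),
        so no unit can rely on a rare co-occurrence of other units."""
        if not training:
            return h
        mask = rng.random(h.shape) > p
        return h * mask / (1.0 - p)

    h = rng.normal(size=(4, 8))            # a toy batch of hidden activations
    print(dropout(h).shape, l2_penalty([h]))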


DNNs must consider many training parameters, such as the size (number of layers and number of units per layer), the learning rate, and the initial weights. Sweeping through the parameter space for optimal parameters may not be feasible because of the cost in time and computational resources. Various tricks, such as batching (computing the gradient on several training examples at once rather than on individual examples), speed up computation. The large processing throughput of GPUs has produced significant speedups in training, because the matrix and vector computations required are well suited to GPUs.
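A sketch of the batching trick for a linear least-squares model on synthetic data: each update computes one gradient for a whole mini-batch via a single matrix-vector product, the kind of computation GPUs accelerate well. All names, sizes, and hyperparameters here are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(1000, 10))                       # toy data set
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)

    w = np.zeros(10)
    learning_rate, batch_size = 0.1, 32

    for step in range(200):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Gradient of mean squared error over the mini-batch, computed in one shot
        # rather than example by example.
        grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)
        w -= learning_rate * grad

    print(np.round(w, 2))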







