Deep Learning: Foundations and Concepts

Author: 
Other Authors: 
Material Type: e-Book
Language: English
Publication Info: Cham: Springer International Publishing AG, 2023.
Edition: 1st ed.
Subjects: 
Online Access: Full-text access

Contents:
- Intro
- Preface
- Goals of the book
- Responsible use of technology
- Structure of the book
- References
- Exercises
- Mathematical notation
- Acknowledgements
- Contents
- 1 The Deep Learning Revolution
- 1.1. The Impact of Deep Learning
- 1.1.1 Medical diagnosis
- 1.1.2 Protein structure
- 1.1.3 Image synthesis
- 1.1.4 Large language models
- 1.2. A Tutorial Example
- 1.2.1 Synthetic data
- 1.2.2 Linear models
- 1.2.3 Error function
- 1.2.4 Model complexity
- 1.2.5 Regularization
- 1.2.6 Model selection
- 1.3. A Brief History of Machine Learning
- 1.3.1 Single-layer networks
- 1.3.2 Backpropagation
- 1.3.3 Deep networks
- 2 Probabilities
- 2.1. The Rules of Probability
- 2.1.1 A medical screening example
- 2.1.2 The sum and product rules
- 2.1.3 Bayes' theorem
- 2.1.4 Medical screening revisited
- 2.1.5 Prior and posterior probabilities
- 2.1.6 Independent variables
- 2.2. Probability Densities
- 2.2.1 Example distributions
- 2.2.2 Expectations and covariances
- 2.3. The Gaussian Distribution
- 2.3.1 Mean and variance
- 2.3.2 Likelihood function
- 2.3.3 Bias of maximum likelihood
- 2.3.4 Linear regression
- 2.4. Transformation of Densities
- 2.4.1 Multivariate distributions
- 2.5. Information Theory
- 2.5.1 Entropy
- 2.5.2 Physics perspective
- 2.5.3 Differential entropy
- 2.5.4 Maximum entropy
- 2.5.5 Kullback-Leibler divergence
- 2.5.6 Conditional entropy
- 2.5.7 Mutual information
- 2.6. Bayesian Probabilities
- 2.6.1 Model parameters
- 2.6.2 Regularization
- 2.6.3 Bayesian machine learning
- Exercises
- 3 Standard Distributions
- 3.1. Discrete Variables
- 3.1.1 Bernoulli distribution
- 3.1.2 Binomial distribution
- 3.1.3 Multinomial distribution
- 3.2. The Multivariate Gaussian
- 3.2.1 Geometry of the Gaussian
- 3.2.2 Moments
- 3.2.3 Limitations
- 3.2.4 Conditional distribution
- 3.2.5 Marginal distribution
- 3.2.6 Bayes' theorem
- 3.2.7 Maximum likelihood
- 3.2.8 Sequential estimation
- 3.2.9 Mixtures of Gaussians
- 3.3. Periodic Variables
- 3.3.1 Von Mises distribution
- 3.4. The Exponential Family
- 3.4.1 Sufficient statistics
- 3.5. Nonparametric Methods
- 3.5.1 Histograms
- 3.5.2 Kernel densities
- 3.5.3 Nearest-neighbours
- Exercises
- 4 Single-layer Networks: Regression
- 4.1. Linear Regression
- 4.1.1 Basis functions
- 4.1.2 Likelihood function
- 4.1.3 Maximum likelihood
- 4.1.4 Geometry of least squares
- 4.1.5 Sequential learning
- 4.1.6 Regularized least squares
- 4.1.7 Multiple outputs
- 4.2. Decision Theory
- 4.3. The Bias-Variance Trade-off
- Exercises
- 5 Single-layer Networks: Classification
- 5.1. Discriminant Functions
- 5.1.1 Two classes
- 5.1.2 Multiple classes
- 5.1.3 1-of-K coding
- 5.1.4 Least squares for classification
- 5.2. Decision Theory
- 5.2.1 Misclassification rate
- 5.2.2 Expected loss
- 5.2.3 The reject option
- 5.2.4 Inference and decision
- 5.2.5 Classifier accuracy
- 5.2.6 ROC curve
- 5.3. Generative Classifiers
- 5.3.1 Continuous inputs
- 5.3.2 Maximum likelihood solution
- 5.3.3 Discrete features
- 5.3.4 Exponential family
- 5.4. Discriminative Classifiers
- 5.4.1 Activation functions
- 5.4.2 Fixed basis functions
- 5.4.3 Logistic regression
- 5.4.4 Multi-class logistic regression
- 5.4.5 Probit regression
- 5.4.6 Canonical link functions
- Exercises
- 6 Deep Neural Networks
- 6.1. Limitations of Fixed Basis Functions
- 6.1.1 The curse of dimensionality
- 6.1.2 High-dimensional spaces
- 6.1.3 Data manifolds
- 6.1.4 Data-dependent basis functions
- 6.2. Multilayer Networks
- 6.2.1 Parameter matrices
- 6.2.2 Universal approximation
- 6.2.3 Hidden unit activation functions
- 6.2.4 Weight-space symmetries
- 6.3. Deep Networks
- 6.3.1 Hierarchical representations
- 6.3.2 Distributed representations
- 6.3.3 Representation learning
- 6.3.4 Transfer learning
- 6.3.5 Contrastive learning
- 6.3.6 General network architectures
- 6.3.7 Tensors
- 6.4. Error Functions
- 6.4.1 Regression
- 6.4.2 Binary classification
- 6.4.3 Multiclass classification
- 6.5. Mixture Density Networks
- 6.5.1 Robot kinematics example
- 6.5.2 Conditional mixture distributions
- 6.5.3 Gradient optimization
- 6.5.4 Predictive distribution
- Exercises
- 7 Gradient Descent
- 7.1. Error Surfaces
- 7.1.1 Local quadratic approximation
- 7.2. Gradient Descent Optimization
- 7.2.1 Use of gradient information
- 7.2.2 Batch gradient descent
- 7.2.3 Stochastic gradient descent
- 7.2.4 Mini-batches
- 7.2.5 Parameter initialization
- 7.3. Convergence
- 7.3.1 Momentum
- 7.3.2 Learning rate schedule
- 7.3.3 RMSProp and Adam
- 7.4. Normalization
- 7.4.1 Data normalization
- 7.4.2 Batch normalization
- 7.4.3 Layer normalization
- Exercises
- 8 Backpropagation
- 8.1. Evaluation of Gradients
- 8.1.1 Single-layer networks
- 8.1.2 General feed-forward networks
- 8.1.3 A simple example
- 8.1.4 Numerical differentiation
- 8.1.5 The Jacobian matrix
- 8.1.6 The Hessian matrix
- 8.2. Automatic Differentiation
- 8.2.1 Forward-mode automatic differentiation
- 8.2.2 Reverse-mode automatic differentiation
- Exercises
- 9 Regularization
- 9.1. Inductive Bias
- 9.1.1 Inverse problems
- 9.1.2 No free lunch theorem
- 9.1.3 Symmetry and invariance
- 9.1.4 Equivariance
- 9.2. Weight Decay
- 9.2.1 Consistent regularizers
- 9.2.2 Generalized weight decay
- 9.3. Learning Curves
- 9.3.1 Early stopping
- 9.3.2 Double descent
- 9.4. Parameter Sharing
- 9.4.1 Soft weight sharing
- 9.5. Residual Connections
- 9.6. Model Averaging
- 9.6.1 Dropout
- Exercises
- 10 Convolutional Networks
- 10.1. Computer Vision
- 10.1.1 Image data
- 10.2. Convolutional Filters
- 10.2.1 Feature detectors
- 10.2.2 Translation equivariance
- 10.2.3 Padding
- 10.2.4 Strided convolutions
- 10.2.5 Multi-dimensional convolutions
- 10.2.6 Pooling
- 10.2.7 Multilayer convolutions
- 10.2.8 Example network architectures
- 10.3. Visualizing Trained CNNs
- 10.3.1 Visual cortex
- 10.3.2 Visualizing trained filters
- 10.3.3 Saliency maps
- 10.3.4 Adversarial attacks
- 10.3.5 Synthetic images
- 10.4. Object Detection
- 10.4.1 Bounding boxes
- 10.4.2 Intersection-over-union
- 10.4.3 Sliding windows
- 10.4.4 Detection across scales
- 10.4.5 Non-max suppression
- 10.4.6 Fast region CNNs
- 10.5. Image Segmentation
- 10.5.1 Convolutional segmentation
- 10.5.2 Up-sampling
- 10.5.3 Fully convolutional networks
- 10.5.4 The U-net architecture
- 10.6. Style Transfer
- Exercises
- 11 Structured Distributions
- 11.1. Graphical Models
- 11.1.1 Directed graphs
- 11.1.2 Factorization
- 11.1.3 Discrete variables
- 11.1.4 Gaussian variables
- 11.1.5 Binary classifier
- 11.1.6 Parameters and observations
- 11.1.7 Bayes' theorem
- 11.2. Conditional Independence
- 11.2.1 Three example graphs
- 11.2.2 Explaining away
- 11.2.3 D-separation
- 11.2.4 Naive Bayes
- 11.2.5 Generative models
- 11.2.6 Markov blanket
- 11.2.7 Graphs as filters
- 11.3. Sequence Models
- 11.3.1 Hidden variables
- Exercises
- 12 Transformers
- 12.1. Attention
- 12.1.1 Transformer processing
- 12.1.2 Attention coefficients
- 12.1.3 Self-attention
- 12.1.4 Network parameters
- 12.1.5 Scaled self-attention
- 12.1.6 Multi-head attention
- 12.1.7 Transformer layers
- 12.1.8 Computational complexity
- 12.1.9 Positional encoding
- 12.2. Natural Language
- 12.2.1 Word embedding
- 12.2.2 Tokenization
- 12.2.3 Bag of words
- 12.2.4 Autoregressive models
- 12.2.5 Recurrent neural networks
- 12.2.6 Backpropagation through time
- 12.3. Transformer Language Models
- 12.3.1 Decoder transformers
- 12.3.2 Sampling strategies
- 12.3.3 Encoder transformers
- 12.3.4 Sequence-to-sequence transformers
- 12.3.5 Large language models
- 12.4. Multimodal Transformers
- 12.4.1 Vision transformers
- 12.4.2 Generative image transformers
- 12.4.3 Audio data
- 12.4.4 Text-to-speech
- 12.4.5 Vision and language transformers
- Exercises
- 13 Graph Neural Networks
- 13.1. Machine Learning on Graphs
- 13.1.1 Graph properties
- 13.1.2 Adjacency matrix
- 13.1.3 Permutation equivariance
- 13.2. Neural Message-Passing
- 13.2.1 Convolutional filters
- 13.2.2 Graph convolutional networks
- 13.2.3 Aggregation operators
- 13.2.4 Update operators
- 13.2.5 Node classification
- 13.2.6 Edge classification
- 13.2.7 Graph classification
- 13.3. General Graph Networks
- 13.3.1 Graph attention networks
- 13.3.2 Edge embeddings
- 13.3.3 Graph embeddings
- 13.3.4 Over-smoothing
- 13.3.5 Regularization
- 13.3.6 Geometric deep learning
- Exercises
- 14 Sampling
- 14.1. Basic Sampling Algorithms
- 14.1.1 Expectations
- 14.1.2 Standard distributions
- 14.1.3 Rejection sampling
- 14.1.4 Adaptive rejection sampling
- 14.1.5 Importance sampling
- 14.1.6 Sampling-importance-resampling
- 14.2. Markov Chain Monte Carlo
- 14.2.1 The Metropolis algorithm
- 14.2.2 Markov chains
- 14.2.3 The Metropolis-Hastings algorithm
- 14.2.4 Gibbs sampling
- 14.2.5 Ancestral sampling
- 14.3. Langevin Sampling
- 14.3.1 Energy-based models
- 14.3.2 Maximizing the likelihood
- 14.3.3 Langevin dynamics
- Exercises
- 15 Discrete Latent Variables
- 15.1. K-means Clustering
- 15.1.1 Image segmentation
- 15.2. Mixtures of Gaussians
- 15.2.1 Likelihood function
- 15.2.2 Maximum likelihood