
Deep Learning: Foundations and Concepts

Detailed Bibliography
Author: Bishop, Christopher M.
Other Authors: Bishop, Hugh
Material Type: e-Book
Language: English
Publication Information: Cham: Springer International Publishing AG, 2023.
Edition: 1st ed.
Subjects:
Online Access: Full-text access
Contents:
  • Intro
  • Preface
  • Goals of the book
  • Responsible use of technology
  • Structure of the book
  • References
  • Exercises
  • Mathematical notation
  • Acknowledgements
  • Contents
  • 1 The Deep Learning Revolution
  • 1.1. The Impact of Deep Learning
  • 1.1.1 Medical diagnosis
  • 1.1.2 Protein structure
  • 1.1.3 Image synthesis
  • 1.1.4 Large language models
  • 1.2. A Tutorial Example
  • 1.2.1 Synthetic data
  • 1.2.2 Linear models
  • 1.2.3 Error function
  • 1.2.4 Model complexity
  • 1.2.5 Regularization
  • 1.2.6 Model selection
  • 1.3. A Brief History of Machine Learning
  • 1.3.1 Single-layer networks
  • 1.3.2 Backpropagation
  • 1.3.3 Deep networks
  • 2 Probabilities
  • 2.1. The Rules of Probability
  • 2.1.1 A medical screening example
  • 2.1.2 The sum and product rules
  • 2.1.3 Bayes' theorem
  • 2.1.4 Medical screening revisited
  • 2.1.5 Prior and posterior probabilities
  • 2.1.6 Independent variables
  • 2.2. Probability Densities
  • 2.2.1 Example distributions
  • 2.2.2 Expectations and covariances
  • 2.3. The Gaussian Distribution
  • 2.3.1 Mean and variance
  • 2.3.2 Likelihood function
  • 2.3.3 Bias of maximum likelihood
  • 2.3.4 Linear regression
  • 2.4. Transformation of Densities
  • 2.4.1 Multivariate distributions
  • 2.5. Information Theory
  • 2.5.1 Entropy
  • 2.5.2 Physics perspective
  • 2.5.3 Differential entropy
  • 2.5.4 Maximum entropy
  • 2.5.5 Kullback-Leibler divergence
  • 2.5.6 Conditional entropy
  • 2.5.7 Mutual information
  • 2.6. Bayesian Probabilities
  • 2.6.1 Model parameters
  • 2.6.2 Regularization
  • 2.6.3 Bayesian machine learning
  • Exercises
  • 3 Standard Distributions
  • 3.1. Discrete Variables
  • 3.1.1 Bernoulli distribution
  • 3.1.2 Binomial distribution
  • 3.1.3 Multinomial distribution
  • 3.2. The Multivariate Gaussian
  • 3.2.1 Geometry of the Gaussian
  • 3.2.2 Moments
  • 3.2.3 Limitations
  • 3.2.4 Conditional distribution
  • 3.2.5 Marginal distribution
  • 3.2.6 Bayes' theorem
  • 3.2.7 Maximum likelihood
  • 3.2.8 Sequential estimation
  • 3.2.9 Mixtures of Gaussians
  • 3.3. Periodic Variables
  • 3.3.1 Von Mises distribution
  • 3.4. The Exponential Family
  • 3.4.1 Sufficient statistics
  • 3.5. Nonparametric Methods
  • 3.5.1 Histograms
  • 3.5.2 Kernel densities
  • 3.5.3 Nearest-neighbours
  • Exercises
  • 4 Single-layer Networks: Regression
  • 4.1. Linear Regression
  • 4.1.1 Basis functions
  • 4.1.2 Likelihood function
  • 4.1.3 Maximum likelihood
  • 4.1.4 Geometry of least squares
  • 4.1.5 Sequential learning
  • 4.1.6 Regularized least squares
  • 4.1.7 Multiple outputs
  • 4.2. Decision Theory
  • 4.3. The Bias-Variance Trade-off
  • Exercises
  • 5 Single-layer Networks: Classification
  • 5.1. Discriminant Functions
  • 5.1.1 Two classes
  • 5.1.2 Multiple classes
  • 5.1.3 1-of-K coding
  • 5.1.4 Least squares for classification
  • 5.2. Decision Theory
  • 5.2.1 Misclassification rate
  • 5.2.2 Expected loss
  • 5.2.3 The reject option
  • 5.2.4 Inference and decision
  • 5.2.5 Classifier accuracy
  • 5.2.6 ROC curve
  • 5.3. Generative Classifiers
  • 5.3.1 Continuous inputs
  • 5.3.2 Maximum likelihood solution
  • 5.3.3 Discrete features
  • 5.3.4 Exponential family
  • 5.4. Discriminative Classifiers
  • 5.4.1 Activation functions
  • 5.4.2 Fixed basis functions
  • 5.4.3 Logistic regression
  • 5.4.4 Multi-class logistic regression
  • 5.4.5 Probit regression
  • 5.4.6 Canonical link functions
  • Exercises
  • 6 Deep Neural Networks
  • 6.1. Limitations of Fixed Basis Functions
  • 6.1.1 The curse of dimensionality
  • 6.1.2 High-dimensional spaces
  • 6.1.3 Data manifolds
  • 6.1.4 Data-dependent basis functions
  • 6.2. Multilayer Networks
  • 6.2.1 Parameter matrices
  • 6.2.2 Universal approximation
  • 6.2.3 Hidden unit activation functions
  • 6.2.4 Weight-space symmetries
  • 6.3. Deep Networks
  • 6.3.1 Hierarchical representations
  • 6.3.2 Distributed representations
  • 6.3.3 Representation learning
  • 6.3.4 Transfer learning
  • 6.3.5 Contrastive learning
  • 6.3.6 General network architectures
  • 6.3.7 Tensors
  • 6.4. Error Functions
  • 6.4.1 Regression
  • 6.4.2 Binary classification
  • 6.4.3 Multiclass classification
  • 6.5. Mixture Density Networks
  • 6.5.1 Robot kinematics example
  • 6.5.2 Conditional mixture distributions
  • 6.5.3 Gradient optimization
  • 6.5.4 Predictive distribution
  • Exercises
  • 7 Gradient Descent
  • 7.1. Error Surfaces
  • 7.1.1 Local quadratic approximation
  • 7.2. Gradient Descent Optimization
  • 7.2.1 Use of gradient information
  • 7.2.2 Batch gradient descent
  • 7.2.3 Stochastic gradient descent
  • 7.2.4 Mini-batches
  • 7.2.5 Parameter initialization
  • 7.3. Convergence
  • 7.3.1 Momentum
  • 7.3.2 Learning rate schedule
  • 7.3.3 RMSProp and Adam
  • 7.4. Normalization
  • 7.4.1 Data normalization
  • 7.4.2 Batch normalization
  • 7.4.3 Layer normalization
  • Exercises
  • 8 Backpropagation
  • 8.1. Evaluation of Gradients
  • 8.1.1 Single-layer networks
  • 8.1.2 General feed-forward networks
  • 8.1.3 A simple example
  • 8.1.4 Numerical differentiation
  • 8.1.5 The Jacobian matrix
  • 8.1.6 The Hessian matrix
  • 8.2. Automatic Differentiation
  • 8.2.1 Forward-mode automatic differentiation
  • 8.2.2 Reverse-mode automatic differentiation
  • Exercises
  • 9 Regularization
  • 9.1. Inductive Bias
  • 9.1.1 Inverse problems
  • 9.1.2 No free lunch theorem
  • 9.1.3 Symmetry and invariance
  • 9.1.4 Equivariance
  • 9.2. Weight Decay
  • 9.2.1 Consistent regularizers
  • 9.2.2 Generalized weight decay
  • 9.3. Learning Curves
  • 9.3.1 Early stopping
  • 9.3.2 Double descent
  • 9.4. Parameter Sharing
  • 9.4.1 Soft weight sharing
  • 9.5. Residual Connections
  • 9.6. Model Averaging
  • 9.6.1 Dropout
  • Exercises
  • 10 Convolutional Networks
  • 10.1. Computer Vision
  • 10.1.1 Image data
  • 10.2. Convolutional Filters
  • 10.2.1 Feature detectors
  • 10.2.2 Translation equivariance
  • 10.2.3 Padding
  • 10.2.4 Strided convolutions
  • 10.2.5 Multi-dimensional convolutions
  • 10.2.6 Pooling
  • 10.2.7 Multilayer convolutions
  • 10.2.8 Example network architectures
  • 10.3. Visualizing Trained CNNs
  • 10.3.1 Visual cortex
  • 10.3.2 Visualizing trained filters
  • 10.3.3 Saliency maps
  • 10.3.4 Adversarial attacks
  • 10.3.5 Synthetic images
  • 10.4. Object Detection
  • 10.4.1 Bounding boxes
  • 10.4.2 Intersection-over-union
  • 10.4.3 Sliding windows
  • 10.4.4 Detection across scales
  • 10.4.5 Non-max suppression
  • 10.4.6 Fast region CNNs
  • 10.5. Image Segmentation
  • 10.5.1 Convolutional segmentation
  • 10.5.2 Up-sampling
  • 10.5.3 Fully convolutional networks
  • 10.5.4 The U-net architecture
  • 10.6. Style Transfer
  • Exercises
  • 11 Structured Distributions
  • 11.1. Graphical Models
  • 11.1.1 Directed graphs
  • 11.1.2 Factorization
  • 11.1.3 Discrete variables
  • 11.1.4 Gaussian variables
  • 11.1.5 Binary classifier
  • 11.1.6 Parameters and observations
  • 11.1.7 Bayes' theorem
  • 11.2. Conditional Independence
  • 11.2.1 Three example graphs
  • 11.2.2 Explaining away
  • 11.2.3 D-separation
  • 11.2.4 Naive Bayes
  • 11.2.5 Generative models
  • 11.2.6 Markov blanket
  • 11.2.7 Graphs as filters
  • 11.3. Sequence Models
  • 11.3.1 Hidden variables
  • Exercises
  • 12 Transformers
  • 12.1. Attention
  • 12.1.1 Transformer processing
  • 12.1.2 Attention coefficients
  • 12.1.3 Self-attention
  • 12.1.4 Network parameters
  • 12.1.5 Scaled self-attention
  • 12.1.6 Multi-head attention
  • 12.1.7 Transformer layers
  • 12.1.8 Computational complexity
  • 12.1.9 Positional encoding
  • 12.2. Natural Language
  • 12.2.1 Word embedding
  • 12.2.2 Tokenization
  • 12.2.3 Bag of words
  • 12.2.4 Autoregressive models
  • 12.2.5 Recurrent neural networks
  • 12.2.6 Backpropagation through time
  • 12.3. Transformer Language Models
  • 12.3.1 Decoder transformers
  • 12.3.2 Sampling strategies
  • 12.3.3 Encoder transformers
  • 12.3.4 Sequence-to-sequence transformers
  • 12.3.5 Large language models
  • 12.4. Multimodal Transformers
  • 12.4.1 Vision transformers
  • 12.4.2 Generative image transformers
  • 12.4.3 Audio data
  • 12.4.4 Text-to-speech
  • 12.4.5 Vision and language transformers
  • Exercises
  • 13 Graph Neural Networks
  • 13.1. Machine Learning on Graphs
  • 13.1.1 Graph properties
  • 13.1.2 Adjacency matrix
  • 13.1.3 Permutation equivariance
  • 13.2. Neural Message-Passing
  • 13.2.1 Convolutional filters
  • 13.2.2 Graph convolutional networks
  • 13.2.3 Aggregation operators
  • 13.2.4 Update operators
  • 13.2.5 Node classification
  • 13.2.6 Edge classification
  • 13.2.7 Graph classification
  • 13.3. General Graph Networks
  • 13.3.1 Graph attention networks
  • 13.3.2 Edge embeddings
  • 13.3.3 Graph embeddings
  • 13.3.4 Over-smoothing
  • 13.3.5 Regularization
  • 13.3.6 Geometric deep learning
  • Exercises
  • 14 Sampling
  • 14.1. Basic Sampling Algorithms
  • 14.1.1 Expectations
  • 14.1.2 Standard distributions
  • 14.1.3 Rejection sampling
  • 14.1.4 Adaptive rejection sampling
  • 14.1.5 Importance sampling
  • 14.1.6 Sampling-importance-resampling
  • 14.2. Markov Chain Monte Carlo
  • 14.2.1 The Metropolis algorithm
  • 14.2.2 Markov chains
  • 14.2.3 The Metropolis-Hastings algorithm
  • 14.2.4 Gibbs sampling
  • 14.2.5 Ancestral sampling
  • 14.3. Langevin Sampling
  • 14.3.1 Energy-based models
  • 14.3.2 Maximizing the likelihood
  • 14.3.3 Langevin dynamics
  • Exercises
  • 15 Discrete Latent Variables
  • 15.1. K-means Clustering
  • 15.1.1 Image segmentation
  • 15.2. Mixtures of Gaussians
  • 15.2.1 Likelihood function
  • 15.2.2 Maximum likelihood