Designing Machine Learning Systems.

Bibliographic Details
Author: Huyen, Chip
Material Type: e-Book
Language: English
Imprint/Publication Info: Sebastopol : O'Reilly Media, Incorporated, 2022.
Edition: 1st ed.
Subjects:
Online Access: Full-text access
Table of Contents:
  • Cover
  • Copyright
  • Table of Contents
  • Preface
  • Who This Book Is For
  • What This Book Is Not
  • Navigating This Book
  • GitHub Repository and Community
  • Conventions Used in This Book
  • Using Code Examples
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
  • Chapter 1. Overview of Machine Learning Systems
  • When to Use Machine Learning
  • Machine Learning Use Cases
  • Understanding Machine Learning Systems
  • Machine Learning in Research Versus in Production
  • Machine Learning Systems Versus Traditional Software
  • Summary
  • Chapter 2. Introduction to Machine Learning Systems Design
  • Business and ML Objectives
  • Requirements for ML Systems
  • Reliability
  • Scalability
  • Maintainability
  • Adaptability
  • Iterative Process
  • Framing ML Problems
  • Types of ML Tasks
  • Objective Functions
  • Mind Versus Data
  • Summary
  • Chapter 3. Data Engineering Fundamentals
  • Data Sources
  • Data Formats
  • JSON
  • Row-Major Versus Column-Major Format
  • Text Versus Binary Format
  • Data Models
  • Relational Model
  • NoSQL
  • Structured Versus Unstructured Data
  • Data Storage Engines and Processing
  • Transactional and Analytical Processing
  • ETL: Extract, Transform, and Load
  • Modes of Dataflow
  • Data Passing Through Databases
  • Data Passing Through Services
  • Data Passing Through Real-Time Transport
  • Batch Processing Versus Stream Processing
  • Summary
  • Chapter 4. Training Data
  • Sampling
  • Nonprobability Sampling
  • Simple Random Sampling
  • Stratified Sampling
  • Weighted Sampling
  • Reservoir Sampling
  • Importance Sampling
  • Labeling
  • Hand Labels
  • Natural Labels
  • Handling the Lack of Labels
  • Class Imbalance
  • Challenges of Class Imbalance
  • Handling Class Imbalance
  • Data Augmentation
  • Simple Label-Preserving Transformations
  • Perturbation
  • Data Synthesis
  • Summary
  • Chapter 5. Feature Engineering
  • Learned Features Versus Engineered Features
  • Common Feature Engineering Operations
  • Handling Missing Values
  • Scaling
  • Discretization
  • Encoding Categorical Features
  • Feature Crossing
  • Discrete and Continuous Positional Embeddings
  • Data Leakage
  • Common Causes for Data Leakage
  • Detecting Data Leakage
  • Engineering Good Features
  • Feature Importance
  • Feature Generalization
  • Summary
  • Chapter 6. Model Development and Offline Evaluation
  • Model Development and Training
  • Evaluating ML Models
  • Ensembles
  • Experiment Tracking and Versioning
  • Distributed Training
  • AutoML
  • Model Offline Evaluation
  • Baselines
  • Evaluation Methods
  • Summary
  • Chapter 7. Model Deployment and Prediction Service
  • Machine Learning Deployment Myths
  • Myth 1: You Only Deploy One or Two ML Models at a Time
  • Myth 2: If We Don't Do Anything, Model Performance Remains the Same
  • Myth 3: You Won't Need to Update Your Models as Much
  • Myth 4: Most ML Engineers Don't Need to Worry About Scale
  • Batch Prediction Versus Online Prediction
  • From Batch Prediction to Online Prediction
  • Unifying Batch Pipeline and Streaming Pipeline
  • Model Compression
  • Low-Rank Factorization
  • Knowledge Distillation
  • Pruning
  • Quantization
  • ML on the Cloud and on the Edge
  • Compiling and Optimizing Models for Edge Devices
  • ML in Browsers
  • Summary
  • Chapter 8. Data Distribution Shifts and Monitoring
  • Causes of ML System Failures
  • Software System Failures
  • ML-Specific Failures
  • Data Distribution Shifts
  • Types of Data Distribution Shifts
  • General Data Distribution Shifts
  • Detecting Data Distribution Shifts
  • Addressing Data Distribution Shifts
  • Monitoring and Observability
  • ML-Specific Metrics
  • Monitoring Toolbox
  • Observability
  • Summary
  • Chapter 9. Continual Learning and Test in Production
  • Continual Learning
  • Stateless Retraining Versus Stateful Training
  • Why Continual Learning?
  • Continual Learning Challenges
  • Four Stages of Continual Learning
  • How Often to Update Your Models
  • Test in Production
  • Shadow Deployment
  • A/B Testing
  • Canary Release
  • Interleaving Experiments
  • Bandits
  • Summary
  • Chapter 10. Infrastructure and Tooling for MLOps
  • Storage and Compute
  • Public Cloud Versus Private Data Centers
  • Development Environment
  • Dev Environment Setup
  • Standardizing Dev Environments
  • From Dev to Prod: Containers
  • Resource Management
  • Cron, Schedulers, and Orchestrators
  • Data Science Workflow Management
  • ML Platform
  • Model Deployment
  • Model Store
  • Feature Store
  • Build Versus Buy
  • Summary
  • Chapter 11. The Human Side of Machine Learning
  • User Experience
  • Ensuring User Experience Consistency
  • Combatting "Mostly Correct" Predictions
  • Smooth Failing
  • Team Structure
  • Cross-functional Teams Collaboration
  • End-to-End Data Scientists
  • Responsible AI
  • Irresponsible AI: Case Studies
  • A Framework for Responsible AI
  • Summary
  • Epilogue
  • Index
  • About the Author
  • Colophon