Menu Expand
Computer Vision: A Modern Approach

Computer Vision: A Modern Approach

David A. Forsyth | Jean Ponce

(2015)

Additional Information

Book Details

Abstract

Appropriate for upper-division undergraduate- and graduate-level courses in computer vision found in departments of Computer Science, Computer Engineering and Electrical Engineering.


This textbook provides the most complete treatment of modern computer vision methods by two of the leading authorities in the field. This accessible presentation gives both a general view of the entire computer vision enterprise and also offers sufficient detail for students to be able to build useful applications. Students will learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods.



Table of Contents

Section Title Page Action Price
Cover Cover
Title Page Title
Contents\r 5
I IMAGE FORMATION\r 31
1 Geometric Camera Models 33
1.1 IMAGE FORMATION 34
1.1.1 Pinhole Perspective 34
1.1.2 Weak Perspective 36
1.1.3 Cameras with Lenses 38
1.1.4 The Human Eye 42
1.2 INTRINSIC AND EXTRINSIC PARAMETERS 44
1.2.1 Rigid Transformations and Homogeneous Coordinates 44
1.2.2 Intrinsic Parameters 46
1.2.3 Extrinsic Parameters 48
1.2.4 Perspective Projection Matrices 49
1.2.5 Weak-Perspective Projection Matrices 50
1.3 GEOMETRIC CAMERA CALIBRATION 52
1.3.1 ALinear Approach to Camera Calibration 53
1.3.2 ANonlinear Approach to Camera Calibration 57
1.4 NOTES 59
2 \rLight and Shading 62
2.1 MODELLING PIXEL BRIGHTNESS 62
2.1.1 Reflection at Surfaces 63
2.1.2 Sources and Their Effects 64
2.1.3 The Lambertian+Specular Model 66
2.1.4 Area Sources 66
2.2 INFERENCE FROM SHADING 67
2.2.1 Radiometric Calibration and High Dynamic Range Images 68
2.2.2 The Shape of Specularities 70
2.2.3 Inferring Lightness and Illumination 73
2.2.4 Photometric Stereo: Shape from Multiple Shaded Images 76
2.3 MODELLING INTERREFLECTION 82
2.3.1 The Illumination at a Patch Due to an Area Source 82
2.3.2 Radiosity and Exitance 84
2.3.3 An Interreflection Model 85
2.3.4 Qualitative Properties of Interreflections 86
2.4 SHAPE FROM ONE SHADED IMAGE 89
2.5 NOTES 91
3 \rColor 98
3.1 HUMAN COLOR PERCEPTION 98
3.1.1 Color Matching 98
3.1.2 Color Receptors 101
3.2 THE PHYSICS OF COLOR 103
3.2.1 The Color of Light Sources 103
3.2.2 The Color of Surfaces 106
3.3 REPRESENTING COLOR 107
3.3.1 Linear Color Spaces 107
3.3.2 Non-linear Color Spaces 113
3.4 A \rMODEL OF IMAGE COLOR 116
3.4.1 The Diffuse Term 118
3.4.2 The Specular Term 120
3.5 INFERENCE FROM COLOR 120
3.5.1 Finding Specularities Using Color 120
3.5.2 Shadow Removal Using Color 122
3.5.3 Color Constancy: Surface Color from Image Color 125
3.6 NOTES 129
II EARLY VISION: JUST ONE IMAGE\r 135
4 \rLinear Filters 137
4.1 LINEAR FILTERS AND CONVOLUTION 137
4.1.1 Convolution 137
4.2 SHIFT INVARIANT LINEAR SYSTEMS 142
4.2.1 Discrete Convolution 143
4.2.2 Continuous Convolution 145
4.2.3 Edge Effects in Discrete Convolutions 148
4.3 SPATIAL FREQUENCY AND FOURIER TRANSFORMS 148
4.3.1 Fourier Transforms 149
4.4 SAMPLING AND ALIASING 151
4.4.1 Sampling 152
4.4.2 Aliasing 155
4.4.3 Smoothing and Resampling 156
4.5 FILTERS AS TEMPLATES 161
4.5.1 Convolution as a Dot Product 161
4.5.2 Changing Basis 162
4.6 TECHNIQUE: NORMALIZED CORRELATION AND FINDING PATTERNS 162
4.6.1 Controlling the Television by Finding Hands by Normalized Correlation\r 163
4.7 TECHNIQUE: SCALE AND IMAGE PYRAMIDS 164
4.7.1 The Gaussian Pyramid 165
4.7.2 Applications of Scaled Representations 166
4.8 NOTES 167
5 \rLocal Image Features 171
5.1 COMPUTING THE IMAGE GRADIENT 171
5.1.1 Derivative of Gaussian Filters 172
5.2 REPRESENTING THE IMAGE GRADIENT 174
5.2.1 Gradient-Based Edge Detectors 175
5.2.2 Orientations 177
5.3 FINDING CORNERS AND BUILDING NEIGHBORHOODS 178
5.3.1 Finding Corners 179
5.3.2 Using Scale and Orientation to Build a Neighborhood 181
5.4 DESCRIBING NEIGHBORHOODS WITH SIFT AND HOG FEATURES 185
5.4.1 SIFT Features 187
5.4.2 HOG Features 189
5.5 COMPUTING LOCAL FEATURES IN PRACTICE 190
5.6 NOTES 190
6 \rTexture 194
6.1 LOCAL TEXTURE REPRESENTATIONS USING FILTERS 196
6.1.1 Spots and Bars 197
6.1.2 From Filter Outputs to Texture Representation 198
6.1.3 Local Texture Representations in Practice 200
6.2 POOLED TEXTURE REPRESENTATIONS BY DISCOVERING TEXTONS 201
6.2.1 Vector Quantization and Textons 202
6.2.2 K-means Clustering for Vector Quantization 202
6.3 SYNTHESIZING TEXTURES AND FILLING HOLES IN IMAGES 206
6.3.1 Synthesis by Sampling Local Models 206
6.3.2 Filling in Holes in Images 209
6.4 IMAGE DENOISING 212
6.4.1 Non-local Means 213
6.4.2 Block Matching 3D (BM3D) 213
6.4.3 Learned Sparse Coding 214
6.4.4 Results 216
6.5 SHAPE FROM TEXTURE 217
6.5.1 Shape from Texture for Planes 217
6.5.2 Shape from Texture for Curved Surfaces 220
6.6 NOTES 221
III \rEARLY VISION: MULTIPLEIMAGES 225
7 \rStereopsis 227
7.1 BINOCULAR CAMERA GEOMETRY AND THE EPIPOLAR CONSTRAINT 228
7.1.1 Epipolar Geometry 228
7.1.2 The Essential Matrix 230
7.1.3 The Fundamental Matrix 231
7.2 BINOCULAR RECONSTRUCTION 231
7.2.1 Image Rectification 232
7.3 HUMAN STEREOPSIS 233
7.4 LOCAL METHODS FOR BINOCULAR FUSION 235
7.4.1 Correlation 235
7.4.2 Multi-Scale Edge Matching 237
7.5 GLOBAL METHODS FOR BINOCULAR FUSION 240
7.5.1 Ordering Constraints and Dynamic Programming 240
7.5.2 Smoothness Constraints and Combinatorial Optimization over Graphs 241
7.6 USING MORE CAMERAS 244
7.7 APPLICATION: ROBOT NAVIGATION 245
7.8 NOTES 246
8 \rStructure from Motion 251
8.1 INTERNALLY CALIBRATED PERSPECTIVE CAMERAS 251
8.1.1 Natural Ambiguity of the Problem 253
8.1.2 Euclidean Structure and Motion from Two Images 254
8.1.3 Euclidean Structure and Motion from Multiple Images 258
8.2 UNCALIBRATED WEAK-PERSPECTIVE CAMERAS 260
8.2.1 Natural Ambiguity of the Problem 261
8.2.2 Affine Structure and Motion from Two Images 263
8.2.3 Affine Structure and Motion from Multiple Images 267
8.2.4 From Affine to Euclidean Shape 268
8.3 UNCALIBRATED PERSPECTIVE CAMERAS 270
8.3.1 Natural Ambiguity of the Problem 271
8.3.2 Projective Structure and Motion from Two Images 272
8.3.3 Projective Structure and Motion from Multiple Images 274
8.3.4 From Projective to Euclidean Shape 276
8.4 NOTES 278
IV \rMID-LEVEL VISION 283
9 \rSegmentation by Clustering 285
9.1 HUMAN VISION: GROUPING AND GESTALT 286
9.2 IMPORTANT APPLICATIONS 291
9.2.1 Background Subtraction 291
9.2.2 Shot Boundary Detection 294
9.2.3 Interactive Segmentation 295
9.2.4 Forming Image Regions 296
9.3 IMAGE SEGMENTATION BY CLUSTERING PIXELS 298
9.3.1 Basic Clustering Methods 299
9.3.2 The Watershed Algorithm 301
9.3.3 Segmentation Using K-means 302
9.3.4 Mean Shift: Finding Local Modes in Data 303
9.3.5 Clustering and Segmentation with Mean Shift 305
9.4 SEGMENTATION, CLUSTERING, AND GRAPHS 307
9.4.1 Terminology and Facts for Graphs 307
9.4.2 Agglomerative Clustering with a Graph 309
9.4.3 Divisive Clustering with a Graph 311
9.4.4 Normalized Cuts 314
9.5 IMAGE SEGMENTATION IN PRACTICE 315
9.5.1 Evaluating Segmenters 316
9.6 NOTES 317
10 \rGrouping and Model Fitting 320
10.1 THE HOUGH TRANSFORM 320
10.1.1 Fitting Lines with the Hough Transform 320
10.1.2 Using the Hough Transform 322
10.2 FITTING LINES AND PLANES 323
10.2.1 Fitting a Single Line 324
10.2.2 Fitting Planes 325
10.2.3 Fitting Multiple Lines 326
10.3 FITTING CURVED STRUCTURES 327
10.4 Robustness 329
10.4.1 M-Estimators 330
10.4.2 RANSAC: Searching for Good Points 332
10.5 FITTING USING PROBABILISTIC MODELS 336
10.5.1 Missing Data Problems 337
10.5.2 Mixture Models and Hidden Variables 339
10.5.3 The EM Algorithm for Mixture Models 340
10.5.4 Difficulties with the EM Algorithm 342
10.6 MOTION SEGMENTATION BY PARAMETER ESTIMATION 343
10.6.1 Optical Flow and Motion 345
10.6.2 Flow Models 346
10.6.3 Motion Segmentation with Layers 347
10.7 MODEL SELECTION: WHICH MODEL IS THE BEST FIT? 349
10.7.1 Model Selection Using Cross-Validation 352
10.8 NOTES 352
11 \rTracking 356
11.1 SIMPLE TRACKING STRATEGIES 357
11.1.1 Tracking by Detection 357
11.1.2 Tracking Translations by Matching 360
11.1.3 Using Affine Transformations to Confirm a Match 362
11.2 TRACKING USING MATCHING 364
11.2.1 Matching Summary Representations 365
11.2.2 Tracking Using Flow 367
11.3 TRACKING LINEAR DYNAMICAL MODELS WITH KALMAN FILTERS 369
11.3.1 Linear Measurements and Linear Dynamics 370
11.3.2 The Kalman Filter 374
11.3.3 Forward-backward Smoothing 375
11.4 DATA ASSOCIATION 379
11.4.1 Linking Kalman Filters with Detection Methods 379
11.4.2 Key Methods of Data Association 380
11.5 PARTICLE FILTERING 380
11.5.1 Sampled Representations of Probability Distributions 381
11.5.2 The Simplest Particle Filter 385
11.5.3 The Tracking Algorithm 386
11.5.4 A \rWorkable Particle Filter 388
11.5.5 \rPractical Issues in Building Particle Filters 390
11.6 NOTES 392
V\r HIGH-LEVEL VISION 395
12 \rRegistration 397
12.1 REGISTERING RIGID OBJECTS 398
12.1.1 Iterated Closest Points 398
12.1.2 Searching for Transformations via Correspondences 399
12.1.3 Application: Building Image Mosaics 400
12.2 MODEL-BASED VISION: REGISTERING RIGID OBJECTS WITH PROJECTION 405
12.2.1 Verification: Comparing Transformed and Rendered Source to Target\r 407
12.3 REGISTERING DEFORMABLE OBJECTS 408
12.3.1 Deforming Texture with Active Appearance Models 408
12.3.2 Active Appearance Models in Practice 411
12.3.3 Application: Registration in Medical Imaging Systems 413
12.4 NOTES 418
13 \rSmooth Surfaces and Their Outlines 421
13.1 ELEMENTS OF DIFFERENTIAL GEOMETRY 423
13.1.1 Curves 423
13.1.2 Surfaces 427
13.2 CONTOUR GEOMETRY 432
13.2.1 The Occluding Contour and the Image Contour 432
13.2.2 The Cusps and Inflections of the Image Contour 433
13.2.3 Koenderink’s Theorem 434
13.3 VISUAL EVENTS: MORE DIFFERENTIAL GEOMETRY 437
13.3.1 The Geometry of the Gauss Map 437
13.3.2 Asymptotic Curves 439
13.3.3 The Asymptotic Spherical Map 440
13.3.4 Local Visual Events 442
13.3.5 The Bitangent Ray Manifold 443
13.3.6 Multilocal Visual Events 444
13.3.7 The Aspect Graph 446
13.4 NOTES 447
14 \rRange Data 452
14.1 ACTIVE RANGE SENSORS 452
14.2 RANGE DATA SEGMENTATION 454
14.2.1 Elements of Analytical Differential Geometry 454
14.2.2 Finding Step and Roof Edges in Range Images 456
14.2.3 Segmenting Range Images into Planar Regions 461
14.3 RANGE IMAGE REGISTRATION AND MODEL ACQUISITION 462
14.3.1 Quaternions 463
14.3.2 Registering Range Images 464
14.3.3 Fusing Multiple Range Images 466
14.4 OBJECT RECOGNITION 468
14.4.1 Matching Using Interpretation Trees \r 468
14.4.2 Matching Free-Form Surfaces Using Spin Images 471
14.5 KINECT 476
14.5.1 Features 477
14.5.2 Technique: Decision Trees and Random Forests 478
14.5.3 Labeling Pixels 480
14.5.4 Computing Joint Positions 483
14.6 NOTES 483
15 \rLearning to Classify 487
15.1 CLASSIFICATION, ERROR, AND LOSS 487
15.1.1 Using Loss to Determine Decisions 487
15.1.2 Training Error, Test Error, and Overfitting 489
15.1.3 Regularization 490
15.1.4 Error Rate and Cross-Validation 493
15.1.5 Receiver Operating Curves 495
15.2 MAJOR CLASSIFICATION STRATEGIES 497
15.2.1 Example: Mahalanobis Distance\r 497
15.2.2 Example: Class-Conditional Histograms and Naive Bayes 498
15.2.3 Example: Classification Using Nearest Neighbors \r 499
15.2.4 Example: The Linear Support Vector Machine 500
15.2.5 Example: Kernel Machines 503
15.2.6 Example: Boosting and Adaboost 505
15.3 PRACTICAL METHODS FOR BUILDING CLASSIFIERS 505
15.3.1 Manipulating Training Data to Improve Performance 507
15.3.2 Building Multi-Class Classifiers Out of Binary Classifiers 509
15.3.3 Solving for SVMS and Kernel Machines 510
15.4 NOTES 511
16 classifying Images\r 512
16.1 BUILDING GOOD IMAGE FEATURES 512
16.1.1 Example Applications 512
16.1.2 Encoding Layout with GIST Features 515
16.1.3 Summarizing Images with Visual Words 517
16.1.4 The Spatial Pyramid Kernel 519
16.1.5 Dimension Reduction with Principal Components 523
16.1.6 Dimension Reduction with Canonical Variates 524
16.1.7 Example Application: Identifying Explicit Images 528
16.1.8 Example Application: Classifying Materials 532
16.1.9 Example Application: Classifying Scenes 532
16.2 CLASSIFYING IMAGES OF SINGLE OBJECTS 534
16.2.1 Image Classification Strategies 535
16.2.2 Evaluating Image Classification Systems 535
16.2.3 Fixed Sets of Classes 538
16.2.4 Large Numbers of Classes 539
16.2.5 Flowers, Leaves, and Birds: Some Specialized Problems 541
16.3 IMAGE CLASSIFICATION IN PRACTICE 542
16.3.1 Codes for Image Features 543
16.3.2 Image Classification Datasets 543
16.3.3 Dataset Bias 545
16.3.4 Crowdsourcing Dataset Collection 545
16.4 NOTES 547
17 \rDetecting Objects in Images 549
17.1 THE SLIDING WINDOW METHOD 549
17.1.1 Face Detection 550
17.1.2 Detecting Humans 555
17.1.3 Detecting Boundaries 557
17.2 DETECTING DEFORMABLE OBJECTS 560
17.3 THE STATE OF THE ART OF OBJECT DETECTION 565
17.3.1 Datasets and Resources 568
17.4 NOTES 569
18 \rTopics in Object Recognition 570
18.1 WHAT SHOULD OBJECT RECOGNITION DO? 570
18.1.1 What Should an Object Recognition System Do? 570
18.1.2 Current Strategies for Object Recognition 572
18.1.3 What Is Categorization? 572
18.1.4 Selection: What Should Be Described? 574
18.2 FEATURE QUESTIONS 574
18.2.1 Improving Current Image Features 574
18.2.2 Other Kinds of Image Feature 576
18.3 Geometric Questions 577
18.4 SEMANTIC QUESTIONS 579
18.4.1 Attributes and the Unfamiliar 580
18.4.2 Parts, Poselets and Consistency 581
18.4.3 Chunks of Meaning\r 584
VI APPLICATIONS AND TOPICS\r 587
19 \rImage-Based Modeling andRendering 589
19.1 VISUAL HULLS 589
19.1.1 Main Elements of the Visual Hull Model 591
19.1.2 Tracing Intersection Curves 593
19.1.3 Clipping Intersection Curves 596
19.1.4 Triangulating Cone Strips 597
19.1.5 Results 598
19.1.6 Going Further: Carved Visual Hulls 602
19.2 PATCH-BASED MULTI-VIEW STEREOPSIS 603
19.2.1 Main Elements of the PMVS Model 605
19.2.2 Initial Feature Matching 608
19.2.3 Expansion 609
19.2.4 Filtering 610
19.2.5 Results 611
19.3 THE LIGHT FIELD 614
19.4 NOTES 617
20 \rLooking at People 620
20.1 HMM’S, DYNAMIC PROGRAMMING, AND TREE-STRUCTURED MODELS 620
20.1.1 Hidden Markov Models 620
20.1.2 Inference for an HMM 622
20.1.3 Fitting an HMM with EM 627
20.1.4 Tree-Structured Energy Models 630
20.2 PARSING PEOPLE IN IMAGES 632
20.2.1 Parsing with Pictorial Structure Models 632
20.2.2 Estimating the Appearance of Clothing 634
20.3 TRACKING PEOPLE 636
20.3.1 Why Human Tracking Is Hard 636
20.3.2 Kinematic Tracking by Appearance 638
20.3.3 Kinematic Human Tracking Using Templates 639
20.4 3D FROM 2D: LIFTING 641
20.4.1 Reconstruction in an Orthographic View 641
20.4.2 Exploiting Appearance for Unambiguous Reconstructions 643
20.4.3 Exploiting Motion for Unambiguous Reconstructions \r 645
20.5 ACTIVITY RECOGNITION 647
20.5.1 Background: Human Motion Data 647
20.5.2 Body Configuration and Activity Recognition 651
20.5.3 Recognizing Human Activities with Appearance Features 652
20.5.4 Recognizing Human Activities with Compositional Models 654
20.6 RESOURCES 654
20.7 NOTES 656
21 \rImage Search and Retrieval 657
21.1 THE APPLICATION CONTEXT 657
21.1.1 Applications 658
21.1.2 User Needs 659
21.1.3 Types of Image Query 660
21.1.4 What Users Do with Image Collections 661
21.2 BASIC TECHNOLOGIES FROM INFORMATION RETRIEVAL 662
21.2.1 Word Counts 662
21.2.2 Smoothing Word Counts 663
21.2.3 Approximate Nearest Neighbors and Hashing 664
21.2.4 Ranking Documents 668
21.3 IMAGES AS DOCUMENTS 669
21.3.1 Matching Without Quantization 670
21.3.2 Ranking Image Search Results 671
21.3.4 Laying Out Images for Browsing 674
21.4 PREDICTING ANNOTATIONS FOR PICTURES 675
21.4.1 Annotations from Nearby Words 676
21.4.2 Annotations from the Whole Image 676
21.4.3 Predicting Correlated Words with Classifiers 678
21.4.4 Names and Faces 679
21.4.5 Generating Tags with Segments 681
21.5 THE STATE OF THE ART OF WORD PREDICTION 684
21.5.1 Resources 685
21.5.2 Comparing Methods 685
21.5.3 Open Problems 686
21.6 NOTES 689
VII BACKGROUND MATERIAL\r 691
22 \rOptimization Techniques 693
22.1 LINEAR LEAST-SQUARES METHODS 693
22.1.1 Normal Equations and the Pseudoinverse 694
22.1.2 Homogeneous Systems and Eigenvalue Problems 695
22.1.3 Generalized Eigenvalues Problems 696
22.1.4 An Example: Fitting a Line to Points in a Plane 696
22.1.5 Singular Value Decomposition 697
22.2 NONLINEAR LEAST-SQUARES METHODS 699
22.2.1 Newton’s Method: Square Systems of Nonlinear Equations. 700
22.2.2 Newton’s Method for Overconstrained Systems\r 700
22.2.3 The Gauss–Newton and Levenberg–Marquardt Algorithms 701
22.3 SPARSE CODING AND DICTIONARY LEARNING 702
22.3.1 Sparse Coding \r 702
22.3.2 Dictionary Learning 703
22.3.3 Supervised Dictionary Learning 705
22.4 MIN-CUT/MAX-FLOW PROBLEMS AND COMBINATORIAL OPTIMIZATION 705
22.4.1 Min-Cut Problems 706
22.4.2 Quadratic Pseudo-Boolean Functions 707
22.4.3 Generalization to Integer Variables 709
22.5 NOTES 712
Bibliography 714
Index 767
List of Algorithms 790