Computer Vision: A Modern Approach

David A. Forsyth; Jean Ponce

BOOK

(2015)

Additional Information

Book Details

ISBN: 978-1-292-01408-1
Edition: 2
Language: English
Pages: 792
Abstract

Appropriate for upper-division undergraduate- and graduate-level courses in computer vision found in departments of Computer Science, Computer Engineering and Electrical Engineering.

This textbook provides the most complete treatment of modern computer vision methods by two of the leading authorities in the field. This accessible presentation gives a general view of the entire computer vision enterprise and offers sufficient detail for students to build useful applications. Students will learn techniques that have proven useful in first-hand experience, along with a wide range of mathematical methods.

Section Title	Page
Cover	Cover
Title Page	Title
Contents	5
I IMAGE FORMATION	31
1 Geometric Camera Models	33
1.1 IMAGE FORMATION	34
1.1.1 Pinhole Perspective	34
1.1.2 Weak Perspective	36
1.1.3 Cameras with Lenses	38
1.1.4 The Human Eye	42
1.2 INTRINSIC AND EXTRINSIC PARAMETERS	44
1.2.1 Rigid Transformations and Homogeneous Coordinates	44
1.2.2 Intrinsic Parameters	46
1.2.3 Extrinsic Parameters	48
1.2.4 Perspective Projection Matrices	49
1.2.5 Weak-Perspective Projection Matrices	50
1.3 GEOMETRIC CAMERA CALIBRATION	52
1.3.1 A Linear Approach to Camera Calibration	53
1.3.2 A Nonlinear Approach to Camera Calibration	57
1.4 NOTES	59
2 Light and Shading	62
2.1 MODELLING PIXEL BRIGHTNESS	62
2.1.1 Reflection at Surfaces	63
2.1.2 Sources and Their Effects	64
2.1.3 The Lambertian+Specular Model	66
2.1.4 Area Sources	66
2.2 INFERENCE FROM SHADING	67
2.2.1 Radiometric Calibration and High Dynamic Range Images	68
2.2.2 The Shape of Specularities	70
2.2.3 Inferring Lightness and Illumination	73
2.2.4 Photometric Stereo: Shape from Multiple Shaded Images	76
2.3 MODELLING INTERREFLECTION	82
2.3.1 The Illumination at a Patch Due to an Area Source	82
2.3.2 Radiosity and Exitance	84
2.3.3 An Interreflection Model	85
2.3.4 Qualitative Properties of Interreflections	86
2.4 SHAPE FROM ONE SHADED IMAGE	89
2.5 NOTES	91
3 Color	98
3.1 HUMAN COLOR PERCEPTION	98
3.1.1 Color Matching	98
3.1.2 Color Receptors	101
3.2 THE PHYSICS OF COLOR	103
3.2.1 The Color of Light Sources	103
3.2.2 The Color of Surfaces	106
3.3 REPRESENTING COLOR	107
3.3.1 Linear Color Spaces	107
3.3.2 Non-linear Color Spaces	113
3.4 A MODEL OF IMAGE COLOR	116
3.4.1 The Diffuse Term	118
3.4.2 The Specular Term	120
3.5 INFERENCE FROM COLOR	120
3.5.1 Finding Specularities Using Color	120
3.5.2 Shadow Removal Using Color	122
3.5.3 Color Constancy: Surface Color from Image Color	125
3.6 NOTES	129
II EARLY VISION: JUST ONE IMAGE	135
4 Linear Filters	137
4.1 LINEAR FILTERS AND CONVOLUTION	137
4.1.1 Convolution	137
4.2 SHIFT INVARIANT LINEAR SYSTEMS	142
4.2.1 Discrete Convolution	143
4.2.2 Continuous Convolution	145
4.2.3 Edge Effects in Discrete Convolutions	148
4.3 SPATIAL FREQUENCY AND FOURIER TRANSFORMS	148
4.3.1 Fourier Transforms	149
4.4 SAMPLING AND ALIASING	151
4.4.1 Sampling	152
4.4.2 Aliasing	155
4.4.3 Smoothing and Resampling	156
4.5 FILTERS AS TEMPLATES	161
4.5.1 Convolution as a Dot Product	161
4.5.2 Changing Basis	162
4.6 TECHNIQUE: NORMALIZED CORRELATION AND FINDING PATTERNS	162
4.6.1 Controlling the Television by Finding Hands by Normalized Correlation	163
4.7 TECHNIQUE: SCALE AND IMAGE PYRAMIDS	164
4.7.1 The Gaussian Pyramid	165
4.7.2 Applications of Scaled Representations	166
4.8 NOTES	167
5 Local Image Features	171
5.1 COMPUTING THE IMAGE GRADIENT	171
5.1.1 Derivative of Gaussian Filters	172
5.2 REPRESENTING THE IMAGE GRADIENT	174
5.2.1 Gradient-Based Edge Detectors	175
5.2.2 Orientations	177
5.3 FINDING CORNERS AND BUILDING NEIGHBORHOODS	178
5.3.1 Finding Corners	179
5.3.2 Using Scale and Orientation to Build a Neighborhood	181
5.4 DESCRIBING NEIGHBORHOODS WITH SIFT AND HOG FEATURES	185
5.4.1 SIFT Features	187
5.4.2 HOG Features	189
5.5 COMPUTING LOCAL FEATURES IN PRACTICE	190
5.6 NOTES	190
6 Texture	194
6.1 LOCAL TEXTURE REPRESENTATIONS USING FILTERS	196
6.1.1 Spots and Bars	197
6.1.2 From Filter Outputs to Texture Representation	198
6.1.3 Local Texture Representations in Practice	200
6.2 POOLED TEXTURE REPRESENTATIONS BY DISCOVERING TEXTONS	201
6.2.1 Vector Quantization and Textons	202
6.2.2 K-means Clustering for Vector Quantization	202
6.3 SYNTHESIZING TEXTURES AND FILLING HOLES IN IMAGES	206
6.3.1 Synthesis by Sampling Local Models	206
6.3.2 Filling in Holes in Images	209
6.4 IMAGE DENOISING	212
6.4.1 Non-local Means	213
6.4.2 Block Matching 3D (BM3D)	213
6.4.3 Learned Sparse Coding	214
6.4.4 Results	216
6.5 SHAPE FROM TEXTURE	217
6.5.1 Shape from Texture for Planes	217
6.5.2 Shape from Texture for Curved Surfaces	220
6.6 NOTES	221
III EARLY VISION: MULTIPLE IMAGES	225
7 Stereopsis	227
7.1 BINOCULAR CAMERA GEOMETRY AND THE EPIPOLAR CONSTRAINT	228
7.1.1 Epipolar Geometry	228
7.1.2 The Essential Matrix	230
7.1.3 The Fundamental Matrix	231
7.2 BINOCULAR RECONSTRUCTION	231
7.2.1 Image Rectification	232
7.3 HUMAN STEREOPSIS	233
7.4 LOCAL METHODS FOR BINOCULAR FUSION	235
7.4.1 Correlation	235
7.4.2 Multi-Scale Edge Matching	237
7.5 GLOBAL METHODS FOR BINOCULAR FUSION	240
7.5.1 Ordering Constraints and Dynamic Programming	240
7.5.2 Smoothness Constraints and Combinatorial Optimization over Graphs	241
7.6 USING MORE CAMERAS	244
7.7 APPLICATION: ROBOT NAVIGATION	245
7.8 NOTES	246
8 Structure from Motion	251
8.1 INTERNALLY CALIBRATED PERSPECTIVE CAMERAS	251
8.1.1 Natural Ambiguity of the Problem	253
8.1.2 Euclidean Structure and Motion from Two Images	254
8.1.3 Euclidean Structure and Motion from Multiple Images	258
8.2 UNCALIBRATED WEAK-PERSPECTIVE CAMERAS	260
8.2.1 Natural Ambiguity of the Problem	261
8.2.2 Affine Structure and Motion from Two Images	263
8.2.3 Affine Structure and Motion from Multiple Images	267
8.2.4 From Affine to Euclidean Shape	268
8.3 UNCALIBRATED PERSPECTIVE CAMERAS	270
8.3.1 Natural Ambiguity of the Problem	271
8.3.2 Projective Structure and Motion from Two Images	272
8.3.3 Projective Structure and Motion from Multiple Images	274
8.3.4 From Projective to Euclidean Shape	276
8.4 NOTES	278
IV MID-LEVEL VISION	283
9 Segmentation by Clustering	285
9.1 HUMAN VISION: GROUPING AND GESTALT	286
9.2 IMPORTANT APPLICATIONS	291
9.2.1 Background Subtraction	291
9.2.2 Shot Boundary Detection	294
9.2.3 Interactive Segmentation	295
9.2.4 Forming Image Regions	296
9.3 IMAGE SEGMENTATION BY CLUSTERING PIXELS	298
9.3.1 Basic Clustering Methods	299
9.3.2 The Watershed Algorithm	301
9.3.3 Segmentation Using K-means	302
9.3.4 Mean Shift: Finding Local Modes in Data	303
9.3.5 Clustering and Segmentation with Mean Shift	305
9.4 SEGMENTATION, CLUSTERING, AND GRAPHS	307
9.4.1 Terminology and Facts for Graphs	307
9.4.2 Agglomerative Clustering with a Graph	309
9.4.3 Divisive Clustering with a Graph	311
9.4.4 Normalized Cuts	314
9.5 IMAGE SEGMENTATION IN PRACTICE	315
9.5.1 Evaluating Segmenters	316
9.6 NOTES	317
10 Grouping and Model Fitting	320
10.1 THE HOUGH TRANSFORM	320
10.1.1 Fitting Lines with the Hough Transform	320
10.1.2 Using the Hough Transform	322
10.2 FITTING LINES AND PLANES	323
10.2.1 Fitting a Single Line	324
10.2.2 Fitting Planes	325
10.2.3 Fitting Multiple Lines	326
10.3 FITTING CURVED STRUCTURES	327
10.4 ROBUSTNESS	329
10.4.1 M-Estimators	330
10.4.2 RANSAC: Searching for Good Points	332
10.5 FITTING USING PROBABILISTIC MODELS	336
10.5.1 Missing Data Problems	337
10.5.2 Mixture Models and Hidden Variables	339
10.5.3 The EM Algorithm for Mixture Models	340
10.5.4 Difficulties with the EM Algorithm	342
10.6 MOTION SEGMENTATION BY PARAMETER ESTIMATION	343
10.6.1 Optical Flow and Motion	345
10.6.2 Flow Models	346
10.6.3 Motion Segmentation with Layers	347
10.7 MODEL SELECTION: WHICH MODEL IS THE BEST FIT?	349
10.7.1 Model Selection Using Cross-Validation	352
10.8 NOTES	352
11 Tracking	356
11.1 SIMPLE TRACKING STRATEGIES	357
11.1.1 Tracking by Detection	357
11.1.2 Tracking Translations by Matching	360
11.1.3 Using Affine Transformations to Confirm a Match	362
11.2 TRACKING USING MATCHING	364
11.2.1 Matching Summary Representations	365
11.2.2 Tracking Using Flow	367
11.3 TRACKING LINEAR DYNAMICAL MODELS WITH KALMAN FILTERS	369
11.3.1 Linear Measurements and Linear Dynamics	370
11.3.2 The Kalman Filter	374
11.3.3 Forward-backward Smoothing	375
11.4 DATA ASSOCIATION	379
11.4.1 Linking Kalman Filters with Detection Methods	379
11.4.2 Key Methods of Data Association	380
11.5 PARTICLE FILTERING	380
11.5.1 Sampled Representations of Probability Distributions	381
11.5.2 The Simplest Particle Filter	385
11.5.3 The Tracking Algorithm	386
11.5.4 A Workable Particle Filter	388
11.5.5 Practical Issues in Building Particle Filters	390
11.6 NOTES	392
V HIGH-LEVEL VISION	395
12 Registration	397
12.1 REGISTERING RIGID OBJECTS	398
12.1.1 Iterated Closest Points	398
12.1.2 Searching for Transformations via Correspondences	399
12.1.3 Application: Building Image Mosaics	400
12.2 MODEL-BASED VISION: REGISTERING RIGID OBJECTS WITH PROJECTION	405
12.2.1 Verification: Comparing Transformed and Rendered Source to Target	407
12.3 REGISTERING DEFORMABLE OBJECTS	408
12.3.1 Deforming Texture with Active Appearance Models	408
12.3.2 Active Appearance Models in Practice	411
12.3.3 Application: Registration in Medical Imaging Systems	413
12.4 NOTES	418
13 Smooth Surfaces and Their Outlines	421
13.1 ELEMENTS OF DIFFERENTIAL GEOMETRY	423
13.1.1 Curves	423
13.1.2 Surfaces	427
13.2 CONTOUR GEOMETRY	432
13.2.1 The Occluding Contour and the Image Contour	432
13.2.2 The Cusps and Inflections of the Image Contour	433
13.2.3 Koenderink’s Theorem	434
13.3 VISUAL EVENTS: MORE DIFFERENTIAL GEOMETRY	437
13.3.1 The Geometry of the Gauss Map	437
13.3.2 Asymptotic Curves	439
13.3.3 The Asymptotic Spherical Map	440
13.3.4 Local Visual Events	442
13.3.5 The Bitangent Ray Manifold	443
13.3.6 Multilocal Visual Events	444
13.3.7 The Aspect Graph	446
13.4 NOTES	447
14 Range Data	452
14.1 ACTIVE RANGE SENSORS	452
14.2 RANGE DATA SEGMENTATION	454
14.2.1 Elements of Analytical Differential Geometry	454
14.2.2 Finding Step and Roof Edges in Range Images	456
14.2.3 Segmenting Range Images into Planar Regions	461
14.3 RANGE IMAGE REGISTRATION AND MODEL ACQUISITION	462
14.3.1 Quaternions	463
14.3.2 Registering Range Images	464
14.3.3 Fusing Multiple Range Images	466
14.4 OBJECT RECOGNITION	468
14.4.1 Matching Using Interpretation Trees	468
14.4.2 Matching Free-Form Surfaces Using Spin Images	471
14.5 KINECT	476
14.5.1 Features	477
14.5.2 Technique: Decision Trees and Random Forests	478
14.5.3 Labeling Pixels	480
14.5.4 Computing Joint Positions	483
14.6 NOTES	483
15 Learning to Classify	487
15.1 CLASSIFICATION, ERROR, AND LOSS	487
15.1.1 Using Loss to Determine Decisions	487
15.1.2 Training Error, Test Error, and Overfitting	489
15.1.3 Regularization	490
15.1.4 Error Rate and Cross-Validation	493
15.1.5 Receiver Operating Curves	495
15.2 MAJOR CLASSIFICATION STRATEGIES	497
15.2.1 Example: Mahalanobis Distance	497
15.2.2 Example: Class-Conditional Histograms and Naive Bayes	498
15.2.3 Example: Classification Using Nearest Neighbors	499
15.2.4 Example: The Linear Support Vector Machine	500
15.2.5 Example: Kernel Machines	503
15.2.6 Example: Boosting and Adaboost	505
15.3 PRACTICAL METHODS FOR BUILDING CLASSIFIERS	505
15.3.1 Manipulating Training Data to Improve Performance	507
15.3.2 Building Multi-Class Classifiers Out of Binary Classifiers	509
15.3.3 Solving for SVMs and Kernel Machines	510
15.4 NOTES	511
16 Classifying Images	512
16.1 BUILDING GOOD IMAGE FEATURES	512
16.1.1 Example Applications	512
16.1.2 Encoding Layout with GIST Features	515
16.1.3 Summarizing Images with Visual Words	517
16.1.4 The Spatial Pyramid Kernel	519
16.1.5 Dimension Reduction with Principal Components	523
16.1.6 Dimension Reduction with Canonical Variates	524
16.1.7 Example Application: Identifying Explicit Images	528
16.1.8 Example Application: Classifying Materials	532
16.1.9 Example Application: Classifying Scenes	532
16.2 CLASSIFYING IMAGES OF SINGLE OBJECTS	534
16.2.1 Image Classification Strategies	535
16.2.2 Evaluating Image Classification Systems	535
16.2.3 Fixed Sets of Classes	538
16.2.4 Large Numbers of Classes	539
16.2.5 Flowers, Leaves, and Birds: Some Specialized Problems	541
16.3 IMAGE CLASSIFICATION IN PRACTICE	542
16.3.1 Codes for Image Features	543
16.3.2 Image Classification Datasets	543
16.3.3 Dataset Bias	545
16.3.4 Crowdsourcing Dataset Collection	545
16.4 NOTES	547
17 Detecting Objects in Images	549
17.1 THE SLIDING WINDOW METHOD	549
17.1.1 Face Detection	550
17.1.2 Detecting Humans	555
17.1.3 Detecting Boundaries	557
17.2 DETECTING DEFORMABLE OBJECTS	560
17.3 THE STATE OF THE ART OF OBJECT DETECTION	565
17.3.1 Datasets and Resources	568
17.4 NOTES	569
18 Topics in Object Recognition	570
18.1 WHAT SHOULD OBJECT RECOGNITION DO?	570
18.1.1 What Should an Object Recognition System Do?	570
18.1.2 Current Strategies for Object Recognition	572
18.1.3 What Is Categorization?	572
18.1.4 Selection: What Should Be Described?	574
18.2 FEATURE QUESTIONS	574
18.2.1 Improving Current Image Features	574
18.2.2 Other Kinds of Image Feature	576
18.3 GEOMETRIC QUESTIONS	577
18.4 SEMANTIC QUESTIONS	579
18.4.1 Attributes and the Unfamiliar	580
18.4.2 Parts, Poselets and Consistency	581
18.4.3 Chunks of Meaning	584
VI APPLICATIONS AND TOPICS	587
19 Image-Based Modeling and Rendering	589
19.1 VISUAL HULLS	589
19.1.1 Main Elements of the Visual Hull Model	591
19.1.2 Tracing Intersection Curves	593
19.1.3 Clipping Intersection Curves	596
19.1.4 Triangulating Cone Strips	597
19.1.5 Results	598
19.1.6 Going Further: Carved Visual Hulls	602
19.2 PATCH-BASED MULTI-VIEW STEREOPSIS	603
19.2.1 Main Elements of the PMVS Model	605
19.2.2 Initial Feature Matching	608
19.2.3 Expansion	609
19.2.4 Filtering	610
19.2.5 Results	611
19.3 THE LIGHT FIELD	614
19.4 NOTES	617
20 Looking at People	620
20.1 HMM’S, DYNAMIC PROGRAMMING, AND TREE-STRUCTURED MODELS	620
20.1.1 Hidden Markov Models	620
20.1.2 Inference for an HMM	622
20.1.3 Fitting an HMM with EM	627
20.1.4 Tree-Structured Energy Models	630
20.2 PARSING PEOPLE IN IMAGES	632
20.2.1 Parsing with Pictorial Structure Models	632
20.2.2 Estimating the Appearance of Clothing	634
20.3 TRACKING PEOPLE	636
20.3.1 Why Human Tracking Is Hard	636
20.3.2 Kinematic Tracking by Appearance	638
20.3.3 Kinematic Human Tracking Using Templates	639
20.4 3D FROM 2D: LIFTING	641
20.4.1 Reconstruction in an Orthographic View	641
20.4.2 Exploiting Appearance for Unambiguous Reconstructions	643
20.4.3 Exploiting Motion for Unambiguous Reconstructions	645
20.5 ACTIVITY RECOGNITION	647
20.5.1 Background: Human Motion Data	647
20.5.2 Body Configuration and Activity Recognition	651
20.5.3 Recognizing Human Activities with Appearance Features	652
20.5.4 Recognizing Human Activities with Compositional Models	654
20.6 RESOURCES	654
20.7 NOTES	656
21 Image Search and Retrieval	657
21.1 THE APPLICATION CONTEXT	657
21.1.1 Applications	658
21.1.2 User Needs	659
21.1.3 Types of Image Query	660
21.1.4 What Users Do with Image Collections	661
21.2 BASIC TECHNOLOGIES FROM INFORMATION RETRIEVAL	662
21.2.1 Word Counts	662
21.2.2 Smoothing Word Counts	663
21.2.3 Approximate Nearest Neighbors and Hashing	664
21.2.4 Ranking Documents	668
21.3 IMAGES AS DOCUMENTS	669
21.3.1 Matching Without Quantization	670
21.3.2 Ranking Image Search Results	671
21.3.4 Laying Out Images for Browsing	674
21.4 PREDICTING ANNOTATIONS FOR PICTURES	675
21.4.1 Annotations from Nearby Words	676
21.4.2 Annotations from the Whole Image	676
21.4.3 Predicting Correlated Words with Classifiers	678
21.4.4 Names and Faces	679
21.4.5 Generating Tags with Segments	681
21.5 THE STATE OF THE ART OF WORD PREDICTION	684
21.5.1 Resources	685
21.5.2 Comparing Methods	685
21.5.3 Open Problems	686
21.6 NOTES	689
VII BACKGROUND MATERIAL	691
22 Optimization Techniques	693
22.1 LINEAR LEAST-SQUARES METHODS	693
22.1.1 Normal Equations and the Pseudoinverse	694
22.1.2 Homogeneous Systems and Eigenvalue Problems	695
22.1.3 Generalized Eigenvalues Problems	696
22.1.4 An Example: Fitting a Line to Points in a Plane	696
22.1.5 Singular Value Decomposition	697
22.2 NONLINEAR LEAST-SQUARES METHODS	699
22.2.1 Newton’s Method: Square Systems of Nonlinear Equations	700
22.2.2 Newton’s Method for Overconstrained Systems	700
22.2.3 The Gauss–Newton and Levenberg–Marquardt Algorithms	701
22.3 SPARSE CODING AND DICTIONARY LEARNING	702
22.3.1 Sparse Coding	702
22.3.2 Dictionary Learning	703
22.3.3 Supervised Dictionary Learning	705
22.4 MIN-CUT/MAX-FLOW PROBLEMS AND COMBINATORIAL OPTIMIZATION	705
22.4.1 Min-Cut Problems	706
22.4.2 Quadratic Pseudo-Boolean Functions	707
22.4.3 Generalization to Integer Variables	709
22.5 NOTES	712
Bibliography	714
Index	767
List of Algorithms	790
