Machine vision class assignment
Stanislav Kovačič, Aleš Leonardis, Matej Kristan, Janez Perš
November 21, 2010
Abstract. This document describes a class assignment for the school year 2010/2011 that should be completed no later than January 31, 2011. The assignment consists of four sequentially related parts that you have to complete at three-week intervals. A successfully completed assignment gives you 5 ECTS, equivalent to the 125 hours of student work required for the Machine vision course.
1
Description and requirements
The main goal is to design a working prototype for active acquisition of images of a limited number of business cards. The prototype should be robust against variations in lighting, position, orientation, scale, and shape. The only assumption is that the card is a planar object that occupies a substantial part of the image. In other words, the card is the dominant object in the scene.

The prototype shall provide the following functionality. A person shows a business card to the camera (e.g. a USB camera connected to a PC) while translating and rotating it in front of the camera. In the acquired image sequence, the algorithm first searches for image features that lie on the largest planar surface, assuming that the largest surface corresponds to the card. In this way the largest plane, and therefore the card, is detected. These features are used to segment the card in each image of the sequence. Knowing the correspondences among image features across the images, we can bring the images into alignment, i.e. translate and rotate them into a preselected reference position. We can expect that a given region of the card will not be of equal quality in all images. Therefore, we can combine the best parts (regions) from all images to produce a better image.

We can also store the feature descriptors that were selected during the detection phase (a model) for later use, and we can do so for every business card. Later on, when one of the previously acquired and stored business cards is presented to the system again, the system segments the card from the background and matches its description against the stored models. If a match is found, we say that the card has been recognized. Otherwise, the card is unknown.

The overall assignment consists of four parts.
For each part you are expected to read the underlying literature, understand the publicly available methods and their implementations (algorithms), design and implement the "missing parts", write a report, and give a five-minute oral presentation in class, followed by a discussion. The following deadlines should be met:
• Detection of stable image features: Thursday, 25.11.2010.
• Projective camera geometry: Thursday, 16.12.2010.
• Segmentation: Thursday, 6.01.2011.
• Classification / recognition: Thursday, 20.01.2011.
The report must be submitted two days before the Thursday class meeting (Tuesday evening). E-mail for submission: stanislav.kovacic@fe.uni-lj.si.
1.1
Assignment Part 1
The objective is to acquire image sequences for a couple of business cards and to select an appropriate detector of local image features. Put a card in front of the camera, then translate and rotate it. Take image sequences of cards that are rich in texture, but also of a few that are not that well textured. Acquire high-quality images, i.e. good focus, good contrast, low noise, good resolution, and appropriate lighting. Use a raw image format, or at least avoid high compression ratios when using JPEG or MPEG image coding. Take a few images that will allow you to test whether radial lens distortion is present.

Select a detector of local image features from among:
• SIFT [8],
• SURF [4] and
• MSER [10].
Take an image from your acquired image base and transform (deform) it with a perspective or affine transformation using different parameter values. Then run the feature detectors listed above to see how well they perform under image transformation. For example, do they find the same features regardless of the transformation used? Do they consistently locate these features in the presence of additive Gaussian image noise? Hint: read [8].

In the first report, describe two of the three detectors mentioned above, together with their advantages and disadvantages. Based on your analysis, select the detector that you think performs best. Describe the selected detector in detail and use it in the subsequent parts of the assignment. You can use Matlab implementations of these detectors, e.g. SIFT [12], MSER [12] and SURF [11].

Submission 1:
• The report, describing your image base and the stability study of one of the image detectors. Provide a theoretical description of one of the detectors and arguments for that selection (two pages). Briefly describe the other two detectors (one page maximum). Also include illustrative examples of detected features. Comment on radial distortion (half a page).
• Powerpoint presentation (∼5 min).
• Implementation in Matlab.
Deadline: November 23 (submission) / November 25 (presentation), 2010.
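The stability study described above can be quantified in several ways. The following is a minimal sketch in Python (not the Matlab required for submission) of one possible measure: the fraction of reference keypoints that are found again, within a pixel tolerance, after the image has been warped by a known transformation. The function names, the tolerance, and the sample coordinates are illustrative assumptions; the keypoints themselves would come from whichever detector (SIFT, SURF or MSER) is being tested.

```python
import math

def apply_homography(H, point):
    """Map a 2-D point through a 3x3 projective (or affine) matrix H."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)

def repeatability(keypoints_ref, keypoints_warped, H, tol=2.0):
    """Fraction of reference keypoints re-detected within tol pixels after warping by H."""
    if not keypoints_ref:
        return 0.0
    hits = 0
    for p in keypoints_ref:
        qx, qy = apply_homography(H, p)
        if any(math.hypot(qx - x, qy - y) <= tol for x, y in keypoints_warped):
            hits += 1
    return hits / len(keypoints_ref)

# Hypothetical data: a 30-degree rotation plus a translation (an affine case).
theta = math.radians(30)
H = [[math.cos(theta), -math.sin(theta), 10.0],
     [math.sin(theta),  math.cos(theta),  5.0],
     [0.0, 0.0, 1.0]]
ref = [(0.0, 0.0), (100.0, 0.0), (50.0, 80.0), (20.0, 40.0)]
# Pretend the detector re-found three of the four points with a small localization error.
warped = [apply_homography(H, p) for p in ref[:3]]
warped = [(x + 0.5, y - 0.5) for x, y in warped]
score = repeatability(ref, warped, H)
print(score)
```

Repeating this for several transformation parameters and noise levels gives one curve per detector, which makes the comparison in the report easier to argue.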
1.2
Assignment Part 2
The objective is to detect local image features that are present on the dominant object plane throughout a longer (or the longest) image sequence. Take two consecutive images and find those points that obey a projective transformation between the images. Use the feature detector that you selected in the previous assignment (Part 1, Section 1.1). For robust point matching use RANSAC [6] (see also [2, 1]). Also read the paper on how to detect points in two images that lie in the same scene plane [13]. However, some of the points that obey the projective plane transformation will not be part of the card. One way to filter out such points is to observe how stable the detected points are over a longer image sequence, say, a five-pair image sequence. The remaining points are those that are consistently present in a small neighborhood throughout the image sequence, and are therefore said to be stable. Also analyze whether it is better to match consecutive images or every fifth or tenth image in the sequence.

Submission 2:
• Report describing theoretical background and experimental results.
• Powerpoint presentation (∼5 min).
• Matlab implementation.
Deadline: December 14 (submission) / December 16 (presentation), 2010.
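To illustrate how RANSAC separates points that obey a single projective transformation from those that do not, here is a minimal Python sketch under stated assumptions: correspondences are given as pairs of 2-D points, the homography is estimated from four pairs with its last entry fixed to 1, and inliers are counted by reprojection error. All function names, the tolerance, the iteration count, and the synthetic data are hypothetical; this is not the method of [6] or [13] in full, only the basic sampling loop.

```python
import math
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting; None if singular."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def homography_from_4(pairs):
    """Estimate H (with H[2][2] = 1) from four point correspondences."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    if h is None:
        return None
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def project(H, p):
    """Apply H to point p; None if the point maps to infinity."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return None
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def ransac_homography(pairs, iterations=200, tol=2.0, seed=0):
    """Return the homography with the most inliers and the inlier indices."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iterations):
        H = homography_from_4(rng.sample(pairs, 4))
        if H is None:
            continue
        inliers = []
        for i, (p, q) in enumerate(pairs):
            r = project(H, p)
            if r is not None and math.hypot(r[0] - q[0], r[1] - q[1]) <= tol:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers

# Synthetic data: 12 points on one plane plus 4 mismatched points.
H_true = [[0.9, -0.2, 15.0], [0.25, 1.1, -8.0], [0.0005, 0.0002, 1.0]]
gen = random.Random(1)
src = [(gen.uniform(0, 200), gen.uniform(0, 120)) for _ in range(16)]
pairs = [(p, project(H_true, p)) for p in src[:12]]
offsets = [(40, -35), (-60, 20), (25, 70), (-30, -45)]
for p, (dx, dy) in zip(src[12:], offsets):
    q = project(H_true, p)
    pairs.append((p, (q[0] + dx, q[1] + dy)))
H_est, inliers = ransac_homography(pairs)
print(inliers)
```

In the assignment the same loop would run on detector matches between two frames; the stability filter over several image pairs then operates on the surviving inlier indices.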
1.3
Assignment Part 3
The objectives are to segment the card from the background, to transform all images in the sequence into a reference position, and to combine all images so as to produce a "better" image.

The result of the previous task is, for each image, a set of image features that lie on the business card. The number of these points is much lower than the number of pixels that belong to the card region. In the next step we want to segment the card region from the background based on this sparse cloud of image features. In other words, we want to separate (all) pixels that are part of the card region from those that belong to the background. One option for solving this task is the GraphCut method [9, 5]. The inputs to the GraphCut algorithm are two sets of points: the set of initial points that belong to the object and the set of initial points that definitely belong to the background. The result of the algorithm is a segmented image. Your task is therefore to define (derive) the initial points and run GraphCut. In addition, you have to define a similarity measure between pixels, which is also a parameter of the algorithm. You can use any Matlab implementation, e.g. [7, 3]. Run GraphCut on each image in the sequence to produce a sequence of segmented images.

The images have different positions and orientations, and therefore not all parts of an image are of the same quality. Your task is therefore to produce a better image that is composed of the best image parts from the whole image sequence. First, register (align) all images in the sequence, that is, transform (translate, rotate, scale) them into a reference position; then combine the images. There are many ways to do this. For example, you can evaluate the quality of an image region (or of individual pixels) based on the sharpness of that region and then select only the sharpest regions from the image sequence to produce the composed and potentially better image.
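As an illustration of the sharpness-based composition mentioned above, the following Python sketch assumes that the images are already registered, gray-valued, and stored as lists of rows. It scores each block by the summed absolute 4-neighbour Laplacian and copies the block from the image with the highest score. The block size, the sharpness measure, and all names are assumptions chosen for brevity; other measures (e.g. local variance or gradient magnitude) may work better on real data.

```python
def laplacian_energy(img, r, c):
    """Absolute 4-neighbour Laplacian at (r, c); border pixels use clamped neighbours."""
    h, w = len(img), len(img[0])
    up = img[max(r - 1, 0)][c]
    down = img[min(r + 1, h - 1)][c]
    left = img[r][max(c - 1, 0)]
    right = img[r][min(c + 1, w - 1)]
    return abs(up + down + left + right - 4 * img[r][c])

def block_sharpness(img, r0, c0, size):
    """Sum of Laplacian energy over the block whose top-left corner is (r0, c0)."""
    h, w = len(img), len(img[0])
    return sum(laplacian_energy(img, r, c)
               for r in range(r0, min(r0 + size, h))
               for c in range(c0, min(c0 + size, w)))

def compose_sharpest(images, block=2):
    """For every block, copy the pixels from the image that is sharpest there."""
    h, w = len(images[0]), len(images[0][0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, block):
        for c0 in range(0, w, block):
            best = max(images, key=lambda im: block_sharpness(im, r0, c0, block))
            for r in range(r0, min(r0 + block, h)):
                for c in range(c0, min(c0 + block, w)):
                    out[r][c] = best[r][c]
    return out

def checker(r, c):
    """Toy high-contrast texture standing in for sharp card content."""
    return 255 if (r + c) % 2 == 0 else 0

# Two toy 4x4 images: img_a is sharp on the left half, img_b on the right half.
img_a = [[checker(r, c) if c < 2 else 128 for c in range(4)] for r in range(4)]
img_b = [[128 if c < 2 else checker(r, c) for c in range(4)] for r in range(4)]
composite = compose_sharpest([img_a, img_b])
for row in composite:
    print(row)
```

Selecting whole blocks rather than single pixels reduces speckle in the result, at the cost of visible seams where neighbouring blocks come from different images.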
Submission 3:
• Report, describing theoretical background and experiments.
• Powerpoint presentation (∼5 min).
• Matlab implementation.
Deadline: January 2 (report) / January 6 (presentation), 2011.
1.4
Assignment Part 4
The objective is to recognize a business card based on previously stored models of a number of business cards. Based on the three previous steps, you are able to identify those pixels in the acquired image that represent the card, and you can transform the card (the detected image features) into the reference position. After this step, you are ready to recognize the business card contents. In the context of this assignment, recognition means classification of the business card into one of several classes, one class per card.

To be able to do this, you will need to arrange your previously acquired images into classes. Each class should contain multiple (different) images of the same business card. If you have not already acquired a dataset of business cards, you have to do it now. Use five different business cards and acquire five images of each. Use this opportunity to think about which images to include in the dataset, with the aim of achieving the best recognition results. For the actual classification, simply use nearest neighbor search. Describe this method only briefly, so that it is documented in your report; no in-depth description is necessary.

In classification, we usually deal with two (interconnected) problems: how to describe the sample that we need to recognize, and how to compare two samples using the chosen descriptors. Think of the trivial solution in your case: geometric alignment of the two business card images and calculation of a (simple Euclidean) distance in the N-dimensional space of those image pixels that lie inside the segmented area of the card. Why is such an approach inappropriate? In your report, describe the reasons not to proceed in this direction (and reasons for, if you can find any).

Next, implement two descriptors, which will be used for classification. The first descriptor will be a simple histogram of the pixel values that lie inside the (segmented) area of a card.
Choose the most appropriate distance (or similarity) metric for use with such descriptors. Test this method on your image dataset. How will you measure and express the performance of such a recognition method? The second descriptor should be the region covariance descriptor [?]. Use the distance metric described in the paper. Implement the method yourself, following the suggested literature, and test its performance.

Describe both recognition methods in your report. Compare their basic design, compare their advantages and disadvantages, and comment on the results. Propose a few (2-3) alternative approaches that could be used as a solution for this and similar problems. Describe their appropriateness for the problem at hand with regard to the parameters of your problem (e.g. the size of your dataset). When commenting on alternative approaches, cite the primary sources (journal or conference papers) where the approach was proposed.

Bonus assignment: design your solution in such a way that it is able to determine whether the business card you are trying to recognize is already in the database or is unknown to the system.

Submission 4:
• Report, describing theoretical background and experiments.
• Powerpoint presentation, accompanying the report (∼10 min).
• Matlab implementation.
Deadline: January 18 (report) / January 20 (presentation), 2011.
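The histogram descriptor and nearest neighbor classification from Part 4 can be sketched as follows in Python. This is an illustration under stated assumptions, not a reference solution: the chi-square distance is only one candidate metric (choosing the most appropriate one remains part of the task), the number of bins and the rejection threshold used for the bonus "unknown" decision are arbitrary, and the pixel lists are synthetic stand-ins for the gray values inside a segmented card region.

```python
def gray_histogram(pixels, bins=4, max_value=256):
    """Normalized histogram of gray values taken from the segmented card region."""
    hist = [0.0] * bins
    for v in pixels:
        hist[min(v * bins // max_value, bins - 1)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def chi_square(h1, h2):
    """Chi-square distance between two normalized histograms (one candidate metric)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def classify(query, models, reject_above=None):
    """Nearest neighbor over (label, histogram) models, with optional rejection."""
    label, dist = min(((lab, chi_square(query, h)) for lab, h in models),
                      key=lambda t: t[1])
    if reject_above is not None and dist > reject_above:
        return "unknown", dist
    return label, dist

# Synthetic stand-ins for two stored cards and two queries.
models = [
    ("card_a", gray_histogram([10] * 70 + [200] * 30)),   # mostly dark card
    ("card_b", gray_histogram([240] * 80 + [100] * 20)),  # mostly bright card
]
known_query = gray_histogram([20] * 65 + [210] * 35)
novel_query = gray_histogram([130] * 100)

known_label, known_dist = classify(known_query, models, reject_above=0.5)
novel_label, novel_dist = classify(novel_query, models, reject_above=0.5)
print(known_label, novel_label)
```

With five cards and five images per card, performance could be expressed, for example, as leave-one-out classification accuracy together with a confusion matrix; the rejection threshold would have to be tuned on images of cards that are not in the database.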
References
[1] RANSAC related site, http://vision.ece.ucsb.edu/~zuliani/Research/RANSAC/RANSAC.shtml.
[2] Wikipedia: RANSAC, http://en.wikipedia.org/wiki/RANSAC.
[3] S. Bagon, Shai Bagon's Matlab code, http://www.wisdom.weizmann.ac.il/~bagon/matlab.html.
[4] H. Bay, T. Tuytelaars, and L. Van Gool, SURF: Speeded up robust features, Lecture Notes in Computer Science 3951 (2006), 404.
[5] A. Eriksson, O. Barr, K. Åström, F. Georgsson, and N. Börlin, Image Segmentation Using Minimal Graph Cuts, Proceedings SSBA, 2006, pp. 45–48.
[6] M.A. Fischler and R.C. Bolles, Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography, (1981).
[7] S. Lankton, Growcut image segmentation, http://www.mathworks.com/matlabcentral/fileexchange/19091-growcutimage-segmentation, 2008.
[8] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2004), no. 2, 91–110.
[9] J. Malcolm, Y. Rathi, and A.R. Tannenbaum, A graph cut approach to image segmentation in tensor space, (2007).
[10] J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions, Image and Vision Computing 22 (2004), no. 10, 761–767.
[11] P. Strandmark, http://www.maths.lth.se/matematiklth/personal/petter/surfmex.php, 2008.
[12] A. Vedaldi, VLFeat, http://www.vlfeat.org/install-matlab.html, 2009.
[13] E. Vincent and R. Laganiere, Detecting planar homographies in an image pair, Proceedings of the 2nd International Symposium on Image and Signal Processing and Analysis, Citeseer, 2001, pp. 182–187.

