This Web Page is News

Frequently Asked Questions:

Exercise 3

Some of the questions on exercises 1 and 2 are also relevant to this exercise (see the F.A.Q. for exercises 1 and 2).

First part: Detection and Identification

Q1. May we choose to implement an algorithm that includes features that weren't studied in class?
A1. Of course, but make sure you have a good understanding of it.

Q2. We have one question regarding the first part of the project. Are we supposed to recognize partial plates as well or only complete ones?

A2. It is OK if you do not detect a plate which is partially out of the picture, or partially hidden by some other object.

Q3. I have memory problems – the matrix representing a picture seems to be quite large. Does the following make sense?

Doing a first phase of calculation on a smaller matrix, representing the same picture, but with reduced resolution, then a second phase with the original resolution, but concentrating only on the more “interesting” parts of the picture.

A3. See A1. If you use such a solution, remember to explain its relevance (or irrelevance) to the visual system.
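
The two-phase idea described in Q3 can be sketched as follows. This is only an illustration in plain Python (the same steps carry over to Matlab). The function names `downsample` and `coarse_to_full`, the choice of block averaging, and the box convention are assumptions of this sketch, not course requirements.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks.

    `image` is a list of equally long rows of numbers; rows and columns
    that do not fill a whole block are dropped.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    small = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def coarse_to_full(box, factor):
    """Map a (top, left, bottom, right) box found on the reduced image
    back to full-resolution coordinates (end indices are exclusive)."""
    top, left, bottom, right = box
    return (top * factor, left * factor, bottom * factor, right * factor)
```

The first phase would run on `downsample(image, factor)`, and the second phase only on the full-resolution region returned by `coarse_to_full` for each "interesting" box.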

Q4. Regarding the first part (detection) - are we allowed to use convolution with any operator?

A4. Any convolution you do (and any other part of your algorithm) should be well motivated. That is, you should explain why it is relevant to the visual system.

Q5. Can we combine information of different types (e.g., color and line orientations)?

A5. See A4.

Q6. You gave an example of 3 file-names of the output:

two_cars_detect.ps

two_cars_identify.ps

two_cars_identify.txt

What is the difference between XXX_detect.ps and XXX_identify.ps (what should be in XXX_identify.ps)?

A6. The file XXX_detect.ps is the output of the detection part. The XXX_identify.ps file should demonstrate the running of the second (optional) part – the process of identifying the cars’ numbers.

Q7. Should the Matlab files for both the license plate and the stereo algorithms all be placed in the same directory (~/cns05a/3)?

A7. Yes.

Q8. Should we recognize plates of bikes as well?

A8. You don’t have to.

Q9. The resulting ps files are very big (4MB each). The storage space on our unix machines is too small to handle a list of photos being processed (there will probably be more than just 3 photos). Any ideas what we could do with those files?

A9. With some effort I guess you can make them smaller (less than 1MB).

Your program will be tested on an account with at least 15MB of free quota.

Q10. May any of the ps files contain more than one page?

A10. No.

Q11. How much time would you consider reasonable for the processing of one input?

A11. For a 500 by 500 pixels input, a few seconds should suffice. But certainly not more than a minute.

For a 3000 by 3000 pixels input, a few seconds should suffice too. But certainly not more than two minutes.

Second part: Stereo Algorithms

Q1. Where can I find the paper by Schwartz and Yeshurun?

A1. You can download it from here.

Q2. Where can I find Marr’s book (‘Vision’) to read about his algorithm with Poggio?

A2. At the library.

Q3. The authors of the "Cepstral Filtering" article admit that the performance of the presented algorithm on natural images is much worse than on RDS (as we have seen when running our implementation). Will this fact be taken into consideration?
A3. This is indeed one of the parameters you should consider and discuss in your documentation.

Q4. What is a disparity map?
A4. A disparity map is a matrix containing the disparity for each pixel of the original images.

Q5. What is the maximum size of the input images? Can we assume that it is 1000x1000 (maximum)?

A5. Yes.

Q6. Hezy states in his article that a 16000 x 16000 pixel image (conventional constant resolution) must be used to achieve the desired results. You said the maximum size will be 1000 x 1000. How do the two fit together?
A6. Your program will be tested using images of at most 1000 x 1000 pixels. However, you may use any pictures to check it yourself, and add the results to your report.

Q7. Can we add examples of more running tests to the documentation?
A7. Only within the page limits.

Q8. We were wondering if we can assume the following regarding the input pictures in the first question (of the second project):
a. Pictures have been taken from the same height, that is, all disparities will only have an X element (and no "diagonal" disparities).
b. The axes of the two pictures will be parallel, that is, the X axis of the first image is parallel to the X axis of the second image, and the same goes for the Y axis.
A8. Yes for both.

Q9. Is the resolution of the disparity map to be the same as the original pictures?
A9. Not necessarily.

Q10. In the link you gave us for stereo pictures, the disparity gets larger as the object gets farther away. As far as I understand, this is inconsistent with the eyes. Are these "valid" pictures?
A10. Put a finger about 20 cm in front of you, fixate on it, and close your left eye and then your right eye. Note what happens to the background. Now do the same while fixating on the background instead of on your finger. What is the difference? For further details, look for 'disparity' in Marr's book.

Q11. Concerning the cepstral filtering paper: What should the assumptions about the input resolution be? How far away are the images? In the paper the writers assumed that the images are at a constant distance, and that 6-12 minutes correspond to about 20 pixels. But that requires large input images to produce good results (larger than 1000x1000).
A11. You cannot assume the distance of the image; note that stereo vision works at various distances. However, you may assume the angle per pixel assumed in the article.

Q12. If our algorithm outputs both positive and negative disparities, should we omit the sign (use the same color for +a and -a) or shift the disparity so we don't have negative disparities?
A12. Shift it.

Q13. May we assume we are not supposed to recognize disparity larger than, say, 10 pixels?
A13. If you make such assumptions, you should explain whether such a bound has a good biological justification or is just due to a technical reason (justify both the existence of the bound and the value of the parameter you chose).

Q14. (On "Columnar Cepstral..."): When do they use a 1-dimensional Fourier transform and when a 2-dimensional one?
A14. The examples (images) imply a 2D FFT, but the appendix uses 1D. The algorithm itself calls for a 2D FFT, since it is performed on an image. If only a horizontal shift is expected (or allowed), the peak should be looked for only on the X axis of the cepstrum, but it is a 2D FFT in any case. A 1D FFT was used in the appendix only to keep the explanation simple.

Q15. Should we scale the disparity map to produce a more comprehensible picture or should the values in the disparity map represent the actual disparity values (and thus be values between 1 and 10 which look all black to dark gray)?
A15. The output bitmap file should be visually checkable; that is, the colormap you choose should make the relevant disparities visible.
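
One possible way to get a visually checkable map, covering both the shift in A12 and the visibility requirement in A15, is to stretch the disparities linearly over the available gray levels. The sketch below is plain Python for illustration only; the function name `to_gray_levels` and the linear min-max mapping are assumptions, and any other colormap that makes the relevant disparities visible would do as well.

```python
def to_gray_levels(disparity_map, levels=256):
    """Shift and scale a disparity map (list of rows) to 0 .. levels-1.

    The smallest disparity (possibly negative) maps to 0 and the largest
    to levels-1, so small ranges such as 1..10 no longer look all black.
    """
    flat = [d for row in disparity_map for d in row]
    lowest, highest = min(flat), max(flat)
    span = highest - lowest
    if span == 0:
        # Constant map: nothing to stretch, return all zeros.
        return [[0 for _ in row] for row in disparity_map]
    return [
        [round((d - lowest) * (levels - 1) / span) for d in row]
        for row in disparity_map
    ]


print(to_gray_levels([[-2, 0], [3, 1]]))  # prints [[0, 102], [255, 153]]
```

Because the mapping discards the absolute values, the original range (here -2 to 3) should be reported alongside the picture if the actual disparities matter.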

Q16. What do you mean by natural and artificial pictures? Do the images you gave (2_L, 2_R) count as natural or artificial?

Is there a way to create stereo images from natural images, or should we just look for examples on the net?

A16. The samples 2_L.bmp, 2_R.bmp are artificial. You can find natural pairs on the web. They are usually made by taking two pictures of the same scene from two close points.

Q17. Why are we asked to test the stereo algorithm on the edge-map of the original input, as well as the original input? What type of edge detection should we use for this purpose? Will the algorithm work better if we apply edge-detection first?

A17. The use of edge detection is motivated by our knowledge of how visual information is processed in the brain.

Use edge detection that you think fits best the visual system. In your document refer to the behavior of the algorithm with and without edge detection, when discussing its relevance to the brain.

Q18. Are we allowed to use a ready-made Cepstrum function? How about a ready-made Fourier function?

A18. You should implement the Cepstrum, but you can use Matlab’s Fourier functions.

Q19. I have trouble debugging my Cepstrum algorithm. Any suggestions?

A19. You can start with reducing the problem to only two concatenated windows, each of height 1.

Q20. I tried implementing the Cepstrum algorithm, but I get a symmetric response; in particular, I can’t distinguish disparity 3 from disparity –3.

A20. Symmetric behavior is expected. It is due to the 'modulo' property of the FFT. What you can do is:

Pad your input (the two concatenated windows) with zeros, and then (as suggested in the paper) consider only the values at d/2 to 1.5 d.
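
For debugging along the lines of A19 and A20, a minimal 1-D sketch in plain Python follows. It assumes each window is a single row, reads d in A20 as the window width, uses log(1 + power) rather than a bare log to avoid log(0), and uses a naive DFT so that it needs no libraries. All function names are hypothetical and none of this is taken from the paper.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform (fine for tiny test inputs)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def power_cepstrum(signal):
    """Squared magnitude of the transform of the log power spectrum.

    log(1 + power) is an assumption of this sketch (it avoids log(0)).
    The log power spectrum is real, so a forward DFT gives the same
    magnitudes as an inverse one, up to a constant scale.
    """
    log_power = [math.log1p(abs(c) ** 2) for c in dft(signal)]
    return [abs(c) ** 2 for c in dft(log_power)]


def estimate_disparity(left, right, pad_factor=4):
    """Estimate the shift between two height-1 windows of equal width.

    The windows are concatenated and zero-padded to pad_factor * width
    samples; the peak is searched only between width/2 and 1.5 * width.
    A positive result means the pattern sits further right in `right`.
    """
    width = len(left)
    padded = list(left) + list(right) + [0.0] * ((pad_factor - 2) * width)
    cep = power_cepstrum(padded)
    lo, hi = width // 2, (3 * width) // 2
    peak = max(range(lo, hi + 1), key=lambda q: cep[q])
    return peak - width


# Two 16-sample windows holding one impulse each.
left = [0.0] * 16
left[8] = 1.0
right_plus = [0.0] * 16
right_plus[11] = 1.0   # same impulse, 3 samples to the right
right_minus = [0.0] * 16
right_minus[5] = 1.0   # same impulse, 3 samples to the left

print(estimate_disparity(left, right_plus))   # prints 3
print(estimate_disparity(left, right_minus))  # prints -3
```

Without padding (`pad_factor=2`, so 32 samples in total) the cepstrum is mirrored around the window width: `power_cepstrum(left + right_plus)` has equal values at positions 13 and 19, which is the ambiguity between disparity 3 and -3 described in Q20.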

Q21. When we implement the algorithm from the paper (Cepstral filtering) we get very poor results. This is because the resolution of the test images is very low relative to the window size.

If we do as they say in the paper, the resolution of the disparity image is unacceptably low.
When we made a slight adjustment and recalculated the window for each pixel (that is, we changed the offset from which the window started and re-interleaved the image accordingly) we got excellent results.

We wanted to ask if what we did is OK. (We will give a full explanation of the differences between what we did and what was suggested in the article in our report.)
A21. Whether windows overlap (and how much if they do) is indeed one of the parameters you can control in your experiments. Explain what values should be set in order to get better similarity to the human visual system and what values yield the best performance.

Q22. We have used the built-in Matlab function to perform the edge detection (using the Sobel method, which we learned in class). Is that OK, or do we have to implement it ourselves?

A22. For this part it is OK to use a ready made edge detection function.

Q23. Can we assume that the stereo input files are "gray scaled"?

A23. Yes.

Q24. How much time would you consider reasonable for the processing of one input (a stereo pair)?

A24. For a 500 by 500 pixels input, a few seconds should suffice. But certainly not more than a minute.