Frequently Asked Questions for Computational Neuroscience, Exercise 2
1. May I assume that at T(n) every object overlaps its center from T(n-1)?
A. No.
2. Should we implement things in the way the brain works?
A. Not exactly.
Please read the first section of "Project definition".
3. When emerging from the side, a rectangle might look like a triangle; when approaching from infinity, all types begin as a single pixel. Thus, we can identify the type only after we have dealt with the object for some time. Is this OK?
A. Yes. Just
make sure your output file follows the right format.
4. Is the project definition different for different-sized teams?
A. No.
5. May we assume that one of the eyes is locked on the same point during the whole session?
A. You may assume that both eyes are fixated. Remember that the disparity measure is relative (it depends on the location of the origin of axes in each image). We will say that the disparity of two pixels is 0 if they share the same X coordinate. If the center of an object is located at (x, y) in the left image and at (x + delta, y) in the right image, then we say the disparity is delta.
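To make this convention concrete, here is a minimal sketch (in Python rather than matlab, purely for illustration; the function name is our own):

```python
def disparity(left_center, right_center):
    """Disparity as defined above: the horizontal offset of an object's
    center between the left and right images (0 means the same x
    coordinate, i.e. an object at infinity)."""
    (xl, _), (xr, _) = left_center, right_center
    return xr - xl

# A center at (100, 50) on the left and (107, 50) on the right
# gives a disparity of 7.
```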
6. Are triangles equilateral? Isosceles? Can we assume anything about the angles?
A. You may
assume nothing about the triangles.
7. Can we assume that shape centers are never closer than 30 pixels (the maximum movement of a shape per frame)? If not, can you give us a hint on how to correlate a shape with its center?
A. No. But you may assume that the center of an object will be closer to its location in the previous frame than to any other object's.
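A minimal sketch of this matching rule (Python rather than matlab, for illustration only; the function name is our own, and the Euclidean distance follows Q14):

```python
import math

def match_to_previous(prev_centers, new_center):
    """Index of the previous-frame center closest to new_center.

    Per the answer above, an object's center in the current frame is
    assumed to lie closer to its own previous location than to any
    other object's, so nearest-neighbor matching suffices.
    Distance is sqrt(dx^2 + dy^2), as in Q14.
    """
    return min(range(len(prev_centers)),
               key=lambda i: math.dist(prev_centers[i], new_center))
```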
8. When objects enter the screen, we don't know their exact center; thus the disparity calculation may be incorrect (we calculate disparity based on the part of the shape we see). Is that OK?
A. As all shapes are flat, the disparity of the center equals the disparity of a corner. Thus you may calculate the disparity as soon as a shape (even partially) appears.
9. Must we use edge detectors? Can we use other operators instead (for example, a "cornerness measure" - how many of the R-radius neighbors of a pixel have the same color - or a "roundness measure" - perimeter^2 / area, which is lowest for circles)? It seems like the problem can be solved without using edge detectors at all...
A. Your system
should be a rough approximation of V1. The problem of detecting the objects
can be solved in other ways, but this is not your task here. You may use
other operators together with the edge detectors, as long as you justify
why they are a rough approximation of V1.
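As a side note on the "roundness measure" mentioned in the question: by the isoperimetric inequality, perimeter^2 / area is at least 4*pi (about 12.57), with equality exactly for a circle; less circular shapes score higher. A quick sanity check (Python, for illustration only):

```python
import math

def roundness(perimeter, area):
    """The roundness measure from the question: perimeter^2 / area."""
    return perimeter ** 2 / area

r = 1.0
circle = roundness(2 * math.pi * r, math.pi * r ** 2)   # = 4*pi, about 12.57
square = roundness(4.0, 1.0)                            # = 16 for a unit square
```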
10. Given the disparity of an object, how
do we calculate its depth?
A. You don't need to. For the z coordinate you should output the disparity value. We assume that an object at infinity has disparity 0, that is, it has the same x coordinates in the given left and right images.
11. Can we assume that an object approaching / receding from the 'eye' will create a difference in disparity in all cases?
A. Yes.
12. Is it required to implement the whole project using neural networks? In order to match objects in two images, we need to find the nearest object to the previous one. Can we use the mathematical function minimum, or do we need to use a neural network?
A. You don't have to implement the whole project using neural networks. Nevertheless, it should be a rough approximation of V1; therefore every operator you use should be implementable in neural networks and a plausible model for V1 (justify your solution).
13. Is it possible to build a different neural
layer for each object in the image, and then process each separately, or
do we need to process them all together?
A. You may choose to do so, as long as in the end you have a single whole system which performs the described task.
14. In the project definition and question #7 in the FAQ you used the word 'close'. How do you define the distance?
A. sqrt(dx^2 + dy^2)
15. Can a shape in one picture (left or right)
have no match in the other picture (right or left) in a certain frame?
A. This might happen only because of disparity - when the shape is near the border of the image.
16. Can a shape appear throughout all frames only partially (be cut at the side by the border), or is there always a frame in which all of the shape appears?
A. No. Any shape will always detach from all borders in at least one of the frames.
17. It is said that we can use the 'images'
toolbox. Are we allowed to use all functions in the 'images' toolbox,
especially bwlabel?
A. You are
allowed to use any function, as long as it is an approximation of what
the visual cortex does. Do justify your algorithm. See Q9.
18. Do we have to use neural networks themselves,
or do we have to use methods that can be applied to neural networks?
A. The latter,
but you have to give a good explanation of why it is "implementable" in
n.n. and why it is a plausible model of the visual cortex.
19. What is considered an "edge detector"? Can we use any function that returns points that lie on lines of a certain angle in the image? May we detect the edges as we wish?
A. See the answers to Q9 and Q17.
20. Can the two eyes see different objects?
A. Only at the boundaries (because of the disparity).
21. Can one object appear and another disappear in a single frame?
A. Yes.
22. Can more than one object disappear in a frame?
A. Yes.
23. Can there be a situation in which an object left the frame and a new object entered, so that the new object's center is within the 30-pixel radius of the previous object?
A. No.
23. What is meant by the edge detectors being of "various size"?
A. It means that they give the strongest response to edges of a specific length / width.
24. Does V1 accept input in log-polar coordinates?
A. Yes, but
you don't need to use this fact in your model.
25. We need to make an approximation of V1; the question is how rough this approximation should be. Can you give general guidance about which matlab functions are sufficiently "low level" to be used? For example, can we use the conv2 function?
A. You may use the conv2 function, but note Q9 and Q19.
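For intuition, here is what a conv2-style oriented edge detector does, sketched in plain Python rather than matlab (the kernel choice and function name are our own, for illustration only):

```python
def conv2_valid(image, kernel):
    """Minimal 2-D convolution ('valid' mode), a pure-Python stand-in
    for matlab's conv2; the kernel is flipped, as in true convolution."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    return [[sum(flipped[u][v] * image[i + u][j + v]
                 for u in range(kh) for v in range(kw))
             for j in range(iw - kw + 1)]
            for i in range(ih - kh + 1)]

# One oriented "edge detector" (a Sobel-like vertical-edge kernel);
# a bank of such kernels at several orientations roughly approximates
# the orientation selectivity of V1 simple cells (see Q42).
vertical = [[-1, 0, 1],
            [-2, 0, 2],
            [-1, 0, 1]]
img = [[0, 0, 0, 1, 1] for _ in range(5)]   # dark left half, bright right half
response = conv2_valid(img, vertical)       # strong response at the edge
```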
26. I have a "Miluim" which, according to your web page, entitles me to a late deadline. What should I do?
A. Just add
a photocopy of the "Ishur" when you submit the exercise (please note the
rules).
27. How accurate should the results be?
A. Small fluctuations of +-5 pixels are allowed. The output format should exactly match the given example.
28. How do you define the center of an object?
A. The center of an object is the average of all its pixels' coordinates.
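That definition is just the centroid of the object's pixels. A minimal sketch (Python for illustration; the function name is our own):

```python
def object_center(pixels):
    """Center of an object as defined above: the average of all its
    pixels' coordinates. `pixels` is a list of (x, y) tuples."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n,
            sum(y for _, y in pixels) / n)
```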
29. Where can I find additional info about the physiology and functionality of the visual cortex?
A. You may look at the chapter about vision in "Physiology of Behavior" by Carlson (it can be found in the social sciences library).
30. From the definition of the exercise I concluded that an object in the left eye can be +10 to -10 pixels away (on the X axis) from its twin object in the right eye. Can it be that another object will exist in this range? Can it be that when putting the right picture over its twin left picture, different objects will partly overlap? To sum it up, what are the criteria to determine which is the twin object of a left-eye image in the right-eye image, and vice versa?
A. The criteria are better described in the two papers suggested for your stereo algorithm. Note two important issues:
a. Your algorithm for calculating the depth map may work on the low-level input (that is - before you "know" what the object is).
b. Say a confusing situation exists (for example, the one you described above), in the sense that a person watching this scene might either not understand what s/he sees, or perceive it as two alternating scenarios; then we expect a good model of the visual system to have (similar) problems in "understanding" this input.
31. Can we assume that every object eventually
reaches a certain size (big enough to find its shape)?
A. Yes.
32. When we calculate an object's center, does the function that calculates it have to be correlated with a similar operator in the visual cortex?
A. Not really. The whole model should be, but the exact coordinates of each object are more for enabling a check of your results.
33. We understood that the 'left' pictures represent what the left eye sees and the 'right' pictures represent what the right eye sees. However, in all the 'left' pictures (left3.bmp for example) the objects appear closer to the left picture boundary than in the 'right' pictures (for example, right3.bmp), while in reality the right eye sees objects to the left of where the left eye sees them. So, shouldn't the objects be closer to the left picture boundary in the 'right' pictures?
A. Focus on the wall in front of you. Put a finger in front of your eyes, but keep looking at the wall. You will notice a contradiction to what you have just said. For more details, you may look at "Vision" by Marr, page 113.
34. In the project description you said that in every frame only one new object may appear, but in the 6th image two new objects appear. We relied on the assumption that only one new object appears each time in order to distinguish the new object from the others, and from there on to "follow" it. For example, how can we do the depth-from-stereo calculation when two new objects may appear at the same time?
A. This was indeed a mistake. Nevertheless, the only reason for allowing only one new object at a time was to have a unique order of the objects in the output file. Note that the left-right correlation for the disparity calculation need not be performed on the high-level "recognition", but at the lower level by one of the two algorithms given to you in class. Of course, in a case where two new objects appear at the same time, it doesn't matter which one appears first in the output file (all possibilities will be accepted).
35. Which center should we output in case of non-zero disparity (left/right/average)?
A. Left.
36. What disparity should be given when an object appears in one eye only (maybe the shortest distance from the boundaries)?
A. Either the shortest distance from the boundaries or zero.
37. Some of the functionality we are to implement is done in the brain in parts other than V1. What should we do?
A. Try to implement the things that are in V1 as described in the project definitions. The (more complex) features should be reasonable. Most important - explain what you did and why you did it this way.
38. Which center should we output in case of non-zero disparity when only the right object is seen?
A. Right.
39. What is the minimal size an object will reach at some point in the movie?
A. Any object will be at least 6 x 6 pixels in at least one frame (where 6 x 6 is the size of the minimal bounding rectangle).
40. Where should we store the results ?
A. './output.txt' (that is, at ~/cns00b/output.txt).
41. On which matlab version will the project be checked (on which server should we test it - nova, libra, plab-##, etc.)?
A. libra.
42. The project specifies that we should use something like receptors identifying different lines and angles and use them to identify objects. There is an enormous number of possible lines and angles; how are we supposed to handle so much data and deduce something from it (except by neural networks)? Can't we just identify the shapes by a module that checks their number of corners or something?
A. You don't have to use all possible orientations. You may choose your own parameter for "angular resolution". But this has to be the first step. From there on, counting the corners may be as plausible a solution as any other.
43. In what way can we use the algorithm we learned for identifying disparity by a peak in the frequency domain of the log of the pictures? Can't we first identify the objects and, after matching between the objects identified in both eyes, calculate the disparity?
A. You may use any algorithm studied in class for calculating the depth map / disparity map from raw data (NOT from already identified objects), as this better simulates what happens in V1. You may use Yeshurun and Schwartz's cepstrum, Marr and Poggio's neural network, or just correlation (calculate the correlation of every point and its neighborhood with the candidate corresponding points and neighborhoods in the other picture, and select the one with the highest correlation).
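The last option (plain window correlation) can be sketched as follows (Python rather than matlab, for illustration only; the window size, search range, and names are our own illustrative choices, and the sign convention for disparity follows Q5):

```python
def best_disparity(left, right, x, y, w=2, max_d=10):
    """For the pixel (x, y), correlate its (2w+1) x (2w+1) neighborhood
    in the left image with candidate neighborhoods on the same row of
    the right image, shifted by each disparity d, and return the d with
    the highest correlation. Images are lists of rows indexed [y][x]."""
    def patch(img, px):
        return [img[y + dy][px + dx]
                for dy in range(-w, w + 1) for dx in range(-w, w + 1)]
    ref = patch(left, x)
    width = len(right[0])
    # Only shifts whose window stays inside the right image.
    candidates = [d for d in range(-max_d, max_d + 1)
                  if 0 <= x + d - w and x + d + w < width]
    # Correlation here is a plain dot product; a normalized correlation
    # would be more robust, but this keeps the sketch minimal.
    return max(candidates,
               key=lambda d: sum(a * b
                                 for a, b in zip(ref, patch(right, x + d))))
```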
44. Can you please put the output file of
the input you've put in the project definition (we want to compare results)?
A. No. You are to print your results (this will be part of your grade).
45. Is there more input available to test
our program ?
A. Currently no. If there is, I'll publish it on this site.
46. From where should the job read the input? In the Project Definition you wrote '~cns00b/input/', but this directory doesn't exist. Did you mean ~/cns00b/input/? In that case, can I assume that I am running from ~/cns00b (so I can read from ./input)? If not, matlab doesn't recognize the ~ character as a home directory.
A. Your program (when copied to "/cns00b/" of some home dir) should read its input from the subdirectory "./input/". In this subdirectory there will be the image files "Left1.bmp", "Right1.bmp", "Left2.bmp", etc. There will also be a file named "info.txt" containing the number of pairs of images. Note that during the checking procedure, your entire "~/cns00b/" directory will be copied to the checker's account and run from there (using input from "./cns00b/input/").
47. We are currently debating an important theoretical issue regarding the order of processing in the model:
One possibility is to detect edges (of various sizes and lengths) over ALL the picture BEFORE processing. This option may resemble one of the known functions of V1 (and the entire visual system), which includes seeking parallel firing from ADJACENT neurons, and hence discovering lines, angles, and finally - OBJECTS. While this option may seem more logical, it is less efficient in the time needed and entails too much easily avoided complexity.
The other possibility is to take into consideration that there are locations in the image which contain clusters of neurons that fire simultaneously, since all of them are responding to (approximately) the same visual stimulus. Therefore, we can use these clusters to first determine that there IS an object there (in that general location). Only the SECOND step will be to identify the object's features - edges, angles, etc., i.e. use the function bwselect to select the objects, since the visual system knows there is a unique stimulus at that location.
To sum up, the first possibility entails using edge detection on all the picture from the very beginning, in order to determine where the objects are. The second option is dividing the field into distinct areas beforehand, and only then detecting edges, angles, etc.
A. Whatever you choose, please EXPLAIN why you chose what you chose, what the alternatives were, etc. Note that it is otherwise almost impossible to get this from your code alone.
Note that receptors on the retina fire in response to any light (not just defined objects), and some neurons in V1 may fire more in response to darkness.
If you make simplifying assumptions which are due to the limited power of the computer and are not part of your model (you would do it in a different way if you had a more powerful computer), do WRITE them DOWN and explain.
48. Can there be a situation in which there is the same number of objects in both eyes, but these are not the same objects (because one is just leaving and the other is just appearing)?
A. Yes.
49. Is it OK to assume that the coordinate system is such that the top-left pixel of the picture matrix has the coordinates x=1, y=1, the pixel to its right has the coordinates x=2, y=1, the pixel below it has the coordinates x=1, y=2, etc.?
A. Yes. These will be the coordinates used.
50. Can we assume that the image is always
256x256 pixels?
A. Yes.
51. About the documentation: you wrote that it should not exceed two 2-sided pages (four pages of print). How strict is it?
A. Any work which exceeds this limit by more than one page will lose points. This should be more than enough, as you are asked NOT TO PRINT THE MATLAB CODE.