2016 Moed A Q3(i)

+1 vote

In the answer to that question, when computing the cost of the last join, it's noted that only the selection result has to be sorted, so the cost of the last join is 2(B(R1)/V(R1,C)).

I understand that the other table in the join is already sorted by the E column, but it's also noted that the query is computed in a pipeline. That means that when computing the last join, both of the tables to be joined are already in memory, so as I see it the join shouldn't cost anything.

If the meaning was that the selection result is on disk but we don't have to count the cost of writing it, then I don't understand why the last pass over both tables (in the merge join algorithm) isn't counted in the cost of the last join as well.

Am I missing something? If not, which of the two options I presented is the correct one?


Aviv Tahasa
asked Feb 4, 2018 by atahasa (280 points)

1 Answer

+2 votes
Best answer

The relation is in memory, but it's not sorted. "In memory" doesn't mean the entire relation fits in memory; it can be large. It only means that you can start the next operation right after the previous one, without disk access (i.e. pipelining).

Since the relation is not sorted the way the join needs, you have to write it to disk and read it back in order to sort it, and only then perform the join (which is then free, as you said). That write plus read is where the factor of 2 in 2(B(R1)/V(R1,C)) comes from.
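
To make the arithmetic concrete, here is a rough sketch of that accounting. The function names and the sample numbers (B(R1) = 1000 blocks, V(R1,C) = 50) are made up for illustration and are not taken from the exam, and rounding the selection size up to whole blocks is my own assumption.

```python
import math


def selection_result_blocks(b_r1: int, v_r1_c: int) -> int:
    """Estimated size of the selection result in blocks: B(R1) / V(R1, C).

    Rounding up to a whole block is an assumption of this sketch.
    """
    return math.ceil(b_r1 / v_r1_c)


def last_join_cost(b_r1: int, v_r1_c: int) -> int:
    """I/O cost charged to the last join in this accounting.

    Only the selection result needs sorting: it is written to disk once
    and read back once. The other input is already sorted on E and arrives
    through the pipeline, so it adds nothing here.
    """
    blocks = selection_result_blocks(b_r1, v_r1_c)
    write_cost = blocks  # write the selection result out to disk
    read_cost = blocks   # read it back in sorted order
    merge_cost = 0       # the join itself is then free
    return write_cost + read_cost + merge_cost


# Hypothetical numbers: B(R1) = 1000 blocks, V(R1, C) = 50 distinct values.
print(last_join_cost(1000, 50))  # 40
```

With these made-up numbers the selection result is about 20 blocks, so the last join is charged 2 * 20 = 40 I/Os.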

answered Feb 4, 2018 by Assaf (31,090 points)
selected Feb 4, 2018 by atahasa