Right Deep, Left Deep and Bushy Joins

December 9th, 2011

At UKOUG someone asked me if DB Optimizer’s VST diagrams could deal with left deep verses right deep execution plans. What is right deep verses left deep? Good question. In join trees (not VST) the object on the left is acted upon first then the object on the right.  Below are left deep and right deep examples of the same query, showing

  • query text
  • join tree
  • join tree  modified to more clearly show actions
  • VST showing the same actions

All  of this boils down to the point that a right deep HJ can return rows earlier than a left deep HJ. A left deep HJ has to wait for each join to finished completely so that the result set can be hashed before  the next step can probe it. On the other hand, in a right deep HJ, it’s not the result sets that are being hashed, but a table at each level, thus each table can be hashed without waiting for intermediary results and once these hashes are complete  a probed row can flow all the way through the join tree,  from bottom to top, similar to how a nested loop can start giving results early.  The Left Deep HJs only have two open work areas at a time where as the Right Deep can have multiple work areas open.  One of the key thing to keep i n mind is how much data we have to hash. If the intermediate result sets are large (and/or growing each level) then that represents more work each step of the way.

Normally I don’t think in left deep verses right deep because it doesn’t change the the path through the join tree. On the other hand it does change whether we are hashing a result set or we are hashing a table. If hashing a table then the results can be returned, in theory, more quickly.

For NL joins there are only left deep join. The object on the left always probes the object on the right.  The object on the left is always accessed first ( there is no need to modify the object on the left  first and probe from the right with NL).

 

Besides left deep and right deep, there are  also bushy joins which Oracle doesn’t do unless forces to through sub-queries or views as in the following:

Finally,  the above VST diagrams were modified to be more easily compared to the join tree diagrams. Below are the VST diagrams as displayed by default in Embarcadero’s DB Optimizer.   DB Optimizer shows the tables all in a line because the tables are one to one relationships.  Keeping the VST diagram layout constant, it is easier to see the differences in execution paths:

Note on hinting and order of tables in explain plan:

For NESTED LOOPS and HASH JOINS the hint takes one argument, which is the “inner” table, ie the one that get’s probed into.

For NESTED LOOPS the argument is the table probed in to (ie the second in order of access)

For HASH JOINS the argument is the table that gets hashed and probed into (the first table in order of access)

The part I find confusing is that the inner table appears in a different order for NESTED LOOPS and HASH JOINS. For NESTED LOOPS the inner table appears second, which makes sense to me. Thus I’d expect the inner table to also be the second table in the execution plan for HASH JOINS but before we can probe into the inner table in a HASH JOIN we have to hash it first, which means the first thing we access is the the inner table , thus the inner table the first thin in the execution plan.

For NESTED LOOPS  if order is “LEADING(X,Y) then  nested loops hint can only be on Y, ie USE_NL(Y) because the inner table can  only be the second table in order of execution.

For HASH JOINS if the order is “LEADING(X,Y) then the hash join hint can only be on X, ie USE_HASH(X) because the inner table has to be accessed first to make a hash of it before we can probe the hash with the outer table.

A couple of good references on USE_NL and USE_HASH

 

For more information see:


Uncategorized

  1. Trackbacks

  2. No trackbacks yet.
  1. Comments

  2. December 12th, 2011 at 11:01 | #1

    Kyle,

    Unfortunately none of your graphics show up in my browser..perhaps you can tell me what I need to make them visible?

    Anyway, the point I was making at the conference was not related to the applicability of VSTs to right deep join trees but to the concept of minimising the “running row count” as an optimisation method. Forgive the text based clarification that follows.

    Imagine you have a large fact table F and three dimension tables D1,D2,D3. Let us further assume that we have star transformations disabled and there are no bitmap join indexes.

    To minimise the running row count you might come up with D1,D2,D3,F as the join order as the cartesion join of the dimension tables will still leave you with a relatively small running row count. The join order F,D1,D2,D3 seems unattractive as we are starting with our very large table. However, this will likely be the optimal join order.

    We will probably get three hash joins with all join inputs swapped (a right deep join tree). As a result the three dimension tables will be cached before the fact table is accessed. Each row in the fact table will then be matched against the dimension tables. If a match is found against all three dimensions then the query outputs a row. At no stage does the intermediate result of the join of F and D1 ever actually materialize so its cardinality is of little relevance.

  3. December 12th, 2011 at 17:56 | #2

    Hi Tony,
    Thanks for stopping buy. Yes you are right about the star query limitations.
    For star type queries, the visualization itself is powerful and most articles I see on star queries include similar visualizations. On the other hand the basic optimization approach laid out for VST would be different for star type queries. I haven’t yet sat down and looked into how to leverage to the diagrams highlight optimization paths in star queries yet. Definitely something on the to do list.
    Not much left of the above post without the graphics. I email you word version.


× 5 = forty five