Thursday, September 13, 2012

Parent-Child lookup

I'm writing this post in response to this question at StackOverflow (and the Pentaho forum post here), unfortunately as I am a new member at Stack Overflow they won't let me post images :-/

The question was:

I have a source table (from a MS DB) that contains a field "ParentID", which references the PK of "ID"
The target Postgres table is filled (new "ID" PK inserted from sequence) and the "ParentID" is set to null. The stream has a the new PK inserted within it.
So now at this stage i have a valid stream of
"ID","ParentID", "NewTargetID"
100   NULL     1
101   100      2
How do i do a "Stream Lookup" (or maybe something else) and lookup the values to create a new stream field to give me a new field of "NewParentID"
"ID","ParentID","NewTargetID","NewParentID"
100   NULL    1     NULL -< (How do i generate this ?)
101   100       2     1 -< (How do i generate this ?)
Thanks
David

My response is:
One way is to have two copies of the data, so you can use one as the lookup stream and one as the data stream:

 The "Lookup by parent" step looks like this:



I ran this with a Data Grid step containing the values from the OP, and got these results:

I'll try to get this example transformation up on Box or our Wiki.

1 comment:

  1. Oops, looks like I forgot that the lookup step will read all rows from the info step before doing the lookup processing. For data sources with more than 10000 rows this will cause the transformation to hang. Thanks to Jens for pointing this out.

    That's what I get for rushing to a solution, just goes to show you there's no substitute for good design :)

    ReplyDelete