I got some pretty good feedback about my previous UDJC (User Defined Java Class) step that creates a Data Grid model of a Kettle stream, but I noticed right away that the process for using it is a bit long. You have to paste in the step, wire it into the transformation, run the transformation, remove the steps before the UDJC, wire the Data Grid in place of the UDJC, delete the UDJC, and save the transformation under a different name.
The Kettle API provides all of the above operations, so I thought I should upgrade the step to create a new transformation with the Data Grid step as the source, connected to the rest of the transformation (after the UDJC step). Along the way I also found some issues with threading and metadata in the previous UDJC step, so I fixed them in the new one.
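To make the rewiring idea concrete, here is a rough sketch of the graph logic in plain Java. It is not the code from the step and it does not use the Kettle API: `Hop`, `downstreamOf`, and `reproductionHops` are hypothetical names I made up for illustration, and a transformation is modeled as nothing more than a list of named hops. The idea is to keep only the hops that lie after the capture step and let the Data Grid take the capture step's place.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for a transformation graph. These are NOT Kettle classes.
class ReproRewireSketch {

    // A hop is a directed connection between two named steps (hypothetical type).
    static final class Hop {
        final String from;
        final String to;

        Hop(String from, String to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return from + " -> " + to;
        }
    }

    // Collect every step reachable from 'start' by following hops forward.
    static Set<String> downstreamOf(String start, List<Hop> hops) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (Hop hop : hops) {
                if (hop.from.equals(current) && seen.add(hop.to)) {
                    queue.add(hop.to);
                }
            }
        }
        return seen;
    }

    // Build the hops of the reproduction: hops leaving the capture step now
    // leave the Data Grid, and hops between downstream steps are kept as-is.
    // Everything upstream of the capture step is dropped.
    static List<Hop> reproductionHops(String captureStep, String dataGrid, List<Hop> hops) {
        Set<String> kept = downstreamOf(captureStep, hops);
        List<Hop> result = new ArrayList<>();
        for (Hop hop : hops) {
            if (hop.from.equals(captureStep)) {
                result.add(new Hop(dataGrid, hop.to));
            } else if (kept.contains(hop.from) && kept.contains(hop.to)) {
                result.add(hop);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hop> original = List.of(
            new Hop("Table input", "Filter rows"),
            new Hop("Filter rows", "Create Repro"),
            new Hop("Create Repro", "Sort rows"),
            new Hop("Sort rows", "Text file output"));
        // prints [Data Grid -> Sort rows, Sort rows -> Text file output]
        System.out.println(reproductionHops("Create Repro", "Data Grid", original));
    }
}
```

Note that a hop coming into a downstream step from somewhere other than the capture step (a second input to a merge, say) is dropped in this model, since its source step is not part of the reproduction.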
The code for the new "Create Reproduction Transformation" UDJC step is located on Gist here. You "install" it the same way as the previous version, either by pasting the code into a new UDJC step, or by pasting it in from a transformation that already contains the filled-in UDJC step. Then you wire it up in the stream:
Then run the transformation. The transformation runs as if the step weren't there (it passes through all data), but a new transformation is created, containing the Data Grid step hooked up to the "rest of the transformation":
Looking at the Data Grid step, you can see the data from the stream:
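For anyone curious how a step can pass rows through untouched while still collecting them, here is a minimal model in plain Java. Again, this is only a sketch and not the actual UDJC code: `PassThroughCaptureSketch` and its methods are hypothetical, the downstream step is modeled as a simple `Consumer`, and I'm assuming the grid keeps each cell as text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of a pass-through step. This is not the Kettle step API.
class PassThroughCaptureSketch {

    // One inner list per captured row, with every cell stored as text.
    private final List<List<String>> capturedLines = new ArrayList<>();
    // Stand-in for "the rest of the transformation".
    private final Consumer<Object[]> downstream;

    PassThroughCaptureSketch(Consumer<Object[]> downstream) {
        this.downstream = downstream;
    }

    // Called once per incoming row: record a text copy, then forward the row unchanged.
    void processRow(Object[] row) {
        List<String> line = new ArrayList<>();
        for (Object value : row) {
            line.add(value == null ? null : String.valueOf(value));
        }
        capturedLines.add(line);
        downstream.accept(row);
    }

    List<List<String>> capturedLines() {
        return capturedLines;
    }

    public static void main(String[] args) {
        List<Object[]> received = new ArrayList<>();
        PassThroughCaptureSketch step = new PassThroughCaptureSketch(received::add);
        step.processRow(new Object[] {1L, "alice"});
        step.processRow(new Object[] {2L, null});
        System.out.println(received.size());      // prints 2
        System.out.println(step.capturedLines()); // prints [[1, alice], [2, null]]
    }
}
```

The downstream consumer gets the very same row objects it would have received without the capture step in the stream, which is why the original transformation behaves as if the step weren't there.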
One thing to remember is that the step the Data Grid is wired to might have referenced the previous step(s) by name. Because each step type keeps its settings in its own metadata, I couldn't interrogate the step in a generic way to see if it referenced previous steps, and thus I couldn't change the setting. This means that if the step after the Data Grid references previous steps (Merge Rows, for example), you will have to edit it to reference the Data Grid instead.
Anyway, I hope this helps! If you try it out, please let me know how/if it works, if it's useful, and if you'd like to see any other features. Cheers!