Sunday, January 13, 2013

New PDI/Kettle project structure

In case you haven't heard, the Kettle project in Subversion has been restructured to be cleaner and to use Apache Ivy for dependency management.  This has been a long time coming, and PDI/Kettle is now more consistent with other Pentaho projects.  The "cut-over" from the old project to the new is scheduled to occur on Monday, January 14, 2013.

If you currently have changes in a working copy of Kettle trunk, do not commit them directly into the new structure, because the file locations have changed.  For example, the source code for each Kettle module used to reside in a folder named src-<module_name>, such as src-core or src-db.  The modules have been reorganized so that you can check out and build individual modules if you choose.  Each module now has its own folder under the root, such as core/ and db/.  Inside each of these folders is a src/ folder, which contains the same files and package structure as the old Kettle project.  So the files that used to be in src-core/ are now in core/src/.

Other structural changes may impact your working copy as well. For example, because "ui" is a Kettle module, the "ui/" folder now holds the sources from "src-ui/" (under "ui/src/"), and the old "ui/" folder is now located at "assembly/package-res/".  The Ant build scripts have been updated to reflect this, and there is now a "create-dot-classpath" Ant target that will generate ".classpath" and "<project_name>.launch" files to get you up and going in Eclipse.

For more information, consult the README.txt file in the root folder, as well as the readme.txt file in the plugins/ folder.

I wrote a quick Groovy script that maps any changed files in your current working copy to their new locations, so you will know where to commit the changes. I tried to extend it to create diff files to be used as patches, but I could never get it to work very well, probably because I'm using Cygwin on Windows with its Subversion command-line client.  It shouldn't be too hard for Linux users familiar with Groovy to extend the following script to automate the patch creation process.

The script is very simple and looks like this:

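// Maps old top-level source folders to their new locations.
// Only the mappings mentioned in this post are listed; add others as needed.
def module_map = ['src-core':'core/src',
                  'src-db':'db/src',
                  'src-ui':'ui/src',
                  'src-plugins':'plugins']

'svn status'.execute().text.eachLine {line ->
    if (!line.trim()) return    // skip blank lines in the status output
    def svn_op = line.charAt(0)
    def old_fname = line.substring(1).trim()
    def path_segments = old_fname.split('/')
    def old_module = path_segments[0]
    def new_module = module_map.get(old_module)
    def file_mapping = new_module ? "$old_fname -> ${new_module}/${path_segments[1..-1].join('/')}" : old_fname
    println "${svn_op}\t${file_mapping}"
}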

I also put the script on Gist here.

The module map is the key to the location of the restructured files.  The script simply calls "svn status" and, for each changed file in your working copy, uses the module map to show the new location of the file.  Note that some files will not be mapped to a new location; this is either because the location hasn't changed, or because I forgot a mapping :-P

To use it, simply create a script called merge_helper.groovy (or whatever you want to call it) with the above contents and place it in your working copy of the old project structure.  Run the script with the command "groovy merge_helper.groovy" and it should show you output that will look something like this:

M       .classpath
M       src-plugins/market/src/org/pentaho/di/core/market/ -> plugins/market/src/org/pentaho/di/core/market/
M       src-plugins/market/src/org/pentaho/di/core/market/messages/ -> plugins/market/src/org/pentaho/di/core/market/messages/
M       src-plugins/market/src/org/pentaho/di/ui/spoon/dialog/ -> plugins/market/src/org/pentaho/di/ui/spoon/dialog/
M       src-ui/org/pentaho/di/ui/spoon/job/ -> ui/src/org/pentaho/di/ui/spoon/job/

If you have changes to ".classpath", be warned that there is no longer a version-controlled file called ".classpath".  If you have new JARs or source folders to commit, please consult the README files for guidance on how to update the appropriate files. You can also comment on this blog post with any questions about the migration process.
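For anyone who wants to take a stab at the patch creation I mentioned earlier, here is a rough sketch of one way it might work: take the text produced by "svn diff" and rewrite only the file header lines so they point at the new locations. The names remapPath and remapDiff are my own inventions for illustration, the module map contains only the mappings mentioned in this post, and I am assuming the usual "Index:", "---" and "+++" header layout that the Subversion command-line client prints. Treat it as a starting point rather than a tested migration tool.

```groovy
// Hypothetical helpers (not part of the original script): rewrite the file
// headers of `svn diff` output so they use the restructured paths.
// Assumption: the map below only lists the mappings mentioned in this post.
def moduleMap = ['src-core': 'core/src', 'src-db': 'db/src',
                 'src-ui': 'ui/src', 'src-plugins': 'plugins']

// Map one old path to its new location, or return it unchanged.
def remapPath = { String path ->
    def segments = path.split('/', 2)          // [old module, rest of path]
    def newModule = moduleMap[segments[0]]
    (newModule && segments.length == 2) ? "${newModule}/${segments[1]}".toString() : path
}

// Rewrite "Index:", "---" and "+++" header lines; hunk lines are left alone.
def remapDiff = { String diffText ->
    int headersLeft = 0    // "---"/"+++" header lines still expected after an Index line
    diffText.readLines().collect { String line ->
        if (line.startsWith('Index: ')) {
            headersLeft = 2
            return 'Index: ' + remapPath(line.substring(7))
        }
        if (headersLeft > 0 && (line.startsWith('--- ') || line.startsWith('+++ '))) {
            headersLeft--
            // svn separates the path from "(revision N)" with a tab
            def parts = line.substring(4).split('\t', 2)
            def suffix = parts.length == 2 ? '\t' + parts[1] : ''
            return line.substring(0, 4) + remapPath(parts[0]) + suffix
        }
        line
    }.join('\n')
}
```

To try it, you could pass in 'svn diff'.execute().text from the old working copy, write the result to a file, and apply that file from the root of a fresh checkout of the new structure with something like "patch -p0". Files whose module is not in the map keep their old paths, so check the result by hand before committing.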

We hope you will find the new project structure easier to use.  The use of Apache Ivy should also help us avoid many of the headaches that come with upgrading third-party JARs, especially when it comes to ensuring that Pentaho products (which depend on each other) are using the same (or compatible) versions of their dependencies.