- Apache Hadoop 0.20.x (hadoop-20)
- Cloudera CDH3u4 (cdh3u4)
- Cloudera CDH4 (cdh4)
- MapR (mapr)
The values in parentheses in the list above are the folder names under the Big Data plugin's "hadoop-configurations" folder. Each of these folders contains the JARs and other resources needed to run PDI against a particular distribution. To select a distribution for PDI to use, edit the plugin.properties file in the Big Data plugin's root folder and set the "active.hadoop.configuration" property to one of the folder names above. The default setting is for Apache Hadoop 0.20.x:
active.hadoop.configuration=hadoop-20
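If you'd rather script that edit than do it by hand, here is a minimal Python sketch. It assumes the layout described above (plugin.properties in the plugin's root folder, next to hadoop-configurations/). The function name and the plugin_dir parameter are my own inventions for illustration, not part of PDI.

```python
import re
from pathlib import Path


def set_active_hadoop_configuration(plugin_dir, config_name):
    """Set active.hadoop.configuration in plugin.properties (hypothetical helper)."""
    plugin_dir = Path(plugin_dir)

    # Refuse to point PDI at a configuration folder that does not exist.
    config_dir = plugin_dir / "hadoop-configurations" / config_name
    if not config_dir.is_dir():
        raise FileNotFoundError(f"no such configuration folder: {config_dir}")

    props = plugin_dir / "plugin.properties"
    lines = props.read_text().splitlines()
    pattern = re.compile(r"^\s*active\.hadoop\.configuration\s*=")

    # Rewrite the existing property line, leaving every other line untouched.
    replaced = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"active.hadoop.configuration={config_name}"
            replaced = True

    # If the property was missing altogether, append it.
    if not replaced:
        lines.append(f"active.hadoop.configuration={config_name}")

    props.write_text("\n".join(lines) + "\n")
```

Calling set_active_hadoop_configuration("pentaho-big-data-plugin", "cdh4") would switch PDI to the CDH4 configuration the next time it starts, assuming the plugin folder sits at that relative path.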
Apache Hadoop 1.0.3 is not compatible with the Apache Hadoop 0.20.x line, so PDI doesn't work with 1.0.3 out of the box. I set out to find a way to make that happen.
First, I simply copied the hadoop-20 folder to a "hadoop-103" folder in the same directory (pentaho-big-data-plugin/hadoop-configurations/). Then I replaced the following JARs in the client/ subfolder with the versions from the Apache Hadoop 1.0.3 distribution:
commons-codec-<version>.jar
hadoop-core-<version>.jar
and I added the following JAR from the Hadoop 1.0.3 distribution to the client/ subfolder as well:
commons-configuration-<version>.jar
Then I changed the property in plugin.properties to point to my new folder:
active.hadoop.configuration=hadoop-103
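The folder copy and JAR swap described above can also be scripted. What follows is a rough Python sketch of those steps, not an official tool. The function name and parameters are hypothetical, and because I haven't spelled out where each JAR sits inside the Hadoop distribution, the sketch simply searches the distribution folder recursively. Check what it copied before trusting it.

```python
import shutil
from pathlib import Path


def clone_hadoop_configuration(config_root, source, target, dist_dir,
                               replace=("commons-codec", "hadoop-core"),
                               add=("commons-configuration",)):
    """Copy a configuration folder and swap in JARs from a Hadoop distribution.

    Hypothetical helper: config_root is the hadoop-configurations folder,
    dist_dir is an unpacked Hadoop distribution.
    """
    config_root, dist_dir = Path(config_root), Path(dist_dir)
    target_dir = config_root / target

    # Copy e.g. hadoop-20 to hadoop-103; copytree fails if the target exists.
    shutil.copytree(config_root / source, target_dir)
    client = target_dir / "client"

    # Remove the old versions of the JARs that are being replaced.
    for name in replace:
        for old in client.glob(f"{name}-*.jar"):
            old.unlink()

    # Copy the replacement and additional JARs from the distribution.
    copied = []
    for name in (*replace, *add):
        matches = sorted(dist_dir.rglob(f"{name}-*.jar"))
        if not matches:
            raise FileNotFoundError(f"{name}-*.jar not found under {dist_dir}")
        for jar in matches:
            shutil.copy2(jar, client / jar.name)
            copied.append(jar.name)
    return copied
```

A call such as clone_hadoop_configuration("pentaho-big-data-plugin/hadoop-configurations", "hadoop-20", "hadoop-103", "/path/to/hadoop-1.0.3") mirrors what I did by hand; the distribution path is a placeholder. The original hadoop-20 folder is left untouched, so switching back is just a matter of changing the property again.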
Then I started PDI and was able to use steps like Hadoop Copy Files and Pentaho MapReduce (see the Wiki for How-Tos).
NOTE: I didn't try to get all functionality working or tested. Specifically, I didn't try anything related to Hive, HBase, Sqoop, or Oozie. For Hive, I'm hoping the PDI client will work against any Hive server running on an Apache Hadoop 0.20.x cluster, or any compatible configuration. If I test any of these Hadoop technologies, I will update this blog post.
If you try this procedure (for 1.0.3, 1.0.x, or any other Hadoop distribution), let me know if it works for you, especially if you had to do anything I haven't listed here :) Cheers!
Hi Matt, your configuration works also with 1.0.4.
Thanks,
Davide