Update code to support newer java versions #586

luisfponce · 2019-06-05T18:56:33Z

Compile HiBench using JDK 11 for Hadoop 3.2.0 and Spark 2.4.0

supporting the following benchmarks:

sparkbench
hadoopbench

Environment variables: JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk/
Compile command: mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming

Log:

[INFO] Reactor Summary:
[INFO] 
[INFO] hibench 7.1-SNAPSHOT ............................... SUCCESS [  0.188 s]
[INFO] hibench-common 7.1-SNAPSHOT ........................ SUCCESS [  5.859 s]
[INFO] HiBench data generation tools 7.1-SNAPSHOT ......... SUCCESS [ 11.859 s]
[INFO] sparkbench 7.1-SNAPSHOT ............................ SUCCESS [  0.014 s]
[INFO] sparkbench-common 7.1-SNAPSHOT ..................... SUCCESS [  7.313 s]
[INFO] sparkbench micro benchmark 7.1-SNAPSHOT ............ SUCCESS [  4.936 s]
[INFO] sparkbench machine learning benchmark 7.1-SNAPSHOT . SUCCESS [  8.397 s]
[INFO] sparkbench-websearch 7.1-SNAPSHOT .................. SUCCESS [  4.023 s]
[INFO] sparkbench-graph 7.1-SNAPSHOT ...................... SUCCESS [  6.131 s]
[INFO] sparkbench-sql 7.1-SNAPSHOT ........................ SUCCESS [  3.402 s]
[INFO] sparkbench project assembly 7.1-SNAPSHOT ........... SUCCESS [  9.242 s]
[INFO] hadoopbench 7.1-SNAPSHOT ........................... SUCCESS [  0.003 s]
[INFO] hadoopbench-sql 7.1-SNAPSHOT ....................... SUCCESS [  2.297 s]
[INFO] mahout 7.1-SNAPSHOT ................................ SUCCESS [  5.024 s]
[INFO] PEGASUS: A Peta-Scale Graph Mining System 2.0-SNAPSHOT SUCCESS [  0.942 s]
[INFO] nutchindexing 7.1-SNAPSHOT ......................... SUCCESS [  4.124 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

Compile HiBench using JDK 1.8 for hadoop 3.2.0 and spark 2.4.0

supporting the following benchmarks:

sparkbench
flinkbench
hadoopbench
stormbench
gearpumpbench

Environment variables: JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk/
Compile command: mvn clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11

Log:

[INFO] Reactor Summary:
[INFO] 
[INFO] hibench 7.1-SNAPSHOT ............................... SUCCESS [  0.149 s]
[INFO] hibench-common 7.1-SNAPSHOT ........................ SUCCESS [  7.683 s]
[INFO] HiBench data generation tools 7.1-SNAPSHOT ......... SUCCESS [ 11.872 s]
[INFO] sparkbench 7.1-SNAPSHOT ............................ SUCCESS [  0.013 s]
[INFO] sparkbench-common 7.1-SNAPSHOT ..................... SUCCESS [  7.477 s]
[INFO] sparkbench micro benchmark 7.1-SNAPSHOT ............ SUCCESS [  5.131 s]
[INFO] sparkbench machine learning benchmark 7.1-SNAPSHOT . SUCCESS [ 10.215 s]
[INFO] sparkbench-websearch 7.1-SNAPSHOT .................. SUCCESS [  3.450 s]
[INFO] sparkbench-graph 7.1-SNAPSHOT ...................... SUCCESS [  7.457 s]
[INFO] sparkbench-sql 7.1-SNAPSHOT ........................ SUCCESS [  3.747 s]
[INFO] sparkbench-streaming 7.1-SNAPSHOT .................. SUCCESS [  5.236 s]
[INFO] sparkbench project assembly 7.1-SNAPSHOT ........... SUCCESS [ 10.339 s]
[INFO] flinkbench 7.1-SNAPSHOT ............................ SUCCESS [  0.003 s]
[INFO] flinkbench-streaming 7.1-SNAPSHOT .................. SUCCESS [  7.554 s]
[INFO] gearpumpbench 7.1-SNAPSHOT ......................... SUCCESS [  0.003 s]
[INFO] gearpumpbench-streaming 7.1-SNAPSHOT ............... SUCCESS [  7.013 s]
[INFO] hadoopbench 7.1-SNAPSHOT ........................... SUCCESS [  0.002 s]
[INFO] hadoopbench-sql 7.1-SNAPSHOT ....................... SUCCESS [  1.663 s]
[INFO] mahout 7.1-SNAPSHOT ................................ SUCCESS [  4.487 s]
[INFO] PEGASUS: A Peta-Scale Graph Mining System 2.0-SNAPSHOT SUCCESS [  0.817 s]
[INFO] nutchindexing 7.1-SNAPSHOT ......................... SUCCESS [  3.344 s]
[INFO] stormbench 7.1-SNAPSHOT ............................ SUCCESS [  0.001 s]
[INFO] stormbench-streaming 7.1-SNAPSHOT .................. SUCCESS [  2.136 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
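The two build configurations above can be summarized in a small helper. This is only a sketch: the function names `java_major` and `hibench_mvn_args` are hypothetical and not part of HiBench, and the Maven arguments are copied from the two compile commands in this description. It prints the command instead of running it.

```shell
# Reduce a Java version string such as "1.8.0_212" or "11.0.3" to its major number.
# (Hypothetical helper, not part of HiBench.)
java_major() {
  case "$1" in
    1.*) v=${1#1.}; echo "${v%%.*}" ;;   # legacy "1.x" scheme: 1.8.0_212 -> 8
    *)   echo "${1%%.*}" ;;              # modern scheme: 11.0.3 -> 11
  esac
}

# Print the Maven arguments used in this PR description for a JDK major version.
hibench_mvn_args() {
  case "$1" in
    11) echo "clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming" ;;
    8)  echo "clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11" ;;
    *)  echo "unsupported JDK major version: $1" >&2; return 1 ;;
  esac
}

# Show the command that would be run for a JDK 11 version string.
echo "mvn $(hibench_mvn_args "$(java_major 11.0.3)")"
```

Only the two combinations shown in the logs above are covered; any other JDK version is reported as unsupported rather than guessed.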

luisfponce · 2019-06-06T14:31:37Z

Hi @carsonwang,

I work at Intel SSP on the Data Analytics Reference Stack.
I'm wondering whether it is possible to merge this, since we are looking for HiBench to be built with the latest Java version to test Spark and Hadoop.

Best regards,
Luis

carsonwang · 2019-06-17T09:02:23Z

@luisfponce , thank you for working on this. We are reviewing and validating this.

common/src/main/scala/com/intel/hibench/common/streaming/metrics/KafkaCollector.scala

common/pom.xml

.travis.yml

autogen/src/main/java/org/apache/hadoop/fs/dfsioe/TestDFSIO.java

autogen/src/main/java/org/apache/hadoop/fs/dfsioe/TestDFSIOEnh.java

.travis.yml

sparkbench/common/pom.xml

carsonwang · 2019-07-31T07:04:19Z

@gczsjdy , can you help take a look at the latest update?

gcz2022 · 2019-07-31T10:14:49Z

@carsonwang No problem.

.travis.yml

gcz2022 · 2019-08-01T02:53:47Z

README.md

-  - Hadoop: Apache Hadoop 2.x, CDH5, HDP
-  - Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x
+### Supported Hadoop/Spark releases: ###
+  - Hadoop: Apache Hadoop 2.x, 3.x, CDH5, HDP


Did you test Hadoop 3.0, 3.1, 3.3?
Otherwise 2.x, 3.2?

Why do you separate streaming/non-streaming frameworks? I don't see a very good reason.

Q: Did you test Hadoop 3.0, 3.1, 3.3?
A: No, you are right. It should say 2.x, 3.2 instead.

Q: Why do you separate streaming/non-streaming frameworks?
A: Because Scala < 2.12 does not compile on JDK 11, and Scala 2.12 requires bumping the org.apache.kafka package from 0.8.2.1 to at least 0.10.2.2, after which all the code related to Kafka and streaming testing must be ported.
That Kafka version (0.10.2.2) will require modifying the following classes:

KafkaCollector.scala

KafkaConsumer.scala

MetricsUtil.scala

So, bottom line, as mentioned in the previous comment to @carsonwang, the streaming/non-streaming frameworks were split to avoid breaking the streaming benchmarks on Scala 2.11 and 2.10.

Done: changed to 2.x, 3.2.

gcz2022 · 2019-08-01T03:14:55Z

docs/build-hibench.md

+    mvn clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
+
+Supported frameworks only: hadoopbench, sparkbench, (Not yet tested flinkbench, stormbench, gearpumpbench)
+Supported modules includes: micro, ml(machine learning), websearch and graph (not tested streaming and structuredStreaming) (Does not support sql)


I think all modules can be built under JDK8? We normally use 8 in our environment.

Spark 2.4 won't support the SQL benchmarks, since Hive is not used anymore.
I can be more specific about this and document the Spark versions for which the SQL benchmarks are not supported.

I'm not sure about the Not yet tested part, leaving it on master seems... @carsonwang

I got rid of it, to avoid causing noise in master.

gcz2022 · 2019-08-01T03:20:35Z

docs/build-hibench.md

+Supported modules includes: micro, ml(machine learning), websearch and graph (not tested streaming and structuredStreaming) (Does not support sql)
+
+### Build using JDK 1.11
+If you are interested in building using Java 11 indicate that streaming benchmarks won't be compiled also, specify scala, spark and hadoop version as below


Should we also specify:

Which scala version(besides 2.12) is compatible with JDK11?

Which Hadoop/Spark version is compatible with JDK11?
About the streaming benchmarks support, I think it's okay to lack some streaming(Flink, Gearpump, Spark Streaming, but not Structured Streaming) support on new versions, as long as we pointed it out clearly.

Q: Which Hadoop/Spark version is compatible with JDK11?
A: This is not my area, but the documentation could be more specific if required.
At least it can be written down that Scala 2.12 + JDK 11 + Spark 2.4 (compiled with Scala 2.12) works, excluding the streaming and SQL benchmarks.

Yes, please indicate that.

gcz2022 · 2019-08-01T03:26:12Z

docs/build-hibench.md

+    mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming
+
+Supported frameworks only: hadoopbench, sparkbench (Does not support flinkbench, stormbench, gearpumpbench)
+Supported modules includes: micro, ml(machine learning), websearch and graph (does not support streaming, structuredStreaming and sql)


What problem did the SQL module run into? It's an essential part of Spark, so leaving it out does not make much sense : )

Structured Streaming is a part of SQL, so making SQL work can also benefit SS

Q: What problem did SQL module meet?
A: In newer versions of Spark, HiveContext is deprecated. I can point out in the documentation that SQLBench won't work if -Dspark=2.4 or a later version is required.
(Again, an update/port of the ScalaSparkSQLBench.scala code is necessary here.)

HiveContext is deprecated

In Spark 2, HiveContext is deprecated. Replace all usage with an instantiation of the singleton SparkSession:

    val spark: SparkSession = SparkSession.builder
      .config(conf)
      .enableHiveSupport()
      .getOrCreate()

Most functionality of HiveContext is now available directly on the SparkSession instance. Note that, if you need them, SparkContext and SQLContext are now properties of SparkSession:

    val sc = spark.sparkContext
    val sqlContext = spark.sqlContext

Info Source

Thanks I got it.
I think supporting Spark 2.4 without SQL module is quite weird. I can think of 2 ways:

Drop Spark 1.6 support and modify the ScalaSparkSQLBench.scala to use SparkSession, which is introduced in Spark 2.0

Create another separate ScalaSparkSQLBench, deciding which class to use by Spark version

I like the first one better, newer HiBench version should drop some old codebase. cc @carsonwang

gcz2022

Thanks @luisfponce , I left some comments.

gcz2022 · 2019-08-06T05:26:44Z

.travis.yml

+        export HDFS_DATANODE_USER=root
+        export HDFS_SECONDARYNAMENODE_USER=root
+        export YARN_RESOURCEMANAGER_USER=root
+        export YARN_NODEMANAGER_USER=root


Does it have to be root? What if we don't set these environment variables?

Q: Does it have to be root?
A: Not really, it depends on the user.

Q: What if we don't set these environment variables?
A: If those variables are not set, just starting the Hadoop 3.2 services gives:

start-dfs.sh:

Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [ubuntu-hib]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

start-yarn.sh

Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.

However, I moved:

HDFS_NAMENODE_USER=$USER
HDFS_DATANODE_USER=$USER
HDFS_SECONDARYNAMENODE_USER=$USER
YARN_RESOURCEMANAGER_USER=$USER
YARN_NODEMANAGER_USER=$USER

to hadoop-env.sh, and now it is user agnostic, and travis.yml looks cleaner.
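A minimal sketch of what such a user-agnostic fragment could look like, assuming it is sourced from hadoop-env.sh. The variable names are the ones listed above; the `run_user` helper variable and the fallback to `id -un` are my own additions for illustration and are not taken from the patch.

```shell
# Use $USER when it is set; otherwise ask the system for the current user name.
run_user="${USER:-$(id -un)}"

# Export every service-user variable that the Hadoop 3.x start scripts check for.
for var in HDFS_NAMENODE_USER HDFS_DATANODE_USER HDFS_SECONDARYNAMENODE_USER \
           YARN_RESOURCEMANAGER_USER YARN_NODEMANAGER_USER; do
  export "$var=$run_user"
done

echo "Hadoop services will run as: $HDFS_NAMENODE_USER"
```

Because the value is resolved when the file is sourced, the same file works whether the services are started as root in CI or as an ordinary user elsewhere.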

gcz2022 · 2019-08-06T05:30:33Z

.travis.yml

+        export HADOOP_HDFS_HOME=$HADOOP_HOME
+        export YARN_HOME=$HADOOP_HOME
+        export HADOOP_INSTALL=$HADOOP_HOME
+        export SPARK_DIST_CLASSPATH=$(/opt/$HADOOP_BINARIES_FOLDER/bin/hadoop classpath)


I think it's better to remove unnecessary envs(Line 46-54), I suppose even if they are not set, Spark/Hadoop will probe the right HOME, and that's verified in the original travis(for Spark 1.6, though).

Correct, I will get rid of them.

gcz2022 · 2019-08-06T05:32:00Z

.travis.yml

+
+    sudo -E ./travis/configssh.sh
+    sudo -E ./travis/restart_hadoop_spark.sh
+    sudo -E ./bin/run_all.sh


:nit new line
And other files. : )

gcz2022 · 2019-08-06T05:32:28Z

.travis.yml

-  - cp ./travis/spark.conf ./conf/
-  - /opt/hadoop-2.6.5/bin/yarn node -list 2
-  - sudo -E ./bin/run_all.sh
+  - |


:nit remove this

I used this pipe (YAML literal style) because, from my perspective, it looks cleaner when putting scripts in YAML files and avoids writing \ at the end of every line.

Otherwise it would look like this:

script:
  - if [[ "$java_ver" == 11 ]]; then \
      mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming \
    elif [[ "$java_ver" == 8 ]]; then \
      mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11 \
    elif [[ "$java_ver" == 7 ]]; then \
      mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11 \
      mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11 \
      mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10 \
    else \
      exit 1 \
    fi
  - sudo -E ./travis/configssh.sh
  - sudo -E ./travis/restart_hadoop_spark.sh
  - sudo -E ./bin/run_all.sh

instead of how it currently is:

script:
  - |
    if [[ "$java_ver" == 11 ]]; then
        mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
    elif [[ "$java_ver" == 8 ]]; then
        mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
    elif [[ "$java_ver" == 7 ]]; then
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
    else
        exit 1
    fi
    sudo -E ./travis/configssh.sh
    sudo -E ./travis/restart_hadoop_spark.sh
    sudo -E ./bin/run_all.sh

Up to you. For me both ways work (and Mr. YAML lint indicates that both are valid too).

Thanks, but it seems that

if [[ "$java_ver" == 11 ]]; then
    mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
elif [[ "$java_ver" == 8 ]]; then
    mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
elif [[ "$java_ver" == 7 ]]; then
    mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
    mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
    mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
else
    exit 1
fi
sudo -E ./travis/configssh.sh
sudo -E ./travis/restart_hadoop_spark.sh
sudo -E ./bin/run_all.sh

without any pipes is also valid. Will the \ns be automatically escaped in Travis?
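On the question of dropping the pipe: as far as I understand YAML, a plain multi-line scalar (no `|`) has its line breaks folded into spaces, so the shell would receive one long line. The sketch below does not involve Travis or a YAML parser; it only imitates that folding with `tr`, under the assumption that the folded string is handed to the shell unchanged. The `echo` commands stand in for the real `mvn` invocations.

```shell
v=11

# The script as the shell sees it with a literal block: newlines preserved.
literal='if [ "$v" = 11 ]; then
  echo jdk11-build
else
  echo default-build
fi'

# Imitate folding of a plain scalar: every newline becomes a space.
folded=$(printf '%s' "$literal" | tr '\n' ' ')

# Evaluate each variant in a subshell so that a syntax error cannot abort this script.
literal_out=$( (eval "$literal") 2>/dev/null ) && literal_ok=yes || literal_ok=no
folded_out=$( (eval "$folded") 2>/dev/null ) && folded_ok=yes || folded_ok=no

echo "literal block parses: $literal_ok (output: $literal_out)"
echo "folded scalar parses: $folded_ok"
```

In the folded variant, `else` and `fi` end up as arguments to the preceding command, so the `if` is never closed. That suggests the pipe, or explicit `;` separators, is needed for a multi-line `if` like the one in this script.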

gcz2022 · 2019-08-06T05:38:33Z

travis/config_hadoop_spark.sh

+    cp ./travis/artifacts/hadoop32/mapred-site.xml $HADOOP_CONF_DIR
+    cp ./travis/artifacts/hadoop32/yarn-site.xml $HADOOP_CONF_DIR
+    sed -i "s|<maven.compiler.source>1.6</maven.compiler.source>|<maven.compiler.source>1.8</maven.compiler.source>|g" pom.xml
+    sed -i "s|<maven.compiler.target>1.6</maven.compiler.target>|<maven.compiler.target>1.8</maven.compiler.target>|g" pom.xml


Why not 1.11?

My bad, I will change it to Java 11 + maven-compiler-plugin version 3.8
Source: Choose Java Version
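As a sketch of how the substitution could be parameterized instead of hard-coding 1.8: the function name `set_compiler_level` is hypothetical, and it filters text from stdin rather than editing pom.xml in place, so it can be tried safely. Using `11` as the value for both properties is an assumption based on the Java 11 + maven-compiler-plugin 3.8 combination mentioned above.

```shell
# set_compiler_level LEVEL
# Read POM text on stdin and print it with maven.compiler.source and
# maven.compiler.target set to LEVEL, whatever their previous values were.
set_compiler_level() {
  level=$1
  sed -E "s#(<maven\.compiler\.(source|target)>)[^<]*(</maven\.compiler\.(source|target)>)#\1${level}\3#g"
}

# Example on a two-line POM fragment.
printf '%s\n' \
  '<maven.compiler.source>1.6</maven.compiler.source>' \
  '<maven.compiler.target>1.6</maven.compiler.target>' \
  | set_compiler_level 11
```

To apply it to a real file, the output would be redirected to a temporary file and moved over pom.xml, which avoids relying on the non-portable `sed -i`.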

.travis.yml

gcz2022 · 2019-08-06T06:32:15Z

docs/build-hibench.md

+    mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming
+
+Supported frameworks only: hadoopbench, sparkbench (Does not support flinkbench, stormbench, gearpumpbench)
+Supported modules includes: micro, ml(machine learning), websearch and graph (does not support streaming, structuredStreaming and sql)


Thanks I got it.
I think supporting Spark 2.4 without SQL module is quite weird. I can think of 2 ways:

Drop Spark 1.6 support and modify the ScalaSparkSQLBench.scala to use SparkSession, which is introduced in Spark 2.0

Create another separate ScalaSparkSQLBench, deciding which class to use by Spark version

I like the first one better, newer HiBench version should drop some old codebase. cc @carsonwang

gcz2022 · 2019-08-06T06:46:16Z

docs/build-hibench.md

+### Build using JDK 1.8
+If you are interested in building using Java 11 specify scala, spark and hadoop version as below
+
+    mvn clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11


This might misleadingly suggest that Java 8 can only be used with the specified Scala/Hadoop/Spark versions. I think we can drop this section and only leave the JDK 11 section.

No problem.

gcz2022 · 2019-08-06T06:47:08Z

docs/build-hibench.md

+    mvn clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
+
+Supported frameworks only: hadoopbench, sparkbench, (Not yet tested flinkbench, stormbench, gearpumpbench)
+Supported modules includes: micro, ml(machine learning), websearch and graph (not tested streaming and structuredStreaming) (Does not support sql)


I'm not sure about the Not yet tested part, leaving it on master seems... @carsonwang

gcz2022 · 2019-08-06T06:48:07Z

docs/build-hibench.md

+Supported modules includes: micro, ml(machine learning), websearch and graph (not tested streaming and structuredStreaming) (Does not support sql)
+
+### Build using JDK 1.11
+If you are interested in building using Java 11 indicate that streaming benchmarks won't be compiled also, specify scala, spark and hadoop version as below


Yes, please indicate that.

* sparkbench/assembly/pom.xml:
  * Changed property name activation on `allModules` profile.
  * Added new profile that excludes `sparkbench-streaming` artifact.
* sparkbench/pom.xml:
  * Changed property name activation on `allModules` profile.
  * Added new profile that excludes `streaming` module.
  * Added profile spark2.4, since spark-core_2.12 is only available for Spark 2.4.0 and later.
  * Added profile scala 2.12. Scala < 2.12 does not compile on JDK 11.
  * Added profile hadoop3.2 to propagate this variable to all Spark benchmarks.
* sparkbench/streaming/pom.xml:
  * Added profile spark2.4 on sparkbench-streaming POM with spark-streaming-kafka-0-8_2.11 version 2.4.0.

Signed-off-by: Luis Ponce <luis.f.ponce.navarro@linux.intel.com>

luisfponce · 2019-08-26T17:58:21Z

Hi @carsonwang, @gczsjdy

Important questions here:

According to the official Hadoop Java Versions page, Apache Hadoop 3.x supports only Java 8, and Java 11 support is a work in progress.

So, if HiBench is built using source and target JDK11 and then is run in Travis CI, then we get:

Exception in thread "main" java.lang.UnsupportedClassVersionError: 
HiBench/DataGen has been compiled by a more recent version of the Java Runtime (class file version 55.0), 
this version of the Java Runtime only recognizes class file versions up to 52.0
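For readers unfamiliar with the numbers in that message: for modern releases, the class file major version is the Java release number plus 44, so 55.0 corresponds to Java 11 and 52.0 to Java 8. A tiny sketch of that arithmetic (the function name is my own):

```shell
# Convert a class file major version to the Java release that produces it.
# Holds for the releases discussed here: 52 -> Java 8, 55 -> Java 11.
class_major_to_java() {
  echo $(( $1 - 44 ))
}

echo "class file 55.0 -> Java $(class_major_to_java 55)"
echo "class file 52.0 -> Java $(class_major_to_java 52)"
```

In other words, the error says that classes compiled for Java 11 are being loaded by a Java 8 runtime.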

Nevertheless, Clear Linux (and possibly other clients that have patched Hadoop too) has compiled Hadoop 3.2 and Spark 2.4 with Java 11 patches, as the Data Analytics Reference Stack documentation describes.

Is there a way to compile HiBench using Java11 but skip the testing part?

We'd like to get HiBench JDK 11 support from upstream, which is the reason for this PR.
By the way, the following benchmarks pass on both the Spark and Hadoop frameworks built with Java 11, on the Clearlinux DARS MKL image, which I personally tested:

micro.sort
ml.bayes
ml.pca
ml.gbt
ml.rf
ml.svd
ml.lda
ml.svm
websearch.nutchindexing
graph.nweight

The HiBench log is available at the following link:
https://gist.github.com/luisfponce/4c25c353c0e13e34556d356970766ae5

* Moved the mapred-site and yarn-site XML files to a new folder that contains the artifacts for either Hadoop 2.6 or 3.2; they will be picked up depending on the testing needs in travis.yml.
* Moved the spark-env file to a new folder that contains the artifacts for either Spark 1.6 or 2.4; they will be picked up depending on the testing needs in travis.yml.
* Created a hadoop-env.sh file for Hadoop 3.2 to store the environment variables required to start the HDFS and YARN services.
* Removed hardcoded values from hadoop.conf and spark.conf; these will be filled in depending on the testing needs.
* Added an `install_hadoop_spark` script that will download the Hadoop and Spark binaries depending on the testing needs.
* Added a `config_hadoop_spark` script that will set up Hadoop, Spark and HiBench depending on the testing needs.
* Added a `jdk_ver` script to pick up the Java version currently installed for Travis CI.
* Modified the `restart_hadoop_spark` script to be agnostic to the binaries required for testing.
* travis/config_hadoop_spark.sh:
  * For Java 8 and 11, skip the `sql` test since Hive is no longer used to perform queries. Newer Spark versions perform queries using `SparkSession`; `import org.apache.spark.sql` is no longer used.
* .travis.yml:
  * Added `dist: trusty` to keep using this distro; Travis picks up xenial if it is not specified. Any newer Ubuntu version in Travis won't support OpenJDK 7.
  * Refactored the CI flow to download, set up, run and test Hadoop and Spark depending on the required JDK version (7, 8 or 11).
  * HiBench will be configured depending on the required JDK version (7, 8 or 11).
  * HiBench will be built depending on the required JDK version (7, 8 or 11).
  * Benchmarks will be run for all JDK versions set.

Signed-off-by: Luis Ponce <luis.f.ponce.navarro@linux.intel.com>

* autogen/pom.xml
  * Add hadoop mr2 profile to be used for hadoop hdfs and client.

Signed-off-by: Luis Ponce <luis.f.ponce.navarro@linux.intel.com>

* docs/build-hibench.md:
  * Update 2.4 version to specify Spark Version.
  * Add Specify Hadoop version documentation.
  * Add Build using JDK 11 documentation.
* README.md:
  * Update Supported Hadoop/Spark releases to hadoop 3.2 and spark 2.4

Signed-off-by: Luis Ponce <luis.f.ponce.navarro@linux.intel.com>

carsonwang

@luisfponce, I noticed one issue in the pom. Others look good to me.

carsonwang · 2019-09-06T06:16:58Z

docs/build-hibench.md

@@ -37,6 +37,11 @@ default . For example , if we want use spark2.0 and scala2.11 to build hibench.
 package` , but for spark2.0 and scala2.10 , we need use the command `mvn -Dspark=2.0 -Dscala=2.10 clean package` .
 Similarly , the spark1.6 is associated with the scala2.10 by default.

+### Specify Hadoop Version ###
+To specify the spark version, use -Dhadoop=xxx(3.2). By default, it builds for hadoop 2.4


nit: spark version -> hadoop version

carsonwang · 2019-09-06T06:30:55Z

sparkbench/assembly/pom.xml

@@ -159,7 +159,43 @@
      </dependencies>
      <activation>
        <property>
-          <name>!modules</name>
+          <name>!exclude-streaming</name>


If a user specifies modules=xxx and doesn't specify exclude-streaming, this allModules will be activated, which is not expected.
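The concern can be illustrated with a small model of Maven's `!property` activation, which activates a profile when the property is not defined. This is only a simulation in shell, not Maven itself, and the function names are hypothetical. It compares the rule in the patch with the behaviour that seems to be expected here, namely that `allModules` applies only when neither `modules` nor `exclude-streaming` is given.

```shell
# has_prop NAME ARGS...: succeed if -DNAME or -DNAME=value appears among ARGS.
has_prop() {
  name=$1; shift
  for arg in "$@"; do
    case $arg in
      "-D$name"|"-D$name="*) return 0 ;;
    esac
  done
  return 1
}

# Rule in the patch (<name>!exclude-streaming</name>):
# active whenever exclude-streaming is absent.
active_as_patched() { ! has_prop exclude-streaming "$@"; }

# Behaviour assumed to be expected: active only when neither property is set.
active_as_expected() { ! has_prop modules "$@" && ! has_prop exclude-streaming "$@"; }

# The problematic case: modules is given, exclude-streaming is not.
if active_as_patched -Dmodules=ml; then echo "patched rule: allModules active"; fi
if ! active_as_expected -Dmodules=ml; then echo "expected: allModules inactive"; fi
```

The two rules agree for every other combination of the two properties; they differ only in the case shown.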

gcz2022 · 2019-09-09T10:31:06Z

travis/artifacts/hadoop32/mapred-site.xml

+      <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
+    </property>
+
+</configuration>


:nit empty line

gcz2022 · 2019-09-09T10:32:45Z

travis/artifacts/hadoop32/hdfs-site.xml

+  <property>
+    <name>dfs.client.use.datanode.hostname</name>
+    <value>true</value>
+  </property>


gcz2022 · 2019-09-09T11:02:51Z

docs/build-hibench.md

@@ -28,7 +28,7 @@ Because some Maven plugins cannot support Scala version perfectly, there are som


 ### Specify Spark Version ###
-To specify the spark version, use -Dspark=xxx(1.6, 2.0, 2.1 or 2.2). By default, it builds for spark 2.0
+To specify the spark version, use -Dspark=xxx(1.6, 2.0, 2.1, 2.2 or 2.4). By default, it builds for spark 2.0


Actually this Spark 2.4 support doesn't include SQL module?
This is the only main remaining concern for this patch, see
#586 (comment)
I think we can drop Spark 1.6 support and modify the SQL module code to support 2.4 in HiBench 8.0, whoever needs 1.6 can go to HiBench 7.0. @carsonwang

gcz2022 · 2019-09-09T11:05:51Z

.travis.yml

+    if [[ "$java_ver" == 11 ]]; then
+        mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
+    elif [[ "$java_ver" == 8 ]]; then
+        mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11


Curious about this, even if we don't run SQL module tests for Spark2.4, how did the compiling work...

william-wang · 2020-02-17T03:35:09Z

Is there any progress on this ticket? When will it be available?

luisfponce · 2020-02-19T16:42:10Z

I will pick this back up, resolve the conflicts and get back to you @william-wang @gczsjdy

sajanraj · 2020-02-27T08:45:59Z

[ERROR] [ERROR] Some problems were encountered while processing the POMs:
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 22, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 27, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 26, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-mllib_2.10:jar must be a valid version but is '${spark.version}'. @ line 32, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 27, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 27, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-mllib_2.10:jar must be a valid version but is '${spark.version}'. @ line 33, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-graphx_2.10:jar must be a valid version but is '${spark.version}'. @ line 39, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 28, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-hive_2.10:jar must be a valid version but is '${spark.version}'. @ line 34, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 26, column 16
[ERROR] 'dependencies.dependency.version' for org.apache.spark:spark-streaming_2.10:jar must be a valid version but is '${spark.version}'. @ line 32, column 16
 @
[ERROR] The build could not read 7 projects -> [Help 1]
[ERROR]
[ERROR]   The project com.intel.hibench.sparkbench:sparkbench-common:7.1-SNAPSHOT (/home/sajanraj_t_d/metro/HiBench/sparkbench/common/pom.xml) has 1 error
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 22, column 16
[ERROR]
[ERROR]   The project com.intel.hibench.sparkbench:sparkbench-micro:7.1-SNAPSHOT (/home/sajanraj_t_d/metro/HiBench/sparkbench/micro/pom.xml) has 1 error
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 27, column 16
[ERROR]
[ERROR]   The project com.intel.hibench.sparkbench:sparkbench-ml:7.1-SNAPSHOT (/home/sajanraj_t_d/metro/HiBench/sparkbench/ml/pom.xml) has 2 errors
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 26, column 16
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-mllib_2.10:jar must be a valid version but is '${spark.version}'. @ line 32, column 16
[ERROR]
[ERROR]   The project com.intel.hibench.sparkbench:sparkbench-websearch:7.1-SNAPSHOT (/home/sajanraj_t_d/metro/HiBench/sparkbench/websearch/pom.xml) has 1 error
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 27, column 16
[ERROR]
[ERROR]   The project com.intel.hibench.sparkbench:sparkbench-graph:7.1-SNAPSHOT (/home/sajanraj_t_d/metro/HiBench/sparkbench/graph/pom.xml) has 3 errors
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 27, column 16
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-mllib_2.10:jar must be a valid version but is '${spark.version}'. @ line 33, column 16
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-graphx_2.10:jar must be a valid version but is '${spark.version}'. @ line 39, column 16
[ERROR]
[ERROR]   The project com.intel.hibench.sparkbench:sparkbench-sql:7.1-SNAPSHOT (/home/sajanraj_t_d/metro/HiBench/sparkbench/sql/pom.xml) has 2 errors
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-core_2.10:jar must be a valid version but is '${spark.version}'. @ line 28, column 16
[ERROR]     'dependencies.dependency.version' for org.apache.spark:spark-hive_2.10:jar must be a valid version but is '${spark.version}'. @ line 34, column 16

Running `mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming` gives the above error. Is there any fix for this?

luisfponce force-pushed the update_code_to_support_newer_java_versions branch 9 times, most recently from b20b1f1 to cd25224 Compare June 6, 2019 00:29

carsonwang reviewed Jun 18, 2019

gcz2022 reviewed Jun 19, 2019

luisfponce force-pushed the update_code_to_support_newer_java_versions branch 15 times, most recently from 731dcac to 6455c21 Compare June 24, 2019 21:30

carsonwang mentioned this pull request Jul 2, 2019

Spark 2.4 Support #583

Open

gcz2022 reviewed Jul 3, 2019

luisfponce force-pushed the update_code_to_support_newer_java_versions branch 3 times, most recently from 90502d6 to 51e5c71 Compare July 24, 2019 22:55

gcz2022 reviewed Aug 1, 2019

gcz2022 reviewed Aug 6, 2019

luisfponce force-pushed the update_code_to_support_newer_java_versions branch from 51e5c71 to bc996f9 Compare August 22, 2019 18:10

luisfponce force-pushed the update_code_to_support_newer_java_versions branch from bc996f9 to a300f3b Compare August 22, 2019 19:01

luisfponce force-pushed the update_code_to_support_newer_java_versions branch 6 times, most recently from a97527b to e631f22 Compare August 27, 2019 20:35

luisfponce added 3 commits August 27, 2019 16:06

Update autogen

27085de

luisfponce force-pushed the update_code_to_support_newer_java_versions branch from e631f22 to 0e48596 Compare August 27, 2019 21:07

carsonwang reviewed Sep 6, 2019

gcz2022 reviewed Sep 9, 2019

gcz2022 mentioned this pull request Apr 8, 2020

HiBench compatibility issues with Hadoop 3.x in micro.dfsioe prepare part... #546

Open

gcz2022 mentioned this pull request Apr 17, 2020

Many workloads do not work in CDH6.0.0( Hadoop3 ) #614

Open
