Update code to support newer java versions #586

Open

wants to merge 6 commits into master
70 changes: 47 additions & 23 deletions .travis.yml
@@ -1,6 +1,9 @@
dist: trusty
sudo: required
language: java
jdk:
- openjdk11
- openjdk8
- openjdk7
before_install:
- cat /etc/hosts # optionally check the content *before*
@@ -10,32 +13,53 @@ before_install:
- cat /proc/cpuinfo | grep cores | wc -l
- free -h
install:
- hibench=$(pwd)
- cd /opt/
- wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.6.tgz
- tar -xzf spark-1.6.0-bin-hadoop2.6.tgz
- wget https://archive.apache.org/dist/hadoop/core/hadoop-2.6.5/hadoop-2.6.5.tar.gz
- tar -xzf hadoop-2.6.5.tar.gz
- cd ${hibench}
- cp ./travis/spark-env.sh /opt/spark-1.6.0-bin-hadoop2.6/conf/
- cp ./travis/core-site.xml /opt/hadoop-2.6.5/etc/hadoop/
- cp ./travis/hdfs-site.xml /opt/hadoop-2.6.5/etc/hadoop/
- cp ./travis/mapred-site.xml /opt/hadoop-2.6.5/etc/hadoop/
- cp ./travis/yarn-site.xml /opt/hadoop-2.6.5/etc/hadoop/
- cp ./travis/hibench.conf ./conf/
- cp ./travis/benchmarks.lst ./conf/
- |
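# jdk_ver.sh (in the repo's travis/ folder) prints the major version of the active JDK (7, 8 or 11)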
export java_ver=$(./travis/jdk_ver.sh)
if [[ "$java_ver" == 11 ]]; then
export HADOOP_VER=3.2.0
export SPARK_VER=2.4.3
export SPARK_PACKAGE_TYPE=without-hadoop-scala-2.12
elif [[ "$java_ver" == 8 ]]; then
export HADOOP_VER=3.2.0
export SPARK_VER=2.4.3
export SPARK_PACKAGE_TYPE=without-hadoop
elif [[ "$java_ver" == 7 ]]; then
export HADOOP_VER=2.6.5
export SPARK_VER=1.6.0
export SPARK_PACKAGE_TYPE=hadoop2.6
else
exit 1
fi

# Folders where Spark and Hadoop are stored, depending on the required version
export SPARK_BINARIES_FOLDER=spark-$SPARK_VER-bin-$SPARK_PACKAGE_TYPE
export HADOOP_BINARIES_FOLDER=hadoop-$HADOOP_VER
export HADOOP_CONF_DIR=/opt/$HADOOP_BINARIES_FOLDER/etc/hadoop/
export HADOOP_HOME=/opt/$HADOOP_BINARIES_FOLDER

sudo -E ./travis/install_hadoop_spark.sh
sudo -E ./travis/config_hadoop_spark.sh
before_script:
- "export JAVA_OPTS=-Xmx512m"
cache:
directories:
- $HOME/.m2
script:
- mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
- mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
- mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
- sudo -E ./travis/configssh.sh
- sudo -E ./travis/restart_hadoop_spark.sh
- cp ./travis/hadoop.conf ./conf/
- cp ./travis/spark.conf ./conf/
- /opt/hadoop-2.6.5/bin/yarn node -list 2
- sudo -E ./bin/run_all.sh
- |
Contributor:

nit: remove this

Author (@luisfponce, Aug 22, 2019):

I used this pipe (YAML literal style) because, from my perspective, it looks cleaner when putting a script in YAML files and avoids writing \ at the end of every line.

Otherwise, it would look like this example:

script:
  - if [[ "$java_ver" == 11 ]]; then \
        mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming \
    elif [[ "$java_ver" == 8 ]]; then \
        mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11 \
    elif [[ "$java_ver" == 7 ]]; then \
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11 \
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11 \
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10 \
    else \
        exit 1 \
    fi

  - sudo -E ./travis/configssh.sh
  - sudo -E ./travis/restart_hadoop_spark.sh
  - sudo -E ./bin/run_all.sh

instead of how it currently is:

script:
  - |
    if [[ "$java_ver" == 11 ]]; then
        mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
    elif [[ "$java_ver" == 8 ]]; then
        mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
    elif [[ "$java_ver" == 7 ]]; then
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
    else
        exit 1
    fi

    sudo -E ./travis/configssh.sh
    sudo -E ./travis/restart_hadoop_spark.sh
    sudo -E ./bin/run_all.sh

Up to you; for me both ways still work (and the YAML linter indicates both ways work too).

Contributor:

Thanks, but it seems

    if [[ "$java_ver" == 11 ]]; then
        mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
    elif [[ "$java_ver" == 8 ]]; then
        mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
    elif [[ "$java_ver" == 7 ]]; then
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
        mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
    else
        exit 1
    fi

    sudo -E ./travis/configssh.sh
    sudo -E ./travis/restart_hadoop_spark.sh
    sudo -E ./bin/run_all.sh

without any pipes is also valid; will the newlines be escaped automatically in Travis?

if [[ "$java_ver" == 11 ]]; then
mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
elif [[ "$java_ver" == 8 ]]; then
mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
Contributor:

Curious about this: even if we don't run the SQL module tests for Spark 2.4, how did the compilation work?

sudo -E ./travis/configssh.sh
sudo -E ./travis/restart_hadoop_spark.sh
sudo -E ./bin/run_all.sh
elif [[ "$java_ver" == 7 ]]; then
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
sudo -E ./travis/configssh.sh
sudo -E ./travis/restart_hadoop_spark.sh
sudo -E ./bin/run_all.sh
else
exit 1
fi
10 changes: 5 additions & 5 deletions README.md
@@ -135,12 +135,12 @@ There are totally 19 workloads in HiBench. The workloads are divided into 6 cate
4. Fixwindow (fixwindow)

The workload performs a window-based aggregation. It tests the performance of the window operation in streaming frameworks.


### Supported Hadoop/Spark/Flink/Storm/Gearpump releases: ###

- Hadoop: Apache Hadoop 2.x, CDH5, HDP
- Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x
### Supported Hadoop/Spark releases: ###
- Hadoop: Apache Hadoop 2.x, 3.2, CDH5, HDP
- Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x, Spark 2.4.x

### Supported Flink/Storm/Gearpump releases: ###
- Flink: 1.0.3
- Storm: 1.0.1
- Gearpump: 0.8.1
15 changes: 14 additions & 1 deletion autogen/pom.xml
@@ -57,7 +57,20 @@
<version>${hadoop.mr2.version}</version>
</dependency>
</dependencies>

<profiles>
<profile>
<id>hadoop3.2</id>
<properties>
<hadoop.mr2.version>3.2.0</hadoop.mr2.version>
</properties>
<activation>
<property>
<name>hadoop</name>
<value>3.2</value>
</property>
</activation>
</profile>
</profiles>
<build>
<plugins>
<plugin>
@@ -23,7 +23,6 @@
import java.util.Date;
import java.util.StringTokenizer;

import org.apache.commons.logging.*;

import org.apache.hadoop.fs.*;
import org.apache.hadoop.mapred.*;
@@ -33,6 +32,8 @@
import org.apache.hadoop.conf.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* Distributed i/o benchmark.
@@ -69,8 +70,7 @@ public class TestDFSIO extends Configured implements Tool {
private static final int DEFAULT_BUFFER_SIZE = 1000000;
private static final String BASE_FILE_NAME = "test_io_";
private static final String DEFAULT_RES_FILE_NAME = "TestDFSIO_results.log";

private static final Log LOG = FileInputFormat.LOG;
private static final Logger LOG = LoggerFactory.getLogger(FileInputFormat.class);
private static Configuration fsConfig = new Configuration();
private static final long MEGA = 0x100000;
private static String TEST_ROOT_DIR = System.getProperty("test.build.data","/benchmarks/TestDFSIO");
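Note: this is the standard commons-logging to SLF4J migration (Hadoop 3.x moved its own logging to SLF4J, so the previously borrowed FileInputFormat.LOG field is presumably no longer usable as a commons-logging Log). A minimal sketch of the pattern, with a hypothetical class name:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class LoggingExample {
        // Before: private static final Log LOG = LogFactory.getLog(LoggingExample.class);
        private static final Logger LOG = LoggerFactory.getLogger(LoggingExample.class);

        public static void main(String[] args) {
            LOG.info("benchmark started");
        }
    }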
@@ -22,11 +22,13 @@

import java.util.Date;
import java.util.StringTokenizer;
import java.util.Arrays;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;

import org.apache.commons.logging.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapreduce.Job;
@@ -85,7 +87,7 @@

public class TestDFSIOEnh extends Configured implements Tool {

private static final Log LOG = LogFactory.getLog(TestDFSIOEnh.class);
private static final Logger LOG = LoggerFactory.getLogger(TestDFSIOEnh.class);
private static final int TEST_TYPE_READ = 0;
private static final int TEST_TYPE_WRITE = 1;
private static final int TEST_TYPE_CLEANUP = 2;
@@ -952,7 +954,7 @@ protected static void runAnalyse(FileSystem fs, Configuration fsConfig,
e.printStackTrace();
} finally {
fs.delete(DfsioeConfig.getInstance().getReportTmp(fsConfig), true);
FileUtil.copyMerge(fs, DfsioeConfig.getInstance().getReportDir(fsConfig), fs, DfsioeConfig.getInstance().getReportTmp(fsConfig), false, fsConfig, null);
copyMerge(fs, DfsioeConfig.getInstance().getReportDir(fsConfig), fs, DfsioeConfig.getInstance().getReportTmp(fsConfig), false, fsConfig, null);
LOG.info("remote report file " + DfsioeConfig.getInstance().getReportTmp(fsConfig) + " merged.");
BufferedReader lines = new BufferedReader(new InputStreamReader(new DataInputStream(fs.open(DfsioeConfig.getInstance().getReportTmp(fsConfig)))));
String line = null;
@@ -1001,8 +1003,60 @@ else if (sampleUnit == GIGA)
}
res.println("\n-- Result Analyse -- : " + ((System.currentTimeMillis() - t1)/1000) + "s");
res.close();
}

}

/** Copy all files in a directory to one output file (merge). Local replacement for FileUtil.copyMerge, which was removed in Hadoop 3.0. */
@Deprecated
public static boolean copyMerge(FileSystem srcFS, Path srcDir, FileSystem dstFS, Path dstFile, boolean deleteSource,
Configuration conf, String addString) throws IOException {
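// checkDest: if dstFile is an existing directory, the merged output becomes a child of it named after srcDir; an existing plain file raises an IOException.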
dstFile = checkDest(srcDir.getName(), dstFS, dstFile, false);

if (!srcFS.getFileStatus(srcDir).isDirectory())
return false;

OutputStream out = dstFS.create(dstFile);

try {
FileStatus[] contents = srcFS.listStatus(srcDir);
Arrays.sort(contents);
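// FileStatus sorts by path, so part files are merged in a deterministic order.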
for (int i = 0; i < contents.length; i++) {
if (contents[i].isFile()) {
InputStream in = srcFS.open(contents[i].getPath());
try {
IOUtils.copyBytes(in, out, conf, false);
if (addString != null)
out.write(addString.getBytes("UTF-8"));

} finally {
in.close();
}
}
}
} finally {
out.close();
}

if (deleteSource) {
return srcFS.delete(srcDir, true);
} else {
return true;
}
}

private static Path checkDest(String srcName, FileSystem dstFS, Path dst, boolean overwrite) throws IOException {
if (dstFS.exists(dst)) {
FileStatus sdst = dstFS.getFileStatus(dst);
if (sdst.isDirectory()) {
if (null == srcName) {
throw new IOException("Target " + dst + " is a directory");
}
return checkDest(null, dstFS, new Path(dst, srcName), overwrite);
} else if (!overwrite) {
throw new IOException("Target " + dst + " already exists");
}
}
return dst;
}
@Deprecated
protected static void analyzeResult( FileSystem fs,
int testType,
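A minimal usage sketch of the inlined copyMerge helper, mirroring its call in runAnalyse (the wrapper class and paths here are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyMergeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Merge every part file under the report directory into a single file;
            // deleteSource = false keeps the inputs, addString = null appends no separator.
            TestDFSIOEnh.copyMerge(fs,
                    new Path("/benchmarks/TestDFSIO/report"),
                    fs,
                    new Path("/benchmarks/TestDFSIO/report_merged.txt"),
                    false, conf, null);
        }
    }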
18 changes: 17 additions & 1 deletion common/pom.xml
@@ -60,7 +60,7 @@
<phase>process-resources</phase>
<goals>
<goal>add-source</goal>
<goal>compile</goal>
<goal>${maven.assembly.plugin.goal}</goal>
</goals>
</execution>
<execution>
@@ -99,6 +99,7 @@
<properties>
<scala.version>2.11.8</scala.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.assembly.plugin.goal>compile</maven.assembly.plugin.goal>
</properties>
<activation>
<property>
@@ -112,6 +113,7 @@
<properties>
<scala.version>2.10.4</scala.version>
<scala.binary.version>2.10</scala.binary.version>
<maven.assembly.plugin.goal>compile</maven.assembly.plugin.goal>
</properties>
<activation>
<property>
@@ -126,6 +128,7 @@
<properties>
<scala.version>2.11.8</scala.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.assembly.plugin.goal>compile</maven.assembly.plugin.goal>
</properties>
<activation>
<property>
@@ -134,5 +137,18 @@
</property>
</activation>
</profile>

<profile>
<id>exclude-streaming</id>
<properties>
<maven.assembly.plugin.goal>doc</maven.assembly.plugin.goal>
</properties>
<activation>
<property>
<name>exclude-streaming</name>
</property>
</activation>
</profile>

</profiles>
</project>
17 changes: 16 additions & 1 deletion docs/build-hibench.md
@@ -28,7 +28,7 @@ Because some Maven plugins cannot support Scala version perfectly, there are som


### Specify Spark Version ###
To specify the spark version, use -Dspark=xxx(1.6, 2.0, 2.1 or 2.2). By default, it builds for spark 2.0
To specify the spark version, use -Dspark=xxx(1.6, 2.0, 2.1, 2.2 or 2.4). By default, it builds for spark 2.0
Contributor:

Actually this Spark 2.4 support doesn't include SQL module?
This is the only main remaining concern for this patch, see
#586 (comment)
I think we can drop Spark 1.6 support and modify the SQL module code to support 2.4 in HiBench 8.0; whoever needs 1.6 can go to HiBench 7.0. @carsonwang


mvn -Psparkbench -Dspark=1.6 -Dscala=2.11 clean package
tips:
@@ -37,6 +37,11 @@ default . For example , if we want use spark2.0 and scala2.11 to build hibench.
package`, but for spark2.0 and scala2.10, we need to use the command `mvn -Dspark=2.0 -Dscala=2.10 clean package`.
Similarly, spark1.6 is associated with scala2.10 by default.

### Specify Hadoop Version ###
To specify the spark version, use -Dhadoop=xxx(3.2). By default, it builds for hadoop 2.4
Collaborator:
nit: spark version -> hadoop version


mvn -Psparkbench -Dhadoop=3.2 -Dspark=2.4 clean package

### Build a single module ###
If you are only interested in a single workload in HiBench. You can build a single module. For example, the below command only builds the SQL workloads for Spark.

@@ -48,3 +53,13 @@ Supported modules includes: micro, ml(machine learning), sql, websearch, graph,
For Spark 2.0 and Spark 2.1, we add the benchmark support for Structured Streaming. This is a new module which cannot be compiled in Spark 1.6. And it won't get compiled by default even if you specify the spark version as 2.0 or 2.1. You must explicitly specify it like this:

mvn -Psparkbench -Dmodules -PstructuredStreaming clean package

### Build using JDK 11
**With Java 11, HiBench can be built only for Spark 2.4 _(compiled with Scala 2.12)_ and/or Hadoop 3.2**

If you are interested in building with Java 11, indicate that the streaming benchmarks won't be compiled, and specify the Scala, Spark, Hadoop and Maven compiler plugin versions as below:

mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming -Dmaven-compiler-plugin.version=3.8.0

Supported frameworks: hadoopbench and sparkbench only (flinkbench, stormbench and gearpumpbench are not supported)
Supported modules: micro, ml (machine learning), websearch and graph (streaming, structuredStreaming and sql are not supported)
38 changes: 37 additions & 1 deletion sparkbench/assembly/pom.xml
@@ -159,7 +159,43 @@
</dependencies>
<activation>
<property>
<name>!modules</name>
<name>!exclude-streaming</name>
Collaborator:
If a user specifies modules=xxx and doesn't specify exclude-streaming, this allModules will be activated, which is not expected.

</property>
</activation>
</profile>

<profile>
<id>exclude-streaming</id>
<dependencies>
<dependency>
<groupId>com.intel.hibench.sparkbench</groupId>
<artifactId>sparkbench-micro</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.hibench.sparkbench</groupId>
<artifactId>sparkbench-ml</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.hibench.sparkbench</groupId>
<artifactId>sparkbench-websearch</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.hibench.sparkbench</groupId>
<artifactId>sparkbench-graph</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.hibench.sparkbench</groupId>
<artifactId>sparkbench-sql</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
<activation>
<property>
<name>exclude-streaming</name>
</property>
</activation>
</profile>