This is a general discussion question regarding the size of the fat-jars produced by the emma-spark-examples and emma-flink-examples modules.
Running
find . -name '*.jar' | grep -v original | grep -v nexus | xargs du -hs
in the project root shows the following output
65M ./emma-examples/emma-examples-spark/target/emma-examples-spark-0.2-SNAPSHOT.jar
64M ./emma-examples/emma-examples-flink/target/emma-examples-flink-0.2-SNAPSHOT.jar
440K ./emma-examples/emma-examples-library/target/emma-examples-library-0.2-SNAPSHOT.jar
420K ./emma-examples/emma-examples-library/target/emma-examples-library-0.2-SNAPSHOT-tests.jar
148K ./emma-spark/target/emma-spark-0.2-SNAPSHOT.jar
148K ./emma-flink/target/emma-flink-0.2-SNAPSHOT.jar
20K ./emma-gui/target/emma-gui-0.2-SNAPSHOT.jar
56K ./emma-quickstart/target/emma-quickstart-0.2-SNAPSHOT.jar
3,7M ./emma-language/target/emma-language-0.2-SNAPSHOT.jar
3,9M ./emma-language/target/emma-language-0.2-SNAPSHOT-tests.jar
The emma-flink-examples and emma-spark-examples jars are ~65M each, which also indicates the size to expect for any future client jar that bundles emma-language with either emma-flink or emma-spark.
A closer look at emma-spark-examples reveals the root causes (the output is similar for the other module).
mvn dependency:list -DincludeScope=runtime -DoutputAbsoluteArtifactFilename=true \
| grep "$HOME/.m2/repository" \
| awk -F":compile:" '{print $2}' \
| xargs du -hs \
| sort -r -h \
| sed "s|$HOME/.m2/repository/||"
The list looks as follows.
14M org/scalanlp/breeze_2.11/0.12/breeze_2.11-0.12.jar
12M org/scalaz/scalaz-core_2.11/7.2.7/scalaz-core_2.11-7.2.7.jar
7,0M org/spire-math/spire_2.11/0.7.4/spire_2.11-0.7.4.jar
4,4M org/typelevel/cats-kernel_2.11/0.9.0/cats-kernel_2.11-0.9.0.jar
3,7M org/emmalanguage/emma-language/0.2-SNAPSHOT/emma-language-0.2-SNAPSHOT.jar
3,4M com/chuusai/shapeless_2.11/2.3.2/shapeless_2.11-2.3.2.jar
3,3M org/typelevel/cats-core_2.11/0.9.0/cats-core_2.11-0.9.0.jar
3,0M org/scalacheck/scalacheck_2.11/1.13.4/scalacheck_2.11-1.13.4.jar
2,0M org/apache/commons/commons-math3/3.4.1/commons-math3-3.4.1.jar
1,2M org/typelevel/cats-laws_2.11/0.9.0/cats-laws_2.11-0.9.0.jar
1,2M net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar
1,1M org/xerial/snappy/snappy-java/1.1.2.6/snappy-java-1.1.2.6.jar
1,0M org/apache/parquet/parquet-jackson/1.9.0/parquet-jackson-1.9.0.jar
944K org/apache/parquet/parquet-column/1.9.0/parquet-column-1.9.0.jar
780K org/apache/parquet/parquet-encoding/1.9.0/parquet-encoding-1.9.0.jar
764K org/codehaus/jackson/jackson-mapper-asl/1.9.11/jackson-mapper-asl-1.9.11.jar
748K com/github/rwl/jtransforms/2.4.0/jtransforms-2.4.0.jar
724K org/scalactic/scalactic_2.11/3.0.3/scalactic_2.11-3.0.3.jar
480K log4j/log4j/1.2.17/log4j-1.2.17.jar
440K org/emmalanguage/emma-examples-library/0.2-SNAPSHOT/emma-examples-library-0.2-SNAPSHOT.jar
384K org/apache/parquet/parquet-format/2.3.1/parquet-format-2.3.1.jar
344K com/univocity/univocity-parsers/2.4.1/univocity-parsers-2.4.1.jar
288K io/spray/spray-json_2.11/1.3.3/spray-json_2.11-1.3.3.jar
280K org/typelevel/cats-free_2.11/0.9.0/cats-free_2.11-0.9.0.jar
276K com/typesafe/config/1.3.1/config-1.3.1.jar
268K org/apache/parquet/parquet-hadoop/1.9.0/parquet-hadoop-1.9.0.jar
244K io/verizon/quiver/core_2.11/5.5.14-scalaz-7.2/core_2.11-5.5.14-scalaz-7.2.jar
228K org/codehaus/jackson/jackson-core-asl/1.9.11/jackson-core-asl-1.9.11.jar
208K org/typelevel/cats-kernel-laws_2.11/0.9.0/cats-kernel-laws_2.11-0.9.0.jar
180K org/scalanlp/breeze-macros_2.11/0.12/breeze-macros_2.11-0.12.jar
164K com/github/mpilquist/simulacrum_2.11/0.10.0/simulacrum_2.11-0.10.0.jar
164K com/github/fommil/netlib/core/1.1.2/core-1.1.2.jar
148K org/emmalanguage/emma-spark/0.2-SNAPSHOT/emma-spark-0.2-SNAPSHOT.jar
144K com/github/scopt/scopt_2.11/3.5.0/scopt_2.11-3.5.0.jar
108K com/jsuereth/scala-arm_2.11/2.0/scala-arm_2.11-2.0.jar
96K commons-pool/commons-pool/1.5.4/commons-pool-1.5.4.jar
88K org/spire-math/spire-macros_2.11/0.7.4/spire-macros_2.11-0.7.4.jar
72K commons-codec/commons-codec/1.5/commons-codec-1.5.jar
44K org/typelevel/discipline_2.11/0.7.2/discipline_2.11-0.7.2.jar
44K org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar
44K org/apache/parquet/parquet-common/1.9.0/parquet-common-1.9.0.jar
36K org/typelevel/machinist_2.11/0.6.1/machinist_2.11-0.6.1.jar
24K com/typesafe/scala-logging/scala-logging-slf4j_2.11/2.1.2/scala-logging-slf4j_2.11-2.1.2.jar
20K net/sf/opencsv/opencsv/2.3/opencsv-2.3.jar
16K org/scala-sbt/test-interface/1.0/test-interface-1.0.jar
12K org/typelevel/catalysts-macros_2.11/0.0.5/catalysts-macros_2.11-0.0.5.jar
12K org/slf4j/slf4j-log4j12/1.7.25/slf4j-log4j12-1.7.25.jar
8,0K org/typelevel/cats-macros_2.11/0.9.0/cats-macros_2.11-0.9.0.jar
8,0K com/typesafe/scala-logging/scala-logging-api_2.11/2.1.2/scala-logging-api_2.11-2.1.2.jar
4,0K org/typelevel/macro-compat_2.11/1.1.1/macro-compat_2.11-1.1.1.jar
4,0K org/typelevel/cats-jvm_2.11/0.9.0/cats-jvm_2.11-0.9.0.jar
4,0K org/typelevel/cats_2.11/0.9.0/cats_2.11-0.9.0.jar
4,0K org/typelevel/catalysts-platform_2.11/0.0.5/catalysts-platform_2.11-0.0.5.jar
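To put a number on how much of the total a given group of dependencies accounts for, something like the following could help. This is only a sketch: `jar_share` is a hypothetical helper name, and it assumes the sizes come from `du -k` (plain KiB numbers, tab-separated from the path) rather than the locale-dependent `du -hs` output shown above.

```shell
# Hypothetical helper (not part of the project): reads "<size-in-KiB><TAB><path>"
# lines, as printed by `du -k`, and prints the percentage of the total size
# taken up by the paths matching the given regular expression.
jar_share() {
  awk -v pat="$1" -F'\t' '
    { total += $1 }          # sum of all sizes
    $2 ~ pat { hit += $1 }   # sum of sizes whose path matches the pattern
    END {
      if (total > 0) printf "%d%%\n", hit * 100 / total
      else print "0%"
    }
  '
}

# Tiny self-contained example with made-up sizes:
printf '10\tscalaz.jar\n30\tbreeze.jar\n' | jar_share 'breeze'   # prints 75%
```

Feeding it the same dependency list as above, with `xargs du -k` in place of `xargs du -hs` and a pattern such as `'breeze|scalaz|spire'`, should give a rough idea of the savings from dropping those three.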
It might be better to rely on the breeze version shipped with the dataflow engine rather than bundling our own. @ParkL could you check the versions bundled with Spark 2.1.0 and Flink 1.2.1?
I am not sure what to do with scalaz. It seems that we only use it because of quiver, and I am not aware of any alternative with a smaller footprint or one that, say, relies on cats.
I am open to suggestions.