Conversation
commit 7ad4e68338a55eb2dba48e7de5ae48b582732681
Author: Chandrasekaran <amrish.chandrasekaran@intel.com>
Date: Thu Jan 3 12:32:13 2019 -0600
Modified mlp_tbb.cc to try different configs. Added python script for analysing load imbalance.
For promotion to github repo. See merge request DeveloperProducts/Runtimes/Threading/customer-samples/mlp!1
7df27f8 to 24392b1
| CFLAGS = -DMKL_ILP64 -m64 -I${MKLROOT}/include -I${NUMAROOT}/include -I${TBBROOT}/include -mavx2 -mfma -mf16c -fopenmp -mavx512f -Wall #-march=skylake
| SP ?=1
| UBN ?=0
So USE_BROADCAST_NODE doesn't give a speedup, right?
If there was an improvement, it was very small. It basically removes the overhead of spawning a task for the very first node only.
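For context, here is a minimal sketch of what such a toggle could look like. The graph wiring, node names, and the assumption that the UBN make variable maps to a USE_BROADCAST_NODE preprocessor flag are illustrative, not the actual mlp_tbb.cc code: with the flag on, a tbb::flow::broadcast_node serves as the entry point and fans the start message out to the first layer; with it off, the message is pushed into the first node directly.

```cpp
// Hypothetical sketch of a USE_BROADCAST_NODE toggle -- not the actual mlp_tbb.cc wiring.
#include <tbb/flow_graph.h>

#ifndef USE_BROADCAST_NODE
#define USE_BROADCAST_NODE 0  // assumed to be set from the UBN make variable
#endif

int main() {
  tbb::flow::graph g;

  tbb::flow::continue_node<tbb::flow::continue_msg> layer0(
      g, [](const tbb::flow::continue_msg&) { /* first layer's work */ });
  tbb::flow::continue_node<tbb::flow::continue_msg> layer1(
      g, [](const tbb::flow::continue_msg&) { /* next layer's work */ });
  tbb::flow::make_edge(layer0, layer1);

#if USE_BROADCAST_NODE
  // UBN=1: a broadcast_node forwards the start message to the first layer's node(s).
  tbb::flow::broadcast_node<tbb::flow::continue_msg> start(g);
  tbb::flow::make_edge(start, layer0);
  start.try_put(tbb::flow::continue_msg());
#else
  // UBN=0: kick off the graph by putting the message straight into the first node.
  layer0.try_put(tbb::flow::continue_msg());
#endif

  g.wait_for_all();
  return 0;
}
```

Either way the difference only affects how the very first node is started, which is consistent with the small-to-negligible improvement reported above.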
| UBN ?=0
| TI ?=0
| FG ?=1
| NB ?=0
Does NUMA_BIND hurt performance?
| #seq = y
| CFLAGS = -DMKL_ILP64 -m64 -I${MKLROOT}/include -I${NUMAROOT}/include -I${TBBROOT}/include -mavx2 -mfma -mf16c -fopenmp -mavx512f -Wall #-march=skylake
| SP ?=1
SP stands for "split"; it is the factor used to create more chunks for the parallel_for, so you get SP * nthreads_per_socket chunks. We left the default at 1, although our tests did show improvements with SP=2.
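To make the chunking concrete, here is a minimal sketch of the idea. The function name, the row-slicing math, and the per-row body are assumptions for illustration, not the exact mlp_tbb.cc code: the parallel_for range has SP * nthreads_per_socket chunks, so with SP=2 each thread has on average two smaller tasks available, which helps smooth out load imbalance.

```cpp
// Illustrative only: shows how SP multiplies the number of parallel_for chunks.
#include <tbb/parallel_for.h>
#include <cstddef>

void run_layer(std::size_t rows, int nthreads_per_socket, int SP) {
  const std::size_t nchunks =
      static_cast<std::size_t>(SP) * static_cast<std::size_t>(nthreads_per_socket);
  tbb::parallel_for(std::size_t(0), nchunks, [&](std::size_t task_id) {
    // Each task owns a contiguous slice of the rows; more chunks means smaller
    // slices and more opportunity for an idle thread to steal work.
    const std::size_t begin = task_id * rows / nchunks;
    const std::size_t end = (task_id + 1) * rows / nchunks;
    for (std::size_t i = begin; i < end; ++i) {
      // ... per-row work for this layer ...
    }
  });
}
```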
| CXX ?= g++
| #CXX ?= g++
| CXX = icpc
Can we please test with gcc, to be on the same page? We're using gcc 5.5.0.
| #CC = /usr/local/opt/gcc/bin/g++-7 -std=c++11
| LDFLAGS = -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -lpthread -lm -ldl ${NUMAROOT}/lib/libnuma.a ${TBBROOT}/lib/libtbb.a
| LDFLAGS = -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -L${NUMAROOT}/lib -lnuma -L${TBBROOT}/lib -ltbb -L/usr/lib -liomp5
Can we please not use Intel OpenMP, to be on the same page? And is using mkl_sequential needed? I basically want to know which changes are needed for better performance and which are not. It would be great if we could identify the minimal set of changes that gets good perf.
| public:
|   pinning_observer(tbb::task_arena& arena, int numa_node_id)
|       : tbb::task_scheduler_observer(arena), numa_node_id_(numa_node_id) {
|       : tbb::task_scheduler_observer(arena), arena_(arena), numa_node_id_(numa_node_id) {
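The hunk above only shows the constructor; for readers following along, here is a rough sketch of what an arena-scoped pinning observer generally looks like. The on_scheduler_entry body, the observe(true) call, and the use of libnuma's numa_run_on_node are assumptions for illustration, not the code from this merge request.

```cpp
// Illustrative sketch only -- the real mlp_tbb.cc observer may differ.
#define TBB_PREVIEW_LOCAL_OBSERVER 1  // older TBB requires this for arena-scoped observers
#include <tbb/task_arena.h>
#include <tbb/task_scheduler_observer.h>
#include <numa.h>  // libnuma (assumed pinning mechanism)

class pinning_observer : public tbb::task_scheduler_observer {
 public:
  pinning_observer(tbb::task_arena& arena, int numa_node_id)
      : tbb::task_scheduler_observer(arena), arena_(arena), numa_node_id_(numa_node_id) {
    observe(true);  // start observing threads entering/leaving this arena
  }

  // Called on each thread as it joins the arena: bind it to the observer's
  // NUMA node so its work and allocations stay local to that socket.
  void on_scheduler_entry(bool /*is_worker*/) override {
    numa_run_on_node(numa_node_id_);
  }

 private:
  tbb::task_arena& arena_;
  int numa_node_id_;
};
```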
| SP*nthreads_per_socket,
| [&](size_t task_id) {
|   double sgst = dsecnd();
|   int tid = numa_node_id_ * nthreads_per_socket + task_id;
If a thread grabs more than one task, we want to attribute the execution time of all of those tasks to that thread. This is why I kept track of execution time based on current_thread_index. Please let me know your thoughts.
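For reference, a minimal sketch of that accounting scheme. It uses tbb::tick_count instead of MKL's dsecnd() and invents the per_thread_seconds vector, so it is an illustration of the idea rather than this MR's code: each task's time is summed into the slot for the executing thread's current_thread_index, so a thread that grabs several of the SP * nthreads_per_socket tasks is charged for all of them.

```cpp
// Illustrative sketch: per-thread time accounting keyed by current_thread_index.
#include <tbb/parallel_for.h>
#include <tbb/task_arena.h>
#include <tbb/tick_count.h>
#include <cstddef>
#include <vector>

void timed_layer(std::size_t ntasks, std::vector<double>& per_thread_seconds) {
  // per_thread_seconds must be sized to at least the arena's max concurrency.
  tbb::parallel_for(std::size_t(0), ntasks, [&](std::size_t task_id) {
    const tbb::tick_count t0 = tbb::tick_count::now();
    (void)task_id;  // placeholder for the real per-task body
    const tbb::tick_count t1 = tbb::tick_count::now();

    // Attribute this task's time to the executing thread, not to task_id:
    // if one thread steals and runs several tasks, they all land in its slot.
    const int tidx = tbb::this_task_arena::current_thread_index();
    per_thread_seconds[tidx] += (t1 - t0).seconds();
  });
}
```

Accounting by thread index rather than by task id is what makes the load-imbalance analysis meaningful when tasks outnumber threads.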