\section{Generation of fat binaries}
To make code generation consistent and straightforward, the following scheme is proposed:
\begin{enumerate}
\item For each source file provided, the driver runs the preprocessor, compiler and assembler for the host and for each available target device type. This produces one object file for the host and one for each target device type. The toolchain of a given target may be modified so that it uses the same definitions (header files) as the host toolchain if that suits the system constraints.
\item Target linkers combine the dedicated target objects into target shared libraries, one for each target device type. The commands passed to the target frontend by the compiler driver always assume the creation of a shared library, even if the commands passed to the driver by the user specify otherwise. The driver translates the host frontend commands into target frontend commands to ensure that a target shared library is generated.
\item The host linker combines the host object files into an executable/shared library and incorporates each target shared library as is (no actual linking is done between host and target objects) into a designated section within the host binary. The format of a binary section for offloading to a specific device is target-dependent and is handled by the target RTL at runtime.
\item A new driver command-line group option \command{-target-offload=Ti}, where \command{Ti} is a valid target triple, specifies which target device types the user wants to support in the execution of OpenMP target regions. All options following \command{-target-offload=Ti} are forwarded to that device toolchain. The user can specify as many \command{-target-offload=Ti} options as there are device types to support. An example invocation of the compiler would be:
\\ ~ \\
\command{clang -fopenmp -target powerpc64-ibm-linux-gnu\\ -target-offload=nvptx64-nvidia-cuda -fopenmp -target-offload=x86-pc-linux-gnu -fopenmp foo.c bar.c -o foobar.bin}
\\ ~ \\
for a hypothetical system where the host is a PowerPC processor and the available target device types are an NVIDIA GPU and an x86 processor.
\item For each source file, the compiler driver will issue commands to create intermediate files for each possible compilation phase (LLVM IR, assembly, object) and target (host or device). However, this is not exposed to the user, as the driver can bundle multiple (related) files generated by different toolchains into a single one. Therefore, when using separate compilation, the user should invoke the compiler in the same way (except for the device target specification) as if no offloading support were required. For example:
\\ ~ \\
\command{clang -fopenmp -target powerpc64-ibm-linux-gnu\\ -target-offload=nvptx64-nvidia-cuda -fopenmp -target-offload=x86-pc-linux-gnu -fopenmp foo.c bar.c -c}
\\ ~ \\
\command{clang -fopenmp -target powerpc64-ibm-linux-gnu\\ -target-offload=nvptx64-nvidia-cuda -fopenmp -target-offload=x86-pc-linux-gnu -fopenmp foo.o bar.o -o foobar.bin}
\end{enumerate}
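Items 1 to 3 of the scheme above can be illustrated with a minimal sketch of the jobs the driver might schedule. This is not the actual driver implementation: the function \command{plan\_jobs}, the job labels and the intermediate file names are all hypothetical and chosen only for illustration.

```python
def plan_jobs(sources, host, targets, output):
    """Return an ordered list of (action, triple, inputs, output) tuples.

    Hypothetical illustration of the proposed scheme; job labels and
    intermediate file names are assumptions, not real driver behaviour.
    """
    toolchains = [host, *targets]
    objects = {triple: [] for triple in toolchains}
    jobs = []

    # Item 1: one object file per source file for the host and each target.
    for src in sources:
        stem = src.rsplit(".", 1)[0]
        for triple in toolchains:
            obj = f"{stem}-{triple}.o"  # assumed naming convention
            jobs.append(("compile", triple, [src], obj))
            objects[triple].append(obj)

    # Item 2: each target linker always produces a shared library.
    target_libs = []
    for triple in targets:
        lib = f"lib-{triple}.so"  # assumed naming convention
        jobs.append(("link-shared", triple, objects[triple], lib))
        target_libs.append(lib)

    # Item 3: the host link embeds the target libraries as is.
    jobs.append(("link-host-and-embed", host, objects[host] + target_libs, output))
    return jobs


if __name__ == "__main__":
    for job in plan_jobs(
        ["foo.c", "bar.c"],
        "powerpc64-ibm-linux-gnu",
        ["nvptx64-nvidia-cuda", "x86-pc-linux-gnu"],
        "foobar.bin",
    ):
        print(job)
```

With two source files and two target device types, the sketch plans six compile jobs, two target link jobs and a single host link job.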
The resulting host executable/shared library will depend on the offload runtime library, \libomptarget{}. This library will handle the initialization of target RTLs and translate the offload interface from compiler-generated code to the target RTL during program execution.
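The grouping rule for \command{-target-offload=Ti} described in item 4 can be sketched as follows. This is only an illustration under stated assumptions: the function \command{group\_offload\_options} is hypothetical, and the sketch covers only the option part of the command line, since the text does not specify how input files or output options are separated from the last group.

```python
def group_offload_options(args):
    """Split driver options into host options and per-target option groups.

    Hypothetical helper: every argument that follows -target-offload=Ti is
    assigned to target Ti until the next -target-offload option appears.
    Arguments before the first such option are treated as host options.
    """
    prefix = "-target-offload="
    host_options = []
    target_options = {}
    current = None  # triple of the group currently being filled

    for arg in args:
        if arg.startswith(prefix):
            current = arg[len(prefix):]
            target_options.setdefault(current, [])
        elif current is None:
            host_options.append(arg)
        else:
            target_options[current].append(arg)
    return host_options, target_options


if __name__ == "__main__":
    host, targets = group_offload_options([
        "-fopenmp", "-target", "powerpc64-ibm-linux-gnu",
        "-target-offload=nvptx64-nvidia-cuda", "-fopenmp",
        "-target-offload=x86-pc-linux-gnu", "-fopenmp",
    ])
    print(host)
    print(targets)
```

A real driver would also need a rule for arguments that are not device-specific, such as the input files; the sketch deliberately leaves that policy open.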