///////////////////////////////////////////////////////////
//                                                       //
//     RUMA has it: Rewired User-space Memory Access     //
//                     is Possible!                      //
// Felix Martin Schuhknecht, Jens Dittrich, Ankur Sharma //
//   Proceedings of the VLDB Endowment, Vol. 9, No. 10   //
//                                                       //
//                    Used Code Base                     //
//                Version August 30, 2016                //
//               Information Systems Group               //
//                  Saarland University                  //
//              infosys.cs.uni-saarland.de               //
//                                                       //
///////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////
//                         Setup                         //
///////////////////////////////////////////////////////////
The codebase must be run on Linux (tested with Debian 8.1, kernel 3.16). A few system settings have to be adjusted before running it, and some of these changes require root access. Perform the following steps:
- Preallocate a certain number of huge pages (42 here) on a single NUMA node (0 here) with "numactl -m 0 echo 42 > /proc/sys/vm/nr_hugepages_mempolicy". Make sure enough huge pages are allocated for the corresponding test.
- Mount a hugetlbfs under /mnt/hugetlbfs. First create a folder named hugetlbfs under /mnt/ and then mount it using "mount -t hugetlbfs none /mnt/hugetlbfs -o pagesize=2M".
For testing, we recommend making the folder accessible to all users with "chmod 777 /mnt/hugetlbfs" to avoid permission issues.
- /proc/sys/vm/overcommit_memory sets the overcommit policy. By default, it is set to 0, which refuses obvious overcommits. We recommend setting it to 1, which allows overcommits without a check.
- /proc/sys/vm/max_map_count limits the number of mappings that a process can hold at the same time. By default, it is set to 65530 (roughly 2^16); we recommend setting it to 2^24 (16777216).
- /sys/kernel/mm/transparent_hugepage/enabled can be set to "always", "madvise", or "never". It controls when virtual memory is backed by transparent huge pages.
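The settings listed above can be checked before starting a benchmark. The following is a minimal Python sketch and not part of the codebase: it only reads the three procfs files named in the steps and reports which ones deviate from the recommendations. The names read_int and failing_settings, as well as the injectable reader parameter, are illustrative assumptions. The huge page check only tests for a non-zero count, since the required number depends on the test.

```python
from pathlib import Path

# Paths and recommended values are taken from the setup steps above.
# Each check returns True if the current value is acceptable.
CHECKS = {
    "/proc/sys/vm/nr_hugepages_mempolicy": lambda v: v > 0,   # some huge pages preallocated
    "/proc/sys/vm/overcommit_memory": lambda v: v == 1,       # overcommit without a check
    "/proc/sys/vm/max_map_count": lambda v: v >= 2 ** 24,     # enough simultaneous mappings
}


def read_int(path):
    """Return the integer stored in a procfs file, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def failing_settings(reader=read_int):
    """Return the paths whose current value does not meet the recommendation.

    `reader` maps a path to an int (or None). It is a parameter only so that
    the check can be tried out without touching the real procfs (assumption
    made for this sketch).
    """
    failing = []
    for path, is_ok in CHECKS.items():
        value = reader(path)
        if value is None or not is_ok(value):
            failing.append(path)
    return failing


if __name__ == "__main__":
    for path in failing_settings():
        print("check setting:", path)
```

The script changes nothing on the system; adjusting the values still requires root access as described in the steps.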
///////////////////////////////////////////////////////////
//                      Compilation                      //
///////////////////////////////////////////////////////////
The codebase can be built under Linux using the two scripts "compile.sh" and "compile_debug.sh" in the main folder. The compilation has been tested with gcc/g++ 4.9.
///////////////////////////////////////////////////////////
//                       Execution                       //
///////////////////////////////////////////////////////////
Executing ./Rewiring_Microbench prints the following usage information, which shows which tests can be run and how. Note that not all methods can be combined with all tests; the codebase catches invalid combinations.
Usage: ./a.out <method> <test_type>[(test_parameter)] <number_of_entries> <number_of_repetitions> <detailed (true|false)> [<output_file>]
Methods
1. Memcpy
2. Cow memcpy
3. Cow memcpy using populate
4. Cow memcpy using huge pages
5. Cow memcpy using huge pages and populate
6. Cow memcpy with shuffling
7. Cow memcpy with shuffling using populate
8. Cow memcpy with shuffling using huge pages
9. Cow memcpy with shuffling using huge pages and populate
10. Memcpy with shuffling
11. Memcpy with shuffling using huge pages
12. Traditional partitioning with histogram
13. Mapped partitioning without histogram
14. Mapped partitioning without histogram using huge pages
15. Cow memcpy with shuffling using remap_file_pages
16. Cow memcpy with shuffling using remap_file_pages and huge pages
17. Traditional partitioning with block-chains
18. Mapped partitioning with block-chains
19. Mapped partitioning with block-chains using huge pages
20. Mix huge and small pages
21. Mapped partitioning with block-chains using adaptive block sizes
22. Rewired vector
23. Rewired vector using huge pages
24. STL vector
30. Rewired vector using prepopulation
31. Rewired vector using huge pages and prepopulation
32. Map and Populate
36. Mremaped vector
37. Mremaped vector using huge pages
40. Allocation Rewiring (Pool)
41. Allocation Software Indirection (Pool)
42. Allocation Mmap
43. Software Indirection Vector (hugepages)
44. Mremaped partitioning using huge pages
Test Types
1. Read sequential x% src (x is test_parameter)
2. Read sequential x% dst (x is test_parameter)
3. Read x% random,uniform src (x is test_parameter).
4. Read x% random,uniform dst (x is test_parameter).
5. Write sequential x% src (x is test_parameter).
6. Write sequential x% dst (x is test_parameter).
7. Write x% random,uniform src (x is test_parameter).
8. Write x% random,uniform dst (x is test_parameter).
9. Read sequential all dst (two times)
10. Read sequential all dst from several chunks
11. Read sequential all dst and merge several chunks
12. Radix-sort all dst
13. Radix-sort all dst from several partitions (from consecutive memory region)
14. Radix-sort all dst from several partitions (using an indirection)
15. Vector insertion test
18. Page fault test
19. Read x% random,uniform dst (x is test_parameter) through an indirection.
20. Write sequential all dst (two times)
21. Read x% random,uniform dst (x is test_parameter) with dependent access.
22. Read x% random,uniform dst (x is test_parameter) with dependent access (mixed).
23. Read sequential x% dst (x is test_parameter) with dependent access.
An example execution could be:
./Rewiring_Microbench "1" "2(100)" "100000" "3" "false"
This runs method 1 (Memcpy) with test type 2 (sequential read of 100% of dst) on 100000 entries. The run is repeated 3 times, and no detailed statistics are collected. As no output file is specified, the output is printed on the command line.
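When many runs are scripted, the argument order of the example can be captured in a small helper. The following is a Python sketch under assumptions: parse_test_type and build_command are hypothetical names, the binary path defaults to ./Rewiring_Microbench as in the example, and the test parameter is assumed to be a non-negative integer in parentheses. The sketch does not check whether a method and a test type can be combined; that is left to the codebase.

```python
import re

# "<test_type>[(test_parameter)]", e.g. "2(100)" or "12".
_TEST_TYPE = re.compile(r"^(\d+)(?:\((\d+)\))?$")


def parse_test_type(arg):
    """Split a test type argument into (test_type, test_parameter or None)."""
    match = _TEST_TYPE.match(arg.strip())
    if match is None:
        raise ValueError("not of the form <test_type>[(test_parameter)]: %r" % arg)
    test_type = int(match.group(1))
    parameter = int(match.group(2)) if match.group(2) is not None else None
    return test_type, parameter


def build_command(method, test_type, entries, repetitions, detailed=False,
                  parameter=None, output_file=None,
                  binary="./Rewiring_Microbench"):
    """Assemble the argument list in the order given by the usage message."""
    test_arg = str(test_type)
    if parameter is not None:
        test_arg += "(%d)" % parameter
    command = [binary, str(method), test_arg, str(entries),
               str(repetitions), "true" if detailed else "false"]
    if output_file is not None:
        command.append(output_file)  # optional trailing <output_file>
    return command


if __name__ == "__main__":
    # Rebuilds the example invocation from the text above.
    print(" ".join(build_command(1, 2, 100000, 3, parameter=100)))
```

The returned list can be passed to subprocess.run on a machine where the binary has been built.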
///////////////////////////////////////////////////////////
//                        Output                         //
///////////////////////////////////////////////////////////
The output of the previous test is:
Parsing copy method.
Parsing test type.
Number of elements = 100000
Number of runs = 3
=============== Run 1 ===============
Traditional memcpy:
0.000007,malloc-src
0.000577,init-malloc-src
0.000006,malloc-dst
0.000612,memcpy
Starting tests:
0.000101,read-seq-dst //total=4999950000
Cleanup:
Freeing malloc-src
Freeing malloc-dst
=============== Run 2 ===============
Traditional memcpy:
0.000007,malloc-src
0.000429,init-malloc-src
0.000006,malloc-dst
0.000443,memcpy
Starting tests:
0.000101,read-seq-dst //total=4999950000
Cleanup:
Freeing malloc-src
Freeing malloc-dst
=============== Run 3 ===============
Traditional memcpy:
0.000006,malloc-src
0.000373,init-malloc-src
0.000006,malloc-dst
0.000454,memcpy
Starting tests:
0.000100,read-seq-dst //total=4999950000
Cleanup:
Freeing malloc-src
Freeing malloc-dst
Traditional memcpy
malloc-src, init-malloc-src, malloc-dst, memcpy, shm-src, mmap-src, init-shm-src, mmap-dst, remap-dst, shm-src-huge, read-seq-src, read-seq-dst, read2-seq-dst, read-rand-src, read-rand-dst, write-seq-src, write-seq-dst, write-rand-src, write-rand-dst, write-seq-dst-per-elem, build-histogram, squeeze-area-copy, squeeze-area-mmap, shm-dst, shm-dst-huge, merge, radix-sort, indirection, vector-insert, crack-in-place, index-insert, index-access, write2-seq-dst
0.000007, 0.000577, 0.000006, 0.000612, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000101, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000
0.000007, 0.000429, 0.000006, 0.000443, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000101, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000
0.000006, 0.000373, 0.000006, 0.000454, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000100, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000
0.000007, 0.000460, 0.000006, 0.000503, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000101, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000
First, the three runs are shown in detail. Then a summary is printed in tabular format: the name of the method, a header row naming all measured phases, one row per run, and a final row with the average over the three runs.
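The relation between the per-run rows and the final row of the summary can be re-checked outside the benchmark. The following is a minimal Python sketch, assuming the summary consists of a comma-separated header line followed by one comma-separated line per run. The names parse_summary and column_means are illustrative and not part of the codebase.

```python
def parse_summary(lines):
    """Split a header line and the per-run lines into (columns, rows of floats)."""
    columns = [name.strip() for name in lines[0].split(",")]
    rows = [[float(value) for value in line.split(",")] for line in lines[1:]]
    return columns, rows


def column_means(rows):
    """Return the arithmetic mean of every column over all runs."""
    return [sum(column) / len(column) for column in zip(*rows)]


if __name__ == "__main__":
    # First four columns of the three runs shown above.
    summary = [
        "malloc-src, init-malloc-src, malloc-dst, memcpy",
        "0.000007, 0.000577, 0.000006, 0.000612",
        "0.000007, 0.000429, 0.000006, 0.000443",
        "0.000006, 0.000373, 0.000006, 0.000454",
    ]
    columns, rows = parse_summary(summary)
    for name, mean in zip(columns, column_means(rows)):
        print("%s: %.6f" % (name, mean))
```

Applied to the first four columns of the three runs above and formatted with six decimals, the means agree with the last row of the summary (0.000007, 0.000460, 0.000006, 0.000503).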