forked from larrystevenwise/krping
-
Notifications
You must be signed in to change notification settings - Fork 0
zhuyj/krping
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Kernel Mode RDMA Perf/Ping Module --- Updated 8/2024 --- ============ Introduction ============ The krperf module is a kernel loadable module that utilizes the Open Fabrics verbs to implement a client/server ping/pong perf program. The module was implemented as a test vehicle for working with the mlx4/5 IB branch of the OFA project. The goals of this program include: - Simple harness to test kernel-mode verbs: connection setup, send, recv, rdma read, rdma write, and completion notifications. - Client/server model. - IP addressing used to identify remote peer. - Transport independent utilizing the RDMA CMA service - No user-space application needed. - Just a test utility...nothing more. This module allows establishing connections and running ping/pong tests via a /proc entry called /proc/krperf. This simple mechanism allows starting many kernel threads concurrently and avoids the need for a user space application. The krperf module is designed to utilize all the major DTO operations: send, recv, rdma read, and rdma write. Its goal was to test the API and as such is not necessarily an efficient test. Once the connection is established, the client and server begin a ping/pong loop: Client Server --------------------------------------------------------------------- SEND(ping source buffer rkey/addr/len) RECV Completion with ping source info RDMA READ from client source MR RDMA Read completion SEND .go ahead. to client RECV Completion of .go ahead. SEND (ping sink buffer rkey/addr/len) RECV Completion with ping sink info RDMA Write to client sink MR RDMA Write completion SEND .go ahead. to client RECV Completion of .go ahead. Validate data in source and sink buffers <repeat the above loop> ============ To build/install the krperf module ============ # git clone https://github.com/zhuyj/krperf.git # cd krperf <edit Makefile and set KSRC accordingly> # make && make install # modprobe rdma_krperf ============ Using krperf ============ Communication from user space is done via the /proc filesystem. krperf exports file /proc/krperf. Writing commands in ascii format to /proc/krperf will start krperf threads in the kernel. The thread issuing the write to /proc/krperf is used to run the krperf test, so it will block until the test completes, or until the user interrupts the write. Here is a simple example to start an rping test using the rdma_krperf module. The server's address is 192.168.69.127. The client will connect to this address at port 9999 and issue 100 ping/pong messages. This example assumes you have two systems connected via IB and the IPoverIB devices are configured on the 192.168.69/24 subnet accordingly. Server: # modprobe rdma_krperf # echo "server,addr=192.168.69.127,port=9999,verbose" >/proc/krperf The echo command above will block until the krperf test completes, or the user hits ctrl-c. On the client: # modprobe rdma_krperf # echo "client,addr=192.168.69.127,port=9999,count=100,verbose" >/proc/krperf Just like on the server, the echo command above will block until the krperf test completes, or the user hits ctrl-c. The syntax for krperf commands is a string of options separated by commas. Options can be single keywords, or in the form: option=operand. Operands can be integers or strings. Note you must specify the _same_ options on both sides. For instance, if you want to use the server_invalidate option, then you must specify it on both the server and client command lines. Opcode Operand Type Description ------------------------------------------------------------------------ client none Initiate a client side krperf thread. server none Initiate a server side krperf thread. addr string The server's IP address in dotted decimal format. Note the server can use 0.0.0.0 to bind to all devices. port integer The server's port number in host byte order. count integer The number of rping iterations to perform before shutting down the test. If unspecified, the count is infinite. size integer The size of the rping data. Default for krperf is 65 bytes. verbose none Enables printk()s that dump the rping data. Use with caution! validate none Enables validating the rping data on each iteration to detect data corruption. mem_mode string Determines how memory will be registered. Modes include dma, and reg. Default is dma. server_inv none Valid only in reg mr mode, this option enables invalidating the client's reg mr via SEND_WITH_INVALIDATE messages from the server. local_dma_lkey none Use the local dma lkey for the source of writes and sends, and in recvs read_inv none Server will use READ_WITH_INV. Only valid in reg mem_mode. ============ Memory Usage: ============ The krperf client uses 4 memory areas: start_buf - the source of the ping data. This buffer is advertised to the server at the start of each iteration, and the server rdma reads the ping data from this buffer over the wire. rdma_buf - the sink of the ping data. This buffer is advertised to the server each iteration, and the server rdma writes the ping data that it read from the start buffer into this buffer. The start_buf and rdma_buf contents are then compared if the krperf validate option is specified. recv_buf - used to recv "go ahead" SEND from the server. send_buf - used to advertise the rdma buffers to the server via SEND messages. The krperf server uses 3 memory areas: rdma_buf - used as the sink of the RDMA READ to pull the ping data from the client, and then used as the source of an RDMA WRITE to push the ping data back to the client. recv_buf - used to receive rdma rkey/addr/length advertisements from the client. send_buf - used to send "go ahead" SEND messages to the client. ============ Memory Registration Modes: ============ Each of these memory areas are registered with the RDMA device using whatever memory mode was specified in the command line. The mem_mode values include: dma, and reg (aka fastreg). The default mode, if not specified, is dma. The dma mem_mode uses a single dma_mr for all memory buffers. The reg mem_mode uses a reg mr on the client side for the start_buf and rdma_buf buffers. Each time the client will advertise one of these buffers, it invalidates the previous registration and fast registers the new buffer with a new key. If the server_invalidate option is on, then the server will do the invalidation via the "go ahead" messages using the IB_WR_SEND_WITH_INV opcode. Otherwise the client invalidates the mr using the IB_WR_LOCAL_INV work request. On the server side, reg mem_mode causes the server to use the reg_mr rkey for its rdma_buf buffer IO. Before each rdma read and rdma write, the server will post an IB_WR_LOCAL_INV + IB_WR_REG_MR WR chain to register the buffer with a new key. If the krperf read-inv option is set then the server will use IB_WR_READ_WITH_INV to do the rdma read and skip the IB_WR_LOCAL_INV wr before re-registering the buffer for the subsequent rdma write operation. ============ Stats ============ While krperf threads are executing, you can obtain statistics on the thread by reading from the /proc/krperf file. If you cat /proc/krperf, you will dump IO statistics for each running krperf thread. The format is one thread per line, and each thread contains the following stats separated by white spaces: Statistic Description --------------------------------------------------------------------- Name krperf thread number and device being used. Send Bytes Number of bytes transferred in SEND WRs. Send Messages Number of SEND WRs posted Recv Bytes Number of bytes received via RECV completions. Recv Messages Number of RECV WRs completed. RDMA WRITE Bytes Number of bytes transferred in RDMA WRITE WRs. RDMA WRITE Messages Number of RDMA WRITE WRs posted. RDMA READ Bytes Number of bytes transferred via RDMA READ WRs. RDMA READ Messages Number of RDMA READ WRs posted. Here is an example of the server side output for 5 krperf threads: # cat /proc/krperf 1-amso0 0 0 16 1 12583960576 192016 0 0 2-mthca0 0 0 16 1 60108570624 917184 0 0 3-mthca0 0 0 16 1 59106131968 901888 0 0 4-mthca1 0 0 16 1 101658394624 1551184 0 0 5-mthca1 0 0 16 1 100201922560 1528960 0 0 # ============ EXPERIMENTAL ============ There are other options that enable micro benchmarks to measure the kernel rdma performance. These include: Opcode Operand Type Description ------------------------------------------------------------------------ poll none enable polling duplex none valid only with bw, this enables bidirectional mode tx-depth none set the sq depth for bw tests See the awkit* files to take the data logged in the kernel log and compute RTT/2 or Gbps results. Use these at your own risk. Server: echo "server,addr=10.95.3.140,port=9999" > /proc/krperf ib [27069.884232] krperf: proc read called... [27220.945037] krperf: proc write |server,addr=10.95.3.140,port=9999| [27220.945154] server [27220.945243] ipaddr (10.95.3.140) [27220.947003] port 9999 [27220.947123] created cm_id 000000002b46f391 [27220.947231] rdma_bind_addr successful [27220.947323] rdma_listen [27240.830449] cma_event type 4 cma_id 00000000a3734083 (child) [27240.830581] child cma 00000000a3734083 [27240.830781] Fastreg supported - device_cap_flags 0x4341c76 [27240.830896] created pd 000000008ee3ee0a [27240.831254] created cq 000000007afca1ac [27240.831556] created qp 000000000664f622 [27240.831651] krperf: krperf_setup_buffers called on cb 000000003586dd51 [27240.831788] krperf: reg rkey 0x18021010 page_list_len 1 [27240.831914] krperf: allocated & registered buffers... [27240.832012] accepting client connection request [27240.832466] cma_event type 9 cma_id 00000000a3734083 (child) [27240.832545] recv completion [27240.832571] Received rkey 20021101 addr c28e76d80 len 64 from peer [27240.832576] ESTABLISHED [27240.832869] server received sink adv [27240.832977] krperf: post_inv = 1, reg_mr new rkey 0x18021001 pgsz 4096 len 64 iova_start 429f4d340 [27240.833134] server posted rdma read req [27240.833153] rdma read completion [27240.833324] server received read complete [27240.833422] server posted go ahead [27240.833432] send completion [27240.833651] recv completion [27240.833755] Received rkey 20021102 addr c28e76300 len 64 from peer [27240.833866] server received sink adv [27240.833963] krperf: post_inv = 0, reg_mr new rkey 0x18021002 pgsz 4096 len 64 iova_start 429f4d340 [27240.834102] rdma write from lkey 18021002 laddr 429f4d340 len 64 [27240.834213] rdma write completion [27240.834330] server rdma write complete [27240.834427] server posted go ahead [27240.834436] send completion [27240.834654] recv completion [27240.834743] Received rkey 20021103 addr c28e76d80 len 64 from peer [27240.834849] server received sink adv [27240.834945] krperf: post_inv = 1, reg_mr new rkey 0x18021003 pgsz 4096 len 64 iova_start 429f4d340 [27240.835085] server posted rdma read req [27240.835097] rdma read completion [27240.835274] server received read complete [27240.835372] server posted go ahead [27240.835381] send completion [27240.835598] recv completion [27240.835687] Received rkey 20021104 addr c28e76300 len 64 from peer [27240.835793] server received sink adv [27240.835888] krperf: post_inv = 0, reg_mr new rkey 0x18021004 pgsz 4096 len 64 iova_start 429f4d340 [27240.836026] rdma write from lkey 18021004 laddr 429f4d340 len 64 [27240.836137] rdma write completion [27240.836234] server rdma write complete [27240.836332] server posted go ahead [27240.836340] send completion [27240.836560] recv completion [27240.836656] Received rkey 20021105 addr c28e76d80 len 64 from peer [27240.836781] server received sink adv [27240.836876] krperf: post_inv = 1, reg_mr new rkey 0x18021005 pgsz 4096 len 64 iova_start 429f4d340 [27240.837015] server posted rdma read req [27240.837028] rdma read completion [27240.837206] server received read complete [27240.837303] server posted go ahead [27240.837312] send completion [27240.837530] recv completion [27240.837619] Received rkey 20021106 addr c28e76300 len 64 from peer [27240.837725] server received sink adv [27240.837820] krperf: post_inv = 0, reg_mr new rkey 0x18021006 pgsz 4096 len 64 iova_start 429f4d340 [27240.837973] rdma write from lkey 18021006 laddr 429f4d340 len 64 [27240.838082] rdma write completion [27240.838180] server rdma write complete [27240.838276] server posted go ahead [27240.838285] send completion [27240.838781] cma_event type 10 cma_id 00000000a3734083 (child) [27240.838882] krperf: DISCONNECT EVENT... [27240.839000] krperf: wait for RDMA_READ_ADV state 10 [27240.839175] krperf: cq completion in ERROR state [27240.839338] krperf_free_buffers called on cb 000000003586dd51 [27240.839748] destroy cm_id 000000002b46f391 Client: echo "client,addr=10.95.3.140,port=9999,count=3" > /proc/krperf ib [29292.878783] krperf: proc read called... [29450.905890] krperf: proc write |client,addr=10.95.3.140,port=9999,count=3| [29450.905905] client [29450.905908] ipaddr (10.95.3.140) [29450.905910] port 9999 [29450.905912] count 3 [29450.905917] created cm_id 000000000c9829b2 [29450.905961] cma_event type 0 cma_id 000000000c9829b2 (parent) [29450.906119] cma_event type 2 cma_id 000000000c9829b2 (parent) [29450.906132] Fastreg supported - device_cap_flags 0x4341c76 [29450.906135] rdma_resolve_addr - rdma_resolve_route successful [29450.906141] created pd 0000000050f8ace5 [29450.906332] created cq 000000001916766c [29450.906557] created qp 000000006ed264fa [29450.906558] krperf: krperf_setup_buffers called on cb 00000000fd924db6 [29450.906595] krperf: reg rkey 0x20021110 page_list_len 1 [29450.906597] krperf: allocated & registered buffers... [29450.908793] cma_event type 9 cma_id 000000000c9829b2 (parent) [29450.908800] ESTABLISHED [29450.908814] rdma_connect successful [29450.908822] krperf: post_inv = 1, reg_mr new rkey 0x20021101 pgsz 4096 len 64 iova_start c28e76d80 [29450.908828] RDMA addr c28e76d80 rkey 20021101 len 64 [29450.908849] send completion [29450.909868] recv completion [29450.909972] krperf: post_inv = 1, reg_mr new rkey 0x20021102 pgsz 4096 len 64 iova_start c28e76300 [29450.909986] RDMA addr c28e76300 rkey 20021102 len 64 [29450.910006] send completion [29450.910872] recv completion [29450.910976] krperf: post_inv = 1, reg_mr new rkey 0x20021103 pgsz 4096 len 64 iova_start c28e76d80 [29450.910989] RDMA addr c28e76d80 rkey 20021103 len 64 [29450.911008] send completion [29450.911817] recv completion [29450.911919] krperf: post_inv = 1, reg_mr new rkey 0x20021104 pgsz 4096 len 64 iova_start c28e76300 [29450.911933] RDMA addr c28e76300 rkey 20021104 len 64 [29450.911952] send completion [29450.912777] recv completion [29450.912882] krperf: post_inv = 1, reg_mr new rkey 0x20021105 pgsz 4096 len 64 iova_start c28e76d80 [29450.912895] RDMA addr c28e76d80 rkey 20021105 len 64 [29450.912914] send completion [29450.913748] recv completion [29450.913850] krperf: post_inv = 1, reg_mr new rkey 0x20021106 pgsz 4096 len 64 iova_start c28e76300 [29450.913865] RDMA addr c28e76300 rkey 20021106 len 64 [29450.913884] send completion [29450.914721] recv completion [29450.914914] cq flushed [29450.915115] krperf_free_buffers called on cb 00000000fd924db6 [29450.915399] destroy cm_id 000000000c9829b2 END-OF-FILE
About
Kernel Mode RDMA Perf/Ping
Resources
Stars
Watchers
Forks
Packages 0
No packages published
Languages
- C 99.2%
- Makefile 0.8%