path: root/tools/lib/bpf
author    Sunil Muthuswamy    2019-05-23 01:10:44 +0200
committer David S. Miller     2019-05-23 03:00:36 +0200
commit    14a1eaa8820e8f3715f0cb3c1790edab67a751e9 (patch)
tree      1c70c329bce47f8b1defcd997d4bec27a5ecb345 /tools/lib/bpf
parent    hv_sock: perf: Allow the socket buffer size options to influence the actual s... (diff)
hv_sock: perf: loop in send() to maximize bandwidth
Currently, the hv_sock send() iterates once over the buffer, puts data into the VMBUS channel and returns. It does not take advantage of the case where a simultaneous reader is draining data from the channel. In that case, send() can maximize bandwidth (and consequently minimize CPU cycles) by iterating until the channel is found to be full.

Perf data:
Total data transfer: 10 GB per iteration
Single-threaded reader/writer, Linux hvsocket writer with Windows hvsocket reader
Packet size: 64 KB
CPU sys time was captured using the 'time' command for the writer to send 10 GB of data.
'Send Buffer Loop' is with the patch applied.
The values below are over 10 iterations.

|--------------------------------------------------------|
|        |       Current         |   Send Buffer Loop    |
|--------------------------------------------------------|
|        | Throughput | CPU sys  | Throughput | CPU sys  |
|        |   (MB/s)   | time (s) |   (MB/s)   | time (s) |
|--------------------------------------------------------|
| Min    |    407     |  7.048   |    401     |  5.958   |
|--------------------------------------------------------|
| Max    |    455     |  7.563   |    542     |  6.993   |
|--------------------------------------------------------|
| Avg    |    440     |  7.411   |    451     |  6.639   |
|--------------------------------------------------------|
| Median |    446     |  7.417   |    447     |  6.761   |
|--------------------------------------------------------|

Observations:
1. The average throughput does not change much for this scenario, most likely because the bottleneck on throughput lies elsewhere.
2. The average system (kernel) CPU time goes down by more than 10% with this change, for the same amount of data transferred.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
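For illustration only, below is a minimal userspace C sketch of the "loop until the channel is full" send pattern described in the commit message. It is not the actual hv_sock/VMBus code: struct toy_channel, channel_writable_bytes(), channel_write(), send_loop() and the size constants are hypothetical stand-ins chosen to show the control flow of writing until either the data or the channel space runs out.

/*
 * Minimal userspace sketch (not the actual hv_sock code) of the idea in
 * this patch: instead of copying one pass of data into the channel and
 * returning, keep copying until either all data is written or the
 * channel is genuinely full. The "channel" below is a toy fixed-size
 * buffer standing in for the VMBus ring; a concurrent reader draining
 * it would let the loop make further progress within one send() call.
 */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CHANNEL_CAPACITY  (64 * 1024)   /* stand-in for the VMBus ring size */
#define MAX_PACKET_SIZE   (16 * 1024)   /* stand-in for the per-write limit */

struct toy_channel {
	size_t used;                    /* bytes currently queued */
};

static size_t channel_writable_bytes(const struct toy_channel *chan)
{
	return CHANNEL_CAPACITY - chan->used;
}

/* Pretend to copy 'len' bytes of payload into the channel ring. */
static void channel_write(struct toy_channel *chan, const char *buf, size_t len)
{
	(void)buf;
	chan->used += len;
}

/*
 * Loop-in-send pattern: keep writing packets while there is both data
 * left and room in the channel; return the number of bytes accepted.
 */
static size_t send_loop(struct toy_channel *chan, const char *buf, size_t len)
{
	size_t sent = 0;

	while (sent < len && channel_writable_bytes(chan) > 0) {
		size_t chunk = len - sent;

		if (chunk > MAX_PACKET_SIZE)
			chunk = MAX_PACKET_SIZE;
		if (chunk > channel_writable_bytes(chan))
			chunk = channel_writable_bytes(chan);

		channel_write(chan, buf + sent, chunk);
		sent += chunk;
	}
	return sent;	/* the caller blocks or retries for the remainder */
}

int main(void)
{
	struct toy_channel chan = { .used = 0 };
	static char data[100 * 1024];

	memset(data, 'x', sizeof(data));
	printf("accepted %zu of %zu bytes before the channel filled up\n",
	       send_loop(&chan, data, sizeof(data)), sizeof(data));
	return 0;
}

With no reader draining the toy channel, the loop stops once the 64 KB capacity is reached; in the real scenario from the perf data above, a simultaneous reader frees space and the loop keeps going, which is where the bandwidth and CPU-time gains come from.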
Diffstat (limited to 'tools/lib/bpf')
0 files changed, 0 insertions, 0 deletions