Hello,

after finishing the implementation part last time, this week it is time for some performance comparison. Although both the HVM and the PVH port are not yet 100% feature complete, they are in a state where basic benchmarking is very much possible. The results, as you will see, look very promising when compared with those of the original implementation.

HermitCore includes a number of benchmarks to measure its performance. This post compares the results of some of these benchmarks for HermitCore running in different environments. The following environments were tested:

  • KVM
    HermitCore is started in a virtual machine on Linux using QEMU with KVM acceleration

  • QEMU
    A virtual machine on Linux using only QEMU without acceleration

  • HVM
    Running as a fully virtualized guest on Xen

  • PVH
    A hybrid PVH guest running on Xen

For each test, the virtual machines were given the same amount of resources. They were started with one virtual CPU core and 512 megabytes of RAM. All tests were performed on the same machine, a Lenovo Thinkpad T470 with the following specifications:

  • Intel Core i5-7300U CPU at 2.6 GHz with 4 logical cores

  • 16 gigabytes of RAM

  • 256 gigabytes of SSD storage

  • Running Arch Linux

Basic Benchmark

First, the overhead of a system call and of a reschedule was measured. The basic benchmark invokes the system calls getpid and sched_yield up to 10,000 times after the caches have been warmed up and measures how many CPU cycles the respective calls need on average. getpid is the system call with the shortest runtime, so it can be used to determine the general overhead of a system call. sched_yield checks whether another task is ready to be executed and switches to that task. The benchmark also checks how long it takes to allocate a megabyte of memory and how long the first write access to a newly allocated page takes.
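To make the methodology a bit more concrete, here is a minimal sketch of such a cycle-count measurement. This is not the actual HermitCore benchmark code, only an illustration of the technique: warm up the caches, then time a fixed number of calls with the rdtsc instruction and average the result. The loop counts are illustrative.

```c
/* Minimal sketch of the measurement technique described above.
 * Not the actual HermitCore benchmark code; loop counts are illustrative. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sched.h>

#define N 10000

/* Read the CPU's time stamp counter. */
static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

static uint64_t measure(void (*fn)(void))
{
	uint64_t start, end;
	int i;

	for (i = 0; i < 1000; i++)	/* warm-up: fill caches and TLB */
		fn();

	start = rdtsc();
	for (i = 0; i < N; i++)
		fn();
	end = rdtsc();

	return (end - start) / N;	/* average cycles per call */
}

static void do_getpid(void)      { getpid(); }
static void do_sched_yield(void) { sched_yield(); }

int main(void)
{
	printf("getpid:      %llu cycles\n", (unsigned long long)measure(do_getpid));
	printf("sched_yield: %llu cycles\n", (unsigned long long)measure(do_sched_yield));
	return 0;
}
```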

System activity     KVM    QEMU     HVM     PVH
getpid                9     122      12      12
sched_yield          79     360      90      83
malloc             5858   51812   51311   86658
write access       3368   34626   42607   83368
Results of the basic benchmark (in CPU cycles)


It is not surprising that HermitCore running on KVM shows the best overall performance. It is interesting, however, that the PVH and HVM guests can almost keep up with it in terms of system call performance. It is also surprising that the memory accesses of the PVH guest are much slower than those of the HVM guest, considering that both rely on hardware-assisted paging for their page table management.

Stream Benchmark

STREAM (Sustainable Memory Bandwidth in Current High Performance Computers) is a synthetic benchmark, originally written in Fortran, that measures the performance of four simple long-vector operations. They represent the elementary operations on which vector codes are based and are specifically designed to eliminate data re-use. The results show the sustainable memory bandwidth in megabytes per second and the corresponding computation time for each of the four vector operations.

Name    Function                 Bytes per iteration
Copy    a(i) = b(i)              16
Scale   a(i) = q * b(i)          16
Add     a(i) = b(i) + c(i)       24
Triad   a(i) = b(i) + q * c(i)   24
Functions used in the STREAM benchmark
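For readers more familiar with C notation, the sketch below expresses the four kernels from the table above in C; the array size and the scalar q are illustrative, not the values used for the measurements in this post. It also shows how the reported bandwidth follows from the bytes moved per iteration, the array size and the best (minimum) time over the repetitions.

```c
/* Sketch of the four STREAM kernels in C, following the table above.
 * STREAM_ARRAY_SIZE and q are illustrative values only. */
#include <stddef.h>

#define STREAM_ARRAY_SIZE 10000000	/* number of double elements per array */

static double a[STREAM_ARRAY_SIZE], b[STREAM_ARRAY_SIZE], c[STREAM_ARRAY_SIZE];

void stream_kernels(const double q)
{
	size_t i;

	for (i = 0; i < STREAM_ARRAY_SIZE; i++)	/* Copy:  16 bytes/iteration */
		a[i] = b[i];
	for (i = 0; i < STREAM_ARRAY_SIZE; i++)	/* Scale: 16 bytes/iteration */
		a[i] = q * b[i];
	for (i = 0; i < STREAM_ARRAY_SIZE; i++)	/* Add:   24 bytes/iteration */
		a[i] = b[i] + c[i];
	for (i = 0; i < STREAM_ARRAY_SIZE; i++)	/* Triad: 24 bytes/iteration */
		a[i] = b[i] + q * c[i];
}

/* Bandwidth in MB/s: total bytes moved by one kernel divided by the
 * fastest of the timed repetitions. */
double bandwidth_mbs(double bytes_per_iteration, double min_time_s)
{
	return 1.0e-6 * bytes_per_iteration * STREAM_ARRAY_SIZE / min_time_s;
}
```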


Operation   Environment   Bandwidth (MB/s)   Avg time (s)   Min time (s)   Max time (s)
Copy        KVM                23342.8         0.009865       0.009596       0.014814
Copy        QEMU                5153.7         0.045812       0.043464       0.047941
Copy        HVM                24294.7         0.009369       0.009220       0.010860
Copy        PVH                24141.9         0.009469       0.009278       0.012305
Scale       KVM                16556.5         0.013793       0.013529       0.017478
Scale       QEMU                1094.8         0.218594       0.204610       0.229119
Scale       HVM                17263.1         0.013157       0.012976       0.015088
Scale       PVH                17189.3         0.013252       0.013031       0.019612
Add         KVM                19264.9         0.017679       0.017441       0.020491
Add         QEMU                1562.1         0.225724       0.215092       0.237715
Add         HVM                20038.9         0.016974       0.016767       0.018722
Add         PVH                19955.0         0.017068       0.016838       0.022021
Triad       KVM                19088.8         0.017928       0.017602       0.021669
Triad       QEMU                 897.4         0.394932       0.374413       0.415066
Triad       HVM                19856.3         0.017111       0.016922       0.018756
Triad       PVH                19772.2         0.017232       0.016994       0.021154
Results of the STREAM benchmark


It is very surprising that an HVM guest running on Xen outperforms all other environments in terms of memory bandwidth and the corresponding computation time, especially considering that in this test setup an HVM guest is effectively a virtual machine running inside another virtual machine (the Xen hypervisor), which in turn runs on top of Linux. That said, the KVM and PVH guests achieve almost the same results, deviating by only one to four percent. It is also apparent that virtualization based purely on QEMU, without hardware acceleration, is rather inefficient and slow.

Boot time

Finally, the boot time of the VMs was compared. The included hello world test was run in all environments and the reported boot time was noted. This is the time HermitCore needs until it is able to start the hello world application.

Environment   Boot time (ms)
KVM             80
QEMU            60
HVM           6140
PVH             80
Boot time for the different environments


HermitCore boots in roughly the same time in all environments. The only exception is the HVM guest: detecting and starting the emulated devices provided by Xen takes comparatively long, resulting in a boot time roughly 80 times slower than the others.

Conclusion

The results of the previous benchmarks show that, when running as either an HVM or a PVH guest on Xen, HermitCore is definitely able to perform as well as the original implementation running as a KVM-accelerated virtual machine in QEMU. There are some small exceptions, notably the very slow boot time of the HVM guest compared to all others and the long memory access times (in CPU cycles) of the PVH and HVM guests, but the overall results are very similar.

I hope you will join me again next time for the last post of this series. There I will provide a short summary and an outlook on possible future work.

So long,
Jan