Boosting Hyper with MAY

In this blog post I describe how to port hyper to use may for asynchronous IO. You will see how easy it is to port thread-based code to coroutine-based code, and how much performance may can deliver.

About Hyper

hyper is one of Rust's most important HTTP libraries. It is widely used and aims to give users the best asynchronous IO experience. Currently hyper uses tokio for async IO, but it also has a thread-based v0.10.x branch, which is used by the Rocket project.

I will port the thread-based version of hyper to a coroutine-based one.

The test server is hyper's simple hello example, and I list all the bench results as well.

Bench Settings

Machine Specs:

  • Logical Cores: 4 (4 cores x 1 thread)
  • Memory: 4 GB ECC DDR3 @ 1600 MHz
  • Processor: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
  • Operating System: Ubuntu VirtualBox guest

Bench client

wrk

The master branch, using tokio

Suppose you have cloned the hyper repo locally. Check out the master branch and run the following commands to start the server:

$ git checkout origin/master
$ cargo run --example=hello --release
......
Listening on http://127.0.0.1:3000 with 1 thread.

The bench result is:

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.30ms    0.92ms   18.82ms   93.04%
    Req/Sec    44.99k     7.35k    51.85k    88.50%
  895392 requests in 10.03s, 110.15MB read
Requests/sec:  89283.87
Transfer/sec:     10.98MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200 1.72s user 5.70s system 73% cpu 10.046 total

The v0.10.x branch, using threads

Check out the thread-based code and run the server. By default it spawns enough threads to keep all CPUs busy.

$ git checkout origin/0.10.x
$ cargo run --example=hello --release
......
Listening on http://127.0.0.1:3000

The bench result is:

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    52.54us   56.39us   12.14ms   99.36%
    Req/Sec    74.67k     7.90k    91.51k    65.00%
  742924 requests in 10.04s, 62.35MB read
Requests/sec:  73998.86
Transfer/sec:      6.21MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200 0.85s user 5.70s system 65% cpu 10.043 total

You can see that throughput is lower than on master (about 74k vs 89k requests/sec, even though the average latency is lower), because the thread-based version doesn't support async IO.
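To make the blocking model concrete, here is a minimal std-only sketch of a thread-based hello server: a fixed pool of OS threads all blocking in `accept()` on one shared listener, each serving one connection at a time with blocking reads and writes. This is my own illustration, not hyper 0.10's code; the `serve` and `handle` names, the hard-coded reply and the close-after-one-response behaviour are hypothetical simplifications. `main` binds an ephemeral port and sends one request to itself; to point wrk at it you would bind 127.0.0.1:3000 and join the worker handles instead.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::Arc;
use std::thread;

// Hypothetical fixed reply: a 12-byte "Hello World!" body, then close.
const RESPONSE: &[u8] =
    b"HTTP/1.1 200 OK\r\nContent-Length: 12\r\nConnection: close\r\n\r\nHello World!";

/// Serves one connection with blocking IO. The calling thread is stuck in
/// `read()` until the client has sent its request headers.
fn handle(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    let mut len = 0;
    // Read until the blank line that ends the headers, or the buffer fills.
    while len < buf.len() {
        let n = stream.read(&mut buf[len..])?;
        if n == 0 {
            break; // client closed the connection
        }
        len += n;
        if buf[..len].windows(4).any(|w| w == b"\r\n\r\n") {
            break;
        }
    }
    stream.write_all(RESPONSE)?;
    Ok(()) // `stream` is dropped here, which closes the connection
}

/// Starts `workers` OS threads that all block in `accept()` on one listener.
fn serve(listener: TcpListener, workers: usize) -> Vec<thread::JoinHandle<()>> {
    let listener = Arc::new(listener);
    (0..workers)
        .map(|_| {
            let listener = Arc::clone(&listener);
            thread::spawn(move || {
                // Each thread handles exactly one connection at a time.
                for stream in listener.incoming() {
                    if let Ok(stream) = stream {
                        let _ = handle(stream);
                    }
                }
            })
        })
        .collect()
}

fn main() -> std::io::Result<()> {
    // Port 0: the OS picks a free port for this self-test.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);
    let _handles = serve(listener, workers);

    // Send one request to our own server and print the last line (the body).
    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut reply = String::new();
    client.read_to_string(&mut reply)?;
    println!("{}", reply.lines().last().unwrap_or(""));
    Ok(())
}
```

While a worker is parked in `read()` it cannot serve anyone else, so concurrency is capped by the number of threads. As I understand may's design, its net types keep this same blocking-style code but yield the coroutine instead of holding an OS thread while waiting.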

The coroutine-based version

Now apply the patch to the thread-based version. The patch is small: it just replaces the necessary std APIs with may APIs and changes the thread count to 1000 coroutines.

$ git checkout -b coroutine
$ git remote add may https://github.com/Xudong-Huang/hyper
$ git fetch may
$ git cherry-pick may/master
$ git show HEAD --stat
......
Cargo.toml | 1 +
examples/hello.rs | 2 ++
src/client/pool.rs | 3 ++-
src/lib.rs | 1 +
src/net.rs | 4 ++--
src/server/listener.rs | 19 +++++++++++--------
src/server/mod.rs | 7 ++++---
src/server/response.rs | 7 ++++---
8 files changed, 27 insertions(+), 17 deletions(-)

First, modify examples/hello.rs to use only one IO worker thread:

may::config().set_io_workers(1).set_stack_size(0x2000);
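The `0x2000` passed to `set_stack_size` is 8192. The back-of-the-envelope sketch below shows what that means for the 1000 coroutines in the patch. It rests on one loudly stated assumption: that may counts this size in machine words (usize) rather than bytes, so check the documentation of the may version you build with. For comparison it uses Rust's default std thread stack size, which is currently 2 MiB on Tier-1 platforms.

```rust
use std::mem::size_of;

/// Number of coroutines the patch uses.
const COROUTINES: usize = 1000;
/// Value passed to `set_stack_size` in examples/hello.rs.
const STACK_SIZE_ARG: usize = 0x2000;
/// ASSUMPTION: may counts the stack size in machine words (usize), not bytes.
const BYTES_PER_UNIT: usize = size_of::<usize>();
/// Default std::thread stack size on Rust's Tier-1 platforms (2 MiB).
const STD_THREAD_STACK: usize = 2 * 1024 * 1024;

/// Converts a byte count to MiB.
fn mib(bytes: usize) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

fn main() {
    let per_coroutine = STACK_SIZE_ARG * BYTES_PER_UNIT;
    let all_coroutines = per_coroutine * COROUTINES;
    let all_threads = STD_THREAD_STACK * COROUTINES;
    println!("0x2000 = {} units = {} bytes per coroutine stack", STACK_SIZE_ARG, per_coroutine);
    println!("{} coroutine stacks:  {:.1} MiB", COROUTINES, mib(all_coroutines));
    println!("{} std thread stacks: {:.1} MiB", COROUTINES, mib(all_threads));
}
```

Under that assumption, on a 64-bit target each coroutine gets 64 KiB and all 1000 stacks come to 62.5 MiB, against 2000 MiB for the same number of default std threads.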

The test result is:

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.26ms    6.48ms  105.13ms   98.35%
    Req/Sec    38.84k     3.21k    45.75k    85.00%
  773544 requests in 10.05s, 64.92MB read
Requests/sec:  76956.66
Transfer/sec:      6.46MB

OK, it's almost the same as the thread version (about 77k vs 74k requests/sec).

Now change the number of IO workers to 3 and see what happens:

may::config().set_io_workers(3).set_stack_size(0x2000);

The bench result is:

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.50ms    2.10ms   29.28ms   89.43%
    Req/Sec    95.76k    19.05k   139.73k    58.50%
  1910441 requests in 10.07s, 160.33MB read
Requests/sec: 189790.65
Transfer/sec:     15.93MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200 1.86s user 10.68s system 124% cpu 10.071 total

Much better now! That is about 190k requests/sec, more than twice the tokio-based master.
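The headline numbers are easier to compare as ratios. This small sketch pulls the value out of wrk's `Requests/sec:` summary line and divides; `requests_per_sec` is a hypothetical helper name, and the inputs are the summary lines from the runs above.

```rust
/// Hypothetical helper: extracts the number from wrk's "Requests/sec:" line.
fn requests_per_sec(wrk_output: &str) -> Option<f64> {
    wrk_output
        .lines()
        .find_map(|line| line.trim().strip_prefix("Requests/sec:"))
        .and_then(|value| value.trim().parse::<f64>().ok())
}

fn main() {
    // Summary lines copied from the four runs in this post.
    let runs = [
        ("tokio master, 1 thread", "Requests/sec: 89283.87"),
        ("v0.10.x, threads", "Requests/sec: 73998.86"),
        ("may, 1 io worker", "Requests/sec: 76956.66"),
        ("may, 3 io workers", "Requests/sec: 189790.65"),
    ];
    let may3 = requests_per_sec(runs[3].1).expect("parse");
    for (name, output) in &runs {
        let rps = requests_per_sec(output).expect("parse");
        // How many times faster the 3-worker may build is than this run.
        println!("{:<24}{:>12.2} req/s   may (3 workers) is {:.2}x this", name, rps, may3 / rps);
    }
}
```

With these inputs the 3-worker may build comes out at about 2.1x the tokio master and about 2.6x the thread-based v0.10.x.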

Conclusion

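  1. Coroutine-based IO is much more powerful than thread-based IO: with 3 IO workers the may port reached about 190k requests/sec, versus about 74k for the thread-based v0.10.x.
  2. It's quite easy to port a thread-based system to a coroutine-based one: the whole patch is 27 insertions and 17 deletions across 8 files.
  3. hyper's master is really fast given that it runs on only one thread: about 89k requests/sec, faster than the coroutine version with a single IO worker (about 77k). Maybe its whole architecture is just totally different.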