Xudong-Huang's Blog

Why hyper coroutine version is slower than the future version

Posted on 2018-01-10

Last time I posted Boosting Hyper with MAY, where you can see that the coroutine version is slower than the future version with a single worker thread. I’m curious about why it’s slow, and some questions came to mind. Maybe the context switching cost is too high? Or maybe the logic of the thread version is not as optimized as the future version? After all, the thread version of hyper is not as actively developed as the master branch.

So I decided to profile the server to see what actually happens.

Install the profiling tools

I use cargo-profiler. It’s not actively developed now, but it is still usable.

I used the following commands to install it on my Ubuntu VM:

$ sudo apt-get install valgrind
$ cargo install cargo-profiler

Modify the echo server

We need to let the server exit normally to make the profiling tool happy. Just insert this code into the echo server example:

fn main() {
    ......
    // insert the code to let the server exit after 10s
    std::thread::spawn(|| {
        std::thread::sleep(std::time::Duration::from_secs(10));
        std::process::exit(0);
    });
    println!("Listening on http://127.0.0.1:3000");
}

Running the profile

We use the release build to see what happens inside hyper.

$ cd hyper
$ cargo build --example=hello --release
$ cargo profiler callgrind --bin target/release/examples/hello

And run the client in another terminal at the same time.

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 20

After 10 seconds you will see the result printed out by cargo-profiler.

The may version result

70,276,542 (8.9%) ???:_..std..io..Write..write_fmt..Adaptor....a$C$..T....as..core..fmt..Write..::write_str 
-----------------------------------------------------------------------
64,212,448 (8.2%) memcpy-sse2-unaligned.S:__memcpy_sse2_unaligned
-----------------------------------------------------------------------
61,684,172 (7.8%) ???:_..hyper..header..NewlineReplacer....a$C$....b....as..core..fmt..Write..::write_str
-----------------------------------------------------------------------
55,352,952 (7.0%) ???:_..std..io..buffered..BufWriter..W....as..std..io..Write..::write
-----------------------------------------------------------------------
47,665,042 (6.1%) ???:_..hyper..http..h1..HttpWriter..W....as..std..io..Write..::write
-----------------------------------------------------------------------

The future version result

76,417,403 (12.0%) memcpy-sse2-unaligned.S:__memcpy_sse2_unaligned
-----------------------------------------------------------------------
71,438,100 (11.2%) memset.S:__GI_memset
-----------------------------------------------------------------------
32,188,192 (5.0%) ???:_..futures..future..map_err..MapErr..A$C$..F....as..futures..future..Future..::poll
-----------------------------------------------------------------------
27,514,102 (4.3%) ???:hyper::proto::h1::parse::_..impl..hyper..proto..Http1Transaction..for..hyper..proto..ServerTransaction..::parse
-----------------------------------------------------------------------
22,228,235 (3.5%) ???:tokio_core::reactor::Core::poll
-----------------------------------------------------------------------
19,713,445 (3.1%) ???:_..hyper..proto..dispatch..Dispatcher..D$C$..Bs$C$..I$C$..B$C$..T$C$..K....::poll_read

Conclusion

Apparently the hyper thread version is not fully optimized. It spends most of its time formatting strings, while the future version spends most of its time copying memory, which makes sense because the server just echoes strings back.
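To make the difference between the two hot spots concrete, here is a minimal sketch using only the standard library. It is not hyper's actual code: the function names header_via_fmt and header_via_bytes are made up for this illustration. The first goes through the core::fmt machinery (the kind of write_fmt/write_str calls that dominate the may profile), the second only copies bytes (which ends up as memcpy).

```rust
use std::io::Write;

/// Hypothetical helper: builds a header line through the formatting
/// machinery, which dispatches through `write_fmt` and `write_str`.
fn header_via_fmt(name: &str, value: &str) -> Vec<u8> {
    let mut buf = Vec::new();
    write!(buf, "{}: {}\r\n", name, value).unwrap();
    buf
}

/// Hypothetical helper: builds the same header line by copying byte
/// slices directly, with no formatting involved.
fn header_via_bytes(name: &str, value: &str) -> Vec<u8> {
    let mut buf = Vec::with_capacity(name.len() + value.len() + 4);
    buf.extend_from_slice(name.as_bytes());
    buf.extend_from_slice(b": ");
    buf.extend_from_slice(value.as_bytes());
    buf.extend_from_slice(b"\r\n");
    buf
}
```

Both helpers produce identical bytes; only the amount of work done per header differs.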

We also notice that the future-based version spends a noticeable amount of time in the framework’s poll method:

(3.5%) ???:tokio_core::reactor::Core::poll

The context switch cost of the may version is not that heavy. I found this in the profile:

(1.8%) ???:_..F..as..generator..gen_impl..FnBox..::call_box

P.S.

For a fair comparison of simple HTTP echo servers, I think the most suitable candidates are tokio_minihttp and the http example in the may project. They don’t involve much framework code and just do the echo.

Below are the results on my Ubuntu VM.

tokio_minihttp(1 thread)

$ wrk http://127.0.0.1:8080 -d 10 -t 2 -c 20
Running 10s test @ http://127.0.0.1:8080
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   262.43us  818.35us  17.52ms   97.63%
    Req/Sec    61.71k     9.21k   79.79k    84.50%
  1228552 requests in 10.01s, 124.19MB read
Requests/sec: 122691.03
Transfer/sec:     12.40MB
wrk http://127.0.0.1:8080 -d 10 -t 2 -c 20  1.80s user 7.25s system 90% cpu 10.017 total

may http example(1 thread)

$ wrk http://127.0.0.1:8080 -d 10 -t 2 -c 20
Running 10s test @ http://127.0.0.1:8080
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   502.23us    2.12ms  28.01ms   95.76%
    Req/Sec    87.78k     8.60k  113.63k    79.50%
  1748461 requests in 10.02s, 155.07MB read
Requests/sec: 174571.22
Transfer/sec:     15.48MB
wrk http://127.0.0.1:8080 -d 10 -t 2 -c 20  1.80s user 9.59s system 113% cpu 10.019 total

may http example(2 threads)

$ wrk http://127.0.0.1:8080 -d 10 -t 2 -c 20
Running 10s test @ http://127.0.0.1:8080
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   119.37us  673.66us  15.79ms   98.46%
    Req/Sec   180.75k    18.13k  197.66k    87.00%
  3599209 requests in 10.01s, 319.22MB read
Requests/sec: 359663.27
Transfer/sec:     31.90MB
wrk http://127.0.0.1:8080 -d 10 -t 2 -c 20  3.68s user 15.86s system 195% cpu 10.010 total

Introducing may_rpc

Posted on 2018-01-03

I’d like to introduce the may_rpc project, a coroutine-based RPC framework for Rust that is powered by may. It’s inspired by tarpc, which has more detailed documentation.

may_rpc allows users to write coroutine-style code for server/client logic, which is very efficient.

How to use the framework

Declare the rpc specification

rpc! {
    /// the connection type, default is Tcp
    /// valid types: Udp, Tcp, Multiplex
    net: Multiplex;
    /// Say hello
    rpc hello(name: String) -> String;
    /// add two numbers
    rpc add(x: u32, y: u32) -> u32;
}

The rpc! macro expands to a collection of items that form an RPC service. In the above example, the rpc! macro is called with a user-supplied RPC spec. This generates the RpcClient type, the RpcServer type and the RpcSpec trait. These generated types make it easy and ergonomic to write a server and a client without dealing with sockets or serialization directly. Simply implement the generated trait for the server, and that’s it.

When the net type is Multiplex, you can freely clone the client instance without opening multiple connections to the server.
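To give a feeling for the shape of such a service, here is a rough in-process sketch that uses only std threads and channels. It is not what the rpc! macro really expands to: the names HelloSpec, Request, SketchClient, start_sketch_server and EchoService are all hypothetical, and there are no sockets or serialization involved. It only shows the idea of a spec trait, a typed client, and a server loop that dispatches requests to the trait implementation.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical spec trait, playing the role of the generated trait.
trait HelloSpec {
    fn hello(&self, name: String) -> String;
    fn add(&self, x: u32, y: u32) -> u32;
}

/// One request per rpc method, each carrying a channel for the reply.
enum Request {
    Hello(String, Sender<String>),
    Add(u32, u32, Sender<u32>),
}

/// Hypothetical client: every method sends a request and waits for the reply.
struct SketchClient {
    tx: Sender<Request>,
}

impl SketchClient {
    fn hello(&self, name: String) -> String {
        let (reply_tx, reply_rx) = channel();
        self.tx.send(Request::Hello(name, reply_tx)).unwrap();
        reply_rx.recv().unwrap()
    }

    fn add(&self, x: u32, y: u32) -> u32 {
        let (reply_tx, reply_rx) = channel();
        self.tx.send(Request::Add(x, y, reply_tx)).unwrap();
        reply_rx.recv().unwrap()
    }
}

/// Starts a server thread that dispatches requests to `service`.
/// The thread exits when every client handle has been dropped.
fn start_sketch_server<S: HelloSpec + Send + 'static>(service: S) -> SketchClient {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for request in rx {
            match request {
                Request::Hello(name, reply) => {
                    let _ = reply.send(service.hello(name));
                }
                Request::Add(x, y, reply) => {
                    let _ = reply.send(service.add(x, y));
                }
            }
        }
    });
    SketchClient { tx }
}

/// Example implementation of the hypothetical spec.
struct EchoService;

impl HelloSpec for EchoService {
    fn hello(&self, name: String) -> String {
        name
    }

    fn add(&self, x: u32, y: u32) -> u32 {
        x + y
    }
}
```

In this sketch the user only writes the EchoService implementation and calls methods on the client; the request plumbing is the part a macro can generate.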

Implement RpcSpec for server

struct HelloImpl;

impl RpcSpec for HelloImpl {
    fn hello(&self, name: String) -> String {
        name
    }

    fn add(&self, x: u32, y: u32) -> u32 {
        x + y
    }
}

Create the server

let server = RpcServer(HelloImpl).start("127.0.0.1:3333").unwrap();

The returned server is just a coroutine handle. You can shut it down later if necessary.

Create a client and call the service

let client = RpcClient::connect("127.0.0.1:3333").unwrap();
assert_eq!(&client.hello(String::from("may_rpc")).unwrap(), "may_rpc");
assert_eq!(client.add(10, 32).unwrap(), 42);

Shut down the service gracefully

unsafe { server.coroutine().cancel() };
server.join().ok();

This will kill all the coroutines spawned by the server, even if they are not finished.

You can see more examples in the project.

Performance

Just run the throughput example under may_rpc. The service returns immediately after receiving a request, so the result is just the framework’s maximum limit.

Machine Specs:

Logical Cores: 4 (4 cores x 2 threads)
Memory: 4gb ECC DDR3 @ 1600mhz
Processor: CPU Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
Operating System: Windows 10

Test config:

To fully utilize the CPU, I use the following config on my laptop. Normally it will reach the maximum speed with workers:io_workers = 3:2, you can tune yours accordingly.

may::config().set_workers(6).set_io_workers(4);

result:

$ cargo run --example=throughput --release
......
     Running `target\release\examples\throughput.exe`
206127.39 rpc/second

Easy coding with high performance, enjoy your life!

Announcing may_actor

Posted on 2018-01-01

I’d like to introduce a simple actor library implemented on top of MAY.

With this library, when you create an actor:

you don’t need to declare the messages that are passed to the actor
you don’t have to implement an “actor” interface or trait for your actor

You just wrap your actual actor struct in the Actor<T> type. It acts as a handle used to access the actor.

fn it_works() {
    let i = 0u32;
    let a = Actor::new(i);
    a.call(|me| *me += 2);
    a.call(|_me| panic!("support panic inside"));
    a.call(|me| *me += 4);
    // view waits until all previous messages have been processed
    a.view(|me| assert_eq!(*me, 6));
}

You send a message to the actor with the call API. It accepts a closure that takes &mut T as its parameter, so you can change the actor’s internal state. The closure is sent to a queue inside the actor, and the actor executes it on the coroutine associated with it. This API does not block the caller and returns immediately.

You can also view the actor’s internal state with the view API. It accepts a closure that takes &T as its parameter, so you can access the state without permission to modify it. The closure is executed by the associated coroutine once there are no other pending messages to process, and view blocks until the closure returns. So during the view stage, you are guaranteed that nobody else is modifying the actor.

The actor can be cloned to get a new handle, just like Arc<T>. After all the actor handles are dropped, the associated coroutine exits automatically.
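The queue-of-closures idea described above can be sketched with the standard library alone. The following is only an illustration under stated assumptions: it uses an OS thread and an mpsc channel instead of a MAY coroutine, the name ThreadActor is made up, and it does not handle panics inside closures the way the real library does.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A message is just a boxed closure that may mutate the state.
type Message<T> = Box<dyn FnOnce(&mut T) + Send>;

/// Hypothetical thread-backed handle, loosely modelled on the description above.
pub struct ThreadActor<T> {
    tx: Sender<Message<T>>,
}

impl<T> Clone for ThreadActor<T> {
    /// Cloning only clones the queue sender, so all handles share one actor.
    fn clone(&self) -> Self {
        ThreadActor { tx: self.tx.clone() }
    }
}

impl<T: Send + 'static> ThreadActor<T> {
    /// Moves `state` into a dedicated thread that runs queued closures in order.
    /// The thread exits once every handle (sender) has been dropped.
    pub fn new(mut state: T) -> Self {
        let (tx, rx) = channel::<Message<T>>();
        thread::spawn(move || {
            for message in rx {
                message(&mut state);
            }
        });
        ThreadActor { tx }
    }

    /// Queues a closure and returns immediately without waiting for it to run.
    pub fn call<F>(&self, f: F)
    where
        F: FnOnce(&mut T) + Send + 'static,
    {
        self.tx.send(Box::new(f)).expect("actor thread has exited");
    }

    /// Queues a read-only closure and blocks until it has run, so every
    /// message queued earlier has been processed by then.
    pub fn view<R, F>(&self, f: F) -> R
    where
        F: FnOnce(&T) -> R + Send + 'static,
        R: Send + 'static,
    {
        let (result_tx, result_rx) = channel();
        self.call(move |state| {
            let _ = result_tx.send(f(&*state));
        });
        result_rx.recv().expect("actor thread has exited")
    }
}
```

Because all closures run on a single thread in queue order, the state needs no lock, and view observes the effect of every earlier call.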

You can also unsafely transmute a &self reference into an actor handle Actor<T>. This is convenient when your implementation needs the actual handle to pass to other actors.

However, this simple library doesn’t support spawning actors across processes.

For more examples, please visit the repo on github.

Release May 0.2.0

Posted on 2018-01-01

I’m glad to release MAY 0.2.0. You can refer to the changes on GitHub.

ChangeLog

v0.2.0

MAY 0.2.0 focuses on changing the spawn APIs to unsafe so that they follow Rust’s safety rules. This is a breaking change from v0.1.0.

all the spawn APIs are declared unsafe because of the TLS access and stack size limitations
add the go! macro to make the spawn APIs easier to use without writing an unsafe block explicitly
add a simple http server example
remove unsafe code for MAY configuration
improve documentation

Why spawn a coroutine is unsafe

You can refer to the caveats of May for the following two reasons:

if the user accesses TLS, it may trigger undefined behavior
if the user’s coroutine implementation exceeds the stack limit, it will trigger undefined behavior

To follow Rust’s safety rules, the spawn APIs must be declared with the unsafe keyword.

The `go!` macro

Because the spawn APIs are unsafe, you now have to add an unsafe block when creating a new coroutine, like this:

unsafe {
    may::coroutine::spawn(|| {
       ... 
    });
}

However, surrounding everything in the unsafe block has two drawbacks:

you can’t get enough optimization from the Rust compiler for the unsafe block
it’s a little annoying to write unsafe everywhere

Better code looks like this, which enables the optimization for the coroutine implementation:

let closure = move || {
    ...
};

unsafe { 
    may::coroutine::spawn(closure);
}

So I introduced the go! macro, which does the same thing for you. It’s just a thin wrapper around the spawn APIs.

let join_handle = go!(move || { ... });
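The wrapping pattern itself is easy to sketch with the standard library. The code below is not the real go! macro: spawn_unchecked is a hypothetical stand-in for an unsafe spawn API (here it just forwards to std::thread::spawn), and go_sketch! is a made-up name. It only shows how a macro can keep the user's closure outside the unsafe block.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for an unsafe spawn API. In this sketch it simply
/// forwards to `std::thread::spawn`; the `unsafe` only mimics the signature.
unsafe fn spawn_unchecked<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(f)
}

/// Sketch of a thin wrapper macro: the closure expression is evaluated
/// outside the unsafe block, and only the spawn call sits inside it.
macro_rules! go_sketch {
    ($f:expr) => {{
        let closure = $f;
        unsafe { spawn_unchecked(closure) }
    }};
}

/// Small usage example: spawns a closure through the macro and joins it.
fn answer_via_macro() -> i32 {
    let handle = go_sketch!(move || 40 + 2);
    handle.join().unwrap()
}
```

The caller never writes unsafe, and the body of the closure is compiled as ordinary safe code.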

Happy new year for 2018!

Tune Stack Size when using MAY

Posted on 2017-12-26

Because MAY doesn’t support automatic stack growth, you have to determine the stack size of the coroutine before running your application. Read this for more information.

When creating coroutines in MAY, the library allocates a chunk of memory from the heap as the stack for each coroutine. The default stack size is 4k words, which is 32k bytes on a 64-bit system. In most cases this stack size is big enough for simple coroutines. But if you have a very complicated coroutine that needs a bigger stack, or you want to shrink the stack to save some memory, you can set the stack size explicitly to a reasonable value.

Change default coroutine stack size

You can configure the default stack size at the initialization stage. When you create a coroutine and don’t specify a stack size for it, it uses this default stack size.

The unit is the word. The following code sets the default stack size to 8k bytes.

may::config().set_stack_size(0x400);
// this coroutine would use 8K bytes stack
may::coroutine::spawn(...);
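Since the unit is the machine word, the byte size depends on the target. The conversion can be sketched as follows; the helper names words_to_bytes and words_to_bytes_here are made up for this example and are not part of MAY.

```rust
use std::mem::size_of;

/// Converts a stack size in words to bytes for a given word size in bytes.
fn words_to_bytes(words: usize, word_bytes: usize) -> usize {
    words * word_bytes
}

/// Same conversion for the target this code is compiled for,
/// where one word is `size_of::<usize>()` bytes.
fn words_to_bytes_here(words: usize) -> usize {
    words_to_bytes(words, size_of::<usize>())
}
```

With 8-byte words, 0x400 words is 8k bytes and the default of 0x1000 words is 32k bytes; on a 32-bit target the same settings give half of that.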

Set stack size for a single coroutine

You can use the coroutine Builder to specify a stack size for a newly spawned coroutine. This overrides the global default stack size.

The following code spawns a coroutine with a 16k-byte stack:

// this coroutine would use 16K bytes stack
may::coroutine::Builder::new()
        .stack_size(0x800)
        .spawn(...)
        .unwrap();

Get the coroutine stack usage

If you need to know the exact stack usage of your coroutine, you can set the stack size to an odd number. When the stack size passed in is odd, MAY initializes the whole stack of the coroutine with a special data pattern, so the footprint of the stack can be detected while the program executes. After the coroutine finishes, MAY prints out the actual usage.

For example, the code below

extern crate may;
use std::io::{self, Read};

fn main() {
    may::coroutine::Builder::new()
        .name("test".to_owned())
        .stack_size(0x1000 - 1)
        .spawn(|| {
            println!("hello may");
        })
        .unwrap();

    println!("Press any key to continue...");
    let _ = io::stdin().read(&mut [0u8]).unwrap();
}

would give output like this:

hello may
coroutine name = Some("test"), stack size = 4095, used size = 266
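The pattern-filling technique can be simulated on a plain buffer. This is only a sketch of the general idea, not MAY's implementation: the PATTERN value and the function names are assumptions, and a Vec stands in for the real stack memory. It assumes the stack grows downward, so the used part sits at the high end of the buffer.

```rust
/// Assumed marker value; the real pattern used by the library may differ.
const PATTERN: usize = 0xEEEE_EEEE;

/// Fills the whole simulated stack with the marker pattern.
fn paint(stack: &mut [usize]) {
    for word in stack.iter_mut() {
        *word = PATTERN;
    }
}

/// Simulates a coroutine that touched `words` words at the high end of the
/// stack (downward growth). Panics if `words` exceeds the stack length.
fn simulate_use(stack: &mut [usize], words: usize) {
    let len = stack.len();
    for word in &mut stack[len - words..] {
        *word = 0;
    }
}

/// Counts used words by scanning from the low end until the first word
/// that no longer holds the pattern.
fn used_words(stack: &[usize]) -> usize {
    let untouched = stack.iter().take_while(|&&w| w == PATTERN).count();
    stack.len() - untouched
}
```

Painting a 4095-word buffer and simulating 266 used words makes used_words report 266, which mirrors the shape of the output above. The measurement is only a high-water mark: a word that happens to be written with the pattern value itself would be miscounted.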

MAY restrictions

Posted on 2017-12-25

Though MAY is easy to use, it does have some restrictions, and these restrictions are hard to remove. You must know them very well before writing any coroutine code.

If you are not aware of them, your application can easily lose performance or even trigger undefined behavior.

I will list 4 rules below.

Don’t call thread blocking APIs

This is obvious. Calling a thread-blocking API in a coroutine halts the worker thread, so it can’t schedule other ready coroutines. This hurts performance.

The blocking APIs include:

blocking IO operations such as socket read()/write()
sync primitive APIs such as Mutex/Condvar
thread-related APIs such as sleep()/yield_now()
functions that call those blocking APIs internally, such as third-party libraries

The solution is to call the MAY APIs instead, and to port the necessary dependencies to MAY-compatible versions.

Don’t use Thread Local Storage

Accessing TLS in a coroutine can trigger undefined behavior, and the issue will be hard to debug.

There is a post that already covers this topic. The solution is to use Coroutine Local Storage instead.

But if you depend on a third-party function that uses TLS, you will likely get hurt sooner or later. There is an issue that discusses this a bit.

Currently calling libstd APIs is safe in MAY coroutines.

Don’t run CPU bound tasks for long time

MAY’s scheduler runs coroutines cooperatively, which means that if a running coroutine doesn’t yield, it occupies the running thread and other coroutines will not be scheduled on that thread.

MAY APIs yield automatically when necessary, so this is usually not a problem. But if you are running a long CPU-bound task in a coroutine, you’d better call coroutine::yield_now() manually at appropriate points.
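The effect of yield points can be shown with a toy single-threaded round-robin scheduler. This is a simplified model, not MAY's scheduler: the Task type and the run function are hypothetical, and each call to step stands for "run until the next yield point".

```rust
use std::collections::VecDeque;

/// Toy cooperative task: `remaining` units of work, done `chunk` units at a
/// time between yield points.
struct Task {
    name: &'static str,
    remaining: u32,
    chunk: u32,
}

impl Task {
    /// Runs until the next yield point, records that the task ran,
    /// and returns true when the task has finished.
    fn step(&mut self, log: &mut Vec<&'static str>) -> bool {
        let work = self.chunk.min(self.remaining);
        self.remaining -= work;
        log.push(self.name);
        self.remaining == 0
    }
}

/// Round-robin loop on one "worker thread": a task that is not finished
/// goes to the back of the queue when it yields.
fn run(tasks: Vec<Task>) -> Vec<&'static str> {
    let mut queue: VecDeque<Task> = tasks.into();
    let mut log = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        if !task.step(&mut log) {
            queue.push_back(task);
        }
    }
    log
}
```

When the CPU-bound task yields after every unit of work, the other task gets to run in between. When it does all of its work in one step, everything else on that thread waits until it is done.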

Don’t exceed the stack

MAY doesn’t support automatic stack growth. Each coroutine allocates a stack of limited size for itself. If the coroutine exceeds its stack size, it will trigger undefined behavior.

So you should avoid calling recursive functions in a coroutine. Recursive function calls can easily exhaust your stack.

You should also avoid calling functions that internally use a lot of stack space, like std::io::copy().
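As a small illustration of the recursion advice, here is the same computation written both ways. The functions are made up for this example: the recursive version needs one stack frame per step, while the loop runs in constant stack space.

```rust
/// Recursive version: stack usage grows linearly with `n`, which is risky
/// on a small fixed-size coroutine stack.
fn sum_recursive(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        n + sum_recursive(n - 1)
    }
}

/// Iterative version: same result with constant stack usage.
fn sum_iterative(n: u64) -> u64 {
    let mut total = 0;
    let mut i = n;
    while i > 0 {
        total += i;
        i -= 1;
    }
    total
}
```

For recursion that can’t be turned into a simple loop, keeping the pending work in a heap-allocated Vec instead of on the call stack achieves the same effect.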

may::config().set_stack_size() can be used to set the default stack size for all coroutines, and you can use may::coroutine::Builder to specify the stack size of a single coroutine.

Builder::new().stack_size(0x2000).spawn(...)

I wrote another post that describes how to tune the stack size for your application.

Summary

Developing a coroutine-based system in Rust is not an easy task. MAY can’t prevent users from doing the wrong things, so you should know these caveats clearly before using the library. I hope these restrictions don’t prevent you from using the library 🙂

Change generator-rs license

Posted on 2017-12-24

In a Reddit thread, jechse raised a concern about using the MAY project:

A bit of a licensing concern though: the generator dependency
is licensed under the LGPL, and since rust statically links 
dependencies, any project that uses the May crate will also 
be statically linking the LGPL crate, and have to worry about 
all of the licensing implications of that. IANAL, and 
everything may actually be more kosher than I think, but LGPL
code almost always presents a not-insignificant difficulty in 
getting adopted by companies with cautious lawyers in my 
experience.

So I decided to change the license of generator-rs from LGPL 2.1 to MIT/Apache to remove any obstacle to using the libraries.

I have very little experience with public licenses, but I do believe a proper public license helps both the project and its users.

Thanks!

Boosting Hyper with MAY

Posted on 2017-12-23

In this blog post I will describe how to port hyper to use may for asynchronous IO. You will see how easy it is to port thread-based code to coroutine-based code, and get the high performance that may provides.

About Hyper

hyper is Rust’s most important HTTP library. It is widely used, and it tries to give users the best asynchronous IO experience. Currently hyper uses tokio for async IO, but it also has a 0.10.x branch that is thread-based and is used by the Rocket project.

I will port the thread-based version of hyper to a coroutine-based version.

The test server is the simple hello example in hyper and I will list all the bench results also.

Bench Settings

Machine Specs:

Logical Cores: 4 (4 cores x 1 threads)
Memory: 4gb ECC DDR3 @ 1600mhz
Processor: CPU Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz
Operating System: Ubuntu VirtualBox guest

Bench client

wrk

The master that using tokio

Suppose you have cloned the hyper repo locally. Just check out the master branch and run the following commands to start the server:

$ git checkout origin/master
$ cargo run --example=hello --release
......
Listening on http://127.0.0.1:3000 with 1 thread.

and the bench result is

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.30ms    0.92ms  18.82ms   93.04%
    Req/Sec    44.99k     7.35k   51.85k    88.50%
  895392 requests in 10.03s, 110.15MB read
Requests/sec:  89283.87
Transfer/sec:     10.98MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200  1.72s user 5.70s system 73% cpu 10.046 total

The v0.10.x that using thread

Check out the thread version and run the server. By default it spawns enough threads to keep all CPUs busy.

$ git checkout origin/0.10.x
$ cargo run --example=hello --release
......
Listening on http://127.0.0.1:3000

and the bench result is:

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    52.54us   56.39us  12.14ms   99.36%
    Req/Sec    74.67k     7.90k   91.51k    65.00%
  742924 requests in 10.04s, 62.35MB read
Requests/sec:  73998.86
Transfer/sec:      6.21MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200  0.85s user 5.70s system 65% cpu 10.043 total

You can see that the throughput is not as good as the previous one, because the thread version doesn’t support async IO.

The coroutine based version

Now apply the patch to the thread-based version. The patch has only a few changes: it replaces the necessary std APIs with may APIs and changes the thread number to 1000 coroutines.

$ git checkout -b coroutine
$ git remote add may https://github.com/Xudong-Huang/hyper
$ git fetch may
$ git cherry-pick may/master
$ git show HEAD --stat
......
 Cargo.toml             |  1 +
 examples/hello.rs      |  2 ++
 src/client/pool.rs     |  3 ++-
 src/lib.rs             |  1 +
 src/net.rs             |  4 ++--
 src/server/listener.rs | 19 +++++++++++--------
 src/server/mod.rs      |  7 ++++---
 src/server/response.rs |  7 ++++---
 8 files changed, 27 insertions(+), 17 deletions(-)

First modify examples/hello.rs to use only one thread:

may::config().set_io_workers(1).set_stack_size(0x2000);

and the test result is

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.26ms    6.48ms 105.13ms   98.35%
    Req/Sec    38.84k     3.21k   45.75k    85.00%
  773544 requests in 10.05s, 64.92MB read
Requests/sec:  76956.66
Transfer/sec:      6.46MB

Ok, it’s almost the same as the thread version.

Now change the io workers to 3 and see what happens:

may::config().set_io_workers(3).set_stack_size(0x2000);

bench result is:

$ wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200
Running 10s test @ http://127.0.0.1:3000
  2 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.50ms    2.10ms  29.28ms   89.43%
    Req/Sec    95.76k    19.05k  139.73k    58.50%
  1910441 requests in 10.07s, 160.33MB read
Requests/sec: 189790.65
Transfer/sec:     15.93MB
wrk http://127.0.0.1:3000 -d 10 -t 2 -c 200  1.86s user 10.68s system 124% cpu 10.071 total

much better now!

Conclusion

coroutine-based IO is much more powerful than thread-based IO
it’s quite easy to port a thread-based system to a coroutine-based one
hyper’s master is really fast considering that it runs on only one thread, faster than the coroutine-based version. Maybe the whole architecture is totally different.

MAY - Rust Stackful Coroutine Library

Posted on 2017-12-23

I’m excited to announce the May project.

Introduction

May is a high-performance stackful coroutine library that can be thought of as a Rust version of goroutines. You can use it to easily design and develop massively concurrent programs in Rust.

The development of this library was heavily inspired by Python’s generator and Go’s goroutine.

We are not only fearless about concurrency in Rust, but now
embracing it.

About the name

You can think of MAY as an abbreviation of “multi-thread asynchronous yield”. But actually it’s my daughter’s nickname. She was born in May, when Rust v1.0 was released.

Goal of May

High performance with multi-core support on mainstream operating systems
Writing asynchronous code in a synchronous style, just like writing a golang program
Easily porting existing code to coroutine-based programs

Main Components and corresponding std library components

| | may | std |
|---|---|---|
| coroutine | `may::coroutine` | `std::thread` |
| io | `may::net` | `std::net` |
| sync primitives | `may::sync` | `std::sync` |

Of course, May has some features that the std library doesn’t provide, such as cancelling a coroutine, a general select API, MPMC channels and so on.

Current status

I have used this library for some time, both in my private projects and in an evaluation component at my company. Most of the features are quite stable, though I haven’t fully tested every corner case. The main goals are all fulfilled now. You can explore more examples on the GitHub page.

Contributing

The project is still at an early stage, and there are a lot of things that need polishing. I warmly welcome any suggestions for the project, including code review, documentation, usage, examples, tests and performance. You can try it out and create issues in the project.

License

MIT/Apache-2.0

P.S.

My mother tongue isn’t English. I hope you can follow my thoughts here.

I have learned a lot from the Rust community in the past two years. This is a great community with a lot of cutting-edge technology and inspiring new ideas. It’s my honor to contribute back to the community.

I hope that with more mature features and libraries, we can attract more and more people from other programming languages, like golang and python.

I will use this blog to cover several other topics about May project very soon.

Merry Christmas!

Don’t visit thread local storage in coroutine context

Posted on 2017-12-18

Accessing TLS variables in a coroutine context is not safe. It has the same effect as accessing global variables without protection in a multi-threaded environment, because a coroutine can run on any thread that the scheduler assigns to it at run time, and each time a coroutine is scheduled again the running thread may be a different one.

Only reading TLS from a coroutine context is “safe” in some sense, because it doesn’t change anything. But it may get an outdated value when the coroutine is scheduled to a different thread, where the TLS value is not the same as on the former thread.
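The underlying problem can be shown with plain std threads. In this sketch (the names COUNTER and tls_seen_from_two_threads are made up) the two closures stand for the two halves of a coroutine that was rescheduled onto another worker thread between them: a value written to TLS on the first thread is simply not there on the second one.

```rust
use std::cell::Cell;
use std::thread;

thread_local!(static COUNTER: Cell<u32> = Cell::new(0));

/// Writes the TLS value on one thread, then reads it on another thread.
/// Returns what each thread observed.
fn tls_seen_from_two_threads() -> (u32, u32) {
    // "First half": set the value and read it back on thread A.
    let on_first = thread::spawn(|| {
        COUNTER.with(|c| {
            c.set(7);
            c.get()
        })
    })
    .join()
    .unwrap();

    // "Second half": thread B has its own fresh copy, still at the initial value.
    let on_second = thread::spawn(|| COUNTER.with(|c| c.get())).join().unwrap();

    (on_first, on_second)
}
```

The first thread sees 7 and the second sees the initial 0, which is exactly the kind of surprise a migrating coroutine would run into.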

If you really need local storage in a coroutine context, use coroutine local storage (CLS) instead. CLS guarantees that each coroutine has its own unique local storage. Besides, if you access a CLS variable from a thread context, it falls back to TLS storage for the given key. So accessing CLS from a thread context is totally safe.

In Rust, a TLS variable is declared with the thread_local macro, while a CLS variable is declared with the coroutine_local macro, so all you need to do is replace thread_local with coroutine_local. Below is a simple example that shows how it works.

fn coroutine_local_many() {
    use std::sync::atomic::{AtomicUsize, Ordering};
    coroutine_local!(static FOO: AtomicUsize = AtomicUsize::new(0));

    coroutine::scope(|scope| {
        for i in 0..10 {
            scope.spawn(move || {
                FOO.with(|f| {
                    assert_eq!(f.load(Ordering::Relaxed), 0);
                    f.store(i, Ordering::Relaxed);
                    assert_eq!(f.load(Ordering::Relaxed), i);
                });
            });
        }
    });
    // called in thread
    FOO.with(|f| {
        assert_eq!(f.load(Ordering::Relaxed), 0);
    });
}
