
GRPC server throttling

Imagine your server can handle 7000 RPS. What would you expect if you're sending 8000 RPS? I'd naturally expect some queuing to happen to absorb short spikes, but if the queue is overloaded, to quickly turn away all the excess requests (1000 RPS) while still gracefully handling the 7000 RPS that the server can handle. So a client would see something like: 8000 RPS sent, 7000 RPS succeeded, 1000 RPS failed.

Apparently the world isn't interested in utilizing 100% of their hardware. They'd rather over-provision and load-balance away from their throughput problems than deal with them properly.
To my deepest surprise, GRPC, the super-modern super-duper-protobuf-optimized RPC technology that uses HTTP/2 streams:
- doesn't have a supported way to define server-side throttling,
- doesn't have a simple way to define a client request timeout.

So in the example above, the excess 1000 RPS will accumulate in an unbounded queue until the server blows up out of memory. A client will see a gradual increase in response latency and then, suddenly, a dead server.

Solutions.

Server side (Java-specific):
- The server may use a custom Executor for its thread pool and request queue. The default is a fork-join thread pool with an unbounded queue. However, you can't really provide a ThreadPoolExecutor with a bounded blocking queue: the server may generate more work items to add to the pool, and if it can put some but not all of them, it ends up in a weird internal state with all sorts of issues. The right way to control both concurrency and queue size is to use a fixed thread pool with an unbounded queue as the Executor, and define your own server-side interceptor, which keeps a queue or a counter of requests in flight and turns away requests when the server is overloaded (a sketch is below). Learned it from a principal staff guy from Salesforce who attended SV Code Camp this weekend. He said he'd spent quite a bit of time on it and talked with Google guys, who said they didn't implement it because they didn't want to cover all possible cases in the world; they just wanted the default case to work really well for the majority of people, and it's up to you to implement your own. (Conclusion: Google guys think that server throttling is not what the majority of people would want to have.) I've found an example of what I need in a 2-year-old forum post: https://groups.google.com/d/msg/grpc-io/LTFpelGTdtw/ZrVjQqhzCAAJ
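
Here's a minimal sketch of that interceptor, using grpc-java's ServerInterceptor API. The class name and the in-flight limit are made up for illustration; you'd tune the limit to your server's measured capacity:

```java
import io.grpc.ForwardingServerCallListener;
import io.grpc.Metadata;
import io.grpc.ServerCall;
import io.grpc.ServerCallHandler;
import io.grpc.ServerInterceptor;
import io.grpc.Status;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Rejects new calls with RESOURCE_EXHAUSTED once the number of requests
 * in flight exceeds a configured limit (hypothetical limit, tune to taste).
 */
public class ThrottlingInterceptor implements ServerInterceptor {
  private final int maxInFlight;
  private final AtomicInteger inFlight = new AtomicInteger();

  public ThrottlingInterceptor(int maxInFlight) {
    this.maxInFlight = maxInFlight;
  }

  @Override
  public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(
      ServerCall<ReqT, RespT> call, Metadata headers,
      ServerCallHandler<ReqT, RespT> next) {
    if (inFlight.incrementAndGet() > maxInFlight) {
      // Over the limit: turn the request away immediately instead of queuing it.
      inFlight.decrementAndGet();
      call.close(Status.RESOURCE_EXHAUSTED.withDescription("server overloaded"),
          new Metadata());
      return new ServerCall.Listener<ReqT>() {};  // no-op listener
    }
    ServerCall.Listener<ReqT> delegate = next.startCall(call, headers);
    return new ForwardingServerCallListener.SimpleForwardingServerCallListener<ReqT>(delegate) {
      @Override
      public void onComplete() {
        inFlight.decrementAndGet();
        super.onComplete();
      }

      @Override
      public void onCancel() {
        inFlight.decrementAndGet();
        super.onCancel();
      }
    };
  }
}
```

You'd wire it up with something like ServerBuilder.forPort(port).executor(Executors.newFixedThreadPool(N)).addService(ServerInterceptors.intercept(yourService, new ThrottlingInterceptor(limit))).build().start() — the fixed thread pool bounds concurrency, and the interceptor bounds the effective queue.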

Client side:
- There's a Deadline thing, configured on a function stub, which is normally instantiated once per client object. A Deadline is designed to be shared across all subsequent requests and is meant for the streaming scenario, where you care that the entire work finishes by a specified time. The throttling scenario, where you want a timeout per request, isn't supported out of the box. You can construct a new stub for each call, which seems to work but doesn't look right, or you can write your own default-deadline interceptor (e.g. https://github.com/salesforce/grpc-java-contrib/blob/master/grpc-contrib/src/main/java/com/salesforce/grpc/contrib/interceptor/DefaultDeadlineInterceptor.java); see the sketch after this list.
- When a client request expires on its deadline, the client writes some garbage to the HTTP stream, which causes the server to log a message in the netty layer. I'm yet to figure out how to get rid of that message, because the logging alone can drag down the server's performance.
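
For reference, a minimal version of such a default-deadline client interceptor might look like the sketch below. The class name and the default duration are made up; the ClientInterceptor and CallOptions APIs are grpc-java's:

```java
import io.grpc.CallOptions;
import io.grpc.Channel;
import io.grpc.ClientCall;
import io.grpc.ClientInterceptor;
import io.grpc.MethodDescriptor;
import java.util.concurrent.TimeUnit;

/** Applies a default per-call deadline when the caller hasn't set one. */
public class DefaultDeadlineInterceptor implements ClientInterceptor {
  private final long duration;
  private final TimeUnit unit;

  public DefaultDeadlineInterceptor(long duration, TimeUnit unit) {
    this.duration = duration;
    this.unit = unit;
  }

  @Override
  public <ReqT, RespT> ClientCall<ReqT, RespT> interceptCall(
      MethodDescriptor<ReqT, RespT> method, CallOptions callOptions, Channel next) {
    // Only apply the default if this particular call has no deadline yet.
    if (callOptions.getDeadline() == null) {
      callOptions = callOptions.withDeadlineAfter(duration, unit);
    }
    return next.newCall(method, callOptions);
  }
}
```

You'd install it on the channel, e.g. ManagedChannelBuilder.forAddress(host, port).intercept(new DefaultDeadlineInterceptor(300, TimeUnit.MILLISECONDS)). Alternatively, calling stub.withDeadlineAfter(...) per request is cheaper than it looks, since stubs are thin wrappers around the channel.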

Upd.: actually found the thread with the server interceptor solution: https://groups.google.com/forum/#!topic/grpc-io/XCMIva8NDO8
Tags: in english, software development, work