GRPC server throttling - morfizm
Oct. 9th, 2017, 11:59 am
Imagine your server can handle 7000 RPS. What would you expect if you're sending 8000 RPS? I'd naturally expect some queuing to absorb short spikes, but if the queue overflows, the server should quickly turn away the excess requests (1000 RPS) while still gracefully handling the 7000 RPS it can handle. So a client would see something like: 8000 RPS sent, 7000 RPS succeeded, 1000 RPS failed.
Apparently the world isn't interested in utilizing 100% of its hardware. They'd rather over-provision and load-balance away from their throughput problems than deal with them properly.
To my deepest surprise, GRPC, the super-modern super-duper-protobuf-optimized RPC technology built on HTTP/2 streams:
- doesn't have a supported way to define server-side throttling,
- doesn't have a simple way to define client request timeout.
So in the example above, the excess 1000 RPS will accrue in an unbounded queue until the server blows up out of memory. A client will see a gradual increase in response latency and then, suddenly, a dead server.
Server side (Java-specific):
- A server may use a custom Executor for its thread pool and request queue. The default is a fork-join pool with an unbounded queue. However, you can't really provide a ThreadPoolExecutor with a bounded blocking queue: the server may generate several work items to add to the pool, and if it can put some but not all of them, it ends up in a weird internal state with all sorts of issues. The right way to control both concurrency and queue size is to use a FixedThreadPool with an unbounded queue as the Executor, and define your own server-side interceptor, which keeps a queue or a counter of requests in flight and turns away requests when the server is overloaded. I learned this from a principal staff guy from Salesforce who attended SV Code Camp this weekend. He said he'd spent quite a bit of time on it and talked with the Google guys, who said they didn't implement it because they didn't want to cover every possible case in the world; they just wanted the default case to work really well for the majority of people, and it's up to you to implement your own. (Conclusion: the Google guys think server throttling is not what the majority of people would want.) I found an example of what I need in a 2-year-old forum post: https://groups.google.com/d/msg/grpc-io/LTFpelGTdtw/ZrVjQqhzCAAJ
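The "counter of requests in flight" that such an interceptor consults boils down to a small admission-control core. Here's a minimal plain-Java sketch of that core; the class and method names are mine, not part of the gRPC API, and the gRPC wiring (rejecting with RESOURCE_EXHAUSTED from a ServerInterceptor) is only indicated in comments:

```java
import java.util.concurrent.Semaphore;

// Minimal admission-control sketch: cap the number of requests in
// flight and turn away the excess instead of queueing it forever.
// Class and method names are illustrative, not gRPC API.
class AdmissionController {
    private final Semaphore permits;

    AdmissionController(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Call when a request arrives. A false return means "overloaded":
    // the interceptor would close the call with RESOURCE_EXHAUSTED
    // rather than letting it sit in an unbounded queue.
    boolean tryAdmit() {
        return permits.tryAcquire();
    }

    // Call when the request completes (onComplete / onCancel),
    // freeing a slot for the next request.
    void release() {
        permits.release();
    }
}
```

With `maxInFlight` sized to what the server can actually sustain, this yields exactly the behavior from the opening example: admitted requests are handled gracefully, and everything over the cap fails fast instead of killing the process.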
- There's a Deadline thing, configured on a function stub, which is normally instantiated once per client object. A Deadline is designed to be shared across all subsequent requests, for streaming scenarios where you care that the entire work finishes by a specified time. The throttling scenario, where you want a timeout per request, isn't supported out of the box. You can construct a new stub for each call, which seems to work but doesn't look like an intended pattern, or you can write your own default-deadline interceptor (e.g. https://github.com/salesforce/grpc-java-contrib/blob/master/grpc-contrib/src/main/java/com/salesforce/grpc/contrib/interceptor/DefaultDeadlineInterceptor.java).
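The pitfall here is that a gRPC Deadline is an absolute point in time, so a deadline anchored at stub-construction time eventually lies in the past for every later call; a per-request timeout has to be converted into a fresh absolute deadline at each call. A minimal plain-Java sketch of that distinction (the class name and methods are mine, modeling the semantics rather than using the io.grpc types):

```java
// Illustrative model of absolute-deadline semantics. In gRPC terms,
// construction corresponds to anchoring a timeout to "now" (as a
// per-call withDeadlineAfter-style conversion would do), and sharing
// one instance across calls models a stub-level Deadline.
class AbsoluteDeadline {
    final long expiresAtNanos;

    AbsoluteDeadline(long nowNanos, long timeoutNanos) {
        // The duration is fixed to an absolute expiry at creation time.
        this.expiresAtNanos = nowNanos + timeoutNanos;
    }

    boolean isExpired(long nowNanos) {
        return nowNanos - expiresAtNanos >= 0;
    }
}
```

A deadline built once with a 100 ms budget is fine for a call issued immediately, but already expired for a call issued 150 ms later; that is exactly why a single stub-level Deadline can't serve as a per-request timeout.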
- When a client request expires on its deadline, the client writes some garbage to the HTTP stream, which causes the server to log a message in the netty layer. I have yet to figure out how to get rid of that message, because the logging alone can drag down the server's performance.
Upd.: I actually found that thread on the server-interceptor solution: https://groups.google.com/forum/#!topic/grpc-io/XCMIva8NDO8
Date: October 9th, 2017 08:26 pm (UTC)
Date: October 10th, 2017 09:05 am (UTC)
Haven't seen it, thanks! Nice text.
On that page everything seems apparent, but I can imagine I could pick up some useful bits from this kind of book. Added to reading list.
Date: October 10th, 2017 12:44 am (UTC)
You like consistency, right? RPS stands for "requests per second", so your "RPS/sec" means "requests per second squared", which is a weird metric to use ;)
That's the acceleration of the request rate, obviously :)
Date: October 10th, 2017 06:36 am (UTC)
Date: October 10th, 2017 06:39 am (UTC)
Yep, that's my point: re-configuring the stub every time is a crutch, because the scenario isn't considered important enough.
Having a way to set a global timeout would be good. That said, different timeouts are often used for different methods of a service anyway: non-mutating methods (get / list) get the lowest, mutating ones (update) are in the middle, and analytics (query) are the slowest. So some per-call tweaking may still be needed.
Today we were talking about our service, and I was told, "well, imagine we get a million requests a day!" (and therefore we need to build something complicated). I sighed and said, "that would be a nice problem to have."
Date: October 10th, 2017 06:42 am (UTC)
I get your point about overengineering being unnecessary, but 11.6 RPS doesn't strike me as a lot. Unless each request is some significant sale that brings in a lot of money. The other reasonable case is when the goal is to write a prototype and then throw it away. Otherwise I'd agree with your opponent.
Even 11 requests per second would be a nice start. We have few requests, but a lot of processing per request (I'm not on the processing side, though, so I couldn't care less!)
Scalability problems are the best ones to have, yes :) Well, if the scalability is driven by user traffic (not always the case in batch-processing pipelines).