Apparently the world isn't interested in utilizing 100% of its hardware. People would rather over-provision and load-balance their way around throughput problems than deal with them properly.
To my deepest surprise, gRPC, the super-modern, super-duper-protobuf-optimized RPC technology that runs over HTTP/2 streams:
- doesn't have a supported way to define server-side throttling,
- doesn't have a simple way to define a per-request client timeout.
So the example above will keep accruing requests at 1000 RPS in an unbounded queue until the server runs out of memory. A client will see response latency gradually grow and then, suddenly, a dead server.
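The difference between an unbounded work queue and a bounded one can be reproduced with plain `java.util.concurrent`, no gRPC involved. The sketch below is a simulation, not a gRPC server: it keeps a 2-thread pool busy and floods it with 1000 work items. With an unbounded `LinkedBlockingQueue` (the same shape `Executors.newFixedThreadPool` uses) everything piles up in the queue; with a bounded `ArrayBlockingQueue` the pool's default `AbortPolicy` throws `RejectedExecutionException` once the queue is full. That rejection is what makes handing gRPC a bounded `ThreadPoolExecutor` risky. The class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueGrowthDemo {

    /** Floods a 2-thread pool whose threads stay busy; returns {queued, rejected}. */
    static int[] flood(BlockingQueue<Runnable> queue, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                // Each task blocks until released, so nothing drains the queue meanwhile.
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                // Default AbortPolicy: the pool refuses the item outright.
                rejected++;
            }
        }
        int queued = pool.getQueue().size();
        release.countDown(); // let the workers finish everything that was accepted
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {queued, rejected};
    }

    /** Same shape as Executors.newFixedThreadPool(2): an unbounded LinkedBlockingQueue. */
    static int[] floodUnbounded(int tasks) {
        return flood(new LinkedBlockingQueue<>(), tasks);
    }

    /** A bounded queue: once it is full, further submissions are rejected. */
    static int[] floodBounded(int capacity, int tasks) {
        return flood(new ArrayBlockingQueue<>(capacity), tasks);
    }

    public static void main(String[] args) {
        int[] u = floodUnbounded(1000);
        int[] b = floodBounded(10, 1000);
        System.out.println("unbounded: queued=" + u[0] + " rejected=" + u[1]); // queued=998 rejected=0
        System.out.println("bounded:   queued=" + b[0] + " rejected=" + b[1]); // queued=10 rejected=988
    }
}
```

Two items are running on the worker threads, so the unbounded queue holds the other 998; in a real server that number just keeps growing with incoming traffic.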
Server side (Java-specific):
- The server may use a custom Executor for its thread pool and request queue. The default is a fork-join thread pool with an unbounded queue. However, you can't really provide a ThreadPoolExecutor with a bounded blocking queue: the server may generate several work items to add to the pool, and if the executor accepts some of them but not all, the server ends up in a weird internal state with all sorts of issues. The right way to control both concurrency and queue size is to pass a fixed thread pool (Executors.newFixedThreadPool, which has an unbounded queue) as the Executor, and to define your own server-side interceptor that keeps a queue or a counter of requests in flight and turns requests away when the server is overloaded. I learned this from a principal staff engineer from Salesforce who attended SV Code Camp this weekend. He said he'd spent quite a bit of time on it and talked with the Google guys, who told him they didn't implement it because they didn't want to cover every possible case in the world; they just wanted the default case to work really well for the majority of people, and anything beyond that is up to you. (Conclusion: the Google guys think server throttling is not something the majority of people want.) I found an example of what I need in a 2-year-old forum post: https://groups.google.com/d/msg/grpc-io/LTFpelGTdtw/ZrVjQqhzCAAJ
- There's a Deadline, configured on a stub, which is normally instantiated once per client object. A Deadline is an absolute point in time, so it is shared by all subsequent calls made through that stub; it's designed for the streaming scenario, where you care that the entire work is finished by a specified time. The throttling scenario, where you want a timeout per request, isn't supported out of the box. You can construct a new stub for each call, which seems to work but doesn't look right, or you can write your own default-deadline interceptor (e.g. https://github.com/salesforce/grpc-java-contrib/blob/master/grpc-contrib/src/main/java/com/salesforce/grpc/contrib/interceptor/DefaultDeadlineInterceptor.java).
- When a client request expires on its deadline, the client writes some garbage to the HTTP stream, and that causes the server to log a message in the Netty layer. I have yet to figure out how to get rid of that message, because the logging alone can drag down the server's performance.
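Why a stub-level deadline doesn't act as a per-request timeout comes down to the deadline being an absolute instant fixed when it is created. The sketch below models that with a hypothetical `DeadlineModel` class and a fake clock, so the effect is deterministic; it is plain JDK code, not the real `io.grpc.Deadline` API.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical model of an absolute deadline (not the real io.grpc.Deadline API).
 * The expiry instant is fixed when the deadline is created.
 */
public class DeadlineModel {
    private final LongSupplier clockMillis;
    private final long expiresAtMillis;

    private DeadlineModel(LongSupplier clockMillis, long expiresAtMillis) {
        this.clockMillis = clockMillis;
        this.expiresAtMillis = expiresAtMillis;
    }

    /** "Timeout from now": the expiry instant is computed once, at creation time. */
    static DeadlineModel afterMillis(long timeoutMillis, LongSupplier clockMillis) {
        return new DeadlineModel(clockMillis, clockMillis.getAsLong() + timeoutMillis);
    }

    long remainingMillis() {
        return expiresAtMillis - clockMillis.getAsLong();
    }

    boolean isExpired() {
        return remainingMillis() <= 0;
    }

    public static void main(String[] args) {
        long[] now = {0};                  // fake clock, in milliseconds
        LongSupplier clock = () -> now[0];

        // Stub-style: one deadline created up front and reused for every call.
        DeadlineModel shared = DeadlineModel.afterMillis(100, clock);

        now[0] = 150;                      // a later request arrives at t = 150 ms
        // Per-call style: a fresh deadline computed when this request starts.
        DeadlineModel perCall = DeadlineModel.afterMillis(100, clock);

        System.out.println("shared expired: " + shared.isExpired());             // true
        System.out.println("per-call remaining ms: " + perCall.remainingMillis()); // 100
    }
}
```

In grpc-java, the per-call variant can be written as `stub.withDeadlineAfter(200, TimeUnit.MILLISECONDS).someMethod(request)` (where `someMethod` stands in for a generated RPC method), which derives a new stub for that one call. A default-deadline `ClientInterceptor` would instead call `callOptions.withDeadlineAfter(...)` inside `interceptCall` when `callOptions.getDeadline()` is null, before passing the call on with `next.newCall(method, callOptions)`.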
Upd.: I actually found the thread with the server interceptor solution: https://groups.google.com/forum/#!topic/grpc-io/XCMIva8NDO8
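The core of the interceptor approach (count requests in flight, turn new ones away above a limit) can be sketched without any gRPC dependency. `InFlightLimiter` and `Permit` below are hypothetical names; this is a minimal sketch of the admission logic, assuming a fixed limit, and not the code from the linked threads.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical admission-control core for a server interceptor:
 * at most maxInFlight requests hold a permit at any time.
 */
public final class InFlightLimiter {
    private final int maxInFlight;
    private final Semaphore permits;

    public InFlightLimiter(int maxInFlight) {
        if (maxInFlight <= 0) {
            throw new IllegalArgumentException("maxInFlight must be positive");
        }
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
    }

    /** Non-blocking: returns a permit, or null if the server is at its limit. */
    public Permit tryAcquire() {
        return permits.tryAcquire() ? new Permit() : null;
    }

    /** Number of requests currently holding a permit. */
    public int inFlight() {
        return maxInFlight - permits.availablePermits();
    }

    /** One slot; closing it more than once releases the slot only once. */
    public final class Permit implements AutoCloseable {
        private final AtomicBoolean released = new AtomicBoolean();

        @Override
        public void close() {
            if (released.compareAndSet(false, true)) {
                permits.release();
            }
        }
    }

    public static void main(String[] args) {
        InFlightLimiter limiter = new InFlightLimiter(2);
        Permit a = limiter.tryAcquire();
        Permit b = limiter.tryAcquire();
        Permit c = limiter.tryAcquire();            // third request is turned away
        System.out.println(a != null && b != null);  // true
        System.out.println(c == null);               // true
        a.close();                                   // request a finished
        System.out.println(limiter.inFlight());      // 1
    }
}
```

Wiring it into grpc-java would look roughly like this: in `ServerInterceptor.interceptCall`, call `tryAcquire()`; if it returns null, `call.close(Status.RESOURCE_EXHAUSTED, new Metadata())` and return an empty `ServerCall.Listener`; otherwise wrap the listener returned by `next.startCall(call, headers)` in a `ForwardingServerCallListener.SimpleForwardingServerCallListener` that closes the permit in `onComplete` and `onCancel`. The server itself would get `ServerBuilder.executor(Executors.newFixedThreadPool(n))` and the service would be registered via `ServerInterceptors.intercept(service, interceptor)`. The idempotent `close()` guards against releasing a slot twice if both callbacks ever fire.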