October 9th, 2017


GRPC server throttling

Imagine your server can handle 7000 RPS. What would you expect if you send it 8000 RPS? I'd naturally expect some queuing to absorb short spikes, but once the queue is full, the server should quickly turn away the excess requests (1000 RPS) while still gracefully handling the 7000 RPS it can cope with. So a client would see something like: 8000 RPS sent, 7000 RPS succeeded, 1000 RPS failed.

Apparently the world isn't interested in utilizing 100% of their hardware. They'd rather over-provision and load-balance away from their throughput problems, than deal with them properly.
To my deepest surprise, GRPC, the super-modern super-duper-protobuf-optimized RPC technology that uses HTTP/2 streams:
- doesn't have a supported way to define server-side throttling,
- doesn't have a simple way to define client request timeout.

So the example above will accrue 1000 requests every second in an unbounded queue until the server runs out of memory. A client will see response latency grow gradually, and then suddenly a dead server.
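To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes constant arrival and service rates (8000 and 7000 RPS, as in the example above) and a strictly first-in-first-out queue; the class and method names are made up for illustration.

```java
// Toy model of an unbounded queue under sustained overload.
// The rates come from the example above; the time points are arbitrary.
class BacklogModel {
    /** Requests waiting in the queue after `seconds` of sustained overload. */
    static long backlog(long arrivalRps, long capacityRps, long seconds) {
        return Math.max(0, arrivalRps - capacityRps) * seconds;
    }

    /** Milliseconds a request arriving at `seconds` waits behind that backlog. */
    static long queueingDelayMillis(long arrivalRps, long capacityRps, long seconds) {
        return backlog(arrivalRps, capacityRps, seconds) * 1000 / capacityRps;
    }

    public static void main(String[] args) {
        for (long t : new long[] {1, 10, 60}) {
            System.out.printf("t=%ds backlog=%d delay=%dms%n",
                    t, backlog(8000, 7000, t), queueingDelayMillis(8000, 7000, t));
        }
    }
}
```

Under these assumptions, after one minute the queue holds 60000 requests and every new request waits more than 8 seconds before the server even starts on it, and both numbers keep growing.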


Server side (Java-specific):
- Server may use a custom Executor for its thread pool and request queue. The default is a fork-join thread pool with an unbounded queue. However, you can't really provide a ThreadPoolExecutor with a bounded blocking queue: the server may generate more work items to add to the pool, and if it can enqueue some but not all of them, it ends up in a weird internal state with all sorts of issues.
- The right way to control both concurrency and queue size is to use a fixed thread pool with an unbounded queue as the Executor, and define your own server-side interceptor that keeps a queue or a counter of requests in flight and turns requests away when the server is overloaded.
- Learned it from a principal staff guy from Salesforce, who attended SV Code Camp this weekend. He said he's spent quite a bit of time on it and talked with Google guys, who said they didn't implement it because they didn't want to cover all possible cases in the world; they just wanted to make the default case work really well for the majority of people, and it's up to you to implement your own. (Conclusion: Google guys think that server throttling is not what the majority of people would want to have.) I've found an example of what I need in a 2-year-old forum post: https://groups.google.com/d/msg/grpc-io/LTFpelGTdtw/ZrVjQqhzCAAJ
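The "counter for requests in flight" idea can be sketched without any gRPC dependency. The class below is a hypothetical stand-in for such an interceptor, not the code from the linked thread: `InFlightLimiter`, `handle` and the "REJECTED" marker are invented names. In a real grpc-java `ServerInterceptor`, the rejection would presumably be a `call.close(...)` with a status such as `Status.RESOURCE_EXHAUSTED`, and the slot would have to be released when the call completes or is cancelled, not when the handler method returns.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Stand-in for a throttling interceptor: caps requests in flight, rejects the rest.
// All names are hypothetical; real gRPC calls are asynchronous, so this only
// illustrates the counting logic.
class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Runs the handler if a slot is free; otherwise fails fast without queuing. */
    <T> T handle(Supplier<T> handler, T rejected) {
        if (!permits.tryAcquire()) {
            return rejected; // overloaded: turn the request away immediately
        }
        try {
            return handler.get();
        } finally {
            permits.release(); // free the slot once the request is done
        }
    }

    int available() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        InFlightLimiter limiter = new InFlightLimiter(1);
        // The outer request holds the only slot, so the nested one is rejected.
        System.out.println(limiter.handle(
                () -> limiter.handle(() -> "inner ok", "REJECTED"), "REJECTED"));
        // With the slot free again, the next request goes through.
        System.out.println(limiter.handle(() -> "ok", "REJECTED"));
    }
}
```

Because `tryAcquire` never blocks, excess requests are rejected immediately instead of piling up, which is the behaviour the 8000-vs-7000 RPS example asks for.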

Client side:
- There's a Deadline thing, configured on a function stub, which is normally instantiated once per client object. A Deadline is designed to be shared by all subsequent requests; it targets the streaming scenario, where you care that the entire work is finished by a specified time. The throttling scenario, where you want a timeout per request, isn't supported out of the box. You can construct a new stub for each call, which seems to work but doesn't look like the intended usage, or you can write your own default deadline interceptor (e.g. https://github.com/salesforce/grpc-java-contrib/blob/master/grpc-contrib/src/main/java/com/salesforce/grpc/contrib/interceptor/DefaultDeadlineInterceptor.java).
- When a client request expires on its deadline, the client writes some garbage to the HTTP stream, which causes the server to log a message in the netty layer. I have yet to figure out how to get rid of that message, because the logging alone can bring down the server's performance.
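The difference between a shared deadline and a per-request timeout is easy to see in a toy model. `AbsoluteDeadline` below is a hypothetical stand-in, not the gRPC class, and time is passed in explicitly so the example is deterministic. In grpc-java terms, the per-call variant would roughly correspond to calling `withDeadlineAfter` on the stub right before each request instead of once at construction.

```java
import java.util.concurrent.TimeUnit;

// Toy absolute deadline: the expiry instant is fixed when the object is created.
// Hypothetical class for illustration only; "now" is an explicit parameter.
class AbsoluteDeadline {
    private final long expiresAtNanos;

    private AbsoluteDeadline(long expiresAtNanos) {
        this.expiresAtNanos = expiresAtNanos;
    }

    /** Fixes the expiry instant at creation time: now + timeout. */
    static AbsoluteDeadline after(long timeout, TimeUnit unit, long nowNanos) {
        return new AbsoluteDeadline(nowNanos + unit.toNanos(timeout));
    }

    boolean isExpired(long nowNanos) {
        return nowNanos - expiresAtNanos >= 0;
    }
}

class DeadlineDemo {
    public static void main(String[] args) {
        long start = 0;
        long later = TimeUnit.SECONDS.toNanos(5);
        // Created once at client start-up and reused for every request.
        AbsoluteDeadline shared = AbsoluteDeadline.after(1, TimeUnit.SECONDS, start);
        // Created right before the request that is sent at t=5s.
        AbsoluteDeadline perCall = AbsoluteDeadline.after(1, TimeUnit.SECONDS, later);
        System.out.println("shared expired at t=5s: " + shared.isExpired(later));
        System.out.println("per-call expired at t=5s: " + perCall.isExpired(later));
    }
}
```

The shared deadline has already passed by the time the later request is sent, so that request would fail immediately, while the per-call one still has its full second.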

Upd.: actually found that thread on server interceptor solution: https://groups.google.com/forum/#!topic/grpc-io/XCMIva8NDO8


Finally, something radically new in the IR industry (aside from machine-learned ranking) that reportedly works in production at large scale:
a hash signature-based matching phase of search, as a replacement for the inverted index (document posting lists), with plenty of interesting optimizations.

Link to PDF from SIGIR 2017:

Video presentation: https://www.youtube.com/watch?v=1-Xoy5w5ydM

// Bonus: it uses bloom filters ;)
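
For readers who haven't met the idea, here is a toy sketch of signature-based matching in general. It is not the layout or the optimizations from the paper, and the signature width, number of hashes and hashing scheme are arbitrary choices for illustration. Each document is summarized by a fixed-width bit signature (a Bloom filter over its terms), and a query can only match documents whose signature has all of the query's bits set.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy signature matching with one Bloom filter per document.
// BITS, HASHES and the hashing scheme are illustrative assumptions.
class SignatureIndex {
    static final int BITS = 256;
    static final int HASHES = 3;

    /** Builds a bit signature by setting HASHES bits for every term. */
    static BitSet signature(List<String> terms) {
        BitSet sig = new BitSet(BITS);
        for (String term : terms) {
            for (int i = 0; i < HASHES; i++) {
                sig.set(Math.floorMod((term + "#" + i).hashCode(), BITS));
            }
        }
        return sig;
    }

    /** True if the document may contain every query term (false positives possible). */
    static boolean mayMatch(BitSet docSignature, List<String> queryTerms) {
        BitSet missing = signature(queryTerms);
        missing.andNot(docSignature); // query bits that the document lacks
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet doc = signature(Arrays.asList("grpc", "server", "throttling"));
        System.out.println(mayMatch(doc, Arrays.asList("grpc", "throttling")));
    }
}
```

A document that really contains all the query terms always passes the check; a document that doesn't may occasionally pass too, which is why a matching phase like this is normally followed by verification or ranking.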

Note: the topic in general is kinda advanced, but this particular video should be understandable for beginner programmers. Not recommended for a general audience, though.