How to build robust GraphQL APIs
May 27, 2018
Just read this awesome article on building GraphQL APIs. GitHub and Facebook have created their own public-facing GraphQL APIs. When I first dabbled in this topic last year, one of the crucial questions was resource management. Resource management here means specifically the CPU and memory usage of the data persistence layer.
In the RESTful world, API designers can require callers to bound their requests. Take a naïve example from the place I work: when someone calls /orders, we require them to specify a limit on the number of orders or a date range. So the Swagger looks a bit like this.
    /orders:
      get:
        security:
          - CourierAuth: []
        parameters:
          - name: page
            in: query
            type: integer
            description: 'default set to 1'
          - name: limit
            in: query
            type: integer
            description: 'default set to 30'
          - name: status
            in: query
            type: string
            required: true
            enum:
              - pending
              - in_progress
              - is_completed
        responses:
          '200':
            description: OK
            schema:
              $ref: '#/definitions/Orders'
        summary: List orders
        description: List orders
        tags:
          - Order
        produces:
          - application/json
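To make the bounding concrete, here is a minimal sketch of how a server might enforce the page, limit and status parameters from the Swagger above. The function name `parse_order_query` and the cap `MAX_LIMIT` are my own assumptions for illustration; only the defaults (page 1, limit 30) and the status values come from the spec.

```python
DEFAULT_PAGE = 1      # from the Swagger description
DEFAULT_LIMIT = 30    # from the Swagger description
MAX_LIMIT = 100       # hypothetical upper bound, not part of the spec above
ALLOWED_STATUSES = {"pending", "in_progress", "is_completed"}


def parse_order_query(params):
    """Validate raw query-string parameters and clamp them to safe bounds."""
    status = params.get("status")
    if status not in ALLOWED_STATUSES:
        # status is required and must be one of the enum values
        raise ValueError("status must be one of %s" % sorted(ALLOWED_STATUSES))

    # Fall back to the documented defaults, then clamp to sane ranges.
    page = max(1, int(params.get("page", DEFAULT_PAGE)))
    limit = min(MAX_LIMIT, max(1, int(params.get("limit", DEFAULT_LIMIT))))

    return {
        "status": status,
        "page": page,
        "limit": limit,
        "offset": (page - 1) * limit,  # rows to skip in the database query
    }
```

Whatever the caller sends, the database never sees a request for more than `MAX_LIMIT` rows, which is exactly the kind of guarantee that is hard to state for an arbitrary GraphQL query.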
But in the world of GraphQL, there is no elegant way to specify this. Also, the appropriate limit can vary: even within the same application, different usages can produce result sets of very different sizes.
In the context of the database, we can separate this into 2 questions:
- How can we make sure an arbitrary GraphQL query won’t run too long and use up all the CPU of the database?
- How can we make sure an arbitrary GraphQL query won’t return a yuuuuuuuggeeeeeee result and use up all the memory / bandwidth of the client?
Turns out there are mathematical / academic ways to deal with these problems.
So first, this paper established that every GraphQL query can be transformed into an equivalent non-redundant query in ground-typed normal form. Sounds intimidating? Yes, it is. I am still trying to grasp what that means. If anyone has an idea, let me know.
Then, based on the algorithm in the original paper [2], we can determine the size of the result in polynomial time, before executing the query. After that, we can choose to actually run the query or just return a “result set too large” error.
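To get a feel for how a result size can be computed without building the result, here is a toy sketch. It is not the algorithm from the paper, just my simplified illustration of the general idea: count the leaf values a nested query would return, and memoize on (node, sub-query) pairs so the work stays small even when the result itself would be enormous. The graph, the tuple-based query format and every name here (`GRAPH`, `result_size`, `check_query`, `ResultTooLarge`) are assumptions made up for this example.

```python
from functools import lru_cache

# Hypothetical toy data: every person has a scalar "name" and a "friends" list.
GRAPH = {
    "a": {"name": "Alice", "friends": ["b", "c"]},
    "b": {"name": "Bob", "friends": ["a", "c"]},
    "c": {"name": "Carol", "friends": ["a", "b"]},
}

# A query is a tuple of (field, sub_selection) pairs; () marks a scalar leaf.
# Roughly: { name friends { name friends { name } } }
QUERY = (("name", ()), ("friends", (("name", ()), ("friends", (("name", ()),)))))


class ResultTooLarge(Exception):
    pass


def result_size(graph, root, query):
    """Count the scalar values the query would return, without materializing them."""

    @lru_cache(maxsize=None)
    def size(node, selection):
        total = 0
        for field, sub in selection:
            if not sub:
                total += 1  # a scalar leaf contributes one value
            else:
                # Each (child, sub) pair is computed once thanks to the cache.
                total += sum(size(child, sub) for child in graph[node][field])
        return total

    return size(root, query)


def nested(depth):
    """Build a query that follows "friends" `depth` times, then asks for "name"."""
    selection = (("name", ()),)
    for _ in range(depth):
        selection = (("friends", selection),)
    return selection


def check_query(graph, root, query, max_size):
    """Raise instead of running the query when the result would be too large."""
    n = result_size(graph, root, query)
    if n > max_size:
        raise ResultTooLarge("result set too large: %d values" % n)
    return n
```

On this toy graph, `result_size(GRAPH, "a", QUERY)` is 7, while `result_size(GRAPH, "a", nested(30))` is 2**30, computed from fewer than a hundred cached sub-results. That gap between the size of the answer and the cost of counting it is what makes a "check first, then run or reject" approach plausible.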
Over the next few weeks I would like to find time to prototype this, but I cannot guarantee anything.
P.S. A few takeaways from reading this:
- Formalizing a problem really helps us think about it. Formalization essentially separates convoluted concepts and parameters into mathematical notation. It may not apply to real-world socio-political or human-dynamics problems, but I would argue it is at least partly applicable to most other problems.
- Every time I read mathematical notation I get cognitive overload. Any way to solve this? I am drawing mind maps but still feel very overloaded.