It took me a long time to understand the benefits of GraphQL. The official website summarizes it on the front page as:
A query language for your API
GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.
I had a really hard time trying to understand what this means.
I had to read the following articles to get a better sense of GraphQL:
GraphQL as I See It
The fundamental assumption is that usable data already exists from the beginning. Let’s call it “the original data.” GraphQL comes into the picture as an additional layer for accessing this data, which increases its utility. Potential benefits include: i) a precise definition of the accessible data; ii) querying capabilities that yield just the data you ask for and nothing else; and iii) improved performance. The last claim holds only in specific circumstances, which we will look at later.
A crude analogy is that GraphQL is like SQL in that it provides data definition and querying, except that GraphQL is added as an afterthought on top of a database that is already operational.
There is no particular restriction or assumption about how the GraphQL layer fetches the original data. The original data can reside in local files, local databases, or remote databases, or it can even be simple variables in the GraphQL implementation language (there are many).
Two Parties Involved in Use of GraphQL
Two parties are involved in introducing GraphQL on top of the original data and in using it: the GraphQL Manager and the GraphQL User (my own terms, not the GraphQL developers’). Their roles are as follows:
The GraphQL Manager:
- Defines the typed structure of the data available for querying, in the form of a schema; this is visible to the GraphQL User through introspection
- Implements the methods to access the data, in the form of “resolvers”; this is not visible to the GraphQL User
The GraphQL User:
- Queries the data within the bounds of the schema defined by the GraphQL Manager
Note that the GraphQL Manager, in that role, is not responsible for managing the original data. Even though the Manager “defines” the structure of the data available through the GraphQL interface, this definition does not determine how the original data is actually stored and managed. Rather, it determines which parts of the original data are exposed through the GraphQL interface.
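As a rough illustration of this division of roles, here is a minimal sketch in plain Python that uses no GraphQL library. Everything in it is my own assumption for illustration: the names ORIGINAL_DATA, SCHEMA, RESOLVERS, and run_query are hypothetical, the data is made up, and a query is modeled as a simple list of field names rather than real GraphQL syntax.

```python
# Toy model of the two roles; this is not a real GraphQL implementation.

# The "original data": it exists independently of the GraphQL layer.
ORIGINAL_DATA = {
    "movie_rows": [
        {"id": 1, "title": "Movie A", "year": 1990, "internal_note": "x"},
        {"id": 2, "title": "Movie B", "year": 2005, "internal_note": "y"},
    ]
}

# --- GraphQL Manager side -------------------------------------------
# 1. Typed structure exposed to the User (the visible part).
#    Note that "internal_note" is deliberately not exposed.
SCHEMA = {"Movie": {"title": "String", "year": "Int"}}

# 2. Resolvers: how each exposed field is fetched (hidden from the User).
RESOLVERS = {
    "Movie": {
        "title": lambda row: row["title"],
        "year": lambda row: row["year"],
    }
}

def run_query(type_name, fields):
    """Return only the requested fields, and only if the schema exposes them."""
    for field in fields:
        if field not in SCHEMA[type_name]:
            raise KeyError(f"{type_name}.{field} is not exposed by the schema")
    resolvers = RESOLVERS[type_name]
    return [
        {field: resolvers[field](row) for field in fields}
        for row in ORIGINAL_DATA["movie_rows"]
    ]

# --- GraphQL User side ----------------------------------------------
# The User asks for titles only and gets nothing else back.
titles_only = run_query("Movie", ["title"])
print(titles_only)
```

The point of the sketch is only that the schema and the resolvers are separate things, and that neither of them dictates how the original data is stored.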
Where GraphQL Is Inserted
In the typical scenario where a server holds data and clients request it, GraphQL is usually added on the server’s side, so the server can provide a better interface with querying capabilities to its clients. This seems to be what is usually assumed in the documents on GraphQL that I have read.
I do not think this has to be the only way, however. You could use GraphQL in a bridge to the server. Or you could even use it on the client’s side. As long as the GraphQL Manager can implement the resolvers that fit the given scenario, you should be able to enjoy the benefits of GraphQL.
One of the purported benefits of GraphQL is increased performance. Note, however, that this claim applies only under three conditions: GraphQL is used to implement a Web API; its performance is compared to that of a strict REST API (not a Web API in general); and the GraphQL Manager has carefully implemented the relevant resolvers in a performant way.
Generally speaking, increased performance is not inherent in GraphQL by itself, despite the impression that the documents on GraphQL might give you; it does not “come for free” just by using GraphQL alone. On the contrary, a naïve implementation of resolvers by the GraphQL Manager is likely to give you pretty bad performance.
The following resources helped me understand this issue:
Performance becomes an issue when nested objects are allowed to be requested and are indeed requested. Let’s say you want a list of movies that match a certain criterion (say, older than 20 years), together with the list of performers for each such movie. Such a query returns nested objects: a list of movies, each with its own list of performers. In relational database terminology, there is either a one-to-many or a many-to-many relation between the Movie entity and the Performer entity.
If this query is to be executed against a database server with a REST API, there will be N+1 or more requests:
- One request is sent to the /api/movies endpoint, for example, to get the list of N movies
- One request is sent to the /api/performer/id endpoint for each performer who appeared in each of the N movies above
If the API is implemented in the strict sense of REST, then this is inevitable by definition, because this is exactly what the REST architecture prescribes. In contrast, there is no such limitation on a GraphQL-based Web API, so you can just provide one endpoint for handling all the GraphQL queries.
In GraphQL, a resolver is implemented for each field of an object type. In the scenario above, if the resolvers for [Movie] and Performer(id) are implemented naively, then there will also be N+1 resolver function invocations and the performance will be just as bad. However, unlike with a REST API, whose architectural design decisions make this inescapable, GraphQL leaves room for performance tuning.
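To make the N+1 pattern concrete, the sketch below counts how many times a backing store is hit when resolvers are written naively. It is plain Python without any GraphQL library; the data and the names fetch_movies, fetch_performers_for, and resolve_movies are hypothetical, and each increment of the counter stands in for one database or HTTP round trip.

```python
# Count backend hits caused by naive per-object resolvers (toy model).
MOVIES = [
    {"id": 1, "title": "Movie A", "performer_ids": [10, 11]},
    {"id": 2, "title": "Movie B", "performer_ids": [11, 12]},
    {"id": 3, "title": "Movie C", "performer_ids": [10]},
]
PERFORMERS = {10: "Performer X", 11: "Performer Y", 12: "Performer Z"}

backend_calls = 0  # one increment = one simulated round trip

def fetch_movies():
    """Simulated backend call that returns the list of N movies."""
    global backend_calls
    backend_calls += 1
    return MOVIES

def fetch_performers_for(movie):
    """Simulated backend call that returns the performers of one movie."""
    global backend_calls
    backend_calls += 1
    return [PERFORMERS[pid] for pid in movie["performer_ids"]]

def resolve_movies():
    """Naive resolution: one call for the list, then one call per movie."""
    return [
        {"title": m["title"], "performers": fetch_performers_for(m)}
        for m in fetch_movies()
    ]

result = resolve_movies()
print(backend_calls)  # 1 call for the list + 3 calls for 3 movies = 4
```

With N movies the counter ends at N+1, which is the same shape of cost as the REST case.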
I do not know what kinds of performance tuning methods exist, but memoization is definitely one of them. More importantly, since this is such a common occurrence with GraphQL, there are already libraries to mitigate the situation. These include DataLoader (originally from Facebook, just like GraphQL) and Apollo Data sources (see “ReactとApolloを使ってGithub GraphQL APIを試してみる – Qiita” [Trying out the GitHub GraphQL API with React and Apollo] and “世のフロントエンドエンジニアにApollo Clientを布教したい – Qiita” [I want to evangelize Apollo Client to the world’s front-end engineers]). I have not looked into these, so I do not know exactly what they do.
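As a sketch of what such tuning could look like, the plain-Python toy below compares memoizing per-performer lookups with collecting all the needed ids and fetching them in one batch. This is my own illustration under made-up data, not the API of DataLoader or Apollo Data sources; the names fetch_performers_batch, fetch_performer_memoized, resolve_with_memoization, and resolve_with_batching are hypothetical. Only performer lookups are counted here, not the initial fetch of the movie list.

```python
from functools import lru_cache

PERFORMERS = {10: "Performer X", 11: "Performer Y", 12: "Performer Z"}
MOVIES = [
    {"title": "Movie A", "performer_ids": [10, 11]},
    {"title": "Movie B", "performer_ids": [11, 12]},
    {"title": "Movie C", "performer_ids": [10]},
]

backend_calls = 0  # one increment = one simulated round trip for performers

def fetch_performers_batch(ids):
    """Simulated backend call that returns many performers in one round trip."""
    global backend_calls
    backend_calls += 1
    return {pid: PERFORMERS[pid] for pid in ids}

# Idea 1: memoization. A repeated id is served from the cache.
@lru_cache(maxsize=None)
def fetch_performer_memoized(pid):
    return fetch_performers_batch((pid,))[pid]

def resolve_with_memoization():
    return [
        {"title": m["title"],
         "performers": [fetch_performer_memoized(p) for p in m["performer_ids"]]}
        for m in MOVIES
    ]

# Idea 2: batching. Collect every needed id first, then fetch once.
def resolve_with_batching():
    wanted = sorted({p for m in MOVIES for p in m["performer_ids"]})
    by_id = fetch_performers_batch(wanted)
    return [
        {"title": m["title"],
         "performers": [by_id[p] for p in m["performer_ids"]]}
        for m in MOVIES
    ]

memoized = resolve_with_memoization()
calls_memoized = backend_calls  # 5 lookups, but only 3 distinct ids are fetched

backend_calls = 0
batched = resolve_with_batching()
calls_batched = backend_calls  # a single round trip for all performers

print(calls_memoized, calls_batched)
```

Both variants return the same nested result; they differ only in how many round trips they cost.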
Addendum: Apparently I was not the only one who found it odd that the recommended design pattern is not automated. The project gajus/graphql-lazyloader (“GraphQL directive that adds Object-level data resolvers”) addresses it. Also, “Moving OkCupid from REST to GraphQL” seems interesting.
For application development with React (also from Facebook), there is Relay, “the production-ready GraphQL client for React.” A Japanese document explains how to use Relay with React and GraphQL. It deals with pagination, and a similar discussion can be found in “Pagination | GraphQL.”
My Small Grievance about GraphQL’s Notation
In GraphQL’s schema definition, you can define object types and query types, among other things. An object type can have multiple fields (or “slots” in Lisp nomenclature), which means that the type is an aggregate of those fields. A query type definition follows the same format as an object type definition; in the example below it is simply a type named Query. However, the fields of a query type do not together constitute one single pattern of query.
Take the following query type for example (taken from “Schema basics – Apollo Server – Apollo GraphQL Docs”):
type Query {
  books: [Book]
  authors: [Author]
}
This query type definition does not mandate that every query ask for a list of books and a list of authors at the same time. Rather, it means you can query for a list of books, a list of authors, or both. This runs contrary to the naïve aggregate assumption you might get from object type definitions; I certainly did! I do not think any of the resources I referenced explained it this way. It took me a long while to realize this.
Each field of a query type definition is an entry point for queries, for which the GraphQL Manager has to provide a resolver, as we saw earlier.
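The idea that each field is a separate entry point can be sketched as follows, again in plain Python with no GraphQL library. The names QUERY_RESOLVERS and execute are hypothetical, the data is made up, and a query is modeled as a list of requested top-level field names instead of real GraphQL syntax.

```python
# Toy model: one resolver per field of the Query type.
BOOKS = [{"title": "Book A"}, {"title": "Book B"}]
AUTHORS = [{"name": "Author X"}]

# Each key mirrors a field of "type Query"; each value is its resolver.
QUERY_RESOLVERS = {
    "books": lambda: BOOKS,
    "authors": lambda: AUTHORS,
}

def execute(requested_fields):
    """Run only the resolvers for the top-level fields the query asked for."""
    return {name: QUERY_RESOLVERS[name]() for name in requested_fields}

only_books = execute(["books"])            # a query for books alone
only_authors = execute(["authors"])        # a query for authors alone
both = execute(["books", "authors"])       # a query for both at once

print(sorted(only_books))  # ['books']
print(sorted(both))        # ['authors', 'books']
```

A query that asks only for books never touches the authors resolver, which is why the Query type does not behave like an aggregate.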