Performance Testing a GraphQL API

This is a guest post by Bob Reselman.

GraphQL is an API architecture that was created and used at Facebook starting in 2012 in order to improve the way it delivered its News Feed data. The technology has enjoyed significant adoption since going open source in 2015, and today GraphQL is powering APIs from companies such as Atlassian, Credit Karma, GitHub, Intuit, KLM, Pinterest, Shopify, The New York Times, WordPress and Yelp.

Facebook created GraphQL as a specification that has been implemented in a number of technologies, such as Node.js, Ruby, Python, PHP, Java, and C#.

GraphQL is a transformational technology. It requires a new understanding of API design as well as API testing, particularly in terms of performance testing.

Addressing Static Data Structures and Recursive Queries

GraphQL attempts to address two issues developers encounter when using an API: the fixed, predefined nature of the data structures an API typically returns, and the need for recursive queries. Recursive queries are repeated calls back to an API in order to assemble a data set that is useful. Let’s take a look at an example scenario illustrating these issues.

Imagine that I have to implement an application feature that allows users to search for books according to a particular author and then publish the details of that author’s books. In this case, I’ll use the public API published by Open Library. The Open Library API is REST(ful), so in order to get the book information according to a particular author, I query Open Library’s search API by creating a URL, like so:

http://openlibrary.org/search.json?author=charles+bukowski

This will return book data related to the author Charles Bukowski. An excerpt of the response is shown below:

Sample listing: A REST API typically returns data according to a predefined data structure

Notice that the API returns a lot of data. Some of the data is directly relevant to the books the author has written, but there’s also data that’s extraneous to books.

The data points that are interesting to me are the titles of the books and their associated ISBN numbers. When I dig around the response, I can indeed find a title and ISBN number for each book. (I’ve identified interesting information by highlighting it with a red border above.) Once I have an ISBN number, I can use the books section of the Open Library API to get the details of the particular book.

I create a URL like so:

https://openlibrary.org/api/books?bibkeys=ISBN:0876850875

The API responds with a JSON structure shown below:

{
    "ISBN:0876850875": {
        "bib_key": "ISBN:0876850875",
        "preview": "noview",
        "preview_url": "https://openlibrary.org/books/OL17228140M/Post_office",
        "info_url": "https://openlibrary.org/books/OL17228140M/Post_office"
    }
}

Sometimes an API will return a data point that is a URL that describes another API call

Notice that the sample listing above returns a JSON structure that has both simple string data as well as some URLs. These URLs describe calls I can make back to the API to get even more information. I submit to the API the URL described by the field info_url in the sample listing:

https://openlibrary.org/books/OL17228140M/Post_office

The response I get back is the HTML for an actual webpage that contains all the book’s details, as shown below:

An API can return complex data in any data format, even HTML

In order to get the information I needed to satisfy my feature’s requirements, I had to make three trips to the API: first, to get books by an author; then, to get book details by ISBN; and finally, to get all of the book’s details using the URL provided by the second call. And, as a result of each call, not only did I get back information of interest, but I also received a lot of information that was of no interest to me whatsoever.
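To make the three trips concrete, here is a minimal Python sketch of the chain. It is an illustration, not a reference client: the function names are my own, the `isbn` field in the search results and the `format=json` parameter on the books call are assumptions about the Open Library API, and the `fetch` argument exists only so the network call can be swapped out.

```python
import json
from typing import Callable, List
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://openlibrary.org"


def http_get(url: str) -> str:
    """Fetch a URL over the network and return the body as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def book_pages_for_author(
    author: str, fetch: Callable[[str], str] = http_get, limit: int = 1
) -> List[str]:
    """Return the HTML detail page for up to `limit` books by `author`."""
    # Trip 1: search for books by author.
    search_url = f"{BASE}/search.json?{urlencode({'author': author})}"
    docs = json.loads(fetch(search_url)).get("docs", [])

    pages = []
    for doc in docs[:limit]:
        isbns = doc.get("isbn") or []  # assumed field name in the search result
        if not isbns:
            continue
        bib_key = f"ISBN:{isbns[0]}"

        # Trip 2: look up the book by ISBN to find its info_url.
        params = urlencode({"bibkeys": bib_key, "format": "json"})
        details = json.loads(fetch(f"{BASE}/api/books?{params}"))
        info_url = details.get(bib_key, {}).get("info_url")
        if not info_url:
            continue

        # Trip 3: follow info_url, which returns an HTML page.
        pages.append(fetch(info_url))
    return pages
```

Calling `book_pages_for_author("charles bukowski")` would issue three requests for a single book, and every additional book adds two more.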

Instead of doing all this work, wouldn’t it be cool if I could describe the information I want and then just make one query to get what I want, as I want it?

Enter GraphQL

GraphQL is designed to allow developers to create queries that describe a response’s data structures in terms of the object graph represented by the API. Then, once a query is submitted to the GraphQL API, the response is returned according to fields defined. Below is an example of such a query:

query{
  search(author: "charles bukowski") {
    books {
      title
      isbn
      num_of_pages
      opening_paragraph
      cover_art
    }
  }
}

This query is written in the GraphQL query language. The query means: search for the books written by the author Charles Bukowski and, for each book returned, show the title of the book, its ISBN, the number of pages in the book, the opening paragraph, and the cover art.

Every field returned by the query is custom to the object graph behind the API. Also, the name of the query, search, is custom. GraphQL allows distinct naming of queries and mutations as long as the name is not one of the GraphQL reserved words. (A mutation is an operation that alters the data stored by the API.)

What is an object graph?

An object graph is a structure in discrete mathematics that describes things and the relationship between things. The formal term for a thing is a node. The term that describes a relationship between two things is an edge.

In a traditional relational database, data is organized in tables, which are structured using rows and columns. Document-oriented NoSQL databases store data in documents, and the data relationships within each document are described in a nested, multi-level outline format, similar to the way a scholar organizes content in a book or academic paper.

An object graph, on the other hand, offers the flexibility to easily describe a multitude of relationships between a multitude of nodes, as shown in the figure below.
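For readers who think in code, a graph of nodes and edges can be sketched in a few lines of Python. The class name and the relationship labels (`wrote`, `has_isbn`) are hypothetical and chosen only for this illustration; the data comes from the Open Library example earlier in the article.

```python
from collections import defaultdict


class ObjectGraph:
    """A tiny labeled graph: nodes are values, edges are named relationships."""

    def __init__(self):
        # Maps a source node to a list of (relationship, target node) pairs.
        self._edges = defaultdict(list)

    def add_edge(self, source, relationship, target):
        self._edges[source].append((relationship, target))

    def related(self, source, relationship):
        """Return every node reachable from `source` via `relationship`."""
        return [t for r, t in self._edges.get(source, []) if r == relationship]


graph = ObjectGraph()
graph.add_edge("Charles Bukowski", "wrote", "Post Office")
graph.add_edge("Post Office", "has_isbn", "0876850875")

print(graph.related("Charles Bukowski", "wrote"))  # ['Post Office']
```

Any node can carry any number of edges to any other node, which is the flexibility that tables and nested documents make harder to express.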

As you can see, while using the REST API to get book information took multiple trips over the network back to the server to get the data needed to satisfy the requirement at hand, GraphQL required only a single query to accomplish the same goal. And GraphQL returned only the information as described in the query, with no additional overhead. I could change the structure of data in the response by altering the fields described in the GraphQL query. In short, not only was the data exchange reduced to one trip to the network, the amount of data funneled back to the client was considerably less when using GraphQL as opposed to REST. And, while the data structure in a REST response is immutable, GraphQL data structures are flexible.
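The effect of declaring fields can be mimicked with a toy projection in Python. To be clear, this is not how a GraphQL server works internally (a real server resolves each requested field rather than filtering a finished record), and the `select_fields` helper and the shape of `selection` are my own invention. It only shows that the caller, not the API, decides the shape of the response.

```python
def select_fields(record: dict, selection: dict) -> dict:
    """Keep only the fields named in `selection`.

    A value of None means "return this field as-is"; a nested dict
    means "descend into this field and select again".
    """
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        if sub_selection is None:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [select_fields(item, sub_selection) for item in value]
        else:
            result[field] = select_fields(value, sub_selection)
    return result


# Data taken from the Open Library example earlier in the article.
record = {
    "books": [
        {
            "title": "Post Office",
            "isbn": "0876850875",
            "preview": "noview",
            "info_url": "https://openlibrary.org/books/OL17228140M/Post_office",
        }
    ]
}

# Only title and isbn are requested, so only they come back.
selection = {"books": {"title": None, "isbn": None}}
print(select_fields(record, selection))
```

Changing `selection` changes the response structure without any change on the server side of this toy.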

Yet, while things might seem all well and good on initial observation, there is an underlying problem that performance testers need to be aware of when considering GraphQL. Unless precautions are taken, there is a significant risk of creating unintended server-side bottlenecks when implementing a GraphQL API. In fact, this problem was identified by the folks at Netflix late in 2018.

The Hazard of Performance Bottlenecks in GraphQL

Just because GraphQL requires only one trip over the network between the client and server to fulfill a query request, it does not necessarily follow that the number of overall network transactions is reduced. Nor does it mean that performance problems are minimized. Let’s take a look at a scenario that illustrates this potential hazard.

Figure 3 below describes a typical multi-request interaction between a client and a REST API. Notice that the scenario illustrates three calls to a REST API server located in a global region, US CENTRAL, and that each call is routed to a particular application that fulfills the particular request. Each fulfillment application is located in a global region different from the REST API server. In other words, everything is all over the place.

A REST API that routes to distinct fulfillment applications

While the network topology is not optimal, at least the potential for hazard is constrained to the given request. So, if a performance problem is encountered, it’s fairly easy to trace. In fact, the remedy might require nothing more than comparing the URLs in the access logs to routing traffic in the REST API server.

Now, let’s take a look at a scenario in which a GraphQL API is put in place to consolidate data retrieval into a single request. As shown in figure 4 below, a client submits a single query to the GraphQL API server and gets back all the data that’s needed in the query response.

A single GraphQL query can experience poor performance server-side due to increased network latency internally

However, behind the scenes on the server side, the GraphQL server makes several calls over the network to different fulfillment applications in different locations. Intelligence in the GraphQL server aggregates the data from the different data providers in a single response. Thus, a single trip to the network on the client side results in many trips to the network on the server side, and any one of the server-side trips can incur significant network latency. This was the problem that Netflix first encountered when it implemented Monet, an internal project that manages the company’s marketing campaigns on external platforms.
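The fan-out can be simulated in Python with `asyncio`. Everything here is hypothetical: the backend names, the latency numbers, and the resolver are made up for illustration and say nothing about how Monet was built. The point is only that one client-facing query turns into several backend calls, and that its response time is bounded below by the slowest of them.

```python
import asyncio

# Made-up round-trip latencies in seconds for three fulfillment services.
SIMULATED_LATENCY = {"search": 0.02, "book_details": 0.05, "cover_art": 0.03}

# Records every backend call so the fan-out is visible.
backend_calls = []


async def call_backend(name: str) -> dict:
    """Stand-in for a network call to a remote fulfillment service."""
    backend_calls.append(name)
    await asyncio.sleep(SIMULATED_LATENCY[name])
    return {"source": name}


async def resolve_query() -> dict:
    """One client query that fans out to every backend concurrently."""
    names = list(SIMULATED_LATENCY)
    results = await asyncio.gather(*(call_backend(n) for n in names))
    return dict(zip(names, results))


def modelled_latency(sequential: bool) -> float:
    """Server-side latency: the sum if calls are chained, the max if concurrent."""
    values = SIMULATED_LATENCY.values()
    return sum(values) if sequential else max(values)


if __name__ == "__main__":
    asyncio.run(resolve_query())
    print(f"{len(backend_calls)} backend calls for 1 client query")
```

Even in the best case, where the calls run concurrently, the client waits at least as long as the slowest backend; if the resolvers depend on one another and run in sequence, the latencies add up.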

The Netflix engineers consolidated all API activity into a single GraphQL API URL. However, behind the scenes, the GraphQL server was delegating work out to a variety of REST services that resided in different data centers, and performance suffered as a result. The remedy was to move all the REST servers into the same data center as the GraphQL server. Performance improved significantly due to the close proximity of the servers’ hardware.

The Key to Performance Testing a GraphQL API Is Adequate Preparation

Netflix was fortunate to be able to identify and remedy the problem in a timely manner. However, others might not be so lucky unless certain precautions are taken when planning performance analysis. It has to do with the way that calls are made from a client to GraphQL.

With REST, a request is made to a distinct URL that represents a particular resource. With GraphQL, clients submit a query, typically via an HTTP POST, to a single URL that exposes the GraphQL API. The internals of the GraphQL API server do the subsequent route forwarding. Thus, tracing the complete route a request takes through the server-side infrastructure can be hard unless preparations are made.
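A minimal sketch of what that single-URL request looks like, using only the Python standard library. The endpoint `https://api.example.com/graphql` is a placeholder, and the `query`/`variables` JSON body follows the common GraphQL-over-HTTP convention; a given server may expect something different.

```python
import json
from typing import Optional
from urllib.request import Request

# Hypothetical endpoint; every query goes to this one URL.
GRAPHQL_URL = "https://api.example.com/graphql"

QUERY = """
query {
  search(author: "charles bukowski") {
    books {
      title
      isbn
    }
  }
}
"""


def build_graphql_request(
    query: str, variables: Optional[dict] = None, url: str = GRAPHQL_URL
) -> Request:
    """Build (but do not send) an HTTP POST carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables or {}})
    return Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(QUERY)
print(request.get_method(), request.full_url)
```

Notice that nothing in the URL or the HTTP method says which data the request touches. That information lives in the body, which is why access logs alone reveal so little about GraphQL traffic.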

When bottlenecks are suspected, you simply cannot rely on correlating information in HTTP access logs with other tracing information to determine a root cause. At the least, you’ll need to provide a correlation ID in the query request to the GraphQL API as a key by which to trace the request’s behavior as it executes throughout the API.
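One way to carry a correlation ID is an HTTP header that the GraphQL server copies onto every downstream call. The sketch below assumes a header named `X-Correlation-ID`, which is a common convention rather than a standard, and the helper names are hypothetical.

```python
import uuid
from typing import Optional

# Assumed header name; use whatever your tracing setup expects.
CORRELATION_HEADER = "X-Correlation-ID"


def with_correlation_id(
    headers: Optional[dict] = None, correlation_id: Optional[str] = None
) -> dict:
    """Return a copy of `headers` that is guaranteed to carry a correlation ID."""
    merged = dict(headers or {})
    merged.setdefault(CORRELATION_HEADER, correlation_id or str(uuid.uuid4()))
    return merged


def downstream_headers(incoming: dict) -> dict:
    """Headers the GraphQL server would forward to a fulfillment service."""
    return {CORRELATION_HEADER: incoming[CORRELATION_HEADER]}


# Client side: tag the request once.
client_headers = with_correlation_id({"Content-Type": "application/json"})

# Server side: reuse the same ID on every backend call made for this query.
backend_headers = downstream_headers(client_headers)
assert backend_headers[CORRELATION_HEADER] == client_headers[CORRELATION_HEADER]
```

With the same ID present in the logs of the GraphQL server and every fulfillment service, the full path of one query can be reassembled after a test run.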

The important thing to understand is that monitoring performance behavior in a GraphQL API is not as simple as measuring the timespan between client-side request and response. Not only can performance bottlenecks occur anywhere on the server side, but monitoring performance can be particularly vexing because request routing can be obscure. Precautionary preparation is required. What those precautions need to be will be specific to the given GraphQL implementation. The key to test preparation is to understand the GraphQL implementation in detail and then to make sure that comprehensive monitoring capable of detecting performance bottlenecks is in place.
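As one example of such monitoring, each resolver can be wrapped so that its duration is recorded separately from the overall request time. This is a bare-bones sketch: the `timed` decorator, the resolver names, and the in-memory `timings` store are all hypothetical, and a production setup would more likely rely on the tracing support of the GraphQL server in use.

```python
import time
from collections import defaultdict
from functools import wraps

# Resolver name -> list of observed durations in seconds.
timings = defaultdict(list)


def timed(name: str):
    """Decorator that records how long each call to a resolver takes."""

    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record the duration even if the resolver raises.
                timings[name].append(time.perf_counter() - start)

        return wrapper

    return decorator


@timed("search")
def resolve_search(author: str) -> list:
    # Placeholder for a call to a fulfillment service.
    return [{"title": "Post Office", "isbn": "0876850875"}]


@timed("book_details")
def resolve_book_details(isbn: str) -> dict:
    # Placeholder for a call to a different fulfillment service.
    return {"isbn": isbn}


def slowest_resolver() -> str:
    """Name of the resolver with the largest single observed duration."""
    return max(timings, key=lambda name: max(timings[name]))
```

A performance test can then report per-resolver figures alongside the client-side response time, which points to where on the server side a bottleneck sits.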

Putting It All Together

GraphQL is growing in popularity as a way for companies to publish APIs. The technology provides a concise yet flexible approach to data management. While using a REST API can require a client to make multiple trips over the network to get the data required to fulfill a particular need, GraphQL allows clients to make a single trip over the network. GraphQL also allows clients to declare the data structure that will exactly meet the need at hand. This is an advantage over using REST because data structures in a REST API are predefined and immutable.

However, GraphQL is not a panacea that is going to eliminate every problem associated with API implementation. Performance bottlenecks can happen, and the root causes can be hard to identify. Thus, it’s important that test practitioners work closely with the GraphQL API developers to put proper tracing mechanisms in place and to create performance tests that go beyond measuring simple request and response time spans between client and server. Every part of a GraphQL implementation needs to be subject to performance analysis.

GraphQL has a lot to offer, but in order to make sure that it’s a technology that will be effective in the enterprise, careful planning is needed and precautions must be taken.

Bob Reselman is a nationally known software developer, system architect, industry analyst, and technical writer/journalist. Bob has written many books on computer programming and dozens of articles about topics related to software development technologies and techniques, as well as the culture of software development. Bob is a former Principal Consultant for Cap Gemini and Platform Architect for the computer manufacturer, Gateway. In addition to his software development and testing activities, Bob is in the process of writing a book about the impact of automation on human employment. He lives in Los Angeles and can be reached on LinkedIn at www.linkedin.com/in/bobreselman.
