Using Fetch for optimizing GraphQL query execution

This is the second article in a series on the Fetch library (as of publication, v1.0.0). Fetch is an open source library for simple and efficient data access for Scala and Scala.js.

Previously, we covered how to use Fetch to optimize requests to GitHub’s REST API. However, the next iteration of the GitHub API (in development) uses GraphQL, which gives clients a query language for acquiring all the data they need in a single request. Many people have found that working with a proliferation of HTTP endpoints can be difficult, especially when nesting and joining data from multiple endpoints; GraphQL attempts to solve this by providing a typed specification of the API together with a query language.

So, what’s the point of having a library like Fetch when we can avoid issues with traditional REST APIs using GraphQL? Well, it turns out that the Fetch execution model is perfect for interpreting GraphQL queries. In this article, I’m going to show an example of a GraphQL query interpreter that uses Fetch to execute and optimize queries.

GraphQL query

We’ll start by writing a GraphQL query that pulls the data we want from GitHub:

  • Given an organization, provide all of the repositories, and for each of these return:
    • The languages in which the repository is written.
    • The collaborators of the repository.

It looks like the following:

query {
  organization(login:"47deg") {
    repositories(first: 100){
      name,
      languages,
      collaborators
    }
  }
}

Let’s break it down:

  • We start by querying the "47deg" organization.
  • For this organization, we look at the first 100 repositories.
  • For each repository, we are interested in the name, languages, and collaborators.

So how do we represent this in Scala? We’ll write data structures that model GraphQL queries like the one shown above.

case class OrganizationQuery(org: String, repos: Option[RepositoriesQuery])
case class RepositoriesQuery(
    n: Int,
    name: Option[Unit] = None,
    languages: Option[LanguagesQuery] = None,
    collaborators: Option[CollaboratorsQuery] = None
)
case class LanguagesQuery()
case class CollaboratorsQuery()

An organization query carries the organization’s name and, optionally, a query for its repositories. When querying repositories, we specify the number n of repos to fetch and, optionally, whether we want their names, languages, and collaborators.
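To make the mapping concrete, here is a small sketch that builds query values by hand. The case classes are repeated from above so the snippet compiles on its own; the wrapper object `QueryModelExample` and the value names are just illustrative.

```scala
object QueryModelExample {
  // Same shapes as the case classes above, repeated for self-containment.
  case class LanguagesQuery()
  case class CollaboratorsQuery()
  case class RepositoriesQuery(
      n: Int,
      name: Option[Unit] = None,
      languages: Option[LanguagesQuery] = None,
      collaborators: Option[CollaboratorsQuery] = None
  )
  case class OrganizationQuery(org: String, repos: Option[RepositoriesQuery])

  // organization(login:"47deg") { repositories(first: 100) { name, languages, collaborators } }
  val full: OrganizationQuery =
    OrganizationQuery(
      "47deg",
      Some(RepositoriesQuery(100, Some(()), Some(LanguagesQuery()), Some(CollaboratorsQuery())))
    )

  // Only languages selected: the default arguments leave the other selections as None.
  val languagesOnly: OrganizationQuery =
    OrganizationQuery("47deg", Some(RepositoriesQuery(100, languages = Some(LanguagesQuery()))))
}
```

A selected field is represented by `Some(...)` and an unselected one by `None`, which is what the interpreter later pattern matches on.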

We can now model the data structures resulting from the execution of such a query. We’ll have an Organization type for representing orgs, Repo for representing repositories, and Project, which is an aggregate of a repository and its metadata:

case class Organization(org: String, projects: List[Project])
case class Repo(name: String)
case class Project(name: Option[String], languages: List[String], collaborators: List[String])

An organization has a name and a list of Projects, each of which has an optional name, and zero or more languages/collaborators.
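As a quick illustration of the result model, the following sketch builds an Organization by hand and reads data back out of it. The types are repeated so the snippet stands alone; the wrapper object and the sample values are illustrative only.

```scala
object ResultModelExample {
  case class Organization(org: String, projects: List[Project])
  case class Repo(name: String)
  case class Project(name: Option[String], languages: List[String], collaborators: List[String])

  val org: Organization = Organization(
    "47deg",
    List(
      // Name, languages, and collaborators were all selected.
      Project(Some("fetch"), List("scala"), List("Peter", "Ale")),
      // Name and collaborators were not selected, so they stay empty.
      Project(None, List("kotlin"), List())
    )
  )

  // Every language used across the organization's projects, without duplicates.
  val allLanguages: List[String] = org.projects.flatMap(_.languages).distinct
}
```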

Parsing

How do we go from a GraphQL query to an OrganizationQuery? Before anything else, we’ll need to parse the GraphQL queries and convert them to our data structure. We’ll use the Atto library to implement a friendly little parser for our GraphQL query.

Showing how to use Atto to parse GraphQL queries is out of the scope of this article, but you can take a look at the code in case you are interested. Let’s assume that we have an Atto parser for OrganizationQuery that we’ll call queryParser:

def queryParser: Parser[OrganizationQuery]

Running queryParser gives a result that reflects the query. Note how all the query metadata and subqueries are represented in the RepositoriesQuery result:

queryParser.parseOnly("""
  query {
    organization(login:"47deg") {
      repositories(first: 100){
        name,
        languages,
        collaborators
      }
    }
  }
""")
// Done(,OrganizationQuery(47deg,Some(RepositoriesQuery(100,Some(()),Some(LanguagesQuery()),Some(CollaboratorsQuery())))))

We could restrict the query to only one repository, and the parser picks that up:

queryParser.parseOnly("""
  query {
    organization(login:"47deg") {
      repositories(first: 1){
        name,
        languages,
        collaborators
      }
    }
  }
""")
// Done(,OrganizationQuery(47deg,Some(RepositoriesQuery(1,Some(()),Some(LanguagesQuery()),Some(CollaboratorsQuery())))))

Another option is to fetch only languages or only collaborators:

queryParser.parseOnly("""
  query {
    organization(login:"47deg") {
      repositories(first: 100){
        languages
      }
    }
  }
""")
// Done(,OrganizationQuery(47deg,Some(RepositoriesQuery(100,None,Some(LanguagesQuery()),None))))


queryParser.parseOnly("""
  query {
    organization(login:"47deg") {
      repositories(first: 100){
        collaborators
      }
    }
  }
""")
// Done(,OrganizationQuery(47deg,Some(RepositoriesQuery(100,None,None,Some(CollaboratorsQuery())))))
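The Atto parser itself is not shown in this article. For readers who want something to experiment with, here is a rough stand-in that uses a plain regular expression instead of Atto. It is not the article's queryParser: it only accepts the exact query shape used in the examples, always expects a repositories selection, returns an Option rather than a ParseResult, and silently ignores unknown field names. The object name `RegexQueryParser` is hypothetical.

```scala
object RegexQueryParser {
  case class LanguagesQuery()
  case class CollaboratorsQuery()
  case class RepositoriesQuery(
      n: Int,
      name: Option[Unit] = None,
      languages: Option[LanguagesQuery] = None,
      collaborators: Option[CollaboratorsQuery] = None
  )
  case class OrganizationQuery(org: String, repos: Option[RepositoriesQuery])

  // Matches: query { organization(login:"ORG") { repositories(first: N) { FIELDS } } }
  // Captures ORG, N (at most nine digits, so toInt cannot overflow), and the raw FIELDS text.
  private val Shape =
    ("""\s*query\s*\{\s*organization\s*\(\s*login\s*:\s*"([^"]+)"\s*\)\s*\{\s*""" +
      """repositories\s*\(\s*first\s*:\s*(\d{1,9})\s*\)\s*\{([^}]*)\}\s*\}\s*\}\s*""").r

  def parse(q: String): Option[OrganizationQuery] = q match {
    case Shape(org, n, fields) =>
      // Field names may be separated by commas and/or whitespace.
      val selected = fields.split("[,\\s]+").filter(_.nonEmpty).toSet
      Some(
        OrganizationQuery(
          org,
          Some(
            RepositoriesQuery(
              n.toInt,
              if (selected("name")) Some(()) else None,
              if (selected("languages")) Some(LanguagesQuery()) else None,
              if (selected("collaborators")) Some(CollaboratorsQuery()) else None
            )
          )
        )
      )
    case _ => None // anything that does not match the expected shape
  }
}
```

A real parser built with a combinator library would report where parsing failed and handle nesting properly; the regular expression is only meant to show how query text maps onto the query data structures.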

Interpreter

Now that we can parse queries we are ready to write our GraphQL interpreter, which will consist of a function that takes an OrganizationQuery and gives us a Fetch[F, Organization] that we can then execute.

Let’s start by writing this function; we’ll call it fetchOrg:

def fetchOrg[F[_]: ConcurrentEffect](q: OrganizationQuery): Fetch[F, Organization] =
  q.repos match {
    case None    => Fetch.pure(Organization(q.org, List()))
    case Some(r) => fetchRepos(q.org, r).map(rs => Organization(q.org, rs))
  }

It should be pretty straightforward:

  • If we aren’t querying the organization’s repos, we just create an Organization with no projects.
  • If we need the organization’s repos, we fetch them using fetchRepos, and then we build the resulting Organization.

What would fetchRepos look like? It’s a bit more involved than the previous function since it has to take into account the subqueries, but it boils down to executing the subqueries if needed, and combining the results into a list of Projects:

private def fetchRepos[F[_]: ConcurrentEffect](
    org: String,
    q: RepositoriesQuery
): Fetch[F, List[Project]] = q match {
  case RepositoriesQuery(n, name, None, None) =>
    Repos.fetch(org)
      .map(_.take(n))
      .map(_.map(r => Project(name >> Some(r.name), List(), List())))

  case RepositoriesQuery(n, name, Some(_), None) =>
    for {
      repos <- Repos.fetch(org)
      projects <- repos.take(n).traverse(r => {
        Languages.fetch(r).map(ls => Project(name >> Some(r.name), ls, List()))
      })
    } yield projects

  case RepositoriesQuery(n, name, None, Some(_)) =>
    for {
      repos <- Repos.fetch(org)
      projects <- repos.take(n).traverse(r => {
        Collaborators.fetch(r).map(cs => Project(name >> Some(r.name), List(), cs))
      })
    } yield projects

  case RepositoriesQuery(n, name, Some(_), Some(_)) =>
    for {
      repos <- Repos.fetch(org)
      projects <- repos
        .take(n)
        .traverse(repo =>
          (Languages.fetch(repo), Collaborators.fetch(repo)).tupled.map {
            case (ls, cs) => Project(name >> Some(repo.name), ls, cs)
        })
    } yield projects
}

Let’s take a look at these cases one by one and see what’s going on. For the first case, we’re not interested in languages or collaborators, so we can satisfy the query by just fetching the repositories.

  case RepositoriesQuery(n, name, None, None) =>
    Repos.fetch(org)
         .map(_.take(n))
         .map(_.map(r => Project(name >> Some(r.name), List(), List())))
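The expression `name >> Some(r.name)` that appears in every case uses the cats `>>` operator, which sequences two values and keeps the second. For Option, that means the repository name is kept only when the name field was selected. A minimal standard-library equivalent, with a hypothetical helper called `selectName`, looks like this:

```scala
object NameSelectionExample {
  // Stdlib equivalent of `name >> Some(repoName)` for Option:
  // the repository name survives only if `name` is defined.
  def selectName(name: Option[Unit], repoName: String): Option[String] =
    name.flatMap(_ => Some(repoName))

  val selected: Option[String]    = selectName(Some(()), "fetch") // Some("fetch")
  val notSelected: Option[String] = selectName(None, "fetch")     // None
}
```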

In the second case, we’re only interested in the repository languages and not the collaborators, so after fetching the repositories we traverse them and get their languages. Using traverse here allows Fetch to batch the requests for languages, since it knows they are independent.

  case RepositoriesQuery(n, name, Some(_), None) =>
    for {
      repos <- Repos.fetch(org)
      projects <- repos.take(n).traverse(r => {
        Languages.fetch(r).map(ls => Project(name >> Some(r.name), ls, List()))
      })
    } yield projects
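To see why independence matters, here is a toy model of request batching written with plain collections. It is not how Fetch is implemented; `LanguageBackend` is a hypothetical in-memory backend that counts the requests it receives, so the difference between asking once per repository and asking once for all repositories becomes visible.

```scala
object BatchingSketch {
  final case class Repo(name: String)

  // Hypothetical in-memory backend that counts how many requests it receives.
  final class LanguageBackend(data: Map[Repo, List[String]]) {
    var requests: Int = 0

    // One request for a single repository.
    def one(repo: Repo): List[String] = {
      requests += 1
      data.getOrElse(repo, Nil)
    }

    // One request for many repositories at once.
    def many(repos: List[Repo]): Map[Repo, List[String]] = {
      requests += 1
      repos.map(r => r -> data.getOrElse(r, Nil)).toMap
    }
  }

  val data: Map[Repo, List[String]] =
    Map(Repo("fetch") -> List("scala"), Repo("arrow") -> List("kotlin"))
  val repos: List[Repo] = List(Repo("fetch"), Repo("arrow"))

  // Naive: one request per repository.
  val naive = new LanguageBackend(data)
  val naiveResult: List[List[String]] = repos.map(naive.one)

  // Batched: no lookup depends on another, so collect the ids,
  // ask once, and hand the answers back in the original order.
  val batched = new LanguageBackend(data)
  private val byRepo = batched.many(repos.distinct)
  val batchedResult: List[List[String]] = repos.map(r => byRepo.getOrElse(r, Nil))
}
```

Both strategies produce the same result, but the naive one issues one request per repository while the batched one issues a single request regardless of how many repositories there are.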

The next case is similar to the previous example, except this time, we are only interested in the collaborators and not the languages.

  case RepositoriesQuery(n, name, None, Some(_)) =>
    for {
      repos <- Repos.fetch(org)
      projects <- repos.take(n).traverse(r => {
        Collaborators.fetch(r).map(cs => Project(name >> Some(r.name), List(), cs))
      })
    } yield projects

Finally, whenever we are interested in both languages and collaborators, we’ll fetch the repositories first and then traverse over them, simultaneously getting the repository languages and collaborators.

  case RepositoriesQuery(n, name, Some(_), Some(_)) =>
    for {
      repos <- Repos.fetch(org)
      projects <- repos
        .take(n)
        .traverse(repo =>
          (Languages.fetch(repo), Collaborators.fetch(repo)).mapN {
            case (ls, cs) => Project(name >> Some(repo.name), ls, cs)
        })
    } yield projects

Running queries

Now that we can turn OrganizationQuery into Fetch[F, Organization] we can start running some queries. We’ll use this function for convenience:

def runQuery[F[_]: ConcurrentEffect](q: String): Fetch[F, Organization] =
  queryParser.parseOnly(q) match {
    case ParseResult.Done(_, query) => fetchOrg[F](query)
    case _                          => Fetch.error(new Exception("Oh noes"))
  }

We’ll use Fetch’s debugging facilities to visualize the execution of the queries and interpret the query we saw at the beginning of the article:

val io = Fetch.runLog[IO](runQuery("""
  query {
    organization(login:"47deg") {
      repositories(first: 100){
        name,
        languages,
        collaborators
      }
    }
  }
"""))

val (log, result) = io.unsafeRunSync

println(result)
// Organization(47deg,List(Project(Some(fetch),List(scala),List(Peter, Ale)), Project(Some(arrow),List(kotlin),List(Raul, Paco, Simon))))

println(describe(log))
// Fetch execution 🕛 0.01 seconds
//
//  [Round 1] 🕛 0.00 seconds
//    [Fetch one] From `Repos` with id 47deg 🕛 0.00 seconds
//  [Round 2] 🕛 0.00 seconds
//    [Batch] From `Collaborators` with ids List(Repo(fetch), Repo(arrow)) 🕛 0.00 seconds
//    [Batch] From `Languages` with ids List(Repo(fetch), Repo(arrow)) 🕛 0.00 seconds

As you can see in the output from fetch.debug.describe, our query ran in two rounds:

  • The first round fetches the repositories for the "47deg" organization.
  • The second round performs two batched requests in parallel:
    • A request to get the collaborators from the repos.
    • A request to get the languages from the repos.
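The two-round shape can be imitated with standard-library Futures. This is only an analogy under stated assumptions: the three functions below are hypothetical stand-ins for the data sources and return canned data, and Fetch works out the rounds and batches by itself rather than requiring them to be written out like this.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object RoundsSketch {
  final case class Repo(name: String)
  final case class Project(name: Option[String], languages: List[String], collaborators: List[String])

  // Hypothetical stand-ins for the three data sources, returning canned data.
  def repos(org: String): Future[List[Repo]] =
    Future(List(Repo("fetch"), Repo("arrow")))

  def languages(rs: List[Repo]): Future[Map[Repo, List[String]]] =
    Future(Map(Repo("fetch") -> List("scala"), Repo("arrow") -> List("kotlin")))

  def collaborators(rs: List[Repo]): Future[Map[Repo, List[String]]] =
    Future(Map(Repo("fetch") -> List("Peter", "Ale"), Repo("arrow") -> List("Raul", "Paco", "Simon")))

  def projects(org: String): Future[List[Project]] =
    // Round 1: the repositories must be known before anything else can be requested.
    repos(org).flatMap { rs =>
      // Round 2: both batched requests are started before we wait on either,
      // so they run concurrently.
      languages(rs).zip(collaborators(rs)).map {
        case (ls, cs) =>
          rs.map(r => Project(Some(r.name), ls.getOrElse(r, Nil), cs.getOrElse(r, Nil)))
      }
    }

  // Blocks until both rounds have completed.
  def run(): List[Project] = Await.result(projects("47deg"), 5.seconds)
}
```

The second round cannot start until the first finishes because it needs the list of repositories, while the two requests inside the second round do not depend on each other and can therefore run side by side.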

If we aren’t interested in collaborators, we won’t request any data from that source:

val io = Fetch.runLog[IO](runQuery("""
  query {
    organization(login:"47deg") {
      repositories(first: 100){
        name,
        languages
      }
    }
  }
"""))

val (log, result) = io.unsafeRunSync

println(result)
// Organization(47deg,List(Project(Some(fetch),List(scala),List()), Project(Some(arrow),List(kotlin),List())))

println(describe(log))
// Fetch execution 🕛 0.01 seconds
//
//  [Round 1] 🕛 0.00 seconds
//    [Fetch one] From `Repos` with id 47deg 🕛 0.00 seconds
//  [Round 2] 🕛 0.00 seconds
//    [Batch] From `Languages` with ids List(Repo(fetch), Repo(arrow)) 🕛 0.00 seconds

Furthermore, if we are interested in neither languages nor collaborators, the Fetch executes in a single round:

val io = Fetch.runLog[IO](runQuery("""
  query {
    organization(login:"47deg") {
      repositories(first: 100){
        name
      }
    }
  }
"""))

val (log, result) = io.unsafeRunSync

println(result)
// Organization(47deg,List(Project(Some(fetch),List(),List()), Project(Some(arrow),List(),List())))

println(describe(log))
// Fetch execution 🕛 0.00 seconds
//
//  [Round 1] 🕛 0.00 seconds
//    [Fetch one] From `Repos` with id 47deg 🕛 0.00 seconds

Summary

We’ve shown that Fetch’s execution model can be used to execute and optimize GraphQL queries with little effort, thanks to a couple of Scala libraries. For simplicity’s sake, we’ve intentionally omitted specifying a GraphQL schema and have simplified the query syntax for joined collections.

Since the schema of a GraphQL API is known, we could have generated the code for parsing queries and interpreting them into a Fetch fairly easily, and we intend to explore that in a future post or library. Until then, you can find a GraphQL implementation in Scala called Sangria.

Have questions or comments? Join us in the Fetch Gitter channel. Fetch is made possible by an awesome group of contributors. As with all of the open source projects under the 47 Degrees umbrella, we’re always looking for new contributors and we’re happy to help guide you towards your first contribution.
