

A Principled Approach to
Consolidating a Data Graph
GraphQL at Enterprise Scale
A Principled Approach to Consolidating a Data Graph
Je Hampton
Michael Watson
Mandi Wise
GraphQL at Enterprise Scale
Copyright © 2020 Apollo Graph, Inc.
Published by Apollo Graph, Inc.
https://www.apollographql.com/
All rights reserved. No part of this book may be reproduced in any form or by
an electronic or mechanical means, including information storage and retrieval
systems, without permission in writing from the publisher. You may copy and
use this document for your internal, reference purposes. You may modify this
document for your internal, reference purposes.
This document is provided “as-is”. Information and views expressed in this
document may change without notice. While the advice and information in this
document is believed to be true and accurate at the date of publication, the
publisher and the authors assume no legal responsibility for errors or omissions,
or for damages resulting from the use of the information contained herein.
Revision History for the First Edition
2020-09-11: First Release
2020-10-27: Second Release
2020-12-10: Third Release
2021-04-26: Fourth Release
Contents
The Team v
Preface vi
Who Should Read this Guide . . . . . . . . . . . . . . . . . . . . . . vi
What You’ll Learn from this Guide . . . . . . . . . . . . . . . . . . . vii
How to Contact Us . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Moving Toward GraphQL Consolidation 1
Why Consolidate Your Data Graph? . . . . . . . . . . . . . . . . . . . 1
What Does a Consolidated Data Graph Look Like? . . . . . . . . . . . 8
When to Consolidate Your Data Graph . . . . . . . . . . . . . . . . . 9
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Graph Champions in the Enterprise 15
The Graph Champion and Graph Administration . . . . . . . . . . . . 15
Delivering Organizational Excellence as a Graph Champion . . . . . . 19
Education To Support Organizational Change . . . . . . . . . . . . . 21
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Consolidated Architectures with Federation 22
A Better Way to Scale Distributed GraphQL Architectures . . . . . . . . 22
Subgraphs and the Gateway . . . . . . . . . . . . . . . . . . . . . . 25
Connecting the Data Graph with Entities . . . . . . . . . . . . . . . . 27
Defining Shared Types and Custom Directives . . . . . . . . . . . . . 31
Managed Federation . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Federated Schema Design Best Practices 35
Best Practice #1: Design Schemas in a Demand-Oriented, Abstract Way 36
Best Practice #2: Prioritize Schema Expressiveness . . . . . . . . . . . 39
Best Practice #3: Make Intentional Choices About Nullability . . . . . . 42
Best Practice #4: Use Abstract Type Judiciously . . . . . . . . . . . . 44
Best Practice #5: Leverage SDL and Tooling to Manage Deprecations . . 47
Best Practice #6: Handle Errors in a Client-Friendly Way . . . . . . . . 48
Best Practice #7: Manage Cross-Cutting Concerns Carefully . . . . . . 53
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Graph Administration in the Enterprise 56
Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Tooling for Data Graph Contributors . . . . . . . . . . . . . . . . . . 59
Tooling for Data Graph Consumers . . . . . . . . . . . . . . . . . . . 62
Observability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Appendix A: Federation Case Studies 71
Appendix B: GraphQL and Apollo Learning Resources 73
The Team
This guide is the culmination of thousands of hours Apollo’s employees have
spent working with and learning from our customers over the years.
These are the members of the Apollo team who have made contributions to the
content in this guide:
Je Hampton Writing
Michael Watson Writing
Mandi Wise Writing/Editing
Preface
The data graph has quickly established itself as an essential layer of the modern
application development stack. In tandem, GraphQL has become the de
facto technology for managing this new layer with its enticing promise to bring
together all of an organization’s app data and services coherently in one place.
And thanks to the wellspring of experimentation and innovation with GraphQL
over the years, it has proven itself a mature and capable technology that’s ready
for scalability.
GraphQL makes its way into an enterprise’s tech stack through a variety of
avenues, for instance, a single team eager to leverage its client-driven approach
to data fetching. However, as its adoption spreads, realizing GraphQL’s promise
at scale requires coordination and consolidation of these efforts across teams.
At Apollo, we’ve had the opportunity to work with countless developers in a
wide range of enterprises over the years. Through that work, we’ve learned
that a unified, federated data graph is at the heart of any successful GraphQL
consolidation project. We first shared some of these insights in Principled
GraphQL where we outlined best practices that organizations can follow to
create, maintain, and operate a data graph as effectively as possible. In this
guide, we’ll provide a detailed road map for putting these principles into action
at the enterprise level.
Who Should Read this Guide
This guide is for engineering leaders.
If your enterprise is currently using
GraphQL, then you have undoubtedly experienced challenges related to main-
taining a monolithic data graph or wrangling multiple smaller graphs. Consol-
idating GraphQL in your organization can help reduce friction points between
teams, enhance developer experience, improve governance of your graph, and
even provide better observability of how your data is consumed.
This guide is for business leaders.
Consolidating your data graph isn’t just
about the architecture of your tech stack. It’s about an organizational transfor-
mation that will harness the power of graphs to unlock platform value. A unified
data graph increasingly lives at the center of value delivery in an enterprise and
the strategies and tactics presented in this guide provide a pathway to realizing
the potential of your data-graph-as-a-product.
This guide is for developers and architects.
Whether you’re a developer on a
client team or actively maintaining a GraphQL server in production now, the
concepts outlined in this guide will give you a clearer understanding of how
your work can align to your organization’s broader GraphQL strategy and how
you might even become a “Graph Champion” on your team.
What You’ll Learn from this Guide
This guide is the culmination of what we’ve learned after spending thousands
of hours working with enterprises at Apollo. Based on those experiences, we’ll
cover both the high-level considerations and the practical skills required to
successfully consolidate a data graph across an enterprise.
We’ll first present a case for why GraphQL consolidation is important in an
enterprise and provide a framework for assessing when an enterprise should
undertake a consolidation project. Subsequently, we’ll move into the specifics of
successful enterprise-level graph management and discuss the essential role of
the graph champion in a consolidated GraphQL architecture.
Ultimately, this guide has been written for you by Apollo to help guide you on
your journey toward effectively scaling your data graph across your enterprise.
It’s intended to be a living document and the solutions team will add additional
content to it on an ongoing basis in future releases.
You can check for updates and download the latest version of the guide here:
http://apollographql.com/guide
How to Contact Us
We’d like to hear from you if you have questions about this guide or have a
unique perspective you’d like to share about using GraphQL in your organization.
Email us at solutions@apollographql.com to reach out at any time with
your comments or if you require any assistance implementing GraphQL in an
enterprise environment.
Moving Toward GraphQL
Consolidation
By Je Hampton and Michael Watson
This chapter will introduce you to the notion of creating a unified, federated
data graph in an eort to leverage the benefits of a consolidated GraphQL
architecture. Based on our experience working with a variety of enterprises at
Apollo, we’ll provide a rationale for consolidation as well as a framework for
determining whether your organization is ready to consolidate its data graph.
Why Consolidate Your Data Graph?
The GraphQL community and ecosystem of related software have grown at
breathtaking speed. During the intervening years since its public release in 2015,
this technology has quickly matured to a point where it can be used in nearly any
infrastructure.
Companies such as Airbnb, GitHub, and the New York Times have famously
already adopted GraphQL in their tech stacks. With its strong type system
and declarative approach to data-fetching, it’s easy to see why teams across
enterprises have been eager to embrace the many benefits of GraphQL. At
Apollo, we see firsthand the level of enthusiasm organizations have for GraphQL
with over 1.5 million downloads of the Apollo Client packages every week, along
with hundreds of thousands more weekly downloads of the Apollo Server and
Apollo Federation packages.
Scanning your organization, you may quickly realize that multiple teams are
already using GraphQL in production today. Having some top-level insight into
how GraphQL is used across your enterprise is the first step toward understand-
ing whether those eorts can and should be consolidated.
How GraphQL Gains Traction in an Enterprise
When developers begin to experiment with GraphQL, they almost invariably first
encounter a foundational architecture where a client application queries a single
GraphQL server. In turn, the server distributes those requests to backing data
sources and returns the data in the client’s desired shape:
As dierent teams within an enterprise move toward oicially adopting GraphQL,
the complexion of their isolated implementations will usually be adapted from
this basic architecture, but may vary considerably from team to team. At
Apollo, we’ve typically seen that those initial, unconsolidated efforts resemble
one of the following four patterns.
Pattern 1: Client-Only GraphQL
Client teams that are enthusiastic to reap the benefits of GraphQL’s client-centric
data-fetching capabilities may charge ahead and implement a GraphQL API
within the context of their application. With such implementations, these teams
are oen motivated to adopt GraphQL for the convenience of wrapping existing
APIs with a single GraphQL API endpoint.
To illustrate this approach, a client-only GraphQL architecture may look like this:
Pattern 2: Backend for Frontend (BFF)
GraphQL may also be used as a solution for teams implementing the Backend
for Frontend (BFF) pattern. BFF seeks to solve the problem of requiring different
clients (for example, web and iOS) to interact with a monolithic, general-purpose
API. Alternatively, BFFs can save client applications from making requests to
multiple backend services to obtain all of the data required to render a particular
user interface view.
As a solution, BFFs add a new layer where each client has a dedicated BFF
service that directly receives the client’s requests and is tightly coupled to that
user experience. For teams creating BFF services, GraphQL can be a natural fit
for building out this intermediary, client-focused layer and adopting this pattern
can be an important first step toward consolidating a data graph.
In practice, the BFF pattern with GraphQL may look like this:
Pattern 3: The Monolith
The monolith pattern can take on two forms in an enterprise. In its first form,
teams may share one codebase for a GraphQL server that is used by one or more
clients. In some cases, client code may even live in the same repository as the
GraphQL server. However the code is organized, the ownership of this graph is
shared by the various developers who ultimately consume the graph’s data.
In its alternative form, a single team may be designated to own a graph that
is accessed by multiple client teams. This team would typically define a set of
standards for the graph and champion its adoption throughout the organization.
As with GraphQL-based BFFs, maintaining a single, monolithic GraphQL API
can help set the stage for eective consolidation of an organization’s GraphQL-
focused eorts.
For either monolithic scenario, its high-level architecture looks like this:
Pattern 4: Multiple Overlapping Graphs
Enterprise teams may also independently develop their own service-specific
GraphQL APIs in tandem. With this approach, teams may delineate each service
API based on types or use cases, but there will often be overlap between the
graphs due to the interconnected nature of data.
Such an architecture may look like this:
Where Do These Patterns Break Down?
Aer taking stock of who uses GraphQL and how in your enterprise, the patterns
the various teams have implemented can provide insight into what kinds of
problems they initially endeavored to solve. Similarly, these choices can help il-
luminate what pain points the teams currently face with respect to how GraphQL
is used in their tech stacks.
Client-Only GraphQL
Teams that opt for client-only GraphQL approaches are motivated to improve
their client development experience by layering GraphQL on top of the REST
endpoints or other legacy APIs they have to work with. And while improved de-
veloper experience is a win, beneath this abstraction the client application will
still incur performance costs as it maintains responsibility for making multiple
requests to various services to gather all of the data required to render a view.
BFFs
Like client-only approaches, teams that use GraphQL with BFFs enjoy the advan-
tage of improved developer experience by way of a consumer-friendly GraphQL
API, but they also manage to overcome the performance issues incurred by
client-only approaches. BFFs accomplish this by providing a unified interface for
a client application to send its requests while also handling the heavy lifting of
querying multiple backend services on behalf of the client.
However, there is an inherent tradeoff in building and maintaining BFFs. When
every client team is empowered to create a BFF to suit their needs, there will be
inevitable duplication of eort across those teams. However, where BFFs are
shared between seemingly similar clients in an eort to reduce duplication, then
the GraphQL schema contained within can balloon in size and become confusing
due to the lack of clear ownership.
Monoliths
The pains that emerge from shared BFFs are only sharpened with monolithic
GraphQL server implementations that have shared ownership. Portions of a
graph may be well-designed to suit the needs of certain client teams only, while
other clients must find workarounds or create overlapping types for their own
use. Correspondingly, standardization becomes an issue because the shape of
the graph evolves myopically on a client-by-client or a feature-by-feature basis.
Even in scenarios where a dedicated server team maintains ownership of the
graph, challenges quickly arise when more than one graph definition is required
for a single product in order to support the needs of multiple clients. A server
team may also find itself burdened with the task of building and maintaining the
necessary tooling to evolve the schema over time to meet new product needs
without breaking compatibility for any clients that are actively consuming data
from the graph.
Multiple Overlapping Graphs
Finally, when multiple graphs exist within an enterprise, it often indicates that
the organization was an early adopter of GraphQL, moved to production quickly,
and invested more in GraphQL as time went on. As one potential outcome of
this investment, an attempt to expand a monolithic GraphQL API across teams
may have ultimately resulted in the graph being split into multiple pieces to
accommodate the conflicting needs of each team. The inevitable result of this
approach is a duplication of effort to manage these two overlapping graphs
and a subpar experience for client applications that no longer have a unified
interface from which to request data.
Another possible reason an enterprise may have multiple overlapping graphs
stems from a deliberate choice for teams to manage their GraphQL APIs inde-
pendently but assemble them into a single gateway API using schema stitching.
While schema stitching can simplify API usage from a client’s perspective, the
gateway API requires a considerable amount of imperative code to implement.
What’s more, it may not always be clear-cut where to split types across services
and it also necessitates the designation of an API gatekeeper who will manage
the gateway and how the underlying schemas are composed into it.
Inconsistency: The Common Shortcoming
All of the previous patterns—whether client-only GraphQL, BFFs, monoliths,
or multiple overlapping graphs—also have a shared shortcoming in that their
implementations result in a lack of consistency. A more productive way for-
ward for teams searching for better efficiency and understandability from their
GraphQL-based architectures will have two requirements:
1. Consumers should be able to expect consistency in how they fetch data.
A single endpoint should be exposed to client applications and,
regardless of what underlying services supply the data, clients should be
able to use consistent workflows to consume the data.
2. Providers should consistently represent common entities in a
consumption-friendly way. Teams may be empowered to use any
underlying technology at the data layer, but access to this data should
be consolidated through the GraphQL API and exposed in a way that
complements client use cases. Additionally, teams should be able to
delineate service boundaries based on separation of concerns (as opposed
to separation by types) without interfering with each other.
How Consolidation Addresses These Challenges
Consolidating your data graph is the key to moving beyond these architectural
pitfalls, achieving consistency, and realizing the full potential of GraphQL in an
enterprise.
At a fundamental level, moving toward graph consolidation requires that your
organization has one unified graph instead of multiple graphs created and
managed by each team. However, the implementation of that unified graph
should be federated across multiple teams. These are the first two “integrity
principles” outlined in Principled GraphQL.
Specifically, moving toward this kind of consolidated data graph allows teams
across the enterprise to:
Scale GraphQL APIs effectively. Implementing uniform practices allows
the benefits of GraphQL to be realized at scale in an organization.
For example, teams will have a better understanding of the workflows
and policies that they must follow to make contributions to the graph.
Similarly, they will also benefit from improved standardization when
consuming data from the organization’s graph.
Obtain a unified view of your data.
Your graph is a representation of the
data of your product. Having a consolidated view of this data will provide
you with a fresh perspective into how that data is currently used, while also
inspiring new creative uses for it in the future. Additionally, it will help you
to enforce a measure of consistency on how client applications consume
that data.
Leverage existing infrastructure.
GraphQL consolidation allows teams
to reuse existing infrastructure in an organization and helps eliminate
duplicated efforts where teams interact with data. Consolidation also
allows you to take a holistic view of the practices and tooling developed by
each team that touches your data graph and leverage the best of those
individual efforts across the enterprise as a whole.
Ship code faster.
Organizations adopt GraphQL to build and iterate on
their products faster. As GraphQL gains traction throughout an enterprise,
these benefits may be partially offset by time spent developing tooling to
help support that growth. Consolidation helps reclaim that lost momen-
tum by providing a clearly defined set of practices for teams to follow when
contributing to or consuming data from the graph.
What Does a Consolidated Data Graph Look Like?
In practice, a consolidated, federation-driven GraphQL architecture consists of:
A collection of subgraph services that each define a distinct GraphQL
schema
A gateway that composes the distinct schemas into a federated data
graph and executes queries across the services in the graph
Apollo Server provides open source libraries that allow it to act both as a
subgraph and as a gateway, but these components can be implemented
in any language and framework. Specifically, Apollo Server supports
federation via two open-source extension libraries:
@apollo/federation provides primitives that your subgraphs use to
make their individual GraphQL schemas composable
@apollo/gateway enables you to set up an instance of Apollo Server
as a gateway that distributes incoming GraphQL operations across
one or more subgraphs
We will cover consolidated GraphQL architectures using Apollo Federation
and Apollo Gateway in-depth in Chapter 3.
Unlike other distributed GraphQL architectures such as schema stitching, fed-
eration uses a declarative programming model that enables each subgraph to
implement only the part of your data graph for which it’s responsible. With
this approach, your organization can represent an enterprise-scale data graph as
a collection of separately maintained GraphQL services. What’s more, schema
composition in federation is based on GraphQL primitives, unlike the imperative,
implementation-specific approach required by schema stitching.
Core Principles of Federation
A GraphQL architecture that has been consolidated with federation will adhere
to these two core principles:
Incremental Adoption
If you currently use a monolithic GraphQL server, then you can break its func-
tionality out one service at a time. If you currently use a different architecture
like schema stitching, then you can add federation support to your existing
subgraphs one at a time. In both of these cases, all of your clients will continue
to work throughout your incremental migration. In fact, clients have no way to
distinguish between these different data graph implementations.
Separation of Concerns
Federation encourages a design principle called separation of concerns. This
enables dierent teams to work on dierent products and features within a
single data graph, without interfering with each other.
By contrast, traditional approaches to developing distributed GraphQL architec-
tures often lead to type-based separation when splitting that schema across
multiple services. While it may initially seem straightforward to divide a schema
by type, issues quickly arise because features (or concerns) managed by one
service oen span across multiple types that are located in other services.
By instead referencing and extending types across services, concern-based
separation offers the best of both worlds: an implementation that keeps all
the code for a given feature in a single service and separated from unrelated
concerns, and a product-centric schema with rich types that reflects the natural
way an application developer would want to consume the graph.
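To make the idea of concern-based separation concrete, here is a minimal
schema sketch. It is illustrative only: the products and reviews subgraphs and
their fields are assumptions rather than a prescribed design, and Chapter 3
covers the directives used here (@key, @external) in detail. One subgraph
canonically defines a Product type, and another extends it with only the
review-related fields it owns:
# In the products subgraph: canonically define the Product entity
type Product @key(fields: "upc") {
  upc: String!
  name: String
}

# In the reviews subgraph: extend Product with review-related fields only
extend type Product @key(fields: "upc") {
  upc: String! @external
  reviews: [Review]
}

type Review {
  id: ID!
  body: String
}
Each team keeps the code for its own concern in its own service, yet clients see
a single Product type with both product details and reviews.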
When to Consolidate Your Data Graph
At this point, you may have a sense that your enterprise could benefit from
consolidating its data graph, so the next important question to answer is when
should it move toward consolidation?
GraphQL, from a pure engineering standpoint, is one means to achieve a com-
mon set of business goals: horizontal scalability, rapid product iteration,
increased service delivery capacity, and reduced time-to-market. When placed
in the hands of architects and engineering leaders, common questions emerge
about how GraphQL can and will change the organization.
At a fundamental level, a conversation about consolidation can begin as soon
as it seems logical for multiple teams to manage different parts of the data
graph.
While each organization and line of business may have unique consider-
ations in answering the question of when and how to consolidate, Apollo has
recognized patterns of success and failure when making this organizational
shi. Additionally, any good architect should spend suicient time laying the
groundwork for future change. While it might be tempting to federate “early
and oen, consolidating through federation requires meeting a threshold and
burden of evidence that the enterprise will benefit from this approach.
In the spirit of Principled GraphQL, we present a framework for making this
decision, illuminating the potential gaps in an organization’s success plan, and
ensuring constant success throughout the organization’s GraphQL evolution to a
federated implementation.
With a process in place to answer this question and evaluate the likelihood of
success, we’ll explore some common scenarios taken from real-world projects
here at Apollo. The real value of GraphQL lies in the hands of those tasked with
its implementation, and organizations of all sizes and shapes face the same
human-centric challenges with more or less success, and with more or less
friction during the process.
To frame this decision-making process, we should first examine the inherent
strengths of implementing or extending a federated data graph.
The Strengths of a Federated Graph Implementation
Just as network performance tuning is bound by the speed of light, the organiza-
tional optimizations offered by a federated GraphQL implementation are bound
by some real-world constraints:
Consensus: A collective understanding of data graph entities, tools, and
quality
Responsibility: Clear delineation of data graph ownership, education, and
support available to teams
Delivery: Speed of infrastructure change, velocity of product delivery
Performance: Impact to consumer-facing operation resolution for dis-
tributed operations
At its heart, a federated GraphQL implementation is an optimization toward
separation of concerns (be they performance, team structure, delivery cadence,
line-of-business alignment, or some combination of these) in exchange for a
distributed system. The shift toward microservices also involved this tradeoff,
but without a demand-driven, product-delivery orientation.
When deciding to break a monolithic graph into a federated one or when expand-
ing a federated graph by adding new services and teams, an architect should
have a plan for addressing the above four areas of concern. The decision ma-
trix below is annotated with each of these concerns and provides guidance in
resolving any gaps in measuring, understanding, and addressing these concerns.
Decision Framework Matrix
Whether you’re adding a new service, splitting an existing service, or choosing to
implement a federated graph for the first time, an architect’s most important
responsibility is understanding the motivation for the change. In Apollo’s
experience, a lack of clear and reliable measurements makes it harder to decide
where and when to separate the concerns among graph services.
At a strategic level, GraphQL adoption and evolution to a federated implemen-
tation can be measured reliably using a simple matrix. By answering these
questions periodically, technology leaders will have a continuous evaluation of
when, and how, their GraphQL implementation should proceed.
Our recommendation is to keep this exercise simple and stable. Practition-
ers should use the Apollo Consolidation Decision Matrix below as a regular
artifact to aid in a formal decision-making process.
If the answers to all of these questions are “yes,” then you should proceed to
laying out a clear path to a successful implementation.
If the answers to any of these questions are unclear or “no,” then leaders should
take caution in evolving their GraphQL implementation to federation:
Use each “no” to identify and monitor metrics and indicators that change
is necessary
Approach each “no” with a relentless desire to connect with the team(s)
doing the work and understand how this becomes a “yes”
Apollo Consolidation Decision Matrix

Concern: Consensus
Criterion (Yes/No): Are multiple teams contributing to your graph?
Remediation/Guidance: If this is an initial federated implementation, identify
your “Graph Champions” (see the next chapter) and establish education, review,
and governance processes.

Concern: Responsibility
Criterion (Yes/No): Are contributions to your graph by multiple teams regularly
causing conflicts with one another?
Remediation/Guidance: If teams are collaborating well together, consider the
potential switching cost of dividing teams or adding new teams.

Concern: Delivery
Criterion (Yes/No): Is there a measurable slowdown or downward trend in
GraphQL service change delivery?
Remediation/Guidance: If there isn’t a measurable, negative impact to product
or service delivery, consider the additional complexity and support for this
change.

Concern: Delivery
Criterion (Yes/No): Is there a concrete security, performance, or product
development need to deliver portions of your existing schema by different teams
or different services?
Remediation/Guidance: If consumers or internal stakeholders are not currently
affected, consider revisiting the driving factors for this change.

Concern: Consensus
Criterion (Yes/No): Is there a single source of governance for your GraphQL
schema within the organization?
Remediation/Guidance: An initial federated implementation, or an early
expansion of federation, is a good opportunity to create support systems for
education, consensus-building, governance, and quality control.

Concern: Consensus
Criterion (Yes/No): Does your GraphQL governance process have a reasonably
robust education component to onboard new teams?
Remediation/Guidance: Apollo has found that a robust education plan is a
leading indicator of constant improvement and success.

Concern: Delivery
Criterion (Yes/No): Is your existing GraphQL schema demand-oriented and
driven by concrete product needs?
Remediation/Guidance: Changes driven by data-modelling or internal
architectural requirements may not have an ROI when weighed against the costs
of infrastructure and organizational change.

Concern: Responsibility
Criterion (Yes/No): Do you have a strong GraphQL change management,
observability, and discoverability story, and do providers and consumers know
where to go for these tools?
Remediation/Guidance: Graph administration and tooling such as Apollo Studio
are key elements in a successful, organization-wide GraphQL initiative.

Concern: Consensus
Criterion (Yes/No): Is your existing GraphQL schema internally consistent, and
are your GraphQL schema design patterns well-understood by providers and
consumers?
Remediation/Guidance: Dividing responsibility or adding new schema to your
graph without strong governance may exacerbate existing friction or
product/service delivery challenges.

Concern: Performance
Criterion (Yes/No): Can you be reasonably sure that the cost of additional
latency, complexity, and infrastructure management will have a positive ROI
when bound by business timelines and objectives?
Remediation/Guidance: Ensure that the requirements for separating concerns
have a performance and optimization budget.
Ensuring Constant Improvement and Success
The outcomes of a GraphQL consolidation project should be measured against
the original, documented drivers for the transition to a federated data graph.
Aside from these measurements, certain actions and approaches must be under-
taken to ensure that ongoing changes to the consolidated GraphQL architecture
will be a success from a human and technology perspective.
For instance, teams may need to adopt new processes and practices to evolve
shared types collaboratively and in such a way that provides consistency for
current consumers of the data graph. Additionally, while it is an incremental cost,
the infrastructure impact should be explored and verified against reference
architectures during the project.
Because GraphQL can be an organizationally transformative technology, care
should be taken to involve all stakeholders during the planning and implementa-
tion process of a federated data graph. As a result, education plays a key role in
the success of federated implementations, which we will begin to explore in the
next chapter.
Summary
Consolidating GraphQL APIs across the enterprise can help bring a much-needed
measure of consistency to how this technology is implemented for both data
graph contributors and consumers alike. Moving toward a unified, federated
approach allows an organization to scale its GraphQL APIs, obtain new perspec-
tives on its data graph, reuse infrastructure, and enable teams to ship code
faster. When the time is right to move toward a consolidated data graph, enforc-
ing proper separation of concerns in the underlying services will allow teams
to continue to rapidly iterate while adhering to the constraints imposed by the
federated implementation.
In the next chapter, we’ll explore the topic of graph ownership within an organi-
zation as well as how to plan for the successful roll-out of a consolidated graph
architecture with federation.
Graph Champions in the
Enterprise
By Je Hampton
As we explored in the previous chapter, GraphQL adoption patterns can vary
considerably within large organizations. In some instances, GraphQL is identified
by architects and applied as an incremental pattern of API consolidation or
mediation. Alternatively, GraphQL spreads organically among product teams
looking to accelerate their delivery with the safety and support afforded by the
GraphQL specification and community. Regardless of its inception, GraphQL
adoption naturally grows beyond a single team’s ability to reason about what is
being developed in an enterprise.
Apollo’s experience has revealed a consistent need for a specific skill set around
GraphQL in an enterprise. To put it plainly—regardless of the investment
model—GraphQL adoption will eventually generate the need for consolida-
tion once two or more teams invest in a data graph. The enterprise’s Graph
Champions will be instrumental to this consolidation eort.
In this chapter, we’ll further explore the concept of the data-graph-as-a-
product, identify its customers, and explore the skills and products necessary to
consolidate GraphQL within an enterprise. We’ll then scope the responsibilities
of Graph Champions and their role in organizational excellence and we’ll explore
each component of graph championship and data graph administration with key
deliverables and approaches to address consolidation challenges.
The Graph Champion and Graph Administration
The size and shape of the Graph Champion role may be embodied in a few team
members, an architectural review board, or simply a cross-functional guild.
Regardless of its shape, the Graph Champion works to ensure that contributors
and consumers of an organization’s graph get what they need from it.
In short, the Graph Champion views an organization’s data graph as a product
with multiple customers. From that perspective, Graph Champions under-
stand that:
Time-to-market is crucial to customer success
Product quality is necessary for customer trust
Educating customers is a key factor in making a product useful
The data graph must have an ecosystem of tooling that serves all cus-
tomers well
The ergonomics exposed to graph consumers and contributors must be
aligned with industry standards
Four Key Responsibilities of the Graph Champion
At Apollo, we have commonly seen that the core responsibilities of Graph Cham-
pions in an enterprise are divided into four overarching areas:
Governance
Broad initiatives are best served by a team whose focus and value is well-
understood across business units and organizational boundaries:
Graph Champions are recognized as a source of truth for GraphQL within
the organization
With an increased altitude, Graph Champions can be entrusted with the
security of the graph and its access
Teams can rely on Graph Champions to bring clarity to cross-cutting
concerns (for example, “how do I reference an end-user?” or “how do
we handle media, currency, and internationalization consistently in our
products?”)
Graph Champions establish and maintain deprecation and long-term-
support (LTS) schedules based on end user and consumer demand for
graph features
Health
Graph Champions support healthy, consolidated, and federated data graphs that
have these key characteristics:
Healthy adoption of a single, federated graph requires rigor in maintain-
ing a cohesive, easy-to-consume graph surface
Service discovery and product development depend on consistent
documentation, style, and availability
Consumers can serve end users quickly because the federated graph has
consistent naming and logical organization
They do not contain highly-duplicative or deceptively-similar portions of
the graph
They avoid confusion and friction for consumers
Advocate
Graph Champions serve the interests of multiple customers and stakeholders
through support and service by:
Defending the role of the data graph to business leadership
Providing education to new customers in the languages and parlance of
the teams to which they belong
Onboarding and facilitating discussions, RFCs, and architectural reviews
Equip
Successful “digital transformation” strategies often under-prioritize engineering
ergonomics and tooling. A successful Graph Champion equips each customer of
the data graph according to their needs by:
Providing and managing tooling for other teams to use and evolve the graph
Establishing common, polyglot patterns and sound practices for effective
GraphQL use
Supporting delivery systems, including integration, testing, artifact reg-
istries, and IDE tooling
Supporting Customers of the Data Graph
A new product-centric view of the data graph demands a clear understanding of
the graph’s customers. Before moving forward, it’s important to recognize that
a customer-centric view of API service delivery is distinctly different from a
stakeholder-centric view of an ongoing project. While stakeholders may bring
concerns to a project’s lifecycle, customers bring feedback about how well the
product supports them in achieving their goals.
To those ends, we have identified four unique customer personas that data
graphs must support, each with different usage requirements and feedback
perspectives to consider:
1. End User
Uses products built by the organization’s consumers
May use public APIs, cross-platform application experiences, or
integration platforms
2. Consumer
Explores an organization’s graph
Builds products for End Users using existing and new graph features
Is concerned with performance and new product development
3. Contributor
Resolves graph data to underlying systems
Fulfills product-driven requests from Consumers
Collaborates with Consumers through tooling and education
4. Sponsor
Enables CI/CD and provides the delivery platform
Maintains operational excellence
“Last Mile” to the End User
With these personas in mind, we can further contextualize the key responsibili-
ties of Graph Champions from the previous section to gain a holistic view of their
role in supporting a consolidated data graph in an organization:
Managing Consolidation Challenges
As organizations work toward consolidating their GraphQL service delivery
through federation, a common set of challenges often arises as teams align
to new practices for managing and contributing to a unified data graph. As
an extension of supporting graph customers, Graph Champions can help an
enterprise strategically address the following challenges:
Challenge #1: Schema Evolution
GraphQL increases developer velocity and product delivery. Graph Champions
support this ongoing product evolution through education and governance so
that the graph can continue to safely and eectively serve its customers.
Challenge #2: Composition
Dierent teams and business priorities frequently create blurred boundaries of
domain, data, and service ownership. Graph Champions can facilitate domain-
based conflict resolution of overlapping types, fields, and cross-cutting concerns
in support of the broader health of the composed data graph.
Challenge #3: Service Delivery
Organizations delivering a data graph as a product must reason about services
and schemas with dierent rates of change and dierent delivery timelines for
end-user products. Graph Champions can help provide the necessary insight
to configure service boundaries that allow one team to maintain and evolve its
portion of the graph without compromising or otherwise conflicting with the
work of other teams.
Challenge #4: Tooling
GraphQL devops has matured. Service delivery demands observability, perfor-
mance tuning, and client/operation identification. Graph Champions act as
advocates for proper developer and operational ergonomics to support teams in
eective service delivery.
Delivering Organizational Excellence as a Graph
Champion
There are some higher-level questions that can guide the mission and day-to-
day and week-to-week work of the Graph Champions in an enterprise. These
questions map to customer needs and align to the key responsibilities of the role:
Responsibility: Governance
Question: As teams contribute to the graph, what is their obligation to their
downstream consumers?
Approaches: Schema versioning, deprecation schedules

Responsibility: Governance
Question: Who sets which policies with respect to SLA, SLO, LTS?
Approaches: RFCs, DevOps discussions, platform policies

Responsibility: Governance
Question: Is deprecation required per-service?
Approaches: LTS commitments, business product alignment

Responsibility: Governance
Question: Can breaking changes be forced to consumers? Under what
circumstances, and on whose accountability?
Approaches: LTS commitments, business product alignment

Responsibility: Governance
Question: Is support segmented per-platform, in-aggregate, or driven by
longest-client-support?
Approaches: LTS commitments, business product alignment

Responsibility: Advocacy
Question: How do consumers stay informed of changes?
Approaches: Center of excellence portal, internal communications

Responsibility: Advocacy
Question: How do you ensure clear Graph Policies and usage?
Approaches: Defined standards, RFCs, templates, and educational programs

Responsibility: Advocacy
Question: How is a new team onboarded successfully?
Approaches: Center of excellence portal, education

Responsibility: Advocacy
Question: How do we maintain consistency for cross-cutting concerns?
Approaches: Prioritize RFC and Champion participation, governed consensus

Responsibility: Equip
Question: Which languages, services, and platforms will be supported?

Responsibility: Equip
Question: How do we create scalable, high-performing teams?
Approaches: IDE integrations, dev-time tooling, test automation

Responsibility: Equip
Question: How do we automate and enable change in our product(s)?
Approaches: Schema evolution and registry

Responsibility: Health
Question: Can we automate quality in our delivery?
Approaches: Tracing-based automated testing, SDLC alignment with GraphQL
delivery

Responsibility: Health
Question: Can we observe the health of the graph as a product, not as a series
of disjointed services?
Approaches: Integrated observability, data graph-specific tracing
Education To Support Organizational Change
A comprehensive, continuous education plan has proven crucial to Apollo’s
customers’ success in the enterprise. Once one understands the changes to the
organization’s graph, a key early step is to educate the teams and management
who will drive and support the changes. Graph Champions within the organiza-
tion have a responsibility to provide education support. Thankfully, both Apollo
and the wider GraphQL community have a foundational set of resources.
An example educational outline for GraphQL adoption and change should likely
include the following:
GraphQL introduction:
  Facebook
  Reference Implementation
  Purpose
  Principled GraphQL
Summary
Graph Champions provide essential capabilities to an enterprise’s GraphQL
consolidation work. When viewed as a product, a data graph serves many
technical customers and, ultimately, the business’ strategic goals. A
successful consolidation strategy needs leaders that can properly equip data
graph contributors with the tools they need while also advocating for, governing,
and maintaining the overall health of the data graph. Graph Champions are
also well-positioned to help an organization navigate some of the challenges of
consolidation while providing educational support to graph contributors and
consumers alike.
Consolidated Architectures with
Federation
By Mandi Wise
Chapter 1 touched on the high-level architecture of GraphQL APIs that are con-
solidated via federation. By embracing this federated approach, teams can
address the lack of consistency that often emerges from other non-federated
GraphQL architectures while also exposing data within the graph in a demand-
oriented way. In this chapter, we’ll explore federation’s various implementation
details and architectural considerations in greater depth to gain a better under-
standing of how to fully realize its benefits.
A Better Way to Scale Distributed GraphQL Architectures
The first principle outlined in Principled GraphQL is “One Graph,” which states
that an organization should have a single unified graph, instead of multiple
graphs created by each team. While there are other pathways to a distributed
GraphQL architecture, federation is the only option that exposes a single entry
point to a data graph while simultaneously allowing teams to maintain logi-
cal service boundaries between the portions of the graph that they own and
maintain. What’s more, federation offers a declarative interface for seamlessly
composing the independently managed schemas into a single API, unlike other
more brittle, imperative approaches like schema stitching.
We previously discussed that a federated GraphQL architecture consists of two
main components: first, a collection of subgraphs, and second, a gateway
that sits in front of those services and composes their distinct schemas into a
federated data graph. To facilitate schema composition, the gateway and sub-
graphs use spec-compliant features of GraphQL, so any language can implement
federation.
Visit the Apollo documentation to view the full federation specification.
Historically at Apollo, we have seen that federation usually isn’t a starting point
for most enterprises in the early stages of adopting GraphQL. While it can be in
some cases, implementing federation before running GraphQL in production
with a pre-consolidation pattern will likely necessitate large education and
integration eorts for the teams who will be responsible for managing portions
of the data graph. It may also skew the focus of this process heavily toward data
modelling across services instead of product delivery.
More oen, as GraphQLs surface area expands across teams tech stacks,
pain points emerge as these teams attempt to scale within the various pre-
consolidation patterns (discussed in Chapter 1) and perhaps even begin to
experiment with other non-federated approaches to consolidation. Graph Cham-
pions within the organization emerge and drive the teams toward a federated
architecture to unify the disparate portions of the data graph, increase developer
velocity, and scale GraphQL APIs more effectively.
In our experience, these paths are well-worn and converge on a shift toward a
federated data graph. Federation was designed to minimize disruption to
teams that are currently contributing to and consuming existing GraphQL APIs.
When this transition is properly executed, champions can improve the semantics
and expressiveness of the data graph while facilitating improved collaboration
between teams. Federated architectures achieve these ends by adhering to two
core principles: incremental adoption and separation of concerns.
Core Principle #1: Incremental Adoption
Just as any GraphQL schema should be built up incrementally and evolved
smoothly over time (as outlined in detail as one of the “Agility” principles in
Principled GraphQL), a federated GraphQL architecture should be similarly
rolled out through a phased process.
For most teams, a “big bang” rewrite of all existing GraphQL APIs or all portions
of a monolithic GraphQL schema may not be fruitful or even advisable. When
adopting federation, we recommend that an enterprise identify a small but
meaningful piece of their existing GraphQL implementation to isolate as the first
subgraph (or a small number of subgraphs, if required). Taking an incremental
approach to federating the graph will allow you to gradually define service
boundaries, identify appropriate connection points between subgraphs, and
learn as you go.
Additionally, whatever portion of the data graph you scope into an initial sub-
graph should have at least one client that actively continues to consume this
data. From the client’s perspective, the transition to federation can and should
be as seamless as possible, and continued consumption of this data can help
you validate assumptions, test out new federation tooling, and understand
how to best delineate future subgraph boundaries.
Core Principle #2: Separation of Concerns
The second core principle of federation is also one of its main architectural
advantages when consolidating GraphQL in an enterprise. Federation allows
teams to partition the schema of the unified data graph using concern-based
separation rather than type-based separation. This distinction sets federation
apart from other consolidation approaches like schema stitching and allows
teams to collaborate on and contribute to the data graph in a more organic and
productive way.
While dividing a GraphQL schema across teams based on types may initially
make sense, in practice, types will often contain fields that cannot be neatly
encapsulated within a single service’s boundaries. For example, where one team
maintains a products service and another maintains a reviews service, how do
you define the relationship that a list of reviews has to a given product or that a
product has to a specific review in these portions of the schema?
In these instances, foreign key-like fields may find their way into the types, which
reduces the expressiveness of relationships between nodes in the graph and
exposes underlying implementation details instead of serving product use cases.
Alternatively, a non-trivial amount of imperative code would be required to link
the types together in a stitched schema.
Concern-based separation allows each service to define the types and fields
that it is capable of (and should be responsible for) populating from its back-end
data store. The boundaries that encompass these concerns may be related to
team structure, geographic hosting, performance, governance and compliance,
or some combination thereof. Other services may then directly reference and
extend those types in their schemas with new fields backed by their data stores.
Teams maintain their respective portions of the graph with little-to-no friction.
The resulting API is a holistic, client-friendly representation of the enterprise’s
unified data graph.
Apollo Studio provides the necessary tooling to help you understand
references, extensions, and dependencies between graphs. Learn more
about Apollo Studio’s features.
Subgraphs and the Gateway
To set up a federated data graph, we will need at least one federation-ready
subgraph service and a gateway GraphQL API to sit in front of it. Note that in
practice, a federated data graph will typically have multiple subgraphs behind
the gateway as follows:
To create a subgraph with Apollo Server, we would also install the
@apollo/federation package alongside it and use its buildFederatedSchema
function to decorate the service’s schema with the additional federation-specific
types and directives. For example:
const { ApolloServer } = require("apollo-server");
const { buildFederatedSchema } = require("@apollo/federation");

// ...

// Wrap the service's type definitions and resolvers so that the schema
// exposes its federation-specific capabilities to the gateway
const server = new ApolloServer({
  schema: buildFederatedSchema([{ typeDefs, resolvers }])
});

server.listen(4001).then(({ url }) => {
  console.log(`Server ready at ${url}`);
});
The buildFederatedSchema function ensures that the subgraph’s schema
conforms to the Apollo Federation specification and also exposes that schema’s
capabilities to the gateway. In addition to Apollo Server, many third-party li-
braries provide support for Apollo Federation in a variety of languages including
Java, Kotlin, Ruby, and Python.
With a subgraph in place, we can configure a gateway to sit in front of
that service. By creating a new Apollo Server in conjunction with the
@apollo/gateway package, we can declaratively compose the subgraph’s
schema into a federated data graph:
const { ApolloGateway } = require("@apollo/gateway");
const { ApolloServer } = require("apollo-server");
const gateway = new ApolloGateway({
  // Static list of subgraphs whose schemas the gateway fetches and
  // composes at start-up
  serviceList: [
    { name: "accounts", url: "http://localhost:4001" }
  ]
});
const server = new ApolloServer({
gateway,
subscriptions: false,
});
server.listen(4000).then(({ url }) => {
console.log(`Server ready at ${url}`);
});
When the gateway starts up, it uses the URLs provided in the serviceList to
fetch the schema from each subgraph to compose the federated data graph. In
production, we recommend running the gateway in a managed mode with
Apollo Studio (using static configuration files instead of querying service
schemas at start-up), which we’ll explore further later in this chapter.
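As a minimal sketch of that managed mode (the exact environment variables
are an assumption and may vary by gateway version), the gateway is constructed
without a serviceList and instead loads its composed schema configuration
based on credentials supplied through the environment:
const { ApolloGateway } = require("@apollo/gateway");
const { ApolloServer } = require("apollo-server");

// With no serviceList provided, the gateway falls back to managed mode and
// loads a validated, pre-composed schema configuration from Apollo, keyed
// by an API key in the environment (e.g., APOLLO_KEY)
const gateway = new ApolloGateway();

const server = new ApolloServer({ gateway, subscriptions: false });

server.listen(4000).then(({ url }) => {
  console.log(`Server ready at ${url}`);
});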
At this time, subscription operations are not supported with Apollo
Federation, so the subscriptions option must be set to false.
The Apollo team has explored other patterns for serving real-time queries
with a federated GraphQL API, which you can view in this repository.
When a request reaches the gateway-enabled Apollo Server, it will execute the
incoming operation across the subgraphs and then form the overall response.
How that request is optimized and fulfilled across the federated data graph is
determined by a key feature of the gateway known as query planning.
At a high level, query planning works by optimizing for the most time spent in
a single service to reduce the number of network hops. More specifically, the
gateway uses a service-based, depth-first approach to operation execution
across services, unlike the breadth-first approach typically used by monolithic
GraphQL servers.
Customizing Service-Level Execution
Apollo Gateway also exposes a configuration option called buildService that
allows both customization of requests before directing them to a subgraph and
also modification of the responses received from a subgraph service before
delivering those results to a client. This option can be particularly useful when
forwarding auth-related headers from the gateway to the subgraphs or when
customizing headers sent in a query response.
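For instance, a gateway might forward an authorization header to every
subgraph by returning a customized RemoteGraphQLDataSource from the
buildService option. The following is a minimal sketch; the context.authHeader
property is an assumption about how the gateway’s context is populated:
const { ApolloGateway, RemoteGraphQLDataSource } = require("@apollo/gateway");

class AuthenticatedDataSource extends RemoteGraphQLDataSource {
  willSendRequest({ request, context }) {
    // Forward an auth-related header from the gateway to the subgraph
    // (context.authHeader is a hypothetical property set by the server)
    if (context.authHeader) {
      request.http.headers.set("authorization", context.authHeader);
    }
  }
}

const gateway = new ApolloGateway({
  serviceList: [{ name: "accounts", url: "http://localhost:4001" }],
  buildService({ url }) {
    return new AuthenticatedDataSource({ url });
  }
});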
Connecting the Data Graph with Entities
The core building blocks of a federated data graph are known as entities. An
entity is a type that we canonically define in one subgraph’s schema and that
other services can then reference and extend. As per the Apollo Federation specifica-
tion, we define entities in a subgraph’s schema using the @key directive.
The @key directive defines a primary key for the entity, and its fields argu-
ment will contain one or more of the type’s fields. For example:
type User @key(fields: "id") {
id: ID!
name: String
username: String
}
The @key directive may be used to define multiple primary keys for an entity:
type Product @key(fields: "upc") @key(fields: "sku") {
upc: String!
sku: String!
name: String
price: Int
brand: Brand
weight: Int
}
The @key directive also supports compound primary keys for nested fields:
type User @key(fields: "id organization { id }") {
id: ID!
name: String
username: String
organization: Organization!
}
type Organization {
id: ID!
}
Referencing Entities
Aer defining an entity in a schema, other subgraphs can reference that entity
in their schemas. In order for the referencing service’s schema to be valid, it
must define a stub of the entity in its schema. For example, we can reference
a
Product
type defined in one service as the return type corresponding to a
product field on a Review type defined in another service:
type Review @key(fields: "id") {
  id: ID!
  body: String
  product: Product
}

extend type Product @key(fields: "upc") {
  upc: String! @external
}
Note that the GraphQL spec-compliant extend keyword is used before the referenced Product type, indicating that this type was defined in another subgraph. The @key directive indicates that the reviews service will be able to identify a product by its UPC value and therefore connect to a product based on its upc primary key field, but the reviews service does not need to be aware of any other details about a given product. The @external directive is required on the upc field in the Product definition in the reviews service to indicate that the field originates in another service.
Because the reviews service only knows about a product's UPC, it will be unable to resolve all of a Product type's fields. As a result, the reviews service's resolver for the product field will only return a representation of the product with the primary key field value, as follows:
{
  Review: {
    product(review) {
      return { __typename: "Product", upc: review.upc };
    }
  }
}
Resolving References
To resolve any additional fields requested on Product, the gateway will pass that representation to the products service to be fully resolved. To fetch the product object that corresponds to the reference, the products service must implement a reference resolver for the Product type:
{
  Product: {
    __resolveReference(reference) {
      return fetchProductByUPC(reference.upc);
    }
  }
}
With these resolvers in place, the gateway can now successfully coordinate execution of operations across service boundaries, and clients can make GraphQL query requests to a single endpoint in a shape that expresses the natural relationship between products and reviews.
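For instance, assuming the reviews service also exposes a hypothetical review query field, a client could fetch a review along with fields resolved by the products service in a single operation:

query GetReview {
  review(id: "1") {
    body
    product {
      name
      price
    }
  }
}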
Extending Entities
Referencing entities is a key feature of federation, but it's only half of the story. While an entity will be owned by a single subgraph, other services may wish to add additional fields to the entity's type to provide a more holistic representation of the entity in the data graph. Doing so is as simple as adding the additional field to the extended type in a non-originating service. For example, a reviews service's schema may add a reviews field to the extended User type that was originally defined in an accounts service:
extend type User @key(fields: "id") {
  username: String @external
  reviews: [Review]
}
The reviews service must then implement a resolver for the user’s reviews:
{
  User: {
    reviews(user) {
      return fetchReviewsByUsername(user.username);
    }
  }
}
When extending entities, it's important to keep in mind that the entity's originating service will not be aware of the added fields. Additionally, each field in an entity must only be defined once or the gateway will encounter schema composition errors.
Advanced Extensions, Calculated Fields and Optimizations
Extension points within a data graph can also be leveraged for advanced use
cases. In one advanced scenario, an entity may be extended with computed
fields by requiring fields from the entity’s originating service.
For example, a reviews service could add a custom reviewName field for a product by using the @requires directive to specify the fields that it depends on from the originating service. Using the @requires directive makes these fields available to the reviews service when resolving the reviewName field even if they weren't requested by the client in the query operation:
extend type Product @key(fields: "sku") {
  sku: String! @external
  name: String @external
  brand: Brand @external
  reviewName(delimiter: String = " - "): String
    @requires(fields: "name brand")
}
Multiple subgraphs may also resolve a field when data has been denormalized across those services. In this scenario, applying the @provides directive on a field definition that returns an extended type will tell the gateway that certain fields for that entity can be resolved by the extending service too:
extend type User @key(fields: "id") {
  username: String @external
  reviews: [Review]
}
type Review @key(fields: "id") {
  id: ID!
  body: String
  author: User @provides(fields: "username")
  product: Product
}
The @provides directive helps to optimize how data is fetched by potentially eliminating unnecessary calls to additional subgraphs. In the above example, the reviews service is capable of resolving an author's username, so a request to the accounts service may be avoided if no additional data is required about the user.
This directive can be a useful (but optional) optimization that helps support the gateway's query planner in determining how to execute a query across as few services as possible, but its usage comes with a few important caveats:

- The subgraph that extends the entity must define a resolver for any field to which it applies the @provides directive
- There is no guarantee as to which service will ultimately resolve the field in the query plan
- The fields argument of @provides does not support compound fields
Extending Query and Mutation Types
As a final note on type extensions, when defining queries and mutations in a subgraph's schema we also add the extend keyword in front of the Query and Mutation types. Because these types will originate at the gateway level of the API, all subgraphs should extend these types with any additional operations. For example, type Query would be prefixed by the extend keyword in the accounts service as follows:
extend type Query {
  me: User
}
Defining Shared Types and Custom Directives
Value Types
In some instances, subgraphs may need to share ownership of a type rather than turning it into an entity and assigning it to a particular service. For these cases, Apollo Federation provides support for shared value types, including Scalars, Objects, Interfaces, Enums, Unions, and Inputs. When subgraphs share value types, those types must be identical in name and contents; otherwise, composition errors will occur.
Please see the Apollo Federation documentation for detailed instructions
on sharing types across subgraphs.
Custom Directives
Apollo Gateway provides support for both type system directives and executable
directives. Type system directives are applied directly to a subgraph’s schema
while executable directives are applied in operations sent from a client.
To provide support for type system directives, Apollo Gateway effectively ignores them by removing all of their definitions and uses from the final composed schema. The definitions and uses of these custom directives remain intact in the subgraph's schema and are processed at that level only.
Executable directives, on the other hand, are treated much like shared value types. These directives must be defined in the schemas of all subgraphs with the same locations, arguments, and argument types, or else composition errors will occur. Correspondingly, subgraphs should also use the same logic to handle executable directives to avoid ambiguity for the clients that apply those directives to operations.
See the Apollo Federation documentation to read more about handling
directives with subgraphs.
Managing Cross-Cutting Concerns
Whether sharing value types or executable directives across subgraphs, it's always important to consider the long-term implications of introducing cross-cutting concerns that may impede teams' abilities to manage and iterate their portions of the data graph. At Apollo, we've seen enterprises introduce measures into CI/CD pipelines to help manage composition errors as they occur when one team introduces a change to a shared value type, but be sure to evaluate the complexity that each cross-cutting schema concern adds to your deployment process before doing so.
Managed Federation
In the previous examples, we have seen how to run a federated data graph using a list of service URLs. As a best practice, Apollo Gateway can also run in a managed federation mode and use Apollo Studio as the source of truth for each subgraph's schema. With managed federation, the gateway is no longer responsible for fetching and composing schemas from the subgraph services. Instead, each service pushes its schema to a registry, and upon composition, Apollo Studio updates a dedicated configuration file for the graph in Google Cloud Storage. The gateway then regularly polls Apollo Studio for updates to the data graph's configuration.
Managed federation supports team collaboration across a distributed GraphQL
architecture by allowing each team to safely validate and deploy their por-
tions of the data graph. A managed approach to federation also provides an
enterprise with critical observability features to monitor changes in data graph
performance via field-level tracing. We will explore managed federation in-depth
in relation to graph administration best practices in a later chapter.
Summary
In this chapter, we explored the features and benefits of a federated schema and how they may be realized using Apollo libraries. Federation is underpinned by the principles of incremental adoption and separation of concerns. By adhering to these principles, teams within an enterprise can work toward a consolidated GraphQL architecture along a minimally-disruptive migration path.
Federation enables teams to independently, yet collaboratively, manage portions of the single, unified data graph. Entities are the key feature of a federated data graph: they provide the extension points among subgraphs and power that collaborative work.
With an understanding of the basic mechanics of federation in place, in the next
chapter, we’ll explore schema design best practices with special consideration
for federated data graphs.
Federated Schema Design Best
Practices
By Mandi Wise
GraphQL is a relatively new technology, but from its rapid and widespread adoption has emerged a host of common schema design best practices—both from the enterprises that use it at scale every day, as well as the broader developer community. The majority of best practices that apply to non-federated GraphQL schema design also apply when designing service schemas within a federated data graph. However, federated schema design rewards some additional best practices when extracting portions of a data graph into subgraphs and determining what extension points to expose between service boundaries.
As we saw in the previous chapter, entities are the core building blocks of a
federated data graph, so the adoption of any schema design best practice
must be approached with the unique role of entities in mind. A successful
federated schema design process should begin by thinking about what the initial
entity types will be and how they will be referenced, extended, and leveraged
throughout the graph to help preserve the separation of concerns between
services—both today and as the graph evolves in the future.
When migrating from a client-only or monolithic GraphQL pattern, that work begins by identifying what entities will be exposed in the first subgraph extracted from the larger schema. When migrating from an architecture consisting of BFF-based GraphQL APIs or any other architecture of multiple overlapping graphs, the work of identifying entities (and determining new service boundaries, in general) may be a bit more complex and involve some degree of negotiation with respect to type ownership, as well as a migration process to help account for any breaking changes that may result for clients.
Whatever your architectural starting point, Apollo Federation was designed to allow the work of identifying entities and defining subgraph boundaries to be done in an incremental, non-disruptive fashion. Beginning to identify these entities is also the essential prerequisite for adopting the other schema design best practices that will follow.
In this chapter, we’ll explore some proven best practices for GraphQL schema de-
sign with a specific lens on how these practices relate to federated data graphs,
as well as any special considerations and trade-os to keep in mind when design-
ing and evolving schemas across a distributed GraphQL architecture.
Best Practice #1: Design Schemas in a Demand-Oriented,
Abstract Way
The shi to a unified data graph is almost invariably motivated in part by a
desire to simplify how clients access the data they need from a GraphQL API
backed by a distributed service architecture. And while GraphQL oers the
promise of taking a client-driven approach to API design and development, it
provides no inherent guarantee that any given schema will lend itself to real
client use cases.
To best support the client applications that consume data from our federated graph, we must intentionally design schemas in an abstract, demand-oriented way. This concept is formalized as one of the "Agility" principles in Principled GraphQL, stating that a schema should not be tightly coupled to any particular client, nor should it expose implementation details of any particular service.
Prioritize Client Needs, But Not Just One Client’s Needs
Creating a schema that is simultaneously demand-oriented while avoiding the over-prioritization of a single client's needs requires some upfront work; specifically, client teams should be consulted early on in the API design process. From a data-graph-as-a-product perspective, this is an essential form of foundational research to ensure the product satisfies user needs. This research should also continue on an ongoing basis as the data graph and client requirements evolve.
Client teams should drive these discussions wherever possible. In practice, that means instead of providing a draft schema to a client team and asking for feedback, it's better to work through exercises where you ask client team members to explain exactly what data is needed to render particular views and have them suggest what the ideal shape of that data would be. It is then the task of the schema designers to aggregate this feedback and reconcile it against the broader product experiences that you want to drive via your data graph.
When thinking about driving product experiences via the data graph, keep
in mind that the overall schema of the data graph is a representation
of your product and each federated schema is the representation of a
domain boundary within the product. This is why Apollo Federation excels
at supporting omni-channel product strategies—the data graph can be
designed in a demand-oriented way that’s based on product functions
and the clients that query the graph can, in turn, evolve along with those
functions.
Keep Service Implementation Details Out of the Schema
Client team consultation can also help you avoid another schema design pitfall,
which is allowing the schema to be unduly influenced by backing services or
data sources.
Other approaches to GraphQL consolidation can make it challenging to side-step
this concern, but federation allows you to design your schema in a way that
expresses the natural relationships between the types in the graph. For example,
in a distributed GraphQL architecture without federation, foreign key-like fields
may be necessary for a subgraph’s schema to join the nodes of your data graph
together:
type Review {
  id: ID!
  productID: ID
}
With federation, however, a reviews service’s schema can represent a true subset
of the complete data graph:
extend type Product @key(fields: "id") {
  id: ID! @external
}

type Review {
  id: ID!
  product: Product
}
As another common example of exposed implementation details, here we can
see how an underlying REST API data source could influence the names of
mutations in a service’s schema:
extend type Mutation {
  postProduct(name: String!, description: String): Product
  patchProduct(
    id: ID!,
    name: String,
    description: String
  ): Product
}
A better approach would look like this:
extend type Mutation {
  createProduct(name: String!, description: String): Product
  updateProductName(id: ID!, name: String!): Product
  updateProductDescription(
    id: ID!,
    description: String!
  ): Product
}
The revised Mutation fields better describe what is happening from a client's perspective and offer a finer-grained approach to handling updates to a product's name and description values where those updates need to be handled independently in a client application. Using two separate update mutations also helps disambiguate what would happen if a client sent the patchProduct mutation with no name or description arguments (because the mutation could handle updating one value or the other, but does not require both for any given operation) and saves the subgraph from having to handle these errors at runtime. We'll speak more on the use cases for finer-grained mutations in the next section.
As a final, related point on hiding implementation details in the schema, we
should also avoid exposing fields in a schema that clients don’t have any reason
to use. If a schema is intentionally and iteratively developed based on the
aggregation of product functions and client use cases, then this issue can easily
be avoided.
However, when tools are used to auto-generate a GraphQL schema based on
backing data sources, then you will almost invariably end up with fields in your
schema that clients don’t need but may develop unintended use cases for in the
future, which will make your schema harder to evolve over the longer term. This
is why, at Apollo, we generally discourage the use of schema auto-generation
tools—they lead you in precisely the opposite direction of taking a client-first
approach to schema design.
Best Practice #2: Prioritize Schema Expressiveness
A good GraphQL schema will convey meaning about the underlying nodes in an
enterprise’s data graph, as well as the relationships between those nodes. There
are multiple dimensions to schema expressiveness—many of which overlap
with other schema design best practices—but here we’ll focus specifically on
standardizing naming and formatting conventions across services, designing
purposeful fields in a schema, and augmenting an inherently expressive schema
with thorough documentation directly in its SDL to maximize usability.
Standardize Naming and Formatting Conventions
There are only two hard things in Computer Science: cache invalidation and
naming things.
Phil Karlton
Arguably, the "naming things" aspect of this observation grows even more challenging when trying to name things consistently across a distributed GraphQL architecture supported by many teams! (Same goes for caching, but we'll cover that topic separately in a later chapter.)
Being consistent about how you name things may go without saying, but it's even more important when composing schemas from multiple subgraphs into a single federated GraphQL API. The "One Graph" principle that drives federation is meant to help improve consistency for clients, and that consistency should include naming conventions. For example, having a users query defined in one service and a getProducts query defined in another doesn't provide a very consistent or predictable experience for data graph consumers. Similar to fields, type naming and name-spacing conventions should also be standardized across the graph.
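As a brief sketch, a standardized set of root fields might drop the get prefix everywhere and use consistent plural nouns (the field names here are purely illustrative):

extend type Query {
  users: [User]
  products: [Product] # rather than `getProducts`
}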
Additionally, when an enterprise already has multiple GraphQL APIs in use that
will be rolled into the federated data graph, the names of the types within those
existing schemas may collide. In these instances, a decision must be made about
whether those colliding types should become an entity within the graph or a
value type, or if some kind of name-spaced approach is warranted.
The outset of a migration project to a federated data graph is the right time to take stock of what naming conventions are currently used in existing GraphQL schemas within the enterprise, determine what conventions will become standardized, onboard teams to those conventions, and plan for deprecations and rollovers as needed. Additionally, there should also be a thorough review process in place as the graph evolves to ensure that new fields, types, and services adhere to these conventions.
A Brief Note on Pagination Conventions

Another important area of standardization when consolidating GraphQL APIs across an enterprise is providing clients a consistent experience for paginating field results across services. On this topic, we offer these high-level guidelines:

- Add pagination when it's necessary. Don't add pagination arguments to a field when a basic list will suffice.
- When pagination is warranted, leverage your consolidation efforts as an opportunity to standardize type system elements that support pagination (for example, arguments and pagination-related object types and enums).
- Standardizing pagination across your data graph doesn't mean preferring one style of pagination over another (for example, offset-based or cursor-based pagination). Choose the right tool for the job, but ensure that each style of pagination is implemented consistently across services.
- Your internal data graph governance group should actively enforce pagination standards across your subgraphs to maintain consistency for clients.
Design Fields Around Specific Use Cases
As mentioned previously, a GraphQL schema should be designed around client
use cases, and ideally, the fields that are added to a schema to support those
use cases will be single-purpose. In practice, this means having more specific,
finer-grained mutations and queries.
While it’s still important to ensure that we don’t expose unneeded fields in a
schema, that doesn’t mean we should avoid adding additional queries and
mutations to a schema if they are driven by client needs. For example, having
two
userById
and
userByUsername
queries may be a better choice than a
single
user
query that accepts either a name or ID as a nullable argument.
Because the more generalized
user
query could fetch a user by name or ID it
necessitates nullable arguments, which creates ambiguity for the client about
what will happen if the query is submitted with neither of those arguments
included.
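A minimal sketch of the finer-grained approach might look like this (the field names are illustrative):

extend type Query {
  # Each query takes a single, non-null argument, so there is no
  # ambiguity about what the client must provide.
  userById(id: ID!): User
  userByUsername(username: String!): User
}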
Convoluted input types can also complicate the observability story for your data graph. If an input is used to contain query arguments, then each additional field added to the input can make it increasingly opaque as to what field may be the root cause of a particularly slow query when viewing an operation's traces in your observability tools.
Taking a finer-grained approach also applies to update-related mutations. Rather than having a single updateAccount mutation to rule them all, use more purpose-driven mutations when these values are updated independently by clients. For example, consider this series of mutations used to update a user's account information:
type Mutation {
  addSecondaryEmail(email: String!): Void
  changeBillingAddress(address: AddressInput!): Account
  updateFullName(name: String!): Void
}
If any of these values needed to be updated simultaneously or not at all, then it
would make sense to bundle the updates into a coarser-grained mutation. But
with this caveat aside, opting for finer-grained mutations helps avoid the same
pitfalls as finer-grained queries do and saves you from doing extra validation
work at runtime to determine that the submitted arguments will lead to a logical
outcome for a mutation.
As a final note on field use cases, fields within a schema can be leveraged as an entry point to what authenticated users can do within that schema. A common pattern is to add a viewer or me query to an API, and the GitHub GraphQL API provides a notable example of this pattern:
type Query {
  # ...
  "The currently authenticated user."
  viewer: User!
}
Document Types, Fields, and Arguments
A well-documented schema isn't just a nicety in GraphQL. The imperative to document the various aspects of a schema is codified in the GraphQL specification. The specification states that documentation is a "first-class feature of GraphQL type systems" and goes further to say that all types, fields, arguments, and other definitions that can be described should include a description unless they are self-descriptive.
So while in many regards a well-designed, expressive schema will be self-documenting, using the SDL-supported description syntax to fully describe how the types, fields, and arguments in an API behave will provide an extra measure of transparency for data graph consumers. For example:
extend type Query {
  """
  Fetch a paginated list of products based on a filter.
  """
  products(
    "How many products to retrieve per page."
    first: Int = 5
    "Begin paginating results after a product ID."
    after: Int = 0
    """
    Filter products based on a type.
    Products with any type are returned by default.
    """
    type: ProductType
  ): ProductConnection
}
In the example above, we see how a thoroughly described products query may look when the query and each of its arguments are documented. And just as with naming conventions, it's important to establish standards for documentation across a federated data graph from its inception to ensure consistency for API consumers. Similarly, there should also be governance measures in place to ensure that documentation standards are adhered to as the schema continues to evolve.
Note that when documenting subgraphs' schema files, we can't add description strings above extended types (including extended Query and Mutation types) because the GraphQL specification states that only type definitions can have descriptions, not type extensions.
Best Practice #3: Make Intentional Choices About
Nullability
All fields in GraphQL are nullable by default, and it's often best to err on the side of embracing that default behavior as new fields are initially added to a schema.
However, where warranted, non-null fields and arguments (denoted with a trailing !) are an important mechanism that can help improve the expressiveness and predictability of a schema. Non-null fields can also be a win for clients because they will know exactly where to expect values to be returned when handling query responses. Non-null fields and arguments do, of course, come with trade-offs, and it's important to weigh the implications of each choice you make about nullability for every type, field, and argument in a schema.
Plan for Backward Compatibility
Including non-null fields and arguments in a schema makes that schema harder to evolve when a client expects a previously non-null field's value to be provided in a response. For example, if a non-null email field on a User type is converted to a nullable field, will the clients that use that field be prepared to handle this potentially null value after the schema is updated? Similarly, if the schema changes in such a way that a client is suddenly expected to send a previously nullable argument with a request, then this may also result in a breaking change.
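As a sketch of the first scenario, consider a User type where email was originally declared non-null:

type User {
  id: ID!
  # Changing this field from `String!` to `String` is a breaking change
  # for any client that doesn't already null-check the value.
  email: String!
}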
While it’s important to make informed decisions about nullability when ini-
tially designing a service’s schema, you will inevitably be faced with making a
breaking change of this nature as a schema naturally evolves. When this hap-
pens, GraphQL observability tools that give you insight into how those fields are
used currently in dierent operations and across dierent clients. This visibility
will help you identify issues proactively and allow you to communicate these
changes to impacted clients in advance so they can avoid unexpected errors.
Minimize Nullable Arguments and Input Fields
As mentioned previously, converting a nullable argument or input field for
a mutation to non-null may lead to breaking changes for clients. As a result,
specifying non-null arguments and input fields on mutations can help you avoid
this breaking change scenario in the future. Doing so, however, will typically
require that you design finer-grained mutations and avoid using everything but
the kitchen sink” input types as arguments that are filled with nullable fields to
account for all possible use cases.
This approach also enhances the overall expressiveness of the schema and
provides more transparency in your observability tools about how arguments
impact overall performance (this is especially true for queries). What’s more, it
also shis the burden away from data graph consumers to guess exactly which
fields need to be included in mutation to achieve their desired result.
Tip: Use Default Values for Nullable Arguments and Input Fields

Providing a default value for a nullable argument or input field will also improve the overall expressiveness of a schema by making default behaviors more transparent. In our previous products query example, we can improve the type argument by adding an ALL value to its corresponding ProductType enum and setting the default value to ALL. As a result, we no longer need to provide specific directions about this behavior in the argument's description string:
extend type Query {
  "Fetch a paginated list of products based on a filter."
  products(
    # ...
    "Filter products based on a type."
    type: ProductType = ALL
  ): ProductConnection
}
Weigh the Implications of Non-Null Entity References
When adding fields to a schema that are resolved with data from third-party
data sources, the conventional advice is to make these fields nullable given the
potential for the request to fail or for the data source to make breaking changes
without warning. Federated data graphs add an interesting dimension to these
considerations given that many of the entities in the graph may be backed by
data sources that are not in a given service’s immediate control.
The matter of whether you should make referenced entities nullable in a subgraph's schema will depend on your enterprise's existing architecture and will likely need to be assessed on a case-by-case basis. Keep in mind the implication that nullability has on error handling—specifically, when a value cannot be resolved for a non-null field, the null result bubbles up to the nearest nullable parent—and consider whether it's better to have a partial result or no result at all if a request for an entity fails.
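To illustrate the bubbling behavior, suppose a hypothetical review query returns a Review type whose product field is non-null (product: Product!). If the products service fails to resolve the product, the null result propagates past the non-null field up to the nearest nullable parent, discarding the rest of the review in the response:

{
  "data": {
    "review": null
  },
  "errors": [
    {
      "message": "...",
      "path": ["review", "product"]
    }
  ]
}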
Best Practice #4: Use Abstract Type Judiciously
The GraphQL specification currently offers two abstract types in the type system—interfaces and unions. Both interfaces and unions are powerful tools to express relationships between types in a schema. However, when adding interfaces and unions to a schema—and in particular, a federated schema—it's important to do so with a clear-eyed understanding of the longer-term implications of managing these types. To do so, we must first ensure that we're using interfaces and unions in semantically purposeful ways. Second, we must help prepare client developers to handle changes to these types as the schema evolves.
Create Semantically Meaningful Interfaces
A common misuse of interfaces is to use them simply to express a contract for shared fields between types. While this is certainly an aspect of their intended use, they should only be used when you need to return an object or a set of objects from a field and those objects may represent a variety of different types with some fields in common. For example:
interface Pet {
  breed: String
}

type Cat implements Pet {
  breed: String
  extraversionScore: Int
}

type Dog implements Pet {
  breed: String
  activityLevelScore: Int
}

type Query {
  familyPets: [Pet]
}
In this schema, the familyPets query returns a list of cats and dogs, with a guarantee that the breed field will be implemented on both the Cat and Dog types. A client can then query for these types' shared fields as usual, or use inline fragments for the Cat and Dog types to fetch their type-specific fields:
query GetFamilyPets {
  familyPets {
    breed
    ... on Cat {
      extraversionScore
    }
    ... on Dog {
      activityLevelScore
    }
  }
}
If there was no use case for querying both cats and dogs simultaneously to return both types from a single operation, then the Pet interface wouldn't serve any notable purpose in this schema. Instead, it would add overhead to schema maintenance by requiring that the Cat and Dog types continue to adhere to this interface as they evolve, but with no functional reason as to why they should continue conforming to Pet.
What’s more, the overhead for maintaining both interface and union types is
amplified when dealing with federated data graphs. Where interfaces and unions
are shared as value types across schemas, they become cross-cutting concerns
(which we’ll address further in a later section). Further, interfaces may also be
entities in a federated data graph, so challenging decisions may need to be
made about which service ultimately “owns interface entities and whether the
services that implement them in a schema can adequately resolve all the types
that belong to that interface.
While interfaces are abstract types, they should ultimately represent something concrete about the relationship they codify in a schema, and they should indicate some shared behavior among the types that implement them. Satisfying this baseline requirement can help guide your decisions about where to use interfaces selectively in your federated schemas.
Help Clients Prepare for Breaking Changes
Interfaces and unions should be added to a schema and subsequently evolved
with careful consideration because subtle breaking changes can occur for the
API consumers that rely on them. For example, client applications may not be
prepared to handle new types as they are added to interfaces and unions, which
may lead to unexpected behavior in existing operations. From our previous
example, a new Goldfish type may implement the Pet interface as follows:
type Goldfish implements Pet {
  breed: String
  lifespan: Int
}
The previous GetFamilyPets query may now return results that include goldfish, but the client's user interface may have been tailored to only handle cats and dogs in the results. And without a new inline fragment in the operation document to handle the Goldfish type, there will be no way to retrieve its lifespan field value.
As such, it’s important to communicate these changes to client developers in
advance and it’s also incumbent on client developers to treat fields that return
abstract types with extra care to guard against potential breaking changes.
Best Practice #5: Leverage SDL and Tooling to Manage
Deprecations
Your internal data graph governance group should outline an enterprise-wide field rollover strategy to gracefully handle type and field deprecations throughout the unified graph. We'll discuss graph administration and governance concerns in-depth in the next chapter, so in this section, we'll focus on more tactical considerations when deprecating fields in a GraphQL schema.
GraphQL APIs can be versioned, but at Apollo, we have seen that it is far more common for enterprises to leverage GraphQL's inherently evolutionary nature and iterate their APIs on a rapid and incremental basis. Doing so, however, requires clear communication with API consumers, especially when field deprecations are required.
Use the @deprecated Type System Directive
As a first step, the @deprecated directive, which is defined in the GraphQL specification, should be applied when deprecating fields or enum values in a schema. Its single reason argument can also provide the API consumer some direction about what to do instead of using that field or enum value. For instance, in our earlier products example we can indicate that a related topProducts query has been deprecated as follows:
extend type Query {
  """
  Fetch a simple list of products with an offset
  """
  topProducts(
    "How many products to retrieve per page."
    first: Int = 5
  ): [Product] @deprecated(reason: "Use `products` instead.")
  """
  Fetch a paginated list of products based on a filter type.
  """
  products(
    "How many products to retrieve per page."
    first: Int = 5
    "Begin paginating results after a product ID."
    after: Int = 0
    "Filter products based on a type."
    type: ProductType = LATEST
  ): ProductConnection
}
Use Operation Traces to Assess When It’s Safe to Remove Fields
Aer a services schema has been updated with new
@deprecated
directives,
it’s important to communicate the deprecations beyond the SDL as well. Using
a dedicated Slack channel or team meetings may serve as appropriate com-
munication channels for such notices, and they should be delivered with any
additional migration instructions for client teams.
At this point, a crucial question still remains: "When will it be safe to remove the deprecated field?" To answer this question with certainty that you won't cause any breaking changes to client applications, you must lean on your observability tooling. Specifically, tracing data can provide insight into what clients may still be using the deprecated fields so appropriate follow-ups can be actioned. GraphQL observability tools such as Apollo Studio will check any changes pushed for registered schemas against a recent window of operation tracing data to ensure that a deprecated field rollover can be completed without causing any breaking changes to existing clients.
Best Practice #6: Handle Errors in a Client-Friendly Way
Given that GraphQL offers a demand-oriented approach to building APIs, it's important to take a client-centric approach to handling errors when something goes wrong during operation execution as well. There are currently two main approaches for handling and sending errors to clients that result from GraphQL operations. The first is to take advantage of the error-related behaviors outlined by the GraphQL specification. The second option is to take an "errors as data" approach and codify a range of possible response states directly in the schema. Choosing the correct approach for handling a particular error will depend largely on the type of error that was encountered, and, as always, should be informed by real-world client use cases.
Use the Built-in Errors List When Things Really Do Go Wrong
The GraphQL specification outlines certain error handling procedures in responses, so we'll explore how this default behavior works first. GraphQL has a unique feature in that it allows you to send back both data and errors in the same response (on the data and errors keys, respectively). According to the GraphQL specification, if errors occur during the execution of a GraphQL operation, then they will be added to the list of errors in the response along with any partial data that may be safely returned.
At a minimum, a single error map in the errors list will contain a message key with a description of the error, but it may also contain location and path keys if the error can be attributed to a specific point in the operation document. For example, for the following query operation:
query GetUserByLogin {
  user(login: "incorrect_login") {
    name
  }
}
The data key will contain a null user, and the errors key in the response can be structured with a single error map as follows:
{
  "data": {
    "user": null
  },
  "errors": [
    {
      "type": "NOT_FOUND",
      "path": [
        "user"
      ],
      "locations": [
        {
          "line": 7,
          "column": 3
        }
      ],
      "message": "Could not resolve to a User with the login of 'incorrect_login'."
    }
  ]
}
Many GraphQL servers (including Apollo Server) will provide additional details about errors inside the extensions key for each error in the errors list. For instance, Apollo Server provides a stacktrace key nested inside of the exception key of the extensions map.
The information inside of extensions can be further augmented by Apollo Server by using one of its predefined errors, including AuthenticationError, ForbiddenError, UserInputError, and a generic ApolloError.
Throwing one of these errors from a resolver function will add a human-readable string to the code key in the extensions map. For example, an AuthenticationError sets the code to UNAUTHENTICATED, which can signal to the client that a user needs to re-authenticate:
{
  "data": {
    "me": null
  },
  "errors": [
    {
      "extensions": {
        "code": "UNAUTHENTICATED",
        "stacktrace": [...]
      }
    }
  ]
}
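As a sketch of how such an error might be thrown, a resolver for the me query could check a hypothetical context.user value (assumed to be set during context creation) and throw an AuthenticationError when it's missing:

const { AuthenticationError } = require("apollo-server");

const resolvers = {
  Query: {
    me(parent, args, context) {
      // `context.user` is a hypothetical value set during context creation.
      if (!context.user) {
        throw new AuthenticationError("You must be logged in.");
      }
      return context.user;
    },
  },
};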
As a best practice, stack traces should be removed from an error's extensions key in production. This can be done by setting the debug option to false in the Apollo Server constructor, or by setting the NODE_ENV environment variable to production or test.
Please see the Apollo Server documentation for more information on
handling, masking, and logging errors in production environments.
The detailed error response that is required by the GraphQL specification and further enhanced by Apollo Server is sufficient to handle any error scenario that arises during operation execution. However, these top-level errors that reside in the response's errors key are intended for exceptional circumstances and—even with additional, human-readable details in an extensions key—may not provide optimal ergonomics for client developers when rendering error-related user interface elements.
For these reasons, the default approach to handling errors is best suited for things that are truly errors. In other words, they should be used when something happened that ordinarily wouldn't happen during the execution of a GraphQL operation. These kinds of errors could include an unavailable service, an exceeded query cost limit, or a syntax error that occurs during development. They are exceptional occurrences outside of the API domain and are typically also outside a client application's end user's control.
Represent Errors as Data to Communicate Other Possible States
Sometimes errors arise during the execution of a GraphQL operation that a user may recover from or reasonably ignore. For example, a new user may trigger a mutation to create a new account but send a username argument that already exists. In other scenarios, certain errors may occur due to situational factors, such as data being unavailable when users are located in some countries.
In these instances, an errors as data approach is often preferable to returning top-level errors in a response. Taking this approach means errors are coded directly into the GraphQL schema, and information about those errors will be returned under the data key instead of pushed onto the errors list in the response. As a result, what's returned in the data key for a GraphQL server response may contain data related to the happy path of an operation or it may contain data related to any number of unhappy path states.
There are dierent ways to describe these happy and unhappy paths in a
schema, but one of the most common is to use unions to represent collec-
tions of possible related states that may result from a given operation. Take the
following example that includes a
User
type defined in an accounts service and
extended to include a suggestedProducts field in a products service:
# Accounts Service
type User @key(fields: "id") {
  id: ID!
  firstName: String
  lastName: String
  description: String
}

extend type Query {
  me: User
}
# Products Service
type Product @key(fields: "sku") {
  sku: String!
  name: String
  price: Float
}

type ProductRemovedError {
  reason: String
  similarProducts: [Product]
}

union ProductResult = Product | ProductRemovedError

extend type User @key(fields: "id") {
  id: ID! @external
  suggestedProducts: [ProductResult]
}

extend type Query {
  products: [Product]
}
Above, the ProductResult type is a union of the two possible states of a product: it is either available or it has been removed. In the case that a product has been removed, related products can be presented to users in its place. A query for suggested products for a currently logged-in user would be structured as follows:
query GetSuggestedProductsForUser {
  me {
    suggestedProducts {
      __typename
      ... on Product {
        name
        sku
      }
      ... on ProductRemovedError {
        reason
        similarProducts {
          name
          sku
        }
      }
    }
  }
}
Because we are querying a union type, an inline fragment is used to handle the fields relevant to each union member. The __typename field has been added to the operation document to help the client conditionally render elements in the user interface based on the returned type.
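On the server side, the products service must tell the gateway which member of the union a resolved object represents. A sketch of those resolvers might look like the following, where fetchSuggestedProducts is a hypothetical helper that returns a mix of available and removed products:

const resolvers = {
  User: {
    suggestedProducts(user) {
      // Hypothetical data access helper.
      return fetchSuggestedProducts(user.id);
    },
  },
  ProductResult: {
    // Distinguish union members by the shape of the resolved object.
    __resolveType(result) {
      return result.reason ? "ProductRemovedError" : "Product";
    },
  },
};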
Through this example, we can begin to see how errors as data help support data
graph consumers in several compelling ways. First, creating a union of happy
and unhappy paths provides type safety for these potential states, which in turn
makes operation outcomes more predictable for clients and allows you to evolve
those states more transparently as a part of the schema.
Second, it also allows you to tailor error data to client use cases. Correspondingly, the requirement to tailor a user experience around error handling is a good indicator that those errors belong in the schema. And conversely, when a data graph is intended to be used predominantly by third parties, it would be impossible to customize error data to suit all possible user interfaces, so top-level errors may be a better option in these instances.
Of course, there’s no such thing as an error-handling free lunch. Just as with
any union type, clients must be informed of and prepared to handle new result
types as they are added to the union (also reinforcing why this approach can be
problematic when unknown third parties may query your data graph).
Further, the key to implementing errors as data successfully in a schema is to do so in a way that supports client developers in handling expected errors, rather than overwhelming them with edge-case possibilities or confusing them due to a lack of consistency in adoption across the data graph. An enterprise's data graph governance group must play a key role in setting and enforcing standards for how both top-level and schema-based errors will be handled across teams.
For an in-depth exploration of the errors as data approach, please see the
200 OK! Error Handling in GraphQL talk by Sasha Solomon from GraphQL
Summit 2020.
Best Practice #7: Manage Cross-Cutting Concerns
Carefully
In the previous chapter, we discussed how sharing value types (scalars, objects, interfaces, enums, unions, and inputs) and executable directives across subgraphs' schemas leads to cross-cutting concerns. As a general rule, where subgraphs share value types, those types must be identical in name, contents, and logic, or composition errors will occur. Similarly, executable directives must be defined consistently in the schemas of all subgraphs using the same locations, arguments, and argument types, or composition errors will also result.
In some instances, it will make sense for subgraphs to share ownership of certain types instead of assigning that type to one service and exposing it as an entity. For example, when a GraphQL API supports Relay-style pagination, it may be necessary to share an identical PageInfo object type across multiple services that require these pagination-related fields:
type PageInfo {
  endCursor: String
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
}
It wouldn’t make sense to expose
PageInfo
as an entity for several reasons,
not the least of which is that there is no obvious primary key that identifies
these objects. Further, the fields in this object type will be relatively stable
across subgraphs and over time, so the likelihood of complications arising from
evolving this type is minimal.
There’s no simple formula for evaluating the overhead added by a single value
type or executable directive in a federated GraphQL API. While they may impact
teams’ abilities to manage and iterate their portions of the data graph because
services may no longer be independently deployable, the long-term cost may
be minimal if the types or directives rarely change. As a best practice, your data
graph governance group should establish internal guidelines about when to
introduce and how to work with value types and executable directives in the
data graph, and drive adoption of new measures in your CI/CD pipeline to help
manage the composition errors may result from these cross-cutting concerns
during deployment.
Summary
In this chapter, we covered a variety of best practices for designing schemas
within a federated data graph. We explored what it means to design a schema in
a demand-oriented, abstract way with an eye for expressiveness. We also saw
how nullability and abstract types can help improve the expressiveness and the
usability of a schema when used strategically.
Next, we saw how the @deprecated directive and supporting tooling can help teams within an enterprise safely evolve schemas, and how using both top-level errors and unions to express a range of possible result states can improve the error handling experience for clients. Finally, we revisited the importance of measuring the cost of adding cross-cutting concerns to a federated data graph.
In the next chapter, we’ll move on from focusing exclusively on schema-related
concerns to what best practices for overall data graph administration look like in
an enterprise.
Graph Administration in the
Enterprise
By Michael Watson and Mandi Wise
GraphQL was designed to allow your API to evolve continuously in response to new product requirements and client developer feedback, and without the overhead of versioning. In practice, such evolution requires insight into how your GraphQL API is used so that types, fields, and arguments may be safely modified without causing breaking changes for clients. With a federated data graph, extra consideration is needed to ensure that when one subgraph changes its schema, those updates won't cause unexpected breaking changes for the other subgraphs that rely on its entities.
A well-managed data graph will help drive adoption and compound the network effects of the data graph in an enterprise. To realize this potential, there needs to be an iterative and repeatable process in place for managing and evolving the federated data graph across teams once the graph has been deployed to production. That means that each team that owns a subgraph service needs tooling in place to support its ongoing contributions to the overall graph. These teams need to be assured that the changes they make won't break existing operations sent by clients and that they can evolve their portion of the schema without breaking the data graph's composition.
Additionally, new scenarios for data consumption will arise as additional client
teams are onboarded to the graph. As this happens, the right balance must be
struck between flexibly accommodating these changes, maintaining the integrity
of the data graph, and managing access to it. You’ll also want to ensure that
the clients querying the graph identify themselves and that you have some
mechanism for enforcing these usage rules.
As we can see, adopting a federated "One Graph" approach can inspire new levels of collaboration in an enterprise's development efforts. Having the right tools and processes in place to support enterprise-scale data graph evolution will increase the speed at which teams can ship updates to the broader data graph, and in turn, the user interfaces that power product experiences. Throughout this chapter, we'll explore the workflows, developer tooling, observability tools, and governance practices that have served Apollo's enterprise customers in the management and continuous evolution of their data graphs.
Workflows
Teams that work on components of a distributed GraphQL architecture need
workflows that support the iterative evolution of the data graph, both at an
initial conceptual stage and later when shipping changes that will impact the
composition of the overall graph. At Apollo, we have observed and worked with
customers to develop the following best practices for prototyping and deploying
schema changes.
Prototyping Schemas
Whether embarking on an initial consolidation project or iterating subgraph schemas in an existing federated data graph, teams need a way to prototype their type definitions with an understanding of how the types and fields in their portion of the schema will compose into the broader graph. They also need a way to experiment with referencing and extending entities from other subgraphs. These prototyping exercises are the starting point for all schema design workflows, and they are instrumental in planning for the iterative evolution of a federated data graph.
At Apollo, we have worked with teams that start this schema design process by handing a blank piece of paper to client developers and asking them to sketch out the ideal shape of a query response. Once the shape of the query is defined, it's used as a starting point for rolling out the necessary service changes to support that query. There may be subsequent edits made to the draft query based on unavoidable constraints of the existing data graph structure or the underlying data sources. After some amount of further iteration, the client developers can test out the new query and the schema update can be deployed.
There are many dierent variations of this schema design process, but one
consistent theme is that there are unknowns that must be addressed before any
change can be safely implemented in a schema that will be served in production.
At a minimum, teams should be able to discover what is currently available in
the schema beyond the subgraph that they own. This discovery helps teams
leverage existing entities in a data graph, avoid duplication, and help contribute
to a more cohesive model of the enterprises data. Further, teams need to
58 Graph Administration in the Enterprise
understand if their proposed changes compose into the overall graph before
investing time in implementation.
To support and enhance our customers' schema design workflows, the Apollo solutions team designed the Apollo Workbench VS Code extension. This extension allows developers to design and test federated schemas by modeling GraphQL operations and provides feedback about composition errors directly in VS Code. It also integrates with Apollo Studio's schema registry so that all subgraph schemas may be downloaded for a data graph and modified in a non-destructive environment. The Apollo Workbench extension may be downloaded from the VS Code extension marketplace.
Deploying Changes into the Data Graph
Once schema changes are ready to deploy, teams need a workflow for rolling out the changes to different environments and informing other teams of those changes. This is where a schema registry becomes essential. A centralized registry serves as the source of truth for the enterprise's data graph, provides an overall view of the graph so team members can understand how their portion of the schema fits into the larger picture, allows developers to safely push changes to that graph, and can integrate with other developer tooling.
With a centralized registry in place, a gateway can reference the composed schema directly from the registry, and schema change events can be used to drive updates to the configuration of an Apollo Gateway. In turn, referencing a schema configuration from a registry allows a federated data graph's schema to be updated on the fly and without restarting the gateway service to force recomposition. Service owners can then incorporate schema pushes into their CI/CD pipelines once the artifact has been deployed and is ready to serve traffic.
Apollo Studio can serve as this centralized registry and, subsequently, unlock all of the capabilities of managed federation. Managed federation will create a new gateway configuration every time a schema is pushed to the registry by one of the subgraphs using the rover subgraph publish command from the Rover CLI, but it only does so when those changes can be composed into the existing data graph without breaking changes for other subgraphs and clients.
Additionally, managed federation allows teams to create dierent variants of the
data graph that correspond to the dierent environments where the graph runs
(such as staging and production). Each variant has its own GraphQL schema,
which means schemas can dier between environments.
Try Out Managed Federation
To enable managed federation mode with Apollo Gateway, you'll need
to sign up for Apollo Studio to obtain an APOLLO_KEY and then push
your subgraph schemas up to the registry using the Rover CLI. After
adding the APOLLO_KEY as an environment variable, you can remove the
serviceList from the existing ApolloGateway configuration, restart
the gateway service, and your data graph will automatically start in
managed mode, now serving the version of your schema that has been
composed and stored by the Apollo Studio registry.
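As a minimal sketch, a managed-mode gateway might look like the following
(the port is illustrative, and subscriptions are disabled because the gateway
does not support them):

const { ApolloServer } = require('apollo-server');
const { ApolloGateway } = require('@apollo/gateway');

// No serviceList here: with APOLLO_KEY set in the environment, the
// gateway fetches its composed schema from the Apollo Studio registry
const gateway = new ApolloGateway();

const server = new ApolloServer({ gateway, subscriptions: false });

server.listen({ port: 4000 }).then(({ url }) => {
  console.log(`Gateway running in managed mode at ${url}`);
});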
Ultimately, an enterprise will need to have controls in place to manage data
graph contributions and those controls will live in the schema registry. Each
push of an updated subgraph schema to the registry is an opportunity to per-
form validation that flags breaking changes to the overall data graph. At a
minimum, a schema registry should be capable of determining if an update to a
subgraph’s schema can be safely composed back in, and then upon successful
composition, drive the updated schema to the gateway service.
Tooling for Data Graph Contributors
When multiple teams contribute to a data graph, it’s essential to have standards
in place to ensure everyone can contribute to the graph as effectively as possible.
The schema registry is the central point at which these individual contributions
are collected and validated before incorporation into the overall data graph.
To use a schema registry, some tooling is required to add schema checks and
publications into a team’s existing deployment process.
CI/CD Pipelines for Subgraph Services
When a subgraph schema is published to the registry, Apollo Studio runs com-
position validation to ensure that the proposed change will compose with other
registered subgraph schemas. Upon successful composition, a new gateway
configuration is created. However, if a composition error occurs, then the error
state of the new schema is staged in the registry until the subgraph publishes
an updated version of its schema that may be properly composed into the data
graph.
Ideally, a subgraph’s schema registration should be incorporated into a relevant
CI/CD pipeline to automate this process. Further, the automated schema publish
should happen when the subgraph service is ready to serve traffic, which would
typically be after the service completes deployment and passes a health check.
In a Kubernetes-based environment, it's common to publish a schema after the
readiness probe passes. If Apollo Server is used to power the subgraph services,
then you can use the onHealthCheck method to implement custom logic to
verify that a service is ready to serve traffic (usually to ensure that downstream
data sources are available). Once the health check passes, a CI/CD pipeline can
register the schema.
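A minimal sketch of this pattern with Apollo Server follows; typeDefs,
resolvers, and checkDataSources are placeholders for the subgraph's own
schema, resolvers, and readiness logic:

const { ApolloServer } = require('apollo-server');
const { buildFederatedSchema } = require('@apollo/federation');

const server = new ApolloServer({
  schema: buildFederatedSchema([{ typeDefs, resolvers }]),
  // GET /.well-known/apollo/server-health responds with a 503 status
  // if this function throws (checkDataSources is a hypothetical helper)
  onHealthCheck: async () => {
    await checkDataSources();
  },
});

server.listen({ port: 4001 });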
Schema Validation
Confirming valid schema composition is only half of the battle when guarding
against breaking changes in production—you’ll also need to ensure that schema
updates won’t break existing operations currently used by clients. Nobody
wants to deal with (or be the cause of) production downtime, and it has very real
financial consequences for enterprises. For a large e-commerce application, a
few minutes of downtime can have six-figure implications. Within a distributed
GraphQL architecture that favors continual schema evolution, developers work-
ing on subgraph services should feel confident that they can release changes
without doing unintended damage to client applications. For example, even
something as simple as changing an existing field on an Object type from non-
null to nullable can lead to breaking changes for clients that aren’t prepared to
handle a potentially null value.
This is where a schema registry provides even more value when combined with
observability tools that collect tracing data on the operations performed against
the schema. These tools allow you to perform analysis on a composed schema
to verify if any proposed changes will affect existing traffic to the API. Schema
validation is the process of performing static analysis of a schema against a
set of known GraphQL operations for a given window of time. The period that
you check against may need to be large if any mobile clients consume the data
graph (because you won’t have as much control over upgrade cycles as with
web-based clients).
Apollo Studio facilitates schema validation via the Rover CLI using the
rover subgraph check command. Teams can use this command in their deploy-
ment pipelines to ensure proposed schema changes don't adversely affect client
traffic. If Apollo Studio detects a potentially dangerous change, then it will dis-
play information in its user interface about what the breaking change is, what
clients are affected (by client name and version number), what operations are
impacted, and the volume of traffic running against those operations. As a result,
schema validation provides a scalable solution that supports safe, incremental
data graph evolution driven by multiple teams.
Schema Design
As noted above, Apollo Workbench is an essential tool for developers making
ongoing contributions to the data graph. It was created to help developers
understand data graph composition and execution details during the schema
design phase in a mocked environment, rather than waiting until implementa-
tion time to discover that a schema addition or update doesn’t compose into the
graph as expected.
A typical Apollo Workbench-driven workflow for a developer updating a sub-
graph schema would begin by downloading the current representation of the
entire federated schema from Apollo Studio into VS Code. A particular sub-
graph’s schema can then be modified and new subgraph schemas may also be
added to test composition with the overall data graph. From within their local
environment, developers can easily see what entities may be referenced or ex-
tended from other subgraphs as they work. And when composition errors occur,
they are displayed in the Problems panel of VS Code. Both newly designed and
known operations (pulled from Apollo Studio) may be tested against iterations
of the composed schema with a view of the full query plan directly in the editor.
For more details on optimizing development workflows with Apollo Workbench,
see the GraphQL Summit 2021 keynote or the Apollo Workbench documentation.
Tooling for Data Graph Consumers
GraphQL oers a client-centric approach to developing APIs, and this promise
extends beyond designing queries that are purpose-built to meet client devel-
oper needs and drive product experiences. Both Apollo and members of the
GraphQL community have created extensive client-side tooling for web and
mobile to expedite development and help teams ship features faster. In turn,
client developers can maximize the utility of the schema registry by adopting
best practices when sending their requests to the data graph.
Consuming the Data Graph
When an enterprise shifts toward a consolidated data graph, client developers
who may have previously juggled multiple GraphQL endpoints or other point-
to-point APIs no longer need to jump through client-side hoops to query all of
the data needed to render a view in an application. That said, with a centralized
schema registry in place, a common set of standards should be established to
structure how clients make requests to the data graph.
At a minimum, there should be basic controls on what operations are executed
against the data graph, and how those operations are structured. Tracing data
becomes far more useful and actionable when clients include their names
and versions as metadata. When using Apollo Client for web or mobile, you
can specify name and version options that will automatically be translated
into header values sent with every request (specifically, the headers are called
apollographql-client-name and apollographql-client-version, and they may be
set manually for other GraphQL clients). Tools such as Apollo Studio can then
use this operation trace metadata to help service developers conduct more
effective schema validation.
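For example, with Apollo Client for web, this metadata can be set when
constructing the client (the URL, name, and version are illustrative):

import { ApolloClient, InMemoryCache } from '@apollo/client';

const client = new ApolloClient({
  uri: 'https://graph.example.com/graphql',
  cache: new InMemoryCache(),
  // Sent as the apollographql-client-name and
  // apollographql-client-version headers on every request
  name: 'web-storefront',
  version: '1.2.0',
});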
In addition to identifying themselves by name and version, clients should use
named operations for each request to the data graph. For example, at Apollo,
we prepend UI_ to the names of all operations sent from the Apollo Studio web
application. Other teams go so far as to add linting rules to check the structure
of client operations as a part of that client’s deployment pipeline too.
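Applying that UI_ convention, a named client operation might look like the
following sketch (the query fields are illustrative):

import { gql } from '@apollo/client';

// The operation name (UI_GetGraphDetails) travels with the request,
// so it appears in tracing data and observability tooling
const GET_GRAPH_DETAILS = gql`
  query UI_GetGraphDetails($graphId: ID!) {
    graph(id: $graphId) {
      title
    }
  }
`;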
While there isn’t a single “best” way to structure operation names, the important
takeaway here is to establish some operation-related standards, communicate
those standards to teams, and then enforce them. Enforcement of these rules
can take place within the Apollo Gateway, but adding runtime logic on a per-
request basis should be approached with caution when rules may be checked
statically in client code as part of the CI/CD process instead. However enforced,
both client awareness and operation names are essential in providing the
necessary visibility in observability tools to support a field deprecation and
rollover strategy that prevents breaking changes for client developers as the
data graph evolves.
Code Generation
Codegen-related tooling can help facilitate client development for both web and
mobile applications. To support modern, strongly-typed web development, the
GraphQL Code Generator library can be used to generate types for operation
results.
On the iOS side, the Apollo CLI may be used to download the data graph's
schema and add it to a target's directory, and then subsequently generate code
as a build step based on the operations saved in .graphql files. For Android,
an Apollo Gradle plugin is available to download the schema and generate
type-safe models and code from operations in .graphql files when built.
Additional Tools for Client Developers
Apollo provides additional tools to help support client development, including
the Apollo Client Devtools extensions for Chrome and Firefox, which include an
embedded GraphiQL IDE along with query, mutation, and cache inspectors. The
Apollo VS Code extension also supports client development with GraphQL syntax
highlighting, operation autocompletion, performance information, and more.
For iOS, Apollo Xcode Add-ons provides syntax highlighting for GraphQL query
document files to Xcode.
Observability
As more services are added to a federated data graph and adoption spreads
across client applications, it may grow challenging to reason about how a single
request traverses the graph. As noted in a previous chapter, Apollo Gateway
undertakes a query planning process that optimizes for the most time spent in
a single service, reducing the number of network hops required for a single request.
Once calculated, the gateway will execute the query plan across the subgraphs
required to fulfill the request. With managed federation and Apollo Studio,
federated traces may be used to provide detailed insights into the GraphQL
layer’s performance and usage.
Federated Traces
With federated tracing enabled (which happens by default when an APOLLO_KEY
variable is present in the gateway's environment), the gateway will include an
apollo-federation-include-trace: ftv1 HTTP header with each request to a
subgraph. Each subgraph will then construct its trace and add this data to the
extensions of its response. Apollo Gateway then constructs the overall trace
for the request based on the shape of the query plan. The gateway will send
this tracing data to Apollo Studio, where it may be used for schema checks,
tuning query performance, and debugging operation errors.
When Apollo Server is used to power subgraphs, this tracing data is provided
out-of-the-box via the inline trace plugin. Many third-party federation libraries
also expose federated tracing data. Note that not all third-party federation
libraries will necessarily provide field-level tracing data in a response, but the
gateway’s aggregated trace will still show the total time spent in the service even
though the detailed field resolver data won’t be available for that portion of the
query plan.
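As a sketch, the inline trace plugin can also be configured explicitly on a
subgraph's Apollo Server instance (schema stands in for the subgraph's
federated schema):

const { ApolloServer } = require('apollo-server');
const { ApolloServerPluginInlineTrace } = require('apollo-server-core');

const server = new ApolloServer({
  schema,
  // Inline tracing is enabled by default for federated subgraphs;
  // configuring the plugin explicitly makes that behavior visible
  plugins: [ApolloServerPluginInlineTrace()],
});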
Integration with Other Observability Tools
Many enterprises use additional observability tools to monitor application per-
formance and these tools may also be integrated with a federated data graph.
Apollo Studio connects directly with DataDog, and many other integrations are
possible too, either through a custom Apollo Server plugin in the gateway, a
custom RemoteGraphQLDataSource, or by carrying specific headers throughout
a request.
For example, when using AWS CloudWatch, an aws-request-id header must be
included with the request, but by default, Apollo Server's usage reporting plugin
excludes all headers. However, the usage reporting plugin can be configured to
include this header in its traces as follows:
const { ApolloServerPluginUsageReporting } = require('apollo-server-core');

ApolloServerPluginUsageReporting({
  // Forward only the aws-request-id header with reported traces
  sendHeaders: { onlyNames: ['aws-request-id'] },
})
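In context, this plugin is passed to the Apollo Server instance that wraps the
gateway; a minimal sketch, assuming a gateway instance configured as shown
earlier:

const server = new ApolloServer({
  gateway,
  subscriptions: false,
  plugins: [
    ApolloServerPluginUsageReporting({
      sendHeaders: { onlyNames: ['aws-request-id'] },
    }),
  ],
});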
When tracing data is viewed in Apollo Studio, you will now have the
aws-request-id available to help diagnose service-level performance issues in
CloudWatch in relation to the federated traces.
As another example, your team may need to understand the relationship be-
tween an incoming request from the gateway to a subgraph service and the orig-
inal operation name of the request to the gateway. For this case, a custom
RemoteGraphQLDataSource can rename the query sent to the subgraph using
the original operation name and a hash of the entity representations:
const { RemoteGraphQLDataSource } = require('@apollo/gateway');
const { createHash } = require('crypto');

class OperationNameForwarding extends RemoteGraphQLDataSource {
  willSendRequest({ context: { operationName }, request }) {
    if (request.variables?.representations) {
      // Hash the entity representations to produce a stable suffix
      const key = JSON.stringify(request.variables.representations);
      const keyHash = createHash('sha512').update(key).digest('hex');
      // Rename the subgraph operation to carry the original name
      request.query = request.query.replace(
        'query($representations:',
        `query ${operationName}_${keyHash}($representations:`
      );
    }
  }
}
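To put this data source into service, the gateway's buildService option can
return an instance of the class for each subgraph; a minimal sketch:

const { ApolloGateway } = require('@apollo/gateway');

const gateway = new ApolloGateway({
  // buildService is invoked once per subgraph; returning the custom
  // data source applies operation-name forwarding to every request
  buildService({ url }) {
    return new OperationNameForwarding({ url });
  },
});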
These are just a few examples of what’s possible for application performance
monitoring of a federated data graph with Apollo Studio and additional observ-
ability tools. A member of the Apollo solutions team can work with you to design
custom observability integrations for your enterprise.
Governance
Initial GraphQL adoption often emerges either from within a single team or a
small number of teams, and at that scale, managing governance concerns may
be handled on an informal basis. However, the move toward a consolidated data
graph requires a more intentional approach. Given the evolutionary nature of a
federated data graph, strong governance practices are needed to help maintain
its integrity while simultaneously driving its adoption across an enterprise.
The establishment of a data governance group is an important factor in the
success of any consolidation project. This group may be thought of as the
“GraphQL Center of Excellence” within an enterprise, and it should represent a
cross-section of key data graph stakeholders. Ultimately, the governance of a
federated data graph is largely concerned with empowering the people who
will contribute to and consume it with processes that will help them operate
as good citizens of the graph. Once the data graph governance group has been
established, the work largely focuses on setting standards that help maintain
the quality of the data graph, facilitate its continuous evolution, support its
operation, and enforce standards for client usage.
Establishing the Data Graph Governance Group
To set enterprise-wide standards for the graph, a data graph governance group
should be established. This group acts as a cross-team, collaborative governing
body for the data graph. It also establishes best practices related to data graph
maintenance and administration and provides ongoing education for graph
contributors and consumers.
At a minimum, the governance group should consist of one representative from
each of these stakeholder categories (though ideally, each subgraph service and
client team will have representation):
Stakeholder: Executive Sponsor
Role: Provides approval to help ensure the prioritization of the project
Ownership: Owns resourcing for the overall initiative

Stakeholder: Graph Champion
Role: Is the driving force behind the initial consolidation project and is
instrumental in obtaining executive sponsorship
Ownership: Owns internal training and onboarding to the data graph

Stakeholder: Subgraph Lead
Role: Represents a subgraph service (usually the team lead and may also be a
Graph Champion)
Ownership: Owns service boundary resources

Stakeholder: Product Manager
Role: Helps shape schema design within service boundaries
Ownership: Owns the representation of the service boundary in the data graph

Stakeholder: DevOps Representative
Role: Ensures consistent CI/CD pipelines for subgraphs
Ownership: Owns CI/CD pipeline requirements and underlying infrastructure
and tooling

Stakeholder: Client Developer Advocate
Role: Advocates for client consumption patterns in relation to schema design
and evolution (representation from every client team is not required, though
would be helpful)
Ownership: Owns data graph consumer tooling and SDKs (partners with a
relevant Product Manager)
Ideally, the data graph governance group should be formed at the outset of a
consolidation project. If a similar GraphQL Center of Excellence already exists
within an enterprise, its composition should be evaluated to ensure that the key
data graph stakeholders that will be involved in the consolidation project have
adequate representation within the group.
An appropriate meeting cadence for this governance group will vary by organi-
zational needs and the complexity of the consolidation work at hand, though
in most cases, the group members will likely need to meet on a more frequent
basis at the outset of a consolidation project. Once a federated data graph is
running in production, a regular meeting cadence should still be maintained to
help support graph evolution as well as expand its adoption across teams in
the enterprise.
Setting Standards for Data Graph Management
Once established, the data graph governance group is responsible for setting
best practices related to the enterprise’s consolidated graph, communicating
and enforcing those practices, and evolving them as needed over time. Gener-
ally, these concerns may be categorized into three main areas: Graph Integrity,
Graph Operation, and Graph Usage. Suggested practices for each area are out-
lined below.
Graph Integrity
Reconciling naming conventions and how entities are conceptualized, refer-
enced, and extended across domains can be a challenging aspect of an initial
consolidation project. These concerns will require ongoing attention afterward,
too, as the data graph evolves and new subgraph services are incorporated into
it. Documenting naming conventions, guidelines for entity and value type up-
dates, as well as type and field migration workflows helps service owners make
informed decisions about how they can evolve their portion of the schema. The
governance group should also formalize a review process for proposed schema
changes and an architectural review process for adding new subgraph services
before they are incorporated into the broader data graph.
Once changes are made to the data graph, teams that contribute to and con-
sume the API must be informed. Regular rhythms and processes should be
established for synchronously and asynchronously communicating schema
updates to internal teams, especially when rolling over deprecated fields. In
addition, Apollo Studio can be configured to post schema change notifications
directly in a Slack channel, and it also exposes a schema change webhook for
general use with other services and tools.
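As an illustration, that webhook could point at a small internal relay service;
the sketch below simply logs the notification, and the payload shape should be
treated as hypothetical rather than the webhook's documented format:

const express = require('express');

const app = express();
app.use(express.json());

// Hypothetical receiver for schema change notifications; forward the
// payload to other internal tools as needed
app.post('/webhooks/schema-change', (req, res) => {
  console.log('Data graph schema changed:', JSON.stringify(req.body));
  res.sendStatus(200);
});

app.listen(4002);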
Graph Operation
Having the right observability tools in place is a key factor in maintaining
smooth graph operations and minimizing mean time to recovery when some-
thing unexpected happens. As previously discussed, federated traces provide
insight into API usage at the field, operation, and client levels in Apollo Studio
and this data can be used to tune performance, debug errors, and support a safe
rollover strategy for deprecated fields (and performance reports and alerts can
also be pushed directly to a Slack channel from Apollo Studio).
The governance group should proactively establish performance best practices
that support data graph operation. For example, caching may happen at various
levels in the stack—from the normalized cache in Apollo Client, to Automatic Per-
sisted Queries that support edge caching with CDNs, to full response caching at
the gateway or subgraph level—and service owners and client teams alike must
be aware of what the standard practices are within the graph. Similar guidelines
may be provided for areas such as using data loaders to batch requests to under-
lying data sources, providing minimal unit test coverage for subgraph resolvers,
and running automated performance tests for known operations.
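For example, Automatic Persisted Queries can be enabled in Apollo Client by
adding a persisted query link; a minimal sketch with a hypothetical gateway
URL:

import { ApolloClient, HttpLink, InMemoryCache } from '@apollo/client';
import { createPersistedQueryLink } from '@apollo/client/link/persisted-queries';
import { sha256 } from 'crypto-hash';

// Operations are sent as SHA-256 hashes first; the full query string
// is only sent when the server does not yet recognize the hash
const link = createPersistedQueryLink({ sha256 }).concat(
  new HttpLink({ uri: 'https://graph.example.com/graphql' })
);

const client = new ApolloClient({ link, cache: new InMemoryCache() });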
Graph Usage
As previously discussed, it’s a best practice for clients to identify themselves
by name and version before querying data from the graph, and clients should
also assign names to all of the operations they send to the API. Ideally, these
operation names are defined using a shared naming scheme, so it will be the
role of the data graph governance group to set, communicate, and enforce these
naming standards.
Additionally, the governance group may wish to set standards for using query
variables instead of literals as operation arguments. This measure will help
minimize operation cardinality and take advantage of some of Apollo Studio’s
reporting optimizations in the gateway. And to guard against potentially abusive
operations, the governance group may also put appropriate mechanisms in
place to limit query depth, breadth, and overall cost.
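One common mechanism is a validation rule that rejects overly deep operations;
the following sketch uses the community graphql-depth-limit package with an
illustrative limit, and typeDefs and resolvers are placeholders for a subgraph's
own schema and resolvers:

const { ApolloServer } = require('apollo-server');
const depthLimit = require('graphql-depth-limit');

const server = new ApolloServer({
  typeDefs,
  resolvers,
  // Reject any operation nested more than 10 levels deep
  validationRules: [depthLimit(10)],
});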
Onboarding and Supporting Teams
When it comes to driving adoption of the data graph in an enterprise, one of the
most important functions a governance group can serve is supporting teams
as they are onboarded to the data graph, both as contributors and consumers.
Similarly, the data graph governance group can also help support ongoing
education through the establishment of an enterprise-wide “Community of
Practice” for the unified data graph, and GraphQL in general.
For an extensive list of links to additional resources to help support internal
GraphQL training at your enterprise, please see Appendix B.
Summary
In this chapter, we covered several important topics related to federated data
graph administration. We first described workflows for prototyping schema
changes and how managed federation allows subgraph service owners to deploy
updates without the fear of introducing breaking changes to graph composition
or existing client operations. We then explored tooling that supports both data
graph contributors and consumers.
We also saw how federated traces and observability tools assist with monitoring
data graph performance and evolution. And lastly, we discussed the importance
of establishing a data graph governance group based on a representative cross-
section of team members, and how the work of this group helps maintain the
integrity of the graph, support its continuous evolution, and drive adoption by
helping onboard new teams to the graph and supporting GraphQL education
across the enterprise.
Appendix A: Federation Case
Studies
The following is a curated list of case studies from enterprises that have adopted
a federated approach to GraphQL consolidation with the support of various
elements of the Apollo platform.
Adobe
This post details how Adobe Experience Platform engineering uses GraphQL with
over 40 internal contributors across 40 API endpoints at Adobe to improve their
agility and velocity:
GraphQL: Making Sense of Enterprise Microservices for the UI
Netflix
This series of blog posts outlines how Netflix uses a federated approach to
GraphQL to power its Studio API:
How Netflix Scales its API with GraphQL Federation (Part 1)
How Netflix Scales its API with GraphQL Federation (Part 2)
RS Components
This series of blog posts outlines how RS Components adopted a transitionary
architecture to facilitate its move to a federated GraphQL API, and with an eye
for how to scale its efforts in the future:
Schema Services: Transitioning Towards a Federated Architecture
The Evolution of GraphQL at Scale
StockX
From scaling to developer velocity to documentation, this blog post outlines
what key insights the StockX team had in their journey adopting federation:
9 Lessons From a Year of Apollo Federation
Walmart
This post outlines how the Walmart Customer Experience team migrated away
from REST-based orchestrators to use Apollo Federation for enhanced perfor-
mance and developer ergonomics:
Federated GraphQL @ Walmart
Appendix B: GraphQL and Apollo
Learning Resources
In addition to the content in this guide, the following resources will be helpful for
teams adopting GraphQL and Apollo Federation:
GraphQL
Learn GraphQL with Apollo
Learn - graphql.org
GraphQL Specification
Principled GraphQL
Apollo Client
Android
Apollo Client Android Docs
Tutorial - Apollo Android SDK
iOS
Apollo Client iOS Docs
Tutorial - Apollo iOS SDK
React/JS
Apollo Client React/JS Docs
Apollo Client React/JS Roadmap
Migrating Your React App to Apollo Client 3 (video)
Configuring the Cache
Demystifying Cache Normalization
Local State Management with Reactive Variables
Apollo Server
Apollo Server Docs
Apollo Server Roadmap
Apollo Federation
Apollo Federation Docs
The Architecture of Federation (video)
Migrating from Schema Stitching
Third-party Libraries that Support Apollo Federation
Apollo Studio
Apollo Studio Docs
Managed Federation Overview
Sending Metrics to Apollo Studio
Schema Checks
Rover CLI Docs