Garrett McGrath, a Staff Engineer in FlightAware's Aviation Insight group, is responsible for the performance, reliability, and observability of multi-machine Hyperfeed and a growing constellation of services that consume Hyperfeed's output.
The FlightAware Multiverse
Theoretical physics has gifted the popular imagination the intriguing notion of the multiverse: a collection of parallel, divergent universes that together constitute reality. At FlightAware, we have a similar situation with the flights we track. FlightAware’s picture of global flight activity emerges by fusing together information from dozens of disparate data sources. Each of these sources provides distinct tracking data and, crucially for this discussion, each data source can have its own level of confidentiality or visibility. Put simply: not every data source can be seen by everyone. Some data sources, like FlightAware’s network of terrestrial ADS-B receivers, are visible to everyone without exception. But others, like Aireon’s space-based ADS-B data, have a restricted, limited distribution. Having so many data sources is an enormous asset in terms of flight tracking quality, but it also poses a technical problem. How do we track a single flightplan with multiple access levels, restricting privileged data from unauthorized users while preserving the full picture of a flight for those with elevated permissions? FlightAware solves this problem with its version of the multiverse: by maintaining multiple versions of the same flightplan simultaneously. Rather than tracking a single flightplan, FlightAware tracks a family of flightplans in tandem. To achieve this, we employ two core abstractions: provenance and pedigree.
Provenance and Pedigree
Flight tracking at FlightAware occurs in a program called Hyperfeed, which manages the flightplan multiverse. When processing an input message, Hyperfeed performs several high-level steps. First, it normalizes and validates the data. As part of this data preparation, Hyperfeed assigns every input message a provenance. In Hyperfeed’s domain, a provenance is a textual tag indicating how widely the input data can be shared with FlightAware’s customers. The specific textual tags used are kept internal to FlightAware’s systems; no end user deals directly with provenance. Once the input has been prepared for ingestion and a provenance has been assigned, Hyperfeed consults its current state to find the flightplan for the input message.
If no flightplan can be found, a new one is created. Considering for now only this case, a key part of flightplan creation is the assignment of a pedigree. Every input message gets a provenance; every flightplan gets a pedigree. A flightplan’s pedigree contains all the provenances used to track it. Initially, a flightplan has only a single provenance in its pedigree, but things get more interesting when additional provenances come into play.
In the converse situation, where Hyperfeed finds an extant flightplan for an input message, it first checks whether the provenance of the message exists in the pedigree of the flightplan. If it does, then Hyperfeed uses the input message to update the flightplan accordingly. If it does not, then before updating the flightplan, Hyperfeed performs a process called forking, which splits the universe of the flightplan into parallel, divergent paths. Forking has some nuances outside the scope of this post, but in every case, Hyperfeed maintains a parent flightplan whose pedigree contains all the provenances ever used to track the flight and creates some child flightplans, each one containing a subset of the provenances in the parent flightplan’s pedigree. Every time forking occurs, the parent gets a new provenance in its pedigree and the family gains at least one new child flightplan. In this way, the tracking of a flightplan family evolves over time as more provenances get incorporated.
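The decision described above can be sketched in a few lines of Python. This is a minimal illustration rather than Hyperfeed's actual code: the `Flightplan` class and the functions `needs_fork` and `members_to_update` are hypothetical names, and pedigrees are modeled simply as sets of provenance tags. The rule that only members whose pedigree contains the message's provenance get updated is the one stated in the footnotes.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Flightplan:
    """Hypothetical stand-in for one member of a flightplan family."""
    pedigree: FrozenSet[str]
    applied: List[str] = field(default_factory=list)  # messages applied so far


def needs_fork(parent: Flightplan, provenance: str) -> bool:
    """Forking is required when the parent's pedigree lacks the provenance."""
    return provenance not in parent.pedigree


def members_to_update(family: List[Flightplan], provenance: str) -> List[Flightplan]:
    """Only members whose pedigree contains the provenance receive the update."""
    return [fp for fp in family if provenance in fp.pedigree]


# Example family: parent with pedigree {A, B} plus children {A} and {B}.
parent = Flightplan(frozenset({"A", "B"}))
family = [parent, Flightplan(frozenset({"A"})), Flightplan(frozenset({"B"}))]

for fp in members_to_update(family, "B"):
    fp.applied.append("position report with provenance B")
```

In this example a message with provenance B updates the parent and the B child but leaves the A child untouched, while a message with provenance C would first trigger a fork.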
Flightplan Family Pedigree Evolution
To illustrate this process, let’s consider an example: the addition of two new provenances to a parent flightplan’s pedigree. Initially, our example flightplan has a single fork in its family: the parent fork with a lone provenance, called A, and a pedigree consisting of just that provenance. When an input message with a new provenance of B comes along and gets matched to this flightplan, forking occurs and results in two new members of the flightplan family. The parent flightplan’s pedigree changes at this point from A to A B. Of the two new family members, one child has a pedigree of A, and the other child has a pedigree of B.
Assume now that an input message adds an additional provenance of C. This updates the parent’s pedigree and creates four additional child flightplans. Starting simply, after adding C to the parent flightplan, its pedigree becomes A B C. Previously the parent had a pedigree of A B; that pedigree persists, but now in a child flightplan rather than as the parent. The child with a pedigree of A similarly sticks around, and a new child is created with the pedigree A C. Similarly, the child with a pedigree of B persists, with a new child created for the pedigree B C. Lastly, a new child flightplan begins with a pedigree of C.
Speaking generally, adding a provenance to a flightplan family does two things. First, it creates a flightplan with the newly added provenance as its only pedigree entry. Second, it copies each existing member of the family, including the parent: one copy gets the new provenance added to its pedigree while the other retains its previous pedigree. Forking a flightplan family with N members therefore yields a new family with 2N + 1 members. In the previous example, the family grew from 1 to 3 to 7 members as provenances were added to the parent’s pedigree.
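The general rule can be checked with a short Python sketch that models a family purely as a set of pedigrees. The function names `fork` and `parent_pedigree` are hypothetical, and the sketch deliberately ignores the special handling of the most permissive provenance mentioned in the footnotes, so it should be read as an illustration of the 2N + 1 rule only.

```python
from typing import FrozenSet, Set

Pedigree = FrozenSet[str]


def fork(family: Set[Pedigree], new_provenance: str) -> Set[Pedigree]:
    """Apply the general forking rule to a family of pedigrees.

    Every existing pedigree is kept as-is, a copy of each one gains the
    new provenance, and one extra member holds the new provenance alone.
    """
    forked = set(family)                                   # copies that keep their pedigree
    forked |= {p | {new_provenance} for p in family}       # copies that gain the provenance
    forked.add(frozenset({new_provenance}))                # the brand-new single-provenance member
    return forked


def parent_pedigree(family: Set[Pedigree]) -> Pedigree:
    """The parent is the member whose pedigree holds every provenance seen."""
    return max(family, key=len)


family = {frozenset({"A"})}          # 1 member
after_b = fork(family, "B")          # 2 * 1 + 1 = 3 members
after_c = fork(after_b, "C")         # 2 * 3 + 1 = 7 members
```

Under this simplified model, a family built from k provenances contains every non-empty combination of them, which is why the sizes follow 1, 3, 7 (that is, 2^k - 1).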
Pedigree Access Control
Forking and maintaining the FlightAware multiverse happens within Hyperfeed, but this is only part of the story. The other part is what happens with Hyperfeed’s output, i.e., FlightAware’s picture of global flight activity, which is called controlstream. Controlstream is meant for external consumption: once it leaves Hyperfeed it becomes, for example, the data seen on the website, the source of API offerings, or an input to ML models for predicting flight ETAs. However, controlstream can only be consumed and presented to FlightAware’s customers and users in a way that respects pedigree. Respecting pedigree means that every consumer of controlstream sees only their permitted portion of the flightplan multiverse. To accomplish this, every controlstream consumer has a pedigree access-control list (PACL). For controlstream consumers without a FlightAware account, e.g., a non-logged-in user of the website, a default PACL is used that only shows publicly distributable data. Otherwise, a PACL gets assigned at FlightAware account creation time. Any presentation of controlstream, then, necessitates evaluating a consumer’s PACL against the pedigree(s) of a flightplan family. Evaluation of a consumer’s PACL determines whether a flightplan family is visible and, if so, which family member to present: the one whose pedigree has the most provenances with respect to what the PACL permits.
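As a rough illustration of PACL evaluation, the sketch below treats a PACL as a plain set of permitted provenances. It assumes (this is not stated explicitly in the post) that a family member is visible only when every provenance in its pedigree is permitted, and that the visible member with the largest pedigree wins. The function name `best_visible_member` is hypothetical.

```python
from typing import FrozenSet, Optional, Set

Pedigree = FrozenSet[str]


def best_visible_member(family: Set[Pedigree], permitted: Set[str]) -> Optional[Pedigree]:
    """Pick the richest family member the consumer is allowed to see.

    Assumption: a member is visible only if all of its provenances are
    permitted. Returns None when no member of the family is visible.
    """
    visible = [p for p in family if p <= permitted]  # subset test
    if not visible:
        return None
    return max(visible, key=len)


# The seven-member family from the A, B, C example.
family = {
    frozenset({"A"}), frozenset({"B"}), frozenset({"C"}),
    frozenset({"A", "B"}), frozenset({"A", "C"}), frozenset({"B", "C"}),
    frozenset({"A", "B", "C"}),
}

choice = best_visible_member(family, {"A", "C"})
hidden = best_visible_member(family, {"D"})
```

Here a consumer permitted to see A and C is served the member with pedigree A C, while a consumer permitted to see only an unrelated provenance D sees nothing of this family.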
Evaluating a PACL sounds relatively straightforward, but there is an additional wrinkle that adds complexity. A PACL does not simply contain the provenances that a controlstream consumer can see; it also supports additional constraints on any given provenance. Namely, a specific provenance can be further limited to a restricted list of callsigns or tail numbers and airports. Furthermore, the permission to view a given provenance for a PACL can also have a date range attached to it. This allows FlightAware to express restrictions for controlstream consumers like the following: consumer U can see the Aireon space-based ADS-B provenance only for tail numbers verified to belong to U and only for flights starting on a specific date, or consumer X can see Eurocontrol data but only for flights originating from or destined for Heathrow airport in the year 2020. Supporting these sorts of additional restrictions adds enormous power to the pedigree system of data access. This system enables FlightAware to carve up the flightplan multiverse into finer-grained parallel universes tailored for very precise use cases.
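One way to picture a constrained PACL entry is as a small record with optional restrictions. The sketch below is purely illustrative: the `ProvenanceRule` class, its field names, the `"eurocontrol"` tag, and the choice to require every configured constraint to pass are all assumptions, since the real provenance tags and PACL format are internal to FlightAware.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ProvenanceRule:
    """Hypothetical PACL entry: one provenance plus optional constraints."""
    provenance: str
    idents: Optional[FrozenSet[str]] = None    # callsigns or tail numbers; None means unrestricted
    airports: Optional[FrozenSet[str]] = None  # origin or destination must match; None means unrestricted
    start: Optional[date] = None               # first permitted flight date, inclusive
    end: Optional[date] = None                 # last permitted flight date, inclusive

    def permits(self, ident: str, origin: str, destination: str, flight_date: date) -> bool:
        """Return True only if every configured constraint is satisfied."""
        if self.idents is not None and ident not in self.idents:
            return False
        if self.airports is not None and not ({origin, destination} & self.airports):
            return False
        if self.start is not None and flight_date < self.start:
            return False
        if self.end is not None and flight_date > self.end:
            return False
        return True


# Consumer X: Eurocontrol data, Heathrow (ICAO code EGLL) flights, year 2020 only.
rule = ProvenanceRule(
    provenance="eurocontrol",  # hypothetical tag name
    airports=frozenset({"EGLL"}),
    start=date(2020, 1, 1),
    end=date(2020, 12, 31),
)

heathrow_2020 = rule.permits("BAW117", "EGLL", "KJFK", date(2020, 6, 1))
paris_2020 = rule.permits("AFR006", "LFPG", "KJFK", date(2020, 6, 1))
heathrow_2021 = rule.permits("BAW117", "EGLL", "KJFK", date(2021, 1, 5))
```

A provenance would then count as permitted for a given flight only when its rule passes, after which the pedigree comparison proceeds as before.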
In case anyone wants to write in with a correction, please, ahead of time, excuse the sloppiness of my characterization of the multiverse. The intention here is to capture a colloquial understanding of the concept, not that of professional physicists. ↩︎
Confidentiality is used here in the computer security sense, which means “the privacy of information, including authorizations to view, share, and use it.” ↩︎
While flightplan is an oft-used term, in the context of this blog it means the intention of a plane to move at a certain time from one location on earth to another location on earth at a different time. Each flight page on FlightAware’s website or mobile app represents a flightplan. ↩︎
Hyperfeed keeps its state, which includes all the flightplans it is currently tracking, along with any positions for those flights and a host of additional supporting information, in a PostgreSQL database. ↩︎
When updating a flightplan family, Hyperfeed only updates the members with a pedigree containing the provenance of the input message. ↩︎
Most of the nuance not covered in detail involves special case handling for the most permissive provenance, which happens to improve performance. Forking behaves somewhat differently when the most permissive provenance is seen for the first time for a flightplan. This special case handling, though, only adds more detail without invalidating the discussion herein. ↩︎
A FlightAware account’s PACL can change over time depending on the addition or removal of premium features; a user’s PACL is not immutable. ↩︎
For those wondering, the pedigree-based access control mechanism does not enforce the block list. That is also a key part of displaying the contents of controlstream to FlightAware’s customers, but it is handled separately. Hyperfeed does not have any notion of blocked idents, so the access control decisions for it happen entirely in any application or presentation layer used to display controlstream’s data. ↩︎