Firehose++ - The Evolution of a High-Performance Streaming API

As the Engineering Manager of the Backend Crew, Jonathan Cone leads the team responsible for FlightAware’s primary customer-facing APIs & individual feed processors.

FlightAware offers two major APIs for its customers, AeroAPI and Firehose. AeroAPI is a query-based request/response API while Firehose is a socket streaming API. Firehose began under a different name for a specific customer in 2013 to relay position data in real-time to their aircraft management system. In July 2014, the product was rebranded as Firehose and made available to a broader audience of FlightAware customers with flight information (FLIFO) messages included at the end of that year. As the product grew, a need arose to increase the product's performance, and significant changes were made from the end of 2017 through late 2018. This post discusses the motivation for those changes, the technical challenges, and lessons learned.

The Opportunity

Firehose emits JSON messages consisting of surveillance data (positions) and flight information (FLIFO) with multiple (on the order of 500 to 1000) messages per second during live operation. In 2017, a requirement arose to have Firehose replay historical data at a minimum of 60x real-time. This meant that if a customer connected and requested an hour of historical global flight data, that data should be delivered in one minute or less. The original implementation could only achieve ~8x real-time, so substantive changes would be necessary to meet this requirement. New customers also desired higher SLAs for the product, but Firehose lacked a substantive test suite or any Continuous Integration (CI) or Continuous Deployment (CD) pipelines.

Firehose Overview

FlightAware consumes data from over 40 different sources, including its ADS-B network of over 30,000 receivers, and writes normalized feed data into tab-separated value (TSV) files. These feeds are then combined and processed by Hyperfeed, FlightAware’s flight decision engine, to arrive at a canonical view of global flight data. Hyperfeed emits the results of its evaluations to TSV files called controlstream, and this data is the input to the Firehose API. Each line in the TSV contains key/value pairs with the relevant flight data for that message. FlightAware uses a variety of means for storing and moving flight data around, but ultimately the controlstream TSVs are the canonical source of truth and are preserved in perpetuity. When a user connects to Firehose, the following occurs to serve them live (or historical) flight data:

  1. Authenticate the user
  2. Load the user’s permissions
  3. Process the user’s initiation command
  4. Start a controlstream reader at the desired point in time
  5. Process each controlstream message
    a. Check that the message is appropriate for the user (pedigree evaluation)
    b. Check that the user has permission and requested to see this message type (rules evaluation)
    c. Apply any other rules (e.g., rate limiting, airline enforcement)
    d. Serialize the message to JSON

First Steps

The original implementation of Firehose was written in the Tcl scripting language. Much of FlightAware’s codebase at the time was in Tcl, including the libraries and packages used to access data at FlightAware. The first question that arose was whether our Tcl packages could be used to read raw data at the 60x replay requirement. We decided to break the problem into manageable chunks, the first being addressing the speed at which we could read the raw data and transform it into a data structure for subsequent processing. It turns out Tcl is capable of reading a file fairly quickly. As an example, consider an hour of raw flight data locally that we want to read synchronously:

set fp [open "flight_data.tsv" r] ;# open the file
while {[gets $fp line] > -1} {} ;# no-op read
close $fp ;# close the file

The no-op read happens quickly, and on one of our dev servers I found the replay rate to be >150x. This isn’t that surprising since the file operations here are little more than wrappers for the system “open” and “read” calls. Firehose needs to do more than just read and dump the data, so we expand our simple experiment to include transforming the TSV data into a Tcl list.

set fp [open "flight_data.tsv" r]
while {[gets $fp line] > -1} {
    set raw_data [split $line \t] ; # Split each line into a Tcl list
}
close $fp

We now find the replay performance to be ~30x, falling well short of our desired target. That split results in multiple memory allocations for each element of the list, so it is not surprising that performance suffers. Given that we don’t have an easy way to control allocations in Tcl, it seemed unlikely we could improve read performance sufficiently for our use case.

Stepping into C++

We decided to explore language alternatives at this point and were particularly interested in C++. One of our principal developers had created a C++ library to read FlightAware’s TSV data files, and this served as a launching point for the exploratory work. The early work focused on reading data locally and chopping each line into its keys and values. To achieve the necessary performance, allocations needed to be limited and ownership of the underlying block of data handled responsibly. We were using C++14 at this point in the process and developed a simple class to store views of the underlying data that represented lines or various fields without anything heavier than a pointer allocation:

class FAString {
public:
    FAString();

    FAString(const char * bytes, ssize_t start, ssize_t end);

private:
    const char * bytes = NULL;
    ssize_t start = 0;
    ssize_t end = 0;
};

For those familiar with C++17, this may remind you of std::string_view (or boost::string_view). This class achieved the same result, storing a pointer to the underlying data along with start and end offsets for the value of interest. The code that splits the TSV line is pretty standard fare, and with that in place, we were able to read data and split lines into key/value pairs in a performant way.

Firehose needs to pull information from a discrete set of keys in each message to evaluate rules and serialize the output. We evaluated how to make those key/value pairs available in a performant way. During the key/value parsing, each key/value becomes an entry in a std::vector, which isn’t a convenient container for accessing a specific key. The FAString view class didn’t have all the specializations needed to be used as a key in a std::unordered_map, and we were wary of using FAString as a map key since the map would not have ownership of the underlying data. Instead, we opted to create a class with data members for the controlstream keys and use switch statements to quickly parse each key/value into its appropriate class member. Since the switch statement acts as a jump table (roughly equivalent to using goto), the insertion performance was as good as or better than std::unordered_map.

Here is a simplified example of that class:

class ControlStreamMsg {
public:
    long _hc;
    FAString ident;

    ControlStreamMsg(const std::vector<std::pair<FAString, FAString>> &keyValues) {
        for (const auto &[key, value] : keyValues) {
            switch (key[0]) {
                case '_':
                    if (key == "_hc")
                        _hc = atol(value.data());
                    break;
                case 'i':
                    if (key == "ident")
                        ident = value;
                    break;
            }
        }
    }
};

In this case, we switch on the first character of the key (validation that the character exists is omitted here) and then use an if/else block within each case statement to handle the value assignment. We prefer each case statement to consist of fewer than four if/else comparisons for the final matching, so in instances where the first character does not resolve to four or fewer keys, a nested switch statement is applied to the second character.

class ControlStreamMsg {
    ...
    switch (key[0]) {
        case 'a':
            // Many keys start with 'a'
            switch (key[1]) {
                case 'c':
                    if (key == "actualArrivalRunway") {
                        actualArrivalRunway = value;
                    } else if (key == "actualDepartureRunway") {
                        actualDepartureRunway = value;
                    }
                    break;
    ...
};

These first experiments with reading flight data TSVs were reviewed, tested, and formed the foundation of a high-performance C++ TSV reader we named copystream. The actual reader logic and controlstream parser were refactored into a library, falcon, for reuse in other C++ applications. This first iteration of a C++ reader/parser achieved read rates of at least 250,000 lines per second, which is performant enough for the overall replay rate requirement (60x). Of course, this doesn’t entirely solve our problems because operations downstream of the reading / parsing still need to keep up with that read rate.

Moving Toward a Full Rewrite

At this point, we were not committed to rewriting all of Firehose in C++, but rather began integrating the high-performance C++ reader into the existing application. Using the C++ reader/parser, Firehose replay performance increased by about 20%, giving us replay rates in the 10x – 12x range. That was certainly an improvement but still far from the 60x replay rate requirement. Using tclgdb we continued to profile Firehose to identify bottlenecks. Previously, we had seen significant time being spent on reading and allocation of lists in that process. Using the C++ reader, that was no longer the greatest bottleneck, but the allocations of lists when moving back into Tcl were still an issue. The other major consumers of CPU time were pedigree evaluations and JSON serialization.

FlightAware consumes data from a variety of sources. Some of those sources are public and available to any user, while others are privileged and require special permission.  Each message in controlstream has fields specifying the type of sources for that message. When a flight contains both public and privileged data, controlstream will contain a view of that flight for each access type.  We refer to these source descriptions as pedigrees, and each user has an access control list describing the pedigrees they are permitted to view. For example, a flight that is based entirely off FAA radar and public FLIFO data would be assigned a public pedigree. Suppose during the operation of the flight we obtain position data from Aireon, our space-based ADS-B provider, or another satellite source. In that case, the flight will have two views with one containing both the public and privileged data and another just containing public data.

Firehose evaluates each flight message to determine if that user is permitted to see this message and if this message represents the best data available for that user, based on the pedigree (view) of the flight and the user’s permissions. These pedigree evaluations were done entirely in Tcl and were a performance bottleneck for reaching our 60x target. Therefore, another team at FlightAware embarked on the process of rewriting the pedigree evaluation library in C++. The new C++ pedigree evaluation library, pepper, solved this problem and yielded over 100,000 pedigree evaluations per second. At that time, the volume of controlstream was on the order of 1,000 messages per second, so pepper evaluated pedigrees at > 100x real-time. The library can be used either directly by a C++ application or a Tcl application using a cpptcl interface.

We returned to the Tcl implementation of Firehose and instrumented it with the pepper pedigree evaluation library for another round of profiling. Progress: pedigree evaluation was now so fast it was hard to find on the CPU time heat map, but the allocations incurred when moving data into Tcl, along with JSON serialization, kept the replay performance below 15x.

Now, the only substantive parts of Firehose running in Tcl were just the rules evaluation and JSON serialization. It seemed unlikely we would realize the gains we needed continuing to work in Tcl, so we decided to rewrite the remaining core functionality in C++.

Committing to C++

With the decision to move the remainder of Firehose’s core functionality into C++, we still opted to leave the user authentication and other startup code in Tcl. This could be accomplished using cpptcl to create a Tcl interpreter and interact with it during the startup phase of a Firehose connection. That code was battle-tested already, and moving it into C++ would just introduce the risk of creating new bugs. Additionally, the initialization and authorization code runs once during startup and completes within a millisecond, so there isn’t significant performance improvement to be found there. The core functionality that we needed to rewrite in C++ consisted of rules evaluation and JSON serialization.

One problem was that Firehose lacked sufficient testing, in particular integration testing, for us to have confidence that a rewrite would not cause regressions or introduce new bugs. Therefore, in coordination with QA, one of our team members began writing an extensive integration suite against the existing Firehose implementation. The goal was to run those same tests against the new implementation to ensure that Firehose would still produce the same outputs for a given set of inputs. This endeavor was undertaken in parallel with the remaining rewrite work and gave us sufficient confidence to move forward.

The first attempts at writing Firehose in C++ involved rewriting the JSON serialization. We knew from previous profiling work that the serialization was a significant bottleneck, whereas rules evaluation was less of an issue. However, rules evaluation and message serialization are tightly coupled operations, so both needed to happen in the C++ codepath. The first experiments expanded the copystream program into a mini Firehose application. We tried out a number of different JSON libraries, from YAJL to RapidJSON to Nlohmann JSON. The interfaces for Nlohmann and RapidJSON reflected more modern C++ styles that we liked, but ultimately the performance of YAJL in our use case far outstripped the others, so we selected it for the JSON serialization. With a YAJL implementation in place to assemble the JSON messages produced by Firehose in the single-threaded copystream, we had now reached our goal. The replay performance was around 60x real-time; however, we still lacked the rules evaluation logic, and this did not leave much headroom for future growth. Profiling the application revealed that most of the CPU time was spent on serialization, which is work that can be done in parallel across multiple cores.

Moving from the expanded copystream back to the Firehose rewrite, we decided on a multi-threaded approach for the pedigree evaluation, rules evaluation, and JSON serialization. JSON serialization is the most computationally expensive operation in Firehose, and while a single-threaded approach achieved 60x replays, it barely managed that. By splitting the serialization into separate threads, we gave ourselves headroom for future feed volume growth. Each message Firehose evaluates from controlstream contains a unique Flight ID that is immutable for the duration of a flight. That field can then be used to hash work into workers, and Firehose uses a CRC32 checksum of the Flight ID to “bucketize” work. When the main thread starts up, it performs the SSL handshake, authenticates the user, and parses their initiation command. It then initiates streaming by creating threads for the workers and writer and asynchronously reading data. As data is pulled off that asynchronous thread, the main thread hashes each line and passes it off to the appropriate worker. Each worker then performs the pedigree evaluation, evaluates the rules associated with that connection and user, and finally serializes the message if it is accepted. Workers then pass the original line and any serialized messages back to the main thread for processing. The main thread, after dispatching a chunk of lines for work, collects the results in the same order in which they were dispatched, using a std::unordered_map and std::list, and hands off any messages with serialized content to a writer thread, which handles putting that serialized data onto the TCP socket.

The brevity of the previous paragraph belies the amount of work that went into producing the Firehose multithreaded application. In fact, it took weeks to work out all the details of that implementation. One concern was how we could ensure that we didn’t allocate data as we passed it between threads. Our first attempt involved a class that would hold (own) both the original line and any serialized data produced by the worker. The first iteration of this class stored the data as a std::shared_ptr<> and copied the pointer when handing it off to the std::deque used for passing data between threads. This turned out to be less than ideal: the reference counting for shared_ptr can be slow in a multi-threaded environment, and it obscured who the actual owner of the data was at a given point in the processing. In later improvements, we converted to std::unique_ptr<> for each of the lines being read and std::move them along the path of execution, more accurately reflecting where ownership of the data lies at a given point in time and increasing performance by removing the reference counting. The first implementation also had workers passing their completed work items back to the main thread as soon as each item was completed. This resulted in significant contention on the locks for the deque passing data back to the main thread. Since the work is batched, it was faster to let the worker finish all items in its batch and then pass all of the completed items back at once.

The ultimate result was that the C++ implementation of Firehose replayed data at 180x or greater real-time, far exceeding our goal. This has proved beneficial as the volume of data in FlightAware’s feed each day has continued to increase. At the time development of Firehose in C++ was underway, a typical day of controlstream was ~70GB, while today that size has grown to ~180GB. Even with the larger data volume, Firehose continues to see >120x real-time replays for customers. The integration tests developed as part of this process also caught regressions in the new implementation, allowing us to resolve them before launch and making for a relatively painless transition to the new application.

Lessons Learned

For applications where performance is critical, having the ability to profile the running code is paramount. One item we struggled with in our early evaluations of the Tcl implementation of Firehose was an inability to profile the application and gain insights into what was eating up CPU cores and needed to be improved. I mentioned previously that we used tclgdb for some of that work; in fact, tclgdb was developed to facilitate this type of profiling as part of this work. For C++ applications there are a variety of profiling tools, but it can be daunting at first to understand how to use them and interpret the results. Much of our initial work on Firehose was done on our Macs in Xcode. It turns out that Xcode is a decent C++ IDE, and the Mac includes a profiler that’s easy to run and understand. We used that profiler to guide much of our work in both the early days of development and today. For example, during the initial development, I found that our replay performance suddenly dropped below 20x following some changes. Using the profiler, I quickly identified that a typo in the CRC32 function used to hash work was causing each line to be passed by value. That surge in allocations resulted in a significant drop in performance, but it was found quickly and resolved.

I mentioned earlier that we introduced a class with similar functionality to std::string_view. As C++17 became available in all of our environments, we switched to the STL type. The FAString implementation worked for our initial use cases, but switching to the STL implementation opened up new possibilities, since std::string_view is supported by the other STL containers and comes with specializations we had not implemented and were unlikely to.

Sufficient test coverage for an application is critical for any rewrite and also for future enhancements. The test suite for Firehose, which now runs automatically on any pull request, merge to master, or version tag, has continued to grow. For any bugs that are found, we add tests to confirm the fix and prevent future regressions. Had we not spent the time to write tests against the old application, we would not have had much confidence in the new implementation and likely would have introduced both new and old bugs into the product.

The Takeaway

Deciding to rewrite an application, particularly one that is part of a company’s core infrastructure, is not a decision to take lightly. The odds of rewriting an application of any non-trivial complexity without introducing bugs or changing behavior in subtle ways are next to none. We ultimately made that decision in this case, but it was decided methodically and gradually. In many other instances, we have been able to simply refactor or replace components to achieve the desired goal. One key to ensuring such an endeavor is successful is providing adequate testing around your application’s interfaces. Whenever possible, use popular and stable open-source libraries or core language functionality instead of rolling your own solution. Those libraries are likely in use by many more users and have addressed issues you are unlikely to encounter right away in your own implementations.

All Firehose customers now use the C++ interface for the streaming API (although they are probably unaware of the change), and the product has grown substantially over the past three years. The ancillary work we did as part of the Firehose rewrite has helped us make robust and efficient changes in a multitude of other applications at FlightAware. The Firehose application now has Continuous Integration and Continuous Delivery pipelines for its build, test, and deployment. The application is containerized and runs in docker. Our work to improve Firehose continues today as we increase the SLAs available to customers and offer new features. The groundwork we laid shows how careful planning and testing allow us to make dramatic changes to our software while providing reliable service to our biggest customers.

Jonathan Cone