<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title><![CDATA[ Angle of Attack ]]></title>
<description><![CDATA[ a blog by the engineers @ FlightAware ]]></description>
<link>https://flightaware.engineering</link>
<image>
    <url>https://flightaware.engineering/favicon.png</url>
    <title>Angle of Attack</title>
    <link>https://flightaware.engineering</link>
</image>
<lastBuildDate>Thu, 02 Apr 2026 15:13:48 -0500</lastBuildDate>
<atom:link href="https://flightaware.engineering" rel="self" type="application/rss+xml"/>
<ttl>60</ttl>

    <item>
        <title><![CDATA[ Blast From the Past: Driving Reliability at FlightAware ]]></title>
        <description><![CDATA[ Nearly six years ago, I wrote a post outlining how FlightAware was embarking on a journey to implement Site Reliability Engineering. The post focused on the transformation of our incident response processes to bring structure to the response as well as implement effective postmortems. Today, I can happily state that ]]></description>
        <link>https://flightaware.engineering/blast-from-the-past-driving-reliability-at-flightaware/</link>
        <guid>https://flightaware.engineering/blast-from-the-past-driving-reliability-at-flightaware/</guid>
        <pubDate>Mon, 02 Mar 2026 13:09:08 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2026/03/Screenshot-2026-03-02-at-8.20.13---AM.png" medium="image"/>
<content:encoded><![CDATA[ <blockquote>Nearly six years ago, I wrote a post outlining how FlightAware was embarking on a journey to implement Site Reliability Engineering. The post focused on the transformation of our incident response processes to bring structure to the response as well as implement effective postmortems. Today, I can happily state that we continue leveraging these processes and they have served us well. While there are occasional hiccups and incidents, we have learned and improved as a result of these processes. We've been able to incorporate change into our design patterns, services, and infrastructure to bring higher resilience based on lessons learned. Incidents also have more structure, ensuring there is coordination and clear communication despite any pressure caused by service disruptions. In short, embracing this and other SRE practices has helped raise the bar and deliver better products to customers.<br><br>So now, let's look back on that post.</blockquote><p>~ Sean Kelly</p><hr><p>Recently at FlightAware, we began embarking down the path of converting our traditional IT Operations team and practices into Site Reliability Engineering. In this post, we'll take a quick look at what SRE is, why we chose to go in this direction, and how this journey has changed our incident response processes for the better.</p><h2 id="introduction"><strong>Introduction</strong></h2><p>First, we should cover the basics and make sure we’re clear on what Site Reliability Engineering (SRE) even is. It is a discipline where software engineering practices are used to solve what would otherwise be traditional Operations problems. For example, an SRE might automate the testing, deployment, validation, and rollback of a software deployment. In the extreme opposite case, an Operations engineer may instead log in to a server, run an installer by hand, and then call the developer if it breaks. 
This is an extreme example, but it is meant to illustrate a sort of mindset difference in problem solving.</p><p>SRE is still fairly new, so many organizations play fast and loose with what it means to them or which pieces they apply. At FlightAware, we’ve opted to model SRE using the Google SRE methodologies they write about in <a href="https://landing.google.com/sre/sre-book/toc/index.html?ref=flightaware.engineering">their book</a>, but within reason. We have fewer servers and less cloud, so some aspects can be harder to apply while others may just need some scaling down. Some of our practices were already aligned with the SRE approach, but officially adopting it has brought more intent and direction to our trajectory.</p><p>Incident response is another important piece of SRE. It concerns itself with both the incident itself and the postmortem process afterwards. Google writes about incident response in their book, and other organizations like <a href="https://response.pagerduty.com/?ref=flightaware.engineering">PagerDuty</a> and <a href="https://www.atlassian.com/incident-management?ref=flightaware.engineering">Atlassian</a> also discuss it at length. This increased structure around response to problems is the aspect of SRE we’ve embraced the most thus far.</p><h2 id="but-why"><strong>But Why?</strong></h2><p>As FlightAware has grown, the way customers use our services has also grown. We now have customers that use us as a key function of their operations. They want increasingly higher reliability since our service disruptions can have a significant impact on them. Our uptime has always been quite good, but we’ve never had processes or practices in place to hold us to that high standard as teams got larger. What was easy to do through institutional knowledge with a team of ten becomes a lot harder with 12+ teams of varying sizes and focus areas. 
You need a way to spread best practices within a growing Operations team as well as to keep pushing the bar higher.</p><p>When I joined FlightAware in 2012, I was hired on as the IT Operations person. While my past is very Ops-focused, I also have experience writing software and approaching problems from a programming perspective. This is true for many of the folks making up our IT Operations team. SRE is a developer-oriented approach to Operations problems, so we were well suited to pivot the existing team to this new approach. FlightAware is a software company, and this allowed us to embrace that existing talent in the team without having to re-hire or extensively re-train.</p><p>Given all of this and its increased prominence in the industry, SRE was an obvious choice for us.</p><h2 id="houston-we-have-an-incident"><strong>Houston, We Have an Incident</strong></h2><p>Before adopting SRE, we had already begun to work on improving our incident response processes. However, the methodologies laid out in various SRE texts gave us much more guidance and direction for making incidents better. Historically, in an outage, the on-call would get a robocall and dig in. If they got stuck, they’d escalate to colleagues. After it was all over, we often did brief write-ups on what actually broke. That was about the extent of it from start to finish. Much of this was fine when the company was small but started to falter over time, especially as both internal and external stakeholders wanted more of an understanding as to what broke and how we were working to mitigate it.</p><p>Before we go any further, I’d like to set some context for future examples in this post. You may know FlightAware as a website for tracking flights. While this is absolutely true, we also have an entire portfolio of other products and services. One of our products is called Firehose. 
Firehose streams JSON-encoded flight events to TCP-connected clients, allowing real-time ingestion of events into customers’ systems. Put more simply, Firehose sends messages like departures, arrivals, and flight positions to customers so they can integrate FlightAware into their products.</p><p>Throughout this post, I’ll be abusing Firehose by running it through all kinds of failure scenarios to illustrate how we handle incidents. Don’t worry, no Firehoses were harmed in the publishing of this post.</p><p>Let’s break Firehose now as a hypothetical example and see how our pre-SRE Systems Engineers would respond.</p><h3 id="firehose-is-on-fire"><strong>Firehose Is on Fire</strong></h3><p>It is 3:07 AM and Archer’s phone jolts him out of bed. He logs in to Slack and Zabbix to see what is going on. He sees that there are 60 Zabbix alarms being triggered, which seem to essentially boil down to Firehose not passing data to customers.</p><p>After confirming the situation, Archer runs the program for notifying customers. Notifications are generated and sent, so Archer continues trying to figure out what is broken.</p><p>He finds through a bit of investigation that none of the Firehose services are receiving messages from their upstream feed service. He logs into the upstream service and investigates. Eventually he tries restarting the service to no avail.</p><p>Archer is stuck. It appears Firehose isn’t actually broken, but rather an upstream service is. He knows Lana is on that team, that she knows a lot about the service, and that she always answers the phone. He gives her a call at 3:25 AM, waking her up for the third week in a row. He doesn’t know it, but she is actually on vacation. Despite that, she still answers and offers to help.</p><p>Lana jumps on, putting her vacation on pause, and they both continue to investigate. A few times, they discover that they are investigating the same aspect of the problem. Once, they both restart the same service within seconds of each other. 
They try to coordinate on Slack, but their primary focus is resolving the problem, so coordination isn’t firing on all cylinders.</p><p>Finally, between the two of them, they get things going again. Archer sighs in relief as he sends a notice to customers that service is restored. In the morning, he’ll write up exactly what broke so there is a record of what happened and something to work off of in the event customers ask questions. He’ll also need to reach out to Lana to capture what she did and work out which changes actually cleared the issue.</p><h3 id="how-we-have-changed"><strong>How We Have Changed</strong></h3><p>These days, we treat every failure as an incident. Full service outages and minor internal-only issues are all incidents. To prevent treating everything like a raging inferno, we classify based on severity and impact:</p><ul><li><strong>SEV-1</strong>: Critical issue that warrants public notification and liaison with executive teams. An example would be that all our Firehose instances are down across all data centers.</li><li><strong>SEV-2</strong>: Critical system issue actively impacting many customers’ ability to use the product. An example here could be that no new customer connections can be established to Firehose, but existing ones are still working. This example also has a high likelihood of escalating to a SEV-1 as the impact grows due to customer connection churn.</li><li><strong>SEV-3</strong>: Stability or minor customer-impacting issues that require immediate attention from service owners. This could be the total failure of Firehose at a single datacenter. Customers can still connect to Firehose at other locations and, in many cases, will already have an active redundant connection at another site.</li><li><strong>SEV-4</strong>: Minor issues requiring action, but not affecting customer ability to use the product. This could be a single failed Firehose instance. 
The customers can still connect to other ones.</li><li><strong>SEV-5</strong>: Issues or bugs not affecting customer ability to use the product. This could be the loss of a single disk in a Firehose server. It doesn’t impact the service, but we are at an increased risk of failure.</li></ul><p>Any SEV-1s and SEV-2s automatically require a trip through our postmortem process, but more on that later.</p><p>When the on-call gets summoned, they take on the responsibility for resolving the issue or escalating the incident response process to get more eyes on the problem. While working an incident, we also have several defined roles to keep responsibilities clear. By default, the on-call owns all roles until they delegate one or more out. The roles we have are as follows:</p><ul><li><strong>Incident Commander</strong>: Coordinates the incident response. Ensures things move forward, arranges for any needed resources to be obtained, and otherwise keeps things unblocked.</li><li><strong>Communications</strong>: Provides updates to customers and internal stakeholders.</li><li><strong>Scribe</strong>: Documents the incident, ensuring we have a timeline and records any changes.</li><li><strong>Worker</strong>: Researches and responds to the incident. These are the ones digging through logs, running tests, and trying to figure out why Firehose has raised an incident.</li></ul><p>In our scenario above, Archer was the incident commander, communications, scribe, and worker. When Lana got on the scene, she also joined in a worker role.</p><p>A problem you may have noticed was that Archer just reached out to somebody he knew and trusted on another team for help. Given 100 issues of this nature, it is likely that Archer would reach out to Lana 100 times, making her a permanent and ad hoc on-call.</p><p>To solve this problem, we traded in our homemade on-call software implementation in favor of PagerDuty. 
PagerDuty made it a breeze to implement on-call rotations for key teams throughout Engineering that may need to be called on for their expertise. So now, Lana knows when she can expect a call, and she’s not alone in the rotation.</p><p>So, with just these changes, let’s take a look at our incident again.</p><h3 id="firehose-is-on-fire-again-from-the-top"><strong>Firehose Is on Fire: Again From the Top</strong></h3><p>It is 3:07 AM and Archer’s phone jolts him out of bed. He logs in to Slack and PagerDuty to see what is going on. He sees that there is a Firehose incident rolling up from 60 Zabbix alarms. They seem to essentially boil down to Firehose not passing data to customers. But the PagerDuty incident actually indicates the issue isn’t Firehose, but the upstream feed generator. Archer is Incident Commander, along with all other roles. He marks the incident as a SEV-1 since all Firehose customers are impacted.</p><p>Because customers are impacted, Archer uses the same procedure as before to notify customers of the incident. In the new parlance, Archer is acting in the Communications role while doing this. Because this is a SEV-1, he also escalates a notification to leadership using the Response Play feature in PagerDuty.</p><p>Like before, Archer spends a few minutes looking into the root cause of the problem, using the PagerDuty data as a guide. Once he realizes he’s got a big problem, he uses another Response Play to summon another SRE. Woodhouse shows up on the scene and takes on the Worker role.</p><p>In his Incident Commander (IC) role, Archer keeps tabs on what Woodhouse is doing. He’s also still wearing the Communications hat, so he keeps customers and internal stakeholders updated on the status. He may also seek further backup so somebody can take the Scribe role, documenting what is happening in real time. When Woodhouse tells Archer that the problem is upstream and they need help, Archer escalates to the right team to get a developer involved. 
Before, he would have just dialed Lana since she is very responsive. However, now there is an on-call rotation that is leveraged instead. That is where Pam comes in, pulled in programmatically as the active member in that team’s rotation.</p><p>Pam and Woodhouse collaborate on the problem, keeping Archer in the loop on what is being done. Archer is responsible for ensuring they are making progress and not doing conflicting or duplicated work. Unlike before, Archer ensures they aren’t both restarting the same services.</p><p>After Pam and Woodhouse resolve the problem, Archer again uses his Communications powers to notify customers and internal stakeholders that they are out of the danger zone. Finally, as the IC, he creates a Jira to launch the postmortem process in the morning and includes the timeline from the Scribe.</p><h3 id="incidents-are-better"><strong>Incidents Are Better</strong></h3><p>This process provides a lot more structure and clear delineation on who is responsible for what during an incident. It also has the potential to yield far better documentation of the incident which will help with the postmortem and improve communication to customers. By expanding the on-call pool, we’ve also made it possible to programmatically escalate to the right people, making it easier to work an incident while also signaling for help. No time is spent war dialing from a company directory; just press the button and let the cloud robots do the work in the background. Since there is an IC role tracking progress, this also helps eliminate the problem where an engineer digs into a problem to the detriment of communications or identifying other paths to resolution.</p><h2 id="postmortems"><strong>Postmortems</strong></h2><p>The second most important part of incident response is the postmortem process. (If you didn't guess, the first is fixing the problem.) 
The postmortem process is where we not only diagnose what happened, but also discuss:</p><ul><li>How can we reasonably prevent this type of problem from happening again?<ul><li>Any immediate or urgent actions?</li><li>Any action items that should get rolled into an upcoming release?</li><li>Any longer-term action items that require planning, research, or a prioritized project?</li></ul></li><li>What went well in the incident response process?</li><li>What didn't go well in the response process?</li><li>What should we do differently next time?</li><li>Did we learn anything?</li></ul><p>The postmortem isn’t just about identifying what broke and what was done to fix it. It is an opportunity to reflect, improve, and iterate. Sometimes there aren’t any clear action items. Sometimes bug fixes may need to be done in software releases. Occasionally, we find that something we’ve been doing for a long time is not desirable anymore and we need to make changes. Nothing is sacred, and everything is on the table.</p><p>FlightAware and I have two requirements for postmortems:</p><ul><li>They are blameless. Nobody is at fault for the incident. Humans make mistakes, so you need to accept that it will happen and design systems that are reasonably tolerant of human error. Nobody is put on the spot or deemed to be the cause of an incident. The time I accidentally shut down the wrong PostgreSQL server and caused an outage wasn’t an opportunity to admonish me but rather to figure out how to make that harder to do. And yes, I did this.</li><li>Postmortems are not punitive. This dovetails with the point above, but is also a harder one to land. When somebody is on the hook to write up a document outlining what happened, it has the potential to feel like a homework assignment. The right atmosphere and communications have to be put in place to instill the value of the postmortem. The opportunity to write one is actually a powerful voice for recommending improvements. 
It is not, and never should be, a punishment.</li></ul><p>Fortunately, the blameless approach has always been a core value of FlightAware engineering. It is a value I communicate as much as possible.</p><p>Our postmortem process is still evolving, but currently comprises the following phases:</p><ul><li><strong>Initial writeup</strong>: The incident commander gathers all of the facts and data about the incident. This includes the timeline, cause of the incident, what changes were made to resolve it, start and end time, who was involved, etc. We aim to have this completed and peer reviewed within two business days. If there are action items identified that need to be addressed immediately, they will also be generated and prioritized.</li><li><strong>Deep dive</strong>: Now that the facts are understood and agreed upon, the SRE team does more of an introspective pass. A collaborative discussion is carried out to identify any process weaknesses, patterns, and greater or underlying issues that may need to be addressed. Some medium- and long-term action items may also be captured.</li><li><strong>Review</strong>: Every two weeks, we have a scheduled incident response meeting attended by the Vice President of Engineering, all Engineering group leads, and our SRE lead. Everyone who participated in an incident response since the last meeting is invited for a discussion and review of the incident. This goes beyond Site Reliability Engineering and has participation from developers as well. By this time, much of the incident is ironed out, understood, and has associated action items. This is a last pass for questions, concerns, feedback, and a general way to keep tabs on how we’re doing with incident response.</li></ul><p>Through this process, we endeavor to make our software and services more reliable and resilient. 
By understanding what leads to a problem and going beyond a basic root cause analysis, we ensure that we are always moving in the right direction and guarding against future problems. This process can also help feed external customer communication, as our write-up includes a description of the customer impact.</p><h2 id="wrapping-up"><strong>Wrapping Up</strong></h2><p>Implementing new incident response and postmortem processes at FlightAware has helped provide much more rigor and structure to handling outages. We’ve gone from an ad hoc process to having defined roles and responsibilities. We’ve implemented postmortem processes to capture what broke but, more importantly, to identify the actions needed to guard against future failures.</p><blockquote>Outages are unavoidable, but there is always room for improvement.</blockquote><p>By implementing everything outlined here, we continually iterate on minimizing impact, maximizing lessons learned, and improving software. So far, we’ve seen response times shorten (especially for escalations), visibility for stakeholders increase, and much more cross-team collaboration and ownership during and after postmortems. The impact is clear and positive, and I’m looking forward to how we can continually push the bar higher over time.<br></p>
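<p>As a recap, the severity ladder and the "SEV-1s and SEV-2s always get a postmortem" rule described earlier can be condensed into a small classifier. This is an illustrative sketch only; the type, field names, and conditions are hypothetical stand-ins, not FlightAware's actual tooling:</p>

```python
# Hypothetical sketch of the SEV-1 through SEV-5 ladder described in this post.
# The Incident fields are illustrative, not FlightAware's real data model.
from dataclasses import dataclass


@dataclass
class Incident:
    all_sites_down: bool    # e.g. every Firehose instance down in every data center
    customer_impact: bool   # customers actively unable to use the product
    single_site_down: bool  # total failure at one site, others still serving
    redundancy_lost: bool   # e.g. one failed instance or disk; customers unaffected


def classify(incident: Incident) -> int:
    """Return the severity tier, where 1 is the most severe."""
    if incident.all_sites_down:
        return 1  # SEV-1: public notification and executive liaison
    if incident.customer_impact:
        return 2  # SEV-2: many customers impacted; may escalate to SEV-1
    if incident.single_site_down:
        return 3  # SEV-3: customers can still connect at other locations
    if incident.redundancy_lost:
        return 4  # SEV-4: action required, but customers can still use the product
    return 5      # SEV-5: bug or elevated risk only, e.g. a single lost disk


def needs_postmortem(severity: int) -> bool:
    """SEV-1s and SEV-2s automatically go through the postmortem process."""
    return severity <= 2
```

<p>The ordering matters: the first matching (most severe) condition wins, mirroring how an incident is classified by its worst current impact.</p>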
        <br>
        <p>
            <a href="https://flightaware.engineering/blast-from-the-past-driving-reliability-at-flightaware/">Blast From the Past: Driving Reliability at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Flight Blocking ]]></title>
        <description><![CDATA[ This post will give some background about flight blocking, an explanation of how our users view their own blocked flights and share them with others, and some insight into how these rules translate into test scenarios for FlightAware QA. ]]></description>
        <link>https://flightaware.engineering/flight-blocking/</link>
        <guid>https://flightaware.engineering/flight-blocking/</guid>
        <pubDate>Mon, 02 Feb 2026 11:58:45 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2026/02/shams-alam-cUkjJQHmius-unsplash-1.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>FlightAware shares a lot of flight data – you can search on our website and see information about flights that are taking place almost anywhere in the world, stretching back for years. Sometimes, though, we block flights from public view, often at the request of their owners. Ensuring that this data remains private is one of the most important things to test at FlightAware. This post will give some background about flight blocking, an explanation of how our users view their own blocked flights and share them with others, and some insight into how these rules translate into test scenarios for FlightAware QA.</p><h2 id="what-is-flight-blocking"><strong>What is flight blocking?</strong></h2><p>When a flight is blocked, that means that it is hidden from most users. It won’t show on maps or in lists of flights (such as the arriving or departing aircraft at an airport, or a list of flights of a particular aircraft type), and if someone that doesn’t have access to the flight tries to open its flight page directly, they’ll get an error. However, if you are the owner and have registered the aircraft with FlightAware, then you’ll be able to see its flights just like any other, both on the flight page and everywhere else on the website.</p><p>By enrolling their aircraft with the FAA’s Limiting Aircraft Data Displayed (LADD) program, US aircraft owners can request that their aircraft’s flight data be filtered from public view but still available to vendors (like FlightAware) that use their data feed. In that case, FlightAware will have the data, but it will be universally blocked until the owner claims it by signing up for our Global subscription and confirming their ownership. At that point, they will be able to view their flights, as well as grant limited access to other users. 
Similarly, pilots in places like Europe where non-commercial aircraft are blocked from public tracking by default can use our Global subscription to view their flight data on our website.</p><p>In addition to being used to claim and get access to aircraft that are already blocked, our Global subscription also allows any owned aircraft to be blocked at the owner’s discretion, separately from anything like LADD.</p><h2 id="testing-flight-blocking"><strong>Testing Flight Blocking</strong></h2><p>To test flight blocking functionality, you need to consider various scenarios on pages throughout our applications. Some of the most common ones have already been listed above – blocked flights shouldn’t show (for most users) on maps or in lists of flights, and if someone that doesn’t have access to the flight tries to open its flight page directly, they should get an error. Whenever a change goes through that impacts any list of flights or the flight page, it’s important that these scenarios be tested, both for the flights being properly blocked for most users and properly visible to their owners. However, there are several ways global subscribers can allow other users to view their flights, which in turn leads to further test scenarios.</p><h2 id="sharing-blocked-flights"><strong>Sharing Blocked Flights</strong></h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2026/02/Picture1.png" class="kg-image" alt="" loading="lazy" width="794" height="586" srcset="https://flightaware.engineering/content/images/size/w600/2026/02/Picture1.png 600w, https://flightaware.engineering/content/images/2026/02/Picture1.png 794w" sizes="(min-width: 720px) 720px"></figure><p>The most direct way a global user can share a flight with others is by adding them to their global account, either as an admin or as a regular user. 
Admins can interact with the global account with many of the same options as the aircraft owner, while regular users can see the otherwise-blocked flights on the account but cannot edit the account itself. Additionally, users may be grouped with aircraft to control which flights are unblocked for them or be given universal access to view all flights on the global account. Each of these scenarios must be considered for testing if relevant changes are going through.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2026/02/Account-Configuration-screenshot.png" class="kg-image" alt="" loading="lazy" width="742" height="648" srcset="https://flightaware.engineering/content/images/size/w600/2026/02/Account-Configuration-screenshot.png 600w, https://flightaware.engineering/content/images/2026/02/Account-Configuration-screenshot.png 742w" sizes="(min-width: 720px) 720px"></figure><p>Another way that global users can share a flight with a subset of users (while still blocking it from the general public, as seen above) is by allowing Fixed Base Operator (FBO) staff to view the flights. This functionality is a little more nuanced, as it allows FBO staff at the airports of their blocked flights to view those specific flights. FBO staff can view these flights through the airport page, where these flights are marked in bold to indicate that they are normally blocked. From there they can link to the flight page, but when viewing the flight, any past flights the aircraft has taken that don’t involve the airport the FBO user is associated with will not be visible in the flight history section of the page. 
When testing changes to the airport page, flight page, or to FBO user functionality, it’s important that this selective unblocking remains intact, revealing (only) specific flights to the FBO users at that airport.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2026/02/Picture3.png" class="kg-image" alt="" loading="lazy" width="730" height="336" srcset="https://flightaware.engineering/content/images/size/w600/2026/02/Picture3.png 600w, https://flightaware.engineering/content/images/2026/02/Picture3.png 730w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2026/02/Picture4.png" class="kg-image" alt="" loading="lazy" width="716" height="412" srcset="https://flightaware.engineering/content/images/size/w600/2026/02/Picture4.png 600w, https://flightaware.engineering/content/images/2026/02/Picture4.png 716w"></figure><p>One other way that a global user can share a blocked flight with another user is to use the “Share Flight” option on the flight page. When you share a flight using email, it will include a link to the flight page that allows anyone to view the flight, no matter its block status.&nbsp;</p><p>There are a few specifics of this view of the flight page to be considered when testing. First, the flight history section of the flight page isn’t visible in this view, so that only this one flight’s information is shared. The amount of information visible via the link will also depend on the user viewing it. If the person you’re sharing it with doesn’t have a FlightAware account, they will only see the basic information about the flight that they would see on any publicly visible flight page when not logged in and will have restricted access to information like flight layers and runway information. 
Whenever changes to the flight page are made, this view needs to be tested to ensure that the basic functionality remains intact, and that the various rules restricting it remain in place.</p><h2 id="conditional-and-partial-blocking"><strong>Conditional and Partial Blocking</strong></h2><p>There are also a few situations where flights from an aircraft that is blocked in some situations may be visible or partially visible in others, which result in corner cases that need to be tested as well. These situations primarily have to do with the time-sensitive nature of LADD, or the different idents a single flight might be associated with.</p><h2 id="time-sensitive-blocking"><strong>Time-Sensitive Blocking</strong></h2><p>One difference between Global-based blocking and LADD-based blocking is that LADD blocking is date-sensitive, while Global blocking is not. This means that when an aircraft is blocked via LADD, only flights that line up with the dates that it was registered with LADD will be blocked; if it had flights before it was registered, or if it later leaves LADD, those flights will not be blocked, but the ones flown while it was on LADD will always remain blocked. If an aircraft is blocked via Global, all present and future flights for that aircraft will be blocked; if that Global blocked status is removed, then barring any other reason it might be blocked, all past and future flights will again become unblocked.</p><p>Testing this functionality involves adding and removing flights from our test LADD database or global accounts (or both) and ensuring that the flights end up blocked or unblocked as appropriate for general users. 
While this logic isn’t frequently updated, any changes to it would require regression testing to ensure the functionality remains intact.</p><h2 id="flight-idents-vs-aircraft-registration"><strong>Flight Idents vs Aircraft Registration</strong></h2><p>In addition, there may be situations where an aircraft registration is blocked but a flight ident is not, as the same flight ident may utilize different aircraft on different days. In this case, a user may view a flight itself via the flight ident page, but it will be in a partially blocked state – anything about the aircraft registration will not be visible on the flight page.&nbsp;</p><p>The test cases here involve ensuring that the different combinations of variables all work together properly – blocked vs unblocked registrations and idents, users that should or should not be able to see the blocked information, and whether the page being viewed corresponds to the registration or ident. The aircraft owner should have full visibility on both pages, while other users should have restricted visibility on the unblocked page and get an error if they attempt to open the blocked page directly.</p><h2 id="conclusion"><strong>Conclusion</strong></h2><p>At FlightAware, we ensure that blocked flights are properly restricted from public view, and we allow the owners of these aircraft to selectively unblock their flights as needed, be it to their associates, FBO users, or other individuals. Identifying and testing the intersections of different flights, different pages, different users, and different ways of blocking or unblocking flights to ensure that they all work as expected is one of the most important tasks of FlightAware QA.</p> 
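The date-sensitivity rules described above boil down to a small decision over block sources and dates. As a rough sketch of the logic under test (the types and function names here are illustrative assumptions for this post, not FlightAware’s actual implementation):

```typescript
// Illustrative model of the blocking rules described above -- the shapes and
// names are assumptions for demonstration, not FlightAware's production code.
interface Aircraft {
  globallyBlocked: boolean; // Global-based block: applies regardless of date
  // LADD registration windows; end === null means the aircraft is still enrolled.
  laddPeriods: { start: Date; end: Date | null }[];
}

// A flight is hidden from general users if its aircraft is Global-blocked,
// or if the flight's date falls inside any LADD registration window.
function isBlockedForGeneralUsers(aircraft: Aircraft, flightDate: Date): boolean {
  if (aircraft.globallyBlocked) return true; // all past and future flights
  return aircraft.laddPeriods.some(
    (p) => flightDate >= p.start && (p.end === null || flightDate <= p.end)
  );
}
```

Test cases for this predicate mirror the scenarios above: a flight during a LADD window stays blocked even after the aircraft leaves LADD, while flights outside every window and free of a Global block remain visible.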
        <br>
        <p>
            <a href="https://flightaware.engineering/flight-blocking/">Flight Blocking</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Protecting Applications with OpenID Connect ]]></title>
        <description><![CDATA[ In October 2024, we released AuthNxt—our next-generation authentication platform designed to integrate seamlessly with both our legacy TCL monolith and our modern microservices. ]]></description>
        <link>https://flightaware.engineering/protecting-applications-with-openid-connect/</link>
        <guid>https://flightaware.engineering/protecting-applications-with-openid-connect/</guid>
        <pubDate>Mon, 05 Jan 2026 14:06:21 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/12/risto-kokkonen-P1W89cvdx4Q-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>In October 2024, we released AuthNxt—our next-generation authentication platform designed to integrate seamlessly with both our legacy TCL monolith and our modern microservices. This move was the product of months of planning, designing, building, testing, and integrating to ensure that all our services can identify FlightAware users and ensure account security across the stack. To date, our Go-based backend serves millions of requests per day, powering the flightaware.com website, our mobile apps, and countless microservices deployed in our datacenters.</p><p>If you’re curious about the engineering behind AuthNxt, check out our&nbsp;<a href="https://flightaware.engineering/overhauling-authentication-at-flightaware/">blog post</a>&nbsp;about its development lifecycle and the design trade-offs that went into releasing this core service.</p><h2 id="authnxt%E2%80%99s-infrastructure">AuthNxt’s Infrastructure</h2><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/12/Picture1.png" class="kg-image" alt="" loading="lazy" width="1106" height="656" srcset="https://flightaware.engineering/content/images/size/w600/2025/12/Picture1.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/12/Picture1.png 1000w, https://flightaware.engineering/content/images/2025/12/Picture1.png 1106w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A diagram demonstrating how FlightAware services interact with AuthNxt.</em></i></figcaption></figure><p>At the core of our authentication service is a backend microservice written in Go. This backend manages all the logic for user authentication, including session management, JSON Web Token (JWT) generation, anonymous access for non-logged-in users, and third-party sign-in support. 
In addition to the backend, we wrote a library for our modern Next.js applications and a compatibility layer in our TCL monolith for our legacy applications. Together, this provides seamless authentication across the entire flightaware.com domain via secure session and JWT browser cookies.</p><h2 id="the-problem-%E2%80%93-globalbeacon-authentication">The Problem – GlobalBeacon Authentication</h2><p>While our new authentication infrastructure has been considered a resounding success, it did not fully cover all our product needs. Namely, there was one major drawback to our existing architecture: it only provides sessions and JWTs to flightaware.com domains (things like&nbsp;<a href="http://www.flightaware.com/?ref=flightaware.engineering">www.flightaware.com</a>, login.flightaware.com, etc.). This covers the majority of our services, but there is one critical service at FlightAware that was not covered by AuthNxt:&nbsp;<a href="https://www.flightaware.com/commercial/globalbeacon/?ref=flightaware.engineering">GlobalBeacon</a>.</p><p>GlobalBeacon was jointly developed by Aireon and FlightAware and is designed to be a first-of-its-kind solution for ICAO’s Global Aeronautical Distress and Safety System (GADSS). 
This is a critical service for many FlightAware customers, providing real-time alerting and fleet monitoring on a global scale.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/12/Picture1-1.png" class="kg-image" alt="" loading="lazy" width="1430" height="491" srcset="https://flightaware.engineering/content/images/size/w600/2025/12/Picture1-1.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/12/Picture1-1.png 1000w, https://flightaware.engineering/content/images/2025/12/Picture1-1.png 1430w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A diagram showing the challenge of passing sessions to GlobalBeacon.</em></i></figcaption></figure><p>Much like our web monolith, GlobalBeacon is written in TCL and has its own bespoke authentication system. Furthermore, GlobalBeacon exists on a separate web domain: globalbeacon.aero. Thus, it cannot share its session cookies with our flightaware.com domains and cannot support AuthNxt’s session cookies.&nbsp;</p><p>This presented a unique problem: we wanted GlobalBeacon to follow the same authentication best practices as AuthNxt, without weakening our session cookies to work across web domains. We originally considered building out a separate AuthNxt deployment specially designed for GlobalBeacon, but this came with several drawbacks—maintenance overhead, deployment headaches, compatibility, etc. 
We would much prefer a solution that integrates with the already-deployed AuthNxt service.&nbsp;</p><p>Fortunately—as with many engineering problems—there’s a standard that solves exactly this problem.&nbsp;</p><h2 id="implementing-openid-connect">Implementing OpenID Connect</h2><p><a href="https://openid.net/developers/how-connect-works/?ref=flightaware.engineering">OpenID Connect</a> (OIDC) is an “interoperable authentication protocol based on the OAuth 2.0 framework.” It is designed to allow services (known as Relying Parties) to use a third-party provider (known as an OpenID Provider or Identity Provider) to verify a user’s identity using its own authentication and authorization framework. This offloads some of the heavy lifting of managing user credentials and identities onto a provider with a trusted authorization server.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/12/Picture1-2.png" class="kg-image" alt="" loading="lazy" width="886" height="825" srcset="https://flightaware.engineering/content/images/size/w600/2025/12/Picture1-2.png 600w, https://flightaware.engineering/content/images/2025/12/Picture1-2.png 886w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A view of the AuthNxt sign-in page, with the Google and Apple options highlighted.</em></i></figcaption></figure><p>We already use OIDC within AuthNxt. Have you ever signed in to an app using your Google or Apple account? If so, then you’ve already used OIDC! We support Google and Apple sign-in in AuthNxt. In this case, FlightAware acts as the relying party (RP), while Apple and Google act as OpenID Providers (OPs). We retrieve limited user information from Apple or Google’s servers and can associate the user’s email address with their FlightAware account. 
This offloads the authentication logic to these OP servers, which we trust to be secure from potential attackers. This is especially convenient on mobile devices, where Google and Apple sign-ins are well-integrated into Android and iOS, respectively.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/12/Picture1-3.png" class="kg-image" alt="" loading="lazy" width="936" height="584" srcset="https://flightaware.engineering/content/images/size/w600/2025/12/Picture1-3.png 600w, https://flightaware.engineering/content/images/2025/12/Picture1-3.png 936w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A diagram showing how the OIDC flow works. From&nbsp;</em></i><a href="https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow?ref=flightaware.engineering"><i><em class="italic" style="white-space: pre-wrap;">auth0</em></i></a><i><em class="italic" style="white-space: pre-wrap;">.</em></i></figcaption></figure><p>From here, the solution seems simple: add functionality to AuthNxt to allow it to act as an OpenID Provider (OP), with GlobalBeacon acting as the relying party (RP) to verify a user’s FlightAware account information. This would enable GlobalBeacon to utilize AuthNxt’s security and login flows while still issuing and managing its own sessions within the globalbeacon.aero domain.</p><h2 id="challenges">Challenges</h2><p>The single largest challenge with implementing OIDC is complexity. The&nbsp;<a href="https://openid.net/specs/openid-connect-core-1_0.html?ref=flightaware.engineering">OIDC specification</a>&nbsp;is dense and has many requirements; it’s also based on the&nbsp;<a href="https://datatracker.ietf.org/doc/html/rfc6749?ref=flightaware.engineering">OAuth 2.0 Authorization Framework (RFC 6749)</a>, which itself has complex implementation considerations. 
To make integration as seamless as possible, reduce error rates, and provide a potential avenue to use AuthNxt for other non-flightaware.com applications, we wanted to adhere to these specs as closely as possible.</p><p>Fortunately, the fact that OIDC is a written specification means that the requirements are extremely clear; by adhering to the spec, we can utilize pre-written OIDC/OAuth 2.0 libraries to integrate with AuthNxt. There’s no need to write custom OIDC client code for every RP. To ensure that our implementation was correct, we created a small web app for testing OIDC implementations.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/12/Picture1-4.png" class="kg-image" alt="" loading="lazy" width="1106" height="452" srcset="https://flightaware.engineering/content/images/size/w600/2025/12/Picture1-4.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/12/Picture1-4.png 1000w, https://flightaware.engineering/content/images/2025/12/Picture1-4.png 1106w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A screenshot of our simple OIDC testing tool.</em></i></figcaption></figure><p>This app is simple: once registered as an RP, it can make authentication requests to AuthNxt. Upon successful sign-in, we are redirected back to the tester app, which shows the user information that has been returned from AuthNxt. This information is signed by AuthNxt and, thus, can be verified for integrity using AuthNxt’s public key. 
This proves that the user accessing the app is a real, valid FlightAware user.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/12/Picture1-5.png" class="kg-image" alt="" loading="lazy" width="1106" height="601" srcset="https://flightaware.engineering/content/images/size/w600/2025/12/Picture1-5.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/12/Picture1-5.png 1000w, https://flightaware.engineering/content/images/2025/12/Picture1-5.png 1106w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A screenshot of the results page of our OIDC testing tool.</em></i></figcaption></figure><p>In our OIDC tester, we simply display the response from AuthNxt to verify its contents. In a real-world application like GlobalBeacon, you would then need to issue a session for that user for use within your application. As part of this project, GlobalBeacon updated its authentication code for this exact purpose; it still issues its already-established session cookies, but they are now backed by AuthNxt instead of its legacy authentication system.&nbsp;</p><h2 id="impact-and-next-steps">Impact and Next Steps</h2><p>GlobalBeacon now has a modern, secure authentication solution that both provides security for our users and maintains parity with the legacy authentication solution. This allows users to sign in to GlobalBeacon with their FlightAware accounts seamlessly, using the same login UI that AuthNxt implements natively. In fact, users already signed in to their FlightAware account don’t even need to sign in again! 
Simply authorize the integration and you will be signed in to GlobalBeacon automatically.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/12/Picture1-6.png" class="kg-image" alt="" loading="lazy" width="886" height="825" srcset="https://flightaware.engineering/content/images/size/w600/2025/12/Picture1-6.png 600w, https://flightaware.engineering/content/images/2025/12/Picture1-6.png 886w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A screenshot of the authorization page as part of the OIDC flow</em></i><span style="white-space: pre-wrap;">.</span></figcaption></figure><p>In the long term, OIDC is an investment in our authentication infrastructure across the stack. In addition to securing the flightaware.com domain, we can also secure applications on other web domains using the same security best practices. If we want to add another application as an OIDC client, we simply need to register it with AuthNxt and add an OIDC client library, and it will work out of the box.</p><h2 id="closing-thoughts">Closing Thoughts</h2><p>Authentication is a core service at FlightAware. That core was strengthened through the implementation of OIDC in AuthNxt, expanding its capabilities while simultaneously providing a solid authentication solution for GlobalBeacon.</p><p>Personally, this was a fantastic learning opportunity and a great way to deepen my knowledge of authentication and security best practices. I am most excited to see how else AuthNxt will grow its capabilities in the future as we continuously adapt to changing security standards in the modern web world.</p> 
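To make the flow above a bit more concrete, here is a minimal sketch of how a relying party starts the OIDC authorization code flow: it redirects the browser to the provider’s authorization endpoint with a PKCE challenge. The issuer URL, `/authorize` path, and client ID below are hypothetical placeholders, and this illustrates the standard OIDC/PKCE mechanics rather than our production code:

```typescript
import { createHash, randomBytes } from "node:crypto";

// Sketch of the first leg of the OIDC authorization code flow from the
// relying party's side. Endpoint path and client_id are hypothetical.
function buildAuthorizationRequest(issuer: string, clientId: string, redirectUri: string) {
  // PKCE: a one-time verifier and its S256 challenge bind the later
  // token exchange to this specific browser redirect.
  const verifier = randomBytes(32).toString("base64url");
  const challenge = createHash("sha256").update(verifier).digest("base64url");

  const url = new URL(`${issuer}/authorize`);
  url.searchParams.set("response_type", "code"); // authorization code flow
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("scope", "openid email"); // "openid" scope makes this OIDC
  url.searchParams.set("state", randomBytes(16).toString("base64url")); // CSRF protection
  url.searchParams.set("code_challenge", challenge);
  url.searchParams.set("code_challenge_method", "S256");

  // The verifier must be kept by the RP and sent with the later token request.
  return { url: url.toString(), verifier };
}
```

After the user signs in at the provider, it redirects back with a one-time code, which the RP exchanges (along with the saved verifier) for signed ID and access tokens.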
        <br>
        <p>
            <a href="https://flightaware.engineering/protecting-applications-with-openid-connect/">Protecting Applications with OpenID Connect</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Monorepo Phase 2: Tooling, Workflow, and Lessons from the Web Wing at FlightAware ]]></title>
        <description><![CDATA[ When we first shared our move to a monorepo, our goal was to simplify dependency management, align tooling, and ship faster. This post looks at how we’ve evolved a year later! ]]></description>
        <link>https://flightaware.engineering/monorepo-phase-2-tooling-workflow-and-lessons-from-the-web-wing-at-flightaware/</link>
        <guid>https://flightaware.engineering/monorepo-phase-2-tooling-workflow-and-lessons-from-the-web-wing-at-flightaware/</guid>
        <pubDate>Tue, 02 Dec 2025 08:57:09 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/11/andres-dallimonti-ypsFFH-XRv0-unsplash-1.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>When we first shared our move to a monorepo, our goal was to simplify dependency management, align tooling, and ship faster. A year later, that foundation has grown into the standard way the Web Wing builds. We’ve added more apps, introduced custom Nx tooling, and learned how to keep a single shared codebase healthy as it scales. This post looks at how we’ve evolved that setup, what tooling decisions have scaled with us, and what we’ve learned along the way.</p><h3 id="the-web-wing-ecosystem">The Web Wing Ecosystem</h3><p>The Web Wing spans multiple Crews and owns nearly every public-facing surface of <a href="http://flightaware.com/?ref=flightaware.engineering">FlightAware - Flight Tracker / Flight Status</a>. All of our new web applications are built with Next.js. These apps are built, tested, and deployed from a single repository. Today, the monorepo has 8 applications and 9 shared libraries. The monorepo gives every Crew access to the same libraries and build tools, which makes it easy to share UI patterns and behavior without duplicating work.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/11/image-20251023-205049.png" class="kg-image" alt="" loading="lazy" width="1958" height="726" srcset="https://flightaware.engineering/content/images/size/w600/2025/11/image-20251023-205049.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/11/image-20251023-205049.png 1000w, https://flightaware.engineering/content/images/size/w1600/2025/11/image-20251023-205049.png 1600w, https://flightaware.engineering/content/images/2025/11/image-20251023-205049.png 1958w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">If you are wondering what Wings and Crews are at FlightAware, check out </em></i><a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/" 
title="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/"><i><em class="italic" style="white-space: pre-wrap;">this blog post</em></i></a><i><em class="italic" style="white-space: pre-wrap;">.</em></i></figcaption></figure><h2 id="tooling-and-structure">Tooling and Structure</h2><p>We use Nx to manage the workspace. The repo follows the standard layout of <code>apps/</code>, <code>libs/</code>, and <code>deploy/</code>. Nx’s dependency graph and “affected” detection allow us to deploy only what’s changed.</p><p>All apps share a single <code>package.json</code>. That choice has trade-offs, but it ensures every app stays on the same framework and library versions. When we upgrade Next.js or React, every app upgrades together. It takes more coordination but removes the long-term drift that tends to appear in multi-repo setups.</p><p>Over time we’ve added <strong>custom Nx generators</strong> that scaffold deployable apps, including CI configuration. These are used both for production work and internal events like the FlightAware Hackathon, where teams were able to spin up deployable apps in minutes.</p><h2 id="workflow-and-deployment">Workflow and Deployment</h2><p>Engineers usually run a single app locally with <code>nx serve</code>, though Tilt can be used to run multiple apps together when needed. Local setup takes about five minutes—the time to install dependencies and start the dev server.</p><p>All CI and deployments run through GitHub Actions. We use Nx affected detection to build and test only changed projects. Builds complete in under ten minutes, so we haven’t needed remote caching or parallelization yet.</p><p>Once a change has been committed, a PR build is generated and our test suite runs. A CODEOWNERS file is used to ensure that the appropriate crew/person is tagged for review of the PR.</p><p>End-to-end tests written with Playwright are integrated into the Nx graph and run as part of CI. 
(See our <a href="https://flightaware.engineering/the-playwright-advantage-strategies-for-effective-test-automation/">Playwright Blog Post</a> for details.) ESLint handles linting across all apps.</p><p>Deployments use a single pipeline. All affected apps are deployed together. Production releases happen by updating a branch, watched by Flux, that corresponds to the environment we are deploying to. Flux picks up the new versions and ships the containers to Kubernetes. Rolling back is as simple as updating the app version in Git. Thanks to Nx’s affected detection, we only deploy applications that have actually changed since our last deployment.</p><h2 id="challenges-and-lessons">Challenges and Lessons</h2><p>The biggest challenge of a shared <code>package.json</code> is coordination. Updating a library that’s used by several apps requires more communication. We’ve mitigated that by updating frequently so each change is small and easier to reason about.</p><p>Complex upgrades, especially large Next.js releases, touch every app at once. Our automation and PR deploys help validate those quickly. Running Playwright tests across all apps gives us confidence that the upgrade didn’t break anything unexpected.</p><p>Nx’s affected detection has kept build times predictable even as the repo grew. We’re still using sequential builds, but as the number of apps continues to grow we’ll revisit parallelization.</p><p>The most important lesson wasn’t technical. Frequent communication and shared ownership matter as much as tooling. A single repo works only if every engineer feels responsible for keeping it healthy.</p><h2 id="developer-experience">Developer Experience</h2><p>New engineers start by contributing to existing apps. The process is simple: clone the repo, install dependencies, run <code>nx serve</code>, and open the app in a browser.</p><p>Consistency comes from shared ESLint and test configurations. 
The monorepo README doubles as a changelog for major updates so everyone stays in sync.</p><p>Setup is lightweight and feels like any other modern Node-based project.</p><h2 id="coordination-across-crews">Coordination Across Crews</h2><p>All engineers in the Web Wing participate in <code>#monorepo-collab</code>, a Slack channel dedicated to coordination and announcements. Codeowners help manage boundaries between apps and shared libraries. Broader topics or structural changes are discussed in our monthly Web Alliance meeting.</p><p>Cross-Crew collaboration happens naturally now. Components built for Checkout might end up in the Admin app. Analytics helpers developed for the Homepage are reused in Conference. The monorepo made this normal instead of exceptional.</p><h2 id="impact">Impact</h2><p>Builds now complete in under ten minutes, and new applications can be scaffolded and deployed in just minutes using our Nx generators. Regular upgrades to Next.js and other dependencies keep our technology stack current and technical debt low. Engineers across different Crews work within a consistent environment, which makes cross-team collaboration smoother and more efficient. Shared tooling and libraries have also led to a more unified user experience across FlightAware’s public-facing applications, strengthening both our internal developer workflow and the overall product experience.</p><h2 id="what%E2%80%99s-next">What’s Next</h2><p>As FlightAware moves on from Tcl to more modern applications, we expect to add more targeted Next.js applications for upcoming products. Our shared libraries will continue to grow and evolve to serve the needs of consuming applications. Our CI/CD tooling will mature to give finer-grained control over which app ships.</p><h2 id="closing-thoughts">Closing Thoughts</h2><p>Our monorepo has become the foundation for how the Web Wing builds. It heavily promotes reuse and code sustainability. 
Centralized tooling, frequent communication, and a bias for small, continuous updates have kept it maintainable as it grows.</p> 
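The affected detection this post leans on can be illustrated in a few lines: invert the project dependency graph and walk it outward from the changed projects, collecting everything that depends on them transitively. This simplified sketch is our own illustration of the concept, not Nx’s actual implementation:

```typescript
// Simplified illustration of the idea behind "affected" detection:
// given a project graph and a set of changed projects, find every
// project that depends on them, directly or transitively.
type ProjectGraph = Record<string, string[]>; // project -> projects it depends on

function affectedProjects(graph: ProjectGraph, changed: string[]): Set<string> {
  // Invert the graph: project -> projects that depend on it.
  const dependents: Record<string, string[]> = {};
  for (const [project, deps] of Object.entries(graph)) {
    for (const dep of deps) (dependents[dep] ??= []).push(project);
  }
  // Breadth-first walk from each changed project through its dependents.
  const affected = new Set(changed);
  const queue = [...changed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const dependent of dependents[current] ?? []) {
      if (!affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return affected;
}
```

With a graph like `{ app1: ["ui"], app2: ["ui", "api"] }`, a change to the `ui` library marks `app1` and `app2` as affected while leaving `api` untouched, which is why only changed apps get built and deployed.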
        <br>
        <p>
            <a href="https://flightaware.engineering/monorepo-phase-2-tooling-workflow-and-lessons-from-the-web-wing-at-flightaware/">Monorepo Phase 2: Tooling, Workflow, and Lessons from the Web Wing at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ An Enlightened Engineer&#x27;s Perspective on Design ]]></title>
        <description><![CDATA[ I&#39;ve always had an interest in design. That curiosity is what made me want to take IDEO U’s &quot;Insights for Innovation&quot; and &quot;From Ideas to Action&quot; courses. I wanted to understand how designers think, how they approach problems, and how that mindset might make me a stronger engineer. ]]></description>
        <link>https://flightaware.engineering/an-enlightened-engineers-perspective-on-design/</link>
        <guid>https://flightaware.engineering/an-enlightened-engineers-perspective-on-design/</guid>
        <pubDate>Mon, 03 Nov 2025 11:20:04 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/10/kelly-sikkema-io0ZLYbu31s-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <h1 id="discovering-design-beyond-aesthetics">Discovering&nbsp;Design&nbsp;Beyond&nbsp;Aesthetics</h1><p>When I first started my career in engineering, I thought design strictly meant aesthetics. It was about colors, alignment, and polish. I used to think of design as something to implement, not something to participate in. As engineers, we tend to focus on feasibility and how to make things work, how to make it fast, and how to make it testable.</p><p>I've always had an interest in design. That curiosity is what made me want to take IDEO U’s "Insights for Innovation" and "From Ideas to Action" courses. I wanted to understand how designers think, how they approach problems, and how that mindset might make me a stronger engineer.</p><p>Through those courses, I started seeing design differently. Design is not just about how something looks, but how something works for real people. It is a mindset built around empathy, curiosity, and experimentation. What surprised me most was how compatible that mindset is with engineering when you approach it with intention.</p><h3 id="seeing-through-empathy">Seeing&nbsp;Through&nbsp;Empathy</h3><p>One of the most memorable concepts was the idea of listening with your eyes. Yes, that seems like a phrase you might want to skim past, but it ended up being one thing that really set the tone for the course and changed my mindset. It means watching how people behave, not just hearing what they say. When you observe without rushing to fix, you can start to notice the moments that reveal what is really happening. The pauses before a tap, the small moments of confusion when something is not intuitive, and the way people adapt when the flow is not smooth are all small signals that hold meaning. 
While easy to overlook and rush past, they are important to observe so that we can understand where to improve and how we can truly create designs that cater to our audience.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/10/Picture1.jpg" class="kg-image" alt="" loading="lazy" width="676" height="676" srcset="https://flightaware.engineering/content/images/size/w600/2025/10/Picture1.jpg 600w, https://flightaware.engineering/content/images/2025/10/Picture1.jpg 676w"></figure><p>That practice of observation naturally leads into another concept, building empathy through immersion. Observation shows us what people do, but immersion helps us understand why. Instead of guessing what someone might feel, you intentionally experience a task or limitation from their perspective. When you walk through a flow as a user, not as its creator, you notice friction you might have ignored. When you intentionally step outside your comfort zone or test something as a beginner, you uncover the small frustrations and decisions that shape how people experience technology. That kind of empathy is not theoretical; it is a design skill, and it strengthens engineering judgment in a very tangible way.</p><h3 id="building-to-think">Building&nbsp;to&nbsp;Think</h3><p>Understanding how people experience what we build ties directly into another key idea, the value of building to think. Engineers often build to deliver. We want everything to be correct, efficient, and ready for production before anyone sees it. But design thinking asks us to build earlier, when things are still rough, and to share those early versions for feedback. That shift changes everything. When you treat a prototype as a learning tool rather than a finished product, you start to see the value of feedback not as judgment, but as information. It is how you learn what works, what resonates, and what does not. 
It is also how you balance the three lenses of design thinking: desirability, feasibility, and viability. Building early and getting feedback helps you test not just whether something can be built, but whether it should be, and whether it makes sense to sustain it in the long run.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/10/Picture1-1.jpg" class="kg-image" alt="" loading="lazy" width="608" height="538" srcset="https://flightaware.engineering/content/images/size/w600/2025/10/Picture1-1.jpg 600w, https://flightaware.engineering/content/images/2025/10/Picture1-1.jpg 608w"></figure><p>Learning these concepts reshaped how I think about problem solving. Design thinking gives structure to things that often feel instinctive, such as curiosity, empathy, and iteration, and turns them into tools for building better products. It reminds me that progress does not come from getting everything right the first time, but from learning quickly and staying open to what others see and feel. That is what makes design so powerful and, ultimately, what makes engineering more human.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/an-enlightened-engineers-perspective-on-design/">An Enlightened Engineer&#x27;s Perspective on Design</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Assessing (Cyber) Risk in a Sustainable and Agile Way ]]></title>
        <description><![CDATA[ In a world where there are limited resources for implementing and maintaining controls, we need some mechanism to determine where to spend our time and effort. Fortunately for us, smart people have already thought about, and come up with, a way to do just that: the Risk/Threat Assessment. ]]></description>
        <link>https://flightaware.engineering/assessing-cyber-risk-in-a-sustainable-and-agile-way/</link>
        <guid>https://flightaware.engineering/assessing-cyber-risk-in-a-sustainable-and-agile-way/</guid>
        <pubDate>Mon, 06 Oct 2025 14:49:24 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/10/flyd-9PivUW7l1m4-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>The practice of cybersecurity is fundamentally about managing risk. In a world where there are limited resources for implementing and maintaining controls, and where systems must be accessible/usable, we need some mechanism to determine where to spend our time and effort.&nbsp; Sure, we could use post-quantum ciphers to encrypt all the data, lock our servers in a cage, unplug the power and network, throw away the keys, then bury the whole thing under 100 tons of radioactive concrete.&nbsp; But people need to actually <em>use</em> those servers and *absolute* security is more <a href="https://www.schneier.com/essays/archives/2008/01/the_psychology_of_se.html?ref=flightaware.engineering" rel="noreferrer noopener"><u>a state of mind</u></a> than something concrete that exists in reality.&nbsp; Fortunately for us, smart people have already thought about, and come up with, a way to do just that: the Risk/Threat Assessment.&nbsp;</p><p>Now, this is not the first post anyone has ever written extolling the benefits of performing risk assessments.&nbsp; There are <a href="https://duckduckgo.com/?q=security+risk+assessment&ref=flightaware.engineering" rel="noreferrer noopener"><u>plenty of posts and articles</u></a> that tout the need for risk assessments and high-level steps for performing risk assessments. 
OWASP has some excellent resources on <a href="https://owasp.org/www-community/OWASP_Risk_Rating_Methodology?ref=flightaware.engineering" rel="noreferrer noopener"><u>risk rating</u></a> and <a href="https://owasp.org/www-community/Threat_Modeling?ref=flightaware.engineering" rel="noreferrer noopener"><u>threat modeling</u></a>.&nbsp; NIST has <a href="https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-30r1.pdf?ref=flightaware.engineering" rel="noreferrer noopener"><u>SP 800-30</u></a>.&nbsp; If you aren’t already familiar with this topic, click the links (open in a new tab so you can climb back out of the rabbit hole!) and dive in.&nbsp; We can wait.&nbsp;</p><p>Okay.&nbsp; If you’ve made it this far, we will assume you’re familiar with at least the general outline of what a risk assessment is and roughly what the different components are.&nbsp; Great.&nbsp; But how do you implement this in <em>practice</em>?&nbsp; More importantly, how do you implement this in practice so you can get some useful benefit out of it without spending all your time doing risk assessments?&nbsp; Lots of frameworks (and lots of corporate risk assessment policies) seem to assume you’re doing some kind of waterfall development and thus that you just do your risk assessment once over everything “in scope” before (or after!!) 
you deploy to prod and you’re done.&nbsp; Or they’re focused on an enterprise IT view and want to perform an annual risk assessment across entire fleets of systems outside the view of individual product teams.&nbsp; But there are a few problems with those approaches:&nbsp;</p><ol><li>If you are assessing risk for all the assets in your organization once a year and you’ve got even a moderately sized organization, that is a <em>lot</em> of systems to be thinking about.&nbsp; FlightAware has <strong>hundreds</strong> of servers with many interrelated dependencies.&nbsp; Even with a dedicated cyber risk team, assessments at that level are going to be time-consuming and will likely result in broad generalizations without the detail, nuance, and context of individual components.&nbsp;&nbsp;&nbsp;</li><li>Even if you’re focused on a single system / product, there is likely enough complexity to make this a daunting task.&nbsp;&nbsp;</li><li>Technology is constantly changing, the threat landscape is constantly changing, your product is constantly changing.&nbsp; How do you keep up with all that change if you are only thinking about these things once a year or less?&nbsp;</li><li>We want the folks building the product and maintaining the systems to be involved and aware of the process.&nbsp; They will have the best understanding and insights that an external security team won’t have.&nbsp; We need a process that can maximize their contribution while maintaining as high a velocity as possible.&nbsp;</li></ol><h3 id="fitting-in"><strong>Fitting In</strong>&nbsp;</h3><p>So, how do we fit this risk assessment process into FlightAware?&nbsp; <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/" rel="noreferrer noopener"><u>Previous blog posts</u></a> have lightly touched on aspects of how we develop software at FlightAware, which is that we operate loosely on <a 
href="https://agilealliance.org/agile101/12-principles-behind-the-agile-manifesto/?ref=flightaware.engineering" rel="noreferrer noopener"><u>Agile principles</u></a>.&nbsp; For any significant change, whether that be a new product, system, or feature, we follow the same general path any software engineering group would:&nbsp;</p><ol><li>Identify the requirements&nbsp;</li><li>Plan the work&nbsp;</li><li>Do the work&nbsp;</li><li>Check the work (tests!)&nbsp;</li><li>Deploy the work&nbsp;</li><li>Repeat!&nbsp;</li></ol><p>Stage 2 is the critical step here.&nbsp; The risk assessment process is a planning activity and becomes a sort of mini cycle within the cycle.&nbsp; Once we have the set of requirements (customer, regulatory, internal policy driven, etc.), we can start to sketch out a rough system design, identify the boundaries of the system/component, and figure out data flows.&nbsp;</p><p>Data flows?&nbsp; What are those?&nbsp; A data-flow diagram (DFD) shows the data elements that flow into, through, and out of a system.&nbsp; This is ultimately what we need to protect, and these flows are going to largely be the sources of any risks we identify with our system.&nbsp; Having this well-documented allows us to focus our attention as we move further along in the process.&nbsp;&nbsp; As a side-benefit, DFDs can become useful documentation/reference for the developers in the future when they need to make changes to the system and want to understand impact.&nbsp;&nbsp;&nbsp;</p><p>Though there isn’t a single right or wrong way to draw a data-flow diagram, the key elements include:&nbsp;&nbsp;</p><ul><li>Individual data elements listed.&nbsp; For conciseness, groups of data elements can be combined into a single label that can be referenced from another table.&nbsp;&nbsp;<ul><li>Eg `name`, `flightID`, `date`, etc&nbsp;</li></ul></li><li>Functions/components/subsystems that process the data&nbsp;<ul><li>Eg `get_user_info()` or `POST /user` or `User 
database`&nbsp;</li></ul></li><li>Arrows showing the direction the data is flowing&nbsp;&nbsp;</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/10/image--13-.png" class="kg-image" alt="" loading="lazy" width="468" height="192"><figcaption><span style="white-space: pre-wrap;">Example DFD&nbsp;</span></figcaption></figure><p>Understanding the data that the system is processing is important.&nbsp; First, it’s the <em>data</em> that we are ultimately trying to protect.&nbsp; Understanding the classification of the data (is it PII, PCI, or subject to other regulatory restrictions?) helps inform our assessment of the impact later in the process.&nbsp; Understanding the type of data (integer, string, JSON, HTML, image, etc.) helps us as we consider what kinds of failures could be caused when parsing and processing the data.&nbsp;</p><p>A system design diagram identifies the boundaries of the system we are looking to assess, its major components, and connections it has to external systems and users.&nbsp; This lets us understand the architecture of the system as well as ingress/egress points for data.&nbsp; Those are the spots that we will want to focus on as that is where the data is entering or leaving our control and where threats to confidentiality, availability, or integrity will be realized.&nbsp; We use arrows in this diagram as well, to show the direction that data or connections flow.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/10/image--14-.png" class="kg-image" alt="" loading="lazy" width="468" height="361"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Example System Design Diagram</em></i></figcaption></figure><h3 id="%E2%80%9Cavengers-assemble%E2%80%9D"><strong>“Avengers, Assemble!”</strong>&nbsp;</h3><p>Okay, great!&nbsp; Now that we have 
some pretty diagrams and we know how things should roughly fit together, we can start the fun: brainstorming about all the ways this system could fail.&nbsp; This part of the process is helpful to do in a collaborative environment.&nbsp; Bring in the developers, the architects, the technical leads who will be working on building this product and have subject-matter expertise on how it will or does work.&nbsp; Ideally, we’d also bring in a Site Reliability Engineer (SRE) or systems engineer/administrator as well.&nbsp; These are the folks who spend the most time dealing with complex system interactions and they’ve probably seen all kinds of interesting failures and will understand the nuances of how different components interact in a real-world situation.&nbsp; Finally, of course, you’ll want to include some cybersecurity expertise to round off our team.&nbsp;&nbsp;&nbsp;</p><p>Now that we’ve got the group assembled, simply pick a starting point in the system design diagram – some specific boundary or external interface where users or external systems are sending/receiving data – and start brainstorming failures.&nbsp; With all these technical engineering types involved, this brainstorming process has the potential to get very technical and very in-depth into the nitty-gritty of how specific application components, operating systems, or even hardware behaves.&nbsp; This process will be repeated throughout the development lifecycle, so it's important to start off by keeping things higher-level and move deeper as everyone grows more comfortable and the process matures.&nbsp; The first pass when we initiate the project doesn’t need to be comprehensive and starts with just a broad system-level analysis.&nbsp; The goal is to help developers become comfortable with the exercise so they can do it formally or informally “on their own” each sprint or every few sprints and focus just on the components that are in-scope for that sprint.&nbsp; We want to get to a point where 
thinking about and assessing risk is as natural and reflexive as writing tests (everyone is doing that, right?) or documentation (😂).&nbsp; Even implementing risk assessments is an Agile process! How meta.&nbsp;</p><p>At this stage there are no right or wrong, good or bad answers.&nbsp; Let the folks in the room come up with ideas and write them down.&nbsp; We have a risk assessment template in Confluence to store all the data for our threat assessment.&nbsp; Later, we will consider likelihood and impact to determine which, if any, of the identified threats are worth implementing mitigating controls.&nbsp;</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/10/Screenshot-2025-10-06-at-12.25.22---PM.png" class="kg-image" alt="" loading="lazy" width="625" height="61" srcset="https://flightaware.engineering/content/images/size/w600/2025/10/Screenshot-2025-10-06-at-12.25.22---PM.png 600w, https://flightaware.engineering/content/images/2025/10/Screenshot-2025-10-06-at-12.25.22---PM.png 625w"><figcaption><span style="white-space: pre-wrap;">Example table of identified threats</span></figcaption></figure><h3 id="hitting-our-stride%E2%80%A6"><strong>Hitting our stride…</strong>&nbsp;</h3><p>This is where it can be helpful to bring in a formal threat classification framework to help us think systematically about the kinds of threats or failures we could face.&nbsp; We use <a href="https://en.wikipedia.org/wiki/STRIDE_model?ref=flightaware.engineering" rel="noreferrer noopener"><u>STRIDE</u></a> as it provides a nice, easy acronym to loop through various threat categories and is relatively straightforward:&nbsp;</p><p>S – Spoofing&nbsp;<br>T – Tampering&nbsp;<br>R – Repudiation&nbsp;<br>I – Information Disclosure&nbsp;<br>D – Denial of service&nbsp;<br>E – Elevation of Privilege&nbsp;</p><p>So, if we go back to our data flow diagram and start at the first step, data coming in from an external feed source, 
we can consider each STRIDE element in turn as we think about that connection and the associated data flow.&nbsp; First, think about how the data or connection can be spoofed. Do we verify the identity of the remote system with TLS certificates or some kind of identification/authentication process? Is the data itself signed? Next, think about tampering.&nbsp; Could the data be modified in transit via some kind of man-in-the-middle (MITM) attack? Could the data be modified by another user or process while it is stored in memory, in a cache, or on disk?&nbsp; Continue through each of the remaining STRIDE elements and then move on to the next process or component, repeating until all the areas of the in-scope system have been examined.&nbsp; An important element to point out is that while this is fundamentally a cybersecurity-driven process, the focus of this brainstorming session should be on “failures”, not “attacks”.&nbsp; It is easy to fall into the trap of thinking about risk solely in terms of what an external (or even a malicious insider) attacker will do, but we need to keep in mind that the goal is not merely to keep attackers out, but to build resilient systems.&nbsp; *Anything* that can fail is fair game for consideration.&nbsp; What if a hard drive dies or gets a bit or two flipped?&nbsp; What if the network connection between two components goes down, or even just experiences unusually high latency?&nbsp; What if a tornado rips through the datacenter? What if <a href="https://www.theregister.com/2021/09/28/aws_east_brownout?ref=flightaware.engineering" rel="noreferrer noopener"><u>us-east-1 goes down</u></a>? 
Thinking about these things isn’t just a cybersecurity box-checking exercise (availability <em>is</em> 1/3 of the CIA triad!), it’s about systematically examining the system, looking for weaknesses, and ultimately building a better system that is robust and resilient in the face of any kind of threat.&nbsp;</p><p>Now, we don’t have to start from scratch every time we go through this exercise.&nbsp; We can pre-seed our list of threats from previous iterations or from external sources like <a href="https://attack.mitre.org/?ref=flightaware.engineering" rel="noreferrer noopener"><u>Mitre’s ATT&amp;CK</u></a> or the <a href="https://owasp.org/www-project-top-ten/?ref=flightaware.engineering" rel="noreferrer noopener"><u>OWASP Top Ten</u></a>.&nbsp; These resources are great because they come from data about organizations facing these threats in the wild.&nbsp; Each iteration of this process lets us review, update, and refine our list, but not having to start from scratch means we can move faster and get to building things sooner.&nbsp;&nbsp;&nbsp;</p><h3 id="listthreats"><strong>List(threats)</strong>&nbsp;</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/10/Screenshot-2025-10-06-at-12.25.39---PM.png" class="kg-image" alt="" loading="lazy" width="290" height="104"><figcaption><span style="white-space: pre-wrap;">Severity = Impact x Likelihood</span></figcaption></figure><p>Once all the components and data flows have been examined, we usually end up with a decent list of threats to address.&nbsp; We want to understand the severity of each of these threats so we can prioritize the work to implement mitigations and determine where it needs to fit in the development timeline.&nbsp; Severity can be thought of as the product of likelihood and impact.&nbsp; Despite that definition sounding downright mathematical, this part is as much art as science.&nbsp; Humans are notoriously <a 
href="https://www.theatlantic.com/science/archive/2017/11/humans-are-bad-at-predicting-futures-that-dont-benefit-them/544709/?ref=flightaware.engineering" rel="noreferrer noopener"><u>bad at predicting things</u></a>, so anywhere you can get hard data from can help, but ultimately, we don’t need to have an actuarial level of precision with these predictions.&nbsp; A simple “high, medium, low” is sufficient for our purposes and we usually delegate the initial analysis to an individual with the whole group providing final review and sign-off.&nbsp; This lets us speed up the process somewhat, while reducing the impact of an individual’s biases.</p><h3 id="sdlc-1"><strong>SDLC &lt;&lt; 1</strong>&nbsp;</h3><p>For each threat identified and prioritized, we either list existing controls that address the risk, or we add a task to the project’s epic to identify and implement an appropriate response.&nbsp; This provides two key benefits: we can easily tie implemented controls to identified threats with assurance that those controls will exist in the system once completed, and we can integrate security right into the project plan and timeline. 
This aligns with the mantra of “Shift Left” in the Dev(Sec)Ops ethos.&nbsp;</p><h3 id="future-work"><strong>Future work</strong>&nbsp;</h3><p>Now we’ve systematically generated a list of threats, prioritized them, tied them to a list of compensating controls, and then tied <em>those</em> to discrete development tasks that we can track and measure.&nbsp; Our system is as robust as we can make it, right?&nbsp; Are we … done?&nbsp;</p><p>Spoiler alert: of course not!&nbsp;</p><p>Going forward, we can improve on this process with more automation.&nbsp; We can take our list of threats and build specific tests to check for vulnerabilities to these threats (see also vulnerability scanning: <a href="https://en.wikipedia.org/wiki/Static_application_security_testing?ref=flightaware.engineering" rel="noreferrer noopener"><u>SAST</u></a>/<a href="https://en.wikipedia.org/wiki/Dynamic_application_security_testing?ref=flightaware.engineering" rel="noreferrer noopener"><u>DAST</u></a>) that we integrate into our CI/CD pipelines.&nbsp; Rather than documenting implementation of the control, the assurance of mitigation comes from passing tests.&nbsp; For boilerplate threats that recur persistently across products, projects, and systems we can start to build templates, standards, and libraries that get included by default without needing to re-invent the wheel every time.&nbsp; Iterative improvements are the name of the game and let us continually adjust and re-adjust to changing threat landscapes, new technology, and new requirements over time.&nbsp;</p> 
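As a concrete illustration of the "Severity = Impact x Likelihood" scoring described above, here is a minimal Python sketch. The three-level scale, the numeric weights, and the example threats are made-up assumptions for demonstration, not our actual rubric or threat register.

```python
# Sketch of "Severity = Impact x Likelihood" scoring for a threat list.
# The weights and example threats below are illustrative assumptions.

LEVELS = {"low": 1, "medium": 2, "high": 3}

def severity(likelihood: str, impact: str) -> int:
    """Score one identified threat; higher means address it sooner."""
    return LEVELS[likelihood] * LEVELS[impact]

def prioritize(threats):
    """Order (name, likelihood, impact) tuples by descending severity."""
    return sorted(threats, key=lambda t: severity(t[1], t[2]), reverse=True)

threats = [
    ("Feed data tampered with in transit (MITM)", "medium", "high"),
    ("Disk failure corrupts cached data", "medium", "medium"),
    ("Tornado rips through the datacenter", "low", "high"),
]

for name, likelihood, impact in prioritize(threats):
    print(f"{severity(likelihood, impact)}: {name}")
# prints the MITM threat first (6), then disk failure (4), tornado (3)
```

A simple "high, medium, low" mapping like this is precise enough for ordering the backlog without pretending to actuarial accuracy.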
        <br>
        <p>
            <a href="https://flightaware.engineering/assessing-cyber-risk-in-a-sustainable-and-agile-way/">Assessing (Cyber) Risk in a Sustainable and Agile Way</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Stories from the Cache Crimes Division ]]></title>
        <description><![CDATA[ Recently, a few different issues related to HTTP caching all came up around the same time. In this post I&#39;ll discuss two of the issues that arose, covering their impact, root cause, and remediation. ]]></description>
        <link>https://flightaware.engineering/stories-from-the-cache-crimes-division/</link>
        <guid>https://flightaware.engineering/stories-from-the-cache-crimes-division/</guid>
        <pubDate>Tue, 02 Sep 2025 11:16:29 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/08/artem-sapegin-DErxVSSQNdM-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <h2 id="introduction">Introduction</h2><p>One of my favorite things about working at FlightAware is having the latitude to tackle problems that interest me, even when they're not directly related to my current project. Recently, a few different issues related to HTTP caching all came up around the same time, and I got the opportunity to learn a lot more about our web caching infrastructure (and web caching in general) than I ever expected. In this post I'll discuss two of the issues that arose, covering their impact, root cause, and remediation.</p><h2 id="background">Background</h2><p>First, a brief introduction to HTTP caching to help you follow along (just the relevant bits). Broadly, there are two places where HTTP responses can be cached: your computer, and some other computer. Alright, that's an oversimplification. To be more precise, when thinking about HTTP caching as a developer you're either worried about the user's browser cache or you're worried about a caching proxy (sometimes multiple) that you control between you and the user. Both will be relevant in this post. In FlightAware's case, we have both Varnish and Cloudflare sitting in front of us, acting as two layers of caching. A diagram:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-3.56.23---PM.png" class="kg-image" alt="" loading="lazy" width="425" height="405"></figure><p>It's simple enough, we have Cloudflare fronting everything, with two regions of Varnish servers behind it, and then two regions of Apache servers behind those. 
If we were doing it all over again there's a good chance there would be no Varnish here, but our use of it greatly predates our use of Cloudflare, and we make extensive use of its flexible configuration via <a href="https://www.varnish-software.com/developers/tutorials/varnish-configuration-language-vcl/?ref=flightaware.engineering">VCL</a>.</p><p>To control how the caches behave both in the proxy servers and user's browser, we can use various HTTP request/response headers to signal what content is cacheable, for how long, etc. The most important header to call out here is the <code>Cache-Control</code> response header, which can contain many settings. The one you'll see referenced the most in this post is <code>max-age=&lt;seconds&gt;</code>, which indicates how many seconds a given response should be cached before it is considered stale.</p><h3 id="case-1-the-fluctuating-distance">Case 1: The fluctuating distance</h3><p>The first case was the most complex to debug, as it had the most dependent conditions required for things to go wrong. It also required some deep research into Varnish's powerful configuration language to understand what was going on. It started as a bug report from one of our users highlighting how the flight they were viewing would periodically seem to lose some progress as its "distance flown" would sometimes temporarily creep down instead of up. It sounded like we were serving them some stale data (not a good look for a live flight tracking website)! 
But where was it coming from?</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-3.57.36---PM.png" class="kg-image" alt="" loading="lazy" width="679" height="427" srcset="https://flightaware.engineering/content/images/size/w600/2025/08/Screenshot-2025-08-26-at-3.57.36---PM.png 600w, https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-3.57.36---PM.png 679w"></figure><p>Fortunately, the problem was easy to reproduce. I just sat on a flight page with Chrome's dev tools pulled up and waited, as we periodically requested new flight data via an ajax call. Once I had the reproduction, it was time to accelerate the debug process by peeling back the outermost layer of caching (Cloudflare) and doing some manual exercising of the guilty endpoint. This quickly revealed some striking behavior:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-3.58.23---PM.png" class="kg-image" alt="" loading="lazy" width="629" height="175" srcset="https://flightaware.engineering/content/images/size/w600/2025/08/Screenshot-2025-08-26-at-3.58.23---PM.png 600w, https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-3.58.23---PM.png 629w"><figcaption><code spellcheck="false" style="white-space: pre-wrap;"><span>Age</span></code><span style="white-space: pre-wrap;"> indicates how long the response has been cached, it should generally not be more than </span><code spellcheck="false" style="white-space: pre-wrap;"><span>max-age</span></code></figcaption></figure><p>We could receive continuously stale data, upwards of 8 minutes out-of-date, even with a <code>max-age</code> of 60. 
All the other headers were in good order, with <code>Expires</code> and <code>max-age</code> set consistently with each other.<br>It's generally pretty tough to get HTTP caches to serve you obviously stale content, with just a couple exceptions. One such exception is the <code>stale-while-revalidate</code> setting of <code>Cache-Control</code>. This setting allows stale data to be served from the cache as long as it's less than N seconds out-of-date, and although the stale data gets served, an asynchronous request is made simultaneously to revalidate the data, either marking the existing cached data as fresh again or replacing it with actually fresh data. But we're not receiving a <code>stale-while-revalidate</code> value here, so what gives? It doesn't take much searching around for the keywords "varnish" and "stale-while-revalidate" to stumble upon Varnish's <a href="https://www.varnish-software.com/developers/tutorials/object-lifetime/?ref=flightaware.engineering">"Grace mode"</a>, a setting within Varnish itself that exactly mirrors the behavior of <code>stale-while-revalidate</code>. After expanding my vocabulary, it didn't take long to find the smoking gun in the change history of our main VCL file.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-3.59.49---PM.png" class="kg-image" alt="" loading="lazy" width="664" height="189" srcset="https://flightaware.engineering/content/images/size/w600/2025/08/Screenshot-2025-08-26-at-3.59.49---PM.png 600w, https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-3.59.49---PM.png 664w"></figure><p>This change was part of a larger wholesale migration of our Varnish configs during an upgrade from Varnish 3 to Varnish 7. 
I think it's likely that the developer thought the upper block was simply being overridden by the lower one and was thus unnecessary.<br>To further complicate things, in this code block there are 2 different grace settings being modified, <code>req.grace</code> and <code>beresp.grace</code>. All that's important to know is that <code>req.grace</code> overrides <code>beresp.grace</code>, so by removing the upper block, we went from a grace period of 15 seconds to 10 minutes!<br>So a grace period of 10 minutes means that the first request within 10 minutes of a resource going stale would serve the stale data, but then every request after should be fresh, right? How were there 5 responses in a row of stale data during my experimentation? Remember the infrastructure diagram above? Behind Cloudflare sits a sizable collection of Varnish servers. Any request from a user can go to any of those servers (sorry, no sticky sessions), meaning you could get stuck hitting the grace period for one stale cache after another, ouch.<br>Resolving the issue was fortunately simple: remove the setting altogether. Varnish's default grace setting is 10 seconds, which is close enough to our original 15. By leaving it unset, we also get the benefit of Varnish respecting any <code>stale-while-revalidate</code> header we choose to specify ourselves at the origin.</p><h3 id="case-2-what-the-pragma">Case 2: What the Pragma?</h3><p>The next case was a bit more impactful than the first, to the tune of thousands of dollars a month on our Cloudflare bill. Our Operations team had recently informed us of some unexpected Cloudflare bandwidth overages that seemed to be getting worse. Although the underlying cause wasn't well understood, I went off to find some low-hanging fruit to perhaps stem the bleeding. 
It didn't take long to stumble across this graph:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-4.00.41---PM.png" class="kg-image" alt="" loading="lazy" width="395" height="308"></figure><p>Why were 8 out of our top 10 requests simply fetching static javascript/css resources? The pattern here may be familiar: we've included hashes in the paths to each file, enabling us to set extremely long cache times on the resources while retaining the ability to push out new versions when needed by pointing to resources with new hashes. In short: requests for these paths should almost always be served by the user's own browser cache, not Cloudflare. Looking at the response headers for one of the requests shows what I expected: <code>Cache-Control</code> with a wildly high <code>max-age</code>:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-4.01.24---PM.png" class="kg-image" alt="" loading="lazy" width="550" height="420"></figure><p>However, it also revealed something else interesting. What was that <code>Pragma: no-cache</code> doing there? What the heck does that header even do? Well, it turns out that <em>it depends</em>. The <code>Pragma</code> header is an artifact of the HTTP/1.0 era, when we didn't have the <code>Cache-Control</code> header and the question of caching had a simple, binary yes/no answer. Now, though, the header is quite deprecated (as <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Pragma?ref=flightaware.engineering">MDN clearly communicates</a> with a big red box) and its use is discouraged. "No big deal", I thought to myself, "since it's deprecated, surely it will be overridden by any settings in the <code>Cache-Control</code> header". 
And indeed that's exactly what MDN seemed to indicate:</p><blockquote><strong>Note:</strong>&nbsp;The&nbsp;<code>Pragma</code>&nbsp;header is not specified for HTTP responses and is therefore not a reliable replacement for the HTTP/1.1&nbsp;<code>Cache-Control</code>&nbsp;header, although its behavior is the same as&nbsp;<code>Cache-Control: no-cache</code>&nbsp;if the&nbsp;<code>Cache-Control</code>&nbsp;header field is omitted...</blockquote><p>I had to return to that page several times and reread it to finally notice that there's more to that quote:</p><blockquote>...in a request.</blockquote><p>Oof! So we can have <code>Pragma: no-cache</code> in requests, which is overridden by the request's <code>Cache-Control</code> header, but we can also have <code>Pragma: no-cache</code> in responses, where its behavior is unspecified. Could this mean that Varnish and Cloudflare ignore the header, but Chrome doesn't? Why yes, that's exactly what it means, and here's <a href="https://source.chromium.org/chromium/chromium/src/+/main:net/http/http_response_headers.cc;l=1229-1235;drc=a937354b3bd4fc8ca843bcfcefbf9ac8f5f580b1?ref=flightaware.engineering">the source</a> to prove it:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-4.02.17---PM.png" class="kg-image" alt="" loading="lazy" width="664" height="172" srcset="https://flightaware.engineering/content/images/size/w600/2025/08/Screenshot-2025-08-26-at-4.02.17---PM.png 600w, https://flightaware.engineering/content/images/2025/08/Screenshot-2025-08-26-at-4.02.17---PM.png 664w"></figure><p>Varnish, on the other hand, <a href="https://varnish-cache.org/docs/trunk/users-guide/increasing-your-hitrate.html?ref=flightaware.engineering#pragma">ignores it completely.</a></p><p>There's some nuance to why we were setting the pragma header in the first place, but the gist of it is that in some cases early on in processing a request, we'll set things up to 
not be cached (<code>Expires: 0</code>, <code>Cache-Control:no-cache,no-store,must-revalidate,max-age=0</code>, <code>Pragma: no-cache</code>; the whole 9 yards!), and then later on in the request we decide they should be cached after all, so we update the to-be-emitted <code>Expires</code> and <code>Cache-Control</code> headers appropriately, but we had forgotten to clear out the <code>Pragma</code> header. So, again, the fix was just to delete some code (specifically, the initial setting of <code>Pragma</code>).</p><h2 id="conclusion">Conclusion</h2><p>When a website (or any piece of software) hangs around for 20 years, it manages to accrue its fair share of mysterious cruft. Lines of config, blocks of code, even comments which people are afraid to remove, lest they break something seemingly wholly unrelated. Let this go on for too long, though, and you end up with something that’s impossible to maintain, with cruft layered on top of cruft until you finally just have to start all over. I hope these stories help demonstrate that it doesn’t have to be that way. <a href="https://blog.nelhage.com/post/computers-can-be-understood/?ref=flightaware.engineering">Computers can be understood</a>. You can root cause bugs, fix them, and make the whole thing simpler in the process!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/stories-from-the-cache-crimes-division/">Stories from the Cache Crimes Division</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ My FlightAware Internship ]]></title>
        <description><![CDATA[ Jason Chung, our intern for this summer, had the privilege of working with our Systems crew on an amazing project that leveraged machine learning as a tool to build comprehensive software manifests for a given server in our network. ]]></description>
        <link>https://flightaware.engineering/intern-2025/</link>
        <guid>https://flightaware.engineering/intern-2025/</guid>
        <pubDate>Mon, 04 Aug 2025 11:25:46 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/07/john-FlPc9_VocJ4-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <blockquote>The summer internship program at FlightAware plays a crucial role in shaping the next generation of technology leaders while contributing fresh energy and perspective to our engineering teams. The internship program not only gives students the opportunity to apply their classroom knowledge in a real-world setting, but it also allows our world-class engineers to mentor emerging talent and make an impact on the future of the software industry. Jason Chung, our intern for this summer, had the privilege of working with our Systems crew on an amazing project that leveraged machine learning as a tool to build comprehensive software manifests for a given server in our network. Not only did Jason gain experience building a solution in a real-world setting, he gained presentation experience when he presented his project to all of FlightAware Engineering, and career guidance via tech and career development talks presented by FlightAware engineers. I invite you to keep reading to hear from Jason about his experience at FlightAware and the great work that he did!  ~ Shawn K.</blockquote><h2 id="jason-chung">Jason Chung</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2025-07-29-at-10.07.12---PM.png" class="kg-image" alt="" loading="lazy" width="263" height="328"></figure><p>Hello! My name is Jason Chung, and I’d like to share my experience as a software engineering intern at FlightAware. I’m currently pursuing a master’s degree in Software Engineering at Carnegie Mellon University and expect to graduate in December 2025. During my internship this summer, I was lucky enough to join the Systems Crew, giving me an opportunity to apply my knowledge, expand my skillset, and gain hands-on experience with new technologies. 
FlightAware is home to some of the most knowledgeable yet humble professionals I’ve met, and it’s been an honor to both learn from them and contribute to the team.</p><h2 id="my-project">My Project</h2><p>For my internship project, I explored how machine learning (ML) could help system administrators understand what kinds of software are running on a given server—and do so in a centralized, intelligent, and automated way.</p><p>The next section provides some context by describing a common pain point for system administrators at FlightAware, and explains how my project addresses this issue.</p><h2 id="the-problem">The Problem</h2><p>Site reliability engineers at FlightAware often need to answer a seemingly simple question:&nbsp;<strong><em>What services are running on a host?</em></strong></p><p>In practice, answering this question is far from trivial. FlightAware operates hundreds of servers—also referred to as&nbsp;<em>hosts</em>, a term commonly used in systems administration to denote any machine connected to a network. Each host serves a specific role, ranging from hosting development environments to storing vast amounts of flight data. These hosts run a variety of software components, commonly referred to as&nbsp;<em>services</em>, which are accessed and relied upon by different teams across the organization.</p><p>Understanding which services are running on a host is crucial for timely problem diagnosis and resolution. For example, servers that have active services but are labeled otherwise may cause confusion about whether they should be decommissioned, risking suspension of important but untracked services (and, conversely, inactive servers with inaccurate labels may never be retired).</p><p>These services can include everything from Docker containers and background processes to scheduled jobs and system services. 
While the purpose of a server (e.g., web server, database server) might hint at its services, the reality is that many servers have a combination of running services that make classification harder. Not only that, but these services can change over time, making it harder to identify them.</p><h3 id="why-existing-solutions-fell-short">Why Existing Solutions Fell Short</h3><p>At FlightAware, NetBox serves as the source of truth for infrastructure metadata—it tracks things like IP addresses, device roles, rack locations, and custom tags. However, in practice, we primarily use NetBox tags to store&nbsp;<em>salt roles</em>, which define a server’s configuration state but don’t offer a complete picture of the actual services running on that host.</p><p>This limited use of tags makes it difficult to quickly understand what a server is doing in production. Manually maintaining accurate service information is time-consuming, error-prone, and often out of sync with the real system state. Worse, many servers don’t have complete or reliable tags at all, which makes tasks like root cause analysis and triaging harder than they should be.</p><p>Previously, admins relied on a mix of tools to understand a server’s service set:</p><ul><li>SSHing into each host manually</li><li>For each server:<ul><li>List of containerized services</li><li>List of running processes</li><li>Disk usage, file mounts, cron jobs, etc.</li></ul></li><li>NetBox tags—if they exist and are up-to-date</li></ul><p>While these tools were helpful, they produced long, inconsistent, and noisy outputs. Not to mention, manually querying, parsing, and interpreting this data on a regular basis wasn’t an ideal solution by any means.</p><p>This challenge is also compounded by scale and velocity: FlightAware manages over 700 servers, and it can take several minutes to manually confirm the status of services depending on their nature. 
Over the last six months alone, approximately 6,700 changes were made to servers in NetBox—about 225 changes per week (this excludes other changes to hardware and network devices as well). Importantly, each change may require service status confirmation, making manual auditing impractical given the volume and frequency of updates.</p><h2 id="proposed-solution">Proposed Solution</h2><p>Imagine if a machine learning model could sift through these lengthy, noisy text outputs, recognize patterns, and accurately predict which services are running on a server. These predictions could then be validated against known configurations and reviewed by humans for accuracy.</p><p>This is precisely what my project — the Intelligent Tagging System (ITS) — aims to achieve. By leveraging natural language processing (NLP) and machine learning, ITS automatically classifies services based on data from servers. The project can be broken down into several key objectives:</p><ul><li>Extracting features from text output using natural language processing&nbsp;</li><li>Training machine learning models with the extracted features and existing NetBox&nbsp;data (i.e. 
existing tags)</li><li>Automatically tagging servers with service labels using predictions from ML models with specific levels of confidence</li><li>Updating and validating tags in a feedback loop, making the system self-improving and up to date over time</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2025-07-29-at-10.03.23---PM.png" class="kg-image" alt="" loading="lazy" width="748" height="412" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Screenshot-2025-07-29-at-10.03.23---PM.png 600w, https://flightaware.engineering/content/images/2025/07/Screenshot-2025-07-29-at-10.03.23---PM.png 748w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure 1: System Architecture of Intelligent Tagging System</span></figcaption></figure><h2 id="data-collection-and-preprocessing">Data Collection and Preprocessing</h2><p>To start, I needed a list of active servers at FlightAware. I obtained this by querying our organization’s NetBox instance through the NetBox API. Next, I built a lightweight tool to automate SSH access to those servers. After gaining access, I collected both structured and semi-structured data from each active server using remote shell commands. 
These commands focused on three key areas:&nbsp;<strong>system metadata</strong>,&nbsp;<strong>running processes</strong>, and&nbsp;<strong>application-related files</strong>.&nbsp;</p><p>Examples of the collected data include:</p><ul><li><strong>Containerized services</strong>:&nbsp;docker ps&nbsp;provided information on running and stopped Docker containers, including image names, commands, and statuses.</li><li><strong>Process listings</strong>:&nbsp;ps aux&nbsp;and&nbsp;systemctl list-units --type=service&nbsp;gave visibility into system-wide processes and active services.&nbsp;ps -eo pid,cmd&nbsp;and&nbsp;lsof&nbsp;helped trace running executables and their command-line arguments.</li><li><strong>Scheduled jobs</strong>: System-wide cron jobs from&nbsp;/etc/cron.*&nbsp;directories, and per-user cron jobs collected via&nbsp;crontab -u.</li><li><strong>Network activity</strong>:&nbsp;netstat -tlnp&nbsp;and&nbsp;ss -tuln&nbsp;showed which processes were listening on which ports, helping to infer exposed services.</li><li><strong>System metadata</strong>: Commands like&nbsp;uname -a,&nbsp;free -h,&nbsp;df -h, and&nbsp;cat /proc/cpuinfo&nbsp;were used to capture OS version, memory usage, disk space, and CPU model information.</li><li><strong>Configuration and application files</strong>: Service-related configuration files were discovered using&nbsp;find /etc -name '*.conf', and application directories were examined via ls&nbsp;/opt/,&nbsp;/usr/local/, and similar paths.</li></ul><p>Each host was also associated with salt roles stored as tags in NetBox. While these roles—used by&nbsp;SaltStack&nbsp;(an excellent and open-source automation tool we use for infrastructure administration) to define a server’s intended configuration—were generally accurate, they were often incomplete when it came to identifying all running services. To supplement this, server names and purposes were heuristically mapped to key service labels as well. 
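</p><p>As a rough illustration of that kind of name-based heuristic, here is a small Python sketch. The hostname patterns and labels are invented for the example; they are not FlightAware's actual mapping rules.</p>

```python
import re

# Hypothetical hostname patterns -> service labels (illustrative only).
HOSTNAME_RULES = [
    (re.compile(r"(^|-)db|postgres|pg\d", re.IGNORECASE), "database"),
    (re.compile(r"(^|-)web|nginx|proxy", re.IGNORECASE), "web-server"),
    (re.compile(r"(^|-)cron|batch", re.IGNORECASE), "scheduled-jobs"),
]

def labels_from_hostname(hostname: str) -> set:
    """Infer candidate service labels from a server's name alone."""
    return {label for pattern, label in HOSTNAME_RULES
            if pattern.search(hostname)}

print(labels_from_hostname("pg03.example.net"))  # {'database'}
```

A mapping like this is cheap and transparent, which is why it makes a useful supplement to (but not a replacement for) labels learned from actual system state.
<p>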
As a result, the labeling approach was hybrid: combining structured config data with pattern-based inference to generate useful labels for supervised learning.</p><p>The outputs were cleaned, normalized, and stored in a lightweight SQLite database. Where applicable, labels were augmented or corrected by hand to ensure quality in the training set.</p><h2 id="feature-extraction-with-nlp">Feature Extraction (with NLP)</h2><p>Rather than relying on handcrafted features, I treated system outputs as text documents and used NLP techniques to extract features:</p><ul><li><strong>TF-IDF</strong>&nbsp;(Term Frequency–Inverse Document Frequency): Captures important but uncommon tokens like service names or flags.</li><li><strong>CountVectorizer</strong>:&nbsp;Captures frequency of key terms across command outputs.</li><li><strong>word2vec</strong>: Captures semantic relationships between similar services or tasks.</li></ul><p>This allowed the model to learn patterns from raw outputs, without needing domain-specific knowledge.</p><h2 id="model-training">Model Training</h2><p>In our situation, where one item can belong to more than one category at the same time—like a server that acts both as a database and as an internal service—using a&nbsp;<strong>single-label classification model</strong>&nbsp;isn’t enough. Single-label models can only assign one category to each item, so they might incorrectly label a server as just a database or just a service, but never both.</p><p>Thus, I decided to use a multi-label classification model, which means the model can assign multiple labels to the same item. 
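</p><p>To make the multi-label idea concrete, here is a minimal sketch using scikit-learn (a natural fit for the models listed below). The command-output snippets and labels are toy data, not real server output:</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy stand-ins for collected command output and known NetBox tags.
docs = [
    "docker ps postgres:14 postgresql.service port 5432",
    "nginx master process worker port 80 443",
    "postgresql.service port 5432 nginx reverse proxy port 80",
    "crond sshd systemd-journald",
]
labels = [["database"], ["web-server"], ["database", "web-server"], []]

# One binary indicator column per service label.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# TF-IDF text features feeding one-vs-rest logistic regression:
# a separate classifier per label, so a host can receive several labels.
model = make_pipeline(TfidfVectorizer(),
                      OneVsRestClassifier(LogisticRegression()))
model.fit(docs, y)

probabilities = model.predict_proba(["systemd postgresql.service port 5432"])
```

Each column of <code>probabilities</code> is the model's confidence that the corresponding service is present; comparing each against a confidence threshold turns it into a yes/no tag.
<p>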
This allows it to identify multiple services on a host—such as databases and internally built services—simultaneously.</p><p>I tested the following models:</p><ul><li><strong>Multi-output Random Forest</strong>: Offers strong performance with high interpretability and robustness to noisy data.</li><li><strong>Gradient Boosting (XGBoost)</strong>: Handles complex feature interactions well and often yields high accuracy in structured data.</li><li><strong>Logistic Regression with One-vs-Rest (OvR)</strong>: A simple, lightweight baseline approach where a separate classifier is trained for each label.</li></ul><p>To evaluate performance, models were trained on 80% of the labeled servers, while the remaining 20% were held out as a validation set. This form of&nbsp;<strong>train/test split</strong>&nbsp;is the simplest type of hold-out validation, helping to estimate how well the model will generalize to unseen data. Held-out evaluation is important because it helps detect overfitting—where a model performs well on training data but poorly on new data.</p><h2 id="model-evaluation-and-tuning">Model Evaluation and Tuning</h2><p>The models were evaluated using a variety of metrics to capture performance:</p><ul><li><strong>F1-score (macro and micro)</strong>: To balance precision and recall across both common and rare service labels.</li><li><strong>Accuracy and AUC-ROC</strong>: Used where appropriate, though accuracy alone proved misleading (more on that below).</li><li><strong>Confusion matrices</strong>: Generated for each service to better understand where predictions were going wrong.</li></ul><p>To improve the model's performance, I experimented with different sets of service labels that the models predict to be running on the host systems. 
The goal was to find a balance between covering a wide range of services and keeping the labels simple enough for the model to learn effectively.</p><p>I also used a technique called&nbsp;RandomizedSearchCV, which is a fast way to test different combinations of&nbsp;<strong>hyperparameters</strong>&nbsp;— the settings that control how the model learns. One important hyperparameter I focused on was the&nbsp;<strong>confidence score threshold</strong>&nbsp;— the cutoff the model uses to decide whether it's confident enough to make a prediction for a specific service.</p><p>As shown below, changing this threshold had a noticeable effect on the accuracy of the&nbsp;TF-IDF&nbsp;model (the model whose features are built by turning text into numbers based on how important each word is in the dataset).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2025-07-29-at-10.04.46---PM.png" class="kg-image" alt="" loading="lazy" width="748" height="412" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Screenshot-2025-07-29-at-10.04.46---PM.png 600w, https://flightaware.engineering/content/images/2025/07/Screenshot-2025-07-29-at-10.04.46---PM.png 748w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure 2: A graph of Confidence Score Threshold vs. Average TF-IDF Model Accuracy</span></figcaption></figure><h3 id="challenges">Challenges&nbsp;</h3><p>One major challenge I encountered was&nbsp;<strong>overfitting</strong>, especially when working with a smaller set of labeled data. The model initially performed well on training data but failed to generalize, indicating it was memorizing rather than learning meaningful patterns. 
To address this, I expanded the dataset with more labeled examples, which helped improve generalization—but then I ran into another issue: the model was reporting suspiciously high accuracy scores, often around 99%.</p><p>It turned out that this was due to how I was calculating accuracy—by counting total label predictions, including negatives. Since most labels are negative (i.e., a service is&nbsp;<em>not</em>&nbsp;present on a given server), the model could achieve high accuracy simply by predicting zeros across the board. Once I corrected the metric to focus on correctly predicted positive labels, I saw a much more realistic accuracy of around 85%, which better reflected the model’s true performance. At that point, the model no longer showed signs of overfitting and began to generalize well to new data.</p><p>Another challenge was dealing with&nbsp;<strong>sparse data</strong>. Some services were so rare that the model couldn’t learn any reliable patterns to identify them. These infrequent labels added noise and lowered overall performance, so I removed them from the training set to help the model focus on patterns it could actually learn.</p><p>Lastly, I found that default model parameters—especially the confidence threshold for positive predictions—weren’t always appropriate. Initially, the classification threshold was set to 0.5, meaning the model would only predict a service as present if it was more than 50% confident. For certain services, this was too conservative and led to many missed positives. By lowering the threshold, I was able to capture more true positives without introducing too many false positives. 
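</p><p>Both effects just described, the accuracy score inflated by negative labels and the impact of the decision threshold, can be reproduced with a few lines of NumPy. The label matrix and probabilities here are invented purely for illustration:</p>

```python
import numpy as np

# Made-up ground truth (4 hosts x 5 services) and model probabilities.
y_true = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
proba = np.array([
    [0.62, 0.10, 0.08, 0.05, 0.02],
    [0.20, 0.45, 0.12, 0.03, 0.01],
    [0.55, 0.15, 0.40, 0.07, 0.04],
    [0.09, 0.11, 0.06, 0.02, 0.03],
])

def naive_accuracy(y_true, y_pred):
    # Counts every cell, so the many 0s ("service absent") dominate.
    return (y_true == y_pred).mean()

def positive_recall(y_true, y_pred):
    # Of the services actually present, how many did we find?
    return y_pred[y_true == 1].mean()

for threshold in (0.5, 0.35):
    y_pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"naive accuracy={naive_accuracy(y_true, y_pred):.2f}, "
          f"positive recall={positive_recall(y_true, y_pred):.2f}")
```

At the default 0.5 threshold the naive score looks great (0.90) even though half of the services actually present are missed; lowering the threshold to 0.35 recovers them in this toy example.
<p>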
This highlighted the importance of evaluating and adjusting default parameters rather than relying on them blindly.</p><h2 id="deployment-and-feedback-loop">Deployment and Feedback Loop</h2><p>Once the training and prediction pipeline was in place, I built a lightweight&nbsp;Flask&nbsp;frontend to tie everything together.&nbsp;Flask&nbsp;is a flexible web framework for Python that makes it easy to build web applications and APIs. Flask is often chosen for projects that need quick development without the overhead of a full-stack framework. It was a perfect fit for what was needed here. My interface allows users to:</p><ul><li>Monitor the status of the NetBox API connection</li><li>View all active servers and see when each was last queried</li><li>Collect and store system data and service labels from NetBox</li><li>Train machine learning models on the collected dataset</li><li>Push predicted and validated (and human-reviewed) service labels back into NetBox</li><li>Manually review and verify model predictions through a&nbsp;<strong>human-in-the-loop&nbsp;</strong>validation screen, ensuring service labels are accurate before updating the source of truth</li></ul><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2025-07-29-at-10.05.48---PM.png" class="kg-image" alt="" loading="lazy" width="748" height="452" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Screenshot-2025-07-29-at-10.05.48---PM.png 600w, https://flightaware.engineering/content/images/2025/07/Screenshot-2025-07-29-at-10.05.48---PM.png 748w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure 3: Flask User Interface Preview</span></figcaption></figure><p>The system forms a self-sustaining feedback loop: it collects data from servers, makes predictions, supports validation (automated or manual), pushes updated tags to NetBox, and continues 
learning as new patterns emerge.</p><p>This architecture enables the following capabilities:</p><ul><li><strong>Kubernetes deployment</strong>&nbsp;for scalable and containerized orchestration</li><li><strong>Continuous updates</strong>&nbsp;to service labels and periodic retraining of ML models</li><li><strong>Auto-scaling</strong>&nbsp;of the application based on usage and demand</li><li><strong>High availability and fault tolerance</strong>&nbsp;via managed replicas, with automatic restarts or container replacement to minimize downtime</li><li><strong>Historical tracking</strong>&nbsp;of past data snapshots and label configurations for auditing and retraining purposes</li></ul><h2 id="reflection">Reflection</h2><h3 id="what%E2%80%99s-next">What’s Next?</h3><p>While my project sufficiently demonstrated how machine learning could assist in automating service identification, there are still several potential improvements that could be made:</p><ul><li><strong>Explore More Advanced Models:</strong>&nbsp;While traditional ML models like Random Forest and Logistic Regression worked well, experimenting with transformer-based models (e.g., BERT) or recurrent neural networks could better capture contextual relationships in command outputs and improve classification, especially for more complicated types of software that are harder to identify.</li><li><strong>Explore Additional Use Cases:&nbsp;</strong>Currently, the model focuses solely on identifying services running on each host. However, this approach could be extended to automatically populate other custom fields in NetBox—such as the server’s purpose, environment (e.g., production, staging), or flags for enabling specific monitoring exports like&nbsp;cadvisor_exporter_enabled. 
Automating the tagging of these additional attributes could further reduce manual effort and improve data consistency.</li><li><strong>Enhance Documentation:&nbsp;</strong>Most of the project is well documented, which helps others understand and use the codebase. However, documentation can always be improved—for example, by adding diagrams to visually explain system architecture and workflows. Using “diagrams as code” tools (like Mermaid or PlantUML) to create and maintain diagrams directly alongside code is not a bad idea for the future.</li><li><strong>Improve Software Architecture for Maintainability:&nbsp;</strong>The software itself could be cleaner and easier to maintain for future developers. One improvement I could have made is designing the system to be more modular and extensible from the start. Even though I have made clear interfaces with separate concerns—such as data collection, feature extraction, model training, and prediction—they can be broken down even further. This approach would enhance the system’s maintainability and ability to be easily modified over time.</li></ul><h3 id="takeaways-and-acknowledgments">Takeaways and Acknowledgments</h3><p>This project gave me the opportunity to apply machine learning and software engineering principles to a real-world challenge—one with immediate, practical value to the team. I learned how to design and build scalable, production-friendly systems while balancing experimentation with reliability. Just as importantly, I learned how messy, noisy, and incomplete data often is in the real world, and how critical it is to approach problems with an unbiased view and open mind. 
From designing the initial architecture to deploying a working solution, this internship gave me a glimpse into what it takes to build tools that are not only smart, but actually usable by the engineers they’re meant to support.</p><p>I'm truly grateful for the support I received from my mentor Jay, my manager Justin, the Systems Crew, and everyone at FlightAware who made this experience so meaningful. Their guidance, encouragement, and patience helped me grow—not just as an engineer, but as a person. Thanks to them, I can move forward with confidence and excitement for what’s ahead. Thanks for reading!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/intern-2025/">My FlightAware Internship</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ FlightAware&#x27;s ADS-B Flight Tracking Network ]]></title>
        <description><![CDATA[ FlightAware uses multiple technologies to provide industry leading flight tracking.
One of the key ways this is achieved is through our network of aviation enthusiasts. These volunteers grow our ADS-B network ]]></description>
        <link>https://flightaware.engineering/flightawares-ads-b-flight-tracking-network/</link>
        <guid>https://flightaware.engineering/flightawares-ads-b-flight-tracking-network/</guid>
        <pubDate>Mon, 07 Jul 2025 11:54:32 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/07/image-20250610-181605.png" medium="image"/>
        <content:encoded><![CDATA[ <p>FlightAware uses multiple technologies to provide industry leading flight tracking.<br>One of the key ways this is achieved is through our network of aviation enthusiasts. By hosting a FlightAware FlightFeeder, these volunteers grow our ADS-B network which helps expand our tracking coverage, improves air traffic safety, and improves airline efficiency. In return, hosts receive benefits including: a complimentary FlightAware <a href="https://www.flightaware.com/commercial/premium/?ref=flightaware.engineering">Enterprise Account</a>, the ability to see all flights that their FlightFeeder sees, detailed stats about their site performance, and more.</p><h2 id="what-is-a-flightfeeder"><strong>What is a FlightFeeder?</strong></h2><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/image-20250501-200352.png" class="kg-image" alt="" loading="lazy" width="2000" height="1333" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/image-20250501-200352.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/07/image-20250501-200352.png 1000w, https://flightaware.engineering/content/images/size/w1600/2025/07/image-20250501-200352.png 1600w, https://flightaware.engineering/content/images/size/w2400/2025/07/image-20250501-200352.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">FlightAware - FlightFeeder Ver. H11</span></figcaption></figure><p>FlightAware FlightFeeders are devices that receive radio signals from ADS-B transponders that are then decoded into aircraft positions. In other words: it’s a device that tracks the planes around you. 
While FlightAware does sell variations of our FlightFeeders, we also provide them at no cost to qualified hosts.</p><p>Once a host has their FlightFeeder online, they will have the ability to start seeing real time flight positions on their own personal stats page. ADS-B messages are also transmitted to FlightAware, which are then used to provide the public with real time flight tracking information. The data is also used to help airlines and operators solve tough efficiency problems, resulting in more accurate flight time predictions and improved aviation safety overall.</p><h2 id="are-you-interested-in-becoming-a-host-how-does-it-all-work"><strong>Are you interested in becoming a host?</strong> <strong>How does it all work?</strong></h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/07/H9-Assemble-TOC-20250502-182058.jpg" class="kg-image" alt="" loading="lazy" width="398" height="517"></figure><h3 id="phase-1"><strong>Phase 1</strong></h3><p>The process of joining FlightAware’s ADS-B network begins with you <a href="https://www.flightaware.com/adsb/flightfeeder/?ref=flightaware.engineering">submitting a request</a> to see if you qualify for a free FlightFeeder.</p><p>To get the application started, a host will be asked to fill out some basic information such as their name and the proposed location where they will be installing their FlightFeeder. We also ask them to give a brief explanation of why they are interested in joining.</p><p>The proposed hosting location is very important, as it will help us determine potential coverage improvements based on existing network coverage, and the location's proximity to an airport. If the location is directly on an airport, or within 5 miles of one, a FlightFeeder will help provide a substantial increase in airport surface positions, which helps support improved flight time prediction accuracy. 
Another important consideration is how much existing flight coverage there is in your area. If you live in a remote area, where there is little or no existing flight coverage, installing a FlightFeeder can make a significant impact on the coverage. FlightFeeders installed in remote locations help to fill in gaps where there is no flight data in the network.</p><p>If you meet one or both of these criteria, there is a stronger likelihood that your request will be approved.</p><h3 id="phase-2"><strong>Phase 2</strong></h3><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/07/Filter-packaging-crop-20250502-181426.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="1418" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Filter-packaging-crop-20250502-181426.jpg 600w, https://flightaware.engineering/content/images/size/w1000/2025/07/Filter-packaging-crop-20250502-181426.jpg 1000w, https://flightaware.engineering/content/images/size/w1600/2025/07/Filter-packaging-crop-20250502-181426.jpg 1600w, https://flightaware.engineering/content/images/2025/07/Filter-packaging-crop-20250502-181426.jpg 2285w" sizes="(min-width: 720px) 720px"></figure><p>FlightAware reviews hundreds of requests a week. Applications that are determined to be a good fit for the network will receive a follow up email asking for some additional information.</p><p>In this second round of the review process, we focus on the antenna for their FlightFeeder. You will be provided a link where you can add location photos of where you plan to install the antenna. 
We recommend you choose a location with an unobstructed, clear view of the sky.</p><h3 id="congratulations"><strong>Congratulations!</strong></h3><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/07/antenna-filter-package-crop-20250502-181321.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="928" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/antenna-filter-package-crop-20250502-181321.jpg 600w, https://flightaware.engineering/content/images/size/w1000/2025/07/antenna-filter-package-crop-20250502-181321.jpg 1000w, https://flightaware.engineering/content/images/size/w1600/2025/07/antenna-filter-package-crop-20250502-181321.jpg 1600w, https://flightaware.engineering/content/images/2025/07/antenna-filter-package-crop-20250502-181321.jpg 2133w" sizes="(min-width: 720px) 720px"></figure><p>Once a FlightFeeder request has been approved, you will receive a welcome email, followed by a tracking number for the FlightFeeder. The package you receive will contain the FlightFeeder itself, a power supply, an antenna, a cable for connecting the FlightFeeder to the antenna, and a filter to help prevent interference. A FlightAware T-shirt that is exclusive for our FlightFeeder hosts will also be included as a thank you for hosting!</p><p>After you have received and installed your FlightFeeder, a free <a href="https://www.flightaware.com/commercial/premium/?ref=flightaware.engineering">Enterprise subscription</a> that includes personal flight tracking, historical data and map/weather customizations will be automatically activated on your account. 
The Enterprise account will remain free for you to use as long as you continue to host your FlightFeeder.</p><p>If you would like to host but do not live in a qualifying area, you can learn more about how you can contribute and become a part of the FlightAware host network <a href="https://www.flightaware.com/adsb/piaware/?ref=flightaware.engineering">here</a>. We also have an online store where you can purchase a FlightFeeder Pro (a similar model to the FlightFeeder).</p><h3 id="how-your-flightfeeder-makes-a-difference"><strong>How your FlightFeeder makes a difference</strong></h3><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2025-05-02-at-3.38.03---PM.png" class="kg-image" alt="" loading="lazy" width="1164" height="954" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Screenshot-2025-05-02-at-3.38.03---PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/07/Screenshot-2025-05-02-at-3.38.03---PM.png 1000w, https://flightaware.engineering/content/images/2025/07/Screenshot-2025-05-02-at-3.38.03---PM.png 1164w" sizes="(min-width: 720px) 720px"></figure><p>With your FlightFeeder, you will be able to see live, real-time data and statistics on your personalized stats page. Additionally, your data will be highlighted on FlightAware’s track logs.</p><p>In areas where coverage is needed the most, the difference one feeder can make is remarkable. The examples below show the dramatic improvements made in surface tracking at airports where there was no existing coverage. 
You can see how just one FlightFeeder installed close to an airport provided complete real-time surface coverage.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2025-04-24-at-12.41.58---PM-1.png" class="kg-image" alt="" loading="lazy" width="587" height="835"><figcaption><span style="white-space: pre-wrap;">KMEM Before</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2024-01-24-at-2.46.03-PM.png" class="kg-image" alt="" loading="lazy" width="1008" height="1458" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Screenshot-2024-01-24-at-2.46.03-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/07/Screenshot-2024-01-24-at-2.46.03-PM.png 1000w, https://flightaware.engineering/content/images/2025/07/Screenshot-2024-01-24-at-2.46.03-PM.png 1008w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">KMEM coverage with a FlightFeeder installed</span></figcaption></figure><p>Each time an ADS-B position is received from an aircraft on the ground, a blue marker is placed in that location. You will notice in the “before” photos that there are almost no blue markers at all. 
But in the “after” photos, the surface coverage is so dense that the blue markers have turned into lines that clearly visualize every movement an aircraft makes on the ground at that airport.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2023-07-28-at-2.17.41-PM.png" class="kg-image" alt="" loading="lazy" width="1154" height="380" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Screenshot-2023-07-28-at-2.17.41-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/07/Screenshot-2023-07-28-at-2.17.41-PM.png 1000w, https://flightaware.engineering/content/images/2025/07/Screenshot-2023-07-28-at-2.17.41-PM.png 1154w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">KEGE Before</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/07/Screenshot-2023-08-17-at-4.04.19-PM.png" class="kg-image" alt="" loading="lazy" width="2000" height="592" srcset="https://flightaware.engineering/content/images/size/w600/2025/07/Screenshot-2023-08-17-at-4.04.19-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/07/Screenshot-2023-08-17-at-4.04.19-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2025/07/Screenshot-2023-08-17-at-4.04.19-PM.png 1600w, https://flightaware.engineering/content/images/2025/07/Screenshot-2023-08-17-at-4.04.19-PM.png 2088w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">KEGE Coverage with a FlightFeeder installed</span></figcaption></figure><p>Please consider <a href="https://blog.flightaware.com/tag/ads-b-flight-tracking?ref=flightaware.engineering">signing up</a> for our newsletter to be notified about ADS-B news!<br></p> 
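<p>The coverage images above come down to a simple rule: every on-ground ADS-B position report becomes a marker. A rough sketch of that filtering step (the types and field names here are invented for illustration, not FlightAware's actual pipeline):</p>

```go
package main

import "fmt"

// Position is a hypothetical, simplified ADS-B position report.
type Position struct {
	Lat, Lon float64
	OnGround bool
}

// groundMarkers keeps only on-ground reports, which is all the
// surface-coverage view needs to place its blue markers.
func groundMarkers(reports []Position) []Position {
	var markers []Position
	for _, r := range reports {
		if r.OnGround {
			markers = append(markers, r)
		}
	}
	return markers
}

func main() {
	reports := []Position{
		{Lat: 35.0424, Lon: -89.9767, OnGround: true},  // taxiing at KMEM
		{Lat: 35.1200, Lon: -89.8000, OnGround: false}, // airborne, ignored here
	}
	fmt.Println(len(groundMarkers(reports))) // prints 1
}
```

<p>With a feeder near the airport, these on-ground reports arrive densely enough that the individual markers blur into continuous taxi lines.</p>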
        <br>
        <p>
            <a href="https://flightaware.engineering/flightawares-ads-b-flight-tracking-network/">FlightAware&#x27;s ADS-B Flight Tracking Network</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Falsehoods Programmers Believe About Aviation ]]></title>
        <description><![CDATA[ There are a lot of assumptions one could make when designing data types and schemas for aviation data that turn out to be inaccurate. In the spirit of Patrick McKenzie’s classic piece on names, here are some false assumptions one might make about aviation. ]]></description>
        <link>https://flightaware.engineering/falsehoods-programmers-believe-about-aviation/</link>
        <guid>https://flightaware.engineering/falsehoods-programmers-believe-about-aviation/</guid>
        <pubDate>Mon, 02 Jun 2025 11:59:07 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/05/yousef-alfuhigi-bMIlyKZHKMY-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>At FlightAware, our software needs to gracefully handle all sorts of weird and wonderful situations. While we as engineers might hope for aviation data to be clean and well-standardized, the real world is messy.</p><p>There are a lot of assumptions one could make when designing data types and schemas for aviation data that turn out to be inaccurate. In the spirit of <a href="https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/?ref=flightaware.engineering">Patrick McKenzie’s classic piece on names</a>, here are some false assumptions one might make about aviation. While many of these are simply common misconceptions, some of these assumptions have bitten our customers at various points, and others have caused issues in our own systems over the years.</p><p>Together they are illustrative of the situations that Hyperfeed, our flight tracking engine, is responsible for correctly interpreting in order to provide a clean and consistent data feed for our website, apps, and APIs.</p><h1 id="flights">Flights</h1><ul><li>Flights depart from a gate</li><li>Flights that depart from a gate <a href="https://www.flightaware.com/live/flight/AFR1/history/20250514/2040Z/KJFK/LFPG?ref=flightaware.engineering">only leave their gate once</a></li><li>Flights depart within a few hours of the time they were scheduled to</li><li>Flights depart <a href="https://www.flightaware.com/live/flight/PDT5965/history/20250508/2224Z/KCHO/KCLT?ref=flightaware.engineering">within a day</a> of the time they were scheduled to</li><li>Flights have schedules</li><li>Flights take off and land <a href="https://www.flightaware.com/live/flight/N144NE/history/20250518/1747Z/KPSM/L%2042.98589%20-71.12891?ref=flightaware.engineering">at airports</a></li><li>Airplanes (excluding helicopters) take off and land at airports</li><li>Flights are at most <a 
href="https://www.flightaware.com/live/flight/SIA21/history/20250516/1345Z/KEWR/WSSS?ref=flightaware.engineering">a dozen or so hours long</a></li><li>Okay, they’re at most <a href="https://www.flightaware.com/live/flight/HBAL812/history/20190717/1738Z?ref=flightaware.engineering">a few days long</a></li><li>Flights are identified by a flight number consisting of an airline’s code plus some numbers, like UAL1234</li><li>Flights <a href="https://www.flightaware.com/live/flight/C6031/history/20250521/1752Z/KBID/KFMH?ref=flightaware.engineering">are identified by either</a> an airline flight number like UAL1234, or the aircraft’s registration like N12345, B6459, or FHUVL</li><li>A flight identifier like B6459 is unambiguously either a registration (<a href="https://www.flightaware.com/live/flight/B6459?ref=flightaware.engineering" rel="noreferrer">B–6459</a>), an airline flight number (<a href="https://www.flightaware.com/live/flight/JBU459?ref=flightaware.engineering" rel="noreferrer">B6 459</a>), or something else</li><li>Flights don’t have <a href="https://en.wikipedia.org/wiki/Change_of_gauge_(aviation)?ref=flightaware.engineering">multiple flight numbers</a></li><li>Flights with multiple flight numbers unambiguously have one “main” flight number</li><li>A particular trip’s flight number(s) <a href="https://web.archive.org/web/20230328124705/https://community.southwest.com/t5/Blog/The-Science-behind-Flight-Numbers/ba-p/42760">never change</a></li><li>The flight number shown on your ticket <a href="https://www.eurocontrol.int/service/call-sign-similarity-service?ref=flightaware.engineering">is what the pilots and air traffic control are using</a></li><li>Flights don’t use the code of some entirely unrelated airline in their flight identifier</li><li>No flights use the same flight number within a day</li><li>Surely at least no flights use the same flight number at the same time?</li><li>Okay fine, separate flights from the same major passenger airline that depart 
within a few minutes of each other would not <a href="https://www.flightaware.com/live/flight/AAL2586/history/20250509/1935Z/TBPB/KCLT?ref=flightaware.engineering">both</a> have the <a href="https://www.flightaware.com/live/flight/AAL2586/history/20250508/1935Z/TBPB/KCLT?ref=flightaware.engineering">same</a> flight number… right?</li></ul><h1 id="airports">Airports</h1><ul><li>Airports <a href="https://en.wikipedia.org/wiki/Atat%C3%BCrk_Airport?ref=flightaware.engineering#Closure">never move</a></li><li>Terminal and gate numbers have a consistent naming scheme</li><li>Each runway is <a href="https://en.wikipedia.org/wiki/Hickam_Air_Force_Base?ref=flightaware.engineering">only used by one</a> airport </li><li>Airports always have two unique identifiers: a 4-letter Civil Aviation Organization (ICAO) code and a 3-letter International Air Transport Association (IATA) code</li><li>Airports always have three unique identifiers: an ICAO, an IATA, and a regionally-administered location code</li><li>The U.S. Department of Transportation <a href="https://www.bts.gov/topics/airlines-and-airports/world-airport-codes?ref=flightaware.engineering">assigns one</a> canonical code to <a href="https://www.faa.gov/air_traffic/flight_info/aeronav/aero_data/loc_id_search/Encodes_Decodes/?ref=flightaware.engineering">each airport</a> it oversees</li><li>No airports have <a href="https://en.wikipedia.org/wiki/EuroAirport_Basel_Mulhouse_Freiburg?ref=flightaware.engineering">multiple IATA codes</a></li><li>The ICAO code for airports in the U.S. <a href="https://www.flightaware.com/live/airport/PANC?ref=flightaware.engineering">always starts with the letter K</a></li><li>For U.S. 
airports whose ICAO code starts with K, the <a href="https://en.wikipedia.org/wiki/McClellan%E2%80%93Palomar_Airport?ref=flightaware.engineering">last three letters</a> are its IATA code</li><li>You can tell which <a href="https://www.flightaware.com/live/airport/NZIR?ref=flightaware.engineering">geographic region</a> an airport is in from its ICAO code</li><li>Everything that has an IATA code <a href="https://en.wikipedia.org/wiki/List_of_IATA-indexed_railway_stations,_bus_stations_and_ferry_terminals?ref=flightaware.engineering">is an airport</a></li><li>Everything that has an ICAO code <a href="https://en.wikipedia.org/wiki/Jezero_(crater)?ref=flightaware.engineering">is on Earth</a></li><li>Airports have at least one well-known identifier of some sort</li></ul><h1 id="airlines">Airlines</h1><ul><li>No <a href="https://en.wikipedia.org/wiki/SkyJet_Airlines?ref=flightaware.engineering">two</a> airlines <a href="https://en.wikipedia.org/wiki/Euroavia_Airlines?ref=flightaware.engineering">share</a> the same <a href="https://en.wikipedia.org/wiki/Airline_codes?ref=flightaware.engineering#IATA_airline_designator">IATA code</a></li><li>No <a href="https://en.wikipedia.org/wiki/EasyJet_UK?ref=flightaware.engineering">airlines</a> use <a href="https://en.wikipedia.org/wiki/EasyJet_Europe?ref=flightaware.engineering">multiple</a> IATA or ICAO <a href="https://en.wikipedia.org/wiki/EasyJet_Switzerland?ref=flightaware.engineering">codes</a></li><li>You can tell <a href="https://en.wikipedia.org/wiki/Aircraft_lease?ref=flightaware.engineering#Wet_lease">what airline is operating a flight</a> by looking at the physical aircraft</li><li>Airlines assign flight numbers to specific routes</li><li>Airlines only assign flight numbers to <a href="https://en.wikipedia.org/wiki/Codeshare_agreement?ref=flightaware.engineering">flights they operate</a></li><li>Airlines only assign flight numbers to <a 
href="https://www.flyertalk.com/forum/air-france-frequence-plus/1325488-how-fly-mlh-bsl.html?ref=flightaware.engineering">flights</a></li></ul><h1 id="navigation">Navigation</h1><ul><li>Waypoint names are unique</li><li>There is one <a href="https://en.wikipedia.org/wiki/Altitude?ref=flightaware.engineering#In_aviation">agreed-upon definition of altitude</a></li><li>Flight information from Air Navigation Service Providers is accurate</li><li>Okay, <em>pretty</em> accurate; they wouldn’t indicate that a flight had departed unless it really had</li><li>If they indicate that a flight plan has been cancelled, then that flight definitely isn’t going to operate — it wouldn’t simply be due to someone editing the flight plan</li><li>At least their radar data accurately identifies each aircraft</li><li>Radars with overlapping coverage areas agree on the location of a target they can both see</li><li>If they send us a flight plan with the ICAO identifier of a known airport as the destination, then there must have been some intention of arriving there</li><li>If an aircraft diverts to another destination, it won’t <a href="https://www.flightaware.com/live/flight/AAL1372/history/20250516/1410Z/KMIA/KRIC?ref=flightaware.engineering">divert again</a></li></ul><h1 id="transponders-and-ads-b">Transponders and ADS-B</h1><ul><li>ADS-B messages only come from aircraft</li><li>ADS-B messages only come from aircraft and airport service vehicles</li><li>ADS-B messages only come from vehicles of some kind</li><li>The GPS position in ADS-B messages <a href="https://en.wikipedia.org/wiki/Dilution_of_precision_(navigation)?ref=flightaware.engineering">is accurate</a></li><li>The GPS position in ADS-B messages is accurate <a href="https://en.wikipedia.org/wiki/Spoofing_attack?ref=flightaware.engineering#Global_navigation_satellite_system_spoofing">within some known uncertainty</a> radius</li><li>ADS-B messages always include the correct flight identification</li><li>Transponders are 
correctly programmed to indicate the aircraft type (helicopter, airplane, balloon, etc.)</li><li>You can always determine an aircraft’s registration number from its ADS-B messages</li><li>Transponders are programmed with the correct Mode S address</li><li>All of the transponders on a single aircraft are programmed with the same Mode S address</li><li>Nobody will ever set their flight identification to weird things like NULL</li><li>People will remember to update the transponder when the aircraft’s registration changes</li><li>ADS-B messages are always received exactly as they were transmitted</li><li>No one ever transmits false ADS-B messages</li><li>Transponders never break and rodents never chew through cables</li></ul><hr><p>Thanks to my colleagues who contributed to or reviewed this collection of falsehoods: Mark Duell, Paul Durandt, Karina Elizondo, Matt Higgins, Thomas Kyanko, Nathan Reed, and Amy Szczepanski.</p> 
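<p>Several of the identifier falsehoods above come down to ambiguous parsing, like the B6459 example. Here is a stdlib-only sketch of why schema-level disambiguation fails; the patterns are deliberately simplified illustrations, not real-world matching rules:</p>

```go
package main

import (
	"fmt"
	"regexp"
)

// Deliberately simplified patterns -- real identifier rules are far messier.
var (
	// A 2-character IATA airline code (letters or digits) plus a flight
	// number, e.g. JetBlue's "B6 459" written as B6459.
	iataFlight = regexp.MustCompile(`^[A-Z0-9]{2}[0-9]{1,4}$`)
	// A registration such as China's B-6459, often written without the hyphen.
	registration = regexp.MustCompile(`^B[0-9]{4}$`)
)

// classify returns every plausible reading of an identifier.
func classify(ident string) []string {
	var readings []string
	if iataFlight.MatchString(ident) {
		readings = append(readings, "airline flight number")
	}
	if registration.MatchString(ident) {
		readings = append(readings, "registration")
	}
	return readings
}

func main() {
	// Both patterns match: a string field alone cannot tell you which
	// reading is correct -- you need context from the rest of the message.
	fmt.Println(classify("B6459"))
}
```

<p>The takeaway for schema design: store the identifier together with the context that disambiguates it, rather than trying to infer its type from the string itself.</p>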
        <br>
        <p>
            <a href="https://flightaware.engineering/falsehoods-programmers-believe-about-aviation/">Falsehoods Programmers Believe About Aviation</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Part II: Using Mockery to generate Mocks for Testing in Golang ]]></title>
        <description><![CDATA[ In Part I, we talked about mocking some of our functionality using pgxmock. This version of the article replaces the manual mocking approach with mockery, a tool that automates the generation of mocks. ]]></description>
        <link>https://flightaware.engineering/part-ii-using-mockery-to-generate-mocks-for-testing-in-golang/</link>
        <guid>https://flightaware.engineering/part-ii-using-mockery-to-generate-mocks-for-testing-in-golang/</guid>
        <pubDate>Mon, 05 May 2025 12:08:10 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/04/altumcode-oZ61KFUQsus-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <p>In <a href="https://flightawareengineering.ghost.io/ghost/?ref=flightaware.engineering#/editor/post/65ce67ed8b5c1e0001b99c42" rel="noreferrer">Part I</a>, we talked about mocking some of our functionality using <code>pgxmock</code>. In Part II, we will replace the manual approach with <code>mockery</code>, a tool that automates the generation of mocks. This post keeps the same structure and intent as Part I while focusing on the benefits and usage of <code>mockery</code>.</p><h2 id="why-mock-the-database">Why Mock the Database?</h2><p>When writing unit tests, we want to test the logic of our code in isolation. This means we should avoid dependencies on external systems like databases. By mocking the database, we can simulate different scenarios (e.g., successful queries, errors, etc.) without needing a real database connection.</p><h2 id="introducing-mockery">Introducing Mockery</h2><p><code>Mockery</code> is a popular Go tool that generates mocks for interfaces. It automates the process of creating mock implementations, saving us time and ensuring consistency in our tests. 
With <code>mockery</code>, we can easily generate mocks for our database interfaces and use them in our unit tests.</p><h2 id="example-mocking-a-database-interface">Example: Mocking a Database Interface</h2><p>Let’s walk through an example of how we use <code>mockery</code> to mock a database interface in our unit tests.</p><h3 id="step-1-define-the-database-interface">Step 1: Define the Database Interface</h3><p>We can reuse the same interface from Part I:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.24.53-AM-1.png" class="kg-image" alt="" loading="lazy" width="1296" height="1506" srcset="https://flightaware.engineering/content/images/size/w600/2025/04/Screenshot-2025-04-30-at-11.24.53-AM-1.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/04/Screenshot-2025-04-30-at-11.24.53-AM-1.png 1000w, https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.24.53-AM-1.png 1296w" sizes="(min-width: 720px) 720px"></figure><h3 id="step-2-generate-mocks-with-mockery">Step 2: Generate Mocks with Mockery</h3><p>To generate a mock for the <code>PgxConnIface</code> interface, we use the <code>mockery</code> command-line tool. First, install <code>mockery</code> if you haven’t already:</p><p><code>go install github.com/vektra/mockery/v2@latest</code></p><p>Next, run <code>mockery</code> to generate the mock:</p><p><code>mockery --name=PgxConnIface --output=./mocks --outpkg=mocks --filename=pgx_conn_mock.go</code></p><p>This command generates a mock implementation of the <code>PgxConnIface</code> interface in the <code>mocks</code> directory. 
The generated file will be named <code>pgx_conn_mock.go</code> and will look something like this:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.27.12-AM-1.png" class="kg-image" alt="" loading="lazy" width="758" height="1418" srcset="https://flightaware.engineering/content/images/size/w600/2025/04/Screenshot-2025-04-30-at-11.27.12-AM-1.png 600w, https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.27.12-AM-1.png 758w" sizes="(min-width: 720px) 720px"></figure><h3 id="step-3-write-unit-tests-using-the-mock">Step 3: Write Unit Tests Using the Mock</h3><p>Now that we have a mock implementation of <code>PgxConnIface</code>, we can use it in our unit tests. Here’s an example of how we might test functionality that depends on <code>PgxConnIface</code>:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.29.22-AM-1.png" class="kg-image" alt="" loading="lazy" width="766" height="1438" srcset="https://flightaware.engineering/content/images/size/w600/2025/04/Screenshot-2025-04-30-at-11.29.22-AM-1.png 600w, https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.29.22-AM-1.png 766w" sizes="(min-width: 720px) 720px"></figure><p>In this test:</p><ol><li>We create a new mock <code>PgxConnIface</code> using the generated <code>mocks.PgxConnIface</code> struct.</li><li>We set up expectations for the <code>GetFlightByID</code> method using <code>mockRepo.On</code>.</li><li>We call the method under test (<code>GetFlightByID</code>) and assert the results.</li><li>Finally, we verify that the mocks were called as expected using <code>mockConn.AssertExpectations(t)</code> and <code>mockRow.AssertExpectations(t)</code>.</li></ol><h3 id="step-4-test-edge-cases">Step 4: Test Edge Cases</h3><p>One of the benefits of mocking is the ability to easily test edge cases. 
For example, we can simulate a database error to ensure our code handles it gracefully:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.30.15-AM.png" class="kg-image" alt="" loading="lazy" width="758" height="612" srcset="https://flightaware.engineering/content/images/size/w600/2025/04/Screenshot-2025-04-30-at-11.30.15-AM.png 600w, https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-11.30.15-AM.png 758w" sizes="(min-width: 720px) 720px"></figure><h2 id="conclusion">Conclusion</h2><p>Using <code>mockery</code> to generate mocks for our database interfaces streamlines the unit testing process. It allows us to easily simulate different database scenarios and ensures our tests are isolated from external dependencies. By following the steps outlined in this article, you can start using <code>mockery</code> in your own Go projects to write more reliable and maintainable unit tests.</p><p>Happy testing!</p> 
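<p>To summarize what <code>mockery</code> automates, here is a stdlib-only, hand-rolled equivalent of the pattern. The interface and names below are simplified stand-ins (the article's actual code appears in the screenshots), but the shape is the same: canned return values, call recording, and code under test that only sees the interface.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// FlightStore is a simplified stand-in for a database interface
// like the article's PgxConnIface.
type FlightStore interface {
	GetFlightByID(id int) (string, error)
}

// MockFlightStore is what a generated mock boils down to:
// canned return values plus a record of how it was called.
type MockFlightStore struct {
	Flights map[int]string
	Calls   []int
}

func (m *MockFlightStore) GetFlightByID(id int) (string, error) {
	m.Calls = append(m.Calls, id)
	if ident, ok := m.Flights[id]; ok {
		return ident, nil
	}
	return "", errors.New("flight not found") // simulated DB error path
}

// DescribeFlight is the code under test: it only depends on the
// interface, so it works identically with a real store or a mock.
func DescribeFlight(s FlightStore, id int) string {
	ident, err := s.GetFlightByID(id)
	if err != nil {
		return "unknown flight"
	}
	return "flight " + ident
}

func main() {
	mock := &MockFlightStore{Flights: map[int]string{1: "UAL1234"}}
	fmt.Println(DescribeFlight(mock, 1)) // happy path
	fmt.Println(DescribeFlight(mock, 2)) // simulated error path
	fmt.Println(mock.Calls)              // the mock records each call
}
```

<p><code>mockery</code> generates this boilerplate for you (with testify's <code>On</code>/<code>Return</code>/<code>AssertExpectations</code> machinery on top), which is exactly why it scales better than writing mocks by hand.</p>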
        <br>
        <p>
            <a href="https://flightaware.engineering/part-ii-using-mockery-to-generate-mocks-for-testing-in-golang/">Part II: Using Mockery to generate Mocks for Testing in Golang</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ FlightAware’s Documentation and leveraging it for Customer Success ]]></title>
        <description><![CDATA[ This month, Sales is contributing to Angle of Attack in partnership with Engineering! Come learn how our teams provide accessible, reliable documentation to our users. ]]></description>
        <link>https://flightaware.engineering/flightawares-documentation-and-leveraging-it-for-customer-success/</link>
        <guid>https://flightaware.engineering/flightawares-documentation-and-leveraging-it-for-customer-success/</guid>
        <pubDate>Mon, 07 Apr 2025 12:03:30 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/04/sigmund-AIIC6wCqkQc-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <p>This month, Sales is contributing to Angle of Attack in partnership with Engineering!&nbsp;</p><p>At FlightAware, a key uniting priority across both our engineering and commercial teams is ensuring our customers have every resource they need to succeed.&nbsp;In this blog post, I’ll detail how our teams provide accessible, reliable documentation to our users, whether you’re browsing our products or actively working on integrating one of our APIs.&nbsp;</p><h3 id="flightaware%E2%80%99s-apis"><strong>FlightAware’s APIs</strong></h3><p>For the purposes of this post, I’ll be focusing specifically on documentation for FlightAware’s suite of robust APIs, each designed to serve aviation data needs, including:&nbsp;</p><ul><li>AeroAPI – our modern query-based solution, providing customized access to both real-time and historical data for any application via a RESTful interface that is flexible and easy to integrate at any scale.&nbsp;</li><li>Firehose – our streaming-based, enterprise-grade solution that delivers high-volume, real-time flight tracking data directly to customers via TCP socket in a JSON format, ideal for consuming large amounts of data as quickly as possible.</li></ul><h2 id="why-our-documentation-is-pivotal"><strong>Why Our Documentation is Pivotal </strong></h2><h3 id="availability-of-documentation"><strong>Availability of Documentation</strong></h3><p>Both of our APIs offer publicly facing documentation available to anyone online at FlightAware.com using the links below; no need to be a customer, and no login is required:&nbsp;</p><ul><li><a href="https://www.flightaware.com/aeroapi/portal/documentation?ref=flightaware.engineering" rel="noreferrer">AeroAPI Documentation</a></li><li><a href="https://www.flightaware.com/commercial/firehose/documentation?ref=flightaware.engineering" rel="noreferrer">Firehose Documentation</a></li></ul><p>We’re confident that providing well-maintained 
documentation that’s easily accessible helps users identify solutions to their needs independently – so we make it directly available! This also lets customers confirm, at their leisure, that FlightAware can provide the aviation data for a given use case. Transparency and ease of access to our product information are a necessity from our perspective.&nbsp;</p><h3 id="enabling-easy-timely-integration"><strong>Enabling Easy &amp; Timely Integration&nbsp;</strong></h3><p>For each of our respective APIs, FlightAware provides detailed guides and examples, helping developers quickly get up to speed with our products. This ranges from basic product information, such as the available data types, to exact messages and available queries, as well as collections of sample applications to help users integrate faster.&nbsp;</p><p>AeroAPI and Firehose, while different in terms of data delivery, are designed to enable quick connectivity and data access. For example, AeroAPI allows for self-service access tiers that can be active immediately after sign-up. By providing detailed information in the documentation, you can prepare to integrate and begin developing as quickly as you’d like.&nbsp;</p><h3 id="comprehensive-specifications"><strong>Comprehensive Specifications&nbsp;</strong></h3><p>One of my favorite things to tell users about our documentation is that everything is available online, and that includes in-depth technical specifications, whether it’s all of the available queries in AeroAPI, including payload/response structure examples, or the commands available in Firehose once you establish a connection.&nbsp;</p><p>In both products the schema is provided using an open standard and that same schema powers our documentation. Firehose uses JSON Schema for all its possible messages, and this populates our message documentation. AeroAPI uses an OpenAPI (Swagger) spec file to power its documentation portal. 
In both cases this means product updates are documentation updates, and the schema files are freely available. Internally we use the very same schemas to validate any prospective changes to the interfaces so we can be confident what’s documented is the same as what’s delivered.&nbsp;</p><h3 id="applicable-examples"><strong>Applicable Examples</strong></h3><p>We’ve designed our pages to address most questions our broad range of users might have, but sometimes, an example is the most helpful thing to illustrate what our products are capable of. This is best reflected in our interactive documentation for AeroAPI, where, using an AeroAPI key, you can make actual requests via our UI and receive the same results you would get once you actually establish a connection to the API, enabling you not only to verify which query will work best for your use case, but also to test how comprehensive our data sets are.&nbsp;</p><p>The Firehose Firestarter suite goes as far as to model a complete collection of services that acts as a template for quickly starting with Firehose data. Many concepts for how to maintain an ideal connection and process messages are presented as readable source code. It also functions as a starting point for architecting a formal solution with preferred languages and tools. The goal is to provide as much of a turn-key experience as is reasonably possible.</p><h3 id="consistent-updates-accuracy"><strong>Consistent Updates &amp; Accuracy</strong></h3><p>Our APIs are constantly evolving; just look at AeroAPI’s Revision History. With that evolution, it’s critical that we update our documentation promptly and accurately with every release, and prior to product releases, our team ensures that documentation will be updated accordingly. 
Our webpages and documentation are always the first resources at FlightAware to contain the most recent information about each of our products, and we recommend them as your first stop for information pertaining to our APIs.&nbsp;</p><h3 id="tips-for-success"><strong>Tips for Success&nbsp;</strong></h3><ul><li>Check out <a href="https://github.com/flightaware/aeroapps?ref=flightaware.engineering" rel="noreferrer">AeroApps</a>, our small collection of backend sample applications to help you get started with AeroAPI and <a href="https://github.com/flightaware/firestarter?ref=flightaware.engineering" rel="noreferrer">Firestarter</a>, our development accelerator that makes it easier than ever to integrate data from Firehose.&nbsp;</li><li>Always review Revision History to ensure you’re leveraging all of our capabilities! By staying up to date with API enhancements, customers can continuously improve their applications and leverage new capabilities to stay competitive.&nbsp;</li><li>Explore our Discussion Boards, and view Q&amp;A regarding our APIs – you can even ask a question yourself! API customers can also collaborate here or share advice from their experience with us.&nbsp;</li><li>Review the FAQ for each API.&nbsp;</li></ul><h3 id="closing"><strong>Closing&nbsp;</strong></h3><p>In the future, we’ll continue to provide industry-leading resources for our customers. We believe that great API documentation is not a static asset, but rather a dynamic tool that grows alongside our consistently improving products and our customers’ needs. 
By continuously refining and expanding our documentation, we’re empowering users to innovate faster, operate more efficiently, and create compelling aviation solutions.&nbsp;</p><p>If you haven’t yet visited our Documentation Centers for each of our APIs, use the links below to discover how you may be able to leverage FlightAware data to power your next project or product!</p><ul><li><a href="https://www.flightaware.com/aeroapi/portal/documentation?ref=flightaware.engineering" rel="noreferrer">AeroAPI Documentation</a></li><li><a href="https://www.flightaware.com/commercial/firehose/documentation?ref=flightaware.engineering" rel="noreferrer">Firehose Documentation</a></li></ul> 
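<p>To give a flavor of the schema-driven approach described above, a JSON Schema fragment for a position-style message might look like the sketch below. The field names are invented for illustration only; the real, authoritative schemas are available from the documentation links.</p>

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "ExamplePositionMessage",
  "type": "object",
  "required": ["ident", "lat", "lon"],
  "properties": {
    "ident": { "type": "string", "description": "Flight identifier, e.g. UAL1234" },
    "lat": { "type": "number", "minimum": -90, "maximum": 90 },
    "lon": { "type": "number", "minimum": -180, "maximum": 180 }
  }
}
```

<p>Because a schema like this both renders the documentation and validates interface changes, the docs cannot silently drift from the delivered messages.</p>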
        <br>
        <p>
            <a href="https://flightaware.engineering/flightawares-documentation-and-leveraging-it-for-customer-success/">FlightAware’s Documentation and leveraging it for Customer Success</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Blast from the past: A new iOS map component for FlightAware ]]></title>
        <description><![CDATA[ This blog post was originally published in November 2024. We’re publishing it again now because we thought it would be helpful to highlight how and why our iOS map component leverages Apple’s MapKit framework and Apple’s base map tiles as part of the overall solution. ]]></description>
        <link>https://flightaware.engineering/blast-from-the-past-a-new-ios-map-component-for-flightaware/</link>
        <guid>https://flightaware.engineering/blast-from-the-past-a-new-ios-map-component-for-flightaware/</guid>
        <pubDate>Mon, 03 Mar 2025 16:43:00 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/03/dennis-kummer-52gEprMkp7M-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <blockquote><em>This blog post was originally published in November 2024. We’re publishing it again now because we thought it would be helpful to highlight how and why our iOS map component leverages Apple’s MapKit framework and Apple’s base map tiles as part of the overall solution. </em></blockquote><p>In early 2024, we released a rewritten map component for our iOS app, along with <a href="https://go.flightaware.com/integratedmap?ref=flightaware.engineering"><u>a version packaged as an SDK</u></a> for external customers. This release marked a major step in our journey to modularize our iOS app, unlocking benefits of code reuse, separation of concerns, and reduced overhead for maintenance and new feature development. This blog post will cover some of the design choices we made for the new map and lessons we learned along the way.</p><p>The map is at the center of the FlightAware app, serving as both a data display and an important piece of the navigation experience. It’s interactive and adaptable, changing to show the most relevant information for each screen in the app. Our map is faced with the difficult task of presenting a large amount of information in a way that’s both accessible at a glance and provides detail for users who want to dig in. Its interactivity and tight integration with many screens in our app give it an outsized impact on the way people use the app and how satisfied they are with the experience. In many ways, the map component is central to the use cases many users have for our app. It was challenging and rewarding to work on something that impacts so many users so directly.</p><p>We embarked on a project to rewrite the map component for several related reasons. Most importantly, we wanted to offer a flight mapping SDK as a product so that external customers could integrate FlightAware data and maps into their apps with just a few lines of code. 
We’re a small iOS team and don’t have the capacity to maintain multiple mapping solutions, so this external SDK would have to be produced from the same codebase we use for the map in our app.</p><p>We had a map component in our app already, but it had grown organically over 5-7 years and was deeply intertwined with a lot of app-specific code. It would take a lot of effort to extract it from the app, and we’d be left with something that still had significant technical debt and was tightly coupled to our internal REST API. Additionally, the old map had some architecture decisions that would be very difficult to change, such as completely removing and replacing all the aircraft icons on screen when they needed to have their positions or tracks updated. Considering all the effort that would be required to end up with a product that was still less than ideal, we determined that it made sense to start an implementation from scratch while integrating the significant knowledge we’d built designing, maintaining, and extending the previous solution.</p><h2 id="design-choice-mapkit">Design Choice: MapKit</h2><p>From the beginning, we chose to use MapKit, the built-in framework on iOS that provides basic map views, base map tiles, and primitives for interaction and drawing annotations and overlays on the map. For those familiar with MapKit, this may seem like an odd choice—MapKit has significant limitations and our application is certainly pushing the boundaries of what’s possible. For example, MapKit doesn’t support user-provided vector map tiles; while the Apple Maps base tiles are vector-based, there’s no way for developers to provide their own vector tile set. 
This means we can’t directly use the beautiful vector map tiles we developed for the web (check them out on our beta site at <a href="http://beta.flightaware.com/live/airport/KIAH?ref=flightaware.engineering"><u>beta.flightaware.com/live/airport/KIAH</u></a>).</p><p>We had a few important reasons for sticking with MapKit:</p><ul><li>We already had expertise building complex, interactive applications with MapKit—our old map was also built on it. We were able to start simply, using our existing knowledge, and build workarounds or reimplementations as needed when things got more complex, instead of having to start from scratch with a different mapping library.</li><li>MapKit provides base map tiles included with our Apple Developer membership. While there are other map libraries available, developers generally have to provide their own raster and/or vector map tiles. Unfortunately, paid tile hosting services, a typical solution for this, are prohibitively expensive for an app with FlightAware’s scale and monetization model. MapKit provides functionally unlimited street map and satellite map tiles at no additional charge beyond our annual Apple Developer membership.</li><li>A third-party mapping library would increase the binary size of the SDK integration we’re providing to external customers. Using frameworks that already ship with iOS is an obvious way to reduce the binary size.</li></ul><p>Perhaps the most important benefit of MapKit is the map tiles provided by Apple. Creating, maintaining, and updating a detailed, interactive map of the world is an extraordinarily difficult task, requiring the collation and curation of hundreds or thousands of individual datasets, each in a different format and with varying quality, as well as cartography and styling work to make the map visually appealing. Apple has already done all this work for their Apple Maps service and provides third-party developers like us access to the map tiles to use as a “base layer” in our maps. 
This saves us a significant amount of time and money. Without Apple Maps base tiles, we would have to either dramatically improve the detail of our own vector map tiles and potentially rasterize them (a large effort for each step), or pay for a third-party service like Mapbox or MapTiler, who maintain basemap tiles themselves and charge by the request. That would be prohibitively expensive for an app at our scale, and most iOS apps choose to use Apple’s no-cost map tiles for similar reasons. </p><p>It’s important to note that this applies to our iOS app. In our Android app, we use map tiles provided by Google, which match the tiles available on Google Maps and reflect Google’s data collation and curation. This is a common practice in most Android apps, just as using Apple’s tiles is in iOS apps. On our website, we use tiles sourced from a variety of providers.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/03/Screenshot-2025-03-03-at-12.31.02-PM.png" class="kg-image" alt="" loading="lazy" width="760" height="552" srcset="https://flightaware.engineering/content/images/size/w600/2025/03/Screenshot-2025-03-03-at-12.31.02-PM.png 600w, https://flightaware.engineering/content/images/2025/03/Screenshot-2025-03-03-at-12.31.02-PM.png 760w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">From left to right, Apple’s built-in Maps app, the FlightAware app, and the Zillow app. You can see that it’s very common for iOS apps to use Apple’s base map tiles. When third-party apps use Apple’s tiles, they must display the “Apple Maps” attribution in the lower-left corner, which you can see in the FlightAware and Zillow apps to indicate the source of the tiles.</span></figcaption></figure><p>While MapKit is what we’re familiar with and it provides map tiles at no cost, it also has some significant limitations. 
For example, MapKit exhibits surprising performance behavior when dealing with many annotations, or with overlays comprising many points, which required us to implement specific workarounds. In one case, we found a severe memory leak related to <code>MKTileOverlayRenderer</code> when many flight track polylines were drawn on the map and had to implement a custom renderer to work around it. The map project as a whole provided countless opportunities to hone our skills in CPU and memory profiling using Instruments and to implement evidence-backed performance improvements that mitigate specific issues.</p><p>MapKit also has limitations in terms of interactivity, animation, and design customization with the built-in primitives, so in some cases we had to reimplement significant parts of built-in functionality to achieve a desired effect. For example, our callout views and the logic to show and hide them are entirely custom because MapKit’s built-in callouts don’t provide room for customization. The developer-facing interface of MapKit hasn’t changed significantly since it was introduced in iOS 3 (built around raster tiles provided by Google Maps) other than some new feature additions. MapKit is almost 15 years old and definitely showing its age in some places.</p><p>Although we built the new map around MapKit, we did put a lot of consideration into its limitations. Keeping them in mind, we designed the MapKit integration so that MapKit could be swapped out for another map library such as MapLibre Native. For example, positions and flight tracks aren’t “lowered” to <code>MKAnnotation</code> or <code>MKPolyline</code> until we are about to display them on the map—the rest of the map code uses data structures that are completely agnostic to the mapping library. This is a significant improvement over the old map’s codebase, which used MapKit types extensively in its business logic. 
In the new map component, the MapKit-related code is clustered together and separated from the message pipeline containing the core business logic, so it would be relatively straightforward to write code specific to another mapping library and connect it to the end of the pipeline. This leaves us the ability to switch to something different down the road while allowing us to ship something that works today.</p><h2 id="design-choice-messages">Design Choice: Messages</h2><p>Early in the development of the new map, after some significant prototyping, we decided that the flight information would be represented as a “pipeline” of different components sending “messages” to each other in sequence. A message represents the complete state of a flight at a point in time, including its position, track, and other identifying information. Each component would have a single responsibility, like applying a smoothing algorithm to the flight track or choosing the color and other style information for the aircraft icon. 
After going through all the processing components, the messages would arrive at the end of the pipeline and be used to update the annotations and overlays (aircraft icons, flight tracks, etc) displayed on the map.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/03/Screenshot-2025-03-03-at-12.34.06-PM.png" class="kg-image" alt="" loading="lazy" width="760" height="288" srcset="https://flightaware.engineering/content/images/size/w600/2025/03/Screenshot-2025-03-03-at-12.34.06-PM.png 600w, https://flightaware.engineering/content/images/2025/03/Screenshot-2025-03-03-at-12.34.06-PM.png 760w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Simplified architecture diagram of the pipeline-based design</span></figcaption></figure><p>In Swift, we expressed this with some protocols, structs, and enums as follows:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/03/Screenshot-2025-03-03-at-12.36.26-PM.png" class="kg-image" alt="" loading="lazy" width="753" height="330" srcset="https://flightaware.engineering/content/images/size/w600/2025/03/Screenshot-2025-03-03-at-12.36.26-PM.png 600w, https://flightaware.engineering/content/images/2025/03/Screenshot-2025-03-03-at-12.36.26-PM.png 753w" sizes="(min-width: 720px) 720px"></figure><p>We chose to implement our own set of protocols instead of using something like Combine or AsyncSequences because we didn’t feel AsyncSequences were mature enough at the time and Combine seemed like a lot of syntactical overhead to accomplish what we needed. 
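The screenshot above shows our actual definitions; as a rough, self-contained sketch (all names here are invented for illustration, not FlightAware’s real API), the shape of such a message pipeline might be:</p>

```swift
// A message carries the complete state of one flight at a point in time.
// All types and names here are illustrative, not FlightAware's actual code.
struct FlightMessage {
    var flightID: String
    var position: (latitude: Double, longitude: Double)
    var track: [(latitude: Double, longitude: Double)]
    var iconColorName: String?
}

// Each pipeline component has a single responsibility: it receives a
// message, transforms it, and passes the result along.
protocol PipelineComponent {
    func process(_ message: FlightMessage) -> FlightMessage
}

// Stateless component: chooses style information for the aircraft icon.
struct IconStyler: PipelineComponent {
    func process(_ message: FlightMessage) -> FlightMessage {
        var styled = message
        styled.iconColorName = "blue"
        return styled
    }
}

// Stateless component: drops consecutive duplicate track points.
struct TrackDeduplicator: PipelineComponent {
    func process(_ message: FlightMessage) -> FlightMessage {
        var cleaned = message
        var deduped: [(latitude: Double, longitude: Double)] = []
        for point in message.track {
            if let last = deduped.last,
               last.latitude == point.latitude, last.longitude == point.longitude {
                continue
            }
            deduped.append(point)
        }
        cleaned.track = deduped
        return cleaned
    }
}

// The pipeline runs a message through each component in order; the final
// message is used to update the annotations and overlays on the map.
struct Pipeline {
    var components: [any PipelineComponent]
    func run(_ message: FlightMessage) -> FlightMessage {
        components.reduce(message) { $1.process($0) }
    }
}
```

<p>Because each component conforms to one small protocol and touches no shared state, components like these can be tested in isolation and recombined freely. 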
This implementation gives us flexibility to write simple, readable, easily testable code without adding incidental complexity.</p><p>The pipeline-based design has several important benefits:</p><ul><li>Its orientation around messages, each focused on a single flight, means that it can easily adapt to both “push” data sources, where updated flight positions are streamed to clients in real-time, and “pull” data sources, where clients periodically poll an API to get updated information. The old map was mostly built around “pull” data sources, so this new design added the flexibility for us to use both kinds of data sources without needing to account for the difference in the rest of the map code.</li><li>It minimizes the number of places where state has to be maintained. Most of the components in the pipeline are stateless, using only the information present in the messages (which may have been added by previous components) to perform their task. Fewer places to store state and no need to access state across components means fewer opportunities for state to be out of sync and clearer logic in each component. This in turn dramatically improves the development and debugging experience for the whole project.</li><li>The components are individually testable and have well-defined interfaces between them. Unit tests can be written for each component, sending messages to it and verifying the messages it generates in response. Since each component is responsible for a single task (many are less than 100 lines of code), their behavior is easy to reason about and easy to write tests for.</li><li>The components are composable. They can be rearranged or removed entirely to change or remove behavior. For example, there’s a component whose job it is to manage what time is currently being displayed on the map, which is involved with animation of plane icons and the flight replay feature. 
This means there’s one place to maintain the time information, one place to change it, and all the time logic can be completely removed from the pipeline (for example, to debug something, or if animation or replay functionality is not desired) by simply removing the time component. The idea of abstracting the time handling logic into a single component was a major “a-ha moment” during the prototyping stage and one of the most compelling reasons we chose the message-based design. It significantly simplified and centralized the logic compared to the previous solution and unlocked possibilities we didn’t have before.</li></ul><p>The pipeline-based design is an implementation detail of the map component, and is not reflected in the public API surface, so consumers don’t have to worry about configuring the specific pipeline components to produce the display they want. Instead, they specify what they want in a declarative way and our code configures a pipeline to display it.</p><h2 id="design-choice-modular-data-sources">Design Choice: Modular Data Sources</h2><p>We also knew early on that we wanted our new map to be able to get flight data from multiple FlightAware data sources. For much of the development process, we weren’t sure what the solution for external customer data access would be. We also knew that we’d need to use our existing internal REST API when the map was integrated in the iOS app, but that we’d eventually be migrating away from this API to use something built on modern technologies in alignment with <a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/"><u>FlightAware’s broader modernization project</u></a>. With all this in mind, we decided that the data source layer for the map would have to be modular, with well-designed abstractions of flight data that were agnostic to the specific API from which the data originated. We put a significant amount of effort into designing and evolving these abstractions. 
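As an illustration (the types and names below are invented, not FlightAware’s real code), such an API-agnostic abstraction paired with a file-backed data source might look like:</p>

```swift
import Foundation

// Illustrative only: an API-agnostic snapshot of one flight, designed
// around idiomatic Swift types rather than any backend's response shape.
struct FlightSnapshot: Codable, Equatable {
    enum Status: String, Codable { case scheduled, enRoute, arrived }
    var ident: String
    var status: Status
    var altitudeFeet: Int?   // an optional instead of a sentinel value
}

// The map talks to every data source through this abstraction, so the
// concrete API (or a file on disk) can be swapped without touching the
// processing pipeline.
protocol FlightDataSource {
    func fetchFlights() throws -> [FlightSnapshot]
}

// A file-backed data source: captured API output saved as JSON can be
// replayed exactly, making hard-to-reproduce map bugs debuggable.
struct FileFlightDataSource: FlightDataSource {
    let fileURL: URL
    func fetchFlights() throws -> [FlightSnapshot] {
        let data = try Data(contentsOf: fileURL)
        return try JSONDecoder().decode([FlightSnapshot].self, from: data)
    }
}
```

<p>A REST-backed implementation would conform to the same protocol, so the rest of the map code never knows which kind of source it is talking to. 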
This decision paid off immensely in several ways:</p><ul><li>It allowed us to swap in code to read data from files instead of an API, accelerating development and enabling reproducible tests without dependencies on external systems. For example, one of the problems we ran into frequently with the old map was debugging the way a flight appeared on the map at a certain point during its flight. Issues were often entirely impossible to reproduce later, as our flight tracking system had processed different data in the interim. With our new design, it was trivial to build a data source that reads API responses out of files, which allows us to capture the API output at the exact moment the problem is happening and debug against exactly that data until the problem is resolved. We also leveraged this facility to create synthetic data representing very rare scenarios, like a flight that crosses the antimeridian in both directions. This was a significant improvement for us and saved a lot of time that would otherwise have been spent searching for real “edge-case” flights.</li><li>It enabled the same components of the pipeline described above to be reused without regard to the specific data source. Additionally, we evolved the abstraction in line with building the components, so it’s designed for the best developer experience in Swift rather than being tied to the structure provided by any of our backend APIs. For example, we make heavy use of optionals, structs, and enums, instead of relying on primitive types like strings that might be empty or numbers that represent a particular unit of measure and may require conversion to be used correctly.</li><li>It let us put off deciding on the solution for external customer data access until very late in the development process. 
This in turn allowed us to align that decision with other teams at FlightAware who were working on similar solutions for our web-based products, saving overall time and effort and avoiding redundant solutions. When we finally had a solution identified, we were able to write only the code needed to interface with that solution and map its responses into our abstraction. The rest of the processing and display logic “just worked” without additional changes required. This was a rewarding validation that our approach to abstraction had saved significant time and effort.</li></ul><p>Even though the long-term plan is for all consumers, internal and external, of our map component to be using one standard next-generation data source, this design still has developer-experience and testing benefits. While that data source isn't in production yet, this design lets us ship our new map and get it in the hands of our users now.</p><h2 id="what-we-learned">What we learned</h2><p>While building the new map component, we learned several important lessons and built lots of new knowledge on the team. I’ll discuss some of the insights we gained around two major areas: testing techniques and building SDKs for external consumption.</p><p>The new map component represented a major leap in our team’s testing practices and capabilities. We weren’t in the greatest starting position: the old map had no automated tests of any kind, making it hard to identify regressions. The problem is compounded by the fact that putting flights on an interactive map has many edge cases which don’t come up often in manual testing, but that we want to get right. Building the new map component was an opportunity for a fresh start with better testing practices, and we took full advantage of that opportunity.</p><p>We set a goal that as much of the new map’s feature set as possible should be covered by some sort of automated test. As discussed above, our architecture choices made unit testing easy. 
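To give a flavor of what that looks like, here is a self-contained sketch (the component is invented, and plain assertions stand in for the XCTest cases the real project uses): a stateless component is tested by simply sending input through it and checking what comes out.</p>

```swift
// Illustrative component under test: clamps reported altitudes to a
// plausible range. Names are hypothetical, not FlightAware's real code.
struct AltitudeClamper {
    let maxFeet: Int
    func process(_ altitudes: [Int]) -> [Int] {
        altitudes.map { min(max($0, 0), maxFeet) }
    }
}

// Because the component is stateless, the test needs no mocks or shared
// fixtures: input in, output checked.
func testClampsOutOfRangeValues() {
    let clamper = AltitudeClamper(maxFeet: 60_000)
    assert(clamper.process([-100, 35_000, 99_999]) == [0, 35_000, 60_000])
}
testClampsOutOfRangeValues()
```

<p>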
As a result, we have several hundred unit tests exercising the individual components. Every time we find and fix an issue that the tests missed, we add a test to make sure we don’t break it again in the future. Our unit tests have been mostly reliable and have helped catch many regressions early in the development process. In addition, we found that the act of writing the tests helps us think about the code from a different perspective, which frequently leads to bugs or edge cases being discovered much earlier than if the tests had been written at a later time or not at all.</p><p>At the same time, our team is pragmatic about code coverage. We view it as just one tool in our toolbox, not a metric for which an arbitrary threshold must be enforced. We write unit tests where they make sense, but we’re not extensively changing our code just so we can test it or writing pointless tests for the sake of increasing the coverage a few percent. This pragmatic approach is common throughout FlightAware Engineering.</p><p>Not all the code we wrote for the new map was easily unit-testable. For example, all the code dealing directly with MapKit to display flight data on the map isn’t something we could easily unit-test. To test code like this, we rely on integration-style tests. Because the ultimate goal is to draw correct results on the map, it makes sense to test what ends up being drawn on the map. This doesn’t work with traditional UI testing approaches, which use the accessibility hierarchy to find views on the screen and assert on various properties (text value, visibility, etc) of the view. Instead, we use “snapshot testing”, which involves taking a screenshot of the map after it’s rendered a particular situation and doing a pixel-for-pixel comparison with a known-good image. 
We built a small test app with a UI that’s easily automated by the built-in <code>XCUITest</code> functionality and used the <a href="https://github.com/devexperts/suitcase?ref=flightaware.engineering"><u>SUITCase library</u></a> for screenshot comparison because many of the other libraries don’t correctly snapshot <code>MKMapView</code>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/03/Simulator-Screenshot---iPhone-SE--3rd-generation----2024-09-30-at-14.13.29.png" class="kg-image" alt="" loading="lazy" width="750" height="1334" srcset="https://flightaware.engineering/content/images/size/w600/2025/03/Simulator-Screenshot---iPhone-SE--3rd-generation----2024-09-30-at-14.13.29.png 600w, https://flightaware.engineering/content/images/2025/03/Simulator-Screenshot---iPhone-SE--3rd-generation----2024-09-30-at-14.13.29.png 750w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The test app we built to facilitate snapshot testing. By comparing screenshots, we can test the map contents as well as the state of the replay mode UI, all of which would be difficult to unit test without architectural contortions.</span></figcaption></figure><p>While our snapshot tests have been useful, they have also been somewhat unreliable due to bugs in MapKit and UI automation interacting with the app. We struggle with false failures that require further investigation. On the whole, the snapshot tests are a useful tool when combined with nuanced manual review of the results.</p><p>In addition to new testing techniques, we also learned a lot about building an SDK for external consumption. This is something that no one on the team had done before, so we learned as we went.</p><p>We carefully evaluated the public API surface to ensure that functionality accessible to third-party developers is coherent and easy to use. 
With a Swift-only framework, there are no header files, but the public interface is available in a <code>.swiftinterface</code> file generated by Xcode in the compiled framework bundles. We wrote comprehensive documentation on the entire public API, making sure to keep the needs of an external developer in mind.</p><p>Perhaps the most important mindset shift we adopted while building the SDK was a focus on binary size. Swift is known for large compiled binary sizes, so we had to analyze and improve ours by aggressively stripping unnecessary information from the binary, refactoring our internal dependencies to pull in only what was needed, and compressing data assets required for the SDK to function. For example, we had a Swift package with a single target containing all the code to interact with our internal REST API. The map component only needs to talk to a few endpoints, so we split the package into several targets and only imported the necessary ones into the map component. This saved several megabytes of compiled code size.</p><p>To cap it all off, we built a script that runs in our CI and monitors the change in compiled binary size with each PR to ensure we’re always considering this attribute that’s very important to our customers.</p><h2 id="conclusions">Conclusions</h2><p>Overall, the project to build a new map component for our app and to deliver an external map SDK was a success. Since we shipped the new map component in our app, we’ve been able to add several map features and iterate based on customer feedback with very little additional effort thanks to the extensibility of our design. We’ve been able to dramatically clean up our codebase around the map integration, ensuring greater separation of concerns and easier maintainability for the future. We learned a lot, especially around testing techniques and building an SDK for external consumption, and we’ve already started applying some of these learnings to our other iOS development at FlightAware. 
And at the end of it all, we have a new native iOS map component that’s flexible and extensible, setting the stage for new map features and improvements we’re hoping to introduce in the future, as well as a drop-in SDK for customers to easily integrate our flight maps with just a few lines of code.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/blast-from-the-past-a-new-ios-map-component-for-flightaware/">Blast from the past: A new iOS map component for FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Overhauling Authentication at FlightAware ]]></title>
<description><![CDATA[ As FlightAware moves away from its monolithic Tcl tech stack to a distributed microservice architecture, many core services need to be split out from the monolith to keep the system running. ]]></description>
        <link>https://flightaware.engineering/overhauling-authentication-at-flightaware/</link>
        <guid>https://flightaware.engineering/overhauling-authentication-at-flightaware/</guid>
        <pubDate>Mon, 03 Feb 2025 12:15:13 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2025/01/Screenshot-2025-01-17-at-7.20.22-PM.png" medium="image"/>
<content:encoded><![CDATA[ <p>As FlightAware <a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/">moves away from its monolithic Tcl tech stack</a> to a distributed microservice architecture, many core services need to be split out from the monolith to keep the system running. Perhaps most important is authentication: a software product needs to have a means of knowing who you are so that it can serve you appropriate, helpful, and actionable information, and make it possible to ensure that your information does not end up in the hands of others (a topic about which we will have more to share later). To that end, this year we launched a new authentication solution thoughtfully designed around a modern approach to building web products.</p><p>Our previous solution for authentication was a Tcl library within our monolith. However, we are now moving towards serving multiple independent apps in our <a href="https://flightaware.engineering/monorepo/">monorepo</a>. We no longer use Tcl for building new services, so we needed a new approach to authentication. At a high level, this new approach has the following requirements:</p><ol><li>It needed to support passwordless logins so that FlightAware is no longer in the business of managing sensitive password credentials.</li><li>It needed to have first-class support for Next.js, as our new apps are being built with that framework.</li><li>It needed to support multiple apps well; if you are on one product and move to another, you shouldn’t have to sign back in.</li><li>It needed to be usable in our Tcl monolith. You could perhaps call this item (3a), but it is unique because the monolith is completely different from Next.js and sees by far the largest portion of our web traffic as of January 2025.</li></ol><p>Early on in the project, we decided that the <a href="https://next-auth.js.org/?ref=flightaware.engineering">NextAuth.js</a> library was the best tool for the job. 
It supported all of the login types we wanted without us having to write them from scratch, was designed with Next.js in mind, and offered JWT authentication, which seemed ideal for providing authentication services to multiple Next.js apps through a common API interface.</p><p>However, as the project progressed, we increasingly found some of NextAuth’s design decisions at odds with our requirements. It was incredibly difficult to control what NextAuth put in its refresh tokens, and their size was getting unwieldy. Performance concerns were popping up. Finally, the straw that broke the camel’s back was the realization of how much work it would take, purely within NextAuth, to let users manage their other sessions, particularly deleting a session other than the one they were currently using. The decision was made: we were ripping NextAuth out of the project and writing our own backend in Go.</p><p>We have already written extensively about <a href="https://flightaware.engineering/the-best-tool-for-the-job-assessing-languages-for-microservice-development-at-flightaware/">Go at FlightAware</a>. It is considered a first-class language in the organization, with plenty of custom libraries already written, such as support for OpenTelemetry or integration testing with Redis and Postgres. While we were losing first-class support for Next.js (now we had to write a client library for downstream Next.js apps ourselves), we were gaining the fine-grained control of our tokens that we wanted, as well as resolving our performance concerns overnight with the move to a pre-compiled, optimized binary. Next.js would act only as a frontend, leaving all authentication and database tasks to the Go backend.</p><p>With the Go backend, we were free to determine our own optimal authentication strategy. We already knew that we wanted to continue with the JWT strategy for the ability for apps to authenticate you without constantly making database calls. 
However, we also wanted session lifetimes to be independent of JWT expiration, or even refresh cookie expiration. So, we implemented a JWT expiration of five minutes, after which point your JWT is refreshed using a refresh token also stored in browser cookies. That call does hit a session database, but since it takes place at most every five minutes, database load is much less of a concern. This strategy gives us the fine-grained control of session cookies with the optimized performance of JWTs. It’s nearly identical to the approach Clerk outlined in a <a href="https://clerk.com/blog/combining-the-benefits-of-session-tokens-and-jwts?ref=flightaware.engineering">blog post of their own</a>, and seeing others arrive at similar designs independently increased our confidence in our decisions.</p><p>One of the biggest challenges in the project was architecting support for the Tcl monolith. Not only did we need to update the massive codebase to support a completely different authentication scheme, but we also needed to move every FlightAware user over to the new system seamlessly. As a result, we would need to support two completely different authentication systems for as long as it took to migrate everybody over. We set about this by silently integrating the monolith with the new authentication system over the summer of 2024. At that time, anybody who had wanted to could have logged in using the new system, but since it was hidden, only our engineers were using it. Once we were ready to bring more people on and test the migration portion, we forcibly migrated all employee accounts. This was not without some hand-wringing, and we had to do the employee migration three separate times to address issues that came up. Each migration logged employees out and was inconvenient, but everybody was a good sport about it, thankfully.</p><p>Another unexpected challenge was moving every login link in the Tcl codebase over to the new system. 
The Tcl monolith is a massive combination of Apache Rivet pages, React apps (over 30 of them!), and Handlebars templates, each technology implementing its own method of link rendering. Some pages opened a login modal; others linked to a login page on the site. At the time all of those were written, the link was one of two possible constants, but as this was no longer the case, a lot of links had to be rewritten in a way that was surprisingly difficult to make reusable. Going into the launch, we were not 100% confident that we had addressed every single instance (it was also surprisingly difficult to grep), so we added some code to redirect the old login path to the new one as a way to add assurance.</p><p>The migration used a custom endpoint that took the user’s old session cookie as well as some information about the user to verify the authenticity of the request, then wiped the old session, replaced it with the new one, and redirected back to the page the user was on before. We also implemented a gradual rollout of the migration based on user ID, so if unanticipated issues arose during the launch, we had the option to pause it and keep the site running for the users yet to be migrated.</p><p>Launch day went about as well as one can hope. Migration of users began in the morning, and even after migrating only 1% of users, we found issues, either from undocumented bespoke account configurations or from significantly increased system load, that could not possibly have been found in employee testing, and we were able to fix them on the spot. We were able to complete the migration that afternoon.</p><p>So, what did we learn from this project?</p><p>Perhaps the most important takeaway was to keep Tcl things in Tcl. As a part of our modernization effort, we adopted a rule that we would not write new code in Tcl unless absolutely necessary. However, the monolith still needs to run and deliver for customers while we work on the pieces of its replacement. 
In the design process for these core services, it can be tempting to follow this rule to the letter and change how we would build our authentication app to better support the monolith as-is. However, this would be at the expense of the developer and user experiences with the new apps. Not being afraid to build new things in Tcl <em>when it is appropriate</em> paid dividends and will continue to do so.</p><p>We also saw the value of owning your core services. We built this authentication solution in about a year with roughly three FTEs. Although everything we build incurs a maintenance debt, we can be confident that the service will not require the same amount of engineering resources going forward. Compared to the alternative of handing authentication off to a third-party vendor, the organization is seeing six-figure cost savings annually from this choice alone.</p><p>Since authentication is a core dependency of delivering services to customers, the service working correctly must take precedence over rushing its delivery. I don’t think we could have done anything differently here, but I do think we felt the pain of not getting this into the hands of users as quickly as possible, as evidenced by the bugs we could only find on launch day.</p><p>Although I had some experience modifying small parts of other Go projects at FlightAware, this was my first foray into a full Go project written from scratch. Coming from Next.js as my bread and butter, I had to leave the comfort zone of a metaframework (since Next.js is a framework built on React, which is itself a JS framework) and learn how to work outside of those constraints. Ultimately, I found this freeing, and I highly recommend the language.</p><p>I’m excited to see how the lessons from this project inform how we launch our new flight tracking experience, built with the same modern principles and technologies, in 2025.</p> 
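<p>As a concrete illustration of the token strategy described above, here is a minimal sketch of the refresh decision for a short-lived JWT. The five-minute lifetime comes from the post; the names and the early-refresh margin are hypothetical, not FlightAware’s actual implementation.</p>

```typescript
// Hedged sketch: decide when a short-lived JWT should be refreshed.
// Only the five-minute lifetime comes from the post; the names and the
// 30-second early-refresh margin are illustrative.
const JWT_LIFETIME_MS = 5 * 60 * 1000; // five-minute JWT
const REFRESH_MARGIN_MS = 30 * 1000;   // refresh slightly early to avoid races

interface SessionTokens {
  jwtIssuedAt: number;  // epoch ms when the JWT was minted
  refreshToken: string; // opaque token held in a browser cookie
}

// True when the JWT is at (or within the margin of) its expiry, meaning the
// next request should first call the refresh endpoint, which validates the
// refresh token against the session database and mints a new JWT.
export function shouldRefresh(tokens: SessionTokens, nowMs: number): boolean {
  return nowMs >= tokens.jwtIssuedAt + JWT_LIFETIME_MS - REFRESH_MARGIN_MS;
}
```

<p>Because the refresh endpoint is consulted roughly once per five minutes per user, the session database sees only a small fraction of overall request traffic.</p>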
        <br>
        <p>
            <a href="https://flightaware.engineering/overhauling-authentication-at-flightaware/">Overhauling Authentication at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Flight Page Variations and How They Impact Testing ]]></title>
        <description><![CDATA[ In this blog post, I describe several of the ways the flight page on the FlightAware website displays data, as well as the testing perspective on each of those views. ]]></description>
        <link>https://flightaware.engineering/flight-page-variations-and-how-they-impact-testing/</link>
        <guid>https://flightaware.engineering/flight-page-variations-and-how-they-impact-testing/</guid>
        <pubDate>Wed, 08 Jan 2025 12:47:44 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/12/karan-suthar-A7KecNw0t0I-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>The flight page is the single most-viewed page on the FlightAware website, and as FlightAware QA, ensuring that it works as expected is one of our most important duties. What data the flight page displays, and how it displays it, varies significantly across different flights, different users, and different methods of accessing the page, and we in QA must ensure that we document and test the relevant variations whenever the flight page is changed. In this blog post, I’ll describe several of the ways that the flight page displays data, as well as the testing perspective for a given view of the flight page. In many cases, we have well-established Playwright automated UI tests to help prevent regression, but there are often specific details that must be considered for manual testing, or unique options or pages that have been created to help test certain niche functionality.&nbsp;</p><h3 id="commercial-vs-general-aviation-flights"><strong>COMMERCIAL VS GENERAL AVIATION FLIGHTS</strong></h3><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-11-at-10.30.09-AM.png" class="kg-image" alt="" loading="lazy" width="471" height="420"></figure><p>The flight page’s layout when viewing a commercial flight is geared towards passengers and those who are following their progress. Because of this, the arrival and departure information is much more prominently displayed than on a General Aviation (GA) flight page, and gate information is present. 
Commercial flight pages will also detail information about the flight’s airline and include a link to the airline’s website.&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-11-at-10.30.42-AM.png" class="kg-image" alt="" loading="lazy" width="471" height="97"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-11-at-10.30.59-AM.png" class="kg-image" alt="" loading="lazy" width="299" height="260"></figure><p>The layout for GA flights, on the other hand, is geared towards owners and operators, formatting its data in a way more convenient to them and providing them with appropriate options. If you are the flight’s owner or operator, you will have options on the flight page to make certain choices for your flight or change flight times for future flights. The page will cover takeoff and landing times, origin and destination airports, and weather information in more detail than the general public’s view. The flight log will show more extensive information as well and will include a schedule visualizer table if you have multiple flights in the near future. 
</p><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/01/Screenshot-2025-01-06-at-2.21.52-PM-1.png" class="kg-image" alt="" loading="lazy" width="1142" height="114" srcset="https://flightaware.engineering/content/images/size/w600/2025/01/Screenshot-2025-01-06-at-2.21.52-PM-1.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/01/Screenshot-2025-01-06-at-2.21.52-PM-1.png 1000w, https://flightaware.engineering/content/images/2025/01/Screenshot-2025-01-06-at-2.21.52-PM-1.png 1142w"><figcaption><span style="white-space: pre-wrap;">General Public View</span></figcaption></figure><figure class="kg-card kg-image-card kg-width-wide kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2025/01/Screenshot-2025-01-06-at-2.22.33-PM.png" class="kg-image" alt="" loading="lazy" width="1140" height="194" srcset="https://flightaware.engineering/content/images/size/w600/2025/01/Screenshot-2025-01-06-at-2.22.33-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2025/01/Screenshot-2025-01-06-at-2.22.33-PM.png 1000w, https://flightaware.engineering/content/images/2025/01/Screenshot-2025-01-06-at-2.22.33-PM.png 1140w"><figcaption><span style="white-space: pre-wrap;">Flight Owner or Operator's View</span></figcaption></figure><p>From a tester’s perspective, the Commercial and GA flight page layouts are the two primary layouts for the flight page. These views are well-covered by automated tests and are exercised by almost any manual testing done on the flight page. Each is always considered at every step when making changes or implementing new features that impact the flight page, including design, development, and testing. 
Additionally, the GA flight page is tested to ensure that it shows the expanded options and data to its owner/operator (and not to anyone else), while the commercial flight page should display the airline’s logo and information properly. Automated and manual tests are set up to ensure these work as expected.&nbsp;</p><h3 id="mobile"><strong>MOBILE</strong></h3><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-11-at-10.32.16-AM.png" class="kg-image" alt="" loading="lazy" width="252" height="439"></figure><p>The main difference between the mobile and desktop flight page is that the mobile view contains all the information on the page in one condensed column, rather than two. We also hide the map replay functionality to make the page easier to navigate, and different ads are displayed to accommodate the smaller window size and single column.&nbsp;</p><p>New features and updates are tested on both mobile and desktop platforms, utilizing <a href="https://flightaware.engineering/the-playwright-advantage-strategies-for-effective-test-automation/" rel="noreferrer">Playwright</a> for any automated tests as that is the standard UI testing framework at FlightAware. Mobile does present some testing difficulties (such as testing various device screen sizes), but modern browsers and Playwright have ways to handle them and make it easy to simulate a mobile environment while testing. 
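</p><p>As a hedged illustration (project names and viewport values here are hypothetical, not our actual configuration), a Playwright config can emulate a mobile environment by giving one project mobile context options:</p>

```typescript
// Hypothetical Playwright project list emulating a mobile environment.
// Inline values are used for illustration; a real config would typically
// spread devices['Pixel 5'] from '@playwright/test' instead.
export const projects = [
  {
    name: 'desktop-chromium',
    use: { viewport: { width: 1280, height: 720 } },
  },
  {
    name: 'mobile-chromium',
    use: {
      viewport: { width: 393, height: 851 }, // phone-sized screen
      isMobile: true,                        // mobile layout + meta viewport handling
      hasTouch: true,                        // enable touch events
    },
  },
];
```

<p>Each project then runs the same suite against its emulated environment, so one spec file covers both platforms. 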
Most of FlightAware’s web traffic comes through mobile, so that platform is always an important consideration for us when testing changes to the flight page.&nbsp;</p><h3 id="user-permissions"><strong>USER PERMISSIONS</strong></h3><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-11-at-10.32.57-AM.png" class="kg-image" alt="" loading="lazy" width="295" height="365"></figure><p>While flight type (commercial vs GA) and device type (desktop vs mobile) are some of the biggest factors for determining flight page data and layout, there are a multitude of further options and views that can cause additional changes. Various subscriptions give users access to different tiers of data, and it’s important that we keep these straight. The map within the flight page has options that vary per subscription level as well. Higher-tier users will have access to more map layer options and overlays, as well as privileged data when displaying the flight path on the map. This gives higher-tier users much greater precision for their flight paths, especially in areas where ground data is harder to collect, such as in the middle of an ocean.&nbsp;</p><p>Changes to our permissions functionality are rare, mostly occurring when a new subscription is introduced. As with any new feature, these updates are extensively tested, both manually and with automated tests. When it comes to permissions, our automated testing is extensive; ensuring each subscription type works as expected is one of QA’s highest priorities, and so there are numerous Playwright tests covering that functionality. 
These types of changes also involve extensive manual testing; each existing subscription with intersecting data undergoes thorough regression testing to validate that it continues to behave as expected.&nbsp;</p><h3 id="personalization"><strong>PERSONALIZATION</strong></h3><figure class="kg-card kg-image-card kg-width-full"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-11-at-10.33.40-AM.png" class="kg-image" alt="" loading="lazy" width="784" height="503" srcset="https://flightaware.engineering/content/images/size/w600/2024/12/Screenshot-2024-12-11-at-10.33.40-AM.png 600w, https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-11-at-10.33.40-AM.png 784w"></figure><p>Beyond the subscriptions they’ve signed up for, users can also personalize the flight page by making selections determining how data is displayed. They can configure their preferred timezone and time format (12 or 24 hour, the flight’s local time(s), their local time or Zulu time), how airport codes are displayed (3 or 4 letter codes, or both, and if both which order), and various measurement unit settings (speed, distance, altitude, and fuel burn can all be configured individually). Additionally, they can toggle “aviator mode”, which makes commercial flight pages display their data using the general aviation layout. They can also, of course, change the website’s language, which may also rearrange the UI elements if a right-to-left language is selected.&nbsp;</p><p>These changes can be somewhat finicky to test, due to their modular nature. While some of these options have automated tests in place, it’s largely up to the tester to determine what manual testing needs to be done for each change and how best to do it. This varies depending on what parts of the flight page any changes will affect. 
For instance, when adding new airport runway data to the flight page, changes to airport name and time display options had to be considered due to nearby UI elements. We needed to ensure that even very lengthy names and times did not push any of the new fields out of alignment, even on smaller mobile devices. While most of these options only make minor changes on the UI, it’s still important that the flight page update properly and look good with each possible set of options. In many of these cases, the best solution for testing them is relying on QA experience and skill in manual/exploratory testing to ensure that everything important is verified.&nbsp;</p><h3 id="other-changes"><strong>OTHER CHANGES</strong></h3><p>In some instances, we’ve built tools to exercise functionality for a given set of options. For instance, we’ve created a rough catalog of different types of maps with an admin page where you can select one of the map types and have it find (or generate) and display such a map so you can easily test any niche functionality. For example, older flights show restricted weather data and cannot be shared via email, and position-only flights won’t have departure or arrival airport information. Certain maps are also used by airlines to track or display flights on their websites or at airports, and we can easily access testing versions of those through this page as well.&nbsp;&nbsp;</p><p>As another example, we have an alternate view of the entire website that’s accessed through a different portal and must be exercised when testing UI changes. This alternate view changes the format of the times and airports on the flight page and removes any images present. To test that functionality, we have an option on our test environments themselves to simulate that alternate portal, which we can toggle on as needed. 
This option, and others specifically designed to help us test niche but important functionality, are very useful when releasing new features or updates and go a long way towards preventing regressions. They’ve been developed and maintained over time to help solve the problems we’ve run into while testing over the years and have become invaluable.&nbsp;</p><h3 id="conclusion"><strong>CONCLUSION</strong></h3><p>The flight page has experienced significant growth since it was released many years ago. A lot of new functionality has been added, a lot of existing functionality has been modified, and a lot of new subscriptions and user types have been added to our website. As the FlightAware website and its flight page continue to evolve, it’s important to document the functionality that’s present and ensure that it keeps working as expected. And so, as QA, we endeavor to do just that: keeping track of the various bits of obscure functionality and the different ways the flight page displays, and doing whatever we can to test it as well as possible. We’ve been largely successful on this front thanks to automated tests, manual testing, special tools for accessing alternate views, or any combination thereof.&nbsp;</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/flight-page-variations-and-how-they-impact-testing/">Flight Page Variations and How They Impact Testing</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ The Playwright Advantage: Strategies for Effective Test Automation ]]></title>
        <description><![CDATA[ At FlightAware, one of our key priorities is to consistently deliver high-quality software. We are committed to maintaining and continuously improving product quality, benefiting both our customers and engineering teams. Test automation is crucial in achieving this goal. ]]></description>
        <link>https://flightaware.engineering/the-playwright-advantage-strategies-for-effective-test-automation/</link>
        <guid>https://flightaware.engineering/the-playwright-advantage-strategies-for-effective-test-automation/</guid>
        <pubDate>Mon, 02 Dec 2024 10:06:17 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/11/oskar-yildiz-cOkpTiJMGzA-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>At FlightAware, one of our key priorities is to consistently deliver high-quality software. We are committed to maintaining and continuously improving product quality, benefiting both our customers and engineering teams. Test automation is crucial in achieving this goal. FlightAware crews and the QA wing constantly strive to maximize our test automation coverage, ensuring faster software delivery without bugs and regressions.</p><p>For our testing strategy, we try to evaluate code from all possible angles by implementing comprehensive unit, integration, UI, and performance test coverage. We also remain cognizant of test turnaround time (yes, we try to prevent test builds from taking several hours to finish).</p><p>Last year, when I joined FlightAware, I became part of an exciting crew responsible for a technical transformation effort called <a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/">WebNXT</a>, with the goal of reimagining FlightAware’s web stack utilizing modern languages, frameworks, and tools. As part of this effort, our crew began migrating legacy FlightAware maps and pages and <a href="https://flightaware.engineering/monorepo/">adopted a monorepository structure</a> for newly developed applications and libraries.</p><p>When selecting a test automation framework, we considered several factors, like how tests can provide faster feedback to developers and how we could achieve extensive coverage, including running our tests cross-browser, cross-platform, and on mobile web. We evaluated options like Cypress, Selenium, and Playwright. 
After a few proofs of concept, and after carefully weighing the pros, cons, and efficiency of each, we decided to adopt Playwright for building and maintaining our end-to-end testing suite.</p><p>In this blog post, we'll delve into why we chose Playwright as our test automation framework, discussing the decision factors and benefits we’ve gained from it. I'll also showcase how integrating key features of Playwright has positively influenced our development and testing lifecycle.</p><blockquote>Cross-browser<br><br>Cross-language<br><br>Mobile view emulation<br><br>Cross domain testing<br><br>Network interception and mocking</blockquote><h2 id="integrating-playwright-in-a-monorepo-environment"><br>Integrating Playwright in a Monorepo Environment</h2><p>Configuring Playwright in a monorepo application was quite simple. It took me less than 30 minutes to set up our app and run our first mock test. Quick, right? You can refer to <a href="https://nx.dev/nx-api/playwright?ref=flightaware.engineering">Nx’s readme on Playwright</a> on how you can set it up for your app. Once our app was migrated to the monorepo, here’s how I created an end-to-end project for it.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.54.28-AM.png" class="kg-image" alt="" loading="lazy" width="793" height="98" srcset="https://flightaware.engineering/content/images/size/w600/2024/12/Screenshot-2024-12-02-at-9.54.28-AM.png 600w, https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.54.28-AM.png 793w" sizes="(min-width: 720px) 720px"></figure><p>This command created an e2e tests folder adjacent to the existing app-1. 
Since I wanted to run my tests against the local development server, I passed the <code>webServerCommand</code> and <code>webServerAddress</code> optional arguments.</p><p><code>playwright.config.ts</code></p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.56.04-AM.png" class="kg-image" alt="" loading="lazy" width="617" height="269" srcset="https://flightaware.engineering/content/images/size/w600/2024/12/Screenshot-2024-12-02-at-9.56.04-AM.png 600w, https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.56.04-AM.png 617w"></figure><p>Did you notice something on line 2 in the code snippet above? Well, that’s the root-level baseConfig I created for all my apps. This type of configuration abstraction helps us maintain common settings leveraged by e2e tests and avoid writing repetitive boilerplate configurations across different apps. Pretty neat! </p><p>For example, the base config <code>playwright.config.ts</code>:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.56.45-AM.png" class="kg-image" alt="" loading="lazy" width="581" height="447"></figure><p>For reference, here’s what your project would look like:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.57.21-AM.png" class="kg-image" alt="" loading="lazy" width="417" height="296"></figure><p>Write your first test script in the e2e folder and simply run the command <code>nx e2e app-1-e2e</code>, and you will have your first successful test run. 
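</p><p>To make that abstraction concrete, here is a hedged sketch of a root-level base config being spread into an app-level config. The file paths and option values are illustrative, not our actual settings; a real <code>playwright.config.ts</code> would pass the merged object to <code>defineConfig()</code> from <code>@playwright/test</code>.</p>

```typescript
// Hedged sketch of sharing one base Playwright configuration across a
// monorepo. Values below are illustrative, not FlightAware's settings.

// playwright.base.config.ts (root level, shared by every app)
export const baseConfig = {
  retries: 2,
  use: { trace: 'on-first-retry', screenshot: 'only-on-failure' },
};

// apps/app-1-e2e/playwright.config.ts (per app, extends the base)
export const appConfig = {
  ...baseConfig,
  use: {
    ...baseConfig.use,                // keep shared browser options
    baseURL: 'http://localhost:4200', // hypothetical local dev server for app-1
  },
};
```

<p>The app config inherits the shared defaults and overrides only what differs, which is exactly the boilerplate reduction described above.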
</p><h2 id="network-interception-and-data-mocking-for-enhanced-testing">Network Interception and Data Mocking for Enhanced Testing</h2><p>One of the challenges of testing applications like FlightAware is validating the canvas map and related objects like data tables (like our newly designed <a href="https://beta.flightaware.com/live/airport/KIAH?ref=flightaware.engineering">flight cards</a>), which are dynamic and updated based upon real-time flight data. We addressed this challenge by integrating Playwright’s network interception capabilities and incorporating mocking in our test workflow.</p><p>For instance, the following code intercepts a <code>graphql</code> request for <code>flightBox</code> operation and injects mock data in real-time to generate an expected flight within the flight card. Depending upon the operation, we may also use it to render an expected flight path on the map.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.58.26-AM.png" class="kg-image" alt="" loading="lazy" width="549" height="250"></figure><p>This workflow addresses two challenges:</p><ol><li>We can capture a screenshot of such a flight path and compare it with the baseline image through visual testing.</li><li>We can validate the mock data in the tables to make sure they are correctly rendered.</li></ol><p>You can utilize network interception in test automation to essentially block or mock any request to meet your goals. For instance, one can use it to block ads on the webpage if testing ads isn’t a primary test goal. </p><p>This is one of the great features that Playwright offers compared to its counterparts. During our proof of concept with Cypress, we faced challenges intercepting and mocking requests in Firefox and WebKit (Safari's rendering engine). 
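</p><p>The interception described above can be sketched roughly as follows. The operation name and mock payload are illustrative, and the route type is simplified so the sketch stands alone; in a real test the handler would be registered with <code>page.route('**/graphql', flightBoxMockHandler)</code>.</p>

```typescript
// Hedged sketch: fulfill GraphQL "flightBox" requests with mock data and
// let everything else continue. Operation name, payload, and the trimmed
// RouteLike type are illustrative, not FlightAware's actual schema.
type GraphQLRequest = { operationName?: string };

interface RouteLike {
  request(): { postDataJSON(): GraphQLRequest | null };
  fulfill(response: { status: number; contentType: string; body: string }): Promise<void>;
  fallback(): Promise<void>;
}

const mockFlight = { ident: 'FA123', origin: 'KIAH', destination: 'KATL' };

export async function flightBoxMockHandler(route: RouteLike): Promise<void> {
  const post = route.request().postDataJSON();
  if (post?.operationName === 'flightBox') {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ data: { flightBox: mockFlight } }),
    });
  } else {
    await route.fallback(); // unrelated requests proceed untouched
  }
}
```

<p>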
While Cypress allows you to disable <code>chromeWebSecurity</code> in Chrome, there’s no simple way to do this in Firefox or Safari, which makes intercepting cross-origin requests more challenging. Cypress primarily supports Chrome; it can be used with other browsers, but there may be limitations.</p><h2 id="insights-unleashed-with-enhanced-reporting">Insights Unleashed with Enhanced Reporting</h2><p>Test reporting enhances transparency and visibility into overall quality and coverage for stakeholders. We at FlightAware try to keep our testing cycles concise and effective by running tests at different layers, and timely reporting of the test results is an important objective. This enables our crew and QA wing engineers to track any test failures, facilitate rapid resolution in case of a bug, and mitigate false positives quickly.</p><p>We’ve integrated Slack reporting into our Playwright tests to achieve this goal. Integration is straightforward: you can install <code>playwright-slack-report</code> using a simple npm command, <code>npm install playwright-slack-report -D</code>. 
</p><p>Hook this package into your <code>playwright.config.ts</code>:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.59.07-AM.png" class="kg-image" alt="" loading="lazy" width="621" height="359" srcset="https://flightaware.engineering/content/images/size/w600/2024/12/Screenshot-2024-12-02-at-9.59.07-AM.png 600w, https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.59.07-AM.png 621w"></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-9.59.29-AM.png" class="kg-image" alt="" loading="lazy" width="380" height="415"><figcaption><span style="white-space: pre-wrap;">Report Notification</span></figcaption></figure><p>There are several customizable features available for reporting. For example, we tag the relevant engineer or team alias whenever a test build fails, ensuring a quick, direct response and minimizing delays in identifying and resolving issues.</p><h2 id="optimize-efficiency-with-performance-testing">Optimize Efficiency with Performance Testing</h2><p>Have you ever had an opportunity to visit our <a href="https://beta.flightaware.com/live/airport/KATL?ref=flightaware.engineering">newly built surface maps</a>? Take a look at Atlanta Airport and you’ll notice a high volume of flights displayed on the map, and they are all updating in real-time. As part of our overall testing strategy, we aim to ensure that our releases do not negatively impact performance.</p><p>We’re taking advantage of Playwright’s performance monitoring capabilities, which leverage the Chrome DevTools Protocol (CDP) to capture performance metrics during test runs. Our primary focus is to monitor and record <code>JSHeapUsedSize</code> and <code>CPUUsage</code> metrics of a webpage over an extended period of time while simulating various user actions. 
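</p><p>As a hedged sketch (the helper and sample bookkeeping are ours for illustration), the pure metric-extraction piece can be separated from the browser-bound calls, which are shown as comments and assume a Chromium-based run:</p>

```typescript
// Pull one named metric out of a CDP Performance.getMetrics result.
// Kept browser-free so it runs anywhere; the Playwright CDP-session calls
// it would pair with are sketched in the trailing comments.
type CdpMetric = { name: string; value: number };

export function metricValue(metrics: CdpMetric[], name: string): number | undefined {
  return metrics.find((m) => m.name === name)?.value;
}

// Inside a Playwright test (sketch):
//   const session = await page.context().newCDPSession(page);
//   await session.send('Performance.enable');
//   // ...simulate user actions over an extended period...
//   const { metrics } = await session.send('Performance.getMetrics');
//   samples.push({ t: Date.now(), heap: metricValue(metrics, 'JSHeapUsedSize') });
```

<p>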
This kind of test helps us identify potential memory leaks and page crashes due to code bugs.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.00.45-AM.png" class="kg-image" alt="" loading="lazy" width="668" height="245" srcset="https://flightaware.engineering/content/images/size/w600/2024/12/Screenshot-2024-12-02-at-10.00.45-AM.png 600w, https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.00.45-AM.png 668w"></figure><p>We collect these metrics during our test runs and then use the collected data to plot a chart of <code>JSHeapUsedSize</code> over the duration of the test.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.01.02-AM.png" class="kg-image" alt="" loading="lazy" width="476" height="479"></figure><p>The direct integration with the Chrome DevTools Protocol (CDP) gives Playwright a significant advantage over other testing frameworks like Selenium and Cypress. While Cypress offers some basic performance testing features, such as measuring page load times and intercepting network requests, Playwright excels with its CDP integration. This allows for more detailed performance monitoring and the ability to collect comprehensive performance traces.</p><h2 id="advanced-playwright-testing-managing-browser-contexts">Advanced Playwright Testing: Managing Browser Contexts</h2><p>A browser context in Playwright is an isolated environment within a single browser instance, allowing multiple tests or sessions to run concurrently without interference. Each context behaves like a separate browser, with its own cookies, cache, session storage, and so on.</p><p>This feature enables us to simulate multiple users or sessions, allowing us to test various scenarios, such as concurrent logins or different application states for the same user. 
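</p><p>A minimal sketch of that isolation idea follows. The cookie shape is trimmed and the Playwright calls are left as comments so the check itself stays runnable anywhere; names and domains are illustrative.</p>

```typescript
// Browser-free check that two contexts' cookie jars share nothing,
// mirroring what a multi-context Playwright test would assert.
type Cookie = { name: string; value: string; domain: string };

export function cookiesAreIsolated(a: Cookie[], b: Cookie[]): boolean {
  const seen = new Set(a.map((c) => `${c.domain}/${c.name}=${c.value}`));
  return b.every((c) => !seen.has(`${c.domain}/${c.name}=${c.value}`));
}

// In a Playwright test (sketch), each context keeps its own session:
//   const ctxA = await browser.newContext();
//   const ctxB = await browser.newContext();
//   // ...log a different user in within each context...
//   expect(cookiesAreIsolated(await ctxA.cookies(), await ctxB.cookies())).toBe(true);
```

<p>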
For example, session management is a key functionality that can be automated using this capability. You’ve probably come across functionality (such as the feature on FlightAware’s website) that lets users manage all their active sessions in one place: viewing, deleting individual sessions, or clearing all sessions at once. With Playwright’s ability to create multiple contexts, we can simulate and manage multiple user sessions for the same user within a single test. </p><p>Here’s a very simple example demonstrating how to use browser contexts in Playwright. This example creates multiple browser contexts within a single test and verifies that the sessions and cookies are distinct for each page.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.01.51-AM.png" class="kg-image" alt="" loading="lazy" width="612" height="519" srcset="https://flightaware.engineering/content/images/size/w600/2024/12/Screenshot-2024-12-02-at-10.01.51-AM.png 600w, https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.01.51-AM.png 612w"></figure><p>The ability to use browser contexts in Playwright offers distinct advantages over other frameworks. Cypress lacks built-in support for multiple contexts, making it challenging to manage separate user sessions. While Selenium does allow for multiple browser instances, it does not provide the same level of context isolation. 
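To make the pattern in the screenshot above concrete, here is a hedged sketch of such a test. The Playwright portion is shown as comments because it needs a live browser, and the <code>loginAs</code> helper and the <code>session</code> cookie name are hypothetical stand-ins, not FlightAware's actual code; the isolation check itself is plain logic.

```typescript
// Sketch of a multi-context session test. The Playwright calls are commented
// out because they require a live browser; `loginAs` and the cookie name
// 'session' are hypothetical stand-ins.
//
// import { test, expect } from '@playwright/test';
//
// test('two contexts hold distinct sessions for the same user', async ({ browser }) => {
//   const contextA = await browser.newContext(); // isolated cookies/storage
//   const contextB = await browser.newContext();
//   const pageA = await contextA.newPage();
//   const pageB = await contextB.newPage();
//   await loginAs(pageA, 'user@example.com'); // same user, two sessions
//   await loginAs(pageB, 'user@example.com');
//   expect(sessionsAreDistinct(await contextA.cookies(), await contextB.cookies())).toBe(true);
//   await contextA.close();
//   await contextB.close();
// });

// Two cookie jars represent distinct sessions when both carry the session
// cookie but with different values.
interface Cookie {
  name: string;
  value: string;
}

function sessionsAreDistinct(a: Cookie[], b: Cookie[], cookieName = 'session'): boolean {
  const valueA = a.find((c) => c.name === cookieName)?.value;
  const valueB = b.find((c) => c.name === cookieName)?.value;
  return valueA !== undefined && valueB !== undefined && valueA !== valueB;
}
```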
With Selenium, we often need to manage cookies and local storage manually across sessions, which adds complexity to our testing.</p><h2 id="some-helpful-tips-tricks">Some Helpful Tips &amp; Tricks</h2><p>Here are a few tricks that have been especially useful for me; I’m sharing them in case they help you too.</p><ul><li>If you want to run tests on a specific platform, such as mobile web, include the keyword “mobile” in the test suite description and use the following example config.</li></ul><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.02.12-AM.png" class="kg-image" alt="" loading="lazy" width="233" height="142"></figure><ul><li>You can use environment variables to trigger condition-based behavior. For instance, if you intend to send Slack reports only for scheduled job runs, refer to the example code snippet in the Reporting section above.</li><li>If your test verifies numerous hyperlinks by navigating back and forth and is experiencing stability issues, consider using native JS commands instead of Playwright options.</li></ul><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.03.02-AM.png" class="kg-image" alt="" loading="lazy" width="313" height="78"></figure><ul><li>Configuring global setup and teardown files can be useful for performing certain operations at the beginning of your test suite execution. For instance, you might want to run a proxy server before any tests begin and stop it after all tests have completed. While hooks like <code>before</code>, <code>beforeEach</code>, <code>after</code>, and <code>afterEach</code> are available in different testing frameworks, they may not be efficient for parallel test execution. Additionally, many frameworks require extra boilerplate code for global setup and teardown. 
In contrast, Playwright streamlines this process by letting you configure a single shared environment for all tests, so resources are set up once instead of repeatedly.</li></ul><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/12/Screenshot-2024-12-02-at-10.03.18-AM.png" class="kg-image" alt="" loading="lazy" width="353" height="119"></figure><h2 id="wrap-up-thoughts">Wrap-Up Thoughts</h2><p>We are continually exploring and integrating various features of Playwright into our testing strategy. For example, we are currently using multiple browser contexts to write tests that simulate multiple users or sessions. We are also enhancing our testing framework by incorporating parallel testing features. Overall, we’ve found Playwright to be highly effective in strengthening our testing strategy.</p><p>Whether you're just starting with test automation for a new application or aiming to optimize your current setup, the insights shared in this post may help you leverage some exciting features that Playwright offers. We encourage you to experiment with the capabilities discussed and see how they can transform your development and testing lifecycle.</p><p>Happy automating! 😄</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/the-playwright-advantage-strategies-for-effective-test-automation/">The Playwright Advantage: Strategies for Effective Test Automation</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ A new iOS map component for FlightAware ]]></title>
        <description><![CDATA[ Earlier this year, we released a rewritten map component for our iOS app. This blog post will cover some of the design choices we made for the new map and lessons we learned along the way. ]]></description>
        <link>https://flightaware.engineering/a-new-ios-map-component-for-flightaware/</link>
        <guid>https://flightaware.engineering/a-new-ios-map-component-for-flightaware/</guid>
        <pubDate>Sun, 03 Nov 2024 22:24:09 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/10/markus-spiske-Skf7HxARcoc-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>Earlier this year, we released a rewritten map component for our iOS app, along with <a href="https://go.flightaware.com/integratedmap?ref=flightaware.engineering">a version packaged as an SDK</a> for external customers. This release marked a major step in our journey to modularize our iOS app, unlocking benefits of code reuse, separation of concerns, and reduced overhead for maintenance and new feature development. This blog post will cover some of the design choices we made for the new map and lessons we learned along the way.</p><p>The map is at the center of the FlightAware app, serving as both a data display and an important piece of the navigation experience. It’s interactive and adaptable, changing to show the most relevant information for each screen in the app. Our map is faced with the difficult task of presenting a large amount of information in a way that’s both accessible at a glance and provides detail for users who want to dig in. Its interactivity and tight integration with many screens in our app give it an outsized impact on the way people use the app and how satisfied they are with the experience. In many ways, the map component is central to the use cases many users have for our app. It was challenging and rewarding to work on something that impacts so many users so directly. </p><p>We embarked on a project to rewrite the map component for several related reasons. Most importantly, we wanted to offer a flight mapping SDK as a product so that external customers could integrate FlightAware data and maps into their apps with just a few lines of code. We’re a small iOS team and don’t have the capacity to maintain multiple mapping solutions, so this external SDK would have to be produced from the same codebase we use for the map in our app. </p><p>We had a map component in our app already, but it had grown organically over 5-7 years and was deeply intertwined with a lot of app-specific code. 
It would take a lot of effort to extract it from the app, and we’d be left with something that still had significant technical debt and was tightly coupled to our internal REST API. Additionally, the old map had some architecture decisions that would be very difficult to change, such as completely removing and replacing all the aircraft icons on screen when they needed to have their positions or tracks updated. Considering all the effort that would be required to end up with a product that was still less than ideal, we determined that it made sense to start an implementation from scratch while integrating the significant knowledge we’d built designing, maintaining, and extending the previous solution. </p><h2 id="design-choice-messages">Design Choice: Messages</h2><p>Early in the development of the new map, after some significant prototyping, we decided that the flight information would be represented as a “pipeline” of different components sending “messages” to each other in sequence. A message represents the complete state of a flight at a point in time, including its position, track, and other identifying information. Each component would have a single responsibility, like applying a smoothing algorithm to the flight track or choosing the color and other style information for the aircraft icon. After going through all the processing components, the messages would arrive at the end of the pipeline and be used to update the annotations and overlays (aircraft icons, flight tracks, etc) displayed on the map. 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/10/NR.png" class="kg-image" alt="" loading="lazy" width="1920" height="1080" srcset="https://flightaware.engineering/content/images/size/w600/2024/10/NR.png 600w, https://flightaware.engineering/content/images/size/w1000/2024/10/NR.png 1000w, https://flightaware.engineering/content/images/size/w1600/2024/10/NR.png 1600w, https://flightaware.engineering/content/images/2024/10/NR.png 1920w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Simplified architecture diagram of the pipeline-based design</span></figcaption></figure><p>In Swift, we expressed this with some protocols, structs, and enums as follows:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/10/Screenshot-2024-10-23-at-12.22.08-PM.png" class="kg-image" alt="" loading="lazy" width="762" height="379" srcset="https://flightaware.engineering/content/images/size/w600/2024/10/Screenshot-2024-10-23-at-12.22.08-PM.png 600w, https://flightaware.engineering/content/images/2024/10/Screenshot-2024-10-23-at-12.22.08-PM.png 762w" sizes="(min-width: 720px) 720px"></figure><p>We chose to implement our own set of protocols instead of using something like Combine or AsyncSequences because we didn’t feel AsyncSequences were mature enough at the time and Combine seemed like a lot of syntactical overhead to accomplish what we needed. This implementation gives us flexibility to write simple, readable, easily testable code without adding incidental complexity. </p><p>The pipeline-based design has several important benefits:</p><ul><li>Its orientation around messages, each focused on a single flight, means that it can easily adapt to both “push” data sources, where updated flight positions are streamed to clients in real-time, and “pull” data sources, where clients periodically poll an API to get updated information. 
The old map was mostly built around “pull” data sources, so this new design added the flexibility for us to use both kinds of data sources without needing to account for the difference in the rest of the map code.</li><li>It minimizes the number of places where state has to be maintained. Most of the components in the pipeline are stateless, using only the information present in the messages (which may have been added by previous components) to perform their task. Fewer places to store state and no need to access state across components means fewer opportunities for state to be out of sync and clearer logic in each component. This in turn dramatically improves the development and debugging experience for the whole project. </li><li>The components are individually testable and have well-defined interfaces between them. Unit tests can be written for each component, sending messages to it and verifying the messages it generates in response. Since each component is responsible for a single task (many are less than 100 lines of code), their behavior is easy to reason about and easy to write tests for. </li><li>The components are composable. They can be rearranged or removed entirely to change or remove behavior. For example, there’s a component whose job it is to manage what time is currently being displayed on the map, which is involved with animation of plane icons and the flight replay feature. This means there’s one place to maintain the time information, one place to change it, and all the time logic can be completely removed from the pipeline (for example, to debug something, or if animation or replay functionality is not desired) by simply removing the time component. The idea of abstracting the time handling logic into a single component was a major “a-ha moment” during the prototyping stage and one of the most compelling reasons we chose the message-based design. 
It significantly simplified and centralized the logic compared to the previous solution and unlocked possibilities we didn’t have before. </li></ul><p>The pipeline-based design is an implementation detail of the map component, and is not reflected in the public API surface, so consumers don’t have to worry about configuring the specific pipeline components to produce the display they want. Instead, they specify what they want in a declarative way and our code configures a pipeline to display it. </p><h2 id="design-choice-modular-data-sources">Design Choice: Modular Data Sources</h2><p>We also knew early on that we wanted our new map to be able to get flight data from multiple FlightAware data sources. For much of the development process, we weren’t sure what the solution for external customer data access would be. We also knew that we’d need to use our existing internal REST API when the map was integrated in the iOS app, but that we’d eventually be migrating away from this API to use something built on modern technologies in alignment with <a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/">FlightAware’s broader modernization project</a>. With all this in mind, we decided that the data source layer for the map would have to be modular, with well-designed abstractions of flight data that were agnostic to the specific API from which the data originated. We put a significant amount of effort into designing and evolving these abstractions. This decision paid off immensely in several ways:</p><ul><li>It allowed us to swap in code to read data from files instead of an API, accelerating development and enabling reproducible tests without dependencies on external systems. For example, one of the problems we ran into frequently with the old map was debugging the way a flight appeared on the map at a certain point during its flight. 
Issues were often entirely impossible to reproduce later, as our flight tracking system had processed different data in the interim. With our new design, it was trivial to build a data source that reads API responses out of files, which allows us to capture the API output at the exact moment the problem is happening and debug against exactly that data until the problem is resolved. We also leveraged this facility to create synthetic data representing very rare scenarios, like a flight that crosses the antimeridian in both directions. This was a significant improvement for us and saved a lot of time that would otherwise have been spent searching for real “edge-case” flights. </li><li>It enabled the same components of the pipeline described above to be reused without regard to the specific data source. Additionally, we evolved the abstraction in line with building the components, so it’s designed for the best developer experience in Swift rather than being tied to the structure provided by any of our backend APIs. For example, we make heavy use of optionals, structs, and enums, instead of relying on primitive types like strings that might be empty or numbers that represent a particular unit of measure and may require conversion to be used correctly.  </li><li>It let us put off deciding on the solution for external customer data access until very late in the development process. This in turn allowed us to align that decision with other teams at FlightAware who were working on similar solutions for our web-based products, saving overall time and effort and avoiding redundant solutions. When we finally had a solution identified, we were able to write only the code needed to interface with that solution and map its responses into our abstraction. The rest of the processing and display logic “just worked” without additional changes required. This was a rewarding validation that our approach to abstraction had saved significant time and effort. 
</li></ul><p>Even though the long-term plan is for all consumers, internal and external, of our map component to be using one standard next-generation data source, this design still has developer-experience and testing benefits. While that data source isn't in production yet, this design lets us ship our new map and get it in the hands of our users now. </p><h2 id="design-choice-mapkit">Design Choice: MapKit*</h2><p>From the beginning, we chose to use MapKit, the built-in framework on iOS that provides basic map views, base map tiles, and primitives for interaction and drawing annotations and overlays on the map. For those familiar with MapKit, this may seem like an odd choice—MapKit has significant limitations and our application is certainly pushing the boundaries of what’s possible. For example, MapKit doesn’t support user-provided vector map tiles; while the Apple Maps base tiles are vector-based, there’s no way for developers to provide their own vector tile set. This means we can’t directly use the beautiful vector map tiles we developed for the web (check them out on our beta site at  <a href="http://beta.flightaware.com/live/airport/KIAH?ref=flightaware.engineering">beta.flightaware.com/live/airport/KIAH</a>). </p><p>MapKit also has some interesting performance behavior when dealing with many annotations or overlays comprising many points, which required us to implement specific workarounds to improve performance. For example, we found a severe memory leak related to <code>MKTileOverlayRenderer</code> when many flight track polylines were drawn on the map and had to implement a custom renderer to work around it. The map project as a whole provided countless opportunities to hone our skills in CPU and memory profiling using Instruments and implement evidence-backed performance improvements to mitigate specific issues. 
</p><p>MapKit also has limitations in terms of interactivity, animation, and design customization with the built-in primitives, so in some cases we had to reimplement significant parts of built-in functionality to achieve a desired effect. For example, our callout views and the logic to show and hide them are entirely custom because <code>MKCalloutView</code> doesn’t provide room for customization. The developer-facing interface of MapKit hasn’t changed significantly since it was introduced in iOS 4 (built around raster tiles provided by Google Maps) other than some new feature additions. MapKit is almost 15 years old and definitely showing its age in some places. </p><p>Despite all these limitations, we had a few important reasons for sticking with MapKit:</p><ul><li>We already had expertise building complex, interactive applications with MapKit—our old map was also built on it. We were able to start simply, using our existing knowledge, and build workarounds or reimplementations as needed when things got more complex, instead of having to start from scratch with a different mapping library.</li><li>MapKit provides base map tiles included with our Apple Developer membership. While there are other map libraries available, developers generally have to provide their own raster and/or vector map tiles. Unfortunately, paid tile hosting services, a typical solution for this, are prohibitively expensive for an app with FlightAware’s scale and monetization model. MapKit provides functionally unlimited street map and satellite map tiles at no additional charge beyond our annual Apple Developer membership. </li><li>A third-party mapping library would increase the binary size of the SDK integration we’re providing to external customers. Using frameworks that already ship with iOS is an obvious way to reduce the binary size. </li></ul><p>Although we built the new map around MapKit, we did put a lot of consideration into the limitations discussed above. 
Keeping them in mind, we designed the MapKit integration so that MapKit could be swapped out for another map library such as MapLibre Native. For example, positions and flight tracks aren’t “lowered” to <code>MKAnnotation</code> or <code>MKPolyline</code> until we are about to display them on the map—the rest of the map code uses data structures that are completely agnostic to the mapping library. This is a significant improvement from the old map’s codebase, which used MapKit types extensively in its business logic. In the new map component, the MapKit-related code is clustered together and separated from the message pipeline containing the core business logic, so it would be relatively straightforward to write code specific to another mapping library and connect it to the end of the pipeline. This leaves us the ability to switch to something different down the road while allowing us to ship something that works today. </p><h2 id="what-we-learned">What we learned</h2><p>While building the new map component, we learned several important lessons and built lots of new knowledge on the team. I’ll discuss some of the insights we gained around two major areas: testing techniques and building SDKs for external consumption. </p><p>The new map component represented a major leap in our team’s testing practices and capabilities. We weren’t in the greatest starting position: the old map had no automated tests of any kind, making it hard to identify regressions. The problem is compounded by the fact that putting flights on an interactive map has many edge cases which don’t come up often in manual testing, but that we want to get right. Building the new map component was an opportunity for a fresh start with better testing practices, and we took full advantage of that opportunity. </p><p>We set a goal that as much of the new map’s feature set as possible should be covered by some sort of automated test. As discussed above, our architecture choices made unit testing easy. 
As a result, we have several hundred unit tests exercising the individual components. Every time we find and fix an issue that the tests missed, we add a test to make sure we don’t break it again in the future. Our unit tests have been mostly reliable and have helped catch many regressions early in the development process. In addition, we found that the act of writing the tests helps us think about the code from a different perspective, which frequently leads to bugs or edge cases being discovered much earlier than if the tests had been written at a later time or not at all.</p><p>At the same time, our team is pragmatic about code coverage. We view it as just one tool in our toolbox, not a metric for which an arbitrary threshold must be enforced. We write unit tests where they make sense, but we’re not extensively changing our code just so we can test it or writing pointless tests for the sake of increasing the coverage a few percent. This pragmatic approach is common throughout FlightAware Engineering.</p><p>Not all the code we wrote for the new map was easily unit-testable. For example, all the code dealing directly with MapKit to display flight data on the map isn’t something we could easily unit-test. To test code like this, we rely on integration-style tests. Because the ultimate goal is to draw correct results on the map, it makes sense to test what ends up being drawn on the map. This doesn’t work with traditional UI testing approaches, which use the accessibility hierarchy to find views on the screen and assert on various properties (text value, visibility, etc) of the view. Instead, we use “snapshot testing”, which involves taking a screenshot of the map after it’s rendered a particular situation and doing a pixel-for-pixel comparison with a known-good image. 
We built a small test app with a UI that’s easily automated by the built-in <code>XCUITest</code> functionality and used the<a href="https://github.com/devexperts/suitcase?ref=flightaware.engineering"> SUITCase library</a> for screenshot comparison because many of the other libraries don’t correctly snapshot <code>MKMapView</code>. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/10/NP2.png" class="kg-image" alt="" loading="lazy" width="750" height="1334" srcset="https://flightaware.engineering/content/images/size/w600/2024/10/NP2.png 600w, https://flightaware.engineering/content/images/2024/10/NP2.png 750w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The test app we built to facilitate snapshot testing. By comparing screenshots, we can test the map contents as well as the state of the replay mode UI, all of which would be difficult to unit test without architectural contortions.</span></figcaption></figure><p>While our snapshot tests have been useful, they have also been somewhat unreliable due to bugs in MapKit and UI automation interacting with the app. We struggle with false failures that require further investigation. On the whole, the snapshot tests are a useful tool when combined with nuanced manual review of the results.  </p><p>In addition to new testing techniques, we also learned a lot about building an SDK for external consumption. This is something that no one on the team had done before, so we learned as we went. </p><p>We carefully evaluated the public API surface to ensure that functionality accessible to 3rd-party developers is coherent and easy to use. With a Swift-only framework, there are no header files, but the public interface is available in a <code>.swiftinterface</code> file generated by Xcode in the compiled framework bundles. 
We wrote comprehensive documentation on the entire public API, making sure to keep the needs of an external developer in mind. </p><p>Perhaps the most important mindset shift we adopted while building the SDK was a focus on binary size. Swift is known for large compiled binary sizes, so we had to analyze and improve ours by aggressively stripping unnecessary information from the binary, refactoring our internal dependencies to pull in only what was needed, and compressing data assets required for the SDK to function. For example, we had a Swift package with a single target containing all the code to interact with our internal REST API. The map component only needs to talk to a few endpoints, so we split the package into several targets and only imported the necessary ones into the map component. This saved several megabytes of compiled code size. </p><p>To cap it all off, we built a script that runs in our CI and monitors the change in compiled binary size with each PR to ensure we’re always considering this attribute that’s very important to our customers.</p><h2 id="conclusions">Conclusions</h2><p>Overall, the project to build a new map component for our app and to deliver an external map SDK was a success. Since we shipped the new map component in our app, we’ve been able to add several map features and iterate based on customer feedback with very little additional effort thanks to the extensibility of our design. We’ve been able to dramatically clean up our codebase around the map integration, ensuring greater separation of concerns and easier maintainability for the future. We learned a lot, especially around testing techniques and building an SDK for external consumption, and we’ve already started applying some of these learnings to our other iOS development at FlightAware. 
And at the end of it all, we have a new native iOS map component that’s flexible and extensible, setting the stage for new map features and improvements we’re hoping to introduce in the future, as well as a drop-in SDK for customers to easily integrate our flight maps with just a few lines of code. </p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/a-new-ios-map-component-for-flightaware/">A new iOS map component for FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Building a Bridge from Tcl to Rust ]]></title>
        <description><![CDATA[ Much of FlightAware is implemented in a scripting language called Tcl, which has served us well since our inception nearly two decades ago. But looking around at today’s software ecosystem, it’s difficult to claim that Tcl will continue to be the best choice for our needs in the future. ]]></description>
        <link>https://flightaware.engineering/building-a-bridge-from-tcl-to-rust/</link>
        <guid>https://flightaware.engineering/building-a-bridge-from-tcl-to-rust/</guid>
        <pubDate>Mon, 07 Oct 2024 12:40:05 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/09/Screenshot-2024-09-30-at-3.46.37-PM-1.png" medium="image"/>
        <content:encoded><![CDATA[ <p>Much of FlightAware is implemented in a scripting language called Tcl, which has served us well since our inception nearly two decades ago. But looking around at today’s software ecosystem, it’s difficult to claim that Tcl will continue to be the best choice for our needs in the future.</p><p>For example, we routinely need to maintain our own Tcl bindings and implementations for commonly used software such as&nbsp;<a href="https://github.com/flightaware/kafkatcl?ref=flightaware.engineering">Kafka</a>,&nbsp;<a href="https://github.com/flightaware/prometheus-tcl?ref=flightaware.engineering">Prometheus</a>, and&nbsp;<a href="https://github.com/flightaware/Pgtcl?ref=flightaware.engineering">PostgreSQL</a>. In most other language ecosystems that are prevalent today, there are widely used packages that provide this functionality which don’t require us to build them in-house.</p><p>A few years ago,&nbsp;<a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/">FlightAware made the decision to move away from Tcl</a>&nbsp;and embrace several other language ecosystems that are a stronger fit for our work by defining our “First-Class Languages”: Go, Rust, and Python (along with application-specific languages such as Typescript and Swift as appropriate).</p><p>When I joined FlightAware on the Flight Tracking team in March 2023, several  projects using these new language selections were already underway. However, one system that was still firmly rooted in Tcl was Hyperfeed.</p><p>Hyperfeed is&nbsp;<a href="https://www.flightaware.com/about/datasources/?ref=flightaware.engineering">FlightAware’s core flight tracking engine</a>. It’s responsible for fusing incoming flight information from disparate sources and producing a single, consistent data stream of live aircraft activity worldwide. 
The resulting feed is consumed by the many downstream services that show flights on our website and mobile apps, use ML models to&nbsp;<a href="https://www.flightaware.com/commercial/foresight/?ref=flightaware.engineering">predict departure and arrival times</a>, detect aircraft flying in holding patterns, provide data to customers through&nbsp;<a href="https://www.flightaware.com/commercial/aeroapi/?ref=flightaware.engineering">our REST API</a>&nbsp;and&nbsp;<a href="https://www.flightaware.com/commercial/firehose/?ref=flightaware.engineering">our streaming API, Firehose</a>, send alert emails and push notifications, and much more. If Hyperfeed isn’t running, time is frozen; no aircraft move on our maps, no alerts get sent, and the data shown on our website and in our APIs gets stale.</p><p>As you might imagine, we approach changes to this critical system with care. Hyperfeed encodes subtleties learned from years of experience tracking flights, and there are plenty of examples throughout software engineering's history that demonstrate the risks of attempting to rewrite such a large and complex system from scratch.</p><p>Instead of rewriting Hyperfeed from scratch in a new language, we have chosen a path of incremental improvements — leveraging our first-class languages — that will morph the system over time into one that no longer depends on Tcl.</p><p>In particular, we’ve chosen to use Rust because it’s one of the languages several members of the team are already familiar with and it has good Foreign Function Interface (FFI) support, which is crucial for integration with Tcl.</p><p>The goal is that by gradually factoring out sensible abstractions into Rust, we can both strengthen the structure of Hyperfeed and immediately reap the performance benefits of a compiled language. 
The type safety that Rust brings to the table will also be a welcome addition to the codebase.</p><p>The heavy lifting is being done by&nbsp;<a href="https://crates.io/crates/tcl?ref=flightaware.engineering">the&nbsp;<code>tcl</code>&nbsp;crate</a> (<a href="https://crates.io/?ref=flightaware.engineering" rel="noreferrer">crates</a> are the packaging mechanism for the Rust ecosystem). To help manage the interactions between all of the native code involved, we are using&nbsp;<a href="https://nixos.org/?ref=flightaware.engineering">Nix</a>&nbsp;(which is&nbsp;<a href="https://flightaware.engineering/taking-off-with-nix-at-flightaware/">used heavily across FlightAware</a>).</p><p>There are a few layers to this, and you can follow along&nbsp;<a href="https://github.com/benburwell/rust-from-tcl?ref=flightaware.engineering">with the complete source code here</a>. First, we have the pure Rust library implementation in&nbsp;<code>src/greeter.rs</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-Rust">use std::fmt::Debug;

#[derive(Debug)]
pub enum Language {
    English,
    French,
    Spanish,
}

/// Greet returns a greeting message in the preferred language for the provided recipient.
pub fn greet(who: &amp;str, lang: Language) -&gt; String {
    match lang {
        Language::English =&gt; format!("Hello, {who}!"),
        Language::French =&gt; format!("Bonjour, {who}!"),
        Language::Spanish =&gt; format!("Hola, {who}!"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_greetings() {
        assert_eq!(
            greet("Alice", Language::English),
            "Hello, Alice!".to_string()
        );
        assert_eq!(
            greet("Alice", Language::French),
            "Bonjour, Alice!".to_string()
        );
        assert_eq!(
            greet("Alice", Language::Spanish),
            "Hola, Alice!".to_string()
        );
    }
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">src/greeter.rs</span></p></figcaption></figure><p>Notice that this is where we’re putting our tests for the library logic, and there are no mentions of Tcl whatsoever. With this in place, we can build a Tcl wrapper using the&nbsp;<code>tcl</code>&nbsp;crate in&nbsp;<code>src/lib.rs</code>:</p><figure class="kg-card kg-code-card"><pre><code class="language-Rust">use tcl::reexport_clib::{Tcl_Interp, TCL_OK};
use tcl::*;
use tcl_derive::proc;
use version::version;

mod greeter;

#[derive(thiserror::Error, Debug)]
enum Error {
    #[error("unsupported language")]
    UnsupportedLanguage,
}

/// Initialize the Tcl module.
///
/// # Safety
///
/// This function uses unsafe calls to Tcl's C library.
#[no_mangle]
pub unsafe extern "C" fn Greeter_Init(interp: *mut Tcl_Interp) -&gt; u32 {
    let interp = Interp::from_raw(interp).expect("No interpreter");
    interp.def_proc("::greeter::greet", greet as ObjCmdProc);
    interp.package_provide("greeter", version!());
    TCL_OK
}

#[proc]
fn greet(who: String, lang: String) -&gt; Result&lt;String, Box&lt;dyn std::error::Error&gt;&gt; {
    let lang = match lang.as_str() {
        "en" =&gt; greeter::Language::English,
        "fr" =&gt; greeter::Language::French,
        "es" =&gt; greeter::Language::Spanish,
        _ =&gt; return Err(Box::new(Error::UnsupportedLanguage)),
    };
    Ok(greeter::greet(&amp;who, lang))
}

#[cfg(test)]
mod tests {
    use super::*;
    use tcl::Interpreter;

    fn setup_interpreter() -&gt; Interpreter {
        let interp = Interpreter::new().expect("Could not create interpreter");
        unsafe {
            Greeter_Init(interp.as_ptr());
        }
        interp
    }

    #[test]
    fn test_greeting_english() {
        let interp = setup_interpreter();
        let code = "
            ::greeter::greet Alice en
        ";
        assert_eq!("Hello, Alice!", interp.eval(code).unwrap().get_string());
    }

    #[test]
    fn test_greeting_unknown() {
        let interp = setup_interpreter();
        let code = "
            ::greeter::greet Alice xx
        ";
        assert!(interp.eval(code).is_err());
    }
}
</code></pre><figcaption><p><span style="white-space: pre-wrap;">src/lib.rs</span></p></figcaption></figure><p>The primary objective of this layer is to handle the translation between Tcl and Rust. For example, the&nbsp;<code>greet</code> function annotated with the&nbsp;<code>#[proc]</code>&nbsp;macro converts language names passed as strings from the Tcl interpreter into&nbsp;Language&nbsp;enum values. Another interesting thing to note here is our ability to test the interface between Tcl and Rust by evaluating some Tcl code.</p><p>Next, we can write a small Tcl program to test out our new library:</p><figure class="kg-card kg-code-card"><pre><code class="language-Tcl">package require greeter

foreach lang {en fr es} {
    puts [::greeter::greet "world" $lang]
}</code></pre><figcaption><p><span style="white-space: pre-wrap;">test.tcl</span></p></figcaption></figure><p>And finally, we’ll use Nix to enter a shell with our compiled Rust library and the exact version of Tcl that it was built against, and run our sample program:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/10/tclsh.png" class="kg-image" alt="Screenshot of a shell session invoking &quot;nix develop&quot; followed by &quot;tclsh test.tcl&quot;. The output of the program shows greetings in three different languages." loading="lazy" width="568" height="93"></figure><p>We’ve already used this approach to implement a Rust version of one of our internal libraries. Since Hyperfeed’s deployment process has historically been focused around Tcl, our release process didn’t really have a “build” step where Rust code could be compiled. With some updates to our release tooling, we are now able to build a single release artifact that contains Tcl scripts along with compiled Rust libraries and push it to each of the Hyperfeed servers. As a result, we can now begin to introduce Rust implementations into Hyperfeed in production.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/building-a-bridge-from-tcl-to-rust/">Building a Bridge from Tcl to Rust</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ 2024 Intern Summer Projects ]]></title>
        <description><![CDATA[ This summer, FlightAware welcomed five interns from various parts of the country. These interns worked closely with our engineering team on their projects, achieving remarkable results. ]]></description>
        <link>https://flightaware.engineering/2024-intern-summer-projects/</link>
        <guid>https://flightaware.engineering/2024-intern-summer-projects/</guid>
        <pubDate>Tue, 03 Sep 2024 12:41:12 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-4.24.26-PM.png" medium="image"/>
        <content:encoded><![CDATA[ <p>As a Software Engineer on FlightAware’s Mobile team, Samantha Turnage works to refine and maintain the FlightAware iOS app. In addition, she was the intern coordinator for 2024.</p><blockquote>This summer, FlightAware welcomed five interns from various parts of the country. These interns worked closely with our engineering team on their projects, achieving remarkable results that they later showcased to the entire company. Throughout their time with us, they also connected through social events and engaged in tech and career development discussions led by our engineers. We’re excited to share the outcomes of their efforts and spotlight their achievements. - Samantha T.</blockquote><h2 id="kirthan-reddy">Kirthan Reddy</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.20.40-PM-1.png" class="kg-image" alt="" loading="lazy" width="258" height="346"></figure><p>Hi! My name is Kirthan Reddy and I’m a software engineering intern at FlightAware. I’m a master’s student at NYU Tandon studying computer science and plan to graduate by December 2024.</p><h3 id="my-project">My Project</h3><p>My project this summer was to create a full-stack web application that would test the strength of the ADS-B signals captured by PiAware devices. PiAware is a Raspberry Pi-powered radio receiver that captures ADS-B signals from aircraft in the sky and sends this data to FlightAware servers for processing. There are more than 34,000 PiAware devices that currently make up FlightAware’s flight intelligence network. We want to make sure that these devices can maximize signal coverage to improve flight intelligence at FlightAware. </p><p>To measure signal strength, I used a metric known as RSSI (received signal strength indicator), which measures the power of a radio signal. 
RSSI values closer to 0 indicate a stronger signal reception, while those farther away from 0 indicate a weaker reception. I maintained a hash table in my application where the key represented the ICAO code (an aircraft’s unique transponder address) and the value represented a queue of up to 50 RSSI values. I computed the median of each queue and then calculated the median of the medians.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.23.16-PM.png" class="kg-image" alt="" loading="lazy" width="694" height="346" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.23.16-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.23.16-PM.png 694w"></figure><p>I built my app using a FastAPI backend and a React frontend. I used TypeScript to streamline error checking for the React app. FastAPI continuously reads ADS-B signal data from dump1090 (decoded data from a software-defined radio), performs the median calculations on the data, and streams that data to React using a WebSocket connection. The WebSocket connection was implemented with the Socket.io library and FastAPI was served by Uvicorn, an asynchronous web server implementation. </p><p>The React webpage uses D3.js (a data visualization library) to render the average RSSI values within the last five minutes in a line chart. This chart was continuously updated, showcasing the average values within a shifting five-minute time window. 
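</p><p>As a rough sketch of the median-of-medians bookkeeping described above, translated into TypeScript for illustration (the actual backend was Python, and all names here are illustrative, not from the real project):</p>

```typescript
// Map from ICAO transponder address to a bounded queue of recent RSSI samples.
const MAX_SAMPLES = 50;
const rssiByIcao = new Map<string, number[]>();

function recordRssi(icao: string, rssi: number): void {
  const queue = rssiByIcao.get(icao) ?? [];
  queue.push(rssi);
  if (queue.length > MAX_SAMPLES) {
    queue.shift(); // drop the oldest reading to keep a sliding window
  }
  rssiByIcao.set(icao, queue);
}

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Overall signal strength: the median of each aircraft's median RSSI.
function medianOfMedians(): number {
  const medians = [...rssiByIcao.values()].map(median);
  return median(medians);
}
```

<p>Capping each queue at 50 samples keeps the summary responsive to recent signal conditions rather than the whole session.</p><p>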
This way the user could track the changes in RSSI values as they changed locations.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.24.02-PM.png" class="kg-image" alt="" loading="lazy" width="776" height="500" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.24.02-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.24.02-PM.png 776w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">This Received Signal Strength (RSSI) graph displays the signal strength quality of all the aircraft messages your ADS-B receiver is currently receiving. RSSI values closer to 0 indicate a stronger signal reception, while those further away indicate a weaker one. This is intended to help you determine the placement of your antenna for optimal ADS-B signal reception.</span></figcaption></figure><p>To test the app, I moved the PiAware device to different rooms in my house and my backyard to see if the line chart showed marked differences in RSSI values. However, I had trouble observing meaningful differences in the RSSI values, even after changing my algorithm and insulating my antenna with tin foil and saran wrap. I did confirm that the app stopped sending data when the ProStick was disconnected from the device. In my opinion, the algorithm still needs more work to reliably produce measurable differences in RSSI values. </p><p>I packaged and deployed the app as a Debian package, which installs the app so that it starts at boot time and runs in the background. This way the user wouldn’t have to worry about running the application themselves. All in all, I learned a lot from this project. 
There were a few challenges I encountered along the way, like stabilizing the WebSocket connection and setting up the Debian package, but I wouldn’t trade my experiences for anything.</p><hr><h2 id="chad-fusco">Chad Fusco</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/chad-fusco-photo-sm-1.JPG" class="kg-image" alt="" loading="lazy" width="347" height="368"></figure><p>Hello world, my name is Chad Fusco, and it’s a pleasure to share some insight into my experience so far this summer as a software engineering intern at FlightAware! I’m pursuing my master’s in computer science at Rice University. Previously, I earned an MS in Civil Engineering from Case Western Reserve University, and I have since pivoted into the exciting, fast-paced world of software. I also completed the software engineering immersive program at Hack Reactor a couple of years ago, where I got my initial taste of modern full-stack web development. During my internship this summer, I was lucky to be fully integrated into the Web Core team, enabling me to put my knowledge into practice and further grow my skillset. The people at FlightAware are outstanding, and it’s been an honor to work alongside them and learn from them.</p><h3 id="my-project-1">My Project</h3><p>My project is to build a revamped version of the Account Management pages using Next.js, TypeScript, GraphQL, and other modern technologies. This project is a part of FlightAware’s WebNxt initiative, which you can read more about <a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/">here</a>. The goals of this project are two-fold:</p><ol><li>Migrate the Account Management frontend out of the current Tcl monolith and into a Next.js app using modern tooling within an Nx monorepo (more on that later).</li><li>Create a more dynamic, clean, and enhanced user experience. 
See the screenshots below for the current version and the proposed future version.</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.43.40-PM-1.png" class="kg-image" alt="" loading="lazy" width="976" height="514" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.43.40-PM-1.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.43.40-PM-1.png 976w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Current Account Management experience</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.43.54-PM.png" class="kg-image" alt="" loading="lazy" width="976" height="466" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.43.54-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.43.54-PM.png 976w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Future Account Management experience</span></figcaption></figure><p>The new Account app is one of the first Next.js apps at FlightAware to be built in the web monorepo, which will eventually be used to house the entire FlightAware frontend. The monorepo architecture solves many of the issues with a monolith architecture (scalability, tightly coupled components), while also solving many of the issues with a strict microservice architecture (siloed code, complex deployment). We use Nx for our monorepo, which offers excellent CI tooling and can intelligently detect which projects are affected by a change, limiting builds only to those projects. 
You can read more about FlightAware’s decision to use a monorepo <a href="https://flightaware.engineering/monorepo/">here</a>.</p><h3 id="new-library-ui-components">New Library UI Components</h3><p>The first thing I needed to do was build the foundational building blocks for this application. I used this as an opportunity to add new components to the monorepo UI library for use by the Account app and any future applications. This was a fun exercise in abstract thinking, building generalized React components with Tailwind’s utility classes. Some of the new components are shown below:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.44.20-PM.png" class="kg-image" alt="" loading="lazy" width="730" height="384" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.44.20-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.44.20-PM.png 730w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">New React UI Components</span></figcaption></figure><p>Over the summer, I also participated in the WebNxt Reading Group, a technical study group organized by Jared Harvey for the web-focused software engineers at FlightAware. This phase of my project coincided with the group’s discussions on accessibility, semantic HTML, and responsive design, which was perfect timing! It was great to share ideas not only with my own team (Core Services), but with the Apps team, QA, and other engineers across the broader Collins Aerospace organization.</p><p>Through creating these modularized components, I strengthened my knowledge in web accessibility and responsive design, and got practice writing unit tests in Vitest, React Testing Library, and Jest.</p><h2 id="app-architecture">App Architecture</h2><p>Next, I needed to architect the application. The Account App is a Next.js application. 
Next.js offers a wealth of great features, including Static Site Generation (generating pages at build time), Server-side Rendering (generating pages on the server on each request), Server-side and Client-side Routing, and near-zero-configuration API routes called “Server Actions.” It also has some other nifty features like Layouts to share UI between routes, automatic route prefetching with its&nbsp;<code>Link</code>&nbsp;component, and image lazy loading and auto-sizing with its&nbsp;<code>Image</code>&nbsp;component. I’ve also found Next.js to offer a very nice developer experience.</p><p>I decided to implement the app as a multi-page application rather than a single-page application (SPA). This allowed for future expansion of the app. It’s also expected to lead to better performance. Since each page is fetched separately from the Next.js server, more HTML can be rendered on the server, either at build or request time, and each request is for a portion of the app rather than the entire app. This results in smaller bundle sizes for each page, meaning less code for the client to download. A high-level diagram of the application is provided below. As an approximate mental model, I think it’s helpful to imagine each page as a separate React app, with Next.js’s Node runtime orchestrating the pages. 
Each page has Client Components, which are hydrated by JavaScript bundles sent from the server along with each Server Component page.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.44.43-PM.png" class="kg-image" alt="" loading="lazy" width="608" height="327" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.44.43-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.44.43-PM.png 608w"><figcaption><span style="white-space: pre-wrap;">High-Level Account App Multi-Page Mental Model</span></figcaption></figure><p>The architecture of each page went through a few iterations, with the first two shown below in simplified form. The orange boxes are Server Components, while the yellow boxes are Client Components. My initial thought was to fetch the user data from the Profile Page and pass it as a prop to the ProfileForm. The user would make some changes to their account and press Submit to send the updates to a Server Action, which would forward the update to the GraphQL Router. Once the GraphQL Router confirmed the update, the Profile Page would refresh.</p><p>However, refreshing the page turned out not to work as expected. Next.js provides a <code>.refresh()</code> method on its <code>useRouter</code> hook, which should, in theory, make a new request to the server, refetch the data from the GraphQL Router, and rerender the Profile Page all while maintaining the ProfileForm’s state and the user’s scroll position on the page. Curiously, this wasn’t functioning as expected, even when using <code>revalidatePath</code> to purge cached data for the route.</p><p>Besides this technical hiccup, the idea of the page “owning” the user data instead of the form isn’t intuitive in terms of data hierarchy. 
The Profile Page component barely does anything with the data besides passing it down to the ProfileForm component. Shouldn’t ProfileForm own and manage the user data? This led to the revised architecture shown below.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.44.59-PM.png" class="kg-image" alt="" loading="lazy" width="608" height="371" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.44.59-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.44.59-PM.png 608w"><figcaption><span style="white-space: pre-wrap;">Original vs Revised Architecture for the Account Next.js App</span></figcaption></figure><p>Every decision comes with a tradeoff. The downside to this revision is that more code would be downloaded from the server. Therefore, I ended up implementing a hybrid approach, where the “initial” user data is fetched by the Profile Page on the initial page load and passed to the ProfileForm, which manages the data from there on. This logic is demonstrated in the abbreviated code sample below from the Profile Page Server Component. Note how this takes advantage of Next.js’s seamless way of running JavaScript (JS) code on build, request, or client-side:</p><ul><li>The form field options (e.g. 
in a Select dropdown menu) are not user-specific and therefore can and should be generated at <strong>build time</strong>.</li><li>The initial user data clearly can’t be fetched at build time, but can be fetched server-side at <strong>request time</strong>, so this code goes inside the <code>Profile()</code> function.</li><li>Finally, because it is a Client Component, the JS inside ProfileForm is run <strong>client-side</strong> and consists of the logic for updating and refreshing the user data.</li></ul><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.45.29-PM.png" class="kg-image" alt="" loading="lazy" width="762" height="859" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.45.29-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.45.29-PM.png 762w" sizes="(min-width: 720px) 720px"></figure><h2 id="connecting-to-the-backend"><br>Connecting to the Backend</h2><h3 id="using-graphql">Using GraphQL</h3><p>FlightAware’s WebNxt initiative is more than just frontend - it includes backend, too! In parallel with my work, two engineers working on the backend - Gabrielle Toutin and Gerald Burkholder - developed UsersNxt, a new GraphQL service for all user data. I used the <code>urql</code> npm package to interface with UsersNxt via FlightAware’s new GraphQL Router, which implements a federation of subgraph microservices (UsersNxt being one of them). A GraphQL API has a number of advantages over the more traditional REST API:</p><ul><li><strong>Prevents overfetching and underfetching data. </strong>GraphQL is a “query language for APIs.” The client gets exactly the data it needs. Facebook originally developed this because it wanted to minimize the data transferred over the network to mobile devices.</li><li><strong>Faster development speed.</strong> We only need to define one endpoint. 
There is no need to figure out what endpoints you need and what data structures they should return ahead of time. We just define a data schema and we’re done.</li><li><strong>More flexible than REST.</strong> This is similar to the point above. There’s no need to go through multiple development iterations of the REST APIs to add/change endpoints as client needs evolve.</li><li><strong>Allows analysis of what data the client is requesting.</strong> Each client specifies exactly what information it’s interested in. This can help evolve an API by deprecating fields that are no longer requested.</li></ul><p>These advantages greatly benefitted the Account App. Nearly all the data required for each page was unique to that page. Using the GraphQL API allowed me to request precisely what I needed for each page, while also allowing the flexibility to move form fields from one page to another, without needing to create a new Jira ticket to update the backend.</p><h3 id="automating-graphql-typing">Automating GraphQL Typing</h3><p>One of the common challenges when using different languages for the frontend and backend is that it disrupts the consistency of type definitions across the application. We would like to know what types the GraphQL API returns, and we’d like to enforce those types on the frontend in an automated fashion to prevent runtime errors. How can we do this?</p><p>To accomplish this goal, the monorepo uses the <code>graphql-codegen</code> npm package (Codegen). It is a plugin-based tool that can be used to generate GraphQL schema and operation TypeScript (TS) types on the frontend, and typing for the backend as well. In short, it gives you a fully typed TS response object when you place an API call to a GraphQL API. 
It accomplishes this by using GraphQL Introspection to fetch the types defined at a target URL.</p><p>For example, for the GraphQL query below …</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.45.39-PM.png" class="kg-image" alt="" loading="lazy" width="762" height="207" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.45.39-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.45.39-PM.png 762w" sizes="(min-width: 720px) 720px"></figure><p>Codegen would generate the GetUserDocument object in the code example below:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.45.46-PM.png" class="kg-image" alt="" loading="lazy" width="762" height="207" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.45.46-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.45.46-PM.png 762w" sizes="(min-width: 720px) 720px"></figure><p>As a side-task, I put forth a small proposal to change how we use Codegen in the WebNxt monorepo. Currently, Codegen is configured and run separately in each monorepo project. This is a leftover from projects that were originally developed outside the monorepo and then migrated in. The main downside to this is that the schema types, which are <strong>not</strong> project-specific, are generated separately for each project, and their generation is coupled with the generation of the operation types, which <strong>are</strong> project-specific.</p><p>I proposed setting up Codegen in one centralized location in the monorepo to have the schema types created in one file. 
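</p><p>As a hedged sketch of what such a centralized setup could look like (the URL and file paths here are hypothetical, not taken from the actual monorepo), a single <code>codegen.ts</code> would generate the shared schema types once for every project:</p>

```typescript
import type { CodegenConfig } from "@graphql-codegen/cli";

// Hypothetical centralized Codegen config: schema types are generated in one
// place, while each project still generates its own operation types.
const config: CodegenConfig = {
  // Fetch the schema via introspection from the GraphQL Router (example URL).
  schema: "https://graphql.example.com/router",
  generates: {
    // Shared schema types, generated once for the whole monorepo.
    "libs/graphql-types/src/schema.generated.ts": {
      plugins: ["typescript"],
    },
  },
};

export default config;
```

<p>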
I also proposed some configuration improvements, including migrating away from the <code>typescript-urql</code> Codegen plugin to reduce bundle size, along with a small study quantifying the savings.</p><p>After I shared the proposal with the other web developers, Gabrielle suggested an enhancement to further streamline the whole process: set up a GitHub Action to automatically create a PR to update the schema types when they change on the server. What’s great about this is that it truly isolates the schema type updates so that when developers run Codegen as part of a feature ticket, they are only updating the project-specific operation types. This prevents polluting PRs with unrelated changes and reduces unnecessary builds.</p><h3 id="authentication">Authentication</h3><p>Security was a critical consideration in the development of the Account App. The Account App uses AuthNxt, the new WebNxt authentication service developed by Jared Harvey, Will Koury, and Joe Polastre, to ensure users visiting the app are who they say they are prior to allowing them to see and modify their account information.</p><p>AuthNxt uses token-based authentication, which offers advantages over session-based authentication. Using digitally signed tokens makes the process stateless and decentralized. The authentication server doesn’t have to remember which sessions are still valid, and multiple servers can serve the public key needed for the resource servers to validate requests.</p><p>Token-based authentication was new to me, and although it was largely taken care of and abstracted away by AuthNxt, I felt it was important to understand the fundamentals of how it works. 
The general flow is:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.46.04-PM.png" class="kg-image" alt="" loading="lazy" width="762" height="558" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-2.46.04-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-2.46.04-PM.png 762w" sizes="(min-width: 720px) 720px"></figure><h2 id="final-reflection">Final Reflection</h2><p>I feel fortunate to have landed this internship at FlightAware. If somebody were to tell me three years ago that I’d be contributing to the very app I’ve been using for over a decade, I wouldn’t have believed them. The work here is varied and fascinating, and it’s great to be a part of it. Thank you to Jasmine, Samantha, Gabrielle, and everyone else who put together this internship program.</p><p>Lastly, I would be remiss if I didn’t send out two special thank you’s:</p><p>First, to my mentor, Jared Harvey. I don’t know if you volunteered to do this or were “volun-told,” but I don’t think I could’ve asked for a better mentor! Thank you for being so generous with your time, even through your big move to the East Coast. I learned a ton from you and look forward to more collaboration in the future.</p><p>Second, to my manager, Andrew Lewis. Thank you for your time and guidance over the summer. I felt trusted to deliver from the get-go. I appreciate that trust, which made me feel free to take initiative, explore new ideas, and grow professionally. 
Thanks for allowing me to join your ranks!</p><hr><h2 id="bryan-garcia">Bryan Garcia</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Unknown-2-1.jpeg" class="kg-image" alt="" loading="lazy" width="1129" height="1060" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Unknown-2-1.jpeg 600w, https://flightaware.engineering/content/images/size/w1000/2024/08/Unknown-2-1.jpeg 1000w, https://flightaware.engineering/content/images/2024/08/Unknown-2-1.jpeg 1129w" sizes="(min-width: 720px) 720px"></figure><p>Hello! My name is Bryan Garcia. This fall will be my last semester at California State University, Fullerton for my bachelor’s in computer science. After receiving my diploma in December, I plan to pursue my career in software engineering, particularly in full-stack web development. Outside of work, I love to camp, play video games (especially my favorite game Old School RuneScape), and hit the gym. This summer I achieved a personal record of 225 lbs for 9 reps on bench press!</p><p>This internship was a truly amazing opportunity to work with some incredibly talented people. I met a lot of cool people, including my mentor, my manager, fellow interns, intern coordinators, and everyone else who is a part of the apps team crew. Some of my favorite moments included attending the web learning sessions, where we discussed and presented on topics related to our tech stack, as well as participating in the fun games with the other interns, like Among Us (I got kicked off the ship when I was innocent).</p><h3 id="content-page-modernization-project">Content Page Modernization Project</h3><p>With the help of my manager, mentor, and apps team peers, I created a web service that generates static HTML pages from JSON. 
This project has been part of an effort to modernize the FlightAware website by updating the stack to Next.js and consolidating the codebases of all web services into a monorepo. This has been an exciting project to work on and a huge learning experience for me.</p><p>My first assignment was migrating the current industry pages over to the new app. The problem with the old industry pages was that they were hosted by a third-party vendor, which made it difficult to align them with the rest of the FlightAware site (e.g., adding a new header or any type of custom component was not possible) and required manually coding each page. The new industry pages would be completely managed within the monorepo--in other words, they (and any future content pages) would now be hosted by FlightAware, alongside all of our other apps.</p><p>With the industry pages managed and served from the monorepo, they have access to all of the monorepo’s shared libraries. Every web service can use the global component library and receive the latest design and functionality at the same time.</p><h3 id="content-rendering">Content Rendering</h3><p>The content page modernization project is special in how the content is handled. Once a content management system has been built and set up, it will be a seamless process for any staff member to come in and add, edit, or delete a web page or any of its components. 
For now, the pages are managed within a JSON-formatted TypeScript file that defines the content of any given page.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Picture1.png" class="kg-image" alt="" loading="lazy" width="936" height="1030" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Picture1.png 600w, https://flightaware.engineering/content/images/2024/08/Picture1.png 936w" sizes="(min-width: 720px) 720px"></figure><p>In this example snippet of our industry airlines page, we have an array of sections that are rendered in order, each with its own identifier: Hero, Intro, BenefitCards. These identifiers indicate which component will be used for that set of data. Take our Intro section, for example: when it’s time for it to be rendered, the content generation app reads the id, matches it to the corresponding Intro component, passes the data into that component, and renders it with the given text and styling.</p><p>We are able to access the industry airlines page because we named the section ‘airlines’ in the data structure, so we can visit it at ‘/industry/airlines’. The app inspects the URL parameters and looks up the content whose section name matches.</p><p>The way the project is structured allows staff members to come into the file and easily add, edit, or delete components from any page without the hassle or worry of breaking anything else in the project. Everything is built to render any change made to the data structure. In the future, a proper content management system will be added that will generate and update this data structure through a more user-friendly interface.</p><h3 id="components">Components</h3><p>Let’s talk about the components. 
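<p>The section-to-component dispatch described above can be sketched generically. The snippet below is illustrative Python with made-up component names (the real implementation is TypeScript with React components inside the Next.js app):</p>

```python
# Illustrative sketch of the id -> component dispatch. Component names and
# data shapes are hypothetical; the real project uses TypeScript/React.

def render_hero(data):
    return f"<h1>{data['title']}</h1>"

def render_intro(data):
    return f"<p>{data['text']}</p>"

# Registry mapping a section's id to the component that renders it.
COMPONENTS = {"Hero": render_hero, "Intro": render_intro}

def render_page(sections):
    # Render each section in order, dispatching on its id.
    return "".join(COMPONENTS[s["id"]](s["data"]) for s in sections)

page = [
    {"id": "Hero", "data": {"title": "Airlines"}},
    {"id": "Intro", "data": {"text": "Welcome"}},
]
print(render_page(page))  # <h1>Airlines</h1><p>Welcome</p>
```

<p>Adding a new section to a page is then just a matter of appending another entry to the array, which is what makes the data-driven approach safe to edit.</p>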
The industry pages currently use a total of 10 different components, each with its own data types. Let’s dive into one of my favorite components, the benefit cards.</p><p>Each benefit card has its own heading, body, links, and image. However, we can modify content position, background style, and text format, all through the data structure. Here are some examples below:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.04.03-PM.png" class="kg-image" alt="" loading="lazy" width="405" height="185"><figcaption><span style="white-space: pre-wrap;">Image position on right with light theme</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.04.14-PM.png" class="kg-image" alt="" loading="lazy" width="405" height="185"><figcaption><span style="white-space: pre-wrap;">Image position on right with dark theme</span></figcaption></figure><p>The FAQ section was another component with its own interesting challenges. This component was built purely with CSS and no JavaScript. The animation that reveals the answer to a question while closing any other open questions was done entirely through CSS. 
It was very tempting to reach for a JavaScript library, but I knew HTML/CSS was capable of handling this interactivity. Ultimately, creating it from scratch without JavaScript was not just a rewarding challenge to tackle, but also a way of enhancing the overall web experience:</p><ul><li>The component can be server-side rendered and immediately interacted with by a user, without having to wait for a JS file to be downloaded and run.</li><li>Using semantic HTML to develop the accordion follows Web Accessibility best practices.</li></ul><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.04.37-PM.png" class="kg-image" alt="" loading="lazy" width="747" height="270" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-3.04.37-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.04.37-PM.png 747w" sizes="(min-width: 720px) 720px"></figure><p>The body text for the answer section also allows for semantic markup, meaning we can easily bold, italicize, and underline text, as well as add links, as the content requires.</p><h3 id="the-result">The Result</h3><p>Here’s what the old airlines page currently looks like (left) and what the modern page will look like (right, work in progress):</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.04.56-PM.png" class="kg-image" alt="" loading="lazy" width="829" height="1142" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-3.04.56-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.04.56-PM.png 829w" sizes="(min-width: 720px) 720px"></figure><h3 id="challenges">Challenges</h3><p>One of the more challenging parts of this project and internship was overcoming a lot of my own personal social anxieties and imposter syndrome (definitely not an Among Us reference). After meeting and getting comfortable with my peers, it was incredible how easy it was to work alongside them. Feeling that sense of belonging within a work culture makes a huge difference in how productive and impactful one feels in their job.</p><p>On the technical side, my challenge was understanding and applying SSR (server-side rendering) and CSR (client-side rendering) knowledge to the project. I had very little understanding of what that meant when first starting off, but now I have a stronger grasp that I can hopefully one day apply at my next position. SSR and CSR have their pros and cons when it comes to rendering components. SSR allows components to be rendered on the server, letting the server do all the heavy lifting of compiling and creating static HTML, which improves SEO and allows pages to be cached and loaded almost immediately on subsequent visits. 
The disadvantage is that you cannot create components that are reactive; they must remain static once they are served to the client. CSR dynamically renders components on the client side through the browser’s DOM, allowing for dynamic changes and reactivity. The disadvantage is that pages must use JavaScript in order to render these components and enable their functionality (hydration). Pages cannot be cached for faster serving, and SEO could be impacted depending on how the page is built.</p><h3 id="conclusion">Conclusion</h3><p>Overall, I thought this project was very interesting because it was my first time dealing with TypeScript and learning about web development best practices. I learned a lot about Tailwind CSS, dynamic importing, and redirecting users within the Next.js project configuration. I learned about the deployment process within FlightAware, including all the Kubernetes and Cloudflare configurations. I learned a lot of industry practices, including agile methodologies, team coordination, and peer review. I am grateful for this opportunity and cannot wait to apply what I have learned here in the future, and I’m looking forward to learning more and building bigger projects.</p><hr><h2 id="jin-woo-oh"><strong>Jin Woo Oh</strong></h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.34.57-PM-1.png" class="kg-image" alt="" loading="lazy" width="301" height="396"></figure><p>Hello! My name is Jin Woo Oh, and I’ve worked as a Software Engineering Intern on the Predictive Technologies Crew at FlightAware this summer. I studied mechanical engineering at the University of Illinois at Urbana-Champaign (B.S.) and Seoul National University (M.S.), then shifted my career to machine learning by pursuing an online master’s degree in data analytics at the Georgia Institute of Technology. 
Using machine learning models to make data-driven decisions and solve real-world problems is something that has deeply intrigued me.</p><p>Working at FlightAware provided me with an opportunity to challenge myself by tackling unfamiliar problems and learning new tools. I had the fortune of working with talented and humble engineers who helped set a standard for me to strive towards in my nascent career in software engineering. Ultimately, this internship helped me understand that passion and a willingness to learn can take me far, even coming from a different academic background.</p><p>One of FlightAware’s well-known flagship products is Foresight Labs. It is a publicly accessible web service that provides industry-leading machine learning predictions for arrival runway probabilities over time, as well as predicted gate arrival and taxi times for a particular flight at each possible arrival gate. The system works by taking feature vectors for various flights from the Redis server, which receives metadata, feature vectors, and prediction quantiles from two sources: Firehose and eta-streamer-rs. Firehose is a public service that provides real-time flight metadata, such as aircraft positions and flight plans, in the form of a raw feed. eta-streamer-rs is how the Predict team makes ETA (EON and EIN) predictions available to downstream consumers – it draws data from controlstream (the main data feed within FlightAware). The feature vectors from the Redis server are parsed and sent to the Triton inference server over gRPC to make predictions. 
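<p>As a rough illustration of that parsing step, a feature vector has to be assembled in a fixed order before it can be sent for inference. The field names and ordering below are hypothetical, not the actual Foresight Labs schema:</p>

```python
# Hypothetical sketch: turn a Redis-style hash of per-flight fields into the
# ordered numeric vector an inference server expects. Field names are made up.

FEATURE_ORDER = ["ground_speed", "altitude", "distance_to_dest"]  # assumed order

def to_feature_vector(flight_hash):
    # Order matters: the model was trained against a fixed feature layout,
    # so a missing field should fail loudly rather than silently shift values.
    return [float(flight_hash[name]) for name in FEATURE_ORDER]

raw = {"ground_speed": "412", "altitude": "35000", "distance_to_dest": "187.5"}
print(to_feature_vector(raw))  # [412.0, 35000.0, 187.5]
```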
All of this is wrapped in Streamlit, an open-source Python framework for building interactive web apps.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.31.31-PM.png" class="kg-image" alt="" loading="lazy" width="829" height="565" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-3.31.31-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.31.31-PM.png 829w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure 1. System architecture of Foresight Labs</span></figcaption></figure><p>During my time at FlightAware, I developed an API for the “What If My Gate Changes” feature on Foresight Labs, which I named “<strong>FLAPI</strong>”. It connects directly to the Redis and Triton servers to fetch metadata, feature vectors, and prediction quantiles, and returns predicted taxi-in times for a hypothetical gate change for a specific flight.</p><p>There were a few reasons behind the need for this project. First and foremost was the long-term goal of expanding the functionality of FlightAware’s AeroAPI, a simple, query-based API product that gives developers access to a variety of FlightAware's flight data. The purpose of my project was to create a prototype API that would serve as a stepping stone toward that long-term goal, all while developing a tool that can easily access the predictive capabilities of Foresight Labs for internal use within the company. There were also clients who were interested in seeing an API for Foresight Labs.</p><p>On a personal level, what really motivated me to pursue this project was the learning opportunity it provided. 
It aligned very closely with my goal for the internship: I got a chance to try things that were new to me, such as building an API; to get a taste of all the steps in product development, from creating my own application and improving it to version control and deployment; to use vital orchestration tools like Docker and Kubernetes in an industry setting; and much more. The icing on the cake was that I got to work with a live product and contribute to a development process that could lead to another product, which is not something an intern generally gets a chance to do. In retrospect, it was also a great problem to tackle because it required me to seek help from many people, both within my team and outside it, who all had their own areas of expertise and imparted knowledge on the various steps I was stuck on along the way.</p><p>The very first step of my project involved setting up my dev environment and cloning the repository to begin working locally. The initial short-term goal was to get that cloned local copy of Foresight Labs running successfully. This killed two birds with one stone, since the process helped me do a deep dive on all parts of the code base and figure out which of its functions were viable candidates for my API.</p><p>One of my most memorable lessons in this internship came early. Because it was my first exposure to working with an industry-level code base, I found it to be a daunting task. But my mentor Nathan helped get me started by going through the early parts of the code and explaining the algorithm step by step. This was one of the key learning moments of my internship and my career in software, because it helped me realize that it’s okay to be intimidated by long blocks of code. 
When I worked on my personal projects, I was used to running code line by line in a Jupyter notebook to analyze other people’s programs. Nathan assured me that it’s okay to do that and encouraged me to use a method that works for me. Going through a program line by line allowed me to think for myself about the direction the program was headed and the intention behind the algorithm, which helped me look at the code in blocks instead of lines, and eventually at hundreds of lines of code in sections that I could comfortably visualize. When I finally got Foresight Labs running locally on my machine, I was thrilled. Although it was an application that had already been developed by someone else, being able to replicate the functionality gave me my first sense of real accomplishment and hope that I could tackle the much harder challenges to come.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.31.40-PM.png" class="kg-image" alt="" loading="lazy" width="829" height="463" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-3.31.40-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.31.40-PM.png 829w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure 2. “Find best gate!” search result for What If My Gate Changes</span></figcaption></figure><p>Also, because this was my first real position as a developer, there were not just conventions and best practices that I had to pick up, but also mistakes to learn from – a large part of which was backing up work and documenting things I learned. 
For example, the dev environment I had set up for the Jupyter notebook running in my Kubeflow cluster reset for no apparent reason one day, and I had to spend a great amount of time retracing the questions I had asked of multiple people. My advice for future interns would be to always take notes and document the steps of a process, even if you don’t think you’ll need them again.</p><p>Once I understood the algorithm behind Foresight Labs, I learned how to make a simple application using FastAPI, an open-source web framework for building APIs with Python. In the process of creating a working prototype, I learned that I could use a GET endpoint for fetching simple data and a POST endpoint for returning complex data outputs that required complex JSON payload inputs. Using these two primary endpoint types, I added a variety of useful functions to the API. For example, get_metadata() returned the flight number, origin and destination airports, destination gate, predicted landing and gate arrival times, etc., in dictionary format when passed a flight ID as input. I also created more complex functions like rank_gates(), which returned gate-terminal tuples for the given destination airport of a specific flight with their respective scores, sorted from lowest score to highest. The lower the score for a particular gate-terminal pair, the shorter the taxi-in time between the plane landing and arriving at the gate. 
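<p>A toy version of that ranking step might look like the following. The scores here are fabricated for illustration; in FLAPI they come from the model’s taxi-in predictions:</p>

```python
# Hypothetical sketch of the rank_gates() idea: order gate-terminal pairs by
# predicted taxi-in score, lowest (shortest taxi-in) first. Scores are made up.

def rank_gates(scores):
    # scores maps (gate, terminal) -> predicted taxi-in score
    return sorted(scores.items(), key=lambda kv: kv[1])

scores = {("C14", "C"): 7.2, ("A3", "A"): 4.1, ("B22", "B"): 5.9}
for (gate, terminal), score in rank_gates(scores):
    print(gate, terminal, score)
# A3 A 4.1
# B22 B 5.9
# C14 C 7.2
```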
The purpose of the function was to give consumers options to choose a gate-terminal pair among the top choices, so that when there are multiple candidates with similar taxi-in times, they can apply domain knowledge that the model cannot account for (such as traffic, construction, or on-site incidents) to make the best decision. Lastly, I added a super function that conveniently returns all of the metadata, predictions, predicted landing/arrival/taxi times, and ranked gates in dictionary format. These endpoints were tested and documented in Postman, a platform for building, testing, and managing APIs.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.32.01-PM.png" class="kg-image" alt="" loading="lazy" width="956" height="566" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-3.32.01-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.32.01-PM.png 956w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure 3. Testing and verifying the successful API response to the get_metadata() request for an en route flight ID ‘UAL2211-1723369087-fa-1325p’</span></figcaption></figure><p>After completing the first functioning iteration of my application, I sought to improve its performance by tackling some of the difficulties in handling data. A significant challenge I faced while working with the dev server for Redis was that the flight IDs associated with eta-streamer’s feature vectors and predictions were different from the flight IDs associated with Firehose’s metadata. So for one particular flight, the Redis keys tagged with eta-streamer’s test flight ID and the Redis keys tagged with Firehose’s production flight ID were different. 
Fortunately, these flight IDs had the same flight numbers, and their respective timestamps were off by no more than 10 seconds. After conducting some tests with these observations, I was able to develop an internal function in the API to automatically match the test and production flight IDs for a given en route flight. This made it easier to get Triton predictions and the metadata for the same flight and helped simplify my code.</p><p>The application’s performance was also enhanced by removing unnecessary components and packages. Because Foresight Labs was built on Streamlit, a lot of the functions in What If My Gate Changes and its modules, including connection requests to the Redis and Triton servers, relied on Streamlit to function. However, my application did not require an interactive web framework. Removing the Streamlit components from the API code base, along with the Streamlit-dependent module imports, made the application lighter and more efficient.</p><p>In the final stage of application improvement, I requested a code review session from the Predict Crew and received helpful feedback to clean up my code before deployment. The session benefited not just my application but also my own grasp of coding best practices. I implemented most of the suggestions in the final version.</p><p>The next phase in my project was application deployment. Although I had some self-taught experience with creating and using a Docker container as a testing environment for my personal projects, I had no knowledge of selecting a proper image for my FastAPI application, or of versioning images by pushing and pulling to a private Docker registry. 
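<p>The flight-ID matching function described above can be sketched roughly as follows. The “FLIGHTNUM-TIMESTAMP-…” format and the 10-second tolerance are taken loosely from the description; the real parsing logic differs:</p>

```python
# Hypothetical sketch: reconcile a test (eta-streamer) flight ID with a
# production (Firehose) flight ID by flight number plus a timestamp that
# agrees to within 10 seconds. The "FLIGHTNUM-TIMESTAMP-..." layout is assumed.

def parse(flight_id):
    parts = flight_id.split("-")
    return parts[0], int(parts[1])  # (flight number, epoch timestamp)

def match_ids(test_id, production_ids, tolerance=10):
    num, ts = parse(test_id)
    for pid in production_ids:
        pnum, pts = parse(pid)
        if pnum == num and abs(pts - ts) <= tolerance:
            return pid
    return None  # no production ID close enough in time

prod = ["UAL2211-1723369087-fa-1325p", "DAL88-1723369001-fa-0042p"]
print(match_ids("UAL2211-1723369094-test", prod))  # UAL2211-1723369087-fa-1325p
```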
I started by going through tutorials and documents and created practice images and containers before I sought advice from my team on how to build an image for my specific application needs.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.33.44-PM.png" class="kg-image" alt="" loading="lazy" width="650" height="71" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-3.33.44-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.33.44-PM.png 650w"><figcaption><span style="white-space: pre-wrap;">Figure 4. CrashLoopBackOff error message for locally deployed application</span></figcaption></figure><p>After verifying that my containerized application ran without issues, I composed a deployment.yaml file and attempted to deploy it within my local Kubernetes cluster. Another major hurdle in my project came when I noticed that my Kubernetes pod was hitting a CrashLoopBackOff error. It was caused by a compatibility issue between the Docker image and the environment it was being deployed on. The MacBook Pro M1 I was using throughout the internship built Docker images for arm64 (the 64-bit architecture used in ARM processors) by default, and such images cannot run natively on x86 hardware. What I needed was a Docker image built for amd64 (also known as x86-64, the 64-bit extension of the x86 architecture, which is backward-compatible with 32-bit x86), since that is what most cloud environments and Kubernetes clusters run on. In facing this problem, I learned that a multi-architecture build using the “docker buildx” command could create a Docker image compatible with both systems. 
Eventually, however, I adopted my teammate’s simpler solution of building the Docker image on one of FlightAware’s internal Linux-based dev hosts to circumvent the issue, as images there are built in amd64 format by default.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.33.56-PM.png" class="kg-image" alt="" loading="lazy" width="650" height="172" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-21-at-3.33.56-PM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-21-at-3.33.56-PM.png 650w"><figcaption><span style="white-space: pre-wrap;">Figure 5. Successful on-prem deployment of FLAPI upon applying deployment.yaml and service-and-ingress.yaml files</span></figcaption></figure><p>Once I verified that the application was deployed successfully without errors in my local cluster, the final step was to launch it via an on-prem deployment. I accessed FlightAware’s Rancher platform to acquire the YAML used to update my kubeconfig file, which gave me access to the Houston dev cluster where I could create my namespace. Since the purpose of deploying on the Houston dev cluster was to give members of FlightAware internal access to the API, I needed service and ingress files to be applied to my Kubernetes pod:</p><ul><li>service.yaml - exposes the service on a virtual IP address (VIP) that is only reachable within the cluster. Other services and pods within the cluster can access it, but it is not accessible from outside the cluster.</li><li>ingress.yaml - manages external access to services in a Kubernetes cluster, typically HTTP. It allows exposing multiple services under a single IP address and managing them using URL paths or hostnames.</li></ul><p>Combining the service and ingress YAML files into one, I used “kubectl apply” to expose my application on port 8125; it is now accessible to all internal members of FlightAware at flapi.k.flightaware.com.</p><p>There are several things that can be done in the next steps of this project. One would be to replace the placeholder host that I used for the Triton inference server. In the development stage, I used a temporary host instead of the production server to establish Triton connections; this will need to be changed in order for FLAPI to provide accurate predictions. Another task on the agenda would be to replace the Streamlit-based Redis and Triton connections in the original What If My Gate Changes code base for Foresight Labs with GET and POST requests to my API endpoints, which will simplify the program.</p><p>I have faced and overcome many challenges throughout the internship, but there were some things I wish I had done better. Keeping up with creating and updating Jira tickets in a timely manner, and backing up the changes to my application and Docker image, no matter how small, to both my GitHub repositories and the Docker registry, proved to be difficult. I believe this is something that a lot of software engineers still find difficult to make a habit of, but given more time to actively develop these practices, I believe I can improve in these areas.</p><p>Time management, especially with regard to balancing work and relaxation and consistently setting specific goals, was another big challenge. Although I have discovered in recent years that my tendency to treat programming problems as puzzles and not give up on them easily is compatible with this profession, I also found it to be hard on the body and mind. 
Because it is difficult to gain enough momentum to get in “the zone” on a regular basis, I found myself making the most of that high-focus state and deeply investing myself in work for days or a couple of weeks at a time. However, when those sprints came to an end, whether due to a huge mental block or a sense of accomplishment, it became very difficult to focus and bring my productivity back up, even to an average level. Fortunately, I learned that setting very specific short-term and long-term goals, oftentimes by asking my manager and my mentor what I should do next, helped motivate me to keep moving forward, even if at a slower pace.</p><p>Last, but not least, asking good questions throughout the various stages of the project proved to be very difficult during my internship. This was something I had been especially concerned about since the start of the internship. Because I was coming from a different field and academic background than my fellow interns, the sense of imposter syndrome, coupled with my tendency to try every single thing I can think of before calling it quits, made asking questions even more difficult. A rule of thumb my mentor Nathan suggested, which proved immensely helpful for me, was the “half-day rule”: limit yourself to working on a problem for just half of the working day before asking teammates for help. It was a very effective method that incorporated a healthy mix of grit through self-teaching and humility through advice-seeking, and it is a technique that I will be sure to take beyond just my career in software engineering.</p><p>There were many other interesting projects that I wish I had more time to explore, such as the XGBoost-based arrival-runways prediction model. 
Although the current model is very effective, it would have been interesting to explore how different feature engineering techniques, such as addressing data drift, applying dimensionality reduction methods like sliced inverse regression and principal Hessian directions, and accounting for seasonality across months, seasons, or years, would affect the accuracy and computational efficiency of the resulting model. Along with feature engineering, exploring algorithms other than LightGBM, CatBoost, and XGBoost for the arrival-runways model and measuring the overall performance would be something I would love to try if I could return to FlightAware.</p><p>I am deeply grateful for all the support that I received from my mentor Nathan, manager Adam, the Predict Crew, and all the members of FlightAware who helped me grow as an engineer and as a person throughout this internship.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/2024-intern-summer-projects/">2024 Intern Summer Projects</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Blast from the past: 2023 Intern Projects ]]></title>
        <description><![CDATA[ Our 2024 interns are hard at work on their final projects this month. We will post about them soon, but in the meantime, let&#39;s look back at the amazing 2023 interns in this blast from the past!! ]]></description>
        <link>https://flightaware.engineering/blast-from-the-past-2023-intern-projects/</link>
        <guid>https://flightaware.engineering/blast-from-the-past-2023-intern-projects/</guid>
        <pubDate>Mon, 05 Aug 2024 12:22:08 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.57.50-AM.png" medium="image"/>
        <content:encoded><![CDATA[ <blockquote>
<p>Each year FlightAware hosts a cohort of students for its summer internship program. The program gives each intern a chance to learn what it's like working full-time as a software engineer, how to deliver a project to completion, and an opportunity to expand their skillset. This year is no exception, and our interns are tackling some impressive projects. Some are laying the framework for new parts of the website, another is building an application for measuring ADS-B signal reception, there's an effort to migrate some core flight tracking functionality into a modern language, and finally there's a project related to "what if" scenarios for flights. It has been really exciting to see their progress over the summer, and I believe you will all enjoy the post on the topic next month. In the meantime, let's take a look back at the amazing 2023 interns!</p>
</blockquote>
<p>~ J. Cone</p>
<hr><p>This summer, we had 5 students from across the country join FlightAware as interns. They collaborated with other FlightAware engineers to build out their projects (and in one case, two projects!) and accomplished impressive work that they demoed to the entire company. In addition, they got to know each other better through get-togethers and participated in talks about tech and career development presented by FlightAware engineers. We invite you to see the results of their hard work, as we highlight the interns and their projects over the next few weeks.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/me-1.jpeg" class="kg-image" alt="" loading="lazy" width="160" height="160"></figure><h2 id="adithya-chandrashekar">Adithya Chandrashekar</h2><p>Hello everyone! I am a rising Junior at The Ohio State University pursuing a bachelor’s degree in Computer Science and Engineering with a minor in Business, and I am currently a Software Engineering Intern on the Flight Tracking Team. In the summer of 2022, I also graduated from Codesmith, a software engineering immersive that teaches full-stack JavaScript and computer science to prepare individuals looking to switch careers for mid- to senior-level software engineering roles. I enrolled to supplement my knowledge of CS and jumpstart my career as a Software Engineer.</p><p>FlightAware has been the application I have used for years to track flights, and there is a personal reason why I decided to pursue an internship here. Since childhood, both of my parents have had to travel extensively for work. Before discovering FlightAware, I used to have a lot of fear for my parents’ safety during their flights. However, FlightAware eased my fear. 
FlightAware gave me the ability to track flights in real time, which allowed me to check that the flight had no issues and that my parents were okay. FlightAware provided me with comfort and security, and it was another factor that influenced my decision to pursue this internship.</p><p>I have had an amazing experience throughout this internship. From the time I joined FlightAware until now, I have received tremendous support from my mentor, my manager, the Flight Tracking team, and FlightAware as a whole. FlightAware reminds me of a large family where everyone respects each other and wants the best for the entire family. When one person needs assistance, their entire team is ready to provide support. I have never heard of any other company that respects, trusts, and wants the best for its employees the way FlightAware does. I have had the opportunity to collaborate with several engineers on my team and on cross-functional teams, which has strengthened my communication skills.</p><p>The internship program has been well-crafted to ensure a balance of learning and fun. Every Wednesday, there is some sort of intern collaboration activity, whether it be learning, building, or having fun. We have had several learning sessions with engineers throughout FlightAware, where they shared institutional knowledge and advised us on how to make the best of this internship. We have also played numerous games such as Among Us and skribbl.io, which have been extremely fun. Which other company pays their interns to play games?? We have also had the opportunity to meet other engineers through a website called gather.town, where you have a character you can move around; going near another person lets you communicate virtually through both audio and video.</p><p>FlightAware distinguishes itself from other companies in yet another noteworthy aspect. 
The projects interns are assigned at FlightAware are meaningful and truly contribute to FlightAware and its growth. Interns at other companies often complain about being assigned projects that are boring or insignificant. However, this is not the case with FlightAware. FlightAware gives interns a lot of trust by providing meaningful projects and giving us the resources we need to complete them successfully. I truly admire this, and it was another key factor in my decision to pursue an internship with FlightAware this summer.</p><p>One piece of advice to future interns is to ask questions whenever necessary and not to feel embarrassed or shy. I have asked my mentor many questions, which he answered thoroughly and quickly. Furthermore, he gave me a challenge to trust myself more. This challenge significantly increased my self-confidence, and I am extremely thankful for it. No one at FlightAware gets annoyed if someone asks questions, so please don’t hesitate to reach out if you need any help. Your mentor and manager are both there to support you.</p><h3 id="my-project">My Project</h3><p>This summer, I was fortunate to complete two projects.</p><h3 id="project-1">Project #1</h3><p>My first project was rewriting Surface Monitor, an existing internal tool used to monitor the performance and health of Surface Fuser and Surface Combiner, in Python3.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.31.23-AM.png" class="kg-image" alt="" loading="lazy" width="507" height="673"></figure><p>In the image above, you can see an overview of the current architecture for Surface Movement, which contains the programs used to track the surface movement of a flight. 
Surface Combiner combines and deduplicates the ASDE-X and ADS-B daystream feeds output by hcombiner to produce a single tab-separated-values input feed for surface_fuser. Surface Fuser follows the combiner output and interprets it, identifying the positions reported for a target and correctly scheduling future events. Surface Monitor can ingest any number of feeds; currently it ingests two, Surface Fuser and Surface Combiner. Surface Monitor does simple filtering of the data and emits output every second.</p><p>The current monitor is written in Tcl, a legacy language created 35 years ago, and is inefficient and lacks many features provided by modern languages such as Python3. My main task was to migrate Surface Monitor from Tcl to Python3, which took me about a month to complete.</p><p>The first task was to accept Command Line Interface (CLI) arguments. Surface Monitor uses combfeeder to gather the data for each feed, and several arguments have to be provided for it to execute. Furthermore, Surface Monitor accepts several arguments as configuration for the program.</p><p>After creating an argument parser, the next main task was to develop a function that would follow the feeds provided by the user and ingest their data. This included invoking combfeeder with the appropriate arguments and feed to watch, then ingesting the data to validate and analyze. However, it isn’t as simple as it sounds. The first challenge was to work out how to alternate between feeds. For example, we wanted to read one line from Surface Fuser, then one line from Surface Combiner, and so on. This isn’t straightforward since programs are by default synchronous (they execute one step at a time), and the only way to alternate between feeds was to use asynchronous programming. Asynchronous programming, in simple terms, means that multiple related operations can run concurrently without waiting for other tasks to complete. 
This paradigm allows us to switch between feeds without having to wait until the feed currently being read has ended.</p><p>The next step was to prevent parts of the program from blocking other parts. Essentially, we don't want tasks such as reading the data or reporting the analysis to block the program. For example, while we analyze the data or report the analysis, we still want to continue reading data from other feeds. In other words, we want different parts of the program to run concurrently, i.e., to multitask, and as I previously mentioned, asynchronous programming allows us to do this. So I used asyncio, the Python library for writing concurrent code using the async/await syntax. Using asyncio, I created several tasks for reporting the analysis and for flushing state (currently saved values) so it doesn’t interfere with future analysis.</p><p>The rest of the project was more straightforward. I had to create different monitors for the metrics we wanted to report for a given interval, such as Throughput (the number of messages read), Catchup Rate (how fast the monitor reads data compared to real time), and Latency (how far behind real time the monitor is). After creating these monitors, I also had to create a validator that would validate each line from a given feed against the same criteria as the existing Surface Monitor. After this, I had to set up a Slack integration where the program reports important messages to a specific Slack channel. Once this was completed, I had to set up alarms using SCADA (a program used for real-time monitoring) and Zabbix (used to monitor metrics). 
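To make the idea concrete, here is a minimal asyncio sketch of following two feeds concurrently. The feed names and in-memory line lists are stand-ins for illustration; the real monitor reads from combfeeder subprocesses.

```python
import asyncio

async def follow_feed(name, lines, results):
    """Read one line at a time, yielding control after each read so the
    other feed's reader can run (a stand-in for real feed following)."""
    for line in lines:
        await asyncio.sleep(0)  # yield to the event loop
        results.append((name, line))

async def main():
    results = []
    # Neither reader blocks the other; lines from the two feeds interleave.
    await asyncio.gather(
        follow_feed("surface_fuser", ["f1", "f2"], results),
        follow_feed("surface_combiner", ["c1", "c2"], results),
    )
    return results

lines_read = asyncio.run(main())
```

In production each `await` point would be a read of subprocess output rather than a sleep, but the structure is the same: every reader is a coroutine, and the event loop switches between them whenever one is waiting.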
Furthermore, I created unit tests for all of the monitors and the validator to ensure they worked correctly without any unexpected behavior.</p><p>After completing everything above, I had to create integration tests (testing all components of the program together) and performance tests (measuring the performance of surface_monitor_py and creating benchmarks). After creating the tests, the next step was setting up a Docker container (an isolated environment), which allows the code to run the same way regardless of the operating system. Once the Docker container was created and the program was running, the final step was to create GitHub Actions workflows (configurable automated processes that run one or more jobs) to build the Docker image (the instructions to build the container) and deploy to a host (run the Docker container on a specific host). Below is an example of the output of surface_monitor_py.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.31.47-AM.png" class="kg-image" alt="" loading="lazy" width="736" height="325" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.31.47-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.31.47-AM.png 736w" sizes="(min-width: 720px) 720px"></figure><h3 id="project-2">Project #2</h3><p>The second project I had the opportunity of working on was creating a Go-Around Detector. A go-around occurs when an aircraft is on its final approach and attempts to land, but the pilot determines that landing conditions are unsafe and decides to “go around” the airport and come back for another attempt. Go-Arounds often get conflated with Missed Approaches, which occur when an aircraft is on an IFR (instrument)/published approach, but the pilot decides that the approach cannot be completed and defaults to either a newly provided approach or an approach of their choice. 
Currently, the Go-Around detector classifies a Missed Approach as a Go-Around, since we are not provided with much of the data necessary to identify a Missed Approach. Below is a picture of a Go-Around.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.32.03-AM.png" class="kg-image" alt="" loading="lazy" width="759" height="228" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.32.03-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.32.03-AM.png 759w" sizes="(min-width: 720px) 720px"></figure><p>As I previously stated, my task was to create a Go-Around detector that would detect go-arounds for a flight and emit a go-around event, which can be used to provide a more accurate estimate of arrival times, since Go-Arounds always result in a delay. We have an existing program called the Aircraft Delay Detector (ADD), which analyzes thousands of position messages per second, each containing information such as an aircraft’s speed, altitude, heading, location, and timestamp. My task was to integrate the Go-Around detector seamlessly into ADD.</p><p>The first step in designing the logic for the Go-Around detector was filtering. I wanted to identify filters for attaching the Go-Around detector to flights, since we don’t want to create and attach a detector to every flight at every moment. First, we currently only want to look at non-ad-hoc flights. Non-ad-hoc flights are scheduled, while ad-hoc flights are non-scheduled. If a flight is not scheduled, we don’t have a destination for it, and the destination is required to detect a go-around, since go-arounds only occur when an aircraft is close to the destination. 
That leads us to the second filter: proximity to the destination airport. We only attach the Go-Around detector to flights within 15 miles of the destination airport. Even though 15 miles is still far from the airport, this lets us gather more data, which can be used to increase the accuracy of the Go-Around detector. The third filter we decided on was altitude. Go-Arounds typically occur under 2,500 feet, so I set the altitude threshold at 4,000 feet above the elevation of the destination airport. If a flight is more than 4,000 feet above the airport’s elevation, we exclude it and do not attach the Go-Around Detector. To summarize, non-ad-hoc flights, proximity to the airport, and altitude above the elevation of the destination airport were the three main filters I added to ensure we didn’t attach Go-Around Detectors to flights that were not attempting to land.</p><p>The next step was creating the actual logic for detecting a Go-Around. The image of a go-around above suggests the logic that can be used: the aircraft consistently descends towards the airport but then starts to consistently ascend as it gets close. This is always the case with a go-around: an aircraft switches from descending to ascending when it gets close to the airport. And this was the logic I used to accurately determine a go-around. Currently, ADD creates a Position object for each of the 10 most recent positions and stores key properties (latitude, longitude, timestamp, altitude, vertical rate, ground speed, and aircraft identifier). 
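In code, such a Position record and the rolling altitude average might look roughly like this. The field names are my own illustration, not ADD's actual definitions.

```python
from dataclasses import dataclass

@dataclass
class Position:
    # Illustrative fields only; ADD's real Position object may differ.
    latitude: float
    longitude: float
    timestamp: float      # epoch seconds
    altitude: float       # feet
    vertical_rate: float  # feet per minute
    ground_speed: float   # knots
    aircraft_id: str

def average_altitude(positions):
    """Average the altitude over the most recent positions to smooth
    out turbulence or a single bad report."""
    return sum(p.altitude for p in positions) / len(positions)
```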
Altitude is the key metric for the Go-Around detector at the moment, since it results in the simplest yet highly accurate detection logic.</p><p>First, I created a function that calculates the average altitude of the 10 most recent positions and stores it in a list holding the 15 most recent averages. A new average is calculated each time we ingest a position, since the window of 10 positions shifts to include the most recent one. I’m storing average altitudes because a plane’s reported altitude can vary due to turbulence, so the average is a more accurate representation.</p><p>Next, I have another function that analyzes the average altitudes and determines whether there is a possible go-around. I keep a switch count that tracks the number of times the aircraft switches between ascending and descending, computed by scanning the average altitudes left to right and comparing each one to the previous. As I stated, looking at the average altitude eliminates inconsistencies caused by turbulence or bad data, so any switch from descending -&gt; ascending, or vice versa, is an accurate indication of whether the aircraft is descending or ascending. I also count the number of ascending (or equal) averages and the number of descending (or equal) averages, since we want the aircraft to consistently ascend or descend. A single average could be incorrect if we received several bad positions, so we want to see a consistent pattern of either ascending or descending. I’m also allowing for equal altitudes because the averages can repeat when an aircraft is descending slowly. 
Hence, we accept up to 2 consecutive equal averages; more than that means the aircraft is maintaining its altitude, so any additional equal averages are excluded. For example, if we see 5 equal averages, we only count 2 of the 5, since 2 equal averages can still be part of a descent or ascent, but 3 or more cannot. So if the aircraft switched exactly once, the switch was from descending to ascending, and it descended consistently before the switch and ascended consistently after, then it’s a possible go-around.</p><p>After detecting a possible go-around, we check whether the aircraft is close to the airport. Aircraft get extremely close to the airport during a go-around, if not directly over it, so their distance from the airport should be very small. The threshold I used was 1 mile. So if we have detected a possible go-around AND (the aircraft is within 1 mile of the destination airport OR the aircraft gets within 5 miles of the airport and then starts moving away from it), it’s a go-around. The 5-mile case might seem to contradict what I said about aircraft getting extremely close to the airport; however, sometimes a pilot aborts the landing while still within 5 miles, and this condition accounts for such cases. The 5-mile case aligns better with Missed Approaches, but as I stated, we currently classify both as Go-Arounds.</p><p>After completing the Go-Around detector logic, I created 10 tests for 10 flights and integrated the tests with the current testing logic within ADD. 
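The switch-counting and proximity rules described above can be sketched as follows. This is a simplified illustration of the idea, not ADD's actual implementation, and the function names are mine.

```python
def compress_plateaus(avgs, max_same=2):
    """Keep at most `max_same` consecutive equal averages; longer runs
    mean the aircraft is holding altitude, so the extras are dropped."""
    out, run = [], 0
    for i, a in enumerate(avgs):
        run = run + 1 if i > 0 and a == avgs[i - 1] else 0
        if run < max_same:
            out.append(a)
    return out

def is_possible_go_around(avg_alts):
    """True when there is exactly one descend -> ascend switch:
    a consistent descent followed by a consistent ascent."""
    alts = compress_plateaus(avg_alts)
    dirs = []
    for prev, cur in zip(alts, alts[1:]):
        if cur > prev:
            dirs.append(1)          # ascending
        elif cur < prev:
            dirs.append(-1)         # descending
        elif dirs:
            dirs.append(dirs[-1])   # equal: continue the current trend
    if not dirs:
        return False
    switches = sum(1 for a, b in zip(dirs, dirs[1:]) if a != b)
    return switches == 1 and dirs[0] == -1 and dirs[-1] == 1

def is_go_around(avg_alts, miles_from_airport, receding=False):
    """Final check: a possible go-around plus the proximity rule
    (within 1 mile, or within 5 miles and moving away)."""
    if not is_possible_go_around(avg_alts):
        return False
    return miles_from_airport <= 1.0 or (miles_from_airport <= 5.0 and receding)
```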
With minor changes, the Go-Around detector tests run alongside the other tests within ADD.</p><p>This logic yielded <strong>97.5</strong>% accuracy after being run on <strong>100</strong>+ live flights and <strong>10</strong> test flights that included special edge cases. This accuracy is in terms of false positives, since it’s currently not possible to identify false negatives. Additionally, the detector detects a Go-Around within 1 minute of its occurrence.</p><p>Attached below are 2 images of Flight TAM3343, each representing one go-around of the flight; below those images are the emitted log messages from each go-around.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.32.23-AM.png" class="kg-image" alt="" loading="lazy" width="737" height="228" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.32.23-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.32.23-AM.png 737w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.32.35-AM.png" class="kg-image" alt="" loading="lazy" width="737" height="128" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.32.35-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.32.35-AM.png 737w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.35.32-AM.png" class="kg-image" alt="" loading="lazy" width="681" height="788" 
srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.35.32-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.35.32-AM.png 681w"></figure><h2 id="william-burns">William Burns</h2><p>Hello! My name is William Burns, and I am a Site Reliability Engineering Intern on the Systems team at FlightAware. I am a Senior at Arizona State University studying Information Technology and plan on graduating with my bachelor’s degree in December 2023. Having had an interest in aviation and aerospace, I found FlightAware to be the perfect opportunity to apply my technical skills toward a field I have a lot of passion for.</p><p>During my time here, I have been given the opportunity to work alongside talented engineers, which has brought me experience that will surely benefit me for the rest of my career. Post-graduation, I would like to work as a Site Reliability Engineer and hope to return to FlightAware.</p><h3 id="my-project-1">My Project</h3><p>The primary objective of my project this summer was to centralize and automate FlightAware's management of firewall policies across its on-premises network infrastructure. A current focus for the Systems Team and FlightAware is the ability to automate our network configurations. This project lays the groundwork for this goal by providing an easily extensible solution for firewall management.</p><p>I devised a solution in Python3 that extracts IPv4 and IPv6 network prefixes or IP addresses from NetBox, our data center infrastructure management software. These are then formatted to be compatible with an Access Control List (ACL) generation tool, Capirca. This allows for the automatic generation of ACLs suitable for our Juniper SRX devices. Once produced, these ACLs are deployed directly onto the SRX devices. 
This automation alleviates the previously manual process of writing and applying ACLs, especially when introducing new network prefixes to the SRX devices, drastically reducing the manual overhead.</p><p>For anyone not familiar with Access Control Lists, they are critical security components for network devices such as routers and switches. ACLs contain sets of rules used to control and manage access to network resources based on specific criteria. These criteria can include factors like source and destination IP addresses, port numbers, and the protocol in use. Typically implemented on routers and switches, ACLs are an integral part of network security, providing a mechanism to explicitly allow or deny traffic based on predefined conditions. Whether it's to prevent unauthorized access, segment internal networks, or filter incoming and outgoing traffic, ACLs offer granular control at various layers of the Open Systems Interconnection (OSI) model, most notably the Network (Layer 3) and Transport (Layer 4) layers.</p><p>At FlightAware, the nature of the services we offer and the vast amount of data we handle make network security a crucial part of our operations. ACLs allow us to finely control data traffic, ensuring that only authorized requests access the right data. This granularity is essential given the volume and diversity of data we handle. It ensures not just security but also optimal data traffic flow, helping maintain the responsiveness and accuracy of FlightAware's services. 
In essence, for the seamless operation of our technical infrastructure, ACLs serve as critical tools in FlightAware's network management arsenal.</p><p>The project is organized into several distinct components, each with its specific role: NetBox, Capirca, Docker, and the deployment of the ACLs to a chosen Juniper SRX device.</p><h3 id="netbox">NetBox</h3><p>The NetBox component is tasked with pulling network prefixes from FlightAware's NetBox instance using the pynetbox library, searching for matches based on the "Tenant" or "Description" values. Once identified, these prefixes are recorded in a file named NETWORK.net, setting them up to be ingested by Capirca.</p><h3 id="capirca">Capirca</h3><p>Capirca's role is to generate the ACLs. It utilizes the network prefixes from NetBox, along with a services object called SERVICES.svc that enumerates lists of ports and protocols, and policy objects (.pol) that tell Capirca how to generate the final security policy or ACL configuration.</p><h3 id="juniper-pyez">Juniper PyEZ</h3><p>Following the preparation and generation of the ACLs, the final but crucial phase is deploying them to the Juniper SRX device. For this, we utilize Juniper's PyEZ library, which facilitates automation of network devices through the NETCONF protocol. Once connected to the targeted device, the prepared ACLs are pushed to the device.</p><h3 id="docker">Docker</h3><p>The Docker component provides the necessary environment for the entire process to run seamlessly. Using Docker Compose, we encapsulate all the dependencies and configurations into a consistent environment. Docker acts as the cohesive bridge, integrating the NetBox, Capirca, and Juniper SRX device operations and guaranteeing that the system dependencies remain consistent irrespective of where the script is run. 
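As a rough illustration of the hand-off between the NetBox and Capirca steps, here is a sketch of rendering pulled prefixes into a NETWORK.net-style definition file. The token names and prefixes are hypothetical, the exact file layout is an approximation of Capirca's network definition format, and the pynetbox query and PyEZ push are omitted.

```python
def format_network_definitions(prefixes_by_name):
    """Render prefix groups into Capirca-style network definitions:
    'TOKEN = first_prefix' followed by indented continuation lines."""
    lines = []
    for name, prefixes in sorted(prefixes_by_name.items()):
        first, *rest = prefixes
        lines.append(f"{name} = {first}")
        lines.extend(f"    {p}" for p in rest)
    return "\n".join(lines) + "\n"

# Hypothetical prefixes, as they might come back from a pynetbox query
network_net = format_network_definitions({
    "INTERNAL_V4": ["10.0.0.0/8", "172.16.0.0/12"],
    "INTERNAL_V6": ["fd00::/8"],
})
```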
This not only simplifies deployment but also ensures reproducibility across various platforms and systems.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.37.27-AM.png" class="kg-image" alt="" loading="lazy" width="658" height="253" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.37.27-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.37.27-AM.png 658w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 1: NETWORK.net object generated using network prefixes from NetBox. Contains the IP addresses that will become part of the Address Book in the Access Control List.</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.38.15-AM.png" class="kg-image" alt="" loading="lazy" width="508" height="725"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 2: SERVICES.svc object which contains port and protocol naming service definitions.</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.38.57-AM.png" class="kg-image" alt="" loading="lazy" width="661" height="385" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.38.57-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.38.57-AM.png 661w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 3: Example security policy (.pol) written using Capirca specific format. 
Contains Header and Term sections which define targeted platform and details such as source address and destination address.</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.39.51-AM.png" class="kg-image" alt="" loading="lazy" width="674" height="1090" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.39.51-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.39.51-AM.png 674w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 4: Example generated Access Control List</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.40.46-AM.png" class="kg-image" alt="" loading="lazy" width="670" height="218" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.40.46-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.40.46-AM.png 670w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 5: Log output for NetBox component</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.41.02-AM.png" class="kg-image" alt="" loading="lazy" width="697" height="138" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.41.02-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.41.02-AM.png 697w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 6: Log output for Capirca ACL generation component</em></i></figcaption></figure><figure class="kg-card kg-image-card 
kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.41.17-AM.png" class="kg-image" alt="" loading="lazy" width="674" height="434" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.41.17-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.41.17-AM.png 674w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 7: Log output for Juniper PyEZ component. Example of pushing an ACL to the target device.</em></i></figcaption></figure><h3 id="conclusion">Conclusion</h3><p>Overall, this project, while fairly small in scope, provided lots of exposure to new technologies and its fair share of challenges. One significant aspect was being introduced to Agile methodologies and using JIRA to track progress and issues. Both proved invaluable in streamlining our processes and keeping us organized. While an integral part of this internship was ultimately to complete our project, I think the takeaway extends far beyond that. A critical lesson learned was recognizing when I was going down an unproductive path and needed to pivot quickly. Being able to assess and redirect efforts efficiently was a standout skill I developed during this project. Being part of the Systems team provided firsthand experience of a professional work environment. 
Beyond just completing tasks, I gained insights into teamwork, process management, and the real-world application of engineering principles.</p><p>Thank you FlightAware!</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.43.28-AM.png" class="kg-image" alt="" loading="lazy" width="674" height="711" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.43.28-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.43.28-AM.png 674w"></figure><h2 id="jared-harvey">Jared Harvey</h2><p>Hello! My name is Jared Harvey. This fall, I’ll be entering my final semester in Carnegie Mellon’s Master of Software Engineering program. After my graduation in December, I’m hoping to pursue a career in full-stack web development or as a backend engineer. Outside of work, I love to practice photography, spend time with friends, and play (a frankly unhealthy amount of) <em>Dungeons &amp; Dragons</em>.</p><p>This summer, I’ve had the opportunity to work with FlightAware’s Web team, where I’ve been responsible for leading an effort to create a new admin dashboard for FlightAware employees, which will be expanded upon and eventually replace the existing legacy version. This project is part of the Web team’s <em>WebNxt</em> initiative, which aims to completely replace the existing website architecture with modern languages, tools, and technologies packaged as independent microservices.</p><p>It's been an honor to work on this project alongside the rest of FlightAware’s engineers, who have all been incredibly welcoming and open to collaboration when I ask for help. 
From my manager and mentor to company leadership, I’ve had the privilege of working with engineers across the company to build my project from the ground up.</p><h3 id="my-project-2">My Project</h3><p><em>Fa_web</em> is the legacy codebase for FlightAware’s website; it’s a type of architecture known as a monolith which, as the name implies, contains virtually everything to do with the current website. To move away from this legacy monolith, the Web team introduced the <em>WebNxt</em> initiative. <em>WebNxt</em> is a methodology for designing microservices—independent applications which serve a small set of features—using modern languages and design principles. Some key benefits of this approach are that applications are easier to build, projects can get to production more quickly, and newly hired engineers (like me!) are more familiar with the tooling.</p><p>The existing admin dashboard lives in the <em>fa_web</em> monolith and consists of over 100 different links to the various tools that FlightAware employees use to manage the company’s data. Like other WebNxt projects, we want to separate this dashboard into its own microservice. Fortunately, I haven’t been tasked with porting over 100 different pages of tools; rather, my job this summer is to create the skeleton for a new admin dashboard which FlightAware can expand upon over time.</p><p>Since my project is a brand-new application, a large part of my work was deciding how to construct it and how to deploy it so that it’s accessible to FlightAware employees. Fortunately, the architecture is simple. I am responsible for the frontend and backend; the frontend is a Next.js application, a popular framework for JavaScript-based applications and the framework of choice for the WebNxt initiative at FlightAware. For the backend, we decided to use an open-source application called <a href="https://hasura.io/?ref=flightaware.engineering"><em>Hasura</em></a>. 
These two services would interact with other FlightAware infrastructure, such as our new authentication server and FlightAware’s database.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.44.43-AM.png" class="kg-image" alt="" loading="lazy" width="736" height="344" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.44.43-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.44.43-AM.png 736w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">The system architecture for my project.</em></i></figcaption></figure><p>Hasura was a new technology for me, as the Web team picked it specifically for this project. It’s a third-party application meant to replace a typical backend service. It works by “tracking” tables in a database: Hasura reads the schema in our database and then automatically generates a fully featured GraphQL API. This API allows developers to easily query for data or make database updates in a familiar, JSON-like language. 
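</p><p>As an illustration of the kind of API Hasura generates (the table, column names, host, and secret below are hypothetical stand-ins, not FlightAware’s actual schema or deployment), a Python client might call a Hasura GraphQL endpoint like this:</p>

```python
import json
import urllib.request

# Hasura serves GraphQL at /v1/graphql by default; the host and the
# admin secret here are placeholders for illustration.
HASURA_URL = "http://localhost:8080/v1/graphql"


def build_modes_query(limit: int) -> dict:
    """Build a GraphQL request body for a Hasura-generated query.

    Hasura auto-generates a root query field per tracked table, so a
    table named aircraft_registry_modes (columns assumed here) gets a
    same-named query field.
    """
    query = """
    query ModeSRecords($limit: Int!) {
      aircraft_registry_modes(limit: $limit) {
        mode_s_code
        tail_number
      }
    }
    """
    return {"query": query, "variables": {"limit": limit}}


def run_query(body: dict, admin_secret: str) -> dict:
    """POST the request body to Hasura and decode the JSON response."""
    req = urllib.request.Request(
        HASURA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-hasura-admin-secret": admin_secret,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

<p>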
Hasura also allows you to expose custom SQL queries in its API, allowing for more complex features such as table joins, GROUP BY clauses, and advanced filtering.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.45.35-AM.png" class="kg-image" alt="" loading="lazy" width="736" height="293" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.45.35-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.45.35-AM.png 736w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A sample GraphQL API query provided by Hasura</em></i><span style="white-space: pre-wrap;">.</span></figcaption></figure><p>When I began my project, my manager had a specific first feature in mind: Mode S code assignments. For context, each aircraft has an identifier known as a “tail number”; this is the registration that is typically used as its ID. However, data received from our ADS-B transponder network does not always include the tail number; it does, however, include a 24-bit number known as the Mode S code. The Mode S code is a unique numeric identifier for each aircraft and it rarely, if ever, changes (it should never change during a flight). The problem lies in mapping Mode S codes to tail numbers: associating the two lets us close a gap in our tracking abilities at FlightAware.</p><p>There are a few problems with Mode S management. First, there is no universal data source mapping tail numbers to Mode S codes. As a result, FlightAware must rely on multiple sources of data, some of which conflict and many of which have known errors. Compounding this, FlightAware also has no existing system for managing Mode S assignments. 
Rather, when an employee wants to add a Mode S record, an email is sent to FlightAware’s Chief Solution Officer, who then must manually write the SQL queries to update the database.</p><p>The existing system is slow and inefficient. To solve this problem, I began working on a new Mode S management tool that will support these needs in the future. I started with the UI for a view page, an add form, and an edit form; these pages query data from the database and write to a table called <em>aircraft_registry_flightaware</em>, where the data is considered “pending”. A script then imports the pending data, moving it to the <em>aircraft_registry_modes</em> table—which is read-only for the purposes of this project.</p><p>Once the UI and basic CRUD operations on the database were complete, I needed to shift my focus to validation. Some of this validation is simple, such as ensuring that we don’t have duplicate records and that the data input in the HTML form is in the correct format. More complex validations, such as ensuring that Mode S codes have a valid country prefix, were more time-consuming. 
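</p><p>To give a flavor of these checks, here is a small sketch (not the actual tool’s code: the country table below holds just two sample ICAO allocations, and the real rules are more complete) of format and country-prefix validation:</p>

```python
import re
from typing import Optional

# A Mode S code is a 24-bit ICAO address, written as six hex digits.
MODE_S_RE = re.compile(r"^[0-9A-F]{6}$")

# Tiny sample of ICAO country allocations (inclusive hex ranges);
# the real allocation table covers every ICAO member state.
COUNTRY_RANGES = {
    "United States": (0xA00000, 0xAFFFFF),
    "Canada": (0xC00000, 0xC3FFFF),
}


def is_valid_format(code: str) -> bool:
    """Check the code is exactly six hex digits (case-insensitive)."""
    return bool(MODE_S_RE.match(code.upper()))


def country_for(code: str) -> Optional[str]:
    """Return the allocating country for a valid code, if known."""
    value = int(code, 16)
    for country, (lo, hi) in COUNTRY_RANGES.items():
        if lo <= value <= hi:
            return country
    return None
```

<p>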
Unfortunately, I couldn’t complete every single possible validation on this data; however, I was able to build out the basis for generating errors so that additional validation can be added easily.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.46.21-AM.png" class="kg-image" alt="" loading="lazy" width="749" height="422" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.46.21-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.46.21-AM.png 749w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">The Mode S Management tool.</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.46.35-AM.png" class="kg-image" alt="" loading="lazy" width="749" height="476" srcset="https://flightaware.engineering/content/images/size/w600/2024/08/Screenshot-2024-08-05-at-10.46.35-AM.png 600w, https://flightaware.engineering/content/images/2024/08/Screenshot-2024-08-05-at-10.46.35-AM.png 749w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">The edit form that I created for my project.</em></i></figcaption></figure><p>A large part of my project was getting everything deployed. Since this is an internal FlightAware dashboard, we don’t want it to be accessible from outside of FlightAware’s network. Thus, we decided that this project would be deployed to our on-site Kubernetes cluster. This was a significant challenge for me—I needed to do everything from creating a Docker image for the project to creating the Kubernetes configuration to get it running smoothly on the cluster. 
I also needed to set up GitHub Actions to automatically re-deploy the application after an update is made, putting Continuous Integration &amp; Deployment (CI/CD) skills into practice. While I had been exposed to Kubernetes in the past through my master’s program, this was my first time integrating a full project into a live environment. Fortunately, FlightAware’s Operations (Ops) team was able to guide me through the challenging parts of my project, and my dashboard is now available to FlightAware employees.</p><p>Overall, my project this summer was a challenging, but incredibly fulfilling, learning opportunity. In addition to learning frontend technologies using Next.js and React, I also got to learn about deployment, requirements specification, agile methodologies, and automated software testing. This project was a lot of responsibility for me, but FlightAware’s engineers were willing to help guide me when I got stuck. I’ve learned a lot about being an engineer and am looking forward to what skills I can develop next!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/blast-from-the-past-2023-intern-projects/">Blast from the past: 2023 Intern Projects</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Modernizing PiAware Setup with Bluetooth LE ]]></title>
        <description><![CDATA[ This blog post explains how users traditionally set up Wi-Fi on our ADS-B receivers and how we leveraged Bluetooth Low Energy (BLE) to simplify and modernize the setup process. ]]></description>
        <link>https://flightaware.engineering/modernizing-piaware-setup-with-bluetooth-le/</link>
        <guid>https://flightaware.engineering/modernizing-piaware-setup-with-bluetooth-le/</guid>
        <pubDate>Mon, 01 Jul 2024 13:41:19 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/06/sten-ritterfeld-psKil0FkS58-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <p>In my&nbsp;<a href="https://flightaware.engineering/flightawares-terrestrial-ads-b-network/">previous blog post</a>, I shared an overview of FlightAware’s ADS-B receiver network that streams real-time flight data to FlightAware. Our receivers require an Internet connection to operate, so a key focus over recent years has been making it easier for our hosts to get them connected to the Internet and simplify the overall setup process. For modern-day consumer electronics, configuring the Wi-Fi network is often the first step in the setup process and requires a mobile device to connect to the device and complete the setup.</p><p>FlightAware wanted to adopt a similar approach to setting up our ADS-B receivers. This blog post explains how users traditionally set up Wi-Fi on our ADS-B receivers and how we leveraged Bluetooth Low Energy (BLE) to simplify and modernize the setup process.</p><h2 id="piaware-explained"><strong>PiAware Explained</strong></h2><p>If you’re not familiar with PiAware, it is an open-source solution we’ve developed for people who want to assemble an ADS-B receiver and contribute data to FlightAware. Our ADS-B receivers are powered by Raspberry Pi computers. PiAware at its core is a Raspberry Pi OS image with our pre-installed suite of flight tracking software. We used the open-source tool that the Raspberry Pi Foundation uses to build the standard Raspberry Pi operating system and added custom build stages to install our software and configure it to start up at boot time. The goal with this was to reduce the barrier to entry for contributing to FlightAware and make the process as simple as possible: acquire a Raspberry Pi and ADS-B receiver hardware, flash PiAware on an SD card and insert it into the Pi, connect it to the Internet, and off you go.</p><p>The simplest method to connect PiAware to the Internet is to attach it to a wired Ethernet connection. 
However, the process is less straightforward for users who prefer to connect over Wi-Fi:</p><ol><li>Insert the PiAware SD card into a computer</li><li>Open a text file and edit it with your Wi-Fi SSID and password</li><li>Insert the PiAware SD card into the Raspberry Pi and boot it up</li></ol><p>Not terribly difficult, but we were starting to venture into pre-built, headless ADS-B receivers and needed some way to connect to the device and configure Wi-Fi without having to open it and edit text files.</p><h2 id="leveraging-bluetooth"><strong>Leveraging Bluetooth</strong></h2><p>During the exploratory phase, we considered two well-established approaches to this problem: Bluetooth and Wi-Fi access point mode. The latter involves turning the device into a wireless access point and requires the user to manually enter a passkey to connect to the device before transferring data. We thought this process would be complex and prone to issues and ultimately decided on Bluetooth, specifically BLE, for its simplicity and to minimize the risk of disrupting an active data feed to FlightAware.</p><p>Bluetooth Low Energy was designed for low power consumption and is ideal for devices needing to transfer small amounts of data periodically, as opposed to a continuous stream. This made it a suitable choice for our needs. 
Within the Bluetooth LE specification, there are two protocols for handling device connectivity and data transfer within a Bluetooth LE network:</p><ol><li><strong>Generic Access Profile (GAP)</strong> – This protocol allows Bluetooth LE devices to discover and connect to each other</li><li><strong>Generic Attribute Profile (GATT)</strong> – Once a connection is established, GATT defines how data is transferred between them</li></ol><p>We built a custom Bluetooth service, “<strong>piaware-ble-connect</strong>”, that leverages these two protocols to provide an interface to PiAware over Bluetooth. It uses the official Linux Bluetooth protocol stack and the D-Bus messaging system to provide a high-level API for interacting with the Bluetooth hardware.</p><h2 id="connecting-to-piaware"><strong>Connecting to PiAware</strong></h2><p>To make PiAware discoverable over Bluetooth, piaware-ble-connect creates a D-Bus interface to interact with the Bluetooth LE Advertising manager on the system bus. It then configures and broadcasts advertising packets containing information that identifies PiAware, which allows nearby Bluetooth LE devices to discover PiAware and establish a connection with it.</p><p>We added a user interface in both the FlightAware iOS app and the FlightAware website to allow users to connect to PiAware. 
The iOS app utilizes Apple's Core Bluetooth framework for connectivity and the FlightAware website leverages the Web Bluetooth API, which allows connections to BLE devices through a web browser.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-25-at-4.26.56-PM.png" class="kg-image" alt="" loading="lazy" width="993" height="501" srcset="https://flightaware.engineering/content/images/size/w600/2024/06/Screenshot-2024-06-25-at-4.26.56-PM.png 600w, https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-25-at-4.26.56-PM.png 993w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure&nbsp;1 - Connecting to PiAware over Bluetooth LE</span></figcaption></figure><h2 id="configuring-wi-fi-on-piaware"><strong>Configuring Wi-Fi on PiAware</strong></h2><p>The Generic Attribute Profile mentioned above defines the format and organization of data exchanged between Bluetooth LE devices. The data is generally organized into “services”, groups of related functionality provided by a Bluetooth device, each exposing one or more “characteristics”. For example, a BLE device may expose a Battery service with characteristics for battery health and level that can be queried.</p><p>We took a slightly custom yet common approach to using GATT for transferring data: implementing a UART service with two characteristics, one for receiving data (RX) and one for transmitting data (TX). This essentially emulates a serial communication channel over Bluetooth LE. We decided to go with this approach because it gave us the flexibility to send data in JSON format, which can be relayed to a web server as an HTTP request. The BLE client application (e.g. 
the FlightAware iOS app or web browser) would send the Wi-Fi settings to piaware-ble-connect, which acts as a bridge to transmit those settings to an API that handles the request.</p><p>We developed&nbsp;<strong>piaware-configurator</strong>, an API written in Python using Flask that handles HTTP requests and performs the necessary logic to configure and apply the device Wi-Fi settings. The responses from the API are returned via BLE and presented to the user, indicating if PiAware has successfully connected to FlightAware.</p><p>The figures below demonstrate the data flow when configuring Wi-Fi and the user interfaces the Web and Mobile teams developed for it.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-26-at-10.02.04-AM.png" class="kg-image" alt="" loading="lazy" width="1021" height="542" srcset="https://flightaware.engineering/content/images/size/w600/2024/06/Screenshot-2024-06-26-at-10.02.04-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2024/06/Screenshot-2024-06-26-at-10.02.04-AM.png 1000w, https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-26-at-10.02.04-AM.png 1021w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure&nbsp;2&nbsp;- Data Flow for configuring Wi-Fi</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-25-at-4.25.13-PM.png" class="kg-image" alt="" loading="lazy" width="904" height="585" srcset="https://flightaware.engineering/content/images/size/w600/2024/06/Screenshot-2024-06-25-at-4.25.13-PM.png 600w, https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-25-at-4.25.13-PM.png 904w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Figure&nbsp;3&nbsp;- Connecting to PiAware using the 
FlightAware website and Web Bluetooth API</span></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-25-at-4.25.56-PM.png" class="kg-image" alt="" loading="lazy" width="976" height="943" srcset="https://flightaware.engineering/content/images/size/w600/2024/06/Screenshot-2024-06-25-at-4.25.56-PM.png 600w, https://flightaware.engineering/content/images/2024/06/Screenshot-2024-06-25-at-4.25.56-PM.png 976w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Figure 4 – Connecting to PiAware using the FlightAware iOS app and Core Bluetooth</em></i></figcaption></figure><h2 id="conclusion"><strong>Conclusion</strong></h2><p>As a result of adopting Bluetooth technology to help with the setup process, users are no longer required to modify text files on PiAware to configure Wi-Fi. This was a big step towards modernizing and improving the overall user experience of our products. We’ve since extended this functionality to our recently released FlightFeeder, allowing us to remove the LCD display previously used to configure Wi-Fi. We plan to bring this feature to the FlightAware Android app and to continue rolling it out across the rest of our products.</p> 
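<p>To make the data path above concrete, here is a small, stdlib-only sketch of the bridge step: Wi-Fi settings arrive as JSON over the BLE UART RX characteristic in small writes, then get reassembled and serialized for the HTTP request to piaware-configurator. The field names, chunking, and validation here are illustrative assumptions, not the actual protocol.</p>

```python
import json


def assemble_request(chunks: list) -> dict:
    """Reassemble BLE writes into one JSON document.

    Each BLE write carries a small payload (roughly 20 bytes under
    the default ATT MTU), so a JSON message arrives as several chunks.
    """
    payload = b"".join(chunks).decode("utf-8")
    return json.loads(payload)


def to_http_body(settings: dict) -> bytes:
    """Validate and serialize settings for the POST to the configurator."""
    required = {"ssid", "password"}  # assumed field names
    missing = required - settings.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return json.dumps(settings).encode("utf-8")
```

<p>The response travels the same path in reverse: the configurator’s JSON reply is written back over the TX characteristic and presented to the user by the client.</p>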
        <br>
        <p>
            <a href="https://flightaware.engineering/modernizing-piaware-setup-with-bluetooth-le/">Modernizing PiAware Setup with Bluetooth LE</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Monorepo ]]></title>
        <description><![CDATA[ One key strategy we’ve implemented as part of our broader technological transformation is a monorepository architecture for new applications and libraries. ]]></description>
        <link>https://flightaware.engineering/monorepo/</link>
        <guid>https://flightaware.engineering/monorepo/</guid>
        <pubDate>Mon, 03 Jun 2024 10:35:33 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/05/ian-battaglia-9drS5E_Rguc-unsplash-2.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>Phil Copley is a Senior Software Engineer on the Applications Crew at FlightAware. He is currently the technical lead for the beta surface visualization and coordinator of the Web Competency Alliance.</em></p><p>We’ve been thinking a lot about velocity at FlightAware lately. How can we ship our products faster? How can we get customer feedback faster? Being able to ship quickly without compromising the stability of our products or platform is a competitive advantage.</p><p>One key strategy we’ve implemented as part of our broader&nbsp;<a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/">technological transformation</a>&nbsp;is a monorepository architecture for new applications and libraries. In this blog post, we’ll dive into how this decision came about, the technical challenges and benefits we’ve seen so far, and what’s coming up next.</p><h2 id="how-flightaware-is-built"><strong>How FlightAware is Built</strong></h2><blockquote>"Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations."<br><br>Melvin Conway</blockquote><p>FlightAware is not exempt from Conway’s Law, and by extension our organizational structure lends itself to microservices and monorepos quite nicely. If you’ve read about&nbsp;<a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/">our organizational structure</a>&nbsp;before, not a lot has changed: we still have crews, wings, and alliances. But we’ve moved away from strictly “Web” or “Back-end” crews and toward more cross-functional crews. Right now our monorepo is mostly relevant to our Applications and Core Crews, both of which are made up of folks from our front-end web, back-end, operations, product, and design wings. 
Since these crews iterate on public user-facing applications and on the platforms and APIs supporting them, respectively, it makes sense that we’d need to share a lot of code, both between the two crews and between front-end and back-end.</p><p>We started down the road of completely independent microservices and pretty soon began running into issues, especially around sharing code. A package update would sit in review while we coordinated releases of consuming applications so as not to introduce inconsistencies. The same team would need to make changes in multiple packages and coordinate <em>those </em>deployments. We could see a future where different teams write the same logic in their own languages, or even in the same language, just to own it and avoid this coordination. We needed a solution that would enable seamless code sharing and collaboration, regardless of programming language or team, and, most of all, one that avoided this blocking deployment coordination.</p><h2 id="enter-the-monorepo"><strong>Enter: The Monorepo</strong></h2><p>Monorepositories are not a new concept. In the right circumstances they can give you many of the benefits of a monolithic application and microservices, while mitigating many of the downsides of both.</p><p>It’s important to understand the difference between a monolithic application and a monorepository containing many applications. A monolithic application is typically going to contain all of your data access code, all of your front-end code, all of your business logic, and all of your tests. Additionally, you’re almost always going to be deploying that in one block. Fix a bug with the padding on your logo? You’d better be comfortable redeploying all your database access and user management code at the same time.</p><p>A monorepository, while still holding all your code easily visible and accessible in a single place, will be able to deploy these pieces independently. Change the marketing page for one of your products? 
Just that application gets deployed. Change something that affects half your products? Only those products get deployed, but they get deployed independently. If you manage to deploy something that’s broken, you don’t take out your entire company’s product line but just the thing that’s actually broken. This is a huge benefit and greatly reduces the blast radius of production outages, letting us ship faster and more confidently at the same time.</p><p>Another benefit of a monorepo not typically found in a monolith is the support for multiple languages. FlightAware has&nbsp;<a href="https://flightaware.engineering/the-best-tool-for-the-job-assessing-languages-for-microservice-development-at-flightaware/">four first-class languages</a>: Rust, Go, Python, and TypeScript. While realistically almost all new front-end code will be in TypeScript, our back-end services are a healthy mix of all of the other languages, and we have a handful of other languages we support for very specific use cases as well. Because of this, being able to natively build, run, and test these different languages side-by-side is a definite plus.</p><h2 id="monorepo-tooling-at-flightaware"><strong>Monorepo Tooling at FlightAware</strong></h2><p>FlightAware is using&nbsp;<a href="https://nx.dev/?ref=flightaware.engineering">Nx</a>&nbsp;for our monorepo tooling. Beyond simply hosting code in one git repository, Nx has tooling to quickly spin up new applications and libraries, built-in caching to prevent wasting time waiting for unchanged dependencies to build over and over again, and even offers cloud-hosted build services if you’re interested in that.</p><h3 id="nx-generators">Nx Generators</h3><p>Nx&nbsp;<a href="https://nx.dev/features/generate-code?ref=flightaware.engineering">generators</a>&nbsp;are CLI commands you can use to easily spin up new applications, new libraries, and change configuration for existing apps. 
For example, you can add TailwindCSS or change which testing library you’re using.</p><p>Nx also enables you to write custom generators, fulfilling one of our longer-term goals to have a suite of custom FlightAware generators for our common tasks, and for generating our basic application or library structure in new projects.</p><h3 id="targets-shared-tooling">Targets &amp; Shared Tooling</h3><p>Nx has a concept of “targets”, which are scripts you can run for a given project, or multiple projects. These can be nx generators, custom generators, or just arbitrary command line scripts. For example, we launched our monorepo before&nbsp;<a href="https://nx.dev/features/manage-releases?ref=flightaware.engineering" rel="noreferrer">Nx-release</a>&nbsp;was stable, so we have a few custom targets to independently version our projects in GitHub.</p><h3 id="nx-affected">nx-affected</h3><p>As a software developer, one of the most useful aspects of Nx to me is&nbsp;<a href="https://nx.dev/nx-api/nx/documents/affected?ref=flightaware.engineering" rel="noreferrer">Nx-affected</a>. This allows us to run specific targets only on projects that have changed, and on the projects that consume them. 
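</p><p>Conceptually (this is a toy sketch, not Nx’s implementation), the affected set is the changed projects plus everything that transitively depends on them:</p>

```python
# Toy model of what `nx affected` computes: given a project graph and
# a set of changed projects, find them plus all transitive dependents.


def affected(deps: dict, changed: set) -> set:
    """deps maps each project to the set of projects it depends on."""
    # Invert the graph: project -> projects that consume it.
    dependents = {p: set() for p in deps}
    for project, uses in deps.items():
        for dep in uses:
            dependents.setdefault(dep, set()).add(project)
    # Walk outward from the changed projects.
    result, stack = set(changed), list(changed)
    while stack:
        for consumer in dependents.get(stack.pop(), ()):
            if consumer not in result:
                result.add(consumer)
                stack.append(consumer)
    return result
```

<p>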
So if you imagine the following contrived project graph:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/05/Screenshot-2024-05-29-at-5.08.47-PM.png" class="kg-image" alt="" loading="lazy" width="652" height="426" srcset="https://flightaware.engineering/content/images/size/w600/2024/05/Screenshot-2024-05-29-at-5.08.47-PM.png 600w, https://flightaware.engineering/content/images/2024/05/Screenshot-2024-05-29-at-5.08.47-PM.png 652w"><figcaption><span style="white-space: pre-wrap;">Example Nx project with two libraries, three applications, and E2E test projects.</span></figcaption></figure><p>If you just make changes to&nbsp;product-1&nbsp;you don’t want that to also trigger builds for the other products, your libraries, or waste time running e2e testing suites for unchanged applications. Likewise, if you make a change to your components library, you need to rebuild and retest everything that depends on it.&nbsp;Nx affected&nbsp;lets us do this quickly and has a lot of built-in features such as filtering. You can get all the affected projects but exclude e2e test projects, or you can get all the affected libraries but only if they contain a&nbsp;publish&nbsp;target. This gives you very fine-grained control over your build pipeline and lets you make sure you’re not wasting compute building or testing things you don’t need to.</p><h2 id="benefits-and-results"><strong>Benefits and Results</strong></h2><h3 id="generated-shared-code">Generated &amp; Shared Code</h3><p>There were many growing pains in this process. Migrating the first app from a standalone product to a monorepo app took about twice as long as we expected, and it took us a while to get the build pipeline to a stable point. However, after those initial hurdles, I think we’re at a point now where our velocity has increased beyond what it would have been using the earlier microservice approach. 
We’re shipping our beta surface visualization to production several times a week and we’re soon starting a&nbsp;<a href="https://en.wikipedia.org/wiki/Spike_(software_development)?ref=flightaware.engineering">spike</a>&nbsp;to investigate moving more of our core libraries into the monorepo.</p><p>We also share all dependencies between all apps. Nx refers to this as an&nbsp;<a href="https://nx.dev/concepts/integrated-vs-package-based?ref=flightaware.engineering#integrated-repos-vs-packagebased-repos-vs-standalone-apps">integrated monorepo</a>. This means that we don’t need to worry at all about packages going out of date on seldom-used applications because they are using the same packages all the other projects are. If a package is updated, that project has been changed, so tests will run and the application will be redeployed.</p><h3 id="shared-deployment-tooling">Shared Deployment Tooling</h3><p>Outside of the monorepo, coordinating initial setup of deployment tooling between our front-end, back-end, and operations teams could take as much as two weeks depending on availability and other workload. With Nx, all that is taken care of and the infrastructure is already in place to automatically deploy an arbitrary number of new applications. Not only does this include the basics of getting an application into production, but right out of the gate we get PR deployments for every application, starting with PR #1.</p><h3 id="oss-giving-back"><strong>OSS: Giving Back</strong></h3><p>When I interviewed at FlightAware, I was excited about the prospect of working for a company that contributes back to the Open Source community. We were able to open a pull request addressing an issue we ran into, and in the course of doing so realized other organizations had the same problem months prior.</p><p>This is not out of the ordinary, either. Even on my small team there are several of us who have opened public patches to open source software that we use every day. 
If it fixes an issue we’re facing, there’s no problem contributing that code back to the community.</p><h2 id="next-steps"><strong>Next Steps</strong></h2><h3 id="custom-generators">Custom Generators</h3><p>I touched on this earlier, but Nx has the ability to implement custom generators for your applications. One of the first platform-level things we’ll be addressing with Nx is creating a suite of custom FlightAware generators. Eventually, I expect this to be an experience much like&nbsp;create-next-app&nbsp;where you simply run the generator, answer a few questions, and have a fully integrated, functional library or application ready to go.</p><h3 id="more-apps">More Apps</h3><p>As more projects are moved into the monorepo, the argument for moving other projects there grows stronger. The benefits of being tightly integrated on the dependency and infrastructure side while remaining loosely coupled programmatically quickly outweigh the costs of any migration, which so far have been minimal.</p><h2 id="your-help">Your Help</h2><p>If this sounds interesting to you, we’d love to have you come join our team!&nbsp;<a href="https://www.flightaware.com/about/careers/?ref=flightaware.engineering">FlightAware is hiring</a>.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/monorepo/">Monorepo</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Mobile Version Checking at FlightAware ]]></title>
        <description><![CDATA[ Managing multiple versions of an app is crucial to ensure that users have a smooth experience and that the app functions on different platforms. ]]></description>
        <link>https://flightaware.engineering/mobile-version-checking-at-flightaware/</link>
        <guid>https://flightaware.engineering/mobile-version-checking-at-flightaware/</guid>
        <pubDate>Mon, 06 May 2024 11:56:49 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/04/robin-worrall-FPt10LXK0cg-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>Managing multiple versions of an app is crucial to ensure that users have a smooth experience and that the app functions on different platforms. However, running into issues is not uncommon, especially when changes to backend services or other essential factors make previous app versions obsolete. Furthermore, it is important to keep users informed about available updates, including any requirements such as device upgrades or operating system updates.</p><p>Recognizing the need for a service that could handle app lifecycle management, our team developed a solution with two primary objectives. The first objective was to ensure that users always use the latest version of our app, with bug fixes and updated features. The second objective was to have the ability to retire backend services and old app versions without abruptly cutting off users, which would result in a poor user experience. This blog post will focus on how we created this solution and how we integrated it into our app.</p><h3 id="updating-and-maintaining-version-data"><strong>Updating and Maintaining Version Data</strong></h3><p>Our solution is a combination of a service called Mobile Version Check and client-side logic in the app UI to process its responses. The service oversees updating and maintaining our app version data as well as determining whether a user needs to upgrade to a new version based on the input provided. It is accessed through a REST endpoint and is deployed on two Kubernetes clusters using Flux. Each deployment includes two replicas that check independently to ensure that they are up to date with Apple’s App Store Connect API. This API allows developers to retrieve details about an app, such as its version number and release date, by using the app’s unique ID or URL. The service periodically checks this API to maintain a history of our app releases and compatible devices for each app version. 
A successful response may look like this:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/04/Screenshot-2024-04-29-at-11.37.01-AM-1.png" class="kg-image" alt="" loading="lazy" width="974" height="576" srcset="https://flightaware.engineering/content/images/size/w600/2024/04/Screenshot-2024-04-29-at-11.37.01-AM-1.png 600w, https://flightaware.engineering/content/images/2024/04/Screenshot-2024-04-29-at-11.37.01-AM-1.png 974w" sizes="(min-width: 720px) 720px"></figure><p>This response tells us what the current running version of our app is, when it was released, its minimum OS version, and the devices that can run it. These responses are then saved into our app version file for later reference. This file is updated every 15 minutes, when a new app version is released, or when a new Apple device is released. After an update, each pod replica will restart and purge the cache.<br>In addition to this data, there is a JSON file embedded in the service’s Go binary. It is manually curated by developers and is updated directly in the project after Apple product releases or as needed. This file is responsible for keeping track of the minimum allowed version of our app that users can have installed on their devices. This is how the service determines whether we need to require an update from our user.<br><br>All this data makes up the instructions for how the service will respond to a version check when queried.</p><h3 id="querying-the-endpoint"><strong>Querying the endpoint</strong></h3><p>Here’s a breakdown of how the service gets queried:</p><p><strong>Parameters:</strong></p>
<ul>
<li>app_identifier: Unique identifier for the app</li>
<li>app_version: Current version of the app</li>
<li>os_version: Operating system version</li>
<li>device_model: Model of the device</li>
</ul>
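<p>As a concrete illustration, a client request carrying these parameters might be assembled as follows. This is a minimal Python sketch; the endpoint URL and the example values are hypothetical, not FlightAware’s actual service details:</p>

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real URL is internal to FlightAware.
BASE_URL = "https://mobile-version-check.example.com/v1/check"

def build_version_check_url(app_identifier, app_version, os_version, device_model):
    """Assemble a GET request carrying the four parameters listed above."""
    query = urlencode({
        "app_identifier": app_identifier,
        "app_version": app_version,
        "os_version": os_version,
        "device_model": device_model,
    })
    return f"{BASE_URL}?{query}"

# Example values are made up for illustration.
url = build_version_check_url("com.example.app", "5.8.0", "17.4", "iPhone15,2")
```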
<p>When a query is made to the endpoint, the parameters sent with it are checked against rules that have been set up using&nbsp;<a href="https://www.openpolicyagent.org/?ref=flightaware.engineering"><u>Open Policy Agent</u></a>&nbsp;configuration files. This library can evaluate and interpret the configuration files, as well as the device and app version information provided. Based on this, the service determines whether the current app version is still supported and whether an upgrade is available for the device.</p><p>A JSON-encoded response is then returned. It contains flags indicating whether an upgrade is required, plus additional fields with upgrade information such as the newest version the user can upgrade to and the minimum OS version needed for the upgrade.</p><p>Multiple variations of responses get returned based on the data used to query the service.</p><p>Here are some examples of what we may get back.</p><p><strong>Example 1:</strong> A response for a user running the latest app version with the newest OS:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/04/Screenshot-2024-04-29-at-11.40.37-AM.png" class="kg-image" alt="" loading="lazy" width="594" height="302"></figure><p>This tells us that the user does not have any upgrades to make and will not have to update their app. 
They are on the most up-to-date version of our app.</p><p><strong>Example 2:</strong> A response for a user who is running an older version of the app but is not required to update.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/04/Screenshot-2024-04-29-at-11.40.47-AM.png" class="kg-image" alt="" loading="lazy" width="662" height="302" srcset="https://flightaware.engineering/content/images/size/w600/2024/04/Screenshot-2024-04-29-at-11.40.47-AM.png 600w, https://flightaware.engineering/content/images/2024/04/Screenshot-2024-04-29-at-11.40.47-AM.png 662w"></figure><p>In this case, we would notify the user that they can choose to upgrade their app version without needing to update their OS to do so.</p><p><strong>Example 3:</strong> A response for a user who is currently using a discontinued version of our app and must upgrade to the newest version before regaining access.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/04/Screenshot-2024-04-29-at-11.41.01-AM.png" class="kg-image" alt="" loading="lazy" width="662" height="302" srcset="https://flightaware.engineering/content/images/size/w600/2024/04/Screenshot-2024-04-29-at-11.41.01-AM.png 600w, https://flightaware.engineering/content/images/2024/04/Screenshot-2024-04-29-at-11.41.01-AM.png 662w"></figure><p>This lets us know to force the user to upgrade to the latest version of the app, and that no OS update is required to do so.</p><p>All of these are possible responses that are sent directly back to the client-side app, where they determine whether a screen appears and which one.</p><h3 id="prompting-upgrades-in-the-app"><strong>Prompting upgrades in the app</strong></h3><p>The client-side app checks for new updates from Mobile Version Check on each app launch. 
It sends the current device model, operating system version, and the app version the user is currently running to request update information. If there is no new version available, the app continues to function normally.</p><p>Two types of screens may be presented when a new update is available. First, the user might be running an older version of the app that is not required to update, but we have additional update fields populated for them. In this case, the app will display a popup with options (using a client-side algorithm for timing and frequency of presentation), which can be dismissed to continue using the app.</p><p>Second, if the user is running an app version that has been discontinued, the app will prevent them from using its features. This is signaled by <code>"upgrade_required": true</code>. In such a case, a non-dismissible full-screen message is presented, which includes a link back to a page on flightaware.com that explains the policy and why this has happened.</p><h3 id="conclusion"><strong>Conclusion</strong></h3><p>The implementation we have adopted has not only helped us efficiently manage the lifecycle of our product but also made it easier for us to maintain both front-end and back-end code bases. Furthermore, it has provided our customer support team with a clearer support matrix by reducing unnecessary report requests and simplifying knowledge base management.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/mobile-version-checking-at-flightaware/">Mobile Version Checking at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ How Hard is it to Delete a Docker Tag? ]]></title>
        <description><![CDATA[ I recently ran into a surprisingly tricky issue around deleting stale Docker image tags to keep our private Docker registry tidy. I ended up doing more research than expected, and I wanted to share some of my findings. ]]></description>
        <link>https://flightaware.engineering/how-hard-is-it-to-delete-a-docker-tag/</link>
        <guid>https://flightaware.engineering/how-hard-is-it-to-delete-a-docker-tag/</guid>
        <pubDate>Mon, 01 Apr 2024 14:24:23 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/03/christopher-gower-m_HRfLhgABo-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>I recently ran into a surprisingly tricky issue around deleting stale Docker image tags to keep our private Docker registry tidy. I ended up doing more research than expected, and I wanted to share some of my findings.</p><h2 id="introduction">Introduction</h2><p>We're big fans of Docker at FlightAware. The ability to isolate your services and all their dependencies on a host from one another is incredibly powerful. In fact, that combined with the broad adoption of Docker containers as a unit of software deployment were significant factors driving FlightAware's transition from FreeBSD to Ubuntu as our primary production OS.</p><p>Of course, once you start building a bunch of Docker images, you've got to put them somewhere, preferably somewhere private. That "somewhere" is known as a Container Registry, and there's a variety of options to choose from. We opted to self-host our registry and selected the <a href="https://github.com/distribution/distribution?ref=flightaware.engineering">"official" registry service maintained (until recently) by Docker, Inc.</a> (they donated the project to the open source community in late 2023); it's called Distribution.</p><p>For several years, we ran the registry with few issues. At times, we'd get a bit tight on storage, and someone would go manually clear out some of our larger images. Recently, though, we decided that we could do with some more consistent cleanup of our registry, especially as we've begun to more consistently build images in Continuous Integration environments. If we weren't careful, we'd have unbounded growth on our hands.</p><p>"How hard could it be?", I thought, as I outlined a little cleanup script. "Find tags that match a pattern and are older than X days, delete the tags, and then garbage-collect." I didn't even need to do the last part myself; we already ran registry garbage collection on a weekly basis! 
Turns out it was the second step that proved to be a bit of a rabbit hole. You see, deleting a Docker image tag is apparently not trivial business. This post covers my exploration of various Container Registry clients and their different approaches to accomplish what you might have thought was one of the most trivial registry operations you can perform.</p><h2 id="background">Background</h2><p>First, I'll share some simplified Container Registry concepts that should help with understanding some of the details below.</p><p>Every image in a registry can be referenced by a unique, immutable digest which is generated based on the image's contents. Images can also be referenced by tags, which are generally user-selected. A single image can be referenced by multiple tags (consider the official Python image where you have the <code>3.12.2-bullseye</code>, <code>3.12-bullseye</code>, and <code>3-bullseye</code> tags all referencing the same image, and the user may select one based on how tightly they wish to pin their application to a given Python version). Docker tags are mutable and can be changed to point to some other image at will. If you're familiar with git, I've found a reasonable analogy to be that Docker digests are like git commits while Docker tags are like git branches (<em>not</em> git tags, which are meant to be immutable).</p><h2 id="the-mission">The Mission</h2><p>So my goal was to write a hopefully simple shell script that could leverage an existing Container Registry client CLI to list out tags, determine their ages, delete old tags, and then let garbage collection reclaim that sweet sweet disk space. 
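</p><p>The selection step of that plan can be sketched in a few lines of Python. This is only an illustration; the pattern, retention window, and tag data here are made up, and the actual tag listing and deletion are delegated to a registry client:</p>

```python
import re
from datetime import datetime, timedelta, timezone

def tags_to_delete(tags, pattern, max_age_days, now=None):
    """Return tag names matching `pattern` that are older than `max_age_days`.

    `tags` is an iterable of (name, created_at) pairs, e.g. parsed from a
    registry client's listing output.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, created in tags
            if re.search(pattern, name) and created < cutoff]

# Illustrative data: CI tags older than 30 days are candidates for deletion.
reference = datetime(2024, 3, 1, tzinfo=timezone.utc)
tags = [
    ("ci-build-123", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    ("ci-build-456", datetime(2024, 2, 25, tzinfo=timezone.utc)),
    ("v1.0.0", datetime(2023, 6, 1, tzinfo=timezone.utc)),
]
old_tags = tags_to_delete(tags, r"^ci-build-", 30, now=reference)  # ["ci-build-123"]
```

<p>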
After some searching around, I found the following clients that all looked promising:</p><ul><li><a href="https://github.com/google/go-containerregistry/tree/main/cmd/crane?ref=flightaware.engineering">Crane</a></li><li><a href="https://github.com/containers/skopeo?ref=flightaware.engineering">Skopeo</a></li><li><a href="https://github.com/regclient/regclient?ref=flightaware.engineering">Regctl</a></li><li><a href="https://github.com/fraunhoferfokus/deckschrubber?ref=flightaware.engineering">Deckschrubber</a></li></ul><p>With the exception of Deckschrubber, all the tools listed are quite widely used, actively maintained, and implement all the essential commands you'd expect for interacting with a remote Container Registry. I found Deckschrubber appealing as well, though, as it actually functions at a higher layer, acting as an image cleanup tool out of the box rather than just providing primitives. It's also implemented on top of the registry client library present in the Distribution Docker Registry that we were using, leading me to hope that it would be more reliably compatible with any of the service's quirks.</p><h2 id="the-problem">The Problem</h2><p>So how hard could deleting a little ol' Docker tag be? This is just the tag, mind you, not the underlying image itself. I would expect our periodic garbage collection to handle image deletion once there are no tags left referencing it. Well it turns out that the Docker Registry API specification entirely lacked a tag deletion endpoint <a href="https://github.com/opencontainers/distribution-spec/issues/114?ref=flightaware.engineering#issuecomment-846444300">until 2021</a>. And even 3 years later, our Container Registry of choice, Distribution, hasn't yet landed the new endpoint in a stable release. With that being the case, the clients I surveyed each had their own interpretation for what should be done if a user tried to delete a tag reference. 
Let's explore each of them.</p><p>For each tool tested, we'll assume a repo has been set up in a private registry as follows:</p><pre><code>docker pull busybox:glibc
docker pull busybox:musl
docker tag busybox:glibc my-docker-registry/busybox:glibc1
docker tag busybox:glibc my-docker-registry/busybox:glibc2
docker tag busybox:musl my-docker-registry/busybox:musl
docker push -a my-docker-registry/busybox
</code></pre><h3 id="crane">Crane</h3><p>Crane has a <code>delete</code> subcommand with the following description:</p><pre><code>&gt; crane delete --help
Delete an image reference from its registry
-- snip --
</code></pre><p>I would generally consider a string like <code>my-docker-registry/busybox:glibc1</code> to be a valid image reference, yet after running the command we see the following error:</p><pre><code>&gt; crane delete my-docker-registry/busybox:glibc1
Error: DELETE https://my-docker-registry/v2/busybox/manifests/glibc1: DIGEST_INVALID: provided digest did not match uploaded content
</code></pre><p><a href="https://github.com/google/go-containerregistry/issues/1578?ref=flightaware.engineering">I'm not the first to encounter such an error</a>, and there doesn't appear to be a satisfactory approach to deleting just a tag.<br>One possible alternative here is to fetch the digest of the tag in question and then delete <em>that</em>. If you do that, though, you'll find that any other tags that were pointing to the specific image you deleted are now also gone!</p><pre><code>&gt; crane ls my-docker-registry/busybox
glibc1
glibc2
musl

&gt; crane delete my-docker-registry/busybox@$(crane digest my-docker-registry/busybox:glibc1)

&gt; crane ls my-docker-registry/busybox
musl
</code></pre><p>This is the core of the trouble behind tag "deletion" as implemented before 2021. You couldn't delete just a tag, you also deleted the underlying image. Fortunately, <code>crane</code> forces you to be explicit with what you're deleting, requiring you to specify a digest rather than just a tag. With this approach, the user is less likely to be surprised by the deletion of the entire image and all its tags.</p><h3 id="skopeo">Skopeo</h3><p>Skopeo is more flexible with what you can tell it to delete. Its <code>delete</code> subcommand has the following description:</p><pre><code>&gt; skopeo delete --help
Delete an "IMAGE_NAME" from a transport
-- snip --
</code></pre><p>I do prefer it being clear that you're deleting an "image" rather than an "image reference". So does <code>IMAGE_NAME</code> here include the <code>&lt;image&gt;:&lt;tag&gt;</code> format? Let's find out:</p><pre><code>&gt; skopeo list-tags docker://my-docker-registry/busybox
{
    "Repository": "docker://my-docker-registry/busybox",
    "Tags": [
        "glibc1",
        "glibc2",
        "musl"
    ]
}

&gt; skopeo delete docker://my-docker-registry/busybox:glibc1

&gt; skopeo list-tags docker://my-docker-registry/busybox
{
    "Repository": "docker://my-docker-registry/busybox",
    "Tags": [
        "musl"
    ]
}
</code></pre><p>And just like that, your tag is gone, but so are the others! Ultimately, <code>skopeo</code> is performing the same operation as <code>crane</code> under the covers, resolving the tag to a digest and then deleting said digest. This is both more and less surprising than Crane's behavior. Skopeo does indicate you're deleting an image by name rather than just an image reference, but to have multiple tags disappear when only one was specified is still an unpleasant discovery.</p><h3 id="regctl">Regctl</h3><p>Regctl is quite an interesting case. It's clear the author wasn't content to be limited by a restricted spec, as evidenced by the following feature listed in the project's readme:</p><blockquote>Delete APIs have been provided for tags, manifests, and blobs (the tag deletion will only delete a single tag even if multiple tags point to the same digest).</blockquote><p>How did he manage to pull this off? Fortunately, the author is quite clear on his approach in the command documentation:</p><pre><code>&gt; regctl tag rm --help
Delete a tag in a repository.
This avoids deleting the manifest when multiple tags reference the same image.
For registries that do not support the OCI tag delete API, this is implemented
by pushing a unique dummy manifest and deleting that by digest.
If the registry does not support the delete API, the dummy manifest will remain.
-- snip --
</code></pre><p>Well isn't that clever! Indeed, if we give it a try, we'll see that only the specified tag is deleted:</p><pre><code>&gt; regctl tag ls my-docker-registry/busybox
glibc1
glibc2
musl

&gt; regctl tag rm my-docker-registry/busybox:glibc1

&gt; regctl tag ls my-docker-registry/busybox
glibc2
musl
</code></pre><p>One notable side effect of this more involved approach to deletion is that the command is noticeably slower. Ultimately, this best fits expectations for what should happen when deleting a Docker image tag. It certainly would have been my choice for our cleanup script if I hadn't stumbled across the final choice in this list.</p><h3 id="deckschrubber">Deckschrubber</h3><p>And finally we have the (somewhat) black sheep, Deckschrubber. Instead of offering commands to list tags and delete images, Deckschrubber is just a single command which accepts various parameters that inform it of which tags you'd like to clean up or keep. For instance, the following command will delete all the tags on the image we've been working with:</p><pre><code>&gt; deckschrubber -latest 0 -tag ".*" -registry "https://my-docker-registry" -repo "^busybox$"
INFO[0000] Successfully fetched repositories.            count=1 entries="[busybox]"
INFO[0000] Marking tag as outdated                       fields.time="2023-05-18 22:34:17 +0000 UTC" repo=busybox tag=glibc1
INFO[0000] Marking tag as outdated                       fields.time="2023-05-18 22:34:17 +0000 UTC" repo=busybox tag=glibc2
INFO[0000] Marking tag as outdated                       fields.time="2023-05-18 22:34:17 +0000 UTC" repo=busybox tag=musl
INFO[0000] All tags for this image digest marked for deletion  repo=busybox tag=glibc1
INFO[0000] Deleting image (-dry=false)                   digest="sha256:db16cd196b8a37ba5f08414e6f6e71003d76665a5eac160cb75ad3759d8b3e29" fields.time="2023-05-18 22:34:17 +0000 UTC" repo=busybox tag=glibc1
INFO[0000] All tags for this image digest marked for deletion  repo=busybox tag=musl
INFO[0000] Deleting image (-dry=false)                   digest="sha256:45561defaa53c6364b822f1782dae76b2a38c375a28b6a89b814c152eb6e2f6e" fields.time="2023-05-18 22:34:17 +0000 UTC" repo=busybox tag=musl

&gt; regctl tag ls my-docker-registry/busybox
</code></pre><p>More relevant, though, is how Deckschrubber behaves if we tell it to only delete the <code>glibc1</code> tag:</p><pre><code>&gt; deckschrubber -latest 0 -tag "^glibc1$" -registry "https://my-docker-registry" -repo "^busybox$"
INFO[0000] Successfully fetched repositories.            count=1 entries="[busybox]"
INFO[0000] Marking tag as outdated                       fields.time="2023-05-18 22:34:17 +0000 UTC" repo=busybox tag=glibc1
INFO[0000] Ignore non matching tag (-tag=^glibc1$)         repo=busybox tag=glibc2
INFO[0000] Ignore non matching tag (-tag=^glibc1$)         repo=busybox tag=musl
INFO[0000] The underlying image is also used by non-deletable tags - skipping deletion  alsoUsedByTags=glibc2 repo=busybox tag=glibc1
</code></pre><p>It doesn't delete any tags! Because Deckschrubber implicitly pulls down all image tags to compare against the specified regex, it can also inspect those tags to see what digests they reference. If there are any shared digests, then none of the tags in question are deleted. I found this approach to be conservative and straightforward; it doesn't delete images unexpectedly, nor does it require any tricky tag manipulation. That combined with the existing cleanup-oriented functionality of Deckschrubber made it a natural pick for whipping up a quick cleanup script for our Container Registry.</p><h3 id="conclusion">Conclusion</h3><p>It turns out things aren't always as simple as they seem (and if you've actually seen the underlying Container Registry API you would probably have known from the beginning that this wouldn't be so simple). And heck, I didn't even discuss manifests, manifest lists, blobs, layers, or any of the other artifacts you can find buried within a Container Registry. I hope you learned something from the post, nonetheless. Maybe you found a new tool to try out, or maybe you've resolved to leave the Container Registry maintenance to the experts.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/how-hard-is-it-to-delete-a-docker-tag/">How Hard is it to Delete a Docker Tag?</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Blast from the past: FlightAware’s Terrestrial ADS-B Network ]]></title>
        <description><![CDATA[ An updated inside-look into our worldwide network of ADS-B ground stations. ]]></description>
        <link>https://flightaware.engineering/blast-from-the-past-flightawares-terrestrial-ads-b-network/</link>
        <guid>https://flightaware.engineering/blast-from-the-past-flightawares-terrestrial-ads-b-network/</guid>
        <pubDate>Mon, 04 Mar 2024 13:30:05 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/03/shahzin-shajid-aSiTu8TEV_w-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Eric Tran is an Engineering Manager at FlightAware. He leads a team within FlightAware’s Operations and Reliability group that is responsible for building and maintaining our global network of ADS-B receivers.</em></p><blockquote><em>This blog was posted back in 2021. Since then, we've added many new receivers and have seen increased interest in FlightAware's ADS-B Network. With this in mind, we updated some aspects and wanted to reshare the content. In addition, we have new posts in the works that go deeper into the technical aspects of ADS-B engineering at FlightAware, so stay tuned!  - Eric T.</em></blockquote><h3 id="how-it-started"><strong>How It Started</strong></h3><p>In the early 2000s, a new aircraft surveillance technology emerged, known as&nbsp;<a href="https://en.wikipedia.org/wiki/Automatic_Dependent_Surveillance%E2%80%93Broadcast?ref=flightaware.engineering">Automatic Dependent Surveillance–Broadcast</a>, or ADS–B. This technology has since come to complement and, in many airspaces, replace radar as the primary surveillance method for tracking aircraft. Aircraft with ADS-B transponders obtain their location from GPS and broadcast it on the 1090 MHz and 978 MHz frequencies, allowing air traffic control (ATC) to receive this data and use the information to manage nearby airspace.</p><p>The original founders of FlightAware took advantage of this new opportunity and created an ADS-B receiver that could receive and process this raw flight data and forward it to FlightAware over the Internet. 
As the technology has developed, we’ve engineered more cost-effective, performant receiver solutions that have allowed us to scale our network to over 35,000 nodes across the world, all of which are hosted by a community of dedicated volunteers.</p><h2 id="what%E2%80%99s-in-our-ads-b-receivers"><strong>What’s in our ADS-B Receivers?</strong></h2><h3 id="the-hardware"><strong>The Hardware</strong></h3><p>The core components are a Raspberry Pi computer, a USB RTL2832U Software Defined Radio (SDR), and a 1090 MHz antenna. We developed our very own line of SDRs, which we call the Pro-Stick and Pro-Stick Plus. While both models have built-in RF amplifiers to maximize ADS-B performance, the Pro-Stick Plus also has a built-in 1090 MHz filter to reduce noise in high-RF environments.</p><p>FlightAware provides two ADS-B hardware solutions. The first is an open-source solution, called PiAware, that anyone can build themselves by gathering the hardware and loading FlightAware’s ADS-B decoding software onto it. The other solution is called FlightFeeder, which is manufactured by FlightAware. 
The FlightFeeder is self-configuring and remotely managed by FlightAware, allowing us to provide software updates and troubleshooting support.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.51.38-AM.png" width="610" height="442" loading="lazy" alt="" srcset="https://flightaware.engineering/content/images/size/w600/2024/03/Screenshot-2024-03-04-at-9.51.38-AM.png 600w, https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.51.38-AM.png 610w"></div><div class="kg-gallery-image"><img src="https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.50.26-AM.png" width="610" height="408" loading="lazy" alt="" srcset="https://flightaware.engineering/content/images/size/w600/2024/03/Screenshot-2024-03-04-at-9.50.26-AM.png 600w, https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.50.26-AM.png 610w"></div></div></div><figcaption><p><span style="white-space: pre-wrap;">Left - PiAware Right - FlightFeeder</span></p></figcaption></figure><h3 id="the-software"><strong>The Software</strong></h3><p>At a high level, there are four software components that work together to provide the main functionality of our ADS-B receivers. These components are responsible for decoding and processing aircraft signals, transmitting the tracking data to FlightAware, and providing a map interface to display the aircraft the receiver is picking up in real-time. 
All the software is open source and can be found on&nbsp;<a href="https://github.com/flightaware?ref=flightaware.engineering"><u>FlightAware's GitHub.</u></a></p><p><strong>dump1090-fa</strong>&nbsp;is the program that demodulates and decodes aircraft transponder messages received from a connected RTL2832-based Software Defined Radio. To consume this data, clients can connect to specific network ports to stream the decoded messages in a variety of formats. Examples of the data formats and their respective ports:</p><ul><li>TCP port 30002 for raw/unparsed messages in AVR format</li><li>TCP port 30003 for parsed messages in BaseStation format</li><li>TCP port 30005 for raw/unparsed messages in Beast binary format</li></ul><p>The information that can be derived directly from these aircraft messages includes position, altitude, squawk code, aircraft identification, airborne vs. ground status, speed, heading, roll, and more. With additional server-side processing, we can use this data to derive other information like weather.</p><p><strong>piaware</strong>&nbsp;is the program responsible for formatting and relaying aircraft data to FlightAware servers. It starts up at boot time and connects to localhost:30005 to consume the Beast-formatted data from dump1090-fa. It then establishes an encrypted TLS connection with FlightAware servers to transfer that flight data and sends regular heartbeats, providing system and health information such as CPU load, temperature, uptime, etc. This information allows us to monitor our network health and notify hosts about any issues with their receiver.<br><br><strong>SkyAware</strong>&nbsp;is a web-based JavaScript application that is bundled with dump1090-fa. This application reads aircraft JSON data produced by dump1090-fa, plots the data on a map interface, and provides the user a view of the aircraft that their receiver is picking up. 
The interface provides detailed information about the aircraft and can be customized and filtered based on what the user is interested in tracking.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.53.31-AM.png" class="kg-image" alt="" loading="lazy" width="614" height="313" srcset="https://flightaware.engineering/content/images/size/w600/2024/03/Screenshot-2024-03-04-at-9.53.31-AM.png 600w, https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.53.31-AM.png 614w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">SkyAware Web Interface</em></i></figcaption></figure><p>For easy installation, we’ve bundled all the necessary software into a pre-built Raspberry Pi OS Lite image that users can load onto a micro-SD card for their Raspberry Pi. FlightAware maintains our own apt repository, making it easy for our users to download and install our software using Debian’s Advanced Package Tool (APT).<strong>&nbsp;</strong></p><h3 id="multilateration-mlat"><strong>Multilateration (MLAT)</strong></h3><p>Not all aircraft are ADS-B equipped and, as a result, cannot broadcast their location. However, through the use of multilateration, FlightAware can derive an aircraft’s location using the 1090 MHz Mode-S transponder signals being emitted from the aircraft. By using the known location of 4 or more ADS-B receivers on the ground, we can calculate the distance an aircraft is away from each of the receivers. We achieve this by using the time it takes for the Mode-S signal to propagate from the aircraft to the receiver and the propagation speed of the signal. 
With those distances, we can derive the location and track these aircraft that are not trackable via ADS-B.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.54.48-AM.png" class="kg-image" alt="" loading="lazy" width="540" height="418"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Visual diagram of MLAT</em></i></figcaption></figure><p><a href="https://github.com/mutability/mlat-client?ref=flightaware.engineering"><strong>fa-mlat-client</strong></a><strong>&nbsp;</strong>is the program within piaware that selectively forwards Mode-S messages to dedicated MLAT servers at FlightAware to perform MLAT calculations. As an incentive for hosting a receiver, we return the MLAT results back to the receiver, which can be displayed on SkyAware and streamed on the following TCP ports:</p><ul><li>TCP port 30105 for multilateration results in Beast binary format</li><li>TCP port 30106 for multilateration results in extended BaseStation format</li></ul><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.55.46-AM.png" class="kg-image" alt="" loading="lazy" width="638" height="449" srcset="https://flightaware.engineering/content/images/size/w600/2024/03/Screenshot-2024-03-04-at-9.55.46-AM.png 600w, https://flightaware.engineering/content/images/2024/03/Screenshot-2024-03-04-at-9.55.46-AM.png 638w"></figure><p>We hope this gave you a better understanding of how FlightAware’s real-time flight data is sourced and generated from our terrestrial ADS-B network. The ADS-B team is working on modernizing our technology stack and building out new hardware and software to improve our network and provide our users new and interesting ways to interact with flight data. Stay tuned for future blog posts from the team!</p> 
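<p>As a postscript, the MLAT geometry described above can be made concrete with a toy calculation. This is only a 2-D sketch with invented numbers, not fa-mlat-client's algorithm (real multilateration works in three dimensions with time <em>differences</em> between synchronized receivers); it shows how, once each receiver's range to the aircraft is known, subtracting the range-circle equations pairwise leaves a small linear system for the position.</p>

```go
package main

import "fmt"

// receiver holds a known ground position and the measured range to the
// aircraft (propagation time multiplied by the signal speed).
type receiver struct {
	x, y, dist float64
}

// trilaterate solves for the 2-D position consistent with three range
// measurements. Subtracting (x-xi)^2 + (y-yi)^2 = di^2 pairwise cancels
// the quadratic terms, leaving a 2x2 linear system A·[x y]ᵀ = b.
func trilaterate(r1, r2, r3 receiver) (x, y float64) {
	a11 := 2 * (r2.x - r1.x)
	a12 := 2 * (r2.y - r1.y)
	b1 := r1.dist*r1.dist - r2.dist*r2.dist + r2.x*r2.x - r1.x*r1.x + r2.y*r2.y - r1.y*r1.y

	a21 := 2 * (r3.x - r1.x)
	a22 := 2 * (r3.y - r1.y)
	b2 := r1.dist*r1.dist - r3.dist*r3.dist + r3.x*r3.x - r1.x*r1.x + r3.y*r3.y - r1.y*r1.y

	// Cramer's rule on the 2x2 system.
	det := a11*a22 - a12*a21
	x = (b1*a22 - b2*a12) / det
	y = (a11*b2 - a21*b1) / det
	return x, y
}

func main() {
	// Invented scenario: aircraft actually at (3, 4); each receiver's
	// range was "measured" from that position.
	x, y := trilaterate(
		receiver{0, 0, 5},          // √(3²+4²)
		receiver{10, 0, 8.0622577}, // ≈ √65
		receiver{0, 10, 6.7082039}, // ≈ √45
	)
	fmt.Printf("estimated position: (%.2f, %.2f)\n", x, y)
}
```

<p>Running the sketch recovers the position (3, 4) to within rounding error. With only three receivers the 2-D solve is exact; the fourth receiver mentioned above is what lets the real system solve in 3-D and absorb timing noise.</p>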
        <br>
        <p>
            <a href="https://flightaware.engineering/blast-from-the-past-flightawares-terrestrial-ads-b-network/">Blast from the past: FlightAware’s Terrestrial ADS-B Network</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Part I: Mocking database logic for unit tests in Go at FlightAware ]]></title>
        <description><![CDATA[ Go is one of FlightAware’s four core programming languages. Since FlightAware needs a performant solution for delivering massive amounts of data, Go is a great choice. This blog will focus on a mocking technique for effective unit testing in Go. ]]></description>
        <link>https://flightaware.engineering/mocking-database-logic-for-unit-tests-in-go-at-flightaware/</link>
        <guid>https://flightaware.engineering/mocking-database-logic-for-unit-tests-in-go-at-flightaware/</guid>
        <pubDate>Mon, 19 Feb 2024 12:58:11 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/02/mikhail-fesenko-p6YWrjhmjhM-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>Lee Obuli is a software engineer on the AeroAPI team who specializes in driving testing efforts with Go at FlightAware.</em></p><p>Go is one of <a href="https://flightaware.engineering/the-best-tool-for-the-job-assessing-languages-for-microservice-development-at-flightaware/">FlightAware’s four core programming languages</a>. It stands out because of its relatively easy learning curve and great performance. Since FlightAware needs a performant solution for delivering massive amounts of data, Go is a great choice. Speed is not the only requirement for our code; it should also behave correctly and reliably. To deliver high-quality code at FlightAware, one essential technique we use is unit testing. If you are interested in learning more about the principles of unit testing, <a href="https://codefresh.io/learn/unit-testing/?ref=flightaware.engineering">this Codefresh article</a> is a great resource. This blog will focus on a mocking technique for effective unit testing in Go.&nbsp;</p><p>FlightAware relies on PostgreSQL databases to deliver the data that we use on the website and in our APIs. Testing the interactions between Go code and a database introduces an external dependency. This makes it challenging to test the code’s behavior without a backing database. Mocking database interactions allows developers to create simulated data objects that mimic the behavior of real databases. This provides control of data inputs and outputs to test the code, without having to instantiate an actual database.</p><h2 id="example-program">Example program</h2><p>Here is an example of Go code that executes a simple query to get flight data by passing in a flight ID.
The function should be tested to ensure its correctness.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-8.06.12-AM.png" class="kg-image" alt="" loading="lazy" width="646" height="614" srcset="https://flightaware.engineering/content/images/size/w600/2025/04/Screenshot-2025-04-30-at-8.06.12-AM.png 600w, https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-8.06.12-AM.png 646w"></figure><p>The code above starts with defining a Flight type, which represents the object returned by the database. Next, it defines a PostgreSQL client type (<code>PostgresSQLClient</code>) which contains a connection field represented by an <a href="https://go.dev/doc/effective_go?ref=flightaware.engineering#interfaces_and_types">interface</a>. This interface must implement the methods that will be used later. An interface is like a blueprint that defines a set of method signatures. It outlines what methods a type must have, without providing the actual implementations. Interfaces enable the use of dependency injection since there may be multiple implementations for the same interface. After the example Go code creates a PostgreSQL client, it calls the method <code>GetFlightByID</code>. For those wondering, <code>c *PostgresSQLClient</code> is the receiver of the method, indicating that the <code>GetFlightByID</code> method is associated with an instance of the <code>PostgresSQLClient</code>.</p><p>The example uses Go's type methods to implement the actual work of the application — to fetch a flight by its ID. As a method, it can use the private <code>conn</code> instance, which can be either a real PostgreSQL connection or a mock one. This technique works great for dependency injection to easily switch the implementation of dependencies. When testing, a mock that implements the same interface as a real database controls the interactions and the test verifies the results. 
With this decoupling, the actual database can be swapped out (e.g., using <code>pgx</code> in production and <code>pgxmock</code> in tests) without affecting the rest of the codebase.</p><h2 id="writing-a-simple-unit-test">Writing a simple unit test</h2><p>The next section of code will test the example. I prefer to use <a href="https://github.com/pashagolub/pgxmock?ref=flightaware.engineering">pgxmock</a> since it provides a convenient API for setting up and verifying results on database queries. Building the logic to manually mock every database call is cumbersome, error-prone, and time-consuming. While manual mocking may be feasible for simple cases or small projects, a dedicated mocking library becomes worthwhile as a project grows in complexity and the same dependencies must be mocked repeatedly. This helps with writing more maintainable, reliable, and expressive tests for database interactions in an application.</p><p>Here is the test for the <code>GetFlightByID</code> function above:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-8.06.25-AM.png" class="kg-image" alt="" loading="lazy" width="646" height="572" srcset="https://flightaware.engineering/content/images/size/w600/2025/04/Screenshot-2025-04-30-at-8.06.25-AM.png 600w, https://flightaware.engineering/content/images/2025/04/Screenshot-2025-04-30-at-8.06.25-AM.png 646w"></figure><p>Notice the test verifies a small function that fetches a flight. It ensures that the result values are correctly returned and that all database expectations are met (for example, there are no additional queries that the test didn’t account for).</p><h2 id="explanation-of-the-pgxmock-library">Explanation of the pgxmock library</h2><p>Let’s look at how <code>pgxmock</code> helps with mocking the database calls:</p><ul><li> <code>NewRows</code> creates mocked SQL query rows from a Go string slice.
A slice in Go can be considered an array that can increase or decrease in size. Notice in the example columns are passed in by string slice into the <code>NewRows</code> function. </li><li><code>AddRow</code> adds rows to the result set of the expected query. The mock data to return should match the number of columns defined in <code>NewRows</code>.</li><li><code>ExpectQuery</code> defines the expected SQL query by using query string matching.</li><li><code>WithArgs</code> will match given expected args to actual database query arguments and if at least one argument does not match, it will return an error. </li><li><code>WillReturnRows</code> specifies the set of resulting rows that the triggered query will return.</li><li>Then, after executing the function that uses the mock database, <code>ExpectationsWereMet</code> ensures that only the queries expected by the mock were executed during the test. </li></ul><h2 id="conclusion">Conclusion</h2><p>Mocking downstream functions saves developers time by avoiding the complexity of managing the lifecycle of the mocked objects. The developer can focus on what matters most, which is correctly functioning code.  Writing unit tests in Go is not only a best practice, but a fundamental aspect of building reliable and maintainable software. There are several Go mocking packages such as <a href="https://github.com/stretchr/testify?tab=readme-ov-file&ref=flightaware.engineering#mock-package">testify</a>, <a href="https://github.com/uber-go/mock?ref=flightaware.engineering">gomock</a>, and <a href="https://github.com/vektra/mockery?ref=flightaware.engineering">mockery</a>. A future blog will focus on utilizing mockery to generate the mocks for the code. These tests (including the example in this blog post) already use the testify package, so incorporating the mockery package will be more straightforward. Stay tuned!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/mocking-database-logic-for-unit-tests-in-go-at-flightaware/">Part I: Mocking database logic for unit tests in Go at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Managing a Technical Transformation (Part 1) ]]></title>
<description><![CDATA[ This post is about FlightAware’s shift away from TCL and the journey we took to get there. ]]></description>
        <link>https://flightaware.engineering/managing-a-technical-transformation-part-1/</link>
        <guid>https://flightaware.engineering/managing-a-technical-transformation-part-1/</guid>
        <pubDate>Fri, 12 Jan 2024 15:00:17 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2024/01/karl-pawlowicz-QUHuwyNgSA0-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>Jonathan Cone is Vice President of Engineering at FlightAware. He has been leading the organization through a technology transformation as it evolves both its products and technical stacks.</em></p><h2 id="where-did-we-start"><strong>Where Did We Start?</strong>&nbsp;</h2><p>When I joined FlightAware almost seven years ago, I knew from the interview process that they used a scripting language called TCL. Before my interviews, I’d never heard of TCL, but I investigated the language before I started, and it seemed straightforward. Sure enough, after joining I was able to become effective with TCL within a few short weeks, but it would be a couple of years before I might start to say I had reached an expert level. This journey was a common one for FlightAware, as most of the technical stack is written in TCL, and you are unlikely to hire engineers with experience in the language. So, this ramping up with the language was baked into the onboarding experience.&nbsp;</p><p>&nbsp;Now it wasn’t quite true to say that TCL was the only language in use when I joined FlightAware. There is of course the website, and while the backends were all written in TCL, the frontend was a mix of jQuery, JavaScript, CSS, and HTML. There was some early work going on with React, and that would be increasingly utilized over the next few years. We also used Java for some feed ingestion where a customer was providing a WebSphere MQ interface. So not everything was written in TCL, but around 90% (not a fully accurate number, but illustrative of the amount of TCL code).&nbsp;&nbsp;</p><p>If you are wondering why the company had built its stack on top of TCL, the answer is straightforward. TCL was the language that the founders knew best and could utilize to quickly develop a flight tracking application back in 2005 when they were launching the site. And it turns out you can do quite a bit with TCL.
It’s an extensible language, and the company had been able to create libraries where needed while also supporting the wider community. In fact, if you go and look at FlightAware’s public <a href="https://github.com/flightaware?ref=flightaware.engineering" rel="noreferrer noopener"><u>GitHub</u></a> account you will see that most of the repositories listed there are various TCL extensions and libraries. And the language can be very performant (at least for a scripting language), allowing FlightAware to process its global data feed in real time using a variety of TCL programs within a data processing pipeline.&nbsp;&nbsp;</p><h2 id="the-transformation-starts"><strong>The Transformation Starts</strong>&nbsp;&nbsp;</h2><p>But this blog post isn’t about TCL and how everyone should be using it, but rather it is about FlightAware’s shift away from TCL. So first let me say, I have no bones to pick with TCL. I have used it for years, and it is still a language I’ll utilize for certain tasks (especially small analyses where I know all the library calls off the top of my head). But, as the volume of data we are processing has increased, and our need for greater performance, scalability, and tooling has expanded over these past seven years, we have reached the limits of TCL’s capabilities. Because of that ceiling, we have increasingly found ourselves utilizing new languages to solve technical problems at FlightAware. One of my first blog posts was about our shift to C++ for Firehose (FlightAware’s streaming API) because of performance requirements for that service. This was back in 2018, and did not mark any seismic shifts at FlightAware, but was an early indication of things to come.&nbsp;</p><p>&nbsp;Within a year or two of that Firehose rewrite, we would see the inclusion of Python as an officially supported language.
One of our Engineering directors at the time lobbied for that change following the development of our Machine Learning (ML) ETA models. This was a case where we did not really consider TCL for that application, as the ML community had already built out the tooling for those applications in Python (and other languages), so the investment in TCL to have it support the same would be a huge sunk cost. That ML work demonstrated the value of using programming languages supported by a wider community. With Python, there is almost always at least one popular library for most common tasks and integrations. For example, there is more than one Kafka client library that we can use out of the box. When we first started using Kafka at FlightAware, we had to develop our own client library, wrapping the Kafka C client library with a <a href="https://github.com/flightaware/kafkatcl?ref=flightaware.engineering" rel="noreferrer noopener"><u>TCL extension</u></a>. That also meant we had to maintain that client library, updating it whenever a new version of the Kafka C client was released (in theory at least; I don’t know that we were quite that diligent). This is a core advantage that Python or other popular languages will have over TCL: they have a wide community of contributors using the language to solve problems and, therefore, the support libraries already exist for most of the integrations we need. There were other advantages to Python too: the language has linters, static code analysis tools, debuggers, and more. These are the things we dreamt of having for TCL, but the investment to add them was more than FlightAware alone could shoulder.
The inclusion of Python into FlightAware’s mix of languages was an exciting development, and we all took advantage of that to write new code in Python when possible (there is also the question of how we ensured that engineers were knowledgeable about Python, and we used <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/" rel="noreferrer noopener"><u>Alliances</u></a> to affect that knowledge transfer).&nbsp;</p><h2 id="accelerating-the-transformation">&nbsp;<strong>Accelerating the Transformation</strong>&nbsp;</h2><p>Up to this point, I do not believe the language shift at FlightAware was part of a deliberate effort to move away from TCL. The inclusion of new languages (C++, Python) had been driven by specific needs (performance, library support, etc.) and ground-up efforts. There are a few more instances of this. We utilized Haskell for some applications where parallel processing was a key feature and started experimenting with Rust as a replacement system level language for C++ with better memory safety. These experiments and bottom-up efforts were happening during the COVID period and were followed by some significant changes to the company itself (acquisition by Collins Aerospace in 2021).&nbsp;</p><p>Coming out of that period of change and uncertainty, we made our first high-level decision to shift a significant portion of our codebase away from TCL. We called this first effort, WebNXT, as the goal was to reimagine FlightAware’s web stack utilizing modern languages, frameworks and tooling. We focused first on the Web stack as that was an area where we were acutely feeling the pains of TCL. There had been significant improvements to web frontend and backend technology over the previous 10 years, so the gap between capabilities of our existing stack and a new modern stack was particularly painful. 
This was especially true for new hires who were accustomed to a better developer experience; the TCL stack was a real frustration.&nbsp;</p><h2 id="chart-a-path-forward"><strong>Chart a Path Forward</strong>&nbsp;</h2><p>So, in the 2021 company Town Hall our CTO laid down that marker, that we would embrace a technical transformation away from our existing web stack and to something new. But what would this new stack look like? To be fully transparent, we didn’t really know yet. We knew all the pain points with the existing stack, and we knew there were options out there which could improve life in several ways, but there wasn’t just one answer to that question, and we did not have a clear solution in mind yet. We did know that we were going to use React for our front-end applications, but that was only part of the answer. There are lots of additional tools and libraries you can use to flesh out your front-end stack, and that remained an open question. We thought we wanted to switch to using Typescript since React supports it, so that was fairly certain. We were especially uncertain regarding our data backends. Did we want to use node for our backends? Probably not for everything, but maybe in some places. What would we use for our core flight data backends? The list of questions went on and on.&nbsp;</p><p>At this point I was just a spectator to this process, but little did I know it would become a focus of my life in a few short months. With the organizational changes happening following FlightAware’s acquisition, I moved into one of the company’s Director of Engineering positions and took on the web team as part of my responsibilities. Prior to that transition though, the team had decided that it would be best to test out a new web stack outside of the main FlightAware web stack as a starting point. 
Conveniently enough, we had a project we were working on at that time that would need to have an independent technical stack, so we used that as a testing ground.&nbsp;</p><p>Some issues were settled early in that experimental project. We would use Typescript for our React work. We like the strict typing and the way that helps us prevent bugs at runtime. We would come to find that that is one gripe we have about Python, the lack of strict typing (Yes, I know you can accomplish it in Python. Most of our code is not annotated properly to achieve that end and it does require a fair amount of overhead to really achieve full type coverage). We also settled on AWS as our cloud provider and had our first introduction to running web services in cloud environments.&nbsp;</p><p>Not everything we tried panned out, and most of our web team had been with FlightAware for the past 7, 8 or 9 years, so we did not have huge exposure to the latest developments in front-end technology. This meant we were spending a fair amount of time experimenting and having to reverse course. What we really needed were people who had that experience already.&nbsp;</p><p>So, we hired to fill those knowledge gaps. We had existing requisitions for two managers within the group, and we utilized those requisitions to bring in people with experience transitioning from legacy technology stacks and building modern web applications. This meant updating our job postings to include new nice-to-haves like experience with cloud computing (AWS/GCP/Azure), Infrastructure as Code, modern JS frameworks, graphql, etc. Luckily, we were able to find people with those backgrounds.&nbsp;</p><p>Just because you have hired people with the experience your team needs, though, doesn’t automatically mean they are going to be successful. They will need some time to build relationships and trust within the organization, and they will need your backing to propose this new direction for your engineering stack. 
We did not have tremendous time to accomplish that, so we put our new managers in positions to demonstrate their capabilities quickly. At FlightAware, we expect our managers to also be individual contributors for some portion of their time, and our new managers were able to demonstrate their coding chops and the benefits of their proposed approaches early by jumping in the fray. This meant they built the credibility needed when we began proposing what the FlightAware web stack would look like in the future.&nbsp;</p><p>Now that we had the talent in place and the relationships built to put some real definitions around WebNXT, what did we determine? We (I’m using we very loosely here, I reviewed, gave blessings and asked questions during this process) proposed the following high-level vision for WebNXT:&nbsp;</p><ul><li>We would build our web applications following cloud native principles where we use containers, service meshes, microservices, immutable infrastructure, declarative APIs, continuous delivery, and automation.&nbsp;</li><li>Web applications would be <strong>standalone, isolated services</strong> with separated frontends and backends.&nbsp;</li><li>All new web applications and any major changes to existing applications will be done using WebNext methodology. Only small changes or exceptions (where it is universally agreed) will use TCL.&nbsp;</li></ul><p>We also defined more specifics around what the architecture would look like, but that last point above was the crux of the vision and the most radical part. We would move on from the existing TCL stack and start fresh. That is a consequential decision given the investment in our existing codebase and the effort required to move functionality into a new paradigm.&nbsp;</p><p>We did not come to that decision lightly, and we spent a fair amount of time investigating and building tooling for embedding TCL interpreters so we could re-use code. 
However, when we decided on a cloud native approach, this really meant building applications that did one thing and did that one thing well. Unfortunately, that was not the approach we had used for much of our TCL development over the years, so we were going to have to peel apart the onion. I think of this as the “rip the band aid off” approach, and it certainly carries certain risks, but in our case those risks associated with rewriting functionality were less than those of continuing within the previous paradigm.&nbsp;&nbsp;</p><h2 id="next-steps"><strong>Next Steps</strong>&nbsp;&nbsp;</h2><p>With this high-level plan in place, we went about building buy-in and executing on our plan. There have been several successes and challenges along the way, and I’ll cover that in a subsequent post. We are now about halfway through the transformation, and on the cusp of delivering FlightAware’s website based on the new stack and methodology. That will be an exciting achievement, and I look forward to sharing more details on that process soon.&nbsp;</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/managing-a-technical-transformation-part-1/">Managing a Technical Transformation (Part 1)</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Revisiting the Future Leaders of FlightAware ]]></title>
        <description><![CDATA[ Since I wrote the two posts concerned with growing leaders in the FlightAware organization, a number of things have changed, two of which are that FlightAware fully embraced remote work and was also acquired by Collins Aerospace. These things had a couple effects on the course. ]]></description>
        <link>https://flightaware.engineering/revisiting-the-future-leaders-of-flightaware/</link>
        <guid>https://flightaware.engineering/revisiting-the-future-leaders-of-flightaware/</guid>
        <pubDate>Mon, 04 Dec 2023 10:47:42 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/12/nick-fewings-EkyuhD7uwSM-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Chadd Mikulin is the Vice President of Engineering at FlightAware. For over fifteen years, he has helped grow and promote leaders in the organizations of which he’s been part.</em></p><p>Since I wrote the two posts (<a href="https://flightaware.engineering/the-future-leaders-of-flightaware-part-1-structuring-the-managers-pathled/" rel="noreferrer">here</a> and <a href="https://flightaware.engineering/the-future-leaders-of-flightaware-part-2/" rel="noreferrer">here</a>) concerned with growing leaders in the FlightAware organization, a number of things have changed, two of which are that FlightAware fully embraced remote work and was also acquired by Collins Aerospace.</p><p>These things had a couple effects on the course. First, it made it even more crucial that new managers to the organization, or those wanting to pursue management, are taught the skills that it takes to be effective leaders. One can no longer count on leadership through osmosis since the workforce is remote and there aren’t opportunities to be around your people all the time. </p><p>Second, it’s given us an opportunity to spread the course out across the larger organization. We’ve done this by including Collins employees in the program and training new individuals on how to teach the course. Both of these steps have ensured that the quality of management in the organization remains high, and makes it faster to have the practices adopted across Collins.</p><h2 id="survey-says">Survey says?</h2><p>The course has evolved some over the three years since we last wrote about it. We’ve trimmed the content from the source material down a bit, changed the order of some things, and added some self-generated content that addresses some common questions that we were getting from participants.</p><p>How did we know what to change? We asked. At the conclusion of each iteration of the course, we send out an anonymous survey to the participants. 
We ask about what they liked (and what they didn’t) about the source material and the format of the course. We also ask if there was anything they wish we’d covered or anything they think should be removed.</p><p>The input given from each survey helps to inform and refine the next iteration of the course. Just like in our software, we’re always iterating to get to a better product.</p><h2 id="changes-to-the-course">Changes to the Course</h2><p>I said the course has evolved over the past three years, so what actually changed? First, we cut it from ten sessions to eight and made it weekly. This made it a little easier to schedule and isn’t as big a commitment of time. It doesn’t sound like that big a change, but it was meaningful.</p><p>Second, to make up for the shortened schedule, we combined some of the video lessons and reduced the material. We now ask the participants to watch lessons one and three of the Effective Manager Video course in the first session. This gives them a good introduction to the material and the timeline on how they should implement the system in one session instead of two. We also removed chapters one and three of <em>The Effective Executive</em> from the reading list. These are still worth reading, and we encourage everyone to finish the book on their own time, but something had to go to reduce the course by another session. The chapters that we did keep in, time management, hiring, and figuring out how you can have the greatest impact on the organization, are important concepts, and were often brought up as the most important from the folks taking the survey.</p><p>Lastly, we added some content that we generated ourselves on topics that were frequently asked for: career paths, succession planning, and annual reviews. We tried to keep this material more general and less like a how-to. 
We’re interested in the theory and concepts; after all, they can read the specifics of how to do the thing in our process documentation repository.</p><p>Overall, the changes made to the course were small but meaningful.</p><h2 id="train-the-trainer">Train the Trainer</h2><p>As I mentioned above, we’ve had an opportunity to spread the program across the larger organization. One way we’ve accomplished this is by training others on how to conduct the course. This, too, is an eight-week endeavor that mirrors the lessons outlined before. We’ve created lesson plans that summarize the material and provide discussion prompts, questions to gauge understanding, and exercises that the instructor can assign to the participants to help with practice.</p><p>Teaching someone to teach is a new concept for me, so I’ve relied, again, on a post-course survey to refine the approach and material. We’re about to graduate our second set of new trainers, and soon we’ll have six individuals who can teach the course. That will allow us to teach about 65 people next year, and we’re continuing to add more trainers.</p><p>It’s been a great way to introduce our methodology into the larger organization, and it has been really well received. There’s a great opportunity to influence the future of the organization because having good leaders is crucial to retaining and growing the workforce.</p><h2 id="what-next">What next?</h2><p>Keep on keepin' on. We’re going to continue to refine the course, based on feedback, and continue to train more trainers. We want to be able to spread the methodology as far and wide as possible.</p><p>I’m also working on the next course, which will be for managers looking to grow their managerial skills further. I’m thinking of it as more of a 300/400-level course where the current one is a 100/200-level. It will cover topics that are beyond the everyday, looking to more strategic work that a manager can do to make their teams/groups/departments more successful. 
Stay tuned.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/revisiting-the-future-leaders-of-flightaware/">Revisiting the Future Leaders of FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Everything you wanted to know about SSL/TLS/PKI  (But were too busy to ask) ]]></title>
        <description><![CDATA[ Encryption is all around us. But how do these technologies work? Keep reading my dear reader, and you will uncover the deep mysteries and arcane knowledge of critical components of our online (and offline!) internet infrastructure. ]]></description>
        <link>https://flightaware.engineering/everything-you-wanted-to-know-about-ssl-tls-pki-but-were-too-busy-to-ask/</link>
        <guid>https://flightaware.engineering/everything-you-wanted-to-know-about-ssl-tls-pki-but-were-too-busy-to-ask/</guid>
        <pubDate>Mon, 06 Nov 2023 10:22:43 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/10/towfiqu-barbhuiya-FnA5pAzqhMM-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>Encryption is all around us.&nbsp; From the websites we visit to the emails we send; from important business and governmental contracts to automated API interactions, SSL/TLS and PKI are there to ensure we can conduct our business safely and securely.&nbsp; Or so we are told.&nbsp; But how do these inscrutable yet omnipresent technologies work?&nbsp; What benefits do they actually provide?&nbsp; And what are some of the risks and pitfalls of their implementation?&nbsp; Most importantly, what is the difference between SSL and TLS, really?&nbsp; Keep reading, my dear reader, and you will uncover the deep mysteries and arcane knowledge of these critical components of our online (and offline!) internet infrastructure.&nbsp;</p><h2 id="ssl-vs-tls">SSL vs TLS</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-b27b71f6-4d63-4f9f-b012-c2b8c5ee70a5.png" class="kg-image" alt="Standards" loading="lazy" width="500" height="283"></figure><p>The story of SSL, which stands for Secure Sockets Layer, begins back in the internet dark ages of 1995, when it was first developed at Netscape.&nbsp; SSL was developed as a protocol to establish the identity of a remote party (namely a webserver) and come to an agreement on the cryptographic parameters for establishing an encrypted communication channel.&nbsp; The first version of SSL was never formally published due to significant security weaknesses identified early on, but SSL 2.0 was formally released in early 1995.&nbsp; It, too, was quickly discovered to have serious security flaws, and in 1996, SSL 3.0 was <a href="https://web.archive.org/web/19970616193634/http:/home.netscape.com/eng/ssl3/index.html" rel="noreferrer noopener"><u>published by Netscape</u></a>.&nbsp; At the time, Netscape was the leading web browser, but faced serious and significant competition from Microsoft with its Internet Explorer.&nbsp; Microsoft had started 
working on its <a href="https://en.wikipedia.org/wiki/Private_Communications_Technology?ref=flightaware.engineering" rel="noreferrer noopener"><u>own version of SSL called PCT</u></a>, and in an apparent attempt to head off a “now we have 15 standards” situation, the IETF agreed to take ownership of the protocol from Netscape.&nbsp; As part of that agreement, they agreed to make some changes – including renaming it to TLS. Thus, in 1999 (the same year as the first Matrix movie, if you want to feel old – or young), TLS 1.0 was released as RFC 2246.&nbsp; The rest is, as they say, history.&nbsp;</p><h2 id="how-does-tls-work">How Does TLS Work?&nbsp;</h2><p>Excepting the STARTTLS situation, which we will get to later, the TLS protocol starts immediately after the TCP 3-way handshake has completed with the client sending a <em>ClientHello</em>.&nbsp; This message contains information about which protocol version(s) the client supports, what types of encryption parameters are supported, compression parameters, a random number, a session identifier (if an existing session is being resumed/restored), and a number of other possible extensions depending on the TLS version used and the client configuration.&nbsp; The server then responds with a <em>ServerHello</em> message that confirms the protocol version, compression, and the encryption parameters to be used, which are selected from the lists provided in the <em>ClientHello</em>.&nbsp; This message also includes a server random value and potentially one or more extensions depending on what was sent by the client.&nbsp; The server then sends a <em>ServerCertificate</em> message containing its certificate.&nbsp; Depending on the cryptographic protocols selected and the type of key used in the certificate, the server may also send a <em>ServerKeyExchange</em> message with additional information needed to establish a shared key for encrypting the data.&nbsp; The server then sends a <em>ServerHelloDone</em> message to indicate 
it has no more data to send for setting up the key exchange.&nbsp; If the client has a certificate to send for client-authentication, it will send a <em>ClientCertificate</em> message potentially followed by a <em>ClientKeyExchange</em> message.&nbsp; The client will then send a <em>ChangeCipherSpec</em> message telling the server that it has computed the shared encryption key and will start sending all future traffic encrypted using the parameters set in the handshake.&nbsp; The client will then send a <em>Finished</em> message, fully encrypted with a hash over all the previous data sent that the server then verifies.&nbsp; If that verification succeeds, the server will then send its own <em>ChangeCipherSpec</em> and <em>Finished</em> messages and then all future data between them will be encrypted.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-fc0fb262-68c2-4c46-aa0b-ca212b9efbe1.png" class="kg-image" alt="Diagram

Description automatically generated" loading="lazy" width="1280" height="1197" srcset="https://flightaware.engineering/content/images/size/w600/2023/10/data-src-image-fc0fb262-68c2-4c46-aa0b-ca212b9efbe1.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/10/data-src-image-fc0fb262-68c2-4c46-aa0b-ca212b9efbe1.png 1000w, https://flightaware.engineering/content/images/2023/10/data-src-image-fc0fb262-68c2-4c46-aa0b-ca212b9efbe1.png 1280w" sizes="(min-width: 720px) 720px"></figure><p>Nice and complicated, right?&nbsp; And that was without any mention of the numerous TLS extensions like SNI or ALPN.&nbsp; This complexity is, arguably, one of the reasons for a number of vulnerabilities which have mostly led to adding even _more_ complexity in order to mitigate.&nbsp;&nbsp;</p><p>Thankfully, the authors of the latest TLS specification, TLS 1.3, have taken the opposite approach.&nbsp; TLS1.3 reduces the complexity by enforcing only a small set of known secure ciphers and key exchange algorithms which allows the client and server to more easily pick one and setup all the necessary parameters without a bunch of coordination.&nbsp; Notice in the diagram above how there are two round trips (the 4 green arrows)?&nbsp; With TLS1.3, the second round trip is eliminated as the client sends the <em>ClientKeyExchange</em> information in its <em>ClientHello</em> message.&nbsp;</p><p>Despite everything though, the practicalities of reality invariably intrude on the idealized state imagined by protocol designers.&nbsp; Shortly after TLS1.3 was initially released it was discovered that a number of middleware devices deployed in the wild did not like the changes in the protocol and would cause connections to fail.&nbsp; These middleware devices are generally corporate-owned network security devices that transparently intercept and decrypt the data so the underlying protocols can be inspected.&nbsp; Replacing an updating them would likely be infeasible and 
significantly delay the implementation.&nbsp; So the designers at the IETF went back and modified the TLS1.3 handshake to look more like TLS1.2 as you can see in this packet capture below:&nbsp;&nbsp;</p><p>The client is indicating that this is a TLS1.2 Hello packet, but an extension is used to indicate that it also supports TLS1.3.&nbsp; When the server replies, it does the same thing, though the list of versions in the extension has been reduced to one: the chosen version.&nbsp;&nbsp;&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-c8f6362b-8ce2-46e4-b755-f6dfa78197dc.png" class="kg-image" alt="Text

Description automatically generated" loading="lazy" width="949" height="522" srcset="https://flightaware.engineering/content/images/size/w600/2023/10/data-src-image-c8f6362b-8ce2-46e4-b755-f6dfa78197dc.png 600w, https://flightaware.engineering/content/images/2023/10/data-src-image-c8f6362b-8ce2-46e4-b755-f6dfa78197dc.png 949w" sizes="(min-width: 720px) 720px"></figure><h2 id="certificates">Certificates&nbsp;</h2><p>TLS uses two different types of encryption to provide security for a connection: asymmetric cryptography (also known as public key cryptography) and symmetric cryptography.&nbsp; Symmetric cryptography is very fast and suitable for streaming lots of encrypted data back and forth – but it requires both parties have a single, shared key.&nbsp; How do you safely communicate a shared key across the internet where anybody could be listening?&nbsp; Well, you use asymmetric cryptography, of course! With asymmetric cryptography the key is split into two parts: a public key that can be shared and a private key that must be kept secret from everyone.&nbsp; These two parts are linked by some mathematical magic (this is not the mathematics/cryptography blog entry you are looking for) such that something encrypted using the public key can only be decrypted by something with the private key and vice versa.&nbsp; This property is critical as it allows the client to send data needed for setting up a symmetric key to the server by encrypting that data with the server’s public key.&nbsp; The client knows only the server can decrypt that data, so everything is safe.&nbsp; But wait, how do you get the server’s public key?&nbsp; And how do you know you have the <em>right</em> public key for the server you are trying to talk to?&nbsp; That’s where certificates come in!&nbsp;</p><p>An important component of the TLS handshake is the server supplying its certificate.&nbsp; Certificates contain the server’s public key which is needed to securely generate the shared session key – 
the symmetric cryptography key. But a certificate contains much, much more than just a public key.&nbsp; It also supplies identity information about whose public key you’ve got.&nbsp; But hey, anybody can write “Steve” on a label and stick it to their shirt – how do you know that just because the certificate says it’s for <u>www.flightaware.com</u> that it really, truly is? This is where Certificate Authorities come in – third party entities whose job it is to validate the identity of a public key holder.&nbsp;&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-9b84ea0d-1b21-4a89-9538-6a7a35dc32b5.png" class="kg-image" alt="Graphical user interface, text, website

Description automatically generated" loading="lazy" width="1600" height="1018" srcset="https://flightaware.engineering/content/images/size/w600/2023/10/data-src-image-9b84ea0d-1b21-4a89-9538-6a7a35dc32b5.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/10/data-src-image-9b84ea0d-1b21-4a89-9538-6a7a35dc32b5.png 1000w, https://flightaware.engineering/content/images/2023/10/data-src-image-9b84ea0d-1b21-4a89-9538-6a7a35dc32b5.png 1600w" sizes="(min-width: 720px) 720px"></figure><p>We’ve all seen some version of this error before.&nbsp; For some reason, your browser has decided it doesn’t “trust” the certificate presented by the server.&nbsp; Why? Your browser (and operating system, and phone, and sometimes individual applications) all have a list of trusted certificates.&nbsp; Certificate Authorities spend a lot of money convincing the companies and organizations that manage these lists that they are following all the necessary security practices for managing a CA and properly validating the identity of those requesting a certificate.&nbsp; The <a href="https://cabforum.org/?ref=flightaware.engineering" rel="noreferrer noopener"><u>CA/Browser Forum</u></a> is a group of CAs and browser vendors that meet to set these standards.&nbsp;&nbsp;&nbsp;&nbsp;</p><p>That setup works fairly well to help ensure that public CAs continue to invest in maintaining the security standards and rigorous processes to maintain their trusted position in the ecosystem.&nbsp; But what if you just happen to operate a network of over 35,000 ADS-B receivers, that you want to maintain trusted connections to, but don’t need the trust assurances of a 3<sup>rd</sup> party?&nbsp; Or maybe you have internal systems that aren’t connected to the internet and are thus difficult to get verification from a 3<sup>rd</sup> party?&nbsp; In those cases, you may want to operate your own CA and build an internal public key infrastructure (PKI).&nbsp; From a technical perspective, this is 
not difficult to do at all – create a self-signed certificate with the “Certificate Authority” constraint set to “true”, put the cert in your clients’ trust stores, and then start signing client certificates.&nbsp; There are plenty of guides on the internet that can explain how to use a tool like OpenSSL to generate and sign certificates.&nbsp; Most of the work is in managing the policy and administrative functions:&nbsp; What information needs to be in the cert? Who will issue certs? How will the identity of the requestor be verified? How will you manage revoking certs with compromised/lost private keys?&nbsp;</p><h2 id="anatomy-of-a-certificate">Anatomy of a Certificate&nbsp;</h2><p>&nbsp;Though a certificate is conceptually simple, the X.509 standard that defines the format of a cert has been extended and expanded over the years to accommodate a diversity of use-cases and to address various security issues.&nbsp; Let’s take a closer look at the parts of a certificate, using our very own cert from <a href="https://www.flightaware.com/?ref=flightaware.engineering" rel="noreferrer"><u>www.flightaware.com</u></a> as an example.&nbsp; Normally, a certificate is stored in a binary format, the structure of which is defined by an ASN.1 specification and serialized per Distinguished Encoding Rules (DER).&nbsp; That’s generally not a very friendly format to work with, and so most commonly the binary data is converted to text with base64 encoding and is stored in Privacy-Enhanced Mail (PEM) format:&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-2a2e227a-5d1e-4716-bc29-9abfefcf4736.png" class="kg-image" alt="" loading="lazy" width="465" height="717"></figure><p>&nbsp;Even that is a bit inscrutable, so we will use a handy openssl command: <em>openssl x509 -in &lt;filename&gt; -noout -text</em> to convert that into a nice human-friendly format.&nbsp;</p><figure class="kg-card kg-image-card"><img 
src="https://flightaware.engineering/content/images/2023/10/data-src-image-8942bd8f-cceb-4b77-9cad-d72f8ee43767.png" class="kg-image" alt="Text

Description automatically generated" loading="lazy" width="458" height="84"></figure><p>First, we have some information about the certificate: the version of the x509 specification used (version 3) and a serial number assigned by the CA that must be unique across all the certificates generated by that CA.&nbsp;&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-7083537e-f01e-474e-ab12-f40711b49928.png" class="kg-image" alt="Excerpt from an X.509 certificate:

Issuer: C=US, ST=Texas, L=Houston, O=SSL Corporation, CN=SSL.com RSA SSL subCA
Not before: 2023-02-27T2:49:00GMT
Not after: 2024-03-29T22:49:00GMT
Subject: CN=*.flightaware.com" loading="lazy" width="590" height="89"></figure><p>Next, we have some identity information: the issuer is the CA certificate that signed us.&nbsp; The subject is the owner of this certificate.&nbsp; Note that these fields are both in Distinguished Name (DN) format – if you’ve ever used LDAP, these fields will look familiar.&nbsp; That’s because X.509 is part of the X.500 standard developed by the telecommunication industry that was intended to be part of a global directory service – a white pages for the internet age.&nbsp; Due to a number of technical, legal, and administrative hurdles, this global directory never fully materialized, but the X.509 standard was adopted by the IETF for use on the internet.&nbsp;</p><p>You’ll notice there are also two dates specified, known as the “validity period”.&nbsp; Though it is commonly accepted that this end time is in place to protect against unknown loss or theft of the certificate’s private key, this is <strong>not</strong> the intent and, in most cases, not the actual practice of managing X.509 certificates.&nbsp; The validity period is specifically for the certificate, not for the related public/private keys.&nbsp; Most CAs will accept a renewal with the same public key (and thus the same private key) and many will even make the renewal process easier by saving/re-using the same exact certificate request information (which includes the public key).&nbsp; Thus, the validity period is best described as the time period during which the CA is willing to assert the <em>identity information</em> associated with a certificate is valid.&nbsp; If credential theft or loss is a concern, a new public/private keypair should be generated for every new certificate validity period.&nbsp;&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-84b5e216-0c11-4af6-9c5d-9f694dbb8400.png" class="kg-image" alt="Subject public key info from an 
x509 certificate:

Public Key Algorithm: rsaEncryption
Public-Key: 2048 bit
Modulus: base64 encoding of public key" loading="lazy" width="432" height="394"></figure><p>Continuing on, we have the public key information itself which indicates the size and algorithm used to generate the key.&nbsp; In our case, it’s a 2048 bit RSA key.&nbsp;&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-7abec089-bdea-4e48-bca8-3c329c0c2803.png" class="kg-image" alt="X.509 extensions section including

Basic Constraints (critical): CA:FALSE
Authority Key identifier: base64 encoded string
Authority Information Access: CA and OCSP information
Subject Alternative Name:
 DNS:*.flightaware.com,DNS:flightaware.com
Certificate Policies: OIDs referencing CA policy documents
Extended Key Usage: Web Client, Web Server
CRL distribution points: <url>
Subject Key identifier: base64 encoded string
Key Usage (critical): Digital Signature, Key Encipherment" loading="lazy" width="587" height="481"></figure><p>The next section is the X.509v3 extensions, which will differ depending on the CA and how it is set up.&nbsp; A few elements should always be present, though.&nbsp; Basic Constraints should include information on whether the certificate is able to sign other certificates or not.&nbsp; In our case, it is set to FALSE as this is a server certificate.&nbsp; The Key Usage and Extended Key Usage should be set depending on the purposes the certificate is intended for.&nbsp; Modern certificates should also have at least one Subject Alternative Name (SAN) set that matches the CN in the Subject section above.&nbsp; Additional names or IP addresses may be specified if the server is in a shared environment that provides a number of services or services under other aliases.&nbsp; When a client connects to a server, it checks that the Common Name (CN) in the subject or at least one SAN entry matches the name of the host.&nbsp; If this doesn’t match exactly, you’ll get an error from the client.&nbsp;&nbsp;&nbsp;</p><p>There are other sections as well, including OCSP and CRL locations.&nbsp; These two services involve managing certificate revocation.&nbsp; If a certificate key is known to be compromised or the certificate was issued in error, the certificate can be revoked – an indication that it is not valid and should not be trusted by any client.&nbsp; CRL or Certificate Revocation List is the legacy mechanism whereby a CA publishes a list of every certificate that was revoked, and clients must download and check the list.&nbsp;OCSP or Online Certificate Status Protocol is a more efficient variation where the status of just a specific certificate can be checked.&nbsp; OCSP stapling is a feature provided by some servers where the OCSP response is cached by the server and included with the certificate.&nbsp; Since the OCSP response is signed by the CA with the same 
certificate that signed the server cert, the client can trust it is valid and doesn’t have to make extra connections to validate the certificate.&nbsp;&nbsp;&nbsp;&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/10/data-src-image-082e37d6-25d5-4440-a6a5-19ee4bb6573c.png" class="kg-image" alt="Text

Description automatically generated" loading="lazy" width="436" height="510"></figure><p>Finally, the certificate is signed by the CA and this signature appears at the end along with an indication of the signing algorithm used so the client can confirm that the certificate has not been tampered with.&nbsp; The signature can be verified (with more math magic!) by checking the public key in the CA certificate.&nbsp; This process is repeated for every signing certificate until one is reached that exists in the client’s trusted certificate (or trusted root) store.&nbsp; This is known as the chain of trust and can, in theory, be arbitrarily long, though in practice there are usually no more than one or two intermediate CAs between the trusted root and the leaf certificate.&nbsp;&nbsp;</p><h2 id="its-about-trust">Its about trust&nbsp;</h2><p>At the end of the day, all these technologies are about building and managing trust.&nbsp;Can you trust that the server you are talking to really is the one you want?&nbsp; Can you trust that the bank information you’re sending over the public internet is safe from modification or eavesdropping?&nbsp; Are you sure that email really is from John?&nbsp; Understanding how these technologies work can help you answer these questions.&nbsp; Though there is a lot of technical complexity involved, certificates are essentially just identity documents, not much different from a drivers license or passport. It is worth noting that while a certificate CAN tell you if the server you are connected to is named <u>www.flightaware.com</u>, it cannot tell you if that is the <em>right</em> server for tracking your aircraft.&nbsp; A nefarious actor could just as easily get a certificate for <u>www.fightaware.com</u> or <u>www.flightwarez.com</u> and if the certificate is valid, you’ll hear no complaints from your browser. 
Extended validation (EV) certificates, which show additional identity information, are one attempt at solving this problem, but they still ultimately rely on the end user knowing what the correct name should be.&nbsp; As the old Russian proverb goes: “доверяй, но проверяй” (Trust, but verify).&nbsp;</p><p>&nbsp;</p> 
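<p>To make that name-checking step concrete, here is a minimal sketch, in Python, of the wildcard matching a client performs against a certificate’s SAN list. This is a simplification for illustration only: the function names are invented, and real clients implement the fuller rules of RFC 6125 (for example, restricting wildcards to the leftmost label).</p>

```python
# Simplified sketch of TLS hostname verification against SAN entries.
# Function names are illustrative, not from any real library.

def name_matches(pattern: str, hostname: str) -> bool:
    """Case-insensitively match one certificate name against a hostname.

    A wildcard label ("*") matches exactly one DNS label, so
    "*.flightaware.com" covers "www.flightaware.com" but not
    "flightaware.com" or "a.b.flightaware.com".
    """
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    for p, h in zip(pattern_labels, host_labels):
        if p == "*":  # wildcard stands in for a whole label
            continue
        if p != h:
            return False
    return True

def hostname_allowed(san_entries: list[str], hostname: str) -> bool:
    """The client accepts the certificate if any SAN entry matches."""
    return any(name_matches(name, hostname) for name in san_entries)

# The SAN list from the example certificate above:
sans = ["*.flightaware.com", "flightaware.com"]
print(hostname_allowed(sans, "www.flightaware.com"))   # True
print(hostname_allowed(sans, "flightaware.com"))       # True
print(hostname_allowed(sans, "a.b.flightaware.com"))   # False: wildcard spans one label
# A look-alike domain with its own perfectly valid certificate still matches itself:
print(hostname_allowed(["www.fightaware.com"], "www.fightaware.com"))  # True
```

<p>In practice you should never roll this yourself: Python’s <code>ssl</code> module, for example, performs hostname checking automatically when <code>SSLContext.check_hostname</code> is enabled, which is the default with <code>ssl.create_default_context()</code>.</p>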
        <br>
        <p>
            <a href="https://flightaware.engineering/everything-you-wanted-to-know-about-ssl-tls-pki-but-were-too-busy-to-ask/">Everything you wanted to know about SSL/TLS/PKI  (But were too busy to ask)</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Interviewing at FlightAware: A Guide to Joining Our Team ]]></title>
        <description><![CDATA[ Some of the most important decisions we make at FlightAware are deciding who will become part of our company, team, and culture. In this post, our Senior Software Engineer &amp; Manager, Jonathan Cone, describes our interview process &amp; what we look for in a candidate. ]]></description>
        <link>https://flightaware.engineering/interviewing-at-flightaware-a-guide-to-joining-our-team-2/</link>
        <guid>https://flightaware.engineering/interviewing-at-flightaware-a-guide-to-joining-our-team-2/</guid>
        <pubDate>Mon, 16 Oct 2023 10:23:00 -0500</pubDate>
        <media:content url="https://images.unsplash.com/photo-1605712916066-e143c317df72?crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;fit&#x3D;max&amp;fm&#x3D;jpg&amp;ixid&#x3D;MnwxMTc3M3wwfDF8c2VhcmNofDExfHxwdXp6bGV8ZW58MHx8fHwxNjU0NjE0MjE4&amp;ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;w&#x3D;2000" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Jonathan Cone is the Director of Engineering over Data and Software Services.</em></p><p>Some of the most important decisions we make at FlightAware are deciding who will become part of our company, team, and culture. Our company was founded by aviation geeks and big data nerds, which organically brings in people from diverse backgrounds who have a passion for aviation, software, and the services we provide. FlightAware’s success has been driven by its employees and their commitment to the quality, reliability, and innovation of its products. We strive to provide an interview experience that will identify talented candidates and sell them on the benefits of working here. It’s a continuous process to fine-tune and improve the hiring process, and we have a team of people dedicated to that endeavor. In this post, I’ll introduce the overall interview process with an emphasis on the technical interview.</p><h2 id="the-interview-process">The Interview Process</h2><p>Candidates will submit their application (with resume) to the open job posting.  Once a resume is received, we start tracking the candidate’s progress and pass their information onto that group’s hiring manager if they meet the position qualifications. The hiring manager reviews the resume to gauge interest and applicability of skills for the role in question. They may also refer the candidate to another team if the skillset doesn’t match that team’s needs. Following the resume review, the hiring manager will decide if we should initiate the interview process.</p><p>Most engineering positions at FlightAware will include the following steps. 
Some positions, such as Software Engineering Interns, Site Reliability Engineers, Mobile Engineers, or Data Engineers may have slightly different interview processes.</p><h2 id="exploratory-talent-acquisition-interview">Exploratory Talent Acquisition Interview</h2><p>The first step in the interview process is a non-technical exploratory interview conducted by a recruiter. The interview is typically 15 - 20 minutes, and the candidate will be asked about their background and experience. This is our first opportunity to gauge culture fit, genuine interest in the role, and to rule out any non-technical disqualifiers. Candidates who have spent some time researching the company and the role they have applied for typically do well here.</p><h2 id="technical-interview">Technical Interview</h2><p>Once past the initial recruiter interview, we schedule the technical interview with the candidate, which typically takes up to 90 minutes. We utilize <a href="https://coderpad.io/?ref=flightaware.engineering">CoderPad</a> to facilitate this exercise as we’ve found it to be reliable, configurable, and it supports a wide variety of languages. The questions we ask during this interview are selected and curated by a group of engineers at FlightAware. We spend a non-trivial amount of time reviewing the process and the questions, proposing new and refining existing questions to ensure that we can identify the strongest candidates. There is a mix of small algorithmic and real-world problems; the candidate is also asked to solve a programming problem similar to something we might deal with at FlightAware using a programming language of their choice. We expect that the candidate’s code will compile and produce a correct result.</p><p>Our intent is not to trip anyone up by providing trick questions or problems that only the most nimble and well-versed in their language's nuances can solve. 
We want candidates to demonstrate how they approach solving problems, reason through questions, weigh the benefits and drawbacks of possible solutions, and verify the correctness of their response.</p><p>Again, while we expect that a candidate will produce quality code that solves the problem, a broader focus is on how the candidate approaches the problem. It behooves a candidate to talk through their solution, approach, and articulate any issues they are struggling with so that we can better make that determination. In the absence of that insight, we might pass on a candidate who just made a simple mistake but never expressed their thought process, leading us to believe they were weaker than they may really be as a developer. Depending on the candidate’s experience and position for which they are applying, there may be some follow-up questions to further probe their breadth of experience and knowledge. This technical interview serves as a go/no-go decision point for the candidate. If the candidate performs well here, they will continue with the process, and if they fall short, we will provide a written communication to share a status update.</p><h2 id="system-design-interview">System Design Interview</h2><p>The system design interview is a discussion-based interview typically lasting 60 minutes that will be included as part of the interview process for certain positions (including most senior individual contributor roles and leadership roles). During this interview, there is no coding required nor expected, but you may need to draw out diagrams or define database schemas. This interview is usually scheduled as part of the virtual “on-site” interview described in the next section but may also be scheduled separately prior to the final non-technical interviews.</p><p>At the beginning of the interview, you will be provided a vague or ambiguous problem and asked to design a FlightAware-style system. 
We are evaluating how you approach defining requirements, estimating resources, and designing the system’s components, scaling, and reliability. As with the previous interviews, we are examining your thought process and are interested in the questions you ask to develop this system.</p><h2 id="non-technical-interviews">Non-Technical Interviews</h2><p>The final step of the interview process is a virtual "on-site" interview. The interview schedule consists of meeting with managers, peers, and leadership at FlightAware as we continue to evaluate cultural fit, behavioral characteristics, and overall interest in FlightAware. The order of the interviews may vary, but generally the first meeting will be with the hiring manager (and potentially the team manager). This interview typically focuses on behavioral questions and assesses the candidate's motivation and interest in the position and company. Here you can expect questions about successes and failures in past employment, exposure to agile development practices, and experiences working in a team environment. There may be some high-level technical questions during this stage to further suss out a candidate’s level of experience with particular technologies.</p><p>The candidate will generally next meet with one or two peer members from the development team. The peer interviewers will show the candidate some of the projects they are working on and products they are responsible for. The idea is to show a candidate, as much as possible, what it’s like to actually work at FlightAware. This is a good opportunity for a candidate to engage and ask questions about our development process to better understand the role and what the day-to-day work looks like. At this point, we’re both trying to sell the candidate on the role and continuing to evaluate interest and exposure to development practices. 
There should be many opportunities for a candidate to talk about past work and how it does or does not apply to work being undertaken at FlightAware.</p><p>Some interview processes will also include a meeting with FlightAware leadership; historically this has consisted of a meeting with either an Engineering Director&nbsp;or the VP of Engineering. They will most likely have talked with the earlier interviewers to gauge the candidate’s performance, and these meetings serve both as a last sales pitch to quality candidates and as the final evaluation stage. These interviews are typically about 30 - 45 minutes each and usually pose some additional behavioral questions, along with high-level discussion about what environments in past positions have worked well for the candidate versus those that have not. This also serves as an opportunity for a candidate to ask questions and get to know the company from a broad perspective from those responsible for the larger direction of engineering at FlightAware.</p><p>At that point, the interview is completed, and the interview team will meet to discuss the candidate’s performance and hopefully decide to make an offer!</p><h2 id="general-timeline">General Timeline</h2><p>Candidates are usually notified within a few days of completing their technical interview if they are proceeding to the next step in the process. The actual scheduling of each interview can vary depending upon the candidate’s schedule as well as our engineers’ schedules. 
The timeframe for completing the entire process can vary and the recruiter will be in contact throughout the process to keep you informed and discuss upcoming steps.</p><h2 id="the-best-candidates">The Best Candidates</h2><p>Here is a list of common traits among the candidates who have done well and been extended offers.</p><h3 id="be-prepared">Be Prepared</h3><p>You might think this to be an obvious task that everyone undertakes before attending an interview, but we find it to not always be the case. Even those who prepare diligently for the interview's technical portions may not have spent time understanding what the company does or what the role may entail. This is frequently exposed in the questions, or lack of questions, asked by a candidate. Spending some time exploring the website and the engineering blog before attending the interview will ensure that you have at least a high-level grasp of what FlightAware is and what we do, which hopefully leads to some interesting questions about how we manage flight tracking.</p><p>For the technical interview, there are a myriad of resources available online to help job applicants prepare for technical interviews. Spending some time on those sites is likely to increase the odds that you perform well. The questions asked during interviews may seem artificial because they are unlikely to have a 1:1 relationship with the work you do on a daily basis or in school. Our questions are limited in scope and designed to give us insights into your thought process. We could give take-home problems with larger scope, but that would not address this key interview requirement. 
So, spending a little time familiarizing yourself with the types of questions asked in interviews is likely to put your mind more at ease and help you feel comfortable during this high-stress period, so you perform at your best.</p><p>Finally, candidates who have spent time thinking about the types of behavioral questions that arise in interviews are, not surprisingly, better prepared to answer those questions. It can be challenging when in the heat of the moment to come up with examples of times you failed or disagreed with your boss that showcase your ability to work through a difficult environment or relationship. That’s not the sort of thing you want to wing because you may find on reflection that your response didn’t cast you in a favorable light (e.g., I just bashed my former boss in front of my new prospective boss. We don’t know you or your former boss well enough to establish who was in the right or wrong, so it’s unlikely that is going to score you any points). The task of preparing for an interview is critical and requires the same dedication you bring to your work as a developer. Those interviewing you have not known you for years, so while preparing can feel like a job unto itself, failure to do so will be perceived as a lack of seriousness on your part.</p><h3 id="deliver-effective-communication">Deliver Effective Communication</h3><p>The best engineers at FlightAware are also excellent communicators. Candidates that demonstrate this ability are more likely to succeed at the company. The types of problems we are solving require the collaboration of multiple individuals and teams. For those efforts to succeed, all individuals involved need to be able to effectively communicate with one another. Therefore, as a candidate, we need to see that you possess that ability as well. Ensure that your responses to questions are reasoned and concise, addressing the full topic without becoming a filibuster. 
Convey your past contributions to significant projects in a way that demonstrates your technical understanding of the problem and its solution.</p><h3 id="show-enthusiasm">Show Enthusiasm</h3><aside><blockquote>The engineers at FlightAware are passionate about solving complex technical challenges utilizing the extensive aviation data available to us.</blockquote></aside><p>Some are aviation enthusiasts, while others are simply passionate about software. During the interview, let the enthusiasm you possess for software engineering and our industry shine through.</p><h2 id="what%E2%80%99s-next">What’s Next?</h2><p>Go <a href="https://flightaware.com/about/careers/?ref=flightaware.engineering">apply for a job with us</a>! You may be thinking, “The interview process at FlightAware seems a bit daunting.” You’d be right. We’re serious about interviewing because a bad hire is not good for anyone involved. That said, we neither look for nor expect perfection in candidates. If you’re genuinely interested in a career here, spending the appropriate time to prepare for the interview is critical. We’re also constantly working to improve our interview process to ensure that each and every candidate has a positive interview experience. We’re looking for the brightest, most innovative minds to help build a more connected world.&nbsp;Go&nbsp;<a href="https://flightaware.com/about/careers/?ref=flightaware.engineering">apply for a job with us</a>!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/interviewing-at-flightaware-a-guide-to-joining-our-team-2/">Interviewing at FlightAware: A Guide to Joining Our Team</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Tips and Tricks for Winning your Next FlightAware Hackathon ]]></title>
        <description><![CDATA[ FlightAware&#39;s yearly hackathons are a fun diversion from usual work where we can focus on just making something cool. They&#39;re also an opportunity to vie for dominance on the battlefield of public opinion! Coolest project wins. ]]></description>
        <link>https://flightaware.engineering/tips-and-tricks-for-winning-your-next-flightaware-hackathon/</link>
        <guid>https://flightaware.engineering/tips-and-tricks-for-winning-your-next-flightaware-hackathon/</guid>
        <pubDate>Mon, 02 Oct 2023 11:23:49 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/09/john-schnobrich-2FPjlAyMQTA-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>FlightAware's yearly hackathons are a fun diversion from usual work where we can focus on just making something cool. They're also an opportunity to vie for dominance on the battlefield of public opinion! Coolest project wins.</p><p>I've always loved hackathons. It started in college, where a hackathon was just a good excuse to stay up all night with your friends drinking Red Bull, fooling around, and getting free swag. Maybe you coded up something impressive by the end, but your brain was probably too fried from sleep deprivation to care either way.</p><p>I was delighted, then, to discover when joining FlightAware that they also host a yearly employee hackathon. The 24-hour timeline is swapped for around 2 working days of hacking, but the spirit of the hackathon is still surprisingly consistent with what I remember from school: just build something cool. It doesn't need to be FlightAware's next big idea, it doesn't need any product potential, and it definitely doesn't need any tests or documentation.</p><p>At the end of each FlightAware hackathon, the teams present their projects to the rest of the company, and then the participants vote on who had the best project (no voting for yourself, of course). There's no real rubric for voting; people just pick the project they liked the most. The team receiving the most votes is crowned the winner and gets bragging rights for the rest of the year.</p><p>I've had the good fortune of being on the winning team for the last few years and would like to share some tips I've developed over those years for how to take your hackathon game to the next level. 
I'll also be sharing some anecdotes from past projects to highlight the effectiveness of my strategy (I hope you'll see this more as a nostalgic trip down memory lane and less as brazen gloating).</p><h3 id="the-presentation-matters-as-much-as-the-code">The presentation matters as much as the code</h3><p>I get it, we're engineers, we like to code. We don't so much like putting together and rehearsing PowerPoint presentations. "Isn't the point of a hackathon to hack?", you ask. Well, yes, but ultimately you're here because you want to win, right? If so, you're going to need to spend more than 15 or 20 minutes creating the presentation. All your hard work will have gone to waste if you don't tell people about it.</p><p>For a technical analogy: think of the presentation as an interface and your project as the implementation. No one's ever going to look at your awful hacked-together code, just as most people only ever touch a module's interface, never its implementation.</p><p>For a team that faced serious roadblocks, this is also your chance to tell your story. Share the approaches you tried that didn't work out (make sure to include silly screenshots if you've got them).</p><p>Be sure to also involve all team members in the presentation, with everyone getting a chance to speak, and rehearse it all the way through at least once.</p><h3 id="the-demo-is-the-most-important-part-of-the-presentation">The demo is the most important part of the presentation</h3><p>Steve Jobs knew the value of a good <a href="https://www.entrepreneur.com/science-technology/how-steve-jobs-misled-a-room-full-of-tech-media-and-changed/297190?ref=flightaware.engineering">live demo</a>. As much as your presentation matters, the only thing people are probably going to remember from it is your demo. It directly answers the question: did you accomplish what you set out to do?</p><p>I use demo-ability as my guiding light when brainstorming hackathon projects. 
Having a good demo in mind can really give a project focus. I can always feel the excitement in the team build as step-by-step we get closer to something that really <em>works</em>. Picking a project, working on it for a couple days, and then scrambling to figure out how you're going to show it off right at the end isn't a recipe for victory.</p><p>Your demo should be carefully rehearsed. Don't just pick some random examples and cross your fingers, pick examples that show your project at its best! You likely missed plenty of edge cases when working on it; the worst time for those to rear their heads is during the demo. You don't want to be too misleading, but I think it's totally fine to tune your project to the specific use cases you want to show off.</p><p>Try to ensure your demo is memorable and stays on people's minds. One possibly risky approach could be to share a link to your project with your audience so they can engage and play with it themselves. This is particularly effective at FlightAware now that we're fully remote, meaning most people are watching the hackathon presentations at their computers. This does risk spoiling your careful rehearsing, though, and may readily show the cracks in your project.</p><p>The "crowd participation" approach didn't quite go to plan for my team's demo in 2021. Our project involved revamping FlightAware's internal "quip" system. Quips are how we keep track of the funny, embarrassing, or bitingly sarcastic things our colleagues have said that deserve to be memorialized (we have a corner of our operational dashboards dedicated to rotating through random quips). Previously, this was handled by an extremely barebones HTML form for submitting and a table for viewing. We thought it'd be fun to integrate quip submission more closely with Slack, as that's where most of them came from anyway. We introduced the ability to "quip" a message by reacting to it with a specific emoji. 
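</p><p>For the curious, the plumbing behind this is small: Slack's Events API delivers a <code>reaction_added</code> event for every reaction, and the bot just has to decide which ones count as quips. Here's a rough sketch of that dispatch logic in Python (the <code>quip</code> emoji name and event shapes are stand-ins for illustration; a real handler also has to answer Slack's URL-verification challenge and verify request signatures before trusting anything):</p>

```python
# Sketch of the dispatch logic for turning a Slack reaction into a "quip".
# QUIP_EMOJI is a hypothetical custom emoji name; a production handler must
# also answer Slack's url_verification challenge and verify signatures.
QUIP_EMOJI = "quip"

def extract_quip_target(event):
    """Return (channel, message_ts) if this event should create a quip."""
    if event.get("type") != "reaction_added":
        return None
    if event.get("reaction") != QUIP_EMOJI:
        return None
    item = event.get("item", {})
    if item.get("type") != "message":  # ignore reactions to files, etc.
        return None
    return item["channel"], item["ts"]
```

<p>From the returned channel and timestamp, the bot can fetch the original message text through the Slack Web API and store it as a quip.</p><p>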
People quickly found the limits of this new quipping ability during our demo (reacting to bots, pictures, etc.), and we had to do some brief live debugging to get things working again.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/live-debug.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="1127" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/live-debug.jpg 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/live-debug.jpg 1000w, https://flightaware.engineering/content/images/size/w1600/2023/09/live-debug.jpg 1600w, https://flightaware.engineering/content/images/size/w2400/2023/09/live-debug.jpg 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Nothing like some live debugging to really spice up your demo!</span></figcaption></figure><p>Another option is to inject some whimsy or humor into your demo. In 2022, my team worked on a project we called SongBird. It used the Spotify API to play music on www.flightaware.com flight pages based on the aircraft's location. We had various ideas for our musical choices when the plane was over land: artists born nearby, songs that mention the state in their title, etc., but we struggled with what to play for planes over oceans. 
We could have tuned our demo to avoid such a case, but instead we opted for an Easter egg where oceanic flights would play "Never Gonna Give You Up," Rickrolling the audience when we demoed such a flight.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/rickroll.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="1131" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/rickroll.jpg 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/rickroll.jpg 1000w, https://flightaware.engineering/content/images/size/w1600/2023/09/rickroll.jpg 1600w, https://flightaware.engineering/content/images/size/w2400/2023/09/rickroll.jpg 2400w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The lack of a pause button is a feature, not a bug.</span></figcaption></figure><h3 id="demo-driven-development">Demo-driven development</h3><p>Treating the demo as your deliverable rather than your code can be quite freeing! Cleanliness, tests, documentation, none of it matters in the face of creating something that has to work for just a few specific cases. In fact, they all drain precious time.</p><p>Do you need to expose a service on a local machine? Don't waste your time fiddling with firewalls, just sprinkle some <a href="https://ngrok.com/?ref=flightaware.engineering">ngrok</a> on it. Does your app have a huge memory leak and crash after 20 minutes? No problem, just restart it right before the demo.</p><p>Thinking back to the SongBird project I mentioned above: Spotify uses OAuth 2.0 to control access to its API. The access token that Spotify generates as part of that flow is only valid for 1 hour, at which point you're expected to get a new one using a refresh token. OAuth 2.0 is pretty much the last thing anyone wants to be thinking about when they've got 2 days to make something worth showing off. 
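</p><p>For the record, the refresh step we were dodging is only a few lines. Something like this sketch (untested against the live API, with placeholder credential names):</p>

```python
import base64
import urllib.parse
import urllib.request

# Sketch of the standard OAuth 2.0 refresh-token exchange against Spotify's
# token endpoint; the credentials passed in here are placeholders.
TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_refresh_request(refresh_token, client_id, client_secret):
    """Build the POST that trades a refresh token for a new access token."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    }).encode()
    # Spotify expects the app's credentials as HTTP Basic auth.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(TOKEN_URL, data=body, headers={
        "Authorization": f"Basic {creds}",
        "Content-Type": "application/x-www-form-urlencoded",
    })

# urllib.request.urlopen(build_refresh_request(...)) then returns a JSON
# body containing a fresh access_token, typically valid for another hour.
```

<p>Entirely doable, in other words; just not how you want to spend hackathon hours.</p><p>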
Instead, we just manually regenerated the access token as needed when it expired (and made sure to regenerate it right before the demo, just in case).</p><h2 id="conclusion">Conclusion</h2><p>Much of this post was a bit tongue-in-cheek. The main focus of a hackathon should be having fun, building cool stuff, and rekindling that startup spirit of moving fast and breaking things. Winning an ill-defined popularity contest is just a small cherry on top. In the end, it's hard to say whether 3 wins in a row is a fluke or if I've really got hackathons figured out now. After 4 wins, though, I'll know for sure.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/tips-and-tricks-for-winning-your-next-flightaware-hackathon/">Tips and Tricks for Winning your Next FlightAware Hackathon</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ 2023 Intern Summer Projects - Part 3 ]]></title>
        <description><![CDATA[ Every year, our interns spend the summer working on interesting and meaningful projects that help them learn how to work on a professional team and help us solve problems across our business. We were excited to have them this summer and hope you enjoy this final post in our series. ]]></description>
        <link>https://flightaware.engineering/2023-intern-summer-projects-jared-harvey/</link>
        <guid>https://flightaware.engineering/2023-intern-summer-projects-jared-harvey/</guid>
        <pubDate>Mon, 18 Sep 2023 11:28:08 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/09/elisa-ventur-djyf3cKrwF0-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Chadd Mikulin is the Vice President of Engineering at FlightAware. For over fifteen years, he has helped grow and promote leaders in the organizations of which he’s been part.</em></p><p>Every year, our interns spend the summer working on interesting and meaningful projects that help them learn how to work on a professional team and help us solve problems across our business. Jared's project was no exception. As we transition to a new set of technologies to deliver our web products, we needed to create a platform on which all of our behind-the-scenes administrative pages could live. Jared took on the task with aplomb, creating that framework and building its first tool, a new way for us to manage hex code mappings. We were excited to have him, and all our other interns, and hope you enjoy this final post in our series.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/09/Jared-Harvey-Headshot---Large.jpeg" class="kg-image" alt="" loading="lazy" width="800" height="800" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/Jared-Harvey-Headshot---Large.jpeg 600w, https://flightaware.engineering/content/images/2023/09/Jared-Harvey-Headshot---Large.jpeg 800w" sizes="(min-width: 720px) 720px"></figure><h2 id="jared-harvey">Jared Harvey</h2><p>Hello! My name is Jared Harvey. This fall, I’ll be entering my final semester in Carnegie Mellon’s Master of Software Engineering program. After my graduation in December, I’m hoping to pursue a career in full-stack web development or as a backend engineer. 
Outside of work, I love to practice photography, spend time with friends, and play (a frankly unhealthy amount of) <em>Dungeons &amp; Dragons</em>.</p><p>This summer, I’ve had the opportunity to work with FlightAware’s Web team, where I’ve been responsible for leading an effort to create a new admin dashboard for FlightAware employees, which will be expanded upon and eventually replace the existing legacy version. This project is part of the Web team’s <em>WebNxt</em> initiative, which aims to completely replace the existing website architecture with modern languages, tools, and technologies packaged as independent microservices.</p><p>It's been an honor to work on this project alongside the rest of FlightAware’s engineers, who have all been incredibly welcoming and open to collaboration when I ask for help. From my manager and mentor to company leadership, I’ve had the privilege of working with engineers across the company to build my project from the ground up.</p><h3 id="my-project">My Project</h3><p><em>Fa_web</em> is the legacy codebase for FlightAware’s website; it’s a type of architecture known as a monolith which, as the name implies, contains virtually everything to do with the current website. To move away from this legacy monolith, the Web team introduced the <em>WebNxt</em> initiative. <em>WebNxt</em> is a methodology for designing microservices—independent applications which serve a small set of features—using modern languages and design principles. Some key benefits of this approach are that applications are easier to build, projects can get to production more quickly, and newly hired engineers (like me!) are more familiar with the tooling.</p><p>The existing admin dashboard exists on the <em>fa_web</em> monolith and consists of over 100 different links to the various tools that FlightAware employees use to manage the company’s data. Like other WebNxt projects, we want to separate this dashboard into its own microservice. 
Fortunately, I haven’t been tasked with porting over 100 different pages of tools; rather, my job this summer is to create the skeleton for a new admin dashboard which FlightAware can expand upon over time.</p><p>Since this is a brand-new application, a large part of my project was deciding how to construct it and how to deploy it so that it’s accessible to FlightAware employees. Fortunately, the architecture is simple. I am responsible for the frontend and backend; the frontend is a Next.js application, which is a popular framework choice for JavaScript-based applications and the framework of choice for the WebNxt initiative at FlightAware. For the backend, we decided to use an open-source application called <a href="https://hasura.io/?ref=flightaware.engineering"><em>Hasura</em></a>. These two services would interact with other FlightAware infrastructure, such as our new authentication server and FlightAware’s database.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/J2.png" class="kg-image" alt="" loading="lazy" width="936" height="430" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/J2.png 600w, https://flightaware.engineering/content/images/2023/09/J2.png 936w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">The system architecture for my project.</em></i></figcaption></figure><p>Hasura was a new technology for me when I started this project, as the web team picked it specifically for this work. It’s a third-party application meant to replace a typical backend service. It works by “tracking” tables in a database: Hasura reads the schema in our database and then automatically generates a fully featured GraphQL API. This API allows developers to easily query for data or make database updates in a familiar, JSON-like language. 
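</p><p>To make that concrete, querying the generated API boils down to an HTTP POST of a JSON document to Hasura's <code>/v1/graphql</code> endpoint. A sketch in Python (the table and column names here are illustrative, not our actual schema):</p>

```python
import json

# Sketch of a query against a Hasura-generated GraphQL API; the table and
# column names below are illustrative, not the real schema.
QUERY = """
query RecentRecords($limit: Int!) {
  aircraft_registry_modes(limit: $limit) {
    mode_s_code
    tail_number
  }
}
"""

def build_graphql_payload(limit):
    """Serialize the query document the way Hasura's /v1/graphql expects."""
    return json.dumps({"query": QUERY, "variables": {"limit": limit}})

# POSTing this body (with an auth header) to https://<hasura-host>/v1/graphql
# returns {"data": {"aircraft_registry_modes": [...]}} on success.
```

<p>Arguments like <code>limit</code>, <code>where</code>, and <code>order_by</code> are generated automatically for every tracked table.</p><p>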
Hasura also lets you expose custom SQL queries in its API, allowing for more complex features such as table joins, GROUP BY clauses, and advanced filtering.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/query-new-data.jpg" class="kg-image" alt="" loading="lazy" width="936" height="368" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/query-new-data.jpg 600w, https://flightaware.engineering/content/images/2023/09/query-new-data.jpg 936w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">A sample GraphQL API query provided by Hasura</em></i><span style="white-space: pre-wrap;">.</span></figcaption></figure><p>When I began my project, my manager had a specific first feature in mind: Mode S code assignments. For context, each aircraft has an identifier known as a “tail number”; this is a name for the aircraft that is typically used as its ID. However, data received in our ADS-B transponder network does not always include the tail number; what we do receive is a 24-bit number known as the Mode S code. The Mode S code is a unique numeric identifier for each aircraft, and it rarely, if ever, changes (it should never change during a flight). The problem lies in mapping Mode S codes to tail numbers: we want to associate the two so that we can close a gap in our tracking abilities at FlightAware.</p><p>There are a few problems with Mode S management. First, there is no universal data source mapping tail numbers to Mode S codes. As a result, FlightAware must rely on multiple sources of data, some of which conflict and many of which have known errors. Compounding this, FlightAware also has no existing system for managing Mode S assignments. 
Rather, when an employee wants to add a Mode S record, an email is sent to FlightAware’s Chief Solution Officer, who then must manually write the SQL queries to update the database.</p><p>The existing system is slow and inefficient. To solve this problem, I began working on a new Mode S management tool that will support these needs in the future. I started with the UI for a view page, add form, and edit form; these pages query data from the database and write to a table called <em>aircraft_registry_flightaware</em>, where the data is considered “pending”. A script then imports pending data into the database, moving that data to the <em>aircraft_registry_modes</em> table—which is read-only for the purposes of this project.</p><p>Once the UI and basic CRUD operations on the database were complete, I needed to shift my focus to validation. Some of this validation is simple, such as ensuring that we don’t have duplicate records and that the data input in the HTML form is in the correct format. More complex validations, such as ensuring that Mode S codes have a valid country prefix, were more time consuming. 
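</p><p>The country-prefix check is a good illustration of what this validation involves: ICAO allocates blocks of the 24-bit address space to individual countries, so the check reduces to range comparisons on the hex value. A sketch with a deliberately tiny allocation table (a real validator carries the full table of hundreds of entries):</p>

```python
# Illustrative Mode S validation using a tiny subset of the ICAO
# 24-bit address allocations; a real validator carries the full table.
ALLOCATIONS = [
    (0xA00000, 0xAFFFFF, "United States"),
    (0xC00000, 0xC3FFFF, "Canada"),
    (0x7C0000, 0x7FFFFF, "Australia"),
]

def mode_s_country(code):
    """Return the allocating country for a 6-digit hex Mode S code."""
    if len(code) != 6:
        raise ValueError("Mode S codes are 24 bits (6 hex digits): %r" % code)
    value = int(code, 16)  # raises ValueError on non-hex input
    for low, high, country in ALLOCATIONS:
        if low <= value <= high:
            return country
    return None  # unknown block: flag the record for review
```

<p>A record whose code falls outside every known block can then be flagged for manual review rather than silently accepted.</p><p>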
Unfortunately, I couldn’t complete every single possible validation on this data; however, I was able to build out the basis for generating errors so that additional validation can be added easily.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/dashboard--1-.png" class="kg-image" alt="" loading="lazy" width="2000" height="1108" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/dashboard--1-.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/dashboard--1-.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/09/dashboard--1-.png 1600w, https://flightaware.engineering/content/images/size/w2400/2023/09/dashboard--1-.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">The Mode S Management tool.</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/edit--1--1.png" class="kg-image" alt="" loading="lazy" width="2000" height="1363" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/edit--1--1.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/edit--1--1.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/09/edit--1--1.png 1600w, https://flightaware.engineering/content/images/size/w2400/2023/09/edit--1--1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">The edit form that I created for my project.</em></i></figcaption></figure><p>A large part of my project was getting everything deployed. Since this is an internal FlightAware dashboard, we don’t want it to be accessible from outside of FlightAware’s network. Thus, we decided that this project would be deployed to our on-site Kubernetes cluster. 
This was a significant challenge for me: I needed to do everything from building a Docker image for the project to writing the Kubernetes configuration to get it running smoothly on the cluster. I also needed to set up GitHub Actions to automatically re-deploy the application after an update is made, putting Continuous Integration &amp; Deployment (CI/CD) skills into practice. While I had been exposed to Kubernetes in the past through my master’s program, this was my first time integrating a full project into a live environment. Fortunately, FlightAware’s Operations (Ops) team was able to guide me through the challenging parts of my project, and my dashboard is now available to FlightAware employees.</p><p>Overall, my project this summer was a challenging, but incredibly fulfilling, learning opportunity. In addition to learning frontend technologies using Next.js and React, I also got to learn about deployment, requirements specification, agile methodologies, and automated software testing. This project was a lot of responsibility for me, but FlightAware’s engineers were willing to help guide me when I got stuck. I’ve learned a lot about being an engineer and am looking forward to what skills I can develop next!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/2023-intern-summer-projects-jared-harvey/">2023 Intern Summer Projects - Part 3</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ 2023 Intern Summer Projects - Part 2 ]]></title>
        <description><![CDATA[ We are excited to share the second project in our series showcasing the work from our class of Summer 2023 interns. Our summer internships aren&#39;t limited to just software engineering. This week, we will take a look at a project done on our Site Reliability Engineering team. ]]></description>
        <link>https://flightaware.engineering/2023-intern-summer-projects-william-burns/</link>
        <guid>https://flightaware.engineering/2023-intern-summer-projects-william-burns/</guid>
        <pubDate>Mon, 11 Sep 2023 09:29:28 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/09/john-schnobrich-2FPjlAyMQTA-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>As the Senior Director of IT Operations &amp; Reliability, Sean Kelly has been involved in the design &amp; reliability of FlightAware's infrastructure and the adoption of Site Reliability Engineering.</em></p><p>We are excited to share the second project in our series showcasing the work from our class of Summer 2023 interns. Our summer internships aren't limited to just software engineering. This week, we will take a look at a project done on our Site Reliability Engineering team. This is the second year that we've had an SRE intern. Mentoring someone on the ways of Site Reliability Engineering, something not really covered in coursework, is gratifying and rewarding, especially when they bring the same excitement and interest to the table. With that, let's take a look at Will's project.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/09/Picture1.jpg" class="kg-image" alt loading="lazy" width="690" height="880" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/Picture1.jpg 600w, https://flightaware.engineering/content/images/2023/09/Picture1.jpg 690w"></figure><h2 id="william-burns">William Burns</h2><p>Hello! My name is William Burns, and I am a Site Reliability Engineering Intern on the Systems team at FlightAware. I am a Senior at Arizona State University studying Information Technology and plan on graduating with my bachelor’s degree in December 2023. Having had an interest in aviation and aerospace, I found that FlightAware provided me with the perfect opportunity to apply my technical skills toward a field I have a lot of passion for.</p><p>During my time here, I have been given the opportunity to work alongside talented engineers, which has brought me experience that will surely benefit me for the rest of my career. 
Post-graduation, I would like to work as a Site Reliability Engineer and hope to return to FlightAware.</p><h3 id="my-project">My Project</h3><p>The primary objective of my project this summer was to centralize and automate FlightAware's management of firewall policies across its on-premises network infrastructure. A current focus for the Systems Team and FlightAware is having the ability to automate our network configurations. This project lays the groundwork for this goal by providing an easily extensible solution to firewall management.</p><p>I devised a solution in Python3 that extracts IPv4 and IPv6 network prefixes or IP addresses from NetBox, our data center infrastructure management software. Subsequently, these are formatted to be compatible with an Access Control List (ACL) generation tool, Capirca. This allows for the automatic generation of ACLs suitable for our Juniper SRX devices. Once produced, these ACLs are deployed directly onto the SRX devices. This automation replaces the previously manual process of writing and applying ACLs, especially when introducing new network prefixes to the SRX devices, drastically reducing manual overhead.</p><p>For anyone not familiar with Access Control Lists, they are critical security components for network devices such as routers and switches. ACLs contain sets of rules used to control and manage access to network resources based on specific criteria. These criteria can include factors like source and destination IP addresses, port numbers, and the protocol in use. Typically implemented on routers and switches, ACLs are an integral part of network security, providing a mechanism to explicitly allow or deny traffic based on predefined conditions. 
Whether it's to prevent unauthorized access, segment internal networks, or filter incoming and outgoing traffic, ACLs offer granular control at various layers of the Open Systems Interconnection (OSI) model, most notably the Network (Layer 3) and Transport (Layer 4) layers.</p><p>At FlightAware, the nature of the services we offer and the vast amount of data we handle make network security a crucial part of our operations. ACLs allow us to finely control data traffic, ensuring that only authorized requests access the right data. This granularity is essential given the volume and diversity of data we handle. It ensures not just security, but also optimal data traffic flow, helping maintain the responsiveness and accuracy of FlightAware's services. In essence, for the seamless operation of our technical infrastructure, ACLs serve as critical tools in FlightAware's network management arsenal.</p><p>The project is organized into several distinct components, each with its specific role: NetBox, Capirca, Docker, and the deployment of the ACLs to a chosen Juniper SRX device.</p><h3 id="netbox">NetBox</h3><p>The NetBox components are tasked with pulling network prefixes from FlightAware's NetBox instance using the pynetbox library, searching for matches based on the "Tenant" or "Description" values. Once identified, these prefixes are recorded in a file named NETWORK.net, setting them up to be ingested by Capirca.</p><h3 id="capirca">Capirca</h3><p>Capirca's role is crucial in generating the ACLs. It utilizes the network prefixes from NetBox, along with a services object called SERVICES.svc that enumerates lists of ports and protocols, and policy objects (.pol) which tell Capirca how to generate the final security policy or ACL configuration.</p><h3 id="juniper-pyez">Juniper PyEZ</h3><p>Following the preparation and generation of the ACLs, the final but crucial phase is deploying them to the Juniper SRX device. 
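</p><p>To give a concrete flavor of that deployment step, a hypothetical sketch using PyEZ might look like this (the host, user, and file path are invented placeholders, and this is a rough illustration rather than FlightAware's actual code):</p>

```python
# Hypothetical sketch of pushing a Capirca-generated ACL to an SRX with
# Juniper's PyEZ (junos-eznc). Host, user, and file name are placeholders.
def push_acl(host, user, acl_path):
    """Load a Capirca-generated Junos config onto an SRX and commit it."""
    # Imported lazily so the sketch can be read without PyEZ installed.
    from jnpr.junos import Device
    from jnpr.junos.utils.config import Config

    with Device(host=host, user=user) as dev:
        with Config(dev, mode="exclusive") as cu:
            cu.load(path=acl_path, format="text", merge=True)
            cu.pdiff()  # show the pending diff for the operator
            cu.commit(comment="Capirca-generated ACL update")
```

<p>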
For this, we utilize Juniper's PyEZ library, which facilitates automation of network devices through the NETCONF protocol. Once connected to the targeted device, the prepared ACLs are then pushed to the device.</p><h3 id="docker">Docker</h3><p>The Docker component provides the necessary environment for the entire process to run seamlessly. Using Docker Compose, we encapsulate all the dependencies and configurations into a consistent environment. Docker acts as the cohesive bridge, integrating NetBox, Capirca, and the Juniper SRX device operations, guaranteeing that the system dependencies remain consistent irrespective of where the script is run. This not only simplifies deployment but also ensures reproducibility across various platforms and systems.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/B1.png" class="kg-image" alt loading="lazy" width="1081" height="485" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/B1.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/B1.png 1000w, https://flightaware.engineering/content/images/2023/09/B1.png 1081w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 1: NETWORK.net object generated using network prefixes from NetBox. 
Contains the IP addresses that will become part of the Address Book in the Access Control List.</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/B2.png" class="kg-image" alt loading="lazy" width="608" height="822" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/B2.png 600w, https://flightaware.engineering/content/images/2023/09/B2.png 608w"><figcaption><em>Figure 2: SERVICES.svc object which contains port and protocol naming service definitions.</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/B3.png" class="kg-image" alt loading="lazy" width="1145" height="705" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/B3.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/B3.png 1000w, https://flightaware.engineering/content/images/2023/09/B3.png 1145w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 3: Example security policy (.pol) written using Capirca specific format. 
Contains Header and Term sections which define targeted platform and details such as source address and destination address.</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/B4.png" class="kg-image" alt loading="lazy" width="1060" height="1656" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/B4.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/B4.png 1000w, https://flightaware.engineering/content/images/2023/09/B4.png 1060w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 4: Example generated Access Control List</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/B5.png" class="kg-image" alt loading="lazy" width="936" height="360" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/B5.png 600w, https://flightaware.engineering/content/images/2023/09/B5.png 936w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 5: Log output for NetBox component</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/B6.png" class="kg-image" alt loading="lazy" width="1430" height="339" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/B6.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/09/B6.png 1000w, https://flightaware.engineering/content/images/2023/09/B6.png 1430w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 6: Log output for Capirca ACL generation component</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/09/B7.png" class="kg-image" alt loading="lazy" width="936" height="632" srcset="https://flightaware.engineering/content/images/size/w600/2023/09/B7.png 
600w, https://flightaware.engineering/content/images/2023/09/B7.png 936w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 7: Log output for Juniper PyEZ component. Example of pushing ACL to target device.</em></figcaption></figure><h3 id="conclusion">Conclusion</h3><p>Overall, this project, while fairly small in scope, provided lots of exposure to new technologies along with its fair share of challenges. One significant aspect was being introduced to Agile methodologies and using JIRA to track progress and issues. Both proved invaluable in streamlining our processes and staying organized. While an integral part of this internship was ultimately to complete our project, I think the takeaway extends far beyond that. A critical lesson learned was recognizing when I was going down an unproductive path and needing to pivot quickly. Being able to assess and redirect efforts efficiently was a standout skill I developed during this project. Being part of the Systems team provided firsthand experience of a professional work environment. Beyond just completing tasks, I gained insights into teamwork, process management, and the real-world application of engineering principles.</p><p>Thank you, FlightAware!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/2023-intern-summer-projects-william-burns/">2023 Intern Summer Projects - Part 2</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ 2023 Intern Summer Projects - Part 1 ]]></title>
        <description><![CDATA[ This summer, we had 5 students from across the country join FlightAware as interns. We invite you to see the results of their hard work, as we highlight the interns and their projects over the next few weeks. ]]></description>
        <link>https://flightaware.engineering/2023-intern-summer-projects/</link>
        <guid>https://flightaware.engineering/2023-intern-summer-projects/</guid>
        <pubDate>Tue, 05 Sep 2023 10:38:32 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/08/john-schnobrich-FlPc9_VocJ4-unsplash.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>Gabrielle Toutin is a Software Engineer on the Backend team at FlightAware. She contributes to software development efforts including flight maps, AeroAPI, and data feeds. In addition, she is the 2023 Intern Coordinator.</em></p><p>This summer, we had 5 students from across the country join FlightAware as interns. They collaborated with other FlightAware engineers to build out their projects (and in one case, two projects!) and accomplished impressive work that they demoed to the entire company. In addition, they got to know each other better through get-togethers and participated in talks about tech and career development presented by FlightAware engineers. We invite you to see the results of their hard work, as we highlight the interns and their projects over the next few weeks.</p><h2 id="adithya-chandrashekar">Adithya Chandrashekar</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/me-1.jpeg" class="kg-image" alt loading="lazy" width="160" height="160"></figure><p>Hello everyone! I am currently a rising Junior at The Ohio State University pursuing a bachelor’s degree in Computer Science and Engineering with a minor in Business. I am a Software Engineering Intern on the Flight Tracking Team. In the summer of 2022, I also graduated from Codesmith, a Software Engineering Immersive that teaches full-stack JavaScript and computer science to prepare individuals looking to switch careers for mid- to senior-level software engineering roles. I enrolled in this Software Engineering Immersive to supplement my knowledge of CS and jumpstart my career as a Software Engineer.</p><p>FlightAware has been the application that I have used for years to track flights, and there is a personal reason why I decided to pursue an internship with FlightAware. Since childhood, both of my parents have had to travel extensively for work. 
Before discovering FlightAware, I used to have a lot of fear for my parents’ safety during a flight. However, FlightAware eased my fear. FlightAware gave me the ability to track flights in real time, which allowed me to check and ensure the flight had no issues and that my parents were okay. FlightAware provided me comfort and security, and it was another factor which influenced my decision to pursue this internship.</p><p>I have had an amazing experience throughout this internship. From the time I joined FlightAware until now, I have received tremendous support from my mentor, manager, Flight Tracking team, and FlightAware as a whole. FlightAware reminds me of a large family where everyone respects each other and wants the best for the entire family. When one person needs assistance, their entire team is ready to provide support. I have never heard of any other company which respects, trusts, and wants the best for its employees as FlightAware does. I have had the opportunity to collaborate with several engineers on my team and cross-functional teams, which has strengthened my communication skills.</p><p>The internship program has been well-crafted to ensure a balance of learning and fun. Every Wednesday, there is some sort of intern collaboration activity, whether it be learning, building, or having fun. We have had several learning sessions with engineers throughout FlightAware, where they have provided us with institutional knowledge and advised us on how to make the best of this internship. We have also played numerous games such as Among Us and skribbl.io, which have been extremely fun. Which other company pays their interns to play games?? 
We have also had the opportunity to meet other engineers through a website called gather.town, where you have a character that you can move around; going near another person allows you to communicate with them virtually, through both audio and video.</p><p>FlightAware distinguishes itself from other companies in yet another noteworthy aspect. The projects that interns get assigned at FlightAware are meaningful projects which truly contribute to FlightAware and its growth. Interns at other companies often complain about getting assigned projects which are not meaningful, boring, or insignificant. However, this is not the case with FlightAware. FlightAware gives interns a lot of trust by providing meaningful projects and giving us the resources we need to complete the project successfully. I truly admire this about FlightAware, and it was another key factor in my decision to pursue an internship with FlightAware this summer.</p><p>One piece of advice to future interns is to ask questions whenever necessary and to not feel embarrassed or shy. I have asked my mentor many questions which he has answered thoroughly and quickly. Furthermore, he gave me a challenge to trust myself more. This challenge significantly increased my self-confidence, and I am extremely thankful for this. No one at FlightAware gets annoyed if someone asks questions, so please don’t hesitate to reach out if you need any help. 
Your mentor and manager are both there to support you.</p><h3 id="my-project">My Project</h3><p>During this summer, I was fortunate to have completed two projects.</p><h3 id="project-1">Project #1</h3><p>My first project was rewriting Surface Monitor, an existing internal tool used to monitor the performance and health of Surface Fuser and Surface Combiner, in Python3.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/A1.png" class="kg-image" alt loading="lazy" width="484" height="638"></figure><p>In the image above, you can see an overview of the current architecture for Surface Movement, which contains the programs used to track the surface movement of a flight. Surface Combiner combines and deduplicates ASDE-X and ADS-B daystream feeds, which are outputted by hcombiner, to produce a single input feed for surface_fuser; its output format is tab-separated values. Surface Fuser follows combiner output and interprets information, such as identifying positions reported for a target and correctly scheduling events in the future. Surface Monitor can ingest any number of feeds, but currently it ingests two: Surface Fuser and Surface Combiner. Surface Monitor does simple filtering of the data and emits every second.</p><p>The current monitor is in Tcl, a legacy language created 35 years ago, and is inefficient and lacks many features that are provided by modern languages such as Python3. My main task was to migrate Surface Monitor from Tcl to Python3, which took me about a month to complete.</p><p>The first task was to accept Command Line Interface (CLI) arguments. Surface Monitor uses combfeeder to gather the data for each feed, and several arguments have to be provided for it to execute. 
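</p><p>A minimal sketch of such an argument parser (the flag names here are invented for illustration, not the real surface_monitor_py interface):</p>

```python
import argparse

# Hypothetical CLI for a feed monitor; flag names are invented.
parser = argparse.ArgumentParser(description="Follow and validate feeds")
parser.add_argument("--feed", action="append", required=True,
                    help="feed to follow (repeatable)")
parser.add_argument("--interval", type=float, default=1.0,
                    help="reporting interval in seconds")

# Parse a sample command line instead of sys.argv for demonstration.
args = parser.parse_args(["--feed", "surface_fuser",
                          "--feed", "surface_combiner"])
print(args.feed)        # ['surface_fuser', 'surface_combiner']
print(args.interval)    # 1.0
```

<p>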
Furthermore, Surface Monitor accepts several arguments as configurations for the program.</p><p>After creating an argument parser, the next main task was to develop a function that would follow the feeds provided by the user and ingest their data. This included invoking combfeeder with the appropriate arguments and feed to watch, and ingesting their data to validate and analyze. However, it isn’t as simple as it sounds. The first challenge was to identify how to alternate between feeds. For example, we wanted to read one line from Surface Fuser, then one line from Surface Combiner, and so on. This isn’t straightforward since code executes synchronously by default (one step at a time), and the only way to alternate between feeds was to utilize Asynchronous Programming. Asynchronous Programming, in simple terms, means that multiple related operations can run concurrently without waiting for other tasks to complete. This paradigm allows us to switch between feeds without having to wait until the feed that is currently being read has ended.</p><p>The next step was to prevent parts of the program from blocking other parts. Essentially, we don't want tasks such as reading the data or reporting the analysis to block the program. For example, while we analyze the data or report the analysis, we still want to continue reading data from other feeds. We want two different parts of the program to run concurrently, i.e., multitask, and as I previously mentioned, Asynchronous Programming allows us to do this. So, I utilized asyncio, a Python library used to write concurrent code using the async/await syntax. Using asyncio, I created several tasks for reporting the analysis and for flushing state (current saved values) so it doesn’t interfere with future analysis.</p><p>The rest of the project was more straightforward. 
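</p><p>The alternating-feed idea above can be illustrated with a toy asyncio example (the feed names and lines are made up; the real monitor follows combfeeder output):</p>

```python
import asyncio

# Each feed gets its own task; awaiting inside the loop lets the event
# loop switch to the other feed between lines instead of blocking on one.
async def follow_feed(name, lines, out):
    for line in lines:
        await asyncio.sleep(0)  # yield control to other tasks
        out.append((name, line))

async def main():
    out = []
    await asyncio.gather(
        follow_feed("surface_fuser", ["f1", "f2"], out),
        follow_feed("surface_combiner", ["c1", "c2"], out),
    )
    return out

records = asyncio.run(main())
print(records)  # lines from the two feeds are interleaved
```

<p>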
I had to create different monitors for the metrics we wanted to report for a given interval, such as Throughput (number of messages read), Catchup Rate (how fast the monitor is reading the data compared to real time), and Latency (how delayed the monitor is compared to real time). After creating these monitors, I also had to create a validator that would validate each line from a given feed against the same criteria as the existing Surface Monitor. After this, I had to set up a Slack integration where the program reports important messages to a specific Slack channel. Once this was completed, I had to set up alarms using SCADA (a program used for real-time monitoring) and Zabbix, which is used to monitor metrics. Furthermore, I created unit tests for all of the monitors and the validator to ensure they worked correctly without any unexpected behavior.</p><p>After completing everything above, I had to create integration tests (testing all components of the program together) and performance tests (measuring the performance of surface_monitor_py and creating benchmarks). After creating the tests, the final steps were setting up a Docker container (an isolated environment), which allows the program to run the same regardless of the operating system. Once a Docker container was created and the program was running, the final step was to create GitHub Actions workflows (a configurable automated process that will run one or more jobs) to build the Docker image (instructions to build the container) and deploy to a host (run the Docker container on a specific host). 
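</p><p>One of those interval monitors can be sketched in miniature (class and field names are invented for illustration; the real monitors track more state):</p>

```python
# Toy per-interval monitor: counts messages (throughput) and compares
# the newest message timestamp to wall-clock time (latency).
class IntervalMonitor:
    def __init__(self):
        self.count = 0
        self.latest_msg_time = None

    def ingest(self, msg_timestamp):
        self.count += 1
        self.latest_msg_time = msg_timestamp

    def report(self, now):
        """Emit this interval's metrics and reset the message count."""
        latency = (now - self.latest_msg_time
                   if self.latest_msg_time is not None else None)
        metrics = {"throughput": self.count, "latency": latency}
        self.count = 0
        return metrics

m = IntervalMonitor()
m.ingest(100.0)
m.ingest(101.5)
print(m.report(now=102.0))  # {'throughput': 2, 'latency': 0.5}
```

<p>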
Below is an example of the output of surface_monitor_py.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/A2.png" class="kg-image" alt loading="lazy" width="936" height="412" srcset="https://flightaware.engineering/content/images/size/w600/2023/08/A2.png 600w, https://flightaware.engineering/content/images/2023/08/A2.png 936w" sizes="(min-width: 720px) 720px"></figure><h3 id="project-2">Project #2</h3><p>The second project I had the opportunity to work on was creating a Go-Around Detector. A go-around occurs when an aircraft is on its final approach and attempts to land, but the pilot determines landing conditions are unsafe and decides to “go around” the airport and come back for another attempt. Go-Arounds often get interchanged with Missed Approaches, which occur when an aircraft is on an IFR (instrument)/published approach, but the pilot decides that the IFR approach cannot be completed and defaults to either a new approach provided or an approach of their choice. Currently, the Go-Around detector classifies a Missed Approach as a Go-Around, since it’s not possible for us to detect a Missed Approach because we are not provided with several pieces of data necessary to identify one. Below is a picture of a Go-Around.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/A3.png" class="kg-image" alt loading="lazy" width="936" height="240" srcset="https://flightaware.engineering/content/images/size/w600/2023/08/A3.png 600w, https://flightaware.engineering/content/images/2023/08/A3.png 936w" sizes="(min-width: 720px) 720px"></figure><p>As I previously stated, my task was to create a Go-Around detector that would detect go-arounds for a flight and emit a go-around event, which can be used to provide a more accurate estimate for arrival times, since Go-Arounds always result in a delay. 
Currently, we have an existing program called the Aircraft Delay Detector (ADD), which analyzes thousands of position messages per second, each containing information such as an aircraft’s speed, altitude, heading, location, and timestamp. I had to integrate the Go-Around detector with ADD and ensure seamless integration.</p><p>The first step in designing the logic for the Go-Around detector was filtering. I wanted to identify the filters for attaching the Go-Around detector to flights, since we don’t want to create and attach the Go-Around Detector to all flights at all given moments. First, we currently only want to look at non-ad-hoc flights. Non-ad-hoc flights are flights that are scheduled, while ad-hoc flights are non-scheduled. If a flight is not scheduled, we don’t have a destination for that flight, and the destination is required for us to detect a go-around since go-arounds only occur when an aircraft is close to the destination. That leads us to the second filter, which is proximity to the destination airport. We didn’t want to attach the Go-Around detector to flights that are not close to the destination airport (within 15 miles of the airport). Even though 15 miles is still far from the airport, this allows us to get more data which can be used to increase the accuracy of the Go-Around detector. The third filter we decided on was altitude. Go-Arounds typically occur under 2,500 feet, so I set the altitude threshold at 4,000 feet above the elevation of the destination airport. If a flight is more than 4,000 feet above the elevation of the destination airport, we exclude it and do not attach the Go-Around Detector. 
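</p><p>Taken together, the attachment decision reduces to a small predicate. This sketch uses invented field names with the thresholds described above:</p>

```python
# Hypothetical filter deciding whether to attach the Go-Around detector.
def should_attach(flight, airport_elev_ft):
    return (flight["scheduled"]                        # non-ad-hoc only
            and flight["dist_to_dest_mi"] <= 15        # near destination
            and flight["alt_ft"] - airport_elev_ft <= 4000)  # low enough

candidate = {"scheduled": True, "dist_to_dest_mi": 9.0, "alt_ft": 3500}
print(should_attach(candidate, airport_elev_ft=100))  # True
```

<p>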
To summarize, non-ad-hoc flights, proximity to the airport, and altitude above the elevation of the destination airport were the 3 main filters I added to ensure we didn’t attach Go-Around Detectors to flights that were not attempting to land.</p><p>The next step was creating the actual logic for detecting a Go-Around. From the image above of a go-around, you can see a logic that could potentially be used. You can see that the aircraft consistently descends towards the airport but then starts to consistently ascend as it gets close to the airport. This is always the case with a go-around: an aircraft always switches from descending to ascending when it gets close to the airport. And this was the logic I used to accurately determine a go-around. Currently, ADD creates a Position object for each of the 10 last-seen positions and stores key properties (latitude, longitude, timestamp, altitude, vertical rate, ground speed, and aircraft identifier). Altitude is the key metric for the Go-Around detector at the moment, since it is the metric which results in the simplest yet extremely accurate Go-Around detection logic.</p><p>First, I created a function that calculates the average altitude of the 10 last-seen positions, and I stored that in a list/array holding the 15 most recent average altitudes. The average altitudes are recalculated each time we ingest a new position, since the 10 last-seen positions change each time to include the most recent position. I’m storing these average altitudes since a plane’s altitude can vary due to turbulence; the average altitude is a more accurate representation.</p><p>Next, I have another function that analyzes the average altitudes and determines whether it’s a possible go-around. 
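</p><p>A toy version of that descend-then-ascend analysis might look like the following (the numbers and helper name are invented; the real detector also applies the consistency and distance rules described in this post):</p>

```python
# Toy version of the possible-go-around check: average altitudes must
# fall, switch direction exactly once, then rise.
def is_possible_go_around(avg_alts, min_leg=3):
    deltas = [b - a for a, b in zip(avg_alts, avg_alts[1:])]
    signs = [1 if d > 0 else -1 for d in deltas if d != 0]
    if not signs:
        return False
    # Count direction changes between descending (-1) and ascending (+1).
    switches = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    return (switches == 1 and signs[0] == -1 and signs[-1] == 1
            and signs.count(-1) >= min_leg and signs.count(1) >= min_leg)

descend_then_climb = [3000, 2500, 2000, 1500, 1900, 2400, 3000]
print(is_possible_go_around(descend_then_climb))  # True
```

<p>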
I have a switch count that tracks the number of times an aircraft switches between ascending and descending, computed by walking the average altitudes left to right and comparing each one to the previous one. As I stated, looking at the average altitude eliminates inconsistency that could occur due to turbulence or bad data, so any switch from descending to ascending, or vice versa, is an accurate representation of whether an aircraft is descending or ascending. I am also counting the number of ascending (or unchanged) average altitudes and the number of descending (or unchanged) average altitudes, since we want the aircraft to descend consistently and then ascend consistently; one average altitude could be incorrect if we received several bad positions, so we want to see a consistent pattern of either ascending or descending. I’m also allowing for unchanged altitudes since consecutive average altitudes can be equal when an aircraft is descending slowly. We’ll accept 2 equal average altitudes, but more than 2 means the aircraft is maintaining its altitude, so we exclude any additional equal values. For example, if we see 5 equal average altitudes, we only count 2 of the 5: two equal averages can still be part of a descent or climb, but 3 or more cannot. So if the aircraft switched exactly once, its current status after the switch is ascending (meaning it went from descending to ascending), and it has consistently descended and then consistently ascended, then it’s a possible go-around.</p><p>After detecting a possible go-around, we have to check whether the aircraft is close to the airport. Aircraft always get extremely close to the airport during a go-around, if not directly over it, so their distance from the airport should be very small. 
The threshold I used was 1 mile. So if we have detected a possible go-around AND (the aircraft is within 1 mile of the destination airport OR the aircraft gets within 5 miles of the airport and starts moving away from it), it’s a go-around. The 5-mile condition might seem to contradict what I said about an aircraft getting extremely close to, if not directly over, the airport; however, sometimes an aircraft only gets within 5 miles before the pilot decides to abort the landing, and this condition accounts for such cases. The 5-mile condition better aligns with Missed Approaches, but as I stated, we currently classify both as Go-Arounds.</p><p>After completing the Go-Around detector logic, I created 10 tests for 10 flights and integrated them with the existing testing logic within ADD. With minor changes, the Go-Around detector tests now run alongside the other tests within ADD.</p><p>This logic yielded <strong>97.5</strong>% accuracy when run on <strong>100</strong>+ live flights and <strong>10</strong> test flights that included special edge cases. This is in terms of false positives, since it’s currently not possible to identify false negatives. 
Additionally, the detector identifies a go-around within one minute of its occurrence, so detection is quite fast.</p><p>Attached below are two images of flight TAM3343, each representing one of the flight's go-arounds; below those images are the log messages emitted for each go-around.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/Screenshot-2023-08-29-at-10.37.11-AM.png" class="kg-image" alt loading="lazy" width="1179" height="364" srcset="https://flightaware.engineering/content/images/size/w600/2023/08/Screenshot-2023-08-29-at-10.37.11-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/08/Screenshot-2023-08-29-at-10.37.11-AM.png 1000w, https://flightaware.engineering/content/images/2023/08/Screenshot-2023-08-29-at-10.37.11-AM.png 1179w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/Screenshot-2023-08-29-at-10.38.40-AM.png" class="kg-image" alt loading="lazy" width="1321" height="199" srcset="https://flightaware.engineering/content/images/size/w600/2023/08/Screenshot-2023-08-29-at-10.38.40-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/08/Screenshot-2023-08-29-at-10.38.40-AM.png 1000w, https://flightaware.engineering/content/images/2023/08/Screenshot-2023-08-29-at-10.38.40-AM.png 1321w" sizes="(min-width: 720px) 720px"></figure>
        <br>
        <p>
            <a href="https://flightaware.engineering/2023-intern-summer-projects/">2023 Intern Summer Projects - Part 1</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Data and the Power of Story ]]></title>
        <description><![CDATA[ In the 18 years since FlightAware was founded, the amount of data informing every aspect of our lives has grown dramatically, and our modern stories are often woven together with numbers and code as much as they are with words and pictures. ]]></description>
        <link>https://flightaware.engineering/data-and-the-power-of-story/</link>
        <guid>https://flightaware.engineering/data-and-the-power-of-story/</guid>
        <pubDate>Mon, 07 Aug 2023 11:42:38 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/07/markus-spiske-iar-afB0QQw-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>On a ruggedly mountainous Indonesian island called Sulawesi sits a cave containing what is possibly the earliest known example of human storytelling. According to uranium-series analysis, <a href="https://www.scientificamerican.com/article/is-this-indonesian-cave-painting-the-earliest-portrayal-of-a-mythical-story/?ref=flightaware.engineering"><u>the below scene</u></a> was etched into limestone some 44,000 years ago, and it shows us a group of eight beings in the act of communal hunting. These ancient discoveries often shine light onto our own experience and reveal an incredible, though unsurprising fact: human beings love stories, we’ve been that way for a long time, and there’s no indication it’s going to change!</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/12.png" class="kg-image" alt loading="lazy" width="264" height="264"></figure><p>In the 18 years since FlightAware was founded, the amount of data informing every aspect of our lives has grown dramatically, and our modern stories are often woven together with numbers and code as much as they are with words and pictures. <em><strong>FlightAware’s own evolution into the aviation intelligence company it is today cannot be separated from the core objective of making meaning from data. </strong></em>This began with basic flight tracking in 2005 and has expanded to cover almost every data point in the aviation ecosystem.</p><p>Not unlike the storytellers in that Indonesian cave, we find ourselves at a pivotal moment in history. 
We are standing at the foot of a colossal mountain of data, and the pressing question is less about the volume of data we have available and more about how we can use it to tell meaningful stories—stories that empower the aviation industry to reach new heights of efficiency, safety, and innovation.</p><h3 id="data-driven-storytelling-shaping-the-future-of-aviation">Data-Driven Storytelling: Shaping the Future of Aviation</h3><p>As with any good story, preparation and planning are essential. This is particularly true when it comes to data, and it’s why FlightAware has spent years developing systems that guarantee accuracy and dependability in everything from Flight Tracking to <a href="https://blog.flightaware.com/neural-networks?ref=flightaware.engineering"><u>Deep Learning Neural Networks</u></a>. We do this in a number of ways, but much of it begins with our <a href="https://flightaware.engineering/flightawares-terrestrial-ads-b-network/"><u>ADS-B network</u></a>, which is a critical source of global aircraft positions. By fusing these positions together with hundreds of additional data feeds from around the world, we use our proprietary data engine—called HyperFeed—to integrate, process, and enrich that data in real time. This commitment to data quality through rigorous governance, stringent <a href="https://flightaware.engineering/flight-tracking-access-control/"><u>pedigree filters</u></a>, and constant scrutiny ensures that we’re able to ‘tell the story’ of a flight and do it in a way that is both compelling and reliable.</p><p>The post-processed data that is then available to our engineering crews opens countless avenues for solving real-world problems. In particular, we’ve found great value in visualizing historical flight behavior in three dimensions using H3 (hexagonal hierarchical geospatial indexing). In short, this approach generates a grid of hex bins that are then overlaid on a defined geographical area.
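</p><!--kg-card-begin: markdown-->
As a rough sketch of the hex-bin aggregation idea, the snippet below averages altitude per cell. A real pipeline would use Uber's H3 library (h3-py) to index positions into hexagonal cells; here `grid_key` is a crude square-grid stand-in so the example stays dependency-free, and all names are illustrative.

```python
# Sketch of averaging altitude per cell. A real pipeline would use
# Uber's H3 library (h3-py) to index positions into hexagonal cells;
# grid_key is a crude square-grid stand-in to avoid the dependency.
from collections import defaultdict

def grid_key(lat, lon, deg=0.1):
    """Stand-in for an H3 cell index: snap to a ~0.1-degree grid."""
    return (round(lat / deg), round(lon / deg))

def average_altitude_per_bin(positions, bbox):
    """positions: iterable of (lat, lon, altitude_ft) tuples.
    bbox: (min_lat, min_lon, max_lat, max_lon) bounding box."""
    min_lat, min_lon, max_lat, max_lon = bbox
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [altitude sum, count]
    for lat, lon, alt in positions:
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue  # isolate track data to the region of interest
        cell = sums[grid_key(lat, lon)]
        cell[0] += alt
        cell[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

The per-cell averages can then be handed to a renderer (Esri, in our case) to extrude each cell by altitude.
<!--kg-card-end: markdown--><p>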
FlightAware recently used this approach to evaluate Singaporean airspace and determine high-risk areas for drone operation, helping deconflict aircraft and drone flight paths. To do this, we isolate track data using a script that builds a bounding box for the established geographical region. We then extract parameters such as speed and altitude from our dataset and apply those to the hex bins (in this case, average altitude per hex). Using Esri, we were able to construct a three-dimensional view of the airspace, which can be seen below. To plan for system redundancy, a similar approach is being used by FlightAware’s ADS-B engineering crew, leveraging Uber’s H3 grid to evaluate areas of risk where coverage may be adequate but we are dependent on a suboptimal number of ADS-B receivers. For more information about how we’re applying H3, you can watch our<a href="https://go.flightaware.com/esriwebinar?ref=flightaware.engineering"> joint webinar with Esri</a>.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/Screenshot-2023-08-06-at-9.31.47-PM-1.png" class="kg-image" alt loading="lazy" width="2000" height="968" srcset="https://flightaware.engineering/content/images/size/w600/2023/08/Screenshot-2023-08-06-at-9.31.47-PM-1.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/08/Screenshot-2023-08-06-at-9.31.47-PM-1.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/08/Screenshot-2023-08-06-at-9.31.47-PM-1.png 1600w, https://flightaware.engineering/content/images/size/w2400/2023/08/Screenshot-2023-08-06-at-9.31.47-PM-1.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>The beauty of data-driven storytelling lies not just in its visual appeal but also in its power to uncover hidden trends, detect anomalies, and predict future behavior. So, how do we do that?
One particular focus is through <a href="https://flightaware.com/commercial/foresight/?ref=flightaware.engineering">FlightAware Foresight</a>, which is a suite of Machine Learning tools that cover aircraft arrival predictions, taxi out times, and arrival runways. To learn more about the intricacies of our model architecture, check out <a href="https://flightaware.engineering/next-generation-model-serving/">this excellent article</a> written by one of our very own engineers, Caroline Rodewig. But needless to say, <em><strong>FlightAware leverages a massive historical archive of flight metadata to train exceptionally accurate ML models for aviation</strong></em>, and we’ve accomplished a feat many have sought out since the advent of modern aviation. We predict critical flight events through a mix of Neural Network and LightGBM based models, and these predictions enable airline operators and global airport hubs to use FlightAware Foresight to drive decision-making in real-time and improve everything from crew utilization and gate readiness to fuel planning and route optimization.</p><p>Our approach to data visualization within the realm of machine learning has been particularly helpful in the storytelling process to clarify the tangible value of predictive aviation data. Through tools like <a href="https://labs.flightaware.com/?ref=flightaware.engineering">FlightAware Foresight Labs</a> and our interactive dashboards, we've made complex information accessible, engaging, and, most importantly, useful! As an example of this, in our latest Streamlit application for <a href="https://labs.flightaware.com/What_If_My_Gate_Changes?ref=flightaware.engineering">“what if my gate changes”</a> users can view quantile estimates for global flights (see below). Within the realm of statistics, quantiles divide probability distributions into equally likely regions. 
As an example, the median is represented as the 50th percentile, which means 50% of observations in the distribution fall below that value. This functionality was not available in our earlier model architecture; it is now enabled through our Neural Network and supports a multitude of use cases by allowing operators and data scientists to bias predictions toward specific areas of need (like making sure flights will not arrive <strong>later</strong> than a prediction). With tools like Streamlit, Grafana, Tableau, Esri, and many others, we are representing flight behaviors and trends in a way that’s easy to comprehend, which is critical in the constrained environment of modern-day operations.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/11.png" class="kg-image" alt loading="lazy" width="1661" height="996" srcset="https://flightaware.engineering/content/images/size/w600/2023/08/11.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/08/11.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/08/11.png 1600w, https://flightaware.engineering/content/images/2023/08/11.png 1661w" sizes="(min-width: 720px) 720px"></figure><h3 id="data-visualization-unlocking-value">Data Visualization: Unlocking Value</h3><p>Decoding value from petabytes of aviation data does, however, present unique challenges. In its raw unformed state, this data is like a nebulous cloud of facts and figures, laden with possibilities but infused with complexity. Our challenge is wrangling this magnitude of data to isolate meaningful insights amidst the static. This looks different across our various engineering teams and ranges from building ETL pipelines for Machine Learning development, all the way to executing complex SQL queries across thousands of available data tables to produce reports for a variety of customers.
In essentially every use case, the raw data must be carefully analyzed, cleansed, and translated into a comprehensible form before its true value can be revealed.</p><p>While <em><strong>visualizations are not the only means of getting value from raw data, as we’ve established, human beings are drawn to patterns and pictures. </strong></em>From ancient constellations mapped across the night sky to the user interfaces we carry in our pockets and wear on our wrists, we rely on visual representations to help us comprehend remarkably complex subjects. For example, we can reflect on Anscombe's Quartet (see figure below), a case of four datasets with identical statistical properties that, when visualized, reveal dramatically different relationships. This is a great example of how the power of visuals can help us interpret large amounts of information.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/08/Screenshot-2023-07-26-at-1.14.49-PM.png" class="kg-image" alt loading="lazy" width="492" height="367"><figcaption><em><a href="https://en.wikipedia.org/wiki/Anscombe%27s_quartet?ref=flightaware.engineering">Anscombe's Quartet</a> comprises four </em><a href="https://en.wikipedia.org/wiki/Data_set?ref=flightaware.engineering"><em><u>data sets</u></em></a><em> that have nearly identical simple </em><a href="https://en.wikipedia.org/wiki/Descriptive_statistics?ref=flightaware.engineering"><em><u>descriptive statistics</u></em></a><em> yet have very different </em><a href="https://en.wikipedia.org/wiki/Probability_distribution?ref=flightaware.engineering"><em><u>distributions</u></em></a><em> and appear very different when </em><a href="https://en.wikipedia.org/wiki/Plot_(graphics)?ref=flightaware.engineering"><em><u>graphed</u></em></a><em>.</em></figcaption></figure><p>In his seminal work "Thinking Fast and Slow," psychologist Daniel Kahneman discusses two cognitive systems that steer much of our 
thoughts and behaviors. Dubbed ‘System 1’ and ‘System 2’, these work in tandem but have distinct roles. ‘System 1’ is the agile side of us, operating intuitively and quickly with little conscious effort, like a kind of autopilot. On the other hand, ‘System 2’ is our meticulous navigator, intensely allocating attention to complex mental operations that demand focus.</p><p>Skillful data communicators must understand these systems and how they work within our brains. What is perhaps surprising is that many analysts often unknowingly push their audiences to decipher data through the more demanding ‘System 2’. To illustrate this, let's refer to the concept of 'preattentive attributes,' as explained in "The Big Book of Dashboards".</p><p><em><strong>Preattentive attributes are sensory signposts our brains immediately process before we consciously focus on anything.</strong> </em>There's a whole array of these attributes, like color, size, length, and orientation, which act as subconscious cues. For instance, in the first diagram below, how many 9s are there? It seems like a simple question, but without a way to distinguish them, we involuntarily engage ‘System 2’, triggering a difficult mental computation.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/Screenshot-2023-07-26-at-1.20.30-PM.png" class="kg-image" alt loading="lazy" width="492" height="419"></figure><p>However, as soon as we color-code the 9s, finding and counting them becomes a simple task. In this instance, color serves as a potent preattentive attribute. As we scan a scene or evaluate a chart, our brains process preattentive attributes in less than a second. At FlightAware, we’ve used this fundamental human behavior to influence the design of tools like our Foresight Dashboard, where colors and symbols indicate when FlightAware’s arrival predictions differ—early or late—from legacy sources of non-ML arrival times. 
This creates nearly immediate recognition of flights that may demand action: things like readying gate resources for early aircraft or managing tight connections for those expected to be late. The art and science of effectively communicating with data lies in leveraging these attributes, capitalizing on the mental strengths of efficiency and speed available through ‘System 1’.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/08/Screenshot-2023-07-26-at-1.20.40-PM.png" class="kg-image" alt loading="lazy" width="492" height="419"></figure><h3 id="conclusion">Conclusion:</h3><p><em><strong>By prioritizing data accuracy, integrity, and visualization, FlightAware ensures that the stories being told are instrumental for enhancing efficiency and innovation throughout the aviation industry. </strong></em>The next phase of this commitment involves a focus on building performant visual tools that connect directly to FlightAware data through cloud infrastructure.</p><p>Our on-prem architecture reliably and securely manages nearly 7 petabytes of historical data. We absolutely must maintain this level of stability as we move forward and continue to innovate, and we must do this while simultaneously exploring ways to maximize performance through optimizations that cloud infrastructure offers, especially when querying large amounts of historical data. This is a primary reason we’ve begun experimenting with data replication in AWS Redshift to assess how it may be used to enhance our ability to isolate trends and patterns, and to do it more quickly than previously possible.
We’re constantly evaluating how the tools and technologies we develop are being applied, and our users range from travelers looking for when to leave for the airport, all the way to the world's largest organizations who depend on our data for their operations.</p><p>As we’ve outlined, data is fundamental to FlightAware’s business; however, we don’t simply produce charts, CSV files, and predictions—we showcase the narratives that influence the future of air travel. From ancient etchings in the caves of Sulawesi to today's sophisticated technical landscape, the origins of human storytelling are elemental, and since its inception, FlightAware has been at the forefront of reimagining storytelling in the realm of aviation.</p><p>Reference Links:</p><ul><li><a href="https://www.nature.com/articles/s41586-019-1806-y.epdf?sharing_token=61dQP3jov3V9L1RLPwOh89RgN0jAjWel9jnR3ZoTv0OrHLJXX3frqDE-UdL7M7YyKWotJktQ9H-bxocXSBjzKV95hrVg7jBcpyDJegxKmqe6XMWr7SVHxPxP9L5GE2E1I44JEcFesfQLNL3WNOYehv7oihHodLKmJcT56_BvlouFTitDlth_RlhG_T3DJAtOkeAYW1aTZ1y2qjpMYCEiETlw1wHPgdJxkofr-ZXWaFA%3D&tracking_referrer=www.scientificamerican.com&ref=flightaware.engineering">Indonesian art discovery</a></li><li><a href="https://www.bigbookofdashboards.com/?ref=flightaware.engineering">Big Book of Dashboards</a></li><li><a href="https://us.macmillan.com/books/9780374533557/thinkingfastandslow?ref=flightaware.engineering">Thinking, Fast and Slow</a></li><li><a href="https://en.wikipedia.org/wiki/Anscombe%27s_quartet?ref=flightaware.engineering">Anscombe's quartet</a>‌‌</li></ul> 
        <br>
        <p>
            <a href="https://flightaware.engineering/data-and-the-power-of-story/">Data and the Power of Story</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Next Generation Model Serving ]]></title>
        <description><![CDATA[ FlightAware’s Predictive Technologies team is responsible for training and serving machine learning models to predict key flight elements. Our primary product is ETA predictions – en-route landing time (ON) and gate arrival time (IN) predictions ]]></description>
        <link>https://flightaware.engineering/next-generation-model-serving/</link>
        <guid>https://flightaware.engineering/next-generation-model-serving/</guid>
        <pubDate>Mon, 05 Jun 2023 11:15:21 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/06/aaron-mclean-pf7BliMKKxM-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>As a Senior Software Engineer and lead of the Predictive Technology crew, Caroline Rodewig works to expand the scope and quality of FlightAware's predictive applications.</em></p><p>FlightAware’s Predictive Technologies team is responsible for training and serving machine learning models to predict key flight elements. Our primary product is ETA predictions – en-route landing time (ON) and gate arrival time (IN) predictions.</p><h2 id="model-server">Model Server</h2><h3 id="motivation">Motivation</h3><p>Producing real-time predictions requires two components: feature vector building and model evaluation (aka inference).</p><p>Our original infrastructure (the Python streamer) handled both of these in one place. It followed our flight data feed to build feature vectors and evaluated those vectors on models it loaded into memory. The streamer was not performant enough to run for all the airports we wished to make predictions for, so we ran many copies of it, with each streamer responsible for a subset of airports.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screenshot-2023-05-31-at-9.41.39-PM.png" class="kg-image" alt loading="lazy" width="1178" height="422" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screenshot-2023-05-31-at-9.41.39-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screenshot-2023-05-31-at-9.41.39-PM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screenshot-2023-05-31-at-9.41.39-PM.png 1178w" sizes="(min-width: 720px) 720px"></figure><p>While this pattern worked for us for several years, it was not a good long-term solution for several reasons:</p><p><strong>1.     It was inefficient. </strong>Each streamer had to read the entire flight data stream to pick out data for a tiny fraction of flights. Each streamer was responsible for about 20 destination airports. 
FlightAware tracks flights into over 20,000 distinct destination airports, and 400 of those destinations average at least 100 flights per day. In the best-case scenario, each streamer was making use of only 5% of the flight data it read. Feature computation that could be cached across airports was recomputed instead. For example, we loaded GFS weather data on each streamer independently, when a lot of that processing could have been reused across destinations. It’s typically much faster to perform inference on feature vectors in parallel rather than sequentially – i.e., by batching several feature vectors and evaluating them together. Our architecture could only perform inference sequentially, and it would have taken significant work to change it to perform batch inferences.</p><p><strong>2.     It was difficult to manage.  </strong>We were running 60 streamer instances (across 12 hosts) in each data center (120 streamers total). Managing this was unwieldy, and it added complexity to deployments, which took around 20 minutes as we staggered the restarts across hosts. It also just used a lot of hardware inefficiently.  Delegating airports to streamers was a manual process and was prone to imbalance. It was possible that one streamer would run 20 huge airports and fall behind while another streamer was barely loaded with 20 tiny airports. Correcting these imbalances required manually copying model files to different hosts and hoping that the streamers would keep up when they loaded the new suite of models.</p><p><strong>3.     A future neural network model release would exacerbate both problems described above.  </strong>We were working on training a single neural network ETA model to replace the many LightGBM models. Testing showed that it would increase our predictions' stability but would be significantly slower to evaluate. In the best-case scenario, inference on a neural network would take five times as long as a LightGBM model. 
To one-for-one replace our LightGBM models with a neural network, we would have needed to double or triple our hardware – an unacceptable solution.</p><p><strong>4.     It tied us to languages with strong machine learning support. </strong>The streamer needed to be written in a language which supported loading and evaluating a model; for the most part, that tied us to Python. As we added new model features, the performance limitations of Python became fairly apparent.</p><h3 id="solution">Solution</h3><p>These problems were far from unique to our infrastructure. A typical pattern within the MLOps world is to split apart feature vector building and model evaluation. Rather than loading models into memory directly, the streamer would call out to a model server, which would perform feature vector batching and inference.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screenshot-2023-05-31-at-9.51.18-PM.png" class="kg-image" alt loading="lazy" width="926" height="574" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screenshot-2023-05-31-at-9.51.18-PM.png 600w, https://flightaware.engineering/content/images/2023/06/Screenshot-2023-05-31-at-9.51.18-PM.png 926w" sizes="(min-width: 720px) 720px"></figure><!--kg-card-begin: markdown--><p>This architecture has a variety of benefits:</p>
<ol>
<li><strong>It enables independent scaling of feature vector building and model evaluation.</strong> For example, if model evaluation becomes more expensive, we can scale up the model server pool while leaving the number of streamers the same. We can even make scaling automatic by monitoring for increased model server response time.</li>
<li><strong>It removes the language dependency from feature vector building.</strong> Rather than requiring machine learning language support, the streamers just need to be able to make asynchronous HTTP/gRPC requests.</li>
<li><strong>It provides native dynamic batching support.</strong> As streamers make requests to the model server, it will dynamically assemble batches of requests to evaluate. The streamers don’t need any knowledge of this, which keeps them simple.</li>
<li><strong>It moves us in the direction of service-based architecture.</strong> FlightAware’s Web and Backend teams have also been working on stack improvements with microservices, and recently completed a <a href="https://flightaware.engineering/the-best-tool-for-the-job-assessing-languages-for-microservice-development-at-flightaware/">language assessment</a> to begin development in earnest.</li>
</ol>
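For a flavor of what such a request looks like, here is a minimal sketch that assembles an inference payload in the KServe v2 REST format, a standard that several model servers implement. The model name, tensor name, and shapes are illustrative assumptions, not our production values:

```python
# Assemble a KServe v2-style inference request with the standard
# library; the model name, tensor name, and shapes are illustrative.
import json

def build_infer_request(model, vectors, output_names):
    """vectors: a batch of feature vectors; output_names: tensors to return."""
    body = {
        "inputs": [{
            "name": "INPUT",
            "shape": [len(vectors), len(vectors[0])],
            "datatype": "FP32",
            "data": [x for vec in vectors for x in vec],  # row-major flatten
        }],
        "outputs": [{"name": name} for name in output_names],
    }
    # The payload would be POSTed to http://<server>/v2/models/<model>/infer
    return f"/v2/models/{model}/infer", json.dumps(body)
```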
<!--kg-card-end: markdown--><h3 id="triton-inference-server"><a href="https://github.com/triton-inference-server?ref=flightaware.engineering" rel="noreferrer noopener">Triton Inference Server</a></h3><p>We decided to use NVIDIA’s <a href="https://github.com/triton-inference-server/server/tree/main/docs?ref=flightaware.engineering#triton-inference-server-documentation" rel="noreferrer noopener">Triton Inference Server</a> as our model server. It’s highly performant, well-optimized, and is particularly well-suited to our practice of training many models per product. It also supports a wide variety of machine learning and deep learning model backends, including <a href="https://github.com/triton-inference-server/tensorflow_backend?ref=flightaware.engineering" rel="noreferrer noopener">TensorFlow</a>, <a href="https://github.com/triton-inference-server/pytorch_backend?ref=flightaware.engineering" rel="noreferrer noopener">PyTorch</a>, <a href="https://github.com/triton-inference-server/onnxruntime_backend?ref=flightaware.engineering" rel="noreferrer noopener">ONNX Runtime</a>, and <a href="https://github.com/triton-inference-server/fil_backend?ref=flightaware.engineering" rel="noreferrer noopener">tree-based models</a> (including XGBoost and LightGBM).</p><p>Triton is much simpler to set up than most other model servers; all you need to do is provide it with one or more properly configured <a href="https://github.com/triton-inference-server/server/blob/main/docs/model_repository.md?ref=flightaware.engineering" rel="noreferrer noopener">model repositories</a>. 
The repository must have a flat structure and each model directory must contain <a href="https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md?ref=flightaware.engineering" rel="noreferrer noopener">configuration</a> for that model.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screenshot-2023-05-31-at-9.55.22-PM.png" class="kg-image" alt loading="lazy" width="1542" height="494" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screenshot-2023-05-31-at-9.55.22-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screenshot-2023-05-31-at-9.55.22-PM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screenshot-2023-05-31-at-9.55.22-PM.png 1542w" sizes="(min-width: 720px) 720px"></figure><p>Triton supports API access through both HTTP and gRPC, using a <a href="https://github.com/kserve/kserve/blob/master/docs/predict-api/v2/grpc_predict_v2.proto?ref=flightaware.engineering" rel="noreferrer noopener">standard API</a> defined by <a href="https://kserve.github.io/website/0.9/?ref=flightaware.engineering" rel="noreferrer noopener">KServe</a>. The available APIs include:</p><ul><li>check server health</li><li>list models available to the server and whether or not they're loaded</li><li>load model so that it can be used for inference</li><li>perform inference with provided input vector</li><li>view model configuration</li><li>view model statistics, including number of calls, total duration spent performing inference, and the time the last inference was made</li></ul><h3 id="model-ensembles">Model ensembles</h3><p>One of the key features of Triton is the ability to define an <a href="https://github.com/triton-inference-server/server/blob/main/docs/architecture.md?ref=flightaware.engineering#ensemble-models" rel="noreferrer noopener">ensemble</a> model. 
An ensemble is simply a DAG of other models within Triton, but it otherwise has the same properties as any other model. Independent steps can be run in parallel, and the output of one model can even be piped to the input of another model.</p><p>This is particularly useful for our LightGBM ETA models because the ON and IN models use exactly the same features. Using an ensemble allows us to send one request and get two predictions back, cutting the total number of requests we need to make in half.</p><p>Here’s an example ETAs ensemble model configuration (in Triton’s protobuf text format). Note that the same input is sent to both ON and IN models and that both model outputs are made available in the final output.</p><pre><code class="language-protobuf">name: "v3.0.0-KHOU" 
platform: "ensemble" 
max_batch_size : 10000 
input [ 
  { 
    name: "INPUT" 
    data_type: TYPE_FP32 
    dims: [ 515 ] 
  } 
] 
output [ 
  { 
    name: "EON" 
    data_type: TYPE_FP32 
    dims: [ 1 ] 
  }, 
  { 
    name: "EIN" 
    data_type: TYPE_FP32 
    dims: [ 1 ] 
  } 
] 
ensemble_scheduling { 
  step [ 
    { 
      model_name: "v3.0.0-KHOU-eon" 
      model_version: 1 
      input_map { 
        key: "X" 
        value: "INPUT" 
      } 
      output_map { 
        key: "variable" 
        value: "EON" 
      } 
    }, 
    { 
      model_name: "v3.0.0-KHOU-ein" 
      model_version: 1 
      input_map { 
        key: "X" 
        value: "INPUT" 
      } 
      output_map { 
        key: "variable" 
        value: "EIN" 
      } 
    } 
  ] 
} </code></pre><h3 id="running-triton">Running Triton</h3><p>We decided to run Triton in Kubernetes to take advantage of Kubernetes’ scaling and failover capabilities. Our deployment ended up being a simplified version of Triton’s <a href="https://github.com/triton-inference-server/server/tree/main/deploy/k8s-onprem?ref=flightaware.engineering" rel="noreferrer noopener">Helm chart</a>, with a load balancer service and a deployment of Triton pods. Each pod maintains its own volume with a model repository that is copied in on pod startup.</p><p>This service was FlightAware’s first production-critical service deployed on Kubernetes, so keeping things simple was top-priority. In the future, we intend to add:</p><ul><li>A persistent volume with a model repository that is used by all Triton pods, and which can be deployed to via automation tools like Github Actions.</li><li>Horizontal pod autoscaling based on the servers' response and inference times. Our day-to-day load is typically stable but having autoscaling would protect us in case of air traffic spikes and general traffic growth.</li></ul><h3 id="other-model-server-contenders">Other model server contenders</h3><p>There are a number of other open source model servers available, including <a href="https://www.seldon.io/solutions/open-source-projects/core?ref=flightaware.engineering" rel="noreferrer noopener">Seldon Core</a>, <a href="https://docs.ray.io/en/latest/serve/index.html?ref=flightaware.engineering" rel="noreferrer noopener">Ray Serve</a>, <a href="https://www.tensorflow.org/tfx/serving/architecture?utm_source=thenewstack&utm_medium=website&utm_content=inline-mention&utm_campaign=platform" rel="noreferrer noopener">TensorFlow Serving</a>, <a href="https://pytorch.org/serve/?utm_source=thenewstack&utm_medium=website&utm_content=inline-mention&utm_campaign=platform" rel="noreferrer noopener">TorchServe</a>, <a 
href="https://github.com/bentoml/BentoML?utm_source=thenewstack&utm_medium=website&utm_content=inline-mention&utm_campaign=platform" rel="noreferrer noopener">BentoML</a>, and <a href="https://www.kubeflow.org/docs/external-add-ons/kserve/kserve/?utm_source=thenewstack&utm_medium=website&utm_content=inline-mention&utm_campaign=platform" rel="noreferrer noopener">KServe</a>. We wanted a platform that was backend-agnostic and scalable, so we compared Triton against Seldon Core and Ray Serve.</p><h3 id="seldon-core"><a href="https://www.seldon.io/solutions/open-source-projects/core?ref=flightaware.engineering" rel="noreferrer noopener">Seldon Core</a></h3><p>Seldon Core is a Kubernetes-native solution that is used pretty broadly in MLOps infrastructures. It offers many neat features, including outlier detection, A/B testing, and sophisticated graph-based model ensembling.</p><p>Its main downside is its complexity – it has a steep learning curve, and I wasn’t convinced that the complexity/feature tradeoff would come out in our favor. Additionally, it is poorly suited to our practice of training hundreds/thousands of models. It seems targeted at running one model per Kubernetes pod (we would run out of pods pretty quickly).</p><h3 id="ray-serve"><a href="https://docs.ray.io/en/latest/serve/index.html?ref=flightaware.engineering" rel="noreferrer noopener">Ray Serve</a></h3><p>Ray is a framework that is designed to scale Python applications – it feels a lot like ‘Kubernetes for Python.’ It allows you to submit jobs to a cluster, scale them up and down, and monitor them easily. We are already using Ray for training certain models and are hoping to expand its use. 
Ray Serve is a library built on top of Ray which provides some conveniences when it comes to model serving.</p><p>While Ray Serve could have worked for us, to use it we would have had to implement a lot of infrastructure that other platforms already offer – rolling our own platform seemed foolish when there are other, more sophisticated options out there.</p><h2 id="streamer-rewrite">Streamer Rewrite</h2><h3 id="motivation-1">Motivation</h3><p>The naive approach to using the model server was to simply replace the streamers' in-memory model evaluation with an HTTP call. Unfortunately, this approach was unable to keep up in real-time, even when running for only twenty airports.</p><p>You might be thinking, ‘but shouldn’t the model server make inference faster? If it doesn’t, why bother with all this?'</p><p>The answer to that is that Triton is optimized for handling many parallel requests. Each individual request will take longer to compute, but Triton can evaluate models massively in parallel (think of those 10k batch sizes). The streamers did use async for building feature vectors but were limited to processing a single input line at a time, which meant that they would only send one inference request at a time.</p><p>Changing this architecture would have required a fairly major rewrite. It would have been possible, and if there hadn’t been other things we wanted to change about the streamer, it’s what we would have done.  The most efficient way to use Triton is to make ON and IN predictions simultaneously, in one request. However, our streamers were optimized for producing either ON or IN predictions, but not at the same time. Changing this would have required additional streamer rearchitecting as well as changes to the deployment process. In addition, we had been struggling with the performance of the streamer (and generally the limitations of Python). The ‘hacks’ that made it fast enough for production had also made the code much more difficult to reason about. 
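</p><p>To make the serial-versus-parallel difference concrete, here is a minimal sketch of the two request patterns using only Python’s standard library. The <code>infer</code> coroutine below is a hypothetical stand-in for a real Triton client call, and the latency numbers are illustrative, not measurements from our system:</p><pre><code class="language-python">import asyncio
import time

async def infer(line: str) -> str:
    # Hypothetical stand-in for an HTTP inference request to Triton.
    await asyncio.sleep(0.02)  # simulated network + inference latency
    return f"prediction for {line}"

async def serial(lines):
    # One in-flight request at a time: total time grows linearly with input.
    return [await infer(line) for line in lines]

async def parallel(lines, max_in_flight=32):
    # Many in-flight requests, bounded by a semaphore so the server is
    # not flooded; this is the shape Triton is optimized for.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(line):
        async with sem:
            return await infer(line)

    return await asyncio.gather(*(bounded(line) for line in lines))

lines = [f"position update {i}" for i in range(40)]

start = time.perf_counter()
serial_results = asyncio.run(serial(lines))
serial_secs = time.perf_counter() - start

start = time.perf_counter()
parallel_results = asyncio.run(parallel(lines))
parallel_secs = time.perf_counter() - start

assert serial_results == parallel_results
print(f"serial {serial_secs:.2f}s, parallel {parallel_secs:.2f}s")
</code></pre><p>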
The code would have only become more complex when reworked to send parallel inference requests. We had already been considering rewriting the streamer in a faster language, especially as we considered adding more complex feature sets to our models. It would be trivial to validate that a rewritten streamer is accurate, as we could simply compare feature vectors between our ETL and different streamer implementations. Ultimately, a streamer rewrite was on the horizon, independent of the new model server. Why waste time on a massive overhaul of the Python streamer now, when we wanted to rewrite it soon anyway?</p><h3 id="solution-rust">Solution: Rust </h3><p>We selected Rust for our high-performance language replacement. It had been <a href="https://flightaware.engineering/engineering-blog-unified-feed/" rel="noreferrer noopener">used to great success</a> by the Flight Tracking team for high-volume stream processing, and offers the usual benefits of performance, memory safety, and concurrency support. Preliminary testing showed it could easily handle our average rates of 3.5k input messages per second and 1k output messages per second, with significant headroom for catchup should there be an outage in the input data stream.</p><h2 id="post-deployment">Post-deployment</h2><p>This new architecture was released early March 2023. Upon release, it used a single machine for the streamer and three Triton replicas in Kubernetes, which could easily fit onto two nodes. 
This cut our hardware usage by 75% while still quadrupling our catchup rate.</p><p>Since the initial release, we have made several additional changes:</p><ul><li><a href="https://blog.flightaware.com/neural-networks?ref=flightaware.engineering" rel="noreferrer noopener">We released our new multi-airport neural network model.</a> Accommodating the new model was as simple as changing three configuration values: increasing the number of Triton replicas, increasing the number of connections from the streamer to Triton, and increasing the number of concurrent inference requests made by the streamer to Triton.</li><li>We rearchitected our taxi-out prediction infrastructure to match the Rust streamer + Triton pattern.</li></ul><p>Over time, we plan to increase the sophistication of our Triton deployments, namely via load-based autoscaling and moving our model repository to a persistent volume (potentially with an MLFlow integration). That said, the shift to Triton has already enabled the release of the neural network model and drastically increased our fault-tolerance and failure recovery capabilities.</p><p>I am excited to see what else we can do with Triton and what the future of this project is.</p> 
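<p>As a closing aside, the feature-vector parity check we used to validate streamer rewrites is simple enough to sketch: run both implementations over the same input and compare vectors elementwise. The function name, vectors, and tolerance below are illustrative:</p><pre><code class="language-python">import math

def vectors_match(expected, actual, rel_tol=1e-6):
    # Two implementations agree when every feature matches within tolerance.
    return len(expected) == len(actual) and all(
        math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(expected, actual)
    )

# Hypothetical feature vectors for one flight, from each pipeline.
etl_vector = [12.5, 0.83, 310.0]
rust_streamer = [12.5, 0.8300000001, 310.0]   # within tolerance
buggy_streamer = [12.5, 0.83, 290.0]          # a real discrepancy

assert vectors_match(etl_vector, rust_streamer)
assert not vectors_match(etl_vector, buggy_streamer)
</code></pre>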
        <br>
        <p>
            <a href="https://flightaware.engineering/next-generation-model-serving/">Next Generation Model Serving</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Onboarding at FlightAware ]]></title>
        <description><![CDATA[ Joining a new company brings opportunities to work on new projects, meet new colleagues, learn the company culture, and navigate its unique processes. ]]></description>
        <link>https://flightaware.engineering/onboarding-at-flightaware/</link>
        <guid>https://flightaware.engineering/onboarding-at-flightaware/</guid>
        <pubDate>Mon, 01 May 2023 14:14:38 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/05/Screenshot-2023-05-01-at-2.44.18-PM.png" medium="image"/>
        <content:encoded><![CDATA[ <p><em>By Samantha Turnage with Foreword by Chadd Mikulin.</em></p><p><em>As a Software Engineer on FlightAware’s Mobile team, Samantha Turnage works to refine and maintain the FlightAware iOS app.</em></p><blockquote><em>At FlightAware, onboarding is not just orientation, it's much more. We've iterated on our onboarding process over the years to make sure that we're setting a new team member up for success and helping them get integrated quickly. We have Preflight Checklists for the new hire, enumerating what needs to be done their first day, week, month, even year, so there's never a question of what they should be doing. We also see the value of assigning a mentor to a new hire, aside from their manager, to help guide them in their new role. All parts of the onboarding process help a new hire get to the goal of committing production code in their first week, making them feel productive and reinforcing our commitment to delivery. –Chadd Mikulin, VP Engineering</em></blockquote><p>Joining a new company brings opportunities to work on new projects, meet new colleagues, learn the company culture, and navigate its unique processes. Generally, this should be an exciting and engaging process for the people involved. What makes this possible is a good onboarding and training process to help new hires feel more engaged, connected, and supported in their new role.</p><p>Recently, I was hired on the Mobile team at FlightAware. I found the onboarding process was not quite what I expected, but I definitely enjoyed it, so I wanted to share about it here. I went through the process twice, once as an intern and once as a new hire, and both experiences were very similar. 
Hopefully, this can give any incoming new hires some insight into what they can expect.</p><h2 id="first-day-what-happened">First Day: What Happened?</h2><p>As I had been sent a schedule about a week before my start date, I was already aware of the numerous meetings that were going to take place. There were multiple meetings that involved setting up all my work accounts to ensure I had access to everything I needed. Although this was a tedious process, I’m glad I had access to everything up front, so I wasn’t battling for account access later on. Next, I attended a meeting where all recent new hires were “introduced” to FlightAware. During the event, we learned more about FlightAware's mission statement, the "big picture", and had the opportunity to meet some of the other new employees.</p><p>The latter half of my day was more work-oriented. This was when I met with my Wing Lead. As part of the get-to-know-you conversation, we discussed some of my upcoming work that was already planned out. Here, I received a "new-hire checklist" explaining what I should expect for the next six months to a year of onboarding, as well as some supplemental learning materials. Moreover, we discussed projects I could contribute to during the year that were outlined on a projects/tasks list I was given, as well as specific tickets I could tackle on my own. Those tickets were ordered by increasing difficulty, though I could tackle them in any order I wanted. My first few contributions to the team would be small, but I was delighted to hear that I could quickly become a part of it.</p><p>The last meeting of the day was with my assigned mentor. As an intern, this was a casual conversation where we discussed my summer project, how things worked at FlightAware, and any questions I had from the day. As a new hire, this meeting was to catch me up on everything that happened between my internship and my start date. 
So far, I’ve found that having a mentor has been valuable for someone new to the industry. Not only has the extra guidance and support been reassuring, but I have found that this has helped fine-tune the skills I need to be successful in my role and provided a lot more opportunities for growth and development as an engineer. Certainly, I can ask anyone for help if I need it, but having someone specifically assigned to answer my questions has been effective in keeping me engaged and setting me up for success. Not only that, but I’ve been able to learn a lot through pair programming and code review sessions.</p><h2 id="the-%E2%80%9Cnew-hire-checklist%E2%80%9D">The “New-hire Checklist”</h2><p>While I have really enjoyed the mentoring aspect of the onboarding process, I want to talk a little more about the checklist I was given by my Wing Lead. Generally, I enjoy having checklists if I have a lot to do, so I was happy to see that they are often used at FlightAware. The one I received was long, but not at all in a bad way. Now, you’re probably sitting there thinking <em>“Why would anyone be happy about a long list of to-dos?” </em>Well, I can say that most of what was on it was to help me actively learn about the company rather than passively learning through only reading and not contributing for the first few months. Don’t worry if you find yourself thinking <em>“Dang it, I really wanted to read a lot”. </em>There was still reading involved.</p><p>One of the larger items on my checklist was to complete a set of introductory exercises. These were simple coding exercises intended to help in becoming oriented with the code base, understanding how to commit to our repository, and getting code reviewed by the team. For anyone wanting to read, here’s where that happens. This required a lot of reading code and additional documentation on how everything is set up and organized to be able to figure out where everything goes. 
These were purposely self-paced, giving me plenty of time to ask questions and get clarification when needed. Of course, a few intro exercises are not going to be enough to make anyone an immediate expert, but it was a nice warm-up that helped me make sure I had all my source control tools configured.<br><br>In addition to those exercises, it was recommended to schedule recurring meetings with my Wing Lead and mentor to make sure any questions and concerns were being addressed. This time was also for getting feedback on the work I was doing for the tickets I was assigned, as well as discussing my goals for the year and how I can be successful. The first few meetings I had were generally for discussing major customers, FA infrastructure, and crash courses on things specific to the mobile team. Overall, these meetings are something I particularly appreciate, since they help me understand what I’m doing well and where I can improve, as well as ensure that my contributions are making a positive impact.</p><h2 id="conclusion">Conclusion </h2><p>The onboarding process at FlightAware helped me understand the company’s values and mission, and how my role fits into the bigger picture. I’ve received mentorship that has helped me build relationships and feel more connected to the company. I’ve also received training and feedback on the specific skills and knowledge that I need to be successful, which has helped me feel more confident in my abilities and be more productive on a day-to-day basis.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/onboarding-at-flightaware/">Onboarding at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ FlightAware Interns ]]></title>
        <description><![CDATA[ FlightAware&#39;s summer intern program offers a number of benefits for both the company and the interns involved. For FlightAware, the program allows us to identify and recruit top talent, as well as gain fresh perspectives and insights from the next generation of engineers. ]]></description>
        <link>https://flightaware.engineering/flightaware-interns/</link>
        <guid>https://flightaware.engineering/flightaware-interns/</guid>
        <pubDate>Tue, 11 Apr 2023 10:11:41 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/04/flipsnack-LUqSTRx3_Ig-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>By Gabrielle Toutin with Foreword by Chadd Mikulin.</em></p><p><em>Gabrielle Toutin is a Software Engineer on the Backend team at FlightAware. She contributes to software development efforts including AeroAPI, Firehose, and the data feeds that FlightAware ingests. In addition, she is the 2022 Intern Coordinator.</em></p><blockquote>FlightAware's summer intern program offers a number of benefits for both the company and the interns involved. For FlightAware, the program allows us to identify and recruit top talent, as well as gain fresh perspectives and insights from the next generation of engineers. In addition, the program allows FlightAware to build relationships with top universities and showcase our company culture and values. For interns, the program provides invaluable hands-on experience in a real-world environment, exposure to a wide variety of technology and industry best practices, and opportunities for mentorship and networking. Moreover, interns gain insights into the aviation industry and get a chance to work on innovative projects that have the potential to shape the future of aviation. Overall, FlightAware's summer intern program is a win-win situation for both the company and the interns involved, providing a unique and rewarding experience for all. -Chadd Mikulin, VP Engineering</blockquote><p></p><h2 id="welcome-class-of-2023">Welcome Class of 2023</h2><p>As we prepare to welcome our 2023 interns, we wanted to give you more insight into the Internship program at FlightAware! Every summer, we have a new class of interns, from many different schools across the country, that will each join one of the many crews across FlightAware engineering. Each intern is assigned a project to work on all summer, with the goal of having it production-ready by the end of their internship. 
They are considered members of the team and participate in the same daily events as full-time employees, such as 1:1s with their managers and daily standups. Their internship experience is supplemented with various learning sessions and social events throughout the summer.  Plus, they get a FlightAware engineer on their team as a dedicated mentor.</p><h2 id="intern-mentor-relationship">Intern-mentor Relationship</h2><p>The intern-mentor relationship is a symbiotic relationship. It offers help for interns and leadership experience for mentors. The mentor gets an opportunity to guide both the intern and the direction of their project. In turn, the intern has access to a software professional for brainstorming ideas, debugging help, career advice, and more. By the end of the internship, they will have gained a deep knowledge of a FlightAware system. Since they will have also gained some familiarity with their crew’s tech stack and products, a FlightAware intern is well-positioned for a strong start to their full-time career. We have several engineers that started out as FlightAware interns, two from the last two classes of summer interns. However, even if they do not come back to FlightAware to begin their careers, our goal is to provide an internship experience that will prepare them well.</p><h2 id="recruiting">Recruiting</h2><p>Recruiting for interns is a long process that starts nearly a year before they join the team. FlightAware engineers and HR staff talk to potential interns at career fairs at schools across the country. We encourage the students to apply to the team, or teams, that most fit their skills and interests. Interviews typically take place through the fall and into the early spring. We are looking for interns who want to learn, who take and internalize feedback well, and have a great attitude. 
A successful intern is one who will learn from engineers over the summer, so they need to be teachable, as well.</p><h2 id="work-and-play">Work and Play</h2><p>The intern program is a mix of professional development, project demos, and social events. Each week we alternate between a social event and a learning session. There are interim demos scheduled throughout the summer so that interns can show progress, with the internship culminating in a final demo presented to the entire company. You can get more insight by reading <a href="https://flightaware.engineering/summer-intern-projects/" rel="noreferrer noopener">this</a> post made by our 2022 interns! The point of the interim demo is to show how the project is going, share knowledge across teams, show off accomplishments, and let the intern practice explaining their work before the final demo. The learning sessions discuss FlightAware systems, general software engineering topics, and career advice. The social sessions are for the interns to make friends and network with other interns and FlightAware employees.</p><h2 id="feedback">Feedback</h2><p>Intern feedback is very valuable, and we are using it to continuously improve the program. We took feedback from the intern experience survey last year and applied it directly to this year’s program. Like FlightAware software, we are iteratively improving the program on an ongoing basis. Each year will improve upon the last so that we can deliver the best internship program possible and make it a meaningful and fun experience for both our interns and our employees.</p><h2 id="conclusion">Conclusion</h2><p>We can’t tell you how excited we are to have our 2023 interns starting soon. We look forward to the contributions they will provide and the exchange of knowledge that will continue to help FlightAware grow. We are fortunate to have some of the best young minds in the industry. And, if you are interested, we’d love to talk about you joining a future intern class too!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/flightaware-interns/">FlightAware Interns</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Automation in the FlightAware Desktop Department ]]></title>
        <description><![CDATA[ Most opportunities for automation are discovered during day-to-day activity and recognizing repetitive processes. The FlightAware Desktop team is always looking for ways to improve processes, including automating repetitive tasks to reduce human error and to save time. ]]></description>
        <link>https://flightaware.engineering/automation-in-the-flightaware-desktop-department/</link>
        <guid>https://flightaware.engineering/automation-in-the-flightaware-desktop-department/</guid>
        <pubDate>Tue, 07 Mar 2023 09:22:44 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/03/howard-bouchevereau-RSCirJ70NDM-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Derek Wilkinson is part of the Desktop Engineer Team at FlightAware. He provides support for internal IT across the company.</em></p><p>Most opportunities for automation are discovered during day-to-day activity by recognizing repetitive processes. In addition to supporting all internal IT needs for FlightAware users, the Desktop team is always looking for ways to improve processes, including automating repetitive tasks to reduce human error and to save time. One of these tasks is onboarding and off-boarding users. This is a very important business process, but it can be very tedious and time-consuming, which makes it a perfect candidate for automation. As I became more familiar with the processes, I was able to identify which steps would be eligible for automation and which would continue to require manual intervention.</p><h2 id="office365">Office365</h2><p>I began with the Office 365 portion of user onboarding, which includes creating an account, applying a license, and adding the user to the correct mail groups. For this process, I used Microsoft PowerShell with the ExchangeOnline module. I have written plenty of PowerShell scripts, including scripts using the ExchangeOnline module, but those had mainly been smaller scripts with one or two commands, not a complete user onboarding. I began writing the script using a combination of my previous knowledge of the module commands and filling in the blanks with pseudocode. After I had the flow of the script, I began to refine the processes and replace any pseudocode with the correct commands. Thankfully, Microsoft has some great reference articles for the PowerShell commands used, <a href="https://learn.microsoft.com/en-us/powershell/exchange/?view=exchange-ps&ref=flightaware.engineering">which can be found here</a>. 
I now had a script that would create and configure a new user in Office 365, but the script was still missing some of the desired functionality – namely, copying mailing lists and groups. Unfortunately, there isn’t a simple way to copy groups from one user to another. I did some research online and found other users who wanted the same functionality and had devised workarounds. For example, there is not a command to find which Office 365 groups a user is a member of. However, you can query the members of an Office 365 group if you have the address of the group. In the script, I ran a command to get a list of all the Office 365 groups in our tenant:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/03/Screenshot-2023-03-07-at-10.03.33-AM.png" class="kg-image" alt loading="lazy" width="844" height="250" srcset="https://flightaware.engineering/content/images/size/w600/2023/03/Screenshot-2023-03-07-at-10.03.33-AM.png 600w, https://flightaware.engineering/content/images/2023/03/Screenshot-2023-03-07-at-10.03.33-AM.png 844w" sizes="(min-width: 720px) 720px"></figure><p>I then used a “for each” loop to get the email addresses contained in the selected group, and then ran a command to check whether that list contained the email address of the user we are copying from:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/03/Screenshot-2023-03-07-at-9.53.34-AM.png" class="kg-image" alt loading="lazy" width="838" height="278" srcset="https://flightaware.engineering/content/images/size/w600/2023/03/Screenshot-2023-03-07-at-9.53.34-AM.png 600w, https://flightaware.engineering/content/images/2023/03/Screenshot-2023-03-07-at-9.53.34-AM.png 838w" sizes="(min-width: 720px) 720px"></figure><p>If the list contains the email address, it is added to a “groups to add” list. 
Once the loop is completed, the user is added to each of the groups in the “groups to add” list. With workarounds like this, I was able to complete the Office 365 onboarding automation, making it much quicker and easier to add and configure new users in our environment.</p><h2 id="psql">PSQL</h2><p>But why stop there? I continued to look at the onboarding process for other steps we could automate. It was not yet important to determine <em>how</em> a task could be automated, but whether it could or should be automated. Some other systems involved with new user setup were eligible for automation, so I began to research how to do these steps. All our users are added to a PSQL database table. Previously, they were added by connecting to the database and manually entering information provided by Human Resources. Using Python and some additional libraries, I was able to create a simple script that would prompt the user for the new employee’s information and then use their credentials to connect to the database and run the generated PSQL queries. This worked well, but it lacked error checking. What if another employee already existed with the same name? To prevent the existing user’s data from being overwritten, I added a step that would check for an existing user with the same information. 
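</p><p>In outline, that guard is a single parameterized query. Here is a hedged sketch, demonstrated against an in-memory SQLite stand-in (the real script talks to PostgreSQL, and the table and column names here are illustrative):</p><pre><code class="language-python">import sqlite3

def user_exists(conn, user_id):
    # Parameterized query: never interpolate user input into SQL directly.
    row = conn.execute(
        "SELECT 1 FROM employees WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row is not None

# In-memory SQLite stand-in for the real PostgreSQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (user_id TEXT PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO employees VALUES ('jdoe', 'Jane Doe')")

assert user_exists(conn, "jdoe")      # duplicate found: the script would abort
assert not user_exists(conn, "rroe")  # safe to insert the new user
</code></pre><p>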
A PSQL query is performed:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/03/Screenshot-2023-03-07-at-10.03.50-AM.png" class="kg-image" alt loading="lazy" width="836" height="144" srcset="https://flightaware.engineering/content/images/size/w600/2023/03/Screenshot-2023-03-07-at-10.03.50-AM.png 600w, https://flightaware.engineering/content/images/2023/03/Screenshot-2023-03-07-at-10.03.50-AM.png 836w" sizes="(min-width: 720px) 720px"></figure><p>If a result is found, a user with the new user’s ID already exists in the table, and the script terminates and outputs an error to the user.</p><p>What if the user needed different information entered than the standard input? What if the user had entered some incorrect information without realizing it? For these two instances, I added a function for the script to output the PSQL query and ask the user to confirm that this is the query they want to run. If the user selects “no,” they are given the option to change the variables and create a new query, or to exit the script.</p><h2 id="mailman">Mailman</h2><p>The last step of the process was our mailing lists, hosted in Mailman. Thankfully, we already had an existing webhook that could be used to add users to the mailing lists. Using the requests library, I was able to define the header and data to send to the webhook, making Mailman list management simple. To determine which lists the user should be added to, a list was defined for each FlightAware department. While running the script, the Mailman lists, determined by the new employee's department, are displayed for the user to approve. If the user running the script wants to manually select Mailman lists, they can enter “manual” and use manual selection mode. This mode displays the available Mailman lists to choose from and asks the user to enter a list. 
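</p><p>However the lists are chosen, the resulting webhook call is small. Here is a hedged sketch of assembling that request (the department mapping, header, and payload fields are purely illustrative, and the final send would be one <code>requests.post</code> call):</p><pre><code class="language-python"># Illustrative department-to-list mapping; the real mapping covers
# every FlightAware department.
DEPARTMENT_LISTS = {
    "engineering": ["eng-all", "announce"],
    "sales": ["sales-all", "announce"],
}

def build_mailman_request(department, email, token="EXAMPLE-TOKEN"):
    # Assemble the header and JSON body for the (hypothetical) webhook.
    headers = {"Authorization": "Bearer " + token}
    payload = {
        "email": email,
        "lists": DEPARTMENT_LISTS.get(department, ["announce"]),
    }
    return headers, payload

headers, payload = build_mailman_request("engineering", "jdoe@example.com")
# The actual send is then a single call, e.g.:
#   requests.post(WEBHOOK_URL, headers=headers, json=payload)
</code></pre><p>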
Once the user has entered all their lists, they can enter “done” to continue, or “reset” if they would like to clear the selected list. Once the lists are determined, the selected lists and new employee’s email are passed to the Mailman webhook, and the response is formatted and output to the user.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.23.54-PM.png" class="kg-image" alt loading="lazy" width="1970" height="234" srcset="https://flightaware.engineering/content/images/size/w600/2023/02/Screenshot-2023-02-27-at-6.23.54-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/02/Screenshot-2023-02-27-at-6.23.54-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/02/Screenshot-2023-02-27-at-6.23.54-PM.png 1600w, https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.23.54-PM.png 1970w" sizes="(min-width: 720px) 720px"></figure><h2 id="peer-review">Peer Review</h2><p>At this point, the script was essentially finished. The processes were now automated with the script, and the project was ready for peer review. I sent the code and a working copy to the other members of my team, who tested it in a non-production environment and worked to find bugs or steps that did not make sense. This is an essential step before moving to production, as the developer might not be able to find potential bugs during their own run-through. A good peer review also brings in the perspective of other team members, who might make usability observations that the main developer could miss. Based on the received feedback, I made small changes to the script, corrected some bugs, and asked the team for a final peer review. 
After getting their approval, the project was ready for production.</p><p>To distribute the project, which required Python, PowerShell, and multiple libraries, I investigated Docker. <a href="https://docs.docker.com/get-started/?ref=flightaware.engineering">Docker</a> is a popular program that makes it easy to create containers for simplified distribution and is used widely in FlightAware. Although I was new to Docker, I was able to easily create a file, called a “Dockerfile,” that contained the details of the base image of the container and its dependencies. If you want to learn more about Dockerfiles, the documentation can be found <a href="https://docs.docker.com/engine/reference/builder/?ref=flightaware.engineering">here</a>. On a Linux host with Docker installed, I was able to copy the Dockerfile and run the “docker build” command to create the container. This container was set to start the new user script automatically. To run the container, the user simply needed to SSH to the host server, run the docker run command (<em>docker run -it fa/newuser</em>) and then follow the prompts in the script.</p><h2 id="adding-a-gui">Adding a GUI</h2><p>Despite the script working fine at this point, I wanted to improve the user experience. The script was completely run through the terminal, with limited options and increased work for the user to change them. I began to research an easy way to add an interface to the main Python script and came across <a href="https://pywebio.readthedocs.io/en/latest/?ref=flightaware.engineering">PyWebIO</a>. After importing the library and making some small changes with the input and output lines of the script, I had a working web interface. This upgrade to the script really changed the whole project, leading me to think of more scripts and processes that could be changed from text-based scripts to scripts that can be run from a web portal.</p><p>I knew that the first requirement would be security. 
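</p><p>Security here starts with never storing plaintext passwords. The portal uses the bcrypt library for this; the sketch below shows the same salt, hash, and verify shape using only the standard library (PBKDF2 via <code>hashlib</code> as a dependency-free stand-in for bcrypt):</p><pre><code class="language-python">import hashlib
import hmac
import os

def hash_password(password):
    # Random per-user salt plus a deliberately slow key-derivation function.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    # Hashes are one-way: verify by re-hashing, never by decrypting.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("wrong password", salt, digest)
</code></pre><p>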
It was important to restrict access to the scripts to authorized users only. To accomplish this, I set up a SQLite database and used the bcrypt Python library to salt and hash passwords upon new user creation and to verify them during login (bcrypt hashes are one-way, so passwords are never stored in a recoverable form). I wrote a simple script in Python that generates the SQLite database, then asks for a username and password to create the initial admin user. Using PyWebIO, I created a user portal where, through the web interface, a user could create other users, modify existing users, and run user history audits of actions performed from the portal. I then created another column in the SQLite database for user type to distinguish admin users from regular users. Admin users can run all the scripts, create and remove users, run audits, and change passwords for any user. Regular users can run a limited number of scripts and can only reset their own passwords. Allowing any user to have an account in the scripts web portal, not just the Desktop team, further expanded the possibilities of this project.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.25.20-PM.png" class="kg-image" alt loading="lazy" width="1782" height="616" srcset="https://flightaware.engineering/content/images/size/w600/2023/02/Screenshot-2023-02-27-at-6.25.20-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/02/Screenshot-2023-02-27-at-6.25.20-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/02/Screenshot-2023-02-27-at-6.25.20-PM.png 1600w, https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.25.20-PM.png 1782w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.25.49-PM.png" class="kg-image" alt loading="lazy" width="1764" height="824" 
srcset="https://flightaware.engineering/content/images/size/w600/2023/02/Screenshot-2023-02-27-at-6.25.49-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/02/Screenshot-2023-02-27-at-6.25.49-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/02/Screenshot-2023-02-27-at-6.25.49-PM.png 1600w, https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.25.49-PM.png 1764w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.26.28-PM.png" class="kg-image" alt loading="lazy" width="1762" height="1296" srcset="https://flightaware.engineering/content/images/size/w600/2023/02/Screenshot-2023-02-27-at-6.26.28-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/02/Screenshot-2023-02-27-at-6.26.28-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/02/Screenshot-2023-02-27-at-6.26.28-PM.png 1600w, https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.26.28-PM.png 1762w" sizes="(min-width: 720px) 720px"></figure><h2 id="polishing-the-interface">Polishing the Interface</h2><p>Now that the user onboarding script had a web interface, I worked on improving the user experience. By default, output was displayed in plain black text on a white background. I went through the script and used PyWebIO’s output formatting functions to change the output where appropriate, from plain output to the informational output (beige background), success output (green background), warning output (light yellow background) or error output (red background). This might sound like a small change, but it makes a big difference in making the status of a function much clearer for a user. 
I also made the interface scroll to the part of the browser window with the latest input, sent the log file to a scrollable box, and replaced the displayed list of users (which required typing in a username) with a dropdown menu for user selection. Next, I updated all parts of the script that required a “yes or no” input. Previously, the user had to type in yes or no (or y or n); after the update, users simply click a yes or no button. This interface improvement also cut down on the required code, as there was no longer a need to verify that the user had entered a valid form of yes or no.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.29.05-PM.png" class="kg-image" alt loading="lazy" width="1764" height="670" srcset="https://flightaware.engineering/content/images/size/w600/2023/02/Screenshot-2023-02-27-at-6.29.05-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/02/Screenshot-2023-02-27-at-6.29.05-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/02/Screenshot-2023-02-27-at-6.29.05-PM.png 1600w, https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.29.05-PM.png 1764w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.30.03-PM.png" class="kg-image" alt loading="lazy" width="1644" height="690" srcset="https://flightaware.engineering/content/images/size/w600/2023/02/Screenshot-2023-02-27-at-6.30.03-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/02/Screenshot-2023-02-27-at-6.30.03-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2023/02/Screenshot-2023-02-27-at-6.30.03-PM.png 1600w, 
https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.30.03-PM.png 1644w" sizes="(min-width: 720px) 720px"></figure><h2 id="from-onboarding-to-off-boarding">From Onboarding to Off-boarding</h2><p>Now that I was happy with the end user experience of the onboarding process, I worked to add our off-boarding scripts to the scripts portal. This process was relatively easy, as most of the functions were similar to the onboarding functions and just required changing certain commands from adding to removing the user. For example, instead of using “Add-UnifiedGroupLinks” to add a user to an Office 365 group with PowerShell, “Remove-UnifiedGroupLinks” is used. After adding the off-boarding functionality and making sure the interface was clean and functional, it was time for one of the most important tasks of all: documentation.</p><h2 id="documentation">Documentation</h2><p>To document the project, I started with two sections: one on how to use the scripts portal, and another on how the scripts portal works. The first section contains documentation pages on each part of the scripts portal, including logging in, creating and managing users, and using the onboarding and off-boarding functions. I included screenshots and clear instructions on how to perform the desired functions. I believe it is important to make documentation as easy to follow as possible; otherwise, users might become frustrated and not even want to use the software. After completing the documentation on how to use the script, I proceeded with the documentation on how the script works. This section was much more time-consuming, as I went in depth and documented each function of each script. I created a page for each included script (eight Python scripts, two PowerShell scripts, two Docker-related files, and a page explaining additional files used by the script). 
On each page, I created a section describing the purpose of the script, the libraries imported, any functions defined, variables or constants defined, and the functions performed by the script. When documenting a specific function within a script, I described the actions performed by the function, the parameters used, any variables returned, and variables used in the function. When describing this information, I provided the name of the item being described and why it was used. For example, if using a variable named “log,” I added the description “the opened log file.” My main goal with this part of the documentation was to make sure that the scripts would remain useful for years to come, even if they have a new maintainer.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.31.25-PM.png" class="kg-image" alt loading="lazy" width="1508" height="672" srcset="https://flightaware.engineering/content/images/size/w600/2023/02/Screenshot-2023-02-27-at-6.31.25-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/02/Screenshot-2023-02-27-at-6.31.25-PM.png 1000w, https://flightaware.engineering/content/images/2023/02/Screenshot-2023-02-27-at-6.31.25-PM.png 1508w" sizes="(min-width: 720px) 720px"></figure><h2 id="conclusion">Conclusion</h2><p>To say that this script was a journey would be an understatement. What began as a process to automate Office 365 onboarding functions became a full-fledged web portal for running scripts that were previously shell-based or didn't exist at all. I believe that in this regard, this project reflects the spirit of FlightAware. What began as a way for Daniel Baker to share his location with friends and family while flying became a major force in aviation, being used by millions worldwide.  With technology, things can always be improved; all it takes is the desire to make it happen.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/automation-in-the-flightaware-desktop-department/">Automation in the FlightAware Desktop Department</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ The Best Tool for the Job: Assessing Languages for Microservice Development at FlightAware ]]></title>
        <description><![CDATA[ Most of the posts on the Angle of Attack blog look to the past, but this post takes a different approach, as it speaks more to what FlightAware will do rather than what it has done.  It will give readers some insight into how significant technical decisions are made at FlightAware. ]]></description>
        <link>https://flightaware.engineering/the-best-tool-for-the-job-assessing-languages-for-microservice-development-at-flightaware/</link>
        <guid>https://flightaware.engineering/the-best-tool-for-the-job-assessing-languages-for-microservice-development-at-flightaware/</guid>
        <pubDate>Mon, 06 Feb 2023 10:35:31 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2023/01/2kevin-ku-w7ZyuGYNpRQ-unsplash.jpg" medium="image"/>
        <content:encoded><![CDATA[ <!--kg-card-begin: markdown--><p>Most of the posts on the Angle of Attack blog look to the past. &quot;This is how we solved a hard technical problem;&quot; &quot;here's how we designed a product;&quot; &quot;here's how I optimized something.&quot; This post takes a different approach, as it speaks more to what FlightAware will do rather than what it has done. My hope is that it will give readers some insight into how significant technical decisions are made at FlightAware.</p>
<p>I recently wrote an internal document that examines the viability of a few different programming languages for writing medium-to-high level volume microservices, which I have adapted for this article. This was one part of a larger assessment performed across my entire division.</p>
<h1 id="the-assessment">The Assessment</h1>
<p>When assessing potential languages for implementing new web services in the <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/">Backend wing</a>, three languages felt like natural candidates: Python, Go, and Rust. These languages share a few traits: large and enthusiastic communities, strong tooling, and technical advocates already present at FlightAware (sorry, I'm sure Java and .NET are great, but without existing expertise, adoption is an uphill battle). What follows is a broad survey of the languages' strengths and weaknesses and an attempt to capture the tradeoffs between them in the context of web service development. A rubric will be applied at the conclusion of each section, with scores ranging from 1 to 5 in each of these categories:</p>
<ul>
<li>
<p>Performance</p>
</li>
<li>
<p>Safety</p>
</li>
<li>
<p>Learning Curve</p>
</li>
<li>
<p>Cloud/Library Support</p>
</li>
<li>
<p>Tooling</p>
</li>
</ul>
<h1 id="python">Python</h1>
<p>Python may at first seem like the ideal choice for web services. The language boasts huge popularity in the software industry, resulting in a broad ecosystem of libraries and language tooling. FlightAware has also already invested significant effort in Python education, and our own Python codebases are growing (and will continue to do so regardless of the outcome of this assessment). More amorphously, people just like writing in Python. Personally, I've found that Python does a really good job of staying out of the way when you're trying to map thoughts to code and work out how to solve a problem.</p>
<p>There are two main areas where Python falls short, though, both arising from its dynamic nature: performance and safety.</p>
<h2 id="performance">Performance</h2>
<p>As an interpreted language, Python's single-threaded performance has never been terribly good. There are various approaches that try to alleviate this issue, often by JITting hot code paths or by reimplementing them as native extensions. There's also an ongoing effort to improve the primary Python interpreter's performance, which has already borne some fruit with its recent 3.11 release. It would probably be best not to make decisions based on hypothetical future performance, though. It's also unlikely that Python will ever get close to the performance of our other two candidates anyway.</p>
<p>I would describe Python's parallelism story as &quot;unspecial&quot; (certainly when compared to our other two choices). It has the usual collection of synchronization primitives (locks, queues, semaphores, etc.). These are made a bit easier to use correctly through Python's context manager syntax (the <code>with</code> keyword), but it's all still a bit fraught. Python is also limited in that the GIL restricts true CPU parallelism to the process level. This makes it difficult to share memory between distinct tasks (though you can always send data through queues and even use the fairly new <code>shared_memory</code> module in a pinch).</p>
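<p>As a minimal sketch of that lock-as-context-manager pattern (an illustration, not FlightAware code), here are four threads bumping a shared counter under a <code>threading.Lock</code>:</p>

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n: int) -> None:
    global counter
    for _ in range(n):
        # `with` acquires the lock and releases it on exit, even if the
        # body raises; this is what makes it safer than manual
        # acquire()/release() pairs.
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: the lock makes the increments race-free
```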
<p>The poor single-threaded performance of Python can also contribute to system complexity as approaches like parallelism or queueing must be leveraged more readily to try to sidestep the problem. This ends up making the system harder to reason about and debug.</p>
<h2 id="safety">Safety</h2>
<p>This property could perhaps be restated as, &quot;how likely are you to know that code will or won't work before running it?&quot; Ideally, we could have computers help us with this effort. Historically, Python, a dynamically typed language, has been quite unsafe. If you pass the wrong type of object to a function that doesn't expect it, you won't realize it until your code is running - possibly in production.</p>
<p>Tools capable of statically analyzing your code like <code>pylint</code> or <code>pytype</code> can help avoid these issues, but they're not perfect. Recently, optional support for static type annotations in the language has improved safety quite a bit. The key term here is &quot;optional,&quot; though. Nothing's stopping you from running code that lacks type annotations, and there are still plenty of libraries that lack annotations. The ergonomics of Python's typing-related APIs were a bit rough at first but have improved greatly with subsequent releases (and are typically backported so you don't have to upgrade Python to get them).</p>
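<p>A small sketch of what that optionality means in practice (the function here is hypothetical, purely for illustration):</p>

```python
def mean(values: list[float]) -> float:
    """Annotated signature: a checker like mypy or pytype rejects bad calls."""
    return sum(values) / len(values)

print(mean([1.0, 2.0, 3.0]))  # 2.0

# A static checker flags this before it ever runs:
#     mean("abc")  # error: argument 1 has incompatible type "str"
# But the annotations are only advisory: nothing prevents that same call
# at runtime until it blows up inside the function with a TypeError.
```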
<h2 id="other-stuff">Other stuff</h2>
<p>Python's got a lot of good stuff going for it, and since it's effectively the only language of its type in this assessment (we're not looking at other interpreted languages like Ruby, TypeScript, or PHP), I chose to focus on its clearest shortcomings to better motivate the presence of Go and Rust in the comparison. It's worth covering some of Python's other attributes, though, for the sake of completeness.</p>
<h3 id="learning-curve">Learning curve</h3>
<p>Python has the gentlest of learning curves, with it often being the first language taught to novice programmers. Its <a href="https://docs.python.org/3/tutorial/?ref=flightaware.engineering">tutorial</a> is strong and I've always felt that its reference documentation is quite comprehensive. There are also myriad online resources covering Python (for better or for worse); I've found that <a href="https://realpython.com/?ref=flightaware.engineering">Real Python</a> has some quality in-depth articles on tricky Python features. The presence of modules like <code>json</code> and <code>datetime</code> in Python's standard library also contribute to it feeling like a language you can quickly become productive with; batteries are included!</p>
<h4 id="about-those-batteries">About those batteries</h4>
<p>Though Python has the reputation as a &quot;batteries included&quot; language with a strong standard library, this doesn't quite apply to web services. Because Python's standard library lacks a production-grade HTTP server, and 3rd-party options have proliferated, the WSGI standard was defined as an interface between web servers and Python web frameworks. This brings with it the usual tradeoff of having to spend the effort to pick both sides of the equation (server and framework) in exchange for a more adaptable ecosystem. These days WSGI has been succeeded by its async progeny ASGI, and you tend to have some combination of (uvicorn/nginx unit/daphne/...) x (fastapi/quart/falcon/...). There's also the somewhat unique case of Sanic, which is an ASGI-compatible framework that also has its own server baked in. Regardless of what you choose, you're still running just the one server, but the architecture all feels more complicated due to the proliferation of options.</p>
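<p>The server/framework split rests on a small contract: an ASGI application is just an async callable taking <code>(scope, receive, send)</code>. As an illustration, we can drive one by hand with stdlib asyncio, standing in for the server side ourselves (no uvicorn or framework involved):</p>

```python
import asyncio

# An ASGI application is an async callable taking (scope, receive, send).
# This is the contract that servers (uvicorn, daphne, ...) and frameworks
# (FastAPI, Quart, ...) agree on.
async def app(scope, receive, send):
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello"})

async def drive():
    # Stand in for the server side of the contract and collect what the
    # app sends back.
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return sent

messages = asyncio.run(drive())
print(messages[0]["status"], messages[1]["body"])  # 200 b'hello'
```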
<h3 id="cloud-library-support">Cloud / Library support</h3>
<p>Due to Python's widespread use in the software industry, it's extremely likely that if there's a service you want to use from Python, a library to use said service already exists. This is true for Cloud platforms (where Python has first class support, both with SDKs and in serverless runtimes) and any given SaaS you can think of.</p>
<h3 id="tooling">Tooling</h3>
<p>Python has plenty of quality tools for profiling, formatting, linting, etc.; it's just a matter of finding them. The trouble is that much of Python's tooling is 3rd party, and there are often plenty of competing approaches. Consider this dauntingly titled article assessing the state of something like 20 different linters: <a href="https://inventwithpython.com/blog/2022/11/19/python-linter-comparison-2022-pylint-vs-pyflakes-vs-flake8-vs-autopep8-vs-bandit-vs-prospector-vs-pylama-vs-pyroma-vs-black-vs-mypy-vs-radon-vs-mccabe/?ref=flightaware.engineering">https://inventwithpython.com/blog/2022/11/19/python-linter-comparison-2022-pylint-vs-pyflakes-vs-flake8-vs-autopep8-vs-bandit-vs-prospector-vs-pylama-vs-pyroma-vs-black-vs-mypy-vs-radon-vs-mccabe/</a>. Even Python package management has <a href="https://hynek.me/articles/python-app-deps-2018/?ref=flightaware.engineering">at least 3 competing libraries</a> (4 if you include the underlying official <code>pip</code> tool that most rely on, 5 if you count <code>conda</code>, 6 if you...). Speaking of package management, that seems to be a particularly sore spot for the community to harp on, though I haven't experienced frustrating issues with it. Perhaps it's more apparent when working with native libraries.</p>
<p>One area where Python suffers due to its lack of official tooling is in the inconsistent documentation for 3rd party libraries. Although many projects use Sphinx and are found at readthedocs.io, this is not broadly true, which can make it a jarring experience to switch between documentation for various libraries. I never realized how good things could be until I was exposed to Go's <a href="https://pkg.go.dev/?ref=flightaware.engineering">https://pkg.go.dev</a> and Rust's <a href="https://docs.rs/?ref=flightaware.engineering">https://docs.rs</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>So, the rubric for Python:</p>
<ul>
<li>
<p>Performance: 2</p>
</li>
<li>
<p>Safety: 3</p>
</li>
<li>
<p>Learning Curve: 5</p>
</li>
<li>
<p>Cloud/Library Support: 5</p>
</li>
<li>
<p>Tooling: 4</p>
</li>
</ul>
<h1 id="golang">Go(lang)</h1>
<p>So, Python's slow and a bit unsafe; what are we to do? Let's throw some compiled, strongly typed code into the mix! Go arrived on the scene (announced in 2009, with its 1.0 release in 2012) touting garbage collection, language-level concurrency, and much more.</p>
<p>Essentially, Go is fast to learn, fast to build, and fast to run. It's become fairly popular for the development of high-performance web services (which makes sense, that's what Google designed it for!), with many companies transitioning to it from slower interpreted languages like Python. You can find <a href="https://www.uber.com/blog/tech-stack-part-one-foundation/?ref=flightaware.engineering">various</a> <a href="https://dropbox.tech/infrastructure/open-sourcing-our-go-libraries?ref=flightaware.engineering">posts</a> <a href="https://getstream.io/blog/switched-python-go/?ref=flightaware.engineering">online</a> highlighting such transitions. I think Go's approach to structural typing makes it particularly appealing when considering a transition from a dynamic language like Python whose duck typing is quite similar in nature.</p>
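<p>To illustrate that similarity, Go-style structural interfaces can even be expressed inside Python's own type system with <code>typing.Protocol</code> (the names below are invented for illustration):</p>

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Flyer(Protocol):
    def fly(self) -> str: ...

class Aircraft:
    # No explicit "implements Flyer" declaration; as with a Go interface,
    # having a method with the right shape is enough.
    def fly(self) -> str:
        return "cleared for takeoff"

def announce(f: Flyer) -> str:
    return f.fly()

print(announce(Aircraft()))           # cleared for takeoff
print(isinstance(Aircraft(), Flyer))  # True: the check is structural
```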
<h2 id="gos-philosophy">Go's philosophy</h2>
<p>There's no shortage of talks and posts from Go's creators about why they designed it the way they did (here are but a few: <a href="https://go.dev/talks/2012/splash.article?ref=flightaware.engineering">Go at Google: Language Design in the Service of Software Engineering</a>, <a href="https://www.youtube.com/watch?v=XvZOdpd_9tc&ref=flightaware.engineering">GopherCon 2015 Keynote</a>, <a href="https://www.youtube.com/watch?v=rFejpH_tAHM&ref=flightaware.engineering">dotGo 2015 - Simplicity is Complicated</a>). The designers make it clear that they were coming from a background of C/C++ and were mainly focused on improving that experience in the context of web services. They explicitly mention the following needs as primary drivers of Go's development:</p>
<ul>
<li>
<p>Developing &quot;at scale&quot; (referring to both headcount and dependency count)</p>
</li>
<li>
<p>Programmer familiarity, so new developers can be brought up quickly</p>
</li>
<li>
<p>&quot;Modernity&quot; – specifically, taking advantage of multicore CPUs in a highly networked environment</p>
</li>
</ul>
<p>Out of these needs arose the following language properties:</p>
<ul>
<li>
<p>A spare (austere, even) syntax that eschews most features of functional programming</p>
</li>
<li>
<p>Intentionally minimal ways to perform a given task (just one type of looping construct?!)</p>
</li>
<li>
<p>Only one way of formatting your code (with no knobs to tune)</p>
</li>
<li>
<p>Documentation as a first-class citizen</p>
</li>
<li>
<p>Extremely fast compilation times</p>
</li>
<li>
<p>Easy dependency management</p>
</li>
<li>
<p>Composable interfaces instead of object orientation</p>
</li>
<li>
<p>Green threads (goroutines) as a language construct</p>
</li>
<li>
<p>Sizable standard library built with web services in mind</p>
</li>
<li>
<p>First-class first party tooling</p>
</li>
</ul>
<p>One thing I'd like to highlight from the above list is how much effort Go's developers put into parts of Go that are adjacent to the language design itself. It's clear that they valued the entire developer experience: writing code, reading code, building it, profiling it, etc. Go's web-focused standard library also makes it easy to get a service up and running without needing to reach for too many dependencies early on. This is a big reduction in cognitive load as developers spend less time surveying 3rd-party libraries to get what they need. I was also struck by some of the other modules you can find in Go's standard library, such as <code>testing/quick</code> for property-based testing, and <code>net/http/pprof</code>, which can run a server for exporting profile and trace data (this is the magic of including a production-grade webserver in your standard library).</p>
<h2 id="gos-ecosystem">Go's ecosystem</h2>
<p>Go has taken the cloud by storm. Docker and Kubernetes (the latter having come straight from Google) are probably the highest profile examples of Go's outsized presence in the cloud native ecosystem, but there are oodles of other widely used services implemented in Go (Prometheus, Grafana, Traefik, Caddy - the list is extensive). It's no exaggeration to say that many businesses have been built on Go. Go's prevalence in the cloud means it has first class SDK support for any given cloud vendor, but it also has a pretty good chance of having an official SDK from &lt;insert-SaaS-here&gt;. Chargebee and PagerDuty are two examples of services we use internally that I was able to readily find. Admittedly, most of these language-specific SDKs are just wrapping a standard HTTP API and were likely automatically generated, but I think it speaks to Go's hold in the industry that these platforms explicitly choose to support it.</p>
<h2 id="gos-shortcomings">Go's shortcomings</h2>
<p>Go has managed to amass its <a href="https://fasterthanli.me/tags/golang?ref=flightaware.engineering">fair</a> <a href="https://yager.io/programming/go.html?ref=flightaware.engineering">share</a> of <a href="https://jesseduffield.com/Gos-Shortcomings-1/?ref=flightaware.engineering">detractors</a>. They tend to focus on 3 main categories: Go's opinionated design, Go's weak type system, and Go's resistance to extensibility. There's also a grab bag of papercuts and inconsistencies in the language that tend to stand in stark contrast to Go's mantra of simplicity.</p>
<h3 id="opinionated-design">Opinionated design</h3>
<p>If you want only one way to do something, you must pick a way and stick with it. There are going to be people who'd much rather do things another way, maybe for very good reasons. Go's designers decided that those reasons typically didn't merit having multiple ways to do something. Go's strong focus on procedural programming over functional programming paradigms is an example. If you're thinking about operating on a collection of data, your only choice is typically going to be with a <code>for</code> loop, probably with a bunch of <code>if</code> statements inside of it. It can be verbose, but it's obvious what's going on, and the developer doesn't need to spend any brainpower choosing between a loop or a functional approach. One of Go's designers, Rob Pike, went so far as to <a href="https://github.com/robpike/filter?ref=flightaware.engineering">write a <code>filter</code> implementation in Go</a> just to demonstrate that it's possible and he still thinks it's a bad idea.</p>
<h3 id="a-weak-type-system">A weak type system</h3>
<p>Until recently, one of the most controversial exclusions from Go was generics. Although generics elicited the most vocal complaints, Go seems like it missed various other opportunities for crafting a more expressive type system that lets developers build safe abstractions (Go in general seems very anti-abstraction). Go's enums are but simple numbers, it has no way to indicate immutability, and it has no concept of (non)-<code>nil</code>-ability (outside of its value types, but then you get to complain about Go's zero values instead). All these features can be found in other languages (even Python's bolted-on type checking has them!), and they are seen as huge wins for writing safe code. In Go, the answer to most of these is to read the documentation carefully and write <code>nil</code> checks.</p>
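<p>For reference, here is what those features look like in Python's bolted-on type system (the names are hypothetical, for illustration only); a checker such as mypy enforces the <code>Final</code> and <code>Optional</code> constraints statically:</p>

```python
from enum import Enum
from typing import Final, Optional

class Status(Enum):        # A real enum: members are not bare numbers
    ACTIVE = "active"
    GROUNDED = "grounded"

MAX_RETRIES: Final = 3     # Checkers reject reassignment of Final names

def find_status(tail: str) -> Optional[Status]:
    # Optional makes "may be absent" explicit in the signature, so a
    # checker forces callers to handle the None case.
    fleet = {"N12345": Status.ACTIVE}
    return fleet.get(tail)

print(find_status("N12345"))  # Status.ACTIVE
print(find_status("N99999"))  # None
```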
<h3 id="lack-of-language-extensibility">Lack of language extensibility</h3>
<p>Go's designers don't want user code to look too much like native Go code. There's no operator overloading, nor is there any way to use the <code>range</code> operator for your own types. Previously, Go's only generic data structures were the builtin <code>map</code>, <code>slice</code>, and <code>channel</code> types. This seems quite in line with Go's explicit nature, actually, and some may see it as a boon. It certainly avoids certain clever programming that can harm code readability. Combine this with the lack of macros in Go, and users are frequently left writing and rewriting lots of boilerplate code.</p>
<h3 id="papercuts">Papercuts</h3>
<p>Consider this a lightning round of Go issues that center around no specific theme but serve to irritate its users:</p>
<ul>
<li>
<p>Interaction of growing slices with underlying arrays can be unexpected, confusing</p>
</li>
<li>
<p><a href="https://go.dev/doc/faq?ref=flightaware.engineering#nil_error">Two nil values may not be equal</a></p>
</li>
<li>
<p><a href="https://github.com/golang/go/wiki/CommonMistakes?ref=flightaware.engineering">Unexpected loop variable capture</a></p>
</li>
<li>
<p>Struct tags feel like a hack, possibility for clashes</p>
</li>
<li>
<p><a href="https://www.uber.com/en-US/blog/data-race-patterns-in-go/?ref=flightaware.engineering">Data races that Go won't catch</a></p>
</li>
<li>
<p>Go zero values can be troublesome (particularly numbers)</p>
</li>
</ul>
<p>None of these are gamebreakers, but when combined they leave me scratching my chin and wondering, &quot;how simple is this language really?&quot;</p>
<h3 id="one-more-error-handling">One more: error handling</h3>
<p>You can't read about Go without hearing complaints about how it handles errors. Specifically, the fact that any operation that returns an error requires the additional 3-line (maybe 2) incantation</p>
<pre><code>if err != nil {
    return nil, err
}
</code></pre>
<p>Often accompanying a one-line function call above it (or possibly integrated with it), many find Go's error handling distractingly verbose and a pain to wade through when reading code. There's also little in the way of safety rails for handling errors. If you forget to handle an error or outright ignore it with a <code>_</code> variable name, Go's compiler won't bother you; I hope you remember to run <a href="https://github.com/kisielk/errcheck?ref=flightaware.engineering">errcheck</a>! Go's error-handling capabilities have expanded slightly with time, with the introduction of error (un)wrapping in 1.13 <a href="https://blog.carlmjohnson.net/post/2020/working-with-errors-as/?ref=flightaware.engineering">improving their ergonomics significantly</a>. The boilerplate, it seems, <a href="https://github.com/golang/go/issues/27567?ref=flightaware.engineering">will remain</a>, though.</p>
<h2 id="conclusion">Conclusion</h2>
<p>And Go's rubric</p>
<ul>
<li>
<p>Performance: 4</p>
</li>
<li>
<p>Safety: 3</p>
</li>
<li>
<p>Learning Curve: 4</p>
</li>
<li>
<p>Cloud/Library Support: 4</p>
</li>
<li>
<p>Tooling: 5</p>
</li>
</ul>
<h1 id="rust">Rust</h1>
<p>The internet has been set aflame in recent years by the introduction of Rust, a compiled language that makes a lot of big promises with respect to safety, be that memory or types. Imagine writing effortlessly memory-safe code without needing a garbage collector! Well, it's not quite effortless. That hasn't stopped Rust from being the most-loved language in StackOverflow's yearly developer survey for the last 7 years, though (and no, it's not even close). Rust brings a lot more to the table than performance and memory safety. It has an incredibly expressive type system that can prevent entire classes of potential bugs, high quality tooling, and what is most kindly described as an extremely passionate community. Let's start by looking at what makes Rust so great.</p>
<h2 id="the-good-stuff">The good stuff</h2>
<h3 id="borrow-checker">Borrow checker</h3>
<p>Rust's borrow checker -- combined with its type system -- underpins its memory safety guarantees and enables &quot;fearless concurrency.&quot; It does this through the enforcement of a small number of ownership rules at compile-time. By ensuring that variables only have a single owner at a time, Rust's compiler can prevent various errors like dangling references, use-after-free, and double-free (I know, I know, &quot;who cares, Go has a GC!&quot;). Rust essentially takes C++'s best practice of Resource Acquisition Is Initialization (RAII) and makes it an inviolable property of the language. This ownership tracking also prevents data races in concurrent code since data can only have a single owner at a time (hence the &quot;fearless concurrency&quot;). It's worth noting that the power of the borrow checker goes beyond just memory safety; it's RAII, not M(emory)AII. In Rust, there's no need for Go's <code>defer</code> statements or Python's <code>with</code> blocks. Rust implicitly knows when a variable has gone out of scope and can clean up its resources accordingly. All this checking and compile-time safety has given Rust a strong reputation that &quot;if your code builds, it's not going to crash.&quot;</p>
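As a small illustration (with invented function names), here's how single ownership and scope-based cleanup look in practice; the commented-out line is exactly the kind of use-after-move the compiler rejects:

```rust
// A value has exactly one owner; passing it by value moves ownership.
fn consume(s: String) -> usize {
    // `consume` now owns `s`; the heap allocation is freed automatically
    // when `s` goes out of scope at the end of this function (RAII).
    s.len()
}

fn ownership_demo() -> usize {
    let greeting = String::from("hello");
    let len = consume(greeting);
    // Uncommenting the next line is a compile error: `greeting` was
    // moved into `consume`, so it can no longer be used here.
    // println!("{}", greeting);
    len
}
```

No `defer`, no `with`, and no garbage collector: the compiler knows statically when each owner goes away and inserts the cleanup for you.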
<h3 id="type-system">Type system</h3>
<p>I use the category &quot;type system&quot; here to cover a number of use cases enabled by Rust's type system that I've found particularly refreshing. Obviously, covering the entire type system would be quite an endeavor that none of us have the time for.</p>
<h4 id="errors-as-enums">Errors as enums</h4>
<p>Rust, much like Go, handles errors by returning them directly from functions just like other data. Distinctly from Go, though, Rust encodes errors into a <code>Result</code> enum, which can contain either the result of a successful execution or an error if something went wrong. It’s up to the caller of the function to handle this Result as they see fit. This pattern more accurately models the exclusive nature of errors (there’s no possibility that a function may return <em>both</em> a result and an error). It also encourages dealing with a potential error as soon as possible, preventing the scenario where an error check is forgotten and a bad result flows through your code until something somewhere else blows up.</p>
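A quick sketch of the pattern (the altitude-parsing functions here are invented for illustration):

```rust
// A fallible operation returns a Result: either Ok(value) or Err(error),
// never both. Callers must deal with both variants to get at the value.
fn parse_altitude(raw: &str) -> Result<i32, String> {
    raw.trim()
        .parse::<i32>()
        .map_err(|e| format!("bad altitude {raw:?}: {e}"))
}

fn feet_to_flight_level(raw: &str) -> Result<i32, String> {
    // The `?` operator returns early with the error, so the code below
    // it only ever runs with a valid value -- no forgotten error check
    // can let a bad result flow onward.
    let feet = parse_altitude(raw)?;
    Ok(feet / 100)
}
```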
<h4 id="option-enums">Option enums</h4>
<p>Rust’s notion of “null” differs from that of most other widely used languages. Rather than every type in Rust potentially being null, types in Rust are non-nullable, and there is instead an <code>Option</code> enum, which may contain either a value of a given type or the <code>None</code> sentinel. This allows functions to indicate in their signature whether they might return &quot;nothing,&quot; and it lets the compiler check to ensure that you've properly handled that possibility in your calling code.</p>
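A brief sketch (invented helpers) of how "might be nothing" shows up in a signature and gets checked at the call site:

```rust
// The Option in the return type tells every caller that there may be
// no answer; the compiler won't let them pretend otherwise.
fn first_even(xs: &[i32]) -> Option<i32> {
    xs.iter().copied().find(|&x| x % 2 == 0)
}

fn describe(xs: &[i32]) -> String {
    // A match must handle both Some and None before the value is usable.
    match first_even(xs) {
        Some(n) => format!("first even: {n}"),
        None => String::from("no even numbers"),
    }
}
```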
<h3 id="tooling">Tooling</h3>
<p>Like Go, Rust's tooling story is quite strong. It has a high-quality formatter, language server, linter, etc. Its dependency management is particularly strong (which is pretty important, considering how many things Rust's standard library leaves out, but I'll get to that). Rust also follows in Go's footsteps with a single website where most library documentation can be found (docs.rs). Rust does rely on the broader native ecosystem for tooling in some cases (for performance profiling it's recommended to use the same tools you might use for profiling a C++ application, like <code>perf</code> or <code>valgrind</code>). The Rust team has also demonstrated a willingness to bring 3rd party tools that outclass their official counterparts into the Rust organization (this recently occurred with rust-analyzer replacing rls as the official Rust LSP server). There are still some tooling voids, though, particularly in the async realm where it can be difficult to inspect how a complex running system is behaving.</p>
<h3 id="language-interop">Language interop</h3>
<p><a href="https://blessed.rs/crates?ref=flightaware.engineering#section-ffi">Rust has a strikingly good language interoperability story</a>. Its Foreign Function Interface (FFI) compatibility is equivalent to C, but it can be written more safely and with its full arsenal of high-level language features brought to bear. This presents an exciting direction to consider when assessing languages: perhaps we can write the bulk of a service in a more readily productive language and write the performance-critical parts as native Rust extensions. This mindset does discount Rust's other strengths like its null-safety, but it's still worth exploring.</p>
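As a sketch of what that might look like (hypothetical function name; a real integration would compile this as a <code>cdylib</code> and take care with strings, ownership, and panics at the boundary):

```rust
// Exposing a Rust function with a C-compatible ABI. Built as a cdylib,
// `distance_stub` (a made-up example) could be called from C, Python
// (ctypes/cffi), or any other FFI-capable host, while the function body
// keeps all of Rust's usual compile-time checks.
#[no_mangle]
pub extern "C" fn distance_stub(dx: f64, dy: f64) -> f64 {
    (dx * dx + dy * dy).sqrt()
}
```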
<h2 id="the-rough-edges">The rough edges</h2>
<p>It’s not all roses with Rust. Those incredible features granted by the borrow checker that I discussed above do come with a cost in the form of language complexity. The Rust team's painstaking approach to language development has also resulted in a reliance on Rust’s ecosystem of community libraries for functionality that many would expect to see in the language’s own standard library. Let’s explore these issues in more detail.</p>
<h3 id="the-learning-curve">The learning curve</h3>
<p>Rust's learning curve (the &quot;cost&quot; I mentioned above) seems practically legendary at this point. Just the knowledge of it served as a significant mental block to me when trying to learn and assess Rust for this writeup. Unless a developer is already comfortable with C++'s best practices around RAII and ownership and has a strong grasp of the typing concepts originating with the ML programming language, they're going to have a lot of learning to do to become proficient with Rust - and I haven't even mentioned Rust's macros yet! Rust puts its best foot forward with the Rust book, a free online resource for learning Rust that I found to be very approachable, but there remains a gulf between &quot;getting&quot; Rust's concepts and actually putting them to work. There are myriad stories of people having to &quot;fight with the borrow checker&quot; to get their code to compile. While this is ultimately beneficial, as it ensures their code's safety and proper function, it can be quite demotivating to spend potentially hours on your code with little to show for it.</p>
<p>All the features of Rust's type system also bring with them a glut of syntax, making it a visually dense language, and it's tough to learn it all piecemeal. It feels like you need to understand the whole language before you can make much progress with it.</p>
<p>The density of language features in Rust can also make the question of &quot;is this code idiomatic?&quot; hard to answer conclusively. Consider the various ways an error from a function can be handled:</p>
<ul>
<li>
<p><code>match</code> statement</p>
</li>
<li>
<p><code>if let</code> statement</p>
</li>
<li>
<p><code>Result</code> methods</p>
</li>
<li>
<p><code>?</code> operator</p>
</li>
</ul>
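Here's the same trivial fallible call handled all four ways (helper names invented for illustration):

```rust
fn parse(raw: &str) -> Result<i32, std::num::ParseIntError> {
    raw.parse::<i32>()
}

// 1. match: exhaustively handle both variants.
fn via_match(raw: &str) -> i32 {
    match parse(raw) {
        Ok(n) => n,
        Err(_) => 0,
    }
}

// 2. if let: pattern-match only the variant you care about.
fn via_if_let(raw: &str) -> i32 {
    if let Ok(n) = parse(raw) { n } else { 0 }
}

// 3. Result methods: combinator style, no explicit branching.
fn via_methods(raw: &str) -> i32 {
    parse(raw).unwrap_or(0)
}

// 4. `?` operator: propagate the error to the caller and keep going
//    with the success value.
fn via_question(raw: &str) -> Result<i32, std::num::ParseIntError> {
    Ok(parse(raw)? + 1)
}
```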
<p>The differences between these approaches can be subtle and become unnecessary points of focus. As a personal anecdote, I was writing some Rust code to handle a Result wrapping an Option wrapping another Option. I painstakingly golfed my way to a 7-line chain of function calls while spending quite a while getting comfortable with Result/Option's large library of functions; I then discovered with clippy's help that there was an even better 6-line approach that I had missed! Now while this was honestly quite fun for me, it wasn't really the best use of my time.</p>
<h3 id="light-standard-library">Light standard library</h3>
<p>Rust has a fairly small standard library by most standards; certainly no one would describe it as &quot;batteries included.&quot; Some see this as a positive due to the strict backwards compatibility expectations of the stdlib (<a href="https://leancrew.com/all-this/2012/04/where-modules-go-to-die/?ref=flightaware.engineering">&quot;the standard library is where modules go to die&quot;</a> is a recurring phrase in the Python community highlighting this phenomenon), meaning that non-critical functionality is free to become the best version of itself as a 3rd party library. This approach can create overhead, though, as competing implementations of functionality arise, and developers must spend effort to determine what will best fit their use case. For a few examples, consider <code>time</code> vs <code>chrono</code>, <code>tokio</code> vs <code>async-std</code>, or <code>rustls</code> vs <code>native-tls</code>. This fracturing drags down library developers who need to decide whether to support one or both implementations and produces comical &quot;feature matrices&quot; of possible library combinations.</p>
<p>Another anecdote: when writing some Rust to replicate a test in an existing Go service of ours that checks the color of a couple of pixels in a PNG, I looked to crates.io for a library to handle this. I added the <code>png</code> crate as it seemed like it could do the job, but I found out after attempting to implement the test and digging around the documentation for a while that the crate just wasn't high-level enough to let me do what I wanted succinctly. Instead, it was the <code>image</code> crate that I ended up using (and it was admittedly quite succinct). Go just has a <code>png</code> module built in, along with an interface for getting pixels from images.</p>
<h3 id="immature-ecosystem">Immature ecosystem</h3>
<p>This isn't an inherent shortcoming of Rust, but it is a reality that must be accepted: Rust's ecosystem for web services is still young and continues to shift. <a href="https://areweasyncyet.rs/?ref=flightaware.engineering#:~:text=A%2Dimpl%2Dtrait-,Future%20extensions,-%60async%20fn%60%20in">Full integration of async (which most web frameworks rely on) throughout Rust is still being developed</a>. <a href="https://rust-lang.github.io/async-book/?ref=flightaware.engineering">Rust's async &quot;book&quot;</a>, the feature's canonical documentation, is also missing significant sections. Rocket was previously a web framework of choice, but <a href="https://old.reddit.com/r/rust/comments/zvvrl7/rocket_is_dead/?ref=flightaware.engineering">now its sole developer has let it languish</a>. Actix-web's maintainer received so much community backlash for using significant amounts of <code>unsafe</code> code that <a href="https://steveklabnik.com/writing/a-sad-day-for-rust?ref=flightaware.engineering">he stepped down</a> (this was 3 years ago and actix-web quickly got a new maintainer, but it's indicative of the perils of relying on 3rd party projects). Rust also lacks official SDK support from most cloud and SaaS platforms, though AWS has a <a href="https://aws.amazon.com/sdk-for-rust/?ref=flightaware.engineering">Rust SDK</a> in developer preview and <a href="https://aws.amazon.com/blogs/opensource/sustainability-with-rust/?ref=flightaware.engineering">seems committed to promoting the language more broadly</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>And finally, Rust:</p>
<ul>
<li>
<p>Performance: 5</p>
</li>
<li>
<p>Safety: 4</p>
</li>
<li>
<p>Learning Curve: 2</p>
</li>
<li>
<p>Cloud/Library Support: 2</p>
</li>
<li>
<p>Tooling: 5</p>
</li>
</ul>
<h1 id="the-above-languages-at-flightaware">The above languages at FlightAware</h1>
<p>Of course, we can't simply consider Python, Go, and Rust in a vacuum. Each language is already in use at FlightAware in some capacity. As mentioned, we've invested significantly in Python as a general purpose language of choice at FlightAware; Rust has been used by the Flight Tracking Wing to implement multiple high performance data streaming services; and even Go has been used in a more limited fashion to implement some high volume microservices that handle tasks such as proxying weather map tiles through AWS.</p>
<p>Though people often say you should use the best tool for the job at hand, there is also a cost to the proliferation of languages in an organization. It promotes siloing; effort is duplicated when implementing libraries, developing guidelines, and even just learning the languages; and there can be a maintenance burden if your experts in a given language depart.</p>
<p>At FlightAware, Python certainly has the strongest foothold, with Rust coming in second, and Go trailing behind it. Suppose Go really is the best tool for the job of developing performant web services; is it better enough than Rust that it's worth the added burden and risk detailed above? Consider, too, the fact that the Backend wing at FlightAware is going to be investing in Rust for the development of other software like feeds from our data partners.</p>
<p>If we consider the possibility that different web services may merit different languages based on their performance needs, I see the following possible selections we could make:</p>
<ul>
<li>
<p>Go</p>
<ul>
<li>With its generally very solid performance, strong developer experience, and web service focus, I see Go as a &quot;sweet spot&quot; language. It's productive enough that we likely wouldn't need to reach for Python, and it's performant enough that Rust wouldn't be necessary either (<a href="https://discord.com/blog/why-discord-is-switching-from-go-to-rust?ref=flightaware.engineering">no, I don't expect we'll encounter Discord-level scaling for quite some time</a>).</li>
</ul>
</li>
<li>
<p>Rust</p>
<ul>
<li>Using just Rust for web services could work, but I think Rust's focus on correctness and safety over all else could really slow down development, especially as the wing first trains up. That combined with Rust's still immature web service ecosystem makes it a tough sell.</li>
</ul>
</li>
<li>
<p>Python and Rust</p>
<ul>
<li>Although Python alone won't suffice for our more performance-intensive services, I think Python combined with Rust could be a compelling choice. We could use Rust as needed, enabling a more gradual learning process, and continue to leverage the languages that FlightAware's already invested in.</li>
</ul>
</li>
</ul>
<p>Overall, I believe that Go would be the best choice for developing web services at FlightAware. Yet it's hard to wholeheartedly recommend it with the awareness that FlightAware's Rust codebases continue to grow, as does the language's community. Rust's costs seem mainly upfront, with large promised dividends.</p>
<p>Ultimately, I see Go as the lowest risk choice. I have confidence that the Backend wing can learn it and become productive with it readily, and I have little doubt that it will serve our needs. I also think it will be an approachable language for the Web wing, who will also need to work with and support these services.</p>
<!--kg-card-end: markdown--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/the-best-tool-for-the-job-assessing-languages-for-microservice-development-at-flightaware/">The Best Tool for the Job: Assessing Languages for Microservice Development at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Hacking at FlightAware ]]></title>
        <description><![CDATA[ Each Summer during our town hall week, FlightAware engineers participate in our annual Hackathon. During the Hackathon, we take a break from our normal work to play around with technology, have some fun, compete against each other, and try to hack something amazing together. ]]></description>
        <link>https://flightaware.engineering/hacking-at-flightaware/</link>
        <guid>https://flightaware.engineering/hacking-at-flightaware/</guid>
        <pubDate>Mon, 09 Jan 2023 12:32:17 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/12/patrick-donnelly-cmqkF5v8yRM-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>Each Summer during our town hall week, FlightAware engineers participate in our annual Hackathon. During the Hackathon, we take a break from our normal work to play around with technology, have some fun, compete against each other, and try to hack something amazing together. It’s a great experience where we interact with other engineers throughout the organization and get to know each other a little better.</p><p>During the Hackathon of 2020 (incidentally our first remote Hackathon), I had the pleasure of being on a team with James Wilson (a web engineer), Chris Roberts (a fellow backend engineer), and Lakshmi Raman (our QA lead at the time). After throwing out a variety of ideas, we decided we wanted to try to animate multiple historical flights simultaneously, and this post will cover how we went about doing that and how that hackathon project continues to live on today.</p><h2 id="hack-it-up">Hack It Up</h2><p>So, we decided we wanted to replay multiple historical flights simultaneously, but how were we going to do that? Thankfully, we work at FlightAware, so we have access to probably the most complete historical flight dataset one could hope for. Our first issue was that we could not easily use the existing map infrastructure at FlightAware. It’s built on OpenLayers, and while it works great for FlightAware’s current use case, it’s a very bespoke implementation and trying to incorporate multiple flight replay into <em>FAMap</em> in basically a day was not something we thought we could accomplish.</p><p>James suggested we use the Mapbox SDK to speed things along, and he very quickly (thanks to his years of experience on the Web team and a great <a href="https://docs.mapbox.com/mapbox-gl-js/example/animate-point-along-route/?ref=flightaware.engineering">example</a> on the Mapbox site) threw together the first proof of concept (POC). 
The backend for that first version used our typical Tcl / Rivet stack and leveraged something called <em>trackstream</em> to access historical flight tracks. <em>Trackstream</em> itself is just a library which queries either our real-time position data store (<em>popeye</em>) or our Postgres tracks database to marshal an appropriate tracks response. That first version worked well and gave us confidence we could produce a slick presentation.</p><p>Following on James’ initial POC, we set to work on making it better. The animation needed some improvement, and we really wanted to have a scrubber on the screen, other controls, and some interface to load interesting historical flight events. I spent most of my time during the hackathon working on the animation piece. If you have not dealt with map animations before, know that our approach was the same as the one in that Mapbox example above. We define an animate function and pass it to <em>requestAnimationFrame</em>. The animation function figures out where all the aircraft in our dataset should be for a given animation frame. We do that by creating a virtual clock which moves forward by some increment, determined by the replay rate, then by examining our tracks dataset to filter down to the position to display at that virtual time. The function also handles adding markers for aircraft that depart during the animation window and removing markers when they arrive (incidentally the early versions didn’t do this, so you would get a big pileup of markers at airports if you were doing a replay of aircraft departing and arriving at a given airport). As part of the data ingestion process, we create a position for each flight at each virtual time increment using a simple interpolation between real positions to achieve a smooth animation. 
The best description of the code and approach would be hacky, but hey, it’s a hackathon!</p><p>Chris built out our scrubber functionality during the hackathon and fixed up all sorts of bugs we found along the way, James built out an admin interface for creating new replays, and Lakshmi made sure everything was working as we were quickly adding new code and features to the tool. The replay interface requires you to figure out the list of flights you want displayed ahead of time, and we did that aggregation separately, producing a list of flight identifiers for each event we wanted to show.</p><p>Ultimately, everything came together, and we found a few fun events to show. A couple of noteworthy ones were a replay of flights arriving at Oshkosh for AirVenture the previous Summer and a replay of a FlightAware flyout where the aircraft I was flying had an issue, and all the other planes had to fly back to pick us up. Our team wound up winning that year and we had a great time throwing our project together.</p><figure class="kg-card kg-embed-card"><iframe width="200" height="113" src="https://www.youtube.com/embed/z9WxTbn89LI?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen title="Oshkosh 2019 Flight Replay"></iframe></figure><h2 id="fast-forward-to-today">Fast Forward to Today</h2><p>The hackathon project was a lot of fun, and while it demonstrated a fun way to visualize historical flight data, the project sat on the shelf for the next couple of years (as is usually the case for hackathon projects; after all, we were building things for fun, not to put into production). Following FlightAware's acquisition by Collins Aerospace, we hosted a hackathon for Collins engineers and that got me thinking about our old hackathon project. 
I went looking to see if it still worked; lo and behold, I was able to pull the code up, dust it off, and go relive the glory of our 2020 hackathon project. I then went and found a recent event and created a replay for that, which I posted in our flying channel on Slack.</p><p>Apparently, some marketing people lurk in that channel too, and they noticed the replay and thought that was fun. I forgot about that little trip down memory lane shortly thereafter, but in the leadup to Hurricane Ian approaching Florida, marketing reached out to see if I could do one of these replays for flight traffic around the hurricane. Of course I said, “Sure, that sounds like fun! And you know what? I’ll see if I can incorporate weather replay (because how hard could that be) so you can see both the flights and the hurricane.”</p><p>Well, it turns out that replaying flights over a broader area wasn’t what we had in mind back in the day, and yes, figuring out how to get all the weather tiles in there and playing nice wasn’t going to take just 5 minutes. But I really wanted to be able to pull this together, so I banged away on it that evening, figuring out how to preload all the weather tiles I’d need and then show and hide them based on the virtual clock in the replay and even have that work with our scrubber. I also had to change out some of the backend code because the page was timing out while trying to load flight tracks for the ~15,000 aircraft I was animating over that one-day period. 
So really, it was a bit of a hackathon on my own that evening, but I succeeded in producing the animation.</p><p>As a nice bonus, it turned out we had great coverage of all the hurricane hunters flying into the eye of the storm and the animation really highlighted that.</p><figure class="kg-card kg-embed-card"><iframe width="200" height="113" src="https://www.youtube.com/embed/Gs3UfGxsg9k?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen title="Hurricane Ian 2022"></iframe></figure><h2 id="any-lessons">Any Lessons?</h2><p>This is probably one of the less serious posts to grace the pages of FlightAware’s <em>Angle of Attack</em> blog, but it’s worth highlighting that this is only one example of how a fun hackathon or side project turned into something useful for the company. It’s a strong testament to the unique way that FlightAware empowers its employees to create powerful tools and software and iterate on them. I’m not alone in finding ways to find joy in the work I do – everyone at FlightAware demonstrates their passion for our chosen crafts in ways big and small, and that’s a big part of why I enjoy working here so much.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/hacking-at-flightaware/">Hacking at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Flight Tracking Access Control ]]></title>
        <description><![CDATA[ FlightAware’s picture of global flight activity emerges by fusing together information from dozens of disparate data sources. ]]></description>
        <link>https://flightaware.engineering/flight-tracking-access-control/</link>
        <guid>https://flightaware.engineering/flight-tracking-access-control/</guid>
        <pubDate>Mon, 05 Dec 2022 10:03:44 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/12/pin-adventure-map-0EZcIvzTjyo-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Garrett McGrath</em>, <em>a Staff Engineer in FlightAware's Aviation Insight group, is responsible for the performance, reliability, &amp; observability of multi-machine Hyperfeed &amp; an increasing constellation of services that consume Hyperfeed's output.</em></p><!--kg-card-begin: markdown--><h3 id="the-flightaware-multiverse">The FlightAware Multiverse</h3>
<p>Theoretical physics has gifted the popular imagination the intriguing notion of the multiverse: a collection of parallel, divergent universes that together constitute reality<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>. At FlightAware, we have a similar situation with the flights we track. FlightAware’s picture of global flight activity emerges by fusing together information from dozens of disparate data sources. Each of these sources provides distinct tracking data and, crucially for this discussion, each data source can have its own level of confidentiality<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup> or visibility. Put simply: not every data source can be seen by everyone. Some data sources, like FlightAware’s network of <a href="https://flightaware.com/adsb/?ref=flightaware.engineering" target="_terrestrial ADS-B receivers">terrestrial ADS-B receivers</a>, are visible to everyone without exception. But others, like Aireon’s <a href="https://flightaware.com/commercial/aireon/?ref=flightaware.engineering" target="_space-based ADS-B data">space-based ADS-B data</a>, have a restricted, limited distribution. Having so many data sources is an enormous asset in terms of flight tracking quality, but it also poses a technical problem. How do we track a single flightplan<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup> with multiple access levels, restricting privileged data from unauthorized users while preserving the full picture of a flight for those with elevated permissions? FlightAware solves this problem with its version of the multiverse: by maintaining multiple versions of the same flightplan simultaneously. Rather than tracking a single flightplan, FlightAware tracks a family of flightplans in tandem. To achieve this, we employ two core abstractions: <em>provenance</em> and <em>pedigree</em>.</p>
<h3 id="provenance-and-pedigree">Provenance and Pedigree</h3>
<p>Flight tracking at FlightAware occurs in a program called Hyperfeed, which manages the flightplan multiverse. When processing an input message, Hyperfeed performs several high-level steps. First, it normalizes and validates the data. As part of this data preparation, Hyperfeed assigns every input message a <em>provenance</em>. In Hyperfeed’s domain, a <em>provenance</em> is a textual tag indicating how widely the input data can be shared with FlightAware’s customers. The specific textual tags used are kept internal to FlightAware’s systems; no end user deals directly with <em>provenance</em>. Once the input has been prepared for ingestion and a <em>provenance</em> has been assigned, Hyperfeed consults its current state<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup> to find the flightplan for the input message.</p>
<p>If no flightplan can be found, a new one is created. Only considering for now the case where Hyperfeed creates a new flightplan, a key part of flightplan creation is the assignment of a <em>pedigree</em>. Every input message gets a <em>provenance</em>; every flightplan gets a <em>pedigree</em>. A flightplan’s <em>pedigree</em> contains all the <em>provenances</em> used to track it. Initially, a flightplan only has a single <em>provenance</em> in its <em>pedigree</em>, but things get more interesting when additional <em>provenances</em> come into play.</p>
<p>In the converse situation, where Hyperfeed finds an extant flightplan for an input message, it first checks if the <em>provenance</em> of the message exists in the <em>pedigree</em> of the flightplan. If it does, then it uses the input message to update the flightplan accordingly<sup class="footnote-ref"><a href="#fn5" id="fnref5">[5]</a></sup>. If it does not, then before Hyperfeed updates the flightplan, it undergoes a process called forking where it splits the universe of the flightplan into parallel, divergent paths. Forking has some nuances<sup class="footnote-ref"><a href="#fn6" id="fnref6">[6]</a></sup> outside the scope of this post, but in every case, Hyperfeed maintains a parent flightplan whose <em>pedigree</em> contains all the <em>provenances</em> ever used to track the flight and creates some child flightplans, each one containing a subset of the <em>provenances</em> in the parent flightplan’s <em>pedigree</em>. Every time forking occurs, the parent gets a new <em>provenance</em> in its <em>pedigree</em> and adds at least one new child flightplan. In this way, the tracking of a flightplan family evolves over time as more provenances get incorporated.</p>
<p><img src="https://flightaware.engineering/content/images/2022/12/GM.png" alt="GM" loading="lazy"></p>
<h3 id="flightplan-family-pedigree-evolution">Flightplan Family Pedigree Evolution</h3>
<p>To illustrate this process, let’s consider an example: the addition of two new <em>provenances</em> to a parent flightplan’s <em>pedigree</em>. Initially, our example flightplan has a single fork in its family: the parent fork with a lone <em>provenance</em>, called <em>A</em>, and a <em>pedigree</em> consisting of just that <em>provenance</em>. When an input message with a new <em>provenance</em> of <em>B</em> comes along and gets matched to this flightplan, forking occurs and results in two new members of the flightplan family. The parent flightplan’s <em>pedigree</em> changes at this point from <em>A</em> → <em>A</em> <em>B</em>. Of the two new family members, one child has a <em>pedigree</em> of <em>A</em>, and another child has a <em>pedigree</em> of <em>B</em>.</p>
<p>Assume now that an input message adds an additional <em>provenance</em> of <em>C</em>. This updates the parent’s <em>pedigree</em> and creates four additional child flightplans. Starting simply, after adding <em>C</em> to the parent flightplan, its <em>pedigree</em> becomes <em>A</em> <em>B</em> <em>C</em>. Previously the parent had a <em>pedigree</em> of <em>A</em> <em>B</em>; that <em>pedigree</em> persists, but now in a child flightplan rather than as the parent. The child with a <em>pedigree</em> of <em>A</em> similarly sticks around, and a new child is created with the <em>pedigree</em> <em>A</em> <em>C</em>. Similarly, the child with a <em>pedigree</em> of <em>B</em> persists, with a new child created for the <em>pedigree</em> <em>B</em> <em>C</em>. Lastly, a new child flightplan begins with a <em>pedigree</em> of <em>C</em>.</p>
<p>Speaking generally, the addition of a <em>provenance</em> to a flightplan family creates a new flightplan with the newly added <em>provenance</em> as its <em>pedigree</em>; in addition, each existing member of the family, including the parent, is copied: one instance of the copy gets the new <em>provenance</em> added to its <em>pedigree</em> while the other retains its previous <em>pedigree</em>. Forking a flightplan family with <em>N</em> members thus yields a new family with <em>2N</em> + 1 members. In the previous example, the family size grew with each <em>provenance</em> added to the parent’s <em>pedigree</em> from 1 to 3 to 7 members.</p>
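To make the growth rule concrete, here is a toy model of forking. This is purely illustrative and is not Hyperfeed's actual implementation; it just treats a pedigree as a set of provenance tags and a family as a list of pedigrees tracked in tandem:

```rust
use std::collections::BTreeSet;

// Toy model only: a pedigree is a set of provenance tags.
type Pedigree = BTreeSet<&'static str>;

// Forking on a new provenance: every existing member is kept, a copy of
// every member (parent included) gains the new provenance, and a fresh
// member holds just the new provenance. A family of N members therefore
// grows to 2N + 1 members.
fn fork(family: &mut Vec<Pedigree>, provenance: &'static str) {
    let mut copies: Vec<Pedigree> = family
        .iter()
        .map(|p| {
            let mut c = p.clone();
            c.insert(provenance);
            c
        })
        .collect();
    family.append(&mut copies);
    family.push(BTreeSet::from([provenance]));
}
```

Starting from a lone pedigree of <em>A</em> and forking on <em>B</em> and then <em>C</em> reproduces the 1 → 3 → 7 growth from the example above, with the largest member (<em>A</em> <em>B</em> <em>C</em>) playing the role of the parent.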
<p><img src="https://flightaware.engineering/content/images/2022/12/GM2.png" alt="GM2" loading="lazy"></p>
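<p>As a rough illustration, the forking rule can be sketched in Python. This is a toy model, not Hyperfeed’s actual implementation (which keeps this state in PostgreSQL and includes special-case handling not covered here); each <em>pedigree</em> is represented as a set of <em>provenances</em>:</p>

```python
def add_provenance(family, new_prov):
    """Fork a flightplan family when an input message adds a new provenance.

    `family` is a list of pedigrees, each a frozenset of provenances, with
    the parent's pedigree first. Every existing member is copied twice:
    once with the new provenance added to its pedigree, and once unchanged.
    One brand-new member is also created whose pedigree is just the new
    provenance, so a family of N members grows to 2N + 1 members.
    """
    forked = []
    for pedigree in family:
        forked.append(pedigree | {new_prov})  # copy gaining the new provenance
        forked.append(pedigree)               # copy retaining its old pedigree
    forked.append(frozenset({new_prov}))      # new child for the provenance alone
    return forked

# Replaying the example above: start with A, then fork with B, then with C.
family = [frozenset({"A"})]
family = add_provenance(family, "B")  # 3 members: A B (parent), A, B
family = add_provenance(family, "C")  # 7 members: A B C (parent), A B, A C, A, B C, B, C
```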
<h3 id="pedigree-access-control">Pedigree Access Control</h3>
<p>Forking and maintaining the FlightAware multiverse happens within Hyperfeed, but this is only part of the story. The other part is what happens with Hyperfeed’s output, i.e., FlightAware’s picture of global flight activity, which is called <em>controlstream</em>. <em>Controlstream</em> is meant for external consumption: once it leaves Hyperfeed it becomes, for example, the data seen on the website, the source of <a href="https://flightaware.com/commercial/firehose/?ref=flightaware.engineering" target="_blank">API offerings</a>, or an input to <a href="https://flightaware.com/commercial/foresight/?ref=flightaware.engineering" target="_blank">ML models</a> for predicting flight ETAs. However, <em>controlstream</em> can only be consumed and presented to FlightAware’s customers and users in a way that respects <em>pedigree</em>. Respecting <em>pedigree</em> means that every consumer of <em>controlstream</em> sees only their permitted portion of the flightplan multiverse. To accomplish this, every <em>controlstream</em> consumer has a <em>pedigree</em> access-control list (PACL). For <em>controlstream</em> consumers without a FlightAware account, e.g., a non-logged-in user of the website, a default PACL is used that only shows publicly distributable data. Otherwise, a PACL gets assigned at FlightAware account creation time<sup class="footnote-ref"><a href="#fn7" id="fnref7">[7]</a></sup>. Any presentation of <em>controlstream</em>, then, necessitates evaluating a consumer’s PACL against the <em>pedigree(s)</em> of a flightplan family. Evaluation of a consumer’s PACL determines whether a flightplan family is visible, and, if so, which family member’s <em>pedigree</em> has the most <em>provenances</em> with respect to what the PACL permits.</p>
<p>Evaluating a PACL sounds relatively straightforward, but there is an additional wrinkle that adds complexity. A PACL does not simply list the <em>provenances</em> that a <em>controlstream</em> consumer can see; it also supports additional constraints on any given <em>provenance</em>. Namely, a specific <em>provenance</em> can be further limited to a restricted list of callsigns or tail numbers and airports. Furthermore, the permission to view a given <em>provenance</em> in a PACL can also have a date range attached to it. This allows FlightAware to express restrictions for <em>controlstream</em> consumers like the following: consumer <em>U</em> can see the Aireon space-based ADS-B <em>provenance</em> only for tail numbers verified to belong to <em>U</em> and only for flights starting on a specific date, or consumer <em>X</em> can see Eurocontrol data but only for flights originating from or destined for Heathrow airport in the year 2020. Supporting these sorts of additional restrictions adds enormous power to the <em>pedigree</em> system of data access. This system enables FlightAware to granularly carve up the flightplan multiverse into finer-grained parallel universes tailored for very precise use cases<sup class="footnote-ref"><a href="#fn8" id="fnref8">[8]</a></sup>.</p>
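<p>In simplified form, PACL evaluation can be sketched like this. The sketch below treats a PACL as a plain set of permitted <em>provenances</em> and omits the per-<em>provenance</em> callsign, airport, and date-range constraints described above:</p>

```python
def select_visible_member(pacl, family):
    """Choose which flightplan family member a controlstream consumer sees.

    `pacl` is the set of provenances the consumer may view (the real PACL
    also attaches callsign/tail, airport, and date-range constraints to
    individual provenances, which this simplified sketch omits). `family`
    is a list of pedigrees, each a frozenset of provenances. A member is
    visible only if every provenance in its pedigree is permitted; among
    visible members, the one whose pedigree has the most provenances is
    selected. Returns None when no member of the family is visible to
    this consumer.
    """
    visible = [pedigree for pedigree in family if pedigree <= pacl]
    return max(visible, key=len) if visible else None
```

<p>For the seven-member family from the forking example, a consumer permitted only <em>A</em> would see the member with <em>pedigree</em> <em>A</em>, while a consumer permitted both <em>A</em> and <em>B</em> would see the <em>A</em> <em>B</em> member.</p>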
<h3 id="footnotes">Footnotes</h3>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>In case anyone wants to write in with a correction, please, ahead of time, excuse the sloppiness of my characterization of the multiverse. The intention here is to capture a colloquial understanding of the concept, not that of professional physicists. <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p>Confidentiality is used here in the computer security sense which <a href="https://www1.udel.edu/security/data/confidentiality.html?ref=flightaware.engineering">means</a> “the privacy of information, including authorizations to view, share, and use it.” <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn3" class="footnote-item"><p>While flightplan is an oft-used term, in the context of this blog it means the intention of a plane to move at a certain time from one location on earth to another location on earth at a different time. Each flight page on FlightAware’s website or mobile app, for example <a href="https://flightaware.com/live/flight/UAL4/history/20220916/1250Z/EGLL/KIAH?ref=flightaware.engineering">here</a> or <a href="https://flightaware.com/live/flight/SKW5417/history/20220914/2110Z/KPIA/KORD?ref=flightaware.engineering">here</a>, represents a flightplan. <a href="#fnref3" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn4" class="footnote-item"><p>Hyperfeed keeps its state, which includes all the flightplans it is currently tracking, along with any positions for those flights and a host of additional supporting information, in a PostgreSQL database. <a href="#fnref4" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn5" class="footnote-item"><p>When updating a flightplan family, Hyperfeed only updates the members with a <em>pedigree</em> containing the <em>provenance</em> of the input message. <a href="#fnref5" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn6" class="footnote-item"><p>Most of the nuance not covered in detail involves special case handling for the most permissive provenance, which happens to improve performance. Forking behaves somewhat differently when the most permissive <em>provenance</em> is seen for the first time for a flightplan. This special case handling, though, only adds more detail without invalidating the discussion herein. <a href="#fnref6" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn7" class="footnote-item"><p>A FlightAware account’s PACL can change over time depending on the addition or removal of premium features; a user’s PACL is not immutable. <a href="#fnref7" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn8" class="footnote-item"><p>For those wondering, the <em>pedigree</em>-based access control mechanism does not enforce the <a href="https://flightaware.com/about/faq/?ref=flightaware.engineering#blockedtail">block list.</a> That is also a key part of displaying the contents of <em>controlstream</em> to FlightAware’s customers, but it is handled separately. Hyperfeed does not have any notion of blocked idents, so the access control decisions for it happen entirely in any application or presentation layer used to display <em>controlstream’s</em> data. <a href="#fnref8" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
<!--kg-card-end: markdown--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/flight-tracking-access-control/">Flight Tracking Access Control</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Taking off with Nix at FlightAware ]]></title>
        <description><![CDATA[ In this blog post, we’ll provide an account of the problems that motivated FlightAware to adopt Nix and how we’ve used Nix to solve them at scale. ]]></description>
        <link>https://flightaware.engineering/taking-off-with-nix-at-flightaware/</link>
        <guid>https://flightaware.engineering/taking-off-with-nix-at-flightaware/</guid>
        <pubDate>Fri, 11 Nov 2022 13:08:27 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/11/markus-spiske-AaEQmoufHLk-unsplash-2.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>As a Senior Software Engineer 2 on FlightAware's Predictive Tech crew, Andrew Brooks works to maintain, analyze, &amp; improve FA's predictive models as well as the software &amp; infrastructure that supports them. Andrew is also the tech lead of the Engineering Productivity crew, which designs and maintains internal tools and standards to support FlightAware's engineers.</em></p><h3 id="overview">Overview</h3><p>At the time of writing, most articles on the Nix package manager are written with individuals or small teams in mind. Unfortunately, these articles do not offer much insight or advice on adopting Nix across an entire engineering organization. This blog post is our attempt to fix that.</p><p>In this blog post, we’ll provide an account of the problems that motivated FlightAware to adopt Nix and how we’ve used Nix to solve them at scale. Additionally, we’ll share some insight into the patterns, infrastructure, and practices that allow us to use it effectively, and offer some tips for both individuals and teams that are considering working with Nix.</p><p>Although this blog post assumes no familiarity with Nix, having some prior Nix experience may help readers get the most from our advice.</p><h2 id="flightaware-circa-2020">FlightAware Circa 2020</h2><p>Our earliest experiments with Nix occurred around 2020 in response to a few day-to-day problems that FlightAware engineers didn’t have a good answer for.</p><h3 id="some-languages-we-used-had-no-package-manager">Some languages we used had no package manager</h3><p>FlightAware has a substantial amount of legacy code written in Tcl. Unfortunately, Tcl did not have an actively maintained package manager. This meant that if you wanted to install a Tcl package, you were on the hook for figuring out how to install it and its entire dependency tree. 
To make matters worse, Tcl packages often don’t clearly state their dependencies, so even identifying what needed to be installed often proved to be a hassle.</p><p>FlightAware also has several performance-critical libraries and applications that are written in C++. Unfortunately, the package management landscape for C++ is bleak: even though there are a few C++ package managers, none are widely accepted. When building internal C++ projects, engineers often resorted to doing git operations with CMake to pull in any dependencies.</p><h3 id="creating-docker-images-was-miserable">Creating Docker images was miserable</h3><p>If you’ve ever written a nontrivial Dockerfile, you’ll know that building a Docker image for an application with a complex dependency tree would be a tedious endeavor without a package manager. Furthermore, Docker builds were often painfully slow due to the large number of repositories and build steps involved.</p><p>Changing any steps early in a complex Dockerfile was especially painful: Docker can’t tell whether running one step will affect the outcome of another, so it’s forced to re-run every step after the one you changed. Leveraging <a href="https://docs.docker.com/develop/develop-images/multistage-build/?ref=flightaware.engineering">multi-stage builds</a> is a common way to address this problem, and it works very well when installing something results in a manageable number of obvious paths to copy between stages. Unfortunately, this was not the case for many FlightAware libraries. In practice, Dockerfiles using multi-stage builds were sometimes so much more verbose than their naïve alternatives that they drove engineers back to less efficient, conventional Dockerfiles.</p><p>One approach taken by some FlightAware crews was to assemble one giant Docker image containing almost every FlightAware package, suffer through its long build time, and use it as a base for all applications. Unfortunately, this had a few drawbacks of its own. 
In addition to producing enormous images, you were still potentially stuck waiting on a very long build if you needed to change some component in the base image and your application in tandem. Some packages also proved challenging to install on top of a “comprehensive base image” because they required a conflicting version of a library or compiler present in the base image.</p><p>What we really needed was a way to say “go build/install application X and its dependencies – without repeating work unnecessarily – and give me a Docker image” without being drowned in an ocean of problems and lengthy build times.</p><h3 id="multi-language-projects-could-be-challenging-to-support">Multi-language projects could be challenging to support</h3><p>FlightAware had (and still has!) a “just use the best programming language for the problem” policy. Accordingly, many of our projects make use of native extensions to achieve performance goals or enable interoperability with native libraries. From a package management perspective, mixing languages could be a hassle because it’s frequently necessary to jump around between different package managers and accommodate languages that might not have a well-established one. Additionally, some languages in use at FlightAware tend to compile slowly (for example, one of our mission-critical Haskell projects typically takes about 40 minutes to compile its dependency tree), which exacerbated our frustrations with long Docker image build times.</p><h2 id="what%E2%80%99s-the-deal-with-nix-anyway">What’s the Deal with Nix, Anyway?</h2><p>Before we can describe how we’ve used Nix to address these problems, we first need to offer a quick overview of Nix and some basic concepts needed to understand why it’s been helpful at FlightAware.</p><p>In short: Nix is a package manager, a domain-specific language (DSL) used by that package manager, and a command line toolset based around the previous two. 
You may also have heard of NixOS, a Nix-centric Linux distribution (but you can comfortably use Nix without using NixOS)[1]. You’ll also hear “nixpkgs” (the Nix packages collection) mentioned alongside Nix. Nixpkgs is a git repository that offers common packaging functions and Nix package definitions for over 70,000 packages (to put that number <a href="https://repology.org/repository/nix_unstable?ref=flightaware.engineering">in perspective</a>, that’s currently more than twice as many packages as are available in the entire Ubuntu 22.10 repository).</p><p>At this point, you might be imagining Nix as “another apt/rpm/pip.” If so, think again! Nix provides several features and guarantees that more conventional package managers are unable to, requiring some surprising technical decisions to enable them:</p><p>·      <strong>Nix is a “multi-language package manager”:</strong> it supports a wide variety of programming languages and build systems. Conveniently, it’s sometimes able to reuse other package managers’ package formats and give you a Nix package “for free.” Supporting projects in multiple languages means that you can avoid jumping between pip/apt/etc. to install your software.</p><p>·      <strong>Nix packages are built reproducibly: </strong>no matter where you build them, you’ll always get the same result[2]. There are several package managers that claim to provide reproducible builds, but they’re generally limited in what they can control for in comparison to Nix. 
Very few of them use a build sandbox, manage the entire build toolchain, account for native dependencies, or handle dependencies from other languages like Nix does.</p><p>·      <strong>Nix does not install packages into </strong><a href="https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard?ref=flightaware.engineering"><strong>FHS paths</strong></a><strong>,</strong> like /usr/bin or /opt, but instead installs into a special directory called the “Nix store,” usually placed at /nix/store. The exact paths inside the store are determined by a cryptographic hash of the instructions used to build each package (which includes source code and store paths of dependencies) and the package’s human-readable name[3]. It’s important to note that paths inside the store are immutable: once written, their contents never change.</p><p>·      <strong>Nix makes strong isolation guarantees for packages:</strong> installing one package can never break another. Additionally, you can simultaneously install as many different versions or customized variants of a package as you’d like without worrying about conflicts between them. Nix treats each of these versions/variants as distinct packages and installs them to distinct paths inside the store. If you change a build instruction, alter any inputs to the build sandbox, or change any of the input sources, you create a different package and therefore a different path in the store. 
Combined with Nix’s reproducibility guarantees, this means that if the same path is present inside the Nix store on two different hosts, it will have the same contents.</p><p>·      <strong>In Nix, the notion of a “package” is slightly different than in other package managers.</strong> To simplify somewhat, a “package” in Nix is a function of all inputs needed to build paths in the Nix store, which you (or another host) may already have built[4]. The fact that Nix always carries around a description of how to build a package is extremely powerful for one-off experiments and package customization, as we’ll show later.</p><p>·      <strong>Nix has a DSL for defining these packages </strong>(unfortunately, this DSL is also called Nix, a point of confusion for new users). The Nix language facilitates succinct, declarative package definitions[5], enables flexible customization of already-defined packages, and allows defining arbitrary functions to encapsulate common packaging patterns and logic. Although the Nix language is very capable, it is not intended as a general purpose programming language.</p><p>This is not an all-encompassing list: Nix boasts an extensive (and growing) set of features and guarantees that we’re not going to attempt to list exhaustively.</p><h2 id="solving-our-problems-with-nix">Solving Our Problems With Nix</h2><p>Here’s where things get interesting: at this point, Nix seemed like a promising package manager choice for FlightAware. 
While Nix eventually proved to be a powerful solution for the problems we were facing, the road to using Nix effectively across FlightAware took a few surprising turns.</p><h3 id="could-nix-be-the-package-manager-we-were-missing-for-tcl-and-c">Could Nix be the package manager we were missing for Tcl and C++?</h3><p>Nix already supports several languages’ common build/install patterns, package managers, and language-specific package formats. Unfortunately, at the time, Tcl was not among the languages supported by nixpkgs. However, because Nix already supports several other interpreted languages, it wasn’t difficult to add some conveniences for packaging code written in another. In late 2020, I implemented Tcl packaging support for nixpkgs, which wound up being more straightforward than I’d expected. FlightAware later contributed this work to nixpkgs, and this functionality is included in any recent nixpkgs release as the tcl.mkTclDerivation function.</p><p>Unsurprisingly, C/C++ were already very well supported by Nix. Nixpkgs contained an extensive collection of C/C++ packages and adding your own tended to be easy: packaging programs or libraries with a commonplace autotools or CMake configure/build/install process was often as simple as specifying immediate dependencies, where to find the source code, and basic package metadata (like a package name and description).</p><p>At this point, the case for Nix was compelling: not only did we have a package manager for two extensively used languages that didn’t really have one, but we could use the <em>same</em> package manager for any other languages at FlightAware. This made it easy for us to get a comprehensive picture of the dependency tree for FlightAware projects -- even across languages -- when doing so used to be impractical. 
As frosting on the cake, the Nix CLI includes several tools for querying and visualizing dependencies in addition to allowing inspection of packages from within the Nix language.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/11/Picture1.svg" class="kg-image" alt loading="lazy" width="624" height="248"><figcaption><em>Figure 1: Runtime dependencies of scipy for Python 3.9, queried with Nix</em></figcaption></figure><h3 id="building-the-flightaware-package-overlay">Building the FlightAware package overlay</h3><p>Now that we had support for common FlightAware languages figured out, we needed to actually write new Nix packages for FlightAware internal libraries and applications. For a more conventional “system-level” package manager (like apt), you’d generally accomplish this by creating a new package repository and populating it with packages built by a CI server based on some spec.</p><p>Nix, however, is a little different: to add your own tweaks and custom packages, you generally define something called an overlay. In simplest terms, an overlay is just a function, written in the Nix language, that describes how to extend or modify some base package set (typically nixpkgs). Conveniently, those modifications can be “fed back into” the base set of packages as inputs (i.e., if you modify the “sqlite” package in your overlay, any packages in the base set would be redefined to build against your customized sqlite).</p><p>Both nixpkgs itself and the internal package overlay we applied on top of it are “just” git repositories containing Nix expressions. This turns out to be very powerful: you can upgrade or roll back the set of packages in use for a project just by specifying a different git branch and commit of the FlightAware overlay and nixpkgs itself. The packages you use are <em>just code like everything else</em>. 
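<p>As a rough sketch of what such an overlay looks like (the package names and tweaks below are illustrative inventions, not FlightAware’s actual packages), an overlay is just a Nix function from the final package set and the base set it extends to a set of additions and overrides:</p>

```nix
# overlay.nix -- hypothetical example overlay
self: super: {
  # An internal Tcl package, built with the helper FlightAware
  # contributed to nixpkgs.
  fa-example-tcl = super.tcl.mkTclDerivation {
    pname = "fa-example-tcl";
    version = "1.0.0";
    src = ./fa-example-tcl;
  };

  # Tweak an existing package; packages in the base set that depend on
  # sqlite will be redefined to build against this modified version.
  # (The flag shown is purely illustrative.)
  sqlite = super.sqlite.overrideAttrs (old: {
    configureFlags = (old.configureFlags or [ ]) ++ [ "--enable-example" ];
  });
}
```

<p>Because the overlay receives the base set as an argument (conventionally <code>super</code>), overrides compose: each definition can build on the unmodified package while the rest of the set sees the overridden result.</p>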
Engineers could experiment with upgrading/tweaking/modifying packages on their own git branch and install them without worrying about stepping on other engineers’ toes or having to coordinate uploading packages to a shared apt repository.</p><p>One important distinction to note is that these git repositories don’t contain “already built packages,” only the Nix expressions that describe how to build them. However, this doesn’t necessarily mean that you’re forced to recompile and reinstall software every time you install a package. We’ll talk more about how this works later. In the meantime, take our word for it.</p><p>With support for the most common languages at FlightAware and several new FlightAware packages at our fingertips, we were off to the races.</p><h3 id="creating-docker-images-without-dockerfiles">Creating Docker images without Dockerfiles</h3><p>Ordinarily, when building a Docker container, you’d write a Dockerfile with one or more RUN steps that invoke the package manager of your choosing to install whatever packages you need or build any software that you need to. Surprisingly, this isn’t how you’d usually create a Docker image with Nix. As a matter of fact, you generally don’t need a Dockerfile at all.</p><p>Nixpkgs defines several functions for building Docker images with just a Nix expression. Behind the scenes, Nix builds any packages or paths that you’d need inside the image as “ordinary” Nix store paths in the usual Nix build sandbox. These paths are then copied from the Nix store into a layer of the image being built. Although copying in files from the build host like this might sound dangerous, Nix’s reproducibility and isolation guarantees make it safe – after all, if you get the same result anywhere you build a Nix package, who cares if you build it inside a Docker container, outside of one, or somewhere else entirely? 
Finally, any additional shell commands needed to set up the image are run in either the Nix build sandbox or a disposable VM.</p><p>Nix offers us a pleasant API for building Docker images – you don’t have to do much beyond specifying the packages that need to go in each layer and any shell commands that need to run to finish setting up that layer. The arrangement of the layers in the final image isn’t determined by “steps” as it would be with a Dockerfile -- Nix allows you to split package installation / setup across layers however you’d like. In fact, Nix’s dockerTools can optionally leverage Nix’s understanding of packages’ runtime dependencies to guess how to split the contents of your image across layers in the hope of maximizing layer reuse across images.</p><p>For example, here’s what a Nix language expression looks like for defining a single-layer Docker image with some application, an interactive shell for debugging, and basic user setup, from scratch:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.19.50-PM.png" class="kg-image" alt loading="lazy" width="1638" height="1074" srcset="https://flightaware.engineering/content/images/size/w600/2022/11/Screen-Shot-2022-11-03-at-2.19.50-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/11/Screen-Shot-2022-11-03-at-2.19.50-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/11/Screen-Shot-2022-11-03-at-2.19.50-PM.png 1600w, https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.19.50-PM.png 1638w" sizes="(min-width: 720px) 720px"></figure><p>When building an image with Nix, the Nix store acts a bit like the Docker layer cache. However, it caches not only image layers but also built packages themselves, their dependencies, and some intermediate build artifacts (like downloaded source code). 
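<p>For readers who can’t view the screenshot, a minimal sketch along the same lines, using the layered variant (package choices and names here are illustrative, not taken from the original expression):</p>

```nix
# image.nix -- hypothetical sketch of a Docker image built with Nix
{ pkgs ? import <nixpkgs> { } }:

pkgs.dockerTools.buildLayeredImage {
  name = "example-app";
  tag = "latest";

  # Contents to place in the image; Nix also pulls in each package's
  # runtime dependencies automatically, and nothing else.
  contents = [
    pkgs.hello            # stand-in for a real application
    pkgs.bashInteractive  # a shell for debugging running containers
    pkgs.coreutils        # basic tools that minimal images often lack
  ];

  config = {
    Cmd = [ "${pkgs.hello}/bin/hello" ];
  };
}
```

<p>Running <code>nix-build image.nix</code> produces an image tarball that can be loaded with <code>docker load &lt; result</code> -- no Dockerfile involved.</p>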
This allows Nix-based image builds to achieve much finer grained caching than would generally be practical with a Dockerfile using a multistage build. For example, if installing an NPM package from a node2nix-generated Nix package, the Nix store would separately “cache” the downloaded source tarball for each NPM dependency. The only way to achieve a comparable level of caching in a multistage Docker build would be to create a separate stage describing the download of each dependency source tarball. For a nontrivial Node application, writing a multistage Dockerfile like that would be completely impractical.</p><p>Docker images produced by Nix tend to be very small: Nix knows exactly which store paths an installed package depends on at runtime and pulls in only those paths to the created Docker image. In our experience, Nix does this a little <em>too</em> well, and we occasionally need to manually pull in coreutils/bash/etc. so that we’ll have enough basic tools inside each image to effectively debug running containers when necessary.</p><p>In short, Nix’s dockerTools offered us an immense quality of life improvement when creating Docker images by making them simple to define, fast to build, small, and easy to iterate on quickly.</p><h3 id="setting-up-a-private-binary-cache-server">Setting up a private binary cache server</h3><p>There’s one important and nuanced detail that we’ve glossed over so far: we’ve said very little about how already built Nix packages are distributed -- after all, we don’t want to waste time recompiling packages if we can avoid it. This is where binary cache servers come into play.</p><p>Basically, a binary cache server has a large Nix store with paths that are likely to be useful to other Nix installations and serves those paths to other Nix installations. When trying to install or use a package, another Nix installation will first check against a list of binary caches to see if it can find an already built store path for that package. 
If so, and if the paths are signed by a key that it considers trustworthy, it’ll download and install the package from the binary cache instead of building it. Otherwise, it just builds the package itself.</p><p>With a more conventional package manager, you might build and upload packages from a dedicated server or build farm. However, Nix enables <a href="https://www.tweag.io/blog/2019-11-21-untrusted-ci/?ref=flightaware.engineering">a more decentralized approach</a>: you can have <em>any</em> Nix installation automatically sign and upload paths to the binary cache as soon as they’re built, allowing them to be automatically and transparently reused elsewhere. This works especially well given how many variations on a package may need to be cached at once because of how easily and frequently packages wound up overridden or tweaked (e.g., from an engineer applying our overlay to a different nixpkgs release or modifying a package when testing something).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.26.05-PM.png" class="kg-image" alt loading="lazy" width="1462" height="920" srcset="https://flightaware.engineering/content/images/size/w600/2022/11/Screen-Shot-2022-11-03-at-2.26.05-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/11/Screen-Shot-2022-11-03-at-2.26.05-PM.png 1000w, https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.26.05-PM.png 1462w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 2: A possible deployment of a conventional package repository</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.25.21-PM.png" class="kg-image" alt loading="lazy" width="1638" height="1026"
srcset="https://flightaware.engineering/content/images/size/w600/2022/11/Screen-Shot-2022-11-03-at-2.25.21-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/11/Screen-Shot-2022-11-03-at-2.25.21-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/11/Screen-Shot-2022-11-03-at-2.25.21-PM.png 1600w, https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.25.21-PM.png 1638w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 3: A possible deployment of a shared Nix binary cache</em></figcaption></figure><p>We’ve found that an internal binary cache deployment like this has some compelling advantages:</p><p>·      For our predictive tech crew, it allows data science environments to be quickly, reproducibly, and effortlessly shared between developers and development machines. This is especially useful given that some of them customize large swaths of the package set (e.g., by swapping out the BLAS implementation that linear-algebra-heavy packages are built against), which could otherwise result in disruptive, long build times.</p><p>·      In general, it makes jumping between development hosts easy: any packages you built while developing on one server should be trivially reusable on another, with no additional work or compiling required.</p><p>·      It can make Docker image builds incredibly fast: if a package being installed into a Docker image was built <em>anywhere</em> in the past and uploaded to the cache, it can just be downloaded from the binary cache and immediately copied into a Docker layer. 
If you’re experimenting with something outside of Docker on a dev host, then build a production Docker image on a CI/CD server using Nix, you won’t have to re-compile anything when building the Docker image (assuming both environments are submitting to/pulling from the same binary cache).</p><h3 id="%E2%80%9Cthe-hard-part%E2%80%9D-of-setting-up-a-binary-cache">“The hard part” of setting up a binary cache</h3><p>When productionizing our binary cache infrastructure, we found that there was a little more plumbing involved than we’d initially anticipated:</p><p>·      The automatic upload is triggered by the Nix daemon’s <a href="https://nixos.org/manual/nix/stable/advanced-topics/post-build-hook.html?ref=flightaware.engineering">post-build-hook</a> setting. Unfortunately, you’ll need to implement the actual build hook script responsible for signing/uploading the package (the Nix manual has a <a href="https://nixos.org/manual/nix/stable/advanced-topics/post-build-hook.html?ref=flightaware.engineering#implementing-the-build-hook">trivial example</a>).</p><p>·      As noted by the Nix manual, the post-build-hook is blocking and fails the build if it doesn’t succeed. To prevent slowing down or breaking builds, our post-build hook merely submits a path for upload to a dedicated cache upload service we created that runs alongside each Nix installation. This allows the post-build hook to run quickly and get out of the way while the upload service does the “heavy lifting.”</p><p>·      It’s probably a mistake to blindly upload everything you build to your binary cache. For example, uploading Docker layers or source code to the cache isn’t actually very helpful: they tend to be large and can be downloaded from sources other than the binary cache anyway. 
The upload service that we install alongside Nix uses a configurable set of heuristics for excluding store paths that are either too expensive to cache or seem unlikely to be useful.</p><p>·      To prevent the binary cache’s Nix store from growing endlessly larger, you’ll want to configure automatic garbage collection (we’ll discuss this later).</p><p>If setting this up seems daunting, keep in mind that you don’t <em>need</em> to set up your own binary cache to make effective use of Nix. Note that you can also manually move built packages between servers without a binary cache if desired. However, for us, investing the time to set up an internal binary cache and automatic package submission has been well worth it.</p><p>On a related note, Nix has good support for transparent, multiplatform, <a href="https://nixos.org/manual/nix/stable/advanced-topics/distributed-builds.html?ref=flightaware.engineering">distributed builds</a>. While we don’t currently use this functionality, other teams or organizations might benefit from this flexibility.</p><h2 id="solving-more-than-known-problems">Solving More Than Known Problems</h2><p>In addition to offering elegant solutions to problems that we’d recognized, Nix proved to be useful for several reasons that we didn’t initially expect.</p><h3 id="unified-tooling-and-infrastructure-for-multi-language-development">Unified Tooling and Infrastructure for Multi-Language Development</h3><p>Although package management for Tcl and C++ motivated our initial adoption of Nix, we soon found that Nix still has a lot to offer when developing in languages with well-established package managers, especially for multi-language projects and services. Nix has been very effective as a unified package management and build solution, used across 12 teams for languages spanning C++, Tcl, Python, Haskell, Rust, Golang, and more. 
For languages with well-established package managers and practices, Nix is often able to trivially leverage existing language-specific package formats, which makes it “cheap” to add to a project.</p><p>Conveniently, Nix also has good support for describing and entering development environments with temporary installations of specific versions of compilers, interpreters, and other development tools. This means that engineers don’t need to waste time learning and tangling with language-specific tools to manage per-user or per-project installations of these tools. When engineers work on a project with a well-written shell.nix, you seldom hear the infamous phrase, “well, it worked on my machine.”</p><h3 id="nix-the-language-is-impressively-powerful">Nix (the language) is impressively powerful</h3><p>Once you’re familiar with the Nix DSL, it’s surprisingly easy to arbitrarily tweak or experiment with variations on packages. It’s particularly easy to describe a new package in terms of an existing one within the Nix language. For example, let’s say you’d forked the kubectl GitHub repo and made a few tweaks. 
If you want a Nix package for your “custom kubectl” you can define it in terms of the existing one, keeping all old build/configure/install steps, except the source code will come from your fork instead:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.30.28-PM.png" class="kg-image" alt loading="lazy" width="1804" height="396" srcset="https://flightaware.engineering/content/images/size/w600/2022/11/Screen-Shot-2022-11-03-at-2.30.28-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/11/Screen-Shot-2022-11-03-at-2.30.28-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/11/Screen-Shot-2022-11-03-at-2.30.28-PM.png 1600w, https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.30.28-PM.png 1804w" sizes="(min-width: 720px) 720px"></figure><p>Additionally, Nix’s overlay mechanism makes it easy to experiment with changes that span large swaths of the entire package set in a way that’s inconvenient or even unthinkable to do with a more conventional package manager. For example, suppose that you want to test the performance impact of enabling the AVX ISA extension across <em>all</em> C/C++ packages. 
Nix makes this trivial: in just a few lines of the Nix language, you can define and apply a package overlay that modifies the compiler flags used in the standard build environment and let Nix call all existing package definitions again with that new build environment.</p><p>Seriously, it only takes a few lines of Nix, and you can refer to AVX/non-AVX package definitions side by side:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.32.49-PM.png" class="kg-image" alt loading="lazy" width="1782" height="674" srcset="https://flightaware.engineering/content/images/size/w600/2022/11/Screen-Shot-2022-11-03-at-2.32.49-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/11/Screen-Shot-2022-11-03-at-2.32.49-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/11/Screen-Shot-2022-11-03-at-2.32.49-PM.png 1600w, https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.32.49-PM.png 1782w" sizes="(min-width: 720px) 720px"></figure><p>If your team has internal conventions or team specific patterns in build/install processes, it’s easy to create and reuse functions in the Nix language that encapsulate them. For example, at FlightAware, we have convenience functions for pulling sources from our GitHub Enterprise deployment with reasonable defaults or offering shortcuts to allow commonly used repositories to be easily overridden during development.</p><h3 id="cross-compiling-doesn%E2%80%99t-have-to-be-hard">Cross-compiling doesn’t have to be hard</h3><p>Nixpkgs uses the Nix language very effectively to make cross-compiling relatively painless. 
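<p>For instance, building an existing package for 64-bit ARM Linux can be as simple as selecting it from one of the cross package sets that nixpkgs predefines. A minimal sketch (using the standard <code>pkgsCross.aarch64-multiplatform</code> attribute; pin nixpkgs in real projects rather than using the angle-bracket path shown here for brevity):</p>

```nix
# Sketch: cross-compile GNU hello for 64-bit ARM Linux using one of
# nixpkgs' predefined cross package sets.
let
  # Real projects should pin nixpkgs; <nixpkgs> keeps this example short.
  pkgs = import <nixpkgs> { };
in
# Every package in this set is built with an aarch64 cross toolchain.
pkgs.pkgsCross.aarch64-multiplatform.hello
```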
Nixpkgs defines cross-compiled package sets for several common systems out of the box, and working with them is often as simple as just changing a variable assignment (check out the <a href="https://nixos.org/guides/cross-compilation.html?ref=flightaware.engineering">NixOS cross-compilation guide</a> for more details). If you’ve been eyeing those ARM instances in AWS EC2, you’ll generally have an easier time experimenting with them if you’re packaging your application with Nix.</p><h3 id="the-nix-community-is-easy-to-work-with">The Nix community is easy to work with</h3><p>At this point, FlightAware engineers have contributed PRs to nixpkgs and to Nix itself. It’s generally been easy to navigate the PR process, and any interactions we’ve had with maintainers have been positive and constructive.</p><h2 id="the-ugly-parts">The Ugly Parts</h2><p>Until now, we’ve painted a rosy picture of Nix. Make no mistake: while Nix has been immensely useful at FlightAware, there are a few aspects of Nix that have proved challenging to work with. We’d be remiss if we didn’t mention some of the “ugly parts” of Nix that we’ve encountered and briefly discuss how we’ve been able to address them.</p><h3 id="the-intended-audience-for-nix-documentation-varies-considerably">The intended audience for Nix documentation varies considerably</h3><p>Like any documentation, Nix documentation is written with a specific audience in mind, and that intended audience can vary significantly depending on what you’re reading. Unfortunately, that audience isn’t always clearly identified, and unknowingly encountering something intended for a different audience can be off-putting for engineers with little Nix experience. For example, the <a href="https://nixos.org/guides/nix-pills/?ref=flightaware.engineering">Nix Pills</a> are well written but can be divisive among new users. 
Although they’re very effective at introducing many of the design decisions behind the Nix language and nixpkgs, explaining Nix “from first principles” can seem far removed from the day to day realities of working with it. This may frustrate new users if they were expecting documentation describing the latter.</p><h3 id="nix-is-simple-but-really-weird">Nix is simple, but really weird</h3><p>When learning how to use Nix, the refrain seems to be “simple, but weird.” Nix isn’t complicated so much as it is unusual. Most package managers don’t treat packages as functions or have their own DSL (much less a functional one with non-eager evaluation semantics). Unfortunately, while oddities like these in Nix generally exist for good reason, they can make the learning “curve” look a lot more like a step function if you’re not careful.</p><p>From the perspective of someone unacquainted with Nix, “simple, but weird” is intimidating because it initially doesn’t look any different from “really complicated.” Any organization trying to achieve widespread Nix adoption will need to invest time both into helping engineers climb the learning curve quickly and demonstrating why they’d want to. There’s not any one way to do this effectively, and you’ll need to adapt any internal docs/explanations/examples based on the backgrounds of the engineers you’re working with. For example, the Nix language would likely be relatively easy for someone with Clojure experience to pick up. 
However, it may require careful explanation to someone with a background in exclusively imperative, non-expression-oriented languages.</p><p>It’s worth mentioning that Nix’s upcoming “flakes” feature helps address this issue by making the CLI more consistent and standardizing some patterns commonly used for multi-repository projects (more on this later).</p><h3 id="macos-support-is-second-rate">macOS support is second-rate</h3><p>Nix officially supports macOS, which is convenient for FlightAware (most FlightAware employees’ development machines run macOS). Unfortunately, if you’re expecting to replace Homebrew with Nix, you might be a bit disappointed: if you use Nix on macOS for any extended period, expect to brush shoulders with a few broken packages. However, the landscape is quickly improving, and we’d be surprised if this remains a problem for long.</p><h3 id="handling-private-repos-can-be-clumsy">Handling private repos can be clumsy</h3><p>Unsurprisingly, most Nix examples and packages are built with publicly visible open source repositories in mind. However, as you add your own internal packages, you’ll likely need to check out sources that come from private repositories. This means that you may occasionally need to use “impure” fetchers (e.g., using builtins.fetchGit instead of pkgs.fetchgit to prevent the build sandbox from blocking SSH agent forwarding). Some of Nix’s functions for handling other package managers’ package formats may not know to do this, which is an occasional headache.</p><p>Thankfully, in practice, this is little more than a minor inconvenience: it’s generally not difficult to override any offending functions to use a private repo-friendly built-in when needed (and if your team sets up a dedicated overlay, that’s a great place to make that override). 
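<p>As a sketch, a package built from a hypothetical private repository might use the impure built-in fetcher like this (the URL, package name, and commit hash below are all placeholders, not real values):</p>

```nix
# Sketch: packaging sources from a private repository with the "impure"
# builtins.fetchGit, which runs outside the build sandbox and can
# therefore use your SSH agent. All names here are hypothetical.
{ stdenv }:

stdenv.mkDerivation {
  pname = "internal-lib";   # placeholder internal package name
  version = "1.0.0";
  src = builtins.fetchGit {
    url = "ssh://git@github.example.com/ourteam/internal-lib.git";
    rev = "0000000000000000000000000000000000000000";  # full commit hash
  };
  # Build/install phases omitted; stdenv's defaults often suffice.
}
```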
The upcoming “flakes” feature can also be helpful here: repositories fetched as flake inputs “just work” (more on flakes later).</p><h3 id="%E2%80%9Cangle-bracket-paths%E2%80%9D-are-troublesome">“Angle-bracket paths” are troublesome</h3><p>In many Nix tutorials/blog posts, you’ll see Nix language expressions that start out with something like:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.36.34-PM.png" class="kg-image" alt="Text Box: with import <nixpkgs> {};" loading="lazy" width="1810" height="142" srcset="https://flightaware.engineering/content/images/size/w600/2022/11/Screen-Shot-2022-11-03-at-2.36.34-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/11/Screen-Shot-2022-11-03-at-2.36.34-PM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/11/Screen-Shot-2022-11-03-at-2.36.34-PM.png 1600w, https://flightaware.engineering/content/images/2022/11/Screen-Shot-2022-11-03-at-2.36.34-PM.png 1810w" sizes="(min-width: 720px) 720px"></figure><p>Roughly, this means, “when evaluating the expression that follows, bring into scope everything defined in the package set loaded from the ‘nixpkgs’ channel.” This is a convenient shortcut for one-off experimentation, but it has a serious drawback that’s very easy to overlook: the specific release and revision of the “‘nixpkgs’ channel” is environment dependent. Different hosts and different users may have different channels configured called ‘nixpkgs.’ Practically speaking, this means that if you share code that uses angle-bracket paths, you’ll eventually see behavior that looks nonreproducible because you may be unknowingly asking Nix to build different packages on different hosts. 
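<p>A sketch of the pinned alternative: instead of <code>&lt;nixpkgs&gt;</code>, import one exact nixpkgs revision so every host evaluates the same package set (the commit and hash below are placeholders):</p>

```nix
# Sketch: pin nixpkgs to one exact revision instead of relying on
# whatever channel happens to be called 'nixpkgs' on this host.
# <commit-sha> and the sha256 are placeholders, not real values.
let
  pinnedNixpkgs = builtins.fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/<commit-sha>.tar.gz";
    sha256 = "<hash printed by nix-prefetch-url --unpack>";
  };
  pkgs = import pinnedNixpkgs { };
in
pkgs.hello
```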
It’s a shame that it’s so easy for Nix’s reproducible builds to appear otherwise to new users.</p><p>Fortunately, once you know to avoid this, it’s a nonissue: you can easily <a href="https://nix.dev/tutorials/towards-reproducibility-pinning-nixpkgs?ref=flightaware.engineering#pinning-nixpkgs">“pin” the package set in use to a specific branch/commit</a>. Our internal Nix standards require avoiding angle bracket paths altogether and pinning the source set. Nonetheless, many Nix articles and examples that start this way assume that you’re already aware of the drawbacks (unfortunately, Nix newbies usually aren’t).</p><p>Conveniently, Nix’s upcoming “flakes” feature helps address this problem.</p><h2 id="flakes-are-still-an-experimental-feature">Flakes are still an experimental feature</h2><p>We’ve mentioned “flakes” as being useful for addressing some of these concerns. To make a long story short, a flake is a composable unit for packaging Nix code that standardizes the structure of Nix-based projects and adds several conveniences for a Nix-based workflow.</p><p>Flakes are tremendously helpful, and we could dedicate an entire blog post to discussing how they work and why they’re useful. However, we’ll stay on topic and stick to the bottom line. Concretely, flakes improve a Nix-based workflow in several respects:</p><p>·      Flakes make it much harder to shoot yourself in the foot with angle-bracket paths – your Nix packages/expressions would generally reference a “flake input” instead. 
With flakes, Nix handles pinning any such inputs for you.</p><p>·      Updating or overriding inputs to the flake (like nixpkgs, the repo that holds your team’s overlay or some other repo) can be done entirely from the command line without manually editing any Nix expressions.</p><p>·      The Nix CLI is overhauled to support new operations on flakes and in the process becomes more consistent and intuitive.</p><p>·      Repositories with Nix sources follow a standard structure and have a standard entry point (flake.nix), which makes it a lot easier to pick up and understand others’ Nix-based projects.</p><p>·      Flakes <a href="https://www.tweag.io/blog/2020-06-25-eval-cache/?ref=flightaware.engineering">generally improve the performance of evaluating Nix expressions</a>.</p><p>Several new and immensely useful Nix features are also based on flakes. For example, you can use the Nix CLI to <a href="https://nixos.org/manual/nix/stable/command-ref/new-cli/nix3-flake-init.html?ref=flightaware.engineering">generate new projects using templates from any flake that provides them</a> (this is a great way to ensure that new projects adhere to your team’s standards from the get-go) or <a href="https://nixos.org/manual/nix/stable/command-ref/new-cli/nix3-bundle.html?ref=flightaware.engineering">automatically figure out how to create a .deb, .rpm, or Docker image from a Nix package</a>.</p><p>Flakes do a lot more to improve the Nix experience, and we’re glossing over a lot here in the interest of keeping this blog post accessible and focused. 
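<p>For orientation, a minimal flake.nix has roughly this shape (a sketch following the standard flake schema; the nixpkgs branch and package choices are illustrative):</p>

```nix
{
  description = "Minimal example flake";

  # Inputs are pinned automatically in flake.lock on first use.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-22.05";

  outputs = { self, nixpkgs }:
    let
      pkgs = nixpkgs.legacyPackages.x86_64-linux;
    in {
      # `nix build` builds this package.
      packages.x86_64-linux.default = pkgs.hello;

      # `nix develop` enters a shell with these tools available.
      devShells.x86_64-linux.default = pkgs.mkShell {
        packages = [ pkgs.hello ];
      };
    };
}
```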
However, if you’re looking for more details about flakes and the motivation behind them, we’d recommend reading <a href="https://www.tweag.io/blog/2020-05-25-flakes/?ref=flightaware.engineering">Eelco Dolstra’s introduction</a>, <a href="https://serokell.io/blog/practical-nix-flakes?ref=flightaware.engineering">this tutorial from the Serokell Labs blog</a>, or even just <a href="https://nixos.org/manual/nix/stable/command-ref/new-cli/nix3-flake.html?ref=flightaware.engineering">the Nix manual’s section on flakes</a>.</p><p>This brings us to the only thing that we don’t like about flakes: at the time of writing, they’re still considered an experimental feature. Although the flake CLI seems to have stabilized significantly as of late, there’s still no guarantee that there won’t be breaking changes. Obviously, this makes flakes regrettably difficult to recommend for production use just yet.</p><h2 id="other-nix-tips-and-tricks">Other Nix Tips and Tricks</h2><p>At this point, if you’re considering adopting Nix at your company, we have a few useful tips to share based on what’s worked well for us and what we wish we’d known before adopting Nix.</p><h3 id="flatten-the-learning-curve">Flatten the learning curve</h3><p>Nix is a powerful tool, but you should expect to invest time in flattening the learning curve to help your team learn to use it effectively. What this looks like will vary between teams, but we’ve generally had luck with writing thoroughly commented example projects, documenting common development tasks, establishing best practices/guidelines for using Nix, and when appropriate, pair programming. 
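<p>One of those thoroughly commented example projects might start with a shell.nix along these lines (a sketch; the tool list is illustrative):</p>

```nix
# Sketch of a commented shell.nix: running `nix-shell` in the project
# directory drops an engineer into an environment with these tools.
let
  # Real projects should pin nixpkgs (see the angle-bracket discussion
  # above); <nixpkgs> is used here only to keep the example short.
  pkgs = import <nixpkgs> { };
in
pkgs.mkShell {
  # Tools available inside the shell; versions come from the package set.
  packages = [
    pkgs.python3
    pkgs.tcl
  ];
}
```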
Having a dedicated overlay for your team can also help flatten the learning curve: it serves as a shared home for convenience functions that encapsulate common, team-specific development patterns, and as a natural place to address complaints raised by other engineers.</p><p>You should also make sure that your fellow engineers know about <a href="https://nixos.org/manual/nix/stable/command-ref/new-cli/nix3-repl.html?ref=flightaware.engineering">nix repl</a>, which is altogether too easy for new Nix users to overlook. Being able to interactively examine expressions and packages is a great way to experiment with the Nix language and get your hands dirty. Experimenting with the REPL made Nix “click” for more than a few FlightAware engineers.</p><h3 id="care-and-feeding-of-your-nix-store">Care and feeding of your Nix store</h3><p>When installing Nix and setting up the Nix store, there are a few tips and tricks that can make your Nix experience a lot more pleasant:</p><p>·      <strong>Set up automatic garbage collection</strong>: in addition to installed packages, the Nix store holds (some) intermediate build artifacts and downloaded packages that aren’t currently in use. It may also continue to hold previously installed packages that aren’t currently needed by anything. While this can be a real time saver for future builds, it means that /nix/store will tend to keep increasing in size unless garbage-collected (if you’ve ever run “docker system prune,” it’s roughly the same idea). 
Thankfully, Nix <a href="https://nixos.org/manual/nix/stable/command-ref/conf-file.html?ref=flightaware.engineering#conf-min-free">can do this for you if free space drops below a configurable threshold</a>, but you’ll have to tell it to.</p><p>·      <strong>Give /nix its own volume when feasible</strong>: this is helpful for making the most of automatic GC (it won’t be triggered by other files on disk taking up a lot of space) and gives you the flexibility to select a filesystem well suited to the Nix store.</p><p>·      <strong>Choose a good filesystem for the Nix store</strong>: using a filesystem that supports transparent compression, like btrfs or zfs, is a good idea. Even if your compiled packages often don’t compress well, transparent compression can still be helpful to compress sources or scripts that are unpacked into the Nix store while building a package. Note that Nix does use the store for locking and maintains some state in a sqlite database under /nix, so it’d be <a href="https://www.sqlite.org/howtocorrupt.html?ref=flightaware.engineering">wise to avoid a distributed/networked filesystem</a>.</p><p>·      <strong>Consider enabling </strong><a href="https://nixos.org/manual/nix/stable/command-ref/conf-file.html?highlight=nix.conf&ref=flightaware.engineering#conf-auto-optimise-store"><strong>automatic store “optimization”</strong></a>: this automatically finds store paths with identical contents and replaces them with hard links to a single copy[6]. This cuts down on disk usage when you have several different variations/versions of a package in the store at once.</p><h3 id="addressing-non-nixos-quirks">Addressing non-NixOS quirks</h3><p>Using Nix on a Linux distribution other than NixOS is a great way to try out Nix without uprooting any existing Linux installation. 
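<p>The store-care settings above can be collected in nix.conf. A sketch with illustrative thresholds (tune the byte values for your hosts):</p>

```ini
# Sketch of /etc/nix/nix.conf store-care settings (illustrative values).
# Trigger garbage collection when free space drops below ~5 GiB...
min-free = 5368709120
# ...and keep collecting until ~10 GiB is free.
max-free = 10737418240
# Replace identical store paths with hard links to a single copy.
auto-optimise-store = true
```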
However, when you’re not using Nix on NixOS, there are a few quirks that are easy to overlook compared to a NixOS environment. We haven’t seen very many tips on managing such installations, so we’ll provide some basic tips that we wish someone had told us:</p><p>·      The Nix installer selects the “unstable” nixpkgs channel by default. There are fewer surprises on the latest stable channel, so we’d recommend switching ASAP unless you have a good reason to prefer the very latest nixpkgs.</p><p>·      On NixOS, you upgrade your system and its configuration with nixos-rebuild. On a non-NixOS Nix installation, it’s a little different: you pull the nixpkgs channel and do a nix-env --upgrade as root to upgrade the default <a href="https://nixos.org/manual/nix/stable/package-management/profiles.html?ref=flightaware.engineering">profile</a>. If Nix itself was upgraded, you may also need to restart the nix-daemon service manually.</p><p>·      Running Nix commands as root (e.g., to upgrade packages in the system-wide profile) may require using the -i flag for sudo.</p><p>·      Consider adjusting /dev/kvm permissions: some features of Nix’s dockerTools can leverage KVM for acceleration when building images if you need to run any commands “inside the image” to prepare a layer. Unfortunately, Linux distributions vary in the permissions that they assign to /dev/kvm, and some will prevent the Nix build users from using it by default. If practical, we’d recommend setting up a udev rule to grant +rw on /dev/kvm to benefit from KVM acceleration. 
If this isn’t an option for you, don’t worry: dockerTools can fall back on unaccelerated qemu if you’re using nixpkgs-22.05 or later.</p><h2 id="one-last-thing-don%E2%80%99t-rush-into-it">One Last Thing: Don’t Rush Into It</h2><p>We hope that this article was able to shed some light on how we’ve been able to use Nix effectively across the entire FlightAware engineering team, and we hope that it proves helpful to anyone thinking of doing the same. In closing, we’d like to offer one last piece of advice: although Nix has served us well at FlightAware, its value proposition for you and your team will vary depending on what problems your team is solving, which features are already available with your current tooling, and the backgrounds and experience levels of the engineers that you work with. At the end of the day, package management has a significant social dimension, and whether/where to adopt Nix should be a discussion with your team.</p><p>_________________________________________________________</p><p>[1] It’s trivial to install Nix on Ubuntu or most other Linux distributions, and Nix thankfully won’t interfere with more conventional package managers on your system.</p><p>[2] Nix goes to great lengths to ensure reproducibility, but it’s technically still possible to cause non-reproducible behavior in specific circumstances (e.g., if your compiler isn’t deterministic or if the kernel’s behavior changes in a way that affects a build). 
However, Nix <a href="https://r13y.com/?ref=flightaware.engineering">provides tools to verify reproducibility</a> and upholds this guarantee quite well in practice.</p><p>[3] This is a simplification: there are situations where the <em>contents</em> of something are hashed and used in determining the path inside the store. For example, some Nix built-in functions that fetch source code behave this way.</p><p>[4] Nix experts may take issue with how this blog post uses the word “package.” Formally, we’re using “package” to describe a derivation, a Nix language expression that produces a derivation, or a store path built from that derivation. We’re deliberately glossing over the distinction because we think it obscures the basic ideas in this blog post. If you don’t know what any of this means, don’t worry about it.</p><p>[5] The Nix language is “so declarative” that new Nix users often mistake Nix sources for configuration!</p><p>[6] It’s a good thing that store paths are immutable – this would otherwise be dangerous!</p><hr> 
        <br>
        <p>
            <a href="https://flightaware.engineering/taking-off-with-nix-at-flightaware/">Taking off with Nix at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ The Evolution of AeroAPI ]]></title>
        <description><![CDATA[ AeroAPI is FlightAware&#39;s query-based API, allowing customers access to FlightAware&#39;s vast array of flight and related data. ]]></description>
        <link>https://flightaware.engineering/the-evolution-of-aeroapi/</link>
        <guid>https://flightaware.engineering/the-evolution-of-aeroapi/</guid>
        <pubDate>Mon, 03 Oct 2022 17:23:37 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/09/chris-leipelt-4UgUpo3YdKk-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p>AeroAPI is FlightAware's query-based API, allowing customers access to FlightAware's vast array of flight and related data. It is utilized by thousands of customers for myriad purposes, some directly related to aviation and others tangentially so. It has evolved over the years, and the latest iteration offers a RESTful API with comprehensive documentation, better tooling for customers, and an ever-expanding aviation data offering.</p><h3 id="a-brief-history-of-the-api">A Brief History of the API</h3><p>The API was launched back in 2006 as DirectFlight (v1), offering a SOAP interface and implemented with the TclSOAP library. At the time (and really for most of FlightAware’s history), Tcl was the language of choice for all backend services, so it was utilized for the DirectFlight implementation. This early version charged users based on their query count and the class of the query, and it was FlightAware's first API offering which attracted a number of customers. The query interfaces mirrored how FlightAware stored its flight data and presented fields and operations matching those internal representations.</p><p>DirectFlight was followed up by FlightXML (v2) in 2010 as an expanded API offering. With the new version, there were several new endpoints offered, expanding flight data availability for customers. Behind the scenes, a new Tcl library, tclws, was utilized to implement the API; its endpoints were hosted within Apache utilizing Apache Rivet and made available on the same servers hosting FlightAware's website. The original FlightXML release offered the same SOAP interface as DirectFlight, but in 2011 a JSON implementation was added, allowing for more flexible customer integrations. This version continued the same usage-based monetization model used by DirectFlight and continued to grow the API customer base.</p><p>Fast forward to 2017 and a new version of FlightXML, v3, was released as a beta. 
The new version maintained the same underlying architecture with a few enhancements to the tclws library but offered more comprehensive data in an individual call. This was driven by user feedback that the number of calls required to gather the full data for a given flight was onerous in FlightXML v2, so richer endpoints were produced to address this concern. As an example of the difference: with earlier versions of FlightXML, to get information about a particular aircraft, you would call the FlightInfoEx endpoint which would give you standard data on origin, destination, and runway departure and arrival times. That endpoint would return a list of flights, each with a FlightAware flight ID (a unique identifier for each flight), which would be used to query AirlineFlightInfo to get the gate departure and arrival times, information about codeshares, baggage claim, and terminal information. If you wanted that detailed information for 10 flights, you would make 1 call to FlightInfoEx followed by 10 calls to AirlineFlightInfo.</p><p>With the updated version, all this data was made available in a single call. The other major change in v3 was the monetization model utilized. In the v3 version, users would select a tier where they prepaid for calls in advance, so assuming they stayed within the allotted call volume, they would have a predictable spend each month.</p><p>Following the release of the FlightXML v3 beta, the team pivoted to work on enhancing and expanding FlightAware's streaming API: Firehose. 
There were still several endpoints that would need to be implemented to make v3 a full product offering, but it would be a few years before resources and priorities aligned for that work.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/10/Screen-Shot-2022-10-03-at-1.00.45-PM.png" class="kg-image" alt loading="lazy" width="1486" height="992" srcset="https://flightaware.engineering/content/images/size/w600/2022/10/Screen-Shot-2022-10-03-at-1.00.45-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/10/Screen-Shot-2022-10-03-at-1.00.45-PM.png 1000w, https://flightaware.engineering/content/images/2022/10/Screen-Shot-2022-10-03-at-1.00.45-PM.png 1486w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 1 - FlightXML Overview</em></figcaption></figure><h3 id="the-motivation-for-aeroapi-v4">The Motivation for AeroAPI (v4)</h3><p>In the Summer of 2020, after adjusting to a new world where we all worked remotely and were dealing with COVID, our product manager for APIs and I began talking about the future of FlightXML. We had achieved quite a bit of success with Firehose, but there were use cases where it didn't make sense and FlightXML was starting to look pretty long in the tooth. As we began brainstorming about its future, our PM proposed examining some commercial offerings, in particular certain API Gateway and Management solutions. I wasn't familiar with the various offerings from Mulesoft, Apigee, IBM, etc., and didn't see the value add at first blush.</p><p>During my first three and a half years at FlightAware, we had continued to embrace Tcl and home grow many of our solutions and product offerings. As the company was maturing, though, we recognized that we spent a non-trivial amount of engineering time designing libraries and other tooling for Tcl that would be freely available to us with other, more widely adopted languages. 
By 2020, we were beginning to adopt new third-party solutions and had integrated different languages into our technology stack.  As an example, our Predict team utilized Python, Scala, and Spark in their tech stack and AWS for their model training. This was one of FlightAware's first uses of cloud technologies and a cloud compute provider. Historically, all of FlightAware's hardware was housed within leased data centers on our own servers and related equipment. This all led to some reluctance on my part at first to seriously consider any cloud hosted, third-party solutions for the future of FlightXML.</p><p>To guide our API's evolution, we had conducted surveys of our existing FlightXML users and there were a few key points that they wanted to see addressed. First was an improvement to the documentation and resources for developers using FlightXML. Second, they wanted us to offer a truly RESTful interface for the API. And finally, they wanted to see the functionality in v3 expanded to encompass the full suite of flight data endpoints. We realized that we were not going to be able to easily address all these concerns with another iteration of our existing API infrastructure, using the tclws library, without major investment in that library. We did not consider continued investment in that library tenable, and instead were excited about utilizing RESTful tooling available in most other languages. We also wanted to use <a href="https://www.openapis.org/?ref=flightaware.engineering">OpenAPI</a> to define the next interface of FlightXML, which would give us access to better documentation and tooling right out of the box. As we considered these requirements, it became clear that rewriting our API in a new language would be a monumental undertaking given all the ancillary work that would be required to make that a reality. 
We wanted to deliver this new version on an accelerated timeline and realized we could make use of an API gateway to leverage our existing endpoints to expose a new RESTful interface, combining v2 and v3 functionality.</p><p>We still needed to determine what API management features would be of interest to us and what monetization model our sales and product teams wanted to employ in the next version. That would require both more discovery for the engineering team and some legwork for the product manager and me to wrangle the commercial details.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/10/Screen-Shot-2022-10-03-at-1.01.32-PM.png" class="kg-image" alt loading="lazy" width="1486" height="940" srcset="https://flightaware.engineering/content/images/size/w600/2022/10/Screen-Shot-2022-10-03-at-1.01.32-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/10/Screen-Shot-2022-10-03-at-1.01.32-PM.png 1000w, https://flightaware.engineering/content/images/2022/10/Screen-Shot-2022-10-03-at-1.01.32-PM.png 1486w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 2 - AeroAPI Overview</em></figcaption></figure><h3 id="building-the-case-for-aeroapi-v4">Building the Case for AeroAPI v4</h3><p>Going into late 2020, FlightXML was rebranded as AeroAPI. We also determined that we wanted the first new version of AeroAPI to be a RESTful interface, defined by an OpenAPI spec and implemented using an API gateway. We had our baseline requirements together, but we knew we would have to do some internal selling to build consensus for our proposed approach, and we would need to examine the market of API gateway/management providers to determine which solution would fit our needs. We started this internal promotion campaign with the sales and product teams. 
During our meetings with them, we presented the findings from our survey of users, discussed what we wanted the next version of the API to be, and demoed some OpenAPI features to start building consensus around our approach. At this point, we discussed with these stakeholders, in non-technical terms, how we were evaluating open source and commercial API gateway and management solutions, speaking to the feedback we had received from customers and how our proposed solution and approach would solve those problems.</p><p>We also began discussions with these groups around the monetization model they wanted to employ. This had historically been a bit of a pain point for us, as our billing systems for the APIs were all bespoke and had only limited variability designed into them. So, if the sales team had a particular deal in mind for a customer that didn't fit into one of our billing arrangements, a considerable amount of engineering time or manual work was required to accommodate that sale. Ideally, with our next-generation platform we wanted to address their initial needs but have the flexibility to support variations in the billing model for custom deals. The sales group determined that they wanted to drive charges largely based on usage, versus the prepaid plan in place for v3, but that for certain features they wanted customers to commit to some minimum monthly spend. </p><p>In parallel with the selling and exploring phase with the sales and product groups, we started examining a list of solution providers. We started with a pretty big mix of both commercial and open source offerings and used our basic requirements to begin narrowing the field. Our initial batch of candidates was on the order of 10 to 15 offerings, but through an examination of their published feature sets against our initial requirements, we were able to narrow that list down to the six top contenders. 
Those included two of the top API management providers (Apigee, Mulesoft), two cloud solution providers (Azure API Management / AWS API Gateway), and two open source solutions.</p><p>To evaluate each solution, we created a draft OpenAPI spec for the updated API. We would use this as a baseline for our POC work with the various solutions and to begin the discussions around endpoint pricing with product and sales. Moving into the first part of 2021, we met with each of our potential solution providers, attended workshops, set up trials, and created POCs of the API with our various platform choices. Based on those evaluations, we were able to establish some metrics around the effort required to fully develop and integrate the new platform into the API. We also looked at what, if any, monetization options the platforms supported and examined the various bundled developer portals that were offered. This data was compiled into a feature comparison chart, and we began to assign dollar values to the integration and development effort required and the licensing costs for each option.</p><h3 id="arriving-at-a-solution">Arriving at a Solution</h3><p>As we narrowed down our preferred solution providers at the beginning of 2021, it was time to start engaging with the executive team to pitch our options and see what budget could be allocated for the project. During our evaluations, we had been impressed with the offering from Apigee and found the flexibility of its monetization package attractive. Apigee was one of the most expensive options from a licensing perspective, but we believed it would be one of the quickest to fully implement and integrate, minimizing internal development costs.</p><p>We prepared presentations on the options we had evaluated, the objectives we saw for the new API, and some proposed timelines for delivering the new product. 
Our product manager and I did several dress rehearsals of our presentation with our director to ensure we delivered our message succinctly and in the most impactful way. When the day came and we presented our plans to the team, we were encouraged to find that the CEO and others on the executive team recognized that while the more mature options – in particular, Apigee – had higher licensing costs than some of the cloud providers or open source solutions, the less mature options carried significant development costs, and that time to market was an essential variable as well. During this presentation, we did not make a final recommendation on platform choice but discussed our top three options and sought guidance on budget so we could arrive at a final recommendation within their guidelines. We were given approval to complete the exploration of our final options and embarked on that last endeavor.</p><p>The final technical explorations focused on our top three contenders. While we were excited about some of the features in our preferred open source solution, there were various limitations in its out-of-the-box offering that would have required significant development effort to overcome, and we eliminated that option first. This left us with Apigee and one of the cloud providers' API management solutions. The cloud provider had a very simple billing option that let you connect to a third-party payment solution and set up a hook for managing your own monetization, while Apigee’s monetization model was very flexible and could support FlightAware’s billing model. Given the pains we have had with our bespoke billing implementations, Apigee's monetization solution was particularly attractive.</p><p>I mentioned previously that we were also evaluating the possibility of using the API management solution's developer portal as an out-of-the-box way to enhance our customers' experience. 
Both of our final options offered these developer portals, but the cloud solution's portal was written in TypeScript while Apigee's relied on Drupal and a custom Drupal module. Neither one would be a perfect solution for our needs, so we would need to invest development effort in either case. In the Apigee case, Drupal is a PHP-based CMS, and FlightAware's web stack contains no PHP, nor do we have PHP developers on staff. Had we proposed deploying a modified Drupal supported by FlightAware engineers, that proposal would have been rejected, as that is not a direction we want to embrace for our web engineers. Therefore, we decided that if we were to select Apigee, we would build our own developer portal in React, backed by Apigee's API, and host it on FlightAware's website.</p><p>This left us with a choice between Apigee, where we would need to leverage our web team's efforts to build a developer portal, and the cloud solution, which would require some modification of their developer portal as well as building out a full billing system to accommodate our monetization model. We then updated our more detailed feature comparison chart, compared the development effort and resource availability, and weighed our choices.</p><p>We ultimately decided that Apigee best fit our needs, as we were better prepared to develop a new web portal than another generation of billing code. The executive team agreed with our final recommendation, and we were given the approval to proceed with final contract negotiation with Apigee and development of the AeroAPI v4+ API. That process was undertaken during the Summer of 2021, and we launched the new API in the first week of October. There were, of course, surprises along the way, but no major roadblocks during the implementation. 
An upcoming companion post, written by our tech lead, Chris Roberts, will dive into the details of the technical implementation and discuss lessons learned during that process.</p><h3 id="the-future-of-aeroapi">The Future of AeroAPI</h3><p>With AeroAPI now out the door, we've been delivering on the promise of adding more capabilities to the API on a faster release cadence. The product and engineering teams have been working with sales and marketing to determine what new features and changes need to be made to the API to address more customer needs, and they have delivered several new features this year with a host of future enhancements in the backlog. You may recall that one of our reasons for utilizing an API gateway was that it would allow us to leverage our existing endpoints in a new product. That's only part of the story, though. We are increasingly moving away from the use of Tcl for our backend infrastructure, and our API gateway is allowing us to begin replacing legacy backends with new solutions in an iterative way. One of the first examples of that will likely be our alerts configuration backend. As the API team enhances our alerting capabilities, they're going to replace that backend; we will then make use of that new backend not only for the API but also for our web and mobile products, allowing us to multiply the benefits of one team's efforts.</p><p>It is an exciting and challenging time for us at FlightAware. This project and our efforts in predictive flight data have been some of the company's first uses of cloud offerings and third-party solutions, but they have paid off significantly and paved the way for future use cases. We are in the process of replacing a number of our legacy billing solutions with a third-party platform, and we are excited about how all of these changes will accelerate our ability to deliver more value to customers in the future.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/the-evolution-of-aeroapi/">The Evolution of AeroAPI</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ An Inside Look Into FlightAware&#x27;s Software Defined Radio ]]></title>
        <description><![CDATA[ In this blog post, engineers from FlightAware’s ADS-B team discuss the hardware and software architecture of the software defined radio that powers our ADS-B receivers. ]]></description>
        <link>https://flightaware.engineering/sdr-hardware-software-architecture/</link>
        <guid>https://flightaware.engineering/sdr-hardware-software-architecture/</guid>
        <pubDate>Tue, 06 Sep 2022 12:23:02 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/09/IMG_7378.JPG" medium="image"/>
        <content:encoded><![CDATA[ <p><em>As a Hardware Design Engineer on the ADS-B team at FlightAware, Ziquan Wang is responsible for maintaining and designing the current and future SDR hardware of FlightAware’s ADS-B receiver network. In addition, he is currently pursuing an MBA at Rice University.</em></p><p><em>As a Senior Software Engineer on the ADS-B team at FlightAware, Eric Tran contributes to the software development and growth of FlightAware’s terrestrial ADS-B receiver network.</em></p><h2 id="summary">Summary </h2><p>In this blog post, Ziquan and Eric discuss the hardware and software architecture of the software defined radio that powers our ADS-B receivers.</p><h2 id="hardware-architecture">Hardware Architecture</h2><p>Radio Frequency (RF) signals broadcast from nearby aircraft get filtered and processed by several components within the Software Defined Radio (SDR). We will walk through the hardware architecture of the SDR by following how the signal flows through the device.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.12.46-PM.png" class="kg-image" alt loading="lazy" width="729" height="207" srcset="https://flightaware.engineering/content/images/size/w600/2022/09/Screen-Shot-2022-09-01-at-1.12.46-PM.png 600w, https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.12.46-PM.png 729w" sizes="(min-width: 720px) 720px"></figure><p>The entry point of the RF signals is the antenna. FlightAware has two official antennas that we provide with our ADS-B receivers. The first is an <a href="https://flightaware.store/products/antenna-1090mhz?ref=flightaware.engineering" rel="noreferrer noopener">outdoor-capable antenna</a> that has high gain and is tuned to maximize the reception of signals at 1090MHz. 
The second antenna is what we call a small coil antenna, which is what we ship with our new <a href="https://flightaware.store/products/1090mhz-piaware-ads-b-kit?ref=flightaware.engineering" rel="noreferrer noopener">PiAware ADS-B Kit</a>. It has less gain and is intended to be used indoors.</p><p>The signal passes through the RF filter, which is responsible for filtering out unwanted RF energy, also known as “noise.” Even the highly tuned outdoor antenna can still pick up a good amount of noise, especially if the ground station is in a big city, where the RF energy from cellular towers and local TV stations can easily saturate the LNA if the signals are not filtered first. We supply an <a href="https://flightaware.store/products/band-pass-signal-filter-dual-978-1090-mhz?ref=flightaware.engineering" rel="noreferrer noopener">external filter</a> that has low insertion loss and the best out-of-band rejection among all our product offerings.</p><p>A Low-Noise Amplifier (LNA) is an amplifier that is designed to amplify the signal without adding significant noise to the signal chain. A good LNA can increase the coverage and message rate of the SDR. This was proven when we added an on-board LNA to the standard RTL-SDR to make the FlightAware <a href="https://flightaware.store/products/pro-stick?ref=flightaware.engineering" rel="noopener noreferrer">ProStick.</a> Later, we released the <a href="https://flightaware.store/products/pro-stick-plus?ref=flightaware.engineering" rel="noopener noreferrer">ProStick Plus</a>, which has an on-board filter after the LNA. This was designed with the intention of minimizing the impact of signal loss from the filter. This product is suitable for people who live in areas with a low-RF-noise environment (e.g., rural areas).</p><p>We are working on an alternative ProStick Plus model that has a high-performance filter before the LNA for people who want to minimize cost and place their ground station in an RF-noisy environment. 
Below are some diagrams of the results we found during the prototyping of this new model. We developed a test setup to scan the frequency response of our products and prototypes. It measures the signal strength, noise floor, and interferer level to calculate the signal-to-noise ratio (SNR).</p><p>The interferer, shown below in red, consists of spikes picked up either externally or internally; these can come from emissions on the clock lines or from imperfections in the RF generator.</p><p>The most important factor among these graphs is the signal-to-noise ratio (the black line). When the test subject receives a signal swept from low to high frequency at equal power, a higher SNR means better reception at that frequency. The ProStick Plus shows some rejection outside its passband. The new alternative ProStick Plus model has significantly better rejection outside the passband. For the Pro-Stick paired with the external filter, the rejection below 900MHz and above 1200MHz is almost perfect.</p><figure class="kg-card kg-gallery-card kg-width-wide"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.17.50-PM.png" width="667" height="524" loading="lazy" alt srcset="https://flightaware.engineering/content/images/size/w600/2022/09/Screen-Shot-2022-09-01-at-1.17.50-PM.png 600w, https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.17.50-PM.png 667w"></div><div class="kg-gallery-image"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.18.06-PM.png" width="667" height="524" loading="lazy" alt srcset="https://flightaware.engineering/content/images/size/w600/2022/09/Screen-Shot-2022-09-01-at-1.18.06-PM.png 600w, https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.18.06-PM.png 667w"></div><div class="kg-gallery-image"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.18.18-PM.png" width="667" height="524" loading="lazy" alt srcset="https://flightaware.engineering/content/images/size/w600/2022/09/Screen-Shot-2022-09-01-at-1.18.18-PM.png 600w, https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.18.18-PM.png 667w"></div></div></div></figure><p>The signal then passes through the signal mixer. FlightAware Pro-Sticks use the R820T2 Integrated Circuit (IC). The R820T2 will soon be replaced by the R860, which is almost identical to the R820T2. This IC has a frequency mixer and a variable frequency generator to convert any signal in the reception band to a standard intermediate frequency signal. Wikipedia has very good articles and pictures that detail the principle of operation <a href="https://en.wikipedia.org/wiki/Superheterodyne_receiver?ref=flightaware.engineering" rel="noreferrer noopener">here</a>.</p><p>We are currently exploring other receiver ICs that can bring better analog RF performance and provide two receiver channels to our ground station offering.</p><p>The intermediate frequency is then sampled by an Analog-to-Digital Converter (ADC). In our current RTL-SDR based design, we use an 8-bit ADC in the RTL2832U IC, which samples at 28.8MHz. After being resampled, the data is delivered to the Raspberry Pi through the RTL2832U’s internal USB2.0 controller. We are exploring improvements to the ADC to see if we can improve the signal reception of the ground station. The preliminary data seems promising. With the RF input terminated and all gain settings dialed down to 0, raw ADC data was collected from the Pro-Stick and from a 12-bit ADC prototype. Results show that the voltage variation range of the 12-bit ADC is within one bit of the 8-bit ADC. 
One increment of the 12-bit ADC is 0.49mV, while one increment of the 8-bit ADC is 7.8mV.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.21.08-PM.png" class="kg-image" alt loading="lazy" width="667" height="236" srcset="https://flightaware.engineering/content/images/size/w600/2022/09/Screen-Shot-2022-09-01-at-1.21.08-PM.png 600w, https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.21.08-PM.png 667w"><figcaption><em>Voltages of Raw ADC Counts from 12bit and 8bit ADCs</em></figcaption></figure><p>This suggests that the 12-bit ADC can have a much lower noise floor and possibly better resolution than the 8-bit ADC. Additionally, a higher, variable sampling rate will give our hosts more freedom to explore what combination of bit width and sampling rate is best for their application.</p><p>Theoretical calculation shows that the USB2.0 data rate is not enough to sustain the raw ADC data rate at higher sampling frequencies; for example, 12-bit samples at 28.8MHz already amount to roughly 346Mbit/s before protocol overhead, more than USB2.0 can sustain in practice. We are going to introduce a USB controller capable of the USB3.0 protocol. We are working to release new in-house designed hardware that brings better performance than the RTL-SDR-based hardware, though the global supply chain shortage has been slowing down this process.</p><p>Once the analog signal has been converted by the ADC, it’s then sent to our demodulation software.</p><h2 id="software-architecture">Software Architecture</h2><p>In our last ADS-B blog post, we introduced dump1090-fa, FlightAware’s fork of the original dump1090 software. It is used to interface with RTL-SDRs to demodulate and decode ADS-B, Mode-S, and Mode A/C signals. We’ll dive a little deeper to understand how the demodulation software works. It may be beneficial to read about Mode-S and ADS-B to understand the data formats and what information we can extract from them. 
The <a href="https://mode-s.org/decode/content/introduction.html?ref=flightaware.engineering" rel="noreferrer noopener">1090 Megahertz Riddle</a> is a good primer.</p><h3 id="data-acquisition">Data acquisition</h3><p>dump1090-fa spawns a background thread that is responsible for data acquisition. This thread asynchronously reads samples from the SDR and pushes that data into a shared FIFO queue. The main thread consumes and processes the data from this queue. Because both threads are simultaneously reading and writing to the same memory buffer, it is protected with a mutex to avoid any potential race conditions.</p><h3 id="data-processing">Data processing</h3><p>There are two steps in processing the sample buffers: demodulation and decoding.</p><p>Demodulation extracts the original information from a carrier wave into a stream of bytes. The main thread of dump1090-fa consumes the data from the shared FIFO queue and sends the sample buffers to the demodulator. The demodulator scans through the input buffer for an 8 µs preamble, a series of pulses with specific timings known as the Mode-S preamble (see diagram below). If a preamble is found, it demodulates the remaining data block and conducts some extra validation to determine whether this is a valid Mode-S message.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.23.24-PM.png" class="kg-image" alt loading="lazy" width="461" height="223"></figure><p>At this point, the Mode-S message has been demodulated into a stream of bytes to be decoded. The first 5 bits of the data block are known as the format code, which we can use to determine how to process the rest of the message. We are particularly interested in 112-bit Extended Squitter messages, which are used by Automatic Dependent Surveillance-Broadcast (ADS-B) systems to transmit position, velocity, identification, and other information. 
These messages have a Downlink format code of 17 (DF17). The full breakdown of ADS-B message bits is shown in the table below.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.23.38-PM.png" class="kg-image" alt loading="lazy" width="505" height="223"></figure><p>Here is an example of a raw ADS-B message and the aircraft information decoded from it.</p><p>RAW message:</p><pre><code>8da21c4c5815a206490d97d23613 </code></pre><p>Decoded message:</p><pre><code>CRC: 000000 
Score: 27 (DF17_KNOWN) 
Time: 623188.92us 
DF:17 AA:A21C4C CA:5 ME:5815A206490D97 
Extended Squitter Airborne position (barometric altitude) (11) (reliable) 
  ICAO Address:  A21C4C (Mode S / ADS-B) 
  Air/Ground:    airborne 
  Baro altitude: 3250 ft 
  CPR type:      Airborne 
  CPR odd flag:  even 
  CPR latitude:  33.03680 (66340) 
  CPR longitude: -97.00889 (69015) 
  CPR decoding:  global 
  NIC:           8 
  Rc:            0.186 km / 0.1 NM 
  NIC-B:         0 </code></pre><h3 id="data-streaming">Data streaming</h3><p>Dump1090-fa processes the aircraft data into a variety of formats. It creates several listener ports that clients can connect to for sending and receiving the data. To take a closer look at some of the available output data, we will use the network utility Netcat to open a TCP connection on each port. If you have a PiAware running dump1090-fa, you can follow along on the command line to view the output on your local receiver.</p><p><strong>Port 30002 outputs the raw aircraft data in hexadecimal format. We can see this data by entering the following command:</strong></p><pre><code>pi@piaware:~ $ nc localhost 30002 
*200003BDC92229; 
*02818316F69033; 
*02618218DCB383; 
*0281839CF04118; 
*02E19A1F53AFC4; 
*02059422ED6FFD; 
*5DA6976129386D; 
*02A1853C97BFE4; 
*8DA2C90DE11A0F00000000F90A57; </code></pre><p><strong>Port 30003 outputs aircraft data in BaseStation format.</strong></p><pre><code class="language-Python">pi@piaware:~ $ nc localhost 30003 
MSG,7,1,1,ABFE12,1,2022/04/02,01:56:36.924,2022/04/02,01:56:36.977,,2250,,,,,,,,,, 
MSG,7,1,1,AA4A90,1,2022/04/02,01:56:36.929,2022/04/02,01:56:36.979,,2175,,,,,,,,,, 
MSG,6,1,1,AA295D,1,2022/04/02,01:56:36.930,2022/04/02,01:56:36.979,,,,,,,,1511,0,0,0, 
MSG,3,1,1,AD7828,1,2022/04/02,01:56:36.941,2022/04/02,01:56:36.981,,1800,,,32.91406,-96.91354,,,0,,0,0 
MSG,7,1,1,ABFE12,1,2022/04/02,01:56:36.943,2022/04/02,01:56:36.981,,2250,,,,,,,,,, 
MSG,5,1,1,A22B28,1,2022/04/02,01:56:36.945,2022/04/02,01:56:36.982,,10025,,,,,,,0,,0, 
MSG,3,1,1,AB674B,1,2022/04/02,01:56:36.958,2022/04/02,01:56:36.984,,21375,,,32.81689,-96.14262,,,0,,0,0 
MSG,7,1,1,ACAB11,1,2022/04/02,01:56:36.964,2022/04/02,01:56:36.985,,15550,,,,,,,,,, 
MSG,7,1,1,AA15CC,1,2022/04/02,01:56:36.966,2022/04/02,01:56:36.985,,1650,,,,,,,,,, 
MSG,4,1,1,AB674B,1,2022/04/02,01:56:36.968,2022/04/02,01:56:36.986,,,475,83,,,640,,,,,0 </code></pre><p>Depending on what format a client application needs, it can make TCP connections to the appropriate ports to stream the data. Notably, our piaware client connects to Port 30005 for aircraft data in Beast binary format to process and upload to FlightAware. The table below shows all the available ports and their respective data formats.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.54.55-PM.png" class="kg-image" alt loading="lazy" width="665" height="223" srcset="https://flightaware.engineering/content/images/size/w600/2022/09/Screen-Shot-2022-09-01-at-1.54.55-PM.png 600w, https://flightaware.engineering/content/images/2022/09/Screen-Shot-2022-09-01-at-1.54.55-PM.png 665w"></figure><p><em>Aircraft Tracking</em></p><p>Dump1090-fa also generates several JSON files with information about the receiver, currently detected aircraft, and general decoder statistics. These files are consumed by the SkyAware map interface to display the aircraft data the receiver is receiving in real time. These files get written to the temporary runtime directory /var/run/dump1090-fa/. Some sample output of aircraft.json is shown below.</p><pre><code class="language-Python">{ 
  "now": 1649040909.2, 
  "messages": 30451161, 
  "aircraft": [ 
    { 
      "hex": "aa6a98", 
      "alt_baro": 8950, 
      "alt_geom": 9275, 
      "gs": 288.1, 
      "track": 64.9, 
      "baro_rate": -1344, 
      "lat": 32.46446, 
      "lon": -97.035006, 
      "nic": 8, 
      "rc": 186, 
      "seen_pos": 2.1, 
      "version": 0, 
      "nac_p": 8, 
      "nac_v": 1, 
      "sil": 2, 
      "sil_type": "unknown", 
      "mlat": [], 
      "tisb": [], 
      "messages": 19, 
      "seen": 1.9, 
      "rssi": -21.4 
    }, 
    { 
      "hex": "aaec9c", 
      "flight": "AAL2190", 
      "alt_baro": 3200, 
      "alt_geom": 3100, 
      "gs": 200, 
      "track": 179.4, 
      "baro_rate": 1280, 
      "squawk": "2355", 
      "emergency": "none", 
      "category": "A3", 
      "nav_qnh": 1008.8, 
      "nav_altitude_mcp": 16992, 
      "lat": 32.837436, 
      "lon": -97.030577, 
      "nic": 8, 
      "rc": 186, 
      "seen_pos": 0, 
      "version": 2, 
      "nic_baro": 1, 
      "nac_p": 9, 
      "nac_v": 1, 
      "sil": 3, 
      "sil_type": "perhour", 
      "gva": 2, 
      "sda": 2, 
      "mlat": [], 
      "tisb": [], 
      "messages": 407, 
      "seen": 0, 
      "rssi": -16.7 
    }, 
    { 
      "hex": "a8aa87", 
      "flight": "AAL1966", 
      "alt_baro": 12025, 
      "alt_geom": 12375, 
      "gs": 300.6, 
      "track": 44.1, 
      "baro_rate": -1280, 
      "squawk": "7404", 
      "emergency": "none", 
      "category": "A3", 
      "nav_qnh": 1008, 
      "nav_altitude_mcp": 4000, 
      "lat": 32.74098, 
      "lon": -97.214318, 
      "nic": 8, 
      "rc": 186, 
      "seen_pos": 0.4, 
      "version": 2, 
      "nic_baro": 1, 
      "nac_p": 9, 
      "nac_v": 1, 
      "sil": 3, 
      "sil_type": "perhour", 
      "gva": 2, 
      "sda": 2, 
      "mlat": [], 
      "tisb": [], 
      "messages": 436, 
      "seen": 0.2, 
      "rssi": -17.2 
    } 
  ] 
} </code></pre><p>More details about this JSON data can be found on <a href="https://github.com/flightaware/dump1090/blob/master/README-json.md?ref=flightaware.engineering" rel="noreferrer noopener">GitHub</a>.</p><h2 id="conclusion">Conclusion</h2><p>We hope this post gave you some insight into the software defined radio that powers our ADS-B receivers. We have designed the FlightAware Pro-Stick and Pro-Stick Plus to be among the best SDRs on the market and are continuing to iterate and innovate on the designs for our next generation of FlightAware hardware products.</p> 
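To make the decoding walkthrough above concrete, here is a minimal Python sketch (an illustration only, not dump1090-fa's actual C implementation) showing how the fixed fields of a DF17 Extended Squitter message can be sliced out of the raw hex string from the example earlier:

```python
# Sketch of Mode S / ADS-B field extraction (illustration only; the real
# decoder in dump1090-fa is written in C and also verifies the CRC).
def parse_df17(raw_hex: str) -> dict:
    msg = bytes.fromhex(raw_hex)
    assert len(msg) == 14, "DF17 Extended Squitter messages are 112 bits"
    return {
        "df": msg[0] >> 3,         # downlink format: first 5 bits
        "ca": msg[0] & 0x07,       # transponder capability: next 3 bits
        "icao": msg[1:4].hex(),    # 24-bit ICAO aircraft address
        "me": msg[4:11].hex(),     # 56-bit Extended Squitter (ME) field
        "tc": msg[4] >> 3,         # first 5 bits of ME: ADS-B type code
    }

# The raw message from the example above:
fields = parse_df17("8da21c4c5815a206490d97d23613")
```

Running this on the example message yields DF 17, CA 5, ICAO address a21c4c, and type code 11, matching the decoded output shown earlier (type codes 9 through 18 indicate airborne positions with barometric altitude). A real decoder would also check the trailing 24-bit CRC before trusting any of these fields.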
        <br>
        <p>
            <a href="https://flightaware.engineering/sdr-hardware-software-architecture/">An Inside Look Into FlightAware&#x27;s Software Defined Radio</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Blast from the Past: 2022 Summer Intern Projects ]]></title>
        <description><![CDATA[ Blast from the past: We invite you to get to know our Summer 2022 Interns and what they accomplished over each of their internships. ]]></description>
        <link>https://flightaware.engineering/summer-intern-projects/</link>
        <guid>https://flightaware.engineering/summer-intern-projects/</guid>
        <pubDate>Mon, 22 Aug 2022 15:12:47 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/08/christina-wocintechchat-com-OtHEYbQXLFU-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>By Gabrielle Toutin with Foreword by Chadd Mikulin.</em></p><p><em>Gabrielle Toutin is a Software Engineer on the Backend team at FlightAware. She contributes to software development efforts including AeroAPI, Firehose, and the data feeds that FlightAware ingests. In addition, she is the 2022 Intern Coordinator.</em></p><blockquote>This month, we welcomed five new interns from all around the country to the 2023 FlightAware Summer Intern Program. They'll follow in the footsteps of previous intern classes, creating new and interesting ways for users, both internal and external, to interact with FlightAware's immense aviation dataset. We'll have a post later this summer that discusses all their work, but first, we wanted to remind you of all the great accomplishments of our intern class of 2022. -Chadd Mikulin, VP Engineering</blockquote><p>This summer, FlightAware welcomed nine students from universities across the country to work as interns on projects spanning seven FlightAware teams. They gained experience and insight into professional software development and grew their technical abilities while also participating in team-building events, learning sessions presented by FlightAware engineers, and regular demo sessions. This year, we invite you to get to know them and what they accomplished over each of their internships.</p><h2 id="paul-kuruvila">Paul Kuruvila</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.43.23-PM.png" class="kg-image" alt loading="lazy" width="294" height="400"></figure><p>I’m Paul Kuruvila, a Computer Science major in my senior year at the University of Houston, and the Web team intern for the Summer of 2022. 
I aspire to be a full stack developer, and I am grateful for the opportunity to begin my career path at FlightAware.</p><p>During my time here, I have received an overwhelming amount of support from my mentor, manager, and everyone else that I had the privilege of working with. At FlightAware, it truly feels like everyone is working as part of one team.</p><p>In the first couple of weeks, I worked through various introductory exercises that helped me familiarize myself with the technology stack, and I got acquainted with technologies I had not yet had much prior exposure to. Specifically, the exercises helped me to improve my skills in languages such as HTML, CSS, JavaScript, and Tcl, along with tools such as vim, the Linux/Unix terminal, and, later, several AWS products, which were vital to the completion of my project. I believe the skills that I was exposed to during my internship will be useful for the rest of my career as a software engineer.</p><p>As part of the Web team, I was challenged with the task of extracting FlightAware’s SkyAware Anywhere service from within its monolithic website infrastructure – also known as fa_web, and essentially where most of the FlightAware website code is contained – and having it run as a standalone service instead. Given the nature of remote work, I was initially concerned that I would have trouble getting help with any difficulties I faced. However, my mentor was almost always available and willing to help. Not only was I given the freedom and trust to get my work done, but I was encouraged to share my thoughts on how we could possibly improve the user experience for SkyAware Anywhere.</p><h3 id="my-project">My Project</h3><p>SkyAware Anywhere is a service based on the web interface for the FlightFeeder and PiAware ADS-B receivers that enables users to view live aircraft data as it’s being tracked by their devices. 
Rather than being restricted to only having access to the web interface on the local network, as is the case with PiAware SkyAware and FlightFeeder SkyAware, SkyAware Anywhere is accessible from any network. This project is open source and available to view on FlightAware’s public Github repository, dump1090.</p><p>Beginning with the background of my project, the existing version of SkyAware Anywhere was hosted within FlightAware’s fa_web infrastructure. This was mostly done so that we could make use of existing functions housed in the website codebase. However, the integration of the service with the newly implemented Docker-based release processes was inefficient, so it made more sense for SkyAware Anywhere to run completely independently. Thus, the first of my tasks was to move away from everything that was currently dependent on Tcl, which led to me removing a Python rendering script and Jinja template file that were functioning together to generate the HTML with the proper versioning for the assets. To accomplish this, I first rendered that HTML file by making use of the Python script, and then worked with the resultant HTML file from then on, later deleting the Python rendering script and Jinja template.
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.50.35-AM.png" class="kg-image" alt loading="lazy" width="1506" height="866" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-9.50.35-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-9.50.35-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.50.35-AM.png 1506w" sizes="(min-width: 720px) 720px"><figcaption><em>Python script calling Tcl procs</em></figcaption></figure><p>Now working with the HTML file, I needed to get the aircraft data to load properly for the page. This required me to replace the then-current data fetcher in the dump1090 abstract_data branch with our socket-specific data fetcher within fa_web and change the XHR call to hit the receiver data endpoint that resides within fa_web.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.51.25-AM.png" class="kg-image" alt loading="lazy" width="1420" height="1094" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-9.51.25-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-9.51.25-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.51.25-AM.png 1420w" sizes="(min-width: 720px) 720px"></figure><p>For the UI logo, I needed to create a copy of an existing file, status.json, from within fa_web to the dump1090 repository.
The purpose of this file was to tell our service to load the UI logo for the SkyAware Anywhere case as opposed to the PiAware SkyAware and FlightFeeder SkyAware local cases.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screenshot-2023-06-29-at-10.29.46-PM.png" class="kg-image" alt loading="lazy" width="588" height="204"></figure><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.51.50-AM.png" class="kg-image" alt loading="lazy" width="1434" height="774" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-9.51.50-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-9.51.50-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.51.50-AM.png 1434w" sizes="(min-width: 720px) 720px"></figure><p>Once those issues were resolved, the next step was to create a build and deploy process for the service using Amazon Web Services to host the content and Github Actions for the build and deployment pipelines.</p><p>The three main components of AWS that served as the backbone for my project were Amazon’s S3 bucket, CloudFront content delivery network, and Route 53 service. Before I could get started on working with AWS, I needed to coordinate with the Operations team to have an account and S3 bucket created for SkyAware Anywhere. In similar fashion, a CloudFront distribution was also needed to host the S3 bucket. 
The S3 bucket at its core is a cloud storage unit, and I began testing the functionality of its integration with the CloudFront distribution by manually dumping the content of SkyAware Anywhere into the bucket and checking to see if the auto-generated CloudFront domain was properly displaying it.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.52.05-AM.png" class="kg-image" alt loading="lazy" width="1434" height="774" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-9.52.05-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-9.52.05-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-9.52.05-AM.png 1434w" sizes="(min-width: 720px) 720px"></figure><p>Following up on that, I started looking into automating the deployment process to the S3 bucket and decided to utilize the AWS CLI (Command Line Interface) to access and manipulate the content within the S3 bucket.</p><p>For the housekeeping task of bundling assets, I decided that webpack was the tool for the job. Configuring webpack properly for SkyAware Anywhere was a challenge, but I was eventually able to come to a solution. We then decided to utilize Github Actions to automate the bundling process and deploy the bundled content to our established SkyAware Anywhere S3 bucket and CloudFront distribution. From that, I was able to gain experience in creating workflows for Github Actions as well as in working with YAML files.</p><p>After I confirmed that everything was functioning properly on the CloudFront auto-generated domain, I coordinated with Ops to set up a domain called skyaware.flightaware.com for the CloudFront distribution using Amazon’s Route 53 service. 
I thought this would also resolve a CORS error I received when trying to hit the receiver data endpoint within fa_web, but I later realized that I had to make changes both to my XHR request and to the endpoint’s code before the subdomain could fetch the aircraft data properly. Once I had figured that issue out, my project was essentially finished.</p><p>I faced a lot of obstacles as I progressed through my project, but I always received the help I needed to overcome them. Beyond the scope of my project, I got to work with various relevant technologies, and I have developed into a significantly better software engineer as a result. I had a great time interning at FlightAware, and I am grateful to everyone that I had the pleasure of working with.</p><h2 id="seth-morley">Seth Morley</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screenshot-2023-06-29-at-10.21.58-PM.png" class="kg-image" alt loading="lazy" width="388" height="412"></figure><p>I am a Senior at Texas A&amp;M University pursuing a bachelor’s degree in Computer Science and a minor in Business and Cybersecurity. Ever since flying a Cessna 172 for the first time almost 10 years ago, I have been interested in aviation and flying. Due to this personal interest, I wanted to apply my technical skills towards the aviation industry, and my FlightAware internship has provided me with such an opportunity.</p><p>My experience throughout the internship has been amazing. Working and learning with FlightAware professionals has brought me experience that will help me throughout my career. Knowing that the work I am doing is important, especially when it is something that will be used every day, has been motivating and rewarding.</p><p>Throughout my internship and while working on my project, I have been exposed to and learned about many different tools and languages.
I am very glad that I had the opportunity to work with such tools as Docker, Nix, and Grafana. I am looking forward to using what I have learned and applying it to my future.</p><p>My advice to future interns is to always ask questions, to seek knowledge, and learn as much as you can.</p><h3 id="my-project-1">My Project</h3><p>My Summer project targeted improvements to Aircraft Delay Detector (ADD), an existing FlightAware program initially created by Karl Lehenbauer, co-founder and former CTO of FlightAware, who has since served in an advisory role following FlightAware’s acquisition by Collins Aerospace. FlightAware Software Engineer Jack Gleeson subsequently added functionality as well as modernized and upgraded ADD to current FlightAware standards.</p><p>When I started working on the project, ADD identified delayed flights by discerning when an aircraft travelled in a circle. My project, on which I collaborated with Jack Gleeson and FlightAware Software Engineer Grant Larson, sought to improve ADD by more accurately identifying when an airplane was flying in a holding pattern.</p><blockquote><strong>Holds may be flight delays, but delays aren’t always holds.</strong></blockquote><p>Different types of delays can be flown by aircraft; however, my project focused on detecting aircraft flying in a formal holding pattern. An example of a holding pattern can be seen in the first image below. Holds can occur due to many factors such as weather or congested airspace. These holds are important because aircraft could spend over an hour flying in a hold waiting for improved conditions or clearance for airspace. As noted in FAA directive 8260.3E, “efficient and economical use of airspace requires standardization of aircraft entry and holding maneuvers” [<a href="https://www.faa.gov/regulations_policies/orders_notices/index.cfm/go/document.information/documentID/1038173?ref=flightaware.engineering">1</a>].  
Other forms of delays can be 360° turns where the aircraft flies in one or more circles, or when an aircraft turns off - and then back onto - its course. The second example below shows a circle that is not a hold.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-10.57.15-AM.png" class="kg-image" alt loading="lazy" width="994" height="678" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-10.57.15-AM.png 600w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-10.57.15-AM.png 994w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 1. Example of a typical holding pattern</em></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-10.57.31-AM.png" class="kg-image" alt loading="lazy" width="994" height="658" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-10.57.31-AM.png 600w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-10.57.31-AM.png 994w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 2. Example of a circle which is not a hold</em></figcaption></figure><p>Although holding patterns are specific, regulated shapes, they are not always identical. The holding pattern and airspace that an aircraft flies in are determined by the aircraft’s location, maximum holding airspeed, and altitude. Air Traffic Control (ATC) can also assign an aircraft unique uncharted holding instructions [<a href="https://www.faa.gov/regulations_policies/orders_notices/index.cfm/go/document.information/documentID/1038173?ref=flightaware.engineering">2</a>].
A standard holding pattern is illustrated in the figure below.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-10.57.55-AM.png" class="kg-image" alt loading="lazy" width="1252" height="420" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-10.57.55-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-10.57.55-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-10.57.55-AM.png 1252w" sizes="(min-width: 720px) 720px"><figcaption>Figure 3. Image showing structure of a holding pattern [<a href="https://www.faa.gov/air_traffic/publications/atpubs/aip_html/part2_enr_section_1.5.html?ref=flightaware.engineering">3</a>]</figcaption></figure><p>Being able to detect holds is very important because we can learn a lot about current flight operations as well as what could happen in the future. Holds will almost always result in delays for the aircraft and can also give insight into possible delays, not only for one specific aircraft but also for other aircraft traveling in the same area or to the same destination. Whether delays are caused by weather, congested air traffic, or any other reason, knowing if or how long your aircraft may be holding can lead to both time- and money-saving decisions.</p><p><strong>How does Aircraft Delay Detector (ADD) work?</strong> ADD receives input from controlstream, which is the data feed that contains many different types of messages. The primary types of messages that ADD reads are position messages. Position messages contain information including an aircraft’s speed, altitude, heading, location, and the timestamp of the message. ADD analyzes thousands of these messages every second to detect each aircraft that is delayed.
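</p><p>As a toy illustration of the kind of deterministic check involved (this is a sketch, not ADD’s actual algorithm; the field handling and the 330° threshold below are assumptions), accumulating the wrapped heading change across successive position messages flags a track that has turned through a full circle:</p>

```python
# Toy sketch: flag a full 360° turn from a sequence of position-message
# headings (in degrees). Thresholds and helpers are illustrative only.

def wrapped_delta(h1: float, h2: float) -> float:
    """Smallest signed angular difference from heading h1 to h2, in degrees."""
    return (h2 - h1 + 180.0) % 360.0 - 180.0

def net_turn(headings: list[float]) -> float:
    """Accumulated signed heading change over a track."""
    return sum(wrapped_delta(a, b) for a, b in zip(headings, headings[1:]))

def looks_like_circle(headings: list[float], threshold: float = 330.0) -> bool:
    """True if the track has turned through (nearly) a full circle."""
    return abs(net_turn(headings)) >= threshold

# A track stepping 30° at a time all the way around keeps turning one way:
circling = [h % 360 for h in range(0, 390, 30)]
straight = [90.0] * 10
```

<p>Distinguishing a hold from any other circle then requires further checks on the track’s shape and timing, which is exactly the harder problem described next.</p><p>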
When a hold is detected, ADD emits messages to holdstream, the data feed that contains all the holds detected by ADD.</p><p><strong>Improving ADD</strong> When I started working on ADD, the program could detect when aircraft were flying in any sort of circle pattern or 360° turn. My work on ADD focused on improving the program’s ability to discern between any 360° circle and a standard holding pattern as illustrated above.</p><p>Using the limited information from position messages, I was able to improve the accuracy of hold detection through the use of several deterministic methods. Specifically, ADD was modified to discern between the standard “racetrack” hold pattern and “non-hold” circle patterns. These changes allowed ADD hold detection to accurately detect over 90% of holds, which included the ability to detect non-holding delay tactics, such as those used to add space between two aircraft that are approaching the destination airport. As a result, ADD can effectively identify aircraft in hold patterns while also providing information regarding additional delays.</p><p>A Flask application in Python primarily developed by Jack Gleeson serves as a complementary program and visual verification tool where users can review holds detected by ADD. Using these tools, our ultimate goal is to build large datasets to train a predictive model for detecting and predicting holds more accurately than the heuristic approach.</p><h2 id="jacob-pan">Jacob Pan</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screenshot-2023-06-29-at-10.23.01-PM.png" class="kg-image" alt loading="lazy" width="388" height="412"></figure><p>Hi everyone! My name is Jacob Pan, and I’m a software engineer intern on the Backend team for FlightAware.</p><p>I’m currently a rising Junior at Rice University majoring in Computer Science (to no surprise), and a fun fact about me is that I am the oldest out of 5 kids.
I’ve really appreciated the opportunity FlightAware has given me to intern here, and it’s been a blast!</p><p>I really enjoyed my project as I was able to use both new and familiar technologies. I’ve never used Flask but was interested in using it as I consider Python to be the language I am most well-versed in, and my project gave me a great opportunity to use it in a real-world application.</p><p>I also really enjoyed being able to interact with my coworkers on the Backend team often, from questions for my mentor (thanks, Paul!) to having fun times playing Don’t Starve Together with the whole Backend team. It was clear that the whole team was overall a chill and smart group that liked to have fun but also work on challenging problems. I also enjoyed participating in the hackathon, as working with others to create a “gimmicky” project using FlightAware’s vast data set was fun. As for advice to potential future interns: you should keep being curious throughout your internship.</p><h3 id="my-project-2">My Project</h3><p>FlightAware is constantly finding new tools to utilize, from Nix to Flux; I’ve learned a lot from attending various competency meetings and watching recordings that FlightAware has for showcasing new tools. FlightAware is always innovating, and having the chance to glimpse what’s new is exciting and cool to hear and see – perhaps even implement!</p><p>I had one main project, which was to create a working demo for creating and displaying FlightAware alerts using the AeroAPI interface on a lightweight web application. These alerts notify the user of certain events for the flight they configured the alert for. A typical website user, for example, might configure an email to be sent to them when their flight arrives at an airport.
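</p><p>The delivered-alert side of such a demo can be sketched with nothing but the standard library; the table layout and payload fields below are illustrative assumptions, not the actual demo’s schema:</p>

```python
import json
import sqlite3

# In-memory store for alert notifications pushed to the demo's endpoint.
# Schema and field names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE delivered_alerts ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " ident TEXT, event_code TEXT, raw_payload TEXT)"
)

def store_alert(payload: dict) -> None:
    """Persist one delivered alert message, as a webhook handler would."""
    conn.execute(
        "INSERT INTO delivered_alerts (ident, event_code, raw_payload)"
        " VALUES (?, ?, ?)",
        (payload.get("ident"), payload.get("event_code"), json.dumps(payload)),
    )
    conn.commit()

def alerts_for(ident: str) -> list[tuple]:
    """Fetch stored alerts for one flight ident, for display on a frontend."""
    cur = conn.execute(
        "SELECT ident, event_code FROM delivered_alerts WHERE ident = ?",
        (ident,),
    )
    return cur.fetchall()

store_alert({"ident": "SWA123", "event_code": "arrival"})
```

<p>In a real deployment the <code>store_alert</code> call would sit behind an HTTP route (Flask in the actual demo) that the alert service POSTs to.</p><p>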
My task was to use AeroAPI’s alerts features to show potential customers how to effectively create a web application that would allow them to create alerts and serve as an endpoint for receiving alerts directly. Both the configured alerts and the delivered messages would be stored in a database and showcased on the frontend.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.49.28-PM.png" class="kg-image" alt loading="lazy" width="766" height="400" srcset="https://flightaware.engineering/content/images/size/w600/2023/07/Screenshot-2023-07-02-at-8.49.28-PM.png 600w, https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.49.28-PM.png 766w" sizes="(min-width: 720px) 720px"></figure><p>For the technical specs, Flask was used as the backend server, while React JS was used to host the frontend where the user would see the web application. For storing alerts and triggered alert notifications, SQLite was used, and finally Docker was used to containerize the services. Some features I was able to add included the ability to delete and edit alerts in the frontend table: the frontend sends a request back to the backend, which then sends a request to AeroAPI and updates the SQLite database accordingly.</p><p>Overall, the project is meant to serve as an example for customers that will enable them to utilize the power of AeroAPI and FlightAware much faster!</p><h2 id="samantha-turnage">Samantha Turnage</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.50.11-PM.png" class="kg-image" alt loading="lazy" width="327" height="334"></figure><p>Hello! This Summer, I was given a great opportunity to work at FlightAware as an intern on the Mobile iOS team.
I am a Senior at Arizona State University studying Computer Science and plan on graduating with a bachelor's degree in December of 2022. Throughout college, I've found myself interested in many technical areas, but I have truly enjoyed iOS development. I was excited to get a chance to work with FlightAware because I knew I would get to learn more about iOS development and aviation at the same time while also being at a smaller company. Not only has the work experience through this internship been great, but I've also gotten to know such kind and talented people that were always willing to help set me up for success.</p><h3 id="my-project-3">My Project</h3><p>I've been working on implementing a basic set of ADS-B receiver stats in the mobile iOS app throughout the Summer. This consisted of transforming the single-page receiver stats screen into multiple screens of information, some of which can even be edited in the app rather than solely on the web. The primary motivation was to provide an additional enticement for users to own ADS-B feeders and to make the whole FlightAware ADS-B feeding experience easier to access.</p><p>For anyone not familiar with it, ADS-B is a technology that allows an aircraft to determine its position using satellites (GPS). The position is then broadcast over radio waves (on 978 and/or 1090 MHz), and these signals can be received by anyone with an appropriate radio.</p><p>This data has also been made accessible with open-source software called PiAware. This software can be used with any Linux-based computer (such as a Raspberry Pi) equipped with a software-defined radio to make an ADS-B receiver. These receivers collect ADS-B data and send it back to FlightAware’s servers, where it’s combined with data from other sources to provide real-time flight tracking. Currently, customers with receivers have a statistics web page on their profile under the ADS-B tab.
This page is full of information related to each receiver, such as location, feeder type, and feeder status. It also contains numerous graphs showing coverage data, hourly collections, aircraft/positions reported, and user rankings. For more information, FlightAware provides a brief overview of ADS-B <a href="https://flightaware.com/adsb/?ref=flightaware.engineering">on their website</a>.</p><p>Currently, in the iOS app for users with no receivers, there is a page with a link that redirects to set up a new receiver and another to request one from FlightAware. Another page exists to manage receivers for those who have them, and it lists out each of them with their names and provides a link to the statistics page on the web.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.08.36-AM.png" class="kg-image" alt loading="lazy" width="1160" height="1196" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-11.08.36-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-11.08.36-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.08.36-AM.png 1160w" sizes="(min-width: 720px) 720px"></figure><p>To kick off the project, I first transformed the page that users with no receivers see.
That page still has the same functionality but contains more detailed information about building and hosting a FlightFeeder.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.08.55-AM.png" class="kg-image" alt loading="lazy" width="710" height="1204" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-11.08.55-AM.png 600w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.08.55-AM.png 710w"></figure><p>Once that was completed, it was time to start working on the new content for users with receivers. The first page displays information about the highest user ranking achieved in the last 30 days and when a receiver was first set up. Right after that, the user has a list of receivers and two links: the first link redirects the user to set up a new ADS-B receiver and the second one allows them to view the SkyAware Anywhere page.</p><p>Lastly, there is an interactive map that displays the set location of each receiver. Individually, each receiver cell shows its name, type, and health indicators related to the device, its connection to FlightAware servers, and MLAT. 
Tapping on the cell can reveal more detailed information, bringing up another page containing specific data.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.10.09-AM.png" class="kg-image" alt loading="lazy" width="1212" height="1366" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-11.10.09-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-11.10.09-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.10.09-AM.png 1212w" sizes="(min-width: 720px) 720px"></figure><p>At the top of that page, there is another site information section. The receiver’s name can be edited by tapping on the cell; saving it will also reflect those changes on the web. Underneath that is everything related to the location of the receiver. In that section, the editable data is the location and the nearest airport to the receiver. If the user clicks on the location cell, it will take them to a separate map view where they can set the latitude and longitude. If the user clicks on the nearest airport cell, an airport selector will show up where users can search for an airport name or city. The bottom half of the receiver page has a button that redirects to the web version of the statistics page, followed by a button that sends device commands to the receiver. 
The last item to be seen is a graph showing the positions reported by distance from that receiver in the previous 24 hours.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.10.34-AM.png" class="kg-image" alt loading="lazy" width="1094" height="1052" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-11.10.34-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-11.10.34-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.10.34-AM.png 1094w" sizes="(min-width: 720px) 720px"></figure><p>Overall, this project was enjoyable to work on, but that does not mean it was not challenging. I got some excellent practice navigating through code I had never seen before, communicating with other teams, and further challenging my skills in Swift. I think that the most difficult part was learning how to incorporate new code into the existing code without messing everything up. Luckily, I had a great mentor to help me through that. Thank you to everyone at FlightAware I met and worked with; it's been a pleasure.</p><h2 id="kelton-bassingthwaite">Kelton Bassingthwaite</h2><figure class="kg-card kg-image-card kg-width-wide"><img src="https://flightaware.engineering/content/images/2023/06/Screenshot-2023-06-29-at-10.16.46-PM.png" class="kg-image" alt loading="lazy" width="322" height="324"></figure><p>I am in my final year of a Computer Science Bachelor’s degree at Eastern Washington University in Spokane, Washington. As an intern in the Systems Wing at FlightAware, I have really enjoyed learning about the company and the work my team does. I am interested in development and operations, so this team was a great fit for me.  
After graduation, I would like to work as a Systems Engineer and would love to return to FlightAware as a full-time employee.</p><h3 id="my-project-4">My Project</h3><p>This Summer, my project was to work on a way to ensure our data center infrastructure management software, Netbox, stays up to date. This has been a very interesting project, and I've learned about managing switches, Kubernetes operators, and Nix.</p><p>The scope of the project is relatively small compared to everything that is stored in Netbox. Some categories, like Circuits, describe physical aspects and cannot be automated. To start, only custom fields on Docker-enabled Linux servers will be automated. Examples of these custom fields include: the CPU model, core count, and speed; total memory; and total storage. As a rule of thumb, information stored in Netbox should be relatively static. After all, it's not monitoring software.</p><p>The project is split into three parts: agents, a queue, and consumers. The agents are responsible for collecting, formatting, and sending the data to the queue. The consumers pull messages off the queue, match the host to the Netbox ID, and update the custom fields if needed. Each of these services communicates using a standard JSON message format, which is also split into three parts.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.51.24-PM.png" class="kg-image" alt loading="lazy" width="733" height="1058" srcset="https://flightaware.engineering/content/images/size/w600/2023/07/Screenshot-2023-07-02-at-8.51.24-PM.png 600w, https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.51.24-PM.png 733w" sizes="(min-width: 720px) 720px"></figure><p>The first part, <code>metadata</code>, contains information about the message; most often, this is populated by the consumer if it needs to re-queue the message. The next part contains the custom fields.
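</p><p>Putting the parts together, a message might look like the following sketch (field names are illustrative assumptions, not the project's actual schema), with a consumer-side helper that applies only the custom fields Netbox already defines:</p>

```python
# Hypothetical three-part message an agent might enqueue.
message = {
    "metadata": {"retries": 0},                  # bookkeeping, set on re-queue
    "custom_fields": {"cpu_model": "EPYC 7502", "memory_gb": 256},
    "match": {"hostname": "web01.example.com"},  # required: maps host -> Netbox entry
}

# Custom fields this Netbox instance is assumed to define.
KNOWN_FIELDS = {"cpu_model", "memory_gb", "storage_tb"}

def fields_to_update(msg: dict) -> dict:
    """Return only the custom fields Netbox can accept; unknown keys are
    logged (printed here) rather than added, mirroring the consumer's rule."""
    accepted = {}
    for key, value in msg.get("custom_fields", {}).items():
        if key in KNOWN_FIELDS:
            accepted[key] = value
        else:
            print(f"skipping unknown custom field: {key}")
    return accepted
```

<p>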
They are a set of key-value pairs and contain the information that gets updated in Netbox. The last and only required part contains the information needed to match the physical device with the corresponding Netbox entry.</p><p>One benefit of this architecture is that agents can send arbitrary custom fields to the queue, and the consumers will smoothly handle them. If a custom field is not in the message, then it won't be updated. Conversely, if a custom field is in the message, but not in Netbox, it will be logged but not added. This also allows the consumer to work with completely different clients simultaneously. For example, a future version could introduce a FreeBSD agent without changing the consumer.</p><p>As for deployment, both the agent and consumer are containerized using Nix. Then the agent is deployed using <code>salt</code> and run daily using <code>cron</code>. On the other hand, the consumer is deployed into Kubernetes using <code>flux</code>. The queue is also deployed in Kubernetes using a combination of <code>flux</code> and the RabbitMQ Kubernetes Operator. Furthermore, because of how the queue is deployed, other projects at FlightAware can easily deploy their own using the Kubernetes operator.</p><h2 id="xander-hirsch">Xander Hirsch</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.52.17-PM.png" class="kg-image" alt loading="lazy" width="370" height="326"></figure><p>Hi, I am Xander, a Senior at Harvey Mudd College majoring in Computer Science. My internship at FlightAware was a valuable experience in professional software development. This summer, I had the opportunity to design and develop a new software project, both of which proved to be valuable experiences in evaluating the tradeoffs of various design principles.
In addition, I was able to adapt my prior software development experience to be an effective member of an Agile software development team.</p><h3 id="my-project-5">My Project</h3><p><strong>Background:</strong> HyperFeed ingests FlightAware’s data sources to produce a single, synthesized view of flights, which is subsequently used by FlightAware’s flight tracking tools and data services. The HyperFeed simulator is a related tool which allows FlightAware employees to understand why HyperFeed produced a particular output and evaluate how proposed changes to the HyperFeed data processing engine would affect output when given the same input.</p><p>A HyperFeed simulation is started with a command run on the HyperFeed simulator server and is then stored in a PostgreSQL database. The simulation results include the simulated flight data output and debugging information, which explains how the flight data output was produced from the given input sources. Most tasks related to HyperFeed simulations can also be accomplished through a web server, which takes requests from clients over WebSocket connections and then forwards the requests as the appropriate SSH or PostgreSQL commands to the HyperFeed simulation server or the simulation result database. This sim run server, distinct from the HyperFeed simulation server, is currently written in Python 2. The goal of this project is to rewrite the existing web server in Python 3.</p><p><strong>Motivation:</strong> One key area of improvement from Python 2 to Python 3 is the tooling available to ensure the correctness of the program. Python 3 has optional type annotations, which allow tools like mypy to provide static type analysis. Type checking provides a way to detect potential errors introduced by variables whose actual data type does not match an expected data type.
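As a concrete sketch of the kind of mismatch static typing surfaces, consider this hypothetical helper (the function name and behavior are illustrative assumptions, not code from the sim run server):

```python
from typing import Optional

def parse_run_id(raw: str) -> Optional[int]:
    """Return a sim run ID as an int, or None when the input is not numeric."""
    return int(raw) if raw.isdigit() else None

rid = parse_run_id("1042")
# mypy rejects using `rid + 1` directly, since rid may be None;
# narrowing the type first makes the arithmetic provably safe:
if rid is not None:
    print(rid + 1)  # prints 1043
```

Without the annotation, a caller could pass the string along unchecked and only discover the mismatch at runtime.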
Python is a dynamically typed language and is flexible with type compatibility; however, this flexibility is a disadvantage when building an entire program where differing types may be compatible in some circumstances and not compatible in others.</p><p>Another common way to demonstrate the correctness of the program is through unit testing. The Hypothesis library for Python 3 allows developers to write property-based tests, which are a more effective form of unit testing. Hypothesis generates arbitrary inputs for unit tests, and the developer provides assertions that must hold in every circumstance. This form of unit testing often finds more bugs than a developer could by manually enumerating all the possible edge cases.</p><p>Rewriting the program in Python 3 also allows FlightAware to build and deploy it with Nix and Docker. Nix is a relatively new tool used at FlightAware that enables deterministic program builds. Python is an interpreted language which does not need to be built. However, Python dependencies may have C source code dependencies themselves, which do need to be built. Other Python package management tools such as pipenv and pip-tools can provide reproducible Python package dependencies, but these package management tools cannot provide reproducible C source code dependencies. Additionally, building a Docker image for the sim run server will allow FlightAware to easily deploy the sim run server anywhere.</p><p><strong>Project Goals and Considerations</strong> </p><p>• <strong>Goal:</strong> The first version of the Python 3 WebSocket server will behave the same as the Python 2 version. <strong>Consideration:</strong> No new features will be included in the initial release. However, the code should be organized in such a way that new types of client requests can be added easily in the future. </p><p>• <strong>Goal:</strong> The WebSocket server can be used efficiently by multiple users at the same time.
Thus, our implementation must enable client requests to be fulfilled concurrently. <strong>Consideration:</strong> The server spends most of the time fulfilling a request by waiting for an external resource response such as a PostgreSQL query or SSH command. Asynchronous programming is the most effective way to concurrently handle client requests when most of the time is spent waiting. </p><p>• <strong>Goal:</strong> We need to test the code to ensure correctness and pinpoint bugs. This will entail unit, integration, and point-to-point tests. <strong>Consideration:</strong> We need a way to isolate components or subsets of components for testing purposes. This can be achieved with loose coupling where each component can be instantiated and used independently. The program structure must be written in a way that minimizes dependencies on other components. For example, the WebSocket server component is only responsible for communicating with clients. It has no notion of the SSH or PostgreSQL components. Instead, it knows where to send client messages for further processing. </p><p>• <strong>Goal:</strong> When a client sends one or more requests and disconnects before the server responds, the server will cancel the pending responses for the client. <strong>Consideration:</strong> The server will need to keep track of tasks and the associated clients. However, some tasks such as a new sim run will continue even if the requesting client disconnects.</p><p> • <strong>Goal:</strong> Notify all connected clients when a sim run job is added, updated, or deleted from the list of all sim run jobs. Additionally, several clients can request the same sim run and all those clients will be sent the simulation data when it is available. <strong>Consideration:</strong> The WebSocket server will need to keep track of all connected clients.
Additionally, the component responsible for executing new sim runs needs a way to keep track of which clients have requested a pending sim run.</p><p><strong>An Aside on Asynchronous Programming in Python</strong> Concurrency, in the context of our WebSocket server, is when the server fulfills multiple requests in the same time span. Asynchronous programming is an effective way to achieve concurrency when most of the time taken to fulfill a client request is spent waiting for an external resource. Python’s asyncio library provides a way to schedule client requests to execute concurrently by wrapping the handler code for each client request in a Task. When one request wrapped in a Task reaches a point of waiting, such as a PostgreSQL query, the Python interpreter puts that Task’s execution aside and runs another Task until that task reaches a point of waiting. The program waits until any Task is runnable again, then continues a runnable Task’s execution until completion or the next waiting phase.‌</p><p><strong>Program Structure Components</strong> </p><p>• <strong>WebSocket Server</strong> – The WebSocket server is the primary interface for client communication. This component is responsible for parsing messages from the client, forwarding valid messages for further processing, and notifying the mediator of client disconnects. </p><p>• <strong>Dispatcher</strong> – This component is ultimately responsible for the server’s handling of client requests. The dispatcher and WebSocket client handler indirectly communicate through an asyncio.Queue. The dispatcher puts the response on the queue for the WebSocket server to eventually send to the client. The dispatcher has child components for the SSH and PostgreSQL connections. These components handle communication with the HyperFeed simulation server and database. </p><p>• <strong>Task manager</strong> – The server needs a way to keep track of pending client request tasks for two reasons. 
First, the Python garbage collector can collect unreferenced asyncio Tasks, so keeping a reference to the task ensures the task runs to completion. Second, the server needs a mechanism to cancel client tasks when the client disconnects with tasks pending. This prevents wasteful execution of unneeded tasks.</p><p><strong>Design</strong> The primary design goal of this project is to write each component for a single purpose, which will make the code base maintainable and testable. However, writing each component for a single purpose opens a new challenge: how to compose the collective components in a way that offers the overall functionality of the program. The mediator design pattern, which is one of many design patterns offered by <em>Design Patterns: Elements of Reusable Object-Oriented Software</em> by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, addresses this scenario. This design pattern suggests that components should not address each other explicitly, but rather act through a central mediator. The mediator knows all the child components, but the child components do not know of each other.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.42.14-AM.png" class="kg-image" alt loading="lazy" width="1126" height="1390" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-12-at-11.42.14-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-12-at-11.42.14-AM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-12-at-11.42.14-AM.png 1126w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 2: Sequence Diagram for Valid Request</em></figcaption></figure><p>The sim run server utilizes the mediator design pattern by providing a mediator object which separates the WebSocket server, dispatcher, and task manager.
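A highly condensed sketch of that arrangement is below. The parse_request()/handle_request() names follow the design described here, but the class shapes and method bodies are illustrative assumptions, not the actual sim run server code:

```python
import asyncio

class Dispatcher:
    """Parses client requests and fulfills them (stand-in bodies)."""
    def parse_request(self, raw: str) -> str:
        return raw.strip()

    async def handle_request(self, request: str) -> str:
        await asyncio.sleep(0)  # stand-in for SSH or PostgreSQL work
        return f"handled {request}"

class TaskManager:
    """Tracks pending tasks per client so they can be cancelled on disconnect."""
    def __init__(self):
        self.by_client = {}

    def register(self, client, task):
        # Holding a reference both prevents garbage collection of the Task
        # and enables cancellation if the client disconnects.
        self.by_client.setdefault(client, set()).add(task)

    def cancel_for(self, client):
        for task in self.by_client.pop(client, set()):
            task.cancel()

class Mediator:
    """Central hub: the WebSocket server calls this; children never call each other."""
    def __init__(self, dispatcher, tasks):
        self.dispatcher = dispatcher
        self.tasks = tasks

    def on_client_message(self, client, raw):
        request = self.dispatcher.parse_request(raw)                         # 1. parse
        task = asyncio.create_task(self.dispatcher.handle_request(request))  # 2. schedule
        self.tasks.register(client, task)                                    # 3. track
        return task

async def main():
    mediator = Mediator(Dispatcher(), TaskManager())
    task = mediator.on_client_message("client-1", " list_runs \n")
    print(await task)  # prints "handled list_runs"

asyncio.run(main())
```

The three numbered steps mirror the mediator's per-request responsibilities described in the surrounding text; everything request-specific stays inside the dispatcher.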
The relationships between the classes are shown in Figure 1. The WebSocket server has a reference to the mediator and utilizes this interface to forward requests from the client and notify the mediator when a client disconnects.</p><p>The sequence diagram in Figure 2 shows an example of how the sim run server would process a valid request from a client. These steps also illustrate how the mediator delegates the high-level steps for a client request to be processed and defers the request-specific steps to the dispatcher. The mediator performs three steps for every client request. First, the message is parsed with the <em>parse_request()</em> method provided by the dispatcher. Next, if the client request is parsed successfully, the mediator schedules the request to be fulfilled concurrently. The mediator does this by wrapping the dispatcher’s <em>handle_request()</em> method with the request as an argument in a new <em>asyncio.Task</em>.<em> </em>Finally, the mediator registers the new <em>Task</em> with the task manager. The mediator’s role allows the WebSocket server and dispatcher to not directly communicate. The cause-and-effect relationship between the WebSocket server receiving a client request and the dispatcher fulfilling the request is merely consequential because of the mediator standing between the two key components.</p><p>The loose coupling of components through the mediator enables us to write maintainable and testable code. There is no code in the WebSocket server class related to message handling beyond a simple call to the mediator. Likewise, the dispatcher does not need to understand how the request messages arrived. One notable exception to the idea that components communicate through the mediator is that the dispatcher sends response messages back to the WebSocket server through an <em>asyncio.Queue</em>. However, this does not break the broader principle that the components are loosely coupled. 
The dispatcher and WebSocket server share knowledge about a queue object, not each other. This indirect communication path was chosen due to how <em>asyncio</em> works regarding <em>Task</em> completion. Any other response communication which works through the mediator would be more complicated.</p><p><strong>Lessons and Conclusion</strong> Developing this project from the start taught me to spend time upfront to understand the capabilities and limitations of each dependency. The core ideas of the project’s software architecture have remained consistent since the beginning. For example, the WebSocket server and sim dispatcher were always going to be separated. However, the mediator concept took me several attempts since I did not fully understand asyncio at the beginning, despite having a conceptual understanding of asynchronous programming.</p><p>In addition to becoming a more experienced software engineer, the internship taught me how to collaborate effectively and help deliver software projects that reach beyond my own capabilities. Many of the ideas that improved the quality of the project started off as suggestions or change requests as part of the peer review process. I would like to thank my supportive team for helping me grow as a software engineer and deliver the project.</p><h2 id="robert-liu">Robert Liu</h2><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/07/Screenshot-2023-07-02-at-8.53.41-PM.png" class="kg-image" alt loading="lazy" width="370" height="326"></figure><p>As a summer intern on the Predict Crew, I was exposed to a breadth of technologies that comprise FlightAware’s machine learning workflows, including the extract-transform-load and model training processes; but what I worked with most closely, and what I enjoyed learning the most, was the production process.
My project—to make performance improvements to the Predict Crew’s Taxi Out Duration Estimate production streamer—gave me invaluable experience in making performance optimizations and adapting to Agile software development processes.</p><p>Taxi Out Duration Estimates (TODEs) are one of FlightAware’s newest Foresight products; produced by proprietary machine learning models, TODEs are informed by many features of which the streaming service must also be aware. My task was to develop a Rust rewrite of the Python TODE streamer. Rust is a systems language that, owing to its unique paradigm of ownership and non-lexical lifetimes, eliminates the need for reference counters or garbage collectors—traditional methods of ensuring memory safety. As such, Rust enjoys high performance. And as Rust incorporates high-level functional features and object-oriented programming paradigms, it also features more expressiveness than other systems languages.</p><h3 id="my-project-6">My Project</h3><p>FlightAware data flow begins in various data feeds—FlightAware processes information from sources including air navigation service providers (ANSPs), space-based ADS-B, and FlightAware’s own global network of over 30,000 terrestrial ADS-B receivers. <em>hcombiner</em> is the program responsible for combining these various feeds into a single input, which HyperFeed, FlightAware’s proprietary flight decision engine, then fuses and synchronizes to produce <em>controlstream</em>, the single source of truth for aircraft positions. 
<em>controlstream</em> is where the Predict Crew’s various streaming applications begin—and where predictions are ultimately sent.</p><p>At the heart of this data pipeline is<em> daystream</em>, a publish-subscribe queue system; <em>daystream</em> feeds are distributed across TSV files that are tailed by <em>daystream readers</em> and appended to by <em>daystream writers</em>.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.00.14-PM.png" class="kg-image" alt loading="lazy" width="1464" height="548" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-14-at-7.00.14-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-14-at-7.00.14-PM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.00.14-PM.png 1464w" sizes="(min-width: 720px) 720px"></figure><p>The Rust TODE streamer is one such predictive application and consists of four key functions: extracting and saving aircraft positions from <em>controlstream</em>, building relevant input samples with the help of feature caches and remote databases, feeding these feature vectors into machine learning models that are loaded into memory, and writing the per-flight predictions to the relevant <em>daystream</em> feed. In designing the Rust rewrite, I opted for a multithreaded design that exploited Rust’s various tools for concurrent programming. 
Below is a simplified model of the streamer’s architecture.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.00.28-PM.png" class="kg-image" alt loading="lazy" width="1464" height="626" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-14-at-7.00.28-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-14-at-7.00.28-PM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.00.28-PM.png 1464w" sizes="(min-width: 720px) 720px"></figure><p>Various LRU and time-based caches store information about the innumerable flights and aircraft that the streamer processes, and information that comprises the feature vectors, including various traffic metrics, data from the National Oceanic and Atmospheric Administration, and METeorological Aerodrome Reports (METARs). Additionally, state variables containing information about virtual clocks or model features also facilitate communication between threads. As the streamer runs, feature extractors—functions that consume a flight’s state and return a relevant derived feature—build the final feature vector that is consumed by the machine learning model.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.00.46-PM.png" class="kg-image" alt loading="lazy" width="1464" height="626" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-14-at-7.00.46-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-14-at-7.00.46-PM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.00.46-PM.png 1464w" sizes="(min-width: 720px) 720px"></figure><p>The machine learning framework currently used by TODE models is LightGBM.
As Rust support for LightGBM is rare, I found only a single open-source library of scant bindings for basic LightGBM functions. This sufficed for the TODE streamer; but the Predict Crew is currently investigating options to abstract machine learning models with a remote server, which will eliminate the need for machine learning APIs in future streamer projects.</p><p>Finally, predictions are written to a <em>daystream</em> feed through an endpoint program that fuses data sent over a TCP connection into a timestamped TSV file. Below is an example of a tab-separated prediction message:</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.01.01-PM.png" class="kg-image" alt loading="lazy" width="1464" height="172" srcset="https://flightaware.engineering/content/images/size/w600/2023/06/Screen-Shot-2022-08-14-at-7.01.01-PM.png 600w, https://flightaware.engineering/content/images/size/w1000/2023/06/Screen-Shot-2022-08-14-at-7.01.01-PM.png 1000w, https://flightaware.engineering/content/images/2023/06/Screen-Shot-2022-08-14-at-7.01.01-PM.png 1464w" sizes="(min-width: 720px) 720px"></figure><h3 id="reflections">Reflections</h3><p>As this internship was my first in software development, working in the Predict Crew quickly acclimated me to daily life on a high-functioning engineering team. One of my core takeaways was embracing the process of learning and exploring new technologies as not merely a responsibility of the job, but as being at the heart of the culture and principles of software development; and at FlightAware, continuous learning is indeed at the core of the company’s engineering culture. 
For me, this meant learning various languages and technologies—Docker, Tcl, Nix—but most meaningful to me was diving into foreign development processes through the lens of Rust, a language that was similarly foreign at first.</p><p>Taking ownership over an entire project also exposed me to the design decisions software developers constantly grapple with; a constant challenge I faced, for example, was translating the complex web of “is-a” Python inheritance relationships into “has-a” composition relationships idiomatic to Rust, in a maintainable and reproducible manner. The composite reuse principle embraced by newer languages like Rust challenged me to unlearn the inheritance paradigms I was accustomed to in Java and C++ and, in the end, deepened my understanding of object-oriented programming and program abstraction.</p><p>Additionally, working with Rust—mapping its lifetimes and ownership rules over a large-scale project—helped expand my understanding of memory management and how to interact with a computer system as a programmer to maximize performance and readability.</p><h3 id="conclusion">Conclusion</h3><p>FlightAware is a unique community of pilots, engineers, and scientists passionate about expanding the frontiers of big data and aviation technology, and I’m so grateful to have spent my summer learning with them. Interning on the Predict Crew gave me invaluable experience in building software and acclimating myself to making the decisions that software developers must constantly consider in building and delivering performant software. The FlightAware community is incredibly supportive, and I never felt hesitant to ask for help or schedule one-on-ones with people across the company; I truly believe FlightAware is the perfect company to begin a career in aviation technology.</p>
        <br>
        <p>
            <a href="https://flightaware.engineering/summer-intern-projects/">Blast from the Past: 2022 Summer Intern Projects</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Unifying Air and Surface Coverage ]]></title>
<description><![CDATA[ Unified Feed is a Firehose offering that provides global flight tracking coverage with both airborne and surface positions. This post discusses the motivation for the creation of Unified Feed along with the technical details behind how it is generated. ]]></description>
        <link>https://flightaware.engineering/engineering-blog-unified-feed/</link>
        <guid>https://flightaware.engineering/engineering-blog-unified-feed/</guid>
        <pubDate>Mon, 01 Aug 2022 13:29:56 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/07/desola-sector-6-vII7qKAk-9A-unsplash-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>As a Staff Software Engineer, Garrett McGrath is responsible for the performance, reliability, &amp; observability of multi-machine Hyperfeed and an increasing constellation of services that consume Hyperfeed's output.</em></p><p>Before October[1] of 2021, <a href="https://flightaware.com/commercial/firehose/?ref=flightaware.engineering" rel="noreferrer noopener">Firehose</a> customers wanting global flight tracking information with both airborne and surface positions had to overcome a few technical hurdles. First, they had to consume two separate feeds: one with airborne and one with surface positions. On top of the clumsiness of requiring multiple connections to provide a complete picture of global flight tracking, the surface feed presented the primary challenge: it lacked the flight IDs provided by the airborne feed. Flight IDs supply a globally unique identifier for a flightplan[2]; without flight IDs in the surface feed, the customer had to link the surface positions received with the flightplans ingested in the airborne feed, an error-prone task with many tricky edge cases. This situation did not serve customers or FlightAware very well. To remediate this issue, the Unified Feed (UF) was developed. The UF unifies the airborne and surface feeds into a single feed with flight IDs for all messages[3].</p><h3 id="combining-feeds-by-timestamp">Combining Feeds by Timestamp</h3><p>Combining multiple feeds together into a single synchronized stream ordered by timestamp is a recurrent problem at FlightAware. To produce the UF, this problem had to be solved once again, albeit with some twists specific to the feeds being combined. Before getting to the distinctness of the UF’s combining task, though, it is worth spending some time investigating the general problem of combining feeds. 
Combining feeds can be expressed in an informal but general way:</p><p><strong>Input</strong>: Some number, <em>F,</em> of input feeds.</p><ul><li>Each feed represents an infinite stream of flight tracking data, broken up into individual messages, consumed one at a time, from a particular source.</li><li>The contents of each feed differ depending on source, but there is one commonality among all feeds crucial for this combining task. The individual messages in each feed contain a timestamp indicating when a given message was received by FlightAware. For a given input feed this timestamp is monotonically increasing[4].</li></ul><p><strong>Output</strong>: A single feed containing messages from all <em>F</em> input feeds synchronized, or ordered, by timestamp.</p><ul><li>The output feed is typically written while the input feeds are producing data in real-time, i.e., when the message timestamps are within a small threshold from the current time. Crucially, the combining process must work the same when not reading its input feeds in real-time, e.g., when run over historical data or after an outage. When not operating in real-time, the combining process is said to be running in catch-up mode.</li><li>It is worth noting at this point that despite ordering feeds based on timestamp, the combining process as outlined above does not account for inaccurate data. The combining process is content agnostic: validation of accurate data is necessarily handled elsewhere.</li></ul><p>While seemingly a simple problem, nuance arises from the requirement that the combining process must perform equivalently in real-time and during catch-up. For instance, the simplest, most naive solution (pictured below) is to use a single priority queue for the input feeds. In this scheme, each input feed puts its messages on a shared priority queue where the priority is the timestamp used for combining. 
Then, to produce the output, keep reading the message with the smallest timestamp from the priority queue. This works well for real-time operation but fails in an easy-to-miss way when in catch-up mode. In catch-up mode, the timestamps in the feeds are well behind the current time. This can happen for several reasons, but a common scenario is a hardware failure where no output is produced for some minutes until the combining process is resuscitated on a new machine. Whatever the underlying cause, combining feeds in catch-up mode necessitates some extra care lest the feeds get majorly out of sync.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.15.17-AM.png" class="kg-image" alt loading="lazy" width="1792" height="1286" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/Screen-Shot-2022-07-28-at-11.15.17-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/07/Screen-Shot-2022-07-28-at-11.15.17-AM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/07/Screen-Shot-2022-07-28-at-11.15.17-AM.png 1600w, https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.15.17-AM.png 1792w" sizes="(min-width: 720px) 720px"><figcaption>Naive, priority queue based feed combining</figcaption></figure><p>To illustrate the problem of using the priority queue method of combining feeds in catch-up mode, consider the diagram below. It consists of a simple example of two input feeds being combined in catch-up mode where each feed outputs messages at a different rate[5].  One of the feeds, call it Feed A, outputs messages at a rate of one per second, while the other feed, Feed B, outputs messages at a rate of one per five seconds[6]. Using a priority queue for combining the feeds in this situation risks outputting messages for Feed A whose timestamps are before the timestamps from Feed B. 
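In code, the naive scheme amounts to little more than the following sketch (simplified; the tuple layout and feed setup are illustrative assumptions):

```python
import heapq

def naive_combine(shared_heap):
    """Drain a single shared priority queue of (timestamp, message) tuples.

    This is correct when feeds are consumed in real time, but during
    catch-up a high-volume feed can be drained ahead of a low-volume one,
    so already-emitted timestamps may precede the slow feed's next message.
    """
    out = []
    while shared_heap:
        _, message = heapq.heappop(shared_heap)
        out.append(message)
    return out

heap = []
for item in [(2.0, "B1"), (1.0, "A1"), (1.5, "A2")]:
    heapq.heappush(heap, item)
print(naive_combine(heap))  # prints ['A1', 'A2', 'B1']
```

Nothing here knows which feed a message came from, which is exactly the information needed to avoid the catch-up failure.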
If at any given time during catch-up the queue contains only a message for Feed B, then once Feed A’s next message gets put into the queue, it could be behind the last timestamp for Feed B. This can happen because of the difference in volume between the feeds. When running in real-time this is not an issue since the consumption of Feed A cannot rush ahead of Feed B, but during catch-up the naïve combining process effectively desynchronizes the feeds and violates the requirement that the procedure works equivalently in both operating modes.</p><figure class="kg-card kg-image-card kg-width-full kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.21.55-AM.png" class="kg-image" alt loading="lazy" width="2000" height="1072" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/Screen-Shot-2022-07-28-at-11.21.55-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/07/Screen-Shot-2022-07-28-at-11.21.55-AM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/07/Screen-Shot-2022-07-28-at-11.21.55-AM.png 1600w, https://flightaware.engineering/content/images/size/w2400/2022/07/Screen-Shot-2022-07-28-at-11.21.55-AM.png 2400w"><figcaption>Desynchronization issue with naive feed combining in catch-up mode</figcaption></figure><p>Fixing this issue involves a remarkably simple solution that makes two amendments to the naïve priority queue solution. Instead of having a single priority queue for all the input feeds, each feed gets its own first-in-first-out queue (as pictured below). To decide which message to output into the combined feed in this scheme, loop over all the input feed queues and output the message with the smallest timestamp. 
In addition, during catch-up, rather than allowing some of the queues to be empty, require that the queues for every feed that could suffer from the desynchronization issue have a message before emitting anything into the output. Giving each feed its own queue has the same core semantics as the priority queue solution but also supplies additional information, namely which input feed(s) currently have data. That new information and the stipulation that the essential feeds have non-empty queues ensures that they stay in sync during catch-up. These two changes obviate the issue of feed desynchronization outlined above.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.25.35-AM.png" class="kg-image" alt loading="lazy" width="1670" height="802" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/Screen-Shot-2022-07-28-at-11.25.35-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/07/Screen-Shot-2022-07-28-at-11.25.35-AM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/07/Screen-Shot-2022-07-28-at-11.25.35-AM.png 1600w, https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.25.35-AM.png 1670w" sizes="(min-width: 720px) 720px"><figcaption>FIFO queue solution to desynchronization issue</figcaption></figure><h3 id="combining-the-unified-feed">Combining the Unified Feed</h3><p>Moving away from the general problem of combining feeds, let’s now return to some of the specifics of the UF. Generating the UF entails synchronizing three feeds. The first feed is called <em>controlstream</em>, and it consists of FlightAware’s global picture of flight tracking activity with airborne positions. The input to the program that produces <em>controlstream</em> is a combined feed of several dozen input feeds.
<em>Controlstream</em> contains flight IDs for all the flightplans that it tracks. The second feed is called <em>surfacestream</em>; it consists of FlightAware’s global picture of surface movement activity at airports around the world. <em>Surfacestream</em>, which is another combined feed, consists of positions from FlightAware’s ADS-B network as well as an FAA-sourced feed restricted to the United States. Analogous to <em>controlstream</em>’s flight IDs, <em>surfacestream</em> contains surface IDs that uniquely identify each surface track. The third feed, called <em>mappingstream</em>, is derived from <em>controlstream</em>[7] and contains flight ID to surface ID mappings along with data access control information[8].</p><p>To generate the UF, then, <em>surfacestream</em> positions need to be synchronized with <em>controlstream</em> messages. However, the positions cannot be emitted simply as found in <em>surfacestream</em>. Instead, the <em>surfacestream</em> positions need to be annotated with flight IDs so that they can be unified with the flightplans in <em>controlstream</em>. This task layers significant complexity on top of combining the input feeds. To produce the UF, multiple threads are arranged to accomplish this. 
A diagram of the unified feed combiner’s architecture is found below[9].</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.40.09-AM.png" class="kg-image" alt loading="lazy" width="1710" height="1284" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/Screen-Shot-2022-07-28-at-11.40.09-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/07/Screen-Shot-2022-07-28-at-11.40.09-AM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/07/Screen-Shot-2022-07-28-at-11.40.09-AM.png 1600w, https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.40.09-AM.png 1710w" sizes="(min-width: 720px) 720px"><figcaption>Unified Feed combiner architecture</figcaption></figure><p>Annotation of <em>surfacestream</em> positions based on the data in <em>mappingstream</em> is handled by its own thread. The thread responsible for writing the output consumes from two queues: one with <em>controlstream</em> and one with annotated <em>surfacestream</em>, i.e., surface positions with flight IDs. Since position annotation has already been handled by this point, the thread writing the output can focus on combining feeds and nothing else. To avoid the desynchronization issues discussed earlier, separate behavior is used for real-time vs. catch-up operation. In real-time mode, only <em>controlstream</em> is required for generating output, but in catch-up mode both input queues must have data before proceeding. 
Generating the UF, then, requires some special tasks peculiar to its input feeds but ultimately boils down to the general task of combining feeds.</p><h3 id="the-unified-feed-service">The Unified Feed Service</h3><p>Throughout this post, the UF has referred to the data delivered through Firehose that unifies FlightAware’s previously separate airborne and surface flight tracking feeds. Internally to FlightAware, the UF also refers to a service: a collection of components that together output the data streamed to Firehose customers. All the components in the UF service are written in Rust, a relatively new addition to FlightAware’s technology stack. Rust has seen an increase in usage at FlightAware as an alternative to C++ for the <a href="https://blog.logrocket.com/why-is-rust-popular/?ref=flightaware.engineering" rel="noreferrer noopener">typical reasons</a>: performance, memory safety, concurrency support, excellent tooling, high quality libraries, and the relative ease of refactoring and maintaining the code base. To build and deploy the UF service, <a href="https://shopify.engineering/what-is-nix?ref=flightaware.engineering" rel="noreferrer noopener">nix</a> is used in conjunction with Jenkins. nix provides reproducible builds, which are especially useful for pinning the Rust toolchain and dependent libraries. With that tooling in place, Jenkins then automates the operation of the UF service.</p><p>The UF service’s components operate with a combination of Kubernetes (K8s) and Kafka. K8s runs the Rust programs that make up the service; Kafka persists the state[10] of the components and stores <em>mappingstream</em> and the UF itself. Since the UF service does not expose itself to external clients, it is straightforwardly deployable on K8s: it sidesteps any of the potential intricacies of load balancers or ingress controllers. 
In addition, since the UF service’s state is stored entirely in Kafka, it does not need to use any storage in K8s, so it also avoids the potential operational complexities of persistent storage in a distributed context. In the UF service’s K8s cluster, each component of the service runs as a deployment with a single replica so that only one instance of each executes at any given time. There is nothing especially complicated about the K8s resources used: the UF service is an excellent fit for the basic abstractions provided by K8s. Lastly, although K8s and Kafka already provide some built-in fault tolerance, the UF service runs in an active-active setup in two different data centers where each instance of the UF service produces the same output. An illustration of a single data center’s UF service deployment is found below.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.26.49-AM.png" class="kg-image" alt loading="lazy" width="1796" height="1498" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/Screen-Shot-2022-07-28-at-11.26.49-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2022/07/Screen-Shot-2022-07-28-at-11.26.49-AM.png 1000w, https://flightaware.engineering/content/images/size/w1600/2022/07/Screen-Shot-2022-07-28-at-11.26.49-AM.png 1600w, https://flightaware.engineering/content/images/2022/07/Screen-Shot-2022-07-28-at-11.26.49-AM.png 1796w" sizes="(min-width: 720px) 720px"><figcaption>Unified Feed Service Components</figcaption></figure><p>During development of the UF, the correctness and feasibility of the implementation was tested using several different but standard approaches. Each component in the service has a battery of unit tests. While unit testing works well for isolated pieces of code, the UF needed more than that before sufficient confidence in its correctness could be assumed. 
To supplement unit testing, integration tests played a huge role. In particular, the <a href="https://github.com/testcontainers/testcontainers-rs?ref=flightaware.engineering" rel="noreferrer noopener">testcontainers crate</a> permitted spinning up an ephemeral Kafka container for running integration tests[11]. Another type of integration test used is so-called <a href="https://en.wikipedia.org/wiki/Characterization_test?ref=flightaware.engineering" rel="noreferrer noopener">golden file testing</a>. With this style of integration test, there is a small output file, the “golden file,” representing the correct output of the UF service over a fixed set of input data. When making changes to the UF service, a new golden file is generated and compared against the known correct copy. No matter the style of integration test, these tests permitted verification of the actual output of the UF service in a full end-to-end manner. Rounding out the unit and integration tests, there is also a benchmarking suite for keeping a watchful eye on performance degradations.</p><p>With the UF service in production, Firehose customers wanting airborne and surface positions now have it available in a single feed. Compared to <em>controlstream</em>, which outputs roughly 2k messages per second on average throughout a given 24-hour period, the UF emits just over twice that. In terms of position rate, and only counting positions with a flight ID, the UF more than doubles the average per second rate found in <em>controlstream</em>. While <em>controlstream</em> works well for a large swath of users, having surface positions available for applications that need it represents a significant improvement in flight tracking quality and customer experience. Running the UF as a separate service in a performant language also paves the way for adding additional high volume position feeds in the future. 
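The golden-file style of test described above can be sketched as follows. The `generate_output()` helper and the file path are hypothetical; in the real suite, output would be regenerated by running the service over a fixed input capture.

```rust
use std::fs;

// Stand-in for running the combiner over a fixed set of input data.
fn generate_output() -> String {
    "A1\nA2\nB1\n".to_string()
}

fn main() {
    let golden_path = std::env::temp_dir().join("uf_golden.txt");

    // First run: record the baseline ("bless" the golden file). In practice
    // the golden file is committed to the repository and reviewed by hand.
    fs::write(&golden_path, generate_output()).expect("write golden file");

    // Subsequent runs: regenerate the output and diff it against the
    // known-correct copy; any divergence fails the test.
    let expected = fs::read_to_string(&golden_path).expect("read golden file");
    assert_eq!(generate_output(), expected, "output diverged from golden file");
    println!("golden file matches");
}
```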
To try out the UF today, sign up for <a href="https://flightaware.com/commercial/firehose/trial?ref=flightaware.engineering" rel="noreferrer noopener">a Firehose trial</a>.</p><hr><!--kg-card-begin: markdown--><h3 id="footnotes">Footnotes:</h3>
<p>[1] For posterity’s sake, this means October 15th 2021.</p>
<p>[2] A flightplan in this context is a single instance of an intention to move a plane from one location and time to another location and time.</p>
<p>[3] Some surface messages intentionally lack a flight ID. These messages provide vehicle location information and location entry and exit events. Vehicle location positions are for non-aircraft vehicles with ADS-B or radar tracking at supported airports. Location entry and exit events are for when aircraft and vehicles go in or out of pre-defined polygons at an airport, e.g., a cargo ramp or a de-icing pad.</p>
<p>[4] There are often multiple timestamps in a single feed message, and not all of them are guaranteed to be monotonically increasing. A salient example is FlightAware’s terrestrial ADS-B feed, which also carries a timestamp indicating when a position was emitted by the aircraft. These timestamps can, and often do, arrive out of order. Digging into how this is handled is outside the scope of this blog post, but it centers on batch processing: before writing to the output feed of ADS-B positions, a batch of positions is collected for a configurable time period and then sorted by position emission time. This introduces a delay between when a message is created and when it is processed by FlightAware, but having message ordering for most positions is worth the trade-off.</p>
<p>[5] Credit to Zach Conn for the example.</p>
<p>[6] The exact numbers do not matter; the issue arises whenever the feeds are produced at different rates, which is the case in practice for any two of FlightAware’s data feeds.</p>
<p>[7] While controlstream does not have surface positions, it does have surface events, e.g., power on or taxi start. These surface events provide the needed link between flight IDs and surface IDs. Mappingstream, which ties these feeds together, is produced by a program that runs as part of the Unified Feed service, which is described below.</p>
<p>[8] How FlightAware handles data access control for its data sources will be covered in a future blog post.</p>
<p>[9] Special thanks to FlightAware developer Yuki Saito. The diagram here is an adaptation of one he made during the development of the Unified Feed service.</p>
<p>[10] For each input feed combined to produce the UF, the startup state consists of the timestamp of the last successfully processed message. Input feed timestamps are stored in a single Kafka topic and partition, mimicking the way Kafka stores consumer offsets, except that the UF service stores timestamps of input feeds rather than offsets into a Kafka topic. To pick up where it left off, the UF service simply consumes the last record in the partition.</p>
<p>[11] A huge thanks to Yuki Saito, a primary developer of the UF service, for contributing Kafka support to the testcontainers repo.</p>
<!--kg-card-end: markdown--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/engineering-blog-unified-feed/">Unifying Air and Surface Coverage</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Aviator - The Journey from an Idea to a Product ]]></title>
        <description><![CDATA[ FlightAware Aviator is a subscription offering aimed at piston engine aircraft owners and operators that unlocks a wide range of powerful flight tracking and planning features for a small fleet of registered aircraft. ]]></description>
        <link>https://flightaware.engineering/aviator-the-journey-from-an-idea-to-a-product/</link>
        <guid>https://flightaware.engineering/aviator-the-journey-from-an-idea-to-a-product/</guid>
        <pubDate>Tue, 05 Jul 2022 12:53:50 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/07/kat-sazonova-_ZpYx5B8xic-unsplash-2.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>FlightAware Aviator was developed by FlightAware pilots for General Aviation (GA) pilots, bringing many of FlightAware’s most advanced features to this community and introducing many new innovations along the journey. James Parkman is a Senior Product Manager with FlightAware, developing the roadmaps and guiding the implementation of many of the products in the aviation company’s portfolio. James is also a Commercial pilot and Flight Instructor who guided the Aviator product from conception and continues to develop the product today.</em></p><h3 id="what-is-flightaware-aviator">What is FlightAware Aviator?</h3><p>FlightAware Aviator is a subscription offering aimed at piston engine aircraft owners and operators that unlocks a wide range of powerful flight tracking and planning features for a small fleet of registered aircraft. The product was developed by FlightAware pilots to serve this community in the General Aviation space. Many of the features available in Aviator have been developed and perfected in a variety of FlightAware’s commercial product line (such as Global, aimed at business aviation – jets) but were not widely known or in use by the GA community. Aviator allows a pilot to assemble a fleet of their favorite aircraft, whether owned or rented, and unlocks a powerful feature set to aid in flight planning, tracking, and much more. 
Aviator is the perfect companion application for small aircraft owners, student pilots, private pilots who rent aircraft, flight instructors, and flight schools.</p><p>This blog will detail the journey from conception to commercial release of the product, giving you an insight into the development process and bringing light to some of the challenges and successes achieved along the way.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/1.jpg" class="kg-image" alt loading="lazy" width="464" height="889"><figcaption><em>Figure 1 - My Aviator Fleet Management in iOS</em></figcaption></figure><h3 id="the-concept-of-flightaware-aviator">The Concept of FlightAware Aviator</h3><p>To start off, I’ll give a little more background on myself. Before joining FlightAware, I was not a pilot. I had played a lot of flight sims, but I hadn’t seriously considered earning my pilot’s certificate. One of my favorite parts of FlightAware is the strong community of pilots that call the company home (there are over 25 certified pilots that work here – about 20% of the company). Also, there is a Flying Program in place that assists in training. I latched onto this from my earliest discussion with FlightAware and immediately began flight training after joining the team. I’m now an instrument rated Commercial pilot, a flight instructor, and the owner of a 1967 Piper Cherokee 180.</p><p>The purpose of me sharing that story, besides sharing my love of flying, is to give a little background on how the concept of Aviator began. As a FlightAware employee, I had behind-the-scenes access to all the features available to the most high-end commercial customer, and I found myself using these features in my own flying more and more. I’d refer to old flights and examine my routes, altitudes, and speeds. I’d observe and set up alerts for surface movement events, such as powering on my aircraft or parking it at an FBO. 
I’d enable advanced weather and aviation map layers when planning my flights. I would schedule my flights using Flight Intents (they used to be called FPAS – Flight Plan Advisory Service). And I quickly realized there wasn’t an offering that would give this same feature set to a General Aviation pilot like myself.</p><p>I began speaking with Daniel Baker and Karl Lehenbauer, our leaders (and pilots themselves), about how we could package these features in an affordable, streamlined offering. The idea for Aviator was born during these conversations.</p><h3 id="discovery">Discovery</h3><p>Before becoming a pilot, I was a project and product manager for software projects. In that role, there are a lot of considerations when creating a new product. An idea needs real insights into the problem to be solved, along with definition and organization to be realized. I call this process Discovery, and it typically takes the form of a series of meetings with stakeholders (a stakeholder in this context is anyone who has an interest in the results of the initiative). In the case of Aviator, these stakeholders were high level leadership as well as Engineering and Design who would implement the necessary feature set. Our resident pilots were also stakeholders interested in the feature set being made available in Aviator.</p><p>Discovery is a process; meetings have an agenda. Known facts about the newly evolving product are reiterated and the concepts contemplated by the group. Open questions are posed, debated, and solutions settled upon. Feature lists are proposed, modified through discussion, and refined. From this process emerge the actual work items that will be undertaken in the form of User Stories and technical Tasks. An initiative of the scope of Aviator results in many hundreds of these Stories and Tasks. 
Each needs to be written with specific requirements in mind, so that the development team can build the feature set to specification and will not be surprised or hindered by things that haven’t been thought out. This is somewhat of an iterative process, as even the robust Discovery process will fail to cover some angles of the end product; but the goal is to get ahead of the development as much as possible. This reduces “scope creep,” where new ideas and requirements continually make their way into a project, causing delays and complicating the outcome.</p><p>Discovery also includes market and pricing consideration. A new product must have a home with a particular market segment (in the case of Aviator, General Aviation pilots and aircraft owners). The pricing of the product should also conform with industry expectations. Competitive analysis is done to identify what competing products might exist, or to define a specific niche that our product fills. When considering Aviator, we realized that although there are several excellent flight planning tools on the market, there was a need for a world-class flight TRACKING solution for small aircraft, and FlightAware had the technology to provide this.</p><p>The conclusion of Discovery is a defined set of features, solutions to edge cases, a refined market analysis, and a work breakdown that allows development to begin. The Discovery process for Aviator lasted several months.</p><p>FlightAware is a mature software development studio, but we also evolved this process in new ways for our teams and endeavored to improve our development process. 
We’re proud of the efficiency and process that we brought to the cycle for Aviator and have continued to use the lessons learned in both ongoing Aviator feature development and in unrelated projects.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/2.png" class="kg-image" alt loading="lazy" width="624" height="420" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/2.png 600w, https://flightaware.engineering/content/images/2022/07/2.png 624w"><figcaption><em>Figure 2 - The Discovery Process</em></figcaption></figure><h3 id="implementation-process">Implementation Process</h3><p>After Discovery, the <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/">crew</a> was assembled for implementation. Aviator is what we at FlightAware would call a cross-crew collaborative project, which means that representatives from several different divisions in the company are required to complete the work. The Aviator Crew consists of engineers and systems architects from our Flight Tracking, Backend, Web, and Mobile teams. It also includes representation from Design, Marketing, the PMO (Program Management Organization), and Product Team. A total of roughly 15 people worked primarily on Aviator, along with a supporting cast of 20 or so more. Development lasted about 9 months, the first 3 being the Discovery process. The final 6 months of development were dedicated engineering work and beta testing. As any software developer knows, this is a rather efficient cycle for a new product.</p><p>FlightAware embraces agile software development and utilizes a robust SCRUM Sprint cycle to develop most new products, projects, and features. We worked in 2-week Sprints, each Sprint with a defined goal. 
Stated Sprint goals are important – they provide measurable milestones to small segments of development, and they state achievable goals in small increments. We utilized Backlog Refinement, Sprint Reviews, and Retrospectives along with Daily Standups as our standard Sprint ceremonies. The Aviator Backlog contained all the User Stories that were defined during Discovery, and each two-week period we would refine those stories as necessary based on new information, discovered edge cases, or changes in our approach for any issues. For a given Sprint, the development team would select the Stories and Tasks that they would work on during that 2-week period, pull them into the current Sprint, and the goals would be defined based on that work selected.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/3.png" class="kg-image" alt loading="lazy" width="624" height="372" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/3.png 600w, https://flightaware.engineering/content/images/2022/07/3.png 624w"><figcaption><em>Figure 3- An Aviator Sprint Burndown Chart</em></figcaption></figure><h3 id="aviator-features"><br><strong>Aviator Features</strong></h3><p>Aviator as a product consists of a mix of existing FlightAware features, and some new and important innovations. As most FlightAware web or mobile app users know, an unregistered or basic user has access to a lot of functionality; however, there is a wide range of capability that is only available to a premium subscriber to one of our services. 
The list of existing features that were included as part of an Aviator subscription is long and includes examples like premium weather map layers, additional alerting capability, additional flight history for aircraft, Cockpit Situational Insights (current weather and runway information about the intended airports of origin and destination, as well as autopilot settings in the flight track log), Surface tracking and Ready-To-Taxi (additional information about surface movement of an aircraft, such as power on and parked status), the Schedule Visualizer (which shows conflicts between scheduled flights) … the list goes on. These are all informative features that could each warrant a blog of their own.</p><p>In addition to bundling these existing capabilities, Aviator offers some excellent new innovations. The first is the ability to register a fleet of aircraft, which required some new account management infrastructure. An Aviator account can have 5 registered piston engine aircraft, and an Aviator+ account can contain 10. You’ll note I specify piston engine aircraft – Aviator also requires a registration check against the aircraft tail number to verify that it is a piston engine (single or multi-engine). This is to ensure that large business or other commercial jets are not registered to an Aviator account. Aviator is intended for the General Aviation owner and operator and is tailored for their needs. Our Global product is designed for these larger aircraft and has unique features intended for that audience.</p><p>A second area of innovation is an exciting capability called Flight Intents. A Flight Intent is a pilot-created definition of an upcoming flight that appears in an aircraft’s schedule but is not filed with the FAA as a true flight plan. 
This gives additional capability when tracking these flights, such as a planned route line on our maps, a designated destination, the ability to benefit from our accurate Foresight machine learning predictive times, the ability to designate and inform a destination FBO of your arrival, and many other details about the upcoming flight. It’s amazing what the simple inclusion of a Flight Intent unlocks. It gives us knowledge of a flight and opens our ability to inform others about it. They are a simple way to give your friends and family a full picture of your upcoming flight.</p><p>Flight Intents had existed for a while in the FlightAware ecosystem, but their inclusion in Aviator required a full-fledged subproject to re-envision the engineering architecture of the system and revamp the interface across our apps. Work is still ongoing today refining the Flight Intent capability. This system was designed and implemented by Michael Yantosca, a FlightAware Flight Tracking Senior Engineer, along with extensive work by our Web and Backend teams.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/4.jpg" class="kg-image" alt loading="lazy" width="614" height="1320" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/4.jpg 600w, https://flightaware.engineering/content/images/2022/07/4.jpg 614w"><figcaption><em>Figure 4 - Creating a Flight Intent</em></figcaption></figure><p><br>A third major innovation undertaken by the Aviator team was a simultaneous launch of the product on both web and iOS platforms. This was the first product that FlightAware has endeavored to release on both platforms, and the challenges here were larger than might first appear. 
Step one in this subproject was revamping the web-based subscription and billing platform to utilize Authorize.net, a platform that allows us to take much of the billing and accounting burden off our own teams. The second challenge was creating a similar subscription system in the Apple App Store. Maintaining parity between both subscription platforms is not as easy as it sounds. Lastly, to provide the best experience possible for an Aviator subscriber, these platforms need to communicate seamlessly with each other. So, a user who signs up on web should be able to access their fleet and features on the iOS platform, and vice versa. This is accomplished through a service that we called the Receipt Processor. Most of this background account management work was architected and implemented by Anne-Leslie Dean and James Wilson, both Senior Engineer leaders at FlightAware. The result is that a user can subscribe to Aviator on either platform and move seamlessly between the two anytime, and their fleet and scheduled flights are shown in either.</p><p>Lastly, although FlightAware had some old and rudimentary aviation map layers in place, we realized that the Aviator audience would benefit from up-to-date and attractive Sectional and IFR Low and High Enroute map layers, so we partnered with a company called SkyVector to ingest and display their always updated and accurate layers. We also continued our partnership with DTN Weather to provide the most robust, high quality premium weather layers to Aviator subscribers.</p><h3 id="beta-and-launch">Beta and Launch</h3><p>FlightAware Aviator was launched at the EAA AirVenture 2021 General Aviation event, and there was a lot of preparatory work that went into this product rollout. Aviator is a self-signup product, meaning we don’t rely on sales folks to shop the product around to potential customers; there are no contracts to sign, and there are no customization options available to personalize the product. 
It’s like Netflix or any other typical subscription application. Consequently, we rely more on social media marketing, advertising, word of mouth, and co-promotional partnerships to get the word out about Aviator.</p><p>Our Marketing team did a great job of enlisting numerous influencers, media personalities, and journalists to try the product during a beta testing period. Many of these are also pilots and aviation industry people themselves. This not only allowed us to generate awareness, but also gave us some useful testers to refine the product and iron out any kinks. We garnered valuable feedback from pilots that helped us refine the offering.</p><p>FlightAware attended Oshkosh in 2021, armed with iPads and PCs to demonstrate the product for attendees. We also created some nice supporting materials, such as attractive brochures and social media advertisements to get the information about our new product into the hands of our audience. It was very exciting to begin seeing customers signing up for our new product on the Oshkosh show floor.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/5.jpg" class="kg-image" alt loading="lazy" width="1430" height="1078" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/5.jpg 600w, https://flightaware.engineering/content/images/size/w1000/2022/07/5.jpg 1000w, https://flightaware.engineering/content/images/2022/07/5.jpg 1430w" sizes="(min-width: 720px) 720px"><figcaption><em>Figure 5 - FlightAware at EAA AirVenture 2021</em></figcaption></figure><h3 id="the-future-of-aviator">The Future of Aviator</h3><p>Although Aviator is a smaller and more community-oriented product than some of FlightAware’s commercial aviation and data products, it is a product that I am passionate about. It serves the General Aviation pilot market that plays host to students, new pilots, instructors, and aircraft owners. 
Aviator is also a great incubator for new features that can and do make their way into other FlightAware offerings, and even contribute to the free website and applications.</p><p>One element of development that we are pursuing is more extensive social features that will help to support and grow this community. We’ve recently redesigned the fleet management interface on iOS to be more user friendly, and we’ve added a whole new photo gallery to the app. In addition, we now give Aviator users the ability to set the primary photo on their aircraft. Previously, the photo that appeared was the one most upvoted by the community. Now, the pilot will be able to select the one that shows up across all of FlightAware.</p><p>The most exciting feature we’ve recently developed is one called Enhanced Flight Sharing. This will allow an app user to share a particular flight on social media using the latest iOS sharing system, which allows sharing to any application on the device that supports it. So, what once could only be shared on a small number of platforms like Facebook or Twitter can now be shared anywhere – SMS, LinkedIn, Slack, etc. But the most exciting element is a new FlightAware Service called Parrot, which generates an attractive and informative map of the flight route, along with details of the flight. This is a much more full-featured sharing option than was previously offered. 
As we continue to develop this sharing service, we plan to offer the ability to share maps with much more detail, maps covering numerous flights, and a flight history map for a given aircraft and time period.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2022/07/6.png" class="kg-image" alt loading="lazy" width="669" height="1442" srcset="https://flightaware.engineering/content/images/size/w600/2022/07/6.png 600w, https://flightaware.engineering/content/images/2022/07/6.png 669w"><figcaption><em>Figure 6 - FlightAware Enhanced Social Sharing</em></figcaption></figure><p>We have many other ideas for the future development of Aviator and continue to pay attention to what is important to the General Aviation community. Our intent is to continue to use FlightAware’s agile development processes to incrementally provide value to our subscribers through the years.</p><h3 id="conclusion">Conclusion</h3><p>I sincerely hope that this view into the product development for FlightAware Aviator has proven interesting and informative. This is a process that I, my team, and all of the teams at FlightAware are continually improving. We strive to move quickly and listen to customer feedback and the needs of the industry. I invite any questions or commentary on our products or process at james.parkman@flightaware.com, and look forward to seeing some of you in the sky.</p><p>Have a great week, and fly safe!</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/aviator-the-journey-from-an-idea-to-a-product/">Aviator - The Journey from an Idea to a Product</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Origin and Development of Tohil ]]></title>
        <description><![CDATA[ At FlightAware, we always lean toward being open instead of closed.  It helps to keep us open-minded, open to new ideas, new opportunities.  When it comes to programming languages, that means we are willing to experiment with, evaluate, and, when appropriate, adopt new languages into our repertoire. ]]></description>
        <link>https://flightaware.engineering/origin-and-development-of-tohil-2/</link>
        <guid>https://flightaware.engineering/origin-and-development-of-tohil-2/</guid>
        <pubDate>Mon, 20 Jun 2022 10:00:00 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2022/06/james-harrison-vpOeXr5wmR4-unsplash-3-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Karl Lehenbauer is a founder of FlightAware and served as Chief Technology Officer from 2005 to 2021. In his role as CTO Emeritus and Advisor, he provides guidance to Collins Aerospace on shaping the future of digital aviation and building a connected ecosystem for customers.</em></p><p>At FlightAware, we always lean toward being open instead of closed.  It helps to keep us open-minded, open to new ideas, new opportunities.  When it comes to programming languages, that means we are willing to experiment with, evaluate, and, when appropriate, adopt new languages into our repertoire.</p><p>When we spun up our Predictive Technologies team, it was natural to use Python because it is heavily used in machine learning.  Python impressed the team with its general-purpose usability, and Zach Conn, then the team leader (now CTO), came to me to say that Python was really good, and suggested I look at it.</p><p>I'd been kind of automatically down on Python because of its indentation-based code scoping.  (I had seen in Snobol how a space character could change the meaning of a program, and I found it troublesome.)  But I set that concern aside and started learning and experimenting with it.  With both growing excitement and growing embarrassment, I began to think that it was important, and we probably ought to embrace it.</p><h2 id="how-we-got-here">How we got here</h2><p>In 2005, when we were first starting FlightAware, we knew some things that were "must haves" for us: SQL. Scripting. Unix. C. Doing stuff over the web instead of native PC and Mac apps. 
And the Internet, obviously.</p><p>We already had a software stack because of what we had already been working on, and it was a quick and easy decision to use Tcl, Unix, PostgreSQL, and Apache, because we understood it, we were good at it, we had a lot of code we could leverage, and we were actually pretty happy with how it had been going using that stuff.</p><p>We had the startup mentality, big time.  Move fast and break things.  While we were still making code of overall high quality, what we were writing was completely greenfield. Almost anywhere you looked, you could see something that needed to be written and bang out something that, while perhaps far from perfect, was way better than nothing.</p><p>Within months, FlightAware was a sensation.  The quality of the website was, in appearance, just a notch under what I considered the gold standard at the time, apple.com; in capability it was beyond just about everything at the time, and far, far beyond what people were used to in aerospace.</p><p>Tcl has served us well.  It’s pretty strong.  And there’s a lot of open source Tcl out there to build on.  It’s got a stable user base, but not a large one; we find ourselves writing interfaces to libraries, creating packages, etc., that already have solid open source implementations for other languages.  We want to use web frameworks and the like but then have to consider making our own.</p><p>As you might guess, over the course of nearly 17 years we wrote and brought to production a lot of Tcl code, most of which is in daily use by our 13+ million registered users and our thousands of commercial customers, partners, and vendors.</p><p>Should we try to rewrite the codebase in Python or in some other language?  Absolutely not!  It would be a certain disaster.  Perhaps suicidal.  
(A quick entertaining read that probably says it better is “<a href="https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/?ref=flightaware.engineering">Things You Should Never Do</a>” by Joel Spolsky).</p><h3 id="dipping-a-toe-in-the-water">Dipping a toe in the water</h3><p>As an experiment in evaluating and learning Python I wrote analogues of our Tcl watchdog and alarm/normal functions as a ground-up rewrite where they would be modern and correct Python.  It was a small library, only a few hundred lines, that I had originally written and still remembered fairly well.</p><p>Even given those advantages, it was surprisingly difficult, and I introduced a half dozen nonobvious regressions!</p><p>No, rewriting was off the table, for a lot of good reasons.  Rewriting is perhaps a once-in-a-career thing.</p><h3 id="a-little-more-coming-into-focus">A little more coming into focus</h3><p>OK, the stage is set.  We want to make Python a first-class language, something we can use for anything, which means it needs to be able to do everything we can do from Tcl.  We are aghast at the thought of rewriting.  Ergo we are going to look really hard at how we might call Tcl from Python.</p><h2 id="i-still-write-software">I still write software</h2><p>At the beginning of FlightAware, the three of us wrote software full-time or pretty much full-time.  I also took care of the database servers for years, and pulled a lot of shifts as the ops on-call.  Eventually as the company grew and the complexity of the roles grew, we spent less and less time writing software ourselves.</p><p>But I still liked to, and would.  I’ve been writing software for 49 years and I would say I've become pretty decent at it.</p><p>I believed, and believe, that as someone directing development, it’s useful to experience what the developers are experiencing.  Is this environment amenable to rapidly making high quality software?  What are the pain points?  
How can we make it better?</p><p>As CTO, I still kept my hand in.  I still wrote software, although again with the growth of the company it evolved more to be an activity I would indulge in on evenings and weekends.</p><h3 id="a-consensus-is-reached">A consensus is reached</h3><p>We had a consensus among our technical leadership that we wanted to get to Python, and we wanted to be able to use our existing Tcl code base from it, without a rewrite.  And my boss, our CEO, was ever supportive, and when he did express a technical opinion, it was always a contribution, because he's super smart and knows a ton.</p><h3 id="the-spec">The spec</h3><p>So here's the spec: We wanted a way to call Tcl from Python, and to have those Tcl functions, from Python, look and behave as much as possible as if they had been written natively in Python, such that, ideally, the developer wouldn’t even be aware they were calling a Tcl function.</p><h3 id="trying-to-get-started">Trying to get started</h3><p>And then… it stalled.</p><p>After all, we were super busy taking care of customers, delivering on our roadmap, increasing scalability, adding redundancy, fixing bugs, etc. When you have more to do than you can do, it’s natural that some things fall by the wayside, and the things that don’t have advocates pulling for them tend to get pushed aside.</p><h3 id="a-dawning-realization">A dawning realization</h3><p>Chewing on this, it began to dawn on me that I was uniquely qualified to do some of the work. 
After all, I had made many contributions to the development of Tcl itself, so I was quite familiar with its C language internals, and I also had some experience with the Python C interface because I had studied it when Jim Nasby and I overhauled PostgreSQL’s PL/Tcl server-side language to make use of more modern Tcl interfaces that greatly improved its performance.</p><p>It fit with my nights-and-weekends ethos that I would at least look into it myself.</p><h3 id="first-see-if-its-already-been-done">First see if it's already been done</h3><p>The first thing anyone in the know has done for the last 30 years when considering writing a piece of software is a search for open source that either already does what you’re looking for or can be a piece of what you’d otherwise have to write yourself, and today that pretty much involves Google searches and GitHub.</p><p>Python already has Tkinter to get an X-Windows GUI interface using Tcl and Tcl's Tk toolkit, so that at least showed a possible way to do things and was an option.  And we did do a little work with it and found it to be a little clunky... useful, but not quite what we were looking for.</p><h3 id="libtclpy">libtclpy</h3><p>Google also revealed a single-developer, dormant GitHub project, "libtclpy", billed as “A Tcl extension to effortlessly call bidirectionally between Tcl and Python.”</p><p>That seemed pretty good.</p><p>It was written by Aidan Hobson-Sayers, who is CTO of a UK software company and leader of the Rust language infrastructure team, which was a good sign, and under a permissive BSD-style license that allowed for unrestricted reuse provided its copyright notice and disclaimers were reproduced in any derivative versions.</p><p>It had the ability to call Python from Tcl via a “py” command, and to call Tcl from Python via a "tclpy.eval" function.  
It got the ball rolling and was something useful to study, and possibly start from.</p><p>Even though it didn’t meet the aforementioned desired call transparency, one of Tcl’s extraordinary strengths is its <em>introspection</em>.  It was easy to imagine using Tcl’s introspection to find all the Tcl functions ("procs", in Tcl nomenclature), their optional and required parameters and default values, and make Python wrappers for them, something like that, not something to be dug into immediately, just comfort in the knowledge that that part should be completely doable.</p><p>libtclpy was only a few hundred lines of code, but it had this great function, <em>pyObjToTcl</em>, that could convert Python objects to Tcl objects, including not just strings, floats, and integers, but also dicts, lists, arrays, booleans, byte arrays, etc.</p><p>And it had the mechanics of creating a single C shared library that had the requisite entry points such that it could be loaded as a C language extension both by Tcl and Python.</p><p>If a Tcl interpreter loaded the libtclpy package, a Python interpreter would be created and initialized, and vice versa.</p><p>This was all great stuff and all stuff that we needed.</p><p>So that jumped us ahead a lot.</p><p>And nothing energizes the committed like an early success.</p><h2 id="building-the-foundation">Building the foundation</h2><p>On GitHub I forked libtclpy and started working on it.  The build was hard coded for a particular version of Linux.  We also need it to work on FreeBSD and the Mac, so get it building instead using the GNU autoconf build system, something I already knew how to do from writing Tcl extensions.</p><p>Start performing little cleanups and normalizations.  
Instead of loading the libtclpy shared library from the Tcl side with Tcl's relatively obscure shared library "load" command, make it load with a standard Tcl “package require”.</p><p>There was a way to call Python functions with explicit arguments, but add a Python-like eval and exec from Tcl.  There are a bunch of things you can do with Tcl from C besides eval, so add Python-side methods to get and set variables, evaluate Tcl expressions, substitute variables and evaluate embedded Tcl commands in strings, etc., i.e. we are at this point speculatively making interfaces to features of Tcl in the expectation that some of them will be useful.</p><p>And we’re trying to use it to call our stuff and looking for where the friction is.</p><h3 id="tcl-returning-only-strings-to-python-isnt-ideal">Tcl returning only strings to Python isn't ideal</h3><p>One thing became apparent: while libtclpy was very good at importing Python objects into Tcl, it punted everything returned from Tcl into being a string.  If Tcl returned a list of integers from 1 to 3, it'd sure be handy to get it as a Python list, like [1, 2, 3], not a string like ”1 2 3”.  This was going to need to be addressed, else the Python code that invoked Tcl functions would have to be coded to handle the data returned in a special way.  In a simple example, if a function returned an integer but it came to Python as a string, the developer would have to wrap the result in int() before using it as an integer.  This would violate our transparency ethos.</p><h3 id="this-deserves-a-name">This deserves a name</h3><p>Also the name libtclpy kind of rubbed, like could it be more anodyne?  (Although it did make pretty clear what it was.) Aidan seemed to have moved on.  We were beginning to far exceed what libtclpy started as.  
And we like to give things fun names, like hummingbird, grackle, hyperfeed, horde, slick, etc.</p><p>After some casting about we chose Tohil, a Mayan deity often represented as a feathered serpent, which seemed particularly apt since Tcl’s emblem is a feather and Python’s is a snake.</p><p>So we pushed our modified libtclpy to GitHub as Tohil, and continued iteratively experimenting with and extending it.</p><h3 id="addressing-the-string-return-issue">Addressing the string return issue</h3><p>An initial attempt to address Tcl calls only returning strings was to add a "to=" optional parameter to the <em>tohil.call</em>, <em>tohil.eval</em>, <em>tohil.expr</em> functions, etc., where the developer could specify the data type they wanted returned.  <em>to="int"</em> would return an int, <em>to="float"</em> a float, and so on, including higher-level Python objects such as tuples, lists, sets, and dicts.</p><p>I also added a "shadow dictionary", aka <em>tohil.ShadowDict</em>, that shadows a Tcl associative array to create an analog to Python dicts, but one where reads and writes came from and went to the Tcl array.</p><h3 id="first-release">First release</h3><p>Soon we released 1.0.0, which contained all of our advances to date.</p><p>We had a ready and hungry user base that grabbed it and started experimenting with it, which was really valuable to the effort.</p><p>A flurry of releases followed, containing bug fixes and new features.  People liked it, but Tohil's stringiness of call returns continued to rub.</p><h3 id="getting-deeper-into-python">Getting deeper into Python</h3><p>A couple of our developers were legit Python experts, and Chris Roberts suggested that instead of saying to="int", to="float", etc., you could say <em>to=int</em>, <em>to=float</em>, and so forth, i.e. lose the quotes.  I was like "How would that work? What would the call actually receive?" 
and he breezily replied oh you would get a Python object containing the data type itself and I was like OK I can't even think about that right now.</p><p>But it stuck in my head and after a little bit I looked into it, and before long I was like wow this is easy, you can check the argument to see if it's a Python data type object, raise an exception if it's not, and then examine the object to see which data type it is.  Within a day or two one of the new capabilities in the dev branch was that you could say to=dict, etc.</p><h3 id="a-python-object-that-contains-a-tcl-object">A Python object that contains a Tcl object?</h3><p>As I continued to improve Tohil, responding to feedback, implementing Chris and others' suggestions, I inevitably began to understand Python better, particularly its internals.  I started to muse about whether it would be possible to create a C language Python object that wrapped the Tcl object, how it might be able to provide normal Python semantics to access it, whether that would be useful, and what it might make possible.</p><p>One appealing thing was the object wouldn't have to be translated and copied when sent to Python, and for something large being returned, that could really add up.  Both languages use reference counting to manage object lifetimes, and that seems pretty manageable, like if you returned a Tcl object as a Python object you could increment the Tcl object's reference count, decrement it when your Python stuff was done with it.  It would play well with the rest of Tcl, and the Tcl object would be freed, as always, when the last user of it decremented its reference count to zero.</p><p>So I began to investigate how to create a new Python object using C.  Already I could see the value of Python having such a large user community, as the Python documentation itself is excellent, and there were multiple good blog postings, explanations and introductions to creating your own data type.  
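</p><p>The quote-less <em>to=int</em> dispatch described above can be sketched in plain Python. This is a hypothetical helper for illustration only, not Tohil's actual implementation, which does this work in C:</p>

```python
def convert_result(value, to=str):
    """Convert a Tcl result string to the requested Python type.

    `to` is a Python type object (int, float, list, ...) rather than a
    string naming the type, so validating it is a single isinstance check.
    """
    if not isinstance(to, type):
        raise TypeError(f"to= expects a type object, got {to!r}")
    if to is str:
        return value
    if to in (int, float):
        return to(value)
    if to in (list, tuple, set):
        # Simple case only: a Tcl list rendered as a whitespace-separated string.
        return to(value.split())
    raise TypeError(f"unsupported conversion type: {to.__name__}")
```

<p>With this shape, <em>convert_result("1 2 3", to=list)</em> hands back a Python list, and a misspelled type is caught by the interpreter as an undefined name instead of silently passing through as a string.</p><p>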
And of course Python being open source meant I could look at the source code to the implementations of Python's own data types.</p><h3 id="tclobj">tclobj</h3><p>And I would call it, a tclobj.  It wasn't a lock that they would be useful, but my sense, and Chris and Andrew Brooks’ sense, was that it was interesting and seemed promising.  The level of effort required seemed manageable, so I set out to extend Tohil to support tclobjs.</p><p>I got it working.  We found it to be immediately useful.  While it didn't make Tcl returns transparent, it made them more handy, closer.  Tclobjs could be created empty, or from a Tcl call return, access to Tcl variables, etc., and perhaps most importantly, created at will from the contents of many of the most important Python types like int, float, str, list, dict, tuple, etc.  This became release 3.</p><p>If a Tclobj contained a valid Tcl list, it could be accessed with Python semantics, like <strong>[lindex $t 3]</strong> in Tcl was <strong>t[3]</strong> in Python, and <strong>lset t 3 "bar"</strong> in Tcl was <strong>t[3] = "bar"</strong> in Python.  It even supported slice notation.</p><p>The tclobj object had special methods for obtaining the value of an object in Python, (assuming your object is named 't') <em>t.as_bool(), t.as_bytearray(), t.as_dict(), t.as_float(), t.as_int(), t.as_list(), t.as_set(), t.as_str(), t.as_tclobj()</em>, and <em><u>t.as_tuple</u>()</em>.  Also it had special methods to access Tcl dictionaries.</p><p>The feedback was really positive, and we continued to experiment with Tohil and try to use it for stuff.  3.1.0 brought a slew of new bug fixes and improvements. 
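</p><p>The Python-style list access described above can be illustrated with a toy pure-Python stand-in. This is a hypothetical class for illustration only, not Tohil's C-implemented tclobj, and it handles just the simple case of a whitespace-separated Tcl list:</p>

```python
class ToyTclList:
    """Toy wrapper giving Python indexing over a Tcl-style list string."""

    def __init__(self, tcl_value):
        # Simple case only: no braced or quoted Tcl list elements.
        self._items = tcl_value.split()

    def __getitem__(self, index):           # t[3], including slices
        return self._items[index]

    def __setitem__(self, index, value):    # t[3] = "bar"
        self._items[index] = str(value)

    def __str__(self):                      # back to Tcl's string form
        return " ".join(self._items)

t = ToyTclList("a b c d")
```

<p>Here <em>t[3]</em> yields "d", <em>t[3] = "bar"</em> updates the underlying list, and <em>t[1:3]</em> slices it, mirroring the semantics the real tclobj provides.</p><p>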
Peter da Silva added full bidirectional UTF-8 support.</p><p>A new tcldict object brought Python dict semantics to Tcl dicts, and pointed the way to making Tohil more pythonic, that is, making it work in a more standard Python way.</p><h3 id="eager-users-but-not-too-many">Eager users but not too many</h3><p>We were at this time at kind of a happy medium: we had users experimenting with Tohil and providing feedback, but we didn't have a lot of Tohil code to worry about, so we could pretty freely break compatibility as we found new and better ways of doing things.</p><h3 id="tests">Tests</h3><p>As development proceeded, I started writing a lot of tests.  While this was valuable for all the usual reasons, the tests were particularly helpful because of the nature of the code: a lot of pointer action, reference count managing, and sharing of objects between languages, where a crash can show up far from where the bug actually is, often a tricky situation to debug because you end up looking at the wrong code.  Committing small changes when possible, running the tests frequently, and having a lot of them exercised Tohil a considerable amount, helped catch these sorts of problems, and fingered a small set of changes, rather than our later finding that some innocuous change had introduced a crash, far away in time and commits from when the bug started showing up.</p><h3 id="extending-the-python-tclobj-type">Extending the Python tclobj type</h3><p>I started implementing ever more of the methods that a Python type could implement, wherever it made sense.  I started looking at Python's "number protocol," a set of methods that a data type can implement if its developers want it to be able to be used as a number, that is, used in calculations, used to receive the result of a calculation, etc.  At this point we're a little off the map insofar as Google's not turning up how-to postings or anything.  
The Python source code was very helpful, although the built-in data types often use functions that are internal to the Python interpreter or even use older, obsolete ways of defining things, etc.</p><p>Chris pointed out that tclobjs could be a lot more pythonic, and as I continued to work on the tclobj and tcldict data types, I began to understand Python internals better.  t.as_int() went away, replaced simply by int(t).  Yes, Python types implemented in C can provide a C function to perform the conversion to int.  For example, Tohil would use the Tcl C function <em>Tcl_GetIntFromObj</em> to get the integer, or raise an error.  Likewise float(t), set(t), tuple(t), bool(t), etc., replaced their as-underscore equivalents.</p><h3 id="returning-tclobjs-by-default">Returning tclobjs by default</h3><p>Tohil was starting to get good.  Explicitly working with tclobjs wasn't fully aligned with achieving the transparency we were calling for, but it sure made Python facile with Tcl objects, and we could tell it "felt right."</p><p>Then one day Chris suggested changing Tohil to return tclobjs by default.  At first I was concerned it would cause a lot of problems, but I agreed to try it.  By this time the test suite had hundreds of tests, and with trepidation I ran "make test", expecting tons of tests to fail, or worse, and to my surprise only two tests failed, and the reason why was immediately obvious:  The test code assumed a string had been returned and then tried to invoke a method that string objects had that tclobjs didn't.  It was trivial to fix and then all the tests passed.</p><p>The next thing was to try it with the code our developers had written.  Again, except for the same string method issue in a couple places, everything just worked.</p><p>Wow.  Just wow.</p><p>Not only did returning tclobjs by default work great, tclobjs are in many ways more flexible and "do the right thing" when used from Python than native Python ints, floats, and strings! 
For example, in Python you get a TypeError exception if you try to add a string and a number, but with a tclobj, it works just fine.</p><pre><code class="language-python">5 + '5'                # native Python: raises TypeError
t = tohil.tclobj('5')
t + 5                  # works: the tclobj converts itself as needed</code></pre><p>It now seemed that we were able to call our Tcl functions and get their complex returns like lists, dictionaries, etc., without any funny business, so we could turn our attention to creating the Python interfaces to the Tcl functions.</p><h3 id="getting-the-transparent-call-stuff-going">Getting the transparent call stuff going</h3><p>As I knew at the beginning, it was no biggie to get Tcl to recursively find all the defined namespaces and find all the native Tcl functions ("procs") and all the C-based functions, which Tcl calls "commands".  For procs we could also determine their arguments, and default values for any arguments that had them.</p><p>I crafted a "trampoline" function, one that could take the arguments from a Python call, correctly determine whether all the needed arguments were there, raise an exception if not, and otherwise get all the arguments and default values (when needed) assembled properly for Tcl, make the call, and return the results.</p><p>I used metaprogramming; that is, for each of the Tcl functions I generated a string comprising a Python function that called the trampoline, and ran them through Python to make them available to Python programs.</p><p>And it worked.  Worked pretty well.  You could start to imagine the end being in sight.</p><p>Then Chris points out to me, "Y'know, it isn't really necessary to use metaprogramming to do that, and if you can do it without it, you should, in the interest of reducing complexity."  Interesting.  Do tell.  So he teaches me that we can directly create an executable Python object that invokes the trampoline function when called, with no source code having to be made or evaluated, and that worked, and he was right: It is a lot less complicated that way.</p><p>Perhaps surprisingly, out of all of Tohil this was probably the easiest part.  
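</p><p>That callable-object approach can be sketched in pure Python. The names here are hypothetical, and the stand-in trampoline just echoes its arguments; the real one lives in Tohil's C code and actually invokes Tcl:</p>

```python
class TclProcShim:
    """Callable that forwards a Python call to a trampoline function.

    One instance per discovered Tcl proc replaces the earlier
    metaprogramming approach of generating and evaluating Python source.
    """

    def __init__(self, proc_name, required, defaults, trampoline):
        self.proc_name = proc_name
        self.required = required      # names of mandatory arguments
        self.defaults = defaults      # {name: default} for optional arguments
        self.trampoline = trampoline  # function that would actually call Tcl

    def __call__(self, *args, **kwargs):
        # Bind positional arguments to parameter names, then merge keywords.
        names = self.required + list(self.defaults)
        bound = dict(zip(names, args))
        bound.update(kwargs)
        missing = [name for name in self.required if name not in bound]
        if missing:
            raise TypeError(f"{self.proc_name}: missing arguments {missing}")
        for name, default in self.defaults.items():
            bound.setdefault(name, default)
        return self.trampoline(self.proc_name, bound)

# Stand-in trampoline: echoes what a real one would hand to Tcl.
def fake_trampoline(proc_name, arguments):
    return (proc_name, arguments)

greet = TclProcShim("greet", ["who"], {"greeting": "hello"}, fake_trampoline)
```

<p>Calling <em>greet("world")</em> routes through the trampoline with defaults filled in, and omitting a required argument raises a TypeError, with no generated source code anywhere.</p><p>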
I think because it was pretty clear what it needed to do, and both Tcl and Python had what we needed to pull it off.</p><h3 id="multiple-interpreters">Multiple interpreters</h3><p>Tcl has supported having multiple interpreters since the very beginning, and Python too has more recently received the capability to have multiple interpreters.  We added support for many Tcl interpreters to each have their own Python interpreter, and vice versa.  This was tricky to get right.  Chris solved a problem I had created before we added this support: I cached a couple of handy pointers that should not have been cached under multiple interpreters, which cost him a day in gdb. (Sorry about that, Chris.)</p><h3 id="tohil-and-apache-rivet">Tohil and Apache Rivet</h3><p>We use Apache Rivet to script Tcl in our HTML, etc., and at that time used multiple Tcl interpreters on our dev servers, one for each developer for each httpd process, so multiple interpreter support was vital to making Tohil available to the web devs.</p><p>The web devs noted that embedded Python didn't line up well with the rest of the code because of Python's indenting rules requiring code to start from the left margin.  Peter contributed the intricate needlepoint to allow the embedded Python code to be indented inline with the rest of the code that invoked it, while using minimal CPU cycles.</p><h3 id="hypothesis-testing-framework">Hypothesis testing framework</h3><p>Thank goodness I wrote a ton of tests and also adopted the Hypothesis framework.</p><p>Hypothesis was great because it turned up a lot of problems.  It's fine to confirm that 25 / 5 = 5, but when Hypothesis generated a lot of variants, it turned up that while 25 / 0 causes Python to raise a divide-by-zero exception, if one of the sources of the numbers was a tclobj, the program instead died from a floating point exception.</p><p>I didn't think to try zero, but Hypothesis did.  
I didn't think to try a string of "{", etc.</p><p>If you're writing a Python module and want to catch bugs in your tests that you didn’t foresee, check out Hypothesis.</p><h3 id="documentation">Documentation</h3><p>If the first month was one of exhilaration and creation, the second and third months were a slog.  Bug fixing and writing documentation.</p><p>I set a goal that the Tohil documentation should be as good as the Python documentation.  That is a high bar, and kind of overwhelming at first.  I was a little disappointed at the gap, but then I realized that the Python docs had been written by a lot of people over the course of many years.</p><p>But then, like so many other things, you just keep iterating and iterating, and eventually, if you actually are improving it, you get something good.</p><p>I would say the Tohil docs are not quite as good as the Python docs, but they are close, and in the upper echelon of documentation for open source projects on GitHub.</p><h3 id="success">Success</h3><p>It works and we're using it.  Heavily.  Another good sign that you're on the right track: the release cadence has fallen sharply, because the developers aren't getting cut on sharp edges and it's not breaking in production.</p><p>The things they wanted to do, they are doing.  For instance, all the popdown menus on the FlightAware website are being done using a Python framework.</p><h3 id="surprises">Surprises</h3><p>Of course if you're going to call Tcl from Python, you're going to eventually need to call Python from Tcl.  It was obvious. So that was in there from the beginning.</p><p>What I didn't expect was the devs using Python from Tcl to make something easier from Tcl.  But there it was.  
Rather than coding up a Tcl library for a REST API to the FastAPI interface, they used a Python module that already existed.</p><h3 id="conclusions">Conclusions</h3><p>This was a very important project for us, and it's the sort of project that has a higher likelihood to fail or to become a tarpit than most.</p><p>We succeeded.</p><p>We took the time to experiment and iterate, to figure out what worked and make what we needed.</p><p>Also, having Python experts helped immensely, because they, Chris especially, made great suggestions that resulted in Tohil being much more pythonic than where I, with less experience, would have left it.</p><p>It was vital to have people trying to use it and providing feedback.</p><p>And it was great to be able to outsource some of the tricky parts to others, those with a very particular set of skills.</p><p>I drew on my decades of experience, taste, and openness, and accepted help from all corners.  Openness to allow myself to be contributed to.  Openness to speculatively try stuff, to prototype and test and see if something's possible and to swing for the bleachers.</p><p>In The Mythical Man-Month, Frederick Brooks said "Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds."  Tohil has high conceptual integrity.</p><p>He also said, "What one programmer can do in one month, two programmers can do in two months."</p><p>Looking this over, one of the most ready criticisms would be that the spec was too thin.  But how good of a spec could one or ten people have written without trying to build anything?  We didn't understand the need for tclobjs until we had something working and tried to use it.  
I am reminded of TCP/IP versus OSI, or Unix versus OS/360.</p><p>You can see all the commits, for better or worse, on the <a href="https://github.com/flightaware/tohil?ref=flightaware.engineering">GitHub project</a>.</p><p>If you have a bunch of Tcl code and you want to call it from Python, or vice versa, you probably want Tohil.</p><p>If you have a problem with a set of known inputs and a set of known outputs, you probably don't need a lot of design.</p><p>But if you're trying to create something new, and you only have vague outlines of what it should look like, you need the right people and you need to allot the time for experimentation and play.  This isn't always easy to do within the structures most of us use for managing the building of software.  But when you pull it off, it's magical.  Enjoy.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/origin-and-development-of-tohil-2/">Origin and Development of Tohil</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ FlightAware’s Terrestrial ADS-B Network ]]></title>
        <description><![CDATA[ An inside-look into our worldwide network of ADS-B ground stations. ]]></description>
        <link>https://flightaware.engineering/flightawares-terrestrial-ads-b-network/</link>
        <guid>https://flightaware.engineering/flightawares-terrestrial-ads-b-network/</guid>
        <pubDate>Thu, 07 Oct 2021 11:03:19 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2021/09/nasa-Q1p7bh3SHj8-unsplash--1--1.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>As a Software Engineer 2 on the ADS-B team at FlightAware, Eric Tran is responsible for the development and maintenance of FlightAware’s ADS-B receiver network.</em></p><p>This post gives a breakdown of our ADS-B ground station hardware and how it processes aircraft data to provide our largest source of real-time flight data at FlightAware.</p><h2 id="how-it-started">How It Started</h2><p>Before we dive into it, let’s take a flight through history. In the early 2000s, there was an emergence of a new aircraft surveillance technology known as <a href="https://en.wikipedia.org/wiki/Automatic_Dependent_Surveillance%E2%80%93Broadcast?ref=flightaware.engineering">Automatic Dependent Surveillance–Broadcast</a>, or ADS–B. With this technology, aircraft obtain their location and other information via satellite and broadcast it on the 1090 MHz frequency. Air Traffic Control (ATC) receives this ADS-B data and uses the information to manage nearby airspace.</p><p>FlightAware took advantage of this new opportunity and created our very own ADS-B receiver that could forward raw flight data to FlightAware over the Internet. With the help of evolving technology and a group of dedicated aviation enthusiasts, plane-spotter techies, and airports around the world, we were able to build out the network of receivers we know today.</p><h2 id="what%E2%80%99s-in-our-ads-b-receivers">What’s in our ADS-B Receivers?</h2><h3 id="the-hardware">The Hardware</h3><p>The central components consist of a Raspberry Pi computer, a USB RTL2832U Software Defined Radio (SDR), and a 1090 MHz antenna. We developed our very own line of SDRs, which we call the Pro-Stick and Pro-Stick Plus. While both models have built-in RF amplifiers to maximize ADS-B performance, the Pro-Stick Plus has a built-in 1090 MHz filter to reduce noise in high RF environments. 
A detailed review of our 1090 MHz antenna and SDRs can be found <a href="https://www.rtl-sdr.com/review-flightaware-ads-b-antenna-and-filter/?ref=flightaware.engineering">here</a>.</p><p>FlightAware provides users with two ADS-B hardware solutions. The first is an open-source solution, called PiAware, that anyone can build themselves with full autonomy by gathering the hardware and loading FlightAware ADS-B decoding software on it. The other solution is called FlightFeeder, which is manufactured by FlightAware. FlightFeeder is self-configuring and remotely managed by FlightAware, so we can provide troubleshooting support.</p><figure class="kg-card kg-gallery-card kg-width-wide kg-card-hascaption"><div class="kg-gallery-container"><div class="kg-gallery-row"><div class="kg-gallery-image"><img src="https://flightaware.engineering/content/images/2021/09/PiAware-1.jpg" width="675" height="495" loading="lazy" alt="" srcset="https://flightaware.engineering/content/images/size/w600/2021/09/PiAware-1.jpg 600w, https://flightaware.engineering/content/images/2021/09/PiAware-1.jpg 675w"></div><div class="kg-gallery-image"><img src="https://flightaware.engineering/content/images/2021/09/FlightFeeder.png" width="484" height="328" loading="lazy" alt=""></div></div></div><figcaption><p><span style="white-space: pre-wrap;">Left -PiAware Right - FlightFeeder</span></p></figcaption></figure><h3 id="the-software">The Software</h3><p>The software that runs on our receivers is responsible for decoding aircraft signals, providing a local webpage to view the aircraft on a map, and sending the tracking data to FlightAware. </p><p><a href="https://github.com/flightaware/dump1090?ref=flightaware.engineering"><strong>dump1090-fa</strong></a> is an ADS-B, Mode S, and Mode 3A/3C demodulator and decoder that will receive and decode aircraft transponder messages received from a connected SDR<strong>. 
</strong>It listens for client connections on specific TCP ports to allow streaming of the decoded messages in a variety of formats. Examples of the data formats and their respective ports:</p><ul>
<li>TCP port 30002 for raw/unparsed messages in AVR format</li>
<li>TCP port 30003 for parsed messages in BaseStation format</li>
<li>TCP port 30005 for raw/unparsed messages in Beast binary format</li>
</ul>
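<p>As a rough illustration of consuming one of these streams, here is a minimal Python sketch of a client for the BaseStation feed on port 30003. The field positions used below (hex ident, callsign, altitude, latitude, longitude) follow the commonly documented SBS-1 comma-separated layout and are an assumption on my part, not something specified in this post:</p>

```python
import socket

# Field positions in the comma-separated BaseStation (SBS-1) format.
# NOTE: this layout reflects the commonly documented SBS-1 convention,
# not an official FlightAware specification.
HEX_IDENT, CALLSIGN, ALTITUDE, LAT, LON = 4, 10, 11, 14, 15

def parse_basestation(line):
    """Parse one BaseStation line into a small dict (None for empty fields)."""
    fields = line.strip().split(",")
    if not fields or fields[0] != "MSG":
        return None
    def get(i, conv=str):
        return conv(fields[i]) if i < len(fields) and fields[i] else None
    return {
        "hex": get(HEX_IDENT),
        "callsign": get(CALLSIGN),
        "altitude": get(ALTITUDE, int),
        "lat": get(LAT, float),
        "lon": get(LON, float),
    }

def stream_positions(host="localhost", port=30003):
    """Connect to the receiver's BaseStation port and yield parsed messages."""
    with socket.create_connection((host, port)) as sock:
        buf = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                return
            buf += chunk
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                msg = parse_basestation(line.decode("ascii", "replace"))
                if msg:
                    yield msg
```

<p>On a machine running dump1090-fa, something like <code>for msg in stream_positions(): print(msg)</code> would then print decoded aircraft fields as they arrive.</p>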
<p>The information that can be derived directly from these aircraft messages includes position, altitude, squawk code, aircraft identification, airborne vs ground messages, speed, heading, roll, and more. Using this data, we even derive other information like weather!</p><p><strong>piaware </strong>is the program responsible for formatting and relaying aircraft data to FlightAware servers. It starts up at boot-time and connects to localhost:30005 to consume the Beast formatted data from dump1090-fa. It then establishes an encrypted TLS connection with FlightAware servers to transfer that flight data and sends regular heartbeats, providing system and health information such as CPU load, temperature, uptime, etc. This information allows us to monitor our network health and notify hosts about any issues with their receiver.<br><br><strong>SkyAware</strong> is a web-based JavaScript application that is bundled with dump1090-fa. This application reads aircraft JSON data produced by dump1090-fa, plots the data on a map interface, and provides the user a view of the aircraft that his or her receiver is picking up. 
The interface provides detailed information about the aircraft and can be customized and filtered based on what the user is interested in tracking.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/09/SkyAware-Web-Interface.png" class="kg-image" alt="" loading="lazy" width="2000" height="1021" srcset="https://flightaware.engineering/content/images/size/w600/2021/09/SkyAware-Web-Interface.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/09/SkyAware-Web-Interface.png 1000w, https://flightaware.engineering/content/images/size/w1600/2021/09/SkyAware-Web-Interface.png 1600w, https://flightaware.engineering/content/images/2021/09/SkyAware-Web-Interface.png 2000w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">SkyAware Web Interface</em></i></figcaption></figure><p>For easy installation, we’ve bundled all the necessary software into a pre-built Raspberry Pi OS Lite image that users can load onto a micro-SD card for their Raspberry Pi. Alternatively, the software packages can be downloaded and installed from FlightAware’s upstream repositories via Debian’s Advanced Package Tool (APT).</p><h2 id="multilateration-mlat">Multilateration (MLAT)</h2><p>Not all aircraft are ADS-B equipped and, as a result, cannot broadcast their location. However, through the use of multilateration, FlightAware can derive an aircraft’s location using the 1090 MHz Mode-S transponder signals being emitted from the aircraft. Using the known locations of four or more ADS-B receivers on the ground, we can calculate the distance from the aircraft to each of those receivers. We achieve this by using the time it takes for the Mode-S signal to propagate from the aircraft to the receiver and the propagation speed of the signal. 
With those distances, we can derive the location and track these aircraft that are not trackable via ADS-B.<br></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/09/Visual-diagram-of-MLAT.png" class="kg-image" alt="" loading="lazy" width="410" height="312"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Visual diagram of MLAT</em></i></figcaption></figure><p><a href="https://github.com/mutability/mlat-client?ref=flightaware.engineering"><strong>fa-mlat-client</strong></a><strong> </strong>is the program within piaware that selectively forwards Mode-S messages to dedicated MLAT servers at FlightAware to perform MLAT calculations. As an incentive for hosting a receiver, we return the MLAT results back to the receiver, which can be displayed on SkyAware and streamed on the following TCP ports:</p><ul>
<li>TCP port 30105 for multilateration results in Beast binary format</li>
<li>TCP port 30106 for multilateration results in extended BaseStation format</li>
</ul>
<h2 id="where-we-are-today-and-what%E2%80%99s-to-come">Where We Are Today and What’s to Come</h2><p>Over the last 15+ years, we have gained an invaluable network of users that has helped us reach more than 32,000 receivers throughout the world. These receivers are simultaneously sending real-time flight data 24/7/365 to FlightAware, so you can imagine the immense amount of data there is to process. We have built out a backend infrastructure to ingest, filter, and standardize this data, and it can handle the load from the rapid network growth we expect in the coming years. </p><p>The FlightAware ADS-B team has some highly anticipated hardware and software releases coming later this year. We plan to continue making new developments to benefit our existing flight tracking community and plan to reach out to a new generation of aviation enthusiasts to teach them about the world of flight tracking.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/09/Global-view-of-FA-receivers.png" class="kg-image" alt="" loading="lazy" width="2000" height="1026" srcset="https://flightaware.engineering/content/images/size/w600/2021/09/Global-view-of-FA-receivers.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/09/Global-view-of-FA-receivers.png 1000w, https://flightaware.engineering/content/images/size/w1600/2021/09/Global-view-of-FA-receivers.png 1600w, https://flightaware.engineering/content/images/2021/09/Global-view-of-FA-receivers.png 2145w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Global view of FlightAware receivers</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/10/Screen-Shot-2021-10-07-at-11.14.36-AM.png" class="kg-image" alt="" loading="lazy" width="1180" height="629" 
srcset="https://flightaware.engineering/content/images/size/w600/2021/10/Screen-Shot-2021-10-07-at-11.14.36-AM.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/10/Screen-Shot-2021-10-07-at-11.14.36-AM.png 1000w, https://flightaware.engineering/content/images/2021/10/Screen-Shot-2021-10-07-at-11.14.36-AM.png 1180w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Terrestrial data coverage provided by FlightAware receivers</em></i></figcaption></figure> 
        <br>
        <p>
            <a href="https://flightaware.engineering/flightawares-terrestrial-ads-b-network/">FlightAware’s Terrestrial ADS-B Network</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Using Jenkins and Fastlane in Our CI/CD Pipeline to Manage Our iOS Release Process ]]></title>
<description><![CDATA[ Explore how our FlightAware Mobile crew applies CI/CD to our iOS release process to deliver a high-quality mobile app. ]]></description>
        <link>https://flightaware.engineering/using-jenkins-and-fastlane-in-our-ci-cd-pipeline-to-manage-our-ios-release-sprocess/</link>
        <guid>https://flightaware.engineering/using-jenkins-and-fastlane-in-our-ci-cd-pipeline-to-manage-our-ios-release-sprocess/</guid>
        <pubDate>Tue, 17 Aug 2021 10:00:14 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2021/08/william-bayreuther-CBtV8oL5dlo-unsplash--5---1-.jpg" medium="image"/>
<content:encoded><![CDATA[ <p><em>As a Software Engineer on FlightAware’s Mobile team, Zack Falgout works with Swift every day to ensure the FlightAware iOS app continues to be a gold standard in aviation apps on the Apple App Store.</em></p><p>Getting an app on Apple’s App Store is the end goal for many iOS developers. The amount of work that goes into reaching that goal is substantial. After first learning the language, the aspiring developer must come up with an idea that they can subsequently translate into a fully functioning application. This is already months, if not years, of work. Naturally, creating the application itself is where most developers focus at the beginning of their development journeys, but there’s one final step that gets overlooked in the process: submitting the app to Apple for approval. It’s a step that can be more complicated than expected, and it’s easy to miss when starting out because there’s already a mountain of information to parse and learn. </p><!--kg-card-begin: html--><p>
At FlightAware, we aim to release a new version of our app each month, so frequently repeating this process could end up taking a significant amount of time. 
<aside><blockquote>We get excited thinking about developing new features and polishing up existing code, not dealing with administrivia.</blockquote></aside>
Our solution is a Continuous Integration (CI), Continuous Delivery (CD) pipeline utilizing two tools: Jenkins and Fastlane. Not only does this ease the burden of the final upload to Apple, but it also smooths out the entire development process.
</p><!--kg-card-end: html--><p>There’s some technical jargon in that last sentence, so let’s first set some definitions. Continuous Integration is the act of building and testing code automatically alongside each new commit made to a repository. Continuous Delivery simply means that your application is always in a state in which it can be deployed. There’s a manual component associated with Continuous Delivery as well: if one were to automate the Continuous Delivery portion of the pipeline, it would then be referred to as Continuous Deployment. </p><!--kg-card-begin: html--><div class="blue-box">
<div class = "title"> The steps of our CI/CD procedure are as follows:</div>
<ol>    
<li>Code is written to handle a particular issue or feature.</li>    
<li>We then begin our peer review (PR) process. Once a PR is submitted to the rest of the team, our <code>iOS – Tests</code> Jenkins job runs automatically. The integration with Fastlane will trigger the <code>test</code> lane to run. Should any of these tests fail, the Jenkins job itself fails and we can investigate the issue. This job is one of our most important because it is our early warning system that something may be wrong.</li>    
<li>Once the code successfully passes peer review, it then gets merged into our <code>develop</code> branch. Any merges onto our <code>develop</code> branch will automatically begin our <code>iOS – Alpha</code> Jenkins job. This will trigger the Fastlane <code>alpha</code> lane to run. The product of the <code>alpha</code> lane is an <code>.ipa</code> file that our QA testers can use for early testing. This procedure gets repeated for every commit made to <code>develop</code>.</li>    
<li>At some point, we decide it is time to release our current code on <code>develop</code> into the app store. We create a <code>release</code> branch from <code>develop</code>. The Jenkins <code>iOS – Beta</code> job is manually run. This will use our <code>beta</code> lane in Fastlane to package and upload our files to Apple’s servers. Once the process is complete, we are free to begin our TestFlight process in App Store Connect.</li>              
</ol></div><!--kg-card-end: html--><p>Jenkins is our automation tool of choice for the Continuous Integration portion of the workflow, and we have a suite of jobs we use for different parts of the release process and special cases. We can create and destroy jobs as needed. This is beneficial because there have been many times where we had a long-running branch that could not be merged into our ‘develop’ branch due to new feature development, but the code on that branch still needed to be tested by QA.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://lh5.googleusercontent.com/wGvk4EZkY1U5GjB4X8xwQuhrSuWUk18AlWhzXPeXr0Ce-txr_5QTibxtsg9e0jfq7DarKBXv5s4ccG1_MbhIV5JGhz_cxGkq7wtZm-WPZo_G3R4W0cGSE6w8ISgDc9V_OQzlz0w" class="kg-image" alt="Graphical user interface, application, table, Teams

Description automatically generated" loading="lazy"><figcaption>Our current Jenkins Mobile jobs. More information on Jenkins can be found at <a href="https://www.jenkins.io/?ref=flightaware.engineering">https://www.jenkins.io/</a>.</figcaption></figure><p>Fastlane is a tool that allows developers to automate tasks associated with the iOS app development process. Each group of scripts is called a lane, and each lane is aimed at handling a specific phase of the CI/CD process. At FlightAware, we have lanes for many different phases of development including unit testing (step two of our CI/CD process), UI testing, alpha builds (step three of our CI/CD process), beta builds (step four of our CI/CD process), and automated screenshots. We even have a lane to handle package management for some of our in-house packages. </p><p>Here is our test lane as an example of how we are handling lane setup and execution:</p><pre><code class="language-Ruby">  desc "Runs all the tests"
  desc "Generates the test coverage"
  desc "Makes sure that the app can still be archived."
  lane :test do
    run_tests
    
    
    # Map each bundle identifier to its ad hoc provisioning profile so the
    # archive step can resolve code signing without prompting.
    testExportOptions = {
      "provisioningProfiles" =&gt; {
        "com.flightaware.iphone.live-flight-tracker" =&gt; "com.flightaware.iphone.live-flight-tracker AdHoc",
        "com.flightaware.iphone.live-flight-tracker.notification-content-extension" =&gt; "com.flightaware.iphone.live-flight-tracker.notification-content-extension AdHoc"
      }
    }

    # Build the app (gym) to confirm it still archives; skip_package_ipa
    # skips producing an .ipa, since only the build needs to succeed here.
    gym(
      project: ENV["XCODE_PROJECT"],
      scheme: ENV["XCODE_SCHEME"],
      configuration: ENV["XCODE_CONFIGURATION_ALPHA"],
      export_options: testExportOptions,
      skip_package_ipa: true,
      use_system_scm: true
      )
  end</code></pre><p>The integration between Fastlane and Jenkins is seamless and allows for us to offload time-consuming tasks to a build server, which allows any developer or non-developer to kick off a build and submit to Apple for review. In fact, there’s an ever-increasing counter of developer hours shown on <a href="https://fastlane.tools/?ref=flightaware.engineering">Fastlane’s website</a> that stands at 28,039,568 at the time of this writing. Because of Fastlane, we save around half an hour with each of our beta builds. </p><p>Jenkins integrates with Fastlane via a shell script. We have a <code>Jenkins</code> folder that contains scripts for each Jenkins job in our project so that as much as possible, our build scripts are versioned in source control. For example, our <code>iOS – Alpha</code> job has a command that looks like this:</p><figure class="kg-card kg-image-card"><img src="https://lh3.googleusercontent.com/crsi2IXn5Y3T49qsi6ek6jH364yxInxOrFJk0v1SudJVkvsEool8_v2Vkbvk30-2aupWtVPQ3tR8znQQ3hZpHSCQMlZ43J6HEBc9Va4XkRaYMwyqaXFaU_GhsSlvFCRAj5C0IBc" class="kg-image" alt="Graphical user interface, application

Description automatically generated" loading="lazy"></figure><p>The contents of the file at that path are:</p><pre><code class="language-zsh">sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
export XCODE_APP=Xcode.app

# Fastlane requires a UTF-8 locale to run reliably
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8

# Pull in the build server's shared environment configuration
cp /Users/xcodeserver/.env.ios.default .env.default

make install

bundle exec fastlane alpha</code></pre><p><br>The final line of that script is what will start our <code>alpha</code> lane in Fastlane. This is how we keep the two tools functioning together.</p><p>Unfortunately, there are a few drawbacks to our current system. With each new tool added into the chain, we also add the overhead of understanding said tool. Both Jenkins and Fastlane have learning curves associated with them, and coming up to speed may not be an easy endeavor. Fastlane itself requires a small amount of scripting knowledge, which I admittedly did not have when I began using it. At FlightAware, we share knowledge amongst each other so that learning curves become less steep. Keeping each tool up to date and the versioning in sync across our build server and personal machines can sometimes cause headaches, but they’re typically minor and can be handled easily.</p><p>One of our greatest struggles with our current system began when Apple began requiring that all developers have two-factor authentication enabled to use their developer accounts. This meant that our build server would need to periodically re-verify its authenticity to connect to Apple’s servers. To solve the issue, we created a ruby script that will request authorization from the command line. This does require a developer to log on to the build server and execute the script, but it is a quick process to get us back up and running.</p><p>We are currently happy with our pipeline, but we also understand that we are not taking full advantage of the power offered by the tools we are currently using. For example, we have a Jenkins job and Fastlane lane to handle capturing screenshots throughout the app. The screenshots are captured on devices of different sizes and localizations. This is important as it is impossible for one developer to go through each combination looking for errors, unless that was all the developer wanted to do with their time. 
The issue here is that to get good screenshots, we need to delete the app from the simulators after each run. We have code in place to do that, but Apple keeps updating the prompts to remove apps from devices. Each time they add or remove a button, it breaks our code. Revisiting this job will likely be something we do soon. </p><p>Outside of that, Fastlane offers several built-in tools that we are not yet utilizing. One such tool is the automated distribution of beta builds. The implementation of this is halfway complete as we’re already using Fastlane to upload the beta build of our app. The final two components we would need to implement would be to allow Fastlane to handle incrementing the build and version number and generating a CSV file with the information of our testers so that Fastlane has access to that information. Incrementing the build and version number would be beneficial for us alone as right now we are utilizing the <code>agvtool</code> to do this manually before each run of the <code>beta</code> lane. Sometimes this step is missed, which means the job will fail as App Store Connect will not accept new uploads with existing build numbers.</p><p>Another pain point that Fastlane could alleviate is managing provisioning profiles. The act of onboarding new employees can be particularly painful, as they need to create new certificates, add those new certificates to the existing provisioning profiles, and then import the provisioning profiles for use with Xcode. It sounds like a simple process on the surface, but in practice it is overly complicated. Fastlane’s <code>match</code> action would allow us to share a single code signing identity for the entire team. This identity would be kept in a GitHub repository, which Fastlane could access. Fastlane can even reset existing profiles and certificates when they expire.</p><p>The Continuous Integration, Continuous Delivery philosophy has benefitted us on the Mobile team at FlightAware tremendously. 
If we save an average of thirty minutes per beta build, then Jenkins and Fastlane have saved us months of time on that job alone. Include the time saved between all the jobs and it increases significantly. The integration between Jenkins and Fastlane has opened up delivery of any stage of the app to anyone on the team. This means we’re not reliant on a specific developer or group of developers to get an alpha build to QA or a production build out to Apple. The only privilege necessary is access to our Jenkins dashboard. All of these together have given us the freedom to focus on what is important, which is writing high-quality code and getting it to our users as quickly and efficiently as possible.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/using-jenkins-and-fastlane-in-our-ci-cd-pipeline-to-manage-our-ios-release-sprocess/">Using Jenkins and Fastlane in Our CI/CD Pipeline to Manage Our iOS Release Process</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Redeye: Cloud Regression Tests for HyperFeed ® ]]></title>
        <description><![CDATA[ This post introduces our historical hf_regression_tests, describes the motivation for migration to the cloud, provides high-level, dynamic views of Redeye, and discusses the pros and cons of Redeye. ]]></description>
        <link>https://flightaware.engineering/cloudregressiontestsforhyperfeed/</link>
        <guid>https://flightaware.engineering/cloudregressiontestsforhyperfeed/</guid>
        <pubDate>Mon, 09 Aug 2021 14:58:17 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2021/08/tom-podmore-1A7OcY-XT2A-unsplash--1--1.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>As a Software Engineer 2 for the Flight Tracking crew, Yuki Saito is responsible for the performance, reliability, and observability of multi-machine Hyperfeed.</em></p><h2 id="abstract">Abstract</h2><p><a href="https://flightaware.com/about/datasources/?ref=flightaware.engineering">HyperFeed</a>® (HF) is FlightAware’s core flight data processing engine that empowers worldwide flight tracking for our customers. Redeye is an internal service at FlightAware that allows developers to run regression tests for HyperFeed in AWS. Running regression tests is critical in ensuring that code changes made to HyperFeed do not break previously working functionality. Historically, regression tests ran on a single machine on-prem using a program called hf_regression_tests. As the number of tests and the number of users increased, a need arose to move our testing infrastructure from on-prem to the cloud. A significant effort was then undertaken to support the initiative throughout 2020.</p><p>This post introduces our historical hf_regression_tests, describes the motivation for migration to the cloud, provides high-level, dynamic views of Redeye, and discusses the pros and cons of Redeye. This post assumes familiarity with some basic concepts in AWS, Kubernetes, and Argo.</p><h2 id="what-is-hf_regression_tests">What Is hf_regression_tests?</h2><!--kg-card-begin: html--><p>
HyperFeed is responsible for ingesting all FlightAware’s 40+ data feeds, aggregating the data together, resolving inconsistencies, filling in gaps, detecting and filtering out unreliable or bad data, and producing a single data feed that represents an all-encompassing, coherent view of worldwide flight traffic as understood by FlightAware<sup><a href="#fn1" id="ref1">[1]</a></sup>.  Viewed as a black box, HyperFeed is one large function with side effects. It takes various data feeds as input and produces the canonical consolidated feed as output. While doing so, it causes side effects to PostgreSQL to manage the states of flights being tracked.
</p>
<p>
Once tasked with creating an infrastructure for testing the functionality of HyperFeed, we developed hf_regression_tests. This was meant to run on a single machine on-prem. To describe how it works, we show hf_regression_tests in action in Figure 1:
</p><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/08/Figure1.png" class="kg-image" alt loading="lazy" width="1028" height="806" srcset="https://flightaware.engineering/content/images/size/w600/2021/08/Figure1.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/08/Figure1.png 1000w, https://flightaware.engineering/content/images/2021/08/Figure1.png 1028w" sizes="(min-width: 720px) 720px"><figcaption>Figure 1. hf_regression_tests in action</figcaption></figure><p><strong>A sequence of steps is carried out as follows:</strong></p><ol><li>The test user prepares a test file in the YAML format. The file contains two primary pieces of information: the predefined description of a flight under test and SQL queries to run against PostgreSQL for verifying the states of the flight.</li><li>The test user starts running hf_regression_tests, passing the test file among other inputs.</li><li>hf_regression_tests spawns a HyperFeed process in simulation (sim) mode.</li><li>Running in sim mode, HyperFeed generates a stripped-down version of data feeds based on the predefined description of the flight in the test file. This ensures that HyperFeed only needs to process a subset of data feeds relevant to the given test, reducing the total execution time. This derived file may be stored on a disk at a designated location so that the next time the same test runs, HyperFeed in sim mode does not need to regenerate the same file again.</li><li>HyperFeed processes the stripped-down version of data feeds in a streaming fashion. 
As it handles each data feed line, it updates the relevant tables in PostgreSQL, such as "flightplans".</li><li>After the HyperFeed process has exited, hf_regression_tests runs the SQL queries described in the test file against PostgreSQL to determine a pass/failure for the test.</li></ol><p>The steps above describe how hf_regression_tests processes a single test file. In the case of processing multiple test files as input, it spawns a HyperFeed process per file. hf_regression_tests ensures that at most 30 tests are being processed at any point, with each test assigned its own schema in PostgreSQL so that the tests are isolated from one another.</p><h2 id="growing-pains-for-hf_regression_tests">Growing Pains for hf_regression_tests<br></h2><p>Now that we have seen how hf_regression_tests works, let us talk about its pain points.</p><p>The first pain point was the growing turnaround time for a run of hf_regression_tests. As of this writing, we have about 400 test files, each containing multiple test cases. On average, it takes a single run approximately 45 minutes to finish all tests. We could have increased the maximum number of tests being processed simultaneously, but we feared that this architecture might not keep up with the ever-increasing number of tests in the future.</p><p>The second pain point was that only one user at a time could run hf_regression_tests in our Jenkins pipeline. We have Jenkins integrated into our GitHub repository for HyperFeed, and every pull request (PR) must run this pipeline and pass all tests to be qualified as “ready to merge.” In this model, Jenkins will deploy the code changes being verified to a shared location on a designated machine on-prem before it runs hf_regression_tests. This means that a second user wishing to run it via Jenkins would have to wait for the first user to finish, which could take up to the full duration of the tests in the worst case. 
The issue was less painful when the number of users was limited, but we knew that this would not continue to be the case as more people were added to the team.</p><h2 id="moving-testing-infrastructure-to-cloud">Moving Testing Infrastructure to Cloud<br></h2><p>To address the pain points of hf_regression_tests, we considered migrating our testing environment from on-prem to the cloud. The Predictive Technology crew had <a href="https://flightaware.engineering/using-argo-to-train-predictive-models/">tremendous success</a> in using Argo running in Elastic Kubernetes Service (EKS) to train predictive models. Therefore, it made sense to adopt a similar technology stack for developing our testing infrastructure in the cloud. We were reasonably confident that Cluster Autoscaler in AWS would ensure that it automatically provisioned additional machines based on the number of tests running simultaneously. Furthermore, Argo was a perfect fit for managing the testing workflow we had envisioned where we wished to execute all tests in parallel and aggregate their results at the end.</p><p>Apart from executing tests in the cloud, there were more caveats to consider for this new testing service in the cloud. While it was meant to largely replace the existing hf_regression_tests, we wanted to minimize the impact on users’ day-to-day workflow as much as possible. Specifically, users would interact with GitHub and Jenkins as a façade to trigger runs of hf_regression_tests. It was crucial that they would be able to continue using the same façade to interact with our new testing service in the cloud.</p><p>Next, we will navigate through a high-level view of Redeye and see how the pieces we laid out in AWS met the requirements above.</p><h2 id="redeye-from-a-rube-goldberg-machine-s-perspective">Redeye from a Rube Goldberg Machine’s Perspective<br></h2><p>In this section, we will be looking at several diagrams, each focusing on a particular scenario during the execution of Redeye. 
The goal here is to understand the behavior of Redeye at a high level from a Rube Goldberg machine’s perspective.</p><h3 id="redeye-infrastructure">Redeye infrastructure</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/07/Figure2.png" class="kg-image" alt loading="lazy" width="2000" height="1115" srcset="https://flightaware.engineering/content/images/size/w600/2021/07/Figure2.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/07/Figure2.png 1000w, https://flightaware.engineering/content/images/size/w1600/2021/07/Figure2.png 1600w, https://flightaware.engineering/content/images/size/w2400/2021/07/Figure2.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Figure 2. Services for Redeye in AWS</figcaption></figure><p>Figure 2 shows how various services for Redeye are laid out in AWS and how the user can indirectly interact with them through GitHub and Jenkins. Everything outside the AWS box (depicted as a solid rectangle in black) is considered “things operating at FlightAware.”</p><p>Figure 2 also shows dotted arrows between services. Each dotted arrow indicates how one service (from which the line originates) is configured to run in response to events from another service (to which the arrowhead is pointing). This means that the actual data flow at runtime between two services is in the reverse direction of the dotted arrow connecting them (if a dotted arrow goes from service A to service B, data flows from service B to service A at runtime). These configurations are persisted and do not change each time an EKS cluster is provisioned (which is in the VPC box depicted as a solid rectangle in green).</p><p>An EKS cluster, on the other hand, is not persisted. It is set up at the beginning of each business day and torn down at the end of the day. This is because we want to reduce our AWS costs by not keeping the cluster around during hours when no one is using it.
Cluster setup and teardown are managed by Terraform and we have a cron job that runs “terraform apply” and “terraform destroy” at specified times.</p><p>Furthermore, we maintain two auto scaling groups in the EKS cluster that together manage the execution of regression tests by automatically adjusting the number of EC2 instances depending on how many tests are running simultaneously.</p><!--kg-card-begin: html--><div class="blue-box">
<div> Remember the entities previously discussed in Figure 1? They are now stored in the following locations in Figure 2:</div>
<br>
<ul>
<li>Test files: stored in the “master tests” S3 bucket and in the “pending tests” S3 bucket</li>    
<li>Stripped-down data feeds: stored in the “master tests” S3 bucket and in the “pending tests” S3 bucket</li>    
<li>HyperFeed: stored as a Docker image in Elastic Container Registry (ECR)</li>    
<li> PostgreSQL: stored as a Docker image in ECR</li>              
</ul>
</div><!--kg-card-end: html--><p>When a new EC2 instance is provisioned, either at the initial setup of an EKS cluster or by Cluster Autoscaler during the execution of regression tests, it runs a user data script to download the test files and stripped-down data feeds from both of the “master tests” S3 bucket and the “pending tests” S3 bucket to a local directory. This ensures that every EC2 instance in the EKS cluster has the same set of test files and stripped-down data feeds locally.</p><p>We have just scratched the surface of the architecture of Redeye but glancing over how pieces are laid out statically is merely one way to understand the whole infrastructure. Just like one could better understand a Rube Goldberg machine when it is in action, we should see Redeye in action, focusing on one execution scenario at a time.</p><h3 id="scenario-1-triggering-a-run-of-redeye">Scenario 1 - Triggering a run of Redeye</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/08/Picture3.png" class="kg-image" alt loading="lazy" width="1650" height="916" srcset="https://flightaware.engineering/content/images/size/w600/2021/08/Picture3.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/08/Picture3.png 1000w, https://flightaware.engineering/content/images/size/w1600/2021/08/Picture3.png 1600w, https://flightaware.engineering/content/images/2021/08/Picture3.png 1650w" sizes="(min-width: 720px) 720px"><figcaption>Figure 3. The user (on the left) triggering a run of Redeye</figcaption></figure><p>Figure 3 shows a sequence of actions (depicted by solid arrows in yellow) initiated by the user and leading up to a point where an Argo workflow for regression tests is submitted to EKS. </p><p><strong>Let us go through the sequence one step at a time:</strong></p><ol><li>The user opens a PR for a feature branch or “git push” new commits to the PR. 
This was how the user would also run hf_regression_tests on-prem through Jenkins, so the user’s workflow is unchanged for interfacing with Jenkins.</li><li>The GitHub Pull Request Builder plugin reacts to a PR event and causes Jenkins to run the “regression tests” pipeline.</li><li>The execution of the pipeline runs “docker build” to create a new Docker image for HyperFeed containing code changes in the feature branch. It then uploads the resulting image to ECR so that EKS can later fetch it while running the regression tests. Optionally, if the changed files in the PR include a schema update to PostgreSQL, the pipeline execution also builds an updated Docker image for PostgreSQL and uploads it to ECR.</li><li>The execution of the pipeline inspects the changed files in the PR, looking for added/modified/deleted test YAML files. For added/modified ones, it derives stripped-down data feeds if necessary and uploads them along with the test files to the “pending tests” S3 bucket. For deleted ones, it does not immediately remove them from the “master tests” S3 bucket because other users on different branches may still need to run them for their regression tests. Instead, the pipeline execution treats them as filtered-out and later passes their test names to an “argo submit” command (which will be issued at step 7), so they are excluded from a list of tests to run.</li><li>A lambda “add new tests” is triggered in response to an “ObjectCreated” event due to added/modified test files and stripped-down data feeds in the pending tests S3 bucket.</li><li>The lambda sends a Systems Manager (SSM) command to all EC2 instances in EKS. 
Each EC2 instance then executes the command to download test files and stripped-down data feeds from the pending tests S3 bucket to a local directory unique to the feature branch.</li><li>The execution of the pipeline runs a Docker container called “test-runner” that knows how to update the kubeconfig file to point to a currently active EKS cluster and issues “argo submit” to EKS to start running our testing workflow using Argo.</li><li>The execution of the pipeline runs a Docker container called Commit Status Updater. This container repeatedly polls for a message in Simple Queue Service (SQS) indicating the completion of the test run started at the previous step. Upon finding the target message in SQS, the container updates a GitHub build status icon for the PR depending on the result of the test run (see <a href="https://stackoverflow.com/q/47148532?ref=flightaware.engineering">this</a> for an example of a GitHub build status icon). We created Commit Status Updater because the endpoint for updating the GitHub build status icon sits behind the company’s firewall and can only be reached by programs running inside it.</li></ol><p>We have gone through a series of event triggers, which started with the user’s “git push” and resulted in the launch of the Argo workflow managing the execution of regression tests. In the next scenario, we will briefly look at the execution of the workflow.</p><h3 id="scenario-2-executing-regression-tests-workflow-using-argo">Scenario 2 - Executing regression tests workflow using Argo</h3><p>For those who are not familiar with Argo, we highly suggest checking out our previous post <a href="https://flightaware.engineering/using-argo-to-train-predictive-models/"><em>Using Argo to Train Predictive Models</em></a> since we are not going to provide an overview of Argo here. Instead, we will be talking about the topology of our Argo workflow specific to Redeye.
The following Figure 4 shows an instance of the workflow topology, which is a snapshot of our testing workflow spawned by “argo submit” in the previous scenario.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/07/Figure4.png" class="kg-image" alt loading="lazy" width="652" height="948" srcset="https://flightaware.engineering/content/images/size/w600/2021/07/Figure4.png 600w, https://flightaware.engineering/content/images/2021/07/Figure4.png 652w"><figcaption>Figure 4: A snapshot of the Argo UI showing the execution of the testing workflow</figcaption></figure><p>The figure shows a Directed Acyclic Graph (DAG) that forms a fan-out and fan-in topology (although the fan-in part is not shown because it will kick in when all tests have finished running). The fan-out part starts at the top of the figure, which enumerates the list of all tests to run and branches off into 6-step vertical branches. Each branch represents running a single test file, which corresponds roughly to what we saw in Figure 1 but performs different steps specific to this Argo workflow. The first node in the branch, which is step 0, is a no-op and simply a marker step indicating which test file to process.</p><p><strong>The remaining steps comprise the following:</strong></p><ol><li><em>generate-postgres-pod-label</em>: This step is responsible for generating a unique label given to a PostgreSQL pod that will be created at the next step.</li><li><em>init-postgres</em>: This step creates a PostgreSQL pod by pulling its Docker image stored in ECR. The pod created is given the unique label from the previous step so that it can be uniquely identified, via that label, by a Service created at the next step.</li><li><em>expose-postgres</em>: This step creates a Service for the PostgreSQL pod and exposes its cluster IP. This ensures that a HyperFeed pod can locate its “buddy” database to which it writes through the Service.
The unique label is important because the Service is guaranteed to expose one and only one buddy PostgreSQL pod for the HyperFeed pod; without the unique label, the Service could expose a different PostgreSQL pod used by another HyperFeed pod, and two HyperFeed pods would then write to the same PostgreSQL pod. It is worth mentioning that Argo does not provide a direct way of creating Services by itself but does provide <a href="https://argoproj.github.io/argo-workflows/examples/?ref=flightaware.engineering#kubernetes-resources">Kubernetes Resources</a> that allow us to embed a Kubernetes manifest within an Argo step. Lastly, the PostgreSQL Service itself is given a label (derived from the name of the current workflow) that is shared by other PostgreSQL Services within the same workflow.</li><li><em>run-sim</em>: This step pulls a Docker image for HyperFeed and creates a HyperFeed pod. The test file and corresponding stripped-down data feeds that the pod needs will already have been downloaded to whichever host the pod is scheduled on. The HyperFeed pod then behaves in much the same way as the HyperFeed process we saw in Figure 1.</li><li><em>verify</em>: The final step, just like step 6 in Figure 1, runs the SQL queries described in the test file against the PostgreSQL pod to determine a pass/failure for the test.</li></ol><p>Note that there is an ellipsis in a grey circle in Figure 4 that hides 405 more vertical branches. This means Argo performs the above steps for all 407 tests in parallel (fan-out), and when they have finished it collects individual results to determine the ultimate pass/failure for the workflow (fan-in). Using Cluster Autoscaler, Argo makes sure that enough EC2 instances are provisioned during the execution of the tests so that every pod can be scheduled onto some EC2 instance and make progress.</p><p>We have touched upon how Argo manages the execution of our testing workflow.
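Before moving on, the label plumbing from the <em>generate-postgres-pod-label</em> and <em>expose-postgres</em> steps can be sketched in a few lines of Python. The exact label formats below are invented for illustration, but the two roles match the text: a per-test label that pins each Service to exactly one buddy PostgreSQL pod, and a per-workflow label shared by all of a workflow's Services so they can later be matched with a single selector.

```python
def postgres_pod_label(workflow_name, test_name):
    # generate-postgres-pod-label: unique per test branch, so the Service
    # created in expose-postgres selects one and only one buddy database.
    return f"redeye-db-{workflow_name}-{test_name}"

def service_label(workflow_name):
    # Shared by every PostgreSQL Service in the same workflow, so that
    # cleanup can later match them all with a single label selector.
    return f"redeye-svc-{workflow_name}"

def services_matching(services, workflow_name):
    # Selector-style filtering over (name, label) pairs, the way
    # "kubectl delete svc -l <label>" would select Services.
    wanted = service_label(workflow_name)
    return [name for name, label in services if label == wanted]
```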
Once the pass/failure of the workflow is determined, we need to deliver it to the user. The next scenario illustrates how we can achieve it.</p><h3 id="scenario-3-delivering-result-to-the-user">Scenario 3 - Delivering result to the user</h3><p>The use case we will now be looking at is to report the result of all tests back to the user. This means two things: one is to let the user know the result, and the other is to update the GitHub build status icon accordingly. The following Figure 5, taking us back to the AWS diagram, shows a sequence of actions that achieves those goals.</p><p>Before we move on, we need to clarify some notations used in Figure 5. Steps starting with a numeric value followed by a letter, such as 3a, 3b, and 3c, execute in parallel with no specific ordering between them. Furthermore, a step with a larger numeric value but the same letter executes sequentially after the step with the smaller numeric value and that letter. For instance, step 4a runs after step 3a but has no specific ordering with respect to 3b or 3c.</p><p>With that, let us continue our journey through the Rube Goldberg machine in Figure 5, on its way back to the user.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/07/Figure5.png" class="kg-image" alt loading="lazy" width="2000" height="1284" srcset="https://flightaware.engineering/content/images/size/w600/2021/07/Figure5.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/07/Figure5.png 1000w, https://flightaware.engineering/content/images/size/w1600/2021/07/Figure5.png 1600w, https://flightaware.engineering/content/images/size/w2400/2021/07/Figure5.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Figure 5. Delivering the result to the user and updating the GitHub commit status icon accordingly</figcaption></figure><!--kg-card-begin: html--><ul style="list-style-type:none;">
<li>1.	When Argo has completed the testing workflow, it uploads the result (in JSON) to the “test results” S3 bucket. The JSON has the name of the user in its filename and it will be used at step 3b.</li>
<li>2.	In response to an “ObjectCreated” event triggered by the uploaded result JSON, the “test results” S3 bucket sends a notification to a topic in the “tests completed” Simple Notification Service (SNS). The reason for the use of SNS is that S3 only supports sending notifications to a single target (at least at the time of development). If multiple lambdas are interested in such notifications, we need some mediator between S3 and the lambdas to fan out the notifications to them.</li>
    <br>
<li>3a.	This is one branch of the execution triggered by the “tests completed” SNS. It invokes a lambda “clean up DB Services.” Why do we need to clean up Services for PostgreSQL? Recall that Argo created those Services using the Kubernetes Resource constructs. Their lifecycle is managed by Kubernetes itself rather than Argo and because of that, it is not up to Argo to clean them up when it has completed the testing workflow, but rather it is up to us to ensure that those resources are deleted properly.</li>
<li>4a.	The lambda issues the equivalent of “kubectl delete svc” to EKS, using the Kubernetes API in Python, to delete PostgreSQL Services. In the EKS cluster, there may be other PostgreSQL Services that belong to a different workflow owned by another user. How do we know which of them to delete? Recall from the step “expose-postgres” in the previous scenario that all PostgreSQL Services share the same label. This label is derived from the name of a workflow, and it is available in a parameter passed to the lambda. The lambda can then construct the label on-the-fly and pass it to “kubectl delete svc,” deleting PostgreSQL Services whose labels match the target one.</li>
    <br>
<li>3b.	In another branch of the execution triggered by SNS at step 2, a lambda “deliver result email” is invoked. Using the username embedded in the filename of the JSON, the lambda uses Simple Email Service (SES) to deliver the pass/failure to the user.</li>
<li>4b.	SES delivers the test result to the user.</li>  
    <br>
<li>3c.	In the last branch of the execution triggered by SNS at step 2, SNS sends a message to the “tests completed” SQS.</li>
<li>4c.	The Commit Status Updater container has been polling for the message. Once it has found the message from step 3c, the container retrieves it and deletes it from SQS.</li>  
<li>5c.	The Commit Status Updater container checks the contents of the retrieved message and obtains the result of the test run. It then sends a POST request to our GitHub endpoint to update the build status icon in the PR accordingly and terminates the container itself.</li>
</ul>  <!--kg-card-end: html--><p>At this point, the user who initiated the run of Redeye has received the email containing the pass/failure, and the build status icon in the PR has been updated to display the result.</p><p>If code changes from the feature branch did not pass all tests, the user needs to address the test failures and re-run Redeye. Rinse and repeat until the code changes pass all tests. When they do, the user is finally able to merge the PR to the developers’ shared branch.</p><p>It turns out that there is some additional housekeeping that needs to be done when a PR is merged into the developers’ shared branch, which we will be looking at in the next scenario.</p><h3 id="scenario-4-merging-feature-branch-to-developers-shared-branch">Scenario 4 - Merging feature branch to developers’ shared branch</h3><p>The use case discussed in this scenario is to merge a feature branch that has been qualified to be merged into the developers' shared branch. This gives our Rube Goldberg machine the final nudge to complete its journey. In essence, we ensure that what has been merged to the shared branch will be visible to other users of Redeye. With that in mind, let us trace through Figure 6 (the notations we explained for Figure 5 also apply here).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2021/08/Picture6.png" class="kg-image" alt loading="lazy" width="1650" height="825" srcset="https://flightaware.engineering/content/images/size/w600/2021/08/Picture6.png 600w, https://flightaware.engineering/content/images/size/w1000/2021/08/Picture6.png 1000w, https://flightaware.engineering/content/images/size/w1600/2021/08/Picture6.png 1600w, https://flightaware.engineering/content/images/2021/08/Picture6.png 1650w" sizes="(min-width: 720px) 720px"><figcaption>Figure 6. Merging feature branch to shared branch</figcaption></figure><!--kg-card-begin: html--><ul style="list-style-type:none;">
  <li> 1.	The user merges the feature branch to the developers’ shared branch.</li>
    <br>
  <li>2a.	In response to a GitHub PR event, one of the GitHub Webhooks is triggered to send a POST request to an endpoint defined by API Gateway.</li>
  <li>3a.	The API Gateway triggers a lambda “merge pending tests.”</li>
   <li>4a.	The lambda inspects an incoming payload to determine whether it should proceed with further processing. In particular, the lambda checks for two conditions about the payload: whether it is “git merge” and “merged to the shared branch.” If the payload satisfies both conditions, the lambda executes a command to move added/modified test files and their stripped-down data feeds from the “pending tests” S3 bucket to the “master tests” S3 bucket.</li>
    <li>5a.	The added/modified test files and their stripped-down data feeds are moved from the "pending tests" S3 bucket to the "master tests" S3 bucket.</li>
    <li>6a.	In response to an “ObjectCreatedByCopy” event in the “master tests” S3 bucket, the lambda “add new tests” is triggered.</li>
    <li>7a.	Like step 6 in Figure 3, the lambda sends an SSM command to all EC2 instances in EKS. This time, each instance executes the command to download the moved test files from the “master tests” S3 bucket to a local directory visible to all test users.</li>
    <br>
    <li>2b.	Another GitHub Webhook is triggered to send a POST request to Jenkins via the Generic Webhook Trigger plugin, which runs the “merge” pipeline. The plugin performs a sanity check similar to what is performed at step 4a. Specifically, it checks whether the POST request represents “git merge” and “merged to the shared branch.” If so, it proceeds with subsequent steps.</li>  
    <li>3b.	The execution of the pipeline optionally builds and uploads auxiliary Docker images to ECR. Whether auxiliary images need to be uploaded depends on the changed files that have been merged to the shared branch. Auxiliary images include a base Docker image for a HyperFeed image (as used by its multi-stage build) or an updated PostgreSQL image if merged files contain a schema change.</li>
     <li>4b.	If merged files include deleted test files, the execution of the pipeline will remove them and their stripped-down data feeds from the “master tests” S3 bucket.</li>
</ul>  <!--kg-card-end: html--><p>After we have gone through the above steps, the code changes from the feature branch have been merged to the shared branch, and the required updates have been reflected in our testing infrastructure. That way, the state of Redeye stays in sync with our GitHub repository for HyperFeed as developers make code changes.</p><h2 id="pros-and-cons-of-redeye">Pros and Cons of Redeye</h2><p>We have indeed come this far. It is a good time to go back and review the original pain points for hf_regression_tests.</p><p>They boil down to:</p><ul><li>Increasing turnaround time: approximately 45 minutes and growing</li><li>Only one user could run it at a time</li></ul><p>Here is how Redeye addressed them:</p><ul><li>The average turnaround time is now between 20 and 23 minutes</li><li>Multiple users can run it at the same time</li></ul><p>Crucially, we expect the average turnaround time to remain unchanged by virtue of Cluster Autoscaler even if the number of tests increases in the future, as long as Argo can support our growing workflow.</p><p>One notable downside of Redeye is that it is currently far less capable of allowing a user to diagnose test failures in the cloud than hf_regression_tests on-prem. The primary reason is that each PostgreSQL pod is terminated by Argo as soon as the corresponding 6-step vertical branch has finished. This means that the tables in PostgreSQL such as “flightplans” cease to exist, so the user cannot run SQL queries against them to understand why some tests failed. To work around this limitation, the user is encouraged to use hf_regression_tests on-prem instead, where they can run a small number of tests quickly without worrying about PostgreSQL being deleted.
Therefore, we have a clean separation of responsibilities: Redeye is a sledgehammer that runs all tests faster but offers less support for diagnosing individual test failures, whereas hf_regression_tests on-prem takes twice as long to process all tests but provides better support for investigating failures.</p><h2 id="conclusion">Conclusion</h2><p>In this post, we have looked at how hf_regression_tests works on-prem and described the motivation for migrating to the cloud due to its pain points. We have gone through a set of scenarios for Redeye to better understand how it works end-to-end from a Rube Goldberg machine’s perspective. Finally, we have discussed the pros and cons of Redeye and why hf_regression_tests on-prem continues to be useful for us.</p><p>Now, the reader may have noticed that we did not discuss any architectural tradeoffs associated with our design of Redeye. For example, we did not provide any rationale behind some of the architectural decisions we made:</p><ul><li>Why are there two auto scaling groups in EKS, one backed by spot instances and the other by on-demand instances?</li><li>Why do all provisioned EC2 instances in EKS reside in a single Availability Zone as opposed to multiple Availability Zones?</li></ul><!--kg-card-begin: html--><p>
In a nutshell, these are tactics to satisfy our high-priority quality attributes<sup><a href="#fn2" id="ref2">[2]</a></sup> for Redeye. More in-depth discussions on quality attributes, architectural tradeoffs, and what they mean in our context can be a topic for a future blog post.
</p><!--kg-card-end: html--><h3 id="references">References</h3><p></p><!--kg-card-begin: html--><sup id="fn1">1. Zach Conn (2016). Hyperfeed: FlightAware’s parallel flight tracking engine. <i>23rd Tcl/Tk Conference</i>, Houston, TX.<a href="#ref1" title="Jump back to footnote 1 in the text.">↩</a></sup>
<br>
<sup id="fn2">2. Len Bass, Paul Clements, and Rick Kazman (2012). <i>Software Architecture in Practice 3rd Edition</i>. Addison-Wesley. <a href="#ref2" title="Jump back to footnote 2 in the text.">↩</a></sup>
<!--kg-card-end: html--><p></p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/cloudregressiontestsforhyperfeed/">Redeye: Cloud Regression Tests for HyperFeed ®</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Using Argo to Train Predictive Models ]]></title>
        <description><![CDATA[ In this post, we’ll describe how we train the machine learning models that power FlightAware Foresight’s ETA predictions, the infrastructural challenges of training models at scale, and how we’ve addressed those challenges using Argo and Kubernetes. ]]></description>
        <link>https://flightaware.engineering/using-argo-to-train-predictive-models/</link>
        <guid>https://flightaware.engineering/using-argo-to-train-predictive-models/</guid>
        <pubDate>Fri, 04 Jun 2021 10:00:00 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2021/05/joseph-barrientos-eUMEWE-7Ewg-unsplash-1.jpg" medium="image"/>
        <content:encoded><![CDATA[ <!--kg-card-begin: markdown--><p><em>As a Software Engineer 2 on FlightAware’s Predictive Technology crew, Andrew Brooks continually works to maintain, analyze, and improve FlightAware’s predictive models as well as the software and infrastructure that supports them.</em></p>
<h2 id="aquickintroductiontoflightawareforesight">A Quick Introduction to FlightAware Foresight</h2>
<p>Foresight is the name we’ve given to FlightAware’s suite of predictive technologies, which use machine learning models to emit real-time predictions about flights. These models draw upon datasets that combine <a href="https://flightaware.com/about/datasources/?ref=flightaware.engineering">thousands of data sources</a> and incorporate routing and weather data to forecast future events in real-time.</p>
<p>In this blog post, we’re going to focus on Foresight’s real-time ETA predictions. Specifically, Foresight can provide two kinds of ETAs: an estimated “on” time (when a flight lands on its destination’s runway) and an estimated “in” time (when a flight has finished taxiing to its gate at the destination). Foresight ETA predictions require two machine learning models for each destination, which provide the “on” and “in” predictions, respectively.</p>
<p>Training the machine learning models that power Foresight ETAs is not an easy task. In order to support thousands of destinations around the world, we need to train about 3500 different models. To make matters worse, these models must be retrained once per month, from scratch, to ensure that they’re able to adapt to any changes in real world conditions.</p>
<h2 id="trainingforesightmodels">Training Foresight Models</h2>
<p>Before we can discuss training models at scale, it’s necessary to describe the process we use to train a single model. Seen from a high level, the model training process can be divided into two phases:</p>
<ul>
<li>The “split” phase, which prepares a single-airport dataset for training and performs a <a href="https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets?ref=flightaware.engineering">test/train split</a> on that dataset</li>
<li>The “train” phase, which takes the output datasets from the previous phase and produces a trained model. Note that this phase may need to be run multiple times -- if the created model doesn’t seem to generalize well or pass our sanity checks, it must be retrained with adjusted <a href="https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)?ref=flightaware.engineering">hyperparameters</a>.</li>
</ul>
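<p>A toy Python sketch of the two phases follows. The training, sanity-check, and hyperparameter-adjustment routines are stubs with invented names; the real ones are far more involved. What matters is the shape: split once, then fit and re-fit with adjusted hyperparameters until the model passes the checks.</p>

```python
def split(dataset):
    # "Split" phase: prepare the single-airport dataset and hold out a
    # test set (an 80/20 split here, purely for illustration).
    cut = int(len(dataset) * 0.8)
    return dataset[:cut], dataset[cut:]

def fit(train_set, hyperparams):            # stub for the real trainer
    return {"trained_on": len(train_set), "hyperparams": hyperparams}

def passes_sanity_checks(model, test_set):  # stub for generalization checks
    return True

def halve_learning_rate(hyperparams):       # stub for HP adjustment
    return {**hyperparams, "learning_rate": hyperparams["learning_rate"] / 2}

def train_model(dataset, hyperparams, max_attempts=5):
    train_set, test_set = split(dataset)
    for _ in range(max_attempts):
        model = fit(train_set, hyperparams)        # "train" phase
        if passes_sanity_checks(model, test_set):
            return model
        # Model didn't generalize: retrain with adjusted hyperparameters.
        hyperparams = halve_learning_rate(hyperparams)
    raise RuntimeError("model failed sanity checks after all retries")
```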
<h3 id="growingpains">Growing Pains</h3>
<p>In the abstract, the model training process is pretty straightforward. Unfortunately, if we try to do it at scale, it’s clear that the real world has other plans for us. Several challenges become painfully obvious:</p>
<h4 id="therearealargenumberofmodels">There are a large number of models</h4>
<p>As previously mentioned, we need to train approximately 3500 different models.</p>
<h4 id="trainingamodelisveryresourceintensive">Training a model is very resource-intensive</h4>
<p>In the worst case, running the “train” phase for a single Foresight model can require a few hundred gigabytes of RAM and can saturate a 96 vCPU server for several hours. Thankfully, disk requirements are a bit more forgiving – we need only a few hundred GB (and spinning disks will suffice).</p>
<h4 id="themodeltrainingworkloadisvariableunpredictable">The model training workload is variable/unpredictable</h4>
<p>We don’t know in advance how long it will take to train a given model. Although the “dataset splitting” phase is easy (it typically completes within 2 hours for the largest datasets and runs in time roughly proportional to the size of the output dataset), the “train” phase is more complicated. Training may take anywhere from a few minutes to nearly 12 hours depending on the airport and model type. Surprisingly, the duration of the “train” phase for an airport does not have a clear relationship with the dataset size: models for low-traffic airports sometimes take much longer to train than those for high-traffic airports.</p>
<p>This unpredictability has serious consequences for estimating resource usage and trying to plan a cluster size in advance. It’s difficult to accurately estimate how long it will take to generate all models, and it’s equally challenging to estimate peak resource consumption.</p>
<h4 id="distributionconcerns">Distribution Concerns</h4>
<p>Given the scale of the problem, it should come as no surprise that we’ll need to distribute the model training effort over a cluster if we want to finish in any reasonable length of time. This realization comes with a new set of challenges that we’ll need to address.</p>
<h3 id="coordination">Coordination</h3>
<p>We need a way to efficiently coordinate model training across all participating servers. “Coordination” is a bit vague, so let’s be more specific about our requirements:</p>
<ul>
<li>Tasks must execute in the correct order (i.e., we won’t attempt the “train” phase for a model until the “split” phase has completed for that model).</li>
<li>We should try to run as many tasks as reasonably possible, so long as doing so will not exhaust available resources on any given server. This is complicated by the fact that our “split” and “train” tasks have very different resource requirements: the same server that can run dozens of “split” tasks in parallel may only be able to execute the “train” phase for a single model at once.</li>
<li>We need to be able to monitor the training process on a per-model basis to confirm that training is making progress.</li>
</ul>
<h3 id="faulttolerance">Fault-tolerance</h3>
<p>Introducing additional servers creates opportunities for failure. This threatens to aggravate “acceptably rare” failure modes to the point of being unacceptably common. To illustrate, suppose we have a server with a 1 in 100 chance of failing during model training. If we use 100 such servers, there’s now a nearly 2 in 3 chance of having at least one server fail. Left unmitigated, those odds are the difference between a system that “generally works” and one that’s infuriatingly unreliable.</p>
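<p>The arithmetic behind that claim is easy to check directly (a quick sketch, nothing specific to our pipeline):</p>

```python
# Probability that at least one of n independent servers fails,
# given each fails with probability p during a training run.
def p_any_failure(p, n):
    return 1 - (1 - p) ** n

# 100 servers, each with a 1-in-100 chance of failing:
print(round(p_any_failure(0.01, 100), 3))  # 0.634 -- nearly 2 in 3
```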
<p>When possible, our solution needs to recover from these failures. It should also attempt to isolate them: failing to train a model for one airport shouldn’t jeopardize model training for other airports.</p>
<h3 id="easeofautomation">Ease of automation</h3>
<p>Initiating and manually monitoring model training on a regular basis could be a substantial drain on developers’ time. Any solution must be easy to automatically initiate, monitor, and check for success or failure.</p>
<h2 id="angleofattack">Angle of Attack</h2>
<p>At this point, we’ve established a rough picture of the challenges and requirements of our model training pipeline. It’s time to start getting more specific about how we’ve designed around them.</p>
<h4 id="addressingscalerequirements">Addressing Scale Requirements</h4>
<p>First and foremost, we need an enormous amount of compute resources for a short period of time in order to train models at scale. This makes our workload an obvious fit for the cloud – we pay for several instances once a month to train a batch of models, but we can still avoid paying for them when we aren’t using them. Thankfully, AWS EC2 has several instance types that have enough memory to train even the largest of our models (specifically, we use <code>m5.24xlarge</code> and comparable instances). Running on AWS also means that we can take advantage of S3 to store input datasets, datasets generated by the “split” phase, and trained models in a common location.</p>
<p>Having the “raw hardware” to run our training process for thousands of models is only a start. Deciding how many instances to provision is difficult: provisioning too few means waiting a long time for all models to train, yet provisioning too many runs the risk of leaving some instances idle (and paying for them in the meantime).</p>
<p>The solution we adopted is to avoid using a fixed-size cluster at all. By running our workload in an autoscaling<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup> Kubernetes cluster, we can automatically provision or remove instances as necessary in reaction to demand<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup>. All that’s required of us is that we attach appropriate <a href="https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/?ref=flightaware.engineering#requests-and-limits">resource requests</a> to Pods as necessary (for the time being, assume we create a Pod for each training phase for each model). Using appropriate resource requests for pods created for each phase allows our cluster to start small when performing the lightweight “split” phase for each airport, gradually increase its size as we enter the computationally intensive “train” phase for each model, and scale back down as model training begins to finish for some airports.</p>
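<p>To make that concrete, here is a minimal sketch of the kind of per-phase resource requests involved (the values are illustrative, drawn from the rough figures above rather than our exact production configuration):</p>

```yaml
# Illustrative per-phase resource requests attached to each Pod.
# The cluster autoscaler adds nodes when pending Pods cannot be
# scheduled and removes nodes once they sit idle.
resources:
  requests:
    cpu: 2500m      # "split" phase: lightweight, many can share one node
---
resources:
  requests:
    cpu: 90000m     # "train" phase: effectively claims a whole 96-vCPU node
    memory: 300Gi   # worst-case models need a few hundred GB of RAM
```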
<h4 id="addressingcoordinationrequirements">Addressing Coordination Requirements</h4>
<p>Thankfully, Kubernetes is also quite helpful for addressing our distribution and coordination requirements. The <a href="https://kubernetes.io/docs/concepts/scheduling-eviction/kube-scheduler/?ref=flightaware.engineering">Kubernetes scheduler</a> is particularly useful: it’ll try to schedule as many Pods/phases as the cluster has the resources to support and will never schedule a Pod onto a node that doesn’t “have room” for it. Conveniently, Kubernetes also gives us visibility into the training process – it’s possible to determine how many phases of model training have completed successfully for a given model by examining Pods in the cluster.</p>
<p>So far, there’s one part of the puzzle that we need to address to use Kubernetes: we still need a means of realizing the training phases for each model into Pods. Those familiar with Kubernetes might suggest bulk-creating Kubernetes <a href="https://kubernetes.io/docs/concepts/workloads/controllers/job/?ref=flightaware.engineering">Jobs</a> with an appropriate <a href="https://kubectl.docs.kubernetes.io/references/kustomize/glossary/?ref=flightaware.engineering#kustomization">kustomization</a>. At first glance, this is very promising – it’s an easy way to ensure that each phase gets a Pod, and it also helps ensure fault tolerance by automatically restarting pods on failure. Unfortunately, Jobs aren’t quite what we want. There’s no way to describe dependencies between Jobs, and there’s certainly not any way to condition the creation of one Job on the output of another (which we may need to do if the “train” phase needs to be restarted with new hyperparameters).</p>
<p>Technically, we could still create a working model training pipeline using Jobs, but we’d need to create a service to manually handle coordination between tasks. In particular, we’d need to implement a means of preventing the “train” phase Pods for a given model from being schedulable until the two test/train “split” phase Pods had completed. We may also have to add a similar mechanism for deciding whether to re-run the “train” phase Pod with new hyperparameters.</p>
<h2 id="abetteroptionargoworkflows">A Better Option: Argo Workflows</h2>
<p>At this point, we have a clear niche that we want to fill. We’d like to have some service that executes a directed acyclic graph (DAG) of tasks on a Kubernetes cluster while providing automatic restart behavior and allowing us to inspect model training progress. This is where Argo comes in: it’s an open-source service that does exactly that.</p>
<h4 id="argoanoverview">Argo: An Overview</h4>
<p>Central to Argo is an abstraction called the “workflow.” An Argo workflow consists of either a sequence of steps or a DAG of inter-dependent tasks. After setting up the Argo service on your Kubernetes cluster, you can parameterize and submit workflows for execution. When Argo executes a workflow, it will create one Kubernetes Pod for each step as soon as its dependencies on other tasks are satisfied. Just as with Jobs, you can tell Argo how to re-try failed tasks in order to recover from transient failures.</p>
<p>Critically, using Argo doesn’t require us to forego many of the useful features and patterns of using manually defined Kubernetes Pods. Resource requests are supported and can be provided for each step in the DAG, allowing us to reap the benefits of the Kubernetes scheduler and autoscaler. Similarly, you can provide templates for <a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/?ref=flightaware.engineering#persistentvolumeclaims">Kubernetes PersistentVolumeClaims</a> for specific steps, which can be automatically provisioned by your cluster. In our case, this makes it easy to guarantee that the “train” phase has sufficient scratch space to execute successfully. Although they aren’t used in our model training workflow, Argo is also able to create “sidecar” containers for the duration of a task or use <a href="https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/?ref=flightaware.engineering#label-selectors">label selectors</a>, among other things.</p>
<h4 id="writingourworkflow">Writing our workflow</h4>
<p>Like many objects and resources in Kubernetes, Argo workflows are written declaratively in YAML. Setting up steps in workflows is rather simple. At a minimum, you specify a container and provide either a command or an interpreter/inline script to execute:</p>
<pre><code class="language-yaml">#
# A single-step "hello world" workflow
#
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  # where to start executing the workflow
  entrypoint: say-hello
  templates:
  # Here's the step:
  - name: say-hello
    script:
      image: python:alpine3.6
      command: [python]
      source: |
        print("hello there!")
</code></pre>
<p>Specifying a trivial DAG describing a two-step workflow is similarly convenient – we just need to state the dependencies between the steps:</p>
<pre><code class="language-yaml">#
# A trivial DAG workflow
#
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: trivial-dag-
spec:
  entrypoint: run-this-dag
  templates:
  # The DAG itself
  - name: run-this-dag
    dag:
      tasks:
        - name: hello-shell
          template: say-hi-in-shell
        - name: hello-python
          template: say-hi-in-python
          dependencies: [hello-shell]

  # Actual scripts in the DAG
  - name: say-hi-in-shell
    script:
      image: python:alpine3.6
      command: [sh]   # alpine images ship sh, not bash
      source: &quot;echo 'hello from the shell'&quot;
  - name: say-hi-in-python
    script:
      image: python:alpine3.6
      command: [python]
      source: &quot;print('hello from python')&quot;
</code></pre>
<p>If we take this basic idea and combine it with a few of Argo’s other features, like making steps parameterizable and conditioning execution of steps on files generated by previous ones, we can sketch out a simplified skeleton for our model training workflow:</p>
<pre><code class="language-yaml">apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: train-model-
spec:
  entrypoint: split-and-train

  templates:
  # The top-level DAG: split test/train data, then train (and possibly retrain)
  - name: split-and-train
    dag:
      tasks:
        - name: training-data
          template: split-phase
          arguments:
            parameters: [{name: test-or-train, value: 'train'}]
        - name: test-data
          template: split-phase
          arguments:
            parameters: [{name: test-or-train, value: 'test'}]
        - name: train-model
          dependencies: [training-data, test-data]
          template: train-phase
          arguments:
            parameters: [{name: hyperparams, value: '1st try hyperparameters'}]
        # Only retrain with new hyperparameters when the first training round produced
        # a bad model -- this part of the DAG is skipped when the 'when' condition does
        # not hold.
        - name: train-model-with-new-hyperparameters
          dependencies: [train-model]
          template: train-phase
          when: &quot;{{tasks.train-model.outputs.parameters.model_looks_bad}} != 0&quot;
          arguments:
            parameters: [{name: hyperparams, value: '2nd try hyperparameters'}]

  # The &quot;split phase&quot; of training
  - name: split-phase
    retryStrategy: # retry to handle transient failures
      limit: 2
    inputs:
      parameters:
        - name: test-or-train
    script:
      image: foresight-training-image
      command: [bash]
      source: |
        # ... generate the test/train datasets, push to s3...
        # If {{inputs.parameters.test-or-train}} appears here, it's templated to
        # 'test' or 'train'
      resources:
        requests:
          cpu: 2500m # lightweight step, ask for 2.5 cores

  # The &quot;train phase&quot; of model training
  - name: train-phase
    retryStrategy: # retry to handle transient failures
      limit: 2
    inputs:
      parameters:
        - name: hyperparams
    outputs:
      parameters:
        - name: model_looks_bad
          valueFrom:
            path: /model_looks_bad
    script:
      image: foresight-training-image
      command: [bash]
      source: |
        # ... download the test/train datasets from s3 ...
        # ... try training the model, upload to s3 if looks good ...
        # ... write 0 to /model_looks_bad if model seems good, 1 otherwise ... 
      resources:
        requests:
          cpu: 90000m # heavyweight step, ask for 90 cores
</code></pre>
<p>It’s worth highlighting that the determination about whether or not the model needs to repeat the training phase can be represented in the workflow itself. Obviously, this mechanism could become unwieldy for anything nontrivial, but it’s pretty impressive that Argo allows us to condition the execution of one step on some output produced by another.</p>
<p>Workflows like this are usually submitted with the <code>argo</code> command line program. When submitting a workflow, Argo allows you to override or set any parameters that your workflow might require (in our case, the type of model we want to create and the airport we’re creating it for). This allows us to pick up an airport code and model type with a simple addition to the workflow’s <code>spec</code> section:</p>
<pre><code class="language-yaml">arguments:
  parameters:
  - name: airport
  - name: type
</code></pre>
<p>After adding this and amending our steps to refer to <code>{{workflow.parameters.airport}}</code> and <code>{{workflow.parameters.type}}</code>, which will be templated by Argo, we can then submit a workflow to generate an eon model for <a href="https://flightaware.com/live/airport/KHOU?ref=flightaware.engineering">Hobby Airport in Houston</a> by saying something like</p>
<pre><code class="language-bash">argo submit model-workflow.yaml \
 -p airport=KHOU \
 -p type=eon
</code></pre>
<p>If you’d like to see other examples of Argo in action, the workflow specs are surprisingly flexible, and there are a wide variety of examples available in the Argo <a href="https://github.com/argoproj/argo-workflows/tree/d66954f5b9b09e030408483502b03aa29727039a/examples?ref=flightaware.engineering">GitHub repository</a>.</p>
<h4 id="monitoringmodeltraining">Monitoring Model Training</h4>
<p>The same <code>argo</code> command-line program that’s used to submit workflows to the cluster also allows you to list submitted workflows, ask for information about them, or interactively monitor them. If you prefer the web browser to the terminal, Argo also comes with a web UI that will allow you to visualize the DAG associated with a workflow.</p>
<p><img src="https://flightaware.engineering/content/images/2021/05/blog-post-argoproject.png" alt="blog-post-argoproject" loading="lazy"></p>
<p>These are both nice options for manually inspecting workflows, but there are better means of examining workflows programmatically. Actually, Argo workflows are implemented as <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/?ref=flightaware.engineering">Kubernetes Custom Resources</a><sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup> , which allows us to interact with them as first-class resources in <code>kubectl</code>, just like we might for Nodes, Pods, or Deployments. This is immensely useful for shell scripting or programmatic interaction with the cluster. For example, if you were writing a shell script and wanted to get the names of all completed workflows, you could do so by running</p>
<pre><code class="language-bash">kubectl get workflow \
  --no-headers \
  -l 'workflows.argoproj.io/completed==true' \
  -o 'custom-columns=NAME:.metadata.name'
</code></pre>
<p>On a similar note, <a href="https://github.com/stedolan/jq?ref=flightaware.engineering">jq</a> fanatics who spend a lot of time in <code>bash</code> will appreciate the flexibility afforded by <code>kubectl get workflow -o json | jq …</code> for asking one-off questions about submitted workflows.</p>
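<p>For one-off questions outside the shell, the same JSON is just as easy to slice in Python. Here’s a sketch (the sample data below is made up and abridged, though <code>status.phase</code> is a real field on Argo Workflow objects):</p>

```python
import json

def failed_workflows(workflow_list):
    """Return names of workflows whose status phase is Failed.

    Expects the parsed output of `kubectl get workflow -o json`.
    """
    return [
        w["metadata"]["name"]
        for w in workflow_list.get("items", [])
        if w.get("status", {}).get("phase") == "Failed"
    ]

# Abridged, made-up sample of `kubectl get workflow -o json` output:
sample = json.loads("""
{"items": [
  {"metadata": {"name": "train-model-abc"}, "status": {"phase": "Succeeded"}},
  {"metadata": {"name": "train-model-def"}, "status": {"phase": "Failed"}}
]}
""")
print(failed_workflows(sample))  # ['train-model-def']
```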
<h2 id="finalthoughts">Final thoughts</h2>
<p>Generally speaking, we’re quite pleased with how easy it was to prototype and productionize a large-scale model training pipeline with Argo. However, there are a few points of frustration worth pointing out:</p>
<ul>
<li>There’s <a href="https://github.com/argoproj/argo-workflows/issues/3755?ref=flightaware.engineering">currently no way to parameterize a resource request</a> in an Argo workflow. If you specify a resource request, you must hardcode it.</li>
<li>If you accidentally mismatch the version of the <code>argo</code> command-line program and the Argo service running in the cluster, you may encounter <a href="https://github.com/argoproj/argo-workflows/issues/4141?ref=flightaware.engineering">very strange errors</a>. Had the command-line program noticed the version mismatch and displayed a prominent warning, it would have saved us some time.</li>
</ul>
<p>Nonetheless, these “papercut issues” would not stop us from recommending Argo for similar workloads, and we’ve used Argo quite successfully both on the Predictive Technology crew and elsewhere at FlightAware.</p>
<h3 id="footnotes">Footnotes</h3>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>“Autoscaling” is a loaded term when you’re talking about Kubernetes. Specifically, we mean “autoscaling” as in “cluster autoscaling”, not “horizontal pod autoscaling”, “vertical pod autoscaling”, or any cloud vendor-specific feature that happens to have “autoscaling” in its name. <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p>Cluster autoscaling works beautifully for our use case, but it's not a wise or effective choice for all workloads. There are a number of <a href="https://rachelbythebay.com/w/2020/05/06/scale/?ref=flightaware.engineering">excellent accounts</a> of cluster autoscaling causing serious headaches in certain applications. <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn3" class="footnote-item"><p>Unfortunately, the official documentation is somewhat abstract. If you want a simple example that demonstrates the power of custom resources, please feast your eyes on the <a href="https://ops.tips/notes/kubernetes-pizza/?ref=flightaware.engineering">pizza-controller</a>. <a href="#fnref3" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
<!--kg-card-end: markdown--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/using-argo-to-train-predictive-models/">Using Argo to Train Predictive Models</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Leveraging Data Analytics for Decision Making: Product Management at FlightAware ]]></title>
        <description><![CDATA[ Insights into how FlightAware’s product team uses data to drive better decision making when there are a million things to do. ]]></description>
        <link>https://flightaware.engineering/product-management-at-flightaware/</link>
        <guid>https://flightaware.engineering/product-management-at-flightaware/</guid>
        <pubDate>Mon, 10 May 2021 11:48:19 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2021/05/lukas-blazek-mcSDtbWXUZU-unsplash-2.jpeg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Marissa Konicke is a Product Manager for FlightAware’s B2B and B2C SaaS applications. She has a strong focus on using data to drive every decision and eliminating end users’ pain points.</em></p><p>A Product Manager’s job is to answer the ‘why’ for every ticket. Engineers may know what needs to get done, or how to do it, but do they know <em>why</em> they’re doing it? Why they’re building a new signup journey? Why they’re fixing a bug that’s been around for years? Why they’re spending so much time researching how to optimize maps? Without a ‘why,’ we have no purpose. No objective to reach. So, how do you find the ‘why?’</p><p>At FlightAware, we have over one thousand tickets in the Web team’s backlog and counting. This is extremely overwhelming to any new Product Manager trying to figure out what to prioritize next. Do we prioritize the bugs reported by the Executive team (who happen to be vigilant users of our platform)? The feature requests from the Customer Support team? The product improvements from the Sales team? Or the constant asks from our users? Figuring out what to prioritize is very similar to figuring out our ‘why.’ That is where data analytics helps to steer us in the right direction.</p><h2 id="data-analytics-and-prioritization">Data Analytics and Prioritization</h2><p>Using data to drive every decision allows us to <strong>eliminate any biases, focus on our objectives, maintain competitiveness, keep the user top of mind, and ultimately make better decisions for the business’ success.</strong> Our team is constantly monitoring analytics across our website and mobile apps. We look at monthly active users, conversion rate, feature engagement, and top pages/screens visited. With over eight million users using our platforms every month, in addition to having multiple software releases a day, it's crucial to our success to constantly pay attention to these metrics. 
Data analytics allow us to understand our users’ activity, how they’re engaging with our site, what they’re interested in, how feature releases impact user behavior, and where they’re spending the most time. Without this data, we would only see a tiny fraction of the picture of who our users are and what they’re doing, which could cause us to make the wrong decisions and negatively impact our business’ performance.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://lh5.googleusercontent.com/LwO1qEk4t1IWyZHwpONIgoftqAlnfFkGICu21At0t7v8zawRYqPGWAR_0RIKPz47PKVnzxouS9jpuX8jZJuSA3uk-5C9ke30Tl88VM5f-sHL2dLADpiaFmYE7XR4dFyvwnqdl0c" class="kg-image" alt="Table

Description automatically generated" loading="lazy"><figcaption>Top visited pages on FlightAware.com</figcaption></figure><p>Accompanying data analytics, defined business or product goals help significantly with making better decisions. Whether you are trying to grow revenue, decrease cost, raise your product’s NPS or customer satisfaction score, reduce churn, or double your user base, having a ‘North Star’ objective defined early on will help you align your priorities. When you know what you’re trying to achieve, the data will help you navigate to your ‘North Star.’ Right now, FlightAware is primarily focused on growing revenue, and our ‘North Star’ objective is to double our revenue by 2022. We’re using user and market data to find segments of the market that are actively using our platforms, but that we aren’t monetizing. In addition, we are identifying highly trafficked pages to target our product upsell messaging to further entice engaged users at key points in their journey to purchase one of our subscriptions. </p><h2 id="transforming-data-into-meaningful-change">Transforming Data into Meaningful Change </h2><!--kg-card-begin: html--><p>At FlightAware, we also keep a close eye on metrics after big releases, such as launching a new feature or product. We measure the success of the feature or product release based on a set of Key Performance Indicators, or KPIs, established before the release. Following these metrics, we can see if the release is achieving what we wanted it to. Were our assumptions and hypotheses correct? Did this change actually lead to an increase in conversion? 
<aside><blockquote>Often, our metrics lead to us changing our strategy or further iterating on the feature or product – and that’s okay!</blockquote></aside>
 The whole purpose of being agile is to allow room for adjustment – allowing us to change the direction we’re heading by either adjusting to market changes or our users’ behavior. Constantly monitoring our data analytics allows us to make decisions to continuously iterate on our platforms in a way that creates meaningful change.</p><!--kg-card-end: html--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://lh5.googleusercontent.com/jTIOjSBvJWv1ZO7YjFmr6Bz2jubYVBqlnWkmjbF4YAtPZviIK2UbtJtQHWbgBO6lxcI9IRp11v0zT03obtUGM37y_xIh81RvHto04NiLAT0Hmn-fnZi7jC_fahOXVAIdwvjlIio" class="kg-image" alt="Graphical user interface

Description automatically generated" loading="lazy"><figcaption>Heatmap of our homepage after a redesign</figcaption></figure><p>In addition to monitoring our website and mobile apps analytics, we utilize user surveys, user interviews, and user testing to capture both quantitative and qualitative data on our customers. Capturing this feedback is crucial to understanding our users and the market. Gathering all of these pieces of data helps us paint a larger picture of our users and their needs and pain points. Being focused on data allows us to understand our customers better and improve the user experience in ways that matter the most. This eliminates the ‘shot in the dark’ guessing mentality, biased opinions, and feelings. <strong>Having so much that needs to get done, we want to maximize the positive impact we have on our users and our business with each item we choose to do. </strong><br></p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://lh3.googleusercontent.com/PeZQgnJbSsCFkk0qieH5HVLw5huX0UPz-zTgY5-FM4Haqw30mjyDW20AdB-9XuiHGaYVBzJlBGLc2bMWI30jtYMWuBaPqMGAWysQf7BOHnh32E9U7Bb3ak8mR6WwmSDuHhXKxOs" class="kg-image" alt="Chart, pie chart

Description automatically generated" loading="lazy"><figcaption>Example of a survey question’s results (via SurveyMonkey)</figcaption></figure><p>Understanding our users better using data helps us define our ‘why’ by allowing us to make data-driven decisions that better set us up for success at achieving our ‘North Star’ Objective and Key Results (OKRs). Recently, FlightAware rebuilt one of our product’s online signup journeys. We needed to double our self-signup conversion (one of our Key Results), and the data was telling us that users were tending to contact our Sales team rather than signing up themselves. Using our gathered information, we identified ways to optimize the online signup journey. Using data from user surveys, we included better messaging of customers’ top-rated features and benefits of the product. Using user engagement data, we redesigned the UI to allow users to easily self-serve and see all of the information they needed to make a purchase decision independently. So <em>why</em> did we rebuild the online signup journey? To increase the number of users who could make a purchase decision without having to contact our Sales team. And <em>why</em> did we do that? To increase conversion, which ultimately led to increased revenue, getting us closer to our ‘North Star’ objective. </p><!--kg-card-begin: html--><p>Understanding your users and how they interact with your product using data analytics will help ensure you maximize the impact you’re making, not only on your user experience but on your OKRs as well. Being a Product Manager is all about making decisions – what’s important to work on and what can wait, what is going to have the most impact to your OKRs, and therefore, help prioritize your backlog of an overwhelming number of tickets. Understanding and analyzing this data, along with having defined business and product OKRs is how FlightAware’s Product team optimizes our data-decision-making and answers the question of ‘why.’ 
<aside> <blockquote> After all, if you don’t have the data to back up your decision, then you’re just someone else with an opinion.</blockquote></aside><!--kg-card-end: html--><!--kg-card-begin: html--><div class="blue-box">
<div class="title">Analytics tools used by FlightAware’s product team:  </div>
<ul>
<li>AppFollow (mobile apps)</li>    
<li>App Store Connect (iOS mobile app)</li>    
<li>Crazy Egg (website heatmaps)</li>    
<li>Google Analytics (website and mobile apps)</li>    
<li>Google Play Console (Android mobile app)</li>    
<li>Google Search Console (website SEO)</li>    
<li>Jupyter Notebooks (data analysis)</li>    
<li>Microsoft App Center (mobile apps)</li>    
<li>Postico (database SQL querying) </li>    
<li>SEMrush (website SEO)</li>    
<li>SurveyMonkey (user feedback)</li>    
</ul></div><!--kg-card-end: html--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/product-management-at-flightaware/">Leveraging Data Analytics for Decision Making: Product Management at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Firehose++ - The Evolution of a High-Performance Streaming API ]]></title>
        <description><![CDATA[ Our journey to evolve Firehose into a high-performance socket streaming API. ]]></description>
        <link>https://flightaware.engineering/firehose-evolution-of-a-high-performance-streaming-api/</link>
        <guid>https://flightaware.engineering/firehose-evolution-of-a-high-performance-streaming-api/</guid>
        <pubDate>Tue, 30 Mar 2021 14:53:52 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2021/03/photo-1522596179416-b0522ef6c01c.jpeg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>As the Engineering Manager of the Backend Crew, Jonathan Cone leads the team responsible for FlightAware’s primary customer-facing APIs &amp; individual feed processors.</em></p><p>FlightAware offers two major APIs for its customers, AeroAPI and Firehose. AeroAPI is a query-based request/response API while Firehose is a socket streaming API. Firehose began under a different name for a specific customer in 2013 to relay position data in real-time to their aircraft management system. In July 2014, the product was rebranded as Firehose and made available to a broader audience of FlightAware customers with flight information (FLIFO) messages included at the end of that year. As the product grew, a need arose to increase the product's performance, and significant changes were made from the end of 2017 through late 2018. This post discusses the motivation for those changes, the technical challenges, and lessons learned.</p><h2 id="the-opportunity">The Opportunity</h2><p>Firehose emits JSON messages consisting of surveillance data (positions) and flight information (FLIFO) with multiple (on the order of 500 to 1000) messages per second during live operation. In 2017, a requirement arose to have Firehose replay historical data at a minimum of 60x real-time. This meant that if a customer connected and requested an hour of historical global flight data, that data should be delivered in one minute or less. The original implementation could only achieve ~8x real-time, so substantive changes would be necessary to meet this requirement. 
New customers also desired higher SLAs for the product, but Firehose lacked a substantive test suite or any Continuous Integration (CI) or Continuous Deployment (CD) pipelines.</p><h2 id="firehose-overview">Firehose Overview</h2><p>FlightAware consumes data from over 40 different sources, including its ADS-B network of over 30,000 receivers, and writes normalized feed data into tab-separated value (TSV) files. These feeds are then combined and processed by Hyperfeed, FlightAware’s flight decision engine, to arrive at a canonical view of global flight data. Hyperfeed emits data to TSV files, called controlstream, with the results of its evaluations, and this data is the input to the Firehose API. Each line in the TSV contains key/value pairs with relevant flight data for that message. FlightAware uses a variety of means for storing and moving around flight data, but ultimately the controlstream TSVs are the canonical source of truth and are preserved in perpetuity. When a user connects to Firehose, the following occurs to serve them live (or historical) flight data:</p><!--kg-card-begin: markdown--><ol>
<li>Authenticate the user</li>
<li>Load the user’s permissions</li>
<li>Process the user’s initiation command</li>
<li>Start a controlstream reader at the desired point in time</li>
<li>Process each controlstream message<br>
a. Check that the message is appropriate for the user (pedigree evaluation)<br>
b. Check that the user has permission and has requested to see this message type (rules evaluation)<br>
c. Apply any other rules (e.g., rate limiting, airline enforcement)<br>
d. Serialize the message to JSON</li>
</ol>
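As a rough sketch, the per-message checks in step 5 amount to a filter over each controlstream message. The types and helpers below are hypothetical stand-ins for illustration only, not FlightAware's actual API:

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified types; the real pedigree and rules
// evaluation described later in this post is far more involved.
struct Message { std::string pedigree, type, json; };
struct User {
    bool mayView(const std::string &pedigree) const { return pedigree == "public"; }
    bool requested(const std::string &type) const { return type == "position"; }
};

// Returns the serialized JSON if the message passes every check,
// or nothing if it should be suppressed for this user.
std::optional<std::string> processMessage(const User &user, const Message &msg) {
    if (!user.mayView(msg.pedigree)) return std::nullopt; // 5a: pedigree evaluation
    if (!user.requested(msg.type))   return std::nullopt; // 5b: rules evaluation
    // 5c: other rules (rate limiting, airline enforcement) omitted in this sketch
    return msg.json;                                      // 5d: JSON output (pre-built here)
}
```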
<!--kg-card-end: markdown--><h3 id="first-steps">First Steps</h3><p>The original implementation of Firehose was written in the Tcl scripting language. Much of FlightAware’s codebase at the time was in Tcl, including the libraries and packages used to access data at FlightAware. The first question that arose was whether our Tcl packages could be used to read raw data at the 60x replay requirement. We decided to break the problem into manageable chunks, the first being addressing the speed at which we could read the raw data and transform it into a data structure for subsequent processing. It turns out Tcl is capable of reading a file fairly quickly. As an example, consider an hour of raw flight data locally that we want to read synchronously:</p><pre><code class="language-tcl">set fp [open "flight_data.tsv" r] ;# open the file
while {[gets $fp line] &gt; -1} {} ;# no-op read
close $fp ;# close the file
</code></pre><p>The no-op read happens quickly, and on one of our dev servers I found the replay rate to be &gt;150x. This isn’t that surprising since the file operations here are little more than wrappers for the system “open” and “read” calls. Firehose needs to do more than just read and dump the data, so we expand our simple experiment to include transforming the TSV data into a Tcl list.</p><pre><code class="language-tcl">set fp [open "flight_data.tsv" r]
while {[gets $fp line] &gt; -1} {
    set raw_data [split $line \t] ; # Split each line into a Tcl list
}
close $fp
</code></pre><p>We now find the replay performance to be ~30x, falling well short of our desired target. That split results in multiple memory allocations for each element of the list, so it is not surprising that the performance is now suffering. Given that we don’t have an easy way to control the allocations in Tcl, it seemed unlikely we could improve the read performance sufficiently for our use case.</p><h3 id="stepping-into-c-">Stepping into C++</h3><p>We decided to explore language alternatives at this point and were particularly interested in C++. One of our principal developers had created a C++ library to read FlightAware’s TSV data files, and this served as a launching point for the exploratory work. The early work focused on reading data locally and chopping each line into its keys and values. To achieve the necessary performance, allocations needed to be limited and ownership of the underlying block of data handled responsibly. We were using C++14 at this point in the process and developed a simple class to store views of the underlying data that represented lines or various fields without anything heavier than a pointer and a pair of offsets:</p><pre><code class="language-cpp">class FAString {
public:
    FAString();

    FAString(const char * bytes, ssize_t start, ssize_t end);

private:
    const char * bytes = NULL;
    ssize_t start = 0;
    ssize_t end = 0;
};</code></pre><p>For those familiar with C++17, you may be reminded of <code>std::string_view</code> (or <code>boost::string_view</code>). This class achieved the same result, storing a pointer to the underlying data along with start and end offsets for the value of interest. The code that splits the TSV line is pretty standard fare, and with that in place, we were able to read data and split lines into key/value pairs in a performant way.</p><p>Firehose needs to pull information from a discrete set of keys in each message to evaluate rules and serialize the output. We evaluated how to make those key/value pairs available in a performant way. During the key/value parsing, each key/value becomes an entry in a <code>std::vector</code>, which isn’t a convenient container for accessing a specific key. This view class didn’t have all the specializations included to allow it to be used as a key in a <code>std::unordered_map</code>, and we were wary of using FAString as a map key since the map would not have ownership of the underlying data. Instead, we opted to create a class with data members for the controlstream keys and use switch statements to quickly parse each key/value into its appropriate class member. Since the switch statement compiles to a jump table (roughly equivalent to using goto), the insertion performance was as good as or better than <code>std::unordered_map</code>.</p><p>Here is a simplified example of that class:</p><pre><code class="language-cpp">class ControlStreamMsg {
    long _hc;
    FAString ident;

    ControlStreamMsg(const std::vector&lt;std::pair&lt;FAString, FAString&gt;&gt; &amp;keyValues) {
        for (auto &amp;[key, value] : keyValues) {
            switch (key[0]) {
                case '_':
                    if (key == "_hc")
                        _hc = atol(value.data());
                    break;
                case 'i':
                    if (key == "ident")
                        ident = value;
                    break;
            }
        }
    }
};</code></pre><p>In this case, we switch on the first character of the key (validation that the character exists is omitted here) and then use an if/else block within each case statement to handle the value assignment. We prefer the individual case statements to consist of fewer than 4 if/else statements for the final matching, so in instances where the first character does not resolve to 4 or fewer keys, a nested switch statement is applied on the second character.</p><pre><code class="language-cpp">class ControlStreamMsg { 
    ... 
    switch (key[0]) { 
        case 'a': 
            // Many keys start with 'a' 
            switch (key[1]) { 
                case 'c': 
                    if (key == "actualArrivalRunway") { 
                        actualArrivalRunway = value; 
                    } else if (key == "actualDepartureRunway") { 
                        actualDepartureRunway = value; 
                    } 
                    break; 
    ... 
}; </code></pre><p>These first experiments with reading flight data TSVs were reviewed, tested, and formed the foundation of a high-performance C++ TSV reader we named copystream. The actual reader logic and controlstream parser were refactored into a library, <em>falcon</em>, for reuse in other C++ applications. This first iteration of a C++ reader/parser achieved read rates of at least 250,000 lines per second, which is performant enough for the overall replay rate requirement (60x). Of course, this doesn’t entirely solve our problems because operations downstream of the reading/parsing still need to keep up with that read rate.</p><h2 id="moving-toward-a-full-rewrite">Moving Toward a Full Rewrite</h2><p>At this point, we were not committed to rewriting all of Firehose in C++, but rather began integrating the high-performance C++ reader into the existing application. Using the C++ reader/parser, Firehose replay performance increased by about 20%, giving us replay rates in the 10x–12x range. That was certainly an improvement but still far from the 60x replay rate requirement. Using <a href="https://github.com/flightaware/tclgdb?ref=flightaware.engineering">tclgdb</a>, we continued to profile Firehose to identify bottlenecks. Previously, we had seen significant time being spent on reading and allocation of lists in that process. Using the C++ reader, that was no longer the greatest bottleneck, but the allocations of lists when moving back into Tcl were still an issue. The other major consumers of CPU time were pedigree evaluations and JSON serialization.</p><p>FlightAware consumes data from a variety of sources. Some of those sources are public and available to any user, while others are privileged and require special permission. Each message in controlstream has fields specifying the type of sources for that message. 
When a flight contains both public and privileged data, controlstream will contain a view of that flight for each access type. We refer to these source descriptions as pedigrees, and each user has an access control list describing the pedigrees they are permitted to view. For example, a flight that is based entirely on FAA radar and public FLIFO data would be assigned a public pedigree. Suppose during the operation of the flight we obtain position data from Aireon, our space-based ADS-B provider, or another satellite source. In that case, the flight will have two views, one containing both the public and privileged data and another containing just the public data.</p><p>Firehose evaluates each flight message to determine if that user is permitted to see this message and if this message represents the best data available for that user, based on the pedigree (view) of the flight and the user’s permissions. These pedigree evaluations were done entirely in Tcl and were a performance bottleneck for reaching our 60x target. Therefore, another team at FlightAware embarked on the process of rewriting the pedigree evaluation library in C++. The new C++ pedigree evaluation library, <em>pepper</em>, solved this problem and yielded over 100,000 pedigree evaluations per second. At that time, the volume of controlstream was on the order of 1,000 messages per second, so pepper evaluated pedigrees at &gt; 100x real-time. The library can be used either directly by a C++ application or by a Tcl application using a <a href="https://github.com/flightaware/cpptcl?ref=flightaware.engineering">cpptcl</a> interface.</p><p>We returned to the Tcl implementation of Firehose and instrumented it with the pepper pedigree evaluation library for another round of profiling. 
Progress: pedigree evaluation was now so fast it was hard to find on the CPU time heat map, but the allocations incurred when moving data into Tcl, along with JSON serialization, kept the replay performance below 15x.</p><p>Now, the only substantive parts of Firehose still running in Tcl were the rules evaluation and JSON serialization. It seemed unlikely we would realize the gains we needed by continuing to work in Tcl, so we decided to rewrite the remaining core functionality in C++.</p><h2 id="committing-to-c-">Committing to C++</h2><p>With the decision to move the remainder of Firehose’s core functionality into C++, we still opted to leave the user authentication and other startup code in Tcl. This could be accomplished using cpptcl to create a Tcl interpreter and interact with it during the startup phase of a Firehose connection. That code was battle-tested already, and moving it into C++ would just introduce the risk of creating new bugs. Additionally, the initialization and authorization code is run once during startup and completes within a millisecond, so there isn’t significant performance improvement to be found there. The core functionality that we needed to rewrite in C++ consisted of rules evaluation and JSON serialization.</p><p>One problem was that Firehose lacked sufficient testing, in particular integration testing, for us to have confidence that a rewrite would not cause regressions or introduce new bugs. Therefore, in coordination with QA, one of our team members began writing an extensive integration suite against the existing Firehose implementation. The goal was to run those same tests against the new implementation to ensure that Firehose would still produce the same outputs for a given set of inputs. This endeavor was undertaken in parallel with the remaining rewrite work and gave us sufficient confidence to move forward.</p><p>The first attempts at writing Firehose in C++ involved rewriting the JSON serialization. 
We knew from previous profiling work that the serialization was a significant bottleneck, whereas rules evaluation was less of an issue. However, rules evaluation and message serialization are tightly coupled operations, so both needed to happen in the C++ codepath. The first experiments expanded the copystream program into a mini Firehose application. We tried out a number of different JSON libraries, from YAJL to RapidJSON to Nlohmann JSON. The interfaces for Nlohmann and RapidJSON reflected more modern C++ styles that we liked, but ultimately the performance of YAJL in our use case far outstripped the others, so we selected it for the JSON serialization. With a YAJL implementation in place to assemble the JSON messages produced by Firehose in the single-threaded copystream, we had now reached our goal. The replay performance was around 60x real time; however, we still lacked the rules evaluation logic, and this did not leave much headroom for future growth. Profiling the application revealed that most of the CPU time was spent on serialization, which is work that can be done in parallel across multiple cores.</p><p>Moving from the expanded copystream back to the Firehose rewrite, we decided on a multi-threaded approach for the pedigree evaluation, rules evaluation, and JSON serialization. JSON serialization is the most computationally expensive operation in Firehose, and while a single-threaded approach achieved 60x replays, it barely managed that. By splitting the serialization into separate threads, we gave ourselves headroom for future feed volume growth. Each message Firehose evaluates from controlstream contains a unique Flight ID that is immutable for the duration of a flight. That field can then be used to hash work into workers, and Firehose uses a CRC32 checksum of the Flight ID to “bucketize” work. When the main thread starts up, it performs the SSL handshake, authenticates the user, and parses their initiation command. 
It then initiates streaming by creating threads for the workers and writer and asynchronously reading data. As data is pulled off that asynchronous thread, it hashes the line and passes that data off to the specified worker. Each worker then performs the pedigree evaluation, evaluates the rules associated with that connection and user, and finally serializes the message if it is accepted. Workers then pass the original line and any serialized messages back to the main thread for processing. The main thread, after dispatching a chunk of lines to work, collects the results back in the same order in which they were dispatched, using an <code>std::unordered_map</code> and <code>std::list</code>, and hands off any messages with serialized content to a writer thread that handles putting that serialized data onto the TCP socket.</p><p>The brevity of the previous paragraph belies the amount of work that went into producing the Firehose multithreaded application. In fact, it took weeks to work out all the details of that implementation. One concern was how we could ensure that we didn’t allocate data as we passed it between threads. Our first attempt involved a class that would hold (own) both the original line and any serialized data produced by the worker. The first iteration of this class stored the data as a <code>std::shared_ptr&lt;&gt;</code> and copied the pointer when handing it off to the <code>std::deque</code> used for passing data between threads. This turned out to be less than ideal, as the reference counting for shared_ptr can be slow when used in a multi-threaded environment, and it made it unclear who the actual owner of the data was at a given point in the processing. 
In later improvements, we converted to use <code>std::unique_ptr&lt;&gt;</code> for each of the lines being read and <code>std::move</code> them along the path of execution, more accurately reflecting where ownership of the data lies at a given point in time and increasing performance by removing the reference counting. The first implementation also had workers passing their completed work items back to the main thread as soon as an item was completed. This resulted in significant contention on the locks for the deque passing data back to the main thread. Since the work is batched, it was faster to let the worker finish all items in its batch and then pass all of the completed items at once.</p><p>The ultimate result was that the C++ implementation of Firehose replayed data at 180x or greater real-time, far exceeding our goal. This has proved beneficial as the volume of data in FlightAware’s feed each day has continued to increase. At the time development of Firehose in C++ was underway, a typical day of controlstream was ~70GB, while today that size has grown to ~180GB. Even with the larger data volume, Firehose continues to see &gt;120x real-time replays for customers. The integration tests developed as part of this process also found regressions in the new implementation, allowed us to resolve them before launch, and made for a relatively painless transition to the new application.</p><h2 id="lessons-learned">Lessons Learned</h2><p>For applications where performance is critical, having the ability to profile the running code is paramount. One item we struggled with in our early evaluations of the Tcl implementation of Firehose was an inability to profile the application and gain insights into what was eating up CPU cores and needed to be improved. I mentioned previously that we used tclgdb for some of that work; in fact, tclgdb was developed to facilitate this type of profiling as part of this work. 
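</p><p>As an aside, the work-distribution scheme described in the previous section can be sketched roughly as follows (the names here are illustrative, not the production code). A standard bitwise CRC-32 hashes each Flight ID to a worker, so every message for a given flight lands on the same worker, and a finished worker hands back its whole batch under a single lock acquisition:</p>

```cpp
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Bitwise CRC-32 (polynomial 0xEDB88320, the common reflected variant).
uint32_t crc32(const std::string &s) {
    uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : s) {
        crc ^= c;
        for (int i = 0; i < 8; ++i)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// A flight's messages always hash to the same worker, preserving per-flight order.
size_t workerFor(const std::string &flightId, size_t numWorkers) {
    return crc32(flightId) % numWorkers;
}

struct WorkItem {
    std::string line;        // original controlstream line
    std::string serialized;  // JSON produced by the worker, empty if suppressed
};

std::mutex completedMutex;
std::deque<std::unique_ptr<WorkItem>> completed;

// Each line is owned by a unique_ptr and std::move'd between threads.
// The worker finishes its whole batch first, then takes the lock once
// to hand everything back, minimizing contention on the shared deque.
void runWorker(std::vector<std::unique_ptr<WorkItem>> batch) {
    for (auto &item : batch)
        item->serialized = "{\"line\":\"" + item->line + "\"}"; // stand-in for real serialization
    std::lock_guard<std::mutex> lock(completedMutex);
    for (auto &item : batch)
        completed.push_back(std::move(item));
}
```

<p>In the real application each worker runs on its own thread and also performs the pedigree and rules evaluation; finishing the batch before taking the lock is what removed the deque contention described above.</p><p>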
For C++ applications there are a variety of profiling tools, but it can be daunting at first to understand how to use them and interpret the results. Much of our initial work on Firehose was done on our Macs in Xcode. It turns out that Xcode is a decent C++ IDE, and the Mac includes a profiler that’s easy to run and understand. We used that profiler to guide much of our work in both the early days of development and today. For example, during the initial development, I found that, following some changes, our performance suddenly decreased to replay rates of &lt;20x. Using the profiler, I quickly identified that a typo in the CRC32 function used to hash work was resulting in each line being passed by value. That surge in allocations resulted in a significant drop in performance, but it was found quickly and resolved.</p><p>I mentioned earlier that we introduced a class with similar functionality to <code>std::string_view</code>. As C++17 became available in all of our environments, we switched to using that STL container. The FAString implementation worked for our initial use cases, but switching to the STL implementation opened up new possibilities, since that implementation was supported by other STL containers and had specializations we had not, nor were likely to, implement.</p><p>Sufficient test coverage for an application is critical for any rewrite and also for future enhancements. The test suite for Firehose, which is now run automatically on any pull requests, merges to master, or tagging of a version, has continued to grow. For any bugs that are found, we add tests to confirm the fix and prevent future regressions. 
Had we not spent the time to write tests against the old application, we would not have had much confidence in the new implementation and likely would have introduced both new and old bugs into the product.</p><h2 id="the-takeaway">The Takeaway</h2><p>Deciding to rewrite an application, particularly one that is part of a company’s core infrastructure, is not a decision to take lightly. The odds of rewriting an application of any non-trivial complexity without introducing bugs or changing behavior in subtle ways are next to none. We ultimately made that decision in this case, but we made it methodically and gradually. In many other instances, we have been able to simply refactor or replace components to achieve the desired goal. One key to ensuring such an endeavor is successful is providing adequate testing around your application’s interfaces. Whenever possible, use popular and stable open-source libraries or core language functionality instead of rolling your own solution. Those libraries are likely in use by many more users and have addressed issues you are unlikely to encounter right away in your own implementations.</p><p>All Firehose customers now use the C++ interface for the streaming API (although they are probably unaware of the change), and the product has grown substantially over the past three years. The ancillary work we did as part of the Firehose rewrite has helped us make robust and efficient changes in a multitude of other applications at FlightAware. The Firehose application now has Continuous Integration and Continuous Delivery pipelines for its build, test, and deployment. The application is containerized and runs in Docker. Our work to improve Firehose continues today as we increase the SLAs available to customers and offer new features. The groundwork we laid shows how careful planning and testing allow us to make dramatic changes to our software while providing reliable service to our biggest customers.</p>
        <br>
        <p>
            <a href="https://flightaware.engineering/firehose-evolution-of-a-high-performance-streaming-api/">Firehose++ - The Evolution of a High-Performance Streaming API</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Is Everything Up? Monitoring 29,302 Points In-the-Loop in the FlightAware Production Stack ]]></title>
        <description><![CDATA[ Take lessons from high stakes monitoring in the physical world (electrical grids, nuclear power plants, oil rigs, data centers) and apply them to a pure software stack. ]]></description>
        <link>https://flightaware.engineering/is-everything-up-monitoring-29-302-points-in-the-loop-in-the-flightaware-production-stack/</link>
        <guid>https://flightaware.engineering/is-everything-up-monitoring-29-302-points-in-the-loop-in-the-flightaware-production-stack/</guid>
        <pubDate>Tue, 08 Dec 2020 12:48:16 -0600</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/12/shutterstock_1023329236--2---1--2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Take lessons from high-stakes monitoring in the physical world (electrical grids, nuclear power plants, oil rigs, data centers) and apply them to a pure software stack.</em></p><p><em>Karl Lehenbauer is FlightAware’s Chief Technology Officer.</em></p><p>Little is worse for someone committed to providing reliable service than to have your customer call you to tell you your service isn’t working. It’s a double whammy. You’re broken and you don’t know you’re broken. You hate that. So do we. Worse, we have service level agreements with many of our customers: if a service isn’t working for long enough, we have to start dishing out refunds.</p><p>But far more important than that, if we’re down our customers don’t know where their airplanes are. This can be a regulatory violation. It can also mean that the line service technicians aren’t available to meet the plane (if you travel much at all you’ve heard the pilot come on the PA and announce, “Well folks, we’ve arrived but the ground crew isn’t here to help get us into the gate”). Then you sit on the plane in the alleyway until the wing walkers, jetway driver, baggage handlers, etc., show up to guide the plane in.</p><p>But more than that, we provide Global Aeronautical Distress and Safety System (GADSS) <a href="https://globalbeacon.aero/?ref=flightaware.engineering">surveillance</a> to lots of airlines as mandated by the International Civil Aviation Organization. They might literally not know that one of their airplanes is in distress or has some kind of incident or even has crashed if our stuff isn’t working properly. People’s lives are on the line.</p><p>Let’s go back in time a bit.</p><h2 id="monitoring-the-real-world">Monitoring the Real World</h2><p>I got my start in control systems, first at a power company, monitoring and controlling electrical power generation, transmission and distribution. 
Later, I went to work at a vendor of those systems, followed by a stint at GE Aircraft Instruments working on turbine engine monitoring.</p><p>By the early 1990s I was leading the engineering team for a very early Internet Service Provider.</p><p>One day, our air conditioning failed. The computer room overheated, we lost some disk drives, and I resolved that we should be alerted if it ever started happening again.</p><p>This was before you could read the air inlet temperature off of your high-end router or from your fancy air conditioner or from one of the inexpensive SNMP-enabled temperature monitoring solutions of today.</p><p>So we came up with a clever solution. We’d use an old school thermostat, wire the transmit pin of a computer’s serial port through the thermostat’s mercury switch to the port’s receive pin, and we’d send a character through the serial port every few seconds. If the computer room went above a threshold temperature, the blob of mercury would complete the circuit and we’d start reading the characters back from the serial port. Then our little program would recognize this and start sending messages to peoples’ pagers using a modem.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2020/12/Image-1-.png" class="kg-image" alt loading="lazy" width="1920" height="1000" srcset="https://flightaware.engineering/content/images/size/w600/2020/12/Image-1-.png 600w, https://flightaware.engineering/content/images/size/w1000/2020/12/Image-1-.png 1000w, https://flightaware.engineering/content/images/size/w1600/2020/12/Image-1-.png 1600w, https://flightaware.engineering/content/images/2020/12/Image-1-.png 1920w" sizes="(min-width: 720px) 720px"></figure><p>Fine.</p><p>Lo and behold, one day we partially lost cooling in the data center, only we didn’t get any callouts. 
An investigation revealed that one of the wires had gotten yanked, and though the blob of mercury made contact, no characters were received by the monitoring program.</p><p>Meditating on this led to a critical insight. <em>We had it backwards!</em> Rather than completing the circuit when there was a problem, we needed instead to make the circuit normally closed and open it when there was a problem. Instead of wiring into the air conditioning contacts of the thermostat, we would wire into the heating ones. We would receive the characters we were sending as long as everything was OK. If the temperature rose past a threshold, the blob of mercury would pull away from the contacts, breaking the circuit, so whenever our program <strong>stopped</strong> seeing the characters, the alarm would be raised.</p><p>By flipping it like this, if one of the wires got inadvertently pulled or whatever, we’d immediately get the alarm.</p><p>In other words, <strong>the lack of a signal telling you everything is OK means something is wrong.</strong></p><h2 id="on-the-trail-of-monitoring-your-applications">On the Trail of Monitoring Your Applications</h2><p>Now, consider we have a service and we want to make sure it’s working. Can I ping the machine or machines? Great. The machine is up. Or is it? Ping is pretty basic in the network stack; ping packets are typically responded to directly from the kernel. We’ve seen machines that are thoroughly crashed and won’t run programs but will still ping.</p><p>OK, so we’ll ping the machine, but we’ll have some kind of agent program that runs on the machines that we’ll talk to over the network. If we can talk to the agent, then we have a little more assurance that the machine is working. But does the machine have enough storage? Does it have enough memory? OK, we’ll make our agent check and alert us if the machine is low on storage or memory, if the swap utilization is abnormally high, etc.</p><p>But is the program running? Sure, no problem. 
We’ll add some code to our agent to read the process table and see if the program shows up there. If it doesn’t, we’ll alarm.</p><p>But is the program getting work done? The process might exist, but it might not be doing anything. Well we could read the program’s CPU time repeatedly and see if it’s accumulating time.</p><p>But does that mean it’s working? No, it doesn’t. Maybe it’s lost its database connection. Maybe it’s accumulating CPU time but because it wasn’t written defensively enough, it’s trying to do database updates and failing and logging errors but not reconnecting to the database.</p><p>Maybe it’s got a good connection to the database server but it’s not receiving any messages.</p><p>You can keep increasing the sophistication of your agent. You can keep adding new checks as you discover (usually the hard way) new failure modes you missed, but with this approach you’re vulnerable to any new or unanticipated breakdowns. You might never be completely sure work is getting done.</p><h3 id="in-the-loop-monitoring">In-the-Loop Monitoring</h3><p>I assert that you can’t know your program is doing the work it’s supposed to be doing unless it’s telling you that it is. Call this “in-the-loop” monitoring. If a program is expected to repeatedly receive input messages and update a table in a database, then every time it has done this it should send a message to monitoring software reporting that it has successfully done it.</p><p>It should only send the success message to the monitoring program upon completion of the work. So once an input message has been received, successfully processed, and the database updated, a separate message is sent to the monitoring software saying it has succeeded. 
If no input message is received or the database update fails, the message should not be sent to the monitoring software.</p><p>To reiterate, you make a call from inside the program to send a message to your monitoring software every time the program succeeds in doing a parcel of work.</p><p>Meanwhile, if a certain amount of time passes without the monitoring program receiving a “work completed” message, it raises an alarm that something is wrong.</p><p>The beautiful thing here is that the monitoring program doesn’t have to know why it didn’t receive a message. We don’t have to try to check for every possible reason. All we have to do is recognize that we stopped being told it was OK. The machine may have crashed, the router may have failed, the program may have lost its receive socket or its database connection. The program feeding data to our program may have stopped sending, the ethernet cable may have gotten pulled, a circuit breaker may have tripped, the program may have divided by zero or gotten a memory protection violation, there could have been an earthquake, a flood… locusts! We don’t have to check for all those things. All we have to do is recognize that the program stopped telling us it was OK.</p><h3 id="watchdog-timers">Watchdog Timers</h3><p>What I’m describing is a variation of a <a href="https://en.wikipedia.org/wiki/Watchdog_timer?ref=flightaware.engineering">watchdog timer</a>, technology common in real-time systems, space probes, satellites, etc., where software on a computer periodically resets a hardware timer. The hardware timer counts down, but every time the reset signal is received the counter is reset. If for any reason the software stops resetting the hardware timer, the counter eventually reaches zero and triggers a reboot or some other corrective action. 
Our variation is much more granular and is software-based, but the principle is the same.</p><h2 id="on-the-trail-of-watchdog-resets-in-the-modern-production-software-stack">On the Trail of Watchdog Resets in the Modern Production Software Stack</h2><p>Now, if your program is processing 40,000 messages a second, it needn’t send 40,000 messages a second to your monitoring system saying it’s completed work. So the watchdog reset subroutine can have a threshold and only send a completion message once a second or whatever, regardless of how many messages it processed. Fine.</p><p>Also, the watchdog reset subroutine must never break the program! So if it can’t reach the monitoring server, the program should keep going regardless. The program shouldn’t freeze if the monitoring software stops responding. For this reason our in-the-loop code has been kept very simple, and uses UDP datagrams to send watchdog resets to the monitoring software. It also makes sure that even if an error is returned, which can happen even with UDP when the sender and recipient are on the same LAN, it doesn’t stop or break the program. The watchdog reset message should identify the program, the machine (typically), and perhaps the activity within the program that has succeeded.
Examples include the receipt of a message from the FAA, from Aireon’s Space-Based ADS-B network, from our provider of airline schedules, from each of our multiplexing agents that aggregate data from our tens of thousands of ADS-B ground stations, or from <a href="https://flightaware.com/about/datasources/?ref=flightaware.engineering">HyperFeed</a>®, our suite of programs that process all the input data to produce our coherent feed of what is happening with all the aircraft that are in the air or moving on the ground in the world.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2020/12/Image-2-4.png" class="kg-image" alt loading="lazy" width="1330" height="238" srcset="https://flightaware.engineering/content/images/size/w600/2020/12/Image-2-4.png 600w, https://flightaware.engineering/content/images/size/w1000/2020/12/Image-2-4.png 1000w, https://flightaware.engineering/content/images/2020/12/Image-2-4.png 1330w" sizes="(min-width: 720px) 720px"><figcaption>Watchdog reset call in Python – If one of our 12M+ users hasn’t logged into the website in the last 15 minutes, something’s wrong.</figcaption></figure><p>Creation of watchdog entries in the monitoring software should be zero config. That is to say on the first receipt by the monitoring program of a new watchdog reset message, be it a new app, an app running on a different machine for the first time, etc., a new watchdog timer should be automatically created and activated by the monitoring software. While this can cause rogue watchdogs to be accidentally created when, for example, a developer runs a program as a test, there are ways to reduce or prevent this. 
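</p><p>On the monitoring side, the zero-config bookkeeping amounts to a small table of deadlines. This sketch is illustrative only; the names and structure are assumptions, not our monitoring software:</p><!--kg-card-begin: markdown-->

```python
class WatchdogTable:
    """Watchdog registry for the monitoring side (an illustrative sketch).

    Zero config: the first reset seen for a new watchdog ID creates its
    timer automatically. A timer that goes unreset past its interval
    shows up in expired(), and the monitor raises an alarm for it.
    """

    def __init__(self):
        self._deadlines = {}  # watchdog_id -> time the alarm would fire

    def reset(self, watchdog_id, interval_secs, now):
        # Creating the entry on first sight is what makes this zero config.
        self._deadlines[watchdog_id] = now + interval_secs

    def expired(self, now):
        # The monitor needn't know *why* resets stopped arriving (crash,
        # network, locusts), only that they stopped.
        return sorted(wid for wid, deadline in self._deadlines.items()
                      if now > deadline)
```

<!--kg-card-end: markdown--><p>A periodic sweep of <code>expired()</code> is all the monitor needs in order to raise alarms.</p><p>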
If we required someone to get on a webpage and submit something to create a watchdog timer, the extra cognitive load would cause developers to create far fewer of them, and we would miss a lot of them, meaning stuff would break and we wouldn’t hear about it when a service was moved to a new machine or data center, etc.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2020/12/Image-3.png" class="kg-image" alt loading="lazy" width="1374" height="154" srcset="https://flightaware.engineering/content/images/size/w600/2020/12/Image-3.png 600w, https://flightaware.engineering/content/images/size/w1000/2020/12/Image-3.png 1000w, https://flightaware.engineering/content/images/2020/12/Image-3.png 1374w" sizes="(min-width: 720px) 720px"><figcaption>Watchdog reset call in TCL – If the track archiver hasn’t run within two hours, something’s wrong.</figcaption></figure><p>This approach has proven to be extremely effective. Of course, we still monitor storage space and memory availability and that machines are pingable.</p><p>Docker, Kubernetes and Argo have added new wrinkles. Since Kubernetes can schedule work on any of a number of machines in its pool, we don’t want the host name (which in Docker is by default a 20-character hex ID anyway) to be part of the watchdog ID. 
Instead we use something to identify the Kubernetes instances.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2020/12/Image-4.png" class="kg-image" alt loading="lazy" width="1422" height="292" srcset="https://flightaware.engineering/content/images/size/w600/2020/12/Image-4.png 600w, https://flightaware.engineering/content/images/size/w1000/2020/12/Image-4.png 1000w, https://flightaware.engineering/content/images/2020/12/Image-4.png 1422w" sizes="(min-width: 720px) 720px"><figcaption>Watchdog reset call in C++</figcaption></figure><h2 id="different-alarms-have-different-priorities">Different Alarms Have Different Priorities</h2><p>There’s a big difference between a test server that is slowly accreting files crossing some warning threshold for being low on storage and flight tracking being down. One is something that may only need to be dealt with within days while the other could be an all-hands-on-deck emergency.</p><p>It’s important to be able to distinguish the severity of the problem. We provide a way to do this by setting the watchdog’s priority. This can be set both as an option by the caller or in the database of the monitoring backend. Production software may be emergency priority, while development and staging instances rarely are.</p><h3 id="watchdog-timer-intervals-are-activity-dependent">Watchdog Timer Intervals Are Activity-Dependent</h3><p>For some critical activity that should be happening thousands of times a second, a watchdog timer should likely expire within seconds when the activity stops. But the duration of the watchdog interval is dependent on the frequency of the activity being watchdogged. 
For example, something that only happens weekly, like some periodic archiving of data or a weekly report that’s automatically generated and sent to a customer, should have a watchdog timer with an alarm interval of a little longer than a week.</p><p>Ergo the watchdog interval is one of the arguments passed to our watchdog reset subroutine.</p><h3 id="out-of-the-loop-aka-endpoint-monitoring">Out-of-the-Loop aka Endpoint Monitoring</h3><p>We also test services by exercising them, such as fetching pages from the website (from offsite) and examining their content, connecting to our feed endpoints, logging in, and pulling down data, etc. This is a kind of ultimate proof that something is working; it just doesn’t cover all the bases. For instance, say we have twenty web servers and one of them has a little bit of stale data because one or a few of the table caches aren’t being updated properly; only a few rows of data are wrong and it’s easy to see how the endpoint monitor would fail to notice a small discrepancy.</p><p>Another example: Our tens of thousands of ADS-B ground stations connect randomly to one of eighty backends that receive and process their data. Should one of those backends stop producing and we don’t detect it, only about 1.25% of the ground stations’ data will be lost, and since in most places there is a fair bit of overlap between ground stations, we’re unlikely to detect that by using software to look at airport pages or various geographies. Sure, we could look at the aggregate feed and notice one of the subfeeds is missing, but that’s kind of a watchdog timer by different means. Using our approach, each backend resets a watchdog timer whose watchdog ID includes the backend number. If any of them stop reporting success, we’ll get an alarm on each of the ones that have failed.</p><p>Again, endpoint monitoring is a powerful technique and probably should be part of your retinue of monitoring practices. 
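</p><p>A minimal endpoint check might look like this sketch; the function name and marker-text approach are illustrative assumptions:</p><!--kg-card-begin: markdown-->

```python
import urllib.request


def check_endpoint(url, expected_text, timeout=10):
    """Out-of-the-loop check (a sketch): fetch a page, verify its content.

    A 200 response alone is not proof of correctness; the check also
    looks for a marker that only appears when real data is present.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError and friends are OSError subclasses
        return False
    return expected_text in body
```

<!--kg-card-end: markdown--><p>In practice such checks run from offsite against the public endpoints.</p><p>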
Even the programs that check endpoints may fail and thus they too should reset watchdogs after completing checks.</p><h3 id="plugging-watchdog-alarms-into-other-systems">Plugging Watchdog Alarms into Other Systems</h3><p>Existing monitoring platforms have considerable sophistication. For example, they have mechanisms for scheduling who is on-call and who is backup on-call. Such systems provide automatic notification of the backup person if the primary person hasn’t acknowledged an alarm within a certain period of time (another example of a watchdog, by the way), and many other useful capabilities. Rather than reinventing all this, we push watchdog alarms and normals into an existing platform.</p><p>Likewise, we have created a Slack bot to inject watchdog alarms and normals into Slack channels and provided ways for people to acknowledge, override, and redirect alarms by sending messages to the Slack bot.</p><p>As we have grown, and the size of our engineering staff has grown, we have evolved these mechanisms to greater levels of sophistication. Developers during development can redirect their watchdog alarms, using wildcards, away from the Slack channels the systems people use, sending them instead to Slack channels dedicated to their <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/">crew</a> or particular application, freeing our systems folks from having to wade through dev alarms while looking for real production problems.</p><h2 id="measure-what-you-want-to-improve">Measure What You Want to Improve</h2><p><a href="https://en.wikipedia.org/wiki/Peter_Drucker?ref=flightaware.engineering">Peter Drucker</a>, an influential writer and thinker on management, innovation and entrepreneurship, wrote: <em>“If you can’t measure it, you can’t improve it.” </em>This observation is so profound that entire books have been written elaborating on its implications. 
It is a widely accepted truth and is a key component of how modern organizations manage themselves. A business cannot increase its sales, improve its customer service, et cetera, if it can’t measure how it’s doing in those areas.</p><p>Counterarguments to this have been made, for example that some important things can’t be measured. I am somewhat sympathetic to that, and I think it’s clear that an unbridled devotion to the notion can result in an overweening bureaucracy. However, that your applications are working is measurable, and it’s hard to see how measuring that is anything other than a win.</p><p>Mount displays showing your status on your walls for everyone to see.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://flightaware.engineering/content/images/2020/12/Image-5.png" class="kg-image" alt loading="lazy" width="1433" height="1148" srcset="https://flightaware.engineering/content/images/size/w600/2020/12/Image-5.png 600w, https://flightaware.engineering/content/images/size/w1000/2020/12/Image-5.png 1000w, https://flightaware.engineering/content/images/2020/12/Image-5.png 1433w" sizes="(min-width: 720px) 720px"><figcaption>…one of our dashboards showing current watchdog alarms and number of webpage errors per hour.</figcaption></figure><h2 id="strive-to-be-informed-not-overwhelmed">Strive to Be Informed, Not Overwhelmed</h2><p>There is a call for nuance in deciding what needs a watchdog timer. If I am replicating database tables and making them available locally on machines using <a href="https://en.wikipedia.org/wiki/SQLite?ref=flightaware.engineering">SQLite</a>, and I define a watchdog timer for each replicated table, one machine going down may result in dozens of alarms, one for each local table that isn’t getting updated. 
Likewise, if a big database cluster goes down and stays down for long enough, the watchdogs will go off for all the machines replicating tables for each of their replicated tables.</p><p>The more granular my watchdogs, the more certain I can be that I’m not missing anything. If a single table isn’t updating, I’ll know, but I may be overwhelmed by a cascade of alarms when something major isn’t working properly.</p><p>We recognized this at the power company. If power was lost on a major transmission line, a lot of alarms were generated downstream due to abnormal conditions at the impacted substations. I proposed that in certain cases, like an alarm that a transmission line had lost power, the alarm should inhibit the cascade of subordinate alarms that would follow. I was overruled by the dispatchers, who were concerned that they might miss something. But there is a fundamental tension here because they could also miss something by being overwhelmed by alarms. In the Three Mile Island nuclear accident of 1979, the printer used to log alarms <a href="https://s3-us-west-2.amazonaws.com/visiblelanguage/pdf/43.2-3/lessons-from-three-mile-island-visual-design-in-a-high-stakes-environment.pdf?ref=flightaware.engineering">fell hours behind</a>, and operators dumped its queue several times so they could get more up-to-date information.</p><p>My advice here is that you try to strike a balance. We do replicate tables locally, but we only have watchdogs for each schedule of updates; the completion of a periodic update and reset of the corresponding watchdog implies that all the tables on that schedule updated properly, and we count on the code to be written properly such that it only resets the watchdog if all the tables on that schedule updated. 
And if you have a giant product with a lot of complex interrelated systems, look carefully at having certain alarms inhibit the cascade of subordinate alarms, at least in some views.</p><h2 id="nuisance-alarms">Nuisance Alarms</h2><p>It takes rigor to act on minor alarms as they arise, staying on top of them and fixing them; or if they aren’t actually a problem, adjusting them so they only appear when there is a legitimate problem. If you don’t, you’ll pay for it. Oh, this alarm is no big deal. That alarm is “normal.” This alarm won’t need to be dealt with for at least a couple weeks. Before long, you might have dozens of alarms that are “normal,” that you have become habituated to ignore. The problem, then, is that there is an alarm buried in those “normal” alarms that is vitally important, and you miss it because you don’t see it amidst all the noise.</p><p>The failure to deal rigorously with nuisance alarms on the Deepwater Horizon could be said to have resulted in the <a href="https://www.theadvocate.com/nation_world/article_480e3945-7bd0-5278-a01d-c6e8f43f8aaa.html?ref=flightaware.engineering">death of eleven people</a> and cost British Petroleum <a href="https://www.usnews.com/news/national-news/articles/2018-01-16/bp-takes-17-billion-charge-on-deepwater-horizon-costs-now-top-65b?ref=flightaware.engineering">more than 65 billion dollars</a>.</p><!--kg-card-begin: markdown--><blockquote>
<p>Vital warning systems on the Deepwater Horizon oil rig were switched off at the time of the explosion in order to spare workers being woken by false alarms, a federal investigation has heard. ... Williams said he discovered that the physical alarm system had been disabled a full year before the disaster.<br>
&nbsp;<br>
<a href="https://www.theguardian.com/environment/2010/jul/23/deepwater-horizon-oil-rig-alarms?ref=flightaware.engineering">Deepwater Horizon alarms were switched off ‘to help workers sleep’</a></p>
</blockquote><!--kg-card-end: markdown--><p>Referring back to alarm priorities, they also needed (if they didn’t have it) a way to differentiate between alarms that the kitchen door had been left open or whatever, and the ones that indicated a dangerous gas leak or other truly hazardous conditions. It was reported that the sensors detected the gas leak and other emergency conditions well in advance of the explosion and fire, but the key had been turned to “inhibit alarms,” so the klaxons were not sounded, and people were not alerted to the danger until the first explosion.</p><h2 id="it-s-a-journey-not-a-destination">It’s a Journey, Not a Destination</h2><p>The trail to 100% uptime is more than just monitoring. New paradigms for building services, enabled by technologies like Docker, Kubernetes, and Argo, have transformed how people think about and build services. At its best, machines, racks, and even data centers can fail and have no impact. At the other end of the spectrum, due to epiphenomena arising from the interactions of systems of immense complexity, a few machines hitting an OS limit on the number of concurrent threads can <a href="https://aws.amazon.com/message/11201/?ref=flightaware.engineering">take down a bunch of Amazon East</a>.</p><p>If your user base and/or your datasets or whatever are growing, then your systems are being asked to do more every day than they’ve ever done before, and something, somewhere, will get stressed and eventually break. Or something will just break because it breaks.</p><p>Your engineering organization has to be committed, through and through, to architecting and building things that work and to keeping them working. There is a cultural aspect as well.
Creating a blameless culture, and why that’s desirable, will perhaps be a topic for a future blog post.</p><p>Clearly, it’s really hard to keep things working if you don’t know whether or not they’re working.</p><p>In this post, I’ve provided some lessons from monitoring the real world and the virtual world; in particular, the value of watchdog timers as applied to a pure software stack. Our watchdog timers have caught and continue to catch so many problems that we would never consider doing without them. Your organization, large or small, will benefit from adopting watchdog timers as one arrow in your quiver on your trail of providing reliable services.<br></p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/is-everything-up-monitoring-29-302-points-in-the-loop-in-the-flightaware-production-stack/">Is Everything Up? Monitoring 29,302 Points In-the-Loop in the FlightAware Production Stack</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Making SQLite Upserts Snappy (with Python and SQLAlchemy) ]]></title>
        <description><![CDATA[ In this post, we explore some of the tradeoffs of performance optimization and where to draw the line in the never-ending quest for speedy code. ]]></description>
        <link>https://flightaware.engineering/making-sqlite-upserts-snappy-with-python-and-sqlalchemy/</link>
        <guid>https://flightaware.engineering/making-sqlite-upserts-snappy-with-python-and-sqlalchemy/</guid>
        <pubDate>Tue, 27 Oct 2020 16:06:00 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/10/CR-2.png" medium="image"/>
        <content:encoded><![CDATA[ <!--kg-card-begin: markdown--><p><em>Chris Roberts is a Software Engineer 2 on FlightAware’s Backend team. He spends a lot of time thinking about how to make FlightAware’s APIs more pleasant to use.</em></p>
<p>I recently had the opportunity to optimize some code at FlightAware and learned a great deal along the way.</p>
<p>In this post, we explore some of the tradeoffs of performance optimization and where to draw the line in the never-ending quest for speedy code.</p>
<h2 id="background">Background</h2>
<p>I was tasked with developing an open-source reference service we could provide to users of our Firehose flight data feed. It would receive, process, and store messages from the feed, making it easier for users to integrate FlightAware’s flight data with their existing infrastructure. Very simply put, Firehose is a socket connection which sends you newline-separated JSON messages. These messages represent real-time updates to flight statuses (departures, arrivals, cancellations, etc.). Also, the messages are sparse: one message for a flight may provide its route and an ETA while the next may contain just the flight’s callsign.</p>
<p>We decided to call the application <a href="https://github.com/flightaware/firestarter?ref=flightaware.engineering">Firestarter</a> (get it?).</p>
<p>Firestarter would use the Firehose messages to maintain a simple database table representing the current state of all flights coming through the feed. It follows a classic upsert model: new flight? Insert it. Existing flight? Update the relevant table row with any updated fields.</p>
<p>Our primary goal with Firestarter was to make it simple for users to interact with Firehose. To keep the application entirely self-contained, we chose SQLite as our database engine. Ideally, a user could just run the application and then do what they want with the resulting table. We understood that that was not always feasible, though, and wanted to ensure that users could still benefit from reading Firestarter’s code. In service to this goal, we chose to implement Firestarter in Python due to its widespread adoption and reputation for clarity.</p>
<p>Performance was a consideration as well. At a minimum, Firestarter needs to be capable of handling around 150 messages per second (msg/s). That target represents keeping up with the average real-time rate of worldwide flight events on a busy day. Ideally, there’d be a healthy buffer on top of that to allow for traffic spikes or catching up to real-time if Firestarter needs to reconnect to Firehose. 300 msg/s seemed like a reasonable target then.</p>
<h2 id="optimizingfirestartersperformance">Optimizing Firestarter’s Performance</h2>
<p>What follows is a reproduction of the rough progression of Firestarter’s performance from a measly 20 msg/s to well beyond our target. We’ll be looking at quite a bit of code, but none of it is terribly complicated.</p>
<p>And of course, it’s not a performance post without some benchmarking! I fed a few thousand Firehose messages into a text file to serve as test data.</p>
<p>All benchmarks were run on a Dell server with a Xeon E5-2630 @ 2.3 GHz and a spinning disk. Software versions:</p>
<ul>
<li>Ubuntu 20.04.1 (linux kernel 5.4.0-42)</li>
<li>Python 3.9.0</li>
<li>SQLAlchemy 1.3.20</li>
<li>SQLite 3.31.1</li>
</ul>
<p>Finally, we need a harness to aid us in benchmarking our code. For brevity, I’ve left out the full table definition and standard library imports (and will continue to do so in all subsequent code samples). Don’t worry, though, the full code used for these benchmarks is <a href="https://github.com/flightaware/firestarter_performance_blog?ref=flightaware.engineering">on our Github</a>. If that only piques your curiosity further, you can find the actual Firestarter code <a href="https://github.com/flightaware/firestarter/blob/3635b77a71d7d24d69bcb4b63a3132c9e35fef52/db-updater/main.py?ref=flightaware.engineering#L278">here</a>. Try not to spoil the post for yourself, though!</p>
<pre><code class="language-python">import sqlalchemy as sa

meta = sa.MetaData()
table = sa.Table(
    &quot;flights&quot;,
    meta,
    sa.Column(&quot;id&quot;, sa.String, primary_key=True),
    sa.Column(&quot;ident&quot;, sa.String),
    sa.Column(&quot;reg&quot;, sa.String),
    ...
    sa.Column(&quot;predicted_off&quot;, sa.String),
    sa.Column(&quot;predicted_on&quot;, sa.String),
    sa.Column(&quot;predicted_in&quot;, sa.String),
)

try:
    os.remove(&quot;flights.db&quot;)
except FileNotFoundError:
    pass
engine = sa.create_engine(&quot;sqlite:///flights.db&quot;)

meta.create_all(engine)

def run(fn):
    lines = int(sys.argv[1])
    start = time.time()
    fn(lines)
    total_time = time.time() - start
    print(f&quot;{lines} messages processed in {total_time:.2f} seconds: {lines / total_time:.1f}msg/s&quot;)
</code></pre>
<p>In the harness, we set up our database and provide a function to easily run some code and measure its performance. Separating out the harness like this helps us stay focused on the code we’re actually benchmarking, and it ensures we’re performing identical setup across the multiple test cases we’ll be developing.</p>
<h3 id="v1thebasestofbaselines">v1: The basest of baselines</h3>
<p>Now for the good stuff. We’ll start simple, really simple, so simple you probably already know this isn’t going to go well. Nevertheless, we must start somewhere!</p>
<pre><code class="language-python">from harness import table, engine, run

def write_to_db(message):
    with engine.connect() as conn:
        existing_flight = conn.execute(table.select().where(table.c.id == message[&quot;id&quot;])).first()
        if existing_flight:
            conn.execute(table.update().where(table.c.id == message[&quot;id&quot;]), message)
        else:
            conn.execute(table.insert(), message)

def main(lines):
    for line in itertools.islice(sys.stdin.readlines(), lines):
        write_to_db(json.loads(line))

if __name__ == &quot;__main__&quot;:
    run(main)
</code></pre>
<p>We read a line from stdin, parse it, and check the table for its <code>id</code>. If it exists already, we update the row; otherwise, we insert it.  Our sole goal right now is clarity, so we made the code straightforward.</p>
<p>How’s the performance? Pretty abysmal, it turns out.</p>
<pre><code>$ python v1.py 100 &lt; messages.jsonl
100 messages processed in 5.02 seconds: 19.9msg/s
</code></pre>
<p>We’ve made a classic blunder, one common enough that it was worth including in <a href="https://www.sqlite.org/faq.html?ref=flightaware.engineering#q19">SQLite’s own FAQ</a>. We’re using a new transaction for every single query. Well, technically SQLite is using a new transaction, we’re just not telling it to do otherwise. Unfortunately, transactions are quite slow. Their performance is based directly on the speed of your disk (likely the slowest component in your computer). In return for this poor performance, though, transactions provide us with data durability.</p>
<h3 id="v2tweakingsomeoptions">v2: Tweaking some options</h3>
<p>What can we do about this? SQLite’s FAQ has a few pieces of advice:</p>
<ul>
<li>Disable synchronous database writes with <code>PRAGMA synchronous=OFF;</code></li>
<li>Perform more queries per transaction</li>
</ul>
<p>The first tip risks database corruption, which is not acceptable. However, <a href="https://www.sqlite.org/pragma.html?ref=flightaware.engineering#pragma_synchronous">further research</a> shows that the less extreme <code>PRAGMA synchronous=NORMAL;</code> can offer modest performance benefits without risk of corruption. Let’s give it a try.</p>
<p>It’s a bit strange to enable through SQLAlchemy. We have to hook the engine connection:</p>
<pre><code class="language-python">from sqlalchemy import event

@event.listens_for(engine, &quot;connect&quot;)
def set_sqlite_pragma(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute(&quot;PRAGMA synchronous=NORMAL&quot;)
    cursor.close()
</code></pre>
<p>The rest of the code is identical to V1 above.</p>
<pre><code>$ python v2.py 100 &lt; messages.jsonl
100 messages processed in 3.91 seconds: 25.6msg/s
</code></pre>
<p>A 28% improvement. Not bad, but still nowhere near what we need.</p>
<h3 id="v3speedatallcosts">v3: Speed at all costs</h3>
<p>What about the second option? We can easily reduce the number of transactions, just wrap the whole script in one!</p>
<pre><code class="language-python">def write_to_db(conn, message):
    existing_flight = conn.execute(table.select().where(table.c.id == message[&quot;id&quot;])).first()
    if existing_flight:
        conn.execute(table.update().where(table.c.id == message[&quot;id&quot;]), message)
    else:
        conn.execute(table.insert(), message)

def main(lines):
    with engine.begin() as conn:
        for line in islice(sys.stdin.readlines(), lines):
            write_to_db(conn, json.loads(line))
</code></pre>
<p>Only the modified code has been included. We’ve pulled our context manager out to our main function and called <code>engine.begin()</code> to start a transaction that won’t be committed until we’re done processing lines.</p>
<pre><code>$ python v3.py 100 &lt; messages.jsonl
100 messages processed in 0.44 seconds: 225.4msg/s
</code></pre>
<p>Whoa! Now there’s the performance we need! In fact, it ran quickly enough that we should really re-run it with more messages to ensure we’re seeing legitimate numbers.</p>
<pre><code>$ python v3.py 2000 &lt; messages.jsonl
2000 messages processed in 3.08 seconds: 649.5msg/s
</code></pre>
<p>This easily exceeds our original performance target. There’s a problem, though: we’ve traded away all our durability for pure performance. By putting all our queries into one transaction, any interruption to our application will result in the loss of all data! Worse still, the above approach won’t even work in the real-world application of Firestarter where there’s no end to the data it receives.</p>
<h3 id="v4findingasweetspot">v4: Finding a sweet spot</h3>
<p>Resolving this isn’t very difficult, but it raises the question: how much data are we willing to lose if something goes wrong? If there’s power loss or a crash in the middle of a transaction, the data written as part of that transaction never ends up in the table. With our V1 code, that meant losing — at worst — one message. With the new speedy option, we’ll lose everything. We need to pick a point somewhere in the middle of that sliding scale of durability vs. performance. To do so, we can rewrite our loop to perform a transaction periodically:</p>
<pre><code class="language-python">def main(lines):
    connection = engine.connect()
    transaction = connection.begin()
    start_time = time.time()
    for line in islice(sys.stdin.readlines(), lines):
        write_to_db(connection, json.loads(line))
        if time.time() &gt; start_time + &lt;period&gt;:
            transaction.commit()
            transaction = connection.begin()
            start_time = time.time()
    transaction.commit()
</code></pre>
<p>It’s not as clean as the prior code. We’ve had to do away with our context manager, and there’s timing code in the loop now, but it’s all for a noble pursuit: better performance. Now we’re just left to pick the period. <a href="https://publicobject.com/2020/09/14/many-correct-answers/?ref=flightaware.engineering">Fortunately, there are many correct answers here.</a> I somewhat arbitrarily settled on 1 second. Its impact on performance wasn’t too severe.</p>
<pre><code>$ python v4.py 2000 &lt; messages.jsonl
2000 messages processed in 3.42 seconds: 585.3msg/s
</code></pre>
<h3 id="v5takingthingstoofar">v5: Taking things too far</h3>
<p>We could stop here, satisfied with almost 2x our initial performance target, but that’s not what I did when I was developing Firestarter; I was just starting to have some fun! I had been bitten by the optimization bug. How fast could we make it? It became an exercise in pride rather than satisfying any business requirements.</p>
<p>Our initial jump in performance came from a classic optimization technique: batching our work. Instead of making many small writes to disk, we make one larger write and reap the performance benefits. Were there other opportunities for batching in the code?</p>
<p>What about batching the SQL queries themselves? There wasn’t really a need for us to execute a <code>SELECT</code> for every message. What if we accumulated the messages for a bit and then checked the table for all the <code>ids</code> at once using the <code>IN</code> operator?</p>
<pre><code class="language-python">cache = {}

def add_to_cache(message):
    cache.setdefault(message[&quot;id&quot;], {}).update(message)

def write_to_db(conn):
    existing_flights = conn.execute(table.select().where(table.c.id.in_(cache)))
    for flight in existing_flights:
        conn.execute(table.update().where(table.c.id == flight.id), cache.pop(flight.id))
    # We popped the updates, so anything left must be an insert.
    for flight in cache.values():
        conn.execute(table.insert(), flight)
    cache.clear()

def main(lines):
    start_time = time.time()
    for line in islice(sys.stdin.readlines(), lines):
        add_to_cache(json.loads(line))
        if time.time() &gt; start_time + 1:
            with engine.begin() as connection:
                write_to_db(connection)
            start_time = time.time()
    with engine.begin() as connection:
        write_to_db(connection)
</code></pre>
<p>Now we’re getting more complicated; we’ve added a cache to the mix. Instead of committing a transaction every second, we flush the cache. In <code>write_to_db()</code>, we now only need to execute <code>SELECT</code> once. Let’s see what that does for us:</p>
<pre><code>$ python v5.py 2000 &lt; messages.jsonl
2000 messages processed in 0.83 seconds: 2417.2msg/s
</code></pre>
<p>The endorphins start to rush. We just quadrupled our performance! Let’s bump the line count up again.</p>
<pre><code>$ python v5.py 20000 &lt; messages.jsonl
20000 messages processed in 4.32 seconds: 4631.7msg/s
</code></pre>
<p>It just gets better and better.</p>
<p>What about the <code>UPDATEs</code> and <code>INSERTs</code>? Surely those could be batched as well?</p>
<h3 id="v6">v6:⚡️⚡️⚡️</h3>
<p>It didn’t take long to stumble across SQLAlchemy’s <a href="https://docs.sqlalchemy.org/en/13/core/tutorial.html?ref=flightaware.engineering#executing-multiple-statements">documentation about executing multiple statements</a>, though adapting the code was trickier than expected:</p>
<pre><code class="language-python">from sqlalchemy.sql import bindparam

cache = {}
BASE_MESSAGE = dict.fromkeys(c.name for c in table.c)

def add_to_cache(message):
    cache.setdefault(message["id"], {}).update(message)

def write_to_db(conn):
    existing_flights = conn.execute(table.select().where(table.c.id.in_(cache)))
    updates = [dict(flight, _id=flight.id) | cache.pop(flight.id) for flight in existing_flights]
    if updates:
        # sqlalchemy reserves the "id" bindparam for its own use
        conn.execute(table.update().where(table.c.id == bindparam("_id")), *updates)
    # We popped the updates, so anything left must be an insert.
    # All inserted dicts must have same structure.
    inserts = [BASE_MESSAGE | val for val in cache.values()]
    if inserts:
        conn.execute(table.insert(), *inserts)
    cache.clear()
</code></pre>
<p>Updates and inserts could be batched, but all the dictionaries we passed needed to contain the same keys. This required some additional code in the update section since we essentially needed to merge the current and updated fields in Python rather than letting the query do it. It took me a few tries to get that working correctly. Where does our performance stand now?</p>
<pre><code>$ python v6.py 20000 &lt; messages.jsonl
20000 messages processed in 2.53 seconds: 7899.9msg/s
</code></pre>
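<p>The same-keys constraint behind v6's batched execution can be seen in isolation. Here is a minimal sketch (the column names are hypothetical, not Firestarter's actual schema): every partial message is merged over a base dict so that all rows share one shape, which is what allows the driver to bind them as a single batch.</p>

```python
# Toy illustration of the key-normalization step needed for batched execution:
# an executemany-style call binds the same parameter names for every row, so
# each dict must contain every column, with None filling the gaps.
COLUMNS = ("id", "lat", "lon", "alt")  # hypothetical column names
BASE_MESSAGE = dict.fromkeys(COLUMNS)  # {"id": None, "lat": None, ...}

def normalize(messages):
    """Merge each partial message over the base so all dicts share one key set."""
    return [BASE_MESSAGE | msg for msg in messages]  # dict union requires Python 3.9+

rows = normalize([{"id": "A", "lat": 29.98}, {"id": "B", "alt": 35000}])
# Every row now carries all four keys; absent fields default to None.
```

<p>Forgetting any one of the keys in any one of the rows is enough to make the batched statement misbehave, which is why this step took a few tries to get right.</p>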
<p>This is around when I came to my senses while working on Firestarter. Having exceeded our initial performance target by 25x, I was really just stroking my ego at this point. Sure, having all this headroom meant we could buy back some durability, but there are other tradeoffs at play here, too.</p>
<h2 id="tradeoffs">Tradeoffs</h2>
<p>Remember the primary goals I mentioned at the beginning of this post? We wanted this code to be easy to read and understand. In the pursuit of maximum performance, we sacrificed that. And what about correctness? The trickier the techniques that we use, the more likely we are to introduce bugs. As I already said, it took me a few tries to actually get v6 of the code working correctly.</p>
<p>I also haven’t even mentioned the fact that <code>conn.execute(table.select().where(table.c.id.in_(cache)))</code> will fail for caches with over 1000 items if SQLite was compiled with its default flags. Many distributions (Debian, Alpine, Arch) tweak the compile flags so that they are not subject to this issue, but we don’t want Firestarter to be limited to just one of those distributions. We could work around the issue, but that would result in further complicating our code.</p>
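<p>For the curious, such a workaround would look roughly like the sketch below, which chunks the ids so that no single statement exceeds the variable limit. It uses the stdlib <code>sqlite3</code> driver for brevity rather than SQLAlchemy, and the <code>flights</code> table name is hypothetical; 999 is the historical default for <code>SQLITE_MAX_VARIABLE_NUMBER</code> in stock SQLite builds.</p>

```python
import sqlite3
from itertools import islice

# Assumed default for SQLITE_MAX_VARIABLE_NUMBER in stock builds; many
# distributions and newer SQLite releases raise it considerably.
MAX_VARS = 999

def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def select_existing(conn, ids):
    """Run the membership query in chunks so each statement stays under the limit."""
    rows = []
    for chunk in chunked(ids, MAX_VARS):
        placeholders = ",".join("?" * len(chunk))
        rows.extend(conn.execute(
            f"SELECT id FROM flights WHERE id IN ({placeholders})", chunk))
    return rows
```

<p>It works, but it is exactly the kind of incidental complexity the surrounding code would then have to carry everywhere the cache is queried.</p>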
<h2 id="lessonslearned">Lessons Learned</h2>
<p>Striving for performance involves both obvious and not-so-obvious costs. As satisfying as it can be to watch the benchmark numbers tick up and up, we must always keep our primary requirements at the forefront of our minds. Only optimize as much as is needed, and no more.</p>
<!--kg-card-end: markdown--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/making-sqlite-upserts-snappy-with-python-and-sqlalchemy/">Making SQLite Upserts Snappy (with Python and SQLAlchemy)</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ The Future Leaders of FlightAware. Part 2: Principles, Practices, and Tools for Building Strategic Leadership ]]></title>
        <description><![CDATA[ A deep dive into how the FlightAware Engineering team designed their Directed Training Program, a tailored leadership training initiative. ]]></description>
        <link>https://flightaware.engineering/the-future-leaders-of-flightaware-part-2/</link>
        <guid>https://flightaware.engineering/the-future-leaders-of-flightaware-part-2/</guid>
        <pubDate>Fri, 25 Sep 2020 11:23:48 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/09/Screen-Shot-2020-09-17-at-11.20.54-AM.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Chadd Mikulin is the Director of Engineering for Data and Software Services at FlightAware. For over fifteen years, he has helped grow and promote leaders in the organizations of which he’s been part.</em></p><p>In our previous blog in this series,<a href="https://flightaware.engineering/the-future-leaders-of-flightaware-part-1-structuring-the-managers-pathled/"> The Future Leaders of FlightAware. Part I: Structuring the Manager’s Path</a>, we covered some of the high-level strategies we employ to grow the future leaders of FlightAware and why it is so important to us. This time, we’ll discuss the brass tacks of our directed training program: the format for the course, the curriculum we use, and the individual pieces’ pros and cons so that you might choose whether they’re something you’d like to incorporate into your leadership training initiative.</p><h2 id="the-tenets-of-the-course">The Tenets of the Course</h2><p>The class was created with these objectives in mind:</p><ol>
<li>Teach the participants the basic set of behaviors required to get the job done.</li>
<li>Cover material that will allow participants to recognize characteristics in themselves that need to be nurtured or modified.</li>
<li>Pique the participant’s interest in management as a subject and encourage them to learn more on their own.</li>
</ol>
<p>The class is not designed to make every participant an expert manager as soon as they complete it. That only comes with time, dedication, and introspection. What it will do, however, is send them down the right path to begin their management journey.</p><h2 id="the-format-of-the-class">The Format of the Class</h2><p>The format of the class is simple: we meet as a group every two weeks for an hour to discuss the focus topic(s). Before the meeting, everyone reads or watches the material, takes notes, and prepares for the discussion. During the meeting, I facilitate by asking the participants questions to start the discussion or continue it on to other topics that merit attention. I also spend some time each session relating previous experiences that I have and giving more practical examples of the topics being discussed. I also encourage others to do the same so that we can all benefit from each other’s experiences.</p><p>We like this approach for a few reasons. First, it fits into people’s schedules. Regular, recurring meetings are easier to schedule and to schedule around. Additionally, the two-week frequency allows ample time to consume the material and ruminate on it. We’re looking to <em>discuss</em> the material in each session, not just regurgitate it. Having enough time to process the reading and form your own opinions about it leads to a much better discussion. Ultimately, the discussion is where the real benefit of the format shines through. It gives multiple viewpoints on the topics, allowing the participants to think about them in new and interesting ways, and potentially leading them to a different understanding than they came in with. 
In almost every session, someone has an epiphany about a concept, tool, or methodology that grows their ability that much more.</p><h2 id="the-coursework">The Coursework</h2><p>There are three main sources that we use for the course:<a href="https://www.manager-tools.com/products/effective-manager-video-course?ref=flightaware.engineering"> The Manager Tools <em>Effective Manager</em></a> video course,<a href="https://www.amazon.com/Managers-Path-Leaders-Navigating-Growth/dp/1491973897?ref=flightaware.engineering"> <em>The Manager’s Path</em></a> by Camille Fournier, and<a href="https://www.amazon.com/The-Effective-Executive-audiobook/dp/B01N51TCT1/?ref=flightaware.engineering"> <em>The Effective Executive</em></a> by Peter F. Drucker. There is also some FlightAware-specific material that is included in the course that gives more specifics on how we expect managers to lead their teams related to topics presented by the core coursework.</p><h3 id="effective-manager-video-course-manager-tools"><em>Effective Manager</em> Video Course, Manager Tools</h3><p>We start with the entire <em>Effective Manager</em> video course from Manager Tools because it is the most practical of the three. It has short, easily digestible video segments that discuss topics in a way that is meant to train or teach. There are questions to answer, exercises to do on your own, and downloadable documents to support some of the subjects discussed.</p><p>In our curriculum, it is used to cover the basics of management and the behaviors a new manager should engage in to be effective. Working through this video course takes the largest number of training sessions, and of the three coursework materials, it is the most important to make sure it is understood fully. It is basically the foundation on which the rest of the learning is built.</p><p>It is VERY systematic (that’s the point), which may make it off-putting to some. 
It can give the impression that the direct sitting across from you is just a set of variables to plug into an equation to get the answer of how to manage them. That seemingly formulaic approach to management may feel less personal to some and therefore may require more discussion of its merits to keep it from being dismissed. But it should not be dismissed. It’s a good way to introduce the fundamentals of management to anyone.</p><h3 id="the-manager%E2%80%99s-path-by-camille-fournier"><em>The Manager’s Path</em> by Camille Fournier</h3><p>After finishing the video course, we next take on <em>The Manager’s Path</em>, chapters one through four. This book is written by an engineer for engineers, and Fournier does an excellent job relating her experiences and expertise. We focus on the first four chapters because they encompass the participants’ journey to this point, covering what to expect from a manager and how to be managed, how to be a mentor, how to run a technical team, and how to manage direct reports.</p><p>Fournier sprinkles anecdotes and advice throughout, giving it a relaxed coaching vibe. The participants get to learn from her successes and failures and, as engineers, they’re easy to relate to because they’ve likely had some of the exact same things happen in their careers already. She also discusses the human element of management more in-depth, giving the book a more personal feeling than the <em>Effective Manager</em> video course.</p><p>One of the great things about following the video course with this book is their different takes on the practicalities of managing day to day. 
They have differing opinions on how to do 1:1s and feedback, so it’s a great opportunity for the participants to engage in discussions about the merits of each.</p><p>Even if you ultimately decide not to include <em>The Manager’s Path</em> in your management curriculum, I highly encourage you to have all new hires to your organization read the first chapter, especially if this is their first position out of college. It does a fantastic job of level-setting expectations for the direct’s and manager’s responsibilities in their relationship and will help start the new employee down the right path (pun intended).</p><h3 id="the-effective-executive-by-peter-f-drucker"><em>The Effective Executive</em> by Peter F. Drucker</h3><p>The last text we cover as part of the course is frequently touted as one of the best management texts ever written. <em>The Effective Executive</em> has a place in the curriculum because even though it was written well before most of the people participating in the course were born, it still has many nuggets of wisdom that can be mined.</p><p>We cover chapters one through four in the class and encourage the participants to finish the book on their own. These chapters cover some concepts that many new managers struggle with, like time management, delegation, and being part of a bigger organization.</p><p>The reason we like to include this book in the curriculum is that it’s a stretch. It was written in a different time and the examples (and sometimes the prose) aren’t easy to relate to as an engineer in the twenty-first century. We want the participants to have to work a little harder to tease the advice out of the stories from a bygone era. This makes them really analyze what they’ve previously learned and reinforces those concepts.</p><p>The text can be a little redundant after going through the <em>Effective Manager</em> video course; many of their concepts are the same, but there is value in reinforcing them. 
And there is plenty of new material that is covered that makes it more than worthwhile.</p><h2 id="wrapping-it-up">Wrapping it Up</h2><p>The curriculum we use for this directed management training program is what I would consider 100- and 200-level course material. It also covers some other practical topics specific to how we do things at FlightAware (e.g. annual reviews) so that they have what they need to get started as a manager <em>here</em>. Broadly, though, the topics covered can be used in any organization with a desire to teach their prospective or new managers the basic set of behaviors required to get the job done, get them to recognize characteristics in themselves that need to be nurtured or modified, and pique their interest in management as a subject for career-long learning.</p><h2 id="appendix-directed-training-program-curriculum">Appendix: Directed Training Program Curriculum</h2><p>Here is a collected list of the course material we found helpful when creating this program:</p><ul>
<li><a href="https://www.manager-tools.com/products/effective-manager-video-course?ref=flightaware.engineering">Effective Manager Video Course, Manager Tools</a>. We cover the entire video course as part of the class.</li>
<li><a href="https://www.amazon.com/Managers-Path-Leaders-Navigating-Growth/dp/1491973897?ref=flightaware.engineering">The Manager’s Path</a> by Camille Fournier. A modern hands-on manual for both aspiring and existing technical leaders. We cover the chapters listed below but I encourage the participants to read further.
<ul>
<li>Chapter 1: Management 101</li>
<li>Chapter 2: Mentoring</li>
<li>Chapter 3: Tech Lead</li>
<li>Chapter 4: Managing People</li>
</ul>
</li>
<li><a href="https://www.amazon.com/The-Effective-Executive-audiobook/dp/B01N51TCT1/?ref=flightaware.engineering">The Effective Executive</a> by Peter F. Drucker. This book provides timeless insight and coaching for development. Again, we only cover part of the book but highly encourage the participants to read the entire work.
<ul>
<li>Chapter 1: Effectiveness Can Be Learned</li>
<li>Chapter 2: Know Thy Time</li>
<li>Chapter 3: What Can I Contribute?</li>
<li>Chapter 4: Making Strength Productive</li>
<li>Chapter 5: First Things First</li>
</ul>
</li>
</ul>
 
        <br>
        <p>
            <a href="https://flightaware.engineering/the-future-leaders-of-flightaware-part-2/">The Future Leaders of FlightAware. Part 2: Principles, Practices, and Tools for Building Strategic Leadership</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Securing the Journey to Continuous Integration and Continuous Delivery Using Jenkins Pipeline ]]></title>
        <description><![CDATA[ FlightAware examines what we learned designing CI/CD with Jenkins Pipeline as we deployed a new application. ]]></description>
        <link>https://flightaware.engineering/securing-the-journey-to-continuous-integration-and-continuous-delivery-using-jenkins-pipeline-2/</link>
        <guid>https://flightaware.engineering/securing-the-journey-to-continuous-integration-and-continuous-delivery-using-jenkins-pipeline-2/</guid>
        <pubDate>Mon, 21 Sep 2020 14:35:59 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/09/Picture1.png-Jenkins-2.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>As Senior Software Engineer in Test, Lakshmi Raman leads FlightAware's QA team, using Jenkins to maintain and improve software quality. </em></p><blockquote><strong>“The most powerful tool we have as developers is automation.” — Scott Hanselman</strong></blockquote><p>Two years ago, we set out to build our first Dockerized application at FlightAware. We needed a tool to automate our Docker build and deployment process. So, the decision to use Jenkins Pipeline (or simply “pipeline”) as our CI/CD seemed natural. At that time, we had a few Freestyle jobs running in Jenkins. To create my first Docker pipeline, I heavily relied on Google examples. That may have been my first mistake: my initial pipeline was a mix of different styles and not very readable. This post is about what we learned as we designed CI/CD with Jenkins Pipeline jobs.</p><h2 id="tl-dr">TL;DR</h2><!--kg-card-begin: markdown--><ul>
<li>Use pipeline as code with a Multibranch job as it lets you treat your build process as part of your code.</li>
<li>Declarative Pipeline is a more modern approach to creating a pipeline. Always start with declarative syntax and extend it by using libraries.</li>
<li>Make use of shared libraries to reuse your pipeline code across different projects.</li>
<li>Do not tie your build commands to a specific CI/CD tool syntax. Use shell scripts, Makefile, or any other build tools with your CI/CD tool.</li>
<li>Set up a GitHub Organization folder if you do not want to manually set up a job in Jenkins every time there is a new repository in your organization.</li>
</ul>
<!--kg-card-end: markdown--><h2 id="what-is-pipeline-as-code">What is Pipeline as Code?</h2><p>The old style of setting up pipelines in Jenkins—Freestyle jobs—was form- and UI-based. Freestyle jobs keep the build script and configuration information in Jenkins XML files. Jenkins has since introduced pipeline as code as a much-needed improvement over Freestyle jobs. Pipeline as code—or build as code—is a concept similar to infrastructure as code and lets users capture their build process in a plain-text file that can be checked into their repository with the rest of their code. Pipeline as code groups tasks into stages, where each stage represents a part of your software delivery workflow. Stages can be executed in parallel, and a visual representation of the whole pipeline is available at the end of the build. None of this can be done easily with Freestyle jobs. Pipeline offers a robust build, test, and deployment tool. Thus, the decision to use pipeline as code over Freestyle jobs is easy.</p><h2 id="scripted-vs-declarative">Scripted vs. Declarative</h2><p>When implementing pipeline as code, the most confusing part is deciding what flavor of pipeline to use: scripted or declarative. The scripted pipeline DSL is built on Groovy language and offers all the capabilities and extensibility of a full-featured programming language. Declarative Pipeline is for users who are not comfortable working with Groovy. It has a more structured and opinionated syntax. For developers, it is tempting to use scripted syntax over declarative syntax, since declarative syntax does not provide the same flexibility as an imperative programming language. However, I found it is better to use Declarative Pipeline, which is illustrated in the example below:</p><h3 id="scripted">Scripted <br></h3><pre><code class="language-javascript">node('duude') {
  timeout(time: 10){
	try{
    	stage('Build'){
        	def myImage=docker.build("my-image:${env.BUILD_ID}", "--build-arg token=xxxxx .")
      	}
 
          stage('Test'){
          	parallel (
                	backend:{
                      	myImage.inside {sh 'cd /home/myapp/js/ ; node all.js'}},
                	frontend:{
                          myImage.inside {sh 'cd /home/myapp/src/test/; python tests.py' }}
          	)  
          }
      	stage('Deploy'){
          	myImage.push()
      	}
	}
	catch(Exception e){
    	echo e.toString()
	}
  }
}
</code></pre><h3 id="declarative">Declarative </h3><pre><code>pipeline{
  agent { label 'duude' }
  options {
    timeout(time: 10)
    parallelsAlwaysFailFast()
  }
  stages{
    	stage('Build'){
      	steps { script { 
                    myImage=docker.build("my-image:${env.BUILD_ID}", "--build-arg token=xxxxx .")
            	}}
    	}
    	stage('Test'){
      	steps {	
                parallel (
               	backend:{ script{
                    	myImage.inside {sh 'cd /home/myapp/js/ ; node all.js'}}},
               	frontend:{ script {
                    	myImage.inside {sh 'cd /home/myapp/src/test/; python tests.py' }}}
             	)  
          }}
    	stage('Deploy'){
        	steps {script {
            	myImage.push()
        	}}	
        }
  }
  post {
  	always {
      emailext (
    	subject: "${currentBuild.currentResult}: ${env.JOB_NAME} [${env.BUILD_NUMBER}]",
    	body: "${currentBuild.currentResult}: Job ${env.JOB_NAME} [${env.BUILD_NUMBER}] \n Check console output at ${env.BUILD_URL}",
    	recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']],
  	)
  	}	
  }
}
</code></pre><p>At a glance, it is not easy to spot the minor differences between the scripted and declarative syntax. In the above code, node specifies the machine where all three stages—build, test, and deploy—are going to run. <code>timeout</code> specifies the time Jenkins will wait for the three stages to complete. Notice how the scripted pipeline wraps the <code>node</code> and <code>timeout</code> directives around the three stages. Imagine how hard the scripted pipeline would become to read if we had a few more options that we wanted to apply to all our stages. Another important missing piece in scripted pipeline is that there is no post-processing option—you have to rely on <code>try-catch</code> blocks to handle failures.</p><p>Declarative Pipeline also gets support from UI plugins like BlueOcean. Our example declarative code, as visually represented in the BlueOcean UI, is shown below. The UI clearly shows that the backend and frontend tests run in parallel. In Declarative Pipeline, you can also restart any top-level stages if the build fails due to some transient or environmental issues. This is another added benefit of using Declarative Pipeline.</p><figure class="kg-card kg-image-card"><img src="https://flightaware.engineering/content/images/2020/09/image-17.png" class="kg-image" alt loading="lazy" width="1044" height="424" srcset="https://flightaware.engineering/content/images/size/w600/2020/09/image-17.png 600w, https://flightaware.engineering/content/images/size/w1000/2020/09/image-17.png 1000w, https://flightaware.engineering/content/images/2020/09/image-17.png 1044w" sizes="(min-width: 720px) 720px"></figure><h2 id="shared-libraries">Shared Libraries</h2><p>As our pipelines became more complex, it was hard to implement functionality without using Groovy scripts. The one advantage of scripted pipeline is that you can stick Groovy code into any stage. In Declarative Pipeline, you can overcome this limitation by adding Groovy code between script tags. 
A more elegant solution is to move the Groovy script code to shared libraries. Both Declarative Pipeline and scripted pipeline allow you to use a common library. You can even add pipeline templates to your Shared Library.</p><p>At FlightAware, several of our teams use the same build patterns. Before long, we discovered pipeline code that was copy-pasted across multiple projects. To fix this, we added these common patterns to our Shared Library repository. We also checked the option to load the Shared Library implicitly. This eliminated the need to directly import the library in our pipeline script.</p><p>A Shared Library can be implemented as packages and classes. I found it easier to add Shared Library code as a global variable. Adding Groovy script files to the <code>vars</code> directory exposes the name of each file as a variable in pipeline code. To move email code out of our pipeline code, we added email.groovy to the <code>vars</code> directory. </p><pre><code>/*vars/email.groovy*/
 
/* Notify developers via email */
def call(String buildStatus) {
  def subject = "${buildStatus}: Job '${env.JOB_NAME} [${env.BUILD_NUMBER}]'"
  def details = "${buildStatus}: Job ${env.JOB_NAME} [${env.BUILD_NUMBER}] \n Check console output at ${env.BUILD_URL}"
 
  emailext (
  	subject: subject,
  	body: details,
  	recipientProviders: [[$class: 'DevelopersRecipientProvider'], [$class: 'RequesterRecipientProvider']],
	)
}</code></pre><p>Sending email from our pipeline script became much simpler:</p><pre><code>//Jenkinsfile
post {
  	always {
       	email currentBuild.result
  	}
}</code></pre><p>One important thing to note is that Jenkins uses a special interpreter to execute pipeline code and Shared Library. We would often build code that would run fine in a Groovy console only to see it fail in Jenkins. Groovy script errors are caught only when they are being executed. An error in declarative syntax is caught quickly and Jenkins will fail the job without executing any build steps. For this reason, it is always a good idea to test your pipeline code and Shared Library scripts in Jenkins. The basic pipeline job in Jenkins allows you to paste pipeline code directly into Jenkins to debug it and test its validity. Jenkins also includes a command line / HTTP linter for Declarative Pipelines, and some of our developers use a VS Code plugin to validate their Jenkins files with that tool.</p><h2 id="build-tools">Build Tools</h2><p>Our pipeline code is more compact. But is it portable? Not so much. In our pipeline code above, if we ever change our CI tool, we will have to move our Jenkins-specific Docker commands to something that the CI tool understands. At FlightAware, we use Makefiles extensively. Adding a Makefile to our project repository with targets for build, deploy, and test keeps the internal details of how to build an image in the Makefile instead of pipeline code. Makefile also provides a consistent interface for build, test, and deploy for everyone using the repository — developers, testers, and the CI tool. Below we have changed our pipeline script to be more CI tool agnostic by using make commands:</p><pre><code>pipeline{
agent { label 'docker' }
options {
    timeout(time:10)
	parallelsAlwaysFailFast()
  }
  stages{
    	stage('Build'){
      	steps { sh "make build" }
    	}
    	stage('Test'){
      	steps {
            	parallel (
                	failFast: true,
                	backend:{  sh "make backend-test" },
                	frontend:{ sh "make frontend-test" }
            	) 	
            }
    	}
    	stage('Deploy'){
        	steps { sh "make deploy" }
        }
  }
  post {
  	always { email currentBuild.result }	
  }
}</code></pre><h2 id="pipeline-jobs">Pipeline Jobs</h2><p>We ended up implementing two different kinds of jobs for our projects: Multibranch jobs for building images and running unit tests, and regular pipeline jobs for deployment. A Multibranch Pipeline job automatically discovers, manages, and executes pipelines for branches that contain a Jenkinsfile in source control. Multibranch jobs, in conjunction with GitHub events for triggering, are great at providing a continuous feedback loop for developers on code check-in.</p><p>We don’t do continuous deployment, and our deployment process requires manually selecting both the version and environment before kicking off deployment. One of the things that pipeline scripts did not do well is accept user input at runtime. The UI for accepting user input is very clumsy, both in BlueOcean plugin and base Jenkins. Thankfully, however, regular pipeline jobs can accept user input and work well for our deployment pipelines.</p><h2 id="auto-discovering-ci-pipelines">Auto-Discovering CI pipelines</h2><p>As more teams at FlightAware started adopting Jenkins pipeline for builds, one of the issues teams would run into was that their build was not getting triggered. This is usually due to a misconfigured job or misconfigured webhook. Troubleshooting this issue was frustrating as developers usually did not have admin access to GitHub or Jenkins. To simplify the setup for developers, we configured a GitHub Organization job. This job scans all of the organization’s repositories and automatically creates managed Multibranch pipeline jobs for repositories that have a Jenkinsfile. We also set up webhooks at the GitHub organization level so that all the repositories under the FlightAware organization will send a webhook event to Jenkins on commits and pull requests. 
Jenkins auto-discovering new repositories to build was a much-welcomed improvement to our build process and matched what other CI/CD tools in the market like Travis and GitHub Actions already offered.</p><h2 id="in-conclusion">In Conclusion</h2><p>At FlightAware, we have many projects that are implementing pipelines with success, although it takes a few iterations in Jenkins to figure out the right way of configuring a pipeline—or any other job, for that matter. This is primarily because there are numerous plugins to choose from and the Jenkins UI feels outdated and nonintuitive. But once you have the process working, it is easy to replicate the same configuration for other projects. I personally think Jenkins does continuous integration—automated builds and tests—well. However, as FlightAware moves toward container orchestration using Kubernetes, it remains to be seen if Jenkins will continue to be part of the continuous deployment process.</p>
        <br>
        <p>
            <a href="https://flightaware.engineering/securing-the-journey-to-continuous-integration-and-continuous-delivery-using-jenkins-pipeline-2/">Securing the Journey to Continuous Integration and Continuous Delivery Using Jenkins Pipeline</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Systems Monitoring with Prometheus and Grafana ]]></title>
        <description><![CDATA[ The evolution of systems monitoring technology decisions and practices for multi-machine Hyperfeed and predictive technology. ]]></description>
        <link>https://flightaware.engineering/systems-monitoring-with-prometheus-grafana/</link>
        <guid>https://flightaware.engineering/systems-monitoring-with-prometheus-grafana/</guid>
        <pubDate>Tue, 11 Aug 2020 13:10:03 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/08/mahfuzur-rahman-hgKzuj2nAsI-unsplash--1-.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em> Caroline Rodewig and Garrett McGrath are members of FlightAware's Flight Tracking Wing. As lead for the Predictive Technology Crew, Caroline works to expand the scope and quality of our predictive applications. Garrett is the lead for the Aviation Platform Crew and responsible for the performance, reliability and observability of multi-machine Hyperfeed. </em></p><!--kg-card-begin: markdown--><h2 id="monitoringhyperfeedthehistoryandchallengesofadistributedsystem">Monitoring Hyperfeed: The History and Challenges of a Distributed System</h2>
<p>The Aviation Platform crew (for an explanation of crews and wings, <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/" target="_blank">see this blog post</a>) revolves around the performance and reliability of Hyperfeed<sup class="footnote-ref"><a href="#fn1" id="fnref1">[1]</a></sup>. Hyperfeed is FlightAware’s core flight tracking engine. Its output powers the company’s most salient offerings: the <a href="https://flightaware.com/live/flight/ABW140?ref=flightaware.engineering" target="_blank"> flight pages</a> on the website, <a href="https://flightaware.com/commercial/firehose/?ref=flightaware.engineering" target="_blank">Firehose</a>, <a href="https://flightaware.com/commercial/flightxml/?ref=flightaware.engineering">FlightXML</a>, and <a href="https://flightaware.com/commercial/premium/?ref=flightaware.engineering">flight alerts</a>. So if Hyperfeed stops working, so does much of FlightAware. Given the impact of a Hyperfeed outage, we need to know immediately if something goes—or might go—wrong so that we can act swiftly and precisely to remedy the situation. For maintaining the requisite accuracy and timeliness of Hyperfeed’s output, we rely heavily on monitoring<sup class="footnote-ref"><a href="#fn2" id="fnref2">[2]</a></sup>. To that end, we have recently begun adopting Prometheus, a time-series metrics database, and Grafana, a powerful visualization platform. The introduction of these tools marks a significant upgrade and improvement to Hyperfeed’s current monitoring stack. We hope that by sharing our experience and the reasons for adoption, we can help inform the decision-making of others in an analogous situation.</p>
<p>Before discussing Prometheus and the many enhancements it brings to our monitoring of Hyperfeed, it helps to provide some context by examining what we currently use and how it has evolved over time. In its current form, Hyperfeed is a distributed system composed of a constellation of more than a dozen components and sub-systems spread across multiple machines for performance and fault-tolerance. Its sub-systems include a number of other well-known services: Postgres, Zookeeper, Kafka, and RabbitMQ. It wasn’t always like this, though. When I started working on Hyperfeed in 2016, it was not distributed; its components all ran on a single machine (this is not strictly true as it also used a Postgres database running on a separate server). Naturally, monitoring at that time was relatively less complex: it had markedly less surface area to cover. We were able to do the job effectively with a combination of custom, in-house tools and a heavy dose of Zabbix.</p>
<p>Our custom monitoring tools mostly consisted of some scripts that followed Hyperfeed’s output and its logging statements looking for problems, e.g., the output has stopped completely, the output is out of date, or an excess number of tracebacks happened. These tools were specific to Hyperfeed and did not generalize well to other software. Zabbix, on the other hand, has been FlightAware’s primary monitoring and alerting framework. It is used throughout the company and not specific to any piece of software. When applied to Hyperfeed, Zabbix monitored some key characteristics of the server running the application such as CPU, memory, disk, and network usage. Moreover, the aforementioned custom tools fed their data into Zabbix so that they could be used for alerting an engineer if something went haywire. On top of that, Zabbix gave us a basic web interface for viewing the data it collected in a graphical format. Back when Hyperfeed fit snugly on a single server, this sort of monitoring served us well: Zabbix was a one-stop shop for all things monitoring. But as the demands placed on Hyperfeed grew, so did its operational complexity.</p>
<p>Eventually, a single-machine architecture was not enough to handle the increased volume of Hyperfeed’s input, which was precipitated by FlightAware acquiring <a href="https://flightaware.com/commercial/aireon/?ref=flightaware.engineering" target="_blank">new data sources</a>, and expanding its <a href="https://flightaware.com/adsb/?ref=flightaware.engineering" target="_blank">existing data sources</a>. In order to accommodate this change in scale, it was necessary to distribute Hyperfeed across multiple machines. When this happened, we continued using (and still do use) custom tools and Zabbix for monitoring. Our custom tools, though, and the data we send to Zabbix necessarily expanded, too. Although distributing Hyperfeed across multiple machines was necessary and improved the performance and reliability of the system, it also brought with it new challenges and difficulties, particularly in terms of monitoring<sup class="footnote-ref"><a href="#fn3" id="fnref3">[3]</a></sup>.</p>
<p>Given their origin, our custom tools were not designed with a distributed setting in mind. As such, they lacked the ability to easily aggregate data across multiple machines and were somewhat cumbersome to operate as Hyperfeed’s architecture progressed. In addition, Zabbix started creaking and revealing some rough edges as we placed more demands on it. A core issue was that Zabbix does really well at monitoring particular machines but not as well at system or application level monitoring, especially in a context where services can move around from machine to machine as failures occur or as an intentional part of the design. In a distributed setting, it also imposes non-trivial operational burdens, e.g., when adding new monitoring points or spreading the collection load across machines. Moreover, Zabbix combines all monitoring functions into a single monolith rather than the current best practice of loosely coupling monitoring tasks like collecting data, long-term storage, querying data, alerting with deduplication and aggregation, and visualization/dashboarding (for further details on monitoring system best practices, see the <a href="https://landing.google.com/sre/workbook/chapters/monitoring/?ref=flightaware.engineering" target="_blank">Google SRE workbook’s monitoring chapter</a>).</p>
<p>As Hyperfeed moved to a distributed architecture, a change driven by business demands and an increased volume of flight tracking data, it became clear that our monitoring tools, while mostly satisfactory in alerting us when something failed in the system, neglected to provide us with many of the other benefits that monitoring can offer. In particular, we were weak in whitebox monitoring and visualization<sup class="footnote-ref"><a href="#fn4" id="fnref4">[4]</a></sup>. Whitebox monitoring is especially important for analyzing long-term trends, comparing experimental groups, and guiding capacity planning and other engineering decision making. Some information can only come from the vantage point provided by whitebox monitoring. For visualization we had Zabbix, but its capabilities there are fairly limited and inflexible. Given all of this, we started looking for a new solution to fill in the gaps in our monitoring approach, something designed for dynamic, distributed systems. We chose Prometheus and Grafana. This combo of tools addresses the shortcomings of our current approach, and also brings with it a number of additional virtues.</p>
<h2 id="upgradingmonitoringwithprometheusandgrafana">Upgrading Monitoring with Prometheus and Grafana</h2>
<h3 id="prometheustimeseriesmetrics">Prometheus: Time-Series Metrics</h3>
<p>In an idealized monitoring practice, every event or state of a system is recorded so that it is available if needed: no detail is left out. Unfortunately, this only works in theory, since it is impossible to store, process, and work with the quantity of data generated continuously by modern systems. Instead of capturing everything, one must grapple with what to monitor: how to turn the endless torrent of events in a system into something comprehensible, manageable, and actionable. To this end, Prometheus provides time-series metrics, which track aggregations of events in a system over some period of time, where an event refers to anything that might occur and can be quantified or measured. Some example events include a function getting called, CPU usage going up, a web request being made, a client connecting to a service, or a cache hit occurring.<sup class="footnote-ref"><a href="#fn5" id="fnref5">[5]</a></sup> Fundamentally, the monitoring offered by Prometheus periodically takes a snapshot of some of the events in a system, typically measuring how often or how quickly each event occurred. The value of a snapshot at any given point is often of less interest than its relative value: how the frequency of an event changes over time provides the actionable insight and value of metrics-based monitoring. To make this more concrete, in the case of flight tracking some example time-series metrics of interest might be:</p>
<ul>
<li>Departures per second</li>
<li>Arrivals per day</li>
<li>Cancellations per minute</li>
<li>Diversions per hour</li>
<li>Flight plans created per second</li>
<li>Input messages processed per second</li>
<li>Output messages generated per second</li>
<li>99th percentile of the time taken to process an ADS-B position over the last 6 months</li>
</ul>
<p>Note that in order to keep the amount of data captured manageable, the contents of the input messages that trigger each of these events get aggregated away. With metrics, the rate of departures, for example, is measured throughout the system, but any single departure cannot be investigated. As succinctly summarized by <a href="https://www.datadoghq.com/blog/monitoring-101-collecting-data/?ref=flightaware.engineering" target="_blank">Datadog</a>, metrics provide “the big, broad-stroke measures that can help you quickly answer the most pressing questions about a system’s internal health and performance.” Metrics can also capture resource utilization, like CPU, memory, or disk, and metadata like code revisions and configuration values. Time-series metrics, despite necessarily omitting a lot of detail, are still quite versatile, yielding practicable monitoring data for critical activities throughout a system.</p>
<h3 id="dataandoperationalmodel">Data and Operational Model</h3>
<p>One of Prometheus’ main selling points is that it provides a powerful and conceptually straightforward data model for extracting metrics from a system of interest and tracking their rate of change over time. Each time-series metric tracked by Prometheus includes the following information:</p>
<ul>
<li>Metric name: a string indicating what the metric is for, e.g., <code>hyperfeed_departures_total</code>.</li>
<li>Help documentation: a string describing what the metric is for or where it comes from.</li>
<li>Metric type: four types are provided, but two of them form the atomic building blocks of the others. The fundamental types are counters, which can only go up or reset, and gauges, which can be set to any value.</li>
<li>Labels: key-value pairs attaching additional dimensions to the metric. Labels provide a way of affixing more detail to the metric than can be conveyed by the name alone. The quintessential example is the URL path for a metric measuring a web server’s response time. With a path label, it is possible to track the response time for a particular URL but also to aggregate across all labels and view response time for any path. Labels provide additional granularity to metrics and provide enormous power at query time.</li>
<li>Metric value: a <code>float64</code> value. This is the value tracked over time.</li>
<li>Timestamp for when a particular value of the metric was obtained.</li>
</ul>
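<p>To make the data model concrete, here is a minimal Python sketch (not FlightAware's Tcl implementation; the metric name and label are hypothetical) showing how those pieces combine into a single time series rendered in Prometheus' text exposition format:</p>

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    """One Prometheus-style time series: name, help, type, labels, value."""
    name: str
    help: str
    type: str                              # "counter" or "gauge"
    labels: dict = field(default_factory=dict)
    value: float = 0.0

    def expose(self) -> str:
        """Render in the text format a Prometheus server scrapes."""
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        sample = f"{self.name}{{{label_str}}}" if label_str else self.name
        return (f"# HELP {self.name} {self.help}\n"
                f"# TYPE {self.name} {self.type}\n"
                f"{sample} {self.value}\n")

# Hypothetical metric name and label, for illustration only.
m = Metric("hyperfeed_departures_total", "Departures seen by Hyperfeed.",
           "counter", labels={"airport": "KIAH"}, value=42)
print(m.expose())
```

<p>Every distinct combination of label values would become its own series like the one above, which is what makes labels so powerful at query time.</p>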
<p>With the Prometheus data model at hand, the next major selling point, as hinted at in the previous section, is its relative ease for exposing whitebox metrics directly from the application or system it is monitoring<sup class="footnote-ref"><a href="#fn6" id="fnref6">[6]</a></sup>. Typically, this is accomplished by using a Prometheus client library in the language(s) of the system being monitored. All of the most popular languages have a library already, but FlightAware is a heavy user of Tcl, a lesser-used language according to the TIOBE index. Tcl does not have an open-source Prometheus library available, so I had to write one in order to take advantage of what Prometheus has to offer<sup class="footnote-ref"><a href="#fn7" id="fnref7">[7]</a></sup>. Thankfully, the Prometheus project has excellent documentation with detailed guidance for library creators, so the process of writing the library, while not trivial, was relatively simple and straightforward. With a client library at hand, the next step is to use it throughout the application being monitored, and then get those metrics into Prometheus.</p>
<p>Getting data from a client library into Prometheus requires that a process expose its metrics over HTTP and that Prometheus knows the hostname and port where it can request metrics<sup class="footnote-ref"><a href="#fn8" id="fnref8">[8]</a></sup>. Once the data gets to Prometheus it can be visualized using Grafana, alerted on using a <a href="https://github.com/prometheus/alertmanager?ref=flightaware.engineering" target="_blank">separate component</a> maintained by the Prometheus project, and queried and explored. Instrumenting an application with a Prometheus library is a relatively cheap operation in terms of computation and memory. Since metrics aggregate values over time rather than build up a sequence of values to send in a batch, there is a constant memory cost to a given set of metrics; the computational cost is very minimal, although not free<sup class="footnote-ref"><a href="#fn9" id="fnref9">[9]</a></sup>.</p>
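<p>The scrape mechanics themselves need nothing exotic. A minimal sketch using only the Python standard library (the metric text below is hypothetical): the instrumented process serves its current metric values as plain text at <code>/metrics</code>, and the Prometheus server periodically issues an HTTP GET against that endpoint:</p>

```python
import http.server
import threading
import urllib.request

# Hypothetical metric in the Prometheus text exposition format.
METRICS = 'hyperfeed_departures_total{airport="KIAH"} 42\n'

class MetricsHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = METRICS.encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Stand in for the Prometheus server's periodic scrape.
port = server.server_address[1]
scraped = urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics").read().decode()
server.shutdown()
print(scraped)
```

<p>A real client library maintains the metric values in memory and renders this text on demand; the pull model means the monitored process never has to know where the Prometheus server lives.</p>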
<p>Aside from instrumenting code directly with a client library, there are a number of so-called exporters available for exposing metrics from programs not written with Prometheus metrics in mind, e.g., Postgres, Kafka, or Zookeeper. Exporters act as translators, observing an application directly with queries and probes, converting the output of those observations into a format that Prometheus understands, and exposing the resultant metrics over HTTP in response to scrapes from a Prometheus server. With exporters in the mix, many of the most common open-source servers can be easily monitored with Prometheus metrics. If an exporter does not exist for a particular service, it is fairly easy to write one, and Prometheus’ official documentation <a href="https://prometheus.io/docs/instrumenting/writing_exporters/?ref=flightaware.engineering" target="_blank">provides some guidance</a> on doing so. Not only does Prometheus benefit code that can use a client library, but the entire project comes with a very active ecosystem of offerings for getting whitebox metrics out of our systems.</p>
<h3 id="monitoringimprovementcasestudies">Monitoring Improvement Case Studies</h3>
<p>At this point in the discussion, although the power of Prometheus has been outlined—it is conceptually straightforward but powerful, boasts strong visualization support via Grafana, was built for dynamic, distributed systems, generalizes to other wings at FlightAware, has equally strong support for whitebox and blackbox monitoring, and excels at systems-level observations and analysis—it still might not be clear what benefits Prometheus provides over the current Hyperfeed monitoring system. To dispel any doubts of this type, we will now look at two specific cases where Prometheus supersedes our current praxis and contributes to our general excitement and enthusiasm about adopting it for monitoring one of the core pieces of FlightAware’s infrastructure.</p>
<h4 id="surpassingcustomtools">Surpassing Custom Tools</h4>
<p>The first case involves an in-house library, currently out of commission, but used for several years in Hyperfeed for whitebox monitoring. This in-house library provided an event counter: its API took the name of some event of interest and behind the scenes it would increment an integer. These integer counters were stored in the Postgres database used by Hyperfeed for shared state; database views aggregated the counters by day. While this tool had a lot to commend it—tracking the count of events over time is a seemingly simple but immensely powerful technique—it also suffered from a host of problems that led to its retirement: it ended up negatively impacting performance, its data model made aggregation and rate-calculation difficult, and we lacked suitable visualization for exploring its data. When we obsoleted this tool, we did so with the intention of resurrecting it in a newer, improved form; Prometheus does just that by solving all of the aforementioned problems (and it allows for even more, since it isn’t limited to the use case the in-house tool was developed for).</p>
<p>For tracking the count of a particular event over time, Prometheus provides a counter data type<sup class="footnote-ref"><a href="#fn10" id="fnref10">[10]</a></sup>. Counters can only increase or reset; the value of a counter at any given time is not interesting or important, but how the value changes over time is really what matters. While this might seem like it would be uncomplicated to implement, there are <a href="https://www.slideshare.net/brianbrazil/counting-with-prometheus-cloudnativeconkubecon-europe-2017?ref=flightaware.engineering" target="_blank">various subtleties</a> to work out related to missing data or counter resets that Prometheus mercifully takes care of for us. Whereas our in-house tool stored the counters in Postgres and eventually caused unacceptable performance problems, Prometheus counters are only kept in memory for the process incrementing them and then collected and stored by the Prometheus server, which tidily solves that issue. In addition, our in-house tool did not capture dimensionality of data very well but Prometheus, through its use of labels has this feature built into its design. By using Grafana, these counters and their rates of change can be easily and powerfully visualized, thereby solving all of the issues of our in-house tool and bringing it back in a new and vastly improved form.</p>
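<p>The reset subtlety is worth a sketch. A counter only ever increases until its process restarts, at which point it drops back to zero; computing the true increase means detecting the drop and counting the post-reset value from zero. The following Python function is a simplified illustration of what PromQL's <code>increase()</code> and <code>rate()</code> handle for you (it ignores the missing-data cases a real server must also cover):</p>

```python
def increase(samples):
    """Total increase of a counter over a series of scraped samples,
    tolerating resets (the value dropping back toward zero)."""
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            # A drop means the process restarted; count from zero again.
            total += value - prev if value >= prev else value
        prev = value
    return total

# Counter went 0 -> 5 -> 9, the process restarted (back to 0), then -> 4:
# the true increase is 5 + 4 + 0 + 4 = 13, not 4 - 0.
print(increase([0, 5, 9, 0, 4]))
```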
<h4 id="visualizationmodernization">Visualization Modernization</h4>
<p>The second case where Prometheus and Grafana enable a major improvement in systems monitoring for Hyperfeed involves measuring and visualizing the rate of Hyperfeed’s input. In order to provide a global picture of flight tracking, FlightAware combines data from several dozen different sources. A key part of monitoring Hyperfeed, then, is monitoring its input, measuring its total rate and decomposing the rate into subsets of data sources. Before Prometheus and Grafana this was done through Zabbix. While Zabbix could collect data about Hyperfeed’s data without issue, it struggled immensely with visualization given the sheer number of data feeds; consequently, it did not readily support data exploration. Prometheus and Grafana, however, handle this task exceedingly well and easily.</p>
<p>For one thing, in Zabbix, the metrics collected were of the form <code>input.data feed name</code>. This makes querying either the aggregate input rate or a subset of input rates quite clumsy. Prometheus’ data model, on the other hand, concisely encodes this with labels as <code>input{feed="data feed name"}</code>. The total input rate is easily expressed in Prometheus’ query language as <code>sum(rate(input[5m]))</code>, and the per-feed breakdown as <code>sum(rate(input[5m])) by (feed)</code>; querying a subset of feeds is just as simply expressed through a regular expression on the label values. Grafana is tightly integrated with Prometheus and makes it simple to generate graphs for any combination of input data sources. These new tools, then, completely and effortlessly give us the monitoring capabilities lacking in our current approach.</p>
<p>These two specific examples of the benefits of Prometheus and Grafana both in general and relative to our current monitoring stack encourage us in adopting these tools. On top of the aforesaid monitoring upgrades, Prometheus and Grafana also provide a number of additional pluses. For one, we can start using Prometheus incrementally; we do not have to completely get rid of our current monitoring tools before we can move it into production. This is an immense benefit since we already have a lot of alerting in place in Zabbix that we need to carefully transition away from in a measured, piecemeal fashion. At the same time, the development of a Prometheus Tcl library means that it now becomes available to other crews and wings who can use our example as guidance. Grafana has already been in use by other crews, so the buy-in across the company of this potent visualization tool helps make our dashboards more accessible and understandable across FlightAware. As an added benefit, the ease of visualization and the first-class support for Prometheus in Grafana encourages data exploration in ways simply not feasible with our current tooling.</p>
<p>Lastly, since Prometheus was built for distributed, dynamic systems and aims to be operationally simple and conceptually intuitive for the majority of use cases<sup class="footnote-ref"><a href="#fn11" id="fnref11">[11]</a></sup>, it makes it possible for our relatively small team to spend more time on application development and less time tinkering with and tending to our monitoring infrastructure. While still at the adoption phase for Prometheus, we have already started reaping its benefits: the future looks promising and we will continue employing it for gaining the insights and observability we need for maintaining Hyperfeed’s reliability and operational health as it continues to evolve and progress.</p>
<h2 id="nopanacealimitationsofprometheus">No Panacea: Limitations of Prometheus</h2>
<p>Thomas Sowell once asserted that “there are no solutions, only trade-offs.”<sup class="footnote-ref"><a href="#fn12" id="fnref12">[12]</a></sup> Sowell wrote those words in a context far removed from systems monitoring, but his aphorism applies remarkably well to software and systems engineering. In the case of Prometheus, although the enthusiasm for its use is high and well deserved, it is important not to get carried away with optimism and fail to recognize that, like all other technologies, it has its drawbacks and cannot be the solution to every monitoring problem. Central to the limitations of Prometheus is its choice to use metrics for monitoring. Metrics, although incredibly powerful and useful for systems monitoring, necessarily leave out a lot of details about the events in a system. For debugging or post-hoc analyses these details are often exactly what is needed, which is why logging is an important supplement to metrics. Driving the point even further, all metrics-based solutions have to deal with the problem of cardinality. Since metrics aggregate events to cope with the deluge of events and states in a system, they cannot proliferate indefinitely. Labels on Prometheus metrics, for a specific instance of this issue, end up creating a separate time-series for every distinct combination of label values. This means, then, that one of the most compelling parts of the Prometheus data model cannot be used without caution: the total number of values for a label must be kept finite and within reasonable bounds to keep the system responsive and functional. With this restriction in mind, Prometheus, while incredibly useful for a variety of monitoring and analysis tasks, cannot cope with a lot of questions that might be asked of a system. This is part of the reason why the Predictive Technology crew, which needs to monitor machine learning predictions for quality, and at a very granular level, could not use Prometheus.</p>
<h3 id="overviewofpredictivetechnology">Overview of Predictive Technology</h3>
<p>The primary objective of the Predictive Technology crew is producing flight ETA predictions for both landing times (EON) and gate arrival times (EIN). For any given airport, we train two models, one for EON and one for EIN, which make predictions for flights inbound to that airport. These models are then evaluated in real-time by the “streamer,” which builds input samples, feeds those samples to the models, and sends the resulting predictions to Hyperfeed. Each streamer typically supports up to 20 airports at a time, so to gain coverage we run multiple copies of the streamer in parallel.</p>
<p>Because of the heavy use of machine learning in this software, monitoring became even more critical than normal. Not only do you need to manage typical software conditions, like whether the system is running, hung, or stopped, you also need to track the accuracy of the machine learning models themselves. Models can have short bursts of inaccuracy, e.g. if they fail to anticipate the impact of a huge storm; they can also drift slowly over time as real-life behaviors change. Being aware of both of these types of changes is critical to having confidence in your predictions.</p>
<p>This time last year, the Predict crew was relatively new at FlightAware and its production services were small and simple. There was only one customer that used predictive times, and supporting them required running a single model. At this scale, it didn’t make sense to instrument a whole monitoring solution; instead, the streamers monitored their own accuracy and reported metrics to log files. A human could read them easily and verify that the one critical model was performing as expected.</p>
<p>This was abruptly rendered inadequate when we gained customers that required us to produce predictions for 600 airports. High availability was a critical component of the service, and to aid in this we chose to run models redundantly. In total, this scaled up our production infrastructure from a few hosts to 24 hosts, totaling 120 streamers. It vastly increased the number of critical models to keep an eye on. Software complexity was added when we needed to introduce prediction rate-limiting in Hyperfeed to ensure the customers weren’t overwhelmed by messages updating an ETA by only a few seconds. With all these changes, it was obvious that looking through streamer log files wasn’t going to cut it anymore; we needed a more comprehensive approach.</p>
<h3 id="goalsforthenewmonitoringsolution">Goals for the New Monitoring Solution</h3>
<p>There were several features we wanted to ensure our new system included that would solve our pain points:</p>
<ul>
<li>Track error over time on a per-airport basis</li>
<li>Track error changes over the course of each flight: e.g. what was the prediction error two hours before the flight landed? One hour before the flight landed?</li>
<li>Auto-discovery of metrics: for example, when an airport is passed in as a key, a series is added if that airport hasn’t been seen before</li>
<li>Query and display graphs reflecting prediction error</li>
<li>Alert on high (or higher-than-normal) error</li>
</ul>
<p>Our first thought was to use Zabbix for this. Most systems at FlightAware report in to Zabbix in one way or another; we were already using it to track the latency of the streamers. It also has the built-in graphing and alerting capabilities we were looking for.</p>
<p>However, we quickly realized that Zabbix was not the tool for the job. While it has some auto-discovery features, we found them difficult to use, especially in our case of needing &gt;1200 error metrics. More critically, our Systems crew had serious doubts that Zabbix would be able to handle the load of all the metrics we wanted to track. We make predictions for around 75,000 flights per day; if we only stored two error values per flight (far fewer than we wanted), it would require making around 100 inserts per minute.</p>
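<p>The back-of-the-envelope arithmetic behind that load estimate:</p>

```python
flights_per_day = 75_000
values_per_flight = 2  # a deliberately low floor; we wanted far more
inserts_per_minute = flights_per_day * values_per_flight / (24 * 60)
print(f"{inserts_per_minute:.0f} inserts/minute")
```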
<h3 id="monitoringwithgrafana">Monitoring with Grafana</h3>
<p>These issues with Zabbix led us to look into Grafana as an alternative monitoring solution. Right off the bat we were impressed by its visualization and alerting capabilities. We liked that it supported a number of different database backends, and its extensibility through variables seemed like the perfect way of handling all of our different metrics.</p>
<p>Like the Aviation Platform crew, we first explored using Prometheus as the backend. It was easy to get up and running, and with its built-in visualization tools, it was trivial to explore the data from day one. However, we quickly started running into limitations; the first, and ultimately the deciding factor, was that Prometheus is a pure time-series database. It has no support for relational queries (SQL JOINs). With all of the visualization opportunities that Grafana provides, we no longer just wanted to track the error for each flight—instead, we wanted to record each prediction that was made, and the eventual arrival time of the flight, and join them together later to get the error over time. Our other major concern was the number of labels, and the number of values for each label, that we wanted to store. We wanted to store ON and IN predictions for 600 airports, at least 10 times throughout each flight—this works out to 12,000 different label combinations, each of which would have to be stored as a time series, which Prometheus is not currently designed to handle.</p>
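<p>The cardinality concern comes down to simple multiplication, since every distinct combination of label values becomes its own stored time series:</p>

```python
airports = 600
prediction_types = 2   # ON (landing) and IN (gate arrival)
times_per_flight = 10  # at least 10 sample points over each flight

series_count = airports * prediction_types * times_per_flight
print(series_count)  # 12000 distinct label combinations
```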
<p><img src="https://flightaware.engineering/content/images/2020/08/grafana_screenshot-4.png" alt="grafana_screenshot-4" loading="lazy"></p>
<p><img src="https://flightaware.engineering/content/images/2020/08/grafana_screenshot_2.png" alt="grafana_screenshot_2" loading="lazy"></p>
<p>We ended up picking TimescaleDB as the database backend instead. TimescaleDB is a time-series extension to PostgreSQL: it chunks time-series tables behind your back, which gives queries a pretty nice performance boost compared to pure Postgres (see this <a href="https://medium.com/timescale/timescaledb-vs-6a696248104e?ref=flightaware.engineering" target="_blank">Medium article</a> for more details). In all other ways, it is “just” Postgres—this was ideal, not only because it supports relational queries, but also because FlightAware was already using Postgres for the majority of its databases. We already had a lot of institutional knowledge and familiarity with how to set it up, configure it, tune it, and so on, which made it easy to adopt.</p>
<p>With our choice made, we were able to set up an optimal schema for our problem. Relational queries allowed us to store arrival times in one table and predicted ETAs, from both our machine learning models and third-party estimates, in another. With Grafana, we alert on high ETA error on a per-airport, per-airline, and per-time-out basis; we can compare our predicted ETAs directly to third-party estimates. It gives us confidence in our predictions and provides direction for improvement.</p>
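<p>To illustrate the kind of relational query this schema enables (the table and column names here are hypothetical sketches, not our production schema), Python's built-in SQLite can stand in for TimescaleDB/Postgres:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE predictions (
        flight_id TEXT, airport TEXT, predicted_on INTEGER, made_at INTEGER
    );
    CREATE TABLE arrivals (flight_id TEXT, actual_on INTEGER);
""")

# Hypothetical data; times are seconds since some epoch, for brevity.
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?)", [
    ("FA1", "KIAH", 3600, 0),     # early prediction
    ("FA1", "KIAH", 3660, 1800),  # revised prediction later in flight
    ("FA2", "KJFK", 7200, 0),
])
conn.executemany("INSERT INTO arrivals VALUES (?, ?)", [
    ("FA1", 3630),
    ("FA2", 7260),
])

# JOIN each prediction to the flight's eventual arrival to get error per
# airport -- exactly the query shape a pure time-series store cannot express.
rows = conn.execute("""
    SELECT p.airport, AVG(ABS(p.predicted_on - a.actual_on)) AS mean_error_s
    FROM predictions p JOIN arrivals a USING (flight_id)
    GROUP BY p.airport ORDER BY p.airport
""").fetchall()
print(rows)  # [('KIAH', 30.0), ('KJFK', 60.0)]
```

<p>Grafana runs queries of this shape directly against the database and graphs the result, so the error-over-time panels fall out of the schema for free.</p>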
<h3 id="footnotes">Footnotes</h3>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>For further details about the inner workings of Hyperfeed, <a href="https://pdfs.semanticscholar.org/16e5/dffb63304b0648033afc4713b981700d73d0.pdf?ref=flightaware.engineering">consult Zach Conn’s excellent paper</a> presented at the 23rd Tcl Conference. Major architectural changes have occurred since that paper was written, but most of its content is still applicable and current. <a href="#fnref1" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p>For a more general introduction to monitoring distributed systems, a good starting place is <a href="https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/?ref=flightaware.engineering">the SRE book’s chapter on monitoring</a>. A revealing quote about the inherent challenges of monitoring a complex system: “Even with substantial existing infrastructure for instrumentation, collection, display, and alerting in place, a Google SRE team with 10–12 members typically has one or sometimes two members whose primary assignment is to build and maintain monitoring systems for their service.” <a href="#fnref2" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn3" class="footnote-item"><p>Leslie Lamport wryly described a distributed system as “one in which the failure of a computer you did not even know existed can render your own computer unusable.” While humorous, the truth at the heart of his quip—that distributed systems introduce new, frequently inscrutable errors—is undeniable to anyone who has worked on distributed systems. <a href="#fnref3" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn4" class="footnote-item"><p>Whitebox monitoring comes directly from the system being monitored. In the case of Prometheus it comes straight from the code itself. This is opposed to blackbox monitoring, which was our focus before Prometheus. Blackbox monitoring comes from an outside observer of a system. It is what a client would see when interacting with the system or what can be observed from the system’s output. <a href="#fnref4" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn5" class="footnote-item"><p>In contrast to metrics, other types of monitoring include profiling, which collects a lot of high-fidelity information for a short period of time; logging, which can mean a lot of things but typically contains more dimensions and detail than metrics (and requires <a href="https://www.elastic.co/what-is/elk-stack?ref=flightaware.engineering">more processing and storage</a>); and traces, which save a lot of detail for a subset of requests and help especially in microservice designs. Every monitoring type has to make a foundational decision about how much detail to store and process. <a href="#fnref5" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn6" class="footnote-item"><p>Prometheus can just as easily support blackbox monitoring, but its potency derives from the ability to collect metrics directly in an application’s code. <a href="#fnref6" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn7" class="footnote-item"><p>This Tcl library will be released to FlightAware’s public Github in the near future. Once that occurs, this footnote will be updated accordingly. <a href="#fnref7" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn8" class="footnote-item"><p>In the monitoring world, particularly when it comes to time-series metrics, there is a semi-religious debate about the relative merits of push vs. pull systems, where the distinction comes from how the metrics get to the monitoring system. Since a Prometheus server requests metrics over HTTP it is a pull system. Ultimately the difference is not all that important, but, for an interesting discussion, <a href="https://prometheus.io/blog/2016/07/23/pull-does-not-scale-or-does-it/?ref=flightaware.engineering">see this blog post</a> by a core Prometheus developer about the topic. <a href="#fnref8" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn9" class="footnote-item"><p>In the book Prometheus: Up &amp; Running, Brian Brazil, founder of the Prometheus consultancy Robust Perception, recommends no more than 10k metrics per process. So there are limits, but that is, thankfully, an impractically high number for many cases. <a href="#fnref9" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn10" class="footnote-item"><p>While our in-house tool only provided a counter, Prometheus also provides a gauge, which is a value that can go up or down, as well as cumulative histograms, so it replaces our old tool and adds even more. <a href="#fnref10" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn11" class="footnote-item"><p>A Prometheus server is meant to run as a single, statically linked binary per data center (or whatever failure domain makes sense) without reliance on distributed storage or consensus algorithms. One server can comfortably handle thousands of machines' metrics in a typical setup. There are also means of scaling out a deployment further if necessary. <a href="#fnref11" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn12" class="footnote-item"><p>Sowell, Thomas. A Conflict of Visions. Basic Books, 2007. <a href="#fnref12" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
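The distinction between the metric types mentioned in the footnotes (a counter that only goes up, a gauge that goes up or down, and a histogram with cumulative buckets) can be illustrated with a toy sketch. This is plain Python standing in for the concepts; the real Prometheus client library's API differs:

```python
import bisect

class Counter:
    """Monotonic: may only increase (e.g. total requests served)."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

class Gauge:
    """May go up or down (e.g. current queue depth)."""
    def __init__(self):
        self.value = 0.0
    def set(self, value):
        self.value = value

class Histogram:
    """Cumulative buckets: each bucket counts observations <= its bound."""
    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        self.counts = [0] * (len(self.bounds) + 1)  # final bucket is +Inf
    def observe(self, value):
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
    def cumulative(self):
        total, out = 0, []
        for c in self.counts:
            total += c
            out.append(total)
        return out

# Four latency observations against bounds 0.1s, 0.5s, 1.0s, +Inf.
h = Histogram([0.1, 0.5, 1.0])
for latency in (0.05, 0.3, 0.3, 2.0):
    h.observe(latency)
print(h.cumulative())
# prints: [1, 3, 3, 4]
```

The cumulative form is what Prometheus actually exposes: each bucket's count includes every smaller bucket, which lets the server estimate quantiles from any subset of buckets.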
<!--kg-card-end: markdown--><hr><h2 id="about-the-authors">About the authors </h2> 
        <br>
        <p>
            <a href="https://flightaware.engineering/systems-monitoring-with-prometheus-grafana/">Systems Monitoring with Prometheus and Grafana</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ The Future Leaders of FlightAware. Part 1: Structuring the Manager’s Path ]]></title>
        <description><![CDATA[ This is the first in a series of posts about growing the future leaders of FlightAware. ]]></description>
        <link>https://flightaware.engineering/the-future-leaders-of-flightaware-part-1-structuring-the-managers-pathled/</link>
        <guid>https://flightaware.engineering/the-future-leaders-of-flightaware-part-1-structuring-the-managers-pathled/</guid>
        <pubDate>Tue, 21 Jul 2020 13:12:58 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/07/mountain.jpg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>Chadd Mikulin is the Director of Engineering for Data and Software Services at FlightAware. For over fifteen years, he has helped grow and promote leaders in the organizations of which he’s been part.</em></p><p>This is the first in a series of posts about growing the future leaders of FlightAware. In this first installment, we’ll cover why this is important to us as a company and some high-level strategies we employ. In future posts, we’ll cover the details of some of those strategies.</p><h2 id="investing-in-our-own">Investing in Our Own</h2><p>FlightAware is growing. We’re constantly looking to add great people to our company. Finding them is one problem, but once you do find them, to whom are they going to report? Any substantial growth requires an expansion of the overall structure, with new teams and people to run those teams.</p><p>This is a great problem to have. You get an opportunity to do some noodling on how you should structure your organization and how it should work to produce results. See more about that in the blog post, <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/">Crews, Wings and Alliances. Part 1: The Principles of How We Work</a>. After that, though, where do you find the managers to fill out the different positions that are created by the expanding organization?</p><p>At FlightAware, our goal is to promote from within as often as we can. The reasons? They already know a tremendous amount about the organization, so you don’t have to spend time to ramp them up. They understand the company culture and the values we want to promote because they’ve been living in it. They have also already built a network within the organization which allows them to know who to go to for what and have existing relationships that can be leaned on as they work to get things done. Finally, they already have a reputation for delivering results. 
They are already respected across the part of the organization they deal with; the scope of which could be their current, small team, or, if you’ve been doing your job and getting them broader exposure, a significant part of the organization.</p><p>However, just because you promote someone into a position doesn’t mean they have all the skills they need to be effective. But what does it really mean to be effective? At FlightAware, to be effective as a manager, you must create a safe and productive environment that makes your directs want to continue to achieve. We want everyone at FlightAware to be able to share ideas with their coworkers and to not hesitate to do so. When someone makes a mistake, we want them to recognize it, own up to it, and work to figure out a solution, not worry about getting blamed for causing a problem. And we want everyone to be productive and find their job fulfilling. These methodologies allow us to retain the great talent we have, and our management team is charged with making sure that we’re fostering the environment that will make that happen.</p><p>Imagine the benefits of this line of thinking, not only for the company, but for the employee. When I was promoted into my first management job, I definitely was not ready. I never received any formal training or had even been recommended to read anything on the subject. Many of the mistakes I made early on could have been avoided with the right support system. Like the company I got my first management position at, many organizations don’t do a great job of getting people ready for the next step in their careers. We don’t want FlightAware to be one of those organizations, so how do we make sure someone is set up for success? We think that anyone who expresses interest in pursuing management should get their feet wet well before it’s time to manage people day to day. 
We want to be able to have people on deck who can take on a leadership role when an opportunity arises and SUCCEED.</p><h3 id="the-path">The Path</h3><p>So how do we set potential managers up for success? We have created an opportunity for potential future leaders to gain some of the knowledge they need through a formal training course. We think there are a lot of qualities that someone needs to be a good leader, and we want to give people structured opportunities to become one before they are called upon to do so.</p><p>Let’s talk about the leadership opportunities that everyone at FlightAware has as they progress in the engineering organization. We follow this process so that our employees can gain valuable skills and, ultimately, make an informed decision about whether they want to take on a management role. We want to start them off small and build up their leadership skills prior to taking on the responsibility of being in charge of an entire team’s growth and retention.</p><h3 id="mentorship">Mentorship</h3><p>First, we give them the opportunity to mentor folks who are new to FlightAware. Having a good onboarding process is important for a short and successful ramp-up period (and is a blog post in itself). Providing a mentor for that new hire is a crucial part of that. Being a mentor helps develop the ability to build relationships and offer support to someone who needs help, while still letting the mentee learn things on their own. It also helps the mentor with time management skills that they’ll need later as they take on more leadership responsibility.</p><h3 id="formal-leadership">Formal Leadership</h3><p>Next, we encourage them to act as Coordinator for one of the big learning initiatives that are constantly going on at FlightAware, which we call Alliances: small groups of people focused on learning about a specific topic that meet regularly to discuss their findings and share related work. 
Acting as coordinator gives them a low-stakes way to plan work and coordinate the activities of others. They can create a reading schedule and assign the work for different sections. They also will likely need to work to get buy-in and make adjustments to the schedule as people’s availability changes. Both are crucial skills for leaders.</p><p>After that, we like to have them act as tech lead on projects that take more than one person, either within their Core Crew or on a Crew that encompasses members from different Wings. This starts to give them an idea of what it means to be responsible for a group’s output and really helps them flex their leadership muscles.</p><h3 id="directed-training">Directed Training</h3><p>Once they have some practical leadership experience under their belts and decide they want to pursue it as a next step in their career, we want to give them a good primer on the theory of what it means to manage people full-time. That’s where the formal training course comes in. Being responsible for a group’s technical output is one thing; being responsible for their career growth and retention is something different. They need to know how to conduct 1:1s, a crucial tool for successful relationship building, how important feedback is and how to give it, and a host of other methods that will help them to understand the behaviors they need to go through. And that’s the secret. You <em>can</em> learn to be a successful manager. It doesn’t have to come naturally. You just have to know what behaviors to engage in and do them reliably. 
That’s what we work to teach in the formal training program.</p><h2 id="delving-into-directed-training">Delving into Directed Training</h2><p>In the next installment, we’ll discuss the brass tacks of our directed training program: the format for the course, the curriculum we use, and the individual pieces' pros and cons, so that you might choose whether they’re something you’d like to incorporate into your leadership training initiative.</p> 
        <br>
        <p>
            <a href="https://flightaware.engineering/the-future-leaders-of-flightaware-part-1-structuring-the-managers-pathled/">The Future Leaders of FlightAware. Part 1: Structuring the Manager’s Path</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Driving Reliability at FlightAware ]]></title>
        <description><![CDATA[ In this post, we’ll take a quick look at what SRE is, why we chose to go in this direction, and how this journey has changed our incident response processes for the better. ]]></description>
        <link>https://flightaware.engineering/driving-reliability-at-flightaware/</link>
        <guid>https://flightaware.engineering/driving-reliability-at-flightaware/</guid>
        <pubDate>Thu, 02 Jul 2020 11:00:00 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/07/photo-1532598618269-1f8b82e0e0e5-1.jpeg" medium="image"/>
        <content:encoded><![CDATA[ <!--kg-card-begin: html-->  <p><em>Sean Kelly is FlightAware’s Senior Director of IT Operations and Reliability </em></p>
  <p>Recently at FlightAware, we began embarking down the path of converting our traditional IT Operations team and practices into Site Reliability Engineering. In this post, we&#x27;ll take a quick look at what SRE is, why we chose to go in this direction, and how this journey has changed our incident response processes for the better.</p>
  <h2><strong>Introduction</strong> </h2>
  <p>First, we should cover the basics and make sure we’re clear on what Site Reliability Engineering (SRE) even is. It is a discipline where software engineering practices are used to solve what would otherwise be traditional Operations problems. For example, an SRE might automate the testing, deployment, validation, and rollback of a software deployment. In the extreme opposite case, an Operations engineer may instead log in to a server, run an installer by hand, and then call the developer if it breaks. This is an extreme example, but it is meant to illustrate a sort of mindset difference in problem solving. </p>
  <p>SRE is still fairly new, so many organizations play fast and loose with what it means to them or which pieces they apply. At FlightAware, we’ve opted to model SRE using the Google SRE methodologies they write about in <a href="https://landing.google.com/sre/sre-book/toc/index.html?ref=flightaware.engineering">their book</a>, but within reason. We have fewer servers and less cloud, so some aspects can be harder to apply while others may just need some scaling down. Some of our practices were already aligned with the SRE approach, but officially adopting it has brought more intent and direction to our trajectory. </p>
  <p>Incident response is another important piece of SRE. It covers both the incident itself and the postmortem process afterwards. Google writes about incident response in their book, and other organizations like <a href="https://response.pagerduty.com/?ref=flightaware.engineering">PagerDuty</a> and <a href="https://www.atlassian.com/incident-management?ref=flightaware.engineering">Atlassian</a> also discuss it at length. This increased structure around response to problems is the aspect of SRE we’ve embraced the most thus far. </p>
  <h2><strong>But Why?</strong> </h2>
  <p>As FlightAware has grown, the way customers use our services has also grown. We now have customers that use us as a key function of their operations. They want increasingly higher reliability since our service disruptions can have a significant impact on them. Our uptime has always been quite good, but we’ve never had processes or practices in place to hold us to that high standard as teams grew larger. What was easy to do through institutional knowledge with a team of ten becomes a lot harder with 12+ teams of varying sizes and focus areas. You need a way to spread best practices within a growing Operations team as well as continue to push the bar higher. </p>
  <p>When I joined FlightAware in 2012, I was hired on as the IT Operations person. While my past is very Ops-focused, I also have experience writing software and approaching problems from a programming perspective. This is true for many of the folks making up our IT Operations team. SRE is a developer-oriented approach to Operations problems, so we were well suited to pivot the existing team to this new approach. FlightAware is a software company, and this allowed us to embrace that existing talent in the team without having to re-hire or extensively re-train. </p>
  <p>Given all of this and its increased prominence in the industry, SRE was an obvious choice for us. </p>
  <h2><strong>Houston, We Have an Incident</strong> </h2>
  <p>Before adopting SRE, we had already begun to work on improving our incident response processes. However, the methodologies laid out in various SRE texts gave us much more guidance and direction for making incidents better. Historically, in an outage, the on-call would get a robocall and dig in. If they got stuck, they’d escalate to colleagues. After it was all over, we often did brief write-ups on what actually broke. That was about the extent of it from start to finish. Much of this was fine when the company was small but started to falter over time, especially as both internal and external stakeholders wanted more understanding of what broke and how we were working to mitigate it. </p>
  <p>Before we go any further, I’d like to set some context for future examples in this post. You may know FlightAware as a website for tracking flights. While this is absolutely true, we also have an entire portfolio of other products and services. One of our products is called Firehose. Firehose streams JSON-encoded flight events to TCP-connected clients, allowing real-time ingestion of events into customers’ systems. Put more simply, Firehose sends messages like departures, arrivals, and flight positions to customers so they can integrate FlightAware into their products. </p>
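A stream of newline-delimited JSON events over TCP like this can be consumed with very little code. The sketch below is generic and hypothetical; it does not show Firehose's actual connection, authentication, or message format:

```python
import json

def read_events(stream):
    """Yield decoded flight events from newline-delimited JSON text lines.

    `stream` is any iterable of text lines, e.g. the file object returned
    by socket.makefile() on a connected TCP socket.
    """
    for line in stream:
        line = line.strip()
        if line:  # skip any blank keepalive lines
            yield json.loads(line)

# In real use, the stream would come from something like:
#   sock = socket.create_connection((HOST, PORT))  # HOST/PORT hypothetical
#   stream = sock.makefile("r", encoding="utf-8")
# Here we feed it an in-memory sample instead.
sample = [
    '{"type": "departure", "ident": "UAL123"}\n',
    '{"type": "position", "ident": "UAL123", "alt": 35000}\n',
]
for event in read_events(sample):
    print(event["type"], event["ident"])
```

Treating the stream as an iterable of lines keeps the parsing logic testable without a live network connection.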
  <p>Throughout this post, I’ll be abusing Firehose by running it through all kinds of failure scenarios to illustrate how we handle incidents. Don’t worry, no Firehoses were harmed in the publishing of this post. </p>
  <p>Let’s break Firehose now as a hypothetical example and see how our pre-SRE Systems Engineers would respond. </p>
  <h3><strong>Firehose Is on Fire</strong> </h3>
  <p>It is 3:07AM and Archer’s phone jolts him out of bed. He logs in to Slack and Zabbix to see what is going on. He sees that there are 60 Zabbix alarms firing, which essentially boil down to Firehose not passing data to customers. </p>
  <p>After confirming the situation, Archer runs the program for notifying customers. Notifications are generated and sent, so Archer continues trying to figure out what is broken.</p>
  <p>He finds through a bit of investigation that none of the Firehose services are receiving messages from their upstream feed service. He logs into the upstream service and investigates. Eventually he tries restarting the service to no avail. </p>
  <p>Archer is stuck. It appears Firehose isn’t actually broken; rather, an upstream service is. He knows Lana is on that team, that she knows a lot about the service, and that she always answers the phone. He gives her a call at 3:25AM, waking her up for the third week in a row. He doesn’t know it, but she is actually on vacation. Despite that, she still answers and offers to help. </p>
  <p>Lana jumps on, putting her vacation on pause, and they both continue to investigate. A few times, they discover that they are investigating the same aspect of the problem. Once, they both restart the same services seconds within each other. They try to coordinate on Slack, but their primary focus is resolving the problem, so coordination isn’t firing on all cylinders. </p>
  <p>Finally, between the two of them, they get things going again. Archer sighs in relief as he sends a notice to customers that service is restored. In the morning, he’ll write up exactly what broke so there is a record of what happened and something to work off of in the event customers ask questions. He’ll also need to reach out to Lana to capture what she did and work out which changes actually cleared the issue. </p>
  <h3><strong>How We Have Changed</strong> </h3>
  <p>These days, we treat every failure as an incident. Full service outages and minor internal-only issues are all incidents. To prevent treating everything like a raging inferno, we classify based on severity and impact: </p>
  <ul role="list">
    <li><strong>SEV-1</strong>: Critical issue that warrants public notification and liaison with executive teams. An example would be that all our Firehose instances are down across all data centers. </li>
    <li><strong>SEV-2</strong>: Critical system issue actively impacting many customers’ ability to use the product. An example here could be that no new customer connections can be established to Firehose, but existing ones are still working. This example also has a high likelihood of escalating to a SEV-1 as the impact grows due to customer connection churn. </li>
    <li><strong>SEV-3</strong>: Stability or minor customer-impacting issues that require immediate attention from service owners. This could be the total failure of Firehose at a single datacenter. Customers can still connect to Firehose at other locations and, in many cases, will already have an active redundant connection at another site. </li>
    <li><strong>SEV-4</strong>: Minor issues requiring action, but not affecting customer ability to use the product. This could be a single failed Firehose instance. The customers can still connect to other ones. </li>
    <li><strong>SEV-5</strong>: Issues or bugs not affecting customer ability to use the product. This could be the loss of a single disk in a Firehose server. It doesn’t impact the service, but we are at an increased risk of failure. </li>
  </ul>
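A severity scheme like this is simple to encode so that tooling can act on it consistently. The following is a hypothetical sketch; the triggering rules are illustrative inventions, not FlightAware's actual automation:

```python
from enum import IntEnum

class Severity(IntEnum):
    """Incident severity levels; a lower number means more severe."""
    SEV1 = 1  # critical: public notification, executive liaison
    SEV2 = 2  # critical: many customers actively impacted
    SEV3 = 3  # stability issue or minor customer impact
    SEV4 = 4  # action needed, customers unaffected
    SEV5 = 5  # bug or lost redundancy, no service impact

def requires_postmortem(sev: Severity) -> bool:
    # SEV-1 and SEV-2 automatically go through the postmortem process.
    return sev <= Severity.SEV2

def notify_customers(sev: Severity) -> bool:
    # Hypothetical rule: notify whenever customer-facing impact exists.
    return sev <= Severity.SEV3

print(requires_postmortem(Severity.SEV2), notify_customers(Severity.SEV4))
# prints: True False
```

Encoding the policy this way means a paging or chat-ops tool can decide who to summon and what to kick off directly from the severity chosen by the on-call.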
  <p>Any SEV-1s and SEV-2s automatically require a trip through our postmortem process, but more on that later. </p>
  <p>When the on-call gets summoned, they take on the responsibility for resolving the issue or escalating the incident response process to get more eyes on the problem. While working an incident, we also have several defined roles to keep responsibilities clear. By default, the on-call owns all roles until they delegate one or more out. The roles we have are as follows: </p>
  <div class="w-embed w-iframe w-script">
    <ul>
      <li><b>Incident Commander</b>: Coordinates the incident response. Ensures things move forward, arranges for any needed resources to be obtained, and otherwise keeps things unblocked.
      </li>
      <li><b>Communications</b>: Provides updates to customers and internal stakeholders.</li>
      <li><b>Scribe</b>: Documents the incident, ensuring we have a timeline and records any changes.</li>
      <li><b>Worker</b>: Researches and responds to the incident. These are the ones digging through logs, running tests, and trying to figure out why Firehose has raised an incident.
      </li>
    </ul>
  </div>
  <p>In our scenario above, Archer was the incident commander, communications, scribe, and worker. When Lana got on the scene, she also joined in a worker role. </p>
  <p>A problem you may have noticed was that Archer just reached out to somebody he knew and trusted on another team for help. Given 100 issues of this nature, it is likely that Archer would reach out to Lana 100 times, making her a permanent and ad hoc on-call. </p>
  <p>To solve this problem, we traded in our homemade on-call software implementation in favor of PagerDuty. PagerDuty made it a breeze to implement on-call rotations for key teams throughout Engineering that may need to be called on for their expertise. So now, Lana knows when she can expect a call, and she’s not alone in the rotation. </p>
  <p>So, with just these changes, let’s take a look at our incident again. </p>
  <h3><strong>Firehose Is on Fire: Again From the Top</strong> </h3>
  <p>It is 3:07AM and Archer’s phone jolts him out of bed. He logs in to Slack and PagerDuty to see what is going on. He sees that there is a Firehose incident rolling up from 60 Zabbix alarms. They seem to essentially boil down to Firehose not passing data to customers. But the PagerDuty incident actually indicates the issue isn’t Firehose, but the upstream feed generator. Archer is Incident Commander, along with all other roles. He marks the incident as a SEV-1 since all Firehose customers are impacted. </p>
  <p>Because customers are impacted, Archer uses the same procedure as before to notify customers of the incident. In the new parlance, Archer is acting as the Communications role while doing this. Because this is a SEV-1, he also escalates a notification to leadership using the Response Play feature in PagerDuty. </p>
  <p>Like before, Archer spends a few minutes looking into the root cause of the problem, using the PagerDuty data as a guide. Once he realizes he’s got a big problem, he uses another Response Play to summon another SRE. Woodhouse shows up on the scene and takes on the Worker role. </p>
  <p>In his Incident Commander (IC) role, Archer keeps tabs on what Woodhouse is doing. He’s also still wearing the Communications hat, so he keeps customers and internal stakeholders updated on the status. He may also seek further backup so somebody can take the Scribe role, documenting what is happening in real time. When Woodhouse tells Archer that the problem is upstream and they need help, Archer escalates to the right team to get a developer involved. Before, he would have just dialed Lana since she is very responsive. However, now there is an on-call rotation that is leveraged instead. That is where Pam comes in, pulled in programmatically as the active member in that team’s rotation. </p>
  <p>Pam and Woodhouse collaborate on the problem, keeping Archer in the loop on what is being done. Archer is responsible for ensuring they are making progress and not doing conflicting or duplicated work. Unlike before, Archer ensures they aren’t both restarting the same services. </p>
  <p>After Pam and Woodhouse resolve the problem, Archer again uses his Communications powers to notify customers and internal stakeholders that they are out of the danger zone. Finally, as the IC, he creates a Jira to launch the postmortem process in the morning and includes the timeline from the Scribe. </p>
  <h3><strong>Incidents Are Better</strong> </h3>
  <p>This process provides a lot more structure and clear delineation of who is responsible for what during an incident. It also has the potential to yield far better documentation of the incident, which will help with the postmortem and improve communication to customers. By expanding the on-call pool, we’ve also made it possible to programmatically escalate to the right people, making it easier to work an incident while also signaling for help. No time is spent war dialing from a company directory; just press the button and let the cloud robots do the work in the background. Since there is an IC role tracking progress, this also helps eliminate the problem where an engineer digs into a problem to the detriment of communications or identifying other paths to resolution. </p>
  <h2><strong>Postmortems</strong> </h2>
  <p>The second most important part of incident response is the postmortem process. (If you didn&#x27;t guess, the first is fixing the problem.) The postmortem process is where we not only diagnose what happened, but also discuss: </p>
  <div class="w-embed w-iframe w-script">
    <ul>
      <li>How can we reasonably prevent this type of problem from happening again?</li>
      <ul>
        <li>Any immediate or urgent actions?</li>
        <li>Any action items that should get rolled into an upcoming release?</li>
        <li>Any longer-term action items that require planning, research, or a prioritized project? </li>
      </ul>
      <li> What went well in the incident response process?</li>
      <li> What didn't go well in the response process? </li>
      <li> What should we do differently next time? </li>
      <li> Did we learn anything? </li>
    </ul>
  </div>
  <p>The postmortem isn’t just about identifying what broke and what was done to fix it. It is an opportunity to reflect, improve, and iterate. Sometimes there aren’t any clear action items. Sometimes bug fixes may need to be done in software releases. Occasionally, we find that something we’ve been doing for a long time is not desirable anymore and we need to make changes. Nothing is sacred, and everything is on the table. </p>
  <p>FlightAware and I have two requirements for postmortems: </p>
  <ul role="list">
    <li>They are blameless. Nobody is at fault for the incident. Humans make mistakes, so you need to accept that it will happen and design systems that are reasonably tolerant of our ways. Nobody is put on the spot or deemed to be the cause of an incident. The time I accidentally shut down the wrong PostgreSQL server and caused an outage wasn’t an opportunity to admonish me but rather to figure out how to make that harder to do. And yes, I did this. </li>
    <li>Postmortems are not punitive. This dovetails with the point above, but is also a harder one to land. When somebody is on the hook to write up a document outlining what happened, it has the potential to feel like a homework assignment. The right atmosphere and communications have to be put in place to instill the value of the postmortem. Writing one actually gives the author a powerful voice in recommending improvements. It is not, and never should be, a punishment. </li>
  </ul>
  <p>Fortunately, the blameless approach has always been a core value of FlightAware engineering. It is a value I communicate as much as possible. </p>
  <p>Our postmortem process is still evolving, but currently consists of the following phases: </p>
  <ul role="list">
    <li><strong>Initial writeup</strong>: The incident commander gathers all of the facts and data about the incident. This includes the timeline, cause of the incident, what changes were made to resolve it, start and end time, who was involved, etc. We aim to have this completed and peer reviewed within two business days. If there are action items identified that need to be addressed immediately, they will also be generated and prioritized. </li>
    <li><strong>Deep dive</strong>: Now that the facts are understood and agreed upon, the SRE team does more of an introspective pass. A collaborative discussion is carried out to identify any process weaknesses, patterns, and greater or underlying issues that may need to be addressed. Some medium- and long-term action items may also be captured. </li>
    <li><strong>Review</strong>: Every two weeks, we have a scheduled incident response meeting attended by the Vice President of Engineering, all Engineering group leads, and our SRE lead. Everyone who participated in an incident response since the last meeting is invited for a discussion and review of the incident. This goes beyond Site Reliability Engineering and has participation from developers as well. By this time, much of the incident is ironed out, understood, and has associated action items. This is a last pass for questions, concerns, feedback, and a general way to keep tabs on how we’re doing with incident response. </li>
  </ul>
  <p>Through this process, we endeavor to make our software and services more reliable and resilient. By understanding what leads to a problem and going beyond a basic root cause analysis, we ensure that we are always moving in the right direction and guarding against future problems. This process can also feed external customer communication, since our write-up includes a description of the customer impact.</p>
  <h2><strong>Wrapping Up</strong> </h2>
  <p>Implementing new incident response and postmortem processes at FlightAware has brought much more rigor and structure to handling outages. We’ve gone from an ad hoc process to clearly defined roles and responsibilities. We’ve implemented postmortem processes that capture what broke, but more importantly, identify the actions needed to guard against future failures. </p>
<aside>
  <blockquote>Outages are unavoidable, but there is always room for improvement.</blockquote>
</aside>
  <p>By implementing everything outlined here, we continually iterate on minimizing impact, maximizing lessons learned, and improving software. So far, we’ve seen response times shorten (especially for escalations), visibility for stakeholders increase, and much more cross-team collaboration and ownership during and after postmortems. The impact is clear and positive, and I’m looking forward to seeing how we can continue to raise the bar over time. <br /></p>

<!--kg-card-end: html--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/driving-reliability-at-flightaware/">Driving Reliability at FlightAware</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Crews, Wings, and Alliances. Part 1: The Principles of How We Work ]]></title>
        <description><![CDATA[ This post is the first in a series telling the story of how we evolved our organization to enable growth, foster individuals’ personal career development, and of course, build great things. ]]></description>
        <link>https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/</link>
        <guid>https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/</guid>
        <pubDate>Wed, 17 Jun 2020 16:36:00 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/06/photo-1456428746267-a1756408f782.jpeg" medium="image"/>
        <content:encoded><![CDATA[ <p><em>James Sulak is FlightAware’s VP of Engineering.</em></p><p>In early 2019, as our engineering team grew, it was becoming clear that our old ways of working were beginning to stretch and fray. Our engineering team was about 50 people, divided into 11 teams, and we were hiring at a net rate of about 1.5 engineers a month. We knew we had problems to solve to ensure that we could continue to produce high-quality work at a rapid pace.</p><p>This post is the first in a series telling the story of how we evolved our organization to enable growth, foster individuals’ personal career development, and of course, build great things.</p><h2 id="our-challenges">Our challenges</h2><p>We faced several challenges:</p><!--kg-card-begin: html--><ul role="list" style="
    margin-bottom: 20px;
">
  <li><strong>Team size.</strong> Several of our development teams were reaching ten people in size. We think smaller teams work better (this has been codified elsewhere as Amazon’s “two-pizza rule”), so we needed to create new teams. But…</li>
  <li><strong>Our teams were organized by <em>functional competency</em>.</strong> We had a team of web developers, a team of backend developers, a team of mobile developers, etc. This method worked at first because, at our size, this effectively organized us by <em>product</em> as well. But unless we wanted to have a development team of 12+ people, we now had to face down the choice of organizing by functional competency, product, or something else.</li>
  <li><strong>We wanted to do more cross-functional work.</strong> We were struggling when attempting to build new products cutting across competencies—it required a lot of formal coordination <em>between</em> teams to make it happen (think meetings and passing Jira tickets between teams), which slowed things down and led to dropped balls.</li>
</ul>
<!--kg-card-end: html--><p>These are interesting problems! We’d already been inching our way towards a new way of working, but it was time to look more intentionally at what we were trying to accomplish—and along the way, learn more about who we were and who we wanted to be.</p><p>We came up with a solution that captures what we value most in how we work together and evolves with us as we continue to grow.</p><p>What we created were the building blocks of <em>Crews</em>, <em>Wings</em>, and <em>Alliances</em>.</p><p>In this first post, I will talk about how we determined what exactly we wanted and why.</p><h2 id="our-design-process">Our design process</h2><p>Our first step in approaching the problem was to <em>read</em>.</p><p>An experienced software developer knows that the most maintainable code is code you don’t need to write. So we wanted to reuse the work of others as much as possible. Smart people have worked this problem before, and it would be irresponsible not to learn from them.</p><p>We focused our research on a set of books on the topic, and in particular, looked very closely at Spotify’s approach. I’ve included our reading list below under <em>Agile Organizational Design Reading Program</em>.</p><h2 id="our-design-principles">Our design principles</h2><p>Approaching any complex, real-world problem is an exercise in choosing tradeoffs. Plans can (should!) change as the world changes.</p><p>We first focused on creating our design goals, or principles, so we had a framework to guide our decisions now and in the future. With these, we can respond to growth, changing markets, and lessons learned—while keeping sight of our true north.</p><h3 id="our-principles-">Our principles:</h3><h3 id="1-build-delivery-teams"><strong>1. Build delivery teams</strong></h3><!--kg-card-begin: markdown--><blockquote>
<p>“What are some of the advantages of organizing much of a company in a mission-oriented form? There is only one. It is that the individual units can stay in touch with the needs of their business or product areas and initiate changes rapidly when those needs change. That is it. All other considerations favor the functional-type of organization. But the business of any business is to respond to the demands and needs of its environment, and the need to be responsive is so important that it always leads to much of any organization being grouped in mission-oriented units.”<br>
 <br>
Andy Grove, High Output Management</p>
</blockquote>
<!--kg-card-end: markdown--><!--kg-card-begin: html--><div class="blue-box">
<div class="title">Principle 1: Build Delivery Teams</div>
<ul>
<li>Orient our daily work outwards toward the value we deliver to our customers, not inwards toward specific activities or functions.</li>    
<li>Individuals work on autonomous, self-sufficient teams that include all functions to design, develop, and deliver software from idea to launch. Avoid cross-dependencies and handoffs as much as possible.</li>
</ul></div><!--kg-card-end: html--><p>These twin ideas—<em>outward orientation</em> and <em>autonomy</em>—are foundational and complementary.  They both point towards forming teams around what they produce. Teams should face outwards to connect them directly to the end product and customers. And they should be as autonomous as possible so they can serve those customers without waiting on approvals or cross-dependencies. (This is also just a more fun way to work!)</p><p>A small team of engineers can pull off <em>amazing</em> feats. Communication is simple and the mission is clear and things just click and code flows like water.</p><p>But success leads to growth, and growth erodes the effectiveness of this easy informal communication. This isn’t because of a conspiracy of <a href="https://dilbert.com/strip/1995-05-15?ref=flightaware.engineering"><strong>pointy-haired bosses</strong></a>, but because communication in large groups is exponentially more complicated. It requires meetings, emails, and formal tickets. Alignment is harder.  Things slow down. An elephant has a slower metabolism than a mouse.</p><p>So, we decided to reproduce the advantages of a small company as much as possible through forming small cross-functional groups of people working side-by-side on well-defined efforts.</p><p>These groups are known as <em>delivery teams</em>, which is not a concept we invented. For an in-depth discussion, see <em><strong>Scaling Teams</strong>, Chapter 7: Scaling the Organization: Delivery Teams</em>.</p><h3 id="2-optimize-communication"><strong>2. Optimize communication</strong></h3><!--kg-card-begin: markdown--><blockquote>
<p>“The purpose of organization is to reduce the amount of communication and coordination necessary.”<br>
 <br>
Frederick Brooks, The Mythical Man-Month</p>
</blockquote>
<!--kg-card-end: markdown--><!--kg-card-begin: html--><div class="blue-box">
<div class="title">Principle 2: Optimize Communication</div>
<ul>
<li>Avoid knowledge silos. Create structures and routines to ensure alignment and sharing.</li>
<li> Optimize communication by getting the right people together, face-to-face. </li>
  <li> Encourage direct, unscripted collaboration between individuals, outside of regular meetings, and without regard to team structure or who reports to whom. Empower everyone to talk to whomever they need to in order to get their job done.</li>
    <li> Reward candid, respectful conversations about difficult issues. Attack problems, not people.</li>
    </ul></div><!--kg-card-end: html--><p>One thing I’ve come to believe that truly differentiates our approach from that of many engineering teams is that we value communication skills. We expect people on our team to express themselves well in writing and in person, and to have difficult conversations in a way that empowers.</p><p>Communication gets harder as a team grows. The number of possible 1-1 communication channels increases <a href="https://en.wikipedia.org/wiki/Metcalfe%27s_law?ref=flightaware.engineering"><strong>proportionally to the square of the number of people</strong></a>. People know each other less, so it becomes easier to miscommunicate; at the very same time, effective communication becomes more critical. You need new layers—middle managers—that introduce even more possible communication channels and complexity.</p><p>Embracing the first principle, delivery teams, helps minimize the need for formal communication (meetings, tickets, emails, etc.) by enabling people doing the work to <em>talk</em> about a thing informally, in person, when needed, whether they are a developer or a designer or a tester. Everyone on the team is empowered to speak to whoever they need to—including the CEO—to get things done.</p><p>This principle is a reminder that <em>communication</em> is a massive challenge in a growing team, and that anything that we can do to optimize communication is a huge win.</p><h3 id="3-respond-to-change"><strong>3. Respond to change</strong></h3><!--kg-card-begin: markdown--><blockquote>
<p>“Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.”<br>
 <br>
<a href="https://agilemanifesto.org/principles.html?ref=flightaware.engineering">The Principles behind the Agile Manifesto</a></p>
</blockquote>
<!--kg-card-end: markdown--><!--kg-card-begin: html--><div class="blue-box">
<div class="title">Principle 3: Respond to Change</div>
<ul> <li>Respond, with a minimum of friction, to changes in our market, strategy, and priorities.
</li>
  <li> Our organization should be fluid and adaptable—e.g., delivery teams are spun up and down as needed—while keeping an individual’s day-to-day experience consistent and understandable.</li>
    </ul></div><!--kg-card-end: html--><p>One of FlightAware’s competitive advantages—especially in the aviation industry—is our ability to respond quickly to new opportunities. We don’t want to lose that as we grow. So we need the ability to confidently change direction and adapt.</p><p>All that said… while our <em>organization</em> should be <strong>fluid and adaptable</strong>—delivery teams spun up and down as needed—we aim to keep the day-to-day experience of individuals on the team consistent and understandable. Put plainly: we don’t want people to need to change managers to work on new initiatives.</p><h3 id="4-create-a-continuous-learning-culture"><strong>4. Create a continuous learning culture</strong></h3><!--kg-card-begin: markdown--><blockquote>
<p>“Experience is the best teacher, but you have to do the homework.”<br>
 <br>
David H. Maister</p>
</blockquote>
<!--kg-card-end: markdown--><!--kg-card-begin: html--><div class="blue-box">
<div class="title">Principle 4: Create a Continuous Learning Culture</div>
<ul>  
<li>Create an environment that encourages and rewards learning as a team and as self-directed individuals.
</li>
  <li> Build in opportunities for individuals to work on a variety of different projects and efforts throughout their careers.</li>
  <li>Engage in continuous improvement. Continually evaluate how the organization can be more effective. Conduct blameless retrospectives and incident postmortems.</li>
  <li> Encourage teams and individuals to experiment with new ways of working.</li>
    </ul></div><!--kg-card-end: html--><p>The work we do has two end goals:</p><!--kg-card-begin: markdown--><ol>
<li>Deliver <em>products</em> to our <em>customers</em>.</li>
<li>Deliver <em>knowledge</em> to <em>ourselves</em>.</li>
</ol>
<!--kg-card-end: markdown--><p>These are co-equal! Learning, done consistently over time, is an investment that pays off with compound interest. It benefits the individual—it pushes forward skills and knowledge and prepares people to take on greater roles and responsibilities. It benefits the team—over time, we can take on more complex and higher-impact work.</p><p>This is not just about book learning; it is about ensuring we have a framework for people to work on a wide variety of projects and gain a breadth of experience to propel them in their careers.</p><p>This principle is the principle I love most. My journey into software development was a journey of self-education. We are incredibly fortunate that software is a field that doesn’t rely on formal certifications or degrees, and you <em>can</em> learn everything you need through a habit of self-study and an engaged curiosity.</p><p>We want to avoid the lazy temptation to switch on the intellectual autopilot, doing only what is necessary for the day to day, following the immediate course in front of us. We believe in flying the aircraft. Learning is a first-class principle.</p><h3 id="5-deliver-business-value-constantly-continuous-delivery-"><strong>5. Deliver business value constantly (continuous delivery)</strong></h3><!--kg-card-begin: markdown--><blockquote>
<p>“If it hurts, do it more often, and bring the pain forward.”<br>
 <br>
Jez Humble, Continuous Delivery</p>
</blockquote>
<!--kg-card-end: markdown--><!--kg-card-begin: html--><div class="blue-box">
<div class="title">Principle 5: Deliver Business Value Constantly</div>
<ul> <li>Engage in practices and build systems to deliver high-quality software into the hands of users safely, quickly, and sustainably.
</li>
  <li> Favor small, heavily automated releases over large, manual ones.</li>
    </ul></div><!--kg-card-end: html--><p>We have embraced the goal of <a href="https://www.continuousdelivery.com/?ref=flightaware.engineering"><strong>continuous delivery (CD)</strong></a>, which is the ability to deploy software in a routine, automated, and low-drama way, on demand.</p><p>By definition, the more frequently we deploy our software to production, the more value we deliver to our customers. Software that is not in production—not being used—is literally useless. It provides negative value.</p><p>It turns out (and <a href="https://www.continuousdelivery.com/evidence-case-studies/?ref=flightaware.engineering#research"><strong>research supports this</strong></a>) that deploying software more frequently is both <em>easier</em> (you do it more often, so you get more practice), and <em>safer</em> (the delta between versions is smaller, so it’s easier to figure out what went wrong when something does break). And it’s just a more fun and satisfying way to work.</p><p>Achieving the goal of CD requires adopting a host of technical practices, including continuous integration, build pipelines, configuration management, and continuous testing. Those practices are essential, but they are not sufficient. The way we work must support those practices. And so continuous delivery is a first-class principle of our team.</p><h2 id="conclusion">Conclusion</h2><p>Deciding on design principles is one thing. Building something out of them is another.</p><p>In the next part of this series, I will discuss how we made these principles concrete and created the fundamental building blocks of our team: crews, wings, and alliances.</p><h2 id="appendix-the-agile-organizational-design-reading-program">Appendix: The Agile Organizational Design Reading Program</h2><p>Agile organizational design is a complex and wide-ranging topic, and it’s far from a solved problem. Some of the resources we found helpful when creating this:</p><!--kg-card-begin: html--><div class="w-embed"><ul style="
    margin-bottom: 0px;
">
  <li><p><a href="https://agilemanifesto.org/principles.html?ref=flightaware.engineering">The principles behind the Agile Manifesto</a></p>
   </li><li><p><a href="https://www.amazon.com/High-Output-Management-Andrew-Grove/dp/0679762884?ref=flightaware.engineering">High Output Management</a> by Andy Grove. It’s hard to believe that he wrote this book in 1983! This book is a classic without a wasted word.</p>
    <ul>
      <li>Chapter 7. The Breakfast Factory Goes National </li>
      <li>Chapter 8. Hybrid Organizations </li>
  <li>Chapter 9. Dual Reporting  </li>
  
</ul>
</li><li><p><a href="https://www.amazon.com/Scaling-Teams-Strategies-Successful-Organizations/dp/149195227X?ref=flightaware.engineering">Scaling Teams: Strategies for Building Successful Teams and Organizations</a> by Alexander Grosse and David Loftesness. This is probably the closest thing to a canonical read on the topic.</p>
    <ul>
      <li>Chapter 6. Scaling the Organization: Design Principles</li>
      <li>Chapter 7. Scaling the Organization: Delivery Teams</li>
      <li>Chapter 8. Scaling the Organization: Reporting Structure</li>
  
    </ul>
</li><li><p><a href="https://info.thoughtworks.com/download-agile-it-organization-design.html?ref=flightaware.engineering">Agile IT Organization Design </a> by Sriram Narayan.</p>
    <ul>
      <li>Chapter 3. Key Themes</li>
      <li>Chapter 4. Superstructure </li>
      <li>Chapter 5. Team Design </li>
      <li>Chapter 6. Accountability  </li>
  
    </ul></li></ul></div><!--kg-card-end: html--><p>Resources specific to Spotify’s journey:</p><ul><li><a href="https://blog.crisp.se/wp-content/uploads/2012/11/SpotifyScaling.pdf?ref=flightaware.engineering">Scaling Agile at Spotify</a> by Henrik Kniberg and Anders Ivarsson</li><li><a href="https://www.youtube.com/watch?feature=youtu.be&v=4GK1NDTWbkY&ref=flightaware.engineering">Spotify Engineering Culture part 1 Agile Enterprise Transition with Scrum and Kanban 1</a> and  <a href="https://www.youtube.com/watch?v=rzoyryY2STQ&t=3s&ref=flightaware.engineering">Spotify Engineering Culture part 2</a>  by Andreas Tjernsli</li><li><a href="http://blog.kevingoldsmith.com/2014/03/14/thoughts-on-emulating-spotifys-matrix-organization-in-other-companies/?ref=flightaware.engineering">Thoughts on emulating Spotify’s matrix organization in other companies</a>  by Kevin Goldsmith</li></ul><p>Extra credit reading:</p><ul><li><a href="https://medium.com/hootsuite-engineering/guilds-get-stuff-done-together-3d0826209390?ref=flightaware.engineering">Guilds: Get Stuff Done Together - Hootsuite Engineering</a></li><li><a href="https://www.mckinsey.com/business-functions/organization/our-insights/the-helix-organization?ref=flightaware.engineering">Helix Organization</a></li><li><a href="https://www.youtube.com/watch?v=ypEMdjslEOI&ref=flightaware.engineering#t=3855">The role of leadership in software development</a></li></ul> 
        <br>
        <p>
            <a href="https://flightaware.engineering/crews-wings-and-alliances-part-1-the-principles-of-how-we-work/">Crews, Wings, and Alliances. Part 1: The Principles of How We Work</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Smart Tile Grids: Optimizing FlightAware&#x27;s Maps for the Display of Non-Uniform Geographical Data ]]></title>
        <description><![CDATA[ Straying from the beaten path of map tiling in search of better performance. ]]></description>
        <link>https://flightaware.engineering/smart-tile-grids-optimizing-flightawares-maps-for-the-display-of-non-uniform-geographical-data/</link>
        <guid>https://flightaware.engineering/smart-tile-grids-optimizing-flightawares-maps-for-the-display-of-non-uniform-geographical-data/</guid>
        <pubDate>Fri, 29 May 2020 13:10:00 -0500</pubDate>
        <media:content url="https://images.unsplash.com/photo-1519122295308-bdb40916b529?ixlib&#x3D;rb-1.2.1&amp;q&#x3D;80&amp;fm&#x3D;jpg&amp;crop&#x3D;entropy&amp;cs&#x3D;tinysrgb&amp;w&#x3D;2000&amp;fit&#x3D;max&amp;ixid&#x3D;eyJhcHBfaWQiOjExNzczfQ" medium="image"/>
        <content:encoded><![CDATA[ <!--kg-card-begin: html--><p><em>Philip Clifton is a Senior Software Engineer 2 and Team Lead, responsible for guiding the overall implementation of Web technologies at FlightAware.</em></p><p>For many users of FlightAware, one of the most visible and visual experiences on the website is its maps, depicting flight data in various forms—individual flights, traffic to/from airports, and airline fleets. Perhaps the most visible application, however, is the FlightAware <a href="https://flightaware.com/?ref=flightaware.engineering">live map</a>—a product that attempts to depict all en route aircraft around the world. Making this map function presents significant challenges—there can be upwards of 10,000 aircraft en route at any given time, and that’s a <em>lot</em> of data to fetch, process, transport to the client browser, and render. Each of these steps needs to work promptly and efficiently—after all, even the most compelling map depiction will lose some of its impact if users must wait 10+ seconds for the depiction to actually load! </p><h2><strong>Web maps and tiling</strong> </h2><p>Before we get into the meat of these challenges, we’ll briefly discuss some mapping concepts. Web-based map products are ubiquitous these days thanks to tools like Google Maps, and with their popularity has come plenty of standardization in how maps work. Particularly pertinent to this discussion is how maps use <em>tiles</em> to provide a solid user experience. </p><p>When we talk about “tiles,” what we’re really talking about is breaking up data or imagery for a map into small, bite-sized chunks. Let’s use FlightAware’s classic blue base layer as an example. While we certainly have the ability to render all the content as a single large image, in practice we break this up into individual smaller images. 
By doing this, when a user is looking at a small subset of the globe, we can load only the tiles needed for what’s visible in the map viewport. This, in turn, saves unnecessary work and data transport—that is, loading imagery that the user cannot view. </p><p>For most cases where we use tiles on FlightAware maps, we use a three-dimensional coordinate system, which identifies tiles based on their geographical location (X/Y coordinates, corresponding to longitude and latitude) and their zoom level (a Z “coordinate”). The third zoom dimension is important, as it allows us to tailor what’s depicted on a tile to the current zoom level. When a user zooms in closely on an area, we want to show lots of detail—minor roads, airport runways, etc.—but if that same user zooms out to look at the entire US, then depicting every minor road and runway at that level would seriously clutter the map! </p><p>There is, however, a pertinent consequence of this three-dimensional system: the number of possible XYZ combinations grows at an astonishing rate. Each time we increase the zoom level by a whole number, we double the scale of the map. That is, if a distance of one mile took up ten pixels on screen, at the next zoom level it would take up twenty. The consequence of this is that given any viewport size, increasing the zoom by one means we <em>decrease</em> the geographical area depicted in that viewport by a factor of four. </p><p>Meanwhile, every tile image is always a constant pixel size, so by extension, a single tile also covers only ¼ the geographical area of the next-lower zoom level. As a result, each time we increase the zoom level by one whole number, we require four times as many tiles to represent the entire world. Extrapolating this over a typical range of zoom levels, say, 1-20, there are potentially <em>billions</em> of possible tiles. 
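</p>
<!--kg-card-begin: markdown-->
The XYZ scheme described above is easy to compute in both directions. As a rough sketch (illustrative code, not FlightAware's actual implementation), the standard Web Mercator "slippy map" formulas map a longitude/latitude pair to tile coordinates at a given zoom, map a tile back to its geographic bounding box, and make the quadrupling of the tile count per zoom level explicit:

```python
import math

def deg2tile(lat, lon, zoom):
    """Map a WGS84 lat/lon to (x, y) tile coordinates at the given zoom
    (standard Web Mercator / slippy-map tiling)."""
    n = 2 ** zoom  # the world is an n x n grid of tiles at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def tile2bounds(x, y, zoom):
    """Inverse mapping: the (south, west, north, east) bounding box
    covered by tile (x, y) at the given zoom."""
    n = 2 ** zoom
    lon = lambda x: x / n * 360.0 - 180.0
    lat = lambda y: math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return (lat(y + 1), lon(x), lat(y), lon(x + 1))

def tiles_at_zoom(zoom):
    """Each +1 zoom doubles the scale, so the tile count quadruples."""
    return 4 ** zoom

# Houston at zoom 10 lands in one specific tile...
print(deg2tile(29.76, -95.36, 10))   # (240, 423)
# ...out of more than a million possible tiles at that zoom level alone.
print(tiles_at_zoom(10))             # 1048576
```

Summing 4^z over a typical range of zoom levels is what pushes the number of distinct tiles into the billions.
<!--kg-card-end: markdown-->
<p>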
</p><p><em>More info on this XYZ tiling strategy is available at the </em><a href="https://wiki.openstreetmap.org/wiki/Slippy_map_tilenames?ref=flightaware.engineering"><em>OpenStreetMap Wiki</em></a><em>.</em> </p><h2><strong>Types of tile data</strong> </h2><p>Now that we’ve talked about tiling strategies, let’s move on to discussing what sorts of data we can provide in these tiles. Much like image files, tiles take two general forms: <em>raster</em> and <em>vector</em>. The delineation between the two can be thought of in terms of the point at which raw geographical data is transformed into something that’s (hopefully) visually pleasing to the end user. </p><p>Raster tiles are rendered server-side into common image formats: JPG, PNG, etc. The tiles are shipped down to the browser like any other image, where the in-browser mapping software places it in its proper place alongside other tiles. </p><p>Vector tiles, however, are passed to the browser as simple raw data in a format such as <a href="https://en.wikipedia.org/wiki/GeoJSON?ref=flightaware.engineering">GeoJSON</a>. The mapping software is then responsible for creating a visualization of this data, placing the features described by the data in the proper place on the map, and using styling rules to control the appearance of the features. </p><p>Examples of both of these types of tiles can be seen when examining the FlightAware live map. The classic blue base layer tiles are raster images, while the aircraft are depicted as vector features. Essentially, the client receives a very long list of points (the location of each aircraft) along with other metadata that’s used to decide how to depict the aircraft. The aircraft type (e.g. Boeing 737) determines what icon is used, the aircraft’s current heading determines where it’s pointed, and other flight data such as altitude and groundspeed are used to display an informational block adjacent to an aircraft icon on a map. 
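</p>
<!--kg-card-begin: markdown-->
To make the vector case concrete, a single aircraft might travel to the client as a GeoJSON Feature along the lines of the following sketch (the property names here are illustrative, not FlightAware's actual schema), shown here built and serialized in Python:

```python
import json

# Hypothetical vector-tile payload for one aircraft: a GeoJSON Point
# plus the metadata the client uses to style it (illustrative field
# names, not FlightAware's real schema).
aircraft_feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [-95.36, 29.76],  # GeoJSON order is [lon, lat]
    },
    "properties": {
        "ident": "UAL1234",       # flight identifier
        "aircraft_type": "B737",  # selects the icon
        "heading": 270,           # rotates the icon (degrees)
        "altitude": 35000,        # shown in the info block (feet)
        "groundspeed": 450,       # shown in the info block (knots)
    },
}

print(json.dumps(aircraft_feature, indent=2))
```

The client-side mapping library then turns each such feature into a styled icon in the browser; with vector tiles, the server ships only data, never pixels.
<!--kg-card-end: markdown-->
<p>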
</p><h2>Putting it into practice: vector data for the live map (the “old” way) </h2><p>When the live map was first conceptualized, the approach taken to fetching data was straightforward: client-side maps were configured with a vector layer (that is, a layer populated with vector features), which utilized the XYZ tiling scheme outlined previously. Retrieval of the data was handled by an AJAX endpoint, which accepted the XYZ coordinates as parameters. Logic in this endpoint was responsible for transforming these coordinates into a latitude/longitude bounding box, representing the geographical bounds of the tile, which was in turn used to retrieve all current aircraft positions within that box. </p><p>A major advantage of this approach is that it’s straightforward to implement. OpenLayers, the client-side map library we use, provides very robust built-in classes for making such requests; as such, client-side code was simple (essentially reduced to the level of configuration), and server-side code was similarly straightforward. Put simply, this approach was easy to put into practice—and it worked. </p><p>However, there are some definite drawbacks to this approach. As we’ve already outlined, the nature of XYZ tile coordinate systems is that there is an entirely unique set of tiles for each zoom level, and this zoom-level specificity can be useful for providing different levels of data detail at different zoom levels. In this case, we weren’t leveraging that specificity potential at all—every tile request simply amounted to “return all the aircraft in this rectangular region.” </p><p>As a result, we were quite literally doing redundant work. Imagine if a user allowed all aircraft to load at, say, zoom level 3, and then zoomed the map in. 
Since this means the new map viewport is a smaller geographical area than before, and since we already loaded all known aircraft in that larger geographical area, the client already has all the data required to display after zooming—but the tiling implementation dictated that all aircraft on the map be reloaded from a blank slate! </p><p>Still, it worked, and provided a great showcase product for FlightAware for many years—until increases in position data began to turn shortcomings into actual problems. </p><h2><strong>Performance issues surface</strong> </h2><p>For the final piece of the performance puzzle, let’s briefly discuss how FlightAware stores and retrieves positions and tracks for flights. While we use PostgreSQL as our primary relational database engine, the sheer volume of positional data we receive makes storing this data in Postgres untenable. Instead, we use a separate storage system for en route flights, allowing for much better update and retrieval performance. The first iteration of this system was <em>birdseye</em>—an in-memory position database built on top of <a href="https://github.com/flightaware/speedtables?ref=flightaware.engineering"><em>Speedtables</em></a>. Birdseye served us well for many years until 2018 when it was replaced with <em>popeye</em>, which used SQLite as its underpinnings rather than Speedtables.</p><p>Both of these share a similar performance concern, though this concern was much more prevalent with birdseye. Specifically, for the bounding-box search use case, the amount of time needed to retrieve a set of positions is roughly proportional to the number of matching positions found in the result set. For example, a bounding box containing 1,000 aircraft positions would take approximately ten times as long to fetch as a box containing 100 positions. </p><p>One might be tempted to think that this problem would only apply to boxes (aka tiles) of vastly different sizes, but that’s not the case. 
After all, en route aircraft aren’t evenly distributed around the globe by any stretch of the imagination. To illustrate this, let’s look at an exemplar live map, overlaid with the applicable XYZ tile grid: </p><figure style="max-width:600px" id="w-node-40dd31ca13df-ee8f12ee" class="w-richtext-align-fullwidth w-richtext-figure-type-image"><div><img src="https://assets.website-files.com/5e8ca9724a691dab28971e3f/5ecdb025d37e264523519790_hoV3aNpnoGA8gqFd2IYWj2tpo4zlPnFx9h0i8_GbPgjJDSERab9JwNPhg0bLmKOcZfhQDwVjRNa5Wb77U31TfcZmFIWhV25Onm7g4KiWRBLnT84dzpQPz2N7vMEc0cL5n3bWTbw6.png" alt="" style="
    margin-bottom: 16px;
    width: 495px;
    margin-left: 40px;
    margin-top: 16px;
    margin-right: 40px;
    "></div></figure><p> </p><p> Now, imagine loading the North and South Atlantic Ocean tiles (2,1,-2 and 2,1,-3 respectively) in a map client. The South Atlantic tile loads in less than a second, while North American tiles could take several seconds. The result, to the end user, looks somewhere between not great and outright broken. It’s certainly a bad look for a flagship visualization of our flight data. </p><p>A traditional way to reduce the impact of this discrepancy would be via strategic caching, but here our choice of tile grid becomes problematic again: as we discussed, there are potentially <em>billions</em> of distinct tiles, which means the chances that any particular tile is cached at any particular time is slim. There’s far too much variance in possible tiles for caching to be effective. </p><h2><strong>Moving forward</strong> </h2><p>We’ve now seen the state of the live map as of a few years ago: slow, patchy loads on the client, redundant calls to data endpoints putting unnecessary extra load on our services, and mostly-ineffective caching. Clearly, this was a situation that called for a new solution, and the direction in which we decided to go involved ditching XYZ in favor of a bespoke tile grid system. </p><p>The issues with the previous system led to a short set of goals:</p><ul role="list"><li>Keep the number of aircraft in each tile relatively uniform, thus allowing all tiles to have a relatively uniform load time. </li><li>Limit the number of possible tiles to allow for much-improved cache sharing, while also keeping the number of aircraft per tile low enough to keep a cache-miss load acceptably short.</li></ul><p>The solution that we decided to explore was to generate a tile grid based on analysis of flight position data. The idea was to gather all known in-flight positions at a point in time, and then divide the entire data set into tiles, such that every tile contained a nearly identical and manageable number of aircraft. 
Doing this analysis was going to be too expensive to do live, which meant it had to be an offline job. Another factor to consider was that the geographical distribution of aircraft would vary significantly by time of day; while one half of the world is wide awake, the other is asleep. Thus, a tile grid optimized for, say, the middle of the day in the US would be very poorly optimized twelve hours later, when the density of aircraft would favor the opposite side of the globe. As a result, this tile grid would need to be periodically regenerated throughout the day. </p><h2><strong>Making it work</strong> </h2><p>After thinking through a couple of ideas on how to perform the necessary analysis, we settled on a fairly rudimentary, iterative approach, essentially a trial-and-error process. The basic idea was to take a large rectangular area, which contained a series of points, and split it up until the desired point density was reached. The non-uniform distribution of points meant that these smaller rectangles would necessarily vary widely in size. </p><p>The first iteration of this idea took the word “split” rather literally. The iterative process went as follows:  </p><ul role="list"><li>Given an arbitrary rectangle, try splitting it at its midpoint. The decision of whether to split a rectangle vertically or horizontally was based simply on what dimension was wider: height or width. </li><li>After this initial split, count the number of points on each side of the split point. </li><li>Move the split point in the direction of the larger point population, in an amount proportional to the count imbalance. </li><li>Repeat this process until an “equal” split is achieved. </li><li>In practice, the definition of “equal” is a bit fuzzy, since perfection may not actually be achievable—for example, the original rectangle might contain an odd number of points. 
</li><li>Recursively repeat this splitting process until the desired number of points per tile is reached.</li></ul><p>Visually, this process for a single split action looks like this: </p><p> </p><figure style="max-width:1024px" id="w-node-a5eef8f32d38-ee8f12ee" class="w-richtext-align-fullwidth w-richtext-figure-type-image"><div><img src="https://assets.website-files.com/5e8ca9724a691dab28971e3f/5ecdb024bba54975abab838d_6Q4rn14Yq8zJC9DvAzcmkhbFAonkDXBIkskP1yNsK-hU8mKEa0CltrYIQ7baVNauZdceNweViKjPpXGtkQJDwH-WTucKbAIWM3xcRv-USjCM8Ex60p4KjlYtsemkS7h31w4NOX-e.png" alt="" style="
    margin-bottom: 0px;
"></div></figure><p> </p><p>While the basic methodology of performing a rectangle split works as outlined above, there’s a major shortcoming with this process—specifically, since all split operations involve splitting one rectangle into two, we’re limited to final tile counts that are powers of two: one rectangle becomes two, which become four, then eight, sixteen, thirty-two, and so on. </p><p>Recalling that one of our goals was to carefully optimize both the total tile count and the number of aircraft per tile, this result was problematic. Using this approach, we only have very coarse control over the final result. Suppose that we have a set of 4,000 points and we want to target 400 points per tile—using this approach, the recursion looks like: </p><ul role="list"><li>2 tiles = 2000 aircraft per tile; too dense </li><li>4 tiles = 1000 aircraft per tile; still too dense </li><li>8 tiles = 500 aircraft per tile; getting close but still too dense </li><li>16 tiles = 250 aircraft per tile; now we’ve significantly overshot our density target </li></ul><p>Clearly, if we wanted this analysis to achieve our goals, we needed to refine it a bit. </p><h2><strong>Making it work better</strong> </h2><p>The most obvious approach to this was finding a way to be much more flexible in performing our tile splits. Previously, the only split we knew how to do was an even split: take a population of points and split it into two equal smaller populations. </p><p>The key to making this change was a simple yet powerful mental reframing: Instead of thinking of our previous approach as “split this population in half,” we could rephrase the operation as “split this population into two populations, where the ratio between the two new populations is 1:1.” By doing this, and by massaging the split logic to match, we opened the door to being able to split populations to <em>arbitrary</em> ratios. 
This, in turn, allows for splitting a rectangle into an equally arbitrary number of smaller rectangles; for example, to split a single rectangle into three smaller ones, we’d split the original rectangle and target a 1:2 ratio; we’d then split the larger one again at a 1:1 ratio, resulting in three rectangles with “equal” populations. </p><p>Still, we were only halfway to a working algorithm, because we had to figure out <em>how</em> to use this new arbitrary-ratio ability. Previously, the end state of our recursive splitting was easy to define: each time we split our set of rectangles, we’d check the tile density and see if it was low enough; if not, split all the rectangles again (doubling the tile count) and check again. Once the tile density fell below our threshold, we could stop splitting. </p><p>That sort of stepwise recursion doesn’t work for arbitrary splits. Returning to our example of splitting a rectangle into three smaller ones, we have to know before splitting how many rectangles we should end up with. This leads us to the second fundamental reframing necessary to make arbitrary splits work: instead of using tile density as our starting metric and splitting recursively until that goal is reached, we must decide before starting how many tiles we need to end up with, and recursively split until we reach that number. Figuring out this tile count is, of course, straightforward: simply divide the total number of aircraft by the desired tile density, rounding up to the next integer. </p>
<aside><blockquote>The recursive code that performs this split is, like all recursive code, simultaneously mind-boggling and elegantly simple. </blockquote></aside>
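<p>A simplified sketch of this recursion follows. This is hypothetical Python, not FlightAware's production code, and it cheats in one respect: where the real <em>split_rect</em> iteratively nudges a geographic split line, this version simply sorts along the wider axis and slices, which yields the same kind of partition:</p>

```python
from math import ceil, floor

def tiles_needed(total_points, per_tile):
    """Desired tile count: total aircraft divided by target density, rounded up."""
    return ceil(total_points / per_tile)

def split_rect_to_count(points, count):
    """Recursively split a point population into `count` near-equal groups."""
    if count <= 1 or len(points) <= 1:
        return [points]

    # Ratio for this split, e.g. a target of 33 tiles becomes a 16:17 split.
    left_count, right_count = floor(count / 2), ceil(count / 2)

    # Split along the wider axis of the population's bounding box.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1

    # Stand-in for split_rect: order points along the chosen axis and cut
    # at the index matching the requested population ratio.
    ordered = sorted(points, key=lambda p: p[axis])
    k = round(len(points) * left_count / count)

    # Recurse: the left group must yield left_count tiles, the right right_count.
    return (split_rect_to_count(ordered[:k], left_count)
            + split_rect_to_count(ordered[k:], right_count))
```

<p>For 4,000 points at a target density of 400 per tile, this sketch produces exactly ten groups of 400 points each.</p>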
<p>After determining our desired tile count, we call a function (<em>split_rect_to_count</em>) whose sole purpose is to split a rectangle into an arbitrary number of smaller rectangles (example: “split the entire world into 33 rectangles”); from there, the logic proceeds as follows: </p><ul role="list"><li>Determine the ratio for the first split operation by dividing the requested count by two, then using the floor and ceiling of the resultant number to determine the split ratio. (e.g., 33/2 = 16.5 =&gt; a 16:17 ratio) </li><li>Call a second function (<em>split_rect</em>) whose sole job is to split a population into two rectangles using the requested ratio. This function returns a list of two rectangles and their point populations, in the same order as the requested ratio. (e.g., a 16:17 split returns the smaller rectangle first, followed by the larger) </li><li>We now know that the first rectangle needs to be split into 16 smaller rectangles, and the second into 17. Armed with this information, we can now recursively call <em>split_rect_to_count</em> for each of these two rectangles. </li><li>From the top level, this recursion results in two lists of rectangles, one 16 long and the other 17 long, which we concatenate to yield the final list of rectangles.</li></ul><p>Visually, this recursion (using a smaller target count of five) looks somewhat like this (though it’s of course not quite this stepwise in practice): </p><p> </p><figure style="max-width:1024px;margin-bottom: 30px;" id="w-node-16c2dcf17759-ee8f12ee" class="w-richtext-align-fullwidth w-richtext-figure-type-image"><div><img src="https://assets.website-files.com/5e8ca9724a691dab28971e3f/5ecdb024bba5498a1eab838c_gpDk_SVZFi9IWk90A3GhJpt9GmyQir3pMqjwtgbZGwhZhA7SanfkAFEotQmDFE3KxeMYzZlFMFy6rlq3wMk6av3jKn7bEJKv10oDA6wqjWz5qWsNtJC-tiwZU_klSJDQ8s0pImEH.png" alt="" style="
    margin-bottom: 0px;
"></div></figure><p>That’s the entirety of the procedure: with this refined approach, we can come as close as humanly possible to nailing our desired tile density.  </p><h2><strong>Putting it into practice</strong> </h2><p>Now that we’ve talked a lot of theory, let’s take a look at what we actually get from this tile grid generation approach in practice. </p><p>To begin with, let’s look at an example of an actual production tile grid. This particular grid is targeting a density of 400 aircraft per tile: </p><figure id="w-node-d9fadb6b51b6-ee8f12ee" class="w-richtext-align-center w-richtext-figure-type-image"><div><center><img src="https://assets.website-files.com/5e8ca9724a691dab28971e3f/5ecdb024d67d873da0f4b0a7_1xVLHlcyPOjpZV79eKSQErzqDrgx2XkX8lAX3wZihl1CIs5YPLdjJavQ-Wq0yvwojv9ZvtsUke32M82_UNhCRgp6HJK2pMRXTS0L3pzLosfiDgzSl7_uflYMJlyBRvRxwirfGY9K.png" alt="" style="
    margin-bottom: 0px;
    margin-left: 0px;
    margin-right: 0px;
    margin-top: 0px;
    "></center></div></figure><p></p><p>The first thing to notice here is that we now have exactly 26 tiles to cover the entire world. Recall that the XYZ grid choice results in <em>billions</em> of discrete tiles—we’ve reduced that number by <em>nine</em> orders of magnitude. This is <em>huge</em> for cache performance! </p><p>The second, more abstract (and, to me at least, beautiful) thing to notice is that we can use the geographical size of a tile as an indicator of aircraft density. After all, each tile will contain the same number of aircraft, so aircraft in smaller tiles must necessarily be more densely packed. And we see that this leads to reasonable conclusions with this image—we have high densities over the continental US, Europe, and to a lesser extent Asia, and absurdly low densities in the southern hemisphere. </p><p>This leads to a second example: an exaggerated tile grid. For this grid, the target density was 100 aircraft per tile, which of course means we have four times as many tiles. It also means that we get a finer-grained look at aircraft density distribution across the globe: </p><figure class="w-richtext-align-center w-richtext-figure-type-image" style="
    margin-bottom: 0px;
    margin-left: 0px;
    margin-top: 0px;
    margin-right: 0px;
"><div><center><img src="https://assets.website-files.com/5e8ca9724a691dab28971e3f/5ecdb024b68ee544e270bcf3_YA_qQa8JtdCANwJ6h3qPAWZrbkaMi8xoQGey4UemIVH2wlD7aw3ewP0AHB5PFu9kDAL4y8xSuIFvdzas8xL4ic9uUG00negNTgN38BQ7QSRkUNyxOVc3VtuW1TfUBVawYVZ-H-qX.png" alt=""></center></div></figure><p>We can also use these exaggerated tile grids to get a look at how the distribution of airborne aircraft changes over the course of a day. To create this animation, a tile grid was generated once per hour for 24 hours, and the resulting set of grids was animated on a map, along with a representation of the day/night boundary. This allows us to see clearly how aircraft activity follows the sun around the world. At night, a region goes quiet, then becomes active again as night gives way to morning.</p><p></p><div class="w-embed w-script"><center>
<div class="tenor-gif-embed" data-postid="17343733" data-share-method="host" data-width="60%" data-aspect-ratio="1.0"><a href="https://tenor.com/view/gif-17343733?ref=flightaware.engineering">GIF</a> </div><script type="text/javascript" async src="https://tenor.com/embed.js"></script>
</center></div><p>Finally, we can see one more sobering illustration. All of the preceding tile grid examples were generated in the fall of 2019 as part of a conference presentation. The following is a more recent actual production tile grid from May 2020: </p><figure class="w-richtext-align-center w-richtext-figure-type-image" style="
    margin-bottom: 10px;
    margin-left: 0px;
    margin-top: 0px;
    margin-right: 0px;
"><center><div><img src="https://assets.website-files.com/5e8ca9724a691dab28971e3f/5ecdb0242564fad872c06874_h-3Tl6Ixmo4Oc1JTXXJD5Cc6GMhwNwe6vLoj1iPkLCS7OywLTzIBTS90iNMVmmdVTK4wTW1WzL-KSa2bquyGBb9bp17j56j5kqkBhl0bW2ujQUX6gv3uxvGg3HXK_FNy_6qCcvO0.png" alt="" style="
    margin-bottom: 0px;
    "></div></center></figure><p>While the previous grid required 26 tiles to hit the 400-aircraft density target, we can now achieve the same with only 13 tiles—the result of the stark decrease in air travel caused by the COVID-19 global pandemic. </p><h2><strong>What’s next?</strong> </h2><p>Maps are obviously here to stay, since they allow for intuitive and engaging displays of FlightAware’s data. This means it’s on us to continually improve these experiences. It may very well be that tiling strategies for data like this are not the future—for example, the live map is the only map product we offer that uses tiling. All other maps use a loose pub/sub model, by which the client map asks for targeted data that’s relevant to it—for example, all flights to/from KIAH, or a single flight. Future improvements might involve going to a similar pub/sub model for the live map, or for other, more feature-rich products that might even supplant the live map. </p><p>To make a long story short—the sky’s the limit!</p><!--kg-card-end: html--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/smart-tile-grids-optimizing-flightawares-maps-for-the-display-of-non-uniform-geographical-data/">Smart Tile Grids: Optimizing FlightAware&#x27;s Maps for the Display of Non-Uniform Geographical Data</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>
    <item>
        <title><![CDATA[ Flying through the Clouds: FlightAware’s Journey into Machine Learning at Scale ]]></title>
        <description><![CDATA[ Our journey of bringing machine learning at scale to predictive flight arrival times in Foresight ETAs. ]]></description>
        <link>https://flightaware.engineering/flying-through-the-clouds-flightawares-journey-into-machine-learning-at-scale/</link>
        <guid>https://flightaware.engineering/flying-through-the-clouds-flightawares-journey-into-machine-learning-at-scale/</guid>
        <pubDate>Thu, 23 Apr 2020 15:41:00 -0500</pubDate>
        <media:content url="https://flightaware.engineering/content/images/2020/06/photo-1423847061346-e91142431473.jpeg" medium="image"/>
        <content:encoded><![CDATA[ <!--kg-card-begin: html--><p><em>As a Senior Site Reliability Engineer (SRE), Diorge Tavares has been working with FlightAware’s Predictive Technologies team, using machine learning to bring flight arrival predictions to production in FlightAware Foresight. </em></p><p>FlightAware has been tracking flights since 2005. As you can imagine, we have a ton of flight tracking data; it’s in the petabyte range compressed! With a little machine learning, we could do some very interesting things with this data and help airlines optimize operations by providing better estimated times of arrival (ETAs) for flights. This means that travelers can get to their destinations quicker—for example, an airline might have capacity to run two more flights in a day because they weren’t held up due to a poorly-estimated flight arrival; that’s a win for everyone.</p><h2>The Predictive Technologies Team</h2><p>By mid-2019, the Predict Team was running full force and brought our first client on board: Frankfurt Airport. This marked a major milestone for something that had its roots in a hackathon idea. Today, major airlines are using our services at hundreds of airports. We started with about 200 models and have scaled our extract, transform, and load (ETL) process and live streaming machine learning and batch workflows to more than 1,100 training models.<br/></p><h2>Cross-team SRE<br/></h2><p>The goal was to have the Predict Team deliver a product that was reliable, easy to deploy, and easy to support. To make this happen while also accelerating our pace of development, we decided to embed a site reliability engineer (SRE) on the team alongside developers. I volunteered. <br/></p><p>I was looking forward to it and thought it would be fun, but I’ll admit it took a bit to wrap my head around this new machine learning world. I stared in confusion at everyone throwing unfamiliar jargon around with ease. 
I was still a part of the core SRE team, but my focus was on the Predict Team. I knew I was getting the hang of this new world when my core SRE team colleagues would reflect that same stare of confusion when I gave my morning standup updates. <br/></p><p>By joining the Predict Team, I was able to help in several ways. For example, new work normally takes time to funnel through the operations workload queue. Letting the Predict Team skip that queue and get operations tasks done immediately sped up the entire development process.</p><p>Since I was involved from the beginning of each infrastructure buildout, I could apply reliability and redundancy expertise to make the setup stable from day one. Our in-house live prediction streaming environment originally spanned two datacenters with six servers at each datacenter. That was later doubled to 12 at each datacenter, and ultimately processed more than 1.6 million evaluations per live streaming cluster per second.<br/></p><p>One of the initial steps we took was to set up Jenkins to automate CI/CD pipelines for building and deploying images into the environment. This allowed for rapid and consistent deployments.<br/></p><p>From a process improvement perspective, attending meetings for both the Predict Team and the main SRE team allowed me to share information between the two and improve both teams’ processes.<br/></p><h2>Machine Learning<br/></h2><p>The machine learning aspect started with a batch workflow stack with Hadoop and Spark systems for ETL and Docker to run inferences. Inferences are drawn when we give a model a new data set and let it produce flight arrival predictions. For example, we train a model on a year of data from January 2018 to January 2019 and then run an inference for February 2019, letting the model make predictions on data it hasn’t seen. We needed a good bit of scale to perform these tasks en masse. 
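</p><p>The train-then-infer windowing described above can be sketched as follows (a hypothetical Python helper for illustration only; the function name and exact 365-day window are assumptions, not production code):</p>

```python
from datetime import date, timedelta

def training_window(infer_month_start, days=365):
    """Return (start, end) of the training window preceding an inference
    month: the `days` of history ending just before the month we want
    predictions for. End date is exclusive."""
    end = infer_month_start              # inference data starts here
    start = end - timedelta(days=days)   # one year of training history
    return start, end

# Train on the year leading up to February 2019, then infer on February itself.
start, end = training_window(date(2019, 2, 1))
```

<p>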
<br/></p><p>We started with about 200 airports with one “arrival time on runway” (or ON) model per airport. As we continued our work, we expanded the number of supported airports, and created a second class of models to estimate “arrival time at gate” (or IN). We ultimately supported 554 airports (1,108 models with two models per airport).<br/></p><p>The second part came from our live streaming predictions. From our streaming API product Firehose, customers could access real-time predictions for flights in the air. This used the same models that are part of our batch workflow to follow live flight tracking data and create predictions. This service has a high-uptime SLA and cannot afford to take service impact from upgrades.<br/></p><p>Bringing Foresight ETAs to production came with rigorous requirements. Airlines would rely on this service to determine flight schedules and delays (when needed), so it could make a big impact on travelers and the airlines’ bottom line. At launch, this needed to be a mature, high-availability service. We made the decision to run these on-premises, but we needed to be able to perform maintenance and upgrades without service impacts. As a result, we set up redundancy within each of our two datacenters and across the datacenters. We can bring down a streamer node in each datacenter and still provide our streaming live prediction service.<br/></p><p>Just tracking the metrics this created was a challenge. At any point in time, we needed to view error rates on the predictions and then retain these so we could refer back to them as needed. Our main monitoring platform, Zabbix, couldn’t handle the massive amount of metrics we would send it. The solution was TimescaleDB (Postgres with a time series database extension) with a Grafana front end to generate the graphs for this. 
It worked well and could handle all of the needed metrics while not ballooning in size, as Zabbix would have.<br/></p><h2>Did Someone Say Cloud?<br/></h2><p>It was clear that to create training sets and models quickly enough to address model drift, bug fixes, and customer feature requests, we needed to scale massively. If we made a code change, we could not wait weeks to train our models and see how it worked. It would also be hard to meet clients’ tight deadlines if it took us weeks to retrain our models or to produce updated prediction graphs for sales meetings. We had a need for speed. <br/></p><p>We started with an in-house Spark cluster of about 12 machines. As a baseline, someone would spend a week generating updated datasets, then turn around and run inferences for our batch process. Generating datasets for an entire year (the time frame we use to train our models) would take <em>weeks</em>, if not longer. I haven’t even touched on the topic of training the models, which would take a 32+ core machine and keep it at 100% CPU usage on all cores for over 24 hours, depending on the size of the airport for which we were creating a model.<br/></p><p>Instead of using more on-premises hardware, we decided to evaluate some cloud providers (IBM, Azure, Google, and Amazon Web Services) to determine which would be the best fit for our needs, factoring in cost, support, scaling needs, and workflow. I started this evaluation with the idea that I would get the best real-world results by running real-world workloads on them. We learned some surprising things.<br/></p><h2>The Cloud Evaluation<br/></h2><p>Our cloud stack included Apache Spark and Kubernetes to train our models with a custom Docker image. It was our goal to be vendor-agnostic; to accomplish this we opted to write our workflows so they could run both in the cloud and in-house. 
Argoproj (a Kubernetes workflow engine that supports DAG and step-based workflows) was a big help in covering our model training, which had multiple stages we could batch with Argo and run both in-house and in the cloud. Even workflows written for managed Spark could still run in our in-house Spark cluster. <br/></p><h3>Options 1 and 2: IBM and Azure<br/></h3><p>We decided against IBM and Azure. IBM’s workflow for Spark didn’t scale the way we wanted it to; it required obtaining a password for accessing each individual cluster. Since we used an individual cluster for each day of a training set in the second phase, creating a dataset for a full year would have required pulling in 365 passwords and switching between them whenever we needed to access a cluster to debug. This was handled better at Google or AWS with a service account that had permissions to the needed resources. <br/></p><p>Azure required us to sign a million-dollar contract for our quota needs. Although they eventually provided an option to get around this by using a third-party vendor, all the hoops they made us jump through pushed us in the direction of Google and AWS. We questioned what difficulties we might have if we needed more resources in the future.<br/></p><h3>Option 3: Google<br/></h3><p>Google gave us our first quota approval after a series of smaller requests and a good bit of pushing from our account team. They got us up to a 100,000 vCPU core quota. They asked for a lot of capacity planning meetings, but we were actually able to use our quota. <br/></p><p>As we started using it, we ran into a few problems with Kubernetes on Google. First, the cluster would go into a non-responsive state. We determined this was because of upgrades that Google performs; our usage pattern was to spin up Kubernetes clusters, run our batch training, and then spin them down. Although we configured the clusters so that no upgrades would be performed while our jobs were running, upgrades were still happening across multiple zones. 
Google offered a few solutions, but they seemed to be temporary Band-Aids rather than fixes for the root problem. <br/></p><p>Second, Google’s beginner-friendly approach wasn’t a good fit for us. We could get up and running faster, but it ended up being <em>more</em> difficult because our custom model training requirements didn’t always fit in the box of their AutoMagic setup. It would have been nice to be able to see under the hood and configure things as we wanted.<br/></p><h3>Option 4: Amazon Web Services (AWS) <br/></h3><p>AWS provided the most quota we had seen thus far, in the realm of 3,600 instances (they were using instance counts for quotas, or service limits, at that time), which equals about 345,600 vCPU cores. But when we tried to use them, we ran into “out of capacity” errors in the us-east-2 region. After a few calls, we were advised to switch to us-east-1, which had more capacity. This solved the problem.<br/></p><p>AWS had great configurability, and their debug log verbosity was unbeatable, which meant developers could debug problems faster and more easily. We also found our models trained faster on AWS than on Google Cloud Platform (GCP), resulting in actual cost savings. We had assumed at first that AWS would be more expensive, not less, so this discovery was a pleasant surprise.<br/></p><p>Adding to all this, AWS allowed us to use spot instances (VM instances discounted up to 90% that can be taken back by AWS if needed, depending on demand for them) for an entire Spark EMR cluster. This was something that GCP, at the time, was limited in offering. The savings from faster model training, in addition to using spot instances both for the entire Spark EMR cluster and for model training, put us well below our projected cloud spending budget. 
<br/></p><p>It was clear who won the great FlightAware cloud evaluation war of 2019: AWS.<br/></p><h2>ETL Improvements<br/></h2><p>Armed with the resources of the cloud, we decided to tune and optimize our ETL code. In hindsight, we probably should have done that first in order to generate an accurate view of what resources we needed with optimized code. Our in-house ETL that used to take weeks took hours on the optimized code. It was shocking how much of an improvement we received from tuning our code.<br/></p><p>Three primary changes made a difference:</p><ol start="" role="list"><li><strong>Batching our ETL days together.</strong> This saved on multiple expensive reads to our S3 storage bucket at AWS. Our first iteration of this resulted in two million reads/writes to the S3 bucket, which AWS did not like. I remember seeing in the logs “Slow Down! You’re going too fast!” when our EMR jobs were failing after this improvement. We scaled it back so we weren’t hitting the S3 bucket above the 30,000 requests/sec rate they allow before throttling or denying requests.</li><li><strong>Switching from strings to appropriate types based on data</strong>. There were parts of the Scala code that would store data in a string and then convert it to an integer multiple times. So we switched from strings to types that were a better fit for the data, sometimes “Double” or “Long” types.<a href="https://spark.apache.org/docs/latest/tuning.html?ref=flightaware.engineering#memory-tuning"> Java Strings have about 40 bytes of overhead over the raw string data</a>.</li><li><strong>Caching. </strong>We went through our code and identified when to call .cache() on a DataFrame so that we avoided doing a lot of expensive repeat work. </li></ol><p>These ETL code improvements allowed us to take what would run in hours down to minutes (six minutes per dataset day, to be exact) in the cloud with enough resources. 
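</p><p>The first of those changes, batching per-day work together, can be illustrated with a toy sketch (hypothetical Python; the real ETL was Scala on Spark, and the actual batch sizes were tuned against S3 request-rate limits):</p>

```python
def batch_days(day_keys, batch_size):
    """Group per-day dataset keys into larger batches, so one job covers
    many days and the number of object-store requests drops accordingly."""
    return [day_keys[i:i + batch_size] for i in range(0, len(day_keys), batch_size)]

# One key per day for a simplified 28-day-month year: 336 day-level jobs...
days = [f"2019-{m:02d}-{d:02d}" for m in range(1, 13) for d in range(1, 29)]

# ...collapse into 12 month-sized batches, each handled in a single pass.
batches = batch_days(days, 28)
```

<p>Each batch corresponds to one larger read/write instead of dozens of per-day ones, which is what keeps request volume under a rate ceiling.</p><p>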
We settled on a balance between cost and efficiency that allowed us to complete either a single day’s or a full year’s dataset within an hour.<br/></p><h2><strong>Predict by the Numbers</strong><br/></h2><p>These are some fun facts about our current production service for flight predictions. These numbers reflect the post-optimization state and what we ultimately used of our cloud quotas.</p><div class="w-embed w-iframe w-script"><style type="text/css">
.tg  {border-collapse:collapse;border-spacing:0;border-color:#aabcfe;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#aabcfe;color:#669;background-color:#e8edff;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:#aabcfe;color:#039;background-color:#b9c9fe;}
.tg .tg-r7zg{background-color:#ffffff;font-size:14px;font-family:Arial, Helvetica, sans-serif !important;color:#272727;border-color:#ffffff;text-align:left;vertical-align:top}
.tg .tg-r1j5{background-color:#ffffff;color:#272727;border-color:#ffffff;text-align:center;vertical-align:top}
.tg .tg-w8pn{background-color:#002f5d;border-color:#002f5d;text-align:center;vertical-align:top}
.tg .tg-f3q6{font-size:14px;font-family:Arial, Helvetica, sans-serif !important;background-color:#002f5d;border-color:#002f5d;text-align:center;vertical-align:top}
.tg .tg-o17y{background-color:#e4f5fb;color:#272727;border-color:#e4f5fb;text-align:center;vertical-align:top}
.tg .tg-ifpd{font-size:14px;font-family:Arial, Helvetica, sans-serif !important;background-color:#e4f5fb;color:#272727;border-color:#e4f5fb;text-align:left;vertical-align:top}
</style>
<table class="tg" style="table-layout: fixed; width: 516px">
<colgroup>
<col style="width: 241px">
<col style="width: 275px">
</colgroup>
  <tr>
    <th class="tg-f3q6"></th>
    <th class="tg-w8pn"></th>
  </tr>
  <tr>
    <td class="tg-r7zg">Total Size on disk for all training sets for one generation of models</td>
    <td class="tg-r1j5">5.9 TB</td>
  </tr>
  <tr>
    <td class="tg-ifpd">Total number of models being served in production</td>
    <td class="tg-o17y">1,108 <br>(2,416 with redundancy)</td>
  </tr>
  <tr>
    <td class="tg-r7zg">How long it takes to train all models in the cloud</td>
    <td class="tg-r1j5">6 hours</td>
  </tr>
  <tr>
    <td class="tg-ifpd">Number of machines in our in-house spark cluster</td>
    <td class="tg-o17y">55</td>
  </tr>
  <tr>
    <td class="tg-r7zg">Number of machines providing live prediction streaming</td>
    <td class="tg-r1j5">24</td>
  </tr>
  <tr>
    <td class="tg-ifpd">How long it takes to build a new training set for all models in-house vs in the cloud</td>
    <td class="tg-o17y">In-house: 10 hours<br>Cloud: 1 hour</td>
  </tr>
  <tr>
    <td class="tg-r7zg">Peak number of cores used in the cloud for Spark</td>
    <td class="tg-r1j5">17,472</td>
  </tr>
  <tr>
    <td class="tg-ifpd">Peak number of cores used in the cloud for training models</td>
    <td class="tg-o17y">~ 8,000</td>
  </tr>
  <tr>
    <td class="tg-r7zg">Average number of tree evaluations per second across one live prediction streaming cluster during the day</td>
    <td class="tg-r1j5">&gt; 1.6 million</td>
  </tr>
  <tr>
    <td class="tg-ifpd">Number of examples in the largest single training set</td>
    <td class="tg-o17y">90,527,720</td>
  </tr>
  <tr>
    <td class="tg-r7zg">Number of examples across all training sets</td>
    <td class="tg-r1j5">8,297,144,525<br>(took us 24 hours of runtime to compute)</td>
  </tr>
</table></div><h2><strong>The Future</strong><br/></h2><p>We’ve come a long way, and there is more to come; rest assured that whatever we do, we will be soaring in the clouds with our machine learning at scale.</p>

<!--kg-card-end: html--> 
        <br>
        <p>
            <a href="https://flightaware.engineering/flying-through-the-clouds-flightawares-journey-into-machine-learning-at-scale/">Flying through the Clouds: FlightAware’s Journey into Machine Learning at Scale</a> was originally published in <a href="https://flightaware.engineering">Angle of Attack</a>.
        </p>
        ]]></content:encoded>
    </item>

</channel>
</rss>