Origin and Development of Tohil

Karl Lehenbauer is a founder of FlightAware and served as Chief Technology Officer from 2005 to 2021. In his role as CTO Emeritus and Advisor, he provides guidance to Collins Aerospace on shaping the future of digital aviation and building a connected ecosystem for customers.

At FlightAware, we always lean toward being open instead of closed. It helps to keep us open-minded, open to new ideas, new opportunities. When it comes to programming languages, that means we are willing to experiment with, evaluate, and, when appropriate, adopt new languages into our repertoire.

When we spun up our Predictive Technologies team, it was natural to use Python because it is heavily used in machine learning. Python impressed the team with its general-purpose usability, and Zach Conn, then the team leader (now CTO), came to me to say that Python was really good and suggested I look at it.

I'd been kind of automatically down on Python because of its indentation-based code scoping. (I had seen in Snobol how a space character could change the meaning of a program, and I found it troublesome.) But I set that concern aside and started learning and experimenting with it. With both growing excitement and growing embarrassment, I began to think that it was important, and we probably ought to embrace it.

How we got here

In 2005, when we were first starting FlightAware, we knew some things were "must haves" for us: SQL. Scripting. Unix. C. Doing stuff over the web instead of native PC and Mac apps. And the Internet, obviously.

We already had a software stack because of what we had already been working on, and it was a quick and easy decision to use Tcl, Unix, PostgreSQL, and Apache, because we understood it, we were good at it, we had a lot of code we could leverage, and we were actually pretty happy with how it had been going using that stuff.

We had the startup mentality, big time. Move fast and break things. While we were still making code of overall high quality, what we were writing was completely greenfield. Almost anywhere you looked you could see something that needed to be written and bang something out that while perhaps far from perfect, was way better than nothing.

Within months, FlightAware was a sensation. The quality of the website was in appearance just a notch under what I considered the gold standard at the time, apple.com, and in capability was beyond about everything at the time, and just far far beyond what people were used to in aerospace.

Tcl has served us well. It’s pretty strong. And there’s a lot of open source Tcl out there to build on. It’s got a stable user base, but not a large one; we find ourselves writing interfaces to libraries, creating packages, etc., that already have solid open source implementations for other languages. We want to use web frameworks and the like but then have to consider making our own.

As you might guess, over the course of nearly 17 years we wrote and brought to production a lot of Tcl code, most of which is in daily use by our 13+ million registered users and our thousands of commercial customers, partners, and vendors.

Should we try to rewrite the codebase in Python or in some other language? Absolutely not! It would be a certain disaster. Perhaps suicidal. (A quick, entertaining read that probably says it better is “Things You Should Never Do” by Joel Spolsky.)

Dipping a toe in the water

As an experiment in evaluating and learning Python I wrote analogues of our Tcl watchdog and alarm/normal functions as a ground-up rewrite where they would be modern and correct Python. It was a small library, only a few hundred lines, that I had originally written and still remembered fairly well.

Even given those advantages, it was surprisingly difficult, and I introduced a half dozen nonobvious regressions!

No, rewriting was off the table, for a lot of good reasons. Rewriting is perhaps a once-in-a-career thing.

A little more coming into focus

OK, the stage is set. We want to make Python a first-class language, something we can use for anything, which means it needs to be able to do everything we can do from Tcl. We are aghast at the thought of rewriting. Ergo we are going to look really hard at how we might call Tcl from Python.

I still write software

At the beginning of FlightAware, the three of us wrote software full-time or pretty much full-time. I also took care of the database servers for years, and pulled a lot of shifts as the ops on-call. Eventually as the company grew and the complexity of the roles grew, we spent less and less time writing software ourselves.

But I still liked to, and would. I’ve been writing software for 49 years and I would say I've become pretty decent at it.

I believed, and believe, that as someone directing development, it’s useful to experience what the developers are experiencing. Is this environment amenable to rapidly making high quality software? What are the pain points? How can we make it better?

As CTO, I still kept my hand in. I still wrote software, although again with the growth of the company it evolved more to be an activity I would indulge in on evenings and weekends.

A consensus is reached

We had a consensus among our technical leadership that we wanted to get to Python, and we wanted to be able to use our existing Tcl code base from it, and without a rewrite. And my boss, our CEO, was ever supportive, and when he did express a technical opinion, it was always a contribution, because he's super smart and knows a ton.

The spec

So here's the spec: We wanted a way to call Tcl from Python, and to have those Tcl functions, from Python, look and behave as much as possible as if they had been written natively in Python, such that, ideally, the developer wouldn’t even be aware they were calling a Tcl function.

Trying to get started

And then… it stalled.

After all, we were super busy taking care of customers, delivering on our roadmap, increasing scalability, adding redundancy, fixing bugs, etc. When you have more to do than you can do, it’s natural that some things fall by the wayside, and the things that don’t have advocates pulling for them tend to get pushed to the side.

A dawning realization

Chewing on this, it began to dawn on me that I was uniquely qualified to do some of the work. After all, I had made many contributions to the development of Tcl itself, so I was quite familiar with its C language internals. I also had some experience with the Python C interface, because I had studied it when Jim Nasby and I overhauled PL/Tcl, PostgreSQL’s server-side Tcl language, to make use of more modern Tcl interfaces that greatly improved its performance.

It fit with my nights-and-weekends ethos that I would at least look into it myself.

First see if it's already been done

The first thing anyone in the know has done for the last 30 years when considering writing a piece of software is a search for open source that either already does what you’re looking for or can be a piece of what you’d otherwise have to write yourself, and today that pretty much involves Google searches and GitHub.

Python already has Tkinter to get an X-Windows GUI interface using Tcl and Tcl's Tk toolkit, so that at least showed a possible way to do things and was an option. And we did do a little work with it and found it to be a little clunky... useful, but not quite what we were looking for.

libtclpy

Google also revealed a single-developer, dormant GitHub project, "libtclpy", billed as “A Tcl extension to effortlessly to call bidirectionally between Tcl and Python.”

That seemed pretty good.

It was written by Aidan Hobson-Sayers, who is CTO of a UK software company and leader of the Rust language infrastructure team, which was a good sign, and under a permissive BSD-style license that allowed for unrestricted reuse provided its copyright notice and disclaimers were reproduced in any derivative versions.

It had the ability to call Python from Tcl via a “py” command, and to call Tcl from Python via a "tclpy.eval" function. It got the ball rolling, was something useful to study, and possibly start from.

Even though it didn’t meet the aforementioned desired call transparency, one of Tcl’s extraordinary strengths is its introspection. It was easy to imagine using Tcl’s introspection to find all the Tcl functions ("procs", in Tcl nomenclature), their optional and required parameters and default values, and make Python wrappers for them. Something like that; not something to be dug into immediately, just comfort in the knowledge that that part should be completely doable.

libtclpy was only a few hundred lines of code, but it had this great function, pyObjToTcl, that could convert Python objects to Tcl objects, including not just strings, floats and integers, but also dicts, lists, arrays, booleans, byte arrays, etc.

And it had the mechanics of creating a single C shared library that had the requisite entry points such that it could be loaded as a C language extension both by Tcl and Python.

If a Tcl interpreter loaded the libtclpy package, a Python interpreter would be created and initialized, and vice versa.

This was all great stuff and all stuff that we needed.

So that jumped us ahead a lot.

And nothing energizes the committed like an early success.

Building the foundation

On GitHub I forked libtclpy and started working on it. The build was hard coded for a particular version of Linux. We also needed it to work on FreeBSD and the Mac, so the first step was to get it building with the GNU autoconf build system instead, something I already knew how to do from writing Tcl extensions.

Start performing little cleanups and normalizations. Instead of loading the libtclpy shared library from the Tcl side with Tcl's relatively obscure shared library "load" command, make it load with a standard Tcl “package require”.

There was a way to call Python functions with explicit arguments, but add a Python-like eval and exec from Tcl. There are a bunch of things you can do with Tcl from C besides eval so add Python-side methods to get and set variables, evaluate Tcl expressions, substitute variables and evaluate embedded Tcl commands in strings, etc., i.e. we are at this point speculatively making interfaces to features of Tcl in the expectation that some of them will be useful.

And we’re trying to use it to call our stuff and looking for where the friction is.

Tcl returning only strings to Python isn't ideal

One thing became apparent: while libtclpy was very good at importing Python objects into Tcl, it punted everything returned from Tcl into being a string. If Tcl returned a list of integers from 1 to 3, it'd sure be handy to get it as a Python list, like [1, 2, 3], not a string like "1 2 3". This was going to need to be addressed, else the Python code that invoked Tcl functions would have to be coded to handle the returned data in a special way. In a simple example, if a function returned an integer but it came to Python as a string, the developer would have to wrap the result in int() before using it as an integer. This would violate our transparency ethos.
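To make the friction concrete, here is a small sketch in plain Python with no Tohil or Tcl involved. The helper fake_tcl_call and its canned results are hypothetical; it only stands in for a bridge that hands every Tcl result back as a string.

```python
# Hypothetical stand-in for a bridge that returns every Tcl result as a string.
def fake_tcl_call(command: str) -> str:
    canned = {
        "list 1 2 3": "1 2 3",   # a Tcl list of three integers
        "expr {6 * 7}": "42",    # a Tcl integer
    }
    return canned[command]


# The caller has to know the "real" type of each result and convert by hand.
raw_list = fake_tcl_call("list 1 2 3")
numbers = [int(item) for item in raw_list.split()]

raw_int = fake_tcl_call("expr {6 * 7}")
answer = int(raw_int) + 1
```

Splitting on whitespace only works for the simplest lists; anything with braces or quoting would need real Tcl list parsing, which is one more thing the calling code should not have to care about.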

This deserves a name

Also the name libtclpy kind of rubbed, like could it be more anodyne? (Although it did make pretty clear what it was.) Aidan seemed to have moved on. We were beginning to far exceed what libtclpy started as. And we like to give things fun names, like hummingbird, grackle, hyperfeed, horde, slick, etc.

After some casting about we chose Tohil, a Mayan deity often represented as a feathered serpent, which seemed particularly apt since Tcl’s emblem is a feather and Python’s is a snake.

So we pushed our modified libtclpy to GitHub as Tohil, and continued iteratively experimenting with and extending it.

Addressing the string return issue

An initial attempt to address Tcl calls only returning strings was to add an optional "to=" parameter to the tohil.call, tohil.eval, and tohil.expr functions, etc., where the developer could specify the data type they wanted returned. to="int" would return an int, to="float" a float, and so on, including higher-level Python objects such as tuples, lists, sets, and dicts.

I also added a "shadow dictionary", aka tohil.ShadowDict that shadows a Tcl associative array to create an analog to Python dicts, but one where reads and writes came from and went to the Tcl array.
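The shadowing idea can be sketched with Python's standard MutableMapping base class. This is not Tohil's implementation: the class name ShadowMapping is made up, and a plain dict stands in for the Tcl associative array.

```python
from collections.abc import MutableMapping


class ShadowMapping(MutableMapping):
    """Dict-like view whose reads and writes go straight to a backing store.

    `backing` is a plain dict standing in for a Tcl associative array.
    """

    def __init__(self, backing):
        self._backing = backing

    def __getitem__(self, key):
        return self._backing[key]

    def __setitem__(self, key, value):
        self._backing[key] = value

    def __delitem__(self, key):
        del self._backing[key]

    def __iter__(self):
        return iter(self._backing)

    def __len__(self):
        return len(self._backing)


tcl_array = {"airline": "UAL", "flight": "5"}   # pretend this lives in Tcl
shadow = ShadowMapping(tcl_array)
shadow["origin"] = "KIAH"      # the write lands in the backing store
tcl_array["flight"] = "6"      # a change on the "Tcl side" is visible at once
```

Because nothing is copied, the Python view and the backing store can never drift apart, which is the point of shadowing rather than converting.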

First release

Soon we released 1.0.0 that contained all of our advances to date.

We had a ready and hungry user base that grabbed it and started experimenting with it, which was really valuable to the effort.

A flurry of releases followed, containing bug fixes and new features. People liked it, but Tohil's stringiness of call returns continued to rub.

Getting deeper into Python

A couple of our developers were legit Python experts, and Chris Roberts suggested that instead of saying to="int", to="float", etc., it work so that you could say to=int, to=float, and so forth, i.e. lose the quotes. I was like, "How would that work? What would the call actually receive?" and he breezily replied, oh, you would get a Python object containing the data type itself, and I was like, OK, I can't even think about that right now.

But it stuck in my head and after a little bit I looked into it, and before long I was like wow this is easy, you can check the argument to see if it's a Python data type object, raise an exception if it's not, and then examine the object to see which data type it is. Within a day or two one of the new capabilities in the dev branch was that you could say to=dict, etc.
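In pure Python the same check looks roughly like this. The function convert_result is a hypothetical helper, not part of Tohil, and it splits lists on whitespace, which is far simpler than real Tcl list parsing.

```python
def convert_result(text, to=str):
    """Convert a string result to the Python type passed as `to`.

    Hypothetical helper for illustration; it supports only a few types.
    """
    # A type such as int or dict is itself an object, an instance of `type`.
    if not isinstance(to, type):
        raise TypeError(f"to= must be a type, not {to!r}")
    if to in (str, int, float):
        return to(text)
    if to in (list, tuple, set):
        return to(text.split())
    if to is dict:
        items = text.split()
        if len(items) % 2:
            raise ValueError("dict conversion needs an even number of elements")
        # Alternate elements are keys and values, as in a Tcl dict.
        return dict(zip(items[0::2], items[1::2]))
    raise TypeError(f"unsupported type: {to.__name__}")
```

With this, convert_result("42", to=int) gives an int, while the old quoted spelling to="int" is rejected with a TypeError because a string is not a type object.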

A Python object that contains a Tcl object?

As I continued to improve Tohil, responding to feedback, implementing Chris and others' suggestions, I inevitably began to understand Python better, particularly its internals. I started to muse about whether it would be possible to create a C language Python object that wrapped the Tcl object, how it might be able to provide normal Python semantics to access it, whether that would be useful, and what it might make possible.

One appealing thing was the object wouldn't have to be translated and copied when sent to Python, and for something large being returned, that could really add up. Both languages use reference counting to manage object lifetimes, and that seems pretty manageable, like if you returned a Tcl object as a Python object you could increment the Tcl object's reference count, decrement it when your Python stuff was done with it. It would play well with the rest of Tcl, and the Tcl object would be freed, as always, when the last user of it decremented its reference count to zero.
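Here is a toy model of that reference-count handoff, written in Python purely for illustration. FakeTclObj and Handle are invented names; in the real C code the counting is done with Tcl's Tcl_IncrRefCount and Tcl_DecrRefCount, and this sketch makes no claim about how Tohil arranges it.

```python
class FakeTclObj:
    """Stand-in for a reference-counted Tcl object (not the real C struct)."""

    def __init__(self, value):
        self.value = value
        self.refcount = 0
        self.freed = False

    def incr_ref(self):
        self.refcount += 1

    def decr_ref(self):
        self.refcount -= 1
        if self.refcount <= 0:
            self.freed = True   # the last holder let go, so the value is freed


class Handle:
    """Python-side wrapper that holds one reference while it is alive."""

    def __init__(self, obj):
        self._obj = obj
        obj.incr_ref()          # take a reference instead of copying the value

    def release(self):
        if self._obj is not None:
            self._obj.decr_ref()
            self._obj = None    # make a second release() harmless

    def __del__(self):
        self.release()


shared = FakeTclObj("a large value")
shared.incr_ref()                     # the Tcl side holds one reference
handle = Handle(shared)               # Python now holds a second one
count_while_shared = shared.refcount
handle.release()                      # Python is done; Tcl still holds its reference
freed_after_python_release = shared.freed
shared.decr_ref()                     # Tcl is done too, so the object is freed
```

The value itself is never copied; only the count changes as each side picks the object up and lets it go.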

So I began to investigate how to create a new Python object using C. Already I could see the value of Python having such a large user community, as the Python documentation itself is excellent, and there were multiple good blog postings, explanations and introductions to creating your own data type. And of course Python being open source meant I could look at the source code to the implementations of Python's own data types.

tclobj

And I would call it a tclobj. It wasn't a lock that they would be useful, but my sense, and Chris and Andrew Brooks’ sense, was that it was interesting and seemed promising. The level of effort required seemed manageable, so I set out to extend Tohil to support tclobjs.

I got it working. We found it to be immediately useful. While it didn't make Tcl returns transparent, it made them more handy, closer. Tclobjs could be created empty, or from a Tcl call return, access to Tcl variables, etc., and perhaps most importantly, created at will from the contents of many of the most important Python types like int, float, str, list, dict, tuple, etc. This became release 3.

If a Tclobj contained a valid Tcl list, it could be accessed with Python semantics, like [lindex $t 3] in Tcl was t[3] in Python, and lset t 3 "bar" in Tcl was t[3] = "bar" in Python. It even supported slice notation.
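Python gets this behavior from a handful of special methods. The sketch below is a pure-Python toy, with ListView as a made-up name and a whitespace-separated string standing in for a Tcl list; the real tclobj is implemented in C and handles Tcl's quoting rules, which this does not.

```python
class ListView:
    """Sequence-style access to a whitespace-separated string.

    A toy stand-in for a Tcl list; braces and quoting are ignored.
    """

    def __init__(self, text):
        self._items = text.split()

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        # Python passes an int for t[3] and a slice object for t[1:4].
        if isinstance(index, slice):
            return ListView(" ".join(self._items[index]))
        return self._items[index]

    def __setitem__(self, index, value):
        # Integer indexes only in this sketch.
        self._items[index] = str(value)

    def __str__(self):
        return " ".join(self._items)


t = ListView("a b c foo e")
fourth = t[3]      # like: lindex $t 3
t[3] = "bar"       # like: lset t 3 "bar"
middle = t[1:4]    # slice notation arrives as a slice object
```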

The tclobj object had special methods for obtaining the value of an object in Python (assuming your object is named 't'): t.as_bool(), t.as_bytearray(), t.as_dict(), t.as_float(), t.as_int(), t.as_list(), t.as_set(), t.as_str(), t.as_tclobj(), and t.as_tuple(). It also had special methods to access Tcl dictionaries.

The feedback was really positive, and we continued to experiment with Tohil and try to use it for stuff. 3.1.0 brought a slew of new bug fixes and improvements. Peter da Silva added full bidirectional UTF-8 support.

A new tcldict object brought Python dict semantics to Tcl dicts, and pointed the way to making Tohil more pythonic, that is, making it work in a more standard Python way.

Eager users but not too many

We were at this time kind of at a happy medium where we had users using and experimenting with Tohil and providing feedback, but we didn't have a lot of Tohil code to worry about, so we could pretty freely break compatibility as we found new and better ways of doing things.

Tests

As development proceeded, I started writing a lot of tests. While this was valuable for all the usual reasons, the tests were particularly helpful because of the nature of the code: a lot of pointer action, reference count management, sharing of objects between languages, and the like. Bugs in that kind of code can cause a crash that shows up later than, and far from, where the bug actually is, which is tricky to debug because you end up looking at the wrong code. Committing small changes when possible, having a lot of tests, and running them frequently exercised Tohil a considerable amount. It helped catch these sorts of problems early and pin them on a small set of changes, rather than finding out later that some innocuous-looking change had introduced a crash, far away in time and commits from when the bug started showing up.

Extending the Python tclobj type

I started implementing ever more of the methods that a Python type could implement, wherever it made sense. I started looking at Python's "number protocol," a set of methods that a data type can implement if its developers want it to be able to be used as a number, that is, used in calculations, used to receive the result of a calculation, etc. At this point we're a little off the map insofar as Google's not turning up how-to postings or anything. The Python source code was very helpful, although the built-in data types often use functions that are internal to the Python interpreter or even use older, obsolete ways of defining things, etc.

Chris pointed out that tclobjs could be a lot more pythonic, and as I continued to work on the tclobj and tcldict data types, I began to understand Python internals better. t.as_int() went away, replaced simply by int(t). Yes, Python types implemented in C can provide a C function to perform the conversion to int. For example, Tohil would use the Tcl C function Tcl_GetIntFromObj to get the integer, or error. Likewise float(t), set(t), tuple(t), bool(t), etc., replaced their as-underscore equivalents.
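The pure-Python counterparts of those C-level conversion functions are the __int__, __float__, and __bool__ special methods (in the C API they are the nb_int, nb_float, and nb_bool slots of a type's number methods). A minimal sketch, with TextValue as a made-up class and Python's own int() doing the parsing in place of Tcl_GetIntFromObj:

```python
class TextValue:
    """Wraps a string and converts on demand, in the spirit of a tclobj."""

    def __init__(self, text):
        self._text = str(text)

    def __int__(self):
        return int(self._text)        # called by int(t); ValueError if not numeric

    def __float__(self):
        return float(self._text)      # called by float(t)

    def __bool__(self):
        # Simplified truth test for illustration; Tcl accepts more spellings.
        return self._text.lower() not in ("0", "false", "no", "off", "")

    def __iter__(self):
        # Makes list(t), tuple(t), and set(t) work on list-like text.
        return iter(self._text.split())

    def __str__(self):
        return self._text
```

Once these exist, callers use the ordinary built-ins, such as int(TextValue("42")) or tuple(TextValue("1 2 3")), and never need to learn a type-specific method name.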

Returning tclobjs by default

Tohil was starting to get good. Explicitly working with tclobjs wasn't utterly aligned with achieving the transparency we were calling for, but it sure made Python facile with Tcl objects, and we could tell it "felt right."

Then one day Chris suggested changing Tohil to return tclobjs by default. At first I was concerned it would cause a lot of problems, but I agreed to try it. By this time the test suite had hundreds of tests, and with trepidation I ran "make test", expecting tons of tests to fail, or worse. To my surprise only two tests failed, and the reason why was immediately obvious: the test code assumed a string had been returned and then tried to invoke a method that string objects had and tclobjs didn't. It was trivial to fix, and then all the tests passed.

The next thing was to try it with the code our developers had written. Again except for the same string method issue in a couple places, everything just worked.

Wow. Just wow.

Not only did returning tclobjs by default work great, tclobjs are in many ways more flexible and "do the right thing" when used from Python than native Python ints, floats, and strings! For example, in Python you get a TypeError exception if you try to add a string and a number, but with a tclobj, it works just fine.

5 + '5'                  # plain Python: raises TypeError
t = tohil.tclobj('5')
t + 5                    # with a tclobj: works
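For readers curious how a type can opt in to this, the Python-level hooks are __add__ and __radd__. The class below is an invented stand-in, not the tclobj implementation, and it returns plain Python numbers, which may differ from what Tohil returns.

```python
class Numberish:
    """Holds text and treats it as a number when used in arithmetic."""

    def __init__(self, text):
        self._text = str(text)

    def _number(self):
        # Prefer int, fall back to float; raises ValueError for non-numbers.
        try:
            return int(self._text)
        except ValueError:
            return float(self._text)

    @staticmethod
    def _coerce(other):
        if isinstance(other, Numberish):
            return other._number()
        if isinstance(other, str):
            return Numberish(other)._number()
        return other

    def __add__(self, other):
        return self._number() + self._coerce(other)

    def __radd__(self, other):
        # Used for `5 + t`, after int.__add__ returns NotImplemented.
        return self._coerce(other) + self._number()


t = Numberish("5")
left = t + 5
right = 5 + t
mixed = t + "2.5"
```

The reflected method is what lets the wrapped value sit on either side of the operator.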

It now seemed that we were able to call our Tcl functions and get their complex returns like lists, dictionaries, etc., without any funny business, so we could turn our attention to creating the Python interfaces to the Tcl functions.

Getting the transparent call stuff going

As I knew at the beginning, it was no biggie to get Tcl to recursively find all the defined namespaces and find all the native Tcl functions ("procs") and all the C-based functions, which Tcl calls "commands". For procs we could also determine their arguments, and default values for any arguments that had them.

I crafted a "trampoline" function, one that could take the arguments from a Python call and correctly determine if all the arguments were there that were needed, raise an exception if not, and otherwise get all the arguments and default values (when needed) assembled properly for Tcl and make the call and return the results.
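A stripped-down version of that argument handling might look like the following. It is a sketch under assumptions: the trampoline name is borrowed from the text, but the signature, the REQUIRED sentinel, and the (name, default) representation are invented here, and it returns the assembled words instead of calling Tcl. It also ignores Tcl's special trailing "args" parameter.

```python
REQUIRED = object()   # sentinel: the parameter has no default value


def trampoline(proc_name, params, *args, **kwargs):
    """Check Python arguments against a proc signature and build the call.

    `params` is a list of (name, default) pairs in declaration order.
    Returns the list of words that would be handed to Tcl.
    """
    if len(args) > len(params):
        raise TypeError(f"{proc_name}: too many arguments")
    names = [name for name, _ in params]
    unknown = set(kwargs) - set(names)
    if unknown:
        raise TypeError(f"{proc_name}: unknown argument(s) {sorted(unknown)}")

    words = [proc_name]
    for position, (name, default) in enumerate(params):
        if position < len(args):
            if name in kwargs:
                raise TypeError(f"{proc_name}: got multiple values for {name!r}")
            value = args[position]
        elif name in kwargs:
            value = kwargs[name]
        elif default is not REQUIRED:
            value = default            # fill in the proc's own default
        else:
            raise TypeError(f"{proc_name}: missing required argument {name!r}")
        words.append(str(value))
    return words


# Signature as introspection might report it for:
#   proc greet {name {greeting hello}} {...}
greet_params = [("name", REQUIRED), ("greeting", "hello")]
```

Calling trampoline("greet", greet_params, "world") yields the words greet, world, hello, while leaving out the name raises a TypeError, much as a native Python function would.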

I used metaprogramming; that is, for each of the Tcl functions I generated a string comprising a Python function that called the trampoline and ran them through Python to make them available to Python programs.

And it worked. Worked pretty well. You could start to imagine the end being in sight.

Then Chris points out to me, "Y'know, it isn't really necessary to use metaprogramming to do that, and if you can do it without it, you should, in the interest of reducing complexity." Interesting. Do tell. So he teaches me that we can directly create an executable Python object that invokes the trampoline function when called, with no source code having to be made or evaluated, and that worked, and he was right: It is a lot less complicated that way.
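The two approaches can be compared side by side in a few lines. Everything here is illustrative: call_tcl is a hypothetical stand-in for the trampoline, the proc name is made up, and a closure is just one way to get a callable object without generating source; the text does not say which mechanism Tohil ended up using.

```python
def call_tcl(proc_name, *args):
    """Hypothetical stand-in for the trampoline; it just records the call."""
    return [proc_name, *map(str, args)]


def make_wrapper_from_source(proc_name):
    # Metaprogramming: build Python source text, then execute it.
    source = (
        f"def {proc_name}(*args):\n"
        f"    return call_tcl({proc_name!r}, *args)\n"
    )
    namespace = {"call_tcl": call_tcl}
    exec(source, namespace)
    return namespace[proc_name]


def make_wrapper_directly(proc_name):
    # No generated source: a closure captures the proc name.
    def wrapper(*args):
        return call_tcl(proc_name, *args)

    wrapper.__name__ = proc_name
    return wrapper


generated = make_wrapper_from_source("lookup_flight")
direct = make_wrapper_directly("lookup_flight")
```

Both wrappers behave the same when called, but the generated-source version only works when the Tcl name happens to be a valid Python identifier, and it adds a string-building step that has to be kept correct.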

Perhaps surprisingly, out of all of Tohil this was probably the easiest part. I think because it was pretty clear what it needed to do, and both Tcl and Python had what we needed to pull it off.

Multiple interpreters

Tcl has supported having multiple interpreters since the very beginning, and Python too has more recently received the capability to have multiple interpreters. We added support for many Tcl interpreters to each have their own Python interpreter, and vice versa. This was tricky to get right. Chris solved a problem I had created before we added this support: I had cached a couple of handy pointers that should not be cached when there are multiple interpreters, and tracking it down cost him a day in gdb. (Sorry about that, Chris.)

Tohil and Apache Rivet

We use Apache Rivet to script Tcl in our HTML, etc., and used multiple Tcl interpreters, one for each developer for each httpd process, on our dev servers at that time, so multiple interpreter support was vital to making Tohil available to the web devs.

The web devs noted that embedded Python didn't line up well with the rest of the code because of Python's indenting rules requiring code to start from the left margin. Peter contributed the intricate needlepoint to allow the embedded Python code to be indented inline with the rest of the code that invoked it, while using minimal CPU cycles.
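The basic idea of removing a common margin before running embedded code can be shown with the standard library's textwrap.dedent. This is only a sketch of the concept and not Peter's implementation, which the text says was written to use minimal CPU.

```python
import textwrap

# Python code as it might appear when indented to match surrounding code.
embedded = """
        total = 0
        for n in (1, 2, 3):
            total += n
"""

# Strip the common leading whitespace so the code starts at the left margin,
# while keeping the relative indentation of the loop body.
source = textwrap.dedent(embedded)

namespace = {}
exec(source, namespace)
```

Only the shared margin is removed, so nested blocks keep their shape and the code still compiles.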

Hypothesis testing framework

Also, thank goodness I wrote a ton of tests and adopted the Hypothesis framework.

Hypothesis was great because it turned up a lot of problems. It's fine to confirm that 25 / 5 = 5, but when Hypothesis generated a lot of variants it turned up that while 25 / 0 causes Python to raise a divide-by-zero exception, if one of the sources of the numbers was a tclobj then instead the program died from a floating point exception.
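A property-based test in that style, using plain Python integers rather than tclobjs, might look like this. The function under test and the test name are invented for illustration; it needs the third-party hypothesis package installed.

```python
from hypothesis import given, strategies as st


def floor_divide(a, b):
    """Function under test: ordinary Python floor division."""
    return a // b


@given(st.integers(), st.integers())
def test_floor_divide(a, b):
    if b == 0:
        # The property we want: a clean Python exception, never a crash.
        try:
            floor_divide(a, b)
        except ZeroDivisionError:
            return
        raise AssertionError("expected ZeroDivisionError")
    quotient = floor_divide(a, b)
    # Quotient and remainder must reconstruct the dividend exactly.
    assert quotient * b + a % b == a
```

Instead of hand-picking 25 and 5, the test states what must hold for any pair of integers and lets Hypothesis go looking for inputs, zero included, that break it.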

I didn't think to try zero, but Hypothesis did. I didn't think to try a string of "{", etc.

If you're writing a Python module and want to catch bugs in your tests that you didn’t foresee, check out Hypothesis.

Documentation

If the first month was one of exhilaration and creation, the second and third months were a slog. Bug fixing and writing documentation.

I set a goal that the Tohil documentation should be as good as the Python documentation. And that is a high bar, and kind of overwhelming at first. I was a little disappointed at the gap but then I realized that the Python docs had been written by a lot of people over the course of many years.

But then like so many other things you just keep iterating and iterating and eventually if you actually are improving it then you get something good.

I would say the Tohil docs are not quite as good as the Python docs, but they are close, and in the upper echelon of documentation for open source projects on GitHub.

Success

It works and we're using it. Heavily. Another good sign that you're on the right track: the release cadence has fallen sharply, because the developers aren't getting cut on sharp edges and it's not breaking in production.

The things they wanted to do, they are doing. For instance all the popdown menus on the FlightAware website are being done using a Python framework.

Surprises

Of course if you're going to call Tcl from Python, you're going to eventually need to call Python from Tcl. It was obvious. So that was in there from the beginning.

What I didn't expect was the devs using Python from Tcl to make something easier from Tcl. But there it was. Rather than coding up a Tcl library for a REST API to the FastAPI interface, they used a Python module that already existed.

Conclusions

This was a very important project for us, and it's the sort of project that has a higher likelihood to fail or to become a tarpit than most.

We succeeded.

We took the time to experiment and iterate, to figure out what worked and make what we needed.

Also having Python experts helped immensely, because they, Chris especially, made great suggestions that resulted in Tohil being much more pythonic than where I, with less experience, would have left it.

It was vital to have people trying to use it and providing feedback.

And it was great to be able to outsource some of the tricky parts to others, those with a very particular set of skills.

I drew on my decades of experience, taste, and openness, and accepted help from all corners. Openness to allow myself to be contributed to. Openness to speculatively try stuff, to prototype and test and see if something's possible and to swing for the bleachers.

In The Mythical Man-Month, Frederick Brooks said "Conceptual integrity in turn dictates that the design must proceed from one mind, or from a very small number of agreeing resonant minds." Tohil has high conceptual integrity.

He also said, "What one programmer can do in one month, two programmers can do in two months."

Looking this over, one of the most ready criticisms would be that the spec was too thin. But how good a spec could one or ten people have written without trying to build anything? We didn't understand the need for tclobjs until we had something working and tried to use it. I am reminded of TCP/IP versus OSI, or Unix versus OS/360.

You can see all the commits, for better or worse, on the GitHub project.

If you have a bunch of Tcl code and you want to call it from Python, or vice versa, you probably want Tohil.

If you have a problem with a set of known inputs and a set of known outputs, you probably don't need a lot of design.

But if you're trying to create something new, and you only have vague outlines of what it should look like, you need the right people and you need to allot the time for experimentation and play. This isn't always easy to do within the structures most of us use for managing the building of software. But when you pull it off, it's magical. Enjoy.