Smalltalk and capabilities: a vision of a road not taken for desktop computing (and beyond?)

by Jonathan Frederickson — Sun 27 October 2024

This is my submission for the Malleable Systems Collective's "Fearless extensibility" challenge problem.

I've had a brainworm for the last few years that's been bugging me. Way back in this blog post, I linked a video that demoed an early Smalltalk system. The most mind-blowing thing about that demo for me was that, on those early systems, you could make connections between and thus compose different graphical widgets.

I recommend watching the video to really see it in action, but it looked like this:

Screenshot of an early Smalltalk system showing off its ability to make connections between widgets. There is a widget showing an animation of a bouncing ball, and an editor widget below it editing a specific frame of that animation.

At this point in the demo, Alan Kay is using a picture editor widget to edit a specific frame of a bouncing ball animation. These aren't applications in the sense that we'd think of them today; they're separate pieces of software that are nonetheless working together.

This really underscored to me that the conventional UI wisdom I was familiar with (that command lines are composable and GUIs are not) wasn't the whole story. It would be more accurate to say that today's GUIs are not composable, but GUIs in general certainly can be - it's just a different point in the design space!

Of course, this Smalltalk system was designed in the 70s for a very different computing landscape than the one we have today. Notably, at the time, you wouldn't have been running much software that wasn't written by either you or the computer vendor. The commercial software industry wasn't really a thing yet! Contrast that with the computers of today, where there's a very good chance that most of the software you're running isn't from Microsoft, or Apple, or whoever made your computer. And similarly, computers back then weren't used for nearly as many sensitive tasks as they are today; you wouldn't have been doing online banking from your Alto. In today's world, you might not want just any software you download to be able to modify anything else on your system in such a deep way.

There's a philosophy aiming to address this that I've been interested in for the last few years called the object-capability security model. If you're unfamiliar with the concept, I recommend reading What Are Capabilities by Chip Morningstar for a solid introduction. Something that comes up repeatedly in capability discussions is that ideally, capability transfer should happen via unambiguous user intent. The above blog post provides an example:

Imagine that when Word starts running it has no access at all to anything that’s access controlled – no files, peripheral devices, networks, nothing. When you double click the file icon or pick from the open file dialog, instead of giving Word a pathname string, the operating system itself opens the file and gives Word a handle to it (that is, it gives Word basically the same thing it would have given Word in response to the Open File API call when doing things the old way). Now Word has access to your document, but that’s all. It can’t send your file to Macedonia, because it doesn’t have access to the network – you didn’t give it that, you just gave it the document. It can’t delete or encrypt any of your other files, because it wasn’t given access to any of them either. It can mess up the one file you told it to edit, but it’s just the one file, and if it did that you’d stop using Word and not suffer any further damage. And notice that the user experience – your experience – is exactly the same as it was before. You didn’t have to answer any “mother may I?” security questions or put up with any of the other annoying stuff that people normally associate with security. In this world, that handle to the open file is an example of what we call a “capability”.
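To make the quoted scenario a bit more concrete, here is a minimal Python sketch of its shape. The names `Editor` and `powerbox_open` are hypothetical, made up for this illustration, and Python does not actually enforce capability discipline (nothing stops real Python code from calling `open()` on any path it likes). The only point is the direction of the hand-off: the trusted side opens the file, and the application receives a handle rather than a pathname.

```python
import io
import os
import tempfile


class Editor:
    """Stand-in for an untrusted application (hypothetical)."""

    def edit(self, handle: io.TextIOBase) -> None:
        # The editor only ever sees the handle it was handed, never a path.
        text = handle.read()
        handle.seek(0)
        handle.write(text.upper())
        handle.truncate()


def powerbox_open(path: str, app: Editor) -> None:
    """Stand-in for the trusted file dialog / OS side (hypothetical).

    The user picked `path`; the trusted side opens it and passes only
    the resulting handle to the application.
    """
    with open(path, "r+", encoding="utf-8") as handle:
        app.edit(handle)


# Demo: the user "double clicks" one document in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    doc = os.path.join(tmp, "doc.txt")
    with open(doc, "w", encoding="utf-8") as f:
        f.write("hello")

    powerbox_open(doc, Editor())

    with open(doc, encoding="utf-8") as f:
        result = f.read()
```

In a real capability system the editor would have no ambient way to reach other files or the network; here that restriction is only a convention.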

It strikes me that the inter-object Alto demo could be done in a capability-secure way! During the demo, we see a painter object and an animation object, and Alan Kay "draws a line" between the two objects. He inspects each object's properties to see what's available on each, and in the box in between the two objects, he establishes a link between the painter's picture and the animation's currentFrame. This feels compatible with a capability system!

One way that I can think of that this might work: the interface outside of each object is part of a more trusted system UI, which has references to each object it manages. By default, each visible object in this system need not have references to any other object, but the system UI (having references to both) can pass references between them. When you draw a line between objects and pass the currentFrame to the painter's picture (which happens outside the context of any particular object), the system UI would send the painter a reference to a part of the animation object.
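As a rough sketch of that idea, assuming nothing about how the Alto system was really implemented: the Python below models a trusted `SystemUI` that holds references to both objects and, when the user connects them, hands the painter a narrow reference to the animation's current frame only. Every name here (`FrameRef`, `Animation`, `Painter`, `SystemUI`, `connect_current_frame`) is hypothetical, and Python does not enforce the isolation, so treat it as an illustration of who holds which reference rather than as a secure design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class FrameRef:
    """Narrow reference: read or replace one frame, nothing else."""
    get: Callable[[], str]
    set: Callable[[str], None]


class Animation:
    def __init__(self, frames: List[str]) -> None:
        self.frames = list(frames)
        self.index = 0

    def current_frame_ref(self) -> FrameRef:
        # Closures expose only the current frame, not the whole animation.
        def get() -> str:
            return self.frames[self.index]

        def set_(picture: str) -> None:
            self.frames[self.index] = picture

        return FrameRef(get=get, set=set_)


class Painter:
    def __init__(self) -> None:
        # A painter starts out holding no references to other objects.
        self.picture: Optional[FrameRef] = None

    def receive_picture(self, ref: FrameRef) -> None:
        self.picture = ref

    def paint(self, new_picture: str) -> None:
        if self.picture is None:
            raise PermissionError("painter has not been given a picture")
        self.picture.set(new_picture)


class SystemUI:
    """Trusted layer that knows about every visible object."""

    def __init__(self) -> None:
        self.objects: Dict[str, object] = {}

    def add(self, name: str, obj: object) -> None:
        self.objects[name] = obj

    def connect_current_frame(self, animation_name: str, painter_name: str) -> None:
        # Models the user drawing a line between the two objects.
        animation = self.objects[animation_name]
        painter = self.objects[painter_name]
        painter.receive_picture(animation.current_frame_ref())


ui = SystemUI()
animation = Animation(["frame0", "frame1"])
painter = Painter()
ui.add("animation", animation)
ui.add("painter", painter)

ui.connect_current_frame("animation", "painter")
painter.paint("edited frame0")
```

The painter never receives the `Animation` itself, so in a system that actually enforced this, it could edit the connected frame but could not, say, delete the other frames.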

By building a system like this, we might gain a few things:

But, a few open questions that I have:

There's a lot I don't know about how Smalltalk is implemented, and I'm sure there are a lot of details that would need to be worked out to implement something like this. But I think it's a path that could work, and it's something I'd like to see. :)

