Adding persistence to gobs-of-machines

by Jonathan Frederickson — Fri 14 February 2025

A couple updates on the gobs-of-machines work from a while back!

The smaller update of the two: I've renamed what I was calling "provisioners" to "providers". This is a bit of an attempt to align my terminology with that of other tools like Terraform. Ultimately when deploying real instances I'll need something to install/configure the hob-server component on the new machine, and it might make sense for that to be called a "provisioner". So I'm calling the part that interacts with cloud providers a "provider". So much of this terminology is overloaded in devops tooling, but so it goes. :)

But the bigger change: the program now supports persistence! In practice, this means that you can shut down and restart the program (primarily the boss), and when it comes back online it will remember the other instances that it's created.

This took a bit of troubleshooting (the errors you get when you forget something currently aren't as clear as they could be), but ultimately I was pleasantly surprised at how few changes I had to make to add persistence support!

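The Goblins docs on persistence explain this in more detail than I can here, but the changes mostly boiled down to three things: defining persistence environments, using define-actor and pulling more state into object constructors, and using persistent vats.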

Walking through these:

Defining persistence environments

Each module that defines new objects needs to define a "persistence environment". For hob.scm, that looks like this:

(define hob-env
  (make-persistence-env
   `((((gobs-of-machines hob) ^hob-server-presence) ,^hob-server-presence)
     (((gobs-of-machines hob) ^hob-client-presence) ,^hob-client-presence))
   #:extends common-env))

This gives Goblins some information it needs to snapshot and rehydrate objects: the module that the object is found in, the name of that object's constructor within that module, and the "object rehydrator" - which seems to be a direct reference to the object's constructor.

We extend another persistence env, common-env, because the hob-client uses a ghash to store bindings. So we need to tell Goblins how to persist ghashes too, and (goblins actor-lib common) provides a persistence env for the objects defined there that we can inherit from.

Using define-actor and pulling more into object constructors

Goblins provides a macro called define-actor that handles a lot of the persistence machinery for us. Using it in practice for e.g. the hob-client-presence meant changing from this:

(define (^hob-client-presence bcom)
  (define bindings (spawn ^ghash))

...to this:

(define-actor (^hob-client-presence bcom #:optional (bindings (spawn ^ghash)))

Instead of using an inner definition, I'm defining the bindings hash table as an optional parameter to the object constructor, with its default value set to a newly spawned ghash. To quote the Goblins manual:

In most cases, and usually when using ‘define-actor’, we need to think the behavior and state being determined by the constructor, not relying on internal state.

When the program starts, Goblins will recreate each object by spawning a new object with the same parameters, so moving state like this into the constructor allows it to recreate the previous state of the object.
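To make the "state lives in the constructor" idea concrete, here's a minimal sketch using a made-up ^counter actor (not part of gobs-of-machines). It assumes that define-actor, spawn-vat, and with-vat are all available from the (goblins) module, which is my reading of the Goblins manual but worth checking against your version. Every state change goes through bcom with a fresh call to the constructor, so the constructor arguments always describe the object's current state:

```scheme
(use-modules (goblins))

;; Hypothetical actor for illustration only. Its entire state (count) is a
;; constructor argument, so recreating it only requires calling the
;; constructor again with the saved argument.
(define-actor (^counter bcom #:optional (count 0))
  (lambda ()
    (let ((next (+ count 1)))
      ;; Become a new ^counter built from the updated state,
      ;; and return the new count to the caller.
      (bcom (^counter bcom next) next))))

;; An ordinary (non-persistent) vat is enough to try the actor out.
(define demo-vat (spawn-vat))
(define counter (with-vat demo-vat (spawn ^counter)))

;; Each call increments the counter and returns the new count.
(with-vat demo-vat ($ counter))
(with-vat demo-vat ($ counter))
```

The hob-client-presence takes a slightly different route to the same place: its state lives in a separate ghash actor, and the constructor argument is a reference to that ghash rather than a plain value.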

Using persistent vats

This was the hardest part to work out.

To make use of Goblins's persistence features, you need to spawn your objects inside special vats that are persisted. The storage mechanism is pluggable, but for the time being the only one that actually saves to disk is the syrup-store, so for now I'm using that. Creating a syrup-store looks like this:

(define dummy-vat-store (make-syrup-store "/tmp/gobs-dummy-state"))

If I only had one vat to deal with, this (along with the persistence environment stuff mentioned in an earlier section) would be all that I would need to spawn a persistent vat. However, I had previously designed the application such that the dummy-machine and hob-client objects lived in a separate vat, to roughly simulate the fact that in reality these would be running on the newly provisioned machine.
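For comparison, here's roughly what the single-vat case could look like. This is a sketch rather than code from the project: it uses a bare ^ghash as the root object so that common-env is the only persistence environment needed, and it mirrors the argument order of the spawn-persistent-vat calls elsewhere in this post. The store path is made up, and the (goblins persistence-store syrup) module name and the ghash's 'ref method are assumptions on my part.

```scheme
(use-modules (goblins)
             (goblins actor-lib common)
             ;; Assumed module name for make-syrup-store.
             (goblins persistence-store syrup))

;; Hypothetical location for the saved state.
(define example-store (make-syrup-store "/tmp/example-ghash-state"))

;; The ghash is both the first object in the vat and the root of its
;; persistence graph. With no other vats involved, no persistence
;; registry is passed.
(define-values (example-vat table)
  (spawn-persistent-vat
   common-env
   (lambda ()
     (spawn ^ghash))
   example-store))

;; Store a value in the persisted ghash.
(with-vat example-vat
  ($ table 'set 'greeting "hello"))

;; Look the value up again ('ref is assumed to be the lookup method).
(with-vat example-vat
  ($ table 'ref 'greeting))
```

If this works the way the persistence docs describe, running the same definitions again after a restart should restore the ghash from the store instead of spawning an empty one, so the final lookup would still find the greeting.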

Goblins persists one vat at a time, so what do you do when you have objects in two different vats holding references to each other? Well, there is a mechanism for that too - you can make use of something called a persistence-registry:

(define persistence-vat (spawn-vat))
(define persistence-registry
  (with-vat persistence-vat
            (spawn ^persistence-registry)))

And then each time you spawn a new persistent vat, you pass a reference to the persistence registry into the spawn-persistent-vat call:

(define-values (dummy-vat dummy-provider)
  (spawn-persistent-vat
   dummy-env
   (lambda ()
     (spawn ^dummy-provider))
   dummy-vat-store
   #:persistence-registry persistence-registry))

One noteworthy thing about spawn-persistent-vat is that when using it, you spawn the first object in the vat at the same time as the vat itself. This first object acts as the "root" of the persistence graph, and for other objects in this vat to be persisted, this first object needs to hold a reference to them. (This will be important later.)

Also, if you're paying close attention, you may notice that this isn't where the dummy-provider was previously spawned, and there's a reason for that.

Previously, I was creating a new vat for each new machine, and this became a problem, because vats aren't in the list of data types that can appear in a persistence graph! (Even persistent vats, it seems - though I wonder if that could be supported at some point or if there are hurdles to implementing that.)

So I had to refactor a bit. Now, the boss takes an extra providers parameter: a ghash that maps each provider's name to a reference to the corresponding provider object:

(define-actor (^boss bcom #:key [providers (spawn ^ghash)] [machines (spawn ^ghash)])

Essentially I'm sidestepping the above limitation by spawning the dummy-provider in a new persistent vat first, and then passing it into the boss as an argument (with the dummy-provider modified to spawn each new dummy-machine in the same vat rather than spawning a new one each time).

And then, having done that, we can spawn the persistent vat our boss will live in:

(define syrup-store (make-syrup-store "/tmp/gobs-state"))
(define-values (my-vat my-boss)
  (spawn-persistent-vat
   boss-env
   (lambda ()
     (let* ((ht (spawn ^ghash)))
       ($ ht 'set 'dummy dummy-provider)
       (spawn ^boss #:providers ht)))
   syrup-store
   #:persistence-registry persistence-registry))

Now, the thing that had me scratching my head for a while had to do with a little quirk of vats being persisted independently of one another, as well as with the requirement that the root retain a reference to any objects in its vat that should be persisted. Previously, when creating a new machine, the hob-server for the new machine would ultimately get a reference to the new hob-client. But the hob-server lives in a different vat, and the provider wasn't keeping a reference to that newly created hob-client anywhere, so when rehydrating the object graph it didn't come back.

Fixing this just involved making the dummy-machine object persist its hob-client and making the dummy-provider persist each of the dummy-machines, but it took a little bit of puzzling to figure out.
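The shape of that fix can be sketched with stand-in actors. None of these definitions are the real ones from gobs-of-machines: the ^stub-* names and their methods are hypothetical, and I'm assuming the methods macro from (goblins actor-lib methods) and the ghash's 'ref method. The point is just the chain of references: the provider (the vat's root) holds a ghash of machines, and each machine holds its client, so everything is reachable from the root.

```scheme
(use-modules (goblins)
             (goblins actor-lib common)
             (goblins actor-lib methods))

;; Hypothetical stand-in for the hob-client.
(define-actor (^stub-client bcom machine-name)
  (methods
   ((describe) (string-append "client for " machine-name))))

;; The machine receives its client as a constructor argument, so the
;; client is reachable from (and saved along with) the machine.
(define-actor (^stub-machine bcom machine-name client)
  (methods
   ((get-client) client)))

;; The provider keeps every machine it creates in a ghash that is itself
;; a constructor argument, so each machine is reachable from the provider.
(define-actor (^stub-provider bcom #:optional (machines (spawn ^ghash)))
  (methods
   ((create-machine machine-name)
    (let* ((client (spawn ^stub-client machine-name))
           (machine (spawn ^stub-machine machine-name client)))
      ($ machines 'set machine-name machine)
      machine))
   ((get-machine machine-name)
    ;; 'ref is assumed to be the ghash lookup method.
    ($ machines 'ref machine-name))))
```

To actually persist these, each constructor would also need an entry in the persistence environment passed to spawn-persistent-vat, the same way hob-env lists the hob presences.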

Persistence in action

Phew! Having made those changes, persistence now works:

scheme@(gobs-of-machines main)> (define-values (vat boss) (start-daemon))
scheme@(gobs-of-machines main)> ,enter-vat vat
Entering vat '2'.  Type ',q' to exit.  Type ',help goblins' for help.
goblins/2@(gobs-of-machines main) [1]> ($ boss 'create-machine "test")
dummy-provider: In a real implementation, this would talk to a cloud provider
goblins/2@(gobs-of-machines main) [1]> dummy-machine: Creating new machine
dummy: Registered new client machine

goblins/2@(gobs-of-machines main) [1]> ($ boss 'list-machines)
$6 = ("test")

<...then after killing and restarting the repl...>

scheme@(gobs-of-machines main)> (define-values (vat boss) (start-daemon))
scheme@(gobs-of-machines main)> ,enter-vat vat
Entering vat '2'.  Type ',q' to exit.  Type ',help goblins' for help.
goblins/2@(gobs-of-machines main) [1]> ($ boss 'list-machines)
$6 = ("test")

Success!

Obviously, there are still a couple tweaks that need to be made before this is really useful as a standalone application. There are some hardcoded /tmp filenames lying around for the two persistent vats; the location for those needs to be configurable. It still needs a command-line entrypoint and a config file, so you can start and configure it from outside a REPL. And it still needs at least one real provider implementation! But it's a start. :)
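As a rough idea of how the hardcoded paths might go away, here's a small sketch of a helper that builds store paths from an environment variable. The GOBS_STATE_DIR name and the state-path helper are both made up for illustration, and a real config file would probably replace this eventually.

```scheme
;; Hypothetical helper: build the path for a named store, using the
;; directory from the (made-up) GOBS_STATE_DIR environment variable and
;; falling back to /tmp when it is unset or empty.
(define (state-path name)
  (let* ((configured (getenv "GOBS_STATE_DIR"))
         (dir (if (and configured (not (string-null? configured)))
                  configured
                  "/tmp")))
    (string-append dir "/" name)))
```

With something like that in place, the two stores could be created with (make-syrup-store (state-path "gobs-state")) and (make-syrup-store (state-path "gobs-dummy-state")) instead of literal /tmp paths.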
