A Language for Stage Directions

Posted On: 2023-12-04

By Mark

Recently, I've been working through the process of updating my script so that the stage directions I've written for myself (e.g. a cat scampers across the screen) are translated into instructions that the computer can understand (e.g. "move character X to Y position"). It's been a rewarding experience seeing the project come alive - but it's also been a genuinely interesting process balancing the technical needs of the system with the variety of human expressiveness. In light of that, I thought I would attempt to capture that balancing act in blog form - so that you, too, can see what it takes to bring a script to life.

Speaking Machine

In order to write stage directions that the computer can understand, it's valuable to have a shared language that is both clear to the computer and easy for the writer to use. A common tool to accomplish this is a domain-specific language (DSL) - which is basically a programming language designed to be useful for only one specific domain (or even one specific project). For my project, I'm using two different domain-specific languages: Ink is itself a DSL for the task of writing interactive scripts, but I've also created an (extremely simplistic) DSL that applies to the output of Ink, allowing me to encode additional meaning in the script.

To illustrate: when I write, I preface every line of dialogue with the name of the speaker:

Shaw: I often quote myself. It adds spice to my conversation.

This approach makes writing dialogue quick and easy, so the DSL that I've created is able to use this pattern to encode important information - such as which character should be shown as the one "speaking" when the line is shown. In effect, every line is split up into two parts:

[information the system should know about the line]: [textual content of the line]

This lets me encode additional information in that space, such as expressions that should accompany the line:

Munroe [sagacious]: This quote is very memorable.
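As a rough illustration of that two-part split, here is a minimal Python sketch of how such a line could be parsed. It is not the project's actual implementation: the names LINE_PATTERN, DialogueLine, and parse_dialogue are hypothetical, and it assumes the expression (if any) always appears in square brackets directly before the first colon.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line shape: "Speaker [expression]: text", where the bracketed
# expression is optional. This pattern is illustrative only.
LINE_PATTERN = re.compile(
    r"^(?P<speaker>[^\[\]:]+?)\s*(?:\[(?P<expression>[^\]]*)\])?\s*:\s*(?P<text>.*)$"
)


@dataclass
class DialogueLine:
    speaker: str
    expression: Optional[str]  # None when no [expression] is given
    text: str


def parse_dialogue(line: str) -> Optional[DialogueLine]:
    """Split a script line into metadata and text; return None if it is not dialogue."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    expression = match.group("expression")
    return DialogueLine(
        speaker=match.group("speaker").strip(),
        expression=expression.strip() if expression else None,
        text=match.group("text"),
    )
```

Splitting on the first colon means the spoken text itself can still contain colons, while lines with no speaker prefix simply fall through as "not dialogue".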

I also use some simple conventions to denote stage directions in my writing, so I've designed my domain-specific language to reflect that. Whenever I intend to add stage directions, I mark that spot with a quick note of what's happening (e.g. [[Horatio exits Stage Right]]), and then once I'm ready to try it in the code, I go back and add the explicit instructions for the computer (e.g. >>> ActorMove(Horatio, StageRight)).
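To make the two markers concrete, the following Python sketch shows one way a reader for this notation could tell a [[note]] apart from a >>> command and pull out the command's name and arguments. The function names and regular expressions are assumptions made for illustration; the only things taken from the script itself are the [[...]] and >>> Name(args) shapes.

```python
import re
from typing import List, Optional, Tuple

# Assumed shapes, based only on the examples in the script:
#   [[free-form note for the writer]]
#   >>> CommandName(arg1, arg2)
NOTE_PATTERN = re.compile(r"^\[\[(?P<note>.*)\]\]\s*$")
COMMAND_PATTERN = re.compile(r"^>>>\s*(?P<name>\w+)\((?P<args>.*)\)\s*$")


def parse_note(line: str) -> Optional[str]:
    """Return the text of a [[...]] note, or None if the line is not a note."""
    match = NOTE_PATTERN.match(line.strip())
    return match.group("note").strip() if match else None


def parse_command(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (command name, argument list), or None if the line is not a command."""
    match = COMMAND_PATTERN.match(line.strip())
    if match is None:
        return None
    # Arguments are kept as plain strings; interpreting them is left to the engine.
    args = [arg.strip() for arg in match.group("args").split(",") if arg.strip()]
    return match.group("name"), args
```

Keeping the note as inert text means a half-finished script still reads and runs, with the explicit command added later underneath it.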

Building a Vocabulary

While a shared language makes it possible for a script-writer to communicate with the machine, it's just as important to think about the concepts and framing with which those ideas will be expressed. Computers tend to be very precise and explicit: they need to know exactly what to do, down to the tiniest detail (e.g. the exact speed and angle of an arm movement). In writing, however, it's often preferred to focus on the gesture and meaning (e.g. "Juliet waves in greeting") rather than the literal mechanics of the action. Thus, it benefits a writer to only have access to higher-level concepts and ideas in the DSL: the more I abstract movement and gesture, the more maintainable the stage directions become.

Autonomous Agent Movement has been a boon in this regard: I can specify "move to X point", and the agent will figure out the necessary steps to get there. Building on that further, I've found it's beneficial to attach labels to various objects, agents, and locations in a scene - since the script can be clearer by telling an agent "move to the kitchen table" rather than "move to (10,5)". In theory, the movement system supports seeking out generalized targets as well (e.g. "move to the nearest table") - though how exactly I'll expose that to the script is not something I've pinned down yet.
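The labelling idea can be sketched as a small lookup table that turns a name into a position, plus a "nearest of a kind" query for generalized targets. This is a Python illustration under stated assumptions, not the project's movement system: SceneLabels, Landmark, and the "kind" tag are all hypothetical names, and distance is plain straight-line distance rather than path length.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Landmark:
    label: str      # unique name the script refers to, e.g. "KitchenTable"
    kind: str       # hypothetical category tag, e.g. "table"
    position: Point


class SceneLabels:
    """Maps script-friendly labels to scene coordinates."""

    def __init__(self) -> None:
        self._by_label: Dict[str, Landmark] = {}

    def add(self, label: str, kind: str, position: Point) -> None:
        self._by_label[label] = Landmark(label, kind, position)

    def resolve(self, label: str) -> Point:
        """Look up a specific label; raises KeyError for unknown labels."""
        return self._by_label[label].position

    def nearest(self, kind: str, origin: Point) -> Optional[Landmark]:
        """Find the closest landmark of a given kind, or None if there is none."""
        candidates = [lm for lm in self._by_label.values() if lm.kind == kind]
        if not candidates:
            return None
        return min(candidates, key=lambda lm: math.dist(lm.position, origin))
```

With something like this in place, "move to the kitchen table" becomes a resolve call followed by the existing "move to X point" behaviour, and the script never has to mention coordinates.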

Going beyond movement, there are a wide variety of gestures and performative actions that I can make available to the script - and I'm very much in the early stages of figuring out which ones I will need and how to organize them. As an example: the script can tell an agent where to face (left, right, or towards a specific target), and, by mixing that together with some timing, a script can express gestures that aren't directly supported by the language:

[[Horatio glances around]]
>>> ActorFace(Horatio, Left)
>>> TimerWait(0.5s)
>>> ActorFace(Horatio, Right)
>>> TimerWait(0.5s)
>>> ActorFace(Horatio, Juliet)
Horatio: Is this the right play?

While this technically works, spending five lines to express a single stage direction feels like the kind of fiddly detail-work that my DSL is supposed to prevent. As such, I expect that I will wrap this up into a single "ActorLookAround" command - and that this won't be the last time that a new gesture emerges from mixing existing capabilities together.
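One possible way to wrap a gesture like this is as a macro that expands into the primitive commands already shown. The Python sketch below is only an illustration: the two-argument form ActorLookAround(actor, final target), the MACROS table, and the expand function are assumptions, and the 0.5s pauses are simply copied from the example above.

```python
from typing import Callable, Dict, List, Tuple

# A command is represented as (name, arguments), with arguments kept as strings.
Command = Tuple[str, List[str]]


def expand_look_around(args: List[str]) -> List[Command]:
    """Expand the hypothetical ActorLookAround(actor, final_target) into primitives."""
    actor, final_target = args
    return [
        ("ActorFace", [actor, "Left"]),
        ("TimerWait", ["0.5s"]),
        ("ActorFace", [actor, "Right"]),
        ("TimerWait", ["0.5s"]),
        ("ActorFace", [actor, final_target]),
    ]


# Hypothetical registry of composite gestures built from existing commands.
MACROS: Dict[str, Callable[[List[str]], List[Command]]] = {
    "ActorLookAround": expand_look_around,
}


def expand(command: Command) -> List[Command]:
    """Replace a macro with its primitive steps; pass other commands through unchanged."""
    name, args = command
    macro = MACROS.get(name)
    return macro(args) if macro else [command]
```

Under this scheme the script would carry a single line such as >>> ActorLookAround(Horatio, Juliet), and new gestures could be added to the table without touching the engine's primitive commands.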


Hopefully this exploration of the design decisions behind conveying stage directions to a computer has been an interesting one. When writing for a machine, the language is an essential part of the process, but the design of that language must consider the needs of both the machine and the writer who has to work with it. While I've only got a few weeks' worth of experience working with this system I've built, I am happy that this approach seems to invite me to ask the right questions as it grows: how can I center this growing corpus of stage directions on what the writing itself is trying to express?