Validation Through Use

Posted On: 2020-06-22

By Mark

Recently, I switched my approach for validating the effectiveness of my writing tools, and have been quite pleased with the results. Interestingly, the change was to move away from validating the tools and to begin using them in earnest instead. As a result, I have found (and resolved) more tool bugs in the past week than I had in the three weeks prior. In this post, I'll explain what changed, and why I think it has been so successful.

Specialized Tools

Writing interactive fiction is notoriously hard. On top of all the challenges associated with writing static fiction, interactive fiction also needs to codify, track, and respond to changes in the script - changes initiated at the reader/player's discretion. As such, authors of interactive fiction often rely on specialized tools that are designed specifically for managing interaction in a body of written work.

I've written previously about some of the limitations of existing tools, as well as how I am extending those tools to work around those limitations. As a result, I find myself developing, testing, and using custom tools that help me author the kinds of interactions I want in my project. These tools let me use Yarn's branching language together with a salience-based system of my own design (henceforth referred to as the "Salience Engine").

Validating Tools

After creating and testing my writing tools, I began to use them, deliberately trying to author content that would stretch the tools and expose any limitations or weaknesses. As with any early version of home-grown tools, there were plenty of usability issues and missing nice-to-have features, so I dutifully documented those and kept pressing forward. As I worked through integrating the output of these tools into the game itself, I encountered several issues - all of which I (eventually) resolved.

Once the system was working and known issues were documented, I was a bit at a loss for where to go next. Improving usability seemed less urgent than (finally) making some progress on the actual project. In spite of that, I found that my attempts to create one-off or side conversations using the tools were largely falling flat - failing to effectively communicate character or reward players for choosing to wander off the narrative trail. What's more, many of the narrative elements necessary to make sense of the conversations (such as characters or settings) were either missing or still placeholders, since the project is still early in content development.

Building With Intent

After I discussed the problem with my (ever patient) wife, she recommended making something small and playable, to demonstrate not just the writing, but also the gameplay interactions available. After hashing out some details on what to expect, I came away with a new purpose: to create an (internal only*) demo, focused on everything that is new since my previous playable demo.

Within a few days of starting work on the demo, I found several massive issues with my existing code. The most glaring of these was that the grammar used by the Salience Engine was not capable of everything that I had originally envisioned for it. Even though it had appeared bug-free while I was testing usability, I quickly found its limits once I ran a full scene's worth of interactions through it. Likewise, I found (and resolved) several pre-existing gameplay issues, each one unearthed while testing some unrelated, demo-specific functionality.

This is Good

In other circumstances, finding so many bugs in such a short amount of time might be disheartening. Instead, I've found it invigorating. Progress on the demo-specific content has been extremely quick, blasting through the first two scenes in a couple of days. The only real slowdown has been these bugs - and since they were all pre-existing, every one that I fix is one step closer to having the systems my project depends upon behaving exactly the way I want them to.

Why is this Working?

So far as I can tell, there are two main reasons why this approach is proving to be so useful:

My ad-hoc testing was falling prey to habit
Repeatedly using the same sandbox to test each individual change, I started falling into habits that helped optimize getting to the testing site and testing only that change. While helpful for focused testing, such an approach needs to be supplemented with other kinds of testing, as unforeseen side-effects can easily be missed when one is too focused on what changed. By changing up the circumstances and goals of the tests, I broke many of the optimization habits I had gotten into, helping me to explore and discover many things that had (without my awareness) changed over the past year.
I had forgotten what was not finished
When I was developing the Salience Engine, I found that the parsing complexity far exceeded that of any parser I had previously built (by hand). As such, I turned to learning and using ANTLR, a tool that generates parsers automatically from a grammar. Due to my inexperience with it, ANTLR required a significant amount of time to learn, and integrating this (parsing-empowered) Salience Engine into Yarn was a significant undertaking. Somewhere along the way, I limited the scope of my work, omitting mathematical operations and comparisons - thereby making it possible to get through the Yarn integration even before completing the grammar. Unfortunately, whether intentionally or not*, I left the grammar in this incomplete state, and it took real use to reveal what was missing, and just how limited the current implementation was.
* I don't recall the circumstances of this, unfortunately. It's possible that I cut math operations and planned to just work around them, in an effort to get to other, more pressing issues. Alternatively, it might have been a short-term choice, just to get to the point where I could test in Yarn, but considering that testing was a multi-week endeavor, I very well may have forgotten to circle back and finish the grammar.
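To make concrete what "mathematical operations and comparisons" add to a condition grammar, here is a minimal sketch in Python. It is an illustration only: it is a hand-written recursive-descent evaluator, not the Salience Engine's ANTLR grammar, and the token set, the function name evaluate, and the variable names (trust, visits) are all assumptions made for the example. Unary minus and boolean operators are deliberately left out.

```python
import operator
import re

# Hypothetical token set: numbers, variable names, arithmetic and comparison operators.
TOKEN = re.compile(r"\s*(?:(\d+\.?\d*)|([A-Za-z_]\w*)|(<=|>=|==|!=|[-+*/()<>]))")

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
ADD = {"+": operator.add, "-": operator.sub}
MUL = {"*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split a condition string into (kind, value) pairs."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        number, name, op = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = match.end()
    return tokens


class Parser:
    """Evaluates while parsing; precedence is comparison < sum < product < atom."""

    def __init__(self, tokens, variables):
        self.tokens = tokens
        self.variables = variables
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def comparison(self):
        left = self.sum()
        kind, value = self.peek()
        if kind == "op" and value in COMPARE:
            self.take()
            return COMPARE[value](left, self.sum())
        return left

    def sum(self):
        result = self.product()
        while self.peek()[0] == "op" and self.peek()[1] in ADD:
            op = self.take()[1]
            result = ADD[op](result, self.product())
        return result

    def product(self):
        result = self.atom()
        while self.peek()[0] == "op" and self.peek()[1] in MUL:
            op = self.take()[1]
            result = MUL[op](result, self.atom())
        return result

    def atom(self):
        kind, value = self.take()
        if kind == "num":
            return value
        if kind == "name":
            return self.variables[value]
        if kind == "op" and value == "(":
            result = self.comparison()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return result
        raise SyntaxError(f"unexpected token: {value!r}")


def evaluate(text, variables=None):
    """Evaluate a condition such as 'trust + 2 * 3 >= 10' against game variables."""
    parser = Parser(tokenize(text), variables or {})
    result = parser.comparison()
    if parser.pos != len(parser.tokens):
        raise SyntaxError("unexpected trailing input")
    return result
```

With this sketch, evaluate("trust + 2 * 3 >= 10", {"trust": 5}) is True, because multiplication binds tighter than addition and the comparison is applied last. A grammar without the sum, product, and comparison rules could only test bare values, which is the kind of limit that a full scene's worth of conditions tends to expose quickly.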

Both of these reasons have larger, project-wide implications. Having gaps in testing means that when a change has side effects that cause regressions in the code, I may not detect the issue for an extended period of time. This is problematic, as I have neither the time for routine regression tests, nor adequate tooling support to automate such tests. Yet, it is now clear that one of these two must give: either I make time for manual regression testing, or I force the issue with the tools.
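One low-cost way to "force the issue with the tools" is a golden-output check: record what the system produces for a fixed set of inputs, then flag any later run whose output differs. The sketch below is only an assumption about how that could look; check_against_golden and pick_line are hypothetical names, and pick_line stands in for whatever dialogue system is actually under test.

```python
import json
import tempfile
from pathlib import Path


def check_against_golden(name, actual, golden_dir):
    """Compare `actual` with the recorded output; record it on the first run."""
    path = Path(golden_dir) / f"{name}.json"
    if not path.exists():
        path.write_text(json.dumps(actual, indent=2, sort_keys=True))
        return "recorded"
    expected = json.loads(path.read_text())
    return "ok" if expected == actual else "regression"


def pick_line(state):
    # Hypothetical stand-in for the dialogue system under test.
    return "Welcome back." if state["visits"] > 0 else "Hello, stranger."


# Usage: a temporary directory keeps the example self-contained; a real
# project would keep the golden files under version control instead.
with tempfile.TemporaryDirectory() as golden_dir:
    before = {f"visits={n}": pick_line({"visits": n}) for n in (0, 1)}
    print(check_against_golden("greeting", before, golden_dir))  # recorded
    print(check_against_golden("greeting", before, golden_dir))  # ok
    after = {**before, "visits=1": "Hello, stranger."}
    print(check_against_golden("greeting", after, golden_dir))   # regression
```

The appeal of this design is that it needs no hand-written expectations: the first run records current behavior, and every later run only has to answer whether anything changed. The trade-off is that it detects change rather than correctness, so a recorded bug stays recorded until someone reviews the golden file.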

The second reason, forgetting what was not finished, is a project risk that I will need to come to terms with. Going forward, things will inevitably get cut, and it is quite likely that some of those cuts will be forgotten. One could work around that by documenting every cut, but such an approach is likely impractical*. Instead, I stand a much better chance of coping with it by staying loose and keeping room to react. My experience with the grammar demonstrates this: if I have the scheduling room to address an issue at the point it comes up, I can recover gracefully even from forgotten or cut features.

Conclusion

Having switched to using my tools for their intended purpose (rather than simply for testing), I am finding and resolving quite a few bugs. This is very helpful, as it both makes the systems more robust and advances a project management goal (making a demo of my current progress). What's more, it has revealed an important gap in the project's testing that, once closed, should let me detect bugs sooner and more consistently than the current approach does. Finally, once I have resolved the grammar issues that this has uncovered, the Salience Engine will far better match how I expect it to behave, which, in turn, will make future development smoother.