On Representing Data

Posted On: 2016-12-04

By Mark

Today I'm going to write about why it's important to think not only about data and how users will experience it (for example, a player playing through your carefully designed level), but also about how that data will be represented to the people responsible for maintaining it (developers, level designers, etc.). I previously wrote about the importance of keeping a project's metadata in a human-readable format, and I think something similar applies to the data itself. When maintaining data, there are three (maybe four) main needs to keep in mind:

  1. Understanding the data - How easy is it for the maintainer to see the data and understand it? What about complexities and relationships to other data? What about dynamic data?
  2. Making modifications to the data - How easy is it to make a simple change to the data? What about a large number of similar simple changes?
  3. Using the data - How easy is it to get the data into the application? Can the developer of the application easily turn the data into the desired effects? Are there limitations to the data that make it difficult or impossible to represent a desired effect?
  4. Iterating on the data - How easy is it to try new data without committing to it? Once happy with the changes, how easy is it to make those changes permanent?

As I work on my current prototype, it is a bit of a challenge to find the right balance among these factors. In general, many standard formats and approaches (SQL, XML, JSON) have excellent support for using the data, but are often so general or flexible that understanding the data becomes trickier. When a standard format doesn't fit a particular problem, I generally prefer custom formats, as they can be designed to maximize understanding and simplify modifications, but this comes at the cost of making the data harder to use, since the parsing code has to be written by hand. Unfortunately, custom formats are also often less flexible, which can become an issue if the design changes to require something that the current format cannot express.

One particular place that is causing me challenges is level generation. I want to use procedural generation to piece together small bits of hand-crafted level (this is a common low-cost way of providing a wide variety without risking the player ending up in an impossible situation), but I am finding that fitting randomness into the hand-crafted pieces threatens understanding. Specifically, I have created a text file that contains what is essentially an ASCII drawing of the level. This works for very small pieces, but when I want to create anything of moderate size or complexity, I am finding that those pieces are very easily recognizable when they reappear, due to their static layout.
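To make the ASCII approach concrete, here is a minimal sketch of how such a drawing might be read into cube positions. The symbols (`#` for a cube, `.` for empty space) and the function name `parse_piece` are assumptions chosen for illustration, not the format the prototype actually uses.

```python
CUBE = "#"   # hypothetical symbol for a cube
EMPTY = "."  # hypothetical symbol for empty space


def parse_piece(text):
    """Turn an ASCII drawing into a set of (x, y) cube positions.

    The bottom line of the drawing is y = 0, so the picture in the
    text file looks the same way up as the level does in the game.
    """
    rows = text.strip("\n").split("\n")
    height = len(rows)
    cubes = set()
    for line_index, line in enumerate(rows):
        y = height - 1 - line_index
        for x, symbol in enumerate(line):
            if symbol == CUBE:
                cubes.add((x, y))
            elif symbol != EMPTY:
                raise ValueError(f"unknown symbol {symbol!r} at column {x}, line {line_index}")
    return cubes


# A small hand-crafted piece: a floor with a two-cube ledge above it.
PIECE = """
..##..
......
######
"""

cubes = parse_piece(PIECE)
print(sorted(cubes))
```

The appeal is that the file is its own preview: what the maintainer sees is the layout. The drawback is the one described above: the same drawing always produces the same cubes.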

I considered creating symbols to represent ranges (for example, [1,6] to represent a platform between 1 and 6 cubes long), but I realized that would make the document far less understandable, as I would need to work out how the layout gets pushed around as more cubes are added to the platform. I also considered adding a JSON file as metadata, to allow specifying things like dynamic/random content to be injected after the static content, but that would require understanding two files at once rather than one, which is undesirable. In a larger project, it would probably be worthwhile to store the data in XML or JSON and create a custom viewer that renders the dynamic content in a more comprehensible manner, but this prototype is far too small to justify something like that.
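The layout problem with range symbols is easier to see in a small sketch. The `[low,high]` token syntax follows the example above; the `#` and `.` symbols, the `expand_ranges` name, and the way a token expands into a run of cubes are all assumptions made for illustration.

```python
import random
import re

# Matches tokens such as "[1,6]".
RANGE_TOKEN = re.compile(r"\[(\d+),(\d+)\]")


def expand_ranges(line, rng):
    """Replace each [low,high] token with a run of low..high cubes."""
    def replace(match):
        low, high = int(match.group(1)), int(match.group(2))
        return "#" * rng.randint(low, high)
    return RANGE_TOKEN.sub(replace, line)


# In the file, the lone cube on the top row appears to sit above
# the end of the bottom row.
template = [
    "....#",
    "[1,6].#",
]

for seed in range(3):
    rng = random.Random(seed)
    for line in template:
        print(expand_ranges(line, rng))
    print()
```

Once expanded, the bottom row can be anywhere from 3 to 8 characters wide, so the column alignment visible in the file no longer says anything reliable about what ends up above what in the level.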

In order to stick to the ASCII approach, I am instead considering a symbol that represents either a cube or no cube. This is still an imperfect solution (no symbol can represent two polar opposites and let the viewer consider both possibilities at once), but I am hopeful that it will fit my needs. If I find this is still inadequate, I will have to consider more alternatives. (I am now thinking of using pixels with transparency instead of ASCII, though I am less familiar with using pixels as a data structure, and it would require completely rewriting the parsing code, so it is probably better to try it in a future project instead. Also, I am not yet aware of any tools to diff images, though it seems reasonable that there might be one.) No matter the outcome, however, I will have learned by doing, which is itself valuable.
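A rough sketch of the "cube or no cube" idea, assuming `?` as the optional symbol, `#` for a cube, and `.` for empty space (all hypothetical choices, as is the `resolve_optional` name). Because each `?` becomes exactly one character, every line keeps its width and the drawing stays aligned.

```python
import random


def resolve_optional(lines, rng, chance=0.5):
    """Replace every '?' with a cube ('#') or empty space ('.').

    `chance` is the probability that a given '?' becomes a cube.
    All other symbols pass through unchanged.
    """
    resolved = []
    for line in lines:
        resolved.append("".join(
            ("#" if rng.random() < chance else ".") if symbol == "?" else symbol
            for symbol in line
        ))
    return resolved


piece = [
    "..??..",
    "......",
    "##??##",
]

# Seeding the generator makes a given variation reproducible.
for line in resolve_optional(piece, random.Random(1)):
    print(line)
```

Each `?` is decided independently here, so a gap in the floor could in principle appear without the ledge above it. Whether that matters depends on the piece, which is the kind of thing the drawing alone cannot show.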