Diagnosing Early Performance Problems

Posted On: 2019-12-23

By Mark

This past week I had an opportunity to identify and resolve a modest performance issue. The issue was representative of a particularly common source of performance degradation, so I thought I should write a bit about it for today's post. If you are unfamiliar with how developers address performance problems, I hope this post explains it adequately for you. If you are a developer yourself, then hopefully this window into my own thoughts and processes for addressing this fairly common task will prove interesting.

Diagnosis

The problem first appeared as a slight hiccup in the game's frame rate. Most of the time the game played smoothly, but every few seconds the game would briefly lower its frame rate before immediately recovering. The net effect was a distracting pause in the game's otherwise smooth camera scrolling and animations - one which occurred frequently enough that it could undermine enjoyment of the game.

The issue occurred fairly consistently, even without following any particular steps, so I ran the profiler to capture the source of the slowdown. Fortunately, the issue replicated under the profiler, and a large spike appeared on the CPU usage graph - one timed exactly with the frame rate drop. Looking at the spike in detail, it was clearly the source: to maintain a smooth 60 frames per second, the CPU (and GPU) must complete all their work in roughly 16.7ms (1000ms / 60 frames.) This particular spike, however, was taking closer to 18ms, causing the system to "drop" the frame. On closer inspection, nearly half of that (~8.5ms) was spent running the auto-save mechanism - which was itself dominated by a single massive (~6ms) delay: garbage collection.
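For anyone unfamiliar with how this kind of attribution works: Unity's profiler lets you label a stretch of work so it appears as a named block in the CPU timeline, which makes spikes like this one easy to trace. A minimal sketch (AutoSaveRunner and its methods are illustrative stand-ins, not my actual code):

using UnityEngine;
using UnityEngine.Profiling;

public class AutoSaveRunner : MonoBehaviour
{
    void Update()
    {
        if (ShouldAutoSave())
        {
            // Label the work so it shows up as a named block in the
            // Profiler's CPU timeline, making the spike easy to attribute.
            Profiler.BeginSample("AutoSave");
            RunAutoSave();
            Profiler.EndSample();
        }
    }

    bool ShouldAutoSave() { return false; } // hypothetical trigger
    void RunAutoSave() { }                  // hypothetical save entry point
}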

Isolation

Once I had diagnosed the issue, I spent a bit more time verifying that no other factors were at play. After replicating and testing the performance issue several times, I confirmed that it always coincided with garbage collection running during an auto-save. What's more, I was able to dig a bit deeper into the auto-save itself, and identified that the database I am using (LiteDB) makes many small allocations as it writes to disk. In a normal C# application that wouldn't be an issue, but Unity's garbage collector is triggered by allocation and runs stop-the-world on the main thread, which meant that collection would only happen during auto-saves (rather than, say, being scheduled for a moment when the CPU wasn't heavily used.)
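As a quick sanity check that the allocations really came from the save path (and not something coincidental), it is enough to sample the managed heap around the call. A rough sketch - SaveDiagnostics is a hypothetical helper, and the delta is only approximate (a collection mid-save will skew it), but it confirms whether the save is allocation-heavy:

using System;
using UnityEngine;

static class SaveDiagnostics
{
    // Wraps the real save call (passed in as a delegate) and reports
    // roughly how much managed memory it allocated.
    public static void MeasureSaveAllocations(Action saveGame)
    {
        long before = GC.GetTotalMemory(false);
        saveGame();
        long after = GC.GetTotalMemory(false);
        Debug.Log($"Approx. bytes allocated during save: {after - before}");
    }
}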

Resolution

I briefly entertained the notion of running the save in a different process (that way it would have its own GC), but I decided to table that idea: I have little experience coordinating data between processes, so attempting it would cost significant time for (presently) unclear gains. Instead, I looked for ways to improve the performance of the existing implementation - specifically, by reducing the number of save transactions (each of which caused allocations) as well as the number of objects being saved.

Fortunately, there were easy, clear gains to be made in both categories. The number of transactions was simple to reduce: at the time the save system used one transaction per object (since I naively saved each object separately.) With only a couple of tweaks I was able to bulk-save, so that all objects sharing a type would also share a transaction.
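The change itself is conceptually tiny. A before/after sketch using LiteDB's collection API (Enemy and the collection name are placeholders, and the exact bulk method names vary slightly between LiteDB versions):

using System.Collections.Generic;
using LiteDB;

public class Enemy
{
    public int Id { get; set; }
    public float Health { get; set; }
}

static class SaveExample
{
    // Before: one transaction (and its allocations) per object.
    public static void SaveEachSeparately(LiteDatabase db, List<Enemy> enemies)
    {
        var col = db.GetCollection<Enemy>("enemies");
        foreach (var enemy in enemies)
            col.Upsert(enemy); // each call is its own transaction
    }

    // After: all objects of one type share a single bulk write,
    // so the per-transaction overhead is paid once per type.
    public static void SaveTogether(LiteDatabase db, List<Enemy> enemies)
    {
        var col = db.GetCollection<Enemy>("enemies");
        col.Upsert(enemies); // one batched operation for the whole list
    }
}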

For reducing the number of objects being saved, I dug a bit deeper into my code, and was greeted with a lovely code comment:

//TODO: Optimize this

Indeed, I had cut corners on the save operation, so it was saving everything all the time! Clearly this would perform poorly, but I also knew that optimizing it would require effort and introduce risk (including the risk of actually making performance worse.) As such, I intentionally deferred implementing any optimizations until I had an easy-to-replicate performance problem, which would allow me to focus on correct implementation (as opposed to optimizing it during the first implementation, when I would likely be rushing to get it to "just work.")

For this optimization pass, I chose the classic dirty flag pattern, so that only records that changed would be saved. Implementing the pattern wasn't particularly difficult, but it required close attention (missing even a single property can result in nightmarish save bugs.) Fortunately, it fit nicely with the (interface-based) approach I use to delegate save functionality to separate components; a sketch of the shape appears below. Now, whenever a new saveable property is added, it must explicitly declare whether it sets the dirty flag (basically, this ensures I won't forget the dirty flag, even if I don't add anything new to the save for months.)
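To make that concrete, here is a minimal sketch of the pattern (ISaveable, PlayerState, and Gold are illustrative names rather than my actual interfaces): writes go through property setters, the setters flip the flag, and the save pass skips anything that is still clean.

public interface ISaveable
{
    bool IsDirty { get; }
    void MarkClean();
}

public class PlayerState : ISaveable
{
    int gold;

    public bool IsDirty { get; private set; }
    public void MarkClean() => IsDirty = false;

    // Every saveable property routes its writes through a setter like
    // this one, so the decision to dirty the record is explicit.
    public int Gold
    {
        get => gold;
        set
        {
            if (gold == value) return; // unchanged writes stay clean
            gold = value;
            IsDirty = true;
        }
    }
}

The save loop then only touches dirty records (and calls MarkClean() after each successful write), which is what cuts the number of objects saved.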

Results

While each change individually improved performance, together the difference was massive. Not only was the frame rate perfectly smooth, but when I profiled the auto-save, I discovered that it now allocates less memory than many other systems (the lead offender now being an experimental, third-party camera script, which recalculates the viewport whenever the camera moves.) In keeping with performance optimization best practice, I am stopping optimization here. While there are certainly areas that could be better, any improvements would be imperceptible, so it is better to wait until I have issues I can replicate before optimizing further.

Conclusion

Hopefully this description of diagnosing and resolving a simple performance problem has been interesting. Personally, I find others' approaches to such problems varied and interesting to read about, so I hope this post has been the same for you. As always, if you have any thoughts or feedback, please let me know.