Postmortem analysis of GSWAT v1 and the future of GSWAT

GSWAT is dead.  Long live GSWAT!

Well, I suppose this is about the right time to finally say that I’m calling v1 of GSWAT a failure, but, surprisingly, that’s okay.  Once I completed the postmortem analysis, I realized that I had come out of the project with a great deal more experience than when I started, and that experience has greatly shaped my vision for the future of GSWAT.  The following is part of the GSWAT v2 planning documentation and was written as a retrospective on GSWAT v1, analyzing what went well and what did not so that GSWAT v2 may be far more successful than its predecessor.

Because this postmortem is quite long and involved, I will be splitting it into two parts: this one, focused on the project and team management aspects of GSWAT v1, and a second, focused on the technical side.  Given that this is my own personal reflection, I will be focusing a bit more on my own learning and reactions than the actual postmortem does.

Before I begin, a note on terminology: references to the ‘core’ team mean the group of four developers who built the framework and, by the end of the v1 project, had written well over 90% of the code.  In some places, I have edited the postmortem slightly from its original content to make things clearer to readers who are less familiar with the project.

While GSWAT v1 served as an excellent proving ground for many of the ideas present in the GSWAT v2 plan, it ultimately fell short of the goals we had outlined for it.  In this document, we take a brief look back at both the team and the product behind GSWAT v1 and determine what can be done better for GSWAT v2.

Team Composition

The goal for GSWAT v1 was always to be an open source project that everyone could contribute to, guided and ultimately run by a core group of contributors from Pure Battlefield.  However, it quickly became apparent that most contributors to the GSWAT project were underprepared for a task the size of GSWAT, especially since most of the volunteer developers lacked sufficient C# experience, which significantly increased the time it took to complete features.  We onboarded a number of volunteer developers onto the project, but none of them ever really stuck around and completed any significant features: the only developers who contributed significant portions of code were the core team that started the project.  While there was general interest and excitement around the project, it was very difficult to retain developers given its early state, because designing new features with little to no scaffolding required a high level of technical experience.  Moving forward, the project should remain closed to new contributors until large parts of the core features are complete, or at the very least until the minimum viable release point is reached.

Additionally, we faced a serious lack of frontend developers on a project whose primary draw was its feature-rich and robust frontend.  Enzo [our frontend coder]’s expertise in JavaScript carried that part of the team until the summer, when DICE needed his time for the BF4 launch; the team then discovered that it lacked the skills necessary to continue developing that part of the product.  Clearly, we must maintain multiple developers in each feature area who are capable of navigating the existing codebase and adding new functionality, especially in an area as broad as the frontend.  Given that the initial set of developers on GSWAT v2 will be rather small, it is critical that every developer be comfortable in any layer of the code so that no critical tribal knowledge or ability is housed in only one person.  Maintaining high-quality, in-depth documentation on everything we write is another good way to combat this.

Honestly, more than anything else, I think this is what absolutely killed us in terms of getting the project to a reasonable place.  It took almost a year to get the ability to send chat messages from GSWAT to the game server into the project, primarily due to contributor churn and a lack of experienced developers outside of the core team.  That’s not to say that the people who volunteered to help code components of GSWAT v1 were bad developers: quite the opposite, actually.  They saw an opportunity to contribute to something incredibly ambitious (as detailed in the next section of the postmortem) that would also make their lives as Battlefield 4 plugin developers easier.  Most of them made a very good effort to contribute, but nearly all were blocked by the lack of an easy way to learn how to contribute.

The biggest oversight on my part as the project lead was not recognizing where their skill sets actually lay and not providing guidance on how to contribute effectively to the project given its nascent state.  That oversight led me to assign critical-path work to untested new contributors, which led to stagnation as these developers hit roadblocks and almost always gave up.  Unfortunately, GSWAT didn’t have any non-critical-path work available at the time because the project was so bare-bones.  As stated in the postmortem itself, the best solution to this particular problem is not to invite any new core contributors until the project reaches a moderately advanced state, as relying on untested people to contribute critical functionality is a recipe for stagnation.  When stagnation comes, so does boredom, and when boredom comes, your project team dissolves.

Additionally, our lack of frontend developers significantly hampered our ability to ship new features.  Because very few of the other core contributors had worked on the frontend, we had almost no experience in that area when Enzo had to drop out to work on Battlefield 4; the rest of the core contributors found it very difficult to get into the frontend codebase and, for that reason, didn’t.  We should never have been in that situation in the first place: a bus factor of 1 in an area that is literally half the project was unacceptable, but we didn’t realize that until it was too late.  For this reason, the core team will be working across all layers of the code on GSWAT v2 to ensure that we don’t wind up in that situation again.  As an example, the backend code to send chat messages into the server had been written for quite a long time, but we never wired it into the frontend until someone slightly familiar with Enzo’s frontend work came along and did it.  The gap between completing the backend API and actually wiring up the functionality was probably about 3 months, which is a rather obscene amount of time for a feature to go unshipped when it is 90% complete.

The (rather obvious) takeaways: Encourage contribution (pull requests), but don’t assume that new and unproven contributors will stick around, especially when the project is new.  Even experienced developers can be excited initially but then realize that they lack the time, energy, or motivation to keep working on the project.  A bus factor greater than 1 is required for all areas of the code.

Project Scope & Management

GSWAT v1 was always an ambitious project, and everyone involved acknowledged as much.  The primary purpose of the project was always to become a functional replacement for Procon, but the team very quickly realized that this would be far more difficult than originally anticipated, especially for a volunteer team.  Our focus, spearheaded by Will, quickly shifted toward adding new functionality without first finalizing (to a reasonable extent) the features we had already added, so we were perpetually stuck never finishing anything.  While we did make forward progress toward the goal of a functional admin tool, the experience was subpar and sometimes unstable.  Given GSWAT’s nature as a nascent volunteer project, some lack of polish is to be expected, but our lack of focus ultimately damaged the team’s ability to complete new feature areas.

Negatives aside, GSWAT v1 did prove to be a rather well-liked real-time chat viewer, giving administrators an easy window into the server’s chat from any device and letting them pull historical logs without needing to go to the Procon layer host or FTP.  This suggests that we at least got our priorities right by implementing the chat system first and launching new features as they became available, rather than waiting for the entire basic admin toolkit to be finished.

Further compounding the problem was a lack of detailed functional specifications.  This not only contributed to the focus problems but also led to bad design decisions that ultimately made continuing work on GSWAT v1 difficult.  While we trusted contributors to make good decisions, without a clearly defined north star experience, we wandered off in many directions, which made creating a cohesive product nearly impossible.  Going forward, each feature area we plan to implement should have a clearly defined functional specification as well as technical designs that align with the ultimate vision of the project.

Once again, my inexperience as a project manager shows through.  I think it’s fair to say that had we focused on adding more fit and finish to some of the existing features, we would’ve had more places for new contributors to succeed, but I was primarily focused on getting new and exciting things shipped out the door and adding value for the customer (Pure Battlefield).  Part of that desire to keep shipping valuable features was to keep the development team interested, as the project had already begun stagnating by the time we started rapidly switching focus.  We took the Cult of Done mentality a bit too far when planning out work, and it bit us, particularly when we started needing to scale to multiple servers (more on that in the next post).

I stand by our initial decision not to write any functional specs: we were just attempting to see exactly what we needed to do to get communication with the game server working and to prove out some components of Windows Azure.  We did that, but the hacky design we had ultimately ended up being used rather than refactored, and we simply accepted it as the way things were done.  All the developers did what they thought was right, and we wound up with a patchwork experience that was very difficult to rectify after the fact.  Once we were done with the proving stage, we should’ve immediately transitioned into writing detailed functional specifications so that contributors wouldn’t get stuck waiting on answers from me about how a given feature should work.

The takeaways: Just because you’re agile doesn’t mean you need to change focus all the time just to get things out the door.  Also, functional specs are good and, in many cases, necessary to define the experience.

More reflection to come on the house of cards that became our technical infrastructure in part 2!

