April 15, 2009

Designing a Black Box (Part 2)

This is the second post in a series about data mining games to discover why players are quitting.
Read Part 1 here

Time to make a big list

Ok, we know what a black box is for and why we need one, but which information should it actually mine and process?

This brings us to some actual work. Most people, including many game developers, don't actually know what game designers do. When people say they want to be game designers, they're usually thinking of what creative directors or executive producers do. Actual game design is filled with unglamorous tasks such as generating thorough lists of things: possible problems, necessary assets, edge cases, tuning hooks, etc. (In fact, making lists is such a fundamental skill for designers that "generate a big list of things you'd need to do x" is a common question on screening tests for design applicants.)

Hopefully, you're already collecting as much data as possible about your players and how they are playing the game. Using WoW as an example, here's the data I'd want my black box to have access to:

Character Data

Basic information about character demographics, advancement, and investment.

  • Number of current active characters (over level 20, played more than 10 hours in the last month)
  • Number of total characters
  • Class breakdown of active characters
  • Total /played time per character, active character, and account
  • Percentage of time played by active character and by class per day/week/month
  • Level, server type, spec, stats, cash, honor, reputation, arena points, etc. for each character
  • Comparison of all stats to the average player at their level, as well as to the average player of their class at their level
  • Guilded status, as well as guild ranking/stats/size if applicable
  • Time since guilded, if applicable
  • Time since unguilded, if applicable, and whether they quit or were kicked
  • Time since each active character was last logged into
  • Time since any character on the account was last logged into
  • Average play session length per character and per account
  • Play session lengths per character and per account in the last month
  • Time spent crafting per day/week/month
  • Time spent in PvE instances per day/week/month
  • Time spent in PvP instances per day/week/month
  • Time spent in combat in the world per day/week/month
  • Time spent online but not in combat, crafting, or in instances per day/week/month
  • Character map coordinates at login and logout
  • Combat and chat log data for 10 minutes before each recent logout.
  • All player deaths, the time between them and their cause (which player/mob)
  • Kill-death ratio per hour/level/day/month/lifetime, by player and by mob
  • Cause and position of recent deaths where the player chose to take the death penalty
  • Time at logout since last level-up
  • Time at logout since last large cash expenditure, and what it was
  • Time at logout since last pvp fight
  • Time at logout since last pvp death
  • Time at logout since last quest accepted
  • Time at logout since last quest completed
  • Most recently failed quests
  • Time from acceptance to completion, for most recently accepted quest
  • Time from acceptance to completion, for most recently completed quest
  • Current number and names of accepted but uncompleted quests
  • Percentage of quests completed for current zone
  • Amount of rested xp at login and logout
  • Ratio of failed attempts versus successful attempts to clear each instance/raid boss
  • Time spent in PvE instances since the last successful boss takedown
  • Number of deaths in PvE instances since the last successful boss takedown
  • Time at logout since last loot drop of rarity uncommon/rare/epic
  • Time spent in PvE instances since the last uncommon/rare/epic item successfully looted.
  • Ratio of items rolled on versus rolls won
  • Percentage of current tier items attained, if applicable
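As a rough illustration of how little machinery the character list needs, most of those bullets reduce to two patterns: filtering characters by a rule (like the "active character" definition in the first bullet) and measuring the time at logout since the last event of some kind. The minimal Python sketch below assumes sessions and events are already being logged somewhere; the `Event` and `Character` types and the event names are hypothetical, and "the last month" is assumed to mean the trailing 30 days.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical record types: the field names are illustrative and are not
# taken from WoW's (or any game's) actual telemetry schema.
@dataclass
class Event:
    kind: str       # e.g. "level_up", "quest_accepted", "pvp_death"
    time: datetime


@dataclass
class Character:
    name: str
    level: int
    sessions: list[tuple[datetime, datetime]]  # (login, logout) pairs
    events: list[Event] = field(default_factory=list)


def hours_played_since(char: Character, since: datetime) -> float:
    """Sum session time after `since`, clipping sessions that straddle it."""
    total = timedelta()
    for login, logout in char.sessions:
        start = max(login, since)
        if logout > start:
            total += logout - start
    return total.total_seconds() / 3600


def is_active(char: Character, now: datetime) -> bool:
    """Over level 20 and more than 10 hours played in the last month.

    "Last month" is assumed here to mean the trailing 30 days.
    """
    return char.level > 20 and hours_played_since(char, now - timedelta(days=30)) > 10


def time_since_last(char: Character, kind: str, logout: datetime) -> Optional[timedelta]:
    """Time at logout since the latest event of `kind`; None if it never happened."""
    times = [e.time for e in char.events if e.kind == kind and e.time <= logout]
    return logout - max(times) if times else None
```

With this shape, each "Time at logout since last X" bullet becomes a single call such as `time_since_last(char, "quest_completed", logout_time)`, so adding a new bullet later mostly means logging one more event kind.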

Battleground Data

PvP data for Battlegrounds. We could also include a section about world PvP, but since it's not emphasized in WoW, we won't need any data beyond what's covered in the first section.
  • Total time spent in BGs for character lifetime
  • Total time spent in BGs per character level
  • Total honor/PvP gear tokens/enemy kills over time
  • Total rounds played in each Battleground map per day/week/month
  • Number/percentage of abandoned Battlegrounds per BG map
  • Average ranking on each scoreboard (kills/dps/healing/captures/etc) per day/week/month
  • Average difference between each score and the average teammate score/average enemy score, for all players
  • Average difference between each score and the average teammate score/average enemy score, for players of the same class
  • Win/loss ratio per day/week/month
  • Average duration of BGs per day/week/month
  • Average duration of BGs won per day/week/month
  • Average duration of BGs lost per day/week/month
  • Average duration of BGs abandoned per day/week/month
  • Percentage of available PvP gear that the player has attained for their bracket
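The two "average difference" bullets are easy to get subtly wrong, for instance by counting the player among their own teammates. The sketch below assumes a hypothetical `ScoreRow` record per scoreboard line (the names and stat keys are illustrative). It computes the player's score minus the average teammate score and minus the average enemy score for one stat, optionally restricted to players of the same class, and then averages those deltas across matches.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


# Hypothetical scoreboard row; stat names like "kills" or "healing" are illustrative.
@dataclass
class ScoreRow:
    player: str
    team: str                 # e.g. "Alliance" or "Horde"
    cls: str                  # character class
    scores: dict[str, float]  # stat name -> value for this match


Delta = tuple[Optional[float], Optional[float]]


def score_deltas(board: list[ScoreRow], player: str, stat: str,
                 same_class: bool = False) -> Delta:
    """Return (player - avg teammate, player - avg enemy) for one stat in one match.

    The player is excluded from their own teammates. With same_class=True only
    players of the same class are compared. None means nobody to compare against.
    Assumes `player` appears on the board.
    """
    me = next(r for r in board if r.player == player)
    others = [r for r in board
              if r.player != player and (not same_class or r.cls == me.cls)]
    mates = [r.scores[stat] for r in others if r.team == me.team]
    foes = [r.scores[stat] for r in others if r.team != me.team]
    mine = me.scores[stat]
    return (mine - mean(mates) if mates else None,
            mine - mean(foes) if foes else None)


def average_delta(matches: list[list[ScoreRow]], player: str, stat: str,
                  same_class: bool = False) -> Delta:
    """Average the per-match deltas, skipping matches with no comparison group."""
    mate_d: list[float] = []
    foe_d: list[float] = []
    for board in matches:
        m, f = score_deltas(board, player, stat, same_class)
        if m is not None:
            mate_d.append(m)
        if f is not None:
            foe_d.append(f)
    return (mean(mate_d) if mate_d else None, mean(foe_d) if foe_d else None)
```

Feeding in one day's, week's, or month's matches gives the per-period averages the list asks for.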

Arena Data

How well is the player doing in ranked PvP, if they participate?
  • Arena points over time
  • Arena Rankings over time for 2s/3s/5s
  • Average duration of matches won/lost/both per ladder
  • Average lifespan of the character in matches won/lost/both per ladder
  • Average difference in ranking between their team and its opponents per ladder
  • Percentage class compositions of enemy teams for matches won/lost/both per ladder
  • Average matches played per week
  • Matches played per week, over time.
  • Percentage of current tier arena items attained
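Most of the per-ladder arena stats can come out of a single grouping pass. This sketch assumes a hypothetical `ArenaMatch` record and represents "ranking" as a numeric team rating; both are illustrative choices, not a description of how WoW actually stores arena results.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


# Hypothetical match record; "rating" stands in for the list's "ranking".
@dataclass
class ArenaMatch:
    ladder: str           # "2v2", "3v3" or "5v5"
    won: bool
    duration_s: float     # match length in seconds
    team_rating: int
    opponent_rating: int


def _avg_duration(ms: list[ArenaMatch]) -> Optional[float]:
    return mean(m.duration_s for m in ms) if ms else None


def ladder_summary(matches: list[ArenaMatch]) -> dict[str, dict[str, Optional[float]]]:
    """Group matches by ladder and compute win/loss and duration stats for each."""
    by_ladder: dict[str, list[ArenaMatch]] = defaultdict(list)
    for m in matches:
        by_ladder[m.ladder].append(m)

    summary = {}
    for ladder, ms in by_ladder.items():
        wins = [m for m in ms if m.won]
        losses = [m for m in ms if not m.won]
        summary[ladder] = {
            # Undefeated ladders get infinity rather than a division-by-zero error.
            "win_loss_ratio": len(wins) / len(losses) if losses else float("inf"),
            "avg_duration_won": _avg_duration(wins),
            "avg_duration_lost": _avg_duration(losses),
            "avg_duration_all": _avg_duration(ms),
            # Positive means the player's team usually out-rated its opponents.
            "avg_rating_gap": mean(m.team_rating - m.opponent_rating for m in ms),
        }
    return summary
```

Running it over successive weeks of matches would give the "over time" versions of these numbers.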

Misc Data

Data pertaining to the general state of the game or other games that may be drawing players away.
  • Server performance the minute/hour/day of logout
  • Client's framerate the minute/hour/day of logout
  • Client/Server crashes the minute/hour/day of logout
  • GM tickets submitted the day/week/month of logout, and how long they took to be resolved
  • Most recent patch notes at the time of last logout
  • Competing companies' new game releases or major patches the week/month of last logout
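The misc bullets mostly come down to lining up outside events (crashes, server hiccups, GM tickets) against logout times. A minimal sketch, assuming "the minute/hour/day of logout" means the window ending at the logout and that event timestamps are kept sorted, so Python's `bisect` module can count them with two binary searches per window:

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Assumed reading of "the minute/hour/day of logout": the window ending at logout.
WINDOWS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def events_before_logout(event_times: list[datetime], logout: datetime) -> dict[str, int]:
    """Count events (crashes, GM tickets, lag spikes) in each window before logout.

    `event_times` must be sorted ascending; both window ends are inclusive.
    """
    end = bisect_right(event_times, logout)  # index just past the last event <= logout
    return {
        name: end - bisect_left(event_times, logout - span)
        for name, span in WINDOWS.items()
    }
```

The same function works for client crashes, server crashes, or ticket submissions; only the list of timestamps changes.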

Ok, that covers the tedious stuff. Bear with me. Once you've done this small amount of work up front, you can spend the bulk of your time actually analyzing the data, with a little maintenance here and there as you add new features.

The data I've listed here is the kind I suspect would reveal why a player is quitting the game. Of course, if you leave important data off the list, your black box won't be very effective, so when in doubt, mine more data than you think you need.

Next, we'll take all the data we've gathered and start sifting through it for clues.
Continue to Part 3


Ysharros said...

I have been slain by Wall of Lists! :-D

Fascinating stuff! Now I'm going to play my "creative fluff" card and let the waves of detail roll off me. That's what I'd have minions for! (/James Cameron pose.)

Seriously though, that's some bad data Harry. The pre-logout chatlog is a particularly interesting one (to me).

Here's a question though -- how high would the cost of collecting/processing this data be? That would presumably affect a company's willingness to do it when compared to what it would potentially save in terms of subs. Or put another way: once you find out someone is on the verge of quitting, can you remedy that fast enough to prevent it? Or do you use it for "rolling" (or continuous) care in order to make it less likely for people to quit?

Nels Anderson said...

@Ysharros ... how high would the cost of collecting/processing this data be? Collecting this data is cheap compared to practically any other aspect of production, both in terms of time and money. Most of this information is already available; someone just needs to dump it into a really simple database.

The challenge isn't collecting the data, it's analyzing it. Imagine how much of this data a single WoW server would produce in, say, a week. Pulling something meaningful out of that is the hard part.

Ysharros said...

@Nels -- aye, that's what I meant by processing, when I should have said "analysing." My accuracy-switch failed me for a bit -- but I'm creative! I don't do detail! (see above ;) )

Analysing that kind of data flood... I don't even want to think about it, heh.

Mike Darga said...

Yeah, what Nels said. The challenge is parsing and drawing conclusions from the data.

@Nels: nice to see you here. I've been reading your blog lately too.

@Ysh: Data is almost the equivalent to a natural resource for game developers. It's out there moving all the time, like a big waterfall, and if you're smart you can install some turbines to capture it and turn it into power. That's what I'm trying to do right now.

As far as the difficulty of capturing it goes, it's much easier to spend some time writing a good tool to do it than to wade through the data manually every time you need to figure something out. Hopefully once this series is done it will seem much easier to you to look at these data trends and derive meaning from them.

Tesh said...

Hmm... I like the turbine analogy. I've long thrived on data, and it's nice to see that quantified a bit.

I'd extend the concept to harnessing data from research and study outside of any particular game design, but you already know that I'm a huge proponent of "Renaissance" holistic design. :)

This is definitely a great series. Thanks for writing this up!

Mike Darga said...

I'm right there with you. I love being a polymath and pulling lessons into design from other places.

A good friend of mine recently pointed out that most of my posts here have been very high level, so I've decided to try and follow my own advice and get my hands dirty. Hopefully I can strike upon a good balance of nitty gritty and high level concepts.

Chris F said...

Great posts Mike, thanks for sharing.

I love the potential, but am sad that most normal eyes will never see that data. Of course I don't expect it, but I just love parsing things and seeing the process from collection to conclusion.

Looking forward to part 3!

Mike Darga said...

Thanks Chris!

Yeah, most companies are pretty private about their internal data. As we've mentioned before, Valve and CCP are generally pretty public about that sort of thing.

I also just saw the other day that Daniel James has posted some quite detailed data for Puzzle Pirates and Whirled:


Tesh said...

Ooooh, Three Rings hard data. Very nice find. I thought Cleaver got tired of posting to his blog. :)

Thanks for the heads up!

Mike Darga said...

It's a lot of very detailed stuff, too. It was enough to make this data nerd a bit cross-eyed at points, heh.