Converting Percentages back into Integers

Converting Percentages back into Integers

Every so often an opportunity arises, an opportunity to do something interesting that you wouldn't normally do. The most recent of which for me revolved around a new Endorsement system Blizzard (a game publisher) introduced into one of their games, Overwatch. It's a fairly straight forward system, at the end of each game players can endorse up to three people for one of three categories, Sportsmanship, Shot calling, or for being a Good Teammate. These endorsements count towards a level system that slowly progresses from level 1 through 5 and is displayed in various spots on the UI next to the corresponding player.

Blizzard didn't make it so you could see the exact number of endorsements you had, only the level (and each level requires an undisclosed amount), all you could see was a proportional ring around the level. They do have a way to view player profiles outside the game via https://playoverwatch.com/en-gb/career/pc/eu/\<username)> and my first thought was what approach did they take to display the endorsement level on there? The easiest way would be to just stick the raw numbers into the page and have some simple client-side code turn it into a nice proportional icon - and this is what they did at first.

endorsementIcon-1
An example of how endorsements are displayed on the playoverwatch.com profiles.

A couple of weeks later and I think Blizzard caught onto the fact that people were doing this, and they decided to change the code so that instead of showing the individual counts it'd show the percentage each were of the total.

playoverwatchEndorsementHTML

I had created a script that used regular expressions to grab the values from their site and display them in a nice neat table on mine - this change threw a rather large spanner in the works. Well... it was a fun little project that taught me a couple of things, but now that it's percentages, this is the end of it, right? Not exactly, the fact that the percentages (represented in decimal format) are so precise got me thinking, what if you could work it backwards? It was around this point that I thought to myself "This is going to be one of those nights that doesn't involve much sleep, isn't it?".

It's an interesting little puzzle

A lot of maths problems can be worked backwards, the difficulty in this case comes from the fact that there isn't any fancy "Oh, you just times it by the square root of pancake" type trick (that I know of). The percentages are calculated by Blizzard by simply doing:
1/TotalEndorsements*IndividualEndorsements for each of the three endorsement types (which gives you the decimal representation of a percentage, you can just multiply it by 100 to get the "normal" percentage form). Each resulting value on its own is impossible to reverse because you only have the end result - a single value and no idea of what values were used to create it.

There are a couple of nice things about these endorsements. The first is the fact that they are always integers; there's no half of an endorsement or anything like that. Next (and also very important), the quantity isn't that high, maybe people could get a thousand or so if they're really dedicated (as they decay). Finally, we know that these values are the proportions of a whole - they're all related to each other and thus we have a way to know if we've got the right answer. The first two factors mean that there isn't actually all that many possible starting values and this means that we could iterate through possibilities until we find a match - and we know when we've found one because it will fit all three values.

I started writing some code to calculate these possibilities and I needed a name for the file, so I decided to call it "curiosity.php" - a name that I've grown rather fond of. Anyway, the code looked like this:

// Loop through 1 to 500 total endorsements
for ($totalEndorsements = 1; $totalEndorsements < 500; $totalEndorsements++) {
  // Calculate what percentage an individual endorsement would be of the total
  $decimalPercentOf1Endorse = (1/$totalEndorsements);
  // Loop through the individual endorsement count
  for ($indEndorsement = 1; $indEndorsement < 500; $indEndorsement++) {
    // Check if the individual endorsement count is less than the total
    if($indEndorsement < $totalEndorsements) {
      // Echo the result and the values that created it
      echo ($totalEndorsements . " " . $indEndorsement . " " . $decimalPercentOf1Endorse*$indEndorsement . PHP_EOL);
    }
  }
}

Basically, this will loop through each integer from 1 to 500 (in this example) and calculate what 1 divided by it is and then multiply that by every integer that is less than itself.

Here's what I mean:

1/TotalEndorsements*everyIntegerThatIsLessThanTotal
1/1*1 = Skipped as the individual isn't less than the total.
1/2*1 = 0.5
1/2*2 = Another example of one that is skipped
1/3*1 = 0.33333...
1/3*2 = 0.66666...
1/4*1 = 0.25
1/4*2 = 0.5
1/4*3 = 0.75
1/5*1 = 0.2
and so on...

As the numbers get larger it becomes less likely that they will divide "neatly" (more on this later) and thus the results are much more frequently unique.

1/42*1 = 0.023809523809524
1/42*2 = 0.047619047619048
1/42*3 = 0.071428571428571
1/42*4 = 0.095238095238095
1/42*5 = 0.11904761904762
1/42*6 = 0.14285714285714

There's a big reason I want it to check whether the individual endorsement count is less that the total endorsement count, and that is because it's impossible for an individual count to be more than the total. This means that we'd be doing a ton of calculations that are totally pointless. Here's a graph to help you visualise what I mean:

endorsementGraph-1

I really like this as a graph because it reflects reality perfectly - if we were to skip the "if less than total" check then we'd just about double the dataset. It's not something you may consider at first and it's not necessarily a problem (although it's a waste of processing time) - but it's a good example of a place where you should do as little as possible.

Building a database of possibilities

It seems extremely wasteful to run through the same calculations every time someone wants to check an endorsement count, so it seems like a rather nice opportunity to build ourselves a simple database. I modified the little script above to insert into a database rather than echo each result. It progressed at around 500 inserts per second - it'd be a perfect opportunity for me to learn how to batch insert into a database but instead I decided to use this time to grab a cuppa.

endorsementDB

And just like that we have ourselves a database of possibilities, with a precision of 14 decimal places (PHP's default limit; at least on the server I was using - which is not something I ever expected to be a limitation). We lose a digit to rounding and, in the end, I decided to use 12 digits anyway though that was more of a "I'll use this because I know it'll work with both our values and the ones we get from Blizzard" decision than anything.

So, what's next? Well, I changed the regex (regular expressions) to match the new data-value format on the playoverwatch.com profiles and then I needed a way to search the database for this value. Here's the code I ended up with:

if(isset($shotcallerPerc[1])) {
  $sql = $db->prepare("SELECT * FROM `curiosity` WHERE percDec LIKE :percDec ORDER BY `indivEnd`");
  $sql->execute(array(':percDec' => '%'.substr($shotcallerPerc[1],0,12).'%'));
  $shotcallerDB = $sql->fetchAll();
  $shotcallerArr = array_column($shotcallerDB, 'totalEnd');
}

First off, it checks to see if the regex found a match and then it queries the database for all matches... Wait? All matches? Not just "a" match? Yep, that's the main limitation of converting percentages back to their original values, multiple input values can yield the same result. Take a total of 10 and an individual of 5 for instance, it will result in "0.5" (or 50%) but so will say, 20 and 10, you see what I'm getting at? There's a lot of duplicates - we need a way to work out which one is correct.

So, we've got three decimal percentages which add up to 100% (or 99.999...% due to the fact they aren't rounded), each decimal has the potential to be the result of multiple values but what is the chance that all three have multiple starting values that match? Unfortunately, not impossible as we might like... It's actually quite common but it's unlikely so that the majority of situations will yield the correct response.

So, we need a way to compare the potentially multiple rows from the database for a match. I solved this by first grabbing all the totalEnd(orsement) values returned by each of the three DB lookups using this line in the above code:
$shotcallerArr = array_column($shotcallerDB, 'totalEnd');
...This gives me three arrays of values and now I need a way to compare them all and find a value that exists in all three.

if(isset($teammateArr) && isset($shotcallerArr)) {
  $iTS = array_intersect($teammateArr, $shotcallerArr);
  $c = current($iTS);
}
if(isset($teammateArr) && isset($sportsmanshipArr)) {
  $iTSM = array_intersect($teammateArr, $sportsmanshipArr);
  $c1 = current($iTSM);
}
if(isset($shotcallerArr) && isset($sportsmanshipArr)) {
  $iSS = array_intersect($shotcallerArr, $sportsmanshipArr);
  $c2 = current($iSS);
}

This code does exactly that, as you can't compare three arrays in PHP automatically I compare each array against the other two and return the first match for each (which will be the lowest because the SQL query ordered them: "ORDER BY indivEnd")

if(($c = $c1) && ($c1 = $c2)) {
  $matchedEndTotal = $c;
}

I then have a small piece of logic to check that all three match, again, as you can't compare three values directly I broke it down into "if (value1 and value2 match) and (value2 and value3 match) then they must all be the same". This leaves us with a total value that can be used to produce all three decimal percentages, and all we have to do now it grab the individual endorsement count from the database output and present it in whatever format we like.

$scDBKey = array_search($matchedEndTotal, array_column($shotcallerDB, 'totalEnd'));
$tmDBKey = array_search($matchedEndTotal, array_column($teammateDB, 'totalEnd'));
$smDBKey = array_search($matchedEndTotal, array_column($sportsmanshipDB, 'totalEnd'));

This sets the variables smDBKey, tmDBKey and smDBKey to the Array key (database row) that contains the total endorsement count for each of the three endorsement types. This lets us retrieve and present the original, individual endorsement counts by using the following variables:

$shotcallerDB[$scDBKey]['indivEnd']
$teammateDB[$tmDBKey]['indivEnd']
$sportsmanshipDB[$smDBKey]['indivEnd']

Summary

This blog is rather different to anything I've posted before, I can't imagine many people having a use for what I've written but perhaps, someone will find it interesting like I did. The method I describe does have quite a few limitations, it's only really possible/practical because the number of "possibilities" is relatively low as we're sticking to integers, if you were to allow a single decimal place then the possibilities would multiply by 10 - as you might imagine, it gets rather impractical rather quickly.

There's also a reasonably high chance of this method returning a result that is much lower than it should be as you are relying on the combination of the three percentages being unique when in reality, they're far from it. I've thought of a few ways you could improve it, one would be to compare the proposed result to previous ones for the same profile. You could use the previous results as a minimum though that wouldn't be perfect as the endorsement count can decay or be removed entirely.

Another idea I had and what I believe would be the best thing you could do, is creating a "certainty" system, you could use a combination of whether there were multiple matches for each of the three decimals, you could compare it to previous results for that profile as I said just, and you could also use the endorsement level as a guide as they have certain boundaries (which can be roughly figured out using a system like this - though I'm fairly certain there isn't a universal definitive boundary, more of an average per game system).

I've been reading through various posts on Reddit about Blizzard changing the way the UI works on the online profiles and a lot of people don't believe it's possible to obtain the original values from these percentages and a few believe that it's possible but incredibly inaccurate. I managed to find one organisation that has a working solution and they took the approach of trying all the possibilities on each lookup thus avoiding the need for a database. However, their system is noticeably slower and presumably a lot more resource intensive.

Anyway, that's quite enough from me on this incredibly specific topic, I found it to be a great opportunity to learn some more about coding and it was really fun to do something a lot of people viewed as being impossible. Though, I wouldn't be surprised if it gets broken again in a couple of days because Blizzard decided to only show a couple of decimal places.

Short link: on-te.ch/pti

Owen Nelson

Owen Nelson

https://owennelson.co.uk

IT Systems Administrator from Northamptonshire, UK. Always on the lookout for ways to make things faster and more secure - and I enjoy getting through a fair bit of Tea along the way.

View Comments