{"id":1098,"date":"2012-02-05T20:23:17","date_gmt":"2012-02-06T01:23:17","guid":{"rendered":"http:\/\/unitstep.net\/?p=1098"},"modified":"2012-02-05T20:23:17","modified_gmt":"2012-02-06T01:23:17","slug":"shuffle-sort-and-other-fallacies-of-randomization","status":"publish","type":"post","link":"https:\/\/unitstep.net\/blog\/2012\/02\/05\/shuffle-sort-and-other-fallacies-of-randomization\/","title":{"rendered":"Shuffle sort and other fallacies of randomization"},"content":{"rendered":"

Quick, how do you write code that shuffles a collection of objects? In the real world, it’s fairly easy to see how a deck of cards is shuffled – but how would you do that in code? Obviously, you would have to use some sort of random number generator, but beyond that it’s not straightforward. Furthermore, how do you ensure that the shuffling is fair<\/em>; that is, all permutations appear with equal probability?<\/p>\n

The astute among you will know that one way is by implementing the Fisher-Yates shuffle algorithm<\/a>. But, let’s investigate what happens when other, seemingly adequate solutions, are used instead.<\/p>\n

<\/p>\n

Shuffler’s dilemma<\/h2>\n

One example that highlights the implications of using seemingly<\/em> random and unbiased “shuffling” involves the Windows EU Browser Ballot<\/a>.<\/p>\n

The purpose of the Browser Ballot was so that Microsoft could comply with an EU directive<\/a> that ordered them to provide users with a clear choice of which browser to use with Windows. The ballot was supposed to have been carefully designed; the first section contained the five most common browsers (Chrome, Firefox, Internet Explorer, Opera and Safari) with the second section containing seven lesser-used browsers.<\/p>\n

Since it is generally considered an advantage to be placed first in a ballot (because users who don’t care one way or another might just pick the first option), the EU directed Microsoft to randomize the order in which the browsers appeared on the ballot in each of the sections. Thus, when users visited the ballot page, they were supposed to get a random ordering of the top five browsers followed by a random ordered of the lesser seven.<\/p>\n

Looking into the cards<\/h2>\n

The team in charge of the ballot design came up with a solution to use a sorting function with a randomized comparator. In this way, they hoped to apply a randomized sort to the list of browsers, producing a random ordering that met the rules of the EU. While this did produce a random ordering, (and thus technically was in compliance with the EU directive), it did not produce unbiased results, as the ordering of some browsers came up more often than others. This lead to allegations that Microsoft was trying yet again to stifle its competition to unscrupulous practices.<\/p>\n

However, this probably wasn’t the case, and was likely an instance of a seemingly adequate solution being used without proper testing\/analysis.<\/p>\n

To understand the problem, we first have to understand sorting algorithms, specifically those that use comparisons as their basis. (i.e. comparison sorts<\/em>) An example of a comparison sort is Insertion sort<\/a><\/em>. Through each step of the sorting process, the sorted portion of the list increases by one as we take the next element from the unsorted portion and position it properly within the sorted portion. This is done by comparing<\/em> the element to be inserted with each element in the sorted portion until we find the correct position.<\/p>\n

A comparator<\/em> is simply a function that compares two objects, and tells us which one is greater than the other, or whether they are equal. In Java, the general structure of a Comparator<\/a> would be:<\/p>\n

new Comparator<T>() {\r\n  @Override\r\n  public int compare(final T o1, final T o2) {\r\n    \/\/ If o1 is \"less than\" o2, return a negative integer.\r\n    \/\/ If o1 is \"equal to\" o2, return 0.\r\n    \/\/ If o1 is \"greater than\" o2, return a positive integer.\r\n  }\r\n};<\/code><\/pre>\n

In this way, the exact semantics of comparisons can be tailored to meet the needs of the application. In the case of the Browser Ballot, the team implemented something akin to the following: (Though the language was JavaScript, not Java, as the Browser Ballot was a website)<\/p>\n

new Comparator<T>() {\r\n  @Override\r\n  public int compare(final T o1, final T o2) {\r\n    return Math.random() < 0.5 ? 1 : -1;\r\n  }\r\n};<\/code><\/pre>\n

In this comparator, the objects being compared are not even considered; instead the either -1<\/code> or 1<\/code> is returned with equal probability. Using this sort of “comparison” with a sorting algorithm might seem<\/em> to provide random, unbiased results – but in reality, such “common sense” did not make so much sense after further analysis.<\/p>\n

Looking deeper<\/h2>\n

To look at the problem a bit more in-depth, I decided to write an example in Java that would test the results of a “random sort” on a set of four objects, using the following sorting algorithms: A modified Merge sort (provided by Java’s Collections.sort()<\/a><\/code>), Quicksort, Insertion sort and Selection sort. These results would then be compared to using the proper method, a Fisher-Yates shuffle.<\/p>\n

Some more details:<\/p>\n