{"id":1428,"date":"2014-03-04T21:02:22","date_gmt":"2014-03-05T02:02:22","guid":{"rendered":"http:\/\/unitstep.net\/?p=1428"},"modified":"2014-03-21T21:06:34","modified_gmt":"2014-03-22T02:06:34","slug":"analysis-of-the-2013-chicago-marathon-results","status":"publish","type":"post","link":"https:\/\/unitstep.net\/blog\/2014\/03\/04\/analysis-of-the-2013-chicago-marathon-results\/","title":{"rendered":"Analysis of the 2013 Chicago Marathon results"},"content":{"rendered":"
With close to 39,000 results, the 2013 Chicago Marathon Results<\/a> combine two of my favourite topics, statistics and running. I decided to take this opportunity to learn more about pandas<\/a> by using it to analyze the result set and provide some insight into how people run marathons. (I ran this race myself.)<\/p>\n The result of my work is in a GitHub repo<\/a> and published as an IPython Notebook<\/a>. I’ve extracted some of the more interesting parts.<\/p>\n <\/p>\n Histograms are fun because they show distributions:<\/p>\n \n<\/a>\n<\/p>\n This shows that the largest five-minute bin is 3:55-4:00. It’s larger by quite a bit, and one possible explanation is that the most common goal time is a sub-4-hour finish. This significantly skews the distribution from what you would expect if everyone ran to the limit of their ability.<\/p>\n Next, we have essentially the cumulative distribution function of the above: what percentage of people ran faster than a given time. The 50% point is, by definition, the median, which is 04:27:26.<\/p>\n \n<\/a>\n<\/p>\n A CDF like this is a good way to compare your time with others. For example, a sub-3:30 would get you roughly into the top 10%.<\/p>\n Lastly, a comparison of the difference between mean male and female finish times across the different age groups:<\/p>\n \n<\/a>\n<\/p>\n As you can see, the gap is remarkably consistent, with most age groups between 25 and 30 minutes.<\/p>\n I’ve cherry-picked which graphs to include. Check out the full analysis\/IPython Notebook<\/a> for more. (The analysis itself is cherry-picking of sorts.)<\/p>\nMundane numbers<\/h2>\n
\n
\nTop 1%: 2:50:20
\nTop 5%: 3:14:11
\nTop 10%: 3:28:35
\nTop 20%: 3:46:58\n<\/li>\n
\nwomen: 9.6%
\nmen: 8.4%\n<\/ul>\n<\/li>\n<\/ul>\nInteresting graphs<\/h2>\n
Conclusion<\/h2>\n