{"id":1565,"date":"2015-03-08T22:02:05","date_gmt":"2015-03-09T03:02:05","guid":{"rendered":"http:\/\/unitstep.net\/?p=1565"},"modified":"2015-03-09T08:57:17","modified_gmt":"2015-03-09T13:57:17","slug":"python-script-to-search-cbc-radio-2-broadcastplay-logs","status":"publish","type":"post","link":"https:\/\/unitstep.net\/blog\/2015\/03\/08\/python-script-to-search-cbc-radio-2-broadcastplay-logs\/","title":{"rendered":"Python script to search CBC Radio 2 broadcast\/play log history"},"content":{"rendered":"<p>I&#8217;m a fan of <a href=\"http:\/\/music.cbc.ca\/radio2\/\">CBC Radio 2<\/a>. Okay, that&#8217;s not exactly true, but I do have my $10 radio alarm clock tuned to 94.1 FM to wake me on weekdays. I often find myself in a stupor or only semi-awake when the tunes start blasting away before dawn, and as such, I often have trouble remembering exactly what was on the radio that morning. However, once during the day I remembered that a certain <em>Ben Folds Five<\/em> song had received airtime on CBC Radio 2 during my morning wake-up, but could not recall the exact day. It bothered me.<\/p>\n<p>Thankfully, they did have <a href=\"http:\/\/music.cbc.ca\/radio2\/playlogs\">broadcast\/play logs<\/a> of all tracks they had aired, along with the date\/times, providing for a succinct history. Unfortunately, it didn&#8217;t seem possible to search them, and I didn&#8217;t feel like searching through each day&#8217;s play log for the particular title. What to do?<\/p>\n<p>Scripting to the rescue!<\/p>\n<p><!--more--><\/p>\n<h2>TL;DR<\/h2>\n<p>If you just want the script, it&#8217;s <a href=\"https:\/\/gist.github.com\/pchng\/88e3f4e724b7c6b8763c\">available here as a gist<\/a>.<\/p>\n<h2>The broadcast log web page<\/h2>\n<p>The first thing to note on the broadcast log page is that the top-level page appears to use an <abbr title=\"Single Page Application\">SPA<\/abbr> design, with the hash storing the state. However, it&#8217;s actually simpler than that. 
No Ajax\/XHR is used to move between different dates on the broadcast log; instead, there&#8217;s just an <code>iframe<\/code> whose <code>src<\/code> <acronym class=\"uttInitialism\" title=\"Uniform Resource Locator\">URL<\/acronym> is updated when a new date is selected.<\/p>\n<p>This is a bit of a weird design: because the <code>iframe<\/code> occupies the entire page, the net effect is that the entire page is refreshed. The result is an SPA-like <acronym class=\"uttInitialism\" title=\"Uniform Resource Locator\">URL<\/acronym> with a hash, but without an SPA-like experience. (I&#8217;m sure there&#8217;s an obscure and reasonable explanation for this.)<\/p>\n<p>This means that the <acronym class=\"uttInitialism\" title=\"Uniform Resource Locator\">URL<\/acronym> you see in your browser when you visit the <a href=\"http:\/\/music.cbc.ca\/radio2\/playlogs\">broadcast log page<\/a> is not the <acronym class=\"uttInitialism\" title=\"Uniform Resource Locator\">URL<\/acronym> the broadcast logs are loaded from. Instead, looking at the <code>iframe<\/code> source, you&#8217;ll see they come from URLs with the following format: <code>http:\/\/music.cbc.ca\/broadcastlogs\/broadcastlogs.aspx?broadcastdate=YYYY-MM-DD<\/code><\/p>\n<h2>Scraping<\/h2>\n<p>Now that we have the <acronym class=\"uttInitialism\" title=\"Uniform Resource Locator\">URL<\/acronym> template to use when requesting a specific day of broadcast\/play logs, the next step is to learn how to scrape and extract the relevant data. 
Scraping is far less desirable than a structured API, but thankfully the <acronym class=\"uttInitialism\" title=\"HyperText Markup Language\">HTML<\/acronym> is relatively well-structured:<\/p>\n<pre><code>&lt;div class=\"logShowEntry\"&gt;\r\n  &lt;div class=\"logEntryTime fB s11\"&gt;\r\n      5:07 AM&lt;\/div&gt;\r\n  &lt;div class=\"logTrack\"&gt;\r\n    &lt;h3 class=\"fCm s21\"&gt;\r\n      DRAW A CROWD&lt;\/h3&gt;\r\n    &lt;dl class=\"s12\"&gt;\r\n      &lt;dt&gt;artist&lt;\/dt&gt;&lt;dd&gt;Ben Folds Five&lt;\/dd&gt;&lt;dt&gt;composer&lt;\/dt&gt;&lt;dd&gt;Folds- Ben&lt;\/dd&gt;\r\n      &lt;dt&gt;album&lt;\/dt&gt;&lt;dd class=\"fB\"&gt;Draw A Crowd (Clean Edit)(Single)&lt;\/dd&gt;\r\n      &lt;dt&gt;label&lt;\/dt&gt;&lt;dd&gt;Legacy&lt;\/dd&gt;\r\n      &lt;dt&gt;duration&lt;\/dt&gt;&lt;dd&gt;03:56&lt;\/dd&gt;\r\n    &lt;\/dl&gt;\r\n  &lt;\/div&gt;\r\n&lt;\/div&gt;<\/code><\/pre>\n<p>You can see that all the information is available within each <code>div.logShowEntry<\/code>:<\/p>\n<ul>\n<li>The time the song was played is in a <code>div.logEntryTime<\/code><\/li>\n<li>The track\/song name is in a <code>.logTrack h3<\/code><\/li>\n<li>The remaining attributes are in a <code>dl<\/code> (a description\/definition list)<\/li>\n<\/ul>\n<p>The description\/definition list makes it easy to grab this data as a map (key-value pairs), and the time and title can then be added in. Sometimes, especially for classical music aired during the &#8220;Choral Concert&#8221; segment, there are many other attributes in the <code>dl<\/code>. 
This script ignores them and limits its output to only the following fields: <code>date,time,label,artist,composer,album,title,duration<\/code>.<\/p>\n<p>Here&#8217;s a truncated example of the output, which is valid CSV generated by the Python <a href=\"https:\/\/docs.python.org\/2\/library\/csv.html\">csv<\/a> library.<\/p>\n<pre><code>$ .\/cbc_radio_broadcast_logs.py --start=2015-01-01\r\n# Results from 2015-01-01 to 2015-03-09.\r\ndate,time,label,artist,composer,album,title,duration\r\n2015-01-01,12:00 AM,Soft Revolution,Stars,Campbell- Torquil,From The Night (Radio Edit),FROM THE NIGHT,04:06\r\n2015-01-01,12:04 AM,Rubyworks,Hozier,Hozier,Hozier,FROM EDEN,03:42\r\n2015-01-01,12:07 AM,Mungo Park,Bobby Bazini,,Better In Time,MELLOW MOOD,02:51\r\n2015-01-01,12:10 AM,Last Gang,The New Pornographers,Newman- A C,The New Pornographers: Brill Bruisers,CHAMPIONS OF RED WINE,03:40\r\n2015-01-01,12:14 AM,Bloodshot,Ryan Adams,Adams- Ryan,Ryan Adams,FEELS LIKE FIRE,04:25\r\n2015-01-01,12:18 AM,Dine Alone,Ivan &amp; Alyosha,\"Wilson- Tim,Wilson- Pete,Kim- Tim,Carbary- Ryan\",All The Times We Had,BE YOUR MAN,03:56\r\n2015-01-01,12:22 AM,True North,Lynn Miles,,Love Sweet Love,NEVER COMING BACK,02:57\r\n2015-01-01,12:25 AM,Universal,Sarah Mclachlan,\"Marchand- Pierre,Mclachlan- Sarah\",Monsters (Remix)(Single),MONSTERS,03:11\r\n2015-01-01,12:28 AM,Sub Pop,Blitzen Trapper,Earley- Eric,Furr,FURR,04:07\r\n2015-01-01,12:32 AM,Dangerbird,Butch Walker,Walker- Butch,Bed On Fire (Single),BED ON FIRE,03:57\r\n2015-01-01,12:36 AM,Six Shooter,Amelia Curran,Curran- Amelia,They Promised You Mercy,NEVER SAY GOODBYE,03:54\r\n2015-01-01,12:40 AM,Universal,U2,U2,U2: War,\"\"\"40\"\"\",02:37\r\n2015-01-01,12:42 AM,Glassnote,Mumford &amp; Sons,\"Lovett- Ben,Marshall- Winston,Dwane- Ted,Mumford- Marcus\",Babel,BABEL,03:22\r\n...<\/code><\/pre>\n<p>(Note: <code>--search-artist<\/code> allows you to limit to only a specific artist, but I recommend leaving it out\/empty to return all entries in the 
broadcast log. You can then save an offline copy to search later, which I recommend rather than continually hammering the site, in order to be a good citizen.)<\/p>\n<p>You can grab the <a href=\"https:\/\/gist.github.com\/pchng\/88e3f4e724b7c6b8763c\">script from a gist here<\/a>. One final usage note: Broadcast log data appears to be spotty before 2012.<\/p>","protected":false},"excerpt":{"rendered":"<p>I&#8217;m a fan of CBC Radio 2. Okay, that&#8217;s not exactly true, but I do have my $10 radio alarm clock tuned to 94.1 FM to wake me on weekdays. I often find myself in a stupor or only semi-awake when the tunes start blasting away before dawn, and as such, I often have trouble [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[403,156],"tags":[],"_links":{"self":[{"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/posts\/1565"}],"collection":[{"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/comments?post=1565"}],"version-history":[{"count":20,"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/posts\/1565\/revisions"}],"predecessor-version":[{"id":1616,"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/posts\/1565\/revisions\/1616"}],"wp:attachment":[{"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/media?parent=1565"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/categories?post=1565"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/unitstep.net\/wp-json\/wp\/v2\/tags?post=1565"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}