{"id":1565,"date":"2015-03-08T22:02:05","date_gmt":"2015-03-09T03:02:05","guid":{"rendered":"http:\/\/unitstep.net\/?p=1565"},"modified":"2015-03-09T08:57:17","modified_gmt":"2015-03-09T13:57:17","slug":"python-script-to-search-cbc-radio-2-broadcastplay-logs","status":"publish","type":"post","link":"https:\/\/unitstep.net\/blog\/2015\/03\/08\/python-script-to-search-cbc-radio-2-broadcastplay-logs\/","title":{"rendered":"Python script to search CBC Radio 2 broadcast\/play log history"},"content":{"rendered":"
<p>I’m a fan of <a>CBC Radio 2<\/a>. Okay, that’s not exactly true, but I do have my $10 radio alarm clock tuned to 94.1 FM to wake me on weekdays. I’m often in a stupor or only semi-awake when the tunes start blasting away before dawn, so I often have trouble remembering exactly what was on the radio that morning. One day, however, I remembered that a certain <em>Ben Folds Five<\/em> song had aired on CBC Radio 2 during my morning wake-up, but I couldn’t recall which day. It bothered me.<\/p>\n
<p>Thankfully, CBC did have <a>broadcast\/play logs<\/a> of all the tracks it had aired, along with their dates and times, providing a succinct history. Unfortunately, there didn’t seem to be a way to search them, and I didn’t feel like combing through each day’s play log for one particular title. What to do?<\/p>\n
<p>Scripting to the rescue!<\/p>\n
<h2>TL;DR<\/h2>\n
<p>If you just want the script, it’s available here as a <a>gist<\/a>.<\/p>\n
<h2>The broadcast log web page<\/h2>\n
<p>The first thing to note about the broadcast log page is that the top-level page appears to use a single-page application (<a>SPA<\/a>) design, with the URL hash storing the state. However, it’s actually simpler than that. No Ajax\/XHR requests are used to move between dates on the broadcast log; instead, there’s just an <code>iframe<\/code> whose <code>src<\/code> <acronym>URL<\/acronym> is updated when a new date is selected.<\/p>\n
<p>This is a bit of a weird design: because the <code>iframe<\/code> occupies the entire page, the net effect is that the whole page is refreshed anyway. The result is an SPA-like <acronym>URL<\/acronym> with a hash, but without an SPA-like experience. (I’m sure there’s an obscure and reasonable explanation for this.)<\/p>\n
<p>This means that the <acronym>URL<\/acronym> you see in your browser when you visit the <a>broadcast log page<\/a> is not the <acronym>URL<\/acronym> the broadcast logs are loaded from. Instead, if you look at the <code>iframe<\/code> source, you’ll see that they come from URLs with the following format: <code>http:\/\/music.cbc.ca\/broadcastlogs\/broadcastlogs.aspx?broadcastdate=YYYY-MM-DD<\/code><\/p>\n
<h2>Scraping<\/h2>\n
<p>Now that we have the <acronym>URL<\/acronym> template to use when requesting a specific day’s broadcast\/play log, the next step is to figure out how to scrape and extract the relevant data. Scraping is far less desirable than using a structured API, but thankfully the <acronym>HTML<\/acronym> is relatively well-structured:<\/p>\n
<pre><code><div class=\"logShowEntry\">\r\n <div class=\"logEntryTime fB s11\">\r\n 5:07 AM<\/div>\r\n <div class=\"logTrack\">\r\n <h3 class=\"fCm s21\">\r\n DRAW A CROWD<\/h3>\r\n <dl class=\"s12\">\r\n <dt>artist<\/dt><dd>Ben Folds Five<\/dd><dt>composer<\/dt><dd>Folds- Ben<\/dd>\r\n <dt>album<\/dt><dd class=\"fB\">Draw A Crowd (Clean Edit)(Single)<\/dd>\r\n <dt>label<\/dt><dd>Legacy<\/dd>\r\n <dt>duration<\/dt><dd>03:56<\/dd>\r\n <\/dl>\r\n <\/div>\r\n<\/div><\/code><\/pre>\n
<p>You can see that all the information is available within each <code>div.logShowEntry<\/code>:<\/p>\n
<ul>\n
<li><code>div.logEntryTime<\/code>: the time the track aired<\/li>\n
<li><code>.logTrack h3<\/code>: the track title<\/li>\n
<li>a <code>dl<\/code> (description\/definition list): attributes such as the artist, composer, album, label and duration<\/li>\n
<\/ul>\n
<p>The description\/definition list makes it easy to grab this data as a map (key-value pairs), to which the time and title can then be added. Sometimes, especially for classical music aired during the “Choral Concert” segment, the <code>dl<\/code> contains many other attributes. This script ignores them and limits its output to the following fields: <code>date,time,label,artist,composer,album,title,duration<\/code>.<\/p>\n
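<p>To make the URL template concrete, here is a minimal Python sketch that generates one log URL per day over an inclusive date range. It is not taken from the gist; the names <code>LOG_URL_TEMPLATE<\/code> and <code>broadcast_log_urls<\/code> are hypothetical, and it assumes the 2015 endpoint shown above, which CBC may since have changed or retired.<\/p>\n

```python
from datetime import date, timedelta

# Template as observed in the broadcast log iframe's src in 2015 (assumption:
# it may no longer be live). The {:%Y-%m-%d} spec formats a date as YYYY-MM-DD.
LOG_URL_TEMPLATE = (
    "http://music.cbc.ca/broadcastlogs/broadcastlogs.aspx?broadcastdate={:%Y-%m-%d}"
)


def broadcast_log_urls(start, end):
    """Yield (day, url) pairs for every day from start to end, inclusive.

    Hypothetical helper for illustration; not the gist's actual code.
    """
    if end < start:
        raise ValueError("end must not be before start")
    day = start
    while day <= end:
        yield day, LOG_URL_TEMPLATE.format(day)
        day += timedelta(days=1)  # timedelta handles month/year rollover


if __name__ == "__main__":
    # Print the URLs for an arbitrary five-day range.
    for day, url in broadcast_log_urls(date(2015, 3, 2), date(2015, 3, 6)):
        print(day, url)
```

<p>Each URL could then be fetched with <code>urllib.request.urlopen<\/code> (or any HTTP client) and its body handed to the scraping step. Stepping through the days with <code>timedelta<\/code> avoids hand-rolling month and year rollovers.<\/p>\n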
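<p>One way to turn the entry markup described above into records is an event-driven parser from Python’s standard library, <code>html.parser.HTMLParser<\/code>. This is a sketch under stated assumptions, not necessarily how the script does it. The names <code>BroadcastLogParser<\/code> and <code>parse_broadcast_log<\/code> are hypothetical. The parser assumes the class names seen in the sample (<code>logShowEntry<\/code>, <code>logEntryTime<\/code>) and that the first <code>h3<\/code> in each entry is the title. Each <code>dt<\/code> label becomes the key for the following <code>dd<\/code>, and the time and title are added as extra keys.<\/p>\n

```python
from html.parser import HTMLParser


class BroadcastLogParser(HTMLParser):
    """Collect one dict per div.logShowEntry.

    Hypothetical helper; assumes the markup shown in the sample entry.
    """

    def __init__(self):
        super().__init__()
        self.entries = []    # one dict per aired track
        self._field = None   # key currently being captured, or None
        self._buf = []       # text fragments for the current field
        self._dt_key = None  # latest <dt> label, used as the key for the next <dd>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "logShowEntry" in classes:
            self.entries.append({})  # a new track entry starts here
            self._dt_key = None
        elif not self.entries:
            return                   # ignore markup before the first entry
        elif tag == "div" and "logEntryTime" in classes:
            self._begin("time")
        elif tag == "h3" and "title" not in self.entries[-1]:
            self._begin("title")     # assumption: first h3 in an entry is the title
        elif tag == "dt":
            self._begin("_dt")
        elif tag == "dd" and self._dt_key:
            self._begin(self._dt_key)

    def handle_data(self, data):
        if self._field is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._field is None or tag not in ("div", "h3", "dt", "dd"):
            return
        # Collapse the \r\n and indentation around each value into single spaces.
        text = " ".join("".join(self._buf).split())
        if self._field == "_dt":
            self._dt_key = text
        else:
            self.entries[-1][self._field] = text
        self._field, self._buf = None, []

    def _begin(self, field):
        self._field, self._buf = field, []


def parse_broadcast_log(html):
    """Return a list of dicts, one per track, from a day's broadcast log page."""
    parser = BroadcastLogParser()
    parser.feed(html)
    parser.close()
    return parser.entries
```

<p>Fed the sample entry above, <code>parse_broadcast_log<\/code> yields a single dict with the keys <code>time<\/code>, <code>title<\/code>, <code>artist<\/code>, <code>composer<\/code>, <code>album<\/code>, <code>label<\/code> and <code>duration<\/code>. A library such as BeautifulSoup would let you use CSS selectors like <code>div.logShowEntry<\/code> directly, at the cost of an extra dependency.<\/p>\n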
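<p>Limiting the output to those columns maps directly onto the standard library’s <code>csv.DictWriter<\/code>. With <code>extrasaction='ignore'<\/code> it silently drops unlisted keys (such as the extra “Choral Concert” attributes), and <code>restval<\/code> fills in any columns that are missing. The sketch below assumes each entry is a dict like those described above, with a <code>date<\/code> key added from the requested day. <code>write_log_csv<\/code> is a hypothetical name, and the values in the demo are illustrative only.<\/p>\n

```python
import csv
import io

# Output columns, in order; any other keys in an entry are dropped.
FIELDS = ["date", "time", "label", "artist", "composer", "album", "title", "duration"]


def write_log_csv(entries, out):
    """Write entries (dicts) to the file-like object out as CSV with a header row.

    Keys not in FIELDS are ignored; missing ones are written as empty strings.
    Hypothetical helper for illustration.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(entries)


if __name__ == "__main__":
    buf = io.StringIO()
    # Illustrative values only, not a real log entry; "some_extra_attribute"
    # stands in for the extra dl attributes and is not written out.
    write_log_csv(
        [{"date": "2015-01-01", "time": "5:07 AM", "artist": "Ben Folds Five",
          "title": "DRAW A CROWD", "some_extra_attribute": "ignored"}],
        buf,
    )
    print(buf.getvalue(), end="")
```

<p>When writing to a real file rather than a <code>StringIO<\/code>, open it with <code>newline=''<\/code>, as the <code>csv<\/code> module documentation recommends, to avoid spurious blank lines on Windows.<\/p>\n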