I’m a fan of CBC Radio 2. Okay, that’s not exactly true, but I do have my $10 radio alarm clock tuned to 94.1 FM to wake me on weekdays. I’m often in a stupor or only semi-awake when the tunes start blasting away before dawn, so I often have trouble remembering exactly what was on the radio that morning. One day, though, I remembered that a certain Ben Folds Five song had received airtime on CBC Radio 2 during my morning wake-up, but I could not recall the exact day. It bothered me.
Thankfully, they did have broadcast/play logs of all tracks they had aired, along with the date/times, providing for a succinct history. Unfortunately, it didn’t seem possible to search them, and I didn’t feel like searching through each day’s play log for the particular title. What to do?
Scripting to the rescue!
If you just want the script, it’s available here as a gist.
The broadcast log web page
The first thing to note on the broadcast log page is that the top-level page appears to use a SPA design, with the hash storing the state. However, it’s actually simpler than that. No Ajax/XHR is actually used to move between different dates on the broadcast log; instead, there’s just an iframe whose src URL is updated when a new date is selected.
This is a bit of a weird design, because the iframe occupies the entire page, so the net effect is that the entire page is refreshed. The result is a SPA-like URL with a hash, but without an SPA-like experience. (I’m sure there’s an obscure and reasonable explanation for this.)
This means that the URL you see in your browser when you visit the broadcast log page is not the URL where the broadcast logs are loaded from. Instead, looking at the iframe source, you’ll see they come from URLs with the following format:
Now that we have the URL template to use when requesting a specific day of broadcast/play logs, the next step is to learn how to scrape and extract the relevant data. Even though scraping is far less preferable to a structured API, thankfully the HTML is relatively well-structured:
<div class="logShowEntry">
  <div class="logEntryTime fB s11"> 5:07 AM</div>
  <div class="logTrack">
    <h3 class="fCm s21"> DRAW A CROWD</h3>
    <dl class="s12">
      <dt>artist</dt><dd>Ben Folds Five</dd>
      <dt>composer</dt><dd>Folds- Ben</dd>
      <dt>album</dt><dd class="fB">Draw A Crowd (Clean Edit)(Single)</dd>
      <dt>label</dt><dd>Legacy</dd>
      <dt>duration</dt><dd>03:56</dd>
    </dl>
  </div>
</div>
You can see that all the information is available within each logShowEntry div:
- The time the song was played at is in a div with the logEntryTime class
- The track/song name is in an h3
- The remainder of the attributes are in a dl, or a description/definition list
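As a rough illustration of how that structure could be scraped, here is a minimal sketch that uses only Python's standard-library html.parser. It is not necessarily how the gist does it: the names LogEntryParser, parse_entries, and WANTED are made up for this example, and the field names in WANTED are assumed from the CSV header in the sample output further down.

```python
from html.parser import HTMLParser

# Assumption: these are the <dt> keys worth keeping (taken from the CSV header).
WANTED = {"label", "artist", "composer", "album", "duration"}

SAMPLE = """
<div class="logShowEntry">
  <div class="logEntryTime fB s11"> 5:07 AM</div>
  <div class="logTrack">
    <h3 class="fCm s21"> DRAW A CROWD</h3>
    <dl class="s12">
      <dt>artist</dt><dd>Ben Folds Five</dd>
      <dt>composer</dt><dd>Folds- Ben</dd>
      <dt>album</dt><dd class="fB">Draw A Crowd (Clean Edit)(Single)</dd>
      <dt>label</dt><dd>Legacy</dd>
      <dt>duration</dt><dd>03:56</dd>
    </dl>
  </div>
</div>
"""


class LogEntryParser(HTMLParser):
    """Collects one dict per <div class="logShowEntry">."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._entry = None   # dict for the entry currently being read
        self._field = None   # what the next text node means
        self._key = None     # most recent <dt> text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "logShowEntry" in classes:
            self._entry = {}
            self.entries.append(self._entry)
        elif self._entry is None:
            return  # ignore markup outside of a log entry
        elif tag == "div" and "logEntryTime" in classes:
            self._field = "time"
        elif tag == "h3":
            self._field = "title"
        elif tag in ("dt", "dd"):
            self._field = tag

    def handle_endtag(self, tag):
        if tag in ("div", "h3", "dt", "dd"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._entry is None or self._field is None:
            return
        if self._field == "dt":
            self._key = text
        elif self._field == "dd":
            if self._key in WANTED:  # skip the extra classical-music attributes
                self._entry[self._key] = text
        else:
            self._entry[self._field] = text


def parse_entries(html):
    parser = LogEntryParser()
    parser.feed(html)
    return parser.entries


if __name__ == "__main__":
    print(parse_entries(SAMPLE))
```

The sketch assumes each dd holds plain text with no nested tags, which matches the sample entry but has not been checked against every page.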
The description/definition list makes it easy to grab this data as a map (key-value pairs), and the other values can then be added in. Sometimes, especially for classical music aired during the “Choral Concert” segment, there are many other attributes in the dl list. This script ignores them and limits its output to only the following fields: date, time, label, artist, composer, album, title, and duration.
Here’s a truncated example of the output, which is valid CSV generated by the Python csv library.
$ ./cbc_radio_broadcast_logs.py --start=2015-01-01
# Results from 2015-01-01 to 2015-03-09.
date,time,label,artist,composer,album,title,duration
2015-01-01,12:00 AM,Soft Revolution,Stars,Campbell- Torquil,From The Night (Radio Edit),FROM THE NIGHT,04:06
2015-01-01,12:04 AM,Rubyworks,Hozier,Hozier,Hozier,FROM EDEN,03:42
2015-01-01,12:07 AM,Mungo Park,Bobby Bazini,,Better In Time,MELLOW MOOD,02:51
2015-01-01,12:10 AM,Last Gang,The New Pornographers,Newman- A C,The New Pornographers: Brill Bruisers,CHAMPIONS OF RED WINE,03:40
2015-01-01,12:14 AM,Bloodshot,Ryan Adams,Adams- Ryan,Ryan Adams,FEELS LIKE FIRE,04:25
2015-01-01,12:18 AM,Dine Alone,Ivan & Alyosha,"Wilson- Tim,Wilson- Pete,Kim- Tim,Carbary- Ryan",All The Times We Had,BE YOUR MAN,03:56
2015-01-01,12:22 AM,True North,Lynn Miles,,Love Sweet Love,NEVER COMING BACK,02:57
2015-01-01,12:25 AM,Universal,Sarah Mclachlan,"Marchand- Pierre,Mclachlan- Sarah",Monsters (Remix)(Single),MONSTERS,03:11
2015-01-01,12:28 AM,Sub Pop,Blitzen Trapper,Earley- Eric,Furr,FURR,04:07
2015-01-01,12:32 AM,Dangerbird,Butch Walker,Walker- Butch,Bed On Fire (Single),BED ON FIRE,03:57
2015-01-01,12:36 AM,Six Shooter,Amelia Curran,Curran- Amelia,They Promised You Mercy,NEVER SAY GOODBYE,03:54
2015-01-01,12:40 AM,Universal,U2,U2,U2: War,"""40""",02:37
2015-01-01,12:42 AM,Glassnote,Mumford & Sons,"Lovett- Ben,Marshall- Winston,Dwane- Ted,Mumford- Marcus",Babel,BABEL,03:22
...
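The quoting in that output (composer lists wrapped in double quotes, and U2's "40" turning into """40""") is simply what the csv module does by default. A minimal sketch of writing rows this way, assuming each entry is a dict keyed by field name; rows_to_csv and FIELDS are names invented for this example rather than taken from the gist:

```python
import csv
import io

# Column order copied from the header line of the sample output.
FIELDS = ["date", "time", "label", "artist", "composer", "album", "title", "duration"]


def rows_to_csv(rows):
    """Render an iterable of dicts as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=FIELDS,
        extrasaction="ignore",  # silently drop keys that are not in FIELDS
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # missing keys are written as empty strings
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv([
        {"date": "2015-01-01", "time": "12:40 AM", "label": "Universal",
         "artist": "U2", "composer": "U2", "album": "U2: War",
         "title": '"40"', "duration": "02:37"},
    ]))
```

Letting the library handle the escaping means commas and quotes inside titles or composer lists survive a round trip through csv.reader.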
--search-artist allows you to limit the results to a specific artist, but I recommend leaving it out/empty to return all entries in the broadcast log. You can then save an offline copy to search later, which I recommend over continually hammering the site, in order to be a good citizen.
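Searching a saved copy takes only a few lines. The sketch below is one way to do it, under a couple of assumptions: the output was redirected to a file (the name broadcast_log.csv is hypothetical), and any line starting with # in that file is a comment rather than data. search_artist is a helper made up for this example, not something the script provides.

```python
import csv


def search_artist(lines, artist):
    """Return the rows whose artist column contains `artist`, ignoring case.

    `lines` is any iterable of text lines, such as an open file.
    """
    # Assumption: lines beginning with "#" are comments, not CSV data.
    data_lines = (line for line in lines if not line.startswith("#"))
    needle = artist.casefold()
    return [
        row
        for row in csv.DictReader(data_lines)
        if needle in row["artist"].casefold()
    ]


# Example usage against a saved copy (hypothetical file name):
#
#     with open("broadcast_log.csv", newline="") as f:
#         for row in search_artist(f, "ben folds five"):
#             print(row["date"], row["time"], row["title"])
```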
You can grab the script from a gist here. One final usage note: Broadcast log data appears to be spotty before 2012.