The language of leadership

Today more than ever we can see the power of political leadership to make or break a country through difficult times.

Australia's prime minister is elected to serve our country and speak for our people. Their words should reflect both the sentiment of the people and the cultural themes of their time, as well as set a bold vision for the Australia we all want to create.

This project performs a deep dive into tens of thousands of transcripts from prime ministers over the last 60 years. The resulting analysis gives a thought-provoking view through time of the issues that have affected us as a nation, and traces the origins of some of the critical issues we face today.

An overview

This is an entry for GovHack 2020. I used the PM Transcripts Repository as my key data source. This source captures speeches, media releases, press conferences and other official transcripts for prime ministers.

I did significant data munging and cleansing to shape the data so that it could be easily consumed and understood graphically.

The code for this project is at a GitHub repository. This includes the scripts used to manipulate the data, this website, and JSON extracts of the summarised data I created, available to others for further analysis. My competition entry page is on the GovHack site.

Below are some samples of the analysis I created, for your browsing! Please note that due to the size of some of these charts, this site is best viewed on a desktop or laptop device.

These are a few of my favourite words

This chart shows the most commonly used word by prime minister by year. The size of the bar indicates the level of focus on that term during that year.

Evolution, visualised

This chart shows the evolution of the most commonly used word by prime minister by year. Watch the change unfold, or pause the animation to delve deeper.

Get to know your favourite PM

See a word cloud for each prime minister, with the size of the word indicating the frequency of usage.

Malcolm Turnbull

Tony Abbott

Julia Gillard

Kevin Rudd

John Howard

Paul Keating

Robert Hawke

Gough Whitlam

William McMahon

John Gorton

Harold Holt

Robert Menzies

FAIR principles

For this project, the most important aspects of the data I used were that it was Accessible and Interoperable.

I found the data to be highly Accessible, as it was simply stored in a GitHub repository. This made access a breeze, and far easier than with some other official sources, which require complex and slow ordering processes. It was very important that I could just download the source files directly in my browser, with no wait times.

The data was Interoperable in the sense that it was in a standard XML format, making manipulation easy. However, the data format was not well documented, and the format I could discern was not always followed reliably, which caused some headaches.

Initially I searched for other complementary data around the prime minister transcripts, but this proved hard to locate. The PM Transcripts site was also far harder to use than the source I used. So in this sense the data was not very Findable.

Detailed notes

  • I took a number of steps to ensure the words displayed were highly relevant. I first stripped out "everyday" words (e.g. a, the, of), as they added no analytical value. The resulting data contained more meaningful words, but many still added little to the evolving historical view I was trying to build (e.g. the leading words were "people", "australia" and "strong"). So I then created my own list of important terms, which I searched for and analysed in more detail in the above graphs. This list can be seen in the theme tracing section.
  • I took some liberties with the data to ensure a clean result. For example, I ignored Rudd's second term of 2 months, as it made that year's data very confusing. I also ignored transcripts dated outside a prime minister's term, as these were not issued as prime minister. Finally, to ensure significance, I ignored any year in which a prime minister had fewer than 100 transcripts. For example, Paul Keating had only 7 transcripts in 1991, as he was prime minister for only the last 11 days of that year.
  • The value depicted on all graphs is the "word ratio", calculated as the total instances of that term divided by the total instances of all "important terms".
  • Technology-wise, I created all data analysis scripts and this site using PHP, and stored the data in a MySQL database.
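
The filtering and "word ratio" steps described in these notes can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code (which was written in PHP with a MySQL database). The stop-word list, the list of important terms, and all function names here are hypothetical placeholders. Only the 100-transcript threshold and the definition of the ratio come from the notes above.

```python
import re
from collections import Counter

# Hypothetical placeholder lists: the project's real stop-word list and
# list of important terms are not reproduced here.
STOP_WORDS = {"a", "the", "of", "and", "to", "in"}
IMPORTANT_TERMS = {"economy", "health", "education", "climate"}

# Threshold taken from the notes: skip years with fewer than 100 transcripts.
MIN_TRANSCRIPTS_PER_YEAR = 100


def tokenize(text):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def word_ratios(transcripts, important_terms=IMPORTANT_TERMS):
    """Map each important term to (its count) / (count of all important terms)."""
    counts = Counter()
    for text in transcripts:
        counts.update(w for w in tokenize(text) if w in important_terms)
    total = sum(counts.values())
    if total == 0:
        return {}  # no important terms found, so no ratio can be computed
    return {term: n / total for term, n in counts.items()}


def top_term_by_year(transcripts_by_year, min_transcripts=MIN_TRANSCRIPTS_PER_YEAR):
    """Map each qualifying year to its (top term, word ratio) pair."""
    result = {}
    for year, transcripts in transcripts_by_year.items():
        if len(transcripts) < min_transcripts:
            continue  # too few transcripts for the year to be significant
        ratios = word_ratios(transcripts)
        if ratios:
            term = max(ratios, key=ratios.get)
            result[year] = (term, ratios[term])
    return result
```

Calling top_term_by_year with a dictionary that maps each year to a list of transcript texts for one prime minister would give the kind of per-year values plotted in the charts. Dividing by the count of important terms only, rather than by all words, keeps the ratios comparable between years with very different transcript volumes.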