We often think of data as just numbers in a spreadsheet, but that is simply not the case. Data is much more than that: it is information that, framed in the right way, can leave a lasting impression on your audience. Whether you want to awe, inspire, or motivate your audience, working with data begins with asking the right question.
Let’s explore this in detail. For this exercise, we will be exploring text. We can find stories in anything: whether it is the rhetoric of a politician or the lyrics of a musician, we can find patterns by looking at the most common words or phrases.
I want to take a closer look at Bob Dylan, who in 2016 was awarded the Nobel Prize in Literature “for having created new poetic expressions within the great American song tradition”. Bob Dylan has also been performing nonstop since 1988 on his famous “Never Ending Tour”.
A closer look at his website reveals how many times he has played each song since he started performing. Let’s look at the most frequently played songs. The lyrics of these top songs will provide the data for our analysis. To begin, we need to add the lyrics to a Word document.
Let’s clean the data by eliminating stopwords, which are common words that are helpful to ignore in text analysis. These include words like “this”, “that”, “and”, and “but”.
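If you would rather do the cleaning in code than by hand, the step above can be sketched in a few lines of Python. This is a minimal sketch, not the method used for this exercise: the stopword list is a tiny hand-picked assumption, the function names `tokenize` and `remove_stopwords` are made up for illustration, and the sample sentence is invented rather than taken from any lyric.

```python
import re

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"this", "that", "and", "but", "the", "a", "to", "in", "it"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    # Normalize curly apostrophes so "ain’t" and "ain't" count as one word.
    text = text.lower().replace("’", "'")
    return re.findall(r"[a-z']+", text)


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]


# Invented sample sentence (not a lyric).
sample = "This is the line, and that is the other line, but it ain’t the last."
cleaned = remove_stopwords(tokenize(sample))
print(cleaned)
```

A real stopword list would be much longer; the point here is only that cleaning amounts to filtering words against a list you choose.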
Analyze the text by underlining the phrases that are common. A common phrase can be a single word, a bigram (a two-word phrase), or a trigram (a three-word phrase).
| Word | Count | Bigram | Count | Trigram | Count |
|---|---|---|---|---|---|
| ain’t | 25 | they’ll stone | 20 | ain’t gonna work | 15 |
| stone | 24 | I ain’t | 16 | I ain’t gonna | 15 |
| they’ll | 22 | ain’t gonna | 15 | they’ll stone ya | 12 |
| say | 20 | gonna work | 15 | does it feel | 10 |
| don’t | 20 | no more | 15 | how does it | 9 |
| well | 19 | to be | 13 | gonna work for | 9 |
| you’re | 19 | in the | 13 | work for maggie’s | 9 |
| like | 18 | stone ya | 12 | ya when you’re | 9 |
| know | 18 | when you’re | 11 | they’ll stone ya | 8 |
| said | 18 | you don’t | 11 | you don’t know | 8 |
| gonna | 17 | | | don’t know what | 8 |
| feel | 16 | | | it ain’t no | 7 |
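Counts like the ones in the table can also be produced with a short script instead of underlining by hand. The sketch below is one possible way to do it, under a few assumptions: the helper names `tokenize`, `ngrams`, and `top_ngrams` are hypothetical, the sample text is a placeholder rather than real lyrics, and phrases are counted on the full text without removing stopwords, since the table's bigrams include pairs such as “in the” and “to be”.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    text = text.lower().replace("’", "'")
    return re.findall(r"[a-z']+", text)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(text, n, k=3):
    """Return the k most common n-word phrases with their counts."""
    return Counter(ngrams(tokenize(text), n)).most_common(k)


# Placeholder text (not lyrics); paste the collected lyrics here instead.
sample = "one two three one two three one two"

words = top_ngrams(sample, 1)     # single words
bigrams = top_ngrams(sample, 2)   # two-word phrases
trigrams = top_ngrams(sample, 3)  # three-word phrases
print(words, bigrams, trigrams, sep="\n")
```

Changing `n` is all it takes to move from words to bigrams to trigrams, which is why the three column pairs in the table can come from the same routine.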
Let’s look for a story in the data. “Ain’t” is the most used word in these Bob Dylan songs. We also see that “ain’t” appears in two of the top three bigrams and in both of the top two trigrams. Now we can make our own data story.
A data story can be as simple or as complex as you want to make it. It can be something like: Bob Dylan has said “ain’t” a lot more than “farm”. That’s a story; a very basic one, but a story nonetheless. We can use a bubble chart to represent this visually, drawing the most frequent words as the biggest bubbles so they stand out, while less frequently mentioned words appear smaller.
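To decide how big each bubble should be, one common convention is to make the bubble's area, rather than its radius, proportional to the word count, so that a word used twice as often does not look four times as large. The sketch below assumes that convention; the function name `bubble_radii` and the `max_radius` value are illustrative choices, and the counts are copied from the word column of the table above.

```python
import math

# Counts copied from the word column of the table above.
word_counts = {"ain't": 25, "stone": 24, "they'll": 22, "say": 20, "don't": 20}


def bubble_radii(counts, max_radius=50.0):
    """Scale radii so that each bubble's AREA is proportional to its count."""
    largest = max(counts.values())
    # Area grows with radius squared, so take the square root of the ratio.
    return {w: max_radius * math.sqrt(c / largest) for w, c in counts.items()}


radii = bubble_radii(word_counts)
for word, radius in sorted(radii.items(), key=lambda item: -item[1]):
    print(f"{word:8s} radius {radius:5.1f}")
```

The resulting radii can be passed to whatever charting tool you prefer; the most frequent word gets the largest bubble, and every other word shrinks in proportion.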