Finding the Story in Your Data: Part 1

We often think of data as just numbers in a spreadsheet, but it is much more than that. Data are pieces of information that, framed the right way, can leave a lasting impression on your audience. Whether you want to awe, inspire, or motivate your audience, the first step in working with data is asking the right question.

Let’s explore this in detail. For this exercise, we will be exploring text. We can find stories in anything: whether it is the rhetoric of a politician or the lyrics of a musician, we can find patterns by looking at the most common words or phrases.

Step 1

For this exercise, I want to take a closer look at Bob Dylan, who in 2016 was awarded the Nobel Prize in Literature for “having created new poetic expressions within the Great American song tradition”. I also know that Bob Dylan has been performing nonstop since 1988, on his famous “Never Ending Tour”.

Step 2

A closer look at his website reveals how many times he has played each song since he started performing. Let’s look at the most frequently played songs. The lyrics of these top songs are the data we will analyze. To get started, we add the lyrics to a Word document.

Bob Dylan

Step 3

Let’s clean the data by eliminating stopwords, which are words that are helpful to ignore in text analysis. These include words like “this”, “that”, “and”, and “but”.
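If you would rather clean the text with a script than by hand, the step above can be sketched in a few lines of Python. This is only a minimal sketch: the stopword list is a tiny illustrative set, the sample sentence is made up for the example (it is not a lyric), and the function names are my own.

```python
import re

# Illustrative stopword list only (an assumption); real lists are much longer.
STOPWORDS = {"this", "that", "and", "but", "the", "a", "to", "in", "it", "is"}

def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    # Normalize curly apostrophes so "ain’t" and "ain't" count as the same word.
    text = text.lower().replace("’", "'")
    return re.findall(r"[a-z']+", text)

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Return only the tokens that are not in the stopword list."""
    return [t for t in tokens if t not in stopwords]

# Made-up sample sentence, used only to show the cleaning step.
sample = "This is the day, and that is the night, but it ain’t over."
cleaned = remove_stopwords(tokenize(sample))
print(cleaned)
```

Which words count as stopwords is a judgment call. A longer list removes more noise, but it can also remove words that turn out to matter to the story.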

Step 4

Analyze the text by underlining the phrases that are common. A common phrase can be a single word, a bigram (a two-word phrase), or a trigram (a three-word phrase).
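Underlining works for a handful of songs, but counting phrases can also be automated. The sketch below slides a window of one, two, or three words across a list of tokens and counts how often each phrase appears. The token list is a placeholder rather than real lyrics, and the helper names `ngrams` and `top_phrases` are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as one space-joined phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_phrases(tokens, n, k=10):
    """Return the k most common n-word phrases with their counts."""
    return Counter(ngrams(tokens, n)).most_common(k)

# Placeholder tokens; in practice this would be the cleaned lyrics.
tokens = "one two three one two three one two".split()

print(top_phrases(tokens, 1, 3))  # single words
print(top_phrases(tokens, 2, 3))  # bigrams
print(top_phrases(tokens, 3, 3))  # trigrams
```

Calling `top_phrases` with `n` set to 1, 2, and 3 gives the three kinds of lists used in the next step.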

Step 5

Analyze results.

Top words, bigrams, and trigrams

| Word | Frequency | Bigram | Frequency | Trigram | Frequency |
|---|---|---|---|---|---|
| ain’t | 25 | they’ll stone | 20 | ain’t gonna work | 15 |
| stone | 24 | I ain’t | 16 | I ain’t gonna | 15 |
| they’ll | 22 | ain’t gonna | 15 | they’ll stone ya | 12 |
| say | 20 | gonna work | 15 | does it feel | 10 |
| don’t | 20 | no more | 15 | how does it | 9 |
| well | 19 | to be | 13 | gonna work for | 9 |
| you’re | 19 | in the | 13 | work for maggie’s | 9 |
| like | 18 | stone ya | 12 | ya when you’re | 9 |
| know | 18 | when you’re | 11 | they’ll stone ya | 8 |
| said | 18 | you don’t | 11 | you don’t know | 8 |
| gonna | 17 | | | don’t know what | 8 |
| feel | 16 | | | it ain’t no | 7 |
| get | 15 | | | | |

Step 6

Let’s look for a story in the data. “Ain’t” is the most used word in these Bob Dylan songs. We also see that “ain’t” appears in two of the top three bigrams and in both of the top two trigrams. Now we can make our own data story.
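One way to check how often a word shows up among the top phrases is to test each phrase for that word. The sketch below does this with the leading bigrams and trigrams copied from the results table. The apostrophes are typed as straight quotes for convenience, and `phrases_with` is a hypothetical helper name.

```python
# Top three bigrams and top two trigrams, copied from the results table.
top_bigrams = [("they'll stone", 20), ("I ain't", 16), ("ain't gonna", 15)]
top_trigrams = [("ain't gonna work", 15), ("I ain't gonna", 15)]

def phrases_with(word, ranked):
    """Return the phrases in a ranked list that contain the given word."""
    return [phrase for phrase, _ in ranked if word in phrase.lower().split()]

bigram_hits = phrases_with("ain't", top_bigrams)
trigram_hits = phrases_with("ain't", top_trigrams)

print(f"bigrams containing the word: {len(bigram_hits)} of {len(top_bigrams)}")
print(f"trigrams containing the word: {len(trigram_hits)} of {len(top_trigrams)}")
```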

Data Story

A data story can be as simple or as complex as you want to make it. It can be something like: Bob Dylan has said “ain’t” a lot more than “farm”. That’s a story. A very basic one, but a story nonetheless. We can use a bubble chart to represent this visually, sizing each word by its frequency so that the most frequent words stand out and less frequently mentioned words take up less space.
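To size the bubbles, one reasonable approach is to make each bubble's area, rather than its radius, proportional to the word's frequency, so that a word used twice as often does not look four times as big. The sketch below computes radii that way for a few words from the results table. The `max_radius` value and its units are arbitrary assumptions, and the drawing itself is left to whatever charting tool you prefer.

```python
import math

def bubble_radii(frequencies, max_radius=100.0):
    """Scale radii so that each bubble's area is proportional to its frequency."""
    largest = max(frequencies.values())
    return {
        word: max_radius * math.sqrt(count / largest)
        for word, count in frequencies.items()
    }

# A few word counts copied from the results table.
word_counts = {"ain't": 25, "stone": 24, "they'll": 22, "say": 20, "get": 15}
radii = bubble_radii(word_counts)

# Print the words from largest bubble to smallest.
for word, radius in sorted(radii.items(), key=lambda item: -item[1]):
    print(f"{word:10s} radius {radius:6.1f}")
```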

WASHINGTON D.C. - AUGUST 28: Folk singers Joan Baez and Bob Dylan perform during a civil rights rally on August 28, 1963 in Washington D.C. (Photo by National Archive/Newsmakers)
