Understanding the Type of Data
As covered in a previous blog post, identifying the type and number of variables in a dataset is key to choosing the right chart type. Scatter plots are useful for exploring relationships between two or three variables. Let's begin by exploring the following dataset:
The dataset above looks at different neighborhoods in Barcelona. In particular, it looks at the percentage of votes for a political party and the disposable household income in each neighborhood. As you can see, the variables in the second and third columns are shown as fractional amounts. This tells us that the data in these columns can take any value within a range, rather than one of a fixed set of categories.
In this case, the B column and the C column in the dataset will map to the x-axis and the y-axis in our visualization. What we know is that these variables are continuous and not discrete.
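One quick way to check that a column is continuous rather than discrete is to see whether its values are fractional and mostly distinct. The sketch below is only a heuristic under that assumption. The function name `looks_continuous`, the 0.8 threshold, and the sample values are invented for illustration and are not taken from the dataset.

```python
def looks_continuous(values, min_unique_ratio=0.8):
    """Heuristic: a column looks continuous if it holds numbers, at least one
    of them is fractional, and most of the values are distinct."""
    numbers = [float(v) for v in values]
    if not numbers:
        return False
    has_fraction = any(n != int(n) for n in numbers)
    unique_ratio = len(set(numbers)) / len(numbers)
    return has_fraction and unique_ratio >= min_unique_ratio


# Made-up illustrative values, not the real Barcelona figures.
vote_share = [0.21, 0.34, 0.18, 0.42, 0.27]  # fraction of votes per neighborhood
district_id = [1, 1, 2, 3, 3]                # a discrete label, for contrast

print(looks_continuous(vote_share))
print(looks_continuous(district_id))
```

A check like this will misjudge some columns, such as a continuous measure that happens to be rounded to whole numbers, so treat it as a first pass and not a replacement for reading the data.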
Choosing Chart Type
Scatter plots require a minimum of two continuous variables to be effective. This chart type is especially useful for exploring patterns and relationships in your data.
If you notice an upward slope, the association between the variables is positive. If, however, you notice a downward slope, the association is negative. Scatter plots can be defined by four elements:
- Direction of the data points, if any. As with the image above, is the arrangement positive, is it negative, is there any direction at all?
- Form that the data points are taking. Is the association linear? Or are you seeing a curved form? Is the form perfectly linear? What does the association look like?
- Strength of the association. Is it obvious there is a relationship in your data points? Can you easily parse out a strong association? Or is it difficult to discern a relationship between your data points?
- Outliers, or points in your dataset that stand out from the rest of the pattern in your graph.
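Three of these elements can be roughly quantified. The sketch below uses Pearson's correlation coefficient for direction and strength, and the distance from a least-squares line for outliers. It assumes the association is linear, so it says nothing about form, and a curved pattern should still be judged by eye. The function names, the 0.7 and 0.3 strength cutoffs, the two-standard-deviation outlier rule, and the sample values are all assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: the sign gives direction, the magnitude gives strength."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def describe_association(r, strong=0.7, weak=0.3):
    """Turn r into (direction, strength) labels using assumed cutoffs."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    magnitude = abs(r)
    if magnitude >= strong:
        strength = "strong"
    elif magnitude >= weak:
        strength = "moderate"
    else:
        strength = "weak"
    return direction, strength


def find_outliers(xs, ys, threshold=2.0):
    """Indices of points lying more than `threshold` residual standard
    deviations away from the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = math.sqrt(sum(res ** 2 for res in residuals) / n)
    if spread == 0:
        return []  # every point sits exactly on the line
    return [i for i, res in enumerate(residuals) if abs(res) > threshold * spread]


# Made-up illustrative values, not the real Barcelona figures.
income = [10, 12, 15, 18, 22, 25]
share = [0.12, 0.15, 0.17, 0.22, 0.25, 0.30]

r = pearson_r(income, share)
print(describe_association(r))
print(find_outliers(income, share))
```

A coefficient near zero does not rule out a relationship. It only means there is no linear one, which is why looking at the plot itself still matters.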
Examples in Quadrigram
Let’s look at some examples of scatter plots made in Quadrigram. Here we see the relationship between the percentage of votes for a political party and the disposable household income of each neighborhood in Barcelona.
The chart above shows the percentage of votes for the center-right party Convergencia i Unio and the disposable household income for every neighborhood in Barcelona. There is a clear and strong positive association between higher household income and a higher percentage of votes for that party.
Votes for the Catalan Socialist Party clearly indicate a strong negative association. Here the percentage of votes is highest in neighborhoods with the lowest disposable income.