Story-buzzSumo

“Data-driven storytelling is poised to be the next big trend in content marketing.” Harvard Business Review, October 2015.

“Data-enhanced storytelling is rapidly reshaping both content and advertising.” Adweek, January, 2016.

There is growing interest in data-driven stories. Some people refer to this as a new form of data journalism, although data-driven journalism has been around since the mid-1800s. The Guardian newspaper, a pioneer of data-driven journalism, has pointed out that its very first data-driven stories date from 1860.

So why the current interest in data-driven stories? The answer lies in increasing access to large datasets combined with improved, easy-to-use data analysis tools. This combination is fuelling a new breed of journalists and writers who are uncovering and telling new stories through data research.

At BuzzSumo we are always looking for the stories hidden inside social data. We particularly look for insights into content marketing and why certain content gets amplified through sharing and linking. We thought we would take a more detailed look at data-driven stories and at five core narratives that work well with data-based storytelling.

What are data-driven stories?

Wikipedia defines data-driven journalism as “analyzing and filtering large data sets for the purpose of creating a news story.” The process is one of uncovering insights from the analysis of large data sets to reveal stories that may be hidden in the data. In this way data-driven journalism allows journalists to reveal untold stories or find new angles on existing stories.

Data-driven journalism typically follows a process of finding data, filtering data, analysis and visualisation, and finally telling the story. Visualisations in the form of charts and images are often a core part of the storytelling. The Financial Times, for example, currently runs a column called ‘Chart that tells a story’, the premise being that a single chart can tell a story.

Data-driven stories are hard work

There is a misconception that using data is a quick and easy way to create stories. It is true that data analysis is very different from the difficult and sometimes dangerous work of the brave journalists who uncover stories from, say, war zones. It is also true that some publishers create quick and cheap stories by running simple polls, but this is not what we mean by data-driven journalism.

The Guardian says that data-driven stories are 80% perspiration, 10% great idea and 10% output. This resonates with our own experience. The process involves analysing a lot of data without knowing if you will find any significant insights that tell a story. A significant amount of time is spent gathering, filtering and cleaning the data, running different forms of analysis, exploring potential implications and testing theories with further datasets. At BuzzSumo we have analyzed datasets of millions of articles looking for insights, and sometimes we can spend days or weeks without discovering anything of value or a story of interest. You have to be prepared to fail in your search for a story. However, it is very rewarding when you discover new insights and find hidden stories in the data.

The best data-driven stories are original

In our view one of the great strengths of data-based stories is that they can tell original stories. They can reveal trends, correlations or counterintuitive surprises that make people look afresh at an issue. Original research does not necessarily mean you need original data. The data sets that Peter Brand had access to in the famous Moneyball story were widely available; it was the research and analysis that was unique. These days there are thousands of datasets widely available and an increasing range of tools to help analyse the data. These include Tableau, R, Google Fusion Tables and IBM Watson.

It is clearly advantageous if you have access to an original or unique data set. At BuzzSumo our core business is crawling and collecting very large datasets but most companies also have access to unique data. For example, most businesses have data that is important to their performance and their industry. This can be sales data, market intelligence or simply an understanding of issues through data from your support desk. Data that you might consider commonplace could contain insights that are helpful to your audience.

The best data-driven stories use data analysis to tell people something they don’t already know.

Five data-driven narratives

There are five core narratives that work really well when telling a story with data. These are:

  • Trends. For example, how smartphone ownership is increasing or decreasing.
  • Rank order or league tables. For example, the politicians getting the most social media coverage or which areas have the highest crime rates.
  • Comparisons. For example, how one company is performing relative to another.
  • Surprising or counterintuitive data.  Data that challenges or confirms something that people believe to be true, or data that is simply surprising.
  • Relationships. For example, correlations, potentially through to causation and prediction.

Let’s take a look at each of these narratives and some example stories.

Trends

There are many stories based on trends. Typically these stories focus on how something is rising or falling over time. However, even a flattening trend can be a major story. One of the big news stories of recent weeks has been how Twitter is failing to grow its active users. This story can be told very clearly through the chart and headline below.

“Twitter Fails To Grow Active Users”

[Chart: Twitter active users, in millions]

Once you see a trend, the obvious next question is why: why is it increasing or falling, or in Twitter’s case flattening? The trend is therefore not the whole story; it prompts further areas for investigation.
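One simple way to check whether a trend is rising, falling or flattening is to compute the change from one period to the next. The Python sketch below is a minimal illustration only: the figures are made up and are not Twitter’s reported numbers, and both the `growth_rates` and `describe` helpers and the 1% cut-off for calling a period “flat” are our own assumptions.

```python
def growth_rates(values):
    """Return the fractional change between each consecutive pair of values."""
    rates = []
    for previous, current in zip(values, values[1:]):
        if previous == 0:
            raise ValueError("cannot compute growth from a zero baseline")
        rates.append((current - previous) / previous)
    return rates


def describe(rate, flat_threshold=0.01):
    """Label one period's change; flat_threshold is an arbitrary cut-off."""
    if abs(rate) < flat_threshold:
        return "flat"
    return "rising" if rate > 0 else "falling"


# Hypothetical quarterly active users in millions (illustrative only).
active_users = [200, 220, 240, 250, 251, 251]

for rate in growth_rates(active_users):
    print(f"{rate:+.1%} {describe(rate)}")
```

A run of “flat” labels at the end of the series is the kind of signal that would prompt the follow-up question of why growth has stalled.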

Comparisons

A common data-driven narrative is comparisons. For example, we can take a different angle on Twitter’s failure to grow its active users by comparing how it is performing relative to Facebook. This has been a story angle taken by a number of publications. Below is an example chart used to show how Facebook is outperforming Twitter.

“Active Users: Facebook Continues to Outperform Twitter”

[Chart: Twitter and Facebook active users, in millions]

Rank order or league tables

Rank order or league tables are another common narrative suited to data. We have provided some examples below from our own BuzzSumo data.

The first table below shows the sites with the most shares of articles about content marketing in the last 12 months.

[Table: content marketing websites]

We could write a story on the top content marketing sites by using the data in this table.

The second table shows the authors with the highest average shares of articles on content marketing. Brian Sutter’s articles on Forbes have helped make him the author with the highest average shares. All credit, though, to Lindsay Kolowich for a really consistent level of shares for her content marketing articles on HubSpot.

[Table: content marketing authors]
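A league table like the one for authors can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the way BuzzSumo builds its tables: the author names and share counts are hypothetical, and `rank_by_average_shares` and its `min_articles` parameter are names we have invented for the example.

```python
from collections import defaultdict


def rank_by_average_shares(articles, min_articles=1):
    """Rank authors by mean shares per article, highest first.

    articles is an iterable of (author, shares) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # author -> [total shares, article count]
    for author, shares in articles:
        totals[author][0] += shares
        totals[author][1] += 1

    table = [
        (author, total / count, count)
        for author, (total, count) in totals.items()
        if count >= min_articles
    ]
    return sorted(table, key=lambda row: row[1], reverse=True)


# Hypothetical authors and share counts (illustrative only).
articles = [
    ("Author A", 900), ("Author A", 1100),
    ("Author B", 5000),
    ("Author C", 300), ("Author C", 500), ("Author C", 400),
]

for author, average, count in rank_by_average_shares(articles, min_articles=2):
    print(f"{author}: {average:.0f} average shares over {count} articles")
```

Requiring a minimum number of articles is one possible design choice: it stops an author with a single viral post from topping a table that is meant to reflect consistent performance.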

Relationships

Exploring relationships between data is a complex area, particularly when you want to see if one factor has a particular impact on other factors or can predict another factor. However, with advances in machine learning it is an area where we may see a lot more data-driven stories.

A simple approach to exploring relationships is to look at the correlation between two sets of data. It is important to remember that correlation is not the same as causation, but it can highlight areas for further research. For example, we did a piece of research with Moz where we looked at the relationship between social shares and links. We took a data set of 1m posts and used the Pearson correlation coefficient, a measure of the linear correlation between two variables. The result can range from 1 (a total positive correlation) through 0 (no linear correlation) to −1 (a total negative correlation). The overall correlations for our sample were effectively zero; for example, the correlation between total shares and referring domain links was just 0.021. This suggests that people share and link to content for different reasons. This research took a while, and we had no sense of likely outcomes when we started it, but it’s been quite a revelation.
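For readers who want to see what the Pearson calculation involves, here is a minimal Python sketch using only the standard library. It is not the code used in the Moz research; the `pearson` function name and the share and link counts are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical share and link counts for six posts (illustrative only).
shares = [120, 15, 980, 40, 310, 5]
links = [3, 0, 2, 7, 1, 4]

print(f"r = {pearson(shares, links):.3f}")
```

A value close to 0, as in the research described above, suggests there is little linear relationship between the two measures. It does not rule out other kinds of relationship.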

You can also explore relationships in more depth by building predictive linear regression models. I particularly like the models that predict the quality of wine by using factors such as average summer temperatures and rainfall levels.
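As a rough illustration of the idea, the Python sketch below fits a straight line by ordinary least squares and uses it to make a prediction. It is deliberately simplified to a single predictor, whereas the wine models mentioned above combine several factors. The `fit_line` helper, the temperatures and the quality scores are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical average summer temperatures (°C) and quality scores.
temperatures = [15.0, 16.0, 17.0, 18.0]
quality = [6.0, 6.5, 7.5, 8.0]

slope, intercept = fit_line(temperatures, quality)
predicted = intercept + slope * 16.5
print(f"predicted quality at 16.5°C: {predicted:.2f}")
```

A model like this is only as good as the data behind it, so any prediction should be tested against data that was not used to fit the line.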

There is an increasing range of tools that allow you to apply advanced techniques such as machine learning. Machine learning uses algorithms that can learn from data and make predictions. In essence you build a model from example data inputs, which enables the algorithms to make data-driven predictions. This is a growing field where we will see a lot more activity. Machine learning is something we are exploring and looking to apply at BuzzSumo.

If you can uncover surprising relationships, you can then start to take steps towards predictions based on the data. That can create a whole other set of fascinating posts. The work of Nate Silver in predicting sports and election results at fivethirtyeight.com makes for compelling reading. It’s the result of his deep analysis of a huge array of available data on election results.

Surprising or counterintuitive data

Some of the best stories from data research emerge when the data reveals something that is surprising or even counterintuitive. I personally liked the research which found that 5 glasses of champagne a day can help prevent Alzheimer’s disease.

For this article I decided to look for surprising data about the US to use as an example. The data I personally found most surprising was on incarceration.

“America Imprisons More People Than Any Other Country”

[Chart: US incarceration rate compared with other OECD countries]

The chart above shows US incarceration rates relative to other OECD countries. The US also has a higher incarceration rate than China or Russia, and a higher number of people incarcerated.

This data intrigued me, so I did some more research and found that incarceration rates in the US were very similar to those of other countries until 1980, when they increased significantly. See the chart below.

[Chart: US incarceration timeline]

This chart raises the question: why did incarceration rates increase so rapidly after 1980? You will need to do your own research, but articles I have read suggest that privatisation of prisons and a different approach to incarceration for drug crimes may be responsible. This type of relationship analysis is at the core of the hugely successful Freakonomics books.

Data-driven story tips

Here are some tips on writing data-driven stories based on our experience. It would be great to get your tips and feedback.

1. Start with a story idea

If you start with an idea for a story, you can then look for data that confirms or debunks the idea. Focus on a story that is interesting to your audience.

For us a great story might be “Why ‘how to’ posts get 50% more shares”. Of course the data may not support this headline, but it gives a clear sense of the data we would need and the type of story we want to tell. There is a danger of bias in operating this way, and you need to reflect honestly on whether your data supports your conclusions.

2. Check your facts

If you have made a mistake or have inaccurate data, you will soon get called out when you publish the post. Data posts often get the most scrutiny on the internet, so check and double check your data, and check that it supports your story.

3. Focus on one or two key statistics from your research

You may have a mass of data, but highlight the key statistics that people will remember. An example from our own experience was “50% of content gets 8 shares or less”.
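A headline statistic of this kind is essentially a statement about the median. The Python sketch below shows one way such a figure could be derived. The share counts are hypothetical and `fraction_at_or_below` is a helper we have invented for the example; it is not how the BuzzSumo statistic was produced.

```python
from statistics import median


def fraction_at_or_below(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical share counts for ten articles (illustrative only).
shares = [0, 1, 2, 3, 8, 8, 15, 40, 250, 3000]

print("median shares:", median(shares))
print("mean shares:", sum(shares) / len(shares))
print(f"articles with 8 shares or fewer: {fraction_at_or_below(shares, 8):.0%}")
```

When a few items attract very large numbers, the mean is pulled far above the median, which is why a median-based statistic often describes the typical case better.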

4. Use visuals and tables

Data-driven stories are inherently suitable for charts and graphics. Try to hone your story down to one key chart or image, the one you want people to share and remember. Trends in particular work well as line charts, as shown above.

In addition to charts, use tables to highlight data, and bring out key figures using callouts so they stand out from the rest of your text. Numbers can get lost all too easily in a block of text.

5. Make it human

If you can, bring the data back to a human level and present it in a form people can relate to. Maybe it is a story about you, a client or a colleague that relates to the data. As the Freakonomics authors said, economics is great at predicting human behaviour based on data sets; it just focuses on boring problems that most people don’t care about. Their breakthrough was focusing on things people care about.

6. Make it insightful and helpful

Would someone have made a different decision if they had your data? What can they do differently, based on your analysis, to improve their performance? If you can answer these questions, you have a powerful story, because your insights can help people make better decisions. You’re predicting their future, and nobody can resist reading about that.

  • https://francovalentino.com Franco Valentino

    Steve, insightful article – thanks. My favorite types of articles are the ones with surprising or counter-intuitive data. There is a research paper by Murray S. Davis that shed light on what makes a topic seem more interesting or compelling. Synopsis here: http://www.sfu.ca/~palys/interest.htm

    • Steve Rayson

      Thanks, I will take a look.

  • Nick Szabo

    This article made me look up predictable linear regression. Gotta love the internet.

    • Steve Rayson

      There are some great free tools and courses. I can recommend the edX courses on data analytics, which cover linear regression etc.

  • SunShine

    Excellent article. Years back, I began wondering how we, as humans, would begin to deal with the growing body of statistics, knowledge, data growing at increasingly exponential rates online. I see that others were wondering the same thing and, most importantly, how to connect it and make good use of it. I appreciate the mentions of the data crunching tools provided here. Now to find the time to dig into them.

    • Steve Rayson

      Thanks, I do recommend looking at some of the free courses such as edX’s course The Analytics Edge, which takes you through using R, the open source data analysis tool.

      • SunShine

        Excellent. I’m doing this. Thanks for the share.

  • Dollar Flipper

    This is a great post. I think another part is to understand your audience.

    I did a whole post about a bubble that occurred with re-selling mugs on eBay (http://flippingadollar.com/dragon-ball-z-mug-ebay-bubble-data-analysis/). Half of my readers thought it was an awesome post, and another half thought it was over the top.

    I guess I didn’t realize that my usual data breakout is just a table with simple addition/subtraction. Nothing with graphs, averages, or standard deviations.

    I think younger generations are more comfortable with data analysis since the tools are so much simpler than what was available during previous generations (no more slide rules required!).

  • http://www.horizonpeakconsulting.com Jessica Mehring

    Another winning article from the BuzzSumo team! As a content writer who often writes data-driven white papers and e-books, this post hit home — and I came away with some great ideas!

    • http://buzzsumo.com James Blackwell

      Thanks Jessica

  • http://www.mmarley.com Matthew Marley

    Awesome post guys!

  • andrew_davis

    Really like this post. It’s like creating online documentaries

    • Susan Moeller

      Thanks!

  • https://dataddict.wordpress.com Marcos Ortiz

    Time to invest time to learn more about the powerful combination of R + ggplot2 for Data-Driven Journalism. Great post, Steve.