The Technium

Twitter Predicts the Future



The chatter on Twitter can accurately predict the box-office revenues of upcoming movies weeks before they are released. In fact, tweets can predict the performance of films better than market-based predictions such as the Hollywood Stock Exchange, which have been the best predictors to date.

The Social Computing Lab at HP Labs in Palo Alto, CA, found that the rate at which movies are mentioned could, on its own, successfully predict future revenues. When the sentiment of the tweets (how favorable they were toward the new movie) was factored in, the prediction was even more accurate. To quantify the sentiment in 3 million tweets, the team used anonymous human workers recruited through Amazon Mechanical Turk to rate a sample of tweets, and then trained an algorithmic classifier to derive ratings for the rest.
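The labeling step described above (humans rate a sample, a trained classifier rates the rest) can be sketched in a few lines. This is a minimal illustration, assuming a multinomial naive Bayes classifier as a stand-in: the post does not say which classifier the HP team used, and the tweets, the `pos`/`neg`/`neu` labels, and the names `NaiveBayesSentiment` and `positive_negative_ratio` are all hypothetical. The final ratio of positive to negative tweets is one simple way to collapse the labels into a single sentiment score per movie.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also handle
    # punctuation, URLs, hashtags and @mentions.
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.labels = sorted(self.doc_counts)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.labels:
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / n_docs)
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def positive_negative_ratio(labels):
    # Positive tweets per negative tweet; neutral tweets are ignored, and
    # max(..., 1) avoids dividing by zero when nothing is negative.
    counts = Counter(labels)
    return counts["pos"] / max(counts["neg"], 1)


# Step 1: a small hand-labeled sample (hypothetical; stands in for the human ratings).
labeled = [
    ("loved it amazing movie", "pos"),
    ("great film must see", "pos"),
    ("so excited for this great movie", "pos"),
    ("boring and awful", "neg"),
    ("terrible movie total waste", "neg"),
    ("awful acting boring plot", "neg"),
    ("saw the trailer today", "neu"),
    ("movie opens friday", "neu"),
]

# Step 2: train the classifier on the labeled sample.
model = NaiveBayesSentiment().fit([t for t, _ in labeled], [l for _, l in labeled])

# Step 3: let the classifier label the remaining (hypothetical) tweets.
unlabeled = ["amazing great film", "awful boring waste", "great movie loved it", "trailer opens today"]
predicted = [model.predict(t) for t in unlabeled]
ratio = positive_negative_ratio(predicted)
print(predicted)  # ['pos', 'neg', 'pos', 'neu']
print(ratio)      # 2.0
```

With only eight training tweets this is a toy; the point is the workflow, where a small amount of human labeling is amplified by a model across millions of tweets.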

[Graph: predicted vs. actual box-office revenue (tweet-rate.jpg)]

The graph above compares predicted and actual box-office revenues for the tweet-rate model (blue line) and the Hollywood Stock Exchange (green line).
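One way to put a number on a predicted-vs-actual comparison like the one in the graph is the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below assumes a simple one-variable linear model of revenue on tweet rate; the five movies and all figures are made up (not the study's data), and `fit_line` and `r_squared` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


# Hypothetical toy data for five movies (NOT the study's data).
tweet_rate = [100, 250, 400, 800, 1200]  # tweets per hour before release
revenue = [5, 11, 18, 33, 52]            # actual opening revenue, $ millions
hsx_forecast = [8, 9, 22, 28, 45]        # market forecast, $ millions

# Fit revenue as a linear function of tweet rate, then predict each movie.
a, b = fit_line(tweet_rate, revenue)
tweet_prediction = [a * x + b for x in tweet_rate]

r2_tweets = r_squared(revenue, tweet_prediction)
r2_hsx = r_squared(revenue, hsx_forecast)
print(f"tweet-rate model R^2 = {r2_tweets:.3f}")  # 0.998
print(f"HSX forecast R^2     = {r2_hsx:.3f}")     # 0.928
```

Here the tweet-rate line is fit and scored on the same five movies, which flatters it; a fair comparison with a market forecast would fit the line on earlier releases and score it on later ones.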

Bernardo Huberman, the chief investigator on this work, says that they predicted the outcomes of new movies released in November and December 2009 and January 2010, including Avatar, Invictus, The Blind Side and Twilight.

Of course, predicting movie revenues is only a tidy test case. If you can use Twitter to predict the future of movie tickets, then why not elections, or sales of other products? As the authors write:

This method can be extended to a large panoply of topics, ranging from the future rating of products to agenda setting and election outcomes. At a deeper level, this work shows how social media expresses a collective wisdom which, when properly tapped, can yield an extremely powerful and accurate indicator of future outcomes.

The PDF version of the paper "Predicting the Future With Social Media" by Sitaram Asur and Bernardo Huberman is short and clear.




Comments
  • Alan

    Three thoughts come to mind:
    1. This effect is probably a lot more pronounced when the target audience is younger and more active online.
    2. No matter the buzz, some movies are so flawed they tank anyway.
    3. By extension, couldn’t the studios or directors float their ideas and gauge the potential before spending a dime? That might save us from Rocky IX.

  • Stephen Downes

    Works fine until people realize it works, then they start gaming it, and it stops working.

  • Colley1962

    @ Stephen Downes
    I completely agree! My sentiments exactly.

  • John

    @Stephen, “gaming” massive amounts of data is hard to do without incentive. How can tens (perhaps hundreds) of thousands of people be compelled to fake tweets about a movie? Tweet bots?

    Question: does this predictive quality of Twitter push an outcome, like a self-fulfilling prophecy, or is it simply a passive indicator? If the latter, I see little incentive for gaming. And, as the summary implies, applications for this metric seem to go far beyond entertainment, especially as higher population percentages align their lives into virtual space.

  • Keith De La Rue

    This makes sense, provided Surowiecki’s rules of a wise crowd apply: Diversity of opinion (yes), Independence (some opinions may be determined by others, but not everyone follows everyone else), Decentralization (yes) and Aggregation (available).

    Will be interesting to see how it goes. I would suggest that it is far less likely to be gamed than good old-fashioned surveys – there are lots of other reasons to tweet!

  • Arthur De Vany

    Bernardo has done it again. This shows that the complex dynamics of word of mouth (Tweets) and a growing positive assessment of the film drive its success, as I show in my book Hollywood Economics.

  • Monique van Dusseldorp

    Imagine how effective Twitter could become as a prediction engine by changing their present question to a new one. Not “What’s happening?”, but
    “What are you up to?” or “what’s next?”
    Not “What are you doing now?” but: “What now?”

  • James Rafferty

    Kevin,

    Fascinating link. As others have noted, there is the concern about gaming the system, but it’s not so easy to do for mass market items. I’d guess there are lots of applications for this kind of analysis, such as in publishing.

    James

  • Bob

    They used 3 million tweets from hundreds of thousands of users. Good luck gaming that system, which has over 100 million users.

  • christian

    The beauty of statistical analysis is that it is resilient to manipulation, as it requires a large corpus of data, which by its nature is difficult to manipulate in a statistically significant way.

  • Michael Smith

    Great read and agree with the notion, as we are starting to implement this methodology in analysis ourselves at Media Logic (www.mlinc.com).

    It goes to show what we can truly gauge from analyzing the conversations around our brand in reference to qualitative data. More importantly, it shows that developing a strong social marketing strategy and placing social at the center of your marketing and/or business (Conversation Centric marketing) can help you positively affect your brand’s perception, whether in advocacy or in product development.

    For a current example, take a look at Ford’s campaign “The Ford Story” (http://ow.ly/1AGfO). For a movie-related example, look back to 1999 (yes, I did say 1999 and social marketing in the same paragraph) at The Blair Witch Project, a story its creator outlined on the web before getting investment and writing the screenplay. $23,000 to produce and $92 million in revenue definitely speaks for itself.

  • Indy

    Call me a nitpicker, but the mechanism here seems to be – collect from Twitter a large number of the opinions of moviegoers on whether they are excited about an upcoming movie. People usually tweet about things that have their attention…

    Or, to put it another way: people like to talk honestly on Twitter about “coming attractions” that have caught their notice. Effectively, this allows you to conduct an opinion poll about the movie with a massive sample size, and that poll turns out to have really good predictive power.

    This doesn’t seem particularly shocking, and it also points to serious limitations in the predictive power: it only works for items that people talk about spontaneously and unguardedly, and whose consumers are well represented by the Twitter population.

    It’s great news for someone in the movie business, distributors I guess, because they can choose not to screen films that are not going to be popular. I’ve no idea if the Twitter population reflects the voting population well enough to predict an election…

  • Jonathan Byron

    Movie attendance and election outcomes are socially determined … I can see how Twitter can be good for predicting those (assuming that demographic biases among twitterites do not lead to a highly skewed sample). For complex phenomena, this method might be limited.

  • ptp

    I’m not sure that the system would be as hard to game as some people say; it would take some resources, but you can find people who are capable of creating buzz on Twitter and get them to talk about the movie, which would lead to additional buzz, etc.

    But the question, then, is have you actually gamed the system, or are you doing what you should probably be doing anyway, which is getting thought leaders to talk about your product?

  • bulldogmi

    If this became an accepted predictive method, couldn’t it easily be manipulated, thereby undermining its predictive power?

  • Fred

    Gaming the system is so easy today with the huge amount of spam and bots flooding twitter and already gaming the twitter trends.

  • Ellen O’Neal

    Interesting post. When I began to comment, I was going to argue that this predictive analysis is most accurate when the movie’s demographic matches Twitter’s demographic. However, I think Twitter’s demographic is moviegoers. I was trying to think of a movie that caters to older generations: the ones less likely to tweet, the ones most likely retired and out of the corporate world, and therefore not on their computers 8+ hours a day! Although I couldn’t think of an example movie, I think the same reasoning applies: fewer people will tweet about it, and less revenue will be brought in. Even though Twitter’s demographics skew toward the younger generation, I think they align with moviegoer demographics anyway, so no matter what the movie is about, the predictive analysis will still be accurate.

    Coming from the technical side, my company would be most interested in the predictive modeling and algorithms the analysts used. That is our business, so we’re excited to see others are catching on!

    Thanks for the post!
    Ellen O’Neal
    http://www.livelogic.net

  • Marc

    These lines in the graph seem to depict very accurate predictions.
    This reminded me of Eric Schmidt’s comments on being able to predict someone’s next move – given enough data…
    It would be interesting to find out how small the sample size could become without losing significance.

  • Ben Atkin

    Anyone got a link to the data? I want to make a better graph – a square graph with the same units on both axes.

    Please email me at ben@benatkin.com in case I forget to check back.

  • Sebastian Franck

    Initially I thought this was evidence against the superiority of prediction markets touted so prominently by Robin Hanson of George Mason University (http://www.overcomingbias.com). But of course Hanson has an answer. He also mentions the gaming issue, but his reply is basically that HSX would become an even better predictor the second the market had access to this Twitter information: http://www.overcomingbias.com/2010/03/masking-movie-manipulation.html

  • hmmmmm, yeah but

    Interesting yes, but not shocking…

    Marketeers worth their pay know it is their job to create a positive vibe around the product they are selling. Why? Because that is how you sell a product, by making people want to buy it.

    To create that positive vibe, you use any channel available (billboards, commercials, twitter, websites, grapevine, …) and seed it with positive messages. That’s how it is, and that’s how it has always been. That’s a marketing campaign.

    In the olden days (a long, long, long time ago) the only evidence of the success of your campaign were the actual sales figures…

    These days marketeers can gauge the effect of their campaign (does it create the positive vibe we want) faster. This enables them to react and adapt the campaign they are running when they see it is not delivering what it was supposed to deliver (the positive vibe), and yes, in some cases it might even be better to simply not launch the product. That’s it, nothing more to it.

    I wouldn’t worry too much about ‘gaming the system’. Those that know their job also know they need to factor out their own seeding messages. What they want to measure is how the rest of the world reacts to their messages.
    And those that do not know their job, ah well… they do not last long…

  • hmmmm, yeah but

    One more thought…

    As far as I can see, the only thing that is discovered here is that there is a positive correlation between positive buzz and actual sales figures.

    As I stated in my previous post, competent marketeers already know that.

    They probably also know that correlation does not imply causality… two indicators can move in the same direction, not because one pushes the other, but because they are both pushed by the same third factor…

    So given this, I would not say “Twitter predicts the future”, but “Twitter enables you to react before disaster strikes”.

    Granted though, the “predict future” has a nicer ring to it ;^)

  • Aaron Davies

    Have the researchers put their money where their mouths are and used this to place Intrade bets?

  • http://twitter.com/DeclanStanley DeclanStanley

    Is this what Isaac Asimov “predicted” in his Foundation series with Hari Seldon’s invention of psychohistory, a mathematical way to predict the future once you have a large amount of data from a large number of people? The larger the number, the more predictable the future.