The Technium

Screen Fluency

We were once People of the Book, but now we are becoming People of the Screen. To complete this transformation, we need a set of tools that will allow us to manipulate, create, and process moving images with the same ease we have with words.

That’s the thesis of a 4,000-word piece I wrote for this Sunday’s New York Times Magazine. I called the piece “Screen Fluency”; the Times entitled it “Becoming Screen Literate.” A few excerpts:


The overthrow of the book would have happened long ago but for the great user asymmetry inherent in all media. It is easier to read a book than to write one; easier to listen to a song than to compose one; easier to attend a play than to produce one. But movies in particular suffer from this user asymmetry. The intensely collaborative work needed to coddle chemically treated film and paste together its strips into movies meant that it was vastly easier to watch a movie than to make one. A Hollywood blockbuster can take a million person-hours to produce and only two hours to consume. But now, cheap and universal tools of creation (megapixel phone cameras, Photoshop, iMovie) are quickly reducing the effort needed to create moving images.

In fact, the habits of the mashup are borrowed from textual literacy. You cut and paste words on a page. You quote verbatim from an expert. You paraphrase a lovely expression. You add a layer of detail found elsewhere. You borrow the structure from one work to use as your own. You move frames around as if they were phrases.

If text literacy meant being able to parse and manipulate texts, then the new screen fluency means being able to parse and manipulate moving images with the same ease. But so far, these “reader” tools of visuality have not made their way to the masses. For example, if I wanted to visually compare the recent spate of bank failures with similar events by referring you to the bank run in the classic movie “It’s a Wonderful Life,” there is no easy way to point to that scene with precision. (Which of several sequences did I mean, and which part of them?) I can do what I just did and mention the movie title. But even online I cannot link from this sentence to those “passages” in an online movie. We don’t have the equivalent of a hyperlink for film yet. With true screen fluency, I’d be able to cite specific frames of a film, or specific items in a frame. Perhaps I am a historian interested in oriental dress, and I want to refer to a fez worn by someone in the movie “Casablanca.” I should be able to refer to the fez itself (and not the head it is on) by linking to its image as it “moves” across many frames, just as I can easily link to a printed reference of the fez in text. Or even better, I’d like to annotate the fez in the film with other film clips of fezzes as references.

With our fingers we will drag objects out of films and cast them in our own movies. A click of our phone camera will capture a landscape, then display its history, which we can use to annotate the image. Text, sound, motion will continue to merge into a single intermedia as they flow through the always-on network. With the assistance of screen fluency tools we might even be able to summon up realistic fantasies spontaneously. Standing before a screen, we could create the visual image of a turquoise rose, glistening with dew, poised in a trim ruby vase, as fast as we could write these words. If we were truly screen literate, maybe even faster. And that is just the opening scene.

  • Danny Bloom

    as you know, i coined the neologism “screening” to stand for reading on screen, to differentiate this from reading on paper surfaces, and you also told me recently that you’d be happy to see screening used as a verb this way. Well, i wrote an op-ed on my ideas about screening and it has been rejected by the Boston Globe, the New York Times and the Wash Post, not to mention Technology Review and TechCrunch. Seems nobody wants to hear about new ideas these days. Maybe if YOU wrote about screening, as a new verb, for what we do online these days, from reading text online to watching videos online to looking at photos online, they might sit up and listen? Vindu Goel at the New York Times tech page told me “we will never write about your screening ideas here, Dan.” Erick Schonfield at TechCrunch told me to get lost. Ashlee Vance at NYTimes showed some initial interest while visiting Taiwan and Japan and then he stopped emailing me. Jason Pontin told me he would never assign any of his writers to write about my ideas of screening. Period. See? YOU were the only one who listened to me, and said GOOD IDEA. You and Alex Beam. See his column of June 19 2009 in Boston Globe titled “I screen you screen we all screen”. You won’t see it anywhere else, because nobody wants to hear it. Only you and Alex so far... sigh.

  • Tom Buckner

    This reminds me of a scene from science fiction; Greg Bear’s awesomely good book “Eon” or perhaps one of its sequels. In it, future humans will often “pict” while talking, projecting an image near their heads which in some way comments on the speaker’s intent. The character in the scene I recall is saying to his colleague “We have a dilemma,” or something close to this; the image he picts is somewhat humorous, showing a Geshel (i.e. an extropian into body modification; I never figured out where Bear got “Geshel” from). The Geshel is trying to choose between two different body designs, as if they were neckties or hats.

    Doing this in real time, as KK and Greg Bear describe, would be like improvising music, wouldn’t it? To be any good at it, you need to know the vocabulary forward and back. You need to know what notes to play and how to play them. More than that, you need to know it so thoroughly that you can do it while you’re making coffee.

    I don’t seriously doubt this is possible, but I’d not be surprised if it came at the cost of some other skill falling into desuetude. In the regular Esquire column “What I’ve Learned,” Kris Kristofferson said that every time you gain something, you pay for it with something of exactly equal value. I thought it a preposterous notion, but increasingly I get a creeping feeling that he was right.

  • Vasu Srinivasan

    Hi Kevin,

    I am a fan of yours. I wrote about the need for a screen language in my blog a year back.

    -Vasu Srinivasan

  • Robbo

    You’re kicking ass with these more recent Technium posts, Mr. Kelly!

    The ability to “quote” video is tied up not just with technical fluency but also with access to the content that informs and feeds this emerging visual language of ours. Copyright, public domain, corporate control of culture, all influence our ability to really soar with these expressions even once we begin to grasp the necessary tools at hand.

    It’s hard not to feel that this is like the dawn of the age of print, when the ability to cast ideas exploded with the birth of the broadsides and chap sheets of yore. I’m also amused at the idea of a language that is “devolving” into a pre-literate form, as when store signs were pictures in order to reach the larger portion of the population that could not yet read text. Like silent films from the start of the past century, the use of visuals can transcend barriers of the spoken and written word – and is a form of communication that is decidedly more emotive and influential. Those who lament the loss of the linguistic arts would do well to know there is great poetry to be found in the moving picture – and that brings me to remembering Vico’s pronouncement of the 3 ages of language:

    1) The Poetic
    2) The Mnemonic
    3) The Vulgar

    – all moving in a constant cycle of rebirth and regeneration. Where are we now in that cycle? What next phase lies just ahead? Will I be writing letters, poems and stories with the visual debris of 20th century movie/TV culture? And – can I do it all on my mobile phone?

    Only slightly apropos of the “easy to grasp” technology that will allow us to readily pluck the images we choose to speak with: if you haven’t already seen a (one of many) new “Minority Report” style of user interface called “g-speak” you might want to check it out either via Gizmodo:

    or my own paltry entry on the same thing:


  • Af2008

    As in the brain, the ways to describe simulations can work like an organism. The medium generally is essential for the simulation, just as a mirror describes the world with light, not with smell.

  • Chris Castiglione

    One of the problems I see with the comparison “the ease of making video now approaches the ease of writing” is the difference between the set of English vocabulary available (somewhat finite) and all the available video/images (which are constantly growing). We are able to write quickly because we only have to rely on a finite set of words stored in our heads. Even if a semantic web were perfected, there would still be great ambiguity about which images represented the idea you wanted to express. This is often the problem with folksonomy and tagging (is a picture labeled “snowy” a picture of snow, or the name of someone’s dog?). (Great article BTW!)

    • @ Chris. Yes, I think you nail the challenge very well. The ambiguity of images is also their poetry, but it makes them less precise. I imagine some marriage of words + image.

  • Olof Dahlberg

    Very good piece! This technology would have to be made a lot easier to use, though. Gloves and a million dollar computer shouldn’t be required!
    I found myself immediately imagining a really useful multimedia equivalent of the T9 word list, severely evolved, for this kind of “messaging.” It would be awesome; delivering, for any concept I start to define or describe, vocally or in text, a choice of examples and reference items from text, audio, video and image libraries globally and locally available, to allow me to illustrate and detail what _I_ am trying to “say.” With this tool a truly multimedial document could be produced as quickly as texting.

  • Stephanie Gerson

    much fun to read! thank you for not posing this as a migration from the book to the screen, and for considering how image and word will interact. the “marriage of words + images” you mention in your response to Chris interests me. clearly this is related to your thinking that the next generation of the web will enable linking to data, e.g. not to a film but to a scene in it.

    how much smaller can the legos get? will bits quite literally point to atoms, or something smaller?

  • Arthur Smith

    There is a system that seems designed for the sort of hyperlinking of “screen” objects you are talking about: “Digital Object Identifiers.” So far, though, I believe it has been used primarily to provide permanent identifiers for items published in scientific journals, a quite different context. But then, the web started in a science lab (CERN) …

  • Tristan Harris

    Hi Kevin,

    Great post. I’ve been dreaming about this for the longest time too. I think we’re still a ways away from being able to do this automatically, though there are some good first starts. I just came back from Le Web Paris and has some amazing automatic face-recognition technology that can work in realtime with video. This is a first step.

    To get to the level of precision you mentioned though, you need a new kind of tool. You need to be able to link to objects. A specific page. A specific graph on a specific page. A specific moment in a Podcast. A specific section of a Wikipedia article. A specific moment in a video. You’ve talked a lot about this in your TED talks, but (and some shameless promotion here) Apture is the only great, free tool I know that lets you do all of the above... and present multiple exact objects at the same time:

    Check out a video demo of the video linking feature: