Archive for the 'Post-ITP' Category


Some thoughts on where computing is headed

This is largely a response to this video, informed by this other video:

I think that despite all his calls for “out of the box” thinking, Scott Jenson’s thinking is as bounded as the thinking he decries. I agree, apps suck, and yes, I love the idea of browser as operating system, but I also think the idea of phones themselves as interfaces sucks. They are the apps of the physical world. We won’t need a Google for ranking the sensor-enabled objects around us because they exist in three-dimensional space just as we do. The whole point of physical computing is to eliminate screens as go-betweens.

To use his (kind of lame) example, if I want to interact with my stereo, I shouldn’t have to go to my phone. He just got done telling me how much it sucks that there needs to be an app for that, and then he tells me I can tap through a list of objects around me on my phone to interact with them. How about I look at my stereo? Or I talk to it? Or I point at it? Or I think about it?

What we’re seeing is the dying of a computing metaphor. We have always had to go to computers and speak to them in their language. At first, hundreds of us flocked to massive computers and spoke to them in punchcard, an entirely human-unintelligible language. Then a revolution: one man, one computer. The graphical user interface, handmaiden to this revolution, allowed us to speak to the computer in a way we could comprehend, though it still required us to learn how to manipulate its appendages to accomplish the tasks we wanted performed. Now we’re in a world where each person has multiple, increasingly tactile computers. And as processor speeds grow and prices drop, it seems likely that the computer-to-person ratio will continue to increase.

The desktop metaphor, with its graphically nested menus and multiple windows, won’t survive. It didn’t translate well onto the pocket-sized screens of smartphones, and Siri is the first peal of its death knell. Siri eliminates the physical analog of a desktop/button pad altogether and replaces it with a schema-less model where I can use a computer without learning anything about how it works.

Couple that with the increasing physical awareness and falling cost of networked devices equipped with cameras and sensors, and what you end up with is not a small computer we can carry with us to interact with the world around us but a giant computer which we inhabit, and which treats us and what we do as input.

What’s tricky about this is imagining the output. With each jump in computing, the new modes did not replace the old modes. They overlapped a bit, but mostly they expanded the possibilities of computing and the number of computable operations. No one programming on the command line imagined that a computer would one day be great for editing films. The command line is still very much in use today as it is still the best method of doing many things, but the GUI has greatly expanded the computable universe. Likewise, while it’s relatively easy to imagine the region where a physical user interface (PUI?) intersects the GUI (advancing slides in a keynote presentation without a remote, for instance), it’s much harder to imagine those tasks we’ve never even thought of as within the reach of computability.

Computing Paradigms Bubble Chart

And that’s what I’m really interested in, the film editing scenarios. Context and object awareness won’t require phones to rank nearby objects as we’ll be able to interact with them with minimal or no perceptible interfaces. We’ve slowly watched consumerization turn sophisticated operating systems into shiny idiot-proof button pads. There’s no reason to believe the trend won’t continue spreading into the backend, turning programming itself into a consumer behavior. At Google we’re obsessed with machine learning, but it seems to me the future may be its converse—human teaching. If people can tell their computers exactly what they want without having to learn C or Java, then they can start to ignore their computers entirely.

That’s the ultimate goal: invisible computing. After all, how often do you think about how your light switch works when you go turn on the lights in a dark room?

ADVANCE! — a first-person recruiter

Before moving out to the West Coast, I worked for several months with the irrepressibly delightful Jessica Hammer on refining the mechanics and creating the visual style for ADVANCE!, a Diner Dash-ish game at the center of her doctoral dissertation. Jessica has an extremely sophisticated understanding of games and game dynamics, so naturally the game she envisioned was much more than a resource distribution clickfest.

Advance! screenshot

ADVANCE! is a sneaky little game. It’s both a craftily created study of systemic biases in corporate settings masquerading as a game and a kind of time-release pill that confronts players with evidence of their own biases, which they might otherwise plausibly deny. If someone in a study setting asks you to sort through a group of resumes based on the candidates’ appropriateness for a particular job, you will probably make every effort to appear equanimous, even if in a real-life situation you would rarely choose a female candidate over a male candidate. But people love games, they love figuring out the rules and winning, so if you create a game which rewards behaviors that in a non-game setting might be considered uncouth, you can in theory short-circuit political correctness and self-censorship—provided you make the gameplay compelling enough.

And that’s where I came in. I created the visual feel for the game—an info-graphicky, isometric layout with a consistent information panel to the right modeled on a simplified CRM platform—and helped to distill the actual gameplay so that each element and each interaction reinforced the theme, while also adrenalizing the game’s fun-ness and making it simple to make complicated judgements quickly from multiple data points. The final result is a game that looks great and boasts some really nifty mechanics.

Players run a job agency responsible for staffing a faceless but multi-ethnic corporation in a boring high-rise office building. As applicants enter the job queue, players can look for appropriate job openings within the company. Each job requires certain minimum qualifications. If the player doesn’t fill a job quickly, the company will hire an NPC internally.

Advance! screenshot

Each job also comes with its own politics, symbolized on the game board by hearts and skulls beneath the colleagues that come with a particular job. The higher the ratio of hearts to skulls, the more likely a candidate will thrive in a position and become eligible for promotion.

Advance! screenshot

Promotions (and demotions) happen automatically when a job opens on a floor above a certain character’s current floor and his/her qualifications have increased sufficiently. Alternatively, the player may choose to train characters to manually enhance their skills, though this can prove very expensive.

Advance! screenshot

The abstract notion of a score is replaced with a running tally of the job agency’s bank account balance. Running the business costs money, so this balance creeps downward throughout the game. Successfully placing job candidates results in cash bonuses, and the player receives a percentage of each placed candidate’s salary for as long as that candidate remains employed. Training candidates improves their job prospects as well as the bounty a player receives for placing them, but as a character becomes more experienced, the cost of training grows exponentially.
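Just to make the economics concrete, here is the kind of curve “exponentially” implies. The base cost and growth rate below are made-up numbers for illustration, not values from the actual game:

```javascript
// Hypothetical training-cost curve. The game's real parameters aren't
// published; baseCost and growthRate are invented for illustration only.
const trainingCost = (timesTrained, baseCost = 100, growthRate = 2) =>
  baseCost * Math.pow(growthRate, timesTrained);

trainingCost(0); // 100
trainingCost(3); // 800: each session doubles the price
```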

As the game proceeds, the costs of running the business escalate, so it becomes increasingly important to place candidates quickly into jobs in which they’re happy. As characters are promoted, more floors are added to the building, so finding job openings requires moving floor to floor, which uses valuable time. The ultimate goal is that players become so focused on staying afloat that they don’t notice the subtle biases that are randomly attributed to the client company at the beginning of each game. In one game, for instance, the client company may promote men more than women and show a distinct preference for Asian candidates. In one version of the game, identifying the bias correctly during gameplay results in a giant cash settlement; in the other version, there is no mechanism for addressing the bias.

Much of my work developing the interface was iterative simplification—removing unnecessary or irrelevant complications while maintaining the game’s overall information-dense statistical feel, and ensuring players could easily and intuitively make multi-dimensional decisions, such as comparing a candidate’s qualifications with job requirements while previewing that candidate’s relationship with his/her potential colleagues. I played the skinless prototype—a grid of geometric shapes—and found it totally compelling; I can’t wait to play the finished product!

Make it Ridiculous (Till it’s Awesome)

I’m finally getting around to writing up my presentation at the “Telephony is Sexy” edition of the SF Telephony Meetup on December 16th. I wish I’d spoken to Clay Shirky before I went out to California, as I have him to thank for the apt title of this post, a phrase he overheard used recently to describe the modus operandi of ITP, which over the last two years I’ve largely co-opted as my own.

My fellow presenters spoke about hacking up a windshield display using a pico projector and a mobile phone, replacing the expensive and proprietary handheld devices used by large enterprises to track inventory and maintenance requests with already ubiquitous mobile handsets—basically, “why isn’t there an app for that?”—and using web sockets with telephony platforms like Tropo to create persistent connections (for games and the like). There was a lot of code and engineering speak, peppered with good-natured technical objections from the audience.

I spoke about the importance of playing, of doing things that seem totally useless but fun in the interest of stumbling upon new ideas that might not be so useless. I showed my perennial favorite, Generative Social Networking, the ever-popular Botanicalls, the ill-fated Popularity Dialer, and the soon-to-be-huge Megaphone alongside my Eliza project and my more recent forays into accented speech synthesis:

Call Iñigo and his ESL friends by dialing: +1 617-466-6212


If I’d had a bit more time, I would also have shown Sebastian Buys’s amazingly Rube Goldberg-y World of Warcraft phone-in project, which is tragically not documented anywhere. Because Blizzard doesn’t include any hooks for third-party developers in its code, Sebastian captured screenshots of the game, used OCR to turn the image of the chat box at the bottom of the screen into machine-readable text, and then fed it into an Asterisk script so that a remote user can call in and have his guild’s chatter read to him by a robot over the phone even when he’s not close to his computer. Amazing.
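For the curious, the OCR leg of a pipeline like that might look something like the following sketch. It is emphatically not Sebastian’s code (which used Asterisk and isn’t documented anywhere); the tesseract.js library and the “chatbox.png” path are stand-ins of my own:

```javascript
// Illustrative sketch of one leg of a screenshot-to-phone pipeline.
// Assumes Node with tesseract.js installed; "chatbox.png" is hypothetical.
const Tesseract = require("tesseract.js");

async function readChatter(screenshotPath) {
  const { data } = await Tesseract.recognize(screenshotPath, "eng");
  // Each non-empty line of recognized chat could then be queued for
  // text-to-speech playback over the phone (the Asterisk leg).
  return data.text.split("\n").filter((line) => line.trim().length > 0);
}

readChatter("chatbox.png").then((lines) => lines.forEach((l) => console.log(l)));
```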

I might also have shown my knock-knock jokes:

Knock knock answering service: +1 617-682-9322


I was a little disappointed I didn’t get any objections from the audience, but I did get a couple of very nice emails!

Text-to-English-as-a-Second-Language

I don't speak English - But I promise not to laugh at your Spanish.

I’ve been experimenting recently with the hosted Asterisk at Tropo.com, and I have to say, it’s the best API I’ve ever played with, especially after spending months wrangling an Asterisk server. They’ve abstracted away all the eccentricities of Asterisk and created wrappers for Ruby, JavaScript, PHP, and a couple of other languages.
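To give a sense of how much they hide, a complete Tropo script in its JavaScript flavor can be as short as this sketch. It runs on their servers, so there are no imports; the telephony functions are supplied by the platform, and the greeting text is mine:

```javascript
// A minimal sketch of a hosted Tropo script (JavaScript flavor).
// answer(), say(), and hangup() are provided by the Tropo platform.
answer();
say("Hello! You have reached a very polite robot. Goodbye.");
hangup();
```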

And speaking of other languages, they’ve also included easy wrappers for a bunch of cool text-to-speech and voice recognition modules in a number of languages. When I saw “Jorge” the Castilian, I had an idea: can a computer voice have an accent? I recently read a piece, in the Times or on some feed I can’t track down, arguing that English-language learners have an easier time learning from teachers who share their accent. It makes sense.

I remember the mother of an American friend of mine in Madrid who could not understand why Spaniards kept thinking she was saying seis (six) when she was saying tres (three). The reason, I explained, was that she was pronouncing tres (which is pronounced like “press” in English) as “trays,” which is exactly how seis sounds.

I tell this story as a way of explaining how I arrived at my ESL answering machine. You can interact with it by calling:

+1 617-466-6212


Getting this to work required some reverse phonetic hacking. Here are a couple of examples; see if you can guess the language:

“Jelo. Mai nem is Inigo Montoya. Llu kild mai fáder, pripeer tu dai.”

“Chateau Haut-Brion 1959, magnifisainte waillene, Aille love Frinch waillene, layke aille love ze Frinch leinguaje. aille ave simpelte everi leinguaje. Frinch ise maille favorite. Fantastic leinguaje, especially tu coeurse wits. Nom de Dieu de putain bordel de merde de saloperies de connards d’enculis de ta meire. Yu si? itte ise layke waille pine yoeur asse wits silk. Aille love itte!”
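The trick, then, is simply handing respelled lines like these to a Spanish voice. A minimal sketch, assuming Tropo’s say() takes a voice option and that “Jorge” is the right voice name (I’m going from memory on both):

```javascript
// Sketch of the accent hack: English respelled phonetically, read aloud
// by a Spanish text-to-speech voice. The voice name is an assumption.
answer();
say("Jelo. Mai nem is Inigo Montoya.", { voice: "Jorge" });
hangup();
```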

I’ll be posting a bunch more little phone experiments soon, so check back, you hear!

Eliza’s Astriconversations

Astricon photo

Astricon in DC a couple of weeks ago was my first trade show as an exhibitor, and I had a fabulous time. John Todd, Digium’s Asterisk Open Source Community Director, invited me to attend and show off Eliza, my video chatterbot. The conference took place at the gargantuan Gaylord National Resort and Convention Center in the altogether bizarre and otherworldly National Harbor development on the banks of the Potomac.

My table was in the little open-source corner of the hall, tucked between some very fancy commercial exhibitors and the constantly rotating cornucopia of caffeinated beverages and high-calorie snacks. Eliza was set up between Astlinux, a custom Linux distribution centered around Asterisk, and the rowdy Atlanta Asterisk Users Group. I was also within spitting distance of the OpenBTS project (roll your own GSM cell tower), of which I’m a big fan, and Areski Belaid, a developer with a finger in numerous telephony pies, including Star2Billing, which essentially allows anyone to become a long-distance phone company. Really interesting stuff.

Astricon photo

The most surprising thing about the whole experience, other than the incredible amounts of cookies and sweets, was the communityness of the Asterisk community. Everyone seemed to know everyone, most people over a certain age were way into ham radio, there was nary a GUI in sight, and everyone seemed genuinely interested in everyone else’s projects, including mine.

I spoke for nearly an hour to Tim Panton from PhoneFromHere, a company that integrates voice and chat services into existing websites so businesses can interact directly with their customers over the web. He suggested I cut Flash out of Eliza by using HTTP Live Streaming, which also made me realize that I might also be able to ditch the socket server and use HTML5 web sockets!
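On the browser side, that swap could be as simple as this sketch; the endpoint URL and message format are hypothetical:

```javascript
// Minimal browser-side WebSocket sketch. The endpoint and message shape
// are invented for illustration.
const socket = new WebSocket("wss://example.com/eliza");

socket.onopen = () => socket.send(JSON.stringify({ type: "hello" }));

socket.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  // e.g. cue up the next video clip Eliza should play
  console.log("Eliza says:", msg.text);
};
```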

Mark Spencer, the boffin responsible for Asterisk, stopped by and seemed genuinely pleased to see that a couple of years on, ITPers are still playing with his baby, making it contort in unexpected ways.

The folks at LumenVox (speech recognition) and GM Voices (speech synthesis and lightning-turnaround voice recording) generously offered to help robustify Eliza for her next iteration.

Also enthusiastic were Jason Goecke and Ben Klang, the principal movers behind the Ruby Adhearsion framework, which reskins Asterisk in a slick, modern web way. They’re also involved with Tropo, by far the best cloud-hosted Asterisk service I’ve seen: write scripts in a variety of languages, host them yourself or on Tropo’s servers, debug them through a web interface, take advantage of the built-in speech recognition system, and seamlessly integrate with AGI. Best of all, it’s free for development; you pay only when you’re looking to cash in! They turned me onto this interactive phone/video piece, which got me thinking.

ELIZA 2.0

For her next iteration, Eliza’s going to be on the web, hopefully in gloriously standards-compliant HTML5. Instead of canned conversations, she’ll rely on silence detection and Markov chains to generate much more dynamic conversations. The GM Voices people told me that they often record vocabularies—phrases in a variety of intonations so that you can do text-to-speech with real voices rather than those slightly Scandinavian-sounding canned computer voices. I’ll be posting my progress soon.
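In the meantime, the Markov half is simple enough to sketch in a few lines of JavaScript. This toy word-level chain is mine, a sketch of the idea rather than Eliza’s eventual code:

```javascript
// Toy word-level Markov chain (order 1): map each word in a corpus to the
// words that follow it, then walk the map at random to generate text.
function buildChain(corpus) {
  const chain = {};
  const words = corpus.split(/\s+/);
  for (let i = 0; i < words.length - 1; i++) {
    (chain[words[i]] = chain[words[i]] || []).push(words[i + 1]);
  }
  return chain;
}

function generate(chain, start, length = 20) {
  let word = start;
  const out = [word];
  for (let i = 0; i < length; i++) {
    const next = chain[word];
    if (!next) break; // dead end: no recorded successor
    word = next[Math.floor(Math.random() * next.length)];
    out.push(word);
  }
  return out.join(" ");
}

const chain = buildChain("how do you feel about that how do you do");
console.log(generate(chain, "how"));
```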

Everything I know about interaction design I learned by making a scratch-n-sniff television

My favorite thing about my Scratch-n-Sniff TV is the conversations it spawns. I showed it recently at Maker Faire NY, and as at previous showings at ITP and at Greylock Arts, reactions were divided. About 70% of people were totally incredulous until they tried it, and then were delighted and had to find out how it worked. Of the remaining 30%, half looked at it suspiciously and rebuffed invitations to try it and the other half tried to predict how it worked before using it and then complained that the smells weren’t “accurate.” All of these reactions reveal an underlying attitude towards technology and its possibilities: the first, marvel—the what will they think of next effect; the second, suspicion—this has got to be a trick; the third, which shares elements of the second, a need to establish that we control technology—not the other way around.

heroShot

Smell is subjective, it’s ephemeral, and it’s not binary. What smells like citrus to one person smells like air freshener to another; smells can’t be turned on and off, they waft, so getting people to believe that their actions resulted in equal and opposite smell reactions required some clever sleight of nose. First of all, I gave people clear visual cues. When you scratch a picture of chocolate, you’re much more likely to interpret the resulting smell as chocolate. I also made the screen respond to being scratched by fading, just as scratch-n-sniff stickers do after vigorous scratching. This tie-in to a direct physical analogue was key, as people were much more likely to smell the screen where they’d scratched it and the one-to-one correspondence between action and reaction primed people to smell. A couple of times I ran out of scents, and several people still swore they’d smelled scents that simply weren’t there!

HOW IT WORKS

puff

  1. I found that the transistor-based model of the Glade Flameless Candle automatic air freshener would fire once approximately every two seconds if powered for 500 milliseconds (as opposed to the earlier version that relies on a resonant circuit that requires ten seconds before firing), so I hooked up its battery terminals to an Arduino, and voila! Controllable atomization of non-oil based scents!

arduinoinplace

  2. Trying to create an effective scent disperser from scratch is madness. One of the benefits of piggybacking on all of Glade’s hard work is that it’s easy to fill the provided smell canisters with other scents. I got most of mine from the nice folks at Demeter.

scents

  3. I aligned the scent dispensers under a touchscreen that sends touch coordinates to the Arduino via a Processing sketch. Thanks to the hydrostatic properties of the fine particle mist, when emitted, it flows up the screen and across it, sticking to it until the scent evaporates a few seconds later.

screenanddispensers

This Game Sucks: Learning English with Mosquitos

Don't Bug Me

Click on the image to play in a new window.

Last summer I worked in Tokyo at a division of TBS, where I was asked to develop a prototype for an English listening comprehension game for Japanese kids. I spent a month conceiving the game, laying it out, developing the code, and art directing the incomparable Nina Widartawan-Spenceley, who created the characters and animated the grotesque death sequences.

Once again, Flash Player in Firefox is a bit screwy. Mozilla, what’s going on?

Censor Me, and You Too!

You need Flash Player 10 to run this bad boy because I’m using its sound generating capabilities. Oh, and if you don’t have a webcam and a mic, you’re out of luck, sorry. Also, for some reason this isn’t working with the latest version of Firefox.



I finally ported the Sensorship sketch I wrote in Processing to Flash so that you too can enjoy the marvels of real-time censorship. It’s not quite as slick as its Processing predecessor, but it works on the web, and there’s no arguing with that.

There are a number of ports of OpenCV to ActionScript based on Ohtsuka Masakazu’s original port (“Marilena”) and also a native library called Deface. I ended up using one of the former, Mario Klingemann’s slightly modded Marilena, not because I have a particular preference but because I’m lazy and he very generously included an example that used the webcam.

After making the necessary adjustments to the face-detecting code to draw the black bar and to flip it and the image into mirror reflections (it always weirds me out to wave with my left hand only to see myself waving back with my right), I used the new SampleDataEvent that Adobe has helpfully documented, along with insights gleaned from Jeff Swartz’s helpful tutorial on generating sounds dynamically, to produce what can only be described as a horrifically annoying beep any time the microphone’s input passes a certain threshold.
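For anyone who’d rather not dig through ActionScript, the sample-generating idea translates directly: fill an audio buffer with sine-wave samples whenever the mic level crosses the threshold. Here is a JavaScript sketch of just that math (the Flash version writes the same samples into the SampleDataEvent buffer; the frequency and threshold are my choices):

```javascript
// Sketch of the beep: generate sine-wave samples above a mic threshold.
// 44100 Hz matches Flash's fixed dynamic-audio rate; BEEP_FREQ is mine.
const SAMPLE_RATE = 44100;
const BEEP_FREQ = 1000; // Hz, suitably annoying

function fillBeepBuffer(numSamples, startSample) {
  const buffer = new Float32Array(numSamples);
  for (let i = 0; i < numSamples; i++) {
    buffer[i] =
      Math.sin((2 * Math.PI * BEEP_FREQ * (startSample + i)) / SAMPLE_RATE);
  }
  return buffer;
}

// In the Flash version, a SampleDataEvent handler writes samples like
// these whenever the microphone's activity level passes the threshold.
function onMicActivity(level, threshold = 0.3) {
  return level > threshold ? fillBeepBuffer(2048, 0) : null;
}
```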
