Text-to-English-as-a-Second-Language

I don't speak English - But I promise not to laugh at your Spanish.

I’ve been experimenting recently with the hosted Asterisk at Tropo.com, and I have to say, it’s the best API I’ve ever played with, especially after spending months wrangling an Asterisk server. They’ve abstracted away all the eccentricities of Asterisk and created wrappers for Ruby, JavaScript, PHP, and a couple of other languages.

And speaking of other languages, they’ve also wrapped up a bunch of cool text-to-speech and voice-recognition modules for a number of languages. When I saw “Jorge” the Castilian, I had an idea: can a computer voice have an accent? I recently read a piece, in the Times or on some feed I can’t track down, arguing that English-language learners have an easier time learning from teachers who share their accent. It makes sense.

I remember an American friend of mine’s mother in Madrid who could not understand why Spaniards kept on thinking she was saying seis (six) when she was saying tres (three). The reason, I explained, was that she was pronouncing tres (which is pronounced like “press” in English) as “trays,” which is exactly how seis sounds.

I tell this story as a way of explaining how I arrived at my ESL answering machine. You can interact with it by calling:

+1 617-466-6212


Getting this to work required some reverse phonetic hacking. Here are a couple of examples; see if you can guess the language:

“Jelo. Mai nem is Inigo Montoya. Llu kild mai fáder, pripeer tu dai.”

“Chateau Haut-Brion 1959, magnifisainte waillene, Aille love Frinch waillene, layke aille love ze Frinch leinguaje. aille ave simpelte everi leinguaje. Frinch ise maille favorite. Fantastic leinguaje, especially tu coeurse wits. Nom de Dieu de putain bordel de merde de saloperies de connards d’enculis de ta meire. Yu si? itte ise layke waille pine yoeur asse wits silk. Aille love itte!”
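The trick above can be sketched as a simple substitution pass: respell English words in Spanish orthography, then hand the result to a Spanish TTS voice, which reads them with a Castilian accent. The rules below are a few illustrative guesses reverse-engineered from the first example; they are not the actual table behind the answering machine:

```javascript
// Toy "reverse phonetic hacking": respell English in Spanish orthography
// so a Spanish TTS voice produces accented English. Word boundaries (\b)
// keep proper nouns like "Montoya" from being mangled.
const rules = [
  [/\bhello\b/gi, "jelo"],     // Spanish j ≈ English h
  [/\bmy\b/gi, "mai"],
  [/\bname\b/gi, "nem"],
  [/\byou\b/gi, "llu"],        // Spanish ll ≈ English y
  [/\bkilled\b/gi, "kild"],
  [/\bfather\b/gi, "fáder"],
  [/\bprepare\b/gi, "pripeer"],
  [/\bto\b/gi, "tu"],
  [/\bdie\b/gi, "dai"],
];

function spanishize(text) {
  return rules.reduce((t, [pattern, repl]) => t.replace(pattern, repl), text);
}

console.log(spanishize("Hello. My name is Inigo Montoya. You killed my father. Prepare to die."));
// → "jelo. mai nem is Inigo Montoya. llu kild mai fáder. pripeer tu dai."
```

A real version would need a much bigger table (or a phoneme-level mapping), but the principle is the same.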

I’ll be posting a bunch more little phone experiments soon, so check back, you hear!


Eliza’s Astriconversations


AstriCon in DC a couple of weeks ago was my first trade show as an exhibitor, and I had a fabulous time. John Todd, Digium’s Asterisk Open Source Community Director, invited me to attend and show off Eliza, my video chatterbot. The conference took place at the gargantuan Gaylord National Resort and Convention Center in the altogether bizarre and otherworldly National Harbor development on the banks of the Potomac.

My table was in the little open-source corner of the hall, tucked between some very fancy commercial exhibitors and the constantly rotating cornucopia of caffeinated beverages and high-calorie snacks. Eliza was set up between Astlinux, a custom Linux distribution centered around Asterisk, and the rowdy Atlanta Asterisk Users Group. I was also within spitting distance of the OpenBTS project (roll your own GSM cell tower), of which I’m a big fan, and Areski Belaid, a developer with a finger in numerous telephony pies, including Star2Billing, which essentially allows anyone to become a long-distance phone company. Really interesting stuff.


The most surprising thing about the whole experience, other than the incredible amounts of cookies and sweets, was the communityness of the Asterisk community. Everyone seemed to know everyone, most people over a certain age were way into ham radio, there was nary a GUI in sight, and everyone seemed genuinely interested in everyone else’s projects, including mine.

I spoke for nearly an hour to Tim Panton from PhoneFromHere, a company that integrates voice and chat services into existing websites so businesses can interact directly with their customers over the web. He suggested I cut Flash out of Eliza by using HTTP Live Streaming, which made me realize that I might also be able to ditch the socket server and use HTML5 WebSockets!

Mark Spencer, the boffin responsible for Asterisk, stopped by and seemed genuinely pleased to see that a couple of years on, ITPers are still playing with his baby, making it contort in unexpected ways.

The folks at LumenVox (speech recognition) and GM Voices (speech synthesis and lightning-turnaround voice recording) generously offered to help robustify Eliza for her next iteration.

Also enthusiastic were Jason Goecke and Ben Klang, the principal movers behind Adhearsion, a Ruby framework that reskins Asterisk in a slick, modern web way. Both are also involved with Tropo, by far the best cloud-hosted Asterisk service I’ve seen: write scripts in a variety of languages, host them yourself or on Tropo’s servers, debug them through a web interface, take advantage of the built-in speech recognition system, and integrate seamlessly with AGI. Best of all, it’s free for development; you pay only when you’re looking to cash in! They turned me on to this interactive phone/video piece, which got me thinking.

ELIZA 2.0

For her next iteration, Eliza’s going to be on the web, hopefully in gloriously standards-compliant HTML5. Instead of canned conversations, she’ll rely on silence detection and Markov chains to generate much more dynamic conversations. The GM Voices people told me that they often record vocabularies—phrases in a variety of intonations—so that you can do text-to-speech with real voices rather than those slightly Scandinavian-sounding canned computer voices. I’ll be posting my progress soon.
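The Markov-chain part is simple enough to sketch: build a table of which word follows which in a training text, then walk the table to babble new utterances. This is my own minimal illustration, not Eliza’s actual code, and the sample corpus is invented:

```javascript
// Build a word-bigram chain: for each word, record every word that
// follows it somewhere in the training text.
function buildChain(text) {
  const words = text.split(/\s+/);
  const chain = {};
  for (let i = 0; i < words.length - 1; i++) {
    (chain[words[i]] = chain[words[i]] || []).push(words[i + 1]);
  }
  return chain;
}

// Walk the chain from a starting word, picking a random successor at
// each step, until we hit a dead end or the length cap.
function generate(chain, start, maxWords) {
  const out = [start];
  let word = start;
  while (out.length < maxWords && chain[word]) {
    const successors = chain[word];
    word = successors[Math.floor(Math.random() * successors.length)];
    out.push(word);
  }
  return out.join(" ");
}

const chain = buildChain("how do you feel about your mother tell me more about your mother");
console.log(generate(chain, "tell", 8));
```

Every bigram in the output exists somewhere in the training text, so the babble stays locally plausible; longer context windows (trigrams and up) make it more coherent at the cost of more training data.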

Jumblr: Linkawhat?

Connections are most interesting when they are broken, when they’re forbidden, when they are unintended. Secret liaisons make good stories, short-circuits end in fires, ruptured pipelines induce panic. This is the great appeal of the mashup—possibly even the power of cinema (wasn’t it Eisenstein who wrote about cutting and the mental jumps the mind makes?)—the juxtaposition of disparate elements that our minds nonetheless connect. That is pretty much how my favorite parts of my brain work, making strange and unexpected connections with unpredictable outcomes. It’s also the way humor works, uncovering the unexpected connection to spark a laugh.

Part PHP and part spit, Jumblr is a website that makes unexpected connections by jumbling links, either within a site or between two sites. It works by scraping a site, parsing its links with XPath and storing them in an array, shuffling the array, and then swapping the links back in via preg_replace(). Right now it breaks when sites use relative links, because they end up pointing to my domain. I’ve been too lazy to revisit constructing conditional regular expressions, but once I do, I’ll be able to fix that problem. I’m also going to develop a JavaScript link interceptor so that the randomization persists with each link click.
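The real Jumblr is PHP with XPath and preg_replace(), but the core move is language-agnostic: pull every href out of a page, shuffle them, and write them back in the new order. Here is the same idea sketched in JavaScript with a crude regex in place of a proper DOM parse:

```javascript
// Jumble the links in a chunk of HTML: extract all hrefs, Fisher-Yates
// shuffle them, then substitute them back in shuffled order. Link text
// stays put, so every anchor now points somewhere unintended.
function jumbleLinks(html) {
  const hrefs = [...html.matchAll(/href="([^"]*)"/g)].map(m => m[1]);
  for (let i = hrefs.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [hrefs[i], hrefs[j]] = [hrefs[j], hrefs[i]];
  }
  let k = 0;
  return html.replace(/href="[^"]*"/g, () => `href="${hrefs[k++]}"`);
}

const page = '<a href="/a">A</a> <a href="/b">B</a> <a href="/c">C</a>';
console.log(jumbleLinks(page));
```

Note that this sketch shares the relative-link problem described above: `/a` still resolves against whatever domain serves the jumbled page.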

Everything I know about interaction design I learned by making a scratch-n-sniff television

My favorite thing about my Scratch-n-Sniff TV is the conversations it spawns. I showed it recently at Maker Faire NY, and as at previous showings at ITP and at Greylock Arts, reactions were divided. About 70% of people were totally incredulous until they tried it, and then were delighted and had to find out how it worked. Of the remaining 30%, half looked at it suspiciously and rebuffed invitations to try it and the other half tried to predict how it worked before using it and then complained that the smells weren’t “accurate.” All of these reactions reveal an underlying attitude towards technology and its possibilities: the first, marvel—the what will they think of next effect; the second, suspicion—this has got to be a trick; the third, which shares elements of the second, a need to establish that we control technology—not the other way around.


Smell is subjective, it’s ephemeral, and it’s not binary. What smells like citrus to one person smells like air freshener to another; smells can’t be turned on and off, they waft, so getting people to believe that their actions resulted in equal and opposite smell reactions required some clever sleight of nose. First of all, I gave people clear visual cues. When you scratch a picture of chocolate, you’re much more likely to interpret the resulting smell as chocolate. I also made the screen respond to being scratched by fading, just as scratch-n-sniff stickers do after vigorous scratching. This tie-in to a direct physical analogue was key, as people were much more likely to smell the screen where they’d scratched it and the one-to-one correspondence between action and reaction primed people to smell. A couple of times I ran out of scents, and several people still swore they’d smelled scents that simply weren’t there!

HOW IT WORKS


  1. I found that the transistor-based model of the Glade Flameless Candle automatic air freshener would fire once approximately every two seconds if powered for 500 milliseconds (as opposed to the earlier version that relies on a resonant circuit that requires ten seconds before firing), so I hooked up its battery terminals to an Arduino, and voila! Controllable atomization of non-oil based scents!


  2. Trying to create an effective scent disperser from scratch is madness. One of the benefits of piggybacking on all of Glade’s hard work is that it’s easy to fill the provided smell canisters with other scents. I got most of mine from the nice folks at Demeter.


  3. I aligned the scent dispensers under a touchscreen that sends touch coordinates to the Arduino via a Processing sketch. Thanks to the hydrostatic properties of the fine particle mist, when emitted, it flows up the screen and across it, sticking to it until the scent evaporates a few seconds later.
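The glue logic between the touchscreen and the dispensers reduces to a coordinate mapping: divide the screen into one column per canister and fire the canister under the touch for the 500-millisecond pulse described above. The names, constants, and serial command format below are my invention for illustration, not the actual Processing sketch:

```javascript
// Map a touch's x coordinate to a scent-canister index, then build the
// (hypothetical) serial command that pulses that canister's dispenser.
const SCREEN_WIDTH = 1024;  // touchscreen width in pixels (assumed)
const NUM_CANISTERS = 4;    // dispensers lined up under the screen
const PULSE_MS = 500;       // power-on time that triggers one atomizer puff

function canisterForTouch(x) {
  const i = Math.floor((x / SCREEN_WIDTH) * NUM_CANISTERS);
  return Math.min(Math.max(i, 0), NUM_CANISTERS - 1); // clamp edge touches
}

function commandForTouch(x) {
  // e.g. "FIRE 2 500" → pulse canister 2 for 500 ms
  return `FIRE ${canisterForTouch(x)} ${PULSE_MS}`;
}

console.log(commandForTouch(600)); // "FIRE 2 500"
```

Because the mist drifts up and across the screen, a coarse four-column mapping is plenty; the scratched image, not the nozzle position, does most of the work of localizing the smell.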


If nothing is new under the sun, then why bother with today’s news?


Some thoughts on the future of journalistic “content”—for a related project, check out my paywalls.

What is the future (if any) of professional, unfree, editor-refereed journalism? Setting aside for a second the usual economic arguments (why pay for something you can get for free, why wait for something you can get immediately, why all the news that’s fit to print when all the news fits and print is an afterthought), how might journalism turn all the granular data and digital resources now at its disposal into something worth paying for?

What brought all this about was a recent trip to Asia where I had at the disposal of my itchy remote finger 24-hour news from all over the world. I know it’s hardly news, but seeing sensationalist CNN next to the more staid but still alarmingly populist BBC World next to the multiple bland iterations of China’s CCTV really brought home the extent to which “news” stories are a manufactured product that responds in near real time to audience demand (in the West, for entertainment; in the East, for “reassurance”). The result in the West is hours of programming devoted to the information equivalent of dandruff: book burners and rednecks and science deniers and slutty heiresses and other subjects you’d think should be relegated to dark corners of the web but instead are broadcast globally and legitimized to an extent that would have appalled even the editors of the National Enquirer just twenty years ago. In addition to being meaningless and legitimizing questionable subjects, China’s crop yield statistics and endless political meetings and amazing traffic accidents are also mind-numbingly dull.

When left to free market forces, the news tends towards tabloidization. And this is not just a television news phenomenon. Note the Huffington Post‘s descent into celebrity gossip and Murdochian headline hyperbole. But here’s the really interesting thing about online media—while TV and print rely on largely fuzzy audience data and asynchronous adjustments and tweaks, online media can in real time and with absolute resolution determine what stories are getting clicked, linked, emailed, and tweeted and by whom, and rearrange themselves accordingly, minute to minute. News becomes a democracy where every mouse gets a vote. It’s hard to imagine that editors aren’t letting these statistics at least in part influence the types of stories they’re running. Even the august editors of the New York Times must be paying attention to the most emailed articles—that’s ad revenue.

That’s a grim prospect. Many serious newspapers agree and have either erected or are planning paywalls. But I’m still not convinced that asking people to pay for online news will end up netting any real gains for them. The subscription money they collect might equal the advertising money they scare away, but becoming unsearchable and unindexable will drive away good writers. The other possibility, the one the New York Times is considering, is allowing paywalled sites to appear in search results. That seems like a recipe for disaster, as it will encourage people to rely on either Google News or some new aggregating web service. So what can the future possibly hold? Maybe that question contains the answer.

TOWARDS A PRIORI JOURNALISM

Recorded Future is a startup that searches for trends and patterns in current events that may be imperceptible to readers but are obvious to computers. They’ve figured out a way to algorithmically parse the nature of news stories and rank their import, essentially developing a language that abstracts specific events into generic types. Their subscribers and investors, who include Google and many government agencies and Wall Street organizations of differing levels of nefariousness, can then look for patterns and trends that may indicate an oncoming event. It’s scenario planning with a statistical backbone. The military has for years been obsessed with the idea of reducing battles to a series of determining factors and then computationally predicting their outcomes.

With computers reaching unprecedented speeds and processor power, it’s now conceivable to model incredibly complex systems. Add to that the ability to teach computers to interpret texts and classify their contents, and you approach a not-too-distant future in which people know the probable news weeks in advance! This isn’t a crazy idea; it’s already happening on a very small scale. Take for instance Muckrack.com, a site that aggregates journalists’ tweets, in a sense getting the news as it is made but before it goes to press. If you want to know what a particular columnist is going to be writing about, pay attention to the sorts of leads he’s soliciting.

Imagine for a second what the world would be like if the paid-for, professional news were not a running account of what had happened, but a forecast of what was probably going to happen.

In such a future, a news organization’s most valuable asset is its archives. Years ago, the New York Times tried and failed to charge for access to its archives, probably because that was too literal an approach, akin to charging for a collection of reporters’ notes rather than a finished newspaper. It’s the interpretation that adds value. You could have a news source whose focus was on historically significant news, determined a priori. Based on comparisons to older news, computer-aided journalists would be able to identify the beginnings of revolutions years before they occurred, distinguish hit movies from duds as soon as they were greenlighted, and weigh in on the importance of leaders on the eve of their election. Today’s stories could be chosen on the basis of their future historical importance given their similarities to past stories. There are already a bunch of services that find connections in news stories. Though they’re still in an incipient phase, combined with a computable semiotics of events, it’s easy to imagine how they might lend themselves to untangling the web of historical cause and effect to put it at the service of the future.

You could similarly imagine a news source that focused on unprecedented news, on stories that fit no known patterns, really pushing their newness. The converse, a source devoted to describing just how old each piece of news really is by digging up an exact analogue from the historical record, might keep cynics reading even after the paper they swore they’d never abandon is replaced by screens or projection or some other digital means of delivery.

In either case, the news again becomes worth money, as the information it provides is “actionable,” and the role of the professional, paid journalist is preserved, though transformed. Pattern matching is the province of computers, but I suspect the human mind will always retain its primacy in the fields of analogy and metaphor. Finding the future in the past is a poetic task and having a class of highly visible, professional introspectors of a poetic bent might not be a bad idea—regardless of the possible future significance of any of the other ideas I’ve expressed.

This Game Sucks: Learning English with Mosquitoes

Don't Bug Me

Click on the image to play in a new window.

Last summer I worked in Tokyo at a division of TBS, where I was asked to develop a prototype for an English listening comprehension game for Japanese kids. I spent a month conceiving the game, laying it out, developing the code, and art directing the incomparable Nina Widartawan-Spenceley, who created the characters and animated the grotesque death sequences.

Once again, Flash Player in Firefox is a bit screwy. Mozilla, what’s going on?

Censor Me, and You Too!

You need Flash Player 10 to run this bad boy because I’m using its sound generating capabilities. Oh, and if you don’t have a webcam and a mic, you’re out of luck, sorry. Also, for some reason this isn’t working with the latest version of Firefox.



I finally ported the Sensorship sketch I wrote in Processing to Flash so that you too can enjoy the marvels of real-time censorship. It’s not quite as slick as its Processing predecessor, but it works on the web, and there’s no arguing with that.

There are a number of ports of OpenCV to ActionScript based on Ohtsuka Masakazu’s original port (“Marilena”) and also a native library called Deface. I ended up using one of the former, Mario Klingemann’s slightly modded Marilena, not because I have a particular preference but because I’m lazy and he very generously included an example that used the webcam.

After making the necessary adjustments to the face-detecting code to draw the black bar, and flipping it and the image to make them mirror reflections (it always weirds me out to wave with my left hand only to see myself waving back with my right), I turned to the audio. Using the new SampleDataEvent that Adobe has helpfully documented, along with insights gleaned from Jeff Swartz’s helpful tutorial on generating sounds dynamically, I generate what can only be described as a horrifically annoying beep any time the microphone’s input passes a certain threshold.
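The beep logic itself is tiny: each time the audio system asks for a buffer, fill it with sine-wave samples if the mic level is above the threshold, or with silence if not. The real thing is ActionScript 3 driven by SampleDataEvent; this is the same idea in JavaScript with illustrative constants:

```javascript
// Threshold-gated tone generator. startSample carries the running sample
// count across buffers so the sine wave stays phase-continuous between
// successive fills.
const SAMPLE_RATE = 44100;
const BEEP_FREQ = 1000;   // the horrifically annoying part
const THRESHOLD = 0.1;    // mic activation level, 0..1 (assumed)

function fillBuffer(micLevel, numSamples, startSample) {
  const buf = new Float32Array(numSamples);
  if (micLevel > THRESHOLD) {
    for (let i = 0; i < numSamples; i++) {
      buf[i] = Math.sin((2 * Math.PI * BEEP_FREQ * (startSample + i)) / SAMPLE_RATE);
    }
  }
  return buf; // stays all zeros (silence) when below threshold
}
```

In the Flash version the equivalent loop writes samples into the SampleDataEvent’s byte stream; in a browser today you’d hand a buffer like this to the Web Audio API instead.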
