Monday, June 3, 2013

HAL It Ain't

When Microsoft announced the Xbox One, they mentioned that the Kinect (the microphone and camera part of the console) will be able to power up the Xbox when a user says "Xbox". To do so, the microphone obviously has to be on all the time. Cue privacy concerns. I'm not going to talk about the validity or magnitude of the concerns, because I think it's a personal decision for the consumer, and that most of the arguments being bandied about are facile. What I wanted to comment on are the shortcomings in the English language that technology like the Kinect highlights.

If a person were to hire, in a moment of utter decadence, a servant whose only job was to stand in the corner of the living room and turn on a device when they heard "Xbox", then it's fair to say that the servant is listening for a word. In order for the servant to perform the assigned task they have to process all the sound in the room, understand which parts are words, and explicitly filter out every word that isn't "Xbox", because that's how people work. As a side effect, the servant would hear (and by connotation comprehend) everything said by every person in the room. Privacy would be an obvious and genuine concern.

Let's assume, tinfoil hats off, that the Kinect will be doing only what Microsoft says: listening until it hears the word it knows, "Xbox", and then turning itself on. This is where the language falls down. We don't have verbs that convey what's happening in this situation. The Kinect doesn't "listen", "hear", have a definition of "word", or in any meaningful sense "know". These are all words that imply agency, intent, and some degree of intelligence.

A computer, as technology stands today, has none of these properties. Although the level of processing going on to interpret the commands is a lot more complex, there is no fundamental difference between the Kinect interpreting (another verb loaded with agency) a sound-based signal from a person and a burst of infrared in a pattern it recognises (yet another) from a remote control. Using these active verbs isn't sloppy use of the language; it's trying to use the language to convey behaviours that it simply does not have the ability to convey in a concise manner.

Inanimate objects able to respond in complex ways to a wide variety of sensory input are a modern invention, so all we have available in English to describe such devices are the words originally ascribed to intelligent entities. It is similar to the way scientists of all types end up describing complex systems as "trying", "wanting" or "choosing" to do things in a way that implies sentient forces which are not part of the model. The words describe the process correctly, but for people listening it's hard not to include some of the linguistic baggage.

To a degree, every internet-enabled device a person uses presents some sort of privacy concern, and it's probably not outrageous to worry that the Kinect might do more than it says on the box. Even in the worst case, though, it won't be "listening" to anything.

