Siri and voice first

In a recent episode of his Vector podcast, Rene Ritchie had “voice first” advocate Brian Roemmele. Rene is probably my current favorite Apple blogger and podcaster and Vector is excellent.

As I listened to this episode I found myself nodding along for much of it. Roemmele is very passionate about voice first computing and certainly seems to know what he’s talking about. In regards to Siri, his primary argument seems to be that Apple made a mistake in holding Siri back after purchasing the company. At the time Siri was an app and had many capabilities that it no longer has. Rather than take Siri further and developing it into a full-fledged platform Apple reigned it in and took a more conservative approach. In the past couple of years it has been adding back in, via Siri Kit, what it calls domains.

Apps adopt SiriKit by building an extension that communicates with Siri, even when your app isn’t running. The extension registers with specific domains and intents that it can handle. For example, a messaging app would likely register to support the Messages domain, and the intent to send a message. Siri handles all of the user interaction, including the voice and natural language recognition, and works with your extension to get information and handle user requests.

So, they scaled it back and are rebuilding it. I’m not a developer but my understanding of why they’ve done this is, in part, to allow for a more varied and natural use of language. But as with all things internet and human, people often don’t want to be bothered with the details. They want what they want and they want it yesterday. In contrast to Apple’s handling of Siri we have Amazon which has it’s pedal to the floor.

Roemmele goes on to discuss the rapid emergence of Amazon’s Echo ecosystem and the growth of Alexa. Within the context of this podcast and what I’ve seen of his writing, much of his interest and background seems centered on commerce and payment as they relate to voice. That said, I’m just not that interested in what he calls “voice commerce”. I order from Amazon maybe 6 times a year. Now and in the foreseeable future I get most of what I need from local merchants. That said, even when I do order online I do so visually. I would never order via voice because I have to look at details. Perhaps I would use voice to reorder certain items that need to be replaced such as toilet paper or toothpaste but that’s the extent of it.

What I’m interested in is how voice can be a part of the computing experience. There are those of us that use our computers for work. For the foreseeable future I see myself interacting with my iPad visually because I can’t update a website with my voice. I can’t design a brochure with my voice. I can’t update a spreadsheet with my voice. I can’t even write with my voice because my brain has been trained to write as I read on the screen what it is I’m writing.

But this isn’t the computing Roemmele is discussing. His focus is “voice first devices”, those that don’t even have screens, devices such as the Echo and the upcoming HomePod1. And the tasks he’s suggesting will be done by voice first computing are different. And this is where it get’s a bit murky.

Right now my use of Siri is via the iPhone, iPad, AppleWatch and AirPods. In the near future I’ll have Siri in the HomePod. How do I make the most of voice first computing? What are these tasks that Siri will be able to do for me and why is Roemmele so excited about voice first computing. The obvious stuff would be the sorts of things assistants such as Siri have been touted as being great for: asking about for the weather, adding things to reminders, setting alarms, getting the scores for our favorite sports ball teams and so on. I and many others have written about these sorts of things that Siri has been doing for several years now. But what about the less obvious capabilities?

At one point in the podcast the two discuss using voice for such things as sending text. I often use dictation when I’m walking to dictate a text into my phone when using Messages and I see the benefit of that. But dictation, whether it is dictating to Siri or directly to the Messages app or any other app, at least for me, requires an almost different kind of thinking. It may be that I am alone in this. But it is easier for me to write with my fingers on the keyboard then it is to “write” with my mouth through dictation. It might also be that this is just a matter of retraining my brain. I can see myself dictating basic notes and ideas. But I don’t see myself “writing” via dictation.

At another point Roemmele suggests that apps and devices will eventually disappear as they are replaced by voice. At this point I really have to draw a line. I think this is someone passionate about voice first going off the rails. I think he’s let his excitement cloud his thinking. Holding devices, looking, touching, swiping, typing and reading, these are not going away. He seems to want it both ways though at various points he acknowledges that voice first doesn’t replace apps so much as it is a shift in which voice becomes more important. That I can agree with. I think we’re already there.

Two last points. First, about the tech pundits. Too often people let their own agenda and preference color their predictions and analysis. The lines blur between their hopes and preferences and what is. No one knows the future but too often act as they do. It’s kinda silly.

Second, what seems to be happening with voice computing is simply that a new interface has suddenly become useful and it absolutely seems like magic. For those of us who are science fiction fans it’s a sweet taste of the future in the here and now. But, realistically, its usefulness is currently limited to very fairly trivial daily tasks mentioned above. Useful, convenient and delightful? Yes, absolutely. Two years ago I had to go through all the trouble of putting my finger on a switch, push a button or pull a little chain, now I can simply issue a verbal command. No more trudging through the effort of tapping the weather app icon on my screen, not for me. Think of all the calories I’ll save. I kid, I kid.

But really, as nice an addition as voice is, the vast majority of my time computing will continue to be with a screen. I don’t doubt that voice interactions will become more useful as the underlying foundation improves and I look forward to the improvements. As I’ve written many times, I love Siri and use it every day. I’m just suggesting that in the real world, adoption of the voice interface will be slower and less far reaching than many would like.


  1. Actually, technically, the HomePod technically has a screen but it’s not a screen in the sense that an iPhone has a screen. ↩︎