Conversational AI: What To Expect In The Next Five Years

Vu Ha
Published in Chatbots Magazine · 6 min read · May 14, 2017


Soundbite: By 2022, chatbots will take coffee orders, help with tech support, and recommend restaurants (albeit without small talk and good humor).

“Bots go bust.” So went the first of Bradford Cross’s five AI startup predictions for 2017, countering some recent excitement around conversational AI (see, for example, O’Reilly’s “Why 2016 is shaping up to be the Year of the Bot”). The main argument was that social intelligence, rather than artificial intelligence, is what is lacking, rendering bots utilitarian and boring.

As I tinker with dialog systems at the Allen Institute for Artificial Intelligence, primarily by prototyping Alexa skills, I often wonder what AI still lacks for building good conversational systems, punting the social challenge to another day. This post is my take on where AI has a good chance to improve and, consequently, what we can expect from the next wave of conversational systems.

Task-Oriented Dialog Agents (TODA), Circa 2020

First, the fun part: predictions. I believe we will see a new class of task-oriented dialog agents (TODA) gradually rolling out in the next 2–5 years, helping us get things done.

  • When you run into a problem with your tech gadget in 2019, a tech support agent will help you troubleshoot it. The agent won’t say “your call is very important to us, please stay on the line and your call will be answered in the order it was received.” The agent is, of course, a bot, and it will help you immediately and effectively.
  • When you pull up to a Starbucks drive-through in 2020, you will be speaking to an order-taking bot.
  • When you need help with restaurant recommendations and reservations at the Wynn Las Vegas in 2021, help will arrive in the form of a concierge bot.
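
To make the idea concrete, below is a minimal sketch of the slot-filling loop at the heart of such a task-oriented agent, in the spirit of the order-taking bot above. The slot names, vocabulary, and prompts are invented for illustration; a real system would use a trained language-understanding model rather than keyword matching.

```python
REQUIRED_SLOTS = {
    "drink": "What drink can I get you?",
    "size": "What size would you like?",
    "milk": "Any milk preference?",
}

def extract_slots(utterance: str) -> dict:
    """Toy keyword-matching NLU; a real agent would use a trained model."""
    vocab = {
        "drink": ["latte", "mocha", "americano"],
        "size": ["tall", "grande", "venti", "trenta"],
        "milk": ["dairy", "soy", "oat"],
    }
    text = utterance.lower()
    return {slot: value
            for slot, values in vocab.items()
            for value in values if value in text}

def dialog_loop():
    state = {}
    print("Bot: Welcome! What would you like today?")
    while len(state) < len(REQUIRED_SLOTS):
        state.update(extract_slots(input("You: ")))
        missing = [s for s in REQUIRED_SLOTS if s not in state]
        if missing:  # ask about the first slot that is still unfilled
            print("Bot: " + REQUIRED_SLOTS[missing[0]])
    print("Bot: One {size} {drink} with {milk}. Coming right up!".format(**state))

if __name__ == "__main__":
    dialog_loop()
```

The essence of a TODA is exactly this loop: track which pieces of the task are still missing and steer the conversation until they are all filled.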

These are hardly Hollywood science-fiction ideas. Even if the Starbucks bot sounds like Scarlett Johansson’s Samantha, the public will be unimpressed; we would prefer real human interaction. Yet the public won’t have a choice: efficient task-oriented dialog agents will be the automatic vending machines and airport check-in kiosks of the near future.

The AI Behind TODA: Yes, It’s Deep Learning!

When we open our news feed and find out about yet another AI breakthrough (IBM Watson, driverless cars, AlphaGo), the notion of TODA may feel decidedly anticlimactic. The reality is that current AI is not quite turnkey-ready for TODA. This will soon change thanks to two key factors: 1) businesses want it, and 2) businesses have abundant data, the fuel that current state-of-the-art machine learning techniques need to make AI work.

It won’t be an easy march, though, once we get into the nitty-gritty details. For example, I heard through the grapevine that when Starbucks looked at the voice data collected from customer orders, they found a few million unique ways to order. (For those in the field, I’m talking about unique user utterances.) This is to be expected given the wild combinations of latte vs. mocha, dairy vs. soy, grande vs. trenta, extra-hot vs. iced, room vs. no-room, for-here vs. to-go, snack variety, spoken accent diversity, etc. The AI practitioner will soon curse all these dimensions before taking a deep learning breath and getting to work. I feel, though, that given practically unlimited data, deep learning is now good enough to overcome this problem, and it is only a matter of a couple of years until we see these TODA solutions deployed. One technique to watch is the Generative Adversarial Network (GAN). Roughly speaking, a GAN engages in an iterative game of counterfeiting real stuff, getting caught by a police neural network, improving its counterfeiting skills, and rinsing and repeating until, given enough data and iterations, it can pass as your Starbucks order-taker.
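
For the curious, here is a minimal sketch of that counterfeiter-versus-police game in PyTorch, using toy one-dimensional data rather than dialog. Applying GANs to text is considerably harder because words are discrete, so treat this as the concept, not a recipe for an order-taking bot.

```python
# Generator G (the counterfeiter) learns to mimic "real stuff" drawn from
# N(2.0, 0.5); discriminator D (the police) learns to tell real from fake.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # samples of the real thing
    fake = G(torch.randn(64, 8))            # counterfeits made from noise

    # Police turn: get better at catching counterfeits.
    d_loss = loss_fn(D(real), torch.ones(64, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Counterfeiter turn: get better at fooling the police.
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After enough iterations, the counterfeits should center near 2.0.
print(f"fake mean: {G(torch.randn(1000, 8)).mean().item():.2f} (target 2.0)")
```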

The wheels are turning inside Amazon, Facebook, Google, Microsoft, and other tech companies to capitalize on this opportunity. Starbucks baristas will soon be able to really focus on perfecting the art of making drinks.

Conversational AI: What To Expect Beyond TODA?

In a word: not much. In the near future your hotel concierge won’t help you with your iPhone tech troubles. Your Starbucks barista-bot may take your order, but won’t chit-chat with you about today’s weather or politics.

The upcoming TODA agents are good at one thing, and one thing only. As Facebook found out with the ambitious Project M, building general personal assistants that can help users with multiple tasks (cross-domain agents) is hard. Awfully hard. Beyond the obvious increase in scope, knowledge, and vocabulary, there is no built-in data generator feeding the hungry learning machine (barring an unlikely concerted effort to aggregate the data silos of multiple businesses). The jury is still out on whether the army of human agents that Project M employs can scale, even with Facebook’s resources. In addition, cross-domain agents will probably need major advances in areas such as domain adaptation, transfer learning, dialog planning and management, reinforcement/apprenticeship learning, automatic dialog evaluation, etc.
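
To give a flavor of one of those hoped-for advances, here is a sketch of transfer learning in PyTorch: reuse an utterance encoder trained on a data-rich domain and fine-tune only a small classification head for a data-poor one. The shapes, domains, and intent counts here are made up for illustration.

```python
import torch
import torch.nn as nn

# Pretend this encoder was pretrained on a data-rich domain (coffee orders).
encoder = nn.Sequential(nn.Linear(300, 128), nn.ReLU())
head_b = nn.Linear(128, 5)  # new head for 5 intents in a data-poor domain

for p in encoder.parameters():  # freeze what the rich domain already taught us
    p.requires_grad = False

opt = torch.optim.Adam(head_b.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(utterance_vecs: torch.Tensor, intent_labels: torch.Tensor) -> float:
    """One fine-tuning step on the handful of new-domain examples we have."""
    logits = head_b(encoder(utterance_vecs))
    loss = loss_fn(logits, intent_labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# e.g. train_step(torch.randn(16, 300), torch.randint(0, 5, (16,)))
```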

I argued that it is super hard to scale a one-trick TODA into a general assistant that helps the user get things done across multiple tasks. An intelligent assistant is also arguably expected to hold informal chit-chat with the user. It is in this area that we are staring at perhaps the biggest challenge in AI. Observe how Samantha introduces herself to Joaquin Phoenix’s Theodore in Her:

When the irresistible force of neural networks meets the immovable object of human common sense, we are instead treated to this type of reality check:

Human: What is your job?
Machine: I’m a lawyer.
Human: What do you do?
Machine: I’m a doctor.

Human: how old are you?
Machine: I’m 16.
Human: 16?
Machine: I don’t know what you are talking about.
Human: You don’t know what you are saying.
Machine: I don’t know what you are talking about.
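
Part of what is going on: a vanilla sequence-to-sequence chatbot picks the most likely reply given only the current question, with no persistent persona tying answers together. The toy sketch below exaggerates this with invented reply probabilities; since nothing links the two argmax decisions, contradictions come for free.

```python
# Hypothetical learned scores P(reply | question); no persona, no memory.
LEARNED = {
    "what is your job?": {"i'm a lawyer.": 0.31, "i'm a doctor.": 0.29},
    "what do you do?":   {"i'm a doctor.": 0.33, "i'm a lawyer.": 0.30},
}

def respond(question: str) -> str:
    # Each turn is decoded independently: argmax over candidate replies.
    replies = LEARNED[question.lower()]
    return max(replies, key=replies.get)

print(respond("What is your job?"))  # -> i'm a lawyer.
print(respond("What do you do?"))    # -> i'm a doctor. (contradiction)
```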

How far are we from building systems with common sense? An often-heard answer is “not in the near future,” while the realistic answer is “we don’t know.” Last year, I spent some time trying to build a system that can do better than an information retrieval baseline at taking a fourth-grade science exam (where the state of the art still has a ways to go to reach a passing score of 65%). I failed hard. Here’s an example to give a sense of the difficulty of these questions.

Bluebirds prefer to live near open, grassy places. Where would you most likely find a bluebird?

(A) a dam
(B) a beach
(C) a ball field
(D) a parking lot
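
For reference, an information retrieval baseline for such questions can be as simple as the sketch below: score each option by how well “question + option” matches supporting text. The three-sentence corpus is a hypothetical stand-in; real systems retrieve from millions of sentences, which is part of why the baseline is hard to beat.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy stand-in for a large science corpus.
corpus = [
    "Bluebirds like open grassy places such as a ball field.",
    "A dam holds back water in a river.",
    "A parking lot is a paved area for cars.",
]
question = ("Bluebirds prefer to live near open, grassy places. "
            "Where would you most likely find a bluebird?")
options = ["a dam", "a beach", "a ball field", "a parking lot"]

vec = TfidfVectorizer().fit(corpus)
doc_matrix = vec.transform(corpus)

def score(option: str) -> float:
    """Similarity of 'question + option' to its best-matching corpus sentence."""
    q = vec.transform([question + " " + option])
    return cosine_similarity(q, doc_matrix).max()

print(max(options, key=score))  # -> "a ball field", given this corpus
```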

If AI struggles with fourth-grade science question answering, should we expect it to hold adult-level, open-ended chit-chat about politics, entertainment, and the weather? It is thus encouraging to see that Microsoft’s Satya Nadella did not give up on Tay after its debacle, and that Amazon’s Jeff Bezos is sponsoring an Alexa social chatbot competition. I love this quote from Jeff:

“I believe the dreamers come first, and the builders come second. A lot of the dreamers are science fiction authors, they’re artists…They invent these ideas, and they get catalogued as impossible. And we find out later, well, maybe it’s not impossible. Things that seem impossible if we work them the right way for long enough, sometimes for multiple generations, they become possible.”

- Jeff Bezos
Founder & CEO, Amazon

Wrapping Up

By 2022, task-oriented dialog agents/chatbots will take your coffee order, help with tech support problems, and recommend restaurants when you travel. They will be effective, if boring. What do I see beyond 2022? I have no idea. Amara’s law says that we tend to overestimate technology in the short term while underestimating it in the long run. I hope I am right about the short term but wrong about AI in 2022 and beyond! Who would object to a Starbucks barista-bot that can chat about the weather and crack a good joke?

Acknowledgement

Thanks to Paul den Hertog for pointing out that this quote is Amara’s law; I originally attributed it to Bill Gates.

