One week after a report in The Guardian revealed that workers in Apple's Siri "grading" program heard private and even illegal activity, Apple has suspended the program to conduct a review. It is also working on a software update that will let users opt out of the program (or perhaps require them to opt in).
Apple issued a brief statement: "We are committed to providing a great Siri experience while protecting users' privacy. While we conduct a thorough review, we are suspending Siri grading globally. In addition, as part of a future software update, users will have the opportunity to choose to participate in grading."
That's the right thing to do, but it makes me wonder what the road ahead looks like. Because while most people don't realize it, machine learning (ML) and AI are built on a foundation of human "grading," and there is no good alternative in sight. And with Siri often criticized as being a year or two behind its rivals, it won't be easy for Apple to catch up while protecting our privacy.
Everyone does it
What is this Siri grading program all about? Basically, every time you say "Hey Siri …" the command you give is processed on your device, but it is also semi-anonymized and sent up to the cloud. A small percentage of those requests is used to train the neural network that allows Siri (and Apple's dictation feature) to understand exactly what you're saying. Someone, somewhere in the world, listens to some of those "Hey Siri" commands and notes whether Siri understood the person correctly or not.
Then the machine learning network is adjusted, and re-adjusted, and re-adjusted, through millions of permutations. The changes are automatically tested against those "graded" samples until a new ML algorithm yields more accurate results. Then that neural network becomes the new baseline and the process repeats.
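To make that loop concrete, here's a deliberately tiny sketch in Python. Everything in it is an invented stand-in, not Apple's actual pipeline: the point is just that human-graded samples act as the test that decides which tweaked model gets promoted to the new baseline.

```python
import random

# Toy illustration: human "grading" produces verified transcriptions,
# and a retrained model is promoted only if it scores better on them.

def make_model(error_rate):
    """Stand-in for a speech recognizer; real ones tune millions of weights."""
    def transcribe(utterance):
        return utterance if random.random() > error_rate else "???"
    return transcribe

def accuracy(model, graded_utterances):
    """Fraction of human-graded samples the model gets right."""
    return sum(model(u) == u for u in graded_utterances) / len(graded_utterances)

# Human-verified texts a grader marked as what the user actually said.
graded = ["what time does the ups store close"] * 200

baseline_error = 0.30
baseline = make_model(baseline_error)
for _ in range(50):  # "adjusted and re-adjusted" through many permutations
    candidate_error = max(0.0, baseline_error + random.uniform(-0.05, 0.05))
    candidate = make_model(candidate_error)
    if accuracy(candidate, graded) > accuracy(baseline, graded):
        baseline, baseline_error = candidate, candidate_error  # new baseline

print(f"final simulated error rate: {baseline_error:.2f}")
```

Notice that the graded samples are doing all the work here: without a human saying what the correct answer was, there's nothing to score the candidates against.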
There is simply no way to train ML algorithms – for speech recognition, or photo recognition, or to determine whether your security camera saw a person or a car – without a person training them this way. If there were a computer algorithm that could always accurately determine whether the AI was right or wrong, it would be the AI algorithm!
Apple, Google, Amazon, Microsoft, and anyone else who makes AI assistants that use machine learning to recognize speech, or detect objects in images or video, or almost anything else, all do this. They listen to your assistant queries, they look at your photos, they look at your security camera footage.
You can certainly train ML algorithms using a bunch of commercially purchased and licensed photos, videos, and voice samples. Many companies do, but that will only get you so far. To make the AI truly reliable, it needs the same kind of photos, videos, and recordings that are actually captured on the company's devices. It needs messy, accented speech from six feet away, picked up by a phone's microphone with wind noise and a lawnmower in the background.
Human training for AI is not a rare occurrence; it is standard practice. Tesla's self-driving capability is built by people who train a neural network by looking at camera data from customers' cars and labeling signs, lanes, other cars, bicycles, pedestrians, and so on. You simply can't train a high-quality machine learning algorithm without people having gone over the data.
Anonymous, but not quite
Because it is simply not possible to train a high-quality AI algorithm meant to be used by millions of people without human review, most companies at least try to do it semi-anonymously. Before a human hears a recording, it is stripped of all data that could be used to identify a specific user. At least, that's what the companies tell us they do.
But a certain amount of data beyond the actual voice recording or photo/video is usually required, so it cannot be completely anonymous.
For example, if I say, "Hey Siri, what time does the UPS store on Greenback Lane close?" and Siri thinks I said "What time does the UPS store on Glenn Brook Lane close?" I'm going to get a bad result. There is no Glenn Brook Lane near me, and certainly no UPS store there. But there is no way for an automated system to know that transcription was wrong, because it's a perfectly plausible thing for a person to say.
So a person has to go through these things, and they need to know roughly where I was when I made the request. Those human graders won't know that Glenn Brook Lane is wrong without enough location data to know that there is no Glenn Brook Lane near me, right?
Similarly, a person reviewing video footage from a Ring camera to distinguish moving cars from people may need to know whether they're looking at footage from an outdoor camera (which sees lots of cars) or an indoor camera (which should only see cars through windows).
Full disclosure is key
It is difficult to know exactly how consumers would react to the way their data is used to train AI algorithms if they knew exactly how it works and exactly what is done to protect their privacy. I have a feeling most people would be okay with it. (If people were truly concerned about personal information and privacy, Facebook wouldn't be used by 1.2 billion people.)
But they don't know, and none of the companies involved seem interested in explaining it. Brief statements to the technical press are not the same as informing hundreds of millions of users. Burying permission statements 4,000 words deep in your dense terms of service agreement doesn't count. This lack of disclosure is a key failing.
One of the biggest problems is the fact that virtual assistants often record things they shouldn't. Basically, Siri, Alexa, and Google Assistant are always listening. They listen a few seconds at a time in a continuous loop, buffering audio on the device, and send nothing anywhere until they hear the wake phrase: "Hey Siri," "Alexa," or "OK Google"/"Hey Google." Only then do they open a network connection and send your request to the cloud.
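As a rough illustration, a minimal sketch of that always-listening loop might look like the following. This is hypothetical Python; the buffer size, chunking, and function names are all invented, not any vendor's actual implementation.

```python
from collections import deque

# Hypothetical sketch: audio lives in a short on-device ring buffer,
# and nothing leaves the device until a wake phrase is detected locally.

BUFFER_SECONDS = 3
CHUNKS_PER_SECOND = 10
ring_buffer = deque(maxlen=BUFFER_SECONDS * CHUNKS_PER_SECOND)

def detect_wake_phrase(audio_chunks):
    """Stand-in for a small on-device wake-word model ("Hey Siri", "Alexa").
    A real detector scores the audio; this placeholder never triggers."""
    return False

def send_to_cloud(audio_chunks):
    """Only called after a (possibly false) wake-word trigger."""
    ...  # the network request happens here, and only here

def listen_forever(microphone_chunks):
    for chunk in microphone_chunks:
        ring_buffer.append(chunk)             # old audio falls off the end
        if detect_wake_phrase(ring_buffer):
            send_to_cloud(list(ring_buffer))  # the buffered command goes up
            ring_buffer.clear()
```

The key property is in the structure: the buffer constantly overwrites itself, so the only audio that can ever be uploaded is whatever the wake-word detector (rightly or wrongly) flagged.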
As we all know, sometimes these wake phrases don't work, and sometimes they trigger even though nobody said them. Those false triggers are how human graders end up hearing snippets of private conversations, drug deals, sexual activity, and so on.
Again, there is no easy solution. These assistants are never going to get better at recognizing their wake phrases unless humans actually tell them when they got it wrong.
Do the work yourself
None of this necessarily means we have to hand our data over to someone else. We could do the training and grading ourselves. Apple could change the iPhone so that every time Siri is invoked, we're presented with simple "right" and "wrong" buttons. If a user marks a mistake, they could be given the option to offer more info: the correct phrase, or how the answer they got wasn't what they expected.
Smart speakers could be given keywords that let us do the same thing with our voice, perhaps using a paired phone to make corrections.
Then the adjusted algorithm – but none of our personal data – could be sent back to the parent company to be combined with everyone else's and incorporated into the next software release. Some companies already use this method for certain kinds of ML algorithms, such as smart predictive text in keyboards (where we correct its mistakes all the time).
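For what it's worth, this "send the adjusted algorithm, not the data" approach has a name: federated learning. Here's a deliberately tiny sketch of the idea in Python. All the names and the toy arithmetic are invented stand-ins, not any company's real system; the point is that only a small model adjustment leaves the device, never the recordings themselves.

```python
# Toy sketch of federated-style learning: each device computes a small
# adjustment (delta) to the shared model from its owner's corrections,
# and only that delta -- never the audio -- is sent back and averaged.

def local_delta(shared_weights, corrections):
    """Runs on the device. `corrections` are the user's right/wrong marks;
    the returned delta contains no recordings or transcripts."""
    delta = [0.0] * len(shared_weights)
    for weight_index, nudge in corrections:  # toy stand-in for a gradient step
        delta[weight_index] += nudge
    return delta

def aggregate(shared_weights, deltas):
    """Runs at the company. Averages everyone's deltas into the next release."""
    for delta in deltas:
        for i, d in enumerate(delta):
            shared_weights[i] += d / len(deltas)
    return shared_weights

# Example: two devices contribute corrections; the server never sees audio.
weights = [0.0, 0.0, 0.0]
device_a = local_delta(weights, corrections=[(0, +0.2)])
device_b = local_delta(weights, corrections=[(0, +0.1), (2, -0.3)])
weights = aggregate(weights, [device_a, device_b])
print(weights)  # [0.15, 0.0, -0.15]
```

Google has publicly described using this kind of technique for Gboard's predictive text, which is presumably the sort of keyboard example mentioned above.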
Most users will never bother to rate and correct their virtual assistant, of course. The whole point of these things is to avoid that kind of tedium, and who wants to review every misidentified motion trigger on their smart security camera, or every mislabeled image in an AI-powered photo album? That's work. It's the opposite of what AI is for.
But with a large enough audience (and Apple can certainly claim one, with over a billion devices in use), even a small fraction of active users training their devices would be a huge pool of samples to draw from. It might even be enough to make Siri an exceptional AI assistant, which it certainly is not at present.
Would a company like Apple be willing to go that extra mile? To mar its sleek design and "it just works" image with an easily accessible interface whose very existence admits that something doesn't work often enough? Probably not. Apple will more likely finish its review of the grading program quickly and reinstate it with a toggle in the privacy settings to opt out. That's the easy thing to do, but it misses the opportunity to turn at least a small fraction of hundreds of millions of Siri users into active Siri improvers.