Voice control is one of digital technology’s fastest-growing trends. Thanks to advances in such computing subfields as speech recognition, machine learning and natural language processing, devices are now voice-controlled to play music, send messages, make payments from bank accounts and supply personalised recommendations and advice.
Virtual assistants such as Apple’s Siri and Amazon’s Alexa have their shortcomings. Critics point out that they are a long way from being able to hold genuine conversations and that their usefulness is currently overhyped. On the other hand, and within certain limitations, voice control technology has already proved its validity in a variety of scenarios where a manual interface would be inconvenient or dangerous.
Driving provides the obvious example: voice recognition software such as that used by Ford for its SYNC infotainment system allows drivers to control its functions without taking their hands off the wheel or eyes off the road. Elsewhere, automatic speech-to-text conversion is liberating certain professionals, such as physicians, from the encumbrance of having to scribble notes by hand as they work.
It is possible to see how areas of the factory floor might benefit from similar thinking. When it comes to the setting up, use or maintenance of machinery, hands-free interaction could make the work of operators both safer and, with less downtime, more productive.
Though in its very early stages, voice control technology for industry is now starting to get off the ground. iTSpeex’s ATHENA is one of the first voice-activated operating systems designed specifically for use with CNC machines such as lathes, mills and grinders. By means of a headset, microphone and notebook PC, the user can both instruct machines to carry out specific operations and instantly access information from machine manuals and factory documentation.
The kind of algorithms underpinning this capability have come a long way from Siri’s first forays into fetching internet search results. And other aspects of the system have been specially developed to meet the challenges of the industrial environment, such as the shop noise-cancelling features of the headset.
On one level, the naturally limited range of words and command types involved in a user’s interaction with a machine like a drill or a cutter makes appropriate voice control programming relatively easy. This is not to say, of course, that basic training in the right terminology is not necessary for users. Nor that other problems universal to speech recognition software, such as strong regional accents, do not also need resolving here.
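The point about a restricted vocabulary can be made concrete with a sketch. The command names and phrases below are purely illustrative, not drawn from any real system: because an operator's utterances draw on a small, known set of keywords, matching them to machine operations is a simple lookup rather than open-ended language understanding.

```python
# Illustrative sketch: matching utterances against a small, fixed command
# vocabulary. All command names and keywords here are hypothetical.

COMMANDS = {
    ("start", "spindle"): "SPINDLE_ON",
    ("stop", "spindle"): "SPINDLE_OFF",
    ("open", "manual"): "SHOW_MANUAL",
}

def parse_command(utterance):
    """Return an operation code if the utterance contains a known
    keyword combination; None means the system should ask the
    operator to repeat or rephrase."""
    words = utterance.lower().split()
    for keywords, operation in COMMANDS.items():
        if all(k in words for k in keywords):
            return operation
    return None
```

With so few valid commands, anything that fails to match can safely be rejected and queried, which is far harder in a general-purpose assistant.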
It is particularly important in an industrial context, however, that spoken commands be no less clear than those traditionally delivered by buttons and keypads. Hence the need for an activation word used at the start of utterances – to make sure the machine knows the words are directed particularly to it – and, where necessary, system requests for clarification or confirmation. Hence, too, the need for proper authorisation protocols.
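The gating described above can be sketched in a few lines. The wake word, command phrases, and the notion of a "risky" operation here are all assumptions for illustration; a real system would layer proper authorisation on top.

```python
# Illustrative sketch: activation-word gating plus confirmation for
# risky operations. Wake word and operation names are hypothetical.

WAKE_WORD = "athena"
PHRASES = {"start cut": "START_CUT", "stop": "STOP"}
RISKY = {"START_CUT"}  # operations needing explicit confirmation

def handle(utterance, confirm):
    """confirm is a callback asking the operator to confirm an
    operation; it returns True or False."""
    words = utterance.lower().split()
    if not words or words[0] != WAKE_WORD:
        return "IGNORED"      # speech not addressed to the machine
    op = PHRASES.get(" ".join(words[1:]))
    if op is None:
        return "CLARIFY"      # unrecognised: request clarification
    if op in RISKY and not confirm(op):
        return "CANCELLED"
    return op
```

Overheard speech that lacks the activation word is simply ignored, and a dangerous command still requires a second, explicit yes from the operator.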
The story about Amazon’s Alexa accidentally ordering some cat food when it overheard one of Amazon’s own adverts is funny enough – but not the kind of true story one wants to hear repeated in a factory setting. In a sense, the challenge of voice control technology is to make interaction with devices easy but not too easy: easy enough to make machine activation swift and natural, but not so easy as to risk dangerous or costly consequences.
If new technology is to improve efficiency without compromising workplace safety, it must also do so without undermining system security. The threat of cyber-crime rises as businesses become more digitised, and, as it stands, voice technology is widely regarded as a major risk component of digital systems.
iTSpeex emphasises that ATHENA functions entirely locally, without an internet connection. And a voice-activated maintenance assistant currently under development at Siemens (by means of which technicians in wind turbines might verbally access information while continuing complex work with both hands) envisages restricting all associated data to the company’s own cloud-based operating system.
And yet it is difficult to imagine voice-controlled machines being kept away from shared internet space in the long term. Chip technology, after all, is finding ever more power-efficient ways of running speech-recognition software, broadening significantly the range of devices suitable for voice operation – including the kind of IoT devices already known for their vulnerability to hackers.
There is every chance that challenges like these will eventually be met. As has been proven in the domestic and leisure spheres, for many the spoken digital interface is natural and preferred. Search engines like Google are already being adapted to respond not just to typed keywords but to ordinary, spoken phrasing.
Points of friction, it seems, between human beings and the devices they live and work with, are everywhere being smoothed away. It is a sort of closing of the culture gap. Whether it be industrial robots that look and behave increasingly like human arms, or the constant redesign of laptops and other devices the better to reflect actual handling, all is in pursuit of a kind of complete ergonomic fit.
That natural speech should at some point supersede the artifice of typed code is part of the same trend – though in this case the imagined industrial endpoint remains some considerable way off.