Using speech recognition software is akin to being a witch who stares into her cauldron uttering magical incantations. It’s a bizarre method of making stuff happen on a screen, and its uses range from pointless time-wasting to giving less able-bodied people life-changing results.
I get the impression that speech recognition software is aimed at a percentage of the population who may have limited motor skills or reduced capacity to interact with a PC in a traditional sense. I remember feeling very excited when I read an article over a decade ago about how we would all soon be talking to our computers instead of using such outdated and unfashionable things as the common mouse and keyboard. At the time I was into science fiction movies so the prospect of having a chat with a computer like HAL from 2001 or KITT from Knight Rider seemed incredible.
Forgive me if I’m wrong but I seem to recall seeing a microphone logo in Windows ME that related to speech recognition software. I’m sure I clicked it a few times but nothing memorable happened. Since then I’ve not paid much attention to what was hailed as the dawn of a new age in computer interaction because it was obvious computer power simply wasn’t able to process and act upon the complexities of the spoken word.
My enthusiasm for fiction writing is always sky-high but my focus ebbs and flows. There are good days and bad, and though I try to post regular articles on my blog, it isn’t the same as focussing my brain on a story. A few weeks ago I hit “the zone” and a decent chunk of story screamed out of my brain and onto the screen. I typed away for hours without a break and at the end of it my fingertips were numb! No word of a lie.
After a much-needed barbecued burger I sprinted…dashed…walked…ambled back to the keyboard with a full tummy and continued writing into the night. Around 4am I had to stop; my eyeballs were unhappy and my fingertips were none too pleased at the sudden onslaught. You know those gel pads people rest their hands on when using a keyboard or mouse? Why has no one thought to make the actual keys on a keyboard out of the same stuff?
I wrote more.
This sudden burst of story weaving prompted me to wonder if I could relax in my chair and just talk to my PC. At times I find there is a strange kind of brain-to-keyboard dyslexia, like a delay or mix-up where I shovel the idea out of my skull, down my arms and into my fingertips on the keyboard. You know when you click too many things and the PC sits there all confused about what to do next? It’s that kind of thing.
Then it struck me. Technology must have moved on enough for speech recognition to be moderately efficient by now. A quick search later and I was hooking up my headset and microphone and working my way through the Windows Speech Recognition Tutorial.
I was pleasantly surprised. There was even a non-threatening, well-dressed lady to encourage me with the procedure. See? She looks calm and vaguely happy in an “I’m paid to do this so I don’t care if I look stupid” kind of way.
She doesn’t do much, just wanders across the screen and sits down at a PC, looking all smug and confident that she can get it to work whereas I probably wouldn’t have a hope.
Then the big step came. I read the blurb on the screen and it invited me to click or say Next. I was doubtful it would work. I can do top quality 3D design, play games, edit video and do all sorts of funky stuff on my PC. It makes a lot of noise so it must be powerful.
But judging by those voice activated systems they have at customer call centres…well, I’ve learned to just press # to speak to an operator after the system rejects my voice commands for the 28th time!
I said “Next!” in a clear tone.
The page moved to the next one! My PC knew what I was saying. It WAS my pal after all! That was it, there was no stopping me! I worked through all the tutorials and slowly learned how to speak to my PC. There were lots of commands to use and I had to remember to speak clearly. There’s even a handy little toolbar that sits on your screen and tells you off when you cough or curse at it!
Isn’t that cool? It’s listening to me! Shame it ain’t gonna hear anything as I don’t have my microphone plugged in as I’m writing this, but it is sat there patiently waiting to hear the sound of my voice. Which is nice. After the tutorials were done I went through the speech training stuff where I read junk on the screen and my PC supposedly learned my vocal patterns.
I did feel a bit self-conscious reading the stuff out as I needed to concentrate on speaking clearly. I was very aware of my diction. Oo-er. I felt like I was using my special voice like when I try to explain how the TV set works to someone over 70. It took quite a while to trawl through it all, reciting the text and trying to remember all the commands. Thankfully there’s a command called “What can I say?” and when you say it a handy box pops up with all the commands in it!
Say it loud!
After all that hard work I took a break for a cheeseburger. And a Diet Coke, with ice.
Then it was time to put my training into practice! My PC had proved to be a loyal chum, listening merrily to everything I had said thus far without too much sarcasm or rolling of the eyes. I opened up a fresh Word doc, primed the speech recognition toolbar for my verbal onslaught and adjusted my vocal cords…
Me me me me me….wah wah wah wah….tee tee tee….lob lob lob….sigh.
So I started writing using my mouth. The first line came out all wrong and I felt it had all been for nothing. I wanted to reach out and use the keyboard to delete it all but I called up the command prompt box and said in a clear voice “Undo That” whilst thinking “stupid machine.” It did in fact undo what I had said. I was happy it knew how to undo at least.
I remembered that it continues to learn what you say and how you say it so it wasn’t going to be productive to keep deleting text without using my voice to tell it why “this is my voice” doesn’t translate to “thesis mic voice”, and all without shouting at it!
It does work….kinda.
After a few hours talking to my PC (makes me sound like a nutcase, doesn’t it?) I finally got into the flow of things. I could speak about one sentence at a time, then pause for it to appear on the screen, then either correct it or continue. Punctuation is handled by saying “Open quote” or “Comma”, and there are also commands like “New line” or “Tab”. It felt a bit stop/start having to actively say those words out loud because, when I’m typing, all those tiny bits just flow in the background without my paying any attention to them.
That aside it works, sort of…a bit. There’s a strange kind of disembodied feeling though, like I’m not really connecting with the words as they appear on the screen, almost as if without the physical act of pushing keys on the keyboard I haven’t earned the words. That’s weird.
- Talking instead of typing a story is similar to using a pen and paper. I’ve taken more time to think about what I want to say instead of letting it all pour out like gravy; this is more refined, like drinking wine through a straw.
- I have enjoyed lounging in my chair with my feet on the desk and watching the words pop onto the screen. The art of the lazy writer!
- Speech recognition isn’t just limited to making text appear on the screen. If I say “Start. Programs. Firefox.” you’ll never guess what happens! I’ve been able to control pretty much everything on my PC just using my voice. Perfect for when I just don’t feel like moving my limbs about.
- Due to the reduced speed at which words are translated and displayed on the screen, compared to using a keyboard, the process of writing became more measured and deliberate, refined I guess.
- When writing dialogue I often stop and read out loud what a certain character is saying to see if it fits with their personality. Speech recognition can really help with this. Since it’s all verbal communication, I’ve had a great time really getting into character like a comedian running through impersonations on stage. It’s great fun!
- Understandably there’s a lot of faffing about – adjusting the microphone, fine tuning, testing, training, tutorials etc and then when I did start it took much longer than I expected to get into the flow. I estimate it took about 2-3 hours to run through the tutorials and even then it’s still not that great. So far I’ve verbally written around 5,000 words which took about 8-10 hours in all. Using a keyboard I could do that in a couple of hours in a straight run. So the time scale needed to be semi-proficient is quite lengthy.
- I wasn’t prepared to have to make so many corrections. I was constantly saying: “Change that” to change misspelt words or “Caps word” to make the previous word start with a capital letter. These are things that do become easier as time goes on but they felt like a chore when I started.
- If I cough or sigh or make any other unrecognisable sound the software tries to interpret this and sticks random best guesses down. For example a slight cough is translated as “If” and a sigh usually results in either “The” or “Where.” And yes, there was an awful lot of sighing going on!
So, what have I learned?
It’s been a worthwhile process even if the results weren’t as good as I had hoped. I now have the option to switch from hand to mouth and relax into my chair and write instead of being hunched over like a caveman. I’ll likely use speech recognition more when it comes to dialogue rather than narrative as I feel more connected with the characters.
Even if I had the money to buy a more professional and responsive speech recognition program, I don’t know if I would. I only used the one that came with Windows Vista so maybe other paid software would do a better job. I’m not willing to put this theory to the test just yet.
I’ve also learned that it must feel like a battle for those with disabilities, for example, yet at the same time I would assume it can open up so much for them, and that the initial work involved should eventually pay off. It still feels a long way off what movies show us. Chatting away with your PC like you would with another human doesn’t seem possible unless you have access to much more sophisticated software.
I would like to think that in my lifetime there will be a variety of A.I. (Artificial Intelligence) hosts out there that can ask me how my day was or comment on the weather, you know, general chit-chat. I think when that happens a lot of lonely people won’t feel so isolated. Now and then I yearn to hear the monotonous voice of HAL 9000 saying: “I’m sorry Dave, I’m afraid I can’t do that” instead of the usual Windows error beep.
However, there is an interesting note made in the tutorial about how the software isn’t perfect and things are always being improved and updated. So it’s nice to see that Microsoft aren’t declaring a perfect application but one that is evolving. In the end I enjoyed testing it all out and taking the time to learn how to speak to my PC, even if it doesn’t speak back.
Maybe one day it will.
And when that day comes I will give a wry smile and say: “Are you talkin’ to me?”