Here are the criteria that a verbal interface must meet. First, it must be able to carry out specific orders or tasks. Second, it must carry on casual conversation like a real person would. Finally, it must run on a fairly weak Windows 7 PC. Those three challenges may seem overwhelming, but they can be overcome with a little creativity.
What software do you need for the verbal interface?
- Nuance Dragon NaturallySpeaking 11
- RoboForm 7
- WinAutomation
- TechSmith Camtasia or TechSmith Snagit
- ABBYY FineReader 10 Professional Edition
- NaturalReader
- Decaptcher
- A.L.I.C.E. (Artificial Linguistic Internet Computer Entity)
- Cleverbot
RoboForm 7 (RF7) automates online interactions. You create a general profile containing a name, email, password, date of birth and so on. When a website prompts for input, as on a login form, RF7 fills it in. It is quite a versatile little program, because it works with Internet Explorer, Opera, Firefox and Chrome. It also stores your logins and passwords in a password-protected database and can enter them on request.
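To make the profile idea concrete, here is a minimal sketch of how a stored profile could be matched against the fields of a form. This is not RoboForm's actual mechanism or API; the profile values, the alias table and the `fill_form` function are all made up for illustration.

```python
# Hypothetical profile; keys and values are assumptions for illustration.
PROFILE = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "password": "s3cret",
    "dob": "1990-01-01",
}

# Labels a web form might use for each profile entry (assumed, not exhaustive).
ALIASES = {
    "name": {"name", "full name", "your name"},
    "email": {"email", "e-mail", "email address"},
    "password": {"password", "pass"},
    "dob": {"dob", "date of birth", "birthday"},
}


def fill_form(form_fields, profile=PROFILE):
    """Return {form_field: value} for every field that matches a profile entry."""
    filled = {}
    for field in form_fields:
        label = field.strip().lower()
        for key, names in ALIASES.items():
            if label in names and key in profile:
                filled[field] = profile[key]
                break
    return filled
```

Calling `fill_form(["E-mail", "Password", "Captcha"])` fills the first two fields and leaves the CAPTCHA field alone, which is where a separate solver would have to step in.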
WinAutomation is the second core component. It is a powerful Windows automation program and macro recorder, which means you can automate and schedule any repetitive job or process on your computer. It serves as a processing tool for Dragon NaturallySpeaking (DNS), which is described later.
TechSmith Camtasia and TechSmith Snagit are both screen capture programs. Camtasia records video, while Snagit takes screenshots. They serve as a secondary system that feeds input to the other automation software and procedures.
ABBYY FineReader 10 is a very capable OCR (optical character recognition) tool. It recognizes text and creates editable, searchable electronic files from scanned documents, PDFs and digital photographs.
NaturalReader (NR) is used to patch up a great flaw in Dragon NaturallySpeaking: it reads text from the computer out loud, and it does so in a very natural-sounding voice. NR can read from pictures too. In more complex operations, NR reads text that is being fed from other programs.
Decaptcher is a controversial service that solves CAPTCHA challenges automatically. $2 buys 1000 solved CAPTCHAs. It is used mostly in conjunction with RoboForm.
A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) handles the talking and personality part. It is a web service you can chat with: you type something in, the robot responds, and so on. WinAutomation is used to pass the data on to NaturalReader. ALICE is polite and outgoing, a very pleasant and cooperative conversationalist, so it serves as the formal personality of the computer. You can also download it and set it up locally.
Cleverbot adds flair to the simulated AI. When this mode is activated, the AI becomes a very different conversationalist; the best description would be a drunk smart-ass. All that rudeness is then read and passed along as speech or text.
Now that we are familiar with the programs and their basic functions, it is time to define their interactions and finally make the whole damn thing work.
How do you make the verbal interface do things?
Dragon NaturallySpeaking (DNS) accepts verbal input. It converts speech into text, executes commands, and can even run some basic command sequences. DNS would be configured to work in 4 different modes: two command modes (local and online) and two talk modes (polite and smart-ass). In the command modes, its job is to convert verbal input into preconfigured commands.
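The four modes can be pictured as a small state machine. The sketch below is only an illustration in Python, not how Dragon is actually configured; the mode-switching phrases, the command table and the macro names are all assumptions.

```python
from enum import Enum


class Mode(Enum):
    LOCAL = "local command"
    ONLINE = "online command"
    POLITE = "polite talk"
    SMART_ASS = "smart-ass talk"


# Spoken phrases that switch modes (hypothetical wording).
MODE_PHRASES = {
    "local mode": Mode.LOCAL,
    "online mode": Mode.ONLINE,
    "polite mode": Mode.POLITE,
    "smart-ass mode": Mode.SMART_ASS,
}

# Preconfigured commands for the two command modes (hypothetical names).
COMMANDS = {
    Mode.LOCAL: {"open notepad": "macro:open_notepad"},
    Mode.ONLINE: {"check mail": "macro:check_mail"},
}


class Interface:
    def __init__(self):
        self.mode = Mode.LOCAL

    def hear(self, text):
        """Classify recognized text as a mode switch, a macro, or chat input."""
        phrase = text.strip().lower()
        if phrase in MODE_PHRASES:
            self.mode = MODE_PHRASES[phrase]
            return ("mode", self.mode.value)
        if self.mode in COMMANDS:
            macro = COMMANDS[self.mode].get(phrase)
            return ("macro", macro) if macro else ("unknown", phrase)
        # In either talk mode, everything else is passed on as conversation.
        return ("chat", phrase)
```

In a command mode, only preconfigured phrases do anything; in a talk mode, whatever is heard is treated as conversation.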
DNS can't execute complex tasks itself, but it can send commands to other software, so we link it to WinAutomation (WA). WA can execute even the most complex tasks, run them as macros in loops and give feedback. It covers the verbal interface's ability to do any preconfigured local or online task. As such, WA can be considered the main server of the verbal interface, and it can interact with all the other programs we use.
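To picture "macros in loops with feedback", here is a rough Python sketch. It does not use WinAutomation itself, which is configured through its own designer; the `run_macro` function and the two example steps are hypothetical stand-ins.

```python
def open_browser():
    """Hypothetical step: pretend to open a browser."""
    return "browser ready"


def download_report():
    """Hypothetical step: pretend a download went wrong."""
    raise RuntimeError("site unreachable")


def run_macro(steps, repeat=1):
    """Run every step `repeat` times, collecting one feedback line per step.

    Stops at the first failing step so the failure can be reported back.
    """
    feedback = []
    for i in range(repeat):
        for step in steps:
            try:
                result = step()
                feedback.append(f"pass {i + 1}: {step.__name__} ok ({result})")
            except Exception as exc:
                feedback.append(f"pass {i + 1}: {step.__name__} failed ({exc})")
                return feedback
    return feedback
```

The feedback lines are exactly the kind of text that could be handed to the speech side, so the interface can say out loud what worked and what did not.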
Camtasia or Snagit capture images of the "difficult" things, such as text that can't be copied or that sits inside a picture. ABBYY FineReader can then turn the capture into copyable text, or NaturalReader can read it aloud. Which reminds me: DNS doesn't have a good text-to-speech converter, so NaturalReader must be used to patch that.
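The capture, recognize, speak chain is easy to sketch as a pipeline of three interchangeable steps. The Python below is only a sketch under that assumption: `fake_capture` and `fake_ocr` are stand-ins for the real programs, not calls into Snagit, FineReader or NaturalReader.

```python
def read_screen_aloud(capture, ocr, speak):
    """Grab an image, turn it into text, and speak it if any text was found."""
    image = capture()
    text = ocr(image).strip()
    if text:
        speak(text)
    return text


# Stand-ins so the sketch runs without any of the real tools (assumptions).
def fake_capture():
    return b"raw-image-bytes"


def fake_ocr(image):
    return "  Your order has shipped.  "


spoken = []
result = read_screen_aloud(fake_capture, fake_ocr, spoken.append)
```

Because each stage is just a function passed in, any one of the three programs could be swapped for another without touching the rest of the chain.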
RoboForm and Decaptcher are used to smooth out internet interactions. Once set up, they can fill in practically any form on any website. DNS and WA are used to read the relevant content or to input spoken commands. This also applies to the chat mode. In chat mode, your verbal input is transferred to one of the chat personalities, ALICE or Cleverbot. The robot's response is then transformed into human-like speech.
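One chat turn boils down to: take the recognized text, hand it to the selected personality, and speak the reply. Here is a minimal Python sketch of that loop. The two bot functions are canned stand-ins, not the real ALICE or Cleverbot services, and `chat_turn` is a hypothetical name.

```python
def polite_bot(message):
    """Stand-in for the polite personality (not the real ALICE)."""
    return f"That is interesting. Tell me more about '{message}'."


def smart_ass_bot(message):
    """Stand-in for the rude personality (not the real Cleverbot)."""
    return f"'{message}'? Wow, groundbreaking."


PERSONALITIES = {"polite": polite_bot, "smart-ass": smart_ass_bot}


def chat_turn(heard_text, personality, speak):
    """Send recognized speech to the chosen bot and speak its reply."""
    bot = PERSONALITIES[personality]
    reply = bot(heard_text)
    speak(reply)
    return reply
```

Passing `print` as `speak` is enough to try it out; in the real setup that slot would be filled by whatever feeds text to the speech program.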
Here is a general diagram of this verbal interface's layout:
None of the things mentioned above seems all that hard to do. I hope you found this useful and interesting.
