Setting Up Voice Control for ICAVE

We have developed a plugin for Unity and getReal3D that lets you control anything in your ICAVE project with the sound of your voice in just a few minutes. It is a drag-and-drop package for Unity that works with the getReal3D RpcManager to automatically call functions across both nodes.

Downloads and UnityPackage Source

For the latest unitypackage release please go to https://github.com/FIU-ICAVE/VoiceRecognitionForCave/releases

For the unitypackage source please go to https://github.com/FIU-ICAVE/VoiceRecognitionForCave

Setting up Google Cloud Speech

The plugin uses the Google Cloud Speech API, so you will have to set up your own Google Cloud Platform account and create a project. After you create your project, enable access to the Google Cloud Speech API from the API Library and generate an API key from the Credentials section of the Google Cloud Console.

More details on setting up Google Cloud are available here

Setting up the Unity Plugin

Download the latest .unitypackage available from the project GitHub here. Make sure you have imported the latest getReal3D plugin first [Image], then import everything in the package into your current Unity project [Image].

You should now have a folder inside your Assets folder named GoogleCloudSpeech. Inside Assets/GoogleCloudSpeech there is a prefab called SpeechRecognition. Drag the SpeechRecognition prefab into your scene. [Image]

Select the SpeechRecognition object in the hierarchy to view it in the inspector. You should now see various configuration settings on the prefab. What each setting does is explained in the project wiki on GitHub and in the tool-tips inside the inspector, but for now we’ll only touch the “Google Speech Api Key” setting and set it to the API key we generated earlier [Image].

Congratulations! The Google Cloud Speech API is now set up and ready to begin taking commands.

Custom Commands

After setting up the Unity plugin you need to set up your own command handler to receive commands based on keywords you define. An example command handler is provided at Assets/GoogleCloudSpeech/Commands/ExampleCommand.cs and below.
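Here’s roughly what a minimal handler looks like. This is a sketch rather than the shipped file (I’m assuming GetKeywords() returns a string array, and the “hello” keyword and log message are just for illustration):

```csharp
using UnityEngine;

namespace Assets.GoogleCloudSpeech.Commands
{
    public class ExampleCommand : ICommand
    {
        public void Start()
        {
            // One-time setup when the plugin registers this handler.
        }

        public string[] GetKeywords()
        {
            // The keywords this handler wants to be notified about.
            return new[] { "hello" };
        }

        public void HandleCommand(string command, string keyword)
        {
            // Called with the full transcript and the keyword that matched.
            Debug.Log("Matched \"" + keyword + "\" in: \"" + command + "\"");
        }
    }
}
```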

The three necessary functions to make a custom command handler are defined in the ICommand interface: Start(), GetKeywords(), and HandleCommand(string command, string keyword).

GetKeywords tells the command dispatcher which keywords your handler is capable of processing, and HandleCommand is called when the command dispatcher finds one of those keywords in the transcribed speech. The dispatcher sends both the keyword that triggered the event and the entire transcript of what the user said.

Multiple handlers can handle the same keyword, and the order in which the commands are dispatched to the handlers is non-deterministic.

Making a Custom Command Handler

For this demo we’ll make a command handler that creates a TextMesh in front of the camera and sets its text to the entire command.

When making a new command there are a few important things to do: make sure your new command is inside the /Assets/GoogleCloudSpeech/Commands/ folder, and name the command handler class appropriately in the style of WhatItDoesCommand.cs. For our example we will be making the new command handler TextMeshCommand.cs.

After creating the new .cs file inside the /Assets/GoogleCloudSpeech/Commands/ folder, open it in Visual Studio to edit the new command handler.

Set the namespace of the class to Assets.GoogleCloudSpeech.Commands. Your command handler will not be found if it is not in this namespace!

Then replace the base class MonoBehaviour with the ICommand interface so that our new class implements ICommand. Your command handler will not be found if it does not implement ICommand!

The first few lines of the new command handler should look like this
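```csharp
using UnityEngine;

namespace Assets.GoogleCloudSpeech.Commands
{
    public class TextMeshCommand : ICommand
    {
        // ... the three ICommand members will go here ...
    }
}
```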

Now we need to implement all three members of the ICommand interface, so create the three methods inside the new command handler as shown below.
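Here’s a sketch of the empty members (again assuming GetKeywords() returns a string array; we’ll fill the bodies in as we go):

```csharp
public void Start()
{
    // Called once by the plugin; we'll do our scene setup here later.
}

public string[] GetKeywords()
{
    // We'll return our keywords here shortly.
    return new string[0];
}

public void HandleCommand(string command, string keyword)
{
    // Called with the full transcript and the keyword that matched.
}
```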

Handling Custom Commands

After the command dispatcher has found a keyword that your command handler can handle, it will call HandleCommand() with both the entire command that was transcribed and the keyword that triggered it. Actually doing things with this information is up to you! For this demo our keywords are simply going to be “show command” and “clear command”, so let’s create two more methods.
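Something like this, with placeholder bodies we’ll fill in shortly (note that GetKeywords() now returns our two keywords):

```csharp
public string[] GetKeywords()
{
    return new[] { "show command", "clear command" };
}

private void ShowCommand(string command)
{
    // Will display the spoken command in the TextMesh (filled in below).
}

private void ClearCommand()
{
    // Will clear the TextMesh (filled in below).
}
```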

And since the keyword the dispatcher matched is passed to HandleCommand, we can use it to quickly identify which method we want to use to handle the command. For the ShowCommand method I decided to substring the command so that the keyword doesn’t show up in the TextMesh.
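One straightforward way to route the command is to switch on the keyword (a sketch; the substring itself happens inside ShowCommand later):

```csharp
public void HandleCommand(string command, string keyword)
{
    switch (keyword)
    {
        case "show command":
            ShowCommand(command);
            break;
        case "clear command":
            ClearCommand();
            break;
    }
}
```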

Since we’re going to be manipulating a game object, we’ll need to set that up. This guide isn’t about Unity but about working with the GoogleCloudSpeech plugin for ICAVE, so I’ll just show the code I’m using in the Start() method.
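A sketch of that setup follows. “getRealPlayerController” is simply the name of the getReal3D player object in my scene, and the position, font, and sizes are arbitrary, so adjust them for your project:

```csharp
private TextMesh _textMesh;

public void Start()
{
    // Empty holder object, parented to the player controller so the
    // text follows the camera when it moves.
    var holder = new GameObject("TextMeshHolder");
    var player = GameObject.Find("getRealPlayerController");
    if (player != null)
        holder.transform.SetParent(player.transform, false);

    // Put the text somewhere the camera can see it.
    holder.transform.localPosition = new Vector3(0f, 1.5f, 3f);

    // A TextMesh needs a font and material to render; use the built-in one.
    var font = Resources.GetBuiltinResource<Font>("Arial.ttf");
    _textMesh = holder.AddComponent<TextMesh>();
    _textMesh.font = font;
    _textMesh.GetComponent<MeshRenderer>().material = font.material;
    _textMesh.anchor = TextAnchor.MiddleCenter;
    _textMesh.fontSize = 64;
    _textMesh.characterSize = 0.05f;

    // Font color and placeholder text.
    _textMesh.color = Color.green;
    _textMesh.text = "Waiting for a command...";
}
```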

This will create an empty game object and attach it to the getRealPlayerController so that when the camera moves, the empty object moves with it. It then creates a TextMesh, sets it up in a position the camera will be able to see, and finally sets the font color and placeholder text.

Now that we can access _textMesh anywhere in our command handler, we can go ahead and edit our ShowCommand and ClearCommand methods to actually edit the text.
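For example (this assumes the transcript starts with the keyword, which is why the Substring works):

```csharp
private void ShowCommand(string command)
{
    // Drop the keyword so it doesn't show up in the TextMesh.
    _textMesh.text = command.Substring("show command".Length).Trim();
}

private void ClearCommand()
{
    _textMesh.text = "";
}
```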

And that’s it! Now when we launch the Unity application, press the button we set for voice recognition, and say “show command {whatever we want to show}” or “clear command”, our brand new command handler will edit the text accordingly.

Here’s the full TextMeshCommand.cs that we made for the demo
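(Assembled from the pieces above, with the same assumptions about the ICommand signatures and the getRealPlayerController lookup.)

```csharp
using UnityEngine;

namespace Assets.GoogleCloudSpeech.Commands
{
    public class TextMeshCommand : ICommand
    {
        private TextMesh _textMesh;

        public void Start()
        {
            // Empty holder object, parented to the player controller so the
            // text follows the camera when it moves.
            var holder = new GameObject("TextMeshHolder");
            var player = GameObject.Find("getRealPlayerController");
            if (player != null)
                holder.transform.SetParent(player.transform, false);

            // Put the text somewhere the camera can see it.
            holder.transform.localPosition = new Vector3(0f, 1.5f, 3f);

            // A TextMesh needs a font and material to render.
            var font = Resources.GetBuiltinResource<Font>("Arial.ttf");
            _textMesh = holder.AddComponent<TextMesh>();
            _textMesh.font = font;
            _textMesh.GetComponent<MeshRenderer>().material = font.material;
            _textMesh.anchor = TextAnchor.MiddleCenter;
            _textMesh.fontSize = 64;
            _textMesh.characterSize = 0.05f;
            _textMesh.color = Color.green;
            _textMesh.text = "Waiting for a command...";
        }

        public string[] GetKeywords()
        {
            return new[] { "show command", "clear command" };
        }

        public void HandleCommand(string command, string keyword)
        {
            switch (keyword)
            {
                case "show command":
                    ShowCommand(command);
                    break;
                case "clear command":
                    ClearCommand();
                    break;
            }
        }

        private void ShowCommand(string command)
        {
            // Drop the keyword so it doesn't show up in the TextMesh.
            _textMesh.text = command.Substring("show command".Length).Trim();
        }

        private void ClearCommand()
        {
            _textMesh.text = "";
        }
    }
}
```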