Speech Recognition in the Browser

Special guest and Microsoft Engineer Jakub Jedryszek recently shared a presentation on speech recognition in the browser at our campus in Seattle. In case you missed it, read on to find out how you can integrate speech recognition into your app in pure JavaScript. For some background, read his first article on voice commands in the browser.

Last Thursday I had the pleasure of giving a talk about speech recognition in the browser at Code Fellows in Seattle.

Many people were surprised at how easy it is to add speech recognition to a website with pure JavaScript, so I thought I would share a few code snippets here. It only works in Chrome so far.

Recognizing speech

This is how you can translate speech to text:

var sr = new webkitSpeechRecognition();
sr.onresult = function (evt) {
    console.log(evt.results[0][0].transcript);
}
sr.start();
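
Since this only works in Chrome so far, it may be worth checking that the constructor exists before using it. The following is a minimal sketch rather than something from the talk: the helper name getRecognitionConstructor is hypothetical, and it also checks the unprefixed SpeechRecognition name from the Web Speech API specification in case a browser exposes it.

```javascript
// Hypothetical helper: returns the speech recognition constructor
// exposed on the given global object, or null if there is none.
function getRecognitionConstructor(globalObj) {
    return globalObj.SpeechRecognition ||
        globalObj.webkitSpeechRecognition ||
        null;
}

// Usage in a browser: only start recognition if it is supported.
if (typeof window !== 'undefined') {
    var Recognition = getRecognitionConstructor(window);
    if (Recognition) {
        var recognizer = new Recognition();
        recognizer.onresult = function (evt) {
            console.log(evt.results[0][0].transcript);
        };
        recognizer.start();
    } else {
        console.log('Speech recognition is not supported in this browser.');
    }
}
```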

You can also get the confidence level of the result:

var sr = new webkitSpeechRecognition();
sr.onresult = function (evt) {
    console.log(evt.results[0][0].transcript, evt.results[0][0].confidence);
}
sr.start();

You can get interim results:

sr.interimResults = true;	// false by default
sr.onresult = function(evt) {
    for (var i = 0; i < evt.results.length; ++i) {
        console.log(evt.results[i][0].transcript);
    }
};
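
With interim results enabled, the same phrase can show up several times as the recognizer refines it. Each entry in evt.results carries an isFinal flag, so one way to tell settled text from text that may still change is sketched below. The helper name splitTranscripts is hypothetical.

```javascript
// Hypothetical helper: separates final text from interim text.
// `results` is evt.results (or any array-like list of results).
function splitTranscripts(results) {
    var finalText = '';
    var interimText = '';
    for (var i = 0; i < results.length; ++i) {
        // The first alternative of each result holds the transcript.
        var transcript = results[i][0].transcript;
        if (results[i].isFinal) {
            finalText += transcript;
        } else {
            interimText += transcript;
        }
    }
    return { finalText: finalText, interimText: interimText };
}

// Usage in a browser:
// sr.interimResults = true;
// sr.onresult = function (evt) {
//     var parts = splitTranscripts(evt.results);
//     console.log('final:', parts.finalText, 'interim:', parts.interimText);
// };
```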

Or different alternatives of recognized speech:

sr.maxAlternatives = 10;	// default = 1
sr.onresult = function(evt) {
    for (var i = 0; i < evt.results[0].length; ++i) {
        console.log(evt.results[0][i].transcript);
    }
};
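
Each alternative has its own confidence value, like the one logged earlier. As a small sketch, the most confident alternative of a single result could be picked like this. The helper name mostConfident is hypothetical.

```javascript
// Hypothetical helper: returns the alternative with the highest
// confidence from one result (e.g. evt.results[0]), or null if
// the result has no alternatives.
function mostConfident(result) {
    var best = null;
    for (var i = 0; i < result.length; ++i) {
        if (best === null || result[i].confidence > best.confidence) {
            best = result[i];
        }
    }
    return best;
}

// Usage in a browser:
// sr.maxAlternatives = 10;
// sr.onresult = function (evt) {
//     var best = mostConfident(evt.results[0]);
//     if (best) { console.log(best.transcript, best.confidence); }
// };
```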

You can set a language, e.g., to Polish:

sr.lang = 'pl-PL';

In all of the examples above, recognition stops when you stop speaking. To keep it running, you need to set the continuous flag to true. Additionally, this will treat every fragment of your speech as an interim result, so you need to update the onresult callback, too:

sr.continuous = true;	// false by default
sr.onresult = function(evt) {
    console.log(evt.results[evt.results.length-1][0].transcript);
};
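
In continuous mode evt.results keeps growing, which is why the callback above reads only the last entry. The event also has a resultIndex property that points at the first result that changed, so everything new since the previous event can be collected in one go. A minimal sketch, with the hypothetical helper name changedTranscripts:

```javascript
// Hypothetical helper: collects the transcripts of all results
// that changed in this event, starting at evt.resultIndex.
function changedTranscripts(evt) {
    var transcripts = [];
    for (var i = evt.resultIndex; i < evt.results.length; ++i) {
        transcripts.push(evt.results[i][0].transcript);
    }
    return transcripts;
}

// Usage in a browser:
// sr.continuous = true;
// sr.onresult = function (evt) {
//     console.log(changedTranscripts(evt).join(' '));
// };
```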

The speech recognition object has other callbacks (besides onresult) that you can take advantage of:

sr.onstart = function() { console.log("onstart"); }
sr.onend = function() { console.log("onend"); }
sr.onspeechstart = function() { console.info("speech start"); }
sr.onspeechend = function() { console.info("speech end"); }
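
Another callback worth knowing is onerror, which receives an event whose error property is a short code string. The sketch below maps a few of the codes defined in the Web Speech API specification to readable messages. The helper name and the message wording are made up for illustration.

```javascript
// Hypothetical helper: turns a recognition error code into a message.
// Only a few of the codes from the specification are listed here.
function describeRecognitionError(code) {
    var messages = {
        'no-speech': 'No speech was detected.',
        'audio-capture': 'Audio capture failed.',
        'not-allowed': 'Permission to use speech input was denied.',
        'network': 'A network error occurred.'
    };
    if (Object.prototype.hasOwnProperty.call(messages, code)) {
        return messages[code];
    }
    // Fall back to the raw code for anything not listed above.
    return 'Recognition error: ' + code;
}

// Usage in a browser:
// sr.onerror = function (evt) {
//     console.log(describeRecognitionError(evt.error));
// };
```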

Emitting speech

var msg = new SpeechSynthesisUtterance('Hi, I\'m Jakub!');
speechSynthesis.speak(msg);

You can also change the speaker voice:

var voices = window.speechSynthesis.getVoices();
msg.voice = voices[10]; // Note: some voices don't support altering params
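
Picking a voice by index can be fragile, because the list differs between browsers and systems, and getVoices() can return an empty list until the browser has loaded its voices (a voiceschanged event fires on speechSynthesis once they are available). A sketch of choosing by language instead, with the hypothetical helper name pickVoice:

```javascript
// Hypothetical helper: returns the first voice whose lang matches
// the requested language tag, or null if there is no such voice.
function pickVoice(voices, lang) {
    for (var i = 0; i < voices.length; ++i) {
        if (voices[i].lang === lang) {
            return voices[i];
        }
    }
    return null;
}

// Usage in a browser:
// speechSynthesis.onvoiceschanged = function () {
//     var voice = pickVoice(speechSynthesis.getVoices(), 'en-US');
//     if (voice) { msg.voice = voice; }
// };
```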

There are also other options you can set:

msg.volume = 1; // 0 to 1
msg.pitch = 2; //0 to 2
msg.text = 'Hello World';
msg.lang = 'en-US';

msg.onend = function(e) {
    // Use the event passed to the callback rather than a global `event`.
    console.log('Finished in ' + e.elapsedTime + ' seconds.');
};
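
If the values for these options come from user input, it may help to keep them inside the ranges noted in the comments above. This is only a sketch: the helper names clamp and applySpeechOptions are hypothetical, and it covers just volume and pitch.

```javascript
// Hypothetical helper: limits a number to the range [min, max].
function clamp(value, min, max) {
    return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copies volume and pitch onto an utterance,
// keeping volume within 0 to 1 and pitch within 0 to 2.
function applySpeechOptions(msg, options) {
    if (options.volume !== undefined) {
        msg.volume = clamp(options.volume, 0, 1);
    }
    if (options.pitch !== undefined) {
        msg.pitch = clamp(options.pitch, 0, 2);
    }
    return msg;
}

// Usage in a browser:
// var msg = new SpeechSynthesisUtterance('Hello World');
// speechSynthesis.speak(applySpeechOptions(msg, { volume: 0.8, pitch: 1.5 }));
```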

Summary

Speech is coming to the browser. The question is not if, but when, most websites will add voice support. Check out voiceCmdr, a library that I blogged about earlier this year, which makes it very easy to add voice commands to your website. You can also check out this website that can be navigated with voice commands. You can find the entire logic for voice command support here (lines 38-103).


Read more articles by Jakub on his blog: http://jj09.net/
