Build a voice to text search
Your app needs three things to build a voice search experience:
- A voice input using a speech-to-text service
- An output to display results
- The suggested Algolia settings for optimizing your voice search.
Sample app
More examples
Input - the speech-to-text layer
Since Algolia only handles text searching, you must convert your user’s speech to text. If you’re building on top of a voice assistant like Amazon Alexa, you get built-in speech-to-text support. This is also the case if you’re building iOS or Android native apps or explicitly targeting the Chrome browser. For all other web apps, you’ll need an external service. Some options are Google Cloud Speech to Text, Azure Cognitive Services, or AssemblyAI.
You need to send the user’s speech to the speech-to-text service, receive the text, and then send that text to Algolia as a search query. For example, the sample app uses Google’s Speech-to-Text API to transcribe the user’s voice input:
function startRecognitionStream(io) {
  this.recognizeStream = this.gcpClient
    .streamingRecognize(this.request)
    .on("error", console.error)
    .on("data", data => {
      // A response without a result means the transcription time limit was reached
      const result = data.results[0];
      const alternative = result && result.alternatives[0];
      process.stdout.write(
        alternative
          ? `Transcription: ${alternative.transcript}\n`
          : `\n\nReached transcription time limit, press Ctrl+C\n`
      );
      // Nothing to send to the client if there is no transcript
      if (!alternative) {
        return;
      }
      io.emit("dataFromGCP", alternative.transcript);
      // Stop the speech recognition once the user stops talking
      if (result.isFinal) {
        io.emit("endSpeechRecognition", {});
      }
    });
}
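Once a transcript arrives, the remaining step is to send it to Algolia as the query. As a minimal sketch, assuming the same response shape as the handler above (results[0].alternatives[0].transcript and results[0].isFinal), the helpers below forward only final transcripts to a search callback. The names extractTranscript, createTranscriptHandler, and runSearch are hypothetical and not part of the sample app.

```javascript
// Hypothetical helper: pull the top transcript out of one streaming response.
// Returns null when the response carries no result (for example, at the time limit).
function extractTranscript(data) {
  const result = data && data.results && data.results[0];
  const alternative = result && result.alternatives && result.alternatives[0];
  if (!alternative) {
    return null;
  }
  return {
    text: (alternative.transcript || "").trim(),
    isFinal: Boolean(result.isFinal),
  };
}

// Hypothetical helper: returns a "data" handler that calls runSearch only for
// final, non-empty transcripts. runSearch is a callback you supply, for
// example one that sends the text to Algolia as the search query.
function createTranscriptHandler(runSearch) {
  return data => {
    const transcript = extractTranscript(data);
    if (transcript && transcript.isFinal && transcript.text !== "") {
      runSearch(transcript.text);
    }
  };
}
```

Waiting for the final result avoids sending a search request for every interim transcript. If you want results to update while the user is still speaking, you could forward interim transcripts as well.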
Output
You can convert the search results into speech or display them as text (as in the sample app).
If you do need text-to-speech support, your choices are:
- Built-in. Voice assistants have text-to-speech support. All browsers that implement the SpeechSynthesis API offer support.
- Third-party. You can use solutions such as Azure Cognitive Services, AWS Polly, or Google Cloud Text-to-speech.
For example, the following code uses the SpeechSynthesis API to announce the titles of search results:
// The search input
const searchInput = document.querySelector('#search-input');
// The search results
const searchResults = document.querySelector('#search-results');
// Create an Algolia InstantSearch instance
const search = instantsearch({
  appId: 'YourApplicationID',
  apiKey: 'YourSearchOnlyAPIKey',
  // Replace with the name of the index you want to search
  indexName: 'YourIndexName',
  // Use the current value of the search input as the initial query
  searchParameters: {
    query: searchInput.value
  }
});
// Initialize the search
search.start();
// Listen for search results
search.on('render', () => {
  // Get the titles of the search results.
  // querySelectorAll returns a NodeList, which has no map(), so convert it to an array first.
  const titles = Array.from(searchResults.querySelectorAll('.Hits-item'))
    .map(item => item.querySelector('.title').textContent);
  // Speak the titles using the SpeechSynthesis API
  titles.forEach(title => {
    const msg = new SpeechSynthesisUtterance(title);
    window.speechSynthesis.speak(msg);
  });
});
Customize this code to fit your specific needs. For example, you could add a button to control when the titles are spoken, or change the output to include information from other attributes.
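As one possible customization, the sketch below builds the spoken text from several attributes of each hit instead of the title alone. The helpers describeHit and announceHits, and the attribute names used with them, are hypothetical. The speak function is passed in so you can decide when and how the text is spoken, for example from a button's click handler.

```javascript
// Hypothetical helper: build the sentence to speak for one hit.
// `attributes` lists the record attributes to read, in order.
// Attributes that are missing or empty on the hit are skipped.
function describeHit(hit, attributes) {
  return attributes
    .map(name => hit[name])
    .filter(value => value !== undefined && value !== null && value !== "")
    .join(", ");
}

// Hypothetical helper: describe each hit and hand the text to `speak`.
// In a browser, `speak` could be:
//   text => window.speechSynthesis.speak(new SpeechSynthesisUtterance(text))
function announceHits(hits, attributes, speak) {
  hits
    .map(hit => describeHit(hit, attributes))
    .filter(text => text !== "")
    .forEach(text => speak(text));
}
```

Injecting speak keeps the formatting logic independent of the browser, so the same code can feed a third-party text-to-speech service.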
Algolia settings
- Set removeStopWords to true or the appropriate language code (for example, en). This removes words like “a,” “an,” or “the” that don’t add value to the query.
- Set ignorePlurals to true or the appropriate language code. This makes words like “car” and “cars” equivalent.
- Send the entire query string as optionalWords. When searching conversationally, searchers might use words that aren’t in your index. Making all words optional means that records don’t need to match every word, but records matching more words will rank higher.
- Use analyticsTags if you want to identify a search as being voice-driven.
As an alternative to setting removeStopWords and ignorePlurals individually, you can use the naturalLanguages parameter to set both these behaviors in one call.
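The settings above can be combined into one set of search parameters sent with each voice query. The following is a minimal sketch rather than the sample app's code: buildVoiceSearchParams is a hypothetical helper, and the "voice" tag and the default language "en" are assumptions chosen for illustration.

```javascript
// Hypothetical helper: build Algolia search parameters for a spoken query.
// `language` is a language code such as "en" (assumed default).
function buildVoiceSearchParams(transcript, language = "en") {
  const query = transcript.trim();
  return {
    query,
    // Drop words like "a", "an", or "the" for the given language
    removeStopWords: [language],
    // Treat singular and plural forms as equivalent
    ignorePlurals: [language],
    // Make every word of the spoken query optional
    optionalWords: query,
    // Assumed tag name, used to identify voice searches in analytics
    analyticsTags: ["voice"],
  };
}
```

Pass the returned object as the search parameters of your Algolia query. If you prefer the naturalLanguages parameter, you could replace the removeStopWords and ignorePlurals entries with naturalLanguages: [language].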
Add dynamic filters with Rules
To help users refine their search to find more relevant results, consider adding rules to apply filters based on what they say.
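As an illustration, a rule of this kind could look like the sketch below, which applies a price filter when the spoken query contains the word “cheap” and removes that word from the query. The objectID, the trigger word, the price attribute, and the threshold are all hypothetical. Adapt them to your own index before saving the rule through the dashboard or the API.

```javascript
// Hypothetical rule: when the query contains "cheap", filter on price
// and remove the word from the query text.
const cheapRule = {
  objectID: "voice-cheap-filter", // assumed identifier
  conditions: [
    // Trigger whenever "cheap" appears anywhere in the query
    { pattern: "cheap", anchoring: "contains" },
  ],
  consequence: {
    params: {
      // Assumes records have a numeric "price" attribute
      filters: "price < 50",
      // Remove the trigger word so it isn't matched as text
      query: { edits: [{ type: "remove", delete: "cheap" }] },
    },
  },
};
```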