Build a voice to text search

Sample app

Live Demo Source Code

More examples

GitHub Autocomplete with voice search (iOS)

Voice overlay (iOS)

Voice overlay (Android)

Input - the speech-to-text layer

Because Algolia only searches text, you must first convert your user’s speech to text. If you’re building on top of a voice assistant like Amazon Alexa, you get built-in speech-to-text support. This is also the case if you’re building native iOS or Android apps or explicitly targeting the Chrome browser. For all other web apps, you’ll need an external service. Some options are Google Cloud Speech-to-Text, Azure Cognitive Services, or AssemblyAI.
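As an illustration of the built-in browser support, here is a minimal sketch that wraps the Web Speech API's recognition interface, which Chrome exposes as webkitSpeechRecognition. The function name startBrowserRecognition, the scope parameter (normally window), and the onTranscript callback are assumptions made for this example and are not part of the sample app.

```javascript
// Sketch: use the browser's built-in speech recognition when it exists.
// `scope` is normally `window`; it is passed in so the function can also be
// exercised outside a browser. `onTranscript` is a hypothetical callback.
function startBrowserRecognition(scope, onTranscript, lang = 'en-US') {
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  if (!Recognition) {
    // No built-in support: fall back to an external speech-to-text service.
    return null;
  }

  const recognition = new Recognition();
  recognition.lang = lang;
  recognition.interimResults = true; // Report partial transcripts while the user speaks

  recognition.onresult = event => {
    // Take the most recent result and its top alternative
    const result = event.results[event.results.length - 1];
    onTranscript(result[0].transcript, result.isFinal);
  };

  recognition.start();
  return recognition;
}
```

In a browser you would call startBrowserRecognition(window, callback). A null return value signals that the browser has no built-in recognition and that an external service is needed.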

You need to send the user’s speech to the speech-to-text service, receive the text, and then send that text to Algolia as a search query. For example, the sample app uses Google’s Speech-to-Text API to transcribe the user’s voice input:

function startRecognitionStream(io) {
  this.recognizeStream = this.gcpClient
    .streamingRecognize(this.request)
    .on("error", console.error)
    .on("data", data => {
      // The first result or its top alternative can be missing, for example
      // when the stream reaches its transcription time limit.
      const result = data.results[0];
      const alternative = result && result.alternatives[0];

      if (!alternative) {
        process.stdout.write(`\n\nReached transcription time limit, press Ctrl+C\n`);
        return;
      }

      process.stdout.write(`Transcription: ${alternative.transcript}\n`);

      // Forward the transcript to the connected client
      io.emit("dataFromGCP", alternative.transcript);

      // Stop the speech recognition once the user has stopped talking
      if (result.isFinal) {
        io.emit("endSpeechRecognition", {});
      }
    });
}
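The last step of that flow, sending the received text to Algolia, might look like the following sketch. It assumes a hypothetical search callback that runs the Algolia query. The helper name createTranscriptHandler is also made up for this example. The handler skips empty and repeated transcripts so that streaming updates don't trigger redundant searches.

```javascript
// Sketch: turn incoming transcripts into search queries.
// `search` is a hypothetical callback that sends the query to Algolia.
function createTranscriptHandler(search) {
  let lastQuery = '';

  return function onTranscript(transcript) {
    const query = (transcript || '').trim();

    // Ignore empty transcripts and transcripts that haven't changed
    if (!query || query === lastQuery) {
      return false;
    }

    lastQuery = query;
    search(query);
    return true;
  };
}
```

On the client, the returned function could be registered as the listener for whatever event carries the transcript, such as the "dataFromGCP" event emitted above.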

Output

You can read the search results aloud using text-to-speech or display them as text (as in the sample app).

If you do need text-to-speech support, your choices are:

Built-in. Voice assistants have text-to-speech support. All browsers that implement the SpeechSynthesis API offer support.
Third-party. You can use solutions such as Azure Cognitive Services, Amazon Polly, or Google Cloud Text-to-Speech.

For example, the following code uses the SpeechSynthesis API to announce the titles of search results:

// The search input
const searchInput = document.querySelector('#search-input');

// The search results
const searchResults = document.querySelector('#search-results');

// Create an Algolia InstantSearch instance.
// These are InstantSearch.js v2-style options. Later versions take a
// searchClient instead of appId and apiKey.
const search = instantsearch({
  appId: 'YourApplicationID',
  apiKey: 'YourSearchOnlyAPIKey',
  // Replace with the name of the index you want to search
  indexName: 'YourIndexName',
  // Use the current value of the search input as the initial query
  searchParameters: {
    query: searchInput.value
  }
});

// Initialize the search
search.start();

// Listen for search results
search.on('render', () => {
  // Get the title of search results
  const titles = searchResults
    .querySelectorAll('.Hits-item')
    .map(item => item.querySelector('.title').textContent);

  // Speak the titles using the speechSynthesis API
  titles.forEach(title => {
    const msg = new SpeechSynthesisUtterance(title);
    window.speechSynthesis.speak(msg);
  });
});

Customize this code to fit your specific needs. For example, you could add a button to control when the titles are spoken, or change the output to include information from other attributes.
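As one possible customization, the sketch below builds a single spoken announcement from several attributes of each hit instead of reading only the titles. The attribute names title and price, the helper name buildAnnouncement, and the wording of the announcement are assumptions for this example. Adapt them to your own records.

```javascript
// Sketch: build one announcement from the first few hits.
// `title` and `price` are hypothetical attribute names.
function buildAnnouncement(hits, maxResults = 3) {
  if (hits.length === 0) {
    return 'No results found.';
  }

  const spoken = hits.slice(0, maxResults).map((hit, i) => {
    // Mention the price only when the record has one
    const price = hit.price !== undefined ? `, ${hit.price} dollars` : '';
    return `${i + 1}: ${hit.title}${price}`;
  });

  return `Top ${spoken.length} results. ${spoken.join('. ')}.`;
}
```

The returned string can be passed to a single SpeechSynthesisUtterance, for example from the click handler of a "read results" button, so that users decide when the results are spoken.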

Algolia settings

Set removeStopWords to true or the appropriate language code (for example, en). This will remove words like “a,” “an,” or “the” that don’t add value to the query.
Set ignorePlurals to true or the appropriate language code. This makes words like “car” and “cars” equivalent.
Send the entire query string as optionalWords. When searching conversationally, searchers might use words that aren’t in your index. Making all words optional means that records don’t need to match every word, but records matching more words will rank higher.
Use analyticsTags to identify searches as voice-driven.

As an alternative to setting removeStopWords and ignorePlurals individually, you can use the naturalLanguages parameter to set both behaviors with a single parameter.
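The settings above could be assembled into search parameters as in the following sketch. The helper names, the default language code en, and the voice tag are assumptions for this example. Passing the query as a single-element array to optionalWords relies on Algolia splitting each string into words, so verify that behavior against the API reference for your client version.

```javascript
// Sketch: search parameters for a voice-driven query, set individually.
function buildVoiceSearchParams(query, language = 'en') {
  return {
    removeStopWords: [language], // Drop words like "a," "an," or "the"
    ignorePlurals: [language],   // Treat "car" and "cars" as equivalent
    optionalWords: [query],      // Records don't need to match every word
    analyticsTags: ['voice'],    // Hypothetical tag marking voice searches
  };
}

// Sketch: the same idea, using naturalLanguages in place of the first two settings.
function buildNaturalLanguageParams(query, language = 'en') {
  return {
    naturalLanguages: [language],
    optionalWords: [query],
    analyticsTags: ['voice'],
  };
}
```

Either object can be sent along with the query in the search request of whichever Algolia client you use.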

Add dynamic filters with Rules

To help users refine their search to find more relevant results, consider adding rules to apply filters based on what they say.
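As a rough sketch of such a rule, the function below builds a rule object that detects a spoken color, removes it from the query, and applies it as a filter. The color attribute, the objectID format, and the helper name buildColorFilterRule are hypothetical. The field names follow the Rules schema as commonly documented (conditions, consequence, params), so check them against the Rules API reference before relying on them.

```javascript
// Sketch: a rule that turns a spoken term into a filter.
// `color` is a hypothetical attribute that must be configured for filtering.
function buildColorFilterRule(color) {
  return {
    objectID: `voice-filter-color-${color}`,
    // Trigger whenever the query contains the color term
    conditions: [{ pattern: color, anchoring: 'contains' }],
    consequence: {
      params: {
        // Remove the term from the query and filter on it instead
        query: { edits: [{ type: 'remove', delete: color }] },
        filters: `color:${color}`,
      },
    },
  };
}
```

With a rule like this saved to the index, a spoken query such as "show me red shoes" would search for the remaining words while restricting results to records whose color is red.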