ESP32 can already act as a fully functional Alexa client, that is, as a voice assistant.
ESP32 now also supports Dialogflow, a voice-enabled conversational interface from Google. It enables IoT users to include a natural language user interface in their devices.
Compared with voice assistants, Dialogflow offers
- reduced complexity,
- pay-as-you-go pricing,
- custom wake words, instead of having to use ‘Okay Google’ or ‘Alexa’,
- and no certification hassles, because hey, you aren’t integrating with Alexa or Google Assistant; you are building one of your own.
Unlike voice assistants, Dialogflow lets you configure every step of the conversation, and it won’t answer unrelated trivia or general questions the way voice assistants typically do. For example, a Dialogflow agent for a laundry project will provide information only about the configurable parameters of the laundry (such as state, temperature, and wash cycle).
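To make the laundry example a little more concrete, here is a minimal sketch of how device-side code might react to the intents such an agent resolves. Everything in it is an assumption made for illustration: the `laundry_t` struct, the `laundry_handle_intent` function, and the intent names such as `laundry.get_state` are hypothetical and are not part of the VA SDK or of Dialogflow itself. In a real project the intents and their responses are configured in the Dialogflow agent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device state: the parameters the agent is allowed to talk about. */
typedef struct {
    const char *state;       /* e.g. "running", "idle" */
    int temperature_c;       /* water temperature in degrees Celsius */
    const char *wash_cycle;  /* e.g. "cotton", "delicate" */
} laundry_t;

/* Hypothetical handler: writes a reply for a known intent into `reply`.
 * Returns 0 for a known intent and -1 for anything outside the agent's scope. */
int laundry_handle_intent(const laundry_t *dev, const char *intent,
                          char *reply, size_t reply_len)
{
    assert(dev != NULL && intent != NULL && reply != NULL && reply_len > 0);

    if (strcmp(intent, "laundry.get_state") == 0) {
        snprintf(reply, reply_len, "The laundry is %s.", dev->state);
    } else if (strcmp(intent, "laundry.get_temperature") == 0) {
        snprintf(reply, reply_len,
                 "The water temperature is %d degrees Celsius.",
                 dev->temperature_c);
    } else if (strcmp(intent, "laundry.get_wash_cycle") == 0) {
        snprintf(reply, reply_len, "The wash cycle is %s.", dev->wash_cycle);
    } else {
        /* Not a laundry question: the agent does not answer general trivia. */
        snprintf(reply, reply_len, "Sorry, I can only help with the laundry.");
        return -1;
    }
    return 0;
}
```

The point of the sketch is the final branch: anything that is not one of the configured laundry intents gets a polite refusal rather than a general-purpose answer.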
This is now a part of Espressif’s Voice Assistant SDK and is available on GitHub here: https://github.com/espressif/esp-va-sdk. To get started, see this.
The underlying technologies used by the Dialogflow implementation in the VA SDK include:
- gRPC
- Google Protobufs
- HTTP/2
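To give a feel for how these three layers fit together on the wire, here is a small sketch, not the SDK's actual code. It shows two generic building blocks: Protocol Buffers encodes integers as base-128 varints, and gRPC prefixes each serialized message with a 1-byte compression flag and a 4-byte big-endian length before sending it in HTTP/2 DATA frames. The function names `pb_encode_varint` and `grpc_frame_message` are hypothetical helpers written for this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode an unsigned integer as a Protocol Buffers base-128 varint.
 * `out` must have room for up to 10 bytes. Returns the bytes written. */
size_t pb_encode_varint(uint64_t value, uint8_t *out)
{
    assert(out != NULL);
    size_t n = 0;
    while (value >= 0x80) {
        /* Low 7 bits of the value, with the continuation bit set. */
        out[n++] = (uint8_t)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    out[n++] = (uint8_t)value; /* Last byte: continuation bit clear. */
    return n;
}

/* Wrap an already serialized protobuf message in the gRPC length prefix:
 * 1 byte compressed flag (0 = uncompressed) + 4 byte big-endian length.
 * Returns the total bytes written, or 0 if `out` is too small. */
size_t grpc_frame_message(const uint8_t *msg, uint32_t msg_len,
                          uint8_t *out, size_t out_cap)
{
    assert(out != NULL);
    assert(msg != NULL || msg_len == 0);

    if (out_cap < 5 || out_cap - 5 < msg_len) {
        return 0;
    }
    out[0] = 0;                          /* not compressed */
    out[1] = (uint8_t)(msg_len >> 24);   /* length, most significant byte first */
    out[2] = (uint8_t)(msg_len >> 16);
    out[3] = (uint8_t)(msg_len >> 8);
    out[4] = (uint8_t)(msg_len);
    if (msg_len > 0) {
        memcpy(out + 5, msg, msg_len);
    }
    return (size_t)msg_len + 5;
}
```

For instance, the value 300 encodes to the two varint bytes `0xAC 0x02`, and a 3-byte message becomes an 8-byte gRPC frame. On a real device a protobuf library and an HTTP/2 stack would handle this; the sketch only shows what those layers produce.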
You can see a demo video of Dialogflow on ESP32 LyraT below:
Note that the current Dialogflow SDK does not yet include support for creating custom wake words. Conversations initiated with a tap-to-talk button are supported.