After the initial install of Home Assistant, I've been eager to get some basic voice recognition working. One of my early goals was for it to be "offline", meaning it doesn't use Amazon or Google.
Hardware
- Raspberry Pi 3 model B
- Running Raspbian Stretch
- A USB microphone
I was originally working with the HD-3000, but wasn't very happy with the recording quality. I'm still experimenting with the ReSpeaker, but it definitely seems better. In any case, configuration was pretty similar, and the same likely goes for any other USB microphone.
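Before touching any audio configuration, a quick way to confirm the Pi sees the USB devices at all is `lsusb` (part of `usbutils`, which Raspbian ships by default). No output shown here since it depends entirely on what's plugged in:

# Both the microphone and the USB speaker should appear on the bus
lsusb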
Basic ALSA Audio
First, we need to get audio working: both a microphone and a speaker.
Good, concise documentation that explains what’s going on with Raspberry Pi/Debian audio has eluded me thus far. Most of this is extracted from random forum posts, Stack Overflow, and a smattering of trial and error.
You can record from a microphone with `arecord`. Abridged `arecord --help` output:
Usage: arecord [OPTION]... [FILE]...
-l, --list-devices list all soundcards and digital audio devices
-L, --list-pcms list device names
-D, --device=NAME select PCM by name
-t, --file-type TYPE file type (voc, wav, raw or au)
-c, --channels=# channels
-f, --format=FORMAT sample format (case insensitive)
-r, --rate=# sample rate
-d, --duration=# interrupt after # seconds
-v, --verbose show PCM structure and setup (accumulative)
List the various devices with `arecord -l`:
**** List of CAPTURE Hardware Devices ****
card 1: Dummy [Dummy], device 0: Dummy PCM [Dummy PCM]
<SNIP>
card 2: ArrayUAC10 [ReSpeaker 4 Mic Array (UAC1.0)], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
Then, `arecord -L`:
null
Discard all samples (playback) or generate zero samples (capture)
default
<Bunch of CARD=Dummy>
sysdefault:CARD=ArrayUAC10
ReSpeaker 4 Mic Array (UAC1.0), USB Audio
Default Audio Device
<Bunch of CARD=ArrayUAC10 speakers/output>
dmix:CARD=ArrayUAC10,DEV=0
ReSpeaker 4 Mic Array (UAC1.0), USB Audio
Direct sample mixing device
dsnoop:CARD=ArrayUAC10,DEV=0
ReSpeaker 4 Mic Array (UAC1.0), USB Audio
Direct sample snooping device
hw:CARD=ArrayUAC10,DEV=0
ReSpeaker 4 Mic Array (UAC1.0), USB Audio
Direct hardware device without any conversions
plughw:CARD=ArrayUAC10,DEV=0
ReSpeaker 4 Mic Array (UAC1.0), USB Audio
Hardware device with all software conversions
To record using the ReSpeaker (card `ArrayUAC10`):
# `-d 3` records for 3 seconds (otherwise `Ctrl+c` to stop)
# `-D` sets the PCM device
arecord -d 3 -D hw:ArrayUAC10 tmp_file.wav
It may output:
Recording WAVE 'tmp_file.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
arecord: set_params:1299: Sample format non available
Available formats:
- S16_LE
Like `arecord -L` says, `hw:` is "Direct hardware device without any conversions". We either need to record in a supported format, or use `plughw:` ("Hardware device with all software conversions"). Either of these works:
arecord -d 3 -D plughw:ArrayUAC10 tmp_file.wav
# `-f S16_LE` signed 16-bit little endian
# `-c 6` six channels
# `-r 16000` 16kHz
arecord -f S16_LE -c 6 -r 16000 -d 3 -D hw:ArrayUAC10 tmp_file.wav
You can get a list of supported parameters with `arecord --dump-hw-params -D hw:ArrayUAC10`:
HW Params of device "hw:ArrayUAC10":
--------------------
ACCESS: MMAP_INTERLEAVED RW_INTERLEAVED
FORMAT: S16_LE
SUBFORMAT: STD
SAMPLE_BITS: 16
FRAME_BITS: 96
CHANNELS: 6
RATE: 16000
PERIOD_TIME: [1000 2730625]
PERIOD_SIZE: [16 43690]
PERIOD_BYTES: [192 524280]
PERIODS: [2 1024]
BUFFER_TIME: [2000 5461313)
BUFFER_SIZE: [32 87381]
BUFFER_BYTES: [384 1048572]
TICK_TIME: ALL
--------------------
In online resources you'll see values similar to `hw:2,0`, which means "card 2, device 0". Looking at the `arecord -l` output, it's the same as `hw:ArrayUAC10` since the ReSpeaker only has the one device.
You can play the recorded audio with `aplay`. Looking at the output from `aplay -L`, I can:
aplay -D plughw:SoundLink tmp_file.wav
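If there's no recording handy, `speaker-test` (also part of alsa-utils) can generate sound on the same playback device; `SoundLink` here is just the card name from my `aplay -L` output:

# `-c 2` two channels, `-t wav` plays voice samples instead of pink noise; Ctrl+C to stop
speaker-test -D plughw:SoundLink -c 2 -t wav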
If you're not hearing anything, also check `alsamixer` to make sure the output is not muted and the volume isn't 0.
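`alsamixer -c <card>` opens the mixer for a specific card, and `amixer` can do the same non-interactively. The control name varies per device (it might be `PCM`, `Speaker`, etc.), so list them first; `PCM` below is just a placeholder:

alsamixer -c 2                     # interactive mixer for card 2 (F6 switches cards)
amixer -c 2 scontrols              # list the mixer controls this card exposes
amixer -c 2 sset PCM 80% unmute    # then raise/unmute whichever control it actually has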
There are at least two configuration files that can affect the behaviour of `arecord`/`aplay`:
- `/etc/asound.conf`
- `~/.asoundrc`
For example, after changing the default sound card via Audio Device Settings, my `~/.asoundrc` contains:
pcm.!default {
    type hw
    card 2
}
ctl.!default {
    type hw
    card 2
}
If I check `aplay -l`, "card 2" is my Bose Revolve SoundLink USB speaker:
**** List of PLAYBACK Hardware Devices ****
card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]
<SNIP>
card 0: ALSA [bcm2835 ALSA], device 1: bcm2835 IEC958/HDMI [bcm2835 IEC958/HDMI]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 0: ALSA [bcm2835 ALSA], device 2: bcm2835 IEC958/HDMI1 [bcm2835 IEC958/HDMI1]
Subdevices: 1/1
Subdevice #0: subdevice #0
card 1: Dummy [Dummy], device 0: Dummy PCM [Dummy PCM]
<SNIP>
card 2: SoundLink [Bose Revolve SoundLink], device 0: USB Audio [USB Audio]
Subdevices: 1/1
Subdevice #0: subdevice #0
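Card indices can shuffle between boots when multiple USB audio devices are attached, so it can be more robust to reference cards by name. Here's a minimal sketch of a hand-written `~/.asoundrc` (my own variation, not what the Audio Device Settings dialog generates) that splits default playback and capture between the two devices:

# Default playback on the Bose, default capture on the ReSpeaker
pcm.!default {
    type asym
    playback.pcm "plughw:SoundLink"
    capture.pcm "plughw:ArrayUAC10"
}
ctl.!default {
    type hw
    card SoundLink
}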
Voice Recognition with Snips
At first I was thrilled to find Snips:
- Completely offline (once you create and download the assistant)
- Good integration with Home Assistant
- Simple to install and configure
But after initial (successful) experimentation, there are a few largish problems:
- May be issues on Debian Buster
- Dubious support for devices other than Raspberry Pi
- Worst of all, post-acquisition they’re killing the “console” web-app you need to make assistants
Oops. Hopefully it will return in another form.
Install
Following the manual setup instructions:
sudo apt-get install -y dirmngr
sudo bash -c 'echo "deb https://raspbian.snips.ai/$(lsb_release -cs) stable main" > /etc/apt/sources.list.d/snips.list'
sudo apt-key adv --fetch-keys https://raspbian.snips.ai/531DD1A7B702B14D.pub
sudo apt-get update
sudo apt-get install -y snips-platform-voice
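To confirm the platform packages actually landed (not part of the official instructions, just a sanity check):

# List the installed Snips packages and their versions
dpkg -l 'snips-*'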
Create an Assistant
An "assistant" defines what voice commands Snips handles. You need to create one via the (soon-to-be-shut-down) Snips Console:
- Click Add an App
- Click + of apps of interest
- Click Add Apps button
- Wait for training to complete
- Click Deploy Assistant button
- Download and install manually
pc> scp ~/Downloads/assistant_proj_XYZ.zip pi@pi3.local:~
pc> ssh pi@pi3.local
sudo rm -rf /usr/share/snips/assistant/
sudo unzip ~/assistant_proj_1mE9N2ylKWa.zip -d /usr/share/snips/
sudo systemctl restart 'snips-*'
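If Snips doesn't behave after the restart, it's worth checking the zip unpacked where the services expect it; per the commands above, the assistant should end up directly under `/usr/share/snips/assistant/`:

# Should show the contents of the deployed assistant
ls /usr/share/snips/assistant/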
At this point Snips should be working. If triggered with the wake word (default is `hey snips`), it should send "intents" over MQTT.
Verification/Troubleshooting
Check all services are green and `active (running)`:
sudo systemctl status 'snips-*'
Initially, the Snips Audio Server was unable to start. Check output in syslog:
tail -f /var/log/syslog
It was unable to open the “default” audio capture device:
Dec 5 07:22:25 pi3 snips-audio-server[28216]: INFO:snips_audio_alsa::capture: Starting ALSA capture on device "default"
Dec 5 07:22:25 pi3 snips-audio-server[28216]: ERROR:snips_audio_server : an error occured in the audio pipeline: Error("snd_pcm_open", Sys(ENOENT))
Dec 5 07:22:25 pi3 snips-audio-server[28216]: -> caused by: ALSA function 'snd_pcm_open' failed with error 'ENOENT: No such file or directory'
We could set the "default" device. Or, `/etc/snips.toml` contains platform configuration where we can specify values from above:
[snips-audio-server]
alsa_capture = "plughw:ArrayUAC10"
alsa_playback = "plughw:SoundLink"
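After editing `/etc/snips.toml`, restart the audio server (or all the Snips services) and watch syslog again to confirm the capture device opens cleanly:

sudo systemctl restart snips-audio-server
tail -f /var/log/syslog | grep snips-audio-server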
snips-watch shows a lot of information:
sudo apt-get install -y snips-watch
snips-watch -vv
I added the weather app to my Snips assistant. So, if I say, “hey snips, what’s the weather?” snips-watch should output:
[15:00:52] [Hotword] detected on site default, for model hey_snips
[15:00:52] [Asr] was asked to stop listening on site default
[15:00:52] [Hotword] was asked to toggle itself 'off' on site default
[15:00:52] [Dialogue] session with id 'e39a4367-e167-467c-912a-e047f49bea7a' was started on site default
[15:00:52] [Asr] was asked to listen on site default
[15:00:54] [Asr] captured text "what 's the weather" in 2.0s with tokens: what[0.950], 's[0.950], the[1.000], weather[1.000]
[15:00:54] [Asr] was asked to stop listening on site default
[15:00:55] [Nlu] was asked to parse input "what 's the weather"
[15:00:55] [Nlu] detected intent searchWeatherForecast with confidence score 1.000 for input "what 's the weather"
[15:00:55] [Dialogue] New intent detected searchWeatherForecast with confidence 1.000
Instead of snips-watch, you can use any MQTT client:
sudo apt-get install -y mosquitto-clients
# Subscribe to all topics
mosquitto_sub -p 1883 -t "#"
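Subscribing to `#` also shows the raw audio frames the Snips audio server streams over MQTT, which gets noisy fast. To watch only recognized intents, narrow the topic (`-v` prints the topic alongside each payload):

mosquitto_sub -p 1883 -t "hermes/intent/#" -v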
Home Assistant and Snips
Now that we know Snips is working, we can integrate it with Home Assistant.
Snips uses MQTT by default, and Hass has optional MQTT integration. You can either:
- Have Hass use Snips' broker
  - The Hass documentation incorrectly says the Snips broker is running on port 9898. Currently the default is 1883, but consult `/etc/snips.toml` (or check the listening port directly, as in the sketch after this list).
- Have Snips use Hass' broker
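A quick way to confirm which port the broker bundled with Snips is actually listening on (a generic check, not something from the Snips docs):

# Expect mosquitto listening on 1883 (or whatever snips.toml says)
sudo ss -tlnp | grep 1883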
Since we did everything from scratch, Hass doesn't have a broker. So, we should point Hass at the instance that got installed with Snips. In `configuration.yaml`:
# Enable snips (VERY IMPORTANT)
snips:

# Setup MQTT
mqtt:
  broker: 127.0.0.1
  port: 1883
Restart Hass and from the UI pick ☰ > Developer Tools > MQTT > Listen to a Topic, enter `hermes/intent/#` (all Snips intents), then Start Listening.
Now say "hey snips, what's the weather" and you should see a message for the `searchWeatherForecast` intent pop up.
To test TTS, in Developer Tools > Services try the `snips.say` service with data `text: hello` and Call Service. You should be greeted by a robo-voice from the speaker.
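The same robo-voice can be triggered without Hass by publishing straight to the Hermes TTS topic. The topic and payload fields here (`text`, `siteId`) are my reading of the Hermes protocol rather than something verified against this exact Snips version, so treat it as a sketch:

mosquitto_pub -p 1883 -t "hermes/tts/say" -m '{"text": "hello from MQTT", "siteId": "default"}'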
Let's try a basic intent script triggered by the intent. In `configuration.yaml`:
intent_script:
  searchWeatherForecast:
    speech:
      text: 'Hello intent'
    action:
      - service: system_log.write
        data_template:
          message: 'Hello intent'
          level: warning
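Before restarting, it doesn't hurt to validate the configuration; assuming the `hass` executable is on the PATH of the Home Assistant user (and using its default config directory), something like:

# Reports YAML/config errors without starting Home Assistant
hass --script check_config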
Again, restart Hass and then say "hey snips, what's the weather". Now when Hass receives the intent, the TTS engine will say "hello intent" and output the same to Developer > Logs.
The End?
It's a total bummer that the future of Snips is uncertain, because it was perfect for voice-controlled home automation. But then, that's likely why it was acquired.