Home Assistant Voice Recognition with Rhasspy

With the impending demise of Snips, I’ve been looking for a suitable replacement offline speech recognition solution. After some research, Rhasspy seems like a real winner. Besides supporting a variety of toolkits, it has good documentation and is fairly easy to get working.

Running Rhasspy on a Raspberry Pi 3B with stock Debian, the Pi really struggled at times. You might be able to alleviate this by picking different services or adjusting other configuration, but you may be better off using a more powerful device (like a Pi 4 or Jetson Nano) or running Rhasspy remotely.

Installation

Normally, I like to go through manual installation. But installing Pocketsphinx and OpenFST for Jasper was enough of a headache that I decided to go the container route.

Follow the Rhasspy installation docs. I’m running both Hass and Rhasspy on the same Raspberry Pi. From my PC I connect to the Pi as pi3.local; adjust this based on the name of your device, or use its IP address. If you’re working directly on the device, everything is localhost.

If you haven’t already, install docker using the convenience script:

```
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
```

You can run Rhasspy with the recommended command:

```
docker run -d -p 12101:12101 \
      --restart unless-stopped \
      -v "$HOME/.config/rhasspy/profiles:/profiles" \
      --device /dev/snd:/dev/snd \
      synesthesiam/rhasspy-server:latest \
      --user-profiles /profiles \
      --profile en
```

Or, use docker-compose:

Install docker-compose via alternative install options:
```
 sudo pip install docker-compose
```

Use the recommended docker-compose.yml:

```
rhasspy:
    image: "synesthesiam/rhasspy-server:latest"
    restart: unless-stopped
    volumes:
        - "$HOME/.config/rhasspy/profiles:/profiles"
    ports:
        - "12101:12101"
    devices:
        - "/dev/snd:/dev/snd"
    command: --user-profiles /profiles --profile en
```

Run: docker-compose up

If docker-compose up fails with ImportError: No module named ssl_match_hostname, see this issue:

```
# Remove problematic `ssl-match-hostname`
sudo pip uninstall backports.ssl-match-hostname docker-compose
# Install alternative `ssl-match-hostname`
sudo apt-get install -y python-backports.ssl-match-hostname \
    python-backports.shutil-get-terminal-size
# Reinstall docker-compose
sudo pip install docker-compose
```

Docker Shell

When running things with docker, it takes an extra step to have a shell in the context of the container.

Show running containers with docker ps or docker container ls:

```
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS              PORTS                      NAMES
4181a2880c84        synesthesiam/rhasspy-server:latest   "/run.sh --user-prof…"   26 hours ago        Up 4 minutes        0.0.0.0:12101->12101/tcp   pi_rhasspy_1
```

Get a shell to the container:

```
docker exec -it pi_rhasspy_1 /bin/bash
# Now you're in the container
root@4181a2880c84:/#
```

Replace pi_rhasspy_1 with the “container id” or “name” of the appropriate container.
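
If you’d rather not copy the name by hand, a small script can look it up. This is a minimal Python 3.7+ sketch, assuming the `docker` CLI is on your `PATH`; it relies on `docker ps --format` with a Go template to print one `name image` pair per running container, and the helper names are hypothetical.

```python
import subprocess


def names_for_image(ps_output, image):
    """Return container names whose image matches `image`.

    `ps_output` is the text printed by:
        docker ps --format '{{.Names}} {{.Image}}'
    """
    names = []
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == image:
            names.append(parts[0])
    return names


def running_rhasspy_names(image="synesthesiam/rhasspy-server:latest"):
    # Ask Docker for "name image" pairs of the running containers
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}} {{.Image}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return names_for_image(out, image)
```

With the container above running, `running_rhasspy_names()` would be expected to return `['pi_rhasspy_1']`.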

Configuration

Once docker outputs `rhasspy_1 | Running on https://0.0.0.0:12101 (CTRL + C to quit)`, Rhasspy should be up and running. Ignore the https in that message and use http instead: point your browser at http://pi3.local:12101.
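
For scripted setups, you could poll the web UI instead of watching the log. A minimal Python 3 sketch using only the standard library; `wait_for_http` is a hypothetical helper, and it treats any HTTP response (even an error status) as meaning the server is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=2.0):
    """Poll `url` until it answers or `timeout` seconds pass.

    Returns True once any HTTP response comes back, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status
            return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet (connection refused, DNS failure, timeout)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_http("http://pi3.local:12101", timeout=120)` blocks until Rhasspy’s web server responds or two minutes pass.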

At this point I was able to configure everything via the Settings tab. Should that not cooperate, everything can also be done via JSON.

Audio

The first things to get working are audio input and output. Refer back to an earlier post about working with ALSA.

Settings > Microphone (“Audio Recording”)
1. Use arecord directly (ALSA) (default is PyAudio)
2. Select appropriate Input Device
Settings > Sounds (“Audio Playing”)
1. Use aplay directly (ALSA)
2. Select appropriate Output Device

To verify that audio recording and playback work, use arecord and aplay from a docker shell.
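
To find the `hw:CARD,DEVICE` name to pass to `arecord -D` or `aplay -D`, list the hardware with `arecord -l` (or `aplay -l`). A minimal Python sketch that parses that listing, assuming the usual `card N: ID [Name], device M: ...` line format; `alsa_devices` is a hypothetical helper.

```python
import re

# Matches lines such as:
#   card 1: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
_CARD_LINE = re.compile(r"^card (\d+): \S+ \[(.*?)\], device (\d+):")


def alsa_devices(listing):
    """Turn `arecord -l` / `aplay -l` output into (hw name, card description) pairs."""
    devices = []
    for line in listing.splitlines():
        m = _CARD_LINE.match(line)
        if m:
            card, desc, dev = m.groups()
            devices.append(("hw:%s,%s" % (card, dev), desc))
    return devices
```

Subdevice lines and the `****` header don’t match the pattern, so only one entry per card/device pair is returned.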

If you want to use PyAudio instead of ALSA for input, it’s handy to see what PyAudio sees:

```
# Install pyaudio
sudo apt-get install -y python-pyaudio
# Launch python REPL
python
```

Then, run the following (from SO#1, SO#2):

```
import pyaudio

p = pyaudio.PyAudio()
# List everything PyAudio knows about every device
for i in range(p.get_device_count()):
    print(p.get_device_info_by_index(i))

## OR

# List only devices that can record, on the first host API
info = p.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')
for i in range(numdevices):
    dev = p.get_device_info_by_host_api_device_index(0, i)
    if dev.get('maxInputChannels') > 0:
        print("Input Device id %d - %s" % (i, dev.get('name')))

p.terminate()
```

Rhasspy TTS

Testing text-to-speech also seems to be the easiest way to validate that your audio output is working.

Settings > Text to Speech
- eSpeak didn’t work for me, but both flite and pico-tts did
On the Speech tab, enter hello in Sentence and click Speak
Check the Log tab for FliteSentenceSpeaker lines to see, e.g., the command line it’s using

Intent Recognition

One way to validate audio input is to set up Rhasspy to recognize intents.

Settings > Intent Recognition
- Default OpenFST should work
Sentences tab to configure recognized intents
- Uses a simplified JSGF syntax
Speech tab, use Hold to Record or Tap to Record for mic input

Saying “what time is it” should output:

```
{
  "intent": {
    "entities": {},
    "hass_event": {
      "event_data": {},
      "event_type": "rhasspy_GetTime"
    },
    "intent": {
      "confidence": 1,
      "name": "GetTime"
    },
    "raw_text": "what time is it"
  }
}
```
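
When scripting against Rhasspy, the fields that matter most in that output are the intent name and the Hass event type it maps to (judging by the example, `rhasspy_` plus the intent name). A minimal Python sketch that pulls them out, assuming exactly the JSON shape shown above; `summarize_intent` is a hypothetical helper.

```python
import json


def summarize_intent(payload):
    """Extract the spoken text, intent name, confidence and Hass event type."""
    result = json.loads(payload)["intent"]  # outer wrapper object
    return {
        "text": result.get("raw_text"),
        "intent": result["intent"]["name"],
        "confidence": result["intent"]["confidence"],
        "hass_event": result.get("hass_event", {}).get("event_type"),
    }
```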

Wake word

Another way to validate audio input is to set up a wake phrase that triggers Rhasspy to recognize intents (e.g. “Hey Siri”, “OK Google”).

Settings > Wake Word
PocketSphinx is the only fully open/offline option
- “Wake Keyphrase” is the trigger phrase
Save Settings and wait for Rhasspy to restart
Train (mentioned in the docs)
Check Log for PocketsphinxWakeListener: Hotword detected

If your wake keyphrase contains a new word, the log will complain it’s not in dictionary.txt after you Save Settings:

```
[WARNING:955754080] PocketsphinxWakeListener: XXX not in dictionary
[DEBUG:3450672] PocketsphinxWakeListener: Loading wake decoder with hmm=/profiles/en/acoustic_model, dict=/profiles/en/dictionary.txt
```

It seems like either adding a custom word via the Words tab and/or hitting Train should fix this, but I haven’t yet figured out the correct incantation.
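
To see up front which words of a keyphrase are missing, you can check them against the dictionary file named in the log (`/profiles/en/dictionary.txt` inside the container). A minimal Python sketch, assuming the usual Pocketsphinx format of one word per line followed by its phonemes, with alternate pronunciations written as `word(2)`; `missing_words` is a hypothetical helper.

```python
def missing_words(keyphrase, dictionary_text):
    """Return the words of `keyphrase` with no entry in a Pocketsphinx dictionary.

    Each dictionary line is a word followed by its phonemes, e.g. "hello HH AH L OW".
    Alternate pronunciations look like "hello(2) ...", so the "(N)" suffix is stripped.
    """
    known = set()
    for line in dictionary_text.splitlines():
        parts = line.split()
        if parts:
            known.add(parts[0].split("(")[0].lower())
    return [w for w in keyphrase.lower().split() if w not in known]
```

For example, reading `/profiles/en/dictionary.txt` and calling `missing_words("okay rhasspy", text)` lists whatever words would trigger the warning above.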

Hass Integration

Integrating with Home Assistant is accomplished by leveraging Hass’ REST API: Rhasspy POSTs each recognized intent as an event to the /api/events/<event_type> endpoint.
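
Rhasspy sends this request for you once it is configured, but it helps to see what the request amounts to, for example to fake an intent from another machine while debugging. A minimal Python 3 sketch using only the standard library; the helper names are hypothetical, and `token` is the long-lived access token created in the steps below.

```python
import json
import urllib.request


def hass_event_request(base_url, token, event_type, data=None):
    """Build a POST to Home Assistant's /api/events/<event_type> endpoint."""
    body = json.dumps(data or {}).encode("utf-8")
    return urllib.request.Request(
        "%s/api/events/%s" % (base_url.rstrip("/"), event_type),
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer %s" % token,
            "Content-Type": "application/json",
        },
    )


def fire_event(base_url, token, event_type, data=None):
    # Send the request and return Hass' JSON reply
    req = hass_event_request(base_url, token, event_type, data)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fire_event("http://pi3.local:8123", "<TOKEN>", "rhasspy_GetTime")` should look to Hass much like saying “what time is it”.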

Hass: Create long-lived access token
1. Open Hass user profile: http://pi3.local:8123/profile
2. Long-Lived Access Tokens > Create Token
  - Also read the Hass authentication docs
Rhasspy: Configure intent handling with Hass
1. Open Rhasspy: http://pi3.local:12101
2. Settings > Intent Handling
3. Hass URL http://172.17.0.1:8123 (the Docker host as seen from inside the container; 172.17.0.2 is the container itself)
  - If not using Docker, you could use localhost instead
4. Access Token the token from above
5. Save Settings > OK to restart

Check Hass REST API is working:

```
# Replace `<TOKEN>` with Hass Long-lived access token
curl -X GET -H "Authorization: Bearer <TOKEN>" -H "Content-Type: application/json" http://pi3.local:8123/api/
```

Should return:

```
{"message": "API running."}
```
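
The same check can be scripted. A minimal Python 3 sketch with hypothetical helper names; only the `/api/` endpoint, the bearer-token header and the expected message come from the curl example above.

```python
import json
import urllib.request


def api_running(body_text):
    """True if a response body is Hass' {"message": "API running."}."""
    return json.loads(body_text).get("message") == "API running."


def check_hass_api(base_url, token, timeout=10):
    # Same request as the curl command: GET /api/ with a bearer token
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/",
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return api_running(resp.read().decode("utf-8"))
```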

Note that from within the container you can’t reach services on the host via localhost, since localhost there refers to the container itself. There are a few ways around this, but that’s why we’re using 172.17.0.1 above:

```
# Shell into container
docker exec -it pi_rhasspy_1 /bin/bash
# Try Hass REST API to `localhost`
# Replace `<TOKEN>` with Hass Long-lived access token
curl -X GET -H "Authorization: Bearer <TOKEN>" -H "Content-Type: application/json" http://localhost:8123/api/
curl: (7) Failed to connect to localhost port 8123: Connection refused
```

Let’s test the Rhasspy->Hass connection:

Open Hass: http://pi3.local:8123
Developer Tools > Events > Listen to events
- Enter rhasspy_GetTime and click Start Listening
Like for intent recognition, say “what time is it”

Hass should output:

```
{
  "event_type": "rhasspy_GetTime",
  "data": {},
  "origin": "REMOTE",
  "time_fired": "2019-12-17T16:02:51.366090+00:00",
  "context": {
    "id": "012345678901234567890123456789",
    "parent_id": null,
    "user_id": "deadbeefdeadbeefdeadbeefdeadbeef"
  }
}
```

Let’s test Hass automation:

Open Hass: http://pi3.local:8123
Configuration > Automation > +
Create an Event trigger:
- Triggers
  - Trigger type: Event
  - Event type: rhasspy_GetTime
- Actions
  - Action type: Call service
  - Service: system_log.write
  - Service data: {message: 'Hello event'}
Like for intent recognition, say “what time is it”
In Hass, Developer Tools > Logs should show the message.

Hass TTS

To use Rhasspy’s TTS, we can call its REST API:

```
curl -X POST -d "hello world" http://pi3.local:12101/api/text-to-speech
```
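
The same request from Python, for scripts that want Rhasspy to speak. A minimal Python 3 sketch with hypothetical helper names; like `curl -d`, `urllib` sends the text as the raw POST body with a form-encoded content type.

```python
import urllib.request


def rhasspy_tts_request(base_url, text):
    """Build the same POST as the curl call: plain-text body to /api/text-to-speech."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/text-to-speech",
        data=text.encode("utf-8"),
        method="POST",
    )


def speak(base_url, text, timeout=30):
    # Send the request; Rhasspy plays the synthesized audio itself
    with urllib.request.urlopen(rhasspy_tts_request(base_url, text), timeout=timeout) as resp:
        return resp.status
```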

To trigger this from Hass, we can use the RESTful Command integration. In configuration.yaml:

```
rest_command:
  tts:
    url: http://localhost:12101/api/text-to-speech
    method: POST
    payload: '{{ message }}'
```

The payload is a Jinja2 template whose variables (here `message`) are supplied by the caller.

Test the tts REST command:

Open Hass: http://pi3.local:8123
Developer Tools > Services
Select the rest_command.tts service, with service data message: "hello"
Call Service to trigger Rhasspy TTS
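
The Developer Tools call corresponds to Hass’ REST endpoint `POST /api/services/<domain>/<service>`, so the same test can be scripted. A minimal Python 3 sketch with hypothetical helper names, reusing the long-lived access token from earlier.

```python
import json
import urllib.request


def hass_service_request(base_url, token, domain, service, data):
    """Build a POST to /api/services/<domain>/<service> with JSON service data."""
    return urllib.request.Request(
        "%s/api/services/%s/%s" % (base_url.rstrip("/"), domain, service),
        data=json.dumps(data).encode("utf-8"),
        method="POST",
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
    )


def call_service(base_url, token, domain, service, data, timeout=30):
    # Send the service call and return Hass' JSON reply
    req = hass_service_request(base_url, token, domain, service, data)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `call_service("http://pi3.local:8123", "<TOKEN>", "rest_command", "tts", {"message": "hello"})` should make Rhasspy say “hello”.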

Let’s add it to our Hass automation:

Configuration > Automation
Edit the previous item (click the pencil ✎)
Add Action:
- Action type: Call service
- Service: rest_command.tts (it should auto-complete for you)
- Service data: {message: 'hello world'}
Like for intent recognition, say “what time is it”

This should trigger a full loop:

speech -> Rhasspy -> intent -> Hass -> text -> Rhasspy -> speech

Systemd

I’d like Rhasspy to auto-start similar to Hass.

It would seem that mixing docker with systemd is bad mojo, making me contemplate re-installing Hass via docker. Docker says little on starting containers with systemd other than don’t cross the streams with restarts. And so far Google has turned up dubious results, mostly from several years ago, that don’t work with current versions of docker.

Create /etc/systemd/system/rhasspy@homeassistant.service:

```
[Unit]
Description=Rhasspy
Wants=home-assistant@homeassistant.service
Requires=docker.service
After=home-assistant@homeassistant.service docker.service

[Service]
Type=exec
ExecStart=/usr/bin/docker run --rm \
	--name rhasspy \
	-p 12101:12101 \
	-v "/home/homeassistant/.config/rhasspy/profiles:/profiles" \
	--device /dev/snd:/dev/snd \
	synesthesiam/rhasspy-server:latest \
	--user-profiles /profiles --profile en
ExecStop=/usr/bin/docker stop rhasspy
# Restart on failure
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```


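| Directive | Purpose | Reference |
|---|---|---|
| `Wants`/`Requires`/`After` | Docker must be running, and ideally Hass is (but we can start Rhasspy without it) | `systemd.unit(5)` |
| `Type` | Stronger requirement than `simple`, ensuring the process actually starts | `systemd.service(5)` |
| `ExecStart` | Start the container | |
| `ExecStop` | Stop the container by name | |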

For ExecStart, note a few differences from the original docker run:


| Option | Notes |
|---|---|
| `--rm` | Remove the container on exit. Otherwise we get “name taken” errors on restarts. |
| `--name` | Give it a predictable name to simplify `ExecStop` and make it easier to open docker shells. |
| `-v` | Uses an absolute path instead of `$HOME`. Docker defaults to creating files as root. `/srv/` might be better, but I thought this would make the profiles easier to find. |
| `--restart unless-stopped` | Removed, since systemd is managing the lifetime. |

Configure it to auto-start and start it:

```
sudo systemctl --system daemon-reload
sudo systemctl enable rhasspy@homeassistant
sudo systemctl start rhasspy@homeassistant
```

To debug:

```
# Check running containers
docker container ls
# Check log output
sudo journalctl -f -u rhasspy@homeassistant
# Open docker shell
docker exec -it rhasspy /bin/bash
```

Note: if you forget to replace $HOME in the docker run command, the service will fail with:

```
Dec 18 19:00:25 pi3 docker[4764]: /usr/bin/docker: invalid reference format.
```
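
Since this error only surfaces in the journal after systemd tries to start the unit, a quick check before `daemon-reload` can save a round trip. A minimal Python sketch (hypothetical helper) that flags any line of a unit file still referencing `$HOME` or `${HOME}`; it doesn’t understand unit-file syntax, it just searches the text.

```python
import re

# "$HOME" or "${HOME}", but not e.g. "$HOMEDIR"
_HOME_VAR = re.compile(r"\$(HOME\b|\{HOME\})")


def home_references(unit_text):
    """Return (line number, line) pairs in a unit file that still mention $HOME."""
    return [
        (n, line.strip())
        for n, line in enumerate(unit_text.splitlines(), start=1)
        if _HOME_VAR.search(line)
    ]
```

Reading `/etc/systemd/system/rhasspy@homeassistant.service` and passing its text in should return an empty list once the path is absolute.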

Rendered Obsolete