Google Assistant with snowboy Hotword Recognition (circa March 2018)
March 31, 2018

With a Seeed Studio ReSpeaker 2-Mics Pi HAT, it becomes possible to move my voice recognition project over to a Raspberry Pi 3. Hotword recognition will be done with snowboy from KITT.AI. The good news is that the ReSpeaker HAT seems to work well. The bad news is that there is now some sort of incompatibility between snowboy and the Google Assistant Service.

In this post, I will show a slight change I made to the new version of snowboydecoder.py that I believe will be helpful when using snowboy's command-recording capability, especially on small systems relying on SD cards for storage. I will also add a number of new demonstration programs based on those from KITT.AI to explore the different ways to combine snowboy and Google Assistant, and to make it easier to experiment with the many options of snowboydecoder.py.

Table of Contents

  1. In the Beginning
  2. Audio Input and Output
  3. Speech to Text Prerequisites
  4. Installing snowboy and Voice Recognition
  5. Modifying snowboydecoder.py and demo4.py
  6. Google Assistant
  7. Google Assistant and snowboy Not Playing Nice
  8. Observations
  9. Downloads

  1. In the Beginning

    As before, I used Etcher as per the instructions at raspberrypi.org to burn the newest Raspbian image available from the Raspberry Pi Foundation (Raspbian Stretch Lite, 2017-11-29), which can be found here.

    Before burning the image, you should uncheck the Auto-unmount on success option in the Etcher Settings. If this is not done, it will be necessary to remove and reinsert the SD card in the desktop SD card reader to perform the next step, which consists of creating an empty file called ssh in the card's boot partition in order to configure the RPi3 without a monitor and keyboard. A working Ethernet connection to the Raspberry Pi will initially be necessary in order to configure it.

    michel@hp:~$ sudo touch /media/michel/boot/ssh

    Using zenmap to scan my local network, I was able to start an ssh session on the Raspberry Pi.

    michel@hp:~$ ssh pi@192.168.0.134
    pi@192.168.0.134's password: raspberry   (not echoed to the screen)
    Linux raspberrypi 4.9.59-v7+ #1047 SMP Sun Oct 29 12:19:23 GMT 2017 armv7l
    ...
    SSH is enabled and the default password for the 'pi' user has not been changed.
    This is a security risk - please login as the 'pi' user and type 'passwd' to set a new password.

    I used raspi-config to change the configuration to suit my situation.

    pi@raspberrypi:~ $ sudo raspi-config
    1. Change password for the current user (mandatory)
    2. Configure network settings
      • N1 Hostname on the network to rpi3
      • N2 Provide Wi-fi credentials (network name and password)
    3. Boot Options - did nothing
    4. Localisation Options
      • I1 Change Locale to add fr_CA
      • I2 Change Timezone - to America/Moncton
    5. Interfacing Options
      • P2 SSH - enabled (mandatory)
      • P4 SPI - enabled
      • P5 I2C - enabled
    6. Overclock - did nothing
    7. Advanced Options
      • A1 Expand Filesystem
      • A3 Memory Split - minimum 16 MB for the GPU
      • A4 Audio - 1 Force 3.5mm ('headphone') jack

    I rebooted as asked and then updated and upgraded the system.

    pi@raspberrypi:~ $ sudo apt update && sudo apt upgrade
    Get:1 http://mirrordirector.raspbian.org/raspbian stretch InRelease [15.0 kB]
    ...
    94 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
    Need to get 147 MB of archives.
    After this operation, 290 kB of additional disk space will be used.
    Do you want to continue? [Y/n] y

    Again I rebooted and this time logged in via the wireless network.

    michel@hp:~$ ssh pi@192.168.0.135
    pi@192.168.0.135's password:
    Linux rpi3 4.9.80-v7+ #1098 SMP Fri Mar 9 19:11:42 GMT 2018 armv7l

    Note how the kernel version is now 4.9.80 (2018-03-09) while the initial version was 4.9.59 (2017-10-29). To complete my initial set up, I installed git.

    pi@rpi3:~ $ sudo apt update
    ...
    pi@rpi3:~ $ sudo apt install git

    I also added the Python 3 virtual environment utilities as explained in Python 3 virtual environments.

    While not mandatory and not even recommended, I decided to use static IP addresses for both the Ethernet and the Wi-Fi interfaces. I just get tired of doing network scans to find headless systems whenever the router has to be restarted, which happens just a bit too often in this household. The following command confirmed that "classic" network interface names are still in use.

    pi@rpi3:~ $ ls /sys/class/net
    eth0  lo  wlan0

    Following the instructions for the dhcpcd method of setting up static addresses on the Raspberry Pi StackExchange, I first backed up the dhcpcd configuration file and then edited it.

    pi@rpi3:~ $ sudo cp /etc/dhcpcd.conf /etc/dhcpcd.conf.bak
    pi@rpi3:~ $ sudo nano /etc/dhcpcd.conf

    ...
    # Example static IP configuration:
    #interface eth0
    #static ip_address=192.168.0.10/24
    #static ip6_address=fd51:42f8:caae:d92e::ff/64
    #static routers=192.168.0.1
    #static domain_name_servers=192.168.0.1 8.8.8.8 fd51:42f8:caae:d92e::1

    interface eth0
    static ip_address=192.168.1.34/24
    static routers=192.168.1.1
    static domain_name_servers=192.168.1.1

    interface wlan0
    static ip_address=192.168.1.35/24
    static routers=192.168.1.1
    static domain_name_servers=192.168.1.1
    ...

    Finally, I shut down the Raspberry Pi and made a backup copy of its SD card on my desktop computer.

    michel@hp:~/Téléchargements/Devices/RPi/Stretch$ sudo umount /dev/sde2
    michel@hp:~/Téléchargements/Devices/RPi/Stretch$ sudo umount /dev/sde1
    michel@hp:~/Téléchargements/Devices/RPi/Stretch$ sudo dd bs=4M if=/dev/sde of=backup-rasbian-18-03-17.img

  2. Audio Input and Output

    My previous post showed how to install the drivers for the ReSpeaker 2-Mics Pi HAT. Here is a summary of what needs to be done.

    With the power off, plug the card onto the Raspberry Pi GPIO header and then connect powered speakers to the 3.5 mm jack on the card. Clone two git repositories: the first contains the LED drivers that will be used later; the second contains the sound capture and playback drivers.

    pi@rpi3:~ $ git clone https://github.com/respeaker/mic_hat.git
    ...
    pi@rpi3:~ $ git clone https://github.com/respeaker/seeed-voicecard.git
    ...
    pi@rpi3:~ $ cd seeed-voicecard
    pi@rpi3:~/seeed-voicecard $ sudo ./install.sh 2mic
    ...
    ------------------------------------------------------
    Please reboot your raspberry pi to apply all settings
    Enjoy!
    ------------------------------------------------------
    pi@rpi3:~/seeed-voicecard $ reboot

  3. Speech to Text Prerequisites

    Quite a few packages must be installed in order to use the snowboy and SpeechRecognition Python libraries. Fortunately, installing all the prerequisites is pretty straightforward in Raspbian Stretch (Debian 9) compared to installing them in DietPi Armbian Jessie (Debian 8). For example, the GNU compilers and make utility are already installed.

    pi@rpi3:~ $ apt-cache policy build-essential
    build-essential:
      Installed: 12.3
      Candidate: 12.3
      Version table:
     *** 12.3 500
            500 http://mirrordirector.raspbian.org/raspbian stretch/main armhf Packages
            100 /var/lib/dpkg/status
    ...
    pi@rpi3:~ $ g++ --version
    g++ (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516

    The Simplified Wrapper and Interface Generator (SWIG) is needed to create some Python wrappers of C/C++ libraries. It turns out that snowboy needs version 3.0.10. A check shows that the latest version available in the Stretch repository is recent enough, but it needs to be installed.

    pi@rpi3:~ $ apt-cache policy swig
    swig:
      Installed: (none)
      Candidate: 3.0.10-1.1
      Version table:
         3.0.10-1.1 500
            500 http://mirrordirector.raspbian.org/raspbian stretch/main armhf Packages
    pi@rpi3:~ $ sudo apt-get install swig
    ...
    Need to get 1,510 kB of archives.
    After this operation, 5,588 kB of additional disk space will be used.
    Do you want to continue? [Y/n] y
    ...
    Setting up swig (3.0.10-1.1) ...
    pi@rpi3:~ $ ls -l /usr/bin/swig*
    lrwxrwxrwx 1 root root       7 Nov 28  2016 /usr/bin/swig -> swig3.0
    -rwxr-xr-x 1 root root 1428152 Nov 28  2016 /usr/bin/swig3.0

    Compiling the snowboy Python wrapper requires the ATLAS (Automatically Tuned Linear Algebra Software) package. It automatically generates an optimized Basic Linear Algebra Subroutines (BLAS) library. It also provides a subset of the linear algebra routines from the Linear Algebra Package (LAPACK) library. It remains amazing to me that such a big and complex package, at least in my estimation, used to monitor a continuous stream of sound to pick out a key word, occupies one of the four cores of the Raspberry Pi 3 for only a small percentage of its time.

    pi@rpi3:~ $ sudo apt-get install libatlas-base-dev
    ...
    Need to get 10.5 MB of archives.
    After this operation, 49.6 MB of additional disk space will be used.
    Do you want to continue? [Y/n] y

    Both SpeechRecognition and snowboy rely on the PyAudio Python module, which is a wrapper for the cross-platform audio I/O library PortAudio. Of course, the latter must be present.

    pi@rpi3:~ $ sudo apt-get install portaudio19-dev
    Reading package lists... Done
    ...
    Need to get 779 kB of archives.
    After this operation, 2,832 kB of additional disk space will be used.
    Do you want to continue? [Y/n] y

    SpeechRecognition also requires FLAC (Free Lossless Audio Codec), an open source lossless alternative to MP3.

    pi@rpi3:~ $ sudo apt-get install flac
    Reading package lists... Done
    ...
    Do you want to continue? [Y/n] y
    ...
    Setting up flac (1.3.2-1) ...
    ...

  4. Installing snowboy and Voice Recognition

    At this point I created a directory and started a Python virtual environment.

    pi@rpi3:~$ mkdir hestia
    pi@rpi3:~$ cd hestia
    pi@rpi3:~/hestia $ mkvenv venv
    creating virtual environment /home/pi/hestia/venv
    updating virtual environment /home/pi/hestia/venv

    The command mkvenv creates and updates a Python 3 virtual environment, while ve activates it. This is documented in a previous post: Python 3 virtual environments.

    Install the Python PyAudio module.

    (venv) pi@rpi3:~/hestia $ pip install PyAudio
    ...
    Successfully installed PyAudio-0.2.11
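
    Before going any further, it is easy to confirm that PyAudio can see the ReSpeaker card. This little check is my own addition, not part of the KITT.AI demos (the exact card name shown by the seeed driver may differ):

    import pyaudio

    # list every audio device PortAudio can see; the ReSpeaker card
    # (named something like "seeed-2mic-voicecard") should be among them
    pa = pyaudio.PyAudio()
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        print(i, info['name'],
              'in:', info['maxInputChannels'],
              'out:', info['maxOutputChannels'])
    pa.terminate()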

    At last, snowboy can be installed. First we get the source from the GitHub repository and then SWIG is used to create the missing scripts.

    (venv) pi@rpi3:~/hestia $ git clone https://github.com/Kitt-AI/snowboy.git
    Cloning into 'snowboy'...
    (venv) pi@rpi3:~/hestia $ cd snowboy/swig/Python3
    (venv) pi@rpi3:~/hestia/snowboy/swig/Python3 $ make
    ...

    Aside from a warning, everything seemed to go well. The demonstration scripts in examples/Python3 in the parent directory can now be tested.

    (venv) pi@rpi3:~/hestia/snowboy/swig/Python3 $ cd ../../examples/Python3
    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ python demo.py resources/snowboy.umdl
    Traceback (most recent call last):
      File "demo.py", line 1, in <module>
        import snowboydecoder
      File "/home/dietpi/snow2/examples/Python3/snowboydecoder.py", line 5, in <module>
        from . import snowboydetect
    SystemError: Parent module '' not loaded, cannot perform relative import

    That was a bit of an anticlimax. However, it was relatively easy to fix the problem: just remove the relative import of snowboydetect in snowboydecoder.py.

    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ nano snowboydecoder.py

    #!/usr/bin/env python
    import collections
    import pyaudio
    #from . import snowboydetect
    import snowboydetect     # or just delete "from ."
    import time
    import wave
    import os
    import logging
    ...
    (save and exit nano)

    Now the first three demonstration scripts work:

    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ python demo.py resources/models/snowboy.umdl
    Listening... Press Ctrl+C to exit
    ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
    ...
    ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
    ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
    ALSA lib pcm_dsnoop.c:556:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
    ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
    ALSA lib pcm_dsnoop.c:556:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
    "hello" "snowman" "snowbot" "snowboy"
    INFO:snowboy:Keyword 1 detected at time: 2018-03-18 19:41:34   (it works!)
    ^C   (Ctrl+C breaks out of the loop)

    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ python demo2.py resources/models/snowboy.umdl resources/alexa/alexa_02092017.umdl
    Listening... Press Ctrl+C to exit
    ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
    ...
    ALSA lib pcm.c:2239:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
    "alex" "alexi" "alexa"
    INFO:snowboy:Keyword 2 detected at time: 2017-11-24 12:43:10   (success!)
    say: "snowboy"
    INFO:snowboy:Keyword 1 detected at time: 2017-11-24 12:43:12   (success!)
    ^C   (Ctrl+C breaks out of the loop)

    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ python demo3.py resources/snowboy.wav resources/models/snowboy.umdl
    Hotword Detected!   (as expected)
    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ python demo3.py resources/ding.wav resources/models/snowboy.umdl
    Hotword Not Detected!   (as expected)

    There is a new universal hotword model: jarvis.umdl. Be careful, the file actually contains two models, so it cannot be used as a drop-in replacement for snowboy.umdl or alexa.umdl. The ApplyFrontend function has an increased role which is unfortunately not well documented. Look at the code in my modified version of snowboydecoder.py for the details that I have gleaned.
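
    As an illustration, and assuming that HotwordDetector accepts a list of sensitivities (as demo2.py suggests) as well as the new apply_frontend flag, loading jarvis.umdl would look something like this sketch:

    import snowboydecoder

    # jarvis.umdl bundles two models, hence the two sensitivities;
    # apply_frontend=True appears to be expected for this model
    detector = snowboydecoder.HotwordDetector(
        'resources/models/jarvis.umdl',
        sensitivity=[0.8, 0.8],
        apply_frontend=True)

    detector.start(detected_callback=snowboydecoder.play_audio_file,
                   sleep_time=0.03)
    detector.terminate()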

    The fourth demonstration script, which was not there back in November of last year, was a surprise. It requires the SpeechRecognition Python module.

    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ pip install -U SpeechRecognition
    ...
    Successfully installed SpeechRecognition-3.8.1
    (venv) pi@rpi3:~/hestia/snowboy/examples/Python3 $ python demo4.py resources/models/snowboy.umdl
    Listening... Press Ctrl+C to exit
    ...
    ALSA lib pcm_dmix.c:990:(snd_pcm_dmix_open) The dmix plugin supports only playback stream
    ALSA lib pcm_dsnoop.c:556:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
    "snowboy how are you today?"
    INFO:snowboy:Keyword 1 detected at time: 2018-03-19 00:53:48
    recording audio... converting audio to text
    how are you today
    "snowboy turn the light on"
    INFO:snowboy:Keyword 1 detected at time: 2018-03-20 21:56:22
    recording audio... converting audio to text
    turn the light on

    Looking at the source code, it is obvious that a new callback argument has been added to the detector.start function that makes it easier to perform continuous speech recognition. Here is the documentation about the callback found in the source file.

    if [audio_recorder_callback is] specified, this will be called after a keyword has been spoken and after the phrase immediately after the keyword has been recorded. The [callback] function will be passed the name of the file where the phrase was recorded.
    As the example shows, a verbal command can be given in a single sentence beginning with the hotword and followed by a phrase. detector.start records the whole phrase, strips out the hotword and passes on the rest of the recorded phrase as a recorded file. SpeechRecognition will not have to record any sound itself.
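
    In outline, demo4.py's audio recorder callback simply hands that file over to the SpeechRecognition module; here is a paraphrased sketch (not the verbatim KITT.AI code):

    import os
    import speech_recognition as sr

    def audioRecorderCallback(fname):
        # fname: wav file holding the phrase spoken right after the hotword
        print('converting audio to text')
        r = sr.Recognizer()
        with sr.AudioFile(fname) as source:
            audio = r.record(source)          # read the whole recording
        try:
            print(r.recognize_google(audio))  # online Google Web Speech API
        except sr.UnknownValueError:
            print('Google Speech Recognition could not understand audio')
        except sr.RequestError as e:
            print('Could not request results; {0}'.format(e))
        os.remove(fname)                      # done with the recording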

  5. Modifying snowboydecoder.py and demo4.py

    The phrase following the hotword is recorded and saved to a normal wav file in the current directory. I thought it best to modify snowboydecoder.py to save the recording in a temporary file system (tmpfs) to avoid wearing down the Raspberry Pi SD card. As an added bonus, the process should be faster. I added the procedure that creates a temporary file name at the end of the __init__ code of the HotwordDetector class.

    # requires: import tempfile (os is already imported in snowboydecoder.py)
    # create a unique temporary file name, preferably in the user's tmpfs
    # directory (/run/user/<uid>) to spare the SD card
    try:
        (fd, self.filename) = tempfile.mkstemp(
            suffix='.wav', dir='/run/user/%d' % os.getuid())
    except IOError:
        # no tmpfs directory available; fall back to the default temp directory
        (fd, self.filename) = tempfile.mkstemp(suffix='.wav')
    os.close(fd)
    os.unlink(self.filename)   # keep only the name; the file is recreated later

    Then the saveMessage function has to be modified: instead of naming the recorded sound file filename, it must use self.filename, which was defined in the __init__ code. And of course the line defining the filename
       filename = 'output' + str(int(time.time())) + '.wav'
    must be removed.
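
    With both changes, saveMessage ends up looking something like the following sketch (paraphrased from the KITT.AI source; the wave parameter details may differ slightly in the current version):

    def saveMessage(self):
        """Save the recorded phrase, reusing the tmpfs name from __init__."""
        # removed: filename = 'output' + str(int(time.time())) + '.wav'
        data = b''.join(self.recordedData)
        wf = wave.open(self.filename, 'wb')   # self.filename set in __init__
        wf.setnchannels(1)
        wf.setsampwidth(self.audio.get_sample_size(
            self.audio.get_format_from_width(
                self.detector.BitsPerSample() // 8)))
        wf.setframerate(self.detector.SampleRate())
        wf.writeframes(data)
        wf.close()
        logger.debug("finished saving: " + self.filename)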

    I want the Raspberry Pi 3 / ReSpeaker 2-Mics Pi HAT combination to behave a bit more like my Google Home Mini. I have set up the latter to respond with a sound and flashing LEDs when it detects its hotword ("Ok Google"). It turned out to be simple to add this feature.

    First I copied the apa102.py and pixels.py scripts from the mic_hat directory created when installing and testing the ReSpeaker sound card. I decided to use the listen function of pixels.py as the visual clue that snowboy detected a hotword. I tweaked the _listen definition, adding self._off() at the very end so that the LEDs turn off automatically once they have been flashed.
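
    The change itself is a one-liner at the end of the method:

    # inside class Pixels in pixels.py; only the last line is new
    def _listen(self):
        # ... original ReSpeaker LED animation code, unchanged ...
        self._off()   # turn the LEDs off once they have flashed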

    Instead of modifying demo4.py, I made a copy and saved it under the name sbdemo5.py. The detectedCallback function is modified so that there is auditory and visual feedback on hotword detection in addition to the visual prompt on the console.

    def detectedCallback():
        pixels.listen()
        snowboydecoder.play_audio_file()
        print('yes ? ', end='', flush=True)

    Of course, the pixels.py module must be imported and a Pixels object must be created. I also changed the silent count threshold, because in my view the default value of 15 resulted in recording too long a period of silence at the end of the command.

    pixels = Pixels()
    detector = snowboydecoder.HotwordDetector(model, sensitivity=0.38)
    print('Listening... Press Ctrl+C to exit')

    # main loop
    detector.start(detected_callback=detectedCallback,
                   audio_recorder_callback=audioRecorderCallback,
                   interrupt_check=interrupt_callback,
                   sleep_time=0.01,
                   silent_count_threshold=4)

    detector.terminate()
    pixels.off()
    time.sleep(1)

    I added the following lines at the very top of the file so that it would be possible to shorten the command line when running the script.

    #!../venv/bin/python
    # -*- coding: utf-8 -*-
    import time
    from pixels import Pixels
    import snowboydecoder

    The script has to be made executable for this to work. Sharp-eyed readers will spot that I moved the hotword model files to a different directory.

    (venv) pi@rpi3:~/hestia/snowboy $ chmod +x sbdemo5.py
    (venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo5.py models/snowboy.umdl
    Listening... Press Ctrl+C to exit
    ...
    ALSA lib pcm_dsnoop.c:556:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
    "snowboy"
    INFO:snowboy:Keyword 1 detected at time: 2018-03-21 13:17:12
    yes...   [LEDs flash, "DING" is heard]
    "turn the lamp on"
    converting audio to text
    turn the lamp on

    I made further changes to sbdemo5.py. The help screen explains what these are.

    (venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo5.py -h
    usage: ./sbdemo5.py [-l <LANG>] [-m <MODEL>] [-p {snowp,aplay,mocp,none}]
                        [-s <SLEEP>] [-c <COUNT>] [-r <TIMEOUT>] [-d {0,1,2,3}]

    sbdemo5.py

    optional arguments:
      -h, --help            show this help message and exit
      -l LANG, --lang LANG  Spoken language (default en-US)
      -m MODEL, --model MODEL
                            Snowboy hotword model file
      -p {snowp,aplay,mocp,none}, --player {snowp,aplay,mocp,none}
                            Play recorded command with 'snowp':
                            snowboydecoder.play_audio_file, 'aplay': ALSA aplay,
                            'mocp': Music On Console player, or 'none': do not play
      -s SLEEP, --sleep SLEEP
                            sleep_time (default 0.01)
      -c COUNT, --count COUNT
                            silent_count_threshold (default 15)
      -r RECORD, --record RECORD
                            recording_timeout (default 100)
      -d DETECTED, --detected DETECTED
                            Detected signal: 0 - none, >0 - print yes,
                            >1 - add pixels, >2 - add ding

    The -s, -c and -r parameters are passed on to the HotwordDetector start method. Here is the information about these parameters in the source code.

    float sleep_time: how much time in seconds every loop waits.
    silent_count_threshold: indicates how long silence must be heard to mark the end of a phrase that is being recorded.
    recording_timeout: limits the maximum length of a recording.

    As mentioned, the silent count threshold is much too big in my estimation and I wanted to experiment easily with different values. At the same time, always entering the relative path to the snowboy model file was tiresome, so I set a default value in the code to make things simpler.

    I thought it would be useful to hear just what snowboydecoder recorded after the hotword was detected, so sbdemo5.py can play back the recorded message. By default, snowboydecoder.play_audio_file is used to play the sound file, but that can be changed to the ALSA aplay utility or the Music On Console player mocp if it is installed on the system. It is also possible to disable the playback of the recorded file altogether.
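
    One way to implement that choice is a small dispatch helper along these lines (the name play_back is mine, not necessarily what sbdemo5.py uses, and mocp's --playit option is assumed to be available):

    import subprocess
    import snowboydecoder

    def play_back(fname, player):
        # dispatch on the -p / --player option
        if player == 'snowp':
            snowboydecoder.play_audio_file(fname)
        elif player == 'aplay':
            subprocess.run(['aplay', fname])             # ALSA command line player
        elif player == 'mocp':
            subprocess.run(['mocp', '--playit', fname])  # Music On Console
        # player == 'none': skip playback entirely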

    I found the timing between the hotword and the rest of the command to be passed on to the audio_recorder_callback function to be a bit delicate. So the actions done in between are optional (a sketch of one possible implementation follows the list):

        -d 0 - nothing is done,
        -d 1 - "yes" is printed,
        -d 2 - "yes" is printed and the LEDs are flashed,
        -d 3 - "yes" is printed, the LEDs are flashed and DING.WAV is played.
    The default is -d 3.
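
    A sketch of how these levels might map onto the detected callback (hypothetical; args is the argparse namespace and pixels the Pixels object created earlier):

    def detectedCallback():
        if args.detected >= 1:
            print('yes ? ', end='', flush=True)
        if args.detected >= 2:
            pixels.listen()
        if args.detected >= 3:
            snowboydecoder.play_audio_file()   # the DING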

    No matter which DETECTED option is chosen, I often find that the last part of the hotword (and the DING if -d 3) is passed on to the audio_recorder_callback function. This is probably not the best for optimal text recognition.

  6. Google Assistant

    I prefer to use snowboy for offline hotword recognition instead of relying on the hotword ("Ok Google") recognition capabilities of the Google Assistant Library (as in google_assistant_demo). In that case, the Google Assistant Service (as in googlesamples-assistant-pushtotalk) is the better choice. For the curious, an overview contains a table outlining the differences between the Google Assistant Library and the Google Assistant Service.

    To cut to the chase, here is how pushtotalk can be used once the hotword has been detected and the rest of the recorded command has been stored in a file named time_en.wav.

    (venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time_en.wav

    To test this, first record a file asking for the current time.

    (venv) pi@rpi3:~/hestia/snowboy $ arecord -c 1 -r 16000 -f S16_LE time_en.wav
    Recording WAVE 'time_en.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
    "What time is it?" Ctrl+C
    ^CAborted by signal Interrupt...
    arecord: pcm_read:2103: read error: Interrupted system call
    (venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time_en.wav
    INFO:root:Connecting to embeddedassistant.googleapis.com
    INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
    INFO:root:Recording audio request.
    INFO:root:Transcript of user request: "what".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "what time".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "what time is".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "what time is it".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "what time is it".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "what time is it".
    INFO:root:Playing assistant response.
    INFO:root:End of audio request detected
    INFO:root:Transcript of user request: "what time is it".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "what time is it".
    INFO:root:Playing assistant response.
    "it's 1 32"
    INFO:root:Finished playing assistant response.

    This works just as well with other supported languages.

    (venv) pi@rpi3:~/hestia/snowboy $ arecord -c 1 -r 16000 -f S16_LE time_fr.wav
    Recording WAVE 'time_fr.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
    "Quelle heure est-il ?" Ctrl+C
    ^CAborted by signal Interrupt...
    arecord: pcm_read:2103: read error: Interrupted system call
    (venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time_fr.wav --lang fr-CA
    INFO:root:Connecting to embeddedassistant.googleapis.com
    INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
    INFO:root:Recording audio request.
    INFO:root:Transcript of user request: "quelle".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "quelle heure".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "quelle heure est".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "quelle heure est-il".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "quelle heure est-il".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "quelle heure est-il".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "quelle heure est-il".
    INFO:root:Playing assistant response.
    INFO:root:End of audio request detected
    INFO:root:Transcript of user request: "quelle heure est-il".
    INFO:root:Playing assistant response.
    "il est 13 heures 36"
    INFO:root:Finished playing assistant response.

    Sound file format

    If the input sound file was not recorded with the correct format, then there will be a difficult-to-interpret error message as in the following example.

    (venv) pi@rpi3:~/hestia/snowboy $ arecord -f cd time.wav
    Recording WAVE 'time.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
    "What time is it?" Ctrl+C
    ^CAborted by signal Interrupt...
    arecord: pcm_read:2103: read error: Interrupted system call
    (venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time.wav
    INFO:root:Connecting to embeddedassistant.googleapis.com
    INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
    INFO:root:Recording audio request.
    INFO:root:End of audio request detected
    INFO:root:Finished playing assistant response.
    ERROR:root:Exception iterating requests!
    Traceback (most recent call last):
      File "/home/pi/hestia/venv/lib/python3.5/site-packages/grpc/_channel.py", line 187, in consume_request_iterator
        request = next(request_iterator)
      File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/pushtotalk.py", line 124, in iter_assist_requests
        for c in self.gen_assist_requests():
      File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/pushtotalk.py", line 202, in gen_assist_requests
        for data in self.conversation_stream:
      File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/audio_helpers.py", line 326, in __iter__
        return iter(lambda: self.read(self._iter_size), b'')
      File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/audio_helpers.py", line 307, in read
        return self._source.read(size)
      File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/audio_helpers.py", line 105, in read
        if self._wavep
      File "/usr/lib/python3.5/wave.py", line 242, in readframes
        data = self._data_chunk.read(nframes * self._framesize)
      File "/usr/lib/python3.5/chunk.py", line 136, in read
        data = self.file.read(size)
      File "/usr/lib/python3.5/chunk.py", line 136, in read
        data = self.file.read(size)
    ValueError: read of closed file

    I thought that the simplest thing would be to invoke pushtotalk using a Python subprocess. Hopefully, small changes to a few lines of code in sbdemo5.py would allow for testing the snowboy/Google Assistant combination. The result is in sbdemo6.py. Here is an example of its use.

    (venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo6.py -s 0.03 -c 4 -p 'aplay' -o 'outcome.wav'
    Snowboy model file: models/snowboy.umdl
    Spoken language: en-US
    Play recorded command with aplay
    sleep_time: 0.03
    silent_count_threshold: 4
    recording_timeout: 100
    hotword detected signal: (3) print "yes" + pixels + play ding
    Listening... Press Ctrl+C to exit
    ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
    ...
    ALSA lib pcm_dsnoop.c:556:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
    "snowboy"
    INFO:snowboy:Keyword 1 detected at time: 2018-03-27 10:36:00
    yes...
    "Where is London Bridge?"
    passing on "googlesamples-assistant-pushtotalk -i /run/user/1000/tmpxacvjzuv.wav -o outcome.wav"
    INFO:root:Connecting to embeddedassistant.googleapis.com
    INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
    INFO:root:Recording audio request.
    INFO:root:Transcript of user request: "where".
    INFO:root:Playing assistant response.
    INFO:root:Transcript of user request: "where is".
    ...
    INFO:root:Transcript of user request: "where is London Bridge".
    INFO:root:Playing assistant response.
    INFO:root:End of audio request detected
    INFO:root:Transcript of user request: "where is London Bridge".
    INFO:root:Playing assistant response.
    INFO:root:Finished playing assistant response.
    Playing the output sound file "outcome.wav" with aplay
    "London Bridge is in Lake Havasu City"
    Playing WAVE 'outcome.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
    Playing recorded message in file "/run/user/1000/tmpxacvjzuv.wav" with aplay
    Playing WAVE '/run/user/1000/tmpxacvjzuv.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
    Listening... Press Ctrl+C to exit

    In the above example, when pushtotalk prints INFO:root:Playing assistant response it is actually recording the "spoken" response to the output wav file specified with the -o option in the command line. If this option is not included, pushtotalk plays the response itself as in the following example.

    (venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo6.py -s 0.03 -c 4 -p 'aplay'
    ...
    Listening... Press Ctrl+C to exit
    ...
    "snowboy"
    INFO:snowboy:Keyword 1 detected at time: 2018-03-27 10:36:00
    yes...
    "Where is London Bridge?"
    passing on "googlesamples-assistant-pushtotalk -i /run/user/1000/tmpxacvjzuv.wav"
    INFO:root:Connecting to embeddedassistant.googleapis.com
    INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
    INFO:root:Recording audio request.
    ...
    INFO:root:Transcript of user request: "where is London Bridge".
    INFO:root:Playing assistant response.
    "London Bridge is in Lake Havasu City"
    INFO:root:Finished playing assistant response.
    Playing recorded message in file "/run/user/1000/tmpxacvjzuv.wav" with aplay
    Playing WAVE '/run/user/1000/tmpxacvjzuv.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
    Listening... Press Ctrl+C to exit

    If pushtotalk is told to record its output to a sound file and mocp is used to play the sound files, then most of the Google Assistant output will not be heard. That is because mocp is a server that plays a file until its end is reached or until it is asked to play another file. This is what happens here when mocp is told to play the recorded input file almost immediately after it starts playing the recorded output file.
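
    Stripped of the option handling, the heart of sbdemo6.py is presumably little more than a subprocess call in the audio recorder callback; a minimal sketch (simplified, with the output file name hard-coded):

    import os
    import subprocess

    def audioRecorderCallback(fname):
        # hand the recorded phrase to the Google Assistant Service
        cmd = ['googlesamples-assistant-pushtotalk', '-i', fname, '-o', 'outcome.wav']
        print('passing on "{}"'.format(' '.join(cmd)))
        subprocess.run(cmd)                        # blocks until the assistant is done
        subprocess.run(['aplay', 'outcome.wav'])   # play the assistant's spoken response
        os.remove(fname)                           # discard the temporary recording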

  7. Google Assistant and snowboy Not Playing Nice

    Why use a subprocess to invoke a Python script within a Python script? The more direct approach would be to call pushtotalk directly. That is what I did in sbdemo7.py, based on an August 18, 2017 article entitled Setup your own Google Home with custom Hotwords by mg166. It is necessary to modify pushtotalk.py, found in the virtual environment tree ...venv/lib/python3.5/site-packages/googlesamples/assistant/grpc, to remove the need to press the Enter key. Rather than change pushtotalk.py, I made a copy of it under the name talkassist.py and changed the following near the end of the file

    # keep recording voice requests using the microphone
    # and playing back assistant response using the speaker.
    # When the once flag is set, don't wait for a trigger. Otherwise, wait.
    wait_for_user_trigger = not once
    while True:
        if wait_for_user_trigger:
            click.pause(info='Press Enter to send a new request...')
        continue_conversation = assistant.assist()
        # wait for user trigger if there is no follow-up turn in
        # the conversation.
        wait_for_user_trigger = not continue_conversation
        # If we only want one conversation, break.
        if once and (not continue_conversation):
            break
    to the much shorter
    # keep recording voice requests using the microphone
    # and playing back assistant response using the speaker.
    # This will loop as long as assist() returns true,
    # meaning that a follow-on query from the user is
    # expected. If the once flag is set, only one request
    # is performed no matter what assist() returns.
    while assistant.assist():
        if once:
            break

    Note that pushtotalk.main (actually talkassist.main) never returns in detectedCallback. As can be seen in the code snippet, once Google Assistant has finished, the '\nListening... Press Ctrl+C to exit' message should be printed and then snowboy should start listening for a hotword again.

    def detectedCallback():
        snowboydecoder.play_audio_file()
        main()   # googlesamples.assistant.grpc.talkassist
        print('\nListening... Press Ctrl+C to exit')

    Instead, the program exits without explanation, or sometimes the error
        sounddevice.PortAudioError: Can't write to an input only stream [PaErrorCode -9974]
    is reported. The author of GAssistPi, shivasiddharth, confirmed the existence of the problem and that it is up to Google to solve it:

    "this Portaudio error needs to be fixed from google's side. This occurs if the voice commands are either slightly late or slightly early, i dont have control over that error."
    I believe that somewhere else he said that he pulled snowboy from GAssistPi because of a problematic interaction with Portaudio but I cannot find the reference (but look at the Dec. 22 2017 README.md file: "Custom wakewords/snowboy has been removed/disabled due to audio related errors").

    I tried a couple more ways of using snowboy hotword triggers with Google Assistant; however, they did not yield satisfactory results either.

  8. Observations

    The new capability of snowboy to record the rest of a command to pass on to another module is marred by the fact that it records the end of the hotword and the DING sound that I am using. Of course, it is always possible to use snowboy as a trigger only, as before. However, as far as I can see, there is no satisfactory way to call on Google Assistant directly when snowboy detects the hotword.

    The author of GAssistPi, shivasiddharth, seems confident that he will be able to bring back snowboy, but he gives no deadline. Google Assistant is proving to be a moving target, so I assume it will not be a simple task to accomplish.

    To my mind these somewhat disappointing results confirm my first impression about the suitability of the ReSpeaker 2-Mics Pi HAT for speech input and sound output applications.

  9. Downloads

    The following files are available for download.

    pixels.py Modified version of the ReSpeaker-supplied file. Needs apa102.py. Store in the same directory as the sbdemox.py files.
    snowboydecoder.py Modified to store a recorded message in a temporary file system to avoid wear and tear of SD card
    talkassist.py Modified pushtotalk.py to work with an external hotword recognition or other trigger by removing waiting for keypress. Store beside pushtotalk.py in venv/lib/python3.5/site-packages/googlesamples/assistant/grpc
    googlesamples-assistant-talkassist Store beside googlesamples-assistant-pushtotalk in venv/bin

    Be careful, you will probably not want to overwrite the original versions of the last two files.

    The following programs are reworked versions of demo4.py from KITT.AI. The first is basically the original, altered to take advantage of the LEDs on the ReSpeaker 2-Mics Pi HAT and to provide more options. The last four explore the use of snowboy hotword detection as a trigger for the Google Assistant Service. They differ in the way pushtotalk (or its equivalent talkassist) is invoked after the hotword has been detected.

    | file        | snowboy callback      | speech analyzer                    | invocation method |
    |-------------|-----------------------|------------------------------------|-------------------|
    | sbdemo5.py  | audioRecorderCallback | speech_recognition                 | function call     |
    | sbdemo6.py  | audioRecorderCallback | googlesamples-assistant-pushtotalk | subprocess        |
    | sbdemo7.py  | detectedCallback      | googlesamples-assistant-talkassist | function call     |
    | sbdemo7e.py | detectedCallback      | googlesamples-assistant-talkassist | function call     |
    | sbdemo8.py  | detectedCallback      | googlesamples-assistant-talkassist | subprocess        |

    The files pixels.py and apa102.py are imported directly or indirectly in all the demonstration programs. If the latter are used as guides for use with different microphone and speaker hardware, it is a simple matter to remove all mention of pixels.py and its listen function, which have no essential role.