With a Seeed Studio ReSpeaker 2-Mics Pi HAT, it becomes possible to move my voice recognition project over to a Raspberry Pi 3. Hotword recognition will be done with snowboy from KITT.AI. The good news is that the ReSpeaker HAT seems to work well. The bad news is that there is now some sort of incompatibility between snowboy and the Google Assistant Service.
In this post, I will show a slight change I made to the new version of `snowboydecoder.py` that I believe will be helpful when using the command recording capability of snowboy, especially on small systems relying on SD cards for storage. I will also add a number of new demonstration programs based on those from KITT.AI to explore the different ways to combine snowboy and Google Assistant and to make it easier to experiment with the many options of `snowboydecoder.py`.
Table of Contents

- In the Beginning
  - Change password for the current user (mandatory)
  - Configure network settings
    - N1 Hostname on the network to rpi3
    - N2 Provide Wi-Fi credentials (network name and password)
  - Boot Options - did nothing
  - Localisation Options
    - I1 Change Locale - to add fr_CA
    - I2 Change Timezone - to America/Moncton
  - Interfacing Options
    - P2 SSH - enabled (mandatory)
    - P4 SPI - enabled
    - P5 I2C - enabled
  - Overclock - did nothing
  - Advanced Options
    - A1 Expand Filesystem
    - A3 Memory Split - minimum 16 MB for the GPU
    - A4 Audio - 1 Force 3.5mm ('headphone') jack
- Audio Input and Output
- Speech to Text Prerequisites
- Installing snowboy and Voice Recognition
- Modifying snowboydecoder.py and demo4.py
- Google Assistant
- Google Assistant and snowboy Not Playing Nice
- Observations
- Downloads
The phrase following the hotword is recorded and saved to a normal `wav` file in the current directory. I thought it best to modify `snowboydecoder.py` to save the recording in a temporary file system (tmpfs) to avoid wearing down the Raspberry Pi SD card. As an added bonus, the process should be faster. I added the procedure that creates a temporary file name at the end of the `__init__` code of the `HotwordDetector` class.

```python
try:
    (fd, self.filename) = tempfile.mkstemp(suffix='.wav',
                                           dir='/run/user/%d' % os.getuid())
except IOError:
    (fd, self.filename) = tempfile.mkstemp(suffix='.wav')
os.close(fd)
os.unlink(self.filename)
```

Then the `saveMessage` function has to be modified. Instead of naming the recorded sound file `filename`, it must be named `self.filename`, which was defined in the `__init__` code. And of course the line defining the filename

```python
filename = 'output' + str(int(time.time())) + '.wav'
```
must be removed.

I want the Raspberry Pi 3 / ReSpeaker 2-Mics HAT combination to behave a bit more like my Google Home Mini. I have set up the latter to respond with a sound and flashing LEDs when it detects its hotword ("Ok Google"). It turned out to be simple to add this feature.
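To recap the recording change, here is how the two pieces described above fit together in a runnable sketch. This is a simplification: snowboy's real `saveMessage` derives the WAV parameters from the detector's audio settings, and the class of course does much more than shown here.

```python
import os
import tempfile
import wave


class HotwordDetector:
    """Sketch of only the recording-file changes made to snowboy's detector."""

    def __init__(self):
        # Create the recording file name in the per-user tmpfs
        # (/run/user/<uid>) so recordings never touch the SD card;
        # fall back to the default temporary directory.
        try:
            (fd, self.filename) = tempfile.mkstemp(
                suffix='.wav', dir='/run/user/%d' % os.getuid())
        except IOError:
            (fd, self.filename) = tempfile.mkstemp(suffix='.wav')
        os.close(fd)
        os.unlink(self.filename)

    def saveMessage(self, data):
        # Write to self.filename instead of generating a new
        # 'output<time>.wav' name in the current directory on every call.
        with wave.open(self.filename, 'wb') as f:
            f.setnchannels(1)       # assumed: mono capture
            f.setsampwidth(2)       # assumed: 16-bit samples
            f.setframerate(16000)   # assumed: snowboy's 16 kHz rate
            f.writeframes(data)
```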
First I copied the `apa102.py` and `pixels.py` scripts from the `mic_hat` directory created when installing and testing the ReSpeaker sound card. I decided to use the `listen` function of `pixels.py` as the visual cue that snowboy detected a hotword. I tweaked the `_listen` definition, adding `self._off()` at the very end so that the LEDs turn off automatically once they have been flashed.

Instead of modifying
`demo4.py`, I made a copy and saved it under the name `sbdemo5.py`. The `detectedCallback` function is modified so that there is auditory and visual feedback on hotword detection in addition to the visual prompt on the console.

```python
def detectedCallback():
    pixels.listen()
    snowboydecoder.play_audio_file()
    print('yes ? ', end='', flush=True)
```

Of course, the `pixels.py` module must be imported and a `Pixels` object must be created. I also changed the silent count threshold because the default value of 15 resulted in a recording of too long a period of silence at the end of the command in my view.

```python
pixels = Pixels()

detector = snowboydecoder.HotwordDetector(model, sensitivity=0.38)
print('Listening... Press Ctrl+C to exit')

# main loop
detector.start(detected_callback=detectedCallback,
               audio_recorder_callback=audioRecorderCallback,
               interrupt_check=interrupt_callback,
               sleep_time=0.01,
               silent_count_threshold=4)

detector.terminate()
pixels.off()
time.sleep(1)
```

I added the following lines at the very top of the file so that it would be possible to shorten the command line when running the script.
```python
#!../venv/bin/python
# -*- coding: utf-8 -*-

import time
from pixels import Pixels

import snowboydecoder
```

The script has to be made executable for this to work. Sharp-eyed readers will spot that I moved the hotword model files to a different directory.
```
(venv) pi@rpi3:~/hestia/snowboy $ chmod +x sbdemo5.py
(venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo5.py models/snowboy.umdl
Listening... Press Ctrl+C to exit
...
ALSA lib pcm_dsnoop.c:556:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
"snowboy"
INFO:snowboy:Keyword 1 detected at time: 2018-03-21 13:17:12
yes... [LEDs flash, "DING" is heard]
"turn the lamp on"
converting audio to text
turn the lamp on
```

I made further changes to
`sbdemo5.py`. The help screen explains what these are.

```
(venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo5.py -h
usage: ./sbdemo5.py [-l <LANG>] [-m <MODEL>] [-p {splay, aplay}] [-s <SLEEP>]
                    [-c <COUNT>] [-r <TIMEOUT>] [-d {0,1,2,3}]

sbdemo5.py

optional arguments:
  -h, --help            show this help message and exit
  -l LANG, --lang LANG  Spoken language (default en-US)
  -m MODEL, --model MODEL
                        Snowboy hotword model file
  -p {snowp,aplay,mocp,none}, --player {snowp,aplay,mocp,none}
                        Play recorded command with 'snowp':
                        snowboydecoder.play_audio_file, 'aplay': ALSA aplay,
                        'mocp': for Music On Console player, or 'none': to
                        not play
  -s SLEEP, --sleep SLEEP
                        sleep_time (default 0.01)
  -c COUNT, --count COUNT
                        silent_count_threshold (default 15)
  -r RECORD, --record RECORD
                        recording_timeout (default 100)
  -d DETECTED, --detected DETECTED
                        Detected signal: 0 - none, >0 - print yes, >1 - add
                        pixels, >2 - add ding
```

The
`-s`, `-c` and `-r` parameters are passed on to the `HotwordDetector` `start` method. Here is the information about these parameters in the source code.

```
float sleep_time: how much time in second every loop waits.
silent_count_threshold: indicates how long silence must be heard to mark
    the end of a phrase that is being recorded.
recording_timeout: limits the maximum length of a recording.
```

As mentioned, the default silent count threshold is much too big in my estimation and I wanted to easily experiment with different values. At the same time, always entering the relative path to the snowboy model file was tiresome, so I set a default value in the code to make things simpler.

I thought it would be useful to hear just what
`snowboydecoder` recorded after the hotword was detected, so it is possible to play back the recorded message. By default, `snowboydecoder.play_audio_file` is used to play the sound file, but that can be changed to the ALSA `aplay` utility or the Music On Console player `mocp` if it is installed on the system. It is also possible to disable the playback of the recorded file altogether.

I found the timing between the hotword and the rest of the command to be passed on to the `audio_recorder_callback` function to be a bit delicate. So the actions done in between are optional:

- `-d 0` - nothing is done,
- `-d 1` - "yes" is printed,
- `-d 2` - "yes" is printed and the LEDs are flashed,
- `-d 3` - "yes" is printed, the LEDs are flashed and DING.WAV is played.
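These levels can be implemented with a small callback factory. The sketch below is dependency-free: the real `sbdemo5.py` calls `pixels.listen` and `snowboydecoder.play_audio_file` directly, while here they are passed in as callables so nothing hardware-specific is needed.

```python
def make_detected_callback(detected, flash_leds=None, play_ding=None):
    """Build a hotword callback implementing the -d feedback levels.

    detected: 0 - none, 1 - print "yes", 2 - also flash the LEDs,
    3 - also play the DING sound. flash_leds and play_ding are
    callables such as pixels.listen and snowboydecoder.play_audio_file.
    """
    def detectedCallback():
        if detected >= 1:
            print('yes ? ', end='', flush=True)
        if detected >= 2 and flash_leds is not None:
            flash_leds()
        if detected >= 3 and play_ding is not None:
            play_ding()
    return detectedCallback
```

The callback returned for `-d 3` behaves like the `detectedCallback` shown earlier.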
The default is `-d 3`.

No matter what DETECTED option is chosen, I often find that the last part of the hotword (and the DING if `-d 3`) is passed on to the `audio_recorder_callback` function. This is probably not the best for optimal text recognition.

- Google Assistant
I prefer to use snowboy for offline hotword recognition instead of relying on the hotword ("Ok Google") recognition capabilities of the Google Assistant Library (as in `google_assistant_demo`). In that case, the Google Assistant Service (as in `googlesamples-assistant-pushtotalk`) is the better choice. For the curious, an overview contains a table outlining the differences between the Google Assistant Library and the Google Assistant Service.

To cut to the chase, here is how `pushtotalk` can be used once the hotword has been detected and the rest of the recorded command has been stored in a file named `time_en.wav`.

```
(venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time_en.wav
```

To test this, first record a file asking for the current time.
```
(venv) pi@rpi3:~/hestia/snowboy $ arecord -c 1 -r 16000 -f S16_LE time_en.wav
Recording WAVE 'time5.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
"What time is it?" CtrlC
^CAborted by signal Interrupt...
arecord: pcm_read:2103: read error: Interrupted system call
(venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time_en.wav
INFO:root:Connecting to embeddedassistant.googleapis.com
INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
INFO:root:Recording audio request.
INFO:root:Transcript of user request: "what".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "what time".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "what time is".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "what time is it".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "what time is it".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "what time is it".
INFO:root:Playing assistant response.
INFO:root:End of audio request detected
INFO:root:Transcript of user request: "what time is it".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "what time is it".
INFO:root:Playing assistant response.
"it's 1 32"
INFO:root:Finished playing assistant response.
```

This works just as well with other supported languages.
```
(venv) pi@rpi3:~/hestia/snowboy $ arecord -c 1 -r 16000 -f S16_LE time_fr.wav
Recording WAVE 'time_fr.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
"Quelle heure est-il ?" CtrlC
^CAborted by signal Interrupt...
arecord: pcm_read:2103: read error: Interrupted system call
(venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time_fr.wav --lang fr-CA
INFO:root:Connecting to embeddedassistant.googleapis.com
INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
INFO:root:Recording audio request.
INFO:root:Transcript of user request: "quelle".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "quelle heure".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "quelle heure est".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "quelle heure est-il".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "quelle heure est-il".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "quelle heure est-il".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "quelle heure est-il".
INFO:root:Playing assistant response.
INFO:root:End of audio request detected
INFO:root:Transcript of user request: "quelle heure est-il".
INFO:root:Playing assistant response.
"il est 13 heures 36"
INFO:root:Finished playing assistant response.
```

Sound file format

If the input sound file was not recorded with the correct format, then there will be a difficult to interpret error message, as in the following example.
```
(venv) pi@rpi3:~/hestia/snowboy $ arecord -f cd time.wav
Recording WAVE 'time.wav' : Signed 16 bit Little Endian, Rate 44100 Hz, Stereo
"What time is it?" CtrlC
^CAborted by signal Interrupt...
arecord: pcm_read:2103: read error: Interrupted system call
(venv) pi@rpi3:~/hestia/snowboy $ googlesamples-assistant-pushtotalk -i time.wav
INFO:root:Connecting to embeddedassistant.googleapis.com
INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
INFO:root:Recording audio request.
INFO:root:End of audio request detected
INFO:root:Finished playing assistant response.
ERROR:root:Exception iterating requests!
Traceback (most recent call last):
  File "/home/pi/hestia/venv/lib/python3.5/site-packages/grpc/_channel.py", line 187, in consume_request_iterator
    request = next(request_iterator)
  File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/pushtotalk.py", line 124, in iter_assist_requests
    for c in self.gen_assist_requests():
  File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/pushtotalk.py", line 202, in gen_assist_requests
    for data in self.conversation_stream:
  File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/audio_helpers.py", line 326, in
    return iter(lambda: self.read(self._iter_size), b'')
  File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/audio_helpers.py", line 307, in read
    return self._source.read(size)
  File "/home/pi/hestia/venv/lib/python3.5/site-packages/googlesamples/assistant/grpc/audio_helpers.py", line 105, in read
    if self._wavep
  File "/usr/lib/python3.5/wave.py", line 242, in readframes
    data = self._data_chunk.read(nframes * self._framesize)
  File "/usr/lib/python3.5/chunk.py", line 136, in read
    data = self.file.read(size)
  File "/usr/lib/python3.5/chunk.py", line 136, in read
    data = self.file.read(size)
ValueError: read of closed file
```

I thought that the simplest thing would be to
invoke `pushtotalk` using a Python `subprocess`. Hopefully, small changes to a few lines of code in `sbdemo5.py` would allow for testing the snowboy/`google_assistant` combination. The result is in `sbdemo6.py`. Here is an example of its use.

```
(venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo6.py -s 0.03 -c 4 -p 'aplay' -o 'outcome.wav'
Snowboy model file: models/snowboy.umdl
Spoken language: en-US
Play recorded command with aplay
sleep_time: 0.03
silent_count_threshold: 4
recording_timeout: 100
hotword detected signal: (3) print "yes" + pixels + play ding
Listening... Press Ctrl+C to exit
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.front
...
ALSA lib pcm_dsnoop.c:556:(snd_pcm_dsnoop_open) The dsnoop plugin supports only capture stream
"snowboy"
INFO:snowboy:Keyword 1 detected at time: 2018-03-27 10:36:00
yes...
"Where is London Bridge?"
passing on "googlesamples-assistant-pushtotalk -i /run/user/1000/tmpxacvjzuv.wav -o outcome.wav"
INFO:root:Connecting to embeddedassistant.googleapis.com
INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
INFO:root:Recording audio request.
INFO:root:Transcript of user request: "where".
INFO:root:Playing assistant response.
INFO:root:Transcript of user request: "where is".
..
INFO:root:Transcript of user request: "where is London Bridge".
INFO:root:Playing assistant response.
INFO:root:End of audio request detected
INFO:root:Transcript of user request: "where is London Bridge".
INFO:root:Playing assistant response.
INFO:root:Finished playing assistant response.
Playing the output sound file "outcome.wav" with aplay
"London Bridge is in Lake Havasu City"
Playing WAVE 'outcome.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
Playing recorded message in file "/run/user/1000/tmpxacvjzuv.wav" with aplay
Playing WAVE '/run/user/1000/tmpxacvjzuv.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
Listening... Press Ctrl+C to exit
```

In the above example, when
`pushtotalk` prints `INFO:root:Playing assistant response`, it is actually recording the "spoken" response to the output `wav` file specified with the `-o` option on the command line. If this option is not included, `pushtotalk` plays the response itself, as in the following example.

```
(venv) pi@rpi3:~/hestia/snowboy $ ./sbdemo6.py -s 0.03 -c 4 -p 'aplay'
...
Listening... Press Ctrl+C to exit
...
"snowboy"
INFO:snowboy:Keyword 1 detected at time: 2018-03-27 10:36:00
yes...
"Where is London Bridge?"
passing on "googlesamples-assistant-pushtotalk -i /run/user/1000/tmpxacvjzuv.wav"
INFO:root:Connecting to embeddedassistant.googleapis.com
INFO:root:Using device model ga-respeaker2-mics-rcl7xe and device id 91b7128e-2dd1-11e8-a78f-b827ebaa03a8
INFO:root:Recording audio request.
...
INFO:root:Transcript of user request: "where is London Bridge".
INFO:root:Playing assistant response.
"London Bridge is in Lake Havasu City"
INFO:root:Finished playing assistant response.
Playing recorded message in file "/run/user/1000/tmpxacvjzuv.wav" with aplay
Playing WAVE '/run/user/1000/tmpxacvjzuv.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
Listening... Press Ctrl+C to exit
```

If
`pushtotalk` is told to record its output to a sound file and `mocp` is used to play the sound files, then most of the Google Assistant output will not be heard. That is because `mocp` is a server which plays a file until its end is reached or until it is asked to play another file. That is what happens here when `mocp` is told to play the recorded input file almost immediately after starting to play the recorded output file.

- Google Assistant and snowboy Not Playing Nice
Why use a sub process to invoke a Python script within a Python script? The more direct approach would be to call `pushtotalk` directly. That is what I did in `sbdemo7.py`, based on an August 18, 2017 article entitled Setup your own Google Home with custom Hotwords by mg166. It is necessary to modify `pushtotalk.py`, found in the virtual environment tree `...venv/lib/python3.5/site-packages/googlesamples/assistant/grpc`, to remove the need to press the enter key. Rather than change `pushtotalk.py`, I made a copy of it under the name `talkassist.py` and changed the following near the end of the file

```python
# keep recording voice requests using the microphone
# and playing back assistant response using the speaker.
# When the once flag is set, don't wait for a trigger. Otherwise, wait.
wait_for_user_trigger = not once
while True:
    if wait_for_user_trigger:
        click.pause(info='Press Enter to send a new request...')
    continue_conversation = assistant.assist()
    # wait for user trigger if there is no follow-up turn in
    # the conversation.
    wait_for_user_trigger = not continue_conversation
    # If we only want one conversation, break.
    if once and (not continue_conversation):
        break
```

to the much shorter

```python
# keep recording voice requests using the microphone
# and playing back assistant response using the speaker.
# This will loop as long as assist() returns true
# meaning that a follow on query for the user is
# expected. If the once flag is set only one request
# is performed no matter what assist() returns
while assistant.assist():
    if once:
        break
```

Note that
`pushtotalk.main` (actually `talkassist.main`) never returns in `detectedCallback`. As can be seen in the code snippet, once Google Assistant has finished, the `Listening... Press Ctrl+C to exit` message should be printed and then snowboy should start listening for a hotword again.

```python
def detectedCallback():
    snowboydecoder.play_audio_file()
    main()  # googlesamples.assistant.grpc.talkassist
    print('\nListening... Press Ctrl+C to exit')
```

Instead, the program exits without explanation, or sometimes the error

```
sounddevice.PortAudioError: Can't write to an input only stream [PaErrorCode -9974]
```

is reported. The author of GAssistPi, shivasiddharth, confirmed the existence of the problem and that it is up to Google to solve it:

"this Portaudio error needs to be fixed from google's side. This occurs if the voice commands are either slightly late or slightly early, i dont have control over that error."
I believe that somewhere else he said that he pulled snowboy from GAssistPi because of a problematic interaction with PortAudio, but I cannot find the reference (look at the Dec. 22, 2017 README.md file: "Custom wakewords/snowboy has been removed/disabled due to audio related errors").

I tried a couple more ways of using snowboy hotword triggers with Google Assistant; however, they did not yield satisfactory results either:

- `sbdemo7e.py`, which adds code to try to recover from the PortAudio error.
- `sbdemo8.py`, which does recover from the error at times, albeit using the slower sub process invocation.
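The recovery idea behind these two scripts can be sketched as a retry loop around the assistant call. This is a simplification of my own: the real scripts also have to deal with the audio streams, and the exception involved is `sounddevice.PortAudioError` rather than the generic one caught here.

```python
def assist_with_recovery(assist, retries=3):
    """Call assist(); on failure, retry up to `retries` times.

    assist is a callable such as talkassist.main. Returns True if a
    call eventually succeeded, False if every attempt failed.
    """
    for attempt in range(retries):
        try:
            assist()
            return True
        except Exception as err:  # e.g. sounddevice.PortAudioError
            print('assistant failed (%s), retrying' % err)
    return False
```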
- Observations
The new capability of snowboy to record the rest of a command to pass on to another module is marred by the fact that it records the end of the hotword and the DING sound that I am using. Of course, it is always possible to use snowboy as a trigger only, as before. However, as far as I can see, there is no satisfactory way to call on Google Assistant when snowboy detects a hotword.

The author of GAssistPi, shivasiddharth, seems confident that he will be able to bring back snowboy, but he gives no deadline. Google Assistant is proving to be a moving target, so I assume it will not be a simple task to accomplish. To my mind, these somewhat disappointing results confirm my first impression about the suitability of the ReSpeaker 2-Mics Pi HAT for speech input and sound output applications.

- Downloads
The following files are available for download.

- `pixels.py` - Modified ReSpeaker-supplied file. Needs `apa102.py`. Store in the same directory as the `sbdemox.py` files.
- `snowboydecoder.py` - Modified to store a recorded message in a temporary file system to avoid wear and tear on the SD card.
- `talkassist.py` - Modified `pushtotalk.py` to work with an external hotword recognition or other trigger, by removing the wait for a keypress. Store beside `pushtotalk.py` in `venv/lib/python3.5/site-packages/googlesamples/assistant/grpc`.
- `googlesamples-assistant-talkassist` - Store beside `googlesamples-assistant-pushtotalk` in `venv/bin`.

Be careful: you will probably not want to overwrite the original versions of the last two files.
The following programs are reworked versions of `demo4.py` from KITT.AI. The first is basically the original, altered to take advantage of the LEDs on the ReSpeaker 2-Mics HAT and to provide more options. The last four explore the use of snowboy hotword detection as a trigger for the Google Assistant Service. They differ in the way `pushtotalk` (or its equivalent `talkassist`) is invoked after the hotword has been detected.

| file | snowboy callback | speech analyzer | invocation method |
|------|------------------|-----------------|-------------------|
| sbdemo5.py | audioRecorderCallback | speech_recognition | function call |
| sbdemo6.py | audioRecorderCallback | googlesamples-assistant-pushtotalk | sub process |
| sbdemo7.py | detectedCallback | googlesamples-assistant-talkassist | function |
| sbdemo7e.py | detectedCallback | googlesamples-assistant-talkassist | function |
| sbdemo8.py | detectedCallback | googlesamples-assistant-talkassist | sub process |

The files `pixels.py` and `apa102.py` are imported directly or implicitly in all the demonstration programs. If the latter are used as guides for different microphone and speaker hardware, it is a simple matter to remove all mention of `pixels.py` and its `listen` function, which have no essential role.
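For instance, a hypothetical no-op stand-in of my own (not one of the downloads) lets the demos run unchanged on hardware without the APA102 LEDs:

```python
class Pixels:
    """Do-nothing replacement for pixels.Pixels on LED-less hardware.

    Swap it in for `from pixels import Pixels` in the sbdemox.py demos;
    every visual cue simply becomes a no-op.
    """

    def listen(self):
        pass  # would flash the LEDs on a ReSpeaker HAT

    def off(self):
        pass  # would turn the LEDs off
```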
As before, I used Etcher, as per the instructions at raspberrypi.org, to burn the newest Raspbian image available from the Raspberry Pi Foundation (Raspbian Stretch Lite, 2017-11-29), which can be found here. Before burning the image, you should uncheck the Auto-unmount on success option in the Etcher Settings. If this is not done, it will be necessary to remove and reinsert the SD card in the desktop SD card reader to perform the next step, which consists of creating an empty file called `ssh` in the card's boot partition in order to configure the RPi3 without monitor and keyboard.
It will be necessary to initially have a working Ethernet connection to the
Raspberry Pi in order to configure it.
Using zenmap
to scan my local network, I was able to start
an ssh
session on the Raspberry Pi.
I used raspi-config
to change the configuration to suit my
situation.
I rebooted as asked and then updated and upgraded the system.
Again I rebooted and this time logged in via the wireless network.
Note how the kernel version is now 4.9.80 (2018-03-09) while the
initial version was 4.9.59 (2017-10-19). To complete my initial set up,
I installed git
.
I also added the Python 3 virtual environment utilities as explained in Python 3 virtual environments.
While not mandatory, and not even recommended, I decided to use static IP addresses for both the Ethernet and the Wi-Fi interfaces. I just get tired of doing network scans to find headless systems whenever the router has to be restarted, which happens just a bit too often in this household. The following command confirmed that "classic" network interface names are still in use.
Following the instructions for the dhcpcd method of
setting up static addresses on the Raspberry Pi StackExchange, I first backed up the
dhcpcd
configuration file and then edited it.
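For reference, a dhcpcd static-address stanza has this shape (the interface names and addresses below are examples only, not my actual settings):

```
# /etc/dhcpcd.conf -- example values, adjust to your network
interface eth0
static ip_address=192.168.1.22/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1

interface wlan0
static ip_address=192.168.1.23/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1
```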
Finally, I shut down the Raspberry Pi and made a backup copy of its SD card on my desktop computer.
My previous post showed how to install the drivers for the ReSpeaker 2-Mics Pi Hat. Here is a summary of what needs to be done.
With the power off, plug in the card onto the Raspberry Pi GPIO header
and then connect powered speakers to the 3.5 mm jack on the card. Clone
two git
repositories. The first contains the LED drivers that
will be used later, the second contains the sound capture and playback
drivers.
Quite a few packages must be installed in order to use the snowboy and Voice Recognition Python libraries. Fortunately, installing all the prerequisites is pretty straightforward in Raspbian Stretch (Debian 9) compared to installation on DietPi or Armbian Jessie (Debian 8). For example, the GNU compilers and make utility are already loaded.
The Simplified Wrapper and Interface Generator (SWIG) is needed to create some Python wrappers of C/C++ libraries. It turns out that snowboy needs version 3.0.10. A check shows that the latest version available in the Stretch repository is recent enough, but it needs to be installed.
Compiling the snowboy Python wrapper requires the
ATLAS (Automatically Tuned Linear Algebra Software)
package. It automatically generates an optimized Basic Linear Algebra
Subroutines (BLAS) library.
Both VoiceRecognition and snowboy rely on the PyAudio Python module, which is a wrapper for the cross-platform audio I/O library PortAudio. Of course, the latter must be present.
VoiceRecognition also requires FLAC (Free Lossless Audio Codec). It is an open source lossless alternative to MP3.
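Since a wrongly formatted input file produces the cryptic `read of closed file` error shown earlier, a small standard-library check of the expected 16 kHz / mono / 16-bit format can save some head-scratching. This is a helper of my own, not part of any of the packages:

```python
import wave


def check_assistant_format(path):
    """Return a list of problems with a WAV file destined for
    googlesamples-assistant-pushtotalk -i, which expects the format
    produced by `arecord -c 1 -r 16000 -f S16_LE`."""
    problems = []
    with wave.open(path, 'rb') as f:
        if f.getframerate() != 16000:
            problems.append('sample rate is %d, not 16000' % f.getframerate())
        if f.getnchannels() != 1:
            problems.append('%d channels, not mono' % f.getnchannels())
        if f.getsampwidth() != 2:
            problems.append('%d-byte samples, not 16 bit' % f.getsampwidth())
    return problems
```

An empty list means the file should be safe to hand to the assistant.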
At this point I created a directory and started a Python virtual environment.
The command mkvenv
creates and updates a Python 3 virtual
environment, while ve
activates it. This is documented in a
previous post: Python 3 virtual environments.
Install the Python PyAudio module.
At last, snowboy can be installed. First we get
the source from the GitHub repository and then
Swig
is used to create the missing scripts.
Aside from a warning, everything seemed to go well. The demonstration
scripts in examples/Python3
in the parent directory can now be
tested.
That was a bit of an anticlimax. However, it was relatively easy to
fix the problem: just remove the relative path to snowboydetect.py
in the import
command of snowboydecoder.py
.
Now the first three demonstration scripts work:
There is a new universal hotword model: jarvis.umdl
. Be
careful, the file actually contains two models so that it cannot be used
instead of snowboy.umdl
or alexa.umdl
. The
ApplyFrontend
function has an increased role which is
unfortunately not well documented. Look at the code in my
modified version of snowboydecoder.py
for details that
I have gleaned.
The fourth demonstration script, which was not there back in November of last year, was a surprise. It requires the SpeechRecognition Python module.
Looking at the source code, it is obvious that a new callback function has been added to the detector.start function that makes it easier to perform continuous speech recognition. Here is the documentation about the callback found in the source file.
```
if [audio_recorder_callback is] specified, this will be called after a
keyword has been spoken and after the phrase immediately after the
keyword has been recorded. The [callback] function will be passed the
name of the file where the phrase was recorded.
```

As the example shows, a verbal command can be given in a single sentence beginning with the hotword and followed by a phrase.
`detector.start` records the whole phrase, strips out the hotword, and passes on the rest of the recorded phrase as a recorded file. SpeechRecognition will not have to record any sound itself.
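To make the flow concrete, here is a dependency-free sketch of such a callback. The names are mine: `demo4.py` itself uses the SpeechRecognition module directly, while here the `recognize` callable is injected so the sketch stays self-contained.

```python
import os


def make_audio_recorder_callback(recognize, on_text):
    """Build an audio_recorder_callback for HotwordDetector.start.

    recognize: callable taking a WAV file name and returning its text
    (e.g. a wrapper around SpeechRecognition's Google recognizer).
    on_text: what to do with the recognized command.
    """
    def audioRecorderCallback(fname):
        print('converting audio to text')
        try:
            on_text(recognize(fname))
        finally:
            os.remove(fname)  # done with the recorded phrase
    return audioRecorderCallback
```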