The majority of Raspberry Pi speech-to-text examples shared online seem to rely on cloud services (e.g. Google Cloud Speech-to-Text) for the actual audio processing. This article shows you how to configure an "offline" speech recognition solution on your Raspberry Pi that does not require third-party cloud services.

We are going to use CMUSphinx, a group of continuous-speech, speaker-independent speech recognition systems developed at Carnegie Mellon University.

Step 1. Activate your virtual environment (unless you want to install globally). The benefits of using virtualenv are explained in my previous article about OpenCV. Mine is called "cv", and I activate it with the following command:

workon cv

My shell prompt changes accordingly, gaining a "(cv)" prefix:

(cv) pi@raspberrypi:~ $
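If you later drive the recognizer from Python, it can be handy to confirm that the interpreter really is running inside the virtual environment. Here is a minimal sketch using only the standard library. `in_virtualenv` is a hypothetical helper name, and the check assumes either an older virtualenv release (which sets `sys.real_prefix`) or `venv`/virtualenv 20+ (which make `sys.prefix` differ from `sys.base_prefix`):

```python
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter runs inside a virtual environment."""
    # virtualenv releases before 20 set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and virtualenv 20+ leave sys.base_prefix pointing at the system
    # Python while sys.prefix points at the environment.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("inside a virtualenv" if in_virtualenv() else "using the system Python")
```

Run it with the environment's python after "workon cv"; it should report that you are inside a virtualenv.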

Step 2. Update the package lists and upgrade the installed packages to their latest versions (this may take a few minutes):

sudo apt-get update
sudo apt-get upgrade

Step 3. If you are going to use a microphone (as opposed to processing audio files or ingesting audio streams from other devices), verify your sound card configuration:

cat /proc/asound/cards

My Raspberry Pi 3B displays the following:

(cv) pi@raspberrypi:~ $ cat /proc/asound/cards
 0 [ALSA           ]: bcm2835_alsa - bcm2835 ALSA
                      bcm2835 ALSA
(cv) pi@raspberrypi:~ $ 
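If you want to pick a sound card programmatically later, the file can be parsed from Python. This is a sketch based only on the output format shown above (an index, the card id in brackets, the driver, then the short name, with the long name on a continuation line); other kernels may format it differently, and `parse_asound_cards` is a hypothetical helper:

```python
import re
from typing import List, NamedTuple


class SoundCard(NamedTuple):
    number: int
    card_id: str
    driver: str
    name: str


# Header line of each entry, e.g. " 0 [ALSA           ]: bcm2835_alsa - bcm2835 ALSA"
_CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s+(\S+)\s+-\s+(.*)$")


def parse_asound_cards(text: str) -> List[SoundCard]:
    """Parse the contents of /proc/asound/cards into SoundCard tuples."""
    cards = []
    for line in text.splitlines():
        match = _CARD_LINE.match(line)
        # Continuation lines (the long card name) do not match and are skipped.
        if match:
            number, card_id, driver, name = match.groups()
            cards.append(SoundCard(int(number), card_id, driver, name.strip()))
    return cards


if __name__ == "__main__":
    with open("/proc/asound/cards") as f:
        for card in parse_asound_cards(f.read()):
            print(card)
```

On my Pi 3B this should yield a single entry with card id "ALSA" and driver "bcm2835_alsa".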

Step 4. Install Bison (the GNU parser generator):

sudo apt-get install bison

Step 5. Install the ALSA (Advanced Linux Sound Architecture) library:

sudo apt-get install libasound2-dev

Step 6. Download sphinxbase and extract it (5prealpha, the latest version at the time of writing, is shown here):

tar -xzvf sphinxbase-5prealpha.tar.gz
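If you prefer to script the setup, Python's standard tarfile module can do the same job as "tar -xzvf". A minimal sketch, assuming the archive sits in the current directory; `extract_tarball` is a hypothetical helper, and the "data" extraction filter is only used when your Python version provides it:

```python
import tarfile
from pathlib import Path
from typing import List


def extract_tarball(archive: str, dest: str = ".") -> List[str]:
    """Extract a .tar.gz archive (like `tar -xzf`) and return its top-level entries."""
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Newer Python versions offer extraction filters; "data" rejects
        # absolute paths and ".." components in member names.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Collect the first path component of every member, e.g. "sphinxbase-5prealpha".
    return sorted({Path(m.name).parts[0] for m in members if Path(m.name).parts})


if __name__ == "__main__":
    print(extract_tarball("sphinxbase-5prealpha.tar.gz"))
```

The returned list tells you which directory to change into next.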

Step 7. Install SWIG. SWIG is an interface compiler that connects programs written in C and C++ with scripting languages such as Python:

sudo apt-get install swig
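Before compiling, a quick sanity check that the packages from steps 4, 5 and 7 are in place can save a failed build. This is a sketch under two assumptions: that bison and swig are found through PATH, and that libasound2-dev installed its headers in the usual Debian location /usr/include/alsa. `check_build_prerequisites` is a hypothetical helper:

```python
import shutil
from pathlib import Path
from typing import Dict


def check_build_prerequisites() -> Dict[str, bool]:
    """Report whether the packages from steps 4, 5 and 7 appear to be installed."""
    return {
        # bison and swig must be on PATH so the sphinxbase build can find them.
        "bison": shutil.which("bison") is not None,
        "swig": shutil.which("swig") is not None,
        # libasound2-dev installs the ALSA development headers here.
        "alsa-headers": Path("/usr/include/alsa/asoundlib.h").is_file(),
    }


if __name__ == "__main__":
    for name, found in check_build_prerequisites().items():
        print(f"{name:14} {'OK' if found else 'MISSING'}")
```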

Step 8. Change into the sphinxbase directory you just extracted, then compile and install sphinxbase. Make sure no errors are reported:

cd sphinxbase-5prealpha

./configure --enable-fixed
make clean all
sudo make install
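The "make sure there are no errors" advice can also be enforced by a script. Here is a sketch that runs the same three commands and stops at the first failure, because subprocess.run with check=True raises CalledProcessError on a non-zero exit status. `build_and_install` and its dry_run flag are hypothetical, added so you can preview the commands:

```python
import subprocess
from typing import List, Sequence


def build_and_install(src_dir: str, configure_args: Sequence[str] = (),
                      dry_run: bool = False) -> List[List[str]]:
    """Run ./configure, make clean all and sudo make install inside src_dir.

    With check=True the first failing step raises CalledProcessError,
    so a compile error stops the sequence instead of scrolling past.
    """
    commands = [
        ["./configure", *configure_args],
        ["make", "clean", "all"],
        ["sudo", "make", "install"],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=src_dir, check=True)
    return commands


if __name__ == "__main__":
    build_and_install("sphinxbase-5prealpha", ["--enable-fixed"])
```

The same helper could be reused for pocketsphinx in step 10, called without configure arguments.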

Step 9. Go up one directory, then download and extract pocketsphinx (again, 5prealpha, the latest version at the time of writing, is shown here):

cd ..

tar -xzvf pocketsphinx-5prealpha.tar.gz

Step 10. Change into the pocketsphinx directory you just extracted, then compile and install pocketsphinx. Again, make sure no errors are reported:

cd pocketsphinx-5prealpha

./configure
make clean all
sudo make install
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
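The two export lines only affect the current shell session. If you start pocketsphinx from a Python script instead, one option is to pass the same variables to the child process through subprocess. `pocketsphinx_env` is a hypothetical helper. Unlike the exports above, which overwrite the variables, it prepends the /usr/local paths and keeps any existing entries:

```python
import os
from typing import Dict, Mapping, Optional

LIB_DIR = "/usr/local/lib"
PKG_CONFIG_DIR = "/usr/local/lib/pkgconfig"


def _prepend(path: str, current: Optional[str]) -> str:
    """Put path in front of a colon-separated list, dropping duplicates of it."""
    parts = [p for p in (current or "").split(":") if p and p != path]
    return ":".join([path, *parts])


def pocketsphinx_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of base (default: os.environ) with the library paths set."""
    env = dict(os.environ if base is None else base)
    env["LD_LIBRARY_PATH"] = _prepend(LIB_DIR, env.get("LD_LIBRARY_PATH"))
    env["PKG_CONFIG_PATH"] = _prepend(PKG_CONFIG_DIR, env.get("PKG_CONFIG_PATH"))
    return env


if __name__ == "__main__":
    import subprocess
    # The decoding command from step 12, run with the adjusted environment.
    subprocess.run(
        ["pocketsphinx_continuous", "-samprate", "8000", "-hmm", "en-us-8khz",
         "-infile", "OSR_us_000_0030_8k.wav"],
        env=pocketsphinx_env(), check=True)
```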

Step 11. Get some audio files for testing. We are going to use a couple of American English samples from the Open Speech Repository.


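Step 12. Run Sphinx on one of the downloaded sample .wav files. In the command I used, -samprate 8000 matches the file's 8 kHz sample rate, -hmm points to the directory of the 8 kHz acoustic model, and -infile names the audio file to decode: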

pocketsphinx_continuous -samprate 8000 -hmm en-us-8khz -infile OSR_us_000_0030_8k.wav
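The -samprate 8000 flag and the 8 kHz acoustic model only make sense if the file really is sampled at 8 kHz. One way to check a file before decoding is Python's standard wave module. Note that it only reads uncompressed PCM .wav files, and `describe_wav` is a hypothetical helper:

```python
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    sample_rate: int   # Hz; should match -samprate and the acoustic model
    channels: int      # 1 = mono
    sample_width: int  # bytes per sample; 2 = 16-bit PCM
    duration: float    # seconds


def describe_wav(path: str) -> WavInfo:
    """Read the header of a PCM .wav file with the standard-library wave module."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(rate, wav.getnchannels(), wav.getsampwidth(),
                       wav.getnframes() / rate)


if __name__ == "__main__":
    info = describe_wav("OSR_us_000_0030_8k.wav")
    print(info)
    if info.sample_rate != 8000:
        print("Not an 8 kHz file: adjust -samprate and the -hmm model accordingly")
```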

As my screen recording of this run shows, decoding isn't very fast, so I will need to optimize it.
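Before optimizing, it helps to put a number on "not very fast". A common measure is the real-time factor, meaning processing time divided by audio duration, where values above 1.0 mean decoding is slower than real time. This is a minimal sketch that times a whole command with time.perf_counter, so it includes model loading as well as decoding. `real_time_factor` is a hypothetical helper:

```python
import subprocess
import time
import wave
from typing import Sequence


def real_time_factor(cmd: Sequence[str], audio_seconds: float) -> float:
    """Run cmd to completion and return elapsed wall-clock time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    # Discard the transcript on stdout; only the run time matters here.
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / audio_seconds


if __name__ == "__main__":
    wav_path = "OSR_us_000_0030_8k.wav"
    with wave.open(wav_path, "rb") as wav:
        seconds = wav.getnframes() / wav.getframerate()
    rtf = real_time_factor(
        ["pocketsphinx_continuous", "-samprate", "8000", "-hmm", "en-us-8khz",
         "-infile", wav_path],
        seconds)
    print(f"real-time factor: {rtf:.2f}")
```

Measuring before and after each tweak makes it easy to tell whether an optimization actually helped.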

What if something goes wrong?
If you get an error while compiling, or when running pocketsphinx_continuous for the first time, feel free to share your error message in the comments below, and I will add a few troubleshooting steps. In my case, I had to help the compiler locate a few libraries. I also had to download a Sphinx acoustic model for 8 kHz audio in order to process my .wav sample.
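For library-location problems, a quick first diagnostic is to look for the shared libraries that "make install" placed under /usr/local/lib (assumed here to be the default install prefix). `find_sphinx_libs` is a hypothetical helper. It only checks file names in the given directories; it does not prove the dynamic linker will actually load them:

```python
import glob
import os
from typing import Dict, List, Optional, Sequence

LIBRARIES = ("libsphinxbase", "libpocketsphinx")


def find_sphinx_libs(search_dirs: Optional[Sequence[str]] = None) -> Dict[str, List[str]]:
    """Map each Sphinx library name to the matching .so files found in search_dirs."""
    if search_dirs is None:
        # Directories from LD_LIBRARY_PATH first, then the default prefix, without duplicates.
        env_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
        search_dirs = list(dict.fromkeys([*env_dirs, "/usr/local/lib"]))
    found: Dict[str, List[str]] = {name: [] for name in LIBRARIES}
    for directory in search_dirs:
        for name in LIBRARIES:
            found[name].extend(sorted(glob.glob(os.path.join(directory, name + ".so*"))))
    return found


if __name__ == "__main__":
    for name, paths in find_sphinx_libs().items():
        print(name, "->", paths if paths else "NOT FOUND")
```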

Good luck!