Raspberry Pi: Turning LEDs On and Off by Voice Recognition

Introduction
Environment
Preparation
1. Preparing the Speech-to-Text API
2. Setting up the Python 3 environment
Controlling LEDs by voice

Introduction

This article describes how to use Speech-to-Text on Google Cloud Platform to control GPIO pins by voice and switch LEDs on and off.

Environment

・Board
　Raspberry Pi 4 Model B
・OS
　Raspberry Pi OS Lite
　Release date: September 22nd 2022
　System: 32-bit
　Kernel version: 5.15
　Debian version: 11 (bullseye)
・Python3
　Python 3.9.2
・RPi.GPIO
　RPi.GPIO-0.7.1
・Microphone (USB microphone)
　SANWA SUPPLY MM-MCU06BK (sampling rate: 48 kHz, bit depth: 16-bit)

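Preparation

Preparing the Speech-to-Text API

To use the Speech-to-Text API on Google Cloud Platform, you need to create a Google account and enable the Speech-to-Text API.
The steps needed before the API can be used are covered on this site in the article "Raspberry Pi 「Speech-to-Text」で音声認識" (Raspberry Pi: voice recognition with Speech-to-Text). If this is your first time using the API, follow that article to get set up.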
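Before going further, it can help to confirm that the client library will be able to find credentials. The sketch below assumes one common setup that this article does not spell out: a service-account key file whose path is stored in the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The function name `credentials_status` is made up for illustration.

```python
import os


def credentials_status(env=None):
    """Return "unset", "missing file" or "ok" for the key-file setting.

    Assumption: credentials are supplied through the
    GOOGLE_APPLICATION_CREDENTIALS environment variable.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return "unset"
    if not os.path.isfile(path):
        return "missing file"
    return "ok"


if __name__ == "__main__":
    print("GOOGLE_APPLICATION_CREDENTIALS:", credentials_status())
```

If your setup provides credentials another way, an "unset" result here does not necessarily indicate a problem.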

Setting up the Python 3 environment

Updating Raspberry Pi OS

First, update Raspberry Pi OS to the latest state.

pi@raspberrypi:~ $ sudo apt update
pi@raspberrypi:~ $ sudo apt upgrade
pi@raspberrypi:~ $

Reboot so that the updates take effect.

pi@raspberrypi:~ $ sudo reboot

Check the version of Raspberry Pi OS.

pi@raspberrypi:~ $ cat /proc/version
Linux version 5.15.76-v7l+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1597 SMP Fri Nov 4 12:14:58 GMT 2022
pi@raspberrypi:~ $ uname -a
Linux raspberrypi 5.15.76-v7l+ #1597 SMP Fri Nov 4 12:14:58 GMT 2022 armv7l GNU/Linux
pi@raspberrypi:~ $

Installing Python 3 and venv

Install Python 3 and venv.

pi@raspberrypi:~ $ sudo apt install python3 python3-venv
pi@raspberrypi:~ $

Check the Python version.

pi@raspberrypi:~ $ python --version
Python 3.9.2
pi@raspberrypi:~ $

Install the latest version of pip.

pi@raspberrypi:~ $ wget https://bootstrap.pypa.io/get-pip.py
pi@raspberrypi:~ $ sudo python3 get-pip.py
Successfully installed pip-22.3.1 setuptools-65.5.1 wheel-0.38.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
pi@raspberrypi:~ $

Check the pip version.

pi@raspberrypi:~ $ pip --version
pip 22.3.1 from /usr/local/lib/python3.9/dist-packages/pip (python 3.9)
pi@raspberrypi:~ $

Installing PyAudio

This article uses the PyAudio audio I/O library to capture audio from the microphone.
Install the PortAudio library, which PyAudio requires.

pi@raspberrypi:~ $ sudo apt install libportaudio2
pi@raspberrypi:~ $
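One way to confirm that the PortAudio shared library is visible to the system is to ask Python's standard `ctypes.util.find_library`. This is only a sketch: the helper name `portaudio_library` is hypothetical, and the exact library file name reported depends on the system.

```python
from ctypes.util import find_library


def portaudio_library():
    """Return the PortAudio shared-library name, or None if it is not found."""
    return find_library("portaudio")


if __name__ == "__main__":
    name = portaudio_library()
    print("PortAudio:", name if name else "not found (is libportaudio2 installed?)")
```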

This article runs everything inside a Python virtual environment.
Use venv to create and activate one; here the virtual environment is named "env".
Once it is active, the environment name "env" appears at the start of the command prompt.

pi@raspberrypi:~ $ python3 -m venv env
pi@raspberrypi:~ $ source env/bin/activate
(env) pi@raspberrypi:~ $
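Besides looking at the prompt, a script can check for itself whether it is running inside a virtual environment: inside a venv, `sys.prefix` differs from `sys.base_prefix`. The sketch below wraps that comparison in a hypothetical helper, `in_virtualenv`, whose optional arguments exist only to make the logic easy to try out.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter prefix differs from the base prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
```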

Install PyAudio into the virtual environment.

(env) pi@raspberrypi:~ $ pip install pyaudio
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting pyaudio
  Downloading https://www.piwheels.org/simple/pyaudio/PyAudio-0.2.12-cp39-cp39-linux_armv7l.whl (50 kB)
     |████████████████████████████████| 50 kB 88 kB/s
Installing collected packages: pyaudio
Successfully installed pyaudio-0.2.12
(env) pi@raspberrypi:~ $

Installing RPi.GPIO

This article uses RPi.GPIO to drive the GPIO pins that switch the LEDs, so install RPi.GPIO into the virtual environment.

(env) pi@raspberrypi:~ $ pip install RPi.GPIO
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting RPi.GPIO
  Downloading https://www.piwheels.org/simple/rpi-gpio/RPi.GPIO-0.7.1-cp39-cp39-linux_armv7l.whl (66 kB)
     |████████████████████████████████| 66 kB 18 kB/s
Installing collected packages: RPi.GPIO
Successfully installed RPi.GPIO-0.7.1
(env) pi@raspberrypi:~ $

Installing google-cloud-speech

Install the google-cloud-speech library, which is required to use the Speech-to-Text API, into the virtual environment.

(env) pi@raspberrypi:~ $ pip install --upgrade google-cloud-speech
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting google-cloud-speech
  Downloading https://www.piwheels.org/simple/google-cloud-speech/google_cloud_speech-2.16.2-py2.py3-none-any.whl (228 kB)
     |████████████████████████████████| 228 kB 131 kB/s

(output omitted)

Installing collected packages: pyasn1, urllib3, six, rsa, pyasn1-modules, protobuf, idna, charset-normalizer, certifi, cachetools, requests, grpcio, googleapis-common-protos, google-auth, grpcio-status, google-api-core, proto-plus, google-cloud-speech
Successfully installed cachetools-5.2.0 certifi-2022.12.7 charset-normalizer-2.1.1 google-api-core-2.11.0 google-auth-2.15.0 google-cloud-speech-2.16.2 googleapis-common-protos-1.57.0 grpcio-1.51.1 grpcio-status-1.51.1 idna-3.4 proto-plus-1.22.1 protobuf-4.21.11 pyasn1-0.4.8 pyasn1-modules-0.2.8 requests-2.28.1 rsa-4.9 six-1.16.0 urllib3-1.26.13
(env) pi@raspberrypi:~ $

That completes the preparation for controlling the LEDs (GPIO) by voice recognition.
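As a quick sanity check, the three packages installed above can be looked up from Python before writing the actual script. The following is a minimal sketch using the standard `importlib.util.find_spec`; the helper name `is_installed` and the `REQUIRED` tuple are made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module.

    For dotted names, find_spec imports the parent package first and
    raises ModuleNotFoundError when that parent is absent.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


# Import names of the packages installed in this section.
REQUIRED = ("pyaudio", "RPi.GPIO", "google.cloud.speech")

if __name__ == "__main__":
    for name in REQUIRED:
        print(f"{name}: {'ok' if is_installed(name) else 'MISSING'}")
```

Run it inside the activated virtual environment; a package installed only system-wide would otherwise be reported as missing or found misleadingly.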

Controlling LEDs by voice

The spoken phrases listed in the table below control two LEDs.

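Phrase	Action
右のLEDをオン ("right LED on")	Turn the right LED on
左のLEDをオン ("left LED on")	Turn the left LED on
右のLEDをオフ ("right LED off")	Turn the right LED off
左のLEDをオフ ("left LED off")	Turn the left LED off
終了 ("quit")	Stop voice recognition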
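The table maps naturally onto a small lookup structure. The sketch below is one possible way to turn a transcript into LED actions without touching any hardware, which makes the matching easy to try on a PC. The names `COMMANDS` and `parse_command` are hypothetical, pins 23 and 24 are the BCM numbers used by the script in this article, and the `\s*` parts assume that the API may or may not put spaces around "LED" in the transcript.

```python
import re

RIGHT_LED = 23  # BCM pin numbers, matching the script in this article
LEFT_LED = 24

# (pattern, pin, state): \s* tolerates optional spaces around "LED".
COMMANDS = [
    (re.compile(r"右の\s*LED\s*を\s*オン", re.I), RIGHT_LED, True),
    (re.compile(r"右の\s*LED\s*を\s*オフ", re.I), RIGHT_LED, False),
    (re.compile(r"左の\s*LED\s*を\s*オン", re.I), LEFT_LED, True),
    (re.compile(r"左の\s*LED\s*を\s*オフ", re.I), LEFT_LED, False),
]


def parse_command(transcript):
    """Return (actions, stop).

    actions is a list of (pin, state) pairs for every phrase found;
    stop is True when the transcript contains 終了.
    """
    actions = [
        (pin, state)
        for pattern, pin, state in COMMANDS
        if pattern.search(transcript)
    ]
    return actions, "終了" in transcript


if __name__ == "__main__":
    for text in ("右の LED をオン", "左のLEDをオフ", "終了"):
        print(text, "->", parse_command(text))
```

Keeping the matching separate from `GPIO.output` means new phrases only need a new row in `COMMANDS`.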

The script is a modified version of the "Python non-streaming and streaming speech recognition sample" published on the official Google Cloud Platform site.

(env) pi@raspberrypi:~ $ vi voice_recognition_control.py

[File contents]
import queue
import re

import pyaudio
import RPi.GPIO as GPIO
from google.cloud import speech

LANGUAGE_CODE = "ja-JP"         # BCP-47 language tag for Japanese

MICROPHONE_RATE = 48000         # Sampling Rate 48kHz
SPEECH_RATE = 16000             # Speech-to-Text API Sampling Rate

CHUNK = int(MICROPHONE_RATE / 10)       # 100ms

RIGHT_LED = 23
LEFT_LED = 24

class MicStream(object):
    
    def __init__(self, rate,  chunk):
        self._rate = rate
        self._chunk = chunk
        
        # Create thread-safe buffer
        self._buff = queue.Queue()

        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            rate=self._rate,
            channels=1,
            format=pyaudio.paInt16,
            input=True,
            output=False,
            input_device_index=None,
            output_device_index=None,
            frames_per_buffer=self._chunk,
            start=True,
            input_host_api_specific_stream_info=None,
            output_host_api_specific_stream_info=None,
            stream_callback=self._fill_buffer
        )

        self.closed = False

        return self

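    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        # 48 kHz -> 16 kHz: keep every third 16-bit sample. Slicing the raw
        # bytes with [::3] would split samples, so view them as int16 first.
        samples = memoryview(in_data).cast("h")
        self._buff.put(samples[::3].tobytes())
        return None, pyaudio.paContinue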

    def generator(self):
        while not self.closed:
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b"".join(data)

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()

        self.closed = True
        
        self._buff.put(None)
        self._audio_interface.terminate()

def listen_print(responses):
    for response in responses:
        if not response.results:
            continue

        result = response.results[0]

        if not result.alternatives:
            continue

        # Act only on final results; interim results are skipped.
        if not result.is_final:
            continue

        transcript = result.alternatives[0].transcript
        print(transcript)

        # \s* accepts transcripts with or without spaces around "LED".
        # \b is avoided: it does not match between adjacent Japanese characters.
        if re.search(r"右の\s*LED\s*を\s*オン", transcript, re.I):
            GPIO.output(RIGHT_LED, True)

        if re.search(r"右の\s*LED\s*を\s*オフ", transcript, re.I):
            GPIO.output(RIGHT_LED, False)

        if re.search(r"左の\s*LED\s*を\s*オン", transcript, re.I):
            GPIO.output(LEFT_LED, True)

        if re.search(r"左の\s*LED\s*を\s*オフ", transcript, re.I):
            GPIO.output(LEFT_LED, False)

        if "終了" in transcript:
            print("Exiting...")
            break

def main():

    # Initialize RPi.GPIO (BCM pin numbering)
    GPIO.setmode(GPIO.BCM)

    # Configure the LED pins as outputs
    GPIO.setup(RIGHT_LED, GPIO.OUT)
    GPIO.setup(LEFT_LED, GPIO.OUT)

    # Drive both pins LOW (LEDs off)
    GPIO.output(RIGHT_LED, False)
    GPIO.output(LEFT_LED, False)

    try:
        client = speech.SpeechClient()

        config = speech.RecognitionConfig(
            encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=SPEECH_RATE,
            language_code=LANGUAGE_CODE
        )

        streaming_config = speech.StreamingRecognitionConfig(
            config=config,
            interim_results=True
        )

        with MicStream(MICROPHONE_RATE, CHUNK) as stream:
            audio_generator = stream.generator()

            requests = (
                speech.StreamingRecognizeRequest(audio_content=content)
                for content in audio_generator
            )

            responses = client.streaming_recognize(streaming_config, requests)

            listen_print(responses)
    finally:
        # Drive both pins LOW, also after an error or Ctrl+C
        GPIO.output(RIGHT_LED, False)
        GPIO.output(LEFT_LED, False)

        # Release the GPIO pins
        GPIO.cleanup()

    print("End")

if __name__ == "__main__":
    main()

(env) pi@raspberrypi:~ $
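The script captures audio at the microphone's 48 kHz and sends 16 kHz audio to the API, so each captured chunk has to be reduced by a factor of three. With 16-bit mono PCM, each sample occupies two bytes, so the reduction has to drop whole samples rather than individual bytes. The sketch below shows that step in isolation; the function name `decimate_int16` is hypothetical, and it is plain decimation with no low-pass filter, so some aliasing of high frequencies is to be expected.

```python
def decimate_int16(pcm, factor=3):
    """Keep every `factor`-th 16-bit sample from mono PCM bytes."""
    usable = len(pcm) - len(pcm) % 2            # ignore a trailing odd byte
    samples = memoryview(pcm)[:usable].cast("h")  # view bytes as int16 samples
    return samples[::factor].tobytes()


if __name__ == "__main__":
    # One 100 ms chunk at 48 kHz: 4800 samples = 9600 bytes of silence.
    chunk_48k = bytes(9600)
    chunk_16k = decimate_int16(chunk_48k)
    print(len(chunk_48k), "bytes in,", len(chunk_16k), "bytes out")
```

For a 100 ms chunk this turns 4800 samples into 1600, which is again 100 ms of audio at 16 kHz.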

As shown above, GPIO can be controlled by voice, so with a bit of imagination this approach can be applied to many other projects.