SpeechPy

Open Source Python Library for Speech Recognition

A Python API for speech processing and recognition operations. It supports MFCCs and filter-bank energies, as well as the log-energy of filter-banks.

The SpeechPy library provides a set of techniques for speech processing, recognition, and post-processing through a Python API. It supports common speech features such as MFCCs, filter-bank energies, and the log-energy of filter-banks.

The library also aims to provide the functionality needed for deep learning applications such as automatic speech recognition (ASR). It includes functions for computing the main speech features from an audio signal: mel frequency cepstral coefficients (MFCCs), mel filter-bank energies, log mel filter-bank energies, temporal derivative features, and more.
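
To make the idea of temporal derivative features concrete, here is a minimal sketch in plain Python. It is not SpeechPy's implementation: the function name `delta` is hypothetical, and the central-difference scheme with replicated edge frames is an assumption chosen for brevity. The library's own derivative function may use a different window.

```python
def delta(frames):
    """First-order temporal derivative of a feature sequence.

    `frames` is a list of feature vectors, one per frame.
    Uses a central difference; the first and last frames are
    replicated at the edges (an assumption for this sketch).
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return out


# Three frames with two coefficients each: the first coefficient
# rises over time, the second stays constant.
features = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
deltas = delta(features)
```

Applying `delta` a second time to `deltas` would give a sketch of second-order (acceleration) features.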

At A Glance

An overview of SpeechPy features.

Features Overview

Speech Processing
Speech Recognition
Compute MFCCs
Filterbank energies
MP3 support
Post Processing
Use Autoencoders
Extract Audio
Audio to Text

SpeechPy supports the audio file formats listed below.

Reader

MP3, WAV, WMA, WEBM

Writer

MP3, WAV, WMA, WEBM

Platform Independence

SpeechPy requires only a Python runtime.

Python 2.6 and above.

Getting Started with SpeechPy

The easiest way to install the SpeechPy library is using the Python Package Index (PyPI). Please use the following command for a complete installation.

Install SpeechPy using PyPI

 pip install speechpy
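
After installation, one way to confirm that the package is visible to the interpreter is to look it up without importing it. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


# Prints True when the pip command above has succeeded in this environment.
print("speechpy installed:", is_installed("speechpy"))
```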

Speech Recognition via Python

Speech recognition is concerned with recognizing spoken language and converting it into text by computer. The open source Python library SpeechPy enables software developers to create applications that support speech recognition features. Speech input saves users time by letting them speak instead of type, so they can communicate with their devices with less effort, which makes those devices more accessible and easier to use.

Compute MFCC from Audio Signal

The Python library SpeechPy lets developers compute MFCC features from an audio signal inside their own applications. The computation is controlled by several parameters: the sampling frequency of the signal, the length of each frame in seconds, the step between successive frames in seconds, the number of filters in the filter-bank, the number of FFT points, the lowest and highest band edges of the mel filters, the number of cepstral coefficients, and more.
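
The sketch below illustrates how some of these parameters interact, using only the standard library. It is not SpeechPy code: the helper names are hypothetical, the mel formula is the common 2595 * log10(1 + f / 700) convention, and the frame count assumes no zero padding of the final frame. The library itself may make different choices.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale (common convention)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_count(num_samples, sampling_frequency, frame_length, frame_stride):
    """Number of complete frames, assuming no zero padding."""
    samples_per_frame = int(round(sampling_frequency * frame_length))
    samples_per_step = int(round(sampling_frequency * frame_stride))
    if num_samples < samples_per_frame:
        return 0
    return 1 + (num_samples - samples_per_frame) // samples_per_step


def mel_band_edges(num_filters, low_frequency, high_frequency):
    """Edge frequencies in Hz, equally spaced on the mel scale.

    Triangular filters share edges, so num_filters filters
    need num_filters + 2 edge frequencies.
    """
    low_mel = hz_to_mel(low_frequency)
    high_mel = hz_to_mel(high_frequency)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_filters + 2)]


# One second of audio at 16 kHz, 20 ms frames, 10 ms step.
frames = frame_count(16000, 16000, 0.020, 0.010)
# 40 filters between 0 Hz and the Nyquist frequency (8000 Hz).
edges = mel_band_edges(40, 0.0, 8000.0)
```

Shorter steps give more frames per second of audio, while the low and high band edges set the frequency range that the filters cover.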

Extract Audio using Autoencoders

The open source Python library SpeechPy enables programmers to extract audio data using Python code. An autoencoder is a neural network technique for learning efficient data representations. An autoencoder network learns to compress data from the input layer into a shorter code, and then to decompress that code into an output that matches the original input as closely as possible.
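
The compress-then-reconstruct idea can be sketched with a tiny linear autoencoder in plain Python. This is a generic illustration, not a SpeechPy API: the function name, the 2 -> 1 -> 2 layer sizes, the toy data, and the training settings are all assumptions made for this example.

```python
def train_linear_autoencoder(samples, epochs=2000, learning_rate=0.05):
    """Train a 2 -> 1 -> 2 linear autoencoder with full-batch gradient descent.

    Returns the encoder weights, the decoder weights, and the mean
    squared reconstruction error recorded at the start of each epoch.
    """
    encoder = [0.1, 0.1]  # maps a 2-D input to a 1-D code
    decoder = [0.1, 0.1]  # maps the 1-D code back to 2-D
    history = []
    n = len(samples)
    for _ in range(epochs):
        grad_enc = [0.0, 0.0]
        grad_dec = [0.0, 0.0]
        loss = 0.0
        for x in samples:
            code = encoder[0] * x[0] + encoder[1] * x[1]      # compress
            recon = [decoder[0] * code, decoder[1] * code]    # decompress
            err = [recon[0] - x[0], recon[1] - x[1]]
            loss += err[0] ** 2 + err[1] ** 2
            # Error propagated back through the decoder to the code.
            back = err[0] * decoder[0] + err[1] * decoder[1]
            for i in range(2):
                grad_dec[i] += 2.0 * err[i] * code
                grad_enc[i] += 2.0 * back * x[i]
        history.append(loss / n)
        for i in range(2):
            encoder[i] -= learning_rate * grad_enc[i] / n
            decoder[i] -= learning_rate * grad_dec[i] / n
    return encoder, decoder, history


# Toy 2-D data lying on the line y = 2x, so one code value per
# sample is enough to reconstruct it.
data = [(0.5, 1.0), (-0.5, -1.0), (0.25, 0.5), (-0.25, -0.5)]
encoder, decoder, history = train_linear_autoencoder(data)
print("initial error:", history[0], "final error:", history[-1])
```

Real autoencoders for audio use nonlinear layers and far larger inputs, but the training objective is the same: minimize the difference between the input and its reconstruction.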