Architecture
Overview
DeepSpeech's architecture consists of:
- Audio Processing Pipeline
- Neural Network Engine
- Language Model
- Inference Engine
- API Layer
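As a rough illustration of how layers like these might hand data to one another, the sketch below chains a few toy stages in sequence. The stage order, the `Stage` and `run_pipeline` names, and the stand-in lambdas are all assumptions made for this example; none of them are DeepSpeech APIs, and the source does not specify how the layers are actually wired together.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    """One hypothetical processing layer: a name plus a function applied to the payload."""
    name: str
    run: Callable[[object], object]


def run_pipeline(stages: List[Stage], payload: object) -> object:
    """Pass the payload through each stage in order and return the final result."""
    for stage in stages:
        payload = stage.run(payload)
    return payload


# Toy stand-ins for three of the layers (assumed order, placeholder logic).
stages = [
    # Scale 16-bit PCM samples to the range [-1.0, 1.0).
    Stage("audio_processing", lambda samples: [s / 32768.0 for s in samples]),
    # Pretend acoustic model: emits fixed characters for any non-empty input.
    Stage("neural_network", lambda feats: ["h", "i"] if feats else []),
    # Pretend language model: joins characters into a transcript.
    Stage("language_model", lambda chars: "".join(chars)),
]

result = run_pipeline(stages, [0, 16384, -16384])
```

Keeping each layer behind the same callable interface makes it straightforward to swap one stage for another without touching the rest, which is one plausible reason to separate the layers this way.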
Components
- speechd: Speech recognition server
- audioprocessor: Audio preprocessing service
- modelserver: Neural network inference
- KenLM: Language model engine
- VAD: Voice Activity Detection
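To make the role of the VAD component more concrete, here is a minimal energy-threshold sketch that marks each frame as speech or silence. The source does not say which detection method the VAD component uses, so this is a generic technique shown for illustration only; `frame_rms`, `detect_voice`, the 160-sample frame size (10 ms at an assumed 16 kHz sample rate), and the 0.1 threshold are all hypothetical choices.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def detect_voice(
    samples: Sequence[float],
    frame_size: int = 160,   # assumed: 10 ms at 16 kHz
    threshold: float = 0.1,  # assumed: tuned per deployment in practice
) -> List[bool]:
    """Return one flag per frame: True if the frame's RMS meets the threshold."""
    flags = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) >= threshold)
    return flags


# One silent frame followed by one frame of constant amplitude 0.5.
silence = [0.0] * 160
tone = [0.5] * 160
flags = detect_voice(silence + tone)
```

Frames flagged `False` could be dropped before they reach the recognizer, which is the usual motivation for placing voice activity detection ahead of inference.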
Scalability Features
- Batch processing
- GPU acceleration
- Distributed inference
- Model parallelism
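Batch processing, the first item above, can be sketched as grouping pending inputs into fixed-size batches and padding each batch to a common length so it can be processed in one pass. This is a generic illustration under assumed names (`make_batches`, `pad_batch`) and an assumed zero padding value; the source does not describe how DeepSpeech batches requests.

```python
from typing import List, Sequence


def make_batches(items: Sequence, batch_size: int) -> List[list]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def pad_batch(batch: Sequence[Sequence[float]], pad_value: float = 0.0) -> List[List[float]]:
    """Right-pad every sequence in the batch to the length of the longest one."""
    max_len = max((len(seq) for seq in batch), default=0)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in batch]


# Five toy "utterances" of different lengths, grouped two at a time.
utterances = [[0.1], [0.2, 0.3, 0.4], [0.5, 0.6], [0.7], [0.8, 0.9]]
batches = [pad_batch(b) for b in make_batches(utterances, 2)]
```

Padding lets sequences of different lengths share one rectangular batch, at the cost of some wasted computation on the padded positions; grouping inputs of similar length is a common way to reduce that waste.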