Architecture

Overview

DeepSpeech's architecture consists of:

  • Audio Processing Pipeline

  • Neural Network Engine

  • Language Model

  • Inference Engine

  • API Layer
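
The list above reads as a staged flow from raw audio to an API response. As a rough sketch of that flow, assuming each layer can be modeled as a function that consumes the previous layer's output, the following chains four stages in order. The stage names (`preprocess`, `acoustic_model`, `decode`, `to_response`) and their toy bodies are hypothetical placeholders for illustration, not DeepSpeech code.

```python
from typing import Callable, List, Sequence

# Each stage takes the previous stage's output and returns its own.
Stage = Callable[[object], object]


def run_pipeline(stages: Sequence[Stage], data: object) -> object:
    """Pass data through each stage in order and return the final result."""
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical placeholder stages. Real ones would do signal processing,
# neural network inference, and language-model decoding.
def preprocess(samples: List[int]) -> List[float]:
    # Scale 16-bit PCM samples to the range [-1.0, 1.0).
    return [s / 32768.0 for s in samples]


def acoustic_model(features: List[float]) -> List[str]:
    # Stand-in for the network: emit one dummy character per sample.
    return ["a" if x >= 0 else "b" for x in features]


def decode(chars: List[str]) -> str:
    # Stand-in for language-model decoding: join characters into text.
    return "".join(chars)


def to_response(text: str) -> dict:
    # Stand-in for the API layer: wrap the transcript in a response body.
    return {"transcript": text}


STAGES = [preprocess, acoustic_model, decode, to_response]
result = run_pipeline(STAGES, [1200, -300, 0, 15000])
```

Keeping each stage behind the same call shape is one way to let a stage be swapped or moved to a separate service without touching its neighbors.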

Components

  • speechd: Speech recognition server

  • audioprocessor: Audio preprocessing service

  • modelserver: Neural network inference

  • KenLM: Language model engine

  • VAD: Voice Activity Detection
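
To illustrate what a Voice Activity Detection component does, here is a minimal sketch that flags frames whose energy exceeds a fixed threshold. This is simple energy thresholding, swapped in for illustration only; the source does not say which VAD algorithm is used, and production detectors are typically more robust. The function names, the 16 kHz sample rate, the 320-sample frame length, and the threshold value are all assumptions.

```python
import math
from typing import List


def frame_rms(frame: List[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples: List[int], frame_len: int = 320,
                  threshold: float = 500.0) -> List[bool]:
    """Return one flag per frame: True when RMS energy exceeds threshold.

    frame_len=320 corresponds to 20 ms at 16 kHz (assumed values).
    A shorter trailing frame, if any, is evaluated as-is.
    """
    flags = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) > threshold)
    return flags


# One silent frame, one frame of a 440 Hz tone, one silent frame.
silence = [0] * 320
tone = [int(4000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(320)]
flags = detect_speech(silence + tone + silence)
```

Only the frames flagged `True` would need to be forwarded to the recognizer, which is the usual motivation for placing VAD ahead of inference.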

Scalability Features

  • Batch processing

  • GPU acceleration

  • Distributed inference

  • Model parallelism
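
Batch processing of audio usually means grouping several utterances so that one inference call handles them together. Since utterances differ in length, they are commonly padded to the longest item in the batch. The sketch below shows that grouping and padding step under stated assumptions: `make_batches`, its parameters, and the use of plain lists of floats as features are hypothetical, not part of DeepSpeech.

```python
from typing import List, Tuple

Batch = Tuple[List[List[float]], List[int]]


def make_batches(utterances: List[List[float]], batch_size: int,
                 pad_value: float = 0.0) -> List[Batch]:
    """Group utterances into padded batches.

    Returns a list of (padded_batch, original_lengths) pairs. The original
    lengths let a model ignore the padded positions.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for start in range(0, len(utterances), batch_size):
        group = utterances[start:start + batch_size]
        lengths = [len(u) for u in group]
        max_len = max(lengths)
        # Pad every utterance in the group to the longest one.
        padded = [u + [pad_value] * (max_len - len(u)) for u in group]
        batches.append((padded, lengths))
    return batches


batches = make_batches([[0.1, 0.2, 0.3], [0.4], [0.5, 0.6]], batch_size=2)
```

Sorting utterances by length before batching would reduce the amount of padding, at the cost of reordering results.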