Overview

The Voice Agent SDK is an infrastructure toolkit that lets businesses build custom real-time AI voice agents. Similar to the Agora Conversation SDK, it provides the essential components for voice agent development: Speech-to-Text (STT), Text-to-Speech (TTS), and Large Language Model (LLM) integration, each set up through a small configuration object.

Key Features

STT Integration

High-quality speech-to-text capabilities with low latency

TTS Integration

Natural-sounding text-to-speech with multiple voice options

LLM Framework

Integration with leading language models for intelligent conversations

Easy Configuration

Simplified setup and configuration for rapid development

Architecture

The Voice Agent SDK follows a modular architecture made up of four layers:
  1. Core Communication Layer: Handles real-time voice streaming and media processing
  2. AI Processing Layer: Manages STT, TTS, and LLM integration
  3. Configuration Layer: Simplified API for customizing agent behavior
  4. Integration Layer: Connectors for CRMs, databases, and business systems
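
To make the AI Processing Layer more concrete, the sketch below models one conversational turn as audio passing through three pluggable stages. It is illustrative only: the `SpeechToText`, `LanguageModel`, and `TextToSpeech` interfaces, the `runTurn` function, and the stub stages are hypothetical names rather than the SDK's documented API, and the stages are synchronous for brevity, whereas real providers would be asynchronous and streaming.

```typescript
// Hypothetical stage interfaces; the SDK's real types may differ.
interface SpeechToText { transcribe(audio: Uint8Array): string; }
interface LanguageModel { reply(transcript: string): string; }
interface TextToSpeech { synthesize(text: string): Uint8Array; }

interface Pipeline { stt: SpeechToText; llm: LanguageModel; tts: TextToSpeech; }

interface TurnResult { transcript: string; replyText: string; audioOut: Uint8Array; }

// One conversational turn: audio in -> transcript -> reply text -> audio out.
function runTurn(pipeline: Pipeline, audioIn: Uint8Array): TurnResult {
  const transcript = pipeline.stt.transcribe(audioIn);
  const replyText = pipeline.llm.reply(transcript);
  const audioOut = pipeline.tts.synthesize(replyText);
  return { transcript, replyText, audioOut };
}

// Stub stages standing in for real providers (assumptions for illustration).
const stubPipeline: Pipeline = {
  stt: { transcribe: (audio) => `heard ${audio.length} bytes` },
  llm: { reply: (transcript) => `You said: ${transcript}` },
  tts: { synthesize: (text) => new Uint8Array(text.length) }, // one fake byte per character
};

const turnResult = runTurn(stubPipeline, new Uint8Array(4));
```

Because each stage sits behind a narrow interface, any one provider could in principle be swapped without touching the other two, which is the main benefit a modular design of this kind is meant to deliver.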

Getting Started

Installation

npm install @voxeme/voice-agent-sdk

Basic Implementation

import { VoiceAgent } from '@voxeme/voice-agent-sdk';

const agent = new VoiceAgent({
  stt: {
    provider: 'deepgram', // or 'assemblyai', 'google', etc.
    config: {
      // STT configuration
    }
  },
  tts: {
    provider: 'elevenlabs', // or 'google', 'amazon', etc.
    config: {
      voice: 'default',
      // TTS configuration
    }
  },
  llm: {
    provider: 'openai', // or 'anthropic', 'cohere', etc.
    config: {
      model: 'gpt-4',
      // LLM configuration
    }
  }
});

// Start the agent
agent.start();

Core Components

Speech-to-Text (STT)

Support for multiple STT providers:
  • Deepgram (recommended for accuracy)
  • AssemblyAI (good balance of quality and cost)
  • Google Cloud Speech-to-Text
  • Amazon Transcribe
  • Azure Speech Services

Text-to-Speech (TTS)

Multiple TTS options with natural voices:
  • ElevenLabs (highest quality)
  • Google Cloud Text-to-Speech
  • Amazon Polly
  • Azure Text-to-Speech
  • Custom neural voices

Language Model Integration

Flexible LLM integration:
  • OpenAI GPT models
  • Anthropic Claude
  • Cohere Command
  • Custom fine-tuned models
  • On-premises LLMs
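
Since every component accepts a provider name, a typo in that name is an easy mistake to make. The sketch below shows one way an application might check its selection before constructing an agent. The identifiers come from the configuration comments earlier in this document, except `'azure'`, which is an assumed identifier; `SUPPORTED_PROVIDERS` and `validateProviders` are hypothetical helpers, not SDK functions.

```typescript
// Provider identifiers from this document's config comments;
// 'azure' is an assumed identifier for the Azure services listed above.
const SUPPORTED_PROVIDERS = {
  stt: ['deepgram', 'assemblyai', 'google', 'amazon', 'azure'],
  tts: ['elevenlabs', 'google', 'amazon', 'azure'],
  llm: ['openai', 'anthropic', 'cohere'],
} as const;

type Component = keyof typeof SUPPORTED_PROVIDERS;

interface ProviderSelection { stt: string; tts: string; llm: string; }

// Returns human-readable problems; an empty list means the selection is valid.
function validateProviders(selection: ProviderSelection): string[] {
  const problems: string[] = [];
  for (const component of Object.keys(SUPPORTED_PROVIDERS) as Component[]) {
    const allowed: readonly string[] = SUPPORTED_PROVIDERS[component];
    const chosen = selection[component];
    if (allowed.indexOf(chosen) === -1) {
      problems.push(
        `Unknown ${component} provider "${chosen}"; expected one of: ${allowed.join(', ')}`
      );
    }
  }
  return problems;
}

const validSelection = validateProviders({ stt: 'deepgram', tts: 'elevenlabs', llm: 'openai' });
const invalidSelection = validateProviders({ stt: 'deepgram', tts: 'festival', llm: 'openai' });
```

Here `validSelection` is empty, while `invalidSelection` holds a single message about the unrecognized TTS provider.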

Configuration Options

Agent Personality

const agent = new VoiceAgent({
  personality: {
    tone: 'friendly', // or 'professional', 'casual', 'enthusiastic'
    speakingStyle: 'conversational', // or 'formal', 'direct'
    responseLength: 'concise', // or 'detailed'
    language: 'en-US'
  }
});
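
This document does not say how the SDK applies these settings internally. One plausible approach is to render them into a system prompt for the LLM, as in the sketch below. The `Personality` type mirrors the option values shown above, while `buildSystemPrompt` and its wording are assumptions made for illustration.

```typescript
type Tone = 'friendly' | 'professional' | 'casual' | 'enthusiastic';
type SpeakingStyle = 'conversational' | 'formal' | 'direct';
type ResponseLength = 'concise' | 'detailed';

interface Personality {
  tone: Tone;
  speakingStyle: SpeakingStyle;
  responseLength: ResponseLength;
  language: string; // locale tag such as 'en-US'
}

// Hypothetical helper: turns personality settings into LLM instructions.
function buildSystemPrompt(p: Personality): string {
  const lengthHint =
    p.responseLength === 'concise'
      ? 'Keep each reply to one or two short sentences.'
      : 'Give complete, detailed answers.';
  return [
    `You are a voice assistant with a ${p.tone} tone and a ${p.speakingStyle} speaking style.`,
    lengthHint,
    `Respond in the language identified by the locale ${p.language}.`,
  ].join(' ');
}

const systemPrompt = buildSystemPrompt({
  tone: 'friendly',
  speakingStyle: 'conversational',
  responseLength: 'concise',
  language: 'en-US',
});
```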

Conversation Flow

const agent = new VoiceAgent({
  conversation: {
    greeting: 'Hello! How can I assist you today?',
    timeout: 30, // seconds before timeout prompt
    maxTurns: 20, // maximum conversation turns
    interruptionHandling: true
  }
});
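
To show how `timeout` and `maxTurns` could interact at runtime, here is a minimal sketch of a turn tracker. It assumes that `timeout` means seconds of user silence and that the conversation ends once `maxTurns` user turns have been recorded; neither reading is confirmed by this document. `ConversationTracker` is a hypothetical class, not part of the SDK.

```typescript
interface ConversationLimits { timeoutSeconds: number; maxTurns: number; }

type NextAction = 'continue' | 'prompt_user' | 'end_conversation';

class ConversationTracker {
  private turns = 0;
  private lastActivityMs: number;
  private limits: ConversationLimits;

  constructor(limits: ConversationLimits, startMs: number) {
    this.limits = limits;
    this.lastActivityMs = startMs;
  }

  // Call when the user finishes speaking.
  recordUserTurn(nowMs: number): NextAction {
    this.turns += 1;
    this.lastActivityMs = nowMs;
    return this.turns >= this.limits.maxTurns ? 'end_conversation' : 'continue';
  }

  // Call periodically to detect prolonged silence.
  checkSilence(nowMs: number): NextAction {
    const silentSeconds = (nowMs - this.lastActivityMs) / 1000;
    return silentSeconds >= this.limits.timeoutSeconds ? 'prompt_user' : 'continue';
  }
}

// Timestamps are passed in explicitly so the logic is easy to test.
const tracker = new ConversationTracker({ timeoutSeconds: 30, maxTurns: 20 }, 0);
const afterFirstTurn = tracker.recordUserTurn(1000);  // user spoke at t = 1 s
const shortSilence = tracker.checkSilence(20000);     // 19 s of silence
const longSilence = tracker.checkSilence(31000);      // 30 s of silence
```

With these inputs, `afterFirstTurn` and `shortSilence` are `'continue'`, and `longSilence` is `'prompt_user'`.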

Advanced Features

Knowledge Base Integration

const agent = new VoiceAgent({
  knowledgeBase: {
    type: 'vector', // or 'graph', 'document'
    source: 'your-knowledge-base-id',
    retrievalMethod: 'similarity'
  }
});
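
For a `vector` knowledge base with `similarity` retrieval, lookup typically means ranking stored embeddings by how close they are to the query embedding. The sketch below uses cosine similarity over tiny hand-written vectors. `KnowledgeChunk`, `cosineSimilarity`, and `retrieveTopK` are hypothetical names, and the SDK may use a different metric or an external vector store.

```typescript
interface KnowledgeChunk { id: string; text: string; embedding: number[]; }

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Embedding dimensions must match');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk against the query and keeps the k highest-scoring ones.
function retrieveTopK(query: number[], chunks: KnowledgeChunk[], k: number): KnowledgeChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy two-dimensional embeddings; real embeddings have hundreds of dimensions.
const knowledgeChunks: KnowledgeChunk[] = [
  { id: 'returns', text: 'Items can be returned within 30 days.', embedding: [1, 0] },
  { id: 'shipping', text: 'Orders ship within two business days.', embedding: [0, 1] },
  { id: 'refunds', text: 'Refunds are issued to the original payment method.', embedding: [0.9, 0.1] },
];

const topChunks = retrieveTopK([1, 0], knowledgeChunks, 2);
```

The retrieved chunks would then be added to the LLM prompt as context before the model answers.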

Tool Calling

const agent = new VoiceAgent({
  tools: [
    {
      name: 'check_order_status',
      description: 'Check the status of a customer order',
      parameters: {
        orderId: { type: 'string', required: true }
      },
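      handler: async (params) => {
        // Placeholder result; replace with a lookup against your own order system
        const orderStatus = { orderId: params.orderId, status: 'unknown' };
        return orderStatus;
      }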
    }
  ]
});
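
Arguments generated by an LLM are not guaranteed to match a tool's declared parameters, so it can be useful to check them before a handler runs. The sketch below validates arguments against a parameter schema shaped like the one above. Whether the SDK does this itself is not stated here; `ToolDefinition`, `ParameterSpec`, and `validateToolArguments` are hypothetical names.

```typescript
interface ParameterSpec { type: 'string' | 'number' | 'boolean'; required?: boolean; }

interface ToolDefinition {
  name: string;
  description: string;
  parameters: { [key: string]: ParameterSpec };
}

// Returns a list of validation errors; an empty list means the call is safe to dispatch.
function validateToolArguments(tool: ToolDefinition, args: { [key: string]: unknown }): string[] {
  const errors: string[] = [];
  for (const key of Object.keys(tool.parameters)) {
    const spec = tool.parameters[key];
    const value = args[key];
    if (value === undefined) {
      if (spec.required) errors.push(`Missing required parameter "${key}"`);
      continue;
    }
    if (typeof value !== spec.type) {
      errors.push(`Parameter "${key}" should be a ${spec.type}, got ${typeof value}`);
    }
  }
  for (const key of Object.keys(args)) {
    if (!Object.prototype.hasOwnProperty.call(tool.parameters, key)) {
      errors.push(`Unexpected parameter "${key}"`);
    }
  }
  return errors;
}

const checkOrderStatusTool: ToolDefinition = {
  name: 'check_order_status',
  description: 'Check the status of a customer order',
  parameters: { orderId: { type: 'string', required: true } },
};

const wellFormedCall = validateToolArguments(checkOrderStatusTool, { orderId: 'A-1001' });
const wrongTypeCall = validateToolArguments(checkOrderStatusTool, { orderId: 42 });
const missingArgCall = validateToolArguments(checkOrderStatusTool, {});
```

Only `wellFormedCall` comes back empty. The other two each contain one error message, which could be returned to the model so it can retry with corrected arguments.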

Use Cases

  • Customer Service Bots: Automated customer support with natural conversations
  • Sales Assistants: AI-powered sales agents for lead qualification
  • Appointment Scheduling: Automated booking and rescheduling
  • Information Kiosks: Voice-based information systems
  • Educational Tutors: Interactive learning assistants

Performance Metrics

The SDK provides built-in monitoring:
  • Latency tracking (STT, TTS, LLM)
  • Accuracy metrics
  • Conversation quality scores
  • Resource utilization
  • Error rates and recovery
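
As an illustration of per-stage latency tracking, the sketch below records timings for each stage and reports nearest-rank percentiles. It is a stand-alone example with hypothetical names (`LatencyTracker`, `Stage`) and does not show how the SDK's own monitoring is exposed.

```typescript
type Stage = 'stt' | 'llm' | 'tts';

class LatencyTracker {
  private samples: Record<Stage, number[]> = { stt: [], llm: [], tts: [] };

  // Store one latency measurement, in milliseconds, for a stage.
  record(stage: Stage, milliseconds: number): void {
    this.samples[stage].push(milliseconds);
  }

  // Nearest-rank percentile (p in 0..100); undefined when the stage has no samples.
  percentile(stage: Stage, p: number): number | undefined {
    const sorted = this.samples[stage].slice().sort((a, b) => a - b);
    if (sorted.length === 0) return undefined;
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }
}

const latency = new LatencyTracker();
const llmTimingsMs = [120, 300, 180, 240]; // made-up measurements
for (const ms of llmTimingsMs) {
  latency.record('llm', ms);
}

const llmP50 = latency.percentile('llm', 50); // 180
const llmP95 = latency.percentile('llm', 95); // 300
const sttP50 = latency.percentile('stt', 50); // undefined: nothing recorded for STT
```

Tracking tail percentiles per stage, rather than a single average, makes it easier to see which of STT, LLM, or TTS is responsible for a slow turn.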

Security & Compliance

  • End-to-end encryption for voice streams
  • GDPR and CCPA compliance
  • HIPAA-ready for healthcare applications
  • SOC 2 Type II certified infrastructure
  • Custom data retention policies

Pricing

The SDK is available in multiple tiers:
  • Starter: Basic features with limited usage
  • Professional: Full features with higher usage limits
  • Enterprise: Custom configurations and dedicated support

Contact our sales team for detailed pricing information.