Service      Purpose                          Example format
ASR server   Speech recognition (WebSocket)   wss://api.example.com/asr
LLM server   AI chat (HTTP SSE)               https://api.example.com/chat
TTS server   Speech synthesis                 https://api.example.com/tts

iOS (Objective-C, iOS 15+) Client Technical Implementation

Low-latency streaming voice companion chat: push-to-talk, similar to the home screen of 猫箱 (Maoxiang)
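
0. Scope and Goals

Implement the voice-companion conversation on the home screen:

Press and hold to talk: start recording and stream the audio to ASR in real time
Release to finish: ASR finalizes immediately, returns the final text, and the text is displayed
AI reply: show the text (typewriter effect) while playing the server-generated TTS audio
Latency first: do not wait for the complete answer or the complete audio; use "per-sentence triggering + streaming/near-streaming playback"
Barge-in: if the user presses and holds while the AI is speaking, stop playback and cancel requests immediately, then start a new recording round
Minimum iOS version: iOS 15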

1. Overall Architecture (client modules)

KBAiMainVC
└─ ConversationOrchestrator   (core state machine / wires the modules together / cancellation and barge-in)
   ├─ AudioSessionManager     (AVAudioSession configuration and interruption handling)
   ├─ AudioCaptureManager     (AVAudioEngine input tap -> 20 ms PCM frames)
   ├─ ASRStreamClient         (streaming recognition over NSURLSessionWebSocketTask)
   ├─ LLMStreamClient         (SSE/WS token stream)
   ├─ Segmenter               (sentence splitting: trigger TTS as soon as a sentence is complete)
   ├─ TTSServiceClient        (TTS requests; adapts to several response formats)
   ├─ TTSPlaybackPipeline     (pluggable: URL player / AAC decoding / direct PCM feed)
   ├─ AudioStreamPlayer       (AVAudioEngine + AVAudioPlayerNode for PCM playback)
   └─ SubtitleSync            (maps playback progress to text progress)

2. Audio Session (AVAudioSession) and Permissions

2.1 Microphone Permission

Request permission only right before the user's first press-to-talk
If the user denies it: prompt them to enable it in Settings

2.2 AudioSession Configuration (conversation mode)

Objective-C (recommended parameters):

category: AVAudioSessionCategoryPlayAndRecord
mode: AVAudioSessionModeVoiceChat
options:
  AVAudioSessionCategoryOptionDefaultToSpeaker
  AVAudioSessionCategoryOptionAllowBluetooth
  (optional) AVAudioSessionCategoryOptionMixWithOthers: if audio already playing from other apps should not be interrupted (a product decision)

2.3 Interruption and Route-Change Handling (required)

Observe:
  AVAudioSessionInterruptionNotification
  AVAudioSessionRouteChangeNotification
Handling principles:
  Interruption begins (incoming call, etc.): stop capture + stop playback + cancel network sessions
  Interruption ends: return to Idle and wait for the user to press and hold again

3. Audio Capture (streamed upload while the button is held)

3.1 Fixed Audio Parameters (locked down for end-to-end stability)

Sample rate: 16000 Hz
Channels: 1
Format: PCM Int16 (pcm_s16le)
Frame duration: 20 ms
  16 kHz × 0.02 s = 320 samples
  Bytes per frame = 320 × 2 = 640 bytes
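
3.2 AudioCaptureManager (AVAudioEngine input tap)

Uses:
  AVAudioEngine
  inputNode installTapOnBus:bufferSize:format:block:
Key points:
  Do no heavy work on the tap callback thread: only copy the data and dispatch it to audioQueue
  Convert each AVAudioPCMBuffer into Int16 PCM NSData. The input node delivers the hardware format (commonly 44.1 or 48 kHz Float32) and the tap format must match its sample rate, so tap with the input node's own format and resample to 16 kHz Int16 afterwards (for example with AVAudioConverter)
  Emit a steady stream of 20 ms frames: bufferSize is only a request and the callback buffer is usually not exactly 20 ms, so frames must be reassembled and sliced (ring buffer)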
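
The 20 ms framing step can be sketched in plain C, which compiles unchanged inside an Objective-C .m file and can be unit-tested off-device. This is a minimal sketch, not the project's implementation: it assumes the tap output has already been converted to 16 kHz mono Int16 and that the ring is only touched from audioQueue (so no locking). PCMRing, pcm_ring_write, pcm_ring_pop_frame and the 32-frame capacity are hypothetical names and values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16 kHz * 0.02 s = 320 samples; 320 samples * 2 bytes = 640 bytes per 20 ms frame. */
enum {
    kPCMSampleRate   = 16000,
    kPCMFrameSamples = kPCMSampleRate / 50,    /* 20 ms */
    kPCMFrameBytes   = kPCMFrameSamples * 2,   /* Int16 mono */
    kPCMRingCapacity = kPCMFrameBytes * 32     /* hypothetical: 640 ms of headroom */
};

typedef struct {
    uint8_t data[kPCMRingCapacity];
    size_t  head;   /* index of the oldest unread byte */
    size_t  count;  /* number of unread bytes */
} PCMRing;

/* Appends converted PCM bytes; returns how many bytes were dropped because the ring was full. */
static size_t pcm_ring_write(PCMRing *r, const uint8_t *src, size_t len) {
    assert(r != NULL && (src != NULL || len == 0));
    size_t space = kPCMRingCapacity - r->count;
    size_t dropped = 0;
    if (len > space) {            /* keep the audio already buffered, drop the overflow */
        dropped = len - space;
        len = space;
    }
    size_t tail = (r->head + r->count) % kPCMRingCapacity;
    for (size_t i = 0; i < len; i++) {
        r->data[(tail + i) % kPCMRingCapacity] = src[i];
    }
    r->count += len;
    return dropped;
}

/* Copies the next complete 20 ms frame into out; returns 0 while fewer than 640 bytes are buffered. */
static int pcm_ring_pop_frame(PCMRing *r, uint8_t out[kPCMFrameBytes]) {
    assert(r != NULL && out != NULL);
    if (r->count < kPCMFrameBytes) return 0;
    for (size_t i = 0; i < kPCMFrameBytes; i++) {
        out[i] = r->data[(r->head + i) % kPCMRingCapacity];
    }
    r->head = (r->head + kPCMFrameBytes) % kPCMRingCapacity;
    r->count -= kPCMFrameBytes;
    return 1;
}
```

On audioQueue, each converted tap buffer would be written with pcm_ring_write, then pcm_ring_pop_frame called in a loop, wrapping each 640-byte frame in NSData for audioCaptureManagerDidOutputPCMFrame:. When the ring is full the incoming overflow is dropped and counted, which keeps memory bounded if the network side stalls.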

3.3 Interface Definition (OC)

@protocol AudioCaptureManagerDelegate <NSObject>
- (void)audioCaptureManagerDidOutputPCMFrame:(NSData *)pcmFrame; // 20 ms / 640 B
@optional
- (void)audioCaptureManagerDidUpdateRMS:(float)rms; // optional: UI waveform
@end

@interface AudioCaptureManager : NSObject
@property (nonatomic, weak) id<AudioCaptureManagerDelegate> delegate;
- (BOOL)startCapture:(NSError **)error;
- (void)stopCapture;
@end
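
If the AVAudioConverter step is left producing Float32 (AVAudioEngine's usual processing format) rather than Int16, the last step into the 640 B pcm_s16le frames is a clamp and scale. A minimal plain-C sketch; float_to_pcm16 is a hypothetical helper, and it relies on iOS devices being little-endian, so int16_t samples already have the pcm_s16le byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Float32 samples in [-1, 1] -> Int16, clamping out-of-range values and mapping NaN to silence. */
static void float_to_pcm16(const float *in, int16_t *out, size_t count) {
    assert((in != NULL && out != NULL) || count == 0);
    for (size_t i = 0; i < count; i++) {
        float x = in[i];
        if (x != x) x = 0.0f;                 /* NaN guard */
        if (x > 1.0f) x = 1.0f;
        else if (x < -1.0f) x = -1.0f;
        out[i] = (int16_t)(x * 32767.0f);     /* truncates toward zero */
    }
}
```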

4. ASR Streaming Recognition (iOS 15: NSURLSessionWebSocketTask)

4.1 Suggested Protocol (JSON control frames + binary audio frames)

Start (text frame)

{
  "type":"start",
  "sessionId":"uuid",
  "format":"pcm_s16le",
  "sampleRate":16000,
  "channels":1
}

Audio (binary frames)
  Send each 640 B PCM frame as-is
  Rate: 50 fps (50 frames per second)

Finalize (text frame)

{ "type":"finalize", "sessionId":"uuid" }
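
The two control frames above can be produced with a sketch like the following, again in plain C. It assumes sessionId is a UUID string (hex digits and hyphens only), so no JSON escaping is needed; for arbitrary values in Objective-C, NSJSONSerialization is the safer choice. asr_build_start and asr_build_finalize are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the "start" control frame (sent as a WebSocket text frame) into out.
 * Returns the JSON length, or -1 if out is too small. */
static int asr_build_start(char *out, size_t cap, const char *sessionId) {
    assert(out != NULL && sessionId != NULL && strlen(sessionId) > 0);
    int n = snprintf(out, cap,
                     "{\"type\":\"start\",\"sessionId\":\"%s\","
                     "\"format\":\"pcm_s16le\",\"sampleRate\":16000,\"channels\":1}",
                     sessionId);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Writes the "finalize" control frame sent when the user releases the button. */
static int asr_build_finalize(char *out, size_t cap, const char *sessionId) {
    assert(out != NULL && sessionId != NULL && strlen(sessionId) > 0);
    int n = snprintf(out, cap, "{\"type\":\"finalize\",\"sessionId\":\"%s\"}", sessionId);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```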

4.2 Downstream Events

{ "type":"partial", "text":"How's the" }
{ "type":"final", "text":"How's the weather today" }
{ "type":"error", "code":123, "message":"..." }

4.3 ASRStreamClient Interface (OC)

@protocol ASRStreamClientDelegate <NSObject>
- (void)asrClientDidReceivePartialText:(NSString *)text;
- (void)asrClientDidReceiveFinalText:(NSString *)text;
- (void)asrClientDidFail:(NSError *)error;
@end

@interface ASRStreamClient : NSObject
@property (nonatomic, weak) id<ASRStreamClientDelegate> delegate;
- (void)startWithSessionId:(NSString *)sessionId;
- (void)sendAudioPCMFrame:(NSData *)pcmFrame; // 20 ms frame
- (void)finalize; // sends the finalize frame; NSObject already declares a deprecated (GC-era) -finalize, so a distinct name such as -finalizeStream avoids overriding it
- (void)cancel;
@end

5. LLM Streaming Generation (token stream)

5.1 Goals

Low latency: do not wait for the full answer
Receive tokens over SSE or WS
Feed tokens into the Segmenter and trigger TTS as soon as a sentence is complete
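
For the SSE variant, the client has to pick each token payload out of data: lines. A minimal line classifier in plain C, following the SSE rules that a single space after data: is stripped and that a blank line ends an event. It assumes the caller has already split the byte stream on \n; what the payload contains (plain token text or JSON) depends on the server and is not specified here. sse_parse_line and SSELineKind are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { SSELineData, SSELineDispatch, SSELineIgnored } SSELineKind;

/* Classifies one SSE line (without its trailing "\n").
 * For "data:" lines, *value/*valueLen receive the payload minus one optional leading space. */
static SSELineKind sse_parse_line(const char *line, size_t len,
                                  const char **value, size_t *valueLen) {
    assert(line != NULL || len == 0);
    if (len > 0 && line[len - 1] == '\r') len--;   /* tolerate CRLF line endings */
    if (len == 0) return SSELineDispatch;          /* blank line: the event is complete */
    if (len >= 5 && memcmp(line, "data:", 5) == 0) {
        assert(value != NULL && valueLen != NULL);
        const char *v = line + 5;
        size_t n = len - 5;
        if (n > 0 && v[0] == ' ') { v++; n--; }
        *value = v;
        *valueLen = n;
        return SSELineData;
    }
    return SSELineIgnored;  /* comments (":..."), event:, id:, retry: are not needed here */
}
```

If one event carries several data: lines, the SSE specification joins them with \n before the event is dispatched, so the client would buffer SSELineData payloads until SSELineDispatch.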

5.2 LLMStreamClient Interface (OC)

@protocol LLMStreamClientDelegate <NSObject>
- (void)llmClientDidReceiveToken:(NSString *)token;
- (void)llmClientDidComplete;
- (void)llmClientDidFail:(NSError *)error;
@end

@interface LLMStreamClient : NSObject
@property (nonatomic, weak) id<LLMStreamClientDelegate> delegate;
- (void)sendUserText:(NSString *)text conversationId:(NSString *)cid;
- (void)cancel;
@end

6. Segmenter (sentence splitting: play the first sentence first)

6.1 Splitting Rules (recommended)

Cut a segment as soon as either condition holds:
  one of 。！？!? or a newline (\n) is encountered
  or the accumulated character count reaches 30 (configurable)
When the LLM stream completes, emit whatever text remains as the final segment.

6.2 Segmenter Interface (OC)

@interface Segmenter : NSObject
- (void)appendToken:(NSString *)token;
- (NSArray<NSString *> *)popReadySegments; // segments that can be sent to TTS immediately
- (void)reset;
@end
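
The splitting rules in 6.1 can be sketched in plain C over UTF-8 token text. This is an illustrative sketch, not the Segmenter class itself: SegState, seg_append, seg_flush, seg_clear_ready, seg_reset and the fixed buffer sizes are hypothetical. It counts Unicode code points (so a Chinese character counts as one), cuts after 。！？!? or at 30 characters, drops the newline itself, and never emits empty segments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEG_MAX_CHARS 30   /* 6.1: cut once 30 characters have accumulated (configurable) */
#define SEG_BUF_BYTES 128  /* 30 code points * up to 4 UTF-8 bytes, plus NUL */
#define SEG_MAX_READY 16   /* hypothetical cap on segments waiting to be popped */

typedef struct {
    char   cur[SEG_BUF_BYTES];                 /* text of the segment being built */
    size_t curLen, curChars;                   /* its size in bytes and in code points */
    char   ready[SEG_MAX_READY][SEG_BUF_BYTES];
    size_t readyCount;
} SegState;

/* Byte length of a UTF-8 sequence, from its lead byte. */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;  /* stray continuation byte: treat it as one unit */
}

static int is_sentence_end(const char *p, size_t n) {
    if (n == 1) return *p == '!' || *p == '?';
    return n == 3 && (memcmp(p, "\xE3\x80\x82", 3) == 0 ||   /* U+3002 */
                      memcmp(p, "\xEF\xBC\x81", 3) == 0 ||   /* U+FF01 */
                      memcmp(p, "\xEF\xBC\x9F", 3) == 0);    /* U+FF1F */
}

/* Moves the current text to the ready list; returns -1 if that list is full. */
static int seg_cut(SegState *s) {
    if (s->curLen == 0) return 0;              /* never emit empty segments */
    if (s->readyCount == SEG_MAX_READY) return -1;
    memcpy(s->ready[s->readyCount], s->cur, s->curLen);
    s->ready[s->readyCount][s->curLen] = '\0';
    s->readyCount++;
    s->curLen = s->curChars = 0;
    return 0;
}

/* appendToken: feeds one UTF-8 token of len bytes; returns -1 on overflow. */
static int seg_append(SegState *s, const char *tok, size_t len) {
    assert(s != NULL && (tok != NULL || len == 0));
    for (size_t i = 0; i < len; ) {
        size_t n = utf8_seq_len((unsigned char)tok[i]);
        if (n > len - i) n = len - i;          /* tolerate a truncated final sequence */
        if (tok[i] == '\n') {                  /* newline ends a segment but is not spoken */
            if (seg_cut(s) != 0) return -1;
        } else {
            if (s->curLen + n >= SEG_BUF_BYTES) return -1;
            memcpy(s->cur + s->curLen, tok + i, n);
            s->curLen += n;
            s->curChars++;
            if ((is_sentence_end(tok + i, n) || s->curChars >= SEG_MAX_CHARS) && seg_cut(s) != 0)
                return -1;
        }
        i += n;
    }
    return 0;
}

/* On llmClientDidComplete: the leftover text becomes the final segment. */
static int seg_flush(SegState *s) { return seg_cut(s); }

/* popReadySegments: after copying ready[0..readyCount), the caller clears the list. */
static void seg_clear_ready(SegState *s) { s->readyCount = 0; }

/* reset: drops everything, e.g. on barge-in. */
static void seg_reset(SegState *s) { memset(s, 0, sizeof *s); }
```

In Objective-C terms, seg_append plays the role of appendToken:, copying ready[0..readyCount) followed by seg_clear_ready plays the role of popReadySegments, and seg_flush would run on llmClientDidComplete so that a trailing sentence without punctuation is still spoken.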

7. TTS: Output Format Undecided, So the Client Builds a "Pluggable Playback Pipeline"

Because the server team has not yet decided on the output format, the client must be able to handle whichever of the following four TTS output modes is chosen:

Mode A: returns an m4a/MP3 URL (easiest to ship)
  The server returns a URL (or a base64-encoded file)
  The client plays it with AVPlayer / AVAudioPlayer (AVAudioPlayer needs a local file or downloaded NSData; it cannot stream a remote URL)
  Subtitle sync maps text to the audio duration (the duration is available)
  Pros: simple on the server side
  Cons: usually higher time-to-first-audio (the whole segment must be synthesized, or at least the first packet must arrive)

Mode B: returns AAC chunks (streaming)
  The server pushes AAC frames over WS
  The client decodes AAC to PCM, then feeds AudioStreamPlayer

Mode C: returns Opus chunks (streaming)
  Requires an Opus decoding library (higher cost on both server and client)
  The decoded audio is played as PCM

Mode D: returns PCM chunks (best for low latency)
  The server pushes PCM16 chunks directly (e.g. 100 ms each)
  The client converts them straight into AVAudioPCMBuffer and schedules them
  Lowest latency and the most robust implementation

8. TTSServiceClient (unified network-layer interface)

8.1 Unified Callback Events (abstract)

typedef NS_ENUM(NSInteger, TTSPayloadType) {
    TTSPayloadTypeURL,       // A
    TTSPayloadTypePCMChunk,  // D
    TTSPayloadTypeAACChunk,  // B
    TTSPayloadTypeOpusChunk  // C
};

@protocol TTSServiceClientDelegate <NSObject>
- (void)ttsClientDidReceiveURL:(NSURL *)url segmentId:(NSString *)segmentId;
- (void)ttsClientDidReceiveAudioChunk:(NSData *)chunk
                          payloadType:(TTSPayloadType)type
                            segmentId:(NSString *)segmentId;
- (void)ttsClientDidFinishSegment:(NSString *)segmentId;
- (void)ttsClientDidFail:(NSError *)error;
@end

@interface TTSServiceClient : NSObject
@property (nonatomic, weak) id<TTSServiceClientDelegate> delegate;
- (void)requestTTSForText:(NSString *)text segmentId:(NSString *)segmentId;
- (void)cancel;
@end

Whichever output format the server finally chooses, only the corresponding branch needs to be implemented; the client architecture does not have to be rebuilt.

9. TTSPlaybackPipeline (playback pipeline: routes by payloadType)

9.1 Design Goals

Support both URL playback and streaming chunk playback
Provide a unified "start / stop / progress" interface for subtitle sync and barge-in

9.2 Pipeline Structure (suggested)

TTSPlaybackPipeline only handles routing and queue management
  URL → TTSURLPlayer (AVPlayer)
  PCM → AudioStreamPlayer (AVAudioEngine)
  AAC/Opus → Decoder → PCM → AudioStreamPlayer

9.3 Pipeline Interface (OC)

@protocol TTSPlaybackPipelineDelegate <NSObject>
- (void)pipelineDidStartSegment:(NSString *)segmentId duration:(NSTimeInterval)duration;
- (void)pipelineDidUpdatePlaybackTime:(NSTimeInterval)time segmentId:(NSString *)segmentId;
- (void)pipelineDidFinishSegment:(NSString *)segmentId;
@end

@interface TTSPlaybackPipeline : NSObject
@property (nonatomic, weak) id<TTSPlaybackPipelineDelegate> delegate;

- (BOOL)start:(NSError **)error; // start the audio engine, etc.
- (void)stop;                    // stop immediately (barge-in)

- (void)enqueueURL:(NSURL *)url segmentId:(NSString *)segmentId;
- (void)enqueueChunk:(NSData *)chunk payloadType:(TTSPayloadType)type segmentId:(NSString *)segmentId;

// Optional: used for subtitle sync
- (NSTimeInterval)currentTimeForSegment:(NSString *)segmentId;
- (NSTimeInterval)durationForSegment:(NSString *)segmentId;
@end

10. AudioStreamPlayer (streaming PCM playback, the low-latency core)

10.1 Using AVAudioEngine + AVAudioPlayerNode

Convert each PCM chunk into an AVAudioPCMBuffer
Play it with scheduleBuffer
Track the current segment's playback time and total duration (estimated, or accumulated from chunk durations)
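
Two pieces of arithmetic sit behind these steps. AVAudioPlayerNode is commonly connected with a Float32 format (the standard format is deinterleaved Float32), so Int16 chunks are scaled into AVAudioPCMBuffer.floatChannelData; and a chunk's duration is bytes / (sampleRate × channels × 2). A plain-C sketch with hypothetical names (pcm16_to_float, pcm16_chunk_seconds, SegmentClock); it assumes mono PCM16 from the server.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Int16 mono PCM -> Float32 in [-1, 1), e.g. into AVAudioPCMBuffer.floatChannelData[0]. */
static void pcm16_to_float(const int16_t *in, float *out, size_t frames) {
    assert((in != NULL && out != NULL) || frames == 0);
    for (size_t i = 0; i < frames; i++) {
        out[i] = (float)in[i] / 32768.0f;
    }
}

/* Chunk duration in seconds: bytes / (sampleRate * channels * 2 bytes per Int16 sample). */
static double pcm16_chunk_seconds(size_t bytes, double sampleRate, int channels) {
    assert(sampleRate > 0.0 && channels > 0);
    return (double)bytes / (sampleRate * (double)channels * 2.0);
}

/* Per-segment total duration T, accumulated as chunks are scheduled (an estimate until the segment ends). */
typedef struct {
    double scheduledSeconds;
} SegmentClock;

static void segment_clock_add_chunk(SegmentClock *c, size_t bytes, double sampleRate, int channels) {
    assert(c != NULL);
    c->scheduledSeconds += pcm16_chunk_seconds(bytes, sampleRate, channels);
}
```

For example, a 100 ms chunk at 16 kHz mono is 3200 bytes (16000 × 0.1 × 2). The playback position t could be derived from the player node with playerTimeForNodeTime: applied to lastRenderTime (sampleTime / sampleRate), minus the time at which the segment's first buffer started.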

10.2 Interface (OC)

@interface AudioStreamPlayer : NSObject
- (BOOL)start:(NSError **)error;
- (void)stop;
- (void)enqueuePCMChunk:(NSData *)pcmData
             sampleRate:(double)sampleRate
               channels:(int)channels
              segmentId:(NSString *)segmentId;

- (NSTimeInterval)playbackTimeForSegment:(NSString *)segmentId;
- (NSTimeInterval)durationForSegment:(NSString *)segmentId;
@end

Recommended PCM chunk granularity: 50 to 200 ms (too small means scheduling too often; too large adds latency).

11. Subtitle Sync (latency first)

11.1 Strategy

For each segment's text, map playback progress to a number of visible characters:

visibleCount = round(text.length * (t / T)), clamped to [0, text.length]; show nothing while T <= 0
  t: current playback position within the segment (provided by the pipeline)
  T: total duration of the segment (read directly in URL mode; accumulated from chunk durations in chunk mode)
text.length counts UTF-16 code units, so widen the cut to whole composed character sequences (rangeOfComposedCharacterSequencesForRange:) to avoid splitting emoji.

11.2 SubtitleSync Interface (OC)

@interface SubtitleSync : NSObject
- (NSString *)visibleTextForFullText:(NSString *)fullText
                         currentTime:(NSTimeInterval)t
                            duration:(NSTimeInterval)T;
@end
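
The visibleCount mapping reduces to a few lines. A plain-C sketch with a hypothetical name; it rounds half up by adding 0.5 before truncation (avoiding libm), and the negated comparisons also send NaN inputs to 0.

```c
#include <assert.h>
#include <stddef.h>

/* visibleCount = round(length * t / T), clamped to [0, length]. */
static size_t subtitle_visible_count(size_t length, double t, double T) {
    if (!(T > 0.0) || !(t > 0.0)) return 0;   /* duration unknown, nothing played yet, or NaN */
    if (t >= T) return length;
    size_t n = (size_t)((double)length * (t / T) + 0.5);  /* round half up */
    return n > length ? length : n;
}
```

In visibleTextForFullText:currentTime:duration: the result would be computed from fullText.length and used as the length of the prefix passed to substringToIndex: after widening.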

12. ConversationOrchestrator (state machine + barge-in + queues)

12.1 States

typedef NS_ENUM(NSInteger, ConversationState) {
    ConversationStateIdle,
    ConversationStateListening,
    ConversationStateRecognizing,
    ConversationStateThinking,
    ConversationStateSpeaking
};

12.2 Key Flows

Event: user presses and holds (userDidPressRecord)
  If currently Speaking/Thinking (or still Recognizing):
    [ttsService cancel]
    [llmClient cancel]
    [asrClient cancel] (if recognition is still in progress)
    [pipeline stop] (stop playback immediately)
    Clear the segment queue and the subtitle queue
  Configure/activate the AudioSession
  Create a new sessionId
  [asrClient startWithSessionId:]
  [audioCapture startCapture:]
  state = Listening

Event: user releases (userDidReleaseRecord)
  [audioCapture stopCapture]
  [asrClient finalize]
  state = Recognizing

Callback: ASR final text
  Show the user's final text in the UI
  state = Thinking
  Start the LLM stream: [llmClient sendUserText:conversationId:]

Callback: LLM token
  [segmenter appendToken:]
  segments = [segmenter popReadySegments]
  For each segment:
    Generate a segmentId
    Record segmentTextMap[segmentId] = segmentText
    [ttsService requestTTSForText:segmentId:]
  When the first playable audio arrives and playback starts:
    state = Speaking

Callback: TTS audio arrives
  URL: [pipeline enqueueURL:segmentId:]
  chunk: [pipeline enqueueChunk:payloadType:segmentId:]

Callback: pipeline playback-time update (30 to 60 times per second, or on a timer)
  Look up fullText for the current segmentId
  visible = [subtitleSync visibleTextForFullText:currentTime:duration:]
  Update the AI's visible text in the UI

12.3 Barge-in

When the user presses and holds again:
  Stop playback immediately
  Cancel all in-flight network requests
  Discard all segments that have not been played
  Start a new recording round
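
The flows in 12.2 and 12.3 can be condensed into a pure transition function, which keeps the state rules testable separately from the cancel/stop side effects. A plain-C sketch; ConvState mirrors ConversationState, while EvPlaybackDone (return to Idle after the last segment) and mapping every error to Idle are assumptions that the flows above do not spell out.

```c
#include <assert.h>

typedef enum { StIdle, StListening, StRecognizing, StThinking, StSpeaking } ConvState;

typedef enum {
    EvPress,         /* userDidPressRecord (also the barge-in trigger) */
    EvRelease,       /* userDidReleaseRecord */
    EvASRFinal,      /* asrClientDidReceiveFinalText: */
    EvFirstAudio,    /* first playable TTS audio has started */
    EvPlaybackDone,  /* hypothetical: LLM complete and last segment finished */
    EvError          /* any client failure or audio-session interruption */
} ConvEvent;

/* Pure transition function; the orchestrator performs cancel/stop/start around it. */
static ConvState conv_next(ConvState s, ConvEvent e) {
    switch (e) {
    case EvPress:        return StListening;   /* from any state; an old turn is cancelled first */
    case EvRelease:      return s == StListening ? StRecognizing : s;
    case EvASRFinal:     return s == StRecognizing ? StThinking : s;
    case EvFirstAudio:   return s == StThinking ? StSpeaking : s;
    case EvPlaybackDone: return s == StSpeaking ? StIdle : s;
    case EvError:        return StIdle;
    }
    return s;
}

/* A press must trigger the cancel/stop sequence whenever an earlier turn is still in flight. */
static int conv_needs_cancel(ConvState s) {
    return s == StRecognizing || s == StThinking || s == StSpeaking;
}
```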

12.4 Orchestrator Interface (OC)

@interface ConversationOrchestrator : NSObject
@property (nonatomic, assign, readonly) ConversationState state;

- (void)userDidPressRecord;
- (void)userDidReleaseRecord;

@property (nonatomic, copy) void (^onUserFinalText)(NSString *text);
@property (nonatomic, copy) void (^onAssistantVisibleText)(NSString *text);
@property (nonatomic, copy) void (^onError)(NSError *error);
@end

13. Threading / Queue Model (mandatory, to avoid race conditions)

Use three dedicated queues, with the orchestrator queue serial, plus the main thread:
  dispatch_queue_t audioQueue; (capture frame processing, ring buffer)
  dispatch_queue_t networkQueue; (WS send/receive and parsing)
  dispatch_queue_t orchestratorQueue; (serial state machine; the only place that modifies state and the queues)
  All UI updates go back to the main thread

Rules:
  Every network/audio callback → dispatch_async(orchestratorQueue, ^{ ... })
  The Orchestrator then decides whether to fire UI callbacks (on the main thread)

14. Key Parameters (latency and stability)

Audio frame: 20 ms
PCM: 16 kHz / mono / Int16
ASR upload: WS binary frames
LLM: token stream
TTS: prefer chunks; in URL mode, still start downloading and playing as early as possible
Chunk playback buffer: 100 to 200 ms (absorbs jitter)

15. Development Rollout (iteration path while the server output is undecided)

Phase 1: get the end-to-end flow working first (simulated with "URL mode")
  TTSServiceClient initially assumes the server returns an m4a URL (or a local mock URL)
  The pipeline implements URL playback (AVPlayer)
  Get barge-in and subtitle sync working first

Phase 2: swap in the real format once the server decides
  If the server sends PCM chunks: route them straight to AudioStreamPlayer (most recommended)
  If it sends AAC chunks: add an AAC decoding module (AudioConverter or a third-party library)
  If it sends Opus chunks: integrate an Opus decoding library, then feed PCM

Key point: Orchestrator/Segmenter/ASR/subtitle sync need no changes; only the format-specific branches in TTSServiceClient and TTSPlaybackPipeline are replaced.

16. Compliance / UX Notes

Recording must be triggered by a user action (press and hold)
Show a clear "recording" indicator and waveform
Never record automatically or covertly
Allow interruption at any time during playback

End of document

Additional requirements for the "code-writing AI" (recommended to attach together with this document)

Language: Objective-C (.h/.m)
iOS 15+, WebSocket via NSURLSessionWebSocketTask
Audio capture with AVAudioEngine + a ring buffer that slices 20 ms frames
The playback pipeline must support: URL playback (AVPlayer) + PCM chunk playback (AVAudioEngine)
The remaining AAC/Opus branches may be left as TODOs/stubs, but their interfaces must be in place