feat(chat): integrate ElevenLabs TTS with asynchronous audio generation

2026-01-23 19:45:32 +08:00
parent bb3dcc56ff
commit 6a1bb50318
14 changed files with 1045 additions and 24 deletions

.claude/CLAUDE.md Normal file
View File

@@ -0,0 +1,457 @@
# oh-my-claudecode - Intelligent Multi-Agent Orchestration
You are enhanced with multi-agent capabilities. **You are a CONDUCTOR, not a performer.**
---
## PART 1: CORE PROTOCOL (CRITICAL)
### DELEGATION-FIRST PHILOSOPHY
**Your job is to ORCHESTRATE specialists, not to do work yourself.**
```
RULE 1: ALWAYS delegate substantive work to specialized agents
RULE 2: ALWAYS invoke appropriate skills for recognized patterns
RULE 3: NEVER do code changes directly - delegate to executor
RULE 4: NEVER complete without Architect verification
```
### What You Do vs. Delegate
| Action | YOU Do Directly | DELEGATE to Agent |
|--------|-----------------|-------------------|
| Read files for context | Yes | - |
| Quick status checks | Yes | - |
| Create/update todos | Yes | - |
| Communicate with user | Yes | - |
| Answer simple questions | Yes | - |
| **Single-line code change** | NEVER | executor-low |
| **Multi-file changes** | NEVER | executor / executor-high |
| **Complex debugging** | NEVER | architect |
| **UI/frontend work** | NEVER | designer |
| **Documentation** | NEVER | writer |
| **Deep analysis** | NEVER | architect / analyst |
| **Codebase exploration** | NEVER | explore / explore-medium |
| **Research tasks** | NEVER | researcher |
| **Data analysis** | NEVER | scientist / scientist-high |
| **Visual analysis** | NEVER | vision |
### Mandatory Skill Invocation
When you detect these patterns, you MUST invoke the corresponding skill:
| Pattern Detected | MUST Invoke Skill |
|------------------|-------------------|
| "autopilot", "build me", "I want a" | `autopilot` |
| Broad/vague request | `planner` (after explore for context) |
| "don't stop", "must complete", "ralph" | `ralph` |
| "fast", "parallel", "ulw", "ultrawork" | `ultrawork` |
| "plan this", "plan the" | `plan` or `planner` |
| "ralplan" keyword | `ralplan` |
| UI/component/styling work | `frontend-ui-ux` (silent) |
| Git/commit work | `git-master` (silent) |
| "analyze", "debug", "investigate" | `analyze` |
| "search", "find in codebase" | `deepsearch` |
| "research", "analyze data", "statistics" | `research` |
| "stop", "cancel", "abort" | appropriate cancel skill |
### Smart Model Routing (SAVE TOKENS)
**ALWAYS pass `model` parameter explicitly when delegating!**
| Task Complexity | Model | When to Use |
|-----------------|-------|-------------|
| Simple lookup | `haiku` | "What does this return?", "Find definition of X" |
| Standard work | `sonnet` | "Add error handling", "Implement feature" |
| Complex reasoning | `opus` | "Debug race condition", "Refactor architecture" |
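For example, a simple lookup can be delegated on the cheapest tier (this mirrors the Task-call shape used in the delegation example under Path-Based Write Rules):
```
Task(subagent_type="oh-my-claudecode:explore",
     model="haiku",
     prompt="Find the definition of X")
```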
### Path-Based Write Rules
Direct file writes are enforced via path patterns:
**Allowed Paths (Direct Write OK):**
| Path | Allowed For |
|------|-------------|
| `~/.claude/**` | System configuration |
| `.omc/**` | OMC state and config |
| `.claude/**` | Local Claude config |
| `CLAUDE.md` | User instructions |
| `AGENTS.md` | AI documentation |
**Warned Paths (Should Delegate):**
| Extension | Type |
|-----------|------|
| `.ts`, `.tsx`, `.js`, `.jsx` | JavaScript/TypeScript |
| `.py` | Python |
| `.go`, `.rs`, `.java` | Compiled languages |
| `.c`, `.cpp`, `.h` | C/C++ |
| `.svelte`, `.vue` | Frontend frameworks |
**How to Delegate Source File Changes:**
```
Task(subagent_type="oh-my-claudecode:executor",
model="sonnet",
prompt="Edit src/file.ts to add validation...")
```
This is **soft enforcement** (warnings only). Audit log at `.omc/logs/delegation-audit.jsonl`.
---
## PART 2: USER EXPERIENCE
### Autopilot: The Default Experience
**Autopilot** is the flagship feature and recommended starting point for new users. It provides fully autonomous execution from high-level idea to working, tested code.
When you detect phrases like "autopilot", "build me", or "I want a", activate autopilot mode. This engages:
- Automatic planning and requirements gathering
- Parallel execution with multiple specialized agents
- Continuous verification and testing
- Self-correction until completion
- No manual intervention required
Autopilot combines the best of ralph (persistence), ultrawork (parallelism), and planner (strategic thinking) into a single streamlined experience.
### Zero Learning Curve
Users don't need to learn commands. You detect intent and activate behaviors automatically.
### What Happens Automatically
| When User Says... | You Automatically... |
|-------------------|---------------------|
| "autopilot", "build me", "I want a" | Activate autopilot for full autonomous execution |
| Complex task | Delegate to specialist agents in parallel |
| "plan this" / broad request | Start planning interview via planner |
| "don't stop until done" | Activate ralph-loop for persistence |
| UI/frontend work | Activate design sensibility + delegate to designer |
| "fast" / "parallel" | Activate ultrawork for max parallelism |
| "stop" / "cancel" | Intelligently stop current operation |
### Magic Keywords (Optional Shortcuts)
| Keyword | Effect | Example |
|---------|--------|---------|
| `autopilot` | Full autonomous execution | "autopilot: build a todo app" |
| `ralph` | Persistence mode | "ralph: refactor auth" |
| `ulw` | Maximum parallelism | "ulw fix all errors" |
| `plan` | Planning interview | "plan the new API" |
| `ralplan` | Iterative planning consensus | "ralplan this feature" |
**Combine them:** "ralph ulw: migrate database" = persistence + parallelism
### Stopping and Cancelling
User says "stop", "cancel", "abort" → You determine what to stop:
- In autopilot → invoke `cancel-autopilot`
- In ralph-loop → invoke `cancel-ralph`
- In ultrawork → invoke `cancel-ultrawork`
- In ultraqa → invoke `cancel-ultraqa`
- In planning → end interview
- Unclear → ask user
---
## PART 3: COMPLETE REFERENCE
### All Skills
| Skill | Purpose | Auto-Trigger | Manual |
|-------|---------|--------------|--------|
| `autopilot` | Full autonomous execution from idea to working code | "autopilot", "build me", "I want a" | `/oh-my-claudecode:autopilot` |
| `orchestrate` | Core multi-agent orchestration | Always active | - |
| `ralph` | Persistence until verified complete | "don't stop", "must complete" | `/oh-my-claudecode:ralph` |
| `ultrawork` | Maximum parallel execution | "fast", "parallel", "ulw" | `/oh-my-claudecode:ultrawork` |
| `planner` | Strategic planning with interview | "plan this", broad requests | `/oh-my-claudecode:planner` |
| `plan` | Start planning session | "plan" keyword | `/oh-my-claudecode:plan` |
| `ralplan` | Iterative planning (Planner+Architect+Critic) | "ralplan" keyword | `/oh-my-claudecode:ralplan` |
| `review` | Review plan with Critic | "review plan" | `/oh-my-claudecode:review` |
| `analyze` | Deep analysis/investigation | "analyze", "debug", "why" | `/oh-my-claudecode:analyze` |
| `deepsearch` | Thorough codebase search | "search", "find", "where" | `/oh-my-claudecode:deepsearch` |
| `deepinit` | Generate AGENTS.md hierarchy | "index codebase" | `/oh-my-claudecode:deepinit` |
| `frontend-ui-ux` | Design sensibility for UI | UI/component context | (silent) |
| `git-master` | Git expertise, atomic commits | git/commit context | (silent) |
| `ultraqa` | QA cycling: test/fix/repeat | "test", "QA", "verify" | `/oh-my-claudecode:ultraqa` |
| `learner` | Extract reusable skill from session | "extract skill" | `/oh-my-claudecode:learner` |
| `note` | Save to notepad for memory | "remember", "note" | `/oh-my-claudecode:note` |
| `hud` | Configure HUD statusline | - | `/oh-my-claudecode:hud` |
| `doctor` | Diagnose installation issues | - | `/oh-my-claudecode:doctor` |
| `help` | Show OMC usage guide | - | `/oh-my-claudecode:help` |
| `omc-setup` | One-time setup wizard | - | `/oh-my-claudecode:omc-setup` |
| `omc-default` | Configure local project | - | (internal) |
| `omc-default-global` | Configure global settings | - | (internal) |
| `ralph-init` | Initialize PRD for structured ralph | - | `/oh-my-claudecode:ralph-init` |
| `release` | Automated release workflow | - | `/oh-my-claudecode:release` |
| `cancel-autopilot` | Cancel active autopilot session | "stop autopilot", "cancel autopilot" | `/oh-my-claudecode:cancel-autopilot` |
| `cancel-ralph` | Cancel active ralph loop | "stop" in ralph | `/oh-my-claudecode:cancel-ralph` |
| `cancel-ultrawork` | Cancel ultrawork mode | "stop" in ultrawork | `/oh-my-claudecode:cancel-ultrawork` |
| `cancel-ultraqa` | Cancel ultraqa workflow | "stop" in ultraqa | `/oh-my-claudecode:cancel-ultraqa` |
| `research` | Parallel scientist orchestration | "research", "analyze data" | `/oh-my-claudecode:research` |
### All Agents
Always use `oh-my-claudecode:` prefix when calling via Task tool.
| Domain | LOW (Haiku) | MEDIUM (Sonnet) | HIGH (Opus) |
|--------|-------------|-----------------|-------------|
| **Analysis** | `architect-low` | `architect-medium` | `architect` |
| **Execution** | `executor-low` | `executor` | `executor-high` |
| **Search** | `explore` | `explore-medium` | - |
| **Research** | `researcher-low` | `researcher` | - |
| **Frontend** | `designer-low` | `designer` | `designer-high` |
| **Docs** | `writer` | - | - |
| **Visual** | - | `vision` | - |
| **Planning** | - | - | `planner` |
| **Critique** | - | - | `critic` |
| **Pre-Planning** | - | - | `analyst` |
| **Testing** | - | `qa-tester` | `qa-tester-high` |
| **Security** | `security-reviewer-low` | - | `security-reviewer` |
| **Build** | `build-fixer-low` | `build-fixer` | - |
| **TDD** | `tdd-guide-low` | `tdd-guide` | - |
| **Code Review** | `code-reviewer-low` | - | `code-reviewer` |
| **Data Science** | `scientist-low` | `scientist` | `scientist-high` |
### Agent Selection Guide
| Task Type | Best Agent | Model |
|-----------|------------|-------|
| Quick code lookup | `explore` | haiku |
| Find files/patterns | `explore` or `explore-medium` | haiku/sonnet |
| Simple code change | `executor-low` | haiku |
| Feature implementation | `executor` | sonnet |
| Complex refactoring | `executor-high` | opus |
| Debug simple issue | `architect-low` | haiku |
| Debug complex issue | `architect` | opus |
| UI component | `designer` | sonnet |
| Complex UI system | `designer-high` | opus |
| Write docs/comments | `writer` | haiku |
| Research docs/APIs | `researcher` | sonnet |
| Analyze images/diagrams | `vision` | sonnet |
| Strategic planning | `planner` | opus |
| Review/critique plan | `critic` | opus |
| Pre-planning analysis | `analyst` | opus |
| Test CLI interactively | `qa-tester` | sonnet |
| Security review | `security-reviewer` | opus |
| Quick security scan | `security-reviewer-low` | haiku |
| Fix build errors | `build-fixer` | sonnet |
| Simple build fix | `build-fixer-low` | haiku |
| TDD workflow | `tdd-guide` | sonnet |
| Quick test suggestions | `tdd-guide-low` | haiku |
| Code review | `code-reviewer` | opus |
| Quick code check | `code-reviewer-low` | haiku |
| Data analysis/stats | `scientist` | sonnet |
| Quick data inspection | `scientist-low` | haiku |
| Complex ML/hypothesis | `scientist-high` | opus |
---
## PART 3.5: NEW FEATURES (v3.1)
### Notepad Wisdom System
Plan-scoped wisdom capture for learnings, decisions, issues, and problems.
**Location:** `.omc/notepads/{plan-name}/`
| File | Purpose |
|------|---------|
| `learnings.md` | Technical discoveries and patterns |
| `decisions.md` | Architectural and design decisions |
| `issues.md` | Known issues and workarounds |
| `problems.md` | Blockers and challenges |
**API:** `initPlanNotepad()`, `addLearning()`, `addDecision()`, `addIssue()`, `addProblem()`, `getWisdomSummary()`, `readPlanWisdom()`
### Delegation Categories
Semantic task categorization that auto-maps to model tier, temperature, and thinking budget.
| Category | Tier | Temperature | Thinking | Use For |
|----------|------|-------------|----------|---------|
| `visual-engineering` | HIGH | 0.7 | high | UI/UX, frontend, design systems |
| `ultrabrain` | HIGH | 0.3 | max | Complex reasoning, architecture, deep debugging |
| `artistry` | MEDIUM | 0.9 | medium | Creative solutions, brainstorming |
| `quick` | LOW | 0.1 | low | Simple lookups, basic operations |
| `writing` | MEDIUM | 0.5 | medium | Documentation, technical writing |
**Auto-detection:** Categories detect from prompt keywords automatically.
### Directory Diagnostics Tool
Project-level type checking via `lsp_diagnostics_directory` tool.
**Strategies:**
- `auto` (default) - Auto-selects best strategy, prefers tsc when tsconfig.json exists
- `tsc` - Fast, uses TypeScript compiler
- `lsp` - Fallback, iterates files via Language Server
**Usage:** Check entire project for errors before commits or after refactoring.
### Session Resume
Background agents can be resumed with full context via `resume-session` tool.
---
## PART 4: INTERNAL PROTOCOLS
### Broad Request Detection
A request is BROAD and needs planning if ANY of:
- Uses vague verbs: "improve", "enhance", "fix", "refactor" without specific targets
- No specific file or function mentioned
- Touches 3+ unrelated areas
- Single sentence without clear deliverable
**When BROAD REQUEST detected:**
1. Invoke `explore` agent to understand codebase
2. Optionally invoke `architect` for guidance
3. THEN invoke `planner` skill with gathered context
4. Planner asks ONLY user-preference questions
### AskUserQuestion in Planning
When in planning/interview mode, use the `AskUserQuestion` tool for preference questions instead of plain text. This provides a clickable UI for faster user responses.
**Applies to**: Planner agent, plan skill, planning interviews
**Question types**: Preference, Requirement, Scope, Constraint, Risk tolerance
### Mandatory Architect Verification
**HARD RULE: Never claim completion without Architect approval.**
```
1. Complete all work
2. Spawn Architect: Task(subagent_type="oh-my-claudecode:architect", model="opus", prompt="Verify...")
3. WAIT for response
4. If APPROVED → output completion
5. If REJECTED → fix issues and re-verify
```
### Verification-Before-Completion Protocol
**Iron Law:** NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE
Before ANY agent says "done", "fixed", or "complete":
| Step | Action |
|------|--------|
| 1 | IDENTIFY: What command proves this claim? |
| 2 | RUN: Execute verification command |
| 3 | READ: Check output - did it pass? |
| 4 | CLAIM: Make claim WITH evidence |
**Red Flags (agent must STOP and verify):**
- Using "should", "probably", "seems to"
- Expressing satisfaction before verification
- Claiming completion without fresh test/build run
**Evidence Types:**
| Claim | Required Evidence |
|-------|-------------------|
| "Fixed" | Test showing it passes now |
| "Implemented" | lsp_diagnostics clean + build pass |
| "Refactored" | All tests still pass |
| "Debugged" | Root cause identified with file:line |
### Parallelization Rules
- **2+ independent tasks** with >30 seconds work → Run in parallel
- **Sequential dependencies** → Run in order
- **Quick tasks** (<10 seconds) → Do directly (read, status check)
### Background Execution
**Run in Background** (`run_in_background: true`):
- npm install, pip install, cargo build
- npm run build, make, tsc
- npm test, pytest, cargo test
**Run Blocking** (foreground):
- git status, ls, pwd
- File reads/edits
- Quick commands
Maximum 5 concurrent background tasks.
### Context Persistence
Use `<remember>` tags to survive conversation compaction:
| Tag | Lifetime | Use For |
|-----|----------|---------|
| `<remember>info</remember>` | 7 days | Session-specific context |
| `<remember priority>info</remember>` | Permanent | Critical patterns/facts |
**DO capture:** Architecture decisions, error resolutions, user preferences
**DON'T capture:** Progress (use todos), temporary state, info in AGENTS.md
### Continuation Enforcement
You are BOUND to your task list. Do not stop until EVERY task is COMPLETE.
Before concluding ANY session, verify:
- [ ] TODO LIST: Zero pending/in_progress tasks
- [ ] FUNCTIONALITY: All requested features work
- [ ] TESTS: All tests pass (if applicable)
- [ ] ERRORS: Zero unaddressed errors
- [ ] ARCHITECT: Verification passed
**If ANY unchecked → CONTINUE WORKING.**
---
## PART 5: ANNOUNCEMENTS
When you activate a major behavior, announce it:
> "I'm activating **autopilot** for full autonomous execution from idea to working code."
> "I'm activating **ralph-loop** to ensure this task completes fully."
> "I'm activating **ultrawork** for maximum parallel execution."
> "I'm starting a **planning session** - I'll interview you about requirements."
> "I'm delegating this to the **architect** agent for deep analysis."
This keeps users informed without requiring them to request features.
---
## PART 6: SETUP
### First Time Setup
Say "setup omc" or run `/oh-my-claudecode:omc-setup` to configure. After that, everything is automatic.
### Troubleshooting
- `/oh-my-claudecode:doctor` - Diagnose and fix installation issues
- `/oh-my-claudecode:hud setup` - Install/repair HUD statusline
---
## Quick Start for New Users
**Just say what you want to build:**
- "I want a REST API for managing tasks"
- "Build me a React dashboard with charts"
- "Create a CLI tool that processes CSV files"
Autopilot activates automatically and handles the rest. No commands needed.
---
## Migration from 2.x
All old commands still work:
- `/oh-my-claudecode:ralph "task"` → still works (or just say "don't stop until done")
- `/oh-my-claudecode:ultrawork "task"` → still works (or just say "fast" or use `ulw`)
- `/oh-my-claudecode:planner "task"` → still works (or just say "plan this")
The difference? You don't NEED them anymore. Everything auto-activates.
**New in 3.x:** Autopilot mode provides the ultimate hands-off experience.

View File

@@ -0,0 +1,66 @@
package com.yolo.keyborad.config;
import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.stereotype.Component;
/**
* ElevenLabs TTS configuration
*
* @author ziin
*/
@Data
@Component
@ConfigurationProperties(prefix = "elevenlabs")
public class ElevenLabsProperties {
/**
* API Key
*/
private String apiKey;
/**
* Base URL
*/
private String baseUrl = "https://api.elevenlabs.io/v1";
/**
* Default voice ID
*/
private String voiceId;
/**
* Model ID
*/
private String modelId = "eleven_multilingual_v2";
/**
* Output format
*/
private String outputFormat = "mp3_44100_128";
/**
* Stability (0-1)
*/
private Double stability = 0.5;
/**
* Similarity boost (0-1)
*/
private Double similarityBoost = 0.75;
/**
* Style (0-1)
*/
private Double style = 0.0;
/**
* Speaking speed (0.7-1.2)
*/
private Double speed = 1.0;
/**
* Enable speaker boost
*/
private Boolean useSpeakerBoost = true;
}

View File

@@ -109,7 +109,9 @@ public class SaTokenConfigure implements WebMvcConfigurer {
"/themes/listAllStyles", "/themes/listAllStyles",
"/wallet/transactions", "/wallet/transactions",
"/themes/restore", "/themes/restore",
"/chat/message" "/chat/message",
"/chat/voice",
"/chat/audio/*"
}; };
} }
@Bean @Bean

View File

@@ -11,6 +11,9 @@ import com.yolo.keyborad.mapper.QdrantPayloadMapper;
 import com.yolo.keyborad.model.dto.chat.ChatReq;
 import com.yolo.keyborad.model.dto.chat.ChatSaveReq;
 import com.yolo.keyborad.model.dto.chat.ChatStreamMessage;
+import com.yolo.keyborad.model.vo.AudioTaskVO;
+import com.yolo.keyborad.model.vo.ChatMessageVO;
+import com.yolo.keyborad.model.vo.ChatVoiceVO;
 import com.yolo.keyborad.service.ChatService;
 import com.yolo.keyborad.service.impl.QdrantVectorService;
 import io.qdrant.client.grpc.JsonWithInt;
@@ -46,19 +49,30 @@ public class ChatController {
@PostMapping("/message") @PostMapping("/message")
@Operation(summary = "同步对话", description = "发送消息给大模型,同步返回回复") @Operation(summary = "同步对话", description = "发送消息给大模型,同步返回 AI 响应,异步生成音频")
public BaseResponse<String> message(@RequestParam("content") String content) { public BaseResponse<ChatMessageVO> message(@RequestParam("content") String content) {
if (StrUtil.isBlank(content)) { if (StrUtil.isBlank(content)) {
throw new BusinessException(ErrorCode.PARAMS_ERROR, "消息内容不能为空"); throw new BusinessException(ErrorCode.PARAMS_ERROR, "消息内容不能为空");
} }
String userId = StpUtil.getLoginIdAsString(); String userId = StpUtil.getLoginIdAsString();
String response = chatService.message(content, userId); ChatMessageVO result = chatService.message(content, userId);
return ResultUtils.success(response); return ResultUtils.success(result);
} }
@GetMapping("/audio/{audioId}")
@Operation(summary = "查询音频状态", description = "根据音频 ID 查询音频生成状态和 URL")
public BaseResponse<AudioTaskVO> getAudioTask(@PathVariable("audioId") String audioId) {
if (StrUtil.isBlank(audioId)) {
throw new BusinessException(ErrorCode.PARAMS_ERROR, "音频 ID 不能为空");
}
AudioTaskVO result = chatService.getAudioTask(audioId);
return ResultUtils.success(result);
}
@PostMapping("/talk") @PostMapping("/talk")
@Operation(summary = "聊天润色接口", description = "聊天润色接口") @Operation(summary = "聊天润色接口", description = "聊天润色接口")
public Flux<ServerSentEvent<ChatStreamMessage>> talk(@RequestBody ChatReq chatReq){ public Flux<ServerSentEvent<ChatStreamMessage>> talk(@RequestBody ChatReq chatReq){
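For reference, a minimal client-side sketch of the new asynchronous flow. This is illustrative only and not part of this commit: the host, JSON handling, and polling interval are assumptions, while the endpoints and status values come from ChatController and AudioTaskVO.
```java
// Hypothetical client: send a chat message, then poll the audio task until it finishes.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatAudioPollingExample {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String base = "http://localhost:8080"; // assumed host/port

        // 1. POST /chat/message returns ChatMessageVO: aiResponse + audioId + llmDuration
        HttpRequest chat = HttpRequest.newBuilder(URI.create(base + "/chat/message?content=hi"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        String chatJson = http.send(chat, HttpResponse.BodyHandlers.ofString()).body();
        String audioId = "..."; // parse ChatMessageVO.audioId from chatJson with your JSON library

        // 2. GET /chat/audio/{audioId} until status is completed or failed (see AudioTaskVO)
        for (int attempt = 0; attempt < 30; attempt++) {
            HttpRequest poll = HttpRequest.newBuilder(URI.create(base + "/chat/audio/" + audioId))
                    .GET()
                    .build();
            String taskJson = http.send(poll, HttpResponse.BodyHandlers.ofString()).body();
            if (taskJson.contains("\"completed\"") || taskJson.contains("\"failed\"")) {
                System.out.println(taskJson); // audioUrl is present when status is completed
                break;
            }
            Thread.sleep(1000); // audio is generated asynchronously, so retry after a short delay
        }
    }
}
```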

View File

@@ -0,0 +1,37 @@
package com.yolo.keyborad.model.vo;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
/**
* Audio task status
*
* @author ziin
*/
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Schema(description = "音频任务状态")
public class AudioTaskVO {
@Schema(description = "音频任务 ID")
private String audioId;
@Schema(description = "任务状态: pending/processing/completed/failed")
private String status;
@Schema(description = "音频 URL (completed 时返回)")
private String audioUrl;
@Schema(description = "错误信息 (failed 时返回)")
private String errorMessage;
public static final String STATUS_PENDING = "pending";
public static final String STATUS_PROCESSING = "processing";
public static final String STATUS_COMPLETED = "completed";
public static final String STATUS_FAILED = "failed";
}

View File

@@ -0,0 +1,29 @@
package com.yolo.keyborad.model.vo;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
/**
* Message response (with asynchronous audio)
*
* @author ziin
*/
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Schema(description = "消息响应")
public class ChatMessageVO {
@Schema(description = "AI 响应文本")
private String aiResponse;
@Schema(description = "音频任务 ID用于查询音频状态")
private String audioId;
@Schema(description = "LLM 耗时(毫秒)")
private Long llmDuration;
}

View File

@@ -0,0 +1,32 @@
package com.yolo.keyborad.model.vo;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
/**
* Voice chat response
*
* @author ziin
*/
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Schema(description = "语音对话响应")
public class ChatVoiceVO {
@Schema(description = "用户输入内容")
private String content;
@Schema(description = "AI 响应文本")
private String aiResponse;
@Schema(description = "AI 语音音频 URL (R2)")
private String audioUrl;
@Schema(description = "处理耗时(毫秒)")
private Long duration;
}

View File

@@ -0,0 +1,26 @@
package com.yolo.keyborad.model.vo;
import io.swagger.v3.oas.annotations.media.Schema;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
/**
* TTS speech synthesis result
*
* @author ziin
*/
@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
@Schema(description = "TTS 语音合成结果")
public class TextToSpeechVO {
@Schema(description = "音频 Base64")
private String audioBase64;
@Schema(description = "音频 URL (R2)")
private String audioUrl;
}

View File

@@ -2,6 +2,9 @@ package com.yolo.keyborad.service;
 import com.yolo.keyborad.model.dto.chat.ChatReq;
 import com.yolo.keyborad.model.dto.chat.ChatStreamMessage;
+import com.yolo.keyborad.model.vo.AudioTaskVO;
+import com.yolo.keyborad.model.vo.ChatMessageVO;
+import com.yolo.keyborad.model.vo.ChatVoiceVO;
 import org.springframework.http.codec.ServerSentEvent;
 import reactor.core.publisher.Flux;
@@ -13,11 +16,20 @@ public interface ChatService {
     Flux<ServerSentEvent<ChatStreamMessage>> talk(ChatReq chatReq);
     /**
-     * Synchronous chat
+     * Synchronous chat (audio generated asynchronously)
      *
      * @param content user message content
      * @param userId user ID
-     * @return AI response
+     * @return AI response + audio task ID
      */
-    String message(String content, String userId);
+    ChatMessageVO message(String content, String userId);
+    /**
+     * Query the audio task status
+     *
+     * @param audioId audio task ID
+     * @return audio task status
+     */
+    AudioTaskVO getAudioTask(String audioId);
 }

View File

@@ -0,0 +1,28 @@
package com.yolo.keyborad.service;
import com.yolo.keyborad.model.vo.TextToSpeechVO;
/**
* ElevenLabs TTS speech synthesis service interface
*
* @author ziin
*/
public interface ElevenLabsService {
/**
* Convert text to speech (with timestamps)
*
* @param text the text to convert
* @return synthesis result containing base64 audio
*/
TextToSpeechVO textToSpeechWithTimestamps(String text);
/**
* Convert text to speech (with timestamps) using the specified voice
*
* @param text the text to convert
* @param voiceId the voice ID
* @return synthesis result
*/
TextToSpeechVO textToSpeechWithTimestamps(String text, String voiceId);
}
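A minimal usage sketch of this interface, assuming an injected ElevenLabsService instance; it mirrors how ChatServiceImpl decodes the result before uploading it to R2.
```java
// Sketch: synthesize speech and decode the base64 payload into MP3 bytes.
byte[] synthesize(ElevenLabsService elevenLabsService, String text) {
    TextToSpeechVO result = elevenLabsService.textToSpeechWithTimestamps(text);
    // Only audioBase64 is populated by this implementation; audioUrl is left for the caller to fill.
    return java.util.Base64.getDecoder().decode(result.getAudioBase64());
}
```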

View File

@@ -14,21 +14,34 @@ import com.yolo.keyborad.model.entity.KeyboardCharacter;
 import com.yolo.keyborad.model.entity.KeyboardUser;
 import com.yolo.keyborad.model.entity.KeyboardUserCallLog;
 import com.yolo.keyborad.model.entity.KeyboardUserQuotaTotal;
+import com.yolo.keyborad.model.vo.AudioTaskVO;
+import com.yolo.keyborad.model.vo.ChatMessageVO;
+import com.yolo.keyborad.model.vo.ChatVoiceVO;
+import com.yolo.keyborad.model.vo.TextToSpeechVO;
 import com.yolo.keyborad.service.*;
 import jakarta.annotation.Resource;
 import lombok.extern.slf4j.Slf4j;
+import org.dromara.x.file.storage.core.FileInfo;
+import org.dromara.x.file.storage.core.FileStorageService;
 import org.springframework.ai.chat.client.ChatClient;
 import org.springframework.ai.openai.OpenAiChatOptions;
+import org.springframework.data.redis.core.StringRedisTemplate;
 import org.springframework.http.codec.ServerSentEvent;
+import org.springframework.scheduling.annotation.Async;
 import org.springframework.stereotype.Service;
 import reactor.core.publisher.Flux;
 import reactor.core.publisher.Mono;
 import reactor.core.scheduler.Schedulers;
 import java.math.BigDecimal;
+import java.io.ByteArrayInputStream;
 import java.util.ArrayList;
+import java.util.Base64;
 import java.util.Date;
 import java.util.List;
+import java.util.UUID;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.atomic.AtomicReference;
@@ -61,6 +74,18 @@ public class ChatServiceImpl implements ChatService {
     @Resource
     private UserService userService;
+    @Resource
+    private ElevenLabsService elevenLabsService;
+    @Resource
+    private FileStorageService fileStorageService;
+    @Resource
+    private StringRedisTemplate stringRedisTemplate;
+    private static final String AUDIO_TASK_PREFIX = "audio:task:";
+    private static final long AUDIO_TASK_EXPIRE_SECONDS = 3600; // expires after 1 hour
     private final NacosAppConfigCenter.DynamicAppConfig cfgHolder;
     public ChatServiceImpl(NacosAppConfigCenter.DynamicAppConfig cfgHolder) {
@@ -323,18 +348,43 @@ public class ChatServiceImpl implements ChatService {
     }
     /**
-     * Synchronous chat
+     * Synchronous chat (audio generated asynchronously)
      *
      * @param content user message content
      * @param userId user ID
-     * @return AI response
+     * @return AI response + audio task ID
      */
     @Override
-    public String message(String content, String userId) {
+    public ChatMessageVO message(String content, String userId) {
         log.info("同步对话请求, userId: {}, content: {}", userId, content);
         long startTime = System.currentTimeMillis();
+        String response = callLLM(content);
+        long llmDuration = System.currentTimeMillis() - startTime;
+        log.info("LLM 完成, userId: {}, 耗时: {}ms, 响应长度: {}", userId, llmDuration, response.length());
+        // Generate an audio task ID
+        String audioId = UUID.randomUUID().toString().replace("-", "");
+        // Initialize the audio task status as processing
+        setAudioTaskStatus(audioId, AudioTaskVO.STATUS_PROCESSING, null, null);
+        // Run TTS + R2 upload asynchronously
+        CompletableFuture.runAsync(() -> processAudioAsync(audioId, response, userId));
+        return ChatMessageVO.builder()
+                .aiResponse(response)
+                .audioId(audioId)
+                .llmDuration(llmDuration)
+                .build();
+    }
+    /**
+     * Call the LLM to generate a response
+     */
+    private String callLLM(String content) {
         String systemPrompt = """
                 ROLE: You are "Tiffany", a 17-year-old popular high school student from Beverly Hills.
                 You are sassy, judgmental, impatient, but secretly caring.
@@ -347,16 +397,103 @@ public class ChatServiceImpl implements ChatService {
                 Keep responses concise (2-3 sentences max) for real-time conversation.
                 """;
-        String response = client
+        return client
                 .prompt()
                 .system(systemPrompt)
                 .user(content)
                 .call()
                 .content();
-        long duration = System.currentTimeMillis() - startTime;
-        log.info("同步对话完成, userId: {}, 耗时: {}ms, 响应长度: {}", userId, duration, response.length());
-        return response;
+    }
+    /**
+     * Process audio asynchronously: TTS conversion + upload to R2
+     */
+    private void processAudioAsync(String audioId, String text, String userId) {
+        try {
+            log.info("开始异步音频处理, audioId: {}", audioId);
+            long startTime = System.currentTimeMillis();
+            // 1. TTS conversion
+            long ttsStart = System.currentTimeMillis();
+            TextToSpeechVO ttsResult = elevenLabsService.textToSpeechWithTimestamps(text);
+            long ttsDuration = System.currentTimeMillis() - ttsStart;
+            log.info("TTS 完成, audioId: {}, 耗时: {}ms", audioId, ttsDuration);
+            // 2. Upload to R2
+            long uploadStart = System.currentTimeMillis();
+            String audioUrl = uploadAudioToR2(ttsResult.getAudioBase64(), userId);
+            long uploadDuration = System.currentTimeMillis() - uploadStart;
+            log.info("R2 上传完成, audioId: {}, 耗时: {}ms, URL: {}", audioId, uploadDuration, audioUrl);
+            // 3. Mark the task as completed
+            setAudioTaskStatus(audioId, AudioTaskVO.STATUS_COMPLETED, audioUrl, null);
+            long totalDuration = System.currentTimeMillis() - startTime;
+            log.info("异步音频处理完成, audioId: {}, 总耗时: {}ms (TTS: {}ms, Upload: {}ms)",
+                    audioId, totalDuration, ttsDuration, uploadDuration);
+        } catch (Exception e) {
+            log.error("异步音频处理失败, audioId: {}", audioId, e);
+            setAudioTaskStatus(audioId, AudioTaskVO.STATUS_FAILED, null, e.getMessage());
+        }
+    }
+    /**
+     * Set the audio task status in Redis
+     */
+    private void setAudioTaskStatus(String audioId, String status, String audioUrl, String errorMessage) {
+        String key = AUDIO_TASK_PREFIX + audioId;
+        String value = status + "|" + (audioUrl != null ? audioUrl : "") + "|" + (errorMessage != null ? errorMessage : "");
+        stringRedisTemplate.opsForValue().set(key, value, AUDIO_TASK_EXPIRE_SECONDS, TimeUnit.SECONDS);
+    }
+    /**
+     * Query the audio task status
+     */
+    @Override
+    public AudioTaskVO getAudioTask(String audioId) {
+        String key = AUDIO_TASK_PREFIX + audioId;
+        String value = stringRedisTemplate.opsForValue().get(key);
+        if (cn.hutool.core.util.StrUtil.isBlank(value)) {
+            return AudioTaskVO.builder()
+                    .audioId(audioId)
+                    .status(AudioTaskVO.STATUS_PENDING)
+                    .build();
+        }
+        String[] parts = value.split("\\|", -1);
+        return AudioTaskVO.builder()
+                .audioId(audioId)
+                .status(parts[0])
+                .audioUrl(parts.length > 1 && !parts[1].isEmpty() ? parts[1] : null)
+                .errorMessage(parts.length > 2 && !parts[2].isEmpty() ? parts[2] : null)
+                .build();
+    }
+    /**
+     * Upload audio to R2
+     */
+    private String uploadAudioToR2(String audioBase64, String userId) {
+        if (cn.hutool.core.util.StrUtil.isBlank(audioBase64)) {
+            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "音频数据为空");
+        }
+        byte[] audioBytes = Base64.getDecoder().decode(audioBase64);
+        String fileName = UUID.randomUUID() + ".mp3";
+        FileInfo fileInfo = fileStorageService.of(new ByteArrayInputStream(audioBytes))
+                .setPath(userId + "/")
+                .setPlatform("cloudflare-r2")
+                .setSaveFilename(fileName)
+                .setOriginalFilename(fileName)
+                .upload();
+        if (fileInfo == null || cn.hutool.core.util.StrUtil.isBlank(fileInfo.getUrl())) {
+            throw new BusinessException(ErrorCode.SYSTEM_ERROR, "音频上传失败");
+        }
+        return fileInfo.getUrl();
+    }
     }
 }

View File

@@ -0,0 +1,175 @@
package com.yolo.keyborad.service.impl;
import cn.hutool.core.util.StrUtil;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.yolo.keyborad.common.ErrorCode;
import com.yolo.keyborad.config.ElevenLabsProperties;
import com.yolo.keyborad.exception.BusinessException;
import com.yolo.keyborad.model.vo.TextToSpeechVO;
import com.yolo.keyborad.service.ElevenLabsService;
import jakarta.annotation.Resource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
/**
* ElevenLabs TTS speech synthesis service implementation
* Reference: https://elevenlabs.io/docs/api-reference/text-to-speech/convert-with-timestamps
*
* @author ziin
*/
@Service
@Slf4j
public class ElevenLabsServiceImpl implements ElevenLabsService {
@Resource
private ElevenLabsProperties elevenLabsProperties;
private static final int MAX_TEXT_LENGTH = 5000;
@Override
public TextToSpeechVO textToSpeechWithTimestamps(String text) {
return textToSpeechWithTimestamps(text, elevenLabsProperties.getVoiceId());
}
@Override
public TextToSpeechVO textToSpeechWithTimestamps(String text, String voiceId) {
// 1. Validate parameters
if (StrUtil.isBlank(text)) {
throw new BusinessException(ErrorCode.PARAMS_ERROR, "文本内容不能为空");
}
if (text.length() > MAX_TEXT_LENGTH) {
throw new BusinessException(ErrorCode.PARAMS_ERROR,
"文本长度超出限制,最大支持 " + MAX_TEXT_LENGTH + " 字符");
}
if (StrUtil.isBlank(voiceId)) {
voiceId = elevenLabsProperties.getVoiceId();
}
HttpURLConnection connection = null;
try {
// 2. Build the request URL
String requestUrl = buildRequestUrl(voiceId);
URL url = new URL(requestUrl);
// 3. Open the connection
connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setDoOutput(true);
connection.setDoInput(true);
connection.setConnectTimeout(30000);
connection.setReadTimeout(60000);
// 4. Set request headers
connection.setRequestProperty("Content-Type", "application/json");
connection.setRequestProperty("xi-api-key", elevenLabsProperties.getApiKey());
// 5. Build the request body
Map<String, Object> requestBody = buildRequestBody(text);
String jsonBody = JSON.toJSONString(requestBody);
log.info("调用 ElevenLabs TTS API, voiceId: {}, 文本长度: {}", voiceId, text.length());
long startTime = System.currentTimeMillis();
// 6. Send the request
try (OutputStream os = connection.getOutputStream()) {
byte[] input = jsonBody.getBytes(StandardCharsets.UTF_8);
os.write(input, 0, input.length);
}
// 7. Read the response
int responseCode = connection.getResponseCode();
long duration = System.currentTimeMillis() - startTime;
log.info("ElevenLabs TTS API 响应码: {}, 耗时: {}ms", responseCode, duration);
if (responseCode == HttpURLConnection.HTTP_OK) {
// Read the response JSON
try (InputStream is = connection.getInputStream();
ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
byte[] buffer = new byte[8192];
int bytesRead;
while ((bytesRead = is.read(buffer)) != -1) {
baos.write(buffer, 0, bytesRead);
}
String responseJson = baos.toString(StandardCharsets.UTF_8);
JSONObject jsonResponse = JSON.parseObject(responseJson);
String audioBase64 = jsonResponse.getString("audio_base64");
log.info("语音合成成功Base64长度: {}", audioBase64.length());
return TextToSpeechVO.builder()
.audioBase64(audioBase64)
.build();
}
} else {
// Read the error message
String errorMsg = "";
try (InputStream es = connection.getErrorStream()) {
if (es != null) {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = es.read(buffer)) != -1) {
baos.write(buffer, 0, bytesRead);
}
errorMsg = baos.toString(StandardCharsets.UTF_8);
}
}
log.error("ElevenLabs TTS API 调用失败, 状态码: {}, 错误信息: {}", responseCode, errorMsg);
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "语音合成服务异常: " + responseCode);
}
} catch (BusinessException e) {
throw e;
} catch (Exception e) {
log.error("调用 ElevenLabs TTS API 发生异常", e);
throw new BusinessException(ErrorCode.SYSTEM_ERROR, "语音合成服务异常: " + e.getMessage());
} finally {
if (connection != null) {
connection.disconnect();
}
}
}
/**
* Build the ElevenLabs TTS API request URL (with timestamps)
*/
private String buildRequestUrl(String voiceId) {
StringBuilder url = new StringBuilder(elevenLabsProperties.getBaseUrl());
url.append("/text-to-speech/").append(voiceId).append("/with-timestamps");
url.append("?output_format=").append(elevenLabsProperties.getOutputFormat());
return url.toString();
}
/**
* Build the request body
*/
private Map<String, Object> buildRequestBody(String text) {
Map<String, Object> requestBody = new HashMap<>();
requestBody.put("text", text);
requestBody.put("model_id", elevenLabsProperties.getModelId());
// Set voice parameters
Map<String, Object> voiceSettings = new HashMap<>();
voiceSettings.put("stability", elevenLabsProperties.getStability());
voiceSettings.put("similarity_boost", elevenLabsProperties.getSimilarityBoost());
voiceSettings.put("style", elevenLabsProperties.getStyle());
voiceSettings.put("speed", elevenLabsProperties.getSpeed());
voiceSettings.put("use_speaker_boost", elevenLabsProperties.getUseSpeakerBoost());
requestBody.put("voice_settings", voiceSettings);
return requestBody;
}
}

View File

@@ -70,13 +70,13 @@ dromara:
         base-path: avatar/ # base path
       - platform: cloudflare-r2-apac # storage platform id
         enable-storage: true # enable storage
-        access-key: 550b33cc4d53e05c2e438601f8a0e209
-        secret-key: df4d529cdae44e6f614ca04f4dc0f1f9a299e57367181243e8abdc7f7c28e99a
+        access-key: eda135fe4fda649acecfa4bb49b0c30c
+        secret-key: ee557acaccf44caef985b5cac89db311a0923c72c9f4b8c5f32089c6ebb47a79
         region: APAC # region
         end-point: https://b632a61caa85401f63c9b32eef3a74c8.r2.cloudflarestorage.com/keyboardtest # endpoint
         bucket-name: keyboardtest # bucket name
         domain: https://cdn.loveamorkey.com/ # access domain, note the trailing '/', e.g. https://abcd.s3.ap-east-1.amazonaws.com/
-        base-path: / # base path
+        base-path: tts/ # base path
 ############## Sa-Token configuration (docs: https://sa-token.cc) ##############
 sa-token:
@@ -100,3 +100,9 @@ nacos:
     server-addr: 127.0.0.1:8848
     group: DEFAULT_GROUP
     data-id: keyboard_default-config.yaml
+
+elevenlabs:
+  api-key: sk_25339d32bb14c91f460ed9fce83a1951672f07846a7a10ce
+  voice-id: JBFqnCBsd6RMkjVDRZzb
+  model-id: eleven_turbo_v2_5
+  output-format: mp3_44100_128

View File

@@ -1,13 +1,13 @@
 spring:
   ai:
     openai:
-      # api-key: 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
-      api-key: sk-cf112f49cf4d4138a49575cda1f852b4
-      # base-url: https://gateway.ai.cloudflare.com/v1/b632a61caa85401f63c9b32eef3a74c8/aigetway/openrouter
-      base-url: https://dashscope-intl.aliyuncs.com/compatible-mode/
+      api-key: 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
+      # api-key: sk-cf112f49cf4d4138a49575cda1f852b4
+      base-url: https://gateway.ai.cloudflare.com/v1/b632a61caa85401f63c9b32eef3a74c8/aigetway/openrouter
+      # base-url: https://dashscope-intl.aliyuncs.com/compatible-mode/
       chat:
         options:
-          model: qwen-plus
+          model: google/gemini-2.5-flash-lite
       embedding:
         options:
           model: text-embedding-v4