Search results for the keyword "sound recognition": 11 results in total, showing only the first 480
MCP server for image recognition with Angular mobile client app.
Originally meant to be an MCP server; now it's a stupid SoundCloud scraper.
Claude 3.7 Swarm with Field Coherence: A Model Context Protocol (MCP) server that orchestrates multiple specialized Claude 3.7 Sonnet instances in a quantum-inspired swarm. It creates a field coherenc
A Model Context Protocol (MCP) server that provides ASR (Automatic Speech Recognition) capabilities using the Whisper engine. This server exposes ASR functionality through MCP tools, making it easy to
A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities.
Recognize whether two sets of data are from the same entity.
An MCP server that provides image recognition 👀 capabilities using Anthropic and OpenAI vision APIs
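Overview: Spark-TTS is a new-generation speech synthesis model recently released by Mobvoi (出门问问) together with several leading academic institutions (such as the Hong Kong University of Science and Technology and Shanghai Jiao Tong University). Its core innovations are the BiCodec encoding technique and its structural unification with large text models: it leverages the power of large language models (LLMs) to achieve highly accurate and natural speech synthesis. Spark-TTS is an advanced text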
An AI text humanizer transforms AI-generated content into natural, human-like text. It adds flow, uses conversational phrasing, and avoids robotic language. Our humanization tool helps create engaging
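ThinkSound is the first CoT (chain-of-thought) audio generation model released by Alibaba's Tongyi speech team. It is used for video dubbing, generating a dedicated, matching sound effect for every frame. The model introduces CoT reasoning to address the difficulty traditional techniques have in capturing dynamic visual details and spatial relationships, letting the AI think step by step like a professional sound designer and produce high-fidelity audio that stays synchronized with the picture. Audio generation is driven by a three-stage chain of thought: foundational sound-effect reasoning, object-level interaction, and instruction-based editing. The model ships with the AudioCoT dataset, which contains audio data annotated with chains of thought. On VGGSoun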
Showing 337 to 347 of 347 results