Search results for the keyword "multi-modal input": 6 results (only the first 480 are shown)
A Streamlit-based chatbot interface powered by OpenAI GPT-4o that intelligently routes user input to custom MCP tools such as GPT chat, image generation, Supabase queries, and text-to-speech.
MCP server for skrape.ai: give it any URL and it returns clean Markdown for the LLM.
A deliberately vulnerable MCP server demonstrating command injection flaws. This Python implementation shows how lack of input sanitization in file paths leads to critical security vulnerabilities all
It consistently responds with "Ranger!" to any MCP tool request it receives via standard input/output.
MCP server that can execute commands such as keyboard input and mouse movement on macOS
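MSQA (Multi-modal Situated Question Answering) is a large-scale multi-modal situated reasoning dataset aimed at improving the understanding and reasoning abilities of embodied AI agents in 3D scenes. The dataset contains 251K question-answer pairs covering 9 question categories, collected in real-world 3D scenes using 3D scene graphs and vision-language models. MSQA uses interleaved multi-modal input of text, images, and point clouds to reduce the ambiguity of single-modality input. It introduces MSNN (Multi-modal Next-step Navi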