Search results for the keyword "multi-modal inputs": 2 results in total (only the first 480 are shown)
Users can easily generate high-quality images and customize unique 3D character models with just a few inputs. The platform supports multilingual input and is ideal for various use cases such as illus
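MSQA (Multi-modal Situated Question Answering) is a large-scale multi-modal situated reasoning dataset aimed at improving how embodied AI agents understand and reason about 3D scenes. The dataset contains 251K question-answer pairs covering 9 question categories, collected in real-world 3D scenes using 3D scene graphs and vision-language models. MSQA uses interleaved multi-modal input of text, images, and point clouds to reduce the ambiguity of single-modality input. It introduces MSNN (Multi-modal Next-step Navi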