Search results for the keyword "multi-modal inputs": 3 in total, only the first 480 shown
Users can easily generate high-quality images and customize unique 3D character models with just a few inputs. The platform supports multilingual input and is ideal for various use cases such as illus
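MSQA (Multi-modal Situated Question Answering) is a large-scale multi-modal situated reasoning dataset that improves the understanding and reasoning abilities of embodied AI agents in 3D scenes. The dataset contains 251K question-answer pairs covering 9 question categories, collected in real-world 3D scenes using 3D scene graphs and vision-language models. MSQA uses interleaved multi-modal input of text, images, and point clouds to reduce the ambiguity of single-modality input. It introduces MSNN (Multi-modal Next-step Navi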
Wan Animate by Alibaba Wan2.2 enables animation of any character in videos. Supporting image and video inputs, it uses reference characters and motion to create custom animated videos. It accurately c