30/11/2020

V2A - Vision to Action: Learning robotic arm actions based on vision and language

Michal Nazarczuk, Krystian Mikolajczyk

Keywords:

Abstract: In this work, we present a new AI task - Vision to Action (V2A) - in which an agent (a robotic arm) is asked to perform a high-level task (e.g. stacking) with objects present in a scene. The agent has to suggest a plan consisting of primitive actions (e.g. simple movements, grasping) in order to complete the given task successfully. Instructions are formulated in a way that forces the agent to perform visual reasoning over the presented scene before inferring the actions. We extend the recently introduced SHOP-VRB dataset with task instructions for each scene, as well as an engine capable of assessing whether a sequence of primitives leads to successful task completion. We also propose a novel approach based on multimodal attention for this task and demonstrate its performance on the new dataset.
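To make the task setup concrete, below is a minimal illustrative sketch (in Python) of how a plan of primitive actions for a stacking instruction might be encoded and sanity-checked. The primitive names, the Plan representation, and the is_well_formed check are assumptions made for exposition only; they do not reflect the paper's method or the actual SHOP-VRB engine API.

# Illustrative only: a hypothetical encoding of a V2A-style plan.
# Primitive names and the success check are assumptions, not the paper's code.
from typing import List, Tuple

Primitive = Tuple[str, str]   # (action, target object), e.g. ("grasp", "small_box")
Plan = List[Primitive]

def example_stacking_plan() -> Plan:
    """A hypothetical plan for 'stack the small box on the large box'."""
    return [
        ("move_to", "small_box"),
        ("grasp", "small_box"),
        ("move_to", "large_box"),
        ("release", "small_box"),
    ]

def is_well_formed(plan: Plan) -> bool:
    """Toy sanity check: grasp one object at a time and release what is held."""
    held = None
    for action, obj in plan:
        if action == "grasp":
            if held is not None:
                return False
            held = obj
        elif action == "release":
            if held != obj:
                return False
            held = None
    return held is None

if __name__ == "__main__":
    plan = example_stacking_plan()
    print(plan)
    print("well-formed:", is_well_formed(plan))

In the actual task, the agent must infer such a plan from the visual scene and the language instruction, and the dataset's engine then judges whether executing the primitives completes the task.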

The video of this talk cannot be embedded. You can watch it here:
https://accv2020.github.io/miniconf/poster_867.html
The talk and the corresponding paper were published at the ACCV 2020 virtual conference.
