TY - JOUR
T1 - Learning Visual Representations for Perception-Action Systems
AU - Piater, Justus
AU - Jodogne, Sebastien
AU - Detry, Renaud
AU - Kraft, Dirk
AU - Krüger, Norbert
AU - Kroemer, Oliver
AU - Peters, Jan
PY - 2011
Y1 - 2011
N2 - We discuss vision as a sensory modality for systems that effect actions in response to perceptions. While the internal representations informed by vision may be arbitrarily complex, we argue that in many cases it is advantageous to link them rather directly to action via learned mappings. These arguments are illustrated by two examples of our own work. First, our RLVC algorithm performs reinforcement learning directly on the visual input space. To make this very large space manageable, RLVC interleaves the reinforcement learner with a supervised classification algorithm that seeks to split perceptual states so as to reduce perceptual aliasing. This results in an adaptive discretization of the perceptual space based on the presence or absence of visual features. Its extension RLJC also handles continuous action spaces. In contrast to the minimalistic visual representations produced by RLVC and RLJC, our second method learns structural object models for robust object detection and pose estimation by probabilistic inference. To these models, the method associates grasp experiences autonomously learned by trial and error. These experiences form a nonparametric representation of grasp success likelihoods over gripper poses, which we call a grasp density. Thus, object detection in a novel scene simultaneously produces suitable grasping options.
AB - We discuss vision as a sensory modality for systems that effect actions in response to perceptions. While the internal representations informed by vision may be arbitrarily complex, we argue that in many cases it is advantageous to link them rather directly to action via learned mappings. These arguments are illustrated by two examples of our own work. First, our RLVC algorithm performs reinforcement learning directly on the visual input space. To make this very large space manageable, RLVC interleaves the reinforcement learner with a supervised classification algorithm that seeks to split perceptual states so as to reduce perceptual aliasing. This results in an adaptive discretization of the perceptual space based on the presence or absence of visual features. Its extension RLJC also handles continuous action spaces. In contrast to the minimalistic visual representations produced by RLVC and RLJC, our second method learns structural object models for robust object detection and pose estimation by probabilistic inference. To these models, the method associates grasp experiences autonomously learned by trial and error. These experiences form a nonparametric representation of grasp success likelihoods over gripper poses, which we call a grasp density. Thus, object detection in a novel scene simultaneously produces suitable grasping options.
U2 - 10.1177/0278364910382464
DO - 10.1177/0278364910382464
M3 - Journal article
SN - 0278-3649
VL - 30
SP - 294
EP - 307
JO - The International Journal of Robotics Research
JF - The International Journal of Robotics Research
IS - 3
ER -