This article is also published on the WeChat official account "3D视觉前沿"; you are welcome to follow it.
It summarizes papers and code on vision-based robotic grasping and is kept in sync with the GitHub repository.
The essential information for robotic grasping is the 6DoF pose of the gripper in the camera frame, i.e., the gripper's 3D position and its 3D orientation in space. By moving the robot arm so that the gripper reaches this position and orientation, the grasp can be executed. In vision-based robotic grasping, the robot is equipped with an RGB-D camera, and AI algorithms are used to compute the target grasp pose of the gripper. According to the manner of grasping, methods can be divided into 2D planar grasping and 6DoF spatial grasping.
2D planar grasping means the target object lies on a horizontal workspace and the gripper can only grasp it from a single direction. Under this constraint, the gripper's 6D pose simplifies to 3D: a 2D in-plane position and a 1D in-plane rotation angle. Methods of this kind fall into two families: those that evaluate the quality of grasp contact points and those that evaluate oriented grasp rectangles.
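To make the simplified representation concrete, the minimal sketch below (a hypothetical helper written for this summary, not code from any of the listed papers) lifts a planar grasp, parameterized by its pixel center (u, v) and in-plane angle θ, back to a full 6DoF top-down gripper pose in the camera frame, assuming the camera looks straight down at the workspace and the depth of the grasp point is known:

```python
import numpy as np

def planar_grasp_to_pose(u, v, theta, depth, K):
    """Lift a planar grasp (pixel center + in-plane angle) to a 6DoF
    top-down gripper pose in the camera frame.

    u, v  : grasp center in image pixels
    theta : in-plane rotation (radians) about the camera's optical axis
    depth : depth of the grasp point along the optical axis (meters)
    K     : 3x3 camera intrinsic matrix
    """
    # Back-project the pixel to a 3D point in the camera frame.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    position = np.array([x, y, depth])

    # The gripper approaches along the camera's +z axis; theta only
    # rotates it about that axis.
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])

    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = position
    return pose

# Example: a grasp rectangle centered at pixel (320, 240), rotated 30 degrees,
# observed 0.5 m from the camera (toy intrinsics).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
print(planar_grasp_to_pose(320, 240, np.deg2rad(30), 0.5, K))
```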
6DoF spatial grasping means the gripper can grasp the target object from arbitrary directions in 3D space, so the gripper's 6D pose can no longer be simplified. Depending on whether a method relies on the object's complete shape or only a partial point cloud, methods fall into two groups: partial point cloud-based methods and complete shape-based methods. Partial point cloud-based methods include those that evaluate candidate grasp poses and those that transfer grasps from an existing grasp database. Complete shape-based methods include those that estimate the object's 6D pose and those based on shape completion. Most current 6DoF grasping methods target objects with known 3D models, whose optimal grasp poses can be obtained beforehand by manual annotation or in simulation; the problem then reduces to estimating the object's 6D pose.
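For the known-object case, this reduction boils down to a single frame transformation: if T_cam_obj is the estimated object pose and T_obj_grasp is a grasp stored in the object's model frame, the executable grasp pose is their product. A minimal numpy sketch (variable names are illustrative, not taken from any listed paper):

```python
import numpy as np

def grasp_in_camera_frame(T_cam_obj, T_obj_grasp):
    """Compose the estimated object pose with a precomputed grasp.

    T_cam_obj   : 4x4 pose of the object in the camera frame
                  (output of a 6D object pose estimator)
    T_obj_grasp : 4x4 gripper pose defined in the object's model frame
                  (annotated manually or found in simulation)
    Returns the 4x4 gripper pose in the camera frame.
    """
    return T_cam_obj @ T_obj_grasp

# Example: the object sits 0.6 m in front of the camera with no rotation,
# and the stored grasp approaches 5 cm above the object's origin.
T_cam_obj = np.eye(4)
T_cam_obj[:3, 3] = [0.0, 0.0, 0.6]

T_obj_grasp = np.eye(4)
T_obj_grasp[:3, 3] = [0.0, 0.0, -0.05]

print(grasp_in_camera_frame(T_cam_obj, T_obj_grasp))
```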
In addition, most current robotic grasping methods first need to locate the target object in the input data, which can be done at three levels: object localization without classification, object detection, and object instance segmentation. Object localization without classification obtains the 2D/3D extent of the target object without identifying its category; object detection yields the object's 2D/3D bounding box together with its category; object instance segmentation provides the pixel-level or point-level region occupied by the object, also together with its category.
All of the techniques involved above are organized here, covering traditional as well as deep learning methods, 2D-based as well as 3D-based methods, and related topics such as 3D reconstruction, shape completion, depth estimation, data generation, dexterous hands, and reinforcement learning. The latest papers and links are listed and updated weekly, in the hope that they will be helpful to people in this field. The outline of this summary is as follows:
0. Review papers
1. Object localization
1.1 Localization without classification: based on 2D/3D data, by fitting shape primitives or detecting salient regions
1.2 Object detection: two-stage and single-stage detection based on 2D/3D data
1.3 Object instance segmentation: two-stage and single-stage instance segmentation based on 2D/3D data
2. Object 6D pose estimation
2.1 RGB-D image-based methods: correspondence-based, template-based, and voting-based methods
2.2 Point cloud-based methods: correspondence-based, template-based, and voting-based methods
2.3 Category-level pose estimation: category-level methods, 3D reconstruction from images, and 3D rendering
3. 2D planar grasping
3.1 Methods that evaluate grasp contact points
3.2 Methods that evaluate oriented grasp rectangles
4. 6DoF grasping
4.1 Methods based on single-view point clouds: evaluating candidate grasp poses and transferring grasp experience
4.2 Methods based on complete shapes: estimating the object's 6D pose and shape completion-based methods
5. Task-oriented grasping: task-oriented grasping, grasp affordances, and 3D part segmentation
6. Dexterous hands
7. Data generation: simulation-to-reality transfer and self-supervised methods
8. Multi-source information
9. Motion planning: visual servoing and path planning
10. Imitation learning
11. Reinforcement learning
12. Domain experts
The related papers are listed below:
Vision-based Robotic Grasping: Papers and Codes
0. Review Papers
[arXiv] 2020-Affordances in Robotic Tasks - A Survey, [paper]
[arXiv] 2019-A Review of Robot Learning for Manipulation- Challenges, Representations, and Algorithms, [paper]
[arXiv] 2019-Vision-based Robotic Grasping from Object Localization, Pose Estimation, Grasp Detection to Motion Planning: A Review, [paper]
[MTI] 2018-Review of Deep Learning Methods in Robotic Grasp Detection, [paper]
[ToR] 2014-Data-Driven Grasp Synthesis - A Survey, [paper]
[RAS] 2012-An overview of 3D object grasp synthesis algorithms - A Survey, [paper]
1. Object Localization
1.1 Object Localization without Classification
1.1.1 2D-based Methods
a. Fitting 2D Shape Primitives
[BMVC] A buyer’s guide to conic fitting, [paper] [code]
[IJGIG] Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, [paper] [code]
b. Saliency Detection
Survey papers:
[arXiv] 2019-Salient object detection in the deep learning era: An in-depth survey, [paper]
[CVM] 2014-Salient object detection: A survey, [paper]
2020:
[arXiv] UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders, [paper]
[arXiv] Cross-layer Feature Pyramid Network for Salient Object Detection, [paper]
[arXiv] Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection, [paper]
[arXiv] Weakly-Supervised Salient Object Detection via Scribble Annotations, [paper]
[arXiv] Highly Efficient Salient Object Detection with 100K Parameters, [paper]
[arXiv] Global Context-Aware Progressive Aggregation Network for Salient Object Detection, [paper]
[arXiv] Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection, [paper]
2019:
[ICCV] Employing deep part-object relationships for salient object detection, [paper]
[ICME] Multi-scale capsule attention-based salient object detection with multi-crossed layer connections, [paper]
2018:
[CVPR] Picanet: Learning pixel-wise contextual attention for saliency detection, [paper]
[SPM] Advanced deep-learning techniques for salient and category-specific object detection: a survey, [paper]
2017:
[CVPR] Deeply supervised salient object detection with short connections, [paper]
[TOC] Video saliency detection using object proposals, [paper]
2016:
[CVPR] Unconstrained salient object detection via proposal subset optimization, [paper]
[CVPR] Deep hierarchical saliency network for salient object detection, [paper]
[TPAMI] Salient object detection via structured matrix decomposition, [paper]
[TIP] Correspondence driven saliency transfer, [paper]
2015:
[CVPR] Saliency detection by multi-context deep learning, [paper]
[TPAMI] Hierarchical image saliency detection on extended CSSD, [paper]
2014:
[CVPR] Saliency optimization from robust background detection, [paper]
[TPAMI] Global contrast based salient region detection, [paper]
2013:
[CVPR] Salient object detection: A discriminative regional feature integration approach, [paper]
[CVPR] Saliency detection via graph-based manifold ranking, [paper]
2012:
[ECCV] Geodesic saliency using background priors, [paper]
1.1.2 3D-based Methods
a. Fitting 3D Shape Primitives
Survey papers:
[CGF] 2019-A survey of simple geometric primitives detection methods for captured 3d data, [paper]
2020:
[arXiv] ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds, [paper]
2015:
[CVPR] Separating objects and clutter in indoor scenes, [paper]
2013:
[CVPR] A linear approach to matching cuboids in rgbd images, [paper]
2012:
[GCR] Robustly segmenting cylindrical and box-like objects in cluttered scenes using depth cameras, [paper]
2009:
[IROS] Close-range scene segmentation and reconstruction of 3d point cloud maps for mobile manipulation in domestic environments, [paper]
2005:
[ISPRS] Efficient hough transform for automatic detection of cylinders in point clouds, [paper]
1981:
[COM] Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, [paper]
b. Saliency Detection
2019:
[PR] Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection, [paper]
[ICCV] Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection, [paper]
[ICCV] Pointcloud saliency maps, [paper]
[arXiv] CNN-based RGB-D Salient Object Detection: Learn, Select and Fuse, [paper]
2018:
[CVPR] Progressively complementarity-aware fusion network for RGB-D salient object detection, [paper]
2017:
[TIP] RGBD salient object detection via deep fusion, [paper]
2015:
[CVPRW] Exploiting global priors for RGB-D saliency detection, [paper]
2014:
[ECCV] Rgbd salient object detection: a benchmark and algorithms, [paper]
2013:
[JSIP] Segmenting salient objects in 3d point clouds of indoor scenes using geodesic distances, [paper]
2008:
[WACV] Segmentation of salient regions in outdoor scenes using imagery and 3-d data, [paper]
1.2 Object Detection
For more detailed paper lists, please refer to hoya012 or amusi.
1.2.1 2D Object Detection
Survey papers:
2020:
[arXiv] Deep Domain Adaptive Object Detection: a Survey, [paper]
[IJCV] Deep Learning for Generic Object Detection: A Survey, [paper]
2019:
[arXiv] Object Detection in 20 Years: A Survey, [paper]
[arXiv] Object Detection with Deep Learning: A Review, [paper]
[arXiv] A Review of Object Detection Models based on Convolutional Neural Network, [paper]
[arXiv] A Review of methods for Textureless Object Recognition, [paper]
a. Two-stage methods
2020:
[arXiv] Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection, [paper]
[arXiv] Scalable Active Learning for Object Detection, [paper]
[arXiv] Any-Shot Object Detection, [paper]
[arXiv] Frustratingly Simple Few-Shot Object Detection, [paper]
[arXiv] Rethinking the Route Towards Weakly Supervised Object Localization, [paper]
[arXiv] Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN, [paper]
[arXiv] Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images, [paper]
[arXiv] PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal and NIR to Visible Image Transformation, [paper]
[arXiv] SpotNet: Self-Attention Multi-Task Network for Object Detection, [paper]
[arXiv] Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning, [paper]
[arXiv] FedVision: An Online Visual Object Detection Platform Powered by Federated Learning, [paper]
2019:
[arXiv] Combining Deep Learning and Verification for Precise Object Instance Detection, [paper]
[arXiv] cmSalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks, [paper]
[arXiv] OpenLORIS-Object: A Dataset and Benchmark towards Lifelong Object Recognition, [paper] [project]
[IROS] Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition, [paper]
[IROS] Recurrent Convolutional Fusion for RGB-D Object Recognition, [paper] [code]
[ICCVW] An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Detection, [paper]
2017:
[CVPR] FPN: Feature pyramid networks for object detection, [paper]
[arXiv] Light-Head R-CNN: In Defense of Two-Stage Object Detector, [paper] [code]
2016:
[NeurIPS] R-FCN: Object Detection via Region-based Fully Convolutional Networks, [paper] [code]
[TPAMI] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, [paper] [code]
[ECCV] Visual relationship detection with language priors, [paper]
2015:
[ICCV] Fast R-CNN, [paper] [code]
2014:
[ECCV] SPPNet: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, [paper] [code]
[CVPR] R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation, [paper] [code]
[CVPR] Scalable object detection using deep neural networks, [paper]
[arXiv] Scalable, high-quality object detection, [paper]
[ICLR] OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, [paper] [code]
2011:
[ICCV] ORB: An efficient alternative to SIFT or SURF, [paper]
2006:
[ECCV] SURF: Speeded up robust features, [paper]
2005:
[ICCV] FAST: Fusing points and lines for high performance tracking, [paper]
1999:
[ICCV] SIFT: Object Recognition from Local Scale-Invariant Features, [paper]
b. Single-stage methods
2020:
[arXiv] SaccadeNet: A Fast and Accurate Object Detector, [paper]
[arXiv] CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection, [paper]
[arXiv] Extended Feature Pyramid Network for Small Object Detection, [paper]
[arXiv] Real Time Detection of Small Objects, [paper]
[arXiv] OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features, [paper]
2019:
[arXiv] CenterNet: Objects as Points, [paper]
[arXiv] CenterNet: Keypoint Triplets for Object Detection, [paper]
[arXiv] FCOS: Fully Convolutional One-Stage Object Detection, [paper]
[arXiv] Bottom-up Object Detection by Grouping Extreme and Center Points, [paper]
2018:
[ECCV] CornerNet: Detecting Objects as Paired Keypoints, [paper]
[arXiv] YOLOv3: An Incremental Improvement, [paper] [code]
2017:
[CVPR] YOLO9000: Better, Faster, Stronger, [paper] [code]
[ICCV] RetinaNet: Focal loss for dense object detection, [paper]
2016:
[CVPR] YOLO: You only look once: Unified, real-time object detection, [paper] [code]
[ECCV] SSD: Single Shot MultiBox Detector, [paper] [code]
Dataset:
PASCAL VOC: The PASCAL Visual Object Classes (VOC) Challenge, [paper]
ILSVRC: ImageNet large scale visual recognition challenge, [paper]
Microsoft COCO: Common Objects in Context, a large-scale object detection, segmentation, and captioning dataset, [paper]
Open Images: a collaborative release of ~9 million images annotated with labels spanning thousands of object categories, [paper]
1.2.2 3D Object Detection
These methods can be divided into three kinds: RGB-based methods, point cloud-based methods, and fusion methods that consume both images and point clouds. Most of these works focus on autonomous driving.
a. RGB-based methods
Most methods of this kind estimate depth images from RGB images and then conduct 3D detection on the recovered geometry.
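The step shared by the pseudo-LiDAR style methods listed below is to back-project the estimated depth map into a point cloud in the camera frame before applying a 3D detector. A minimal numpy sketch of that back-projection (the helper name and toy inputs are illustrative, not taken from any listed paper):

```python
import numpy as np

def depth_to_pseudo_pointcloud(depth, K):
    """Back-project a (possibly estimated) depth map into a 3D point cloud
    in the camera frame, so that point cloud-based 3D detectors can be run
    on monocular or stereo imagery (the "pseudo-LiDAR" idea).

    depth : HxW array of depths in meters
    K     : 3x3 camera intrinsic matrix
    """
    h, w = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Example with a dummy 4x4 depth map and toy intrinsics.
K = np.array([[600.0, 0.0, 2.0], [0.0, 600.0, 2.0], [0.0, 0.0, 1.0]])
print(depth_to_pseudo_pointcloud(np.full((4, 4), 1.5), K).shape)  # (16, 3)
```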
2020:
[arXiv] Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation, [paper]
[arXiv] End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection, [paper]
[arXiv] Confidence Guided Stereo 3D Object Detection with Split Depth Estimation, [paper]
[arXiv] Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras, [paper]
[arXiv] ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection, [paper]
[arXiv] MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships, [paper]
[arXiv] Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image, [paper]
[arXiv] SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation, [paper]
[arXiv] siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection, [paper]
[AAAI] Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation, [paper]
[arXiv] SDOD: Real-time Segmenting and Detecting 3D Objects by Depth, [paper]
[arXiv] DSGN: Deep Stereo Geometry Network for 3D Object Detection, [paper]
[arXiv] RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving, [paper]
2019:
[NeurIPS] PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points, [paper]
[arXiv] Single-Stage Monocular 3D Object Detection with Virtual Cameras, [paper]
[arXiv] Environment reconstruction on depth images using Generative Adversarial Networks, [paper] [code]
[arXiv] Learning Depth-Guided Convolutions for Monocular 3D Object Detection, [paper]
[arXiv] RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving, [paper]
[arXiv] Task-Aware Monocular Depth Estimation for 3D Object Detection, [paper]
[CVPR] Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, [paper] [code]
[AAAI] MonoGRNet: A Geometric Reasoning Network for 3D Object Localization, [paper] [code]
[ICCV] Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving, [paper]
[ICCV] M3D-RPN: Monocular 3D Region Proposal Network for Object Detection, [paper]
[ICCVW] Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud, [paper]
[arXiv] Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss, [paper]
[arXiv] Monocular 3D Object Detection via Geometric Reasoning on Keypoints, [paper]
b. Point cloud-based methods
Methods of this kind consume only 3D point cloud data.
Survey papers:
[arXiv] 2019-Deep Learning for 3D Point Clouds: A Survey, [paper]
2020:
[arXiv] MLCVNet: Multi-Level Context VoteNet for 3D Object Detection, [paper]
[arXiv] 3D IoU-Net: IoU Guided 3D Object Detector for Point Clouds, [paper]
[arXiv] Finding Your (3D) Center: 3D Object Detection Using a Learned Loss, [paper]
[arXiv] LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention, [paper]
[arXiv] Quantifying Data Augmentation for LiDAR based 3D Object Detection, [paper]
[arXiv] DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes, [paper]
[arXiv] Improving 3D Object Detection through Progressive