SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation
Abstract
Autonomous space operations such as on-orbit servicing and active debris removal demand robust part-level semantic understanding and precise relative navigation of target spacecraft, yet acquiring large-scale real data in orbit remains prohibitively expensive. Moreover, existing synthetic datasets suffer from limited target diversity, single-modality sensing, and incomplete ground-truth annotations. To bridge these gaps, we present SpaceSense-Bench, a large-scale multi-modal benchmark for spacecraft perception encompassing 136 satellite models and approximately 70 GB of data. Each frame provides time-synchronized 1024×1024 RGB images, millimeter-precision depth maps, and 256-beam LiDAR point clouds, together with dense 7-class part-level semantic labels at both the pixel and point levels as well as accurate 6-DoF pose ground truth. The dataset is generated through a high-fidelity space simulation built in Unreal Engine 5 and a fully automated pipeline covering data acquisition, multi-stage quality control, and conversion to mainstream formats. Comprehensive benchmarks on object detection, 2D semantic segmentation, RGB-LiDAR fusion 3D point cloud segmentation, monocular depth estimation, and orientation estimation reveal two key findings: (i) perceiving small-scale components such as thrusters and omni-antennas, and generalizing to entirely unseen spacecraft in a zero-shot setting, remain critical bottlenecks for current methods; and (ii) scaling up the number of training satellites yields substantial performance gains on novel targets, underscoring the value of large-scale, diverse datasets for space perception research.
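As a point of reference, a 6-DoF pose annotation given as a quaternion and a translation can be turned into a homogeneous transform with a few lines of NumPy. The (w, x, y, z) quaternion ordering below is an assumption made for illustration; the dataset's actual pose convention should be taken from its documentation once released.

```python
import numpy as np

def pose_to_matrix(q_wxyz, t_xyz):
    """Build a 4x4 homogeneous transform from a unit quaternion and a translation.

    Assumes a (w, x, y, z) quaternion ordering; SpaceSense-Bench's actual
    convention may differ, so check the dataset documentation before relying on this.
    """
    q = np.asarray(q_wxyz, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # guard against drift from unit norm
    R = np.array([
        [1 - 2*(y*y + z*z),     2*(x*y - w*z),     2*(x*z + w*y)],
        [    2*(x*y + w*z), 1 - 2*(x*x + z*z),     2*(y*z - w*x)],
        [    2*(x*z - w*y),     2*(y*z + w*x), 1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t_xyz, dtype=float)
    return T
```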
Dataset Highlights
- 136: satellite models with diverse geometries and structures
- ~70 GB: large-scale benchmark data generated in a high-fidelity simulator
- 3 modalities: 1024×1024 RGB, millimeter-precision depth, and 256-beam LiDAR
- 7 classes: dense part-level semantic labels at both pixel and point levels
- 6-DoF: accurate relative pose annotations for each frame
- UE5 pipeline: automated generation, quality control, and conversion workflow
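The on-disk layout has not been published yet, so the loader below is only a sketch: every file name, the millimeter depth encoding as a NumPy array, and the JSON pose file are assumptions made for illustration.

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image

def load_frame(frame_dir):
    """Load one hypothetical SpaceSense-Bench frame from a directory.

    All file names below are placeholders; consult the released dataset
    documentation for the real layout and key names.
    """
    frame_dir = Path(frame_dir)
    rgb = np.asarray(Image.open(frame_dir / "rgb.png"))       # (1024, 1024, 3) uint8
    depth_mm = np.load(frame_dir / "depth.npy")                # per-pixel depth in millimeters (assumed)
    depth_m = depth_mm.astype(np.float32) / 1000.0             # convert to meters
    points = np.load(frame_dir / "lidar.npy")                  # (N, 3) LiDAR points (assumed)
    sem = np.asarray(Image.open(frame_dir / "semantic.png"))   # per-pixel class ids in {0..6} (assumed)
    with open(frame_dir / "pose.json") as f:
        pose = json.load(f)                                    # 6-DoF pose, e.g. quaternion + translation
    return {"rgb": rgb, "depth": depth_m, "points": points, "semantic": sem, "pose": pose}
```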
Visual Overview
Representative spacecraft appearances across diverse structures and viewpoints.
Aligned multi-modal observations support perception under challenging orbital conditions.
Dense part-level semantic labels are provided for both image pixels and LiDAR points.
The benchmark covers detection, segmentation, depth estimation, and orientation estimation.
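Because the modalities are time-synchronized, LiDAR points can be projected into the RGB frame with a standard pinhole model, for example to cross-check the point-level labels against the pixel-level ones. The extrinsic transform and intrinsic matrix below are placeholders; the benchmark's calibration format is not specified here.

```python
import numpy as np

def project_points(points_lidar, T_cam_lidar, K):
    """Project (N, 3) LiDAR points into pixel coordinates with a pinhole camera model.

    T_cam_lidar: 4x4 LiDAR-to-camera extrinsic transform (placeholder convention).
    K: 3x3 camera intrinsic matrix (placeholder values for a 1024x1024 sensor).
    Returns pixel coordinates and depths for points in front of the camera.
    """
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]   # transform into the camera frame
    in_front = pts_cam[:, 2] > 0                 # keep points with positive depth
    uvw = (K @ pts_cam[in_front].T).T
    uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
    return uv, pts_cam[in_front, 2]
```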
Benchmark Tasks and Findings
Supported Tasks
- Object detection
- 2D semantic segmentation
- RGB-LiDAR fusion 3D point cloud segmentation
- Monocular depth estimation
- Orientation estimation
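For the orientation-estimation task, a commonly reported error measure is the geodesic angle between the predicted and ground-truth rotations. The quaternion-based sketch below is illustrative and not necessarily the benchmark's official metric.

```python
import numpy as np

def rotation_error_deg(q_pred, q_gt):
    """Geodesic angle in degrees between two unit quaternions.

    The absolute value handles the q / -q double cover of SO(3).
    Shown for illustration; not necessarily the benchmark's official metric.
    """
    q_pred = np.asarray(q_pred, dtype=float)
    q_gt = np.asarray(q_gt, dtype=float)
    q_pred /= np.linalg.norm(q_pred)
    q_gt /= np.linalg.norm(q_gt)
    dot = np.clip(abs(np.dot(q_pred, q_gt)), 0.0, 1.0)
    return np.degrees(2.0 * np.arccos(dot))
```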
Key Findings
- Small components such as thrusters and omni-antennas remain difficult to perceive reliably.
- Zero-shot transfer to completely unseen spacecraft is still a major open challenge.
- Increasing the number and diversity of training satellites substantially improves generalization.
Qualitative Demo
A short qualitative demo illustrating the rendering quality and data diversity of SpaceSense-Bench.
Release Status
The repository currently hosts the project page and summary information. Links to the paper, benchmark data, and accompanying resources will be added once the review process concludes.
For updates, please watch the GitHub repository.
BibTeX
@article{wu2026spacesensebench,
  title={SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation},
  author={Wu, Aodi and Zuo, Jianhong and Zhao, Zeyuan and Luo, Xubo and Wang, Ruisuo and Wan, Xue},
  year={2026},
  url={https://github.com/wuaodi/SpaceSense-Bench}
}