SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation

Aodi Wu1,2, Jianhong Zuo3, Zeyuan Zhao1,2, Xubo Luo1,2, Ruisuo Wang2, Xue Wan2
1 University of Chinese Academy of Sciences
2 Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences
3 Nanjing University of Aeronautics and Astronautics

SpaceSense-Bench provides high-fidelity simulated observations for spacecraft perception, combining synchronized RGB images, depth maps, LiDAR point clouds, dense part labels, and accurate 6-DoF poses.

Abstract

Autonomous space operations such as on-orbit servicing and active debris removal demand robust part-level semantic understanding and precise relative navigation of target spacecraft, yet acquiring large-scale real data in orbit remains prohibitively expensive. Existing synthetic datasets, moreover, suffer from limited target diversity, single-modality sensing, and incomplete ground-truth annotations. To bridge these gaps, we present SpaceSense-Bench, a large-scale multi-modal benchmark for spacecraft perception encompassing 136 satellite models and approximately 70 GB of data. Each frame provides time-synchronized 1024×1024 RGB images, millimeter-precision depth maps, and 256-beam LiDAR point clouds, together with dense 7-class part-level semantic labels at both the pixel and point levels and accurate 6-DoF pose ground truth. The dataset is generated by a high-fidelity space simulation built in Unreal Engine 5 and a fully automated pipeline covering data acquisition, multi-stage quality control, and conversion to mainstream formats. Comprehensive benchmarks on object detection, 2D semantic segmentation, RGB-LiDAR fusion 3D point cloud segmentation, monocular depth estimation, and orientation estimation reveal two key findings: (i) perceiving small components such as thrusters and omni-antennas, and generalizing zero-shot to entirely unseen spacecraft, remain critical bottlenecks for current methods; and (ii) scaling up the number of training satellites yields substantial performance gains on novel targets, underscoring the value of large-scale, diverse datasets for space perception research.

Dataset Highlights

136

Satellite models with diverse geometries and structures

~70 GB

Large-scale benchmark data generated in a high-fidelity simulator

3 Modalities

1024×1024 RGB, millimeter-precision depth, and 256-beam LiDAR

7 Classes

Dense part-level semantic labels at both pixel and point levels

6-DoF

Accurate relative pose annotations for each frame

UE5 Pipeline

Automated generation, quality control, and conversion workflow
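Because each frame pairs a millimeter-precision depth map with the RGB image, depth can be lifted into a camera-frame point cloud for cross-checking against the LiDAR modality. The sketch below is illustrative only: the dataset's actual file format and camera intrinsics (`fx`, `fy`, `cx`, `cy`) are not published yet, so it assumes a standard pinhole model and a depth map stored in millimeters.

```python
import numpy as np

def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a metric depth map into a camera-frame point cloud.

    depth_mm : (H, W) array of depths in millimeters (0 = no return).
    fx, fy, cx, cy : assumed pinhole intrinsics, in pixels.
    Returns an (N, 3) array of XYZ points in meters.
    """
    h, w = depth_mm.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float64) / 1000.0  # mm -> m
    valid = z > 0                             # drop empty-space pixels
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    return np.stack([x[valid], y[valid], z[valid]], axis=-1)
```

With real data, the returned points could be transformed by the frame's 6-DoF pose annotation to compare against the synchronized LiDAR sweep.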

Visual Overview

Benchmark Tasks and Findings

Supported Tasks

  • Object detection
  • 2D semantic segmentation
  • RGB-LiDAR fusion 3D point cloud segmentation
  • Monocular depth estimation
  • Orientation estimation
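For the orientation-estimation task, a common scoring choice in spacecraft pose benchmarks is the geodesic angle between the predicted and ground-truth rotations; the paper's exact metric is not specified here, so the snippet below is a hedged sketch of that standard formulation using unit quaternions.

```python
import numpy as np

def rotation_error_deg(q_pred, q_gt):
    """Geodesic angular error (degrees) between two unit quaternions.

    Quaternions are in (w, x, y, z) order (an assumption, not the
    dataset's documented convention). The absolute dot product makes
    the metric invariant to the q / -q double cover of SO(3).
    """
    q_pred = np.asarray(q_pred, dtype=np.float64)
    q_gt = np.asarray(q_gt, dtype=np.float64)
    q_pred = q_pred / np.linalg.norm(q_pred)
    q_gt = q_gt / np.linalg.norm(q_gt)
    d = np.clip(np.abs(np.dot(q_pred, q_gt)), -1.0, 1.0)
    return np.degrees(2.0 * np.arccos(d))
```

An identical prediction scores 0°, and a quarter-turn about any axis scores 90°, which matches the intuitive notion of attitude error.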

Key Findings

  • Small components such as thrusters and omni-antennas remain difficult to perceive reliably.
  • Zero-shot transfer to completely unseen spacecraft is still a major open challenge.
  • Increasing the number and diversity of training satellites substantially improves generalization.

Qualitative Demo

A short qualitative demo illustrating the rendering quality and data diversity of SpaceSense-Bench.

Release Status

The repository currently hosts the project page and summary information. Links to the paper, benchmark data, and accompanying resources will be added once the review process concludes.

For updates, please watch the GitHub repository.

BibTeX

@article{wu2026spacesensebench,
  title={SpaceSense-Bench: A Large-Scale Multi-Modal Benchmark for Spacecraft Perception and Pose Estimation},
  author={Wu, Aodi and Zuo, Jianhong and Zhao, Zeyuan and Luo, Xubo and Wang, Ruisuo and Wan, Xue},
  year={2026},
  url={https://github.com/wuaodi/SpaceSense-Bench}
}