WildRefer #62
Labels
anno: 3D bbox
anno: 3D instance segmentation
anno: 3D object grounding (find a 3D object based on a natural language query)
anno: 3D tracking
data: lidar
data: poses
data: RGB
data source: hucenlife
data source: stcrowd
domen: natural
features: dynamic objects
We propose two novel datasets, STRefer and LifeRefer, which focus on large-scale human-centric daily-life scenarios and come with abundant 3D object and natural language annotations.
For STRefer, we uniformly sampled 662 scenes from the original STCrowd dataset, covering a total length of 65 minutes, and annotated 5,458 natural language descriptions for 3,581 subjects. A scene here means one frame of synchronized LiDAR point cloud and image. Each scene differs in content from the others because the capture location or time changes. We split the data into training and testing sets at a 4:1 ratio without data leakage. LifeRefer contains 25,380 natural language descriptions for 11,864 subjects across 3,172 scenes, with a total length of 103 minutes. Similarly, we split it into 14,650 training samples and 10,730 testing samples without data leakage.
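The description does not say how leakage was avoided. One common approach, shown here only as a minimal sketch, is to split by group rather than by sample, so that every description belonging to the same group lands on the same side. The function name `grouped_split`, the `sequence_id` key, and the sample layout are all hypothetical and are not taken from the WildRefer code.

```python
import random
from collections import defaultdict


def grouped_split(samples, group_key, test_fraction=0.2, seed=0):
    """Split samples into train/test so that no group appears in both.

    `samples` is a list of dicts; `group_key` names the field that
    identifies a group (a hypothetical capture-sequence id here).
    """
    # Bucket samples by their group id.
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[group_key]].append(sample)

    # Shuffle group ids deterministically, then carve off the test groups.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, round(len(group_ids) * test_fraction))
    test_ids = set(group_ids[:n_test])

    train = [s for g in group_ids if g not in test_ids for s in groups[g]]
    test = [s for g in group_ids if g in test_ids for s in groups[g]]
    return train, test
```

With `test_fraction=0.2` the groups are divided roughly 4:1; the ratio of individual descriptions can differ from that when groups are of unequal size.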
Paper Project Code