We present ODIN, a large-scale multi-modal dataset for human behavior understanding using top-view omnidirectional cameras. It captures real-life indoor scenarios with synchronized RGB, infrared, and depth images, egocentric videos, physiological signals, and 3D scans. Notably, ODIN is the first dataset in the field to provide camera-frame 3D human pose estimates for omnidirectional images.
Hand-object pose estimation has diverse applications in computer vision. Current methods trained on balanced datasets may not perform well under the skewed pose distributions encountered in real-world scenarios. We introduce a benchmark for handling pose distribution shifts and propose meta-learning for adaptation. Results improve over the baseline but reveal optimization challenges. Our analysis offers guidance for future benchmark work.
Autonomous drone racing requires robust gate detection, but traditional methods struggle under varying lighting and environmental conditions. This work proposes a semi-synthetic dataset, combining real backgrounds with 3D-rendered gates, for training convolutional neural networks for gate detection.