EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts

Kyoto University

Abstract

Mistake action detection is crucial for developing intelligent archives that detect workers' errors and provide feedback. Existing studies have focused on visually apparent mistakes in free-style activities, resulting in video-only approaches to mistake detection. However, in text-following activities, models cannot determine the correctness of some actions without referring to the texts. Additionally, current mistake datasets rarely use procedural texts for video recording except for cooking. To fill these gaps, this paper proposes the EgoOops dataset, where egocentric videos record erroneous activities when following procedural texts across diverse domains. It features three types of annotations: video-text alignment, mistake labels, and descriptions for mistakes. We also propose a mistake detection approach, combining video-text alignment and mistake label classification to leverage the texts. Our experimental results show that incorporating procedural texts is essential for mistake detection. Data is available through https://y-haneji.github.io/EgoOops-project-page/.

Sample video (color mixture experiments)

Step: Mix powdered detergent and yellow liquid in a cup.

Mistake label: working with wrong objects

Mistake description: use green liquid but should use yellow liquid

Download

Videos

Our laboratory's website (720p, RGB)

Annotations

[Recommended] The GitHub repository
Our laboratory's website

*The both contents are same but we might update the GitHub version.

BibTeX

@misc{haneji2024egooopsdatasetmistakeaction, title={EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts}, author={Yuto Haneji and Taichi Nishimura and Hirotaka Kameko and Keisuke Shirai and Tomoya Yoshida and Keiya Kajimura and Koki Yamamoto and Taiyu Cui and Tomohiro Nishimoto and Shinsuke Mori}, year={2024}, eprint={2410.05343}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2410.05343}, }

EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts

Overview of the EgoOops dataset. EgoOops includes 50 egocentric videos across five procedural domains and corresponding procedural texts. It contains three types of annotations: video-text alignment, mistake labels, and descriptions explaining the errors in each segment.

Abstract

Sample video (color mixture experiments)

Statistics

Download

BibTeX