Seminar by Hokuto Munakata and Tatsuya Komatsu: "Temporal Structure Understanding in Audio and Video"

Title: Temporal Structure Understanding in Audio and Video
Speakers: Hokuto Munakata and Tatsuya Komatsu, Ph.D. (LY Corporation)
Abstract:
This talk provides an overview of our team’s research on music, video, and audio, along with a discussion of our recent work. A central theme of our research is the analysis of temporal structure in multimedia data. In this context, we have worked on problems such as singer diarization in music and alignment between music and text, including the construction of music-to-Japanese-text datasets.
From the perspective of temporal structure understanding, we have recently focused on Moment Retrieval, a task that aims to identify semantically relevant temporal segments within long-form media. In this talk, we present our work on Audio and Video Moment Retrieval, including the proposal of a new task, the development of benchmark datasets, the release of an open-source toolkit, and approaches for multimodal fusion of audio and visual information.
Short Bio:
• Hokuto Munakata
Research Scientist, LY Corporation; works on audio signal processing, video analysis, and multimodal foundation models. DCASE Challenge 2026 Task 6 Organizer.
• Tatsuya Komatsu, Ph.D.
Senior Research Scientist, LY Corporation; works on speech/audio processing and multimodal foundation models. Technical Program Chair: DCASE 2020/2024; APSIPA AAM TC, ICCV 2025 Workshop. DCASE Challenge 2026 Task 6 Organizer.
Link to the online session: https://www.upf.edu/web/mtg/streaming
Activity supported by:
Cátedra UPF-BMAT en Inteligencia Artificial y Música (TSI-100929-2023-1). Project funded by the Secretaría de Estado de Digitalización e Inteligencia Artificial, the European Union-Next Generation EU, and BMAT Music Innovators, the Music Operating System.
