Seminar by Hokuto Munakata and Tatsuya Komatsu: "Temporal Structure Understanding in Audio and Video"

Monday, May 11th 2026 at 12:00h (CEST) - Room 55.309 (3rd floor) Tanger building (UPF Poblenou) and online
07.05.2026

Title: Temporal Structure Understanding in Audio and Video

Hokuto Munakata and Ph.D. Tatsuya Komatsu (LY Corporation)

Abstract: This talk provides an overview of our team’s research on music, video, and audio, along with a discussion of our recent work. A central theme of our research is the analysis of temporal structure in multimedia data. In this context, we have worked on problems such as singer diarization in music and alignment between music and text, including the construction of Music-to-Japanese text datasets. From the perspective of temporal structure understanding, we have recently focused on Moment Retrieval, a task that aims to identify semantically relevant temporal segments within long-form media. In this talk, we present our work on Audio and Video Moment Retrieval, including the proposal of a new task, the development of benchmark datasets, the release of an open-source toolkit, and approaches for multimodal fusion of audio and visual information.

Short Bio:

Hokuto Munakata: Research Scientist, LY Corporation; works on audio signal processing, video analysis, and multimodal foundation models. DCASE Challenge 2026 Task 6 Organizer.

Ph.D. Tatsuya Komatsu: Senior Research Scientist, LY Corporation; works on speech/audio processing and multimodal foundation models. Technical Program Chair: DCASE 2020/2024; APSIPA AAM TC, ICCV2025 Workshop. DCASE Challenge 2026 Task 6 Organizer.

Link to the online session: https://www.upf.edu/web/mtg/streaming

Activity supported by:

Cátedra UPF-BMAT en Inteligencia Artificial y Música (TSI-100929-2023-1). Project funded by Secretaría de Estado de Digitalización e Inteligencia Artificial, the European Union-Next Generation EU, and by BMAT Music Innovators, the Music Operating System.