Contrastive Beam Diffusion Models for Decoding Visual Sequences

Seminar: “Contrastive Beam Diffusion Models for Decoding Visual Sequences”, Prof. João Magalhães (Universidade NOVA de Lisboa)
Friday, 11 April, 11:30, Room 008, Centro Didattico Morgagni, Viale Morgagni 40-44

Abstract: While diffusion models excel at generating high-quality images from text prompts, they struggle to keep image sequences visually consistent. Existing methods generate each image independently, which leads to disjointed narratives. The problem is worse in non-linear storytelling, where scenes must connect beyond adjacent frames. We introduce a novel beam search strategy for latent space exploration that enables conditional generation of full image sequences with beam search decoding. Unlike prior approaches that use fixed latent priors, our method dynamically searches for an optimal sequence of latent representations, ensuring coherent visual transitions. To address beam search’s quadratic complexity, we integrate a contrastive mechanism that efficiently scores search paths and enables pruning, prioritizing alignment with both textual prompts and visual context. Human evaluations confirm that our approach outperforms baseline methods, producing full sequences with superior coherence, visual continuity, and textual alignment. By bridging advances in search optimization and latent space refinement, this work sets a new standard for structured image sequence generation.
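
For readers unfamiliar with beam search, the general idea of scoring partial sequences and pruning all but the best few can be sketched as below. This is a toy illustration only, not the speaker's method: the candidates are plain integers standing in for latent representations, and `toy_score` is a hypothetical hand-written function standing in for a learned contrastive scorer. All names (`beam_search`, `propose`, `toy_score`, `targets`) are assumptions made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Beam = Tuple[Tuple[int, ...], float]  # (partial sequence, cumulative score)


def beam_search(
    num_steps: int,
    propose: Callable[[int, Sequence[int]], List[int]],
    score: Callable[[Sequence[int], int, int], float],
    beam_width: int,
) -> Beam:
    """Keep only the `beam_width` best partial sequences at every step."""
    beams: List[Beam] = [((), 0.0)]
    for step in range(num_steps):
        expanded: List[Beam] = []
        for seq, total in beams:
            for cand in propose(step, seq):
                expanded.append((seq + (cand,), total + score(seq, cand, step)))
        # Pruning: sort by cumulative score and drop everything past the beam width.
        expanded.sort(key=lambda b: b[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


# Toy setup (hypothetical): one target value per step plays the role of the
# text prompt; closeness to the previous element plays the role of visual context.
targets = [1, 5, 2]


def propose(step: int, seq: Sequence[int]) -> List[int]:
    return list(range(6))  # the same six candidates at every step


def toy_score(seq: Sequence[int], cand: int, step: int) -> float:
    prompt_term = -abs(cand - targets[step])              # alignment with the "prompt"
    context_term = -abs(cand - seq[-1]) if seq else 0.0   # continuity with the previous item
    return float(prompt_term + context_term)


best_seq, best_score = beam_search(len(targets), propose, toy_score, beam_width=3)
print(best_seq, best_score)
```

The score here trades off two terms, matching the abstract's description of prioritizing alignment with both the prompt and the context; how the actual contrastive mechanism computes its scores is not specified in this announcement.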

Bio: João Magalhães is a Full Professor at the Department of Computer Science, Universidade NOVA de Lisboa. He is national co-Director of the CMU-Portugal partnership and leads the Multimodal Systems Group at NOVA LINCS. He holds a PhD from Imperial College London (2008) and conducts research at the intersection of AI, vision, and language, focusing on generative models, controllable LLMs, multimedia search, multimodal conversational AI, and temporal models. João has coordinated and contributed to numerous international projects with partners such as BBC, Amazon, and Google. He has held key organizational roles in top-tier conferences, including General Chair of ACM Multimedia 2022 and PC Chair of ACM Multimedia 2026. His work has earned multiple awards, including first and second place in the Amazon Alexa Taskbot Challenge. He also contributed to the MPEG-7 and MPEG-21 standards during his time in industry. João is currently a member of the ACM Multimedia Steering Committee.
