Junyu Xie

I am currently a third-year DPhil student in the Visual Geometry Group (VGG) at the University of Oxford, advised by Prof. Andrew Zisserman and Prof. Weidi Xie. Before that, I received my MSc and BA degrees from the University of Cambridge in 2021, majoring in Natural Sciences.

My research interests lie in computer vision, specifically in object-centric learning, motion segmentation, and multimodal video understanding and generation.

Email  /  Google Scholar  /  Github

Publications
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
In ACCV, 2024  
ArXiv / Bibtex / Project page / Code / Dataset (TV-AD)

In this paper, we propose AutoAD-Zero, a training-free framework for zero-shot Audio Description (AD) generation for movies and TV series. The overall framework features two stages (dense description + AD summary), with character information injected via visual-textual prompting.

Moving Object Segmentation: All You Need Is SAM (and Flow)
Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman
In ACCV, 2024   (Oral)
ArXiv / Bibtex / Project page / Code

This paper focuses on motion segmentation by incorporating optical flow into the Segment Anything Model (SAM), applying flow information either as direct inputs (FlowISAM) or as prompts (FlowPSAM).

Appearance-Based Refinement for Object-Centric Motion Segmentation
Junyu Xie, Weidi Xie, Andrew Zisserman
In ECCV, 2024  
ArXiv / Bibtex / Project page

This paper aims to improve flow-only motion segmentation (e.g. OCLR predictions) by leveraging appearance information across video frames. A selection-correction pipeline is developed, along with a test-time model adaptation scheme that further alleviates the Sim2Real disparity.

SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi
In CVPR, 2024
ArXiv / Bibtex / Project page / Code / Demo

This paper presents a method, named SHAP-EDITOR, aiming at fast 3D editing (within one second). To achieve this, we propose to learn a universal editing function that can be applied to different objects in a feed-forward manner.

Segmenting Moving Objects via an Object-Centric Layered Representation
Junyu Xie, Weidi Xie, Andrew Zisserman
In NeurIPS, 2022  
ArXiv / Bibtex / Project page / Code

In this paper, we propose the OCLR model for discovering, tracking, and segmenting multiple moving objects in a video without relying on human annotations. This object-centric segmentation model utilises depth-ordered layered representations and is trained following a Sim2Real procedure.


This website template was originally designed by Jon Barron.