Junyu Xie

I am currently a fourth-year DPhil student in the Visual Geometry Group (VGG) at the University of Oxford, advised by Prof. Andrew Zisserman and Prof. Weidi Xie.

Before that, I completed my undergraduate studies at the University of Cambridge, receiving MSc and BA degrees in Natural Sciences (Physics). During this time, I did summer internships in machine learning and physics at Caltech, Fudan University, and the University of Cambridge.

Email / Google Scholar / GitHub


Research

My research focuses on long-form video understanding, object-centric learning, and motion segmentation. I am also interested in representation learning, image and video generation, and multimodal language models.

AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description
Junyu Xie, Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman
In ACCV, 2024
arXiv / BibTeX / Project page / Code / Dataset (TV-AD)

In this paper, we propose AutoAD-Zero, a training-free framework for zero-shot Audio Description (AD) generation for movies and TV series. The framework has two stages (dense description followed by AD summary), with character information injected via visual-textual prompting.

Moving Object Segmentation: All You Need Is SAM (and Flow)
Junyu Xie, Charig Yang, Weidi Xie, Andrew Zisserman
In ACCV, 2024 (Oral)
arXiv / BibTeX / Project page / Code

This paper tackles motion segmentation by incorporating optical flow into the Segment Anything Model (SAM), using flow information either as direct inputs (FlowISAM) or as prompts (FlowPSAM).

Appearance-Based Refinement for Object-Centric Motion Segmentation
Junyu Xie, Weidi Xie, Andrew Zisserman
In ECCV, 2024
arXiv / BibTeX / Project page

This paper aims to improve flow-only motion segmentation (e.g., OCLR predictions) by leveraging appearance information across video frames. We develop a selection-correction pipeline, together with a test-time model adaptation scheme that further reduces the Sim2Real gap.

SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi
In CVPR, 2024
arXiv / BibTeX / Project page / Code / Demo

This paper presents SHAP-EDITOR, a method for fast 3D editing (within one second). To achieve this, we propose learning a universal editing function that can be applied to different objects in a feed-forward manner.

Segmenting Moving Objects via an Object-Centric Layered Representation
Junyu Xie, Weidi Xie, Andrew Zisserman
In NeurIPS, 2022
arXiv / BibTeX / Project page / Code

In this paper, we propose the OCLR model for discovering, tracking, and segmenting multiple moving objects in a video without relying on human annotations. This object-centric segmentation model uses depth-ordered layered representations and is trained following a Sim2Real procedure.


This website template was originally designed by Jon Barron.