Shuaifeng Zhi

I am now a Lecturer (Assistant Professor) at the Department of Electronic Science and Technology, National University of Defense Technology (NUDT), China.
Before joining NUDT as a faculty member, I obtained my Ph.D. at the Dyson Robotics Lab at Imperial College London, supervised by Prof. Andrew J. Davison and Dr. Stefan Leutenegger. I completed my M.Sc.Eng. and B.Eng. at NUDT, China.

My research interests lie in semantic SLAM and semantic neural representations, combining semantics with SLAM systems through learning-based approaches.

Email  /  Google Scholar  /  LinkedIn  /  Twitter

Events and News

July 2023: Happy to announce that our paper PlaneRecTR got accepted to ICCV 2023 and RaNeRF got accepted to T-GRS 2023!

June 2023: Happy to announce that our paper TsCM got accepted to T-PAMI 2023!

Jan 2023: Happy to announce that our paper EDAL got accepted to ICLR 2023!

Dec 2022: Happy to announce that our paper iLabel got accepted to RA-L!

Feb 2022: The code and data of Semantic-NeRF are now released!

Jan 2022: Happy to announce that our paper ReCo got accepted to ICLR 2022.

Oct 2021: Attended the GAMES Webinar on semantic scene representations.

Oct 2021: Attended the ICCV 2021 3DReps Workshop and presented Semantic-NeRF in the poster session.

Sep 2021: Gave a talk on Semantic-NeRF in a webinar on neural implicit representations held by GAUSSIAN ROBOTICS and TechBeat.

July 2021: Happy to announce that our paper Semantic-NeRF got accepted to ICCV 2021 as an Oral Presentation (top 3%)!

Feb 2019: SceneCode got accepted to CVPR 2019.

July 2018: Participated in the International Computer Vision Summer School (ICVSS 2018) in Sicily, Italy.


PlaneRecTR: Unified Query learning for 3D Plane Recovery from a Single View
Jingjia Shi, Shuaifeng Zhi, Kai Xu
International Conference on Computer Vision (ICCV), 2023
arxiv / video / video (bilibili) / code

PlaneRecTR is a vision transformer architecture with query-based learning that, for the first time, unifies all subtasks of single-view plane recovery in a single compact model. It exploits the mutual benefits between planar geometry and segmentation, achieving SOTA performance on the ScanNet and NYUv2-Plane datasets.


RaNeRF: Neural 3D Reconstruction of Space Targets from ISAR Image Sequences
Afei Liu, Shuanghui Zhang, Chi Zhang, Shuaifeng Zhi, Xiang Li
IEEE Transactions on Geoscience and Remote Sensing (T-GRS), 2023

RaNeRF is a novel 3D target reconstruction method that uses only observed ISAR image sequences and is built upon powerful neural radiance fields.


Unbiased Scene Graph Generation via Two-stage Causal Modeling
Shuzhou Sun, Shuaifeng Zhi, Qing Liao, Janne Heikkila, Li Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2023
arxiv / Early Access

We propose Two-stage Causal Modeling (TsCM) for unbiased SGG predictions. Taking the long-tailed distribution and semantic confusion as confounders in the Structural Causal Model (SCM), we decouple the causal intervention into two stages. As a model-agnostic method, TsCM achieves a better tradeoff between head and tail relationships.


ROFusion: Efficient Object Detection using Hybrid Point-wise Radar-Optical Fusion
Liu Liu, Shuaifeng Zhi, Zhenhua Du, Li Liu, Xinyu Zhang, Kai Huo, Weidong Jiang
International Conference on Artificial Neural Networks (ICANN), 2023

We propose ROFusion, a hybrid point-wise Radar-Optical fusion approach for object detection in autonomous driving scenarios, motivated by the dense contextual information available in both the range-Doppler spectrum and images.


S4R: Self-Supervised Semantic Scene Reconstruction from RGB-D Scans
Junwen Huang, Alexey Artemov, Yujin Chen, Shuaifeng Zhi, Kai Xu, Matthias Nießner
arxiv pre-print, 2023

We leverage differentiable rendering to learn complete 3D scans with semantic labelling without any 3D annotations.


Evidential Uncertainty and Diversity Guided Active Learning for Scene Graph Generation
Shuzhou Sun, Shuaifeng Zhi, Janne Heikkila, Li Liu
International Conference on Learning Representations (ICLR), 2023

We propose EDAL, a novel active learning (AL) framework tailored for the SGG task, leveraging Evidential Deep Learning (EDL) and diversity-based debiasing modules.


iLabel: Revealing Objects in Neural Fields
Shuaifeng Zhi*, Edgar Sucar*, Andre Mouton, Iain Haughton, Tristan Laidlow, Andrew J. Davison ( * denotes equal contributions.)
IEEE Robotics and Automation Letters (RA-L)
arxiv / video / project page

We build an online interactive 3D scene labelling and understanding system upon a 3D neural field representation of geometry, colour and semantics.


Bootstrapping Semantic Segmentation with Regional Contrast
Shikun Liu, Shuaifeng Zhi, Edward Johns, Andrew J. Davison
International Conference on Learning Representations (ICLR), 2022
arxiv / code / project page

We present ReCo, a new contrastive learning framework designed at a regional level to assist learning in semantic segmentation. ReCo performs (semi-)supervised pixel-level contrastive learning on a sparse set of hard negative pixels. With minimal extra memory footprint, ReCo boosts existing baselines by a large margin and also reveals hierarchical similarities among semantic classes.


In-Place Scene Labelling and Understanding with Implicit Scene Representation
Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger, Andrew J. Davison
International Conference on Computer Vision (ICCV), 2021
(Oral Presentation)
arxiv / video / video (bilibili) / project page / code

We show that neural radiance fields (NeRF) contain strong priors for scene clustering and segmentation. The internal multi-view consistency and smoothness make the training process itself a multi-view semantic fusion process. Such a scene-specific implicit semantic representation can be efficiently learned with various sparse or noisy annotations, leading to accurate dense labelling of the full scene.

SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations
Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
arxiv / video

We show that an efficient code representation is able to control the semantic label prediction of an image. Latent codes of overlapping images can be jointly optimised to perform coherent semantic fusion. We also show how this approach can be used within a monocular keyframe-based semantic mapping system where a similar code approach is used for geometry.


Toward real-time 3D object recognition: A lightweight volumetric CNN framework using multitask learning
Shuaifeng Zhi, Yongxiang Liu, Xiang Li, Yulan Guo
Computers & Graphics, 2017

We propose LightNet, a lightweight 3D volumetric CNN for real-time 3D object classification.

This paper subsumes the 3DOR 2017 paper LightNet.


LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition
Shuaifeng Zhi, Yongxiang Liu, Xiang Li, Yulan Guo
Eurographics Workshop on 3D Object Retrieval (3DOR), 2017


Reviewer in 2022: CVPR, ICLR, ICML, ICME, ICVR (ordinary PC member), WCCI

Reviewer in 2021: CVPR, ICCV, NeurIPS, ICLR, ICME, IJCNN

Reviewer in 2020: ICME

Reviewer in 2019: CVPR-W, ICCV-W, ICRA-W

Lab Assistant, Robotics (Online with CoppeliaSim!), Spring 2021

Lab Assistant, Robotics, Autumn 2019

Lab Assistant, Robotics, Spring 2019

Lab Assistant, Robotics, Autumn 2018

Lab Assistant, Advanced-Robotics, Spring 2018

Thanks to Dr. Jon Barron for sharing the source code of this website.