Shuaifeng Zhi (智帅峰)

I am a Lecturer (Assistant Professor) at the Department of Electronic Science and Technology, National University of Defense Technology (NUDT), China. I was selected for the 10th Youth Talent Support Program of the China Association for Science and Technology (CAST) and am a grantee of the Excellent Young Scientists Fund of the Hunan Provincial Natural Science Foundation.
Before that, I obtained my Ph.D. at the Dyson Robotics Lab, Imperial College London, supervised by Prof. Andrew J. Davison and Dr. Stefan Leutenegger, and completed my M.Sc.Eng. and B.Eng. at NUDT, China. I was also a CSC-funded visiting student for six months at 5GIC, University of Surrey, in 2015.

My research interests lie in semantic SLAM and semantic neural representations, combining semantics with SLAM systems through learning-based approaches.

Email / Google Scholar / LinkedIn / GitHub

Events and News

Sep 2025: Happy to announce that our two papers PlaneRecTR++ and StepSPT got accepted to IEEE T-PAMI 2025, and RemixFusion got accepted to ACM TOG 2025!

Apr 2025: Happy to announce that our paper CDFSL Survey got accepted to ACM CSUR 2025!

Sep 2024: Happy to announce that our paper MOSE got accepted to IEEE RA-L 2024!

June 2024: Happy to announce that our paper SSR-2D got accepted to IEEE T-PAMI 2024!

July 2023: Happy to announce that our paper PlaneRecTR got accepted to ICCV 2023!

July 2021: Happy to announce that our paper Semantic-NeRF got accepted to ICCV 2021 as an Oral Presentation (top 3%)!

Selected Publications

PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation
Jingjia Shi, Shuaifeng Zhi, Kai Xu
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2025
arxiv

We propose PlaneRecTR++, a Transformer-based architecture, which for the first time unifies all tasks of multi-view planar reconstruction and pose estimation within a compact single-stage framework.

RemixFusion: Residual-based Mixed Representation for Large-scale Online RGB-D Reconstruction
Yuqing Lan, Chenyang Zhu, Shuaifeng Zhi, Jiazhao Zhang, Zhoufeng Wang, Renjiao Yi, Yijie Wang, Kai Xu
ACM Transactions on Graphics (TOG), To be presented at SIGGRAPH Asia, 2025
arxiv / project page

We present RemixFusion, a residual-based RGB-D framework that combines explicit and implicit representations for large-scale online dense reconstruction, supporting real-time, fine-grained reconstruction in a memory-efficient way.

Step-wise Distribution-aligned Style Prompt Tuning for Source-Free Cross-domain Few-shot Learning
Huali Xu, Li Liu, Tianpeng Liu, Shuaifeng Zhi, Shuzhou Sun, Ming-Ming Cheng
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2025
arxiv / code

This paper investigates the source-free CDFSL (SF-CDFSL) problem using only a pre-trained model and a few target samples, without requiring source data or training strategies. We propose StepSPT to implicitly narrow the domain gaps from the perspective of prediction distribution optimization.

Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey
Huali Xu, Shuaifeng Zhi, Shuzhou Sun, Vishal M. Patel, Li Liu
ACM Computing Surveys (CSUR), 2025
arxiv

We present the first comprehensive review of Cross-domain Few-shot Learning (CDFSL), covering key problems, existing methods, and future research directions.

A Causal Adjustment Module for Debiasing Scene Graph Generation
Li Liu, Shuzhou Sun, Shuaifeng Zhi, Fan Shi, Zhen Liu, Janne Heikkilä, Yongxiang Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2025
arxiv

We employ causal inference techniques to model the causality among skewed object and object pair distributions in SGG, achieving SOTA mean recall as well as promising zero-shot recall.

SSR-2D: Semantic 3D Scene Reconstruction from 2D Images
Junwen Huang, Alexey Artemov, Yujin Chen, Shuaifeng Zhi, Kai Xu, Matthias Nießner
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2024
arxiv

We leverage differentiable rendering to learn complete 3D scans with semantic labelling without any 3D annotations.

MOSE: Monocular Semantic Reconstruction Using NeRF-Lifted Noisy Priors
Zhenhua Du, Binbin Xu, Haoyu Zhang, Kai Huo, Shuaifeng Zhi
IEEE Robotics and Automation Letters (RA-L), 2024
arxiv

We propose MOSE, a neural field semantic reconstruction approach that lifts inferred image-level noisy priors to 3D, producing accurate semantics and geometry in both 3D and 2D space.

OS³Flow: Optical and SAR Image Registration Using Symmetry-Guided Semi-Dense Optical Flow
Zixuan Sun, Shuaifeng Zhi, Kai Huo, Xuecong Liu, Weidong Jiang, Yongxiang Liu
IEEE Geoscience and Remote Sensing Letters (GRSL), 2024

We introduce OS³Flow, a novel registration framework that uses the implicit symmetry between heterogeneous optical-SAR image pairs to extract high-quality semi-dense flow estimations.

Enhancing Information Maximization with Distance-Aware Contrastive Learning for Source-Free Cross-Domain Few-Shot Learning
Huali Xu, Li Liu, Shuaifeng Zhi, Shaojing Fu, Zhuo Su, Ming-Ming Cheng, Yongxiang Liu
IEEE Transactions on Image Processing (T-IP), 2024
arxiv / IEEE Link

We explore the Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) problem, leveraging an enhanced information maximization (IM) strategy and a transductive mechanism.

Looking Beneath More: A Sequence-based Localizing Ground Penetrating Radar Framework
Pengyu Zhang, Shuaifeng Zhi, Yuelin Yuan, Beizhen Bi, Qin Xin, Xiaotao Huang, Liang Shen
IEEE International Conference on Robotics and Automation (ICRA), 2024

We propose a sequence-based framework for Localizing Ground Penetrating Radar (LGPR) to address the challenges caused by the extreme scarcity of underground features and the false candidate matches they introduce.

PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View
Jingjia Shi, Shuaifeng Zhi, Kai Xu
International Conference on Computer Vision (ICCV), 2023
arxiv / video / video (bilibili) / code / project page

PlaneRecTR is a vision transformer architecture with query-based learning that, for the first time, unifies all subtasks of single-view plane recovery in a single compact model. PlaneRecTR exploits the mutual benefits between planar geometry and segmentation, achieving SOTA performance on the ScanNet and NYUv2-Plane datasets.

RaNeRF: Neural 3D Reconstruction of Space Targets from ISAR Image Sequences
Afei Liu, Shuanghui Zhang, Chi Zhang, Shuaifeng Zhi, Xiang Li
IEEE Transactions on Geoscience and Remote Sensing (T-GRS), 2023

RaNeRF is a novel 3D target reconstruction method that uses only observed ISAR image sequences, built upon powerful neural radiance fields.

Unbiased Scene Graph Generation via Two-stage Causal Modeling
Shuzhou Sun, Shuaifeng Zhi, Qing Liao, Janne Heikkilä, Li Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2023
arxiv / Early Access

We propose Two-stage Causal Modeling (TsCM) for unbiased SGG predictions. Taking the long-tailed distribution and semantic confusion as confounders to the Structural Causal Model (SCM), we decouple the causal intervention into two stages. As a model-agnostic method, TsCM achieves a better tradeoff between head and tail relationships.

ROFusion: Efficient Object Detection using Hybrid Point-wise Radar-Optical Fusion
Liu Liu, Shuaifeng Zhi, Zhenhua Du, Li Liu, Xinyu Zhang, Kai Huo, Weidong Jiang
International Conference on Artificial Neural Networks (ICANN), 2023
arxiv

We propose ROFusion, a hybrid point-wise Radar-Optical fusion approach for object detection in autonomous driving scenarios, motivated by dense contextual information from both the range-Doppler spectrum and images.

Evidential Uncertainty and Diversity Guided Active Learning for Scene Graph Generation
Shuzhou Sun, Shuaifeng Zhi, Janne Heikkilä, Li Liu
International Conference on Learning Representations (ICLR), 2023
OpenReview

We propose EDAL, a novel AL framework tailored for the SGG task, leveraging Evidential Deep Learning (EDL) and diversity-based debias modules.

iLabel: Revealing Objects in Neural Fields
Shuaifeng Zhi*, Edgar Sucar*, Andre Mouton, Iain Haughton, Tristan Laidlow, Andrew J. Davison (* denotes equal contribution)
IEEE Robotics and Automation Letters (RA-L)
arxiv / video / project page

We build an online interactive 3D scene labelling and understanding system upon a 3D neural field representation of geometry, colour and semantics.

Bootstrapping Semantic Segmentation with Regional Contrast
Shikun Liu, Shuaifeng Zhi, Edward Johns, Andrew J. Davison
International Conference on Learning Representations (ICLR), 2022
arxiv / code / project page

We present ReCo, a new contrastive learning framework designed at a regional level to assist learning in semantic segmentation. ReCo performs (semi-)supervised pixel-level contrastive learning on a sparse set of hard negative pixels. With minimal extra memory footprint, ReCo boosts existing baselines by a large margin and also reveals hierarchical similarities among semantic classes.

In-Place Scene Labelling and Understanding with Implicit Scene Representation
Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger, Andrew J. Davison
International Conference on Computer Vision (ICCV), 2021 (Oral Presentation)
arxiv / video / video (bilibili) / project page / code

We show that neural radiance fields (NeRF) contain strong priors for scene clustering and segmentation. The internal multi-view consistency and smoothness make the training process itself a multi-view semantic fusion process. Such a scene-specific implicit semantic representation can be efficiently learned with various sparse or noisy annotations, leading to accurate dense labelling of the full scene.

SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations
Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
arxiv / video

We show that an efficient code representation is able to control the semantic label prediction of an image. Latent codes of overlapping images can be jointly optimised to perform coherent semantic fusion. We also show how this approach can be used within a monocular keyframe-based semantic mapping system where a similar code approach is used for geometry.

Toward real-time 3D object recognition: A lightweight volumetric CNN framework using multitask learning
Shuaifeng Zhi, Yongxiang Liu, Xiang Li, Yulan Guo
Computers & Graphics, 2017
bibtex

We propose LightNet, a lightweight 3D volumetric CNN for real-time 3D object classification. This paper subsumes the 3DOR 2017 paper LightNet.

LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition
Shuaifeng Zhi, Yongxiang Liu, Xiang Li, Yulan Guo
Eurographics Workshop on 3D Object Retrieval (3DOR), 2017
bibtex

Service

Reviewer in 2022: CVPR, ICLR, ICML, ICME, ICVR (ordinary PC member), WCCI
Reviewer in 2021: CVPR, ICCV, NeurIPS, ICLR, ICME, IJCNN
Reviewer in 2020: ICME
Reviewer in 2019: CVPR-W, ICCV-W, ICRA-W

Lab Assistant, Robotics (Online with CoppeliaSim!), Spring 2021
Lab Assistant, Robotics, Autumn 2019
Lab Assistant, Robotics, Spring 2019
Lab Assistant, Robotics, Autumn 2018
Lab Assistant, Advanced-Robotics, Spring 2018

Thanks to Dr. Jon Barron for sharing the source code of this website.