Shengkun (Bryson) Tang

Welcome to my website~ My name is Shengkun Tang; you can call me Bryson for short. Currently, I am a first-year PhD student in Machine Learning at MBZUAI, under the supervision of Prof. Zhiqiang Shen. During my gap year, I had a wonderful time as a research assistant in DASLab at ISTA, working with Prof. Dan Alistarh. In addition, I collaborated closely with Prof. Dongkuan Xu (NCSU) and Dr. Yaqing Wang (Google DeepMind) on efficient multi-modal models. I received my B.E. in Remote Sensing from Wuhan University, under the supervision of Prof. Jian Yao and Prof. Xin Su.

Email  /  Google Scholar  /  GitHub

Last updated: Feb. 4th 2025

profile photo
News

02/2025: Happy to release the code and pretrained weights of Bi-Mamba; please check them out here.

08/2024: Started my PhD at MBZUAI.

05/2023: Invited to serve as a reviewer for the International Workshop on Resource-Efficient Learning for Knowledge Discovery at KDD 2023.

05/2023: Invited to give a talk at TechBeat (将门创投) on June 8, 2023. Welcome!

02/2023: My first paper, on accelerating inference of vision-language models, was accepted by CVPR 2023. This is my first work before starting the PhD program. Super excited :). Thanks to all co-authors for their support.

09/2022: I joined the Intelligent Automotive Group (IAG) at SenseTime as a system developer. I will build systems for various perception modules of self-driving.



Research

My research interests lie in landable (deployable) Artificial Intelligence, focusing on the resource efficiency and trustworthiness of AI systems. My research covers the whole pipeline of an AI system, providing full-stack solutions ranging from theoretical optimization methods and data-centric strategies to the development of efficient, interpretable, and reliable deep learning techniques and the co-design of algorithms and hardware.

  • Resource-Efficient Training & Inference Algorithms

  • Data Optimization to Improve Data Quality & Efficiency

  • Scalable Methods for AI Systems with Theoretical Guarantees

  • Algorithm-Hardware Co-design for Acceleration

  • Application Scenario: Multi-Modal (Vision-Language), Uni-Modal (NLP, Computer Vision)

If you are interested in my research and seeking collaboration, feel free to contact me. All kinds of collaboration are welcome.

Publications



DALD: Improving Logits-based Detector without Logits from Black-box LLM
Cong Zeng*, Shengkun Tang*, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu
[NeurIPS 2024] The Thirty-eighth Annual Conference on Neural Information Processing Systems
arXiv / code

We propose a simple but quite effective method to improve the performance of black-box LLM detection. DALD collects a small amount of data from the target model and trains a surrogate model to align the surrogate's distribution with that of the target model.
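A minimal sketch of this alignment step, assuming a Hugging Face-style API; the surrogate name and corpus below are placeholders, and this illustrates the idea rather than the released implementation:

import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

surrogate_name = "gpt2"  # assumption: any open-weight surrogate LM
target_samples = ["text sampled from the black-box target model", "..."]  # placeholder corpus

tok = AutoTokenizer.from_pretrained(surrogate_name)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(surrogate_name)
optim = torch.optim.AdamW(model.parameters(), lr=1e-5)

def collate(batch):
    enc = tok(batch, return_tensors="pt", padding=True, truncation=True, max_length=512)
    enc["labels"] = enc["input_ids"].clone()  # standard causal-LM objective
    return enc

model.train()
for batch in DataLoader(target_samples, batch_size=4, collate_fn=collate):
    loss = model(**batch).loss  # fine-tuning on target-generated text nudges the surrogate's logits toward the target's
    loss.backward()
    optim.step()
    optim.zero_grad()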



AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu
[ECCV 2024] European Conference on Computer Vision
arXiv / code

We propose an uncertainty estimation module (UEM) to decide the exit point at each timestep during diffusion model inference. Moreover, we propose an uncertainty-aware layer-wise loss to recover the performance of the early-exited model.
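A toy sketch of the step-wise early-exit idea; module names, shapes, and the threshold are illustrative assumptions, not the paper's implementation. A small uncertainty head after each backbone block decides whether to stop computation at the current denoising step.

import torch
import torch.nn as nn

class UncertaintyHead(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, 1))

    def forward(self, h):
        # one scalar uncertainty per sample, averaged over tokens
        return torch.sigmoid(self.mlp(h)).mean(dim=tuple(range(1, h.dim())))

class EarlyExitBackbone(nn.Module):
    def __init__(self, dim=256, depth=8, threshold=0.1):
        super().__init__()
        self.blocks = nn.ModuleList(nn.TransformerEncoderLayer(dim, 4, batch_first=True) for _ in range(depth))
        self.heads = nn.ModuleList(UncertaintyHead(dim) for _ in range(depth))
        self.out = nn.Linear(dim, dim)
        self.threshold = threshold

    def forward(self, x):
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            if head(x).max() < self.threshold:  # confident enough -> exit early at this timestep
                break
        return self.out(x)

noise_pred = EarlyExitBackbone()(torch.randn(2, 64, 256))  # one denoising step on dummy features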



You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, Dongkuan Xu
[CVPR 2023] The IEEE/CVF Conference on Computer Vision and Pattern Recognition
arXiv / code

We propose a novel early exiting strategy based on cascading input similarity, built on validated assumptions about saturation states in vision-language models. This is a pioneering exploration of extending early exiting to both the encoders and decoders of sequence-to-sequence architectures.
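A minimal illustration of similarity-based exiting (an assumption-laden sketch, not the released code): when the hidden states of consecutive layers become nearly identical, the representation is treated as saturated and the remaining layers are skipped.

import torch.nn.functional as F

def run_with_early_exit(layers, hidden, threshold=0.99):
    """layers: iterable of nn.Modules mapping hidden -> hidden (threshold is an assumed value)."""
    for layer in layers:
        new_hidden = layer(hidden)
        # cosine similarity between consecutive layer outputs, averaged over tokens
        sim = F.cosine_similarity(new_hidden, hidden, dim=-1).mean()
        hidden = new_hidden
        if sim > threshold:  # representation has saturated -> skip the remaining layers
            break
    return hidden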



DDR-Net: Learning Multi-Stage Multi-View Stereo With Dynamic Depth Range
Puyuan Yi*, Shengkun Tang*, Jian Yao
Preprint, 2021
arXiv / code

We propose a Dynamic Depth Range Network (DDR-Net) that dynamically determines the depth range hypotheses by applying a range estimation module (REM) to learn the uncertainties of the range hypotheses from earlier stages.
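A small hypothetical sketch of the dynamic-range idea (the function and tensor shapes are assumptions for illustration): take the expected depth and its spread from a coarse stage's probability volume and use them to narrow the hypotheses for the next, finer stage.

import torch

def next_stage_depth_range(prob_volume, depth_hyps, k=3.0):
    """prob_volume: (B, D, H, W) softmax over D depth hypotheses.
       depth_hyps:  (D,) depth values of the current stage."""
    d = depth_hyps.view(1, -1, 1, 1)
    mean = (prob_volume * d).sum(dim=1)                            # expected depth, (B, H, W)
    var = (prob_volume * (d - mean.unsqueeze(1)) ** 2).sum(dim=1)  # per-pixel uncertainty
    std = var.clamp_min(1e-8).sqrt()
    return mean - k * std, mean + k * std                          # narrowed per-pixel range for the next stage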



Scale-Robust Deep-Supervision Network for Mapping Building Footprints From High-Resolution Remote Sensing Images
Haonan Guo, Xin Su, Shengkun Tang, Bo Du, Liangpei Zhang
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021
PDF

We propose a novel deep-supervision convolutional neural network (denoted as DS-Net) for extracting building footprints from high-resolution remote sensing images.

Intern Experience



SenseTime Engineering, 06/2021 - 10/2021

Vision Algorithm Intern Researcher
Project: SenseRobot chess robot, working with Ruodai Li
Work Experience



SenseTime, Intelligent Automotive Group (IAG), 05/2022 - Now

System Developer
Project: Large-Scale Self-Driving System Development
Contest

Baidu Astar Developer Competition, 05/2020 - 10/2020

Ranking: 7/2305 (teams)

The task of Baidu Astar 2020 was detecting and matching traffic signs and surveillance cameras. I was in charge of the detection task. I addressed the data imbalance problem with my own data augmentation strategy, which made surveillance-camera detection more accurate. We reached the final and ranked 7th out of 2305 teams.
Professional Services
  • Program Committee Member:
    • ICML 2025
    • ICLR 2025
    • AISTATS 2025
    • NeurIPS 2024
    • KDD 2023, 2024
    • AAAI 2023

  • This template comes from this source code; thanks to the author for the fantastic website template.