Shengkun Tang

Welcome to my website! My name is Shengkun Tang; you can call me Bryson for short. I am currently a research intern on the Alibaba Qwen Team and a PhD student in Machine Learning at MBZUAI, under the supervision of Prof. Zhiqiang Shen. During my gap year, I had a wonderful time as a research assistant in DASLab at ISTA, working with Prof. Dan Alistarh. I have also collaborated closely with Prof. Dongkuan Xu (NCSU) and Dr. Yaqing Wang (Google DeepMind) on efficient multi-modal models. I received my B.E. in Remote Sensing from Wuhan University, under the supervision of Prof. Jian Yao and Prof. Xin Su.

Email  /  Google Scholar  /  Github

Last updated: May 12, 2026

profile photo
News
  • 05/2026: I am very happy to release SlimQwen. Please check it out if you would like to learn more about Qwen3.5 pretraining. Paper Link
  • 02/2026: Very excited to release the Qwen3.5 & Qwen3.6 series models! Please give them a try if you are interested! Blog Link
  • 01/2026: One paper accepted by CVPR 2026! Congratulations to Jiacheng!
  • 01/2026: One paper accepted by DMLR! Congratulations to the team!
  • 10/2025: Bi-Mamba is accepted by TMLR! Thanks to all collaborators!
  • 10/2025: One co-first-author paper accepted by NeurIPS 2025. Congratulations to Cong!
  • 06/2025: One paper accepted by ICCV 2025. Congratulations to Bowei!
  • 06/2025: Started my research internship with the Qwen pretraining team!
  • 04/2025: One paper accepted by the 2nd Re-Align Workshop at ICLR 2025. Congratulations to Xuanjie and Cong!
  • 02/2025: Happy to release the code and pretrained weights of Bi-Mamba, please check here.
  • 08/2024: Started my PhD at MBZUAI.
  • 05/2023: Invited to serve as Reviewer for International Workshop on Resource-Efficient Learning for Knowledge Discovery at KDD 2023.
  • 05/2023: Invited to give a talk at 将门创投 (Jiangmen Ventures) on June 8, 2023. Welcome!
  • 02/2023: My first paper, on accelerating inference of vision-language models, was accepted by CVPR 2023. Super excited :). Thanks to all co-authors for their support.
  • 09/2022: I joined the Intelligent Automotive Group (IAG) at SenseTime as a system developer, building systems for various perception modules for self-driving.

Research

My research focuses on building efficient, reliable, and deployable AI systems. I am interested in improving the full pipeline of modern foundation models, from architecture design and training to inference, data, and evaluation.

Specifically, my research spans four directions:

  • Inference Efficiency. I develop methods that reduce the computational and memory cost of large models during deployment, including structured pruning, quantization, adaptive computation, and token pruning.

  • Training Efficiency. I study resource-efficient training methods that improve model capability under limited computational budgets, including efficient optimization, data-efficient learning, and scalable training strategies.

  • Novel Model Architectures. I design compact and scalable model architectures for efficient intelligence, including work such as SlimQwen and other architecture-level innovations.

  • Data-Centric AI and Trustworthy Evaluation. I also study data quality, efficient data usage, benchmarks, and trustworthy evaluation.

I am always open to research collaborations. Please feel free to contact me if you are interested in efficient AI systems, foundation models, or related topics.

Publications

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
Shengkun Tang*, Zekun Wang*, Bo Zheng*, Liangyu Wang, Rui Men, Siqi Zhang, Xiulong Yuan, Zihan Qiu, Zhiqiang Shen, Dayiheng Liu
Paper
Qwen3.5: Towards Native Multimodal Agents
Core Contributor
Blog / code / model collection
BiGain: Unified Token Compression for Joint Generation and Classification
Jiacheng Liu*, Shengkun Tang*, Jiacheng Cui, Dongkuan Xu, Zhiqiang Shen
[CVPR 2026] The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2026
Paper / code
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark
Sondos Mahmoud Bsharat, Mukul Ranjan, Aidar Myrzakhan, Jiacheng Liu, Bowei Guo, Shengkun Tang, Zhuang Liu, Yuanzhi Li, Zhiqiang Shen
[DMLR] Data-centric Machine Learning Research, 2026
Paper / code / website / leaderboard
Bi-Mamba: Towards Accurate 1-Bit State Space Models
Shengkun Tang, Liqun Ma, Haonan Li, Mingjie Sun, Zhiqiang Shen
[TMLR] Transactions on Machine Learning Research, 2025
Paper / code
Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection
Cong Zeng*, Shengkun Tang*, Yuanzhou Chen, Zhiqiang Shen, Wenchao Yu, Xujiang Zhao, Haifeng Chen, Wei Cheng, Zhiqiang Xu
[NeurIPS 2025] The Thirty-Ninth Annual Conference on Neural Information Processing Systems
Paper / code
MosaicDiff: Training-free Structural Pruning for Diffusion Model Acceleration Reflecting Pretraining Dynamics
Bowei Guo, Shengkun Tang, Cong Zeng, Zhiqiang Shen
[ICCV 2025] International Conference on Computer Vision, 2025
Paper
Do Large Language Models Perceive Orderly Number Concepts as Human?
Xuanjie Liu, Cong Zeng, Shengkun Tang, Ziyu Wang, Gus Xia
[Re-Align Workshop, ICLR 2025] 2nd Workshop on Representational Alignment, ICLR 2025
Paper
DALD: Improving Logits-based Detector without Logits from Black-box LLM
Cong Zeng*, Shengkun Tang*, Xianjun Yang, Yuanzhou Chen, Yiyou Sun, Yao Li, Haifeng Chen, Wei Cheng, Dongkuan Xu
[NeurIPS 2024] The Thirty-eighth Annual Conference on Neural Information Processing Systems
arXiv / code
AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu
[ECCV 2024] European Conference on Computer Vision
arXiv / code
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Shengkun Tang, Yaqing Wang, Zhenglun Kong, Tianchi Zhang, Yao Li, Caiwen Ding, Yanzhi Wang, Yi Liang, Dongkuan Xu
[CVPR 2023] The IEEE/CVF Conference on Computer Vision and Pattern Recognition
arXiv / code
DDR-Net: Learning Multi-Stage Multi-View Stereo With Dynamic Depth Range
Puyuan Yi*, Shengkun Tang*, Jian Yao
Preprint, 2021
arXiv / code
Scale-robust deep-supervision network for mapping building footprints from high-resolution remote sensing images
Haonan Guo, Xin Su, Shengkun Tang, Bo Du, Liangpei Zhang
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021
PDF
Industrial Experience
Qwen Team, Alibaba, 06/2025 - Present
Research Intern
Mentors: Bo Zheng and Dayiheng Liu
SenseTime, Engineering & Intelligent Automotive Group (IAG), 06/2021 - 10/2021 & 05/2022 - 07/2023
Vision Algorithm Intern; System Developer
Project: SenseRobot Chess Robot, working with Ruodai Li
Project: Large-Scale Self-Driving System Development
Contest

Baidu Astar Developer Competition, 05/2020 - 10/2020

Ranking: 7/2305 (teams)

Professional Services
  • Program Committee Member:
    • NeurIPS 2024, 2025
    • ICCV 2025
    • ICML 2025
    • ICLR 2025
    • AISTATS 2025
    • KDD 2023, 2024
    • AAAI 2023

  • This website is built on a template from source code; thanks to its author for the fantastic design.