Publications

* indicates equal contribution.

Turning Internal Gap into Self-Improvement: Promoting the Generation-Understanding Unification in MLLMs
Yujin Han, Hao Chen, Andi Han, Zhiheng Wang, Xinyu Lin, Yingya Zhang, Shiwei Zhang, Difan Zou
International Conference on Learning Representations (ICLR), 2026
[Arxiv][Project]
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Yunzhe Hu, Difan Zou, Dong Xu
International Conference on Learning Representations (ICLR), 2026
[Arxiv]
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
Xingwu Chen, Tianle Li, Difan Zou
International Conference on Learning Representations (ICLR), 2026
[Arxiv]
A Convergence Analysis of Adaptive Optimizers under Floating-point Quantization
Xuan Tang, Jichu Li, Difan Zou
International Conference on Learning Representations (ICLR), 2026
[Arxiv]
Learning under Quantization for High-Dimensional Linear Regression
Dechen Zhang, Junwei Su, Difan Zou
International Conference on Learning Representations (ICLR), 2026
[Arxiv]
Does higher interpretability imply better utility? A Pairwise Analysis on Sparse Autoencoders
Xu Wang, Yan Hu, Benyou Wang, Difan Zou
International Conference on Learning Representations (ICLR), 2026
[Arxiv]
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
Yujie Wei, Shiwei Zhang, Hangjie Yuan, Yujin Han, Zhekai Chen, Jiayu Wang, Difan Zou, Xihui Liu, Yingya Zhang, Yu Liu, Hongming Shan
International Conference on Learning Representations (ICLR), 2026
[Arxiv]
Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
Junwei Su, Chuan Wu, Le Zheng, Difan Zou
SIAM Journal on Mathematics of Data Science (Accepted)
[Arxiv]
Learning Diffusion Policy from Primitive Skills for Robot Manipulation
Zhihao Gu, Ming Yang, Difan Zou, Dong Xu
The 40th Annual AAAI Conference on Artificial Intelligence (AAAI), 2026
[Arxiv]
SIDE: Surrogate Conditional Data Extraction from Diffusion Models
Yunhao Chen, Shujie Wang, Difan Zou, Xingjun Ma
The 40th Annual AAAI Conference on Artificial Intelligence (AAAI), 2026
[Arxiv]
Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning
Dechen Zhang, Zhenmei Shi, Yingyu Liang, Difan Zou
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2025
[Arxiv]
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
Wei Huang, Andi Han, Yujin Song, Yilan Chen, Denny Wu, Difan Zou, Taiji Suzuki
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2025
[Arxiv]
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Xuan Tang, Han Zhang, Yuan Cao, Difan Zou
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2025
[Arxiv]
On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li*, Chenyang Zhang*, Xingwu Chen, Yuan Cao, Difan Zou
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2025
[Arxiv]
Hierarchical Koopman Diffusion: Fast Generation with Interpretable Diffusion Trajectory
Hanru Bai, Weiyang Ding, Difan Zou
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2025
[Arxiv]
F-Adapter: Frequency-Adaptive Parameter-Efficient Fine-Tuning in Scientific Machine Learning
Hangwei Zhang, Chun Kang, Yan Wang, Difan Zou
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2025
[Arxiv]
Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation
Yao Teng, Fu-Yun Wang, Xian Liu, Zhekai Chen, Han Shi, Yu Wang, Zhenguo Li, Weiyang Liu, Difan Zou, Xihui Liu
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2025
[Arxiv]
Model Unlearning via Sparse Autoencoder Subspace Guided Projections
Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
[Arxiv]
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen
Annual Meeting of the Association for Computational Linguistics (ACL Findings), 2025
[Paper][Arxiv][Code]
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xu Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou
International Conference on Machine Learning (ICML), 2025
[Paper][Arxiv][Code]
Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?
Yujin Han*, Andi Han*, Wei Huang, Chaochao Lu, Difan Zou
International Conference on Machine Learning (ICML), 2025
[Paper][Arxiv]
Masked Autoencoders Are Effective Tokenizers for Diffusion Models
Hao Chen*, Yujin Han*, Fangyi Chen, Xiang Li, Yidong Wang, Jindong Wang, Ze Wang, Zicheng Liu, Difan Zou, Bhiksha Raj
International Conference on Machine Learning (ICML), 2025 (Spotlight)
[Paper][Arxiv][Code]
STGAN: Spatial-temporal Graph Autoregression Network for Pavement Distress Deterioration Prediction
Shilin Tong, Difei Wu, Xiaona Liu, Le Zheng, Yuchuan Du, Difan Zou
IEEE Transactions on Intelligent Transportation Systems (TITS), 2025
[Paper][Arxiv]
Parallelized Autoregressive Visual Generation
Yuqing Wang, Shuhuai Ren, Zhijie Lin, Yujin Han, Haoyuan Guo, Zhenheng Yang, Difan Zou, Jiashi Feng, Xihui Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2025
[Paper][Arxiv][Project]
Optimization-Biased Hypernetworks for Generalizable Policy Generation
Hanxiang Ren, Li Sun, Xulong Wang, Pei Zhou, Zewen Wu, Siyan Dong, Difan Zou, Youyi Zheng, Yanchao Yang
International Conference on Learning Representations (ICLR), 2025
[Paper][Arxiv][Code]
On the Feature Learning in Diffusion Models
Andi Han, Wei Huang, Yuan Cao, Difan Zou
International Conference on Learning Representations (ICLR), 2025
[Paper][Arxiv]
Beyond Surface Structure: A Causal Assessment of LLMs’ Comprehension Ability
Yujin Han, Lei Xu, Sirui Chen, Difan Zou, Chaochao Lu
International Conference on Learning Representations (ICLR), 2025
[Paper][Arxiv][Code]
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham Kakade
International Conference on Learning Representations (ICLR), 2025
[Paper][Arxiv][Code]
Per-Example Gradient Regularization Improves Learning Signals from Noisy Data
Xuran Meng, Yuan Cao, Difan Zou
Springer Machine Learning Journal, 2025
[Paper][Arxiv]
Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang
ICML Workshop on Structured Probabilistic Inference & Generative Modeling, 2024 (Oral Presentation & Best Paper Award)
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2024 (Spotlight)
[Paper][Arxiv]
Slight Corruption in Pre-training Data Makes Better Diffusion Models
Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2024 (Spotlight)
[Paper][Arxiv]
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen*, Lei Zhao*, Difan Zou
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2024
[Paper][Arxiv]
The Implicit Bias of Adam on Separable Data
Chenyang Zhang, Difan Zou, Yuan Cao
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2024
[Paper][Arxiv]
An In-depth Investigation of Sparse Rate Reduction in Transformer-like Models
Yunzhe Hu, Difan Zou, Dong Xu
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2024
[Paper][Arxiv]
Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo
Xunpeng Huang, Difan Zou, Hanze Dong, Yian Ma, and Tong Zhang
Annual Conference on Learning Theory (COLT), 2024
[Paper][Arxiv]
What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
Xingwu Chen, Difan Zou
ICLR Workshop on Bridging the Gap Between Practice and Theory in Deep Learning (BPGT), 2024 (Oral Presentation)
International Conference on Machine Learning (ICML), 2024
[Paper][Arxiv]
Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data
Xuran Meng, Difan Zou, Yuan Cao
International Conference on Machine Learning (ICML), 2024
[Paper][Arxiv]
Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
Yujin Han, Difan Zou
International Conference on Machine Learning (ICML), 2024
[Paper][Arxiv]
Faster Sampling via Stochastic Gradient Proximal Sampler
Xunpeng Huang, Difan Zou, Yian Ma, Hanze Dong, Tong Zhang
International Conference on Machine Learning (ICML), 2024
[Paper][Arxiv]
On the Limitation and Experience Replay for GNNs in Continual Learning
Junwei Su, Difan Zou, Chuan Wu
Conference on Lifelong Learning Agents (CoLLAs), 2024
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates
Miao Lu, Beining Wu, Xiaodong Yang, Difan Zou
NeurIPS Workshop on Mathematics of Modern Machine Learning (M3L), 2023
International Conference on Learning Representations (ICLR), 2024
[Paper][Arxiv]
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
International Conference on Learning Representations (ICLR), 2024 (Spotlight)
[Paper][Arxiv]
PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks
Junwei Su, Difan Zou, Chuan Wu
International Conference on Learning Representations (ICLR), 2024
[Paper][Arxiv][Code]
The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks
Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu
Annual Conference on Learning Theory (COLT), 2023
[Arxiv]
The Benefits of Mixup for Feature Learning
Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu
International Conference on Machine Learning (ICML), 2023
[Paper][Arxiv]
Learning High-Dimensional Single-Neuron ReLU Networks with Finite Samples
Jingfeng Wu*, Difan Zou*, Zixiang Chen*, Vladimir Braverman, Quanquan Gu, and Sham M. Kakade
International Conference on Machine Learning (ICML), 2023
[Paper][Arxiv]
Towards Robust Graph Incremental Learning on Evolving Graphs
Junwei Su, Difan Zou, Zijun Zhang, Chuan Wu
International Conference on Machine Learning (ICML), 2023
[Arxiv][Paper][Code]
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou, Yuan Cao, Yuanzhi Li, and Quanquan Gu
International Conference on Learning Representations (ICLR), 2023
[Paper][ArXiv]
Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Difan Zou*, Jingfeng Wu*, Vladimir Braverman, Quanquan Gu, and Sham M. Kakade
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2022
[Paper][ArXiv]
The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift
Jingfeng Wu*, Difan Zou*, Vladimir Braverman, Quanquan Gu, and Sham M. Kakade
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2022
[Paper][ArXiv]
Last Iterate Risk Bounds of SGD with Decaying Stepsize for Overparameterized Linear Regression
Jingfeng Wu*, Difan Zou*, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
International Conference on Machine Learning (ICML), 2022 (Long Presentation)
[Paper] [ArXiv]
Self-training Converts Weak Learners to Strong Learners in Mixture Models
Spencer Frei*, Difan Zou*, Zixiang Chen*, Quanquan Gu
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
[Paper] [ArXiv]
The Benefit of Implicit Regularization from SGD in Least Square Problems
Difan Zou*, Jingfeng Wu*, Vladimir Braverman, Quanquan Gu, Dean P. Foster, Sham M. Kakade
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2021
[Paper] [ArXiv]
Benign Overfitting of Constant-Stepsize SGD for Linear Regression
Difan Zou*, Jingfeng Wu*, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
Annual Conference on Learning Theory (COLT), 2021
[Paper] [ArXiv]
Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling
Difan Zou, Pan Xu, Quanquan Gu
International Conference on Uncertainty in Artificial Intelligence (UAI), 2021
[Paper] [ArXiv]
On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients
Difan Zou, Quanquan Gu
International Conference on Machine Learning (ICML), 2021
[Paper]
Provable Robustness of Adversarial Training for Learning Halfspaces with Noise
Difan Zou*, Spencer Frei*, Quanquan Gu
International Conference on Machine Learning (ICML), 2021
[Paper] [ArXiv]
How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?
Zixiang Chen*, Yuan Cao*, Difan Zou* and Quanquan Gu
International Conference on Learning Representations (ICLR), 2021
[Paper] [ArXiv]
Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate
Jingfeng Wu, Difan Zou, Vladimir Braverman and Quanquan Gu
International Conference on Learning Representations (ICLR), 2021
[Paper] [ArXiv]
Laplacian Smoothing Stochastic Gradient Markov Chain Monte Carlo
Bao Wang*, Difan Zou*, Quanquan Gu, Stanley Osher
SIAM Journal on Scientific Computing (SISC), 2021
[Paper] [ArXiv] [Code]
On the Global Convergence of Training Deep Linear ResNets
Difan Zou, Philip M. Long, Quanquan Gu
International Conference on Learning Representations (ICLR), 2020
[Paper]
Improving Adversarial Robustness Requires Revisiting Misclassified Examples
Yisen Wang*, Difan Zou*, Jinfeng Yi, James Bailey, Xingjun Ma and Quanquan Gu
International Conference on Learning Representations (ICLR), 2020
[Paper] [Code]
Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou*, Yuan Cao*, Dongruo Zhou, Quanquan Gu
Springer Machine Learning Journal, 2020
[Paper] [ArXiv]
An Improved Analysis of Training Over-parameterized Deep Neural Networks
Difan Zou, Quanquan Gu
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2019
[Paper] [ArXiv]
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
Difan Zou*, Ziniu Hu*, Yewen Wang, Song Jiang, Yizhou Sun, Quanquan Gu
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2019
[Paper] [ArXiv] [Code]
Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction
Difan Zou, Pan Xu, Quanquan Gu
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2019
[Paper]
Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics
Difan Zou, Pan Xu, Quanquan Gu
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019
[Paper]
Global convergence of Langevin dynamics based algorithms for nonconvex optimization
Pan Xu*, Jinghui Chen*, Difan Zou, Quanquan Gu
Conference on Advances in Neural Information Processing Systems (NeurIPS), 2018 (Spotlight)
[Paper] [ArXiv]
Subsampled stochastic variance-reduced gradient Langevin dynamics
Difan Zou*, Pan Xu*, Quanquan Gu
International Conference on Uncertainty in Artificial Intelligence (UAI), 2018
[Paper]
Stochastic Variance-Reduced Hamilton Monte Carlo Methods
Difan Zou*, Pan Xu*, Quanquan Gu
International Conference on Machine Learning (ICML), 2018
[Paper] [ArXiv]

Preprint

On the Complexity Theory of Masked Discrete Diffusion: From poly(1/ε) to Nearly ε-Free
Xunpeng Huang, Yingyu Lin, Nishant Jain, Kaibo Wang, Difan Zou, Yian Ma, Tong Zhang
[Arxiv]
On the Collapse Errors Induced by the Deterministic Sampler for Diffusion Models
Yi Zhang, Zhenyu Liao, Jingfeng Wu, Difan Zou
[Arxiv]
Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression
Xingwu Chen, Miao Lu, Beining Wu, Difan Zou
[Arxiv]
A Random Matrix Analysis of In-context Memorization for Nonlinear Attention
Zhenyu Liao, Jiaqing Liu, TianQi Hou, Difan Zou, Zenan Ling
[Arxiv]
Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation
Yi Zhang, Difan Zou
[Arxiv]
Almost Linear Convergence under Minimal Score Assumptions: Quantized Transition Diffusion
Xunpeng Huang, Yingyu Lin, Nikki Lijing Kuang, Hanze Dong, Difan Zou, Yian Ma, Tong Zhang
[Arxiv]
Capturing Conditional Dependence via Auto-regressive Diffusion Models
Xunpeng Huang, Yujin Han, Difan Zou, Yian Ma, Tong Zhang
[Arxiv]
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang, Peifeng Gao, Difan Zou, Yuan Cao
[Arxiv]
Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers
Shuning Shang, Xuran Meng, Yuan Cao, Difan Zou
[Arxiv]
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu
[Arxiv][Website][Code]
A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models
Chengxing Xie, Difan Zou
ICML Workshop on LLMs and Cognition, 2024
[Arxiv]
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Yifan Hao, Yong Lin, Difan Zou, and Tong Zhang
[Arxiv]
An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling
Xunpeng Huang, Hanze Dong, Difan Zou, and Tong Zhang
[Arxiv]
Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-Shot Tasks
Xu Luo, Difan Zou, Lianli Gao, Zenglin Xu, Jingkuan Song
[Arxiv]
Epidemic Model Guided Machine Learning for COVID-19 Forecasts in the United States
Difan Zou, Lingxiao Wang, Pan Xu, Jinghui Chen, Weitong Zhang, and Quanquan Gu
[MedRxiv]
Saving Gradient and Negative Curvature Computations: Finding Local Minima More Efficiently
Yaodong Yu*, Difan Zou*, Quanquan Gu
[ArXiv]

Publications in Wireless Communication & Signal Processing

An Efficient Iterative Least Square Method for Indoor Visible Light Positioning under Shot Noise
Xiaona Liu, Difan Zou, Nuo Huang, Yang Wang
IEEE Photonics Journal, 2023
[Paper]
Two-Dimensional Intensity Distribution and Adaptive Power Allocation for Ultraviolet Ad-Hoc Network
Hong Qi, Difan Zou, Zhengyuan Xu, Chen Gong
IEEE Transactions on Green Communications and Networking, 2022
[Paper]
Signal characterization and achievable transmission rate of VLC under receiver nonlinearity
Xiaona Liu, Chen Gong, Difan Zou, Zunaira Babar, Zhengyuan Xu, Lajos Hanzo
IEEE Access, 2019
[Paper]
Characterization on practical photon counting receiver in optical scattering communication
Difan Zou, Chen Gong, Zhengyuan Xu
IEEE Transactions on Communications, 2018 (Presented at GlobeCom 2018, Received Best Paper Award)
[Paper]
A 1Mbps Real-Time NLOS UV Scattering Communication System With Receiver Diversity Over 1km
Guanchu Wang, Kun Wang, Chen Gong, Difan Zou, Zhimeng Jiang, Zhengyuan Xu
IEEE Photonics Journal, 2018
[Paper]
Signal Detection Under Short-Interval Sampling of Continuous Waveforms for Optical Wireless Scattering Communication
Difan Zou, Chen Gong, Zhengyuan Xu
IEEE Transactions on Wireless Communications, 2018 (Presented at GlobalSIP 2016)
[Paper]
Secrecy rate of MISO optical wireless scattering communications
Difan Zou, Chen Gong, Zhengyuan Xu
IEEE Transactions on Communications, 2017
[Paper]
Turbulence channel modeling and non-parametric estimation for optical wireless scattering communication
Kun Wang, Chen Gong, Difan Zou, Zhengyuan Xu
IEEE/OSA Journal of Lightwave Technology, 2017 (Presented at ICCS 2016, Received Best Paper Award)
[Paper]
Demonstration of a 400 kbps real-time non-line-of-sight laser-based ultraviolet communication system over 500 m
Kun Wang, Chen Gong, Difan Zou, Xianqing Jin, Zhengyuan Xu
OSA Chinese Optics Letters, 2017
[Paper]
Information security risks outside the laser beam in terrestrial free-space optical communication
Difan Zou, Zhengyuan Xu
IEEE Photonics Journal, 2016
[Paper]
Modeling of optical wireless scattering communication channels over broad spectra
Weihao Liu, Difan Zou, Zhengyuan Xu
OSA Journal of the Optical Society of America A, 2015
[Paper]