I am a Ph.D. candidate at X-Lance Lab, Shanghai Jiao Tong University, under the supervision of Prof. Kai Yu, majoring in computer science and technology. Before that, I received my B.S. degree from the Department of Automation, Tsinghua University, in 2020. In the first four months of 2024, I worked as a research assistant with Prof. Tao Yu at XLANG Lab, the University of Hong Kong.
My research focuses on text-rich visual UI interaction. Currently, I'm working on constructing realistic, complex benchmarks for GUI interaction. I'm also studying how to design smarter GUI agents with reinforcement learning (RL), large language models (LLMs), or a combination of both.
Publications
![Rememberer [NeurIPS 2023]](images/rememberer.png)
Large Language Models Are Semi-Parametric Reinforcement Learning Agents
Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu
NeurIPS 2023 | Code
- We designed Rememberer, a novel evolvable LLM-based agent framework, by equipping the LLM with a long-term experience memory, so as to enable the LLM to exploit the interaction experiences to improve performance.
- We proposed Reinforcement Learning with Experience Memory (RLEM) to update the memory, so that the agent can learn from both success and failure, and evolve its capability without fine-tuning LLM parameters.
- Rememberer demonstrates superior performance and robustness on both the WebShop and WikiHow task sets.

Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction
Danyang Zhang, Lu Chen, Zihan Zhao, Ruisheng Cao, Kai Yu
Project | Task Set
We designed an easily extensible, adaptable, and close-to-reality interaction platform for building qualified GUI agent benchmarks based on Android Mobile. Mobile-Env supports reliable evaluation, controllable and reproducible environments, intermediate rewards, and intermediate instructions.

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
NeurIPS 2024 D&B Track | Project | Code
We designed a unified benchmark for real-world desktop interaction, containing 369 complex desktop tasks that cover more than 9 common desktop applications as well as multi-app workflow scenarios.

ProgRM: Build Better GUI Agents with Progress Rewards
Danyang Zhang, Situo Zhang, Ziyue Yang, Zichen Zhu, Zihan Zhao, Ruisheng Cao, Lu Chen, Kai Yu
- We designed ProgRM, a Progress Reward Model, for online RL training of GUI agents. ProgRM can predict an accurate progress score for each step in an episode and assign appropriate credit to steps even in failed trajectories.
- We designed an LCS-based progress labeling algorithm to automatically and efficiently discover key steps from collected trajectories and annotate progress labels accordingly.
- Extensive experiments and analyses demonstrate the effectiveness of ProgRM.

- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Wenjing Hu, Yuchen Mao, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida I. Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu
NeurIPS 2024 D&B Track | Project
- WebSRC: A Dataset for Web-Based Structural Reading Comprehension
Xingyu Chen, Zihan Zhao, Lu Chen, JiaBao Ji, Danyang Zhang, Ao Luo, Yuxuan Xiong, Kai Yu
EMNLP 2021 | Project
- Rotation-robust Intersection over Union for 3D Object Detection
Yu Zheng, Danyang Zhang, Sinan Xie, Jiwen Lu, Jie Zhou
ECCV 2020
- COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis
Yansong Tang, Dajun Ding, Yongming Rao, Yu Zheng, Danyang Zhang, Lili Zhao, Jiwen Lu, Jie Zhou
CVPR 2019 | Project
- MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation. Zichen Zhu, Hao Tang, Yansi Li, Dingye Liu, Hongshen Xu, Kunyao Lan, Danyang Zhang, Yixuan Jiang, Hao Zhou, Chenrun Wang, Situo Zhang, Liangtai Sun, Yixiao Wang, Yuheng Sun, Lu Chen, Kai Yu. NAACL 2025 Demo.
- ChemDFM-X: Towards Large Multimodal Model for Chemistry. Zihan Zhao, Bo Chen, Jingpiao Li, Lu Chen, Liyang Wen, Pengyu Wang, Zichen Zhu, Danyang Zhang, Ziping Wan, Yansi Li, Zhongyang Dai, Xin Chen, Kai Yu. SCIS 2024.
- Technical Report of MoGUI and MoCon. Zichen Zhu, Liangtai Sun, Danyang Zhang, Ziyuan Li, Guangpeng Li, Lu Chen, Kai Yu.
- CAM-GUI: A Conversational Assistant on Mobile GUI. Zichen Zhu, Liangtai Sun, Jingkai Yang, Yifan Peng, Weilin Zou, Ziyuan Li, Wutao Li, Lu Chen, Yingzi Ma, Danyang Zhang, Shuai Fan, Kai Yu. NCMMSC 2023.
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition. Yansong Tang, Xingyu Liu, Xumin Yu, Danyang Zhang, Jiwen Lu, Jie Zhou. TOMM 2022.
- Uncertainty-Aware Score Distribution Learning for Action Quality Assessment. Yansong Tang, Zanlin Ni, Jiahuan Zhou, Danyang Zhang, Jiwen Lu, Ying Wu, Jie Zhou. CVPR 2020.
Education
- 2020.9-(2025.6), Ph.D., School of Computer Science, Shanghai Jiao Tong University
- 2016.8-2020.6, B.S., Department of Automation, Tsinghua University
Honors and Awards
- 2020.9-2025.6, The 2nd Wu Wenjun AI Honorary Doctoral Program
- 2020.9-2025.6, Zhang Xu Scholarship
- 2018.10, Academic Excellence Scholarship 2017~2018
- 2017.10, Academic Excellence Scholarship 2016~2017