I am a Ph.D. candidate at X-Lance Lab, Shanghai Jiao Tong University, under the supervision of Prof. Kai Yu, majoring in computer science and technology. Before that, I received my B.S. degree from the Department of Automation, Tsinghua University, in 2020. In the first four months of 2024, I worked as a research assistant with Prof. Tao Yu at XLANG Lab, the University of Hong Kong.

My research focuses on text-rich visual UI interaction. Currently, I’m working on constructing realistic, complex benchmarks for GUI interaction. I’m also studying how to design smarter GUI agents with reinforcement learning (RL), large language models (LLMs), or a combination of both.

📝 Publications

NeurIPS 2023
Rememberer [NeurIPS 2023]

Large Language Models Are Semi-Parametric Reinforcement Learning Agents

Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu
NeurIPS 2023 | Code

  • We designed Rememberer, a novel evolvable LLM-based agent framework, by equipping the LLM with a long-term experience memory, so as to enable the LLM to exploit the interaction experiences to improve performance.
  • We proposed Reinforcement Learning with Experience Memory (RLEM) to update the memory, so that the agent can learn from both success and failure, and evolve its capability without fine-tuning LLM parameters.
  • Rememberer demonstrates superior performance and robustness on both the WebShop and WikiHow task sets.
Mobile-Env

Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction

Danyang Zhang, Lu Chen, Zihan Zhao, Ruisheng Cao, Kai Yu
Project | Task Set

We designed an easily extensible, adaptable, and close-to-reality interaction platform for building qualified GUI agent benchmarks on Android. Mobile-Env supports reliable evaluation, controllable and reproducible environments, intermediate rewards, and intermediate instructions.

NeurIPS 2024
OSWorld

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
NeurIPS 2024 D&B Track | Project | Code

We designed a unified benchmark for real-world desktop interaction, containing 369 complex desktop tasks that cover more than 9 common desktop applications as well as multi-app workflow scenarios.

Progress rewards

ProgRM: Build Better GUI Agents with Progress Rewards

Danyang Zhang, Situo Zhang, Ziyue Yang, Zichen Zhu, Zihan Zhao, Ruisheng Cao, Lu Chen, Kai Yu

  • We designed ProgRM, a Progress Reward Model, for online RL training of GUI agents. ProgRM predicts an accurate progress score for each step in an episode and assigns appropriate credit to steps even in failed trajectories.
  • We designed an LCS-based progress labeling algorithm to automatically and efficiently discover key steps from collected trajectories and annotate progress labels accordingly.
  • Extensive experiments and analyses demonstrate the effectiveness of ProgRM.

📖 Education

  • 2020.9-(2025.6), Ph.D., School of Computer Science, Shanghai Jiao Tong University
  • 2016.8-2020.6, B.S., Department of Automation, Tsinghua University

🎖 Honors and Awards

  • 2020.9-2025.6, The 2nd Wu Wenjun AI Honorary Doctoral Program
  • 2020.9-2025.6, Zhang Xu Scholarship
  • 2018.10, Academic Excellence Scholarship 2017~2018
  • 2017.10, Academic Excellence Scholarship 2016~2017