PhD student at the Vector Institute & University of British Columbia · Research on AI agents, LLMs, and RL
ClawBench — Can AI Agents Complete Everyday Online Tasks?
153 tasks · 144 live websites · 8 categories · Best model: 33.3%
VidGround — Watch Before You Answer
Visually grounded post-training for video LLMs.
- 2026.04 — New paper: ClawBench: Can AI Agents Complete Everyday Online Tasks? — 153 real-world tasks, 144 live websites, 7 frontier models. Best model: 33.3%.
- 2026.04 — New paper: VidGround: Watch Before You Answer — Visually grounded post-training for video LLMs.
yuxuan.zhang(at)ubc.ca
Google Scholar
GitHub
Twitter
Website



