Yiwei Yang

Hi, I'm Yiwei Yang.

PhD Student, University of Washington
LLM Agents • Reinforcement Learning • Trustworthy AI

I am a PhD student at the University of Washington, advised by Bill Howe. I received my B.S. in Computer Science from the University of Michigan.

My research focuses on the reliability of LLM agents and multimodal models. I study how models trained with reinforcement learning can develop shortcut behaviors (such as spurious tool use, spurious correlations, and reward hacking) that lead to brittle performance under distribution shift. To address this, I design benchmarks, evaluation methods, and training interventions that encourage more robust reasoning and decision-making.

Currently seeking Summer 2026 Research / Machine Learning Internship opportunities.
I am especially interested in roles related to LLM agents, RL for reasoning, multimodal learning, evaluation, and trustworthy AI.

Research Highlights

Representative projects that capture my current research direction.

Reliable Tool Use in LLM Agents

LLM agents often rely on tools such as search engines or code interpreters to solve complex tasks. However, reinforcement learning can cause agents to learn shortcut associations between prompt patterns and specific tools, leading to brittle behavior under distribution shift.

I study this phenomenon through controlled counterfactual evaluations that isolate cue-driven failures in tool selection. I also analyze training dynamics, such as entropy collapse in tool-choice behavior, and develop training interventions including rationale rewards and tool optimality constraints.

  • Problem: shortcut-driven tool decisions in RL-trained agents
  • Approach: counterfactual evaluation, synthetic cues, and training-time analysis
  • Goal: agents that generalize tool use beyond seen prompt patterns

SpuriVerse: Benchmarking Robustness in Vision-Language Models

Vision-language models often achieve strong benchmark performance by exploiting spurious correlations rather than learning the intended task semantics.

I developed SpuriVerse, a benchmark that systematically evaluates whether multimodal models can generalize beyond these shortcuts. The benchmark introduces controlled shifts in visual cues and group structure, revealing when models rely on dataset artifacts rather than genuine reasoning.

  • Problem: strong benchmark accuracy can hide shortcut reliance
  • Approach: controlled benchmark construction with spurious correlation shifts
  • Goal: evaluate whether VLMs generalize beyond seen correlations

Selected Publications

Representative work in multimodal robustness, spurious correlation mitigation, and reliable AI systems.

  • Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?
    Yiwei Yang, C. P. Lee, S. Feng, D. Zhao, B. Wen, A. Z. Liu, Y. Tsvetkov, B. Howe
    ICML 2025 R2-FM, NeurIPS 2025
  • Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
    Yiwei Yang, A. Liu, R. Wolfe, A. Caliskan, B. Howe
    CVPR 2024
  • Towards Zero-shot Annotation of the Built Environment with Vision-Language Models
    B. Han, Yiwei Yang, A. Caspi, B. Howe
    SIGSPATIAL 2024
  • Laboratory-scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings
    R. Wolfe, S. Issac, B. Han, B. Wen, Yiwei Yang, et al.
    FAccT 2024
  • Regularizing Model Gradients with Concepts to Improve Robustness to Spurious Correlations
    Yiwei Yang, A. Liu, R. Wolfe, A. Caliskan, B. Howe
    ICML SCIS 2023

SpuriVerse paper  |  Concept Correction paper  |  Full CV

Get in Touch