Yuan-Hong Liao

University of Toronto, Vector Institute

Toronto, Canada

:star: I am on the job market for an industry role starting in mid-2025. If my research aligns with your needs, please feel free to reach out via email.

I am a final-year Ph.D. student at the University of Toronto and the Vector Institute, where I am fortunate to be supervised by Prof. Sanja Fidler. Previously, I was a CV/ML scientist intern at the NVIDIA Toronto AI Lab (2022-2023) and on the Amazon Astro team (2024).

My research focuses on two areas: visual data labeling and vision-language models.

  • 🖼️ Improving visual labels: Visual labeling: I develop methods to reduce crowdsourced labeling costs [CVPR’21] and fix semantic inconsistencies in real-world datasets [ICLR’24].
  • đź§  Vision-language models: I enhance vision-language models in spatial reasoning [EMNLP’24], enable self-correction during inference [CVPR’25], and promote system-2 thinking in vision-centric tasks [arxiv’25]

Check my resume here (last updated in April 2025)

Previous experience: Prior to my Ph.D., I was a visiting student at the Vector Institute in 2018 and at USC in 2017. I was fortunate to start my AI research at National Tsing Hua University, supervised by Prof. Min Sun.

news

Apr 23, 2025 :star: Our preprint LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception is on arXiv!
Feb 26, 2025 :star: Our paper Can Large Vision-Language Models Correct Grounding Errors By Themselves? was accepted to CVPR 2025!
Sep 20, 2024 :star: Our paper Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models was accepted to EMNLP 2024!
Jul 22, 2024 :star: Started my internship with the Amazon Astro team in Seattle!
Apr 09, 2024 :star: New preprint out on arXiv: Can Feedback Enhance Semantic Grounding in Large Vision-Language Models?

selected publications

  1. LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception
    Yuan-Hong Liao, Sven Elflein, Liu He, Laura Leal-Taixé, Yejin Choi, Sanja Fidler, and David Acuna
    2025
  2. Can Large Vision-Language Models Correct Grounding Errors By Themselves?
    Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, and David Acuna
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2025
  3. Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models
    Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler, and David Acuna
    In The 2024 Conference on Empirical Methods in Natural Language Processing, Jun 2024
  4. Translating Labels to Solve Annotation Mismatches Across Object Detection Datasets
    Yuan-Hong Liao, David Acuna, Rafid Mahmood, James Lucas, Viraj Uday Prabhu, and Sanja Fidler
    In The Twelfth International Conference on Learning Representations, Jun 2024
  5. Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
    Yuan-Hong Liao, Amlan Kar, and Sanja Fidler
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Jun 2021