Fangyi Yu

About

Fangyi Yu is an Applied Scientist on the Foundational Research team at Thomson Reuters, where she specializes in large language model (LLM) evaluation. Her work spans autonomous evaluation pipelines, LLM- and agent-as-a-judge methodologies, and the assessment of AI agents in high-stakes domains. She builds evaluation frameworks that help ensure advanced language models are reliable, fair, and aligned with real-world requirements.

Fangyi holds an MSc in Computer Science from Ontario Tech University, a Graduate Certificate in Artificial Intelligence from Durham College, and a BSc in Applied Mathematics from Donghua University. Her background covers machine learning, natural language processing, and AI safety, with prior contributions to privacy-preserving systems and human–AI interaction research. Previously, she interned at Coursera and Thomson Reuters Labs, and conducted graduate research on AI security and trust at the Human Machine Lab at Ontario Tech.

She focuses on bridging research and practical applications to advance the science of AI evaluation, with the goal of supporting trustworthy AI deployment in high-stakes domains.

Skills & Expertise

Research Focus

LLM evaluation & benchmarking
LLM- and agent-as-a-judge methodologies
Post-training data design (SFT, DPO)
Natural language processing
AI safety & alignment

Languages & Frameworks

Python, SQL, Bash
PyTorch, TensorFlow
Hugging Face Transformers, TRL
spaCy, NLTK
Django, Flask

Cloud & MLOps

Amazon Bedrock, SageMaker
OpenAI & Anthropic APIs
Docker, Git, Linux
Weights & Biases
Tableau

AI Developer Tools

Claude Code
OpenAI Codex
Cline
GitHub Copilot
Cursor

Work Experience

Nov 2023 - Present

Thomson Reuters Foundational Research

Applied Scientist

In-house LLM development: contribute to model selection, post-training, and release gates for domain-specific models supporting legal research workflows.
Post-training data creation: design instruction-tuning and preference datasets using rubric-driven synthetic generation with human-in-the-loop QA.
Auto-evaluation pipelines: build evaluation harnesses that run on every model drop, with task suites and reproducible seeds for stable comparisons.
LLM-as-a-judge: implement multi-criteria rubric graders and multi-agent debate evaluation to reduce single-judge bias and improve reliability.
Agent evaluation: assess tool-using agents with metrics for task success, tool-call accuracy, latency, and failure recovery in sandboxed environments.
Cross-functional collaboration: partner with research, product, and legal subject-matter experts to translate evaluation results into model release criteria and product-ready guidance.

Jun 2023 - Sep 2023

Coursera

Machine Learning Engineer Intern

Built an enterprise propensity-to-purchase model that combines binary classification (propensity scores) with regression (annual contract value, ACV), improving lead prioritization and sales targeting.
Performed extensive feature engineering and exploratory data analysis across firmographic, demographic, and engagement data sources.
Partnered with cross-functional stakeholders to scope requirements, communicate results, and iterate on modeling choices to maximize business impact.

Sep 2021 - Dec 2023

Human Machine Lab @ Ontario Tech University

Graduate Research & Teaching Assistant

The Human Machine Lab at Ontario Tech University is an interdisciplinary research group focused on designing computer systems around human needs and capabilities, with projects spanning human–computer interaction, usable security, privacy, and artificial intelligence.

Advised by Dr. Miguel Vargas Martin on applied machine learning research:

Published multiple peer-reviewed papers on applying machine learning techniques to authentication systems (see Google Scholar).
Designed GAN-based password guessing models that outperformed prior benchmarks by 83%, and developed a GPT-3-based honeyword generation technique to accelerate password-breach detection.
Conducted systematic literature reviews on machine learning applications in computer security.
Proposed novel approaches to strengthen the usability and security of password authentication systems using natural language processing.

May 2022 - Dec 2022

Thomson Reuters Labs

Applied Research Scientist Intern

Thomson Reuters Labs is the applied-research arm of Thomson Reuters, working with some of the world's most comprehensive legal, tax, and corporate datasets to advance AI for professional services.

Over an eight-month internship, contributed to a legal text entailment research project and a named-entity recognition product initiative:

Co-authored two peer-reviewed papers exploring prompt engineering techniques for large language models on legal reasoning tasks.
Collaborated with research scientists to identify opportunities for applying state-of-the-art NLP methods to legal products.
Evaluated zero-shot, few-shot, and chain-of-thought prompting, as well as fine-tuning, with GPT-3 and T5 using the OpenAI API and Hugging Face for domain-specific legal reasoning.
Benchmarked baseline and state-of-the-art models — including spaCy, Conditional Random Fields, and LegalBERT — on a highly imbalanced named-entity recognition dataset.
Maintained rigorous documentation of literature reviews, data processing, and experimental results following lab standards.

Oct 2019 - Jul 2020

AI Hub @ Durham College

Research Assistant

The AI Hub at Durham College partners with industry to deliver AI solutions that uncover business insights and drive productivity and growth.

Built interactive dashboards on historical gold-market data using Tableau.
Implemented machine learning models for time-series forecasting, with end-to-end documentation covering datasets, algorithms, APIs, data-flow diagrams, and optimization options.
Presented data-driven insights and recommendations to business stakeholders.

Education

Sep 2021 - Jan 2023

Master's Degree

Master of Science in Computer Science

Ontario Tech University (UOIT)

GPA: 4.24 on a scale of 4.30

Location: Ontario, Canada

Sep 2019 - Jun 2020

Post-graduate Certificate

Graduate Certificate in Artificial Intelligence

Durham College

GPA: 4.83 on a scale of 5.00

Location: Ontario, Canada

Bachelor's Degree

Bachelor of Science in Applied Mathematics

Donghua University

Location: Shanghai, China

Selected Publications

When AIs Judge AIs: The Rise of Agent-as-a-Judge Evaluation for LLMs

Fangyi Yu. arXiv preprint, 2025.

Exploring the Effectiveness of Prompt Engineering for Legal Reasoning Tasks

Fangyi Yu, Lee Quartey, Frank Schilder. Findings of the Association for Computational Linguistics: ACL 2023.

Honey, I Chunked the Passwords: Generating Semantic Honeywords Resistant to Targeted Attacks Using Pre-Trained Language Models

Fangyi Yu, Miguel Vargas Martin. Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2023.

Legal Prompting: Teaching a Language Model to Think Like a Lawyer

Fangyi Yu, Lee Quartey, Frank Schilder. Natural Legal Language Processing Workshop (NLLP), 2022.

HoneyGAN: Creating Indistinguishable Honeywords with Improved Generative Adversarial Networks

Fangyi Yu, Miguel Vargas Martin. International Workshop on Security and Trust Management (STM), co-located with ESORICS, 2022.

GNPassGAN: Improved Generative Adversarial Networks For Trawling Offline Password Guessing

Fangyi Yu, Miguel Vargas Martin. IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), 2021.

On Deep Learning in Password Guessing: A Survey

Fangyi Yu. arXiv preprint, 2022.

Selected Articles

Research Methods Involving Human Studies

Towards Data Science, April 2022.

A walkthrough of the lifecycle of a human-subjects research experiment — from formulating hypotheses and designing the study, to running pilots, recruiting participants, collecting and analyzing data, and reporting results.

How to Build a Fake News Detection Web App Using Flask

Towards Data Science, August 2021.

A hands-on tutorial on framing fake-news detection as a binary classification problem, building an NLP model from scratch, and deploying it as a Flask web application.

A Thorough Guide to Time Series Analysis

Towards Data Science, July 2021.

A comprehensive guide to time-series analysis — covering core concepts and components, common statistical and machine-learning forecasting methods, and an end-to-end worked example predicting climate data.