Nandan Thakur

📢 I am on the job market! Seeking academic/industry opportunities starting Spring 2026, preferably in India. Contact me at nandan.thakur@uwaterloo.ca.

Hello! I’m Nandan Thakur (नंदन ठाकुर / নন্দন ঠাকুর). I’m a ~~(fourth)~~ final-year PhD student in Computer Science at the University of Waterloo advised by Prof. Jimmy Lin. My PhD is supported by the David R. Cheriton Graduate Scholarship (2024-2026). During my PhD, I have interned at Google, Vectara and Databricks. I’ve also collaborated with industry partners including Snowflake, Microsoft and Huawei.

📄 View my CV (Fall 2025)

Research

My research focuses on three core aspects of evaluation and data in information retrieval:

Constructing challenging and realistic benchmarks with high-quality, human-curated evaluation samples
Designing efficient retrieval systems that balance quality and cost and generalize well to challenging domains
Standardizing RAG evaluation to build a more principled foundation within the IR & NLP community

In my research, I’ve developed widely used retrieval benchmarks such as BEIR and MIRACL, and trained efficient retrieval models like GPL and SWIM-IR. These advances help RAG systems, such as those in TREC-RAG, produce better language model answers by (i) leveraging cleaner training data (e.g., RLHN), (ii) reducing hallucinations across domains and languages (e.g., NoMIRACL, MIRAGE-Bench), and (iii) enabling evaluation on realistic benchmarks and metrics (e.g., FreshStack).

Past

Prior to my PhD, I was an NLP research assistant at the UKP Lab at TU Darmstadt, advised by Prof. Iryna Gurevych and Nils Reimers. I also have industry experience as a Data Scientist at KNOLSKAPE. I completed my undergraduate degree at BITS Pilani, KK Birla Goa Campus.

Recent Updates

[Jan 2025] We hosted the second iteration of the RAG Track at TREC RAG 2025. We appreciate all the teams that participated in the track!
[Dec 2025] FreshStack received an honourable mention for the “Best 2025 Search Project” by BCS, The Chartered Institute for IT Search Solutions.
[Oct 2025] Invited talk at IISc Bangalore on “Beyond Models: Rethinking Benchmarks, Data, and Evaluation for Retrieval-Augmented Generation”.
[Oct 2025] FreshStack is now a part of the RTEB benchmark.
[Sep 2025] Invited talk at Microsoft Research India, Bangalore on our Relabeling Hard Negatives (RLHN) work!
[Sep 2025] FreshStack has been accepted at the NeurIPS 2025 Datasets & Benchmark Track.
[Aug 2025] Acting as a PC Member for the BREV-RAG (Beyond Relevance-based EValuation of RAG systems) workshop being held at SIGIR-AP 2025!
[Aug 2025] Check out the Beyond-RAG mini-book by Hamel, which condenses five important research talks in the RAG series. The second chapter covers my RAG evaluation guest talk!
[Aug 2025] I gave a guest lecture on Modern IR Evaluation in the RAG Era at Mila, Montreal!
[Jun 2025] My invited guest lecture on Modern IR Evaluation in the RAG Era, with over 400 participants, is available now! [YouTube]
[Jun 2025] I was invited to the Weaviate Podcast to talk about RAG benchmarks. Check out the video now! [YouTube]
[Jun 2025] I have been invited to talk at Hamel’s mini RAG course on “Modern Information Retrieval Evaluation In The RAG Era”. Sign up here: [https://maven.com/p/fae749/modern-ir-evaluation-in-the-generative-rag-era].
[May 2025] Our new work on nuggetizing search arena RAG answers is available as a preprint now!
[Apr 2025] My exciting internship work at Databricks on FreshStack to create realistic RAG benchmarks is available on arXiv!
[Apr 2025] Our TREC 2024 RAG support and Nuggets preprints have been accepted at SIGIR 2025!
[Jan 2025] Gave a research talk on “Accelerating Multilingual RAG Systems” at Microsoft Research, Bangalore. [video].
[Jan 2025] My work during my internship at Vectara on “MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems” is now accepted at NAACL 2025.
[Jan 2025] Our contribution on including MIRACL in “MMTEB: Massive Multilingual Text Embedding Benchmark” is now accepted at ICLR 2025.

Selected Talks & Recordings

Selected Publications

BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
Z. Chen, X. Ma, ..., N. Thakur, ... J. Lin
MTI-LLM @ NeurIPS 2025

Paper | Website | Dataset | Code

Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with LLMs
N. Thakur*, C. Zhang*, X. Ma, J. Lin
EMNLP 2025 (Findings)

Paper | Dataset | Code

Chatbot Arena Meets Nuggets: Towards Explanations & Diagnostics in the Evaluation of LLM Responses
S. Sharifymoghaddam*, S. Upadhyay*, N. Thakur*, R. Pradeep, J. Lin
Preprint 2025

Paper | Dataset | Code

FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
N. Thakur, J. Lin, S. Havens, M. Carbin, O. Khattab, A. Drozdov
NeurIPS 2025 (D&B)

Paper | Website | Dataset | Code

Assessing Support for the TREC 2024 RAG Track: A Large-Scale Comparative Study of LLM and Human Evaluations
N. Thakur, R. Pradeep, S. Upadhyay, D. Campos, N. Craswell, J. Lin
SIGIR 2025 (short)

Paper

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
R. Pradeep, N. Thakur, S. Upadhyay, D. Campos, N. Craswell, J. Lin
SIGIR 2025

Paper | Code

A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look
S. Upadhyay, R. Pradeep, N. Thakur, D. Campos, N. Craswell, I. Soboroff, H. T. Dang, J. Lin
ICTIR 2025

Paper

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
R. Pradeep*, N. Thakur*, S. Sharifymoghaddam, E. Zhang, R. Nguyen, D. Campos, N. Craswell, J. Lin
ECIR 2025 (Resource)

Paper | Code

MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
N. Thakur, S. Kazi, G. Luo, J. Lin, A. Ahmad
NAACL 2025

Paper | Code | Website

MMTEB: Massive Multilingual Text Embedding Benchmark
K. Enevoldsen, I. Chung, …, N. Thakur, …
ICLR 2025

Paper | Website

UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor
S. Upadhyay, R. Pradeep, N. Thakur, N. Craswell, J. Lin
Preprint 2024

Paper | Code

“Knowing When You Don’t Know”: A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation
N. Thakur, L. Bonifacio, X. Zhang, O. Ogundepo, E. Kamalloo, D. A. Hermelo, …, M. Rezagholizadeh, J. Lin
EMNLP 2024 (Findings)

Paper | Code

Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR
N. Thakur, L. Bonifacio, M. Fröbe, A. Bondarenko, E. Kamalloo, M. Potthast, M. Hagen, J. Lin
SIGIR 2024 (Repro)

Paper | Code

Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses
E. Kamalloo, N. Thakur, C. Lassance, X. Ma, J. H. Yang, J. Lin
SIGIR 2024 (Resource)

Paper

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
N. Thakur, J. Ni, G. H. Abrego, J. Wieting, J. Lin, D. Cer
NAACL 2024

Paper | Code

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
E. Kamalloo, A. Jafari, X. Zhang, N. Thakur, J. Lin
Preprint 2023

Paper

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval
J. Lin, D. Alfonso-Hermelo, V. Jeronymo, E. Kamalloo, C. Lassance, …, N. Thakur, J. H. Yang, X. Zhang
Preprint 2023

Paper

MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages
X. Zhang*, N. Thakur*, O. Ogundepo, E. Kamalloo, D. A. Hermelo, …, M. Rezagholizadeh, J. Lin
TACL 2023

Paper | Website | Code | Dataset

Evaluating Embedding APIs for Information Retrieval
E. Kamalloo, X. Zhang, O. Ogundepo, N. Thakur, D. A. Hermelo, M. Rezagholizadeh, J. Lin
ACL 2023 (Industry)

Paper

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval
N. Thakur, K. Wang, I. Gurevych, J. Lin
SIGIR 2023 (Resource)

Paper | Code

Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval
N. Thakur, N. Reimers, J. Lin
ReNeuIR 2023

Paper | Code

GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
K. Wang, N. Thakur, N. Reimers, I. Gurevych
NAACL 2022

Paper | Code

BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
N. Thakur, N. Reimers, A. Rücklé, A. Srivastava, I. Gurevych
NeurIPS 2021 (D&B)

Paper | Website | Code

Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
N. Thakur, N. Reimers, J. Daxenberger, I. Gurevych
NAACL 2021

Paper | Website

Old Updates

2024

[Dec 2024] My work on “Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track” has been accepted at ECIR 2025 (Resource).
[Sep 2024] I started my Fall 2024 internship at Databricks in San Francisco, mentored by Omar Khattab and managed by Sam Havens and Michael Carbin.
[Aug 2024] We have received over 40 participants in the first year of the TREC 2024 RAG Track, making it one of the best-attended tracks to date!
[May 2024] I have been awarded the David R. Cheriton Graduate Scholarship starting Fall 2024 for my scholastic excellence in my PhD! [Link]
[May 2024] Collaboration with Snowflake AI towards building better BEIRv2 and TREC-RAG [blogpost].
[Apr 2024] I will be attending NAACL 2024 in person in Mexico City, Mexico from 16-20 June 2024 and SIGIR in Washington DC, USA from 14-18 July 2024. If interested, do reach out!
[Apr 2024] Received a 3K USD grant from Google to attend the NAACL 2024 Conference in Mexico City.
[Apr 2024] My work on “Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR” has been accepted at SIGIR 2024 (Reproduction).
[Apr 2024] My work on “Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses” has been accepted at SIGIR 2024 (Resource).
[Mar 2024] My Google internship work on “SWIM-IR: Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval” has been accepted at NAACL 2024.
[Feb 2024] Started a part-time research collaboration with Vectara on improving multilingual RAG systems.
[Jan 2024] Gave two research talks on “Heterogeneous Benchmarking of Information Retrieval” at IIT-D (Delhi) and IIIT-Delhi [presentation] [video].

2023

[Nov 2023] TREC RAG 2024 has been accepted and will be conducted as a shared task in TREC 2024.
[Nov 2023] My internship work at Google is out on arXiv, and the dataset is released here.
[Jul 2023] I will be attending the SIGIR 2023 virtual conference being held in Taipei, Taiwan! Say hi to me (virtually)!
[Jul 2023] I will be attending the ACL 2023 in-person conference being held in Toronto, Canada! Say hi to me!
[Jun 2023] The Domain Adaptation Paper has been accepted at the ReNeuIR 2023 Workshop, to be held jointly with SIGIR 2023!
[Jun 2023] The SPRINT Toolkit Paper has been accepted at the SIGIR 2023 Resource Track!
[May 2023] The MIRACL Paper has been accepted at TACL 2023!
[May 2023] The Evaluating Embedding API Paper has been accepted at the ACL 2023 Industry Track!

2022

[Sep 2022] The MIRACL Challenge was accepted in WSDM Cup 2023. The Challenge is now live and looking for participants.
[Aug 2022] I started my Fall Internship at the Language Team in Google Research with Daniel Cer and Jianmo Ni.

2021

[Mar 2021] Augmented SBERT got accepted as a long paper at NAACL 2021! PDF
[Feb 2021] Designed and attended The First ELLIS NLP 2021 Workshop. Website
[Jan 2021] Designed the Second 2021 SustaiNLP Workshop Website. Website

2020

[Nov 2020] [Cancelled (COVID-19)] Selected to speak at PyCon Italia 2020: “Extract or Replace Keywords in sentences 28x times faster than Regex - FlashText”. Abstract YouTube Github
[Jul 2020] ArgumenText won 4th place amongst 3000+ startups in Nordbayerischen Businessplan. Link
[Jul 2020] I attended the Association for Computational Linguistics (ACL) 2020 virtual conference.