:wave: Hello! My name is Nandan Thakur (नंदन ठाकुर / নন্দন ঠাকুর) [CV].

I’m currently a third-year PhD student in the David R. Cheriton School of Computer Science at the University of Waterloo in Canada, advised by Prof. Jimmy Lin. My research is supported by the David R. Cheriton Graduate Scholarship (2024).

My research is on “Heterogeneous benchmarking of Retrieval and RAG systems across diverse domains and languages”. I developed the BEIR benchmark in 2021, which has become an industry standard for measuring model generalization. I’ve also been fortunate to contribute to MIRACL, SWIM-IR, and similar datasets. Recently, I have been working on better evaluation of Retrieval-Augmented Generation (RAG) systems, and I am co-hosting the first-ever RAG competition at TREC: the TREC 2024 RAG track. During my PhD, I’ve been involved in many research collaborations, including with Snowflake AI, Vectara, and Huawei. I’ve also been fortunate to intern at Google Research.

Before my PhD, I was a research assistant at the UKP Lab (TU Darmstadt) in Germany, supervised by Prof. Iryna Gurevych and Nils Reimers (2019–2021). I received my undergraduate degree from the Birla Institute of Technology and Science, Pilani (BITS Pilani) in 2018. I also have industry experience: I worked as a Data Scientist at KNOLSKAPE (2018–2019) and completed undergraduate internships at EMBL Heidelberg (Summer 2018) and Belong.co (Fall & Winter 2017).

If you wish to learn more about my research, visit the Research page, where I describe it in depth and maintain a list of all my publications. For any questions, the best way to reach me is via email (nandan.thakur@uwaterloo.ca or nandant@gmail.com) or Twitter.

I am actively looking for research internships in Fall 2024!

:fire: Recent News


  • [May 2024] :trophy: I have been awarded the David R. Cheriton Graduate Scholarship, starting Fall 2024, for scholastic excellence during my PhD! [Link]
  • [May 2024] :handshake: Started a collaboration with Snowflake AI on building a better BEIRv2 and TREC-RAG [blogpost].
  • [Apr 2024] :airplane: I will be attending NAACL 2024 in person in Mexico City, Mexico (16–20 June 2024) and SIGIR 2024 in Washington, DC, USA (14–18 July 2024). If you’re interested in meeting, do reach out!
  • [Apr 2024] :moneybag: Received a USD 3K grant from Google to attend NAACL 2024 in Mexico City.
  • [Apr 2024] :page_facing_up: My work on “Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR” has been accepted at SIGIR 2024 (Reproduction).
  • [Apr 2024] :page_facing_up: My work on “Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses” has been accepted at SIGIR 2024 (Resource).
  • [Mar 2024] :page_facing_up: My Google internship work on “SWIM-IR: Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval” has been accepted at NAACL 2024.
  • [Feb 2024] :bulb: Started a part-time research collaboration with Vectara on improving multilingual RAG systems.
  • [Jan 2024] :speaking_head: Gave two research talks on “Heterogeneous Benchmarking of Information Retrieval” at IIT Delhi and IIIT-Delhi [presentation] [video].
  • [Nov 2023] :scroll: TREC RAG 2024 has been accepted and will be conducted as a shared task in TREC 2024.
  • [Nov 2023] :newspaper: My Google internship work is out on arXiv, and the dataset has been released here.
  • [Jul 2023] :computer: I will be attending SIGIR 2023, held in Taipei, Taiwan, virtually. Say hi to me (virtually)!
  • [Jul 2023] :cityscape: I will be attending ACL 2023 in person in Toronto, Canada! Say hi to me!
  • [Jun 2023] :page_facing_up: The Domain Adaptation Paper has been accepted at the ReNeuIR 2023 Workshop, held jointly with SIGIR 2023!
  • [Jun 2023] :page_facing_up: The SPRINT Toolkit Paper has been accepted at the SIGIR 2023 Resource Track!
  • [May 2023] :page_facing_up: The MIRACL Paper has been accepted in TACL 2023!
  • [May 2023] :page_facing_up: The Evaluating Embedding API Paper has been accepted at the ACL 2023 Industry Track!
  • [Sep 2022] :trophy: The MIRACL Challenge was accepted as part of the WSDM Cup 2023. The challenge is now live and looking for participants.
  • [Aug 2022] :briefcase: I started my fall internship with the Language team at Google Research, working with Daniel Cer and Jianmo Ni.
  • [Mar 2021] :page_facing_up: Augmented SBERT got accepted as a long paper at NAACL 2021! PDF
  • [Feb 2021] :globe_with_meridians: Designed and attended the First ELLIS NLP Workshop (2021). Website
  • [Jan 2021] :globe_with_meridians: Designed the website for the Second SustaiNLP Workshop (2021). Website
  • [Nov 2020] :no_entry_sign: [Cancelled (COVID-19)] Selected to speak at PyCon Italia 2020: “Extract or Replace Keywords in sentences 28x times faster than Regex - FlashText”. Abstract YouTube Github
  • [Jul 2020] :trophy: ArgumenText placed 4th among 3000+ startups in the Nordbayerischen Businessplan competition. Link
  • [Jul 2020] :computer: I attended the Association for Computational Linguistics (ACL) 2020 virtual conference.