Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Page Not Found

Page not found. Your pixels are in another canvas.

Jupyter notebook markdown generator

Posts

publications

Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks

Published in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2021

A simple yet efficient data augmentation strategy that uses a cross-encoder to label training data for the bi-encoder in pairwise sentence scoring tasks.

Recommended citation: https://aclanthology.org/2021.naacl-main.28/

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Published in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021

A novel heterogeneous zero-shot retrieval benchmark containing 18 datasets from diverse text retrieval tasks and domains in English.

Recommended citation: https://openreview.net/forum?id=wCu6T5xFjeJ

GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval

Published in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2022

A novel unsupervised domain adaptation method which combines a query generator with pseudo labeling from a cross-encoder.

Recommended citation: https://aclanthology.org/2022.naacl-main.168/

Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

Published in Transactions of the Association for Computational Linguistics (TACL), 2022

A human-labeled multilingual retrieval dataset spanning 18 languages from diverse language families, built to advance retrieval systems across languages.

Recommended citation: https://arxiv.org/abs/2210.09984

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

Published in arXiv preprint, 2023

Simple yet effective cross-lingual baselines involving both sparse and dense retrieval models, built with IR toolkits, for test collections in the TREC 2022 NeuCLIR Track.

Recommended citation: https://arxiv.org/abs/2304.01019

Evaluating Embedding APIs for Information Retrieval

Published in Association for Computational Linguistics (ACL) 2023 Industry Track, 2023

An analysis of semantic embedding APIs in realistic retrieval scenarios, intended to help practitioners and researchers find suitable services.

Recommended citation: https://arxiv.org/abs/2305.06300

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023

A unified toolkit for evaluation of diverse zero-shot neural sparse retrieval models.

Recommended citation: https://dl.acm.org/doi/abs/10.1145/3539618.3591902

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

Published in arXiv preprint, 2023

A high-quality dataset for training and evaluating generative search (RAG) models with citations.

Recommended citation: https://arxiv.org/abs/2307.16883

Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval

Published in 2023 Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR’23), 2023

A domain adaptation technique that improves the zero-shot performance of dense retrieval models while maintaining 32x memory efficiency and latency.

Recommended citation: https://arxiv.org/abs/2205.11498

Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses

To appear in SIGIR 2024 (Resource Track), 2024

Resources to support the BEIR benchmark: reproducible lexical, sparse, and dense baselines, along with statistical analyses.

Recommended citation: https://arxiv.org/abs/2210.09984

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

To appear in the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2024), 2024

A large-scale synthetic LLM-generated dataset for improving multilingual retrieval systems without human-labeled training data.

Recommended citation: https://arxiv.org/abs/2311.05800

talks

BEIR: An Open-Source Benchmark for Information Retrieval Systems

Published: June 04, 2021

Gave a talk about BEIR, a heterogeneous benchmark for zero-shot evaluation of retrieval models.

Heterogeneous Benchmarking in Information Retrieval Research

Published: November 17, 2022

Gave a talk about the importance of zero-shot benchmarking in the field of information retrieval.

Advanced Information Retrieval (Tutorial)

Published: June 01, 2023

Gave a tutorial about recent advances in the field of Information Retrieval.

teaching

CS 135 Designing Functional Programs

Fall 2021, University of Waterloo, 2021

Worked as a TA for CS 135, the undergraduate course on Designing Functional Programs at the University of Waterloo. Answered students’ questions and marked weekly student assignments written in Racket.

CS 136 Elementary Algorithm Design and Data Abstraction

Winter 2022, University of Waterloo, 2022

Worked as a TA for CS 136, the undergraduate course on Elementary Algorithm Design and Data Abstraction at the University of Waterloo. Answered students’ questions and marked weekly student assignments written in C.

CS 241 Foundations of Sequential Programs

Spring 2023, University of Waterloo, 2023

Worked as a TA for CS 241, the undergraduate course on Foundations of Sequential Programs at the University of Waterloo. Answered students’ questions and marked weekly student assignments written in C.

CS 479/679 Introduction to Artificial Intelligence

Winter 2023, University of Waterloo, 2023

Worked as a TA for CS 479/679, the mixed undergraduate and graduate course on Introduction to Artificial Intelligence at the University of Waterloo. Answered students’ questions and independently wrote and graded an entire assignment. In this assignment, I asked students to implement foundational components of neural networks, including the model forward pass, the backpropagation algorithm from scratch (using NumPy only), and several loss functions. I also set a regression problem and a classification problem for the students to evaluate. All code was written in Python.

CS 370 Numerical Approximation

Fall 2023, University of Waterloo, 2023

Worked as a TA for CS 370, the undergraduate course on Numerical Approximation at the University of Waterloo. Answered students’ questions during office hours for all four assignments. Created video solutions for one of the assignments. Covered concepts such as systems of ODEs, Euler approximation, Runge-Kutta approximation, and Discrete Fourier Transforms (DFTs). All code was written in Python.

Nandan Thakur
