Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
publications
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
Published in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2021
A simple yet efficient data augmentation strategy that uses a cross-encoder to label training data for a bi-encoder on pairwise sentence scoring tasks.
Recommended citation:
Download Paper
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Published in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021
A novel heterogeneous zero-shot retrieval benchmark containing 18 datasets from diverse text retrieval tasks and domains in English.
Recommended citation:
Download Paper
GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval
Published in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2022
A novel unsupervised domain adaptation method that combines a query generator with pseudo-labeling from a cross-encoder.
Recommended citation:
Download Paper
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages
Published in Transactions of the Association for Computational Linguistics (TACL), 2022
A human-labeled multilingual retrieval dataset covering 18 languages from diverse language families, built to advance retrieval systems across many languages.
Recommended citation:
Download Paper
Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval
Published in arXiv preprint, 2023
Simple yet effective cross-lingual baselines, covering both sparse and dense retrieval models built with IR toolkits, for the test collections of the TREC 2022 NeuCLIR Track.
Recommended citation:
Download Paper
Evaluating Embedding APIs for Information Retrieval
Published in Association for Computational Linguistics (ACL) 2023 Industry Track, 2023
An analysis of semantic embedding APIs in realistic retrieval scenarios, intended to help practitioners and researchers find suitable services.
Recommended citation:
Download Paper
SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval
Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023
A unified toolkit for evaluation of diverse zero-shot neural sparse retrieval models.
Recommended citation:
Download Paper
Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval
Published in 2023 Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR’23), 2023
A domain adaptation technique that improves the zero-shot performance of dense retrieval models while maintaining 32x memory efficiency and low latency.
Recommended citation:
Download Paper
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
Published in arXiv preprint, 2023
A high-quality dataset for training and evaluating generative search (RAG) models with citations.
Recommended citation:
Download Paper
Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses
To appear in SIGIR 2024 (Resource Track), 2024
Resources to support the BEIR benchmark: Reproducible lexical, sparse and dense baselines and statistical analyses.
Recommended citation:
Download Paper
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
To appear in the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2024), 2024
A large-scale synthetic LLM-generated dataset for improving multilingual retrieval systems without human-labeled training data.
Recommended citation:
Download Paper
Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR
To appear in SIGIR 2024 (Resource Track), 2024
We denoise the Touché 2020 Argument Retrieval subset of BEIR and conduct post-hoc relevance judgments on it.
Recommended citation:
Download Paper
NoMIRACL: Knowing When You Don’t Know for Robust Multilingual Retrieval-Augmented Generation
Published in arXiv preprint, 2024
A multilingual hallucination evaluation dataset for measuring LLM performance on non-answerable questions in RAG systems.
Recommended citation:
Download Paper
talks
BEIR: An Open-Source Benchmark for Information Retrieval Systems @ OpenNLP
Published:
Summary
Gave a talk about BEIR, a heterogeneous benchmark for zero-shot retrieval.
Heterogeneous Benchmarking @ Stanford
Published:
Summary
Gave a talk about the importance of zero-shot benchmarking in the field of information retrieval.
Advanced Information Retrieval @ Koç
Published:
Summary
Gave a tutorial about recent advances in the field of information retrieval.
Heterogeneous IR Benchmarking @ IIT Delhi
Published:
Title
Heterogeneous Benchmarking across Domains and Languages: The Key to Enabling Meaningful Progress in IR Research
Heterogeneous IR Benchmarking @ IIIT Delhi
Published:
Title
Heterogeneous Benchmarking across Domains and Languages: The Key to Enabling Meaningful Progress in IR Research
Accelerating Multilingual RAG @ Microsoft Research
Published:
Title
Advancing Multilingual RAG Systems: Retrieval, Relevance, and Generation Evaluation
teaching
CS 135 Designing Functional Programs
Fall 2021, University of Waterloo, 2021
Worked as a TA for CS 135, an undergraduate course on Designing Functional Programs at the University of Waterloo. Answered students' questions and marked weekly assignments written in Racket.
CS 136 Elementary Algorithm Design and Data Abstraction
Winter 2022, University of Waterloo, 2022
Worked as a TA for CS 136, an undergraduate course on Elementary Algorithm Design and Data Abstraction at the University of Waterloo. Answered students' questions and marked weekly assignments written in C.
CS 241 Foundations of Sequential Programs
Spring 2023, University of Waterloo, 2023
Worked as a TA for CS 241, an undergraduate course on Foundations of Sequential Programs at the University of Waterloo. Answered students' questions and marked weekly assignments written in C.
CS 479/679 Introduction to Artificial Intelligence
Winter 2023, University of Waterloo, 2023
Worked as a TA for CS 479/679, a mixed undergraduate and graduate course on Introduction to Artificial Intelligence at the University of Waterloo. Answered students' questions and single-handedly wrote and graded an entire assignment. In this assignment, students implemented foundational neural network components from scratch using only NumPy, including the model forward pass, the backpropagation algorithm, and several different loss functions. I also set a regression problem and a classification problem for students to evaluate their implementations on. All code was written in Python.
CS 370 Numerical Approximation
Winter 2024, Spring 2024 & Winter 2025, University of Waterloo, 2025
Worked as a TA for CS 370, an undergraduate course on Numerical Approximation at the University of Waterloo. Held office hours to answer students' questions for all four assignments and created video solutions for one of them. Covered concepts such as systems of ODEs, Euler and Runge-Kutta approximations, and Discrete Fourier Transforms (DFTs). All code was written in Python.