Published in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2021
A simple yet efficient data augmentation strategy that uses a cross-encoder to label training data for a bi-encoder on pairwise sentence scoring tasks.
Recommended citation: https://aclanthology.org/2021.naacl-main.28/
Published in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021
A novel heterogeneous zero-shot retrieval benchmark containing 18 datasets from diverse text retrieval tasks and domains in English.
Recommended citation: https://openreview.net/forum?id=wCu6T5xFjeJ
Published in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), 2022
A novel unsupervised domain adaptation method that combines a query generator with pseudo-labeling from a cross-encoder.
Recommended citation: https://aclanthology.org/2022.naacl-main.168/
Published in Transactions of the Association for Computational Linguistics (TACL), 2022
A human-labeled multilingual retrieval dataset spanning 18 languages from diverse language families, designed to advance retrieval systems across many languages.
Recommended citation: https://arxiv.org/abs/2210.09984
Published as an arXiv preprint, 2023
Simple yet effective cross-lingual baselines, covering both sparse and dense retrieval models built with IR toolkits, for the test collections of the TREC 2022 NeuCLIR Track.
Recommended citation: https://arxiv.org/abs/2304.01019
Published in Association for Computational Linguistics (ACL) 2023 Industry Track, 2023
An analysis of semantic embedding APIs in realistic retrieval scenarios, intended to help practitioners and researchers find suitable services.
Recommended citation: https://arxiv.org/abs/2305.06300
Published in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), 2023
A unified toolkit for evaluation of diverse zero-shot neural sparse retrieval models.
Recommended citation: https://dl.acm.org/doi/abs/10.1145/3539618.3591902
Published in 2023 Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR’23), 2023
A domain adaptation technique that improves the zero-shot performance of dense retrieval models while preserving their 32x memory efficiency and low latency.
Recommended citation: https://arxiv.org/abs/2205.11498
Published as an arXiv preprint, 2023
A high-quality dataset for training and evaluating generative search (RAG) models with citations.
Recommended citation: https://arxiv.org/abs/2307.16883
To appear in SIGIR 2024 (Resource Track), 2024
Resources to support the BEIR benchmark: Reproducible lexical, sparse and dense baselines and statistical analyses.
Recommended citation: https://arxiv.org/abs/2210.09984
To appear in the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2024), 2024
A large-scale synthetic LLM-generated dataset for improving multilingual retrieval systems without human-labeled training data.
Recommended citation: https://arxiv.org/abs/2311.05800
To appear in SIGIR 2024 (Resource Track), 2024
We denoise the Touché 2020 argument retrieval subset of BEIR and conduct post-hoc relevance judgments on it.
Recommended citation: https://drive.google.com/file/d/1PelcEwWWCN1pS9HIJ9f7N_D5jtM3ZZUd/view
Published as an arXiv preprint, 2024
A multilingual hallucination evaluation dataset for measuring LLM performance on non-answerable questions in RAG systems.
Recommended citation: https://arxiv.org/abs/2312.11361
Published:
Gave a talk about BEIR, a heterogeneous benchmark for zero-shot evaluation of information retrieval models.
Published:
Gave a talk about the importance of zero-shot benchmarking in the field of information retrieval.
Published:
Gave a tutorial on recent advances in information retrieval.
Published:
Heterogeneous Benchmarking across Domains and Languages: The Key to Enable Meaningful Progress in IR Research.
Published:
Heterogeneous Benchmarking across Domains and Languages: The Key to Enable Meaningful Progress in IR Research.
Fall 2021, University of Waterloo, 2021
Worked as a TA for CS 135 (Designing Functional Programs), an undergraduate course at the University of Waterloo. Answered students’ questions and graded weekly assignments written in Racket.
Winter 2022, University of Waterloo, 2022
Worked as a TA for CS 136 (Elementary Algorithm Design and Data Abstraction), an undergraduate course at the University of Waterloo. Answered students’ questions and graded weekly assignments written in C.
Spring 2023, University of Waterloo, 2023
Worked as a TA for CS 241 (Foundations of Sequential Programs), an undergraduate course at the University of Waterloo. Answered students’ questions and graded weekly assignments written in C.
Winter 2023, University of Waterloo, 2023
Worked as a TA for CS 479/679 (Introduction to Artificial Intelligence), a combined undergraduate and graduate course at the University of Waterloo. Answered students’ questions and independently wrote and graded an entire assignment. In it, students implemented foundational neural network components from scratch using only NumPy: the model forward pass, the backpropagation algorithm, and several loss functions. The assignment also included a regression problem and a classification problem for students to solve. All code was written in Python.
Fall 2023, University of Waterloo, 2023
Worked as a TA for CS 370 (Numerical Approximation), an undergraduate course at the University of Waterloo. Held office hours for all four assignments to answer students’ questions and created video solutions for one of the assignments. Covered concepts such as systems of ODEs, Euler approximation, Runge-Kutta approximation and Discrete Fourier Transforms (DFTs). All code was written in Python.