From 3ab7155aa9b89ac532b2f2efcc3f136766b91025 Mon Sep 17 00:00:00 2001
From: Xiao
Date: Sun, 11 Feb 2024 12:33:00 +0000
Subject: [PATCH] Update README.md

---
 README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3794565..8773346 100644
--- a/README.md
+++ b/README.md
@@ -237,9 +237,13 @@ We compare BGE-M3 with some popular methods, including BM25, openAI embedding, e
 - NarritiveQA:
 ![avatar](./imgs/nqa.jpg)
 
-- BM25
+- Comparison with BM25
 
 We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+We tested BM25 with two different tokenizers:
+one using the Lucene analyzer and the other using the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta).
+The results indicate that BM25 remains a competitive baseline,
+especially for long-document retrieval.
 
 ![avatar](./imgs/bm25.jpg)
 
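For intuition on why the tokenizer choice matters to the BM25 baseline described above, here is a minimal self-contained sketch of Okapi BM25 scoring under two tokenization schemes. This is not the Pyserini implementation the patch links to; `whitespace_tokenize` (standing in for a Lucene-style analyzer) and `subword_tokenize` (standing in for an xlm-roberta-style subword tokenizer) are deliberately simplified, hypothetical stand-ins.

```python
import math
from collections import Counter

def whitespace_tokenize(text):
    # Crude stand-in for a Lucene-style analyzer: lowercase + whitespace split.
    return text.lower().split()

def subword_tokenize(text, max_len=4):
    # Crude stand-in for a subword tokenizer: chop each word into fixed-size
    # pieces, so long words yield several smaller units (like subword vocabularies).
    pieces = []
    for word in text.lower().split():
        pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return pieces

def bm25_scores(query, docs, tokenize, k1=1.2, b=0.75):
    # Score every document against the query with Okapi BM25.
    doc_tokens = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in doc_tokens) / len(doc_tokens)
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in doc_tokens:
        df.update(set(toks))
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)               # term frequency in this document
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "BM25 is a strong lexical retrieval baseline",
    "dense embedding models map text to vectors",
]
print(bm25_scores("bm25 baseline", docs, whitespace_tokenize))
```

Swapping `whitespace_tokenize` for `subword_tokenize` changes the index vocabulary, term frequencies, and document lengths, which is exactly the degree of freedom the patch's two BM25 variants explore.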