From 3ab7155aa9b89ac532b2f2efcc3f136766b91025 Mon Sep 17 00:00:00 2001
From: Xiao
Date: Sun, 11 Feb 2024 12:33:00 +0000
Subject: [PATCH] Update README.md

---
 README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 3794565..8773346 100644
--- a/README.md
+++ b/README.md
@@ -237,9 +237,13 @@ We compare BGE-M3 with some popular methods, including BM25, openAI embedding, e
 - NarritiveQA:
 ![avatar](./imgs/nqa.jpg)
 
-- BM25
+- Comparison with BM25
 
 We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+We tested BM25 with two different tokenizers:
+one using the Lucene analyzer and the other using the same tokenizer as M3 (i.e., the tokenizer of xlm-roberta).
+The results indicate that BM25 remains a competitive baseline,
+especially for long-document retrieval.
 
 ![avatar](./imgs/bm25.jpg)
 
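For intuition on why the tokenizer choice matters to the BM25 baseline described above, here is a minimal self-contained sketch of Okapi BM25 scoring under two tokenization schemes. This is not the Pyserini implementation the patch links to; `whitespace_tokenize` (standing in for a Lucene-style analyzer) and `subword_tokenize` (standing in for an xlm-roberta-style subword tokenizer) are deliberately simplified, hypothetical stand-ins.

```python
import math
from collections import Counter

def whitespace_tokenize(text):
    # Crude stand-in for a Lucene-style analyzer: lowercase + whitespace split.
    return text.lower().split()

def subword_tokenize(text, max_len=4):
    # Crude stand-in for a subword tokenizer: chop each word into fixed-size
    # pieces, so long words yield several smaller units (like subword vocabularies).
    pieces = []
    for word in text.lower().split():
        pieces.extend(word[i:i + max_len] for i in range(0, len(word), max_len))
    return pieces

def bm25_scores(query, docs, tokenize, k1=1.2, b=0.75):
    # Score every document against the query with Okapi BM25.
    doc_tokens = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in doc_tokens) / len(doc_tokens)
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in doc_tokens:
        df.update(set(toks))
    scores = []
    for toks in doc_tokens:
        tf = Counter(toks)               # term frequency in this document
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "BM25 is a strong lexical retrieval baseline",
    "dense embedding models map text to vectors",
]
print(bm25_scores("bm25 baseline", docs, whitespace_tokenize))
```

Swapping `whitespace_tokenize` for `subword_tokenize` changes the index vocabulary, term frequencies, and document lengths, which is exactly the degree of freedom the patch's two BM25 variants explore.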