Update README.md

Xiao 2024-02-08 18:04:27 +00:00 committed by system
parent 2d5552fc4c
commit 6d44202115
No known key found for this signature in database
GPG Key ID: 6A528E38E0733467

@@ -214,10 +214,11 @@ print(model.compute_score(sentence_pairs,
 ## Evaluation
-**Currently, the results of BM25 on non-English data are incorrect.
-We will review our testing process and update the paper as soon as possible.
-For more powerful BM25, you can refer to this [repo](https://github.com/carlos-lassance/bm25_mldr).
-Thanks to the community for the reminder and to carlos-lassance for providing the results.**
+We compare BGE-M3 with some popular methods, including BM25, openAI embedding, etc.
+We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
+To make the BM25 and BGE-M3 more comparable, in the experiment,
+BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-Roberta).
+Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
 - Multilingual (Miracl dataset)
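
Note on the new README text: the idea of running BM25 over the same tokens that the dense model sees can be illustrated with a small, self-contained sketch. This is not the Pyserini setup or the linked script; `bm25_scores`, its `k1`/`b` values, and the whitespace tokenizer are hypothetical stand-ins chosen for illustration. The only point it makes is that BM25 is agnostic to where its tokens come from, so any tokenizer callable (for example, a subword tokenizer such as the one used by XLM-RoBERTa) could be plugged in.

```python
import math
from collections import Counter
from typing import Callable, List


def bm25_scores(
    query: str,
    docs: List[str],
    tokenize: Callable[[str], List[str]],
    k1: float = 0.9,  # illustrative value, not taken from the README
    b: float = 0.4,   # illustrative value, not taken from the README
) -> List[float]:
    """Score each document against the query with Okapi-style BM25.

    `tokenize` is any callable mapping text to a list of tokens, so the
    same tokenizer can be shared between BM25 and another retriever.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    # Guard against division by zero when every document is empty.
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    q_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Stand-in tokenizer: lowercase whitespace split. A subword tokenizer's
# text-to-tokens function could be passed here instead.
def whitespace_tokenize(text: str) -> List[str]:
    return text.lower().split()


docs = ["the cat sat", "dogs bark loudly", "the cat and the dog"]
scores = bm25_scores("cat", docs, whitespace_tokenize)
```

With the whitespace stand-in, the second document shares no token with the query and scores zero, while the shorter of the two matching documents ranks higher because of length normalization.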