Update README.md

parent aa47896ffa · commit cfdb103bd8

README.md (13 changed lines)
@@ -215,11 +215,6 @@ print(model.compute_score(sentence_pairs,
 We compare BGE-M3 with some popular methods, including BM25, OpenAI embeddings, etc.
-We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
-To make the BM25 and BGE-M3 more comparable, in the experiment,
-BM25 used the same tokenizer as BGE-M3 (i.e., the tokenizer of XLM-Roberta).
-Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
 - Multilingual (Miracl dataset)
@@ -242,6 +237,12 @@ Using the same vocabulary can also ensure that both approaches have the same retrieval latency.
 - NarrativeQA:
+- BM25
+
+  We utilized Pyserini to implement BM25, and the test results can be reproduced by this [script](https://github.com/FlagOpen/FlagEmbedding/tree/master/C_MTEB/MLDR#bm25-baseline).
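The BM25 baseline above is run through Pyserini, and the earlier paragraph notes that it shares BGE-M3's tokenizer so that the lexical and learned retrievers see the same vocabulary. As a minimal sketch of what BM25 scoring over a pre-tokenized corpus looks like, here is a plain-Python implementation; the whitespace `tokenize` below is only a stand-in for XLM-RoBERTa's subword tokenizer, and the corpus and query are made-up examples, not the paper's data.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.2, b=0.75):
    """Score every document in corpus_tokens against the query with BM25."""
    N = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / N
    # document frequency of each distinct query term
    df = {t: sum(1 for d in corpus_tokens if t in d) for t in set(query_tokens)}
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        s = 0.0
        for t in query_tokens:
            if df[t] == 0:
                continue  # term never occurs in the corpus
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

def tokenize(text):
    # Stand-in for BGE-M3's (XLM-RoBERTa's) subword tokenizer.
    return text.lower().split()

corpus = ["BGE-M3 supports dense, sparse and multi-vector retrieval",
          "BM25 is a classic lexical retrieval baseline"]
corpus_tokens = [tokenize(d) for d in corpus]
query = tokenize("lexical retrieval with BM25")
scores = bm25_scores(query, corpus_tokens)
best = max(range(len(corpus)), key=scores.__getitem__)
```

Because both retrievers tokenize identically, any speed or quality difference comes from the scoring model rather than the vocabulary, which is the comparability point the README is making.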
 ## Training
 - Self-knowledge Distillation: combining multiple outputs from different
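The self-knowledge distillation bullet is truncated in this view, but the gist (per the linked report) is that scores from the model's different retrieval modes are combined into a teacher signal that each single mode learns to match. A minimal sketch under that reading, with made-up scores, a simple averaged teacher, and a KL-divergence objective; the paper's actual loss may differ in detail.

```python
import math

def softmax(xs):
    """Convert raw scores to a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def self_distill_loss(student_scores, teacher_scores):
    """KL(teacher || student) over candidate passages for one query."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical relevance scores for 3 candidate passages from each mode.
dense  = [0.9, 0.2, 0.1]
sparse = [0.7, 0.4, 0.0]
multiv = [0.8, 0.3, 0.2]

# Teacher = average of the modes' scores (one simple way to combine them).
teacher = [sum(s) / 3 for s in zip(dense, sparse, multiv)]

# Each single mode is then trained to match the combined teacher.
loss_sparse = self_distill_loss(sparse, teacher)
```

The loss is zero only when a mode's ranking distribution already matches the combined signal, so weaker modes (sparse, multi-vector) get pulled toward the ensemble's behavior.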
@@ -259,7 +260,7 @@ Refer to our [report](https://arxiv.org/pdf/2402.03216.pdf) for more details.
 ## Acknowledgement
 Thanks to the authors of open-sourced datasets, including Miracl, MKQA, NarrativeQA, etc.
-Thanks the open-sourced libraries like [Tevatron](https://github.com/texttron/tevatron), [pyserial](https://github.com/pyserial/pyserial).
+Thanks to the open-sourced libraries like [Tevatron](https://github.com/texttron/tevatron), [Pyserini](https://github.com/castorini/pyserini).