From a12bcd780a35568b0f3dc7c16da902622fe6af0f Mon Sep 17 00:00:00 2001
From: Xiao
Date: Thu, 1 Feb 2024 09:09:40 +0000
Subject: [PATCH] Update README.md

---
 README.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 0018398..865fa58 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,19 @@ Utilizing the re-ranking model (e.g., [bge-reranker](https://github.com/FlagOpen
 - Sparse retrieval (lexical matching): a vector of size equal to the vocabulary, with the majority of positions set to zero, calculating a weight only for tokens present in the text. e.g., BM25, [unicoil](https://arxiv.org/pdf/2106.14807.pdf), and [splade](https://arxiv.org/abs/2107.05720)
 - Multi-vector retrieval: use multiple vectors to represent a text, e.g., [ColBERT](https://arxiv.org/abs/2004.12832).

-**2. How to use BGE-M3 in other projects?**
+
+**2. Comparison with BGE-v1.5 and other monolingual models**
+
+BGE-M3 is a multilingual model, and its performance on monolingual embedding retrieval may not surpass that of models designed specifically for a single language.
+However, we still recommend trying BGE-M3 because of its versatility (support for multiple languages and long texts).
+Moreover, it can simultaneously generate multiple representations, and using them together can enhance accuracy and generalization,
+unlike most existing models, which can only perform dense retrieval.
+
+In the open-source community, there are many excellent models (e.g., jina-embedding, colbert, e5),
+and users can choose a model that suits their needs based on practical considerations,
+such as whether multilingual or cross-lingual support is required and whether long texts need to be processed.
+
+**3. How to use BGE-M3 in other projects?**

 For embedding retrieval, you can employ the BGE-M3 model using the same approach as BGE.
 The only difference is that the BGE-M3 model no longer requires adding instructions to the queries.
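The three retrieval modes listed in the hunk above (dense, sparse, multi-vector) differ mainly in how a query is scored against a document. The following is a minimal, dependency-free sketch of the three scoring rules, for illustration only; the helper names `cosine`, `lexical_score`, and `maxsim` are hypothetical and are not part of the FlagEmbedding API.

```python
import math

def cosine(u, v):
    """Dense retrieval: cosine similarity between two fixed-size vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def lexical_score(q_weights, d_weights):
    """Sparse retrieval: sum of weight products over tokens shared by
    query and document (tokens absent from either side contribute zero)."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def maxsim(q_vecs, d_vecs):
    """Multi-vector (ColBERT-style) retrieval: each query token vector is
    matched to its most similar document token vector, then scores are summed."""
    return sum(max(cosine(q, d) for d in d_vecs) for q in q_vecs)

# Toy example: identical dense vectors score 1.0; the sparse score only
# counts the shared token "bge"; MaxSim sums one best match per query token.
print(cosine([1.0, 0.0], [1.0, 0.0]))                        # 1.0
print(lexical_score({"bge": 0.5, "m3": 0.2}, {"bge": 0.4}))  # 0.2
print(maxsim([[1.0, 0.0], [0.0, 1.0]],
             [[1.0, 0.0], [0.0, 1.0]]))                      # 2.0
```

In a hybrid setup the three scores are typically combined, e.g. as a weighted sum, which is one way the "using them together" point above can be realized in practice.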
@@ -52,7 +64,7 @@ For sparse retrieval methods, most open-source libraries currently do not suppor

 Contributions from the community are welcome.

-**3. How to fine-tune bge-M3 model?**
+**4. How to fine-tune bge-M3 model?**

 You can follow the steps in this [example](https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/finetune) to fine-tune the dense embedding.

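The linked fine-tuning example consumes JSON-lines training data. Below is a minimal sketch of writing one such record, assuming a `{"query", "pos", "neg"}` field layout (the field names and the record contents here are illustrative placeholders; check the linked example for the authoritative format).

```python
import json
import os
import tempfile

# Hypothetical training record: one query with positive and negative passages.
# The texts are placeholders, not real training data.
records = [
    {
        "query": "what is bge-m3?",
        "pos": ["BGE-M3 is a multilingual embedding model supporting long texts."],
        "neg": ["The weather in Paris is mild in spring."],
    },
]

# Write one JSON object per line (the JSON-lines convention).
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Reading the file back yields one training example per line.
with open(path, encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
print(loaded[0]["query"])  # what is bge-m3?
```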