Upload README.md with huggingface_hub

Xiao 2024-03-18 06:36:49 +00:00 committed by system
parent 20d65a70df
commit e306d60933

@ -245,6 +245,20 @@ with torch.no_grad():
## Fine-tune
### Data Format
Training data should be a JSON Lines (`.jsonl`) file, where each line is a JSON object like this:
```
{"query": str, "pos": List[str], "neg": List[str], "prompt": str}
```
`query` is the query text, `pos` is a list of positive texts, `neg` is a list of negative texts, and `prompt` describes the relationship between the query and the texts. If you have no negative texts for a query, you can randomly sample some from the entire corpus to use as negatives.
See [toy_finetune_data.jsonl](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker/toy_finetune_data.jsonl) for a toy data file.
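As a rough illustration of the format above, the following sketch fills in missing negatives by random sampling and serializes the examples as one JSON object per line. The helper names `fill_negatives` and `to_jsonl`, the `num_neg` parameter, and the sample prompt string are hypothetical and are not part of FlagEmbedding; adapt them to your own data.

```python
import json
import random


def fill_negatives(example, corpus, num_neg=2, seed=0):
    """Return a copy of `example` with up to `num_neg` negatives.

    Hypothetical helper: missing negatives are sampled at random from
    `corpus`, skipping the query, the positives, and existing negatives.
    """
    rng = random.Random(seed)
    neg = list(example.get("neg", []))
    exclude = set(example["pos"]) | set(neg) | {example["query"]}
    # dict.fromkeys removes duplicate corpus entries while keeping order.
    candidates = [t for t in dict.fromkeys(corpus) if t not in exclude]
    missing = max(0, num_neg - len(neg))
    neg += rng.sample(candidates, min(missing, len(candidates)))
    return {**example, "neg": neg}


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


corpus = ["Paris is the capital of France.", "Cats are mammals.",
          "The Nile is a river.", "Python is a programming language."]
example = {
    "query": "What is the capital of France?",
    "pos": ["Paris is the capital of France."],
    "neg": [],
    # Assumed prompt wording, for illustration only.
    "prompt": "Given a query and a passage, predict whether the passage answers the query.",
}
print(to_jsonl([fill_negatives(example, corpus)]))
```

Randomly sampled negatives may occasionally be relevant to the query, so it is worth spot-checking a few of them before training.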
### Train
You can fine-tune the reranker with the following code:
**For llm-based reranker**