From 8bff2120360d20fe9e80574f4cf71d7e15e5b8c6 Mon Sep 17 00:00:00 2001
From: Open LLM Leaderboard PR Bot
Date: Fri, 17 Nov 2023 21:16:51 +0000
Subject: [PATCH] Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
---
 README.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9282036..f31b21b 100644
--- a/README.md
+++ b/README.md
@@ -140,4 +140,17 @@ You can find the paper at https://arxiv.org/abs/2309.05463
   journal={arXiv preprint arXiv:2309.05463},
   year={2023}
 }
-```
\ No newline at end of file
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_microsoft__phi-1_5)
+
+| Metric                | Value |
+|-----------------------|-------|
+| Avg.                  | 41.6  |
+| ARC (25-shot)         | 52.9  |
+| HellaSwag (10-shot)   | 63.79 |
+| MMLU (5-shot)         | 43.89 |
+| TruthfulQA (0-shot)   | 40.89 |
+| Winogrande (5-shot)   | 72.22 |
+| GSM8K (5-shot)        | 12.43 |
+| DROP (3-shot)         | 5.04  |