Adding Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr. Its purpose is to add evaluation results from the Open LLM Leaderboard to your model card. If you encounter any issues, please report them at https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions.
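For context, a PR like this can be opened programmatically with `huggingface_hub`; below is a minimal sketch using `create_commit(..., create_pr=True)` (a real `huggingface_hub` API, though not necessarily how the Space itself is implemented; `new_readme` is a hypothetical placeholder for the updated card text):

```
# Sketch: open a pull request that replaces README.md with an updated model card.
# `new_readme` is a hypothetical placeholder, not the Space's actual code.
from huggingface_hub import CommitOperationAdd, HfApi

new_readme = "---\nlicense: apache-2.0\n---\n# Updated card\n"

api = HfApi()  # requires a token with write/PR permissions on the Hub
api.create_commit(
    repo_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    operations=[
        CommitOperationAdd(path_in_repo="README.md", path_or_fileobj=new_readme.encode())
    ],
    commit_message="Adding Evaluation Results",
    create_pr=True,  # open as a pull request instead of committing to main
)
```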
This commit is contained in:
parent fe8a4ea1ff
commit 3a67035e2d
README.md (127 lines changed)
@@ -1,19 +1,115 @@
---
language:
- en
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
- bigcode/starcoderdata
- HuggingFaceH4/ultrachat_200k
- HuggingFaceH4/ultrafeedback_binarized
widget:
- example_title: Fibonacci (Python)
  messages:
  - role: system
    content: You are a chatbot who can help code!
  - role: user
    content: Write me a function to calculate the first 10 digits of the fibonacci
      sequence in Python and print it out to the CLI.
model-index:
- name: TinyLlama-1.1B-Chat-v1.0
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 5.96
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-Chat-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 4.01
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-Chat-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.91
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-Chat-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.0
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-Chat-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.31
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-Chat-v1.0
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.12
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=TinyLlama/TinyLlama-1.1B-Chat-v1.0
      name: Open LLM Leaderboard
---
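Once merged, the `model-index` block above is machine-readable. Below is a minimal sketch of reading it back with the `ModelCard` API from `huggingface_hub` (`ModelCard.load` and `eval_results` are real `huggingface_hub` APIs, assuming a reasonably recent version; if the metadata is absent, the loop simply does nothing):

```
# Sketch: parse the model-index metadata from the card above.
from huggingface_hub import ModelCard

card = ModelCard.load("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
# eval_results is a list of EvalResult objects parsed from model-index (or None)
for result in card.data.eval_results or []:
    print(f"{result.dataset_name}: {result.metric_name} = {result.metric_value}")
```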
<div align="center">
@@ -63,4 +159,17 @@ print(outputs[0]["generated_text"])
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
```
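The `widget.messages` entry in the front matter uses the same chat schema as `transformers`; below is a minimal sketch of rendering those messages into TinyLlama's prompt format with `apply_chat_template` (a standard `transformers` API; the rendered prompt ends with the `<|assistant|>` marker seen in the commented output above):

```
# Sketch: render the widget's example messages with the model's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
messages = [
    {"role": "system", "content": "You are a chatbot who can help code!"},
    {"role": "user", "content": "Write me a function to calculate the first 10 "
     "digits of the fibonacci sequence in Python and print it out to the CLI."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # ends with <|assistant|>, ready for generation
```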
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/TinyLlama__TinyLlama-1.1B-Chat-v1.0-details).

| Metric              | Value |
|---------------------|------:|
| Avg.                |  2.72 |
| IFEval (0-Shot)     |  5.96 |
| BBH (3-Shot)        |  4.01 |
| MATH Lvl 5 (4-Shot) |  0.91 |
| GPQA (0-shot)       |  0.00 |
| MuSR (0-shot)       |  4.31 |
| MMLU-PRO (5-shot)   |  1.12 |
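For reference, the Avg. row is consistent with the plain arithmetic mean of the six benchmark scores shown; a quick check (assuming, as the numbers suggest, that the displayed values are the already-normalized leaderboard scores):

```
# Sketch: verify the "Avg." row from the six benchmark scores in the table.
scores = {
    "IFEval (0-Shot)": 5.96,
    "BBH (3-Shot)": 4.01,
    "MATH Lvl 5 (4-Shot)": 0.91,
    "GPQA (0-shot)": 0.00,
    "MuSR (0-shot)": 4.31,
    "MMLU-PRO (5-shot)": 1.12,
}
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 2.72, matching the table
```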