FineTome-100k/README.md
2024-07-29 09:52:30 +00:00

982 B

dataset_info configs
features splits download_size dataset_size
name list
conversations
name dtype
from string
name dtype
value string
name dtype
source string
name dtype
score float64
name num_bytes num_examples
train 239650960.7474458 100000
116531415 239650960.7474458
config_name data_files
default
split path
train data/train-*

FineTome-100k

image/jpeg

The FineTome dataset is a subset of arcee-ai/The-Tome (without arcee-ai/qwen2-72b-magpie-en), re-filtered using HuggingFaceFW/fineweb-edu-classifier.

It was made for my article "Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth".