diff --git a/README.md b/README.md
index 339b6a7..d030d28 100644
--- a/README.md
+++ b/README.md
@@ -23,3 +23,9 @@ configs:
   - split: train
     path: data/train-*
 ---
+
+# FineTome-100k
+
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/75I3ffI4XnRlheOQ7kNJ3.jpeg)
+
+The FineTome dataset is a subset of [arcee-ai/The-Tome](https://huggingface.co/datasets/arcee-ai/The-Tome) (without arcee-ai/qwen2-72b-magpie-en) re-filtered using [HuggingFaceFW/fineweb-edu-classifier](https://huggingface.co/HuggingFaceFW/fineweb-edu-classifier).