From 33923f685be7b6a1ea9971299e59955c9ecaf51b Mon Sep 17 00:00:00 2001
From: Jonathan Tow
Date: Thu, 21 Mar 2024 22:01:19 +0000
Subject: [PATCH] fix(README): correct tokenizer name

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7276bc1..45812d5 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ The dataset is comprised of a filtered mixture of open-source large-scale datase
 
 ### Training Procedure
 
-The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW, and trained using the NeoX tokenizer with a vocabulary size of 100,352. We outline the complete hyperparameters choices in the project's [GitHub repository - config*](https://github.com/Stability-AI/StableLM/blob/main/configs/stablelm-2-1.6b.yml). The final checkpoint of pre-training, before cooldown, is provided in the `global_step420000` [branch](https://huggingface.co/stabilityai/stablelm-2-1_6b/blob/global_step420000/README.md).
+The model is pre-trained on the aforementioned datasets in `bfloat16` precision, optimized with AdamW, and trained using the Arcade100k tokenizer with a vocabulary size of 100,352. We outline the complete hyperparameters choices in the project's [GitHub repository - config*](https://github.com/Stability-AI/StableLM/blob/main/configs/stablelm-2-1.6b.yml). The final checkpoint of pre-training, before cooldown, is provided in the `global_step420000` [branch](https://huggingface.co/stabilityai/stablelm-2-1_6b/blob/global_step420000/README.md).
 
 ### Training Infrastructure
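
Not part of the patch: a minimal sketch for sanity-checking the corrected tokenizer name against the model repo referenced above. It assumes the `transformers` library is installed and that the custom Arcade100k tokenizer code ships inside the Hugging Face repo, hence `trust_remote_code=True`; the expected vocabulary size of 100,352 comes from the README text itself.

    # Minimal sketch (assumptions: `transformers` is installed and the
    # stabilityai/stablelm-2-1_6b repo ships its own tokenizer code,
    # which requires trusting remote code to load).
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "stabilityai/stablelm-2-1_6b",
        trust_remote_code=True,  # load the repo's custom tokenizer class
    )

    # The class name should reflect Arcade100k rather than NeoX, and the
    # vocabulary size should match the 100,352 stated in the README.
    print(type(tokenizer).__name__)
    print(len(tokenizer))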