---
license: apache-2.0
datasets:
- cerebras/SlimPajama-627B
- bigcode/starcoderdata
- HuggingFaceH4/ultrachat_200k
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
widget:
- example_title: Fibonacci (Python)
  messages:
  - role: system
    content: You are a chatbot who can help code!
  - role: user
    content: Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.
---

# TinyLlama-1.1B

https://github.com/jzhang38/TinyLlama

The TinyLlama project aims to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. With proper optimization, this can be achieved in "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.

We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged into many open-source projects built upon Llama as a drop-in replacement. Moreover, with only 1.1B parameters, TinyLlama is compact enough for applications with tight computation and memory budgets.
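Because the architecture and tokenizer match Llama 2, the checkpoint loads with the standard Llama classes in 🤗 Transformers, the same way as any other Llama checkpoint. A minimal sketch (ours, not from the original card):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizerFast

# TinyLlama reuses the Llama 2 architecture and tokenizer, so the
# Llama-specific classes load it like any other Llama checkpoint.
tokenizer = LlamaTokenizerFast.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = LlamaForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16
)
print(model.config.model_type)  # -> "llama"
```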

## This Model

This is the chat model fine-tuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T. We follow HF's Zephyr training recipe. The model was "initially fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contains 64k prompts and model completions that are ranked by GPT-4."
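The DPO step described above can be reproduced in spirit with 🤗 TRL. The following is a hedged sketch, not the authors' script: exact argument names vary across TRL versions (older releases pass tokenizer= instead of processing_class=), and the SFT checkpoint path, dataset split, and beta value here are our assumptions.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumption: DPO starts from the UltraChat SFT checkpoint produced in the
# previous step of the Zephyr recipe (placeholder path below).
sft_model = "path/to/your-sft-checkpoint"
model = AutoModelForCausalLM.from_pretrained(sft_model)
tokenizer = AutoTokenizer.from_pretrained(sft_model)

# UltraFeedback binarized into (prompt, chosen, rejected) preference pairs.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

# beta controls how far the policy may drift from the reference model;
# 0.1 is a common default, not necessarily what TinyLlama used.
args = DPOConfig(output_dir="tinyllama-dpo", beta=0.1)
trainer = DPOTrainer(model=model, args=args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```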

## How to use

You will need transformers>=4.34. Check the TinyLlama GitHub page for more information.

```python
# Install transformers from source - only needed if your installed version is < v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
```
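The pipeline output above includes the formatted prompt as well as the reply. If you only want the assistant's answer, one option (our sketch, not from the original card) is to apply the chat template through the tokenizer and decode only the newly generated tokens:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
# apply_chat_template can tokenize directly and append the assistant turn marker
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95
)
# Slice off the prompt tokens so only the assistant's reply is decoded
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```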