TinyLlama-1.1B-Chat-v1.0

Ryan Nelson a19aa7c77d Adding ONNX file of this model. Beep boop I am the [ONNX export bot 🤖🏎️](https://huggingface.co/spaces/onnx/export). On behalf of [WRCREX](https://huggingface.co/WRCREX), I would like to add to this repository the model converted to ONNX. What is ONNX? It stands for "Open Neural Network Exchange", and is the most commonly used open standard for machine learning interoperability. You can find out more at [onnx.ai](https://onnx.ai/)! The exported ONNX model can then be consumed by various backends such as TensorRT or TVM, or simply be used in a few lines with 🤗 Optimum through ONNX Runtime; check out how [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models)!		2025-03-10 19:26:12 +00:00
onnx	Adding ONNX file of this model	2025-03-10 19:26:12 +00:00
.gitattributes	Adding ONNX file of this model	2025-03-10 19:26:12 +00:00
config.json	Update config.json	2023-12-31 03:34:42 +00:00
eval_results.json	Model save	2023-12-30 12:27:37 +00:00
generation_config.json	Model save	2023-12-30 12:27:37 +00:00
model.safetensors	Model save	2023-12-30 12:27:37 +00:00
README.md	Update examples in README to be compatible with soon-to-come ChatWidget (#23 )	2024-03-17 05:07:08 +00:00
special_tokens_map.json	Model save	2023-12-30 12:27:37 +00:00
tokenizer_config.json	Model save	2023-12-30 12:27:37 +00:00
tokenizer.json	Model save	2023-12-30 12:27:37 +00:00
tokenizer.model	Upload tokenizer.model	2024-01-02 03:15:46 +00:00

README.md

license	apache-2.0
datasets	cerebras/SlimPajama-627B, bigcode/starcoderdata, HuggingFaceH4/ultrachat_200k, HuggingFaceH4/ultrafeedback_binarized
language	
widget	example_title: Fibonacci (Python)

messages

role	content
system	You are a chatbot who can help code!
user	Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.

TinyLlama-1.1B

https://github.com/jzhang38/TinyLlama

The TinyLlama project aims to pretrain a 1.1B-parameter Llama model on 3 trillion tokens. With proper optimization, this can be achieved in "just" 90 days using 16 A100-40G GPUs 🚀🚀. Training started on 2023-09-01.
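As a rough sanity check on this schedule, the stated budget implies a per-GPU throughput of about 24k tokens per second. The sketch below only redoes the arithmetic from the figures in the paragraph above. The variable names are illustrative, and it assumes all 90 days are spent training with no downtime.

```python
# Back-of-the-envelope throughput implied by the stated training budget.
# Assumption: all 90 days are pure training time (no restarts or pauses).
TOTAL_TOKENS = 3e12   # 3 trillion tokens
TRAINING_DAYS = 90
NUM_GPUS = 16         # A100-40G

seconds = TRAINING_DAYS * 24 * 60 * 60
cluster_tokens_per_sec = TOTAL_TOKENS / seconds
per_gpu_tokens_per_sec = cluster_tokens_per_sec / NUM_GPUS

print(f"cluster: {cluster_tokens_per_sec:,.0f} tokens/s")
print(f"per GPU: {per_gpu_tokens_per_sec:,.0f} tokens/s")
```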

We adopted exactly the same architecture and tokenizer as Llama 2, so TinyLlama can be plugged into many open-source projects built upon Llama. In addition, with only 1.1B parameters, TinyLlama is compact enough for applications that demand a restricted computation and memory footprint.
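To put the memory footprint in concrete terms, the sketch below estimates the space needed to hold the weights alone at a few common precisions. It assumes exactly 1.1e9 parameters and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
# Approximate memory needed just to store the weights, by numeric precision.
# Assumption: exactly 1.1e9 parameters; activations, KV cache, and framework
# overhead are NOT included.
NUM_PARAMS = 1.1e9
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

At bfloat16, the precision used in the usage example, this comes to roughly 2.2 GB of weights.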

This Model

This is the chat model fine-tuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T. We follow Hugging Face's Zephyr training recipe. The model was "initially fine-tuned on a variant of the UltraChat dataset, which contains a diverse range of synthetic dialogues generated by ChatGPT. We then further aligned the model with 🤗 TRL's DPOTrainer on the openbmb/UltraFeedback dataset, which contain 64k prompts and model completions that are ranked by GPT-4."
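For readers unfamiliar with the alignment step, the following is a minimal sketch of the standard DPO objective for a single preference pair, computed from summed sequence log-probabilities. It is not TRL's implementation; the function name `dpo_loss` is hypothetical, and `beta=0.1` is an illustrative value, not a setting taken from this model's training run.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Inputs are log-probabilities of each full completion under the policy
    being trained and under the frozen reference model.
    beta is an assumed, illustrative value.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin), in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the chosen completion's likelihood relative to the rejected one lowers the loss.
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```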

How to use

You will need transformers>=4.34. Check the TinyLlama GitHub page for more information.

# Install transformers from source - only needed for versions < v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
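To illustrate what the chat template does, here is a plain-Python sketch that reproduces the prompt layout visible in the commented output above. The helper `format_chat` is hypothetical and exists only for illustration. The real template ships with the tokenizer and may differ in whitespace or other details, so use `apply_chat_template` in practice.

```python
# Hypothetical helper: mimics the prompt layout visible in the printed output.
# The actual template is defined by the tokenizer and may differ in details.
EOS_TOKEN = "</s>"

def format_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a single prompt string."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{EOS_TOKEN}\n")
    if add_generation_prompt:
        # Cue the model to start writing the assistant's turn.
        parts.append("<|assistant|>\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
print(format_chat(messages))
```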