Update README.md

Gustavo de Rosa 2023-12-13 23:01:12 +00:00 committed by huggingface-web
parent 80c0ba9f8e
commit f27cd936bd

@@ -75,13 +75,34 @@ def print_prime(n):
```

where the model generates the text after the comments.

**Notes:**
* Phi-1.5 is intended for research purposes. The model-generated text/code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications.
* Direct adoption for production tasks is out of the scope of this research project. As a result, Phi-1.5 has not been tested to ensure that it performs adequately for any production-level application. Please refer to the limitation sections of this document for more details.
* If you are using `transformers>=4.36.0`, always load the model with `trust_remote_code=True` to prevent side-effects (a minimal version check is sketched just below).
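A sketch of that check (ours, not part of the original card):

```python
import transformers
from packaging import version  # packaging is installed alongside transformers

# Per the note above, transformers>=4.36.0 must load this model with
# trust_remote_code=True; older versions also accept the flag.
if version.parse(transformers.__version__) >= version.parse("4.36.0"):
    print("transformers>=4.36.0 detected: pass trust_remote_code=True when loading phi-1_5.")
```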
## Sample Code
There are four execution modes (each snippet assumes `import torch` and `from transformers import AutoModelForCausalLM`):
1. FP16 / Flash-Attention / CUDA:
```python
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype="auto", flash_attn=True, flash_rotary=True, fused_dense=True, device_map="cuda", trust_remote_code=True)
```
2. FP16 / CUDA:
```python
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype="auto", device_map="cuda", trust_remote_code=True)
```
3. FP32 / CUDA:
```python
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype=torch.float32, device_map="cuda", trust_remote_code=True)
```
4. FP32 / CPU:
```python
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype=torch.float32, device_map="cpu", trust_remote_code=True)
```
To ensure maximum compatibility, we recommend using the second execution mode (FP16 / CUDA), as follows:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -91,8 +112,7 @@ torch.set_default_device("cuda")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)
@@ -102,9 +122,10 @@ text = tokenizer.batch_decode(outputs)[0]
print(text)
```
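If a CUDA device is not guaranteed at runtime, a hedged variant of the above (our sketch, not from the original card) falls back from FP16 / CUDA to FP32 / CPU:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Select the execution mode from the available hardware: FP16 needs a CUDA
# device, while CPU execution should use FP32 (mode 4 above).
use_cuda = torch.cuda.is_available()
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    torch_dtype="auto" if use_cuda else torch.float32,
    device_map="cuda" if use_cuda else "cpu",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
```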
**Remark:** In the generation function, our model currently does not support beam search (`num_beams > 1`).
Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings.
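Within those constraints, greedy decoding and sampling both work. A short sketch, reusing `model`, `tokenizer`, and `inputs` from the sample above (the sampling settings are illustrative, not prescribed by the card):

```python
# Leave num_beams at its default of 1; sampling is a supported alternative
# to beam search. These are standard transformers generate() arguments.
outputs = model.generate(**inputs, max_length=200, do_sample=True, temperature=0.7, top_p=0.95)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```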
## Limitations of Phi-1.5

* Generate Inaccurate Code and Facts: The model often produces incorrect code snippets and statements. Users should treat these outputs as suggestions or starting points, not as definitive or accurate solutions.