Commit Graph

67 Commits

Author SHA1 Message Date
Gustavo de Rosa
bffd3b29c4 Update LICENSE 2024-02-06 12:36:39 +00:00
Gustavo de Rosa
349cf8b5e8 Update README.md 2024-01-24 13:34:13 +00:00
Gustavo de Rosa
83b9c52637 Update README.md 2024-01-22 12:25:40 +00:00
Gustavo de Rosa
675e8c1bae Update config.json 2024-01-22 12:25:27 +00:00
Gustavo de Rosa
34a1490e06 Update modeling_phi.py 2024-01-16 16:05:38 +00:00
Gustavo de Rosa
59e722d14e Update README.md 2024-01-16 14:56:49 +00:00
Gustavo de Rosa
426ea900b0 Update modeling_phi.py 2024-01-15 14:26:10 +00:00
Gustavo de Rosa
3edb5e62c4 Update modeling_phi.py 2024-01-12 00:44:23 +00:00
Gustavo de Rosa
e0f03c4877 Update modeling_phi.py 2024-01-11 16:40:17 +00:00
Gustavo de Rosa
051d15f1e7 Update config.json 2024-01-11 11:22:42 +00:00
Gustavo de Rosa
914c8fb3c6 Upload modeling_phi.py 2024-01-10 13:54:40 +00:00
Gustavo de Rosa
3a705a2d6b Delete Research License.docx 2024-01-10 13:16:00 +00:00
Gustavo de Rosa
341a17a8f2 Upload 5 files 2024-01-10 13:15:50 +00:00
Gustavo de Rosa
1dc35eb2f5 Update README.md (#69)
- Update README.md (8584061b4d9f189aea26e170cb1c285a22fe731d)


Co-authored-by: Mojan Javaheripi <mojanjp@users.noreply.huggingface.co>
2024-01-10 11:29:00 +00:00
Gustavo de Rosa
41217aafb5 Update config.json 2024-01-08 17:13:22 +00:00
Gustavo de Rosa
d3ba318b78 chore(root): Updates files to internal transformers implementation. 2024-01-08 13:12:24 +00:00
Gustavo de Rosa
24f9ea14df Update README.md 2023-12-13 23:24:09 +00:00
Gustavo de Rosa
d262514668 Upload 4 files 2023-12-13 23:19:24 +00:00
Gustavo de Rosa
f27cd936bd Update README.md 2023-12-13 23:01:12 +00:00
Gustavo de Rosa
80c0ba9f8e Update README.md 2023-12-13 22:44:59 +00:00
Gustavo de Rosa
a286f5c1de Disables inference API to prevent mismatch with HF implementation. 2023-12-13 21:54:41 +00:00
Gustavo de Rosa
ca573e3fa3 fix(modeling_phi): Fixes initial generation with length larger than context length. 2023-12-08 17:40:16 +00:00
Gustavo de Rosa
37527ba0b8 fix(modeling_phi): Fixes cached generation when above maximum context length. 2023-12-05 21:09:53 +00:00
Gustavo de Rosa
5fd430c7bc Fixes exceeding maximum sequence length when using generate(). 2023-11-20 18:11:04 +00:00
Gustavo de Rosa
d212a78962 Delete modeling_mixformer_sequential.py 2023-11-16 18:10:37 +00:00
Gustavo de Rosa
8e9ebfb9bf Delete configuration_mixformer_sequential.py 2023-11-16 18:10:30 +00:00
Gustavo de Rosa
271c3397ab Update to new model interface. 2023-11-16 17:28:06 +00:00
Gustavo de Rosa
92557d03bb Improves type hinting on configuration arguments. 2023-11-01 23:40:19 +00:00
Gustavo de Rosa
45f4b21525 Enables to toggle fused_dense, flash_rotary and attn_pdrop in the configuration. 2023-11-01 23:33:57 +00:00
Gustavo de Rosa
0254d42a95 Fixes flash-attn import with a try/except statement 2023-11-01 23:32:35 +00:00
Gustavo de Rosa
0bbd68a176 Adds support for flash-attn rotary embedding and fused dense layers. 2023-11-01 20:40:12 +00:00
Gustavo de Rosa
de35f900d3 Adds support for MQA/GQA and attention mask during training. 2023-10-30 16:59:12 +00:00
Gustavo de Rosa
d38e6f954e Update modeling_mixformer_sequential.py
Removes print regarding attention_mask to prevent excessive information from being logged.
2023-10-26 20:01:15 +00:00
Gustavo de Rosa
8091327f9e Adding _set_gradient_checkpointing for compatibility (#22)
- Adding _set_gradient_checkpointing for compatibility (a30a931294ac0f344a0c1547877c692ceb17123c)


Co-authored-by: Vicente Rivera <vriveras@users.noreply.huggingface.co>
2023-10-17 12:11:30 +00:00
Gustavo de Rosa
b6a7e2fe15 Upload modeling_mixformer_sequential.py 2023-09-27 15:22:44 +00:00
Gustavo de Rosa
8ab0f29ff6 Add more precise license metadata (UI will be cleaner!) (#35)
- Add more precise license metadata (UI will be cleaner!) (2c182742af8c7c93f0f4ee1180232a5d0c114958)


Co-authored-by: Julien Chaumond <julien-c@users.noreply.huggingface.co>
2023-09-27 15:20:42 +00:00
Gustavo de Rosa
bc09a085e7 Upload README.md 2023-09-27 14:04:07 +00:00
Gustavo de Rosa
f9f2ac7c45 fix(phi-1_5): Checks length of attention_mask if it is passed as direct tensor. 2023-09-26 21:21:45 +00:00
Gustavo de Rosa
3128bb636a Support for attention_mask in forward pass.
This commit implements the following:

- Cleans up unused arguments and definitions.
- Adds support for `attention_mask`.
- Adds support for cached inference.
2023-09-26 18:17:08 +00:00
Gustavo de Rosa
4a426d8015 add _no_split_modules property (#17)
- add _no_split_modules property (7e925ddfdf2d1bb29fc26db755aafd77fb8f565e)


Co-authored-by: wing lian <winglian@users.noreply.huggingface.co>
2023-09-15 22:57:07 +00:00
Gunasekar
7d482ddf93 Update README.md 2023-09-14 00:44:40 +00:00
Gunasekar
c8f6ad8189 Update README.md 2023-09-12 18:40:56 +00:00
Gustavo de Rosa
762a3110be Link paper to arXiv (#5)
- Link paper to arXiv (c30653547e6bbdc00a068e538a7f84ed568d1918)


Co-authored-by: Omar Sanseviero <osanseviero@users.noreply.huggingface.co>
2023-09-12 16:01:41 +00:00
Gunasekar
ea95720a35 Update README.md 2023-09-12 01:38:42 +00:00
Gunasekar
4bba51c9b5 Update README.md 2023-09-11 21:45:49 +00:00
Gunasekar
52e294acfe Update README.md 2023-09-11 21:44:15 +00:00
Gunasekar
9efbcafbe4 Upload tokenizer 2023-09-11 21:30:53 +00:00
Gunasekar
d655135ca1 Upload MixFormerSequentialForCausalLM 2023-09-11 21:30:53 +00:00
Gunasekar
07a048efa7 Update README.md 2023-09-11 07:57:24 +00:00
Gunasekar
b63051536f Update README.md 2023-09-11 07:56:12 +00:00