Upload tokenizer

commit 67a43eb1b5
parent 16982066f0
added_tokens.json (new file, 40 lines)
@@ -0,0 +1,40 @@
{
"\t\t": 50294,
"\t\t\t": 50293,
"\t\t\t\t": 50292,
"\t\t\t\t\t": 50291,
"\t\t\t\t\t\t": 50290,
"\t\t\t\t\t\t\t": 50289,
"\t\t\t\t\t\t\t\t": 50288,
"\t\t\t\t\t\t\t\t\t": 50287,
" ": 50286,
" ": 50285,
" ": 50284,
" ": 50283,
" ": 50282,
" ": 50281,
" ": 50280,
" ": 50279,
" ": 50278,
" ": 50277,
" ": 50276,
" ": 50275,
" ": 50274,
" ": 50273,
" ": 50272,
" ": 50271,
" ": 50270,
" ": 50269,
" ": 50268,
" ": 50267,
" ": 50266,
" ": 50265,
" ": 50264,
" ": 50263,
" ": 50262,
" ": 50261,
" ": 50260,
" ": 50259,
" ": 50258,
" ": 50257
}
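The added tokens above extend the base vocabulary (IDs 0–50256) with runs of whitespace, so common code indentation compresses into a single token. The tab entries follow a simple pattern: runs of 2 to 9 tabs map to IDs counting down from 50294. A minimal sketch (plain Python, mirroring the JSON shown above rather than loading it):

```python
# Rebuild the tab portion of added_tokens.json: a run of n tabs
# (n = 2..9) maps to ID 50294 - (n - 2), so longer runs get lower IDs.
tab_tokens = {"\t" * n: 50294 - (n - 2) for n in range(2, 10)}

assert tab_tokens["\t\t"] == 50294            # 2 tabs -> first added ID
assert tab_tokens["\t" * 9] == 50287          # 9 tabs -> last tab entry
print(len(tab_tokens))                        # 8 tab entries in total
```

The space entries that follow continue the same scheme over IDs 50257–50286.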
merges.txt (new file, 50001 lines)
File diff suppressed because it is too large
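merges.txt holds the BPE merge rules, one pair per line in priority order: tokenization repeatedly merges the adjacent symbol pair with the highest-priority rule. A minimal, self-contained sketch of that procedure (the merge list here is hypothetical, not taken from the actual file):

```python
def bpe(word, merges):
    """Greedily apply ranked merge rules to the characters of a word."""
    ranks = {pair: i for i, pair in enumerate(merges)}  # lower rank = higher priority
    symbols = list(word)
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        pairs = [(ranks.get((a, b), float("inf")), i)
                 for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        rank, i = min(pairs)
        if rank == float("inf"):
            break  # no applicable rule left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols

# Hypothetical merge list, highest priority first.
merges = [("d", "e"), ("de", "f")]
print(bpe("def", merges))  # -> ['def']
```

With 50,000 merge rules (plus a version header line), the real file yields the ~50K-entry vocabulary in vocab.json.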
special_tokens_map.json (new file, 5 lines)
@@ -0,0 +1,5 @@
{
"bos_token": "<|endoftext|>",
"eos_token": "<|endoftext|>",
"unk_token": "<|endoftext|>"
}
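As is conventional for GPT-2-style BPE tokenizers, a single sentinel serves as beginning-of-sequence, end-of-sequence, and unknown token. Parsing the map shown above makes that explicit:

```python
import json

# The special-token map from special_tokens_map.json above.
special_tokens = json.loads(
    '{"bos_token": "<|endoftext|>", "eos_token": "<|endoftext|>", '
    '"unk_token": "<|endoftext|>"}'
)

# All three roles resolve to the same token string.
assert len(set(special_tokens.values())) == 1
print(special_tokens["eos_token"])  # -> <|endoftext|>
```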
tokenizer.json (new file, 100647 lines)
File diff suppressed because it is too large
tokenizer_config.json (new file, 9 lines)
@@ -0,0 +1,9 @@
{
"add_prefix_space": false,
"bos_token": "<|endoftext|>",
"clean_up_tokenization_spaces": true,
"eos_token": "<|endoftext|>",
"model_max_length": 2048,
"tokenizer_class": "CodeGenTokenizer",
"unk_token": "<|endoftext|>"
}
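The config above selects the CodeGen tokenizer class and advertises a 2048-token model length. A small sketch parsing the same JSON locally (no model download; the string is copied from the file above):

```python
import json

# tokenizer_config.json as committed above, parsed in place.
config = json.loads("""
{
  "add_prefix_space": false,
  "bos_token": "<|endoftext|>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "<|endoftext|>",
  "model_max_length": 2048,
  "tokenizer_class": "CodeGenTokenizer",
  "unk_token": "<|endoftext|>"
}
""")

assert config["tokenizer_class"] == "CodeGenTokenizer"
assert config["model_max_length"] == 2048  # context length the tokenizer advertises
print(config["tokenizer_class"])
```

Once all files are uploaded, the repository can typically be loaded with the transformers library's `AutoTokenizer.from_pretrained` pointed at it; that call is not shown here since it needs the full file set available locally or on a hub.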
vocab.json (new file, 1 line)
File diff suppressed because one or more lines are too long