docs: 添加英文版模型卡 README_en.md
- 保留 Hugging Face YAML 元数据 - 添加完整的英文模型说明 - 包含使用方法、评测结果、数据集引用 - 与中文版 README.md 内容对齐
This commit is contained in:
parent
6d03adb1ae
commit
ad278ac7b0
26
README_en.md
26
README_en.md
@ -4,6 +4,7 @@ datasets:
|
|||||||
- pjura/mahjong_board_states
|
- pjura/mahjong_board_states
|
||||||
language:
|
language:
|
||||||
- zh
|
- zh
|
||||||
|
- en
|
||||||
base_model:
|
base_model:
|
||||||
- unsloth/Qwen3-4B-Instruct-2507
|
- unsloth/Qwen3-4B-Instruct-2507
|
||||||
tags:
|
tags:
|
||||||
@ -19,6 +20,8 @@ pipeline_tag: text-generation
|
|||||||
|
|
||||||
# Qwen3-4B-Instruct-2507-mahjong-alpha
|
# Qwen3-4B-Instruct-2507-mahjong-alpha
|
||||||
|
|
||||||
|
[中文](./README.md)
|
||||||
|
|
||||||
`Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
|
`Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
|
||||||
|
|
||||||
It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
|
It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
|
||||||
@ -28,10 +31,10 @@ The current version is mainly intended for tool integration. The output is a sin
|
|||||||
## Model Features
|
## Model Features
|
||||||
|
|
||||||
- **Task**: 4-player Riichi Mahjong discard recommendation
|
- **Task**: 4-player Riichi Mahjong discard recommendation
|
||||||
- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
|
- **Base model**: `unsloth/Qwen3-4B-Instruct-2507`
|
||||||
- **Fine-tuning**: `QLoRA`
|
- **Fine-tuning**: `QLoRA`
|
||||||
- **Training Framework**: `Unsloth`
|
- **Training framework**: `Unsloth`
|
||||||
- **Release Format**: `GGUF (F16)`
|
- **Release format**: `GGUF (F16)`
|
||||||
- **Inference**: `llama.cpp`
|
- **Inference**: `llama.cpp`
|
||||||
- **Maintainer**: `TTDXQ`
|
- **Maintainer**: `TTDXQ`
|
||||||
|
|
||||||
@ -90,9 +93,9 @@ The output is strictly a single tile text without any prefix like "discard" and
|
|||||||
白
|
白
|
||||||
```
|
```
|
||||||
|
|
||||||
## How to Use
|
## Usage
|
||||||
|
|
||||||
### Inference with llama.cpp
|
### llama.cpp Inference
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
|
llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
|
||||||
@ -111,7 +114,10 @@ tokenizer = AutoTokenizer.from_pretrained(
|
|||||||
)
|
)
|
||||||
|
|
||||||
# Prepare input
|
# Prepare input
|
||||||
input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。\n..."
|
input_text = """[情景分析]
|
||||||
|
- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
|
||||||
|
- 状态: 当前排名 1/4 (与一位差 0)。
|
||||||
|
..."""
|
||||||
|
|
||||||
# Inference
|
# Inference
|
||||||
inputs = tokenizer(input_text, return_tensors="pt")
|
inputs = tokenizer(input_text, return_tensors="pt")
|
||||||
@ -184,10 +190,10 @@ A total of `192000` samples were used, with no general instruction data or self-
|
|||||||
|
|
||||||
Inference parameters: Temperature=0.1, Top_P=0.1
|
Inference parameters: Temperature=0.1, Top_P=0.1
|
||||||
|
|
||||||
**Metrics Explanation:**
|
**Metrics explanation**:
|
||||||
- Score: Max 500 points (1 point per correct sample, 0 for incorrect)
|
- Score: Max 500 points (1 point per correct sample, 0 for incorrect)
|
||||||
- Full-match Rate: Samples where all 3 tests matched the dataset
|
- Full-match rate: Samples where all 3 tests matched the dataset
|
||||||
- Zero-score Rate: Samples where all 3 tests disagreed with the dataset
|
- Zero-score rate: Samples where all 3 tests disagreed with the dataset
|
||||||
|
|
||||||
#### Tile-Efficiency Test
|
#### Tile-Efficiency Test
|
||||||
|
|
||||||
@ -238,7 +244,7 @@ Inference parameters: Temperature=0.6, Top_P=0.95
|
|||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
This model follows the Apache License 2.0.
|
This model is licensed under Apache License 2.0.
|
||||||
|
|
||||||
The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.
|
The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.
|
||||||
|
|
||||||
|
|||||||
Loading…
Reference in New Issue
Block a user