---
license: apache-2.0
datasets:
- pjura/mahjong_board_states
language:
- zh
base_model:
- unsloth/Qwen3-4B-Instruct-2507
tags:
- riichi-mahjong
- game-ai
- qwen
- qwen3
- mahjong
- discard-recommendation
- gguf
pipeline_tag: text-generation
---
# Qwen3-4B-Instruct-2507-mahjong-alpha
`Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
The current version is mainly intended for tool integration; the output is the text of a single tile only, with no explanation.
## Model Features
- **Task**: 4-player Riichi Mahjong discard recommendation
- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
- **Fine-tuning**: `QLoRA`
- **Training Framework**: `Unsloth`
- **Release Format**: `GGUF (F16)`
- **Inference**: `llama.cpp`
- **Maintainer**: `TTDXQ`
## Scope
This model targets 4-player Riichi Mahjong without red dora. The current version focuses only on discard recommendation. It does not provide full-game planning, yaku/score analysis, or detailed offense-defense explanations.
## Limitations
- Discard recommendation only
- No full-game planning
- No yaku, point calculation, or detailed strategic explanation
- Not guaranteed for competitive or real-match performance
- For research and learning purposes only
## Prohibited Uses
This model must not be used for:
- cheating
- game automation or plug-ins
- account boosting or ghost-playing
- real-money gambling assistance
## Input and Output
### Input Format
The model input is a structured natural-language game-state description written in Chinese, matching the training data. Example:
```text
[情景分析]
- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
- 宝牌: 5万
- 各玩家分数: 你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。
- 你的手牌: 1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发
- 牌效: 5 向听,进张 82 张。
- 防御:
最安全牌放铳率11.3%
平均放铳率18.5%
最危险牌放铳率25.9%
场上已见牌信息
各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无
各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无
[任务]
根据当前情景,选择一张最应该打出的手牌。
```
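The prompt above can be assembled programmatically. The helper below is a hypothetical sketch (not part of the model release): it fills the `[情景分析]` template from structured game-state fields, so a tool integration only has to supply the values.

```python
# Hypothetical prompt builder: assembles the [情景分析] template used by
# the model from structured game-state fields. Field contents follow the
# example in this section; the function itself is an illustration only.
def build_prompt(round_info: str, status: str, dora: str, scores: str,
                 hand: list[str], efficiency: str, defense: str) -> str:
    lines = [
        "[情景分析]",
        f"- 牌局: {round_info}",
        f"- 状态: {status}",
        f"- 宝牌: {dora}",
        f"- 各玩家分数: {scores}",
        f"- 你的手牌: {' '.join(hand)}",
        f"- 牌效: {efficiency}",
        f"- 防御:\n{defense}",
        "[任务]",
        "根据当前情景,选择一张最应该打出的手牌。",
    ]
    return "\n".join(lines)
```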
### Output Format
The output is strictly the text of a single tile, with no prefix (such as "discard") and no explanation. Example:
```text
白
```
## How to Use
### Inference with llama.cpp
```bash
llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
```
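Once `llama-server` is running, it exposes llama.cpp's OpenAI-compatible HTTP API. The sketch below queries it from Python using only the standard library; the port (8080, llama-server's default) and the low `temperature`/`top_p` values are assumptions, not part of the model release.

```python
import json
import urllib.request

# Assumes llama-server is running locally on its default port (8080);
# the endpoint path is llama.cpp's OpenAI-compatible chat API.
URL = "http://127.0.0.1:8080/v1/chat/completions"

def build_payload(prompt: str) -> bytes:
    # Low temperature/top_p for near-deterministic single-tile output.
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
        "top_p": 0.1,
        "max_tokens": 10,
    }
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")

def recommend_discard(prompt: str) -> str:
    req = urllib.request.Request(
        URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The model replies with a single tile text, e.g. "白"
    return body["choices"][0]["message"]["content"].strip()
```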
### Python Inference Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)
tokenizer = AutoTokenizer.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)

# Prepare the structured game-state prompt
input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。\n..."

# Generate, then decode only the new tokens so the prompt is not echoed
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(result)  # Output: 白
```
## Dataset
The training data uses the 2018 subset of `pjura/mahjong_board_states`. This dataset originates from Tenhou.net gameplay records, with each record containing 511 data points covering game basics, dora indicators, player hand tiles, calls, discard piles, and discard decisions.
### Data Processing
The raw data was converted into human-readable natural language descriptions, with calculated turn numbers, actual dora, and simplified risk assessment. Sample distribution by turn:
- Turns 1-3: 15%
- Turns 4-6: 20%
- Turns 7-12: 35%
A total of `192000` samples were used, with no general instruction data or self-built data mixed in.
- Train: `192000`
- Validation: `2000`
- Test: sampled as needed from 2019 data
- Train / validation / test are fully non-overlapping
### Dataset Citation
```bibtex
@dataset{mahjong_board_states,
title = {MahJong Board States Dataset},
author = {Patrick Jura},
year = {2024},
url = {https://huggingface.co/datasets/pjura/mahjong_board_states}
}
```
## Training Details
### Model Configuration
- Base Model: `unsloth/Qwen3-4B-Instruct-2507`
- Training Precision: `4bit`
- Fine-tuning Method: `QLoRA`
- Framework: `Unsloth`
- Max Sequence Length: `2048`
### LoRA Parameters
- Rank: `128`
- Alpha: `256`
- Target Modules: All
### Training Hyperparameters
- Learning Rate: `1e-4`
- LR Scheduler: `cosine`
- Batch Size: `64`
- Per-device Batch: `2`
- Gradient Accumulation Steps: `32`
- Training Steps: `3000`
- Warmup Steps: `300`
- Random Seed: `3407`
- Load Best Checkpoint: Yes
### Training Time
- Total Duration: ~16.44 hours
## Evaluation Results
### Comparison with Dataset Actions
Inference parameters: `temperature=0.1`, `top_p=0.1`
**Metrics Explanation:**
- Score: out of 500 points (1 point per correct sample, 0 for incorrect)
- Full-match Rate: proportion of samples where all 3 tests matched the dataset action
- Zero-score Rate: proportion of samples where all 3 tests disagreed with the dataset action
#### Tile-Efficiency Test
| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 50.21 | 6.60% | 86.13% |
| Qwen3-4B | Fine-tuned | 229.66 | 45.87% | 53.93% |
| DeepSeek-V3.2 | Prompt Engineering | 181.66 | 21.40% | 46.33% |
#### Defense Test
| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 53.55 | 6.17% | 84.43% |
| Qwen3-4B | Fine-tuned | 239.89 | 47.93% | 52.00% |
| DeepSeek-V3.2 | Prompt Engineering | 172.00 | 16.00% | 46.80% |
#### Comprehensive Test
| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 53.44 | 0.60% | 84.40% |
| Qwen3-4B | Fine-tuned | 233.33 | 46.53% | 53.20% |
| DeepSeek-V3.2 | Prompt Engineering | 179.44 | 18.07% | 44.93% |
### Comparison with Mortal
Inference parameters: `temperature=0.6`, `top_p=0.95`
#### Test 1: All Turn Data
- Samples: 3000
- Top-1 Accuracy: **50.73%**
- Top-3 Accuracy: **83.37%**
#### Test 2: Excluding Early Turns
- Valid Samples: 3000
- Top-1 Accuracy: **48.70%**
- Top-3 Accuracy: **79.20%**
> Note: Mortal is one of the strongest open-source Riichi Mahjong AIs currently available.
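Top-1/Top-3 accuracy here can be computed as below. The exact comparison protocol against Mortal is not detailed in the card, so this sketch assumes one plausible reading: checking whether the reference discard appears in the model's top-k candidate list for each state.

```python
# Hypothetical top-k accuracy. ASSUMPTION: for each game state we have a
# ranked candidate list from the model and a single reference discard;
# the card does not detail the exact comparison protocol.
def top_k_accuracy(candidates: list[list[str]],
                   references: list[str], k: int) -> float:
    hits = sum(ref in cands[:k]
               for cands, ref in zip(candidates, references))
    return hits / len(references)
```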
## Repository Links
- GitHub: https://github.com/ttdxq/Qwen3-4B-Instruct-2507-mahjong-alpha
- Hugging Face: https://huggingface.co/TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha
## License
This model follows the Apache License 2.0.
The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.
## Acknowledgements
Thanks to the following open-source resources:
- `unsloth/Qwen3-4B-Instruct-2507`
- `pjura/mahjong_board_states`
- `Mortal`