docs: add English model card README_en.md
- Keep the same YAML metadata
- Full English model description
- Content mirrors the Chinese version
This commit is contained in:
parent cabd63252b
commit 6d03adb1ae
README_en.md (new file, 251 lines)
---
license: apache-2.0
datasets:
- pjura/mahjong_board_states
language:
- zh
base_model:
- unsloth/Qwen3-4B-Instruct-2507
tags:
- riichi-mahjong
- game-ai
- qwen
- qwen3
- mahjong
- discard-recommendation
- gguf
pipeline_tag: text-generation
---

# Qwen3-4B-Instruct-2507-mahjong-alpha

`Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.

It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile efficiency, and defense signals, the model outputs the single best discard for the current state.

The current version is mainly intended for tool integration: the output is the tile name only, with no explanation.

## Model Features

- **Task**: 4-player Riichi Mahjong discard recommendation
- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
- **Fine-tuning**: `QLoRA`
- **Training Framework**: `Unsloth`
- **Release Format**: `GGUF (F16)`
- **Inference**: `llama.cpp`
- **Maintainer**: `TTDXQ`

## Scope

This model targets 4-player Riichi Mahjong without red dora. The current version focuses solely on discard recommendation; it does not provide full-game planning, yaku or score analysis, or detailed offense-defense explanations.

## Limitations

- Discard recommendation only
- No full-game planning
- No yaku, point calculation, or detailed strategic explanation
- Not guaranteed for competitive or real-match performance
- For research and learning purposes only

## Prohibited Uses

This model must not be used for:

- cheating
- game automation or plug-ins
- account boosting or ghost-playing
- real-money gambling assistance

## Input and Output

### Input Format

The model input is a structured natural-language game-state description (in Chinese). Example:

```text
[情景分析]
- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
- 宝牌: 5万
- 各玩家分数: 你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。
- 你的手牌: 1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发
- 牌效: 5 向听,进张 82 张。
- 防御:
  最安全牌放铳率:11.3%
  平均放铳率:18.5%
  最危险牌放铳率:25.9%
场上已见牌信息
各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无
各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无

[任务]
根据当前情景,选择一张最应该打出的手牌。
```

### Output Format

The output is strictly a single tile name, with no prefix such as "discard" and no explanation. Example:

```text
白
```
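Since the model replies with a bare tile name, a tool integration may want a quick sanity check that the reply is exactly one tile actually present in the current hand. A minimal hypothetical helper (not part of the released model; the hand is taken from the input example above):

```python
def is_valid_discard(output: str, hand: list[str]) -> bool:
    """True when the model reply is exactly one tile that exists in the hand."""
    tile = output.strip()
    return tile in hand

# Hand from the input example above
hand = "1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发".split()
ok = is_valid_discard("白", hand)        # a legal reply
bad = is_valid_discard("打出 白", hand)  # prefixed replies are rejected
```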

## How to Use

### Inference with llama.cpp

```bash
llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
```
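Once running, llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (standard llama.cpp behavior). A minimal sketch of a request body; the abbreviated prompt follows the input format above, and the low sampling values mirror the deterministic evaluation settings used in this card:

```python
import json

# Request body for llama-server's OpenAI-compatible chat endpoint.
# Low temperature/top_p keep the single-tile reply near-deterministic.
payload = {
    "messages": [
        {"role": "user", "content": "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。\n..."}
    ],
    "temperature": 0.1,
    "top_p": 0.1,
    "max_tokens": 10,  # the reply is a single tile name
}
body = json.dumps(payload, ensure_ascii=False)
# POST body to http://localhost:8080/v1/chat/completions (default port)
```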
### Python Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Prepare input (the game-state description shown above)
input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。\n..."

# Wrap the input in the chat template expected by the instruct base model
messages = [{"role": "user", "content": input_text}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# Inference: decode only the newly generated tokens
outputs = model.generate(inputs, max_new_tokens=10)
result = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(result)  # e.g. 白
```

## Dataset

The training data uses the 2018 subset of `pjura/mahjong_board_states`. This dataset originates from Tenhou.net gameplay records, with each record containing 511 data points covering game basics, dora indicators, player hand tiles, calls, discard piles, and discard decisions.

### Data Processing

The raw data was converted into human-readable natural-language descriptions, with calculated turn numbers, actual dora, and simplified risk assessment. Sample distribution by turn:

- Turns 1-3: 15%
- Turns 4-6: 20%
- Turns 7-12: 35%

A total of `192000` samples were used, with no general instruction data or self-built data mixed in.

- Train: `192000`
- Validation: `2000`
- Test: sampled as needed from 2019 data
- Train / validation / test are fully non-overlapping

### Dataset Citation

```bibtex
@dataset{mahjong_board_states,
  title  = {MahJong Board States Dataset},
  author = {Patrick Jura},
  year   = {2024},
  url    = {https://huggingface.co/datasets/pjura/mahjong_board_states}
}
```

## Training Details

### Model Configuration

- Base Model: `unsloth/Qwen3-4B-Instruct-2507`
- Training Precision: `4bit`
- Fine-tuning Method: `QLoRA`
- Framework: `Unsloth`
- Max Sequence Length: `2048`

### LoRA Parameters

- Rank: `128`
- Alpha: `256`
- Target Modules: All

### Training Hyperparameters

- Learning Rate: `1e-4`
- LR Scheduler: `cosine`
- Batch Size: `64`
- Per-device Batch: `2`
- Gradient Accumulation Steps: `32`
- Training Steps: `3000`
- Warmup Steps: `300`
- Random Seed: `3407`
- Load Best Checkpoint: Yes
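The hyperparameters above are internally consistent: the effective batch size equals the per-device batch times the gradient accumulation steps, and 3000 steps at that batch size processes 192,000 samples, matching the training-set size (roughly one pass over the data). A quick check:

```python
# Effective batch size = per-device batch x gradient accumulation steps
per_device_batch = 2
grad_accum_steps = 32
effective_batch = per_device_batch * grad_accum_steps  # 64, the listed batch size

# Total samples processed over 3000 steps
training_steps = 3000
samples_seen = effective_batch * training_steps  # 192,000 = training-set size
```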

### Training Time

- Total Duration: ~16.44 hours

## Evaluation Results

### Comparison with Dataset Actions

Inference parameters: Temperature=0.1, Top_P=0.1

**Metrics Explanation:**

- Score: Max 500 points (1 point per correct sample, 0 for an incorrect one)
- Full-match Rate: Share of samples where all 3 test runs matched the dataset
- Zero-score Rate: Share of samples where all 3 test runs disagreed with the dataset

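A minimal sketch of how these metrics could be computed. It assumes 500 samples, each queried 3 times, with each sample contributing `correct_runs / 3` points; that reading is an assumption (it would explain the fractional scores in the tables below), not something the card states:

```python
def summarize(results):
    """results: per-sample lists of 3 booleans (whether a run matched the dataset)."""
    n = len(results)
    score = sum(sum(runs) / 3 for runs in results)           # up to 1 point per sample
    full_match = sum(all(runs) for runs in results) / n      # all 3 runs correct
    zero_score = sum(not any(runs) for runs in results) / n  # all 3 runs wrong
    return score, full_match, zero_score

# Three toy samples: fully correct, partially correct, fully wrong
score, full, zero = summarize([[True] * 3, [True, False, False], [False] * 3])
# score = 1 + 1/3 + 0; full = 1/3; zero = 1/3
```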
#### Tile-Efficiency Test

| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|-----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 50.21 | 6.60% | 86.13% |
| Qwen3-4B | Fine-tuned | 229.66 | 45.87% | 53.93% |
| DeepSeek-V3.2 | Prompt Engineering | 181.66 | 21.40% | 46.33% |

#### Defense Test

| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|-----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 53.55 | 6.17% | 84.43% |
| Qwen3-4B | Fine-tuned | 239.89 | 47.93% | 52.00% |
| DeepSeek-V3.2 | Prompt Engineering | 172.00 | 16.00% | 46.80% |

#### Comprehensive Test

| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|-----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 53.44 | 0.60% | 84.40% |
| Qwen3-4B | Fine-tuned | 233.33 | 46.53% | 53.20% |
| DeepSeek-V3.2 | Prompt Engineering | 179.44 | 18.07% | 44.93% |

### Comparison with Mortal

Inference parameters: Temperature=0.6, Top_P=0.95

#### Test 1: All Turn Data

- Samples: 3000
- Top-1 Accuracy: **50.73%**
- Top-3 Accuracy: **83.37%**

#### Test 2: Excluding Early Turns

- Valid Samples: 3000
- Top-1 Accuracy: **48.70%**
- Top-3 Accuracy: **79.20%**

> Note: Mortal is one of the strongest open-source Riichi Mahjong AIs currently available.

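The Top-1/Top-3 numbers above follow the usual top-k accuracy definition; a hypothetical helper (the candidate ranking and reference actions are illustrative, not the card's actual evaluation code):

```python
def topk_accuracy(ranked_predictions, references, k):
    """Fraction of states whose reference discard appears in the top-k candidates."""
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_predictions, references))
    return hits / len(references)

# Toy example: two board states with ranked discard candidates
preds = [["白", "南", "1万"], ["5索", "发", "白"]]
refs = ["白", "发"]
top1 = topk_accuracy(preds, refs, 1)  # only the first state matches at rank 1
top3 = topk_accuracy(preds, refs, 3)  # both references fall within the top 3
```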
## Repository Links

- GitHub: https://github.com/ttdxq/Qwen3-4B-Instruct-2507-mahjong-alpha
- Hugging Face: https://huggingface.co/TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha

## License

This model is released under the Apache License 2.0.

The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.

## Acknowledgements

Thanks to the following open-source resources:

- `unsloth/Qwen3-4B-Instruct-2507`
- `pjura/mahjong_board_states`
- `Mortal`