---
license: apache-2.0
datasets:
- pjura/mahjong_board_states
base_model: unsloth/Qwen3-4B-Instruct-2507
pipeline_tag: text-generation
---
# Qwen3-4B-Instruct-2507-mahjong-alpha
Qwen3-4B-Instruct-2507-mahjong-alpha is a Riichi Mahjong domain model fine-tuned from unsloth/Qwen3-4B-Instruct-2507 with QLoRA.
It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
The current version is mainly intended for tool integration. The output is a single tile text only, without explanation.
## Model Features

- Task: 4-player Riichi Mahjong discard recommendation
- Base model: unsloth/Qwen3-4B-Instruct-2507
- Fine-tuning: QLoRA
- Training framework: Unsloth
- Release format: GGUF (F16)
- Inference: llama.cpp
- Maintainer: TTDXQ
## Scope
This model targets 4-player Riichi Mahjong without red dora. The current version focuses only on discard recommendation. It does not provide full-game planning, yaku/score analysis, or detailed offense-defense explanations.
## Limitations
- Discard recommendation only
- No full-game planning
- No yaku, point calculation, or detailed strategic explanation
- Not guaranteed for competitive or real-match performance
- For research and learning purposes only
## Prohibited Uses
This model must not be used for:
- cheating
- game automation or plug-ins
- account boosting or ghost-playing
- real-money gambling assistance
## Input and Output

### Input Format
The model input is a structured natural-language game-state description, written in Chinese to match the training data. Example:

```
[情景分析]
- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
- 宝牌: 5万
- 各玩家分数: 你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。
- 你的手牌: 1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发
- 牌效: 5 向听,进张 82 张。
- 防御:
  最安全牌放铳率:11.3%
  平均放铳率:18.5%
  最危险牌放铳率:25.9%
场上已见牌信息
各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无
各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无

[任务]
根据当前情景,选择一张最应该打出的手牌。
```
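A prompt in this format can be assembled programmatically. The sketch below is illustrative only: the helper and its field names (`round_info`, `dora`, and so on) are assumptions, and it covers just a subset of the sections shown above; the model only requires that the final text follow the documented format.

```python
# Illustrative prompt builder for the game-state format (hypothetical helper).
def build_prompt(round_info: str, dora: str, hand: list[str],
                 shanten: int, accepted_tiles: int) -> str:
    lines = [
        "[情景分析]",
        f"- 牌局: {round_info}。",
        f"- 宝牌: {dora}",
        f"- 你的手牌: {' '.join(hand)}",
        f"- 牌效: {shanten} 向听,进张 {accepted_tiles} 张。",
        "[任务]",
        "根据当前情景,选择一张最应该打出的手牌。",
    ]
    return "\n".join(lines)

prompt = build_prompt(
    round_info="东一局,你是庄家 (第1巡,牌墙余69张)",
    dora="5万",
    hand=["1万", "5万", "7万", "3筒", "5筒", "6筒", "8筒", "8筒",
          "3索", "5索", "8索", "南", "白", "发"],
    shanten=5,
    accepted_tiles=82,
)
print(prompt)
```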
### Output Format

The output is strictly a single tile, with no prefix such as "discard" and no explanation. Example:

```
白
```
## Usage

### llama.cpp Inference

```bash
llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
```
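With the model behind `llama-server`, it can be queried over the server's OpenAI-compatible HTTP API. The sketch below only builds the request payload and makes no network call; the endpoint path and default port `8080` are llama.cpp defaults, not something this model card specifies.

```python
import json

# Assumed llama.cpp default endpoint: http://localhost:8080/v1/chat/completions
payload = {
    "messages": [{"role": "user", "content": "[情景分析]\n..."}],
    "max_tokens": 10,     # a single tile needs only a few tokens
    "temperature": 0.1,
}
body = json.dumps(payload, ensure_ascii=False)
print(body)
```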
Python Inference Example
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)
tokenizer = AutoTokenizer.from_pretrained(
"TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)
# Prepare input
input_text = """[情景分析]
- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
..."""
# Inference
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result) # Output: 白
## Dataset
The training data uses the 2018 subset of pjura/mahjong_board_states. This dataset originates from Tenhou.net gameplay records, with each record containing 511 data points covering game basics, dora indicators, player hand tiles, calls, discard piles, and discard decisions.
### Data Processing
The raw data was converted into human-readable natural-language descriptions, with calculated turn numbers, actual dora, and simplified risk assessments. Sample distribution by turn:

- Turns 1-3: 15%
- Turns 4-6: 20%
- Turns 7-12: 35%

A total of 192,000 samples were used, with no general instruction data or self-built data mixed in.
- Train: 192,000
- Validation: 2,000
- Test: sampled as needed from 2019 data
- Train / validation / test splits are fully non-overlapping
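The non-overlap guarantee can be checked mechanically, for example by hashing each serialized board state and intersecting the splits. A minimal sketch with toy data; the `state_key` helper is hypothetical, not from the repository:

```python
import hashlib

def state_key(record: str) -> str:
    """Stable fingerprint of a serialized board state (illustrative)."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

# Toy stand-ins for the serialized records of each split.
train = {state_key(r) for r in ["state-a", "state-b"]}
val = {state_key(r) for r in ["state-c"]}
test = {state_key(r) for r in ["state-d"]}

# All three splits must be pairwise disjoint.
assert train.isdisjoint(val) and train.isdisjoint(test) and val.isdisjoint(test)
print("splits disjoint")
```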
Dataset Citation
@dataset{mahjong_board_states,
title = {MahJong Board States Dataset},
author = {Patrick Jura},
year = {2024},
url = {https://huggingface.co/datasets/pjura/mahjong_board_states}
}
## Training Details

### Model Configuration

- Base Model: unsloth/Qwen3-4B-Instruct-2507
- Training Precision: 4-bit
- Fine-tuning Method: QLoRA
- Framework: Unsloth
- Max Sequence Length: 2048
### LoRA Parameters

- Rank: 128
- Alpha: 256
- Target Modules: all
### Training Hyperparameters

- Learning Rate: 1e-4
- LR Scheduler: cosine
- Effective Batch Size: 64
- Per-device Batch Size: 2
- Gradient Accumulation Steps: 32
- Training Steps: 3000
- Warmup Steps: 300
- Random Seed: 3407
- Load Best Checkpoint: yes
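The listed numbers are internally consistent, which is worth a quick check. Assuming a single GPU (not stated in the card), the batch size of 64 follows from the per-device batch and gradient accumulation, and 3000 steps at that batch size covers exactly the 192,000 training samples, i.e. roughly one epoch:

```python
# Sanity-check the training arithmetic (single-GPU assumption).
per_device_batch = 2
grad_accum_steps = 32
num_gpus = 1  # assumption; not stated in the model card

effective_batch = per_device_batch * grad_accum_steps * num_gpus
samples_seen = effective_batch * 3000  # 3000 training steps

print(effective_batch)  # 64
print(samples_seen)     # 192000, matching the training-set size
```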
### Training Time
- Total Duration: ~16.44 hours
## Evaluation Results

### Comparison with Dataset Actions
Inference parameters: temperature = 0.1, top_p = 0.1

Metrics:

- Score: max 500 points (each sample contributes 1 point when correct, 0 when incorrect)
- Full-match rate: fraction of samples where all 3 test runs matched the dataset discard
- Zero-score rate: fraction of samples where all 3 test runs disagreed with the dataset discard
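One reading consistent with these metrics and with the fractional scores in the tables below (an assumption; the card does not spell out the exact formula) is that each of 500 samples is run 3 times and contributes the fraction of matching runs:

```python
def score_runs(runs_correct: list[int]) -> tuple[float, float, float]:
    """runs_correct[i] = number of matching runs (0-3) for sample i.
    Returns (score, full-match rate, zero-score rate)."""
    n = len(runs_correct)
    score = sum(c / 3 for c in runs_correct)          # up to 1 point per sample
    full_match = sum(c == 3 for c in runs_correct) / n
    zero_score = sum(c == 0 for c in runs_correct) / n
    return score, full_match, zero_score

# Toy example: 4 samples with 3, 2, 0 and 1 matching runs.
s, fm, zs = score_runs([3, 2, 0, 1])
print(round(s, 2), fm, zs)  # 2.0 0.25 0.25
```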
#### Tile-Efficiency Test
| Model | Method | Score | Full-match Rate | Zero-score Rate |
|---|---|---|---|---|
| Qwen3-4B | Prompt Engineering | 50.21 | 6.60% | 86.13% |
| Qwen3-4B | Fine-tuned | 229.66 | 45.87% | 53.93% |
| DeepSeek-V3.2 | Prompt Engineering | 181.66 | 21.40% | 46.33% |
#### Defense Test
| Model | Method | Score | Full-match Rate | Zero-score Rate |
|---|---|---|---|---|
| Qwen3-4B | Prompt Engineering | 53.55 | 6.17% | 84.43% |
| Qwen3-4B | Fine-tuned | 239.89 | 47.93% | 52.00% |
| DeepSeek-V3.2 | Prompt Engineering | 172.00 | 16.00% | 46.80% |
#### Comprehensive Test
| Model | Method | Score | Full-match Rate | Zero-score Rate |
|---|---|---|---|---|
| Qwen3-4B | Prompt Engineering | 53.44 | 0.60% | 84.40% |
| Qwen3-4B | Fine-tuned | 233.33 | 46.53% | 53.20% |
| DeepSeek-V3.2 | Prompt Engineering | 179.44 | 18.07% | 44.93% |
### Comparison with Mortal

Inference parameters: temperature = 0.6, top_p = 0.95
#### Test 1: All Turn Data
- Samples: 3000
- Top-1 Accuracy: 50.73%
- Top-3 Accuracy: 83.37%
#### Test 2: Excluding Early Turns
- Valid Samples: 3000
- Top-1 Accuracy: 48.70%
- Top-3 Accuracy: 79.20%
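Top-k agreement of this kind can be computed by checking whether the reference engine's discard appears among the model's k highest-ranked tiles. An illustrative sketch with made-up tiles, not the actual evaluation data:

```python
def topk_accuracy(ranked_preds: list[list[str]],
                  reference: list[str], k: int) -> float:
    """Fraction of samples whose reference tile is in the top-k predictions."""
    hits = sum(ref in preds[:k] for preds, ref in zip(ranked_preds, reference))
    return hits / len(reference)

# Toy data: model's ranked candidates vs. the reference engine's choices.
preds = [["白", "1万", "南"], ["5索", "白", "1万"], ["南", "发", "白"]]
reference = ["白", "1万", "中"]

print(round(topk_accuracy(preds, reference, 1), 2))  # 0.33
print(round(topk_accuracy(preds, reference, 3), 2))  # 0.67
```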
Note: Mortal is one of the strongest open-source Riichi Mahjong AIs currently available.
## Repository Links
- GitHub: https://github.com/ttdxq/Qwen3-4B-Instruct-2507-mahjong-alpha
- Hugging Face: https://huggingface.co/TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha
## License
This model is licensed under Apache License 2.0.
The training data comes from pjura/mahjong_board_states, which is licensed under CC BY 4.0. Please preserve the required attribution and citation when using it.
## Acknowledgements
Thanks to the following open-source resources:
- unsloth/Qwen3-4B-Instruct-2507
- pjura/mahjong_board_states
- Mortal