Qwen3-4B-Instruct-2507-mahjong-alpha

---
license: apache-2.0
datasets:
  - pjura/mahjong_board_states
language:
  - zh
  - en
base_model: unsloth/Qwen3-4B-Instruct-2507
tags:
  - riichi-mahjong
  - game-ai
  - qwen
  - qwen3
  - mahjong
  - discard-recommendation
  - gguf
pipeline_tag: text-generation
---

Qwen3-4B-Instruct-2507-mahjong-alpha


Qwen3-4B-Instruct-2507-mahjong-alpha is a Riichi Mahjong domain model fine-tuned from unsloth/Qwen3-4B-Instruct-2507 with QLoRA.

It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.

The current version is mainly intended for tool integration: the output is the text of a single tile, with no explanation.
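Because the model emits only a single tile string, a tool wrapper typically just needs to validate that the output is a legal tile. A minimal sketch (the tile vocabulary and the `is_valid_tile` helper are illustrative, not part of this repo):

```python
# Illustrative validator for the model's single-tile output.
# Tile names follow the Chinese notation used in this card:
# 1-9万 (man), 1-9筒 (pin), 1-9索 (sou), plus the seven honor tiles.

HONORS = {"东", "南", "西", "北", "白", "发", "中"}
SUITS = {"万", "筒", "索"}

def is_valid_tile(text: str) -> bool:
    """Return True if `text` is exactly one legal tile name."""
    tile = text.strip()
    if tile in HONORS:
        return True
    return (
        len(tile) == 2
        and tile[0] in "123456789"
        and tile[1] in SUITS
    )

print(is_valid_tile("白"))    # True
print(is_valid_tile("5万"))   # True
print(is_valid_tile("打5万"))  # False: output should carry no prefix
```

A wrapper can reject or retry any generation that fails this check before feeding the tile back into a game client.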

Model Features

  • Task: 4-player Riichi Mahjong discard recommendation
  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Fine-tuning: QLoRA
  • Training framework: Unsloth
  • Release format: GGUF (F16)
  • Inference: llama.cpp
  • Maintainer: TTDXQ

Scope

This model targets 4-player Riichi Mahjong without red dora. The current version focuses only on discard recommendation. It does not provide full-game planning, yaku/score analysis, or detailed offense-defense explanations.

Limitations

  • Discard recommendation only
  • No full-game planning
  • No yaku, point calculation, or detailed strategic explanation
  • Not guaranteed for competitive or real-match performance
  • For research and learning purposes only

Prohibited Uses

This model must not be used for:

  • cheating
  • game automation or plug-ins
  • account boosting or ghost-playing
  • real-money gambling assistance

Input and Output

Input Format

The model input is a structured natural-language description of the game state, written in Chinese to match the training data: round info, ranking, dora, scores, hand tiles, tile efficiency, defense risk, and visible tiles, followed by the task instruction. Example:

[情景分析]
- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
- 宝牌: 5万
- 各玩家分数: 你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。
- 你的手牌: 1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发
- 牌效: 5 向听,进张 82 张。
- 防御:
  最安全牌放铳率11.3%
  平均放铳率18.5%
  最危险牌放铳率25.9%
场上已见牌信息
各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无
各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无

[任务]
根据当前情景,选择一张最应该打出的手牌。
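The prompt above can be assembled programmatically. A minimal sketch, assuming the field labels shown in the example (the `build_prompt` helper is illustrative, not part of this repo):

```python
# Hypothetical helper that assembles the game-state prompt shown above.
# Field labels are copied verbatim from the example; only the values vary.

def build_prompt(round_info, status, dora, scores, hand, efficiency, defense, seen):
    return (
        "[情景分析]\n"
        f"- 牌局: {round_info}\n"
        f"- 状态: {status}\n"
        f"- 宝牌: {dora}\n"
        f"- 各玩家分数: {scores}\n"
        f"- 你的手牌: {hand}\n"
        f"- 牌效: {efficiency}\n"
        f"- 防御:\n{defense}\n"
        f"{seen}\n\n"
        "[任务]\n"
        "根据当前情景,选择一张最应该打出的手牌。"
    )

prompt = build_prompt(
    "东一局,你是庄家 (第1巡牌墙余69张)。",
    "当前排名 1/4 (与一位差 0)。",
    "5万",
    "你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。",
    "1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发",
    "5 向听,进张 82 张。",
    "  最安全牌放铳率11.3%\n  平均放铳率18.5%\n  最危险牌放铳率25.9%",
    "场上已见牌信息\n"
    "各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无\n"
    "各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无",
)
assert prompt.startswith("[情景分析]")
```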

Output Format

The output is strictly the text of a single tile, with no prefix such as "discard" and no explanation. Example:

白

Usage

llama.cpp Inference

llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
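Once the server is running, requests can be sent to its OpenAI-compatible chat endpoint (port 8080 is the llama.cpp default; the prompt placeholder below stands for the full game-state description from the Input Format section, and the sampling parameters follow the evaluation settings in this card):

```shell
# Query the running llama-server via its OpenAI-compatible API.
# Replace <game-state prompt> with the full [情景分析]…[任务] text.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "<game-state prompt>"}],
    "temperature": 0.1,
    "top_p": 0.1,
    "max_tokens": 10
  }'
```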

Python Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)
tokenizer = AutoTokenizer.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)

# Prepare input
input_text = """[情景分析]
- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
..."""

# Inference
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(result)  # e.g. 白

Dataset

The training data uses the 2018 subset of pjura/mahjong_board_states. This dataset originates from Tenhou.net gameplay records; each record contains 511 fields covering game basics, dora indicators, player hand tiles, calls, discard piles, and the discard decision.

Data Processing

The raw data was converted into human-readable natural language descriptions, with calculated turn numbers, actual dora, and simplified risk assessment. Sample distribution by turn:

  • Turns 1-3: 15%
  • Turns 4-6: 20%
  • Turns 7-12: 35%

A total of 192000 samples were used, with no general instruction data or self-built data mixed in.

  • Train: 192000
  • Validation: 2000
  • Test: sampled as needed from 2019 data
  • Train / validation / test are fully non-overlapping

Dataset Citation

@dataset{mahjong_board_states,
  title = {MahJong Board States Dataset},
  author = {Patrick Jura},
  year = {2024},
  url = {https://huggingface.co/datasets/pjura/mahjong_board_states}
}

Training Details

Model Configuration

  • Base Model: unsloth/Qwen3-4B-Instruct-2507
  • Training Precision: 4-bit
  • Fine-tuning Method: QLoRA
  • Framework: Unsloth
  • Max Sequence Length: 2048

LoRA Parameters

  • Rank: 128
  • Alpha: 256
  • Target Modules: All

Training Hyperparameters

  • Learning Rate: 1e-4
  • LR Scheduler: cosine
  • Effective Batch Size: 64
  • Per-device Batch: 2
  • Gradient Accumulation Steps: 32
  • Training Steps: 3000
  • Warmup Steps: 300
  • Random Seed: 3407
  • Load Best Checkpoint: Yes
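The hyperparameters above are internally consistent; a quick sanity check (pure arithmetic, not repo code):

```python
# Sanity-check the relationships between the training hyperparameters.
per_device_batch = 2
grad_accum_steps = 32
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 64, matching the stated effective batch size

training_steps = 3000
samples_seen = training_steps * effective_batch
print(samples_seen)  # 192000: exactly one pass over the 192000 training samples

warmup_fraction = 300 / training_steps
print(warmup_fraction)  # 0.1: warmup covers 10% of training
```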

Training Time

  • Total Duration: ~16.44 hours

Evaluation Results

Comparison with Dataset Actions

Inference parameters: Temperature=0.1, Top_P=0.1

Metrics explanation (each of the 500 test samples is evaluated 3 times):

  • Score: maximum 500; per run, 1 point for each sample whose prediction matches the dataset action, 0 otherwise, averaged over the 3 runs
  • Full-match rate: share of samples where all 3 runs matched the dataset
  • Zero-score rate: share of samples where all 3 runs disagreed with the dataset
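Under this reading (500 samples, 3 inference runs each), the three metrics can be computed as follows; the `summarize` helper and the toy data are illustrative, not repo code:

```python
# Illustrative computation of the three evaluation metrics.
# runs[i] holds the 3 pass/fail outcomes (1/0) for sample i; the real
# evaluation uses 500 samples, the toy example below uses only 3.

def summarize(runs):
    n = len(runs)
    # Average number of correct samples per run (max = n).
    score = sum(sum(r) for r in runs) / 3
    full_match = sum(all(r) for r in runs) / n      # all 3 runs correct
    zero_score = sum(not any(r) for r in runs) / n  # all 3 runs wrong
    return score, full_match, zero_score

runs = [
    [1, 1, 1],  # always correct
    [1, 1, 1],  # always correct
    [0, 0, 0],  # never correct
]
score, full, zero = summarize(runs)
print(score)                            # 2.0
print(round(full, 3), round(zero, 3))   # 0.667 0.333
```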

Tile-Efficiency Test

Model          Method              Score    Full-match Rate  Zero-score Rate
Qwen3-4B       Prompt Engineering  50.21    6.60%            86.13%
Qwen3-4B       Fine-tuned          229.66   45.87%           53.93%
DeepSeek-V3.2  Prompt Engineering  181.66   21.40%           46.33%

Defense Test

Model          Method              Score    Full-match Rate  Zero-score Rate
Qwen3-4B       Prompt Engineering  53.55    6.17%            84.43%
Qwen3-4B       Fine-tuned          239.89   47.93%           52.00%
DeepSeek-V3.2  Prompt Engineering  172.00   16.00%           46.80%

Comprehensive Test

Model          Method              Score    Full-match Rate  Zero-score Rate
Qwen3-4B       Prompt Engineering  53.44    0.60%            84.40%
Qwen3-4B       Fine-tuned          233.33   46.53%           53.20%
DeepSeek-V3.2  Prompt Engineering  179.44   18.07%           44.93%

Comparison with Mortal

Inference parameters: Temperature=0.6, Top_P=0.95

Test 1: All Turn Data

  • Samples: 3000
  • Top-1 Accuracy: 50.73%
  • Top-3 Accuracy: 83.37%

Test 2: Excluding Early Turns

  • Valid Samples: 3000
  • Top-1 Accuracy: 48.70%
  • Top-3 Accuracy: 79.20%

Note: Mortal is one of the strongest open-source Riichi Mahjong AIs currently available.

License

This model is licensed under Apache License 2.0.

The training data comes from pjura/mahjong_board_states, which is licensed under CC BY 4.0. Please preserve the required attribution and citation when using it.

Acknowledgements

Thanks to the following open-source resources:

  • unsloth/Qwen3-4B-Instruct-2507
  • pjura/mahjong_board_states
  • Mortal