Qwen3-4B-Instruct-2507-mahjong-alpha

---
license: apache-2.0
datasets:
  - pjura/mahjong_board_states
language:
  - zh
  - en
base_model: unsloth/Qwen3-4B-Instruct-2507
tags:
  - riichi-mahjong
  - game-ai
  - qwen
  - qwen3
  - mahjong
  - discard-recommendation
  - gguf
pipeline_tag: text-generation
---

Qwen3-4B-Instruct-2507-mahjong-alpha


Qwen3-4B-Instruct-2507-mahjong-alpha is a Riichi Mahjong domain model fine-tuned from unsloth/Qwen3-4B-Instruct-2507 with QLoRA.

It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.

The current version is mainly intended for tool integration: the output is the text of a single tile, with no explanation.
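Because the model emits only a single tile string, a tool wrapper typically just needs to validate that the output is a legal tile. A minimal sketch (the tile vocabulary and the `is_valid_tile` helper are illustrative, not part of this repo):

```python
# Illustrative validator for the model's single-tile output.
# Tile names follow the Chinese notation used in this card:
# 1-9万 (man), 1-9筒 (pin), 1-9索 (sou), plus the seven honor tiles.

HONORS = {"东", "南", "西", "北", "白", "发", "中"}
SUITS = {"万", "筒", "索"}

def is_valid_tile(text: str) -> bool:
    """Return True if `text` is exactly one legal tile name."""
    tile = text.strip()
    if tile in HONORS:
        return True
    return (
        len(tile) == 2
        and tile[0] in "123456789"
        and tile[1] in SUITS
    )

print(is_valid_tile("白"))    # True
print(is_valid_tile("5万"))   # True
print(is_valid_tile("打5万"))  # False: output should carry no prefix
```

A wrapper can reject or retry any generation that fails this check before feeding the tile back into a game client.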

Model Features

  • Task: 4-player Riichi Mahjong discard recommendation
  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Fine-tuning: QLoRA
  • Training framework: Unsloth
  • Release format: GGUF (F16)
  • Inference: llama.cpp
  • Maintainer: TTDXQ

Scope

This model targets 4-player Riichi Mahjong without red dora. The current version focuses only on discard recommendation. It does not provide full-game planning, yaku/score analysis, or detailed offense-defense explanations.

Limitations

  • Discard recommendation only
  • No full-game planning
  • No yaku, point calculation, or detailed strategic explanation
  • Not guaranteed for competitive or real-match performance
  • For research and learning purposes only

Prohibited Uses

This model must not be used for:

  • cheating
  • game automation or plug-ins
  • account boosting or ghost-playing
  • real-money gambling assistance

Input and Output

Input Format

The model input is a structured natural-language description of the game state, written in Chinese to match the training data: round info, ranking, dora, scores, hand tiles, tile efficiency, defense risk, and visible tiles, followed by the task instruction. Example:

[情景分析]
- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
- 宝牌: 5万
- 各玩家分数: 你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。
- 你的手牌: 1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发
- 牌效: 5 向听,进张 82 张。
- 防御:
  最安全牌放铳率11.3%
  平均放铳率18.5%
  最危险牌放铳率25.9%
场上已见牌信息
各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无
各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无

[任务]
根据当前情景,选择一张最应该打出的手牌。
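The prompt above can be assembled programmatically. A minimal sketch, assuming the field labels shown in the example (the `build_prompt` helper is illustrative, not part of this repo):

```python
# Hypothetical helper that assembles the game-state prompt shown above.
# Field labels are copied verbatim from the example; only the values vary.

def build_prompt(round_info, status, dora, scores, hand, efficiency, defense, seen):
    return (
        "[情景分析]\n"
        f"- 牌局: {round_info}\n"
        f"- 状态: {status}\n"
        f"- 宝牌: {dora}\n"
        f"- 各玩家分数: {scores}\n"
        f"- 你的手牌: {hand}\n"
        f"- 牌效: {efficiency}\n"
        f"- 防御:\n{defense}\n"
        f"{seen}\n\n"
        "[任务]\n"
        "根据当前情景,选择一张最应该打出的手牌。"
    )

prompt = build_prompt(
    "东一局,你是庄家 (第1巡牌墙余69张)。",
    "当前排名 1/4 (与一位差 0)。",
    "5万",
    "你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。",
    "1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发",
    "5 向听,进张 82 张。",
    "  最安全牌放铳率11.3%\n  平均放铳率18.5%\n  最危险牌放铳率25.9%",
    "场上已见牌信息\n"
    "各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无\n"
    "各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无",
)
assert prompt.startswith("[情景分析]")
```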

Output Format

The output is strictly the text of a single tile, with no prefix such as "discard" and no explanation. Example:

白

Usage

llama.cpp Inference

llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
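Once the server is running, requests can be sent to its OpenAI-compatible chat endpoint (port 8080 is the llama.cpp default; the prompt placeholder below stands for the full game-state description from the Input Format section, and the sampling parameters follow the evaluation settings in this card):

```shell
# Query the running llama-server via its OpenAI-compatible API.
# Replace <game-state prompt> with the full [情景分析]…[任务] text.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "<game-state prompt>"}],
    "temperature": 0.1,
    "top_p": 0.1,
    "max_tokens": 10
  }'
```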

Python Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)
tokenizer = AutoTokenizer.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)

# Prepare input
input_text = """[情景分析]
- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
..."""

# Inference
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
result = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(result)  # e.g. 白

Dataset

The training data uses the 2018 subset of pjura/mahjong_board_states. This dataset originates from Tenhou.net gameplay records; each record contains 511 fields covering game basics, dora indicators, player hand tiles, calls, discard piles, and the discard decision.

Data Processing

The raw data was converted into human-readable natural language descriptions, with calculated turn numbers, actual dora, and simplified risk assessment. Sample distribution by turn:

  • Turns 1-3: 15%
  • Turns 4-6: 20%
  • Turns 7-12: 35%

A total of 192000 samples were used, with no general instruction data or self-built data mixed in.

  • Train: 192000
  • Validation: 2000
  • Test: sampled as needed from 2019 data
  • Train / validation / test are fully non-overlapping

Dataset Citation

@dataset{mahjong_board_states,
  title = {MahJong Board States Dataset},
  author = {Patrick Jura},
  year = {2024},
  url = {https://huggingface.co/datasets/pjura/mahjong_board_states}
}

Training Details

Model Configuration

  • Base Model: unsloth/Qwen3-4B-Instruct-2507
  • Training Precision: 4-bit
  • Fine-tuning Method: QLoRA
  • Framework: Unsloth
  • Max Sequence Length: 2048

LoRA Parameters

  • Rank: 128
  • Alpha: 256
  • Target Modules: All

Training Hyperparameters

  • Learning Rate: 1e-4
  • LR Scheduler: cosine
  • Effective Batch Size: 64
  • Per-device Batch: 2
  • Gradient Accumulation Steps: 32
  • Training Steps: 3000
  • Warmup Steps: 300
  • Random Seed: 3407
  • Load Best Checkpoint: Yes
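The hyperparameters above are internally consistent; a quick sanity check (pure arithmetic, not repo code):

```python
# Sanity-check the relationships between the training hyperparameters.
per_device_batch = 2
grad_accum_steps = 32
effective_batch = per_device_batch * grad_accum_steps
print(effective_batch)  # 64, matching the stated effective batch size

training_steps = 3000
samples_seen = training_steps * effective_batch
print(samples_seen)  # 192000: exactly one pass over the 192000 training samples

warmup_fraction = 300 / training_steps
print(warmup_fraction)  # 0.1: warmup covers 10% of training
```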

Training Time

  • Total Duration: ~16.44 hours

Evaluation Results

Comparison with Dataset Actions

Inference parameters: Temperature=0.1, Top_P=0.1

Metrics explanation (each of the 500 test samples is evaluated 3 times):

  • Score: maximum 500; per run, 1 point for each sample whose prediction matches the dataset action, 0 otherwise, averaged over the 3 runs
  • Full-match rate: share of samples where all 3 runs matched the dataset
  • Zero-score rate: share of samples where all 3 runs disagreed with the dataset
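Under this reading (500 samples, 3 inference runs each), the three metrics can be computed as follows; the `summarize` helper and the toy data are illustrative, not repo code:

```python
# Illustrative computation of the three evaluation metrics.
# runs[i] holds the 3 pass/fail outcomes (1/0) for sample i; the real
# evaluation uses 500 samples, the toy example below uses only 3.

def summarize(runs):
    n = len(runs)
    # Average number of correct samples per run (max = n).
    score = sum(sum(r) for r in runs) / 3
    full_match = sum(all(r) for r in runs) / n      # all 3 runs correct
    zero_score = sum(not any(r) for r in runs) / n  # all 3 runs wrong
    return score, full_match, zero_score

runs = [
    [1, 1, 1],  # always correct
    [1, 1, 1],  # always correct
    [0, 0, 0],  # never correct
]
score, full, zero = summarize(runs)
print(score)                            # 2.0
print(round(full, 3), round(zero, 3))   # 0.667 0.333
```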

Tile-Efficiency Test

Model          Method              Score    Full-match Rate  Zero-score Rate
Qwen3-4B       Prompt Engineering  50.21    6.60%            86.13%
Qwen3-4B       Fine-tuned          229.66   45.87%           53.93%
DeepSeek-V3.2  Prompt Engineering  181.66   21.40%           46.33%

Defense Test

Model          Method              Score    Full-match Rate  Zero-score Rate
Qwen3-4B       Prompt Engineering  53.55    6.17%            84.43%
Qwen3-4B       Fine-tuned          239.89   47.93%           52.00%
DeepSeek-V3.2  Prompt Engineering  172.00   16.00%           46.80%

Comprehensive Test

Model          Method              Score    Full-match Rate  Zero-score Rate
Qwen3-4B       Prompt Engineering  53.44    0.60%            84.40%
Qwen3-4B       Fine-tuned          233.33   46.53%           53.20%
DeepSeek-V3.2  Prompt Engineering  179.44   18.07%           44.93%

Comparison with Mortal

Inference parameters: Temperature=0.6, Top_P=0.95

Test 1: All Turn Data

  • Samples: 3000
  • Top-1 Accuracy: 50.73%
  • Top-3 Accuracy: 83.37%

Test 2: Excluding Early Turns

  • Valid Samples: 3000
  • Top-1 Accuracy: 48.70%
  • Top-3 Accuracy: 79.20%

Note: Mortal is one of the strongest open-source Riichi Mahjong AIs currently available.

License

This model is licensed under Apache License 2.0.

The training data comes from pjura/mahjong_board_states, which is licensed under CC BY 4.0. Please preserve the required attribution and citation when using it.

Acknowledgements

Thanks to the following open-source resources:

  • unsloth/Qwen3-4B-Instruct-2507
  • pjura/mahjong_board_states
  • Mortal