---
license: apache-2.0
datasets:
- pjura/mahjong_board_states
language:
- zh
- en
base_model:
- unsloth/Qwen3-4B-Instruct-2507
tags:
- riichi-mahjong
- game-ai
- qwen
- qwen3
- mahjong
- discard-recommendation
- gguf
pipeline_tag: text-generation
---

# Qwen3-4B-Instruct-2507-mahjong-alpha

[中文](./README.md)

`Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.

It is designed for 4-player Riichi Mahjong discard recommendation: given the round information, hand tiles, calls, visible tiles, tile efficiency, and defense signals, the model outputs the single best discard for the current state.

The current version is intended mainly for tool integration: the output is the tile name alone, with no explanation.

## Model Features

- **Task**: 4-player Riichi Mahjong discard recommendation
- **Base model**: `unsloth/Qwen3-4B-Instruct-2507`
- **Fine-tuning**: `QLoRA`
- **Training framework**: `Unsloth`
- **Release format**: `GGUF (F16)`
- **Inference**: `llama.cpp`
- **Maintainer**: `TTDXQ`

## Scope

This model targets 4-player Riichi Mahjong without red dora. The current version focuses solely on discard recommendation; it does not provide full-game planning, yaku/score analysis, or detailed offense-defense explanations.

## Limitations

- Discard recommendation only
- No full-game planning
- No yaku analysis, point calculation, or detailed strategic explanation
- Not validated for competitive or real-match performance
- For research and learning purposes only

## Prohibited Uses

This model must not be used for:

- cheating
- game automation or plug-ins
- account boosting or ghost-playing
- real-money gambling assistance

## Input and Output

### Input Format

The model input is a structured natural-language game-state description in Chinese, matching the training data. Example:

```text
[情景分析]
- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
- 宝牌: 5万
- 各玩家分数: 你有 25分, 下家: 25分, 对家: 25分, 上家: 25分。
- 你的手牌: 1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发
- 牌效: 5 向听,进张 82 张。
- 防御:
  最安全牌放铳率:11.3%
  平均放铳率:18.5%
  最危险牌放铳率:25.9%
场上已见牌信息
各玩家副露信息:本家副露:无, 下家副露:无, 对家副露:无, 上家副露:无
各玩家牌河信息:本家:无, 下家:无, 对家:无, 上家:无

[任务]
根据当前情景,选择一张最应该打出的手牌。
```

### Output Format

The output is strictly the tile name alone, with no prefix such as "discard" and no explanation. Example:

```text
白
```

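Because the model emits the tile name alone, a caller can check the reply against the current hand before acting on it. A minimal sketch — the `validate_discard` helper is hypothetical, not part of this repo:

```python
# Hypothetical post-processing helper (not part of the model release):
# verify that the model's reply is exactly one tile that is actually in the hand.
def validate_discard(reply: str, hand: list[str]) -> str:
    """Return the chosen tile, or raise if the reply is not a tile in the hand."""
    tile = reply.strip()
    if tile not in hand:
        raise ValueError(f"model returned {tile!r}, which is not in the hand")
    return tile

hand = "1万 5万 7万 3筒 5筒 6筒 8筒 8筒 3索 5索 8索 南 白 发".split()
print(validate_discard("白\n", hand))  # -> 白
```

This guards against occasional malformed generations (extra text, a tile not in hand) before the recommendation is used by a tool.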
## Usage

### llama.cpp Inference

```bash
llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
```

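Once `llama-server` is running it also exposes an OpenAI-compatible HTTP API (by default on port 8080), so the model can be queried without loading it in Python. A minimal sketch, assuming a server started as above; the request body is illustrative and borrows the deterministic sampling settings from the evaluation section:

```python
import json
import urllib.request

# Sketch: query a locally running llama-server (default http://localhost:8080).
# /v1/chat/completions is llama.cpp's OpenAI-compatible chat endpoint.
payload = {
    "messages": [{"role": "user", "content": "[情景分析]\n..."}],  # full game state goes here
    "temperature": 0.1,
    "top_p": 0.1,
    "max_tokens": 10,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment with a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```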
### Python Inference Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)
tokenizer = AutoTokenizer.from_pretrained(
    "TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha"
)

# Prepare input
input_text = """[情景分析]
- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
- 状态: 当前排名 1/4 (与一位差 0)。
..."""

# Inference
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
# Decode only the newly generated tokens, not the echoed prompt
result = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(result)  # Output: 白
```

## Dataset

The training data uses the 2018 subset of `pjura/mahjong_board_states`. This dataset originates from Tenhou.net game records; each record contains 511 data points covering game basics, dora indicators, player hand tiles, calls, discard piles, and the discard decision.

### Data Processing

The raw records were converted into human-readable natural-language descriptions, with calculated turn numbers, the actual dora, and a simplified risk assessment. Sample distribution by turn:

- Turns 1-3: 15%
- Turns 4-6: 20%
- Turns 7-12: 35%

A total of `192000` samples were used, with no general-instruction data or self-built data mixed in.

- Train: `192000`
- Validation: `2000`
- Test: sampled as needed from the 2019 data
- Train / validation / test sets are fully non-overlapping

### Dataset Citation

```bibtex
@dataset{mahjong_board_states,
  title  = {MahJong Board States Dataset},
  author = {Patrick Jura},
  year   = {2024},
  url    = {https://huggingface.co/datasets/pjura/mahjong_board_states}
}
```

## Training Details

### Model Configuration

- Base Model: `unsloth/Qwen3-4B-Instruct-2507`
- Training Precision: `4-bit` (quantized base for QLoRA)
- Fine-tuning Method: `QLoRA`
- Framework: `Unsloth`
- Max Sequence Length: `2048`

### LoRA Parameters

- Rank: `128`
- Alpha: `256`
- Target Modules: all

### Training Hyperparameters

- Learning Rate: `1e-4`
- LR Scheduler: `cosine`
- Batch Size: `64` (per-device batch `2` × gradient accumulation `32`)
- Training Steps: `3000`
- Warmup Steps: `300`
- Random Seed: `3407`
- Load Best Checkpoint: yes

### Training Time

- Total Duration: ~16.44 hours

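The hyperparameters above are mutually consistent: assuming a single training device, the effective batch is the per-device batch times the gradient-accumulation steps, and 3000 steps at that batch size cover the 192000 training samples exactly once. A quick check:

```python
# Consistency check of the training budget from the numbers above
# (assumes a single device, so effective batch = per-device batch x grad accum).
per_device_batch = 2
grad_accum = 32
steps = 3000
train_samples = 192_000

effective_batch = per_device_batch * grad_accum
print(effective_batch)          # -> 64, the stated batch size
print(effective_batch * steps)  # -> 192000, i.e. exactly one epoch
```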
## Evaluation Results

### Comparison with Dataset Actions

Inference parameters: temperature = 0.1, top_p = 0.1

**Metrics explanation**:

- Score: maximum 500 points (1 point per correct sample, 0 for an incorrect one)
- Full-match rate: fraction of samples where all 3 test runs matched the dataset action
- Zero-score rate: fraction of samples where all 3 test runs disagreed with the dataset action

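One plausible reading of these metrics (my assumption: 500 samples, each evaluated 3 times, with the per-sample score being the fraction of correct runs, which would explain the fractional scores in the tables below) can be sketched as:

```python
# Sketch of the metric aggregation under ASSUMED semantics:
# each sample is run 3 times; per-sample score = correct runs / 3.
def aggregate(results):
    """results: one 3-tuple of booleans per sample (was each run correct?)."""
    n = len(results)
    score = sum(sum(runs) / len(runs) for runs in results)
    full_match = sum(all(runs) for runs in results) / n
    zero_score = sum(not any(runs) for runs in results) / n
    return score, full_match, zero_score

demo = [(True, True, True), (True, False, False), (False, False, False)]
score, full, zero = aggregate(demo)
print(round(score, 2), full, zero)
```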
#### Tile-Efficiency Test

| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|-----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 50.21 | 6.60% | 86.13% |
| Qwen3-4B | Fine-tuned | 229.66 | 45.87% | 53.93% |
| DeepSeek-V3.2 | Prompt Engineering | 181.66 | 21.40% | 46.33% |

#### Defense Test

| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|-----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 53.55 | 6.17% | 84.43% |
| Qwen3-4B | Fine-tuned | 239.89 | 47.93% | 52.00% |
| DeepSeek-V3.2 | Prompt Engineering | 172.00 | 16.00% | 46.80% |

#### Comprehensive Test

| Model | Method | Score | Full-match Rate | Zero-score Rate |
|-------|--------|-------|-----------------|-----------------|
| Qwen3-4B | Prompt Engineering | 53.44 | 0.60% | 84.40% |
| Qwen3-4B | Fine-tuned | 233.33 | 46.53% | 53.20% |
| DeepSeek-V3.2 | Prompt Engineering | 179.44 | 18.07% | 44.93% |

### Comparison with Mortal

Inference parameters: temperature = 0.6, top_p = 0.95

#### Test 1: All Turn Data

- Samples: 3000
- Top-1 Accuracy: **50.73%**
- Top-3 Accuracy: **83.37%**

#### Test 2: Excluding Early Turns

- Valid Samples: 3000
- Top-1 Accuracy: **48.70%**
- Top-3 Accuracy: **79.20%**

> Note: Mortal is one of the strongest open-source Riichi Mahjong AIs currently available.

## Repository Links

- GitHub: https://github.com/ttdxq/Qwen3-4B-Instruct-2507-mahjong-alpha
- Hugging Face: https://huggingface.co/TTDXQ/Qwen3-4B-Instruct-2507-mahjong-alpha

## License

This model is released under the Apache License 2.0.

The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`; preserve the required attribution and citation when using it.

## Acknowledgements

Thanks to the following open-source resources:

- `unsloth/Qwen3-4B-Instruct-2507`
- `pjura/mahjong_board_states`
- `Mortal`