docs: add English model card README_en.md

- Keep the Hugging Face YAML metadata
- Add the complete English model description
- Cover usage, evaluation results, and the dataset citation
- Keep the content aligned with the Chinese README.md
Author: TTDXQ
Date: 2026-03-15 23:40:19 +08:00
parent 6d03adb1ae
commit ad278ac7b0

@@ -4,6 +4,7 @@ datasets:
 - pjura/mahjong_board_states
 language:
 - zh
+- en
 base_model:
 - unsloth/Qwen3-4B-Instruct-2507
 tags:
@@ -19,6 +20,8 @@ pipeline_tag: text-generation
 
 # Qwen3-4B-Instruct-2507-mahjong-alpha
 
+[中文](./README.md)
+
 `Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
 It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
@@ -28,10 +31,10 @@ The current version is mainly intended for tool integration. The output is a sin
 ## Model Features
 
 - **Task**: 4-player Riichi Mahjong discard recommendation
-- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
+- **Base model**: `unsloth/Qwen3-4B-Instruct-2507`
 - **Fine-tuning**: `QLoRA`
-- **Training Framework**: `Unsloth`
-- **Release Format**: `GGUF (F16)`
+- **Training framework**: `Unsloth`
+- **Release format**: `GGUF (F16)`
 - **Inference**: `llama.cpp`
 - **Maintainer**: `TTDXQ`
@@ -90,9 +93,9 @@ The output is strictly a single tile text without any prefix like "discard" and
 ```
 
-## How to Use
+## Usage
 
-### Inference with llama.cpp
+### llama.cpp Inference
 
 ```bash
 llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
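Once the server from the command above is running, a minimal client sketch could look like the following. This is an assumption about a stock `llama-server` setup (it serves an OpenAI-compatible chat endpoint on port 8080 by default); the model name in the payload is illustrative, and the sampling parameters follow the Temperature=0.1, Top_P=0.1 settings used in the card's evaluation.

```python
import json
import urllib.request

# Example prompt from the model card; the trailing "..." is a
# truncation in the original card, not a complete prompt.
prompt = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。\n..."

# Sampling settings mirror the card's evaluation parameters.
payload = {
    "model": "Qwen3-4B-Instruct-2507-mahjong-alpha",
    "messages": [{"role": "user", "content": prompt}],
    "temperature": 0.1,
    "top_p": 0.1,
}

def ask_server(url="http://127.0.0.1:8080/v1/chat/completions"):
    """POST the request; requires the llama-server above to be running."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The response content should then be the single discard tile described above.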
@@ -111,7 +114,10 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 
 # Prepare input
-input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。\n..."
+input_text = """[情景分析]
+- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
+- 状态: 当前排名 1/4 (与一位差 0)。
+..."""
 
 # Inference
 inputs = tokenizer(input_text, return_tensors="pt")
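The card states that the model's output is strictly a single tile with no prefix such as "discard". A tool integration might want to check that contract before acting on the reply; the sketch below assumes a simplified-Chinese tile vocabulary (matching the prompt format shown above), which is not specified in this excerpt.

```python
import re

# Hypothetical validator: accepts "1"-"9" of the three suits 万/筒/索
# plus the seven honor tiles in simplified Chinese. The exact notation
# the model emits is an assumption.
TILE_RE = re.compile(r"[1-9][万筒索]|[东南西北白发中]")

def is_single_tile(reply: str) -> bool:
    """True only when the reply is exactly one tile, nothing else."""
    return TILE_RE.fullmatch(reply.strip()) is not None
```

A reply like `5万` passes, while `打5万` (with a "discard" prefix) or multi-tile text fails.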
@@ -184,10 +190,10 @@ A total of `192000` samples were used, with no general instruction data or self-
 
 Inference parameters: Temperature=0.1, Top_P=0.1
 
-**Metrics Explanation:**
+**Metrics explanation**:
 
 - Score: Max 500 points (1 point per correct sample, 0 for incorrect)
-- Full-match Rate: Samples where all 3 tests matched the dataset
-- Zero-score Rate: Samples where all 3 tests disagreed with the dataset
+- Full-match rate: Samples where all 3 tests matched the dataset
+- Zero-score rate: Samples where all 3 tests disagreed with the dataset
 
 #### Tile-Efficiency Test
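The two rate metrics above have clear definitions (the exact per-sample scoring of the 500-point total is not fully specified in this excerpt, so it is omitted). A sketch of how they could be computed, assuming each sample is inferred 3 times and each run is compared against the dataset's labelled discard:

```python
# results[i] holds 3 booleans for sample i (True = run matched dataset).
def summarize(results):
    n = len(results)
    full_match_rate = sum(all(runs) for runs in results) / n
    zero_score_rate = sum(not any(runs) for runs in results) / n
    return full_match_rate, zero_score_rate

# Two toy samples: one matched on all 3 runs, one missed on all 3.
fm, zs = summarize([[True, True, True], [False, False, False]])
```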
@@ -238,7 +244,7 @@ Inference parameters: Temperature=0.6, Top_P=0.95
 
 ## License
 
-This model follows the Apache License 2.0.
+This model is licensed under Apache License 2.0.
 
 The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.