docs: add English model card README_en.md

- Preserve the Hugging Face YAML metadata
- Add a complete English model description
- Cover usage, evaluation results, and the dataset citation
- Keep the content aligned with the Chinese README.md
TTDXQ 2026-03-15 23:40:19 +08:00
parent 6d03adb1ae
commit ad278ac7b0

@@ -4,6 +4,7 @@ datasets:
 - pjura/mahjong_board_states
 language:
 - zh
+- en
 base_model:
 - unsloth/Qwen3-4B-Instruct-2507
 tags:
@@ -19,6 +20,8 @@ pipeline_tag: text-generation
 # Qwen3-4B-Instruct-2507-mahjong-alpha
 
+[中文](./README.md)
+
 `Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
 It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
@@ -28,10 +31,10 @@ The current version is mainly intended for tool integration. The output is a sin
 ## Model Features
 - **Task**: 4-player Riichi Mahjong discard recommendation
-- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
+- **Base model**: `unsloth/Qwen3-4B-Instruct-2507`
 - **Fine-tuning**: `QLoRA`
-- **Training Framework**: `Unsloth`
-- **Release Format**: `GGUF (F16)`
+- **Training framework**: `Unsloth`
+- **Release format**: `GGUF (F16)`
 - **Inference**: `llama.cpp`
 - **Maintainer**: `TTDXQ`
@@ -90,9 +93,9 @@ The output is strictly a single tile text without any prefix like "discard" and
 ```
-## How to Use
-### Inference with llama.cpp
+## Usage
+### llama.cpp Inference
 ```bash
 llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
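Once `llama-server` is running, it can be queried over llama.cpp's OpenAI-compatible HTTP API. A minimal sketch of building such a request body (the prompt placeholder, `max_tokens` value, and sampling settings are illustrative assumptions, chosen to mirror the Temperature=0.1 / Top_P=0.1 used in the evaluation section of this card):

```python
import json

# Illustrative request body for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint; the prompt text is a placeholder.
payload = {
    "messages": [
        {"role": "user", "content": "[情景分析]\n..."},  # board-state prompt
    ],
    "temperature": 0.1,
    "top_p": 0.1,
    "max_tokens": 8,  # the model emits a single tile, so a few tokens suffice
}
body = json.dumps(payload, ensure_ascii=False)
print(body)
```

Send `body` with any HTTP client to `http://localhost:8080/v1/chat/completions`; the single-tile answer appears in the first choice's message content.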
@@ -111,7 +114,10 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 # Prepare input
-input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。\n..."
+input_text = """[情景分析]
+- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
+- 状态: 当前排名 1/4 (与一位差 0)。
+..."""
 # Inference
 inputs = tokenizer(input_text, return_tensors="pt")
@@ -184,10 +190,10 @@ A total of `192000` samples were used, with no general instruction data or self-
 Inference parameters: Temperature=0.1, Top_P=0.1
-**Metrics Explanation:**
+**Metrics explanation**:
 - Score: Max 500 points (1 point per correct sample, 0 for incorrect)
-- Full-match Rate: Samples where all 3 tests matched the dataset
-- Zero-score Rate: Samples where all 3 tests disagreed with the dataset
+- Full-match rate: Samples where all 3 tests matched the dataset
+- Zero-score rate: Samples where all 3 tests disagreed with the dataset
 #### Tile-Efficiency Test
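The full-match and zero-score rates defined in the metrics explanation can be sketched as a small computation over per-sample test results; the `results` data below is purely illustrative (three boolean match flags per sample), not taken from the evaluation:

```python
# Each entry: did each of the 3 test runs for one sample match the dataset?
results = [
    [True, True, True],     # full match
    [True, False, True],
    [False, False, False],  # zero score
    [True, True, True],     # full match
]

# Full-match rate: fraction of samples where all 3 runs matched.
full_match_rate = sum(all(r) for r in results) / len(results)
# Zero-score rate: fraction of samples where no run matched.
zero_score_rate = sum(not any(r) for r in results) / len(results)

print(full_match_rate, zero_score_rate)  # → 0.5 0.25
```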
@@ -238,7 +244,7 @@ Inference parameters: Temperature=0.6, Top_P=0.95
 ## License
-This model follows the Apache License 2.0.
+This model is licensed under Apache License 2.0.
 The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.