docs: add English model card README_en.md

- Keep the Hugging Face YAML metadata
- Add the complete English model description
- Cover usage, evaluation results, and the dataset citation
- Keep the content aligned with the Chinese README.md
Author: TTDXQ
Date: 2026-03-15 23:40:19 +08:00
parent 6d03adb1ae
commit ad278ac7b0

@@ -4,6 +4,7 @@ datasets:
 - pjura/mahjong_board_states
 language:
 - zh
+- en
 base_model:
 - unsloth/Qwen3-4B-Instruct-2507
 tags:
@@ -19,6 +20,8 @@ pipeline_tag: text-generation
 
 # Qwen3-4B-Instruct-2507-mahjong-alpha
 
+[中文](./README.md)
+
 `Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
 It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
@@ -28,10 +31,10 @@ The current version is mainly intended for tool integration. The output is a sin
 ## Model Features
 
 - **Task**: 4-player Riichi Mahjong discard recommendation
-- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
+- **Base model**: `unsloth/Qwen3-4B-Instruct-2507`
 - **Fine-tuning**: `QLoRA`
-- **Training Framework**: `Unsloth`
-- **Release Format**: `GGUF (F16)`
+- **Training framework**: `Unsloth`
+- **Release format**: `GGUF (F16)`
 - **Inference**: `llama.cpp`
 - **Maintainer**: `TTDXQ`
@@ -90,9 +93,9 @@ The output is strictly a single tile text without any prefix like "discard" and
 ```
 
-## How to Use
+## Usage
 
-### Inference with llama.cpp
+### llama.cpp Inference
 
 ```bash
 llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
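Once the server from the command above is running, a minimal client sketch could look like the following. This is an assumption about a stock `llama-server` setup (it serves an OpenAI-compatible chat endpoint on port 8080 by default); the model name in the payload is illustrative, and the sampling parameters follow the Temperature=0.1, Top_P=0.1 settings used in the card's evaluation.

```python
import json
import urllib.request

# Example prompt from the model card; the trailing "..." is a
# truncation in the original card, not a complete prompt.
prompt = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。\n..."

# Sampling settings mirror the card's evaluation parameters.
payload = {
    "model": "Qwen3-4B-Instruct-2507-mahjong-alpha",
    "messages": [{"role": "user", "content": prompt}],
    "temperature": 0.1,
    "top_p": 0.1,
}

def ask_server(url="http://127.0.0.1:8080/v1/chat/completions"):
    """POST the request; requires the llama-server above to be running."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The response content should then be the single discard tile described above.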
@@ -111,7 +114,10 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 
 # Prepare input
-input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。\n..."
+input_text = """[情景分析]
+- 牌局: 东一局,你是庄家 (第1巡牌墙余69张)。
+- 状态: 当前排名 1/4 (与一位差 0)。
+..."""
 
 # Inference
 inputs = tokenizer(input_text, return_tensors="pt")
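The card states that the model's output is strictly a single tile with no prefix such as "discard". A tool integration might want to check that contract before acting on the reply; the sketch below assumes a simplified-Chinese tile vocabulary (matching the prompt format shown above), which is not specified in this excerpt.

```python
import re

# Hypothetical validator: accepts "1"-"9" of the three suits 万/筒/索
# plus the seven honor tiles in simplified Chinese. The exact notation
# the model emits is an assumption.
TILE_RE = re.compile(r"[1-9][万筒索]|[东南西北白发中]")

def is_single_tile(reply: str) -> bool:
    """True only when the reply is exactly one tile, nothing else."""
    return TILE_RE.fullmatch(reply.strip()) is not None
```

A reply like `5万` passes, while `打5万` (with a "discard" prefix) or multi-tile text fails.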
@@ -184,10 +190,10 @@ A total of `192000` samples were used, with no general instruction data or self-
 
 Inference parameters: Temperature=0.1, Top_P=0.1
 
-**Metrics Explanation:**
+**Metrics explanation**:
 
 - Score: Max 500 points (1 point per correct sample, 0 for incorrect)
-- Full-match Rate: Samples where all 3 tests matched the dataset
-- Zero-score Rate: Samples where all 3 tests disagreed with the dataset
+- Full-match rate: Samples where all 3 tests matched the dataset
+- Zero-score rate: Samples where all 3 tests disagreed with the dataset
 
 #### Tile-Efficiency Test
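The two rate metrics above have clear definitions (the exact per-sample scoring of the 500-point total is not fully specified in this excerpt, so it is omitted). A sketch of how they could be computed, assuming each sample is inferred 3 times and each run is compared against the dataset's labelled discard:

```python
# results[i] holds 3 booleans for sample i (True = run matched dataset).
def summarize(results):
    n = len(results)
    full_match_rate = sum(all(runs) for runs in results) / n
    zero_score_rate = sum(not any(runs) for runs in results) / n
    return full_match_rate, zero_score_rate

# Two toy samples: one matched on all 3 runs, one missed on all 3.
fm, zs = summarize([[True, True, True], [False, False, False]])
```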
@@ -238,7 +244,7 @@ Inference parameters: Temperature=0.6, Top_P=0.95
 
 ## License
 
-This model follows the Apache License 2.0.
+This model is licensed under Apache License 2.0.
 
 The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.