docs: add English model card README_en.md

- Keep the Hugging Face YAML metadata
- Add a complete English model description
- Cover usage, evaluation results, and the dataset citation
- Align content with the Chinese README.md
This commit is contained in: parent 6d03adb1ae, commit ad278ac7b0
README_en.md (26 changes)
```diff
@@ -4,6 +4,7 @@ datasets:
 - pjura/mahjong_board_states
 language:
 - zh
+- en
 base_model:
 - unsloth/Qwen3-4B-Instruct-2507
 tags:
```
```diff
@@ -19,6 +20,8 @@ pipeline_tag: text-generation
 
 # Qwen3-4B-Instruct-2507-mahjong-alpha
 
+[中文](./README.md)
+
 `Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
 
 It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
```
```diff
@@ -28,10 +31,10 @@ The current version is mainly intended for tool integration. The output is a sin
 ## Model Features
 
 - **Task**: 4-player Riichi Mahjong discard recommendation
-- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
+- **Base model**: `unsloth/Qwen3-4B-Instruct-2507`
 - **Fine-tuning**: `QLoRA`
-- **Training Framework**: `Unsloth`
-- **Release Format**: `GGUF (F16)`
+- **Training framework**: `Unsloth`
+- **Release format**: `GGUF (F16)`
 - **Inference**: `llama.cpp`
 - **Maintainer**: `TTDXQ`
 
```
````diff
@@ -90,9 +93,9 @@ The output is strictly a single tile text without any prefix like "discard" and
 白
 ```
 
-## How to Use
+## Usage
 
-### Inference with llama.cpp
+### llama.cpp Inference
 
 ```bash
 llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
````
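The hunk above keeps the `llama-server` command as the recommended inference path. A minimal client sketch for tool integration follows, assuming llama-server's OpenAI-compatible `/v1/chat/completions` endpoint on its default port 8080; the helper names and the `max_tokens` choice are illustrative, not from the model card:

```python
import json
import urllib.request

def build_payload(situation: str) -> dict:
    # Request body for the OpenAI-compatible chat endpoint; temperature and
    # top_p mirror the deterministic evaluation settings from the model card.
    return {
        "messages": [{"role": "user", "content": situation}],
        "temperature": 0.1,
        "top_p": 0.1,
        "max_tokens": 8,  # the model should emit only a single tile string
    }

def recommend_discard(situation: str, host: str = "http://127.0.0.1:8080") -> str:
    # POST the board-state prompt to a locally running llama-server and
    # return the model's discard recommendation as plain text.
    req = urllib.request.Request(
        f"{host}/v1/chat/completions",
        data=json.dumps(build_payload(situation)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"].strip()
```

With the server from the hunk above running, `recommend_discard("[情景分析]\n...")` would return a single tile such as `白`.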
````diff
@@ -111,7 +114,10 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 
 # Prepare input
-input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。\n..."
+input_text = """[情景分析]
+- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
+- 状态: 当前排名 1/4 (与一位差 0)。
+..."""
 
 # Inference
 inputs = tokenizer(input_text, return_tensors="pt")
````
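Since the card states the output is strictly a single tile with no prefix, a small validator helps when wiring either snippet into a tool. The tile vocabulary below is an assumption based on standard Chinese Riichi notation (1-9 万/筒/索 plus the seven honor tiles); adjust it to the dataset's actual output strings:

```python
# Assumed 34-tile vocabulary in Chinese Riichi notation.
TILES = (
    [f"{n}{suit}" for suit in ("万", "筒", "索") for n in range(1, 10)]
    + ["东", "南", "西", "北", "白", "发", "中"]
)

def parse_discard(raw: str) -> str:
    # Reject anything that is not exactly one known tile string,
    # e.g. outputs with a "discard" prefix or trailing commentary.
    tile = raw.strip()
    if tile not in TILES:
        raise ValueError(f"unexpected model output: {raw!r}")
    return tile
```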
```diff
@@ -184,10 +190,10 @@ A total of `192000` samples were used, with no general instruction data or self-
 
 Inference parameters: Temperature=0.1, Top_P=0.1
 
-**Metrics Explanation:**
+**Metrics explanation**:
 - Score: Max 500 points (1 point per correct sample, 0 for incorrect)
-- Full-match Rate: Samples where all 3 tests matched the dataset
-- Zero-score Rate: Samples where all 3 tests disagreed with the dataset
+- Full-match rate: Samples where all 3 tests matched the dataset
+- Zero-score rate: Samples where all 3 tests disagreed with the dataset
 
 #### Tile-Efficiency Test
 
```
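The metrics in the hunk above can be sketched as an aggregation over 500 samples, each tested 3 times. The full-match and zero-score definitions come from the card; the card does not restate exactly how the 3 runs collapse into the 0/1 per-sample score, so this sketch assumes a majority-vote rule:

```python
def summarize(results: list[tuple[bool, bool, bool]]) -> dict:
    # `results[i]` holds the per-run correctness of sample i across 3 tests.
    n = len(results)
    # Assumed rule: a sample scores 1 point when a majority of its runs match.
    score = sum(1 for runs in results if sum(runs) >= 2)
    # Full match: all 3 runs agree with the dataset; zero score: none do.
    full = sum(1 for runs in results if all(runs))
    zero = sum(1 for runs in results if not any(runs))
    return {
        "score": score,  # max equals the sample count (500 in the card)
        "full_match_rate": full / n,
        "zero_score_rate": zero / n,
    }
```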
```diff
@@ -238,7 +244,7 @@ Inference parameters: Temperature=0.6, Top_P=0.95
 
 ## License
 
-This model follows the Apache License 2.0.
+This model is licensed under Apache License 2.0.
 
 The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`. Please preserve the required attribution and citation when using it.
 
```