diff --git a/README_en.md b/README_en.md
index 5fd5847..46735f8 100644
--- a/README_en.md
+++ b/README_en.md
@@ -4,6 +4,7 @@ datasets:
 - pjura/mahjong_board_states
 language:
 - zh
+- en
 base_model:
 - unsloth/Qwen3-4B-Instruct-2507
 tags:
@@ -19,6 +20,8 @@ pipeline_tag: text-generation
 
 # Qwen3-4B-Instruct-2507-mahjong-alpha
 
+[中文](./README.md)
+
 `Qwen3-4B-Instruct-2507-mahjong-alpha` is a Riichi Mahjong domain model fine-tuned from `unsloth/Qwen3-4B-Instruct-2507` with QLoRA.
 It is designed for 4-player Riichi Mahjong discard recommendation: given round information, hand tiles, calls, visible tiles, tile-efficiency, and defense signals, the model outputs the single best discard tile for the current state.
@@ -28,10 +31,10 @@ The current version is mainly intended for tool integration. The output is a sin
 
 ## Model Features
 
 - **Task**: 4-player Riichi Mahjong discard recommendation
-- **Base Model**: `unsloth/Qwen3-4B-Instruct-2507`
+- **Base model**: `unsloth/Qwen3-4B-Instruct-2507`
 - **Fine-tuning**: `QLoRA`
-- **Training Framework**: `Unsloth`
-- **Release Format**: `GGUF (F16)`
+- **Training framework**: `Unsloth`
+- **Release format**: `GGUF (F16)`
 - **Inference**: `llama.cpp`
 - **Maintainer**: `TTDXQ`
@@ -90,9 +93,9 @@ The output is strictly a single tile text without any prefix like "discard" and
 白
 ```
 
-## How to Use
+## Usage
 
-### Inference with llama.cpp
+### llama.cpp Inference
 
 ```bash
 llama-server -m Qwen3-4B-Instruct-2507-mahjong-alpha.gguf -c 2048
@@ -111,7 +114,10 @@ tokenizer = AutoTokenizer.from_pretrained(
 )
 
 # Prepare input
-input_text = "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。\n..."
+input_text = """[情景分析]
+- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。
+- 状态: 当前排名 1/4 (与一位差 0)。
+..."""
 
 # Inference
 inputs = tokenizer(input_text, return_tensors="pt")
@@ -184,10 +190,10 @@ A total of `192000` samples were used, with no general instruction data or self-
 
 Inference parameters: Temperature=0.1, Top_P=0.1
 
-**Metrics Explanation:**
+**Metrics explanation**:
 
 - Score: Max 500 points (1 point per correct sample, 0 for incorrect)
-- Full-match Rate: Samples where all 3 tests matched the dataset
-- Zero-score Rate: Samples where all 3 tests disagreed with the dataset
+- Full-match rate: Samples where all 3 tests matched the dataset
+- Zero-score rate: Samples where all 3 tests disagreed with the dataset
 
 #### Tile-Efficiency Test
@@ -238,7 +244,7 @@ Inference parameters: Temperature=0.6, Top_P=0.95
 
 ## License
 
-This model follows the Apache License 2.0.
+This model is licensed under Apache License 2.0.
 
 The training data comes from `pjura/mahjong_board_states`, which is licensed under `CC BY 4.0`.
 Please preserve the required attribution and citation when using it.
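As a hypothetical companion to the `llama-server` command shown in the diff, a tool integration could build its request body like this. This is a sketch only: the `/v1/chat/completions` endpoint and port 8080 default are properties of llama.cpp's OpenAI-compatible server, and `build_discard_request` plus its `max_tokens` choice are illustrative assumptions, not part of the README. The sampling parameters mirror the card's tile-efficiency evaluation settings (Temperature=0.1, Top_P=0.1).

```python
import json

# Hypothetical helper: build the JSON body for one discard recommendation
# against a running `llama-server` (OpenAI-compatible /v1/chat/completions,
# port 8080 by default). Names and defaults here are assumptions.
def build_discard_request(scenario: str,
                          temperature: float = 0.1,
                          top_p: float = 0.1) -> str:
    payload = {
        "messages": [{"role": "user", "content": scenario}],
        "temperature": temperature,  # evaluation settings from the model card
        "top_p": top_p,
        "max_tokens": 8,             # the model outputs a single tile string
    }
    # ensure_ascii=False keeps the Chinese prompt readable in the body
    return json.dumps(payload, ensure_ascii=False)

body = build_discard_request(
    "[情景分析]\n- 牌局: 东一局,你是庄家 (第1巡,牌墙余69张)。\n..."
)
# Send with e.g.:
# requests.post("http://127.0.0.1:8080/v1/chat/completions",
#               data=body.encode("utf-8"),
#               headers={"Content-Type": "application/json"})
print(body)
```

The response's single-tile text (e.g. `白`) can then be consumed directly, since the card states the output carries no "discard" prefix or other wrapping.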