# Ko-PIQA: Korean Physical Commonsense Reasoning Dataset

## 📖 Dataset Overview

Ko-PIQA is a Korean physical commonsense reasoning dataset designed to complement English-centric benchmarks such as PIQA and to include culturally grounded physical reasoning questions.

- Total items: 441
- Culturally grounded items: 87 (19.7%), e.g., kimchi storage, hanbok care, ondol heating
- Format: PIQA-style binary choice (`solution0`/`solution1`)
- Goal: Evaluate the physical reasoning capabilities of Korean LLMs
## 📊 Data Fields

| Field | Type | Description |
|---|---|---|
| `prompt` | string | The goal or question |
| `solution0` | string | Candidate answer A |
| `solution1` | string | Candidate answer B |
| `label` | int | Correct answer index (0 or 1) |
| `cultural` | int/null | 1 if culturally grounded, otherwise null (see the filter sketch below) |
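Given these field semantics, the culturally grounded subset can be selected with a standard `datasets` filter. A minimal sketch, assuming a single `train` split (as in the Usage section below) and that null values load as `None`:

```python
from datasets import load_dataset

ds = load_dataset("HAERAE-HUB/Ko-PIQA")

# cultural is 1 for culturally grounded items and null (None) otherwise.
cultural_subset = ds["train"].filter(lambda ex: ex["cultural"] == 1)
print(len(cultural_subset))  # expected: 87
```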
## 🔎 Source & Filtering Pipeline

- Source: 3.01M Korean Q&A pairs from Naver Knowledge iN (collected through May 2025)
- Step 1: Filtered for PIQA-style questions using Qwen3-4B, Qwen3-32B, and HCX-14B → 11,553 candidates
- Step 2: Sampled 600 general and 158 cultural questions
- Step 3: Refined questions and generated distractors using GPT-4o
- Step 4: Two native Korean speakers validated and filtered the questions → 471 items
- Step 5: Deduplicated with KoSentenceBERT (cosine similarity > 0.85) → final 441 items (see the sketch below)
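Step 5 can be reproduced in spirit with any Korean sentence-embedding model. The sketch below uses the `sentence-transformers` library with a placeholder checkpoint (`jhgan/ko-sroberta-multitask`); this is an assumption for illustration, not necessarily the exact KoSentenceBERT variant used by the authors.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder Korean sentence-embedding checkpoint (assumption, see note above).
model = SentenceTransformer("jhgan/ko-sroberta-multitask")

def deduplicate(prompts, threshold=0.85):
    """Greedy near-duplicate removal: keep an item only if its cosine similarity
    to every previously kept item is at or below the threshold."""
    embeddings = model.encode(prompts, convert_to_tensor=True)
    kept = []
    for i in range(len(prompts)):
        if all(util.cos_sim(embeddings[i], embeddings[j]).item() <= threshold for j in kept):
            kept.append(i)
    return [prompts[i] for i in kept]
```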
## 💡 Example

```json
{
  "prompt": "김치찌개를 끓일 때 묵은지의 신맛을 중화시키면서도 깊은 맛을 내려면?",
  "solution0": "설탕을 한 스푼 넣고 물을 부은 후 중불에서 5분간 끓인다.",
  "solution1": "설탕을 한 스푼 넣고 중불에서 5분간 먼저 볶은 후 물을 붓는다.",
  "label": 1,
  "cultural": 1
}
```

English gloss: the prompt asks how to neutralize the sourness of aged kimchi while still getting a deep flavor when making kimchi stew. `solution0` adds a spoonful of sugar, pours in water, and simmers over medium heat for 5 minutes; the correct `solution1` first stir-fries the kimchi with a spoonful of sugar over medium heat for 5 minutes and then adds the water.
## 💻 Usage

```python
from datasets import load_dataset

ds = load_dataset("HAERAE-HUB/Ko-PIQA")
print(ds['train'][0])
```
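For a simple baseline, one common way to evaluate a causal LM on this binary-choice format is to score each candidate by the total log-likelihood the model assigns to it given the prompt and predict the higher-scoring one. The sketch below follows that recipe; the checkpoint name is a placeholder rather than a baseline from the paper, and the prompt/solution token boundary is handled approximately.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder checkpoint (assumption)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def score(prompt: str, solution: str) -> float:
    """Sum of log-probabilities the model assigns to the solution tokens."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + solution, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs of each token given its prefix; keep only the solution span.
    # Note: the boundary is approximate, since tokenizing the concatenation
    # may differ slightly from tokenizing the prompt alone.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, prompt_len - 1:].sum().item()

ds = load_dataset("HAERAE-HUB/Ko-PIQA")["train"]
correct = 0
for ex in ds:
    pred = 0 if score(ex["prompt"], ex["solution0"]) >= score(ex["prompt"], ex["solution1"]) else 1
    correct += int(pred == ex["label"])
print(f"Accuracy: {correct / len(ds):.3f}")
```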
## 📌 Citation

```bibtex
@misc{choi2025kopiqakoreanphysicalcommonsense,
  title={Ko-PIQA: A Korean Physical Commonsense Reasoning Dataset with Cultural Context},
  author={Dasol Choi and Jungwhan Kim and Guijin Son},
  year={2025},
  eprint={2509.11303},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.11303},
}
```