YKS Uniform

A balanced multimodal benchmark covering the Turkish high school curriculum, with equal representation across all topics.

Bilim Ağacı

Introduction

Large Language Models (LLMs) and Large Multimodal Models (LMMs) demonstrate impressive problem-solving skills across many tasks and domains. However, their ability to reason over structured, curriculum-based educational questions—particularly in the context of Turkish high school entrance examinations—has not been systematically studied.

To address this gap, we introduce YKS Uniform, a balanced multimodal benchmark covering the Turkish high school curriculum with equal representation across all topics. By sampling six questions per topic, we constructed a dataset of 1,854 multimodal questions spanning both TYT and AYT exams. These questions require deep reasoning over text, diagrams, and exam-style contexts.

Using this benchmark, we conducted a comprehensive evaluation of 10 open-weight and proprietary models. Our results highlight both the strengths and limitations of current models in handling exam-style reasoning tasks. The best-performing system, Gemini-2.5-Flash, achieved an overall accuracy of %84.7, substantially higher than open-weight alternatives but still leaving a measurable gap to human-level performance.

Contributions of YKS Uniform:

YKS Uniform provides the first systematic lens into multimodal reasoning across the entire Turkish high school curriculum. We hope it will serve as a foundation for future research in educational AI, curriculum-grounded reasoning, and robust model evaluation in multimodal contexts.

Collage of TYT and AYT exam-style questions—including mathematics, physics, and reading comprehension—that were misclassified or unanswered by vision-language models.
Examples of challenging YKS Uniform questions that leading vision-language models (VLLMs) failed to solve. Each example highlights the reasoning difficulty across different subjects.

Dataset


The YKS Uniform dataset contains 1,854 multimodal questions sampled evenly across all topics in the Turkish high school curriculum. It is designed exclusively as a test benchmark.

Updates

Leaderboard


TYT
AYT
Rank Model Date All Turkish History Geography Philosophy Math Physics Chemistry Biology Literature History Geography Philosophy Math Physics Chemistry Biology
1 Gemini 2.5 Flash 🥇 2025-07-17 84.7 77.8 84.2 86.5 93.3 85.1 73.1 84.7 86.7 84.8 90.4 91.7 86.9 85.4 72 88.2 87.7
2 Gemini 2.0 Flash 🥈 2025-02-05 79.2 67.5 84.2 84.9 90 72.6 62 79.2 75.6 82.6 87.8 89.6 79.8 85.4 70.5 75.5 81.6
3 METU DLM QMSA 🥉 2025-07-31 78.6 66.7 91.2 90.5 94.4 58.3 70.4 81.9 81.1 77.3 83.3 90.6 85.7 73.6 67.4 81.4 83.3
4 OpenAI o3 2025-04-16 74.5 65.9 82.5 81 75.6 81 53.7 72.2 66.7 72 78.8 79.2 75 84 62.9 80.4 74.6
5 OpenAI GPT-5 2025-08-07 73.2 64.3 78.1 75.4 73.3 84.5 53.7 73.6 68.9 70.5 80.1 80.2 71.4 86.1 55.3 71.6 75.4
6 GLM 4.5V 2025-08-11 69.4 46 65.8 63.5 77.8 76.2 56.5 81.9 74.4 59.1 65.4 82.3 83.3 80.6 59.8 79.4 73.7
7 OpenAI o1 2024-12-05 68.8 62.7 71.1 72.2 80 70.2 54.6 79.2 61.1 68.9 73.1 71.9 65.5 79.2 57.6 64.7 68.4
8 Gemini 1.5 Flash 2024-09-24 67.2 51.6 70.2 74.6 86.7 51.8 57.4 73.6 61.1 60.6 75 85.4 84.5 68.8 53 66.7 73.7
9 Gemma 3 27B 2025-03-10 63.1 44.4 76.3 66.7 82.2 50.6 48.1 69.4 72.2 62.1 67.9 76 77.4 64.6 41.7 59.8 71.1
10 Qwen2.5 VL 32B 2025-02-20 62.5 43.7 61.4 65.1 81.1 56 60.2 70.8 67.8 49.2 66.7 70.8 72.6 68.1 53.8 62.7 66.7
11 Claude Sonnet 4 2025-05-22 60.4 46.8 64.9 64.3 71.1 56 53.7 59.7 56.7 59.8 64.1 61.5 70.2 72.2 52.3 63.7 52.6


🚨 Submission Instructions
  • Submit a single .jsonl file.
  • Lines must align 1:1 with the dataset indices (same order & count).
  • Each line should include a full model_dump from an OpenAI-compatible API call (Gemini / vLLM / SGLang 's OpenAI wrappers accepted).
  • Include your model’s final answer in predicted_answer. Also include the function you used to extract predicted_answer from the generated solution.
  • Report zero-shot solutions only.
  • Use temperature=0 (if applicable) for reproducibility.

Example JSONL line

{
        "predicted_answer": "A",              // one of: A,B,C,D,E or null
        "solution_context": { ... },          // openai_wrapper_response.model_dump()
        "exam": "TYT Physics",                // optional: dataset fields (except image)
        "topic": "Heat and Temperature"
      }

Email your .jsonl file to this address.

Team

Portrait image of Åžeyda Ertekin

Åžeyda Ertekin
Assoc. Prof. • METU Computer Engineering

Website · LinkedIn

BibTeX

@misc{yksuniform2025,
  title        = {YKS Uniform: A Balanced Multimodal Benchmark Covering the Turkish High School Curriculum},
  author       = {Sert, Egemen and Ertekin, Åžeyda},
  year         = {2025},
  howpublished = {\url{https://yks-uniform.github.io/}},
  note         = {Accessed: 2025-08-23}
}