[ScienceChina 2025] Ability Decomposition and Difficulty Quantification of Visual Tasks: Towards Systematic Evaluations of Artificial General Intelligence

Abstract

With the rapid development of multi-modal foundation models and the pursuit of artificial general intelligence (AGI), there is a growing need for corresponding evaluation systems. Systematic AGI evaluation requires tasks that encompass a wide range of ability dimensions and difficulty levels. However, although many benchmarks exist, the field still lacks a quantification system to assess ability decompositions or difficulty levels. Here, we took the visual domain as a starting point and proposed an explainable system for task ability decomposition and difficulty level quantification of vision (TADDL-V). Using large language models, TADDL-V decomposed the visual abilities required for a given task and leveraged statistical data to map between ability sets and task difficulty levels. The estimated ability masses align with human intuition, and TADDL-V’s task difficulty estimates are empirically validated against aggregated human comparisons of task difficulty. Furthermore, we proposed an AGI visual evaluation task set, AGI-V70, comprising 70 composite visual tasks that incorporate visual abilities across a broad spectrum of task difficulties. Overall, TADDL-V serves as a prototype for ability decomposition and task difficulty level quantification, which are essential for future AGI evaluations.

Publication
In Science China Technological Sciences
Zhenliang Zhang
Research Scientist of AI

My research interests include wearable computing, machine learning, cognitive reasoning, and mixed/virtual reality.