With the rapid development of multi-modal foundation models and the pursuit of artificial general intelligence (AGI), there is a growing need for corresponding evaluation systems. Systematic AGI evaluation requires tasks that span a wide range of ability dimensions and difficulty levels. However, although many benchmarks exist, the field still lacks a system for quantifying which abilities a task requires and how difficult that task is. Here, we take the visual domain as a starting point and propose an explainable system for task ability decomposition and difficulty level quantification of vision (TADDL-V). TADDL-V uses large language models to decompose a given task into the visual abilities it requires, and it leverages statistical data to establish a mapping between ability sets and task difficulty levels. The estimated ability masses align with human intuition, and TADDL-V's task difficulty estimates are empirically validated against aggregated human comparisons of task difficulty. Furthermore, we propose AGI-V70, an AGI visual evaluation task set of 70 composite visual tasks that combine multiple visual abilities and span a broad spectrum of difficulty levels. Overall, TADDL-V serves as a prototype for ability decomposition and task difficulty quantification, both of which are essential for future AGI evaluation.