Research Project - vision-language image generation models have enabled the creation of visually realistic and semantically rich images

2027 · 2027 Competition

School: School of Computer and Information Sciences

Category: Research · Primary

Project Overview

One Liner: Research Project - vision-language image generation models have enabled the creation of visually realistic and semantically rich images

Abstract

Recent advances in vision-language image generation models have enabled the creation of
visually realistic and semantically rich images; however, achieving fine-grained control over
diagram structure remains a significant challenge. This limitation is particularly problematic in
educational settings, where diagrams often require specific topological relationships,
developmental learning-stage representations, or intentionally embedded misconceptions for
assessment and instruction. The challenge is further amplified by reasoning-native image
generation systems, which actively correct perceived errors during generation and reduce
user control over intended outcomes. This thesis proposes SkillDraw, a neuro-symbolic
framework for controllable educational diagram generation through the composition of
reusable visual skills. Grounded in the Next Generation Science Standards (NGSS),
SkillDraw decomposes scientific knowledge into Disciplinary Core Ideas, Science and
Engineering Practices, and Crosscutting Concepts, which are combined to form structured
generation specifications. The framework incorporates a closed-loop verification and repair
mechanism that evaluates structural correctness, misconception realization, and
competency leakage, enabling targeted refinement of generated diagrams. Through
experiments across multiple image generation backbones, this research investigates
controllability, skill reusability, correction resistance, and structural diversity. The expected
outcome is a robust framework for generating pedagogically meaningful scientific diagrams
with precise structural control, contributing to both controllable image generation and
AI-supported educational content creation.
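
The closed-loop verification and repair mechanism described above can be illustrated with a minimal sketch. This is not the SkillDraw implementation: every name here (`GenerationSpec`, `verify`, `generate_with_repair`, `stub_backbone`) is hypothetical, a diagram is reduced to a set of directed edges between labeled elements, and the backbone is a stub that imitates a model "auto-correcting" toward the textbook diagram. The example spec assumes a common misconception (plants take their food from soil) purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]   # directed relation between two diagram elements
Hint = Tuple[str, Edge]  # ("add" | "remove", edge), a targeted repair instruction


@dataclass(frozen=True)
class GenerationSpec:
    """Hypothetical structured specification composed from NGSS dimensions."""
    dci: str  # Disciplinary Core Idea
    sep: str  # Science and Engineering Practice
    ccc: str  # Crosscutting Concept
    required_edges: FrozenSet[Edge]
    misconception_edges: FrozenSet[Edge] = frozenset()  # must appear on purpose
    forbidden_edges: FrozenSet[Edge] = frozenset()      # would leak the competency


@dataclass
class Report:
    missing: Set[Edge]     # structural correctness failures
    unrealized: Set[Edge]  # intended misconceptions that are absent
    leaked: Set[Edge]      # correct relations that should have been withheld

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unrealized or self.leaked)


def verify(spec: GenerationSpec, edges: FrozenSet[Edge]) -> Report:
    """Compare the generated structure against the specification."""
    return Report(
        missing=set(spec.required_edges - edges),
        unrealized=set(spec.misconception_edges - edges),
        leaked=set(spec.forbidden_edges & edges),
    )


def generate_with_repair(
    spec: GenerationSpec,
    generate: Callable[[GenerationSpec, List[Hint]], FrozenSet[Edge]],
    max_rounds: int = 3,
):
    """Generate, verify, and feed targeted repair hints back until the spec holds."""
    hints: List[Hint] = []
    for round_no in range(1, max_rounds + 1):
        edges = generate(spec, hints)
        report = verify(spec, edges)
        if report.ok:
            break
        hints = [("add", e) for e in sorted(report.missing | report.unrealized)]
        hints += [("remove", e) for e in sorted(report.leaked)]
    return edges, report, round_no


def stub_backbone(spec: GenerationSpec, hints: List[Hint]) -> FrozenSet[Edge]:
    """Stand-in for an image model: defaults to the 'corrected' textbook diagram."""
    edges = set(spec.required_edges) | set(spec.forbidden_edges)
    for action, edge in hints:
        if action == "add":
            edges.add(edge)
        else:
            edges.discard(edge)
    return frozenset(edges)


spec = GenerationSpec(
    dci="LS1.C Organization for Matter and Energy Flow in Organisms",
    sep="Developing and Using Models",
    ccc="Energy and Matter",
    required_edges=frozenset({("sunlight", "leaf")}),
    misconception_edges=frozenset({("soil", "food")}),
    forbidden_edges=frozenset({("leaf", "food")}),
)
edges, report, rounds = generate_with_repair(spec, stub_backbone)
print(rounds, sorted(edges))
```

In this sketch the first round fails verification because the stub drops the intended misconception and includes the correct relation; the repair hints then restore the requested structure on the second round. A real system would need to extract the edge set from a rendered image, which this sketch does not attempt.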
