Pixels to Pathways
This project explores using a Vision-Language Model (VLM) with in-context learning to enhance strategic planning in Pico Park, a cooperative multiplayer puzzle game. Without requiring fine-tuning, the VLM will analyze game visuals and generate strategic guidance to improve coordination for smaller agents, whether AI-driven or human-controlled. The approach involves curating representative gameplay scenarios, designing effective prompts, and leveraging SOTA visual grounding capabilities of VLMs to adaptively solve complex, multi-agent cooperative tasks.
Intern: Sunay Raval
Mentors: Shayne Biagi and Derek Zhang (FPS)