Diagnosing the Problem: Automated Metrics for Evaluating AI-Generated Medical Information
As more people turn to AI for health information, its use carries the potential to improve health literacy and access, but also the risk of spreading misinformation and perpetuating disparities. Diagnosing the Problem introduces a generalizable, metric-based framework for evaluating the quality and safety of AI-generated medical communication by asking: what makes a response readable, helpful, and accurate? To operationalize these metrics, it also includes a Python-based tool that automates scoring and generates detailed, user-friendly reports, making it easier to assess and improve how AI communicates medical information.
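To make the idea of an automated readability metric concrete, here is a minimal sketch of one possible score. It assumes the standard Flesch Reading Ease formula and a rough vowel-group syllable heuristic; the description above does not say which readability measure the tool uses, so the function names `count_syllables` and `flesch_reading_ease` are hypothetical and are not taken from the project.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate the syllable count of a word (heuristic, not dictionary-based)."""
    lowered = word.lower()
    # Each run of consecutive vowels is treated as one syllable.
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Discount a trailing silent "e" ("take"), but keep "-le" endings ("table").
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch Reading Ease score; higher values mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    plain = "Take this pill with food. Drink water."
    dense = "Administration of the medication concomitantly with alimentation is recommended."
    print(f"plain: {flesch_reading_ease(plain):.1f}")
    print(f"dense: {flesch_reading_ease(dense):.1f}")
```

A score like this covers only the "readable" dimension. Helpfulness and accuracy would need separate measures, for example comparison against vetted reference answers, which this sketch does not attempt.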