There has been widespread optimism that artificial intelligence (AI) applications can transform medical care, improving patient treatment and reducing administrative burdens for hospitals and clinicians. For patients, a healthcare system augmented by AI could mean shorter wait times, thanks to optimized scheduling and resource allocation, and higher-quality diagnostic and treatment decisions, thanks to AI-driven capabilities such as anomaly detection (e.g., in radiology), risk stratification, and personalized care. For clinicians, AI tools promise to reduce the time spent on the administrative tasks linked to burnout.
However, there is also a legacy of challenges when new technologies, and specifically AI-enabled software, are introduced in clinical settings. Past research has highlighted barriers to adoption when the clinical workforce is insufficiently involved, as well as trust and safety challenges when experts are prompted to interact with new technological systems. Another major challenge is integration with legacy information systems, specifically electronic health record (EHR) systems, which hospitals have invested in making the hub of their IT infrastructure. In fact, past attempts to disseminate new technologies into healthcare systems have often produced unintended consequences, such as an increase in the administrative burden on physicians and clinical teams. Moreover, despite several successful proofs of concept and prototype pilots, the number of large-scale field implementations of AI-enabled software within healthcare systems remains relatively low.
As hospitals begin to experiment with generative AI (GAI) tools in clinical settings, it is important to identify the known challenges that might stand in the way of achieving better patient care and lower administrative burdens and, where possible, to document practices that can mitigate those challenges. It is also important to identify areas in which further academic and industry research is needed to better understand these challenges and potential mitigation strategies.
The discussion around the implementation of AI-enabled software within healthcare systems should distinguish between traditional AI models and algorithms, hereafter called Narrow AI (NAI), and the more recent GAI models and algorithms. Although both are, at their core, prediction algorithms, the two classes of tools have different technical characteristics, which lend themselves to different use cases, different user experiences, and different implications for organizations.
There are at least three major differences between the two. First, NAI models and algorithms are typically built for a specific prediction task (e.g., cancer detection on mammograms). In contrast, GAI tools are typically based on large language models (LLMs) and can perform a wide variety of tasks, such as search, summarization, and text generation (e.g., summarizing patient visit notes). Second, NAI models and algorithms are typically developed from a well-defined, labeled dataset specific to the target prediction task, whereas GAI tools, in keeping with their broad functionality, require much larger and more heterogeneous datasets. Third, unlike the output of NAI models, which is typically highly structured, the output of GAI models is often complex and unstructured (e.g., newly created text); the sketch at the end of this section illustrates this contrast.

We examine the challenges facing NAI and GAI applications in healthcare systems from three perspectives: technical, organizational, and cognitive. Our analysis draws on past studies of NAI applications as well as public reporting, interviews, and early observations of GAI applications in large research hospital settings.
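To make the third difference concrete, consider the following minimal Python sketch. It is purely illustrative: the function names, the MammogramFinding record, and the toy scoring logic are hypothetical stand-ins of our own, not any vendor's API or a real clinical model. The point is only that an NAI classifier returns a fixed, schema-conforming record, while a GAI summarizer returns free-form text.

```python
from dataclasses import dataclass

# --- Narrow AI: one well-defined prediction task, structured output ---
@dataclass
class MammogramFinding:
    malignancy_probability: float  # a single, well-defined prediction
    flagged_for_review: bool       # schema-conforming field, easy to store in an EHR

def narrow_ai_predict(image_features: list[float]) -> MammogramFinding:
    """Stand-in for a task-specific classifier trained on labeled mammograms."""
    score = max(0.0, min(1.0, sum(image_features) / len(image_features)))
    return MammogramFinding(malignancy_probability=score,
                            flagged_for_review=score > 0.5)

# --- Generative AI: broad functionality, unstructured free-text output ---
def generative_ai_summarize(visit_transcript: str) -> str:
    """Stand-in for an LLM call; a real system would invoke a hosted model."""
    # The output is newly created text whose form is not fixed in advance,
    # so it must be parsed, validated, or reviewed before clinical use.
    return "Draft summary (needs clinician review): " + visit_transcript[:60] + "..."

if __name__ == "__main__":
    print(narrow_ai_predict([0.2, 0.7, 0.9]))   # structured record
    print(generative_ai_summarize("Patient reports mild chest pain after exercise."))
```

One design implication of this contrast is that NAI output can typically flow directly into structured EHR fields, whereas GAI output generally requires an additional validation or human-review step before it can be used clinically.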