$200B Medical Overbilling: Can LLMs & Protecto Provide the Cure?
Medical claim overbilling is estimated to cost the healthcare system upwards of $200 billion annually. Our customer, a large insurance provider, is using Large Language Models (LLMs) to bring greater accuracy to the billing process.

The Problem: Medical Coding Errors and Overbilling
Medical billing is a complex process in which the services provided to patients are translated into standardized codes for insurance claims. This process is prone to errors:
- Upcoding: Billing for a more severe diagnosis than was documented, or a more intensive procedure than was actually performed.
- Unbundling: Billing the components of a procedure separately when they should be billed together under a single comprehensive code.
- Incorrect Coding: Simple mistakes or intentional misrepresentations that lead to inaccurate billing.
These accidental or deliberate errors contribute significantly to overbilling, resulting in inflated costs that burden patients, insurance companies, and the healthcare system as a whole.
How LLMs and Retrieval-Augmented Generation (RAG) Can Help
LLMs such as GPT-4 can analyze and interpret large volumes of text. In medical billing, they can be used to compare clinical notes with the billing codes assigned to them and detect discrepancies. In a retrieval-augmented generation (RAG) setup, the model receives relevant reference cases, retrieved from a database, alongside the claim under review. Here’s how this works:
- Establishing a Ground Truth (Golden Set): The approach starts with a curated dataset of detailed clinical notes written by healthcare providers, covering diagnoses, procedures, medications, and other key details of patient care. Crucially, each note in this dataset must be accurately mapped to its correct billing codes.
- Transforming Clinical Notes into a Vector Database: Each clinical note is converted into an embedding vector and stored in a vector database together with its associated billing codes. This enables efficient similarity comparisons between new clinical notes and the established ground truth.
- Performing Similarity Searches: Clinical notes from incoming claims are converted into vectors the same way. A similarity search against the database retrieves the most similar golden-set notes, and the LLM compares the billed codes with the codes on those reference cases to identify potential discrepancies. For example, it can flag a billing code that doesn’t align with the care documented in the clinical notes.
- Detecting Errors and Unbundling: LLMs can do more than validate individual codes. They can also help detect complex billing errors such as unbundling, where the individual components of a procedure are billed instead of the comprehensive code. By designing retrievals that pull together multiple claims from the same patient, the LLM can identify patterns indicative of unbundling.
- Realizing Efficiency Gains: Automating code validation with LLMs reduces reliance on manual review, which lowers the risk of human error, speeds up claim processing, and improves billing accuracy. These efficiencies can translate into substantial cost savings across the billing process.
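The retrieval and flagging steps above can be sketched in plain Python. This is a minimal, self-contained sketch under stated assumptions, not the customer's implementation: a bag-of-words `embed` function stands in for a real embedding model, an in-memory list (`GoldenIndex`) stands in for the vector database, and simple set logic stands in for the LLM's judgment. All names (`unsupported_codes`, `find_unbundling`, `BUNDLES`) and all codes (`E-LOW`, `P101`, and so on) are hypothetical placeholders, not real CPT or ICD codes.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class GoldenRecord:
    note: str
    codes: set
    vector: Counter


class GoldenIndex:
    """In-memory stand-in for a vector database holding the golden set."""

    def __init__(self) -> None:
        self.records = []

    def add(self, note: str, codes: set) -> None:
        self.records.append(GoldenRecord(note, set(codes), embed(note)))

    def search(self, note: str, k: int = 3):
        """Return the k (score, record) pairs most similar to `note`."""
        query = embed(note)
        scored = [(cosine(query, r.vector), r) for r in self.records]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def unsupported_codes(index: GoldenIndex, note: str, billed: set,
                      k: int = 3, min_score: float = 0.2) -> set:
    """Billed codes that no sufficiently similar golden note carries (possible upcoding)."""
    supported = set()
    for score, record in index.search(note, k):
        if score >= min_score:
            supported |= record.codes
    return set(billed) - supported


# Hypothetical bundling rule: comprehensive code -> component codes it already covers.
BUNDLES = {"P900": {"P101", "P102", "P103"}}


def find_unbundling(claims):
    """Group (patient_id, service_date, code) claim lines by visit and flag visits that
    bill two or more bundle components without the comprehensive code."""
    by_visit = defaultdict(set)
    for patient_id, service_date, code in claims:
        by_visit[(patient_id, service_date)].add(code)
    findings = []
    for visit, codes in by_visit.items():
        for bundle_code, components in BUNDLES.items():
            hits = components & codes
            if len(hits) >= 2 and bundle_code not in codes:
                findings.append((visit, bundle_code, sorted(hits)))
    return findings


index = GoldenIndex()
index.add("Patient seen for routine follow-up, blood pressure stable, no procedures.", {"E-LOW"})
index.add("Laceration repaired with sutures under local anesthesia.", {"E-LOW", "P-REPAIR"})
index.add("Critical care for respiratory failure, intubation performed.", {"E-HIGH", "P-INTUB"})

# A routine visit billed at the high-intensity level is flagged.
print(unsupported_codes(index, "Routine follow-up visit, blood pressure stable.", {"E-HIGH"}))  # {'E-HIGH'}

claims = [("pt1", "2024-03-01", "P101"), ("pt1", "2024-03-01", "P102"), ("pt1", "2024-03-02", "P101")]
print(find_unbundling(claims))  # [(('pt1', '2024-03-01'), 'P900', ['P101', 'P102'])]
```

In the approach described above, the retrieved golden notes and the grouped claims would be passed to the LLM as context rather than checked against fixed rules; the rules here only make the retrieval and grouping logic concrete.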
By harnessing the capabilities of LLMs, the medical billing process can be streamlined, errors minimized, and compliance enhanced. This ultimately contributes to a more efficient and cost-effective healthcare system.
The Role of Protecto: AI Guardrails
When using LLMs to analyze claims data, protecting sensitive information is paramount. Claims data contains Protected Health Information (PHI), which raises concerns about potential data leaks or mishandling. Our customer was particularly concerned about the security of PHI while it was being loaded into and processed by AI/LLM systems.
To address these concerns, the customer used Protecto’s solution to identify and mask PHI within the claims data. Protecto’s data protection technology masked the sensitive information while preserving the integrity and usability of the data for analysis. This allowed the insurance company to safely process claims data with LLMs without compromising privacy or accuracy.
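The general idea of masking PHI while keeping the data usable can be illustrated with Python's standard library. This is a generic sketch, not Protecto's product or API: it swaps in keyed hashing (HMAC-SHA256) for pseudonymization, so the same identifier always maps to the same placeholder token and claims for one patient can still be grouped, for example for the unbundling check, without exposing the identifier. The regex patterns, including the `MRN-` record-number format, and the names `mask_phi` and `make_token` are hypothetical; real PHI detection must also cover names, addresses, dates, and other identifiers, typically with trained entity-recognition models rather than regexes alone.

```python
import hashlib
import hmac
import re

# Illustrative patterns only (assumptions); they do not cover names, addresses, or dates.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN-\d{6,}\b"),  # hypothetical medical record number format
    "PHONE": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}


def make_token(kind: str, value: str, key: bytes) -> str:
    """Deterministic placeholder: the same value and key always give the same token.
    Without the key, nobody can compute the token for a given value."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    return f"<{kind}_{digest}>"


def mask_phi(text: str, key: bytes) -> str:
    """Replace every match of each pattern with its keyed token."""
    for kind, pattern in PHI_PATTERNS.items():
        text = pattern.sub(lambda m: make_token(kind, m.group(0), key), text)
    return text


key = b"demo-key-store-real-keys-in-a-secrets-manager"
note = "MRN-0042917, SSN 123-45-6789, callback (555) 123-4567. Follow-up for MRN-0042917."
print(mask_phi(note, key))
```

Deterministic tokens preserve joins across records, but identical values also stay linkable to each other, so the key needs the same protection as the underlying data.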
Conclusion: A Path Forward with LLMs and Protecto
Combining LLMs with Protecto (https://www.protecto.ai) offers a powerful approach to the pervasive problem of medical overbilling. As the healthcare industry continues to adopt advanced technologies, pairing LLMs with robust data protection tools like Protecto will be essential to building a more transparent and fair system that benefits providers, payers, and patients alike.