IPAK Publishes Fully Queryable HHS Medicaid Provider Spending Archive Is Now Public. Unfettered, PERMANENT Access.

Be careful your results are not positively misleading. Use best practices to avoid misleading results- and consider consulting with someone who understands data analytics!

Today, an unmodified archival copy of the HHS Medicaid Provider Spending dataset has been published as a public IPAK BigQuery HHS dataset. It is queryable by anyone with a Google account. No scraping. No downloads required. No proprietary gatekeeping.

Share

Subscribe now

This is the raw file, ingested directly from the official HHS Open Data release. No transformations. No filtering. No derived fields. The ingestion artifact is cryptographically anchored via SHA256 for integrity verification.

Below is exactly how to access it and find providers contained in the results.

What This Dataset Contains

This dataset includes provider-level Medicaid spending aggregated by:

  • Billing Provider NPI

  • Servicing Provider NPI

  • HCPCS procedure code

  • Month of service

  • Total unique beneficiaries

  • Total claims

  • Total paid amount

Each row represents a provider–procedure–month aggregation.

Source:
HHS Open Data Portal — Medicaid Provider Spending (T-MSIS derived)


BEGINNER GUIDE

How Anyone Can Query the Data (Step-by-Step)

You do not need to download anything.

You do not need to install software.

You only need a free Google account.


Step 1 — Create a Google Account (If You Don’t Have One)

Go to:

https://accounts.google.com/signup

Create a free account.


Step 2 — Open BigQuery

Visit:

https://console.cloud.google.com/bigquery

If prompted:

  • Accept terms

  • Select your account

You do not need to enable billing to query public datasets.

Step 3 — Open the Public Dataset

In the search bar, paste:

project-401c8f7e-90fc-4838-9fc.medicaidproviderspending_1771078072347.medicaid_provider_spending_raw

Click the table.

You can now:

  • View schema

  • View sample rows

  • Run queries


Step 4 — Using Gemini to Query

Click the “Create conversation” button (or open the Gemini panel if visible).

Now you can ask questions in plain English.

Examples:

What is the total Medicaid spending in this dataset?

Show the top 25 billing providers by total paid amount.

Which HCPCS codes account for the highest total spending?

Show monthly spending trends over time.

Which providers had the largest increase in spending year over year?

Gemini will generate and run the query for you.

You can see the SQL it generates before execution.

If it looks correct, click Run.

Step 5 — Further Use Gemini to Explore

Identifying Practices Billing More Than 2 Standard Deviations Above Peers

If you want to identify unusually high billing patterns — in a statistically defensible way — you can ask Gemini to compute outliers relative to peer distributions.

We will do this in three structured steps.

Step 5A — First Ask Gemini to Compute Paid Per Beneficiary

In the Gemini prompt box, type:

Calculate paid per unique beneficiary for each billing provider, defined as SUM(TOTAL_PAID) divided by SUM(TOTAL_UNIQUE_BENEFICIARIES). Return provider-level results.

This creates a normalized metric:

Paid per unique beneficiary
= Total paid ÷ Total unique beneficiaries

Normalization matters.
Raw totals alone are not meaningful comparisons.

Review the generated SQL before running it.

Step 5B — Ask Gemini to Calculate the Mean and Standard Deviation

Now type:

Calculate the mean and standard deviation of paid per unique beneficiary across all billing providers.

This establishes the baseline distribution.

Gemini will compute:

  • Average paid per beneficiary

  • Standard deviation of paid per beneficiary

These values define what “normal” looks like in the dataset.

Step 5C — Identify Providers More Than 2 Standard Deviations Above the Mean

Now type:

Identify billing providers whose paid per unique beneficiary is more than 2 standard deviations above the mean. Return provider ID, paid per beneficiary, and number of beneficiaries.

This flags providers whose per-beneficiary billing intensity is statistically elevated.

2 standard deviations above the mean is a common threshold for identifying unusually high values in a distribution.

Step 5D – Identify the Providers from their NPIs

HHS has an NPI registry here: https://npiregistry.cms.hhs.gov/search

Why This Method Matters

You are not asking:

“Who looks high?”

You are asking:

“Who exceeds a statistically defined threshold relative to peers?”

That distinction matters.

Important Interpretation Cautions

• High paid per beneficiary does not automatically indicate wrongdoing.
• Some specialties serve higher-acuity populations.
• Procedure mix can distort comparisons.
• Month aggregation may mask episodic spikes.

This is a screening tool, not a conclusion.

Optional Refinement

You can further constrain the comparison:

Perform the 2 standard deviation analysis within each HCPCS_CODE separately.

Or:

Perform the 2 standard deviation analysis within each CLAIM_FROM_MONTH.

This compares providers against true procedural peers.

CAVEAT ON INTERPRETATION:

Ranking providers by total paid without context is analytically weak because raw totals conflate scale with intensity. A more defensible approach is to first bin providers by total number of unique beneficiaries, creating peer groups of similar size, and then rank providers within those bins on normalized metrics such as paid per beneficiary or paid per claim. For example, providers can be grouped into strata (e.g., 1–50 beneficiaries, 51–250, 251–1,000, 1,001–5,000, 5,000+). Within each stratum, variation in total paid reflects differences in billing intensity or service mix rather than mere population size. This reduces scale bias and prevents large multi-site systems from being mechanically ranked above small specialty practices simply because they serve more patients. Stratified ranking transforms the question from “Who bills the most?” to “Who bills more than similarly sized peers?” — a far more meaningful starting point for anomaly detection or policy analysis.

Step 6 — Export Results

You can:

  • Download results as CSV

  • Copy results

  • Share query links

You are running queries under your own Google account. Query costs apply to your account if you exceed free limits.

For Journalists and Analysts

This dataset is:

  • Public

  • Immutable (read-only)

  • Cryptographically anchored

  • Fully reproducible

You can independently verify totals and trends.


VERSION 2

Expert Access Guide

For analysts, data scientists, and auditors.


Dataset Path

project-401c8f7e-90fc-4838-9fc.medicaidproviderspending_1771078072347.medicaid_provider_spending_raw

Region: US

Access: Public read-only (allAuthenticatedUsers → BigQuery Data Viewer)


Reproducibility Query

SELECT
  COUNT(*) AS rows,
  SUM(TOTAL_PAID) AS total_paid
FROM
  `project-401c8f7e-90fc-4838-9fc.medicaidproviderspending_1771078072347.medicaid_provider_spending_raw`;

Time-Series Normalization

CLAIM_FROM_MONTH is stored as STRING in YYYY-MM format.

For temporal analysis:

SELECT
  PARSE_DATE(’%Y-%m’, CLAIM_FROM_MONTH) AS service_month,
  SUM(TOTAL_PAID) AS monthly_total
FROM
  `project-401c8f7e-90fc-4838-9fc.medicaidproviderspending_1771078072347.medicaid_provider_spending_raw`
GROUP BY service_month
ORDER BY service_month;

Paid Per Claim

SELECT
  BILLING_PROVIDER_NPI_NUM,
  SUM(TOTAL_PAID) / SUM(TOTAL_CLAIMS) AS paid_per_claim
FROM
  `project-401c8f7e-90fc-4838-9fc.medicaidproviderspending_1771078072347.medicaid_provider_spending_raw`
GROUP BY BILLING_PROVIDER_NPI_NUM
ORDER BY paid_per_claim DESC;

HCPCS Concentration

SELECT
  HCPCS_CODE,
  SUM(TOTAL_PAID) AS total_paid
FROM
  `project-401c8f7e-90fc-4838-9fc.medicaidproviderspending_1771078072347.medicaid_provider_spending_raw`
GROUP BY HCPCS_CODE
ORDER BY total_paid DESC;

Gemini Usage (Advanced)

Gemini works best when:

  • You specify grouping logic.

  • You specify aggregation type.

  • You clarify billing vs servicing provider.

  • You limit result size.

Example prompt:

“Compute month-over-month percentage change in total paid per billing provider and return providers with >100% increase.”

Always inspect generated SQL before execution.


Integrity Statement

This table is:

  • An unmodified archival copy of the HHS Open Data release.

  • Ingested directly from the official CSV.

  • Hash-anchored via SHA256 at time of ingestion.

  • Published read-only.

No transformations have been applied.


Why This Matters

Transparency is not rhetorical.

It is operational.

If you believe in accountability:

Run your own queries.

Replicate results.

Challenge assumptions.

The data are now open.

Let us know what you find! Maybe we’ll publish a summary.

If you find this useful remember the day we gave to make this publicly queryable and become a paid subscriber to Popular Rationalism.

Popular Rationalism is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Thanks for reading Popular Rationalism! This post is public so feel free to share it.

Share

Leave a comment

 

IPAK-EDU is grateful to Popular Rationalism as this piece was originally published there and is included in this news feed with mutual agreement. Read More

Subscribe to SciPublHealth


Science-based knowledge, not narrative-dictated knowledge, is the goal of WSES, and we will work to make sure that only objective knowledge is used in the formation of medical standards of care and public health policies.

Comments


Join the conversation! We welcome your thoughts, feedback, and questions. Share your comments below.

Leave a Reply

Discover more from Science, Public Health Policy and the Law

Subscribe now to keep reading and get access to the full archive.

Continue reading