Evaluating Data Integrity and Reporting Challenges in Public Health: Lessons from COVID-19 Data Collection in Washington State

Clifford Knopik *
Corresponding Author

Affiliation: Independent Researcher, Renton, WA, United States

Email: cliffordknopik@gmail.com

Abstract


Between July 2020 and August 18, 2021, Public Records Requests were made to the Washington State Department of Health (WA DOH) to obtain and clarify data related to COVID-19 testing, case definitions, hospitalizations, deaths, and vaccination status. Initial requests sought the most official data available in order to create better visualizations; subsequent requests were submitted to better understand the significant data errors and inconsistencies that were discovered. This study documents and analyzes the data errors identified in the WA DOH datasets, which included data classification errors, inclusion of unknown data, base rate fallacy, unequal comparisons, and lack of data standardization. These errors collectively rendered the data irreparably compromised, leading to inaccurate visualizations and potentially misleading public health decisions during a critical period of the COVID-19 pandemic. Recommendations to improve data collection and quality are given. The findings underscore the importance of data quality and database maintenance, especially in public health emergencies, where decisions based on flawed data can have life-or-death consequences.

Background


Between July 2020 and August 18, 2021, multiple emails and ten Public Records Requests were submitted to the Washington State Department of Health requesting data and answers to a variety of questions. The questions and requests concerned data on COVID-19 testing and case definitions, hospitalizations, and deaths, as well as the definitions and criteria surrounding the COVID-19 vaccinations. The earliest emails and Public Records Requests were attempts to obtain the most official data available, but it quickly became apparent that there were serious errors in the datasets being returned, as well as confusion in some of the answers; subsequent requests therefore became attempts to clarify and understand the data errors identified. No dataset returned in response to an official Public Records Request was free of at least one of the identified data errors.

Data and Information


Data are the raw, unprocessed, “discrete, objective facts about events,” both quantitative and qualitative, collected through various methods such as surveys, experiments, and observations [1]. Information is created when data is processed, organized, or structured in a way that adds meaning or context, enabling its use in decision-making processes. Information is used to help answer “questions that begin with words such as who, what, where, when, and how many” [2]. The transformation of data into information requires the application of data analytics techniques, which often involve statistical analysis, machine learning, and other computational methods to derive insights and patterns from the data [3,4]. Data visualization plays a crucial role in this process by converting complex data sets into visual formats that are easier to interpret and analyze, thereby facilitating further analysis and the discovery of new insights [5]. Methodical, neutral precision is essential at every stage of transforming data into information to prevent the distortion of truth caused by misinterpreting the data or allowing personal bias to influence desired outcomes. In the fields of public health and public policy, information derived from data is essential for drawing conclusions that guide decision-making and influence public behavior, particularly in times of health crises [6].

Because data elements are the foundation of all analysis and resulting visualizations, their quality is of utmost importance. Once the data is collected, how it is analyzed and utilized matters. If there are errors, issues, or gaps anywhere in the chain from collection to visualization, the information being conveyed can become incorrect or inaccurate. This is particularly important if the information is being used for decision-making. A common error in COVID-19 data, for example, is the absence of dates for when symptoms first appeared. If a significant portion of this data is missing, or if gaps are filled with arbitrary data, no amount of analysis or visualization will produce accurate information for making informed decisions. Decisions concerning public health can be life-impacting at the population level, as most have experienced with the COVID-19 countermeasures implemented over the past four years, and it is therefore vital that these decisions are made based on correct data so that the best interests of the public are actually served.

Methods & Datasets


The methods used to collect data for this paper included downloading files from the public health website and submitting formal Public Records Requests to Washington State's Department of Health. The datasets included Excel spreadsheet files, tables, and textual responses. Any quantifiable data included were studied for analysis. In most cases, however, due to the data errors discovered, this analysis was not possible.

Summary Findings


There were six distinct types of data errors discovered through Public Records Requests (Table 1). These errors often overlapped, compounding one another. Their combined effect on the quality of the data collected by the Washington State Department of Health rendered the data irreparably damaged and unfit for use. It is unclear how the WA DOH was able to provide analyses and/or visualizations of data so corrupted. Anything created by WA DOH or other entities using this corrupted data was thus non-informational and likely to lead to incorrect conclusions.

Table 1.

Found Data Errors: explanation of the data errors found in Public Records Requests.


Data Classification Errors

Data classification errors occur when data is incorrectly labeled or categorized, leading to inaccurate analysis and decision-making. These errors can significantly impact the quality of insights derived from the data, as they may result in misleading patterns, correlations, and conclusions [18]. For instance, when health data is misclassified, it can lead to incorrect public health decisions, potentially exacerbating health crises rather than mitigating them. Misclassification errors can also compromise the reliability of predictive models, reducing their accuracy and effectiveness in real-world applications [19]. Addressing data classification errors is crucial for ensuring the integrity and validity of any data-driven analysis.

The ease with which a chart can become distorted can be demonstrated using six days' worth of hypothetical COVID-19 cases. In this example, two different dates are misclassified and combined. The Symptom Onset Date is the preferred date, but if the Symptom Onset Date is unknown, the Record Create Date, the date the case is entered into the database, is used instead. If symptoms occurred over a six-day period at 200 cases per day, and these case dates were plotted on a bar chart, the resulting chart would resemble Figure 1.

Figure 1.

Bar chart of Symptom Onset Dates, 200 cases per day.

However, if the Symptom Onset Date was only available for 100 cases on Sunday and 100 cases on Wednesday, the remaining 1,000 cases would rely on the Record Create Date. Now, imagine the data entry person receives the information throughout the week but doesn’t enter it into the system until the following Friday. This would result in a bar chart resembling Figure 2, where most cases are assigned a Friday Record Create Date. While Figure 1 represents the actual distribution, basing decisions on the skewed data in Figure 2 would lead to highly inaccurate conclusions. With 1,000 cases recorded on Friday, one might mistakenly conclude that a surge in cases was starting to occur.

Figure 2.

Bar chart with most cases using a Friday Record Create Date.
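The arithmetic behind this distortion can be sketched in a few lines of Python, using the hypothetical counts above rather than any WA DOH data:

```python
from collections import Counter

DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]

# True distribution: 200 symptom onsets per day for six days (1,200 cases).
true_onsets = [day for day in DAYS for _ in range(200)]

# Suppose only 100 Sunday and 100 Wednesday onsets were actually recorded;
# the other 1,000 cases fall back to the Record Create Date, which here is
# the Friday everything was keyed into the system.
kept_budget = {"Sun": 100, "Wed": 100}
plotted = []
for day in true_onsets:
    if kept_budget.get(day, 0) > 0:
        kept_budget[day] -= 1
        plotted.append(day)    # genuine Symptom Onset Date
    else:
        plotted.append("Fri")  # fallback: Record Create Date

true_counts = Counter(true_onsets)
plotted_counts = Counter(plotted)
print(true_counts["Fri"])     # 200: the real Friday total
print(plotted_counts["Fri"])  # 1000: the apparent Friday "surge"
```

The bar heights in Figure 2 follow `plotted_counts`: a flat 200-per-day reality is rendered as a five-fold Friday spike purely because of the date fallback.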

In the publicly available datasets from the WA DOH official website, similar data classification errors were found, not with two, but with five different dates being collected [20]. Official datasets provided in response to Public Records Requests confirmed that the WA DOH data had included classification errors from the very start of data collection in February 2020 [7]. WA DOH used the following algorithm, which incorporated the classification errors, to generate totals for COVID-19 cases, hospitalizations, and deaths:

Symptom Onset Date (SOD) + Diagnosis Date (DD) + Positive Defining Lab Date (PDLD) + Local Health Notification Date (LHND) + Record Creation Date (RCD) = Total Symptom Onset Dates (Total SOD) which were displayed on graphs and charts

SOD + DD + PDLD + LHND + RCD ≠ Total SOD

These five distinct dates were all misclassified as the Symptom Onset Date, combined, and displayed in charts and on the visualizations being used by public policy decision makers and the public to evaluate risks. Due to the data classification errors the charts and graphs rendered were not capable of conveying accurate, usable information, thereby potentially misleading consumers of the information.

A subsequent Public Records Request with a smaller set of data not only confirmed the above algorithm was being used, but also revealed a second data classification error [8]. Probable Cases of COVID-19 were being totaled together with Confirmed Cases of COVID-19 and being misclassified as Confirmed Cases of COVID-19. This additional classification error compounded the variation of totals and further distorted visualizations being rendered using this data.

Probable Cases (PC) + Confirmed Cases (CC) = Total Confirmed Cases (Total CC) which were displayed on graphs and charts

PC + CC ≠ Total CC

Further data classification errors persisted into 2021 when vaccination status was first tracked, along with the already tracked dates for COVID-19 cases, hospitalizations, and deaths. This exacerbated existing errors and misrepresented true COVID-19 cases, hospitalizations, and deaths. In a news release entitled "New data reveals COVID-19 impact on unvaccinated," a new data visualization dashboard was announced. The announcement included four categories of people, three of which were misclassified as "unvaccinated" [9]. A Public Records Request seeking clarity on these categorizations revealed that an additional category, people not in the Washington Immunization Information System (WA IIS) regardless of their actual vaccination status, was also being misclassified as "unvaccinated" [10].

“Fully vaccinated” people were those who were two weeks or more past receiving their final dose of the COVID-19 vaccine and who were in the Washington Immunization Information System; all others were considered “unvaccinated.” People who received zero doses (UNV), people who received only their first dose (1D), people who were less than two weeks past their final dose (FDL2W), and people with unknown vaccine status (UKN), because they were not in the WA IIS, were all misclassified as “unvaccinated” and totaled together. The result of this misclassification is a distortion that artificially inflated the unvaccinated totals.

UNV + 1D + FDL2W + UKN ≠ Total UNV
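As a sketch, the classification rule described above can be expressed in Python. The record fields (`in_registry`, `final_dose`) are hypothetical names chosen for illustration; the point is how four distinct statuses collapse into one label:

```python
from datetime import date, timedelta

# Hypothetical classification rule: "fully vaccinated" only if the person
# is in the registry AND two or more weeks past the final dose; every
# other combination is lumped into "unvaccinated".
def status(person, as_of):
    if (person["in_registry"]
            and person["final_dose"] is not None
            and as_of - person["final_dose"] >= timedelta(days=14)):
        return "fully vaccinated"
    return "unvaccinated"  # UNV, 1D, FDL2W, and UKN all land here

today = date(2021, 7, 1)
people = [
    {"in_registry": True,  "final_dose": date(2021, 4, 1)},   # fully vaccinated
    {"in_registry": True,  "final_dose": None},               # UNV or 1D
    {"in_registry": True,  "final_dose": date(2021, 6, 25)},  # FDL2W
    {"in_registry": False, "final_dose": date(2021, 4, 1)},   # UKN (not in WA IIS)
]
print([status(p, today) for p in people])
```

Three of the four people, including one who is demonstrably vaccinated but absent from the registry, end up in the "unvaccinated" total.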

There was also a data classification error with the COVID-19 test cycle thresholds (Ct). The New York Times article “Your Coronavirus Test Is Positive. Maybe It Shouldn’t Be.” discussed the sensitivity of the PCR tests used to identify COVID-19 cases, hospitalizations, and deaths [21]. The article states: “The PCR test amplifies genetic matter from the virus in cycles; the fewer cycles required, the greater the amount of virus, or viral load, in the sample. The greater the viral load, the more likely the patient is to be contagious.” If too many amplification cycles are run, however, positives can be triggered when little or no virus is present. These would be false positives, because viral artifacts can be amplified after an infection has resolved, and even unrelated material can be amplified if the PCR primers are not selected properly. In the article, Juliet Morrison, a virologist at the University of California, Riverside, stated: “Any test with a cycle threshold above 35 is too sensitive.” According to the article, the CDC’s own tests “suggest that it is extremely difficult to detect any live virus in a sample above a threshold of 33 cycles.” This matches what Dr. Anthony Fauci stated in an interview with This Week in Virology (TWiV) on July 16, 2020, where he said PCR tests with a Ct above 35 are just finding “dead nucleotides” [22].

A Public Records Request to the WA DOH confirmed that the WA DOH State Public Health Labs were using tests above the recommended 33-35 Ct thresholds [11]. The WA DOH Public Health Laboratories were using two different tests, one with a maximum of 37 Ct and one with a maximum of 40 Ct. In the reply to the Public Records Request the WA DOH stated: “WA DOH has no visibility on other medical testing site’s Ct usage.” Not only were they misclassifying tests conducted at different Cts, but they were also misclassifying tests with unknown Cts. Due to the high Ct, false positives could be included in the totals. All the tests, regardless of Ct or visibility, were misclassified as positive visible tests and treated identically, as if full knowledge of each test was available.

(Visible 37Ct Positives (V37P) + Visible 37Ct False Positives (V37FP) + Visible 40Ct Positives (V40P) + Visible 40Ct False Positives (V40FP)) + (Non-visible Unknown Ct Positives (NVUP) + Non-visible Unknown Ct False Positives (NVUFP)) = Total Visible Positives (Total VP), which were displayed on graphs and charts

(V37P + V37FP + V40P + V40FP) + (NVUP + NVUFP) ≠ Total VP
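A minimal sketch of the Ct-cutoff issue, using made-up Ct values rather than any real test data, shows how the chosen threshold alone changes the positive count:

```python
# Made-up Ct values for a batch of "positive" PCR calls; none of these
# numbers come from WA DOH data.
results = [18, 24, 31, 34, 36, 38, 39]

def positives(ct_values, max_ct):
    """Count results still called positive under a given Ct cutoff."""
    return sum(1 for ct in ct_values if ct <= max_ct)

print(positives(results, 40))  # 7: every call survives a 40 Ct cutoff
print(positives(results, 33))  # 3: the 33-cycle guidance cited above
```

Combining 37 Ct, 40 Ct, and unknown-Ct results into one total, as described above, is equivalent to summing counts produced by different `max_ct` values as if they measured the same thing.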

All these diverse types of data classification errors cumulatively carried the risk of massively distorting the truth and creating the appearance of a pandemic when in fact, there may not have been one. An analogous situation occurred in 2006, as explained in The New York Times article “Faith in Quick Tests Leads to Epidemic That Wasn’t” [23]. In this incident, PCR Test false positives created the illusion that an epidemic of pertussis was underway leading to panic and unnecessary economic and health interventions at a regional hospital. The WA DOH classification errors carried the same risks of distorting the truth and misguiding public policy decision makers who used the WA DOH data and distorted visualizations as a basis for decisions that affected the citizens of Washington State, and in many cases, with devastating effects.

Including unknown data

In data analytics, during the data cleaning phase unknown data is typically removed from the dataset when it cannot be categorized or defined accurately. This practice is essential because including such data can introduce errors that distort the analysis and affect the accuracy of the results [24]. While it is possible to create a separate category for unknown data, it must be isolated from the known data to ensure the integrity of the analysis. By combining known and unknown data there is a significant risk of introducing errors that can lead to misleading conclusions and inaccurate visualizations [25].

Distortions created by incorporating unknown data can be demonstrated using the following hypothetical scenario. Assume there are 22,000 COVID-19 cases. The Symptom Onset Date is known for half of them and the symptoms are occurring in a relatively consistent manner. If the unknown dates are excluded and the known dates are included in a bar chart, the resulting chart might look like Figure 3.

Figure 3.

Bar chart only showing cases with a known Symptom Onset Date.

If the unknown data was instead inserted into the bar chart utilizing the Record Create Date, then the resulting chart might look like Figure 4. Any chart that includes unknown data in this manner would be inaccurate because for half of the data the Symptom Onset Date is unknown.

Figure 4.

Bar chart combining known Symptom Onset Dates with Unknown Symptom Onset Dates using Misclassified Record Create Date.
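The difference between Figure 3 and Figure 4 amounts to a single data-cleaning choice, sketched below with a handful of hypothetical records (the field names are assumptions for illustration, not the WDRS schema):

```python
# Hypothetical records: two cases have a known Symptom Onset Date, two do not.
records = [
    {"onset": "2020-11-02", "created": "2020-11-06"},
    {"onset": None,         "created": "2020-11-06"},
    {"onset": "2020-11-03", "created": "2020-11-06"},
    {"onset": None,         "created": "2020-11-06"},
]

# Figure 3 approach: plot only known onset dates; report unknowns separately.
known = [r["onset"] for r in records if r["onset"] is not None]
unknown_count = sum(1 for r in records if r["onset"] is None)

# Figure 4 approach (the error described above): backfill unknowns with
# the Record Create Date, piling them onto the day of data entry.
backfilled = [r["onset"] or r["created"] for r in records]

print(known, unknown_count)            # ['2020-11-02', '2020-11-03'] 2
print(backfilled.count("2020-11-06"))  # 2 unknowns masquerade as Nov 6 onsets
```

The backfilled list looks complete, which is precisely what makes the resulting chart misleading: nothing in it signals that half the dates are fabricated.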

Public Records Requests showed that there were two exceptionally large inclusions of unknown data in the WA DOH datasets. Through 2020, the COVID-19 visualizations created by WA DOH were used to demonstrate Symptom Onset Date trends and to determine whether COVID-19 cases were surging. Regrettably, as in Figure 4, nearly half (49%) of the data presented in the chart lacked a Symptom Onset Date and should have been excluded from the visualization rather than misclassified and included [12]. Misclassifying half of the data could only distort totals, graphs, and charts.

Even worse, between February 28, 2020 and February 16, 2021, WA DOH Public Health Laboratories found only 8,816 (about 3% ± 0.5%) of all the COVID-19 cases [13][26]. Approximately 97% of all COVID-19 cases, hospitalizations, and deaths being visualized on WA DOH charts had “no visibility” in the tests. These no-visibility cases should have been treated as unknowns, since the dates the tests were collected, the Symptom Onset Date, the quality and accuracy of the tests, whether the results were false positives or duplicates, and the overall reliability of the data were unknown and unknowable. This unknown data should not have been included in any COVID-19 visualizations. Including 97% unknown data renders the entire dataset corrupted and useless, and any visualizations built on it severely misrepresent reality.

Base Rate Fallacy

The base rate fallacy occurs when the base rates of the groups being compared are ignored in favor of competing “individuating information.” The entirety of the data should be integrated and used together during the analysis, rather than excluding non-preferred characteristics [27]. When analyzing vaccination status by comparing vaccinated and unvaccinated individuals, it is crucial to consider the base rate of vaccination within the population alongside other variables. Failing to account for this can lead to misinterpretations of the data, as timeframes with predominantly vaccinated or unvaccinated populations may skew the results. For instance, a higher incidence of COVID-19 cases among unvaccinated individuals might simply reflect the fact that a larger proportion of the population was unvaccinated during that period, and similarly for vaccinated individuals [28][29].

WA DOH published a base rate fallacy when, on July 28, 2021, they issued their news release titled “New data reveals COVID-19 impact on unvaccinated.” The news release included the subtitle “Between February and June 2021, most people in Washington who died of COVID-19 were unvaccinated.” In Washington State, most people were not eligible to get the vaccines until after April 15, 2021 [30]. Clearly, data from February, March, and the first half of April should not have been included in the totals, because the results become skewed (Fig. 5). The misclassification and inclusion of the months prior to April 15 made the unvaccinated numbers look higher than they truly were. The high unvaccinated numbers were not due to any sort of vaccine hesitancy, as implied in the news release, but reflect the fact that in the months prior to April 15 most people were not eligible to be vaccinated.

Figure 5.

Including months prior to April 15, 2021, introduced a base rate fallacy and skewed unvaccinated numbers.
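The skew can be illustrated with a toy calculation (all numbers invented) in which vaccination status is assumed to have no effect on risk at all, so each month's deaths split in proportion to the base rates; merely including low-eligibility months still inflates the unvaccinated share:

```python
# Toy monthly figures (all invented, not WA DOH data): fraction of the
# population fully vaccinated, and total COVID-19 deaths that month.
months = {
    "Feb": (0.05, 100),
    "Mar": (0.10, 100),
    "Apr": (0.30, 100),
    "May": (0.45, 100),
    "Jun": (0.55, 100),
}

def unvaccinated_share(selected):
    """Share of deaths falling on the unvaccinated, assuming (purely for
    illustration) vaccination has NO effect on risk, so each month's
    deaths split in proportion to the two groups' base rates."""
    unvax = sum(deaths * (1 - vax) for vax, deaths in (months[m] for m in selected))
    total = sum(deaths for _, deaths in (months[m] for m in selected))
    return unvax / total

# Including low-eligibility months inflates the unvaccinated share even
# though, by construction, the vaccine changes nothing in this model.
print(round(unvaccinated_share(["Feb", "Mar", "Apr", "May", "Jun"]), 2))  # 0.71
print(round(unvaccinated_share(["May", "Jun"]), 2))                       # 0.5
```

With identical risk in both groups, the February-June window still attributes 71% of deaths to the unvaccinated, versus 50% for May-June alone; the headline percentage is driven by eligibility, not by outcomes.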

The Washington State Department of Health’s claim in their news release that 94% of COVID-19 cases, deaths, and hospitalizations were among the unvaccinated seems to have influenced Governor Inslee. On July 28, 2021, Governor Inslee made the following statement:

“The vast, vast majority of people who are in hospitals today all have one thing in common, they’re unvaccinated, ninety six percent of all the people in our hospitals today have one thing in common. They didn’t get the vaccine.” [31]

Governor Inslee’s statement, and the numbers from the WA DOH news release, were disproved using data files from Public Records Requests to the WA DOH [14][15]. In all WA DOH publicly available data, as well as data received through Public Records Requests, the unvaccinated totals never reached the percentages stated by the WA DOH or Governor Inslee; the numbers were lower. The unvaccinated were never 94% of COVID-19 cases, deaths, and hospitalizations as stated in the WA DOH news release, and never 96% of all COVID-19 hospitalizations as stated by Governor Inslee on July 28, 2021. Using the data received, and including January through March 2021 (which, as discussed above, should not be included), the percentage was around 86%, and even lower if April is taken as the start month. The misclassification errors, together with the inclusion of months when most people could not yet be vaccinated, artificially inflated the unvaccinated totals. It remains unclear how either the 94% figure (as stated by WA DOH) or the 96% figure (as stated by Governor Inslee) was calculated.

Unequal Comparison

When comparing data sets of unequal weight, the results may be inaccurate because an unequal (or unfair) comparison is being conducted [32][33]. This is a common error when comparing data over different time periods without consideration of other variables that could account for differences in the data. There are a variety of unequal comparisons made in the WA DOH analysis. For example, as related to the base rate fallacy, comparison of time periods when vaccines were not available to most people, to a period when vaccines were available to most people, created an unequal comparison.

Respiratory illnesses were already known to be seasonal, while COVID-19's seasonal patterns were still unknown at the time the charts and graphs were produced [34]. Because only one winter had passed since the pandemic began, caution should have been taken when including and comparing time periods from different seasons, especially when data classification errors were already confounding the numbers.

Another confounding factor when attempting to compare COVID-19 cases during various time periods is the time lag between testing and reporting. This incoming data can lag up to a month with death reports often lagging longest from collection to entry into the WA DOH system [35]. Because of this lag there were potential unequal comparisons being made at any given time and period – it is still unknown just how much this was affecting accuracy because of the already low-quality corrupt data being used for these comparisons.

No Data Definition Standardization

On December 16, 2020, the WA DOH appears to have abandoned using most of the various date types in their algorithm to calculate the Symptom Onset Date. Public Records Requests show this was due to confusion across the counties. As an example, referencing the Washington Disease Reporting System (WDRS), the WA DOH explained: “The Diagnosis Date is a field in WDRS that can be completed by LHJ’s [Local Health Jurisdictions], but there is not a standardized definition across counties and states to know what the date means. This field is also missing 80% of the data state-wide” [16]. The WA DOH appears to have decided to focus on the Specimen Collection Date when possible because it was “the date with the most reliable data (standardized definition, minimal missing data) that can be used with PCR and antigen positive tests as well as symptomatic and asymptomatic COVID-19 cases.”

The other misclassified dates were abandoned because they were not considered reliable enough to continue using. Without standardized definitions, counties could not know what the dates meant, so all data entered in those fields should have been considered unreliable or unknown; it was impossible for the WA DOH to know whether dates were entered in the correct field. The 2020 graphs appear never to have been retroactively fixed when these dates were abandoned. Due to the original data classification errors already covered, and the resulting corruption of the datasets, the dates would have been impossible to fix even if WA DOH had wanted to.

Changing Methodologies

On December 16, 2020, the WA DOH changed their methodology for calculating the COVID-19 cases reported on their graphs. Rather than reporting the Symptom Onset Date, which was calculated using totals from the misclassified dates, they began to report the Specimen Collection Date: the date a test was taken. This presented a new and completely different visualization and representation of the COVID-19 numbers. A Public Records Request revealed that this was not the only methodology change to these graphs: “Effective August 25, 2020, DOH changed the methodology for reporting test results. The total number of tests are now reported instead of the total number of individuals tested” [17].

The original graphs represented misclassified Symptom Onset Dates, and in August 2020 they were made to further misrepresent the data when total test results, rather than individuals tested, began to be included. By December 2020, the graph was changed once again to simply represent the date a test was collected. Rather than creating new graphs with updated titles to reflect the new purpose of the visualization, the same graph continued to be used with the same titles. Governor Inslee appears to have thought the graphs were still showing the Symptom Onset Date, because he used a WA DOH graph in his press release on December 8, 2020, on a slide titled “COVID-19 cases are still setting record highs” [36]. This inaccurate, mis-titled, corrupted graph, which was not capable of showing anything informative about the state of COVID-19 in Washington at the time, was used as justification for the December 2020 three-week extension of the lockdowns.

Recommendations


Based on the data collection issues outlined in this article, the following recommendations are proposed to improve public health data collection. Implementing these recommendations would help ensure accurate information for both the public and decision-makers, supporting informed decisions, balanced policies, and effective risk management, and helping to prevent overreactions that could cause lasting personal and economic harm. These issues should be addressed, and the recommendations implemented, before the next crisis occurs.

1. Implement Standardized Data Definitions: Establish clear, consistent definitions for key data elements, such as dates (e.g., Symptom Onset Date, Diagnosis Date), case classifications (e.g., Confirmed vs. Probable Cases), and vaccination status categories. Ensure that these definitions are uniformly applied across all data collection sites and are well-documented.

2. Ensure Data Integrity and Quality Control: Implement rigorous data validation and quality control processes at all stages of data collection, processing, and analysis. This includes regularly auditing datasets for accuracy, consistency, and completeness before they are used for decision-making or public dissemination.

3. Exclude or Clearly Categorize Unknown Data: During data cleaning, unknown or incomplete data should be excluded from analyses where it cannot be reliably categorized. If unknown data must be included, it should be clearly labeled and treated as a separate category to avoid distorting the overall analysis.

4. Avoid Data Classification Errors: Develop and enforce strict guidelines for data classification to prevent errors such as combining distinct types of data such as dates, cases (e.g., Confirmed vs. Probable cases) or vaccination status. This will help ensure that the data accurately reflects the situation on the ground.

5. Regularly Update and Communicate Methodological Changes: When methodologies for data collection or analysis are changed, these changes should be clearly documented and communicated to all stakeholders. Historical data should be retroactively adjusted or reanalyzed where possible to maintain consistency. New graphs and charts should be created that only include data created using the new methodology.

6. Improve Public Transparency and Accountability: Increase transparency in how data is collected, processed, and used in public health decision-making. This includes making raw data, methodologies, and any discovered errors publicly available, along with explanations of how these issues are being addressed.

7. Integrate Time Lag Considerations: Account for time lags in data reporting and processing to avoid unequal comparisons between different time periods. This is particularly important when comparing data across phases of a pandemic or during changes in public health policy. It should be clearly stated when periods susceptible to data lag issues are being utilized.

8. Enhance Data Literacy Among Public Health Officials: Provide ongoing training for public health officials on data literacy, including the importance of understanding data limitations, interpreting statistical analyses, and avoiding common fallacies such as the base rate fallacy.

9. Develop Robust Data Visualization Standards: Establish standards for data visualization to ensure that charts, graphs, and other visual tools accurately represent the underlying data. These standards should prevent misleading representations due to misclassified data or changes in methodology. Consistent style and designs amongst visualizations will help prevent misinterpretations from occurring.

10. Conduct Regular Data Audits Using Independent Data Analysts: Use independent data analysts to perform routine audits of public health data systems to identify and correct errors early in the data collection process. This proactive approach will help maintain the reliability and credibility of the data used in public health responses while also avoiding conflicts of interest.

Implementing these recommendations can help ensure that data collected during future pandemics is accurate, reliable, and useful for making informed public health decisions.

Further Research


The data collected for this paper should be analyzed for additional, as-yet-undiscovered data errors. Other researchers should submit additional Public Records Requests to the WA DOH to confirm how widespread these errors are. Similar misclassification errors in COVID-19 data have been observed in other states and countries [37][38][39]. Data from other states should be examined to determine whether similar COVID-19 data errors, or additional ones, occurred. Data quality standards can be compared among the various states and state agencies that collected COVID-19 data to identify best practices or to confirm the issues identified in this paper. Additional research should be conducted to identify how data errors introduced at the state level affected national and international COVID-19 totals and overall data quality. The implication is that these data errors were global, leading to decisions based on flawed information that resulted in significant, often deadly harm to individuals [40]. This research is recommended because the global socio-economic challenges related to health, safety, and freedom, brought on by the decisions made in reaction to COVID-19, continue to persist.

Conclusion


The analysis of the data obtained from the Washington State Department of Health reveals critical flaws in data classification, methodology, and overall data mismanagement during the COVID-19 era. The identified errors, including misclassification of dates and vaccination status, inclusion of unknown and unverified data, and methodological inconsistencies, significantly compromised the reliability of the data. These issues not only distorted the visualizations used for public health decision-making, but also posed serious risks of misinforming the public and policymakers. The study highlights the necessity of rigorous data quality standards and transparent data management practices to ensure that public health interventions are based on accurate and reliable information, particularly during emergencies where health could be negatively impacted.

Acknowledgments


I would like to express my sincere gratitude to the reviewers for their valuable suggestions and insightful comments, which greatly contributed to improving the quality of this article. I deeply appreciate their time and effort.

Conflict of Interest: The author declares that he has no conflicts of interest relevant to this research and received no funding for this research.

Funding: The publication of this study was made possible by sponsorship by Informed Choice Washington and by The Institute for Pure and Applied Knowledge.

Availability of data and materials: The datasets generated and/or analyzed during the current study are available in the Harvard Dataverse repository at https://doi.org/10.7910/DVN/V5HTLY.

References


T. H. Davenport and L. Prusak, Working knowledge: How organizations manage what they know, Boston, MA: Harvard Business School Press, 1998.

R. L. Ackoff, “From data to wisdom,” Journal of Applied Systems Analysis, vol. 16, no. 1, pp. 3-9, 1989.

F. Provost and T. Fawcett, Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, Sebastopol, CA: O’Reilly Media, Inc., 2013.

A. O. Babarinde, O. Ayo-Farai, C. P. Maduka, C. C. Okongwu and O. Sodamade, “Data analytics in public health: A USA perspective,” World Journal of Advanced Research and Reviews, vol. 20, no. 3, pp. 211-224, 2023.

S. Few, Show Me the Numbers: Designing Tables and Graphs to Enlighten, Burlingame, CA: Analytics Press, 2012.

K. Patel, “The Importance of Data-Driven Decision-Making in Public Health,” International Journal of Computer Trends and Technology, vol. 72, no. 5, pp. 27-32, 2024.

Washington State Department of Health, “Public Records Request #D000265-111720,” 7 April 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 7 April 2021].

Washington State Department of Health, “Public Records Request Reference # D000290-121720,” 30 December 2020. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 30 December 2020].

Washington State Department of Health, “News Release: New data reveals COVID-19 impact on unvaccinated,” 28 July 2021. [Online]. Available: https://content.govdelivery.com/accounts/WADOH/bulletins/2ea7345. [Accessed 28 July 2021].

Washington State Department of Health, “Public Record Request Reference # D000460-072921,” 8 September 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 8 September 2021].

Washington State Department of Health, “Public Records Request Reference # D000303-010521,” 8 January 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 8 January 2021].

Washington State Department of Health, “Public Records Requests Reference # D000301-010321,” 29 January 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 29 January 2021].

Washington State Department of Health, “Public Records Request Reference # D000347-021721,” 8 April 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 8 April 2021].

Washington State Department of Health, “Public Record Request Reference # D000459-072921,” 18 August 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 18 August 2021].

Washington State Department of Health, “Public Record Request Reference # D000489-081821,” 4 October 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 4 October 2021].

Washington State Department of Health, “Public Records Request Reference # D000302-010321,” 29 January 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records.

Washington State Department of Health, “Public Records Request Reference # D000344-021021,” 4 May 2021. [Online]. Available: https://doh.wa.gov/about-us/public-records. [Accessed 4 May 2021].

T. T. Chen, “A review of methods for misclassified categorical data in epidemiology,” Statistics in Medicine, vol. 8, no. 9, pp. 1098-1106, 1989.

D. J. Hand, “Classifier Technology and the Illusion of Progress,” Statistical Science, vol. 21, no. 1, pp. 1-14, 2006.

N. Thai, “Subject: Data Set for Dashboard emailed to Clifford Knopik,” 27 July 2020. [Online]. Available: https://www.doh.wa.gov/Emergencies/NovelCoronavirusOutbreak2020COVID19/DataDashboard. [Accessed 27 July 2020].

A. Mandavilli, “Your coronavirus test is positive; maybe it shouldn’t be,” The New York Times, 29 August 2020.

“TWiV 641: COVID-19 with Dr. Anthony Fauci,” This Week In Virology (TWiV), 16 July 2020. [Online]. Available: https://www.youtube.com/watch?v=a_Vy6fgaBPE.

G. Kolata, “Faith in Quick Tests Leads to Epidemic That Wasn’t,” The New York Times, 22 January 2007. [Online]. Available: https://www.nytimes.com/2007/01/22/health/22whoop.html.

E. Rahm and H. H. Do, “Data Cleaning: Problems and Current Approaches,” IEEE Data Engineering Bulletin, vol. 23, no. 4, pp. 3-13, 2000.

T. Dasu and T. Johnson, Exploratory Data Mining and Data Cleaning, Hoboken, NJ: John Wiley & Sons, Inc., 2003.

Washington State Department of Health, “COVID-19 Annual Report 2020,” 31 May 2023. [Online]. Available: https://doh.wa.gov/sites/default/files/2023-05/421038-2020Covid19AnnualReport.pdf. [Accessed 26 August 2024].

M. Bar-Hillel, “The base-rate fallacy in probability judgments,” Acta Psychologica, vol. 44, no. 3, pp. 211-233, 1980.

North Dakota Health & Human Services, “Base rate fallacy,” [Online]. Available: https://www.hhs.nd.gov/sites/www/files/documents/DOH%20Legacy/COVID-19_Vaccine_Base_Rate_Fallacy.pdf. [Accessed 14 August 2024].

S. Egger and G. Egger, “The vaccinated proportion of people with COVID-19 needs context,” The Lancet, vol. 399, no. 10325, p. 627, 2022.

MyNorthwest Staff, “COVID updates: State gears up for April 15 expansion of vaccine eligibility,” MyNorthwest.com, 11 April 2021. [Online]. Available: https://mynorthwest.com/2749463/covid-updates-washington-state-6/.

TVW.Org, “Governor Inslee Press Conference,” 28 July 2021. [Online]. Available: https://tvw.org/video/governor-inslee-press-conference-2021071085/?eventID=2021071085.

M. ElBermawy, “15 Most Common Deadly Mistakes in Data Analysis,” NoGood, 18 October 2018. [Online]. Available: https://nogood.io/2018/10/18/mistakes-data-analysis-marketing/. [Accessed 15 August 2024].

C. T. Bergstrom and J. D. West, Calling Bullshit: The Art of Skepticism in a Data-Driven World, New York: Random House, 2020.

A. Audi, M. AlIbrahim, M. Kaddoura, G. Hijazi and H. M. Yassine, “Seasonality of Respiratory Viral Infections: Will COVID-19 Follow Suit?,” Frontiers in Public Health, vol. 8, 2020.

C. Sailor, “Here are the latest COVID-19 numbers confirmed Monday in Washington state,” The Bellingham Herald, 7 February 2022. [Online]. Available: https://www.bellinghamherald.com/news/state/washington/article258145348.html. [Accessed 15 August 2024].

TVW.ORG, “Governor Inslee Press Conference on COVID-19,” 8 December 2020. [Online]. Available: https://tvw.org/video/governor-inslee-press-conference-on-covid-19-2020121068/?eventID=2020121068.

M. Neil, N. Fenton, J. Smalley, C. Craig, J. Guetzkow, S. McLachlan, J. Engler, D. Russell and J. Rose, “Official mortality data for England suggest systematic miscategorisation of vaccine status and uncertain effectiveness of Covid-19 vaccination,” DOI:10.13140/RG.2.2.28055.09124, 2022.

C. Severance, “Are dying with COVID-19 and dying from COVID-19 the same thing? In Oregon, they are,” KGW, 7 August 2020. [Online]. Available: https://www.kgw.com/article/news/investigations/questions-over-the-accuracy-of-how-the-state-tracks-covid-deaths/283-0b1b7b6c-695e-4313-92cf-a4cfd7510721.

A. M. Miller, “Man who died after falling from ladder ruled a coronavirus death by doctors: Report,” Washington Examiner, 20 November 2020. [Online]. Available: https://www.washingtonexaminer.com/news/836728/man-who-died-after-falling-from-ladder-ruled-a-coronavirus-death-by-doctors-report/.

M. Tomlinson, “Ethical Failures of the COVID-19 Era,” Brownstone Institute, 5 August 2023. [Online]. Available: https://brownstone.org/articles/ethical-failures-of-covid-19-era/.

Footnotes


1At the time of the author’s original calculation of 2.6% (8/4/2021), the WA DOH Dashboard showed 337,705 COVID-19 cases between 2/20/2020 and 2/16/2021. The final published total of COVID-19 cases for 2020 is 262,516. The 8,816 records identified by WA DOH between 2/20/2020 and 2/16/2021 amount to 3.4% if measured against the 2020 total alone. Until the 2021 data is finalized and published, the most accurate statement that can be made is “about 3% ± 0.5%”.
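The footnote’s arithmetic can be checked directly from the counts it cites; the sketch below (a verification aid, not part of the original analysis) assumes the 8,816 flagged records, the 337,705 dashboard cases, and the finalized 2020 total of 262,516:

```python
# Recompute the footnote's two percentages from the cited counts.
flagged = 8_816            # records identified by WA DOH, 2/20/2020-2/16/2021
dashboard_cases = 337_705  # dashboard total at the time of the 8/4/2021 calculation
final_2020_cases = 262_516 # finalized 2020 case total

pct_dashboard = 100 * flagged / dashboard_cases   # against the dashboard total
pct_2020_only = 100 * flagged / final_2020_cases  # against the 2020 total alone

print(round(pct_dashboard, 1))  # 2.6
print(round(pct_2020_only, 1))  # 3.4
```

Both figures fall within the hedged range of “about 3% ± 0.5%” stated above.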


About this paper


Cite this paper

Knopik C. Evaluating Data Integrity and Reporting Challenges in Public Health: Lessons from COVID-19 Data Collection in Washington State. Science, Public Health Policy and the Law. 2024 Oct 15; v5.2019-2024

Date submitted:

08/31/2024

Date accepted:

10/12/2024

Reviewing editor:

James Lyons-Weiler, PhD

