Impact in Power BI of States Reporting Only Once a Week
I’ve noticed that states, especially in the South, have stopped reporting their coronavirus numbers every single day, including weekends. I get it… we’re all getting tired of keeping track of how we’re doing (or how we’re not faring). Florida, though, has taken this to another level: it now reports only once a week, and my graphs are starting to be misleading.
Image 1: Daily Cases for the South, and Florida in Particular
Image 1 has three parts: 1) on the left, line graphs of daily cases for all states in the South; 2) in the upper right, daily cases for Florida alone; and 3) in the lower right, a table of Florida’s daily case numbers for the last few days. On the left, Florida is the bright green line, and it shows the state’s daily cases rising far above every other state’s, which didn’t seem right. I pulled Florida out on its own (upper right) to see whether the graph would change, but no: the line still slopes steeply upward. That can’t be right. I knew the state had switched to reporting once every 7 days, so the line should spike on reporting days and fall to zero in between, not climb steadily. The table in the lower right confirms the switch. Look at the dates: daily reporting ended on 6/4/2021, and after that the numbers come in once every 7 days.
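One way to double-check what the table shows is to compute the gap, in days, between consecutive dates that carry a nonzero case count. Here is a minimal Python sketch, assuming the data has already been loaded as a date-to-cases mapping. The `reporting_gaps` function and the sample numbers are hypothetical, made up for illustration rather than taken from Florida’s actual counts.

```python
from datetime import date, timedelta

def reporting_gaps(daily_cases):
    """Return (date, days since previous report) for each date with a nonzero count.

    daily_cases: dict mapping datetime.date -> new cases (0 means no report that day).
    """
    reported = sorted(d for d, n in daily_cases.items() if n)
    # Pair each reporting date with the one before it and measure the gap in days.
    return [(cur, (cur - prev).days) for prev, cur in zip(reported, reported[1:])]

# Hypothetical series: daily reports through 6/4/2021, then one report every 7 days.
start = date(2021, 6, 1)
cases = {start + timedelta(days=i): 0 for i in range(25)}  # 6/1 through 6/25
for i in range(4):                                         # 6/1 to 6/4: daily reports
    cases[start + timedelta(days=i)] = 1000
for d in (date(2021, 6, 11), date(2021, 6, 18), date(2021, 6, 25)):
    cases[d] = 7000                                        # weekly lump sums

for day, gap in reporting_gaps(cases):
    print(day, gap)
```

A run of 1s followed by a run of 7s is the signature of a switch from daily to weekly reporting.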
These graphics were made in Power BI, and the data came from Wikipedia. The data sources are listed at the end of this post.
So, I tried to see what Excel would do.
Image 2 uses the same Wikipedia data for Florida as Image 1 but adds CDC data for comparison. The CDC data shows Florida’s numbers for every day rather than once a week. The blue line is Wikipedia’s data and is what the graphs in Image 1 (Power BI) should be showing: the line dropping to zero for 6 days out of 7. Image 3, on the right-hand side, shows moving averages for both the Wikipedia data (blue line) and the CDC data (orange line). With the moving average, the Wikipedia line now falls to where I would expect it to be.
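The arithmetic behind Image 3 is a trailing 7-day moving average taken over every calendar day, with the non-reporting days counted as zero. Here is a minimal Python sketch, assuming a hypothetical series in which 7,000 cases are reported once a week (illustrative numbers, not the Wikipedia or CDC values); `trailing_moving_average` is a name made up for this example.

```python
def trailing_moving_average(values, window=7):
    """Trailing moving average; the first window-1 days have no full window and are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            # Every calendar day in the window counts, including zero-report days.
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out

# Hypothetical weekly reporting: one lump of 7,000 cases, then six days of zeros.
weekly = [7000, 0, 0, 0, 0, 0, 0] * 3
print(trailing_moving_average(weekly))
```

Once the window is full, every average comes out to 1000.0 cases per day: the weekly lump spread evenly over the seven days it represents. That is the kind of flattening the Wikipedia line shows in Image 3.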
Finally, I went back to Power BI and tried graphing the moving averages.
In Image 4, which comes from Power BI, the left-hand side shows the original daily cases for the South and the upper right shows the moving averages for the South. The moving averages for the other states smooth out the day-to-day jaggedness, but Florida’s line doesn’t change at all. If you look at the table in the bottom right corner, you will see that the moving averages for the last few days are not computed the way one would expect: I would normally expect the zero days to be included in the calculation.
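One possible explanation, which I have not confirmed in Power BI: if the non-reporting days come through as blanks (or missing rows) rather than zeros, an average skips them, so each “7-day” average covers only the days that actually reported. DAX’s AVERAGE, for example, ignores blank values. The Python sketch below contrasts the two behaviors on hypothetical numbers, using None to stand in for a blank; the `moving_average` function and its `blanks_as_zero` flag are made up for illustration.

```python
def moving_average(values, window=7, blanks_as_zero=True):
    """Trailing moving average over full windows; None marks a non-reporting day.

    blanks_as_zero=True  -> None counts as 0 and the divisor is always `window`.
    blanks_as_zero=False -> None is skipped, so only reported days are averaged.
    """
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i + 1 - window : i + 1]
        if blanks_as_zero:
            out.append(sum(v or 0 for v in chunk) / window)
        else:
            reported = [v for v in chunk if v is not None]
            out.append(sum(reported) / len(reported) if reported else None)
    return out

# Hypothetical weekly reporting with blanks (None) instead of zeros.
series = [7000, None, None, None, None, None, None] * 2
print(moving_average(series, blanks_as_zero=True))
print(moving_average(series, blanks_as_zero=False))
```

With blanks skipped, every window holds a single reported value, so the “average” just repeats the weekly total of 7,000. That matches a Florida line that doesn’t smooth out at all. If blanks are the cause, filling them with 0 before averaging is one thing to try.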
So something is going on in Power BI. This is a good example of why it pays to double-check results in Power BI: it can give you unexpected results that may not be what you want. It also confirms why I want the data to come in daily, not once a week.
Sources of Data
WORLD: Cases and deaths from the Wikipedia website: https://en.wikipedia.org/wiki/COVID-19_pandemic_by_country_and_territory
US and STATES: Five main sources of data are available: Wikipedia, COVID Tracking Project, CDC, JHU, and HHS.
Wikipedia: Wikipedia has split its tables across four pages, separating cases from deaths and 2020 from 2021.
New Cases 2021: https://en.wikipedia.org/wiki/Template:2019–20_coronavirus_pandemic_data/United_States_medical_cases
New Cases 2020: https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data/United_States_daily_cases_in_2020
New Deaths 2021: https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data/United_States_daily_deaths
New Deaths 2020: https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data/United_States_daily_deaths_in_2020
COVID Tracking Project: The COVID Tracking Project was a collaborative, volunteer-run effort overseen by The Atlantic. The project ended on 3/7/2021. Its data was provided under the Creative Commons license “CC BY-NC-4.0” and covered cases, deaths, hospitalizations, and test positivity, among other data.
API: https://covidtracking.com/api/v1/states/daily.csv
Table: daily
CDC: The CDC has become my replacement for the COVID Tracking Project, although its data often arrives a few days late, and hospitalization data arrives about a week late. I’m tracking cases, deaths, hospitalizations, and positivity.
Centers for Disease Control and Prevention, COVID-19 Response. COVID-19 Case Surveillance Public Data Access, Summary, and Limitations
Table: rows
API:
Cases and deaths: https://data.cdc.gov/api/views/9mfq-cb36/rows.csv
Hospitalization: https://beta.healthdata.gov/api/views/g62h-syeh/rows.csv (Good data doesn’t start until about 7/15/2020)
Testing: https://beta.healthdata.gov/api/views/j8mb-icvb/rows.csv
Positivity: https://beta.healthdata.gov/api/views/j8mb-icvb/rows.csv
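For the Florida comparison in Image 2, the CDC cases-and-deaths file has to be narrowed down to one state. Below is a minimal, standard-library-only Python sketch. The column names (`submission_date`, `state`, `new_case`), the MM/DD/YYYY date format, and the “FL” state code are assumptions about the file’s layout; check them against the header of the CSV you actually download. The sample rows are made up so the example runs offline, and `daily_cases_for_state` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# ASSUMED column names and date format; verify against the downloaded CSV header.
DATE_COL, STATE_COL, CASES_COL = "submission_date", "state", "new_case"
DATE_FMT = "%m/%d/%Y"

def daily_cases_for_state(csv_file, state):
    """Return {date: new cases}, sorted by date, for one state from a CDC-style CSV."""
    out = {}
    for row in csv.DictReader(csv_file):
        if row[STATE_COL] != state:
            continue
        day = datetime.strptime(row[DATE_COL], DATE_FMT).date()
        # Parse through float() so values written like "1500.0" also work;
        # an empty cell is treated as 0.
        out[day] = int(float(row[CASES_COL] or 0))
    return dict(sorted(out.items()))

# Made-up sample rows standing in for https://data.cdc.gov/api/views/9mfq-cb36/rows.csv
sample = io.StringIO(
    "submission_date,state,new_case\n"
    "06/05/2021,FL,0\n"
    "06/04/2021,FL,1500\n"
    "06/04/2021,GA,800\n"
)
print(daily_cases_for_state(sample, "FL"))
```

For the real file, pass an open file handle instead of the sample, e.g. `open(path, newline="")`, where `path` is wherever you saved the download.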
Johns Hopkins University (JHU): I rarely show this data; I mostly use Wikipedia or CDC, but sometimes I like to reference JHU.
Please cite our Lancet Article for any use of this data in a publication
Provided by Johns Hopkins University
Center for Systems Science and Engineering (JHU CSSE):
https://systems.jhu.edu/
Terms of Use:
1. This data set is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) by the Johns Hopkins University on behalf of its Center for Systems Science in Engineering. Copyright Johns Hopkins University 2020.
2. Attribute the data as the “COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University” or “JHU CSSE COVID-19 Data” for short, and the url: https://github.com/CSSEGISandData/COVID-19.
3. For publications that use the data, please cite the following publication: “Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Inf Dis. 20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1”
Website: https://github.com/CSSEGISandData/COVID-19
HHS: Hospitalization data for the US, available at the national, state, or county level.