יונ8
Written by:
ronen ariely
08/06/2020 05:02 
Let's "play" with some data and what fit better in these days then to use the Coronavirus Covid-19 data?
Preparation: Getting the Source data
The source of the data whichn I will use in this post was taken from the CSSEGISandData GitHub project. The project includes the COVID-19 Data Repository created by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
CSSEGISandData GitHub project can be found here:
https://github.com/CSSEGISandData/COVID-19
For the sake of this sample code, I wil use the data from a specific day:
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/03-28-2020.csv
The data is in a csv comma delimited file.
Step 1: Create format file in order to query the data
The source data includes 12 columns: FIPS, Admin2, Province_State, Country_Region, Last_Update, Lat, Long_, Confirmed, Deaths, Recovered, Active, Combined_Key. You can use the following basic format file.
12.0
12
1 SQLCHAR 0 100 "," 1 FIPS ""
2 SQLCHAR 0 100 "," 2 Admin2 SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "," 3 Province_State SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "," 4 Country_Region ""
5 SQLCHAR 0 100 "," 5 Last_Update ""
6 SQLCHAR 0 100 "," 6 Lat ""
7 SQLCHAR 0 100 "," 7 Long_ ""
8 SQLCHAR 0 100 "," 8 Confirmed ""
9 SQLCHAR 0 100 "," 9 Deaths ""
10 SQLCHAR 0 100 "," 10 Recovered ""
11 SQLCHAR 0 100 "," 11 Active ""
12 SQLCHAR 0 100 "\r\n" 12 Combined_Key ""
Step 2: Query the data in the file using OPENROWSET
As first step I always like to watch the data, before I do any ETL manipulation and storing the data.
SELECT
FIPS, Admin2, Province_State, Country_Region, Last_Update, Lat,
Long_, Confirmed, Deaths, Recovered, Active, Combined_Key
FROM
OPENROWSET(
BULK N
'E:\Users\ronen\Desktop\x\03-28-2020.csv'
,
FORMATFILE = N
'E:\Users\ronen\Desktop\x\csv.fmt'
,
FIRSTROW=2,
FORMAT=
'CSV'
)
AS
CoronaData
GO

Step 3: Create new table to store the data
At this time we can create new table and import the data while executing some ETL task like converting the string data into the right data type which fit the columns
CREATE
TABLE
[dbo].[Covid-19](
[FIPS]
BIGINT
,
[Admin2]
varchar
(100)
NULL
,
[Province_State]
varchar
(100)
NULL
,
[Country_Region]
varchar
(100)
NULL
,
[Last_Update] DATETIME2
NULL
,
[Lat]
Decimal
(16,13)
NULL
,
[Long_]
Decimal
(16,13)
NULL
,
[Confirmed]
INT
NULL
,
[Deaths]
INT
NULL
,
[Recovered]
INT
NULL
,
[Active]
INT
NULL
,
[Combined_Key]
varchar
(100)
NULL
)
ON
[
PRIMARY
]
GO
Step 4: Import the data to the table
And now we can INSERT the data while converting the data types
;
with
MyCTE
as
(
SELECT
FIPS, Admin2, Province_State, Country_Region, Last_Update, Lat,
Long_, Confirmed, Deaths, Recovered, Active, Combined_Key
FROM
OPENROWSET(
BULK N
'E:\Users\ronen\Desktop\x\03-28-2020.csv'
,
FORMATFILE = N
'E:\Users\ronen\Desktop\x\csv.fmt'
,
FIRSTROW=2,
FORMAT=
'CSV'
)
AS
CoronaData
)
INSERT
[dbo].[Covid-19](
[FIPS],[Admin2],[Province_State],[Country_Region],[Last_Update],
[Lat],[Long_],[Confirmed],[Deaths],[Recovered],[Active],
[Combined_Key]
)
SELECT
CONVERT
(
BIGINT
,FIPS),
Admin2, Province_State, Country_Region,
CONVERT
(DATETIME2,Last_Update),
CONVERT
(
DECIMAL
(16,13),Lat),
CONVERT
(
DECIMAL
(16,13),Long_),
CONVERT
(
INT
,Confirmed),
CONVERT
(
INT
,Deaths),
CONVERT
(
INT
,Recovered),
CONVERT
(
INT
,Active),
Combined_Key
FROM
MyCTE
GO
Note! You can download the rest of the data of all dates from the project site and repeat the same steps in order to import it to the [Covid-19] table.
Next we can start to Analyzing the data.
Analyze the number of confirmed for each day in each country
Get a row for each country and region with the information of the number of confirmed for each day
;
with
MyCTE
as
(
select
*
from
(
SELECT
Country_Region,
--Province_State,
FirstDayOfMonth = FORMAT (
DATEADD(
DAY
,1,EOMONTH(DATEADD(
MONTH
,-1,Last_Update))),
'yyyy-MM-dd'
,
'en-US'
)
, Confirmed
--, Deaths, Recovered, Active
FROM
dbo.[Covid-19]
) src
pivot(
MAX
(Confirmed)
for
FirstDayOfMonth
in
(
-- You can add all the dates which you imported to the table
[2020-03-01], [2020-03-02], [2020-03-03], [2020-03-04], [2020-03-05]
)
) piv
)
select
*
from
MyCTE
Order
by
Country_Region
--,Province_State
GO
for the sake of the post I only imported the data from a single day which mean that all the rest of the columns which not have data as can be seen bellow:

What more can we do next?
1. You can use exactly the same query to get the Deaths, Recovered, and Active effected people in each country. Simply replace in the query the word Confirmed with one of the others.
2. You can get more details of each Province in each country. For this you simply need to un-comment the column Province_State from the query above in the SELECT and in the WHERE sections.
So...
This was a small taste demonstrate what we are doing in the last few months in order to help the appropriate authorities analyzing the numbers and get the right decision 😀
Unfortunately, we can only provide the data and the numbers but the fact is that the actual decisions are probably made based on political personal gain😕, but we are speaking about the technical aspect here so let's leave the political discussion to a different channel 🙄