Written by: Ronen Ariely
09/09/2018 18:12

Background

Web server software processes incoming network requests and serves content over the World Wide Web. Client software, such as a browser, initiates communication by requesting a specific resource, and the server responds with the content of that resource.

Common web servers like Apache and Internet Information Services (IIS) allow us to keep track of the incoming network requests. The server stores the information in simple text files. The raw log files, which list the information as recorded by the server, are not particularly easy to read directly, and they can become incredibly large very quickly. Moreover, we are usually interested in aggregate information like "how many people visit my website" or "how many times people view my blog".

In this short post I will show how you can use the power of SQL Server and the Transact-SQL language to (1) import IIS log files into a SQL Server table, (2) parse IIS log file data directly or from data that was imported into the SQL Server database, and (3) run several example queries that process the IIS log and extract useful information from it.

Note! You can enable or disable logging, configure the format of the IIS log file, specify the categories of information which are logged, change the directory where the log file is stored, and choose the log file format – Read more here.


Figure out which fields are stored in the log file

The categories of information that the IIS server stores in the log file depend on the IIS configuration and can be different for each website. Moreover, IIS 8.5 enables us to log custom fields in addition to the standard logged set. Therefore, in order to parse the data or build a table that fits the fields stored in your IIS logs, the first step is to figure out which fields are stored in the log file.

Each log file starts with headers, which provide information about the version of IIS, the date and time at which logging started, and the list of fields that were recorded in the log.

For example, the following headers were taken from my personal website’s log:

#Software: Microsoft Internet Information Services 10.0
#Version: 1.0
#Date: 2018-09-07 00:00:16
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken

2Do! Open the log file using a simple text editor like Notepad++, and check which fields are stored in your IIS log file. We will use this information in the following steps.

Note! It is not recommended to use Windows Notepad, since the log file might be incredibly large.


Read IIS log file On-The-Fly, using Transact SQL

We can read the log file directly using SELECT from OPENROWSET.

Note! With OPENROWSET we must specify either a format file or one of the three options SINGLE_BLOB, SINGLE_CLOB, or SINGLE_NCLOB.


Option 1: return the content as a multi-row, multi-column rowset using a format file

It is simple to work directly with log files if all the lines in the file are formatted in the same way. In these cases, we can create a format file which defines the structure, including all the fields that are stored in the log file (as we found in the previous steps).

This is also possible if the file includes headers only at the beginning of the file, since OPENROWSET supports starting the parsing from a specific line using the FIRSTROW parameter.

My log file from yesterday (2018-09-07) fits this case. It includes 4 header lines (as presented above) at the beginning of the file, while the rest of the file is well formatted. Therefore, I can use the format file below:

14.0
15
1       SQLCHAR            0       0   " "       1     DATE                               ""
2       SQLCHAR            0       0   " "       2     TIME                               ""
3       SQLCHAR            0       0   " "       3     s-ip                               SQL_Latin1_General_CP1_CI_AS
4       SQLCHAR            0       0   " "       4     cs-method                          SQL_Latin1_General_CP1_CI_AS
5       SQLCHAR            0       0   " "       5     cs-uri-stem                        SQL_Latin1_General_CP1_CI_AS
6       SQLCHAR            0       0   " "       6     cs-uri-query                       SQL_Latin1_General_CP1_CI_AS
7       SQLCHAR            0       0   " "       7     s-port                             SQL_Latin1_General_CP1_CI_AS
8       SQLCHAR            0       0   " "       8     s-username                         SQL_Latin1_General_CP1_CI_AS
9       SQLCHAR            0       0   " "       9     c-ip                               SQL_Latin1_General_CP1_CI_AS
10      SQLCHAR            0       0   " "       10    cs(User-Agent)                     SQL_Latin1_General_CP1_CI_AS
11      SQLCHAR            0       0   " "       11    cs(Referer)                        SQL_Latin1_General_CP1_CI_AS
12      SQLCHAR            0       0   " "       12    sc-STATUS                          ""
13      SQLCHAR            0       0   " "       13    sc-substatus                       ""
14      SQLCHAR            0       0   " "       14    sc-win32-STATUS                    ""
15      SQLCHAR            0       0   "\r\n"    15    time-taken                         ""

And we can read the log file directly using the query below:

-- Option 1: OPENROWSET with the FIRSTROW, and format file with full structure
SELECT *
FROM OPENROWSET(
    BULK 'E:\W3SVC2\u_ex180907.log'
    , FIRSTROW = 2 -- check the Note below about why we start from line 2
    ,FORMATFILE = 'E:\MyFormatFile.fmt'
) AS LogFile
GO

Note! I mentioned that my log file includes 4 header lines, but in my code I skipped only one line and started reading from line 2. The reason is that, because of my format file, all four header lines in the file are considered one row. According to the format file we have 15 columns separated by a space (" "). The first three header lines together contain fewer than 15 spaces, so the server continues to read the content of the following lines as if it were part of the first row. Only once it has collected 15 space-separated values does an end of line ("\r\n") tell it to start collecting the data for the next row. Therefore, we start reading from row 2.


In reality, we should never count on this solution!

Let me explain why… IIS writes headers in the middle of the file, not only at the beginning! Each time the application starts, IIS adds a new set of headers to the log file. For example, on an "app pool recycle" event new headers will be added. Therefore, we cannot use a format file to parse the entire data directly, since once the server reaches a line that is part of the headers, it will raise an error.
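For illustration, here is a hypothetical fragment of such a file (the IPs and pages are invented for the example): an application restart in the middle of the day produces a second header block right between the data rows:

```
2018-09-07 13:59:58 10.0.0.1 GET /default.aspx - 443 - 203.0.113.7 Mozilla/5.0 - 200 0 0 312
#Software: Microsoft Internet Information Services 10.0
#Version: 1.0
#Date: 2018-09-07 14:00:02
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus sc-win32-status time-taken
2018-09-07 14:00:05 10.0.0.1 GET /blog.aspx - 443 - 198.51.100.23 Mozilla/5.0 - 200 0 0 187
```

A format file that expects 15 space-separated values per row fails as soon as it reaches the second "#Software" line.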

There are two approaches to dealing with this issue: (1) we can read the file differently, as I will show next; (2) we can first clean all the headers from the file, using one of the methods I will show below.

In this section I am working on the file directly, so let's see how we can read the file differently.

Option 2: return the content as a multi-row, single-column rowset using a format file, and filter the rows using a simple WHERE condition.

The procedure is exactly as before, except that this time I will use a format file which configures a single column, like the one below:

10.0
1
1       SQLCHAR             0       0     "\r\n"   1     RawData            SQL_Latin1_General_CP1_CI_AS

And now we can use a similar query, filtering out the header lines with a simple WHERE condition:

-- Option 2: OPENROWSET and format file with “one column” structure + filter the headers
SELECT RawData
FROM OPENROWSET(
    BULK 'E:\W3SVC2\u_ex180907.log',
    FORMATFILE = 'E:\SingleColumnFormatFile.fmt'
) AS LogFile
where not LogFile.RawData LIKE '#%'
GO

Next, we can split the value in each row into columns, using XML:

-- splitting the value into columns
-- Executing this query can take a long time! Parsing strings on-the-fly in SQL Server is not recommended.
-- SQL Server does not work well with string manipulation
;With MyCTE as (
    -- Split string to columns
    SELECT LogFile.RawData,
        MyXML = CONVERT (
            XML,
            N'<MyRow><MyCol>' +
            REPLACE(REPLACE(LogFile.RawData, '&', '&amp;'), ' ', '</MyCol><MyCol>') +
            N'</MyCol></MyRow>'
            )
    FROM OPENROWSET(
        BULK 'E:\W3SVC2\u_ex180907.log'
        ,FORMATFILE = 'E:\SingleColumnFormatFile.fmt'
    ) AS LogFile
    where not LogFile.RawData LIKE '#%'
)
select
    --RawData,MyXML,
    MyXML.value('/MyRow[1]/MyCol[1]','nvarchar(max)') AS [DATE]             ,
    MyXML.value('/MyRow[1]/MyCol[2]','nvarchar(max)') AS [TIME]             ,
    MyXML.value('/MyRow[1]/MyCol[3]','nvarchar(max)') AS [s-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[4]','nvarchar(max)') AS [cs-method]        ,
    MyXML.value('/MyRow[1]/MyCol[5]','nvarchar(max)') AS [cs-uri-stem]      ,
    MyXML.value('/MyRow[1]/MyCol[6]','nvarchar(max)') AS [cs-uri-query]     ,
    MyXML.value('/MyRow[1]/MyCol[7]','nvarchar(max)') AS [s-port]           ,
    MyXML.value('/MyRow[1]/MyCol[8]','nvarchar(max)') AS [s-username]       ,
    MyXML.value('/MyRow[1]/MyCol[9]','nvarchar(max)') AS [c-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[10]','nvarchar(max)') AS [cs(User-Agent)]  ,
    MyXML.value('/MyRow[1]/MyCol[11]','nvarchar(max)') AS [cs(Referer)]     ,
    MyXML.value('/MyRow[1]/MyCol[12]','nvarchar(max)') AS [sc-STATUS]       ,
    MyXML.value('/MyRow[1]/MyCol[13]','nvarchar(max)') AS [sc-substatus]    ,
    MyXML.value('/MyRow[1]/MyCol[14]','nvarchar(max)') AS [sc-win32-STATUS] ,
    MyXML.value('/MyRow[1]/MyCol[15]','nvarchar(max)') AS [time-taken]     
from MyCTE
GO

  

Option 3: return the content as a single-row, single-column rowset using SINGLE_CLOB, and split the value into rows using the STRING_SPLIT function

-- Option 3: OPENROWSET using SINGLE_CLOB, and split the value into rows using  STRING_SPLIT function
SELECT [Value]
FROM OPENROWSET(
    BULK 'E:\W3SVC2\u_ex180907.log'
    ,SINGLE_CLOB
) AS LogFile
CROSS APPLY STRING_SPLIT(LogFile.BulkColumn,char(10))-- Char(10) is "Line Feed"
where not [Value] LIKE '#%'
GO

As in the previous option, we can split the value in each row into multiple columns using XML:

-- splitting the value into columns
-- Executing this query can take a long time! Parsing strings on-the-fly in SQL Server is not recommended.
-- SQL Server does not work well with string manipulation
;With MyCTE as (
    -- Split string to columns
    SELECT [Value],
        MyXML = CONVERT (
            XML,
            N'<MyRow><MyCol>' +
            -- Since STRING_SPLIT works on a CHAR(1) separator, I clear the CHAR(13) and use CHAR(10) as the delimiter
            REPLACE(REPLACE(REPLACE([Value], '&', '&amp;'), CHAR(13), ''), ' ', '</MyCol><MyCol>') +
            N'</MyCol></MyRow>'
            )
    FROM OPENROWSET(
        BULK 'E:\W3SVC2\u_ex180907.log',
        SINGLE_CLOB
    ) AS LogFile
    CROSS APPLY STRING_SPLIT(LogFile.BulkColumn,char(10))-- Char(10) is "Line Feed"
    where not [Value] LIKE '#%'
)
select
    --RawData,MyXML,
    MyXML.value('/MyRow[1]/MyCol[1]','nvarchar(max)') AS [DATE]             ,
    MyXML.value('/MyRow[1]/MyCol[2]','nvarchar(max)') AS [TIME]             ,
    MyXML.value('/MyRow[1]/MyCol[3]','nvarchar(max)') AS [s-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[4]','nvarchar(max)') AS [cs-method]        ,
    MyXML.value('/MyRow[1]/MyCol[5]','nvarchar(max)') AS [cs-uri-stem]      ,
    MyXML.value('/MyRow[1]/MyCol[6]','nvarchar(max)') AS [cs-uri-query]     ,
    MyXML.value('/MyRow[1]/MyCol[7]','nvarchar(max)') AS [s-port]           ,
    MyXML.value('/MyRow[1]/MyCol[8]','nvarchar(max)') AS [s-username]       ,
    MyXML.value('/MyRow[1]/MyCol[9]','nvarchar(max)') AS [c-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[10]','nvarchar(max)') AS [cs(User-Agent)]  ,
    MyXML.value('/MyRow[1]/MyCol[11]','nvarchar(max)') AS [cs(Referer)]     ,
    MyXML.value('/MyRow[1]/MyCol[12]','nvarchar(max)') AS [sc-STATUS]       ,
    MyXML.value('/MyRow[1]/MyCol[13]','nvarchar(max)') AS [sc-substatus]    ,
    MyXML.value('/MyRow[1]/MyCol[14]','nvarchar(max)') AS [sc-win32-STATUS] ,
    MyXML.value('/MyRow[1]/MyCol[15]','nvarchar(max)') AS [time-taken]     
from MyCTE
GO

Note! If you execute all three solutions in SSMS and check the combined execution plan, the third query seems to be much better, but this is not necessarily correct! Reading the entire file using SINGLE_CLOB is very useful, but splitting the text using the STRING_SPLIT function can cost a lot of resources which are not represented in the execution plan (I wrote a blog post with several examples on this topic).

Clean headers in IIS log file before using it

Parsing the file on-the-fly, as in the second and third options above, can be very slow and performs very badly. And since the log file that IIS creates does not necessarily include headers only at the beginning of the file, we cannot count on the first solution.

The best solution for most cases is to clean the log files in advance, before reading the data on-the-fly as we did above or importing the data into SQL Server as we will do below.

We can clean the files using any programming language and/or several tools. These options are outside the scope of this post, but for the sake of completeness, and in order to provide a full solution for the readers, I will show two simple solutions, one using SQL Server and one using an external tool.

Clean the log file using SQL Server and Transact SQL

Step 1: Create a Staging Table named MyStagingTableName

-- Create Staging Table
DROP TABLE IF EXISTS dbo.MyStagingTableName;
CREATE TABLE dbo.MyStagingTableName (line NVARCHAR(MAX))
GO

Step 2: Import the content of the log file to the Staging Table using BULK INSERT

-- Import the data
BULK INSERT dbo.MyStagingTableName
FROM 'E:\W3SVC2\u_ex180907.log'
WITH (
 ROWTERMINATOR = '\n'
)
GO
SELECT * FROM dbo.MyStagingTableName
GO

Step 3: Delete unwanted rows with simple query

-- clean headers
DELETE dbo.MyStagingTableName
WHERE line like '#%'
GO
SELECT * FROM dbo.MyStagingTableName
GO

Step 4: Export Staging Table to flat file with bcp

There are multiple tools built into SQL Operations Studio (SOS) and SQL Server Management Studio (SSMS), as well as other utilities which come with SQL Server. This is not the topic of this tutorial, so I will simply use one of these options, bcp:

-- export the clean content to a new, clean file
xp_cmdshell 'bcp IIS_Log_DB.dbo.MyStagingTableName OUT "E:\W3SVC2\u_ex180907_clean.log" -c -T -S .\SQL_Instance_Name'
GO

  

Clean IIS log file using Notepad++

Step 1: Open the file with Notepad++

Step 2: Open the search window (press Ctrl + F)

Step 3: Move to the “Replace” tab

Step 4: Select search mode “Regular Expression”

Step 5: In the text box “Find what” enter the expression: ^[#].*\R

Note! The sign "^" indicates the start of a line, "[#]" matches a single # character, and ".*" matches everything else in the line. The addition of "\R" also removes the end-of-line characters. Without it we would get blank lines, since we would remove the content but not the line break.

Step 6: Leave the "Replace with" text box empty (replacing with an empty string is equivalent to removing the text).

Step 7: Make sure that the cursor is at the beginning of the file

Step 8: Click "Replace All"

Step 9: Save the file under a new name

  

Import IIS log files to SQL Server Table

OK… we saw how we can read the file using OPENROWSET, and we discussed the option of cleaning the file in advance. But data is only a tool, and our goal is to get useful information by analyzing and processing the data. We can execute queries to analyze the data on-the-fly, for example by using one of the above queries as a Common Table Expression (CTE) and running aggregate tasks over it.
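For example, here is a minimal on-the-fly sketch, reusing the hypothetical file path from above, that answers "how many requests did my site get on 2018-09-07" without importing anything:

```sql
-- Sketch: wrap the raw read in a CTE and aggregate on-the-fly (hypothetical file path)
;With RawRows as (
    SELECT [Value]
    FROM OPENROWSET(
        BULK 'E:\W3SVC2\u_ex180907.log'
        ,SINGLE_CLOB
    ) AS LogFile
    CROSS APPLY STRING_SPLIT(LogFile.BulkColumn, char(10)) -- Char(10) is "Line Feed"
    where not [Value] LIKE '#%'
      and [Value] <> '' -- ignore a possible trailing empty line
)
SELECT COUNT(*) AS [TotalRequests]
FROM RawRows
GO
```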

Processing the data on-the-fly might fit one-time needs, where we only process a short period of time, but it can be a very bad solution for the long run. There is no reason to read the entire log file each time we execute a query in order to get the information out of it. Moreover, as I mentioned several times, the log files are usually incredibly large, and in real-life cases we usually want to process several log files together.

The best solution for most cases is to import the data from all the log files into a SQL Server table, and process the data as needed from SQL Server. For this task, let's go over the procedure we can use in order to import an IIS log file into SQL Server.

Preparation: creating table to store the data

Let’s create new database to store all our IIS information

Note! For the sake of this tutorial, I highly recommend using SQL Operations Studio (SOS) instead of SQL Server Management Studio (SSMS). SQL Operations Studio includes an awesome feature which allows us to present the result of any query as a graph. I am going to use this feature to get a nice presentation when I analyze the data, without the need to export the data to another application for reporting.


2Do! Open query editor (SOS or SSMS for example).

2Do! Create a new database and switch to using it:

create database IIS_Log_DB
GO
 
use IIS_Log_DB
GO

2Do! Create a new table, according to the fields that are stored in your log file. If you are not sure which fields are needed, go back to the explanation in the "Figure out which fields are stored in the log file" section.

---------------------------------------------------------
-- Preparation: creating table to store the data
---------------------------------------------------------
DROP TABLE IF EXISTS dbo.IisLogsTbl
CREATE TABLE dbo.IisLogsTbl(
 [DATE]             [DATE] NULL,
 [TIME]             [TIME] NULL,
 [s-ip]             [VARCHAR] (48) NULL,
 [cs-method]        [VARCHAR] (8) NULL,
 [cs-uri-stem]      [VARCHAR] (255) NULL,
 [cs-uri-query]     [VARCHAR] (2048) NULL,
 [s-port]           [VARCHAR] (4) NULL,
 [s-username]       [VARCHAR] (256) NULL,
 [c-ip]             [VARCHAR] (48) NULL,
 [cs(User-Agent)]   [VARCHAR] (1024) NULL,
 [cs(Referer)]      [VARCHAR] (4096) NULL,
 [sc-STATUS]        [INT] NULL,
 [sc-substatus]     [INT] NULL,
 [sc-win32-STATUS]  [BIGINT] NULL,
 [time-taken]       [INT] NULL,
 INDEX cci CLUSTERED COLUMNSTORE
)
GO

  

Import the data using BULK INSERT

BULK INSERT doesn't have the ability to logically skip rows based on string values. It only allows us to start reading from a specific line in the file. If your log files are cleaned, or include headers only at the beginning of the file, then this solution should fit you best.

---------------------------------------------------------
-- Import the data using BULK INSERT
---------------------------------------------------------
TRUNCATE TABLE dbo.IisLogsTbl;
BULK INSERT dbo.IisLogsTbl
FROM 'E:\W3SVC2\u_ex180907.log'
WITH (
    FIRSTROW = 2,
    FIELDTERMINATOR = ' ',
    ROWTERMINATOR = '\n'
)
GO
SELECT * FROM dbo.IisLogsTbl
GO
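Once the data is in the table, the aggregate questions from the beginning of the post become simple queries. A short sketch (assuming the dbo.IisLogsTbl structure defined above; note that counting distinct client IPs as "visitors" is only an approximation):

```sql
-- "How many times do people view my blog?" - requests per day
SELECT [DATE], COUNT(*) AS [Requests]
FROM dbo.IisLogsTbl
GROUP BY [DATE]
ORDER BY [DATE]
GO

-- "How many people visit my website?" - approximated by distinct client IPs per day
SELECT [DATE], COUNT(DISTINCT [c-ip]) AS [UniqueVisitors]
FROM dbo.IisLogsTbl
GROUP BY [DATE]
ORDER BY [DATE]
GO

-- Top 10 most requested pages
SELECT TOP (10) [cs-uri-stem], COUNT(*) AS [Requests]
FROM dbo.IisLogsTbl
GROUP BY [cs-uri-stem]
ORDER BY COUNT(*) DESC
GO
```

In SQL Operations Studio, the per-day results can be presented directly as a chart, as mentioned in the preparation note above.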

Note! Notice that with BULK INSERT a format file is not a must, but this approach fits only well-formatted files.


IMPORT THE DATA USING SELECT FROM OPENROWSET, with a full structure format file

This time we combine the INSERT with OPENROWSET and the full structure format file:

---------------------------------------------------------
-- Import the data using OPENROWSET with the FIRSTROW, and format file with full structure
---------------------------------------------------------
TRUNCATE TABLE dbo.IisLogsTbl;
;With MyCTE as(
    SELECT *
    FROM OPENROWSET(
        BULK 'E:\W3SVC2\u_ex180907.log'
        , FIRSTROW = 2
        ,FORMATFILE = 'E:\FullStructureFormatFile.fmt'
    ) AS LogFile
)
INSERT dbo.IisLogsTbl
SELECT * FROM MyCTE
GO
SELECT * FROM dbo.IisLogsTbl
GO

The big advantage of using INSERT FROM OPENROWSET is that we have the ability to logically filter rows based on string values, since we use a simple SELECT FROM query. This means that we can filter the data once it is returned by OPENROWSET. But like the solution based on BULK INSERT, the above solution will not work if the file is not well formatted, since the failure happens inside OPENROWSET, before we get to filter the data.

These two options are the fastest and most effective if our IIS log file includes headers only at the beginning of the file, but they do not fit production, since we cannot count on having clean log files with headers only at the beginning, as I explained above. Therefore, if you do not want to clean the files in advance (which is recommended for most cases), you should use one of the next solutions.

IMPORT THE DATA USING SELECT FROM OPENROWSET, with one column structure format file

---------------------------------------------------------
-- Import the data using OPENROWSET with a one-column-structure format file
---------------------------------------------------------
TRUNCATE TABLE dbo.IisLogsTbl;
;With MyCTE as (
    -- Split string to columns
    SELECT LogFile.RawData,
        MyXML = CONVERT (
            XML,
            N'<MyRow><MyCol>' +
            REPLACE(REPLACE(LogFile.RawData, '&', '&amp;'), ' ', '</MyCol><MyCol>') +
            N'</MyCol></MyRow>'
            )
    FROM OPENROWSET(
        BULK 'E:\W3SVC2\u_ex180907.log',
        FORMATFILE = 'E:\SingleColumnFormatFile.fmt'
    ) AS LogFile
    where not LogFile.RawData LIKE '#%'
)
INSERT dbo.IisLogsTbl (
    [DATE],[TIME],[s-ip],[cs-method],[cs-uri-stem],[cs-uri-query],
    [s-port],[s-username],[c-ip],[cs(User-Agent)],[cs(Referer)],
    [sc-STATUS],[sc-substatus],[sc-win32-STATUS],[time-taken]
)
select
    --RawData,MyXML,
    MyXML.value('/MyRow[1]/MyCol[1]','nvarchar(max)') AS [DATE]             ,
    MyXML.value('/MyRow[1]/MyCol[2]','nvarchar(max)') AS [TIME]             ,
    MyXML.value('/MyRow[1]/MyCol[3]','nvarchar(max)') AS [s-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[4]','nvarchar(max)') AS [cs-method]        ,
    MyXML.value('/MyRow[1]/MyCol[5]','nvarchar(max)') AS [cs-uri-stem]      ,
    MyXML.value('/MyRow[1]/MyCol[6]','nvarchar(max)') AS [cs-uri-query]     ,
    MyXML.value('/MyRow[1]/MyCol[7]','nvarchar(max)') AS [s-port]           ,
    MyXML.value('/MyRow[1]/MyCol[8]','nvarchar(max)') AS [s-username]       ,
    MyXML.value('/MyRow[1]/MyCol[9]','nvarchar(max)') AS [c-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[10]','nvarchar(max)') AS [cs(User-Agent)]  ,
    MyXML.value('/MyRow[1]/MyCol[11]','nvarchar(max)') AS [cs(Referer)]     ,
    MyXML.value('/MyRow[1]/MyCol[12]','nvarchar(max)') AS [sc-STATUS]       ,
    MyXML.value('/MyRow[1]/MyCol[13]','nvarchar(max)') AS [sc-substatus]    ,
    MyXML.value('/MyRow[1]/MyCol[14]','nvarchar(max)') AS [sc-win32-STATUS] ,
    MyXML.value('/MyRow[1]/MyCol[15]','nvarchar(max)') AS [time-taken]     
from MyCTE
GO
SELECT * FROM dbo.IisLogsTbl
GO

  

IMPORT THE DATA USING SELECT FROM OPENROWSET, with single row and one column structure

---------------------------------------------------------
-- Import the data using OPENROWSET with SINGLE_CLOB, and split the value into rows using the STRING_SPLIT function
---------------------------------------------------------
SELECT [Value]
FROM OPENROWSET(
    BULK 'E:\W3SVC2\u_ex180907.log'
    ,SINGLE_CLOB
) AS LogFile
CROSS APPLY STRING_SPLIT(LogFile.BulkColumn,char(10))-- Char(10) is "Line Feed"
where not [Value] LIKE '#%'
GO
 
-- splitting the value into columns
-- Executing this query can take a long time! Parsing strings on-the-fly in SQL Server is not recommended.
-- SQL Server does not work well with string manipulation
;With MyCTE as (
    -- Split string to columns
    SELECT [Value],
        MyXML = CONVERT (
            XML,
            N'<MyRow><MyCol>' +
            -- Since STRING_SPLIT works on a CHAR(1) separator, I clear the CHAR(13) and use CHAR(10) as the delimiter
            REPLACE(REPLACE(REPLACE([Value], '&', '&amp;'), CHAR(13), ''), ' ', '</MyCol><MyCol>') +
            N'</MyCol></MyRow>'
            )
    FROM OPENROWSET(
        BULK 'E:\W3SVC2\u_ex180907.log',
        SINGLE_CLOB
    ) AS LogFile
    CROSS APPLY STRING_SPLIT(LogFile.BulkColumn,char(10))-- Char(10) is "Line Feed"
    where not [Value] LIKE '#%'
)
INSERT dbo.IisLogsTbl (
    [DATE],[TIME],[s-ip],[cs-method],[cs-uri-stem],[cs-uri-query],
    [s-port],[s-username],[c-ip],[cs(User-Agent)],[cs(Referer)],
    [sc-STATUS],[sc-substatus],[sc-win32-STATUS],[time-taken]
)
select
    --RawData,MyXML,
    MyXML.value('/MyRow[1]/MyCol[1]','nvarchar(max)') AS [DATE]             ,
    MyXML.value('/MyRow[1]/MyCol[2]','nvarchar(max)') AS [TIME]             ,
    MyXML.value('/MyRow[1]/MyCol[3]','nvarchar(max)') AS [s-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[4]','nvarchar(max)') AS [cs-method]        ,
    MyXML.value('/MyRow[1]/MyCol[5]','nvarchar(max)') AS [cs-uri-stem]      ,
    MyXML.value('/MyRow[1]/MyCol[6]','nvarchar(max)') AS [cs-uri-query]     ,
    MyXML.value('/MyRow[1]/MyCol[7]','nvarchar(max)') AS [s-port]           ,
    MyXML.value('/MyRow[1]/MyCol[8]','nvarchar(max)') AS [s-username]       ,
    MyXML.value('/MyRow[1]/MyCol[9]','nvarchar(max)') AS [c-ip]             ,
    MyXML.value('/MyRow[1]/MyCol[10]','nvarchar(max)') AS [cs(User-Agent)]  ,
    MyXML.value('/MyRow[1]/MyCol[11]','nvarchar(max)') AS [cs(Referer)]     ,
    MyXML.value('/MyRow[1]/MyCol[12]','nvarchar(max)') AS [sc-STATUS]       ,
    MyXML.value('/MyRow[1]/MyCol[13]','nvarchar(max)') AS [sc-substatus]    ,
    MyXML.value('/MyRow[1]/MyCol[14]','nvarchar(max)') AS [sc-win32-STATUS] ,
    MyXML.value('/MyRow[1]/MyCol[15]','nvarchar(max)') AS [time-taken]     
from MyCTE
GO

  

Conclusions

We can import the data to SQL Server using OPENROWSET or BULK INSERT.

If the file includes headers only at the beginning of the content, then we can use a simple BULK INSERT solution or SELECT FROM OPENROWSET with a full-structure format file. These solutions are the fastest and best in this specific case, but they do not fit production, since IIS adds headers each time the application starts, which means it might add headers in the middle of the log file as well.

For the general case, we can use INSERT FROM OPENROWSET with SINGLE_CLOB, where we read the file as a single value (one row with one column) and split the value into multiple rows using the STRING_SPLIT function. Another option is INSERT FROM OPENROWSET with a single-column format file. In both of these solutions we need to split the value of the single column into the final column structure of the table, which is done using XML.

Well ... So, what did we have in this post?

In this post I showed you how you can parse the data in IIS log files while reading them using OPENROWSET, how to clean an IIS log file, and several options you can use in order to import IIS log files into SQL Server using the Transact-SQL language. Along the way we used the STRING_SPLIT function and a small trick for splitting a string into multiple columns using XML.

I hope that this post was useful and clear. It is a bit long, since I tried to put full explanations together with solutions to cover all cases. You are more than welcome to post comments and send me feedback, or contact me on Facebook or LinkedIn.

  


Tags: IIS , Log , T-SQL
Categories: SQL