Several days ago, I noticed this page in the documentation:
According to the document, if you want to access the error log files on Linux, you can convert the files to UTF-8.
This might make sense on Windows, since SQL Server on Windows writes the error log files in UTF-16 encoding, but it sounds a bit strange to me in a document that focuses on SQL Server on Linux.
The error log files are tied to the operating system, and Linux uses UTF-8 as its default encoding. This means that there is one issue with the document: on Linux the files are already written in UTF-8, not in UTF-16 as on Windows, which makes this paragraph really confusing. Therefore, I decided to check and confirm that the error log files are written in UTF-8, as I expected.
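To illustrate why the difference matters, here is a minimal Python sketch of how the same text looks in the two encodings, and what happens when UTF-8 bytes are read as if they were UTF-16. The sample line is made up for the illustration and is not copied from a real error log.

```python
# Hypothetical ASCII-only log text (20 characters), used only for illustration.
line = "Starting up database"

utf8_bytes = line.encode("utf-8")
utf16_bytes = line.encode("utf-16-le")  # little-endian, no byte order mark

# For ASCII text, UTF-16 needs two bytes per character; the second one is NUL.
print(len(utf8_bytes), len(utf16_bytes))  # 20 40
print(utf16_bytes[:6])                    # b'S\x00t\x00a\x00'

# Reading UTF-8 bytes as UTF-16 pairs up unrelated bytes,
# so the result is half as many (wrong) characters.
misread = utf8_bytes.decode("utf-16-le")
print(len(misread), misread == line)      # 10 False
```

In other words, a conversion that assumes UTF-16 input only helps when the file really is UTF-16.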
To clarify: I only use Ubuntu for SQL Server, based on performance tests I ran in the past, and therefore I only tested on Ubuntu. I assume that the most common Linux distributions behave the same, but I did not check it. If anyone can confirm this one way or the other, you are welcome to add a comment.
The test was simple, and you can reproduce it step by step:
Step 1: Create virtual machines
For the sake of the test I created 3 new Azure Virtual Machines using the built-in templates. Two machines ran Ubuntu 18.04.2 LTS, and the third machine ran Ubuntu 16.04.6 LTS.
Note: you can check which OS version you are using with the command lsb_release -a.
Note! SQL Server on Linux does not support multiple instances, which is why I needed two machines with the same OS in order to test two versions of SQL Server.
Step 2: Install SQL Server
I installed SQL Server 2019 CTP 3.1 on the first machine, and SQL Server 2017 on the other two machines.
To install SQL Server on Ubuntu, you can follow the instructions in the following documentation:
Step 3: On each server, create a new database with a name that is not in English. In my case, I created a database named רונן, which is my first name in Hebrew.
Note! You should NEVER use non-English characters in the names of entities like databases, tables, and so on! This is supported in SQL Server but STRONGLY DISCOURAGED.
The reason I created a new database with a non-English name is that when the server starts, it goes over all the databases and writes information about them to the error log file. This way, I can make sure that the server writes some non-English characters to the log file. If the log file contains no non-ASCII characters, it will be recognized as ASCII, since all ASCII characters (code points 0-127) are encoded the same way in UTF-8 (UTF-8 is fully backward compatible with ASCII).
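The point about ASCII and UTF-8 can be checked with a few lines of Python. This is only a small sketch: the ASCII sample line is invented for the example, and the Hebrew name is written with escape sequences to avoid right-to-left display issues in the code.

```python
# The database name from this post (the Hebrew letters resh, vav, nun, final nun).
hebrew_name = "\u05e8\u05d5\u05e0\u05df"
# Hypothetical ASCII-only line, not taken from a real error log.
ascii_line = "Starting up database 'master'."

# ASCII-only text: the UTF-8 bytes are identical to the ASCII bytes,
# so a tool looking at the bytes cannot tell the two encodings apart.
ascii_same_as_utf8 = ascii_line.encode("ascii") == ascii_line.encode("utf-8")

# Hebrew text: each letter takes two bytes in UTF-8,
# and every one of those bytes is outside the ASCII range (128 or above).
hebrew_utf8 = hebrew_name.encode("utf-8")
only_high_bytes = all(b >= 0x80 for b in hebrew_utf8)

print(ascii_same_as_utf8)  # True
print(len(hebrew_utf8))    # 8
print(only_high_bytes)     # True
```

This is why the test needs at least one non-ASCII character in the log: without it, the bytes on disk give no evidence for UTF-8 over plain ASCII.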
Step 4: Restart the server
> sudo systemctl restart mssql-server
Step 5: Stop the server and check the encoding
> sudo systemctl stop mssql-server
> sudo file /var/opt/mssql/log/errorlog
The result was the same in all tests!
/var/opt/mssql/log/errorlog: UTF-8 Unicode text, with very long lines, with CRLF line terminators
It is clear that the server used UTF-8.
* To be 100% sure, I also read the file using the command 'more', which does not support UTF-16.
* And finally, I even moved the file to Windows and opened it with Notepad.
Next, I confirmed that nothing has changed on Windows: as expected, the error log files on Windows are encoded in UTF-16.
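If you prefer to check the bytes yourself instead of relying on the file command, the following is a rough Python sketch of the same idea. The function name guess_log_encoding is my own invention for this example, and the check assumes that a UTF-16 file starts with a byte order mark, which may not hold for every file.

```python
import codecs


def guess_log_encoding(data: bytes) -> str:
    """Roughly classify raw bytes as utf-16, ascii, utf-8, or unknown."""
    # Assumption: a UTF-16 file starts with a byte order mark (FF FE or FE FF).
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Pure ASCII is also valid UTF-8, so it is reported separately.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Strict decoding raises an error on bytes that are not valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


# Made-up sample line containing the Hebrew database name from this post.
sample = "Starting up database '\u05e8\u05d5\u05e0\u05df'.\r\n"
print(guess_log_encoding(sample.encode("utf-8")))   # utf-8
print(guess_log_encoding(sample.encode("utf-16")))  # utf-16
print(guess_log_encoding(b"plain ascii line\r\n"))  # ascii
```

To try it on a real file, read the file in binary mode, for example open(path, "rb").read(), and pass the bytes to the function. Reading /var/opt/mssql/log/errorlog will probably require elevated permissions.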
1. SQL Server on Ubuntu Linux stores the error log file using UTF-8 encoding. Therefore, the documentation is misleading and should be fixed, IMO.
2. In my opinion, Microsoft should change the encoding on Windows to UTF-8 as well.