NVME Drives and Oracle Database

Hi There,

NVME drives are so hot these days. They have amazing read/write speed. my old samsung ssd drive can read and write around 550/530 MB per second and my even older 2.5inch hdd drive has 45/20 MB per second. Current NVME disc drives has around 3500 read and 3000 MB per second which is extremely high. Also, there are many people using raid 0 on nvme drives and some of them has reached 33GB per second read and 28GB write speed. so these are incredible speeds and you might want to use them in your database, especially the parts that requires too much IO like REDO Logs.

Well think again and be careful before an upgrade!

Not all Nvme drives has the same basic speed thresholds. I made some tests on a server which is a dedicated server at a hosting company and would like to share the results.

On this server there are 3 Nvme drives (TOSHIBA KXG60ZNV512G ) and 1 SSD drive (Micron 1100). SSD drive is the boot drive and Windows 2019 standard edition on it.

Now first one of the most popular test, Crystal Disk Mark Results:

SSD:

Nvme:

as you see NVME is much much faster on every category on Crystal disk mark. My basic thought was put every main component on a different NVME and check the “transaction per second” (TPS)  value and I installed 19.3 on the first NVME drive and create a database. data files reside on first Nvme, redo logs on second and archive logs on the third. That way all IO intensive components will be on different Nvme drive and I thought performance would be super high. Also I just have 1 redo log member for each redo group to eliminate extra disk IO.

To test database (a stress test actually) I used “swingbench” which is a great tool to test your database, written by Dominic Giles. Many thanks to him. I used “stress_test” model to check real transaction per second value, I used 24 simultaneous sessions and I only used INSERT statements. so screenshot of “ALL NVME” configuration test result is :

so TPS has a stable graphics with around 10,800 TPS. (testing result with all DML and select is around 18,000 TPS)

one of the biggest IO bottleneck is REDO and I think the second is archive log. when a redo log switch occurs, old redo log file is written as archive log. during that time both redo drives are read and archive log drive is written but as you see in the graph there is no down points and I assure you there were 2 – 3 redo log switches during the test above.

just for the curiosity, I put those components (redo, data file, archivelog) into different drives or put them same drives. One of them was locating redo logs into SSD drive not nvme! So, I expected much much less TPS and here is the result:

yes, that is insane! TPS was around 29,000! ((testing result with all DML and select is around 51,000 TPS) almost 3 times faster than Nvme! How is that possible. First if as you see in the graph it wasn’t stable. at some point it goes down around 7,000 TPS. That is the time redo log switch occurs. since I have 1 SSD drive I couldn’t test more but I am pretty sure if I have 2 SSDs and put a redo file each of them, that down wouldn’t happen. because SSD is not enough capacity to read and write at the same time. So SSD is 3 times faster than Nvme? not quite actually but I will come up to that later.

So what happened here? I check crystal disk mark results and they were awesome. Main problem is REDO has a very small block size which is 512 Byte. Crystal disk mark does not check that size by default so I used another disk benchmark utility. ATTO here are the results:

SSD:

Nvme:

if you check 512 Byte test, that SSD has 42 MB/s write speed and Nvme has just 28 MB/s. That explains a lot. Nvme disks are mostly designed to process big files in big chunks but not for small files in small chunks. of course this is not true for all Nvme drives. hosting companies uses “datacenter edition” nvme drives mostly. more reliable but has lower performance on some points.

For example, this is a Samsung 970 Pro Atto result:

it has around 70MB/s write speed at 512 Byte. if I have to chance to use this nvme, probably putting redo log file on this let me have much more TPS than that SSD.

So I am not saying, use SSD instead of Nvme. Just check first! as in my example it might decrease your performance.

Also I will try to test a sever with more ssd and see if I could prevent TPS down while archiving redo.

I wish you all healthy days.

Edit: by the way to eliminate the archiving load, on SSD drive, I created a one big redo file with 10GB and tested, TPS was stable and reached around 30,000 for just insert and 53,000 TPS for all operation)

Leave a Reply

Your email address will not be published.