Oracle Table Compression Part 1

Hi,

I have been dealing with table compression a lot during my development work and I still do. I wanted to share some information about table compression, but I believe this will be a long explanation, so I will write it in parts; this is Part 1.

Table compression is a very useful feature of Enterprise Edition (unfortunately it cannot be used in Standard Edition). It compresses your data and reduces the space requirement of the table, not just on disk but also in memory. When you compress a table, Oracle always stores it compressed, so it also consumes less memory, and I think this is very important for large databases running with relatively small amounts of memory.

Of course compression has a disadvantage, and that is why it is not the default option for a table. As you can guess, it has a CPU overhead, which means your CPU will do some extra work compressing and decompressing data. Most of the time this overhead is acceptable because you gain extra disk and memory space. Do not think of this as just a space gain: while reading data from disk (and loading it into memory) you will read much less data, and this can significantly increase your query performance because of the reduced disk I/O.

Basically there are 3 compression algorithms for tables:

  1. Basic
  2. Advanced (if I am not wrong, it was called “FOR OLTP” compression in earlier releases)
  3. Hybrid Columnar (or Column Store)

There are important things to know about these compression methods. The most important one is that ADVANCED compression requires the “Advanced Compression” license, so it will cost you a lot! You can find the details of the Advanced Compression license at this link: https://www.oracle.com/technetwork/database/options/compression/advanced-compression-datasheet-134474.pdf

The other two, Basic and Hybrid Columnar, do not require an extra license, but Hybrid Columnar Compression (also called HCC or Column Store compression) requires special storage systems; your storage must support HCC.

Basic Compression

You can compress a table with an ALTER TABLE command or create the table with a compression option, but there are a few important points. I have seen some clients who “mark” a table as compressed without knowing that the existing rows are not compressed! Here is an example:
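The demo went along these lines (a sketch; the table name and the source of the dummy data are just illustrative):

create table tmp as select * from dba_objects;

alter table tmp compress;   -- or: row store compress basic

-- check a random row: a return value of 1 means no compression
select dbms_compression.get_compression_type(user, 'TMP', rowid) comp_type
  from tmp
 where rownum = 1;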

I created a dummy table and “marked” it as “compress”, but the size didn’t change and the rows were not compressed! I know that because DBMS_COMPRESSION.GET_COMPRESSION_TYPE returned 1 for a random row, and 1 means no compression.

Marking a table as “compress” means that subsequent rows will be compressed, but not the existing ones. I used the ROW STORE COMPRESS BASIC method above (or just “compress”), and only some specific operations are compressed with it. This is the most basic compression, but it has a HIGH compression ratio!
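A direct-path (bulk) insert is the kind of operation that does get compressed; something like this sketch, reusing the TMP table from above and simply doubling it:

-- direct-path insert: rows loaded this way are stored compressed
insert /*+ append */ into tmp select * from tmp;
commit;

select round(bytes / 1024 / 1024) size_mb
  from user_segments
 where segment_name = 'TMP';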

I just bulk inserted the same rows (basically I doubled the number of rows), but the table size increased by only 4 MB, not 12 MB.

So, if you want to compress the existing rows, you must MOVE the table:
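In its simplest form (a sketch, again on TMP):

alter table tmp move;

select round(bytes / 1024 / 1024) size_mb
  from user_segments
 where segment_name = 'TMP';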

Here it is: the previous rows are also compressed. You can combine these operations with ALTER TABLE TMP MOVE COMPRESS; it will compress the currently existing rows. Of course, these operations will invalidate the indexes! You can use the “update indexes” clause, but it will increase the compression time. I will talk about this in later posts.

You can use different variants of the syntax:

Alter table tmp compress;
Alter table tmp row store compress;
Alter table tmp row store compress basic;

All the commands above do the same thing: compress the table with the BASIC method. BASIC compression will compress new rows only if the data comes in as a bulk (direct-path) insert or an array insert. Row-by-row inserts (insert into … values (…)), for example, won’t be compressed. I mostly use this compression for previous months’ data, which is not frequently accessed and, when it is accessed, is only queried with no DML, because updated rows also won’t be compressed.

Advanced Compression

Don’t let the naming mislead you: it is not more “advanced” than BASIC compression. Actually, it has almost the same compression level (HIGH) as BASIC compression, but Advanced table compression has some advantages. The most important one is that it can be used for OLTP systems, which means it can be used for active data, not just for previous months’ data. The performance overhead for ongoing DML operations is lower than with BASIC compression, and it can even compress row-by-row inserts.

Compression is not done right after a row is processed (inserted or updated). When a block reaches a threshold, Oracle compresses that block in the background, so your operations run at full speed and even newly inserted/updated rows get compressed. BUT it requires the Advanced Compression license, as I said before.
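For reference, the syntax looks like this (TMP is again my example table; as with BASIC, marking the table only affects future blocks, so a MOVE is still needed for the existing rows):

alter table tmp row store compress advanced;

-- or rebuild the existing rows compressed at the same time:
alter table tmp move row store compress advanced;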

Hybrid Columnar Compression

This is the king of compression. It compresses data much, much more than the first two and, as you can guess, it has a disadvantage: HCC is a far more CPU-intensive compression, requiring much more CPU than BASIC and ADVANCED compression, so it is mostly used for rarely accessed rows. It has 2 methods, each with 2 sub-methods:

  1. Query Low (Lowest Compression Ratio, Lowest CPU Usage)
  2. Query High
  3. Archive Low
  4. Archive High (Highest Compression Ratio, Highest CPU Usage)

From 1 to 4, the compression ratio increases, but so does the CPU usage.
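The corresponding clauses, assuming HCC-capable storage and my TMP table:

alter table tmp move column store compress for query low;
alter table tmp move column store compress for query high;
alter table tmp move column store compress for archive low;
alter table tmp move column store compress for archive high;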

Basically, it uses a different logic for compression. The first two methods compress rows within the same block: they write repeating data once and use references to its actual places. Column store compression compresses columns, not rows, within a block, and this is more logical because repetitive data mostly lives in the same column, for example gender, customer id or product code.

So, in this post I wanted to share a real-life example with compression ratios and speeds. Of course I cannot provide the actual data here, but I used a table with real data, copied it and then compressed the same table with the different methods. Here are the results:

This small piece of code compresses the table TMP with the different methods and reports the timing and size. The first one is NOCOMPRESS, so we will see the size of the table without compression and then the results of the other methods:
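I can’t paste the original script here, but it was essentially something along these lines (a sketch, assuming a copy of the real table named TMP, HCC-capable storage and enough free space to rebuild it; sizes are read from USER_SEGMENTS):

set serveroutput on

declare
  procedure compress_and_report(p_clause varchar2) is
    l_start number := dbms_utility.get_time;
    l_mb    number;
  begin
    -- rebuild the table with the given compression clause
    execute immediate 'alter table tmp move ' || p_clause;
    select round(bytes / 1024 / 1024) into l_mb
      from user_segments
     where segment_name = 'TMP';
    dbms_output.put_line(rpad(p_clause, 45) || lpad(l_mb, 7) || ' MB  ' ||
                         round((dbms_utility.get_time - l_start) / 100) || ' sec');
  end;
begin
  compress_and_report('nocompress');
  compress_and_report('row store compress basic');
  compress_and_report('row store compress advanced');
  compress_and_report('column store compress for query low');
  compress_and_report('column store compress for query high');
  compress_and_report('column store compress for archive low');
  compress_and_report('column store compress for archive high');
end;
/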

Without compression this table’s size is 1408 MB. As you see, BASIC and ADVANCED compression reduce the size to 280 MB, but they take 37 and 31 seconds to compress.

Interestingly, Hybrid Columnar compression (column store) “Query Low” takes less time than the others, just 20 seconds, and the size is reduced to 176 MB, almost half of the first two. You can also see that the size keeps decreasing from “Query Low” to “Query High”. The king is Query High: the table size is 72 MB. A 20x compression ratio is very impressive, but as you can see it takes 2 minutes to compress. Also, querying tables compressed with these higher levels will take significantly more time.

Compression is very useful if you use it appropriately. This was just a demonstration of compression levels and the basics of compression. I will also write about the query performance of compressed tables, CLOB compression, CLOB vs HCC compression, etc. in later posts.

That’s all for now. I wish you all a healthy day.

Goodbye Log Triggers, Welcome Flashback Data Archive

Hello,

I would like to talk about Flashback Data Archive (or Flashback Archive, FBA) in 12c. FBA was introduced in 11g, so it is not new, but it has some very important new features that allow us to use it very efficiently and for free. I would like to focus on the new features more than on what it does and how it works, but let’s take a quick look first.

What is FBA?

Basically, FBA is a feature that lets you store historical information about the data in your table. When you enable FBA for a table, Oracle starts to watch it and stores every change made to the table. This is a broad definition, but it is not wrong, because not only are all DML changes logged, physical changes are recorded too. You will be able to flash back your table to before a truncate or any other table alteration; if you dropped a column, you can get it back, and vice versa.

What is new in 12c?

First of all, and I believe this is the most important one: it is now free! In 11g, FBA used compressed tables by default, which requires the “Advanced Compression License”, and that means additional cost. In 12c this is optional: by default Oracle does not create those tables as compressed, so you don’t have to pay anything unless you want to use the compression option.

Secondly, FBA can now store context information along with the data changes, which is what I needed most and exactly why I couldn’t use it in 11g. If you have a web application, the application will probably use a common database user and manage its users itself. This prevents you from identifying sessions because they all belong to the same user, but if you have a good development team you can ask them to set some context information, like CLIENT_IDENTIFIER. This data can be used to separate sessions and identify the real users, for example. With 12c, FBA is able to store that information with the changes, and when we check the historical data we can see all the context information too.
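For example, the application side tags each session, and (if I remember the 12c API correctly) the capture level is controlled through DBMS_FLASHBACK_ARCHIVE; a sketch, to be run by an appropriately privileged user, with a placeholder user name:

exec dbms_flashback_archive.set_context_level('ALL');   -- capture session context with the archived changes
exec dbms_session.set_identifier('real_user_42');       -- set by the application for each session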

Is FBA better than Log Triggers?

In my opinion, YES! Of course there are many things to check, but I will try to make a demonstration of the performance of FBA.

This is my test case:
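I won’t paste the full scripts, but the setup was roughly the following sketch (the log table and trigger are simplified to a couple of columns, while the real ones logged every column of TMP; the flashback archive name, tablespace and quota are placeholders):

-- base data used for all inserts (DBA_OBJECTS loaded twice, as mentioned later)
create table t_base_data as select * from dba_objects;
insert into t_base_data select * from dba_objects;
commit;

create table tmp     as select * from t_base_data where 1 = 0;
create table tmp_fba as select * from t_base_data where 1 = 0;

-- simplified log table for the trigger-based approach
create table tmp_log (
  object_id          number,
  object_name        varchar2(128),
  operation_type     varchar2(1),
  client_identifier  varchar2(64),
  os_user            varchar2(128),
  terminal           varchar2(128),
  log_date           date
);

create or replace trigger trg_tmp_log
  after insert or update or delete on tmp
  for each row
begin
  insert into tmp_log
  values (nvl(:new.object_id, :old.object_id),
          nvl(:new.object_name, :old.object_name),
          case when inserting then 'I' when updating then 'U' else 'D' end,
          sys_context('userenv', 'client_identifier'),
          sys_context('userenv', 'os_user'),
          sys_context('userenv', 'terminal'),
          sysdate);
end;
/

-- flashback archive without "optimize data", so no extra license is needed
-- (requires the FLASHBACK ARCHIVE ADMINISTER privilege; names/quota are placeholders)
create flashback archive fba_test tablespace users quota 10g retention 1 month;
alter table tmp_fba flashback archive fba_test;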

So we have two tables, TMP and TMP_FBA. I created a logging trigger on TMP that writes every DML into the TMP_LOG table with some context information, like client identifier, OS user, terminal, etc. At this point you can see that my trigger is a FOR EACH ROW trigger, so it writes every change one by one to the log table. Some might use a compound trigger, store the changed rows in a collection and write them to the log table in the AFTER STATEMENT section. This can optimize logging when using bulk DML, but if your DML changes too many rows it can make you consume too much PGA and cause memory problems, so I didn’t use it in my example. By the way, to make the tests comparable I created the T_BASE_DATA table and I will use it to populate my test tables.

My FBA is not the compressed kind (I didn’t use the “optimize data” clause), and neither is my TMP_LOG table. I will do some DML and compare the performance. I also want to compare the sizes of the tables; that will give useful information too. First I will insert some data with an INSERT … SELECT statement, then insert the same data row by row using a FOR loop.
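Roughly like this (timings were taken per statement, e.g. with SET TIMING ON):

set timing on

-- bulk (insert ... select) load into both tables
insert into tmp     select * from t_base_data;
insert into tmp_fba select * from t_base_data;
commit;

-- the same data again, row by row
begin
  for r in (select * from t_base_data) loop
    insert into tmp values r;
  end loop;
end;
/

begin
  for r in (select * from t_base_data) loop
    insert into tmp_fba values r;
  end loop;
end;
/
commit;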

When we check the timings, we see an unbelievable difference:

Trigger Bulk Insert : 22.13 seconds
FBA Bulk Insert     : 00.07 seconds

Trigger Row By Row Insert : 33.62 seconds
FBA Row By Row Insert     : 05.12 seconds

So for the performance of our INSERT statements, the winner is definitely FBA. If we check the log sizes, our log table, which is populated by the trigger, has reached 112 MB, while the FBA-related objects take 35 MB. One of the best things about FBA is that it does not generate many INSERT log records, because the original data is already in our table. This alone saves a lot of space. So we can say that for logging size, FBA is the winner again!

PS: while running my code, note that the only FBA-enabled table is TMP_FBA, so while checking the size of the FBA-related objects I used an “object_name like ‘SYS_FBA%’” condition. I will explain those objects at the end.
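The size check was essentially a query like this (a sketch; I filter on SEGMENT_NAME here, which matches the same objects):

select segment_name, round(sum(bytes) / 1024 / 1024) size_mb
  from user_segments
 where segment_name = 'TMP_LOG'
    or segment_name like 'SYS_FBA%'
 group by segment_name;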

Let’s run some UPDATEs:

Trigger Update : 53.38 seconds
FBA Update     : 02.25 seconds

Trigger Log Size : 96 MB (208 - 112)
FBA Log Size     : 91 MB (126 - 35)

The winner is still FBA.

Delete:

Trigger Delete : 48.24 seconds
FBA Delete     : 01.92 seconds

Trigger Log Size : 104 MB (312 - 208)
FBA Log Size     : 48 MB (174 - 126)

And the winner is again FBA!

Everything is awesome, but how can we see our logs with FBA? Where are those logs? Of course we can check the tables that FBA creates automatically, but there is a better way to see the logs, a Flashback Query:

You don’t even need to find the log table; a flashback query is enough to see the historical data.
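For example, a versions query on TMP_FBA (the filter on COL$ is just to keep the output short):

select versions_operation, versions_starttime, object_id, object_name
  from tmp_fba versions between scn minvalue and maxvalue
 where object_name = 'COL$';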

In my example I inserted all of DBA_OBJECTS twice into the T_BASE_DATA table, and I used that table to insert twice again into TMP_FBA; that is why you see 4 rows for COL$.

Finally, if you want to see the FBA tables:
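A quick way to list them:

select object_name, object_type
  from user_objects
 where object_name like 'SYS_FBA%'
 order by object_name;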

SYS_FBA_DDL_COLMAP_nnnnn is used to store column changes.
SYS_FBA_TCRV_nnnnn is used to store transaction information.
SYS_FBA_HIST_nnnnn is used to store data changes.

There are also 2 default indexes on those tables.

Why is FBA so much faster?

Trigger logging causes 2 extra actions: first calling a trigger, which is a PL/SQL object, and then running another INSERT statement. That means a lot of extra work and a context switch between SQL and PL/SQL for every row.

FBA uses the UNDO segments, so basically it does no extra work! Whenever you run a DML statement, Oracle copies all the data you are about to change into the undo segments. If you commit, the undo data becomes obsolete (unless a query is still actively reading it), but if you roll back, all the data in the undo segments is copied back to the original table blocks; that is why a commit is very fast but a rollback is slow. Anyway, FBA reads the undo segments, which means your DML has already generated the undo blocks and FBA just reads and saves them. That’s all.

How about Security?

One more time, FBA is the winner! You cannot modify the FBA-related tables, and by modify I mean any DML or DDL. Even the SYS user cannot drop or delete the FBA-related tables:
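For example, both of the statements below fail with an ORA- error even when run by a privileged user (the numeric suffix is just a placeholder; your history table will have its own object number):

delete from sys_fba_hist_12345;        -- not allowed: the table is protected by FBA
drop table sys_fba_hist_12345 purge;   -- not allowed either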

Any user with the DROP ANY TABLE or DELETE ANY TABLE privilege can wipe out your trigger-based logging table, but not the FBA tables! That brings a huge security advantage. Of course, a user who has the FLASHBACK ARCHIVE ADMINISTER privilege can remove FBA from your table, but this will be an obvious action because the previous data will also be lost!

In Conclusion

Based on the results of my test case, I have decided to convert all my logging structures to FBA, but there are a few more tests that I must complete first, like checking PMOs (partition management operations), compression on FBA (since I have the Advanced Compression license), etc.

Thanks for reading.