Tables with Memoptimize for Read (fast lookup)

Hi,

I would like to share my initial thoughts about Memoptimized for Read (fast lookup) tables in this post. Fast lookup tables came with 18c and they are here to provide faster access to a table if you are using the primary key with an equality condition. This is basically a single table hash cluster. Instead of using the primary key index, Oracle creates a hash index, and searching a hash index should be much faster than a unique index.

First, let's create the test environment.

To use a fast lookup table you must set the memoptimize_pool_size parameter to a number higher than 0. This is a static memory component in the SGA that stores the hash index of the table. This memory component is not resizable, which means it won't grow or shrink with automatic shared memory management.
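
Setting it is a one-time operation, something like this (512M is just an example value, size it for your own needs), followed by an instance restart since the parameter is static:

alter system set memoptimize_pool_size = 512M scope=spfile;
-- restart the instance so the new static pool is allocated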

Create a table for fast lookup:
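
The table I used was roughly like this (column names are illustrative; the important parts are the primary key, SEGMENT CREATION IMMEDIATE and the MEMOPTIMIZE FOR READ clause):

create table tmp_optimize
(
   id   number primary key,
   name varchar2(100)
)
segment creation immediate
memoptimize for read;

-- load some test data, roughly 146.000 rows
insert into tmp_optimize
select level, 'test row ' || level from dual connect by level <= 146000;
commit;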

Segment creation must be immediate, otherwise you will get an error! Your table must have a primary key as well; since the whole structure is based on the PK, it is logical to have one. I inserted roughly 146.000 rows into the table.

EDIT: I also tried these tests with 14 million rows but the results were similar.

This is not enough! You must use the dbms_memoptimize package to use this table as a fast lookup table.
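
The call is a one-liner, something like this (POPULATE loads the table's hash index into the memoptimize pool; parameter names are as I recall them):

exec dbms_memoptimize.populate(schema_name => user, table_name => 'TMP_OPTIMIZE');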

Let's see what is happening now:

I skipped the first execution statistics! You will see a few consistent gets on the first execution, but consecutive executions will look like the above.

As you see, there are new execution plan steps, "TABLE ACCESS BY INDEX ROWID READ OPTIM" and "INDEX UNIQUE SCAN READ OPTIM". These steps tell us that a hash index is used to retrieve the data, as in key-value pairs. There are no "consistent gets". This is amazing. Almost nothing is read for this query, and that should make it so much faster. Is it? Well, I couldn't find that much of a difference, at least not as much as I expected. Actually, I found different blog posts mentioning fast lookup tables, but none of them had done a performance test.

So, I wanted to test the speed, but there is a problem. For now, there are many limitations on fast lookup tables, and one of them is "you cannot use it via PL/SQL". This means I can't create a simple PL/SQL procedure and run a query 100000 times to compare timings. To do a test, I wrote a small Python script.

Warning: I am no Python developer, I just know Python well enough to solve my small problems. The code I share below probably has many bugs, but I only use it for this test.

Here is the Python code I use:

After changing the code many times, I settled on this final version. It creates 9 threads and every thread runs the query "select * from tmp_optimize where id = :myid" 100.000 times, with the myid value cycling between 1 and 10000. Each thread prints the total execution time for its 100.000 runs of the query. Here is the result:

Timings ranged between 16.0 and 16.1 seconds for the fast lookup table. How about a normal table with a normal primary key?

I used the same Python code, just changed the table name from "tmp_optimize" to "tmp_not_optimize", and the result looks like this:

It is between 17.1 and 17.2 seconds, so there is about a 6% performance gain (give or take).

I am not sure whether 6% is a good gain or not, because to use fast lookup tables actively we need to set aside a serious amount of memory. Also, there are a few limitations: they cannot be used with partitioned or compressed tables, and they cannot be used in PL/SQL (not yet, anyway). Only equality conditions on primary keys can take advantage of fast lookup.

During my tests I didn't check wait events. I expect fewer latch waits on the fast lookup table since no consistent gets occur. Maybe there is something I missed, and I will do more tests on these tables (I didn't even test "memoptimize for write" (fast ingest) yet).

wish you all healthy days.

How to Demonize Apex V Function!

Hi,

The answer to the question in the title is "using too much V in SQL statements". Let's see why, and what you should do about it.

First things first, I am NOT an APEX developer. I am a DBA and database developer, but some of my clients are using APEX and I am dealing with many SQL statements that use the V function of APEX. Since day 1, I have not liked this V function, because developers use it in very different ways and, although it returns VARCHAR2, it is used for all kinds of comparisons without any conversion functions (this is not the fault of the V function, of course).

So, what is the problem here? In my clients' code, the V function is used heavily in SQL statements. Some queries that I dealt with led me to investigate the V function, because even when a query has no obvious performance issue, it still takes too long to complete. When I checked the V function, I found out that it is not DETERMINISTIC (which it shouldn't be, because of the logic of DETERMINISTIC, but I will talk about that later). Let's do some tests, but first I must explain a few things. I hacked the code of V a little bit and added some logging code to demonstrate how it works and the performance issues around it, but I won't explain how I did it (probably many of you can do it easily) or share any source code for it. Whenever the V function is called, my logging code writes the parameter name of the V call into a "my_log" table.

I added my log code, set an APEX session in my SQL Developer session and ran these selects:
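
The test was along these lines (the item names are just examples, any four V calls will do):

delete my_log;
commit;

select v('APP_USER'), v('APP_ID'), v('APP_PAGE_ID'), v('APP_SESSION') from dual;

select * from my_log;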

I deleted the log table first, then ran a select statement with four V functions in the select list over the DUAL table, and then checked the log table. As you see, V was called 4 times. So the same data is read 4 times and 4 PL/SQL function calls occurred. This causes too many context switches between SQL and PL/SQL, and that can reduce your performance dramatically. The WHERE clause is especially dangerous. Why? Because the V function might run for every row in the table, which will cause a disaster.

As you can see, this query called the V function 79.647 times. Why? Because there are 79.645 rows in the TMP table. Since I used V in the WHERE clause, the condition ( object_id = v('OBJECT_ID') ) is executed for every row in the table (79.645 rows) and only 1 row meets the condition, so the select list runs once, and it contains 2 more V calls (the same one as in the WHERE clause), so in total V is called 79.647 times. I hope you see the problem here. If you use the V function in the SQL statements of your application code, it might be called far too often!

How about an index? Yep, that will help:

So it ran only 3 times in this execution, because this time the search operation (the WHERE condition) ran on an index, not the table, so the V function is called once in the WHERE clause and twice in the select list. This means you should have indexes at least on your condition columns.

Is that enough? Let me show you some pseudo code from a database:

So many identical V function calls in one select statement. Creating an index will help with the WHERE clause, but how about the select list? Since every item in the select list causes a function call (even with the same parameters), let's test the performance:

Calling this query 10.000 times (or you can think of this as calling the SQL once for 10.000 rows) takes 3.2 seconds (20 identical V calls). If we change the code to this:

this query takes only 0.5 seconds. The timings might seem insignificant, but think about 1000 different queries that run on your database 1000 times per day and calculate the time spent just calling V. It is a lot! A significant amount of your DB time might be wasted.

Let’s improve the code a little bit more:

Even better. If you increase the number of calls, you will see that the timings of the first 2 PL/SQL versions increase linearly, but the last one (the code just above) gives you almost the same performance. Remember, V is a function, not a variable. Use variables instead of V if you can.

Even if these tips help your system, the main problem is still there: V is not DETERMINISTIC. You might change your code as I suggested above or create more indexes, but many developers also use V in VIEWS, which is another and, for me, a bigger disaster: it is harder to modify the code, you cannot use bind variables, etc. What would happen if V were deterministic? I modified V and made it deterministic; let's see the result with the first test case, without an index:

Pay attention: there is no index, but V is called just 2 times (1 for the WHERE clause and 1 for the select list, even though the select list has 2 copies). It was 79.647 without DETERMINISTIC, if you remember the first result.

So making V deterministic would solve almost every kind of problem, but the catch is that what I am looking for here is not really "deterministic". Why? Let me explain what deterministic means first. If a function always returns the same value for the same parameters, then it is a deterministic function, like TO_CHAR or SUBSTR. This lets Oracle know that repeated calls with the same parameters can be cached and the first calculated result reused for subsequent calls. That way, if the same function is called with the same parameters in a SQL statement, it is called only once. But this is not V. The result of V is not deterministic, because the application might change the value of an item, which means V calls with the same parameter can return different results; V just returns a context variable value.

So, it shouldn't be deterministic, but it should be something that acts like deterministic for the duration of a SQL statement. I don't think any APEX developer wants to see a V value change during a select statement; I think they assume the values of V won't change while the SQL statement runs. So there should be something like DETERMINISTIC_FOR_CALL in the Oracle database: we should be able to define PL/SQL objects as "deterministic" while they are used in a SQL statement, so that their result is calculated once and reused for the duration of that SQL call. I think this is also necessary for any PL/SQL object that we can call from SQL, because the current structure of PL/SQL kind of violates consistency. A select statement returns rows as they were when the SQL call started; even if the rows are changed during the call, the result of the statement won't be affected by those changes, but PL/SQL breaks that. When a function is calculated for every row, a select statement inside the PL/SQL object can return different results because it will be affected by ongoing changes. This is a database matter; the database should provide this functionality, the APEX team has nothing to do about it.

By the way, I am really frustrated about data type conversions around V. Since V returns VARCHAR2, many developers do not care about type conversions. If you compare a V value with a number, you should use "TO_NUMBER(V(…))" for example, or at least (as far as I could find) the NV function (the number version of V). This is another issue for me.

For now, I suggest avoiding V in SQL statements as much as possible and using bind variables if you can. Let's say you have some SQL with many V calls using the same parameter, what can you do? You can "force" Oracle to cache them:

There are better ways to write this SQL, but I just want to demonstrate the idea. As you see, this query takes 3.26 seconds to complete because of too many V calls. You can rewrite your query, cache the V results first and then use them, like this:
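
The rewrite looks roughly like this (the column list of TMP is illustrative):

select /*+ no_merge(vals) */ t.owner, t.object_name, t.object_type, vals.user_name
from  (select to_number(v('OBJECT_ID')) object_id,
              v('USER_NAME')            user_name
       from   dual) vals,
      tmp t
where t.object_id = vals.object_id;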

I am adding a NO_MERGE hint to prevent the optimizer from merging the queries before the query runs. As you see, this version returns in 0.19 seconds, 15 times faster, because the V functions are called only 2 times (2 different V items are selected, OBJECT_ID and USER_NAME).

Edit:

Another, and maybe better, solution for this problem is using your own V function and creating it as deterministic:
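
A minimal sketch of the idea, just a thin deterministic wrapper around V:

create or replace function my_v(p_item in varchar2) return varchar2 deterministic
as
begin
   return v(p_item);
end my_v;
/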

This is faster than the previous join method. The only thing you need to change is replacing V calls with MY_V.

I hope this helps you on your systems. Now, after those tests, I must contact my clients and explain that changing their code will gain them a significant amount of DB time.

wish you all healthy days.

Subquery Caching

Hi,

I want to talk about a very sweet performance "trick". I love it when I do something small and gain a lot 🙂 Today I will demonstrate subquery caching in Oracle and try to explain how it works and its limitations.

You are probably using some PL/SQL functions in your select (or DML) statements. There is always a package that contains functions which calculate stuff or return some constant strings like a "base url" or "environment name" etc. Some of those functions have parameters too, and we call them from SQL. Calling a user-defined PL/SQL function from a SQL statement is a performance problem because it causes a lot of context switches. Let's say you have a function that returns a constant string. To make the performance difference easy to see, I will add a sleep into the function.
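
The function was something like this (the returned string is just a placeholder; the 1 second sleep simulates the real work):

create or replace function sf_base_url return varchar2
as
begin
   dbms_lock.sleep(1);  -- pretend this is a costly calculation
   return 'http://my-base-url.example';
end;
/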

This is our function. You need a grant on the DBMS_LOCK package to create it. Now we will call this function from a SQL statement. Let's create a table and then call the function:
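
Roughly like this (the LETTER column is illustrative):

create table t as
select chr(64 + level) letter from dual connect by level <= 4;  -- A, B, C, D

select letter, sf_base_url from t where rownum <= 2;   -- ~2 seconds
select letter, sf_base_url from t where rownum <= 4;   -- ~4 seconds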

When I call the sf_base_url function in the first select statement (2 rows) it takes 2 seconds, and in the second select statement (4 rows) it takes 4 seconds. This proves that our function is called as many times as the number of rows returned by the query. Do you see the problem here? This function always returns the same result, yet it is called more than once; calling it just once would be enough. Think about this on your databases: you have a business function that takes 10 seconds to complete and it is called in a SQL statement which returns 1000 rows. You will lose 9990 seconds (we must call it at least once, so 10 seconds must be spent) calling your function unnecessarily.

This is where subquery caching comes in. We can tell Oracle to cache this function result and use that value from then on. How we do it is a little bit interesting to me 🙂 We just use a subquery to call the function.
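
The cached version is the same query with the function call moved into a scalar subquery:

select letter, (select sf_base_url from dual) base_url
from   t
where  rownum <= 4;   -- ~1 second, the function runs once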

As you see, I changed the sf_base_url call into a subquery and select the function value from dual. The main query returns 4 rows (rownum <= 4) but it only takes 1 second! Our function is called just once and its result is reused for the subsequent rows. That is a small change and a very nice performance gain.

So far so good. How about a function with parameters? Let's change the function a little bit, give it one parameter and return that parameter appended to the end of the base url.
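
Something along these lines:

create or replace function sf_base_url(p_add in varchar2) return varchar2
as
begin
   dbms_lock.sleep(1);
   return 'http://my-base-url.example/' || p_add;
end;
/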

Let's call this from SQL and pass a column value as the parameter:

It takes 4 seconds, as expected. Now, subquery caching:

It also takes 4 seconds. Didn't subquery caching work? Of course it did, but this time our function is cached together with its parameters. Since every parameter value is different, the function is called for each distinct parameter, and every row gets a different my_string column value. We would see the performance gain if we had repeating values in our table. Let's do it:

Now our table has 3 copies each of the A, B, C… rows. Let's run the subquery caching example again.

Now it returns 12 rows (3 of each letter) but it still takes 4 seconds, because our function is called only once for each distinct parameter and the repeated parameters are read from the cache. Works like a charm.

Is there a limit to this? How many different parameters can be cached, and how does it work? Let's test.

Now our table has 256 distinct rows, and then I double it by inserting all rows again. Finally, we have 512 rows in the table. Run our subquery cached select statement again.

Interesting result. The table has 256 distinct values and 512 rows in total. Every row is doubled, but both the subquery caching version and the normal call take almost the same time. Did subquery caching not work? Actually, it worked, but it can only cache a limited number of values. If there are more distinct values than can be cached, new parameter values overwrite old cached values. Our table rows are ordered like this: 1,2,3,4…256,1,2,3,4…256. It cached some values, but the newly arriving values eventually overwrote the cache and we couldn't take advantage of subquery caching. Let's order those rows:

t2 is the same table as t, but the rows are ordered in the first place. When we run the subquery caching example on t2 it takes 270 seconds, almost half of the previous run on table t. Now it was able to use the cached values before they were overwritten.

Edit:

I wrote earlier that my tests indicated only 8 different values were cached, not 255, but my further tests show that this is not true. I assume Oracle uses a hash function to store the data, and while hashing our function and parameter some collisions occur; that is why some of the function results are overwritten frequently. This is not something we can change. Of course, consecutive calls with the same value are cached and take advantage of subquery caching.

Thank you very much for reading. Any comments are welcome.

wish you all healthy days.

Blockchain vs Immutable tables

Hi,

As you know, 21c is here for Linux and some of the new features are (as always) pretty exciting. Two of them are Blockchain and Immutable tables. Let's explain and test them.

First of all, these are new features in 21c but they have been backported to 19c: as of 19.10 you can use BLOCKCHAIN tables and as of 19.11 IMMUTABLE tables. I will run my tests on 19.12.

If you are familiar with Bitcoin or any other altcoin, then you probably know what a blockchain is. Basically it is a distributed ledger. Every transaction is a "block" and each block contains information about the transaction, like sender, receiver, amount etc., and also a hash value of the previous transaction (using some hash function). That way every new transaction is added to the end of the "chain", and that creates the chain. This ensures that the chain is unbreakable: you cannot tamper with old records because that would break the chain (hash values won't match the next blocks).

So, Oracle uses this technology to create a new table type called the Blockchain table. Every row inserted into this table is stored with some additional information, like the current timestamp and, most importantly, the previous row's hash value! Hash operations are done at COMMIT time, not before. That way Oracle guarantees that every row is part of the chain and you cannot tamper with those rows; if you do, the chain is broken and cannot be verified. Also, Oracle won't let you delete or update those rows using any DML statement like delete, update, merge etc.

Immutable tables are pretty similar to Blockchain tables. You cannot update them using any DML, just like blockchain tables. The difference is that rows in an Immutable table are not linked to each other; rows do not store any hash value from previous rows.

Both of them are designed to create "unchangeable", insert-only tables. If you think about it, this is a real necessity, especially for auditing. For example, if you store log information about credit card operations, you don't want anyone to change it, and you want to be sure those rows are intact and unchanged during an investigation.

Of course, you usually only need to keep table data for a specific amount of time, like 5 years. No one can store data forever, so at some point you must be able to delete old rows. This is something you can define while creating these tables. Also, even if I am not able to delete the rows, what would happen if someone drops the table? This is of course forbidden too, and again you can define a drop policy while creating the table. Let's start with a blockchain table:
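
The create statement looks roughly like this (the columns are illustrative; the clauses after the column list are the interesting part):

create blockchain table bct_audit
(
   id         number,
   operation  varchar2(100),
   created_at date
)
no drop until 31 days idle
no delete until 30 days after insert
hashing using "SHA2_512" version "v1";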

You need the BLOCKCHAIN keyword (or IMMUTABLE for immutable tables) in the create statement.

"NO DROP UNTIL 31 DAYS IDLE" defines when you can drop this table: if there are no actions on the table for 31 days, then and only then can you drop it. By the way, right after creating the table, if you haven't inserted any rows yet, you can drop it immediately. You can use just NO DROP to disable dropping the table completely (the only way then is dropping the database entirely).

"NO DELETE UNTIL 30 DAYS AFTER INSERT" defines when you can delete "old" rows. In my example you can delete a row once the retention period (30 days) has passed since its insertion. Again, you can use NO DELETE LOCKED to disable deleting from the table.

The hashing part is the default, and you have to write it out to define the blockchain structure.

Let's insert a row and try to delete or update it:

As you can see, modifying is forbidden. You cannot even delete rows that have passed the retention time you defined (no delete until…) with a delete statement; you must use the DBMS_BLOCKCHAIN_TABLE.DELETE_EXPIRED_ROWS procedure to delete rows beyond the retention.

That is the BLOCKCHAIN table. Of course there are many more properties, like user signatures etc., but I will leave it here for now. For IMMUTABLE tables, almost everything is the same:
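
Roughly like this:

create immutable table imt_audit
(
   id         number,
   operation  varchar2(100),
   created_at date
)
no drop until 31 days idle
no delete until 30 days after insert;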

Just change the "blockchain" keyword to "immutable" and remove the hashing clause. Pretty straightforward. As expected, you cannot delete or update rows; if you want to delete old rows you must use the DBMS_IMMUTABLE_TABLE.DELETE_EXPIRED_ROWS procedure.

The common feature of both tables is that they cannot be modified by any database user, not even SYS or DB Vault admins. No one in the database can change those rows. The basic difference between the two is that the Blockchain table cryptographically links each new row to the last row, while the Immutable table has no such mechanism. What does this provide? Immutable tables cannot be modified via the Oracle Database software, but what about bypassing the DB? If someone modifies a datafile using operating system tools, then your table data is changed and you cannot be aware of it. Blockchain tables, on the other hand, have a verify function: since every row is linked to the next via a hash value, if you modify the data with any tool (like OS tools), verification will fail, because when a row changes its hash value changes, and that hash value is stored in the next row.

Then why do Immutable tables exist at all, if Blockchain tables are more secure? You might guess that creating hash values and storing them in the next row causes some extra work, which means performance overhead.

Let's make a test (with a flaw on purpose):

I inserted rows using bulk insert and row by row into both tables, and here are the results. Huh! The blockchain table is almost as fast as the immutable table? Actually no! As I said earlier, hashing operations are done at commit time, not before, on the blockchain table, and in my code there is no commit! That's why we see the same performance. How about adding commits before printing the time:

Here it is! At worst, immutable tables are more than 6 times faster (row by row insert) and at best almost 40 times faster than blockchain tables. So be careful while choosing the table type; if an immutable table meets the need, use it.

This is just a first glance at the new table types. I will make more tests and come back with new posts.

wish you healthy days.

Edit: I didn't mention the COMPATIBLE parameter. It should be at least 19.10 to create a blockchain table and 19.11 for an immutable table.

Pragma UDF & Deterministic

Hello everyone,

I wanted to write about something simple that can improve the performance of your SQL statements. Unfortunately, many database developers use too many PL/SQL functions in their SQL statements. I say too many because the more you learn about SQL, the less PL/SQL you will use; SQL is a very capable language for processing data. Of course, we don't design and develop a whole system from scratch all the time; we start working somewhere, use their standards and run old code. Many of my clients have some basic PL/SQL functions which do not interact with SQL at all, just doing basic calculations or conversions. These PL/SQL functions are pure algorithms which could also be run by SQL. For example:
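
Something like this (function and parameter names are illustrative):

create or replace function sf_add_vat(p_amount in number, p_vat_rate in number) return number
as
begin
   return p_amount + p_amount * p_vat_rate / 100;
end;
/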

This is a very simple VAT-adding function. As you can see, this calculation could be done in SQL too, but let's say someone wrote this code 20 years ago and now it is used everywhere in the database (maybe also in the application). So you might not be able to change all of the call sites, but you can increase its performance with DETERMINISTIC or PRAGMA UDF.

Let's make a small example first:

This creates a table with 800.000 rows. Now I will run a query against this table with and without the function:

It takes almost 10 times longer to complete the query when you use the PL/SQL function.

The reason is that whenever you use a PL/SQL function inside a query (or vice versa), something called a "context switch" happens. You can think of it like this: SQL statements are run by the SQL engine and PL/SQL statements are run by the PL/SQL engine, and whenever one engine meets a statement for the other, it has to call that other engine. This is context switching. You must avoid it as much as you can, but as I said, if the system you work on runs queries like this, you might not be able to change them all; you can, at least, increase the performance. That is where PRAGMA UDF and DETERMINISTIC come in.

DETERMINISTIC tells Oracle that this function will return the same result for the same parameters every time. If your PL/SQL object does not interact with SQL objects like tables, or with time-dependent values like sysdate, then it is probably a good candidate for DETERMINISTIC. Also, if you want to index a user-defined PL/SQL function, it must be deterministic. You can add the DETERMINISTIC clause to a function which does interact with SQL objects or sysdate etc., but this will only cause you headaches, so be careful about that.

PRAGMA UDF is a way to reduce context switch overhead. It does not reduce the number of context switches, but it makes each one smaller and faster. That way, if you use a PL/SQL function in SQL, the context switch will be lighter.

Let's make a small test now; my example is a very good candidate for both DETERMINISTIC and PRAGMA UDF:
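
The three versions were along these lines (names are illustrative; only the DETERMINISTIC clause and the PRAGMA UDF line change, the body is identical):

create or replace function sf_add_vat_base(p_amount number, p_vat_rate number) return number
as
begin
   return p_amount + p_amount * p_vat_rate / 100;
end;
/
create or replace function sf_add_vat_det(p_amount number, p_vat_rate number) return number
   deterministic
as
begin
   return p_amount + p_amount * p_vat_rate / 100;
end;
/
create or replace function sf_add_vat_det_udf(p_amount number, p_vat_rate number) return number
   deterministic
as
   pragma udf;
begin
   return p_amount + p_amount * p_vat_rate / 100;
end;
/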

I created 3 versions of my VAT function: the first is the base one, the second has a DETERMINISTIC clause, and the third is both deterministic and pragma UDF. They all do the same work. Let's compare their performance:

I ran 4 versions: the basic function, the deterministic function, deterministic & pragma UDF, and finally, of course, the pure SQL statement. As you can see, the BASE version is the worst and takes 1.15 seconds on average to complete. Just adding a DETERMINISTIC clause decreases the time spent almost 5 times, and adding PRAGMA UDF decreases it by another 30-35% compared to deterministic alone. Finally, pure SQL is of course the fastest, 10 times faster than the BASE version. So, if you can use pure SQL, use it! If that is not possible, at least check for deterministic candidates and use PRAGMA UDF.

Of course, there is also something called "subquery caching":
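
That is just wrapping the function call in a scalar subquery again; the table and column names below are illustrative:

select sum((select sf_add_vat_det_udf(s.amount, 18) from dual)) total_with_vat
from   tmp_sales s;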

As you see, subquery caching might be even faster than PRAGMA UDF and DETERMINISTIC, but it depends. This solution also requires changing the SQL statement.

So you can improve the performance with just small changes. Find a PL/SQL function that is very simple and used everywhere in your database (I am pretty sure there is one) and try the suggestions above. By the way, 19c has something even stronger called "SQL Macro", which I mentioned a little bit before:

http://mustafakalayci.me/2020/10/28/sql-macro-is-finally-here/

http://mustafakalayci.me/2020/10/30/sql-macro-part2-column_value-nvlparam-column_value/

I mostly covered table SQL macros, but you can use SQL macros to generate parts of a SQL statement too.

I hope this helps you tune your SQL & PL/SQL statements.

wish you all healthy days.

SQL Macro Part2 column_value = nvl(:param, column_value)

Hi,

Since I am working on SQL Macros, I keep testing things, so please see my previous post: http://mustafakalayci.me/2020/10/28/sql-macro-is-finally-here/

As mentioned in the previous post, is it possible to get rid of column_value = nvl(:param, column_value) conditions? They become a huge pain in big and complex queries. I provided a very cumbersome solution in the previous post, so I wanted to test and do more about it.

As I said, I wasn't able to read any parameter value in a SQL macro except numbers. So today I tried COLLECTIONS (arrays) and got different results. Let's start with COLUMNS. We can send a column list using the DBMS_TF package:
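
A sketch of the idea (the macro name is illustrative; P_Columns arrives as a DBMS_TF.COLUMNS_T and its elements are already quoted identifiers):

create or replace function sf_group_by(p_table in dbms_tf.table_t, p_columns in dbms_tf.columns_t)
   return varchar2 sql_macro
as
begin
   return 'select ' || p_columns(1) || ', count(*) cnt from p_table group by ' || p_columns(1);
end;
/

select * from sf_group_by(hr.employees, columns(department_id));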

So we are able to group any table by a column and get counts. Since P_Columns is an array, we can send more columns in it, but how can we use them dynamically in the SQL macro?

Great, we can use PL/SQL to generate the column list and use it in the SQL statement. Of course, this could be a little bit dangerous because we started to use "string concatenation". Only a little bit, though, because the columns in the DBMS_TF.COLUMNS_T array must be actual columns of the preceding table parameter or you get an error.

Since we can use collections and their data is passed to PL/SQL (unlike a varchar2 parameter), maybe we can use user-defined collections to pass parameters. And we can! That really excited me at first (but not later).

Since I wanted to eliminate unnecessary conditions like "column1 = nvl(:param1, column1)" when :param1 is null, I planned to pass the column names and parameter values and check them in the SQL macro: if a parameter value is not null, add it to the SQL condition.

Here is a VERY basic version:
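
A sketch of what I built: a small object/collection type for the column-value pairs, and a macro that appends a condition only when the value is not null (names and types are illustrative):

create or replace type t_cond as object (col_name varchar2(128), col_value varchar2(4000));
/
create or replace type t_cond_list as table of t_cond;
/
create or replace function sf_generate_sql(p_table in dbms_tf.table_t, p_where in t_cond_list)
   return varchar2 sql_macro
as
   v_sql varchar2(32767) := 'select * from p_table';
   v_sep varchar2(10)    := ' where ';
begin
   for i in 1 .. p_where.count loop
      if p_where(i).col_value is not null then
         v_sql := v_sql || v_sep || p_where(i).col_name || ' = ' || p_where(i).col_value;
         v_sep := ' and ';
      end if;
   end loop;
   return v_sql;
end;
/

select * from sf_generate_sql(hr.employees, t_cond_list(t_cond('EMPLOYEE_ID', '100')));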

Voila! If I send null as the value of employee_id, just the "select * from hr.employees" statement is run and a full table scan is done, but when I send 100 as the employee_id value, the "where employee_id = 100" condition is added.

PS: I am aware that my conditions are just equality conditions, with no AND/OR operator selection or grouping of conditions (parentheses for related OR conditions) etc. This is just an example, after all. I will write a proper version when I decide to use it.

As you can see, I send 90 as the department_id value and it appears in my conditions, but the value of salary is null and it is not in my conditions. So did we get rid of those conditions? Well, not exactly.

Even though it seems to be working (and it is working), I did not use any bind variables in my examples. Those parameters are constant values, and so is the query. This is the actual running code for my last example:

select "EMPLOYEE_ID","DEPARTMENT_ID","FIRST_NAME","SALARY" from p where EMPLOYEE_ID = 100 and DEPARTMENT_ID = 90

So, no bind variables. Let's use one:

Unfortunately, when I used a bind variable I got an error from line 10 of my Sf_Generate_Sql SQL macro, and line 10 is where the P_Where collection is read. So somehow SQL does not initialize this collection when I use a bind variable, and/or it is not passed to the SQL macro. Even when I tried the PL/SQL version, it did not work:

So I cannot use it right now. Since this is a 20c feature, maybe we should wait for 21c for more details and more options.

Edit:

I also wanted to add one more example. Let's get only a specified type of column from any table:

I defined 1 as number, 2 as varchar2 etc., so I can get only the NUMBER columns from a table (any table, actually), or only the varchar2 columns.

SQL Macro is a great functionality improvement for SQL, in my opinion. Since 20c is not accessible yet, I cannot try SCALAR SQL macros, but I am confident they will save us from writing the same SQL fragments for many different SQL statements.

wish you healthy days.

SQL Macro is finally here!

Hi,

I have been waiting for this for a long time. Finally we can use SQL macros in the Oracle database. Actually, I was excited when I first read about it, but after my tests I got stuck at some points. I was hoping to use it for "parameterized views", as in: send a parameter and use it to choose which select statement runs. I am kind of able to do that, but not the way I expected. I have read the documentation, but there is not much detail. I will mention some "odd" stuff that I found about SQL macros at the end of this post.

This could also help us with conditions like "where customer_name = nvl(:Cust_name, customer_name)". I think many of you know this pattern and suffer from it, right 🙂 If the parameter is null, bring all rows, but if it is not null, bring just the matching rows.


EDIT: bummer! Thanks to Mike Cutz from the Oracle community, who showed me a mistake in my performance comparison below: while testing, I used the constant 'EMPLOYEES' value instead of a bind variable. So I am changing that part.


So, what is a SQL macro? A SQL macro is a SQL add-on to enhance SQL's capabilities. Mostly we use PL/SQL when we need to do something complex and cannot do it in plain SQL, but this causes "context switches". You can think of it like this: there are two engines in Oracle, one for SQL and the other for PL/SQL, and the two engines do not do each other's job. If a PL/SQL statement needs to run SQL, the statement is sent to the SQL engine, and if the SQL engine finds PL/SQL (a function in a select statement, for example), it is sent to the PL/SQL engine to run. Going back and forth between these engines is called a "context switch", and context switching is a performance ENEMY! So we always try to do our job in just one engine, but sometimes we have to use both. This is a new feature in 20c, but I am running this code on 19.6.

SQL macros reduce the need for PL/SQL, so there are fewer context switches and we get more performance. They also somehow help with getting better execution plans for complex queries. There are 2 types of SQL macro, Scalar and Table; the default is Table, and I will demonstrate table SQL macros here.

A table SQL macro works and looks like a pipelined function, but remember, it is run by the SQL engine, not by PL/SQL.
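
My test objects were along these lines (TMP is just a copy of ALL_OBJECTS):

create table tmp as select * from all_objects;

create or replace function sf_test(p_object_name in varchar2)
   return varchar2 sql_macro
as
begin
   return q'[select * from tmp where object_name = p_object_name]';
end;
/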

I just created a dummy table and the SF_TEST SQL macro. sf_test looks like a regular function with one varchar2 parameter, but in the definition we use SQL_MACRO before the "as" keyword, and then just return a string which contains a select statement. Here is the usage:
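
Roughly like this:

select * from sf_test('EMPLOYEES');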

As you see, our SQL macro worked like a pipelined function and returned the object information (from the TMP table) for objects named EMPLOYEES. Basically it just runs the select statement in the SQL macro, and this operation is done in the SQL engine (well, at least mostly…).

Let's make another example:
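
Something along these lines:

create or replace function sf_first_rows(p_row_count in number)
   return varchar2 sql_macro
as
begin
   return q'[select * from tmp where rownum <= p_row_count]';
end;
/

select * from sf_first_rows(4);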

Cool, right? I can parameterize the number of rows to return from a SQL statement.

Let’s take it further:
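
A sketch of it:

create or replace function sf_first_rows_any(p_row_count in number, p_table in dbms_tf.table_t)
   return varchar2 sql_macro
as
begin
   return q'[select * from p_table where rownum <= p_row_count]';
end;
/

select * from sf_first_rows_any(4, hr.employees);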

This is really cool! We can pass a "table" as a parameter (which actually comes with polymorphic table functions) to our SQL macro and run our query against that table. In my example I passed 4 (the number of rows) and the "hr.employees" table. Please be careful: hr.employees is not a varchar2 variable! This is not a string concatenation operation. Now we can get any number of rows from any table:

So, let's get back to the performance enhancement. Actually, I am not sure about the exact result, but I ran a few simple tests and they showed that execution plans are more accurate. Here is an example:

Now I make my first query a little more complex, using some NVL functions and bind variables. I use 4 bind variables (owner, object_name, object_type, object_id) but only OBJECT_NAME has a value; the others are null. Also, please remember that I created an index on OBJECT_NAME on the TMP table. So in this query I am looking for all objects named EMPLOYEES, but since Oracle needs to check whether every value is null or not, the execution plan becomes more complex and the cost is higher. The equivalent version of this query (with these bind variable values) is:

Select * from tmp where object_name = 'EMPLOYEES';

and this is a very simple query which can use the index on the TMP table.

EDIT: I deleted this part and am rewriting it; please see my edit at the top.

I will discuss reading variable values in a SQL macro after this, but as a sneak peek: you cannot read any variable value in a SQL macro except NUMBER types. So, to eliminate the problem above (column = nvl(:parameter, column)), we can at least do something by adding more number variables:

We can shape our SQL inside the SQL macro. As I said, since I can only read NUMBER variables, I added parameters that indicate whether the real parameter is null or not, and based on this information I constructed my SQL statements.


What is “odd” about this?

When I saw the first examples, I thought I could create the condition list using the parameter values: if a parameter is not null, add "column = p_parameter1" to the query. But something is different in this environment. Even though we are writing a "function" and can use PL/SQL commands like "if then else" or for loops, we cannot read parameter values (even more oddly) except NUMBERS.

Everything is normal here: I am showing the value using dbms_output and my select is also fetching the parameter value from dual, and both are okay. The query returned 1 row with value 10 and the output shows the correct value.

Let's do it with a varchar2:

As you see, when the parameter type is varchar2, I cannot read it inside the SQL macro! My output is null (the parameter p_param is null), but the query returned the value of the parameter. The same thing happens for the date type too. So I cannot use an "if p_param = 'ABC' then" condition, because p_param is null inside the SQL macro, but I can use "if p_param = 1 then" if p_param is a number.

I am running this code on version 19.6, but I also ran it on livesql.oracle.com, which is 19.8, with the same result.

I don't know why it is acting like this, maybe a bug, maybe a feature 😀 I said that this is run by SQL, not PL/SQL, but I think this is not entirely true: some PL/SQL engine is involved here, but parameter values other than numbers are not passed to the PL/SQL engine. Of course, this is just a guess, I am not sure about it, but I would like to be able to use parameter values inside the SQL macro.

wish you all healthy, corona free days.

Oracle Table Compression Part1

Hi,

I have been dealing with table compression a lot during my development work, and I still do. I wanted to share some information about table compression, but I believe this will be a long explanation, so I will write it in parts and this is part 1.

Table compression is a very useful feature of Enterprise Edition (unfortunately it cannot be used in Standard Edition). Table compression compresses your data and reduces the space requirement of the table, not just on disk but also in memory. When you compress a table, Oracle always stores it compressed, so it also consumes less memory space, and I think this is very important for large databases with relatively small amounts of memory.

Of course, compression has a disadvantage, and that's why it is not the default option for a table. As you can guess, it has a "CPU overhead", which means your CPU will be doing some extra work compressing and decompressing. Often this overhead is acceptable because you gain extra disk and memory space. Do not think of this as just a space gain: while reading data from disk (and loading it into memory) you will read much less data from disk, and this will significantly increase your query performance because of less disk IO.

Basically, there are 3 compression algorithms for tables:

  1. Basic
  2. Advanced (if I am not wrong, it was called "FOR OLTP" compression in earlier releases)
  3. Hybrid Columnar (or Column Store)

There are important things to know about these compression methods. The most important one is that ADVANCED compression requires the "Advanced Compression" license, so it will cost you a lot! You can find the details of the Advanced Compression license at this link: https://www.oracle.com/technetwork/database/options/compression/advanced-compression-datasheet-134474.pdf

The other two, Basic and Hybrid Columnar, do not require a license, but Hybrid Columnar Compression (also called HCC or Column Store compression) requires special storage systems; your storage must support HCC.

Basic Compression

You can compress a table with an ALTER TABLE command or create the table with a compression option, but there are a few important points. I've seen some clients that "mark" a table as compressed without knowing that the previously existing rows are not compressed! Here is an example:
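
The test was along these lines (my dummy table here is just a copy of ALL_OBJECTS):

create table tmp as select * from all_objects;

select round(sum(bytes)/1024/1024) size_mb
from   user_segments
where  segment_name = 'TMP';

alter table tmp compress;   -- only marks the table; existing rows stay as they are

select dbms_compression.get_compression_type(user, 'TMP', rowid) comp_type
from   tmp
where  rownum = 1;          -- returns 1, which means no compression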

I created a dummy table and "marked" it as compressed, but the size didn't change and the rows are not compressed! I know that because DBMS_Compression.get_compression_type returned 1 for a random row, and 1 means no compression.

Marking a table as "compress" means that subsequent rows will be compressed, but not the existing ones. I used the ROW STORE COMPRESS BASIC method above (or just "compress"), and only some specific operations get compressed with it. This is the most basic compression, but it has a HIGH compression ratio!

I just bulk inserted the same rows (basically doubling the number of rows), but the table size increased by only 4 MB, not 12 MB.

So, if you want to compress the existing rows, you must MOVE the table:

Here it is, the previous rows are also compressed now. You can combine these operations with: ALTER TABLE TMP MOVE COMPRESS; which compresses the currently existing rows. Of course, these operations will invalidate the indexes! You can use the "update indexes" clause, but it will increase the compression time. I will talk about this in my later posts.

You can use different syntax:

Alter table tmp compress;
Alter table tmp row store compress;
Alter table tmp row store compress basic;

All the commands above are the same: compress the table with the BASIC method. BASIC compression will compress new rows only if the data comes in as a bulk insert or an array insert; row by row inserts (insert into … values (…)), for example, won't be compressed. I mostly use this compression for previous months' data which is not frequently accessed and, when accessed, is only queried, with no DML, because updates also won't be compressed.

Advanced Compression

Don't let the naming mislead you: it is not more "advanced" than BASIC compression. It actually has almost the same compression level (HIGH) as basic compression, but Advanced table compression has some advantages. The most important one is that it can be used in OLTP systems, meaning it can be used for active data, not just last month's data. The performance overhead for current DML operations is lower than with BASIC compression, and it can even compress row by row inserts.

Compression is not done right after a row is processed (insert or update). When a block reaches a threshold, Oracle compresses that block in the background. So your operations run at full speed and even newly inserted / updated rows get compressed. BUT it requires the Advanced Compression license, as I said before.

Hybrid Columnar Compression

This is the king of compression. It compresses data much, much more than the first two and, as you can guess, it has a disadvantage: HCC is a much more CPU-intensive compression, requiring far more CPU than basic and advanced compression. It is mostly used for less frequently accessed rows. It has 2 methods, each with 2 sub-methods:

  1. Query Low (Lowest Compression Ratio, Lowest Cpu Usage)
  2. Query High
  3. Archive Low
  4. Archive High (Highest Compression Ratio, Highest Cpu Usage)

From 1 to 4 the compression ratio increases, but so does the CPU usage.

Basically, it uses a different logic for compression. The first two methods compress rows within the same block: they write the same data once and use references to the actual places. Column store compression compresses columns, not rows, in a block, and that is more logical because repetitive data mostly exists within the same column, for example gender, customer id or product code.

So, in this post I wanted to share a real-life example with compression ratios and speeds. Of course, I cannot provide the actual data here, but I used a table with real data, copied it and then compressed the same table with different methods. Here are the results:
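
The script was something like this (it assumes the copied table is called TMP and simply moves it with each compression method, printing the elapsed time and the segment size; the HCC alternatives only work if your storage supports them):

set serveroutput on
declare
   v_start  pls_integer;
   v_sizemb number;
begin
   for r in (select column_value as method
             from   table(sys.odcivarchar2list(
                       'NOCOMPRESS',
                       'ROW STORE COMPRESS BASIC',
                       'ROW STORE COMPRESS ADVANCED',
                       'COLUMN STORE COMPRESS FOR QUERY LOW',
                       'COLUMN STORE COMPRESS FOR QUERY HIGH',
                       'COLUMN STORE COMPRESS FOR ARCHIVE LOW',
                       'COLUMN STORE COMPRESS FOR ARCHIVE HIGH'))) loop
      v_start := dbms_utility.get_time;
      execute immediate 'alter table tmp move ' || r.method;
      select round(sum(bytes)/1024/1024) into v_sizemb
      from   user_segments where segment_name = 'TMP';
      dbms_output.put_line(rpad(r.method, 42) ||
                           lpad(v_sizemb, 6) || ' MB  ' ||
                           round((dbms_utility.get_time - v_start) / 100) || ' sec');
   end loop;
end;
/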

This small piece of code compresses the TMP table with different methods and reports the timing and size of each. The first one is NOCOMPRESS, so we see the size of the table without compression, and then the results of the other methods:

Without compression this table's size is 1408 MB. As you see, BASIC and ADVANCED compression reduce the size to 280 MB, but they take 31-37 seconds to compress.

Interestingly, Hybrid (column store) compression with "query low" takes less time than the others, just 20 seconds, and the size is reduced to 176 MB, almost half of the first two. You can also see that the size keeps decreasing from "Query Low" to "Query High". The king is Query High: the table size is 72 MB. 20x compression is very impressive, but as you can see it takes 2 minutes to compress. Also, querying tables compressed with those methods will take a significantly longer time.

Compression is very useful if you use it appropriately. This was just a demonstration of compression levels and the basics of compression. I will also cover query performance with compression, CLOB compression, CLOB vs HCC compression etc. in my later posts.

That's all for now. Wish you all a healthy day.

Redo Log Switch Order

Hi,

I always had the impression that redo log switches occur in group number order, but a small test showed me I was wrong.

As you know, the redo log is one of the most important parts of an Oracle database; it protects almost everything: your data in memory, your data files, your transactions etc. Whenever you make a change in the database, a redo entry (maybe two) is written to the redo log files. So the slowest component of an Oracle database is the redo log files, because they always do disk IO. As a result, we mostly place redo log files on the fastest disks so that our TPS (transactions per second) can be at its maximum.

Lately, I was tuning a client database and realized their redo log files and data files were on the same disk! This is already an availability issue: if something happens to that disk, you lose your data files and redo log files. There is one thing guaranteed in Oracle: if you lose an active redo log file, you will lose some data. We always have more than one redo log member (file) in a redo log group and put those files on separate disks; that way, if something happens to one of the disks, we can still use the redo log file on the other disk. Of course, you can use an ASM disk group and reduce the risk, but if you don't have an ASM disk group, then you must put your redo log members on different disks. Anyway, I asked the hosting company to add 4 new disks. I had 4 redo log groups with 2 members each. I put the members of the first and third redo log groups on Disk1 and Disk2, and the members of the second and fourth redo log groups on Disk3 and Disk4.

While Oracle is writing to redo log group 1, Disk1 and Disk2 are written, and when it switches to redo log group 2, Disk3 and Disk4 are written. The reason I did this is that these disk drives have limits: while you are writing data they use their full speed, but when you start to read another file at the same time, the write speed decreases. As you can guess, since our database is in archivelog mode, when a redo log switch occurs Oracle starts to archive the previous redo log file, and this means it needs to be read! So I don't want Oracle writing and reading the same disk at the same time. So, when redo log group 1 is full and a switch occurs, Disk1 will be read (to archive the redo log file) and redo log group 2, which resides on Disk3 and Disk4, will start to be written. This way none of the disks are read and written at the same time and we get the maximum performance. At least, that's what I thought!

While testing that structure, I realized that performance decreased dramatically after a switch occurred. When I checked, after redo log group 1 was full, it did not switch to redo log group 2! It switched to redo log group 3, which is on the same disks (1 and 2).

As I said at the beginning, I always had the impression that redo log switches occur in group number order, but they do not. While moving the redo log files to those disks I did not shut down the database; I added some new redo log groups and dropped the old ones, so at some point some of the redo log groups were in UNUSED state (never used yet), and when a switch was required, Oracle switched to the smallest-numbered UNUSED redo log group.
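
You can watch this from v$log; the sequence numbers show which group was written after which:

select group#, sequence#, members, status
from   v$log
order  by group#;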

As you can see, the switch goes from 1 to 3, and Oracle also remembers that order and keeps switching in that order. So if you clear a logfile or add a new log group, keep in mind that the next switch will be to the newly added or cleared log group.

PS: by changing the disk structure like this, TPS (transactions per second) increased from 15.000 to 45.000! Small changes can produce big results.

 

 

NVME Drives and Oracle Database

Hi There,

NVMe drives are so hot these days; they have amazing read/write speeds. My old Samsung SSD drive can read and write around 550/530 MB per second, and my even older 2.5 inch HDD does 45/20 MB per second. Current NVMe drives do around 3500 MB per second read and 3000 MB per second write, which is extremely fast. Also, many people use RAID 0 on NVMe drives, and some of them have reached 33 GB per second read and 28 GB per second write. These are incredible speeds, and you might want to use them in your database, especially for the parts that require a lot of IO, like REDO logs.

Well think again and be careful before an upgrade!

Not all NVMe drives have the same basic speed characteristics. I made some tests on a dedicated server at a hosting company and would like to share the results.

On this server there are 3 NVMe drives (TOSHIBA KXG60ZNV512G) and 1 SSD drive (Micron 1100). The SSD is the boot drive, with Windows 2019 Standard Edition on it.

First, one of the most popular tests, the CrystalDiskMark results:

SSD:

Nvme:

As you see, NVMe is much, much faster in every category in CrystalDiskMark. My basic plan was to put every main component on a different NVMe and check the "transactions per second" (TPS) value, so I installed 19.3 on the first NVMe drive and created a database. Data files reside on the first NVMe, redo logs on the second and archive logs on the third. That way all the IO-intensive components are on different NVMe drives, and I thought performance would be super high. I also used just 1 redo log member per redo group to eliminate extra disk IO.

To test the database (a stress test, actually) I used "swingbench", which is a great tool to test your database, written by Dominic Giles. Many thanks to him. I used the "stress_test" model to check the real transactions-per-second value, with 24 simultaneous sessions and only INSERT statements. The screenshot of the "ALL NVME" configuration test result is:

TPS has a stable graph at around 10,800 (the test result with all DML and selects is around 18,000 TPS).

One of the biggest IO bottlenecks is REDO, and I think the second is the archive log. When a redo log switch occurs, the old redo log file is written out as an archive log. During that time both redo drives are read and the archive log drive is written, but as you see in the graph there are no dips, and I assure you there were 2-3 redo log switches during the test above.

Just out of curiosity, I tried putting those components (redo, data files, archive logs) on different drives or on the same drives. One of the configurations located the redo logs on the SSD drive, not an NVMe! So I expected much, much lower TPS, and here is the result:

Yes, that is insane! TPS was around 29,000 (the test result with all DML and selects is around 51,000 TPS), almost 3 times faster than NVMe! How is that possible? First of all, as you see in the graph, it wasn't stable; at some points it drops to around 7,000 TPS. That is when a redo log switch occurs. Since I have only 1 SSD drive I couldn't test further, but I am pretty sure that if I had 2 SSDs and put a redo member on each of them, that dip wouldn't happen, because the SSD doesn't have enough capacity to read and write at the same time. So is the SSD 3 times faster than NVMe? Not quite, actually, but I will come back to that later.

So what happened here? I checked the CrystalDiskMark results and they were awesome. The main problem is that REDO has a very small block size, 512 bytes. CrystalDiskMark does not test that size by default, so I used another disk benchmark utility, ATTO. Here are the results:

SSD:

Nvme:

If you check the 512-byte test, the SSD has 42 MB/s write speed and the NVMe just 28 MB/s. That explains a lot. NVMe disks are mostly designed to process big files in big chunks, not small files in small chunks. Of course, this is not true for all NVMe drives; hosting companies mostly use "datacenter edition" NVMe drives, which are more reliable but have lower performance on some points.

For example, this is a Samsung 970 Pro ATTO result:

It has around 70 MB/s write speed at 512 bytes. If I had the chance to use this NVMe, putting the redo log files on it would probably give me much higher TPS than that SSD.

So I am not saying use SSD instead of NVMe. Just check first! As my example shows, an upgrade might even decrease your performance.
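
For the redo part, you can also check which block size your own redo logs actually use (512 bytes in my case); v$log shows it:

select group#, blocksize
from   v$log;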

I will also try to test a server with more SSDs and see if I can prevent the TPS dip while archiving redo.

I wish you all healthy days.

Edit: by the way, to eliminate the archiving load on the SSD drive, I created one big redo log file of 10 GB and tested again; TPS was stable and reached around 30,000 for inserts only and 53,000 TPS for all operations.