Zero Data Loss DISASTER Database for Standard Edition 2

Mustafa, 2022-04-22

Hi,

As you know, Data Guard is an Enterprise Edition feature, and it is the only supported method to protect your data against a "DISASTER" (DR). What about Standard Edition 2 (SE2)? Is this possible? Well, I did it, so... kind of, yes 🙂 with some risk of course.

Warning: I don't take any responsibility for wrong settings. Wrong actions can damage your database. Please be careful.

If you google "Oracle SE2 Data Guard" or "Manual Log Shipping", you will find many blog posts and documents about it. Even YouTube has videos about this, but almost all of them (at least all I've seen and read) are not "Zero Data Loss". The steps are quite simple:

- you have a production database based on SE2
- you use another server as your DR server (it could even be the same server if you use the duplicate snapshot method)
- copy all your backups
- restore the database and keep it in MOUNT mode
- copy your archive logs every X minutes to the DR server
- recover the database using RMAN or SQL*Plus

And that is it. This is a valid method. The only gap in it is that you are transferring your archive logs but not your redo logs. This means you will almost certainly lose some data in case of a disaster, because the changes still sitting in the redo logs will be lost (if you cannot reach the server).

So, what can we do? The basic answer is: copy the redo logs too. But how? Redo logs change constantly. Even if you copy them at a 1-minute interval, you can still lose up to 1 minute of data. To achieve zero data loss, you must force the database to write both its redo logs and its archive logs onto the DR server. Not copying files at an interval, but forcing the database to write them there directly.

You probably already understand what I did. To use this kind of method you should (must) have a strong, reliable, stable and fast network connection between your production and disaster servers, because we will send redo logs and archive logs to the DR server instantly.

I am using 19.15 databases on Oracle Linux 8 (VirtualBox). The same thing can be done on other operating systems too; the logic is the same. Here are the steps I am about to take:

- On the DR server, create a folder to share over the network via NFS.
- On the PROD server, mount this shared folder.
- On the PROD server, add an archive log destination pointing to this shared folder.
- On the PROD server, add a redo log member for each group on this shared folder.
- On the DR server, create a copy of the prod database via RMAN (not duplicate).
- On the DR server, catalog the archive logs which reside on the shared folder and recover the database.
- In a disaster situation, copy the redo logs to their original locations with their original names and recover the database.

That's it. Let me demonstrate those steps. I created 2 virtual machines. Both have the Oracle DB software, one of them (PROD) has a database, and I will use the other virtual machine as DR.

- Oracle home is under /u01/app/oracle/product/19c/dbhome_1
- I (always) use OMF (Oracle Managed Files)
- data files are under /u01/app/oracle/oradata
- recovery area is under /u01/app/oracle/fast_recovery_area
- SID of the database is cdb
- PROD server IP is 192.168.56.101
- DR server IP is 192.168.56.104

1- On the DR server, create a folder to share over the network via NFS

As the oracle user (the Oracle DB software owner), create the necessary directory:

```bash
mkdir /u01/PROD_LOGS
chmod 1777 /u01/PROD_LOGS
```

This is the folder that I will share over the network and mount (map) on the production server.
To share it over the network, as the root user:

```bash
vim /etc/exports
# add the line below to /etc/exports
/u01/PROD_LOGS 192.168.56.101(rw,sync)

exportfs -r
exportfs -v
# output of exportfs -v should look like this:
# /u01/PROD_LOGS 192.168.56.101(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)

# enable the necessary services
systemctl start nfs-server.service
systemctl enable nfs-server.service
systemctl start rpcbind
systemctl enable rpcbind
```

2- On the PROD server, mount this shared folder

As the oracle user, create a directory to mount the shared directory from DR. /u01/app/oracle/fast_recovery_area/DR_LOGS will be used as the mount point; the directories inside it are for OMF (if you create these sub-directories before mounting the share, make sure they also exist inside the share after the mount, because mounting over a directory hides its previous contents):

```bash
mkdir -p /u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/onlinelog
mkdir -p /u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/archivelog
```

As the root user:

```bash
mount -t nfs 192.168.56.104:/u01/PROD_LOGS /u01/app/oracle/fast_recovery_area/DR_LOGS

vim /etc/fstab
# add the line below to fstab so the share is mounted automatically after a restart
192.168.56.104:/u01/PROD_LOGS /u01/app/oracle/fast_recovery_area/DR_LOGS nfs defaults 0 0
```

At this point, if I put a file into the /u01/app/oracle/fast_recovery_area/DR_LOGS directory on the PROD server, it is actually written to /u01/PROD_LOGS on the DR server.

3- On the PROD server, add an archive log destination pointing to the shared folder

By default with OMF, if no log_archive_dest_n parameter is set, Oracle internally treats log_archive_dest_1 as LOCATION=USE_DB_RECOVERY_FILE_DEST. That way, archive logs are written under the /u01/app/oracle/fast_recovery_area directory, with a sub-directory structure of course (/u01/app/oracle/fast_recovery_area/<SID>/archivelog/YYYY_MM_DD). But! If you assign a value to any of those parameters, Oracle will use only the locations you set to create archive logs. So I will set two log_archive_dest_n parameters: one location on the PROD server and one on the shared folder.

```sql
alter system set log_archive_dest_1='LOCATION=USE_DB_RECOVERY_FILE_DEST';
alter system set log_archive_dest_2='LOCATION=/u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/archivelog';
alter system set log_archive_min_succeed_dest=2;
```

By setting the first two parameters (log_archive_dest_1 and 2) we make sure that Oracle creates two copies of every archive log at the same time, in two different locations, one of which is actually our DR server.
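If you want to double-check that both destinations are really in use, a quick sanity check (not part of the setup itself) is to switch the log file and query v$archive_dest:

```sql
-- force a new archive log, then check the status of both destinations
alter system switch logfile;

select dest_name, status, destination
  from v$archive_dest
 where dest_name in ('LOG_ARCHIVE_DEST_1','LOG_ARCHIVE_DEST_2');
```

Both destinations should report VALID, and the new archive log should also appear under /u01/PROD_LOGS/CDB/archivelog on the DR server.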
As you noticed, I also set one more parameter, and it is very important: log_archive_min_succeed_dest. Its default is 1, and if you don't set it to 2, then in case of an access problem (the network might have issues, or the DR server might be shut down) Oracle will simply write to the first location and skip the inaccessible one. That would mean the necessary archive logs are not written to the DR server. If we want a "zero data loss" DR database, Oracle must write every archive log to the DR server (you can also use the MANDATORY clause on the destination for this).

4- On the PROD server, add redo log members for each group on the shared folder

Just as for the archive logs, we need to do the same thing for the redo logs:

```sql
alter system set db_create_online_log_dest_1='/u01/app/oracle/oradata/';
alter system set db_create_online_log_dest_2='/u01/app/oracle/fast_recovery_area';
alter system set db_create_online_log_dest_3='/u01/app/oracle/fast_recovery_area/DR_LOGS';

alter database add logfile member '/u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/onlinelog/redo1.log' to group 1;
alter database add logfile member '/u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/onlinelog/redo2.log' to group 2;
alter database add logfile member '/u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/onlinelog/redo3.log' to group 3;

create pfile from spfile;
```

As with the log_archive_dest_n parameters, if we set any of the db_create_online_log_dest_n parameters, we need to set all three of them: two locations on the PROD server and one on the DR server. Setting these parameters is not actually a mandatory step, but it is good practice. We could simply add the redo log members (the last three commands above) in the shared location and that would be enough, but if one day you create another log group and forget to add a member in the shared folder, you could be exposed to data loss! So set them anyway.

Then, add one member for each log group in the shared folder. I use a standard naming scheme here (redoN.log) because I need to be able to tell which file belongs to which group by its name (we will use this later). Lastly, I create a pfile, which I will move to the DR server so that we can create the DR database.
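Before moving on, it is worth confirming that every group really got a member on the shared folder; a quick look at v$logfile is enough (just a sanity check, not a required step):

```sql
-- each group should now list one member under .../DR_LOGS/CDB/onlinelog
select group#, member
  from v$logfile
 order by group#, member;
```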
5- On the DR server, create a copy of the prod database via RMAN

On the PROD server, the database "cdb" is up and running. Let's start. I don't want to go into all the details of standard backups and restores, so I won't put the full outputs here. On the PROD server, take a backup of the database via RMAN and then copy everything to the DR server via scp.

```
rman target /
backup database;
alter system switch logfile;
exit;
```

From the shell:

```bash
scp -r /u01/app/oracle/fast_recovery_area/CDB/ 192.168.56.104:/u01/app/oracle/fast_recovery_area/
scp -r /u01/app/oracle/product/19c/dbhome_1/dbs/initcdb.ora 192.168.56.104:/u01/app/oracle/product/19c/dbhome_1/dbs/
```

Copy all backups, archive logs etc. and also the pfile. Then go to the DR server, create the necessary directories and restore your database:

```bash
mkdir -p /u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/onlinelog
mkdir -p /u01/app/oracle/oradata/CDB/onlinelog/
mkdir -p /u01/app/oracle/admin/cdb/adump
```

These directories will be needed by the database. Restore the database:

```
rman target /
startup nomount
restore controlfile from autobackup;
alter database mount;
restore database;
```

At this point we have a copy of the prod database on DR, in MOUNT mode. Now something very important:

```bash
rm -r /u01/app/oracle/fast_recovery_area/CDB/onlinelog/*
```

DO THIS ON THE DR SERVER ONLY! We are deleting any copied redo logs, because during recovery there must not be any redo logs on the DR side.

6- On the DR server, catalog the archive logs which reside on the shared folder and recover the database

From now on, you can create a recover.sql script and call it every n minutes from cron to recover the database regularly (a sample wrapper and cron entry are sketched at the end of this step). The content of recover.sql is:

```
catalog start with '/u01/PROD_LOGS/CDB/archivelog' noprompt;
recover database;
```

We must catalog the newly arrived archive logs first and then recover the database. Remember that /u01/PROD_LOGS is the directory we shared over the network and mapped onto the PROD server. Every recovery run will end with an error:

```
RMAN> @recover.sql

RMAN> catalog start with '/u01/PROD_LOGS/CDB/archivelog' noprompt;
using target database control file instead of recovery catalog
searching for all files that match the pattern /u01/PROD_LOGS/CDB/archivelog

List of Files Unknown to the Database
=====================================
File Name: /u01/PROD_LOGS/CDB/archivelog/1_17_1102635802.dbf
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/PROD_LOGS/CDB/archivelog/1_17_1102635802.dbf

RMAN> recover database;
Starting recover at 22-APR-22
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=257 device type=DISK

starting media recovery

unable to find archived log
archived log thread=1 sequence=18
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 04/22/2022 00:50:33
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 18 and starting SCN of 2803040
```

This is expected behavior. Since we don't apply the redo logs, Oracle looks for the next archive log in the sequence and cannot find it. Our recovery is working. After the archive log with sequence 18 arrives, the recover command will raise the same error for sequence 19.
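Here is a minimal sketch of how the cron setup could look. The script name apply_logs.sh, the log file location and the 5-minute interval are just my examples, not fixed requirements; adjust them to your environment.

```bash
#!/bin/sh
# apply_logs.sh -- example wrapper that applies newly arrived archive logs on the DR server.
# Names, paths and the interval are illustrative; adjust them to your environment.

export ORACLE_HOME=/u01/app/oracle/product/19c/dbhome_1
export ORACLE_SID=cdb
export PATH=$ORACLE_HOME/bin:$PATH

# catalog new archive logs and recover; the final RMAN-06054 error is expected
rman target / @/home/oracle/recover.sql >> /home/oracle/recover_$(date +%Y%m%d).log 2>&1
```

And a crontab entry for the oracle user, running it every 5 minutes:

```
*/5 * * * * /home/oracle/apply_logs.sh
```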
7- In a disaster situation, copy the redo logs to their original locations with their original names and recover the database

From now on, our archive log and redo log files are also stored on the DR server. In a disaster situation, you must open the database by applying all logs, including the redo logs. This is the only moment when we apply the redo logs. These are the redo log locations on PROD:

- /u01/app/oracle/oradata/CDB/onlinelog
- /u01/app/oracle/fast_recovery_area/CDB/onlinelog
- /u01/app/oracle/fast_recovery_area/DR_LOGS/CDB/onlinelog (this one is the network directory on the DR server; the actual directory the redo files are written to is /u01/PROD_LOGS on DR)

Those directories must exist on the DR server too (we already created them in the scripts above). We must now copy the redo logs under /u01/PROD_LOGS to their actual locations. I deliberately named the redo log members redoN.log so that I can identify which redo file belongs to which redo group.
A very simple SQL script and shell script will copy the redo logs to their original locations:

```sql
------------------------------------
-- move_logs.sql
------------------------------------
set page 0
set heading off
set timing off
set trimspool on
set feed off

spool move_logs.output
select 'cp /u01/PROD_LOGS/CDB/onlinelog/redo'||group#||'.log ' || member from v$logfile;
spool off
exit;
```

```bash
#!/bin/sh
# put_redo_logs.sh
# run move_logs.sql to spool the cp commands into move_logs.output,
# then execute (and echo) every generated command
CopyCommands=$(sqlplus -s / as sysdba @move_logs.sql)

while read -r row; do
  eval "$row"
  echo "$row"
done < move_logs.output
```

Calling put_redo_logs.sh runs the SQL script, which produces the copy commands, and then executes each of them. After that, you can run recover.sql one more time; this time RMAN will not raise any error and you will see that the recovery completed. Now you can open your database and use it.

Now, why did I delete the redo log files that scp had copied onto the DR server? If you apply a redo log while the database is in MOUNT mode, the control file raises some flags and all subsequent recover commands start from the sequence of the applied redo log. This causes problems after a while, because the archive logs will eventually be deleted by the retention policy. It also means that if you apply a redo log during recovery, then to get back to a healthy copy of the database you must restore both the control file and the database all over again.

Pros

- Obviously, you have a DR database with fully consistent data.

Cons

- This approach carries high risk: if the database cannot write an archived or redo log to the shared folder, the production database will hang! You might have network issues, the DR server might be restarted or shut down, you might lose the disk drives on the DR server that the shared folder resides on, etc. I will write another post about those situations and run some tests.
- If losing a small amount of data for a short time is acceptable, a refreshable pluggable database would still be a better option. I will also write a post about that.
- If you don't have a reliable network, this option could be catastrophic! So try it carefully.

I hope this helps in some way. Wish you all healthy, happy days.