Oracle Tips and Techs: April 2023

Sunday, April 16, 2023

Create Oracle database Flashback logs outside FRA!

Many times, as a DBA we would be required to create database restore points so that if something messed up in the database be it due to a database upgrade or patching, application upgrade touching the database objects, etc., it would be easy to go back to the time when the operation was started without the need of restoring the database.

To facilitate this easy feature, one would need to enable or create the restore point be it a normal restore point or a guaranteed restore point. I will always prefer the guaranteed restore point if I want to secure the database of unexpected mess as a roll back plan. The flashback logs of a normal restore point can be resued if there is a space pressure in the FRA or it is no longer needed to satisfy the flashback retention target and hence the recovery window might be limited.

One downside of creating restore point is that Oracle logs another type of log file apart from the standard archive logs called the flashback logs. Until version 21c, if we enable restore point or to use flashback feature, Oracle will always use the Fast Recovery Area (FRA) space to create the flashback logs. FRA is a shared space for all the database restore and recovery related files such as archive log, control file, backup piece and also the flashback log.

Below is an example of the FRA contents from a 19c database

You can see with the default settings, both archived logs and flashback log uses the same FRA and when there is a growth in flashback log, there is a possibility of database stalling due to insufficient space when either of the logs fills the FRA.

In Oracle 23c, there are 2 new parameters introduced as below

db_flashback_log_dest
db_flashback_log_dest_size

These 2 parameters define where the flashback logs are stored outside of the FRA making the management of flashback logs easier.

SQL> alter system set db_flashback_log_dest_size=4G scope=both;

System altered.

SQL> alter system set db_flashback_log_dest='/media/sf_Oracle/Linux-823' scope=both;

System altered.

SQL> sho parameter flashback

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_flashback_log_dest                string      /media/sf_Oracle/Linux-823
db_flashback_log_dest_size           big integer 4G
db_flashback_retention_target        integer     1440
SQL>

Though we have enabled the new parameters to have a separate location for flashback logs, we still have to define FRA location and size before we start utilizing the new location for flashback logs. Otherwise, if you try to create restore point we will get the error as below.

SQL> show parameter recovery

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string
db_recovery_file_dest_size           big integer 0
recovery_parallelism                 integer     0
remote_recovery_file_dest            string
SQL>
SQL> CREATE RESTORE POINT freepdb1_rp FOR PLUGGABLE DATABASE FREEPDB1 GUARANTEE FLASHBACK DATABASE;
CREATE RESTORE POINT freepdb1_rp FOR PLUGGABLE DATABASE FREEPDB1 GUARANTEE FLASHBACK DATABASE
*
ERROR at line 1:
ORA-38784: Cannot create restore point 'FREEPDB1_RP'.
ORA-38786: Recovery area is not enabled.


SQL>

Now, let's enable FRA by allocating space and location and then check the contents before and after restore point creation.

SQL> alter system set db_recovery_file_dest_size=4G;

System altered.

SQL> alter system set db_recovery_file_dest='/opt/oracle/FRA';

System altered.

SQL> show parameter recovery

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /opt/oracle/FRA
db_recovery_file_dest_size           big integer 4G
recovery_parallelism                 integer     0
remote_recovery_file_dest            string
SQL> select * from v$flash_recovery_area_usage;

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                             0                         0               0          0
BACKUP PIECE                             0                         0               0          0
IMAGE COPY                               0                         0               0          0
FLASHBACK LOG                            0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

8 rows selected.

SQL> CREATE RESTORE POINT freepdb1_rp FOR PLUGGABLE DATABASE FREEPDB1 GUARANTEE FLASHBACK DATABASE;

Restore point created.

SQL> select * from v$flash_recovery_area_usage;

FILE_TYPE               PERCENT_SPACE_USED PERCENT_SPACE_RECLAIMABLE NUMBER_OF_FILES     CON_ID
----------------------- ------------------ ------------------------- --------------- ----------
CONTROL FILE                             0                         0               0          0
REDO LOG                                 0                         0               0          0
ARCHIVED LOG                             0                         0               0          0
BACKUP PIECE                             0                         0               0          0
IMAGE COPY                               0                         0               0          0
FOREIGN ARCHIVED LOG                     0                         0               0          0
AUXILIARY DATAFILE COPY                  0                         0               0          0

7 rows selected.

SQL>

We can see that once the restore point is created, FLASHBACK LOG entry from v$flash_recovery_area_usage has disappeared. So how do we check the details of the flashback logs? We do have a new view GV$/V$FLASHBACK_LOG_DEST to check the details.

SQL> select * from V$FLASHBACK_LOG_DEST;

NAME                                               SPACE_LIMIT SPACE_USED NUMBER_OF_FILES     CON_ID
-------------------------------------------------- ----------- ---------- --------------- ----------
/media/sf_Oracle/Linux-823                          4294967296  419430400               2          0

SQL>

Okay, so what is the use case of having a location outside of FRA?

We don't always create restore points (in turn flashback logs) and they are created as necessary. Most important times are before patching of application or database. FRA mostly is dedicated to a specific database and let's say we have multiple databases with varying work loads scheduled for patching or application upgrade involving multiple database. Managing FRA on each database could be cumbersome as few DB would easily fill up space but most database won't generate much logs. We can NFS mount a file system on multiple involved database servers and dedicate that location for flashback logs so that the space management can be done outside of the database easily as and when needed.

From performance point of view, if we have a separate location (disk) for the flahsback logs, i/o performance can be improved as i/o will be distributed between archive logs and flashback logs as both has to be written for the database to function.

References:

OLTP and Core Database (oracle.com)

Happy Flashback logging...!!!

Selvakumar Nagulan

A person with hard working ability, will power and passion to finish what he starts. An Oracle database administrator trying to explore every nook and corner possible in Oracle technology. A blogger, a story and poem writer at leisure. A guy who maintain at most patience even when the whole world rejects him.

Friday, April 7, 2023

Oracle Transparent Application Continuity (TAC) - Part 2

I have started a transaction in my bank account and money got deducted but the screen doesn't get me the confirmation of the transaction and ended with a time out error. There could be multiple reason and one of the reason being one of the node on which the session was working crashed.

How many of you encountered this situation?

Now, if the bank/application once recovered and connected to the database (on a surviving node) and there is no details of the previous transaction available to the new session just got connected. What happens to the data? Would it get committed?

If the application tries to run the same previous transaction, money might be deducted twice. If the application doesn't retry, your transaction might not be carried out and you might miss some important last date of bill payment

So, how does Oracle handles this?

In my last post, I explained about how Transparent Application Failover works and how it would be a great advantage for applications that mostly reads the database.

As a continuation of that post, we will see about the Transparent Application Continuity on what is it and how does it benefit applications utilizing this feature of Oracle.

Okay! We should know about Transaction Guard and Application Continuity before we dive into Transparent Application Continuity. Let's see them in brief to understand what they are..

Transaction Guard:

Transaction Guard provides a generic tool for applications to use for at-most-once execution in case of planned and unplanned outages. Applications use the logical transaction ID to determine the outcome of the last transaction open in a database session following an outage.

If you take the bank example, if the commit already happened and if application tries to replay the execution without knowing the status of the last transaction, this might lead to logical corruption of the data by committing duplicate transaction.

Benefits of Transaction Guard includes preserving the commit outcome, ensuring known outcome of every transaction and it can be used to provide at-most-once transaction execution

Let's see the illustration below

DB node crashed when:

Session 1 committed the transaction but acknowledgement is not yet returned to application. Application now don't know status of the previous transaction

Session 2 is idle, hence no impact to the session even this session goes off

Session 3 issues the commit but before the commit is received in the database

So without transaction guard, session 1 and 3 won't be able to proceed further since they don't know the status of the previous transaction. Client is left unknown whether transaction is committed. Logical corruption is possible if application try to rerun the transaction and missing data is possible if the transaction is not retried. This is why it is difficult to failover DML when a failure occurs on a specific node as the application fails to recognize that the last submission has committed, or shall commit sometime soon, or has not run to completion.

How do we resolve this mystery? What if we store the commit outcome in the database itself before letting the application/client know? This is what was done by introducing LTXID in Oracle database version 12.1.0.1.

Durability of commit is saved as a Logical Transaction Identifier (LTXID) at commit for all supported transaction types and is unique for each transaction. Applications can use this LTXID to determine the commit status. LTXID is persisted on commit and reused following a rollback. The transaction history table maintains the mapping of LTXIDs to database transaction. LTXID_TRANS table is created to store the LTXIDs and is not accessible by non-DBA user.

Hope it is clear what exactly is transaction guard provides us.

Transaction Guard is enabled using proper database service configuration and the requirements are as follows

1. Use a non-default service
2. Set the property COMMIT_OUTCOME=TRUE in the service to determine COMMIT is accessible and to get the commit outcome of the last transaction. RETENTION_TIMEOUT can be increased if necessary (default is 24 hrs)
3. Ensure that FAN is configured with ONS to talk with 12c database clients

In depth understanding of Transaction Guard is provided in this whitepaper

If I had to modify the orabliss_taf service that we created in our previous post to include transaction guard, we need to modify the service as below

srvctl modify service -db cdbrac -service orabliss_taf -commit_outcome TRUE

[oracle@ol819-rac2 bin]$ srvctl modify service -db cdbrac -service orabliss_taf -commit_outcome TRUE
[oracle@ol819-rac2 bin]$ srvctl config service -db cdbrac -service orabliss_taf |egrep "Commit|Retention"
Commit Outcome: true
Retention: 86400 seconds
[oracle@ol819-rac2 bin]$

Connection Pools and Application Continuity:

Application Continuity was first introduced for JDBC thin applications in Oracle 12.1 which uses an Oracle Connection pool such as Universal Connection Pool (UCP) for JDBC clients and the support for OCI was enhanced in Oracle 12.2. Connection pools can also be configured for third party applications as well.

A connection pool is a cache of database connection objects. The objects represent physical database connections that can be used by an application to connect to a database. At run time, the application requests a connection from the pool which reuses the connection objects and reduces new connection object creation. The pool also leverages many high availability and performance features available through an Oracle RAC database. These features include Fast Connection Failover (FCF), Run-time connection Load Balancing (RLB), and Connection Affinity.

An LTXID is associated with an application request at check-out from the connection pool, and is disassociated from the application request at check-in back to the pool. Between check-out and check-in, the LTXID on the session is exclusively held by that application request meaning the LTXID is associated with the next connection session. This allows to detect duplication and failover and also basic replay by third party containers replaying the last request after a non recoverable outage.

Application Continuity makes use of Transaction Guard (LTXID) and replays database requests exactly as they were originally submitted, to ensure a result consistent with the request. To do this, it first restores and validates the session on surviving instance and replays as necessary.

Mutables such as NEXTVAL of sequences and calls to sysdate are taken care by using the same values when they are replayed. This is accomplished by calls to Transaction Guard.

So KEEP privileges for mutables are required to be provided to users.

To configure Application Continuity the following parameters are needed in the service configuration.

So for Application Continuity to be enabled in the Oracle RAC, I have to modify my service as below.

srvctl modify service -db cdbrac -service orabliss_taf -failover_restore LEVEL1 -failoverdelay 30 -failoverretry 5 -failovertype TRANSACTION -replay_init_time 1800 -notification TRUE -drain_timeout 300 -stopoption IMMEDIATE

As a Database Administrator, Application Continuity has a major blocker for implementation since there needs to be code changes in the application to make use of both connection pools and get LTXID return codes and program as necessary. In this post, we are not going to test how Application Continuity works as that requires client coding.

With Oracle 18c, Oracle introduced Transparent Application Continuity which eliminates the use of connection pools (still is recommended) and in Oracle 19c, it is further enhanced and extends support to applications that do not use connection pools.

Transparent Application Continuity (TAC):

Transparent Application Continuity is a mode of Application Continuity that transparently tracks and records session and transactional state so that a DB session can be recovered following an outage. There are no code changes involved in the application and DBA doesn't even needs to know of application code to implement TAC. TAC uses LTXID and a state tracking infrastructure

To use TAC, the following configuration/steps can be followed

Use a supported client. SQLPLUS 19c or later is supported and we will be using this in our demo. Few other clients are Oracle JDBC Replay Driver 18c or later, UCP 18c or later, OCI Session Pool 19c or later, etc.
If using connection pool, connections should be returned to the connection pool
Set FAILOVER_RESTORE to AUTO
Enable mutables use in application by providing KEEP grants to Sequences and date time to users. Oracle 19c automatically KEEPs the mutables for sql.
SIDE EFFECTS (like an extenal action such as sending emails, transfer files, etc) are disabled.

With this change, if I need to enable TAC in my database I have to modify my service with below parameters (compare this with Application Continuity)

You may notice failover_restore and failovertype both defined as AUTO to enable TAC. My service modification would be as below.

srvctl modify service -db cdbrac -service orabliss_taf -failover_restore AUTO -failoverdelay 30 -failoverretry 5 -failovertype AUTO -replay_init_time 1800 -notification TRUE -drain_timeout 300 -stopoption IMMEDIATE

Demo of Transparent Application Continuity (TAC):

Let's check how TAC gives a seemless application failover with DML statements by doing the following.

Connect to database using SQLPLUS with oracle_taf without Transaction Guard enabled (without -commit_outcome=TRUE)
Perform a DML and check ltxid_trans table from a different SYS session
Shut down instance running DML and observe error
Modify oracle_taf to enable Transaction Guard and connect to database using the service
Perform DML and check ltxid_trans table from a different session
Modify oracle_taf to enable Transparent Application Continuity
Connect to database using the TAC enabled service
Perform DML and shut down instance running DML and observe the results

[oracle@ol819-rac1 ~]$ sqlplus dummy/dummy@orabliss_taf

<<< Removed Banner >>> 

dummy@node1> select count(*) from test_table;

  COUNT(*)
----------
   8199536

dummy@node1> delete from test_table where object_id=57732;

112 rows deleted.

dummy@node1> commit;

Commit complete.

dummy@node1>

Check ltxid_trans table

[oracle@ol819-rac1 ~]$ sql / as sysdba

<<< Removed Banner >>>

SQL> @/home/oracle/ccon
Enter value for container: orabliss
old:alter session set container=&container
new:alter session set container=orabliss

Session altered.

SQL>
SQL> select * from ltxid_trans;

no rows selected

SQL>

I'll now trigger a delete on the table and while the delete is running midway, I'll shut the OS (node 2) abruptly to create a crash scenario.

sys@node1> @/home/oracle/sessions.sql

   INST_ID    SID    SERIAL    SQL_ID    Username    Seconds in Wait    Command                   Machine    OS User      Status                                        Module
__________ ______ _________ _________ ___________ __________________ __________ _________________________ __________ ___________ _____________________________________________
         2    424      1281           DUMMY                                     ol819-rac1.selvapc.com    oracle     inactive    SQL*Plus

sys@node1>

dummy@node1> select sys_context('userenv','sid') from dual;

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
424

dummy@node1> delete from test_table where rownum<500000; 

499999 rows deleted.

dummy@node1>

<<< Shut down node 2 abruptly to create crash scenario >>>

dummy@node1> commit;
commit
     *
ERROR at line 1:
ORA-25405: transaction status unknown


dummy@node1> select program,sid,failover_type,failover_method,failed_over from v$session where sid=sys_context('userenv', 'sid');

PROGRAM                                                 SID FAILOVER_TYPE FAILOVER_M FAI
------------------------------------------------ ---------- ------------- ---------- ---
sqlplus@ol819-rac1.selvapc.com (TNS V1-V3)              460 SELECT        BASIC      YES

dummy@node1> select count(*) from test_table;

  COUNT(*)
----------
   8199424

dummy@node1>

We can see the transaction didn't commit even though the session got failed over to the other instance.

What if there is a select involved with the DML statement? Let's check that out..

dummy@node1> select sys_context('userenv','sid') from dual;

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
430

dummy@node1> insert into test_table select * from test_table;

Session 430 from node 1 got connected to node 2 and started an insert statement. While the statement is under process, I shutdown the node 2 abruptly.

sys@node1> @/home/oracle/sessions.sql

   INST_ID    SID    SERIAL    SQL_ID    Username    Seconds in Wait    Command                   Machine    OS User      Status      Module
__________ ______ _________ _________ ___________ __________________ __________ _________________________ __________ ___________ ___________
         2    430     10274           DUMMY                                     ol819-rac1.selvapc.com    oracle     inactive    SQL*Plus

<<< Shut down node 2 here... >>>

sys@node1> @/home/oracle/sessions.sql

no rows selected

After sometime, when I cancelled the session I got the below error

dummy@node1> insert into test_table select * from test_table;

^Cinsert into test_table select * from test_table
*
ERROR at line 1:
ORA-25408: can not safely replay call


dummy@node1>

This is because the fact that Transparent Application Failover (TAF) cannot handle transactions that include data manipulation language (DML). In this case, we need to log out of application, reconnect and rerun the transaction. We can see the session got failedover to instance 1

dummy@node1> select program,sid,failover_type,failover_method,failed_over from v$session where sid=sys_context('userenv', 'sid');

PROGRAM                                                 SID FAILOVER_TYPE FAILOVER_M FAI
------------------------------------------------ ---------- ------------- ---------- ---
sqlplus@ol819-rac1.selvapc.com (TNS V1-V3)              175 SELECT        BASIC      YES

dummy@node1>

We will now modify the service to enable Transaction guard and check the ltxid_trans table to check commit data details. Note: We are not going to write code to get the commit_outcome and test but will just see the contents of the ltxid_trans table. We will proceed with TAC once this check is done.

[oracle@ol819-rac1 ~]$ srvctl modify service -db cdbrac -service orabliss_taf -commit_outcome TRUE
[oracle@ol819-rac1 ~]$ srvctl config service -db cdbrac -service orabliss_taf |egrep "Commit|Retention"
Commit Outcome: true
Retention: 86400 seconds
[oracle@ol819-rac1 ~]$ sqlplus dummy/dummy@orabliss_taf

dummy@node1> select sys_context('userenv','sid') from dual;

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
423

dummy@node1> delete from test_table where rownum<500000;

499999 rows deleted.

dummy@node1> commit;

Commit complete.

dummy@node1>

SYS@node2> select * from ltxid_trans;

   MAJ_VERSION    MIN_VERSION    INST_ID         DB_ID                        SESSION_GUID    TXN_UID    COMMIT_NO                                START_DATE    SERVICE_ID    STATE    FLAGS    REQ_FLAGS    ERROR_CODE
______________ ______________ __________ _____________ ___________________________________ __________ ____________ _________________________________________ _____________ ________ ________ ____________ _____________
             1              2          2    2960387100 F8B741BDAFB62A60E053CA38A8C0014E           111            0 07-APR-23 08.33.32.185503000 AM +05:30                1        1        0            0             0

SYS@node2>

Now, as we have enabled transaction guard once the transaction is committed, we can see the ltxid_trans table is updated with the session and transaction information. This information will be used for replaying the transaction during failover

As we have enabled only Transaction Guard as of now and if the node crashed before committing the data, we would be getting the below error message if we try to commit the data after the session failover to the surviving instance.

dummy@node1> insert into test_table select * from test_table;

7199426 rows created.

<<< Shutdown OS abruptly >>>

dummy@node1> commit;
commit
     *
ERROR at line 1:
ORA-25402: transaction must roll back

dummy@node1>

Now, as the final phase of our testing we will enhance the orabliss_taf to enable Transparent Application Continuity (TAC) and observe the results

[oracle@ol819-rac1 ~]$ srvctl modify service -db cdbrac -service orabliss_taf -failover_restore AUTO -failovertype AUTO -failoverdelay 30 -failoverretry 5 -replay_init_time 1800 -notification TRUE -drain_timeout 300 -stopoption IMMEDIATE
[oracle@ol819-rac1 ~]$
[oracle@ol819-rac1 ~]$ sqlplus dummy/dummy@orabliss_taf

...
...

dummy@node1> select sys_context('userenv','sid') from dual;

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
232

dummy@node1> insert into test_table select * from test_table;

<<< Shut down node 2 while insert is running >>>

14898851 rows created.

dummy@node1> dummy@node1> commit;

Commit complete.

dummy@node1> select sys_context('userenv','sid') from dual;

SYS_CONTEXT('USERENV','SID')
--------------------------------------------------------------------------------
42

dummy@node1> select program,sid,failover_type,failover_method,failed_over from v$session where sid=sys_context('userenv', 'sid');

PROGRAM                                                 SID FAILOVER_TYPE FAILOVER_M FAI
------------------------------------------------ ---------- ------------- ---------- ---
sqlplus@ol819-rac1.selvapc.com (TNS V1-V3)               42 AUTO          BASIC      YES

dummy@node1>

Though I have introduced the crash in between the DML statement, the session failed over to instance 1 and completed the insert statement without any issues.

Session state from node 1 was observed as below.

SYS@node1> @/home/oracle/sessions.sql

   INST_ID    SID    SERIAL           SQL_ID    Username    Seconds in Wait    Command                   Machine    OS User      Status      Module
__________ ______ _________ ________________ ___________ __________________ __________ _________________________ __________ ___________ ___________
         1     24      7850                  DUMMY                                     ol819-rac1.selvapc.com    oracle     inactive    SQL*Plus
         2    232     62564 19uk9dbxy0zbr    DUMMY                        3 INSERT     ol819-rac1.selvapc.com    oracle     active      SQL*Plus

SYS@node1> @/home/oracle/sessions.sql

   INST_ID    SID    SERIAL           SQL_ID    Username    Seconds in Wait    Command                   Machine    OS User      Status      Module
__________ ______ _________ ________________ ___________ __________________ __________ _________________________ __________ ___________ ___________
         1     24      7850                  DUMMY                                     ol819-rac1.selvapc.com    oracle     inactive    SQL*Plus
         1     42     53960 19uk9dbxy0zbr    DUMMY                        2 INSERT     ol819-rac1.selvapc.com    oracle     active      SQL*Plus

SYS@node1>

We can see the session 232 was running the INSERT statement against INST_ID 2 and then failed over to INST_ID 1 with SID 42.

Conclusion:

So the answer to my initial question "Whether DMLs will be failed over to surviving instance if one of the instance in a Oracle RAC is crashed?" is now clear.

Oracle introduced Transaction Guard which is effectively enhanced to be utilized by Transparent Application Continuity (TAC) that can be implemented in the database without any code changes to application to have a downtime free database experience.. !

References:

applicationcontinuityformaa.pdf (oracle.com)

Server Control Utility Reference (oracle.com)

White Paper (oracle.com)

Oracle Notification Client and FAN

Database Outage? Who cares??? – Database Heartbeat (database-heartbeat.com)

Happy TACing...!!!