Looking for something? Try here..

Wednesday, January 16, 2013

APPLIED column on v$archived_log when Standby database is setup


We use a simple concept to check for any standby gap or logs apply problems in all the high availability (Dataguard) servers in our environment. Use the below query in a script and make it run from cron periodically so we don’t have to go and check manually all the time.
spool /home/oracle/dba_scripts/logs/standby_db_sync_err.txt

select PRIM-SEC from
(select MAX(SEQUENCE#) PRIM FROM V$ARCHIVED_LOG WHERE DEST_ID=1 AND ARCHIVED='YES'),
(select MAX(SEQUENCE#) SEC FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES');

spool off
Today I received log applied gaps error alert mail stating we have a log applied gap of 24 logs. My script runs every hour and sends the alert mail only when the log applied gap crosses 10 logs since I have delay parameter set in the database.
Note: I used the term “log applied gaps” as I’m meaning that the term is used to differentiate the logs applied on primary with logs applied on standby database.

I first thought that there might be some activity going on in the database and the gap will catch up soon as this happens most of the time, but now the gaps increased to 35, the next hour. So something is wrong , and I should check it.

I used the following query on the standby database which gave the below.
select sequence#, process, status from v$managed_standby;

 SEQUENCE# PROCESS   STATUS
---------- --------- ------------
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
         0 ARCH      CONNECTED
     65999 MRP0      WAIT_FOR_LOG
     66007 RFS       IDLE
     66008 RFS       IDLE
     66009 RFS       IDLE
This is obvious, RFS gets the logs as and when generated at primary and MRP waits until the “delay” time reaches before it applies. The gap is not much but only 10 logs. Then I thought why the script gives wrong information?
I ran the script manually to check and still receive the same alert mail of log applied gap 35. So I ran the query separately in primary database to check/troubleshoot what is going on. I got the below response from the primary database.
select MAX(SEQUENCE#) PRIM FROM V$ARCHIVED_LOG WHERE DEST_ID=1 AND ARCHIVED='YES';

      PRIM
----------
     66010

select MAX(SEQUENCE#) SEC FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES';

       SEC
----------
     65975

Ok, I just wait for some time and check the same again but to see the same result. (Many a times, in standby environment I get the results delayed. J)

Now, it’s time to dig Oracle support documents to find what is really going on. I thought it might be some sort of bug as I haven’t faced such an issue before. As my prediction it turns to be a bug.
The MOS article “APPLIED - Column not updated if Heartbeat-ARCH hangs [ID 1369630.1]” states the issue we are dealing with is due to the bug “Bug 6113783 - Arch processes can hang indefinitely on network [ID 6113783.8]”.

The bug is fixed in 11.2.0.1 base release and I’m currently running my database in 10.2.0.4.
There are 2 options provided.
  1. Upgrade DB to 11.2.0.1 or higher
  2. Use the work around of killing the hung ARCH process in the primary database.
Option 2 looked feasible at this time as upgrading of the DB requires a lot of approvals, application compatibility, etc., etc.,.

So how to find the hung ARCH process?

MOS says Look for Messages like
"ARCn: Becoming the heartbeat ARCH"
in the ALERT.LOG of the Primary Database to determine the current Heartbeat ARCn-Process.

So I searched my alert log file to find the below.

Fri Aug 17 05:04:08 2012
ARC2: Becoming the heartbeat ARCH
ARC3 started with pid=103, OS id=28106
Fri Aug 17 05:04:08 2012
Check for the OS process id for ARC2
[oracle@****** bdump]$ ps -ef|grep ora_arc
oracle   32224     1  0  2012 ?        02:43:53 ora_arc0_edisipd
oracle   32226     1  0  2012 ?        02:21:49 ora_arc1_edisipd
oracle   32232     1  0  2012 ?        02:41:07 ora_arc2_edisipd
oracle   32234     1  0  2012 ?        02:26:41 ora_arc3_edisipd
oracle   23429 16642  0 04:51 pts/1    00:00:00 grep ora_arc
Kill the Hung ARCH process
[oracle@****** bdump]$ kill -9 32232
Now check the alert log file again to find the below.
ARCH: Detected ARCH process failure
ARCH: STARTING ARCH PROCESSES
ARC2: Archival started
ARCH: STARTING ARCH PROCESSES COMPLETE
ARC2 started with pid=67, OS id=23425
ARC2: Becoming the heartbeat ARCH
Wed Jan 16 04:53:47 2013
Ok, now the new process has been generated. Hence check with the primary database whether the APPLIED column is updated.
select MAX(SEQUENCE#) SEC FROM V$ARCHIVED_LOG WHERE DEST_ID=2 AND APPLIED='YES';

       SEC
----------
     66003
Now it has been updated successfully. J