Looking for something? Try here..

Sunday, October 2, 2016

Oracle GoldenGate Replicat erroring with OCI Error ORA-24369

This post is to talk about an interesting issue that I troubleshooted after the project is moved on to pre-production and in soak period.

All is done with the UAT testing and all the defects were fixed and got the UAT pass signed off. Deployed the code in the production environment. Initial loading of data went on for 2 days without any issues. Now its time for real time sync. 

I started the real time replicat and monitoring the system thinking nothing should go wrong as this was a 3 months non stop, sort of no good sleep hard work project involving huge amount of learning as self and as a team. Periodically checking the status of the process and bang!!! The process got ABENDED.. 

Now started the nightmare. I started digging the issue. First, report file to see what was the issue? Ok, I have issue with some data in a particular table. Now discard file to see which record got discarded? Ok, got it. Check whether it is a valid data to the database as I'm ignoring few orphan records (records without parent records) originating from source system. Well, this is a valid data allowed to the database.

The error I received is as below.
2016-09-26 17:43:54 INFO OGG-01021 Command received from GGSCI: STATS. 

2016-09-26 17:46:02 INFO OGG-01971 The previous message, 'INFO OGG-01021', repeated 1 times. 

2016-09-26 17:46:02 WARNING OGG-00869 OCI Error ORA-24369: required callbacks not registered for one or more bind handles (status = 24369), SQL <UPDATE "DATASTORE"."WORKLOG" x SET x."DETAILS" = EMPTY_CLOB() WHERE x."ENTRYID" = :b0 RETURNING x."DETAILS" INTO :dl0>. 

2016-09-26 17:46:02 WARNING OGG-01004 Aborted grouped transaction on 'DATASTORE.WORKLOG', Database error 24369 (OCI Error ORA-24369: required callbacks not registered for one or more bind handles (status = 24369), SQL <UPDATE "DATASTORE"."WORKLOG" x SET x."DETAILS" = EMPTY_CLOB() WHERE x."ENTRYID" = :b0 RETURNING x."DETAILS" INTO :dl0>). 

2016-09-26 17:46:02 WARNING OGG-01003 Repositioning to rba 1597961 in seqno 0. 

2016-09-26 17:46:04 WARNING OGG-01154 SQL error 24369 mapping ARADMIN.T433 to DATASTORE.WORKLOG OCI Error ORA-24369: required callbacks not registered for one or more bind handles (status = 24369), SQL <UPDATE "DATASTORE"."WORKLOG" x SET x."DETAILS" = EMPTY_CLOB() WHERE x."ENTRYID" = :b0 RETURNING x."DETAILS" INTO :dl0>. 


The below statement is what was registered as abended from the discard file.
UPDATE "DATASTORE"."WORKLOG" x SET x."DETAILS" = EMPTY_CLOB() WHERE x."ENTRYID" = :b0 RETURNING x."DETAILS" INTO :dl0 
Oracle support site didn't help much at my first search and hit only 2 articles related to this error.. So as this is a production issue, opened a severity ticket and provided all the details, report files, parameter files and even the trail files as they requested for investigation.
EMPTY_CLOB() is an Oracle internal package to initialize a CLOB variable and my guess was whether this function is being called by application or by database internal call.
As the ticket is for 24/7 support, the engineers are rotated and from one of the engineer got a clue on how to make the replicat to display the SQL statements as they were applied. As per advice, included the following parameters in the replicat param file and started the replicat from GoldenGate home.

showsyntax
grouptransops 1
maxtransops 1

started the replicat as below from GoldenGate home
$ ./replicat paramfile ./dirprm/<replicat_name>.prm

The showsyntax is used to display the statements that the replicat is going to run against the database and from this method it is found that the actual run varies against the erred query.

This is the exact update statement that the showsyntax produced..
UPDATE "DATASTORE"."WORKLOG" x SET x."DETAILS" = '<LOB Data>' WHERE x."ENTRYID"='000000431066777' 
There is a query transformation as below when the error occurs.
UPDATE "DATASTORE"."WORKLOG" x SET x."DETAILS" = EMPTY_CLOB() WHERE x."ENTRYID" = :b0 RETURNING x."DETAILS" INTO :dl0 
By this time I came across another bug note "Bug 1555887" from Oracle Support site in which the problem statement had the below lines..
In OCI, as long as there's only one row to update, the OCI happy,however, if 
the test case is run once (several iterations), then run again without 
truncating the clob table, the following errors occur if more than one row 
matches the update statement where clause:
 
1st time with 2 rows - onexecute
 
ORA-24369: required callbacks not registered for one or more bindhandles

So now with this clue, I started to look for duplicate data in the table and yes, I have got the duplicate data produced in the table. So this is the root cause that the process got abended.
Now I can't just say this is a duplicate record (source might contain 2 records as well) but to figure out how this data got duplicated, I started the investigation again from application side which is the source for all the data. The application sql logs stated as below.. (Example shown of a different data as logs turned on to see the sequence of operation on the table)
....
....
....
INSERT INTO T433 (C2,C7,C8,C112,C911001009,C911001089,C911001302,C911002100,C911018012,C911018013,C911018014,C911018016,C911018045,C911018046,C911018047,C911018056,C911018057,C911018058,C911018201,C911018254,C911203021,C911203022,C911203023,C3,C5,C6,C1) 
VALUES ('TXGUser',0,'Task # 1.1.1 Assigned to Group N/A',';0;',NULL,'A00','1.1.TR0000019131158',' 1.1.1',EMPTY_CLOB(),'TR0000019131158','Notification','BRE','BRE','Ball','Notification','TXGUser','911SMC','808-827-0515','TSK000076213737','Ball','No','No','No',1475015152,'TXGUser',1475015152,'000000431352674') 
...
SELECT C911018012 FROM T433 WHERE C1 IN ('000000431352674') FOR UPDATE 
...
Setting LOB into the above row ... 
....
....
So this is just a single update happening on a single row. So duplicates from application is ruled out.
Checking in the source database where the application connects, shows no sign of duplicates as the main column (C1) has a unique constraint to avoid duplicates.
Checking on the target system reveals many duplicate data sitting happily occupying space..
SQL> select entryid, count(*) from worklog group by entryid having count(*) >1;

ENTRYID        COUNT(*)
-------------------- ----------
000000431066591               2
000000431066615               2
000000431066617               2
...
...
...
The count of those records was around 2000. So how was this possible?
I used HANDLECOLLISIONS when turning on the real time replicat which has KEYCOLS as ENTRYID for the WORKLOG table.
So is this duplicates still possible? Shouldn't it eliminate the records that were already loaded via initial loading as we have mentioned KEYCOLS in the table mapping?
The answers is in the Oracle support note Doc ID 1350948.1 OGG Can Insert Duplicate Rows When There Is No Unique Index Primary Key UI PK Defined.
Cool, I don't have an unique key defined on the target table which allowed all the data (delta data) to be inserted one more time creating duplicates making the bug to kick in here as well.
So I just removed all the duplicates by running the following block
create table del as select entryid from (select entryid, count(*) from worklog group by entryid having count(*) >1);
DECLARE
   wld        NUMBER;
   lv_stmt2   VARCHAR2 (1000);
BEGIN
   FOR eid IN (SELECT entryid
                 FROM del)
   LOOP
      lv_Stmt2 :=
            'select worklistid from (select * from worklog where entryid= '
         || eid.entryid
         || ' order by 1 desc) where rownum=1';

      EXECUTE IMMEDIATE lv_stmt2 INTO wld;

      DELETE FROM worklog
            WHERE worklistid = wld;
   END LOOP;
END;
/
drop table del purge;
Now to make sure I don't get duplicates again, I just created a unique index on ENTRYID column which is again my key column to do the table operations.
The data then replicated happily..

Moral of the story: Duplication is the worst enemy of the well built system :)

Few things:
To ignore orphan records, I used REPERROR (2291, ignore) in my parameter file.
The bug note mentioned in this blog is said to be fixed in product version 8.1.6.1.4 in the same note whereas my current database version this issue is on 11.2.0.4 with GoldenGate version 12.2.0.1
The whole troubleshooting exercise took around 3 days to fix the issue

Happy troubleshooting!!

2 comments:

  1. Thank you so much for a detailed explanation .... resolving data issue is not easy task ...

    ReplyDelete
  2. Very nice and detailed explanation. useful insight

    ReplyDelete