
Monday, December 12, 2022

Performance slowness due to Spatial component

We had a complaint from the application team that their database was running slow after the quarterly patching. We had patched many databases, and only this one database was complaining about slowness. The environment is ExaCC and the database is a container (running version 19.16) with multiple PDBs. So our initial thought was that patching couldn't be the reason, and we stepped in to investigate what was causing the slowness.

As a first step, we pulled an AWR report for the time frame during which they performed their DB operations.
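For anyone following along, an AWR report can be pulled with the scripts shipped under $ORACLE_HOME/rdbms/admin. A minimal sketch (the script prompts for the report format and the begin/end snapshot IDs; on RAC the global report script awrgrpt.sql can be used instead of the per-instance one):

SQL> @?/rdbms/admin/awrrpt.sql
-- prompts for report type (html/text), number of days of snapshots,
-- begin/end snapshot ids and the report file name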



We can see from the above that most of the waits are related to Cluster, and topping the list are gc cr multi block mixed, gc cr block 2-way and gc buffer busy acquire.

This means some query is running very frequently, continuously accessing one or two table blocks. We need to figure out which query is running, which table/index it is accessing, how frequently it executes and what its elapsed time is. This information will help us move forward.

Since the waits are related to Cluster, let's look at the SQL ordered by Cluster Wait Time (Global) section under SQL Statistics.
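Outside of the AWR report, a quick cross-check of which SQL is driving the Cluster waits can be done against ASH. A rough sketch (the one-hour window is arbitrary and can be adjusted):

select sql_id, event, count(*) samples
from gv$active_session_history
where wait_class = 'Cluster'
and sample_time > sysdate - 1/24
group by sql_id, event
order by samples desc;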



We see 2 queries that contribute most of the cluster waits. Both are related to the Spatial component, as shown below, and they had ~53k executions in the reporting period.
SELECT /* chk_journal_reg */ rowid FROM "MDSYS"."SDO_TXN_JOURNAL_REG" WHERE sid = :1 AND sdo_txn_idx_id = :2 AND indexptniden = :3 AND operation = :4 AND rid = :5

SELECT /* sel_dist_upd */ ROWID, ....., ins_no, del_no FROM "MDSYS"."SDO_DIST_METADATA_TABLE" WHERE sdo_index_owner = :own AND sdo_index_name = :nam AND sdo_partition_name = :ptn FOR UPDATE

OK, these 2 queries look simple and not complicated, so what's causing these buffer busy issues? The investigation went in the following direction:

Check the size of the tables: they are quite small.
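A quick way to confirm the table sizes is a query along these lines (sketch only):

select owner, segment_name, round(bytes/1024/1024) size_mb
from dba_segments
where owner = 'MDSYS'
and segment_name in ('SDO_TXN_JOURNAL_REG', 'SDO_DIST_METADATA_TABLE');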

Check the execution plan: the plan of the first query with high cluster waits gave us the clue. Though a primary key is defined on all five columns (SID, SDO_TXN_IDX_ID, INDEXPTNIDEN, OPERATION, RID), the query took the path of an INDEX STORAGE FAST FULL SCAN, which traverses all the index blocks unnecessarily. The query took around 4 seconds per execution.
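The plan in use can be confirmed in a couple of ways (a sketch; display_cursor needs the cursor still in the shared pool, while display_awr pulls the plan from AWR history):

select * from table(dbms_xplan.display_cursor('fw4q2s9j3bckt', null, 'TYPICAL'));
select * from table(dbms_xplan.display_awr('fw4q2s9j3bckt'));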

I hinted the query to use the index already available (/*+ index(SDO_TXN_JOURNAL_REG SYS_C0014253) */), which resulted in a much better plan and execution time; the execution time came down to milliseconds. Remember this query executed around 53k times, so this makes a big difference in application performance. While checking for any plan change, I found that the index path was used until the patch was applied, after which the plan flipped (a sketch of the hinted statement is shown after the plan history output below).

SQL> select distinct sql_id, plan_hash_value, id, operation, options
  2* from dba_hist_sql_plan where sql_id='fw4q2s9j3bckt' order by plan_hash_value, id;

          SQL_ID    PLAN_HASH_VALUE    ID           OPERATION                   OPTIONS
________________ __________________ _____ ___________________ _________________________
fw4q2s9j3bckt            1765869626     0 SELECT STATEMENT
fw4q2s9j3bckt            1765869626     1 INDEX               STORAGE FAST FULL SCAN
fw4q2s9j3bckt            2134037127     0 SELECT STATEMENT
fw4q2s9j3bckt            2134037127     1 INDEX               UNIQUE SCAN

SQL> 
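For illustration, the hinted statement looked roughly like this (a sketch based on the query shown earlier; the hint simply names the existing primary key index):

SELECT /*+ index(SDO_TXN_JOURNAL_REG SYS_C0014253) */ rowid FROM "MDSYS"."SDO_TXN_JOURNAL_REG" WHERE sid = :1 AND sdo_txn_idx_id = :2 AND indexptniden = :3 AND operation = :4 AND rid = :5;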
So, we decided to create a SQL profile so that the plan sticks to plan_hash_value 2134037127, using the note Encouraging CBO to Pickup a Better Execution Plan Using the COE XFR SQL Profile Script (Doc ID 1955195.1).
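The steps from that note look roughly like this (a sketch; the coe_xfr_sql_profile.sql script and the name of the generated file come from the MOS note and may differ slightly by version):

SQL> @coe_xfr_sql_profile.sql fw4q2s9j3bckt 2134037127
-- review and run the generated script to create the profile, e.g.:
SQL> @coe_xfr_sql_profile_fw4q2s9j3bckt_2134037127.sql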

In the meantime, the Oracle support note After Applying the 19.16 DBRU, Scheduler Jobs on Spatial Tables Take a Long Time to Execute (Doc ID 2907800.1) mentioned a bug related to this same table and query.

The reason for the slowness is:

Running a MV refresh using the DBMS_SCHEDULER package causes spatial to use the SDO_TXN_JOURNAL_REG transaction journal table, instead of the SDO_TXN_JOURNAL_GTT transaction journal table.

So now the solution is to revert the SQL profile we created and apply either the Spatial bundle bug fix Patch 34725493 or RU 19.17.

Since we had already applied another one-off patch for a previous Spatial issue, we couldn't apply this bug fix patch as it conflicted with the existing one-off patch. We raised an SR with Oracle to get a merge patch for the two conflicting patches, but Oracle couldn't give any commitment on when the merge patch would be ready.

As we needed to get the production database back to normal as soon as possible, we had to roll back the entire 19.16 RU to bring the database back to its pre-patch state (RU 19.15), after which the application team verified that all their jobs ran without issues. We also did not see any cluster-related waits after the patch rollback. We chose the rollback because 19.17 was yet to be tested in our lower environments before being promoted to production.

Interestingly, Oracle maintains a note listing all the patches that need to be applied after each RU under What Is The Latest Spatial Patch Bundle for 12c and Higher Databases? (Doc ID 2514624.1). So every time we apply an RU to the database, we need to refer to this note and apply the Spatial patch mentioned for that RU as well.
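A quick way to see which RU and Spatial patches are actually registered in a database is a query along these lines (a sketch; this shows only patches registered by datapatch, so binary-only patches would still need opatch lsinventory):

select patch_id, patch_type, action, status, description
from dba_registry_sqlpatch
order by action_time;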

This investigation was not only a troubleshooting experience but also a knowledge-gathering exercise on the Oracle Spatial component, covering the bugs related to Spatial, the separate Spatial patch bundle, and so on.


Happy Troubleshooting and Patching...!!!