
Monday, December 20, 2021

Oracle Application/session tracing methods - 3

This is a continuation of the previous posts. How to trace an individual session was discussed in the below link

Oracle Application/session tracing methods - 1

and how to trace sessions at the application or database level was discussed in the below link

Oracle Application/session tracing methods - 2

You might have noticed that when we start tracing at the application level, we end up with a bunch of trace files because multiple sessions are involved. For example, when we trace using a module, multiple sessions can run queries under the same module. In such cases, we can't dig through each and every trace file to investigate the issue. It is much easier if we have a single consolidated trace file so that we get a better understanding of the issue we are investigating. Also, the trace file needs to be in a human-readable format rather than a bunch of random numbers here and there that only the computer can understand. 


In this post we are going to look at how to consolidate multiple trace files, as well as how to make these trace files human readable, using the below tools

  • trcsess
  • tkprof

trcsess

We can use trcsess to consolidate trace files based on several criteria such as 

  • Session id
  • Client id
  • Service name
  • Action name
  • Module name

Let's see how to use this tool with an example. In my previous post, where I mentioned tracing sessions by service name, I used Swingbench to generate load using 4 sessions. The session traces are as below

..
..
-rw-r-----. 1 oracle oinstall  4954796 Dec 15 00:25 orcl_ora_19415.trm
-rw-r-----. 1 oracle oinstall 44587093 Dec 15 00:25 orcl_ora_19415.trc
-rw-r-----. 1 oracle oinstall  4233304 Dec 15 00:25 orcl_ora_19413.trm
-rw-r-----. 1 oracle oinstall 37791460 Dec 15 00:25 orcl_ora_19413.trc
-rw-r-----. 1 oracle oinstall  4911273 Dec 15 00:25 orcl_ora_19411.trm
-rw-r-----. 1 oracle oinstall 44277819 Dec 15 00:25 orcl_ora_19411.trc
-rw-r-----. 1 oracle oinstall  3892558 Dec 15 00:25 orcl_ora_19409.trm
-rw-r-----. 1 oracle oinstall 34574211 Dec 15 00:25 orcl_ora_19409.trc
..
..
These 4 traces can be consolidated into a single trace file as below

Syntax: trcsess [output=] [session=] [clientid=] [service=] [action=] [module=] [trace file names]
[oracle@linux75-2 trace]$
[oracle@linux75-2 trace]$ trcsess output=service_orcl_trace.trc service=ORCL orcl_ora_19415.trc orcl_ora_19413.trc orcl_ora_19411.trc orcl_ora_19409.trc
[oracle@linux75-2 trace]$
[oracle@linux75-2 trace]$ ls -lrt service_orcl_trace.trc
-rw-r--r--. 1 oracle oinstall 161234967 Dec 19 23:56 service_orcl_trace.trc
[oracle@linux75-2 trace]$ head -20 service_orcl_trace.trc
*** [ Unix process pid: 19415 ]
*** 2021-12-15T00:25:01.309981+05:30
*** [ Unix process pid: 19413 ]
*** 2021-12-15T00:25:01.309867+05:30
*** [ Unix process pid: 19409 ]
*** 2021-12-15T00:25:01.309928+05:30
*** [ Unix process pid: 19415 ]
*** 2021-12-15T00:25:01.309986+05:30
*** [ Unix process pid: 19413 ]
*** 2021-12-15T00:25:01.309871+05:30
*** [ Unix process pid: 19409 ]
*** 2021-12-15T00:25:01.309933+05:30
*** [ Unix process pid: 19415 ]
*** 2021-12-15T00:25:01.309990+05:30
*** CLIENT DRIVER:() 2021-12-15T00:25:01.309999+05:30

WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=11623002622
WAIT #0: nam='SQL*Net message from client' ela= 111757 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=11623121966
WAIT #0: nam='latch: shared pool' ela= 230 address=1613615352 number=551 why=1787423120 obj#=-1 tim=11623122364
WAIT #0: nam='library cache load lock' ela= 13162 object address=1757868440 lock address=1759806192 100*mask+namespace=5177347 obj#=-1 tim=11623135636
[oracle@linux75-2 trace]$ 
Notice that I'm explicitly listing the trace files for trcsess to look into; otherwise, trcsess will look into all the trace files inside the trace directory. 
Also, once the trace files are consolidated, the consolidated file shows the PIDs of all the traced sessions at the beginning. 
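Usage is the same for the other criteria; for example, a hedged sketch of consolidating by module (the module name 'Change name' is the one set via DBMS_APPLICATION_INFO in part 2 of this series, and the file list is just a glob of the generated traces):

[oracle@linux75-2 trace]$ trcsess output=module_change_name.trc module="Change name" orcl_ora_*.trc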

tkprof

The raw trace file, as I mentioned at the beginning of this post, is in Oracle's proprietary format, like below. 
..
..
*** 2021-12-15T00:25:01.309990+05:30
*** CLIENT DRIVER:() 2021-12-15T00:25:01.309999+05:30

WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=11623002622
WAIT #0: nam='SQL*Net message from client' ela= 111757 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=11623121966
WAIT #0: nam='latch: shared pool' ela= 230 address=1613615352 number=551 why=1787423120 obj#=-1 tim=11623122364
WAIT #0: nam='library cache load lock' ela= 13162 object address=1757868440 lock address=1759806192 100*mask+namespace=5177347 obj#=-1 tim=11623135636
WAIT #139804731097848: nam='cursor: pin S wait on X' ela= 2567 idn=3873422482 value=1735166787584 where=21474836480 obj#=-1 tim=11623138951
=====================
PARSING IN CURSOR #139804731097848 len=82 dep=1 uid=0 oct=3 lid=0 tim=11623139023 hv=3873422482 ad='71383950' sqlid='0k8522rmdzg4k'
select privilege# from sysauth$ where (grantee#=:1 or grantee#=1) and privilege#>0
END OF STMT
PARSE #139804731097848:c=379,e=3165,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2057665657,tim=11623139020
EXEC #139804731097848:c=161,e=161,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=2057665657,tim=11623139260
FETCH #139804731097848:c=74,e=74,p=0,cr=4,cu=0,mis=0,r=1,dep=1,og=4,plh=2057665657,tim=11623139366
WAIT #139804731068608: nam='cursor: pin S wait on X' ela= 3282 idn=3008674554 value=1735166787584 where=21474836480 obj#=-1 tim=11623142719
=====================
PARSING IN CURSOR #139804731068608 len=226 dep=1 uid=0 oct=3 lid=0 tim=11623142772 hv=3008674554 ad='71378ba8' sqlid='5dqz0hqtp9fru'
select /*+ connect_by_filtering index(sysauth$ i_sysauth1) */ privilege#, bitand(nvl(option$, 0), 72), grantee#, level from sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with grantee#=:1 and privilege#>0
END OF STMT
PARSE #139804731068608:c=169,e=3381,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1227530427,tim=11623142771
WAIT #139804731068608: nam='PGA memory operation' ela= 8 p1=131072 p2=2 p3=0 obj#=-1 tim=11623142861
EXEC #139804731068608:c=118,e=118,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1227530427,tim=11623142960
WAIT #139804731068608: nam='PGA memory operation' ela= 7 p1=65536 p2=1 p3=0 obj#=-1 tim=11623143087
FETCH #139804731068608:c=209,e=209,p=0,cr=5,cu=0,mis=0,r=1,dep=1,og=4,plh=1227530427,tim=11623143182
FETCH #139804731068608:c=5,e=5,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=1227530427,tim=11623143213
STAT #139804731068608 id=1 cnt=1 pid=0 pos=1 obj=0 op='CONNECT BY WITH FILTERING (cr=5 pr=0 pw=0 str=1 time=215 us)'
STAT #139804731068608 id=2 cnt=1 pid=1 pos=1 obj=144 op='TABLE ACCESS BY INDEX ROWID BATCHED SYSAUTH$ (cr=3 pr=0 pw=0 str=1 time=29 us cost=3 size=20 card=2)'
STAT #139804731068608 id=3 cnt=1 pid=2 pos=1 obj=147 op='INDEX RANGE SCAN I_SYSAUTH1 (cr=2 pr=0 pw=0 str=1 time=6 us cost=2 size=0 card=2)'
STAT #139804731068608 id=4 cnt=0 pid=1 pos=2 obj=0 op='HASH JOIN  (cr=2 pr=0 pw=0 str=1 time=125 us cost=7 size=115 card=5)'
..
..
This is partially understandable, but wouldn't it be nice to see the trace file in a more readable and easily understandable format? 
For this we have tkprof (transient kernel profiler) provided by Oracle. tkprof accepts a trace file as input and produces a formatted output file, and there are a few options that we can make use of. Explain plans for the captured queries can also be included as an option. Usage is as below
[oracle@linux75-2 trace]$ tkprof
Usage: tkprof tracefile outputfile [explain= ] [table= ]
              [print= ] [insert= ] [sys= ] [sort= ]
  table=schema.tablename   Use 'schema.tablename' with 'explain=' option.
  explain=user/password    Connect to ORACLE and issue EXPLAIN PLAN.
  print=integer    List only the first 'integer' SQL statements.
  pdbtrace=user/password   Connect to ORACLE to retrieve SQL trace records.
  aggregate=yes|no
  insert=filename  List SQL statements and data inside INSERT statements.
  sys=no           TKPROF does not list SQL statements run as user SYS.
  record=filename  Record non-recursive statements found in the trace file.
  waits=yes|no     Record summary for any wait events found in the trace file.
  sort=option      Set of zero or more of the following sort options:
    prscnt  number of times parse was called
    prscpu  cpu time parsing
    prsela  elapsed time parsing
    prsdsk  number of disk reads during parse
    prsqry  number of buffers for consistent read during parse
    prscu   number of buffers for current read during parse
    prsmis  number of misses in library cache during parse
    execnt  number of execute was called
    execpu  cpu time spent executing
    exeela  elapsed time executing
    exedsk  number of disk reads during execute
    exeqry  number of buffers for consistent read during execute
    execu   number of buffers for current read during execute
    exerow  number of rows processed during execute
    exemis  number of library cache misses during execute
    fchcnt  number of times fetch was called
    fchcpu  cpu time spent fetching
    fchela  elapsed time fetching
    fchdsk  number of disk reads during fetch
    fchqry  number of buffers for consistent read during fetch
    fchcu   number of buffers for current read during fetch
    fchrow  number of rows fetched
    userid  userid of user that parsed the cursor

[oracle@linux75-2 trace]$ tkprof service_orcl_trace.trc service_orcl_trace.out explain='sys as sysdba' table=sys.tkprof_table sys=no sort=exeela,execpu waits=yes

TKPROF: Release 12.2.0.1.0 - Development on Mon Dec 20 00:25:40 2021

Copyright (c) 1982, 2017, Oracle and/or its affiliates.  All rights reserved.


password =
[oracle@linux75-2 trace]$ 
I have provided the service_orcl_trace.trc file as input to tkprof to convert it into service_orcl_trace.out, with explain plans for the queries along with their wait event statistics. 
I don't want to see the recursive calls made by SYS, hence sys=no, and I would like the output sorted by the queries' elapsed time and CPU time spent executing. 

Sample output produced by tkprof will look like below
 
********************************************************************************

SQL ID: 982zxphp8ht6c Plan Hash: 1666523684

select product_id, product_name, product_description, category_id,
  weight_class, supplier_id, product_status, list_price, min_price,
  catalog_url
from
  product_information where product_id = :1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    95536      0.57       0.91          0          0          0           0
Execute  95536      1.00       1.58          0          0          0           0
Fetch    95536      0.80       1.32         28     286608          0       95536
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   286608      2.38       3.81         28     286608          0       95536

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 89
Number of plan statistics captured: 6

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  TABLE ACCESS BY INDEX ROWID PRODUCT_INFORMATION (cr=3 pr=1 pw=0 time=543 us starts=1 cost=2 size=176 card=1)
         1          1          1   INDEX UNIQUE SCAN PRODUCT_INFORMATION_PK (cr=2 pr=0 pw=0 time=244 us starts=1 cost=1 size=0 card=1)(object id 24503)


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   95536        0.00          0.16
  db file sequential read                        28        0.00          0.02
  SQL*Net message from client                 95536        0.01         24.82
  cursor: pin S                                  15        0.00          0.01
  latch: cache buffers chains                     1        0.00          0.00
********************************************************************************
 
The trace file now looks neat and easy to understand, and it can be interpreted and investigated for performance issues. 
A simple interpretation: this specific select query has been run 95536 times, with 28 blocks read from disk (physical I/O) and 286608 logical I/Os to fetch 95536 records. This means the query returns 1 row per execution. The total elapsed time is 3.81 seconds, with 2.38 seconds spent on CPU. The investigation can then move on to other queries to pinpoint the issues with the help of this tkprof report. 
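For reference, the arithmetic behind that interpretation is simply:

286608 consistent gets (query) / 95536 executions = 3 logical reads per execution
 95536 rows fetched / 95536 executions = 1 row per execution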

So we conclude our tracing methods with this and I hope it is easy to follow and understand. 

References

What is TRCSESS and How to use it ? (Doc ID 280543.1)
Pic courtesy: www.oracle.com

Happy tracing...!!!

Oracle Application/session tracing methods - 2

In my previous post, we saw how to trace individual sessions. The challenge with real-world performance issues is that there usually isn't a single session running in the DB; the application spawns multiple processes/sessions. In that case, if we want to trace a session, it becomes a tedious task to identify which session to trace. Many times there will be runaway sessions, and sometimes even the developers won't know the entire workflow, which adds further complexity to the existing issue. 


During these times, instead of tracing individual sessions, we can trace a set of sessions using application tracing methods. In this post, we will look at the methods for application tracing

Application level tracing

Application tracing can be performed by using 

  • DBMS_SQL_MONITOR
  • DBMS_MONITOR

We will have a separate post on DBMS_SQL_MONITOR later; for now we will look into the usage of DBMS_MONITOR

DBMS_MONITOR

The information we need to proceed with application tracing using dbms_monitor is any of the 3 below

  • Client info
  • Module
  • Action
Any of the 3 might or might not be obtainable from v$session, as v$session could also contain null for these fields. For example:
SQL> set sqlformat ansiconsole
SQL> select sid, serial#, username, client_info, module, action from v$session where username not in ('SYS','SYSTEM')
SID   SERIAL#  USERNAME  CLIENT_INFO  MODULE                                          ACTION
---------------------------------------------------------------------------------------------
17    63073    SOE       Swingbench Load Generator  JDBC Thin Client
401   47594    SOE       Swingbench Load Generator  JDBC Thin Client
778   11025    SOE                                  SQL*Plus
780   55219    SYSRAC                               oraagent.bin@linux75-2.selvapc.com (TNS V1-V3)
1151  4268     SOE       Swingbench Load Generator  JDBC Thin Client
1157  61639    SYSRAC                               oraagent.bin@linux75-2.selvapc.com (TNS V1-V3)
1158  10873    TEST                                 SQL Developer
1166  11417    SOE       Swingbench Load Generator  JDBC Thin Client


8 rows selected.

SQL> 
You can see a few sessions have CLIENT_INFO populated and a few don't, and none of the sessions has the ACTION field populated. 

We can use the DBMS_APPLICATION_INFO package to read the details needed, or to set them, using the READ_CLIENT_INFO, READ_MODULE, SET_ACTION, SET_CLIENT_INFO and SET_MODULE procedures, which are explained below

READ_CLIENT_INFO and READ_MODULE can be used to read these details for a session (the same details visible for the SID in the v$session view).
SET_ACTION, SET_CLIENT_INFO and SET_MODULE can be used to set names for a specific program or piece of code, which makes tracing easy. 
The below code provides an example of how to embed DBMS_APPLICATION_INFO calls in application program code, making way for us to trace as desired. 
CREATE
	OR replace PROCEDURE up_test1 AS

BEGIN
	dbms_application_info.set_module(module_name=> 'Change name', action_name=> 'insert id');

	INSERT INTO test1 (id,name)
	VALUES (102,'ORABLISS');

	dbms_application_info.set_action(action_name=> 'update id');

	UPDATE test1
	SET id = 101
	WHERE name = 'test';

	dbms_application_info.set_module(NULL, NULL);
END;
/
You can see I have used DBMS_APPLICATION_INFO in two places: once with SET_MODULE, providing both module_name and action_name, and once with SET_ACTION, providing only an action_name. This way we can trace only the parts we need in code with a large number of lines. 

Once we have introduced the module_name or action_name, we can trace the sessions running this piece of code without even knowing the session IDs of the sessions executing it. 

Use DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE procedure to trace sessions as desired for investigation

For tracing sessions using module_name:
SQL> exec dbms_monitor.SERV_MOD_ACT_TRACE_ENABLE(service_name=>'ORCL',module_name=>'Change name');

PL/SQL procedure successfully completed.

-- Run the procedure using other session which will trigger the tracing 
-- Once the procedure is completed we can disable trace as follows. 

SQL> exec dbms_monitor.SERV_MOD_ACT_TRACE_disable(service_name=>'ORCL',module_name=>'Change name');

PL/SQL procedure successfully completed.

SQL> 

For tracing sessions performing specific action:

We can also specify the action_name, which triggers tracing only when the specific action is started in the program
SQL> exec dbms_monitor.SERV_MOD_ACT_TRACE_ENABLE(service_name=>'ORCL',module_name=>'Change name',action_name=> 'update id',waits=>true,binds=>true);

PL/SQL procedure successfully completed.

-- Run the procedure using other session which will trigger the tracing 
-- This time tracing will capture only the update statement in the procedure as the action is named as 'update id'
-- Once the procedure is completed we can disable trace as follows. 

SQL> exec dbms_monitor.SERV_MOD_ACT_TRACE_disable(service_name=>'ORCL',module_name=>'Change name',action_name=> 'update id');

PL/SQL procedure successfully completed.

SQL> 
The above SERV_MOD_ACT_TRACE_ENABLE procedure follows a strict hierarchy: service_name is mandatory, followed by module_name and then action_name. If no module_name is provided, we can't provide an action_name, and if no action_name is provided, all the actions under the module will be traced. 
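A quick way to verify what tracing is currently enabled at the service/module/action level (and to make sure nothing is left switched on) is the DBA_ENABLED_TRACES view; a minimal sketch:

SQL> select trace_type, primary_id, qualifier_id1, qualifier_id2, waits, binds from dba_enabled_traces;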

For tracing sessions using specific service_name:

If we don't specify any module_name, all the sessions connecting to the DB using the service_name will be traced. Different applications can connect to the same database using different service names, as desired by the database admin, to segregate them. 
SQL> exec dbms_monitor.SERV_MOD_ACT_TRACE_ENABLE(service_name=>'ORCL');

PL/SQL procedure successfully completed.

-- This time tracing will capture all the sessions that connect to DB using ORCL service
-- We can disable trace as follows. 

SQL> exec dbms_monitor.SERV_MOD_ACT_TRACE_DISABLE(service_name=>'ORCL');

PL/SQL procedure successfully completed.

SQL> 

For tracing sessions using Client_identifier:


v$session will have the client_info field populated for most applications. If it is null, like in the example I gave in this post, we can use the DBMS_APPLICATION_INFO.SET_CLIENT_INFO procedure to set it in the program/main procedure, similar to below
CREATE
	OR replace PROCEDURE up_test2 AS

BEGIN
	dbms_application_info.set_client_info('client info test');
	dbms_session.set_identifier('client info test');

	INSERT INTO test1 (id,name)
	VALUES (102,'ORABLISS');

	UPDATE test1
	SET id = 101
	WHERE name = 'test';
	
	dbms_session.clear_identifier;
END;
/
As I have now included DBMS_SESSION.SET_IDENTIFIER, we can trace the session without searching for the session ID. 
Setting only CLIENT_INFO is useful for finding the session details to enable tracing manually; for automatic tracing of sessions by client identifier, we need dbms_session.set_identifier
SQL> exec DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE('client info test');

PL/SQL procedure successfully completed.

-- Run the procedure up_test2 using other session which will trigger the tracing using client identifier
-- Once the procedure is completed we can disable trace as follows (optional)

SQL> exec DBMS_MONITOR.CLIENT_ID_TRACE_disable('client info test');

PL/SQL procedure successfully completed.

SQL>

Database level tracing

The above methods covered the options for tracing parts of the database and its sessions. 
If a need arises to trace the entire database (I don't think anyone would do this on a production database), we can do so with the DBMS_MONITOR.DATABASE_TRACE_ENABLE procedure
SQL> exec DBMS_MONITOR.DATABASE_TRACE_ENABLE;

PL/SQL procedure successfully completed.

-- All the sessions in the database will be traced
-- waits, binds, instance_name and plan_stat can be provided as arguments

-- To disable tracing

SQL> exec DBMS_MONITOR.DATABASE_TRACE_DISABLE;

PL/SQL procedure successfully completed.

SQL>

With this, we have looked at all the ways we can trace application/DB sessions. 
All the trace files generated will be in Oracle's proprietary format. To convert the generated trace files and make use of them, check this link on tkprof


Happy Tracing

Oracle Application/session tracing methods - 1

Tracing is an important method when it comes to database and/or SQL performance tuning. In this blog post, let's see the various methods by which we can trace a session, an application, or an entire database, and how to read the trace files generated. 

Types of tracing: 

  • Session level
  • Application level
  • Database level


Prerequisites

  • Check that timed_statistics is TRUE
  • Check that statistics_level is TYPICAL or ALL
  • Set max_dump_file_size to unlimited so that the trace is not ended abruptly when the file size limit is reached (a quick sketch of adjusting these follows below)
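If any of these need adjusting, a minimal session-level sketch (assuming the ALTER SESSION privilege) would be:

SQL> alter session set timed_statistics=true;
SQL> alter session set statistics_level=typical;
SQL> alter session set max_dump_file_size=unlimited;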

Example tracing own session: 

When we need to trace the same session that we are connected to, in order to troubleshoot a query, we can perform the below

SQL_TRACE

[oracle@linux75-2 ~]$ sql soe/soe

SQLcl: Release 12.2.0.1.0 RC on Thu Dec 02 23:53:32 2021

Copyright (c) 1982, 2021, Oracle.  All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production


SQL> show parameter timed_statistics
NAME             TYPE    VALUE
---------------- ------- -----
timed_statistics boolean TRUE

SQL> show parameter statistics_level
NAME             TYPE   VALUE
---------------- ------ -------
statistics_level string TYPICAL

SQL> show parameter max_dump_file_size
NAME               TYPE   VALUE
------------------ ------ ---------
max_dump_file_size string unlimited

SQL> alter session set tracefile_identifier='MyTrace';

Session altered.

SQL> alter session set sql_trace=true;

Session altered.

SQL> -- Run the query that needs to be traced now
SQL> select count(*) from soe.addresses where country='India';

  COUNT(*)
----------
     19536

SQL> alter session set sql_trace=false;

Session altered.

SQL>

10046 tracing

We can also enable 10046 tracing to perform the same as below
SQL> -- set the identifier to find the trace file easily in the trace directory
SQL> alter session set tracefile_identifier='My10046';

Session altered.

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> -- Run the query to be traced now
SQL> select count(*) from soe.addresses where country='India';

  COUNT(*)
----------
     19536

SQL> alter session set events '10046 trace name context off';

Session altered.

SQL> show parameter diag
NAME            TYPE   VALUE
--------------- ------ ---------
diagnostic_dest string /u01/orcl
SQL> exit
For 10046 tracing, the following values are valid for trace level
  • 0 - Off
  • 2 - Similar to regular sql_trace
  • 4 - Same as 2, but with addition of bind variable values
  • 8 - Same as 2, but with addition of wait events
  • 12 - Same as 2, but with both bind variable values and wait events

DBMS_SESSION package

Our own session can also be traced using the dbms_session package
SQL> alter session set tracefile_identifier='MySess';

Session altered.

SQL> exec DBMS_SESSION.SESSION_TRACE_ENABLE(waits=> true, binds=> true);

PL/SQL procedure successfully completed.

SQL> select count(*) from soe.addresses where country='India';

  COUNT(*)
----------
     19536

SQL> exec DBMS_SESSION.SESSION_TRACE_DISABLE();

PL/SQL procedure successfully completed.

SQL> exit
All the trace files will be available by default in the trace directory under diagnostic_dest. 
[oracle@linux75-2 ~]$ cd /u01/orcl/diag/rdbms/orcl/orcl/trace/
[oracle@linux75-2 trace]$ ls -lrt *My*
-rw-r-----. 1 oracle oinstall  1310 Dec  2 23:55 orcl_ora_8930_MyTrace.trm
-rw-r-----. 1 oracle oinstall  2560 Dec  2 23:55 orcl_ora_8930_MyTrace.trc
-rw-r-----. 1 oracle oinstall  2984 Dec  2 23:55 orcl_ora_8930_My10046.trm
-rw-r-----. 1 oracle oinstall 26449 Dec  2 23:55 orcl_ora_8930_My10046.trc
-rw-r-----. 1 oracle oinstall  3404 Dec  3 00:38 orcl_ora_12142_MySess.trm
-rw-r-----. 1 oracle oinstall 33729 Dec  3 00:38 orcl_ora_12142_MySess.trc
[oracle@linux75-2 trace]$ 
By default, trace files are named using the server process ID. In the above, 8930 and 12142 are the process IDs of the sessions I logged into the database with. Since there will be many processes running, and Oracle tracing is enabled by default for many background processes, it is easy to get lost searching for the trace file we generated. To find the trace files easily, we set the tracefile_identifier as our first step of tracing. 
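Alternatively, the exact trace file name for the current session can be queried directly; a minimal sketch using V$DIAG_INFO:

SQL> select value from v$diag_info where name = 'Default Trace File';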

Example tracing another session:

In most circumstances as a DBA, we will not be tracing our own session but will have to trace another session, whether a user is already logged in or is about to log in. Let's take a look at how we do this 

Start tracing as soon as the user logs on

We can create an AFTER LOGON trigger to enable tracing for the session whenever a user logs on to the database, as below

CREATE OR REPLACE TRIGGER sys.set_trace AFTER LOGON ON DATABASE
WHEN ( user LIKE '&USERNAME' )
DECLARE
  lcommand VARCHAR(200);  -- not used below; placeholder for any additional dynamic SQL
BEGIN
  EXECUTE IMMEDIATE 'alter session set statistics_level=ALL';
  EXECUTE IMMEDIATE 'alter session set max_dump_file_size=UNLIMITED';
  DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => TRUE);  -- no session_id: traces the session that just logged on
END set_trace;
/

Note: The user for whom tracing should be enabled must explicitly have the ALTER SESSION privilege and execute privilege on the DBMS_MONITOR package. Otherwise the logon will fail with an insufficient privileges error.
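A hedged sketch of the grants implied by that note (TESTUSER is just a placeholder for the user being traced; run as a privileged user):

SQL> grant alter session to testuser;
SQL> grant execute on dbms_monitor to testuser;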

Start tracing when a session is identified for tracing

Many times, when an issue is identified, a session is already established in the database. So all we need to do is get the SID and Serial# of the session to begin tracing it. Mostly these sessions are running with high CPU consumption, high waits, or high DB time. 
So once we have figured out the SID and Serial#, the session can be traced as below
SQL> -- Finding session id and serial # to enable trace
SQL> select username, sid, serial# from v$session where username='SOE';

USERNAME                              SID    SERIAL#
------------------------------ ---------- ----------
SOE                                   783      48974

SQL> -- Enable trace for the session
SQL> EXECUTE DBMS_MONITOR.SESSION_TRACE_ENABLE(SESSION_ID=>783, SERIAL_NUM=>48974, WAITS=>TRUE, BINDS=>FALSE);

PL/SQL procedure successfully completed.

SQL> -- Now all the activities on session 783,48974 will be traced
SQL> -- We can disable trace once we have confirmation from user he/she completed their work to investigate
SQL>
SQL> EXECUTE DBMS_MONITOR.SESSION_TRACE_DISABLE(SESSION_ID=>783, SERIAL_NUM=>48974);

PL/SQL procedure successfully completed.

SQL> 
If the session is disconnected before we stop tracing, tracing will automatically be stopped and upon trying to stop the tracing, we will get the below error
BEGIN DBMS_MONITOR.SESSION_TRACE_DISABLE(SESSION_ID=>7, SERIAL_NUM=>44); END;

*
ERROR at line 1:
ORA-00030: User session ID does not exist.
ORA-06512: at "SYS.DBMS_MONITOR", line 144
ORA-06512: at line 1

We can also use the DBMS_SYSTEM package to trace the session, as below. DBMS_MONITOR, explained above, gives better control over what we need in terms of tracing
SQL> select username, sid, serial# from v$session where username='SOE';

USERNAME                              SID    SERIAL#
------------------------------ ---------- ----------
SOE                                  1151       6659

SQL> exec sys.dbms_system.set_sql_trace_in_session(1151, 6659, TRUE);

PL/SQL procedure successfully completed.

SQL> exec sys.dbms_system.set_sql_trace_in_session(1151, 6659, FALSE);

PL/SQL procedure successfully completed.

SQL> desc sys.dbms_system
...
...
PROCEDURE SET_SQL_TRACE_IN_SESSION
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SID                            NUMBER                  IN
 SERIAL#                        NUMBER                  IN
 SQL_TRACE                      BOOLEAN                 IN
...
...
SQL> 
So we conclude the different methods of tracing a session. In the next post, we will take a look at application tracing methods. Please check this link to learn about application tracing methods

To convert a raw trace file to a human-readable format, check this link on tkprof 

References

General SQL_TRACE / 10046 trace Gathering Examples (Doc ID 1274511.1)

Happy tracing...!!!

Thursday, October 28, 2021

Cross platform 11g (Windows) to 19c (Linux) upgrade and migration using AutoUpgrade utility

Source and Target details are as given below


In this post, let's look at the procedure to migrate and upgrade a database from version 11.2.0.4 running on Windows to version 19.12 running on Linux (Exadata in this case) and plug it into an existing CDB using the AutoUpgrade utility. 

Step 0: Download the AutoUpgrade utility from Doc ID 2485457.1 and install the latest Java from Java Downloads
The AutoUpgrade utility is Java based and generic across all operating systems.

Step 1: Prepare config files for AutoUpgrade utility

Create a config file for the analyze and fixups phases of AutoUpgrade.

global.autoupg_log_dir=C:\Oracle4Win\AutoUpgrade\logs
#
# Database number 1 
#
upg1.dbname=TEST11G
upg1.start_time=NOW
upg1.source_home=C:\Oracle11g\product\11.2.0\dbhome_1
upg1.target_home=/tmp
upg1.sid=TEST11G
upg1.log_dir=C:\Oracle4Win\AutoUpgrade\logs
upg1.upgrade_node=localhost
upg1.target_version=19
upg1.restoration=no
Note that I have set target_home to /tmp, as we will not perform the upgrade on the source server.

Step 2: Run the analyze mode of AutoUpgrade

DB environment is as below:
set ORACLE_SID=TEST11G
set ORACLE_HOME=C:\Oracle11g\product\11.2.0\dbhome_1
set PATH=C:\Oracle11g\product\11.2.0\dbhome_1\bin;%PATH%
Command used is as below:
C:\oracle\product\19.0.0\jdk\bin\java -jar C:\Oracle4Win\AutoUpgrade\autoupgrade.jar -config C:\Oracle4Win\AutoUpgrade\test11g.cfg -mode analyze
Note: Don't be confused by the 19c home I'm using for Java; I happen to have the latest Java needed in the 19c home on the same source server. You can download and install the latest Java without 19c on the source. The actual analyze is done on the 11g database with the 11g home.

We can use the lsj command within the utility to list the jobs with a summary


Step 3: Check status file for any action to be taken.

Though there are no errors in the status file, Oracle's recommendations should be followed for optimum upgrade and post-upgrade performance. We can click on the Precheck Report to go through all the recommendations provided by the utility if needed.

C:\Oracle4Win\AutoUpgrade\logs\cfgtoollogs\upgrade\auto\status

Step 4: Start fixups mode of AutoUpgrade utility

Command used is as below:
C:\oracle\product\19.0.0\jdk\bin\java -jar C:\Oracle4Win\AutoUpgrade\autoupgrade.jar -config C:\Oracle4Win\AutoUpgrade\test11g.cfg -mode fixups

Let us now see a few snaps of PRECHECKS progress...



Let us now see a few snaps of PREFIXUPS progress...



The below command will show the details of the specific job we are interested in:
status -job <jobnumber>



Step 5: Check status file for any action to be taken.

The prechecks passed and no manual intervention is needed in our case. If we click on the Prechecks Report, we will see all the details of what the software has tested: whether any user action is required, any errors or warnings reported, and the informational messages as well.
Mostly, the fixup stage will fix issues like purging the recycle bin, gathering stats prior to upgrade, suggesting parameter value removals/changes, suggesting removal of deprecated components, etc. 
If there are errors reported, they need to be taken care of before attempting the upgrade. 
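For reference, a hedged sketch of a couple of the typical fixups, in case you ever prefer to run them manually instead of letting the utility do it:

SQL> purge dba_recyclebin;
SQL> exec dbms_stats.gather_dictionary_stats;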

C:\Oracle4Win\AutoUpgrade\logs\cfgtoollogs\upgrade\auto\status


This concludes the AutoUpgrade utility activity on the source server. We have now prepared our source database to be upgrade ready. 

Step 6: Make a consistent backup to avoid issues during upgrade.

A consistent backup is needed here as the source is Windows and the target is Linux (cross-platform), and recovery of Windows archive logs on Linux is not supported.
  • Mount the DB and trigger an RMAN full backup
  • Transfer the backup files and the init<SID>.ora file to the target server

SQL> create pfile='C:\SharedPath\Backup\initTEST11G.ora' from spfile;
File created.

SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> startup mount
ORACLE instance started.

Total System Global Area  563691520 bytes
Fixed Size                  2283224 bytes
Variable Size             385878312 bytes
Database Buffers          167772160 bytes
Redo Buffers                7757824 bytes
Database mounted.
SQL> exit
C:\windows\system32>rman target /

Recovery Manager: Release 11.2.0.4.0 - Production on Mon Oct 4 14:15:37 2021

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: TEST11G (DBID=1234192708, not open)

RMAN> backup database format 'C:\SharedPath\Backup\%U' include current controlfile;

Starting backup at 04-OCT-21
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=156 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
…
….
…
channel ORA_DISK_1: starting piece 1 at 04-OCT-21
channel ORA_DISK_1: finished piece 1 at 04-OCT-21
piece handle=C:\SHAREDPATH\BACKUP\020APFE3_1_1 tag=TAG20211004T141619 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 04-OCT-21

RMAN> exit
Recovery Manager complete.
Step 7: Transfer the files to target

We can use copy/paste, scp, or any other method to transfer the files needed for the restore to the target server

The above steps conclude all the activity needed on the source system. The following steps will be performed on the target system, where the database will be upgraded to 19c. 

Step 8: Make an entry for TEST11G DB in oratab and Prepare init file

The TEST11G database will have the 19c home as its home path in the oratab file. 
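A hedged sketch of that oratab entry (the home path is the 19c home used throughout this post):

[oracle@srv1 ~]$ grep TEST11G /etc/oratab
TEST11G:/u02/app/oracle/product/19.0.0.0/dbhome_2:N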
initTEST11G.ora will be as below. 
[oracle@srv1 ~]$ more /u02/app/oracle/product/19.0.0.0/dbhome_2/dbs/initTEST11G.ora
*.audit_file_dest='/u02/app/oracle/product/19.0.0.0/dbhome_2/rdbms/audit'
*.audit_trail='db'
*.compatible='11.2.0.4.0'
*.control_files='+DATAC3/WNCCPD01/CONTROLFILE/test11g_control.ctl'
*.db_block_size=8192
*.db_domain=''
*.db_name='test11g'
*.db_recovery_file_dest='+RECOC3'
*.db_recovery_file_dest_size=4385144832
*.diagnostic_dest='/u02/app/oracle'
*.memory_target=3G
*.open_cursors=300
*.processes=300
*.remote_login_passwordfile='EXCLUSIVE'
*.undo_tablespace='UNDOTBS1'
*.db_create_file_dest='+DATAC3'
*.db_create_online_log_dest_1='+DATAC3'
[oracle@srv1 ~]$

Step 9: Restore Database

I have provided only the commands that need to be run to restore the database, not the output. We need to open the database in upgrade mode after the restore, as we are restoring an 11g database under a 19c home. 
## Mount the database
SQL> startup nomount


## Start RMAN and run the restore command
rman target /
run
{
restore controlfile from '/oradump/selva/TEST11G/020APFE3_1_1';
alter database mount;
catalog backuppiece '/oradump/selva/TEST11G/010APFDJ_1_1';
set newname for database to '+DATAC3';
set newname for tempfile 1 to '+DATAC3';
restore database;
switch datafile all;
switch tempfile all;
}
RMAN> exit

## Rename the logfiles to the proper location and open the DB with resetlogs in upgrade mode. 
sqlplus / as sysdba
SQL> select member from v$logfile;
SQL> alter database rename file 'C:\ORACLE11G\ORADATA\TEST11G\REDO03.LOG' to '+DATAC3/TEST11G/DATAFILE/REDO03.LOG';
SQL> alter database rename file 'C:\ORACLE11G\ORADATA\TEST11G\REDO02.LOG' to '+DATAC3/TEST11G/DATAFILE/REDO02.LOG';
SQL> alter database rename file 'C:\ORACLE11G\ORADATA\TEST11G\REDO01.LOG' to '+DATAC3/TEST11G/DATAFILE/REDO01.LOG';
SQL> alter database open resetlogs upgrade;
SQL> exit
Step 10: Prepare target config file

Upgrade configuration file will be as follows
[oracle@srv1]$ more upgtest.cfg
global.autoupg_log_dir=/home/oracle/logs
upg1.dbname=TEST11G
upg1.start_time=NOW
upg1.source_home=/u02/app/oracle/product/19.0.0.0/dbhome_2
upg1.target_home=/u02/app/oracle/product/19.0.0.0/dbhome_2
upg1.sid=TEST11G
upg1.log_dir=/home/oracle/logs
upg1.upgrade_node=localhost
upg1.target_version=19
upg1.restoration=no
upg1.target_cdb=WNCCPD018
upg1.target_pdb_name=UPGTEST
 
Note: I'm providing the target_pdb_name with a different name from the original name. Also note that we are now attempting to plug the database into an existing container database. 

Step 11: Start upgrade mode of AutoUpgrade utility

Command will be as follows:  
java -jar /dcwnec_oradump/selva/xxx_TEST11G/autoupgrade.jar -config /dcwnec_oradump/selva/xxx_TEST11G/upgtest.cfg -mode upgrade

Let's see a few progress snaps. The upgrade process will take around 60 to 120 minutes depending upon various factors, so we can check the progress using the lsj command as below








Detailed log will have the below entry..

All the below-mentioned tasks will be taken care of automatically. The config file should have details regarding the keys if the source is encrypted with TDE; otherwise this stage will error out, which can then be fixed manually by importing the keys.


Few more status snaps before we conclude...




The upgrade process is complete and we can now see that the PDB has been created with the name we provided in the config file. 



Step 12: Post Upgrade:

Run the below query to identify PDB violations and fix them accordingly.
column message format a50
column status format a9
column type format a9
column con_id format 9
column name for a30
select con_id, name, type, message, status from PDB_PLUG_IN_VIOLATIONS
where status<>'RESOLVED' order by time;

These violations will not stop the database from running normally, as they are only WARNING messages, but we still need to take care of fixing them. 
If we get an ERROR message, we will not be able to open the PDB in read-write mode until the violation is fixed. 

With this, we conclude the upgrade exercise; the database has been successfully upgraded from 11g running on Windows to 19c running on Linux. 


Happy Upgrading...!!!

Monday, September 27, 2021

Creating Auto-Login Keystores for TDE

If we need to create a TDE encrypted column or tablespace, the first requirement is to create a keystore to hold the encryption key (secret). 


Provide the location (ENCRYPTION_WALLET_LOCATION) where the wallet or keystore will be stored in sqlnet.ora. This can also be provided via the database parameter WALLET_ROOT if you don't want to use the sqlnet.ora method.
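For reference, a hedged sketch of the WALLET_ROOT route (the parameter is available from 18c onward, so it is not used for this 12.2 database; the directory below is just an example, and Oracle expects the keystore under a tde subdirectory of it):

SQL> alter system set wallet_root='/u01/orcl/122/network/wallet' scope=spfile;
SQL> -- bounce the instance, then point TDE at a file-based keystore
SQL> alter system set tde_configuration='KEYSTORE_CONFIGURATION=FILE';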

[oracle@linux75-2 ~]$ cat /u01/orcl/122/network/admin/sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/orcl/122/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES)

ENCRYPTION_WALLET_LOCATION =
  (SOURCE =(METHOD = FILE)(METHOD_DATA =
    (DIRECTORY = /u01/orcl/122/network/wallet/)))
SQLNET.EXPIRE_TIME=2
[oracle@linux75-2 ~]$ 

Create a keystore 

SQL> ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '/u01/orcl/122/network/wallet/' IDENTIFIED BY oracle;

keystore altered.

SQL> 
Now the keystore file ewallet.p12 will be created in the location mentioned. NEVER delete, lose, or tamper with this file. 
We can open and close the keystore as needed. The catch is that every time we bounce the database, we have to explicitly open the keystore. 
SQL> select * from v$encryption_wallet;

WRL_TYPE             WRL_PARAMETER                                      STATUS                         WALLET_TYPE          WALLET_OR FULLY_BAC     CON_ID
-------------------- -------------------------------------------------- ------------------------------ -------------------- --------- --------- ----------
FILE                 /u01/orcl/122/network/wallet/                      CLOSED                         UNKNOWN              SINGLE    NO                 0

SQL> administer key management set keystore open identified by oracle;

keystore altered.

SQL> select * from v$encryption_wallet;

WRL_TYPE             WRL_PARAMETER                                      STATUS                         WALLET_TYPE          WALLET_OR FULLY_BAC     CON_ID
-------------------- -------------------------------------------------- ------------------------------ -------------------- --------- --------- ----------
FILE                 /u01/orcl/122/network/wallet/                      OPEN                           PASSWORD             SINGLE    NO                 0

SQL> exit
When I tried to upgrade the database from the current 12.2 version to 19c using the AutoUpgrade utility, the analyze phase failed due to the non-availability of an auto-login keystore, even though the keystore was open when the analyze phase was initiated. 

So how do we create an auto-login keystore? 
An auto-login keystore simply means we don't have to manually open the keystore after a restart of the database. Creating one is easy with the below command. 
SQL> ADMINISTER KEY MANAGEMENT CREATE LOCAL AUTO_LOGIN KEYSTORE FROM KEYSTORE '/u01/orcl/122/network/wallet/' IDENTIFIED BY oracle;

keystore altered.

SQL> !
[oracle@linux75-2 ~]$ ls -lrt /u01/orcl/122/network/wallet/
total 12
-rw-------. 1 oracle oinstall 2408 Jun 12 23:41 ewallet_2021061218110265.p12
-rw-------. 1 oracle oinstall 3848 Jun 12 23:41 ewallet.p12
-rw-------. 1 oracle oinstall 3901 Sep 26 00:21 cwallet.sso
[oracle@linux75-2 ~]$ exit
exit
The LOCAL keyword specifies that the keystore can only be opened from the local machine. 
You can see a new file named cwallet.sso has been created, which is the auto-login keystore file. NEVER delete, lose, or tamper with this file, as that would cause irreversible damage to the database. 

Once the auto-login keystore is created, the password keystore no longer needs to be opened manually; the auto-login keystore automatically takes precedence and opens the keystore from the next restart onwards. 

SQL> select * from v$encryption_wallet;

WRL_TYPE   WRL_PARAMETER                                      STATUS                         WALLET_TYPE          WALLET_OR FULLY_BAC     CON_ID
---------- -------------------------------------------------- ------------------------------ -------------------- --------- --------- ----------
FILE       /u01/orcl/122/network/wallet/                      OPEN                           PASSWORD             SINGLE    NO                 0

SQL> administer key management set keystore close;
administer key management set keystore close
*
ERROR at line 1:
ORA-28389: cannot close auto login wallet


SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size                  8628936 bytes
Variable Size             595592504 bytes
Database Buffers          461373440 bytes
Redo Buffers                8146944 bytes
Database mounted.
Database opened.
SQL> set lines 200
SQL> col wrl_parameter for a50
SQL> select * from v$encryption_wallet;

WRL_TYPE   WRL_PARAMETER                                      STATUS                         WALLET_TYPE          WALLET_OR FULLY_BAC     CON_ID
---------- -------------------------------------------------- ------------------------------ -------------------- --------- --------- ----------
FILE       /u01/orcl/122/network/wallet/                      OPEN                           LOCAL_AUTOLOGIN      SINGLE    NO                 0

SQL> exit
You can see that the WALLET_TYPE has changed from PASSWORD to LOCAL_AUTOLOGIN, and the STATUS is also OPEN after the restart of the database. 

Now, my AutoUpgrade analyze phase also completed successfully without any errors. 


Happy Keystoring...!!!

Monday, September 6, 2021

Patching Oracle home with limited storage

Today, while trying to apply a patch on one of my test databases, I encountered the following error. 

[oracle@linux75-2 32916808]$ $ORACLE_HOME/OPatch/opatch apply
Oracle Interim Patch Installer version 12.2.0.1.26
Copyright (c) 2021, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/orcl/122
Central Inventory : /u01/oraInventory
   from           : /u01/orcl/122/oraInst.loc
OPatch version    : 12.2.0.1.26
OUI version       : 12.2.0.1.4
Log file location : /u01/orcl/122/cfgtoollogs/opatch/opatch2021-09-05_23-04-44PM_1.log

Verifying environment and performing prerequisite checks...
Prerequisite check "CheckSystemSpace" failed.
The details are:
Required amount of space(3702.527MB) is not available.
UtilSession failed:
Prerequisite check "CheckSystemSpace" failed.
Log file location: /u01/orcl/122/cfgtoollogs/opatch/opatch2021-09-05_23-04-44PM_1.log

OPatch failed with error code 73
[oracle@linux75-2 32916808]$ 
The /u01 file system has only around 2.6 GB of free space, whereas the patch is looking for around 3.7 GB. As this is my test machine, I didn't want to add space and was looking for a way to work around the situation. 
[oracle@linux75-2 ~]$ df -h /u01 /u02 
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        10G  7.5G  2.6G  75% /u01
/dev/sda5        17G  9.3G  7.7G  55% /u02
[oracle@linux75-2 ~]$ 

Now, the downloaded patch (Patch 32916808: DATABASE JUL 2021 RELEASE UPDATE 12.2.0.1.210720) is < 1 GB in size. So why does the patch require > 3.5 GB of free space? 

Starting with 10.2, OPatch does not back up only the affected modules; it also takes a backup of the complete affected libraries to $ORACLE_HOME/.patch_storage/<patchid>/backup/<lib_directory_name>/<library_name>

So the 3.5 GB of space is needed just to back up the existing software files so that Oracle can safely roll back if we encounter any issue during the patching process. 
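As a quick check on any home you have already patched, you can see how much space these backups occupy:

[oracle@linux75-2 ~]$ du -sh $ORACLE_HOME/.patch_storage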

So how do we overcome this space issue now? 

First, we need to stop the OPatch utility from performing the free space check during the patching operation.
Second, we need to figure out an alternate path to store the backup files for the duration of the patching.

By running $ORACLE_HOME/OPatch/opatch prereq -help, we can find all the prerequisite checks the OPatch utility performs, of which CheckSystemSpace is one. 
If we start the opatch utility with the -verbose option, we can see all the associated parameters taken during the execution of the opatch operation, as below. 

[oracle@linux75-2 ]$ $ORACLE_HOME/OPatch/opatch apply -verbose
Oracle Interim Patch Installer version 12.2.0.1.26
Copyright (c) 2021, Oracle Corporation.  All rights reserved.

Environment:
   OPatch.ORACLE_HOME=/u01/orcl/122
   oracle.installer.invPtrLoc=/u01/orcl/122/oraInst.loc
   oracle.installer.oui_loc=/u01/orcl/122/oui
   oracle.installer.library_loc=/u01/orcl/122/oui/lib/linux64
   oracle.installer.startup_location=/u01/orcl/122/oui
   OPatch.PLATFORM_ID=
   os.name=Linux
   OPatch.NO_FUSER=
   OPatch.SKIP_VERIFY=null
   OPatch.SKIP_VERIFY_SPACE=null
   oracle.installer.clusterEnabled=false
   TRACING.ENABLED=null
   TRACING.LEVEL=null
   OPatch.DEBUG=false
   OPATCH_VERSION=12.2.0.1.26
   Bundled OPatch Property File=properties
   ...
   ...
   ...
We can see OPatch.SKIP_VERIFY_SPACE is set to null by default. We will set this parameter to TRUE to skip the space verification. 

Next, we will create a symbolic link for .patch_storage under ORACLE_HOME pointing to a directory outside of ORACLE_HOME, in our case /u02
[oracle@linux75-2 122]$ pwd
/u01/orcl/122
[oracle@linux75-2 122]$ mkdir -p /u02/.patch_storage
[oracle@linux75-2 122]$ ln -s /u02/.patch_storage .patch_storage
[oracle@linux75-2 122]$ ls -lart .pat*
lrwxrwxrwx. 1 oracle oinstall 19 Sep  5 23:41 .patch_storage -> /u02/.patch_storage
[oracle@linux75-2 122]$
Now, the patching completes without any issues as below. 

[oracle@linux75-2 32916808]$ $ORACLE_HOME/OPatch/opatch apply OPatch.SKIP_VERIFY_SPACE=true
Oracle Interim Patch Installer version 12.2.0.1.26
Copyright (c) 2021, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/orcl/122
Central Inventory : /u01/oraInventory
   from           : /u01/orcl/122/oraInst.loc
OPatch version    : 12.2.0.1.26
OUI version       : 12.2.0.1.4
Log file location : /u01/orcl/122/cfgtoollogs/opatch/opatch2021-09-05_23-43-32PM_1.log

Verifying environment and performing prerequisite checks...

--------------------------------------------------------------------------------
Start OOP by Prereq process.
Launch OOP...

Oracle Interim Patch Installer version 12.2.0.1.26
Copyright (c) 2021, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/orcl/122
Central Inventory : /u01/oraInventory
   from           : /u01/orcl/122/oraInst.loc
OPatch version    : 12.2.0.1.26
OUI version       : 12.2.0.1.4
Log file location : /u01/orcl/122/cfgtoollogs/opatch/opatch2021-09-05_23-43-46PM_1.log

Verifying environment and performing prerequisite checks...
OPatch continues with these patches:   32916808

Do you want to proceed? [y|n]
y
User Responded with: Y
All checks passed.

Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/orcl/122')


Is the local system ready for patching? [y|n]
y
User Responded with: Y
Backing up files...
Applying interim patch '32916808' to OH '/u01/orcl/122'
ApplySession: Optional component(s) [ oracle.swd, 12.2.0.1.0 ] , [ oracle.swd.oui, 12.2.0.1.0 ] , [ oracle.network.cman, 12.2.0.1.0 ] , [ oracle.ons.cclient, 12.2.0.1.0 ] , [ oracle.ons.eons.bwcompat, 12.2.0.1.0 ] , [ oracle.rdbms.drdaas, 12.2.0.1.0 ] , [ oracle.oid.client, 12.2.0.1.0 ] , [ oracle.ons.daemon, 12.2.0.1.0 ] , [ oracle.network.gsm, 12.2.0.1.0 ]  not present in the Oracle Home or a higher version is found.

Patching component oracle.rdbms, 12.2.0.1.0...

Patching component oracle.rdbms.rsf, 12.2.0.1.0...

Patching component oracle.network.rsf, 12.2.0.1.0...

Patching component oracle.rdbms.util, 12.2.0.1.0...

Patching component oracle.ctx, 12.2.0.1.0...

...

...

...

Patching component oracle.jdk, 1.8.0.91.0...

OPatch found the word "error" in the stderr of the make command.
Please look at this stderr. You can re-run this make command.
Stderr output:
chmod: changing permissions of ‘/u01/orcl/122/bin/extjobO’: Operation not permitted
make: [iextjob] Error 1 (ignored)


Patch 32916808 successfully applied.
OPatch Session completed with warnings.
Log file location: /u01/orcl/122/cfgtoollogs/opatch/opatch2021-09-05_23-43-46PM_1.log

OPatch completed with warnings.
[oracle@linux75-2 32916808]$ 
Note that opatch is started with OPatch.SKIP_VERIFY_SPACE=true
The final error can be ignored or can be fixed by running root.sh from ORACLE_HOME as the root user. 
We are now done with patching with limited storage in the Oracle home file system. 

Footnote: 

The file system utilization of /u02 increased up to 14 GB (from 9.7 GB) as expected and then went back to 11 GB, not completely down to 9.7 GB (so about 1 GB remains in use post patching). 
[oracle@linux75-2 u02]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        17G  9.7G  7.3G  58% /u02
[oracle@linux75-2 u02]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        17G   12G  5.2G  70% /u02
[oracle@linux75-2 u02]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        17G   13G  4.2G  76% /u02
[oracle@linux75-2 u02]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        17G   13G  4.1G  76% /u02
[oracle@linux75-2 u02]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        17G   14G  3.9G  78% /u02
[oracle@linux75-2 u02]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        17G   14G  3.8G  78% /u02
[oracle@linux75-2 u02]$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5        17G   11G  6.5G  62% /u02
 
This is because the OPatch utility deletes all the backup files that are not required post patching and keeps any files that would be needed to roll back the patch if intended. Check MOS Doc ID 550522.1 to understand how to clean up .patch_storage if needed. 
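That note revolves around the OPatch cleanup utility; a hedged sketch of the command it describes:

[oracle@linux75-2 ~]$ $ORACLE_HOME/OPatch/opatch util cleanup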


Happy Patching...!!!

Thursday, August 26, 2021

RMAN restore, DB_CREATE_FILE_DEST and mix of OMF and non OMF files

We were trying to restore a production database onto a test environment, which we do as a routine refresh whenever the application team needs it. We followed the same procedure that has been used for other databases for a long time, but this time it failed for this specific database.


The init file of the target database contains the following parameters. 

*.db_create_file_dest='+ABD_DATA_DG01'
*.db_create_online_log_dest_1='+ABD_LOG_DG01'

Once the control file was placed in the respective location and the target database was in mount state, we ran a restore, which resulted in the following error. 

Restore command used:

run {
set until time = "to_date ('22-Aug-2021 18:00', 'dd-Mon-yyyy hh24:mi')";
restore database;
recover database;
}

Error received: 

...
...
...
using channel ORA_DISK_7
using channel ORA_DISK_8

channel ORA_SBT_TAPE_1: starting datafile backup set restore
channel ORA_SBT_TAPE_1: specifying datafile(s) to restore from backup set
channel ORA_SBT_TAPE_1: restoring datafile 00031 to +ABP_DATA_DG01/ABP/sr3_28/sr3.data28
channel ORA_SBT_TAPE_1: restoring datafile 00103 to +ABP_DATA_DG01/ABP/sr3_89/sr3.data89
channel ORA_SBT_TAPE_1: restoring datafile 00129 to +ABP_DATA_DG01/ABP/DATAFILE/psapsr3usr.390.1049882411
channel ORA_SBT_TAPE_1: restoring datafile 00143 to +ABP_DATA_DG01/ABP/DATAFILE/psapsr3.402.1074438749
...
...
...
channel ORA_SBT_TAPE_8: restoring datafile 00091 to +ABP_DATA_DG01/ABP/sr3_79/sr3.data79
channel ORA_SBT_TAPE_8: restoring datafile 00104 to +ABP_DATA_DG01/ABP/sr3usr_3/sr3usr.data3
channel ORA_SBT_TAPE_8: reading from backup piece ./server1234/ABP/20210820/set628874_piece1_4a070fpn_628874_1_1
channel ORA_SBT_TAPE_1: ORA-19870: error while restoring backup piece ./server1234/ABP/20210820/set628870_piece1_46070fpm_628870_1_1
ORA-19504: failed to create file "+ABP_DATA_DG01/ABP/sr3_28/sr3.data28"
ORA-17502: ksfdcre:3 Failed to create file +ABP_DATA_DG01/ABP/sr3_28/sr3.data28
ORA-15001: diskgroup "ABP_DATA_DG01" does not exist or is not mounted
ORA-15018: diskgroup cannot be created
channel ORA_SBT_TAPE_1: starting datafile backup set restore
channel ORA_SBT_TAPE_1: specifying datafile(s) to restore from backup set
channel ORA_SBT_TAPE_1: restoring datafile 00009 to +ABP_DATA_DG01/ABP/sr3_6/sr3.data6
channel ORA_SBT_TAPE_1: restoring datafile 00037 to +ABP_DATA_DG01/ABP/sr3_34/sr3.data34
...
...
...
ORA-15018: diskgroup cannot be created
failover to previous backup

creating datafile file number=1 name=+ABP_DATA_DG01/ABP/system_1/system.data1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 08/23/2021 21:30:39
ORA-01180: can not create datafile 1
ORA-01110: data file 1: '+ABP_DATA_DG01/ABP/system_1/system.data1'

RMAN>

As per Oracle support note "RMAN is Not Restoring OMF Datafiles in Their Original Location (Doc ID 882555.1)", the order in which RMAN restores OMF files is as below
  1. If "SET NEWNAME" is specified, RMAN will use that name for restore.
  2. If the original file exists, RMAN will use the original filename for restore.
  3. If the DB_CREATE_FILE_DEST is set, RMAN will use the disk group name specified and create another name for that file in this disk group.
  4. If no DB_CREATE_FILE_DEST is set and the original file does not exist, then RMAN will create another name for that file in the original disk group.
Let's analyze where the restore went wrong. 
  • We have not set "SET NEWNAME", so point 1 is invalid here.
  • The original files do not exist on the target (we are not overwriting an existing database), so point 2 is invalid.
  • We have set DB_CREATE_FILE_DEST, so point 4 is invalid. 
Point 3 says RMAN should use the disk group specified, but if you look at the error details closely, RMAN is looking for the production database's disk group ABP_DATA_DG01 instead of the one provided in the parameter DB_CREATE_FILE_DEST = ABD_DATA_DG01. 
On top of that, two files were already restored to the proper disk group before the error was thrown. 
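
To rule out a simple misconfiguration, the parameter can be double-checked on the mounted target instance; a plain sanity check such as the one below should show the value from the init file (+ABD_DATA_DG01 in our case):

SQL> show parameter db_create_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest                  string      +ABD_DATA_DG01

So the parameter itself is set correctly.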

So what's happening here? 

After further analysis into why two files got created properly as expected while the others did not, we can see that the files throwing the error are not OMF; they have custom names provided by the user. In this case, the files were created with these names by the application's space management software.

From one of the backup set: 

+ABP_DATA_DG01/ABP/sr3_28/sr3.data28 -- custom name
+ABP_DATA_DG01/ABP/sr3_89/sr3.data89 -- custom name
+ABP_DATA_DG01/ABP/DATAFILE/psapsr3usr.390.1049882411 -- Oracle Managed File (OMF)
+ABP_DATA_DG01/ABP/DATAFILE/psapsr3.402.1074438749 -- Oracle Managed File (OMF)

Since we have a mix of OMF and custom-named files, RMAN restored the OMF files to the proper disk group but went looking for the original disk group for the non-OMF files. 
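
If you want the full picture up front, a rough way to see which files follow the OMF naming pattern is a regexp check against the restored control file (a heuristic sketch; it assumes the database is mounted and that OMF files use the usual ASM pattern of <name>.<number>.<incarnation> under a DATAFILE directory):

select file#,
       name,
       case
         when regexp_like(name, '/DATAFILE/[^/]+\.[0-9]+\.[0-9]+$') then 'OMF'
         else 'NON-OMF'
       end as naming
from   v$datafile
order  by file#;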

How do we resolve this? 

Since we have a mix of OMF and non OMF files, we can make use of "SET NEWNAME" (point 1) for the restore to work properly. 

Revised restore command: 

run {
set until time = "to_date ('22-Aug-2021 18:00', 'dd-Mon-yyyy hh24:mi')";
SET NEWNAME FOR DATABASE TO '+ABD_DATA_DG01';
restore database;
switch datafile all;
switch tempfile all;
recover database;
}

After using SET NEWNAME command, the restore completed without any issues. 
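
One design note on this fix: pointing SET NEWNAME FOR DATABASE at a bare disk group means ASM generates OMF-style names for every restored file, so the original custom names (sr3.data28 and friends) are not preserved; only the destination disk group is. A simple post-switch check like the one below (a sketch, assuming the +ABD_DATA_DG01 destination used above) confirms nothing still points at the old disk group; ideally it returns no rows.

select file#, name
from   v$datafile
where  name not like '+ABD_DATA_DG01/%'
order  by file#;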

References: 

RMAN is Not Restoring OMF Datafiles in Their Original Location (Doc ID 882555.1)

Happy restoring...!!!

Wednesday, August 25, 2021

Database connections and controlling them - DCD

 In my previous post, we saw how to control user connections by setting limits in database profiles. As a continuation of that post, we will see how to control user connections using the method below:

  • Dead Connection Detection


Unlike database profile limits, which apply to both bequeath and network connections, this method of session control applies only to network connections, since the sqlnet.ora setting used here comes into play only when the Oracle session is established over the network via the listener. 
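
To make the distinction concrete, here is the difference in how the connection is made (an illustration using the test account and the orcl connect identifier from this demo):

[oracle@linux75-2 ~]$ sqlplus test/test        # bequeath connection on the DB server - no listener, DCD does not apply
[oracle@linux-8 ~]$ sqlplus test/test@orcl     # network connection via the listener - DCD applies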

Many of us would have seen the below error in the alert log file..

***********************************************************************

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 12.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
  Time: 24-AUG-2021 23:04:56
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.181)(PORT=14500))

We will see how this error is getting generated by a simple network failure simulation.
  1. Connect to database using network from a client server to DB server
  2. Simulate a network failure.
  3. Wait for sometime. (> 10 min)
Let's assume the following: 
Client machine        : linux-8 (192.168.56.181)
Database server       : linux75-2 (192.168.56.151) running 12c database
Connect identifier    : orcl 
(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.151)(PORT = 1522))) (CONNECT_DATA = (SERVICE_NAME = orcl)))

[oracle@linux-8 ~]$ sqlplus test/test@orcl

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Aug 22 19:55:59 2021
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sun Aug 22 2021 17:59:07 +05:30

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> 

From within the linux-8 (client) server, I'm now simulating a network failure by shutting down the network adapter. So now there is no network connection from my client machine to the database server. 
[oracle@linux-8 ~]$ nmcli connection down c59af835-6f37-449b-836a-47714f9fe29f
Connection 'enp0s8' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
[oracle@linux-8 ~]$ 

Even after a long time (> 20 min), I can still see the connection inside the database (see the LOGON_TIME and SYSDATE columns) even though the client server has a network issue. (I still haven't brought the network adapter back up on linux-8.)

SQL> !hostname
linux75-2.selvapc.com

SQL> @session_info.sql

       SID     SERIAL Username   STATUS   Machine              OS User         Status   Module          LOGON_TIME           SYSDATE
---------- ---------- ---------- -------- -------------------- --------------- -------- --------------- -------------------- --------------------
       769      43069 TEST       INACTIVE linux-8.selvapc.com  oracle          inactive SQL*Plus        23-AUG-2021 23:28:41 23-AUG-2021 23:55:29

SQL>

So in this case, even though the session is not active and its source has been dead for a long time, the connection is still present inside the database with status INACTIVE. These types of connections pile up in the database, and if the processes and/or sessions parameters are set low, they become an issue leading to ORA-00020 "maximum number of processes (%s) exceeded" errors. 
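
To keep an eye on how close the instance is to those limits, a quick look at v$resource_limit helps (a simple sketch):

select resource_name, current_utilization, max_utilization, limit_value
from   v$resource_limit
where  resource_name in ('processes', 'sessions');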

So in this case, how do we stop connections from turning into "zombie" processes, and how do we kill them as and when needed? 

The solution is called Dead Connection Detection (DCD). 

Let's see how to implement it and what happens when we do it. 
But before that: even after I shut down my linux-8 machine, the connection is still present inside the database, occupying one precious session while doing nothing. To get rid of it, we just need to kill the session. 

SQL> @session_info.sql

       SID     SERIAL Username   STATUS   Machine              OS User         Status   Module          LOGON_TIME           SYSDATE
---------- ---------- ---------- -------- -------------------- --------------- -------- --------------- -------------------- --------------------
       769      43069 TEST       INACTIVE linux-8.selvapc.com  oracle          inactive SQL*Plus        23-AUG-2021 23:28:41 24-AUG-2021 00:07:34

SQL> alter system kill session '769,43069' immediate;

System altered.

SQL> @session_info.sql

no rows selected

SQL>
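
If several such sessions from the same dead client need cleaning up, a small helper can generate the kill statements (a sketch, assuming the TEST user and the linux-8 client from this demo; review the generated commands before running them):

select 'alter system kill session ''' || sid || ',' || serial# || ''' immediate;' as kill_cmd
from   v$session
where  username = 'TEST'
and    machine  = 'linux-8.selvapc.com'
and    status   = 'INACTIVE';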

Dead Connection Detection: 

So in this section, we are going to do the following. 
  1. Set sqlnet.expire_time=2 in sqlnet.ora of database server
  2. Connect to database using network from a client server to DB server
  3. Simulate a network failure.
  4. Wait for sometime. (~ 2 min)
  5. Monitor alert log
[oracle@linux75-2 admin]$ pwd
/u01/orcl/122/network/admin
[oracle@linux75-2 admin]$ more sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/orcl/122/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES)
SQLNET.EXPIRE_TIME=2

Now I have set the sqlnet.expire_time parameter to 2 minutes. 
I'll now connect to the database, simulate a network failure, and show the alert log entries with timestamps.

SQL> @session_info.sql

       SID     SERIAL Username   STATUS   Machine              OS User         Status   Module          LOGON_TIME           SYSDATE
---------- ---------- ---------- -------- -------------------- --------------- -------- --------------- -------------------- --------------------
       783      12666 TEST       INACTIVE linux-8.selvapc.com  oracle          inactive SQL*Plus        24-AUG-2021 23:30:54 24-AUG-2021 23:32:07

SQL> @find_spid.sql

SPID
------------------------
7199

SQL>

You can see the database connection was created at 24-AUG-2021 23:01:55

[oracle@linux-8 ~]$ date
Tue Aug 24 23:03:17 IST 2021
[oracle@linux-8 ~]$ sudo nmcli connection down c59af835-6f37-449b-836a-47714f9fe29f
[sudo] password for oracle: 
Connection 'enp0s8' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
[oracle@linux-8 ~]$ 

I have simulated network interruption at 24-AUG-2021 23:03:17

After around 2 minutes, we got the operation timed out error that I mentioned at the beginning of the post. 

***********************************************************************

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 12.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
  Time: 24-AUG-2021 23:04:56
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.181)(PORT=14500))

So what is happening here? How does DCD work? 

DCD works differently in database version 12.1 and above compared to lower versions. Since 21c is now available for on-premises deployments, we will look at the mechanism for version 12.1 and above only. 

When SQLNET.EXPIRE_TIME is set, three parameters associated with the operating system's TCP keepalive mechanism are set on the session's socket. The example below is for Linux; the parameter names may vary according to the OS.
  • TCP_KEEPALIVE_TIME (the amount of time until the first keepalive packet is sent)
    • /proc/sys/net/ipv4/tcp_keepalive_time
  • TCP_KEEPCNT(the number of probes to send)
    • /proc/sys/net/ipv4/tcp_keepalive_probes
  • TCP_KEEPINTVL (the interval between keepalive packets)
    • /proc/sys/net/ipv4/tcp_keepalive_intvl
The default values of these parameters are 7200, 9 and 75 respectively, which means that by default TCP keepalive starts probing after 7200 seconds of idle time and sends up to 9 probes at 75-second intervals. 
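
On Linux, the kernel-wide defaults can be checked directly from /proc (the values shown are the usual defaults and may differ on your system):

[oracle@linux75-2 ~]$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
[oracle@linux75-2 ~]$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
[oracle@linux75-2 ~]$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75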

Now, when we set the sqlnet.expire_time parameter to 2 minutes, the keepalive time on the session's socket is set to 120 seconds, with the interval and probe count adjusted to lower values accordingly. 
If we trace the connected oracle server process with the strace utility, we can see a poll() call every 120 seconds, as below...

[oracle@linux75-2 ~]$ sudo strace -p 7199
strace: Process 7199 attached
read(17, 0x7fc9362c8f1e, 8208)          = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigprocmask(SIG_BLOCK, [], NULL, 8)  = 0
poll([{fd=17, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=119, tv_usec=990000}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=20000}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [], NULL, 8) = 0
rt_sigreturn({mask=[]})                 = 0
read(17, 0x7fc9362c8f1e, 8208)          = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigprocmask(SIG_BLOCK, [], NULL, 8)  = 0
poll([{fd=17, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
...
...
...

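
Another way to confirm that a keepalive timer is armed on the session's socket is the ss utility (a quick sketch, assuming ss is available and reusing SPID 7199 from this demo; -t lists TCP sockets, -n skips name resolution, -o shows timer information and -p the owning process, so look for a timer:(keepalive,...) entry on the matching socket):

[oracle@linux75-2 ~]$ ss -tnop | grep 7199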
Once the timeout has occurred, the connection is kicked out of the database, freeing up resources. 

SQL> @session_info.sql

no rows selected

SQL>

Hope this post helps in understanding Dead Connection Detection. 

Queries used: 

session_info.sql

col Username for a10
col Machine for a20
col Module for a15
col "OS User" for a15
SELECT
    sid,
    serial#           serial,
    username          "Username",
    status,
    machine           "Machine",
    osuser            "OS User",
    lower(status)     "Status",
    module            "Module",
    logon_time,
    sysdate
FROM
    v$session
WHERE
    username = 'TEST'
ORDER BY 1, 2, 3;

find_spid.sql

select SPID from v$process where ADDR in
(select PADDR from v$session where username='TEST');
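
If you prefer a single output showing the SID, serial number and OS process id together, a combined query along these lines can be used (a sketch against the same TEST user):

select s.sid, s.serial# serial, p.spid
from   v$session s
join   v$process p on p.addr = s.paddr
where  s.username = 'TEST';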


Happy controlling...!!!