Looking for something? Try here..

Monday, March 21, 2022

PDB Cloning random time out issue

 This post is to discuss about a similar issue I encountered in the previous blog post

When I tried to clone the pluggable database from one cluster to another, I did get the following error. 

SQL> CREATE PLUGGABLE DATABASE PRD01 from QA01@QA01 keystore identified by xxxxx no data;
CREATE PLUGGABLE DATABASE PRD01 from QA01@QA01 keystore identified by xxxxx no data
*
ERROR at line 1:
ORA-65169: error encountered while attempting to copy file
+DATAC7/XXX04/D84A38E95B599A60E053E1A1C30AC20D/DATAFILE/undo_14.847.1096976301
ORA-17627: ORA-12170: TNS:Connect timeout occurred
ORA-17629: Cannot connect to the remote database server

SQL> 

Seems somethings is timing out and we need to figure out where. So it's the alert log which will help us. The contents of the alert log is as below 

***********************************************************************

Fatal NI connect error 12170, connecting to:
 (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(Host=dev2-scan.xxx.com)(Port=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=QA01_SVC.xxx.com)(CID=(PROGRAM=oracle)(HOST=s23)(USER=oracle))))

  VERSION INFORMATION:
        TNS for Linux: Version 19.0.0.0.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 19.0.0.0.0 - Production
  Version 19.12.0.0.0
  Time: 18-MAR-2022 18:07:48
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 0
    nt OS err code: 0
Errors in file /u02/app/oracle/diag/rdbms/xxx04/XXX041/trace/XXX041_ora_295605.trc:
ORA-17627: ORA-12170: TNS:Connect timeout occurred
ORA-17629: Cannot connect to the remote database server
********************************************************
Undo Create of Pluggable Database PRD01 with pdb id - 3.
********************************************************
Sometimes, I also got the below error for which I don't know the exact reason as I mentioned in my previous post
TNS-00505: Operation timed out
    nt secondary err code: 0
    nt OS err code: 0
2022-03-18T18:14:25.706649+00:00
Errors in file /u02/app/oracle/diag/rdbms/xxx04/XXX041/trace/XXX041_ora_269420.trc:
ORA-17627: ORA-01017: invalid username/password; logon denied
ORA-17629: Cannot connect to the remote database server
2022-03-18T18:14:25.706956+00:00
********************************************************
Undo Create of Pluggable Database PRD01 with pdb id - 3.
********************************************************
We can see the nt secondary err code: 0 meaning the timeout is caused due to listener or database and not external such as server or firewall settings. 
The connect string has the scan listener as with all the RAC tnsnames.ora settings and hence we can now check whether all the IPs are pinging in round robin and the VIPs are working fine. As expected all are working fine. A simple select from dual from the source also throws time out error at random frequency. 
SQL> select sysdate from dual@QA01;

select sysdate from dual@QA01
                         *
ERROR at line 1:
ORA-12170: TNS:Connect timeout occurred

SQL> /

SYSDATE
---------
18-MAR-22
So where are we getting the time out from? 

Upon further checking the local_listener parameter in the source database, I find some odd settings in one of the node as below. The other node has proper setting
SQL> sho parameter local

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      (ADDRESS=(PROTOCOL=TCP)(HOST=1
                                                 0.195.161.228)(PORT=1521)), (A
                                                 DDRESS=(PROTOCOL=TCP)(HOST=10.
                                                 195.161.228)(PORT=1522)),(ADDR
                                                 ESS=(PROTOCOL=TCPS)(HOST=10.19
                                                 5.161.228)(PORT=2484))
I then see the listener status to figure out what ports it is configured to listen
[oracle@s11 ~]$ lsnrctl status|more

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 20-MAR-2022 19:08:12

Copyright (c) 1991, 2021, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                13-FEB-2022 09:20:10
Uptime                    35 days 9 hr. 48 min. 1 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.0.0.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/s11/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcps)(HOST=10.195.161.226)(PORT=2484)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.195.161.225)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.195.161.226)(PORT=1521)))
Services Summary...
Service "+APX" has 1 instance(s).
  Instance "+APX1", status READY, has 1 handler(s) for this service...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
  ...
  ...
  ...
As you can clearly see here, listener is listening only on port 1521 and 2484
So now, I reset the local_listener parameter to remove the setting with port 1522 which the listener does not listen to. This second port is a mess up caused as we were trying to add another listener for a PoC and the cleanup was not proper after the removal of the configuration. 
SQL> alter system set local_listener="(ADDRESS=(PROTOCOL=TCP)(HOST=10.195.161.226)(PORT=1521)),(ADDRESS=(PROTOCOL=TCPS)(HOST=10.195.161.226)(PORT=2484))" sid='XXX041';

System altered.

SQL> sho parameter local

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      (ADDRESS=(PROTOCOL=TCP)(HOST=1
                                                 0.195.161.226)(PORT=1521)),(AD
                                                 DRESS=(PROTOCOL=TCPS)(HOST=10.
                                                 195.161.226)(PORT=2484))

After resetting the local_listener in the source database, now I'm able to clone the PDB without any issues. 
SQL> CREATE PLUGGABLE DATABASE PRD01 from QA01@QA01 keystore identified by xxxxx no data;

Pluggable database created.

SCAN listener hand over the connection to the VIP of the node in the cluster and in turn will be transferred to local listener. Here since the local listener is also configured with port 1522 which didn't exist, we got the random timeouts whenever the connection handoff done to 1522. While configuring parameters utmost care has to be taken to avoid such issues. 

Happy troubleshooting...!!!