Oracle Tips and Techs: Database connections and controlling them

In my previous post, we saw how to control the user connection by setting limits in database profiles. As a continuation to the post, we will see how to control user connections by using the below method

Dead Connection Detection

Unlike database profile limits which is applicable to both bequeath and network mode connections, this method of session control will only be applicable to network mode of connection as sqlnet.ora file will be used only when the oracle session is established using network via listener.

Many of us would have seen the below error in the alert log file..

***********************************************************************

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 12.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
  Time: 24-AUG-2021 23:04:56
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.181)(PORT=14500))

We will see how this error is getting generated by a simple network failure simulation.

Connect to database using network from a client server to DB server
Simulate a network failure.
Wait for sometime. (> 10 min)

Let's assume the following:

Client machine : linux-8 (192.168.56.181)

Database server : linux75-2 (192.168.56.151) running 12c database

Connect identifier : orcl

(DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.56.151)(PORT = 1522))) (CONNECT_DATA = (SERVICE_NAME = orcl)))

[oracle@linux-8 ~]$ sqlplus test/test@orcl

SQL*Plus: Release 19.0.0.0.0 - Production on Sun Aug 22 19:55:59 2021
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Last Successful login time: Sun Aug 22 2021 17:59:07 +05:30

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL>

Directly from within the linux-8 (client) server, I'm now simulating a network failure by shutting down the network adapter. So, now there is no network connection to communicate to the database server from my client machine.

[oracle@linux-8 ~]$ nmcli connection down c59af835-6f37-449b-836a-47714f9fe29f
Connection 'enp0s8' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
[oracle@linux-8 ~]$

Even after long time (> 20 min), I'm still able to see the connection inside the database (see logon_time and sysdate column) though the client server is having a network issue. (I still didn't bring up network adapter in linux-8)

SQL> !hostname
linux75-2.selvapc.com

SQL> @session_info.sql

       SID     SERIAL Username   STATUS   Machine              OS User         Status   Module          LOGON_TIME           SYSDATE
---------- ---------- ---------- -------- -------------------- --------------- -------- --------------- -------------------- --------------------
       769      43069 TEST       INACTIVE linux-8.selvapc.com  oracle          inactive SQL*Plus        23-AUG-2021 23:28:41 23-AUG-2021 23:55:29

SQL>

So in this case, even though the session is not active or the source of session is dead long back, still the connection is available inside database with status INACTIVE. These types of connections will fill up database and if the processes and/or sessions parameter is set low, they become an issue leading to ORA-00020 "maximum number of processes (%s) exceeded" errors.

So in this case, how to control connections from becoming "zombie" processes and kill them as and when needed?

The solution is actually called as Dead Connection Detection (DCD)

Let's see how to implement it and what happens when we do it.

But before that even after I shut down my linux-8 machine, the connection is still available inside the database taking 1 precious session doing nothing. In order to get rid of it, we just need to kill the session.

SQL> @session_info.sql

       SID     SERIAL Username   STATUS   Machine              OS User         Status   Module          LOGON_TIME           SYSDATE
---------- ---------- ---------- -------- -------------------- --------------- -------- --------------- -------------------- --------------------
       769      43069 TEST       INACTIVE linux-8.selvapc.com  oracle          inactive SQL*Plus        23-AUG-2021 23:28:41 24-AUG-2021 00:07:34

SQL> alter system kill session '769,43069' immediate;

System altered.

SQL> @session_info.sql

no rows selected

SQL>

Dead Connection Detection:

So in this section, we are going to do the following.

Set sqlnet.expire_time=2 in sqlnet.ora of database server
Connect to database using network from a client server to DB server
Simulate a network failure.
Wait for sometime. (~ 2 min)
Monitor alert log

[oracle@linux75-2 admin]$ pwd
/u01/orcl/122/network/admin
[oracle@linux75-2 admin]$ more sqlnet.ora
# sqlnet.ora Network Configuration File: /u01/orcl/122/network/admin/sqlnet.ora
# Generated by Oracle configuration tools.

NAMES.DIRECTORY_PATH= (TNSNAMES)
SQLNET.EXPIRE_TIME=2

Now, I have added the sqlnet.expire_time parameter to 2 minutes.

I'll now connect to database, simulate network failure and provide you with the alert log entries with timestamp..

SQL> @session_info.sql

       SID     SERIAL Username   STATUS   Machine              OS User         Status   Module          LOGON_TIME           SYSDATE
---------- ---------- ---------- -------- -------------------- --------------- -------- --------------- -------------------- --------------------
       783      12666 TEST       INACTIVE linux-8.selvapc.com  oracle          inactive SQL*Plus        24-AUG-2021 23:30:54 24-AUG-2021 23:32:07

SQL> @find_spid.sql

SPID
------------------------
7199

SQL>

You can see the database connection got created at 24-AUG-2021 23:01:55

[oracle@linux-8 ~]$ date
Tue Aug 24 23:03:17 IST 2021
[oracle@linux-8 ~]$ sudo nmcli connection down c59af835-6f37-449b-836a-47714f9fe29f
[sudo] password for oracle: 
Connection 'enp0s8' successfully deactivated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/2)
[oracle@linux-8 ~]$

I have simulated network interruption at 24-AUG-2021 23:03:17

After around 2 minutes, we got the operation time out error which I mentioned in the beginning of the post.

***********************************************************************

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 12.2.0.1.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 12.2.0.1.0 - Production
  Time: 24-AUG-2021 23:04:56
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535

TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505

TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.56.181)(PORT=14500))

So what is happening here? How does DCD work?

DCD works different with database version 12.1 and above when compared to lower versions. Since we now already have 21c released for on-premise deployments, we will see the mechanism of verison 12.1 and above only.

When SQLNET.EXPIRE_TIME is set, there are 3 parameters associated with Operating system's tcp keepalive is set. Example is of Linux and the parameters may vary according to OS.

TCP_KEEPALIVE_TIME (the amount of time until the first keepalive packet is sent)

/proc/sys/net/ipv4/tcp_keepalive_time

TCP_KEEPCNT(the number of probes to send)

/proc/sys/net/ipv4/tcp_keepalive_probes

TCP_KEEPINTVL (the interval between keepalive packets)

/proc/sys/net/ipv4/tcp_keepalive_intvl

The default values of these parameters are 7200, 9 and 75 which means by default the TCP keepalive will probe 9 times with interval of 75 seconds after 7200 seconds.

Now, when we set the sqlnet.expire_time parameter to 2 minutes, the TCP_KEEPALIVE_TIME will be set to 120 seconds with interval and probes adjusted accordingly with a lesser value.

If we trace the oracle process connected with the help of strace utility, we can see there a poll() call every 120 seconds as below...

[oracle@linux75-2 ~]$ sudo strace -p 7199
strace: Process 7199 attached
read(17, 0x7fc9362c8f1e, 8208)          = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigprocmask(SIG_BLOCK, [], NULL, 8)  = 0
poll([{fd=17, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=119, tv_usec=990000}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=20000}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [], NULL, 8) = 0
rt_sigreturn({mask=[]})                 = 0
read(17, 0x7fc9362c8f1e, 8208)          = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigprocmask(SIG_BLOCK, [], NULL, 8)  = 0
poll([{fd=17, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [ALRM], NULL, 8) = 0
setitimer(ITIMER_REAL, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=120, tv_usec=0}}, NULL) = 0
rt_sigprocmask(SIG_UNBLOCK, [ALRM], NULL, 8) = 0
...
...
...

Once the time out has occurred, the connection will be kicked out from the database freeing up resources.

SQL> @session_info.sql

no rows selected

SQL>

Hope this post helps understanding Dead Connection Detection.

Queries used:

session_info.sql

col Username for a10
col Machine for a20
col Module for a15
col "OS User" for a15
SELECT
    sid,
    serial#           serial,
    username          "Username",
    status,
    machine           "Machine",
    osuser            "OS User",
    lower(status)     "Status",
    module            "Module",
    logon_time,
    sysdate
FROM
    v$session
WHERE
    username = 'TEST'
ORDER BY 1, 2, 3;

find_spid.sql

select SPID from v$process where ADDR in
(select PADDR from v$session where username='TEST');

References:

Parameters for the sqlnet.ora File (oracle.com)

Oracle Net 12c: Changes to the Functionality of Dead Connection Detection (DCD) [1591874.1]

How to Check if Dead Connection Detection (DCD) is Enabled in 9i ,10g and 11g [395505.1]

How to keep TCP sessions alive for extended periods of time? - Red Hat Customer Portal

When using strace, what does the poll() system call do on a socket's file descriptor? - Red Hat Customer Portal

Happy controlling...!!!

Wednesday, August 25, 2021

Database connections and controlling them - DCD

No comments:

Post a Comment

Oracle Certified Specialist

Microsoft Certified: Azure Fundamentals

Oracle Certified Associate

Total Pageviews