Looking for something? Try here..

Wednesday, August 2, 2023

Linux Huge Pages and AUTO_ONLY option of Oracle 19c

HugePages is a feature integrated into the Linux kernel. A page is the basic unit of virtual memory, with the default page size being 4096 Bytes in the x86 architecture. Linux uses a mechanism in the CPU architecture called "Translation Lookaside Buffers" (TLB) to manage the mapping of virtual memory pages to actual physical memory addresses. So, when each time memory is accessed using the default page size, there is additional overhead as many TLB resources are consumed.


With the implementation of Huge Pages, we are grouping these small pages into a bigger chunk with consuming a single resource in TLB. Huge pages are of size 2MB (default) or 1GB (called as gigantic pages and can be configured). 

For Oracle Database, using HugePages reduces the operating system maintenance of page states and increases Translation Lookaside Buffer (TLB) hit ratio. 

Configuring Huge pages: 

There are a few conditions to be met before configuring huge pages. 

1. Linux kernel should support huge pages. 
2. Edit the memlock settings in /etc/security/limits.conf file. It is advisable to set the maximum locked memory limit to at least 90 percent of the current RAM when you enable HugePages memory. 
Make sure that you are running applications that requires Hugepages (if configured), as the locked memory can only be utilized by the application that requests them and they are not available for normal memory allocations. One advantage of using Hugepages is that the memory is pinned and will not be swapped out of memory providing better performance.

In my lab system, I have the memlock limits for oracle user set as below. Value is set in KB
oracle   hard   memlock    134217728
oracle   soft   memlock    134217728
The current settings can be verified by ulimit -l 
[oracle@linux-8 ~]$ ulimit -l
134217728
[oracle@linux-8 ~]$
3. Calculate the number of huge pages required. 
For Oracle database requirements, we can utilize the script provided by Oracle support via support note 401749.1
Script usage is as below. Make sure all the DBs are up and running before initiating the script as the calculation is done for all shared memory segments available when the script is run, no matter it is an Oracle RDBMS shared memory segment or not. 
SQL> sho parameter sga_max_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_max_size                         big integer 4G
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
[oracle@linux-8 ~]$ ls -lrt huge*
-rwxr-xr-x. 1 oracle oinstall 3485 Aug  1 10:34 hugepages_settings.sh
[oracle@linux-8 ~]$ ./hugepages_settings.sh

This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments on Oracle Linux. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and
   you should accommodate this while calculating the overall size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
     # ipcs -m


Press Enter to proceed...

Recommended setting: vm.nr_hugepages = 2050
[oracle@linux-8 ~]$
[oracle@linux-8 ~]$
We can see the Recommended setting is to set vm.nr_hugepages = 2050 pages. This can be manually calculated by us if we know what exact memory we need for the oracle database SGA and other dependent applcations. 
In our case, I run only the Oracle database with sga_max_size of 4 GB. Default hugepage size is 2MB. So, to have 4GB memory of 2 MB page size we would need 4GB / 2MB = 4096 MB / 2 MB = 2048 pages + Minimum pages (here 1 page) + 1 = 2050 pages. 

If you have multiple DBs running in the system, we need to calculate the cumulative amount for all the DBs that needs to reserve hugepages. 

What if I don't set the memlock parameter in limits.conf file and try to start up the database? 

It's interesting we sometimes set the hugepages to the correct value as suggested by the Oracle provided script but miss to set the limits.conf file to mention which user can lock how much memory. In such cases, Oracle database would fail to start with the below error indicated in the alert log. 
**********************************************************************
2023-07-31T09:43:20.619135+05:30
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

2023-07-31T09:43:20.619242+05:30
 Per process system memlock (soft) limit = 64K
2023-07-31T09:43:20.619279+05:30
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 4100M
2023-07-31T09:43:20.619346+05:30
 Available system pagesizes:
  4K, 2048K
2023-07-31T09:43:20.619412+05:30
 Supported system pagesize(s):
2023-07-31T09:43:20.619447+05:30
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
2023-07-31T09:43:20.619510+05:30
     2048K                5            2050               0   ORA-27125
2023-07-31T09:43:20.619545+05:30
 Reason for not supporting certain system pagesizes:
2023-07-31T09:43:20.619603+05:30
  4K - Large pagesizes only
2023-07-31T09:43:20.619652+05:30
RECOMMENDATION:
2023-07-31T09:43:20.619688+05:30
 1. Configure system with expected number of pages for every
 supported system pagesize prior to the next instance restart operation.
2023-07-31T09:43:20.622629+05:30
 2. Increase per process memlock (soft) limit to at least 4100MB
 to lock 100% of SHARED GLOBAL AREA (SGA) pages into physical memory
2023-07-31T09:43:20.622708+05:30
**********************************************************************
2023-07-31T09:43:20.622896+05:30
Oradism deallocated 5 large pages of size 2097152 bytes.
2023-07-31T09:43:20.622933+05:30
SGA: Realm creation failed
2023-07-31T10:04:04.780997+05:30
Starting ORACLE instance (normal) (OS id: 38918)
2023-07-31T10:04:04.784579+05:30
************************************************************
Instance SGA_TARGET = 4096 MB and SGA_MAX_SIZE = 4096 MB
************************************************************
2023-07-31T10:04:04.784879+05:30
****************************************************
 Sys-V shared memory will be used for creating SGA
 ****************************************************
2023-07-31T10:04:04.787059+05:30
DISM started, OS id=39038
2023-07-31T10:04:08.928252+05:30
ERROR: Failed to get available system pages to allocate memory
2023-07-31T10:04:08.929435+05:30
**********************************************************************
..
..

Init parameter use_large_pages = ONLY

With Oracle database version 11.2.0.3 and above we have database parameter use_large_pages = ONLY (Default parameter value is TRUE) which would make the database start up to fail if the expected number of huge pages are not available in the system. 
SQL> sho parameter use_large_pages

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
use_large_pages                      string      TRUE
SQL> alter system set use_large_pages=ONLY scope=spfile;

System altered.

SQL>exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
[oracle@linux-8 ~]$ grep Huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
[oracle@linux-8 ~]$ 
We can see Hugepages is not configured (HugePages_Total : 0). Now, let me configure hugepages less than 2050 pages and try to startup the database. 
[oracle@linux-8 ~]$ vi /etc/sysctl.conf
[oracle@linux-8 ~]$ sudo vi /etc/sysctl.conf
[sudo] password for oracle:
[oracle@linux-8 ~]$
[oracle@linux-8 ~]$ cat /etc/sysctl.conf |grep huge
vm.nr_hugepages = 2000
[oracle@linux-8 ~]$ sudo sysctl -p
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
kernel.panic_on_oops = 1
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
vm.nr_hugepages = 2000
[oracle@linux-8 ~]$ grep Huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:    2000
HugePages_Free:     2000
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         4096000 kB
[oracle@linux-8 ~]$
[oracle@linux-8 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Aug 1 11:22:22 2023
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0

SQL> startup force
ORA-27106: system pages not available to allocate memory
Additional information: 6183
Additional information: 1
SQL>
Alert log will register the below  
Starting ORACLE instance (normal) (OS id: 70058)
2023-08-01T11:22:35.736716+05:30
************************************************************
Instance SGA_TARGET = 4096 MB and SGA_MAX_SIZE = 4096 MB
************************************************************
2023-08-01T11:22:35.736896+05:30
****************************************************
 Sys-V shared memory will be used for creating SGA
 ****************************************************
2023-08-01T11:22:35.737250+05:30
ERROR: Failed to get available system pages to allocate memory
2023-08-01T11:22:35.737286+05:30
**********************************************************************
2023-08-01T11:22:35.737319+05:30
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)

2023-08-01T11:22:35.737376+05:30
 Per process system memlock (soft) limit = 128G
2023-08-01T11:22:35.737410+05:30
 Expected per process system memlock (soft) limit to lock
 instance MAX SHARED GLOBAL AREA (SGA) into memory: 4100M
2023-08-01T11:22:35.737473+05:30
 Available system pagesizes:
  4K, 2048K
2023-08-01T11:22:35.737535+05:30
 Supported system pagesize(s):
2023-08-01T11:22:35.737572+05:30
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
2023-08-01T11:22:35.737630+05:30
     2048K             2000            2050            1997   ORA-27102
2023-08-01T11:22:35.737663+05:30
 Reason for not supporting certain system pagesizes:
2023-08-01T11:22:35.737696+05:30
  4K - Large pagesizes only
2023-08-01T11:22:35.737728+05:30
RECOMMENDATION:
2023-08-01T11:22:35.737760+05:30
 1. Configure system with expected number of pages for every
 supported system pagesize prior to the next instance restart operation.
2023-08-01T11:22:35.741864+05:30
**********************************************************************
2023-08-01T11:22:35.741932+05:30
SGA: Realm creation failed
..
If I increase the huge pages to 2050 and try to start the database, the database will start allocating the expected hugepages. 

Init parameter use_large_pages = AUTO_ONLY

Starting Oracle database version 19c, we have a new option for use_large_pages = AUTO_ONLY which is the default value for 19c databases running on Exadata systems and can be configured for on premise databases as well. 

By setting the value to AUTO_ONLY, Oracle will request Linux system automatically on how many numbers of pages are required to start the database and if the server can provision the pages, the database will be started. If the request cannot be provisioned, database will fail to start. 

For demo purpose, I'll reset the hugepages to 0 and set the value use_large_pages = AUTO_ONLY using the pfile and try to start the database. 
[oracle@linux-8 ~]$ sudo vi /etc/sysctl.conf
[sudo] password for oracle:
[oracle@linux-8 ~]$
[oracle@linux-8 ~]$ sudo sysctl -p
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
kernel.panic_on_oops = 1
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
vm.nr_hugepages = 0
[oracle@linux-8 ~]$ grep Huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
[oracle@linux-8 ~]$
[oracle@linux-8 ~]$ vi /oracle/db/db19/dbs/initcdb19.ora
[oracle@linux-8 ~]$ grep large_pages /oracle/db/db19/dbs/initcdb19.ora
*.use_large_pages='AUTO_ONLY'
[oracle@linux-8 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Aug 1 11:37:59 2023
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup pfile=/oracle/db/db19/dbs/initcdb19.ora
ORACLE instance started.

Total System Global Area 4294963992 bytes
Fixed Size                  9143064 bytes
Variable Size            1056964608 bytes
Database Buffers         3221225472 bytes
Redo Buffers                7630848 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
[oracle@linux-8 ~]$
[oracle@linux-8 ~]$ grep Huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:    2050
HugePages_Free:        3
HugePages_Rsvd:        3
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         4198400 kB
[oracle@linux-8 ~]$
You can see Oracle dynamically requested Linux to allocate 2050 pages and once the request is provisioned the database is started without us needing to configure the number of huge pages required. 

Now if I try to start another instance with 100 MB sga_max_size, if the system can provision the requested pages the database will start.  
[oracle@linux-8 ~]$ vi /oracle/db/db19/dbs/initlpdemo.ora
[oracle@linux-8 ~]$ cat /oracle/db/db19/dbs/initlpdemo.ora | grep 'sga_max\|use_large'
*.sga_max_size=100m
*.use_large_pages='AUTO_ONLY'
[oracle@linux-8 ~]$
[oracle@linux-8 ~]$ . oraenv
ORACLE_SID = [oracle] ? lpdemo
ORACLE_HOME = [/home/oracle] ? /oracle/db/db19
The Oracle base has been set to /oracle/db
[oracle@linux-8 ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Tue Aug 1 11:52:02 2023
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile=/oracle/db/db19/dbs/initlpdemo.ora
ORACLE instance started.

Total System Global Area  322960856 bytes
Fixed Size                  8895960 bytes
Variable Size             239075328 bytes
Database Buffers           67108864 bytes
Redo Buffers                7880704 bytes
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0
[oracle@linux-8 ~]$ 
[oracle@linux-8 ~]$ grep Huge /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:    2206
HugePages_Free:        6
HugePages_Rsvd:        6
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         4517888 kB
[oracle@linux-8 ~]$
It is not always guaranteed that the requested hugepages will be provisioned by Linux. If the system cannot allocate the required hugepages, database instance creation will fail registering insufficient number of hugepages in the alert log. 
2023-08-01T11:59:33.682674+05:30
 Available system pagesizes:
  4K, 2048K
2023-08-01T11:59:33.682755+05:30
 Supported system pagesize(s):
2023-08-01T11:59:33.682796+05:30
  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)
2023-08-01T11:59:33.682866+05:30
     2048K              151             156             151   ORA-27102
2023-08-01T11:59:33.682899+05:30
 Reason for not supporting certain system pagesizes:
2023-08-01T11:59:33.682932+05:30
  4K - Large pagesizes only
2023-08-01T11:59:33.682964+05:30
RECOMMENDATION:
2023-08-01T11:59:33.683022+05:30
 1. Configure system with expected number of pages for every
 supported system pagesize prior to the next instance restart operation.
2023-08-01T11:59:33.697313+05:30
**********************************************************************
2023-08-01T11:59:33.697997+05:30
Oradism deallocated 151 large pages of size 2097152 bytes.
2023-08-01T11:59:33.698067+05:30
SGA: Realm creation failed

Advantages and implementations

Now as we understand how use_large_pages = AUTO_ONLY works, we need to understand where this setting can be implemented and what are the advantages

1. This setting doesn't require the prior calculation of no. of hugepages to be made with the scripts. 
2. When we add additional instances and if we intend to run those instances using hugepages, no changes necessary to system parameters are needed as the calculation and allocation of huge pages is dynamic. 
3. Can be used in development or test systems where instances are added and removed frequently. 
4. Dedicated single production database (though not recommended as a change in sga_max_size will have impact on how many pages are used and locked which might cause memory starvation if over provisioned by mistake). 

Quick note: Though the allocation of hugepages is dynamic, the pages are not released after the database is shut down and is shown under Hugepages_Free. This has an impact as this memory can't be used for normal memory allocation and is a waste of resource until next reboot. 

References:

Oracle Linux: Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration (Doc ID 401749.1)


Happy Huge Paging...!!!