Performing an OS upgrade on an ExaCS environment is a critical maintenance task. We recently performed a DomU patch on our Exadata X9M-2 cluster to migrate from Oracle Linux 7.9 to Oracle Linux 8.10. While the process is largely automated, environment-specific factors can introduce challenges.
This post walks through the failure we encountered, the root cause analysis, and how we manually recovered the patching process.
Cluster Details
| Component | Current Version | Target Version |
|---|---|---|
| Cluster Image | 22.1.25.0.0.240710 | 24.1.16.0.0.250905 |
| OS | OL 7.9 | OL 8.10 |
| Infra Version | 25.1.4.0.0.250612 | 25.1.4.0.0.250612 |
| Storage Version | 25.1.4.0.0.250612 | 25.1.4.0.0.250612 |
1. Pre-Check Failures — Custom Packages
The initial patchmgr pre-check failed due to custom RPMs that were manually installed.
The system generated a cleanup script:
/var/log/cellos/remove_unknown_packages.201125214544.sh
The script contained multiple lines such as:
rpm -e --nodeps <package-name>
We reviewed the list carefully, removed all custom packages, and then the pre-check passed successfully.
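For reference, a minimal sketch of how that review and removal could look. The script path comes from the run above; whether to execute it as-is depends on which custom packages you intend to reinstall after the upgrade:
[root@exadevdb-01 ~]# cat /var/log/cellos/remove_unknown_packages.201125214544.sh    # inspect exactly what will be removed
[root@exadevdb-01 ~]# rpm -qa --last | head -20                                      # cross-check the most recently installed RPMs
[root@exadevdb-01 ~]# sh /var/log/cellos/remove_unknown_packages.201125214544.sh     # run only once the list is confirmed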
2. Patching Started — Node 1 Upgraded, but CRS Failed to Start
During patching, patchmgr started upgrading Node 1 from Node 2 (as expected).
Node 1 successfully upgraded to Oracle Linux 8.10, but the process failed before CRS could start.
To analyze the failure, we reviewed the logs on Node 2:
/u02/dbserver.patch.zip_exadata_ol8_24.1.16.0.0.250905_Linux-x86-64.zip/dbserver_patch_251020/patchmgr_log*/exadevdb-01.sub*.log
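To surface the failing step quickly in those logs, a sketch of the kind of filtering we used (the exact log file names depend on the run id):
[root@exadevdb-02 ~]# cd /u02/dbserver.patch.zip_exadata_ol8_24.1.16.0.0.250905_Linux-x86-64.zip/dbserver_patch_251020
[root@exadevdb-02 dbserver_patch_251020]# grep -iE 'error|fail' patchmgr_log*/exadevdb-01.sub*.log | tail -20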
Log Snippet — Insufficient Root Filesystem Space
[1764094796][2025-11-25 18:20:07 +0000][INFO][./dbnodeupdate.sh][CheckFreeSpace][] v_fs:/,v_free_space:2729,v_fs_size:3200
[1764094796][2025-11-25 18:20:07 +0000][INFO][./dbnodeupdate.sh][DiaryEntry][] Entering PrintGenError Insufficient free space in file system '/'. The minimum required free space is 3200M. But available free space is 2729M. Cleanup before proceeding the actual upgrade.
[1764094796][2025-11-25 18:20:07 +0000][ERROR][./dbnodeupdate.sh][PrintGenError][] Insufficient free space in file system '/'. The minimum required free space is 3200M. But available free space is 2729M. Cleanup before proceeding the actual upgrade.
[1764094796][2025-11-25 18:20:07 +0000][INFO][./dbnodeupdate.sh][DiaryEntry][] Entering UpdateDbnodeupdateStatFile failed;
[1764094796][2025-11-25 18:20:07 +0000][INFO][./dbnodeupdate.sh][UpdateDbnodeupdateStatFile][] /opt/oracle.SupportTools/.tmp.dbnodeupdate.state
[1764094796][2025-11-25 18:20:07 +0000][INFO][./dbnodeupdate.sh][UpdateDbnodeupdateStatFile][] Copying /opt/oracle.SupportTools/.dbnodeupdate.state to /opt/oracle.SupportTools/.tmp.dbnodeupdate.state
--------------------------------------------------------------------------------------------------------------------------------
Insufficient free space in file system '/'.
Required free space: 3200 MB
Available free space: 2729 MB
Although the pre-check had passed, the available space (~2.7 GB) was too close to the 3.2 GB minimum, suggesting that temporary files created during the upgrade consumed the remaining headroom.
We cleaned up space and retried the patch from the OCI Console, but it failed again, which led us to investigate the state of each node.
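As a sketch, the space check and the cleanup candidates we looked at before retrying; these are generic Linux commands rather than part of the patch tooling:
[root@exadevdb-01 ~]# df -h /                                           # confirm free space on the root filesystem
[root@exadevdb-01 ~]# du -xsh /var/log/* 2>/dev/null | sort -h | tail   # largest consumers under /var/log
[root@exadevdb-01 ~]# find /var/log -name "*.gz" -mtime +30 -ls         # old rotated logs, possible removal candidates (review first)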
3. Checking Node State After Failure
Node 1 — Partially Upgraded
Node 1 was upgraded, but CRS was not enabled.
We checked the current image and kernel:
[root@exadevdb-01 dbnu]# imageinfo
Kernel version: 5.4.17-2136.343.5.5.el8uek.x86_64
Uptrack kernel version: 5.4.17-2136.346.6.el8uek.x86_64
Image kernel version: 5.4.17-2136.343.5.5.el8uek
Image version: 24.1.16.0.0.250905
Image activated: 2025-11-25 18:19:53 +0000
Image status: success
Exadata software version: 24.1.16.0.0.250905
Node type: GUEST
System partition: /dev/mapper/VGExaDb-LVDbSys1
[root@exadevdb-01 cellos]# imagehistory
Version : 22.1.25.0.0.240710
Exadata Live Update Version : n/a
Image activation date : 2024-08-22 16:07:13 +0000
Imaging mode : fresh
Imaging status : success
Version : 24.1.16.0.0.250905
Exadata Live Update Version : n/a
Image activation date : 2025-11-25 18:19:53 +0000
Imaging mode : patch
Imaging status : success
We then checked the CRS/HA autostart configuration:
[root@exadevdb-01 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
[root@exadevdb-01 ~]# crsctl config has
CRS-4621: Oracle High Availability Services autostart is disabled.
This confirmed that Oracle High Availability Services (OHAS) autostart was disabled and the stack was down.
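A couple of extra checks that confirm this state, shown as a sketch; CRS-4639 is the response expected when OHAS is not running:
[root@exadevdb-01 ~]# crsctl check has                              # expect CRS-4639: Could not contact Oracle High Availability Services
[root@exadevdb-01 ~]# ps -ef | grep -E 'ohasd|crsd' | grep -v grep  # no Grid Infrastructure daemons should be listed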
Node 2 - Still on Old Version
Because the upgrade of Node 1 is driven from Node 2, we confirmed that Node 2 was still running the old OS and kernel, waiting for Node 1 to complete.
[root@exadevdb-02 ~]# imageinfo
Kernel version: 4.14.35-2047.528.2.4.el7uek.x86_64 #2 SMP Tue Feb 27 20:52:58 PST 2024 x86_64
Uptrack kernel version: 4.14.35-2047.537.4.el7uek.x86_64 #2 SMP Fri May 31 15:52:44 PDT 2024 x86_64
Image kernel version: 4.14.35-2047.528.2.4.el7uek
Image version: 22.1.25.0.0.240710
Image activated: 2024-08-22 16:07:10 +0000
Image status: success
Node type: GUEST
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
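For a quick side-by-side view, the kernels can also be compared directly from Node 2; the values below simply restate what imageinfo already reported on each node:
[root@exadevdb-02 ~]# uname -r
4.14.35-2047.528.2.4.el7uek.x86_64
[root@exadevdb-02 ~]# ssh exadevdb-01 uname -r
5.4.17-2136.343.5.5.el8uek.x86_64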
4. Resuming Patching Manually
Because Node 1 was already on the target OS version but the post-upgrade configuration was incomplete, we used the dbnodeupdate.sh utility to finish the remaining steps manually.
Step 1: Prepare the Tooling. On Node 1, we navigated to the patch directory, created a temporary folder, and extracted dbnodeupdate.zip.
[root@exadevdb-01 ~]# cd /u02/dbserver.patch.zip_exadata_ol8_24.1.16.0.0.250905_Linux-x86-64.zip/dbserver_patch_251020
[root@exadevdb-01 dbserver_patch_251020]# ls -alrt dbnodeupdate*
-rw-r--r-- 1 root root 8444358 Oct 24 01:13 dbnodeupdate.zip
[root@exadevdb-01 dbserver_patch_251020]# mkdir dbnu
[root@exadevdb-01 dbserver_patch_251020]# cp dbnodeupdate.zip dbnu
[root@exadevdb-01 dbserver_patch_251020]# cd dbnu
[root@exadevdb-01 dbnu]# unzip dbnodeupdate.zip
Archive: dbnodeupdate.zip
inflating: CheckHWnFWProfile
inflating: check_stack.sh
.
.
.
inflating: uek5_upgrade-roce.table
inflating: yq
[root@exadevdb-01 dbnu]#
Step 2: Execute Post-Patch Steps. We ran the script with the -c flag (complete the post-upgrade steps), -q (quiet mode), and the target version specified with -t:
[root@exadevdb-01 dbnu]# ./dbnodeupdate.sh -c -q -t 24.1.16.0.0.250905
(*) 2025-11-27 18:46:51: Initializing logfile /var/log/cellos/dbnodeupdate.log
##########################################################################################################################
# #
# Guidelines for using dbnodeupdate.sh (rel. 25.251020): #
# #
# - Prerequisites for usage: #
# 1. Refer to dbnodeupdate.sh options. See MOS 1553103.1 #
# 2. Always use the latest release of dbnodeupdate.sh. See patch 21634633 #
# 3. Run the prereq check using the '-v' flag. #
# #
# I.e.: ./dbnodeupdate.sh -u -l /u01/my-iso-repo.zip -v (may see rpm conflicts) #
# #
# - Prerequisite rpm dependency check failures can happen due to customization: #
# - The prereq check detects dependency issues that need to be addressed prior to running a successful update. #
# - Customized rpm packages may fail the built-in dependency check and system updates cannot proceed until resolved. #
# #
# - As part of the update, rpms shipped by Exadata may be removed. #
# #
# - In case of any problem when filing an SR, upload the following: #
# - /var/log/cellos/dbnodeupdate.log #
# - /var/log/cellos/dbnodeupdate.trc #
# - /var/log/cellos/dbnodeupdate.<runid>.diag #
# - where <runid> is the unique number of the failing run. #
# #
# #
##########################################################################################################################
(*) 2025-11-27 18:47:01: Analyzing system configuration.
Active Image version: 24.1.16.0.0.250905
Active Kernel version : 5.4.17-2136.343.5.5.el8uek
Active LVM Name : /dev/mapper/VGExaDb-LVDbSys1
Inactive Image version: 22.1.25.0.0.240710
Inactive LVM Name : /dev/mapper/VGExaDb-LVDbSys2
Current user id : root
Action : finish-post (cleanup and enable CRS to auto-start) - running in quiet mode
Shutdown EM agents : Yes
Shutdown stack : No (Currently stack is down)
Logfile : /var/log/cellos/dbnodeupdate.log (runid: 271125184652)
Diagfile : /var/log/cellos/dbnodeupdate.271125184652.diag
Server model : Exadata Virtual Machine
dbnodeupdate.sh rel. : 25.251020 (always check MOS 1553103.1 for the latest release of dbnodeupdate.sh)
Exadata Live Update: No
(*) 2025-11-27 18:48:06: Executing plugin /u02/dbserver.patch.zip_exadata_ol8_24.1.16.0.0.250905_Linux-x86-64.zip/dbserver_patch_251020/dbnu/dbnu-plugin.sh with arguments 271125184652 start-execfinish
(*) 2025-11-27 18:48:08: Running validations. Maximum wait time: 60 minutes.
(*) 2025-11-27 18:48:08: If the node reboots, re-run './dbnodeupdate.sh -c' after the node restarts..
(*) 2025-11-27 18:48:24: EM agent in /u02/app/oracle/product/agent13c/agent_13.5.0.0.0 stopped
(*) 2025-11-27 18:48:25: Service acpid enabled to autostart at boot
(*) 2025-11-27 18:48:26: Not Relinking Oracle homes
(*) 2025-11-27 18:48:26: Executing plugin /u02/dbserver.patch.zip_exadata_ol8_24.1.16.0.0.250905_Linux-x86-64.zip/dbserver_patch_251020/dbnu/dbnu-plugin.sh with arguments 271125184652 before-relink
(*) 2025-11-27 18:48:42: Starting Grid Infrastructure (/u01/app/19.0.0.0/grid)
(*) 2025-11-27 18:51:21: Stack started
(*) 2025-11-27 18:51:26: TFA Started
(*) 2025-11-27 18:51:27: Enabling stack to start at reboot. Disable this when the stack should not start on the next boot
(*) 2025-11-27 18:51:31: Removed obsolete kernel-transition: kernel-transition-3.10.0-0.0.0.2.el7
(*) 2025-11-27 18:51:31: Retained the required kernel-transition package:
(*) 2025-11-27 18:51:31: Disabling diagsnap for Grid Infrastructure versions older than 23c (24900613)
(*) 2025-11-27 18:52:34: All post steps are finished.
Step 3: Verify Success. The script analyzed the configuration, identified that the active image was 24.1.16.0.0.250905, and determined the required action as "finish-post (cleanup and enable CRS to auto-start)". The key lines from the run:
(*) Starting Grid Infrastructure (/u01/app/19.0.0.0/grid)
(*) Stack started
(*) TFA Started
(*) Enabling the stack to start at reboot.
(*) All post steps are finished.
We verified that the database services were running using ps -ef:
[root@exadevdb-01 dbnu]# ps -ef | grep pmon
grid ... asm_pmon_+ASM1
oracle ... ora_pmon_ERGDEVFS1
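Beyond the pmon processes, the overall stack state can be checked from the Grid home reported by dbnodeupdate.sh above; a sketch:
[root@exadevdb-01 dbnu]# /u01/app/19.0.0.0/grid/bin/crsctl check crs    # CRS-4638/4537/4529/4533 indicate all layers are online
[root@exadevdb-01 dbnu]# /u01/app/19.0.0.0/grid/bin/crsctl stat res -t  # full listing of cluster resources and their targets/states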
5. Completing the Cluster Upgrade
With Node 1 fully operational and the Grid Infrastructure stack running, we returned to the OCI Console. We selected the Retry Apply action for the VM Cluster.
Because Node 1 was now in a healthy, patched state, the automation recognized the completion and proceeded to patch Node 2.
The OCI Work Requests confirmed the successful completion of the "Apply Cloud VM Cluster OS Update" operation shortly thereafter.
Key Takeaways
- Strict Space Requirements: The dbnodeupdate process is strict about root filesystem space (3200 MB minimum on /). Even if the pre-check passes, temporary files generated during the upgrade can consume that buffer.
- Log Location: While a node is being patched, the logs for the operation are typically found on the peer node (the node driving the update).
- Manual Resume: If the OS update itself succeeds but the post-patch steps fail (leaving CRS down), you can often rescue the node with ./dbnodeupdate.sh -c rather than rolling back immediately.
This procedure saved us significant time by allowing us to move forward with the existing OS upgrade rather than attempting a complex rollback.