Oracle Database Appliance Upgrade 19.20 to 19.24: Common Pitfalls and Solutions

Rajesh Madhavarao Nov 24, 2025 3:07:07 PM

Introduction: A Major OS Migration (OL7 to OL8)

Upgrading an Oracle Database Appliance (ODA) Bare Metal system to version 19.24 is a significant process that goes far beyond simple patching. It involves a massive operating system migration, specifically transitioning the appliance from Oracle Linux 7 (OL7) to Oracle Linux 8 (OL8). This mandates the Data Preserving Reprovisioning (DPR) process, which essentially rebuilds the server's OS while keeping the Grid Infrastructure (GI) and user data intact on the ASM disk groups.

During a recent production upgrade, we hit a critical hard-stop failure at the very beginning of the post-reprovisioning configuration, during the node restore phase. The root cause was not a bug in the new software but a subtle "configuration drift" involving DNS, magnified by separated team responsibilities.

 

Phase 1: The Initial Crisis — Diagnosing DNS Metadata Drift

The failure occurred immediately after the OS re-image when attempting to restore the system configuration using odacli restore-node -g.

Symptom Reported: "Network Plumbing Error"

  • The restoration job stalled or failed while configuring network interfaces.

  • Upon manual investigation, commands like nslookup hung, indicating a complete loss of name resolution. This confirmed the DNS servers were unreachable or invalid.
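A quick scripted probe can confirm the same symptom without waiting on a hung nslookup. The helper below is our own minimal sketch (not part of the ODA tooling), using Python's standard socket module:

```python
import socket

def can_resolve(hostname):
    """Return True if the host resolves via the configured DNS servers."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # covers socket.gaierror (resolution failure)
        return False

# Example: can_resolve("oracle.com") returning False on an ODA node
# points at exactly the kind of dead DNS servers described above.
```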

The Root Cause: Configuration Drift via Team Silos

In ODA, system configuration is stored in two places: the operating system files and the internal DCS Metadata. The DPR process relies entirely on the ODA Metadata to restore the system, and this is where the issue lay: a mismatch between what the ODA software expected and what the network actually required.

  • ODA Metadata (Stale): cat /opt/oracle/oak/restore/metadata/provisionInstance.json returned "dnsServers" : [ "10.5.1.6", "10.5.1.7" ] (OLD/invalid IPs)

  • OS Backup (Correct): cat /opt/oracle/oak/restore/bkp/sysfiles/etc/resolv.conf returned nameserver 10.15.0.1 and nameserver 10.15.0.2 (NEW/valid IPs)
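This comparison can be automated. The sketch below is our own (the helper names are not ODA tooling; it assumes provisionInstance.json is plain JSON with the structure shown later in this article) and diffs the two DNS sources:

```python
import json

def metadata_dns(json_path):
    """Read the dnsServers list from provisionInstance.json."""
    with open(json_path) as f:
        return json.load(f)["instance"]["dnsServers"]

def os_dns(resolv_path):
    """Collect nameserver entries from a resolv.conf-style file."""
    servers = []
    with open(resolv_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "nameserver":
                servers.append(fields[1])
    return servers

def dns_drift(json_path, resolv_path):
    """Return (metadata_only, os_only); two empty sets means no drift."""
    meta = set(metadata_dns(json_path))
    os_side = set(os_dns(resolv_path))
    return meta - os_side, os_side - meta

# Example (paths as in this article):
# dns_drift("/opt/oracle/oak/restore/metadata/provisionInstance.json",
#           "/opt/oracle/oak/restore/bkp/sysfiles/etc/resolv.conf")
```

Against the files above, this would report the 10.5.x.x addresses as metadata-only and the 10.15.x.x addresses as OS-only, flagging the drift before restore-node -g is ever run.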

 

The Human Factor (Siloed Teams):

  • The DNS/Network Team retired the old DNS servers (10.5.x.x).

  • The System Administration Team manually updated the OS file (/etc/resolv.conf) to restore immediate connectivity.

The critical mistake was bypassing the ODA CLI. Using odacli (specifically odacli update-netinterface) is mandatory to synchronize a DNS change with the internal DCS Metadata. Because this was skipped, the ODA attempted to restore the new OL8 OS with the stale, decommissioned DNS IPs, causing the network plumbing failure.

Never manually edit network configuration files on an ODA.

To prevent this drift, always use the official procedure to update network settings. This ensures the DCS Metadata is updated simultaneously.

 


Phase 2: DNS Fix and Successful Grid Infrastructure Restore

The initial GI restore failure was immediately resolved by manually correcting the DNS entries within the ODA's internal metadata. The logs below confirm the successful execution of the core DPR steps after the underlying network issue was fixed.

1. Cleaning the Failed Attempt and Critical Warning

We first ran the cleanup script to revert the failed state and prepare the node for a fresh GI restore.

[root@test02 ~]# /opt/oracle/oak/onecmd/cleanup.pl
INFO: Log file is /opt/oracle/oak/log/test02/cleanup/cleanup_2025-11-19_15-41-58.log
...
INFO: Cleanup was successful
WARNING: After system reboot, please re-run "odacli update-repository" for GI/DB clones,
WARNING: before running "odacli restore-node -g".

 

2. Updating the Repository (The Bridge)

The new OL8 base system temporarily loses the required GI and DB software image registrations. Re-running odacli update-repository is a mandatory "bridge" step to relink the new OS to the pre-existing database software clones.

/opt/oracle/dcs/bin/odacli update-repository -f /cohesity_nfs01/upgrade19.24/oda-sm-19.24.0.0.0-240802-server.zip

/opt/oracle/dcs/bin/odacli update-repository -f /cohesity_nfs01/upgrade19.24/odacli-dcs-19.24.0.0.0-240724-GI-19.24.0.0.zip

/opt/oracle/dcs/bin/odacli update-repository -f /cohesity_nfs01/upgrade19.24/odacli-dcs-19.24.0.0.0-240724-DB-19.24.0.0.zip

/opt/oracle/dcs/bin/odacli update-repository -f /cohesity_nfs01/test02_bkp_20251104/serverarchive_test02/serverarchive_test02.zip

 

3. The Emergency Fix: Manually Updating provisionInstance.json

Since the system was down and time was of the essence, the fastest way to get provisioning moving again was to bypass the outdated DCS Metadata and inject the correct DNS information directly into the configuration file used for provisioning.

The file that needed to be updated was located at:

/opt/oracle/oak/restore/metadata/provisionInstance.json
{
  "instance" : {
    // ... other parameters
    "ntpServers" : [ "10.5.1.9" ],
    "dnsServers" : [ "10.15.0.1", "10.15.0.2" ],  <-- THE CRITICAL FIX
    "domainName" : "bridge.net",
    // ... other parameters
  }
}
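If you take this emergency route, make the edit defensively: keep a backup and let a JSON parser do the writing so a stray comma cannot break the next provisioning run. Below is a minimal sketch (our own helper, not an Oracle tool; it assumes the file is plain JSON without the // comments shown in the excerpt above):

```python
import json
import shutil

def fix_dns_servers(path, new_servers):
    """Back up the provisioning file, then rewrite its dnsServers list."""
    shutil.copy2(path, path + ".bak")  # keep the original for rollback
    with open(path) as f:
        data = json.load(f)
    data["instance"]["dnsServers"] = list(new_servers)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return data["instance"]["dnsServers"]

# Example (path as in this article):
# fix_dns_servers("/opt/oracle/oak/restore/metadata/provisionInstance.json",
#                 ["10.15.0.1", "10.15.0.2"])
```

Afterwards, python -m json.tool provisionInstance.json is a cheap way to confirm the file still parses.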

 

4. Successful Grid Infrastructure Restore (odacli restore-node -g)

With the DNS metadata corrected, the GI restore executed successfully, which includes restoring the necessary OS users, groups, and the Clusterware stack.

[root@oak ~]# odacli restore-node -g

...

[root@test02 ~]# odacli describe-job -i 882fecf3-701a-4763-b60f-4a29c8757e86
Job details
----------------------------------------------------------------
                     ID:  882fecf3-701a-4763-b60f-4a29c8757e86
            Description:  Restore node service - GI
                 Status:  Success
...
Task Name                                Start Time                               End Time                                 Status
---------------------------------------- ---------------------------------------- ---------------------------------------- ----------------
...
Restart network interface pubnet         November 19, 2025 11:50:43 PM AST        November 19, 2025 11:50:49 PM AST        Success
...
Extract GI clone                         November 19, 2025 11:55:19 PM AST        November 19, 2025 11:56:30 PM AST        Success
Grid stack creation                      November 19, 2025 11:56:42 PM AST        November 20, 2025 12:08:06 AM AST        Success
...
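When scripting a DPR run, it is worth gating restore-node -d on the GI job actually reporting Success rather than eyeballing the output. A small hypothetical parser (our own, not an odacli feature) for the describe-job text shown above:

```python
def job_status(describe_output):
    """Extract the top-level Status field from odacli describe-job output."""
    for line in describe_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Status":
            return value.strip()
    return None
```

Feed it the captured stdout of odacli describe-job -i with the job ID, and proceed only when it returns "Success".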
 

Phase 3: Database Restore

After the Grid Infrastructure restore (-g) is complete, the final step is the database restore (-d). 

 

1. Successful Database Restore (odacli restore-node -d)

With both the network and repository links validated, the final database restore job completed flawlessly.

[root@test02 ~]# odacli restore-node -d
...
[root@test02 ~]# odacli describe-job -i cfaa8eb9-2666-42c5-bef9-b9bb6e3dbd10
Job details
----------------------------------------------------------------
...
            Description:  Restore node service - DB
                 Status:  Success
...
Restore database: TEST                 November 20, 2025 1:10:10 PM AST         November 20, 2025 1:13:42 PM AST         Success
+-- Run SqlPatch                         November 20, 2025 1:11:35 PM AST         November 20, 2025 1:11:42 PM AST         Success
...
 
  1. Prevention: Use odacli update-netinterface for all network changes. Prevents DNS metadata drift and the initial restore-node -g failure.

  2. Bridge: Run odacli update-repository for all clones. Critical: re-establishes the link between the new OL8 OS and the database software images, as warned by cleanup.pl.

  3. GI Restore: Run odacli restore-node -g. Restores the OS (now OL8) and Grid Infrastructure configuration.

  4. DB Restore: Run odacli restore-node -d. Completes the DPR process by restoring the databases.

 

 
 

Conclusion: Manual Fix for Immediate Crisis, Official Procedure for Long-Term Health

The ODA's reliance on DCS Metadata to generate configuration files such as provisionInstance.json can become a single point of failure when that metadata is out of sync with the actual operating system network settings.

When faced with a "System Unavailable" crisis during an ODA upgrade due to DNS resolution failure, a direct manual intervention—updating /opt/oracle/oak/restore/metadata/provisionInstance.json—proved to be the necessary emergency measure to immediately restore provisioning capability.

While a manual edit can save the day, it only addresses the symptom in the provisioning file, not the root cause within the DCS Metadata. Moving forward, the critical lesson is to ensure all DNS changes are performed using Oracle's recommended procedure to update the underlying metadata correctly. This provides system consistency, prevents configuration drift, and keeps future patching and provisioning operations running smoothly, making ODA management predictable and stable.