Recently, I had the task of refreshing our UAT environment using a PDB from the production database hosted in Oracle Cloud Infrastructure (OCI).
At first, the activity looked straightforward. The plan was to perform a remote PDB clone from Production to UAT. However, there was one challenge from the beginning: the source and target databases were located in different OCI VCNs.
After establishing connectivity between the environments, I started the cloning process. The clone started successfully, but after approximately one hour, it appeared to stall, and no significant progress was observed.
After some investigation and testing, I identified two important factors that affected the cloning process:
The target DBCS was running with only 1 OCPU.
The production database had an hourly archive log backup and a delete job running through Commvault.
In this blog, I will share the architecture, troubleshooting process, and lessons learned from this refresh activity.
Source Environment
Production Oracle Database Cloud Service (DBCS)
Production DBCS is hosted in the Production VCN
Target Environment
UAT Oracle Database Cloud Service (DBCS)
Target DBCS is hosted in a separate Non-Production VCN
Refresh Method
Remote PDB Clone using Database Link
The first challenge was that the Production and UAT databases were deployed in different OCI VCNs.
Since remote PDB cloning requires communication between the source and target databases, I first needed to establish network connectivity.
To achieve this, I configured Local Peering Gateways (LPGs) between the two VCNs.
The implementation included:
Log in to your tenancy in OCI, then go to the navigation menu → Networking → Virtual Cloud Network.
Select the correct compartment and click on the 'Non-Prod VCN'.
Go to Gateways and click 'Create Local Peering Gateway'.
Select the correct compartment and click 'Create'.
Now, our LPG has been created in the 'Non-Prod VCN', and its peering status is 'New - Not connected to a peer'.
Go to Navigation Menu → Networking → Virtual Cloud Network. Select the 'Prod VCN', then go to 'Gateways' and click 'Create Local Peering Gateway'. Repeat the same steps as in the Non-Prod VCN.
Now, our LPG has been created in the 'Prod VCN', and its peering status is 'New - Not connected to a peer'.
In the Prod LPG, click 'Establish Peering Connection'.
Then select the 'Non-Prod VCN' and choose the Non-Prod LPG as the 'Unpeered Peer Gateway'.
The peering status of both LPGs then changes to 'Peered - Connected to a peer'.
Go to Navigation Menu → Networking → Virtual Cloud Network. Select the 'Prod VCN', then go to the Routing tab.
Go to the Route Rules tab, then click the 'Add Route Rules' button.
Enter the CIDR block for the Non-Prod VCN and the name of the Non-Prod LPG, and click 'Add Route Rules'.
Repeat the same route rule steps on the Non-Prod VCN. However, this time, enter the CIDR block for the Prod VCN and the name of the Prod LPG, then click 'Add Route Rules'.
Establishing Local Peering Gateways (LPGs) between the production and non-production VCNs creates the network path, but traffic is still blocked by the VCN security rules until it is explicitly allowed. To let the UAT database reach the production listener and initiate the clone, I added an ingress rule to the Prod VCN that allows the Non-Prod VCN CIDR block on the database listener port (1521 by default).
Once the network connectivity was in place, I created a database link from the target CDB to the production database.
To create the DB link, I copied the Long CDB connection string from OCI.
Run this on Destination (UAT) CDB$ROOT.
select name from v$database@clone_link;
After testing the database link and confirming successful connectivity, the environment was ready for the refresh operation.
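A minimal sketch of the link creation and test, assuming a common user on the production CDB that holds the CREATE SESSION and CREATE PLUGGABLE DATABASE privileges. The user name c##clone_user, the password, and the connect string are placeholders, not the values used in this refresh. The helper only prints SQL, so it can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Prints the SQL to create and test the database link. It does not connect
# to any database. All three arguments are hypothetical placeholders.
print_dblink_sql() {
  local user="$1" password="$2" connect_string="$3"
  cat <<EOF
CREATE DATABASE LINK clone_link
  CONNECT TO ${user} IDENTIFIED BY "${password}"
  USING '${connect_string}';
SELECT name FROM v\$database@clone_link;
EOF
}

# Example call with placeholder values.
print_dblink_sql 'c##clone_user' 'ChangeMe' '<long CDB connection string from OCI>'
```

On the UAT host, the output could be piped into sqlplus -s / as sysdba while connected to CDB$ROOT. If the SELECT returns the production database name, the link works.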
The UAT DBCS was configured with only 1 OCPU because it was primarily used for testing.
I started the remote PDB clone operation. Initially, everything looked normal. The clone started successfully, and data transfer began.
However, after approximately one hour, the process appeared to stop making noticeable progress. The session remained active, but the clone was taking much longer than expected.
At this stage, I started investigating possible bottlenecks.
The command used for the clone operation was:
-- Run this on Destination (UAT) CDB$ROOT
[oracle@uat script]$ cat clone_pdb.sql
-- clone_pdb.sql
SET ECHO ON
SET TIME ON
SET TIMING ON
SET PAGESIZE 0
SET LINESIZE 200
PROMPT *** Starting PDB Clone at:
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') FROM DUAL;
CREATE PLUGGABLE DATABASE UATPDB FROM PRODPDB@clone_link
  REFRESH MODE NONE
  PARALLEL 2
  KEYSTORE IDENTIFIED BY "PasswordForWallet";
PROMPT *** Opening PDB UATPDB...
ALTER PLUGGABLE DATABASE UATPDB OPEN;
PROMPT *** Saving PDB State...
ALTER PLUGGABLE DATABASE UATPDB SAVE STATE;
PROMPT *** Clone Completed at:
SELECT TO_CHAR(SYSDATE, 'YYYY-MM-DD HH24:MI:SS') FROM DUAL;
EXIT;
To ensure the clone process continued even after disconnecting from the session, I executed the script in the background using nohup.
[oracle@uat ~]$ nohup sqlplus / as sysdba @/home/oracle/script/clone_pdb.sql > /home/oracle/script/pdb_clone_full.log 2>&1 &
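Explanation of Key Parameters in the script:
FROM PRODPDB@clone_link: the source PDB, read over the database link created earlier.
REFRESH MODE NONE: a one-time clone; the new PDB is not a refreshable copy of the source.
PARALLEL 2: the degree of parallelism used to copy the data files.
KEYSTORE IDENTIFIED BY: the keystore (wallet) password of the target CDB, required when the source PDB uses TDE encryption.
After creation, the script opens the PDB, saves its state, and logs start/end timestamps for tracking execution time.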
One of the first things I reviewed was the compute configuration of the target database.
The UAT database was running on:
Shape: VM.Standard3.Flex
OCPUs: 1
Memory: 16 GB
Network Bandwidth: 1 Gbps
The OCI console clearly showed that the VM was limited to approximately 1 Gbps network bandwidth.
Since a remote PDB clone involves transferring database blocks over the network and writing them to storage on the target system, this immediately became a potential bottleneck.
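To put the 1 Gbps limit in perspective, here is a rough back-of-the-envelope estimate of the minimum copy time at a given line rate. The 1000 GB size is a hypothetical figure (the actual PDB size is not stated here), and the 8 Gbps value assumes bandwidth scales roughly with the OCPU count. The calculation ignores protocol overhead, storage throughput, and CPU limits, so real runs will take longer.

```shell
#!/usr/bin/env bash
# Lower bound on transfer time: size divided by line rate.
# Arguments: data size in GB (decimal), link speed in Gbps.
estimate_transfer_minutes() {
  local size_gb="$1" gbps="$2"
  # GB -> gigabits (x8), divide by Gbps for seconds, then convert to minutes.
  awk -v s="$size_gb" -v b="$gbps" 'BEGIN { printf "%.1f\n", (s * 8) / b / 60 }'
}

# Hypothetical 1000 GB PDB at 1 Gbps and at an assumed 8 Gbps.
echo "1 Gbps: $(estimate_transfer_minutes 1000 1) min"
echo "8 Gbps: $(estimate_transfer_minutes 1000 8) min"
```

Under these assumptions the copy alone needs more than two hours at 1 Gbps, compared with under 20 minutes at 8 Gbps.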
To test this theory, I temporarily increased the OCPUs from 1 to 8. Note that changing the shape (increasing OCPUs) will cause the DBCS to reboot.
To do this, I went to the navigation menu → Oracle AI Database → Oracle Base Database Service.
In the DB System page, I selected the UAT database. Then I went to the Nodes tab, clicked the Actions button, and clicked Change Shape.
Then I clicked ... in Configure OCPU and selected Update OCPU Count.
I changed the OCPU number to 8 and clicked Update.
As the OCPUs increased, the shape resources scaled accordingly.
The clone operation was significantly faster, and the overall database performance improved. This was a good reminder that in OCI, increasing OCPUs affects more than CPU capacity. It also increases available network bandwidth and storage performance, which can have a direct impact on large database operations.
While reviewing the production environment, I identified a potential risk related to archive log handling.
The production database was protected by Commvault with an hourly archive log backup job that also deletes archive logs after backup, similar to BACKUP ARCHIVELOG ALL DELETE INPUT;
A hot remote PDB clone needs the redo generated on the source while the data files are being copied, so that the clone can be made consistent at the end. If Commvault deletes the archived logs that contain that redo before the clone has read them, the clone may hang or fail with ORA- errors because the required SCN range is no longer available.
Since the clone runs for an extended period and includes a final sync phase, this behavior was identified as a potential risk. Increasing the OCPU count was consistent with this: with a faster copy, the clone progressed further and no longer appeared stuck.
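One way to check whether archived logs from the clone window were removed is to query V$ARCHIVED_LOG on the production database. The helper below is a sketch that only prints the query. The lookback window in hours is an assumed parameter, and the DELETED column reflects what the control file records, which relies on the backup tool removing logs through RMAN.

```shell
#!/usr/bin/env bash
# Prints a query listing recent archived logs and whether they are marked
# as deleted. Argument: lookback window in hours (defaults to 2).
print_archlog_check_sql() {
  local hours="${1:-2}"
  cat <<EOF
SELECT thread#, sequence#, first_time, deleted, backup_count
FROM   v\$archived_log
WHERE  first_time >= SYSDATE - ${hours}/24
ORDER  BY thread#, sequence#;
EOF
}

# Example: look back over the last 2 hours.
print_archlog_check_sql 2
```

Rows with DELETED = 'YES' and a FIRST_TIME after the clone started would indicate that redo the clone may still need has already been removed from disk.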
To eliminate this risk, I temporarily stopped Commvault on production using:
commvault stop
For the second clone attempt, I made the following changes compared with the first:
Scaling the UAT DBCS from 1 OCPU to 8 OCPUs
Temporarily disabling the archive log backup-and-delete job
Changing PARALLEL from 2 to 8 in the clone script
I executed the clone operation again. This time, the refresh completed successfully without any issues. The overall performance was significantly better compared to the original attempt.
After the refresh was completed successfully, the additional resources were no longer required. To avoid unnecessary costs, I scaled the UAT DBCS back from 8 OCPUs to 1 OCPU.
Although this started as a routine PDB refresh, it became a valuable troubleshooting exercise involving OCI networking, infrastructure sizing, and operational processes.
By configuring Local Peering Gateways, scaling the database from 1 to 8 OCPUs, and reviewing archive log management, the refresh was successfully completed.
This experience highlighted that PDB clone performance issues are not always database-related—network, compute, storage, and operational factors can all have an impact.