When an Oracle PDB hot clone doesn’t work

For nearly a year, a scheduled, scripted PDB hot clone worked flawlessly. Then it ‘suddenly’ failed with this error in the target CDB alert log.

Some explanation:
CDB1900A/PROD is the source cdb/pdb on machine PRODUCTION
CDB1900B/CLONER is the target cdb/pdb on machine CLONE

2021-03-08 15:50:49.421000 +01:00
CREATE PLUGGABLE DATABASE CLONER FROM PROD@clone_pdb file_name_convert=('/ora1/oradata/cdb1900a/prod','/ora1/oradata/cdb1900b/cloner')
2021-03-08 16:14:53.480000 +01:00
Warning: VKTM detected a forward time drift.
Please see the VKTM trace file for more details:
2021-03-08 16:38:16.590000 +01:00
Endian type of dictionary set to little
2021-03-08 16:38:19.135000 +01:00
Pluggable Database CLONER with pdb id - 3 is created as UNUSABLE.
If any errors are encountered before the pdb is marked as NEW,
then the pdb must be dropped
local undo-1, localundoscn-0x0000000000000115
2021-03-08 16:38:20.262000 +01:00
Applying media recovery for pdb-4099 from SCN 501140373 to SCN 501213500
Remote log information: count-1
Media Recovery Start
Serial Media Recovery started
max_pdb is 4
Refresh recovery failed to find archive log for thr-1, scn-501140373
Media Recovery failed with error 65345
ORA-283 signalled during: CREATE PLUGGABLE DATABASE CLONER FROM PROD@clone_pdb file_name_convert=('/ora1/oradata/cdb1900a/prod','/ora1/oradata/cdb1900b/cloner')...

Asking the source database (CDB1900A/PROD) which archived log holds that SCN:

set pages 100 lines 100
col name for a70
col first_change# for 9999999999
col next_change# for 9999999999
alter session set nls_date_format='DD-MON-RRRR HH24:MI:SS';
select name, sequence#, status, FIRST_CHANGE#,NEXT_CHANGE# from v$archived_log where 501140373 between FIRST_CHANGE# and NEXT_CHANGE#;

NAME                                                                    SEQUENCE# S FIRST_CHANGE#
---------------------------------------------------------------------- ---------- - -------------
/arch/cdb1900a/cdb1900a_0001_36136_1036597637.arc                           36136 A     501140362

So when I ask the source PDB PROD which archive log holds the supposedly missing SCN 501140373, the answer is clear: it is there, in the archive log with sequence 36136.


But the clone searches in a parlog file, which begins with a higher sequence: 36143.


So there is an archive log with the right sequence to ‘get’ the right SCN 501140373, but the clone looks in a parlog file which begins 7 sequences higher.
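The mismatch can be modelled in a few lines of Python. This is only a sketch of the symptom, not of how Oracle resolves logs internally. The names `ArchivedLog`, `find_log_for_scn` and `reachable` are hypothetical, and the `next_change` value is a made-up placeholder because the query output above does not show it. The range check is half-open, on the understanding that NEXT_CHANGE# is the first SCN of the following log.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ArchivedLog:
    sequence: int
    first_change: int  # first SCN held in this log (inclusive)
    next_change: int   # first SCN of the following log (exclusive)


def find_log_for_scn(logs: Iterable[ArchivedLog], scn: int) -> Optional[ArchivedLog]:
    """Return the archived log whose SCN range covers scn, or None."""
    for log in logs:
        if log.first_change <= scn < log.next_change:
            return log
    return None


def reachable(logs: Iterable[ArchivedLog], scn: int, first_available_sequence: int) -> bool:
    """True if the log covering scn is at or after the first sequence being searched."""
    log = find_log_for_scn(logs, scn)
    return log is not None and log.sequence >= first_available_sequence


# Sequence and first_change come from the query output above;
# next_change (501150000) is a placeholder for illustration only.
logs = [ArchivedLog(sequence=36136, first_change=501140362, next_change=501150000)]

needed = find_log_for_scn(logs, 501140373)
print(needed.sequence)                                              # 36136
print(reachable(logs, 501140373, first_available_sequence=36143))   # False
print(36143 - needed.sequence)                                      # 7
```

The SCN exists in sequence 36136, but a search that only starts at sequence 36143 can never reach it, which matches the "failed to find archive log" message.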

But is this the cause, or just a symptom? After logging an SR with Oracle, I never got the feeling that this was the cause; it looked like a symptom of something else. To make a long story short, the cause is in the source database. Not the database itself, but its use: its heavy use, as a matter of fact.

I found out that a heavy and important scheduler job in the source database was scheduled at exactly the same time as the clone. This caused quite a load on the server and on the source database itself, most noticeably a high CPU load.

After rescheduling the clone to a different day and time, away from the heavy scheduler jobs, it worked perfectly without the errors.

Now my take is that if you clone a PDB from a source database that is too busy, the SCNs get out of sync during the clone, resulting in a failed clone.
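If that take is right, a scripted clone could check the load before it starts instead of relying on a fixed time slot. Below is a minimal sketch of such a guard. The function name `wait_until_quiet`, the threshold and the retry values are all hypothetical. `os.getloadavg()` only exists on Unix-like systems and reports the load of the machine the script runs on, so for this setup the probe would have to be run on (or replaced by a measurement from) the source server PRODUCTION.

```python
import os
import time
from typing import Callable


def wait_until_quiet(
    max_load: float,
    attempts: int,
    delay_seconds: float,
    get_load: Callable[[], float] = lambda: os.getloadavg()[0],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the load up to `attempts` times; return True once it is <= max_load.

    get_load defaults to the local 1-minute load average (Unix only).
    Both get_load and sleep can be replaced, e.g. by a remote probe or in tests.
    """
    for attempt in range(attempts):
        if get_load() <= max_load:
            return True
        # Only wait if another attempt will follow.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False


# Demonstration with fake load values, so nothing really sleeps.
fake_loads = iter([9.0, 7.5, 1.2])
ok = wait_until_quiet(2.0, attempts=5, delay_seconds=60,
                      get_load=lambda: next(fake_loads),
                      sleep=lambda seconds: None)
print(ok)  # True
```

The clone script would only issue the CREATE PLUGGABLE DATABASE statement when the guard returns True, and otherwise skip the run or raise an alert. This does not prove the theory; it only keeps the clone away from the busiest moments.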

Now, didn’t I begin this story by saying that everything worked fine for almost a year…?

So what changed? Now I really can’t tell, but these are the options:

  • The underlying CPU for the VM of the source database server changed.
  • It has something to do with patching to 19.10 (which happened just before the clone failed).
  • The heavy scheduler job suddenly got heavier.
