ORA-01555: snapshot too old along with ORA-00704: bootstrap process failure and ORA-00604: ORA-01555: snapshot too old along with ORA-00704: bootstrap process failure and ORA-00604

Recently, we had a situation where we continuously got ORA-01555 on one of our active data guard.

ORA-00604: error occurred at recursive SQL level 3

ORA-01555: snapshot too old: rollback segment number 1176 with name "_SYSSMU1176_1510088119$" too small

Environment Setup -

2 Node RAC Primary database

2 Node RAC Active Data guard

We tried to check what was causing issue related to undo tablespace. we checked free space %, undo_retention parameter but everything looks fine.

TABLESPACE TOTAL SIZE FREE SPACE % FREE

------------------------------ ---------- ---------- ----------

UNDOTBS01 92160 87723 95

SQL> show parameter undo

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

undo_management string AUTO

undo_retention integer 6500

undo_tablespace string UNDOTBS01

There were no undo blocks used in database at the time of issue.

SQL> SELECT undoblks/((end_time-begin_time)*86400) "Peak Undo Block Generation" FROM v$undostat WHERE undoblks=(SELECT MAX(undoblks) FROM v$undostat);

Peak Undo Block Generation

--------------------------

SQL> SELECT (UR * (UPS * DBS)) AS "Bytes"

FROM (select max(tuned_undoretention) AS UR from v$undostat),

2 3 (SELECT undoblks/((end_time-begin_time)*86400) AS UPS

4 FROM v$undostat

5 WHERE undoblks = (SELECT MAX(undoblks) FROM v$undostat)),

(SELECT block_size AS DBS

6 7 FROM dba_tablespaces

8 WHERE tablespace_name = (SELECT UPPER(value) FROM v$parameter WHERE name = 'undo_tablespace'));

Bytes

----------

SQL> SELECT TO_CHAR(BEGIN_TIME, 'MM/DD/YYYY HH24:MI:SS') BEGIN_TIME,

TO_CHAR(END_TIME, 'MM/DD/YYYY HH24:MI:SS') END_TIME,

UNDOTSN, UNDOBLKS, TXNCOUNT, MAXCONCURRENCY AS "MAXCON",

MAXQUERYLEN, TUNED_UNDORETENTION

FROM v$UNDOSTAT; 2 3 4 5

BEGIN_TIME END_TIME UNDOTSN UNDOBLKS TXNCOUNT MAXCON MAXQUERYLEN TUNED_UNDORETENTION

------------------- ------------------- ---------- ---------- ---------- ---------- ----------- -------------------

07/24/2017 10:39:42 07/31/2017 07:48:39 2147483647 0 0 0 0 6500

SQL> SELECT A.SID, A.USERNAME, B.XIDUSN, B.USED_UREC, B.USED_UBLK

FROM V$SESSION A, V$TRANSACTION B

WHERE A.SADDR=B.SES_ADDR; 2 3

FROM V$SESSION A, V$TRANSACTION B

ERROR at line 2:

ORA-00604: error occurred at recursive SQL level 3

ORA-01555: snapshot too old: rollback segment number 1176 with name "_SYSSMU1176_1510088119$" too small

Even data files were in auto extend

SQL> SELECT FILE_ID, AUTOEXTENSIBLE FROM DBA_DATA_FILES WHERE

TABLESPACE_NAME='&UNDOTBS'; 2

Enter value for undotbs: UNDOTBS01

old 2: TABLESPACE_NAME='&UNDOTBS'

new 2: TABLESPACE_NAME='UNDOTBS01'

FILE_ID AUT

---------- ---

1164 YES

1166 YES

1167 YES

We observed active undo extents but numbers were very small.

SQL> SELECT DISTINCT STATUS, SUM(BYTES), COUNT(*)

FROM DBA_UNDO_EXTENTS GROUP BY STATUS; 2

STATUS SUM(BYTES) COUNT(*)

------------ ---------- ----------

EXPIRED 4843438080 10521

UNEXPIRED 4809162752 5178

ACTIVE 3482451968 215

SQL> select owner, segment_name, tablespace_name, status from dba_rollback_segs where tablespace_name='UNDOTBS01';

select owner, segment_name, tablespace_name, status from dba_rollback_segs where tablespace_name='UNDOTBS01'

ERROR at line 1:

ORA-00604: error occurred at recursive SQL level 1

ORA-01555: snapshot too old: rollback segment number 1205 with name "_SYSSMU1205_975967709$" too small

Below are the steps we carried out to resolve the issue.

We had 2 node RAC active data guard setup. We decided to take a chance and try to bounce one of the instance from which recovery was not performed (recovery was done using current log files). Before doing restart, we check for sync between primary and standby db (both were in sync), also we check for ORA-01555 on primary db (did not found ORA-01555).

SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"

2 FROM

3 (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,

4 (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL

5 WHERE ARCH.THREAD# = APPL.THREAD# ORDER BY 1;

Thread Last Sequence Received Last Sequence Applied Difference

---------- ---------------------- --------------------- ----------

1 103675 103675 0

2 100310 100310 0

SQL>

Step 1 - Shut down instance on node 1.

SQL> shut immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL>

Step 2 - Start up instance on node 1 to check if error has gone.

SQL> startup;

ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance

ORACLE instance started.

Total System Global Area 5.3463E+10 bytes

Fixed Size 2194184 bytes

Variable Size 1.3959E+10 bytes

Database Buffers 3.9460E+10 bytes

Redo Buffers 42532864 bytes

Database mounted.

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 1

ORA-01555: snapshot too old: rollback segment number 1198 with name

"_SYSSMU1198_1178130323$" too small

Process ID: 9503

Session ID: 1892 Serial number: 3

We shut down the instance on node 1 and tried to startup instance manually but unfortunately we got above error. We did multiple restarts but we got same error. After searching on google I found few blogs related to same errors.

http://manishnashikkar.blogspot.in/2015/02/we-had-situation-where-we-restored.html

https://deryaoktay.wordpress.com/2012/10/13/571/

There was slight difference between above two scenarios, we did not performed any recovery nor upgraded database. We thought may be recovery running now instance 2 is blocking instance 1 from coming up. So we decided to cancel recovery on node 2.

Step 3 - Cancel recovery from instance on node 2 but it hanged.

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Step 4 - Shut down instance on node 2.

As we were unable to cancel recovery as it was hanged, we decided to shut down instance to release recovery process.

SQL> shut immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

SQL>

Step 5 - Shutdown and Startup instance on node 1.

Then we tried to shutdown instance on node 1 and but as it was not properly opened so we killed pmon process. Data guard bring back the instance soon we killed pmon process.

-bash-3.2$ sqlplus / as sysdba

Connected.

SQL> !ps -ef| grep pmon

oracle 12149 1 0 09:33:24 ? 0:01 ora_pmon_TESTSTB1

oracle 10145 10143 0 09:46:56 pts/7 0:00 grep pmon

oracle 10143 7221 0 09:46:55 pts/7 0:00 /usr/bin/bash -c ps -ef| grep pmon

oracle 11363 1 0 08:55:05 ? 0:01 asm_pmon_+ASM1

SQL> select name, open_mode from gv$database;

select name, open_mode from gv$database

ERROR at line 1:

ORA-01012: not logged on

Process ID: 0

Session ID: 0 Serial number: 0

SQL> exit

Disconnected

-bash-3.2$ sqlplus / as sysdba

Connected.

SQL> shut immediate;

ORA-24324: service handle not initialized

ORA-24323: value not allowed

ORA-01089: immediate shutdown in progress - no operations are permitted

SQL>

SQL> exit

Disconnected

-bash-3.2$

-bash-3.2$ kill -9 12149

-bash-3.2$

-bash-3.2$ ps -ef| grep pmon

oracle 12395 7498 0 09:48:41 pts/7 0:00 grep pmon

oracle 11363 1 0 08:55:05 ? 0:01 asm_pmon_+ASM1

-bash-3.2$

Step 6 - Startup instance on node 1.

-bash-3.2$ sqlplus / as sysdba

Connected to an idle instance.

SQL> startup mount;

ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance

ORA-10997: another startup/shutdown operation of this instance inprogress

ORA-09968: unable to lock file

SVR4 Error: 11: Resource temporarily unavailable

Additional information: 12338

SQL>

SQL> exit

Disconnected

-bash-3.2$

-bash-3.2$ ps -ef| grep pmon

oracle 12440 1 0 09:48:50 ? 0:00 ora_pmon_TESTSTB1

oracle 12767 7498 0 09:49:09 pts/7 0:00 grep pmon

oracle 11363 1 0 08:55:05 ? 0:01 asm_pmon_+ASM1

-bash-3.2$

-bash-3.2$ sqlplus / as sysdba

Connected to:

OPEN_MODE NAME DB_UNIQUE_NAME DATABASE_ROLE

-------------------- --------- ------------------------------ ----------------

READ ONLY WITH APPLY TEST TESTSTB PHYSICAL STANDBY

SQL> select process,status from v$managed_standby;

PROCESS STATUS

--------- ------------

ARCH CONNECTED

ARCH CLOSING

ARCH CONNECTED

RFS IDLE

PROCESS STATUS

--------- ------------

RFS IDLE

RFS RECEIVING

RFS IDLE

MRP0 WAIT_FOR_LOG

18 rows selected.

Step 7 - Startup instance on node 2.

As instance on node 1 was up finally with "read only with apply" status. so we started instance on node 2 and it also came up without any issue.

Step 8 - Check for sync between primary and standby db.

Switch few log files on both primary instances and check for sync on standby instance.

SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"

2 FROM

3 (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,

4 (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL

WHERE ARCH.THREAD# = APPL.THREAD# ORDER BY 1; 5

Thread Last Sequence Received Last Sequence Applied Difference

---------- ---------------------- --------------------- ----------

1 103729 103729 0

2 100361 100361 0

ORA-01555: snapshot too old along with ORA-00704: bootstrap process failure and ORA-00604

Tuesday, 1 August 2017

ORA-01555: snapshot too old along with ORA-00704: bootstrap process failure and ORA-00604

No comments:

Post a Comment