Recently, we had a situation where we continuously got ORA-01555 on one of our active data guard.
ORA-00604: error occurred at recursive SQL level 3
ORA-01555: snapshot too old: rollback segment number 1176 with name "_SYSSMU1176_1510088119$" too small
Below are the steps we carried out to resolve the issue.
ORA-00604: error occurred at recursive SQL level 3
ORA-01555: snapshot too old: rollback segment number 1176 with name "_SYSSMU1176_1510088119$" too small
Environment Setup -
2 Node RAC Primary database
2 Node RAC Active Data guard
We tried to check what was causing issue related to undo tablespace. we checked free space %, undo_retention parameter but everything looks fine.
TABLESPACE TOTAL SIZE FREE SPACE % FREE
------------------------------ ---------- ---------- ----------
UNDOTBS01 92160 87723 95
SQL> show parameter undo
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
undo_management string AUTO
undo_retention integer 6500
undo_tablespace string UNDOTBS01
There were no undo blocks used in database at the time of issue.
SQL> SELECT undoblks/((end_time-begin_time)*86400) "Peak Undo Block Generation" FROM v$undostat WHERE undoblks=(SELECT MAX(undoblks) FROM v$undostat);
Peak Undo Block Generation
--------------------------
0
SQL> SELECT (UR * (UPS * DBS)) AS "Bytes"
FROM (select max(tuned_undoretention) AS UR from v$undostat),
2 3 (SELECT undoblks/((end_time-begin_time)*86400) AS UPS
4 FROM v$undostat
5 WHERE undoblks = (SELECT MAX(undoblks) FROM v$undostat)),
(SELECT block_size AS DBS
6 7 FROM dba_tablespaces
8 WHERE tablespace_name = (SELECT UPPER(value) FROM v$parameter WHERE name = 'undo_tablespace'));
Bytes
----------
0
SQL> SELECT TO_CHAR(BEGIN_TIME, 'MM/DD/YYYY HH24:MI:SS') BEGIN_TIME,
TO_CHAR(END_TIME, 'MM/DD/YYYY HH24:MI:SS') END_TIME,
UNDOTSN, UNDOBLKS, TXNCOUNT, MAXCONCURRENCY AS "MAXCON",
MAXQUERYLEN, TUNED_UNDORETENTION
FROM v$UNDOSTAT; 2 3 4 5
BEGIN_TIME END_TIME UNDOTSN UNDOBLKS TXNCOUNT MAXCON MAXQUERYLEN TUNED_UNDORETENTION
------------------- ------------------- ---------- ---------- ---------- ---------- ----------- -------------------
07/24/2017 10:39:42 07/31/2017 07:48:39 2147483647 0 0 0 0 6500
SQL> SELECT A.SID, A.USERNAME, B.XIDUSN, B.USED_UREC, B.USED_UBLK
FROM V$SESSION A, V$TRANSACTION B
WHERE A.SADDR=B.SES_ADDR; 2 3
FROM V$SESSION A, V$TRANSACTION B
*
ERROR at line 2:
ORA-00604: error occurred at recursive SQL level 3
ORA-01555: snapshot too old: rollback segment number 1176 with name "_SYSSMU1176_1510088119$" too small
Even data files were in auto extend
SQL> SELECT FILE_ID, AUTOEXTENSIBLE FROM DBA_DATA_FILES WHERE
TABLESPACE_NAME='&UNDOTBS'; 2
Enter value for undotbs: UNDOTBS01
old 2: TABLESPACE_NAME='&UNDOTBS'
new 2: TABLESPACE_NAME='UNDOTBS01'
FILE_ID AUT
---------- ---
1164 YES
1166 YES
1167 YES
We observed active undo extents but numbers were very small.
SQL> SELECT DISTINCT STATUS, SUM(BYTES), COUNT(*)
FROM DBA_UNDO_EXTENTS GROUP BY STATUS; 2
STATUS SUM(BYTES) COUNT(*)
------------ ---------- ----------
EXPIRED 4843438080 10521
UNEXPIRED 4809162752 5178
ACTIVE 3482451968 215
SQL> select owner, segment_name, tablespace_name, status from dba_rollback_segs where tablespace_name='UNDOTBS01';
select owner, segment_name, tablespace_name, status from dba_rollback_segs where tablespace_name='UNDOTBS01'
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 1205 with name "_SYSSMU1205_975967709$" too small
Below are the steps we carried out to resolve the issue.
We had 2 node RAC active data guard setup. We decided to take a chance and try to bounce one of the instance from which recovery was not performed (recovery was done using current log files). Before doing restart, we check for sync between primary and standby db (both were in sync), also we check for ORA-01555 on primary db (did not found ORA-01555).
SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
2 FROM
3 (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
4 (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
5 WHERE ARCH.THREAD# = APPL.THREAD# ORDER BY 1;
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 103675 103675 0
2 100310 100310 0
SQL>
Step 1 - Shut down instance on node 1.
SQL> shut immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
Step 2 - Start up instance on node 1 to check if error has gone.
SQL> startup;
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 5.3463E+10 bytes
Fixed Size 2194184 bytes
Variable Size 1.3959E+10 bytes
Database Buffers 3.9460E+10 bytes
Redo Buffers 42532864 bytes
Database mounted.
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-00704: bootstrap process failure
ORA-00604: error occurred at recursive SQL level 1
ORA-01555: snapshot too old: rollback segment number 1198 with name
"_SYSSMU1198_1178130323$" too small
Process ID: 9503
Session ID: 1892 Serial number: 3
We shut down the instance on node 1 and tried to startup instance manually but unfortunately we got above error. We did multiple restarts but we got same error. After searching on google I found few blogs related to same errors.
There was slight difference between above two scenarios, we did not performed any recovery nor upgraded database. We thought may be recovery running now instance 2 is blocking instance 1 from coming up. So we decided to cancel recovery on node 2.
Step 3 - Cancel recovery from instance on node 2 but it hanged.
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
Step 4 - Shut down instance on node 2.
As we were unable to cancel recovery as it was hanged, we decided to shut down instance to release recovery process.
SQL> shut immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL>
Step 5 - Shutdown and Startup instance on node 1.
Then we tried to shutdown instance on node 1 and but as it was not properly opened so we killed pmon process. Data guard bring back the instance soon we killed pmon process.
-bash-3.2$ sqlplus / as sysdba
Connected.
SQL> !ps -ef| grep pmon
oracle 12149 1 0 09:33:24 ? 0:01 ora_pmon_TESTSTB1
oracle 10145 10143 0 09:46:56 pts/7 0:00 grep pmon
oracle 10143 7221 0 09:46:55 pts/7 0:00 /usr/bin/bash -c ps -ef| grep pmon
oracle 11363 1 0 08:55:05 ? 0:01 asm_pmon_+ASM1
SQL> select name, open_mode from gv$database;
select name, open_mode from gv$database
*
ERROR at line 1:
ORA-01012: not logged on
Process ID: 0
Session ID: 0 Serial number: 0
SQL> exit
Disconnected
-bash-3.2$ sqlplus / as sysdba
Connected.
SQL> shut immediate;
ORA-24324: service handle not initialized
ORA-24323: value not allowed
ORA-01089: immediate shutdown in progress - no operations are permitted
SQL>
SQL>
SQL> exit
Disconnected
-bash-3.2$
-bash-3.2$ kill -9 12149
-bash-3.2$
-bash-3.2$
-bash-3.2$ ps -ef| grep pmon
oracle 12395 7498 0 09:48:41 pts/7 0:00 grep pmon
oracle 11363 1 0 08:55:05 ? 0:01 asm_pmon_+ASM1
-bash-3.2$
Step 6 - Startup instance on node 1.
-bash-3.2$ sqlplus / as sysdba
Connected to an idle instance.
SQL> startup mount;
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORA-10997: another startup/shutdown operation of this instance inprogress
ORA-09968: unable to lock file
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 12338
SQL>
SQL> exit
Disconnected
-bash-3.2$
-bash-3.2$ ps -ef| grep pmon
oracle 12440 1 0 09:48:50 ? 0:00 ora_pmon_TESTSTB1
oracle 12767 7498 0 09:49:09 pts/7 0:00 grep pmon
oracle 11363 1 0 08:55:05 ? 0:01 asm_pmon_+ASM1
-bash-3.2$
-bash-3.2$ sqlplus / as sysdba
Connected to:
OPEN_MODE NAME DB_UNIQUE_NAME DATABASE_ROLE
-------------------- --------- ------------------------------ ----------------
READ ONLY WITH APPLY TEST TESTSTB PHYSICAL STANDBY
SQL> select process,status from v$managed_standby;
PROCESS STATUS
--------- ------------
ARCH CONNECTED
ARCH CONNECTED
ARCH CONNECTED
ARCH CLOSING
ARCH CLOSING
ARCH CLOSING
ARCH CONNECTED
ARCH CONNECTED
ARCH CONNECTED
ARCH CONNECTED
RFS IDLE
PROCESS STATUS
--------- ------------
RFS IDLE
RFS IDLE
RFS IDLE
RFS IDLE
RFS RECEIVING
RFS IDLE
MRP0 WAIT_FOR_LOG
18 rows selected.
Step 7 - Startup instance on node 2.
As instance on node 1 was up finally with "read only with apply" status. so we started instance on node 2 and it also came up without any issue.
Step 8 - Check for sync between primary and standby db.
SQL> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
2 FROM
3 (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH,
4 (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL
WHERE ARCH.THREAD# = APPL.THREAD# ORDER BY 1; 5
Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
1 103729 103729 0
2 100361 100361 0
No comments:
Post a Comment