Quantcast
Channel: advait – Persistent Storage Solutions
Viewing all articles
Browse latest Browse all 43

Library Cache: Mutex X – Bug 20879889 – Fixed in 11.2.0.4

$
0
0

I recently encountered a bug related to MView log causing very high library cache: mutex x wait events.

I will brief about the debugging steps I tried and fix for the same.

Few things to note before I proceed:-

  • We observed huge wait events for library cache: mutex X whenever we performed flip to standby or when DB was bounced. I am implying that library cache was cold and didn’t had required cursor information and object handles.
  • Load on the database was very high. This includes very high number of sessions connected to database and high number of DMLs performed on same table by many sessions.
  • Table on which DMLs are performed is having MLOG created on that based on primary key. This is required as these changes and data needs to flow to downstream databases via MView (fast refresh). So around 10 downstream sites were registered on this MLOG.

Root Cause Analysis:

After looking at the wait events, we immediately started digging into the root cause. We wanted to understand the bottleneck which is causing these wait events and remove whatever it takes to reduce these wait events.

I started with checking top 10 wait events in last 20 mins from v$active_session_history


SQL>select * from (select event, count(1) from v$active_session_history
2 where sample_time > (sysdate - 20/1440) group by event order by 2 desc) where rownum < 10;

EVENT                                    COUNT(1)
---------------------------------------- ----------
library cache: mutex X                        50943
checkpoint completed                          15170
read by other session                          5487
row cache lock                                 4205
log file sync                                  3137
flashback buf free by RVWR                     1815
db file sequential read                        1675
log file switch completion                     1611
cursor: pin S wait on X                        1516

9 rows selected.

Clearly library cache: mutex X was way higher.

We can check who is causing library cache: mutex X by checking P1 and P2 of that wait event


SQL>select event, p1, count(1) from v$active_session_history where sample_time > (sysdate - 20/1440) and event = 'library cache: mutex X' group by event, p1 order by 3;

EVENT                                    P1         COUNT(1)
---------------------------------------- ---------- ----------
library cache: mutex X                    421399181          1
library cache: mutex X                   3842104349          1
library cache: mutex X                   3477606468          1
library cache: mutex X                   2432877226          1
library cache: mutex X                    955484820          2
library cache: mutex X                        37005         25
library cache: mutex X                   1412465886        297
library cache: mutex X                   2417922189      50615

8 rows selected.

As you can see max wait events are caused by P1 = 2417922189

P1 is idn – can be used for finding the cursor related to mutex

For library cache: mutex X

  1. P1 represents library cache hash bucket number (if idn <= 131072)
  2. P1 represents hash value of the library cache object under protection (if idn > 131072)

In our case hash value was 2417922189. So it represents library cache object.

We can check if this hash value belongs to a cursor (SQL) using v$sql


SQL>select sql_id from v$sql where hash_value = 2417922189;

SQL_ID
-------------
3b7aa6f81x44d

SQL>@sql
Enter SQL_ID:- 3b7aa6f81x44d
old 3: where sql_id = '&SQL_ID'
new 3: where sql_id = '3b7aa6f81x44d'

INSERT /*+ IDX(0) */ INTO "BOOKER"."MLOG$_FULFILLMENT_DEMAND_I"
(dmltype$$,old_new$$,snaptime$$,change_vector$$,xid$$,"WAREHOUSE
_ID","CO_WAREHOUSE_ITEM_ID") VALUES (:d,:o,to_date('4000-01-01:0
0:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:x,:1,:2)

As you can see this is DML on MLOG table. So clearly MLOG seems to be the bottleneck.

Immediate Action:

Wait events were making every thing stand still and nothing was getting processed. Since database availability was critical and we couldn’t afford to loose any more time because of wait events, our immediate action was to drop MLOG.

But since DB was having thousands of sessions and all stuck in library cache: mutex X, we cannot even get a lock to drop MLOG (not even with ddl_lock_timeout). Killing a session was not helping as they were keep coming back in no time.

So we decided to take quick bounce of DB (by keeping listeners down), dropped this MLOG and started listeners again.

This has fixed the issue and there were no wait events.

On down side, all downstream sites had to do complete refresh of MView followed by fast refresh as they got deregistered from MLOG. But it was OK as size of master table was not very big.

Patch Fix:

We also tried to investigate why an MLOG can cause this issue ? But were not able to get any insight.

MLOG is a table where primary key of changed records in main table will get inserted and when downstream table refreshes the data, these records will get deleted. So we don’t expect much higher size of MLOG. Unfortunately this is case it was 7.7GB (bigger than master table size).

Because the size of MLOG was too high, whenever a downstream database is doing fast refresh it has to update SNAPTIME$$ column in MLOG table which tells MLOG which rows are refreshed by downstream DB at what time. This update might be taking more time and holding enqueue locks. But inserts should not be blocked because of this update as it inserts a new record and doesn’t collied at enqueue level. Also, we were seeing these library cache wait events just after bounce or failover.

This is identified as a bug and Oracle has provided following patch as a fix for this issue

Patch 20879889: INSERT INTO MV LOG LEAVING TOO MANY OPEN CURSORS AFTER UPGR TO 11.2.0.4

Make sure you have this patch applied if you are using MLOG in your database.

Hope this helps !!

Reference:

https://support.oracle.com/epmos/faces/PatchDetail?requestId=19179589&_afrLoop=14055698364071&patchId=20879889&_afrWindowMode=0&_adf.ctrl-state=xqyu7jtft_77

 

 


Filed under: Oracle Database 11g, Performance Tuninig Tagged: library cache: mutex x, Patch 20879889, v$active_session_history

Viewing all articles
Browse latest Browse all 43

Trending Articles