ASHes to ash…or when ASH stops working…

Oracle claims one can change the time of the operating system without affecting the database. This is somewhat true: I have done it in the past, when an OS clock was off by about 15 minutes…

This story, however, started when a Linux admin found out the time was off…by about a YEAR! The time of day was correct, the month too, even the day of the month (note: not the day of the week!), so it was not readily noticed.

So we figured: we change the date, and then all will be fine again..

Of course I had just finished gathering my schema statistics..*grmbl*, but we just couldn’t let this be…so with the stroke of a date command, the OS time was reset to the correct time (and year).

And all my statistics were suddenly off by a year..
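
A minimal sketch for seeing how far off the statistics timestamps are, assuming a hypothetical schema named APP_OWNER (not a name from this system). It lists when each table was last analyzed next to the current database time; a negative day count would mean the statistics carry a timestamp in the "future".

```sql
-- APP_OWNER is a hypothetical schema name; substitute your own.
-- Shows each table's last statistics timestamp next to the current database time.
select table_name,
       last_analyzed,
       sysdate                        as db_now,
       round(sysdate - last_analyzed) as days_since_analyzed
from   dba_tab_statistics
where  owner = 'APP_OWNER'
and    object_type = 'TABLE'
order  by last_analyzed;
```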

It didn’t take long before the first errors started to appear:

ORA-00600: internal error code, arguments: [kkopmCheckSmbUpdate:2], [], [], [], [], [], [], [], [], [], [], []

This roughly translates to Oracle saying: “Your baseline is way out of date, screw it, I’m bailing out of this SQL statement”. I did mention we were working on a live database system, did I not?

Before long, managers started to look for their riot guns, and the DBA. In that order. Since I had to decipher the ORA-600 with the kkopmCheckSmbUpdate argument with the aid of Google Translate from some sort of Chinese, I figured: “It’s either me or the users: I’m getting me some new statistics, right now..”

This seemed to help, and the rain of ORA-00600 messages started to dribble to a halt, but the users were of course complaining about the drop in performance (and I didn’t even gather them in parallel!). But I was curious how bad it was, so I started my trusty ASH viewer…

<shameless plug> http://sourceforge.net/projects/ashv/ </shameless plug>

and saw NOTHING! First I doubted the tool, but other databases on other servers were working just fine..so I dived into the database…and all the ASH tables were empty..

AWR reports? Gone.

This could mean a couple of things, for example that the MMON and MMNL OS processes had bailed out. Checking the OS: nope. The processes were still there. So somehow (gee..what could have caused this!?) they had stopped populating the ASH tables, including V$ACTIVE_SESSION_HISTORY and its relatives.
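
A quick way to confirm that ASH and AWR have gone quiet, without any viewer, is to compare the newest sample and snapshot with the current database time. This is only a sketch using the standard V$ACTIVE_SESSION_HISTORY and DBA_HIST_SNAPSHOT views; run it as a privileged user.

```sql
-- Newest in-memory ASH sample versus the current database time.
-- A NULL or long-stale LAST_ASH_SAMPLE suggests sampling has stopped.
select max(sample_time) as last_ash_sample,
       systimestamp     as db_now
from   v$active_session_history;

-- Newest AWR snapshot; normally one is written per snapshot interval.
select max(end_interval_time) as last_awr_snapshot
from   dba_hist_snapshot;
```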

After some heavy googling and consulting with my personal DBA Goddess (yes, she’s real!) I decided to kill the MMON and MMNL processes on the OS.

Don’t.

It does not help, and contrary to popular belief, they do not start up again by themselves. Why? Since the database and the OS processes were so out of step, the only explanation was that the database never got the message that the processes were gone..time to check:

{all commands below are executed as sys/sysdba}

SQL> select p.pid,b.name,p.program,b.description from v$bgprocess b, v$process p
where b.paddr(+) = p.addr and p.background = '1'
order by name;

       PID NAME  PROGRAM                                          DESCRIPTION
---------- ----- ------------------------------------------------ ----------------------------------------------------------------
         9 ACMS  oracle@bedc-odb1001 (ACMS)                       Atomic Controlfile to Memory Server
        34 ARC0  oracle@bedc-odb1001 (ARC0)                       Archival Process 0
        35 ARC1  oracle@bedc-odb1001 (ARC1)                       Archival Process 1
        36 ARC2  oracle@bedc-odb1001 (ARC2)                       Archival Process 2
        37 ARC3  oracle@bedc-odb1001 (ARC3)                       Archival Process 3
        25 ASMB  oracle@bedc-odb1001 (ASMB)                       ASM Background
        21 CKPT  oracle@bedc-odb1001 (CKPT)                       checkpoint
         7 DBRM  oracle@bedc-odb1001 (DBRM)                       DataBase Resource Manager
        18 DBW0  oracle@bedc-odb1001 (DBW0)                       db writer process 0
        19 DBW1  oracle@bedc-odb1001 (DBW1)                       db writer process 1
        10 DIA0  oracle@bedc-odb1001 (DIA0)                       diagnosibility process 0
         6 DIAG  oracle@bedc-odb1001 (DIAG)                       diagnosibility process
         5 GEN0  oracle@bedc-odb1001 (GEN0)                       generic0
        38 GTX0  oracle@bedc-odb1001 (GTX0)                       Global Txn process 0
        30 LCK0  oracle@bedc-odb1001 (LCK0)                       Lock Process 0
        20 LGWR  oracle@bedc-odb1001 (LGWR)                       Redo etc.
        12 LMD0  oracle@bedc-odb1001 (LMD0)                       global enqueue service daemon 0
        16 LMHB  oracle@bedc-odb1001 (LMHB)                       lm heartbeat monitor
        11 LMON  oracle@bedc-odb1001 (LMON)                       global enqueue service monitor
        13 LMS0  oracle@bedc-odb1001 (LMS0)                       global cache service process 0
        14 LMS1  oracle@bedc-odb1001 (LMS1)                       global cache service process 1
        28 MARK  oracle@bedc-odb1001 (MARK)                       mark AU for resync koordinator
        17 MMAN  oracle@bedc-odb1001 (MMAN)                       Memory Manager
        27 MMNL  oracle@bedc-odb1001 (MMNL)                       Manageability Monitor Process 2
        26 MMON  oracle@bedc-odb1001 (MMON)                       Manageability Monitor Process
         8 PING  oracle@bedc-odb1001 (PING)                       interconnect latency measurement
         2 PMON  oracle@bedc-odb1001 (PMON)                       process cleanup
         3 PSP0  oracle@bedc-odb1001 (PSP0)                       process spawner 0
        41 QMNC  oracle@bedc-odb1001 (QMNC)                       AQ Coordinator
        24 RBAL  oracle@bedc-odb1001 (RBAL)                       ASM Rebalance master
        39 RCBG  oracle@bedc-odb1001 (RCBG)                       Result Cache: Background
        23 RECO  oracle@bedc-odb1001 (RECO)                       distributed recovery
        15 RMS0  oracle@bedc-odb1001 (RMS0)                       rac management server
        31 RSMN  oracle@bedc-odb1001 (RSMN)                       Remote Slave Monitor
        29 SMCO  oracle@bedc-odb1001 (SMCO)                       Space Manager Process
        22 SMON  oracle@bedc-odb1001 (SMON)                       System Monitor Process
        56 VKRM  oracle@bedc-odb1001 (O000)                       Virtual sKeduler for Resource Manager
         4 VKTM  oracle@bedc-odb1001 (VKTM)                       Virtual Keeper of TiMe process
        63       oracle@bedc-odb1001 (O002)
        50       oracle@bedc-odb1001 (Q001)
        49       oracle@bedc-odb1001 (Q000)
        55       oracle@bedc-odb1001 (GCR0)
        58       oracle@bedc-odb1001 (O001)

43 rows selected.

Looking at the list we find two processes still running..

27 MMNL  oracle@bedc-odb1001 (MMNL)
26 MMON  oracle@bedc-odb1001 (MMON)

Were these not supposed to be…?? Indeed. Killed. Yet the database obviously had not noticed. So we moved on to another database, where we had not killed the OS processes, and checked again. Yup. This database also listed the processes. This time I didn’t kill them on the OS, but decided to prod them a bit. Enter: oradebug. This undocumented command at the SQL prompt allows us, among other things, to wake up processes..We had already found the PIDs of the processes we needed to get working again, so we issued the following commands:


SQL> oradebug wakeup 26
Statement processed.
SQL> oradebug wakeup 27
Statement processed.
SQL>

After a few moments, we got a response in the ASH Viewer!
It actually worked to get these processes kickstarted, WITHOUT restarting the database!

And it gets even better: remember the database where the MMON and MMNL processes were killed on the OS? Waking up PMON (PID 2 in the list above) with the command

SQL> oradebug wakeup 2
Statement processed.
SQL>

restarted the killed processes! This explains why some sites state that killing the OS processes will work..but that only holds when the database can “see” them disappear. My recommendation is to try the “oradebug wakeup” method before resorting to such drastic measures.
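
To find the PIDs to feed to oradebug wakeup without scanning the full process list, a narrower variant of the earlier query may help. It is a sketch under the same assumptions (connected as sys/sysdba). PID is the Oracle process number that oradebug wakeup expects, and SPID is the OS process id you would see in ps.

```sql
-- Look up only the processes of interest.
-- PID  = Oracle process number (the argument to "oradebug wakeup")
-- SPID = operating system process id
select p.pid, p.spid, b.name, b.description
from   v$bgprocess b, v$process p
where  b.paddr = p.addr
and    b.name in ('MMON', 'MMNL', 'PMON')
order  by b.name;
```

If MMON or MMNL is missing from the result, the database has noticed that the process is gone, which is a different situation from the one described above.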

Share this nugget of information, it’s a rare bit of knowledge not found in the books..

I hope it saves you some stress, and ’till next time!

About GemsOfProgramming

Being a previously enthusiastic Java programmer, I rolled into the Oracle Database Administration world. It turned out I have a knack for this, and since approx. 2000 I've been a full-time DBA. My experience touches a lot of Oracle products like Forms and Reports 9/10, JDAPI, Application Server, Weblogic Fusion and of course: Oracle Enterprise Databases, JavaFX, Swing and other Java components.