When I left the OEM 12c installation a couple of days ago it was humming along nicely; I was even thinking we were friends! But this morning, when I wanted to get to know my newfound friend better, the OMS landing page threw me a NASTY 404! What the ??
At first I thought: the whole system must be down.. but then who threw me the 404 page not found error? Hmm.. So the WLS (WebLogic Server) must still be up.. which basically means the OMS (Oracle Management Service) must be down. *Sniff*. I must have hurt its feelings..
Oh well. Fix it first, amends later.
So: since the OMS and the OMR (Oracle Management Repository) are separate components in OEM 12c, I thought I’d take a quick peek at the underlying system (aka: the database), and lo and behold:
[oracle@OEMSRV ~]$ sqlplus / as sysdba
SQL*Plus: Release 220.127.116.11.0 Production on Sun Jan 12 10:46:01 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
ERROR:
ORA-12162: TNS:net service name is incorrectly specified
Enter user-name:
Ah, let’s check the tnsping:
[oracle@OEMSRV ~]$ tnsping repos12
TNS Ping Utility for Linux: Version 18.104.22.168.0 - Production on 12-JAN-2014 10:47:44
Copyright (c) 1997, 2011, Oracle. All rights reserved.
Used parameter files:
/oracle/base/db/dbHome1/network/admin/sqlnet.ora
Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = OEMSRV)(PORT = 1521))) (CONNECT_DATA = (SERVICE_NAME = repos12)))
OK (0 msec)
Hmm.. OK, so the name resolution part is working. Let’s try another way of logging into the database:
[oracle@OEMSRV ~]$ sqlplus system@repos12
SQL*Plus: Release 22.214.171.124.0 Production on Sun Jan 12 10:37:45 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter password:
ERROR:
ORA-00257: archiver error. Connect internal only, until freed.
Ah! This is something else again. The error makes sense, but let’s double check this anyway:
[oracle@OEMSRV ~]$ oerr ora 00257
00257, 00000, "archiver error. Connect internal only, until freed."
// *Cause: The archiver process received an error while trying to archive
// a redo log. If the problem is not resolved soon, the database
// will stop executing transactions. The most likely cause of this
// message is the destination device is out of space to store the
// redo log file.
// *Action: Check archiver trace file for a detailed description
// of the problem. Also verify that the
// device specified in the initialization parameter
// ARCHIVE_LOG_DEST is set up properly for archiving.
So far we know:
-> OMS stopped working because of its dependency on the OMR
-> OMR stopped working because the archiver is not able to write its files.
Side Note: Inspector Clouseau is really missing out on a new career; detecting is WAY more fun with Oracle..
But to get back on track… I was fairly sure I had enough disk space to accommodate a whole farm of clients, but let’s check on the storage space we have anyway:
[oracle@OEMSRV ~]$ df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00  607G   41G  535G   8% /
/dev/sda1                         99M   22M   72M  24% /boot
tmpfs                            3.9G  524M  3.4G  14% /dev/shm
Indeed. Ample space available.
So this means the FRA (Fast Recovery Area, the archiver destination) size setting in the database must be off, probably set too small.
Next stop: Alert log!
The alert log (yes, still looking at the database, for those wondering which one of the thousands of log files we need to look at..) shows me the following:
ARC0: Error 19809 Creating archive log file to '/oracle/base/fast_recovery_area/REPOS12/archivelog/2014_01_12/o1_mf_1_101_%u_.arc'
Errors in file /oracle/base/diag/rdbms/repos12/repos12/trace/repos12_arc1_9247.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 4322230272 bytes is 100.00% used, and has 0 remaining bytes available.
************************************************************************
You have following choices to free up space from recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard, then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating system command was used to delete files, then use RMAN CROSSCHECK and DELETE EXPIRED commands.
************************************************************************
ARC1: Error 19809 Creating archive log file to '/oracle/base/fast_recovery_area/REPOS12/archivelog/2014_01_12/o1_mf_1_102_%u_.arc'
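The byte count in that ORA-19815 warning is the FRA quota, not the size of the disk. Here is a minimal sketch of the arithmetic to put it next to the 535G that df reported as free; the helper name `bytes_to_mib` is my own invention for illustration, not an Oracle tool.

```shell
#!/bin/sh
# Convert a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
# bytes_to_mib is a hypothetical helper name used only for this sketch.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# db_recovery_file_dest_size as reported in the alert log
bytes_to_mib 4322230272
```

That works out to 4122 MiB, so roughly 4 GB of quota on a filesystem with hundreds of GB free.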
Ok, a long shot: let’s see if we can use RMAN to alter the retention policy.
[oracle@OEMSRV]$ rman target /
Recovery Manager: Release 126.96.36.199.0 - Production on Sun Jan 12 11:39:19 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00554: initialization of internal recovery manager package failed
RMAN-04005: error from target database:
ORA-12162: TNS:net service name is incorrectly specified
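Oh, and for the record: the SID IS set in my shell (strictly speaking, echo only shows that the variable is set, not that it is exported to sqlplus and rman):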
[oracle@OEMSRV]$ echo $ORACLE_SID
repos12
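One thing echo cannot tell you is whether a variable is exported, and a common cause of ORA-12162 on a local connection is an ORACLE_SID that the child process never sees. Whether that was the cause here was not verified, so treat the following as a sketch only; `child_sees` and `DEMO_SID` are hypothetical names made up for the illustration.

```shell
#!/bin/sh
# Report whether a variable is visible to child processes (i.e. exported).
# child_sees is a hypothetical helper name used only for this sketch.
child_sees() {
    # Start a fresh shell and test whether the named variable is set there.
    sh -c "[ -n \"\${$1+set}\" ]"
}

DEMO_SID=repos12            # set in this shell, but not exported
if child_sees DEMO_SID; then echo "exported"; else echo "not exported"; fi

export DEMO_SID             # now child processes inherit it
if child_sees DEMO_SID; then echo "exported"; else echo "not exported"; fi
```

On the database host the same check would be `child_sees ORACLE_SID`, followed by `export ORACLE_SID` if it reports that the child cannot see it.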
This leaves me in a bit of a pickle… We already determined we have plenty of GB left to use, but the database won’t let me in to alter the FRA size, and RMAN won’t let me in either.
The plan is now:
* Free up some space in the FRA; hopefully the database ‘sees’ this and writes its latest archive log.
* Then I hope to get into the database and set the FRA size to a bit less than the free disk space I have now,
* Then move the previously moved archive files back.
Oh. And do something like create a backup, alter the retention policies etc.. Things I neglected to do since this is just a test system, and frankly I didn’t expect a successful installation of OEM 12c on my first attempt! 😉
Step 1: move the existing archive files out of the FRA area.
[oracle@OEMSRV]$ mkdir /oracle/tmp
[oracle@OEMSRV]$ cd ..
[oracle@OEMSRV]$ pwd
/oracle/base/fast_recovery_area/REPOS12/archivelog
[oracle@OEMSRV]$ ls
2014_01_10 2014_01_11 2014_01_12
[oracle@bedc-app2606 archivelog]$ mv 2014_01_10 2014_01_11 /oracle/tmp/
Ok, now we need to convince the database we have made more space for the archive files:
[oracle@OEMSRV]$ rman
Recovery Manager: Release 188.8.131.52.0 - Production on Sun Jan 12 12:00:57 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
RMAN> connect target sys/OracleRules@repos12
connected to target database: REPOS12 (DBID=89898999)
RMAN> crosscheck archivelog all;
<snip long listing>
RMAN> DELETE NOPROMPT ARCHIVELOG UNTIL TIME "SYSDATE-1";
<snip long listing>
RMAN> sql "alter system archive log current";
sql statement: alter system archive log current
RMAN> exit
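For comparison, the alert log’s own advice for files removed with operating system commands is CROSSCHECK followed by DELETE EXPIRED. A minimal sketch of that variant as a small script generator; the function name `rman_cleanup_script` is hypothetical, and this is not what was run in the session above.

```shell
#!/bin/sh
# Print the RMAN commands the alert log suggests after files were removed
# with OS commands. rman_cleanup_script is a hypothetical helper name.
rman_cleanup_script() {
    cat <<'EOF'
CROSSCHECK ARCHIVELOG ALL;
DELETE NOPROMPT EXPIRED ARCHIVELOG ALL;
EOF
}

# Print the commands; on the database host they could be piped into RMAN,
# assuming a local connection works:
#   rman_cleanup_script | rman target /
rman_cleanup_script
```

DELETE EXPIRED only removes the repository records of files that the crosscheck could not find on disk. If the moved files are ever put back, they would have to be registered again, for example with RMAN’s CATALOG command.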
This should have kick-started the db back into motion.
[oracle@OEMSRV]$ sqlplus system@repos12
SQL*Plus: Release 184.108.40.206.0 Production on Sun Jan 12 12:24:57 2014
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter password: XXXX
Connected to:
Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> show parameter db_recovery_file_dest;
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      /oracle/base/fast_recovery_are
                                                 a
db_recovery_file_dest_size           big integer 4122M
SQL> alter system set db_recovery_file_dest_size=20G;
System altered.
SQL>
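To keep an eye on how full the FRA is after resizing it, the V$RECOVERY_FILE_DEST view reports the quota and the space in use. A minimal sketch that prints such a query; the helper name `fra_usage_sql` and the column aliases are my own choices for illustration.

```shell
#!/bin/sh
# Print a query against V$RECOVERY_FILE_DEST showing FRA quota and usage.
# fra_usage_sql is a hypothetical helper name used only for this sketch.
fra_usage_sql() {
    cat <<'EOF'
SELECT name,
       ROUND(space_limit / 1024 / 1024)       AS limit_mb,
       ROUND(space_used / 1024 / 1024)        AS used_mb,
       ROUND(space_reclaimable / 1024 / 1024) AS reclaimable_mb,
       number_of_files
FROM   v$recovery_file_dest;
EOF
}

# Print the SQL; paste it at the SQL> prompt, or pipe it into SQL*Plus,
# assuming a local sysdba login works on your system:
#   fra_usage_sql | sqlplus -s / as sysdba
fra_usage_sql
```

The heredoc delimiter is quoted so that the shell leaves `v$recovery_file_dest` alone instead of trying to expand `$recovery_file_dest` as a variable.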
Now that this is fixed, OEM should be responding again.. hopefully..
Let’s try and enter the welcome page..
YAY! My briefly lost friend is BACK! Without a restart whatsoever!
Of course, we now need to create a backup of the database, alter the backup retention, etc. We could also move the archive files back, let the system fill up the FRA and repeat this exercise… all are possibilities we have with Oracle. I, for one, will just create a full backup and alter the retention and be done with it, since this is a test system. But it is nice to know how to fix this without restarting the db or WLS etc..
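A minimal sketch of what that “full backup plus retention” follow-up could look like as an RMAN script. The function name `rman_backup_script` and the recovery window values are assumptions for illustration, not settings taken from this system.

```shell
#!/bin/sh
# Print an RMAN script that sets a retention policy, takes a full backup
# including archived logs, and removes backups that are no longer needed.
# rman_backup_script is a hypothetical helper name used only for this sketch.
rman_backup_script() {
    # $1: recovery window in days (defaults to 7, an arbitrary example value)
    days=${1:-7}
    cat <<EOF
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF ${days} DAYS;
BACKUP DATABASE PLUS ARCHIVELOG DELETE INPUT;
DELETE NOPROMPT OBSOLETE;
EOF
}

# Print the script for a 3-day window; on the database host it could be
# piped into RMAN, assuming a local connection works:
#   rman_backup_script 3 | rman target /
rman_backup_script 3
```

DELETE INPUT removes the archived logs once they are in a backup, which is what keeps the FRA from silently filling up again. Keep in mind that, unless another destination is configured, the backups themselves also land in the FRA, so the new size has to leave room for them.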
Thanks for reading, and ’till the next post!