ORA-10458 Standby Database Requires Recovery..Or..ADG Is Out Of Whack.

One day…yes, we had one of “those days” when an power maintenance in the datacenter proved the divide and conquer strategy of power supply cabling was not applied as planned…The ADG crashed..and not in a good way. The storage just came down, and the ADG was not having it.

When the storage came back up, the database would not start:

ORA-10458: standby database requires recovery
ORA-01196: file 1 is inconsistent due to a failed media recovery session
ORA-01110: data file 1: ‘+DATACCPR1DG/DGPRD01/DATAFILE/system.287.979081635’

Ergo: no good.

Not willing to go the whole drop and re-sync road as yet, I decided to use a different tack: force a recovery and hope for the best..

Steps:

SQL> startup mount;

Start managed recovery.

SQL> alter database recover managed standby database using current logfile disconnect from session;

Cancel managed recovery.

     SQL> recover managed standby database cancel;

Open them read only.

SQL> alter database open read only;

Database altered.

SQL> recover managed standby database disconnect using current logfile;

Media recovery complete.

SQL>

And this seemed to work! Looking at the alert.log, the archivelogs missed during the outage where being processed!

Posted in Personal Ramblings | Leave a comment

SQL Developer..Or..”Let’s Be Friends Again..

Every so often, when running SQL Developer on Linux (Ubuntu), the cursor just decided to have a day off, and would just disappear. When just doing some edits, this was cumbersome, annoying as **** , but doable..However, I never really got to appreciate all the finesses and good options SQL Developer had to offer, due to this behavior..

I’ve used different versions along the years, and since I refuse to maintain Oracle databases on Windows platforms or at least manage them from a Windows platform, this issue kept creeping up on me… Until I finally start to think hard and loud..What could be the reason why sometimes this would occur not at all, and sometime right away, or after a random amount of time…

Still…No clue.. 😉

What I DO know now is this: It is to solve with a simple setting in the user prefs of the account running the SQL Developer.

In the user home open the following file:

/home/unknown/.sqldeveloper/4.1.0/product.conf

Add the following line:
AddVMOption -Dsun.java2d.opengl=true

In my case this is the only “active” line in the whole file.

Restart SQL Developer…and get acquainted with a cursor who is actually willing to stay and work with you…

ps: yes, I need to upgrade…I know.. 😉

Posted in Personal Ramblings | Leave a comment

Installing Oracle 12.2.01 On Oracle Linux 7.4..Or..”The Story Of The Tainted Kernel”

When moving from Oracle 11.2.0.4 to a new Oracle 12c environment, we wanted to build everything on the latest and greatest levels…Little did we know we would run into quite a snag..

We installed Oracle Release 7.4, with all the recommended packages. Installing was not an issue, and we proceeded with the installation of the GRID software of Oracle 12.2.0.1..yes, the one with a unpacked oracle grid home and uses gridSetup.sh

All went fine, however..when we reached the script part things started to go askew..

Selection_002.png

The orainstRoot.sh…no problem..

The root.sh however…bailed out at step 14..

Basically the execution of root.sh while installing Grid Infrastructure on RedHat Linux or Oracle Linux with RedHat Compatible Kernel (RHCK) 7 fails with the following error:

CLSRSC-594: Executing installation step 14 of 19: ‘InstallACFS’.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘node01’
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘bedc-odb0001’ has completed
CRS-4133: Oracle High Availability Services has been stopped
CRS-4123: Oracle High Availability Services has been started.

CLSRSC-400: A system reboot is required to continue installing.

…Blah blah…

.. Failed…

Rendering the whole installation exercise useless.

Looking in the installer log file mentioned at the start of the root.sh, we found this:

> ACFS-9154: Loading ‘oracleoks.ko’ driver.
> modprobe: ERROR: could not insert ‘oracleoks’: Unknown symbol in module, or unknown parameter (see dmesg)
> ACFS-9109: oracleoks.ko driver failed to load.

And looking into the dmesg log:

[dmesg]
[ 14.749567] Request for unknown module key ‘Oracle Linux Kernel Module Signing Key: 2bb352412969a3653f0eb6021763408ebb9bb5ab’ err -11
[ 14.749575] oracleafd: loading out-of-tree module taints kernel.
[ 14.749577] oracleafd: module license ‘Oracle Corporation’ taints kernel.
[ 14.749578] Disabling lock debugging due to kernel taint
[ 14.749673] oracleafd: module verification failed: signature and/or required key missing – tainting kernel

Ok, we use a supported OS, so a search on support.oracle.com showed us there was a patch available: p25078431_122010_Linux-x86-64.zip (there is another one, smaller in size using the the same patch number and with ACFS in the name..

– Cleaning everything up (really deleting everything!) and unpacking the linux_grid etc..
– Installed the patch with  ./gridSetup.sh -applyOneOffs /oracle/patch/25078431

Ignore the XDISPLAY messages when installing from an non-x-terminal.

Got the successful message, and started the installation again..

And ran into exact the same issue…

Long story short (actually 1,5 frustrating day of hacking and changing a lot..) we found out the kernel was not matching! What we where running was kernel:

kernel-3.10.0-693.5.2.el7.x86_64.

What the installer was expecting (WITH PATCH!) was

kernel-3.10.0-514.16.1.el7.x86_64 (the original version)

Now the installation went without a hitch! Keep in mind when upgrading the OS, the kernel must not upgrade until the installer is up to par!

 

Posted in 12c, ASM, Databases, upgrade | Tagged | 14 Comments

Quick Setup To Monitor/Detect percieved Slowness…Or…”Put Some MDMA In The Database..”

At some time every DBA has to deal with the dreaded “The database is slow, FIX IT!” issue. To be honest? I’m not the master of tuners, and surely I’m not dealing with these things on a day-to-day basis. Oracle is not THAT bad.

However, these moment do pop-up, and sometimes, indeed, it is something in the database. And it’s up to the DBA to do something. Even if it is NOT in the DB, it’s up to the DBA to find to cause or in the least prove the DB is not the cause of the perceived slowness, but some other external (network/application/wrong query) cause.

There is a TON of information out on the Internet to find the culprit, but I needed a quick recipe to setup an trace session, so at the least I had the information at my disposal. Hence this post..

Tools of the trade:

Create an AWR report
Setup a trace with the user session which is complaining.

Creation of the AWR report is quickly done by running the script:

sql> @?/rdbms/admin/awrrpt.sql
or
sql> @?/rdbms/admin/awrrpti.sql

The difference being:
awrrpt.sql will run on the current instance, and if you want to generate a report for another instance (RAC), the awrrpti.sql script can be used. The script will ask for the database ID and the instance ID.

The interpretation of the AWR report is something for another blog, but at least we have something to look at on the screen to show off our skills.

But what if it is a more specific issue? An user complaining? Instead of wading through a whole AWR report, it can just be a specific query acting up. Or just one user is affected. In such a case, it might be more useful to resort to session tracing.

There a a lot of situations which can occur, like: the user is still logged in, experiencing the issue, the user left for home out of frustration of the slowness, but left the session running. Or just logged out, then went home after filing a complaint the db was slow and no other information *sigh*.

It would take to much time to handle all these cases, but for a future reference, I’ll document the two cases I get to deal with most:

Case 1:
Setup a trace for future logins.

Case 2 ( Note: in a future blog when there is interest or I have more time!) :
Setup a trace for a current logged in session.

Before we dive in the good stuff: there a a multitude of tools and way of doing things, like using oradebug, SET EVENT 10046, DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION,DBMS_SUPPORT.START_TRACE_IN_SESSION or DBMS_MONITOR.SESSION_TRACE_ENABLE.

It would again take to much time to give a thorough overview of the pro and cons, but in short:

oradebug
Old school style way of doing a trace. Honestly? Before my time (when Oracle 8.0 was in it’s prime time).

SET EVENT 10046
ALTER SESSION SET EVENTS ’10046 TRACE NAME CONTEXT FOREVER,LEVEL 12′;
Nice one for tracing your own session, and you are able to “replay” the issue the user experienced with either his/her account or your own privileged account (sometimes this is the case).

DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION
or to be precise: DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(SID, SERIAL#, TRUE);
Find the SID and Serial# of the session, and have a go at it..this one is known since Oracle 8i/9i..

Turn it on:

exec dbms_system.set_sql_trace_in_session(5,8888,true);

Turn it off:

exec dbms_system.set_sql_trace_in_session(5,8888,false);

DBMS_SUPPORT.START_TRACE_IN_SESSION;
Before first use: create this package first, it’s not a default available one..

To install, issue the following command:

SQL> @?/rdbms/admin/dbmssupp.sql

Now you can use this package as:

DBMS_SUPPORT.START_TRACE_IN_SESSION(SID, SERIAL#, WAITS, BINDS);

Turn it on:

exec dbms_support.start_trace_in_session(5,8888,true,true);

Turn it off:
exec dbms_support.stop_trace_in_session(5,8888);

DBMS_MONITOR.SESSION_TRACE_ENABLE(SESSION_ID, SERIAL_NUM,BINDS,WAITS);
This is the way Oracle 11g is handling things..

To turn on:
exec dbms_monitor.session_trace_enable(session_id=>5,serial_num=>8888,binds=>true,waits=>true);

To turn off:
exec dbms_monitor.session_trace_disable(session_id=>5,serial_num=>8888);

All (or most) of the above will create a trace file in the trace dir..more on this later.

Client identifier

It would make things a lot easier to identify the users in the database with some more information, instead of wading through a lot of unknown connections etc..

To do this, a logon trigger needs to be created:

CREATE OR REPLACE TRIGGER LOGON_TRIGGER
AFTER LOGON ON DATABASE
DECLARE
v_user_identifier varchar2(64);
BEGIN
SELECT SYS_CONTEXT(‘USERENV’, ‘OS_USER’)
||’:’||
SYS_CONTEXT(‘USERENV’, ‘IP_ADDRESS’)
INTO v_user_identifier
FROM dual;
DBMS_SESSION.SET_IDENTIFIER(v_user_identifier);
END;
/

If the need arise to clear out the CLIENT_IDENTIFIER for a session, just execute the following procedure:

DBMS_SESSION.CLEAR_IDENTIFIER

Case 1: Setup a trace for future logins.

CLIENT_ID_TRACE_ENABLE Procedure
This procedure will enable the trace for a given client identifier globally for the database.
Syntax

DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(
client_id    IN  VARCHAR2,
waits        IN  BOOLEAN DEFAULT TRUE,
binds        IN  BOOLEAN DEFAULT FALSE,
plan_stat    IN  VARCHAR2 DEFAULT NULL);

To turn on:

EXECUTE DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(‘janedoe’, TRUE, FALSE);

Note on plan_stat:
Frequency at which we dump row source statistics. Value should be ‘NEVER’, ‘FIRST_EXECUTION’ (equivalent to NULL) or ‘ALL_EXECUTIONS’.

Usage Notes

* The trace will be written to multiple trace files because more than one Oracle shadow process can work on behalf of a given client identifier.

* The tracing is enabled for all instances and persistent across restarts.

Turn off:
EXECUTE DBMS_MONITOR.CLIENT_ID_TRACE_DISABLE (‘janedoe’);

http://docs.oracle.com/cd/E25178_01/server.1111/e16638/sqltrace.htm

Posted in Performance | Tagged , | Leave a comment

DBUA 12c Reports Database Down While It’s Up..Or..”The only way is up! – Yazz- “

After doing meticulously doing the pre-upgrade  steps on a RAC converted to single instance node, we finally could get started on the actual upgrade. The initial idea is to use the DBUA, since it automates a lot of steps, and is supposed to be less error-prone.. Oracle Corp had me with the “automates” step already, but Ok.. 😉

We set the correct environment settings, checked the listener:

$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 10-FEB-2017 12:19:12

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                09-FEB-2017 11:23:14
Uptime                    1 days 0 hr. 55 min. 57 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/12.1.0/grid/network/admin/listener.ora
Listener Log File         /oracle/12.1.0/base/diag/tnslsnr/node02/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "ora12tst" has 2 instance(s).
Instance "ora12tst", status UNKNOWN, has 1 handler(s) for this service...
Instance "ora12tst", status READY, has 2 handler(s) for this service...
The command completed successfully

Yep, perfect.

[oracle@bedr-odb0002 dbs]$ sql

SQL*Plus: Release 11.2.0.4.0 Production on Fri Feb 10 12:20:59 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL>

Awesome.

Let’s go!

From a x-terminal and the Oracle 12c Home:

$/oracle/12.1.0/base/dbhome_1/bin/dbua

selection_007

Yep. This is what we want..

selection_008

Oh, wait…what!? How!? Ok, it is a clone of a RAC, we get that, but down? Nope. Absolutely not.

We could make a long story even longer by sharing all the experiments, but the main things done, where:

Un-raccing (is this even a word?) the database:

SQL> alter system set cluster_database_instances=1 scope=spfile;

System altered.

SQL> alter database disable thread 2;

Database altered.

Delete the unwanted thread and redo logfiles.

SQL> select thread#, group# from v$log order by 1;

THREAD# GROUP#
---------- ----------
1 1
1 5
1 7
2 6
2 3
2 4

6 rows selected.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter database drop logfile group 4;

Database altered.

SQL> select thread#, group# from v$log order by 1;

THREAD# GROUP#
---------- ----------
1 1
1 7
1 5
Database altered.

Drop the unwanted undo tablespace.

SQL> drop tablespace UNDOTBS2 including contents and datafiles;

Tablespace dropped.

Create pfile from spfile.

SQL> create pfile from spfile;

File created.

SQL> shutdown immediate;

SQL> exit

Remove all references for RAC components from the pfile and start the instance using the pfile.

Startup the database.

$ sql

SQL*Plus: Release 11.1.0.7.0 – Production on Wed Mar 9 20:09:28 2011

Copyright (c) 1982, 2008, Oracle. All rights reserved.

Connected to an idle instance.

SQL> create spfile from pfile;

File created.

SQL> startup
ORACLE instance started.

Total System Global Area 1.2827E+10 bytes
Fixed Size                  2265224 bytes
Variable Size            3925872504 bytes
Database Buffers         8858370048 bytes
Redo Buffers               40861696 bytes
Database mounted.
Database opened.
SQL>

But although it cleaned some stuff up, which will benefit the upgrade anyway, it didn’t solve the issue. The DBUA still claimed the database was down.

But wait! There is a new way of doing things (Ok, not really new, but still) called the srvctl utility! Would the DBUA be looking at the service of the database?

Again, long story short: it does.

To get a clean(-ish) start, First remove the service completely, and rebuild it from scratch.

$ srvctl config database -d ora12tst
Database unique name: ora12tst
Database name:
Oracle home: /oracle/11.2/base/db/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ora12tst
Database instances:
Disk Groups:
Mount point paths:
Services:
Type: RAC
Database is administrator managed

$ srvctl remove database -d ora12tst
Remove the database ora12tst? (y/[n]) y

$ srvctl add database -d ora12tst -o /oracle/11.2/base/db/dbhome_1

$ srvctl status database -d ora12tst
Database is not running.
$ srvctl start database -d ora12tst
PRKO-3119 : Database ora12tst cannot be started since it has no configured instances.

$ srvctl config database -d ora12tst
Database unique name: ora12tst
Database name:
Oracle home: /oracle/11.2/base/db/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ora12tst
Database instances:
Disk Groups:
Mount point paths:
Services:
Type: RAC
Database is administrator managed

$ srvctl add instance -d ora12tst -i ora12tst -n node02
$ srvctl start database -d ora12tst
$ srvctl config database -d ora12tst
Database unique name: ora12tst
Database name:
Oracle home: /oracle/11.2/base/db/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ora12tst
Database instances: ora12tst
Disk Groups:
Mount point paths:
Services:
Type: RAC
Database is administrator managed

$ srvctl status database -d ora12tst
Instance ora12tst is running on node node02

$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 10-FEB-2017 12:19:12

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                09-FEB-2017 11:23:14
Uptime                    1 days 0 hr. 55 min. 57 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/12.1.0/grid/network/admin/listener.ora
Listener Log File         /oracle/12.1.0/base/diag/tnslsnr/node02/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1521)))

Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "ora12tst" has 2 instance(s).
Instance "ora12tst", status UNKNOWN, has 1 handler(s) for this service...
Instance "ora12tst", status READY, has 2 handler(s) for this service...
The command completed successfully

$ sql

SQL*Plus: Release 11.2.0.4.0 Production on Fri Feb 10 12:20:59 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$

Now we start the DBUA again…press Next on the first screen….

selection_009

YES! This is what we wanted to see…*pfieuw*.

And now… Onwards to the future…

Posted in 12c, Databases, Technical Stuff, upgrade | 2 Comments

Oracle 12c DBUA ORA-01031: Insufficient privileges..Or..”Oracle Really Knows How To Test One’s Nerves..”

During a test run of an upgrade with the DBUA, everything was peachy..did some tests, and some other thingies..Multiple terminals open like anyone not working on Windows does..Started the DBUA 12c just fine…found out: hey, you need to have extra space in the ASM to accommodate the upgrade..so I added some disks, but needed to reboot the testmachine, since, as you know it…NOTHING works as advertised, and Murphy’s law just loves to make itself count..

Long story short: after the reboot, started the DBUA again, and got hit with:

ORA-01031: Insufficient privileges; DBUA cannot continue..

..Huh? What? Who?WHY!?…Although Oracle claims the DBUA tells you every step, I just didn’t see it, except a dialog box where I could watch DBUA quit on me..

secret? Check the log file located in the ORACLE 12c BASE..Mind the 12c part..

$ORACLE_base/cfgtoollogs/dbua/logs/trace.log

Since we are trying to upgrade an 11G database to 12c, we have multiple versions of the oracle software…*hint* This is a clue on what went wrong…

This log showed us the actual part where things went wrong…After the screen where the database to be upgraded was selected, the DBUA creates a temporary database instance..and the bailout was on the starting up of this instance. Even when trying the instance startup (the files where located in the $ORACLE_HOME) it bails with exact this message..

Then we went on-line and looked up the normal answers and made things even worse, by setting/adding the parameter SQLNET.AUTHENTICATION_SERVICES = (NTS) in the sqlnet.ora file..This resulted in a total “I’m REALLY not going to start” Oracle HAS stack..nothing would start anymore, since ASM didn’t get on-line anymore..*sigh*. The new error was: “ORA-01017: invalid username/password”. So to put this mildly: DON’T SET THAT PARAMETER IN SQLNET.ORA (yet?). Perhaps later. Just…not now..trust me.

The fix…

We shouldn’t attempt things like this while it’s getting late…since when we rebooted, and logged in, we set the ORACLE_HOME to the 12c version…of ASM! We accidentally set the GRID HOME of Oracle 12c..And this causes the bailout…

So: after cleaning the SQLNET.ora file, and setting the ORACLE_HOME to the DATABASE 12c home, and making sure the PATH setting was correct..we could start the dbua…and it was able to startup the temp instance:

[pool-1-thread-1] [ 2017-02-04 17:09:02.890 CET ] [OsUtilsBase.getBaseFromOrabase:659]  oraBaseUtility /oracle/12c/base/db/bin/orabase
[pool-1-thread-1] [ 2017-02-04 17:09:02.890 CET ] [OsUtilsBase.getBaseFromOrabase:668]  cmds: /oracle/12c/base/db/bin/orabase
[pool-1-thread-1] [ 2017-02-04 17:09:02.890 CET ] [OsUtilsBase.getBaseFromOrabase:672]  envs:  ORACLE_HOME=/oracle/12c/base/db
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [OsUtilsBase.getBaseFromOrabase:682]  baseLocation from orabase
/oracle/12c/base
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [OsUtilsBase.getBaseFromOrabase:707]   orabaseLocation= /oracle/12c/base
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [SQLEngine.getEnvParams:602]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [SQLEngine.getEnvParams:612]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[pool-1-thread-1] [ 2017-02-04 17:09:02.896 CET ] [SQLEngine.initialize:358]  Execing SQLPLUS/SVRMGR process...
[pool-1-thread-1] [ 2017-02-04 17:09:02.897 CET ] [SQLEngine.initialize:395]  m_bReaderStarted: false
[pool-1-thread-1] [ 2017-02-04 17:09:02.897 CET ] [SQLEngine.initialize:399]  Starting Reader Thread...
[pool-1-thread-1] [ 2017-02-04 17:09:03.934 CET ] [OracleHome.initOptionsStopOnError:1370]  executing: startup nomount pfile='/oracle/12c/base/db/dbs/initDBUA0902805.ora'
[pool-1-thread-1] [ 2017-02-04 17:09:16.656 CET ] [OracleHome.initOptionsStopOnError:1372]  DB Options instance startup successful

and onwards with the upgrade to 12c..

p.s.: yes, this was a nice exercise to stay vigilant, but what can I say? I’m only human..(and a lot more wiser now this went wrong in a test environment, rather than production 🙂 )

Posted in 12c, ASM, Databases, upgrade | Leave a comment

OHASD Fails To Start RAC Components…Or..”Please Continue,Sir”!

On Oracle 11.2.0.4, when a node crashes, and the storage disks are a bit late to the party (as being the cause of the crash!), when starting OHASD the manual way, it can happen it says it’s starting, kicks in “reboot” mode…and does nothing else..

Root cause is easy to find on Google, when the cause is known.

When having this issue: stop and start the following script:

/etc/init.d/gcstartup

This script can cause the /etc/init.d/ohasd script to hang..

It’s a short hint, but sometimes short hints can help out in a pinch..

Posted in Databases, RAC | Leave a comment

The ASM ora-04031 error, better known as: Adding more disks to an ASM Instance than there is memory..

To accommodate a change of strategy  I needed to add an additional 20 disks to the current ASM instance I had running..This needed to be done in the shortest time possible, and when the Storage and Linux Admin where done, I could start and create the ASM Disks and diskgroups. Labeling the disks with

# oracleasm createdisk DATA099 /dev/lun99

went fine for all the LUNs, however when I wanted to created the disk groups in ASM, I was suddenly greeted by an

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool", .... )

message..and there was no more possibility to add another disk group. This made sense, when checking the SGA of the running instance:

$ sqlplus / as sysasm
sql> show parameters sga
< output here>

Fortunately: the current disks/disk groups kept running despite the out of memory of the ASM. I’m not sure if this was by design, but I at least was happy it kept humming along, especially since this ASM instance hosts all the DEV/TST environments and they needed to keep running until we had a convenient time to shut all the databases neatly down.

There are at least two solutions to this issue:

A) Allocate more memory to the ASM instance with a shutdown.

B) Prevent this issue from happening in the first place by enlarging the memory before adding disks.

In my case I’m using AMM in conjunction with /dev/shm and not ASMM with HugePages. Make sure the /dev/shm can accommodate the new setting you plan to use, otherwise the instance won’t start at the next boot/restart of the instance!

Posted in ASM, Databases, Technical Stuff | Tagged , , | Leave a comment

When Adding A Node To RAC fails..Or.. “Try It Again, Sammy!”

Normally when adding a new node to a RAC Cluster is fairly simple, provided one keeps the OS the same, and specifically the Linux kernel EXACTLY the same..But as always, it’s different in the real world where reality is creating havoc..

So, as it happened in our case: the addNode failed..to be more specific: it failed multiple times on various points.

This is our story… (ok, watched to many episodes of Law&Order)..

The first issue was: the voting disk was not found..

Error:
[ CSSD][1686370048]clssnmvDiskVerify: discovered a potential voting file
[ CSSD][2286941952]clssnmvDiskVerify: TOC format mismatch expected(0x634c7373 0x546f636b), found(0x0000 0x0000)

ASM was able to find the disks, however not able to determine the voting disk. Strange issue, but after some searching it had to do with the newer kmod-oracleasm (kmod-oracleasm-2.0.8-13.el6_8.x86_64) we had installed, where we had a lower version on the other nodes. For details that lead to the solution: Doc ID 1994371.1

In short: add to the oracleasm config file (/etc/sysconfig/oracleasm) the following:

ORACLEASM_USE_LOGICAL_BLOCK_SIZE=true 

(in the Doc ID it states to set this to ‘false’ but this was not logically for us..)

The above solved the  issue with the voting disk.

The second issue we ran into was: ASM instance refused to start..

Error:
GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:open failed with status: 2
GIM-00091: OS failure message: No such file or directory
GIM-00092: OS failure occurred at: sskgmsmr_7

When running the root.sh multiple times due to troubleshooting the first error, the file hc_* in $GRID_HOME/dbs/hc was created, but due to the voting disk error, never completed correctly.

Removing this file, and cleaning the whole oracle home of traces of this DB solved the issue, and enabled us to re-run the root.sh installation step to succes.

Posted in ASM, Databases, RAC | Tagged , | Leave a comment

Clone Database From NetBackup…Or..”Where Is The Tape, Watson?”

When cloning a live database, this has it’s impact on the live database..open door, I know, but still, it’s a fact. This is not always what we want, since the users can have a lot of issues and even on a test database, deadlines seem to apply.

To remedy this issue, and get the most out of our investment, we decided to clone a database (CLONE! not restore, this is another post on this blog!) from the latest backup we’ve had from the source database. Since we have it, why not using it?

The main reason? Speed. The source database was not loaded in any way at all, and the Netbackup infrastructure is basically not used during business hours, since the backups take place at night. When I try to sleep, or play with the XBox One..;-)

To get started, what do we need?

  • Empty storage to hold the cloned database.
  • The location of the database backup (backup server and backup client names).
  • Enough permissions to read the backup.
  • This blog. 😉

First: check if the backup actually succeeded. Yes, you can asume, but this really is an issue when you have to restore and find out the version you need is not really accessible.

If there is some information that seems to “just be there”, like where is the control file listing coming from in this post, please search the blog, basically all key information is documented on this blog. Or just send me a message, and I’ll clarify and update this post.

Let’s roll!

Open a terminal on the server where the clone needs to be located. There we need to determine the backup we want to use for the clone:

$ /usr/openv/netbackup/bin/bplist -S backup_Server -C backup_client -t 4 -l -R  -I -keyword "*DUMMY*" /

This provides us with the following list:

-rw-rw---- oracle    dba          30932992 Nov 02 02:08 /c-377112624-20161102-02
-rw-rw---- oracle    dba          30932992 Nov 02 02:07 /ctrl_dDUMMY_uvprjsa2u_s3065_p1_t926820446
-rw-rw---- oracle    dba          30932992 Nov 02 02:06 /c-377112624-20161102-01
-rw-rw---- oracle    dba         736886784 Nov 02 02:05 /arch_dDUMMY_uvnrjs9uj_s3063_p1_t926820307
-rw-rw---- oracle    dba          30932992 Nov 02 02:04 /c-377112624-20161102-00
-rw-rw---- oracle    dba         281542656 Nov 02 00:35 /bk_dDUMMY_uvlrjs4n1_s3061_p1_t926814945
-rw-rw---- oracle    dba         327942144 Nov 01 23:00 /bk_dDUMMY_uvkrjrv4v_s3060_p1_t926809247
-rw-rw---- oracle    dba          30932992 Nov 01 02:08 /c-377112624-20161101-02
-rw-rw---- oracle    dba          30932992 Nov 01 02:07 /ctrl_dDUMMY_uvirjplmk_s3058_p1_t926734036
-rw-rw---- oracle    dba          30932992 Nov 01 02:06 /c-377112624-20161101-01
-rw-rw---- oracle    dba         756023296 Nov 01 02:04 /arch_dDUMMY_uvgrjplhv_s3056_p1_t926733887
-rw-rw---- oracle    dba          30932992 Nov 01 02:03 /c-377112624-20161101-00
-rw-rw---- oracle    dba         280494080 Nov 01 00:39 /bk_dDUMMY_uverjpgio_s3054_p1_t926728792
-rw-rw---- oracle    dba         325320704 Oct 31 23:00 /bk_dDUMMY_uvdrjpaot_s3053_p1_t926722845

Optionally the SPFILE can be restored and altered to be used by the clone as a startup, but since I maintain a standard template, it’s just as easy in our situation to create the pfile by hand. Saves a restore run. But use the step below if you want to restore the SPFILE also:

We want to use the latest backup, so select the following backupfile to restore:

/c-377112624-20161102-02

Create a connection to the netbackup server and restore the spfile.

RMAN> run {
allocate channel t1 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=client),SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
send 'NB_ORA_SERV=server,NB_ORA_CLIENT=client';
restore spfile to '/tmp/spfile' from '/c-377112624-20161102-02';
release channel t1;
}

Copy the file over to the correct location, start an instance with this, create pfile, alter settings, create new spfile, start a new instance with this spfile..etc..You know the drill…

NOTE: Before we can restore the spfile, make sure the clone instance is available in NOMOUNT state! Otherwise RMAN cannot do much to help us..

NOTE: derived from some documentation, it might not hurt to set the following environment setting before starting RMAN:

$ export NB_ORA_CLIENT=client

In order to do the clone, we need to connect to the live database, however the data will be retrieved from the backup.

As such we need to connect RMAN like this:

$ rlwrap rman target sys/secret@exp04 auxiliary sys/secret@exp05

Recovery Manager: Release 11.2.0.4.0 - Production on Wed Nov 2 12:22:20 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: EXP04 (DBID=344562624)
connected to auxiliary database: EXP05 (not mounted)

RMAN> run {
allocate channel t1 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=client),SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';
allocate auxiliary channel stby01 type disk;
allocate auxiliary channel chan01 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=client),SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so64';

send 'NB_ORA_SERV=server,NB_ORA_CLIENT=client';
SET NEWNAME FOR DATABASE TO '+DATA5DG';
DUPLICATE DATABASE TO exp05
NOFILENAMECHECK;
}

sent command to channel: stby01
sent command to channel: chan01

executing command: SET NEWNAME

Starting Duplicate Db at 02-NOV-16

contents of Memory Script:
{
sql clone "alter system set  control_files =
''+DATA5DG/EXP05/controlfile/current.258.926857701'', ''+FRA5DG/EXP05/controlfile/current.257.926857701'' comment=
''Set by RMAN'' scope=spfile";
sql clone "alter system set  db_name =
''EXP04'' comment=
''Modified by RMAN duplicate'' scope=spfile";
sql clone "alter system set  db_unique_name =
''EXP05'' comment=
''Modified by RMAN duplicate'' scope=spfile";
shutdown clone immediate;
startup clone force nomount
restore clone primary controlfile;
alter clone database mount;
}
executing Memory Script

sql statement: alter system set  control_files =   ''+DATA5DG/EXP05/controlfile/current.258.926857701'', ''+FRACCDM5DG/EXP05/controlfile/current.257.926857701'' comment= ''Set by RMAN'' scope=spfile

sql statement: alter system set  db_name =  ''EXP04'' comment= ''Modified by RMAN duplicate'' scope=spfile

sql statement: alter system set  db_unique_name =  ''EXP05'' comment= ''Modified by RMAN duplicate'' scope=spfile

Oracle instance shut down

Oracle instance started

Total System Global Area   12827369472 bytes

Fixed Size                     2265224 bytes
Variable Size               3925872504 bytes
Database Buffers            8858370048 bytes
Redo Buffers                  40861696 bytes
allocated channel: stby01
channel stby01: SID=926 device type=DISK
allocated channel: chan01
channel chan01: SID=992 device type=SBT_TAPE
channel chan01: Veritas NetBackup for Oracle - Release 7.6 (2013111313)

Starting restore at 02-NOV-16

channel chan01: starting datafile backup set restore
channel chan01: restoring control file
channel chan01: reading from backup piece c-377112624-20161102-02
} channel chan01: piece handle=c-377112624-20161102-02 tag=TAG20161102T020823
channel chan01: restored backup piece 1
channel chan01: restore complete, elapsed time: 00:01:05
output file name=+DATA5DG/EXP05/controlfile/current.258.926857701
output file name=+FRA5DG/EXP05/controlfile/current.257.926857701
Finished restore at 02-NOV-16

database mounted

contents of Memory Script:
{
set until scn  6402052692;
set newname for datafile  1 to
"+DATA5DG";
set newname for datafile  2 to


executing command: SET NEWNAME
Starting restore at 02-NOV-16

channel chan01: starting datafile backup set restore
channel chan01: specifying datafile(s) to restore from backup set
channel chan01: restoring datafile 00002 to +DATA05DG
channel chan01: reading from backup piece bk_dEXP04_usjrjha6t_s2963_p1_t926460125
...

As is visible above: the controlfile is restored from tape, and the rest of the restore also gets the data from the backup lib, NOT from the live database, although we do have a connection to this.

After the data reading from tape is done, the output should look like the following:

...
datafile 77 switched to datafile copy
input datafile copy RECID=76 STAMP=926872015 file name=+DATA5DG/EXP05/datafile/za_indx.314.926870347
datafile 78 switched to datafile copy
input datafile copy RECID=77 STAMP=926872015 file name=+DATA5DG/EXP05/datafile/es_indx.315.926870413

Reenabling controlfile options for auxiliary database
Executing: alter database force logging

contents of Memory Script:
{
Alter clone database open resetlogs;
}
executing Memory Script

database opened
Finished Duplicate Db at 02-NOV-16
released channel: t1
released channel: stby01
released channel: chan01

RMAN>

And that’s it! When the process is complete, and has no errors (most will be related with running out of storage space, so check this first).

Note: This is a very careful approach, but it’s up to the DBA to allocate a lot more channels to speed up this process. I normally hit the four channels and complete the process a lot quicker than with just one..

Posted in ASM, backup, Databases, Technical Stuff | Tagged , , | Leave a comment