Installing Oracle 12.2.01 On Oracle Linux 7.4..Or..”The Story Of The Tainted Kernel”

When moving from Oracle 11.2.0.4 to a new Oracle 12c environment, we wanted to build everything on the latest and greatest levels…Little did we know we would run into quite a snag..

We installed Oracle Release 7.4, with all the recommended packages. Installing was not an issue, and we proceeded with the installation of the GRID software of Oracle 12.2.0.1..yes, the one with a unpacked oracle grid home and uses gridSetup.sh

All went fine, however..when we reached the script part things started to go askew..

Selection_002.png

The orainstRoot.sh…no problem..

The root.sh however…bailed out at step 14..

Basically the execution of root.sh while installing Grid Infrastructure on RedHat Linux or Oracle Linux with RedHat Compatible Kernel (RHCK) 7 fails with the following error:

CLSRSC-594: Executing installation step 14 of 19: ‘InstallACFS’.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on ‘node01’
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on ‘bedc-odb0001’ has completed
CRS-4133: Oracle High Availability Services has been stopped
CRS-4123: Oracle High Availability Services has been started.

CLSRSC-400: A system reboot is required to continue installing.

…Blah blah…

.. Failed…

Rendering the whole installation exercise useless.

Looking in the installer log file mentioned at the start of the root.sh, we found this:

> ACFS-9154: Loading ‘oracleoks.ko’ driver.
> modprobe: ERROR: could not insert ‘oracleoks’: Unknown symbol in module, or unknown parameter (see dmesg)
> ACFS-9109: oracleoks.ko driver failed to load.

And looking into the dmesg log:

[dmesg]
[ 14.749567] Request for unknown module key ‘Oracle Linux Kernel Module Signing Key: 2bb352412969a3653f0eb6021763408ebb9bb5ab’ err -11
[ 14.749575] oracleafd: loading out-of-tree module taints kernel.
[ 14.749577] oracleafd: module license ‘Oracle Corporation’ taints kernel.
[ 14.749578] Disabling lock debugging due to kernel taint
[ 14.749673] oracleafd: module verification failed: signature and/or required key missing – tainting kernel

Ok, we use a supported OS, so a search on support.oracle.com showed us there was a patch available: p25078431_122010_Linux-x86-64.zip (there is another one, smaller in size using the the same patch number and with ACFS in the name..

– Cleaning everything up (really deleting everything!) and unpacking the linux_grid etc..
– Installed the patch with  ./gridSetup.sh -applyOneOffs /oracle/patch/25078431

Ignore the XDISPLAY messages when installing from an non-x-terminal.

Got the successful message, and started the installation again..

And ran into exact the same issue…

Long story short (actually 1,5 frustrating day of hacking and changing a lot..) we found out the kernel was not matching! What we where running was kernel:

kernel-3.10.0-693.5.2.el7.x86_64.

What the installer was expecting (WITH PATCH!) was

kernel-3.10.0-514.16.1.el7.x86_64 (the original version)

Now the installation went without a hitch! Keep in mind when upgrading the OS, the kernel must not upgrade until the installer is up to par!

 

Advertisements
Posted in 12c, ASM, Databases, upgrade | Tagged | Leave a comment

Quick Setup To Monitor/Detect percieved Slowness…Or…”Put Some MDMA In The Database..”

At some time every DBA has to deal with the dreaded “The database is slow, FIX IT!” issue. To be honest? I’m not the master of tuners, and surely I’m not dealing with these things on a day-to-day basis. Oracle is not THAT bad.

However, these moment do pop-up, and sometimes, indeed, it is something in the database. And it’s up to the DBA to do something. Even if it is NOT in the DB, it’s up to the DBA to find to cause or in the least prove the DB is not the cause of the perceived slowness, but some other external (network/application/wrong query) cause.

There is a TON of information out on the Internet to find the culprit, but I needed a quick recipe to setup an trace session, so at the least I had the information at my disposal. Hence this post..

Tools of the trade:

Create an AWR report
Setup a trace with the user session which is complaining.

Creation of the AWR report is quickly done by running the script:

sql> @?/rdbms/admin/awrrpt.sql
or
sql> @?/rdbms/admin/awrrpti.sql

The difference being:
awrrpt.sql will run on the current instance, and if you want to generate a report for another instance (RAC), the awrrpti.sql script can be used. The script will ask for the database ID and the instance ID.

The interpretation of the AWR report is something for another blog, but at least we have something to look at on the screen to show off our skills.

But what if it is a more specific issue? An user complaining? Instead of wading through a whole AWR report, it can just be a specific query acting up. Or just one user is affected. In such a case, it might be more useful to resort to session tracing.

There a a lot of situations which can occur, like: the user is still logged in, experiencing the issue, the user left for home out of frustration of the slowness, but left the session running. Or just logged out, then went home after filing a complaint the db was slow and no other information *sigh*.

It would take to much time to handle all these cases, but for a future reference, I’ll document the two cases I get to deal with most:

Case 1:
Setup a trace for future logins.

Case 2 ( Note: in a future blog when there is interest or I have more time!) :
Setup a trace for a current logged in session.

Before we dive in the good stuff: there a a multitude of tools and way of doing things, like using oradebug, SET EVENT 10046, DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION,DBMS_SUPPORT.START_TRACE_IN_SESSION or DBMS_MONITOR.SESSION_TRACE_ENABLE.

It would again take to much time to give a thorough overview of the pro and cons, but in short:

oradebug
Old school style way of doing a trace. Honestly? Before my time (when Oracle 8.0 was in it’s prime time).

SET EVENT 10046
ALTER SESSION SET EVENTS ’10046 TRACE NAME CONTEXT FOREVER,LEVEL 12′;
Nice one for tracing your own session, and you are able to “replay” the issue the user experienced with either his/her account or your own privileged account (sometimes this is the case).

DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION
or to be precise: DBMS_SYSTEM.SET_SQL_TRACE_IN_SESSION(SID, SERIAL#, TRUE);
Find the SID and Serial# of the session, and have a go at it..this one is known since Oracle 8i/9i..

Turn it on:

exec dbms_system.set_sql_trace_in_session(5,8888,true);

Turn it off:

exec dbms_system.set_sql_trace_in_session(5,8888,false);

DBMS_SUPPORT.START_TRACE_IN_SESSION;
Before first use: create this package first, it’s not a default available one..

To install, issue the following command:

SQL> @?/rdbms/admin/dbmssupp.sql

Now you can use this package as:

DBMS_SUPPORT.START_TRACE_IN_SESSION(SID, SERIAL#, WAITS, BINDS);

Turn it on:

exec dbms_support.start_trace_in_session(5,8888,true,true);

Turn it off:
exec dbms_support.stop_trace_in_session(5,8888);

DBMS_MONITOR.SESSION_TRACE_ENABLE(SESSION_ID, SERIAL_NUM,BINDS,WAITS);
This is the way Oracle 11g is handling things..

To turn on:
exec dbms_monitor.session_trace_enable(session_id=>5,serial_num=>8888,binds=>true,waits=>true);

To turn off:
exec dbms_monitor.session_trace_disable(session_id=>5,serial_num=>8888);

All (or most) of the above will create a trace file in the trace dir..more on this later.

Client identifier

It would make things a lot easier to identify the users in the database with some more information, instead of wading through a lot of unknown connections etc..

To do this, a logon trigger needs to be created:

CREATE OR REPLACE TRIGGER LOGON_TRIGGER
AFTER LOGON ON DATABASE
DECLARE
v_user_identifier varchar2(64);
BEGIN
SELECT SYS_CONTEXT(‘USERENV’, ‘OS_USER’)
||’:’||
SYS_CONTEXT(‘USERENV’, ‘IP_ADDRESS’)
INTO v_user_identifier
FROM dual;
DBMS_SESSION.SET_IDENTIFIER(v_user_identifier);
END;
/

If the need arise to clear out the CLIENT_IDENTIFIER for a session, just execute the following procedure:

DBMS_SESSION.CLEAR_IDENTIFIER

Case 1: Setup a trace for future logins.

CLIENT_ID_TRACE_ENABLE Procedure
This procedure will enable the trace for a given client identifier globally for the database.
Syntax

DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(
client_id    IN  VARCHAR2,
waits        IN  BOOLEAN DEFAULT TRUE,
binds        IN  BOOLEAN DEFAULT FALSE,
plan_stat    IN  VARCHAR2 DEFAULT NULL);

To turn on:

EXECUTE DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE(‘janedoe’, TRUE, FALSE);

Note on plan_stat:
Frequency at which we dump row source statistics. Value should be ‘NEVER’, ‘FIRST_EXECUTION’ (equivalent to NULL) or ‘ALL_EXECUTIONS’.

Usage Notes

* The trace will be written to multiple trace files because more than one Oracle shadow process can work on behalf of a given client identifier.

* The tracing is enabled for all instances and persistent across restarts.

Turn off:
EXECUTE DBMS_MONITOR.CLIENT_ID_TRACE_DISABLE (‘janedoe’);

http://docs.oracle.com/cd/E25178_01/server.1111/e16638/sqltrace.htm

Posted in Performance | Tagged , | Leave a comment

DBUA 12c Reports Database Down While It’s Up..Or..”The only way is up! – Yazz- “

After doing meticulously doing the pre-upgrade  steps on a RAC converted to single instance node, we finally could get started on the actual upgrade. The initial idea is to use the DBUA, since it automates a lot of steps, and is supposed to be less error-prone.. Oracle Corp had me with the “automates” step already, but Ok.. 😉

We set the correct environment settings, checked the listener:

$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 10-FEB-2017 12:19:12

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                09-FEB-2017 11:23:14
Uptime                    1 days 0 hr. 55 min. 57 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/12.1.0/grid/network/admin/listener.ora
Listener Log File         /oracle/12.1.0/base/diag/tnslsnr/node02/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "ora12tst" has 2 instance(s).
Instance "ora12tst", status UNKNOWN, has 1 handler(s) for this service...
Instance "ora12tst", status READY, has 2 handler(s) for this service...
The command completed successfully

Yep, perfect.

[oracle@bedr-odb0002 dbs]$ sql

SQL*Plus: Release 11.2.0.4.0 Production on Fri Feb 10 12:20:59 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL>

Awesome.

Let’s go!

From a x-terminal and the Oracle 12c Home:

$/oracle/12.1.0/base/dbhome_1/bin/dbua

selection_007

Yep. This is what we want..

selection_008

Oh, wait…what!? How!? Ok, it is a clone of a RAC, we get that, but down? Nope. Absolutely not.

We could make a long story even longer by sharing all the experiments, but the main things done, where:

Un-raccing (is this even a word?) the database:

SQL> alter system set cluster_database_instances=1 scope=spfile;

System altered.

SQL> alter database disable thread 2;

Database altered.

Delete the unwanted thread and redo logfiles.

SQL> select thread#, group# from v$log order by 1;

THREAD# GROUP#
---------- ----------
1 1
1 5
1 7
2 6
2 3
2 4

6 rows selected.

SQL> alter database drop logfile group 6;

Database altered.

SQL> alter database drop logfile group 3;

Database altered.

SQL> alter database drop logfile group 4;

Database altered.

SQL> select thread#, group# from v$log order by 1;

THREAD# GROUP#
---------- ----------
1 1
1 7
1 5
Database altered.

Drop the unwanted undo tablespace.

SQL> drop tablespace UNDOTBS2 including contents and datafiles;

Tablespace dropped.

Create pfile from spfile.

SQL> create pfile from spfile;

File created.

SQL> shutdown immediate;

SQL> exit

Remove all references for RAC components from the pfile and start the instance using the pfile.

Startup the database.

$ sql

SQL*Plus: Release 11.1.0.7.0 – Production on Wed Mar 9 20:09:28 2011

Copyright (c) 1982, 2008, Oracle. All rights reserved.

Connected to an idle instance.

SQL> create spfile from pfile;

File created.

SQL> startup
ORACLE instance started.

Total System Global Area 1.2827E+10 bytes
Fixed Size                  2265224 bytes
Variable Size            3925872504 bytes
Database Buffers         8858370048 bytes
Redo Buffers               40861696 bytes
Database mounted.
Database opened.
SQL>

But although it cleaned some stuff up, which will benefit the upgrade anyway, it didn’t solve the issue. The DBUA still claimed the database was down.

But wait! There is a new way of doing things (Ok, not really new, but still) called the srvctl utility! Would the DBUA be looking at the service of the database?

Again, long story short: it does.

To get a clean(-ish) start, First remove the service completely, and rebuild it from scratch.

$ srvctl config database -d ora12tst
Database unique name: ora12tst
Database name:
Oracle home: /oracle/11.2/base/db/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ora12tst
Database instances:
Disk Groups:
Mount point paths:
Services:
Type: RAC
Database is administrator managed

$ srvctl remove database -d ora12tst
Remove the database ora12tst? (y/[n]) y

$ srvctl add database -d ora12tst -o /oracle/11.2/base/db/dbhome_1

$ srvctl status database -d ora12tst
Database is not running.
$ srvctl start database -d ora12tst
PRKO-3119 : Database ora12tst cannot be started since it has no configured instances.

$ srvctl config database -d ora12tst
Database unique name: ora12tst
Database name:
Oracle home: /oracle/11.2/base/db/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ora12tst
Database instances:
Disk Groups:
Mount point paths:
Services:
Type: RAC
Database is administrator managed

$ srvctl add instance -d ora12tst -i ora12tst -n node02
$ srvctl start database -d ora12tst
$ srvctl config database -d ora12tst
Database unique name: ora12tst
Database name:
Oracle home: /oracle/11.2/base/db/dbhome_1
Oracle user: oracle
Spfile:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: ora12tst
Database instances: ora12tst
Disk Groups:
Mount point paths:
Services:
Type: RAC
Database is administrator managed

$ srvctl status database -d ora12tst
Instance ora12tst is running on node node02

$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 10-FEB-2017 12:19:12

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 12.1.0.2.0 - Production
Start Date                09-FEB-2017 11:23:14
Uptime                    1 days 0 hr. 55 min. 57 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/12.1.0/grid/network/admin/listener.ora
Listener Log File         /oracle/12.1.0/base/diag/tnslsnr/node02/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.1)(PORT=1521)))

Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "ora12tst" has 2 instance(s).
Instance "ora12tst", status UNKNOWN, has 1 handler(s) for this service...
Instance "ora12tst", status READY, has 2 handler(s) for this service...
The command completed successfully

$ sql

SQL*Plus: Release 11.2.0.4.0 Production on Fri Feb 10 12:20:59 2017

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select open_mode from v$database;

OPEN_MODE
--------------------
READ WRITE

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$

Now we start the DBUA again…press Next on the first screen….

selection_009

YES! This is what we wanted to see…*pfieuw*.

And now… Onwards to the future…

Posted in 12c, Databases, Technical Stuff, upgrade | Leave a comment

Oracle 12c DBUA ORA-01031: Insufficient privileges..Or..”Oracle Really Knows How To Test One’s Nerves..”

During a test run of an upgrade with the DBUA, everything was peachy..did some tests, and some other thingies..Multiple terminals open like anyone not working on Windows does..Started the DBUA 12c just fine…found out: hey, you need to have extra space in the ASM to accommodate the upgrade..so I added some disks, but needed to reboot the testmachine, since, as you know it…NOTHING works as advertised, and Murphy’s law just loves to make itself count..

Long story short: after the reboot, started the DBUA again, and got hit with:

ORA-01031: Insufficient privileges; DBUA cannot continue..

..Huh? What? Who?WHY!?…Although Oracle claims the DBUA tells you every step, I just didn’t see it, except a dialog box where I could watch DBUA quit on me..

secret? Check the log file located in the ORACLE 12c BASE..Mind the 12c part..

$ORACLE_base/cfgtoollogs/dbua/logs/trace.log

Since we are trying to upgrade an 11G database to 12c, we have multiple versions of the oracle software…*hint* This is a clue on what went wrong…

This log showed us the actual part where things went wrong…After the screen where the database to be upgraded was selected, the DBUA creates a temporary database instance..and the bailout was on the starting up of this instance. Even when trying the instance startup (the files where located in the $ORACLE_HOME) it bails with exact this message..

Then we went on-line and looked up the normal answers and made things even worse, by setting/adding the parameter SQLNET.AUTHENTICATION_SERVICES = (NTS) in the sqlnet.ora file..This resulted in a total “I’m REALLY not going to start” Oracle HAS stack..nothing would start anymore, since ASM didn’t get on-line anymore..*sigh*. The new error was: “ORA-01017: invalid username/password”. So to put this mildly: DON’T SET THAT PARAMETER IN SQLNET.ORA (yet?). Perhaps later. Just…not now..trust me.

The fix…

We shouldn’t attempt things like this while it’s getting late…since when we rebooted, and logged in, we set the ORACLE_HOME to the 12c version…of ASM! We accidentally set the GRID HOME of Oracle 12c..And this causes the bailout…

So: after cleaning the SQLNET.ora file, and setting the ORACLE_HOME to the DATABASE 12c home, and making sure the PATH setting was correct..we could start the dbua…and it was able to startup the temp instance:

[pool-1-thread-1] [ 2017-02-04 17:09:02.890 CET ] [OsUtilsBase.getBaseFromOrabase:659]  oraBaseUtility /oracle/12c/base/db/bin/orabase
[pool-1-thread-1] [ 2017-02-04 17:09:02.890 CET ] [OsUtilsBase.getBaseFromOrabase:668]  cmds: /oracle/12c/base/db/bin/orabase
[pool-1-thread-1] [ 2017-02-04 17:09:02.890 CET ] [OsUtilsBase.getBaseFromOrabase:672]  envs:  ORACLE_HOME=/oracle/12c/base/db
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [OsUtilsBase.getBaseFromOrabase:682]  baseLocation from orabase
/oracle/12c/base
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [OsUtilsBase.getBaseFromOrabase:707]   orabaseLocation= /oracle/12c/base
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [SQLEngine.getEnvParams:602]  Default NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[pool-1-thread-1] [ 2017-02-04 17:09:02.895 CET ] [SQLEngine.getEnvParams:612]  NLS_LANG: AMERICAN_AMERICA.AL32UTF8
[pool-1-thread-1] [ 2017-02-04 17:09:02.896 CET ] [SQLEngine.initialize:358]  Execing SQLPLUS/SVRMGR process...
[pool-1-thread-1] [ 2017-02-04 17:09:02.897 CET ] [SQLEngine.initialize:395]  m_bReaderStarted: false
[pool-1-thread-1] [ 2017-02-04 17:09:02.897 CET ] [SQLEngine.initialize:399]  Starting Reader Thread...
[pool-1-thread-1] [ 2017-02-04 17:09:03.934 CET ] [OracleHome.initOptionsStopOnError:1370]  executing: startup nomount pfile='/oracle/12c/base/db/dbs/initDBUA0902805.ora'
[pool-1-thread-1] [ 2017-02-04 17:09:16.656 CET ] [OracleHome.initOptionsStopOnError:1372]  DB Options instance startup successful

and onwards with the upgrade to 12c..

p.s.: yes, this was a nice exercise to stay vigilant, but what can I say? I’m only human..(and a lot more wiser now this went wrong in a test environment, rather than production 🙂 )

Posted in 12c, ASM, Databases, upgrade | Leave a comment

OHASD Fails To Start RAC Components…Or..”Please Continue,Sir”!

On Oracle 11.2.0.4, when a node crashes, and the storage disks are a bit late to the party (as being the cause of the crash!), when starting OHASD the manual way, it can happen it says it’s starting, kicks in “reboot” mode…and does nothing else..

Root cause is easy to find on Google, when the cause is known.

When having this issue: stop and start the following script:

/etc/init.d/gcstartup

This script can cause the /etc/init.d/ohasd script to hang..

It’s a short hint, but sometimes short hints can help out in a pinch..

Posted in Databases, RAC | Leave a comment

The ASM ora-04031 error, better known as: Adding more disks to an ASM Instance than there is memory..

To accommodate a change of strategy  I needed to add an additional 20 disks to the current ASM instance I had running..This needed to be done in the shortest time possible, and when the Storage and Linux Admin where done, I could start and create the ASM Disks and diskgroups. Labeling the disks with

# oracleasm createdisk DATA099 /dev/lun99

went fine for all the LUNs, however when I wanted to created the disk groups in ASM, I was suddenly greeted by an

ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool", .... )

message..and there was no more possibility to add another disk group. This made sense, when checking the SGA of the running instance:

$ sqlplus / as sysasm
sql> show parameters sga
< output here>

Fortunately: the current disks/disk groups kept running despite the out of memory of the ASM. I’m not sure if this was by design, but I at least was happy it kept humming along, especially since this ASM instance hosts all the DEV/TST environments and they needed to keep running until we had a convenient time to shut all the databases neatly down.

There are at least two solutions to this issue:

A) Allocate more memory to the ASM instance with a shutdown.

B) Prevent this issue from happening in the first place by enlarging the memory before adding disks.

In my case I’m using AMM in conjunction with /dev/shm and not ASMM with HugePages. Make sure the /dev/shm can accommodate the new setting you plan to use, otherwise the instance won’t start at the next boot/restart of the instance!

Posted in ASM, Databases, Technical Stuff | Tagged , , | Leave a comment

When Adding A Node To RAC fails..Or.. “Try It Again, Sammy!”

Normally when adding a new node to a RAC Cluster is fairly simple, provided one keeps the OS the same, and specifically the Linux kernel EXACTLY the same..But as always, it’s different in the real world where reality is creating havoc..

So, as it happened in our case: the addNode failed..to be more specific: it failed multiple times on various points.

This is our story… (ok, watched to many episodes of Law&Order)..

The first issue was: the voting disk was not found..

Error:
[ CSSD][1686370048]clssnmvDiskVerify: discovered a potential voting file
[ CSSD][2286941952]clssnmvDiskVerify: TOC format mismatch expected(0x634c7373 0x546f636b), found(0x0000 0x0000)

ASM was able to find the disks, however not able to determine the voting disk. Strange issue, but after some searching it had to do with the newer kmod-oracleasm (kmod-oracleasm-2.0.8-13.el6_8.x86_64) we had installed, where we had a lower version on the other nodes. For details that lead to the solution: Doc ID 1994371.1

In short: add to the oracleasm config file (/etc/sysconfig/oracleasm) the following:

ORACLEASM_USE_LOGICAL_BLOCK_SIZE=true 

(in the Doc ID it states to set this to ‘false’ but this was not logically for us..)

The above solved the  issue with the voting disk.

The second issue we ran into was: ASM instance refused to start..

Error:
GIMH: GIM-00104: Health check failed to connect to instance.
GIM-00090: OS-dependent operation:open failed with status: 2
GIM-00091: OS failure message: No such file or directory
GIM-00092: OS failure occurred at: sskgmsmr_7

When running the root.sh multiple times due to troubleshooting the first error, the file hc_* in $GRID_HOME/dbs/hc was created, but due to the voting disk error, never completed correctly.

Removing this file, and cleaning the whole oracle home of traces of this DB solved the issue, and enabled us to re-run the root.sh installation step to succes.

Posted in ASM, Databases, RAC | Tagged , | Leave a comment