Recently we have expanded our Rac One Node with an additional node..So we have a two-node Rac One. What’s in a name, right?
This was an ordeal in itself, which I could have blogged about, however it was such an unique situation with so many causes it would probably not contribute much. It helped me gaining even more knowledge about the differences between Rac and Rac One installations, so at least it was not all for naught.
When we finally added the node to the Rac One installation, the last issue arose: the actually relocation of the existing databases was failing! A new created database as a test AFTER the 2nd node was added worked perfectly. But when relocating an existing database (existing as in: before the node was added) was relocated, but the connections dropped anyway! This was not so Transparent as advertised, but after quite some searching the issue was found. And since in my searches on the Internet I couldn’t find a reference to what the steps and prerequisites exactly are for a successful fail-over on a RAC One environment, I decided to put my experiences for you guys on-line..
The current situation:
* A two node RAC One system.
* Node 1 has already 12 databases running on it, never ever where these failed over, and seemingly they are a mix of single instance databases, as well as RAC (*sigh*) databases. How this came to be? Long story, but the keyword is: cloning.
* Node two is up and running,fully configured in the Cluster but containing no databases (yet).
Overview of running processes:
On Node 01:
$/oracle/grid/bin/crsctl stat res -t -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.DATA.dg ONLINE ONLINE node01 ONLINE ONLINE node02 ora.LISTENER.lsnr ONLINE ONLINE node01 ONLINE ONLINE node02 ora.asm ONLINE ONLINE node01 Started ONLINE ONLINE node02 Started ora.gsd OFFLINE OFFLINE node01 OFFLINE OFFLINE node02 ora.net1.network ONLINE ONLINE node01 ONLINE ONLINE node02 ora.ons ONLINE ONLINE node01 ONLINE ONLINE node02 ora.registry.acfs ONLINE ONLINE node01 ONLINE ONLINE node02 -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE node01 ora.node01.vip 1 ONLINE ONLINE node01 ora.node02.vip 1 ONLINE ONLINE node02 ora.cvu 1 ONLINE ONLINE node01 ora.exp06.db 2 ONLINE ONLINE node01 Open ora.exp06.experiment06.svc 1 ONLINE ONLINE node01 ora.exp07.db 2 ONLINE ONLINE node02 Open ora.exp07.experiment07.svc 1 ONLINE ONLINE node02 ora.oc4j 1 ONLINE ONLINE node01 ora.scan1.vip 1 ONLINE ONLINE node01 [node01:oracle:+ASM1]$
Note:
Database exp07 is running on node 2!
Database exp06 is running on node 1!
On Node 02:
$/oracle/grid/bin/crsctl stat res -t -------------------------------------------------------------------------------- NAME TARGET STATE SERVER STATE_DETAILS -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.DATA.dg ONLINE ONLINE node01 ONLINE ONLINE node02 ora.LISTENER.lsnr ONLINE ONLINE node01 ONLINE ONLINE node02 ora.asm ONLINE ONLINE node01 Started ONLINE ONLINE node02 Started ora.gsd OFFLINE OFFLINE node01 OFFLINE OFFLINE node02 ora.net1.network ONLINE ONLINE node01 ONLINE ONLINE node02 ora.ons ONLINE ONLINE node01 ONLINE ONLINE node02 ora.registry.acfs ONLINE ONLINE node01 ONLINE ONLINE node02 -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE node01 ora.node01.vip 1 ONLINE ONLINE node01 ora.node02.vip 1 ONLINE ONLINE node02 ora.cvu 1 ONLINE ONLINE node01 ora.exp06.db 2 ONLINE ONLINE node02 Open ora.exp06.experiment06.svc 1 ONLINE ONLINE node02 ora.exp07.db 2 ONLINE ONLINE node01 Open ora.exp07.experiment07.svc 1 ONLINE ONLINE node01 ora.oc4j 1 ONLINE ONLINE node01 ora.scan1.vip 1 ONLINE ONLINE node01 [node02:oracle:+ASM2]$
Steps to relocate a database:
Make sure all the resources are created and available:
If database resource is not available:
$ srvctl add database -d ORCL -o /home/of/the/database/db_1
If instance resource is not available:
$ srvctl add instance -d ORCL -i ORCL1 -n server01
Service resource:
$ srvctl add service -d ORCL -s ORACLE01 -r ORCL $ srvctl start service -d ORCL -s ORACLE01
Check the resources:
$ crsctl stat res -t|grep exp06 ora.exp06.db ora.exp06.experiment06.svc
Make sure the connection string is TAF on the client, in the tnsnames.ora:
exp06 = (DESCRIPTION = (LOAD_BALANCE = ON) (FAILOVER = ON) (ADDRESS = (PROTOCOL = TCP)(HOST = scan-adress)(PORT = 1521)) (CONNECT_DATA = (SERVICE_NAME = experiment06) (FAILOVER_MODE = (TYPE = SELECT) (METHOD = BASIC) (RETRIES = 20) (DELAY = 1) ) ) )
Be aware to connect to the service name, not the database!
Make sure the listener is NOT “hardcoded” in the database p/spfile!
This can be confusing and difficult to catch when you are not aware of this (as I initially was):
The correct parameter must be:
SQL >show spparameter listener SID NAME TYPE VALUE -------- ----------------------------- ----------- ---------------------------- * listener_networks string * local_listener string * remote_listener string rac-scan:1521
but when looking at the parameter setting:
SQL >show parameter listener SID NAME TYPE VALUE -------- ----------------------------- ----------- ---------------------------- * listener_networks string * local_listener string (DESCRIPTION=(ADDRESS_LIST=( ADDRESS=(PROTOCOL=TCP)(HOST= bogus.com)(PORT=1521)))) * remote_listener string rac-scan:1521
This is correct, since the scan process is telling the instance at start time who is his listener! But the database needs to have an empty (or none at all) setting in the sp/pfile. If one is set, the scan process WON’T tell the database the new listener and the database drops of the grid while still being online. Connections will also drop, of course.
To clear a set listener string, issue the following command:
SQL> alter system set local_listener ='' scope=spfile;
When this all is set, make sure the database is a RAC One database:
$ srvctl config database -d exp06 -a Database unique name: exp06 Database name: Oracle home: /oracle/base/dbhome01 Oracle user: oracle Spfile: Domain: Start options: open Stop options: immediate Database role: PRIMARY Management policy: AUTOMATIC Server pools: exp06 Database instances: exp06 Disk Groups: Mount point paths: Services: experiment06 Type: RAC Database is enabled Database is administrator managed $
The TYPE is RAC..not quite what we need, so let’s convert this database:
$ srvctl convert database -d exp06 -c RACONENODE -i exp06 Database unique name: exp06 Database name: Oracle home: /oracle/base/erpccst0/dbhome01 Oracle user: oracle Spfile: Domain: Start options: open Stop options: immediate Database role: PRIMARY Management policy: AUTOMATIC Server pools: exp06 Database instances: Disk Groups: Mount point paths: Services: experiment06 Type: RACOneNode Online relocation timeout: 30 Instance name prefix: exp06 Candidate servers: bedc-odb1001 Database is enabled Database is administrator managed
Before the relocation can be successful, check the following values of the sp/pfile:
*.cluster_database=TRUE *.audit_file_dest='/oracle/base/exp06/admin/exp06/adump *.diagnostic_dest='/oracle/base/exp06' exp06_1.core_dump_dest='/oracle/base/exp06/diag/rdbms/exp06/exp06_1/cdump' exp06_2.core_dump_dest='/oracle/base/exp06/diag/rdbms/exp06/exp06_2/cdump' exp06_1.undo_tablespace='UNDOTBS1' exp06_2.undo_tablespace='UNDOTBS2' *.remote_listener='rac-scan-name:1521' REMOVE: local_listener service_names
If these steps above where executed while the “original” database was still running, you MAY need to stop/start the database. Use the srvctl utilities for this.