Refer Me for Oracle: 2013

Wednesday, 2 October 2013

Oracle Golden Gate High Availability using Oracle Clusterware

OGG High Availability using Oracle clusterware
=======================================
1) Oracle Golden Gate cluster high availability Pre-requsite.

a. Oracle Golden gate runs on one server at any time.
b. In the event of the failure on one node, the oracle GG can be started on the another node.
c. In order to resume processing on the another node, we need to maintain/store the
recover relagted files (checkpoint file & trailfiles) on shared location.
d. Oracle ACFS is the recommended cluster file system for Oracle Golden Gate binaries and
trail files in Real Application Cluster configurations for ease of management and high availability.

Note: ACFS can be used for Oracle Golden Gate trail files with no restrictions. Oracle GoldenGate installation can be done on ACFS and you can also store the recovery-related files in a cluster configuration in ACFS to make them accessible to all nodes. However if your Oracle Grid Infrastructure version is older than 11.2.0.3 then ACFS mounted on multiple servers concurrently does not currently support file locking, thus you would need to mount ACFS on only one server.

If ACFS is mounted on one server at a time then file locking is supported in pre 11.2.0.3 Grid Infrastructure releases. This file locking issue has been resolved in Oracle Grid Infrastructure release 12c and the fix has been back ported up to version 11.2.0.3.

2) Oracle clusterware

a) Oracle clusterware provides the capability to manage the third-party applications.
b) There are commands to register an application and instruct Oracle Clusterware how to manage the application in a clustered environment.
c) This capability will be used to register the Oracle GoldenGate manager process as an application managed through Oracle Clusterware.
d) Oracle Clusterware can be installed standalone without an Oracle RAC database and still manage a cluster of servers and various applications running on these servers. As such Oracle Clusterware can also be installed on more than just the database servers to form a single cluster.

3) Oracle Golden Gate Installations.

a) You may choose to perform a local installation on every server, or a single installation on a shared file system. You will need shared storage for the recovery-related files. On a Unix/Linux platform you can use a symbolic link to a central location for the shared directories.

4) Virtual IP.

a) Oracle Clusterware uses the concept of a Virtual IP address (VIP) to manage high availability for applications that require incoming network traffic (including the Oracle RAC database).
b) A VIP is an IP address on the public subnet that can be used to access a server. If the server hosting the VIP were to go down, then Oracle Clusterware will migrate the VIP to a surviving server to minimize interruptions for the application accessing the server (through the VIP).
c) This concept enables faster failovers compared to time-out based failovers on a server's actual IP address in case of a server failure.
d) For Oracle GoldenGate, you should use a VIP to access the manager process to isolate access to the manager process from the physical server that is running Oracle GoldenGate. Remote pumps must use the VIP to contact the Oracle GoldenGate manager. The VIP must be an available IP address on the public subnet and cannot be determined through DHCP.
Ask a system administrator for an available fixed IP address for Oracle GoldenGate managed through Oracle Clusterware.

5. We need to instruct Oracle clusterware how to start, stop, check process.

i) Start
a) Oracle GoldenGate manager is the process that starts all other Oracle GoldenGate processes. The only process that Oracle Clusterware should start is the manager process. Use the AUTOSTART parameter in the manager parameter file to start extract and replicat processes. You can use wild cards (AUTOSTART ER *) to start all extract and replicat processes.
b) Also note that once manager is started through Oracle Clusterware, it is Oracle Clusterware that manages its availability. If you would stop manager through the command interface ggsci, then Oracle Clusterware will attempt to restart it. Use the Oracle Clusterware commands to stop Oracle GoldenGate and prevent Oracle Clusterware from attempting to restart it.

ii) check
a) The validation whether Oracle GoldenGate is running is equivalent to making sure the Oracle GoldenGate manager runs.

iii) Stop
a) Stop must stop all Oracle GoldenGate processes, including manager. Stop may be called during a planned downtime (e.g. a server is taken out of a cluster for maintenance reasons) and/or if you manually instruct Oracle Clusterware to relocate Oracle GoldenGate to a different server (e.g. to change the load on a server). If a server crashes then all processes will go down with it, in which case they can be started on another server.

Setup
=====

1 . As of now below setup is running.

a. Source is two node rac where the GG is configured using ACFS.
b. Configured one extract and one pump process on the source.
c. Target is standalone db.
d. One Replicat process is configured in the target.

2. Now we need to register the Golden gate in Oracle cluster ware. We need to use oracle clusterware commands to create, register and set privileges on the VIP and the Oracle Golden gate application.Once registered, use the Oracle Clusterware commands to start, relocate and stop Oracle GoldenGate.

3. Add an application VIP.

a) The first step is to create an application VIP. The VIP will be used to access Oracle GoldenGate.
Oracle Clusterware will assign the VIP to a physical server, and migrate the VIP if that server were to go down or if you instruct Clusterware to do so.

b. Update the below vip in the /etc/hosts file on both the nodes , ( the vip should be on the same subnet of the public ip).

########## VIP FOR GOLDENGATE ################################

192.168.0.22 goldengate-vip.manzoor.com goldengate-vip

c. Create a application VIP using below command as root user.

[root@rhel11gr2rac1 bin]# cd /grid/11.2/bin

[root@rhel11gr2rac1 bin]# ./appvipcfg -help
Production Copyright 2007, 2008, Oracle.All rights reserved
Unknown option: help

Usage: appvipcfg create -network= -ip= -vipname=
-user=[-group=] [-failback=0 | 1]
delete -vipname=

To identifiy the network number execute the below command.

[root@rhel11gr2rac1 bin]# ./crsctl stat res -p | grep -ie.network -ie subnet | grep -ie name -ie subnet
NAME=ora.net1.network
USR_ORA_SUBNET=192.168.0.0

here ora.net1 in NAME denotes the network number which is 1, and the USR_ORA_SUBNET denotes the subnet under which
the vip will be created.

Execute the below command to create the application vip.

./appvipcfg create -network=1 -ip=192.168.0.22 -vipname=goldengate-vip -user=root
Production Copyright 2007, 2008, Oracle.All rights reserved
2013-10-01 23:18:11: Creating Resource Type
2013-10-01 23:18:11: Executing /grid/11.2/bin/crsctl add type app.appvip_net1.type -basetype ora.cluster_vip_net1.type -file /grid/11.2/crs/template/appvip.type
2013-10-01 23:18:11: Executing cmd: /grid/11.2/bin/crsctl add type app.appvip_net1.type -basetype ora.cluster_vip_net1.type -file /grid/11.2/crs/template/appvip.type
2013-10-01 23:18:13: Create the Resource
2013-10-01 23:18:13: Executing /grid/11.2/bin/crsctl add resource goldengate-vip -type app.appvip_net1.type -attr "USR_ORA_VIP=192.168.0.22,START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network),STOP_DEPENDENCIES=hard(ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x',HOSTING_MEMBERS=rhel11gr2rac1.manzoor.com,APPSVIP_FAILBACK="
2013-10-01 23:18:13: Executing cmd: /grid/11.2/bin/crsctl add resource goldengate-vip -type app.appvip_net1.type -attr "USR_ORA_VIP=192.168.0.22,START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network),STOP_DEPENDENCIES=hard(ora.net1.network),ACL='owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x',HOSTING_MEMBERS=rhel11gr2rac1.manzoor.com,APPSVIP_FAILBACK="

d) Now allow the oracle clusterware owner (eg. oracle or grid) to run the script to start the VIP.

execute the below as root.

./crsctl setperm resource goldengate-vip -u user:oracle:r-x

e) As oracle user start the vip.

[oracle@rhel11gr2rac1 bin]$ ./crsctl start resource goldengate-vip
CRS-2672: Attempting to start 'goldengate-vip' on 'rhel11gr2rac1'
CRS-2676: Start of 'goldengate-vip' on 'rhel11gr2rac1' succeeded

f) Check the status of the vip.

[oracle@rhel11gr2rac1 bin]$ ./crsctl stat res goldengate-vip
NAME=goldengate-vip
TYPE=app.appvip_net1.type
TARGET=ONLINE
STATE=ONLINE on rhel11gr2rac1

e) Now we can able to ping the vip from the other nodes. Test it in node 2.

[root@rhel11gr2rac2 ~]# ping 192.168.0.22
PING 192.168.0.22 (192.168.0.22) 56(84) bytes of data.
64 bytes from 192.168.0.22: icmp_seq=1 ttl=64 time=2.29 ms
64 bytes from 192.168.0.22: icmp_seq=2 ttl=64 time=0.389 ms
64 bytes from 192.168.0.22: icmp_seq=3 ttl=64 time=0.372 ms

--- 192.168.0.22 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.372/1.018/2.295/0.903 ms

4) Now develop and agent script.

a) Oracle Clusterware runs resource-specific commands through an entity called an agent.
The agent script must be able to accept 5 parameter values: start, stop, check, clean and abort (optional).

b) Now we will create an script to and will also place the script in the shared location, here we have placed the script under the gg home which will be accessed on both the nodes. (This is the sample script provided by oracle we can also have a customized script as per our requirement).

Script name = gg_monitor_start.sh

#!/bin/sh
#goldengate_action.scr
. ~oracle/.bash_profile
[ -z "$1" ]&& echo "ERROR!! Usage $0 "&& exit 99
GGS_HOME=/golden_gate
#specify delay after start before checking for successful start
start_delay_secs=5
#Include the Oracle GoldenGate home in the library path to start GGSCI
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${GGS_HOME}
#set the oracle home to the database to ensure Oracle GoldenGate will get
#the right environment settings to be able to connect to the database
export ORACLE_HOME=/u01/app/oracle/product/11.2/db
export CRS_HOME=/grid/11.2
#Set NLS_LANG otherwise it will default to US7ASCII

export NLS_LANG=AMERICAN_AMERICA.AL32UTF8
logfile=/tmp/crs_gg_start.log

###########################
function log
###########################
{
DATETIME=`date +%d/%m/%y-%H:%M:%S`
echo $DATETIME "goldengate_action.scr>>" $1
echo $DATETIME "goldengate_action.scr>>" $1 >> $logfile
}
#check_process validates that a manager process is running at the PID
#that Oracle GoldenGate specifies.
check_process () {
dt=`date +%d/%m/%y-%H:%M:%S`
if ( [ -f "${GGS_HOME}/dirpcs/MGR.pcm" ] )
then
pid=`cut -f8 "${GGS_HOME}/dirpcs/MGR.pcm"`
if [ ${pid} = `ps -e |grep ${pid} |grep mgr |awk '{ print $1 }'` ]
then
#manager process is running on the PID . exit success
echo $dt "manager process is running on the PID . exit success">> /tmp/check.out
exit 0
else
#manager process is not running on the PID
echo $dt "manager process is not running on the PID" >> /tmp/check.out
exit 1
fi
else
#manager is not running because there is no PID file
echo $ dt"manager is not running because there is no PID file" >> /tmp/check.out
exit 1
fi
}
#call_ggsci is a generic routine that executes a ggsci command
call_ggsci () {
log "entering call_ggsci"
ggsci_command=$1
#log "about to execute $ggsci_command"
log "id= $USER"
cd ${GGS_HOME}
ggsci_output=`${GGS_HOME}/ggsci << EOF
${ggsci_command}
exit
EOF`
log "got output of : $ggsci_output"
}

case $1 in
'start')
#Updated by Sourav B (02/10/2011)
# During failover if the “mgr.pcm” file is not deleted at the node crash
# then Oracle clusterware won’t start the manager on the new node assuming the
# manager process is still running on the failed node. To get around this issue
# we will delete the “mgr.prm” file before starting up the manager on the new
# node. We will also delete the other process files with pc* extension and to
# avoid any file locking issue we will first backup the checkpoint files and then
# delete them from the dirchk directory.After that we will restore the checkpoint
# files from backup to the original location (dirchk directory).
log "removing *.pc* files from dirpcs directory..."
cd $GGS_HOME/dirpcs
rm -f *.pc*
log "creating tmp directory to backup checkpoint file...."
cd $GGS_HOME/dirchk
mkdir tmp
log "backing up checkpoint files..."
cp *.cp* $GGS_HOME/dirchk/tmp
log "Deleting checkpoint files under dirchk......"
rm -f *.cp*
log "Restore checkpoint files from backup to dirchk directory...."
cp $GGS_HOME/dirchk/tmp/*.cp* $GGS_HOME/dirchk
log "Deleting tmp directory...."
rm -rf tmp
log "starting manager"
call_ggsci 'start manager'

#there is a small delay between issuing the start manager command
#and the process being spawned on the OS . wait before checking
log "sleeping for start_delay_secs"
sleep ${start_delay_secs}
#check whether manager is running and exit accordingly
check_process
;;
'stop')
#attempt a clean stop for all non-manager processes
call_ggsci 'stop er *'
#ensure everything is stopped
call_ggsci 'stop er *!'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
'check')
check_process
exit 0
;;
'clean')
#attempt a clean stop for all non-manager processes
call_ggsci 'stop er *'
#ensure everything is stopped
call_ggsci 'stop er *!'
#in case there are lingering processes
call_ggsci 'kill er *'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
'abort')
#ensure everything is stopped
call_ggsci 'stop er *!'
#in case there are lingering processes
call_ggsci 'kill er *'
#stop manager without (y/n) confirmation
call_ggsci 'stop manager!'
#exit success
exit 0
;;
esac

c) Now we need to add a clusterware resource for the ggate application. As oracle user execute the
below command.

[oracle@rhel11gr2rac1 bin]$ ./crsctl add resource ggateapp -type cluster_resource -attr "ACTION_SCRIPT=/golden_gate/gg_monitor_start.sh,CHECK_INTERVAL=30,START_DEPENDENCIES='hard(goldengate-vip) pullup(goldengate-vip)', STOP_DEPENDENCIES='hard(goldengate-vip)'"

where ggateapp - is the application name we have given for golden gate resource.

START_DEPENDENCIES: there is a hard start dependency on goldengate-vip. This indicates that the VIP and the ggateapp application should always start together.

STOP_DEPENDENCIES: there is a hard stop dependency on goldengate-vip. This indicates that the VIP and the ggateapp application should always stop together.

d) Now set the ownership of the oracle golden gate application if it is different from the oracle clusterware owner eg(ggowner) If oracle goldengate owner is same then ignore the below.

As root execute the below command.

./crsctl setperm resource ggateapp -o ggowner

e) Now start the resource using oracle user.

[oracle@rhel11gr2rac1 bin]$ ./crsctl start res ggateapp
CRS-2672: Attempting to start 'ggateapp' on 'rhel11gr2rac1'
CRS-2676: Start of 'ggateapp' on 'rhel11gr2rac1' succeeded

[oracle@rhel11gr2rac1 bin]$ ./crsctl status res ggateapp
NAME=ggateapp
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on rhel11gr2rac1

[oracle@rhel11gr2rac1 bin]$ ./crsctl stop res ggateapp
CRS-2673: Attempting to stop 'ggateapp' on 'rhel11gr2rac1'
CRS-2677: Stop of 'ggateapp' on 'rhel11gr2rac1' succeeded

-- Now lets check the status in ggsci.

[oracle@rhel11gr2rac1 golden_gate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

GGSCI (rhel11gr2rac1.manzoor.com) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER STOPPED
EXTRACT STOPPED PTBLS 00:00:00 00:00:29
EXTRACT STOPPED XTBLS 00:00:02 00:00:27

--- Its showing stopped..

[oracle@rhel11gr2rac1 bin]$ ./crsctl start res ggateapp
CRS-2672: Attempting to start 'ggateapp' on 'rhel11gr2rac1'
CRS-2676: Start of 'ggateapp' on 'rhel11gr2rac1' succeeded

[oracle@rhel11gr2rac1 golden_gate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

GGSCI (rhel11gr2rac1.manzoor.com) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT RUNNING PTBLS 00:00:00 00:00:10
EXTRACT RUNNING XTBLS 00:00:01 00:00:09

--- Now lets relocate the ggapp to the another node (-- scheduled downtime)

[oracle@rhel11gr2rac1 bin]$ ./crsctl relocate resource ggateapp -f
CRS-2673: Attempting to stop 'ggateapp' on 'rhel11gr2rac1'
CRS-2677: Stop of 'ggateapp' on 'rhel11gr2rac1' succeeded
CRS-2673: Attempting to stop 'goldengate-vip' on 'rhel11gr2rac1'
CRS-2677: Stop of 'goldengate-vip' on 'rhel11gr2rac1' succeeded
CRS-2672: Attempting to start 'goldengate-vip' on 'rhel11gr2rac2'
CRS-2676: Start of 'goldengate-vip' on 'rhel11gr2rac2' succeeded
CRS-2672: Attempting to start 'ggateapp' on 'rhel11gr2rac2'
CRS-2676: Start of 'ggateapp' on 'rhel11gr2rac2' succeeded

-- Lets check the gg process on node 2.

[oracle@rhel11gr2rac2 golden_gate]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

GGSCI (rhel11gr2rac2.manzoor.com) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT RUNNING PTBLS 00:00:00 00:00:04
EXTRACT RUNNING XTBLS 00:00:05 00:00:06

[oracle@rhel11gr2rac1 bin]$ ./crsctl status resource ggateapp
NAME=ggateapp
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on rhel11gr2rac2

-- Now lets check how the failover is working.

Lets crash the node 2...(power of from vmware home)

Below are the staus after the node 2 is down....

Cluster Resources
--------------------------------------------------------------------------------
ggateapp
1 ONLINE OFFLINE
goldengate-vip
1 ONLINE OFFLINE STARTING

Cluster Resources
--------------------------------------------------------------------------------
ggateapp
1 ONLINE ONLINE rhel11gr2rac1
goldengate-vip
1 ONLINE ONLINE rhel11gr2rac1

-- Could see the ggateapp resource and goldengate-vip has been failed over from
node 2 to node 1...

Below is the output from ggsci.

GGSCI (rhel11gr2rac1.manzoor.com) 3> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT ABENDED PTBLS 00:00:00 00:04:10
EXTRACT RUNNING XTBLS 00:00:00 00:00:09

-- The manager and the extract process are started but the pump extract has been abended with the below
error.

2013-10-02 22:20:56 ERROR OGG-01031 There is a problem in network communication, a remote file problem, encryption keys for target and source do not matc
h (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "./dirdat/XT000016" (error 11, Resource temporarily unavailable)).

2013-10-02 22:20:56 ERROR OGG-01668 PROCESS ABENDING.

Source --

GGSCI (rhel11gr2rac1.manzoor.com) 17> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT ABENDED PTBLS 00:00:00 00:08:18
EXTRACT RUNNING XTBLS 00:00:00 00:00:00

GGSCI (rhel11gr2rac1.manzoor.com) 18> info PTBLS

EXTRACT PTBLS Last Started 2013-10-02 22:24 Status ABENDED
Checkpoint Lag 00:00:00 (updated 00:08:24 ago)
Log Read Checkpoint File ./dirdat/XT000019
2013-10-02 22:16:24.415084 RBA 1111

Target --

GGSCI (standalone2.manzoor.com) 2> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
REPLICAT RUNNING RTBLS 00:00:00 00:00:05

GGSCI (standalone2.manzoor.com) 3> info rtbls

REPLICAT RTBLS Last Started 2013-10-02 20:38 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:07 ago)
Log Read Checkpoint File ./dirdat/XT000016
2013-10-02 22:16:24.422522 RBA 1991

GGSCI (standalone2.manzoor.com) 4> send rtbls status

Sending STATUS request to REPLICAT RTBLS ...
Current status: At EOF
Sequence #: 16
RBA: 1991
0 records in current transaction

--- The replicat process shows it is completed all the data and currently it is at end of file.

-- Now we will do an et (extact trail) rollover on the source and

Source--

GGSCI (rhel11gr2rac1.manzoor.com) 21> alter extract ptbls etrollover

2013-10-02 22:34:04 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after
starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.
EXTRACT altered.

GGSCI (rhel11gr2rac1.manzoor.com) 23> start extract ptbls

Sending START request to MANAGER ...
EXTRACT PTBLS starting

GGSCI (rhel11gr2rac1.manzoor.com) 24> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT RUNNING PTBLS 00:00:00 00:00:58
EXTRACT RUNNING XTBLS 00:00:00 00:00:08

GGSCI (rhel11gr2rac1.manzoor.com) 31> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT RUNNING PTBLS 00:00:00 00:00:05
EXTRACT RUNNING XTBLS 00:00:00 00:00:00

GGSCI (rhel11gr2rac1.manzoor.com) 32> info ptbls

EXTRACT PTBLS Last Started 2013-10-02 22:35 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
Log Read Checkpoint File ./dirdat/XT000020
2013-10-02 22:20:46.216461 RBA 1111

--- Target

in target we need to move the replicat to start again from the next sequnce since we have
did an ET rollover on the source.

GGSCI (standalone2.manzoor.com) 3> stop replicat rtbls

Sending STOP request to REPLICAT RTBLS ...
Request processed.

GGSCI (standalone2.manzoor.com) 4> alter replicat rtbls extseqno 17 extrba 0
REPLICAT altered.

GGSCI (standalone2.manzoor.com) 5> start replicat rtbls

-- Now lets update some data on source.

SQL> select count(*) from emp;

COUNT(*)
----------
4000

SQL> begin
2 for i in 4001..5000 loop
3 insert into emp values (i, dbms_random.string('U',30),30);
4 END LOOP;
5 commit;
6 end;
7 /

PL/SQL procedure successfully completed.

SQL> select count(*) from emp;

COUNT(*)
----------
5000

-- Lets check whether it replicated to target.

SQL> select count(*) from emp;

COUNT(*)
----------
5000

---- Reference -- Oracle White Paper—Oracle GoldenGate high availability with Oracle Clusterware

Note - 1527310.1,

Sunday, 14 July 2013

GG using ACFS

Oracle Golden Gate setup in 11gr2 Using ACFS.
=============================================

NOTE:1371067.1 - How To Install/Reinstall Or Deinstall ACFS Modules/Installation Manually?
How To Create A New ACFS Volume & Filesystem And Set The ACFS Filesystem Ownership To A Non-Grid/Oracle OS User? [ID 1322383.1] To BottomTo Bottom
How To Convert A General Purpose ACFS Filesystem To CRS Managed On Release 11.2.0.X? [ID 1517113.1]

1) Create and mount the ACFS file system on both the nodes.

Identify the raw disk which has to be used for creating the ASM disk, we are using oracleasm libraries for
managing the asm devices.

[root@rhel11gr2rac1 ~]# /etc/init.d/oracleasm querydisk /dev/s* | grep "is not marked"
Device "/dev/sda" is not marked as an ASM disk
Device "/dev/sda1" is not marked as an ASM disk
Device "/dev/sda2" is not marked as an ASM disk
Device "/dev/sda3" is not marked as an ASM disk
Device "/dev/sdb" is not marked as an ASM disk
Device "/dev/sdb1" is not marked as an ASM disk
Device "/dev/sdc" is not marked as an ASM disk
Device "/dev/sdd" is not marked as an ASM disk
Device "/dev/sde" is not marked as an ASM disk
Device "/dev/sdf" is not marked as an ASM disk
Device "/dev/sdg" is not marked as an ASM disk
Device "/dev/sdh" is not marked as an ASM disk
Device "/dev/sdi" is not marked as an ASM disk
Device "/dev/sdj" is not marked as an ASM disk
Device "/dev/sdk" is not marked as an ASM disk
Device "/dev/sdk1" is not marked as an ASM disk
Device "/dev/sdl" is not marked as an ASM disk

From the above we need to use the disk /dev/sdk1 for the ACFS file system. Create the asm disk in
node 1.

[root@rhel11gr2rac1 ~]# /etc/init.d/oracleasm createdisk ACFSDISK /dev/sdk1
Marking disk "ACFSDISK" as an ASM disk: [ OK ]

Scan the disk in the node 2, and check whether the created disk is listing there.

[root@rhel11gr2rac2 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks: [ OK ]

[root@rhel11gr2rac2 ~]# /etc/init.d/oracleasm listdisks | grep "ACF*"
ACFSDISK

Create the Diskgroup using this disk.

SQL> select path, header_status , inst_id from GV$asm_disk where header_status = 'PROVISIONED';

PATH HEADER_STATU INST_ID
---------------------------------------- ------------ ----------
ORCL:ACFSDISK PROVISIONED 2
ORCL:ACFSDISK PROVISIONED 1

SQL> CREATE DISKGROUP ACFDG EXTERNAL REDUNDANCY
DISK 'ORCL:ACFSDISK'
ATTRIBUTE 'compatible.asm' = '11.2', 'compatible.advm' = '11.2';

Diskgroup created.

Create the ACFS Volume.

SQL> select name, total_mb, usable_file_mb from V$asm_diskgroup WHERE NAME = 'ACFDG'

NAME TOTAL_MB USABLE_FILE_MB
------------------------------ ---------- --------------
ACFDG 9759 9707

SQL> ALTER DISKGROUP ACFDG ADD VOLUME ACFSVOLUME size 9650M;

Diskgroup altered.

SQL> select name, total_mb, usable_file_mb from V$asm_diskgroup WHERE name = 'ACFDG';

NAME TOTAL_MB USABLE_FILE_MB
------------------------------ ---------- --------------
ACFDG 9759 40

Using asmcmd verify the volume that we have created.

$asmcmd

ASMCMD> volinfo -a
Diskgroup Name: ACFDG

Volume Name: ACFSVOLUME
Volume Device: /dev/asm/acfsvolume-184
State: ENABLED
Size (MB): 9664
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage:
Mountpath:

Now create a mount pain to mount this acfs volume on both the nodes.

# mkdir /golden_gate

Create the ACFS filesystem on the acfsvolume.

# [root@rhel11gr2rac1 ~]# mkfs -t acfs -b 4k /dev/asm/acfsvolume-184
mkfs.acfs: version = 11.2.0.3.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/acfsvolume-184
mkfs.acfs: volume size = 10133438464
mkfs.acfs: Format complete.

Register the acfs filesystem on the new acfs volume.

[root@rhel11gr2rac1 ~]# acfsutil registry -f -a /dev/asm/acfsvolume-184 /golden_gate
acfsutil registry: mount point /golden_gate successfully added to Oracle Registry

Now mont the file system.

[root@rhel11gr2rac1 ~]# mount.acfs -o all

or

mount -t acfs /dev/asm/acfsvolume-184 /golden_gate

[root@rhel11gr2rac1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 21G 18G 2.0G 90% /
/dev/sda1 289M 17M 258M 6% /boot
tmpfs 816M 563M 253M 69% /dev/shm
/dev/sdb1 20G 173M 19G 1% /emcontrol
/dev/asm/acfsvolume-184
9.5G 56M 9.4G 1% /golden_gate

[root@rhel11gr2rac1 ~]# ls -ld /golden_gate/
drwxrwx--- 4 root dba 4096 Jul 13 17:14 /golden_gate/

[root@rhel11gr2rac1 ~]# chown oracle:oinstall /golden_gate/

[root@rhel11gr2rac1 ~]# ls -ld /golden_gate/
drwxrwx--- 4 oracle oinstall 4096 Jul 13 17:14 /golden_gate/

Now mount the diskgroup on the node2.

ASMCMD> mount acfdg
ORA-15032: not all alterations performed
ORA-15202: cannot create additional ASM internal change segment
ORA-15041: diskgroup "ACFDG" space exhausted (DBD ERROR: OCIStmtExecute)

Unable to mount the diskgroup on the node 2, shows space exhausted, as per oracle support
we need to remove some files in the diskgroup, but here we have used to mount acfs so we
will try the reduce the size to 9g in node 1.

[root@rhel11gr2rac1 ~]# acfsutil size -500m /golden_gate/
acfsutil size: ACFS-03162: Warning: The file system was resized, but an error occurred while resizing the ADVM volume.
File system size now 9184 MB, volume size 9664 MB.

Not able to resize the volume online, so drop and recreate the volumegroup.

SQL> alter diskgroup acfdg drop volume acfsvolume
/

Diskgroup altered.

SQL> ALTER DISKGROUP ACFDG ADD VOLUME ACFSVOLUME size 9100m;

Diskgroup altered.

SQL> select name, total_mb, usable_file_mb from V$asm_diskgroup WHERE NAME = 'ACFDG';

NAME TOTAL_MB USABLE_FILE_MB
------------------------------ ---------- --------------
ACFDG 9759 581

[oracle@rhel11gr2rac1 ~]$ asmcmd
ASMCMD> volinfo -a
Diskgroup Name: ACFDG

Volume Name: ACFSVOLUME
Volume Device: /dev/asm/acfsvolume-184
State: ENABLED
Size (MB): 9120
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage:
Mountpath:

ASMCMD> exit

[oracle@rhel11gr2rac1 ~]$ su - root
Password:

[root@rhel11gr2rac1 ~]# mkfs -t acfs -b 4k /dev/asm/acfsvolume-184
mkfs.acfs: version = 11.2.0.3.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/acfsvolume-184
mkfs.acfs: volume size = 9563013120
mkfs.acfs: Format complete.

[root@rhel11gr2rac1 ~]# acfsutil registry -f -a /dev/asm/acfsvolume-184 /golden_gate
acfsutil registry: ACFS-03161: Device : /dev/asm/acfsvolume-184 : Mount Point : /golden_gate already exists in the Oracle Registry

[root@rhel11gr2rac1 ~]# mount.acfs -o all

[root@rhel11gr2rac1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 21G 18G 2.4G 88% /
/dev/sda1 289M 17M 258M 6% /boot
tmpfs 816M 563M 253M 69% /dev/shm
/dev/sdb1 20G 173M 19G 1% /emcontrol
/dev/asm/acfsvolume-184
9.0G 83M 8.9G 1% /golden_gate

check in node 2.

ASMCMD> volinfo -a
Diskgroup Name: ACFDG

Volume Name: ACFSVOLUME
Volume Device: /dev/asm/acfsvolume-184
State: ENABLED
Size (MB): 9120
Resize Unit (MB): 32
Redundancy: UNPROT
Stripe Columns: 4
Stripe Width (K): 128
Usage: ACFS
Mountpath: /golden_gate

ASMCMD> exit
[oracle@rhel11gr2rac2 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 21G 15G 5.3G 74% /
/dev/sda1 289M 17M 258M 6% /boot
tmpfs 816M 563M 253M 69% /dev/shm
/dev/asm/acfsvolume-184
9.0G 83M 8.9G 1% /golden_gate

Now it is mounted.

Register this mountpoint in CRS.

Get the information of acfs volumes.

[root@rhel11gr2rac1 ~]# acfsutil registry
Mount Object:
Device: /dev/asm/acfsvolume-184
Mount Point: /golden_gate
Disk Group: ACFDG
Volume: ACFSVOLUME
Options: none
Nodes: all

[root@rhel11gr2rac1 bin]# ./srvctl add filesystem -d /dev/asm/acfsvolume-184 -g ACFDG -v ACFSVOLUME -m /golden_gate -u oracle
PRCA-1023 : Unable to add ACFS file system resource ora.acfdg.acfsvolume.acfs for disk group acfdg and volume acfsvolume
PRCA-1046 : File system for volume device /dev/asm/acfsvolume-184 exists in the ACFS registry

To convert an ACFS mount registry based file system, first delete it from the registry. You don't need to unmount or stop it.
You must be root or asmadmin to perform this operation.

So first delete it from the registry.

This operation will not remove or otherwise disturb the files that might already be stored on that filesystem:

[root@rhel11gr2rac1 bin]# acfsutil registry -d /dev/asm/acfsvolume-184
acfsutil registry: successfully removed ACFS volume /dev/asm/acfsvolume-184 from Oracle Registry

[root@rhel11gr2rac1 bin]# ./srvctl add filesystem -d /dev/asm/acfsvolume-184 -g ACFDG -v ACFSVOLUME -m /golden_gate -u oracle

Now start the filesystem.

[oracle@rhel11gr2rac1 ~]$ srvctl start filesystem -d /dev/asm/acfsvolume-184 -n rhel11gr2rac1
[oracle@rhel11gr2rac1 ~]$ srvctl start filesystem -d /dev/asm/acfsvolume-184 -n rhel11gr2rac2

[oracle@rhel11gr2rac1 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.acfdg.acfsvolume.acfs
ONLINE ONLINE rhel11gr2rac1 mounted on /golden_
gate
ONLINE ONLINE rhel11gr2rac2 mounted on /golden_
gate

Install the ogg on /golden_gate mount point.

Unzip and untar the software.

[oracle@rhel11gr2rac1 golden_gate]$ ./ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230_FBO
Linux, x64, 64bit (optimized), Oracle 11g on Apr 23 2012 08:32:14

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

GGSCI (rhel11gr2rac1.manzoor.com) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER STOPPED

GGSCI (rhel11gr2rac1.manzoor.com) 2> create subdirs

Creating subdirectories under current directory /golden_gate

Parameter files /golden_gate/dirprm: already exists
Report files /golden_gate/dirrpt: created
Checkpoint files /golden_gate/dirchk: created
Process status files /golden_gate/dirpcs: created
SQL script files /golden_gate/dirsql: created
Database definitions files /golden_gate/dirdef: created
Extract data files /golden_gate/dirdat: created
Temporary files /golden_gate/dirtmp: created
Stdout files /golden_gate/dirout: created

Connect to db

SQL> alter database add supplemental log data (all) columns;

Database altered.

SQL> create tablespace golden_gate datafile '+DATA/layatest/datafile/golden_gate01.dbf' size 500m;

Tablespace created.

SQL> create user ggate identified by Gate_123 default tablespace golden_gate;

User created.

Grant the below privs:-

exec dbms_goldengate_auth.grant_admin_privilege('ggate');
GRANT CONNECT,RESOURCE TO ggate;
GRANT SELECT ANY DICTIONARY, SELECT ANY TABLE TO ggate;
GRANT FLASHBACK ANY TABLE TO ggate;
GRANT CREATE TABLE TO ggate;
grant dba to ggate;
grant insert any table, update any table, delete any table to ggate;
GRANT EXECUTE ON DBMS_FLASHBACK TO ggate;
GRANT EXECUTE ON UTL_FILE TO ggate;

Execute the below sql scripts.

SQL> @/golden_gate/marker_setup.sql
SQL> @/golden_gate/ddl_setup.sql
SQL> @/golden_gate/ddl_enable.sql
SQL> @/golden_gate/role_setup.sql
SQL> @/golden_gate/ddl_pin.sql

Connect to ggsci.

> edit params ./GLOBALS

GGSCHEMA GGATE
CHECKPOINTTABLE GGATE.CHKPTBL

=== MGR process parameters============================

--------------------------------------
----------MGR PROCESS PARAMETERS -----
--------------------------------------
USERID GGATE@NCMSSIT12, PASSWORD Gate_123
PORT 7810
DYNAMICPORTLIST 7900-7950
LAGCRITICALMINUTES 5

=======================================================

> start mgr

GGSCI (rhel11gr2rac1.manzoor.com) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING

GGSCI (rhel11gr2rac1.manzoor.com) 2>

GGSCI (rhel11gr2rac1.manzoor.com) 2> dblogin userid ggate password Gate_123
Successfully logged into database.

GGSCI (rhel11gr2rac1.manzoor.com) 3> add checkpointtable ggate.chkptbl

Successfully created checkpoint table ggate.chkptbl.

Test user
=========

SQL> create tablespace manzoor datafile '+DATA/layatest/datafile/manzoor01.dbf' size 300m;

Tablespace created.

SQL> create user manzoor identified by manzoor default tablespace manzoor;

User created.

SQL> grant connect, resource to manzoor;

Grant succeeded.

SQL> grant dba to manzoor;

Grant succeeded.

SQL> conn manzoor/manzoor
Connected.

SQL> create table dept (dept_id number, department_name varchar2(30));

Table created.

SQL> alter table dept add constraint pk_dept_id primary key (dept_id);

Table altered.

SQL> create table emp (emp_id number primary key, emp_name varchar2(30), dept_id number);

Table created.

SQL> alter table emp add constraint fk_dept_id foreign key (dept_id) references dept (dept_id);

Table altered.

GGSCI (rhel11gr2rac1.manzoor.com) 3> dblogin userid ggate, password Gate_123
Successfully logged into database.

GGSCI (rhel11gr2rac1.manzoor.com) 6> add trandata manzoor.dept

Logging of supplemental redo data enabled for table MANZOOR.DEPT.

GGSCI (rhel11gr2rac1.manzoor.com) 7> add trandata manzoor.emp

Logging of supplemental redo data enabled for table MANZOOR.EMP.

GGSCI (rhel11gr2rac1.manzoor.com) 8> edit params xtbls

EXTRACT XTBLS
setenv (NLS_LANG="AMERICAN_AMERICA.AL32UTF8")
userid ggate@layatest, password Gate_123
tranlogoptions dblogreader
discardfile ./dirdat/xtbls.dsc, append
exttrail ./dirdat/XT
ddl include mapped
table manzoor.dept;
table manzoor.emp;

GGSCI (rhel11gr2rac1.manzoor.com) 10> ADD EXTRACT XTBLS, TRANLOG, BEGIN NOW, THREADS 2
EXTRACT added.

GGSCI (rhel11gr2rac1.manzoor.com) 16> add exttrail ./dirdat/XT, extract xtbls, megabytes 200
EXTTRAIL added.

GGSCI (rhel11gr2rac1.manzoor.com) 25> REGISTER EXTRACT XTBLS LOGRETENTION

2013-07-13 19:35:05 INFO OGG-01749 Successfully registered EXTRACT XTBLS to start managing log retention at SCN 1868707.

GGSCI (rhel11gr2rac1.manzoor.com) 2> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER RUNNING
EXTRACT RUNNING XTBLS 00:00:02 00:00:07

If you check the status in node 2 then it will look like as below.

GGSCI (rhel11gr2rac2.manzoor.com) 1> info all

Program Status Group Lag at Chkpt Time Since Chkpt

MANAGER STOPPED
EXTRACT ABENDED XTBLS 00:00:06 00:00:07

sql

SQL> begin
2 for i in 1..10 loop
3 insert into manzoor.dept values (dbms_random.value(1,10),dbms_random.string('U',30));
4 end loop;
5 commit;
6 end;
7 /

PL/SQL procedure successfully completed.

Lets check whether it is captured in the extract.

[oracle@rhel11gr2rac2 dirdat]$ pwd
/golden_gate/dirdat
[oracle@rhel11gr2rac2 dirdat]$ ls -lrt
total 8
-rw-rw-rw- 1 oracle oinstall 256 Jul 13 20:04 xtbls.dsc
-rw-rw-rw- 1 oracle oinstall 2842 Jul 13 20:09 XT000000

------- Yet to complete---------------------------------------------------

http://www.oracle11ggotchas.com/articles/TheImportanceofBoundedRecovery.htm

Friday, 28 June 2013

Locking issues

Testing.

User session 1

MANZ SES_1>delete from emp where empno = 1;
1 row deleted.
--- Not commited.

User Session 2

MANZ_SES_2>delete from emp where empno = 1;

--- Session hanged...

DBA Session.
SQL> select a.sid, a.serial#, a.sql_id, a.event, a.p1text, a.p2text, a.p3text, a.p1, a.p2, a.p3, b.sql_text from V$session a, V$sql b where
     a.sql_hash_value = b.hash_value and a.sql_address = b.address;
       SID    SERIAL# SQL_ID        EVENT                                                            P1TEXT
---------- ---------- ------------- ---------------------------------------------------------------- ----------------------------------------------------------------
P2TEXT                                                           P3TEXT                                                                   P1         P2         P3
---------------------------------------------------------------- ---------------------------------------------------------------- ---------- ---------- ----------
SQL_TEXT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        47         11 1u6pm5aay1kcm enq: TX - row lock contention                                    name|mode                                                        usn<<16 | slot                                                   sequence                                                         1415053318     524313       2611
delete from emp where empno = 1

-- Show the second session is waiting cause of row lock contention.

As per the document.

id1
The first identifier (id1) of the enqueue or global lock takes its value from P2 or
P2RAW. The meaning of the identifier depends on the name (P1).
id2
The second identifier (id2) of the enqueue or global lock takes its value from P3 or
P3RAW. The meaning of the identifier depends on the name (P1).

select * from dba_lock where session_id = 47;

SESSION_ID LOCK_TYPE                  MODE_HELD            MODE_REQUESTED       LOCK_ID1             LOCK_ID2             LAST_CONVERT BLOCKING_OTHERS
---------- -------------------------- -------------------- -------------------- -------------------- -------------------- ------------ --------------------
        47 Transaction                None                 Exclusive            524313               2611                         3382 Not Blocking
        47 AE                         Share                None                 100                  0                           13242 Not Blocking
        47 DML                        Row-X (SX)           None                 76334                0                            3382 Not Blocking
        47 DML                        Row-X (SX)           None                 76335                0                            3382 Not Blocking

Blocking session:-
SQL> select * from dba_lock where blocking_others = 'Blocking';
SESSION_ID LOCK_TYPE                  MODE_HELD            MODE_REQUESTED       LOCK_ID1             LOCK_ID2             LAST_CONVERT BLOCKING_OTHERS
---------- -------------------------- -------------------- -------------------- -------------------- -------------------- ------------ --------------------
        49 Transaction                Exclusive            None                 524313               2611                         3474 Blocking

SQL> select p1, p2, p3 , p1text, p2text, p3text from v$session where sid = 47;

        P1         P2         P3 P1TEXT               P2TEXT               P3TEXT
---------- ---------- ---------- -------------------- -------------------- --------------------
1415053318     524313       2611 name|mode            usn<<16 | slot       sequence

-- From the above for the haning session we could see that
SQL> select p1, p1raw from V$session where sid = 47;
        P1 P1RAW
---------- ----------------
1415053318 0000000054580006

p1 denotes name|mode ( Type of the lock and lock mode -- 5458 denotes the type and 0006 the mode which is exclusive).

The name or "type" of the enqueue or global lock can be determined by looking at the
two high order bytes of P1 or P1RAW. The name is always two characters. Use the
following SQL statement to retrieve the lock name.
select chr(bitand(p1,-16777216)/16777215)||
chr(bitand(p1,16711680)/65535) "Lock"
from v$session where sid = 47;
Lock
--------
TX

p2 denotes usn<<16 | slot ( USN is the UNDO SEGMENT NUMBER AND | SLOT IS THE UNDO SLOT) this can be identified using the value of p2
p2 value is 524313
SQL> select trunc(524313/power(2,16)) USN, mod(524313,power(2,16)) SLOT from dual;
       USN       SLOT
---------- ----------
         8         25
Using the above we can find the blocking session.
SQL> select a.sid, a.serial#, a.event, b.xidusn, b.xidslot from V$session a, V$transaction b where a.taddr = b.addr and b.xidusn = 8 and b.xidslot = 25;
       SID    SERIAL# EVENT                                                                XIDUSN    XIDSLOT
---------- ---------- ---------------------------------------------------------------- ---------- ----------
        49        179 SQL*Net message from client                                               8         25

SQL> select
(select username from v$session where sid=a.sid) blocker, a.sid,' is blocking ',
(select username from v$session where sid=b.sid) blockee, b.sid
from v$lock a, v$lock b
where a.block = 1
and b.request > 0
and a.id1 = b.id1
and a.id2 = b.id2
/
BLOCKER                               SID 'ISBLOCKING' BLOCKEE                               SID
------------------------------ ---------- ------------- ------------------------------ ----------
MANZOOR                                49 is blocking MANZOOR                                47

select username,
v$lock.sid,
trunc(id1/power(2,16)) rbs,
bitand(id1,to_number('ffff','xxxx'))+0 slot,
id2 seq,
lmode,
request
from v$lock, v$session
where v$lock.type = 'TX'
and v$lock.sid = v$session.sid
and v$session.username = 'MANZOOR'
/

USERNAME                              SID        RBS       SLOT        SEQ      LMODE    REQUEST
------------------------------ ---------- ---------- ---------- ---------- ---------- ----------
MANZOOR                                49          8         25       2611          6          0
MANZOOR                                47          8         25       2611          0          6

select
(select username from v$session where sid=a.sid) BLOCKER,
a.sid Blocker_SID,
(select serial# from V$session where sid=a.sid) SERIAL#,
' Is Blocking ',
(select username from V$session where sid=b.sid) Blocked_User,
b.sid Blocked_SID
from v$lock a, v$lock b
where a.block = 1
and b.request > 0
and a.id1 = b.id1
and a.id2 = b.id2
/
BLOCKER                        BLOCKER_SID    SERIAL# 'ISBLOCKING' BLOCKED_USER                   BLOCKED_SID
------------------------------ ----------- ---------- ------------- ------------------------------ -----------
MANZOOR                                 49        179 Is Blocking MANZOOR                                 47

Blocking session using ASH

SQL> select        count(*) cnt,
              session_id sid,
              substr(event,1,30) event,
              mod(p1,16) as lm,
                  sql_id,
                  CURRENT_OBJ# || ' ' || object_name obj
                , o.object_type type
                , CURRENT_FILE# file#
                , CURRENT_BLOCK# block#
            , blocking_session bsid
    from v$active_session_history ash,
         all_objects o
    where
            event like 'enq: T%'
      and o.object_id (+)= ash.current_obj#
   group by event,session_id,p1,sql_ID,CURRENT_OBJ#,OBJECT_NAME,OBJECT_TYPE,CURRENT_FILE#, CURRENT_BLOCK#, BLOCKING_SESSION
order by count(*)
   /
       CNT        SID EVENT                                                                                                                            LM SQL_ID        OBJ                                                                     TYPE                     FILE#     BLOCK#       BSID
---------- ---------- ------------------------------------------------------------------------------------------------------------------------ ---------- ------------- ----------------------------------------------------------------------- ------------------- ---------- ---------- ----------
      1552         47 enq: TX - row lock contention                                                                                                     4 5hnbsyk0zh8p6 -1                                                                                                   0          0         49
      4386         47 enq: TX - row lock contention                                                                                                     6 1u6pm5aay1kcm 76335 EMP                                                               TABLE                        5        142         49

SQL> select a.sample_id,a.sample_time,a.session_id,a.event,
             a.session_state,a.event,a.sql_id,
    a.blocking_session,a.blocking_session_status
    from v$active_session_history a, dba_users u
    where u.user_id = a.user_id
    and u.username = 'MANZOOR';
SAMPLE_ID SAMPLE_TIME                                                                 SESSION_ID EVENT                                                            SESSION
---------- --------------------------------------------------------------------------- ---------- ---------------------------------------------------------------- -------
EVENT                                                            SQL_ID        BLOCKING_SESSION BLOCKING_SE
---------------------------------------------------------------- ------------- ---------------- -----------
    202272 28-JUN-13 10.27.53.538 PM                                                           47 enq: TX - row lock contention                                    WAITING
enq: TX - row lock contention                                    1u6pm5aay1kcm               49 VALID
    202271 28-JUN-13 10.27.52.538 PM                                                           47 enq: TX - row lock contention                                    WAITING
enq: TX - row lock contention                                    1u6pm5aay1kcm               49 VALID

some useful site.
https://sites.google.com/site/embtdbo/wait-event-documentation/oracle-enqueues#TOC-enq:-TX---row-lock-contention-mode-6

Monday, 17 June 2013

Replicat abend issues

1. Issue
========

Replicat abended with the error

2013-06-17 17:08:45 ERROR OGG-01163 Bad column length (7) specified for column COMM_NEED_FLG in table SIEBEL.S_SRV_REQ, maximum allowable length is 4.

2. Steps handled to solve.
==========================

>> -- Problemtic Record from the report file.

2013-06-17 17:08:45 ERROR OGG-01163 Bad column length (7) specified for column COMM_NEED_FLG in table SIEBEL.S_SRV_REQ, maximum allowable length is 4.

***********************************************************************
* ** Run Time Statistics ** *
***********************************************************************

Reading ./dirdat/PQ000364, current RBA 821565342, 26218 records

Report at 2013-06-17 17:08:45 (activity since 2013-06-17 17:06:59)

From Table SIEBEL.S_SRV_REQ to SIEBEL.S_SRV_REQ:
# inserts: 0
# updates: 26218
# deletes: 0
# discards: 0

Last log location read:
FILE: ./dirdat/PQ000364
SEQNO: 364
RBA: 821565342
TIMESTAMP: 2013-06-16 01:33:00.838513
EOF: NO
READERR: 0

>> -- Problemtic record from the logdump file.

logdump> open ./dirdat/PQ000364
logdump> ghdr on
logdump> detail on
logdump> detail data
logdump> ggstoken on
logdump> ggstoken detail
logdump> pos 821565342
logdump> n

==============================================================================
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 4127 (x101f) IO Time : 2013/06/16 01:33:00.838.513
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x01) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 13685 AuditPos : 201126860
Continued : N (x00) RecCount : 1 (x01)

2013/06/16 01:33:00.838.513 FieldComp Len 4127 RBA 821565342
Name: SIEBEL.S_SRV_REQ

After Image: Partition 4 G m
0000 000d 0000 0009 312d 3131 3747 5246 4f00 0100 | ........1-117GRFO...
1500 0032 3031 322d 3031 2d30 363a 3033 3a30 313a | ...2012-01-06:03:01:
3538 0002 000b 0000 0007 312d 524a 5936 5a00 0300 | 58........1-RJY6Z...
1500 0032 3031 322d 3031 2d30 363a 3033 3a30 323a | ...2012-01-06:03:02:
3030 0004 000b 0000 0007 312d 524a 5936 5a00 0500 | 00........1-RJY6Z...
0a00 0000 0000 0000 0000 0100 0600 0500 0000 0130 | ...................0
0007 0005 0000 0001 4e00 0800 0500 0000 014e 0009 | ........N........N..
================================================================================

Note -- The above is the problemetic record where the column comm_need_flg is having some wrong data, also this
particular record is in the middle of the trasaction (TransInd : . (x01)), so all the records from the
begining to this transaction will be rolled backed by ogg.

logdump>n

___________________________________________________________________
Hdr-Ind : E (x45) Partition : . (x04)
UndoFlag : . (x00) BeforeAfter: A (x41)
RecLength : 4409 (x1139) IO Time : 2013/06/16 01:33:00.838.513
IOType : 15 (x0f) OrigNode : 255 (xff)
TransInd : . (x01) FormatType : R (x52)
SyskeyLen : 0 (x00) Incomplete : . (x00)
AuditRBA : 13685 AuditPos : 201130364
Continued : N (x00) RecCount : 1 (x01)

2013/06/16 01:33:00.838.513 FieldComp Len 4409 RBA 821569564

Note -- The above is the next record of the problemetic record. We can start the replicat from this RBA.
Before that we need to handle the previous records in this transaction.

>>-- Add the below parameters in the abended replicat group and start the replicat. This parameter will
commit the records one by one so all the records will be commited and the replicat will get abend when
exactly it reaches the problemtic record.

grouptransops 1
maxtransops 1

>>-- Now take the export of that particualr record from the source and import to the target table.

>>-- Now alter the replicat to the next good RBA which we have identified by logdump (RBA 821569564).

>>-- Remove the params GROUPTRANSOPS 1 and MAXTRANSOPS 1 from the replciat group.

ggsci>dblogin userid <username>, password <password>
ggsci>alter replicat <replicat-name>, extrba 821569564 -- No need to specify the seq no if we are altering on the same trail file seq.
ggsci> start replicat <replicat-name>

Thanks

Sunday, 26 May 2013

Re-Extract scenario in golden gate

Re-Extracting Scenario in Golden gate

Scenario.
1) One extract process which extract the details for 2 tables.
2) One Pump process been configured in the source.
3) Two replicat process on the target.

-- The pump process in the source has been configured to pump for only one table, and that table
is in sync now..
Test case--> The second table which is in the extract process but not added in the pump process
hence we need to reextract data and pump the details of the second table as well.

   --> Stop the pump process on the source.
GGSCI (rhel11gr2rac1.manzoor.com) 3> stop pgrp1
Sending STOP request to EXTRACT PGRP1 ...
Request processed.

   --> Stop all the replicat process on the target --( Since we have the same pump for all the replicat process so we need to stop the replicate to avoid the data corruption issue)


-- Check the currnet rba for all the other replicates, beacuse up to this rba this replicate has been processed the
transactions, hence we need to start the replicat after this.
GGSCI (rhel11gr2rac2.manzoor.com) 2> send rgrp1 status
Sending STATUS request to REPLICAT RGRP1 ...
Current status: At EOF
Sequence #: 9
RBA: 16760430
0 records in current transaction

GGSCI (rhel11gr2rac2.manzoor.com) 3> send rgrp1 logend
Sending LOGEND request to REPLICAT RGRP1 ...
YES.

GGSCI (rhel11gr2rac2.manzoor.com) 13> stop replicat RGRP1
Sending STOP request to REPLICAT RGRP1 ...
Request processed.

GGSCI (rhel11gr2rac2.manzoor.com) 4> stop replicat RGRP2
Sending STOP request to REPLICAT RGRP2 ...
Request processed.

--> We need to identify from where to re extract and pump
    -- in target we will open the trail file and find from where to re extract

Logdump 225 >open ./dirdat/fg000009
Current LogTrail is /golden_gate/dirdat/fg000009
Logdump 226 >ghdr on
Logdump 227 >detail on
Logdump 228 >detail data
Logdump 229 >pos 0
Reading forward from RBA 0
Logdump 233 >n
___________________________________________________________________
Hdr-Ind    :     E (x45)     Partition :     . (x04)
UndoFlag   :     . (x00)     BeforeAfter:     A (x41)
RecLength :    54 (x0036)   IO Time    : 2013/05/25 18:59:37.062.868
IOType     :     5 (x05)     OrigNode   :   255 (xff)
TransInd   :     . (x03)     FormatType :     R (x52)
SyskeyLen :     0 (x00)     Incomplete :     . (x00)
AuditRBA   :        113       AuditPos   : 224964
Continued :     N (x00)     RecCount   :     1 (x01)
2013/05/25 18:59:37.062.868 Insert               Len    54 RBA 1314
Name: MANZOOR.EMP
After Image:                                             Partition 4   G s
0000 000a 0000 0006 3130 3030 3031 0001 0008 0000 | ........100001......
0004 5465 7374 0002 000a 0000 0006 3130 3030 3031 | ..Test........100001
0003 000a 0000 0006 3130 3030 3031                | ........100001
Column     0 (x0000), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001
Column     1 (x0001), Len     8 (x0008)
0000 0004 5465 7374                               | ....Test
Column     2 (x0002), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001
Column     3 (x0003), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001

---> Since its a test case we can identify that above is the first record
not dows then auditpos -- 224964

-- Now go to source and find the rba for this auditpos 224964
-- We need to find out in which trail the auditpos start, lets start with the last two.
Logdump 125 >open ./dirdat/fg000065
Current LogTrail is /golden_gate/dirdat/fg000065
Logdump 116 >ghdr on
Logdump 117 >detail on
Logdump 118 >detail data
Logdump 119 >fileheader on
Logdump 120 >filter include auditrba 224964
Logdump 121 >show filter
Data filters are ENABLED
Include Match ANY
AuditRba   : = 224964 (x0000000000036ec4)
Exclude Match ANY
Logdump 126 >n
___________________________________________________________________
Hdr-Ind    :     E (x45)     Partition :     . (x04)
UndoFlag   :     . (x00)     BeforeAfter:     A (x41)
RecLength :    54 (x0036)   IO Time    : 2013/05/25 18:59:37.000.000
IOType     :     5 (x05)     OrigNode   :   255 (xff)
TransInd   :     . (x00)     FormatType :     R (x52)
SyskeyLen :     0 (x00)     Incomplete :     . (x00)
AuditRBA   :        113       AuditPos   : 224964
Continued :     N (x00)     RecCount   :     1 (x01)
2013/05/25 18:59:37.000.000 Insert               Len    54 RBA 1106
Name: MANZOOR.EMP
After Image:                                             Partition 4   G b
0000 000a 0000 0006 3130 3030 3031 0001 0008 0000 | ........100001......
0004 5465 7374 0002 000a 0000 0006 3130 3030 3031 | ..Test........100001
0003 000a 0000 0006 3130 3030 3031                | ........100001
Column     0 (x0000), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001
Column     1 (x0001), Len     8 (x0008)
0000 0004 5465 7374                               | ....Test
Column     2 (x0002), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001
Column     3 (x0003), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001

Filtering suppressed      2 records

--> We have found the record --- node down the rba -- 1106 , we need re extract pump from this rba.

--> Now inlcude the table in the pump process
extract pgrp1
userid ggate@source, password ggate
rmthost 192.168.0.21, mgrport 7809
rmttrail ./dirdat/fg
passthru
table manzoor.emp;
table manzoor.emp1;

--ALTER EXTRACT PGRP1, EXTSEQNO 65, EXTRBA 1106
--ALTER EXTRACT PGRP1, ETROLLOVER

GGSCI (rhel11gr2rac1.manzoor.com) 7> ALTER EXTRACT PGRP1, ETROLLOVER
2013-05-25 20:37:43 INFO    OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract,
issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.
EXTRACT altered.

GGSCI (rhel11gr2rac1.manzoor.com) 8> ALTER EXTRACT PGRP1, EXTSEQNO 65, EXTRBA 1106
EXTRACT altered.

GGSCI (rhel11gr2rac1.manzoor.com) 2> start extract p*
Sending START request to MANAGER ...
EXTRACT PGRP1 starting

GGSCI (rhel11gr2rac1.manzoor.com) 3> info all
Program     Status      Group       Lag at Chkpt Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     PGRP1       00:00:00      00:04:08
EXTRACT     RUNNING     XGRP1       00:00:00      00:00:08

Target

Now in target check whether the 2nd table data are reflected in the trail.

Logdump 365 >open ./dirdat/fg000012
Current LogTrail is /golden_gate/dirdat/fg000012
Logdump 366 >ghdr on
Logdump 367 >detail on
Logdump 368 >detail data
Logdump 369 >pos 0
Logdump 370 > n

Hdr-Ind    :     E (x45)     Partition :     . (x04)
UndoFlag   :     . (x00)     BeforeAfter:     A (x41)
RecLength :    54 (x0036)   IO Time    : 2013/05/25 18:59:36.997.462
IOType     :     5 (x05)     OrigNode   :   255 (xff)
TransInd   :     . (x02)     FormatType :     R (x52)
SyskeyLen :     0 (x00)     Incomplete :     . (x00)
AuditRBA   :        113       AuditPos   : 227352
Continued :     N (x00)     RecCount   :     1 (x01)
2013/05/25 18:59:36.997.462 Insert               Len    54 RBA 1265
Name: MANZOOR.EMP1
After Image:                                             Partition 4   G e
0000 000a 0000 0006 3130 3030 3031 0001 0008 0000 | ........100001......
0004 5465 7374 0002 000a 0000 0006 3130 3030 3031 | ..Test........100001
0003 000a 0000 0006 3130 3030 3031                | ........100001
Column     0 (x0000), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001
Column     1 (x0001), Len     8 (x0008)
0000 0004 5465 7374                               | ....Test
Column     2 (x0002), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001
Column     3 (x0003), Len    10 (x000a)
0000 0006 3130 3030 3031                          | ....100001

--- Yes now the second table data is reflected in the trail file

GGSCI (rhel11gr2rac2.manzoor.com) 1> info all
Program     Status      Group       Lag at Chkpt Time Since Chkpt
MANAGER     RUNNING
REPLICAT    STOPPED     RGRP1       00:00:00      01:10:19
REPLICAT    STOPPED     RGRP2       00:00:00      01:45:25

-- Now alter the first replict to start from when its completed earlier.
16760262

GGSCI (rhel11gr2rac2.manzoor.com) 2> send rgrp1 status
Sending STATUS request to REPLICAT RGRP1 ...
Current status: At EOF
Sequence #: 9
RBA: 16760430
0 records in current transaction

---
Below is the report file of the RGRP1 group.
Last record for the last committed transaction is the following:
___________________________________________________________________
Trail name : ./dirdat/fg000009
Hdr-Ind    :     E (x45)     Partition :     . (x04)
UndoFlag   :     . (x00)     BeforeAfter:     A (x41)
RecLength :    54 (x0036)    IO Time    : 2013-05-25 18:59:56.995530
IOType     :     5 (x05)     OrigNode   :   255 (xff)
TransInd   :     . (x03)     FormatType :     R (x52)
SyskeyLen :     0 (x00)     Incomplete :     . (x00)
AuditRBA   :        115       AuditPos   : 48669052
Continued :     N (x00)     RecCount   :     1 (x01)
2013-05-25 18:59:56.995530 Insert             Len    54 RBA 16760262
Name: MANZOOR.EMP
___________________________________________________________________
Reading ./dirdat/fg000009, current RBA 16760430, 200000 records
Report at 2013-05-25 19:57:24 (activity since 2013-05-25 18:49:12)

===== Need to know the scn for the last transaction up to which we have already completed for the 1st transaction.

Logdump 777 >open ./dirdat/fg000009
Current LogTrail is /golden_gate/dirdat/fg000009
Logdump 778 >ghdr on
Logdump 779 >detail on
Logdump 780 >detail data
Logdump 781 >pos 16760262
Reading forward from RBA 16760262
Logdump 782 >ggstoken on
Logdump 783 >ggstoken detail
Logdump 784 >n
___________________________________________________________________
Hdr-Ind    :     E (x45)     Partition :     . (x04)
UndoFlag   :     . (x00)     BeforeAfter:     A (x41)
RecLength :    54 (x0036)   IO Time    : 2013/05/25 18:59:56.995.530
IOType     :     5 (x05)     OrigNode   :   255 (xff)
TransInd   :     . (x03)     FormatType :     R (x52)
SyskeyLen :     0 (x00)     Incomplete :     . (x00)
AuditRBA   :        115       AuditPos   : 48669052
Continued :     N (x00)     RecCount   :     1 (x01)
2013/05/25 18:59:56.995.530 Insert               Len    54 RBA 16760262
Name: MANZOOR.EMP
After Image:                                             Partition 4   G s
0000 000a 0000 0006 3230 3030 3030 0001 0008 0000 | ........200000......
0004 5465 7374 0002 000a 0000 0006 3230 3030 3030 | ..Test........200000
0003 000a 0000 0006 3230 3030 3030                | ........200000
Column     0 (x0000), Len    10 (x000a)
0000 0006 3230 3030 3030                          | ....200000
Column     1 (x0001), Len     8 (x0008)
0000 0004 5465 7374                               | ....Test
Column     2 (x0002), Len    10 (x000a)
0000 0006 3230 3030 3030                          | ....200000
Column     3 (x0003), Len    10 (x000a)
0000 0006 3230 3030 3030                          | ....200000
GGS tokens:
TokenID x52 'R' ORAROWID         Info x00 Length   20
4141 4153 6859 4141 4641 4141 416e 6441 4271 0001 | AAAShYAAFAAAAndABq..
TokenID x4c 'L' LOGCSN           Info x00 Length    7
3139 3637 3336 36                                 | 1967366
TokenID x36 '6' TRANID           Info x00 Length    9
362e 3232 2e31 3532 38                            | 6.22.1528

== The scn no is 1967366 , hence we need to start the
alter replicat RGRP1, extseqno 09 extrba 0
start replicat RGRP1 aftercsn 1967366

== Now we need to start the 2 nd replicat.
alter replicat RGRP2 EXTSEQNO 12 EXTRBA 0
start replicat rgrp2

=======================