Setting up Oracle Golden Gate between 2 Oracle instances (asynchronous)

January 29, 2010

Quick Installation guide.

Oracle GoldenGate version is 10.4 for RHEL 5 and Oracle Databases version is 10.2.0.3 (unpatched 10.2 because our live DB has the same patchlevel)

Requirements:

2 databases GGSRC (source) and GGDEST (destination)
Identical schemas in databases we want to replicate (SCOTT in our example)
Network connection between two databases.
Ports need to be opened in range 7809 – 7950 in both directions
Golden Gate installation pack: ggs_redhatAS50_x64_ora10g_64bit_v10.4.0.19_002.tar.gz (freely downloadable from oracle.com)

Preinstallation tasks:

 1. Tell database to log more (supplemental logging ~10% of more redo writing) on source side.

alter database add supplemental log data;

 2. GoldenGate user creation on source and target side

useradd -d /data/home/gguser gguser

3. I preferred to put GoldenGate database user tables into different tablespace. So create the tablespace for both databases like:

Create tablespace OGGDATA datafile ‘/u01/oradata/OGGSRC/oggdata01.dbf’ size 1000M autoextend on next 100M;

4. GoldenGate schema creation into source and target database

create user OGG identified by  OGG default tablespace OGGDATA temporary tablespace TEMP profile DEFAULT;
alter user OGG QUOTA UNLIMITED ON OGGDATA;
grant CONNECT to OGG;
grant CREATE SESSION to OGG;
grant RESOURCE to OGG;
grant SELECT ANY TABLE to OGG;
grant ALTER SESSION to OGG;
grant CREATE TABLE to OGG;
grant FLASHBACK ANY TABLE to OGG;
grant SELECT ANY DICTIONARY to OGG;
grant DBA to OGG; 

(some of the privileges do not make any sense because most of them are covered in roles CONNECT and RESOURCE)
(Never do not give DBA privileges to OGG, I did it because GG doc asked me to give insert update delete privileges for replication schema, I thought later that this was for symmetric GG set up. Then you should give object privileges for certain schema objects)

Installation part:

1. Log in or su as gguser (golden gate os user we created before), create directory for GG binaries (OGG – as Oracle Golden Gate is good), unpack your installation pack and that’s all. You have to do this on the both sides.

mkdir OGG
cd OGG

Copy you installation pack in there and:

gunzip ggs_redhatAS50_x64_ora10g_64bit_v10.4.0.19_002.tar.gz
tar –xvf ggs_redhatAS50_x64_ora10g_64bit_v10.4.0.19_002.tar

2. set up your environment:

Easiest way is to add these 4 lines into your bash_profile.

(Of course edit these lines according to your current environment.)
On the source side:

export ORACLE_HOME=/data/app/oracle/product/10.2.0
export ORACLE_SID=GGSRC
export LD_LIBRARY_PATH=/data/app/oracle/product/10.2.0/lib
export PATH=$PATH:$ORACLE_HOME/bin

On the destination side:

export ORACLE_HOME=/data/app/oracle/product/10.2.0
export ORACLE_SID=GGDEST
export LD_LIBRARY_PATH=/data/app/oracle/product/10.2.0/lib
export PATH=$PATH:$ORACLE_HOME/bin

3. Verify if your main tool ggsci works. In your installation directory execute:

 ./ggsci

If no errors when opening application, then it should work.
That’s all. Installation is done. It’s even a bit weird to describe it, because it’s too simple.

Configuration.

Source side on your ggsci prompt:

GGSCI (srchost) 1> create subdirs
GGSCI (srchost) 1> status all
GGSCI (srchost) 1> edit params mgr

Your default editor has been opened, insert following lines and save:

port 7809
lagreportminutes 5
laginfominutes 1
lagcriticalminutes 2
purgeoldextracts ./dirdat/t*, minkeepdays 2, usecheckpoints

GGSCI (srchost) 1> start mgr
GGSCI (srchost) 1> status all

If manager is running then manager configuration is ok.

GGSCI (srchost) 1> dblogin userid OGG, password OGG
GGSCI (srchost) 1> list tables scott.*

If you see the list of tables then your configuration is good and you can continue:

GGSCI (srchost) 1> add trandata SCOTT.*
GGSCI (srchost) 1> info trandata SCOTT.*

Lets create and configure now the extractor process, edit add these files and save:

GGSCI (srchost) 1> edit params xtst01

extract xtst01
userid OGG, password OGG
discardfile ./dirrpt/xtst01.dsc,purge
reportcount every 15 minutes, rate
exttrail ./dirdat/t1
table SCOTT.*;

GGSCI (srchost) 1> add extract xtst01, tranlog, begin now
GGSCI (srchost) 1> add exttrail ./dirdat/t1, extract xtst01, megabytes 100
GGSCI (srchost) 1> status all
GGSCI (srchost) 1> start xtst01

Extractor created, now we create the data pump process:

GGSCI (srchost) 1> edit params ptst01

extract ptst01
passthru
rmthost desthost, mgrport 7809
rmttrail ./dirdat/t1
table SCOTT.*;

GGSCI (srchost) 1> add extract ptst01, exttrailsource ./dirdat/t1
GGSCI (srchost) 1> add rmttrail ./dirdat/t1, extract ptst01, megabytes 100
GGSCI (srchost) 1> status all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING
EXTRACT     STOPPED     PTST01      00:00:00      00:00:25
EXTRACT     RUNNING     XTST01      00:00:00      00:00:06

We are not going to start pump process yet, because destination is not configured.

Target side on your ggsci prompt:

GGSCI (desthost) 1> create subdirs
GGSCI (desthost) 1> edit params mgr

port 7809
dynamicportlist 7900-7950
lagreportminutes 5
laginfominutes 1
lagcriticalminutes 2
purgeoldextracts ./dirdat/t*, minkeepdays 2, usecheckpoints


GGSCI (desthost) 1> start mgr
GGSCI (desthost) 1> status all
GGSCI (desthost) 1> view report mgr

Manager created and verified, now we are going to create replicat processes. I’ll create the replicat with the 5 minutes lag, I just want replication to be 5 minutes behind (testing purpose):

GGSCI (desthost) 1> edit params rtst01

replicat rtst01
userid OGG, password OGG
discardfile ./dirrpt/rtst01.dsc, purge
assumetargetdefs
reportcount every 15 minutes, rate
batchsql
deferapplyinterval 5 mins
map SCOTT.*, target SCOTT.*;

GGSCI (desthost) 1> add replicat rtst01, exttrail ./dirdat/t1

That should be all about configuration , now last two steps, lets start data pump process on source side and replicat process on destination.

GGSCI (srchost) 1> start ptst01
GGSCI (desthost) 1> start rtst01

Now just check statuses on both sides with “status all” command. If everything is running and nothing abandoned you can check event logs on both sides and activity:

view ggsev
stats xtst01, totalsonly *, reportrate sec
stats rtst01, totalsonly *, reportrate sec
send RTST01, status

That’s all about initial Oracle Golden Gate installation and setup for one asynchronous data stream.

Oracle monitoring on Nagios

January 18, 2010

Intro.

In a one point we asked ourselves, is it a good idea to give Oracle Grid Control for monitoring DB-s to our control room workers (non dba’s, non administrators), who are basically incident reporters and incident managers. We decided to try Nagios (with graphing).

Background.

And as I said, purpose of this job was creating a simple tool for control room workers to get really simplified red/green colored overview.

OEM Grid Control is a super tool, but not so super for simple monitoring, it’s more like a powerful administration tool. In my point of view the other minus of OEM is dependency on agents. My practice is shown that grid agents are fragile and from time to time they just won’t respond. Probably every Grid administrator has seen a message “Agent is not responding” and after a minute or two it’s up again. And what’s more frustrating – you can’t ignore them and you can’t turn them off, because if server dies last thing it sends is “Agent is not responding”. No DB errors because there is no agent anymore. Once I was responsible for EM environment where we had 600 – 700 databases, it was impossible to rely only on Grid monitoring.

And other bad thing, Grid is not for nondba’s and it’s too sophisticated for simple monitoring.

Things should be easy.

I struggled about a week with Nagios to get this thing normally working. Probably because I’m just not too unix/linux administrator. The idea was to combine our shell scripts with graphic interface (Nagios, of course it could be whatever monitoring tool, but Nagios has a biggest community and it’s free, you can combine basically everything into Nagios, it means that nagios is as powerful as good are you on the shell) and draw some graph for illustrating situation and trends. Our scripts output was email. But I’m not big fan of reading tons of emails, especially if I want to see a quick overview, email notifications are maybe good for storing historical data.

Concept.

Shell scripts are running on nagios server and checking oracle with simple SQL queries – no any agent dependency. (I’m not saying that Grid Control is bad, but Oracle’s contcept is just a bit different, not bad, but just doesn’t work as it should.) So our scripts which sent just emails, are going to give data for nagios. For example:

/usr/local/nagios/libexec/check_oracle –tablespace DB_NAME dbsnmp password ALL 95 90

Argument ALL means all tablespaces, thresholds are critical 95 warning 90.

Command executes sqlplus, runs a tablespace free space query and formats answer for nagios and sends it into standard output:

OK – tablespaces SYSAUX=53.60% USERS=5.10% SYSTEM=49.90% ALPEPS_DATA=52.10% ALPEPS_INDEX=.10% | SYSAUX_U=561152Kbytes;943718;996147;0;1048576 USERS_U=1683456Kbytes;30198067;31875737;0;33553408 SYSTEM_U=523264Kbytes;943718;996147;0;1048576 ALPEPS_DATA_U=5460992Kbytes;9437184;9961472;0;10485760 ALPEPS_INDEX_U=1024Kbytes;4718592;4980736;0;5242880

Nagios will show this output in web console as (last line):

The other thing I did, is graphing. For graphing I used free tool – tnt4nagios. I had to use version 4.xx instead of 6.xx because of older PHP version in our Nagios server. But it works as well.

In standard output, everything after pipe is for graphing tool (performance data in nagios terminology). It’s going to be stored in RRD database, and PNP4Nagios queries the graph drawing information from there. It mean’s that, for implementing that, I had to install Nagios, RRD tool and PNP4Nagios. There was no any rpm-s for RHEL 5.4, so I build them all from the source. PNP4Nagios is quite cool, you see these orange symbols after nagios services, if you click there, you can get all the graphs you have for certain service (I snapped first two graphs, because the page is quite long – it contains historical data for each tablespace in 5 graphs – 4 hours, 24 hours, 1 week, 1 month, 1 year):

Ok these graphs were not so interesting because there is no big changes. Thanks to pnp4nagios guys, graphs are scalable, and you can select specific time gaps and etc. Maybe ping is a bit more interesting to check:

Or maybe not J. In a same way you can check db users count, some specific wait events,

Flash recovery area growth, db locks, etc (basically whatever).

The data is stored in a round-robin database in a circular buffer, it means that storage footprint remains the same over time. No any management needed.

By the way, one simple oracle check which goes over sqlplus and does a query in a database, checks anyway automatically:

  • is db alive
  • is listener alive
  • is it possible to create session
  • is network alive
  • is server itself alive

If something goes wrong and its not possible to open sqlplus session, then something is really bad already. And cool thing is, you can see immediately sqlplus error message in Nagios, for example: maximum sessions exceeded, or database shutdown in progress, no listener etc…

Conclusion.

Power of visualizing and simplicity is the key for end users. I’m not going to talk about different nagios features, because it’s just a monitoring tool. But it’s powerful, you can implement everything in there what you are able to write on a shell. It’s easy, dynamic, configurable, handy and synoptic. Ok, maybe only setup wasn’t that easy J

I suggest that tool for visualizing little shell monitoring scripts and acting as a backup Oracle monitoring tool next to Grid Control. Until I haven’t seen something better. So far I haven’t seen other monitoring products, where I could easily implement my own shell monitoring scripts.

Next thing I’m going to try is AppMon. And then maybe Open NMS and Zabbix.