OGG实时同步Oracle到Hadoop

1. 介绍

Oracle GoldenGate在12.3.0.1版本之后,支持实时同步到Hadoop,具体版本[下载]( http://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html)

我这次测试用的版本是最新的12.3.1.1.0版本。在MOS上Oracle相应地放出了一篇OGG for Big Data版本的文档,具体文档号为:Step by Step Installation of 12.2 GoldenGate Replicat for HDFS example (文档 ID 2241503.1),但是这篇文档中只是模拟了目标端,将example中的trail file文件通过OGG Big Data,应用到HDFS中。如果源端是for oracle,那么源端和我们平时配置没有任何差别,这里就不在介绍源端配置了。

2.目标端Hadoop配置

这是比较关键的一步,要验证OGG for Big Data,那首先就得有个Hadoop,我没有去测试2241503.1文档里的Hadoop配置,在前面我自已搭建了一套4节点的Cloudera Hadoop,具体版本号为:

# hadoop version
Hadoop 2.6.0-cdh5.9.0
Subversion http://github.com/cloudera/hadoop -r 1c8ae0d951319fea693402c9f82449447fd27b07
Compiled by jenkins on 2016-10-21T08:10Z
Compiled with protoc 2.5.0
From source with checksum 5448863f1e597b97d9464796b0a451
This command was run using /opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/jars/hadoop-common-2.6.0-cdh5.9.0.jar

JDK使用的为1.7
具体搭建Cloudera Hadoop,这里就不讲了,自行查询相关资料

3.OGG for Big Data目标端配置

3.1 OGG for Big Data安装环境介结

这里OGG是安装在Hadoop集群的1个节点上,我选的是NameNode主节点,其它节点我没有测试,OGG的安装用户是Oracle,Oracle用户需要添加hadoop组,其实OGG安装用户可以随意,2241503.1文档里为ogguser

# usermod -a -G hadoop oracle
# id oracle
uid=503(oracle) gid=504(oinstall) groups=504(oinstall),505(dba),491(hadoop)

3.2 解压OGG软件

$ mkdir -p /data/ogg_bd
$ unzip -q 123110_ggs_Adapters_Linux_x64.zip
$ tar xvf ggs_Adapters_Linux_x64.tar -C /data/ogg_bd

3.3 创建目录

$ ./ggsci
GGSCI (test1) 2> create subdirs

3.4 配置replicat的参数文件

这里将OGG中的样例配置文件、样例trail文件复制过来

$ pwd
/data/ogg_bd
$ cp ./AdapterExamples/big-data/hdfs/* ./dirprm
$ cp ./AdapterExamples/trail/tr000000000 ./dirdat/

添加replicat

GGSCI (test1) 2> add replicat rhdfs, exttrail ./dirdat/tr
REPLICAT added.

GGSCI (test1) 4> info rhdfs

REPLICAT  RHDFS    Initialized  2017-10-16 15:26  Status STOPPED
Checkpoint Lag      00:00:00 (updated 00:00:06 ago)
Log Read Checkpoint  File ./dirdat/tr000000000
                    First Record  RBA 0

GGSCI (test1) 5> view param rhdfs

REPLICAT rhdfs
-- Trail file for this example is located in "AdapterExamples/trail" directory
-- Command to add REPLICAT
-- add replicat rhdfs, exttrail AdapterExamples/trail/tr
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP QASOURCE.*, TARGET QASOURCE.*;

这里这个replicat的配置,不需要进行任何修改,主要要修改的是dirprm/hdfs.props,先看一下原始内容

$ cat hdfs.props

gg.handlerlist=hdfs

gg.handler.hdfs.type=hdfs
gg.handler.hdfs.includeTokens=false
gg.handler.hdfs.maxFileSize=1g
gg.handler.hdfs.rootFilePath=/ogg1
gg.handler.hdfs.fileRollInterval=0
gg.handler.hdfs.inactivityRollInterval=0
gg.handler.hdfs.fileNameMappingTemplate=${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.authType=none
gg.handler.hdfs.format=delimitedtext
gg.handler.hdfs.format.includeColumnNames=true

gg.handler.hdfs.mode=tx

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

#Sample gg.classpath for Apache Hadoop
gg.classpath=/var/lib/hadoop/share/hadoop/common/*:/var/lib/hadoop/share/hadoop/common/lib/*:/var/lib/hadoop/share/hadoop/hdfs/*:/var/lib/hadoop/share/hadoop/hdfs/lib/*:/var/lib/hadoop/etc/hadoop/:
#Sample gg.classpath for CDH
#gg.classpath=/opt/cloudera/parcels/CDH/lib/hadoop/client/*:/etc/hadoop/conf
#Sample gg.classpath for HDP
#gg.classpath=/usr/hdp/current/hadoop-client/client/*:/etc/hadoop/conf

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

这个文件里,我改的内容也不多,主要是选择一下Hadoop的gg.classpath路径,我用的是CDH,所以将Sample gg.classpath for Apache Hadoop的gg.classpath注释掉,将Sample gg.classpath for CDH放出来。另外gg.handler.hdfs.rootFilePath参数也需要设置下,这个主要指定了OGG写入到HDFS的哪个目录下,我修改成/ogg,这个目录我在hdfs中手工创建了。修改后的文件内容如下:

$ cat hdfs.props

gg.handlerlist=hdfs

gg.handler.hdfs.type=hdfs
gg.handler.hdfs.includeTokens=false
gg.handler.hdfs.maxFileSize=1g
gg.handler.hdfs.rootFilePath=/ogg
gg.handler.hdfs.fileRollInterval=0
gg.handler.hdfs.inactivityRollInterval=0
gg.handler.hdfs.fileNameMappingTemplate=${fullyQualifiedTableName}_${groupName}_${currentTimestamp}.txt
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.authType=none
gg.handler.hdfs.format=delimitedtext
gg.handler.hdfs.format.includeColumnNames=true

gg.handler.hdfs.mode=tx

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

#Sample gg.classpath for Apache Hadoop
#gg.classpath=/var/lib/hadoop/share/hadoop/common/*:/var/lib/hadoop/share/hadoop/common/lib/*:/var/lib/hadoop/share/hadoop/hdfs/*:/var/lib/hadoop/share/hadoop/hdfs/lib/*:/var/lib/hadoop/etc/hadoop/:
#Sample gg.classpath for CDH
gg.classpath=/opt/cloudera/parcels/CDH/lib/hadoop/client/*:/etc/hadoop/conf
#Sample gg.classpath for HDP
#gg.classpath=/usr/hdp/current/hadoop-client/client/*:/etc/hadoop/conf

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar

修改完,就可以启动OGG for Big Data目标端了

3.5 启动replicat

先启动MGR

GGSCI (test1) 19> start mgr
Manager started.

启动replicat

GGSCI (test1) 12> start rhdfs

3.6 启动replicat问题1

OGG report中的错误信息:

2017-10-16 16:01:34  INFO    OGG-03059  Operating system character set identified as UTF-8.

2017-10-16 16:01:34  INFO    OGG-02695  ANSI SQL parameter syntax is used for parameter parsing.
REPLICAT rhdfs
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props

2017-10-16 16:01:34  INFO    OGG-15052  Using Java class path: ggjava/ggjava.jar:ggjava/resources/lib/optional/log4j-1.2.17.jar:ggjava/resources/lib/optional/slf4j-log4j12-1.7.25.jar.

Source Context :
  SourceModule            : [gglib.ggdal.adapter.java.jni]
  SourceID                : [/scratch/aime/adestore/views/aime_adc4150335/oggcore/OpenSys/src/gglib/ggdal/Adapter/Java/jni/JNIHelper.cpp]
  SourceMethod            : [CJavaVM]
  SourceLine              : [153]
  ThreadBacktrace        : [17] elements
                          : [/data/ogg_bd/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7fb58890c54e]]
                          : [/data/ogg_bd/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x6ac) [0x7fb5888fcd4c]]
                          : [/data/ogg_bd/libgglog.so(_MSG_(CSourceContext*, int, CMessageFactory::MessageDisposition)+0x2d) [0x7fb5888e631d]]
                          : [/data/ogg_bd/libggjava.so(CJavaVM::CJavaVM(bool, CJavaVMOption const&)+0xf5) [0x7fb58047e795]]
                          : [/data/ogg_bd/libggjava.so(CJavaVM::Create(bool, CJavaVMOption const&)+0x54) [0x7fb58047e974]]
                          : [/data/ogg_bd/libggjava.so(ggs::gglib::ggdal::CJavaAdapter::Open()+0x105) [0x7fb5804543a5]]
                          : [/data/ogg_bd/replicat(ggs::gglib::ggdal::CDALAdapter::Open(ggs::gglib::ggunicode::UString const&)+0x20) [0x81d420]]
                          : [/data/ogg_bd/replicat(GenericImpl::Open(ggs::gglib::ggunicode::UString const&)+0x2c) [0x8068fc]]
                          : [/data/ogg_bd/replicat(odbc_param(char*, char*)+0xb1) [0x7fa461]]
                          : [/data/ogg_bd/replicat(get_infile_params(ggs::gglib::ggapp::ReplicationContextParams&, ggs::gglib::ggdatasource::DataSourceParams&, ggs::gglib::ggdatatarget::DataTargetParams&, ggs::gglib::ggmetadata::MetadataContext&)+0x9878) [0x5cfa68]]
                          : [/data/ogg_bd/replicat() [0x6d93dd]]
                          : [/data/ogg_bd/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x5e) [0x7d371e]]
                          : [/data/ogg_bd/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*)+0x173) [0x7d7ae3]]
                          : [/data/ogg_bd/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0x140) [0x7d8350]]
                          : [/data/ogg_bd/replicat(main+0x3b) [0x6dca2b]]
                          : [/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d79c1ed1d]]
                          : [/data/ogg_bd/replicat(__gxx_personality_v0+0x341) [0x54a409]]

2017-10-16 16:01:34  ERROR  OGG-15050  Error loading Java VM runtime library: (2 No such file or directory).

2017-10-16 16:01:34  ERROR  OGG-01668  PROCESS ABENDING.

解决办法:需要将Java的libjvm.so 和 libjsig.so库文件所在目录加入LD_LIBRARY_PATH环境变量,需要注意的是,LD_LIBRARY_PATH环境变量成效后,需要将MGR也重启一下
具体参考:OGG Big Data Adapter Replicat Abends Complaining JVM Error No such file or directory (文档 ID 2115831.1)

3.6 启动replicat问题2

错误信息

2017-10-16 16:02:27  INFO    OGG-15052  Using Java class path: ggjava/ggjava.jar:ggjava/resources/lib/optional/log4j-1.2.17.jar:ggjava/resources/lib/optional/slf4j-log4j12-1.7.25.jar.

2017-10-16 16:02:27  WARNING OGG-15053  Java method main(([Ljava/lang/String;)V) is not found in class oracle/goldengate/datasource/UserExitMain.

2017-10-16 16:02:27  WARNING OGG-15053  Java method getDataSource(()Loracle/goldengate/datasource/UserExitDataSource;) is not found in class oracle/goldengate/datasource/UserExitMain.

2017-10-16 16:02:27  WARNING OGG-15053  Java method shutdown(()V) is not found in class oracle/goldengate/datasource/UserExitMain.
Exception in thread "main" java.lang.UnsupportedClassVersionError: oracle/goldengate/datasource/UserExitMain : Unsupported major.minor version 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Source Context :
  SourceModule            : [gglib.ggdal.adapter.java]
  SourceID                : [/scratch/aime/adestore/views/aime_adc4150335/oggcore/OpenSys/src/gglib/ggdal/Adapter/Java/JavaAdapter.cpp]
  SourceMethod            : [HandleJavaException]
  SourceLine              : [243]
  ThreadBacktrace        : [16] elements
                          : [/data/ogg_bd/libgglog.so(CMessageContext::AddThreadContext()+0x1e) [0x7ff38c03254e]]
                          : [/data/ogg_bd/libgglog.so(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x6ac) [0x7ff38c022d4c]]
                          : [/data/ogg_bd/libgglog.so(_MSG_String(CSourceContext*, int, char const*, CMessageFactory::MessageDisposition)+0x39) [0x7ff38c0153b9]]
                          : [/data/ogg_bd/libggjava.so(+0x2e967) [0x7ff38457b967]]
                          : [/data/ogg_bd/libggjava.so(ggs::gglib::ggdal::CJavaAdapter::Open()+0x8b5) [0x7ff38457db55]]
                          : [/data/ogg_bd/replicat(ggs::gglib::ggdal::CDALAdapter::Open(ggs::gglib::ggunicode::UString const&)+0x20) [0x81d420]]
                          : [/data/ogg_bd/replicat(GenericImpl::Open(ggs::gglib::ggunicode::UString const&)+0x2c) [0x8068fc]]
                          : [/data/ogg_bd/replicat(odbc_param(char*, char*)+0xb1) [0x7fa461]]
                          : [/data/ogg_bd/replicat(get_infile_params(ggs::gglib::ggapp::ReplicationContextParams&, ggs::gglib::ggdatasource::DataSourceParams&, ggs::gglib::ggdatatarget::DataTargetParams&, ggs::gglib::ggmetadata::MetadataContext&)+0x9878) [0x5cfa68]]
                          : [/data/ogg_bd/replicat() [0x6d93dd]]
                          : [/data/ogg_bd/replicat(ggs::gglib::MultiThreading::MainThread::ExecMain()+0x5e) [0x7d371e]]
                          : [/data/ogg_bd/replicat(ggs::gglib::MultiThreading::Thread::RunThread(ggs::gglib::MultiThreading::Thread::ThreadArgs*)+0x173) [0x7d7ae3]]
                          : [/data/ogg_bd/replicat(ggs::gglib::MultiThreading::MainThread::Run(int, char**)+0x140) [0x7d8350]]
                          : [/data/ogg_bd/replicat(main+0x3b) [0x6dca2b]]
                          : [/lib64/libc.so.6(__libc_start_main+0xfd) [0x3d79c1ed1d]]
                          : [/data/ogg_bd/replicat(__gxx_personality_v0+0x341) [0x54a409]]

2017-10-16 16:02:27  ERROR  OGG-15051  Java or JNI exception:
java.lang.UnsupportedClassVersionError: oracle/goldengate/datasource/UserExitMain : Unsupported major.minor version 52.0.

2017-10-16 16:02:27  ERROR  OGG-01668  PROCESS ABENDING.

我当前Hadoop用的Java版本是1.7,OGG for Big Data要的版本是1.8,Hadoop用的Java 1.7 版本我将环境变量配到了/etc/profile中,如下:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

而OGG for Big Data要的1.8 Java版本,我直接配到了oracle用户的.bash_profile中,内容如下:

export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_144
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export LD_LIBRARY_PATH=/lib:/usr/lib:$JAVA_HOME/jre/lib/amd64/server:$JAVA_HOME/jre/lib/amd64; export LD_LIBRARY_PATH

处理完这两个问题,replicat就可以正常启动了

4. replicat应用结果

GGSCI (test1) 12> stats rhdfs total

Sending STATS request to REPLICAT RHDFS ...

Start of Statistics at 2017-10-16 16:13:06.

Replicating from QASOURCE.TCUSTMER to QASOURCE.TCUSTMER:

*** Total statistics since 2017-10-16 16:10:33 ***
        Total inserts                                      5.00
        Total updates                                      1.00
        Total deletes                                      0.00
        Total discards                                    0.00
        Total operations                                  6.00

Replicating from QASOURCE.TCUSTORD to QASOURCE.TCUSTORD:

*** Total statistics since 2017-10-16 16:10:33 ***
        Total inserts                                      5.00
        Total updates                                      3.00
        Total deletes                                      2.00
        Total discards                                    0.00
        Total operations                                  10.00

End of Statistics.

hdfs中查看文件

$ hdfs dfs -ls /ogg
Found 2 items
drwxr-xr-x  - oracle supergroup          0 2017-10-16 16:10 /ogg/qasource.tcustmer
drwxr-xr-x  - oracle supergroup          0 2017-10-16 16:10 /ogg/qasource.tcustord
$ hdfs dfs -ls /ogg/qasource.tcustmer
Found 1 items
-rw-r--r--  3 oracle supergroup        306 2017-10-16 16:10 /ogg/qasource.tcustmer/QASOURCE.TCUSTMER_RHDFS_2017-10-16_16-10-33.086.txt
$ hdfs dfs -cat /ogg/qasource.tcustmer/QASOURCE.TCUSTMER_RHDFS_2017-10-16_16-10-33.086.txt
IQASOURCE.TCUSTMER2015-11-05 18:45:36.0000002017-10-16T16:10:33.08200000000000000000001956CUST_CODEWILLNAMEBG SOFTWARE CO.CITYSEATTLESTATEWA
IQASOURCE.TCUSTMER2015-11-05 18:45:36.0000002017-10-16T16:10:35.71500000000000000000002126CUST_CODEJANENAMEROCKY FLYER INC.CITYDENVERSTATECO
IQASOURCE.TCUSTMER2015-11-05 18:45:39.0000002017-10-16T16:10:36.30100000000000000000003286CUST_CODEDAVENAMEDAVE'S PLANES INC.CITYTALLAHASSEESTATEFL
IQASOURCE.TCUSTMER2015-11-05 18:45:39.0000002017-10-16T16:10:36.30200000000000000000003462CUST_CODEBILLNAMEBILL'S USED CARSCITYDENVERSTATECO
IQASOURCE.TCUSTMER2015-11-05 18:45:39.0000002017-10-16T16:10:36.30200100000000000000003600CUST_CODEANNNAMEANN'S BOATSCITYSEATTLESTATEWA
UQASOURCE.TCUSTMER2015-11-05 18:45:39.0000002017-10-16T16:10:36.30300100000000000000005100CUST_CODEANNNAMECITYNEW YORKSTATENY

这个文件下载到本地操作系统后,使用vi打开,可以看到,字段间是以^A这个不可见字符隔开

关于紫砂壶

感悟技术人生
此条目发表在GoldenGate分类目录,贴了标签。将固定链接加入收藏夹。

1 则回应给 OGG实时同步Oracle到Hadoop

  1. 德富说:

    大神威武

评论已关闭。