OGG Real-Time Replication from Oracle to Hadoop HBase

1. Introduction

In the previous post, OGG real-time replication from Oracle to Hadoop delivered changes into HDFS, with the target written as a delimited-text stream. In fact, OGG can also replicate directly into HBase, Kafka, and other targets. As before, let's walk through an example.

2. Configuring replication to HBase

Copy the sample files into place:

$ pwd
/data/ogg_bd
$ ls -lrt ./AdapterExamples/big-data/hbase/
total 8
-rw-r--r--. 1 oracle oinstall 332 Aug  9 11:34 rhbase.prm
-rw-r-----. 1 oracle oinstall 958 Aug  9 11:34 hbase.props
$ cp ./AdapterExamples/big-data/hbase/* ./dirprm
$ cp ./AdapterExamples/trail/tr000000000 ./dirdat/

The bundled rhbase.prm is written for OGG's sample trail file. Instead, I will create a new replicat that applies the trail file replicated over from Oracle:

GGSCI (test1) 17> add replicat rhbase2, exttrail ./dirdat/tt
REPLICAT added.


GGSCI (test1) 18> edit param rhbase2

REPLICAT rhbase2
sourcedefs /data/ogg_bd/dirdef/def_test
TARGETDB LIBFILE libggjava.so SET property=dirprm/hbase.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP TEST.*, TARGET TEST.*;

Replicating Oracle to HBase is again a heterogeneous setup. I did not enable DDL replication on the source, so a table definition file is used here; for how to generate it on the source side, see the defgen utility, which I won't cover again.
(OGG for Big Data does in fact support replicating DDL directly.)
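For reference, a minimal sketch of a defgen parameter file (the credentials and table name below are placeholders for this example, not values from the original setup):

```
DEFSFILE ./dirdef/def_test
USERID ogg, PASSWORD ogg
TABLE TEST.T1;
```

Running `./defgen paramfile ./dirprm/defgen.prm` in the source-side OGG home then produces the definitions file, which is copied to the target's dirdef directory.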
The hbase.props file also needs a few changes. First, the original content:

$ cat hbase.props 

gg.handlerlist=hbase

gg.handler.hbase.type=hbase
gg.handler.hbase.hBaseColumnFamilyName=cf
gg.handler.hbase.keyValueDelimiter=CDATA[=]
gg.handler.hbase.keyValuePairDelimiter=CDATA[,]
gg.handler.hbase.encoding=UTF-8
gg.handler.hbase.pkUpdateHandling=abend
gg.handler.hbase.nullValueRepresentation=CDATA[NULL]
gg.handler.hbase.authType=none
gg.handler.hbase.includeTokens=false

gg.handler.hbase.mode=tx

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

#Sample gg.classpath for Apache HBase
gg.classpath=/var/lib/hbase/lib/*:/var/lib/hbase/conf/:
#Sample gg.classpath for CDH
#gg.classpath=/opt/cloudera/parcels/CDH/lib/hbase/lib/*:/etc/hbase/conf
#Sample gg.classpath for HDP
#gg.classpath=/usr/hdp/current/hbase-client/lib/*:/etc/hbase/conf

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
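As a minimal sketch (not OGG source code) of what the handler settings above imply: every column of a captured row becomes one HBase cell under the configured column family (gg.handler.hbase.hBaseColumnFamilyName=cf), keyed by the source row key.

```python
# Sketch of the row-to-cells mapping implied by hBaseColumnFamilyName=cf.
# This is an illustrative assumption, not the handler's actual implementation.
def row_to_cells(row, rowkey_col="ID", column_family="cf"):
    """Map a source row (dict) to (rowkey, {family:qualifier -> value})."""
    rowkey = str(row[rowkey_col])
    cells = {f"{column_family}:{col}": str(val) for col, val in row.items()}
    return rowkey, cells

rowkey, cells = row_to_cells({"ID": 1, "NAME": "a"})
print(rowkey, cells)  # 1 {'cf:ID': '1', 'cf:NAME': 'a'}
```

This matches the scan output seen later, where each source row appears as one cf:ID cell and one cf:NAME cell.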

Here gg.classpath needs to be switched to the CDH variant. Note, however, that the shipped CDH sample line does not include the CDH client path, so the class org/apache/hadoop/conf/Configuration cannot be found and the replicat fails with the following error:

ERROR 2017-10-18 12:03:50,213 [main] org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'userExitDataSource' defined in class path resource [oracle/goldengate/datasource/DataSource-context.xml]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [oracle.goldengate.datasource.GGDataSource]: Factory method 'getDataSource' threw exception; nested exception is java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'userExitDataSource' defined in class path resource [oracle/goldengate/datasource/DataSource-context.xml]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [oracle.goldengate.datasource.GGDataSource]: Factory method 'getDataSource' threw exception; nested exception is java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at oracle.goldengate.datasource.DataSourceLauncher.<init>(DataSourceLauncher.java:161)
        at oracle.goldengate.datasource.UserExitMain.main(UserExitMain.java:108)
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [oracle.goldengate.datasource.GGDataSource]: Factory method 'getDataSource' threw exception; nested exception is java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:189)
        at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:588)
        ... 11 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at oracle.goldengate.handler.hbase.operations.HBase1Writer.open(HBase1Writer.java:61)
        at oracle.goldengate.handler.hbase.operations.HBaseWriterFactory.init(HBaseWriterFactory.java:32)
        at oracle.goldengate.handler.hbase.HBaseHandler.init(HBaseHandler.java:236)
        at oracle.goldengate.datasource.AbstractDataSource.addDataSourceListener(AbstractDataSource.java:591)
        at oracle.goldengate.datasource.factory.DataSourceFactory.getDataSource(DataSourceFactory.java:163)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:162)
        ... 12 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 34 more
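When chasing a NoClassDefFoundError like this, it helps to check which gg.classpath entry (if any) actually contains the missing class. A hypothetical diagnostic helper (not an OGG tool) might look like:

```python
import glob
import os
import zipfile

def find_class(classpath_entries, class_name):
    """Return the first jar or conf dir on the classpath containing class_name, else None."""
    rel = class_name.replace(".", "/") + ".class"
    for entry in classpath_entries:
        # Expand wildcard entries like /path/lib/* the way a Java classpath would.
        candidates = glob.glob(entry) if "*" in entry else [entry]
        for path in candidates:
            if path.endswith(".jar") and os.path.isfile(path):
                with zipfile.ZipFile(path) as zf:
                    if rel in zf.namelist():
                        return path
            elif os.path.isdir(path) and os.path.isfile(os.path.join(path, rel)):
                return path
    return None

# Example usage (paths are from the CDH sample line and assumed, not verified here):
# find_class(["/opt/cloudera/parcels/CDH/lib/hadoop/client/*"],
#            "org.apache.hadoop.conf.Configuration")
```

If this returns None for org.apache.hadoop.conf.Configuration, the Hadoop client jars are missing from gg.classpath, which is exactly the failure above.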

So the modified hbase.props looks like this:

$ cat hbase.props

gg.handlerlist=hbase

gg.handler.hbase.type=hbase
gg.handler.hbase.hBaseColumnFamilyName=cf
gg.handler.hbase.keyValueDelimiter=CDATA[=]
gg.handler.hbase.keyValuePairDelimiter=CDATA[,]
gg.handler.hbase.encoding=UTF-8
gg.handler.hbase.pkUpdateHandling=abend
gg.handler.hbase.nullValueRepresentation=CDATA[NULL]
gg.handler.hbase.authType=none
gg.handler.hbase.includeTokens=false

gg.handler.hbase.mode=tx

goldengate.userexit.timestamp=utc
goldengate.userexit.writers=javawriter
javawriter.stats.display=TRUE
javawriter.stats.full=TRUE

gg.log=log4j
gg.log.level=INFO

gg.report.time=30sec

#Sample gg.classpath for Apache HBase
#gg.classpath=/var/lib/hbase/lib/*:/var/lib/hbase/conf/:
#Sample gg.classpath for CDH
gg.classpath=/opt/cloudera/parcels/CDH/lib/hadoop/client/*:/opt/cloudera/parcels/CDH/lib/hbase/lib/*:/etc/hbase/conf
#Sample gg.classpath for HDP
#gg.classpath=/usr/hdp/current/hbase-client/lib/*:/etc/hbase/conf

javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=ggjava/ggjava.jar
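Several properties above use a CDATA[...] wrapper so that literal characters such as '=' and ',' can be configured as delimiters. A small sketch of that unwrapping (an assumption for illustration, not OGG's actual parser):

```python
import re

def unwrap_cdata(value):
    """Unwrap CDATA[x] to x; pass other values through unchanged."""
    m = re.fullmatch(r"CDATA\[(.*)\]", value)
    return m.group(1) if m else value

print(unwrap_cdata("CDATA[=]"))  # =
print(unwrap_cdata("UTF-8"))     # UTF-8
```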

3. Starting RHBASE2

With that configured, the replicat can be started right away:

GGSCI (test1) 20> start rhbase2

Sending START request to MANAGER ...
REPLICAT RHBASE2 starting

Data on the source (Oracle):

SQL> select * from t1 order by id;

        ID NAME
---------- ----------
        1 a
        2 bbbbbbb
        3 c
        4 d
        5 e
        8 h
        9 i

7 rows selected.

On the target (HBase):

hbase(main):021:0> scan 'TEST:T1'
ROW                COLUMN+CELL                                                
1                column=cf:ID, timestamp=1508301981085, value=1            
1                column=cf:NAME, timestamp=1508301981085, value=a          
2                column=cf:ID, timestamp=1508301000924, value=2            
2                column=cf:NAME, timestamp=1508301000924, value=bbbbbbb    
3                column=cf:ID, timestamp=1508300672979, value=3            
3                column=cf:NAME, timestamp=1508300672979, value=c          
4                column=cf:ID, timestamp=1508300673013, value=4            
4                column=cf:NAME, timestamp=1508300673013, value=d          
5                column=cf:ID, timestamp=1508300673034, value=5            
5                column=cf:NAME, timestamp=1508300673034, value=e          
8                column=cf:ID, timestamp=1508300673137, value=8            
8                column=cf:NAME, timestamp=1508300673137, value=h          
9                column=cf:ID, timestamp=1508300870602, value=9            
9                column=cf:NAME, timestamp=1508300870602, value=i          
7 row(s) in 0.0560 seconds
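Note that the scan output prints one line per cell, not per row, which is why 7 rows produce 14 lines. A small sketch of regrouping (rowkey, column, value) cells back into rows:

```python
# Illustrative only: regroup HBase scan cells into per-row dicts.
def group_cells(cells):
    rows = {}
    for rowkey, column, value in cells:
        rows.setdefault(rowkey, {})[column] = value
    return rows

rows = group_cells([
    ("1", "cf:ID", "1"), ("1", "cf:NAME", "a"),
    ("2", "cf:ID", "2"), ("2", "cf:NAME", "bbbbbbb"),
])
print(len(rows))  # 2
```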

Insert a row on the source (Oracle):

SQL> insert into t1 values(10,'jjjjj'); 

1 row created.

SQL> commit;

Commit complete.

On the target (HBase):

hbase(main):023:0> scan 'TEST:T1'
ROW              COLUMN+CELL                                              
1                column=cf:ID, timestamp=1508301981085, value=1          
1                column=cf:NAME, timestamp=1508301981085, value=a        
10              column=cf:ID, timestamp=1508302049239, value=10          
10              column=cf:NAME, timestamp=1508302049239, value=jjjjj    
2                column=cf:ID, timestamp=1508301000924, value=2          
2                column=cf:NAME, timestamp=1508301000924, value=bbbbbbb  
3                column=cf:ID, timestamp=1508300672979, value=3          
3                column=cf:NAME, timestamp=1508300672979, value=c        
4                column=cf:ID, timestamp=1508300673013, value=4          
4                column=cf:NAME, timestamp=1508300673013, value=d        
5                column=cf:ID, timestamp=1508300673034, value=5          
5                column=cf:NAME, timestamp=1508300673034, value=e        
8                column=cf:ID, timestamp=1508300673137, value=8          
8                column=cf:NAME, timestamp=1508300673137, value=h        
9                column=cf:ID, timestamp=1508300870602, value=9          
9                column=cf:NAME, timestamp=1508300870602, value=i        
8 row(s) in 0.0840 seconds

4. Miscellaneous

On the HBase side, inserts, updates, and deletes are all supported; moreover, when a deleted row does not exist on the target, the replicat does not abend and keeps running.
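This is consistent with delete-on-missing-key being a no-op on the target, which the following sketch models (an assumption about target-side behavior, not OGG internals):

```python
# Toy model of applying replicated operations to a target "table" (a dict).
def apply_op(table, op, rowkey, cells=None):
    if op in ("insert", "update"):
        table.setdefault(rowkey, {}).update(cells or {})
    elif op == "delete":
        table.pop(rowkey, None)  # no error when the rowkey is missing
    return table

t = {}
apply_op(t, "insert", "1", {"cf:ID": "1"})
apply_op(t, "delete", "999")  # row does not exist; still succeeds
print(t)  # {'1': {'cf:ID': '1'}}
```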
Below are the other samples shipped with OGG for Big Data, for anyone who wants to explore further:

$ ls -l AdapterExamples/big-data/
total 12
drwxr-xr-x. 2 oracle oinstall  39 Aug  9 11:34 cassandra
drwxr-xr-x. 2 oracle oinstall  39 Aug  9 11:34 dfml
drwxr-xr-x. 2 oracle oinstall 4096 Aug  9 11:34 elasticsearch
drwxr-xr-x. 2 oracle oinstall  75 Aug  9 11:34 flume
drwxr-xr-x. 2 oracle oinstall  41 Aug  9 11:34 hbase
drwxr-xr-x. 2 oracle oinstall  39 Aug  9 11:34 hdfs
drwxr-xr-x. 2 oracle oinstall 4096 Aug  9 11:34 jdbc
drwxr-xr-x. 2 oracle oinstall  80 Aug  9 11:34 kafka
drwxr-xr-x. 2 oracle oinstall  64 Aug  9 11:34 kafka_connect
drwxr-xr-x. 2 oracle oinstall  44 Aug  9 11:34 kinesis_streams
drwxr-xr-x. 4 oracle oinstall  28 Aug  9 11:34 metadata_provider
drwxr-xr-x. 2 oracle oinstall  41 Aug  9 11:34 mongodb
-r--r--r--. 1 oracle oinstall  903 Aug  9 11:34 README.txt

Testing also shows that OGG for Big Data can replicate Oracle CLOB and BLOB columns into HBase and HDFS.

About 紫砂壶

Reflections on a life in technology