Monday, June 27, 2016

Migrating OpenTSDB to a different HBase cluster




For the past few months, we've been running OpenTSDB here at Looker to collect all sorts of system-level metrics, along with JMX metrics, on our test/development Hadoop clusters.  We run hundreds of tests at hourly intervals across Spark, Presto, and Hive, and having performance telemetry data we can review is extremely valuable when troubleshooting any sort of performance-related issue.  OpenTSDB is a time series database that lets us store terabytes of performance metrics at millisecond precision.  It uses HBase as its datastore and can scale to millions of writes per second.

When I first set up our HBase cluster, I didn't configure the nodes to use AWS IAM roles.  As our infrastructure evolves, we're moving more and more to IAM roles, and to the best of my knowledge you can't attach or assign an instance role to an instance that's already running.  As a result, I needed to bring up a new HBase cluster with the IAM roles assigned at instance creation.  Once I had all the various Ansible roles built, moving the HBase/OpenTSDB data over to the new cluster was relatively straightforward.  In this post, I'll go over the steps I took to snapshot and copy the existing OpenTSDB tables from one cluster to another.  The basic steps involved in this migration are as follows:


  1. Bring up a new HBase cluster.  For this I wrote two different Ansible playbooks with a number of roles.  This was the most time-consuming part, since there are a number of Hadoop XML config files that come into play across HDFS, YARN, and HBase, along with the ZooKeeper configuration.  The first playbook is boot-cluster.yml, and the second is config-cluster.yml:


(looker)[ec2-user@ip-10-x-x-x hadoop-hbase]$ ansible-playbook boot-cluster.yml --extra-vars="cluster_id=2"                                                                                                  
 [WARNING]: provided hosts list is empty, only localhost is available

PLAY ***************************************************************************

TASK [provision master node] ***************************************************
changed: [localhost]

TASK [add master to host group] ************************************************
changed: [localhost] 

TASK [wait for SSH to come up on master] ***************************************
ok: [localhost]

TASK [provision slave nodes] ***************************************************
changed: [localhost]

TASK [add slaves to host group] ************************************************
changed: [localhost]

  Once the nodes are running, I install and configure all the necessary software (HDFS, HBase, YARN, ZooKeeper, etc.). This is broken into a number of roles with tags, which allows for quickly changing and testing configuration across the cluster (see the tag example after the play recap below).

(looker)[ec2-user@ip-10-x-x-x hadoop-hbase]$ ansible-playbook -i /mnt/looker/inventory/ec2.py config-cluster.yml --extra-vars="cluster_id=2"                                                            

PLAY ***************************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [get cluster master] ******************************************************
changed: [localhost]

TASK [get cluster slaves] ******************************************************
changed: [localhost]

TASK [set_fact] ****************************************************************
ok: [localhost]

TASK [set_fact] ****************************************************************
ok: [localhost]

.....
.....
.....

TASK [mapred-site-config : check mapred-site.xml diff] *************************
changed: [10.x.x.x]
changed: [10.x.x.x]
changed: [10.x.x.x]

TASK [mapred-site-config : overwrite with merged mapred-site.xml] **************
changed: [10.x.x.x]
changed: [10.x.x.x]
changed: [10.x.x.x]

PLAY RECAP *********************************************************************
10.x.x.x               : ok=9    changed=5    unreachable=0    failed=0   
10.x.x.x               : ok=9    changed=5    unreachable=0    failed=0   
10.x.x.x               : ok=9    changed=5    unreachable=0    failed=0   
localhost              : ok=7    changed=2    unreachable=0    failed=0   
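
Because the roles are tagged, a single config change can be pushed out without re-running the whole playbook. For example, assuming the role tags match the role names shown in the output above, just the mapred-site.xml changes could be re-applied with:

(looker)[ec2-user@ip-10-x-x-x hadoop-hbase]$ ansible-playbook -i /mnt/looker/inventory/ec2.py config-cluster.yml --tags mapred-site-config --extra-vars="cluster_id=2"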

 
  2. Next, we need to shut down collectd, which acts as the TCollector on each of the nodes. This stops all metric data from being sent to the OpenTSDB daemon(s):

(looker)[ec2-user@ip-10-x-x-x hadoop-hbase]$ ansible -i /mnt/looker/inventory/ec2.py "tag_Name_looker_emr:tag_looker_app_presto" -m shell -a '/etc/init.d/collectd stop' -u ec2-user -b
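
To double-check that nothing is still shipping metrics, a quick ad-hoc run against the same hosts can confirm collectd is down (a sketch; the pgrep check is just one way to verify):

(looker)[ec2-user@ip-10-x-x-x hadoop-hbase]$ ansible -i /mnt/looker/inventory/ec2.py "tag_Name_looker_emr:tag_looker_app_presto" -m shell -a 'pgrep collectd || echo "collectd stopped"' -u ec2-user -b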

  3. With all the collectd/TCollectors stopped, the TSD daemons can be shut down and reconfigured. We need to update tsd.storage.hbase.zk_quorum in opentsdb.conf across all the servers running a TSD daemon; this value should point to the ZooKeeper nodes of the new HBase cluster. In the example below, I'm changing one of the TSD nodes, identified by tsd_node_id=1, to point to the HBase cluster with cluster id 2 (the resulting config line is shown after the play recap below).

(looker)[ec2-user@ip-10-x-x-x opentsdb]$ ansible-playbook -i /mnt/looker/inventory/ec2.py config-opentsdb.yml  --tags get-zookeeper-nodes,opentsdb-site-config --extra-vars="cluster_id=2 tsd_node_id=1"

PLAY ***************************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [get cluster 2 zookeeper nodes] *******************************************
changed: [localhost]

TASK [set_fact] ****************************************************************
ok: [localhost]

PLAY ***************************************************************************

TASK [setup] *******************************************************************
ok: [10.x.x.x]

TASK [opentsdb-site-config : copy over opentsdb.conf] **************************
ok: [10.x.x.x]

PLAY RECAP *********************************************************************
10.x.x.x               : ok=2    changed=0    unreachable=0    failed=0   
localhost              : ok=4    changed=1    unreachable=0    failed=0   
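
After the play runs, the updated opentsdb.conf on that TSD node should point at the new cluster's ZooKeeper ensemble. Assuming the config lives in the default /etc/opentsdb/opentsdb.conf location, the relevant line looks something like this (addresses masked as elsewhere in this post):

(looker)[ec2-user@ip-10-x-x-x opentsdb]$ grep zk_quorum /etc/opentsdb/opentsdb.conf
tsd.storage.hbase.zk_quorum = 10.x.x.x:2181,10.x.x.x:2181,10.x.x.x:2181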

  4. Now that nothing is being written to the HBase cluster, we can disable the OpenTSDB tables, which flushes the table memstores, and then create the snapshots:

hbase(main):056:0> disable 'tsdb'
0 row(s) in 2.3480 seconds
hbase(main):057:0> disable 'tsdb-meta'
0 row(s) in 2.2400 seconds
hbase(main):058:0> disable 'tsdb-tree'
0 row(s) in 2.2300 seconds
hbase(main):059:0> disable 'tsdb-uid'
0 row(s) in 2.2270 seconds

hbase(main):060:0> snapshot 'tsdb','tsdb-snap'
0 row(s) in 0.1240 seconds
hbase(main):061:0> snapshot 'tsdb-meta','tsdb-meta-snap'
0 row(s) in 0.1190 seconds
hbase(main):062:0> snapshot 'tsdb-tree','tsdb-tree-snap'
0 row(s) in 0.1200 seconds
hbase(main):063:0> snapshot 'tsdb-uid','tsdb-uid-snap'
0 row(s) in 0.1200 seconds

hbase(main):064:0> list_snapshots
SNAPSHOT                                TABLE + CREATION TIME
 tsdb-meta-snap                         tsdb-meta (Mon Jun 27 15:52:59 +0000 2016)
 tsdb-snap                              tsdb (Mon Jun 27 15:52:56 +0000 2016)
 tsdb-tree-snap                         tsdb-tree (Mon Jun 27 15:53:02 +0000 2016)
 tsdb-uid-snap                          tsdb-uid (Mon Jun 27 15:53:04 +0000 2016)
4 row(s) in 0.0240 seconds

=> ["tsdb-meta-snap", "tsdb-snap", "tsdb-tree-snap", "tsdb-uid-snap"]


  5. For each of the tables, I run ExportSnapshot, which kicks off a MapReduce job that copies the snapshot data over to the new cluster. Note that HBase doesn't need to be running for this, since it's an HDFS operation. I'm running it on the old HBase HMaster node:

[hadoop@ip-10-x-x-x ~]$ /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'tsdb-snap' -copy-to hdfs://10.x.x.x:8020/hbase -mappers 5 -overwrite
2016-06-27 15:54:19,778 INFO  [main] mapreduce.Job: Job job_1466784934481_0003 running in uber mode : false
2016-06-27 15:54:19,780 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2016-06-27 15:54:29,081 INFO  [main] mapreduce.Job:  map 50% reduce 0%
2016-06-27 15:54:30,090 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2016-06-27 15:54:31,107 INFO  [main] mapreduce.Job: Job job_1466784934481_0003 completed successfully
2016-06-27 15:54:31,260 INFO  [main] snapshot.ExportSnapshot: Finalize the Snapshot Export
2016-06-27 15:54:31,269 INFO  [main] snapshot.ExportSnapshot: Verify snapshot integrity
2016-06-27 15:54:31,307 INFO  [main] snapshot.ExportSnapshot: Export Completed: tsdb-snap

Do the same thing for the tsdb-meta, tsdb-tree, and tsdb-uid tables...
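
Rather than running the export by hand for each table, the same command can be wrapped in a small loop over the snapshot names created above (the destination NameNode address is masked, as in the command above):

[hadoop@ip-10-x-x-x ~]$ for snap in tsdb-snap tsdb-meta-snap tsdb-tree-snap tsdb-uid-snap; do
>   /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
>     -snapshot "$snap" -copy-to hdfs://10.x.x.x:8020/hbase -mappers 5 -overwrite
> done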


  6. Once the snapshots are created and copied over to the new HBase cluster, check that they're available and restore them:

hbase(main):005:0> list_snapshots
SNAPSHOT                               TABLE + CREATION TIME
 tsdb-meta-snap                        tsdb-meta (Mon Jun 27 15:52:59 +0000 2016)
 tsdb-snap                             tsdb (Mon Jun 27 15:52:56 +0000 2016)
 tsdb-tree-snap                        tsdb-tree (Mon Jun 27 15:53:02 +0000 2016)
 tsdb-uid-snap                         tsdb-uid (Mon Jun 27 15:53:04 +0000 2016)
4 row(s) in 0.0470 seconds

=> ["tsdb-meta-snap", "tsdb-snap", "tsdb-tree-snap", "tsdb-uid-snap"]
hbase(main):006:0> restore_snapshot "tsdb-snap"
0 row(s) in 0.8840 seconds

hbase(main):007:0> restore_snapshot "tsdb-meta-snap"
0 row(s) in 0.4170 seconds

hbase(main):008:0> restore_snapshot "tsdb-tree-snap"
0 row(s) in 0.3910 seconds

hbase(main):009:0> restore_snapshot "tsdb-uid-snap"
0 row(s) in 0.7370 seconds

hbase(main):010:0>
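If the restored tables come back disabled (restore_snapshot operates on disabled tables), enable them from the HBase shell before pointing the TSDs at the new cluster:

enable 'tsdb'
enable 'tsdb-meta'
enable 'tsdb-tree'
enable 'tsdb-uid'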

  You can also export/copy the snapshots to S3 using:

[hadoop@ip-10-x-x-x~]$ /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'tsdb-snap' -copy-to s3a://<my-bucket>/hbase/ -mappers 5 -overwrite
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2016-06-27 20:11:13,298 INFO  [main] snapshot.ExportSnapshot: Copy Snapshot Manifest
2016-06-27 20:11:14,682 WARN  [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
2016-06-27 20:11:15,245 INFO  [main] impl.TimelineClientImpl: Timeline service address: http://10.x.x.x:8188/ws/v1/timeline/
2016-06-27 20:11:15,444 INFO  [main] client.RMProxy: Connecting to ResourceManager at /10.x.x.x:8032
2016-06-27 20:11:15,553 INFO  [main] client.AHSProxy: Connecting to Application History server at /10.x.x.x:10200
2016-06-27 20:11:16,669 INFO  [main] snapshot.ExportSnapshot: Loading Snapshot 'tsdb-06272016' hfile list
2016-06-27 20:11:17,037 INFO  [main] mapreduce.JobSubmitter: number of splits:2
2016-06-27 20:11:17,188 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1467039665192_0004
2016-06-27 20:11:17,518 INFO  [main] impl.YarnClientImpl: Submitted application application_1467039665192_0004
2016-06-27 20:11:17,573 INFO  [main] mapreduce.Job: The url to track the job: http://ip-10-x-x-x:8088/proxy/application_1467039665192_0004/
2016-06-27 20:11:17,573 INFO  [main] mapreduce.Job: Running job: job_1467039665192_0004
2016-06-27 20:11:26,808 INFO  [main] mapreduce.Job: Job job_1467039665192_0004 running in uber mode : false
2016-06-27 20:11:26,810 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2016-06-27 20:11:44,955 INFO  [main] mapreduce.Job:  map 50% reduce 0%
...
...
...
2016-06-27 20:12:00,186 INFO  [main] snapshot.ExportSnapshot: Finalize the Snapshot Export
2016-06-27 20:12:00,993 INFO  [main] snapshot.ExportSnapshot: Verify snapshot integrity
2016-06-27 20:12:02,501 INFO  [main] snapshot.ExportSnapshot: Export Completed: tsdb-snap
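
Pulling a snapshot back out of S3 works the same way; ExportSnapshot also accepts a -copy-from argument, so the reverse direction looks roughly like this (a sketch, with the destination NameNode masked):

[hadoop@ip-10-x-x-x ~]$ /usr/lib/hbase/bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot 'tsdb-snap' -copy-from s3a://<my-bucket>/hbase/ -copy-to hdfs://10.x.x.x:8020/hbase -mappers 5 -overwrite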

  7. Once the tables are restored, we can restart the TSD daemon(s) and the collectd/TCollectors on the various nodes.
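
Restarting collectd mirrors the stop command from step 2; how the TSDs are restarted depends on how the daemon is managed on your hosts (the host tag and init script for OpenTSDB below are just placeholders):

(looker)[ec2-user@ip-10-x-x-x hadoop-hbase]$ ansible -i /mnt/looker/inventory/ec2.py "tag_Name_looker_emr:tag_looker_app_presto" -m shell -a '/etc/init.d/collectd start' -u ec2-user -b
(looker)[ec2-user@ip-10-x-x-x opentsdb]$ ansible -i /mnt/looker/inventory/ec2.py "tag_looker_app_opentsdb" -m shell -a '/etc/init.d/opentsdb start' -u ec2-user -b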


Without a deployment/orchestration tool like Ansible, complex tasks like this one would be very cumbersome and error prone.  You might as well jam pencils in your eyes; it would feel better than doing this sort of stuff manually.

So there you have it, Bob's your uncle!


