Friday, September 30, 2016

JQ is a Kung fu Swiss Army knife!



If you need to do some basic reformatting of JSON data that's already been processed, try reaching for the Swiss Army knife of command line JSON Kung Fu,  JQ. There's a lot of different JSON parsing libraries out there, name your favorite language and there has to be at least a few you can use. Sometimes I just want to git 'R dun and not have to deal with writing a program, compiling, and then deploying it. For those times, the standard Unix toolkit and JQ can get you a long way quickly.


For instance, recently I was adding support for loading our Zendesk chat data into Google Big Query.  We currently load this data into Redshift, but we're moving towards using Big Query as our datastore for more and more datasets.  There were three things that I had to do with the existing data.  First was to transform all the JSON keys to snake case ,  second was to transform the timestamp field into a timestamp that I could load into Big Query, and third was to select out certain keys I wanted to load into the chat table.


So the JSON looks like this:




{"chat-date":"2016-03-01","ticket-id":"12345","chat-line-timestamp":"05:52:20 PM","chatter":"Mr. Magoo",
"chat-line":"Hello","created-at":"2016-03-01 05:52:20 PM"}
{"chat-date":"2016-03-01","ticket-id":"12345","chat-line-timestamp":"05:52:41 PM","chatter":"Daisy Duke",
"chat-line":"Hey Mr. Magoo","created-at":"2016-03-01 05:52:41 PM"}
{"chat-date":"2016-03-01","ticket-id":"12345","chat-line-timestamp":"05:52:44 PM","chatter":"Daisy Duke",
"chat-line":"How's it going?","created-at":"2016-03-01 05:52:44 PM"}


and once we run it thru the meat grinder we get what we want:




{"ticket_id":"12345","chatter":"Mr. Magoo","chat_line":"Hello","created_at":1456854740}
{"ticket_id":"12345","chatter":"Daisy Duke","chat_line":"Hey Mr. Magoo","created_at":1456854761}
{"ticket_id":"12345","chatter":"Daisy Duke","chat_line":"How's it going?","created_at":1456854764}



Dan-Young:~ dan$ gawk 'BEGIN { print strftime("%c", 1456854764); }'
Tue 01 Mar 2016 05:52:44 PM UTC


the easy JQ one-liner that made this all possible:



jq -r 'to_entries | 
map({"key": .key | gsub("-";"_")} + {"value": (if (.key =="created-at") then (.value | strptime("%Y-%m-%d %I:%M:%S %p")|mktime) 
else .value  end)}) | from_entries | with_entries(select(.key == ("chatter","created_at","ticket_id","chat_line"))) |tostring'

So next time you need to Kung fu some JSON, give the command line a try!  It's magic sauce....


So there you have it, Bob's your uncle !


Keeping in-sync with Salesforce

Keeping two or more data stores in-sync is always a fun task. With the help of Apache NiFi and some decently tooling, I've been able to...