"2015-12-12:cart::checkoutnow:impression_2155"
to hash keys with this structure: "2015-12-22_2155". Each key has a variety of key/value pairs that represent a given event type for the listing id and date. For example:
127.0.0.1:6379> hgetall "2015-12-22_2155"
1) "ldp:sellPanel:addToCart:impression"
2) "6"
3) "ldp:sellPanel::impression"
4) "1"
5) "ldp:::impression"
6) "1"
7) "ldp:sellPanel:addToCart:click"
8) "3"
9) "cart::checkoutNow:impression"
10) "3"
11) "cart::checkoutNow:click"
12) "1"
Since our requirements are fuzzy at best, I wanted to have a way to iterate quickly and prototype different functionality along with ways to query the data. For this I chose to leverage the powerful scripting language Lua. Starting in version 2.6, the Lua interpreter was built into Redis. Lua is very powerful and extremely fast and is a great way to go if you have to do "advanced" data processing with Redis. A full description of the language can be found on the Lua.org website, I would encourage you to take a quick read through the documentation. One key point to highlight is that the Lua table is the data structure in Lua. All structures that other languages offer---arrays, records, lists, queues, sets---are represented with tables in Lua. At first this may seem like a constraint, but in reality it doesn't turn out to be that way. You can store pretty much anything in tables.
The initial requirements were as follows:
A. Run a predefined calculation against a listing id for a given day
B. Be able to run multiple predefined calculations against a given listing id for a given day
C. Run A and B over a list of listing ids
D. Run C for a given date range
The first part of data wrangling was to get the list of keys we're interested in. For this I used the SCAN operator. SCAN provides an iterator allowing for incremental iteration, returning only a small number of elements per call. This is the recommended way when working with keys since calling KEYS may block the server for long periods of time. The SCAN looks something likes this and returns a cursor based iterator:
repeat
ok=redis.call("scan",cursor,
"match",start_year .. "-" .. start_month .. "-*_*",
"count",100)
cursor = ok[1]
scan_keys = ok[2]
....
....
The ok variable is a Lua table, where the first row is a cursor id returned by the server, and the second row is the keys. Each time we call redis.call(), we provide the next cursor id that the server provided from previous call. This continues until the server returns "0", at that point we've iterated thru all the keys for the call. Since the above call will give us back all the keys that match a given range between our starting year and month, we'll need to further whittle down the data to the set of data we're interested in. To filter the keys down, a simple higher-order function called "filter" would do the trick.
...
filter = function (func, tbl)
local t = {}
for i,v in pairs(tbl) do
if func(v) then
table.insert(t,v)
end
end
return t
end,
within_date_range = function (k)
local _,_,y,m,d,l = string.find(k,"(%d%d%d%d)-(%d%d)-(%d%d)_(%d+)")
local key_ts = y + m + d
local start_ts = start_year + start_month + start_day
local end_ts = end_year + end_month + end_day
if (key_ts >= start_ts and key_ts <= end_ts and listing_ids[l]) then
return true
else
return false
end
end,
...
Calling the filter function will prune down the keys to just the date range we're interested in. Note, currently there's no support to span between months, this is something we can work on later. Once we have the key set of interest, we can iterate over the metrics we're interested in and calculate them.
The initial calculations are all stored as Lua functions in a table. The first two such metrics we want to report on are related to click through rates. For example, knowing how many people add an item to their cart and either proceed to checking out or abandoned their cart. When the need to add additional metric calculations arises, we can easily add functions to the utils table.
metrics = {
['ldp_add_to_cart_ctr'] = function (k)
local r = nil
local a = redis.call("hget",k, "ldp:sellPanel:addToCart:click")
local b = redis.call("hget",k, "ldp:sellPanel:addToCart:impression")
if n and d then
r = (a/b)*100
end
return k,r
end,
['cart_to_checkout_ctr'] = function (k)
local r = nil
local a = redis.call("hget",k, "cart::checkoutNow:click")
local b = redis.call("hget",k, "cart::checkoutNow:impression")
if n and d then
r = (a/b)*100
end
return k,r
end
...
...
local x = utils.filter(utils.within_date_range,scan_keys)
for _, key in ipairs(x) do
for metric,_ in pairs(metrics_to_gather) do
local k,ctr = utils.metrics[metric](key)
if ctr then
table.insert(results,metric ..'=>'..k..' : ' .. utils.format_percentage(ctr))
end
end
end
There's a few ways to test this out, of course you can easily run the code against your production Redis instance, but I scripted an Ansible script that logins to the instance, runs a bgsave, scp's the dump.rdb file over to my workstation, and then I run redis-server locally.
MBP-200729:~ dan.young$ redis-cli -a '<snowden-knows>' --eval ~/projects/redis/lua/listing-stats.lua 0 , cart_to_checkout_ctr,ldp_add_to_cart_ctr 2015-12-30 2015-12-31 2149
1) "cart_to_checkout_ctr=>2015-12-31_2149 : 100.00%"
2) "ldp_add_to_cart_ctr=>2015-12-31_2149 : 33.33%"
3) "cart_to_checkout_ctr=>2015-12-30_2149 : 25.00%"
4) "ldp_add_to_cart_ctr=>2015-12-30_2149 : 13.64%"
The first argument to --eval is the script name, and the second represents the key names, which can be accessed via the global KEYS variable. Since we're generating the list of the KEYS within the script itself, and not providing any via the command line, we provide 0 for the KEYS argument. All the additional arguments should not represent key names and can be accessed via the global ARGV variable. In the above, the ARGV[1] would be the cart_to_checkout_ctr,ldp_add_to_cart_ctr and ARGV[2] and ARGV[3] would be 2015-12-30 and 2015-12-31 respectively, and ARGV[4] would be 2149. For testing purposes, using eval is fine, but once you get to a stable point where you want to run this on your production instance, I would recommend to cache the code on the Redis server using the SCRIPT LOAD command.
redis-cli -h 127.0.0.1 -p 6379 -a '<snowden-knows>' \
> SCRIPT LOAD "$(cat ~/projects/redis/lua/listing-stats.lua)"
"64b561bb63a178360e02f8d63a3c1cbf6925e387"
redis-cli -h 127.0.0.1 -p 6379 -a '<snowden-knows>' \
> EVALSHA "64b561bb63a178360e02f8d63a3c1cbf6925e387" 0 cart_to_checkout_ctr,ldp_add_to_cart_ctr 2015-12-30 2015-12-31 2149
1) "cart_to_checkout_ctr=>2015-12-31_2149 : 100.00%"
2) "ldp_add_to_cart_ctr=>2015-12-31_2149 : 33.33%"
3) "cart_to_checkout_ctr=>2015-12-30_2149 : 25.00%"
4) "ldp_add_to_cart_ctr=>2015-12-30_2149 : 13.64%"
Redis is cool...what else can be said....
So there you have it, Bob's your uncle !