BigQuery: The Flow Awakens
20 Jan 2016
What you will need
In this article we will look at a way of using the Google Analytics Premium (GAP) to BigQuery link to both discover and analyse user flows. So, first things first: what are we going to need?
Google Analytics Premium
- To have access to the required granularity we are going to need GAP
BigQuery (linked to GAP)
- This will allow us to manipulate the more granular data properly. Arranging the integration from GAP to BigQuery is free and retroactive, so if you have not set this up yet, don’t worry; you can do it now and you will still be able to analyse historical data.
- I must also mention that there is a cost associated with using BigQuery (both storing and processing data). However, Google provides a generous allowance to each Premium client that will cover most needs.
Java and the d3.js library
- After we get the data out of BigQuery we need to create the visualisation to make it easy to analyse. We prefer Java (to process the data) and the d3.js library (to render the flows) for this task.
- We would also recommend the use of an IDE to assist with this step; we used BlueJ.
Once this has all been set up we can start creating the visualisation. First, though, let's see some examples of what we will be looking to create.
Page Flow Report
Blog Author Flow Report
As can be seen, we can organically generate flows for pages. In fact, we can generate flows for any combination of dimensions and metric conditions available to us in BigQuery. We can even create new dimensions and metrics using BigQuery SQL, as in the example above.
The Method
So, how are these set up?
There are two main sections, BigQuery and Java.
BigQuery extracts the required data using the GAP link and rearranges this into flows. Java is then used, along with d3, to create the visualisation of these flows.
BigQuery
The aim of this section is to create a double-delimited string representing the flow we are interested in. In our queries we used '|' to separate sessions and ',' to separate the hits within a session, so a page flow string might look something like `/home,/products|/home,/blog`.
We will not go into the SQL side in depth but the task essentially boils down to:
Extract the hits you are concerned with
- For example, for a page flow report you would only want page-type hits (hits.type = 'PAGE').
Create the in-session flows
- The GROUP_CONCAT function in legacy BigQuery SQL is quite useful for this
Concatenate these flows together to create the final string
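As a rough sketch, the three steps above might be combined in legacy BigQuery SQL (the dialect available with the GA export) along the following lines. The project, dataset and table names here are illustrative, not from the original setup, and note that GROUP_CONCAT does not guarantee hit order, so in practice you may need to order hits by hits.hitNumber first:

```sql
-- Sketch only: project/dataset/table names are illustrative.
SELECT
  -- Step 3: join the per-session flows together with '|'
  GROUP_CONCAT(session_flow, '|') AS all_flows
FROM (
  SELECT
    -- A session is identified by visitor id + visit id
    CONCAT(fullVisitorId, STRING(visitId)) AS session_id,
    -- Step 2: join the pages within a session with ','
    GROUP_CONCAT(hits.page.pagePath, ',') AS session_flow
  FROM [my-project:my_dataset.ga_sessions_20160120]
  -- Step 1: keep only pageview hits
  WHERE hits.type = 'PAGE'
  GROUP BY session_id
)
```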
As you can see, the base process is quite simple. However, these steps can become more complex when additional filters are used, especially the extraction step.
Java
Now that we have our double-delimited page flow string we can plug it into our Java function. In our setup this string was contained in a .csv file (BigQuery’s default export type).
The Java function then interprets this string to build a JSON tree whose root node is (entrance). The root branches into the landing pages/events/categories; those nodes branch into the second pageview/event/category, and so on.
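A minimal sketch of this step is below. The class and method names are hypothetical, not the original code; it parses the double-delimited flow string into a tree rooted at (entrance) and serialises it to the nested-JSON shape that d3 hierarchy layouts expect.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: parses "/home,/products|/home,/blog"-style flow strings
// into a tree rooted at "(entrance)". Names are illustrative.
public class FlowTree {

    // One node per page/event/category, with a visit count and children.
    static class Node {
        final String name;
        int count = 0;
        final Map<String, Node> children = new LinkedHashMap<>();
        Node(String name) { this.name = name; }
    }

    // Build the tree: '|' separates sessions, ',' separates hits.
    static Node build(String flows) {
        Node root = new Node("(entrance)");
        for (String session : flows.split("\\|")) {
            Node current = root;
            for (String page : session.split(",")) {
                current = current.children.computeIfAbsent(page, Node::new);
                current.count++;
            }
        }
        return root;
    }

    // Serialise to the {"name": ..., "children": [...]} shape d3 expects.
    static String toJson(Node n) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(n.name)
          .append("\",\"count\":").append(n.count);
        if (!n.children.isEmpty()) {
            sb.append(",\"children\":[");
            int i = 0;
            for (Node c : n.children.values()) {
                if (i++ > 0) sb.append(",");
                sb.append(toJson(c));
            }
            sb.append("]");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(build("/home,/products|/home,/blog")));
    }
}
```

In a real setup the input string would be read from the exported .csv file rather than hard-coded, but the parsing and tree-building logic is the same.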
Finally, we present the visualisation using an HTML file which references the JSON tree generated by the Java code.
Conclusion
Overall, the process is quite useful not only for its ability to easily visualise known flows through the site but especially for its ability to discover flows between pages/events/categories. Since all that needs defining is which hits to include and how they should be classified, it can be very organic in its discovery of new flows.
To view this blog written by Joel Heighway on Periscopix's website, please click here.