Tools for pulling statistics from Google Analytics.
collective.googleanalytics is a product that integrates Plone with Google Analytics. It includes both tracking and reporting functionality. The product enables easy tracking of the standard Google statistics as well as external links, e-mail address clicks and file downloads. It also defines Analytics reports that are used to query Google and display the results using Google Visualizations. Reports are Zope objects that can be imported and exported using GenericSetup XML and modified on a site-by-site basis. The product currently provides a portlet that can display results of reports as well as a control panel for authorizing your site and configuring settings.
To install collective.googleanalytics, add it to the eggs section of your buildout. Then re-run buildout and restart Zope.
On Plone 3 it is recommended to use the following version pins:
[versions]
collective.googleanalytics = 1.4.1
gdata = 2.0.17
simplejson = 2.3.3
When you install the product from the Plone Add-ons control panel, a new control panel called Google Analytics will be added to your Plone site. In this control panel, you can authorize the site to access your Google Analytics data. If you receive an error when you attempt to authorize, see the section on Registering a Domain below.
After authorizing your site, you can select the profile where you want to track analytics for this site. Choose a profile from the dropdown menu. If you do not see any profiles listed, make sure that your Google Analytics account has access to at least one profile. Also, be sure to remove any Google Analytics tracking code that you may have pasted in the Site control panel.
Once you have selected a profile for tracking, you can choose which tracking plugins to enable. Tracking plugins allow you to track events on your site in Google Analytics in addition to the normal data on page views and visits. Which plugins you have available to you depends on which products you have installed. For information on the default tracking plugins, see the section on Tracking Plugins below.
Finally, you can choose roles to exclude from tracking. Google Analytics data will not be recorded for users with the selected roles. Note that this setting does not affect the inclusion of scripts specified in the Site control panel.
On the settings tab, you can also configure the amount of time, in minutes, that account information and report results will be cached, reducing the need to query Google. Sixty minutes is the default caching interval.
When you authorize your site to access your Google Analytics data, you may receive an error:
The site "http://example.com" has not been registered.
When this happens, you need to register your domain with Google before you can authorize your site. Follow steps 1-3 of Google's directions for registering a new domain. In step two, you can upload the file provided by Google to your Plone site root. Be sure to use the File content type, and make sure that the short name matches the filename specified by Google.
collective.googleanalytics ships with four tracking plugins:
After you have authorized your site in the control panel, you can begin using Analytics reports. First navigate to the page where you would like to display the report results. Where you place the portlet depends on your goals and the type of reports you are displaying. Even if you place a portlet on a public page, the portlet will only be visible to users who have the "View Google Analytics Reports" permission, which, by default, is assigned to Managers.
Next, place the portlet as you normally would, using the manage portlets page and selecting Google Analytics from the list of available portlets. In the portlet add form, you can set the title of the portlet, the profile the portlet will use and the reports it will display. Note that the title of the portlet is not the same as the title of the report, which is displayed inside the portlet. The profile is the Google property where the statistics that you want to access live. Which profiles you can select within the portlet depends on which profiles the Analytics account you provide to Plone can access. If you do not see any profiles in the dropdown menu, check to be sure that you have authorized your site in the Google Analytics control panel.
You can select more than one report to display within a single portlet. Each report performs its own query to Google, however, so including many reports on a single page may cause a decrease in performance. To change the order of the reports within a portlet, see the section on Managing Reports below.
Once you have set the portlet title and profile and selected one or more reports, click save to add the portlet. When you navigate to the page where you assigned the portlet, you should see the results of your report.
collective.googleanalytics ships with twelve default reports:
Reports live in a Plone tool called portal_analytics. To view and modify reports, navigate to the root of the site in the ZMI and click on the portal_analytics utility. Since they are standard Zope objects, reports can be copied, pasted, renamed, deleted, imported and exported using the Zope buttons beneath the list of reports. They can also be moved up and down in the list using the up, down, top and bottom buttons. The order of reports in portal_analytics controls the order that they will appear in portlets throughout the site.
By default, Analytics reports are rendered asynchronously using jQuery. This improves site performance by allowing the body of the page to render without waiting for a response from Google Analytics. The basic flow of a request that renders an Analytics report might go as follows:
Analytics reports are persistent Zope objects that store the arguments used to query Google and the options needed to display the query result as a Google Visualization. They store this information as properties on themselves. These properties can be set using GenericSetup XML or through the web in the ZMI.
It may be helpful to think of Analytics reports as having five logical sections, each of which has its own properties:
The report settings section consists of five properties that control the display and behavior of the report. None of these properties accept TAL or TALES.
Plugins are multi-adapters on the context, the request and the report that extend the default functionality of the report. Two plugins ship with collective.googleanalytics. See the sections on Contextual Results Plugin and Variable Date Range Plugin for more details.
Note that some plugins add additional dimension, metric and visualization choices, which are not available until the report is saved. As a result, it is generally a good idea to save the report immediately after adding or removing any plugin.
The query criteria section of the report is made up of all the properties that begin with the word query. These properties determine the query that is sent to Google to retrieve Analytics data. All of these properties accept TALES expressions. They have access to the TALES variables defined in the section on Using TAL and TALES in Reports as well as any TALES objects provided by the selected plugins.
The table builder section of the report includes three properties. Together these properties are responsible for taking the query results returned by Google and transforming them into a results table that can be used as the data source for a visualization or otherwise displayed in the report body.
In order to perform this transformation, these three properties use TALES expressions that return Python lists. The TALES expressions have access to three special functions that allow them to extract data from the data feed returned by Google:
Returns the value of the given dimension across the specified dimensions and metrics using the specified aggregation method (unique_list by default). If no values are found, the default value, an empty list by default, is returned.
For example, the following TALES expression would return a list of all the browsers returned by the query:
python:dimension('ga:browser')
Returns the value of the given metric across the specified dimensions and metrics using the specified aggregation method (sum by default). If no values are found, the default value, 0 by default, is returned.
To get the sum of the values of 'ga:visits' in records where 'ga:browser' equals 'Mozilla,' we could use this expression:
python:metric('ga:visits', {'ga:browser': 'Mozilla'})
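The behavior of these two functions can be sketched in plain Python. This is only an illustration under stated assumptions, not the product's implementation: the ROWS data, the _matches helper and the flat-dictionary record format are all hypothetical, and the first-seen ordering of the unique list is a guess.

```python
# Illustrative stand-ins only; NOT the collective.googleanalytics implementation.
# Each dictionary mimics one record of a query result (hypothetical data).
ROWS = [
    {"ga:browser": "Firefox", "ga:visits": 120},
    {"ga:browser": "Mozilla", "ga:visits": 30},
    {"ga:browser": "Firefox", "ga:visits": 80},
    {"ga:browser": "Mozilla", "ga:visits": 12},
]


def _matches(row, specified):
    """True when the row agrees with every key/value pair in `specified`."""
    return all(row.get(key) == value for key, value in specified.items())


def dimension(name, specified=None, default=None):
    """Unique values of `name` in first-seen order (the unique_list idea)."""
    values = []
    for row in ROWS:
        if _matches(row, specified or {}) and row[name] not in values:
            values.append(row[name])
    if values:
        return values
    return default if default is not None else []


def metric(name, specified=None, default=0):
    """Sum of `name` over the matching rows (the default sum aggregation)."""
    matched = [row[name] for row in ROWS if _matches(row, specified or {})]
    return sum(matched) if matched else default


browsers = dimension("ga:browser")
mozilla_visits = metric("ga:visits", {"ga:browser": "Mozilla"})
```

With this sample data, browsers holds each browser name once and mozilla_visits is the total of the two Mozilla records. A lookup that matches no records falls back to the default value, as described above.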
In reports that use the Variable Date Range Plugin, the value of the specified argument is often set to an element in the list returned by the possible_dates method.
Returns a list of dictionaries containing all possible values for the given date dimension in the current date range. If no dimensions are specified, all of the date dimensions in the query are used.
This method is commonly used in place of the dimension method in reports that include date dimensions to ensure that the table contains one row for each date unit in the date range.
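The reason for preferring a complete list of dates can be shown with a small Python sketch. The possible_days function and the sample results below are hypothetical stand-ins, not the product's possible_dates method; the sketch assumes a daily granularity and uses the ga:date dimension, which Google reports in YYYYMMDD form.

```python
from datetime import date, timedelta


def possible_days(start, end):
    """Return one dictionary per calendar day from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        days.append({"ga:date": current.strftime("%Y%m%d")})
        current += timedelta(days=1)
    return days


# Hypothetical sparse query results: only one day in the range had visits.
results = {"20240102": 7}

# Every day in the range gets a row, with 0 for the days Google did not return.
table = [
    [day["ga:date"], results.get(day["ga:date"], 0)]
    for day in possible_days(date(2024, 1, 1), date(2024, 1, 3))
]
```

Building the rows from the returned dimension values alone would yield a single row here, leaving gaps in any chart drawn from the table.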
These three properties make up the table builder section of the report:
The titles for the table columns. It must be a TALES expression that evaluates to a Python list of strings. If and where these titles appear depends on the type of visualization. For the Table visualization, for example, they appear as the table column headings.
In most reports, the table columns expression is a static Python list:
python:['Visits']
It is, of course, possible to use TALES variables to populate the columns list:
python:[date_range_unit, 'Visits']
In complex tables, the number of columns may be determined by the results returned by the query. In this example, the first column is "Date" and the names of the remaining columns are the names of the browsers returned by the query:
python:['Date'] + dimension('ga:browser')
The expression that produces the set of row keys used to generate the rows in the results table. It is specified as a TALES expression that evaluates to a Python iterable with one element for each row in the final table.
When the report renderer is asked for the results table rows, it first evaluates the row repeat expression. It then iterates over each element in the resulting list and evaluates the table rows expression with the current element assigned to the variable "row."
Typically the values of the row repeat expression are generated using the dimension function or the possible_dates function:
python:dimension('ga:pagePath')
or:
possible_dates
See the section on Using TAL and TALES in Reports for more information about the use of these functions.
The contents of each table row. It must be a TALES expression that evaluates to a Python list containing the value of the "cells" for that table row. The table rows expression has access to two special TALES variables:
In tables with only one column, the value of the rows expression is often the same as the value of the row key:
python:[row]
In two column tables, the value of one column is typically the row key, and the other is a metric value looked up using the row key:
python:[row, metric('ga:visits', {'ga:browser': row})]
In complex, multi-column tables, it may be necessary to iterate over the columns variable using a Python list comprehension:
python:[row] + [metric('ga:visits', {'ga:browser': row, 'ga:operatingSystem': c}) for c in columns[1:]]
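To see how the three table builder properties fit together, the following Python sketch evaluates a multi-column expression like the one above against made-up data. The VISITS dictionary and this simplified metric function are hypothetical stand-ins for the query results and for the real metric function.

```python
# Hypothetical visit counts keyed by (browser, operating system).
VISITS = {
    ("Firefox", "Linux"): 10,
    ("Firefox", "Windows"): 25,
    ("Safari", "Macintosh"): 40,
}


def metric(name, specified):
    """Stand-in lookup: visits for a browser/OS pair, or 0 when absent."""
    key = (specified["ga:browser"], specified["ga:operatingSystem"])
    return VISITS.get(key, 0)


# Table columns expression: a label, then one column per operating system.
columns = ["Browser", "Linux", "Windows", "Macintosh"]

# Table row repeat expression: one row key per browser.
row_keys = ["Firefox", "Safari"]

# Table rows expression, evaluated once per row key with `row` bound to it.
table = [
    [row] + [
        metric("ga:visits", {"ga:browser": row, "ga:operatingSystem": c})
        for c in columns[1:]
    ]
    for row in row_keys
]
```

Each inner list has the same length as columns, so the row key lines up with the first column title and each metric value lines up with its operating system.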
The visualization settings section of the report consists of the visualization type and visualization options properties. These properties are used to create JavaScript that uses the Google Visualizations API to render the data table produced by the table builder section above.
A list of options and values, in the format of TAL defines, that specify the options for the visualization. The available options depend on the type of visualization selected. It is important that the option expressions evaluate to the data type that the visualization expects.
For example, the height of a visualization that accepts an integer height option could be set as follows:
height python:300
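The define format pairs an option name with a TALES expression, separated by whitespace. As a rough illustration of that structure only, this Python sketch splits option lines into name and expression pairs. The parse_options function is hypothetical; it does not evaluate the expressions and is not how the product processes this property.

```python
def parse_options(text):
    """Split 'name expression' lines into a {name: expression} dictionary."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between definitions
        # Only the first run of whitespace separates the name from the
        # expression, so expressions may themselves contain spaces.
        name, expression = line.split(None, 1)
        options[name] = expression
    return options


options = parse_options("height python:300\ntitle string:Bounce Rate By Browser")
```

Here the height option carries a python: expression that evaluates to an integer, while title carries a string: expression, matching the data types each option expects.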
The report body consists of a single property that contains the TAL template for the report. This block of TAL code is evaluated when the report is rendered. TALES expressions within this code have access to the normal objects described in the section on Using TAL and TALES in Reports. They also can access all of the public methods provided by the report renderer. In the report body, these methods must be accessed using view/method_name or python:view.method_name():
Many of the properties of the Analytics report object accept TALES expressions or TAL as their values. (For information about which properties accept TALES and TAL, see the section on Report Properties above.) All of the TAL code and TALES expressions have access to a standard set of Python objects and variables:
This plugin provides tools to make reports page specific. It modifies the default caching policy to cache report results on a per-page basis instead of for the entire site. It also provides several helper TALES variables that simplify the process of creating page-specific reports:
Analytics reports can specify fixed start and end dates for their queries. It is generally more useful, however, to allow the date range to be set when the report is evaluated. The Variable Date Range Plugin provides this functionality. In order to set the date range, it looks in the request for one of these special keys:
An integer specifying the number of days prior to the current date to use as the report start date. The end date is assumed to be the current date. The date_range argument can also accept a string keyword that evaluates to a particular date range depending on the current context. Current keywords include:
Since Analytics reports are rendered asynchronously, these keys must be set in the request sent by the asynchronous loader, not in the original request.
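The integer form of date_range described above amounts to simple date arithmetic. A minimal Python sketch, assuming the value arrives as an integer or numeric string; the date_range_from_days function is hypothetical and does not handle the string keywords:

```python
from datetime import date, timedelta


def date_range_from_days(days, today=None):
    """Return (start, end), where start is `days` days before the end date."""
    end = today or date.today()
    start = end - timedelta(days=int(days))
    return start, end


# A fixed "today" keeps the example reproducible.
start, end = date_range_from_days("30", today=date(2024, 3, 31))
```

With these inputs the range runs from 1 March 2024 to 31 March 2024.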
Since dates for reports are dynamic, the plugin also provides two special dimensions that are date sensitive. This allows the granularity of the report results to be set based on the date range selected. (For example, if you specify a date range of a year, you probably don't want to segment your results by day. Instead, viewing results by month would be a more appropriate choice.) The two special dimensions are:
Note that these two dimensions must be selected from the list of query dimensions to be included in the query. If they are not available in the list of possible dimensions, be sure to save the report after selecting the Variable Date Range Plugin from the list of plugins.
The plugin also provides two helper variables that are useful in report templates:
Now that you are familiar with the properties that make up an Analytics report, it's time to try creating a new report from scratch. In this example, we will create a report that calculates and displays the site-wide bounce rate over a period of time segmented by browser.
This example presents a fairly complex report. For examples of simpler reports, consult the default reports in portal_analytics. In many cases, you can probably modify one of these reports to suit your needs by substituting dimensions and metrics. If, however, you find that you need to create a more complicated multi-dimensional report, read on:
Navigate to the root of the site in the ZMI and click on the portal_analytics tool.
Click the Add Google Analytics Report button.
We'll give our new report the ID site-bounce-rate-browser-line, following the naming convention of the default reports. This naming convention is optional, but it helps to keep things organized. Then click the add button.
Click on the new report to edit it. Give it a title of Site Bounce Rate By Browser: Line Chart and this description:
This report displays the site-wide bounce rate segmented by the user's browser. It is useful for gauging how effective our site's new multimedia features are in each browser.
Leave the i18n domain as collective.googleanalytics, the default value. If we were going to translate this report, we might use the domain defined in our site's theme product.
From the list of categories, select Site Wide.
From the list of plugins, select Variable Date Range. After making your selection, click the Save button to populate the list of dimensions with the new options.
Now the difficult part: determining the arguments for our query. If we consult the common calculations page in Google's Dimensions and Metrics Reference (see the section on Where to Learn More for the link), we see that bounce rate is calculated as follows:
ga:bounces/ga:entrances
So, set the query metrics to ga:bounces and ga:entrances.
We also know that we want to segment our results by browser, so we'll set our query dimension to ga:browser. Be sure to also select date_range_dimension and date_range_sort_dimension from the bottom of the dimensions list.
In the query filters enter:
ga:entrances>10
Strictly speaking, we don't need this filter. But for a site with a lot of traffic, we probably don't care about the results for browsers that have 10 or fewer entrances in a given period of time. So, we use this filter to eliminate them from the results.
In the query sort box, enter the dimensions provided by the Variable Date Range Plugin:
date_range_dimension
date_range_sort_dimension
In query maximum results, leave the default value, python:1000.
Now that our query arguments are complete, we can work on our results table. Let's begin by drawing out what our table should look like:
"Day" | "Firefox" | "Internet Explorer" | "Safari" | "Chrome" |
---|---|---|---|---|
"5" | 60 | 70 | 54 | 63 |
"6" | 64 | 69 | 59 | 68 |
"7" | 63 | 72 | 65 | 68 |
Etc. |
Note that the day column contains strings, not integers. This is necessary so that the line chart visualization will treat these values as labels instead of data.
Great! Now we can write the expressions to generate the table. Enter this expression in the table columns expression field:
python:[date_range_unit] + dimension('ga:browser')
This expression combines the value of the date_range_unit, which is provided by the Variable Date Range Plugin, with all of the possible values of the ga:browser dimension.
For the table row repeat expression, enter:
possible_dates
This expression will populate the row keys with dictionaries that contain the values of date_range_dimension and date_range_sort_dimension. We use possible_dates instead of dimension(date_range_dimension) because we want one entry for every period of time in the current date range, even if there aren't any results for that particular period of time.
For the table rows expression, enter this expression, removing the line breaks:
python:[str(row[date_range_dimension])] + [int(100*float(metric('ga:bounces', row))/(float(metric('ga:entrances', row)) + 0.0001)) for c in columns[1:] if not row.update({'ga:browser': c})]
Whoa! That looks complicated! If we break down the expression into its parts, however, it's easy to see what's going on:
[str(row[date_range_dimension])]
This part of the expression creates a list with a single element: the value of date_range_dimension as a string. Recall that, in this expression, row is a dictionary that contains key-value pairs for date_range_dimension and date_range_sort_dimension.
Now let's skip to the end of the expression:
for c in columns[1:]
This code serves as the repeat expression in a Python list comprehension that generates the bounce rate for each browser for the specified date. columns[1:] represents the list of browser names generated by dimension('ga:browser'):
if not row.update({'ga:browser': c})
This tricky bit of code updates the row dictionary to include the value of the current browser as it iterates over the list of browsers. That way we can pass row to the metric() method to get the value of the metric for the date and browser we are currently evaluating. We use 'if not' because the update method returns None, which evaluates to False.
Finally, the rest of the expression is just the math used to calculate the bounce rate:
int(100*float(metric('ga:bounces', row))/(float(metric('ga:entrances', row)) + 0.0001))
We have to convert the values we get back from metric() into floating point numbers so that the division operates as we expect. We also add a tiny number to the denominator to avoid getting a divide by zero error if the value of ga:entrances is zero. Finally, we multiply the result by 100 to get a percentage and use int() to truncate the result to an integer. Note that int() drops the fractional part instead of rounding to the nearest integer.
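To check the pieces together, the rows expression can be run in plain Python against made-up numbers. Everything here except the expression itself is a hypothetical stand-in: the DATA dictionary, the simplified metric function and the use of "day" as the value of date_range_dimension are assumptions for illustration.

```python
# Hypothetical query results keyed by (day, browser); not real data.
DATA = {
    ("5", "Firefox"): {"ga:bounces": 30, "ga:entrances": 50},
    ("5", "Safari"): {"ga:bounces": 27, "ga:entrances": 50},
}


def metric(name, specified):
    """Stand-in lookup: the metric for a day/browser pair, or 0 when absent."""
    record = DATA.get((specified["day"], specified["ga:browser"]), {})
    return record.get(name, 0)


date_range_dimension = "day"  # assumed value, for illustration only
columns = ["Day", "Firefox", "Safari", "Chrome"]
row = {"day": "5"}  # one row key, shaped like an entry from possible_dates

# The rows expression from above: row.update() returns None, so `if not`
# is always true and serves only to set the browser before metric() runs.
cells = [str(row[date_range_dimension])] + [
    int(100 * float(metric("ga:bounces", row))
        / (float(metric("ga:entrances", row)) + 0.0001))
    for c in columns[1:]
    if not row.update({"ga:browser": c})
]
```

The result is ["5", 59, 53, 0]. Chrome has no data, so its cell is 0 and no division error occurs. Firefox's exact rate of 60% comes out as 59 because the small offset in the denominator pushes the quotient just below 60 and int() truncates. Wrapping the calculation in round() before int() would be one way to avoid this.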
We're almost done! From the visualization type drop down menu, choose LineChart.
In the visualization options box, enter these option definitions, one per line:
title string:Bounce Rate By Browser
height python:250
titleX python:date_range_unit
titleY string:Bounce Rate (%)
smoothLine python:True
These options are all aesthetic. Once you become familiar with Google visualizations, you can adjust them to fit your personal preferences. For a full list of the options available for each visualization, visit the Google Visualization Gallery referenced in the section on Where to Learn More.
In the report body field, enter this block of TAL code, which renders the line chart visualization:
<div tal:replace="structure view/visualization"></div>
Any product that imports a GenericSetup profile can define Analytics reports. These reports should be defined in a file called analytics.xml in the product's GenericSetup profile directory. The easiest way to generate the XML for a report is to create the report through the web and then export it.
For example, after following the instructions above for creating a new report, you could use the portal_setup tool in the ZMI to create a snapshot of the site. Then you could navigate to the analytics.xml file in the snapshot and copy and paste the appropriate XML into your product's analytics.xml file.
If you find that you need to write the GenericSetup XML for a report by hand, consult the analytics.xml file in this product's profiles/default directory for guidance. Keep in mind that any XML or XML reserved characters must be properly escaped.
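When writing the XML by hand, Python's standard library can take care of the escaping. As a small sketch, the query filter used in the example report contains a greater-than sign, which xml.sax.saxutils.escape converts to an entity. The variable names are illustrative only.

```python
from xml.sax.saxutils import escape

# The query filter from the example report, as typed in the ZMI.
query_filter = "ga:entrances>10"

# escape() replaces &, < and > with their XML entities.
escaped_filter = escape(query_filter)
print(escaped_filter)  # ga:entrances&gt;10
```

The same applies to TAL in the report body, where every angle bracket of the template markup needs escaping before it can be embedded in analytics.xml.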
Creating and managing Analytics reports requires knowledge of the Google Analytics API, the Google Visualizations API, and Zope and Plone technologies such as TAL and TALES. These are resources that you may find helpful in learning these technologies:
Google Analytics Data Export API Documentation
This is the best place to start for learning the ins and outs of Google Analytics. Of particular interest are these pages:
This reference describes the arguments used to query Google.
This tool allows you to try out queries interactively, which can be extremely helpful in the process of creating and debugging reports.
Dimensions and Metrics Reference
This page describes each available dimension and metric. Also see the subpages on valid combinations and common calculations.
gdata API Reference
gdata is the Python module that interacts with the Google API. This documentation is most useful for developers who wish to contribute to or extend collective.googleanalytics. The relevant documentation is divided into two sections:
gdata.analytics.service Reference
This documentation describes the API for the analytics service objects that gdata provides. collective.googleanalytics uses both the AccountsService and the AnalyticsDataService.
This reference documents the response objects returned by a query to Google Analytics.
Google Visualizations API Documentation
The visualizations API documentation provides an overview of what Google Visualizations are and how they work.
This gallery provides examples of each type of visualization and documents the options that each accepts.
This chapter from the Zope2 Book offers an introduction to TAL, TALES and related technologies.
This chapter from the Zope2 Book describes some of the more advanced features of the TAL specification.
This appendix from the Zope2 Book provides a comprehensive overview of TAL and TALES as they are used in Zope page templates.
Note: 1.0b1 changes the syntax for the table-building section of the report (previously the report column labels and expressions) and the report body. The upgrade step overwrites these properties for the default reports that ship with the product. If you have customized these properties on the default reports, be sure to rename the customized reports in portal_analytics before running the upgrade step to avoid losing your changes.