Friday, January 27, 2012

CDA release - 12.01.26

Aaaand, since it's been a long time without a release, CDA 12.01.26 is here.



  • Solved Redmine Bug #104: CDA Cache Manager -> UI didn't update after deleting a query
  • Implemented Redmine Feature #105: CDA Cache Manager -> Delete all queries belonging to a cda file 
  • Fix a bug where html output would duplicate output in some cases
  • Cache refactor; cache monitor: +removeAll, require admin permissions
  • cachemanager: user feedback for server requests
  • Support for cache plugin bean. Serialization changed
  • Added version info to cachemanager and SelfTest Page
  • Sorted out some images on SelfTest Page

Ctools installer with -b stable will get this version

Thursday, January 26, 2012

CCC release - 12.01.25

New CCC release 12.01.25 is out (standalone version, soon to be included in the next stable CDF release). It is already available if you're using the ctools installer with -b dev.



Changelog:

  • Implemented Redmine Feature #107 - Control number of labels on the linear Axis for categorical charts (show "MinorTicks" option, including 2nd axis) 
  •  Implemented Redmine Feature #108 - Control number of ticks on the linear Axis for categorical charts ("DesiredTickCount", including 2nd axis with independent scale) 
  • Implemented Redmine Feature #109 - Rounded maximum for linear axis in categorical charts ("DomainRoundMode" option, including 2nd axis with independent scale)
  • Solved Redmine Issue #78 - Fix the vertical order in which series are drawn, so that when applicable, they show from top to bottom.
  • Solved Redmine Issue #121 - Tooltips in barcharts do not appear if bars overflow.
  • Solved Redmine Issue #103 - Ordinal axis grids not being drawn
  • MultiValueTranslator: issue when no categories
  • Solved valueFormat receives numeric value, doesn't parse
  • Fixed typo of property name in LegendPanel 
  • Add multi-series barline support
  • useCompositeAxis compatible with flat arrays
  • vml namespace conflict: revert sparkline, declaration in protovis-msie no longer lazy
  • align horizontal text in composite vertical axis towards the chart; revert convention breaks in multiline conditional expressions
  • workaround issue in 16th decimal position in IE9 64bit
  • Fixed regression with bulletcharts being translated 10px down
  • Added new (and some of the missing) documentation to the testZZZ.html files
  • Fixed the drawing of bars and grid lines on the ordinal scale: they were not centered with the tick and label 
  • In linear axis, made minorTicks "extend" (major)ticks, so that visibility (through .visible or .strokeStyle) of the latter affects the former.
  • testZZZ.html files documentation mentioned '{x,y}AxisFullGrid_' instead of the correct value '{x,y}AxisGrid_'.
  • Fixed linear axis grid to show a line on the last tick (as opposed to the ordinal axis, that does not show the last line). When EndLine is active, it is drawn above the last grid line.
  • Fixed bug in the positioning of linear scale labels that revealed itself (don't know why) only on time series charts
  • Fixed bug in time series scale range calculation when used with a second axis
  • Fixed bug in the drawing of minor ticks on time series scales (date arithmetic issues)
  • Fixed regression bug in ScatterCharts (DotChart, LineChart, StackedLineChart and StackedAreaChart) that caused null values to break line drawing. 
  • Fixed the visibility of the first grid line of a time series axis - it did not show because, in this case, the first tick is not on the origin.
  • Fixed compatibility issue between jQuery.sparkline and protovis-msie when in IE8.
  • heatgrid: +scalingType:'discrete' (interval-based, no color interpolation)
  • tipsy w/ followMouse: don't fall out of window
  • Heatgrid: ignore null values in min/max calculations; nullShape not taking correct index into account;
  • solved dangling variable reference

Great stuff! :)

Tuesday, January 24, 2012

Firefox Telemetry - From adhoc R analysis to CDE Dashboards

Time to put on my analyst hat. No matter what dashboard we're trying to build, it will all fail if the underlying analysis isn't good.

A dashboard is a great way to let users get information on a specific subject, provided we know what we're going to show. In this case I had absolutely no idea.

 When we're sitting on a bunch of data, we need to go through a discovery phase where we'll actually decide what information can be valuable to the user.


Telemetry Analysis

Telemetry is a Mozilla project that aims to make its products (Firefox, Thunderbird and Fennec, the codename for Firefox Mobile) better by analyzing performance data sent by users during their real-world activity, along with the impact that developer changes have on that performance. The goal is simple - make better, happier, more productive.
 
As one can imagine, we have a bunch of data. All the submissions are primarily stored in HBase and later aggregated into ElasticSearch, allowing more versatile / real time analysis. We were then able to get a dynamic view over the data that summed up all the contributions from the users:


I previously blogged about the techniques that allow us to get data from/to elasticsearch, and once again they proved invaluable. In this case, due to the huge amount of data, we had to use kettle's UDJC step, initially developed by Mozilla Metrics' chief engineer Daniel Einspanjer, along with the Jackson JSON processor to achieve high performance while processing the dataset. We also submitted some kettle improvements along the way.


New questions

This allowed developers to view the impact of their changes, and it had the best effect a data tool can have while answering some questions: it raised other questions.

Most of the follow-up questions were related to time-based analysis: being able to track the impact of the changes on a specific probe over a period of time. This would have the immediate effect of giving people data to decide whether a specific release channel is ready to move to the next channel in the rapid release cycle, and would answer some of the new questions that the new process brings:

- Is Aurora ready to move to Beta?

- Are we getting the expected performance improvement in Nightly?

As a stretch goal, my personal objective was to implement some kind of system that would let us quickly identify regressions in the code without having to manually go through all the probes.


Back to basics - Kimball's DataWarehouse

This required a new approach to the data. Or rather, an old approach. In Business Intelligence, we live in exciting times where we have tons of available technologies that allow us to choose the best tool for the job (I recently did a blog post on the subject). But let's not forget 20 years of knowledge. This specific set of questions required building a standard, Kimball-style data warehouse.


Telemetry Evolution

The goal is to have a way to track the improvements on the project's code by tracking, over time, the evolution of some key metrics we chose. Currently, the ones that are being tracked are:
  • Mean
  • Standard Dev.
  • Median
  • Percentiles (25,50,75)
Since telemetry data is stored in buckets on the client side, these values are not statistically accurate: they are not the mean and stddev of a particular probe, e.g. CYCLE_COLLECTOR, but the mean of the bucketed values after submission. The same goes for all the others. However, even if they are not an absolutely accurate representation of the end user's scenario, they proved to be very effective in quantifying and measuring changes.
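To make the "statistics of bucketed values" idea concrete, here is a minimal Python sketch. It is not the code used in the warehouse; it assumes a histogram arrives as a mapping of bucket value to sample count (a hypothetical layout), and treats every sample as if it were exactly equal to its bucket value, which is why the results describe the buckets and not the raw measurements.

```python
import math


def bucketed_stats(histogram):
    """Summarise a histogram given as {bucket_value: count}.

    Assumption: each sample is represented by its bucket value, so these
    are statistics of the bucketed data, not of the original samples.
    """
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram has no samples")

    mean = sum(value * count for value, count in histogram.items()) / total
    variance = sum(count * (value - mean) ** 2
                   for value, count in histogram.items()) / total

    def percentile(p):
        # Smallest bucket value whose cumulative count reaches p% of samples.
        threshold = total * p / 100.0
        cumulative = 0
        for value in sorted(histogram):
            cumulative += histogram[value]
            if cumulative >= threshold:
                return value
        return max(histogram)

    return {
        "count": total,
        "mean": mean,
        "stddev": math.sqrt(variance),
        "p25": percentile(25),
        "median": percentile(50),
        "p75": percentile(75),
    }


# Hypothetical histogram: bucket value -> number of samples in that bucket.
example = {10: 2, 20: 5, 40: 3}
stats = bucketed_stats(example)
```

With coarse buckets the percentiles can only ever land on a bucket value, which is one reason these numbers are better suited to tracking changes over time than to reporting absolute timings.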


Concepts

We'll consider platform builds for a specific application, version and OS as having the same codebase. So our "primary key" is platformBuildID-appName-appVersion-OS, and all submissions with the same key are aggregated together and considered to be generated from the same code.


On a daily basis we'll query telemetry for the builds made in the last 7 days (configurable value). The assumption here is that 7 days is enough for sampling, and that changes to the main KPIs after that period would be due to environmental changes and not to the code. This number is currently being studied to find the best value to use.

We're also discarding, for this datawarehouse, all submissions with fewer than 500 counts, in order to have a good enough sample size.
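The grouping and filtering rules above can be sketched in a few lines of Python. This is only an illustration: the record layout (a dict with platformBuildID, appName, appVersion, OS and count fields) is hypothetical, and applying the 500 threshold to the aggregated count per key is an assumption about how the rule is enforced.

```python
from collections import defaultdict

MIN_COUNT = 500  # threshold from the text; where it is applied is an assumption


def build_key(submission):
    # platformBuildID-appName-appVersion-OS, as described above.
    return "-".join([
        submission["platformBuildID"],
        submission["appName"],
        submission["appVersion"],
        submission["OS"],
    ])


def aggregate(submissions, min_count=MIN_COUNT):
    """Sum counts per key and drop keys whose total is below min_count."""
    totals = defaultdict(int)
    for submission in submissions:
        totals[build_key(submission)] += submission["count"]
    return {key: count for key, count in totals.items() if count >= min_count}


# Hypothetical submissions: two share a key, the third is a different build.
example_submissions = [
    {"platformBuildID": "20120101000000", "appName": "Firefox",
     "appVersion": "12.0a1", "OS": "WINNT", "count": 300},
    {"platformBuildID": "20120101000000", "appName": "Firefox",
     "appVersion": "12.0a1", "OS": "WINNT", "count": 250},
    {"platformBuildID": "20120102000000", "appName": "Firefox",
     "appVersion": "12.0a1", "OS": "WINNT", "count": 100},
]
kept = aggregate(example_submissions)
```

Here the first two records collapse into a single key with 550 counts and are kept, while the third build is discarded for having too small a sample.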



R Analysis

I spent about two weeks building this datawarehouse, with no guarantee that the results would yield anything decent. So once I had a resultset I could work with, I took the opportunity to use R to analyze the data. This has been, for ages, an item on my to-do list.

R is an insanely powerful statistical analysis tool with tons of packages that will guarantee that the bottleneck will be your own mathematical knowledge (or lack thereof), making it one of the analysts' favorite tools.

R does wonders when we have the data in a tabular format and want to do ad-hoc analysis, so I picked a resultset and started playing with the data. I used the CYCLE_COLLECTOR probe evolution on the Windows platform and the Nightly channel.

The first thing I did was trying to get a feeling of the shape of the data (this, obviously, after a couple of days trying to find my way around R). After a while, it was looking like this:


The initial analysis revealed a relation between the submission counts and the mean / std dev: the higher the count, the lower the mean and standard deviation. This is consistent with something the metrics team already knew - the initial submissions are not representative of the general population, so in this case size really matters.

I also tried for a while to find a statistical model for this data, mostly around fitting a normal distribution and then trying to get more analysis from the parameters, like the CDF and other density functions. This proved to be a frustrating task, as no decent fit came from it.

Due to all the distinct types of probes in the code, we decided to take into consideration only the mean and standard deviation, and to look at their evolution over time. This is the view we decided to use:


In a single chart we can tell the evolution of CYCLE_COLLECTOR: the position of each point represents the mean, the size of the point represents the standard deviation (not the exact value, but according to the scale), and the color represents the size of the sample.


From R to CDE Dashboard

The next step, after knowing the kind of analysis we need to give to developers, is to build a dashboard that allows users to get this data from the BI system automatically, up to date and with the ability to quickly parametrize it. And obviously to skip the need for the consumers of the data to have knowledge of R.

All the Ctools were made with the ability to build virtually *anything* in mind, and replicating an R analysis is a very good challenge. Here's the end result after.... 2 days


With all the live connections to the data, users can freely play with it and change the parameters to quickly see the impact of the code.



Discovery

One of the biggest advantages of a datawarehouse is that it comes with an astonishing query language, MDX. In our case (and for anyone using Pentaho as a BI server) we're using Mondrian as the ROLAP engine that runs those queries.

MDX is very well suited for answering business questions, behaving particularly well on time-based analysis. So the next step was building a table that could compare the last 7 days' average with the prior 28 days' average. Big shifts would indicate either improvements or regressions. Here's the resulting table, ordered by default on regressions:
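The dashboard does this comparison in MDX against Mondrian. As a language-neutral illustration of the arithmetic only, here is a small Python sketch; the function name, the input layout (one value per day) and the choice to return None when a window is empty are all assumptions made for the example.

```python
from datetime import date, timedelta


def shift_percent(daily_values, today, recent_days=7, baseline_days=28):
    """Percentage change of the recent average versus the prior baseline.

    daily_values maps a date to that day's metric value (hypothetical layout).
    Returns None when either window is empty or the baseline average is zero.
    """
    recent_start = today - timedelta(days=recent_days)
    baseline_start = recent_start - timedelta(days=baseline_days)

    recent = [v for d, v in daily_values.items() if recent_start < d <= today]
    baseline = [v for d, v in daily_values.items()
                if baseline_start < d <= recent_start]
    if not recent or not baseline:
        return None

    recent_avg = sum(recent) / len(recent)
    baseline_avg = sum(baseline) / len(baseline)
    if baseline_avg == 0:
        return None
    return (recent_avg - baseline_avg) / baseline_avg * 100.0


# Hypothetical series: 28 days at 100, then the last 7 days at 150.
today = date(2012, 1, 24)
series = {today - timedelta(days=i): (150 if i < 7 else 100) for i in range(35)}
shift = shift_percent(series, today)
```

For a timing probe, where higher values are worse, a large positive shift like this one would be flagged as a regression; whether positive means "worse" depends on the probe.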


A regression of 1590% was immediately noticed. Clicking on that row allowed us to inspect the actual histogram distribution:



I immediately checked with one of the Firefox developers, who mentioned that there was an error in that specific build that caused the counters of this probe to be totally skewed. Success!

It's instantly rewarding to find out that the number of improvements far outnumbers the number of regressions. One of my favorite ones, which shows all the improvements that developers have been putting in the code, is IMAGE_DECODE_ON_DRAW_LATENCY.


This is currently being used by the internal product developers to give them metrics over their code, and the metrics team is working on allowing contributors outside the company to take advantage of these tools.


Help Mozilla help you

This is only possible with the help of users who are willing to submit their performance data back to Mozilla. This is what we do with your data. There's absolutely nothing that can be traced back to you, as privacy is always the number one concern at Mozilla. And here's how you can help:





Wednesday, January 18, 2012

Multiple parameters in CDF / CDE

This tech tip shows how to configure multiple parameters to work in CDE / CDF.

We can use any query, but in this case we'll start with the parameter wizard, located in the datasources panel.

Create a new dashboard and go to the datasources panel. Under "Wizards", select the parameter wizard. Select a cube / dimension and drag it to the rows.


I selected the multiplebutton, but any of the other multiple selection components would work too. The multiple button component supports both single selection (default) and multiple selection. We need to activate multiple selection in the component that was generated by the wizard.


The generated code works, but we may want to define a set of defaults to be preselected. In order to do that, we need to delete our simple parameter and add a custom parameter with the following code:
["[Markets].[EMEA]","[Markets].[NA]"]



This array has to match the ids we use in the parameter. That way, when we preview the dashboard, we get those values selected by default, as we wanted.

Have fun!


-pedro

Wednesday, January 11, 2012

[RFC] CDC - Community Distributed Cache

Time to discuss a new member for the CTools:

CDC - Community Distributed Cache

 Objective

CDC is a new plugin to allow usage of a distributed cache engine implemented using Hazelcast in two distinct scenarios:

1. Mondrian Cache
2. CDA Cache


Features

  • Regarding the Mondrian Cache, the plugin should allow toggling between Hazelcast and the default Mondrian cache. This is achieved by changing the properties file for Mondrian.
  • As for the CDA cache, the plugin should also allow changing from Hazelcast to the existing cache provider and vice versa.
  • The plugin should provide an easy way to install a Hazelcast node on a standalone machine, enabling it to join the cache cluster.

Besides the backend functionality, the plugin must have a user interface (can be a CDE dashboard, for instance) that allows:

1. Toggling between cache providers both for Mondrian and for CDA

2. Checking cache status (Running or not running / How many elements are cached)

3. Changing cache strategy and parameters

4. Graphically view cluster composition (how many hazelcast nodes and where they are) and related information (percentage memory used, number of cached elements in each node - we need to check what is exposed by Hazelcast)

5. Cache cleaning operation.

6. View cached information - Ability to see cached key/values and ability to search by key (again, check what Hazelcast provides for these kinds of operations).


A way to clean individual cache blocks in Mondrian is also needed. By this we mean that we should be able to easily clean the cache entries related to a given cube or query. This should also be supported by a GUI that enables drilling down on the cube's metadata.
This functionality can either be developed as part of the Community Distributed Cache or as a standalone plugin.

---------

This is what we have planned - Any comments, features, requests?