Insights don’t fall out of data without the application of rigorous analysis, but the good news is that much of the routine analysis and presentation of “known unknowns” can be automated.
The key to generating data insights is a good, clean foundation to build on – traditionally a data warehouse, but more recently a data lake: a large repository of data that isn’t necessarily as highly structured as a data warehouse. “Big data” is the common term for systems that include, and are built on top of, data lakes. The original premise was that there is so much data, captured so quickly and from so many sources – often summarised as volume, velocity and variety – that the cost of ordering and structuring it upfront isn’t worth paying. That premise turned out to be mostly wrong, so it’s now common to add structure downstream, and even to duplicate data into multiple structures, to make run-time processing more efficient.
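As a toy illustration of that downstream structuring (hypothetical table and field names, SQLite for brevity): events land as-is in a “lake” table first, and a query-friendly copy is derived from them afterwards.

```python
import json
import sqlite3

# Hypothetical "lake" table: events land as-is, payload kept as raw JSON text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (ts TEXT, source TEXT, payload TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [
    ("2024-01-01", "web", '{"amount": 10}'),
    ("2024-01-01", "app", '{"amount": 25}'),
    ("2024-01-02", "web", '{"amount": 5}'),
])

# Structure is added downstream: parse the payload and duplicate the data
# into a query-friendly table so run-time aggregation stays cheap.
conn.execute("CREATE TABLE sales (day TEXT, source TEXT, amount REAL)")
for ts, source, payload in conn.execute(
        "SELECT ts, source, payload FROM raw_events").fetchall():
    conn.execute("INSERT INTO sales VALUES (?, ?, ?)",
                 (ts, source, json.loads(payload)["amount"]))

totals = conn.execute(
    "SELECT day, SUM(amount) FROM sales GROUP BY day ORDER BY day").fetchall()
print(totals)  # [('2024-01-01', 35.0), ('2024-01-02', 5.0)]
```

The same raw data can be duplicated into several such derived tables, each shaped for a different query pattern.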
That data storage layer – big or otherwise – needs to have different data sources integrated together, or at least staged/landed and co-located.
If the target for your insights is a computer – or a service – you want a clean and stable API. If it’s a human then you need some good data viz.
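For the machine-consumer case, a minimal sketch of what “clean and stable” can mean – an explicitly versioned response schema (all names here are hypothetical):

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical schema: version the contract explicitly so machine consumers
# can rely on it even when the storage underneath changes.
@dataclass
class MetricPoint:
    period: str
    value: float

@dataclass
class MetricResponse:
    api_version: str
    metric: str
    points: list

def render_response(metric: str, points: list) -> str:
    """Serialise one insight as a stable, versioned JSON payload."""
    return json.dumps(asdict(MetricResponse("1", metric, points)))

print(render_response("daily_sales", [MetricPoint("2024-01-01", 35.0)]))
```

The point is the stable contract, not the serialisation mechanics: consumers code against `api_version`, not against the warehouse schema.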
If you have all of those parts so far then you have the makings of a business intelligence (BI) system.
Taking it further than BI you might then want to do some analytics, maybe some machine learning or other smarts to automate the detection of patterns and anomalies within the data.
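A minimal sketch of that kind of automation – flagging values that deviate sharply from the rest of a series. The data and the 1.5-standard-deviation threshold are illustrative, not a recommendation:

```python
from statistics import mean, stdev

def find_anomalies(series: dict, threshold: float = 1.5) -> dict:
    """Return points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(series.values()), stdev(series.values())
    return {k: v for k, v in series.items() if abs(v - mu) > threshold * sigma}

daily_sales = {"Mon": 100, "Tue": 104, "Wed": 98, "Thu": 101, "Fri": 400}
print(find_anomalies(daily_sales))  # {'Fri': 400}
```

Real systems use more robust methods (seasonality-aware models, machine-learned baselines), but the shape is the same: compute an expectation, then surface the points that break it.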
If that sounds like a lot of work, well, it is – but it’s worth it if you do it right. I’ve found there are some useful things to keep in mind that apply to pretty much any system I’ve worked with, no matter how exotic, and I suspect they will hold true for the foreseeable future:
- There are large volumes of data available
- Get the database, Hadoop or other data engine to do the processing work; don’t bring it all to the visualisation on the client side – although I’ve seen some systems do this well as a hybrid approach, for example MapD
- Sending large volumes of data over the network to a browser will take a long time
- Aim to trickle smaller amounts of data to the consumer by breaking the report into smaller reports, because waiting short periods for each part is less annoying than waiting ages for the whole lot to load
- Consider using workflow- and storytelling-based navigation
- Start with a summary or a set of anomalous points of interest
- 99% of the data is boring – business as usual. Stick to the interesting parts
- No matter how much people tell you they want to see all the data, they really don’t
- What they really want is proof and reassurance that they can trust the calculations and aggregations
- They might want to dump the data and play with it themselves in Excel or import it to some other system and that’s OK, but a better approach is to save them that work and highlight insights for them
- Visualising more than 100 data-points on a chart is possible but not practical given your eyes and the size of your screen
- Seeing all the data at once is information-overload
- Try to help your users by providing an ‘exception’ report or highlighting areas that warrant explanation or further investigation.
- This is even more important when working with mobile or tablet sized screens.
- Locking up the data source for a huge query will not only make for a long wait but will also degrade the experience of other users
- Start at a high aggregation level and zoom in to a specific period or category of interest
- Alternatively start with a default filter for a small recent time period and set of categories – for example, the last couple of days or weeks, and the most common categories, or a total
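Several of the points above – pushing the aggregation into the data engine, and defaulting to a small recent window – can be sketched in a few lines (hypothetical schema, SQLite for brevity):

```python
import sqlite3

# Hypothetical sales table; in practice this lives in the warehouse/lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, category TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01-01", "books", 10), ("2024-01-08", "books", 12),
    ("2024-01-08", "games", 30), ("2024-01-09", "games", 25),
])

# Default view: aggregate in the database and ship only the small,
# recently-filtered summary to the client, not every row.
summary = conn.execute("""
    SELECT category, SUM(amount) AS total
    FROM sales
    WHERE day >= '2024-01-08'        -- small, recent default window
    GROUP BY category
    ORDER BY total DESC
""").fetchall()
print(summary)  # [('games', 55.0), ('books', 12.0)]
```

The browser only ever sees a handful of aggregated rows; widening the date filter or drilling into a category just re-runs the query server-side.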
Data visualisation is as much science as it is art. From a design perspective, charts and graphs need to be visually appealing, making good use of colour (pay attention to branding and brand standards), shape, contrast and layout. From a science perspective, the tool needs to convey knowledge and information clearly and succinctly. Don’t overload it with information; stick to the pertinent items.
Visualisations can either be exploratory or take a more guided, storytelling approach. If you are developing a tool for others, take the storytelling approach. Consider the user experience, the psychology of cognition, the design, the flow and the key points being conveyed. Approach it from the point of view of answering business questions that either drive efficiency gains or lead to the generation of new business value.
For more information see:
Shneiderman’s “Eight Golden Rules of Interface Design”
Sustainability will soon be universally accepted not as a separate entity, but rather as an encompassing concept that includes multi-faceted performance perspectives. Whether you are running a business, researching for investment purposes or auditing for process and values alignment, sustainability is the core concept that predicts the longevity and resilience of organisations.
Science and engineering, especially biomimicry and other nature-inspired approaches, are powerful tools for balancing the competing priorities of quality of life, continued prosperity and the preservation of our planetary life-support system. People need meaningful purpose, economies need energy, and the planet must be protected from the waste and pollution caused by fulfilling those other goals.