Posts by Kimalbrecht (9 posts)

Time Series Visualizations – An overview

“Time-series — sets of values changing over time”
A Tour Through the Visualization Zoo 
http://hci.stanford.edu/jheer/files/zoo/

This description of the term “time-series” is very close to the definition in the Oxford dictionary, which adds that the term comes from statistics and that the intervals within a time series are often equal.
http://www.oxforddictionaries.com/definition/english/time-series?q=time-series

Within our research project we are mainly interested in the visualization side of the vast field of statistics. In his book “The Visual Display of Quantitative Information”, Edward Tufte describes time-series visualizations as follows:

“With one dimension marching along to the regular rhythm of seconds, minutes, hours, days, weeks, months, years, centuries, or millennia, the natural ordering of the time scale gives this design a strength and efficiency of interpretation found in no other graphic arrangement.” 
Edward R. Tufte
The Visual Display of Quantitative Information
p. 28

Classical datasets for time-series visualizations are temperature, wind, condensation (or any other kind of weather measurement), stock data, population change, electricity usage and so on. The field is so vast that Tufte cites a study of graphics published between 1974 and 1980 in which 75% of the graphics were time-series visualizations. Obviously, more than 30 years later the field has changed, but time series still seem to be an important part of it.

In my opinion, most network security data does not initially come as values that change over time. Flow data, for example, is structured as nodes and edges with additional information. These single incidents in time do not have the same characteristics as the usual time-series datasets, where one value changes. But at a certain level of abstraction (for example, by counting incidents within set timeframes), or by combining time series with other methods such as network visualizations, this kind of graphic could be very helpful for us.
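Counting incidents within set timeframes can be sketched in a few lines. This is only a minimal illustration: the function name `count_per_bin` and the sample timestamps are made up for the example and do not come from any real flow dataset.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_per_bin(timestamps, bin_size):
    """Turn single incidents into a time series of (bin_start, count) pairs.

    Bins are fixed-width and start at the earliest timestamp; empty bins
    are kept with a count of 0 so the intervals stay equal.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    # timedelta // timedelta gives the integer index of the bin
    counts = Counter((t - start) // bin_size for t in timestamps)
    last_bin = max(counts)
    return [(start + i * bin_size, counts.get(i, 0)) for i in range(last_bin + 1)]


# Hypothetical incidents, given as seconds after an arbitrary start time
t0 = datetime(2001, 7, 19, 0, 0, 0)
events = [t0 + timedelta(seconds=s) for s in (0, 20, 59, 60, 200)]

for bin_start, count in count_per_bin(events, timedelta(minutes=1)):
    print(bin_start.time(), count)
```

The result is an ordinary time series with one value per minute, which can then be drawn with any of the chart types discussed in this article.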

This article first summarises a few classical time-series examples and then looks at recent developments in the field.

The first time-series visualization was designed in the tenth or possibly eleventh century. It shows the changing positions of the planets with the time on the x-axis.

As we will see, putting time on the x-axis is still the most common way of presenting time-series graphics. In his book “Data Points”, Nathan Yau gives an overview of what are, in his opinion, the most common forms of time-series visualization: bar graphs, line charts, dot plots and dot-bar graphs. All of these charts are actually similar in what they do; the only difference is the graphical representation of the data. While all of them put time on the x-axis, Yau also gives two examples of different representation methods: radial plots, which are similar to line charts but circular, and calendar heat maps.

Jeffrey Heer, Michael Bostock and Vadim Ogievetsky from Stanford University give a different overview of time-series visualizations in their article “A Tour Through the Visualization Zoo”. Their overview starts with the index chart, which is an interactive line chart.

Index Chart
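An index chart shows each series as a percentage change relative to a chosen index point, so series on very different scales become comparable. A minimal sketch of that normalization, with a hypothetical function name and made-up prices:

```python
def index_series(values, index_point):
    """Express every value as a percentage change relative to values[index_point]."""
    base = values[index_point]
    if base == 0:
        raise ValueError("the value at the index point must not be zero")
    return [(v - base) * 100.0 / base for v in values]


# Made-up prices; the index point is the first observation
prices = [50.0, 55.0, 45.0, 60.0]
print(index_series(prices, 0))  # [0.0, 10.0, -10.0, 20.0]
```

In the interactive version the reader moves the index point, which amounts to calling the function again with a different `index_point`.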

Stacked graphs are area charts that are stacked on top of each other. They are also called streamgraphs. What makes them special is that they give a visual summation of all time-series values.
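The visual summation comes from the way each layer sits on top of the previous one. The sketch below assumes the simplest case, a flat zero baseline; streamgraph layouts shift the baseline to smooth the silhouette, which is not shown here. The function name and the numbers are illustrative only.

```python
def stack_layers(series):
    """Stack equal-length series on a zero baseline.

    Returns one list per layer holding (lower, upper) edges for each
    time step; the upper edge of the last layer is the total.
    """
    n = len(series[0])
    baseline = [0.0] * n
    layers = []
    for values in series:
        if len(values) != n:
            raise ValueError("all series must have the same length")
        top = [b + v for b, v in zip(baseline, values)]
        layers.append(list(zip(baseline, top)))
        baseline = top  # the next layer starts where this one ends
    return layers


layers = stack_layers([[1, 2, 3], [2, 2, 2], [0, 1, 4]])
totals = [upper for _, upper in layers[-1]]
print(totals)  # [3.0, 5.0, 9.0]
```

Only the bottom layer has a flat lower edge, which is one reason individual values in the upper layers are harder to read.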

Stacked graphs are very controversial. Alberto Cairo, graphics director at El Mundo Online, wrote in a blog article that stacked graphs are “one of the worst graphics the New York Times have published – ever!” On the other hand, the authors of the first paper on stacked graphs described them as “simplifying the user’s task of tracking individual themes through time by providing a continuous ‘flow’ from one time point to the next”. Furthermore, “we believe this metaphor is familiar and easy to understand and that it requires little cognitive effort to interpret the visualization”. Both points seem valid to me. The cognitive effort needed for some contemporary visualizations is so high that they are hard to understand without a lot of work. Stacked graphs are very easy to understand for the complexity they hold, but how much information can actually be read from them is questionable. Andy Kirk from visualisingdata.com credits both sides very fairly in his blog article about the graphs with these comments:

“… a streamgraph is a fantastic solution to displaying large data sets to a mass audience.”

“The main problem facing static streamgraphs lies in the difficulty of reading data points formed by uncommon shapes.”

Tools: D3, Processing

Papers: ThemeRiver: Visualizing Theme Changes over Time; Stacked Graphs – Geometry & Aesthetics

Examples: The Ebb and Flow of Movies, How Different Groups Spend Their Day, Trace (this one is about visualizing wireless networks)

 

Stacked Graph

Small multiples are multiple time-series graphs arranged within a grid (what kind of graphs these are is another question; in this case they are area charts). As opposed to stacked graphs, small multiples are more useful for understanding the different datasets on their own than as a summary.

Small Multiples

The last example from the article is the horizon graph. Horizon graphs are also area charts, but they are mirrored and separated into bands by opacity. This is especially interesting in combination with small multiples because the “data density” is much higher than with classic area charts, which leads to more information in a smaller space. This is an important factor when we are dealing with big datasets.

Horizon Graph
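The mirroring and banding can be sketched as a small data transformation. This is one common way to construct a horizon graph, written under assumptions: the band height, the number of bands and the function name are chosen for illustration and are not taken from the article.

```python
def horizon_bands(values, band_height, n_bands):
    """Split each value into layered bands for a horizon graph.

    Negative values are mirrored onto the positive side and flagged so
    they can be drawn in a second colour. Band i holds the part of the
    magnitude between i * band_height and (i + 1) * band_height; the
    bands are then drawn on top of each other with increasing opacity.
    """
    rows = []
    for v in values:
        magnitude = abs(v)
        bands = [
            min(max(magnitude - i * band_height, 0.0), band_height)
            for i in range(n_bands)
        ]
        rows.append((v < 0, bands))
    return rows


for mirrored, bands in horizon_bands([5, 25, -12], band_height=10, n_bands=3):
    print(mirrored, bands)
```

With three bands the chart needs only a third of the vertical space of a plain area chart, which is where the higher data density comes from.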

There is some interesting research about the usefulness of horizon graphs that I recommend: Tool, Paper, Article

 

The list of graphics from the Stanford group is much more contemporary than the examples from Nathan Yau, but all of these examples still use the same mechanism to visualize time-series data: one axis serves as the dimension for time. This way of visualizing time, now more than 1,000 years old, is helpful and very common but might not always be the best choice. As we know from scatterplot visualizations, the two spatial dimensions within a graphic are perhaps the most powerful ones for pattern recognition, and time might not be the main factor in identifying these patterns. So what other ways are there to use time as a dimension within a visualization, apart from space?

Animation:
At least since Hans Rosling’s famous TED talks, the use of animation for displaying time has been common, and it seems to be the most obvious way to visualize time: quite literally through time. But the technique needs to be used with caution.
Tamara Munzner’s visualization principles give a great insight, on page 59, into why visualizing time with animation is dangerous:

Principle: external cognition vs. internal memory

  • easy to compare by moving eyes between side-by-side views
  • harder to compare visible item to memory of what you saw

Implications for animation

  • great for choreographed storytelling
  • great for transitions between two states
  • poor for many states with changes everywhere

There is also a paper about the topic which gives more insights into the problem.

Small multiples:
I already mentioned small multiples above, but as I noted there, the idea behind small multiples is more of a frame for visualizations than an actual kind of visualization. In this way we can also use each multiple as a timeframe. A beautiful example of small multiples with time as a dimension comes from the NYTimes graphics department.

Binning time in bubbles:
The idea here is to use bubble charts in which the time dimension is binned by minutes, days, years etc., so that each bin becomes one bubble and the bubbles can be compared to each other. In the Nasdaq 100 Index example, each year is represented by one bubble.

Scatterplots:
In these scatterplots, time is displayed as connected points plotted against two variables. This is similar to the animation idea, but in this case the animated dots leave a path behind. Here, too, the NYTimes has a good example.

GED VIZ / Boris Müller & Raureif / Bertelsmann Foundation / 2013

GED VIZ is an HTML5 visualization of economic and demographic relations between countries, shown as network relations. It is highly customizable: different datasets and all countries worldwide can be selected, and it is based on a timeline. The customized graphic can be exported and used externally.

 

GED VIZ


512 Paths to the White House / Mike Bostock / 2012

Mike Bostock and Shan Carter’s 512 Paths to the White House shows all possible paths to victory for the two 2012 US presidential candidates, Mitt Romney and Barack Obama.

512 Paths to the White House


Every Day Of My Life / Marcin Ignac / 2010

Marcin Ignac’s visualization “Every Day of My Life” is a static poster visualizing his computer usage statistics from the last 2.5 years. Each line represents one day. Colored areas represent different applications, while black represents times when his computer was turned off. This makes his sleeping patterns, coffee breaks and sleepless nights in front of the computer visible.

Every Day Of My Life


Scatterplot Matrix / Mike Bostock / 2013

The scatterplot matrix visualizations from Mike Bostock plot each variable in the dataset against every other one. When a range is selected within one cell of the matrix, the selected data points are highlighted in every cell.

Scatterplot Matrix

 


 

Map your moves / Moritz Stefaner / 2010

Map your moves represents more than 4,000 immigration and emigration patterns from over 1,700 people. Each circle represents one zip code area in New York; its size represents the number of people moving from and moving to it. The colors distinguish people moving into the city (red) from people moving out of New York (blue).

Map your moves


Code Red Visualisations

Code Red was a computer worm observed on the internet in July 2001. On the 12th of the month the malware began to replicate itself and spread to other computers running Microsoft’s IIS web server. Once a system had been infected, the worm checked the machine’s system clock. If the date was between the 1st and the 19th of the month, Code Red generated a random list of IP addresses from a static seed and tried to infect the machines at those addresses. From the 20th to the 28th of the month the worm launched a denial-of-service attack against the website whitehouse.gov. In a research project at the Interaction Design Laboratories at the University of Applied Sciences Potsdam we tried to find different visualization formats to develop a better understanding of the worm.
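The date-dependent behaviour can be summarised as a small lookup. This only mirrors the schedule as stated in the paragraph above; the function name and the phase labels are hypothetical, and days after the 28th are marked as not covered because the text does not describe them.

```python
def code_red_phase(day_of_month):
    """Return the phase described in the text for a given day of the month."""
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    if day_of_month <= 19:
        return "spread"        # scan generated IP addresses and try to infect them
    if day_of_month <= 28:
        return "attack"        # denial-of-service phase against whitehouse.gov
    return "not covered"       # behaviour after the 28th is not described here


for day in (12, 19, 20, 28, 29):
    print(day, code_red_phase(day))
```

For the visualizations this matters because it explains why the infection data stops so abruptly: every infected machine switches phase based on its own clock.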

Autonomous System Network

Visualisation of 15,000 attacked Autonomous Systems and their connections to each other during the Code Red epidemic. The connectivity of the nodes is represented by their colour and size. Magenta nodes are only sparsely connected. Blue nodes are highly connected autonomous systems, also called “hubs”. The connectedness of a node is measured by its degree, that is, how many links lead into and out of it. The most attacked node, an AS belonging to Korean Telecom which received 13,835 attacks, is a not particularly well connected system within the network. It is coloured green. The two most connected nodes are UUNET, which was one of the largest Internet providers in the United States and was attacked 10,767 times, and the most connected node of all, toplink GmbH, a German VoIP provider, which was attacked only 34 times. In many networks, such as cells or disease networks, epidemics spread through the hubs of a system and therefore affect those the most. In the case of Code Red this cannot be said.
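The degree of a node is simply the number of links that touch it. A minimal sketch, assuming the network is available as a list of undirected links; the AS names below are placeholders, not entries from the Code Red dataset.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many links touch each node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


# Placeholder links between hypothetical autonomous systems
edges = [("AS1", "AS2"), ("AS1", "AS3"), ("AS1", "AS4"), ("AS2", "AS3")]
degrees = node_degrees(edges)
print(degrees.most_common(1))  # [('AS1', 3)]
```

Sorting the nodes by this count gives the hubs, and comparing that ranking with the number of attacks per node is what shows that the most attacked systems here were not the best connected ones.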

Attacks Radial

All attacks mapped by time and by their location in latitude and longitude on a radial layout. Each point represents one attacked system and the time at which it was attacked. The points are coloured by the length of the attack, from red for systems attacked for only seconds to blue for attacks lasting up to 30 hours. All countries with more than 4,000 attacks are placed around the radial layout by their longitude.

Attacks Timeline

All attacks mapped by time and autonomous system. This is the same dataset as in “Attacks Radial”, this time not on a radial layout but on a coordinate system. What is interesting here are the different interpretations we can make from the two views. While the radial version makes clear where the attacks go, in this version the anomalies at 17h become much clearer, as does the abrupt end of the worm after 24h.

Autonomous System Hiveplot

This graphic is admittedly not very readable, and there are other, more helpful ways to visualize autonomous system networks. But in two respects the structuring of the nodes helps to develop an understanding of the network. First, it shows how much bigger the two biggest nodes are compared to the rest of the network. Second, it shows the long tail: there is a large number of nodes with only one connection and very few nodes with more than that. This kind of network is very easy to attack, and epidemics can spread through it very quickly.

Visualizing a day of financial transactions on NASDAQ

Design and technology studio Stamen visualized financial transactions of buy and sell data on NASDAQ during a single day.

What’s interesting about this visualization is the density of information captured within the dataset, and how it uses our pattern recognition capabilities to reveal repetitions and outliers in such a dense set of data.

http://content.stamen.com/visualizing_a_day_of_financial_transactions_on_nasdaq

http://content.stamen.com/visualizing_a_day_of_financial_transactions_on_nasdaq_part_2

For each transaction they mapped:

  • time of the transaction, to the second
  • whether it was buy or sell
  • price of the transaction
  • number of shares traded

Each image represents one minute in time and shows every trade that happens within the timeframe.

Each trade is shown as a circle:

  • Every vertical column is a second in time. So the left-hand side of the screen is the beginning of the minute, the middle of the screen is 30 seconds in, and the right-hand side of the screen is the end of the minute, with 60 seconds in between.
  • Blue dots are buys, yellow dots are sells
  • The vertical axis is the price of the transaction; the top of the screen is cheaper stocks and the bottom is more expensive stocks.
  • The size of the dot is the number of shares traded; small dots are for a few shares and larger dots are for a larger number of shares.
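The encoding in the list above can be written down as a mapping from one trade to one circle. This is a sketch under assumptions, not Stamen's actual code: the linear price scale, the square-root size scale and all names are chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Trade:
    second: int    # second within the minute, 0..59
    is_buy: bool
    price: float
    shares: int


def trade_to_circle(trade, width, height, max_price, radius_scale=0.5):
    """Map a trade to (x, y, radius, colour) in screen coordinates.

    y grows downwards, so cheaper trades end up near the top of the
    image and more expensive ones near the bottom.
    """
    x = trade.second / 60 * width              # one column per second
    y = trade.price / max_price * height       # assumed linear price scale
    radius = radius_scale * math.sqrt(trade.shares)  # area grows with share count
    colour = "blue" if trade.is_buy else "yellow"
    return x, y, radius, colour


# A hypothetical buy of 400 shares at half the maximum price, 30 seconds in
print(trade_to_circle(Trade(30, True, 50.0, 400), width=600, height=400, max_price=100.0))
```

Because every trade becomes exactly one mark, a whole minute of trading fits into a single image, and bursts show up as dense columns.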

8:30-8:31 AM
The images always show one minute of transactions. Bursts like this one at 8:30 become easily visible.

 

9:29-9:30 AM

Before trading opens for the public a dense wave of small transactions happens.

 

9:30-9:31 AM


Opening of the public trade creates a massive burst of activity.

 

In these visualizations a unique color represents each trader:

The orangish square above shows a single trader performing a burst of concentrated activity within precisely delineated margins.

Here a unique color represents each stock. The data is the same as in the image above. It becomes visible that the single trader trades a wide range of small stocks.


Packetloop

Packetloop is a tool for analysing network traffic through data visualization. It inspects every packet, conversation, protocol and file to find threats and deviations from normal traffic. It does not visualize live data; rather, it is built on file uploads. Packetloop represents the data in four different ways: by threats, sessions, protocols and files by location. So far, however, only the threats visualization works.
