3 articles Articles posted in Visualization Work

PixelCarpet

The paper about the Pixel Carpet is one of the results from a collaboration between data visualization researchers from FHP and computer security engineers of various institutions. It builds on the observation that security engineers know their data and the requirements of their work very well. However, they might not be acquainted with advanced visualization techniques. Visualization researchers, on the other hand, know methods to visualize and analyze data but usually lack insight into the specific requirements of computer network security. The paper revolves around two main contributions:

  • results and learnings from a co-creative approach of jointly developing visualizations
  • a pixel oriented visualization technique that graphically represents multi-dimensional data sets (such as computer log files), reflecting ideas from the collaboration

You can get and read the full paper here (27 MB or 4 MB without video). Please feel free to comment to this post or contact us for any details.

Landstorfer, Herrmann, Stange, Dörk, Wettach (2014): Weaving a Carpet from Log Entries: a Network Security Visualization Built with Co-Creation. in Visual Analytics Science and Technology (VAST), 2014 IEEE Conference on, 2014 (to appear)

Co-creative Approach

User centered approaches are well known in the visualization community (although not always implemented) [D'Amico et al. 2005, Munzner et al. 2009]. Jointly developing the visualizations themselves, however, is rather rare. As we have very good experience with co-creative techniques in design and innovation, we wanted to apply them to the domain of data visualization as well. For example, we tried to experiment with data sets during a day-long workshop with a larger group of stakeholders (a session we called the “data picnic” because everyone brought his/her data and tools).

Visualization

For this paper, we focused on a pixel oriented technique [Keim 2000] to fullfill requirements such as visualization of raw data or a chronological view of data to preserve the course of events. We stack graphical representations for various parameters of a log line (such as IP, user name, request or message) so that we get small columns for each log line. Lining up these stacks produces a dense visual representation with distinct patterns. This is why we call it the Pixel Carpet. Other subgroups of our research group took different approaches that can be found at other places in this blog.

Snapshot of the Pixel Carpet interface. Each "multi pixel" represents one log line, as it a appears at the bottom of the screen.Snapshot of the Pixel Carpet interface. Each “multi pixel” represents one log line, as it a appears at the bottom of the screen.

Data and Code

Our data sources included an ssh log (~13.000 lines, unpublished for privacy reasons) and an Apache (web server) access log (~145.000 lines, unpublished), and ~4.500 lines (raw data available, including countries from ip2geo .csv | .json ).

We implemented our ideas in a demonstrator in plain HTML/JavaScript (demo online – caution, will heavily stress your CPU). It helped us iterate quickly and evaluate the idea at various stages, also with new stakeholders. While the code achieves what we need, we are also aware that computing performance is rather bad. If you want to take a look or even improve it, you can find it on github.

To bring it closer to a productive tool, we would turn the Pixel Carpet into a plugin for state-of-the-art data processing engines such as ElasticSearch/Kibana or splunk (scriptable with d3.js since version 6).

IPython: interactive/self-documenting data analysis

IPython is an “interactive” framework for writing python code. Code snippets can be run at the programmer’s will and the output will be displayed right below the code. Together with rich input from html-markup to iFrames, an entire workflow can be fully documented. This is very handy for learning, of course, but also to make a complex analysis of a computer incident available and transparent to later readers. As everything (docu, code, output) gets “statically” saved in JSON, the documentation is even independent of the availability of data sources. (Note: there is also a special “Notebook viewer” available online so the reader doesn’t have to know/have IPython her/himself)

As a couple of powerful viz and analysis libraries are available for Python (such as PANDAS), this is (almost) ideal for recording an analysts way to a result.

Ideas for improvement:

  1. make it even more interactive/auto-updating so that changes in one place (“cell”) show up in other places at once (maybe even work with realtime sources?) – maybe towards frameworks like puredata/MAX: this would help explore various parameters for the analysis functions.
  2. Think about some auto-recording functions so that documentation becomes easier and the “author” has to think less about it. This might be especially possible in the narrow context of network security analysis where certain procedures are standardized or very common.

See how it works, e.g. with PCAPS (German)

Thanks to Genua who shared their internal training so well recorded and so generously!

Tags: , , , ,

Code Red Visualisations

Code Red was a computer worm observed on the internet in July 2001. On the 12th of the month the malware program began to replicated itself to spread to other computers through networks of Microsoft’s IIS web-server. Once a system got attacked the worm checked the system clock of the machine, if the date was between the 1st and the 19th of the month code red generated a random list of IP addresses from a static seed and infected the machines of those IP addresses. From the 20th to the 28th of the month the worm started a Denial-of-Service attack against the website whitehouse.gov. Through a research project at the Interaction Design Laboratories at the University of Applied Sciences Potsdam we tried to find different visualization formats to develop a better understanding of the worm.

Autonomous System Network

Visualisation of 15.000 attacked Autonomous Systems and their connections to each other during the Code Red epidemic. The connectivity of the links is represented by their colour and size. Magenta nodes are only rawly connected. Blue nodes are highly connected autonomous systems also called “hubs”. The connectedness of a node is measured in degrees, how many links do refer and go out from each node. The most attacked node is a not too well connected system within the network, an AS from the Korean Telecom which received 13.835 attacks. It is coloured green within the network. The two most connected nodes are UUNET which was one of the largest Internet providers in the United States it got attacked 10.767 times. And the most connected link toplink GmbH a german VoIP provider which only got attacked 34 times. In many network systems like cells or diseases epidemics spread through the hubs of a system and by doing so also affect those the most. In the chase of code red this can’t be said.

Attacks Radial

All attacks mapped by time and their location in latitude and longitude on a radial layout. Each point represents one attack and the time when it got attacked. The nodes are coloured in by the length of the attack, from red if the system was only attacked for seconds up to 30 hours in blue. All countries with more than 4.000 attacks are mapped around the radial layout by their longitude.

Attacks Timeline

All attacks mapped by time and Autonomous system. The same dataset as the Attacks-Radial-Lat-Lon-Time this time not radial but on a coordinate system. What’s interesting here are the different interpretations we can make from the two datasets. While it becomes clear were the attacks go in the radial version, in this version the anomalies at 17h become much more clearer as well as the abrupt end of the worm after 24h.

Autonomous System Hiveplot

Actually this graphic is not really readable and there are other forms to visualize Autonomous Systems Networks that are more helpful. But in two instances the structuring of the nodes can help to develop an understanding of the network. First it shows how much bigger the two biggest nodes are in the network compared to the rest and it shows the long tail there are a large amount of nodes with only one connection and very little nodes with more than that. This kind of network is very easy to attack and epidemics can spread very quickly.