This vignette assumes an understanding of IP addresses and networks. Please consult
vignette("ipaddress-classes", "ipaddress") for a very basic introduction.
Data visualization of the IP address space is challenging because there are so many unique addresses (approximately 4.3 billion for IPv4 and 3.4 × 10^38 for IPv6). Owing to the hierarchical nature of the address space, we must plot the addresses on a discrete scale (not a continuous scale). It’s simply not possible to display (or interpret) such a large number of discrete levels simultaneously.
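There are a few actions we can take to improve the situation: show only a subnetwork of the address space, make each discrete level represent a network of addresses rather than a single address, and map the discrete levels onto a two-dimensional plane using a space-filling curve. These are handled by the
canvas_network, pixel_prefix, and curve arguments of
coord_ip(), respectively. This vignette describes these actions in more detail.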
As an example, consider the 32-bit representation of the IPv4 address
192.168.0.124. If we wanted to visualize this single address within the full context of the IPv4 address space, we’d need to simultaneously display 2^32 discrete levels (roughly 4.3 billion).
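To make the 32-bit representation concrete, here is a minimal base R sketch. The helper name ipv4_to_bits() is hypothetical and is not part of ipaddress or ggip; it only illustrates how the four octets expand into 32 bits.

```r
# Hypothetical helper (not part of ipaddress or ggip): convert a
# dotted-quad IPv4 string into its 32-bit binary representation.
ipv4_to_bits <- function(address) {
  octets <- as.integer(strsplit(address, ".", fixed = TRUE)[[1]])
  stopifnot(length(octets) == 4, all(octets >= 0 & octets <= 255))
  # intToBits() returns the least-significant bit first, so take the
  # lowest 8 bits of each octet and reverse them.
  bits <- vapply(
    octets,
    function(o) paste(rev(as.integer(intToBits(o))[1:8]), collapse = ""),
    character(1)
  )
  paste(bits, collapse = "")
}

bits <- ipv4_to_bits("192.168.0.124")
print(bits)
# "11000000101010000000000001111100"
```

Each of the 32 positions can be 0 or 1, which is where the 2^32 possible addresses come from.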
To reduce the visualized information, we could show only a subnetwork of the full address space. In our example, we could display only the
192.0.0.0/8 network using
coord_ip(canvas_network = ip_network("192.0.0.0/8")). This would effectively keep only the addresses whose leading 8 bits match the specified network, thereby reducing the number of discrete levels to 2^24 (roughly 16.8 million).
Alternatively, we could make each discrete level represent a network of addresses. To do this, we’d need to use a summary function to reduce the network data to a single value. In our example, we could make each discrete level represent a network with a prefix length of 24 using
coord_ip(pixel_prefix = 24). This would effectively neglect the trailing 8 bits of the 32-bit address, thereby further reducing the number of discrete levels to 2^16 (65,536).
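The arithmetic behind these reductions can be sketched in a few lines of base R. The helper n_levels() is hypothetical and simply counts the bits left between the canvas prefix and the pixel prefix; it does not call ggip.

```r
# Hypothetical helper: number of discrete levels that remain when the
# canvas network has prefix length `canvas_prefix` and each level
# represents a network with prefix length `pixel_prefix`.
n_levels <- function(canvas_prefix, pixel_prefix) {
  stopifnot(canvas_prefix >= 0, pixel_prefix >= canvas_prefix)
  2^(pixel_prefix - canvas_prefix)
}

full_space  <- n_levels(0, 32)  # entire IPv4 space, one level per address
canvas_only <- n_levels(8, 32)  # 192.0.0.0/8 canvas, one level per address
both        <- n_levels(8, 24)  # 192.0.0.0/8 canvas, one level per /24 network

print(c(full_space, canvas_only, both))
```

The three values correspond to 2^32, 2^24, and 2^16 levels, matching the figures quoted in the text.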
These two techniques become even more important in the IPv6 address space, which uses 128-bit addresses.
Note: To prevent accidentally plotting an unreasonably large number of discrete levels, ggip limits the number of plotted bits to 24. This means the
coord_ip() arguments must satisfy:
pixel_prefix - prefix_length(canvas_network) <= 24
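As a rough illustration of this constraint, the following base R sketch checks a few combinations. The function plotted_bits_ok() is a hypothetical stand-in that takes prefix lengths as plain numbers; it is not how ggip itself performs the check.

```r
# Hypothetical check (not ggip's own code): TRUE when the number of
# plotted bits does not exceed the limit of 24 described above.
plotted_bits_ok <- function(canvas_prefix, pixel_prefix, max_bits = 24) {
  plotted_bits <- pixel_prefix - canvas_prefix
  plotted_bits >= 0 && plotted_bits <= max_bits
}

plotted_bits_ok(0, 32)   # FALSE: 32 plotted bits for the full IPv4 space
plotted_bits_ok(8, 32)   # TRUE: exactly 24 plotted bits
plotted_bits_ok(0, 24)   # TRUE: whole IPv4 space at /24 resolution
plotted_bits_ok(32, 64)  # FALSE: an IPv6 /32 canvas with /64 pixels needs 32 bits
```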
Inspired by an xkcd comic originally published in December 2006, we use a space-filling curve to map IP data (one-dimensional) to Cartesian coordinates (two-dimensional). This means our discrete levels become represented by pixels. Two curves are commonly chosen for this task: the Hilbert curve and the Morton curve (also known as the Z curve). Compared to other space-filling curves, these are advantageous because they preserve locality (i.e. subnetworks remain close together).
The curve order represents how nested the curve is and therefore determines how many data points can be visualized. Conversely, choosing the number of plotted bits (see above) determines the order of the curve. Since space-filling curves are fractal, increasing the curve order effectively improves the image resolution (plotted networks remain in the same overall location).
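The link between plotted bits and curve order can be sketched as follows. This assumes that each increase in order adds one bit per axis, so two plotted bits correspond to one order and the number of plotted bits is even; the helper curve_order() is hypothetical.

```r
# Hypothetical helper, assuming two plotted bits per curve order
# (one extra bit for each of the two axes).
curve_order <- function(canvas_prefix, pixel_prefix) {
  plotted_bits <- pixel_prefix - canvas_prefix
  stopifnot(plotted_bits >= 0, plotted_bits %% 2 == 0)
  plotted_bits / 2
}

ord      <- curve_order(0, 4)  # 4 plotted bits give a 2nd order curve
side     <- 2^ord              # the grid is side x side pixels
n_pixels <- side^2             # equals 2^(plotted bits)

print(c(order = ord, side = side, pixels = n_pixels))
```

Under this assumption, the 24-bit limit corresponds to a 12th order curve drawn on a 4096 × 4096 grid.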
IP data is most commonly displayed on a Hilbert curve because it has optimal locality preservation.
This curve starts in the top-left corner and ends in the top-right corner. It is chosen using
coord_ip(curve = "hilbert").
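For readers who want to see how a position along the curve becomes a pixel, here is a sketch of the widely used iterative index-to-coordinate conversion for the Hilbert curve, written in base R. It is not necessarily how ggip computes the mapping, and the orientation of intermediate points may differ from ggip's output; the function name hilbert_d2xy() is hypothetical. With y increasing downward, this variant starts in the top-left corner and ends in the top-right corner.

```r
# Hypothetical helper: map a distance `d` along a Hilbert curve of the
# given order to zero-based (x, y) grid coordinates.
hilbert_d2xy <- function(order, d) {
  n <- 2^order       # side length of the grid
  x <- 0L
  y <- 0L
  t <- as.integer(d)
  s <- 1L
  while (s < n) {
    rx <- bitwAnd(1L, t %/% 2L)
    ry <- bitwAnd(1L, bitwXor(t, rx))
    # Rotate or flip the quadrant so that sub-curves join up
    if (ry == 0L) {
      if (rx == 1L) {
        x <- s - 1L - x
        y <- s - 1L - y
      }
      tmp <- x
      x <- y
      y <- tmp
    }
    x <- x + s * rx
    y <- y + s * ry
    t <- t %/% 4L
    s <- s * 2L
  }
  c(x = x, y = y)
}

# All 16 pixels of a 2nd order curve, one row per position on the curve
pts <- t(vapply(0:15, function(d) hilbert_d2xy(2, d), integer(2)))
print(pts)
```

Consecutive rows always differ by exactly one step horizontally or vertically, which is the locality property described above.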
The Morton curve technically offers slightly poorer locality preservation than the Hilbert curve. However, the discontinuous jumps in the curve actually correspond to crossing IP network boundaries. In this sense, the Morton curve is a more natural representation of the IP network structure. For example, the start and end addresses of a network are always located diagonally across from each other.
This curve starts in the top-left corner and ends in the bottom-right corner. It is chosen using
coord_ip(curve = "morton").
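The Morton mapping is simpler to sketch than the Hilbert mapping, because it only de-interleaves the bits of the position. The base R code below assumes that even-numbered bits go to x and odd-numbered bits go to y, with y increasing downward; this convention and the name morton_d2xy() are assumptions for illustration, not ggip's internals.

```r
# Hypothetical helper: map a distance `d` along a Morton (Z) curve of
# the given order to zero-based (x, y) grid coordinates by
# de-interleaving the bits of `d`.
morton_d2xy <- function(order, d) {
  x <- 0
  y <- 0
  for (i in seq_len(order) - 1) {
    x <- x + ((d %/% 4^i) %% 2) * 2^i        # bit 2i of d -> bit i of x
    y <- y + ((d %/% (2 * 4^i)) %% 2) * 2^i  # bit 2i + 1 of d -> bit i of y
  }
  c(x = x, y = y)
}

# The whole 2nd order curve runs from the top-left to the bottom-right
morton_d2xy(2, 0)
morton_d2xy(2, 15)

# A block of four consecutive pixels (positions 4 to 7) occupies a
# 2 x 2 square whose first and last pixels sit on opposite corners
block_start <- morton_d2xy(2, 4)
block_end   <- morton_d2xy(2, 7)
print(rbind(block_start, block_end))
```

The last two points are (2, 0) and (3, 1), illustrating how the start and end of an aligned block lie diagonally across from each other.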
Finally, let’s consider a specific example.
coord_ip(canvas_network = ip_network("0.0.0.0/0"), pixel_prefix = 4, curve = "hilbert")
This coordinate system will use a 2nd order Hilbert curve to visualize the entire IPv4 address space, where each vertex represents a