This vignette assumes an understanding of IP addresses and networks. Please consult vignette("ipaddress-classes", "ipaddress") for a basic introduction.
Data visualization of the IP address space is challenging because there are so many unique addresses (approximately 4.3 billion for IPv4 and 3.4 × 10^38 for IPv6). Owing to the hierarchical nature of the address space, we must plot the addresses on a discrete scale rather than a continuous one. It’s simply not possible to display (or interpret) such a large number of discrete levels simultaneously.
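There are a few actions we can take to improve the situation:
1. Visualize only a subnetwork of the full address space.
2. Let each discrete level represent a network of addresses rather than a single address.
3. Map the discrete levels onto a two-dimensional grid using a space-filling curve.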
These are handled by the canvas_network, pixel_prefix and curve arguments of coord_ip(), respectively. This vignette describes these actions in more detail.
As an example, consider the 32-bit representation of the IPv4 address 192.168.0.124. If we wanted to visualize this single address within the full context of the IPv4 address space, we’d need to simultaneously display 2^32 discrete levels (roughly 4.3 billion).
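To make the 32-bit representation concrete, here is a minimal base R sketch. The helpers ipv4_to_integer() and integer_to_bits() are hypothetical names introduced for illustration; they are not part of ipaddress or ggip.

```r
# Hypothetical helper: convert a dotted-quad IPv4 string to its integer value.
ipv4_to_integer <- function(x) {
  octets <- as.numeric(strsplit(x, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Hypothetical helper: write a non-negative integer as a fixed-width bit string.
integer_to_bits <- function(n, width = 32) {
  powers <- 2^((width - 1):0)
  paste((n %/% powers) %% 2, collapse = "")
}

addr <- ipv4_to_integer("192.168.0.124")
bits <- integer_to_bits(addr)

addr   # 3232235644
bits   # "11000000101010000000000001111100"
2^32   # 4294967296 possible IPv4 addresses
```

Each of the four octets contributes 8 bits, which is why the full address space has 2^32 discrete levels.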
To reduce the visualized information, we could show only a subnetwork of the full address space. In our example, we could display only the 192.0.0.0/8 network using coord_ip(canvas_network = ip_network("192.0.0.0/8")). This would keep only the addresses whose leading 8 bits match the specified network, thereby reducing the number of discrete levels to 2^24 (roughly 16.8 million).
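The effect of restricting the canvas can be sketched with plain integer arithmetic. This is only an illustration of the idea of matching leading bits, not ggip's internal implementation; ipv4() and in_canvas() are hypothetical helpers.

```r
# Hypothetical helper: build the integer value of an IPv4 address from octets.
ipv4 <- function(a, b, c, d) a * 2^24 + b * 2^16 + c * 2^8 + d

# Hypothetical helper: TRUE when the leading `prefix` bits of `address`
# match those of `network`.
in_canvas <- function(address, network, prefix) {
  block <- 2^(32 - prefix)  # number of addresses in the canvas network
  address %/% block == network %/% block
}

addresses <- c(ipv4(192, 168, 0, 124), ipv4(10, 0, 0, 1))
kept <- in_canvas(addresses, ipv4(192, 0, 0, 0), prefix = 8)

kept               # TRUE FALSE
levels_in_canvas <- 2^(32 - 8)
levels_in_canvas   # 16777216
```

Only the first address shares its leading 8 bits with 192.0.0.0/8, so only it would fall on the canvas.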
Alternatively, we could make each discrete level represent a network of addresses. To do this, we’d need to use a summary function to reduce the data within each network to a single value. In our example, we could make each discrete level represent a network with a prefix length of 24 using coord_ip(pixel_prefix = 24). This would effectively ignore the trailing 8 bits of the 32-bit address, thereby further reducing the number of discrete levels to 2^16 (65,536).
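As a rough sketch of what grouping addresses into pixels means, the following base R code drops the trailing 8 bits and counts the addresses that land in each /24 pixel. The helper ipv4() is hypothetical, and counting is just one possible summary function chosen for illustration.

```r
# Hypothetical helper: build the integer value of an IPv4 address from octets.
ipv4 <- function(a, b, c, d) a * 2^24 + b * 2^16 + c * 2^8 + d

canvas_start  <- ipv4(192, 0, 0, 0)
canvas_prefix <- 8
pixel_prefix  <- 24

addresses <- c(
  ipv4(192, 168, 0, 124),
  ipv4(192, 168, 0, 7),
  ipv4(192, 168, 1, 1)
)

# Position of each address's /24 network within the canvas (0-based).
pixel <- (addresses - canvas_start) %/% 2^(32 - pixel_prefix)

# Summarise each pixel; here we simply count its addresses.
counts <- table(pixel)

n_pixels <- 2^(pixel_prefix - canvas_prefix)
n_pixels  # 65536
```

The first two addresses share a /24 network and therefore collapse into the same pixel, while the third lands in the neighbouring one.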
These two techniques become even more important in the IPv6 address space, which uses 128-bit addresses.
Note: To prevent accidentally plotting an unreasonably large number of discrete levels, ggip limits the number of plotted bits to 24. This means the coord_ip() arguments must satisfy:
pixel_prefix - prefix_length(canvas_network) <= 24
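The constraint can be checked ahead of time with a small helper. The function is_plottable() below is hypothetical and takes plain prefix lengths; the additional requirement that the difference be non-negative is an assumption added here, on the reasoning that a pixel cannot be larger than the canvas.

```r
# Hypothetical helper: does a canvas/pixel combination stay within the
# 24-bit limit described above?
is_plottable <- function(pixel_prefix, canvas_prefix, max_bits = 24) {
  plotted_bits <- pixel_prefix - canvas_prefix
  # Assumption: a negative difference is also treated as invalid.
  plotted_bits >= 0 & plotted_bits <= max_bits
}

is_plottable(24, 8)    # TRUE:  16 plotted bits
is_plottable(24, 0)    # TRUE:  24 plotted bits
is_plottable(32, 0)    # FALSE: 32 plotted bits
is_plottable(64, 32)   # FALSE: 32 plotted bits (an IPv6-sized example)
```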
Inspired by an xkcd comic originally published in December 2006, we use a space-filling curve to map IP data (one-dimensional) to Cartesian coordinates (two-dimensional). This means our discrete levels become represented by pixels. Two curves are commonly chosen for this task: the Hilbert curve and the Morton curve (also known as the Z curve). Compared to other space-filling curves, these are advantageous because they preserve locality (i.e. subnetworks remain close together).
The curve order represents how nested the curve is and therefore determines how many data points can be visualized. Conversely, choosing the number of plotted bits (see above) determines the order of the curve. Since space-filling curves are fractal, increasing the curve order effectively improves the image resolution (plotted networks remain in the same overall location).
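The relationship between plotted bits and curve order can be sketched as follows, assuming each additional order contributes one bit per axis (two plotted bits in total) and that the number of plotted bits is even. The helper curve_order() is hypothetical.

```r
# Hypothetical helper: curve order implied by the number of plotted bits.
curve_order <- function(pixel_prefix, canvas_prefix) {
  (pixel_prefix - canvas_prefix) / 2
}

order  <- curve_order(pixel_prefix = 16, canvas_prefix = 0)
side   <- 2^order   # pixels along each axis
pixels <- side^2    # total number of discrete levels

order   # 8
side    # 256
pixels  # 65536
```

Raising the order by one doubles the side length and quadruples the number of pixels, which is the sense in which a higher order gives a higher image resolution.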
IP data is most commonly displayed on a Hilbert curve because it has optimal locality preservation.
This curve starts in the top-left corner and ends in the top-right corner. It is chosen using coord_ip(curve = "hilbert").
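For readers curious how a position along the curve becomes a pixel, here is a sketch of the widely used iterative index-to-coordinate conversion for the Hilbert curve. It is not taken from ggip's source. The function hilbert_xy() is hypothetical, and it assumes x counts columns from the left and y counts rows from the top, which reproduces the start and end corners described above.

```r
# Hypothetical helper: map a 0-based position `d` along a Hilbert curve of
# the given order to 0-based grid coordinates.
hilbert_xy <- function(d, order) {
  x <- 0L
  y <- 0L
  t <- as.integer(d)
  s <- 1L
  while (s < 2^order) {
    rx <- bitwAnd(1L, t %/% 2L)
    ry <- bitwAnd(1L, bitwXor(t, rx))
    # Rotate or flip the quadrant so sub-curves join end to end.
    if (ry == 0L) {
      if (rx == 1L) {
        x <- s - 1L - x
        y <- s - 1L - y
      }
      tmp <- x
      x <- y
      y <- tmp
    }
    x <- x + s * rx
    y <- y + s * ry
    t <- t %/% 4L
    s <- s * 2L
  }
  c(x = x, y = y)
}

# All 16 vertices of a 2nd order curve, one row per vertex.
path <- t(sapply(0:15, hilbert_xy, order = 2))

path[1, ]   # first vertex: x = 0, y = 0 (top-left)
path[16, ]  # last vertex:  x = 3, y = 0 (top-right)
```

Consecutive vertices always differ by exactly one step horizontally or vertically, which is the locality property that makes this curve attractive.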
The Morton curve technically offers slightly poorer locality preservation than the Hilbert curve. However, the discontinuous jumps in the curve actually correspond to crossing IP network boundaries. In this sense, the Morton curve is a more natural representation of the IP network structure. For example, the start and end addresses of a network are always located diagonally across from each other.
This curve starts in the top-left corner and ends in the bottom-right corner. It is chosen using coord_ip(curve = "morton").
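The Morton mapping is simple enough to sketch directly: the bits of the position are de-interleaved into two coordinates. The function morton_xy() is hypothetical, and the choice of sending even-numbered bits to x (columns from the left) and odd-numbered bits to y (rows from the top) is an assumption that matches the corners described above; it is not necessarily how ggip implements it.

```r
# Hypothetical helper: map a 0-based position `d` along a Morton curve of
# the given order to 0-based grid coordinates by de-interleaving its bits.
morton_xy <- function(d, order) {
  d <- as.integer(d)
  x <- 0
  y <- 0
  for (i in seq_len(order) - 1L) {
    x <- x + bitwAnd(bitwShiftR(d, 2L * i), 1L) * 2^i       # even bits
    y <- y + bitwAnd(bitwShiftR(d, 2L * i + 1L), 1L) * 2^i  # odd bits
  }
  c(x = x, y = y)
}

# Whole canvas at order 2: positions 0 and 15.
canvas_start <- morton_xy(0L, 2)    # x = 0, y = 0 (top-left)
canvas_end   <- morton_xy(15L, 2)   # x = 3, y = 3 (bottom-right)

# A sub-network covering positions 4 to 7.
net_start <- morton_xy(4L, 2)       # x = 2, y = 0
net_end   <- morton_xy(7L, 2)       # x = 3, y = 1
```

In both cases the first and last positions sit at opposite corners of the block they span, illustrating the diagonal property mentioned above.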
Finally, let’s consider a specific example.
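One way to set up such an example, assuming the ipaddress and ggip packages are installed, is to use the full IPv4 address space as the canvas and a pixel prefix length of 4. The variable name coord is chosen here purely for illustration.

```r
library(ipaddress)
library(ggip)

# Full IPv4 space as the canvas, /4 networks as pixels: 4 plotted bits.
coord <- coord_ip(
  canvas_network = ip_network("0.0.0.0/0"),
  pixel_prefix = 4,
  curve = "hilbert"
)
```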
This coordinate system will use a 2nd order Hilbert curve to visualize the entire IPv4 address space, where each vertex represents a /4 network.