Norm: A Brief Introduction to the "Size" of Vectors in Machine Learning

A quantitative measure of the length or size of a vector is often necessary in vector and matrix operations in machine learning, and this length is usually referred to as the vector's norm.

In mathematics, a norm is a function from a vector space over the real or complex numbers to the nonnegative real numbers that satisfies certain properties pertaining to scalability and additivity, and that takes the value zero only if the input vector is zero. A pseudonorm or seminorm satisfies the same properties, except that it may take the value zero for some nonzero vectors.

Recently, during our research on adversarial attacks, we needed to quantitatively measure the "perturbation size" between adversarial images and their corresponding benign images. In machine learning, adversarial examples, like other images, are essentially represented as vectors, stored and computed as Numpy arrays. This article briefly describes some $\ell_p$ norm calculations and their implementations.

Adversarial examples are a type of vulnerability in neural network models. For an image classification model, an adversarial example is produced by adding an imperceptible perturbation to an input image so that the model misclassifies the content of the image.

$\ell_p$ Norm

The $\ell_p$ norm is actually a family of norms on a vector space. In my line of research, $\ell_p$ norms are also often used to measure the magnitude of the "perturbation" of an adversarial example. The $\ell_p$ norm is defined as,

$\ell_p(\vec{x})=\|\vec{x}\|_p=\left(\sum_{i=1}^n |x_i|^p\right)^{1/p},$

in which $p$ is commonly $1$, $2$, or $\infty$, giving what are naturally called the $\ell_1$ norm, the $\ell_2$ norm, and the $\ell_{\infty}$ norm (the last obtained as the limit $p \to \infty$).
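As a quick sanity check, the general formula can be evaluated directly with Numpy; the vector below is made up purely for illustration:

```python
import numpy as np

# Arbitrary example vector (values chosen only for illustration)
x = np.array([3.0, -4.0])

# General ℓp norm: (Σ_i |x_i|^p)^(1/p)
def lp_norm(x, p):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

print(lp_norm(x, 1))  # 7.0, i.e. |3| + |-4|
print(lp_norm(x, 2))  # 5.0, i.e. sqrt(9 + 16)
```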

$\ell_0$ Norm

Technically, the $\ell_0$ "norm" is not a norm (by definition, the exponent $1/p$ is undefined for $p=0$). Still, it is a useful measure: it counts the number of non-zero elements in a vector. In the context of adversarial attacks, it counts the number of non-zero elements in the "perturbation" vector.
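Counting non-zero elements is a one-liner in Numpy (a minimal sketch with a made-up vector):

```python
import numpy as np

# Toy vector with exactly two non-zero entries
v = np.array([0.0, 2.0, 0.0, -1.5])

# "ℓ0 norm": count of non-zero elements
l0 = np.count_nonzero(v)
print(l0)  # 2
```

`numpy.linalg.norm(v, 0)` returns the same count (as a float) for 1-D input.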

$\ell_1$ Norm

The $\ell_1$ norm is the sum of the absolute values of a vector's components. An intuitive description: to travel between two points on a rectangular street grid you can only move along the axes, so the total distance walked along the grid is the $\ell_1$ distance between the points.

For a two-dimensional vector $\vec{v}=(a, b)$, the $\ell_1$ norm is calculated as,

$\ell_1(\vec{v})=\|\vec{v}\|_1=|a|+|b|,$

much the way a New York taxicab travels along its grid-like route. Therefore, the $\ell_1$ norm is also known as the Taxicab norm or Manhattan norm. Generally, it is formulated as,

$\ell_1(\vec{x})=\|\vec{x}\|_1= \sum^n_{i=1} |x_i|.$
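The formula translates directly into Numpy (the vector here is an arbitrary example):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])  # made-up example vector

# ℓ1 norm: sum of absolute values of the components
l1 = np.sum(np.abs(x))
print(l1)  # 6.0
```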

$\ell_2$ Norm

The $\ell_2$ norm is one of the more commonly used measures of vector size in the field of machine learning. $\ell_2$ norm, also known as the Euclidean norm, represents the shortest distance required to travel from one point to another.

For a two-dimensional vector $\vec{v}=(a, b)$, the $\ell_2$ norm is calculated as,

$\ell_2(\vec{v})=\|\vec{v}\|_2=\sqrt{|a|^2+|b|^2}$

More generally, it is formulated as,

$\ell_2(\vec{x})=\|\vec{x}\|_2=\sqrt{|x_1|^2+|x_2|^2+\cdots+|x_n|^2}.$
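Again, the definition maps directly onto Numpy operations (using the classic 3-4-5 example vector):

```python
import numpy as np

x = np.array([3.0, 4.0])  # made-up example vector

# ℓ2 norm: square root of the sum of squared magnitudes
l2 = np.sqrt(np.sum(np.abs(x) ** 2))
print(l2)  # 5.0
```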

$\ell_\infty$ Norm

The $\ell_\infty$ norm is the easiest to understand: it is the absolute value of the element with the largest magnitude in the vector. For a two-dimensional vector $\vec{v}=(a, b)$,

$\ell_\infty(\vec{v})=\|\vec{v}\|_\infty=\max(|a|,|b|)$

For example, given a vector $\vec{v}=[-10,3,5]$, the $\ell_\infty$ norm of the vector is $10$.
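The example above can be checked in one line:

```python
import numpy as np

v = np.array([-10.0, 3.0, 5.0])

# ℓ∞ norm: largest absolute value among the elements
linf = np.max(np.abs(v))
print(linf)  # 10.0
```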

Code Implementation

In my research, I tend to use the $\ell_p$ norm to measure the perturbation size of adversarial examples. Since we implemented our attack from scratch rather than on top of an existing framework, nothing outputs the $\ell_p$ distances for us automatically, so I compute them with Numpy instead.

For an image img and its adversarial example adv, we can easily compute the perturbation perturb.

# perturb is a numpy array; cast to float first so that negative
# differences are not lost to unsigned-integer wraparound
perturb = adv.astype(float) - img.astype(float)
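One detail worth watching for here: images are typically stored as unsigned 8-bit integers, and subtracting two uint8 arrays silently wraps around wherever the difference is negative, so it is safer to cast to a signed or floating dtype before subtracting. A minimal sketch with tiny made-up arrays:

```python
import numpy as np

img = np.array([[10, 200]], dtype=np.uint8)
adv = np.array([[12, 198]], dtype=np.uint8)

# uint8 subtraction wraps around: 198 - 200 underflows to 254
wrapped = adv - img            # wrapped[0, 1] is 254, not -2

# Casting to float first preserves the sign of the difference
perturb = adv.astype(np.float64) - img.astype(np.float64)
print(perturb)                 # entries are 2.0 and -2.0
```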

Then, we can use Numpy to compute the $\ell_p$ norm of the perturbation perturb.

# import numpy and relevant libraries
import numpy as np
from numpy.linalg import norm

# flatten first: for a 2-D (or higher-dimensional) image array,
# numpy.linalg.norm would otherwise compute matrix norms rather
# than the vector norms we want
flat = perturb.ravel()

# L0
_l0 = norm(flat, 0)
# L1
_l1 = norm(flat, 1)
# L2 (the default)
_l2 = norm(flat)
# L∞
_linf = norm(flat, np.inf)
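As a concrete check, for a small made-up 1-D perturbation vector the four norms come out as:

```python
import numpy as np
from numpy.linalg import norm

# Made-up 1-D perturbation vector for illustration
perturb = np.array([1.0, -2.0, 0.0, 2.0])

print(norm(perturb, 0))       # 3.0 (three non-zero entries)
print(norm(perturb, 1))       # 5.0 (1 + 2 + 0 + 2)
print(norm(perturb))          # 3.0 (sqrt(1 + 4 + 0 + 4))
print(norm(perturb, np.inf))  # 2.0 (largest absolute value)
```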

In fact, for vectors, numpy.linalg.norm is essentially a direct implementation of the $\ell_p$ definition. For example, $\ell_1(x)$ is just,

_l1_x = np.sum(np.abs(x))

and $\ell_\infty(x)$ is just,

_linf_x = np.max(np.abs(x))

and so on. However, in production code, unless there is a special reason not to, we should use numpy.linalg.norm directly. On top of that, Numpy's official documentation also gives the methods for calculating $\ell_p$ norms in different cases for matrices and vectors.
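As a final sanity check, the hand-rolled formulas above can be verified against numpy.linalg.norm on an arbitrary test vector:

```python
import numpy as np
from numpy.linalg import norm

x = np.array([1.5, -2.0, 0.5])  # arbitrary test vector

# The manual formulas agree with numpy.linalg.norm
assert np.isclose(np.sum(np.abs(x)), norm(x, 1))
assert np.isclose(np.max(np.abs(x)), norm(x, np.inf))
print("manual formulas match numpy.linalg.norm")
```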