Norm: A Brief Introduction to the "Size" of Vectors in Machine Learning


A quantitative measure of the length or size of a vector is often needed in the vector and matrix operations of machine learning, and this length is commonly called the vector's norm.

In mathematics, a norm is a function from a vector space over the real or complex numbers to the nonnegative real numbers that satisfies certain properties pertaining to scalability and additivity, and takes the value zero only if the input vector is zero. A pseudonorm or seminorm satisfies the same properties, except that it may have a zero value for some nonzero vectors. [1]

Recently, during our research on adversarial attacks, we needed to quantitatively measure the "perturbation size" between adversarial images and their corresponding benign images. In fact, in machine learning, adversarial examples and other images alike can essentially be represented as vectors, stored and computed as NumPy arrays. This article briefly describes some $\ell_p$ norm calculations and their implementations.

What are adversarial examples?

Adversarial examples are a type of vulnerability in neural network models. For an image classification model, adversarial examples are produced by adding imperceptible perturbations to the input images so that the model misclassifies their contents.

$\ell_p$ Norm

The $\ell_p$ norm is actually a "family of norms" on a vector space [2]. In my line of research, $\ell_p$ norms are also often used to measure the magnitude of the "perturbation" of an adversarial example. We define the $\ell_p$ norm as,

$$\ell_p: L_p(\vec{x})=\left(\sum_{i=1}^n |x_i|^p\right)^{1/p},$$

in which $p$ can be $1$, $2$, and $\infty$. These are naturally called the $\ell_1$ norm, $\ell_2$ norm, and $\ell_\infty$ norm.
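As a quick sanity check, the definition above can be evaluated directly with NumPy. The helper function and the example vector below are illustrative, not part of the original text:

```python
import numpy as np

def lp_norm(x, p):
    """Compute the l_p norm straight from the definition: (sum |x_i|^p)^(1/p)."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
print(lp_norm(x, 1))  # |3| + |-4| = 7.0
print(lp_norm(x, 2))  # sqrt(9 + 16) = 5.0
```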

$\ell_0$ Norm

Technically, the $\ell_0$ norm is not a norm (because, by definition, the $p$ in $1/p$ cannot be $0$). Still, this "norm" represents the number of non-zero elements in a vector. In the context of adversarial attacks, it represents the number of non-zero elements in the "perturbation" vector.
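Since the $\ell_0$ "norm" is just a count, it can also be computed without `numpy.linalg.norm` at all; `np.count_nonzero` gives the same value directly. The perturbation vector below is a made-up example:

```python
import numpy as np

# hypothetical perturbation: only two pixels were modified
perturb = np.array([0.0, -0.5, 0.0, 2.0, 0.0])

l0 = np.count_nonzero(perturb)  # number of non-zero (modified) elements
print(l0)  # 2
```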

$\ell_1$ Norm

The $\ell_1$ norm is the sum of the absolute values of a vector's components. An intuitive description: to walk from the start of a vector to its end while moving only along the coordinate axes, the total distance you travel is the $\ell_1$ norm of the vector.

This is much the way a New York taxicab travels along its route through a grid of streets. Therefore, the $\ell_1$ norm is also known as the Taxicab norm or Manhattan norm. Generally, it is formulated as,

$$\ell_1(\vec{x})=\|\vec{x}\|_1= \sum^n_{i=1} |x_i|.$$

$\ell_2$ Norm

The $\ell_2$ norm is one of the most commonly used measures of vector size in machine learning. Also known as the Euclidean norm, it represents the shortest (straight-line) distance from one point to another.

The $\ell_2$ norm is calculated according to the following equation,

$$\ell_2(\vec{x})=\|\vec{x}\|_2=\left(\sum^n_{i=1} x_i^2\right)^{1/2}=\sqrt{\sum^n_{i=1} x_i^2}.$$
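The Euclidean-distance interpretation can be checked numerically. Using the classic 3-4-5 right triangle as an illustrative example, the straight-line distance between two points is the $\ell_2$ norm of their difference vector, while the taxicab ($\ell_1$) distance is longer:

```python
import numpy as np
from numpy.linalg import norm

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

print(norm(b - a))     # straight-line (Euclidean) distance: 5.0
print(norm(b - a, 1))  # taxicab distance, for comparison: 7.0
```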
$\ell_\infty$ Norm

The $\ell_\infty$ norm is the easiest to understand: it is the largest absolute value among the vector's elements,

$$\ell_\infty(\vec{x})=\|\vec{x}\|_\infty=\max_i |x_i|.$$
For example, given a vector $\vec{v}=[-10,3,5]$, the $\ell_\infty$ norm of the vector is $10$.
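This example is easy to verify directly; taking the maximum absolute value by hand and calling `numpy.linalg.norm` with `ord=np.inf` agree:

```python
import numpy as np
from numpy.linalg import norm

v = np.array([-10, 3, 5])
print(np.max(np.abs(v)))  # 10
print(norm(v, np.inf))    # 10.0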

Code Implementation

In my research, I tend to use $\ell_p$ norms to measure the perturbation size of adversarial examples. However, instead of relying on any framework, we implement our attack from scratch, which means nothing automatically outputs the distance values for the $\ell_p$ norms, so I use NumPy to calculate them instead.

For an image img and its adversarial example adv, we can easily compute the perturbation perturb.

# perturb is a numpy array
perturb = adv - img
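One practical caveat not covered by the snippet above: images are often stored as uint8, and subtracting uint8 arrays wraps around instead of going negative, so it is safer to cast to a signed or floating dtype before subtracting. The toy arrays below are illustrative:

```python
import numpy as np

img = np.array([[10, 200]], dtype=np.uint8)  # toy "image"
adv = np.array([[12, 198]], dtype=np.uint8)  # toy "adversarial example"

# naive subtraction wraps: 198 - 200 underflows to 254 in uint8
naive = adv - img
# cast first so negative perturbation values survive
perturb = adv.astype(np.int16) - img.astype(np.int16)

print(naive)    # [[  2 254]]
print(perturb)  # [[ 2 -2]]
```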

Then, we can use NumPy to compute the $\ell_p$ norms of the perturbation perturb.

# import numpy and relevant libraries
import numpy as np
from numpy.linalg import norm

# flatten the array first: for a 2-D image, numpy.linalg.norm would
# otherwise compute matrix norms (e.g. the max column sum for ord=1),
# not the element-wise vector norms we want
p = perturb.ravel()

# L0
_l0 = norm(p, 0)
# L1
_l1 = norm(p, 1)
# L2
_l2 = norm(p)
# L∞
_linf = norm(p, np.inf)

In fact, the implementation of numpy.linalg.norm is just a vector operation following the definition of $\ell_p$. For example, $\ell_1(x)$ is just,

_l1_x = np.sum(np.abs(x))

and $\ell_\infty(x)$ is just,

_linf_x = np.max(np.abs(x))

and so on. However, in production, unless there is a special case, we should use numpy.linalg.norm directly. On top of that, NumPy's official documentation also gives methods for calculating $\ell_p$ norms in different cases for matrices and vectors.
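To confirm the equivalences claimed above, the manual formulas can be checked against numpy.linalg.norm on an arbitrary test vector (the random vector here is purely for demonstration):

```python
import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(0)
x = rng.standard_normal(100)

# each manual formula matches the corresponding numpy.linalg.norm call
assert np.isclose(norm(x, 1), np.sum(np.abs(x)))
assert np.isclose(norm(x, 2), np.sqrt(np.sum(x ** 2)))
assert np.isclose(norm(x, np.inf), np.max(np.abs(x)))
print("manual formulas match numpy.linalg.norm")
```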
