Tensor Trains, mostly as probability distributions

stecrotti/TensorTrains.jl

TensorTrains.jl

⚠️ This package is still a work in progress; some breaking changes should be expected.

What is a Tensor Train?

A Tensor Train is a type of tensor factorization involving the product of 3-index tensors organized on a one-dimensional chain. In the context of function approximation and probability, a function of $L$ discrete variables is in Tensor Train format if it is written as

$$f(x^1, x^2, \ldots, x^L) = \sum_{a^1,a^2,\ldots,a^{L-1}} [A^1(x^1)]_{a^1}[A^2(x^2)]_{a^1,a^2}\cdots [A^{L-1}(x^{L-1})]_{a^{L-2},a^{L-1}}[A^L(x^L)]_{a^{L-1}}$$

where, for every choice of $x^l$, $A^l(x^l)$ is a real-valued matrix and the sizes of adjacent matrices must be compatible. The first matrix must have 1 row and the last matrix must have 1 column, so that the product returns a scalar.
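As an illustration of the formula above, here is a minimal sketch in plain Julia that evaluates a Tensor Train by multiplying the matrices selected by each $x^l$. The storage layout (`cores[l][:, :, x]` holding $A^l(x)$) and the name `tt_evaluate` are assumptions made for this sketch; they are not the package's API.

```julia
# Assumed layout for this sketch: cores[l][:, :, x] is the matrix A^l(x).
# The first core has 1 row and the last core has 1 column.
function tt_evaluate(cores, x)
    M = ones(1, 1)
    for (A, xl) in zip(cores, x)
        M = M * A[:, :, xl]    # multiply by the matrix selected by x^l
    end
    return only(M)             # the product is 1×1: extract the scalar
end

L, q, d = 4, 2, 3                                     # length, values per variable, bond dimension
dims = [1; fill(d, L - 1); 1]                         # bond dimensions, 1 at both ends
cores = [rand(dims[l], dims[l+1], q) for l in 1:L]    # random positive entries
x = [rand(1:q) for _ in 1:L]                          # random set of indices
f = tt_evaluate(cores, x)
```

Each evaluation costs one matrix product per site, so the cost grows linearly in $L$.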

The Tensor Train factorization can be used to parametrize probability distributions, which is the main focus of this package. In this case, $f$ should be properly normalized and always return a non-negative value.

Tensor Trains with Periodic Boundary Conditions

A slight generalization, useful for describing systems with periodic boundary conditions, is the following:

$$f(x^1, x^2, \ldots, x^L) = \sum_{a^1,a^2,\ldots,a^{L}} [A^1(x^1)]_{a^1,a^2}[A^2(x^2)]_{a^2,a^3}\cdots [A^{L-1}(x^{L-1})]_{a^{L-1},a^{L}}[A^L(x^L)]_{a^{L},a^1}$$

In other words, to evaluate $f$ one takes the trace of the product of matrices.
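A minimal sketch of the trace formula in plain Julia, assuming the same hypothetical layout where `cores[l][:, :, x]` stores $A^l(x)$ and all matrices are square with the same size. The name `periodic_tt_evaluate` is made up for this sketch.

```julia
using LinearAlgebra   # for `tr` and `I`

# Assumed layout for this sketch: cores[l][:, :, x] is the d×d matrix A^l(x).
function periodic_tt_evaluate(cores, x)
    d = size(cores[1], 1)
    M = Matrix{Float64}(I, d, d)
    for (A, xl) in zip(cores, x)
        M = M * A[:, :, xl]    # accumulate the ordered product of matrices
    end
    return tr(M)               # closing the chain amounts to taking the trace
end

L, q, d = 4, 2, 3
cores = [rand(d, d, q) for _ in 1:L]
x = [rand(1:q) for _ in 1:L]
f = periodic_tt_evaluate(cores, x)
```

Because the trace is invariant under cyclic permutations, shifting all sites by the same amount leaves the value unchanged.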

Notation and terminology

Tensor Trains are the most basic type of Tensor Network. Tensor Networks are a large family of tensor factorizations which are often best represented in diagrammatic notation. For this reason, the term bond is used interchangeably with index. The indices $a^1,a^2,\ldots,a^{L-1}$ are usually called the virtual indices, while $x^1, x^2, \ldots, x^L$ are the physical indices.

Tensor Trains are used to parametrize wavefunctions in many-body quantum physics. The resulting quantum state is called a Matrix Product State. In that context, the entries are generally complex numbers, and the probability of a given state is obtained by taking the squared absolute value of the wavefunction.

In this package we focus on the "classical" case where the Tensor Train directly represents a probability distribution $p(x^1, x^2, \ldots, x^L)$.

Efficient computation

Given a Tensor Train, simple recursive strategies can be employed to perform the following operations in time $\mathcal O (L)$:

Compute the normalization

$$Z = \sum_{x^1, x^2, \ldots, x^L} \sum_{a^1,a^2,\ldots,a^{L-1}} [A^1(x^1)]_{a^1}[A^2(x^2)]_{a^1,a^2}\cdots [A^{L-1}(x^{L-1})]_{a^{L-2},a^{L-1}}[A^L(x^L)]_{a^{L-1}}$$

such that

$$\begin{aligned} 1&=\sum_{x^1, x^2, \ldots, x^L}p(x^1, x^2, \ldots, x^L)\\&=\sum_{x^1, x^2, \ldots, x^L}\frac1Z \sum_{a^1,a^2,\ldots,a^{L-1}} [A^1(x^1)]_{a^1}[A^2(x^2)]_{a^1,a^2}\cdots [A^{L-1}(x^{L-1})]_{a^{L-2},a^{L-1}}[A^L(x^L)]_{a^{L-1}} \end{aligned}$$
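The linear cost follows from exchanging the order of the sums: each matrix can be summed over its own $x^l$ first, and the $L$ resulting matrices are then multiplied together. A minimal sketch in plain Julia, again assuming the hypothetical layout `cores[l][:, :, x]` for $A^l(x)$ (function names are made up for illustration), checked against the exponential-cost brute-force sum:

```julia
# Assumed layout for this sketch: cores[l][:, :, x] is the matrix A^l(x).
function tt_evaluate(cores, x)
    M = ones(1, 1)
    for (A, xl) in zip(cores, x)
        M = M * A[:, :, xl]
    end
    return only(M)
end

function tt_normalization(cores)
    v = ones(1, 1)
    for A in cores
        T = dropdims(sum(A; dims = 3); dims = 3)   # sum A^l(x^l) over x^l
        v = v * T                                  # one matrix product per site
    end
    return only(v)
end

L, q, d = 4, 2, 3
dims = [1; fill(d, L - 1); 1]
cores = [rand(dims[l], dims[l+1], q) for l in 1:L]

Z = tt_normalization(cores)
# Brute force over all q^L configurations, only feasible for tiny L
Z_brute = sum(tt_evaluate(cores, x) for x in Iterators.product(fill(1:q, L)...))
```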

Compute marginals

Single-variable

$$p(x^l=x) = \sum_{x^1, x^2, \ldots, x^L} p(x^1, x^2, \ldots, x^L) \delta(x^l,x)$$

and two-variable

$$p(x^l=x, x^m=x') = \sum_{x^1, x^2, \ldots, x^L} p(x^1, x^2, \ldots, x^L) \delta(x^l,x)\delta(x^m,x')$$
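For single-variable marginals, one possible strategy is to precompute partial products of the summed matrices from the left and from the right, and then combine them at each site. The sketch below is plain Julia under the assumed layout `cores[l][:, :, x]` for $A^l(x)$; `tt_marginals` is a hypothetical name, not the package's function. Two-variable marginals follow the same pattern, keeping two sites fixed instead of one.

```julia
# Assumed layout for this sketch: cores[l][:, :, x] is the matrix A^l(x).
function tt_marginals(cores)
    L = length(cores)
    T = [dropdims(sum(A; dims = 3); dims = 3) for A in cores]   # A^l summed over x^l
    left = Vector{Matrix{Float64}}(undef, L + 1)    # left[l]  = T[1] * ... * T[l-1]
    right = Vector{Matrix{Float64}}(undef, L + 1)   # right[l] = T[l] * ... * T[L]
    left[1] = ones(1, 1)
    for l in 1:L
        left[l+1] = left[l] * T[l]
    end
    right[L+1] = ones(1, 1)
    for l in L:-1:1
        right[l] = T[l] * right[l+1]
    end
    Z = only(left[L+1])                             # normalization
    # p(x^l = x) = left[l] * A^l(x) * right[l+1] / Z
    return [[only(left[l] * cores[l][:, :, x] * right[l+1]) / Z
             for x in axes(cores[l], 3)] for l in 1:L]
end

L, q, d = 5, 3, 4
dims = [1; fill(d, L - 1); 1]
cores = [rand(dims[l], dims[l+1], q) for l in 1:L]
marg = tt_marginals(cores)    # marg[l][x] approximates p(x^l = x)
```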

Extract samples

Via hierarchical sampling

$$p(x^1, x^2, \ldots, x^L) = p(x^1)p(x^2|x^1)p(x^3|x^1,x^2)\cdots p(x^L|x^1,x^2,\ldots,x^{L-1})$$

by first sampling $x^1\sim p(x^1)$, then $x^2\sim p(x^2|x^1)$ and so on.
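The chain of conditionals can be sketched as follows in plain Julia. The unnormalized weight of $x^l$ given the already sampled prefix is the product of the prefix matrices, $A^l(x^l)$, and the remaining matrices summed over their variables. The layout `cores[l][:, :, x]` and the name `tt_sample` are assumptions of this sketch, and the Tensor Train is assumed to have non-negative weights.

```julia
# Assumed layout for this sketch: cores[l][:, :, x] is the matrix A^l(x).
function tt_sample(cores)
    L = length(cores)
    T = [dropdims(sum(A; dims = 3); dims = 3) for A in cores]
    right = Vector{Matrix{Float64}}(undef, L + 1)   # right[l] = T[l] * ... * T[L]
    right[L+1] = ones(1, 1)
    for l in L:-1:1
        right[l] = T[l] * right[l+1]
    end
    x = zeros(Int, L)
    prefix = ones(1, 1)                  # product of A^k(x^k) over sampled sites k < l
    for l in 1:L
        # unnormalized p(x^l | x^1, ..., x^{l-1}) for every value of x^l
        w = [only(prefix * cores[l][:, :, xl] * right[l+1]) for xl in axes(cores[l], 3)]
        c = cumsum(w)
        u = rand() * c[end]
        x[l] = findfirst(>=(u), c)       # inverse-CDF draw from the weights
        prefix = prefix * cores[l][:, :, x[l]]
    end
    return x
end

L, q, d = 5, 3, 4
dims = [1; fill(d, L - 1); 1]
cores = [rand(dims[l], dims[l+1], q) for l in 1:L]
x = tt_sample(cores)
```

The right partial products are computed once, so each sample costs a number of matrix products linear in $L$.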

What can this package do?

This small package provides some utilities for creating, manipulating and evaluating Tensor Trains interpreted as functions, with a focus on the probabilistic side. Each variable $x^l$ is assumed to be multivariate. When performing probability-related operations, it is the responsibility of the user to make sure that the Tensor Train represents a valid probability distribution.

Common operations are:

  • evaluate a Tensor Train at a given set of indices
  • orthogonalize_left!, orthogonalize_right!: bring a Tensor Train to left/right orthogonal form
  • compress! a Tensor Train using SVD-based truncations
  • normalize! a Tensor Train in the probability sense (not in the $L_2$ norm sense!), see above
  • sample from a Tensor Train intended as a probability distribution, see above
  • +,-: take the sum/difference of two TensorTrains
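To give an idea of what the sum of two Tensor Trains involves, here is a sketch of the standard block construction in plain Julia: the matrices of the sum are block-diagonal combinations of the original ones, concatenated horizontally at the first site and vertically at the last, so bond dimensions add up. This is an illustration under the assumed layout `cores[l][:, :, x]` for $A^l(x)$; it is not a description of how the package implements `+`, and the names `tt_add` and `tt_evaluate` are hypothetical.

```julia
# Assumed layout for this sketch: cores[l][:, :, x] is the matrix A^l(x).
function tt_evaluate(cores, x)
    M = ones(1, 1)
    for (A, xl) in zip(cores, x)
        M = M * A[:, :, xl]
    end
    return only(M)
end

function tt_add(A, B)
    L = length(A)
    return map(1:L) do l
        ra, ca, q = size(A[l])
        rb, cb, _ = size(B[l])
        # first site: blocks side by side; last site: blocks stacked; else block-diagonal
        rowsA, rowsB = l == 1 ? (1:1, 1:1) : (1:ra, ra+1:ra+rb)
        colsA, colsB = l == L ? (1:1, 1:1) : (1:ca, ca+1:ca+cb)
        C = zeros(last(rowsB), last(colsB), q)
        C[rowsA, colsA, :] .+= A[l]
        C[rowsB, colsB, :] .+= B[l]
        C
    end
end

L, q = 4, 2
make_tt(d) = (dims = [1; fill(d, L - 1); 1]; [rand(dims[l], dims[l+1], q) for l in 1:L])
A, B = make_tt(3), make_tt(2)
C = tt_add(A, B)              # bond dimension 3 + 2 = 5
x = [rand(1:q) for _ in 1:L]
err = abs(tt_evaluate(C, x) - (tt_evaluate(A, x) + tt_evaluate(B, x)))
```

Since bond dimensions grow under addition, a sum is typically followed by a compression step.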

Example

Let's construct and initialize at random a Tensor Train of the form

$$f\left((x^1,y^1), (x^2,y^2), (x^3,y^3)\right) = \sum_{a^1,a^2} [A^1(x^1,y^1)]_{a^1}[A^2(x^2,y^2)]_{a^1,a^2}[A^3(x^3,y^3)]_{a^2}$$

where $x^l\in\{1,2\}, y^l\in\{1,2,3\}$.

```julia
using TensorTrains
L = 3        # length
q = (2, 3)   # number of values taken by x, y
d = 5        # bond dimension
A = rand_tt(d, L, q...)    # construct Tensor Train with random positive entries
xy = [[rand(1:qi) for qi in q] for _ in 1:L]    # random set of indices
p = evaluate(A, xy)    # evaluate `A` at `xy`
compress!(A; svd_trunc = TruncThresh(1e-8));    # compress `A` to reduce the bond dimension
pnew = evaluate(A, xy)    # evaluate the compressed `A` at the same indices
ε = abs( (p - pnew)/p )   # relative error introduced by the compression
```

Related packages

  • TensorTrains.jl: conceived for the application of Tensor Train decomposition to elliptic PDEs, does not cover anything related to probability
  • Tensor-Train-Julia: less lightweight, mostly designed for quantum applications, still WIP
  • ITensors.jl: a full-fledged Tensor Network library, mostly designed for quantum applications. The interface is more intuitive, but likely less efficient if all you need is simple operations on 1D Tensor Networks