Convolution Codes II
Lecture notes, Winter 2016/17

Decoding of convolutional codes

As convolutional codes show no (known) algebraic structure beyond
linearity, one cannot exploit such structure for decoding. As a
result, one uses general search methods such as dynamic programming,
branch & bound, etc., to find an input that has minimal Hamming
distance from the received word.

These methods are often generalised to probabilities. One then
presupposes a certain distribution of channel disturbances and
sometimes of inputs and decodes to the input that is most likely
given the received data (maximum likelihood). In some cases this
constitutes an advantage over the algebraic decoding procedures used
with block codes. Furthermore, these methods can be generalised to
multi-level inputs and outputs, e.g. different signal or voltage
levels (soft-input, soft-output (SISO)). In this course, we only look
at the binary case and Hamming distance, though.

In practice one does not encode the entire information as a single
word but partitions it into frames of a certain length, e.g. 256 bits,
which are transmitted and then decoded as a whole. When coding the
next frame the encoder recommences in the initial state, henceforth
denoted 0s. Decoding becomes easier if the final state of the encoder
is known to the decoder. For this reason the encoder is driven back
into the zero state by appending M zeros to the message, where M is
the length of influence. Thus, with a frame size of 256 bits and a
length of influence of 7, only 249 bits of information can be
transmitted, which are then followed by 7 zeros.

Definition: Consider an (n,k)-convolutional encoder with M*k shift
registers.  Its *state set* is Q={0,1}^{M*k}. If q:Q is a state and xs
: ({0,1}^k)^* is an input then we write delta(q,xs) for the state
reached after encoding xs when starting in state q. We write
out(q,xs):({0,1}^n)^* for the resulting output.

We write xs.xs' for the concatenation of the sequences xs and xs'.

It is clear that 

delta(q,xs.xs') = delta(delta(q,xs),xs')
out(q,xs.xs') = out(q,xs).out(delta(q,xs),xs')
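These operations can be made concrete. The following sketch (not part of the original notes) implements delta and out for a two-register k=1 encoder with generator polynomials 1+D^2 and 1+D+D^2 (the encoder of the example further below); the composition laws above then hold by construction.

```python
# delta and out for a k=1 encoder with generators 1+D^2 and 1+D+D^2;
# the state (s1, s2) holds the last two input bits.

def delta1(q, x):
    """Single-step transition: shift x into the register."""
    return (x, q[0])

def out1(q, x):
    """Output pair for generators 1+D^2 and 1+D+D^2."""
    return (x ^ q[1], x ^ q[0] ^ q[1])

def delta(q, xs):
    """State reached after encoding the sequence xs from state q."""
    for x in xs:
        q = delta1(q, x)
    return q

def out(q, xs):
    """Output sequence produced while encoding xs from state q."""
    ys = []
    for x in xs:
        ys.append(out1(q, x))
        q = delta1(q, x)
    return ys

# Composition laws: delta(q, xs.xs') = delta(delta(q, xs), xs') and
# out(q, xs.xs') = out(q, xs).out(delta(q, xs), xs')
q0 = (0, 0)
xs, xs2 = [1, 1, 1], [0, 1, 0]
assert delta(q0, xs + xs2) == delta(delta(q0, xs), xs2)
assert out(q0, xs + xs2) == out(q0, xs) + out(delta(q0, xs), xs2)
```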

Now, for ys:({0,1}^n)^* we define 

M(q1,q2,ys) = min{d(out(q1,xs),ys) | delta(q1,xs)=q2, |xs|=|ys|}

Thus, M(q1,q2,ys) is the minimum distance of ys from a codeword
provided that encoding started and ended in q1 and q2 respectively.

NB: d(x,y) denotes the Hamming distance between x and y.

The following recurrence holds: 

M(q1,q2,()) = if q1=q2 then 0 else oo
M(q1,q2,ys.a) = min{M(q1,q,ys) + d(a,out(q,x)) | x:{0,1}^k, q:Q, delta(q,x)=q2}

The following recurrence allows one to actually compute input
sequences that witness these minimal distances (or error numbers):

D(q1,q2,()) = ()
D(q1,q2,ys.a) = D(q1,q,ys).x, where q and x with delta(q,x)=q2 are
chosen in such a way that
M(q1,q,ys)+d(a,out(q,x)) = M(q1,q2,ys.a) =
     min{M(q1,q,ys) + d(a,out(q,x)) | x:{0,1}^k, q:Q, delta(q,x)=q2}

Decoding thus amounts to computing D(0s,0s,zs) where zs with |zs|=m
is the output received.

If one knows parts of the input sequence (e.g. the zero tail at the
end of a frame) one can use this fact upon decoding by fixing x to the
corresponding known value.

Viterbi - Algorithm

The Viterbi algorithm consists of evaluating the above recurrences in
a bottom-up, iterative fashion using dynamic programming.  Thus, given
a received message zs with |zs|=m, one tabulates D(0s,q2,ys) and
M(0s,q2,ys) for q2:Q and ys a prefix of zs.  Starting from the entries
with |ys|=0, this table can then be successively filled in using the
recurrences until finally the desired entry D(0s,0s,zs) has been
found. Of course, in order to save space the D-entries will not
contain the entire xs-sequence but only the respective last symbol and
a pointer to the corresponding previous D-entry. Regardless of this
optimization, the size of the table remains considerable, namely
O(|Q| * m) = O(2^{M*k} * m).

The operation of the Viterbi algorithm in the case k=1 can be
visualized with the *trellis diagram*. This is a graph with |Q|*(m+1)
nodes arranged in m+1 columns of height |Q|=2^L (where L=M*k denotes
the total number of shift registers), labelled with (q,i) for q:Q and
0<=i<=m.

A dotted line links (q1,i) and (q2,i+1) if delta(q1,0)=q2, and a solid
line if delta(q1,1)=q2. In each case the edge is labelled with the
corresponding output.

For decoding one places the received message under the trellis diagram
and labels each edge with the Hamming distance of the output segment
(of size n) found there and the corresponding segment of the received
message. Then the trellis diagram can be filled with the D- and
M-values starting from the left.

Example (from Heise-Quattrocchi) for the encoder corresponding to

                  ( 1 + D^2     )
             G1 = ( 1 + D + D^2 )

and m=11, thus m-L=9. The message 111010001 is padded with two zeros
and then encoded to the sequence

 11 10 01 10 00 01 11 00 11 01 11

We assume that the received message is 

 11 00 11 10 00 11 11 00 11 00 11
    ^  ^        ^            ^
with errors having occurred at the marked positions. 

Here is the corresponding trellis diagram: 

    |   11    00    11    10    00    11    11    00    11    00    11 
    |\     \      \   ; \    ; \    ; \    ; \    ; \    ;      ;      ; 
    |  0     2     0 ;    1 ;    2 ;    0 ;    0 ;    2 ;      ;      ;
    |    \     \    ;\     ;\     ;\     ;\     ;\     ;\     ;      ; 
[10]|      *.    *. 0  *. 1  *. 2  *. 0  *. 0  *. 2  *. 0  *. 2  *. 0  
    |      |.    |.  2 |.  1 |.  0 |.  2 |.  2 |.  0 |.  2  .    .    
    |      \ .   \;./  \;./  \;./  \;./  \;./  \;./  \;./   ;.   ;. 
    |       \  1 ;\/ 1 ;\/ 2 ;\/ 1 ;\/ 1 ;\/ 1 ;\/ 1 ;\/ 1 ;  1 
[01]|         \  *-\  .*-\  .*-\  .*-\  .*- \ .*- \ .*-\  .*    .*
    |          \    \,    \,    \,    \,     \,    \,    \,     ,
    |              1 \    0\   1 \   1 \    1 \   1 \   1 \   1  
    |           1|.   1|.   0.    1|.   1|.   1|.   1|.   1|.   
[11]|            *--1--*--2--*--1--*--1--*--1--*--1--*--1--*

The stars are now replaced with the corresponding D-values starting
from the left and shown here for the first three columns:

    |   11    00    11    10    00    11    11    00    11    00    11 
    |\     \      \   ; \    ; \    ; \    ; \    ; \    ;      ;      ; 
    |  0     2     0 ;    1 ;    2 ;    0 ;    0 ;    2 ;      ;      ;
    |    \     \    ;\     ;\     ;\     ;\     ;\     ;\     ;      ; 
[10]|      0.    4. 0  2. 1  *. 2  *. 0  *. 0  *. 2  *. 0  *. 2  *. 0  
    |      |.    |.  2 |.  1 |.  0 |.  2 |.  2 |.  0 |.  2  .    .    
    |      \ .   \;./  \;./  \;./  \;./  \;./  \;./  \;./   ;.   ;. 
    |       \  1 ;\/ 1 ;\/ 2 ;\/ 1 ;\/ 1 ;\/ 1 ;\/ 1 ;\/ 1 ;  1 
[01]|         \  1-\  .2-\  .*-\  .*-\  .*- \ .*- \ .*-\  .*    .*
    |          \    \,    \,    \,    \,     \,    \,    \,     ,
    |              1 \    0\   1 \   1 \    1 \   1 \   1 \   1  
    |           1|.   1|.   0.    1|.   1|.   1|.   1|.   1|.   
[11]|            1--1--2--2--*--1--*--1--*--1--*--1--*--1--*

At that point the codeword can be read off. In this case, a complete
reconstruction was possible.
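The tabulation described above fits in a few lines. The following sketch decodes the received message of the example with the encoder G1 (generators 1+D^2 and 1+D+D^2); for simplicity, D stores whole input sequences instead of the last symbols with back-pointers.

```python
# Viterbi decoding for the example above (k=1 encoder with generators
# 1+D^2 and 1+D+D^2); a minimal sketch of the tabulation.

def delta(q, x):
    """Next state: the register (s1, s2) shifts to (x, s1)."""
    return (x, q[0])

def out(q, x):
    """Output pair for generators 1+D^2 and 1+D+D^2."""
    return (x ^ q[1], x ^ q[0] ^ q[1])

def viterbi(zs):
    """zs: received n-bit tuples; returns (input, distance) for the best
    path starting and ending in the zero state."""
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    # M[q]: minimal distance of a path from 0s to q; D[q]: its input
    M = {q: (0 if q == (0, 0) else float('inf')) for q in states}
    D = {q: [] for q in states}
    for a in zs:
        M2 = {q: float('inf') for q in states}
        D2 = {q: [] for q in states}
        for q in states:
            if M[q] == float('inf'):
                continue
            for x in (0, 1):
                q2, y = delta(q, x), out(q, x)
                d = M[q] + sum(u != v for u, v in zip(y, a))
                if d < M2[q2]:
                    M2[q2], D2[q2] = d, D[q] + [x]
        M, D = M2, D2
    return D[(0, 0)], M[(0, 0)]

recv = [(1,1),(0,0),(1,1),(1,0),(0,0),(1,1),(1,1),(0,0),(1,1),(0,0),(1,1)]
xs, errs = viterbi(recv)   # xs: 9 information bits followed by 2 tail bits
```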

Decoding with recursive encoders

For recursive encoders one needs a way of resetting them to the zero
state. This can be done using a special gate that switches the
feedback off, so that the incoming zero tail then resets the encoder
as in the non-recursive case. The effect of this gate must then be
taken into account in the final columns of the trellis diagram.

The Viterbi algorithm is relatively robust against such modifications
since it makes hardly any assumptions about the structure of the
encoder. It would work equally well on encoders based on an arbitrary
finite state machine including those that do not stem from fed-back
shift registers. In particular, it makes no assumption on linearity.

Hidden Markov models

The Viterbi algorithm is also used to estimate the sequence of states
in a hidden Markov model. Example: suppose the weather can be rainy,
sunny, or cloudy, and the possible activities are cycling, shopping,
and watching television.
For each weather situation we have a probability depending on the
*previous* weather situation, e.g. Pr(rainy|rainy) = 0.7,
Pr(sunny|rainy) = 0.1, Pr(cloudy|rainy) = 0.2, Pr(rainy|sunny) = 0.3, ...
For each activity we have a probability depending on the current
weather situation, e.g. Pr(cycling|rainy) = 0.1, Pr(cycling|sunny) = 0.5, etc.

Now, given an observed sequence of activities we want to estimate the
most likely sequence of weather situations (which are assumed
unknown). E.g. a friend tells you on the phone what they have been
doing the last few days and you want to draw some conclusions about
the weather during that time. "Serious" applications exist in speech
recognition: "weather" <-> phoneme uttered, "activity" <-> observed signal.
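This estimation can be sketched as follows. Only the probabilities quoted above come from the example; all values marked (*) are illustrative assumptions filling in the unspecified entries.

```python
# Viterbi on the weather/activity hidden Markov model sketched above.

states = ['rainy', 'sunny', 'cloudy']

trans = {  # Pr(next weather | previous weather)
    'rainy':  {'rainy': 0.7, 'sunny': 0.1, 'cloudy': 0.2},
    'sunny':  {'rainy': 0.3, 'sunny': 0.5, 'cloudy': 0.2},   # (*) except Pr(rainy|sunny)
    'cloudy': {'rainy': 0.4, 'sunny': 0.3, 'cloudy': 0.3},   # (*)
}
emit = {   # Pr(activity | weather); all (*) except the cycling entries
    'rainy':  {'cycling': 0.1, 'shopping': 0.4, 'television': 0.5},
    'sunny':  {'cycling': 0.5, 'shopping': 0.3, 'television': 0.2},
    'cloudy': {'cycling': 0.3, 'shopping': 0.4, 'television': 0.3},
}
start = {q: 1/3 for q in states}   # uniform initial distribution (*)

def most_likely_weather(activities):
    """Most probable weather sequence given the observed activities."""
    # V[q]: probability of the best state sequence ending in q;
    # path[q]: that sequence itself
    V = {q: start[q] * emit[q][activities[0]] for q in states}
    path = {q: [q] for q in states}
    for a in activities[1:]:
        V2, path2 = {}, {}
        for q2 in states:
            q1 = max(states, key=lambda q: V[q] * trans[q][q2])
            V2[q2] = V[q1] * trans[q1][q2] * emit[q2][a]
            path2[q2] = path[q1] + [q2]
        V, path = V2, path2
    return path[max(states, key=lambda q: V[q])]
```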

Fano - Algorithm

The space usage of the Viterbi algorithm of O(2^L*m) becomes
impracticable for large L. Fano's algorithm uses less space but more
time. In its simplest version it computes for each possible input the
Hamming distance of its encoding to the received message and takes an
input minimising this distance.

In practice one instead searches for an input whose encoding has
Hamming distance less than or equal to a fixed bound h.
All inputs with a prefix that in itself already yields a Hamming
distance > h can be discarded. Thus, if one works off the possible
inputs using a depth-first search on a binary tree of depth m-L, then
a subtree whose prefix exceeds the bound may be pruned.

Here is a recursive implementation of Fano's algorithm for the case k=1. 

Fano(zs,xs,d,q,h) =

/* returns an input us extending xs such that the encoding of us has
distance <= h from zs. Fail if no such us exists. */

/* d is the distance of the encoding of xs from the corresponding
prefix of zs; q is the state reached after working off xs. These
arguments are a function of zs and xs and are added only to improve
efficiency. */

if (d > h) return Fail;

if (|xs| = m) return xs;   /* complete input within the bound found */

t := the distance between out(q,0) and the corresponding part of zs;
/* t:{0..n} */

res0 := Fano(zs,xs.0,d+t,delta(q,0),h);

if (|xs| >= m - L) return res0;   /* tail bits must be zero */

if (res0 != Fail) return res0;

t := the distance between out(q,1) and the corresponding part of zs;

res1 := Fano(zs,xs.1,d+t,delta(q,1),h);
return res1;

/* End of algorithm */

In order to find the best possible decoding one calls
Fano(zs,(),0,0s,h) for h=0,1,2,... until for the first time a proper
value (not Fail) is returned.

In practice one uses bigger steps, e.g. h=1,2,4,7,11,..., and obtains
a more efficient but suboptimal decoding.
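The procedure above can be transcribed directly, again for the example encoder (generators 1+D^2 and 1+D+D^2, L=2). This sketch raises the bound in steps of one, i.e. the optimal variant.

```python
# Depth-first Fano search with iterative deepening of the bound h,
# for the k=1 example encoder with generators 1+D^2 and 1+D+D^2.

def delta(q, x):
    """Next state: the register (s1, s2) shifts to (x, s1)."""
    return (x, q[0])

def out(q, x):
    """Output pair for generators 1+D^2 and 1+D+D^2."""
    return (x ^ q[1], x ^ q[0] ^ q[1])

def fano(zs, xs, d, q, h, L=2):
    """Input extending xs whose full encoding stays within distance h
    of zs, or None (Fail)."""
    m = len(zs)
    if d > h:
        return None                       # bound exceeded: prune subtree
    if len(xs) == m:
        return xs                         # complete input found
    a = zs[len(xs)]                       # received pair for the next step
    t = sum(u != v for u, v in zip(out(q, 0), a))
    res0 = fano(zs, xs + [0], d + t, delta(q, 0), h, L)
    if len(xs) >= m - L:                  # tail bits must be zero
        return res0
    if res0 is not None:
        return res0
    t = sum(u != v for u, v in zip(out(q, 1), a))
    return fano(zs, xs + [1], d + t, delta(q, 1), h, L)

def decode(zs):
    """Raise h = 0, 1, 2, ... until the search succeeds."""
    h = 0
    while True:
        res = fano(zs, [], 0, (0, 0), h)
        if res is not None:
            return res, h
        h += 1

recv = [(1,1),(0,0),(1,1),(1,0),(0,0),(1,1),(1,1),(0,0),(1,1),(0,0),(1,1)]
```

Applied to the received message of the Viterbi example, decode finds the same input, with h equal to the number of channel errors.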


Appending the M so-called tail bits can lead to unacceptable waste in
the case of a long length of influence M. In this case, one may either
not use tail bits, which results in having to estimate the final
state: one then chooses the final state q such that M(0s,q,zs) is
minimal. Alternatively, one can use circular termination, also known
as tailbiting, which we shall now describe.

This method consists of choosing the initial state of the encoder
depending on the frame to be coded in such a way that the final state
will be equal to this initial one. Thus, in order to transmit us one
first determines q such that delta(q,us)=q and then sends
out(q,us). Of course this requires processing the input twice and thus
appropriate buffering.

The advantage of tailbiting as opposed to estimating the final state
as described above is that with the latter method the error correction
properties are inhomogeneously distributed across the frame. Errors in
the last few bits of the frame cannot be corrected very well since not
much is known about the states surrounding them. With tailbiting the
error correction properties are spread more evenly. It would be nice
to confirm this experimentally.

In order to calculate the circulation state q (depending on us) and
also to ascertain its existence, we use linearity. Writing states as
column vectors one finds an L x L matrix A and an L x k matrix B, each
with entries from GF(2)={0,1}, such that:

                     delta(q,x) = Aq+Bx

Thus, for xs = [x_0;...;x_{m-1}] one has:

delta(q,xs) =  A^m q + sum(A^{j-1}Bx_{m-j}, j=1..m)

E.g.: delta(q,[x_0;x_1;x_2]) =
      A(A(Aq+Bx_0)+Bx_1)+Bx_2 =
      AAAq + AABx_0 + ABx_1 + Bx_2

Now q = delta(q,xs) if and only if

(E-A^m)q = sum(A^{j-1}Bx_{m-j}, j=1..m) 

with E the unit matrix. 

Since sum(A^{j-1}Bx_{m-j}, j=1..m) = delta(0s,[x_0,...,x_{m-1}]), one
gets the circulation state q according to the formula q :=
(E-A^m)^{-1} delta(0s,xs). 

If (E-A^m) fails to be invertible, tailbiting does not work for the
parameters A and m. The state permutation

                 q_1 |-> (E-A^m)^{-1}q_1

may be tabulated once and for all as a table with 2^L entries. This
then yields the following algorithm for calculating the circulation
state:

1) Encode xs starting from 0s. Let q_1 be the resulting final state.
2) Return q = (E-A^m)^{-1}q_1 using the table.
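The calculation can be sketched as follows for a small hypothetical recursive encoder with feedback t = x + s1 + s2, i.e. A = [[1,1],[1,0]] and B-column (1,0); this encoder is an illustrative assumption, not one from the text. For this A one has A^3 = E, so tailbiting fails exactly when m is divisible by 3.

```python
# Circulation-state computation over GF(2) for a hypothetical
# recursive encoder with A = [[1,1],[1,0]], B = [[1],[0]].

def matmul(P, Q):
    """Matrix product over GF(2)."""
    return [[sum(P[i][k] & Q[k][j] for k in range(len(Q))) % 2
             for j in range(len(Q[0]))] for i in range(len(P))]

def matpow(P, m):
    """P^m over GF(2), for 2x2 matrices."""
    R = [[1, 0], [0, 1]]
    for _ in range(m):
        R = matmul(R, P)
    return R

def delta_seq(A, B, q, xs):
    """State after feeding xs into the encoder, starting from state q."""
    for x in xs:
        q = [(A[i][0] * q[0] + A[i][1] * q[1] + B[i][0] * x) % 2
             for i in range(2)]
    return q

def inv2(M):
    """Inverse of a 2x2 matrix over GF(2), or None if singular."""
    det = (M[0][0] * M[1][1] + M[0][1] * M[1][0]) % 2
    if det == 0:
        return None
    return [[M[1][1], M[0][1]], [M[1][0], M[0][0]]]

def circulation_state(A, B, xs):
    """q with delta(q,xs) = q, via q = (E-A^m)^{-1} delta(0s,xs)."""
    m = len(xs)
    Am = matpow(A, m)
    # over GF(2), E - A^m = E + A^m
    C = [[(int(i == j) + Am[i][j]) % 2 for j in range(2)] for i in range(2)]
    Cinv = inv2(C)
    if Cinv is None:
        return None          # tailbiting impossible for this frame length
    q1 = delta_seq(A, B, [0, 0], xs)
    return [(Cinv[i][0] * q1[0] + Cinv[i][1] * q1[1]) % 2 for i in range(2)]

A = [[1, 1], [1, 0]]
B = [[1], [0]]
xs = [1, 0, 1, 1, 0]   # m = 5, not divisible by 3, so E+A^m is invertible
q = circulation_state(A, B, xs)
assert delta_seq(A, B, q, xs) == q   # q is indeed a fixed point
```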

The circulation state is uniquely determined by the input. This gives
the following decoding method.

Compute M(q1,q1,zs) for all states q1 by running the Viterbi algorithm
|Q| times in parallel. Then choose q such that M(q,q,zs) is minimal
and decode to D(q,q,zs).

Of course, this method is rather expensive since the Viterbi algorithm
must be run |Q| times. For small frame sizes this is not so
problematic, and it is in those cases that tail bits actually matter.

Puncturing of convolutional codes

An (n,k)-convolutional code has rate k/n; in particular, for k=1 the
rate never exceeds 1/2. To achieve larger rates one can delete certain
bits of the frame prior to transmission and replace them with zeros
at the receiver's end. In general, this will incur additional errors
(in case the deleted bits were not zero), but these can then be
corrected just like the real errors. We thus increase the rate at the
expense of the error correction properties.
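A minimal sketch of puncturing and depuncturing; the pattern below is an illustrative choice, not one from the text. Keeping three out of every four code bits turns a rate-1/2 code into a rate-2/3 code.

```python
# Puncturing sketch: bits at positions marked 0 in the pattern are
# deleted before transmission; the receiver reinserts zeros there.

pattern = [1, 1, 0, 1]   # keep 3 out of every 4 code bits: rate 1/2 -> 2/3

def puncture(bits):
    """Delete the bits at punctured positions."""
    return [b for i, b in enumerate(bits) if pattern[i % len(pattern)]]

def depuncture(bits, n):
    """Reinsert zeros at the punctured positions (n = original length)."""
    it = iter(bits)
    return [next(it) if pattern[i % len(pattern)] else 0 for i in range(n)]

sent = [1, 0, 1, 1, 0, 1, 1, 0]
tx = puncture(sent)              # only 6 bits are actually transmitted
rx = depuncture(tx, len(sent))
# rx differs from sent exactly where a punctured bit was nonzero; the
# decoder treats these positions like ordinary channel errors.
```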

A glimpse at turbo codes

In the early 1990s, Glavieux, Berrou, and Battail developed a new
kind of concatenated convolutional codes with corresponding
decoders. Due to the feedback circuitry used in the decoders these
were called *turbo codes*; they achieved hitherto unattained error
correction performance "near the Shannon limit" with nevertheless
reasonable computational effort.

Here we show a turbo code that appears in the WiMAX standard IEEE 802.16
(from Berrou: Codes et Turbocodes, Springer 2007).

[Diagram: parallel turbo encoder. The input frame is emitted directly
as the systematic output x1. One copy of the frame passes through the
unit [Buff] and a recursive convolutional encoder, yielding the parity
output x2; a second copy passes through the permutation unit [Perm]
and an identical recursive encoder, yielding the parity output x3.]

We recognise two copies of the recursive convolutional encoder G4 from
above.  The unit [Perm] reads a frame of k=24 bits and then, while
reading the next k-bit frame, outputs the previous one in permuted
order with respect to a fixed permutation. The unit [Buff] buffers a
k-bit frame, i.e., acts like [Perm] but with the identity
permutation. Furthermore, the convolutional encoders are initialised
using tailbiting.

We first notice that the turbo code is still linear so that the
crucial parameter is its minimum weight. Intuitively, this should be
rather high for if "by accident" we have an input frame for which the
G4 encoder produces a low weight output then "it is likely" that after
permutation this will no longer be the case so that the other G4
encoder then produces a high weight output and vice versa.

The difficulty lies in the decoding. Due to the spatial interaction
between the different bit positions the comparatively efficient
dynamic programming ("Viterbi algorithm") that was used for plain
convolutional codes is not possible. Instead, an approximate,
iterative algorithm is used which in addition must work with
continuous probabilities rather than discrete values. We refer to the
literature, e.g., the abovementioned work by Berrou et al for details.
