# Convolutional Codes II


Decoding of convolutional codes
-------------------------------

Since convolutional codes exhibit no (known) algebraic structure beyond linearity, such structure cannot be exploited for decoding. Instead, one uses general search methods such as dynamic programming, branch and bound, etc. to find an input whose encoding has minimal Hamming distance from the received word. These methods are often generalised to probabilities: one presupposes a certain distribution of channel disturbances, and sometimes of inputs, and decodes to the input that is most likely given the received data (maximum likelihood). In some cases this constitutes an advantage over the algebraic decoding procedures used with block codes. Furthermore, these methods can be generalised to multi-level inputs and outputs, e.g. different signal or voltage levels (soft-input, soft-output (SISO)). In this course, however, we only look at the binary case and Hamming distance.

In practice one does not encode the entire information as a single word but partitions it into frames of a certain length, e.g. 256 bits, which are transmitted and then decoded as a whole. When coding the next frame the encoder restarts in the initial state, henceforth denoted 0s. Decoding becomes easier if the decoder knows the final state of the encoder. For this reason the encoder is driven back into the zero state by appending M zeros to the message, where M is the length of influence. Thus, with a frame size of 256 bits and a length of influence of 7, only 249 information bits can be transmitted, followed by 7 zeros.

Definition: Consider an (n,k)-convolutional encoder with M*k shift registers. Its *state set* is Q = {0,1}^{M*k}. If q:Q is a state and xs : ({0,1}^k)^* is an input, then delta(q,xs) denotes the state reached after encoding xs starting in state q, and out(q,xs) : ({0,1}^n)^* denotes the resulting output. We write xs.xs' for the concatenation of the sequences xs and xs'.
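To make the definitions concrete, here is a minimal Python sketch of delta and out for k=1. The specific encoder (generator polynomials 1+D^2 and 1+D+D^2, hence M=2, with states represented as bit pairs) is an assumption chosen for illustration:

```python
def delta(q, x):
    # Next state: shift the new input bit into the register.
    # A state is the pair of the last two input bits (most recent first).
    s1, s2 = q
    return (x, s1)

def out(q, x):
    # Output block for the generator polynomials 1+D^2 and 1+D+D^2
    # (addition over GF(2) is XOR).
    s1, s2 = q
    return (x ^ s2, x ^ s1 ^ s2)

def encode(xs, q=(0, 0)):
    # Computes out(q, xs) and the final state delta(q, xs) for an input
    # sequence xs, block by block, as in the definitions above.
    ys = []
    for x in xs:
        ys.append(out(q, x))
        q = delta(q, x)
    return ys, q
```

Appending M = 2 zeros to an input drives this encoder back into the zero state, so encode(us + [0, 0]) always ends in (0, 0).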
It is clear that

   delta(q,xs.xs') = delta(delta(q,xs),xs')
   out(q,xs.xs')   = out(q,xs).out(delta(q,xs),xs')

Now, for ys:({0,1}^n)^* we define

   M(q1,q2,ys) = min{ d(out(q1,xs),ys) | delta(q1,xs)=q2, |xs|=|ys| }

Thus, M(q1,q2,ys) is the minimum distance of ys from a codeword, provided that encoding started and ended in q1 and q2 respectively. NB: d(u,v) denotes the Hamming distance between u and v. The following recurrence holds:

   M(q1,q2,())   = if q1=q2 then 0 else oo
   M(q1,q2,ys.a) = min{ M(q1,q,ys) + d(a,out(q,x)) | x:{0,1}^k, q:Q, delta(q,x)=q2 }

A companion recurrence allows one to actually compute input sequences that witness these minimal distances (or error numbers):

   D(q1,q2,())   = ()
   D(q1,q2,ys.a) = D(q1,q,ys).x

where q and x are chosen with delta(q,x)=q2 and in such a way that

   M(q1,q,ys) + d(a,out(q,x)) = M(q1,q2,ys.a)

Decoding thus amounts to computing D(0s,0s,zs) where zs with |zs|=m is the received output. If parts of the input sequence are known (e.g. at the end of a frame) one can exploit this during decoding by fixing x to the corresponding value.

Viterbi algorithm
-----------------

The Viterbi algorithm consists of evaluating the above recurrences bottom-up in an iterative fashion using dynamic programming. Given a received message zs with |zs|=m, one tabulates D(0s,q2,ys) and M(0s,q2,ys) for q2:Q and ys a prefix of zs. Starting from the entries with |ys|=0, this table can be filled successively using the recurrences until finally the desired entry D(0s,0s,zs) has been computed. Of course, in order to save space the D-entries do not contain the entire xs-sequence but only the respective last symbol and a pointer to the corresponding previous D-entry. Even with this optimisation the size of the table remains considerable, namely O(|Q| * m) = O(2^M * m).

The operation of the Viterbi algorithm in the case k=1 can be visualised with the *trellis diagram*.
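Before turning to the trellis picture, here is a concrete Python sketch of the tabulation just described, for k=1. The encoder (generators 1+D^2 and 1+D+D^2, states as bit pairs) is an assumption for illustration, and ties between equally good paths are broken arbitrarily:

```python
def delta(q, x):
    return (x, q[0])                      # state = last two input bits

def out(q, x):
    return (x ^ q[1], x ^ q[0] ^ q[1])    # generators 1+D^2 and 1+D+D^2

def viterbi(zs):
    # Tabulates M(0s,q,ys) and D(0s,q,ys) column by column over the
    # prefixes ys of zs; returns D(0s,0s,zs) and M(0s,0s,zs).
    INF = float("inf")
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    M = {q: 0 if q == (0, 0) else INF for q in states}
    D = {q: [] for q in states}
    for z in zs:
        M2 = {q: INF for q in states}
        D2 = {q: [] for q in states}
        for q in states:
            if M[q] == INF:
                continue                  # state not reachable from 0s
            for x in (0, 1):
                q2, y = delta(q, x), out(q, x)
                d = M[q] + (y[0] != z[0]) + (y[1] != z[1])
                if d < M2[q2]:
                    M2[q2], D2[q2] = d, D[q] + [x]
        M, D = M2, D2
    return D[(0, 0)], M[(0, 0)]
```

Instead of storing whole input sequences in D one would, as described above, keep only the last symbol plus a back pointer per entry; the lists here merely keep the sketch short.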
The trellis diagram is a graph with |Q|*(m+1) nodes arranged in m+1 columns of height |Q| = 2^M and labelled with (q,i) for q:Q and 0<=i<=m. A dotted line links (q1,i) and (q2,i+1) if delta(q1,0)=q2, a solid line if delta(q1,1)=q2. Each edge is labelled with the corresponding output. For decoding one places the received message under the trellis diagram and labels each edge with the Hamming distance between the output block (of size n) on that edge and the corresponding block of the received message. Then the trellis diagram can be filled in with the D- and M-values starting from the left.

Example (from Heise-Quattrocchi) for the encoder corresponding to

   G1 = ( 1+D^2   )
        ( 1+D+D^2 )

and m=11, thus m-M=9. The input 111010001 is padded with two zeros and then encoded to the sequence

   11 10 01 10 00 01 11 00 11 01 11

We assume that the received message is

   11 00 11 10 00 11 11 00 11 00 11
      ^  ^        ^            ^

with errors having occurred at the four marked positions. (The trellis figure itself does not reproduce well in plain text; the following table lists the M-values M(0s,q,ys) with which it is filled, for each state q and each prefix ys of the received message, with oo marking unreachable states:)

   |ys|:   0   1   2   3   4   5   6   7   8   9  10  11
   [00]    0   2   2   1   2   2   3   3   3   4   4   4
   [10]   oo   0   4   2   2   2   2   3   3   3   4   4
   [01]   oo  oo   1   2   2   3   3   3   4   4   4   5
   [11]   oo  oo   1   2   2   3   3   3   4   4   4   5

Backtracking the D-values from the final entry M(0s,0s,zs) = 4 yields the input 11101000100, i.e. the information bits 111010001 followed by the two tail zeros: despite the four errors, a complete reconstruction was possible.

Decoding with recursive encoders
--------------------------------

For recursive encoders one needs a way of resetting them to the zero state. This can be done using a special gate that switches the feedback off, so that the incoming zero tail then resets the encoder as in the non-recursive case. The effect of this gate must then be taken into account in the final columns of the trellis diagram. The Viterbi algorithm is relatively robust against such modifications, since it makes hardly any assumptions about the structure of the encoder. It would work equally well on encoders based on an arbitrary finite state machine, including those that do not stem from fed-back shift registers. In particular, it makes no assumption of linearity.

Hidden Markov
-------------

The Viterbi algorithm is also used to estimate the sequence of states in a hidden Markov model. Example: suppose the weather can be rainy, sunny, or cloudy, and the possible activities are cycling, shopping, and television.
For each weather situation we have a probability depending on the *previous* weather situation, for example Pr(rainy|rainy) = 0.7, Pr(sunny|rainy) = 0.1, Pr(cloudy|rainy) = 0.2, Pr(rainy|sunny) = 0.3, ... For each activity we have a probability depending on the current weather situation, e.g. Pr(cycling|rainy) = 0.1, Pr(cycling|sunny) = 0.5, etc. Now, given an observed sequence of activities, we want to estimate the most likely sequence of weather situations (which are assumed unknown). E.g. a friend tells you on the phone what they have been doing the last few days and you want to draw some conclusions about the weather during that time. "Serious" applications exist in speech recognition: "weather" <-> phoneme uttered, "activity" <-> signal received.

Fano algorithm
--------------

The space usage O(2^M * m) of the Viterbi algorithm becomes impracticable for large M. Fano's algorithm uses less space but more time. In its simplest version it computes for each possible input the Hamming distance of its encoding from the received message and takes the minimum. In practice one instead searches for an input whose encoding has Hamming distance at most a fixed bound h. All inputs with a prefix that by itself already yields a Hamming distance > h can be discarded. Thus, if one works off the possible inputs using a depth-first search on a binary tree of depth m-M, a subtree whose prefix exceeds the bound may be pruned. Here is a recursive implementation of Fano's algorithm for the case k=1.

   Fano(zs,xs,d,q,h) =
   /* Returns an entry us extending xs such that the encoding of xs.us has
      distance <= h from zs; Fail if no such us exists. */
   /* d is the distance of the encoding of xs from the corresponding prefix
      of zs; q is the state reached after working off xs. These arguments
      are a function of zs and xs and are added only to improve efficiency. */
   if (d > h) return Fail;
   t := the distance between out(q,0) and the corresponding part of zs;
   /* t : {0..n} */
   res0 := Fano(zs, xs.0, d+t, delta(q,0), h);
   if (|xs| >= m - M) return res0;   /* only the zero tail remains */
   if (res0 != Fail) return res0;
   t := the distance between out(q,1) and the corresponding part of zs;
   res1 := Fano(zs, xs.1, d+t, delta(q,1), h);
   return res1;
   /* End of algorithm */

In order to find the best possible decoding one calls Fano(zs,[],0,0s,h) for h=0,1,2,... until for the first time a proper value (not Fail) is returned. In practice one uses bigger steps, e.g. h=1,2,4,7,11,..., and obtains a more efficient but suboptimal decoding.

Tailbiting
----------

Appending the M so-called tail bits can lead to unacceptable waste when the length of influence M is large. In this case one may simply not use tail bits, which results in having to estimate the final state: one then chooses q so that M(0s,q,zs) is minimal and decodes to D(0s,q,zs). Alternatively, one can use circular termination, also known as tailbiting, which we shall now describe.

This method consists of choosing the initial state of the encoder, depending on the frame to be coded, in such a way that the final state will equal this initial one. Thus, in order to transmit us one first determines q such that delta(q,us)=q and then sends out(q,us). Of course this requires processing the input twice and thus appropriate buffering. The advantage of tailbiting over estimating the final state as described above is that with the latter method the error correction properties are distributed inhomogeneously across the frame: errors in the last few bits of the frame cannot be corrected very well, since not much is known about the states surrounding them. With tailbiting the error correction properties are spread more evenly. It would be nice to confirm this experimentally.

In order to calculate the circulation state q (depending on us), and also to ascertain its existence, we use linearity.
Writing states as column vectors, one finds an M x M matrix A and an M x k matrix B, each with entries from GF(2)={0,1}, such that

   delta(q,x) = Aq + Bx

Thus, for xs = [x_0;...;x_{m-1}] one has

   delta(q,xs) = A^m q + sum(A^{j-1} B x_{m-j}, j=1..m)

E.g.:

   delta(q,[x_0;x_1;x_2]) = A(A(Aq+Bx_0)+Bx_1)+Bx_2
                          = AAAq + AABx_0 + ABx_1 + Bx_2

Now q = delta(q,xs) if and only if

   (E - A^m) q = sum(A^{j-1} B x_{m-j}, j=1..m)

with E the unit matrix. Since sum(A^{j-1} B x_{m-j}, j=1..m) = delta(0s,[x_0;...;x_{m-1}]), one obtains the circulation state q from the formula

   q := (E - A^m)^{-1} delta(0s,xs)

If E - A^m fails to be invertible, tailbiting does not work for the parameters A and m. The state permutation q_1 |-> (E-A^m)^{-1} q_1 may be tabulated once and for all in a table of size 2^M * 2^M. This yields the following algorithm for calculating the circulation state:

1) Encode xs starting from 0s. Let q_1 be the resulting final state.
2) Return q = (E-A^m)^{-1} q_1, using the table.

The circulation state is uniquely determined by the input. This gives the following decoding method: compute M(q1,q1,zs) for all states q1 by running the Viterbi algorithm |Q| times, possibly simultaneously. Then choose q such that M(q,q,zs) is minimal and decode to D(q,q,zs). Of course, this method is rather expensive, since the Viterbi algorithm must be run |Q| times. For small frame sizes this is not so problematic, and it is precisely in those cases that tail bits actually matter.

Puncturing of convolutional codes
---------------------------------

An (n,k)-convolutional code has rate k/n; in particular, for k=1 the rate never exceeds 1/2. To achieve larger rates one can delete certain bits of the frame prior to transmission and replace them with zeros at the receiver's end. In general this will incur additional errors (in case the deleted bits were not zero), but these can then be corrected just like the real errors.
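A small sketch of this deletion and reinsertion step; the function names and the puncturing pattern are illustrative assumptions:

```python
def puncture(bits, pattern):
    # Transmit only those positions where the cyclically repeated
    # pattern contains a 1; the others are deleted.
    return [b for i, b in enumerate(bits) if pattern[i % len(pattern)]]

def depuncture(received, pattern, n):
    # Reinsert zeros at the deleted positions to restore length n.
    # A reinserted zero may be wrong, which then counts as an extra error
    # for the decoder to correct.
    it = iter(received)
    return [next(it) if pattern[i % len(pattern)] else 0 for i in range(n)]
```

With the pattern [1, 1, 1, 0], for instance, three of every four output bits are transmitted, turning a rate-1/2 code into a rate-2/3 one.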
We thus increase the rate at the expense of the error correction properties.

A glimpse at turbo codes
------------------------

In the early 1990s Glavieux, Berrou, and Battail developed a new kind of concatenated convolutional codes with corresponding decoders. Due to the feedback circuitry used in the decoders these were called *turbo codes*; they achieved hitherto unattained error correction properties "near the Shannon limit" with nevertheless reasonable computational effort. Here we show a turbo code that appears in the WiMAX standard IEEE 802.16 (from Berrou: Codes et turbocodes, Springer 2007).

   (Figure: the input is passed through unchanged as output x1. Through
   the buffer [Buff] it feeds a recursive convolutional encoder built
   from three delay elements [D] with feedback, producing output x2;
   through the interleaver [Perm] it feeds an identical second encoder,
   producing output x3.)

We recognise two copies of the recursive convolutional encoder G4 from above. The unit [Perm] reads a frame of k=24 bits and then, while reading the next k-bit frame, outputs the previous one in permuted order with respect to a fixed permutation. The unit [Buff] buffers a k-bit frame, i.e., acts like [Perm] but with the identity permutation. Furthermore, the convolutional encoders are initialised using tailbiting.

We first note that the turbo code is still linear, so that the crucial parameter is its minimum weight. Intuitively, this should be rather high: if "by accident" we have an input frame for which the G4 encoder produces a low-weight output, then "it is likely" that after permutation this is no longer the case, so that the other G4 encoder produces a high-weight output, and vice versa.

The difficulty lies in the decoding. Due to the interaction between far-apart bit positions, the comparatively efficient dynamic programming ("Viterbi algorithm") that was used for plain convolutional codes is no longer possible.
Instead, an approximate, iterative algorithm is used, which in addition must work with continuous probabilities rather than discrete values. We refer to the literature, e.g. the abovementioned book by Berrou et al., for details.
