GNU recode, version 3.3


Overall organization

While initializing all conversion modules, the main driver constructs a table giving all the conversion routines available (single steps) and, for each, the starting charset and the ending charset. If we consider these charsets as being the nodes of a directed graph, each single step may be considered as an oriented arc from one node to the other. A cost is attributed to each arc: for example, a high penalty is given to single steps which are prone to losing characters, a low penalty is given to those which need to study more than one input character to produce an output character, etc.

Given a starting code and a goal code, recode computes the most economical route through the elementary recodings, that is, the best sequence of conversions that will transform the input charset into the final charset. To speed up execution, recode looks for subsequences of conversions which are simple enough to be merged; it then dynamically creates new single steps for them and, of course, uses them.
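The route computation described above can be illustrated with a small sketch. This is not recode's actual code: the names struct step, cheapest_route and MAX_CHARSETS are invented for the illustration, charsets are assumed to be identified by small integers, costs are assumed to be non-negative, and a plain Bellman-Ford relaxation over the table of single steps stands in for whatever search recode really performs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_CHARSETS 16   /* hypothetical limit, for the sketch only */

/* One single step: an arc from charset `before` to charset `after`. */
struct step {
    int before;   /* index of the starting charset */
    int after;    /* index of the ending charset */
    int cost;     /* penalty attributed to the arc (assumed >= 0) */
};

/*
 * Find the cheapest sequence of single steps leading from `start` to `goal`.
 * On success, store the indices of the chosen steps in `route`, in the order
 * they must be applied, and return how many there are.  Return -1 when no
 * route exists.  `route` needs room for n_charsets - 1 entries.
 */
int cheapest_route(const struct step *steps, size_t n_steps, int n_charsets,
                   int start, int goal, int *route)
{
    int dist[MAX_CHARSETS];   /* best known total cost to reach each charset */
    int via[MAX_CHARSETS];    /* index of the step used to reach it, or -1 */

    assert(n_charsets > 0 && n_charsets <= MAX_CHARSETS);

    for (int c = 0; c < n_charsets; c++) {
        dist[c] = INT_MAX;
        via[c] = -1;
    }
    dist[start] = 0;

    /* Relax every arc repeatedly; n_charsets - 1 passes always suffice. */
    for (int pass = 1; pass < n_charsets; pass++) {
        int changed = 0;
        for (size_t s = 0; s < n_steps; s++) {
            int from = steps[s].before;
            int to = steps[s].after;
            if (dist[from] != INT_MAX && dist[from] + steps[s].cost < dist[to]) {
                dist[to] = dist[from] + steps[s].cost;
                via[to] = (int) s;
                changed = 1;
            }
        }
        if (!changed)
            break;
    }

    if (dist[goal] == INT_MAX)
        return -1;

    /* Walk back from the goal to count the steps, then fill them in order. */
    int length = 0;
    for (int c = goal; c != start; c = steps[via[c]].before)
        length++;
    int pos = length;
    for (int c = goal; c != start; c = steps[via[c]].before)
        route[--pos] = via[c];

    return length;
}
```

With three charsets numbered 0, 1 and 2, and single steps 0 to 1 (cost 1), 1 to 2 (cost 1) and 0 to 2 (cost 5), the sketch prefers the two cheap steps over the expensive direct one, which mirrors how a lossy direct conversion can be avoided in favour of a longer but safer route.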

A double step is a sequence of two single steps in which the output of the first is the special charset rfc1345 (which is not directly available to the user) and the input of the second is also rfc1345. Special machinery dynamically produces efficient, reversible, mergeable single steps out of these double steps.
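To give a rough idea of what merging a double step can mean, here is a minimal sketch under strong assumptions. It supposes that each half of the double step is described by a list of (byte, pivot code) pairs, where the pivot code stands for a character of the intermediate charset, and it builds a direct 256-entry byte-to-byte table by matching pivot codes. The names struct pair, merge_double_step and NOT_MAPPED are hypothetical, and the sketch does not attempt the extra work needed to make the result reversible.

```c
#include <stddef.h>

#define NOT_MAPPED (-1)   /* marks input bytes with no counterpart */

/* Hypothetical description of one character: its byte value in some
   charset, and the pivot code of the same character. */
struct pair {
    unsigned char byte;
    unsigned pivot;
};

/*
 * Merge "charset A to pivot" (`a`) with "pivot to charset B" (`b`) into a
 * single table such that table[byte_in_A] is the matching byte in B, or
 * NOT_MAPPED when charset B has no such character.
 */
void merge_double_step(const struct pair *a, size_t na,
                       const struct pair *b, size_t nb,
                       int table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = NOT_MAPPED;

    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (a[i].pivot == b[j].pivot) {
                table[a[i].byte] = b[j].byte;
                break;   /* first match wins */
            }
        }
    }
}
```

Once such a table exists, converting a buffer costs one array lookup per byte, instead of two successive conversions through the intermediate charset.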

The main part of recode is written in C, as are most single steps. A few single steps need to recognize sequences of multiple characters; these are often better written in flex.
