Sunday, April 3, 2011

iconv and strtr combination

I had to decode input strings that were sometimes encoded in a known charset (e.g. ISO-8859-2) and other times in an unknown encoding.

In the first case, I have to convert from ISO-8859-2 to UTF-8 using iconv:
$t = iconv("ISO-8859-2", "UTF-8//TRANSLIT//IGNORE", $t);

For the unknown encoding, I have to convert the string using strtr:
$pairs = array('ã' => 'ă', 'º' => 'ş'); // for example
$t = strtr($t, $pairs);

Applying both conversions to the same input string sequentially corrupts the output, producing malformed characters.
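To see the problem concretely, here is a minimal sketch that runs both conversions in sequence on a made-up sample string. The sample word and the variable names are illustrative only, and it assumes PHP 7 or later for the "\u{...}" escapes. The //TRANSLIT//IGNORE suffixes are left out here, on the assumption that they are not needed when converting from ISO-8859-2.

```php
<?php
// Hypothetical sample: "maşină" whose special characters arrived as 'º' and 'ã'.
$input = "ma\u{00BA}in\u{00E3}";

// Same pairs as above, written as escapes: 'ã' => 'ă', 'º' => 'ş'.
$pairs = array("\u{00E3}" => "\u{0103}", "\u{00BA}" => "\u{015F}");
$fixed = strtr($input, $pairs); // now correct UTF-8

// Running iconv on the already-correct UTF-8 treats every byte as an
// ISO-8859-2 character and encodes it a second time.
$twice = iconv("ISO-8859-2", "UTF-8", $fixed);

echo bin2hex($fixed), "\n";
echo bin2hex($twice), "\n";
var_dump($twice === $fixed); // bool(false)
```

The second hex dump is longer than the first, because each byte of the two-byte characters has been turned into a character of its own.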

The encoding should somehow be detected before applying either conversion. But determining the encoding of the input is difficult and time-consuming.
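One cheap signal, though far from real detection, is whether the bytes form valid UTF-8. The sketch below checks that with preg_match and the u modifier; the function name isValidUtf8 and the sample strings are hypothetical. It also hints at why this alone does not settle the question: a mis-decoded string can be perfectly valid UTF-8 and still contain the wrong characters.

```php
<?php
// Hypothetical helper: true when $t is a well-formed UTF-8 byte sequence.
function isValidUtf8($t) {
    // With the u modifier, preg_match() returns false for an invalid UTF-8 subject.
    return preg_match('//u', $t) === 1;
}

$latin2     = "ma\xBAin\xE3";         // ISO-8859-2 bytes
$misdecoded = "ma\u{00BA}in\u{00E3}"; // valid UTF-8, but the wrong characters

var_dump(isValidUtf8($latin2));     // bool(false)
var_dump(isValidUtf8($misdecoded)); // bool(true)
```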

SOLUTION:
Apply the strtr conversion first and compare the output with the input. If they differ, strtr was the right conversion; if not, the iconv conversion must be applied.

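function convChars($t) {
    // Replacement pairs for the unknown encoding (example values).
    $pairs = array('ã' => 'ă', 'º' => 'ş');
    $t2 = strtr($t, $pairs);
    // Strict comparison avoids PHP's loose numeric-string juggling.
    if ($t2 !== $t) {
        // strtr changed something, so the input was in the unknown encoding.
        $t = $t2;
    } else {
        // Nothing matched, so treat the input as ISO-8859-2.
        $t = iconv("ISO-8859-2", "UTF-8//TRANSLIT//IGNORE", $t);
    }
    return $t;
}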
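
As a quick check of the idea, the sketch below feeds the same made-up word through both paths. The function is repeated under the hypothetical name convCharsSketch so the snippet runs on its own, the sample inputs are invented, and plain "UTF-8" is used as the target on the assumption that the suffixes are not needed for ISO-8859-2 input. It assumes PHP 7 or later for the "\u{...}" escapes.

```php
<?php
// Standalone copy of the function above, under a hypothetical name.
function convCharsSketch($t) {
    // 'ã' => 'ă', 'º' => 'ş', written as escapes.
    $pairs = array("\u{00E3}" => "\u{0103}", "\u{00BA}" => "\u{015F}");
    $t2 = strtr($t, $pairs);
    if ($t2 !== $t) {
        return $t2; // strtr matched: unknown-encoding input
    }
    return iconv("ISO-8859-2", "UTF-8", $t); // no match: ISO-8859-2 input
}

$fromUnknown = convCharsSketch("ma\u{00BA}in\u{00E3}"); // fixed by strtr
$fromLatin2  = convCharsSketch("ma\xBAin\xE3");         // fixed by iconv

var_dump($fromUnknown === $fromLatin2); // bool(true)
```

Note that this relies on ISO-8859-2 input never containing the byte sequences used as keys in $pairs; if it did, strtr would match and the iconv branch would be skipped.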
