Overcoming missing Unicode support in PHP


 

PHP's lack of native Unicode support is frustrating, but there are workarounds that let you develop properly internationalized applications even in PHP. The first problem to solve is how to represent Unicode data. PHP uses so-called binary strings: a PHP string is not a sequence of Unicode characters but a sequence of bytes. A practical approach is to store all strings internally in UTF-8 and to make sure that all input to the script is properly decoded and all output from it is properly encoded.
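One way to keep the internal representation consistent is to normalize input at the boundary of the script. The following is only a minimal sketch: the helper name to_utf8() is hypothetical, and the fallback encoding ISO-8859-1 is an assumption about where legacy input might come from. It relies on the mbstring functions mb_check_encoding() and mb_convert_encoding().

```php
<?php
// Hypothetical helper (not part of PHP): return $input as UTF-8.
// $assumedEncoding is an assumption about what non-UTF-8 input contains.
function to_utf8($input, $assumedEncoding = "ISO-8859-1")
{
    // Bytes that already form valid UTF-8 are passed through unchanged.
    if (mb_check_encoding($input, "UTF-8")) {
        return $input;
    }
    // Otherwise interpret the bytes in the assumed encoding and convert.
    return mb_convert_encoding($input, "UTF-8", $assumedEncoding);
}

// "Gu\xEDa" is "Guía" encoded in ISO-8859-1, which is not valid UTF-8.
$clean = to_utf8("Gu\xEDa");
echo $clean, "\n";
```

Guessing the source encoding is inherently unreliable, so where possible it is better to have the client declare it than to rely on a fallback like this one.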

In theory, you can use encodings other than UTF-8, but UTF-8 causes less trouble than the alternatives. Many PHP libraries already expect UTF-8-encoded strings, including all the functions that work with XML and the newly added intl library. To work smoothly with UTF-8-encoded strings, it is best to keep all characters encoded in UTF-8 throughout the script and to send the script's output in UTF-8 as well.
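If the mbstring extension is available, one way to apply this consistently is to declare UTF-8 as its internal encoding once, near the start of the script. This is a sketch of one possible setup, not a required configuration; the sample string is only an illustration.

```php
<?php
// Make UTF-8 the default encoding for mbstring functions, so the
// encoding argument can be omitted in later calls.
mb_internal_encoding("UTF-8");

// With the internal encoding set, mb_strlen() counts characters
// rather than bytes without being told the encoding explicitly.
$sample = "Guía";
echo strlen($sample), " bytes, ", mb_strlen($sample), " characters\n";
```

Passing the encoding explicitly to each call works just as well and does not depend on global state.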

Still, converting everything to UTF-8 does not solve every problem. A Latin character with an accent, or a non-Latin character, takes two, three, or four bytes in UTF-8, which confuses the PHP string functions that compute string length or work with substrings. Listing 1 demonstrates this problem by printing each string's length as reported by strlen(), which counts bytes, next to the length reported by mb_strlen(), which counts characters.

Listing 1. Problems related to improper Unicode support in PHP

<?php

// Tell the client that the output is UTF-8-encoded plain text.
header("Content-Type: text/plain; charset=utf-8");

// The same book title in several languages and scripts.
$text["en"] = "The Hitchhiker's Guide to the Galaxy";
$text["es"] = "Guía del autoestopista galáctico";
$text["cs"] = "Stopařův průvodce po Galaxii";
$text["ru"] = "Путеводитель хитч-хайкера по Галактике";
$text["ja"] = "銀河ヒッチハイク・ガイド";

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
foreach ($text as $lang => $t) {
    echo $lang, ": ", $t, " (", strlen($t), " vs. ", mb_strlen($t, "UTF-8"), ")\n";
}
?>
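
The same mismatch affects substrings. The following sketch, which reuses the Spanish title from Listing 1, contrasts substr() with mb_substr(); the variable names are only illustrative.

```php
<?php
$title = "Guía del autoestopista galáctico";

// substr() counts bytes. "í" occupies two bytes in UTF-8, so taking
// three bytes cuts that character in half and leaves invalid UTF-8.
$broken = substr($title, 0, 3);

// mb_substr() counts characters when it is told the encoding.
$intact = mb_substr($title, 0, 3, "UTF-8");

var_dump(mb_check_encoding($broken, "UTF-8")); // bool(false)
var_dump(mb_check_encoding($intact, "UTF-8")); // bool(true)
var_dump($intact === "Guí");                   // bool(true)
```

A truncated multibyte character like the one in $broken typically shows up in the browser as a replacement glyph, which is why byte-oriented functions are best avoided for text that will be displayed.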