Overcoming missing Unicode support in PHP


 

PHP's lack of native Unicode support is frustrating, but there are workarounds that let you develop properly internationalized applications even in PHP. The first problem to solve is how to represent Unicode data. PHP uses so-called binary strings: a PHP string is not a sequence of Unicode characters but a sequence of bytes. A practical approach is to store all strings internally in UTF-8 and to make sure that all input to the script is decoded correctly and all output from it is encoded correctly.

In theory, you can use encodings other than UTF-8, but UTF-8 causes less trouble than the alternatives. Many PHP libraries already expect UTF-8-encoded strings, including all functions that work with XML and the newly added intl library. To work smoothly with UTF-8-encoded strings, it is best to keep your string data in UTF-8 and to send script output in UTF-8 as well.
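
The advice to keep every string in UTF-8 can be sketched roughly as follows. This is only an illustration, not code from the original article: `to_utf8()` is a hypothetical helper, ISO-8859-1 is an assumed fallback for input that is not valid UTF-8, and the sketch assumes PHP 7 or later with the mbstring extension enabled.

```php
<?php
// Hypothetical helper (an assumption, not part of the article): return
// $value as valid UTF-8. $fallback is the assumed legacy encoding of any
// input that does not already validate as UTF-8.
function to_utf8(string $value, string $fallback = "ISO-8859-1"): string
{
    if (mb_check_encoding($value, "UTF-8")) {
        return $value; // already well-formed UTF-8, leave untouched
    }
    // Interpret the bytes as the fallback encoding and convert them.
    return mb_convert_encoding($value, "UTF-8", $fallback);
}

// Make UTF-8 the default encoding for later mb_* calls.
mb_internal_encoding("UTF-8");

$latin1 = "Gu\xEDa";         // "Guía" as ISO-8859-1 bytes (not valid UTF-8)
$clean  = to_utf8($latin1);  // the same text re-encoded as UTF-8
```

Validating first matters because converting a string that is already UTF-8 a second time would corrupt it. Which fallback encoding is right depends on where the data comes from, so treat the default here as a placeholder.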

Still, turning everything into UTF-8 does not solve everything. A Latin character with an accent or a non-Latin character takes two, three, or four bytes in UTF-8, which confuses the PHP string functions that compute string length or work with substrings, because they count bytes rather than characters. Listing 1 demonstrates this problem.

Listing 1. Problems related to improper Unicode support in PHP

<?php

// Declare the response as UTF-8 plain text.
header("Content-Type: text/plain; charset=utf-8");

$text["en"] = "The Hitchhiker's Guide to the Galaxy";
$text["es"] = "Guía del autoestopista galáctico";
$text["cs"] = "Stopařův průvodce po Galaxii";
$text["ru"] = "Путеводитель хитч-хайкера по Галактике";
$text["ja"] = "銀河ヒッチハイク・ガイド";

// Print each title with its byte count (strlen) and character count (mb_strlen).
foreach ($text as $lang => $t)
{
    echo $lang, ": ", $t, " (", strlen($t), " vs. ", mb_strlen($t, "utf-8"), ")\n";
}
?>
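
The same byte-versus-character mismatch affects substrings, not just lengths. The sketch below is an added illustration rather than part of the original listing. It assumes the mbstring extension is enabled and that the source file is saved as UTF-8. It cuts the Czech title after six units, once with the byte-based `substr()` and once with the character-aware `mb_substr()`.

```php
<?php
mb_internal_encoding("UTF-8");

$title = "Stopařův průvodce po Galaxii";

// Byte-based: the sixth byte is only the first half of the two-byte "ř",
// so the result ends in a dangling lead byte and is not valid UTF-8.
$bytes = substr($title, 0, 6);

// Character-based: keeps six whole characters, "Stopař" (seven bytes).
$chars = mb_substr($title, 0, 6, "UTF-8");

// Check whether each result is still well-formed UTF-8.
$bytesValid = mb_check_encoding($bytes, "UTF-8");
$charsValid = mb_check_encoding($chars, "UTF-8");
```

A string truncated in the middle of a character typically shows up as a replacement character in the browser, so the `mb_*` variants are the safer choice whenever the position is meant in characters.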