Description:
------------
htmlspecialchars() and similar functions return an empty string in PHP 5.4 and above when a string containing Latin1 characters is passed, whereas
previous versions treated Latin1 as the default character set. This breaks backward compatibility and makes upgrading to PHP 5.4 difficult, when in
fact the function should continue to work as before. See below.
Expected result:
----------------
You should not need to specify your character set for this function as long as ASCII is a subset of that character set (examples: UTF-8,
cp1252, etc.), as the search-and-replace behavior is the same for all such character sets. You should only need to specify the character set
if ASCII isn't a subset of it, so that the search-and-replace behavior can be adjusted. That would be an improvement and would keep legacy uses
of htmlspecialchars() that expect it to work on Latin1 characters backward compatible. As it is now, most people are writing their own functions for
backward compatibility rather than passing a parameter to say that they're still using the Latin1 charset. This is annoying because there's no reason for
this function to stop working in the first place.
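
For reference, the parameter mentioned above is the third argument of htmlspecialchars(). A minimal sketch, assuming the input really is Latin1 (ISO-8859-1) and the page is served in that charset; the sample string is made up for illustration:

```php
<?php
// "café <b>" encoded as Latin1: é is the single byte 0xE9.
$latin1 = "caf\xE9 <b>";

// Naming the encoding explicitly tells htmlspecialchars() to treat the
// input as ISO-8859-1 instead of the UTF-8 default.
$escaped = htmlspecialchars($latin1, ENT_COMPAT | ENT_HTML401, 'ISO-8859-1');

// Only the HTML special characters are replaced; the 0xE9 byte is kept as-is.
var_dump($escaped === "caf\xE9 &lt;b&gt;");
```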
Actual result:
--------------
This function returns an empty string when Latin1 characters are passed. People upgrading to PHP 5.4 wind up with broken code and are forced to
debug it.
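
A minimal reproduction of the behavior described, using a made-up sample string. The flags and encoding are passed explicitly (they match the PHP 5.4 defaults) so that the result does not depend on the defaults of whichever PHP version runs the script:

```php
<?php
// "café <b>" encoded as Latin1: the byte 0xE9 on its own is not valid UTF-8.
$latin1 = "caf\xE9 <b>";

// Same flags and encoding that PHP 5.4 applies when none are given.
$escaped = htmlspecialchars($latin1, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// The invalid byte sequence makes the whole result an empty string.
var_dump($escaped === '');
```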
Explication:
-------------
But Latin1 is not a subset of UTF-8. You are getting an empty string returned because the string contains a Latin1 character which is invalid in
UTF-8. Most web sites these days work in UTF-8, which means that if you are blindly filtering using Latin1 in htmlspecialchars() and outputting
UTF-8, you have a potential security hole. That was the reason for this change.
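
To illustrate the point about invalid UTF-8, here is a small sketch for a site that outputs UTF-8 but still receives Latin1 input. It assumes the iconv extension is available and that the input is known to be ISO-8859-1; the sample string and variable names are hypothetical, and the `//u` pattern is used only as a quick UTF-8 validity check:

```php
<?php
// "café <b>" encoded as Latin1.
$latin1 = "caf\xE9 <b>";

// With the /u modifier, preg_match() fails (returns false) on invalid UTF-8.
$validBefore = preg_match('//u', $latin1) === 1;

// Convert to UTF-8 first: é becomes the two bytes 0xC3 0xA9.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
$validAfter = preg_match('//u', $utf8) === 1;

// Escaping now agrees with the encoding the page is actually served in.
$escaped = htmlspecialchars($utf8, ENT_COMPAT | ENT_HTML401, 'UTF-8');

var_dump($validBefore, $validAfter, $escaped === "caf\xC3\xA9 &lt;b&gt;");
```

Converting at the input boundary keeps the escaping step and the output charset consistent, which is the concern raised in the explanation above.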