Monday, September 1, 2008

Charset encodings (Latin, UTF, ISO) in Apache, MySQL, PHP, etc.

RFC 2376 on XML documents:
"the use of the charset parameter is STRONGLY RECOMMENDED...
"UTF-8" [RFC-2279] is the recommended value, representing the UTF-8 charset.
UTF-8 is supported by all conforming XML processors [REC-XML]"

In Apache:
AddDefaultCharset On enables a default charset of iso-8859-1
http://httpd.apache.org/docs/2.2/mod/core.html
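
As a rough illustration of what that directive does to a response header, here is a minimal Python sketch. It assumes the behaviour described in the Apache documentation linked above (the default is only added to text/plain and text/html responses that carry no charset of their own); the function name add_default_charset is hypothetical and is not part of Apache.

```python
def add_default_charset(content_type, default_charset="iso-8859-1"):
    """Mimic the effect of AddDefaultCharset on a Content-Type value.

    Hypothetical helper for illustration only; Apache does this internally.
    """
    media_type, _, params = content_type.partition(";")
    media_type = media_type.strip().lower()
    # Look for an explicit charset parameter among the remaining parameters.
    has_charset = any(
        p.strip().lower().startswith("charset=") for p in params.split(";")
    )
    if media_type in ("text/plain", "text/html") and not has_charset:
        return f"{content_type}; charset={default_charset}"
    # Other media types, or an explicit charset, are left untouched.
    return content_type
```

With this sketch, "text/html" becomes "text/html; charset=iso-8859-1", while "text/html; charset=UTF-8" and "image/png" pass through unchanged.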

In MySQL:
By default, MySQL uses the latin1 (cp1252 West European) character set and the latin1_swedish_ci collation that sorts according to Swedish/Finnish rules. These defaults are suitable for the United States and most of Western Europe.
http://mysql2.mirrors-r-us.net/doc/refman/5.0/en/character-sets.html

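latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA latin1, except for the code points 0x80 to 0x9F: IANA latin1 treats them as undefined, whereas cp1252 (and therefore MySQL's latin1) assigns characters to those positions.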
http://dev.mysql.com/doc/refman/5.5/en/charset-we-sets.html
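
cp1252 and ISO 8859-1 agree on most bytes but not on all of them. The sketch below uses Python's built-in cp1252 and iso-8859-1 codecs as stand-ins to list the bytes on which the two disagree. It says nothing about MySQL's own conversion tables, and the function name differing_bytes is made up for this example.

```python
def differing_bytes():
    """Return the byte values that cp1252 and ISO 8859-1 decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        iso = raw.decode("iso-8859-1")  # every byte maps to the same code point
        try:
            win = raw.decode("cp1252")
        except UnicodeDecodeError:
            win = None  # Python's cp1252 codec leaves a few bytes unmapped
        if iso != win:
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    print([hex(b) for b in differing_bytes()])
    # The euro sign is the usual example: byte 0x80 in cp1252.
    print(repr(b"\x80".decode("cp1252")), repr(b"\x80".decode("iso-8859-1")))
```

Only the range 0x80 to 0x9F differs. Accented letters such as é (byte 0xE9) decode identically in both, which is why the two names are so often used interchangeably.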

Internet Explorer:
Microsoft Internet Explorer uses the character set specified for a
document to determine how to translate the bytes in the document into
characters on the screen or on paper. By default, Internet Explorer uses
the character set specified in the HTTP content type returned by the
server to determine this translation. If this parameter is not given,
Internet Explorer uses the character set specified by the meta element
in the document. It uses the user's preferences if no meta element is
specified.
http://msdn.microsoft.com/en-us/library/aa752010(VS.85).aspx
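
The fallback order described there (HTTP header first, then the meta element, then the user's preference) can be sketched in Python as follows. This is only an approximation under stated assumptions: the regular expression is a simplified stand-in for real HTML parsing, and the name resolve_charset and its parameters are invented for this example, not anything Internet Explorer exposes.

```python
import re
from email.message import Message

# Simplified pattern: finds charset=... inside a <meta ...> tag.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def resolve_charset(content_type_header, body, user_preference):
    """Pick a charset: HTTP header, then meta element, then user preference."""
    if content_type_header:
        msg = Message()
        msg["Content-Type"] = content_type_header
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset
    match = META_CHARSET.search(body)
    if match:
        return match.group(1).decode("ascii").lower()
    return user_preference
```

For instance, resolve_charset("text/html; charset=UTF-8", b"<meta charset='latin1'>", "windows-1252") returns "utf-8", because the header wins over the meta element.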

2 comments:

pbustos said...

In /etc/httpd/conf/httpd.conf:

# Specify a default charset for all pages sent out. This is
# always a good idea and opens the door for future internationalisation
# of your web site, should you ever want it. Specifying it as
# a default does little harm; as the standard dictates that a page
# is in iso-8859-1 (latin1) unless specified otherwise i.e. you
# are merely stating the obvious. There are also some security
# reasons in browsers, related to javascript and URL parsing
# which encourage you to always set a default char set.

pbustos said...

JSP also uses latin1 as the default:

contentType="mimeType [; charset=characterSet ]" |

"text/html;charset=ISO-8859-1"

The MIME type and character encoding the JSP page uses for the response. You can use any MIME type or character set that are valid for the JSP container. The default MIME type is text/html, and the default character set is ISO-8859-1.

Ref: http://java.sun.com/products/jsp/syntax/2.0/syntaxref2010.html
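
To see why a latin1 default matters, this small Python sketch shows what happens when UTF-8 bytes are read under an ISO-8859-1 default. It is an illustration only and does not involve JSP itself; the function names misread_as_latin1 and repair are hypothetical.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode the bytes as if they were ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled):
    """Undo the misreading above by reversing the two steps."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_as_latin1("año")
    print(garbled)          # each non-ASCII letter turns into two characters
    print(repair(garbled))  # round-trips back to the original text
```

The repair only works because ISO-8859-1 maps every byte to a character, so no information is lost in the misreading.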