Encoding Issue
Character encoding issue - Generating feed from database
ID: 613
Status: Closed
Version: N/A
Report Date: September 9, 2011
Product: FeedWriter Class
Description

A fix for the original bug, identified by Ziggi, was implemented in the v3.1 update. Information about the update can be viewed in the blog post: Php FeedWriter Version 3.1 Update

The fix passes the encoding character set (if supplied to the FeedWriter constructor) to the htmlentities() calls made from the writeConstruct() function of the FeedWriter class. Previously, the character encoding was only used as the declared encoding of the XML document.
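As a rough illustration of the idea (not the actual FeedWriter code), the sketch below passes a configured character set through to htmlentities(). The function name encodeFeedText() and the UTF-8 fallback are assumptions made for this example only, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper, not part of the FeedWriter class.
function encodeFeedText(string $data, ?string $charset = null): string
{
    // Assumption: fall back to UTF-8 when no charset was supplied.
    $charset = $charset ?? 'UTF-8';

    // Pass the charset explicitly so multi-byte characters are read correctly.
    return htmlentities($data, ENT_COMPAT, $charset);
}

// "\xC3\xA9" is "é" encoded as UTF-8.
echo encodeFeedText("Caf\xC3\xA9 & \"bar\"", 'UTF-8'), PHP_EOL;
// prints: Caf&eacute; &amp; &quot;bar&quot;
```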

[update: 18th September 2011]

After some testing, it appears that it is still possible to break a feed by including certain characters in the content together with an incorrect character set supplied to the FeedWriter constructor. The issue now is that PHP warnings and errors are not currently caught when incompatible characters/character sets are used.
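One way this mismatch can show up, sketched outside of FeedWriter: on PHP 5.4 and later, htmlentities() returns an empty string when the input is not valid for the charset it was given, so the content silently disappears. The sample string is an assumption, standing in for data read from an ISO-8859-1 database column.

```php
<?php
// "é" as a single ISO-8859-1 byte (0xE9), as it might come from a Latin-1 column.
$latin1 = "Caf\xE9";

// Matching charset: the byte is recognised and converted to an entity.
$ok = htmlentities($latin1, ENT_COMPAT, 'ISO-8859-1');

// Mismatched charset: 0xE9 on its own is not a valid UTF-8 sequence,
// so htmlentities() returns an empty string.
$broken = htmlentities($latin1, ENT_COMPAT, 'UTF-8');

var_dump($ok);     // string(11) "Caf&eacute;"
var_dump($broken); // string(0) ""
```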

Error handling is to be incorporated into the class (FeedWriter::writeConstruct) to help prevent an invalid XML document from being generated when incompatible characters are supplied with an incorrect character set.

In some cases the XMLWriter object throws an exception when attempting to output characters that are not supported by a particular character set. In other cases the feed XML is produced as expected, but is accompanied by a PHP warning about modifying the header content type after output has already been sent to the browser. The configuration of the web server / PHP also appears to play a part in the latter. In either case, error handling will help to resolve these issues, and additional information will also be posted to the blog and included in the related documentation to assist with issues caused by character encoding.
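The planned error handling is not shown in this report, so the following is only a sketch of one possible approach: warnings raised during encoding are converted to exceptions with set_error_handler(), and an empty result for non-empty input is treated as an encoding failure. The function name safeEncode() is hypothetical, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical wrapper; FeedWriter::writeConstruct() itself is not reproduced here.
function safeEncode(string $data, string $charset): string
{
    // Turn any PHP warning or notice raised during encoding into an exception.
    set_error_handler(function (int $severity, string $message, string $file, int $line) {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });

    try {
        $encoded = htmlentities($data, ENT_COMPAT, $charset);

        // htmlentities() returns '' when the input is invalid for $charset.
        if ($encoded === '' && $data !== '') {
            throw new RuntimeException("Data is not valid for charset '$charset'");
        }
        return $encoded;
    } finally {
        // Always restore the previous handler, even when an exception is thrown.
        restore_error_handler();
    }
}

try {
    echo safeEncode("Caf\xE9", 'UTF-8');
} catch (Exception $e) {
    // Decide here whether to skip the item, substitute text, or abort the feed.
    error_log('Feed item skipped: ' . $e->getMessage());
}
```

Catching the problem before anything is written means the caller can drop or replace a single item instead of emitting a broken XML document.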

Disabling the display of warnings/errors may also help if you are experiencing header modification warnings, which in most cases won’t impact the output of the feed.
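A minimal sketch of that workaround, assuming the feed is served by a standalone script: warnings are logged instead of displayed, and output buffering (an addition not mentioned above) keeps stray output from reaching the browser before the header is set. The $xml placeholder stands in for whatever the feed generator returns.

```php
<?php
// Log problems instead of printing them into the response body.
ini_set('display_errors', '0');
ini_set('log_errors', '1');

// Buffer output so nothing is sent to the browser before the header is set.
ob_start();

// Placeholder for the generated feed; a real script would build this with FeedWriter.
$xml = '<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"></rss>';

if (!headers_sent()) {
    header('Content-Type: application/rss+xml; charset=UTF-8');
}

echo $xml;
ob_end_flush();
```

Hiding warnings only masks the symptom, so it is worth checking the error log afterwards for the underlying encoding problem.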

For more information about the data types and feed configuration used to help control the output of a feed, please see this post on the blog, and check back soon for additional related entries.


The initial bug, found by Ziggi (thanks for pointing this one out):

In FeedWriter.php, lines 1703 and 1706 both use the htmlentities() function when writing the XML data. Depending on the character encoding of the data stored in a database, the feed may break if the encoding charset is not passed to htmlentities().

 

Example: For ‘UTF-8’ encoding, the following lines would need to be changed:

line 1703:

$writer->writeCData(htmlentities($data));

Change to:

$writer->writeCData(htmlentities($data, ENT_COMPAT, 'utf-8'));

 

line 1706:

$writer->writeRaw(htmlentities($data));

Change to:

$writer->writeRaw(htmlentities($data, ENT_COMPAT, 'utf-8'));

 

Here ENT_COMPAT converts double quotes but leaves single quotes alone. The other quote options are ENT_QUOTES (convert both double and single quotes) and ENT_NOQUOTES (leave all quotes intact). ENT_IGNORE is a separate flag that is combined with one of these using the bitwise OR operator; it silently discards invalid byte sequences instead of returning an empty string. Please see the htmlentities PHP documentation for more information if required ( http://php.net/manual/en/function.htmlentities.php ).
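To make the difference between the flags concrete, here is a short comparison. The sample strings are assumptions chosen for illustration, and the outputs shown are for PHP 5.4 or later with the default HTML 4.01 entity table.

```php
<?php
$data = "Tom's \"feed\" & more";

echo htmlentities($data, ENT_COMPAT, 'UTF-8'), PHP_EOL;
// prints: Tom's &quot;feed&quot; &amp; more

echo htmlentities($data, ENT_QUOTES, 'UTF-8'), PHP_EOL;
// prints: Tom&#039;s &quot;feed&quot; &amp; more

echo htmlentities($data, ENT_NOQUOTES, 'UTF-8'), PHP_EOL;
// prints: Tom's "feed" &amp; more

// ENT_IGNORE is OR-ed with a quote flag; the invalid byte 0xE9 is dropped.
echo htmlentities("Caf\xE9", ENT_COMPAT | ENT_IGNORE, 'UTF-8'), PHP_EOL;
// prints: Caf
```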

——————–

This bug will be fixed by allowing the character encoding to be set when configuring the feed.  A default charset will be used if not explicitly changed (likely UTF-8 or ISO-8859-1).  The fix will be included in the next minor update (ETA: 10th September 2011), along with some other minor bug fixes.
