Tuesday, 30 March 2010

CharacterSet Fun

I noticed that a few feeds read in by yesrss were showing the unfortunate 'unknown character' symbol when displayed in any browser.

All feeds are downloaded by the web server first, and the encoding error was traced back to that initial request. To be more exact, the StreamReader that reads the stream returned by HttpWebResponse.GetResponseStream() was being created with the constructor that takes only a stream, with no encoding specified.

That constructor assumes UTF-8 unless the stream starts with a byte order mark, so a response sent in any other character set is decoded incorrectly.
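
To see why the wrong encoding produces that symbol, here is a minimal sketch. The EncodingDemo class and its Decode helper are made-up names for illustration and are not part of yesrss; an in-memory stream stands in for the HTTP response. It encodes a short string as ISO-8859-1, then reads it back once without an encoding and once with the right one. The first read should contain the replacement character U+FFFD, the second should round-trip cleanly.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decodes raw bytes through a StreamReader.
    // A null encoding selects the stream-only constructor, which assumes UTF-8.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = encoding == null
            ? new StreamReader(stream)
            : new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        var latin1 = Encoding.GetEncoding("iso-8859-1");

        // In ISO-8859-1 the accented 'e' (U+00E9) is the single byte 0xE9.
        byte[] bytes = latin1.GetBytes("caf\u00e9s");

        // 0xE9 followed by 's' is not a valid UTF-8 sequence,
        // so the decoder substitutes U+FFFD for it.
        string wrong = Decode(bytes, null);
        string right = Decode(bytes, latin1);

        Console.WriteLine("Contains U+FFFD: " + (wrong.IndexOf('\uFFFD') >= 0));
        Console.WriteLine("Round-trips:     " + (right == "caf\u00e9s"));
    }
}
```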

After a bit of digging, I found that the HttpWebResponse class has a property called CharacterSet and, as luck would have it, StreamReader also has a constructor that takes a System.Text.Encoding.

A quick modification of the request code and voila.

var encoding = Encoding.Default;
try
{
    // Prefer the character set the server declared, if any.
    if (!string.IsNullOrEmpty(response.CharacterSet))
        encoding = Encoding.GetEncoding(response.CharacterSet);
}
catch (ArgumentException) { /* unrecognised charset name: keep the default */ }
using (var reader = new StreamReader(response.GetResponseStream(), encoding))
    Body = reader.ReadToEnd();
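
The charset lookup can also be pulled out of the request code so it can be exercised without a network call. The sketch below is only one way to do that: CharsetHelper and Resolve are hypothetical names, and passing the fallback in explicitly is my assumption rather than what yesrss does. It avoids relying on Encoding.Default, which on the .NET Framework is the machine's ANSI code page and so can differ from server to server.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Hypothetical helper: maps a charset name (for example the value of
    // HttpWebResponse.CharacterSet) to an Encoding, or returns the supplied
    // fallback when the name is missing or not recognised.
    public static Encoding Resolve(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;

        try
        {
            return Encoding.GetEncoding(charset.Trim());
        }
        catch (ArgumentException)
        {
            // GetEncoding throws ArgumentException for names it does not know.
            return fallback;
        }
    }
}
```

With this in place the reader would be created as new StreamReader(response.GetResponseStream(), CharsetHelper.Resolve(response.CharacterSet, Encoding.UTF8)), assuming UTF-8 is an acceptable fallback for the feeds in question.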

Apologies to any users inconvenienced by this.
