C# - How do I ignore the UTF-8 Byte Order Marker in String comparisons?

- February 15, 2012

I'm having a problem comparing strings in a unit test in C# 4.0 using Visual Studio 2010. The same test case works properly in Visual Studio 2008 (with C# 3.5).

Here's the relevant code snippet:

    byte[] rawData = GetData();
    string data = Encoding.UTF8.GetString(rawData);

    Assert.AreEqual("constant", data, false, CultureInfo.InvariantCulture);

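While debugging the test, the data string appears to the naked eye to contain exactly the same string as the literal. When I called data.ToCharArray(), I noticed that the first char of the string data has the value 65279 (U+FEFF), which is the Byte Order Marker (encoded in UTF-8 as the bytes EF BB BF). What I don't understand is why Encoding.UTF8.GetString() keeps it around.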
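A minimal sketch of how the invisible character can be made visible, assuming the raw data is the UTF-8 BOM followed by the text "constant". The helper DescribeChars and the stand-in byte array are hypothetical and not part of the original test:

```csharp
using System;
using System.Linq;
using System.Text;

static class BomInspector
{
    // Hypothetical helper: formats each UTF-16 code unit of s as U+XXXX.
    public static string DescribeChars(string s)
    {
        return string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    static void Main()
    {
        // Assumed stand-in for GetData(): UTF-8 BOM bytes followed by "constant".
        byte[] rawData = Encoding.UTF8.GetPreamble()
            .Concat(Encoding.UTF8.GetBytes("constant"))
            .ToArray();
        string data = Encoding.UTF8.GetString(rawData);

        Console.WriteLine((int)data[0]);        // 65279
        Console.WriteLine(DescribeChars(data)); // starts with U+FEFF
    }
}
```

Printing the code points this way shows the leading U+FEFF that the debugger's string view hides.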

How do I get Encoding.UTF8.GetString() to not put the Byte Order Marker in the resulting string?

Update: The problem was with GetData(), which reads a file from disk and read the data from the file using FileStream.ReadBytes(). I corrected this by using a StreamReader and converting the string to bytes using Encoding.UTF8.GetBytes(), as it should've been doing in the first place! Thanks for all the help.

Well, I assume it's because the raw binary data includes the BOM. You could always remove the BOM yourself after decoding, if you don't want it - but you should consider whether the byte array should contain the BOM to start with.
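Removing the BOM after decoding could look roughly like the sketch below. StripBom is a hypothetical helper name, and the byte array is an assumed example (BOM followed by the letter A):

```csharp
using System;
using System.Text;

static class BomStripper
{
    // Hypothetical helper: drops a single leading U+FEFF, if present.
    public static string StripBom(string s)
    {
        if (s.Length > 0 && s[0] == '\uFEFF')
        {
            return s.Substring(1);
        }
        return s;
    }

    static void Main()
    {
        // Assumed input: UTF-8 BOM (EF BB BF) followed by 'A' (0x41).
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x41 };
        string decoded = Encoding.UTF8.GetString(withBom);
        string stripped = StripBom(decoded);

        Console.WriteLine(decoded.Length);   // 2
        Console.WriteLine(stripped.Length);  // 1
        Console.WriteLine(stripped == "A");  // True
    }
}
```

The sketch compares the first char directly rather than calling a string-based StartsWith, so the check does not depend on culture-sensitive comparison rules.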

EDIT: Alternatively, you could use a StreamReader to perform the decoding. Here's an example, showing the same byte array being converted into two characters using Encoding.GetString or one character via StreamReader:

    using System;
    using System.IO;
    using System.Text;

    class Test
    {
        static void Main()
        {
            // UTF-8 BOM (EF BB BF) followed by 'A' (0x41)
            byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };
            string viaEncoding = Encoding.UTF8.GetString(withBom);
            Console.WriteLine(viaEncoding.Length); // 2: the BOM is kept

            string viaStreamReader;
            using (StreamReader reader = new StreamReader
                   (new MemoryStream(withBom), Encoding.UTF8))
            {
                viaStreamReader = reader.ReadToEnd();
            }
            Console.WriteLine(viaStreamReader.Length); // 1: the BOM is skipped
        }
    }
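The same difference shows up when the bytes come from a file, which is the situation described in the question's update. The following is only a sketch under assumptions: ReadViaStreamReader is a hypothetical helper, and the temporary file stands in for whatever file GetData() actually reads.

```csharp
using System;
using System.IO;
using System.Text;

static class FileBomDemo
{
    // Hypothetical helper: reads a whole file as text through a StreamReader.
    public static string ReadViaStreamReader(string path)
    {
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Assumed file content: UTF-8 BOM followed by the ASCII text "constant".
            byte[] bytes =
            {
                0xEF, 0xBB, 0xBF,
                0x63, 0x6F, 0x6E, 0x73, 0x74, 0x61, 0x6E, 0x74
            };
            File.WriteAllBytes(path, bytes);

            // Decoding the raw bytes keeps the BOM as a leading U+FEFF.
            string raw = Encoding.UTF8.GetString(File.ReadAllBytes(path));
            // Reading through StreamReader skips the BOM.
            string viaReader = ReadViaStreamReader(path);

            Console.WriteLine(raw.Length);               // 9
            Console.WriteLine(viaReader.Length);         // 8
            Console.WriteLine(viaReader == "constant");  // True
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```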
