C# - How do I ignore the UTF-8 Byte Order Marker in String comparisons?
I'm having a problem comparing strings in a unit test in C# 4.0 using Visual Studio 2010. The same test case works in Visual Studio 2008 (with C# 3.5).
Here's the relevant code snippet:

    byte[] rawData = GetData();
    string data = Encoding.UTF8.GetString(rawData);
    Assert.AreEqual("constant", data, false, CultureInfo.InvariantCulture);

While debugging the test, the data string appears to the naked eye to contain exactly the same string as the literal. When I called data.ToCharArray(), I noticed that the first character of the string data has the value 65279 (U+FEFF), which is the UTF-8 byte order marker. What I don't understand is why Encoding.UTF8.GetString() keeps this marker around.
How do I get Encoding.UTF8.GetString() to not put the byte order marker in the resulting string?
Update: The problem was GetData(), which reads a file from disk and read the data from the file using FileStream.readbytes(). I corrected this by using a StreamReader and converting the string to bytes using Encoding.UTF8.GetBytes(), as it should've been doing in the first place! Thanks for all the help.
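The fix described in the update might look roughly like the sketch below. The real GetData() is not shown in the question, so the `path` parameter, the class name, and the temporary file used to exercise it are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class GetDataSketch
{
    // Hypothetical stand-in for the question's GetData(); the path parameter is an assumption.
    public static byte[] GetData(string path)
    {
        // StreamReader detects and consumes a leading BOM by default.
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            string text = reader.ReadToEnd();
            // Encoding.GetBytes does not emit a preamble, so no BOM is reintroduced.
            return Encoding.UTF8.GetBytes(text);
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // A UTF-8 BOM (EF BB BF) followed by the letter 'A'.
            File.WriteAllBytes(path, new byte[] { 0xEF, 0xBB, 0xBF, 0x41 });
            byte[] rawData = GetData(path);
            string data = Encoding.UTF8.GetString(rawData);
            Console.WriteLine(data.Length); // 1: only 'A' remains
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Decoding with a reader and re-encoding costs an extra pass over the data. If the caller only ever needs the text, returning the string directly would avoid the round trip.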
Well, I assume it's because the raw binary data includes the BOM. You could always remove the BOM yourself after decoding, if you don't want it - but you should consider whether the byte array should contain the BOM to start with.
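Removing the BOM after decoding could be sketched as follows. `StripBom` is a hypothetical helper name, not part of the .NET framework. It checks the first char directly, which is an ordinal comparison and so does not depend on how the current culture treats U+FEFF.

```csharp
using System;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: drops a single leading U+FEFF if present.
    public static string StripBom(string text)
    {
        if (!string.IsNullOrEmpty(text) && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }

    static void Main()
    {
        // A UTF-8 BOM (EF BB BF) followed by the letter 'A'.
        byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x41 };
        string decoded = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(decoded.Length);           // 2: U+FEFF plus 'A'
        Console.WriteLine(StripBom(decoded).Length); // 1: just 'A'
    }
}
```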
EDIT: Alternatively, you could use a StreamReader to perform the decoding. Here's an example, showing the same byte array being converted into two characters using Encoding.GetString or one character via StreamReader:

    using System;
    using System.IO;
    using System.Text;

    class Test
    {
        static void Main()
        {
            // A UTF-8 BOM (EF BB BF) followed by the letter 'A'.
            byte[] withBom = { 0xEF, 0xBB, 0xBF, 0x41 };

            // Encoding.GetString keeps the BOM as a leading U+FEFF character.
            string viaEncoding = Encoding.UTF8.GetString(withBom);
            Console.WriteLine(viaEncoding.Length); // 2

            // StreamReader consumes the BOM while decoding.
            string viaStreamReader;
            using (StreamReader reader = new StreamReader
                   (new MemoryStream(withBom), Encoding.UTF8))
            {
                viaStreamReader = reader.ReadToEnd();
            }
            Console.WriteLine(viaStreamReader.Length); // 1
        }
    }