Author | Message | Time |
---|---|---|
TehUser | Something that will probably come up if you write network code in .NET is converting a string to bytes. I've mainly seen this done with the following code: [code]byte[] byteBuffer = Encoding.ASCII.GetBytes(stringBuffer);[/code] This, as it turns out, can be a bad thing. For those of us not intimately familiar with the ASCII standard: ASCII characters use only 7 bits. What does this mean for your conversion? It means that any character whose value does not fit in 7 bits (anything above 0x7F, such as 0xFF) will be converted into 0x3F, the question mark. This can be particularly confusing when you're trying to discover why you keep getting disconnected every time you send a packet with 0xFF at the front. So, the simplest solution that I have found is to write your own function to convert strings to byte arrays. | June 18, 2005, 10:56 PM |
K | I had this problem myself. I worked around it by using Encoding.Default, which is the system's ANSI code page, so what you get varies from system to system. | June 18, 2005, 11:14 PM |
kamakazie | Probably best to use Encoding.UTF8, at least for Battle.net. Why would you be inserting high-ASCII characters into a string anyway? | June 19, 2005, 5:53 PM |
TehUser | Because it's not for Battle.net. | June 20, 2005, 2:29 AM |
kamakazie | [quote author=TehUser link=topic=11889.msg116588#msg116588 date=1119234579] Because it's not for Battle.net. [/quote] The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net, though). I'm just wondering what requires you to put high-ASCII characters into a string. It seems like a byte array would be more useful, considering that is what you end up with anyway. I asked because you mentioned sending 0xFF and the high bit being lost. I guess what I was trying to get at is that using a String as a buffer is a bad idea, and it's probably best to use an array of bytes. | June 20, 2005, 5:00 AM |
JoeTheOdd | 0xFF is part of the Battle.net protocol header. | June 24, 2005, 12:47 AM |
Myndfyr | [quote author=0x4A6F655B7838365D link=topic=11889.msg117114#msg117114 date=1119574071] 0xFF is part of the Battle.net protocol header. [/quote] Yes, but that's irrelevant because using a string as a buffer is unintelligent. | June 24, 2005, 2:07 AM |
dRAgoN | [code] Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte() Dim bytOut(inBuf.Length - 1) As Byte Dim intI As Integer For intI = 0 To (inBuf.Length - 1) bytOut(intI) = CByte(Asc(Mid(inBuf, intI + 1))) Next Return bytOut End Function Private Function EncodeByteArrayToString(ByVal inBuf As Byte(), ByVal numByts As Integer) As String Dim bytOut As String Dim intI As Integer For intI = 0 To (numByts - 1) bytOut += Chr(Val(inBuf(intI))) Next Return bytOut End Function[/code] You could consider doing something like this instead of using that junk MS class; that class will only give you bytes for character values between 0 and 127. | July 10, 2005, 10:49 PM |
Myndfyr | No, that *method* of that *instance* of that class will only give you values under 128. Using a proper instance, you could do it the right way. [quote author=dxoigmn link=topic=11889.msg116603#msg116603 date=1119243606] [quote author=TehUser link=topic=11889.msg116588#msg116588 date=1119234579] Because it's not for Battle.net. [/quote] The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net, though). I'm just wondering what requires you to put high-ASCII characters into a string. It seems like a byte array would be more useful, considering that is what you end up with anyway. I asked because you mentioned sending 0xFF and the high bit being lost. I guess what I was trying to get at is that using a String as a buffer is a bad idea, and it's probably best to use an array of bytes. [/quote] I ignored this question before because I wasn't sure what you meant. However, if you're using Unicode characters, you can have characters above 127 that would be lost using Encoding.ASCII. That's why you should use Encoding.Unicode, Encoding.UTF8, or the proper encoding for the character set in use. | July 10, 2005, 11:58 PM |
kamakazie | [quote author=MyndFyre link=topic=11889.msg120066#msg120066 date=1121039935] No, that *method* of that *instance* of that class will only give you values under 128. Using a proper instance, you could do it the right way. [/quote] Another thing to add: letting the framework do the conversion is smarter, since there are so many things you can mess up with all the different encodings out there. Kind of off-topic, but the same can be said for crypto. | July 11, 2005, 1:52 AM |
dRAgoN | [code]Imports System Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte() Dim bytOut(inBuf.Length - 1) As Byte Dim intI As Integer For intI = 0 To (inBuf.Length - 1) bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0) Next Return bytOut End Function[/code] Here is another solution, which would probably be better to use than the one I posted above. There was no need to edit the post above since it works too, but it's not thinking in the .NET way (I guess that's a better way of saying it, heh). | July 14, 2005, 9:07 PM |
Myndfyr | No, I)ragon, you're still wrong, even more so now than before. 1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length. .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length. Natively, .NET String objects are stored in Unicode, with each character having two bytes. You use the .Length property, not .Length - 1. 2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255. Unicode characters over 255 will be lost. 3.) Why would you do this at all? System.Text.Encoding provides built-in managed support for all text encodings natively supported by Windows. Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want. Stop trying to fight it! | July 14, 2005, 10:37 PM |
dRAgoN | [quote author=MyndFyre link=topic=11889.msg120742#msg120742 date=1121380630] No, I)ragon, you're still wrong, even more so now than before. 1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length. .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length. Natively, .NET String objects are stored in Unicode, with each character having two bytes. You use the .Length property, not .Length - 1. 2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255. Unicode characters over 255 will be lost. 3.) Why would you do this at all? System.Text.Encoding provides built-in managed support for all text encodings natively supported by Windows. Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want. Stop trying to fight it! [/quote] Enlighten me on how what I did is "wrong" again, since I have passed every byte value from char 0x00 through char 0xFF through this as a string, and all of them came through successfully; the same goes for the other function above the one I just posted. | July 14, 2005, 11:53 PM |
kamakazie | [quote author=MyndFyre link=topic=11889.msg120742#msg120742 date=1121380630] No, I)ragon, you're still wrong, even more so now than before. 1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length. .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length. Natively, .NET String objects are stored in Unicode, with each character having two bytes. You use the .Length property, not .Length - 1. [/quote] The .Length - 1 is due to the inane fact (and one of the few gripes I have with VB) that when you define an array, you give it the upper index, not the number of elements in the array. Arrays in VB are zero-based, though! | July 15, 2005, 12:54 AM |
Myndfyr | Ahh, OK, I was wrong about the length thing. I)ragon, I whipped up this program to demonstrate the differences in output when you have an extended character set in use: [code] Imports System.Text Module Module1 Const JPText As String = "これはひもである高価値非英国の特性を使用する。" Sub Main() Console.WriteLine("Output string: {0}", JPText) Console.WriteLine("String length: {0}", JPText.Length) Dim dragonBytes() As Byte dragonBytes = EncodeStringToByteArray(JPText) Console.WriteLine("Dragon's method byte array length: {0}", dragonBytes.Length) Console.WriteLine("|)ragon's method byte array output:") WriteByteArray(dragonBytes) Console.WriteLine("I)ragon's method returning to a string: {0}", _ Encoding.Default.GetString(dragonBytes)) ' I used Encoding.Default because it supports character values ' up to 255. Console.WriteLine() Dim myndBytes() As Byte myndBytes = CorrectEncoding(JPText) Console.WriteLine("MyndFyre's method (Unicode) byte array length: {0}", myndBytes.Length) Console.WriteLine("MyndFyre's method (Unicode) array output:") WriteByteArray(myndBytes) Console.WriteLine("MyndFyre's method (Unicode) returning to a string: {0}", _ Encoding.Unicode.GetString(myndBytes)) Console.WriteLine() myndBytes = CorrectEncoding(JPText, Encoding.UTF8) Console.WriteLine("MyndFyre's method (UTF-8) byte array length: {0}", myndBytes.Length) Console.WriteLine("MyndFyre's method (UTF-8) array output:") WriteByteArray(myndBytes) Console.WriteLine("MyndFyre's method (UTF-8) returning to a string: {0}", _ Encoding.UTF8.GetString(myndBytes)) Console.Read() End Sub 'I)ragon's function Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte() Dim bytOut(inBuf.Length - 1) As Byte Dim intI As Integer For intI = 0 To (inBuf.Length - 1) bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0) Next Return bytOut End Function 'MyndFyre's functions - note increased flexibility Private Function CorrectEncoding(ByVal inBuf As String, ByVal encStyle As Encoding) As Byte() Return encStyle.GetBytes(inBuf) End Function Private Function CorrectEncoding(ByVal inBuf As String) As Byte() Return CorrectEncoding(inBuf, Encoding.Unicode) End Function Private Sub WriteByteArray(ByVal buffer() As Byte) Dim go As Boolean = True Dim start As Integer = 0 Do Until go = False go = WriteByteLine(buffer, start) start = start + 16 Loop End Sub Private Function WriteByteLine(ByVal buffer() As Byte, _ ByVal index As Integer) As Boolean Dim i As Integer Dim res As Boolean = True For i = index To index + 15 If i < buffer.Length Then Console.Write("{0:x2} ", buffer(i)) Else Console.Write(" ") res = False End If If i = index + 7 Then Console.Write(" ") End If Next Console.Write(" ") For i = index To index + 15 Dim b As Byte If i < buffer.Length Then b = buffer(i) Else b = &H20 'space End If Dim c As Char c = ChrW(b) If Char.IsLetterOrDigit(c) Or Char.IsPunctuation(c) Or Char.IsSymbol(c) Or c = " " Then Console.Write(c.ToString()) Else Console.Write(".") End If If i = index + 7 Then Console.Write(" ") End If Next Console.WriteLine() Return res End Function End Module [/code] Note that to save this in VB.NET, you have to go to Save As..., click the arrow on the "Save" button in the dialog, and select "Save with Encoding." For the purposes of this project, I chose Unicode (UTF-8 with Signature). This is what is output (note that Console programs do not support the extended character set and hence display ??????????? when the Japanese text is displayed): [pre] Output string: ??????????????????????? String length: 23 Dragon's method byte array length: 23 |)ragon's method byte array output: 53 8c 6f 72 82 67 42 8b d8 a1 24 5e f1 fd 6e 79 S.or.gB. O¡$^ñyny 27 92 7f 28 59 8b 02 '..(Y.. I)ragon's method returning to a string: SOor,gB<O¡$^ñyny''⌂(Y<☻ MyndFyre's method (Unicode) byte array length: 46 MyndFyre's method (Unicode) array output: 53 30 8c 30 6f 30 72 30 82 30 67 30 42 30 8b 30 S0.0o0r0 .0g0B0.0 d8 9a a1 4f 24 50 5e 97 f1 82 fd 56 6e 30 79 72 O.¡O$P^. ñ.yVn0yr 27 60 92 30 7f 4f 28 75 59 30 8b 30 02 30 '`.0.O(u Y0.0.0 MyndFyre's method (Unicode) returning to a string: ??????????????????????? MyndFyre's method (UTF-8) byte array length: 69 MyndFyre's method (UTF-8) array output: e3 81 93 e3 82 8c e3 81 af e3 81 b2 e3 82 82 e3 a..a..a. _a..a..a 81 a7 e3 81 82 e3 82 8b e9 ab 98 e4 be a1 e5 80 .§a..a.. é«.ä.¡å. a4 e9 9d 9e e8 8b b1 e5 9b bd e3 81 ae e7 89 b9 ☼é..è.±å ..a.rç.. e6 80 a7 e3 82 92 e4 bd bf e7 94 a8 e3 81 99 e3 æ.§a..ä. ¿ç."a..a 82 8b e3 80 82 ..a.. MyndFyre's method (UTF-8) returning to a string: ??????????????????????? [/pre] As you can see, you lost your data when you encoded this string with your method (by the way, this is what happened when I decoded your byte array with ASCII, Unicode, UTF-7, and UTF-8, respectively): [pre] I)ragon's method returning to a string: S♀or☻gB♂X!$^q}ny'↕⌂(Y♂☻ I)ragon's method returning to a string: ??????????? I)ragon's method returning to a string: S?or?gB?O¡$^ñyny'?⌂(Y?☻ I)ragon's method returning to a string: SorgB?$^ny'⌂(Y☻ [/pre] I hope you see why your method would cause problems for internationalization, and agree that it would just be better if we let the professionals who have already done the work for us manage our strings. (By the way, that Japanese text came from Google, where I translated it from: "This is a string that uses higher-value non-English characters.") | July 15, 2005, 2:32 AM |
dRAgoN | Well, I was under the impression this was for a server, so let's look at it this way, then: when you're viewing the data received via the socket, what are you going to see? [pre]Take one of your lines here: 53 30 8c 30 6f 30 72 30 82 30 67 30 42 30 8b 30 S0.0o0r0 .0g0B0.0 d8 9a a1 4f 24 50 5e 97 f1 82 fd 56 6e 30 79 72 O.¡O$P^. ñ.yVn0yr 27 60 92 30 7f 4f 28 75 59 30 8b 30 02 30 '`.0.O(u Y0.0.0[/pre] True or false: you're going to have data received in this format. | July 15, 2005, 5:26 PM |
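A minimal VB.NET sketch of the points raised in this thread, under stated assumptions. It compares Encoding.ASCII, ISO-8859-1 (Latin-1, obtained here with Encoding.GetEncoding(28591); nobody in the thread proposed this, so treat it as a swapped-in technique), and Encoding.UTF8 on a string containing U+00FF. It then builds a packet directly in a Byte() with MemoryStream and BinaryWriter instead of using a String as the buffer. The BuildPacket helper and its packet layout are hypothetical illustrations, not the definition of Battle.net or any other real protocol.

```vbnet
Imports System
Imports System.IO
Imports System.Linq
Imports System.Text

Module EncodingSketch

    ' ISO-8859-1 (code page 28591) maps U+0000..U+00FF to bytes 0x00..0xFF
    ' one-to-one. Characters above U+00FF are replaced with "?".
    Public ReadOnly Latin1Encoding As Encoding = Encoding.GetEncoding(28591)

    ' Hypothetical helper: builds a packet as bytes rather than as a String.
    ' Assumed layout (illustration only): 0xFF, packet id,
    ' little-endian UInt16 total length, then the UTF-8 payload.
    Public Function BuildPacket(ByVal packetId As Byte, ByVal payload As String) As Byte()
        Dim body() As Byte = Encoding.UTF8.GetBytes(payload)
        Using ms As New MemoryStream()
            Using w As New BinaryWriter(ms)
                w.Write(CByte(&HFF))
                w.Write(packetId)
                w.Write(CUShort(body.Length + 4)) ' 4-byte header + payload
                w.Write(body)
            End Using
            ' ToArray still works after the writer has closed the stream.
            Return ms.ToArray()
        End Using
    End Function

    Sub Main()
        Dim s As String = ChrW(&HFF) & "abc"

        ' ASCII covers only 0..127, so U+00FF becomes "?" (0x3F).
        Console.WriteLine("ASCII:  {0}", BitConverter.ToString(Encoding.ASCII.GetBytes(s)))

        ' Latin-1 keeps the 0xFF byte intact.
        Console.WriteLine("Latin1: {0}", BitConverter.ToString(Latin1Encoding.GetBytes(s)))

        ' UTF-8 needs two bytes (C3 BF) for U+00FF.
        Console.WriteLine("UTF8:   {0}", BitConverter.ToString(Encoding.UTF8.GetBytes(s)))

        ' Round-trip every byte value 0..255 through a Latin-1 string.
        Dim allBytes(255) As Byte
        For i As Integer = 0 To 255
            allBytes(i) = CByte(i)
        Next
        Dim back() As Byte = Latin1Encoding.GetBytes(Latin1Encoding.GetString(allBytes))
        Console.WriteLine("Latin1 round-trip of 0..255 intact: {0}", allBytes.SequenceEqual(back))

        ' Build the packet as bytes from the start.
        Console.WriteLine("Packet: {0}", BitConverter.ToString(BuildPacket(CByte(&H25), "hi")))
    End Sub

End Module
```

Latin-1's one-to-one mapping for 0x00 through 0xFF is effectively what a byte-per-character loop relies on, which would explain why a 0x00 to 0xFF round-trip test passes. Characters above U+00FF (such as the Japanese text above) still cannot survive it, so for real text a Unicode encoding such as UTF-8 remains the safer choice, and binary packet data is probably best kept in a Byte() from the start.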