Valhalla Legends Forums Archive | .NET Platform | Warning: ASCII.GetBytes()

Author | Message | Time
TehUser
Something that will probably come up if you write network code in .NET is converting a string to bytes.  I've mainly seen this done with the following code:

[code]byte[] byteBuffer = Encoding.ASCII.GetBytes(stringBuffer);[/code]

This, as it turns out, can be a bad thing.  For those of us not intimately familiar with the ASCII standard, ASCII characters only use 7 bits.  What does this mean for your conversion?  It means that any character with a value too high to fit in 7 bits (0xFF, for example) will be converted into 0x3F, the question mark.  This can be particularly confusing when you're trying to discover why, when you send a packet with 0xFF at the front, you end up disconnected every time.  So, the simplest solution that I have found is to write your own function to convert strings to byte arrays.
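
A minimal sketch of the effect, for anyone who wants to reproduce it. The module name and the sample string are made up for illustration. The ISO-8859-1 line is not part of the original problem; it is shown only as one built-in encoding that maps U+0000 through U+00FF one-to-one onto single bytes.

[code]
Imports System
Imports System.Text

Module AsciiPitfall
    Sub Main()
        ' A "packet" kept in a String: U+00FF followed by plain ASCII text.
        Dim packet As String = ChrW(&HFF) & "hello"

        ' ASCII is 7-bit, so U+00FF cannot be represented and becomes "?" (0x3F).
        Dim asciiBytes() As Byte = Encoding.ASCII.GetBytes(packet)
        Console.WriteLine(BitConverter.ToString(asciiBytes))   ' 3F-68-65-6C-6C-6F

        ' Assumption for illustration: ISO-8859-1 keeps U+0000..U+00FF as bytes 0x00..0xFF.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim latin1Bytes() As Byte = latin1.GetBytes(packet)
        Console.WriteLine(BitConverter.ToString(latin1Bytes))  ' FF-68-65-6C-6C-6F
    End Sub
End Module
[/code]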
June 18, 2005, 10:56 PM
K
I had this problem myself.  I worked around it by using Encoding.Default which, at least on my system, appears to be ANSI encoding, although it probably varies by system.
June 18, 2005, 11:14 PM
kamakazie
Probably best to use Encoding.UTF8, at least for Battle.net. Why would you be inserting high-ASCII characters into a string anyway?
June 19, 2005, 5:53 PM
TehUser
Because it's not for Battle.net.
June 20, 2005, 2:29 AM
kamakazie
[quote author=TehUser link=topic=11889.msg116588#msg116588 date=1119234579]
Because it's not for Battle.net.
[/quote]

The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net, though). I'm just wondering what requires you to put high-ASCII characters into a string. A byte array seems more useful, considering that is what you end up with anyway. I asked because you said something about sending 0xFF and the high bit being lost. What I was trying to get at is that using a String as a buffer is a bad idea, and it's probably best to use an array of bytes.
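
A rough sketch of what a byte buffer could look like in practice. The packet layout here (one marker byte, the text, a zero terminator) and the BuildPacket name are hypothetical and only for illustration; this is not the Battle.net format or anyone's actual protocol.

[code]
Imports System
Imports System.IO
Imports System.Text

Module ByteBufferSketch
    ' Hypothetical layout: one marker byte, then the text, then a zero terminator.
    Function BuildPacket(ByVal marker As Byte, ByVal text As String) As Byte()
        Dim textBytes() As Byte = Encoding.UTF8.GetBytes(text)
        Using ms As New MemoryStream()
            ms.WriteByte(marker)                        ' raw byte, never passes through a String
            ms.Write(textBytes, 0, textBytes.Length)    ' only real text goes through an Encoding
            ms.WriteByte(0)                             ' terminator
            Return ms.ToArray()
        End Using
    End Function

    Sub Main()
        Dim packet() As Byte = BuildPacket(&HFF, "hi")
        Console.WriteLine(BitConverter.ToString(packet))   ' FF-68-69-00
    End Sub
End Module
[/code]

The binary parts of the message are written as bytes directly, so no encoding ever gets a chance to replace them.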
June 20, 2005, 5:00 AM
JoeTheOdd
0xFF is part of the Battle.net protocol header.
June 24, 2005, 12:47 AM
Myndfyr
[quote author=0x4A6F655B7838365D link=topic=11889.msg117114#msg117114 date=1119574071]
0xFF is part of the Battle.net protocol header.
[/quote]
Yes, but that's irrelevant because using a string as a buffer is unintelligent.
June 24, 2005, 2:07 AM
dRAgoN
[code]    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = CByte(Asc(Mid(inBuf, intI + 1)))
        Next
        Return bytOut
    End Function

    Private Function EncodeByteArrayToString(ByVal inBuf As Byte(), ByVal numByts As Integer) As String
        Dim bytOut As String
        Dim intI As Integer

        For intI = 0 To (numByts - 1)
            bytOut += Chr(Val(inBuf(intI)))
        Next
        Return bytOut
    End Function[/code]
You could consider doing something like this instead of using that junk MS class; that class will only give you bytes for character values between 0 and 127.
July 10, 2005, 10:49 PM
Myndfyr
No, that *method* of that *instance* of that class will only give you values under 128.  Using a proper instance, you could do it the right way.

[quote author=dxoigmn link=topic=11889.msg116603#msg116603 date=1119243606]
[quote author=TehUser link=topic=11889.msg116588#msg116588 date=1119234579]
Because it's not for Battle.net.
[/quote]

The question wasn't assuming you're doing something for Battle.net (the previous comment was just a general thought about Battle.net, though). I'm just wondering what requires you to put high-ASCII characters into a string. A byte array seems more useful, considering that is what you end up with anyway. I asked because you said something about sending 0xFF and the high bit being lost. What I was trying to get at is that using a String as a buffer is a bad idea, and it's probably best to use an array of bytes.
[/quote]
I ignored this question before because I wasn't sure what you meant.  However, if you're using Unicode characters, you can have characters over 127 that would be lost using Encoding.ASCII.  That's why you should be using Encoding.Unicode, or Encoding.UTF8, or the proper encoding based on the character set in use.
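
As a small illustration of how the choice of encoding changes the bytes, here is a sketch that encodes the same four-character string three ways. The sample word and module name are arbitrary.

[code]
Imports System
Imports System.Text

Module EncodingComparison
    Sub Main()
        ' "caf" plus U+00E9 (e with acute accent), which is above the ASCII range.
        Dim text As String = "caf" & ChrW(&HE9)

        ' ASCII: the accented character is replaced by "?" (0x3F).
        Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(text)))
        ' 63-61-66-3F

        ' UTF-8: the accented character takes two bytes.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(text)))
        ' 63-61-66-C3-A9

        ' Encoding.Unicode is UTF-16 little-endian: two bytes per character here.
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(text)))
        ' 63-00-61-00-66-00-E9-00
    End Sub
End Module
[/code]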
July 10, 2005, 11:58 PM
kamakazie
[quote author=MyndFyre link=topic=11889.msg120066#msg120066 date=1121039935]
No, that *method* of that *instance* of that class will only give you values under 128.  Using a proper instance, you could do it the right way.
[/quote]

Another thing to add: letting the framework do the conversion is smarter, since there are so many things you can mess up with all the different encodings out there. Kind of off-topic, but the same can be said for crypto.
July 11, 2005, 1:52 AM
dRAgoN
[code]Imports System

    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0)
        Next
        Return bytOut
    End Function[/code]
Another solution, which is probably better than the one I posted above.
There was no need to edit the post above since it works too, but it's not thinking in the .NET way (I guess that's a better way of saying it, heh).
July 14, 2005, 9:07 PM
Myndfyr
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255.  Unicode characters over 255 will be lost.

3.) Why would you do this at all?  System.Text.Encoding provides built-in managed support for all text encodings natively supported by Windows.  Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want.

Stop trying to fight it!
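
To make point 2 concrete, here is a minimal sketch using a single character. The module name is made up, and the byte order shown assumes a little-endian machine, which is what BitConverter uses on typical x86 hardware.

[code]
Imports System

Module LowByteOnly
    Sub Main()
        Dim c As Char = ChrW(&H3053)   ' a Japanese hiragana character, U+3053

        ' BitConverter.GetBytes(Char) returns both bytes of the 16-bit value.
        Dim both() As Byte = BitConverter.GetBytes(c)
        Console.WriteLine(both.Length)                   ' 2
        Console.WriteLine(BitConverter.ToString(both))   ' 53-30 on a little-endian machine

        ' Keeping only element 0 throws the high byte (0x30) away.
        Dim lowOnly As Byte = both(0)
        Console.WriteLine(ChrW(lowOnly))                 ' "S" on a little-endian machine
    End Sub
End Module
[/code]

The original character cannot be recovered from the single remaining byte.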
July 14, 2005, 10:37 PM
dRAgoN
[quote author=MyndFyre link=topic=11889.msg120742#msg120742 date=1121380630]
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.

2.) By only taking the first byte of a two-byte character value (as BitConverter.GetBytes(char) returns a 2-byte array), you're losing any value over 255.  Unicode characters over 255 will be lost.

3.) Why would you do this at all?  System.Text.Encoding provides built-in managed support for all text encodings natively supported by Windows.  Simply calling Encoding.UTF8.GetBytes, Encoding.Default.GetBytes, or other similar methods will give you what you want.

Stop trying to fight it!
[/quote]
Enlighten me on how what I did is "wrong" again, since I have passed every byte through this as a string, from char 0x00 to char 0xFF, all successfully. The same goes for the other function above the one I just posted.
July 14, 2005, 11:53 PM
kamakazie
[quote author=MyndFyre link=topic=11889.msg120742#msg120742 date=1121380630]
No, I)ragon, you're still wrong, even more so now than before.

1.) .NET Strings are not null-terminated, and their Length properties reflect their actual length.  .NET String objects are serialized in wide pascal format, with two bytes preceding the data specifying the length.  Natively, .NET String objects are stored in Unicode, with each character having two bytes.  You use the .Length property, not .Length - 1.
[/quote]

The .Length - 1 is due to the inane fact (and one of a few annoyances I have with VB) that when you define an array you tell it what the upper index is, not the number of elements in the array. Arrays in VB are zero-based, though!
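
A quick sketch of that behavior, with made-up variable names:

[code]
Imports System

Module ArrayBounds
    Sub Main()
        Dim s As String = "abc"

        ' In VB the number in parentheses is the highest index, not the element count.
        Dim buf(s.Length - 1) As Byte
        Console.WriteLine(buf.Length)             ' 3
        Console.WriteLine(buf.GetUpperBound(0))   ' 2

        ' Declaring with s.Length instead allocates one extra, unused element.
        Dim tooBig(s.Length) As Byte
        Console.WriteLine(tooBig.Length)          ' 4
    End Sub
End Module
[/code]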
July 15, 2005, 12:54 AM
Myndfyr
Ahh.  K, I was wrong about the length thing.

I)ragon, I whipped up this program to demonstrate the differences in output when you have an extended character set in use:
[code]
Imports System.Text

Module Module1
    Const JPText As String = "これはひもである高価値非英国の特性を使用する。"

    Sub Main()
        Console.WriteLine("Output string: {0}", JPText)
        Console.WriteLine("String length: {0}", JPText.Length)

        Dim dragonBytes() As Byte
        dragonBytes = EncodeStringToByteArray(JPText)
        Console.WriteLine("Dragon's method byte array length: {0}", dragonBytes.Length)
        Console.WriteLine("|)ragon's method byte array output:")
        WriteByteArray(dragonBytes)
        Console.WriteLine("I)ragon's method returning to a string: {0}", _
            Encoding.Default.GetString(dragonBytes))
        ' I used Encoding.Default because it supports character values
        ' up to 255.

        Console.WriteLine()

        Dim myndBytes() As Byte
        myndBytes = CorrectEncoding(JPText)
        Console.WriteLine("MyndFyre's method (Unicode) byte array length: {0}", myndBytes.Length)
        Console.WriteLine("MyndFyre's method (Unicode) array output:")
        WriteByteArray(myndBytes)
        Console.WriteLine("MyndFyre's method (Unicode) returning to a string: {0}", _
            Encoding.Unicode.GetString(myndBytes))

        Console.WriteLine()

        myndBytes = CorrectEncoding(JPText, Encoding.UTF8)
        Console.WriteLine("MyndFyre's method (UTF-8) byte array length: {0}", myndBytes.Length)
        Console.WriteLine("MyndFyre's method (UTF-8) array output:")
        WriteByteArray(myndBytes)
        Console.WriteLine("MyndFyre's method (UTF-8) returning to a string: {0}", _
            Encoding.UTF8.GetString(myndBytes))

        Console.Read()
    End Sub

    'I)ragon's function
    Private Function EncodeStringToByteArray(ByVal inBuf As String) As Byte()
        Dim bytOut(inBuf.Length - 1) As Byte
        Dim intI As Integer

        For intI = 0 To (inBuf.Length - 1)
            bytOut(intI) = BitConverter.GetBytes(inBuf.Chars(intI))(0)
        Next
        Return bytOut
    End Function

    'MyndFyre's functions - note increased flexibility
    Private Function CorrectEncoding(ByVal inBuf As String, ByVal encStyle As Encoding) As Byte()
        Return encStyle.GetBytes(inBuf)
    End Function
    Private Function CorrectEncoding(ByVal inBuf As String) As Byte()
        Return CorrectEncoding(inBuf, Encoding.Unicode)
    End Function

    Private Sub WriteByteArray(ByVal buffer() As Byte)
        Dim go As Boolean = True
        Dim start As Integer = 0
        Do Until go = False
            go = WriteByteLine(buffer, start)
            start = start + 16
        Loop
    End Sub

    Private Function WriteByteLine(ByVal buffer() As Byte, _
        ByVal index As Integer) As Boolean

        Dim i As Integer
        Dim res As Boolean = True

        For i = index To index + 15
            If i < buffer.Length Then
                Console.Write("{0:x2} ", buffer(i))
            Else
                Console.Write("   ")
                res = False
            End If

            If i = index + 7 Then
                Console.Write(" ")
            End If
        Next

        Console.Write("  ")

        For i = index To index + 15
            Dim b As Byte
            If i < buffer.Length Then
                b = buffer(i)
            Else
                b = &H20 'space
            End If

            Dim c As Char
            c = ChrW(b)

            If Char.IsLetterOrDigit(c) Or Char.IsPunctuation(c) Or Char.IsSymbol(c) Or c = " " Then
                Console.Write(c.ToString())
            Else
                Console.Write(".")
            End If

            If i = index + 7 Then
                Console.Write(" ")
            End If
        Next
        Console.WriteLine()
        Return res
    End Function
End Module
[/code]
Note that to save this in VB.NET, you have to go to Save As... and click the arrow on the "Save" button in the dialog, and select "Save with Encoding."  For the purposes of this project, I chose Unicode (UTF-8 with Signature).

This is what is output (note that Console programs do not support the extended character set and hence display ??????????? when the Japanese text is displayed):
[pre]
Output string: ???????????????????????
String length: 23
Dragon's method byte array length: 23
|)ragon's method byte array output:
53 8c 6f 72 82 67 42 8b  d8 a1 24 5e f1 fd 6e 79   S.or.gB. O¡$^ñyny
27 92 7f 28 59 8b 02                               '..(Y..
I)ragon's method returning to a string: SOor,gB<O¡$^ñyny''⌂(Y<☻

MyndFyre's method (Unicode) byte array length: 46
MyndFyre's method (Unicode) array output:
53 30 8c 30 6f 30 72 30  82 30 67 30 42 30 8b 30   S0.0o0r0 .0g0B0.0
d8 9a a1 4f 24 50 5e 97  f1 82 fd 56 6e 30 79 72   O.¡O$P^. ñ.yVn0yr
27 60 92 30 7f 4f 28 75  59 30 8b 30 02 30         '`.0.O(u Y0.0.0
MyndFyre's method (Unicode) returning to a string: ???????????????????????

MyndFyre's method (UTF-8) byte array length: 69
MyndFyre's method (UTF-8) array output:
e3 81 93 e3 82 8c e3 81  af e3 81 b2 e3 82 82 e3   a..a..a. _a..a..a
81 a7 e3 81 82 e3 82 8b  e9 ab 98 e4 be a1 e5 80   .§a..a.. é«.ä.¡å.
a4 e9 9d 9e e8 8b b1 e5  9b bd e3 81 ae e7 89 b9   ☼é..è.±å ..a.rç..
e6 80 a7 e3 82 92 e4 bd  bf e7 94 a8 e3 81 99 e3   æ.§a..ä. ¿ç."a..a
82 8b e3 80 82                                     ..a..
MyndFyre's method (UTF-8) returning to a string: ???????????????????????
[/pre]

As you can see, you lost your data when you encoded this string with your method (by the way, this is what happened when I decoded your byte array with ASCII, Unicode, UTF-7, and UTF-8, respectively):
[pre]
I)ragon's method returning to a string: S♀or☻gB♂X!$^q}ny'↕⌂(Y♂☻
I)ragon's method returning to a string: ???????????
I)ragon's method returning to a string: S?or?gB?O¡$^ñyny'?⌂(Y?☻
I)ragon's method returning to a string: SorgB?$^ny'⌂(Y☻
[/pre]

I hope you see why your method would cause problems for internationalization, and agree that it would just be better if we let the professionals who have already done the work for us manage our strings.

(By the way, that Japanese text came from Google where I translated it from: "This is a string that uses higher-value non-English characters.")
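
For anyone who wants a quick check without reading hex dumps, here is a small sketch of a round-trip test. SurvivesRoundTrip is a hypothetical helper, not a framework method; it only compares the decoded text with the original.

[code]
Imports System
Imports System.Text

Module RoundTripCheck
    ' Hypothetical helper: True when the text comes back unchanged after encode + decode.
    Function SurvivesRoundTrip(ByVal text As String, ByVal enc As Encoding) As Boolean
        Dim decoded As String = enc.GetString(enc.GetBytes(text))
        Return String.Equals(decoded, text, StringComparison.Ordinal)
    End Function

    Sub Main()
        ' U+3053 and U+308C: the first two characters of the test string above.
        Dim sample As String = ChrW(&H3053) & ChrW(&H308C)

        Console.WriteLine(SurvivesRoundTrip(sample, Encoding.ASCII))     ' False
        Console.WriteLine(SurvivesRoundTrip(sample, Encoding.UTF8))      ' True
        Console.WriteLine(SurvivesRoundTrip(sample, Encoding.Unicode))   ' True
    End Sub
End Module
[/code]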
July 15, 2005, 2:32 AM
dRAgoN
Well, I was under the impression this was for a server, so let's look at it this way then.

When you're viewing the data received via the socket, what are you going to see?
[pre]take one of your lines here
53 30 8c 30 6f 30 72 30  82 30 67 30 42 30 8b 30  S0.0o0r0 .0g0B0.0
d8 9a a1 4f 24 50 5e 97  f1 82 fd 56 6e 30 79 72  O.¡O$P^. ñ.yVn0yr
27 60 92 30 7f 4f 28 75  59 30 8b 30 02 30        '`.0.O(u Y0.0.0[/pre]
True or false: you're going to have data received in this format.
July 15, 2005, 5:26 PM
