Valhalla Legends Forums Archive | Battle.net Bot Development | [C#] Dynamically compiling a classic CheckRevision formula

AuthorMessageTime
Myndfyr
Wanted to share something that I came up with the other day.

A classic CheckRevision value string looks like this:
[tt]A=166443184 B=361259356 C=197717253 4 A=A-S B=B-C C=C+A A=A+B[/tt]

History has shown us that Crev generally has four "values" - A, B, C, and S, where A-C are state values and S is the current int32 being processed from the file.  My understanding from what people have told me is that the Battle.net issues similar value strings over a variable period, in which the formulae at the end are repeated, and just the state seeds are changed.  Our challenge is to execute the formulae efficiently.  Presently, MBNCSUtil and BN# don't do so well - memory usage is OK, but they are slow because they rely on some voodoo math and arrays (ported from iago's implementation in Java) to do the actual math.

This is generally OK for a normal client, but if I'd ever wanted to implement something like BNLS - where I'd be reusing the same formula for a short period of time - it'd be better to do something more efficient.  Well, I've started down the path to that.

CheckRevision as it is currently implemented in my code is here.  The modified form is here:
[code]
       public static int DoCheckRevision(
           string valueString,
           string[] files,
           int mpqNumber)
       {
           if (valueString == null)
               throw new ArgumentNullException("valueString");
           if (files == null)
               throw new ArgumentNullException("files");
           if (files.Length != 3)
               throw new ArgumentOutOfRangeException("files", files, "Files must be exactly 3 in length.");

           StdCrevImpl impl = FindOrConstructImpl(valueString);

           uint A, B, C, S;

           InitializeValues(valueString, out A, out B, out C);

           A ^= hashcodes[mpqNumber];

           byte[] currentOperandBuffer = new byte[1024];
           for (int i = 0; i < files.Length; i++)
           {
               using (FileStream currentFile = new FileStream(files[i], FileMode.Open, FileAccess.Read, FileShare.Read))
               {
                   while (currentFile.Position < currentFile.Length)
                   {
                       long currentFilePosition = 0;
                       long amountToRead = Math.Min(currentFile.Length - currentFilePosition, 1024);
                       currentFile.Read(currentOperandBuffer, 0, (int)amountToRead);

                       if (amountToRead < 1024)
                       {
                           byte currentPaddingByte = 0xff;
                           for (int j = (int)amountToRead; j < 1024; j++)
                           {
                               unchecked
                               {
                                   currentOperandBuffer[j] = currentPaddingByte--;
                               }
                           }
                       }

                       for (int j = 0; j < 1024; j += 4)
                       {
                           S = BitConverter.ToUInt32(currentOperandBuffer, j);

                           impl(ref A, ref B, ref C, ref S);
                       }
                   }
               }
           }

           return unchecked((int)C);
       }
[/code]

The highlights of differences are:
[list]
[*] It calls "FindOrConstructImpl(valueString)" and then initializes state values A, B, C, and S
[*] Then it calls "InitializeValues(valueString, out A, out B, out C)" instead of running a loop to find the seeds and formulae
[*] Finally, in the inner loop of reading the file, it calls the implementation retrieved before.
[/list]

Right now I've been testing against the sample crev string posted above, and haven't yet written the code to translate the formula into actual non-hardcoded functions.  But it's promising.  I started with this:
[code]
           return (StdCrevImpl)((string valueString, out uint A, out uint B, out uint C) =>
           {
               A = A - S;
               B = B - C;
               C = C + A;
               A = A + B;
           });
[/code]

Then I translated it into the equivalent IL:
[code]
       private static StdCrevImpl FindOrConstructImpl(string valueString)
       {
           if (dynamicMethod != null)
               return dynamicMethod;
           
           // A=166443184 B=361259356 C=197717253 4 A=A-S B=B-C C=C+A A=A+B
           string formula = "A=A-S B=B-C C=C+A A=A+B";
           Type paramTypes = typeof(uint).MakeByRefType();
           DynamicMethod method = new DynamicMethod(formula, typeof(void), new Type[] { paramTypes, paramTypes, paramTypes, paramTypes });
           var gen = method.GetILGenerator();
           
           gen.Emit(OpCodes.Ldarg_0); // A =
           gen.Emit(OpCodes.Ldarg_0); // A
           gen.Emit(OpCodes.Ldind_U4); // *A
           gen.Emit(OpCodes.Ldarg_3); // S
           gen.Emit(OpCodes.Ldind_U4); // *S
           gen.Emit(OpCodes.Sub); // *A - *S
           gen.Emit(OpCodes.Stind_I4); // *A = *A - *S

           gen.Emit(OpCodes.Ldarg_1); // B =
           gen.Emit(OpCodes.Ldarg_1); // B
           gen.Emit(OpCodes.Ldind_U4); // *B
           gen.Emit(OpCodes.Ldarg_2); // C
           gen.Emit(OpCodes.Ldind_U4); // *C
           gen.Emit(OpCodes.Sub); // *B - *C
           gen.Emit(OpCodes.Stind_I4); // *B = *B - *C

           gen.Emit(OpCodes.Ldarg_2); // C
           gen.Emit(OpCodes.Ldarg_2); // C
           gen.Emit(OpCodes.Ldind_U4); // *C
           gen.Emit(OpCodes.Ldarg_0); // A
           gen.Emit(OpCodes.Ldind_U4); // *A
           gen.Emit(OpCodes.Add); // *C + *A
           gen.Emit(OpCodes.Stind_I4);

           gen.Emit(OpCodes.Ldarg_0); // A =
           gen.Emit(OpCodes.Ldarg_0); // A
           gen.Emit(OpCodes.Ldind_U4); // *A
           gen.Emit(OpCodes.Ldarg_1); // B
           gen.Emit(OpCodes.Ldind_U4); // *B
           gen.Emit(OpCodes.Add); // *A + *B
           gen.Emit(OpCodes.Stind_I4);

           gen.Emit(OpCodes.Ret);

           dynamicMethod = method.CreateDelegate(typeof(StdCrevImpl)) as StdCrevImpl;
           return dynamicMethod;
       }
[/code]

The dynamic method can then be stored according to the appropriate formula part of the string.  It executes about four times faster to emit and compile the method at runtime than to execute the loop/array-offset version that currently lives.

Some challenges to overcome:
* Decide whether to use a "standard Crev" with A, B, C, and S; or to be able to account for additional state (such as D, E, F?)
* Decide if and when to use this in a standard library or keep it completely separate.

Still, I thought it was cool - I just learned about dynamic methods, and didn't know they could be completely freed up like normal (non-.NET) programming.  That makes them cheap and easy - always a good combination!
December 3, 2009, 7:55 AM
Imperceptus
I think easy is relative choice of words myndfyre. lol. 
December 3, 2009, 6:37 PM
HdxBmx27
Just a observation form JBLS, in all of the years it has been running, it has NEVER had a Old style value string that was not in the format:
A=x B=y C=z 4 A=A?S B=B?C C=C?A A=A?B
Except when it was empty (null) which is caused by old verbytes :P
It would be cool to implement the ability to support weird formats, but, would it be feasible? Would it be beneficial. Or would it just be a waist of process time?
I plan on adding seed compilation like this when I re-write Warden :P

But none the less, cool.
December 3, 2009, 7:23 PM
Myndfyr
[quote author=Hdx link=topic=18122.msg183806#msg183806 date=1259868210]
Just a observation form JBLS, in all of the years it has been running, it has NEVER had a Old style value string that was not in the format:
A=x B=y C=z 4 A=A?S B=B?C C=C?A A=A?B
Except when it was empty (null) which is caused by old verbytes :P
It would be cool to implement the ability to support weird formats, but, would it be feasible? Would it be beneficial. Or would it just be a waist of process time?
I plan on adding seed compilation like this when I re-write Warden :P

But none the less, cool.
[/quote]
Thanks.

I don't think it would be a terrible waste of processor time.  For instance, today I started working on rewriting the formulas to actually dynamically compile (instead of being statically written like I pasted before).  So I did this, and these dictionaries should be O(1) (or close):
[code]        private static readonly Dictionary<char, OpCode> Operators = new Dictionary<char, OpCode>
        {
            { '+', OpCodes.Add },
            { '-', OpCodes.Sub },
            { '*', OpCodes.Mul },
            { '/', OpCodes.Div_Un },
            { '^', OpCodes.Xor },
            { '|', OpCodes.Or },
            { '&', OpCodes.And }
        };

        private static Dictionary<char, OpCode> StdArgumentList = new Dictionary<char, OpCode>
        {
            { 'A', OpCodes.Ldarg_0 },
            { 'B', OpCodes.Ldarg_1 },
            { 'C', OpCodes.Ldarg_2 },
            { 'S', OpCodes.Ldarg_3 }
        };[/code]

That allowed me to use a more simple method on a per-formula basis:
[code]
        private static void CompileStandardFormula(ILGenerator generator, string formula)
        {
            char dest = formula[0];
            char opA = formula[2];
            char opB = formula[4];
            char op = formula[3];

            if (!StdArgumentList.ContainsKey(dest) || !StdArgumentList.ContainsKey(opA) || !StdArgumentList.ContainsKey(opB)
                || !Operators.ContainsKey(op))
                throw new InvalidRevisionCheckException();

            generator.Emit(StdArgumentList[dest]);
            generator.Emit(StdArgumentList[opA]);
            generator.Emit(OpCodes.Ldind_U4);
            generator.Emit(StdArgumentList[opB]);
            generator.Emit(OpCodes.Ldind_U4);
            generator.Emit(Operators[op]);
            generator.Emit(OpCodes.Stind_I4);
        }
[/code]

So, all told supporting things like divide, xor, and, and or are practically free.  However, the detection mechanism for whether we're using a standard string or something more exotic (like A=2948819 B=S+A B=3059591 etc) or whatnot would be the cost of running a regex and then handling it properly, and then spinning off the more expensive and deliberate formula compiler.
December 3, 2009, 9:33 PM
HdxBmx27
That sounds like a rather good implementation, but, as i said before the formula for these older crevs are standard.

So you could have it all hard coded, and simply replace the opcode for the operator.
Don't have the code off the top of my head but if its always:
A=A?S B=B?C C=C?A A=A?B
You could have everything but the ?s hard coded, and just code[3] = OpCodes.Sub, etc..
But, I don't know how ILGenerator works, is something like that possible?

You have to use a reg exp or some other method to determine if you need to use a more robust function, so no reason not to assume as much as you can.
I hate not being able to express what i'm saying in non-stupid language :P
December 4, 2009, 1:10 AM
lord2800
I got bored and decided to implement this whole thing, and here's what I came up with:

[code]using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection.Emit;
using System.Reflection;

namespace BNet
{
class CheckRevision
{
private static Dictionary<int, OpCode> Ldargs = new Dictionary<int, OpCode>() {
{0, OpCodes.Ldarg_0}, {1, OpCodes.Ldarg_1}, {2, OpCodes.Ldarg_2}, {3, OpCodes.Ldarg_3}
};

private static Dictionary<char, OpCode> Operators = new Dictionary<char, OpCode>() {
{'+', OpCodes.Add}, {'-', OpCodes.Sub}, {'*', OpCodes.Mul}, {'/', OpCodes.Div},
{'|', OpCodes.Or},  {'&', OpCodes.And}, {'^', OpCodes.Xor}
};

private delegate void FileHasher(ref uint a, ref uint b, ref uint c, ref uint s, byte[] f);

private static uint[] hashes = new uint[] {0xE7F4CB62, 0xF6A14FFC, 0xAA5504AF, 0x871FCDC2, 0x11BF6A18, 0xC57292E6, 0x7927D27E, 0x2FEC8733};

public static uint ComputeHash(string formula, string mpqFile, FileStream gameExe, FileStream bnclientDll, FileStream d2clientDll)
{
byte[] game = File.ReadAllBytes(gameExe.Name);
byte[] bnclient = File.ReadAllBytes(bnclientDll.Name);
byte[] d2client = File.ReadAllBytes(d2clientDll.Name);

int mpq = Convert.ToInt32(mpqFile[mpqFile.LastIndexOf('.') - 1].ToString(), 10);
uint[] values = new uint[4];
IEnumerable<FormulaOp> ops = BuildFormula(formula, ref values);

// UGLY HACK: use the hardcoded mpq seed
// TODO: figure out how to get the real mpq seed
values[0] ^= hashes[mpq];

FileHasher HashFile = BuildFileHasher(ops);

HashFile(ref values[0], ref values[1], ref values[2], ref values[3], game);
HashFile(ref values[0], ref values[1], ref values[2], ref values[3], bnclient);
HashFile(ref values[0], ref values[1], ref values[2], ref values[3], d2client);

return values[2];
}

private static FileHasher BuildFileHasher(IEnumerable<FormulaOp> ops)
{
Type uint_t = typeof(uint).MakeByRefType();
DynamicMethod method = new DynamicMethod("HashFile", typeof(void),
new Type[] { uint_t, uint_t, uint_t, uint_t, typeof(byte[]) });
MethodInfo touint32 = typeof(BitConverter).GetMethod("ToUInt32");

ILGenerator gen = method.GetILGenerator();

Label start = gen.DefineLabel();
LocalBuilder index = gen.DeclareLocal(typeof(int));
LocalBuilder len = gen.DeclareLocal(typeof(int));

// initialize the loop counter
gen.Emit(OpCodes.Ldc_I4_0);
gen.Emit(OpCodes.Stloc, index);

// load the length of the array into a local
gen.Emit(OpCodes.Ldarg, (short)4);
gen.Emit(OpCodes.Ldlen);
gen.Emit(OpCodes.Conv_I4);
gen.Emit(OpCodes.Stloc, len);

// start of loop across the file
gen.MarkLabel(start);

// load the value of arg4 at index into the address of arg3
gen.Emit(OpCodes.Ldarg_3);
gen.Emit(OpCodes.Ldarg_S, (byte)4);
gen.Emit(OpCodes.Ldloc, index);
gen.EmitCall(OpCodes.Call, touint32, null);
gen.Emit(OpCodes.Stind_I4);

// for each op in the formula...
foreach(var op in ops)
{
// load the result address
gen.Emit(Ldargs[op.Result]);

// load the first value
gen.Emit(Ldargs[op.Variable1]);
gen.Emit(OpCodes.Ldind_U4);

// load the second value
gen.Emit(Ldargs[op.Variable2]);
gen.Emit(OpCodes.Ldind_U4);

// execute the operator
gen.Emit(Operators[op.Operation]);

// store the result in the result address
gen.Emit(OpCodes.Stind_I4);
}

// increment the loop counter
gen.Emit(OpCodes.Ldloc, index);
gen.Emit(OpCodes.Ldc_I4_4);
gen.Emit(OpCodes.Add);
gen.Emit(OpCodes.Stloc, index);

// jump back to the top of the label if the loop counter is less arg4's length
gen.Emit(OpCodes.Ldloc, index);
gen.Emit(OpCodes.Ldloc, len);
gen.Emit(OpCodes.Blt, start);
gen.Emit(OpCodes.Ret);

FileHasher del = (FileHasher)method.CreateDelegate(typeof(FileHasher));
return del;
}

private static IEnumerable<FormulaOp> BuildFormula(string formula, ref uint[] values)
{
List<FormulaOp> ops = new List<FormulaOp>();
string[] tokens = formula.Split(' ');
foreach(string token in tokens)
{
string[] param = token.Split('=');

if(param.Length == 1)
continue;

int res = WhichVariable(param[0][0]);
if(char.IsDigit(param[1][0]))
values[res] = Convert.ToUInt32(param[1]);
else
{
string method = param[1];
ops.Add(new FormulaOp(method[1], res, WhichVariable(method[0]), WhichVariable(method[2])));
}
}
return ops;
}
private static int WhichVariable(char param)
{
int res = (param) - 'A';
if(res > 2) res = 3;
return res;
}

private class FormulaOp
{
public int Variable1 { get; private set; }
public int Variable2 { get; private set; }
public int Result { get; private set; }
public char Operation { get; private set; }
public FormulaOp(char op, int result, int variable1, int variable2)
{
Result = result; Variable1 = variable1; Variable2 = variable2; Operation = op;
}
}
}
}[/code]

So how fast is it? With a cold cache, it executes in ~40ms on average. With a warm cache, however, that number drops to a near constant 16ms. I'm pretty sure that's the lower bound for System.Diagnostics.Stopwatch, so it's probably a bit faster than that even. And I could speed this up by p/invoking MapViewOfFile and friends, I just don't think it's worth the effort at this point. The code necessary to wrap the native MapViewOfFile functions is out there, however. I could probably do better by building the formula a bit cleaner too, and by not passing a FileStream and passing the filename directly instead (less property accesses and whatnot), but meh. Also note that this code is slightly more direct and slightly smaller than what the compiler will produce if you were to write the formula in code as a plain C# function. It uses a few more ops and another label to have the condition test at the bottom of the loop and uses a different branch (brtrue instead of blt). Since I know the file will never be null at the start, I can simply go directly into the loop.

Also: If you look closely, you can see bits of MBNCSUtil's CheckRevision formula builder in there. I used MBNCSUtil as a reference when implementing my own formula builder, but then ended up rewriting a good portion of the code anyway, since my initial attempt was fairly broken.
May 6, 2010, 5:31 AM

Search