Valhalla Legends Forums Archive | Advanced Programming | CRC checking executable's running code

Author | Message | Time
tA-Kane
Suppose I write a program which needs to be able to check itself for any unauthorized modifications made to it (whether in the executable's file or added after it's been launched). With the obvious problems aside (self-modifying code, storing data within executable space, etc), I'll need to be able to get the program to find the boundaries of its own executable memory space and CRC check it. Correct me if I'm wrong, but any modifications to the executable file would *most likely* also show up in the program's executable code (and if not, then there are other safeguards against data section tampering), would they not?

So with that in mind, how might I go about getting (eg, what API calls) the program's executable code memory boundaries? Are there any things to consider when accessing such memory without actually executing it? This is, of course, assuming that the CRC code will be within those boundaries and will of course include itself in the checksumming process.

Another thing to think about is using this in connection with a server verification scheme ... a modified executable could always send the correct checksum, either by modifying the code that sends the checksum to the server or by modifying the checksum algorithm to simply return the correct value immediately. What would be a feasible method of adding some randomness to the code so that the checksum is almost never a static value? Perhaps have the server send a random value, or maybe some code to inject at various places within the CRC algorithm, which would alter the result without making the algorithm unstable?
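
The "server sends a random value" idea can be sketched roughly as below. This is only a minimal, portable illustration: the names crc32_feed and salted_crc32 are made up for this note, the nonce is assumed to be a 32-bit value chosen by the server per request, and the code buffer stands in for whatever memory range ends up being checksummed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Feed bytes into a running CRC-32 register (reflected polynomial 0xEDB88320).
// 'reg' is the raw register value, not a finalized checksum.
static std::uint32_t crc32_feed(std::uint32_t reg, const unsigned char *data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i) {
        reg ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            reg = (reg >> 1) ^ ((reg & 1u) ? 0xEDB88320u : 0u);
    }
    return reg;
}

// Plain CRC-32 of a buffer: the value is static for a given buffer.
std::uint32_t crc32(const unsigned char *data, std::size_t len)
{
    return crc32_feed(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}

// Hypothetical challenge-response variant: the server's nonce is fed in
// (little-endian) before the code bytes, so the expected answer changes
// with every challenge and cannot simply be hard-coded.
std::uint32_t salted_crc32(std::uint32_t nonce, const unsigned char *code, std::size_t len)
{
    const unsigned char salt[4] = {
        static_cast<unsigned char>(nonce & 0xFFu),
        static_cast<unsigned char>((nonce >> 8) & 0xFFu),
        static_cast<unsigned char>((nonce >> 16) & 0xFFu),
        static_cast<unsigned char>((nonce >> 24) & 0xFFu),
    };
    std::uint32_t reg = crc32_feed(0xFFFFFFFFu, salt, sizeof salt);
    reg = crc32_feed(reg, code, len);
    return reg ^ 0xFFFFFFFFu;
}
```

The server would need its own pristine copy of the code bytes to compute the expected answer for each nonce. A modified client that keeps an untouched copy of the original bytes around could still answer correctly, so this only raises the bar.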
July 25, 2005, 4:29 AM
tA-Kane
For anyone interested, I seem to have created a seemingly-working function to do what I need:

[code]
void CheckExecutables(unsigned char Sum[16])
{
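SIZE_T Length;
char *Current;
MEMORY_BASIC_INFORMATION Info;
MD5_CTX MD5;
HANDLE Instance;
SYSTEM_INFO sInfo; // used by the loop below
Instance = GetCurrentProcess(); // must use GetCurrentProcess() ... the one from WinMain() doesn't have sufficient access privs

if (!Sum)
return;

GetSystemInfo(&sInfo); // fills in the application address range and the page size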

MD5Init(&MD5);
for (Current = (char *)sInfo.lpMinimumApplicationAddress; Current < sInfo.lpMaximumApplicationAddress; )
{
Length = VirtualQueryEx(Instance, Current, &Info, sizeof(Info));
if (Length)
{
// wasn't a kernel-mode memory address
if ((Info.State & MEM_COMMIT) && !(Info.State & MEM_RESERVE))
{
// is an accessible allocated region
if (Info.Protect & (PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY))
{
// is an executable region
// let's MD5 it!
// RegionSize is measured from BaseAddress, not from AllocationBase
MD5Update(&MD5, (unsigned char *)Info.BaseAddress, (unsigned int)Info.RegionSize);
}
}
Current += Info.RegionSize;
}
else
{
Current += sInfo.dwPageSize;
}
}
MD5Final(Sum, &MD5);
}
[/code]

Would anyone care to critique?
July 27, 2005, 7:27 PM
Adron
You'll be checksumming all the loaded dlls, as well as possibly some data, depending on the architecture. Your checksum will give different results from time to time.
July 27, 2005, 10:45 PM
tA-Kane
Yes, I've noticed. I've found *this* code to be more reliable:

[code]
void CheckExecutables(unsigned char Sum[16])
{
SIZE_T Length;
char *Current;
MEMORY_BASIC_INFORMATION Info;
MD5_CTX MD5;
HANDLE Instance;
SYSTEM_INFO sInfo;
Instance = GetCurrentProcess(); // must use GetCurrentProcess() ... the one from WinMain() doesn't have sufficient access privs

GetSystemInfo(&sInfo);
if (!Sum) // nowhere to store the result
return;

MD5Init(&MD5);
for (Current = (char *)sInfo.lpMinimumApplicationAddress; Current < sInfo.lpMaximumApplicationAddress && Current < (char *)0x40000000; )
{
Length = VirtualQueryEx(Instance, Current, &Info, sizeof(Info));
if (Length)
{
// wasn't a kernel-mode memory address
if ((Info.State & MEM_COMMIT) && !(Info.State & MEM_RESERVE))
{
// is an accessible allocated region
if (Info.Protect & (PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY) && (Info.Type & MEM_IMAGE))
{
// is an executable region
// let's MD5 it!
// RegionSize is measured from BaseAddress, not from AllocationBase
MD5Update(&MD5, (unsigned char *)Info.BaseAddress, (unsigned int)Info.RegionSize);
}
}
Current += Info.RegionSize;
}
else
{
Current += sInfo.dwPageSize;
}
}
MD5Final(Sum, &MD5);
}
[/code]

Note that the two differences are the check to make sure the memory is below 0x40000000 (the 2GB limit, where application memory and system DLL memory is differentiated, if I'm not mistaken), and the check to make sure that the region is of type MEM_IMAGE. While I doubt that checking only MEM_IMAGE-type regions will cover all code regions, it does seem to reduce how often the sum changes during normal operation. Loading more DLLs seems to change the sum, which is partly what I want. I still need to verify that the sum changes if the "user" alters runtime code, but I'm confident that it will, since the code should reside in a MEM_IMAGE-type region, should it not? At least, it has in the sample project I've whipped up as well as the two other projects I temporarily added this to.

Of course, the checksum would change if the user loads a different (perhaps older or newer) version of user DLLs (eg, ones loaded via LoadLibrary() and such), correct?

Edit:
I do have one other odd question though; do you know if the MEMORY_BASIC_INFORMATION.Type property's possible values (MEM_IMAGE, MEM_MAPPED, MEM_PRIVATE) are mutually exclusive? How about the .State property (same question)?
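
As far as I can tell from the Windows SDK documentation, State and Type are each described as holding exactly one of their listed values, so they can be compared with == rather than tested as bit masks. A minimal sketch of such a filter follows. The constants are copied from winnt.h so that it compiles anywhere, while RegionInfo and IsExecutableImageRegion are made-up names standing in for the three MEMORY_BASIC_INFORMATION fields used above.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from winnt.h so the sketch does not need <windows.h>.
constexpr std::uint32_t kMemCommit  = 0x00001000; // MEM_COMMIT
constexpr std::uint32_t kMemReserve = 0x00002000; // MEM_RESERVE
constexpr std::uint32_t kMemFree    = 0x00010000; // MEM_FREE
constexpr std::uint32_t kMemPrivate = 0x00020000; // MEM_PRIVATE
constexpr std::uint32_t kMemMapped  = 0x00040000; // MEM_MAPPED
constexpr std::uint32_t kMemImage   = 0x01000000; // MEM_IMAGE
// PAGE_EXECUTE | PAGE_EXECUTE_READ | PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY
constexpr std::uint32_t kExecMask   = 0x10 | 0x20 | 0x40 | 0x80;

// Hypothetical stand-in for the fields of MEMORY_BASIC_INFORMATION used here.
struct RegionInfo {
    std::uint32_t State;
    std::uint32_t Type;
    std::uint32_t Protect;
};

// True for committed, image-backed regions with any execute protection.
// State and Type are compared for equality; Protect is the only field
// treated as a set of flags.
bool IsExecutableImageRegion(const RegionInfo &r)
{
    return r.State == kMemCommit
        && r.Type == kMemImage
        && (r.Protect & kExecMask) != 0;
}
```

With equality tests, the extra !(Info.State & MEM_RESERVE) check becomes unnecessary.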
July 28, 2005, 7:14 AM
tA-Kane
I have been working with another user to create an algorithm which is less prone to checksumming data sections. I've been able to come up with this code, with his assistance:
[code]
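// forward declaration; the definition follows CheckExecutables2()
DWORD FillModuleFilenames(HANDLE Process, char **ModuleFilenames, HMODULE *Modules, DWORD ModuleCount);

bool CheckExecutables2(unsigned char Sum[16])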
{
// need to enumerate the loaded libraries
// then, will need to sort the libraries' names alphabetically to ensure that they are always added to the checksum in the same order
// note that you cannot sort libraries by full path, because path could change if the dll was loaded from one place instead of another place
// checksum the modules, in sorted order... since the executable is returned within the modules, it will also be checksummed  :)
HANDLE Process;
BOOL Success;
DWORD Length, ModuleCount, Result, i, LastLoc, SectionCount;
HMODULE *UnsortedModules = NULL;
HMODULE *Modules = NULL; // modules, sorted
PIMAGE_DOS_HEADER pDosHeader;
PIMAGE_NT_HEADERS pNTHeader;
PIMAGE_SECTION_HEADER pSectionHeader;
char **ModuleFilenames = NULL; // pointers to module filenames, unsorted
char Current[MAX_PATH], Last[MAX_PATH]; // pointers to filenames (eg, within the pathnames) for sorting
bool rVal;
MD5_CTX MD5;

Process = GetCurrentProcess();

Success = EnumProcessModules(Process, NULL, 0, &Length);
if (!Success)
return false; // unable to get module count... ewwwwwwwwww!!!
enummodules:
ModuleCount = Length / sizeof(HMODULE);
try {
UnsortedModules = new HMODULE[ModuleCount];
} catch (std::bad_alloc) {
UnsortedModules = NULL;
} if (!UnsortedModules) {
return false; // unable to allocate HMODULE array
}

Success = EnumProcessModules(Process, UnsortedModules, ModuleCount*sizeof(HMODULE), &Length);
if (!Success)
{
// unable to enumerate modules! eww!!
rVal = false;
goto cleanup;
}
if (Length != (ModuleCount * sizeof(HMODULE)))
{
// oh VERY funny... loaded or unloaded a module after getting module count... BAH!
delete[] UnsortedModules;
goto enummodules; // try again
}
try {
ModuleFilenames = new char*[ModuleCount];
} catch (std::bad_alloc) {
ModuleFilenames = NULL;
} if (!ModuleFilenames) {
rVal = false;
goto cleanup;
}
memset(ModuleFilenames, 0, ModuleCount*sizeof(char*)); // each element is a char*
Result = FillModuleFilenames(Process, ModuleFilenames, UnsortedModules, ModuleCount);
if (Result == (DWORD)-1) {
// bad param... eww
rVal = false;
goto cleanup;
}

try {
Modules = new HMODULE[ModuleCount];
} catch (std::bad_alloc) {
Modules = NULL;
} if (!Modules) {
// ugh...
rVal = false;
goto cleanup;
}
// module handles are allocated, module paths are allocated and retrieved, now need to sort plzkthx
for (Result = 0; Result < ModuleCount; Result++) {
for (i = Last[0] = 0, LastLoc = (DWORD)-1; i < ModuleCount; i++) {
if (!ModuleFilenames[i])
continue;
strcpy(Current, ModuleFilenames[i]);
PathStripPath(Current);
if (strcasecmp(Current, Last) > 0) {
strcpy(Last, Current);
LastLoc = i;
}
}
if (LastLoc != (DWORD)-1) {
Modules[Result] = UnsortedModules[LastLoc];
delete[] ModuleFilenames[LastLoc]; // need to delete and set to NULL to make sure that we don't check it again
ModuleFilenames[LastLoc] = NULL;
} else {
Modules[Result] = NULL; // set to invalid
}
}
// now need to checksum the modules' code sections
MD5Init(&MD5);
for (i = 0; i < ModuleCount; i++) {
if (!Modules[i]) // no filename? bleh... gonna have to skip it... should possibly return error status
continue;
pDosHeader = (PIMAGE_DOS_HEADER)Modules[i];
pNTHeader = (PIMAGE_NT_HEADERS)(pDosHeader->e_lfanew + (char *)pDosHeader);
SectionCount = pNTHeader->FileHeader.NumberOfSections;
pSectionHeader = IMAGE_FIRST_SECTION(pNTHeader);
for (Length = 0; Length < SectionCount; Length++, pSectionHeader++) {
if (pSectionHeader->Characteristics & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE))
// byte-pointer arithmetic instead of a DWORD cast, which would truncate a 64-bit module base
MD5Update(&MD5, (unsigned char *)Modules[i] + pSectionHeader->VirtualAddress, pSectionHeader->Misc.VirtualSize);
}
}
MD5Final(Sum, &MD5);

rVal = true;
cleanup:
if (UnsortedModules)
delete[] UnsortedModules;
if (ModuleFilenames) {
for (Length = 0; Length < ModuleCount; Length++)
if (ModuleFilenames[Length])
delete[] ModuleFilenames[Length];
delete[] ModuleFilenames;
}
if (Modules)
delete[] Modules;
return rVal;
}

DWORD FillModuleFilenames(HANDLE Process, char **ModuleFilenames, HMODULE *Modules, DWORD ModuleCount)
{
if (!ModuleFilenames || !Modules)
return (DWORD)-1;

DWORD i, bad, Result;
for (i = bad = 0; i < ModuleCount; i++) {
if (ModuleFilenames[i])
delete[] ModuleFilenames[i];
try {
ModuleFilenames[i] = new char[MAX_PATH];
} catch (std::bad_alloc) {
ModuleFilenames[i] = NULL;
} if (!ModuleFilenames[i]) {
// eww, unable to allocate!
bad++;
continue;
}
Result = GetModuleFileNameEx(Process, Modules[i], ModuleFilenames[i], MAX_PATH);
if (!Result) {
delete[] ModuleFilenames[i];
ModuleFilenames[i] = NULL;
bad++;
}
}
i -= bad; // bad never exceeds i, and a DWORD cannot go negative, so no further check is needed
return i;
}
[/code]

Quite a fair bit longer in both source code and execution time. It definitely needs to be cleaned up, especially in the area of sorting the module names. It does provide a different sum than the previous algorithm, but that's to be expected, since the principles are slightly different... the previous algorithm would checksum *all* executable memory address ranges, while this one should only checksum the memory address ranges loaded from the file (eg, if the program allocates another memory range and copies executable code to that memory range, then that memory range will not be checksummed). With this in mind, however, the previous checksumming function could (and would) mistake some purely data sections for executable sections, would it not?
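
On the sorting cleanup: one possible shape, sketched with std::sort instead of the repeated scan above. This is only an illustration. BaseNameLower and SortModulesByName are made-up names, the module paths are plain std::string values rather than the char buffers used above, and in real use each path would have to stay paired with its HMODULE.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Return the part of a path after the last '\\' or '/', lower-cased,
// so "C:\\WINDOWS\\system32\\USER32.dll" becomes "user32.dll".
static std::string BaseNameLower(const std::string &path)
{
    const std::size_t slash = path.find_last_of("\\/");
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Sort full module paths by file name only, ignoring case, so the order
// does not depend on the directory a module happened to be loaded from.
void SortModulesByName(std::vector<std::string> &paths)
{
    const auto byName = [](const std::string &a, const std::string &b) {
        return BaseNameLower(a) < BaseNameLower(b);
    };
    std::sort(paths.begin(), paths.end(), byName);
    assert(std::is_sorted(paths.begin(), paths.end(), byName));
}
```

Any fixed order works for the checksum as long as it is the same on every run; ascending is used here only because it is what std::sort gives by default.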

Edit:
While the checksum appears to stay the same for the duration of a program's single run, it does not appear to be the same between runs, so either there's a bug in here somewhere or this function is *not* what I need.  :(

Edit2:
I seem to have fixed the problem I was having. It seems that I was adding the incorrect base address for the start of each memory region to be checksummed, as well as only checksumming the first section, whether or not it was executable, instead of checksumming all executable sections. Now the checksum is the same between instances of the program.  :)

I think this will do nicely.
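
For reference, the section walk described in Edit2 (visit every section header and keep only the executable ones, with addresses relative to the module base) can be sketched without the Windows headers. This is an illustration under assumptions: FindCodeSections and CodeRange are made-up names, the offsets are those of the documented PE/COFF layout, and the image is taken to be a plain byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One executable section: offset from the module base, and size in memory.
struct CodeRange {
    std::uint32_t rva;
    std::uint32_t size;
};

constexpr std::uint32_t kScnCntCode    = 0x00000020u; // IMAGE_SCN_CNT_CODE
constexpr std::uint32_t kScnMemExecute = 0x20000000u; // IMAGE_SCN_MEM_EXECUTE

static std::uint16_t ReadU16(const unsigned char *p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t ReadU32(const unsigned char *p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Walk the PE headers in 'image' and collect every section flagged as code
// or executable. Returns an empty list if the headers do not look valid.
std::vector<CodeRange> FindCodeSections(const unsigned char *image, std::size_t len)
{
    std::vector<CodeRange> out;
    assert(image != nullptr || len == 0);
    if (len < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return out;
    const std::size_t pe = ReadU32(image + 0x3C);            // e_lfanew
    if (pe > len || len - pe < 24 || ReadU32(image + pe) != 0x00004550u) // "PE\0\0"
        return out;
    const std::size_t count   = ReadU16(image + pe + 6);     // NumberOfSections
    const std::size_t optSize = ReadU16(image + pe + 20);    // SizeOfOptionalHeader
    const std::size_t table   = pe + 24 + optSize;           // first section header
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t hdr = table + i * 40;              // 40 bytes per header
        if (hdr > len || len - hdr < 40)
            break;                                           // truncated table
        const std::uint32_t flags = ReadU32(image + hdr + 36);   // Characteristics
        if (flags & (kScnCntCode | kScnMemExecute))
            out.push_back({ ReadU32(image + hdr + 12),       // VirtualAddress
                            ReadU32(image + hdr + 8) });     // VirtualSize
    }
    return out;
}
```

Each returned rva would then be added to the module's base address before hashing size bytes.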
July 28, 2005, 12:08 PM
Adron
What exactly are you trying to accomplish?

This might work sometimes to verify that it's the same user running the exe, and that no dlls have been changed. It will require updates often.

The sum will change every month on the 15th when Windows Update posts new dlls. It will change every time the user installs an application with global hooks, say "Comet Cursor" (loads a dll into the address space of every gui application). And on systems with multiple dlls with the same base address, it will give a different checksum every time the program is run.


July 28, 2005, 3:31 PM
tA-Kane
The algorithm must checksum all code related to the application, eg, code derived from the application's source code, as well as any specified DLLs' codes (which the second algorithm I provided should be able to do with some more tweaks), and of course, to help prevent tampering of the checksum-enabled (or -disabled) DLLs, checksum that list as well.

The checksum must be the same in all cases (eg, from day one to year million, theoretically), assuming that the code has not been tampered with nor upgraded to a newer version.

It'd be helpful if the algorithm checksummed the static data as well (strings, etc).

Would there be any way to prevent "Comet Cursor" from being installed to the application? If not, then how could you expect the algorithm to know the difference between user-friendly "Comet Cursor" and a similar-by-design hacker-friendly program/utility? In either case, it's probably best not to include such things in the checksum, to keep the overall feel of the application the same as the overall feel of the system.


Edit:
oops, was still writing this and accidentally pushed post... oh well
July 28, 2005, 7:12 PM
Adron
Well, you are expecting to have one checksum for each user of your application, not the same checksum for everyone?

Particularly, as soon as you include Windows' dlls in your checksum, you have to expect the checksum to change monthly.

And no, there's no easy way to know the difference between a hacker's dll and some other dll installed by a mouse or joystick or toolbar or similar.

And no, checking each dll in your address space won't find all code that has been injected.
July 28, 2005, 9:59 PM
Myndfyr
[quote author=tA-Kane link=topic=12327.msg122243#msg122243 date=1122534857]
Note that the two differences are the check to make sure the memory is below 0x40000000 (the 2GB limit, where application memory and system DLL memory is differentiated, if I'm not mistaken),
[/quote]
I'm not really qualified to respond to the rest of your discussion, but I wanted to point out that 2gb is located from 0x00000000 to 0x7fffffff.  Memory beyond 2gb is 0x80000000 to 0xffffffff.  ;)
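
The arithmetic, spelled out as compile-time checks (a small sketch; kGiB and BelowTwoGigabytes are names made up for this note):

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGiB = 1ull << 30;

static_assert(1 * kGiB == 0x40000000ull, "0x40000000 is the 1 GB mark");
static_assert(2 * kGiB == 0x80000000ull, "0x80000000 is the 2 GB mark");
static_assert(4 * kGiB - 1 == 0xFFFFFFFFull, "0xFFFFFFFF is the top of a 32-bit address space");

// True for addresses in the lower 2 GB (0x00000000 through 0x7FFFFFFF).
constexpr bool BelowTwoGigabytes(std::uint64_t address)
{
    return address < 2 * kGiB;
}
```

So a loop bounded by 0x40000000 stops at 1 GB, not 2 GB.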
July 28, 2005, 10:10 PM
tA-Kane
The checksum needs to be the same for everyone. And like I said, it wouldn't be hard to add a check in the second algorithm to only checksum my own DLLs, instead of system DLLs.

You are right, MyndFyre, now that I think about it. But the thing is, whenever I look at the memory map of a program in a debugger, the system DLLs are located from 0x40000000 to 0x7FFFFFFF. I just saw 0x4xxxxxxx and figured 2-4GB, without doing calculations.  :P

In any case, I think the second algorithm will work better because it can better identify the module name associated with a given memory region, and thus, better filter out the system DLLs.
July 29, 2005, 11:17 AM
Adron
If the checksum needs to be the same for everyone, then yes, you can only checksum your own dlls. You should also be careful only to checksum your actual code. You wouldn't for example want to checksum the import addresses.
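
Leaving a range such as the import addresses out of the sum could look roughly like this. It is a sketch only: HashExcluding is a made-up name, FNV-1a is used as a short stand-in for the MD5 calls earlier in the thread, and the offset and length of the range to skip are assumed to be known already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// FNV-1a (64-bit) over data[0..len), skipping the bytes in
// [skipOffset, skipOffset + skipLen). Bytes the loader rewrites at run
// time (for example import addresses) can be left out this way.
std::uint64_t HashExcluding(const unsigned char *data, std::size_t len,
                            std::size_t skipOffset, std::size_t skipLen)
{
    assert(data != nullptr || len == 0);
    std::uint64_t h = 14695981039346656037ull;      // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        if (i >= skipOffset && i - skipOffset < skipLen)
            continue;                               // inside the excluded range
        h ^= data[i];
        h *= 1099511628211ull;                      // FNV prime
    }
    return h;
}
```

Two images that differ only inside the skipped range then hash to the same value, while a change anywhere else still shows up.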

And now given that you don't checksum the system dlls, it would be even easier to load evil_hacker.dll into your process ;)
July 29, 2005, 6:02 PM
tA-Kane
Of course. But without guaranteeing that everyone patches *all* system DLLs the moment they're available (or, by the time that the next update is available), then it'd be an endless cycle with nearly endless possible "valid" checksums.
July 29, 2005, 6:22 PM
Adron
Yes, you have virtually endless possible checksums. Fun, eh? ;)
July 29, 2005, 6:25 PM
TheMinistered
I guess I'll try to actually be helpful to you.  A crc check can be used to detect modifications to the executables, but is rather weak because it's easily defeated via patching your cmp.  I suggest looking into more complex protections such as symmetric code encryption/decryption; this scheme can further be used to provide "leak protection", as the key can include specific computer specs.
September 16, 2005, 7:15 PM
tA-Kane
A CRC check is also quite a bit simpler than inline function encryption/decryption, in my opinion. If you know of a rather simple (and free) method of doing such with varying start and end encrypted regions and varying degrees of encryption levels (eg, whether using different keys, different length keys, or even different algorithms  for different regions) and still maintains a decent runtime speed to the end-user on very old machines, then please be my guest and point me in the right direction. Searching google for this kind of information would take a lot of time and effort in the best case, especially since I doubt what I specifically want already exists and is free.

On a side note... if I were ever to get my own domain, these kinds of things would be an interesting addition to an advanced-level programming section I could create. I should get off my ass and make one.
September 17, 2005, 4:25 AM
Kp
[quote author=tA-Kane link=topic=12327.msg128436#msg128436 date=1126931142]A CRC check is also quite a bit simpler than inline function encryption/decryption, in my opinion. If you know of a rather simple (and free) method of doing such with varying start and end encrypted regions and varying degrees of encryption levels (eg, whether using different keys, different length keys, or even different algorithms  for different regions) and still maintains a decent runtime speed to the end-user on very old machines, then please be my guest and point me in the right direction. Searching google for this kind of information would take a lot of time and effort in the best case, especially since I doubt what I specifically want already exists and is free.[/quote]

Skywing wrote quite a bit of code along this line several years ago.  If you can find him and get his OK for it, I can make it available to you.  He may want to keep some/all of it secret though, since at least parts of it ended up in BinaryChat's anti-leak design.
September 17, 2005, 9:25 PM
rabbit
[quote author=Kp link=topic=12327.msg128525#msg128525 date=1126992319]If you can find him
[/quote]HAHAHAHAHAHAHA!!  Yeah, right :P
September 17, 2005, 10:05 PM
UserLoser.
[quote author=Kp link=topic=12327.msg128525#msg128525 date=1126992319]
[quote author=tA-Kane link=topic=12327.msg128436#msg128436 date=1126931142]A CRC check is also quite a bit simpler than inline function encryption/decryption, in my opinion. If you know of a rather simple (and free) method of doing such with varying start and end encrypted regions and varying degrees of encryption levels (eg, whether using different keys, different length keys, or even different algorithms  for different regions) and still maintains a decent runtime speed to the end-user on very old machines, then please be my guest and point me in the right direction. Searching google for this kind of information would take a lot of time and effort in the best case, especially since I doubt what I specifically want already exists and is free.[/quote]

Skywing wrote quite a bit of code along this line several years ago.  If you can find him and get his OK for it, I can make it available to you.  He may want to keep some/all of it secret though, since at least parts of it ended up in BinaryChat's anti-leak design.
[/quote]

Which BinaryChat?  Because I know some people who cracked some old binary ZeroBot and BinaryChat 2.00 :P
September 17, 2005, 10:27 PM
TheMinistered
An older version of binary chat was defeated by zorm, mainly.  I helped a little but wasn't much interested-- if you want a similar version of skywing's protection then i'll post it in a couple days, I just need to find the cd.
September 20, 2005, 5:08 PM
tA-Kane
I'm interested ... but if it's derived from someone else's source code (even, especially, via disassembling a private executable), I don't think I should accept it without authorization from the original author.
September 20, 2005, 5:18 PM
Arta
What is your goal in these endeavours?
September 20, 2005, 5:27 PM
rabbit
70 /\/\4|<3 1337 |-|4><, |)|_||-|!
September 21, 2005, 1:55 AM
tA-Kane
[quote author=Arta[vL] link=topic=12327.msg128787#msg128787 date=1127237245]
What is your goal in these endeavours?
[/quote]1) Learn (advanced programming techniques are an interesting read for me)
2) Protect (help protect the program I'm working on against unauthorized use and/or deviations from its intended use)
3) Assist (others could benefit from the knowledge gained here)
September 21, 2005, 8:21 AM
Arta
Well, 1 & 3 are great, but 2 is a waste of time :)

Nonetheless: I don't see the value in a CRC. Once you've generated it, it'll come down to "if(CRC != what I'm expecting) quit" - which is trivially bypassed. Strong encryption of your code is the only way to go.
September 21, 2005, 10:22 AM
tA-Kane
Why quit? That's so obvious. I was thinking more along the lines of if (CRC != what I'm expecting), introduce specific bugs into the program.
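
One way to do that without any comparison at all, as a rough sketch: store a value the program needs only in masked form, and unmask it with whatever checksum comes out at run time. MaskConstant and UnmaskConstant are hypothetical names, and the masking step is assumed to happen at build time with the known-good checksum.

```cpp
#include <cassert>
#include <cstdint>

// Build-time step (hypothetical): hide a constant the program depends on
// by XOR-ing it with the checksum of the unmodified code.
constexpr std::uint32_t MaskConstant(std::uint32_t value, std::uint32_t goodChecksum)
{
    return value ^ goodChecksum;
}

// Run-time step: recover the constant from the checksum actually computed.
// There is no "if (checksum != expected)" to patch out; a wrong checksum
// simply yields a wrong constant, and the program misbehaves later on.
constexpr std::uint32_t UnmaskConstant(std::uint32_t masked, std::uint32_t actualChecksum)
{
    return masked ^ actualChecksum;
}
```

Someone who recovers the correct checksum once can still patch the constant back in, so this only removes the obvious compare.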
September 21, 2005, 12:39 PM
Arta
Well, ok, but whatever you do won't be very hard to crack, is my point. Strong encryption, on the other hand, could be.
September 21, 2005, 4:08 PM
KkBlazekK
Could you explain a method of Strong Encryption Arta?  I can't seem to think of any way but what you just said sucked (I agree, btw).
September 22, 2005, 10:39 PM
rabbit
CRC code and then UPX?  Not sure if it will work, never tried it.
September 23, 2005, 12:10 AM
Arta
Encrypt your exe and decrypt it, or sections of it, at runtime, a la BC. Note that this isn't uncrackable either, but it's harder to break.
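
The encrypt-then-decrypt-at-runtime shape, reduced to a toy. This is emphatically not strong encryption: a repeating-key XOR stands in for a real cipher purely to show the symmetric round trip, the buffer stands in for a code section, and XorRegion and the key string are made up for this note. It is not BC's method.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Symmetric in-place transform: applying it twice with the same key
// restores the original bytes. The key could be derived from machine
// details, as suggested earlier in the thread.
void XorRegion(std::vector<unsigned char> &region, const std::string &key)
{
    assert(!key.empty());
    for (std::size_t i = 0; i < region.size(); ++i)
        region[i] ^= static_cast<unsigned char>(key[i % key.size()]);
}
```

The build would apply the transform once to the stored section, and the program would apply it again just before the section is needed.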
September 23, 2005, 12:16 AM
UserLoser.
[quote author=rabbit link=topic=12327.msg128992#msg128992 date=1127434250]
CRC code and then UPX?  Not sure if it will work, never tried it.
[/quote]

UPX? What's the point of using public compression when you can just decompress it via "upx.exe -d yourbot.exe"

[quote author=Arta[vL] link=topic=12327.msg128993#msg128993 date=1127434564]
Encrypt your exe and decrypt it, or sections of it, at runtime, a la BC. Note that this isn't uncrackable either, but it's harder to break.
[/quote]

Yeah, but BC is also scrambled at the specific sections and must be descrambled before decrypting which makes it even more complicated.  But yeah, a method like this is really good and could take a while to crack
September 23, 2005, 1:23 AM
rabbit
[quote author=UserLoser link=topic=12327.msg129000#msg129000 date=1127438612]
[quote author=rabbit link=topic=12327.msg128992#msg128992 date=1127434250]
CRC code and then UPX?  Not sure if it will work, never tried it.
[/quote]

UPX? What's the point of using public compression when you can just decompress it via "upx.exe -d yourbot.exe"
[/quote]To annoy people who don't realize it's UPX'd, of course!
September 23, 2005, 2:33 AM
UserLoser.
[quote author=rabbit link=topic=12327.msg129032#msg129032 date=1127442785]
[quote author=UserLoser link=topic=12327.msg129000#msg129000 date=1127438612]
[quote author=rabbit link=topic=12327.msg128992#msg128992 date=1127434250]
CRC code and then UPX?  Not sure if it will work, never tried it.
[/quote]

UPX? What's the point of using public compression when you can just decompress it via "upx.exe -d yourbot.exe"
[/quote]To annoy people who don't realize it's UPX'd, of course!
[/quote]

Well if they were to look at it in a disassembler, they'd see that the .text and .data sections would probably be named "upx1"
September 23, 2005, 2:50 AM
