Advanced Programming | Pattern recognition


Author

Message

Time

Skywing

I'm interested in implementing some kind of automated pattern recognition to defeat certain types of spam attacks. For instance, it would be advantageous if clients could recognize that large numbers of users with randomized alphanumeric names joining the channel within a short period of time is suspicious and most likely a spam attack.

Any suggestions on how to detect commonality or patterns between names in these kinds of situations?
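
The "large numbers of users joining within a short period" part of the question can be checked cheaply before any name analysis is done. The following is only a sketch under assumptions: the class name `JoinBurstDetector`, the threshold `max_joins`, and the window length `window_seconds` are hypothetical and do not come from this thread.

```python
from collections import deque


class JoinBurstDetector:
    """Flags a burst when too many joins land inside a sliding time window."""

    def __init__(self, max_joins=10, window_seconds=5.0):
        # Both defaults are illustrative guesses, not tuned values.
        self.max_joins = max_joins
        self.window_seconds = window_seconds
        self._times = deque()

    def record_join(self, timestamp):
        """Record one join time (in seconds); return True if the window is over the limit."""
        self._times.append(timestamp)
        # Drop joins that have fallen out of the window.
        while self._times and timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) > self.max_joins
```

Each join is appended once and removed at most once, so the per-join cost stays small even during a flood. Name comparison would then only need to run while `record_join` is returning True.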

November 3, 2003, 11:24 PM

Etheran

start off by looking for numbers within the name.
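
As a rough illustration of that first step, the share of digits in a name can be computed directly. The function name `digit_fraction` is made up for this sketch, and any cutoff applied to its result would be an assumption to tune.

```python
def digit_fraction(name):
    """Return the fraction of characters in the name that are digits (0.0 for an empty name)."""
    if not name:
        return 0.0
    return sum(ch.isdigit() for ch in name) / len(name)
```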

November 4, 2003, 1:05 AM

Eibro

I remember coming across some interesting C code a few weeks back. It was a password integrity routine; it would *almost always* return false for pure English-ish words (for instance, Eibro is not a real word, but it probably wouldn't pass the test).
I'll be damned if I can find it now, though I remember it speaking of some sort of 'triple rule' for all words in the English language. I googled for about 5 minutes, but I can't seem to come up with it again.

This could be used, or modified to suit your purposes. If users joining/leaving in a short period of time pass this test, they're most likely randomized. I'm pretty sure the function output degrees of integrity, which would be much more useful than just true/false.
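
The routine described above was never located, so the following is only a guess at the general idea, assuming the 'triple rule' refers to three-letter sequences (trigrams). It scores a name by how many of its trigrams also occur in a sample of English words. The sample list and all function names here are hypothetical, and a real model would need a far larger word list.

```python
# Tiny illustrative word sample; a usable model would be built from a real dictionary.
SAMPLE_WORDS = [
    "interesting", "programming", "pattern", "recognition",
    "password", "integrity", "english", "language",
]


def trigrams(word):
    """Return the set of three-character sequences in a lowercased word."""
    w = word.lower()
    return {w[i:i + 3] for i in range(len(w) - 2)}


def build_model(words):
    """Collect every trigram seen in the sample words."""
    model = set()
    for w in words:
        model |= trigrams(w)
    return model


def wordlikeness(name, model):
    """Return the fraction of the name's trigrams that appear in the model (0.0 to 1.0)."""
    grams = trigrams(name)
    if not grams:
        return 0.0
    return len(grams & model) / len(grams)
```

This returns a degree rather than true/false, which matches the point made above about graded output being more useful. Low scores across many recent joiners would suggest randomized names.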

November 4, 2003, 3:08 AM

CupHead

Although it may be a little complex for this situation, a basic neural net would work well for this. It'd have to be "trained" to recognize name entropy, but with a sufficient number of examples, it wouldn't be a problem. Anyway, probably too complex a solution for what you need, but just a suggestion.
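
For reference, the "name entropy" mentioned here can also be measured directly, without a neural net. The sketch below is that simpler swapped-in technique: it computes the Shannon entropy of the characters in a single name. The function name is an assumption, and per-name entropy on strings this short is a crude signal that would likely need to be combined with other features.

```python
import math
from collections import Counter


def name_entropy(name):
    """Return the Shannon entropy, in bits per character, of the characters in the name."""
    if not name:
        return 0.0
    total = len(name)
    counts = Counter(name)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```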

November 4, 2003, 2:25 PM

St0rm.iD

CBR (case-based reasoning) would be a great way to attack this, I think.

There's a great article in JavaPro about it (search www.javapro.com for cbr). Essentially, it involves isolating X number of criteria for each result/search and placing them on an X-dimensional coordinate system. Plot a few results (examples: mail from a friend, sex spam, Viagra spam, Nigerian spam, security announcement, etc.) on the graph, and for each incoming mail, plot the mail on the graph and see which result it is closest to. Then you can decide whether or not it is spam.
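
Applied to usernames instead of mail, the "plot it and see which result is closest" step amounts to a nearest-neighbor lookup. This is a minimal sketch under assumptions: the two criteria (name length and digit share) and the function names are hypothetical choices for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math


def features(name, max_len=15):
    """Map a name to a point: (length scaled to 0..1, fraction of digits)."""
    length = min(len(name), max_len) / max_len
    digits = sum(ch.isdigit() for ch in name) / len(name) if name else 0.0
    return (length, digits)


def classify(name, cases):
    """Return the label of the stored case closest to the name in feature space.

    `cases` is a non-empty list of (example_name, label) pairs.
    """
    point = features(name)
    nearest = min(cases, key=lambda case: math.dist(point, features(case[0])))
    return nearest[1]
```

For example, with the cases `[("Skywing", "normal"), ("834756102938475", "bot")]`, a 15-digit name lands on the "bot" case and an ordinary name lands nearer the "normal" case. More criteria simply add more dimensions to the tuple.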

November 5, 2003, 1:14 AM

Tuberload

After today's example of a mass-loading bot, I think a start to automatically banning people based on patterns in their names would be to have some sort of learning mode that learns each time the channel is massed. For example, right now all the bots in Op [vL] are random numeric names 15 characters in length. Have the bot save those parameters and from now on ban anyone with a username that matches the parameters. The bot could be programmed to recognize new parameters during these attacks, and ban accordingly.
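
A learning mode along these lines could be sketched as follows, assuming the saved parameters are just the name length and a coarse character class. The function names and the four class labels are hypothetical.

```python
def char_class(name):
    """Classify a name as 'numeric', 'alpha', 'alnum', or 'other'."""
    if name.isdigit():
        return "numeric"
    if name.isalpha():
        return "alpha"
    if name.isalnum():
        return "alnum"
    return "other"


def learn_profile(attack_names):
    """Return (length, char_class) if every attacking name shares both, else None."""
    lengths = {len(n) for n in attack_names}
    classes = {char_class(n) for n in attack_names}
    if len(lengths) == 1 and len(classes) == 1:
        return (lengths.pop(), classes.pop())
    return None


def matches(name, profile):
    """Return True if the name fits a previously learned profile."""
    return profile is not None and (len(name), char_class(name)) == profile
```

With the attack described above, `learn_profile` would yield `(15, "numeric")`, and later joiners can be tested with `matches`. A profile this narrow will miss bots that vary their name length, so it is only a starting point.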

November 7, 2003, 6:35 AM

St0rm.iD

That's definitely a neural net.

November 8, 2003, 1:28 AM

Skywing

Keep in mind that the solution should not be so processor-intensive that floodbot attacks will effectively DoS the client out of usability.

November 8, 2003, 1:34 AM

Tuberload

[quote author=St0rm.iD link=board=23;threadid=3399;start=0#msg28012 date=1068254904]
That's definitely a neural net.
[/quote]

Yes and no, I am thinking we could make it a little simpler.

November 8, 2003, 5:27 AM

CupHead

With the neural net, you just have to train it to an adequate point; after that, any code for adjusting weights can go. You just need the algorithm and the post-training weights stored in a file in order to do the actual check.
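
To illustrate how little code the check itself needs once training is done, here is a sketch that uses a single logistic unit rather than a full network. The JSON layout with `weights` and `bias` keys and both function names are assumptions made for this example, and no trained weights are given in this thread.

```python
import json
import math


def load_weights(text):
    """Parse stored parameters from JSON text shaped like {"weights": [...], "bias": 0.0}."""
    params = json.loads(text)
    return params["weights"], params["bias"]


def spam_score(features, weights, bias):
    """Return a 0..1 score: the logistic function of the weighted feature sum."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # Branch on the sign of z so math.exp never overflows for large magnitudes.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The check is one multiply-add per feature plus one exponential, which also fits the requirement that it not be processor-intensive.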

November 8, 2003, 5:43 AM

Tuberload

When a flood/spam attack happens, recognize the pattern used by the individual. This will more than likely match a pattern used by the particular flood bot. Then create a profile, and when any suspicious activity occurs that fits a profile, take the necessary actions specific to your channel's needs. If a flood/spam attack takes place that does not fit a specific profile, you could just have your bot in a learning mode trying to figure it out, but not allowing itself to crash or flood. You could also consolidate profiles between members of your botnet to add to the learning process. While this may not prevent every occurrence, it will stop a large percentage of people who use other people's tools, and keep those persistent in flooding battle.net channels on their toes. Profiles could contain all characteristics of the bots used in the attack, from their usernames, to their client type, to what they spam.

This is my idea for solving this problem, so feel free to build upon this or share your own. I think this is a problem many people could learn from.

November 8, 2003, 9:53 AM

indulgence

Goal:
   To determine a viable means for determining commonality among strings.

Concept:
   Trim all numbers from a name. Then generate a duality table, which holds a running pair sequence for the name as shown below (it should contain 14 WORDs, one for each running 2-byte sequence). The two bytes of each entry are summed, and each sum is then compared against the entry in the same position for another name, with the difference stored. The total difference from the pairs is the commonality seed. The higher the number, the greater the uniqueness. A threshold can be set, and you can opt to only do this check against unknown/unflagged users. This should eliminate nearly every bot attack. When you trim the numbers, you eliminate the ability of bots to create a higher commonality seed through the differentials created by differences in numbers. To get a better result, you could lowercase or uppercase the string prior to this check as well.

Here are some usernames, their 15-byte hex equivalents, and the duality table generation results:
[code]
Name:   indulgence
HexStr:   69 6E 64 75 6C 67 65 6E 63 65 00 00 00 00 00

Duality Table   results:
69 6E         -> 00D7
6E 64         -> 00D2
64 75         -> 00D9
75 6C         -> 00E1
6C 67         -> 00D3
67 65         -> 00CC
65 6E         -> 00D3
6E 63         -> 00D1
63 65         -> 00C8
65 00         -> 0065
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000

Name:   ecnegludni
HexStr:   65 63 6E 65 67 6C 75 64 6E 69 00 00 00 00 00

Duality Table   results:
65 63         -> 00C8
63 6E         -> 00D1
6E 65         -> 00D3
65 67         -> 00CC
67 6C         -> 00D3
6C 75         -> 00E1
75 64         -> 00D9
64 6E         -> 00D2
6E 69         -> 00D7
69 00         -> 0069
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000

Name:   thuscelackpiss
HexStr:   74 68 75 73 63 65 6C 61 63 6B 70 69 73 73 00

Duality Table   results:
74 68         -> 00DC
68 75         -> 00DD
75 73         -> 00E8
73 63         -> 00D6
63 65         -> 00C8
65 6C         -> 00D1
6C 61         -> 00CD
61 63         -> 00C4
63 6B         -> 00CE
6B 70         -> 00DB
70 69         -> 00D9
69 73         -> 00DC
73 73         -> 00E6
73 00         -> 0073

Name:   hismajesty.
HexStr:   6D 61 6A 65 73 74 79 2E 00 00 00 00 00 00 00

Duality Table   results:
6D 61         -> 00CE
61 6A         -> 00CB
6A 65         -> 00CF
65 73         -> 00D8
73 74         -> 00E7
74 79         -> 00ED
79 2E         -> 00A7
2E 00         -> 002E
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000
00 00         -> 0000

Results of comparison for: hismajesty. (vs) thuscelackpiss

thuscelackpiss hismajesty.    Results
-------------- -----------    -------
00DC            00CE         -> 000E
00DD            00CB         -> 0012
00E8            00CF         -> 0019
00D6            00D8         -> 0002
00C8            00E7         -> 001F
00D1            00ED         -> 001C
00CD            00A7         -> 0026
00C4            002E         -> 0096
00CE            0000         -> 00CE
00DB            0000         -> 00DB
00D9            0000         -> 00D9
00DC            0000         -> 00DC
00E6            0000         -> 00E6
0073            0000         -> 0073
             -------
    Commonality Seed   -> 05E9

Results of comparison for: indulgence (vs) ecnegludni

indulgence   ecnegludni    Results
----------   ----------    -------
00D7            00C8         -> 000F
00D2            00D1         -> 0001
00D9            00D3         -> 0006
00E1            00CC         -> 0015
00D3            00D3         -> 0000
00CC            00E1         -> 0015
00D3            00D9         -> 0006
00D1            00D2         -> 0001
00C8            00D7         -> 000F
0065            0069         -> 0004
0000            0000         -> 0000
0000            0000         -> 0000
0000            0000         -> 0000
0000            0000         -> 0000
             -------
    Commonality Seed   -> 005A

[/code]
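
The tables above can be reproduced with a few lines of code. This is a sketch of the scheme as described in this post, not the author's own implementation: the function names are made up, and padding names to 15 bytes with zeros is inferred from the hex dumps.

```python
def duality_table(name, width=15):
    """Return the 14 running pair sums for a name with its digits trimmed.

    The name is zero-padded (or cut) to `width` bytes, matching the hex dumps above.
    """
    stripped = "".join(ch for ch in name if not ch.isdigit())
    data = stripped.encode("ascii", "replace")[:width].ljust(width, b"\x00")
    return [data[i] + data[i + 1] for i in range(width - 1)]


def commonality_seed(a, b):
    """Sum the absolute differences between same-position entries of two tables."""
    table_a = duality_table(a)
    table_b = duality_table(b)
    return sum(abs(x - y) for x, y in zip(table_a, table_b))


if __name__ == "__main__":
    # Matches the second comparison above: prints 005A.
    print(f"{commonality_seed('indulgence', 'ecnegludni'):04X}")
```

A lower seed means the two names are more alike. Comparing each new joiner against the last few joiners and counting seeds under a threshold is one way to use it, with the threshold chosen by experiment.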

November 17, 2003, 8:09 PM

warz

interesting.

November 18, 2003, 1:52 PM

Brolly

Also, you could use a letter-commonality table, but that would take more time.
EASTICL...WYZX

November 20, 2003, 2:27 AM

thetempest

very very nice =) indulgence, very nice

December 3, 2003, 7:38 PM

Valhalla Legends Forums Archive | Advanced Programming | Pattern recognition