Author | Message | Time |
---|---|---|
Skywing | I'm interested in implementing some kind of automated pattern recognition to defeat certain types of spam attacks. For instance, it would be advantageous if clients could recognize that large numbers of users with randomized alphanumeric names joining the channel within a short period of time is suspicious and most likely a spam attack. Any suggestions on how to detect commonality or patterns between names in these kinds of situations? | November 3, 2003, 11:24 PM |
Etheran | start off by looking for numbers within the name. | November 4, 2003, 1:05 AM |
Eibro | I remember coming across some interesting C code a few weeks back. It was a password integrity routine that would *almost always* return false for pure English-ish words (for instance, Eibro is not a real word, but it probably wouldn't pass the test). I'll be damned if I can find it now, though I remember it speaking of some sort of 'triple rule' for all words in the English language. I googled for about 5 minutes, but I can't seem to come up with it again. This could be used, or modified, to suit your purposes. If users joining/leaving in a short period of time pass this test, they're most likely randomized. I'm pretty sure the function output degrees of integrity, which would be much more useful than just true/false. | November 4, 2003, 3:08 AM |
CupHead | Although it may be a little complex for this situation, a basic neural net would work well for this. It'd have to be "trained" to recognize name entropy, but with a sufficient number of examples, it wouldn't be a problem. Anyway, probably too complex a solution for what you need, but just a suggestion. | November 4, 2003, 2:25 PM |
St0rm.iD | CBR (case-based reasoning) would be a great way to attack this, I think. There's a great article in JavaPro about it (search www.javapro.com for cbr). Essentially, it involves isolating X criteria for each result/search and placing them on an X-dimensional coordinate system. Plot a few results (examples: mail from a friend, sex spam, Viagra spam, Nigerian spam, security announcement, etc.) on the graph, and for each incoming mail, plot the mail on the graph and see which result it is closest to. Then you can decide whether or not it is spam. | November 5, 2003, 1:14 AM |
Tuberload | After today's example of a mass-loading bot, I think a start to automatically banning people based on patterns in their names would be to have some sort of learning mode that learns each time the channel is massed. For example, right now all the bots in Op [vL] are random numeric names 15 characters in length. Have the bot save those parameters and from now on ban anyone with a username that matches the parameters. The bot could be programmed to recognize new parameters during these attacks, and ban accordingly. | November 7, 2003, 6:35 AM |
St0rm.iD | That's definitely a neural net. | November 8, 2003, 1:28 AM |
Skywing | Keep in mind that the solution should not be so processor-intensive that floodbot attacks will effectively DoS the client out of usability. | November 8, 2003, 1:34 AM |
Tuberload | [quote author=St0rm.iD link=board=23;threadid=3399;start=0#msg28012 date=1068254904] That's definitely a neural net. [/quote] Yes and no; I am thinking we could make it a little simpler. | November 8, 2003, 5:27 AM |
CupHead | With the neural net, you just have to train it to an adequate point; after that, any code for adjusting weights can go. You just need the algorithm and the post-training weights stored in a file in order to do the actual check. | November 8, 2003, 5:43 AM |
Tuberload | When a flood/spam attack happens, recognize the pattern used by the individual. This will more than likely match a pattern used by the particular flood bot. Then create a profile, and when any suspicious activity occurs that fits a profile, take the necessary actions specific to your channel's needs. If a flood/spam attack takes place that does not fit a specific profile, you could just have your bot in a learning mode trying to figure it out, but not allowing itself to crash or flood. You could also consolidate profiles between members of your botnet to add to the learning process. While this may not prevent every occurrence, it will stop a large percentage of people who use other people's tools, and keep those persistent in flooding battle.net channels on their toes. Profiles could contain all characteristics of the bots used in the attack, from their usernames, to their client type, to what they spam. This is my idea for solving this problem, so feel free to build upon it or share your own. I think this is a problem many people could learn from. | November 8, 2003, 9:53 AM |
indulgence | Goal: To determine a viable means of measuring commonality among strings. Concept: Trim all numbers from a name. Then generate a duality table, which creates a running pair sequence for the name as shown below (it should contain 14 WORDs, one for each running 2-byte sequence in the 15-byte name). The two bytes of each pair are summed, each sum is compared against the corresponding entry for another name, and the difference is stored. The total difference across the pairs is the commonality seed. The higher the number, the greater the uniqueness. A threshold can be set, and you can opt to only do this check against unknown/unflagged users. This should eliminate nearly every bot attack. When you trim the numbers, you eliminate the ability of bots to create a higher commonality seed through the differentials that differing numbers would introduce. To get a better result, you could lowercase or uppercase the string prior to this check as well. Here are some usernames, their 15-byte hex equivalents, and the duality table generation results: [code]<br>Name: indulgence<br>HexStr: 69 6E 64 75 6C 67 65 6E 63 65 00 00 00 00 00<br>Duality Table results: 69 6E -> 00D7, 6E 64 -> 00D2, 64 75 -> 00D9, 75 6C -> 00E1, 6C 67 -> 00D3, 67 65 -> 00CC, 65 6E -> 00D3, 6E 63 -> 00D1, 63 65 -> 00C8, 65 00 -> 0065, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000<br><br>Name: ecnegludni<br>HexStr: 65 63 6E 65 67 6C 75 64 6E 69 00 00 00 00 00<br>Duality Table results: 65 63 -> 00C8, 63 6E -> 00D1, 6E 65 -> 00D3, 65 67 -> 00CC, 67 6C -> 00D3, 6C 75 -> 00E1, 75 64 -> 00D9, 64 6E -> 00D2, 6E 69 -> 00D7, 69 00 -> 0069, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000<br><br>Name: thuscelackpiss<br>HexStr: 74 68 75 73 63 65 6C 61 63 6B 70 69 73 73 00<br>Duality Table results: 74 68 -> 00DC, 68 75 -> 00DD, 75 73 -> 00E8, 73 63 -> 00D6, 63 65 -> 00C8, 65 6C -> 00D1, 6C 61 -> 00CD, 61 63 -> 00C4, 63 6B -> 00CE, 6B 70 -> 00DB, 70 69 -> 00D9, 69 73 -> 00DC, 73 73 -> 00E6, 73 00 -> 0073<br><br>Name: hismajesty.<br>HexStr: 6D 61 6A 65 73 74 79 2E 00 00 00 00 00 00 00<br>Duality Table results: 6D 61 -> 00CE, 61 6A -> 00CB, 6A 65 -> 00CF, 65 73 -> 00D8, 73 74 -> 00E7, 74 79 -> 00ED, 79 2E -> 00A7, 2E 00 -> 002E, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000, 00 00 -> 0000<br><br>Results of comparison for: hismajesty. (vs) thuscelackpiss<br>thuscelackpiss hismajesty. Results<br>00DC 00CE -> 000E, 00DD 00CB -> 0012, 00E8 00CF -> 0019, 00D6 00D8 -> 0002, 00C8 00E7 -> 001F, 00D1 00ED -> 001C, 00CD 00A7 -> 0026, 00C4 002E -> 0096, 00CE 0000 -> 00CE, 00DB 0000 -> 00DB, 00D9 0000 -> 00D9, 00DC 0000 -> 00DC, 00E6 0000 -> 00E6, 0073 0000 -> 0073<br>Commonality Seed -> 05E9<br><br>Results of comparison for: indulgence (vs) ecnegludni<br>indulgence ecnegludni Results<br>00D7 00C8 -> 000F, 00D2 00D1 -> 0001, 00D9 00D3 -> 0006, 00E1 00CC -> 0015, 00D3 00D3 -> 0000, 00CC 00E1 -> 0015, 00D3 00D9 -> 0006, 00D1 00D2 -> 0001, 00C8 00D7 -> 000F, 0065 0069 -> 0004, 0000 0000 -> 0000, 0000 0000 -> 0000, 0000 0000 -> 0000, 0000 0000 -> 0000<br>Commonality Seed -> 005A<br>[/code] | November 17, 2003, 8:09 PM |
warz | interesting. | November 18, 2003, 1:52 PM |
Brolly | Also, you could use a letter-commonality table, but that would take more time. EASTICL...WYZX | November 20, 2003, 2:27 AM |
thetempest | very very nice =) indulgence, very nice | December 3, 2003, 7:38 PM |
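
For anyone who wants to experiment with the duality table from indulgence's post, here is a minimal Python sketch of it. It reads the description as: strip digits, pad the name to 15 bytes with zeros, sum each adjacent byte pair, then total the absolute differences between corresponding entries of two names. The function names, the `fold_case` option, and the `similar_pairs` helper with its `threshold` argument are assumptions made for illustration and do not come from the thread.

```python
from itertools import combinations

NAME_BYTES = 15  # buffer width taken from the examples in the post


def duality_table(name, width=NAME_BYTES, fold_case=True):
    """Return the sums of each adjacent byte pair (width - 1 entries)."""
    if fold_case:
        name = name.lower()  # optional normalization suggested in the post
    letters = "".join(ch for ch in name if not ch.isdigit())  # trim numbers
    # Truncate or zero-pad to the fixed width, as in the hex dumps.
    raw = letters.encode("ascii", errors="replace")[:width].ljust(width, b"\x00")
    return [raw[i] + raw[i + 1] for i in range(width - 1)]


def commonality_seed(a, b):
    """Total absolute difference between two tables; lower means more alike."""
    return sum(abs(x - y) for x, y in zip(duality_table(a), duality_table(b)))


def similar_pairs(names, threshold):
    """Hypothetical helper: pairs of names whose seed is below the threshold."""
    return [(a, b) for a, b in combinations(names, 2)
            if commonality_seed(a, b) < threshold]


print(hex(commonality_seed("indulgence", "ecnegludni")))  # 0x5a, as in the post
```

Each comparison is a fixed 14 additions and subtractions, which fits Skywing's requirement that the check stay cheap during a flood, although comparing every recent joiner against every other grows quadratically with the number of names. A workable threshold would have to be tuned against real channel traffic.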