Valhalla Legends Forums Archive | Battle.net Bot Development | Regex for Capturing user details

AuthorMessageTime
Myndfyr
This is written for .NET Regex syntax; I'm not sure what incompatibilities exist between it and other languages, so please bear with me.  It seems to work fine based on my testing with RegexBuddy, but I welcome other test cases.

[code]\A(?<charName>[^*@\s]*?)?\*?(?<accountName>[^*@\s]+)@?(?<gateway>\w+)?\z[/code]

The idea:
Diablo II character name can't include *, @, or whitespace.  Take that as little as possible before a star (if I'm connecting with D2, I always get a star prefixed to the name).  Then, the account name has the same rules (no *, @, or whitespace), and must be at least one character long.  Take that up until the optional @ and optional gateway, which must be at least one word character in length, to the end of the string.
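
As a rough illustration of the behaviour described above, here is a sketch of the same pattern ported to Python's re module. The port is an assumption and not part of the original post: Python spells named groups (?P<name>...) and uses \Z where .NET uses \z. The helper name parse_v1 and the sample names in the note below are hypothetical.

```python
import re

# Port of the .NET pattern: (?<name>...) becomes (?P<name>...), \z becomes \Z.
V1 = re.compile(
    r"\A(?P<charName>[^*@\s]*?)?\*?"
    r"(?P<accountName>[^*@\s]+)"
    r"@?(?P<gateway>\w+)?\Z"
)

def parse_v1(name):
    """Return (charName, accountName, gateway), or None if there is no match."""
    m = V1.match(name)
    if m is None:
        return None
    # Normalise empty or missing groups to None.
    return tuple(m.group(g) or None for g in ("charName", "accountName", "gateway"))
```

With this sketch, parse_v1("Char*Account@USEast") splits into the three parts, parse_v1("*Account") yields only an account name, and a name containing whitespace is rejected.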

It seems to work fine.  I could see it breaking if * or @ was allowed in usernames, but it doesn't seem to be.

Any thoughts?

[Edit]I added the \A anchor to require the beginning of the string as well.
July 19, 2009, 9:12 PM
Myndfyr
I've modified this slightly so that it wouldn't match the @ without a gateway being captured:

[code]\A(?<charName>[^*@\s]*?)?\*?(?<accountName>[^*@\s]+)(?:@(?<gateway>\w+))?\z[/code]

(Previously, "MyndFyre@" would match with MyndFyre going into the <accountName> group, but won't anymore).
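
The difference between the two versions can be checked with a small sketch. Both patterns are ported to Python syntax here ((?P<name>...) and \Z), which is an assumption on top of the .NET originals; the names ORIGINAL, REVISED and matches are hypothetical.

```python
import re

# Python ports of the first and the modified pattern (port is an assumption).
ORIGINAL = re.compile(
    r"\A(?P<charName>[^*@\s]*?)?\*?(?P<accountName>[^*@\s]+)@?(?P<gateway>\w+)?\Z"
)
REVISED = re.compile(
    r"\A(?P<charName>[^*@\s]*?)?\*?(?P<accountName>[^*@\s]+)(?:@(?P<gateway>\w+))?\Z"
)

def matches(pattern, name):
    """True if the pattern accepts the whole name."""
    return pattern.match(name) is not None

# Print, for each sample name, whether each version accepts it.
for name in ("MyndFyre", "MyndFyre@", "MyndFyre@USEast"):
    print(name, matches(ORIGINAL, name), matches(REVISED, name))
```

Only the middle name, with a trailing @ and no gateway, should be treated differently by the two versions.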
July 20, 2009, 12:10 AM
Camel
There are some illys out there that have @ in the account name. Do you handle that case? Or, at least, not crash?

[edit] BNetUserTest.java
July 22, 2009, 7:02 PM
Myndfyr
[quote author=Camel link=topic=18012.msg183116#msg183116 date=1248289341]
There are some illys out there that have @ in the account name. Do you handle that case? Or, at least, not crash?

[edit] BNetUserTest.java
[/quote]

Thank you!  I was not aware that @ could be in the account name. 

That could make for some very difficult identification.
July 22, 2009, 7:57 PM
Sixen
Yeah, I was going to say exactly that... Great example, W@R@USEast.
July 23, 2009, 7:53 AM
xpeh
[quote author=MyndFyre link=topic=18012.msg183095#msg183095 date=1248037940]
This is written for .NET Regex syntax;
[/quote]
.NET has its own regex??

FFFFFFFFFFFFFFFFFFFFFFFFUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
July 23, 2009, 8:10 PM
Myndfyr
[quote author=xpeh link=topic=18012.msg183121#msg183121 date=1248379818]
[quote author=MyndFyre link=topic=18012.msg183095#msg183095 date=1248037940]
This is written for .NET Regex syntax;
[/quote]
.NET has its own regex??

FFFFFFFFFFFFFFFFFFFFFFFFUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[/quote]
It turns out that there are a lot of variations among a number of different regex flavors and I was simply saying that I don't know if there are any distinctions that might need to be made between the one I posted and something else that isn't, say, as feature-rich.  For instance, according to that chart, Java, Python, and Ruby don't support (?n) for explicit capture, and .NET doesn't support possessive quantifiers.
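
To make the flavor differences concrete, here is a small sketch that asks Python's re module whether it accepts a few constructs at all. Python is used only as an example of "another flavor"; the helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    """Report whether Python's re module accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<gateway>\w+)"))        # .NET-style named group
print(compiles(r"(?P<gateway>\w+)"))       # Python-style named group
print(compiles(r"(?n)(\w+)"))              # .NET explicit-capture option
print(compiles(r"(?P<i>\d)|(?P<i>\d)"))    # the same group name used twice
```

Of these four, Python only accepts the (?P<name>...) form, so a pattern written for .NET generally needs some translation before it can be reused elsewhere.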

So, time to get off your high horse.
July 23, 2009, 9:31 PM
Myndfyr
OK, thanks to Camel's comments, I think this will assist in accuracy:

[code]
\A(?:(?<charName>[^*#@\s]+)\*)?(?<accountName>[^#*\s]+?)(?:#(?<instance>\d{1,9}))?(?:@(?<gateway>\w+))?(?:#(?<instance>\d{1,9}))?\z
[/code]

Alternatively, I can use:
[code]\A(?:(?<charName>[^*#@\s]+)\*)?(?<accountName>[^#*\s]+?)(?:#(?<instance>\d{1,9}))?(?:@(?<gateway>USEast|USWest|Asia|Europe|Azeroth|Lordaeron|Kalimdor|Northrend))?(?:#(?<instance>\d{1,9}))?\z[/code]

I've refactored it to use non-capturing groups to make sure that the character name match always consumes the *, the gateway match always consumes the last @, and the instance matching always consumes the hash.  The only possible non-standard support shown here is the instance, which can appear before or after the namespace, which I think is non-standard (testuser@Azeroth#2 would indicate that the user's account name is "testuser@Azeroth", but I doubt that such a user exists).

The second one correctly matches "W@R" as accountName, but the first mistakenly thinks that R is the namespace.  Both correctly match "($@$@$@)". 

I think the best solution would be to use the second for real servers where the namespaces are well-defined and to drop namespace support altogether for non-legit servers.
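
The behaviour described above can be reproduced with a sketch of both patterns ported to Python. The port is an assumption: Python's re rejects repeated group names, so the two "instance" groups are renamed inst1 and inst2, and \z becomes \Z. The names HEAD, TAIL, ANY_GATEWAY, KNOWN_GATEWAY and parse are hypothetical.

```python
import re

# Shared parts of both patterns. inst1/inst2 stand in for the repeated
# "instance" group, which Python's re does not allow.
HEAD = (
    r"\A(?:(?P<charName>[^*#@\s]+)\*)?"
    r"(?P<accountName>[^#*\s]+?)"
    r"(?:#(?P<inst1>\d{1,9}))?"
)
TAIL = r"(?:#(?P<inst2>\d{1,9}))?\Z"
GATEWAYS = "USEast|USWest|Asia|Europe|Azeroth|Lordaeron|Kalimdor|Northrend"

ANY_GATEWAY = re.compile(HEAD + r"(?:@(?P<gateway>\w+))?" + TAIL)
KNOWN_GATEWAY = re.compile(HEAD + r"(?:@(?P<gateway>" + GATEWAYS + r"))?" + TAIL)

def parse(pattern, name):
    """Return a dict of charName, accountName, instance and gateway, or None."""
    m = pattern.match(name)
    if m is None:
        return None
    return {
        "charName": m.group("charName"),
        "accountName": m.group("accountName"),
        # At most one of the two instance positions can be filled in practice.
        "instance": m.group("inst1") or m.group("inst2"),
        "gateway": m.group("gateway"),
    }
```

For example, parse(ANY_GATEWAY, "W@R") reports "R" as the gateway, while parse(KNOWN_GATEWAY, "W@R") keeps "W@R" as the account name.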
July 29, 2009, 3:25 PM
Sixen
[quote author=MyndFyre link=topic=18012.msg183138#msg183138 date=1248881121]
The only possible non-standard support shown here is the instance, which can appear before or after the namespace, which I think is non-standard (testuser@Azeroth#2 would indicate that the user's account name is "testuser@Azeroth", but I doubt that such a user exists).
[/quote]

Maybe I'm reading this incorrectly, but it's possible to see that user. If testuser@Azeroth and testuser@USEast are in the same channel, they will see each other's namespaces. Therefore, if testuser@Azeroth#2 enters, the SC user will see "testuser@Azeroth#2", just as if testuser@USEast#2 would enter, the War3 user would see "testuser@USEast#2".
July 29, 2009, 5:35 PM
Myndfyr
@Sixen: The question to which you're responding might be different than the one I'm trying to address.

Suppose we have the account named "testuser" that exists on both USEast and Azeroth in those namespaces.  In that instance, "testuser" would appear as "testuser@USEast" to the person using Warcraft III, and as "testuser" to himself.  Conversely, the Warcraft user would appear as "testuser@Azeroth" to the person using SC/D2, but "testuser" to himself.

HOWEVER, if a user created an illy named "testuser@Azeroth" using Starcraft, then that user would appear as "testuser@Azeroth@USEast" to someone using Warcraft 3.  In order to see the hash after the @, though, that account must be logged on multiple times so that the hash can be generated by the server ("testuser@Azeroth#2" or "testuser@Azeroth#2@USEast").

The reason that the second regex above is more accurate is that it's VERY unlikely that anyone has an illy with @Azeroth, @USEast, etc. in circulation (I would wager that, if such an account did exist, Blizzard would have killed it by now).  But why would anyone have guessed to create such an account?
July 29, 2009, 7:24 PM
BreW
Wouldn't it be a lot more professional, cleaner, and efficient to write a function for this instead of using a regex?
July 29, 2009, 11:52 PM
Myndfyr
[quote author=brew link=topic=18012.msg183141#msg183141 date=1248911530]
Wouldn't it be a lot more professional, cleaner, and efficient to write a function for this instead of using a regex?
[/quote]
I don't know about "more professional" (which seems fairly subjective), or about cleaner or more efficient, and I'm not saying that this is definitely going to work right for me.  It's a text parsing problem.  Regex is a text parsing solution.  *shrug*
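
For comparison, a hand-written function of the kind suggested might look something like the sketch below. It is written in Python purely for illustration; the names GATEWAYS, split_instance and parse_user are hypothetical, the gateway list is copied from the second regex in this thread, and unlike the regex this version does not validate anything (it never rejects whitespace or empty names).

```python
GATEWAYS = {"USEast", "USWest", "Asia", "Europe",
            "Azeroth", "Lordaeron", "Kalimdor", "Northrend"}

def split_instance(text):
    """Strip a trailing '#digits' suffix; return (rest, instance or None)."""
    head, sep, tail = text.rpartition("#")
    if sep and head and tail.isdigit():
        return head, tail
    return text, None

def parse_user(name):
    """Split a display name into charName, accountName, instance and gateway."""
    char_name, star, rest = name.partition("*")
    if not star:
        # No star at all: the whole string is account data.
        char_name, rest = None, name
    rest, instance = split_instance(rest)        # e.g. user@Azeroth#2
    head, at, tail = rest.rpartition("@")
    gateway = None
    if at and head and tail in GATEWAYS:
        gateway, rest = tail, head
    if instance is None:
        rest, instance = split_instance(rest)    # e.g. user#2@Azeroth
    return {"charName": char_name or None, "accountName": rest,
            "instance": instance, "gateway": gateway}
```

Whether this is cleaner than the regex is a matter of taste; it is roughly the same set of rules, spelled out step by step.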
July 30, 2009, 4:16 AM
Sixen
[quote author=MyndFyre link=topic=18012.msg183140#msg183140 date=1248895495]
@Sixen: The question to which you're responding might be different than the one I'm trying to address.

Suppose we have the account named "testuser" that exists on both USEast and Azeroth in those namespaces.  In that instance, "testuser" would appear as "testuser@USEast" to the person using Warcraft III, and as "testuser" to himself.  Conversely, the Warcraft user would appear as "testuser@Azeroth" to the person using SC/D2, but "testuser" to himself.

HOWEVER, if a user created an illy named "testuser@Azeroth" using Starcraft, then that user would appear as "testuser@Azeroth@USEast" to someone using Warcraft 3.  In order to see the hash after the @, though, that account must be logged on multiple times so that the hash can be generated by the server ("testuser@Azeroth#2" or "testuser@Azeroth#2@USEast").

The reason that the second regex above is more accurate is that it's VERY unlikely that anyone has an illy with @Azeroth, @USEast, etc. in circulation (I would wager that, if such an account did exist, Blizzard would have killed it by now).  But why would anyone have guessed to create such an account?
[/quote]

Ooooh, I understand. Misunderstanding, we were talking about two different things.
July 30, 2009, 8:13 AM
xpeh
brew, you mean to implement your own regex?

It's easier only for noobs who can't handle regex. Regexes are worth it, in spite of being write-only code. The main reason not to use one is speed: if you have a regex in your main loop, replacing it can give a really big boost, depending on the regex.
July 31, 2009, 2:30 AM
Camel
I don't really have time to read the thread in detail right now, but it looks like you're missing at least one gateway: @Blizzard (check out #Blizzard Tech Support on USWest), and are assuming that 'instance' can only be one digit, which is definitely false. A good test case might be W@R@Blizzard#101, which should break on both fronts with your second regex.

Not sure if this helps, but in my bot I force the user to pick one of the *.battle.net named servers, and then use that information to infer the logged in user's gateway, and subsequently validate other users' names.
August 3, 2009, 8:19 PM
Myndfyr
[quote author=Camel link=topic=18012.msg183164#msg183164 date=1249330783]
I don't really have time to read the thread in detail right now, but it looks like you're missing at least one gateway: @Blizzard (check out #Blizzard Tech Support on USWest), and are assuming that 'instance' can only be one digit, which is definitely false. A good test case might be W@R@Blizzard#101, which should break on both fronts with your second regex.

Not sure if this helps, but in my bot I force the user to pick one of the *.battle.net named servers, and then use that information to infer the logged in user's gateway, and subsequently validate other users' names.
[/quote]
With your example, the first regex in the updated post above correctly matches W@R@Blizzard#101.  I used braces to indicate repetition in the instance: \d{1,9} means anywhere from 1 to 9 digits.  The second regex would match W@R@Blizzard as the account name.

But yes, it would probably be better for me to explicitly capture @Blizzard along with the other ones.
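
A quick sketch of what adding the gateway would change, using a Python port of the gateway-list pattern. The port is an assumption: the repeated "instance" group is renamed inst1/inst2 because Python's re rejects duplicate group names, and \z becomes \Z. The names build, POSTED, before and after are hypothetical.

```python
import re

def build(gateways):
    """Build a Python port of the gateway-list pattern for the given names."""
    # inst1/inst2 replace the repeated "instance" group of the .NET original.
    return re.compile(
        r"\A(?:(?P<charName>[^*#@\s]+)\*)?(?P<accountName>[^#*\s]+?)"
        r"(?:#(?P<inst1>\d{1,9}))?"
        r"(?:@(?P<gateway>" + "|".join(map(re.escape, gateways)) + r"))?"
        r"(?:#(?P<inst2>\d{1,9}))?\Z"
    )

# Gateway names exactly as listed in the second regex earlier in the thread.
POSTED = ["USEast", "USWest", "Asia", "Europe",
          "Azeroth", "Lordaeron", "Kalimdor", "Northrend"]

before = build(POSTED).match("W@R@Blizzard#101")
after = build(POSTED + ["Blizzard"]).match("W@R@Blizzard#101")

print(before.group("accountName"), before.group("gateway"), before.group("inst1"))
print(after.group("accountName"), after.group("gateway"), after.group("inst2"))
```

Without Blizzard in the list the whole "W@R@Blizzard" ends up as the account name; with it, the account name, gateway and instance are split apart.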
August 3, 2009, 9:24 PM
