General Programming | Locating An Ip within a HTML Source


Author

Message

Time

Spilled[DW]

Ok, what I'm looking for here is some opinions on how I would find an IP within the HTML source of a site, somewhat like a proxy leecher does. What do you think the most efficient way would be? I'm sure I could do it using the InStr() method, but I'm not sure whether there is a better way to achieve this goal.

August 10, 2007, 8:35 AM

Barabajagal

I'd just search for a section of data with three decimal points, anywhere from four to twelve numeric characters, and nothing else, in the correct pattern. Depending on the language you use, there may be operators or functions that make the job easier. For example, in VB you can use the Like operator to compare against an IP format, and check for numeric-only values, like so:
[code]
If strCheck Like "*.*.*.*" Then
    'Fits IP style: contains at least three dots
    If IsNumeric(Replace$(strCheck, ".", "")) Then
        'Numbers and decimals only
    End If
End If
[/code]
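
For comparison, here is a minimal Python sketch of the same shape check, assuming the candidate has already been cut out of the page as a single token. The function name `fits_ip_style` is made up for illustration. It only checks the form (three dots, one to three digits per group), not whether each number is 255 or less.

```python
def fits_ip_style(token: str) -> bool:
    """Shape check only: four dot-separated groups of 1-3 ASCII digits."""
    parts = token.split(".")
    # Three dots split the token into exactly four groups.
    if len(parts) != 4:
        return False
    # Every group must be non-empty, at most three characters, and digits only.
    return all(1 <= len(p) <= 3 and p.isascii() and p.isdigit() for p in parts)
```

Splitting on the dots rejects tokens such as `1..2.3`, which a pure wildcard comparison followed by a numeric test on the remaining characters would let through.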

August 10, 2007, 8:59 AM

iago

This should grab the line with the IP, a sed can probably be used to get rid of everything else:

curl http://www.site.com | grep -E "[0-9]+.[0-9]+.[0-9]+.[0-9]+"

Or, if you prefer:

lynx -source http://www.site.com | grep -E "[0-9]+.[0-9]+.[0-9]+.[0-9]+"

August 10, 2007, 1:28 PM

warz

i'm thinking you're wanting a website to store the ip addresses of visitors? in that case, it can't be done with html alone, since html has no way to read the visitor's address. with php, though, it's simple. the ip address of the visitor is available as $_SERVER['REMOTE_ADDR'] (the bare $REMOTE_ADDR global only exists when register_globals is on), and a reverse lookup of the host name looks like this...

[code]
$domain = gethostbyaddr($_SERVER['REMOTE_ADDR']);
[/code]

and then store that, if you want.

August 10, 2007, 4:52 PM

Myndfyr

[quote author=iago link=topic=16935.msg171484#msg171484 date=1186752515]
This should grab the line with the IP, a sed can probably be used to get rid of everything else:

curl http://www.site.com | grep -E "[0-9]+.[0-9]+.[0-9]+.[0-9]+"

Or, if you prefer:

lynx -source http://www.site.com | grep -E "[0-9]+.[0-9]+.[0-9]+.[0-9]+"

[/quote]
A stricter version of this regex might be:
[code]
"(?:\d{1,3}\.){3}\d{1,3}"
[/code]
Note that the dot should be escaped as \. because an unescaped "." matches any non-newline character, and that in C-based languages you should double up the backslashes inside string literals.

The strictest version of this I can think of is:
[code]
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
[/code]

Note that these are broken into non-capturing groups.
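
As a rough illustration, here is how the strict pattern could be run over a page's source in Python. This is a sketch under assumptions: the `html` sample string is invented, and the `(?<![0-9])` / `(?![0-9])` lookarounds are an addition that is not part of the regex above. Without them, the pattern can match inside a longer run of digits (for example `99.1.1.1` inside `999.1.1.1`).

```python
import re

# One octet in the range 0-255, same alternation as the strict regex above.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Added assumption: lookarounds so a match cannot start or end mid-number.
STRICT_IP = re.compile(
    r"(?<![0-9])(?:" + OCTET + r"\.){3}" + OCTET + r"(?![0-9])"
)

# Invented sample input standing in for a downloaded page.
html = (
    "<td>10.0.0.1</td><td>8080</td> "
    "<li>999.1.1.1</li> <p>proxy 192.168.1.254:3128</p>"
)

# All groups are non-capturing, so findall returns the whole matches.
matches = STRICT_IP.findall(html)
print(matches)
```

Because every group is non-capturing, `findall` hands back complete addresses rather than individual octets.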

August 10, 2007, 5:24 PM

iago

[quote author=MyndFyre[vL] link=topic=16935.msg171488#msg171488 date=1186766662]
[quote author=iago link=topic=16935.msg171484#msg171484 date=1186752515]
This should grab the line with the IP, a sed can probably be used to get rid of everything else:

curl http://www.site.com | grep -E "[0-9]+.[0-9]+.[0-9]+.[0-9]+"

Or, if you prefer:

lynx -source http://www.site.com | grep -E "[0-9]+.[0-9]+.[0-9]+.[0-9]+"

[/quote]
A stricter version of this regex might be:
[code]
"(?:\d{1,3}\.){3}\d{1,3}"
[/code]
Note that \. should be escaped because "." matches any non-newline character, and that in C-based languages, you should double-up the backslashes.

The most strict version of this I can think of is:
[code]
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
[/code]

Note that these are broken into non-capturing groups.
[/quote]

From a quick look, I don't think those regexes will work in sed or plain grep, which lack \d and (?:...) groups, although Perl handles them fine.

For fun, here's a quick function I wrote a while back to identify valid IPs. If he wants to use Perl for this project, it might come in handy:

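[code]
# Checks that $ip is a dotted quad with every octet in 0..255.
# Log() and CgiDie() are assumed to be defined elsewhere in the script.
sub ValidateIp
{
    my $ip = shift;

    # Shape check: four dot-separated runs of digits and nothing else.
    if(!($ip =~ m/^([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)$/))
    {
        &Log("IP verification failed on ip: $ip");
        &CgiDie("IPs must be in the form of a.b.c.d");
    }

    # Range check: $1..$4 hold the octets captured by the match above.
    if($1 > 255 || $2 > 255 || $3 > 255 || $4 > 255)
    {
        &Log("IP verification failed on ip: $ip");
        &CgiDie("All octets in an ip must be in the range of 0..255");
    }
}
[/code]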
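
If Perl isn't an option, the same two-step idea (match the a.b.c.d shape, then check each octet) can be sketched in Python and pointed at a whole page instead of a single string. This is only a sketch: `find_valid_ips` and `CANDIDATE` are made-up names, the pattern is left unanchored so it can scan, and the digit lookarounds are an added assumption to keep matches from starting or ending in the middle of a number.

```python
import re

# Same a.b.c.d shape as the Perl match, but unanchored for scanning a page.
CANDIDATE = re.compile(
    r"(?<![0-9])([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)(?![0-9])"
)

def find_valid_ips(html):
    """Return every dotted quad in html whose four octets are all <= 255."""
    found = []
    for m in CANDIDATE.finditer(html):
        # Range check on the four captured octets, as in the Perl function.
        if all(int(octet) <= 255 for octet in m.groups()):
            found.append(m.group(0))
    return found
```

Unlike the Perl version, this skips bad candidates quietly instead of logging and dying, which seems closer to what a leecher-style scan needs.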

August 11, 2007, 5:20 PM
