Author | Message | Time |
---|---|---|
Barabajagal | I've been working on a php script that logs the amount of users that use my bot. The user's IP address (MD5 Hashed) and exact time of login are stored in the IP and Login fields (respectively). I then display the output in nice graphs, organized in 6 of them: Daily Usage (Count of users that logged in for each hour of the day), Monthly Usage (Count of users that logged in for each day this month), Yearly Usage (Count of users that logged in for each month this year), and the same 3 but counting only unique IP addresses. This can all be seen at http://rcb.realityripple.com/stats/ However, for the yearly usage, which contains quite a bit of data to sort through, the loop takes so long that the server returns Error 500 (Internal Server Error). I need to find a way to request only unique IP addresses from the database, so I don't have to sort through all the results to get unique ones. I've heard of an operator called Distinct, but I don't understand how to use it correctly... Here's my current code: Graph.php: [php]<? /* Usage Graphing System by RealityRipple Software (c) 2007 -graph.php- input: u [y/n] y displays only unique IPs n displays all IPs a [d/m/y] d displays all the hours of today m displays all the days of this month y displays all the months of this year todo: totals (all years) improve graphs a bit perhaps allow for color scheme inputs? */ if (isset($_REQUEST["u"])) { $uni=$_REQUEST["u"]; } else { $uni='n'; } if (isset($_REQUEST["a"])) { $age=$_REQUEST["a"]; } else { $age='d'; } include('getstats.php'); $user = "XXXX"; $password = "****"; $database = "dbXXXX"; $host = "localhost"; mysql_connect($host,$user,$password); @mysql_select_db($database) or die(""); $query = "SELECT * FROM `RCBUsers`"; if ($age=='d') { $startTime = date("Y-m-d 0:00:00"); $endTime = date("Y-m-d H:59:59"); } else if ($age=='m') { $startTime = date("Y-m-1 00:00:00"); $endTime = date("Y-m-d 23:59:59"); } else if ($age=='y') { $startTime = date("Y-1-1 00:00:00"); $endTime = date("Y-m-t 23:59:59"); } $query.=" WHERE `Login` > '$startTime' AND `Login` < '$endTime' ORDER BY `Login` ASC"; $listing = mysql_query($query); mysql_close(); if ($age=='d') { for ($i = 0 ; $i < date("H") + 1; $i++) { $startTime = date("Y-m-d ".$i.":00:00"); $endTime = date("Y-m-d ".$i.":59:59"); $times[$i] = getStats($uni, $listing, $startTime, $endTime); } $gTitle1='Daily Usage'; $gTitle2='Time of Day'; $xLabel ='time'; } else if ($age=='m') { for ($i = 1 ; $i < date("j") + 1; $i++) { $startTime = date("Y-m-".$i." 00:00:00"); $endTime = date("Y-m-".$i." 23:59:59"); $times[$i] = getStats($uni, $listing, $startTime, $endTime); } $gTitle1='Monthly Usage'; $gTitle2='Date'; $xLabel ='day'; } else if ($age=='y') { for ($i = 1 ; $i < date("n") + 1; $i++) { $startTime = date("Y-".$i."-1 00:00:00"); $endTime = date("Y-".$i."-t 23:59:59"); $times[$i] = getStats($uni, $listing, $startTime, $endTime); } $gTitle1='Yearly Usage'; $gTitle2='Month'; $xLabel ='month'; } include('postgraph.php'); $graph = new PostGraph(400,300); $graph->setGraphTitles($gTitle1, $gTitle2, 'Logins'); $graph->setYNumberFormat('integer'); $graph->setYTicks(0); $graph->setYValueMode(1); $graph->setXLabels($xLabel); $graph->setData($times); $graph->setWhiteColor(array(0,0,0)); $graph->setBackgroundColor(array(0,0,0)); $graph->setTextColor(array(255,0,0)); $graph->setLinesColor(array(192,0,0)); $graph->setBarsColor(array(128,0,0)); $graph->setAboveBarColor(array(255,0,0)); $graph->setInsideBarColor(array(255,128,0)); $graph->drawImage(); $graph->printImage(); ?>[/php] GetStats.php: [php]<? /* Usage Graphing System by RealityRipple Software (c) 2007 -getstats.php- input: u unique [y/n] y displays only unique IPs n displays all IPs listing [SQL] sql list of usage startTime [date] starting time of listing endTime [date] ending time of listing todo: see if startTime and endTime can be removed? */ function getStats($u, $listing, $startTime, $endTime) { $num = mysql_numrows($listing); $IPcount = 0; $IPs = ''; for ($i = 0; $i < intval($num); $i++) { $tmpIP = strtoupper(mysql_result($listing,$i,'IP')); $tmpTime = mysql_result($listing,$i,'Login'); $bAdd = 'n'; $tStart = strtotime($startTime, 0); $tTemp = strtotime($tmpTime, 0); $tEnd = strtotime($endTime, 0); if ($tStart < $tTemp && $tTemp < $tEnd) { if ($u == "n") { $bAdd = 'y'; } else { $bThere = 'n'; $lIP = explode(",", $IPs); for ($j = 0; $j < Count($lIP)-1; $j++) { if ($tmpIP == $lIP[$j]) { $bThere = 'y'; } else { $bAdd = 'n'; } } if ($bThere == 'n') { $bAdd = 'y'; } else { $bAdd = 'n'; } } } else { $bAdd = 'n'; } if ($bAdd == 'y') { $IPs.="$tmpIP,"; $IPcount++; } } return $IPcount; } ?>[/php] Any other editing I can do to make this more efficient would be greatly appreciated. PHP and DataBases are still not all that familiar to me. | October 11, 2007, 7:06 PM |
St0rm.iD | 1. In a complex application, your utility files may be include()'d more than once in multiple files, if there are lots of includes. If you're just using it to import functions (which you are doing), as opposed to including to render input, use the require_once() function. This ensures that the file is parsed and imported once and only once. 2. I tend to do <?php ?> instead of <? ?> because "short" delimiters are sometimes not enabled. 3. Use single-quotes for constant strings so PHP doesn't waste time searching them for variable interpolation. Use double quotes when you want to enable it. I also see some places where you are using string concatenation (where you are coming up with the dates) where you can just use variable interpolation. 4. Everyone says you're supposed to use the PEAR DB API instead of the mysql_* functions, but I don't. Then again, I try to stay away from PHP. Consult the PHP documentation. 5. Move database information into a separate config file 6. Move ANYTHING that calls the mysql_* functions into a separate include file that abstracts away the database operations you are performing. This way you can theoretically drop in any other database later. This means you shouldn't pass resultsets around (like you do to getStats()). This isn't important now, but will be when you scale up the size of your project. I don't think it's necessary in this instance 7. Name your include files with a .inc.php 8. When scaling your website up, I like to use the Fusebox 3 framework, but that's just me. EDIT: just read the beginning of your post, thought you were just looking for comments. Will answer soon. | October 11, 2007, 11:40 PM |
St0rm.iD | The query you want begins with: [code] $query = "SELECT DISTINCT ipaddress,[your other fields here] FROM `RCBUsers`"; [/code] | October 11, 2007, 11:43 PM |
iago | You could also do SELECT * FROM blah GROUP BY ipaddress And incidentally, I don't see what the point of md5'ing the IP is -- the domain of possible IP addresses is small enough to bruteforce in fairly reasonable time. | October 12, 2007, 2:36 PM |
Barabajagal | Not when they're salted ;D And thanks for the info, I'll see if I can get this all to work some time... | October 12, 2007, 5:48 PM |
Barabajagal | [code]SELECT DISTINCT IP, Login FROM `RCBUsers` WHERE `Login` > '2007-10-13 0:00:00' AND `Login` < '2007-10-13 16:59:59' ORDER BY `Login` ASC[/code] Didn't work. still returns repeat values of "IP". How do I use groups in MySQL? | October 13, 2007, 11:20 PM |
rabbit | http://dev.mysql.com/doc/refman/5.1/en/group-by-modifiers.html | October 14, 2007, 3:01 AM |
Barabajagal | I realized I've been doing this horribly inefficiently, and with Blake's help (he's making one for JBLS too, since mine was such a good idea ;) ) we're making a much more efficient system. | October 14, 2007, 4:13 AM |
iago | [quote author=Andy link=topic=17098.msg173790#msg173790 date=1192211314] Not when they're salted ;D And thanks for the info, I'll see if I can get this all to work some time... [/quote] Salting it shouldn't make any difference when brute forcing. Salting is only designed to prevent pre-computation attacks. | October 14, 2007, 2:52 PM |
Barabajagal | How would they brute force a salted IP address? Instead of being 15 characters (XXX.XXX.XXX.XXX), it becomes 30 characters (SSSS.XXX.XXX.SSSS.XXX.XXX.SSSS) [note that that's not the actual way I salt the ips, just a demonstration]. | October 14, 2007, 8:12 PM |
iago | Because, by definition, a salt is a known value. | October 14, 2007, 11:58 PM |
Barabajagal | Known to whom? | October 15, 2007, 12:50 AM |
St0rm.iD | [quote author=Andy link=topic=17098.msg173848#msg173848 date=1192409458] Known to whom? [/quote] If they (bad guys) have access to your hashes, they likely have access to your salts, too. | October 15, 2007, 1:27 AM |
iago | Exactly. Salts aren't supposed to be a secret. My main point is, why bother hashing the IPs? That's a silly idea. | October 15, 2007, 1:31 AM |
Barabajagal | User privacy. And no, it's not that hard to get in to a MySQL db. It is a bit harder to get php code. | October 15, 2007, 1:37 AM |
iago | I suppose using security by obscurity works a bit, but that's not what salt is designed for -- salting is designed for an open system, where the source can be examined. I guess I'm so accustomed to opensource stuff. One potential attack against that that immediately comes to mind is for a user to find his own record and bruteforce the salt. Since the IP is known, it might not be as difficult. Just a thought. | October 15, 2007, 1:44 AM |
St0rm.iD | You're supposed to store a unique salt along with each item in the database, IIRC. I might just be really tired, but I think that there's no point in salting a field if the salt is the same for each item (can't you still do precomputing attacks?) You should assume that the attacker can get your source, your database, and your encryption keys. And if you care about user privacy, just do a regular hash of them. If you salt them the way you are supposed to like I talked about earlier, you lose the one-to-one mapping of IP address to IP address hash. | October 15, 2007, 1:55 AM |
HdxBmx27 | It's mainly for the computer illiterate who know what an IP is, but know nothing about security. (There are a lot of people) Just tell them that its hashed/encrypted. And they feel a lot better. ~Hdx | October 15, 2007, 5:37 AM |
iago | [quote author=Banana fanna fo fanna link=topic=17098.msg173854#msg173854 date=1192413323] You're supposed to store a unique salt along with each item in the database, IIRC. I might just be really tired, but I think that there's no point in salting a field if the salt is the same for each item (can't you still do precomputing attacks?) You should assume that the attacker can get your source, your database, and your encryption keys. And if you care about user privacy, just do a regular hash of them. If you salt them the way you are supposed to like I talked about earlier, you lose the one-to-one mapping of IP address to IP address hash. [/quote] You can still do precomputation attacks, but it would have to be so specific (against his particular implementation) that it would never happen. But you're right, having a constant salt can cause other problems, and I wouldn't be surprised if it weakened the hashing. | October 15, 2007, 2:17 PM |
Camel | You should modify your code so that users who log in between 11:59 PM and 12:00AM are counted. Also, you should do more research in to whatever database you're using; you could save yourself a lot of time by having the DB do all of the processing. | October 16, 2007, 3:05 AM |
Barabajagal | Like I said, I already rewrote everything. The code in the first post is now obsolete. Regarding the salt: Here's what I actually did so you guys will shut up. I hashed the IP address via MD5. I then split it in half [S1 and S2] and capitalized S1 (S2 lowercase). I then put S2, the IP again, and S1 into a second MD5 hash. Good enough? | October 16, 2007, 7:41 PM |
iago | [quote author=Andy link=topic=17098.msg173904#msg173904 date=1192563695] Regarding the salt: Here's what I actually did so you guys will shut up. I hashed the IP address via MD5. I then split it in half [S1 and S2] and capitalized S1 (S2 lowercase). I then put S2, the IP again, and S1 into a second MD5 hash. Good enough? [/quote] No, it's pointless for all the reasons stated above. And correct me if I'm wrong, but that means you don't even have a salt, the MD5 is based 100% on the IP address, which means the domain of possible hashes is still 4.2 billion known values, which can be calculated in a reasonable time. | October 16, 2007, 7:45 PM |
St0rm.iD | [quote author=Andy link=topic=17098.msg173904#msg173904 date=1192563695] Like I said, I already rewrote everything. The code in the first post is now obsolete. Regarding the salt: Here's what I actually did so you guys will shut up. I hashed the IP address via MD5. I then split it in half [S1 and S2] and capitalized S1 (S2 lowercase). I then put S2, the IP again, and S1 into a second MD5 hash. Good enough? [/quote] No. Just hash it once using SHA-2. | October 16, 2007, 7:49 PM |
Barabajagal | Except that the attacker wouldn't know the way they were hashed, so if you try to calculate all the MD5 values for possible IPs, none of them would match up, and adding a static salt to them wouldn't help either. | October 16, 2007, 8:19 PM |
iago | Unless, of course, the attacker reads this forum? In any case, security by obscurity isn't generally considered a valid method of security. You have to assume that the attacker has access to your sourcecode, then try to security it based on that assumption. | October 16, 2007, 9:48 PM |
Barabajagal | Even if they read it, it won't help. It's not some program I'm releasing or selling for money. It's an unsecure connection to a php page that displays your IP and happens to store it as well for statistics use. | October 17, 2007, 2:20 AM |
St0rm.iD | "Go big or go home" - Poet Laureat Johnny Tsunami | October 17, 2007, 5:37 PM |