Valhalla Legends Forums Archive | C/C++ Programming | Recursively Getting Subdirectories

AuthorMessageTime
FrostWraith
Edit: Trying to keep it in C.

To keep it short, I would like to fill an array, dirs[], with every subdirectory of every subdirectory of any directory I give it(what a sentence!).
As an example, the dirs[] array might look something like this:
[code]dirs[0] = "C:\Music\"
dirs[1] = "C:\Music\Marley\"
dirs[2] = "C:\Music\Zeppelin\"
dirs[3] = "C:\Music\Marley\Burnin'\"
dirs]4] = "C:\Music\Zeppelin\Coda\"
and so on....[/code]
I plan to make a program that will scan all of these directories and move a specific file type to a new location.  I am doing this on windows and am currently making some headways, but my code isn't exactly working as intended.  To be honest I do not have the slightest clue of the best method.  If you could please analyze my code or suggest a newer or easier method I would be appreciative.

Here is what I am currently working with:
[code]#define _CRT_SECURE_NO_DEPRECATE 1

#include <stdio.h>
#include <string.h>
#include <windows.h>

LPTSTR extensn = "\\";
LPTSTR asteris = "*";

int main (void)
{
WIN32_FIND_DATA f;
DWORD buffSize = MAX_PATH;
LPTSTR lpszBuffer = (LPTSTR)malloc(buffSize *sizeof(char));
LPTSTR tempdir = (LPTSTR)malloc(buffSize *sizeof(char));
LPTSTR basedir = (LPTSTR)malloc(buffSize *sizeof(char));
LPTSTR dirs[1000];
HANDLE h;
int i = 0;
int j = 0;
int k = 0;
sprintf(basedir, "%s", "C:\\Music");
sprintf(lpszBuffer, "%s%s%s", basedir, extensn, asteris);
h = FindFirstFile(lpszBuffer, &f);


while (h != INVALID_HANDLE_VALUE)
{
while (FindNextFile(h, &f))
{
if (f.dwFileAttributes == FILE_ATTRIBUTE_DIRECTORY)
{
if (strcmp(f.cFileName, "..") != FALSE)
{
sprintf(lpszBuffer, "%s%s%s", basedir, extensn, f.cFileName);
dirs[j] = lpszBuffer;
printf("ADDED %s\t%d\n", dirs[j], j);
j++;
}
}
else
{
printf("FILE FOUND!\n");
//Will try to move it later
}
}

if (j > k)
{
sprintf(lpszBuffer, "%s%s%s", lpszBuffer, extensn, asteris);
h = FindFirstFile(lpszBuffer, &f);
k++;
}

}

for (i = 0; i < j; i++)
printf("%s\n", dirs[j]);

return 0;
}[/code]
April 27, 2008, 11:28 PM
Barabajagal
I could tell you how to do it in VB using the Dir$() function no problem...

Here's basically what you do: Write a function to get all the directories contained within a directory and store them in a temporary array. When it's complete, loop through the array and call the function recursively on each subdirectory. At the end, add all the found directories to an externally declared array.
April 27, 2008, 11:42 PM
FrostWraith
Haha well good, that currently is what I happened to come up with in my code, it just isn't performing correctly.  I guess I will just keep trying.  If anyone can find why my code stops looping, let me know.  I seem to get to the first layer of subdirs.  Then it reads from the last subdir found and stops. Kind of like this:

Edit: aha, found where I went wrong.  My string arrays aren't correct.  Could anyone show me how to do this?  Seems like a stupid little thing but I'm having a brainfart.

Edit2: got it nvm, what a worthless topic...

Edit3: ok, well, i now am looking for effiency.  Since I now use a character string to hold every path, it takes up space quickly. Currently I am reserving 10k bytes.  I know this is a bad idea because if there are a lot of directories, it could pass this.  Does anyone see a workaround.

Btw, heres my current working code
[code]#define _CRT_SECURE_NO_DEPRECATE 1

#include <stdio.h>
#include <string.h>
#include <windows.h>

LPTSTR extensn = "\\";
LPTSTR asteris = "*";

int main (void)
{
WIN32_FIND_DATA f;
DWORD buffSize = MAX_PATH;
LPTSTR lpszBuffer = (LPTSTR)malloc(buffSize *sizeof(char));
LPTSTR tempdir = (LPTSTR)malloc(buffSize *sizeof(char));
LPTSTR basedir = (LPTSTR)malloc(buffSize *sizeof(char));
char* dirs[100000];
HANDLE h;
int i = 0;
int j = 0;
int k = 0;

sprintf(basedir, "%s", "C:\\Music");
sprintf(lpszBuffer, "%s%s%s", basedir, extensn, asteris);
h = FindFirstFile(lpszBuffer, &f);

for (i = 0; i < 10000; i++)
dirs[i] = (char*)malloc(sizeof(char) * 50);

while (h != INVALID_HANDLE_VALUE)
{
while (FindNextFile(h, &f))
{
if (f.dwFileAttributes == FILE_ATTRIBUTE_DIRECTORY)
{
if (strcmp(f.cFileName, "..") != FALSE)
{
sprintf(lpszBuffer, "%s%s%s", basedir, extensn, f.cFileName);
strcpy(dirs[j], lpszBuffer);
sprintf(tempdir, "%s%s", f.cFileName, "\0");
j++;
}
}
else
{
printf("FILE FOUND!\t%s\n", f.cFileName);
}
}

if (j > k)
{
sprintf(basedir, "%s%s", dirs[k], extensn);
sprintf(lpszBuffer, "%s%s%s", dirs[k], extensn, asteris);
h = FindFirstFile(lpszBuffer, &f);
k++;
}
else if(j == k)
{
printf("DONE!\n");
return 0;
}
}

return 0;
}[/code]
April 27, 2008, 11:48 PM
Kp
Code like this is why security vulnerabilities happen.

[*]Your global variables are needlessly mutable.  Be const correct.
[*]You use global string pointers for data which could just as easily be in a character array.  Fixing this will save you the size of a pointer per variable.
[*]Your casts with malloc are incorrect.  sizeof(char) != sizeof(TCHAR) when -DUNICODE.  Pick one or the other and be consistent.
[*]You pass character strings to the TCHAR versions of API calls.  Again, this is inconsistent.  Upshot is this'll actually break at compile time when you turn on Unicode, which will hopefully make you realize your code is broken. :)
[*]You're manually managing memory that could be easily automated using STL if you switched to C++.
[*]You're using sprintf to copy strings, which is wasteful.  Use strncpy instead.
[*]You're using sprintf at all.  Never use an unchecked formatting function.  The performance cost of doing unnecessary checks is minimal, and will save you when the code changes to stop satisfying your carefully prechecked length restrictions.
[*]Why are you declaring dirs[100000], but only initializing the first 10000 of them?  That's 90000 wasted char* slots.
[*]You're assuming you'll never have more than 10000 directories, but you never enforce this.  Add a check that dirs is never subscripted beyond 10000 or make dirs dynamically growable.  A std::vector<char *> would be good here.
[*]You use strcpy at all.  Again, use strncpy.  The performance hit of checking the copies is worth the safety of not trashing the heap, especially since you assume that you'll never see a directory more than 50 bytes long.
[*]What're you trying to achieve by using sprintf to copy a string and append a string consisting of a null terminator?  That's just a really expensive strcpy.
[*]You leak madly.  No FindClose, no free() for the dirs, and no free() for the locals.
April 29, 2008, 4:04 AM
FrostWraith
That was one of the most helpful posts ever!  Thank you so much.  I really need some criticism like this to know what guidelines I need to build my code upon.  Being self taught also causes me to miss some things too.

Thanks again.
May 5, 2008, 4:55 PM
Kp
[quote author=FrostWraith link=topic=17472.msg178112#msg178112 date=1210006540]
Thanks again.
[/quote]

You're welcome.  I'd much rather spend a few minutes critiquing your code now than know that someday you'll have a much harder problem and be completely stuck because you never learned any better. :)

Feel free to repost an updated version for follow up, or ask clarifying questions if my remarks weren't clear enough about how or why you should implement my proposed fixes.
May 6, 2008, 3:46 AM
FrostWraith
Ok, I think I am getting better.  I am trying to make my best effort at making all of the arrays grow dynamically.  There is one problem though.  When I change:
[code]char *directories[] = {0};[/code]
to this
[code]char *directories[100000] = {0};[/code]
it works.  I would think that with the way I handle things I wouldn't need to do that.  Is there any way to keep the size of the array dynamic so that I don't have to hardcode any numbers like that?

[code]#define _CRT_SECURE_NO_DEPRECATE 1

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include "id3.h"

char *dirs;

int scandirs();

int main (void)
{
memset(&id3v1_h, 0, sizeof(struct id3v1));

scandirs();

return 0;
}

int scandirs()
{
void *h;
char *buffer, *tempfile;
WIN32_FIND_DATA f;
int id3v1check, i, j, k;
char *directories[] = {0};
char *base;
char *tok;

base = "C:\\Ben\\*";
h = FindFirstFile(base, &f);
j = 0;
k = 0;
i = 0;

directories[0] = base;

while (h != INVALID_HANDLE_VALUE)
{
while (FindNextFile(h, &f))
{
if (f.dwFileAttributes == FILE_ATTRIBUTE_DIRECTORY)
{
if (strncmp(f.cFileName, "..", 2) > 0)
{
j++;
directories[j] = (char *)calloc((strlen(base) + strlen(f.cFileName) + 2), sizeof(char));
strncat(directories[j], base, strlen(base) - 1);
strcat(directories[j], f.cFileName);
strcat(directories[j], "\\*");

printf("%s - %d\n", directories[j], strlen(directories[j]));
}
}
else if (f.dwFileAttributes == FILE_ATTRIBUTE_ARCHIVE)
{
buffer = f.cFileName;
tok = strtok(buffer, ".");
tok = strtok(NULL, ".");

if (strcmp(tok, "mp3") == 0)
{
tempfile = calloc((strlen(base) + strlen(f.cFileName)), sizeof(char));
strncat(tempfile, base, strlen(base) - 1);
strcat(tempfile, f.cFileName);

id3v1check = has_id3v1(tempfile);

free(&base);
i++;

if (id3v1check)
{
//Nothing.  File is mp3 and has id3v1 tag
}
else
{
//Nothing.  File is mp3 but has no id3v1 tag
}

free(&tempfile);
}
}
}

if (j > k)
{
k++;
base = (char *)malloc((strlen(directories[k])) * sizeof(char));
base = directories[k];
h = FindFirstFile(base, &f);
}
else if(j == k)
{
printf("%d mp3 files found\n", i);
free(&directories);
return 0;
}

}
return 0;
}[/code]

Btw, I know I am not consistent with malloc and calloc, but I am just trying to figure out where each one is best suited.
May 7, 2008, 8:16 PM
Kp
Use [code]char **directories = NULL;[/code]Allocate it with:[code]char **tmp = realloc(directories, new_pointer_count * sizeof(char *));
if (!tmp)
/* realloc failed - old memory is not freed.  Most people just die if this happens... */
    exit(1);
directories = tmp;[/code]
Before you free(directories);, you must free each allocated pointer in the array.

When using memset, it's safer to use sizeof on the object.  In your case:[code]memset(&id3v1_h, 0, sizeof(id3v1_h));[/code]This way, if id3v1_h is ever changed to be some type of a different size, the memset does not clobber nearby memory.

By immediately calling FindNextFile, you skip processing the first element in the directory.  This is usually [u].[/u], which is not interesting.  But, it could be something else, in which case missing it could be bad.  You should use a do { } while(FindNextFile()); for the inner loop, so that it makes a pass over the data from FindFirstFile before moving on.

File attributes are a bitmask.  Comparing for exact equality is usually, though not always, wrong.  In this case, I think you meant to test if the bit is set, not if the entire mask is equal to that bit (i.e. that the bit is set and all other bits are unset).

Your use of strncmp is wrong here.  It says "if the first two bytes of f.cFileName are greater than '..', then do this clause".  This is wrong for two reasons.  First, based on context, I think you were trying to exclude only [u]..[/u], not all directories that have a leading [u]..[/u].  Second, if you're really trying to compare to find the literal [u]..[/u], you should test whether the return value is nonzero, rather than whether it's positive.  A negative return means that the supplied filename is less than [u]..[/u], not that it is [u]..[/u].

Use of strtok is dangerous.  You make assumptions about the structure of the filename by how you call strtok.  If it has too few dots, that second call to strtok could return NULL, which will crash when you dereference tok to check whether it's equal to "mp3".  Also, you could encounter a file which has many dots, such as "My.favorite.album.from.2007.mp3".  strtok would not return a pointer to the file extension in that case, so you would incorrectly flag it as "not an mp3".

Your call to free(&base); is wrong and very dangerous.  &base is a pointer into the stack, and free should only ever be given pointers to heap blocks allocated by malloc/calloc/realloc.  You may have meant free(base); here.  Ditto for free(&directories);.  Also, I see that you're mixing having base point at a constant string and point at heap memory.  Although legal, this is dangerous since you must be careful to free base only when it points to the heap.

You still don't have any calls to FindClose, so you're leaking find HANDLE objects.
May 8, 2008, 4:00 AM

Search