Visual Basic Programming | splitting a txt file and setting it to an array


Author

Message

Time

Tontow

I know how to open it, but I'm having difficulty when I try to set the .txt file equal to an array.

(I want to open a text file, split it at every vbCrLf (line break), and get the result back as an array:
text file = array, with every line being a different entry in the array.)

April 8, 2004, 5:04 AM

Newby

If I understand you correctly:

Input a line of the text file, and add it to the array

Loop until you reach the end of the file.

Perhaps show some coding?

April 8, 2004, 5:17 AM

o.OV

Or you can load the whole file into a temporary string and then use Split.
I don't know which would be best
and I'm not aware of any direct methods.

April 8, 2004, 5:29 AM

Eli_1

[quote author=Tontow link=board=31;threadid=6203;start=0#msg53966 date=1081400695]
I know how to open it, but I'm having difficulty when I try to set the .txt file equal to an array.

(I want to open a text file, split it at every vbCrLf (line break), and get the result back as an array:
text file = array, with every line being a different entry in the array.)
[/quote]

There are 2 ways (that I would use) to do this:

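1.) Input the file line by line into an array
[code]
Dim myArray() As String
ReDim myArray(0)
Open App.Path & "\file.bla" For Input As #1
Do Until EOF(1)
    ' Line Input reads one whole line; plain Input would also split on commas.
    Line Input #1, myArray(UBound(myArray))
    ReDim Preserve myArray(UBound(myArray) + 1)
Loop
Close #1

' Drop the unused last slot (skipped for an empty file, where UBound is still 0).
If UBound(myArray) > 0 Then ReDim Preserve myArray(UBound(myArray) - 1)
[/code]

Or
2.) Use binary access read to input the whole file, then parse it according to the vbCrLf's with Split
[code]
Dim myArray() As String, buffer As String
Open App.Path & "\file.bla" For Binary Access Read As #1
' Size the buffer to the file length, then read the whole file with one Get.
buffer = Space$(LOF(1))
Get #1, , buffer
Close #1
' A file that ends in vbCrLf yields one empty element at the end.
myArray = Split(buffer, vbCrLf)
[/code]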

Both are untested so you may have to tweak them some to get it to work. Hope it helps.

April 8, 2004, 5:34 AM

Tontow

Thanks, that helped a lot.

April 8, 2004, 5:55 AM

Grok

Maybe TheMinistered or Adron knows how a VB array is constructed in memory, and whether it is possible to get a little trickier: perhaps loading the whole file into a string, then altering the string to be an array, without having to redim. I think that redim preserve is going to cause at least linecount copy operations.

April 8, 2004, 12:02 PM

Adron

[quote author=Grok link=board=31;threadid=6203;start=0#msg53984 date=1081425743]
Maybe TheMinistered or Adron knows how a VB array is constructed in memory, and whether it is possible to get a little trickier: perhaps loading the whole file into a string, then altering the string to be an array, without having to redim. I think that redim preserve is going to cause at least linecount copy operations.
[/quote]

A VB array consists of a number of same-size objects laid out sequentially in memory, just like a C array. An array of String is a bit like a C array of "char*". The pointers will be stored at consecutive locations, but the actual text data may be stored anywhere in memory. This means that you can't turn a long string into an array of strings.

In C, you could do something like:

[code]
char buffer[] = "String1\nString2\nString3";
char *strings[3];
strings[0] = strtok(buffer, "\n");
strings[1] = strtok(0, "\n");
strings[2] = strtok(0, "\n");
[/code]

which would give you 3 strings using the same big buffer. You yourself handle the allocation of memory for the strings, and you know that they all share the same buffer. In VB, the compiler handles allocation of memory for strings, and you can't tell it what memory to use.

If you did some magic to make VB use the same memory buffer for all strings, you'd get errors later when VB tried to free the memory used by each string separately.

If VB isn't stupid, it won't reallocate the memory for each string when you redim the array of strings. It will just move the pointers, which is a rather fast operation. It should be nearly equivalent in speed to the C solution above, because there too you need to "redim" the strings array of pointers if you don't know the number of lines beforehand.
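
To make the comparison concrete, here is a rough sketch of what "redim" on the C side could look like when the line count is not known in advance. The helper name split_lines and the starting capacity of 4 are made up for illustration and do not come from the thread; the growth step uses realloc with doubling, which is one common choice, not the only one.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits buffer in place on '\n' and returns a
   malloc'd array of pointers into buffer. The number of lines is stored
   in *count. Returns NULL if an allocation fails. */
char **split_lines(char *buffer, size_t *count)
{
    size_t capacity = 4;   /* arbitrary starting size (assumption) */
    size_t used = 0;
    char **lines = malloc(capacity * sizeof *lines);
    if (lines == NULL)
        return NULL;

    for (char *tok = strtok(buffer, "\n"); tok != NULL; tok = strtok(NULL, "\n")) {
        if (used == capacity) {
            /* The C counterpart of ReDim Preserve: only the pointers
               are copied, never the text they point to. */
            char **grown = realloc(lines, capacity * 2 * sizeof *lines);
            if (grown == NULL) {
                free(lines);
                return NULL;
            }
            lines = grown;
            capacity *= 2;
        }
        lines[used++] = tok;
    }

    *count = used;
    return lines;
}
```

Note that strtok treats a run of delimiters as one, so empty lines are skipped here. The caller frees the returned array with free(); the strings themselves live in the original buffer.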

In C, you could also turn it into an actual array of strings without doing any more assignments at all, but only if the strings are fixed length. That would look something like this:

[code]
char buffer[] = "String1\0String2\0String3";
char (*strings)[8];
strings = (char (*)[8])buffer;
[/code]

Here you are telling the compiler that "buffer" is actually an N by 8 (N = 3 in this case) matrix of characters. Each line in the matrix is one string. When you're reading the data from the file you have to replace the '\n' at the end of each line by the string-terminator '\0'.
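
The replacement step described above might be sketched like this. RECORD_WIDTH and terminate_records are illustrative names, and the sketch assumes every line is exactly 7 characters followed by a single '\n' (or the final terminator), as in the example buffer.

```c
#include <stddef.h>

#define RECORD_WIDTH 8   /* 7 characters plus one separator/terminator (assumption) */

/* Hypothetical helper: replaces every '\n' in buffer[0..length) with '\0'
   and returns how many complete fixed-width records the buffer holds. */
size_t terminate_records(char *buffer, size_t length)
{
    for (size_t i = 0; i < length; i++) {
        if (buffer[i] == '\n')
            buffer[i] = '\0';
    }
    return length / RECORD_WIDTH;
}
```

With char buffer[] = "String1\nString2\nString3"; a call to terminate_records(buffer, sizeof buffer) returns 3, after which char (*strings)[RECORD_WIDTH] = (char (*)[RECORD_WIDTH])buffer; lets you read strings[0] through strings[2] as ordinary C strings.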

April 8, 2004, 1:23 PM

iago

Perhaps it would be faster to scan in the file, count the endlines, and then read it in? I don't know how the second file operation will compare to the redims, but I DO know that the second time you read the file it'll be faster due to caching.
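
A minimal sketch of that count-first idea, written in C to match the examples earlier in the thread. It assumes the file has already been read into a NUL-terminated buffer, so both passes run over memory instead of over the file; the function name is made up, and it only looks for '\n', so a '\r' from a vbCrLf file would stay at the end of each line.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pass 1 counts the lines, pass 2 fills an array that
   is allocated exactly once. Splits buffer in place on '\n'. A trailing
   '\n' does not produce an extra empty line. Returns NULL on failure. */
char **split_lines_two_pass(char *buffer, size_t *count)
{
    /* Pass 1: count the lines. */
    size_t n = 0;
    if (*buffer != '\0') {
        n = 1;
        for (const char *p = buffer; *p != '\0'; p++) {
            if (*p == '\n' && p[1] != '\0')
                n++;
        }
    }

    /* One allocation, sized exactly (at least one slot so malloc(0) is avoided). */
    char **lines = malloc((n > 0 ? n : 1) * sizeof *lines);
    if (lines == NULL)
        return NULL;

    /* Pass 2: record where each line starts and terminate it. */
    char *start = buffer;
    for (size_t i = 0; i < n; i++) {
        lines[i] = start;
        char *end = strchr(start, '\n');
        if (end == NULL)
            break;          /* last line had no trailing newline */
        *end = '\0';
        start = end + 1;
    }

    *count = n;
    return lines;
}
```

Whether the extra pass beats repeated resizing depends on the data and the runtime, so it would need to be measured.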

April 8, 2004, 4:47 PM

Adron

Another possibility would be to have a collection of arrays and add one array at a time, each array larger than the last, then only reallocate it once at the end.

Another possibility would be to check the filesize, guess using some reasonable statistic how many lines there will be, and allocate enough room + some margin for that right away. Then if you hit the limit, you do a new estimate based on the data you've read so far. And at the end, you redim it *down* which should hopefully not involve any copying of data.
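
The estimate-and-correct idea could look roughly like the following C sketch. The guessed average of 40 bytes per line, the 25% margin, and the function name are all assumptions picked for illustration. The buffer is expected to hold length bytes of data followed by a terminating '\0'.

```c
#include <stdlib.h>
#include <string.h>

#define GUESSED_LINE_LENGTH 40   /* assumed average line length; tune for your data */

/* Hypothetical helper: splits buffer in place on '\n', sizing the pointer
   array from an estimate instead of growing it line by line. */
char **split_lines_estimated(char *buffer, size_t length, size_t *count)
{
    /* First guess from the total size, plus a 25% margin. */
    size_t capacity = length / GUESSED_LINE_LENGTH;
    capacity += capacity / 4 + 1;

    char **lines = malloc(capacity * sizeof *lines);
    if (lines == NULL)
        return NULL;

    size_t used = 0;
    char *start = buffer;
    char *end_of_data = buffer + length;

    while (start < end_of_data) {
        if (used == capacity) {
            /* Hit the limit: re-estimate from the average line length seen so far. */
            size_t consumed = (size_t)(start - buffer);
            size_t estimate = used * length / consumed;
            capacity = estimate + estimate / 4 + 1;
            char **grown = realloc(lines, capacity * sizeof *lines);
            if (grown == NULL) {
                free(lines);
                return NULL;
            }
            lines = grown;
        }
        lines[used++] = start;
        char *nl = memchr(start, '\n', (size_t)(end_of_data - start));
        if (nl == NULL)
            break;          /* last line had no trailing newline */
        *nl = '\0';
        start = nl + 1;
    }

    /* Shrink to the exact size at the end. */
    if (used > 0) {
        char **shrunk = realloc(lines, used * sizeof *lines);
        if (shrunk != NULL)
            lines = shrunk;
    }

    *count = used;
    return lines;
}
```

Whether the final shrink avoids a copy is up to the allocator, which matches the "hopefully" above.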

April 8, 2004, 5:51 PM

Eli_1

[quote author=Eli_1 link=board=31;threadid=6203;start=0#msg53971 date=1081402479]
1.) Input the file line by line into an array
<codeblock>
Or
2.) Use binary access read to input the whole file, then parse it according to the vbCrLf's with Split
<codeblock>
[/quote]
I was bored, so I ran those two methods *on my crappy computer* against several different files to see which one was faster. Here are my results (in ms).

On a file with only 56 lines (readme.txt):
Method with ReDim: 16
Method with Split : 11

Method with ReDim: 17
Method with Split : 11

Method with ReDim: 21
Method with Split : 11

On a file with 550 lines (win.ini):
Method with ReDim: 29
Method with Split : 20

Method with ReDim: 35
Method with Split : 10

Method with ReDim: 33
Method with Split : 24

On a file with 2589 lines (list from BrooDat.mpq):
Method with ReDim: 117
Method with Split : 159

Method with ReDim: 157
Method with Split : 181

Method with ReDim: 172
Method with Split : 174

So it seems like the second method is faster than the first until the file gets pretty big; at 2589 lines the ReDim method came out ahead in all three runs. So the second method would be faster for the average config/shitlist/whatever (on my comp.)

April 8, 2004, 6:29 PM

Adron

It's more important to get good timings for a large list though - no one cares about 20 or 50 ms, but when it's 5000 or 10000 ms, people will start caring...

April 8, 2004, 6:40 PM

Eli_1

Then in that case the first method would be a better choice. :-\

April 8, 2004, 6:41 PM

Valhalla Legends Forums Archive | Visual Basic Programming | splitting a txt file and setting it to an array