Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / Programming / March 2008

Tip: Looking for answers? Try searching our database.

Smart way to delete the first row of a huge file

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Charles - 25 Mar 2008 18:26 GMT
Hello

Would anyone know a better way to remove the first line of a huge txt
file?

I am using a FileSystemObject/TextStream, doing a first readline, and
then looping:
Destination.WriteLine Source.ReadLine

but the two files are located on drives and it is taking hours to run.
Is anyone aware of a smarter way? (I guess the solution would be to
directly delete the first row in a file and save the change...without
loading it all to memory, but not sure of how to achieve that)

Thanks in advance
Charles
Tim Williams - 26 Mar 2008 04:37 GMT
I think you're pretty much stuck with how you're doing it now.
How large is the file.  Do you mean hours literally ?

Tim

> Hello
>
[quoted text clipped - 12 lines]
> Thanks in advance
> Charles
Rick Rothstein (MVP - VB) - 26 Mar 2008 05:18 GMT
How huge is "huge"? I would think the following should work fairly quickly
on a file up to 20 Megs in size and, while slowing down on file sizes larger
than this, it should still work much, much faster what you are describing
for your present code. Give it a try and let us know...

Sub RemoveFirstLine(PathAndFileName As String)
 Dim FileNum As Long
 Dim TotalFile As String
 FileNum = FreeFile
 Open PathAndFileName For Binary As #FileNum
   TotalFile = Space(LOF(FileNum))
   Get #FileNum, , TotalFile
 Close #FileNum
 FileNum = FreeFile
 Open PathAndFileName For Output As #FileNum
   Print #FileNum, Mid$(TotalFile, InStr(TotalFile, vbCrLf) + 2)
 Close #FileNum
End Sub

You would call this subroutine from within your own code like this...

Sub Test()
 '
 '     <<< Some code >>>
 '
 RemoveFirstLine "C:\TEMP\TestData.txt"
 '
 '     <<< Some more code >>>
 '
End Sub

where, of course, you would substitute your own path and filename for the
sample one I used. If you think you might need it (I don't), you can add a
slight measure of safety to the RemoveFirstLine subroutine by Output'ting
the modified TotalFile (the Mid$ function call) to a different filename in
the path, then executing a Kill statement for the original file, and finally
executing a Name..As statement to rename the Output'ted filename back to the
original filename.

Rick

> Hello
>
[quoted text clipped - 12 lines]
> Thanks in advance
> Charles
Charles - 26 Mar 2008 19:10 GMT
Hi

by huge I mean 200Meg+. The problem is that in top of that the file is
stored on a network drive, so it's really slow. The solution I found
to improve it is to copy it to the C drive first, process it, and then
copy it back. That's substentially faster than processing it directly
from the network drive.

It doesn't take hours but it takes a good 4-5min, which feels like
hours when you are waiting in front of your screen!

Rick: I'll try your solution.

Thanks!
Charles
Charles - 26 Mar 2008 19:20 GMT
Actually, not sure it works. Your code requires to load the whole
200MB file into memory in one go. Which would be probably faster than
having an access to the drive for each line read/written. But that's
going to be a big ask for Excel. Actually can a single string contain
200MB of data?

I actually have another question: I never really understood the
meaning of the "#" character before variables, or the "$" sign after
the name of the function. How different is it from simply using mid()
or filenum?

Charles
Rick Rothstein (MVP - VB) - 26 Mar 2008 20:03 GMT
Yes, a single string can contain 200MB of data; the actual limit is
approximately 2 billion (2^31) characters. So, the only restriction on how
well the code I posted will work is available RAM (I think a total of 400MB
would be necessary for your file... 200MB for the string as it is read into
the TotalFile variable and another 200MB for the temporary string I think
gets created by the Mid$ function call). If there is not enough available
RAM, the code should still work, but the system paging back out to the hard
drive would be a performance killer. (All this is why I asked you how huge
"huge" was.)

The # signs are part of the syntax for addressing a channel number to a file
(although in some of the locations I used it in, it may be optional). As for
the $ sign, String functions have two forms... without the $ sign, they
return variants with a subtype of String (variants are slow and memory
wasters... not usually a concern outside of large loops); with the $ sign,
they return a pure String data type (and hence, when used in further String
manipulations, are as efficient as the function will get).

Rick

> Actually, not sure it works. Your code requires to load the whole
> 200MB file into memory in one go. Which would be probably faster than
[quoted text clipped - 8 lines]
>
> Charles
Charles - 26 Mar 2008 23:29 GMT
Interesting. That's the kind of post to archive in the section "useful
stuff to know".

Thanks very much for your help!

Charles
Rick Rothstein (MVP - VB) - 27 Mar 2008 01:23 GMT
Just out of curiosity, did you try my code against your file? While your
file is **much** larger than what I'd consider an ideal size, I'd still be
interested how it fairs against your existing methods (assuming you have a
decent amount of RAM in your system to handle the file all in memory).

Rick

> Interesting. That's the kind of post to archive in the section "useful
> stuff to know".
>
> Thanks very much for your help!
>
> Charles
Charles - 27 Mar 2008 10:48 GMT
the line Space(LOF(FileNum)) is generating an error. It is actually
the Space function that generates the error. LOF returns 190,504,020.
It says it ran out of string space. also tried with String(, " ")
function, same result. Don't know if this is a limit of the string
function or if I reached maximum capacity of a string in VBA (or at
least by VBA)

Charles
Charles - 27 Mar 2008 10:57 GMT
I did a test using the following loop

Sub test2()
Dim txt As String, i As Long
For i = 1 To 190
 Application.StatusBar = i
 txt = txt & Space(1000000)
Next i
MsgBox "Success"
End Sub

it breaks at the step 105, which seems to mean that VBA is running out
of memory with a string larger than 104m characters. Excel total
memory usage was around 350MB, so far below the 2GB XL limit, and my
installed memory (about 2GB as well)
So it looks like there is another memory limit for strings.

Charles
Rick Rothstein (MVP - VB) - 28 Mar 2008 19:07 GMT
Okay, I also get the loop quitting well below the maximum string size. Part
of that may be the tight loop (probably interfering with the memory "garbage
collection" routine) and continual concatenations. Each concatenation will
require a larger and larger chunk of contiguous memory which I guess becomes
harder and harder to find within the available (unused, non-garbage
collected) RAM. Anyway, another approach is to process the file in large
"chunks" which I am guessing will still be faster than what you are now
doing. Below my signature is a subroutine that handles the process this way.
You can experiment with this a little bit. I set the chunk size at 20MB (via
the ChunkSize constant assignment)... you can try upping that to, say, 40MB
and see if your system will handle it (I don't have a 200MB file to test the
code out on). Anyway, give it a try and let me know how it works out.

Rick

Sub RemoveFirstLine(PathAndFileName As String)
 Dim InFileNum As Long
 Dim OutFileNum As Long
 Dim ReadLocation As Long
 Dim LoopChunk As Long
 Dim BackSlash As Long
 Dim PartialFile As String
 Const ChunkSize As Long = 20000000
 BackSlash = InStrRev(PathAndFileName, "\")
 ' The next 4 lines must be executed in the order shown
 InFileNum = FreeFile
 Open PathAndFileName For Binary As #InFileNum
 OutFileNum = FreeFile
 Open Left$(PathAndFileName, BackSlash) & "TEMP_" & _
            Mid$(PathAndFileName, BackSlash + 1) For Append As #OutFileNum
   LoopChunk = 5000
   PartialFile = Space$(LoopChunk)
   Get #InFileNum, , PartialFile
   Print #OutFileNum, Mid$(PartialFile, InStr(PartialFile, vbCrLf) + 2);
   Do While Not (EOF(InFileNum))
     PartialFile = Space$(ChunkSize)
     Get #InFileNum, , PartialFile
     If FileLen(PathAndFileName) - Seek(InFileNum) <= 0 Then
       PartialFile = Left(PartialFile, InStr(PartialFile, Chr$(0)) - 1)
     End If
     Print #OutFileNum, PartialFile;
   Loop
 Close #InFileNum
 Close #OutFileNum
 Kill PathAndFileName
 Name Left$(PathAndFileName, BackSlash) & "TEMP_" & _
      Mid$(PathAndFileName, BackSlash + 1) As PathAndFileName
End Sub

>I did a test using the following loop
>
[quoted text clipped - 14 lines]
>
> Charles
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.