Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / Programming / April 2008

Tip: Looking for answers? Try searching our database.

Extract matching data from large data file (csv)

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Utahstew - 13 Apr 2008 06:37 GMT
I need to extract data (the first three letters after the 2nd comma, and the
first 35 characters after the 7th comma) from a csv file (over 100,000 rows),
only after the 8th column matches a values in column A of my spreadsheet. The
two extracted data elements need to be stored in my worksheet in columns B
and C

Any help on this would be appreciated.
Joel - 13 Apr 2008 07:26 GMT
The code below should work.  It may need a slight change.  I a little
confused.  The eigth column is the data after the seventh comma.  Do you mean
after the eigth comma?

the code below will only bring into the worksheet the needed data.  It will
ignore data that doesn't meet your criteria.

Sub Gettext()

Const ForReading = 1, ForWriting = 2, ForAppending = 3
Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0
   
Dim Data(8)

'default folder
Folder = "C:\temp"
ChDir (Folder)

Set fsread = CreateObject("Scripting.FileSystemObject")
FName = Application.GetOpenFilename("CSV (*.csv),*.csv")
   
Set fread = fsread.GetFile(FName)
Set tsread = fread.OpenAsTextStream(ForReading, TristateUseDefault)

RowCount = 1
Do While tsread.atendofstream = False
     
  InputLine = tsread.ReadLine

  For i = 0 To 7
     If InStr(InputLine, ",") > 0 Then
        Data(i) = Left(InputLine, InStr(InputLine, ",") - 1)
        InputLine = Mid(InputLine, InStr(InputLine, ",") + 1)
     Else
        If Len(InputLine) > 0 Then
           Data(i) = InputLine
           InputLine = ""
        Else
           Exit For
        End If
     End If
  Next i
  'check if 8th item is in column A
  Set c = Columns("A:A").Find(what:=Data(7), LookIn:=xlValues, _
     lookat:=xlWhole)
  If Not c Is Nothing Then
     c.Offset(0, 1) = Left(Data(2), 3)
     c.Offset(0, 2) = Left(Data(7), 35)
  End If
Loop
tsread.Close
End Sub

> I need to extract data (the first three letters after the 2nd comma, and the
> first 35 characters after the 7th comma) from a csv file (over 100,000 rows),
[quoted text clipped - 3 lines]
>
> Any help on this would be appreciated.
Utahstew - 13 Apr 2008 17:42 GMT
Thank you for your quick response.

The code works great until it finds an "&" in  Data(7)

Any suggestion on how to over come this ...?

> The code below should work.  It may need a slight change.  I a little
> confused.  The eigth column is the data after the seventh comma.  Do you mean
[quoted text clipped - 56 lines]
> >
> > Any help on this would be appreciated.
Utahstew - 13 Apr 2008 18:03 GMT
I was wrong other "&" exist in the lines previously
Utahstew - 13 Apr 2008 18:14 GMT
Joel,
Here is the modified code I  have in the spreadsheet. It throws a Run-time
error '9': Subscript out of range when reading data from line 3073 of the
Total.csv file.

Equity    93671105    NYSE - H&R Block Inc    20080102    18.61        US0936711052    H&R Block
Inc    HRB    93671105

Sub getDatafromTextFile()
Dim Ticker As String

tickers = Application.CountA(ActiveSheet.Range("A:A")) 'determines number of
Tickers

Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 3
Const TristateUSeDefault = -2
Const TristateTrue = -1
Const TristateFalse = 0

Dim Data(9)

'Default folder

Set fsread = CreateObject("Scripting.fileSystemObject")
Fname = Application.GetOpenFilename(FileFilter:="Text Files (*.csv), *.csv",
Title:="Please select a file")
    If Fname = False Then
        ' They pressed Cancel
        MsgBox "Stopping because you did not select a file"
    End If

Set fread = fsread.getfile(Fname)
Set tsread = fread.openastextstream(ForReading, TristateUSeDefault)

RowCount = 1

   Do While tsread.atendofstream = False

   inputline = tsread.readline
   
   For i = 0 To 10
       If InStr(inputline, ",") > 0 Then
           Data(i) = Left(inputline, InStr(inputline, ",") - 1)
           inputline = Mid(inputline, InStr(inputline, ",") + 1)
       Else
           If Len(inputline) > 0 Then
           Data(i) = inputline
           inputline = ""
       Else
   Exit For
   End If
   End If
   
   Next i
   
       Set c = Columns("A:A").Find(what:=Trim(Data(8)), LookIn:=xlValues,
lookat:=xlWhole)
       If Not c Is Nothing Then
           c.Offset(0, 1) = Left(Data(2), 3)
           c.Offset(0, 2) = Left(Data(7), 35)
       End If
   Loop
   tsread.Close
End Sub
Utahstew - 13 Apr 2008 19:01 GMT
The lies that have problems have extra comma at the end.

The following line work in the code
Equity,93679108,NYSE - Blockbuster Inc Class
A,20080102,3.64,,US0936791088,Blockbuster Inc Class A,BBI,93679108

The following line does not work in the code
Equity,9.26E+104,NYSE - Enh Eq Yd Pr Fd,20080102,15.63,,US09256E1010,Enh Eq
Yd Pr Fd,ECV,9.26E+104,,

Notice the final two comma

How do I limit theinput line to only the first ten columns and disregard the
rest?
Joel - 13 Apr 2008 20:10 GMT
I'm not sure why you modified the size of the Data array and the size of the
FOR loop.  You only need 8 items. Why go through the loop additional times
and get data you don't need.  The original code would of just ignored the
other data in the line.

The error is occuring becuase the for loop is now 0 to 10 which is 11 items.
Dat is defoined as Data(9) which is nine items.  The error is occuring
becuase you are putting more items into the array than it can hold.

> The lies that have problems have extra comma at the end.
>
[quoted text clipped - 10 lines]
> How do I limit theinput line to only the first ten columns and disregard the
> rest?
Utahstew - 13 Apr 2008 20:25 GMT
Thank you Joel, your fast response is appreciated

I had forgotten arrays start at 0.
Joel - 14 Apr 2008 03:57 GMT
I think you should add an exit sub in the code below.  You are going to get
an error if you don't make the change.

from
Title:="Please select a file")
    If Fname = False Then
        ' They pressed Cancel
        MsgBox "Stopping because you did not select a file"
    End If
to
Title:="Please select a file")
    If Fname = False Then
        ' They pressed Cancel
        MsgBox "Stopping because you did not select a file"
        exit sub
    End If

> Thank you Joel, your fast response is appreciated
>
> I had forgotten arrays start at 0.
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.