MS Office Forum / Word / Programming / July 2007

Fast way of comparing strings?

Guillermo López-Anglada - 15 Jul 2007 23:13 GMT
Hi,

I have the following code to compare strings:

Private Function GetNumOfMatchingWords(ArrayToCompare() As String, _
    ReferenceString As String) As Long

    Dim nWord As Long
    Dim WordsFound As Long

    For nWord = 0 To UBound(ArrayToCompare())

        'word is contained in the string...
        If InStr(ReferenceString, ArrayToCompare(nWord)) > 0 Then

            'increment the count of contained words...
            WordsFound = WordsFound + 1
        End If
        'next word...
    Next

    GetNumOfMatchingWords = WordsFound
End Function

I basically convert a string into an "array of words" and then check whether each of these words is
contained within another string. Then I perform the same operation the other way around in order to find
the highest percentage of matching words between the two strings. This approach works for my purpose, but
as I have thousands of strings to compare, it isn't very efficient. Could someone suggest a faster method?

The whole process involves reading a text file (in binary mode), splitting the read buffer into lines,
then the strings into words and then calling this routine. I think this one is the bottleneck. I've tried
the Like operator as well, but it isn't flexible enough.
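
A minimal sketch of the two-way comparison described above, for reference. It assumes words are separated by single spaces and that the default (binary) comparison is wanted. CountContainedWords and BestMatchPercent are hypothetical names, not taken from the original post.

```vba
Option Explicit

' Hypothetical helper: counts how many words of Words() occur
' anywhere inside Reference (substring match, as with InStr above).
Private Function CountContainedWords(Words() As String, Reference As String) As Long
    Dim i As Long
    For i = LBound(Words) To UBound(Words)
        If Len(Words(i)) > 0 Then
            If InStr(Reference, Words(i)) > 0 Then
                CountContainedWords = CountContainedWords + 1
            End If
        End If
    Next i
End Function

' Hypothetical wrapper: compares in both directions and returns
' the higher percentage of matching words.
Public Function BestMatchPercent(TextA As String, TextB As String) As Double
    Dim wordsA() As String, wordsB() As String
    Dim pctA As Double, pctB As Double

    wordsA = Split(TextA, " ")   ' assumption: single-space separators
    wordsB = Split(TextB, " ")

    ' Split("") returns an empty array whose UBound is -1
    If UBound(wordsA) >= 0 Then
        pctA = 100 * CountContainedWords(wordsA, TextB) / (UBound(wordsA) + 1)
    End If
    If UBound(wordsB) >= 0 Then
        pctB = 100 * CountContainedWords(wordsB, TextA) / (UBound(wordsB) + 1)
    End If

    If pctA > pctB Then BestMatchPercent = pctA Else BestMatchPercent = pctB
End Function
```

Each call scans the reference string once per word, so comparing every string against every other grows quickly with the number of strings. That is the cost the replies below try to reduce.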

Regards,

Guillermo
Klaus Linke - 16 Jul 2007 00:37 GMT
Hi Guillermo,

Not sure...

Do you compare every string to every other string? Are the strings you
compare and the reference strings from the same list (text file)?

Maybe you could put all words from all strings into a two-dimensional array:
The word itself in the first column, and a number indicating which string
it's from, "1", "2"..., in the second.

Use a (fast) sorting algorithm.

Now you have a list like
Aardvark 7
Aberdeen 3
Aberdeen 5
Aberdeen 12
and 2
and 5
and 7
...

Now you could build a two-dimensional integer matrix (both dimensions equal
to the number of strings) by running through this list once, and
incrementing every time a word appears in two strings:
Aardvark: can be ignored (since it only appears in one sentence),
Aberdeen: increment matrix(3,5), matrix(3,12), matrix(5,12)
and: increment matrix(2,5), matrix(2,7), matrix(5,7) ...

When you're finished, matrix elements with a high number indicate a pair of
sentences that has a lot of words in common.
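
One possible way to sketch this idea in VBA is shown below. Note that it swaps the sort step for a hash lookup: it uses a late-bound Scripting.Dictionary (Windows only) to group the strings by word, which gives the same word-to-strings list without sorting. SharedWordMatrix and TextLines are hypothetical names. The sketch assumes a non-empty array of space-separated strings, and matching is case-sensitive.

```vba
Option Explicit

' Sketch: for every pair of strings, count the distinct words they share.
' Only the upper triangle matrix(i, j) with i < j is filled in.
Public Function SharedWordMatrix(TextLines() As String) As Long()
    Dim wordIndex As Object     ' word -> Collection of string numbers
    Dim seen As Object          ' words already recorded for the current string
    Dim matrix() As Long
    Dim words() As String
    Dim owners As Collection
    Dim key As Variant
    Dim lineNo As Long, w As Long, i As Long, j As Long

    Set wordIndex = CreateObject("Scripting.Dictionary")
    ReDim matrix(LBound(TextLines) To UBound(TextLines), _
                 LBound(TextLines) To UBound(TextLines))

    ' Pass 1: record which strings each distinct word appears in.
    For lineNo = LBound(TextLines) To UBound(TextLines)
        Set seen = CreateObject("Scripting.Dictionary")
        words = Split(TextLines(lineNo), " ")
        For w = LBound(words) To UBound(words)
            If Len(words(w)) > 0 And Not seen.Exists(words(w)) Then
                seen.Add words(w), True
                If Not wordIndex.Exists(words(w)) Then
                    wordIndex.Add words(w), New Collection
                End If
                wordIndex.Item(words(w)).Add lineNo
            End If
        Next w
    Next lineNo

    ' Pass 2: for each word, increment every pair of strings that contain it.
    For Each key In wordIndex.Keys
        Set owners = wordIndex.Item(key)
        For i = 1 To owners.Count - 1
            For j = i + 1 To owners.Count
                matrix(owners(i), owners(j)) = matrix(owners(i), owners(j)) + 1
            Next j
        Next i
    Next key

    SharedWordMatrix = matrix
End Function
```

Very common words such as "and" appear in many strings and dominate the second pass, so it may be worth skipping them. The matrix also needs memory proportional to the square of the number of strings.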

Regards,
Klaus

Tony Strazzeri - 16 Jul 2007 03:14 GMT
In addition to Klaus' reply you could look at another approach.

When I need to check an item against a large list I don't like
iterating through the list to find if the item exists.  I have
developed an approach using either a listbox control or a collection
object.  Both of these allow you to add only unique items to their
lists.  If you try to add an item that already exists they generate an
error.  My technique involves testing for the error to determine if
the item exists.  I think (expect/hope) this may be faster than
iterating an array but have not tested it.

Maybe you could use this method.  I can't post example code at this
time because I am not at a PC where I have the code but I will do so
when I can.

Hope this helps.

Cheers
TonyS.

Tony Strazzeri - 17 Jul 2007 10:47 GMT
Here is the explanation and sample code for using a collection to
determine whether an item exists in a list.

I wrote the notes below as part of my operations and coding
practices.  This is the first time they will have been read by someone
else.  I hope they make sense.

VBA does not have a built-in method for quickly determining whether an
item is a member of a list.  For example, does an array of names
contain the name fred?  Or, conversely, is fred not a member of the
array of names?
The usual way to determine this is to iterate through the array
until the item is found (or the end is reached).  Although tedious,
this is a quick enough way to check a small number of items against
the list.
A good technique that I have developed is to use a collection to
achieve this.

============================
Collection Object
A Collection object is an ordered set of items that can be referred to
as a unit.

Remarks
The Collection object provides a convenient way to refer to a related
group of items as a single object. The items, or members, in a
collection need only be related by the fact that they exist in the
collection. Members of a collection don't have to share the same data
type.
A collection can be created the same way other objects are created.
For example:
Dim objCollection As New Collection

Once a collection is created, members can be added using the Add
method and removed using the Remove method. Specific members can be
returned from the collection using the Item method, while the entire
collection can be iterated using the For Each...Next statement.
===================================

When adding items you specify the item value and a key for the item.
I usually use the same value for the key as for the item.  If the
item is a long string this may be wasteful of memory, in which case
alternatives should be sought.
Attempting to add an item with the same key as an existing item
generates runtime error 457.  The error can be trapped or simply
ignored.
To ignore the error, simply wrap the Add in an error trap, e.g.

       On Error Resume Next
         objCollection.Add Item, key
      On Error GoTo 0

This avoids adding the duplicate and leaves the original item in the
collection.  If we want to update an existing value (where we don't
know whether the item is in the list), we can use a similar technique,
this time attempting to remove the item and then adding the new one.

     On Error Resume Next
        objCollection.Remove key
     On Error GoTo 0
     objCollection.Add Item, key

The following example will determine if the item is a member of the
list. If the item exists  the variable temp will contain its value.

       temp = ""
       On Error Resume Next
           temp = objCollection.Item(key)
       On Error GoTo 0
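
The lookup above could be wrapped up and applied to the original word-matching question roughly as follows. This is only a sketch: KeyExists and CountWordsInCommon are hypothetical names, the stored items are assumed to be plain strings (not objects), and words are assumed to be separated by single spaces. Two behaviours differ from the InStr version: Collection keys are not case-sensitive, and only whole words match.

```vba
Option Explicit

' Hypothetical helper: True if Key is already used in the collection.
' Assumes the stored items are simple values, not objects.
Public Function KeyExists(Items As Collection, Key As String) As Boolean
    Dim temp As Variant
    On Error Resume Next
    temp = Items.Item(Key)          ' raises an error if the key is missing
    KeyExists = (Err.Number = 0)
    On Error GoTo 0
End Function

' Counts how many entries of Words() occur as whole words in Reference.
Public Function CountWordsInCommon(Words() As String, Reference As String) As Long
    Dim lookup As New Collection
    Dim refWords() As String
    Dim i As Long

    ' Build the lookup once; duplicate or empty keys raise an error and are skipped.
    refWords = Split(Reference, " ")
    On Error Resume Next
    For i = LBound(refWords) To UBound(refWords)
        lookup.Add refWords(i), refWords(i)
    Next i
    On Error GoTo 0

    ' Each membership test is now a keyed lookup instead of a string scan.
    For i = LBound(Words) To UBound(Words)
        If KeyExists(lookup, Words(i)) Then
            CountWordsInCommon = CountWordsInCommon + 1
        End If
    Next i
End Function
```

Whether this is actually faster than InStr for short strings has not been measured here. The benefit should mainly show when the same reference list is checked many times.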

Things to be aware of.
·    Item numbers and item count are dynamic.
·    If an item is removed all subsequent items will shift up and the
item count will reduce by 1.
·    Items can be removed either by key or item number

Since you can have a collection of virtually anything, you can store
multiple data properties of an item by storing an object in the
collection.

Limitation
You can't return an item's index without iterating the list.
METHOD USING A LISTBOX IN A USERFORM
===================================
If programming in a userform we can use the properties of a ListBox
(which can remain hidden) to more easily return the index of an item
if it exists in the list.
We can use the same technique as for a collection (above), but
the listbox has the advantage that it can easily return the item's
index in the list.
Using a listbox called lbxCTItem:

       With lbxCTItem
           .ListIndex = -1    'unselects any selected item
           On Error Resume Next
               lbxCTItem = str
           On Error GoTo 0
           thisItem = .ListIndex
       End With

       'If the item does not exist thisItem retains a value of -1
       'Otherwise it returns the item's index number

       If thisItem < 0 Then
               Msgbox "The item does not exist in the list."
       End If

You can store additional data about the item as columns of the
listbox.  By changing the listcolumn number you can even find an item
using that column.  But beware of duplicates.

I would be interested in hearing comments on this technique.  Maybe
some of the more savvy developers can comment on or test whether this
is quicker than other methods.  If not, I will get around to it one day.
Till then I suspect that even with several hundred items in a list
most methods will be equally effective in this environment (document
automation as distinct from number crunching).

Cheers
TonyS.
Russ - 16 Jul 2007 07:34 GMT
What is your ultimate goal?
Is it to find the words or the number of words that are common to two
lists?
Or is it to find a total word count for each word?
Or is it to find which group of words consistently appears more often in
other groups of words?
(Then use what I say below.)

In your code below you are counting the number of words that are in common
with both. If you reverse the process, like you say you do between the same
two lists, then you should come up with the same number of common words.

Looking at your code below, change 'If InStr' to 'While InStr'...<> 0 if you
want to count every match in the string. Then if you reverse the process,
you could come up with a different number of matches, if the lists are not
unique and allow the same word to repeat.


Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID

Russ - 16 Jul 2007 11:35 GMT
Guillermo,
Message below in text.

> What your ultimate goal?
> Or is it to find the words or the number of words that are common to two
[quoted text clipped - 12 lines]
> you could come up with a different number of matches, if the lists are not
> unique and allow the same word to repeat.

You would do the While loop like this:

Private Function GetNumOfMatchingWords(ArrayToCompare() As String, _
    ReferenceString As String)

    Dim nWord As Long
    Dim WordsFound As Long

    nWord = 1

    nWord = InStr( nWord, ReferenceString, ArrayToCompare(nWord))
    While nWord <> 0
       'increment the count of contained words...
       WordsFound = WordsFound + 1
       nWord = InStr( nWord, ReferenceString, ArrayToCompare(nWord))
    Wend

    GetNumOfMatchingWords = WordsFound
End Function


Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID

Russ - 16 Jul 2007 11:50 GMT
Guillermo,
Use this instead. To avoid an endless loop, I have to increment nWord, too.

Private Function GetNumOfMatchingWords(ArrayToCompare() As String, _
    ReferenceString As String)

    Dim nWord As Long
    Dim WordsFound As Long

    nWord = 1

    nWord = InStr( nWord, ReferenceString, ArrayToCompare(nWord))
    While nWord <> 0
       nWord = nWord + 1
       'increment the count of contained words...
       WordsFound = WordsFound + 1
       nWord = InStr( nWord, ReferenceString, ArrayToCompare(nWord))
    Wend

    GetNumOfMatchingWords = WordsFound
End Function


Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID

Russ - 16 Jul 2007 18:08 GMT
Guillermo,
I wasn't going through ArrayToCompare, so use this instead.

Private Function GetNumOfMatchingWords(ArrayToCompare() As String, _
    ReferenceString As String)
    Dim iCounter As Long
    Dim nWord As Long
    Dim WordsFound As Long

    iCounter = 1
    For nWord = 0 To UBound(ArrayToCompare())
       iCounter = InStr( iCounter, ReferenceString, ArrayToCompare(nWord))
       While iCounter <> 0
           'increment the count of contained words...
           WordsFound = WordsFound + 1
           iCounter = InStr( iCounter + 1, ReferenceString, _
                 ArrayToCompare(nWord))
       Wend
    Next nWord
    GetNumOfMatchingWords = WordsFound
End Function


Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID

Russ - 16 Jul 2007 18:54 GMT
Guillermo,
Sorry, I'm not testing this function. But as I look at it again, I see that
I needed to move the iCounter reset inside the For loop, so that every word
is searched for from the start of the string.

Private Function GetNumOfMatchingWords(ArrayToCompare() As String, _
    ReferenceString As String) As Long
    Dim iCounter As Long    'position of the current match in ReferenceString
    Dim nWord As Long
    Dim WordsFound As Long

    For nWord = 0 To UBound(ArrayToCompare())
       'start each word's search at the beginning of the string
       iCounter = 1
       iCounter = InStr( iCounter, ReferenceString, ArrayToCompare(nWord))
       While iCounter <> 0
           'increment the count of contained words...
           WordsFound = WordsFound + 1
           'look for the next occurrence after the current match
           iCounter = InStr( iCounter + 1, ReferenceString, _
                 ArrayToCompare(nWord))
       Wend
    Next nWord
    GetNumOfMatchingWords = WordsFound
End Function


Russ
drsmN0SPAMikleAThotmailD0Tcom.INVALID
