Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / New Users / March 2008

Tip: Looking for answers? Try searching our database.

locating duplicate records in a 300,000 record file

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
robro - 14 Mar 2008 22:53 GMT
I previously had help with writing a macro to eliminate duplicate
records, and it worked great which told me there are two duplicate
records.  Now I need to quickly identify them.

There is one column of numbers, sorted in numeric order.  I just need
to place an X
in a helper column for each duplicate record so I can then sort by the
helper column.  Basically for this to be fast, I just want to compare
one cell with the one above it.  I've yet to figure out how to do
that.  Any help appreciated!  I've googled up a storm and all I'm
finding is deleting dupes or placing an X in a helper column but
comparing all records in column A which will take too long with
300,000 records.

Thanks!
JP - 14 Mar 2008 23:28 GMT
What Excel version do you have? I assume 2007, because you can't have
300k rows in any other version.

Let's say your data runs from A2:A300000 (with header in A1)

Column B will be the helper column. Starting from B3, enter this
formula:

=IF(COUNTIF(A2:A3,A3)=1,"X","")

Fill down and then Copy>Paste values. Any cell with an "X" has a
duplicate in the cell immediately above the corresponding cell in
column A. For example if A3 and A4 were dupes, cell B4 would have an
"X" in it.

Keep in mind this won't catch dupes found elsewhere in the range.

HTH,
JP

> I previously had help with writing a macro to eliminate duplicate
> records, and it worked great which told me there are two duplicate
[quoted text clipped - 11 lines]
>
> Thanks!
Billy Liddel - 14 Mar 2008 23:34 GMT
Robro

Sub MarkDupes()
'Assumes data in column A ' Adjust to suit
Dim i As Long, nr As Long
Range("A1").Select
nr = ActiveCell.CurrentRegion.Rows.Count
 For i = 2 To nr
   If Cells(i, 1) = Cells(i - 1, 1) Then
     Cells(i, 2) = "X"
   End If
 Next i
End Sub

Will mark cells in column b

Peter

> I previously had help with writing a macro to eliminate duplicate
> records, and it worked great which told me there are two duplicate
[quoted text clipped - 11 lines]
>
> Thanks!
robro - 14 Mar 2008 23:42 GMT
Thanks for both responses.  I guess my first macro did something odd
because using this method, there are no duplicates.
Damn, I'm trying to figure out why a SQL data import is crashing with
duplicate errors and my SQL guy is unavailable so
I'm resorting to guessing and using Excel (yes Excel 2007, I was very
excited to find it could handle so many rows when
it first came out.  That was one of the first features I checked for).

On Mar 14, 6:34 pm, Billy Liddel
<BillyLid...@discussions.microsoft.com> wrote:
> Robro
>
[quoted text clipped - 29 lines]
>
> > Thanks!
Pete_UK - 15 Mar 2008 00:25 GMT
Why not sort the data first and then put this in B2:

=IF(A1=A2,"X","")

and copy down? Then apply a filter to column B to locate all the X's
(duplicates).

Hope this helps.

Pete

> Thanks for both responses.  I guess my first macro did something odd
> because using this method, there are no duplicates.
[quoted text clipped - 42 lines]
>
> - Show quoted text -
T. Valko - 15 Mar 2008 04:06 GMT
>Excel 2007

Assume the range is A1:A300000
Select the range A1:A300000
Goto Home tab>Styles>Conditional Formatting>New rule>Format only unique or
duplicate values

From the drop down select duplicate
Click the Format button
Select the desired style(s)
OK out

Signature

Biff
Microsoft Excel MVP

> Thanks for both responses.  I guess my first macro did something odd
> because using this method, there are no duplicates.
[quoted text clipped - 39 lines]
>>
>> > Thanks!
Shane Devenshire - 15 Mar 2008 16:03 GMT
Hi,

Since you are using 2007, if you want to remove the duplicates you can choose the command Data, Remove Duplicates.  As already mentioned you can use conditional formatting to mark the duplicates.  But with 300000 you may want to locate the found records quickly, so:

1. After applying the conditional format mentioned in a previous reply
2. Select the 300000 rows and choose the command Home, Sort & Filter, Custom Filter
3. In the Sort box under Sort by pick the column you want to sort on, under Sort on pick Cell Color, under Order select No Cell Color and On Bottom, then click OK.

This will put all the duplicates at the top.

Cheers,
Shane Devenshire
Microsoft Excel MVP

> I previously had help with writing a macro to eliminate duplicateh
> records, and it worked great which told me there are two duplicate
[quoted text clipped - 11 lines]
>
> Thanks!

Rate this thread:






 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.