Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / New Users / June 2007

Tip: Looking for answers? Try searching our database.

Finding Duplicate Entries

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
MarkC - 18 Jun 2007 01:47 GMT
Using XP Pro SP2 and Excel XP (2002)

Like to know if it is possible find all duplicate entries?.  Can not use a
starting string because it's unknown having all kinds of possibilities.

I text pasted a file directory with paths and filenames, I can have
duplicate filenames in different directories, and like to search for
duplicate names if possible.  The list was entered in delimited based on "\"
which will be in it's own column, dependent on how many sub-folders.

At this stage, I am interested in only finding duplicate file naming.  Not
concerned if these files are identical files, or different files having the
same name.

Thanks,

Mark
Bernie Deitrick - 18 Jun 2007 18:43 GMT
Mark,

I'm assuming that your list is in column A, starting in cell A2.

In Cell B2, array enter (Enter using Ctrl-Shift-Enter) the formula:

=MID(A2,MAX(IF(ISERROR(SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))),0,SEARCH("\",A2,ROW(INDIRECT("A1:A"&LEN(A2))))))+1,256)

in Cell C2, enter the formula:

=IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")

Then copy B2:C2 down to match your list of file names in column A, and duplicate filenames will be
highlighted by column C's value.

HTH,
Bernie
MS Excel MVP

> Using XP Pro SP2 and Excel XP (2002)
>
[quoted text clipped - 11 lines]
>
> Mark
*alan* - 19 Jun 2007 01:16 GMT
(TOP-POSTING CORRECTED)

>> Using XP Pro SP2 and Excel XP (2002)
>>
[quoted text clipped - 13 lines]
>>
>> Mark

> Mark,
>
[quoted text clipped - 14 lines]
> Bernie
> MS Excel MVP

That doesn't appear to work (at least not in Excel 2003).

Here, however, is a low-tech solution that does work in Excel 2003:

Assuming that A1 is a column heading and your name entries begin in A2,

1. select column A
2. data> filter > advanced filter> check Unique records only > click OK
3. Alt; (selects visible cells)
4. Data > filter > show all

Step 2 will hide all the duplicates.
Step 3 will highlight the remaining unique records.
Step 4 will make the entire column A reappear, with Unique records
highlighted (grey) and the duplicates un-highlighted (white).

Hope that helps.
--
Alan

A: Because it runs counter to the normal flow of communication.
Q: Why is top-posting frowned upon?
Dave Peterson - 19 Jun 2007 01:29 GMT
First, it's the custom of the microsoft.public.excel newsgroups to top post.
It's different than most of Usenet.

And try Bernie's array formula once more.  That long formula returns just the
filename and drops the drive and path (all folders).  The formula returns
everything after the last backslash.

It should work for you in xl2003--it worked for me.

Second, your formula looks for exact duplicates for the whole string--not just
the filename.  Your technique doesn't address the OP's problem.

> (TOP-POSTING CORRECTED)
>
[quoted text clipped - 57 lines]
> A: Because it runs counter to the normal flow of communication.
> Q: Why is top-posting frowned upon?

Signature

Dave Peterson

MarkC - 20 Jun 2007 03:39 GMT
I tried both methods and got both of them to work.  Bernie's formula is the
one that I needed for this particular project.   However, it is good know
about the advance filter explained by Alan.  It may come in handy for some
other projects down the road.

I do have one more issue, I have 10,000 rows, is there a quicker way to
"copy down" the formula cells other than holding down the bottom right
corner and dragging?  I have other duplicate filenames sheet tabs that have
20,000+ rows and there's got to be a faster way.  Using Excel 2002.

Thanks,

M

> First, it's the custom of the microsoft.public.excel newsgroups to top
> post.
[quoted text clipped - 76 lines]
>> A: Because it runs counter to the normal flow of communication.
>> Q: Why is top-posting frowned upon?
Dave Peterson - 20 Jun 2007 03:47 GMT
Debra Dalgleish has a few options here:
http://contextures.com/xlDataEntry01.html

> I tried both methods and got both of them to work.  Bernie's formula is the
> one that I needed for this particular project.   However, it is good know
[quoted text clipped - 94 lines]
> >
> > Dave Peterson

Signature

Dave Peterson

MarkC - 20 Jun 2007 05:30 GMT
thanks, Interesting site and in my favorites!
M

> Debra Dalgleish has a few options here:
> http://contextures.com/xlDataEntry01.html
[quoted text clipped - 111 lines]
>> >
>> > Dave Peterson
Gord Dibben - 20 Jun 2007 04:45 GMT
Mark

Enter the formula then double-click on the fill handle to copy down as far as
you have data in an adjacent column.

Or type the formulas in B2 and C2 then in namebox type B2:C10000 then ENTER key.

Then CTRL + d to fill.

Gord Dibben  MS Excel MVP

>I tried both methods and got both of them to work.  Bernie's formula is the
>one that I needed for this particular project.   However, it is good know
[quoted text clipped - 90 lines]
>>> A: Because it runs counter to the normal flow of communication.
>>> Q: Why is top-posting frowned upon?
MarkC - 20 Jun 2007 05:13 GMT
Thanks Gord, I got the namebox method to work.  But for some reason couldn't
get the double-click on the fill handle to work.  Can you explain that one a
little more?  And is it available in Excel 2002?

Thanks,

M

> Mark
>
[quoted text clipped - 111 lines]
>>>> A: Because it runs counter to the normal flow of communication.
>>>> Q: Why is top-posting frowned upon?
Gord Dibben - 20 Jun 2007 15:09 GMT
Yes, it is available in 2002

Enter the formula in top cell of column.

If adjacent column has continuous data, double-clicking on the little black
square at lower right corner of the cell should copy down as far as you have
data in the adjacent column.

Note:  you must have "allow drag and drop" checked in Tools>Options>Edit in
order to get the fill handle.

You did say you could drag/copy so I assume it is checked.

Gord

>Thanks Gord, I got the namebox method to work.  But for some reason couldn't
>get the double-click on the fill handle to work.  Can you explain that one a
[quoted text clipped - 119 lines]
>>>>> A: Because it runs counter to the normal flow of communication.
>>>>> Q: Why is top-posting frowned upon?
MarkC - 21 Jun 2007 03:47 GMT
Got IT!!!

Thanks!
M

> Yes, it is available in 2002
>
[quoted text clipped - 148 lines]
>>>>>> A: Because it runs counter to the normal flow of communication.
>>>>>> Q: Why is top-posting frowned upon?
MarkC - 21 Jun 2007 03:46 GMT
I am retyping Alan's method for quick reference:

*****Assuming that A1 is a column heading and your name entries begin in A2,

1. select column A
2. data> filter > advanced filter> check Unique records only > click OK
3. Alt; (selects visible cells)
4. Data > filter > show all

Step 2 will hide all the duplicates.
Step 3 will highlight the remaining unique records.
Step 4 will make the entire column A reappear, with Unique records
highlighted (grey) and the duplicates un-highlighted (white).*****

My Questions:
Is there an opposite way in doing this?  That is, is there a way to select
non-visible cells?  Or does Excel have a reverse highlight/un-highlight
feature?  Highlight becomes Un-highlighted and vice-versa.

Thanks,
M

> (TOP-POSTING CORRECTED)
>
[quoted text clipped - 58 lines]
> A: Because it runs counter to the normal flow of communication.
> Q: Why is top-posting frowned upon?
MarkC - 19 Jun 2007 01:16 GMT
Thanks, I will give it a try!
M

> Mark,
>
[quoted text clipped - 32 lines]
>>
>> Mark
MarkC - 20 Jun 2007 05:29 GMT
Too ALL:

I have played around with the duplicates and found out with Bernie's method,
it does find all filename duplicates, but the problem is, when deleting the
rows, it removes "ALL" duplicates.  I need to keep one on record to show
that it's on file.

With Alan's method, it looks for exact duplicates for the whole string.  The
good part is when deleting the marked duplicates, it leaves one on record.

M

> Thanks, I will give it a try!
> M
[quoted text clipped - 36 lines]
>>>
>>> Mark
Dave Peterson - 20 Jun 2007 12:55 GMT
I think that the difference is how people understand your original question.

If you want all duplicates marked, then use Bernie's formula:
=IF(COUNTIF(B:B,B2)>1,"Duplicate File Name","")

If you want all duplicates marked but not the first entry, you can use:
=IF(COUNTIF(B$2:B2,B2)>1,"Duplicate File Name","")

> Too ALL:
>
[quoted text clipped - 48 lines]
> >>>
> >>> Mark

Signature

Dave Peterson


Rate this thread:






 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.