Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
DiscussionsAccessExcelInfoPathOutlookPowerPointPublisherWord
DirectoryUser Groups
Related Topics
Outlook ExpressInternet ExplorerWindowsMS Server ProductsMore Topics ...

MS Office Forum / Excel / New Users / August 2007

Tip: Looking for answers? Try searching our database.

Extracting some text from a string.

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Art MacNeil - 18 Aug 2007 01:18 GMT
Hello,

 I'm trying to extract some specific text from a bunch of strings of text.

I think this should be really easy but I can't get it done. I've used "Left"
"Mid", "Substitute" and "Find" in one combination of another and sometimes
it works but it doesn't always work.

Regardless of the path, what I need is, the string of text after the final
"\" and before the ".xls"

I've copied a sample of the strings I'm working with below.

C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\5 year compare\5yearscomparePF.xls
C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\CCA\CCA Line 1 to 2475.xls
C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\CCA\CCA Lines 2475 & down.xls
C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\RRSP Deduction pf\RRSP Deduction PF.xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\5 year compare pf.xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\CCA Line 1 to 2475.xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\CCA Lines 2475 & down.xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Federal\RRSP Deduction pf.xls
C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\Annexe A\Annexe A T P.xls
C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\LSVCC\LSVCC (Québec) T P.xls
C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\QVEH-B\QVEH-B series T P.xls
C:\Source\qa\T P 2006\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\TP1-Comparative_Summary\TP1_Comparative_TAXSUMMARY_PF.xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\Annexe A.xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\LSVCC (Québec).xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\QVEH B.xls
E:\T Ps\T P 2007\QT, IR, QTW, IRW\Tax Components\Calculation
Testing\Québec\TP1 Comp Summary pf.xls

The Text I need is:
5yearscomparePF
CCA Line 1 to 2475
CCA Lines 2475 & down
RRSP Deduction PF
5 year compare pf
CCA Line 1 to 2475
CCA Lines 2475 & down
RRSP Deduction pf
Annexe A T P
LSVCC (Québec) T P
QVEH-B series T P
TP1_Comparative_TAXSUMMARY_PF
Annexe A
LSVCC (Québec)
QVEH B
TP1 Comp Summary pf

Thank you for your help,

Art.
Rick Rothstein (MVP - VB) - 18 Aug 2007 01:51 GMT
If your 'xls' extensions are all always in lower case letters, then this
formula will work...

=SUBSTITUTE(SUBSTITUTE(A1,LEFT(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))),""),".xls","")

If the extensions can be all lower or all upper case letters, then this
formula will work....

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,LEFT(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))),""),".xls",""),".XLS","")

Note though that neither formula will work if there is a mixture of upper
and lower case letters in the extension part of the string.

Rick

> Hello,
>
[quoted text clipped - 63 lines]
>
> Art.
MartinW - 18 Aug 2007 02:14 GMT
Hi Rick,

Perhaps this formula which I believe was one of yours might work better.
=MID(A1,FIND("~",SUBSTITUTE(A1,"\","~",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))+1,999)
then in C1 put
=LEFT(B1,LEN(B1)-4)
This allows for upper and lower case as well as .doc or any other extension.

Now if you could just show us how to combine the two.....I've been trying
but I just can't get the syntax right.

Regards
Martin

> If your 'xls' extensions are all always in lower case letters, then this
> formula will work...
[quoted text clipped - 79 lines]
>>
>> Art.
Rick Rothstein (MVP - VB) - 18 Aug 2007 02:58 GMT
Hey, I forgot I posted that one (I tend to re-create these things as the
questions come in). Here is the combined formula...

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,LEFT(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))),""),".xls",""),".XLS","")

It is 38 characters longer than the one I posted to this thread and requires
some 5 additional function calls (the extra overhead coming from having to
remove the extension).

Rick

> Hi Rick,
>
[quoted text clipped - 94 lines]
>>>
>>> Art.
Art MacNeil - 18 Aug 2007 02:35 GMT
This worked perfectly:

=SUBSTITUTE(SUBSTITUTE(A1,LEFT(A1,FIND("*",SUBSTITUTE(A1,"\","*",LEN(A1)-LEN(SUBSTITUTE(A1,"\",""))))),""),".xls","")

I can't believe how small the formula is.  Mine was huge, and it didn't work
in all cases.

Thank you very much,

Art.

> If your 'xls' extensions are all always in lower case letters, then this
> formula will work...
[quoted text clipped - 79 lines]
>>
>> Art.
Ron Coderre - 18 Aug 2007 03:24 GMT
Try this:

B1:
=SUBSTITUTE(RIGHT(A1,MATCH("\",INDEX(MID(A1,LEN(A1)+1-ROW($1:$255),1),0),0)-1),".xls","")

That assumes the file name does not exceed 255 characters.

Is that something you can work with?
***********
Regards,
Ron

XL2003, WinXP

> Hello,
>
[quoted text clipped - 63 lines]
>
> Art.
Art MacNeil - 18 Aug 2007 07:55 GMT
This worked as well.

Thank you,

Art.

> Try this:
>
[quoted text clipped - 81 lines]
>>
>> Art.
Harlan Grove - 18 Aug 2007 08:19 GMT
"Art MacNeil" <artmacneil@shaw.ca> wrote...
...
>Regardless of the path, what I need is, the string of text after the final
>"\" and before the ".xls"
...

Base filenames.

Another alternative, define the name seq referring to the formula

=ROW(INDEX($1:$65536,1,1):INDEX($1:$65536,1024,1))

Then with one of these full pathnames in cell A2, try the formula

=SUBSTITUTE(REPLACE(A2,1,LOOKUP(2,1/(MID(A2,seq,1)="\"),seq),""),".xls","")

For more general parsing tasks, you'd be better off downloading and
installing Laurent Longre's MOREFUNC.XLL add-in and using it's regular
expression functions. Using it you could extract base filenames using

=REGEX.SUBSTITUTE(A2,"^.+\\([^\\]+?)(\.[^.\\]*)?$","[1]")
Ron Rosenfeld - 18 Aug 2007 12:08 GMT
>"Art MacNeil" <artmacneil@shaw.ca> wrote...
>...
[quoted text clipped - 17 lines]
>
>=REGEX.SUBSTITUTE(A2,"^.+\\([^\\]+?)(\.[^.\\]*)?$","[1]")

Or you could use (a little shorter):

=REGEX.MID(A1,"[^\\]*(?=\.xls$)")

or, if other than xls files might be present:

=REGEX.MID(A1,"[^\\]*(?=\.\w{3}$)")

--ron
Ron Rosenfeld - 18 Aug 2007 15:29 GMT
>>"Art MacNeil" <artmacneil@shaw.ca> wrote...
>>...
[quoted text clipped - 27 lines]
>
>--ron

The non-XLS file version regex could be augmented to something like:

[^\\]*(?=\.[^\\]*$)

or

[^\\]*(?=\.[^\\]+$)

--ron
Art MacNeil - 18 Aug 2007 19:25 GMT
> On Sat, 18 Aug 2007 07:08:43 -0400, Ron Rosenfeld
> <ronrosenfeld@nospam.org>
[quoted text clipped - 42 lines]
>
> --ron

Thank you.

I hate to say it, but I already had downloaded the MOREFUNC.XLL add-in
installed.  I just didn't realize it could be used to do this.

Art.
Ron Rosenfeld - 18 Aug 2007 20:00 GMT
>Thank you.
>
>I hate to say it, but I already had downloaded the MOREFUNC.XLL add-in
>installed.  I just didn't realize it could be used to do this.
>
>Art.

Once you get used to using Regular Expressions, you'll find that complex text
manipulations become pretty simple.

By the way, so far as I know, MOREFUNC.XLL has not been updated to work with
XL2007, so some of the functions don't work.

Also, one limitation of MOREFUNC is 255 characters.  This is a limitation of
the XLL, apparently.  But the Regular Expression functions can all be
duplicated in VBA, with only minor changes in the expressions.
--ron
Harlan Grove - 18 Aug 2007 19:47 GMT
"Ron Rosenfeld" <ronrosenfeld@nospam.org> wrote...
...
...
>>>=REGEX.SUBSTITUTE(A2,"^.+\\([^\\]+?)(\.[^.\\]*)?$","[1]")
...
>The non-XLS file version regex could be augmented to something like:
>
[quoted text clipped - 3 lines]
>
>[^\\]*(?=\.[^\\]+$)
...

I have nastier filenames on my system, e.g., .RHistory which begins with a
period and has no other period. My REGEX.SUBSTITUTE formula returns it while
your REGEX.MID formula returns "". Either may be appropriate depending on
exactly how one wants to handle such filenames. However, there are also
filenames that have no periods whatsoever, and with those my formula returns
then while yours still returns "".

I had tried REGEX.MID with several variations on your ending assertion to
handle such filenames, but I gave up and used (what I consider to be) the
much more reliable REGEX.SUBSTITUTE.
Ron Rosenfeld - 19 Aug 2007 01:04 GMT
>"Ron Rosenfeld" <ronrosenfeld@nospam.org> wrote...
>...
[quoted text clipped - 20 lines]
>handle such filenames, but I gave up and used (what I consider to be) the
>much more reliable REGEX.SUBSTITUTE.

It is certainly the case that my REGEX.MID function assumes that the filename
will have a suffix starting with a "."

How do you differentiate a file name that has no suffix from a subfolder name?
I suppose you could look for a terminal "\", but you'd have to ensure that
would be in the list you're examining.
--ron
Harlan Grove - 19 Aug 2007 03:47 GMT
"Ron Rosenfeld" <ronrosenfeld@nospam.org> wrote...
...
>How do you differentiate a file name that has no suffix from a subfolder
>name?
>I suppose you could look for a terminal "\", but you'd have to ensure that
>would be in the list you're examining.

Exactly. If there's a \ to the right, you're NOT looking at a filename.
Conversely, folders CAN have periods in their names.

As for ensuring there are \s, it didn't (and still doesn't) seem relevant to
this thread, but if filenames MIGHT lack drive/directory paths, then use

=REGEX.SUBSTITUTE(A1,"^(.+\\)?([^\\]+?)(\.[^.\\]+)?$","[2]")
Ron Rosenfeld - 19 Aug 2007 12:37 GMT
>"Ron Rosenfeld" <ronrosenfeld@nospam.org> wrote...
>...
[quoted text clipped - 10 lines]
>
>=REGEX.SUBSTITUTE(A1,"^(.+\\)?([^\\]+?)(\.[^.\\]+)?$","[2]")

OK.  That makes sense.  And it's been a long time since I've worked with other
file systems.  As usual, the best Regex to use depends on the nature of the
data and what is to be extracted.

It is the case that "[^\\]*(?=\.[^\\]+$)" will match just the file names in all
of the examples given by the OP.  And if he were only interested in .xls files,
then

=REGEX.MID(A2,"[^\\]*(?=\.xls$)")

But yours is more general, and could be easily adapted to other situations.
--ron

Rate this thread:






 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.