Hi,
As for the hard ¶ returns at the end of each line:
For tips, see http://www.word.mvps.org/FAQs/Formatting/CleanWebText.htm
Often in such files, the "line breaks" have a space before them, while the
"real" paragraph marks do not. In that case, you could replace " ^p" (a space
followed by a paragraph mark) with a single space.
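If you would rather clean the text file with a script before opening it in Word, the same idea could be sketched in Python roughly as follows. This is only an illustration: the function name join_wrapped_lines is made up, and it assumes the file uses "\n" or "\r\n" line endings.

```python
import re

def join_wrapped_lines(text: str) -> str:
    # Hypothetical helper: a space followed by a line break is taken to be
    # a soft wrap, so the pair is replaced by a single space.
    # Line breaks without a preceding space are left alone.
    return re.sub(r" \r?\n", " ", text)

sample = "This sentence was \nwrapped by the editor.\nA real new paragraph."
print(join_wrapped_lines(sample))
```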
Or the "real" paragraph breaks are marked by two or more consecutive
paragraph marks, in which case you could first format those runs of
paragraph marks as, say, bold:
Edit > Replace, check "Use wildcards",
Find what: ^13(^13){1,}
Replace with: ^& (and Format > Font > Bold)
Then, in a regular Replace (wildcards off), replace the remaining single
paragraph marks, i.e. those that are not bold, with spaces.
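Outside Word, the two steps collapse into one, because a regular expression can check the neighbours of a line break directly. A minimal Python sketch, assuming "\n" line endings (the function name unwrap_single_breaks is just an example):

```python
import re

def unwrap_single_breaks(text: str) -> str:
    # Replace a line break with a space only if it is neither preceded
    # nor followed by another line break. Runs of two or more breaks
    # (the "real" paragraph ends) are left untouched.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)

sample = "line one\nline two\n\nnext paragraph\ncontinues here"
print(unwrap_single_breaks(sample))
```

Note that a single line break at the very end of the text would also be turned into a space, so you may want to strip the text first.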
As for the headings/outline levels: How have they been applied in the text
file?
Often there are, say, three empty paragraphs above a heading and one empty ¶
below it.
Then you could remove the empty ¶s and add a tag with a wildcard
replacement:
Find what: (^13)^13^13^13([!^13]@^13)^13
Replace with: \1<H>\2
That gets rid of the empty paragraph marks. (In a wildcard "Find", use ^13
instead of ^p; \1 inserts the first parenthesized expression, \2 the
second.)
It also inserts "<H>" as a marker tag in front of the designated heading.
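For comparison, here is roughly how that wildcard replacement might translate to a Python regular expression, in case you preprocess the file with a script. It is a sketch only: it assumes each paragraph mark is a single "\n", and tag_spaced_headings is an invented name.

```python
import re

def tag_spaced_headings(text: str) -> str:
    # (\n)          end of the preceding paragraph, kept as \1
    # \n\n\n        the three empty paragraphs above the heading, dropped
    # ([^\n]+\n)    the heading line and its paragraph end, kept as \2
    # \n            the empty paragraph below the heading, dropped
    return re.sub(r"(\n)\n\n\n([^\n]+\n)\n", r"\1<H>\2", text)

sample = "body text\n\n\n\nChapter One\n\nFirst paragraph."
print(tag_spaced_headings(sample))
```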
Or (other) headings might be distinguishable from regular text because they
are short (say between 2 and 20 characters) and don't end with a punctuation
mark (?!.).
Then you could insert a tag with a wildcard replacement:
Find what: (^13)([!^13\!\?.]{2,20}^13)
Replace with: \1<H>\2
[!^13\!\?.] matches any character except a ¶ mark and the punctuation marks
"! ? ."
{2,20} looks for between 2 and 20 of them.
If your regional settings use a semicolon instead of a comma as the list
separator, write {2;20} instead.
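The same heuristic could be tried in a script like this. Again only a sketch, assuming "\n" paragraph ends; tag_short_headings is a made-up name. As in the Word pattern, the character class excludes "! ? ." anywhere in the line, not just at the end.

```python
import re

def tag_short_headings(text: str) -> str:
    # (\n)                  end of the preceding paragraph, kept as \1
    # ([^\n!?.]{2,20}\n)    a whole line of 2 to 20 characters that
    #                       contains none of "! ? .", kept as \2
    return re.sub(r"(\n)([^\n!?.]{2,20}\n)", r"\1<H>\2", text)

sample = (
    "Some intro sentence that ends with a period.\n"
    "Overview\n"
    "This paragraph is long enough and ends with a full stop.\n"
)
print(tag_short_headings(sample))
```

Since each match consumes the line break after the heading, a second short line directly following a tagged one is not tagged in the same pass; running the replacement again would catch it.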
Then, in two regular replacements, first find the tag <H> and replace it with
a "Heading" style of your choice (Replace with: Format > Style), which applies
the style to the paragraph; then replace the tag with nothing to delete it.
How to best do it depends a lot on how the file looks exactly...
Regards,
Klaus