
MS Office Forum / Word / General MS Word Questions / November 2005


Loading a document in a different codepage

Tamara - 18 Nov 2005 15:19 GMT
Hi all,

I'm running Windows XP and Microsoft Office 2003.

When I open an existing "*.doc" document that is written in a language
other than English, a window opens and asks which codepage to use to
open that document.

I often read Thai documents, and every time I'm asked which codepage to
use.

But recently I wanted to open a Thai document, and Word didn't ask which
codepage should be used.
It simply loaded the document and filled the screen with all kinds of garbled
characters.

Is there any way to tell Word that this document is to be opened in the
Windows-Thai codepage?

TIA

Tamara
Bob Eaton - 20 Nov 2005 03:40 GMT
In Word: Tools, Options, General tab, "Confirm conversion at Open".

Bob

> Hi all,
>
[quoted text clipped - 20 lines]
>
> Tamara
Tamara - 20 Nov 2005 15:06 GMT
> In Word: Tools, Options, General tab, "Confirm conversion at Open".
>
> Bob

Hello Bob,

I've tried the option you gave above, but Word is still loading the
document without asking for a conversion, and once loaded my screen is filled
with garbage.

Any more suggestions?

TIA,

Tamara

>> Hi all,
>>
[quoted text clipped - 21 lines]
>>
>> Tamara
Bruce Rusk - 20 Nov 2005 19:18 GMT
What format are the files in?

>> In Word: Tools, Options, General tab, "Confirm conversion at Open".
>>
[quoted text clipped - 38 lines]
>>>
>>> Tamara
Tamara - 21 Nov 2005 02:27 GMT
> What format are the files in?

Hello Bruce,

Strange but cute name.

The files are simple Word 6 files.

The document referred to is a manual for a Business Address Manager
package called "Address.Th".

The strange thing is that the document "Manual.doc" is not displayed as a
Thai document on my screen, but another file in this package named
"Treadme.wri" is displayed in the correct character set.

I have uploaded part of this file to
http://s36.yousendit.com/d.aspx?id=1B3N1SR3457E41W2HEJYN354WX

Anybody can download this file from the link and try it out.

Regards,

Tamara

>>> In Word: Tools, Options, General tab, "Confirm conversion at Open".
>>>
[quoted text clipped - 38 lines]
>>>>
>>>> Tamara
Bruce Rusk - 21 Nov 2005 21:31 GMT
I'm sending this in HTML because of formatting issues, apologies if you can't read it.

Does the document begin as follows?

สารบัญ

บทที่ 1. การติดตั้งระบบ

บทที่ 2. เริ่มต้นใช้งาน

บทที่ 3. เลือกข้อมูล

(I don't know Thai so I'm just guessing).

If so, the problem is probably that the file was saved on a non-Unicode setup, perhaps on an earlier Thai-only version of Windows/Office, perhaps using a third-party Thai-language enabling program. I don't know anything about Thai, but this is a big problem for East Asian languages such as Chinese. Before Windows and Office had good Unicode support, a lot of Chinese-language users outside of Chinese locales used such programs to input Chinese characters into Word.

The problem is that the files created this way are not easily readable in later, Unicode-aware versions of Office.

I wrote a simple VBA program that converts such documents in Chinese, and it could probably be adapted to your situation. Michael would be able to tell us if there's an easier way around this but VBA is the only way I've found so far.

The key function would be this:

Function ThaiToUnicode(ThaiText As String) As String
    ThaiToUnicode = StrConv(StrConv(ThaiText, vbFromUnicode), vbUnicode, wdThai)
End Function

Given a string from your document, it will turn it into a correctly-formatted Unicode version of that string. Running it on the first few lines gave the above output, which looks like Thai to me ... and Google suggests that the first word is "manual."
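
The same round trip can be sketched outside Word. The Python below is only an illustration, not the code from this thread: it assumes the garbage on screen is Windows-874 (Thai) bytes that were read as Windows-1252, and the function name thai_from_mojibake is made up for the example.

```python
def thai_from_mojibake(text: str, wrong: str = "cp1252", right: str = "cp874") -> str:
    """Undo a mis-decoding: recover the original bytes, then decode them as Thai.

    Assumption: `text` was produced by decoding Thai bytes with `wrong`
    instead of `right`. Both code page names are standard Python codecs.
    """
    raw = text.encode(wrong)   # back to the original 8-bit bytes
    return raw.decode(right)   # reinterpret those bytes as Windows Thai


# Garbled form of the Thai word for "chapter", as it appears in this thread.
sample = thai_from_mojibake("º··Õè")
```

Called on the garbled heading "º··Õè", this should give back the Thai word for "chapter". Text that has already turned into question marks cannot be recovered this way, because the original bytes are gone.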

If this is the right track, I can send you some code that can be adapted to process a whole document.

Bruce

>> What format are the files in?
>
[quoted text clipped - 62 lines]
>>>>>
>>>>> Tamara
Tamara - 22 Nov 2005 04:20 GMT
> I'm sending this in HTML because of formatting issues, apologies
> if you can't read it.

Don't worry, I can read it.
BTW, if by "formatting issues" you mean writing Thai, the message can be
sent in non-HTML simply by using the Thai encoding format.

> Does the document begin as follows?

Yes Sir, very good work!!!!
A rough translation of what you wrote:

>สารบัญ

Table of Contents

>บทที่ 1. การติดตั้งระบบ

Chapter 1. Installing the software

>บทที่ 2. เริ่มต้นใช้งาน

Chapter 2. Start using the software

>บทที่ 3. เลือกข้อมูล

Chapter 3. Selecting data

Chapter 3. Selecting data

>(I don't know Thai so I'm just guessing).

>If so, the problem is probably that the file was saved on a non-Unicode
>setup, perhaps on an earlier Thai-only version of Windows/Office, perhaps
[quoted text clipped - 5 lines]
>The problem is that the files created this way are not easily readable in
>later, Unicode-aware versions of Office.

The file is dated "23/01/1997".
Lots of files from that time were written on a non-Unicode setup, using
fonts made especially for that particular language (8-bit fonts).
Loading these files without the proper fonts installed results in a screen
full of garbage.
The strange thing is that I have the fonts this file requires (CordiaUPC,
AngsanaUPC, EucrosiaUPC) installed on my machine, but it is still garbled.

>I wrote a simple VBA program that converts such documents in Chinese, and
>it
>could probably be adapted to your situation. Michael would be able to tell
>us if there's an easier way around this but VBA is the only way I've found
>so far.

The easiest way I can imagine (IMHO) is that Word would ask for a conversion,
or at least ask to install the proper fonts if it cannot find them.
The original document specifies that the fonts CordiaUPC, AngsanaUPC and
EucrosiaUPC have been used (I looked this up with a binary editor).
If Word can find these fonts, the document should be displayed correctly.
If not, Word should ask for the original fonts.
I have the multilanguage versions of Word and Windows installed, so
reading documents written in Thai should be possible.

>The key function would be this:

>Function ThaiToUnicode(ThaiText As String) As String
>ThaiToUnicode = StrConv(StrConv(ThaiText, vbFromUnicode), vbUnicode,
[quoted text clipped - 5 lines]
>few lines gave the above output, which looks like Thai to me ... and Google
>suggests that the first word is "manual."

Although I know a little bit of programming in Visual Basic myself, I did not
experiment with that idea.
But if the end result is acceptable, and apparently it is, then this
solution is fine for me.

>If this is the right track, I can send you some code that can be adapted to
>process a whole document.

For me, the end result is most important.
As far as I can see, you did a wonderful job and I wish to thank you for
that.
If it would not be too much trouble for you, I would gladly receive from you
a code as you describe (Visual Basic) that would allow me to process the
whole document.

I would surely appreciate it if Michael or someone from Microsoft could
come up with a macro that converts documents written in non-Unicode
fonts to Unicode fonts, and publish it for the many other users of Office who
face the same problem.

Regards,

Tamara

>Bruce

>> What format are the files in?
>
[quoted text clipped - 62 lines]
>>>>>
>>>>> Tamara
Bruce Rusk - 22 Nov 2005 04:53 GMT
Tamara,

I played around with it a bit more, and here's the solution.

1. Open the .DOC file in Wordpad, not Word (in XP you can right-click, Open
With -> WordPad).

2. Save As a new file name (e.g. Manual2.doc) in Rich Text Format.

3. Open the new file in Word. The Thai text might display in the wrong font;
if so, just select all and change it to the Thai font.

Should be fine, though some of the formatting might not be preserved.

Bruce
Bruce Rusk - 22 Nov 2005 05:19 GMT
Or, alternately, open in WordPad, paste into a blank Word document, change
font.

> Tamara,
>
[quoted text clipped - 11 lines]
>
> Bruce
Tamara - 23 Nov 2005 01:41 GMT
Dear Bruce,

I have tried both options you gave, but with no success.
The text was displayed as question marks every time.

Building on your idea, I tried to open the "*.doc" file with
Internet Explorer (right-clicking and choosing "Open with").

The file was then loaded into Word as a "Web" file and, strangely enough, it
was loaded in the right Thai font.
I guess that some of the formatting might be lost, but that is less
important.

The strange thing is that the file, which was originally more than 12MB, was
reduced to 1,768KB after the conversion.
The manual is complete as far as I can see.

I have uploaded the result so you can see for yourself at:
http://s35.yousendit.com/d.aspx?id=0BIO3EY2T6YOH0QIEXR8AARKDN

If you don't mind, I would gladly receive the program code to process a file
as you said.
Maybe it will come in handy for loading other files if there is a problem.

Thanks to all the people who helped me with this problem.

Regards,

Tamara

> Or, alternately, open in WordPad, paste into a blank Word document, change
> font.
[quoted text clipped - 14 lines]
>>
>> Bruce
Bob Eaton - 22 Nov 2005 18:02 GMT
> I would surely appreciate if Michael or some people from Microsoft could
> come up with a Macro that would convert documents written in non-Unicode
> fonts to Unicode fonts and publish it for the many other users of Office
> who faces the same problem.

Check out http://scripts.sil.org/EncCnvtrs for the "Data Conversion Macro"
which can be used to convert documents from legacy fonts/encodings to
Unicode (or other Unicode text processing such as transliteration, etc).

The current version supports code page converters (e.g. Thai code page =
874), but they can't be configured directly (it'll be there in the next
version--until then, see below for how to add a code page converter to the
repository).

Once the converter is added to the repository, then it can be used in any
COM/.Net enabled application and in particular, the Data Conversion Macro is
a VBA client app for converting Word documents.

Bob

To add a code page converter, use something like the following:

Sub AddThaiCodePageConverter()
   Dim aECs As EncConverters
   Set aECs = CreateObject("SilEncConverters.EncConverters")
   aECs.Add "Thai Code Page", "874", ConvType_Legacy_to_from_Unicode, _
       "Thai", "UNICODE", ProcessTypeFlags_CodePageConversion
End Sub
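
For plain 8-bit text files (not binary .doc files), a code page conversion like the one registered above can also be sketched in a few lines of Python. This is an illustration under stated assumptions rather than part of EncConverters: the function name and file names are hypothetical, and 874 is the Thai code page mentioned in this post.

```python
from pathlib import Path


def legacy_text_to_utf8(src: str, dst: str, codepage: str = "cp874") -> int:
    """Decode an 8-bit text file with the given code page and save it as UTF-8.

    Returns the number of characters converted. The name and defaults are
    hypothetical; only plain text is handled, not Word's binary format.
    """
    raw = Path(src).read_bytes()               # the untouched legacy bytes
    text = raw.decode(codepage)                # fails on bytes the code page leaves undefined
    Path(dst).write_bytes(text.encode("utf-8"))  # write bytes so line endings stay as they were
    return len(text)
```

A call such as legacy_text_to_utf8("readme_thai.txt", "readme_utf8.txt") would leave the original file alone and write a Unicode copy next to it.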
Tamara - 23 Nov 2005 01:57 GMT
>> I would surely appreciate if Michael or some people from Microsoft could
>> come up with a Macro that would convert documents written in non-Unicode
[quoted text clipped - 26 lines]
>        ProcessTypeFlags_CodePageConversion
> End Sub

Dear Bob,

Thank you for the link to the "Data Conversion Macro" and the code to add a
code page converter to MS Office.

I've finally been able to open the document without losing any vital
information.

Regards,

Tamara
Michael (michka) Kaplan [MS] - 21 Nov 2005 00:36 GMT
Have you tried changing the setting in the FileOpen dialog?

Signature

MichKa [Microsoft]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure, Fonts, and Tools
Blog: http://blogs.msdn.com/michkap

This posting is provided "AS IS" with
no warranties, and confers no rights.

>> In Word: Tools, Options, General tab, "Confirm conversion at Open".
>>
[quoted text clipped - 38 lines]
>>>
>>> Tamara
Tamara - 21 Nov 2005 02:33 GMT
> Have you tried changing the setting in the FileOpen dialog?

Hello Michka,

I have tried almost everything that I can think of before posting this
problem to Usenet.

I even opened the file with a binary editor and tried to make changes to
the file in the hope that Word would not recognise what language the
document is written in and would ask me before displaying it.

This has sometimes worked fine in similar cases.

Anyhow, I have uploaded part of the document to
http://s36.yousendit.com/d.aspx?id=1B3N1SR3457E41W2HEJYN354WX

Anybody can download the document from this link and see what the cause
of this problem is.

Regards,

Tamara

>>> In Word: Tools, Options, General tab, "Confirm conversion at Open".
>>>
[quoted text clipped - 38 lines]
>>>>
>>>> Tamara
Michael (michka) Kaplan [MS] - 21 Nov 2005 10:31 GMT
The option you want is the "Open and Repair" option -- the dropdown on the
"Open" button in the file open dialog.

It is also possible that the files are corrupt and not just in some other
code page....


>> Have you tried changing the setting in the FileOpen dialog?
>
[quoted text clipped - 61 lines]
>>>>>
>>>>> Tamara
 