Word: Problems with Excel data sources

Home Up

 

 

Google
 


 

If you're a Word 2002/2003/2007 user and you're in a hurry...

If you are having problems using an Excel Workbook as a data source for a MailMerge in Word 2002 (Word XP), Word 2003, or Word 2007 many problems can be solved in one of two ways (and I'm not so sure when it comes to Word 2007)

1. Use Word "field formatting switches" to modify the way Word displays your data

You can use formatting switches in your Word MERGEFIELD fields and other types of field. You will find plenty of information about many of the problems and solutions on Graham Mayor's pages at
http://www.gmayor.com/mail_merge_labels_with_word_xp.htm (look for the Excel section) and
http://www.gmayor.com/formatting_word_fields.htm

2. Change the way Word gets your data to "DDE"

Word has several ways to connect to Excel data. In Word 2002 and later, Word uses something called OLE DB unless you say otherwise. You can change Word so that it uses the way that Word gets its data from Excel to a thing called "DDE." DDE is an old method that does not understand everything in newer Excel workbooks, so it is not a good long-term choice. In particular, DDE only has a chance of working if:
 1. you have Excel on your system and
 2. you are connecting to the first worksheet in your workbook 

To change to DDE:
 a. in Word, make sure a document (it can be a blank one) is open, then click the Tools|Options menu option
 b. select the General tab
 c. check "Confirm conversion at open"
 d. close the dialog box
 e. go through the process of connecting to your data source again
 f. after you have selected the file in the "Select Data Source" dialog box and clicked "Open", you should see an additional dialog box titled "Confirm Data Source"
 g. select "MS Excel Worksheets via DDE (*.xls)" and click OK.

To see detailed screenshots of the above procedure, click here. It's a big page with lots of graphics - about 2Mbytes.

Notes:
 A. The "Confirm conversion at open" option is "sticky," so every time you open a file or a data source while this option is set, you are likely to see a dialog box.
 B. Changing the connection method is not "sticky" - the next time you need to connect to a workbook using DDE you will need to use the same process.

...If you're not a Word 2002/2003/2007 user, or you're not in a hurry, or the above approaches didn't help

Unfortunately, I cannot provide "The Answer", partly because there is no single solution that works in all cases, and partly because some of the problems are still unexplored. So the following notes describe what my current understanding of how Word gets data from Excel. It is not at all straightforward, so bear with me...

A short note for Mac Word users: I only have experience of Mac Word 2004, which only has one connection method for connecting to Excel. It behaves in a rather similar way to the DDE method in Windows except that you can connect to any worksheet in the workbook. The main problem with it is that it does not work properly with non-ANSI Unicode data in the workbook. At the moment I can't offer a workaround for that problem except that if your worksheet has 63 columns or fewer, you should be able to copy/paste it into a Word document and use that as the data source. If you can paste non-tabular data, you may be able to go beyond that 63-column limit. The rest of this document is concerned with the Windows version of Word.

How (Windows) Word gets data from Excel

Word can get data from Excel using "OLE DB", "DDE", "ODBC", or a "converter". These things are often referred to as Connection methods.

For further details on these methods, look here. Otherwise, the key points are as follows:

 a. In Word 2000 and earlier Word will try to get data from Excel using DDE by default - that is, unless you specify otherwise.

 b. in Word 2002, 2003, Word will try to get data from Excel using the Jet OLE DB Provider by default. In Word 2007, Word will use the Ace OLE DB Provider by default. Jet is the database engine that is used by Microsoft Access to maintain data in its .mdb format files. It is also capable of reading and writing a number of other formats including some older dBase formats, delimited text files, Excel workbooks and others. In Office 2007, Microsoft produced a modified version of Jet called Ace. A key change for Word users is that the Ace OLE DB Provider can read data in the new Excel 2007 XML formats - .xlsx and so on.

 c. DDE uses Excel to get the data more or less in the format you see it in Excel. However, DDE can only get data from the first worksheet in your workbook. It cannot read Unicode data properly. It isn't maintained or supported. Because it uses Excel, Excel must be on your system, and if you are trying to automate Word to do a MailMerge, the whole process becomes much more complicated because Excel must run.

 d. In contrast, the OLE DB Provider does not use Excel, except, perhaps when the workbook you want to access is already open in Excel. The OLE DB Provider can get data from any worksheet that it can see in your workbook. It reads Unicode text data properly. It can retrieve very long memo fields. However, the OLEDB provider does not always get the data as you see it in Excel, and in some cases it can lose data such as memo fields, text and so on. Also, unless the workbook is already open in Excel, the OLE DB Provider
    - cannot get data from the older Excel Spreadsheet XML format used in Excel 2003.
    - cannot get more than 255 columns (so you can't even use the 256 available in many versions of Excel, let alone the increased count in Excel 2007
     - cannot get data from an Excel workbook that has been encrypted using a "Password to open" (e.g. in Excel|Tools|Options|Security). (If the workbook has a "Password to modify" but no "Password to Open", the OLE DB Provider should be able to get data from it)

 e. As far as I am aware, in Word 2003 no method can open Excel data sources (or any other kind of data source) located on an Internet URL such as an http or ftp URL. The file has to be accessible via the Windows file system, either locally or via a Windows share (I do not think you can open files via a WebDAV share either). Even in that case, you will need the necessary Windows file system permissions to open the file - they may be permissions on the file itself, the folder it is in, or the share.

 f. If the workbook is already open in Excel, all the methods seem to be able to connect to it. However, the Converter asks whether you want to make a read-only copy. The other methods never seem to report a "read-only" problem.

 g. If the workbook is encrypted using a "Password to Open", only the DDE method wil be able to get data from it if the workbook is closed. DD will get Excel to open the book, and the user will be prompted for the password. if the workbook is already open in Excel, DDE, ODBC and OLE DB all appear to work, but the Converter asks whether you want to make a read-only copy, then tries to open it and cannot.

 h. if the Workbook is not already open in Excel but you have opened it as a Word data source, then Excel pops up its "File in use" dialog box and ask if the user wants to open the file Read-Only. The user can avoid this dialog by opening the workbook using Excel File|Open and specifying "Open Read-Only" in the Open button's dropdown option list. (If Word used DDE to open the Workbook, it will already be open in Excel). Trying to change the way that Word opens the workbook by setting the OLE DB Provider's connection string "Mode" parameter to a value such as "Share Deny None"  does not work around this problem.

 The remainder of this article is about what the OLE DB Provider actually does. The information provided here comes from experiments that have been done mainly on a Windows XP Pro SP2 system  running Office 2003 Pro SP2 both fully patched up to around 25 January 2007. The experiments have all been to do with setting up Excel as a mail Merge Data Source, rather than as a datasource for a Word DATABASE field.

Using an Excel data source in Word while it is open in Excel

Broadly speaking, Word seems to be able to use an Excel workbook as a data source even when you have it open in Excel, whichever method you use to connect.

Why doesn't the OLE DB provider get the data as I see it in Excel?

None of the Connection methods transfer anything other than the values in the cell, with or without text formatting such as currency signs, numeric separators, percentage signs and so on,. None of them transfers typeface or font size, text or cell colour information and so on.

However, when you format a value in Excel, so that you see, say $1,234.50 in a cell, OLE DB may only transfer the value 1234.5.

There are two main reasons for this:

1. Excel does not actually store the data as you see it

The OLE DB Provider does not retrieve the data you see in Excel from the .xls workbook file because Excel does not store the data as you actually see it, and the Provider does not apply all the formatting information that Excel stores about each cell.

For example, suppose you enter the number 1234.5 in an Excel cell, then format it as a US dollar currency value, Excel will display it as $1,234.50 . However, when you save the workbook, Excel does not store that displayed value anywhere. What it (probably) stores is:
 a. the text "1234.5"
 b. a "type", in this case "Number"
 c. a format or layout specification, in this case perhaps "[$$-409]#,##0.00"

I say "probably" because I don't have the details of the proprietary binary .xls format and cannot see what Excel actually stores. However, in Excel 2003 it is possible to see what Excel probably stores in .xls format by saving in Spreadsheet XML format and examining it using (say) Notepad or the XML Notepad application that you can download from the Microsoft website.  The XML format definitely does not store "$1,234.50"

Not surprisingly, Excel understands its own format specifications and is capable of reconstructing the display text "$1,234.50" when you re-open the Workbook. This is why the DDE connection method can return the data you see in Excel - it lets Excel do all the formatting work for it. Excel format specifications are richer and more complicated than the formatting switches used in Word MailMerge. For example, Excel can use:
 d. 'Short Date', i.e. the Short date format set up in your Regional options in Control Panel (I think)
 e. '"£"#,##0.00;[Red]"£"#,##0.00' for UK pounds that display in Red when negative.
 f. '[<=9999999]###\-####;\(###\)\ ###\-####'' for a U.S. telephone number.

However, the OLE DB provider does not use the same code as Excel and does not appear to understand or apply some of the format specifications that Excel uses, and in some cases probably applies its own rules. One thing I have noticed is that the formats that Excel uses if you select a format from the Currency column are sometimes different from the formats you get in the Custom column  . In particular
 g. '[$$-409]#,##0.00' is the standard format for US dollars (when working on a UK Windows system). The OLE DB provider does not appear to understand the [$$-409].
 h. '"$"#,##0.00' is a US dollar format picked from the "Custom format" category in Format|Cells... while the Windows Regional Options locale was set to "United States".

2. The OLE DB Provider has to determine a specific data type for each column.

In order to conform to OLE DB standards, the OLE DB provider has to determine a "data type" for each column in the data source. This is normal practice in the database world. For example, in an Access database you specify whether a column is numeric, text, YesNo (Boolean), date/time and so on. You simply can't store a text value such as mytext in a numeric column in Access - the software will not let you.

In Excel, by contrast, every cell in a column could contain a different type of data. Excel encourages this, because it is designed to recognise certain types of data and act accordingly. It should be able to - it's  a spreadsheet, not a database. For example, if you enter a number such as 1234.5 into a cell, Excel right-justifies it. Enter the text mytext and Excel left-justifies it. Enter the text true and Excel recognises it as a "Boolean" (or "logical") value, and displays the text TRUE in the middle of the cell. So if the OLE DB Provider finds a column with all those different types of data in it, it has to do two things:
 a. determine a data type for the entire column
 b. decide what to do with the values in the column that do not have that type.

The problem with this approach is that if the OLE DB Provider decides that a column is numeric, then it has to discard the contents of a cell containing the text mytext.  The only thing the Provider can do is to set that value to a numeric value (and traditionally, it would be 0) or to a special "null" value. So you see "mytext" in Excel, but "0" or "" in Word.

There's one possible way around this, and that would be for the OLE DB Provider to decide that every column was a text column and to convert all the values in every column to text. That way, it could return mytext as "mytext", the number 1234.5 as the text "1234.5", the Boolean TRUE as the text "TRUE", and so on. In other words, every value gets through, and anything that needs to convert the text "1234.5" back to the number 1234.5 can do so.

Unfortunately, the OLE DB Provider does not do that.

How the OLE DB determines the data type for a column

1. Data types returned by the provider

The Provider recognises the following types as far as I can tell:
  - numeric
  - currency
  - Boolean
  - date/time
  - string (text up to 255 characters long)
  - memo (text over 255 characters long)

2. The IMEX, TypeGuessRows, and ImportMixedTypes settings

When a program such as Word invokes the Provider, it can tell it to use one of three different "IMEX" values. "IMEX" is just a contraction of "Import/Export mode". The Provider recognises IMEX=0, IMEX=1 and IMEX=2.

The provider also looks at two values stored in the Windows Registry. These are called TypeGuessRows and ImportMixedTypes.

By default, TypeGuessRows is set to 8. In theory it can be set to any number between 0 and...a larger number than the maximum number of rows in an Excel worksheet (65536 for Excel 2003, 1048576 for Excel 2007).

By default, ImportMixedTypes is set to "Text". The Provider recognises two values "Text" and "Majority type". If this value is set to anything else, e.g. "Other", the Provider will raise an error and fail.

3. Determining column type

For each column in the worksheet, the Provider reads a maximum of TypeGuessRows data cells (i.e. ignoring the column name cell), and counts the number of cells with each data type. Most documentation suggests that if TypeGuessRows is set to 0, the provider reads the entire column, but experiments suggest it actually reads 16384 rows of data. This was perhaps a maximum row count in an earlier version of Excel.

If IMEX=0 or IMEX=2, or IMEX=1 and ImportMixedTypes="Majority type", or the provider only found one type in the column) the Provider uses the "Majority type" to determine the column type. So for example, if TypeGuessRows is 8 and in the first 8 rows of a column there are 5 numbers and 3 string values, the Provider sets the column type to Number. If there are equal counts of numbers and strings, "numbers" will "win". If there are 5 Boolean values and 3 strings, the Provider will set the column type to Boolean. And so on. However, there are two important exceptions:
  - if there are only strings and/or memos in the first TypeGuessRows of the column, the Provider sets the column type to string if it only finds strings and memo if it finds any memos at all
  - if there are any cells with a data type other than string or memo in the first TypeGuessRows in the column, the Provider behaves as if there are no memo values. For example, even if the column has 7 memos and only 1 number, "number" will win.

 h. Otherwise, i.e. if IMEX=1, ImportMixedTypes="Text", and the provider has found different types in the first TypeGuessRows rows, the provider sets the column type to "string", unless it found a memo in which case it sets the column type to memo.

4. Converting/formatting the data to fit the column type

Once the provider has decided the column type, it decides what to do with the values in the column that do not have that type. However, it does this differently depending on whether you specified IMEX=0 or 2, or IMEX=1. The following table shows the possibilities. The column on the left shows examples of data that you might have in your Excel worksheet. Columns 2-6 show what happens to that data in each type of column that the OLE DB Provider can return.

For example, if IMEX=0 and the Provider decides that a column should be "Boolean", a cell containing the number 1 will appear as "True" in Word. The texts in this table are the results of Word MERGEFIELD fields.

IMEX=0

Column type decided by the Provider Result

Data type/value in Excel Numeric Currency Date/time Boolean String Memo
null/empty cell null/empty null/empty null/empty null/empty null/empty null/empty
Numeric, e.g. 5432.1 5432.1000000000004 1234.5 11/14/1914 2:24:00 AM True null/empty null/empty
Numeric -1 -1 -1 12/29/1899 True null/empty null/empty
Numeric 0 0 0 12:00:00 AM False null/empty null/empty
Numeric 1 1 1 1/1/1900 True null/empty null/empty
Currency, e.g. $5,432.10 5432.1000000000004 1234.5 11/14/1914 2:24:00 AM True null/empty null/empty
Date ("general date") 29/02/2008 39507 39507 2/29/2008 True null/empty null/empty
Time 23:59 0.99930555555555556 0.9993 11:59:00 PM True null/empty null/empty
Date/Time 29/02/2008 23:59 39507.999305555553 39507.9993 2/29/2008 11:59:00 PM True null/empty null/empty
Boolean TRUE 0 0 12:00:00 AM True null/empty null/empty
Boolean FALSE 0 0 12:00:00 AM False null/empty null/empty
String, e.g. abc 0 0 12:00:00 AM False abc abc
Memo 0 0 12:00:00 AM False the memo, truncated to 255 characters the memo

Perhaps the main things to notice in the IMEX=1 table are the problems frequently reported by users, including
 - short numbers turning into very long ones (but only when the Provider decides the column is numeric, not mixed)
 - currency and other formatting characters disappearing
 -dates often in "U.S." MM/DD/YYYY format even when they have been entered as U.K. DD/MM/YYYY dates, again only when the Provider decides that the column is date/time. Note also that for users who normally sue DD/MM/YYYY the Word Mergefied switch \@"DD/MM/YYYY" always worked with the version of Word and OLE DB driver tested. This has not always been the case, but if that's your problem "apply all the updates for Word and Windows" is probably the solution
 - the treatment of strings and memos
 - 5-digit ZIP codes should be usable whether they are returned as string/memo or number (as long as you use a \#"00000" format)
 - 9-digit ZIP and phone numbers are returned in a less useful format in string/memo fields.
 - I haven't listed the result for percentages etc. but you can obviously do your own similar tests.

IMEX=1

Column type decided by the Provider/Result

Data type/value in Excel Numeric Currency Date/time Boolean String Memo
null/empty cell null/empty null/empty null/empty null/empty null/empty null/empty
Numeric, e.g. 5432.1 5432.1000000000004 5432.1 11/14/1914 2:24:00 AM True 5432.1 5432.1
Numeric -1 -1 -1 12/29/1899 True -1 -1
Numeric 0 0 0 12:00:00 AM False 0 0
Numeric 1 1 1 1/1/1900 True 1 1
Currency, e.g. $5,432.10 5432.1000000000004 5432.1 11/14/1914 2:24:00 AM True 5,432.10 5,432.10
Date "General date" 29-Feb-2008 39507 39507 2/29/2008 True 29-Feb-08 29-Feb-08
Date - UK format 29/02/2008 39507 39507 2/29/2008 True 29/02/2008 29/02/2008
Date - US format 2/29/2008 39507 39507 2/29/2008 True 2/29/2008 2/29/2008
Time 23:59 0.99930555555555556 0.9993 11:59:00 PM True 23:59 23:59
Date/Time 29/02/2008 23:59 39507.999305555553 39507.9993 2/29/2008 11:59:00 PM True 29/02/2008 23:59 29/02/2008 23:59
Boolean TRUE 0 0 12:00:00 AM True TRUE TRUE
Boolean FALSE 0 0 12:00:00 AM False FALSE FALSE
U.S. 5-digit ZIP code 01234
(probably not a valid code)
1234 1234 5/18/1903 true 01234 01234
U.S. 9-digit ZIP code 0012-34567 1234567 1234567 2/15/5280 True 00123-4567 00123-4567
U.S. Phone number (555) 123-4567 5551234567 5551234567 null/empty True 555123-4567 555123-4567
String, e.g. abc 0 0 12:00:00 AM False abc abc
Memo 0 0 12:00:00 AM False the memo, truncated to 255 characters the memo

 

Because of regional differences, you may find that OLE DB does not behave in exactly the same way on your system. For example, it may return £5,432.10 or $5,432.10 instead of 5432.1 in some cases. Also, if for example you work with the German-language version of Office, "true" and "false" are treated as strings in Excel, whereas the German words WAHR and FALSCH are recognised as the true and false Boolean values. On my German language system (which actually runs German Windows XP and the "MLP" multi-language version of Word 2003, so it may be different from a typical German-language Windows/Office system), when I use the OLE DB provider to get Boolean Excel data, I still see the English-language words.

There is one other critical piece of information. When Word connects via OLE DB to a .xls data source, Word always specifies IMEX=1. If you have managed to follow the story so far, you will probably realise that OLE DB is less likely to lose data that way.

Some conclusions

I think the main conclusions to be drawn are as follows:

 a. it's all very messy and could probably work a lot better

 b. if you have an Excel column with a single data type, all the data will get through but will probably need reformatting using switches in Word MERGEFIELD fields as suggested near the beginning of this page. if anything, the most irritating situation is when you have plain numbers and the Provider tacks all those extra digits on the end.

 c. if you have a column with mixed data types, in most cases you will see most of your data, but
     - only if the data in the first 8 rows is mixed, or all string/memo, thereby forcing the Provider to set the column type to string or memo. If the first 8 rows all happen to contain numeric data, any data of any other type will be returned in a les helpful format, and strings/memos will be lost altogether
     - if you have anything except strings and memos in the first 8 rows, all the memos in the column will be truncated.

 d. However, if the memos are not truncated, Word and OLE DB do seem to be able to read very large memo fields - in simple tests here I was successfully reading memo fields around 140Kb before I gave up.

Is there any other way to work around these problems?

There does not appear to be a straightforward workaround. Word already sets IMEX=1 which probably provides more useful results in most cases than IMEX=0. But you can try the following...

1. Format a column as text via Data|Text to Columns...

If you have a mixture of numbers and texts in a column but numbers in the first 8 rows, you may need to get the OLE DB provider to "see" the column as a text column. However, selecting the column and using Format|Cells to set the format as Text does not appear to be enough. One way that does appear to work is
 - select the column
 - select the Data|Text to Columns... menu option
 - click Next through the wizard until you  reach Step 3 of 3, then select Text as the Column Data Format.

2. Add a row of cells containing memo texts and tell the Provider to inspect only 1 row

This is a kludge that requires you to add a row to your spreadsheet before using it as a data source, and other stuff. I haven't used the approach for real - it's just something I thought of as I investigated the problems. I suspect that it might cause problems if you try to sort or filter your data.

The idea is as follows:
 - you change the Windows registry TypeGuessRows value to 1
 - you insert a row at the beginning of your worksheet that contains a 256-character memo field in every column
 - when OLE DB opens the worksheet, it should now recognise every column as a memo field, and do all the other data transforms listed in the IMEX=1 table provided above.
 - in Word, you deselect record 1 in the Mail Merge Recipients box before merging
 - you'll still need to use Word field formatting switches.

However, there's a problem with this approach. If you change the standard settings for the Jet Excel Engine by changing TypeGuessRows to 1, it may affect everything on your system that gets data from OLE DB. Changing these settings in not a good programming practice in my opinion. Incidentally, if you decide to go in the other direction and get OLE DB to look at more rows, you could in theory set the count to the maximum number of rows in an Excel 2003 worksheet. But that doesn't solve the "mixed data" problem when you're using Memo fields. And the OLE DB will take a long time to read the entire spreadsheet - every time you open it as a Word data source.

Fortunately, it's possible to create your own area in the registry containing the values you want for TypeGuessRows and ImportMixedTypes, and tell the Provider to look in your area rather than the standard area. The Provider will pick up any values you don't put in your own area from the standard area.

So for example, the standard registry keys and values are here:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\

(I haven't checked the values for Ace yet)

Suppose I create my own keys and put them here:
    HKEY_LOCAL_MACHINE\SOFTWARE\mysample\v1.0\Engines\Excel\TypeGuessRows
    HKEY_LOCAL_MACHINE\SOFTWARE\mysample\v1.0\Engines\Excel\ImportMixedTypes

Then I have to tell the OLE DB provider where they are by adding this to my OLE DB connection string:
Jet OLE DB:Registry Path="SOFTWARE\mysample\v1.0"

If you are familiar with VBA and the OpenDataSource method, you might be thinking that all you have to do is record the VBA while you connect to your Excel workbook, then alter the Connection parameter and run the modified VBA. Unfortunately, there is yet another twist in the tale. When you connect to an Excel data source using OLE DB, Word does set up and return a Connection string.

Unfortunately, Word takes no notice of any connection string you provide in an OpenDataSource method call when it opens an Excel .xls using OLE DB. Fortunately, there is a workaround.

3. In addition, use a .udl or .odc file to open the workbook indirectly...

Instead of connecting to the .xls file directly, you can connect using a .udl (Universal Data Link) file or .odc (Office Data Connection) file that contains the necessary connection information. .udl files contain the information that an OLE DB provider needs to connect to a particular "database" (which in the case of Excel, means a "workbook". However, they don't let you specify a "table" in that database (for Excel,"table" means worksheet or range). I'm not going to consider .udl files any further at the moment because programmers usually already know how to create them, and Office is more oriented to using .odc files. .odc files are rather like .udl files but
 a. are in HTML format with some scripts and associated "behaviours"
 b. can also contain "table" information

There does not seem to be a specification of the .odc format anywhere, so all you can do is create one, see what's in it, and try modifying the contents. I outline how to create one below.

There are several ways you can use a .odc file with Word MailMerge (or to insert a DATABASE field)

4. ...or create one or more .odc files and let the users select them instead of the .xls workbook

If the idea is still that the user selects the data source manually, then you can provide a .odc with all the right information in it and let them select that instead of the .xls file. As usual, there's a catch. The user isn't selecting the actual workbook any more, so there has to be another way to specify it. Two possible approaches are:
 a. specify the workbook's pathname in the .odc's connection string. Then you have to have one .odc for each .xls that the user might want to open.
 b. do not specify the workbook's pathname in the .odc. Then when the user selects the .odc, they see the "Data Link Dialog box" instead. Although this is a rather nasty dialog box that I wouldn't usually want users to have to deal with, in this case all the necessary information except the pathname has already been provided, so the user only needs to specify that pathname. Since there's a drop-down that makes that reasonably easy, this may be a good solution in some cases, because you then only need one .odc for all your Excel files.

There seem to be at least two types of .odc file. One stores information on how to connect to the "database" as a whole, and the other stores information on how to connect to a specific table within a database. For Excel workbooks a "database" is a workbook, and the tables are the worksheets and named ranges within it.

If you use the "database" type of .odc file, the user should see a prompt for a list of worksheets and named ranges, just as if they had tried to open the workbook by selecting the .xls. If you use the "table" type of .odc file, you have to name the table. If you were trying to use approach (a), that might mean that you would have to create one .odc fie for every workbook/table combination you wanted - probably not such a good idea. However, if you want to make it particularly easy for a user to select a few specific workbook/table combinations without even facing the dialog with the list of tables, it's probably quite a good approach. Just don't forget to train the users :-)

There's one more snag with using .odc files that some people need to know. Supposing you are designing a system which will run with one data source today (let's say today.xls) but which you know will eventually use another (oneday.xls) Can you roll out a Word document with a .odc that points to today.xls, then simply roll out a new version of the .odc that points to oneday.xls when you want to change over? Unfortunately not. Word only really gets connection information from the .odc when it executes an OpenDataSource call. It stores that connection information internally. So if you close your .doc, change the .odc, and re-open the .doc, the real data source will not change. If you want Word to take notice of the .odc every time the .doc opens, you will have to ensure that Word actually issues an OpenDataSource call, e.g. in an AutoOpen macro.  My guess is that this results from a deliberate design decision. Incidentally, Excel works in a similar way when using .odc files.

 5. If you're programming, you can do some of this using a VBA OpenDataSource method call...

If you want to make the connection programmatically, the VBA method call can be very simple, e.g.

ActiveDocument.MailMerge.OpenDataSource _
  Name:="S:\tips\t0003\sample-03-03.odc"

If the .odc is a "table" type .odc, the method call should open the workbook and table specified in the .odc, as in (a). If no workbook is specified, the user will face the OpenDataSource dialog box as in (b).

If the .odc is a "database" type .odc and the workbook was specified in the .odc, the user should see the "tables" dialog box, as usual. If no workbook was specified, the user will probably see the Data Link Dialog, then possibly the tables dialog box - probably a recipe for disaster so if you really want to do that, I leave it up to you to research it.

However, an obvious programming ploy once you have created your .odc would be to see if you can set up a general-purpose .odc then specify the workbook name, table name and perhaps other stuff in the OpenDataSource method call.

What actually happens is as follows:

 a. if you put anything at all in the Connection parameter in the OpenDataSource call, the entire connection string in the .odc is ignored. However, in this case, Word does honour what you put in the connection string (unlike when you specify a .xls file in the Name parameter). So you have to put everything you need and want in the Connection parameter. As far as I can tell, the absolute minimum for connecting to an Excel workbook consists of:
 - the provider name, e.g. "Provider=Microsoft.Jet.OLEDB.4.0;", or for ACE, "Provider=Microsoft.Ace.OLEDB.12.0"
 - the Engine Type - this tells the Jet provider that the specified file is an Excel file (probably Excel 97 and later. The value is "Jet OLEDB:Engine type=35;". As far as I can tell that works for both Jet and Ace.
 - the workbook path, e.g. "Data Source=s:\tips\t0003\sample-03-03.xls;"

In addition, you may want to specify the things we have been discussing, e.g.
 - the IMEX value, using: 'Extended Properties="HDR=Yes;IMEX=1;"'
 - the path to your "private" registry values: e.g. 'Jet OLE DB:Registry Path="SOFTWARE\mysample\v1.0"'

 b. if you put anything at all in the SQLStatement/SQLStatement1 parameters, it overrides any table specification in the .odc file, but it must be a valid piece of Jet SQL. This seems to be the case whether you have a "database" type .odc file or a "table" type .odc file.

6. ...and you can use an empty .odc file

If you need to use a .odc in OpenDataSource and you want to override the connection string in the .odc and the table name in the .odc, here's one final kludge that may really help: don't put anything at all in the .odc file. It's just a text file. Leave it completely empty, and put everything you need in the OpenDataSource call, e.g.

ActiveDocument.MailMerge.OpenDataSource _
  Name:="S:\tips\t0003\sample-03-03.odc", _
  Connection:="Provider=Microsoft.Jet.OLEDB.4.0;Jet OLEDB:Engine type=35;Data Source=s:\tips\t0003\sample-03-03.xls;Extended Properties=""HDR=Yes;IMEX=1;"";Jet OLE DB:Registry Path=""SOFTWARE\mysample\v1.0"";", _
  SQLStatement:="SELECT * FROM `Sheet2$`"

Programming details

The following notes are intended to fill in some of the practical details missing from the section titled "Is there any other way to work around these problems?", in particular:
 -how to set up a private area in the Windows Registry containing Jet-related settings
 -how to create and edit a .odc file for an Excel data source
 -how to set up and use a Word VBA macro that uses OpenDataSource

1. How to set up a private area in the Windows Registry containing Jet-related settings

 a. Open Notepad

 b. Copy/paste the following text into it. Remove any blank lines at the beginning and end.

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SOFTWARE\mysample]
[HKEY_LOCAL_MACHINE\SOFTWARE\mysample\v1.0]
[HKEY_LOCAL_MACHINE\SOFTWARE\mysample\v1.0\Engines]

[HKEY_LOCAL_MACHINE\SOFTWARE\mysample\v1.0\Engines\Excel]
"ImportMixedTypes"="Text"
"TypeGuessRows"=dword:00000001

 c. Save the file using a .reg extension (e.g. in c:\a\mysample.reg)

 d. start Windows Registry Editor, e.g. by locating regedit.exe, typically in c:\windows\regedit.exe, and double-clicking it

 e. Click FIle|Import, select the .reg file you just saved, and import it.

2. How to create and edit a .odc file for an Excel data source

You can create a .odc in Word, Excel, or by editing an existing one.

In Word, start  with a blank document and the MailMerge toolbar enabled, then
 a. click on the Select Data Source icon
 b. click the New Source button
 c. Select Other/advanced, then click Next
 d. Select Microsoft Jet 4.0 OLE DB Provider, then click Next (things are a bit different for the Ace provider and I don't want to go there right now).
 e. Select or enter the full path name of the "database", i.e. the path name of your .xls file. Don't change anything else on this tab. If you want, click Test Connection and you'll see an error message because the Provider is expecting a Jet .mdb file at this point.
 f. Click the "All" tab (we can ignore "Advanced" for now).
 g. Double-click on "Extended Properties", enter the value HDR=YES;IMEX=1; then click OK. The value you just entered should be displayed.
 h. Double-click on "Jet OLEDB:Engine Type", enter 35 then click OK.
 i. If you want to specify the registry path used in the example above, double-click on "Jet OLEDB:Registry Path", enter
SOFTWARE\mysample\v1.0 then click OK.
 j. Click OK at the bottom of the Data Link Properties dialog box. If everything was entered correctly, the OLE DB Provider should open the workbook and display a list of all the worksheets and named ranges. At this point you can create a "Database" .odc by unchecking "Connect to a specific table" or a "Table" .odc by leaving the box chcked and selecting a table. Then click Next.
 k. modify the suggested file name and path as necessary and optionally enter a description. Leave the "Save password in file" box unchecked. Click Finish. Your .odc should now be saved, and the next dialog you see in Word should let you select it and connect to the data source.

When the OLE DB Provider tries to connect to the data source, if there are any problems, Word may either pop up the Data Link dialog box again, or it may just try to treat the .odc as if it were a .csv type text file, in which case Word will pop up various error messages to do with invalid field counts and so on.

There is one other potentially useful aspect of .odc files. If you created your file in your "My Data Sources" folder and you open the .odc using Internet Explorer, you should see the results - i.e. for a "database" .odc, you should see a list of tables, and for a "table" .odc, you should see the table's data. This doesn't seem to work for all types of .odc, though. Also, if your .odc is in a different folder, you will need to copy a file in "My Data Sources" called DATACONN.HTC to the same folder as your .odc for this to work, and you will probably face more security-related dialog boxes when you try to open the .odc. The .HTC file contains JavaScript code that may shed some light on the contents of your .odc

The following listing is for a "Table" .odc file for an Excel data source. Some notes:

 k. You can edit the .odc in Notepad and change whatever you need in the ConnectionString. Notice that you need to use HTML-style names for some characters, e.g. &quot; instead of "

 l. For Excel, there doesn't seem to be a great deal of difference between a "database" .odc and a "table" .odc. You should check for yourself, but if you start with a "Table" .odc, to make a "Database" .odc fior the same .xls file,
 - change
  <meta name=ProgId content=ODC.Table>
  to
  <meta name=ProgId content=ODC.Database>
 - delete
  <meta name=Table content="Sheet1$">
 - delete
  <odc:CommandText>Sheet1$</odc:CommandText>

 m. In a "table" .odc, to change the table that Word will open, I assumed that you would have to change the name in the <odc:CommandText> element and that the <meta name=Table content=> tag would just be documentation. This turns out to be incorrect, at least for Excel files. You only need to change the meta tag, although I suppose it is good practice to change the other one as well, or delete it if is not necessary. You can use the following:
 - a sheet name (with "$"), e.g. <meta name=Table content="Sheet1$">
 - a named range, e.g. <meta name=Table content="valnr">
 - a sheet+range specified in A1:Xn notation, e.g. <meta name=Table content="Sheet4$E3:L46">

Here's the .odc sample

<html>

<head>
<meta http-equiv=Content-Type content="text/x-ms-odc; charset=utf-8">
<meta name=ProgId content=ODC.Table>
<meta name=SourceType content=OLEDB>
<meta name=Table content="Sheet1$">
<xml id=docprops><o:DocumentProperties
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns="http://www.w3.org/TR/REC-html40">
<o:Description>ODC to specific Excel sheet with IMEX = 0</o:Description>
</o:DocumentProperties>
</xml><xml id=msodc><odc:OfficeDataConnection
xmlns:odc="urn:schemas-microsoft-com:office:odc"
xmlns="http://www.w3.org/TR/REC-html40">
<odc:Connection odc:Type="OLEDB">
<odc:ConnectionString>Provider=Microsoft.Jet.OLEDB.4.0;User ID=Admin;Data Source=S:\tips\t0003\sample-03-03.xls;Mode=Read;Extended
Properties=&quot;HDR=Yes;IMEX=0;&quot;;Jet OLEDB:System database=&quot;&quot;;Jet OLEDB:Registry Path=&quot;&quot;;Jet OLEDB:Engine
Type=35;Jet OLEDB:Database Locking Mode=0;Jet OLEDB:Global Partial Bulk Ops=2;Jet OLEDB:Global Bulk Transactions=1;Jet OLEDB:New Database
Password=&quot;&quot;;Jet OLEDB:Create System Database=False;Jet OLEDB:Encrypt Database=False;Jet OLEDB:Don't Copy Locale on
Compact=False;Jet OLEDB:Compact Without Replica Repair=False;Jet OLEDB:SFP=False</odc:ConnectionString>
<odc:CommandType>Table</odc:CommandType>
<odc:CommandText>Sheet1$</odc:CommandText>
</odc:Connection>
</odc:OfficeDataConnection>
</xml>
<style>
<!--
.ODCDataSource
{
behavior: url(dataconn.htc);
}
-->
</style>

</head>

<body onload='init()' scroll=no leftmargin=0 topmargin=0 rightmargin=0 style='border: 0px'>
<table style='border: solid 1px threedface; height: 100%; width: 100%' cellpadding=0 cellspacing=0 width='100%'>
<tr>
<td id=tdName style='font-family:arial; font-size:medium; padding: 3px; background-color: threedface'>
&nbsp;
</td>
<td id=tdTableDropdown style='padding: 3px; background-color: threedface; vertical-align: top; padding-bottom: 3px'>

&nbsp;
</td>
</tr>
<tr>
<td id=tdDesc colspan='2' style='border-bottom: 1px threedshadow solid; font-family: Arial; font-size: 1pt; padding: 2px; background-
color: threedface'>

&nbsp;
</td>
</tr>
<tr>
<td colspan='2' style='height: 100%; padding-bottom: 4px; border-top: 1px threedhighlight solid;'>
<div id='pt' style='height: 100%' class='ODCDataSource'></div>
</td>
</tr>
</table>


<script language='javascript'>

function init() {
var sName, sDescription;
var i, j;

try {
sName = unescape(location.href)

i = sName.lastIndexOf(".")
if (i>=0) { sName = sName.substring(1, i); }

i = sName.lastIndexOf("/")
if (i>=0) { sName = sName.substring(i+1, sName.length); }

document.title = sName;
document.getElementById("tdName").innerText = sName;

sDescription = document.getElementById("docprops").innerHTML;

i = sDescription.indexOf("escription>")
if (i>=0) { j = sDescription.indexOf("escription>", i + 11); }

if (i>=0 && j >= 0) {
j = sDescription.lastIndexOf("</", j);

if (j>=0) {
sDescription = sDescription.substring(i+11, j);
if (sDescription != "") {
document.getElementById("tdDesc").style.fontSize="x-small";
document.getElementById("tdDesc").innerHTML = sDescription;
}
}
}
}
catch(e) {

}
}
</script>

</body>

</html>
 

3. How to set up and use a Word VBA macro that uses OpenDataSource

Let's take the example from an earlier section in this page:

ActiveDocument.MailMerge.OpenDataSource _
  Name:="S:\tips\t0003\sample-03-03.odc", _
  Connection:="Provider=Microsoft.Jet.OLEDB.4.0;Jet OLEDB:Engine type=35;Data Source=s:\tips\t0003\sample-03-03.xls;Extended Properties=""HDR=Yes;IMEX=1;"";Jet OLE DB:Registry Path=""SOFTWARE\mysample\v1.0"";", _
  SQLStatement:="SELECT * FROM `Sheet2$`"

Add Sub and End Sub statements before and after the statement. In the following example I also add some statements to disconnect the existing data source and set the merge type and destination:

Sub mysub()

  ActiveDocument.MailMerge.MainDocumentType = wdNotAMergeDocument
  ActiveDocument.MailMerge.MainDocumentType = wdFormLetters
  ActiveDocument.MailMerge.Destination = wdSendToNewDocument

  ActiveDocument.MailMerge.OpenDataSource _
    Name:="S:\tips\t0003\sample-03-03.odc", _
    Connection:="Provider=Microsoft.Jet.OLEDB.4.0;Jet OLEDB:Engine type=35;Data Source=s:\tips\t0003\sample-03-03.xls;Extended Properties=""HDR=Yes;IMEX=1;"";Jet OLE DB:Registry Path=""SOFTWARE\mysample\v1.0"";", _
    SQLStatement:="SELECT * FROM `Sheet2$`"

End Sub

To use the macro, see the following articles by Word MVPs:

http://word.mvps.org/FAQs/MacrosVBA/CreateAMacro.htm

http://word.mvps.org/FAQs/MacrosVBA/DocumentEvents.htm

You may also have to set up the registry entry described in the following Microsoft Knowledgebase article:

http://support.microsoft.com/kb/825765

[End of page]

Google
 

Please post any follow-up questions to this article in the Microsoft public newsgroup on mailmerge and fields. If you are using a newsreader such as Outlook Express, the server is at news://news.microsoft.com and the Mailmerge newsgroup is at news://news.microsoft.com/microsoft.public.word.mailmerge.fields. Otherwise, you can go to the Microsoft Communities home at  http://www.microsoft.com/communities and look for the group. For some reason it is currently named "Mailmerge and Fax"
Disclaimer

Peter Jamieson's Tip Pages: Copyright © 2005-2007, Peter Jamieson

Terms of Use