Saturday, February 19, 2011

Export to Word Document in C#

I'm looking for a .NET library that will allow creation of a Word document. I need to export HTML-based content to a Word document (97-2003 .doc format, not .docx).

I know about the Microsoft Office Automation libraries and Office interop, but as far as I can tell, they require Office to be installed and do the conversion by opening Word itself. I don't want the conversion to depend on Office being installed.

Edit: Converting to RTF might also work, if that's possible.

From Stack Overflow
  • Since the doc format specification is not open, and the interop assemblies are the Microsoft solution, I fear that they are your primary (or even only) option.

    They do indeed require Office to be installed, and they open Word (although showing a window is optional).

    I think Word can open HTML documents; is that an option for you?

    Cheeso : Bzzt! The specs for the WordML format are freely available. In fact, in my scenario, I produced a single XML file from MS Word and then just did a text replace on fields in that XML file to "dynamically generate" a new doc, in a mail-merge sort of way. Simple, easy.
    Erik Hesselink : That's the XML format, right? The question was about the binary Word format...
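
    A minimal C# sketch of Cheeso's text-replace idea, under stated assumptions: the template was saved from Word as WordprocessingML (Word 2003 XML, UTF-8), and it marks fields with hypothetical {{FieldName}} tokens. The class and method names (WordMlTemplate, Fill, FillFile) are illustrative only. As Erik points out, the output is the XML format, not the binary .doc format. Word can split text you type into several runs, so check the saved XML to make sure each token appears unbroken.

    ```csharp
    using System.Collections.Generic;
    using System.IO;
    using System.Security;

    // Sketch of a mail-merge style fill for a WordprocessingML (Word 2003 XML) file.
    // Assumption: the template contains hypothetical tokens such as {{CustomerName}}.
    public static class WordMlTemplate
    {
        // Replaces every {{key}} token with its XML-escaped value.
        public static string Fill(string templateXml, IDictionary<string, string> fields)
        {
            string result = templateXml;
            foreach (KeyValuePair<string, string> field in fields)
            {
                // Escape &, <, >, " and ' so a value cannot break the XML markup.
                string safe = SecurityElement.Escape(field.Value ?? string.Empty);
                result = result.Replace("{{" + field.Key + "}}", safe);
            }
            return result;
        }

        // Reads the template saved from Word, fills it and writes the new document.
        // Assumption: the template is UTF-8, which File.ReadAllText/WriteAllText default to.
        public static void FillFile(string templatePath, string outputPath,
                                    IDictionary<string, string> fields)
        {
            string xml = File.ReadAllText(templatePath);
            File.WriteAllText(outputPath, Fill(xml, fields));
        }
    }
    ```

    Tokens with no matching key are left untouched, which makes a missing field easy to spot in the generated document.
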
  • I have found that a document output as HTML but saved with a .doc extension will open properly formatted in Word. I tested with Word 2000 and a file with an internal style sheet.
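
    A minimal C# sketch of that approach, assuming hypothetical names (HtmlDocExport, BuildPage, Save). It wraps body HTML in a complete page with an internal style sheet and writes it to a file ending in .doc. The file is still HTML, not the binary 97-2003 format; the extension only makes it open in Word, which recognizes the HTML content.

    ```csharp
    using System.IO;
    using System.Net;
    using System.Text;

    public static class HtmlDocExport
    {
        // Builds a complete HTML page with an internal <style> block.
        public static string BuildPage(string title, string css, string bodyHtml)
        {
            var sb = new StringBuilder();
            sb.AppendLine("<html>");
            sb.AppendLine("<head>");
            // Declare the encoding explicitly so non-ASCII text survives.
            sb.AppendLine("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">");
            sb.Append("<title>").Append(WebUtility.HtmlEncode(title)).AppendLine("</title>");
            sb.AppendLine("<style type=\"text/css\">");
            sb.AppendLine(css);
            sb.AppendLine("</style>");
            sb.AppendLine("</head>");
            sb.AppendLine("<body>");
            sb.AppendLine(bodyHtml); // assumed to be trusted, already-valid HTML
            sb.AppendLine("</body>");
            sb.AppendLine("</html>");
            return sb.ToString();
        }

        // Writes the page as UTF-8 (with BOM) to a path such as "report.doc".
        public static void Save(string path, string title, string css, string bodyHtml)
        {
            File.WriteAllText(path, BuildPage(title, css, bodyHtml), new UTF8Encoding(true));
        }
    }
    ```

    For example, HtmlDocExport.Save("report.doc", "Report", "p {font-family:Arial; font-size:10.0pt;}", "<p>Hello</p>") produces a file Word should open with Arial 10pt paragraphs.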

  • Using Word Automation from ASP.NET is not a good idea (see Microsoft KB article 257757: http://support.microsoft.com/default.aspx?scid=kb;EN-US;q257757#kb2)

    If you are not using WinForms, your best option IMHO is to generate RTF, which MS Word will happily open (see the link in the article referenced above).

    Good Luck!
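
    A minimal C# sketch of generating RTF by hand, assuming a hypothetical RtfWriter class with a centered bold heading followed by plain paragraphs. It uses only core RTF control words (\rtf1, \ansi, \deff0, \fonttbl, \pard, \qc, \ql, \b, \fs, \par, \u); font sizes are in half-points, so \fs24 is 12pt and \fs20 is 10pt.

    ```csharp
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    public static class RtfWriter
    {
        // Escapes text for RTF: \, { and } get a backslash prefix,
        // newlines become \par, and non-ASCII characters become \uN? escapes.
        public static string Escape(string text)
        {
            var sb = new StringBuilder();
            foreach (char c in text)
            {
                if (c == '\\' || c == '{' || c == '}') sb.Append('\\').Append(c);
                else if (c == '\n') sb.Append("\\par ");
                else if (c == '\r') continue;
                // RTF expects a signed 16-bit value, then one fallback character.
                else if (c > 127) sb.Append("\\u").Append((int)(short)c).Append('?');
                else sb.Append(c);
            }
            return sb.ToString();
        }

        public static string Build(string heading, IEnumerable<string> paragraphs)
        {
            var sb = new StringBuilder();
            // Header: RTF version 1, ANSI character set, default font 0 = Arial.
            sb.Append(@"{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}}");
            // Heading: centered, bold, 12pt.
            sb.Append(@"\pard\qc\b\fs24 ").Append(Escape(heading)).Append(@"\b0\par");
            foreach (string p in paragraphs)
            {
                // Body paragraphs: left-aligned, 10pt.
                sb.Append(@"\pard\ql\fs20 ").Append(Escape(p)).Append(@"\par");
            }
            sb.Append('}');
            return sb.ToString();
        }

        // Every non-ASCII character is escaped, so ASCII output is safe.
        public static void Save(string path, string heading, IEnumerable<string> paragraphs)
        {
            File.WriteAllText(path, Build(heading, paragraphs), Encoding.ASCII);
        }
    }
    ```

    Saving with a .rtf extension (for example RtfWriter.Save("out.rtf", "Head", new[] { "The Street" })) lets Word open the file without Office being needed to create it.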

  • I tried just opening the HTML directly in Word, which technically works except for one thing: my HTML doc also contains CSS, and when I open it in Word, the CSS is completely ignored, so I lose all of the formatting. I realize I wouldn't get everything out of the CSS, but I would at least like to keep the specified fonts, font sizes, etc. Is there any way to get Word to read the CSS? Would it work if I somehow converted the CSS to be embedded in the HTML?

  • Would it work if I somehow converted the CSS to be embedded in the HTML?

    Yes. I use an internal style sheet, as I mentioned.

    Document Example:

    <html>
    <head>
    <style type="text/css">
        h1 {text-align:center; font-size:12.0pt; font-family:Arial; font-weight:bold;}
    
        p {margin:0in; margin-bottom:0pt; font-size:10.0pt; font-family:Arial;}
        p.Address {text-align:center; font-family:Times; margin-bottom:10px;}
    </style>
    </head>
    <body>
    <p class="Address">The Street</p>
    <h1>Head</h1>
    </body>
    </html>
    
    Si Keep : We do this too, to allow our dynamic pages to be 'exported' to Word. The page content HTML is extracted and then inserted into the middle of a Word HTML doc template that already contains all the styles that the html needs.
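
    A minimal C# sketch of Si Keep's template approach, assuming the Word HTML template (which already holds all the styles) contains a hypothetical <!--CONTENT--> marker inside its <body>, and that it declares charset=utf-8. The names WordHtmlTemplate, Merge and SaveAsDoc are illustrative only.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    public static class WordHtmlTemplate
    {
        // Hypothetical marker the template author places inside <body>.
        public const string ContentMarker = "<!--CONTENT-->";

        // Inserts the extracted page HTML at the marker, keeping the template's styles.
        public static string Merge(string templateHtml, string contentHtml)
        {
            int index = templateHtml.IndexOf(ContentMarker, StringComparison.Ordinal);
            if (index < 0)
                throw new ArgumentException("Template does not contain " + ContentMarker);
            return templateHtml.Substring(0, index) + contentHtml
                 + templateHtml.Substring(index + ContentMarker.Length);
        }

        // Writes the merged page to a .doc path as UTF-8 with a BOM.
        public static void SaveAsDoc(string templatePath, string contentHtml, string outputPath)
        {
            string merged = Merge(File.ReadAllText(templatePath), contentHtml);
            File.WriteAllText(outputPath, merged, new UTF8Encoding(true));
        }
    }
    ```

    Keeping the styles in the template file means the page HTML only needs matching class names (such as p.Address in the example document) rather than its own style sheet.
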
  • I use Aspose for working with Word; it makes everything a breeze: http://www.aspose.com/

    Remou : It seems very expensive (>$800) when all that is required is output, yesno?
  • There's a tool called JODConverter that hooks into OpenOffice to expose its file format converters. There are versions available as a web app (it sits in Tomcat) that you post to, and as a command-line tool. I've been firing HTML at it and converting to .doc and PDF successfully. It's in a fairly big project that hasn't gone live yet, but I think I'm going to be using it. http://sourceforge.net/projects/jodconverter/
