Convert HTML to PDF in ASP.NET and MVC with C# and VB.NET

Most websites can already produce reports or present results as HTML pages. While HTML content is simple to generate and edit, it is not well suited to printing or to transmission by email. The de facto standard for printing is the PDF format. The HiQPdf HTML to PDF Converter for .NET can be used in your .NET applications to transform any HTML page into a PDF document while preserving the original appearance of the HTML document.

The HiQPdf Library for .NET offers you a modern, simple, fast, flexible and powerful tool to create complex and stylish PDF documents in your applications with just a few lines of code.

Using the high quality HTML to PDF conversion engine you can easily design a document in HTML with CSS3, JavaScript, SVG or Canvas and then convert it to PDF preserving the exact content and style.

The main features of the converter are listed below:

  • Convert HTML and HTML5 Documents and Web Pages to PDF
  • Convert URLs and HTML Strings to PDF Files or Memory Buffers
  • Set the PDF Page Size and Orientation
  • Fit HTML Content in PDF Page Size
  • Advanced Support for Web Fonts in .WOFF and .TTF Formats
  • Advanced Support for Scalable Vector Graphics (SVG)
  • Advanced Support for HTML5 and CSS3
  • Delayed Conversion Triggering Mode
  • Control PDF page breaks with page-break CSS attributes in HTML
  • Repeat HTML Table Header and Footer on Each PDF Page
  • Packaged and Delivered as a Zip Archive
  • No External Dependencies
  • Direct Copy Deployment Supported
  • ASP.NET and Windows Forms Samples, Complete Documentation
  • Supported on All Windows Versions

You can find all the HiQPdf Library for .NET features, with a brief description of each, on the product page.
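
The page break item in the list above refers to page-break CSS attributes placed in the HTML itself. As a rough illustration, the sketch below builds an HTML string in which every section after the first requests a page break before it, using the standard CSS page-break-before property. The PageBreakHtml class and its Build method are hypothetical names used only for this example and are not part of the HiQPdf API.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PageBreakHtml
{
    // Wraps each section in a DIV. Every section after the first asks for
    // a page break before it through the CSS page-break-before property.
    public static string Build(IEnumerable<string> sectionsHtml)
    {
        StringBuilder html = new StringBuilder("<html><body>");
        bool first = true;

        foreach (string section in sectionsHtml)
        {
            html.Append(first
                ? "<div>"
                : "<div style=\"page-break-before: always\">");
            html.Append(section);
            html.Append("</div>");
            first = false;
        }

        html.Append("</body></html>");
        return html.ToString();
    }
}
```

The resulting string could then be passed as the HTML code argument of the ConvertHtmlToMemory() method shown in the conversion sample.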

The C# sample code below shows how easily you can create PDF documents from existing HTML pages or HTML strings. With just a few lines of code you can get a richly formatted PDF document:

protected void buttonConvertToPdf_Click(object sender, EventArgs e)
{
    // create the HTML to PDF converter
    HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

    // set browser width
    htmlToPdfConverter.BrowserWidth = int.Parse(textBoxBrowserWidth.Text);

    // set browser height if specified, otherwise use the default
    if (textBoxBrowserHeight.Text.Length > 0)
        htmlToPdfConverter.BrowserHeight = int.Parse(textBoxBrowserHeight.Text);

    // set HTML Load timeout
    htmlToPdfConverter.HtmlLoadedTimeout = int.Parse(textBoxLoadHtmlTimeout.Text);

    // set PDF page size and orientation
    htmlToPdfConverter.Document.PageSize = GetSelectedPageSize();
    htmlToPdfConverter.Document.PageOrientation = GetSelectedPageOrientation();

    // set the PDF standard used by the document
    htmlToPdfConverter.Document.PdfStandard = checkBoxPdfA.Checked ? PdfStandard.PdfA : PdfStandard.Pdf;

    // set PDF page margins
    htmlToPdfConverter.Document.Margins = new PdfMargins(5);

    // set whether to embed the true type font in PDF
    htmlToPdfConverter.Document.FontEmbedding = checkBoxFontEmbedding.Checked;

    // set triggering mode; for WaitTime mode set the wait time before convert
    switch (dropDownListTriggeringMode.SelectedValue)
    {
        case "Auto":
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.Auto;
            break;
        case "WaitTime":
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.WaitTime;
            htmlToPdfConverter.WaitBeforeConvert = int.Parse(textBoxWaitTime.Text);
            break;
        case "Manual":
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.Manual;
            break;
        default:
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.Auto;
            break;
    }

    // set header and footer
    SetHeader(htmlToPdfConverter.Document);
    SetFooter(htmlToPdfConverter.Document);

    // set the document security
    htmlToPdfConverter.Document.Security.OpenPassword = textBoxOpenPassword.Text;
    htmlToPdfConverter.Document.Security.AllowPrinting = checkBoxAllowPrinting.Checked;

    // set the permissions password too if an open password was set
    if (!String.IsNullOrEmpty(htmlToPdfConverter.Document.Security.OpenPassword))
        htmlToPdfConverter.Document.Security.PermissionsPassword = htmlToPdfConverter.Document.Security.OpenPassword + "_admin";

    // convert HTML to PDF
    byte[] pdfBuffer = null;

    if (radioButtonConvertUrl.Checked)
    {
        // convert URL to a PDF memory buffer
        string url = textBoxUrl.Text;

        pdfBuffer = htmlToPdfConverter.ConvertUrlToMemory(url);
    }
    else
    {
        // convert HTML code
        string htmlCode = textBoxHtmlCode.Text;
        string baseUrl = textBoxBaseUrl.Text;

        // convert HTML code to a PDF memory buffer
        pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlCode, baseUrl);
    }

    // inform the browser about the binary data format
    HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");

    // let the browser know how to open the PDF document, attachment or inline, and the file name
    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("{0}; filename=HtmlToPdf.pdf; size={1}",
        checkBoxOpenInline.Checked ? "inline" : "attachment", pdfBuffer.Length.ToString()));

    // write the PDF buffer to HTTP response
    HttpContext.Current.Response.BinaryWrite(pdfBuffer);

    // call End() method of HTTP response to stop ASP.NET page processing
    HttpContext.Current.Response.End();
}

You can find more HTML to PDF C# and VB.NET samples in the online demo.

Extract Text from PDF Documents in .NET Applications

With the HiQPdf Library you can extract the text from PDF documents to a .NET System.String object using the PdfTextExtract class. You can set the text extraction mode with the PdfTextExtract.TextExtractMode property, choosing either to keep the original positioning of the text in the PDF document or to extract the text in a layout more suitable for reading.

The C# sample code below shows how easily you can extract the text from existing PDF documents. With just a few lines of code you can obtain the text representation of a PDF document:

// get the PDF file
string pdfFile = Server.MapPath("~") + @"\DemoFiles\Pdf\InputPdf.pdf";

// create the PDF text extractor
PdfTextExtract pdfTextExtract = new PdfTextExtract();

// set the text extraction mode
pdfTextExtract.TextExtractMode = GetTextExtractMode();

int fromPdfPageNumber = int.Parse(textBoxFromPage.Text);
int toPdfPageNumber = textBoxToPage.Text.Length > 0 ? int.Parse(textBoxToPage.Text) : 0;

// extract the text from a range of pages of the PDF document
string text = pdfTextExtract.ExtractText(pdfFile, fromPdfPageNumber, toPdfPageNumber);

// get UTF-8 bytes
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

// the UTF-8 marker
byte[] utf8Marker = new byte[] { 0xEF, 0xBB, 0xBF };

// the text document bytes with UTF-8 marker followed by UTF-8 bytes
byte[] bytes = new byte[utf8Bytes.Length + utf8Marker.Length];
Array.Copy(utf8Marker, 0, bytes, 0, utf8Marker.Length);
Array.Copy(utf8Bytes, 0, bytes, utf8Marker.Length, utf8Bytes.Length);

// inform the browser about the data format
HttpContext.Current.Response.AddHeader("Content-Type", "text/plain; charset=UTF-8");

// let the browser know how to open the text document and the text document name
HttpContext.Current.Response.AddHeader("Content-Disposition",
    String.Format("{0}; filename=ExtractedText.txt; size={1}", "attachment", bytes.Length.ToString()));

// write the text buffer to HTTP response
HttpContext.Current.Response.BinaryWrite(bytes);

// call End() method of HTTP response to stop ASP.NET page processing
HttpContext.Current.Response.End();
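
The manual copying of the UTF-8 marker in the sample above could also be written with Encoding.UTF8.GetPreamble(), which returns the same three bytes. The following is a minimal sketch rather than part of the HiQPdf demo; the Utf8TextFile class and its ToBytesWithBom method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text;

public static class Utf8TextFile
{
    // Returns the UTF-8 byte order mark (EF BB BF) followed by
    // the UTF-8 encoding of the given text.
    public static byte[] ToBytesWithBom(string text)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes(text);

        byte[] result = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, result, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, result, preamble.Length, body.Length);
        return result;
    }
}
```

With such a helper, the extracted text could be turned into the response buffer with a single call, for example byte[] bytes = Utf8TextFile.ToBytesWithBom(text).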

See also the live demo for Text Extraction from PDF documents for a fully functional example.

 

Search Text In PDF Using HiQPdf Library

With the HiQPdf Library for .NET you can search for text in a PDF document using the SearchText() method of the PdfTextExtract class. The parameters of this method let you choose whether to match the case and whether to match whole words only.

The C# code sample below shows how to search for text in an existing PDF document. The found text is then highlighted in the original PDF.

C# Code Sample to Search and Highlight Text in PDF

// get the PDF file
string pdfFile = Server.MapPath("~") + @"\DemoFiles\Pdf\InputPdf.pdf";

// get the text to search
string textToSearch = textBoxTextToSearch.Text;

// create the PDF text extractor
PdfTextExtract pdfTextExtract = new PdfTextExtract();

int fromPdfPageNumber = int.Parse(textBoxFromPage.Text);
int toPdfPageNumber = textBoxToPage.Text.Length > 0 ? int.Parse(textBoxToPage.Text) : 0;

// search the text in PDF document
PdfTextSearchItem[] searchTextInstances = pdfTextExtract.SearchText(pdfFile, textToSearch,
            fromPdfPageNumber, toPdfPageNumber, checkBoxMatchCase.Checked, checkBoxMatchWholeWord.Checked);

// load the PDF file to highlight the searched text
PdfDocument pdfDocument = PdfDocument.FromFile(pdfFile);

// highlight the searched text in PDF document
foreach (PdfTextSearchItem searchTextInstance in searchTextInstances)
{
    PdfRectangle pdfRectangle = new PdfRectangle(searchTextInstance.BoundingRectangle);

    // set rectangle color and opacity
    pdfRectangle.BackColor = Color.Yellow;
    pdfRectangle.Opacity = 30;

    // highlight the text
    pdfDocument.Pages[searchTextInstance.PdfPageNumber - 1].Layout(pdfRectangle);
}

// write the modified PDF document
try
{
    // write the PDF document to a memory buffer
    byte[] pdfBuffer = pdfDocument.WriteToMemory();

    // inform the browser about the binary data format
    HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");

    // let the browser know how to open the PDF document and the file name
    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment; filename=SearchText.pdf; size={0}",
                pdfBuffer.Length.ToString()));

    // write the PDF buffer to HTTP response
    HttpContext.Current.Response.BinaryWrite(pdfBuffer);

    // call End() method of HTTP response to stop ASP.NET page processing
    HttpContext.Current.Response.End();
}
finally
{
    pdfDocument.Close();
}

You can find a live demo for searching and highlighting text in PDF on the product website.

Partially Convert an HTML Page to PDF

The HiQPdf HTML to PDF converter allows you to convert only a selected HTML element from the HTML document. The selected element can be for example a TABLE element or a DIV element containing other HTML elements.

This feature is useful when you want to convert only a part of the HTML document. For example, a web page usually has a header with menu and logo and a footer with contact information and copyright notice besides the main HTML content you want to convert to PDF. In order to convert only the main content of the document you can place the main content in a block element like a DIV or a TABLE and configure the converter to convert only that block element.

The HTML element to convert is selected with the ConvertedHtmlElementSelector property, whose value is the CSS selector of that element. For example, the #MyHtmlElement CSS selector selects the HTML element with the 'MyHtmlElement' ID, and the *[class="ConvertibleElementStyle"] CSS selector selects only the HTML element having the 'ConvertibleElementStyle' CSS class. If a CSS selector matches several elements in the HTML document, only the first one is converted. The values of the attributes in CSS selectors are case sensitive. If this property is not set, the whole HTML document is converted.

C# Code Sample for Partially Converting an HTML Page to PDF

// create the HTML to PDF converter
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

// convert only the HTML element having the MyHtmlElement ID 
htmlToPdfConverter.ConvertedHtmlElementSelector = "#MyHtmlElement";
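
To make the header, main content and footer scenario described above more concrete, here is a hedged sketch that combines the selector property with the ConvertHtmlToMemory() call from the first sample. The HTML markup, the MainContent ID and the PartialConversionSketch class are hypothetical, and the HiQPdf namespace in the using directive is an assumption about how the library is referenced in your project.

```csharp
using HiQPdf; // assumed namespace of the HiQPdf library

public static class PartialConversionSketch
{
    // Hypothetical page: only the DIV with the MainContent ID should reach the PDF.
    private const string Html = @"<html><body>
  <div id=""SiteHeader"">Menu and logo</div>
  <div id=""MainContent""><h1>Monthly Report</h1><p>Report body</p></div>
  <div id=""SiteFooter"">Contact information and copyright notice</div>
</body></html>";

    public static byte[] ConvertMainContent(string baseUrl)
    {
        // create the HTML to PDF converter
        HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

        // convert only the element having the MainContent ID;
        // the header and footer DIVs are left out of the PDF
        htmlToPdfConverter.ConvertedHtmlElementSelector = "#MainContent";

        // convert the HTML code to a PDF memory buffer
        return htmlToPdfConverter.ConvertHtmlToMemory(Html, baseUrl);
    }
}
```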

You can test this feature live in the Convert Only a Selected Region of HTML Page demo.

Convert HTML with Web Fonts to PDF

Web Fonts give web designers great flexibility to create special text effects in an HTML document, because designers are no longer limited to the small set of fonts installed on the client computers displaying the document. Modern web browsers can download Web Fonts on the fly and use them to render the HTML document without installing those fonts on the local machine. The location from which the fonts can be downloaded is given in a CSS3 @font-face rule.

The HiQPdf HTML to PDF Converter has the capacity to convert HTML documents with Web Fonts. It offers support for TrueType fonts in .ttf files, OpenType fonts with TrueType Outlines in .otf files and Web Open Font Format (WOFF) fonts with TrueType Outlines in .woff files.

The Web Open Font Format (WOFF), as its name suggests, was designed to be used with web pages. It is based on a compression algorithm which makes font files smaller and more appropriate for distribution over a network. The WOFF format is currently supported by all major browsers (Firefox 3.6 and later versions, Google Chrome 6.0 and later versions, Internet Explorer 9 and later versions, Opera 11.10 and later versions, Safari 5.1 and later versions).

In the live demo for Converting HTML with Web Fonts to PDF you can learn how to define the web fonts in HTML using @font-face rules and see the C# code that converts such an HTML document to PDF.
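
As a rough idea of what such a document looks like, the sketch below builds an HTML string with one @font-face rule pointing at a WOFF file. It is not taken from the live demo: the WebFontHtml class, its Build method and any font name or URL passed to it are hypothetical, and the inputs are assumed to be trusted values that need no escaping.

```csharp
public static class WebFontHtml
{
    // Builds a small HTML document that declares a web font with a
    // CSS3 @font-face rule and applies it to the body text.
    public static string Build(string fontFamily, string fontUrl, string text)
    {
        return
            "<html><head><style>" +
            "@font-face { font-family: '" + fontFamily + "'; " +
            "src: url('" + fontUrl + "') format('woff'); } " +
            "body { font-family: '" + fontFamily + "', sans-serif; }" +
            "</style></head><body><p>" + text + "</p></body></html>";
    }
}
```

The returned string could then be passed as the HTML code argument of the ConvertHtmlToMemory() method shown in the first sample.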