Convert HTML to PDF in ASP.NET and MVC with C# and VB.NET

Most websites already produce reports or present results as HTML pages. While HTML content is simple to generate and edit, it is not well suited to printing or to transmission by email. The de facto standard for printing is the PDF format. The HiQPdf HTML to PDF Converter for .NET can be used in your .NET applications to transform any HTML page into a PDF document while preserving the original appearance of the HTML document.

The HiQPdf Library for .NET offers you a modern, simple, fast, flexible and powerful tool to create complex and stylish PDF documents in your applications with just a few lines of code.

Using the high quality HTML to PDF conversion engine you can easily design a document in HTML with CSS3, JavaScript, SVG or Canvas and then convert it to PDF preserving the exact content and style.

The main features of the converter are listed below:

  • Convert HTML and HTML5 Documents and Web Pages to PDF
  • Convert URLs and HTML Strings to PDF Files or Memory Buffers
  • Set the PDF Page Size and Orientation
  • Fit HTML Content in PDF Page Size
  • Advanced Support for Web Fonts in .WOFF and .TTF Formats
  • Advanced Support for Scalable Vector Graphics (SVG)
  • Advanced Support for HTML5 and CSS3
  • Delayed Conversion Triggering Mode
  • Control PDF page breaks with page-break CSS attributes in HTML
  • Repeat HTML Table Header and Footer on Each PDF Page
  • Packaged and Delivered as a Zip Archive
  • No External Dependencies
  • Direct Copy Deployment Supported
  • ASP.NET and Windows Forms Samples, Complete Documentation
  • Supported on All Windows Versions

You can find all the HiQPdf Library for .NET features, with a brief description of each one, on the product page.

The C# sample code below shows how easily you can create PDF documents from existing HTML pages or HTML strings. With just a few lines of code you can get a richly formatted PDF document:

protected void buttonConvertToPdf_Click(object sender, EventArgs e)
{
    // create the HTML to PDF converter
    HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

    // set browser width
    htmlToPdfConverter.BrowserWidth = int.Parse(textBoxBrowserWidth.Text);

    // set browser height if specified, otherwise use the default
    if (textBoxBrowserHeight.Text.Length > 0)
        htmlToPdfConverter.BrowserHeight = int.Parse(textBoxBrowserHeight.Text);

    // set HTML Load timeout
    htmlToPdfConverter.HtmlLoadedTimeout = int.Parse(textBoxLoadHtmlTimeout.Text);

    // set PDF page size and orientation
    htmlToPdfConverter.Document.PageSize = GetSelectedPageSize();
    htmlToPdfConverter.Document.PageOrientation = GetSelectedPageOrientation();

    // set the PDF standard used by the document
    htmlToPdfConverter.Document.PdfStandard = checkBoxPdfA.Checked ? PdfStandard.PdfA : PdfStandard.Pdf;

    // set PDF page margins
    htmlToPdfConverter.Document.Margins = new PdfMargins(5);

    // set whether to embed the true type font in PDF
    htmlToPdfConverter.Document.FontEmbedding = checkBoxFontEmbedding.Checked;

    // set triggering mode; for WaitTime mode set the wait time before convert
    switch (dropDownListTriggeringMode.SelectedValue)
    {
        case "Auto":
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.Auto;
            break;
        case "WaitTime":
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.WaitTime;
            htmlToPdfConverter.WaitBeforeConvert = int.Parse(textBoxWaitTime.Text);
            break;
        case "Manual":
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.Manual;
            break;
        default:
            htmlToPdfConverter.TriggerMode = ConversionTriggerMode.Auto;
            break;
    }

    // set header and footer
    SetHeader(htmlToPdfConverter.Document);
    SetFooter(htmlToPdfConverter.Document);

    // set the document security
    htmlToPdfConverter.Document.Security.OpenPassword = textBoxOpenPassword.Text;
    htmlToPdfConverter.Document.Security.AllowPrinting = checkBoxAllowPrinting.Checked;

    // set the permissions password too if an open password was set
    if (!String.IsNullOrEmpty(htmlToPdfConverter.Document.Security.OpenPassword))
        htmlToPdfConverter.Document.Security.PermissionsPassword = htmlToPdfConverter.Document.Security.OpenPassword + "_admin";

    // convert HTML to PDF
    byte[] pdfBuffer = null;

    if (radioButtonConvertUrl.Checked)
    {
        // convert URL to a PDF memory buffer
        string url = textBoxUrl.Text;

        pdfBuffer = htmlToPdfConverter.ConvertUrlToMemory(url);
    }
    else
    {
        // convert HTML code
        string htmlCode = textBoxHtmlCode.Text;
        string baseUrl = textBoxBaseUrl.Text;

        // convert HTML code to a PDF memory buffer
        pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlCode, baseUrl);
    }

    // inform the browser about the binary data format
    HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");

    // let the browser know how to open the PDF document, attachment or inline, and the file name
    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("{0}; filename=HtmlToPdf.pdf; size={1}",
        checkBoxOpenInline.Checked ? "inline" : "attachment", pdfBuffer.Length.ToString()));

    // write the PDF buffer to HTTP response
    HttpContext.Current.Response.BinaryWrite(pdfBuffer);

    // call End() method of HTTP response to stop ASP.NET page processing
    HttpContext.Current.Response.End();
}

You can find more HTML to PDF C# and VB.NET samples in the online demo.
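
The sample above reads its numeric settings from text boxes with int.Parse, which throws an exception when a box is empty or contains non-numeric text. The sketch below shows one way to fall back to a default instead. The InputParsing class, the ParsePositiveOrDefault helper and the fallback value 1024 are hypothetical names and placeholders chosen for illustration; they are not part of the HiQPdf library.

```csharp
using System;

public static class InputParsing
{
    // Hypothetical helper (not part of HiQPdf): returns the parsed value,
    // or the fallback when the text is null, empty, non-numeric or not positive.
    public static int ParsePositiveOrDefault(string text, int fallback)
    {
        int value;
        return int.TryParse(text, out value) && value > 0 ? value : fallback;
    }

    public static void Main()
    {
        // 1024 is only a placeholder fallback, not a documented library default.
        Console.WriteLine(ParsePositiveOrDefault("1200", 1024)); // prints 1200
        Console.WriteLine(ParsePositiveOrDefault("", 1024));     // prints 1024
        Console.WriteLine(ParsePositiveOrDefault("abc", 1024));  // prints 1024
    }
}
```

A call such as ParsePositiveOrDefault(textBoxBrowserWidth.Text, 1024) could then replace the direct int.Parse call when assigning BrowserWidth, assuming a fallback is acceptable for your application.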

HiQPdf Multi-Platform HTML to PDF Library for .NET

The HiQPdf Multi-Platform Solution delivers the same power and quality to your .NET Core applications on many platforms, including the most restrictive ones. The solution consists of the HiQPdf Server application, which can run as an Azure Cloud Service or as a Windows Service, and a client library for .NET Core that can be used in any .NET Core application on any platform that supports .NET Core or .NET Standard 2.0 and above.

The full list of features can be seen at https://www.hiqpdf.com/multiplatform-html-to-pdf-library-net-core.aspx

You can deploy your .NET Core applications on Windows, Linux and macOS operating systems or in more restrictive environments like Azure App Service for Windows and Linux. You can also use the library in Universal Windows Platform applications or in Xamarin applications for Android and iOS.

Download the Server Zip Package from the downloads page and extract it into a folder close to the root folder to avoid working with long file path names. Follow the detailed instructions in the ‘Installation Guide.pdf’ document to install the server and obtain the server IP address that you will use in the client applications. For quick testing you can use either the Windows Service installed on localhost or the Azure Cloud Service started in the emulator; in both cases the IP address is 127.0.0.1.

Download the Client Library for .NET Core Zip Package, extract it into a folder, then open the demo application from the Samples folder in Visual Studio to build and run it. By default the application uses the localhost IP address. If everything works well on localhost you can start the production deployment. Instead of using our demo application you can create your own .NET Core application in Visual Studio, add a reference to the HiQPdf.Client NuGet package or to the assembly from the product package, and use the simple code below to convert a URL to a PDF document that you can save to a file or send to the browser for download.

using HiQPdfClient;

 // Create the converter object
 HtmlToPdf converter = new HtmlToPdf("{server_ip_address}");

 // Convert the HTML page from URL to memory
 byte[] pdfData = converter.ConvertUrlToMemory(UrlToConvert);

 // Save the PDF data to a file
 System.IO.File.WriteAllBytes("output.pdf", pdfData);

 // Alternatively convert and save to a file in one step
 converter.ConvertUrlToFile(UrlToConvert, "output.pdf");

 // Send the PDF data for download in ASP.NET Core applications (inside a controller action)
 FileResult fileResult = new FileContentResult(pdfData, "application/pdf");
 fileResult.FileDownloadName = "Output.pdf";
 return fileResult;

 // Alternatively, send the PDF data for download in ASP.NET Web Forms applications
 HttpResponse httpResponse = HttpContext.Current.Response;
 httpResponse.AddHeader("Content-Type", "application/pdf");
 httpResponse.AddHeader("Content-Disposition",
            String.Format("attachment; filename=ConvertHtmlPart.pdf; size={0}",
            pdfData.Length.ToString()));
 httpResponse.BinaryWrite(pdfData);
 httpResponse.End();

The multi-platform solution offers the same features as the regular library for .NET and .NET Core, so you get the same power and quality on various operating systems and in the most restrictive environments.

Extract Text from PDF Documents in .NET Applications

With the HiQPdf Library you can extract the text from PDF documents into a .NET System.String object using the PdfTextExtract class. You can set the text extraction mode with the PdfTextExtract.TextExtractMode property, choosing either to keep the original positioning of the text in the PDF document or to extract the text in a layout more suitable for reading.

The C# sample code below shows how easily you can extract the text from existing PDF documents. With just a few lines of code you can obtain the text representation of a PDF document:

// get the PDF file
string pdfFile = Server.MapPath("~") + @"\DemoFiles\Pdf\InputPdf.pdf";

// create the PDF text extractor
PdfTextExtract pdfTextExtract = new PdfTextExtract();

// set the text extraction mode
pdfTextExtract.TextExtractMode = GetTextExtractMode();

int fromPdfPageNumber = int.Parse(textBoxFromPage.Text);
int toPdfPageNumber = textBoxToPage.Text.Length > 0 ? int.Parse(textBoxToPage.Text) : 0;

// extract the text from a range of pages of the PDF document
string text = pdfTextExtract.ExtractText(pdfFile, fromPdfPageNumber, toPdfPageNumber);

// get UTF-8 bytes
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

// the UTF-8 marker
byte[] utf8Marker = new byte[] { 0xEF, 0xBB, 0xBF };

// the text document bytes with UTF-8 marker followed by UTF-8 bytes
byte[] bytes = new byte[utf8Bytes.Length + utf8Marker.Length];
Array.Copy(utf8Marker, 0, bytes, 0, utf8Marker.Length);
Array.Copy(utf8Bytes, 0, bytes, utf8Marker.Length, utf8Bytes.Length);

// inform the browser about the data format
HttpContext.Current.Response.AddHeader("Content-Type", "text/plain; charset=UTF-8");

// let the browser know how to open the text document and the text document name
HttpContext.Current.Response.AddHeader("Content-Disposition",
    String.Format("{0}; filename=ExtractedText.txt; size={1}", "attachment", bytes.Length.ToString()));

// write the text buffer to HTTP response
HttpContext.Current.Response.BinaryWrite(bytes);

// call End() method of HTTP response to stop ASP.NET page processing
HttpContext.Current.Response.End();

See also the live demo for Text Extraction from PDF documents for a fully functional example.
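
The sample above builds the UTF-8 marker by hand and copies it in front of the text bytes. As a possible alternative, the same byte sequence can be obtained from the .NET Encoding class. The sketch below uses only standard .NET calls; the TextDownload class and the ToUtf8WithBom method are hypothetical names used for illustration and are not part of the HiQPdf library.

```csharp
using System;
using System.Linq;
using System.Text;

public static class TextDownload
{
    // Returns the UTF-8 byte order mark followed by the UTF-8 bytes of the text.
    public static byte[] ToUtf8WithBom(string text)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // the bytes EF BB BF
        byte[] body = Encoding.UTF8.GetBytes(text);
        return preamble.Concat(body).ToArray();
    }

    public static void Main()
    {
        byte[] bytes = ToUtf8WithBom("PDF text");
        Console.WriteLine(BitConverter.ToString(bytes, 0, 3)); // prints EF-BB-BF
        Console.WriteLine(bytes.Length);                       // prints 11
    }
}
```

The resulting array can be passed to Response.BinaryWrite in the same way as the bytes array in the sample above.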

Search Text In PDF Using HiQPdf Library

With the HiQPdf Library for .NET you can search for text in a PDF document using the SearchText() method of the PdfTextExtract class. The parameters of this method let you choose whether to match the case and whether to match whole words only.

In the C# code sample below you can see how to search for a text in an existing PDF document. The found text is then highlighted in the original PDF.

C# Code Sample to Search and Highlight Text in PDF

// get the PDF file
string pdfFile = Server.MapPath("~") + @"\DemoFiles\Pdf\InputPdf.pdf";

// get the text to search
string textToSearch = textBoxTextToSearch.Text;

// create the PDF text extractor
PdfTextExtract pdfTextExtract = new PdfTextExtract();

int fromPdfPageNumber = int.Parse(textBoxFromPage.Text);
int toPdfPageNumber = textBoxToPage.Text.Length > 0 ? int.Parse(textBoxToPage.Text) : 0;

// search the text in PDF document
PdfTextSearchItem[] searchTextInstances = pdfTextExtract.SearchText(pdfFile, textToSearch,
            fromPdfPageNumber, toPdfPageNumber, checkBoxMatchCase.Checked, checkBoxMatchWholeWord.Checked);

// load the PDF file to highlight the searched text
PdfDocument pdfDocument = PdfDocument.FromFile(pdfFile);

// highlight the searched text in PDF document
foreach (PdfTextSearchItem searchTextInstance in searchTextInstances)
{
    PdfRectangle pdfRectangle = new PdfRectangle(searchTextInstance.BoundingRectangle);

    // set rectangle color and opacity
    pdfRectangle.BackColor = Color.Yellow;
    pdfRectangle.Opacity = 30;

    // highlight the text
    pdfDocument.Pages[searchTextInstance.PdfPageNumber - 1].Layout(pdfRectangle);
}

// write the modified PDF document
try
{
    // write the PDF document to a memory buffer
    byte[] pdfBuffer = pdfDocument.WriteToMemory();

    // inform the browser about the binary data format
    HttpContext.Current.Response.AddHeader("Content-Type", "application/pdf");

    // let the browser know how to open the PDF document and the file name
    HttpContext.Current.Response.AddHeader("Content-Disposition", String.Format("attachment; filename=SearchText.pdf; size={0}",
                pdfBuffer.Length.ToString()));

    // write the PDF buffer to HTTP response
    HttpContext.Current.Response.BinaryWrite(pdfBuffer);

    // call End() method of HTTP response to stop ASP.NET page processing
    HttpContext.Current.Response.End();
}
finally
{
    pdfDocument.Close();
}

You can find a live demo for searching and highlighting text in PDF on the product website.

Partially Convert an HTML Page to PDF

The HiQPdf HTML to PDF converter allows you to convert only a selected HTML element from the HTML document. The selected element can be for example a TABLE element or a DIV element containing other HTML elements.

This feature is useful when you want to convert only a part of the HTML document. For example, a web page usually has a header with menu and logo and a footer with contact information and copyright notice besides the main HTML content you want to convert to PDF. In order to convert only the main content of the document you can place the main content in a block element like a DIV or a TABLE and configure the converter to convert only that block element.

The HTML element to be converted is selected by the ConvertedHtmlElementSelector property. This property can be set to the CSS selector of the HTML element to be converted. For example, the #MyHtmlElement CSS selector selects the HTML element with the ‘MyHtmlElement’ ID, and the *[class="ConvertibleElementStyle"] CSS selector selects only the HTML element whose class attribute is ‘ConvertibleElementStyle’. If a CSS selector matches several elements in the HTML document, only the first one is converted. The values of the attributes in the CSS selectors are case sensitive. If this property is not set, the whole HTML document is converted.

C# Code Sample for Partially Converting an HTML Page to PDF

// create the HTML to PDF converter
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();

// convert only the HTML element having the MyHtmlElement ID 
htmlToPdfConverter.ConvertedHtmlElementSelector = "#MyHtmlElement";

You can test this feature live in the Convert Only a Selected Region of HTML Page demo.
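
To make the selection rules described above concrete, the sketch below emulates the #MyHtmlElement selector on a small, well-formed page using the standard System.Xml.Linq classes. It does not use the HiQPdf conversion engine and is only an illustration of the two rules: the first matching element wins, and attribute values are compared case sensitively. The SelectorSketch class, the SelectById method and the sample markup are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SelectorSketch
{
    // Emulates the "#id" selector rule: the first element in document order
    // whose id attribute matches exactly (case sensitive) is returned.
    public static XElement SelectById(XDocument page, string id)
    {
        return page.Descendants()
                   .FirstOrDefault(e => (string)e.Attribute("id") == id);
    }

    public static void Main()
    {
        // A page with a header, the main content and a footer.
        XDocument page = XDocument.Parse(
            "<html><body>" +
            "<div id='header'>Menu and logo</div>" +
            "<div id='MyHtmlElement'>Main content</div>" +
            "<div id='footer'>Contact and copyright</div>" +
            "</body></html>");

        XElement selected = SelectById(page, "MyHtmlElement");
        Console.WriteLine(selected.Value);                             // prints Main content
        Console.WriteLine(SelectById(page, "myhtmlelement") == null);  // prints True
    }
}
```

With a page structured like this, setting ConvertedHtmlElementSelector to "#MyHtmlElement" would be expected to leave the header and footer blocks out of the PDF, as described in the section above.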