Displaying a part of a pdf page and reading Acroforms with pdf.js
Introduction
pdf.js is, simply put, a PDF reader written in pure HTML5/Javascript and developed by Mozilla Labs.
It allows you to display pdfs in your webpages without requiring an external plugin but instead drawing the pdf to an HTML5 canvas.
This way, it is possible to directly interact with the pdf using Javascript.
In this article we will provide an example of using this library: we will render a pdf page and add controls to zoom in on the acroform fields contained within the rendered page.
In order to do this we need to solve two different tasks: we need to be able to render a pdf page with a specific x,y offset and we need to be able to read basic acroform field info.
pdf.js notably does not work on IE8, since that browser has very limited HTML5 support.
For more info on browser compatibility, check the “What browsers are supported?” question in the pdf.js FAQ.
Using the library
In this tutorial, we are using the latest stable release of pdf.js (1.0.473).
There are two ways to start using this library.
The first one is to start with importing the viewer that comes as an example with the pdf.js distribution.
It consists of 3 files – viewer.js, viewer.html and viewer.css – which implement a full pdf viewer.
This implementation uses all features of the pdf.js library, from basic to advanced.
If you want to embed a full pdf viewer in your page, it is probably best to start from this example and remove and add features as necessary.
The second one is to start from scratch: import the pdf.js and pdf.worker.js files and implement the actual viewer yourself.
I believe this is the best way to learn the pdf.js library basics and also the best approach when you don’t need a full pdf viewer but more specific functionality.
On the pdf.js GitHub page there are already some good jsbin examples of doing this: a simple hello world in which the first page of a pdf is rendered and a simple reader with previous/next controls.
Displaying a part of a pdf page
We will build on the simple reader example mentioned above.
However, instead of displaying one full page, we want to display a pdf page with a specific x,y offset.
Compared to the simple reader example, the most interesting part is our different implementation of the renderPage function:
function renderPage(pageNum, xOffset, yOffset) {
  pageRendering = true;
  // Using promise to fetch the page
  pdfDoc.getPage(pageNum).then(function(page) {
    var pdfViewBox = page.pageInfo.view;
    var pdfPageWidth = pdfViewBox[2];
    var pdfPageHeight = pdfViewBox[3];
    var viewport = new PDFJS.PageViewport(pdfViewBox, scale, page.rotate, -xOffset, -yOffset);

    // Render PDF page into canvas context
    var renderContext = {
      canvasContext : ctx,
      viewport : viewport
    };
    var renderTask = page.render(renderContext);

    // Wait for rendering to finish
    renderTask.promise.then(function() {
      pageRendering = false;
      if (pageNumPending !== null) {
        // New page rendering is pending
        renderPage(pageNumPending, pageLeftPending, pageTopPending);
        pageNumPending = null;
        pageLeftPending = 0;
        pageTopPending = 0;
      }
    });
  });
}
We replaced the “page.getViewport(scale)” call from the basic reader example with creating a new instance of “PDFJS.PageViewport”.
If we look in the pdf.js source code, we see that the page.getViewport function of the PDFPageProxy object does exactly that:
getViewport: function PDFPageProxy_getViewport(scale, rotate) {
  if (arguments.length < 2) {
    rotate = this.rotate;
  }
  return new PDFJS.PageViewport(this.view, scale, rotate, 0, 0);
}
However, the viewport returned by getViewport always has an x and y offset of 0, whereas we create one in which these offsets are set.
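To make the effect of the two offset arguments more concrete, here is a minimal sketch of the arithmetic as we understand it. It assumes an unrotated page whose view box starts at (0, 0), and it assumes that the viewport scales a point, flips the y axis and then adds the offsets. The helper name pdfPointToCanvas is ours and is not part of pdf.js.

```javascript
// Hypothetical helper (not part of pdf.js): maps a point in PDF coordinates
// to canvas coordinates for an unrotated page whose view box starts at (0, 0).
// Assumption: the point is scaled, the y axis is flipped (PDF y grows upwards,
// canvas y grows downwards) and the two offsets are added afterwards.
function pdfPointToCanvas(point, pageHeight, scale, offsetX, offsetY) {
  return [
    scale * point[0] + offsetX,
    scale * (pageHeight - point[1]) + offsetY
  ];
}

// On a 792 pt high page, the PDF point (100, 700) lies 92 pt below the top edge.
var withoutOffset = pdfPointToCanvas([100, 700], 792, 1, 0, 0);
// Passing negative offsets, as renderPage does, shifts that point to the origin.
var withOffset = pdfPointToCanvas([100, 700], 792, 1, -100, -92);

console.log(withoutOffset.join(', ')); // 100, 92
console.log(withOffset.join(', '));    // 0, 0
```

This is why renderPage negates xOffset and yOffset: a positive offset passed by the caller moves the page content up and to the left on the canvas.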
pdf.js and AcroForms
AcroForms is the part of the PDF specification that allows form fields and data to be included in a pdf in a standardized way.
The built-in support in pdf.js for AcroForms is relatively poor.
There is currently no way to get all the acroform field data from the loaded pdf in an easily digestible form.
However, some work has been done by pdf.js contributors in the form of this example: https://github.com/mozilla/pdf.js/blob/master/examples/acroforms/forms.js
What it comes down to is that it is not so difficult to read the basic acroform fields info using the getAnnotations method provided by the pdf.js API.
For example, the following variant of the Acroforms example prints all acroform field type, key and position info to the Javascript console:
function logAcroFormsToConsoleFor(page) {
  page.getAnnotations().then(function(items) {
    for (var i = 0; i < items.length; i++) {
      var item = items[i];
      switch (item.subtype) {
        case 'Widget':
          if (item.fieldType != 'Tx' && item.fieldType != 'Btn' && item.fieldType != 'Ch') {
            break;
          }
          if (item.fieldType == 'Tx') {
            console.log('found inputfield with key ' + item.fullName + ' (position: ' + item.rect + ')');
          }
          if (item.fieldType == 'Btn') {
            if (item.flags & 32768) {
              console.log('found radio button with key ' + item.fullName + ' (position: ' + item.rect + ')');
            } else if (item.flags & 65536) {
              console.log('found pushbutton with key ' + item.fullName + ' (position: ' + item.rect + ')');
            } else {
              console.log('found checkbox with key ' + item.fullName + ' (position: ' + item.rect + ')');
            }
          }
          if (item.fieldType == 'Ch') {
            console.log('found select box with key ' + item.fullName + ' (position: ' + item.rect + ')');
          }
      }
    }
  });
}
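The field type checks above can also be pulled into a small function that is easy to try out without a pdf. The following is only a sketch: the name describeField and the two constant names are ours, and the objects passed in are hand-made stand-ins that carry the same subtype, fieldType and flags properties the example above reads from the annotation items.

```javascript
// The two magic numbers from the example above, written as bit shifts.
var FLAG_RADIO = 1 << 15;      // 32768
var FLAG_PUSHBUTTON = 1 << 16; // 65536

// Hypothetical helper (not part of pdf.js): returns a short label for a
// widget annotation, or null when the item is not a supported form field.
function describeField(item) {
  if (item.subtype !== 'Widget') {
    return null;
  }
  switch (item.fieldType) {
    case 'Tx':
      return 'inputfield';
    case 'Btn':
      if (item.flags & FLAG_RADIO) {
        return 'radio button';
      }
      if (item.flags & FLAG_PUSHBUTTON) {
        return 'pushbutton';
      }
      return 'checkbox';
    case 'Ch':
      return 'select box';
    default:
      return null;
  }
}

// Hand-made stand-ins for annotation items, for illustration only.
var samples = [
  { subtype: 'Widget', fieldType: 'Tx', flags: 0 },
  { subtype: 'Widget', fieldType: 'Btn', flags: 32768 },
  { subtype: 'Widget', fieldType: 'Btn', flags: 0 },
  { subtype: 'Link' }
];
for (var i = 0; i < samples.length; i++) {
  console.log(describeField(samples[i]));
}
```

Keeping the classification separate from the logging means the same function could be reused when building the select box later in this article.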
Positioning the pdf with a specific offset for an acroform field
We now create a full example in which we combine displaying a pdf page with a specific offset with reading the acrofield information:
<!DOCTYPE html>
<html>
<body>
  <div>
    <button id="prev">Previous</button>
    <button id="next">Next</button>
    <span>Page: <span id="page_num"></span> / <span id="page_count"></span></span>
  </div>
  <div>
    <canvas id="the-canvas" style="border:1px solid black"></canvas>
  </div>
  <div>
    <form id="fieldsForm">
      Choose field:
      <select id="selectField" onchange="eval(this.value);">
      </select>
    </form>
  </div>

  <!-- Use latest PDF.js build from Github -->
  <script src="http://mozilla.github.io/pdf.js/build/pdf.js"></script>
  <script>
    var path = 'pdfs/7-NFR-Acroform.pdf';
    var pdfDoc = null,
        pageNum = 1,
        pageRendering = false,
        pageNumPending = null,
        pageLeftPending = 0,
        pageTopPending = 0,
        pageSelectPending = false,
        canvas = document.getElementById('the-canvas'),
        ctx = canvas.getContext('2d'),
        pdfPageWidth = 0,
        pdfPageHeight = 0,
        pdfRotate = 0;

    function removeOptions(selectbox) {
      var i;
      for (i = selectbox.options.length - 1; i >= 0; i--) {
        selectbox.remove(i);
      }
    }

    /**
     * Fills the select box with one option per acroform field on the page.
     */
    function addSelectBoxForPageAcroFieldsToForm(page) {
      page.getAnnotations().then(function(items) {
        var selectField = document.getElementById("selectField");
        removeOptions(selectField);
        var optionsHTML = [];
        if (items.length == 0) {
          optionsHTML.push("<option value=\"console.log('no fields clicked - doing nothing');\">No fields</option>");
        }
        for (var i = 0; i < items.length; i++) {
          var item = items[i];
          switch (item.subtype) {
            case 'Widget':
              if (item.fieldType != 'Tx' && item.fieldType != 'Btn' && item.fieldType != 'Ch') {
                break;
              }
              var fieldName;
              if (item.fieldType == 'Tx') {
                fieldName = 'Inputfield with key ' + item.fullName + ' (position: ' + item.rect + ')';
              }
              if (item.fieldType == 'Btn') {
                if (item.flags & 32768) {
                  fieldName = 'Radiobutton with key ' + item.fullName + ' (position: ' + item.rect + ')';
                } else if (item.flags & 65536) {
                  fieldName = 'Pushbutton with key ' + item.fullName + ' (position: ' + item.rect + ')';
                } else {
                  fieldName = 'Checkbox with key ' + item.fullName + ' (position: ' + item.rect + ')';
                }
              }
              if (item.fieldType == 'Ch') {
                fieldName = 'Selectbox with key ' + item.fullName + ' (position: ' + item.rect + ')';
              }
              // Declared locally so they do not leak into the global scope;
              // y falls back to 0 for rotations that are not handled yet
              var x = item.rect[0];
              var y = 0;
              if (pdfRotate == 90) {
                y = pdfPageWidth - item.rect[1];
              } else if (pdfRotate == 0) {
                y = pdfPageHeight - item.rect[1];
              } else {
                //TODO: other rotates
              }
              optionsHTML.push("<option value=\"queueRenderPage(" + pageNum + "," + x + "," + y + ");\">" + fieldName + "</option>");
          }
        }
        selectField.innerHTML = optionsHTML.join('\n');
      });
    }

    function renderPage(pageNum, xOffset, yOffset, renderAcroFieldSelect) {
      pageRendering = true;
      // Using promise to fetch the page
      pdfDoc.getPage(pageNum).then(function(page) {
        var pdfViewBox = page.pageInfo.view;
        pdfPageWidth = pdfViewBox[2];
        pdfPageHeight = pdfViewBox[3];
        pdfRotate = page.rotate;
        var viewport = new PDFJS.PageViewport(pdfViewBox, 1, page.rotate, -xOffset, -yOffset);

        // Render PDF page into canvas context
        var renderContext = {
          canvasContext : ctx,
          viewport : viewport
        };
        var renderTask = page.render(renderContext);

        // Wait for rendering to finish
        renderTask.promise.then(function() {
          if (renderAcroFieldSelect) {
            addSelectBoxForPageAcroFieldsToForm(page);
          }
          pageRendering = false;
          if (pageNumPending !== null) {
            // New page rendering is pending
            renderPage(pageNumPending, pageLeftPending, pageTopPending, pageSelectPending);
            pageNumPending = null;
            pageLeftPending = 0;
            pageTopPending = 0;
            pageSelectPending = false;
          }
        });

        // Update page counters
        document.getElementById('page_num').textContent = pageNum;
      });
    }

    /**
     * If another page rendering is in progress, waits until the rendering is
     * finished. Otherwise, executes rendering immediately.
     * The last argument is passed through so that the select box is refreshed
     * when the page changes.
     */
    function queueRenderPage(num, xOffset, yOffset, renderAcroFieldSelect) {
      if (pageRendering) {
        pageNumPending = num;
        pageLeftPending = xOffset;
        pageTopPending = yOffset;
        pageSelectPending = !!renderAcroFieldSelect;
      } else {
        renderPage(num, xOffset, yOffset, !!renderAcroFieldSelect);
      }
    }

    /**
     * Displays previous page.
     */
    function onPrevPage() {
      if (pageNum <= 1) {
        return;
      }
      pageNum--;
      queueRenderPage(pageNum, 0, 0, true);
    }
    document.getElementById('prev').addEventListener('click', onPrevPage);

    /**
     * Displays next page.
     */
    function onNextPage() {
      if (pageNum >= pdfDoc.numPages) {
        return;
      }
      pageNum++;
      queueRenderPage(pageNum, 0, 0, true);
    }
    document.getElementById('next').addEventListener('click', onNextPage);

    /**
     * Asynchronously downloads PDF.
     */
    PDFJS.getDocument(path).then(function (pdfDoc_) {
      pdfDoc = pdfDoc_;
      document.getElementById('page_count').textContent = pdfDoc.numPages;

      // Initial/first page rendering
      renderPage(pageNum, 0, 0, true);
    });
  </script>
</body>
</html>
We also put the snippet on JSBin. However, we do not have a web server that allows cross-origin requests, so the snippet cannot reference a pdf with actual AcroForms.
To run the snippet, you will need to serve it from a web server of your own on which you deploy a sample pdf with AcroForms such as this one.
The snippet displays a particular page of a pdf, reads the acroform fields on that page and then adds an option to a select box for each field that was found.
When one of the options in the select box is selected, the page is repositioned so that the corresponding field is in the upper left corner of the pdf display.
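The full example stores a piece of Javascript in each option value and runs it with eval. As a possible alternative, the target position could be stored as plain data and parsed when an option is selected. The sketch below illustrates that idea. All three function names are ours and not part of pdf.js, and the offset computation simply mirrors the one in the example, including its lack of support for rotations other than 0 and 90.

```javascript
// Hypothetical helper: computes the render offsets for a field rectangle,
// mirroring the computation in the full example. Returns null for
// rotations that the example does not handle.
function fieldTarget(rect, pageWidth, pageHeight, rotate) {
  var x = rect[0];
  if (rotate === 0) {
    return { x: x, y: pageHeight - rect[1] };
  }
  if (rotate === 90) {
    return { x: x, y: pageWidth - rect[1] };
  }
  return null;
}

// Hypothetical helper: encodes page number and offsets as a JSON array.
// An array of numbers contains no quotes, so it can be used as an
// option value without escaping.
function encodeTarget(pageNum, target) {
  return JSON.stringify([pageNum, target.x, target.y]);
}

// Hypothetical helper: the inverse of encodeTarget.
function decodeTarget(value) {
  var parts = JSON.parse(value);
  return { pageNum: parts[0], x: parts[1], y: parts[2] };
}

// Illustration with made-up numbers: a field on page 3 of a 612 x 792 pt page.
var target = fieldTarget([100, 700, 200, 720], 612, 792, 0);
var optionValue = encodeTarget(3, target);
console.log(optionValue); // [3,100,92]
```

A change handler on the select box could then call decodeTarget on the selected value and pass the resulting pageNum, x and y to queueRenderPage, instead of evaluating a string.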
Compliments to Mozilla Labs for pdf.js
Mozilla Labs did an awesome job at developing pdf.js.
It is a very interesting and useful Javascript library.
However, better Acroforms support would make it even more useful.
In particular, we would like:
- A way of getting all appearance states for select boxes and checkbox acroform fields from the API.
- A way of setting the value of an acroform field on the pdf and rerendering it so the field is filled.
Currently, the API does not provide a way to do this (something like item.fieldValue = 'test' has no effect on the rendered pdf, since these items are not taken into account at all when rendering a pdf).