iTextSharp Tutorial
iTextSharp, a Free C#-PDF library
Part I: Simple iText
Chapter 1: Creating a document
Creation of a document in 5 steps: Hello World
Download the source code of the first example: Chap0101.cs. This example contains the 5 most important steps to create a PDF file using iText:
Step 1: Creates an instance of the iTextSharp.text.Document-object:
Document document = new Document();
Step 2: Creates a Writer that listens to this document and writes the document to the Stream of your choice:
PdfWriter.getInstance(document, new FileStream("Chap0101.pdf", FileMode.Create));
Step 3: Opens the document:
document.Open();
Step 4: Adds content to the document:
document.Add(new Paragraph("Hello World"));
Step 5: Closes the document:
document.Close();
Check the result here: Chap0101.pdf.
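The five steps can be assembled into one small program. This is only a sketch, not a copy of Chap0101.cs: it assumes the iTextSharp assembly is referenced, and it keeps the method spelling used in this tutorial (getInstance). Later iTextSharp releases spell that method GetInstance, so adjust it to the version you compile against. The class name HelloWorld is our own choice.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public class HelloWorld
{
    public static void Main()
    {
        // Step 1: create the document (A4 with 36-point margins by default).
        Document document = new Document();

        // Step 2: attach a PDF writer that writes to a file stream.
        // Assumption: spelled as in this tutorial; newer releases use GetInstance.
        PdfWriter.getInstance(document, new FileStream("Chap0101.pdf", FileMode.Create));

        // Step 3: open the document.
        document.Open();

        // Step 4: add content.
        document.Add(new Paragraph("Hello World"));

        // Step 5: close the document, which also flushes and closes the stream.
        document.Close();
    }
}
```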
Examining step 1: the Document-object
The iTextSharp.text.Document-object has 3 constructors:
public Document();
public Document(Rectangle pageSize);
public Document(Rectangle pageSize, int marginLeft, int marginRight, int marginTop, int marginBottom);
The first constructor calls the second one, with PageSize.A4 as parameter. The second constructor calls the third one, with 36 as the value for each margin.
PageSize
You could create your own Rectangle-object in a certain color and use this as pageSize. In example Chap0102.cs, we create a long, narrow document with a yellowish background color:
Rectangle pageSize = new Rectangle(144, 720);
pageSize.BackgroundColor = new Color(0xFF, 0xFF, 0xDE);
Document document = new Document(pageSize);
This is the result: Chap0102.pdf.
Normally, you don't have to worry about creating this rectangle, since you can use one of the statics in class PageSize.cs. These are the page sizes that are provided: A0-A10, LEGAL, LETTER, HALFLETTER, _11x17, LEDGER, NOTE, B0-B5, ARCH_A-ARCH_E, FLSA and FLSE.
Most of these page sizes are in PORTRAIT-format. If you want them to be in LANDSCAPE, all you have to do is rotate() the Rectangle:
Document document = new Document(PageSize.A4.rotate());
Check out example Chap0103.cs and its result.
Margins
When creating a document, you can also define left, right, upper and lower margins:
Document document = new Document(PageSize.A5, 36, 72, 108, 180);
In example Chap0104.cs (and its result: Chap0104.pdf), you will see that this document has a left margin of 0.5 inch and a right margin of 1 inch. The upper margin is 1.5 inch, the lower 2.5 inch.
Measurements
When creating a rectangle or choosing a margin, you might wonder what measurement unit is used: centimeters, inches or pixels. In fact, the default measurement system roughly corresponds to the various definitions of the typographic unit of measurement known as the point. There are 72 points in 1 inch. If you want to create a rectangle in PDF that has the size of an A4-page, you have to calculate the number of points:
21 cm / 2.54 = 8.2677 inch
8.2677 inch * 72 = 595 points
29.7 cm / 2.54 = 11.6929 inch
11.6929 inch * 72 = 842 points
The default margin of 36 points corresponds to half an inch.
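The conversion above can be wrapped in a small helper so that margins and page sizes can be written in familiar units. The class PointConverter below is hypothetical (it is not part of iTextSharp) and is only a sketch of the arithmetic shown in this section.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp: converts lengths to PDF points.
public static class PointConverter
{
    public const double PointsPerInch = 72.0;
    public const double CentimetersPerInch = 2.54;

    // 1 inch = 72 points.
    public static double FromInches(double inches)
    {
        return inches * PointsPerInch;
    }

    // Centimeters are first converted to inches, then to points.
    public static double FromCentimeters(double centimeters)
    {
        return centimeters / CentimetersPerInch * PointsPerInch;
    }
}

public static class PointConverterDemo
{
    public static void Main()
    {
        // An A4 page is 21 cm x 29.7 cm.
        Console.WriteLine(Math.Round(PointConverter.FromCentimeters(21.0)));  // 595
        Console.WriteLine(Math.Round(PointConverter.FromCentimeters(29.7)));  // 842
        // The default margin of half an inch.
        Console.WriteLine(PointConverter.FromInches(0.5));                    // 36
    }
}
```

The results are rounded only for display; the unrounded values can be passed on wherever a fractional number of points is accepted.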
Remark: if you change the page size, this only has effect on the next page (see page initializations). If you change the margins, this has an immediate effect, so be careful!
Examining step 2: the Writer-object
Once our document is created, we can create one or more instances of writers that listen to this document. All writers should be derived from the abstract class iTextSharp.text.DocWriter. For the moment there is one possibility: you can use iTextSharp.text.pdf.PdfWriter to generate documents in the Portable Document Format. If for instance you want to generate TeX-documents as well, you could write a package: iTextSharp.text.TeX.TeXWriter.
Remark: There is also a package iTextSharp.text.xml (see Chapter 7).
The constructor of these writer-classes is made private. You can only create an instance with the following method:
public static xxxWriter getInstance(Document document, Stream os) (xxx being Pdf or Xml)
You can create an instance this way:
PdfWriter writer = PdfWriter.getInstance(document, new FileStream("Chap01xx.pdf", FileMode.Create));
but you will hardly ever need the object writer (except if you want to create Advanced PDF or if you want to use some very specific functionalities such as ViewerPreferences or Encryption). So it's sufficient to just get the instance:
PdfWriter.getInstance(document, new FileStream("Chap01xx.pdf", FileMode.Create));
It goes without saying that the first parameter should be the document you created in step 1. The second parameter can be a Stream of any kind. Until now, we have always used a System.IO.FileStream to write the document to a file, but in example Chap0105.aspx, the output stream is a System.IO.MemoryStream. (This is not a standalone example; you will have to test this code on a web server that runs ASP.NET.) Here is a sample of a code-behind version of the same example: Chap0105.cs.
Examining step 3: Metadata + opening the document
Metadata
Before you add any actual data (= content), you might want to add some metadata about the document with one of these methods:
public bool addTitle(String title)
public bool addSubject(String subject)
public bool addKeywords(String keywords)
public bool addAuthor(String author)
public bool addCreator(String creator)
public bool addProducer()
public bool addCreationDate()
public bool addHeader(String name, String content)
You can choose your own title, subject, keywords, author and creator, but the method that adds the producer data should always add iTextSharp (or a reference to iTextSharp), and the method that adds the creation date always adds the current system time (as a matter of fact, these two methods are called automatically). You can also add a header with a custom name, but this will have no effect on the PdfWriter.
If we look at the example in step 1: Chap0101.pdf, we see that only the producer and date are shown in the infobox. If we run Chap0106.cs, the result is a similar document: Chap0106.pdf, but there are more items added to the infobox.
Things to do BEFORE you open the document
You can only add metadata BEFORE the open-method is invoked. This is a choice made by the developer of iText. In HTML, meta-information is put between the HEAD-tags at the beginning of the document (the header section). Invoking the open-method causes the writer to write this header to the Stream. So there is no way to change this data once the document is 'opened'. The PDF header doesn't contain any metadata; it looks like this:
%PDF-1.2
%рсту
The first line indicates the generated document is a file in the Portable Document Format version 1.2. The meaning of the second line is explained in the reference manual.
Portable Document Format Reference Manual Version 1.3 (section 2.3.2 'Portability' page 22): A PDF file is a binary file; the entire 8-bit range of characters may be used. Unfortunately, some agents treat files that happen to use only the printable subset of the 7-bit ASCII code and whitespace characters as text, and take unreasonable liberties with the contents. For example, mail-transmission systems may not preserve certain 7-bit characters and may change line endings. This can cause damage to PDF files. Therefore, in situations where it is possible to label PDF files as binary, we recommend that this be done. One method for encouraging such treatment is to include a few binary characters (codes greater than 127) in a comment near the beginning of the file.
In
PDF, the metadata is kept in a PdfInfo-object, written to the PdfWriter
when the document is closed. So there is no technical reason why one
couldn't alter the library to be able to add or change the metadata
at any time. It's a design choice that was made.
Page initializations
The Open-method also triggers some initializations in the different writers. For instance if you want a Watermark or a HeaderFooter-object to appear starting on the FIRST page of the document, you have to add it BEFORE you open the document. The same goes for setting watermark, headers, footers, page numbers and sizes for the rest of the pages in the document. When you invoke methods such as:
public bool setPageSize(Rectangle pageSize)
public bool Add(Watermark watermark)
public void removeWatermark()
setting Header property
public void resetHeader()
setting Footer property
public void resetFooter()
public void resetPageCount()
setting PageCount property
the result of these methods will only be seen on the next new page (when the initialization methods of this page are called). This is illustrated in example 7. If you want to try this example, you will need an image file called watermark.jpg. The result should look like Chap0107.pdf.
Viewer preferences
For PDF files you can also set some viewer preferences with the method:
public void setViewerPreferences(int preferences)
In example 8, some of the possibilities are demonstrated:
writerA.setViewerPreferences(PdfWriter.PageLayoutTwoColumnLeft); (see Chap0108a.pdf)
writerB.setViewerPreferences(PdfWriter.HideMenubar | PdfWriter.HideToolbar); (see Chap0108b.pdf)
writerC.setViewerPreferences(PdfWriter.PageLayoutTwoColumnLeft | PdfWriter.PageModeFullScreen | PdfWriter.NonFullScreenPageModeUseThumbs); (see Chap0108c.pdf)
As you can see, the preferences can be set by ORing some of these constants:
- The page layout to be used when the document is opened (choose one).
  - PdfWriter.PageLayoutSinglePage - Display one page at a time.
  - PdfWriter.PageLayoutOneColumn - Display the pages in one column.
  - PdfWriter.PageLayoutTwoColumnLeft - Display the pages in two columns, with odd-numbered pages on the left.
  - PdfWriter.PageLayoutTwoColumnRight - Display the pages in two columns, with odd-numbered pages on the right.
- The page mode, specifying how the document should be displayed when opened (choose one).
  - PdfWriter.PageModeUseNone - Neither document outline nor thumbnail images visible.
  - PdfWriter.PageModeUseOutlines - Document outline visible.
  - PdfWriter.PageModeUseThumbs - Thumbnail images visible.
  - PdfWriter.PageModeFullScreen - Full-screen mode, with no menu bar, window controls, or any other window visible.
- PdfWriter.HideToolbar - A flag specifying whether to hide the viewer application's tool bars when the document is active.
- PdfWriter.HideMenubar - A flag specifying whether to hide the viewer application's menu bar when the document is active.
- PdfWriter.HideWindowUI - A flag specifying whether to hide user interface elements in the document's window (such as scroll bars and navigation controls), leaving only the document's contents displayed.
- PdfWriter.FitWindow - A flag specifying whether to resize the document's window to fit the size of the first displayed page.
- PdfWriter.CenterWindow - A flag specifying whether to position the document's window in the center of the screen.
- The document's page mode, specifying how to display the document on exiting full-screen mode. It is meaningful only if the page mode is PageModeFullScreen (choose one).
  - PdfWriter.NonFullScreenPageModeUseNone - Neither document outline nor thumbnail images visible.
  - PdfWriter.NonFullScreenPageModeUseOutlines - Document outline visible.
  - PdfWriter.NonFullScreenPageModeUseThumbs - Thumbnail images visible.
Remark: you can invoke this method only on an object of type PdfWriter.
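For readers who have not combined bit flags before, the sketch below shows what ORing does and how a single flag can be tested afterwards. The enum ViewerFlags is a hypothetical stand-in: its member names mirror some of the PdfWriter constants above, but the numeric values are illustrative only and are not taken from the library.

```csharp
using System;

// Hypothetical stand-in for a few PdfWriter constants.
// Assumption: the values are illustrative; each one occupies its own bit.
[Flags]
public enum ViewerFlags
{
    None = 0,
    PageLayoutTwoColumnLeft = 1,
    PageModeFullScreen = 2,
    HideToolbar = 4,
    HideMenubar = 8
}

public static class ViewerFlagsDemo
{
    // True when every bit of 'flag' is also set in 'preferences'.
    public static bool IsSet(ViewerFlags preferences, ViewerFlags flag)
    {
        return (preferences & flag) == flag;
    }

    public static void Main()
    {
        // ORing sets both bits in a single integer value.
        ViewerFlags preferences = ViewerFlags.HideMenubar | ViewerFlags.HideToolbar;

        Console.WriteLine(IsSet(preferences, ViewerFlags.HideToolbar));        // True
        Console.WriteLine(IsSet(preferences, ViewerFlags.PageModeFullScreen)); // False
    }
}
```

Because every constant uses a different bit, any combination can travel in the single int parameter of setViewerPreferences. The "choose one" groups are the exception: ORing two values from the same group would set two conflicting bits.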
Encryption
Another thing to do BEFORE you open the document is to set the encryption (that is: if you want the PDF file to be encrypted). To achieve this, you use the method:
public void setEncryption(bool strength, String userPassword, String ownerPassword, int permissions);
- The strength is one of these two constants:
  - PdfWriter.STRENGTH40BITS: 40 bits
  - PdfWriter.STRENGTH128BITS: 128 bits (only with Acrobat Reader 5.0 or higher)
- The userPassword and ownerPassword can be null or have zero length. In this case the ownerPassword is replaced by a random string.
- Permissions are set by ORing some of these constants:
  - PdfWriter.AllowPrinting
  - PdfWriter.AllowModifyContents
  - PdfWriter.AllowCopy
  - PdfWriter.AllowModifyAnnotations
  - PdfWriter.AllowFillIn
  - PdfWriter.AllowScreenReaders
  - PdfWriter.AllowAssembly
  - PdfWriter.AllowDegradedPrinting
This functionality is demonstrated in example 9 and example 10.
writer.setEncryption(PdfWriter.STRENGTH40BITS, null, null, PdfWriter.AllowCopy);
Chap0109.pdf can be opened without using a password, but the user is not allowed to print the document, modify it, and so on.
writer.setEncryption(PdfWriter.STRENGTH128BITS, "userpass", "ownerpass", PdfWriter.AllowCopy | PdfWriter.AllowPrinting);
When you try to open Chap0110.pdf, you are asked for a password (type in 'userpass'). Because the AllowPrinting permission was added, you can print this document without any problem.
Examining step 4: adding content
In the different examples explaining step 1 to 3, you have already encountered objects such as Phrase, Paragraph,... In the next 5 chapters all these objects will be explained more thoroughly. Sometimes you may want a writer to deliberately ignore actions performed on a document. This is shown in example 11. If we create two writers, writerA and writerB (this is an exception to what was said in step 2):
PdfWriter writerA = PdfWriter.getInstance(document, new FileStream("Chap0111a.pdf", FileMode.Create));
PdfWriter writerB = PdfWriter.getInstance(document, new FileStream("Chap0111b.pdf", FileMode.Create));
we can create two documents that are slightly different:
writerA.Pause();
document.Add(new Paragraph("This paragraph will only be added to Chap0111b.pdf, not to Chap0111a.pdf"));
writerA.resume();
You can compare the results of example 11 here: Chap0111a.pdf vs. Chap0111b.pdf.
Examining step 5: closing the document
Closing the document is very important, because it flushes and closes the output stream to which the writer is writing. The close-method is called in the finalize-method, but you shouldn't count on that. You should always close the document yourself!
Advanced: reading PDF
You can't 'parse' an existing PDF file using iText, you can only 'read' it page per page.
What does this mean?
The pdf format is just a canvas where text and graphics are placed without
any structure information. As such there aren't any 'iText-objects' in a PDF file.
In each page there will probably be a number of 'Strings', but you can't reconstruct
a phrase or a paragraph using these strings. There are probably a number of lines drawn,
but you can't retrieve a Table-object based on these lines. In short: parsing the content
of a PDF-file is NOT POSSIBLE with iText. Post your question on the newsgroup
news://comp.text.pdf and maybe you will get some
answers from people that have built tools that can parse PDF and extract some of its contents,
but don't expect tools that will perform a bullet-proof conversion to structured text.
What iText DOES provide is the possibility to READ a PDF document and copy an entire
page of this file into the PDF file you are constructing from scratch. This can be useful
if you want to create a new document based on (an) existing document(s). You can add
a Watermark, pagenumbers,...
Example 12 takes a pdf
file from Chapter 7 and creates a new document where 4 pages of the original document
are painted on 1 page of the new document. We also added a Watermark and pagenumbers
(see Chap0112.pdf).
In order to fully understand the code (and how to adapt it to your needs), you will
have to read Chapter 10 first.
If you have an existing PDF file that represents a form, you could copy the pages
of this form and paint text at precise locations on this form. You can't edit an
existing PDF document by saying, for instance: replace the word Louagie by Lowagie.
To achieve this, you would have to know the exact location of the word Louagie,
paint a white rectangle over it and paint the word Lowagie on this white rectangle.
Please avoid this kind of 'patch' work. Do your PDF editing with an Adobe product.
Concat, Split, Handout, Encrypt
Using the functionality to read PDFs, I have made 4 little tools that can be
called from the commandline:
- Concat destfile file1 file2 [file3 ...]
concatenates the files file1, file2(, file3,...) and puts the result in destfile.
Concat concat.pdf Chap0101.pdf Chap0102.pdf Chap0103.pdf Chap0107.pdf
In the command above, we concatenate 4 files and put the result in file concat.pdf.
- Split srcfile destfile1 destfile2 pagenumber
splits the srcfile and puts pages 1 to (pagenumber - 1) in destfile1 and the rest of the pages
starting with pagenumber in destfile2.
Split concat.pdf split1.pdf split2.pdf 4
In the command above, we split a file and put pages 1-3 in file split1.pdf and pages 4-... in file split2.pdf.
- Handout srcfile destfile pages
For instance
Handout concat.pdf handouts.pdf 4
The command above creates a file handouts.pdf that puts the pages of concat.pdf 4 per page
and leaves some extra room for notes, as shown in handouts.pdf.
- Encrypt srcfile destfile password
encrypts srcfile using password and puts the result in destfile.
Encrypt concat.pdf encrypted.pdf iText
You will only be able to open the file encrypted.pdf
if you know the password (= iText).
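The page arithmetic behind Split can be sketched without any PDF code. The class SplitPlan below is hypothetical and is not the source of the Split tool. It only computes which pages go to which file, and it assumes that pagenumber must lie between 2 and the total page count so that neither output file is empty. The page count of 6 in the demo is also just an assumed value.

```csharp
using System;

// Hypothetical sketch of the Split page arithmetic; not the actual tool.
public static class SplitPlan
{
    // Returns two {first, last} pairs (1-based, inclusive):
    // index 0 is the range for destfile1, index 1 the range for destfile2.
    public static int[][] Ranges(int totalPages, int pageNumber)
    {
        // Assumption: both output files must receive at least one page.
        if (pageNumber < 2 || pageNumber > totalPages)
        {
            throw new ArgumentOutOfRangeException("pageNumber");
        }
        return new int[][]
        {
            new int[] { 1, pageNumber - 1 },
            new int[] { pageNumber, totalPages }
        };
    }
}

public static class SplitPlanDemo
{
    public static void Main()
    {
        // Assumed: a source file with 6 pages, split at page 4.
        int[][] ranges = SplitPlan.Ranges(6, 4);
        Console.WriteLine("destfile1: pages " + ranges[0][0] + "-" + ranges[0][1]);
        Console.WriteLine("destfile2: pages " + ranges[1][0] + "-" + ranges[1][1]);
    }
}
```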