Creating eBooks: An ePub Tutorial
Most recent version of this document is now at the Spontaneous Derivation Wiki.
This is a step-by-step tutorial, with an example, of making a standards-compliant ePub book by hand.
We’ll be using the public domain (in both illustrations and text) book The Velveteen Rabbit. It has the following good qualifications as a tutorial example:
- Small.
- Exists in HTML form in the public domain.
- Tiny table of contents, but a table of contents still exists.
- Images.
Sections
- Introduction
- Materials
- Preparing the Basic Structure
- Adding Content
- Creating the Epub File
- About the Sample Epub
Appendix
Introduction
At the moment, whenever I create a Mobipocket book for the Kindle, I start off with creating the ePub version, because things are a bit cleaner that way.
While there are existing ePub tutorials out there, they’re usually not thorough enough to avoid a lot of *headdesk* moments. Currently the best by far is Harrison Ainsworth’s Epub Format Construction Guide, which is referred to by this tutorial from time to time.
If you want to follow along, here’s the final ePub of this process: the_velveteen_rabbit.epub
Materials
These are what I use when I’m creating an ePub book. Your mileage may vary. For what it’s worth, I work on a MacBook Pro.
I use other tools as well for text manipulation, but this is the minimal set, apart from a text editor, for working with ePub books.
- Info-ZIP
  It can exclude extra file attributes, which is necessary because otherwise your ePub book can’t be read. I’m not sure if other versions can do the same.
- Adobe Digital Editions
  The best way to look at your work, as it currently implements more of the ePub spec than any other software.
- EpubCheck
  The best way to check your work’s structure for errors; even a small mistake may result in, say, the table of contents not working. Adobe Digital Editions will not display the specific error, even if it’s a fatal one, but epubcheck will.
Preparing the Basic Structure
The structure for ePub is more strict than for Mobipocket or, indeed, many other ebook formats.
Here’s a step-by-step guide. After this, you might want to copy this directory and its contents somewhere else so you can use it as a template instead of doing this annoying thing over and over again.
1. Create a base directory.
It doesn’t have to be perfectly named after the book itself, since the actual title is included in a metadata file we’ll create later.
2. In the base directory, create the following folders:
META-INF and content.
I’ll note that I’m an odd duck, because I like to square away my content in a separate directory so that the text/images are separate from the metadata about the book.
3. Create a mimetype file.
It’s not going to be complicated; it just contains the single line:
application/epub+zip
It should have no suffix at all (e.g., it should just be called “mimetype”, not “mimetype.txt”) nor extra spacing or empty lines.
4. Create a basic toc.ncx file.
This lists navigation points (like chapters). What you put here will show up in the left part of Adobe Digital Editions.
We’re going to add more to this as we add more files to our content directory, but we’ll start with the skeleton.
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
<head>
<meta name="dtb:uid" content="Spontaneous Derivation [2008.12.10-21:02:00]"/>
<meta name="dtb:depth" content="1"/>
<meta name="dtb:totalPageCount" content="0"/>
<meta name="dtb:maxPageNumber" content="0"/>
</head>
<docTitle>
<text>The Velveteen Rabbit</text>
</docTitle>
<navMap>
</navMap>
</ncx>
You should replace the title between the <text></text>, and replace the content attribute of <meta name="dtb:uid" content="..."/> with your own unique id. This id will be the same as the one in the metadata.opf file, coming up next…
5. Create a basic metadata.opf file.
We’re going to add more to this as we add more files to our content directory, but we’ll start with the skeleton.
The contents of metadata.opf (and this time the suffix needs to be .opf) are the following XML:
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:title>The Velveteen Rabbit or How Toys Become Real</dc:title>
<dc:creator opf:file-as="Williams, Margery" opf:role="aut">Margery Williams</dc:creator>
<dc:creator opf:file-as="Nicholson, William" opf:role="ill">William Nicholson</dc:creator>
<dc:language>en-US</dc:language>
<dc:identifier id="bookid">Spontaneous Derivation [2008.12.10-21:02:00]</dc:identifier>
<dc:rights>Public Domain</dc:rights>
</metadata>
<manifest>
<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
</manifest>
<spine toc="ncx">
</spine>
</package>
The meanings of some of the dc: tags in the metadata:
- <dc:title>
  Real title of the work.
- <dc:creator>
  Author/contributor of the work. The name format should be first name, then last name. You’ll also want:
  opf:file-as="[last name], [first name]"
  For the sanity of ebook libraries that need to file books by last name first.
  opf:role="[aut|ill|etc]"
  Useful for indicating writers (“aut”) versus illustrators (“ill”), translators (“trl”), and more (see this table of the OPF spec for more roles).
  You can have multiple creators, and thus multiple <dc:creator> tags.
- <dc:language>
  The language code for this book; needs to be an IETF language code; for more on this, see Language Code on Wikipedia.
  epubcheck alerts you if the language code is invalid.
- <dc:identifier>
  An ID string that uniquely identifies your book. For its id="something" attribute, "something" must match the unique-identifier="something" on the very first line (you may need to scroll right to see it). This something is just the name of the schema of the id string.
  I just use “bookid” as the something, because it’s the most flexible, and then a string with my website name and a date/time stamp. You can use ISBN strings, or really anything, so long as it’s likely unique.
  You can even have multiple <dc:identifier> tags if for some reason an ebook store/system cares about a specific format.
- <dc:rights>
  Unlike the above dc: tags, this isn’t necessary for epub standards-compliance, but I like to include it. It’s nice to notify people that a work is Public Domain versus the different types of Creative Commons, so as to help forestall nasty emails.
  If you have copyright plus a license (like Creative Commons), you can have multiple <dc:rights> tags: one for the copyright, and one for the license.
There are other, optional dc tags; see Ainsworth’s guide.
6. Create a container.xml file in the META-INF directory.
Very simple and very boilerplate; the only thing you really care about is that the full-path of the rootfile points to the above metadata.opf.
This file belongs inside META-INF, not in the root directory.
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="metadata.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
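If you’d rather not redo the boilerplate parts by hand each time, the folder layout, the mimetype file, and container.xml can be scripted. Here is a minimal sketch in Python; the function name make_skeleton is my own invention for illustration, and it deliberately leaves toc.ncx and metadata.opf alone, since those carry book-specific data.

```python
from pathlib import Path

MIMETYPE = "application/epub+zip"

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="metadata.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
"""


def make_skeleton(base_dir):
    """Create the folders and boilerplate files from steps 1-3 and 6.

    make_skeleton is a hypothetical helper name, not part of any ePub tool.
    """
    base = Path(base_dir)
    (base / "META-INF").mkdir(parents=True, exist_ok=True)
    (base / "content").mkdir(exist_ok=True)
    # The mimetype file has no suffix, no extra spacing, and no empty lines.
    (base / "mimetype").write_text(MIMETYPE, encoding="ascii")
    # container.xml belongs inside META-INF, not in the base directory.
    (base / "META-INF" / "container.xml").write_text(CONTAINER_XML, encoding="utf-8")
    return base
```

Calling make_skeleton("velveteen-rabbit") should leave you with a directory you can copy around as a template.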
Adding Content
This section covers adding the HTML files and the basics of XHTML/CSS compliance needed so that ePub readers can display your stuff.
I won’t be covering file format conversion or HTML massage here (and the file over at the University of Pennsylvania needs it).
Content Files
Your HTML files need to stick to the XHTML 1.1 specification and the CSS 2.1 specification, with a few limitations. You can also include separate CSS files and OpenType fonts and images (JPEG, GIF, PNG are most common).
Ainsworth’s guide has the nitty-gritty.
Epubcheck will help you find any trouble spots.
See Appendix: Catalog of Common HTML Problems for more details.
Updating metadata.opf
You need to update two sections in metadata.opf:
1. Manifest
The list of content files, as well as the toc.ncx we created earlier. This includes the CSS files, fonts, and images (but not metadata.opf, container.xml, or mimetype).
<manifest>
<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
<item id="titlepage" href="content/title.html" media-type="application/xhtml+xml"/>
<item id="dedication" href="content/dedication.html" media-type="application/xhtml+xml"/>
<item id="listillustrations" href="content/list-illustrations.html" media-type="application/xhtml+xml"/>
<item id="text" href="content/text.html" media-type="application/xhtml+xml"/>
<item id="credits" href="content/credits.html" media-type="application/xhtml+xml"/>
<item id="image_1" href="content/images/christmas-morning.jpg" media-type="image/jpeg"/>
<item id="image_2" href="content/images/anxious-times.jpg" media-type="image/jpeg"/>
<item id="image_3" href="content/images/at-last.jpg" media-type="image/jpeg"/>
<item id="image_4" href="content/images/fairy-flower.jpg" media-type="image/jpeg"/>
<item id="image_5" href="content/images/skin-horse.jpg" media-type="image/jpeg"/>
<item id="image_6" href="content/images/spring-time.jpg" media-type="image/jpeg"/>
<item id="image_7" href="content/images/summer-days.jpg" media-type="image/jpeg"/>
<item id="image_8" href="content/images/capital-t.jpg" media-type="image/jpeg"/>
</manifest>
You need to include, for each item: an id (which can be used to refer to it later in the file), path to the file, and the media type of the file. Here’s a list of all the media types.
2. Spine
Reading order of the HTML files from the manifest. You don’t want to include images, CSS, or fonts here.
<spine toc="ncx">
<itemref idref="titlepage"/>
<itemref idref="dedication"/>
<itemref idref="listillustrations"/>
<itemref idref="text"/>
<itemref idref="credits"/>
</spine>
The idref attribute refers back to the item’s id in the manifest.
Common examples of manifest items:
HTML
<item id="titlepage" href="content/title.html" media-type="application/xhtml+xml"/>
CSS
<item id="stylesheet" href="content/stylesheet.css" media-type="text/css"/>
JPG Image
<item id="image_1" href="content/images/christmas-morning.jpg" media-type="image/jpeg"/>
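Typing out one item line per file gets tedious once a book has more than a handful of images. As a rough sketch, the lines can be generated from an id and a path. The helper name manifest_item and the small suffix table are my own assumptions; the table only covers the media types that appear in this tutorial plus PNG and GIF.

```python
from pathlib import PurePosixPath
from xml.sax.saxutils import quoteattr

# Suffix-to-media-type table: an illustrative subset, not the full list.
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".ncx": "application/x-dtbncx+xml",
}


def manifest_item(item_id, href):
    """Return one <item .../> line for the manifest (hypothetical helper)."""
    suffix = PurePosixPath(href).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError("no media type known for " + href)
    # quoteattr escapes the value and wraps it in quotes.
    return "<item id=%s href=%s media-type=%s/>" % (
        quoteattr(item_id),
        quoteattr(href),
        quoteattr(MEDIA_TYPES[suffix]),
    )


if __name__ == "__main__":
    print(manifest_item("titlepage", "content/title.html"))
    print(manifest_item("image_1", "content/images/christmas-morning.jpg"))
```

The ids are still chosen by hand here, because the spine refers back to them and readable ids make that section easier to write.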
Updating toc.ncx
This is the table of contents, listing each navigation point (chapters and such) under the navigation map.
<navMap>
<navPoint id="navpoint-1" playOrder="1">
<navLabel>
<text>Title Page</text>
</navLabel>
<content src="content/title.html"/>
</navPoint>
<navPoint id="navpoint-2" playOrder="2">
<navLabel>
<text>Dedication</text>
</navLabel>
<content src="content/dedication.html"/>
</navPoint>
<navPoint id="navpoint-3" playOrder="3">
<navLabel>
<text>List of Illustrations</text>
</navLabel>
<content src="content/list-illustrations.html"/>
</navPoint>
<navPoint id="navpoint-4" playOrder="4">
<navLabel>
<text>Start Reading</text>
</navLabel>
<content src="content/text.html"/>
</navPoint>
<navPoint id="navpoint-5" playOrder="5">
<navLabel>
<text>Credits</text>
</navLabel>
<content src="content/credits.html"/>
</navPoint>
</navMap>
Every navigation point (navPoint) has a unique navigation point id and a play order, with the label of that point and the content source specified.
<navPoint id="navpoint-5" playOrder="5">
<navLabel>
<text>Credits</text>
</navLabel>
<content src="content/credits.html"/>
</navPoint>
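Because toc.ncx and metadata.opf have to agree with each other (same unique id, spine entries that exist in the manifest, navigation points that point at manifest files), it’s easy for them to drift apart. The following is a small sketch of a cross-check using Python’s standard XML parser. It is not a replacement for epubcheck; check_book is a name I made up, and it only looks at the three things just listed.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ncx": "http://www.daisy.org/z3986/2005/ncx/",
}


def check_book(opf_path, ncx_path):
    """Return a list of problems found (empty if none). Hypothetical helper."""
    opf = ET.parse(opf_path).getroot()
    ncx = ET.parse(ncx_path).getroot()
    problems = []

    # The dtb:uid in toc.ncx should equal the dc:identifier whose id is
    # named by the package's unique-identifier attribute.
    uid_name = opf.get("unique-identifier")
    book_ids = [e.text for e in opf.iterfind("opf:metadata/dc:identifier", NS)
                if e.get("id") == uid_name]
    ncx_uids = [m.get("content") for m in ncx.iterfind("ncx:head/ncx:meta", NS)
                if m.get("name") == "dtb:uid"]
    if not book_ids or book_ids != ncx_uids:
        problems.append("dtb:uid in toc.ncx does not match dc:identifier")

    # Map manifest ids to hrefs.
    manifest = {item.get("id"): item.get("href")
                for item in opf.iterfind("opf:manifest/opf:item", NS)}

    # Every spine idref should refer back to a manifest id.
    for ref in opf.iterfind("opf:spine/opf:itemref", NS):
        if ref.get("idref") not in manifest:
            problems.append("spine idref not in manifest: %s" % ref.get("idref"))

    # Every navPoint should point at a manifest file (ignoring any #fragment).
    hrefs = set(manifest.values())
    for content in ncx.iterfind(".//ncx:navPoint/ncx:content", NS):
        src = content.get("src", "").split("#")[0]
        if src not in hrefs:
            problems.append("navPoint target not in manifest: %s" % src)

    return problems
```

Running check_book("metadata.opf", "toc.ncx") from the base directory and getting an empty list back is a quick sanity check before zipping.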
Creating the Epub File
Epub is, in essence, nothing more than a ZIP archive of files. (Mobipocket is actually the same way, except that it uses a Palm-specific format.) You can create it with zip, but as mentioned under Materials, you need to exclude extra file attributes. Epub depends on the mimetype file being of the correct byte length, and for the same reason the mimetype file must come before all the other files in the archive.
If you have Info-ZIP, on the command line run:
zip -Xr9D The_Velveteen_Rabbit.epub mimetype * -x .DS_Store
Options:
-X: Exclude extra file attributes (permissions, ownership, anything that adds extra bytes)
-r: Recurse into directories
-9: Better compression
-D: Don’t list directories as separate entries in the zip file
-x .DS_Store: Don’t include Mac OS X’s little hidden file of snapshots etc.
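If Info-ZIP isn’t available, the same archive layout can be approximated with Python’s zipfile module. This is a sketch under a few assumptions rather than a drop-in equivalent of the command above: build_epub is a made-up name, the output file should live outside the base directory so it doesn’t get packed into itself, and I’m relying on zipfile not adding extra fields to ordinary small entries. The mimetype entry is written first and stored without compression.

```python
import os
import zipfile


def build_epub(base_dir, epub_path):
    """Zip base_dir into epub_path with mimetype as the first entry.

    Hypothetical helper; write epub_path somewhere outside base_dir.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # mimetype goes in first and uncompressed.
        zf.write(os.path.join(base_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(base_dir):
            dirs.sort()  # stable ordering between runs
            for name in sorted(files):
                full = os.path.join(root, name)
                # Archive names use forward slashes, relative to base_dir.
                arcname = os.path.relpath(full, base_dir).replace(os.sep, "/")
                # Skip the entry already written and Mac OS X's hidden file.
                if arcname == "mimetype" or name == ".DS_Store":
                    continue
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
    return epub_path
```

Only files are added, so directories don’t show up as separate entries, which mirrors the -D option. Either way, run the result through epubcheck afterwards.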
About the Sample EPub
This is different from the main Velveteen Rabbit download offered on S∂. The main download includes extra files and metadata so that a Mobipocket book can be generated.
That’s another post for another day.
Appendix: Catalog of Common HTML Problems
Common problems (with explanation of epubcheck error translation if non-straightforward):
XML and Namespace Declarations
- Not including the <?xml...?> declaration at the top.
- Not including the xmlns="http://www.w3.org/1999/xhtml" attribute in the <html> tag.
  Epubcheck would say that none of your tags are supported by the “” namespace.
Break, Formatting, and Paragraph Tags
- Break tags that look like <br> instead of <br />.
  Epubcheck would say that it finds an unexpected closing tag that wasn’t </br>.
- Paragraph tags (<p>) that aren’t closed (</p>).
  Epubcheck will run into the next <p> and complain about it.
- Break tags and text formatting tags (like <tt>, <i>, <em>, <b>, <strong>, <font>, etc.) that exist outside of paragraphs.
  Epubcheck will tell you they don’t belong there.
Lists
- Tags that were formerly allowed to run free without a closing tag can do so no longer. This includes li, dt, dd.
- Bare list item tags (li, dt, dd) are no longer allowed; they must be inside enclosing ol, ul, or dl tags.
- Lists don’t belong inside paragraphs; they must be outside of them.
Headers
- Headers don’t belong inside paragraphs; they must be outside of them.
Images
- Image tags that don’t end with /> (rather than simply the angle bracket).
- Images that don’t include the alt attribute.
Anchors, aka “Missing Fragments”
- Using <a name="..."> tags instead of the id attribute when creating anchors. Example: instead of
  <a name="illus_christmas_morning"></a><p>...</p>
  use
  <p id="illus_christmas_morning">...</p>
  When you want to link to the anchor (aka “fragment”), you still use
  <a href="text.html#illus_christmas_morning">Christmas Morning</a>
  either way.
  This error generates by far the weirdest epubcheck error, where it complains that fragments are missing.
  The id way is cleaner anyways.
- An anchor is also considered missing if the HTML around it is invalid for some reason.
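Several of the problems above (unclosed <br>, <p>, <li>, and <img> tags) are plain XML well-formedness errors, so a generic XML parser will trip over them before epubcheck even gets involved. Here is a quick pre-check sketch in Python; wellformed_error is a hypothetical helper name. It won’t catch the structural rules (lists inside paragraphs, missing alt attributes, and so on), and since it doesn’t load the XHTML DTD it will also flag named entities such as &nbsp; even where they would be legal.

```python
import xml.etree.ElementTree as ET


def wellformed_error(xhtml_bytes):
    """Return None if the bytes parse as XML, else the parser's message.

    Hypothetical helper: a well-formedness check only, not XHTML validation.
    """
    try:
        ET.fromstring(xhtml_bytes)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # An unclosed <br> is reported; the self-closed form passes.
    print(wellformed_error(b"<p>one<br>two</p>"))
    print(wellformed_error(b"<p>one<br />two</p>"))
```

Reading each content file in binary mode and passing the bytes in is enough for a first pass; epubcheck remains the real test.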
I’m really impressed by the tutorial and the amount of work you have to put into creating even simplest ebook. I’ve never realized it’s so complicated and time consuming process, since I use Mobipocket Creator, free tool which simplifies this process enormously. Unfortunately, Windows-only.
http://www.mobipocket.com/en/DownloadSoft/ProductDetailsCreator.asp
Hello Wasaty,
I use Mobipocket Creator as well. Just for Mobipocket books, though; it can’t generate ePub.
It’s the magic of Crossover for Mac that lets me run that Windows app.
An addendum: whenever I create an ebook, whether I’m going to produce ePub, Mobipocket, or something else, I spend in the neighborhood of 90% of my time fiddling with formatting, breaking up files, adding captions to images, etc.
It’s required to have a ‘toc’ attribute on the spine element. The value of this attribute should be the ID of the NCX file.
See here in the spec:
http://openebook.org/2007/opf/OPF_2.0_final_spec.html#Section2.4.1
“If a Publication includes an NCX, the item that describes the NCX must be referenced by the spine toc attribute.”
epubcheck doesn’t currently complain about it, but it should — there’s an error in the official OPF schema. This is in the process of being corrected.
Hi Liza,
Thanks for the heads up! I’ve updated the basic metadata.opf section as well as the sample epub file.
Actually for dc:creator it should be “firstname lastname” and you can use the opf:file-as attribute on dc:creator to set “lastname, firstname”. You can also use an opf:role value on the dc:creator property.
Thanks, Hadrien. The tutorial section and the EPub have both been updated.
Awesome tutorial. Thank you.
Thanks so much for posting this! … I’ll come back with some silly questions later in the week. Keep up the great work. … MP
Wow. This is pretty complex. Even starting with text in ASCII format. Taking scans off the printed page would just be a beast to tackle!
Great job on your tutorial. Thanks for taking the time to create and post it!
Seth
Hi ,
I downloaded few epub books from “http://www.epubbooks.com/books” ,
when i renamed the .epub extension to .zip and tried to unzip them,
then they are not getting unzipped ,
What might be the problem ?
any suggestion is welcomed.
Thanks..
Hello Biranchi,
I don’t know what the problem could be. File corruption, maybe, but I’m not sure.
Sorry I can’t be of more help. :(
Some things that I had to scratch my head over -
Does the mime type need a newline after it?
In META-INF/container.xml, I think but am not sure that full-path is relative to the root of the distro, so there’s an implied META-INF/.. . For example, the following is the correct full path for the metadata.opf in the parent of META-info:
Are these two IDs the same?
metadata.opf: foo
toc.ncx:
Hello Lucas!
There’s a newline at the end of the mime type for the books I’ve created, but I don’t know if it’s necessary or not.
Yup, the full-path is indeed relative to the root of the distro.
I’m not sure which IDs you’re referring to?
I tried out MIMETYPE with and without a newline and epubcheck didn’t complain either way. I tested a no-newline version on Adobe’s reader and it opened fine.
For full-path I tested with ./ as a prefix and found that epubcheck couldn’t process it, even though it’s legal.
The IDs got munged up during posting because I included XML. The question is what the difference is between dc:identifier in the .opf file and meta name=”dtb:uid” in toc.ncx.
Lucas,
Epubcheck isn’t infallible unfortunately, and has bugs. It’s a google code project, and so people can file bugs/ask questions.
I don’t know what the difference between dc:identifier and dtb:uid is, but it’s probably a good idea to make them match up. Different reader implementations will latch onto one or the other to ID texts.
Maybe of interest: I just wrote a how-to on creating e-pub books with Sigil. I referenced this post as a must read back ground text before even beginning the wysiwyg route. :-)
http://www.noisepollution.nl/?p=2015
Thanks for your great run through.
There are tutorials for hand-creating Kindle (.mobi), and iPad (ePub) files on http://www.katiebooks.ca
Most of the code necessary can be copied and new titles, authors, etc inserted.
Have a look!
I would like to make ePub books with custom content, including video on some pages. Is there a tutorial for creating that?
Could I make books directly in some ePub software, or would I always need to convert using this tutorial?
Most importantly, could I make ebooks that do the cool page turning thing on my iPad?
Eric,
You should probably look at Sigil.
“Page turning thing” is a feature of the ebook reader, not the ebook itself.