The most recent version of this document is now at the Spontaneous Derivation Wiki.

This is a step-by-step tutorial, with an example, on making a standards-compliant ePub book by hand.

We’ll be using the public domain (in both illustrations and text) book The Velveteen Rabbit. It has the following good qualifications as a tutorial example:

  • Small.
  • Exists in HTML form in the public domain.
  • Tiny table of contents, but a table of contents still exists.
  • Images.

Introduction

At the moment, whenever I create a Mobipocket book for the Kindle, I start by creating the ePub version, because things are a bit cleaner that way.

While there are existing ePub tutorials out there, they’re usually not thorough enough to avoid a lot of *headdesk* moments. Currently the best by far is Harrison Ainsworth’s Epub Format Construction Guide, which is referred to by this tutorial from time to time.

If you want to follow along, here’s the final ePub of this process: the_velveteen_rabbit.epub

Materials

These are what I use when I’m creating an ePub book. Your mileage may vary. For what it’s worth, I work on a MacBook Pro.

I use other tools as well for text manipulation, but this is the minimal set, apart from a text editor, for working with ePub books.

Info-ZIP

It can exclude extra file attributes, which is necessary because otherwise your ePub book can’t be read. I’m not sure if other versions can do the same.

Adobe Digital Editions

The best way to look at your work, as it currently implements more of the ePub spec than any other software.

EpubCheck

The best way to check your work’s structure for errors; even a small mistake may result in, say, the table of contents not working. Adobe Digital Editions will not display the specific error, even if it’s a fatal one, but epubcheck will.

Preparing the Basic Structure

The structure for ePub is stricter than for Mobipocket or, indeed, many other ebook formats.

Here’s a step-by-step guide. After this, you might want to copy this directory and its contents somewhere else so you can use it as a template instead of doing this annoying thing over and over again.

1. Create a base directory.

It doesn’t have to be perfectly named after the book itself, since the actual title is included in a metadata file we’ll create later.

Velveteen Rabbit - Root Folder

2. In the base directory, create the following folders:

META-INF and content.

I’ll note that I’m an odd duck, because I like to square away my content in a separate directory so that the text/images are separate from the metadata about the book.

Velveteen Rabbit - META INF and content directories

3. Create a mimetype file.

It’s not going to be complicated; it just contains the single line:

application/epub+zip

It should have no suffix at all (e.g., it should just be called “mimetype”, not “mimetype.txt”) nor extra spacing or empty lines.
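Text editors like to add a trailing newline without telling you, so it can be safer to write this file from a script. Here is a minimal Python sketch; the function name write_mimetype is made up for illustration and is not part of any ePub tool.

```python
from pathlib import Path

# The exact contents of the file: 20 bytes, no newline, no spaces.
MIMETYPE = b"application/epub+zip"


def write_mimetype(base_dir):
    """Write the mimetype file into base_dir (hypothetical helper)."""
    path = Path(base_dir) / "mimetype"
    # Writing bytes avoids newline translation and any byte order mark.
    path.write_bytes(MIMETYPE)
    return path
```

Calling write_mimetype(".") from inside the base directory creates the file in place.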

Velveteen Rabbit - mimetype

4. Create a basic toc.ncx file.

This lists navigation points (like chapters). What you put here will show up in the left part of Adobe Digital Editions.

Velveteen Rabbit - Basic toc.ncx

We’re going to add more to this as we add more files to our content directory, but we’ll start with the skeleton.

<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid" content="Spontaneous Derivation [2008.12.10-21:02:00]"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>The Velveteen Rabbit</text>
  </docTitle>
  <navMap>
  </navMap>
</ncx>

You should replace the title between the <text></text>, and replace the content attribute of <meta name="dtb:uid" content="..."/> with your own unique id. This id will be the same as the one in the metadata.opf file, coming up next…

5. Create a basic metadata.opf file.

We’re going to add more to this as we add more files to our content directory, but we’ll start with the skeleton.

Velveteen Rabbit - Basic metadata.opf

The contents of metadata.opf (and this time the suffix needs to be .opf) are the following XML:

<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
        <dc:title>The Velveteen Rabbit or How Toys Become Real</dc:title>
        <dc:creator opf:file-as="Williams, Margery" opf:role="aut">Margery Williams</dc:creator>
        <dc:creator opf:file-as="Nicholson, William" opf:role="ill">William Nicholson</dc:creator>
        <dc:language>en-US</dc:language>
        <dc:identifier id="bookid">Spontaneous Derivation [2008.12.10-21:02:00]</dc:identifier>
        <dc:rights>Public Domain</dc:rights>
    </metadata>
    <manifest>
        <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    </manifest>
    <spine toc="ncx">
    </spine>
</package>

The meanings of some of the dc: tags in the metadata:

<dc:title>

Real title of the work.

<dc:creator>

Author/contributor of the work. The name should be written in natural order (first name, then last name). You’ll also want:

opf:file-as="[last name], [first name]"
For the sanity of ebook libraries that need to file books by last name first.

opf:role="[aut|ill|etc]"
Useful for indicating writers (“aut”) versus illustrators (“ill”), translators (“trl”), and more (see this table of the OPF spec for more roles).

You can have multiple creators, and thus multiple <dc:creator> tags.

<dc:language>

The language code for this book; needs to be an IETF language code; for more on this, see Language Code on Wikipedia.

epubcheck alerts you if the language code is invalid.

<dc:identifier>
An ID string that uniquely identifies your book. For its id="something" attribute, "something" must match the unique-identifier="something" on the very first line (you may need to scroll right to see it). This something is just a label that tells the package which identifier is the book’s main one; it is not the id string itself.

I just use “bookid” as the something, because it’s the most flexible, and then a string with my website name and a date/time stamp. You can use ISBN strings, or really anything, so long as it’s likely unique.
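If you want to generate an id string in that website-plus-timestamp shape, a small Python sketch could look like this. The function name make_book_id is hypothetical, and the bracketed date format is simply copied from the example id used in this tutorial.

```python
from datetime import datetime


def make_book_id(site_name, when=None):
    """Build an id like 'Site Name [YYYY.MM.DD-HH:MM:SS]' (hypothetical helper)."""
    if when is None:
        when = datetime.now()
    # Same layout as the example id in toc.ncx and metadata.opf.
    stamp = when.strftime("%Y.%m.%d-%H:%M:%S")
    return f"{site_name} [{stamp}]"
```

For example, make_book_id("Spontaneous Derivation", datetime(2008, 12, 10, 21, 2, 0)) reproduces the id string used in this tutorial.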

You can even have multiple <dc:identifier> if for some reason an ebook store/system cares about a specific format.

<dc:rights>
Unlike the above dc: tags, this isn’t necessary for epub standards-compliance, but I like to include it. It’s nice to notify people that a work is Public Domain versus the different types of Creative Commons, so as to help forestall nasty emails.

If you have copyright plus a license (like Creative Commons), you can have multiple <dc:rights>: one for the copyright, and one for the license.

There are other, optional dc tags; see Ainsworth’s guide.
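Since the id has to line up in three places (the unique-identifier attribute, the <dc:identifier> it points to, and dtb:uid in toc.ncx), a quick check can save a round trip through epubcheck. This is only a sketch using Python's standard XML parser; the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"
NCX_NS = "{http://www.daisy.org/z3986/2005/ncx/}"


def opf_book_id(opf_xml):
    """Return the text of the <dc:identifier> named by unique-identifier."""
    package = ET.fromstring(opf_xml)
    wanted = package.get("unique-identifier")
    for ident in package.iter(DC_NS + "identifier"):
        if ident.get("id") == wanted:
            return (ident.text or "").strip()
    return None  # unique-identifier points at nothing


def ncx_uid(ncx_xml):
    """Return the content of <meta name="dtb:uid"> from toc.ncx."""
    ncx = ET.fromstring(ncx_xml)
    for meta in ncx.iter(NCX_NS + "meta"):
        if meta.get("name") == "dtb:uid":
            return meta.get("content")
    return None


def ids_match(opf_xml, ncx_xml):
    """True when both files carry the same, non-missing book id."""
    book_id = opf_book_id(opf_xml)
    return book_id is not None and book_id == ncx_uid(ncx_xml)
```

Pass in the contents of metadata.opf and toc.ncx; a False result means one of the three places disagrees or is missing.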

6. Create a container.xml file in the META-INF directory.

Very simple and very boilerplate; the only thing you really care about is that the full-path of the rootfile points to the above metadata.opf.

Velveteen Rabbit - Basic container.xml

This file belongs inside META-INF, not in the root directory.

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
   <rootfiles>
      <rootfile full-path="metadata.opf" media-type="application/oebps-package+xml"/>
   </rootfiles>
</container>
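If you rename metadata.opf or move it later, it is easy to forget to update full-path. Here is a hedged Python sketch that reads container.xml and checks that the file it points to exists; the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def rootfile_path(container_xml):
    """Return the full-path attribute of the first <rootfile>, or None."""
    root = ET.fromstring(container_xml)
    rootfile = root.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
    return None if rootfile is None else rootfile.get("full-path")


def rootfile_exists(base_dir):
    """True when META-INF/container.xml points at a file that is really there."""
    base = Path(base_dir)
    container = base / "META-INF" / "container.xml"
    # full-path is relative to the base directory, not to META-INF.
    full_path = rootfile_path(container.read_bytes())
    return full_path is not None and (base / full_path).is_file()
```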

Adding Content

This section covers adding the HTML files and the basics of XHTML/CSS compliancy needed so that ePub readers can display your stuff.

I won’t be covering file format conversion or HTML massage here (and the file over at the University of Pennsylvania needs it).

Content Files

Velveteen Rabbit - Content Tree

Your HTML files need to stick to the XHTML 1.1 specification and the CSS 2.1 specification, with a few limitations. You can also include separate CSS files and OpenType fonts and images (JPEG, GIF, PNG are most common).

Ainsworth’s guide has the nitty-gritty.

Epubcheck will help you find any trouble spots.

See Appendix: Catalog of Common HTML Problems for more details.

Updating metadata.opf

You need to update two sections in metadata.opf:

1. Manifest

The list of content files (as well as the toc.ncx we created earlier). This includes the CSS files, fonts, and images (but not metadata.opf, container.xml, or mimetype).

    <manifest>
        <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>

        <item id="titlepage" href="content/title.html" media-type="application/xhtml+xml"/>
        <item id="dedication" href="content/dedication.html" media-type="application/xhtml+xml"/>
        <item id="listillustrations" href="content/list-illustrations.html" media-type="application/xhtml+xml"/>
        <item id="text" href="content/text.html" media-type="application/xhtml+xml"/>
        <item id="credits" href="content/credits.html" media-type="application/xhtml+xml"/>

        <item id="image_1" href="content/images/christmas-morning.jpg" media-type="image/jpeg"/>
        <item id="image_2" href="content/images/anxious-times.jpg" media-type="image/jpeg"/>
        <item id="image_3" href="content/images/at-last.jpg" media-type="image/jpeg"/>
        <item id="image_4" href="content/images/fairy-flower.jpg" media-type="image/jpeg"/>
        <item id="image_5" href="content/images/skin-horse.jpg" media-type="image/jpeg"/>
        <item id="image_6" href="content/images/spring-time.jpg" media-type="image/jpeg"/>
        <item id="image_7" href="content/images/summer-days.jpg" media-type="image/jpeg"/>
        <item id="image_8" href="content/images/capital-t.jpg" media-type="image/jpeg"/>
    </manifest>

You need to include, for each item: an id (which can be used to refer to it later in the file), path to the file, and the media type of the file. Here’s a list of all the media types.
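Typing these lines by hand gets old fast once a book has more than a few images. The following Python sketch walks the content directory and prints one <item> per file. It is an illustration only: the function name, the id scheme (derived from the path), and the short list of file types are my assumptions, and you would still add toc.ncx yourself.

```python
import re
from pathlib import Path
from xml.sax.saxutils import quoteattr

# Only the media types used in this tutorial; extend as needed.
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
}


def manifest_items(base_dir, content_dir="content"):
    """Return <item .../> lines for every known file under content_dir."""
    base = Path(base_dir)
    lines = []
    for path in sorted((base / content_dir).rglob("*")):
        media_type = MEDIA_TYPES.get(path.suffix.lower())
        if not path.is_file() or media_type is None:
            continue  # skip directories and unknown file types
        href = path.relative_to(base).as_posix()
        # Hypothetical id scheme: the path with punctuation turned into "_".
        item_id = "item_" + re.sub(r"[^A-Za-z0-9]", "_", href)
        lines.append(
            f"<item id={quoteattr(item_id)} href={quoteattr(href)} "
            f"media-type={quoteattr(media_type)}/>"
        )
    return lines
```

Running print("\n".join(manifest_items("."))) in the base directory gives lines you can paste between the <manifest> tags. The generated ids are uglier than hand-picked ones like "titlepage", so you may prefer to rename the ones the spine refers to.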

2. Spine

Reading order of the HTML files from the manifest. You don’t want to include images, CSS, or fonts here.

    <spine toc="ncx">
        <itemref idref="titlepage"/>
        <itemref idref="dedication"/>
        <itemref idref="listillustrations"/>
        <itemref idref="text"/>
        <itemref idref="credits"/>
    </spine>

The idref attribute refers back to the item’s id in the manifest.
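A mistyped idref is an easy mistake to make here. As a rough sanity check before running epubcheck, this Python sketch reports spine entries that are missing from the manifest or that point at something other than an XHTML file. The function name is hypothetical, and the XHTML-only rule follows this tutorial's advice rather than the full spec.

```python
import xml.etree.ElementTree as ET

OPF_NS = "{http://www.idpf.org/2007/opf}"


def spine_problems(opf_xml):
    """Return a list of human-readable problems with the <spine>."""
    package = ET.fromstring(opf_xml)
    # Map each manifest id to its media type.
    media_types = {
        item.get("id"): item.get("media-type")
        for item in package.iter(OPF_NS + "item")
    }
    problems = []
    for itemref in package.iter(OPF_NS + "itemref"):
        idref = itemref.get("idref")
        if idref not in media_types:
            problems.append(f"{idref}: not declared in the manifest")
        elif media_types[idref] != "application/xhtml+xml":
            problems.append(f"{idref}: listed in the spine but is not an XHTML document")
    return problems
```

An empty list means every itemref resolves to an HTML item in the manifest.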

Common examples of manifest items:

HTML

<item id="titlepage" href="content/title.html" media-type="application/xhtml+xml"/>

CSS

<item id="stylesheet" href="content/stylesheet.css" media-type="text/css"/>

JPG Image

<item id="image_1" href="content/images/christmas-morning.jpg" media-type="image/jpeg"/>

Updating toc.ncx

This is the table of contents, listing each navigation point (chapters and such) under the navigation map.

  <navMap>
    <navPoint id="navpoint-1" playOrder="1">
      <navLabel>
        <text>Title Page</text>
      </navLabel>
      <content src="content/title.html"/>
    </navPoint>
    <navPoint id="navpoint-2" playOrder="2">
      <navLabel>
        <text>Dedication</text>
      </navLabel>
      <content src="content/dedication.html"/>
    </navPoint>
    <navPoint id="navpoint-3" playOrder="3">
      <navLabel>
        <text>List of Illustrations</text>
      </navLabel>
      <content src="content/list-illustrations.html"/>
    </navPoint>
    <navPoint id="navpoint-4" playOrder="4">
      <navLabel>
        <text>Start Reading</text>
      </navLabel>
      <content src="content/text.html"/>
    </navPoint>
    <navPoint id="navpoint-5" playOrder="5">
      <navLabel>
        <text>Credits</text>
      </navLabel>
      <content src="content/credits.html"/>
    </navPoint>
  </navMap>

Every navigation point (navPoint) has a unique navigation point id and a play order, with the label of that point and the content source specified.

    <navPoint id="navpoint-5" playOrder="5">
      <navLabel>
        <text>Credits</text>
      </navLabel>
      <content src="content/credits.html"/>
    </navPoint>
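Because every navPoint follows the same pattern, the whole <navMap> can be generated from a list of labels and files. A minimal Python sketch, assuming a flat table of contents like this one (no nested navPoints); nav_map is a made-up name.

```python
from xml.sax.saxutils import escape, quoteattr


def nav_map(entries):
    """Build a flat <navMap> from (label, src) pairs, numbered from 1."""
    lines = ["<navMap>"]
    for order, (label, src) in enumerate(entries, start=1):
        lines += [
            f'  <navPoint id="navpoint-{order}" playOrder="{order}">',
            "    <navLabel>",
            # escape() protects characters like & and < in titles.
            f"      <text>{escape(label)}</text>",
            "    </navLabel>",
            f"    <content src={quoteattr(src)}/>",
            "  </navPoint>",
        ]
    lines.append("</navMap>")
    return "\n".join(lines)
```

For instance, nav_map([("Title Page", "content/title.html"), ("Credits", "content/credits.html")]) returns a two-entry navMap ready to paste into toc.ncx.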

Creating the Epub File

Epub is, in essence, nothing more than a ZIP archive of files. (Mobipocket is actually the same way, except that it uses a Palm-specific format.) You can create it with zip, but as mentioned under Materials, you need to exclude extra file attributes. Readers expect to find the contents of the mimetype file at a fixed byte position in the archive, which is why nothing extra may be attached to it and why the mimetype file must come before all the other ones.

If you have Info-ZIP, on the command line run:

zip -Xr9D The_Velveteen_Rabbit.epub mimetype * -x .DS_Store

Options:

-X: Exclude extra file attributes (permissions, ownership, anything that adds extra bytes)

-r: Recurse into directories

-9: Better compression

-D: Don’t list directories as separate entries in the zip file

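-x .DS_Store: Don’t include Mac OS X’s hidden Finder metadata file. As written, the pattern only matches a .DS_Store in the base directory; use -x "*.DS_Store" to skip the ones in subdirectories as well.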
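If you don’t have Info-ZIP handy, roughly the same archive can be built with Python’s standard zipfile module. Treat this as a sketch rather than a drop-in replacement: build_epub is a hypothetical name, and while zipfile writes no extra fields for ordinary files as far as I know, it is still worth running the result through epubcheck.

```python
import zipfile
from pathlib import Path


def build_epub(base_dir, epub_path):
    """Zip base_dir into epub_path with mimetype as the first, stored entry."""
    base = Path(base_dir).resolve()
    target = Path(epub_path).resolve()
    with zipfile.ZipFile(target, "w") as epub:
        # mimetype goes in first and uncompressed.
        epub.write(base / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        for path in sorted(base.rglob("*")):
            relative = path.relative_to(base).as_posix()
            if (not path.is_file() or path == target
                    or relative == "mimetype" or path.name == ".DS_Store"):
                continue  # skip directories, the output file, and Finder debris
            epub.write(path, relative, compress_type=zipfile.ZIP_DEFLATED)
    return target
```

Something like build_epub(".", "../The_Velveteen_Rabbit.epub"), run from the base directory, mirrors the zip command above. Only files are added, which has the same effect as -D.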

Velveteen Rabbit - Final Tree

About the Sample EPub

This is different from the main Velveteen Rabbit download offered on S∂. The main download includes extra files and metadata so that a Mobipocket book can be generated.

That’s another post for another day.

Appendix: Catalog of Common HTML Problems

Common problems (with explanation of epubcheck error translation if non-straightforward):

XML and Namespace Declarations

  • Not including the <?xml...?> declaration at the top.

  • Not including xmlns="http://www.w3.org/1999/xhtml" attribute in the <html> tag.

    Epubcheck would say that none of your tags are supported by the “” namespace.

Break, Formatting, and Paragraph Tags

  • Break tags that look like <br> instead of <br />.

    Epubcheck would say that it finds an unexpected closing tag that wasn’t </br>.

  • Paragraph tags (<p>) that aren’t closed (</p>).

    Epubcheck will run into the next <p> and complain about it.

  • Break tags and text formatting tags (like <tt>, <i>, <em>, <b>, <strong>, <font>, etc.) that exist outside of paragraphs.

    Epubcheck will tell you they don’t belong there.

Lists

  • Tags that were formerly allowed to run free without a closing tag can do so no longer. This includes li, dt, dd.

  • Bare list item tags (li, dt, dd) are no longer allowed; they must be inside enclosing ol, ul, or dl tags.

  • Lists don’t belong inside paragraphs; they must be outside of them.

Headers

  • Headers don’t belong inside paragraphs; they must be outside of them.

Images

  • Image tags that don’t end with /> (rather than simply the angle bracket).

  • Images that don’t include the alt attribute.

Anchors, aka “Missing Fragments”

  • Using <a name="..."> tags instead of the id attribute when creating anchors. Example: instead of

    <a name="illus_christmas_morning"></a><p>...</p>

    Use instead

    <p id="illus_christmas_morning">...</p>

    When you want to link to the anchor (aka “fragment”), you still use

    <a href="text.html#illus_christmas_morning">Christmas Morning</a>

    either way.

    This error generates by far the weirdest epubcheck error, where it complains that fragments are missing.

    The id way is cleaner anyway.

  • An anchor is also considered missing if the HTML around it is invalid for some reason.
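A few of the problems in this catalog can be spotted with a short script before epubcheck ever runs. The Python sketch below is only a rough pre-check with an invented function name: it catches files that are not well-formed XML (unclosed <br> or <p> tags, for example), a missing XHTML namespace, images without alt, and <a name="..."> anchors. It knows nothing about where tags are allowed to appear, and the standard parser may also reject HTML entities such as &nbsp; that it has no definition for.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def xhtml_problems(xhtml_text):
    """Return a list of problems found in one content file (rough pre-check)."""
    try:
        root = ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        # Unclosed or badly nested tags end up here.
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != XHTML_NS + "html":
        problems.append("root element is not <html> in the XHTML namespace")
    for img in root.iter(XHTML_NS + "img"):
        if img.get("alt") is None:
            problems.append(f"img without alt: {img.get('src')}")
    for anchor in root.iter(XHTML_NS + "a"):
        if anchor.get("name") is not None and anchor.get("id") is None:
            problems.append(f"anchor uses name instead of id: {anchor.get('name')}")
    return problems
```

An empty list only means none of these particular problems were found; epubcheck remains the real test.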
