Tag Archive: epub

Stanza EPub Update for Thoughtcrime Experiments, Shadow Unit Season 2 Extras

There was a wee coding flummox in the book source that resulted in the table of contents not being interpreted correctly by the popular Stanza iPhone ebook reader. That has now been fixed.

You can find these downloads and more on the Stanza Library Page. [1]

  1. Which now also has a shorter and friendlier URL. You know. Except that you still have to type out ‘spontaneous derivation’ on your itty bitty keyboards. They don’t autocomplete well, either. Sorry about that.

To Digital in a Day: Act III

Sat 9:30 PM

This book needs a cover; that would have been a nice tutorial on Covers for All Book Formats More or Less, but that’s a blog post for another day.

Time for Mobipocket.

mobigen run on the Epub file finishes in a minute. File checks out.

Sat 9:32 PM

Wondering what format to do next.

Oh yes, PDF. Which means a detour via html2ps and then Ghostscript’s ps2pdf.

This is a little more complicated.

Especially since html2ps is segfaulting for some reason on my Mac.

Sat 10:02 PM

Removed incidental cause of the seg fault, will be fixing it for real tomorrow or sommat.

Or… not. html2ps doesn’t deal with raw utf8; it lives on ISO encoding. Like the rest of perl. No Ruby (defaults to utf8) equivalent around.
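One conceivable workaround, sketched in Ruby: re-encode the UTF-8 text as ISO-8859-1 and turn anything Latin-1 cannot represent (curly quotes, for instance) into numeric character references before handing the HTML to html2ps. The helper name is made up for illustration, and whether html2ps is happy with the result is an assumption, not something verified here.

```ruby
# Hypothetical helper, not part of html2ps or any library:
# re-encode UTF-8 text as ISO-8859-1, replacing each character that
# Latin-1 cannot represent with a numeric character reference.
def to_latin1_with_entities(text)
  text.encode("ISO-8859-1", fallback: ->(ch) { "&##{ch.ord};" })
end

sample = "It\u2019s a caf\u00e9"   # curly apostrophe plus e-acute
converted = to_latin1_with_entities(sample)
puts converted.encoding            # the result is tagged ISO-8859-1
puts converted.bytesize
```

The e-acute survives as a single Latin-1 byte, while the curly apostrophe becomes an ASCII-only reference that an HTML renderer can resolve later.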

Sat 10:24 PM

The other option I know of, wkpdf, needs a consolidated HTML file.

So… let’s take the worked original file, and add anchors. MacVim again!

This time I’m running short perl filters against the text directly in the editor, then using grep to grab the anchors and more regular expression/replace to create the links.

Incidentally, this also gives us the one-file HTML version with a table of contents.
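The anchor-and-link step was done with perl filters inside the editor; the same idea can be sketched in Ruby. This is an illustration only: it assumes chapter headings follow the `<h2 class="chapter"><b>…</b></h2>` pattern used by the splitting script in Act II, and the helper names and the `chapter-NN` anchor scheme are invented.

```ruby
# Heading pattern borrowed from the splitting script in Act II.
CHAPTER_HEADING = %r|<h2 class="chapter"><b>([^<]+)</b></h2>|

# Hypothetical helper: gives each chapter heading an id
# ("chapter-01", "chapter-02", ...) and returns the new HTML plus a
# list of [anchor, title] pairs for building the links.
def add_chapter_anchors(html)
  entries = []
  anchored = html.gsub(CHAPTER_HEADING) do
    title = Regexp.last_match(1)
    anchor = format("chapter-%02d", entries.size + 1)
    entries << [anchor, title]
    %(<h2 class="chapter" id="#{anchor}"><b>#{title}</b></h2>)
  end
  [anchored, entries]
end

# Builds a simple linked list of chapters from those pairs.
def toc_links(entries)
  items = entries.map do |anchor, title|
    %(  <li><a href="##{anchor}">#{title}</a></li>)
  end
  (["<ul>"] + items + ["</ul>"]).join("\n")
end

sample = %(<h2 class="chapter"><b>One</b></h2>\n<p>Text.</p>)
anchored, entries = add_chapter_anchors(sample)
puts toc_links(entries)
puts anchored
```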

Sat 11:09 PM

Well, I ended up enabling the web server with PHP5, increasing memory limits and the PCRE backtrack limit, to try to run html2pdf, the PHP version. But most of my problems stem from the document being too large.

At this point, I can already generate every other format. Except for PDF with hyperlinks (PDF without such links I can do). Which has always been a bit of an Achilles heel for me.

By the way, apparently the Mac these days comes with textutil, so I might have been able to save a bit of time earlier in Act I. Hey, it can convert to… Word document, Open Office Document Text, …. hang on….

Nah, doesn’t preserve inter-text hyperlinks.

Sat 11:20 PM

Let’s just get all the other mobile formats out of the way.

Using calibre’s any2lrf, we get a valid Sony Reader file in under two minutes.

Using calibre’s oeb2lit, we get a valid Microsoft Reader file in under two minutes.

It takes me longer to type all this down for you and to locate my bookmark to the calibre site, actually.

Sat 11:29 PM

The state of affairs:

  • Valid Epub.
  • Valid Mobipocket (MOBI).
  • (Really) Valid HTML with linkage.
  • Valid Sony Reader.
  • Valid Microsoft Reader.

All in one day. Actually, counting up the time, less than one day.

To Digital in a Day: Act II

Sat 8:27 PM

Rested off some dizziness and decided to pick this up again so that there’s at least a book for my Kindle.

The HTML looks pretty good. It’s at the state where someone could email it to their Kindle’s email account and have it converted fairly well. It lacks a table of contents, though.

But first, let’s do the Epub.

Sat 8:30 PM

The annoying thing about Epub: most Epub readers don’t deal well when the source contains a very large HTML file. Other mobile formats are internally broken down into separate records, to be loaded in small chunks so as not to stress memory. But many Epub readers try to load the entire file in one go, including Adobe Digital Editions.

So the first task is to break this file up into multiple files. I usually do one file per chapter; my concern is that the chapters are so large (about 20k words each), but we’ll have to see what happens.

First, breaking out the stylesheet into its own style.css file, which can then be linked to by each chapter html file.

<link rel="stylesheet" type="text/css" href="style.css" />
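If the styles live in an embedded `<style>` block, moving them out can be scripted as well. A minimal sketch, assuming a single `<style>` element in the source; the helper and constant names are invented for illustration.

```ruby
STYLE_BLOCK = %r|<style[^>]*>(.*?)</style>|mi
STYLE_LINK  = '<link rel="stylesheet" type="text/css" href="style.css" />'

# Hypothetical helper: pulls the first <style>...</style> block out
# of an HTML string. Returns [html_with_link, css]; css is nil when
# the document has no embedded stylesheet.
def extract_stylesheet(html)
  match = STYLE_BLOCK.match(html)
  return [html, nil] unless match
  # Block form of sub, so the replacement is inserted literally.
  [html.sub(STYLE_BLOCK) { STYLE_LINK }, match[1].strip]
end

sample = "<head><style type=\"text/css\">\np { text-indent: 1em }\n</style></head>"
html, css = extract_stylesheet(sample)
puts html
puts css
```

The returned CSS would then be written to style.css alongside the chapter files.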

Now I write a ruby script to chop the file up, because I’m just that way. You could do this in perl or python as well, but I prefer ruby.

#!/usr/bin/ruby

#
# Opens a new file handle to a file with the given
# filename (no suffix), writing the initial header for
# the file, also using the given title in the <head> section.
#
def open_section(name, title)
    section = File.new("#{name}.html", "w")
    section.puts <<-END
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
   "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>#{title}</title>
  <link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
    END

    return section
end

#
# Closes the given file handle after writing the HTML footer.
#
def close_section(section)
    section.puts <<-END
</body>
</html>
    END
    section.close
end

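# Walk the source file line by line. Everything before <body> is
# skipped (each section gets its own header); everything after it is
# copied into the current section file, and a new section starts at
# each chapter heading.
File.open("TextilePlanet-te.html") do |file|

    current_section = nil
    section_num = 1
    state = :in_head

    file.each_line do |line|

        case state
        when :in_head
            if line =~ %r|<body|
                state = :in_body
                # Front matter before the first chapter heading.
                current_section = open_section('section-00', 'Title')
            end
        when :in_body
            # Stop at the source's own closing tag; close_section
            # writes the footer, so copying it would duplicate it.
            break if line =~ %r|</body>|

            if line =~ %r|<h2 class="chapter"><b>([^<]+)</b></h2>|
                title = $1
                close_section(current_section)
                current_section = open_section("section-%02d" % section_num, title)
                section_num += 1
            end

            current_section.puts line
        end
    end

    # current_section is still nil if the file had no <body> tag.
    close_section(current_section) if current_section
end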

The end result is 11 files, from section-00.html to section-10.html.

I check the various HTML files in Firefox to make sure we’ve got everything. Everything is quite nicely split; section-00.html has the title page, section-{01-08}.html each contain an entire chapter, section-09.html contains the author bio, and section-10.html the copyright page.
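To put a number on the worry about oversized chapters, a rough word count per section file is enough. This is a sketch only: it strips tags with a regular expression and counts whitespace-separated tokens, so entities and stray markup skew the figure slightly. The helper name is invented.

```ruby
# Rough word count for an HTML fragment: replace tags with spaces,
# then split on whitespace. Fine for comparing section sizes, not
# for anything precise.
def rough_word_count(html)
  html.gsub(/<[^>]*>/, " ").split.size
end

# Report every section file in the current directory, if any exist.
Dir.glob("section-*.html").sort.each do |path|
  puts "#{path}: #{rough_word_count(File.read(path))} words"
end
```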

We need an explicit ToC file for Mobipocket. (Epub doesn’t need it, but I’m working from the same source for all the formats.)

Sat 8:55 PM

I could write the ToC file by hand… or generate it with a script… or just use MacVim and shell commands and some writing by hand.

I do that.
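For comparison, the "generate it with a script" route might look like the following. It is a sketch rather than what was actually done here: it assumes each section file carries its chapter title in `<title>` (as the splitting script writes it) and that titles need no HTML escaping. Both helper names are invented.

```ruby
# Hypothetical helper: reads the <title> that the splitting script
# wrote into a section file's <head>.
def title_of(html)
  html[%r|<title>([^<]*)</title>|, 1]
end

# Builds a complete ToC page from [filename, title] pairs.
def toc_html(entries)
  links = entries.map do |file, title|
    %(  <p><a href="#{file}">#{title}</a></p>)
  end
  <<~HTML
    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
       "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>Table of Contents</title>
      <link rel="stylesheet" type="text/css" href="style.css" />
    </head>
    <body>
      <h2>Table of Contents</h2>
    #{links.join("\n")}
    </body>
    </html>
  HTML
end

# Collect titles from whatever section files are present.
entries = Dir.glob("section-*.html").sort.map do |file|
  [file, title_of(File.read(file))]
end
File.write("toc.html", toc_html(entries)) unless entries.empty?
```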

Sat 9:00 PM

Moving the non-toc, non-section, non-stylesheet, non-image files to another directory.

Now using Ruby Epub tools in the root of the work directory. Also, I’ve discovered that an old friend of mine, HTML Tidy, still exists. Big help for finding illegal character sequences and replacing them appropriately.

% epub add-to-opf . content/*
% vim metadata.opf   # (reorder spine)
% epub add-guide . content/toc.html \
      --type "toc" \
      --title "Table of Contents"
% epub add-guide . content/section-01.html \
      --type "text" \
      --title "Start Reading"
% epub add-to-ncx . content/section*
% epub add-to-ncx . content/toc.html
% epub compile .
  adding: mimetype (stored 0%)
  adding: META-INF/container.xml (deflated 35%)
  adding: metadata.opf (deflated 77%)
  adding: content/style.css (deflated 59%)
  adding: content/section-10.html (deflated 55%)
  adding: content/section-09.html (deflated 33%)
  adding: content/section-02.html (deflated 63%)
  adding: content/section-03.html (deflated 62%)
  adding: content/section-01.html (deflated 62%)
  adding: content/section-06.html (deflated 63%)
  adding: content/section-07.html (deflated 53%)
  adding: content/section-04.html (deflated 62%)
  adding: content/bookview-logo.png (stored 0%)
  adding: content/section-05.html (deflated 63%)
  adding: toc.ncx (deflated 79%)
  adding: content/toc.html (deflated 59%)
  adding: content/section-00.html (deflated 32%)
  adding: content/section-08.html (deflated 33%)
% ~/Software/ebooks/epub/epubcheck *epub
No errors or warnings detected

Sat 9:18 PM

Test reading it successfully in Adobe Digital Editions, with a table of contents and a proper NCX.
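The `mimetype (stored 0%)` line at the top of the compile output is not incidental: the EPUB container format expects the mimetype file to be the first entry in the zip, uncompressed and with no extra field. epubcheck already verifies this; the sketch below only illustrates what that check amounts to at the byte level. The function name is invented.

```ruby
# Inspects the first 58 bytes of an EPUB (a zip archive): the first
# entry should be named "mimetype", be stored uncompressed, carry no
# extra field, and contain exactly "application/epub+zip".
def epub_mimetype_ok?(header)
  return false if header.nil? || header.bytesize < 58
  compression, = header.byteslice(8, 2).unpack("v")    # 0 means stored
  name_len, extra_len = header.byteslice(26, 4).unpack("vv")
  header.byteslice(0, 4) == "PK\x03\x04".b &&           # local file header
    compression == 0 &&
    name_len == 8 && extra_len == 0 &&
    header.byteslice(30, 8) == "mimetype".b &&
    header.byteslice(38, 20) == "application/epub+zip".b
end

# Check any files named on the command line.
ARGV.each do |path|
  puts "#{path}: #{epub_mimetype_ok?(File.binread(path, 58))}"
end
```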

Ladies and gentlemen, as of 9:18 PM we have valid Epub. And a blog entry shortly.

Shadow Unit: Episode 2×01 eBootleg


Yes. It is here.

Update! Ask and ye shall sometimes receive, if it is within my power. A PDF version is now available, with clicky links and everything. Please see Locations below.

Update #2! No one asked, but I created an LRF file for older Sony Readers that can’t read ePub files. I don’t have a Sony Reader, though, so I’m not sure how it turns out; I think the formatting might actually look different than the corresponding ePub, even though they’re all from the same source files….

Note: Remember, these are unofficial (if legal) conversions. Both the original and these conversions are licensed under a Creative Commons Attribution-Noncommercial 3.0 license.

Schedule: I plan on doing the episodes as individual files, with “up to now” season compilations as a separate, updated ebook file. Eventually the individual episodes will expire (but that won’t be until season end) and there’ll just be the Season 2 Ebook.

Exciting Features:

  • Episode 1 in mobile reading form.
  • Pretty cover for your ebook device/software.
  • Typographical quotes and dashes!
  • All episode-specific easter eggs and extras included.

Locations:

If you own a Kindle or other device/program that can read Mobipocket files (.MOBI):
  Shadow Unit - 2x01 - Lucky Day (Kindle/Mobipocket) (153.9 KiB, 438 hits)

If you own a Sony PRS-505 or later, Adobe Digital Editions, or some other device/program that can read EPub files (.EPUB):
  Shadow Unit - 2x01 - Lucky Day (Epub) (344.3 KiB, 287 hits)

If you own anything that can read a PDF file (bonus if it can deal with links in a PDF file):
  Shadow Unit - 2x01 - Lucky Day (PDF) (135.5 KiB, 313 hits)

If you own an older Sony that cannot read EPub files and don’t want to use the PDF above [1]:
  Shadow Unit - 2x01 - Lucky Day (Sony) (157.6 KiB, 276 hits)

Enjoy!

Need Season 1?

It’s currently only available for the Kindle, but can be converted via calibre for Sony. (It’s a huge file, relatively speaking, for an ebook).

  Shadow Unit: Season 1: Kindle/Mobipocket (1.5 MiB, 1,509 hits)

Screenshots!

Below the cut.


  1. It’s not form-factored for a small screen; I still need to figure out how to do that with the tools I have on hand.

RubyEpub Tools (ruby-epub) 0.0.2 Released

Added the ‘add-to-ncx’ operation on the epub script, and removed the creation of the template HTML file, which just got in the way.

See the ruby-epub GoogleCode page.

RubyEpub Tools (ruby-epub) 0.0.1 Released

Right now this is a very minimal bundle of functionality. Basically it’s my create/add-buncha-files/compile script, and not much else. It’ll work on Mac OS X and any Unix. Windows is, on the other hand, special. I don’t have a Windows box, so I don’t know.

On the ruby-epub Google Code page are a featured download (ruby-epub-0.0.1.gem), a featured wiki page (“Installing”), and the road map (more a check list), prominently displayed.

Added Quick Reference: EBookery Workflow for Various Formats

I added a new quick reference, which is the workflow I currently use to produce two formats, Epub and Mobipocket (along with what might be considered a by-product, HTML).

EBookery Workflow for Various Formats

It’s very much geared towards someone with a Mac; however, much of it is easily adaptable to a Windows or even Linux machine, since many of the tools mentioned are either cross-platform or can be run under CrossOver.

The RubyEpub tools mentioned from time to time aren’t necessary, but they make life easier. Right now to get at them you need Subversion knowledge. At some point I want to make them a standard release in one fashion or another, probably as a Ruby gem.

Perfecting (Simple) PDF Conversion to EPub and Mobipocket


Problem: Convert PDF to reflowable text, preferably HTML.

Why: Text that reflows based on the size of the screen, the size of the font, or the length of a page (or indeed, without the concept of a page) is what best suits mobile ebook readers, with their smaller screens.
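To make "reflowable" concrete: text pulled out of a PDF usually arrives hard-wrapped at the page width, and the core of the job is undoing that. The sketch below is not the method from the full post; it is a bare illustration that assumes paragraphs are separated by blank lines, which is not true of every PDF. The helper name is invented.

```ruby
# Hypothetical helper: turns hard-wrapped plain text into one string
# per paragraph. Caveat: the de-hyphenation step also fuses genuine
# compounds that happen to break at a line end.
def unwrap_paragraphs(text)
  paragraphs = text.split(/\n\s*\n/).map do |para|
    para.strip
        .gsub(/(\w)-\n\s*(\w)/, '\1\2')   # rejoin words split at a line end
        .gsub(/\s*\n\s*/, " ")            # remaining line breaks become spaces
  end
  paragraphs.reject(&:empty?)
end

sample = "The quick brown\nfox jumps.\n\nA hyphen-\nated word."
unwrap_paragraphs(sample).each { |para| puts para }
```

Each resulting paragraph can then be wrapped in a `<p>` element (after escaping) to produce HTML that a reader is free to reflow.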

Apologies for two Geekery posts in a row. The rest of the discussion is under the fold.


LRF to HTML: The Rough Guide

As of this writing, calibre, which converts many formats to one another and comes with command-line tools, does not convert LRF to HTML, or indeed to almost anything other than LRS, an XML format. Currently this is not a high-priority item to fix in calibre itself, because calibre is aimed at converting things to LRF. (The ePub conversion is still relatively new and shiny.)

ETA: Here’s the LRS specification.

So. Heck. Why not. I’m using Ruby, by the way, because Ruby has the kick-ass REXML library, which also forms the cornerstone for my ruby-epub stuff (still in the making).

Geekery after the cut.


Constitution of the United States of America on Your Kindle/Reader

All because I’m still on a West Wing jag and I drank a bottle of this about eight hours ago and I’m still hyper. So I started this (well, yesterday apparently) evening and ended just now.

Here they are:

  United States Constitution [EPUB] (460.2 KiB, 492 hits)
  United States Constitution [Kindle/Mobipocket] (362.9 KiB, 541 hits)

Below the cut is a gallery of images on the Kindle, wherein I also describe and demonstrate the highlighting and note-taking features of the Kindle. As the movie reviewer with his or her notepad, so I with my ebook reader.
