<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Perfecting (Simple) PDF Conversion to EPub and Mobipocket</title>
	<atom:link href="http://www.spontaneousderivation.com/2009/01/04/perfecting-simple-pdf-conversion-to-epub-and-mobipocket/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.spontaneousderivation.com/2009/01/04/perfecting-simple-pdf-conversion-to-epub-and-mobipocket/</link>
	<description>I dream, too. Mostly about fish.</description>
	<lastBuildDate>Tue, 16 Mar 2010 18:50:37 +0000</lastBuildDate>
	
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Arachne Jericho</title>
		<link>http://www.spontaneousderivation.com/2009/01/04/perfecting-simple-pdf-conversion-to-epub-and-mobipocket/comment-page-1/#comment-2129</link>
		<dc:creator>Arachne Jericho</dc:creator>
		<pubDate>Sun, 04 Jan 2009 21:20:26 +0000</pubDate>
		<guid isPermaLink="false">http://www.spontaneousderivation.com/?p=3781#comment-2129</guid>
		<description>Wasaty, 

I can believe that.  I&#039;ve tangled with PDF for years, and back when I was in college (or even just a few years ago), free tools weren&#039;t very good. 

Though life is easier these days with pdf2xml on the outskirts and scripting languages that can parse XML; the opportunity for a free tool that can parse out commercial-quality text is much higher.  It&#039;s a matter of brutally applying heuristics, after all....</description>
		<content:encoded><![CDATA[<p>Wasaty, </p>
<p>I can believe that.  I&#8217;ve tangled with PDF for years, and back when I was in college (or even just a few years ago), free tools weren&#8217;t very good. </p>
<p>Though life is easier these days with pdf2xml on the outskirts and scripting languages that can parse XML; the opportunity for a free tool that can parse out commercial-quality text is much higher.  It&#8217;s a matter of brutally applying heuristics, after all&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wasaty</title>
		<link>http://www.spontaneousderivation.com/2009/01/04/perfecting-simple-pdf-conversion-to-epub-and-mobipocket/comment-page-1/#comment-2128</link>
		<dc:creator>Wasaty</dc:creator>
		<pubDate>Sun, 04 Jan 2009 21:11:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.spontaneousderivation.com/?p=3781#comment-2128</guid>
		<description>There was a time when I fought many battles against a powerful enemy - the PDF files. (Sorry, too much Stargate and my favorite Teal&#039;c.)
I think that I&#039;ve worked with every free pdf-to-text extraction utility and while there are some useful ones, nothing really beats commercial tools if you want to retain both paragraph structure and some formatting, like italics and bold text - sometimes my clients expect me to translate the contents of the pdf files and deliver the translation with the exact formatting of the original.
And thanks for the ruby link.</description>
		<content:encoded><![CDATA[<p>There was a time when I fought many battles against a powerful enemy &#8211; the PDF files. (Sorry, too much Stargate and my favorite Teal&#8217;c.)<br />
I think that I&#8217;ve worked with every free pdf-to-text extraction utility and while there are some useful ones, nothing really beats commercial tools if you want to retain both paragraph structure and some formatting, like italics and bold text &#8211; sometimes my clients expect me to translate the contents of the pdf files and deliver the translation with the exact formatting of the original.<br />
And thanks for the ruby link.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arachne Jericho</title>
		<link>http://www.spontaneousderivation.com/2009/01/04/perfecting-simple-pdf-conversion-to-epub-and-mobipocket/comment-page-1/#comment-2127</link>
		<dc:creator>Arachne Jericho</dc:creator>
		<pubDate>Sun, 04 Jan 2009 20:18:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.spontaneousderivation.com/?p=3781#comment-2127</guid>
		<description>Hi Wasaty, 

I was thinking more of reflowable HTML text.  It&#039;s really easy to convert PDF straight to a presentation-style HTML that has no regard for text structure---see the free &lt;a href=&quot;http://pdftohtml.sourceforge.net/&quot; rel=&quot;nofollow&quot;&gt;pdftohtml&lt;/a&gt;, which does an incredibly good job of this. 

Ruby has a Windows implementation you can download and play around with.  &lt;a href=&quot;http://www.ruby-lang.org/en/downloads/&quot; rel=&quot;nofollow&quot;&gt;There&#039;s a one-click installer and everything&lt;/a&gt;.  

For UTF/Unicode, Ruby handles such encodings out of the box---in fact, it was developed in Japan initially, so it&#039;s one of the few scripting languages that was prepared from the start.  :)</description>
		<content:encoded><![CDATA[<p>Hi Wasaty, </p>
<p>I was thinking more of reflowable HTML text.  It&#8217;s really easy to convert PDF straight to a presentation-style HTML that has no regard for text structure&#8212;see the free <a href="http://pdftohtml.sourceforge.net/" rel="nofollow">pdftohtml</a>, which does an incredibly good job of this. </p>
<p>Ruby has a Windows implementation you can download and play around with.  <a href="http://www.ruby-lang.org/en/downloads/" rel="nofollow">There&#8217;s a one-click installer and everything</a>.  </p>
<p>For UTF/Unicode, Ruby handles such encodings out of the box&#8212;in fact, it was developed in Japan initially, so it&#8217;s one of the few scripting languages that was prepared from the start.  :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Wasaty</title>
		<link>http://www.spontaneousderivation.com/2009/01/04/perfecting-simple-pdf-conversion-to-epub-and-mobipocket/comment-page-1/#comment-2126</link>
		<dc:creator>Wasaty</dc:creator>
		<pubDate>Sun, 04 Jan 2009 19:04:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.spontaneousderivation.com/?p=3781#comment-2126</guid>
		<description>Of course you could also use one of the commercial programs to convert pdf into readable text - I happen to have Iceni&#039;s Gemini, which does quite a good job here - but I admit it&#039;s quite expensive and I wouldn&#039;t invest in this if it wasn&#039;t needed for my job.
Keep up the good work and maybe you could release a Windows executable one day? I&#039;m bound to that platform by the software I have to use, and lately I&#039;ve become too lazy to boot my secret Linux partition.
BTW, how does your script handle the non-English encodings, like UTF or others?</description>
		<content:encoded><![CDATA[<p>Of course you could also use one of the commercial programs to convert pdf into readable text &#8211; I happen to have Iceni&#8217;s Gemini, which does quite a good job here &#8211; but I admit it&#8217;s quite expensive and I wouldn&#8217;t invest in this if it wasn&#8217;t needed for my job.<br />
Keep up the good work and maybe you could release a Windows executable one day? I&#8217;m bound to that platform by the software I have to use, and lately I&#8217;ve become too lazy to boot my secret Linux partition.<br />
BTW, how does your script handle the non-English encodings, like UTF or others?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
