Bug 38586

Summary: support text rotation
Product: poppler Reporter: Joshua Richardson <joshuarbox-junk1>
Component: utilsAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: enhancement    
Priority: medium    
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: example : LARGE quotation needs to be rotated
another example needing rotation
shows rotation of LARGE quotation mark fixed
shows another example fixed
patch implements text rotation
patch changes output format to HTML5/XHTML5 compliant
patch to remove spurious spans and make IE happier
patch implements text rotation ; no whitespace deltas
patch changes output format to HTML5/XHTML5 compliant
patch to remove spurious spans and make IE happier
patch to address comment #18

Description Joshua Richardson 2011-06-22 20:54:20 UTC
To best support text rotation with equivalency to PDF, we need to output HTML5.  While at it, we might as well also make it compatible with XHTML5, which will also make it compatible with epub format.
Comment 1 Joshua Richardson 2011-06-22 20:55:11 UTC
Created attachment 48316 [details]
example : LARGE quotation needs to be rotated
Comment 2 Joshua Richardson 2011-06-22 20:56:19 UTC
Created attachment 48317 [details]
another example needing rotation
Comment 3 Joshua Richardson 2011-06-22 20:57:07 UTC
Created attachment 48318 [details]
shows rotation of LARGE quotation mark fixed
Comment 4 Joshua Richardson 2011-06-22 20:57:34 UTC
Created attachment 48319 [details]
shows another example fixed
Comment 5 Joshua Richardson 2011-06-22 21:00:25 UTC
Created attachment 48320 [details]
patch implements text rotation
Comment 6 Joshua Richardson 2011-06-22 21:02:10 UTC
Created attachment 48321 [details]
patch changes output format to HTML5/XHTML5 compliant

This patch allows the previous patch to work across more browsers.
Comment 7 Joshua Richardson 2011-06-22 21:03:34 UTC
The attached patches should resolve this enhancement.
Comment 8 Albert Astals Cid 2011-06-23 00:56:46 UTC
@@ -297,25 +298,17 @@ int HtmlFontAccu::AddFont(const HtmlFont& font){
 }
 
 // get CSS font name for font #i 
-GooString* HtmlFontAccu::getCSStyle(int i, GooString* content, int j){
+GooString* HtmlFontAccu::getCSStyle(GooString* content){
   GooString *tmp;
-  GooString *iStr=GooString::fromInt(i);
-  GooString *jStr=GooString::fromInt(j);
   
   if (!xml) {
-    tmp = new GooString("<span class=\"ft");
-    tmp->append(jStr);
-    tmp->append(iStr);
-    tmp->append("\">");
+    tmp = new GooString("<span>");
     tmp->append(content);
     tmp->append("</span>");
   } else {
     tmp = new GooString("");
     tmp->append(content);
   }
-
-  delete jStr;
-  delete iStr;
   return tmp;
 }

Any reason why you are deleting this perfectly valid feature?


-  if( !noframes )
-  {
-      GooString* pgNum=GooString::fromInt(page);
-      tmp = new GooString(DocName);
-      if (!singleHtml){
-            tmp->append('-')->append(pgNum)->append(".html");
-            pageFile = fopen(tmp->getCString(), "w");
-      } else {
-            tmp->append("-html")->append(".html");
-            pageFile = fopen(tmp->getCString(), "a");
-      }
-      delete pgNum;
-      if (!pageFile) {
-	  error(-1, "Couldn't open html file '%s'", tmp->getCString());
-	  delete tmp;
-	  return;
-      } 
+  if( !noframes ) {
+    GooString* pgNum=GooString::fromInt(page);
+    tmp = new GooString(DocName);
+    if (!singleHtml){
+      tmp->append('-')->append(pgNum)->append(".html");
+      pageFile = fopen(tmp->getCString(), "w");
+    } else {
+      tmp->append("-html")->append(".html");
+      pageFile = fopen(tmp->getCString(), "a");
+    }
+    delete pgNum;
+    if (!pageFile) {
+      error(-1, "Couldn't open html file '%s'", tmp->getCString());
+      delete tmp;
+      return 1;
+    }

Please do not reformat code.
Comment 9 Joshua Richardson 2011-06-24 10:38:18 UTC
Re comment #8:

The font formatting was moved up to the block-level element (the parent DIV), as required by later specs of HTML.  It works as it did before.  In fact, the Font-level function can be removed entirely, as it is just a no-op at this point and complicating the HTML and code.  I will submit a patch to do that.

Sorry for the reformatting.
Comment 10 Joshua Richardson 2011-06-24 10:43:26 UTC
Created attachment 48383 [details]
patch to remove spurious spans and make IE happier

I have removed the unneeded function, which removed the spans that are no longer needed.  I have also added some spacing for elements like <br/>.  IE needs a space to function properly:  "<br />"
Comment 11 Albert Astals Cid 2011-06-24 11:39:02 UTC
Can you please remove the reformating from the patches? Makes it much easier to review if i know the change is really a change and do not have to stress my eyes trying to find a change when there is none
Comment 12 Joshua Richardson 2011-06-28 13:57:26 UTC
Created attachment 48530 [details] [review]
patch implements text rotation ; no whitespace deltas
Comment 13 Joshua Richardson 2011-06-28 13:58:24 UTC
Created attachment 48531 [details] [review]
patch changes output format to HTML5/XHTML5 compliant
Comment 14 Joshua Richardson 2011-06-28 13:59:42 UTC
Created attachment 48532 [details] [review]
patch to remove spurious spans and make IE happier
Comment 15 Joshua Richardson 2011-06-28 14:09:11 UTC
Let me know if you need anything else.
Comment 16 Albert Astals Cid 2011-06-28 14:13:33 UTC
Thanks for the updated patches, i'll be quite busy until July 18th so i'm not sure i'll ge able to test them properly before that date, just saying so you know i am not ignoring you
Comment 17 Albert Astals Cid 2011-07-24 15:35:27 UTC
Some comments now that i have more time.

Please do not declare "namespace poppler" in popplerUtils.h since that is a namespace we reserve for the poppler cpp frontend.
Please do not use "using namespace ;" constructs since we do not use them anywhere in the utils code.

Please do not redefine the DEBUG macro or explain why you need to do it

I'd prefer you could remove the use of  
static GooString *gstr_buff0 = NULL; // a workspace in which I format strings
since makes me a bit worried

Also in 
-#define DOCTYPE "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"
-#define DOCTYPE_FRAMES "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Frameset//EN\"\n\"http://www.w3.org/TR/html4/frameset.dtd\">"
+//#define DOCTYPE "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">"
+//#define DOCTYPE_FRAMES "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Frameset//EN\"\n\"http://www.w3.org/TR/html4/frameset.dtd\">"
+#define DOCTYPE "<!DOCTYPE html>"
+#define DOCTYPE_FRAMES "<!DOCTYPE html>"
If we are not going to need the old definitions, please just delete them instead of commenting them

Also I'll need some pdf file showing the improvement (like the ones you used for your screen captures)
Comment 18 Joshua Richardson 2011-07-28 11:13:33 UTC
(In reply to comment #17)
> Some comments now that i have more time.
> 
> Please do not declare "namespace poppler" in popplerUtils.h since that is a
> namespace we reserve for the poppler cpp frontend.
> Please do not use "using namespace ;" constructs since we do not use them
> anywhere in the utils code.

Thanks for taking the time, Albert!

I think namespacing in general is a good practice, but I will go ahead and take it out and supply another patch.

> 
> Please do not redefine the DEBUG macro or explain why you need to do it

Sorry, Albert, I am unable to find another definition of this macro.  Can you point me to the relevant file where it is previously defined?

> I'd prefer you could remove the use of  
> static GooString *gstr_buff0 = NULL; // a workspace in which I format strings
> since makes me a bit worried

I have removed it in the "font extraction" patches, because there I extended the functionality of GooString.

> 
> Also in 
> -#define DOCTYPE "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01
> Transitional//EN\">"
> -#define DOCTYPE_FRAMES "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01
> Frameset//EN\"\n\"http://www.w3.org/TR/html4/frameset.dtd\">"
> +//#define DOCTYPE "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01
> Transitional//EN\">"
> +//#define DOCTYPE_FRAMES "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01
> Frameset//EN\"\n\"http://www.w3.org/TR/html4/frameset.dtd\">"
> +#define DOCTYPE "<!DOCTYPE html>"
> +#define DOCTYPE_FRAMES "<!DOCTYPE html>"
> If we are not going to need the old definitions, please just delete them
> instead of commenting them

No problem.  Will be included in the next patch.

> 
> Also I'll need some pdf file showing the improvement (like the ones you used
> for your screen captures)

I have a test PDF I think I can upload without a legal issue, since it's just an excerpt.  But just to be sure, I'll send it to you via email.
Comment 19 Joshua Richardson 2011-07-28 11:21:25 UTC
Created attachment 49681 [details] [review]
patch to address comment #18
Comment 20 Albert Astals Cid 2011-08-18 09:47:53 UTC
I've commited a somehow cleaned up version to master. Will be in poppler >= 0.17.3

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.