Bug 94173

Summary: return dates in UTC format
Product: poppler Reporter: Jakub Alba <jakubalba>
Component: generalAssignee: poppler-bugs <poppler-bugs>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: cloos, jakubalba
Version: unspecified   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: patch
patch
patch
glib: use UTC instead of local time in dates
Emulate some glibc functions
glib: use UTC instead of local time in dates v2
Emulate some glibc functions v2
pdfinfo: convert to local time zone
pdfinfo: add -isodates option
1/4: Emulate some glibc functions v3
2/4 glib: use UTC instead of local time in dates v3
3/4 pdfinfo: convert to local time zone v3
4/4 pdfinfo: add -isodates option v3
5: cpp: switched from detail::convert_date() to core's dateStringToTime()

Description Jakub Alba 2016-02-16 11:30:01 UTC
Created attachment 121781 [details]
patch

Yes I know that Bug 94035 is a twin bug and it would be nice not to double the code, but solutions to both of the bugs are a bit hacky and I think we should use an internal datetime structure and possibly something like Boost.DateTime in the cpp frontend and GDateTime in the glib frontend, because time_t is really one big invitation for bugs. But that would break the API, so in the meantime here is a hacky solution which doubles the code a bit.
Comment 1 Jakub Alba 2016-02-16 18:46:21 UTC
Created attachment 121788 [details] [review]
patch

Oops... There's been a mistake.
Comment 2 Albert Astals Cid 2016-02-16 23:04:23 UTC
setenv seems really bad for a library, is there no way to do that without changing the environment?
Comment 3 Adrian Johnson 2016-02-17 10:02:52 UTC
*** Bug 94035 has been marked as a duplicate of this bug. ***
Comment 4 Adrian Johnson 2016-02-17 10:09:08 UTC
Bug 94035 is the same issue. The one bug can be used to solve the issue for all the frontends.

I agree setenv is a bad idea. The timegm() function looks like it will do the job. On windows the equivalent function is _mkgmtime(). If we find platforms that don't have either of these functions we could use the timegm code at:

http://www.catb.org/esr/time-programming/#_mktime_3_timelocal_3_timegm_3
Comment 5 Jakub Alba 2016-02-17 13:30:16 UTC
Created attachment 121808 [details] [review]
patch

There is timegm(3), but it's a nonstandard GNU extension, so I've written another version of utcmktime (patch). Here is a proof that it's correct: https://github.com/yakubin/utcmktime
Comment 6 Jakub Alba 2016-02-17 13:32:03 UTC
Oh, I didn't refresh the page, so I didn't notice Adrian's reply.
I still think that my (last) patch is a better solution though.
Comment 7 Adrian Johnson 2016-02-17 20:43:38 UTC
Your code doesn't work. timegm() modifies the struct. Swap the utcmktime and timegm around (or just make a copy of the struct) and it fails.

I suggest using the public domain ESR code that is known to work correctly. Add a comment above the function saying
"Public domain code from http://www.catb.org/esr/time-programming/#_mktime_3_timelocal_3_timegm_3"
so we know where it came from.
Comment 8 Jakub Alba 2016-02-17 21:22:48 UTC
(In reply to Adrian Johnson from comment #7)
> Your code doesn't work. timegm() modifies the struct. Swap the utcmktime and
> timegm around (or just make a copy of the struct) and it fails.

Actually, utcmktime() works. I only made a mistake in the random struct tm generation. Fixed it. Check now.
Comment 9 Adrian Johnson 2016-02-19 10:13:20 UTC
Created attachment 121840 [details] [review]
glib: use UTC instead of local time in dates

I've found a simpler method of emulating timegm. I also prefer using the real timegm where available with a fallback to an emulated timegm.

The attached patch implements the timegm configure check and emulated fallback. I've only done the minimum to get glib to use it. I'll let you rework your patch around it to fix cpp.
Comment 10 Jakub Alba 2016-02-22 17:19:39 UTC
(In reply to Adrian Johnson from comment #9)
> Created attachment 121840 [details] [review] [review]
> glib: use UTC instead of local time in dates
> 
> I've found a simpler method of emulating timegm. I also prefer using the
> real timegm where available with a fallback to an emulated timegm.
> 
> The attached patch implements the timegm configure check and emulated
> fallback. I've only done the minimum to get glib to use it. I'll let you
> rework your patch around it to fix cpp.

Using #ifdefs seems ugly to me. But if Albert says this solution is better, than sure, I can write the cpp patch.
Comment 11 Albert Astals Cid 2016-02-23 00:08:13 UTC
I think Adrian's solution is better, let's use native library calls when possible, i don't like the #ifndef HAVE_TIMEGM in the "sometimes instlled" DateInfo.h should find a less public place to put it.

Also are you guys sure returning the time in UTC is the right thing instead of returnin time+timezone information?
Comment 12 Jakub Alba 2016-02-23 06:49:02 UTC
(In reply to Albert Astals Cid from comment #11)
> Also are you guys sure returning the time in UTC is the right thing instead
> of returnin time+timezone information?

It's what time_t is supposed to hold. When someone uses a frontend which returns the date as time_t (cpp & glib), they will expect it to be UTC. Then they may call localtime and the returned struct tm will be wrong, because the time_t is wrong. Same goes for gmtime, where they'd expect UTC, but get local time.

So that's for the part with time_t/mktime. My patch also takes the timezone info stored in the PDF into consideration. Poppler didn't use to do it, which could cause weird behavior, if someone saved a document in let's say UTC-6, sent it to a friend in a different timezone and they'd find out that the only difference in metadata dates they see comes from a bug concerning time_t/mktime.

For more information we would need an internal datetime struct which we would pass on to the frontends where it would be converted to QDateTime, GDateTime and I don't know what for cpp (because cpp/boost is such a wonderful way of writing C++...). It would break the API though, so I don't know if it's the best idea. Anyway, time_t is spartan by design. It isn't of much use when one wants to store timezone info.
Comment 13 Adrian Johnson 2016-02-23 11:25:44 UTC
Created attachment 121915 [details] [review]
Emulate some glibc functions

This creates a goo/glibc.cc for emulating non portable glibc functions. This allows as to use glibc functions in poppler without all the #ifdefs.

This patch adds gmtime_r and localtime_r emulation when not available. I'll look at adding windows versions of these functions next time I build poppler for windows.
Comment 14 Adrian Johnson 2016-02-23 11:27:07 UTC
Created attachment 121916 [details] [review]
glib: use UTC instead of local time in dates v2

This update to my previous patch moves the timegm code into goo/glib.cc.
Comment 15 Adrian Johnson 2016-02-23 11:45:10 UTC
(In reply to Albert Astals Cid from comment #11)
> Also are you guys sure returning the time in UTC is the right thing instead
> of returnin time+timezone information?

I did some testing with PDFs containing dates in different time zones.

Test PDF contains the time and zone: 09:25:03+2:00

Without the fix the displayed time is "09:25:03 ACDT". This is incorrect as it is ignoring the timezone in the PDF and showing only the time part. Even worse it is now claiming this time is in my timezone.

With the fix the displayed time is "17:55:03 ACDT". This is correct. The PDF time has been correctly converted to my timezone (+10:30).

If PDF viewers want to display "09:25:03+2" to the user we could add an additional API function that returns both the localtime (as stored in the PDF) and timezone.
Comment 16 Albert Astals Cid 2016-02-23 23:56:43 UTC
glib.cc comment says glib.h

Please include the "license gpl 2 or later" we have in some other files.

Do we really need to install glibc.h?
Comment 17 Adrian Johnson 2016-02-24 10:55:42 UTC
Created attachment 121942 [details] [review]
Emulate some glibc functions v2

Add GPL license and don't install glibc.h
Comment 18 Adrian Johnson 2016-02-24 10:57:24 UTC
Created attachment 121943 [details] [review]
pdfinfo: convert to local time zone

This is the fix for pdfinfo to use the timegm function.
Comment 19 Adrian Johnson 2016-02-24 11:02:52 UTC
Created attachment 121944 [details] [review]
pdfinfo: add -isodates option

It is useful to be able to view the dates including the original timezone using pdfinfo. The -rawdates option can provide this information but it is hard to read and in a non standard format that can not be parsed by other tools.

This patch adds the -isodates option that prints the dates in ISO-8601 format including the time zone.
Comment 20 Albert Astals Cid 2016-02-24 23:12:10 UTC
This is looking really good, the only thing i think could be improved (but i can convice to ignore or let it for later) is that on a file with date

D:20050131102149-08'00'

i get

CreationDate:   Mon Jan 31 19:21:49 2005 CET

when it'd probably be nicer if i got

CreationDate:   Mon Jan 31 10:21:49 2005 WhateverTimeZone-8Is

Do you think that's doable?
Comment 21 Adrian Johnson 2016-02-25 00:21:02 UTC
(In reply to Albert Astals Cid from comment #20)
> This is looking really good, the only thing i think could be improved (but i
> can convice to ignore or let it for later) is that on a file with date
> 
> D:20050131102149-08'00'
> 
> i get
> 
> CreationDate:   Mon Jan 31 19:21:49 2005 CET
> 
> when it'd probably be nicer if i got
> 
> CreationDate:   Mon Jan 31 10:21:49 2005 WhateverTimeZone-8Is
> 
> Do you think that's doable?

I'm not sure what you mean by "WhateverTimeZone-8Is". We don't know the time zone that the PDF was created in, only the time offset from UTC. There may be multiple timezones that map to the same offset. 

Displaying the timezone offset is doable as unlike the front ends where the API is limited to providing a single time_t value, pdfinfo can get the time zone value. However I think it is more useful to convert the time to the local time zone. It is more useful to me if for example an PDF file is emailed to be at 12:00pm, I open it and see the creation date is 11:30am, to immediately know that is was created 30 minutes earlier than if the displayed creation date was 01:00+04.

It is useful to be able to see the time zone information in the PDF which is why I added the -isodates option so you will see 2016-02-25T01:00+04.

The third possibility, which seems to be the one you are suggesting, is to convert the UTC+timezone time in the PDF to the local time of the creator. ie 4:00am. I don't think this is useful. Showing times in the creators time zone is only useful to someone in that time zone. To me it is just some random time. It is not showing the time in a format that I can compare to other times in my timezone.

We could add an option for displaying times in the creators localtime. But then we have the problem of what to do if the creation date and modification dates are in different timezones.
Comment 22 Albert Astals Cid 2016-02-25 00:37:18 UTC
(In reply to Adrian Johnson from comment #21)
> I'm not sure what you mean by "WhateverTimeZone-8Is". We don't know the time
> zone that the PDF was created in, only the time offset from UTC. There may
> be multiple timezones that map to the same offset. 

I meant "Mon Jan 31 10:21:49 2005 PST"

> Displaying the timezone offset is doable as unlike the front ends where the
> API is limited to providing a single time_t value, pdfinfo can get the time
> zone value. However I think it is more useful to convert the time to the
> local time zone. It is more useful to me if for example an PDF file is
> emailed to be at 12:00pm, I open it and see the creation date is 11:30am, to
> immediately know that is was created 30 minutes earlier than if the
> displayed creation date was 01:00+04.

I guess you have a point.

> It is useful to be able to see the time zone information in the PDF which is
> why I added the -isodates option so you will see 2016-02-25T01:00+04.
> 
> The third possibility, which seems to be the one you are suggesting, is to
> convert the UTC+timezone time in the PDF to the local time of the creator.
> ie 4:00am. I don't think this is useful. Showing times in the creators time
> zone is only useful to someone in that time zone. To me it is just some
> random time. It is not showing the time in a format that I can compare to
> other times in my timezone.

No no, that seems messy.


Ok, so I guess you have convinced me, let's give Jakub some time to comment about what he thinks of the patches.
Comment 23 James Cloos 2016-02-25 15:46:44 UTC
What would be most useful for pdfinfo would be to print both the original time (using the numeric zone, ie %z) as well as localtime (using %Z for the zone).  

My own pref would be to match "%F %T%z"’s output and then use "%F %T %Z" for the localtime.

Doubling up like that works well and given better intel.
Comment 24 Jakub Alba 2016-05-15 17:55:40 UTC
(In reply to Albert Astals Cid from comment #22)
> Ok, so I guess you have convinced me, let's give Jakub some time to comment
> about what he thinks of the patches.

Sorry, I've been quiet occupied recently.

I'd prefer if the code of poppler_date_parse() was somehow moved to core and used by glib and cpp, because now it does the same thing as detail::convert_date from cpp. This way we could avoid doubling the effort put into fixing/changing the way time conversion works. I think these two functions should stay (especially since one (overloaded) version of convert_date is exported as public API), but they should call a function from core which would do all the work. But this is of course open for discussion.

Other than that, everything seems ok to me.
Comment 25 Jakub Alba 2016-05-18 12:43:48 UTC
Oh, I've just realised we actually have such a function in core. pdfTimeToInteger() in poppler/DateInfo.cc. So this function should be fixed just as poppler_date_parse(). And then glib's poppler_date_parse() and cpp's detail::convert_date() should just call pdfTimeToInteger(). Or we could remove glib's poppler_date_parse() altogether (it is used in just 2 places) and leave cpp's detail::convert_date() just for converting time_t to time_type. I can do that if Adrian doesn't want to. (It would replace the glib patch.)
Comment 26 Adrian Johnson 2016-05-18 21:16:55 UTC
(In reply to Jakub Kucharski from comment #25)
> Oh, I've just realised we actually have such a function in core.
> pdfTimeToInteger() in poppler/DateInfo.cc.

That function didn't exist when I developed my patch. It was recently merged in from the digital signature branch.

I'll update my glib patch to move the conversion into this function.
Comment 27 Adrian Johnson 2016-05-19 13:16:31 UTC
Created attachment 123909 [details] [review]
1/4: Emulate some glibc functions v3
Comment 28 Adrian Johnson 2016-05-19 13:17:10 UTC
Created attachment 123910 [details] [review]
2/4 glib: use UTC instead of local time in dates v3
Comment 29 Adrian Johnson 2016-05-19 13:17:36 UTC
Created attachment 123911 [details] [review]
3/4 pdfinfo: convert to local time zone v3
Comment 30 Adrian Johnson 2016-05-19 13:18:00 UTC
Created attachment 123912 [details] [review]
4/4 pdfinfo: add -isodates option v3
Comment 31 Adrian Johnson 2016-05-19 13:19:38 UTC
I've rebased my patches and moved the pdf date to time_t conversion in glib/poppler-date.cc to DataInfo.cc.
Comment 32 Jakub Alba 2016-05-19 14:13:01 UTC
Created attachment 123914 [details] [review]
5: cpp: switched from detail::convert_date() to core's dateStringToTime()

And here comes the cpp patch.
Comment 33 Carlos Garcia Campos 2016-05-22 08:46:05 UTC
Patches look good to me.
Comment 34 Adrian Johnson 2016-06-06 10:38:47 UTC
Pushed

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.