Summary: | return dates in UTC format | ||
---|---|---|---|
Product: | poppler | Reporter: | Jakub Alba <jakubalba> |
Component: | general | Assignee: | poppler-bugs <poppler-bugs> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | cloos, jakubalba |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
patch
patch patch glib: use UTC instead of local time in dates Emulate some glibc functions glib: use UTC instead of local time in dates v2 Emulate some glibc functions v2 pdfinfo: convert to local time zone pdfinfo: add -isodates option 1/4: Emulate some glibc functions v3 2/4 glib: use UTC instead of local time in dates v3 3/4 pdfinfo: convert to local time zone v3 4/4 pdfinfo: add -isodates option v3 5: cpp: switched from detail::convert_date() to core's dateStringToTime() |
Description
Jakub Alba
2016-02-16 11:30:01 UTC
Created attachment 121788 [details] [review] patch Oops... There's been a mistake. setenv seems really bad for a library, is there no way to do that without changing the environment? *** Bug 94035 has been marked as a duplicate of this bug. *** Bug 94035 is the same issue. The one bug can be used to solve the issue for all the frontends. I agree setenv is a bad idea. The timegm() function looks like it will do the job. On windows the equivalent function is _mkgmtime(). If we find platforms that don't have either of these functions we could use the timegm code at: http://www.catb.org/esr/time-programming/#_mktime_3_timelocal_3_timegm_3 Created attachment 121808 [details] [review] patch There is timegm(3), but it's a nonstandard GNU extension, so I've written another version of utcmktime (patch). Here is a proof that it's correct: https://github.com/yakubin/utcmktime Oh, I didn't refresh the page, so I didn't notice Adrian's reply. I still think that my (last) patch is a better solution though. Your code doesn't work. timegm() modifies the struct. Swap the utcmktime and timegm around (or just make a copy of the struct) and it fails. I suggest using the public domain ESR code that is known to work correctly. Add a comment above the function saying "Public domain code from http://www.catb.org/esr/time-programming/#_mktime_3_timelocal_3_timegm_3" so we know where it came from. (In reply to Adrian Johnson from comment #7) > Your code doesn't work. timegm() modifies the struct. Swap the utcmktime and > timegm around (or just make a copy of the struct) and it fails. Actually, utcmktime() works. I only made a mistake in the random struct tm generation. Fixed it. Check now. Created attachment 121840 [details] [review] glib: use UTC instead of local time in dates I've found a simpler method of emulating timegm. I also prefer using the real timegm where available with a fallback to an emulated timegm. The attached patch implements the timegm configure check and emulated fallback. I've only done the minimum to get glib to use it. I'll let you rework your patch around it to fix cpp. (In reply to Adrian Johnson from comment #9) > Created attachment 121840 [details] [review] [review] > glib: use UTC instead of local time in dates > > I've found a simpler method of emulating timegm. I also prefer using the > real timegm where available with a fallback to an emulated timegm. > > The attached patch implements the timegm configure check and emulated > fallback. I've only done the minimum to get glib to use it. I'll let you > rework your patch around it to fix cpp. Using #ifdefs seems ugly to me. But if Albert says this solution is better, than sure, I can write the cpp patch. I think Adrian's solution is better, let's use native library calls when possible, i don't like the #ifndef HAVE_TIMEGM in the "sometimes instlled" DateInfo.h should find a less public place to put it. Also are you guys sure returning the time in UTC is the right thing instead of returnin time+timezone information? (In reply to Albert Astals Cid from comment #11) > Also are you guys sure returning the time in UTC is the right thing instead > of returnin time+timezone information? It's what time_t is supposed to hold. When someone uses a frontend which returns the date as time_t (cpp & glib), they will expect it to be UTC. Then they may call localtime and the returned struct tm will be wrong, because the time_t is wrong. Same goes for gmtime, where they'd expect UTC, but get local time. So that's for the part with time_t/mktime. My patch also takes the timezone info stored in the PDF into consideration. Poppler didn't use to do it, which could cause weird behavior, if someone saved a document in let's say UTC-6, sent it to a friend in a different timezone and they'd find out that the only difference in metadata dates they see comes from a bug concerning time_t/mktime. For more information we would need an internal datetime struct which we would pass on to the frontends where it would be converted to QDateTime, GDateTime and I don't know what for cpp (because cpp/boost is such a wonderful way of writing C++...). It would break the API though, so I don't know if it's the best idea. Anyway, time_t is spartan by design. It isn't of much use when one wants to store timezone info. Created attachment 121915 [details] [review] Emulate some glibc functions This creates a goo/glibc.cc for emulating non portable glibc functions. This allows as to use glibc functions in poppler without all the #ifdefs. This patch adds gmtime_r and localtime_r emulation when not available. I'll look at adding windows versions of these functions next time I build poppler for windows. Created attachment 121916 [details] [review] glib: use UTC instead of local time in dates v2 This update to my previous patch moves the timegm code into goo/glib.cc. (In reply to Albert Astals Cid from comment #11) > Also are you guys sure returning the time in UTC is the right thing instead > of returnin time+timezone information? I did some testing with PDFs containing dates in different time zones. Test PDF contains the time and zone: 09:25:03+2:00 Without the fix the displayed time is "09:25:03 ACDT". This is incorrect as it is ignoring the timezone in the PDF and showing only the time part. Even worse it is now claiming this time is in my timezone. With the fix the displayed time is "17:55:03 ACDT". This is correct. The PDF time has been correctly converted to my timezone (+10:30). If PDF viewers want to display "09:25:03+2" to the user we could add an additional API function that returns both the localtime (as stored in the PDF) and timezone. glib.cc comment says glib.h Please include the "license gpl 2 or later" we have in some other files. Do we really need to install glibc.h? Created attachment 121942 [details] [review] Emulate some glibc functions v2 Add GPL license and don't install glibc.h Created attachment 121943 [details] [review] pdfinfo: convert to local time zone This is the fix for pdfinfo to use the timegm function. Created attachment 121944 [details] [review] pdfinfo: add -isodates option It is useful to be able to view the dates including the original timezone using pdfinfo. The -rawdates option can provide this information but it is hard to read and in a non standard format that can not be parsed by other tools. This patch adds the -isodates option that prints the dates in ISO-8601 format including the time zone. This is looking really good, the only thing i think could be improved (but i can convice to ignore or let it for later) is that on a file with date D:20050131102149-08'00' i get CreationDate: Mon Jan 31 19:21:49 2005 CET when it'd probably be nicer if i got CreationDate: Mon Jan 31 10:21:49 2005 WhateverTimeZone-8Is Do you think that's doable? (In reply to Albert Astals Cid from comment #20) > This is looking really good, the only thing i think could be improved (but i > can convice to ignore or let it for later) is that on a file with date > > D:20050131102149-08'00' > > i get > > CreationDate: Mon Jan 31 19:21:49 2005 CET > > when it'd probably be nicer if i got > > CreationDate: Mon Jan 31 10:21:49 2005 WhateverTimeZone-8Is > > Do you think that's doable? I'm not sure what you mean by "WhateverTimeZone-8Is". We don't know the time zone that the PDF was created in, only the time offset from UTC. There may be multiple timezones that map to the same offset. Displaying the timezone offset is doable as unlike the front ends where the API is limited to providing a single time_t value, pdfinfo can get the time zone value. However I think it is more useful to convert the time to the local time zone. It is more useful to me if for example an PDF file is emailed to be at 12:00pm, I open it and see the creation date is 11:30am, to immediately know that is was created 30 minutes earlier than if the displayed creation date was 01:00+04. It is useful to be able to see the time zone information in the PDF which is why I added the -isodates option so you will see 2016-02-25T01:00+04. The third possibility, which seems to be the one you are suggesting, is to convert the UTC+timezone time in the PDF to the local time of the creator. ie 4:00am. I don't think this is useful. Showing times in the creators time zone is only useful to someone in that time zone. To me it is just some random time. It is not showing the time in a format that I can compare to other times in my timezone. We could add an option for displaying times in the creators localtime. But then we have the problem of what to do if the creation date and modification dates are in different timezones. (In reply to Adrian Johnson from comment #21) > I'm not sure what you mean by "WhateverTimeZone-8Is". We don't know the time > zone that the PDF was created in, only the time offset from UTC. There may > be multiple timezones that map to the same offset. I meant "Mon Jan 31 10:21:49 2005 PST" > Displaying the timezone offset is doable as unlike the front ends where the > API is limited to providing a single time_t value, pdfinfo can get the time > zone value. However I think it is more useful to convert the time to the > local time zone. It is more useful to me if for example an PDF file is > emailed to be at 12:00pm, I open it and see the creation date is 11:30am, to > immediately know that is was created 30 minutes earlier than if the > displayed creation date was 01:00+04. I guess you have a point. > It is useful to be able to see the time zone information in the PDF which is > why I added the -isodates option so you will see 2016-02-25T01:00+04. > > The third possibility, which seems to be the one you are suggesting, is to > convert the UTC+timezone time in the PDF to the local time of the creator. > ie 4:00am. I don't think this is useful. Showing times in the creators time > zone is only useful to someone in that time zone. To me it is just some > random time. It is not showing the time in a format that I can compare to > other times in my timezone. No no, that seems messy. Ok, so I guess you have convinced me, let's give Jakub some time to comment about what he thinks of the patches. What would be most useful for pdfinfo would be to print both the original time (using the numeric zone, ie %z) as well as localtime (using %Z for the zone). My own pref would be to match "%F %T%z"’s output and then use "%F %T %Z" for the localtime. Doubling up like that works well and given better intel. (In reply to Albert Astals Cid from comment #22) > Ok, so I guess you have convinced me, let's give Jakub some time to comment > about what he thinks of the patches. Sorry, I've been quiet occupied recently. I'd prefer if the code of poppler_date_parse() was somehow moved to core and used by glib and cpp, because now it does the same thing as detail::convert_date from cpp. This way we could avoid doubling the effort put into fixing/changing the way time conversion works. I think these two functions should stay (especially since one (overloaded) version of convert_date is exported as public API), but they should call a function from core which would do all the work. But this is of course open for discussion. Other than that, everything seems ok to me. Oh, I've just realised we actually have such a function in core. pdfTimeToInteger() in poppler/DateInfo.cc. So this function should be fixed just as poppler_date_parse(). And then glib's poppler_date_parse() and cpp's detail::convert_date() should just call pdfTimeToInteger(). Or we could remove glib's poppler_date_parse() altogether (it is used in just 2 places) and leave cpp's detail::convert_date() just for converting time_t to time_type. I can do that if Adrian doesn't want to. (It would replace the glib patch.) (In reply to Jakub Kucharski from comment #25) > Oh, I've just realised we actually have such a function in core. > pdfTimeToInteger() in poppler/DateInfo.cc. That function didn't exist when I developed my patch. It was recently merged in from the digital signature branch. I'll update my glib patch to move the conversion into this function. Created attachment 123909 [details] [review] 1/4: Emulate some glibc functions v3 Created attachment 123910 [details] [review] 2/4 glib: use UTC instead of local time in dates v3 Created attachment 123911 [details] [review] 3/4 pdfinfo: convert to local time zone v3 Created attachment 123912 [details] [review] 4/4 pdfinfo: add -isodates option v3 I've rebased my patches and moved the pdf date to time_t conversion in glib/poppler-date.cc to DataInfo.cc. Created attachment 123914 [details] [review] 5: cpp: switched from detail::convert_date() to core's dateStringToTime() And here comes the cpp patch. Patches look good to me. Pushed |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.