Bug 38879 - Add git history/log parser for tinderbox
Summary: Add git history/log parser for tinderbox
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: WWW
Version (earliest affected): unspecified
Hardware: Other All
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: difficultyInteresting, easyHack, skillScript
Depends on:
Blocks:
 
Reported: 2011-07-01 06:22 UTC by Björn Michaelsen
Modified: 2015-12-16 00:39 UTC
Description Björn Michaelsen 2011-07-01 06:22:11 UTC
Add git history/log parser for tinderbox

Background: For tinderbox (http://tinderbox.go-oo.org/) use, it would be nice to be able to tell exactly what changed between builds, i.e. what could have broken the build.

See the existing parsers for svn and mercurial to get an idea of what is needed (the interesting part starts at about line 530):

http://cgit.freedesktop.org/libreoffice/website/tree/lib/TinderDB/VC_OOo.pm?h=tinderbox

Also see http://tinderbox.go-oo.org/cgi-bin/tinder.cgi?tree=sb135&start-time=1294235414&display-hours=100 for an example of how it is actually used in the end, and http://tinderbox.go-oo.org/sb135/all_vc.html for the corresponding check-in data (date, author, link to diff, affected files). Merges can of course be shortened and don't need to list all files.
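
As a rough illustration of the check-in data described above (date, author, link to diff, affected files), here is a minimal sketch in Python. The real tinderbox code is Perl, so this is not the existing implementation. The `git log` format string, the separator bytes, the `Checkin` class and the `DIFF_URL` template are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed git invocation producing the text parsed below (not run here):
#   git log --name-only --pretty=format:%x01%H%x1f%an%x1f%cI%x1f%s OLD..NEW
RECORD_SEP = "\x01"  # emitted by %x01 at the start of every commit
FIELD_SEP = "\x1f"   # emitted by %x1f between the header fields

# Hypothetical link template; adjust to wherever the diffs are browsable.
DIFF_URL = "http://cgit.freedesktop.org/libreoffice/core/commit/?id={sha}"


@dataclass
class Checkin:
    sha: str
    author: str
    date: str      # committer date, kept as the ISO 8601 string git printed
    subject: str
    files: List[str] = field(default_factory=list)

    @property
    def diff_link(self) -> str:
        return DIFF_URL.format(sha=self.sha)


def parse_git_log(text: str) -> List[Checkin]:
    """Split the log text into one Checkin per commit."""
    checkins = []
    for record in text.split(RECORD_SEP):
        # Drop the blank lines git prints between header, files and commits.
        lines = [ln for ln in record.split("\n") if ln.strip()]
        if not lines:
            continue
        sha, author, date, subject = lines[0].split(FIELD_SEP, 3)
        checkins.append(Checkin(sha, author, date, subject, lines[1:]))
    return checkins
```

Shortening the file list for merges, as suggested above, could be layered on top by truncating `files` for commits with more than one parent; that is left out here.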

Skills: perl, git
Comment 2 Christian Lohmaier 2011-08-22 03:44:34 UTC
cc'ing as "mentor"/initial creator of this EasyHack proposal
Comment 3 Florian Reisinger 2012-05-18 09:01:28 UTC
Deleted "Easyhack" from summary
Comment 4 DavidO 2013-02-20 20:43:06 UTC
Just in case someone wants to pick it up:

That core-link is not part of the mandatory metadata, but an optional information string that the tinderbox client sends.

Using that information would require tracking information for each tinderbox slave, and also generating the commits-since-last-build list either on demand or for each build slave separately.

Tinderbox code is available:

once in the tinderbox contrib script to make it a mandatory/standard flag (as opposed to using TinderboxPrint - as that can appear everywhere in the log and thus will make storing it in the build-db much harder), and then obviously in the tinderbox code. http://cgit.freedesktop.org/libreoffice/website/tree/?h=tinderbox
Comment 5 Björn Michaelsen 2013-10-04 18:48:11 UTC
adding LibreOffice developer list as CC to unresolved EasyHacks for better visibility.

see e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details
Comment 6 Christian Lohmaier 2013-10-14 11:21:28 UTC
While it is true that the revision that was built is not part of the required parameters, most tinderboxes do use the scripts that add it (and it could be made mandatory). But since onegit, this isn't necessary. For tinderbox purposes one can assume that the timestamp reflects the tip of the tree when the build started, and not some revision in the past. So the tinderbox server could take an educated guess by time.

So the tinderbox server could git pull every 15 minutes (or whatever the minimum display interval is), and store the time and the corresponding git hash of each branch.
When a tinderbox doesn't supply the built hash as additional info, the server will assign the rev that was stored with the most recent timestamp older than the build-start date.
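
A minimal sketch of that lookup, in Python rather than the Perl used by tinderbox, and only under the assumptions stated here: the server keeps a list of (pull time, tip hash) pairs sorted by time, and times are plain epoch seconds. The function name `rev_for_build` is hypothetical.

```python
import bisect
from typing import List, Optional, Tuple


def rev_for_build(snapshots: List[Tuple[int, str]],
                  build_start: int) -> Optional[str]:
    """Return the tip hash recorded by the latest pull at or before build_start.

    snapshots must be sorted by pull time (epoch seconds). Returns None when
    the build started before the first recorded pull.
    """
    times = [t for t, _ in snapshots]
    # Index of the first snapshot strictly newer than the build start.
    i = bisect.bisect_right(times, build_start)
    return snapshots[i - 1][1] if i else None
```

This only guesses the revision from the reported start time, so a hash supplied by the tinderbox itself should take precedence whenever it is available.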

And yes, obviously tinderbox needs to store info per build entry and also per tinderbox slave (and it does so already: an example of per-build data is the core revision, and an example of per-builder info is the average (mean) build time).

But with the automatic mails to committers since last successful build on failure of the tinderboxes, this is of lower prio than when it was initially filed...
Comment 7 Norbert Thiebaud 2013-10-15 14:04:45 UTC
(In reply to comment #6)
> 
> So tinderbox server could git pull every 15 minutes/whatever the minimum
> display-interval is, and store the time and the corresponding git-hash of
> the branches.

No, that does not work.

The tinderbox script saves the sha of the tips and the timestamp of when they were _fetched_,
but you cannot rely on the timestamps of the commits themselves, as they are routinely not in chronological order.

The commit timestamp dates from when you created the commit, but commits appear on HEAD when they are pushed. There can be a significant amount of time between the two events, with drastic differences for feature branches that get integrated.
Just take a look at bcc239b405478040fda46d1bf1d4f3e38506d1a3 2013-07-29
and the next commit is 41d2036bee3279928903cdada115d3e3cd022a06 2012-12-18

The tb script that spams people does not rely on dates, but on a git log analysis between the last good commit sha and the current broken one.
This of course is only relevant for 'progressive' tinderboxes.
If/when we move to tb3, the spamming will be the job of the server, since the tb client will not have a reliable 'last successful build' point.
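
For illustration only, the sha-based analysis could look roughly like the Python sketch below. It is not the actual tb script. The function names are hypothetical, and notifying authors (`%ae`) rather than committers is an assumption.

```python
from typing import List


def range_log_command(last_good: str, first_bad: str) -> List[str]:
    """Build a git command listing author emails of commits that are
    reachable from first_bad but not from last_good."""
    return ["git", "log", "--pretty=format:%ae", f"{last_good}..{first_bad}"]


def people_to_notify(log_output: str) -> List[str]:
    """Deduplicate the emails printed by the command above, keeping order."""
    seen = {}
    for line in log_output.splitlines():
        email = line.strip()
        if email:
            seen.setdefault(email, None)
    return list(seen)
```

The command would be run in the build's checkout, for example with `subprocess.run(cmd, capture_output=True, text=True)`, and its standard output passed to `people_to_notify`. Because the range is defined by commit ancestry, out-of-order dates do not affect the result.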
Comment 8 Lionel Elie Mamane 2013-10-15 19:49:45 UTC
(In reply to comment #7)
 
> The tinderbox script saves the sha of the tips and the timestamp of when they
> were _fetched_,
> but you cannot rely on the timestamps of the commits themselves, as they are
> routinely not in chronological order.

> The commit timestamp dates from when you created the commit, but commits
> appear on HEAD when they are pushed. There can be a significant amount of
> time between the two events, with drastic differences for feature branches
> that get integrated.
> Just take a look at
> bcc239b405478040fda46d1bf1d4f3e38506d1a3 2013-07-29
> and the next commit is
> 41d2036bee3279928903cdada115d3e3cd022a06 2012-12-18

That's because you are looking at AuthorDate, not CommitDate. CommitDate is usually in order. Theoretically it could be out of order, but only if the clock on the machine doing the rebase / merge / ... is wrong.

$ git log --pretty=fuller 41d2036bee3279928903cdada115d3e3cd022a06
commit 41d2036bee3279928903cdada115d3e3cd022a06
Author:     Herbert Dürr <hdu@apache.org>
AuthorDate: Tue Dec 18 15:25:42 2012 +0000
Commit:     Caolán McNamara <caolanm@redhat.com>
CommitDate: Mon Jul 29 11:28:04 2013 +0100

    Resolves: #i121406# support the OSX>=10.7 fullscreen mode based on OSX Spaces

commit bcc239b405478040fda46d1bf1d4f3e38506d1a3
Author:     Caolán McNamara <caolanm@redhat.com>
AuthorDate: Mon Jul 29 11:17:11 2013 +0100
Commit:     Gerrit Code Review <gerrit@vm2.documentfoundation.org>
CommitDate: Mon Jul 29 10:17:42 2013 +0000

    Updated core
    Project: help  60eaec58845c8f697c2d7ab5bb671273b0ff4155
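
The difference between the two dates can be checked mechanically. The Python sketch below parses the headers of the `--pretty=fuller` output quoted above (commit messages omitted) and tests whether each date field runs newest-first, as `git log` lists commits. The helper names are hypothetical, and the `strptime` format assumes git's default date style and an English/C locale.

```python
import re
from datetime import datetime

FULLER_SAMPLE = """\
commit 41d2036bee3279928903cdada115d3e3cd022a06
Author:     Herbert Dürr <hdu@apache.org>
AuthorDate: Tue Dec 18 15:25:42 2012 +0000
Commit:     Caolán McNamara <caolanm@redhat.com>
CommitDate: Mon Jul 29 11:28:04 2013 +0100

commit bcc239b405478040fda46d1bf1d4f3e38506d1a3
Author:     Caolán McNamara <caolanm@redhat.com>
AuthorDate: Mon Jul 29 11:17:11 2013 +0100
Commit:     Gerrit Code Review <gerrit@vm2.documentfoundation.org>
CommitDate: Mon Jul 29 10:17:42 2013 +0000
"""

GIT_DATE = "%a %b %d %H:%M:%S %Y %z"  # git's default date layout


def parse_fuller(text):
    """Collect sha, AuthorDate and CommitDate for each commit in the text."""
    commits = []
    for line in text.splitlines():
        if line.startswith("commit "):
            commits.append({"sha": line.split()[1]})
            continue
        m = re.match(r"(AuthorDate|CommitDate):\s+(.*)", line)
        if m and commits:
            commits[-1][m.group(1)] = datetime.strptime(m.group(2), GIT_DATE)
    return commits


def is_newest_first(commits, key):
    """True if the given date field never increases down the list."""
    dates = [c[key] for c in commits]
    return all(a >= b for a, b in zip(dates, dates[1:]))
```

For the two commits above, `is_newest_first(parse_fuller(FULLER_SAMPLE), "CommitDate")` holds while the same check on `"AuthorDate"` does not.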
Comment 9 Christian Lohmaier 2013-10-26 14:04:14 UTC
Oh, you misunderstood.

The tinderbox server shouldn't use the commit's time as reference, but should assume that the tinderbox started the build right after pulling.


So the tinderbox server knows which commits arrived between its periodic checks for updates, and can use the start time reported by the build bot to map the build into one of these intervals.

Of course this will not be accurate, as tinderboxes can lie about the start time, and aren't required to start building immediately after updating the repo. And of course there will always be multiple commits in the tinderbox-check-for-update range, hence it is always an approximate list of changes.

My proposal is a simple one, that doesn't rely on tinderbox slaves reporting the hash they built, and doesn't look at commit-timestamps at all. As written: it is a fallback-method.

So assume the timeline:
18:00 (tinderbox server pulls and change-ID foo is at top)
18:08 build with status success was started, but didn't provide change-ID
18:15 (change-ID bar)
[....]
23:00 (change-ID oof)
23:09 build started and result was failure, reported without change-ID
23:15 (change-ID rab)


With the fallback-method, tinderbox will report all changes between "foo" and "rab" as possible candidates that could have broken the build. No attempt is made to detect the exact revision that the bot did build.

It is an educated guess, not exact.
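
The timeline above can be replayed with a short Python sketch. The function names are hypothetical, and times are reduced to minutes since midnight purely for the example; a real server would presumably use full timestamps.

```python
import bisect
from typing import List, Tuple


def minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight (example-only helper)."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def candidate_range(snapshots: List[Tuple[int, str]],
                    good_start: int, bad_start: int) -> Tuple[str, str]:
    """Return (older, newer) change-IDs bracketing the possible culprits.

    snapshots: (pull time, change-ID at top), sorted by time.
    The older end is the last pull at or before the successful build started;
    the newer end is the first pull at or after the failing build started.
    """
    if not snapshots:
        raise ValueError("no pulls recorded")
    times = [t for t, _ in snapshots]
    lo = max(bisect.bisect_right(times, good_start) - 1, 0)
    hi = min(bisect.bisect_left(times, bad_start), len(snapshots) - 1)
    return snapshots[lo][1], snapshots[hi][1]


pulls = [(minutes("18:00"), "foo"), (minutes("18:15"), "bar"),
         (minutes("23:00"), "oof"), (minutes("23:15"), "rab")]
print(candidate_range(pulls, minutes("18:08"), minutes("23:09")))
# prints ('foo', 'rab')
```

The range is deliberately widened on both sides, which is why it is a list of candidates and not an exact answer.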

The reason I suggest this method is that clicking on the timeline in the overview pages used to list all commits since that date. This was trivial to do with bonsai in the early days, and also no problem with svn later on. It became impossible with the multi-repo setup, and is in reach again with the onegit/submodules-based repo.

As you write yourself: it is impossible to tell when the commit reached the main repo by looking at the commit's date, as that is the local commit time, not the time it landed in the upstream repo. But when tinderbox checks the upstream repo regularly, it is no problem to list that info.

When you rely on tinderboxes reporting the built revision, you would still have to embed this into a timeline based on the start time to make the timeline work.

But all of the above are just suggestions.
Comment 10 Björn Michaelsen 2015-01-15 16:34:21 UTC
This is implemented now, IIRC by Norbert. Kudos to him!
Comment 11 Robinson Tryon (qubit) 2015-12-16 00:39:45 UTC
Migrating Whiteboard tags to Keywords: (EasyHack DifficultyInteresting SkillScript)
[NinjaEdit]