[petsc-dev] Commit collision/corruption?

Sean Farley sean at mcs.anl.gov
Thu Mar 7 23:06:47 CST 2013


Jed Brown writes:

> On Wed, Mar 6, 2013 at 10:54 PM, Sean Farley <sean at mcs.anl.gov> wrote:
>
>> Responding to this paragraph more in depth now; your information above
>> is at best misinformed and at worst complete bs. Both mercurial and git
>> store the sha1 representation with a binary-safe "diff" algorithm. Git's
>> seems to be like xdiff and mercurial's is mdiff.
>>
>
> In Git, the diff algorithm is irrelevant because it is not part of the
> SHA1. Git's SHA1 comes from a complete snapshot of the state, not a diff.
> Packing is an optimization that does not affect the user.

Whoops, conceptual typo on my part. Both Mercurial and Git create a SHA1
from the current snapshot + metadata; never, ever a diff. They also both
use almost the same algorithm for merges / diffs. I apparently tried to
combine those sentences too late last night.
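
To make the "snapshot, never a diff" point concrete on the Git side, here
is a minimal sketch of how a Git object ID is derived: SHA1 over a short
header plus the full object body. The function name git_object_id is mine,
for illustration only; it is not part of any Git API.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the hex SHA1 Git assigns to an object of the given kind.

    Git hashes "<kind> <length>\\0" followed by the complete body,
    so no diff or delta is involved in the computation.
    """
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    # Should match `echo hello | git hash-object --stdin`.
    print(git_object_id("blob", b"hello\n"))
```

How the object is later packed or delta-compressed on disk has no effect
on this value.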

>> Let's look again at your original request, 'hg export de73c9a | grep
>> fsolvebaij' shows no output. Well, that's because you're diffing against
>> the wrong parent. Running 'hg export --switch-parent de73c9a | grep
>> fsolvebaij' gives:
>>
>> diff --git a/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h b/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h
>> --- a/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h
>> +++ b/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h
>>
>
> So naming these files is optional?
>
> changeset:   26388:de73c9a7d341
> tag:         tip
> parent:      26376:84182841dc78
> parent:      26362:2c1d69e93258
> user:        Barry Smith <bsmith at mcs.anl.gov>
> date:        Tue Mar 05 16:55:25 2013 -0600
> summary:     merge, terrible manual process with many conflicts with Jed's PETSC_INTERNAL
> $ hg export -r de73c9a7d341 > first-parent.patch
>
> $ hg strip de73c9a7d341
> 99 files updated, 0 files merged, 40 files removed, 0 files unresolved
> saved backup bundle to
> /tmp/jed/petsc-hgbad/.hg/strip-backup/de73c9a7d341-backup.hg
> $ hg up 84182841dc78          # first parent
> 89 files updated, 0 files merged, 40 files removed, 0 files unresolved
> $ hg import first-parent.patch
> applying first-parent.patch
> $ hg log -r tip
> changeset:   26387:4b474795bc4d
> tag:         tip
> parent:      26376:84182841dc78
> parent:      26362:2c1d69e93258
> user:        Barry Smith <bsmith at mcs.anl.gov>
> date:        Tue Mar 05 16:55:25 2013 -0600
> summary:     merge, terrible manual process with many conflicts with Jed's PETSC_INTERNAL
>
> $ hg log -vr 4b474795bc4d | grep fsolvebaij       # de73c9a7d341 included the file name
> $ hg export -r 4b474795bc4d | grep fsolvebaij    # no output, same as de73c9a7d341
> $ hg export --switch-parent -r 4b474795bc4d | grep fsolvebaij # also same output
> diff --git a/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h b/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h
> --- a/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h
> +++ b/src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h
>
> unbundling the original, I have
>
> $ hg diff -r 4b474795bc4d:de73c9a7d341
> $

Are you really trying to compare representations of a merge using a
format that was never designed to handle a three-way merge?

Let's look at something git did after a 'git stash pop':

diff --cc src/libmap/mapclientrpc.c
index b0e3a2f,bb06b5d..0000000
--- a/src/libmap/mapclientrpc.c
+++ b/src/libmap/mapclientrpc.c

You would think that this would delete mapclientrpc.c but it didn't!
<troll>If 'git stash' stored things in a canonical format instead of as
diffs, this would be impossible!</troll>

> So I have created a new commit that is semantically identical (parents,
> metadata, and merged content), but that does not name those extraneous
> files (as evidenced by 'hg log -v')? Is this ambiguity/non-reproducibility
> intentional?
>
> At least 'hg import --exact first-parent.patch' fails saying that it cannot
> reproduce the same result. I don't understand why someone would design a
> system that has this sort of non-uniqueness due to superfluous information
> that is not represented in human-readable form (unlike 'hg bundle').

I don't understand why you aren't looking this information up for
yourself. You completely misunderstand how Mercurial is computing the
SHA1 (hint: both Git and Mercurial are based on the same scheme [1]). If
you commit a changeset with _exactly_ the same contents (including file
hashes, user, date, extra fields...), you will get the same SHA1. Git
and Mercurial both store deltas, and both compute SHA1s of commits, but
delta storage and the SHA1 computation are unrelated to each other.
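
A rough sketch of the Mercurial side, for anyone who wants to check this
themselves: a node ID is the SHA1 of the two parent IDs (sorted) followed
by the revision text. The changeset_text layout below is a simplified
assumption about the changelog entry (manifest hash, user, date, file
list, blank line, description), and the manifest hash, user and date
values are hypothetical placeholders, not the real ones from de73c9a7d341.

```python
import hashlib

NULL_ID = b"\0" * 20  # 20-byte null parent


def hg_node_id(text: bytes, p1: bytes = NULL_ID, p2: bytes = NULL_ID) -> str:
    """SHA1 over the sorted parent IDs followed by the revision text."""
    lo, hi = sorted((p1, p2))
    return hashlib.sha1(lo + hi + text).hexdigest()


def changeset_text(manifest_hex, user, date, files, description) -> bytes:
    """Simplified (assumed) changelog entry layout, for illustration only."""
    lines = [manifest_hex, user, date, *sorted(files), "", description]
    return "\n".join(lines).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical placeholder values.
    meta = dict(manifest_hex="ab" * 20, user="someone", date="0 0",
                description="merge")
    p1, p2 = b"\x01" * 20, b"\x02" * 20
    same_a = hg_node_id(changeset_text(files=[], **meta), p1, p2)
    same_b = hg_node_id(changeset_text(files=[], **meta), p2, p1)
    extra = hg_node_id(
        changeset_text(
            files=["src/mat/impls/baij/seq/ftn-kernels/fsolvebaij.h"], **meta
        ),
        p1,
        p2,
    )
    print(same_a == same_b)  # identical inputs, identical hash
    print(same_a == extra)   # a different file list gives a different hash
```

Under these assumptions, identical inputs always give the same ID, and
anything that alters the entry text, the recorded file list included,
gives a different one.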

>> Editorializing, I believe that you are wasting everyone's time with your
>> constant (and mostly vacuous) complaining about mercurial. No one has a
>> problem with this merge except for you. And, judging from your earlier
>> emails, it was a problem for you after you ran it through gitifyhg. So,
>> again, I say that this is a bug with your extension.
>>
>
> This commit has nothing to do with gitifyhg, but it is true that I would
> not have noticed the non-unique representation if not for gitifyhg. In the
> past, gitifyhg has always been able to reproduce Hg sha1s, even for very
> large repositories. It can't here because the SHA1 is literally not
> reproducible from the content (actual changes, metadata). If this is
> known/intentional, then perhaps the concept of having any external system
> talk two-way with Mercurial is wrong-headed. That would be sad.

The first changeset, the merge made by Barry, is perfectly fine. Your
gitifyhg machinery seems to be what borked the second commit, because
it uses Mercurial's private internals. Look 20 or so lines below line
763 of gitifyhg:

https://github.com/buchuki/gitifyhg/blob/master/gitifyhg.py#L763

I'm guessing gitifyhg blindly passed a list of files (which happened to
be in the diff of the first parent but were empty) and sent them to
memctx. Who knows. Nor do I really care enough to dig deeper into the
gitifyhg code. The point is, you are already in the land of calling
internals of Mercurial, and therefore, you take the risk of bypassing
checks that are in place for calls coming from the command-line. This is
the price of admission for using internal calls (and not, say, the
command server).

I will note here the irony of Jed blaming the library code (Mercurial)
and not the user code (gitifyhg), especially considering how long he's
been on petsc-maint.

[1] - http://mercurial.selenic.com/wiki/Presentations?action=AttachFile&do=get&target=ols-mercurial-paper.pdf


