[petsc-dev] Commit collision/corruption?

Jed Brown jedbrown at mcs.anl.gov
Fri Mar 8 09:39:29 CST 2013


On Thu, Mar 7, 2013 at 11:06 PM, Sean Farley <sean at mcs.anl.gov> wrote:

> Are you really trying to compare a representation of merge using a
> format that was never designed to handle a three-way merge?
>

To start with, hg export cannot represent the difference.

$ diff <(hg export -r 84df07d03c6e) <(hg export -r de73c9a7d341)
4c4
< # Node ID 84df07d03c6e55bd0f27bd5ee8c1738562bd529d
---
> # Node ID de73c9a7d341d846b5e16a8d61a48242f0521c02
$ diff <(hg export --switch-parent -r 84df07d03c6e) \
       <(hg export --switch-parent -r de73c9a7d341)
4c4
< # Node ID 84df07d03c6e55bd0f27bd5ee8c1738562bd529d
---
> # Node ID de73c9a7d341d846b5e16a8d61a48242f0521c02

Digging deeper:

$ diff <(hg manifest --debug -r 84df07d03c6e) \
       <(hg manifest --debug -r de73c9a7d341)
902c902
< 402038908c312f83748dae2fffdb4d699b35663f 644 src/dm/impls/composite/pack.c
---
> 154f020ddb26968c2e1002aaf63972aa82f6427b 644 src/dm/impls/composite/pack.c
[...]

$ hg debug-diff-tree 84df07d03c6e de73c9a7d341

:100664 100664 402038908c31 154f020ddb26 M src/dm/impls/composite/pack.c src/dm/impls/composite/pack.c
[...]

Okay, this is progress. But how do we determine what is different?

$ diff <(hg cat -r 84df07d03c6e src/dm/impls/composite/pack.c) \
       <(hg cat -r de73c9a7d341 src/dm/impls/composite/pack.c)
$

(Same for all the others.) Unfortunately, the internals are basically
undocumented and I haven't found the exact algorithm used to compute these
file hashes. Do you know how to tell what makes these hashes different, and
why none of the user-facing commands seem capable of showing the difference?



>
> Let's look at something git did after a 'git stash pop':
>
> diff --cc src/libmap/mapclientrpc.c
> index b0e3a2f,bb06b5d..0000000
> --- a/src/libmap/mapclientrpc.c
> +++ b/src/libmap/mapclientrpc.c
>
> You would think that this would delete mapclientrpc.c but it didn't!
> <troll>If 'git stash' stored things in a canonical format instead of as
> diffs, this would be impossible!</troll>
>

'git stash' creates a normal commit object, reachable from refs/stash
(.git/refs/stash). You'll have to provide context if you want to claim
something is inconsistent.


> > So I have created a new commit that is semantically identical (parents,
> > metadata, and merged content), but that does not name those extraneous
> > files (as evidenced by 'hg log -v')? Is this ambiguity/non-reproducibility
> > intentional?
> >
> > At least 'hg import --exact first-parent.patch' fails saying that it cannot
> > reproduce the same result. I don't understand why someone would design a
> > system that has this sort of non-uniqueness due to superfluous information
> > that is not represented in human-readable form (unlike 'hg bundle').
>
> I don't understand why you aren't looking this information up for
> yourself. You completely misunderstand how Mercurial is computing the
> SHA1 (hint: both Git and Mercurial are based on the same scheme [1]).
>

That is a high-level paper that does not provide details. Compare it to this
chapter, for example, which explains how git objects are constructed and
includes stand-alone code to reproduce them.

http://git-scm.com/book/en/Git-Internals-Git-Objects

The git man pages also explain this in great detail.
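
To illustrate how little there is to reproduce, here is a minimal sketch (in
Python rather than the Ruby used in that chapter; the function name
git_blob_id is mine, not anything from git) of how a blob ID is derived. It is
the SHA-1 of a short header, "blob <size>\0", followed by the file contents.
Only the blob case is shown; trees and commits need their own serialization.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID git assigns to a blob holding `content`."""
    # Header is the object type, a space, the decimal byte length, and a NUL.
    header = b"blob %d\0" % len(content)
    # The ID is the SHA-1 of header + raw contents (before any compression).
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match 'git hash-object <file>' for the same bytes, so two
files with identical contents always get the same ID.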


> If
> you commit a changeset with _exactly_ the same contents (including file
> hashes, user, date, extra fields...), you will get the same SHA1.
>

Can you show why the file hashes above are different, despite the files
having the same contents? See also this thread showing that the
representation has historically been under-specified and handled
inconsistently, thus you can't always reproduce old SHA1s using a newer
version of Hg.

http://selenic.com/pipermail/mercurial/2011-June/038828.html


>  Git
> and Mercurial both store deltas,
>

Git stores each new object whole. Deltas appear only once objects are packed,
e.g. when 'git gc' runs (possibly automatically).
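
As a sketch of what that means in practice (assuming the standard loose-object
layout; the helper name loose_object is hypothetical): a freshly written
object is the zlib-compressed header plus the full contents, stored under a
path derived from its SHA-1, with no reference to any other version of the
file.

```python
import hashlib
import zlib


def loose_object(content: bytes):
    """Return (path, stored_bytes) for a blob written as a loose object."""
    # Full object: header plus complete file contents, not a delta.
    store = b"blob %d\0" % len(content) + content
    oid = hashlib.sha1(store).hexdigest()
    # First two hex digits name the directory, the remaining 38 the file.
    path = ".git/objects/%s/%s" % (oid[:2], oid[2:])
    # The compressed bytes may differ from git's (compression level), but
    # they decompress to the same header + contents.
    return path, zlib.compress(store)


path, data = loose_object(b"test content\n")
print(path)
print(zlib.decompress(data))
```

Decompressing the stored bytes gives back the whole file, which is why no
other object is needed to read a loose object.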


>  and they both compute SHA1s of commits,
> but the two computations are unrelated.
>

What is the exact algorithm used to compute Hg SHA1s?


> > This commit has nothing to do with gitifyhg, but it is true that I would
> > not have noticed the non-unique representation if not for gitifyhg. In the
> > past, gitifyhg has always been able to reproduce Hg SHA1s, even for very
> > large repositories. It can't here because the SHA1 is literally not
> > reproducible from the content (actual changes, metadata). If this is
> > known/intentional, then perhaps the concept of having any external system
> > talk two-way with Mercurial is wrong-headed. That would be sad.
>
> The first changeset, which is a merge, made by Barry is perfectly
> fine.
>

Yet 'hg import --exact' cannot import it.


> Your gitifyhg machinery seems to be the culprit of borking the
> second commit due to its use of Mercurial's private internals. Look around
> 20 or so lines below line 763 of gitifyhg:
>
> https://github.com/buchuki/gitifyhg/blob/master/gitifyhg.py#L763
>
> I'm guessing gitifyhg blindly passed a list of files (which happened to
> be in the diff of the first parent but were empty) and sent them to
> memctx. Who knows. Nor do I really care enough to dig deeper into the
> gitifyhg code. The point is, you are already in the land of calling
> internals of Mercurial, and therefore, you take the risk of bypassing
> checks that are in place for calls coming from the command-line. This is
> the price of admission for using internal calls (and not, say, the
> command server).
>

'hg import' also produces a different representation of that commit. Why is
it so difficult to produce a human-readable representation of the
difference?

(If you want to say gitifyhg did something wrong, you should provide
evidence that something is actually wrong, not just that you don't like the
code. Also, if there is a better way to create hg commits without using a
working tree, please refer me to the documentation on how to do that.)