Instead of using type punning to convert the OID to a Python hash, use
_Py_HashBytes() to hash the OID again. This means we no longer make any
assumptions on the internal representation of OID values or Python
hashes (before this commit, we at least relied on the fact that OID
hases are longer than Python hashes). Moreover, the random seed stored
in PYTHONHASHSEED is now honored.
This also fixes a compiler warning seen with -Wstrict-aliasing.
Signed-off-by: Lukas Fleischer <lfleischer@lfos.de>
When ffi.new() is used to build a new pointer object, the returned
pointer object has ownership on the allocated memory. When it is
garbage-collected, then the memory is freed. Thus, we need to make sure
the original object survives its use, otherwise the casted pointer will
point to garbage.
This fixes one test which was failing with the latest CFFI version, see
issue #694. Thus, this commit also reverts 803b1cb (cffi 1.10 not yet
supported, 2017-03-22) where the latest CFFI version was marked as
unsupported.
Signed-off-by: Lukas Fleischer <lfleischer@lfos.de>
Removed a chunk of code from `checkout` that did nothing, but had a bug.
When checking out from a branch-less state (like the state when a repository is first initialized) the code failed.
The failure was due to trying to get some properties of the current branch, which were never used in the code.
Commit objects in git always have a tree_id associated, that points
to the corresponding Tree object.
When the tree object is missing, the repo is corrupted.
In those cases:
* official git cli fatals with status code 128 and message:
fatal: unable to read tree <hash>
* libgit2 returns error GIT_ENOTFOUND when calling git_commit_tree()
* pygit2 when accessing the tree by id with repo[commit.tree_id]
raises a KeyError: <hash>
But on the other hand, on the commit object, rather than throwing
and exception, pygit2 is swallowing the error returned by libgit2
and setting the <Commit object>.tree property to None.
None is arguable the wrong choice to encode an error condition,
specially in python that is used heavily.
In particular this caused in our system to assume there was
an empty tree, and the sync service that tails git repo changes
decided to DELETE everything. The code was using None to
represent empty tree, usefull for example when we need to
compare a path between two commits (the path might be non-existant
at one of the commits you are comparing).
I think that in this case the right decision would be to raise
since is an exceptional case, caused by a corrupted repo,
is more consistent with other tools, and ensures user code
does not take the wrong decissions.
For curiosity the corrupted repository can happen more commonly
than expected. We run our repositories on a shared NFS filer,
and one of our servers didn't have the lookupcache=positive
flag. This makes NFS cache the metadata (files on a directory
for example) and use that for negative lookups (to deny existance
of files). In this case, the commit object was on a directory
not cached, so the commit was seen immediately, but the tree
object was in a folder that was cached, the cache didn't
contained the tree object, and thus for some seconds the tree
was not existing and the repo was corrupted. Our sync service
saw tree being None and decided to delete everything, causing
a lot of issues down the way.