datalad_next.datasets
Representations of DataLad datasets built on git/git-annex repositories
Two sets of repository abstractions are available: LeanGitRepo and LeanAnnexRepo vs. LegacyGitRepo and LegacyAnnexRepo.
LeanGitRepo and LeanAnnexRepo provide a more modern, small-ish interface and
represent the present standard API for low-level repository operations. They
are geared towards interacting with Git and git-annex more directly, and are
more suitable for generator-like implementations, promoting low response
latencies and a leaner processing footprint.
The Legacy*Repo classes provide a now-legacy, low-level API to repository
operations. This functionality stems from the earliest days of DataLad and
implements paradigms and behaviors that are no longer common to the rest of the
DataLad API. LegacyGitRepo and LegacyAnnexRepo should no longer be used in new
developments, and are not documented here.
- class datalad_next.datasets.LeanAnnexRepo(*args, **kwargs)
Bases: AnnexRepo
git-annex repository representation with a minimized API
This is a companion of LeanGitRepo. In the same spirit, it restricts its API to a limited set of methods that extend LeanGitRepo.
- class datalad_next.datasets.LeanGitRepo(*args, **kwargs)
Bases: RepoInterface
Representation of a Git repository
- add_fake_dates_to_env(env=None)
Add fake dates to env.
- Parameters:
env (dict, optional) -- Environment variables.
- Returns:
A dict (copied from env), with date-related environment
variables for git and git-annex set.
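A minimal sketch of what such an environment could look like, using a hypothetical helper fake_dates_env (not DataLad's implementation). It copies the input mapping and pins Git's author and committer dates to one fixed Unix timestamp; using GIT_ANNEX_VECTOR_CLOCK as the git-annex variable is an assumption of this sketch.

```python
import os
from typing import Mapping, Optional


def fake_dates_env(env: Optional[Mapping[str, str]] = None,
                   seconds: int = 1112911993) -> dict:
    """Hypothetical helper: return a copy of `env` with fixed Git timestamps.

    `seconds` is a Unix timestamp; the default is an arbitrary example value.
    """
    # Copy so that the caller's mapping (or os.environ) is never modified
    new_env = dict(os.environ if env is None else env)
    # Git accepts "@<unix-seconds> <tz-offset>" as an explicit date
    date = f"@{seconds} +0000"
    new_env["GIT_AUTHOR_DATE"] = date
    new_env["GIT_COMMITTER_DATE"] = date
    # Assumption: git-annex is steered via its vector clock variable
    new_env["GIT_ANNEX_VECTOR_CLOCK"] = str(seconds)
    return new_env


base = {"HOME": "/tmp/example"}
env = fake_dates_env(base, seconds=1112911993)
print(env["GIT_AUTHOR_DATE"])  # @1112911993 +0000
```

Because the input is copied, the caller's environment stays untouched and the result can be passed as env to a single git call.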
- call_git(args, files=None, expect_stderr=False, expect_fail=False, env=None, pathspec_from_file: bool | None = False, read_only=False)
Call git and return standard output.
- Parameters:
args (list of str) -- Arguments to pass to git.
files (list of str, optional) -- File arguments to pass to git. The advantage of passing these here rather than as part of args is that the call will be split into multiple calls to avoid exceeding the maximum command line length.
expect_stderr (bool, optional) -- Standard error is expected and should not be elevated above the DEBUG level.
expect_fail (bool, optional) -- A non-zero exit is expected and should not be elevated above the DEBUG level.
pathspec_from_file (bool, optional) -- Set to True for git commands that support the --pathspec-from-file and --pathspec-file-nul options. Pathspecs are then passed through a temporary file instead of on the command line.
read_only (bool, optional) -- By setting this to True, the caller indicates that the command does not write to the repository, which lets this function skip some operations that are necessary only for commands that modify the repository. Beware that even commands that are conceptually read-only, such as git-status and git-diff, may refresh and write the index.
- Returns:
standard output
- Return type:
str
- Raises:
CommandError -- if the call exits with a non-zero status.
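The two ways of handing many paths to git that call_git describes (splitting files across several invocations, or passing them via a pathspec file) can be sketched as below. Both helpers, chunk_git_calls and pathspec_file_call, are hypothetical and only build argument lists without running git; the 4096-character budget and the "--" separator before file arguments are assumptions.

```python
import os
import tempfile
from typing import List, Sequence, Tuple


def chunk_git_calls(args: Sequence[str], files: Sequence[str],
                    max_chars: int = 4096) -> List[List[str]]:
    """Hypothetical: spread `files` over several git calls of bounded length."""
    if not files:
        return [["git", *args]]
    # Assumption: file arguments follow a "--" separator
    prefix = ["git", *args, "--"]
    # Characters used by the prefix when the argv is joined with spaces
    prefix_len = sum(len(a) + 1 for a in prefix)
    calls: List[List[str]] = []
    current: List[str] = []
    current_len = prefix_len
    for path in files:
        needed = len(path) + 1
        # Flush the current batch if this path would exceed the budget;
        # a single over-long path still gets a call of its own
        if current and current_len + needed > max_chars:
            calls.append(prefix + current)
            current, current_len = [], prefix_len
        current.append(path)
        current_len += needed
    calls.append(prefix + current)
    return calls


def pathspec_file_call(args: Sequence[str],
                       files: Sequence[str]) -> Tuple[List[str], str]:
    """Hypothetical: pass `files` through a NUL-separated pathspec file."""
    # delete=False keeps the file for git to read; the caller must remove it
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                     suffix=".pathspec", delete=False) as fh:
        # NUL separation lets paths contain spaces, quotes, and newlines
        fh.write("\0".join(files))
    argv = ["git", *args, f"--pathspec-from-file={fh.name}",
            "--pathspec-file-nul"]
    return argv, fh.name


argv, spec = pathspec_file_call(["add"], ["with space.txt", "new\nline.txt"])
try:
    print(argv)
finally:
    os.remove(spec)  # clean up the temporary pathspec file
```

Chunking keeps every invocation below the limit at the cost of several processes, while the pathspec file needs only one process but requires git command support for --pathspec-from-file.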
- call_git_items_(args, files=None, expect_stderr=False, expect_fail=False, env=None, pathspec_from_file: bool | None = False, read_only=False, sep=None, keep_ends=False)
Call git and yield output lines as they become available. Output is split at line endings, or at sep if sep is not None.
- Parameters:
sep (str, optional) -- Use sep as line separator. Does not create an empty last line if the input ends on sep.
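keep_ends (bool, optional) -- If True, keep the line ending (or separator) at the end of each yielded item.
All other parameters match those described for call_git.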
- Returns:
Generator that yields stdout items, i.e. lines with the line ending or
separator removed (unless keep_ends is True).
This method is meant for processing output that is intended for
line-by-line ('interactive') interpretation. It is not intended to return
stdout from a command like "git cat-file", because it strips off the line
endings (or separator) from the result lines unless 'keep_ends' is True.
If 'keep_ends' is False, you cannot tell which line ending was stripped
(if 'sep' is None) or whether a line ending (or separator) was stripped at
all, because the last line may not have one.
To reliably recreate the output, set 'keep_ends' to True and "".join() the
result, or use 'GitRepo.call_git()' instead.
- Raises:
CommandError -- if the call exits with a non-zero status.
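The splitting rules of call_git_items_ can be mimicked on an already captured string with the hypothetical generator split_items below. It illustrates the documented behavior rather than reproducing the actual implementation, and it assumes str.splitlines() semantics for "line ends".

```python
from typing import Iterator, Optional


def split_items(output: str, sep: Optional[str] = None,
                keep_ends: bool = False) -> Iterator[str]:
    """Hypothetical sketch of the item splitting described for call_git_items_."""
    if sep is None:
        # str.splitlines() recognizes \n, \r\n, \r and a few other boundaries,
        # and never yields an empty trailing item
        yield from output.splitlines(keepends=keep_ends)
        return
    parts = output.split(sep)
    # Input ending on `sep` must not produce an empty last item
    if parts and parts[-1] == "":
        parts.pop()
    for i, part in enumerate(parts):
        # Only items that were actually followed by `sep` get it re-attached
        is_terminated = i < len(parts) - 1 or output.endswith(sep)
        yield part + sep if keep_ends and is_terminated else part


print(list(split_items("a\nb\n")))
print(list(split_items("a\nb")))
print("".join(split_items("x\0y", sep="\0", keep_ends=True)) == "x\0y")
```

Without keep_ends, "a\nb\n" and "a\nb" yield the same items, which is exactly the ambiguity described above; with keep_ends=True, joining the items restores the input.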
- call_git_oneline(args, files=None, expect_stderr=False, pathspec_from_file: bool | None = False, read_only=False)
Call git for a single line of output.
All other parameters match those described for call_git.
- Raises:
CommandError -- if the call exits with a non-zero status.
AssertionError -- if there is more than one line of output.
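The contract of call_git_oneline can be sketched as a check on already captured output. single_line is a hypothetical helper; returning an empty string for empty output is an assumption, since the behavior for that case is not documented here.

```python
def single_line(stdout: str) -> str:
    """Hypothetical helper mirroring the documented call_git_oneline contract."""
    lines = stdout.splitlines()
    # More than one line violates the contract
    if len(lines) > 1:
        raise AssertionError(
            f"expected a single line of output, got {len(lines)}: {lines!r}")
    # Assumption: empty output is returned as an empty string
    return lines[0] if lines else ""


print(single_line("main\n"))  # main
```

Raising AssertionError explicitly, rather than using an assert statement, keeps the check active even when Python runs with -O.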
- call_git_success(args, files=None, expect_stderr=False, pathspec_from_file: bool | None = False, read_only=False)
Call git and return True if the call exits with code 0.
All parameters match those described for call_git.
- Return type:
bool
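A minimal sketch of the exit-code-to-bool mapping, using the standard subprocess module. call_success is a hypothetical helper; to stay runnable without a repository, the example runs the current Python interpreter instead of git.

```python
import subprocess
import sys
from typing import Sequence


def call_success(cmd: Sequence[str]) -> bool:
    """Hypothetical helper: run `cmd` and report whether it exited with code 0."""
    # Output is captured and discarded; only the exit status matters here
    result = subprocess.run(list(cmd), capture_output=True)
    return result.returncode == 0


print(call_success([sys.executable, "-c", "raise SystemExit(0)"]))  # True
print(call_success([sys.executable, "-c", "raise SystemExit(1)"]))  # False
```

Against a real repository (the path here is a placeholder), a call such as call_success(["git", "-C", "/path/to/repo", "rev-parse", "--verify", "--quiet", "HEAD"]) would report whether HEAD resolves to a commit.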
- property cfg
Get a ConfigManager instance for this repository
- Return type:
ConfigManager
- for_each_ref_(fields=('objectname', 'objecttype', 'refname'), pattern=None, points_at=None, sort=None, count=None, contains=None)
Wrapper for git for-each-ref
Please see manual page git-for-each-ref(1) for a complete overview of its functionality. Only a subset of it is supported by this wrapper.
- Parameters:
fields (iterable or str) -- Used to compose a NUL-delimited specification for for-each-ref's --format option. The default field list reflects the standard behavior of for-each-ref when the --format option is not given.
pattern (list or str, optional) -- If provided, report only refs that match at least one of the given patterns.
points_at (str, optional) -- Only list refs that point at the given object.
sort (list or str, optional) -- Field name(s) to sort by. If multiple fields are given, the last one becomes the primary key. Prefix any field name with '-' to sort in descending order.
count (int, optional) -- Stop iteration after the given number of matches.
contains (str, optional) -- Only list refs which contain the specified commit.
- Yields:
dict with items matching the given fields
- Raises:
ValueError -- if no fields are given
RuntimeError -- if git for-each-ref returns a record where the number of properties does not match the number of fields
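How the documented parameters could map onto a git for-each-ref command line, and how NUL-delimited records could be turned into dicts, is sketched below. for_each_ref_args and parse_for_each_ref are hypothetical helpers that mirror the documented behavior (including the ValueError and RuntimeError cases); they are not the wrapper's actual code, and parsing works on captured output rather than a live git process.

```python
from typing import Dict, Iterable, Iterator, List, Union

DEFAULT_FIELDS = ("objectname", "objecttype", "refname")


def _as_list(value: Union[str, Iterable[str]]) -> List[str]:
    # A single string counts as one item, not as a sequence of characters
    return [value] if isinstance(value, str) else list(value)


def for_each_ref_args(fields=DEFAULT_FIELDS, pattern=None, points_at=None,
                      sort=None, count=None, contains=None) -> List[str]:
    """Hypothetical: build git-for-each-ref arguments for the documented options."""
    fields = _as_list(fields)
    if not fields:
        raise ValueError("no fields given")
    # %00 makes git emit a NUL byte between the requested fields
    args = ["for-each-ref",
            "--format=" + "%00".join(f"%({f})" for f in fields)]
    if points_at:
        args.append(f"--points-at={points_at}")
    if contains:
        args.append(f"--contains={contains}")
    # Repeated --sort options: the last one becomes the primary key
    for key in _as_list(sort or []):
        args.append(f"--sort={key}")
    if count:
        args.append(f"--count={count:d}")
    # Patterns are positional arguments, placed after all options
    args.extend(_as_list(pattern or []))
    return args


def parse_for_each_ref(output: str,
                       fields=DEFAULT_FIELDS) -> Iterator[Dict[str, str]]:
    """Hypothetical: turn NUL-delimited records (one per line) into dicts."""
    fields = _as_list(fields)
    for line in output.splitlines():
        props = line.split("\0")
        if len(props) != len(fields):
            raise RuntimeError(
                f"expected {len(fields)} fields, got {len(props)}: {props!r}")
        yield dict(zip(fields, props))


sample = "abc\0commit\0refs/heads/main\ndef\0tag\0refs/tags/v1\n"
for record in parse_for_each_ref(sample):
    print(record["refname"], record["objecttype"])
```

Splitting on NUL rather than on whitespace keeps field values intact even when a requested field (for example a commit subject) contains spaces.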
- init(sanity_checks=True, init_options=None)
Initializes the Git repository.
- Parameters:
sanity_checks (bool, optional) -- Whether to perform sanity checks during initialization when the target path already exists, such as ensuring that a new repository is not created in a directory where Git already tracks files.
init_options (list, optional) -- Additional options to be appended to the git-init call.
- is_valid()
Returns whether the underlying repository appears to be still valid
This method can be used as an instance method or a class method.
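One general Python technique that lets a method work both on an instance and on its class is a small descriptor that binds whichever object it was accessed through. The sketch below uses a hypothetical hybridmethod descriptor and a stand-in class SketchRepo; whether DataLad uses this technique is not stated here, and treating "valid" as "a .git entry exists" is an assumption made only for the example.

```python
import tempfile
import types
from pathlib import Path


class hybridmethod:
    """Hypothetical descriptor: bind to the instance, or to the class if accessed there."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # When accessed on the class, obj is None, so bind the class instead
        target = obj if obj is not None else objtype
        return types.MethodType(self.func, target)


class SketchRepo:
    """Hypothetical stand-in for a repository class; not DataLad's code."""

    def __init__(self, path):
        self.path = Path(path)

    @hybridmethod
    def is_valid(self_or_cls, path=None):
        # On an instance, fall back to the instance's own path
        if path is None:
            path = self_or_cls.path
        # Assumption: "valid" here simply means a .git entry exists
        return (Path(path) / ".git").exists()


with tempfile.TemporaryDirectory() as tmp:
    print(SketchRepo.is_valid(tmp))    # False: no .git yet
    (Path(tmp) / ".git").mkdir()
    print(SketchRepo(tmp).is_valid())  # True
```

On the class, the path must be passed explicitly; on an instance, it defaults to the instance's own path.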