Extract provenance information (W3C PROV) from GitLab projects.

DLR-SC/gitlab2prov

gitlab2prov, github2prov: (🦊|🐈‍⬛) → 📄

gitlab2prov is a Python library and command line tool that extracts provenance information from GitLab projects. GitHub support is provided by the github2prov command line tool contained in this package.


The data model underlying gitlab2prov & github2prov has been designed according to the W3C PROV specification. The model documentation can be found here.

οΈπŸ—οΈ ️Installation

Please note that this tool requires Git to be installed on your machine.

Clone the project and install using pip:

pip install .

Or install the latest release from PyPI:

pip install gitlab2prov

To install gitlab2prov with all extra dependencies, request the [dev] extras:

pip install .[dev]            # clone repo, install with extras
pip install gitlab2prov[dev]  # PyPI, install with extras

That's it! You can now use gitlab2prov and github2prov from the command line.

gitlab2prov --version  # show version
github2prov --version  # show version

⚑ Getting started

gitlab2prov & github2prov require a personal access token to clone git repositories and to authenticate with the GitLab/GitHub API.

To obtain a token with the required scopes, follow the personal access token guides in the GitLab and GitHub documentation.
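Since the token is a secret, it is usually better kept out of shell history and config files under version control. The following is a minimal sketch, not part of gitlab2prov, that reads a token from an environment variable before it is passed on to the tool. The helper `read_token` and the variable name `GITLAB_TOKEN` are assumptions made for this illustration.

```python
import os


def read_token(var_name="GITLAB_TOKEN", environ=None):
    """Return the access token stored in an environment variable.

    `var_name` is a hypothetical name chosen for this sketch;
    gitlab2prov itself does not prescribe one.
    """
    env = os.environ if environ is None else environ
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"environment variable {var_name} is not set or empty")
    return token


# Demonstration with a stand-in environment instead of the real one.
demo_env = {"GITLAB_TOKEN": "know_thyself_token"}
print(read_token(environ=demo_env))
```

The returned string can then be supplied wherever the examples in this README use a literal token value.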

🚀 Usage

gitlab2prov and github2prov are used identically. The only difference is that github2prov supports only GitHub projects, whereas gitlab2prov supports only GitLab projects. The following examples use gitlab2prov.

gitlab2prov can be configured using the command line interface or by providing a configuration file in .yaml format.

Command Line Usage

The command line interface consists of commands that can be chained together like a Unix pipeline.

Usage: gitlab2prov [OPTIONS] COMMAND1 [ARGS]... [COMMAND2 [ARGS]...]...

  Extract provenance information from GitLab projects.

Options:
  --version        Show the version and exit.
  --verbose        Enable logging to 'gitlab2prov.log'.
  --config FILE    Read config from file.
  --validate FILE  Validate config file and exit.
  --help           Show this message and exit.

Commands:
  combine    Combine one or more provenance documents.
  extract    Extract provenance information for one or more gitlab projects.
  read       Read provenance information from file[s].
  stats      Print statistics for one or more provenance documents.
  transform  Apply a set of transformations to provenance documents.
  write      Write provenance information to file[s].
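To make the chaining idea concrete, here is a small Python sketch that splits a chained argument list into one (command, options) pair per step. It only illustrates how such a command line reads and is not how gitlab2prov parses its arguments. The helper `split_chain` is hypothetical, and the sketch assumes that no option value is spelled like a command name and that global options such as --verbose have already been removed.

```python
# Command names taken from the help output above.
COMMANDS = {"combine", "extract", "read", "stats", "transform", "write"}


def split_chain(args, commands=COMMANDS):
    """Group a flat argument list into (command, options) steps."""
    steps = []
    for arg in args:
        if arg in commands:
            # A command name starts a new step in the chain.
            steps.append((arg, []))
        elif steps:
            # Anything else belongs to the most recent command.
            steps[-1][1].append(arg)
        else:
            raise ValueError(f"expected a command, got {arg!r}")
    return steps


args = [
    "extract", "--url", "https://gitlab.com/socrates/apology",
    "--token", "know_thyself_token",
    "combine",
    "write", "--output", "philosopher_outputs", "--format", "json",
]
for command, options in split_chain(args):
    print(command, options)
```

Each step receives only its own options, which is why options such as --format can appear on several commands in the same invocation.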

Configuration Files

gitlab2prov supports configuration files in .yaml format that are functionally equivalent to command line invocations.

To invoke a run using a config file, use the --config option:

# run gitlab2prov using the config file 'config/example.yaml'
gitlab2prov --config config/example.yaml

You can validate your config file using the provided JSON Schema file that comes packaged with every installation:

# validate config file 'config/example.yaml' against the JSON Schema
gitlab2prov --validate config/example.yaml

Here is an example config file that extracts provenance information from five GitLab projects in three extract steps (one token per step), reads a serialized provenance document from a file, combines the resulting provenance documents, transforms the combined document and writes it to files in different formats. Finally, statistics about the generated output are printed to the console:

- extract:
    url:
      - "https://gitlab.com/aristotle/nicomachean-ethics"
      - "https://gitlab.com/aristotle/poetics"
    token: golden_mean_and_drama_token
- extract:
    url:
      - "https://gitlab.com/plato/the-republic"
      - "https://gitlab.com/plato/phaedrus"
    token: ideal_forms_and_speech_token
- extract:
    url: ["https://gitlab.com/socrates/apology"]
    token: know_thyself_token
- read:
    input: [aristotelian_logic.rdf]
- combine:
- transform:
    use_pseudonyms: true
    remove_duplicates: true
- write:
    output: philosopher_outputs
    format: [json, rdf, xml, dot]
- stats:
    fine: true
    explain: true
    format: table

The config file example is functionally equivalent to this command line invocation:

gitlab2prov                                                              \
  extract                                                                \
    --url https://gitlab.com/aristotle/nicomachean-ethics                \
    --url https://gitlab.com/aristotle/poetics                           \
    --token golden_mean_and_drama_token                                  \
  extract                                                                \
    --url https://gitlab.com/plato/the-republic                          \
    --url https://gitlab.com/plato/phaedrus                              \
    --token ideal_forms_and_speech_token                                 \
  extract                                                                \
    --url https://gitlab.com/socrates/apology --token know_thyself_token \
  read --input aristotelian_logic.rdf                                    \
  combine                                                                \
  transform --use_pseudonyms --remove_duplicates                         \
  write --output philosopher_outputs                                     \
    --format json --format rdf --format xml --format dot                 \
  stats --fine --explain --format table

🎨 Provenance Output Formats

gitlab2prov & github2prov support all output formats that the prov library provides, including the json, rdf, xml and dot formats used in the example above.

🀝 Contributing

Contributions and pull requests are welcome!
For major changes, please open an issue first to discuss what you would like to change.

✨ How to cite

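If you use GitLab2PROV in a scientific publication, we would appreciate citations to the following paper:

Andreas Schreiber, Claas de Boer, Lynn von Kurnatowski: "GitLab2PROV - Provenance of Software Projects hosted on GitLab", 13th International Workshop on Theory and Practice of Provenance (TaPP 2021), USENIX Association, July 2021.

BibTeX entry: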

@InProceedings{SchreiberBoerKurnatowski2021,
  author    = {Andreas Schreiber and Claas de~Boer and Lynn von~Kurnatowski},
  booktitle = {13th International Workshop on Theory and Practice of Provenance (TaPP 2021)},
  title     = {{GitLab2PROV}{\textemdash}Provenance of Software Projects hosted on GitLab},
  year      = {2021},
  month     = jul,
  publisher = {{USENIX} Association},
  url       = {https://www.usenix.org/conference/tapp2021/presentation/schreiber},
}

You can also cite specific releases published on Zenodo.

✏️ References

Influential Software for gitlab2prov:

  • Martin Stoffers: "Gitlab2Graph", v1.0.0, October 13, 2019, GitHub Link, DOI 10.5281/zenodo.3469385

  • Quentin Pradet: "How do you rate limit calls with aiohttp?", GitHub Gist, MIT LICENSE

📜 Dependencies

gitlab2prov depends on several open source packages that are made freely available under their respective licenses.

Package        License
GitPython      BSD 3-Clause
click          BSD 3-Clause
python-gitlab  LGPL v3
prov           MIT
jsonschema     MIT
ruamel.yaml    MIT
pydot          MIT

📝 License

This project is MIT licensed.
Copyright © 2019 German Aerospace Center (DLR) and individual contributors.
