[Feature Request]: Add zstd support in tarfile #81276

Open
evan0greenup mannequin opened this issue May 30, 2019 · 19 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@evan0greenup
Mannequin

evan0greenup mannequin commented May 30, 2019

BPO 37095
Nosy @gustaebel, @lilydjwg, @serhiy-storchaka, @animalize, @websurfer5, @erlend-aasland

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-05-30.03:42:48.084>
labels = ['type-feature', 'library', '3.10']
title = '[Feature Request]: Add zstd support in tarfile'
updated_at = <Date 2021-10-26.10:45:36.871>
user = 'https://bugs.python.org/evan0greenup'

bugs.python.org fields:

activity = <Date 2021-10-26.10:45:36.871>
actor = 'yan12125'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2019-05-30.03:42:48.084>
creator = 'evan0greenup'
dependencies = []
files = []
hgrepos = []
issue_num = 37095
keywords = []
message_count = 7.0
messages = ['343945', '356498', '373583', '373634', '374123', '375472', '376095']
nosy_count = 11.0
nosy_names = ['lars.gustaebel', 'daniel.ugra', 'lilydjwg', 'serhiy.storchaka', 'wicher', 'malin', 'Jeffrey.Kintscher', 'evan0greenup', 'erlendaasland', 'Jerrod Frost', 'Anatol Pomozov']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue37095'
versions = ['Python 3.10']

@evan0greenup
Mannequin Author

evan0greenup mannequin commented May 30, 2019

Zstandard is getting more and more popular. It would be great if tarfile supported this compression format for .tar.zst files.

@evan0greenup evan0greenup mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels May 30, 2019
@tirkarthi tirkarthi added 3.8 (EOL) end of life 3.9 only security fixes and removed 3.7 (EOL) end of life labels May 30, 2019
@JerrodFrost
Mannequin

JerrodFrost mannequin commented Nov 12, 2019

Curious about this as well.

@AnatolPomozov
Mannequin

AnatolPomozov mannequin commented Jul 13, 2020

Is there any progress with this feature development?

Arch Linux uses Python's tarfile library in its toolset. Arch devs are looking to add zstd support to the toolset, but that needs this feature to be implemented first.

@ambv ambv added 3.10 only security fixes and removed 3.8 (EOL) end of life 3.9 only security fixes labels Jul 13, 2020
@animalize
Mannequin

animalize mannequin commented Jul 14, 2020

Add zstd support in tarfile

This requires the stdlib to contain a Zstandard module.

You can ask in the Idea forum:
https://discuss.python.org/c/ideas

@serhiy-storchaka
Member

The tarfile module supports arbitrary compression formats through its stream mode. You only need a third-party library that provides zstd support.

Recent versions of the tar utility have options that explicitly support newer compression formats: --lzip, --lzma, --lzop, --zstd, so corresponding modes could be added to the tarfile module. But that requires support for these formats in the stdlib. It should be discussed on the Python-ideas mailing list.

https://mail.python.org/mailman3/lists/python-ideas.python.org/
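
The stream-mode approach described above can be sketched roughly as follows. This is only an illustration, not an official recipe: `open_compressed`, `write_archive` and `read_archive` are hypothetical helper names, the sketch assumes the third-party zstandard package (a version that provides `zstandard.open`), and it falls back to the stdlib lzma module purely so that it still runs where zstandard is not installed.

```python
import io
import lzma
import os
import tarfile
import tempfile

try:
    import zstandard  # third-party; assumed installed for real .tar.zst files
except ImportError:
    zstandard = None


def open_compressed(path, mode):
    """Hypothetical helper: return a binary file object that (de)compresses."""
    if zstandard is not None and hasattr(zstandard, "open"):
        return zstandard.open(path, mode)
    # Stand-in codec so the sketch still runs without zstandard.
    return lzma.open(path, mode)


def write_archive(path, members):
    """Write {name: bytes} into a compressed tar using stream mode ("w|")."""
    with open_compressed(path, "wb") as raw, \
            tarfile.open(fileobj=raw, mode="w|") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def read_archive(path):
    """Read every regular file back, strictly in archive order ("r|")."""
    result = {}
    with open_compressed(path, "rb") as raw, \
            tarfile.open(fileobj=raw, mode="r|") as tar:
        for member in tar:
            if member.isfile():
                result[member.name] = tar.extractfile(member).read()
    return result


# Round trip through a temporary file.
files = {"a.txt": b"hello", "b.bin": b"\x00" * 1000}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.tar.zst")
    write_archive(path, files)
    restored = read_archive(path)
```

The `"w|"` and `"r|"` modes only ever call `write()` or `read()` on the wrapped object, which is why any compressor exposing a file-like interface can be plugged in this way.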

@animalize
Mannequin

animalize mannequin commented Aug 15, 2020

There are two zstd modules on pypi:

https://pypi.org/project/zstd/
https://pypi.org/project/zstandard/

The first one is too simple.

The second one is powerful, but has too many APIs:
ZstdCompressorIterator
ZstdDecompressorIterator
ZstdCompressionReader
ZstdCompressionWriter
ZstdCompressionChunkerIterator
(multi-thread compression)

IMO these are not necessary for stdlib.

In addition, a stdlib module would need a few things added, such as the max_length parameter and a ZstdFile class that can be integrated with the tarfile module. That is not a lot of work.
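
For context, max_length is the parameter that existing stdlib decompressors such as `lzma.LZMADecompressor` already accept to cap how much output a single call may return. The sketch below uses lzma only as a stand-in, since no stdlib zstd module existed at the time of this comment; the 4096-byte limit and the variable names are arbitrary choices for illustration.

```python
import lzma

original = b"x" * 100_000
payload = lzma.compress(original)

decomp = lzma.LZMADecompressor()

# Feed all compressed input at once, but accept at most 4096 bytes back.
first = decomp.decompress(payload, max_length=4096)
chunks = [first]

# The decompressor buffers the unconsumed input internally, so further
# output is requested by passing empty input until the stream ends.
while not decomp.eof:
    chunks.append(decomp.decompress(b"", max_length=4096))

restored = b"".join(chunks)
```

Bounding the output per call is what lets a file-like wrapper serve small `read()` requests without decompressing an entire highly compressible stream into memory.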

I looked at the zstd API; it's a bit simpler than lzma/bz2/zlib. With about a month of work, it should be possible to write a zstd module for the stdlib and then discuss the detailed API on Python-Ideas.

I once wanted to do this job, but it seems my time does not allow it. If anyone wants to do this work, please reply here.

FYI, Python 3.10 schedule:
3.10.0 beta 1: 2021-05-03 (No new features beyond this point.)

@animalize
Mannequin

animalize mannequin commented Aug 30, 2020

I have spent two weeks on this and the code is almost complete. A preview:
https://github.com/animalize/cpython/pull/8/files

I wrote it directly for the stdlib, since there are already zstd modules on PyPI.
In addition, the zstd API is simple, not as complicated as lzma's.

It can also use these:
1. Argument Clinic
2. multi-phase init
3. the internal function _PyLong_AsInt

@Techcable

Techcable commented Sep 8, 2022

@animalize wrote a pyzstd module that closely matches the gzip/lzma API.

The other main contender, zstandard, is very advanced, but doesn't try to adapt to the stdlib tarfile API.

@dralley

dralley commented May 25, 2023

@animalize The PR you created is between branches on your own fork, is there any chance you could submit that PR against CPython upstream?

@lgommans

lgommans commented Jan 26, 2024

I was looking into whether zstd support was being worked on or whether I could help, similar to the existing bzip and related modules that are super convenient to have in the stdlib (thanks to whoever made those, in case they're around!). Happy to see u/animalize worked on it but... their user is deleted now? :(

Does anyone have a copy of the code and know what license it was under?

Edit: I also signed up for and replied on the related discourse forum thread: https://discuss.python.org/t/integrate-zstd-compression-in-tarfile-module/7013

@dralley

dralley commented Jan 26, 2024

So, uh, by proxy does that mean that pyzstd is now unmaintained? Seems like it would, he's the only maintainer.

I dunno if perhaps someone at Github could return an archive of that repo / PR?

Worst case, the source code tarball can be downloaded from PyPI and then the PR turning it into a module can be rewritten. The license is declared as 3-clause BSD.

@hugovk
Member

hugovk commented Jan 26, 2024

@lgommans You can see animalize's changes on the Wayback Machine (be patient, it takes a while to load):

https://web.archive.org/web/20231214201705/https://github.com/animalize/cpython/pull/8/files

@hugovk
Member

hugovk commented Jan 26, 2024

@dralley https://web.archive.org/web/20231126145554/https://github.com/animalize/pyzstd shows the repo was still active at least as late as November 2023, and had two other contributors. Checking their forks, and poking around some other links:

@hauntsaninja
Contributor

animalize was definitely gone by mid-December (I tried to look it up). I use indygreg's zstandard. The documentation buries the one-shot APIs a little, but they work great.

@dralley

dralley commented Jan 31, 2024

@lgommans You can potentially download the latest release from PyPI (tarball) and work from that.

Unfortunately there's a fair number of changes in 2023 that aren't captured by any of the forks.

zstandard is a great library but it doesn't mesh quite as well with the stdlib style.

@hauntsaninja
Contributor

zstandard does have simple one-shot APIs: zstandard.compress / zstandard.decompress. Its documentation just buries them a little. Unless you meant something else?

@helmutg

helmutg commented Feb 27, 2024

For those still searching for a quick solution (based on zstandard):

import tarfile
import typing


class TarFile(tarfile.TarFile):
    """Subclass of tarfile.TarFile that can read and write zstd compressed archives."""

    OPEN_METH = {"zst": "zstopen"} | tarfile.TarFile.OPEN_METH

    @classmethod
    def zstopen(
        cls,
        name: str,
        mode: typing.Literal["r", "w", "x"] = "r",
        fileobj: None = None,
    ) -> tarfile.TarFile:
        if mode not in ("r", "w", "x"):
            raise NotImplementedError(f"mode `{mode}' not implemented for zst")
        if fileobj is not None:
            raise NotImplementedError("zst does not support a fileobj yet")
        try:
            import zstandard
        except ImportError:
            # Suppress the ImportError context, as tarfile's own openers do.
            raise tarfile.CompressionError("zstandard module not available") from None
        if mode == "r":
            zfobj = zstandard.open(name, "rb")
        else:
            zfobj = zstandard.open(
                name,
                mode + "b",
                cctx=zstandard.ZstdCompressor(write_checksum=True, threads=-1),
            )
        try:
            tarobj = cls.taropen(name, mode, zfobj)
        except (OSError, EOFError, zstandard.ZstdError) as exc:
            zfobj.close()
            if mode == "r":
                raise tarfile.ReadError("not a zst file") from exc
            raise
        except BaseException:
            zfobj.close()
            raise
        # Setting the _extfileobj attribute is important to signal a need to
        # close this object and thus flush the compressed stream.
        # Unfortunately, tarfile.pyi doesn't know about it.
        tarobj._extfileobj = False  # type: ignore
        return tarobj

This is not perfect and does not handle file objects, but it may be good enough for some use cases. I am the author of this code and explicitly grant an MIT license on it, as the original tarfile.py is also MIT licensed.

@nanonyme

nanonyme commented Aug 3, 2024

> The tarfile module supports arbitrary compression formats through its stream mode. You only need a third-party library that provides zstd support.
>
> Recent versions of the tar utility have options that explicitly support newer compression formats: --lzip, --lzma, --lzop, --zstd, so corresponding modes could be added to the tarfile module. But that requires support for these formats in the stdlib. It should be discussed on the Python-ideas mailing list.
>
> https://mail.python.org/mailman3/lists/python-ideas.python.org/

Doesn't tarfile say "However, such a TarFile object is limited in that it does not allow random access" for this stream mode? So while it may be sufficient, there are significant limitations compared to real zstd support.
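
The limitation can be seen with the stdlib alone. The following is a minimal sketch using an uncompressed in-memory archive; `build_tar` and `can_reread_first_member` are hypothetical helper names, and the sketch assumes that going back to an earlier member in stream mode raises `tarfile.StreamError`.

```python
import io
import tarfile


def build_tar():
    """Return the bytes of a small uncompressed tar with two members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in (("first.txt", b"one"), ("second.txt", b"two")):
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def can_reread_first_member(mode):
    """Walk the whole archive, then try to go back to the first member."""
    with tarfile.open(fileobj=io.BytesIO(build_tar()), mode=mode) as tar:
        members = [m for m in tar]  # advances to the end of the archive
        try:
            return tar.extractfile(members[0]).read() == b"one"
        except tarfile.StreamError:
            # Stream mode cannot seek backwards to an earlier member.
            return False


random_access_ok = can_reread_first_member("r")   # seekable mode
stream_ok = can_reread_first_member("r|")         # stream mode
```

In stream mode each member has to be consumed when it is encountered, which is the practical difference from a seekable, natively supported compression mode.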

@picnixz picnixz removed the 3.10 only security fixes label Mar 1, 2025
@gpshead
Member

gpshead commented Apr 22, 2025

https://peps.python.org/pep-0784/
