
Answer by Louis Maddox for Download only a range of bytes or a specific file from a TAR archive in S3

Yes, it’s possible for an uncompressed tarball: the file format has header records describing each file, which you can use to check its contents.

I'm more of a Python than a Java guy, but take a look at my implementation of tarball range requests here and docs here.

In short, you can read each file's header (a 512-byte block in which the file name comes first, padded with NULL b"\x00" bytes), read the size field from it to determine the variable length of the file's data, take that length modulo 512 to determine how much padding follows the data, and then iterate header by header until 1024 bytes before the end of the archive (you can send a HEAD request to get the total bytes, or it's sent back in the Content-Range header when you execute a range request, AKA partial content request). The 1024-before-the-end part is because there are at least 2 empty blocks of 512 bytes at the end of a tar archive.
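
As a rough illustration of the header arithmetic, here is a minimal sketch (not the linked implementation). It assumes plain ustar-style headers: the name in the first 100 bytes and the size as ASCII octal in the 12 bytes at offset 124. The function name `parse_header` is made up for this example, and long names or very large files, which use extension records or a binary size encoding, are not handled.

```python
BLOCK = 512  # tar archives are organised in 512-byte blocks


def parse_header(block: bytes) -> tuple[str, int, int]:
    """Return (name, size, padding) for one 512-byte tar header block."""
    if len(block) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    # The name is the first field: 100 bytes, padded with NULL bytes.
    name = block[:100].split(b"\x00", 1)[0].decode("utf-8")
    # The size field is 12 bytes at offset 124, written as ASCII octal.
    size_field = block[124:136].strip(b"\x00 ").decode("ascii")
    size = int(size_field or "0", 8)
    # The file data is padded with NULL bytes up to the next 512-byte boundary.
    padding = (BLOCK - size % BLOCK) % BLOCK
    return name, size, padding
```

The file's data then starts 512 bytes after the start of its header, and the next header starts at data start + size + padding.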

When iterating, it's probably sensible to check whether the filename field of each block where you expect a file header is actually all NULL bytes, as this indicates you've reached one of the end-of-archive blocks (the spec seems to say "at least 2 empty blocks", so there may be more). But if you control the tar files being generated, maybe you wouldn't need to bother.
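
A hedged sketch of that iteration, again not the linked implementation: `iter_members` and `fetch` are hypothetical names, where `fetch(start, end)` stands in for whatever performs the range request (for example an HTTP GET with a `Range: bytes=start-end` header) and returns those bytes inclusive. It assumes plain ustar-style headers with an ASCII octal size field.

```python
from typing import Callable, Iterator

BLOCK = 512  # tar block size in bytes


def iter_members(
    fetch: Callable[[int, int], bytes], total_size: int
) -> Iterator[tuple[str, int, int]]:
    """Yield (name, data_offset, size) for each file in an uncompressed tar."""
    offset = 0
    # Stop 1024 bytes early: the archive ends with at least two empty blocks.
    while offset + BLOCK <= total_size - 2 * BLOCK:
        header = fetch(offset, offset + BLOCK - 1)
        if header[:100] == b"\x00" * 100:
            break  # all-NULL name field: we are in the end-of-archive blocks
        name = header[:100].split(b"\x00", 1)[0].decode("utf-8")
        size = int(header[124:136].strip(b"\x00 ").decode("ascii") or "0", 8)
        yield name, offset + BLOCK, size
        # Skip the header plus the data rounded up to a whole number of blocks.
        offset += BLOCK + -(-size // BLOCK) * BLOCK
```

Once a member's `data_offset` and `size` are known, a single further range request for `data_offset` to `data_offset + size - 1` retrieves just that file.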

