
Compress Files Using zipfile Module in Python
Problem
You want to create compressed files in Python.
Introduction
ZIP files can hold the compressed contents of many other files. Compressing a file reduces its size on disk, which is useful when transferring it over the internet or between systems using tools such as Control-M AFT, Connect:Direct, or scp.
Python programs create ZIP files using functions in the zipfile module.
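Before the full recipe, here is a minimal sketch of the idea: write one file, add it to a new ZIP archive, and list the archive's contents. The file names hello.txt and hello.zip and the use of a temporary directory are assumptions made for this illustration only.

```python
import os
import tempfile
import zipfile

# Work in a throwaway directory so nothing is left behind
# (an assumption for this sketch; the recipe below uses the current directory).
with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, "hello.txt")  # hypothetical file name
    zip_path = os.path.join(tmp_dir, "hello.zip")   # hypothetical archive name

    with open(text_path, "w") as f:
        f.write("hello zip\n" * 100)

    # Mode "w" creates a new archive; ZIP_DEFLATED turns compression on.
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(text_path, arcname="hello.txt")

    # Reopen in the default read mode to list what was stored.
    with zipfile.ZipFile(zip_path) as zf:
        archived_names = zf.namelist()

print(archived_names)
```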
How to do it...
1. We will be using the zipfile and io modules. Both are part of the Python standard library, so nothing needs to be installed with pip.
2. We will write a function to write sample data to a file. The function write_data_to_files below takes the data and a target file name as input and creates the file in the current directory.
Example
import io

# Function : write_data_to_files
def write_data_to_files(inp_data, file_name):
    """
    function : create a csv file with the data passed to this code
    args : inp_data : data to be written to the target file
           file_name : target file name to store the data
    return : none
    assumption : File to be created and this code are in same directory.
    """
    print(f" *** Writing the data to - {file_name}")
    # Wrap the string in a file-like object so it can be read line by line
    throwaway_storage = io.StringIO(inp_data)
    with open(file_name, 'w') as f:
        for line in throwaway_storage:
            f.write(line)
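The function relies on io.StringIO to treat a string like a file. The short sketch below, using made-up sample data, shows what iterating over such an object yields: one line per iteration, with the trailing newline kept.

```python
import io

# Hypothetical sample data, not taken from the recipe's files
sample_text = "player,titles\nFederer,20\nNadal,20\n"

# StringIO wraps the string in an in-memory, file-like object.
buffer = io.StringIO(sample_text)

# Iterating over it works like iterating over an open text file.
lines = [line for line in buffer]

print(lines)
```

Because each line keeps its newline, writing the lines back out one by one reproduces the original text unchanged.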
3. We will now write a function file_compress to zip the files created in the above step. This function accepts a list of files, goes through them, and adds each one to a zip file. A detailed explanation of each step is provided in the comments.
To create your own compressed ZIP files, you must open the ZipFile object in write mode by passing 'w' as the second argument.
When you pass a path to the write() method of a ZipFile object, Python will add the file at that path to the ZIP file, compressing it if a compression type is specified.
The first argument of the write() method is a string with the name of the file to add.
The second argument is the name the file will have inside the ZIP file. The compress_type parameter tells Python which algorithm to use to compress the file; if it is not given, the compression type of the ZipFile object is used, which defaults to ZIP_STORED (no compression).
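To see the effect of the compression type, the following sketch adds the same file to an archive twice, once with ZIP_STORED and once with ZIP_DEFLATED, and compares the sizes recorded in the archive. The file and member names are hypothetical, and the temporary directory is an assumption of this illustration.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "repetitive.txt")  # hypothetical input file
    with open(src, "w") as f:
        f.write("Federer,20\n" * 1000)  # highly repetitive, so it compresses well

    zip_path = os.path.join(tmp_dir, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Same source file, two different names and compression types
        zf.write(src, arcname="stored.txt", compress_type=zipfile.ZIP_STORED)
        zf.write(src, arcname="deflated.txt", compress_type=zipfile.ZIP_DEFLATED)

    # Reopen the archive and collect (original size, size inside the archive)
    with zipfile.ZipFile(zip_path) as zf:
        sizes = {
            info.filename: (info.file_size, info.compress_size)
            for info in zf.infolist()
        }

for name, (original, compressed) in sizes.items():
    print(f"{name}: {original} bytes -> {compressed} bytes")
```

The stored member should occupy the same number of bytes as the original file, while the deflated member should be much smaller for repetitive data like this.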
Example
import zipfile

# Function : file_compress
def file_compress(inp_file_names, out_zip_file):
    """
    function : file_compress
    args : inp_file_names : list of filenames to be zipped
           out_zip_file : output zip file
    return : none
    assumption : Input file paths and this code is in same directory.
    """
    # Select the compression mode ZIP_DEFLATED for compression
    # or zipfile.ZIP_STORED to just store the file
    compression = zipfile.ZIP_DEFLATED
    print(f" *** Input File name passed for zipping - {inp_file_names}")

    # create the zip file first parameter path/name, second mode
    print(f' *** out_zip_file is - {out_zip_file}')
    zf = zipfile.ZipFile(out_zip_file, mode="w")

    try:
        for file_to_write in inp_file_names:
            # Add file to the zip file
            # first parameter file to zip, second filename in zip
            print(f' *** Processing file {file_to_write}')
            zf.write(file_to_write, file_to_write, compress_type=compression)
    except FileNotFoundError as e:
        print(f' *** Exception occurred during zip process - {e}')
    finally:
        # Don't forget to close the file!
        zf.close()
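As an alternative to closing the archive by hand, ZipFile can be used as a context manager, which closes it automatically when the block ends. The sketch below is a hypothetical variant named file_compress_with, not the function used in this recipe; it also stores each file under its base name only, which is an extra assumption of this illustration.

```python
import os
import tempfile
import zipfile


def file_compress_with(inp_file_names, out_zip_file):
    """Hypothetical variant of file_compress that uses a with block."""
    # The archive is closed automatically, even if an exception is raised.
    with zipfile.ZipFile(out_zip_file, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for file_to_write in inp_file_names:
            # Store only the base name so directory paths are not
            # recorded inside the archive (a choice made for this sketch).
            zf.write(file_to_write, arcname=os.path.basename(file_to_write))


# Small usage demo in a temporary directory (names are made up)
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_csv = os.path.join(tmp_dir, "sample.csv")
    with open(sample_csv, "w") as f:
        f.write("player,titles\nMurray,3\n")

    sample_zip = os.path.join(tmp_dir, "sample.zip")
    file_compress_with([sample_csv], sample_zip)

    with zipfile.ZipFile(sample_zip) as zf:
        zipped_names = zf.namelist()

print(zipped_names)
```

Unlike file_compress, this variant does not catch FileNotFoundError, so a missing input file stops the program with a traceback instead of a printed message.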
4. We will call the functions to create two csv files and then zip them. We will write the tennis players who have won more than one Grand Slam title to one file, temporary_file1_for_zip.csv, and the players who have won one or fewer to another file, temporary_file2_for_zip.csv. We will then zip both files into temporary.zip.
Example
import zipfile
import io

file_name1 = "temporary_file1_for_zip.csv"
file_name2 = "temporary_file2_for_zip.csv"
file_name_list = [file_name1, file_name2]
zip_file_name = "temporary.zip"

# data for file 1
file_data_1 = """
player,titles
Federer,20
Nadal,20
Djokovic,17
Murray,3
"""

# data for file 2
file_data_2 = """
player,titles
Thiem,1
Zverev,0
Medvedev,0
Rublev,0
"""

# write the file_data to file_name
write_data_to_files(file_data_1, file_name1)
write_data_to_files(file_data_2, file_name2)

# zip the file_name to zip_file_name
file_compress(file_name_list, zip_file_name)
5. Putting together everything discussed in the above steps.
Example
# Create two csv files and zip them into a single archive
import zipfile
import io

# Function : write_data_to_files
def write_data_to_files(inp_data, file_name):
    """
    function : create a csv file with the data passed to this code
    args : inp_data : data to be written to the target file
           file_name : target file name to store the data
    return : none
    assumption : File to be created and this code are in same directory.
    """
    print(f" *** Writing the data to - {file_name}")
    throwaway_storage = io.StringIO(inp_data)
    with open(file_name, 'w') as f:
        for line in throwaway_storage:
            f.write(line)

# Function : file_compress
def file_compress(inp_file_names, out_zip_file):
    """
    function : file_compress
    args : inp_file_names : list of filenames to be zipped
           out_zip_file : output zip file
    return : none
    assumption : Input file paths and this code is in same directory.
    """
    # Select the compression mode ZIP_DEFLATED for compression
    # or zipfile.ZIP_STORED to just store the file
    compression = zipfile.ZIP_DEFLATED
    print(f" *** Input File name passed for zipping - {inp_file_names}")

    # create the zip file first parameter path/name, second mode
    print(f' *** out_zip_file is - {out_zip_file}')
    zf = zipfile.ZipFile(out_zip_file, mode="w")

    try:
        for file_to_write in inp_file_names:
            # Add file to the zip file
            # first parameter file to zip, second filename in zip
            print(f' *** Processing file {file_to_write}')
            zf.write(file_to_write, file_to_write, compress_type=compression)
    except FileNotFoundError as e:
        print(f' *** Exception occurred during zip process - {e}')
    finally:
        # Don't forget to close the file!
        zf.close()

# __main__ program
if __name__ == '__main__':
    # Define your file name and data
    file_name1 = "temporary_file1_for_zip.csv"
    file_name2 = "temporary_file2_for_zip.csv"
    file_name_list = [file_name1, file_name2]
    zip_file_name = "temporary.zip"

    file_data_1 = """
player,titles
Federer,20
Nadal,20
Djokovic,17
Murray,3
"""

    file_data_2 = """
player,titles
Thiem,1
Zverev,0
Medvedev,0
Rublev,0
"""

    # write the file_data to file_name
    write_data_to_files(file_data_1, file_name1)
    write_data_to_files(file_data_2, file_name2)

    # zip the file_name to zip_file_name
    file_compress(file_name_list, zip_file_name)
Output
When the above code is executed, the output is
 *** Writing the data to - temporary_file1_for_zip.csv
 *** Writing the data to - temporary_file2_for_zip.csv
 *** Input File name passed for zipping - ['temporary_file1_for_zip.csv', 'temporary_file2_for_zip.csv']
 *** out_zip_file is - temporary.zip
 *** Processing file temporary_file1_for_zip.csv
 *** Processing file temporary_file2_for_zip.csv
In addition:
temporary_file1_for_zip.csv is created in the current directory.
temporary_file2_for_zip.csv is created in the current directory.
temporary.zip is created in the current directory.
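To confirm that an archive really contains the expected files, it can be reopened in read mode. The sketch below uses a hypothetical helper, inspect_zip, and builds a small stand-in archive in a temporary directory so that it runs on its own; in practice you would pass it the path to temporary.zip instead.

```python
import os
import tempfile
import zipfile


def inspect_zip(zip_path):
    """Hypothetical helper: return (member names, first corrupt member)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        # testzip() checks each member's CRC and returns the name of the
        # first bad file, or None when everything is intact.
        bad_member = zf.testzip()
    return names, bad_member


# Build a stand-in archive so this sketch is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # writestr() adds a member directly from a string
        zf.writestr("temporary_file1_for_zip.csv", "player,titles\nFederer,20\n")
        zf.writestr("temporary_file2_for_zip.csv", "player,titles\nZverev,0\n")

    names, bad_member = inspect_zip(demo_zip)

print(names)
print(bad_member)
```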