Today, we will discuss removing duplicate lines from a file in Python.
You may also wish to read about Object Oriented Concepts in Python:
Python Class and Object Example
Inheritance in Python
Packages in Python
Exceptions in Python
How to remove duplicate lines from a file in perl
Let's discuss two ways of doing this:
1) Removing duplicate lines and printing the lines in order (when order is important)
- Using the normal way (a list)
2) Removing duplicate lines and printing the lines in any order (when order is NOT important)
- Using the set concept in Python. A set in Python does not preserve order.
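The difference between the two ways can be seen on a plain list before any file is involved. This is only a minimal sketch: the variable names `names`, `ordered` and `unordered` are made up for illustration, and the sample values are the ones from the input file used in this post.

```python
# Sample data: the same names as in file_with_duplicates.txt
names = [
    "Mother Teresa",
    "Winston Churchill",
    "Abraham Lincoln",
    "Mahatma Gandhi",
    "Winston Churchill",
    "Mother Teresa",
    "Abraham Lincoln",
]

# Way 1: keep the first occurrence of each name, in input order
ordered = []
for name in names:
    if name not in ordered:
        ordered.append(name)

# Way 2: a set drops duplicates but gives no guarantee about order
unordered = set(names)

print(ordered)         # always in input order
print(len(unordered))  # same four names, order not guaranteed
```

Both results hold the same four names; only the first one is guaranteed to follow the order of the input.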
Note:
Both scripts read the duplicated content from file_with_duplicates.txt,
remove the duplicate lines, and finally
write the result to file_without_duplicates.txt.
Input: file_with_duplicates.txt
1) remove_duplicate_lines_from_file_with_order.py
- Using the normal way - it takes care of the order of the lines
Output: file_without_duplicates.txt
2) remove_duplicate_lines_from_file_without_order.py
- Using SET concept - it does not consider the order of the lines
Output: file_without_duplicates.txt
Input: file_with_duplicates.txt
Mother Teresa
Winston Churchill
Abraham Lincoln
Mahatma Gandhi
Winston Churchill
Mother Teresa
Abraham Lincoln
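A short version that keeps the order, using a set to remember the lines already seen:
infile = open('file_with_duplicates.txt', 'r')
outfile = open('file_without_duplicates.txt', 'w')
lines_seen = set()  # lines already written to the output file
for line in infile:
    if line not in lines_seen:
        outfile.write(line)
        lines_seen.add(line)
infile.close()
outfile.close()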
1) remove_duplicate_lines_from_file_with_order.py
- Using the normal way - it takes care of the order of the lines
#!/usr/bin/python
import sys

try:
    input_file = open("file_with_duplicates.txt", "r")
    output_file = open("file_without_duplicates.txt", "w")
    unique = []
    for line in input_file:
        line = line.strip()
        # Keep only the first occurrence of each line
        if line not in unique:
            unique.append(line)
    input_file.close()
    # Add a newline to every line except the last one
    for i in range(0, len(unique) - 1):
        unique[i] += "\n"
    output_file.writelines(unique)
    output_file.close()
except FileNotFoundError:
    print('\n File NOT Found Error')
    sys.exit(1)
except IOError:
    print('\n IO Error')
    sys.exit(1)
Output: file_without_duplicates.txt
Mother Teresa
Winston Churchill
Abraham Lincoln
Mahatma Gandhi
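The same order-preserving idea can also be written with `with` blocks, so the files are closed even if an error occurs, and with `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (guaranteed from Python 3.7). This is a sketch, not the script above: the function name `remove_duplicates_keep_order` and the temporary-directory demo are assumptions made for illustration.

```python
import os
import tempfile


def remove_duplicates_keep_order(src_path, dst_path):
    """Hypothetical helper: copy src_path to dst_path without duplicate lines."""
    with open(src_path, "r") as src:
        # Drop the trailing newline so a last line without one still matches
        lines = [line.rstrip("\n") for line in src]
    # dict keys are unique and keep insertion order (Python 3.7+)
    unique = list(dict.fromkeys(lines))
    with open(dst_path, "w") as dst:
        dst.writelines(line + "\n" for line in unique)
    return unique


# Demo in a temporary directory so no real file is touched
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "file_with_duplicates.txt")
    dst_path = os.path.join(tmp, "file_without_duplicates.txt")
    with open(src_path, "w") as f:
        f.write("Mother Teresa\nWinston Churchill\nMother Teresa\n")
    result = remove_duplicates_keep_order(src_path, dst_path)
    with open(dst_path, "r") as f:
        written = f.read()

print(result)
```

Unlike the list version, the membership check here is a hash lookup, which may matter for large files.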
2) remove_duplicate_lines_from_file_without_order.py
- Using SET concept - it does not consider the order of the lines
#!/usr/bin/python
import sys

try:
    input_file = open("file_with_duplicates.txt", "r")
    output_file = open("file_without_duplicates.txt", "w")
    # The main drawback of using sets is that the order of the lines
    # may not be the same as in the input file
    # splitlines() avoids an extra empty entry when the file ends with a newline
    uniquelines = set(input_file.read().splitlines())
    output_file.write("".join([line + "\n" for line in uniquelines]))
    input_file.close()
    output_file.close()
except FileNotFoundError:
    print('\n File NOT Found Error')
    sys.exit(1)
except IOError:
    print('\n IO Error')
    sys.exit(1)
Output: file_without_duplicates.txt (the order may differ from run to run)
Abraham Lincoln
Winston Churchill
Mother Teresa
Mahatma Gandhi
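Since a set gives no fixed order, the output file of the second script can look different each time it runs. If the original order is not needed but a repeatable result is, one option is to sort the unique lines before writing them. This goes beyond the original script: the function name `remove_duplicates_sorted` and the demo files are assumptions made for illustration.

```python
import os
import tempfile


def remove_duplicates_sorted(src_path, dst_path):
    """Hypothetical helper: write the unique lines of src_path in sorted order."""
    with open(src_path, "r") as src:
        unique_lines = set(src.read().splitlines())
    # Sorting makes the output the same on every run
    ordered = sorted(unique_lines)
    with open(dst_path, "w") as dst:
        dst.write("".join(line + "\n" for line in ordered))
    return ordered


# Demo in a temporary directory so no real file is touched
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "file_with_duplicates.txt")
    dst_path = os.path.join(tmp, "file_without_duplicates.txt")
    with open(src_path, "w") as f:
        f.write("Mother Teresa\nWinston Churchill\nAbraham Lincoln\nMother Teresa\n")
    result = remove_duplicates_sorted(src_path, dst_path)

print(result)
```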