Python | Duplicate substring removal from list
Last Updated: 05 Apr, 2023
Sometimes we need to work with strings in a list whose parts are joined by a separator, and we need to remove the duplicate parts within each string. Shorthands for this kind of problem are handy to have. Let's discuss certain ways in which this can be done.
Method #1: Using split() and for loops
Python3
# Python3 code to demonstrate
# removing duplicate substrings
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# removing duplicate substrings
res = []
for i in test_list:
    x = i.split("-")
    a = []
    for j in x:
        if j not in a:
            a.append(j)
    res.append(a)
# print result
print("The list after duplicate removal : " + str(res))
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
Time Complexity: O(n*k^2) in the worst case, where n is the number of strings in the list and k is the maximum number of substrings in one string. For every substring, the check j not in a scans the list a, which can hold up to k items.
Auxiliary Space: O(n*k), since res stores one list of up to k unique substrings for each of the n input strings.
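The repeated list scan described above can be avoided by tracking the substrings already seen in a set. The following is a minimal sketch rather than one of the original methods; the helper name dedupe_ordered is hypothetical.

```python
def dedupe_ordered(parts):
    """Return parts without duplicates, keeping first occurrences in order."""
    seen = set()
    result = []
    for part in parts:
        if part not in seen:  # set lookup is O(1) on average
            seen.add(part)
            result.append(part)
    return result


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
res = [dedupe_ordered(s.split('-')) for s in test_list]
print("The list after duplicate removal : " + str(res))
# The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
```

The extra set costs some memory, but each string is then processed in time proportional to its number of substrings.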
Method #2: Using set() + split()
This problem can be solved by using split() to break each string into substrings and then set() to remove the duplicates. Sets are unordered, so the original order of the substrings is not preserved and the printed order may vary.
Python3
# Python3 code to demonstrate
# removing duplicate substrings
# using set() + split()
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# using set() + split()
# removing duplicate substrings
res = [set(sub.split('-')) for sub in test_list]
# print result
print("The list after duplicate removal : " + str(res))
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [{'aa', 'bb'}, {'cc', 'bb'}, {'gg', 'ff'}, {'hh'}]
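Since sets have no defined order, the output of this method can differ between runs. If a stable result is needed and alphabetical order is acceptable, one option is to sort each set. This is a small sketch under that assumption, not part of the original method.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# sorted() turns each set into a list in a deterministic (alphabetical) order
res = [sorted(set(sub.split('-'))) for sub in test_list]

print("The list after duplicate removal : " + str(res))
# The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['ff', 'gg'], ['hh']]
```

Note that 'gg-ff-gg' becomes ['ff', 'gg'] here: sorting gives a reproducible order, but not the original one.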
Method #3: Using {} + split() + list comprehension
Unlike the methods above, this one does not keep the strings separate: it collects the unique substrings across the whole list into a single flat list. The curly braces form a set comprehension, and list() converts the resulting set to a list. Since sets are unordered, the order of the result may vary.
Python3
# Python3 code to demonstrate
# removing duplicate substrings
# using {} + split() + list comprehension
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# using {} + split() + list comprehension
# removing duplicate substrings
res = list({i for sub in test_list for i in sub.split('-')})
# print result
print("The list after duplicate removal : " + str(res))
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : ['cc', 'ff', 'aa', 'hh', 'gg', 'bb']
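If the flat list of unique substrings should follow the order of first appearance, the set can be swapped for dictionary keys. The sketch below is an alternative to the set comprehension, not the original method; it relies on dictionaries keeping insertion order (Python 3.7+).

```python
from itertools import chain

test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# chain.from_iterable flattens the per-string substring lists into one stream
all_parts = chain.from_iterable(sub.split('-') for sub in test_list)

# dict keys are unique and keep the order in which they were first inserted
res = list(dict.fromkeys(all_parts))

print("The list after duplicate removal : " + str(res))
# The list after duplicate removal : ['aa', 'bb', 'cc', 'gg', 'ff', 'hh']
```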
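Method #4: Using Counter() function
Counter() counts how often each substring occurs in a string. Each substring is kept the first time it is seen, and its count is then set to 0 so that later copies are skipped.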
Python3
# Python3 code to demonstrate
# removing duplicate substrings
from collections import Counter
# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# removing duplicate substrings
res = []
for i in test_list:
    x = i.split("-")
    freq = Counter(x)
    tempresult = []
    for j in x:
        if freq[j] > 0:
            tempresult.append(j)
            freq[j] = 0
    res.append(tempresult)
# print result
print("The list after duplicate removal : " + str(res))
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
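Because Counter is a dict subclass, its keys are already unique and, in Python 3.7+, appear in first-seen order. The following sketch uses that to shorten the loop above, and also shows where the counts themselves are useful: reporting which substrings were duplicated. The variable name dupes is illustrative only.

```python
from collections import Counter

test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# Iterating a Counter yields each distinct substring once, in first-seen order
res = [list(Counter(s.split('-'))) for s in test_list]

# The counts tell us which substrings occurred more than once in each string
dupes = [[part for part, count in Counter(s.split('-')).items() if count > 1]
         for s in test_list]

print("The list after duplicate removal : " + str(res))
# The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
print("Duplicated substrings : " + str(dupes))
# Duplicated substrings : [['aa'], [], ['gg'], ['hh']]
```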
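Method #5: Using a recursive method
Python3
# Recursive function to remove duplicate substrings
def remove_duplicates(substrings):
    # Base case: an empty list has nothing to deduplicate
    if not substrings:
        return []
    # Keep the first substring and drop its later copies from the rest
    first = substrings[0]
    rest = [s for s in substrings[1:] if s != first]
    # Recurse on the remaining substrings
    return [first] + remove_duplicates(rest)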
# Initialize the list of strings
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# Split each string into substrings and remove duplicates
result = [remove_duplicates(string.split("-")) for string in test_list]
# print result
print("The list after duplicate removal : " + str(result))
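# This code is contributed by tvsk
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
Time Complexity: O(n*k^2) in the worst case, where n is the number of strings and k is the maximum number of substrings in one string, because each substring is compared against the others in its string.
Space Complexity: O(n*k) for the result lists.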
Method #6: Using list comprehension and set()
Python3
# Python3 code to demonstrate
# removing duplicate substrings
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# removing duplicate substrings
res = [list(set(i.split("-"))) for i in test_list]
# print result
print("The list after duplicate removal : " + str(res))
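# This code is contributed by Jyothi Pinjala
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['cc', 'bb'], ['gg', 'ff'], ['hh']]
Because sets are unordered, the order of the substrings inside each inner list may differ from run to run.
Time Complexity: O(n*k) on average, where n is the number of strings and k is the maximum number of substrings in one string, since building a set from k substrings takes O(k) on average.
Space Complexity: O(n*k) for the result lists.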
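When set() is used but the original order still matters, one common idiom is to sort the set by the position of each substring's first occurrence. This is a sketch of that idiom, not part of the original method; note that list.index() is a linear scan, so this trades the speed of the set for readability.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

res = []
for s in test_list:
    parts = s.split('-')
    # parts.index(p) is the position of the first occurrence of p,
    # so sorting by it restores the original order of the unique substrings
    res.append(sorted(set(parts), key=parts.index))

print("The list after duplicate removal : " + str(res))
# The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
```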
Method #7: Using dict.fromkeys()
This method removes duplicate substrings in each string of a list by splitting each string on the "-" character and using dictionary keys to drop the duplicates. Because dictionaries keep insertion order (Python 3.7+), the first occurrence of each substring stays in its original position.
Here's a step-by-step explanation of the algorithm:
- Initialize a list of strings test_list.
- Initialize an empty list res to store the results.
- Loop through each string s in test_list using a for loop.
- Split the string s on the "-" character using the split() function, which returns a list of substrings.
- Build a dictionary from that list using dict.fromkeys(), which uses each substring as a key; duplicates are dropped because dictionary keys are unique.
- Convert the dictionary back to a list using the list() function to get the unique substrings.
- Append the list of unique substrings to the res list.
- After the loop, print res.
Python3
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# removing duplicate substrings
res = []
for s in test_list:
    res.append(list(dict.fromkeys(s.split("-"))))
# print result
print("The list after duplicate removal : " + str(res))
# This code is contributed by Vinay Pinjala
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
Time Complexity: O(n*m), where n is the number of strings in the list and m is the maximum length of a string. Each string is split and its substrings are inserted into a dictionary, which takes O(m) time per string on average.
Auxiliary Space: O(n*m), since the result holds one list of substrings per input string and a temporary dictionary is built for each string. The actual usage may be smaller than n*m, depending on how many duplicates are removed from each string.
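The same idea can be wrapped in a reusable function that accepts any separator and joins the unique substrings back into strings. This is a minimal sketch; the function name dedupe_joined and its sep parameter are hypothetical and not part of the original article.

```python
def dedupe_joined(strings, sep='-'):
    """Remove duplicate substrings from each string and rejoin with sep."""
    # dict.fromkeys keeps the first occurrence of each substring, in order;
    # joining a dict iterates over its keys
    return [sep.join(dict.fromkeys(s.split(sep))) for s in strings]


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print(dedupe_joined(test_list))
# ['aa-bb', 'bb-cc', 'gg-ff', 'hh']
print(dedupe_joined(['x,y,x', 'z,z'], sep=','))
# ['x,y', 'z']
```

Returning joined strings keeps the output in the same shape as the input, which can be convenient when the list is written back to a file or passed on unchanged.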
Method #8: Using reduce()
Algorithm:
- Import the reduce function from the functools module.
- Create a list test_list and initialize it with some string values.
- Print the original list.
- Call reduce with three arguments: a lambda function, the list to iterate over, and an initial value (here an empty list).
- The lambda function takes two arguments: the accumulator x and the current string y.
- Inside the lambda, split y into a list of substrings on the delimiter "-".
- Convert the list of substrings into a set to remove duplicates.
- Convert the set back to a list.
- Add that list to the accumulator by concatenating with the + operator, which returns a new list.
- Print the final result.
Python3
from functools import reduce
# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# removing duplicate substrings using reduce() and set()
res = reduce(lambda x, y: x + [list(set(y.split('-')))], test_list, [])
# print result
print("The list after duplicate removal : " + str(res))
# This code is contributed by Rayudu.
Output:
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['ff', 'gg'], ['hh']]
Because sets are unordered, the order of the substrings inside each inner list may differ from run to run.
Time Complexity: O(n*m + n^2), where n is the length of the input list and m is the maximum length of any string in it. Splitting and deduplicating costs O(m) per string, and x + [...] copies the accumulator list on every step, which adds up to O(n^2) over the whole reduction.
Space Complexity: O(n*m), since the result holds one list of substrings per input string, and together those substrings can be as long as the input strings themselves.
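The copying cost of x + [...] can be avoided by appending to the accumulator in place and returning it. The sketch below is one way to do that, with dict.fromkeys() swapped in for set() so that order is preserved; the helper name add_unique is hypothetical.

```python
from functools import reduce


def add_unique(acc, s):
    """Append the unique substrings of s to acc and return acc."""
    # Appending in place avoids building a new list on every reduce step
    acc.append(list(dict.fromkeys(s.split('-'))))
    return acc


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
res = reduce(add_unique, test_list, [])

print("The list after duplicate removal : " + str(res))
# The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
```

A fresh empty list is passed as the initial value on each call to reduce, so mutating the accumulator does not leak state between runs.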