Data Cleaning and Manipulation Using String Functions
1. Stripping Whitespace
Removing leading and trailing whitespace from a string.
data = " Hello, World! "
cleaned_data = data.strip()
print(f"'{cleaned_data}'") # Output: 'Hello, World!'
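When only one side needs trimming, or when characters other than whitespace should go, the related methods lstrip(), rstrip(), and strip(chars) may help. A minimal sketch follows; the sample string raw is made up for illustration.

```python
# Hypothetical sample value with padding and decorative dashes.
raw = "  --Hello, World!--  "

left_only = raw.lstrip()             # drop leading whitespace only
right_only = raw.rstrip()            # drop trailing whitespace only
no_dashes = raw.strip().strip("-")   # drop whitespace, then dashes, from both ends

print(repr(left_only))   # '--Hello, World!--  '
print(repr(right_only))  # '  --Hello, World!--'
print(repr(no_dashes))   # 'Hello, World!'
```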
2. Changing Case
Converting the case of a string to upper, lower, or title case.
data = "hello, world!"
upper_case = data.upper()
print(upper_case) # Output: HELLO, WORLD!
lower_case = data.lower()
print(lower_case) # Output: hello, world!
title_case = data.title()
print(title_case) # Output: Hello, World!
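Two caveats are worth a quick sketch. title() capitalizes the letter after any non-letter, so apostrophes give odd results, and for case-insensitive comparisons casefold() is generally a safer choice than lower(). The helper same_text below is a hypothetical name, not a built-in.

```python
# title() treats the apostrophe as a word boundary.
text = "they're here"
titled = text.title()
print(titled)  # They'Re Here

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare ignoring case and outer whitespace."""
    return a.strip().casefold() == b.strip().casefold()

print(same_text("  Alice ", "ALICE"))  # True
print(same_text("Alice", "Bob"))       # False
```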
3. Replacing Substrings
Replacing occurrences of a substring with another substring.
data = "Hello, World!"
replaced_data = data.replace("World", "Python")
print(replaced_data) # Output: Hello, Python!
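Note that replace() is case-sensitive and, by default, replaces every occurrence. Its optional third argument limits how many replacements are made. A short sketch with an invented sample string:

```python
data = "spam, Spam, spam"

replace_all = data.replace("spam", "eggs")       # 'Spam' is untouched (case-sensitive)
replace_first = data.replace("spam", "eggs", 1)  # only the first match is replaced

print(replace_all)    # eggs, Spam, eggs
print(replace_first)  # eggs, Spam, spam
```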
4. Splitting and Joining Strings
Splitting a string into a list of substrings and joining a list of strings into a single string.
data = "apple,banana,cherry"
split_data = data.split(",")
print(split_data) # Output: ['apple', 'banana', 'cherry']
joined_data = "-".join(split_data)
print(joined_data) # Output: apple-banana-cherry
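For messy input, two variations of split() tend to be handy: calling it with no arguments splits on any run of whitespace, and the maxsplit argument stops after a given number of splits. The sketch below uses invented sample values.

```python
messy = "  apple   banana\tcherry  "

# With no separator, split() discards leading/trailing whitespace
# and treats any run of whitespace as one delimiter.
words = messy.split()
normalized = " ".join(words)

# maxsplit=1 splits only at the first '='.
key, value = "key=value=more".split("=", 1)

print(words)       # ['apple', 'banana', 'cherry']
print(normalized)  # apple banana cherry
print(key, value)  # key value=more
```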
5. Checking String Content
Checking if a string starts with, ends with, or contains a substring.
data = "Hello, World!"
starts_with_hello = data.startswith("Hello")
print(starts_with_hello) # Output: True
ends_with_world = data.endswith("World!")
print(ends_with_world) # Output: True
contains_python = "Python" in data
print(contains_python) # Output: False
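These checks are all case-sensitive. One common workaround is to lowercase the string first. startswith() and endswith() also accept a tuple of candidates. A small sketch, with a made-up filename:

```python
filename = "Report_2024.CSV"

# endswith() accepts a tuple and returns True if any suffix matches.
is_table = filename.lower().endswith((".csv", ".tsv"))

strict_match = "report" in filename          # case-sensitive
loose_match = "report" in filename.lower()   # case-insensitive

print(is_table)      # True
print(strict_match)  # False
print(loose_match)   # True
```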
6. Finding Substrings
Finding the position of a substring within a string.
data = "Hello, World!"
position = data.find("World")
print(position) # Output: 7
# Finding all occurrences of a substring
indices = []
start = 0
while start < len(data):
    start = data.find("o", start)
    if start == -1:
        break
    indices.append(start)
    start += 1
print(indices) # Output: [4, 8]
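As an alternative to the manual loop, the standard re module can collect match positions in one line. This is a sketch of a different technique, not a drop-in equivalent: re.finditer() reports non-overlapping matches only, whereas the loop above also finds overlapping ones.

```python
import re

data = "Hello, World!"

# re.escape() guards against characters that are special in regular expressions.
indices = [match.start() for match in re.finditer(re.escape("o"), data)]
print(indices)  # [4, 8]

# Overlapping case: 'aa' occurs at 0 and 1 in 'aaa', but finditer reports only 0.
overlap = [match.start() for match in re.finditer("aa", "aaa")]
print(overlap)  # [0]
```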
7. Removing Specific Characters
Removing specific characters from a string.
data = "Hello, World!"
cleaned_data = data.translate(str.maketrans('', '', '!,'))
print(cleaned_data) # Output: Hello World
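To remove all ASCII punctuation rather than a hand-picked set, the same translate() approach can be combined with string.punctuation from the standard library. A minimal sketch with an invented sample string:

```python
import string

data = "Hello, World! (Really?)"

# Build the translation table once; the third argument lists characters to delete.
table = str.maketrans("", "", string.punctuation)
cleaned = data.translate(table)

print(cleaned)  # Hello World Really
```

string.punctuation covers ASCII punctuation only, so typographic quotes and similar characters would need to be added separately.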
8. String Formatting
Using formatted strings (f-strings) to insert variables into strings.
name = "Alice"
age = 30
formatted_string = f"My name is {name} and I am {age} years old."
print(formatted_string) # Output: My name is Alice and I am 30 years old.
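f-strings also accept a format specification after a colon, which is useful when cleaned numbers need consistent presentation. The values below are invented for illustration.

```python
price = 1234567.891
ratio = 0.256

price_text = f"{price:,.2f}"  # thousands separator, two decimal places
ratio_text = f"{ratio:.1%}"   # percentage with one decimal place

print(price_text)  # 1,234,567.89
print(ratio_text)  # 25.6%
```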
9. Padding and Aligning Strings
Adding padding and aligning strings to a specific width.
data = "Hello"
left_padded = data.ljust(10)
print(f"'{left_padded}'") # Output: 'Hello     '
right_padded = data.rjust(10)
print(f"'{right_padded}'") # Output: '     Hello'
center_padded = data.center(10)
print(f"'{center_padded}'") # Output: '  Hello   '
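The padding methods take an optional fill character, zfill() pads numeric strings with zeros, and the same alignment can be expressed inside an f-string. A short sketch:

```python
data = "Hello"

dotted = data.ljust(10, ".")    # pad on the right with dots
starred = data.center(11, "*")  # pad both sides with asterisks
zero_padded = "42".zfill(5)     # left-pad with zeros to width 5
aligned = f"{data:>10}"         # right-align via a format specification

print(dotted)          # Hello.....
print(starred)         # ***Hello***
print(zero_padded)     # 00042
print(f"'{aligned}'")  # '     Hello'
```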
10. Checking for Alphanumeric Characters
Checking if a string contains only alphanumeric characters, only digits, or only letters.
data = "Hello123"
is_alphanumeric = data.isalnum()
print(is_alphanumeric) # Output: True
is_digit = data.isdigit()
print(is_digit) # Output: False
is_alpha = data.isalpha()
print(is_alpha) # Output: False
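isdigit() returns False for negative numbers and decimals, because the minus sign and the decimal point are not digits. When the goal is to test whether a field is numeric, attempting the conversion is one possible approach. The helper is_number below is a hypothetical name.

```python
def is_number(text: str) -> bool:
    """Hypothetical helper: True if float() can parse the text."""
    try:
        float(text)
    except ValueError:
        return False
    return True

print("-5".isdigit())     # False
print("3.14".isdigit())   # False
print(is_number("-5"))    # True
print(is_number("3.14"))  # True
print(is_number("abc"))   # False
```

Be aware that float() also accepts strings such as "nan" and "inf", so a stricter check may be needed for some data.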
11. String Concatenation in Python
Python provides several ways to combine strings, variables, and numbers into a single string:
1. Using the + operator:
This is the most common and straightforward approach for basic concatenation.
name = "Alice"
age = 30
greeting = "Hello, " + name + "! You are " + str(age) + " years old."
print(greeting) # Output: Hello, Alice! You are 30 years old.
Explanation:
- The + operator is used to concatenate strings.
- We convert the integer age to a string using str(age) before adding it to the string, because + raises a TypeError when asked to join a str and an int.
2. Using f-strings (Python 3.6+):
f-strings offer a cleaner and more readable way to embed variables within strings.
name = "Alice"
age = 30
greeting = f"Hello, {name}! You are {age} years old."
print(greeting) # Output: Hello, Alice! You are 30 years old.
Explanation:
- Curly braces {} are used to indicate places where variables should be inserted.
- The variable names are directly referenced within the braces.
3. Using the .format() method:
While less common than f-strings in new code, the .format() method is useful when the template string is defined separately from its values, for example when it is stored in a variable and filled in later.
name = "Alice"
age = 30
greeting = "Hello, {}! You are {} years old.".format(name, age)
print(greeting) # Output: Hello, Alice! You are 30 years old.
Explanation:
- The .format() method is called on the base string.
- Placeholders {} in the string are replaced with the provided arguments.
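When many pieces need combining, str.join() is often a tidier option than repeated +. Non-string items must be converted first, as in this sketch (the values list is invented):

```python
values = ["Alice", 30, "Engineer"]

# join() requires strings, so convert each item with str().
line = ", ".join(str(value) for value in values)

print(line)  # Alice, 30, Engineer
```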
Use Case Example: Data Cleaning Pipeline
Here is an example of how you might use these functions together to clean a list of strings.
data_list = [
    " Alice,30,Engineer ",
    " Bob,25,Data Scientist ",
    "Charlie, , Doctor",
    " ,40,Lawyer",
    "David,35, "
]
cleaned_data_list = []
for data in data_list:
    # Strip leading and trailing whitespace
    data = data.strip()
    # Split into components
    parts = data.split(',')
    # Clean each part
    cleaned_parts = [part.strip() for part in parts]
    # Replace empty strings with None
    cleaned_parts = [part if part else None for part in cleaned_parts]
    # Rejoin cleaned parts, writing missing values (None) as empty fields
    cleaned_data = ",".join([part if part else "" for part in cleaned_parts])
    cleaned_data_list.append(cleaned_data)
print("Cleaned Data List:")
for cleaned_data in cleaned_data_list:
    print(cleaned_data)
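If the cleaned values are going to be processed further rather than printed, keeping them as structured records may be more convenient than rejoining them into strings. The sketch below is one possible variation, assuming every line has exactly three comma-separated fields (name, age, job). The function name parse_record and the dictionary keys are hypothetical.

```python
def parse_record(line: str) -> dict:
    """Hypothetical helper: turn 'name,age,job' into a dict, using None for blanks."""
    # Assumes exactly three fields; other shapes raise ValueError on unpacking.
    name, age, job = [part.strip() or None for part in line.split(",")]
    return {
        "name": name,
        # Convert age only when it consists purely of digits.
        "age": int(age) if age and age.isdigit() else None,
        "job": job,
    }

rows = [" Alice,30,Engineer ", "Charlie, , Doctor", " ,40,Lawyer"]
records = [parse_record(row) for row in rows]

for record in records:
    print(record)
```

Missing values stay as None here, which makes them easy to detect or fill in a later step.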