Data Cleaning and Manipulation Using Python String Functions

1. Stripping Whitespace

Removing leading and trailing whitespace from a string.

data = "   Hello, World!   "
cleaned_data = data.strip()
print(f"'{cleaned_data}'")  # Output: 'Hello, World!'
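strip() removes all leading and trailing whitespace, including tabs and newlines, not only spaces. The related lstrip() and rstrip() methods trim one side only, and all three accept an optional string of characters to strip instead of whitespace. A short sketch with made-up sample values:

```python
raw = "\t  Hello, World!\n"

# lstrip() trims only the left side (tab and spaces); the trailing newline stays
left_trimmed = raw.lstrip()
print(repr(left_trimmed))   # 'Hello, World!\n'

# rstrip() trims only the right side (the newline); the leading tab and spaces stay
right_trimmed = raw.rstrip()
print(repr(right_trimmed))  # '\t  Hello, World!'

# strip() trims both sides
both_trimmed = raw.strip()
print(repr(both_trimmed))   # 'Hello, World!'

# With an argument, every listed character is stripped from both ends (a set, not a prefix)
price = "$$19.99$$".strip("$")
print(price)                # 19.99
domain = "www.example.com".strip("cmowz.")
print(domain)               # example

# Python 3.9+: removesuffix() removes one exact suffix instead of a set of characters
stem = "report.csv".removesuffix(".csv")
print(stem)                 # report
```

Because the argument is treated as a set of characters, strip("cmowz.") keeps removing from each end until it meets a character not in that set, which is why only "example" survives.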

2. Changing Case

Converting the case of a string to upper, lower, or title case.

data = "hello, world!"

upper_case = data.upper()
print(upper_case)  # Output: HELLO, WORLD!

lower_case = data.lower()
print(lower_case)  # Output: hello, world!

title_case = data.title()
print(title_case)  # Output: Hello, World!
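In data cleaning, case conversion is mostly used to normalize values before comparing or grouping them. The sketch below uses invented sample values and also shows two behaviours worth knowing: title() starts a new word after an apostrophe, and casefold() is a more aggressive form of lower() intended for case-insensitive matching.

```python
# Normalize inconsistent spellings of the same category before grouping
cities = ["new york", "NEW YORK", "New York "]
normalized = {city.strip().lower() for city in cities}
print(normalized)  # {'new york'}

# title() treats any non-letter, including an apostrophe, as a word boundary
titled = "they're bill's friends".title()
print(titled)  # They'Re Bill'S Friends

# casefold() also folds characters that lower() leaves alone, such as the German sharp s
print("Straße".lower() == "STRASSE".lower())        # False
print("Straße".casefold() == "STRASSE".casefold())  # True
```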

3. Replacing Substrings

Replacing occurrences of a substring with another substring.

data = "Hello, World!"

replaced_data = data.replace("World", "Python")
print(replaced_data)  # Output: Hello, Python!
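replace() matches text exactly, including case, and by default replaces every occurrence. A minimal sketch of the optional count argument and of chaining calls to strip formatting from a value (the phone number is a made-up example):

```python
# replace() is case-sensitive: "world" does not match "World"
unchanged = "Hello, World!".replace("world", "Python")
print(unchanged)  # Hello, World!

# The optional third argument limits how many occurrences are replaced
limited = "a-b-c-d".replace("-", "", 2)
print(limited)  # abc-d

# Chaining replace() calls removes several formatting characters in turn
phone = "(555) 123-4567"
digits = phone.replace("(", "").replace(")", "").replace("-", "").replace(" ", "")
print(digits)  # 5551234567
```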

4. Splitting and Joining Strings

Splitting a string into a list of substrings and joining a list of strings into a single string.

data = "apple,banana,cherry"
split_data = data.split(",")
print(split_data)  # Output: ['apple', 'banana', 'cherry']

joined_data = "-".join(split_data)
print(joined_data)  # Output: apple-banana-cherry
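split() behaves differently depending on whether a separator is given, which matters when input spacing is messy. A sketch with invented sample strings, including the maxsplit argument and the fact that join() only accepts strings:

```python
# With no argument, split() splits on runs of any whitespace and drops empty strings
messy = "  apple   banana\tcherry \n"
words = messy.split()
print(words)  # ['apple', 'banana', 'cherry']

# With an explicit separator, consecutive separators produce empty strings
fields = "a,,b".split(",")
print(fields)  # ['a', '', 'b']

# maxsplit stops after a fixed number of splits, so later commas stay in the last field
log_line = "2024-01-15,INFO,Disk usage high, check volumes"
date, level, message = log_line.split(",", 2)
print(message)  # Disk usage high, check volumes

# join() raises TypeError on non-strings, so convert numbers first
numbers = [1, 2, 3]
csv_line = ",".join(str(n) for n in numbers)
print(csv_line)  # 1,2,3
```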

5. Checking String Content

Checking if a string starts with, ends with, or contains a substring.

data = "Hello, World!"

starts_with_hello = data.startswith("Hello")
print(starts_with_hello)  # Output: True

ends_with_world = data.endswith("World!")
print(ends_with_world)  # Output: True

contains_python = "Python" in data
print(contains_python)  # Output: False
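startswith() and endswith() also accept a tuple of candidates, and all three checks are case-sensitive. A small sketch filtering a hypothetical list of file names:

```python
filenames = ["report.csv", "image.PNG", "notes.txt", "data.json"]

# Pass a tuple to test several suffixes at once
tabular = [f for f in filenames if f.endswith((".csv", ".json"))]
print(tabular)  # ['report.csv', 'data.json']

# Normalize case first when the input's case varies
images = [f for f in filenames if f.lower().endswith((".png", ".jpg"))]
print(images)  # ['image.PNG']

# Case-insensitive membership test
found = "world" in "Hello, World!".lower()
print(found)  # True
```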

6. Finding Substrings

Finding the position of a substring within a string. find() returns the index of the first occurrence, or -1 if the substring is not present.

data = "Hello, World!"

position = data.find("World")
print(position)  # Output: 7

# Finding all occurrences of a substring
indices = []
start = 0
while start < len(data):
    start = data.find("o", start)
    if start == -1:
        break
    indices.append(start)
    start += 1

print(indices)  # Output: [4, 8]
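A few related methods cover common variations: rfind() searches from the right, count() tallies occurrences, and index() raises ValueError instead of returning -1. The regular-expression module gives a shorter way to collect every position. A sketch using the same sample string:

```python
import re

data = "Hello, World!"

print(data.find("Python"))  # -1
last_o = data.rfind("o")
print(last_o)               # 8
o_count = data.count("o")
print(o_count)              # 2

# index() is like find() but raises ValueError when the substring is missing
try:
    data.index("Python")
except ValueError:
    print("'Python' not found")

# re.escape() protects substrings that contain regex special characters
positions = [m.start() for m in re.finditer(re.escape("o"), data)]
print(positions)            # [4, 8]
```

One difference from the while loop above: re.finditer() and count() report non-overlapping matches, whereas advancing start by 1 also finds overlapping ones (for "aa" in "aaa" the loop gives [0, 1], finditer gives [0]).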

7. Removing Specific Characters

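Removing specific characters from a string. str.maketrans('', '', '!,') builds a translation table that deletes every character listed in its third argument, and translate() applies that table.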

data = "Hello, World!"
cleaned_data = data.translate(str.maketrans('', '', '!,'))
print(cleaned_data)  # Output: Hello World
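The same technique scales to whole character classes: the standard string module provides string.punctuation, the set of ASCII punctuation characters. str.maketrans() also accepts a dict for one-to-one replacements. A sketch with invented input:

```python
import string

# Delete every ASCII punctuation character
data = "Hello, World! (test #1)"
no_punct = data.translate(str.maketrans("", "", string.punctuation))
print(no_punct)  # Hello World test 1

# A dict maps single characters to replacement strings (or None to delete them)
to_snake = str.maketrans({" ": "_", "-": "_"})
snake = "first name-last".translate(to_snake)
print(snake)  # first_name_last
```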

8. String Formatting

Using formatted strings (f-strings) to insert variables into strings.

name = "Alice"
age = 30
formatted_string = f"My name is {name} and I am {age} years old."
print(formatted_string)  # Output: My name is Alice and I am 30 years old.
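f-strings also accept a format specification after a colon, which is handy when reporting cleaned numeric data. A sketch of a few common specs (the values are made up):

```python
price = 1234.5
count = 1234567
name = "Alice"

price_text = f"Price: {price:.2f}"
print(price_text)  # Price: 1234.50
count_text = f"Count: {count:,}"
print(count_text)  # Count: 1,234,567
repr_text = f"Name: {name!r}"
print(repr_text)   # Name: 'Alice'
aligned = f"[{name:>8}]"
print(aligned)     # [   Alice]

# Python 3.8+: a trailing = shows the expression text along with its value
debug_text = f"{count=}"
print(debug_text)  # count=1234567
```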

9. Padding and Aligning Strings

Adding padding and aligning strings to a specific width.

data = "Hello"

left_padded = data.ljust(10)
print(f"'{left_padded}'")  # Output: 'Hello     '

right_padded = data.rjust(10)
print(f"'{right_padded}'")  # Output: '     Hello'

center_padded = data.center(10)
print(f"'{center_padded}'")  # Output: '  Hello   '
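The padding methods take an optional fill character (a space by default), and zfill() is a sign-aware variant for zero-padding numbers. A sketch that also lines up a small plain-text report from hypothetical rows:

```python
starred = "Hello".center(11, "*")
print(starred)  # ***Hello***

# zfill() pads with zeros and keeps a leading sign in front
print("42".zfill(5))   # 00042
print("-42".zfill(5))  # -0042

# ljust() and rjust() line up columns: names left-aligned, quantities right-aligned
rows = [("apple", 3), ("banana", 12), ("cherry", 7)]
lines = [fruit.ljust(10) + str(qty).rjust(4) for fruit, qty in rows]
for line in lines:
    print(line)
```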

10. Checking for Alphanumeric Characters

Checking whether a string contains only alphanumeric characters, only digits, or only letters.

data = "Hello123"

is_alphanumeric = data.isalnum()
print(is_alphanumeric)  # Output: True

is_digit = data.isdigit()
print(is_digit)  # Output: False

is_alpha = data.isalpha()
print(is_alpha)  # Output: False
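isdigit() is stricter than it may look when validating numeric fields: signs, decimal points, surrounding whitespace, and the empty string all make it return False. The helper is_number below is a hypothetical sketch that tries float() instead; note that float() also accepts strings such as "nan" and "inf", so add checks if those should be rejected.

```python
values = ["42", "-7", "3.14", "", " 8 "]

for value in values:
    print(repr(value), value.isdigit())
# '42' True
# '-7' False
# '3.14' False
# '' False
# ' 8 ' False

def is_number(text):
    """Hypothetical helper: True if text parses as a float."""
    try:
        float(text)  # float() ignores surrounding whitespace
    except ValueError:
        return False
    return True

results = [is_number(v) for v in values]
print(results)  # [True, True, True, False, True]
```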

11. String Concatenation in Python

Python provides several ways to combine strings, variables, and numbers into a single string:

1. Using the + operator:

This is the most direct approach for joining a few strings.

name = "Alice"
age = 30
greeting = "Hello, " + name + "! You are " + str(age) + " years old."
print(greeting)  # Output: Hello, Alice! You are 30 years old.

Explanation:

  • The + operator is used to concatenate strings.
  • The integer age is converted with str(age) first, because + only joins a str with another str; adding an int directly raises a TypeError.
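The str(age) conversion is needed because + never converts types implicitly. The sketch below shows the resulting error and the join() idiom that is generally recommended when combining many pieces, since it avoids building a chain of intermediate strings (the word list is made up):

```python
name = "Alice"
age = 30

# Mixing str and int with + raises TypeError instead of converting automatically
try:
    greeting = "Hello, " + name + "! You are " + age + " years old."
except TypeError as exc:
    print("TypeError:", exc)

# For many pieces, join them in one step
words = ["data", "cleaning", "with", "python"]
sentence = " ".join(words)
print(sentence)  # data cleaning with python
```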

2. Using f-strings (Python 3.6+):

f-strings offer a cleaner and more readable way to embed variables within strings.

name = "Alice"
age = 30
greeting = f"Hello, {name}! You are {age} years old."
print(greeting)  # Output: Hello, Alice! You are 30 years old.

Explanation:

  • Curly braces {} are used to indicate places where variables should be inserted.
  • Any expression can go inside the braces: a variable name, arithmetic, or a method call.
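Because the braces hold expressions, small transformations can happen inline. Doubled braces produce literal brace characters. A sketch with invented values:

```python
name = "alice"
items = [3, 4, 5]

# Method calls and function calls are evaluated inside the braces
summary = f"Hello, {name.title()}! You have {len(items)} items totalling {sum(items)}."
print(summary)  # Hello, Alice! You have 3 items totalling 12.

# {{ and }} produce literal { and } characters
literal = f"{{id}} = {name}"
print(literal)  # {id} = alice
```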

3. Using the .format() method:

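The .format() method predates f-strings and uses the same format-specification syntax. It remains useful when the template string is defined separately from the values, for example stored in a variable and filled in later.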

name = "Alice"
age = 30
greeting = "Hello, {}! You are {} years old.".format(name, age)
print(greeting)  # Output: Hello, Alice! You are 30 years old.

Explanation:

  • The .format() method is called on the base string.
  • Placeholders {} in the string are replaced with the provided arguments.
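Placeholders can also be named or numbered, which lets one template be reused across records. A sketch with hypothetical records stored as dicts:

```python
template = "Hello, {name}! You are {age} years old."
people = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

# ** unpacks each dict into keyword arguments matching the named placeholders
greetings = [template.format(**person) for person in people]
for greeting in greetings:
    print(greeting)
# Hello, Alice! You are 30 years old.
# Hello, Bob! You are 25 years old.

# Numbered placeholders can reuse the same argument
versus = "{0} vs {1}, then {0} again".format("A", "B")
print(versus)  # A vs B, then A again
```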

Use Case Example: Data Cleaning Pipeline

The following example combines strip(), split(), and join() to clean a list of comma-separated records.

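data_list = [
    "  Alice,30,Engineer  ",
    " Bob,25,Data Scientist  ",
    "Charlie, , Doctor",
    "  ,40,Lawyer",
    "David,35,   "
]

cleaned_data_list = []  # cleaned comma-separated strings
cleaned_records = []    # cleaned fields as lists, with None marking missing values

for data in data_list:
    # Strip leading and trailing whitespace from the whole line
    data = data.strip()

    # Split into components on commas
    parts = data.split(',')

    # Strip whitespace from each field
    cleaned_parts = [part.strip() for part in parts]

    # Replace empty fields with None so missing values are explicit
    record = [part if part else None for part in cleaned_parts]
    cleaned_records.append(record)

    # Rejoin the fields, writing missing values as empty strings
    cleaned_data = ",".join(part if part is not None else "" for part in record)
    cleaned_data_list.append(cleaned_data)

print("Cleaned Data List:")
for cleaned_data in cleaned_data_list:
    print(cleaned_data)
# Output:
# Alice,30,Engineer
# Bob,25,Data Scientist
# Charlie,,Doctor
# ,40,Lawyer
# David,35,

print(cleaned_records[2])  # Output: ['Charlie', None, 'Doctor']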
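The pipeline above ends with cleaned strings. If the next step needs structured data, a sketch like the following could turn each line into a dictionary instead. The column names name, age, and occupation and the helper parse_record are assumptions for illustration; for real CSV files with quoted fields, Python's built-in csv module is the safer choice.

```python
FIELDS = ("name", "age", "occupation")  # assumed column names

def parse_record(line):
    """Hypothetical helper: clean one comma-separated line into a dict."""
    # Strip the line, split on commas, strip each field, and turn empty fields into None
    parts = [part.strip() or None for part in line.strip().split(",")]
    record = dict(zip(FIELDS, parts))

    # Convert age to int only when it is a plain digit string
    age = record.get("age")
    if age is not None and age.isdigit():
        record["age"] = int(age)
    return record

data_list = [
    "  Alice,30,Engineer  ",
    " Bob,25,Data Scientist  ",
    "Charlie, , Doctor",
    "  ,40,Lawyer",
    "David,35,   ",
]

records = [parse_record(line) for line in data_list]
for record in records:
    print(record)
# {'name': 'Alice', 'age': 30, 'occupation': 'Engineer'}
# {'name': 'Bob', 'age': 25, 'occupation': 'Data Scientist'}
# {'name': 'Charlie', 'age': None, 'occupation': 'Doctor'}
# {'name': None, 'age': 40, 'occupation': 'Lawyer'}
# {'name': 'David', 'age': 35, 'occupation': None}
```

Using None rather than an empty string for missing fields keeps "no value" distinct from "an empty value" in later steps, such as counting missing ages.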

Discover more from HintsToday

Subscribe to get the latest posts sent to your email.
