Regular expressions (regex) are a powerful tool for matching patterns in text. Python’s re module provides functions and tools for working with regular expressions. Here’s a complete tutorial on using regex in Python. 1. Importing the re Module To use…
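As a quick taste of what the tutorial walks through, here is a minimal sketch of the basic re workflow; the pattern and sample text are illustrative, not taken from the article:

```python
import re

# Compile a pattern once and reuse it; the pattern and text are illustrative examples.
pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # matches ISO-style dates like 2024-01-31

text = "Orders placed on 2024-01-31 and 2024-02-15 were shipped late."
print(pattern.findall(text))          # ['2024-01-31', '2024-02-15']
print(bool(pattern.search(text)))     # True if at least one match exists
print(pattern.sub("<date>", text))    # replace every match with a placeholder
```

Compiling the pattern with re.compile() is optional, but it keeps repeated searches tidy.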
Tutorials
PySpark Projects:- Scenario-Based Complex ETL Projects, Part 1
1. Exploratory Data Analysis (EDA) with Pandas in Banking – Converted to PySpark While searching for a free Pandas project on Google, I found this link – Exploratory Data Analysis (EDA) with Pandas in Banking. I have tried to convert this Python script to PySpark…
String Manipulation on PySpark DataFrames
String manipulation is a common task in data processing. PySpark provides a variety of built-in functions for manipulating string columns in DataFrames. Below, we explore some of the most useful string manipulation functions and demonstrate how to use them with…
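A small, self-contained sketch of the kind of column functions the post demonstrates; the DataFrame, column names, and sample row below are invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("string-manipulation-demo").getOrCreate()

# Illustrative data; real DataFrames with string columns are handled the same way.
df = spark.createDataFrame([(" Alice ", "alice@example.com")], ["name", "email"])

result = df.select(
    F.trim(F.col("name")).alias("name_trimmed"),          # strip surrounding spaces
    F.upper(F.col("name")).alias("name_upper"),            # upper-case the value
    F.length(F.col("email")).alias("email_length"),        # number of characters
    F.regexp_replace("email", "@.*$", "").alias("user"),   # drop the domain part
)
result.show(truncate=False)
```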
Pyspark Dataframe programming – operations, functions, all statements, syntax with Examples
Creating DataFrames in PySpark Creating DataFrames in PySpark is essential for processing large-scale data efficiently. PySpark allows DataFrames to be created from various sources, ranging from manual data entry to structured storage systems. Below are different ways…
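For a flavour of those approaches, here is a hedged sketch showing three common creation paths; the schema, file path, and sample values are placeholders, not from the article:

```python
from pyspark.sql import SparkSession, Row
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("create-dataframes-demo").getOrCreate()

# 1) From an in-memory list of tuples with an explicit schema
schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
])
df_manual = spark.createDataFrame([(1, "Alice"), (2, "Bob")], schema=schema)

# 2) From a CSV file (the path is a placeholder)
df_csv = spark.read.option("header", "true").csv("/tmp/sample_data.csv")

# 3) From an existing RDD of Row objects
rdd = spark.sparkContext.parallelize([Row(id=3, name="Carol")])
df_rdd = spark.createDataFrame(rdd)

df_manual.show()
```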
Python Project Alert:- Dynamic list of variables Creation
Let us go through the project requirement:- 1. Create one or multiple dynamic lists of variables and save them in a dictionary, array, or other data structure for repeated use in Python. Variable names take dynamic forms, for example Month_202401 to…
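One possible reading of requirement 1, sketched minimally: a dictionary keeps the dynamically named lists (Month_202401, Month_202402, …) addressable by name without creating loose variables.

```python
# A minimal sketch: build dynamically named lists (Month_202401 ... Month_202412)
# and keep them in a dictionary for repeated use instead of as loose variables.
dynamic_lists = {}
year = 2024
for month in range(1, 13):
    key = f"Month_{year}{month:02d}"     # e.g. Month_202401 ... Month_202412
    dynamic_lists[key] = []              # start each with an empty list to fill later

dynamic_lists["Month_202401"].append("some value")
print(list(dynamic_lists)[:3])           # ['Month_202401', 'Month_202402', 'Month_202403']
```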
Python Code Execution- Behind the Door- What happens?
I wrote some Python code, or I created a Python script, and it executed successfully. So what does that mean? This is the most basic question an early Python learner can ask! Consider this scenario: I executed a Python script which saves many CSV files to a local…
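The full post unpacks what "executed successfully" actually involves; as a small side illustration (mine, not the article's), the standard dis module shows the bytecode Python compiles a function to before the interpreter runs it:

```python
import dis

def save_report(rows):
    # Toy function standing in for "a script that saves CSVs locally"
    return len(rows)

# Python first compiles the function to bytecode, then the interpreter executes it.
dis.dis(save_report)
```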
Spark SQL Join Types- Syntax Examples, Comparison
Spark SQL supports several types of joins, each suited to different use cases. Below is a detailed explanation of each join type, including syntax examples and comparisons. Types of Joins in Spark SQL Inner Join Left (Outer) Join Right (Outer) Join Full (Outer) Join…
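To make the comparison concrete, here is a minimal PySpark sketch of three of those join types via spark.sql; the emp/dept tables and their values are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join-types-demo").getOrCreate()

# Tiny illustrative tables; names and values are not from the article.
spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"]).createOrReplaceTempView("emp")
spark.createDataFrame([(1, "HR"), (3, "IT")], ["id", "dept"]).createOrReplaceTempView("dept")

# Inner join keeps only ids present on both sides
spark.sql("SELECT e.id, e.name, d.dept FROM emp e INNER JOIN dept d ON e.id = d.id").show()

# Left outer join keeps every emp row, with NULL dept where there is no match
spark.sql("SELECT e.id, e.name, d.dept FROM emp e LEFT OUTER JOIN dept d ON e.id = d.id").show()

# Full outer join keeps unmatched rows from both sides
spark.sql("SELECT e.id, e.name, d.dept FROM emp e FULL OUTER JOIN dept d ON e.id = d.id").show()
```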
Temporary Functions in PL/Sql Vs Spark Sql
Temporary functions allow users to define functions that are session-specific and used to encapsulate reusable logic within a database session. While both PL/SQL and Spark SQL support the concept of user-defined functions, their implementation and usage differ…
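On the Spark SQL side, a session-scoped function can be sketched like this; the mask_name UDF and the people view are hypothetical examples, not the article's code:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("temp-function-demo").getOrCreate()

# Register a session-scoped Python UDF and call it from Spark SQL.
spark.udf.register("mask_name", lambda s: s[0] + "***" if s else s, StringType())

spark.createDataFrame([("Alice",), ("Bob",)], ["name"]).createOrReplaceTempView("people")
spark.sql("SELECT name, mask_name(name) AS masked FROM people").show()
```

The registration lives only for the current SparkSession, which is the closest analogue to a session-specific function in PL/SQL.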
How PySpark automatically optimizes job execution by breaking it down into stages and tasks based on data dependencies – explained with an example
Apache Spark, including PySpark, automatically optimizes job execution by breaking it down into stages and tasks based on data dependencies. This process is facilitated by Spark’s Directed Acyclic Graph (DAG) Scheduler, which helps in optimizing the execution…
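A compact illustration of the idea (my example, not the article's): the narrow transformations below stay in one stage, while the groupBy introduces a shuffle boundary and hence a second stage.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dag-stages-demo").getOrCreate()

# Narrow transformations (filter, withColumn) stay within one stage;
# the groupBy below forces a shuffle, so Spark splits the job into two stages.
df = spark.range(0, 1_000_000)
result = (
    df.filter(F.col("id") % 2 == 0)               # narrow: no data movement
      .withColumn("bucket", F.col("id") % 10)     # narrow: row-local computation
      .groupBy("bucket").count()                  # wide: shuffle -> new stage
)

result.collect()   # the action triggers the DAG Scheduler to build stages and tasks
# The resulting stage/task breakdown can be inspected in the Spark UI for this job.
```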
Understanding Pyspark execution with the help of Logs in Detail
A typical PySpark execution log provides detailed information about the various stages and tasks of a Spark job. These logs are essential for debugging and optimizing Spark applications. Here’s a step-by-step explanation of…
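To reproduce (or quiet) those log lines while experimenting, the driver's log level can be adjusted; a minimal sketch:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-demo").getOrCreate()

# Raise or lower the verbosity of the driver-side logs described in the post.
# Valid levels include ALL, DEBUG, INFO, WARN, ERROR, FATAL, OFF.
spark.sparkContext.setLogLevel("INFO")

# Running an action now emits the usual job/stage/task log lines on the driver.
spark.range(10).count()
```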
PySpark RDDs, a Wonder – Transformations, Actions, and Execution Operations Explained and Listed
RDD (Resilient Distributed Dataset) is the fundamental data structure in Apache Spark. It is an immutable, distributed collection of objects that can be processed in parallel across a cluster of machines. Purpose of RDD Distributed Data Handling: RDDs are designed to…
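A tiny end-to-end sketch of the transformation/action split on an RDD (the values are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([1, 2, 3, 4, 5])          # create an RDD from a local collection

squared = rdd.map(lambda x: x * x)             # transformation: lazily recorded
evens = squared.filter(lambda x: x % 2 == 0)   # another transformation, still lazy

print(evens.collect())                          # action: triggers execution -> [4, 16]
print(squared.reduce(lambda a, b: a + b))       # action: 1 + 4 + 9 + 16 + 25 = 55
```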
Are Dataframes in PySpark Lazy evaluated?
Yes, DataFrames in PySpark are lazily evaluated, similar to RDDs. Lazy evaluation is a key feature of Spark’s processing model, which helps optimize the execution of transformations and actions on large datasets. What is Lazy Evaluation? Lazy evaluation means…
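A short demonstration of that behaviour; nothing below executes until the count() action is called (the DataFrame and filters are illustrative):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lazy-eval-demo").getOrCreate()

df = spark.range(0, 100)

# These transformations only build a logical plan; nothing runs yet.
filtered = df.filter(F.col("id") > 50)
doubled = filtered.withColumn("double_id", F.col("id") * 2)

# Only an action forces Spark to optimize the plan and execute it.
print(doubled.count())        # execution happens here
doubled.explain()             # shows the optimized physical plan Spark produced
```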
Big Data Lake (BDL) Ecosystem – HDFS and Hive Tables
Big Data Lake: Data Storage HDFS is a scalable storage solution designed to handle massive datasets across clusters of machines. Hive tables provide a structured approach for querying and analyzing data stored in HDFS. Understanding how these components work together…
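A hedged sketch of how the two pieces typically meet in PySpark; the HDFS path, database, and table names are placeholders, not from the article:

```python
from pyspark.sql import SparkSession

# enableHiveSupport lets the session read and write Hive tables backed by HDFS.
spark = (
    SparkSession.builder
    .appName("hdfs-hive-demo")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a file directly from HDFS (path is a placeholder).
raw = spark.read.option("header", "true").csv("hdfs:///data/landing/transactions.csv")

# Persist it as a managed Hive table, then query it with SQL.
raw.write.mode("overwrite").saveAsTable("analytics.transactions")
spark.sql("SELECT COUNT(*) AS row_count FROM analytics.transactions").show()
```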
Big Data, Data Warehouse, Data Lakes, Big Data Lake – Explain in simple words
Big data and big data lakes are complementary concepts. Big data refers to the characteristics of the data itself, while a big data lake provides a storage solution for that data. Organizations often leverage big data lakes to store and manage their big data, enabling…
Window functions in Oracle Pl/Sql and Hive explained and compared with examples
Window functions, also known as analytic functions, perform calculations across a set of table rows that are somehow related to the current row. This is different from regular aggregate functions, which aggregate results for the entire set of rows. Both Oracle PL/SQL…
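Although the post compares Oracle PL/SQL and Hive syntax, the same partition-and-rank idea can be sketched quickly in PySpark for orientation; the department/salary data is invented:

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("window-demo").getOrCreate()

df = spark.createDataFrame(
    [("Sales", "Alice", 5000), ("Sales", "Bob", 4000), ("IT", "Carol", 6000)],
    ["dept", "emp", "salary"],
)

# Rank employees within each department, like RANK() OVER (PARTITION BY dept ORDER BY salary DESC)
w = Window.partitionBy("dept").orderBy(F.col("salary").desc())
df.withColumn("salary_rank", F.rank().over(w)).show()
```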
Common Table Expressions (CTEs) in Oracle Pl/Sql / Hive / Spark SQL explained and Compared
Common Table Expressions (CTEs) are a useful feature in SQL for simplifying complex queries and improving readability. Both Oracle PL/SQL and Apache Hive support CTEs, although there may be slight differences in their syntax and usage. Common Table Expressions in…
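On the Spark SQL side, a CTE looks like this minimal sketch (the employees view and the threshold are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cte-demo").getOrCreate()

spark.createDataFrame(
    [("Sales", 5000), ("Sales", 4000), ("IT", 6000)], ["dept", "salary"]
).createOrReplaceTempView("employees")

# WITH defines a named, query-scoped result set that the outer query reuses.
spark.sql("""
    WITH dept_totals AS (
        SELECT dept, SUM(salary) AS total_salary
        FROM employees
        GROUP BY dept
    )
    SELECT dept, total_salary
    FROM dept_totals
    WHERE total_salary > 5000
""").show()
```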
String/Character Manipulation functions in Oracle PL/SQL, Apache Hive
Function Name | Description | Example Usage | Result
CONCAT | Concatenates two strings. | SELECT CONCAT(‘Oracle’, ‘PL/SQL’) FROM dual; | OraclePL/SQL
|| (Concatenation) | Concatenates two strings. | |
LENGTH | Returns the length of a string. | SELECT…
Date and Time manipulation in Oracle SQL, Apache Hive QL, Mysql
Date and Time manipulation in Oracle SQL In Oracle SQL, date and time manipulation is essential for many database operations, ranging from basic date arithmetic to complex formatting and extraction. Here’s a guide covering various common operations you might…
Python input function in Detail- interesting usecases
The input() function in Python is primarily used to take input from the user through the command line. While its most common use is to receive text input, it can be used creatively for various purposes. The input() function in Python The input() function in Python is…
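A small illustrative sketch, with prompts and validation of my own choosing rather than the article's:

```python
# input() always returns a string, so numeric input must be converted explicitly.
name = input("What is your name? ")
age_text = input("How old are you? ")

try:
    age = int(age_text)
    print(f"Hello {name}, next year you will be {age + 1}.")
except ValueError:
    print(f"Hello {name}, '{age_text}' does not look like a number.")
```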
Python Strings Interview Questions
Python Programming Strings Interview Questions Write a Python program to remove a specific character from a string? Here’s a Python program to remove a specific character from a string: def remove_char(text, char): """ Removes a specific character from a string…
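The excerpt cuts off mid-docstring; a runnable completion of what remove_char presumably does (my version, which may differ from the article's exact code):

```python
def remove_char(text, char):
    """Removes every occurrence of a specific character from a string."""
    return text.replace(char, "")

# Example usage
print(remove_char("banana", "a"))   # "bnn"
```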
SAS Date Functions:- DATEPART( ), TIMEPART( ), Hour(), Minute(), Second() Part1
In SAS, the DATEPART() and TIMEPART() functions are used to extract the date and time parts from datetime values, respectively. Here’s how each function works: 1. DATEPART(): The DATEPART() function extracts the date part from a datetime value and returns it as…
Python Programming Projects- Write a python program to create calendar for current year?
There is a simple way – you can use the calendar module in Python to create a calendar for the current year. But that defeats the purpose of getting your hands dirty by writing a big, lengthy Python program. Anyway, I am adding it here:- You can use the calendar…
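For reference, the calendar-module shortcut mentioned above takes only a few lines (a minimal sketch):

```python
import calendar
from datetime import date

# Print a plain-text calendar for the current year using the standard library.
current_year = date.today().year
print(calendar.TextCalendar().formatyear(current_year))
```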
SAS Character Functions, Date Functions
Here’s a table summarizing some common SAS date functions with their syntax and examples, along with a breakdown of some key categories with representative functions, syntax, and examples: 1. Retrieving…
SAS Drop, Keep, Stop, Retain, Point, Rename Statements
SAS First., Last. Syntax and uses with examples
In SAS, the FIRST. and LAST. automatic variables are used within a DATA step to identify the first and last occurrences of observations within a BY group. These variables are particularly useful when working with sorted data or when you need to perform specific…