by lochan2014 | Feb 9, 2025 | Pyspark
In PySpark, DataFrame transformations and operations can be handled efficiently using two main approaches:
1️⃣ PySpark SQL API Programming (Temp Tables / Views)
Each transformation step can be written as a SQL query. Intermediate results can be stored as temporary...
by lochan2014 | Feb 1, 2025 | Python
Solving coding problems efficiently requires a structured approach. Here’s a step-by-step guide along with shortcuts and pseudocode tips.
📌 Step 1: Understand the Problem Clearly
Read the problem statement carefully
Identify: Input format (list, string, integer,...
by lochan2014 | Feb 1, 2025 | Python
What are Iterables?
An iterable is any object that can return an iterator, meaning it can be looped over using for loops or passed to functions like map(), filter(), etc.
🔹 List of Built-in Iterables in Python
Python provides several built-in iterable objects:...
by lochan2014 | Jan 29, 2025 | How To
This post is a collection of handy tricks and snippets.
Passing Parameters in Automation of Scripts using Python
Python provides several ways to pass parameters when automating scripts, mimicking SAS macro variables, macro modules, and macro scripting. Here are some...
by lochan2014 | Jan 7, 2025 | Pyspark, Python
#1. create a sample dataframe
# create a sample dataframe
data = [
    ("Sam", "Sales", 50000),
    ("Ram", "Sales", 60000),
    ("Dan", "Sales", 70000),
    ("Gam", "Marketing", 40000),...
by lochan2014 | Jan 4, 2025 | SQL
What is Indexing?
Indexing is a data structure technique that allows the database to quickly locate and access specific data. It’s similar to the index at the back of a book, which helps you find specific pages quickly.
How Indexing Works
Index Creation: The...
by lochan2014 | Dec 28, 2024 | SQL
Spark SQL Operators Cheatsheet
1. Arithmetic Operators
Operator | Syntax | Description | Example
+ | a + b | Adds two values | SELECT 5 + 3;
- | a - b | Subtracts one value from another | SELECT 5 - 3;
* | a * b | Multiplies two values | SELECT 5 * 3;
/ | a / b | Divides one value by another | SELECT 6 /...
by lochan2014 | Dec 28, 2024 | How To
Syntax Rules for Pseudocode
Natural Language: Use simple and clear natural language to describe steps.
Keywords: Use standard control flow keywords such as:
IF, ELSE, ENDIF
FOR, WHILE, ENDWHILE
FUNCTION, CALL
INPUT, OUTPUT
Indentation: Indent blocks within loops or...
by lochan2014 | Dec 8, 2024 | Pyspark
A quick reference for date manipulation in PySpark:
Function | Description | Works On | Example (Spark SQL) | Example (DataFrame API)
to_date | Converts string to date. | String | TO_DATE('2024-01-15', 'yyyy-MM-dd') | to_date(col("date_str"),...
by lochan2014 | Dec 5, 2024 | Pyspark
Window functions in PySpark allow you to perform operations on a subset of your data using a “window” that defines a range of rows. These functions are similar to SQL window functions and are useful for tasks like ranking, cumulative sums, and moving...
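Since the excerpt compares these to SQL window functions, here is a minimal sketch of the underlying SQL semantics (ranking and a cumulative sum within a partition). It uses Python's built-in sqlite3 module instead of PySpark so that it runs without a Spark session; it assumes the bundled SQLite is version 3.25 or newer. The table name `emp`, the column names, and the sample rows are illustrative assumptions, not taken from the post.

```python
import sqlite3

# Hypothetical sample data: (name, dept, salary)
rows = [
    ("Sam", "Sales", 50000),
    ("Ram", "Sales", 60000),
    ("Dan", "Sales", 70000),
    ("Gam", "Marketing", 40000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

# The window is "all rows in the same dept, ordered by salary descending".
# RANK() numbers rows inside each window; SUM(...) with a ROWS frame
# accumulates salary from the start of the window up to the current row.
query = """
SELECT name, dept, salary,
       RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS salary_rank,
       SUM(salary) OVER (
           PARTITION BY dept ORDER BY salary DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM emp
ORDER BY dept, salary_rank
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

In PySpark the same window would typically be described with `Window.partitionBy("dept").orderBy(...)` and applied through a column's `.over(...)` method; the SQL above only illustrates the partition, ordering, and frame concepts.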