Team AHT, Author at HintsToday

PySpark SQL API Programming- How To, Approaches, Optimization

by Team AHT | Feb 9, 2025 | Pyspark | 1 comment

In PySpark, DataFrame transformations and operations can be efficiently handled using two main approaches: 1️⃣ PySpark SQL API Programming (Temp Tables / Views): Each transformation step can be written as a SQL query. Intermediate results can be stored as temporary...

How the Python interpreter reads and processes a Python script and Memory Management in Python

by Team AHT | Feb 8, 2025 | Python | 0 comments

We assume you have read our post https://www.hintstoday.com/i-did-python-coding-or-i-wrote-a-python-script-and-got-it-exected-so-what-it-means/. If not, kindly go through the link before starting here. How the Python interpreter reads and processes a Python script: The Python...

Lists and Tuples in Python – List and Tuple Comprehension, Usecases

by Team AHT | Feb 5, 2025 | Python | 0 comments

Python Lists: A Comprehensive Guide. What is a List? Lists are a fundamental data structure in Python used to store collections of items. They are: Ordered: Elements maintain a defined sequence. Mutable: Elements can be modified after creation. Defined by: Square...
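The two properties named in the excerpt above (ordered and mutable) can be illustrated with a minimal sketch. The list contents and variable names are made up for this example and do not come from the original post.

```python
# Hypothetical example data, not taken from the original post.
fruits = ["apple", "banana", "cherry"]

# Ordered: elements keep their positions, so indexing is predictable.
first = fruits[0]

# Mutable: an element can be replaced and the list can grow in place.
fruits[1] = "blueberry"
fruits.append("date")

print(first)
print(fruits)
```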

Python ALL Eyes on Strings- String Data Type & For Loop Combined

by Team AHT | Feb 5, 2025 | Python | 0 comments

Here’s a comprehensive Python string function cheat sheet in tabular format:
Function | Syntax | Description | Example | Return Type
capitalize | str.capitalize() | Capitalizes the first character of the string. | "hello".capitalize() →...

How to Solve a Coding Problem in Python? Step to Step Guide?

by Team AHT | Feb 1, 2025 | Python | 0 comments

Solving coding problems efficiently requires a structured approach. Here’s a step-by-step guide along with shortcuts and pseudocode tips. 📌 Step 1: Understand the Problem Clearly. Read the problem statement carefully. Identify: Input format (list, string, integer,...

Python Built-in Iterables: Complete Guide with Use Cases & Challenges

by Team AHT | Feb 1, 2025 | Python | 0 comments

What are Iterables? An iterable is any object that can return an iterator, meaning it can be looped over using for loops or passed to functions like map(), filter(), etc. 🔹 List of Built-in Iterables in Python: Python provides several built-in iterable objects:...
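To make the "can return an iterator" definition concrete, here is a small sketch using the built-in iter() and next() functions. The variable names are illustrative assumptions only.

```python
numbers = [1, 2, 3]  # a list is one of Python's built-in iterables

# iter() asks the iterable for an iterator; next() pulls one item at a time.
iterator = iter(numbers)
first = next(iterator)
rest = list(iterator)  # consumes whatever the iterator has left

# map() relies on the same protocol to walk over the list.
squares = list(map(lambda n: n * n, numbers))

print(first, rest, squares)
```

A for loop performs the same iter()/next() steps behind the scenes and stops when the iterator is exhausted.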

Python Dictionary in detail- Wholesome Tutorial on Dictionaries

by Team AHT | Feb 1, 2025 | Python | 1 comment

What is a Dictionary in Python? First of all, it is not a sequence like a list: items are accessed by key, not by position. It is a mutable collection of key:value pairs (insertion order is preserved since Python 3.7). Keys are always unique, but values need not be unique. You use the key to access the corresponding value....
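A short sketch of the key and value rules described above. The names and figures are hypothetical sample data.

```python
# Hypothetical sample data for illustration.
salaries = {"Sam": 50000, "Ram": 60000}

# Values need not be unique: two keys can map to the same value.
salaries["Dan"] = 50000

# Keys are unique: assigning to an existing key overwrites its value
# instead of creating a second entry.
salaries["Sam"] = 55000

# The key is used to access the corresponding value.
lookup = salaries["Ram"]

print(salaries)
print(lookup)
```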

Automation in Python and Pyspark- Collection of Handy Tricks and Snippets

by Team AHT | Jan 29, 2025 | How To | 0 comments

This post is a collection of handy tricks and snippets. Passing Parameters in Automation of Scripts using Python: Python provides several ways to pass parameters when automating scripts, mimicking SAS macro variables, macro modules, and macro scripting. Here are some...
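One common way to pass parameters to an automated script is the standard-library argparse module. The sketch below is only an illustration: the option names (--run-date, --env) and the job description are assumptions, not taken from the original post, which may use other techniques.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line parameters for a hypothetical daily load job."""
    parser = argparse.ArgumentParser(description="Hypothetical daily load job")
    # --run-date and --env are made-up parameter names for this sketch.
    parser.add_argument("--run-date", required=True, help="Date to process, e.g. 2025-01-29")
    parser.add_argument("--env", default="dev", choices=["dev", "prod"])
    return parser.parse_args(argv)


# Passing an explicit list keeps the example runnable without a real command line.
args = parse_args(["--run-date", "2025-01-29"])
print(args.run_date, args.env)
```

In a real script, calling parse_args() with no argument reads the parameters from the command line instead.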

Python Programming Language Specials

by Team AHT | Jan 11, 2025 | Python | 0 comments

Python is a popular high-level, interpreted programming language known for its readability and ease of use. Python was created by Guido van Rossum and was first released in February 1991. The name Python is inspired by Monty Python’s Flying Circus,...

Useful Code Snippets in Python and Pyspark

by Team AHT | Jan 7, 2025 | Pyspark, Python | 0 comments

#1. Create a sample dataframe
# create a sample dataframe
data = [ ("Sam", "Sales", 50000), ("Ram", "Sales", 60000), ("Dan", "Sales", 70000), ("Gam", "Marketing", 40000),...

What is indexing in SQL- Syntax, Types, Uses, Advantages, Disadvantages, and Scenarios

by Team AHT | Jan 4, 2025 | SQL | 0 comments

What is Indexing? Indexing is a data structure technique that allows the database to quickly locate and access specific data. It’s similar to the index at the back of a book, which helps you find specific pages quickly. How Indexing Works Index Creation: The...
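The idea of creating an index and letting the database use it for lookups can be tried with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table, column, and index names are hypothetical, and SQLite stands in for whichever database the original post discusses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data for illustration.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Sam", "Sales"), ("Ram", "Sales"), ("Gam", "Marketing")],
)

# Index creation: build a lookup structure on the column used for filtering.
conn.execute("CREATE INDEX idx_employees_department ON employees (department)")

# Ask SQLite how it would run the query; the last column of each row describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employees WHERE department = ?",
    ("Sales",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan names the index when SQLite decides to use it; the exact wording of the plan differs between SQLite versions.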

Spark SQL- operators Cheatsheet- Explanation with Usecases

by Team AHT | Dec 28, 2024 | SQL | 0 comments

Spark SQL Operators Cheatsheet
1. Arithmetic Operators
Operator | Syntax | Description | Example
+ | a + b | Adds two values | SELECT 5 + 3;
- | a - b | Subtracts one value from another | SELECT 5 - 3;
* | a * b | Multiplies two values | SELECT 5 * 3;
/ | a / b | Divides one value by another | SELECT 6 /...

How to Write Perfect Pseudocode- Syntax , Standards, Terms

by Team AHT | Dec 28, 2024 | How To | 2 comments

Syntax Rules for Pseudocode. Natural Language: Use simple and clear natural language to describe steps. Keywords: Use standard control flow keywords such as: IF, ELSE, ENDIF; FOR, WHILE, ENDWHILE; FUNCTION, CALL; INPUT, OUTPUT. Indentation: Indent blocks within loops or...

Date and Time Functions- Pyspark Dataframes & Pyspark Sql Queries

by Team AHT | Dec 8, 2024 | Pyspark | 0 comments

A quick reference for date manipulation in PySpark:
Function | Description | Works On | Example (Spark SQL) | Example (DataFrame API)
to_date | Converts string to date. | String | TO_DATE('2024-01-15', 'yyyy-MM-dd') | to_date(col("date_str"),...

Window functions in PySpark on Dataframe programming

by Team AHT | Dec 5, 2024 | Pyspark | 0 comments

Window functions in PySpark allow you to perform operations on a subset of your data using a “window” that defines a range of rows. These functions are similar to SQL window functions and are useful for tasks like ranking, cumulative sums, and moving...

Spark SQL windows Function and Best Usecases

by Team AHT | Nov 25, 2024 | SQL | 1 comment

For a better understanding of Spark SQL window functions and their best use cases, refer to our post "Window functions in Oracle Pl/Sql and Hive explained and compared with examples". Window functions in Spark SQL are powerful tools that allow you to perform calculations across a...

PySpark architecture cheat sheet- How to Know Which parts of your PySpark ETL script are executed on the driver, master (YARN), or executors

by Team AHT | Nov 16, 2024 | Pyspark | 2 comments

PySpark Architecture Cheat Sheet
1. Core Components of PySpark
Component | Description | Key Features
Spark Core | The foundational Spark component for scheduling, memory management, and fault tolerance. | Task scheduling, data partitioning, RDD APIs.
Spark SQL | Enables interaction...

Scientists find a ‘Unique’ Black Hole that is hungrier than ever in the Universe

by Team AHT | Nov 7, 2024 | News | 0 comments

Yup! Scientists find a ‘Unique’ Black Hole that is hungrier than ever in the Universe! Scientists have observed a fascinating phenomenon involving a supermassive black hole, AT2022dsb, which appears to be devouring a star in a “tidal disruption event”...

Quick Spark SQL reference- Spark SQL cheatsheet for Revising in One Go

by Team AHT | Nov 7, 2024 | SQL | 0 comments

Here’s an enhanced Spark SQL cheatsheet with additional details, covering join types, union types, and set operations like EXCEPT and INTERSECT, along with options for table management and data modification (DML operations like UPDATE, INSERT, DELETE, etc.). This comprehensive sheet...

Functions in Spark SQL- Cheatsheets, Complex Examples

by Team AHT | Nov 7, 2024 | SQL | 0 comments

Here’s a categorized Spark SQL function reference, which organizes common Spark SQL functions by functionality. This can help with selecting the right function based on the operation you want to perform.
1. Aggregate Functions
Function | Description | Example
avg() | Calculates...

CRUD in SQL – Create Database, Create Table, Insert, Select, Update, Alter table, Delete

by Team AHT | Nov 6, 2024 | SQL | 3 comments

CRUD stands for Create, Read, Update, and Delete. It’s a set of basic operations that are essential for managing data in a database or any persistent storage system. It refers to the four basic functions that any persistent storage application needs to perform....
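The four operations can be walked through end to end with Python's built-in sqlite3 module. This is a minimal sketch: the users table and its rows are hypothetical, and SQLite is used only because it needs no server setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: insert new rows.
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Asha", "Pune"))
conn.execute("INSERT INTO users (name, city) VALUES (?, ?)", ("Ravi", "Delhi"))

# Read: query the stored rows.
rows_before = conn.execute("SELECT name, city FROM users ORDER BY id").fetchall()

# Update: change a value in an existing row.
conn.execute("UPDATE users SET city = ? WHERE name = ?", ("Mumbai", "Asha"))

# Delete: remove a row.
conn.execute("DELETE FROM users WHERE name = ?", ("Ravi",))

rows_after = conn.execute("SELECT name, city FROM users ORDER BY id").fetchall()
print(rows_before)
print(rows_after)
```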

Pyspark, Spark SQL and Python Pandas- Collection of Various Useful cheatsheets, cheatcodes for revising

by Team AHT | Nov 2, 2024 | Tutorials | 0 comments

Comparative overview of partitions, bucketing, segmentation, and broadcasting in PySpark, Spark SQL, and Hive QL in tabular form, along with examples. Here’s a comparative overview of partitions, bucketing, segmentation, and broadcasting in PySpark, Spark SQL,...

Types of SQL /Spark SQL commands- DDL,DML,DCL,TCL,DQL

by Team AHT | Nov 1, 2024 | SQL | 0 comments

Data Definition Language (DDL) – to define and modify the structure of a database. Data Manipulation Language (DML) – to access, manipulate, and modify data in a database. Data Control Language (DCL) – to control user access to the data in the database...

Python Pandas Series Tutorial- Usecases, Cheatcode Sheet to revise

by Team AHT | Oct 27, 2024 | Python | 0 comments

The pandas Series is a one-dimensional array-like data structure that can store data of any type, including integers, floats, strings, or even Python objects. Each element in a Series is associated with an index label, making it easy to perform data retrieval...

Pandas operations, functions, and use cases ranging from basic operations like filtering, merging, and sorting, to more advanced topics like handling missing data, error handling

by Team AHT | Oct 24, 2024 | Python | 0 comments

This tutorial covers a wide range of pandas operations and advanced concepts with examples that are practical and useful in real-world scenarios. The key topics include: Creating DataFrames and Series from various sources. Checking and changing data types. Looping...

PySpark Projects:- Scenario Based Complex ETL projects Part3

by Team AHT | Oct 22, 2024 | Pyspark | 0 comments

I have divided a big PySpark script into several steps, using steps1 = ''' some code ''' through steps7. I want to execute all these steps one after another, with the option to skip some steps if needed. If any step fails, then the next...

PySpark Projects:- Scenario Based Complex ETL projects Part2

by Team AHT | Oct 22, 2024 | Pyspark | 0 comments

How do you code a complete ETL job in PySpark using only the PySpark SQL API, not the DataFrame-specific API? Here’s an example of a complete ETL (Extract, Transform, Load) job using PySpark SQL API: from pyspark.sql import SparkSession # Create SparkSession spark =...

PySpark Control Statements Vs Python Control Statements- Conditional, Loop, Exception Handling

by Team AHT | Oct 21, 2024 | Pyspark | 0 comments

PySpark supports various control statements to manage the flow of your Spark applications. PySpark supports using Python’s if-elif-else statements, but with limitations. Supported Usage: Conditional statements within PySpark scripts. Controlling flow of Spark...

TroubleShoot Pyspark Issues- Error Handling in Pyspark, Debugging and custom Log table, status table generation in Pyspark

by Team AHT | Oct 20, 2024 | Pyspark | 0 comments

When working with PySpark, there are several common issues that developers face. These issues can arise from different aspects such as memory management, performance bottlenecks, data skewness, configurations, and resource contention. Here’s a guide on troubleshooting...

Pyspark Memory Management, Partition & Join Strategy – Scenario Based Questions

by Team AHT | Oct 11, 2024 | Pyspark | 0 comments

Q1. We are working with large datasets in PySpark, such as joining a 30 GB table with a 1 TB table or applying various transformations to 30 GB of data. We have a limit of 100 cores per user. What is the best configuration and optimization strategy to use in PySpark? Will...