by Team AHT | Jun 16, 2024 | Pyspark |
RDD (Resilient Distributed Dataset) is the fundamental data structure in Apache Spark. It is an immutable, distributed collection of objects that can be processed in parallel across a cluster of machines. Purpose of RDD Distributed Data Handling: RDDs are designed to...
by Team AHT | Jun 16, 2024 | Pyspark |
Yes, DataFrames in PySpark are lazily evaluated, similar to RDDs. Lazy evaluation is a key feature of Spark’s processing model, which helps optimize the execution of transformations and actions on large datasets. What is Lazy Evaluation? Lazy evaluation means...
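As a rough analogy only (this is plain Python, not the Spark API), a generator pipeline shows the same idea: building the pipeline runs nothing, and the work happens only when a result is requested. The names `log` and `double` are hypothetical and exist purely to make the timing visible.

```python
# Plain-Python analogy for lazy evaluation; this is NOT PySpark code.
log = []  # records each time real work is performed


def double(x):
    log.append(x)  # side effect so we can observe when execution happens
    return x * 2


numbers = range(5)

# "Transformations": defining the pipeline does not call double() yet.
pipeline = (double(n) for n in numbers if n % 2 == 0)
calls_before_action = len(log)

# "Action": consuming the pipeline triggers the computation.
result = list(pipeline)
calls_after_action = len(log)

print(calls_before_action, result, calls_after_action)
```

In Spark the roles are played by transformations (which only build a plan) and actions (which execute it); the generator here merely imitates that split.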
by Team AHT | Jun 15, 2024 | Pyspark |
Big Data Lake: Data Storage HDFS is a scalable storage solution designed to handle massive datasets across clusters of machines. Hive tables provide a structured approach for querying and analyzing data stored in HDFS. Understanding how these components work together...
by Team AHT | Jun 15, 2024 | Pyspark |
Big data and big data lakes are complementary concepts. Big data refers to the characteristics of the data itself, while a big data lake provides a storage solution for that data. Organizations often leverage big data lakes to store and manage their big data, enabling...
by Team AHT | Jun 6, 2024 | SQL |
Window functions, also known as analytic functions, perform calculations across a set of table rows that are related to the current row. Unlike regular aggregate functions, which collapse each group of rows into a single result, a window function returns a value for every row. Both Oracle PL/SQL...
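The difference between an aggregate and a window function can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: it runs on SQLite (version 3.25 or later is assumed for window-function support), not on Oracle or Hive, and the `sales` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 20)],
)

# Regular aggregate: the rows of each region collapse into one row.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Window function: every row is kept, with the regional total alongside it.
windowed = conn.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()
conn.close()

print(grouped)
print(windowed)
```

The `OVER (PARTITION BY ...)` clause is what turns `SUM` from a collapsing aggregate into a per-row calculation.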
by Team AHT | Jun 6, 2024 | SQL |
Common Table Expressions (CTEs) are a useful feature in SQL for simplifying complex queries and improving readability. Both Oracle PL/SQL and Apache Hive support CTEs, although there may be slight differences in their syntax and usage. Common Table Expressions in...
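A minimal sketch of a CTE, run through Python's built-in `sqlite3` module. SQLite stands in here for Oracle or Hive, whose syntax may differ in details; the `employees` table, its rows, and the `dept_avg` name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 100), ("Bob", "IT", 60), ("Cid", "HR", 50)],
)

# The WITH clause names an intermediate result (dept_avg) so the main
# query can refer to it like a table instead of nesting a subquery.
query = """
WITH dept_avg AS (
    SELECT dept, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY dept
)
SELECT e.name
FROM employees AS e
JOIN dept_avg AS d ON e.dept = d.dept
WHERE e.salary > d.avg_salary
ORDER BY e.name
"""
above_average = [row[0] for row in conn.execute(query)]
conn.close()

print(above_average)
```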
by Team AHT | Jun 5, 2024 | SQL |
| Function Name | Description | Example Usage | Result |
| --- | --- | --- | --- |
| CONCAT | Concatenates two strings. | SELECT CONCAT('Oracle', 'PL/SQL') FROM dual; | OraclePL/SQL |
| \|\| (Concatenation) | Concatenates two strings. | | |
| LENGTH | Returns the length of a string. | SELECT... | |
by Team AHT | Jun 2, 2024 | SQL |
Date and Time manipulation in Oracle SQL In Oracle SQL, date and time manipulation is essential for many database operations, ranging from basic date arithmetic to complex formatting and extraction. Here’s a guide covering various common operations you might...
by Team AHT | May 14, 2024 | Python |
The input() function in Python is primarily used to take input from the user through the command line. While its most common use is to receive text input, it can be used creatively for various purposes. The input() function in Python The input() function in Python is...
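One common pattern is to wrap `input()` in a small helper that converts and validates the reply. The sketch below is one possible approach; the helper name `ask_for_int` and its `reader` parameter are hypothetical, and `reader` exists only so the function can be exercised without a keyboard.

```python
def ask_for_int(prompt, reader=input):
    """Keep asking until the reply parses as an integer."""
    while True:
        reply = reader(prompt)  # input() always returns a string
        try:
            return int(reply.strip())
        except ValueError:
            print(f"{reply!r} is not a whole number, try again.")
```

Calling `ask_for_int("Enter your age: ")` at a terminal prompts the user and repeats the question until a whole number is typed.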
by Team AHT | May 12, 2024 | Python |
Python Programming Strings Interview Questions Write a Python program to remove a specific character from a string. Here’s a Python program to remove a specific character from a string: def remove_char(text, char): """ Removes a specific...
by Team AHT | May 11, 2024 | SAS |
In SAS, the DATEPART() and TIMEPART() functions are used to extract the date and time parts from datetime values, respectively. Here’s how each function works: 1. DATEPART(): The DATEPART() function extracts the date part from a datetime value and returns it as...
by Team AHT | May 8, 2024 | Python |
There is a simple way: you can use the calendar module in Python to create a calendar for the current year. That defeats the purpose of getting your hands dirty by writing long Python code yourself, but I am adding it here anyway. You can use the calendar...
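A minimal sketch of the calendar-module approach, using only the standard library. Starting the week on Monday is an arbitrary choice made for this example.

```python
import calendar
from datetime import date

# Work out the current year at run time.
year = date.today().year

# TextCalendar.formatyear() returns the whole year as a multi-line string.
year_text = calendar.TextCalendar(firstweekday=calendar.MONDAY).formatyear(year)
print(year_text)
```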
by Team AHT | Apr 30, 2024 | SAS |
Here’s a table summarizing some common SAS List Date functions with their syntax and examples. Here’s a breakdown of some key categories with representative functions, syntax, and examples: 1. Retrieving...
by Team AHT | Apr 30, 2024 | SAS |
by Team AHT | Apr 29, 2024 | SAS |
In SAS, the FIRST. and LAST. automatic variables are used within a DATA step to identify the first and last occurrences of observations within a BY group. These variables are particularly useful when working with sorted data or when you need to perform specific...
by Team AHT | Apr 29, 2024 | SAS |
1. PROC PRINT: Syntax: PROC PRINT [DATA=dataset_name]; [VAR variables;] RUN; Use: Prints the contents of a SAS dataset in a tabular format. You can specify a subset of variables to print using the VAR statement. 2. PROC SORT: Syntax: PROC SORT DATA=dataset_name...
by Team AHT | Apr 29, 2024 | SAS |
The Program Data Vector (PDV) is a critical concept in SAS programming, particularly in the context of the DATA step. It represents the current state of data processing during the execution of a DATA step. Let’s delve into how the SAS PDV works in detail: 1....
by Team AHT | Apr 29, 2024 | SAS |
SAS: Reading and Writing Data – Important Points and Interview Q&A Important Points: Reading Data: SAS offers various tools to read data from different sources: SAS datasets (.sas7bdat): Use the SET statement to read existing SAS datasets. CSV files: Use...
by Team AHT | Apr 28, 2024 | Python |
Welcome to the ultimate guide for mastering data analysis with Python Pandas! Whether you’re new to Pandas or looking to level up your skills, this interactive tutorial will cover everything you need to know to become proficient in data manipulation and analysis...
by Team AHT | Apr 27, 2024 | Python |
What is List? Lists are a fundamental data structure in Python used to store collections of items. They are ordered, meaning elements have a defined sequence, and mutable, allowing you to modify their contents after creation. They are denoted by square brackets [ ],...
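A short illustration of those two properties, ordering and mutability. The list contents are made up for the example.

```python
fruits = ["apple", "banana", "cherry"]

# Ordered: each element has a position, starting at index 0.
first = fruits[0]

# Mutable: elements can be replaced, added, and removed in place.
fruits[1] = "blueberry"   # replace the second element
fruits.append("date")     # add to the end
removed = fruits.pop(0)   # remove and return the first element

print(first, removed, fruits)
```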
by Team AHT | Apr 15, 2024 | SQL |
What is database structure? A database structure is the blueprint that defines how data is arranged, organized, stored, accessed, and managed within a database. It’s the underlying framework that ensures efficient data handling, minimizes redundancy, and...
by Team AHT | Apr 15, 2024 | SQL |
SQL (Structured Query Language) supports various data types to represent different kinds of data. These data types define the format and constraints of the data stored in each column of a table. Here are some common SQL data types: Numeric Types: INT: Integer type,...
by Team AHT | Apr 15, 2024 | SQL |
In this blog post we define the most basic terms in SQL: SQL, data, database, DBMS, and RDBMS. What is SQL? SQL is a language used with relational databases to query or get data out of a database. SQL is also referred to as SEQUEL and is short for its...
by Team AHT | Apr 13, 2024 | Python |
Python control flow statements are constructs that dictate the order in which your program executes. They allow your code to make decisions, repeat tasks conditionally,...
by Team AHT | Apr 13, 2024 | Python |
Functions in Python: Definition. Functions in Python are blocks of code that perform a specific task, and they are defined using the def keyword. Function template: def function_name(input_variables, …): processing return output_value_or_expression...
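As a concrete instance of the template, here is a small sketch. The function name `rectangle_area` and its default argument are hypothetical choices for illustration.

```python
def rectangle_area(width, height=1.0):
    """Return the area of a rectangle; height defaults to 1.0."""
    area = width * height  # processing step
    return area            # output value


square = rectangle_area(3, 3)  # both inputs supplied
strip = rectangle_area(4)      # height falls back to its default

print(square, strip)
```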
by Team AHT | Apr 11, 2024 | Python |
In Python, data types define the type of data that can be stored in variables. Here are the main data types in Python: 1. Numeric Types: int: Integer values (e.g., 5, -3, 1000) float: Floating-point values (e.g., 3.14, -0.001, 2.0) 2. Sequence Types: str: Strings,...
by Team AHT | Apr 11, 2024 | Python |
Python syntax refers to the rules and conventions that dictate how Python code is written and structured. Here are some fundamental aspects of Python syntax: Statements and Indentation: Python uses indentation to define blocks of code, such as loops, conditionals, and...
by Team AHT | Apr 8, 2024 | SQL |
Indexing in SQL is a technique used to improve the performance of queries by creating special data structures (indexes) that allow for faster data retrieval. Indexes are created on one or more columns of a table, and they store the values of those columns in a sorted...
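The effect of an index can be observed with Python's built-in `sqlite3` module. This is a sketch under assumptions: it uses SQLite's `EXPLAIN QUERY PLAN`, whose output wording varies between versions and differs from other databases, and the table, rows, and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO employees (last_name) VALUES (?)",
    [("Smith",), ("Jones",), ("Brown",)],
)

lookup = "SELECT id FROM employees WHERE last_name = 'Jones'"


def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = plan(lookup)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_employees_last_name ON employees (last_name)")
plan_after = plan(lookup)   # the planner can now search via the index
conn.close()

print(plan_before)
print(plan_after)
```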
by Team AHT | Apr 7, 2024 | SQL |
LIKE Operator: The LIKE operator is used to search for a specified pattern in a column. It allows the use of wildcards: % (percent sign): Matches zero or more characters. _ (underscore): Matches any single character. Examples: SELECT * FROM employees WHERE last_name...
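The two wildcards can be tried out with Python's built-in `sqlite3` module. Note the assumptions: the table contents are made up, and SQLite's `LIKE` is case-insensitive for ASCII letters by default, which may not match the behaviour of other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("Smith",), ("Smyth",), ("Stone",), ("Jones",)],
)


def matching(pattern):
    """Return the last names that match a LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT last_name FROM employees WHERE last_name LIKE ? ORDER BY last_name",
        (pattern,),
    )
    return [row[0] for row in rows]


starts_with_s = matching("S%")     # % matches zero or more characters
one_char_gap = matching("Sm_th")   # _ matches exactly one character
conn.close()

print(starts_with_s, one_char_gap)
```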
by Team AHT | Apr 7, 2024 | SQL |
Normalization and denormalization are two opposing database design techniques aimed at achieving different goals. Let’s explore each concept: Normalization: Normalization is the process of organizing the data in a database to minimize redundancy and dependency....