Tutorials Archives - Page 8 of 16

PySpark RDDs, a Wonder: Transformations, Actions and Execution Operations, Explained and Listed

by lochan2014 | Jun 16, 2024 | Pyspark

RDD (Resilient Distributed Dataset) is the fundamental data structure in Apache Spark. It is an immutable, distributed collection of objects that can be processed in parallel across a cluster of machines.

Purpose of RDD

Distributed Data Handling: RDDs are designed to...
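The split between transformations and actions can be illustrated without a cluster. The sketch below uses a hypothetical, single-process ToyRDD class (not part of PySpark). Transformations such as map and filter only record a lineage and return a new object, leaving the parent untouched. Actions such as collect and count then replay that lineage partition by partition. In real PySpark the same chain would start from SparkContext.parallelize and use RDD.filter, RDD.map and RDD.collect.

```python
class ToyRDD:
    """Toy, single-process model of an RDD (hypothetical; not Spark's API).

    Data lives in partitions, transformations only record lineage, and
    actions replay that lineage on every partition to produce a result.
    Rough PySpark equivalent of the usage below:
        sc.parallelize(range(1, 11), 3).filter(...).map(...).collect()
    """

    def __init__(self, partitions, lineage=()):
        self._partitions = [list(p) for p in partitions]  # source data, never mutated
        self._lineage = tuple(lineage)  # recorded (name, function) steps

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split data into contiguous chunks, one per partition
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls(data[i:i + size] for i in range(0, len(data), size))

    # Transformations: each returns a NEW ToyRDD; the parent is left unchanged
    def map(self, f):
        return ToyRDD(self._partitions, self._lineage + (("map", f),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def lineage(self):
        return [name for name, _ in self._lineage]

    def _compute(self, part):
        # Replay the recorded steps on one partition
        for name, f in self._lineage:
            part = [f(x) for x in part] if name == "map" else [x for x in part if f(x)]
        return part

    # Actions: these are the only methods that actually touch the data
    def collect(self):
        return [x for p in self._partitions for x in self._compute(p)]

    def count(self):
        return sum(len(self._compute(p)) for p in self._partitions)


nums = ToyRDD.parallelize(list(range(1, 11)), num_partitions=3)
evens_squared = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)  # nothing computed yet
print(evens_squared.lineage())  # ['filter', 'map']
print(evens_squared.collect())  # [4, 16, 36, 64, 100]
print(evens_squared.count())    # 5
print(nums.count())             # 10: the parent RDD is unchanged
```

Because each transformation returns a new object that carries its full lineage, a lost partition could in principle be recomputed from the source data. That is the idea behind the "resilient" in RDD, shown here in miniature.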

Are DataFrames in PySpark Lazily Evaluated?

by lochan2014 | Jun 16, 2024 | Pyspark

Yes, DataFrames in PySpark are lazily evaluated, just like RDDs. Lazy evaluation is a key feature of Spark’s processing model: it lets Spark optimize the execution of transformations and actions on large datasets.

What Is Lazy Evaluation?

Lazy evaluation means...
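Why laziness enables optimization is easier to see in a small model. The sketch below is a hypothetical ToyLazyFrame over a list of dict rows, not PySpark's DataFrame. Its where() and select() calls only build a plan. When the collect() action runs, the plan is first rewritten so that adjacent filters are fused into one predicate. This single toy rewrite stands in for the much richer optimizations Spark's Catalyst optimizer performs. A counter shows that no predicate runs until the action.

```python
class ToyLazyFrame:
    """Hypothetical lazily evaluated table of dict rows (not PySpark's DataFrame).

    where() and select() only append steps to a plan; nothing runs until
    collect(), which first fuses adjacent where() steps into one predicate.
    """

    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = tuple(plan)

    def where(self, pred):
        return ToyLazyFrame(self._rows, self._plan + (("where", pred),))

    def select(self, *cols):
        return ToyLazyFrame(self._rows, self._plan + (("select", cols),))

    def _optimized_plan(self):
        # Toy optimization: merge consecutive where() steps so each row is
        # tested by a single combined predicate in one pass.
        optimized = []
        for kind, arg in self._plan:
            if kind == "where" and optimized and optimized[-1][0] == "where":
                prev = optimized[-1][1]
                optimized[-1] = ("where", lambda r, p=prev, q=arg: p(r) and q(r))
            else:
                optimized.append((kind, arg))
        return optimized

    def explain(self):
        # Shows the optimized plan without executing anything
        return [kind for kind, _ in self._optimized_plan()]

    def collect(self):
        # The "action": only here is the data actually processed
        rows = self._rows
        for kind, arg in self._optimized_plan():
            if kind == "where":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


calls = {"n": 0}


def is_adult(row):
    calls["n"] += 1
    return row["age"] >= 18


people = [{"name": "Ana", "age": 34, "city": "Pune"},
          {"name": "Raj", "age": 15, "city": "Delhi"},
          {"name": "Mei", "age": 41, "city": "Delhi"}]

df = (ToyLazyFrame(people)
      .where(is_adult)
      .where(lambda r: r["city"] == "Delhi")
      .select("name"))
print(calls["n"])    # 0: building the plan ran nothing
print(df.explain())  # ['where', 'select']: the two filters were fused
print(df.collect())  # [{'name': 'Mei'}]
print(calls["n"])    # 3: is_adult ran once per row during collect()
```

In PySpark itself, the analogous calls are DataFrame.where (an alias of filter), DataFrame.select and DataFrame.explain, which prints the plans Spark has built without running the job.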

BDL Ecosystem: HDFS and Hive Tables

by lochan2014 | Jun 15, 2024 | Pyspark

Big Data Lake: Data Storage

HDFS is a scalable storage solution designed to handle massive datasets across clusters of machines. Hive tables provide a structured approach for querying and analyzing data stored in HDFS. Understanding how these components work together...
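To make the storage side concrete, the sketch below works through two things. First, the block arithmetic HDFS applies when it stores a file. Second, the directory layout a managed, partitioned Hive table typically gets. It assumes the widely used defaults of a 128 MB block size (dfs.blocksize), a replication factor of 3 (dfs.replication) and a Hive warehouse directory of /user/hive/warehouse. Real clusters often change these values. Both helper functions and the sales/orders names are hypothetical.

```python
# Assumed HDFS defaults (dfs.blocksize = 128 MB, dfs.replication = 3);
# many clusters override these in hdfs-site.xml.
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB
DEFAULT_REPLICATION = 3


def hdfs_block_layout(file_size, block_size=DEFAULT_BLOCK_SIZE,
                      replication=DEFAULT_REPLICATION):
    """Hypothetical helper: how one file of file_size bytes is split into HDFS blocks."""
    num_blocks = -(-file_size // block_size)  # ceiling division; 0 for an empty file
    last_block = file_size - (num_blocks - 1) * block_size if num_blocks else 0
    return {
        "blocks": num_blocks,
        "last_block_bytes": last_block,  # the final block only holds the leftover bytes
        "block_replicas": num_blocks * replication,  # copies spread across DataNodes
        "raw_storage_bytes": file_size * replication,
    }


def hive_partition_path(db, table, partitions, warehouse="/user/hive/warehouse"):
    """Hypothetical helper: typical directory of a managed, partitioned Hive table."""
    # Tables in the 'default' database sit directly under the warehouse directory;
    # other databases get their own '<db>.db' subdirectory.
    base = f"{warehouse}/{table}" if db == "default" else f"{warehouse}/{db}.db/{table}"
    # Each partition column becomes a 'key=value' subdirectory, in declaration order.
    suffix = "/".join(f"{key}={value}" for key, value in partitions)
    return f"{base}/{suffix}" if suffix else base


layout = hdfs_block_layout(300 * MB)
print(layout["blocks"], layout["last_block_bytes"] // MB, layout["raw_storage_bytes"] // MB)
# 3 44 900
print(hive_partition_path("sales", "orders", [("dt", "2024-06-15"), ("region", "apac")]))
# /user/hive/warehouse/sales.db/orders/dt=2024-06-15/region=apac
```

The partition directories are what let Hive (and Spark reading the same table) skip whole folders when a query filters on dt or region. The block layout is what lets HDFS spread a single large file across many machines.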

Big Data, Data Warehouses, Data Lakes and Big Data Lakes: Explained in Simple Words

by lochan2014 | Jun 15, 2024 | Pyspark

Big data and big data lakes are complementary concepts. Big data refers to the characteristics of the data itself, while a big data lake provides a storage solution for that data. Organizations often leverage big data lakes to store and manage their big data, enabling...

PySpark: Introduction, Components, Comparison With Hadoop and Architecture (Driver and Executor)

by lochan2014 | Aug 29, 2024 | Pyspark

PySpark is a powerful Python API for Apache Spark, a distributed computing framework that enables large-scale data processing.

Spark History

Spark was started by Matei Zaharia at UC Berkeley’s AMPLab in 2009 and open-sourced in 2010 under a BSD...

SAS Date Functions: DATEPART(), TIMEPART(), HOUR(), MINUTE(), SECOND() (Part 1)

by lochan2014 | May 11, 2024 | SAS

In SAS, the DATEPART() and TIMEPART() functions extract the date part and the time part, respectively, from datetime values. Here’s how each function works:

1. DATEPART(): The DATEPART() function extracts the date part from a datetime value and returns it as...
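SAS stores a datetime as the number of seconds since midnight on 1 January 1960, a date as the number of days since that same day, and a time as seconds since midnight. The Python sketch below mimics DATEPART(), TIMEPART(), HOUR(), MINUTE() and SECOND() on those numeric representations so the underlying arithmetic is visible. The function names are hypothetical stand-ins, not a SAS interface. The sketch also assumes values on or after the 1960 epoch.

```python
from datetime import datetime, timedelta

# SAS counts datetimes in seconds and dates in days from midnight, 1 January 1960.
SAS_EPOCH = datetime(1960, 1, 1)
SECONDS_PER_DAY = 86_400


def to_sas_datetime(dt):
    """Seconds since 1960-01-01 00:00:00 (how SAS stores a datetime value)."""
    return (dt - SAS_EPOCH).total_seconds()


def datepart(sas_datetime):
    """Mimics DATEPART(): whole days since 1960-01-01, i.e. a SAS date value."""
    return int(sas_datetime // SECONDS_PER_DAY)


def timepart(sas_datetime):
    """Mimics TIMEPART(): seconds since midnight, i.e. a SAS time value."""
    return sas_datetime % SECONDS_PER_DAY


# These three expect a SAS time value here (SAS's own HOUR(), MINUTE() and
# SECOND() also accept datetime values directly).
def hour(sas_time):
    return int(sas_time // 3600)


def minute(sas_time):
    return int(sas_time % 3600 // 60)


def second(sas_time):
    return sas_time % 60  # may keep fractional seconds


def from_sas_date(sas_date):
    """Turn a SAS date value back into a Python date for display."""
    return (SAS_EPOCH + timedelta(days=sas_date)).date()


dt_value = to_sas_datetime(datetime(2024, 5, 11, 14, 30, 45))
d, t = datepart(dt_value), timepart(dt_value)
print(d, from_sas_date(d))               # 23507 2024-05-11
print(t, hour(t), minute(t), second(t))  # 52245.0 14 30 45.0
```

In SAS the date value 23507 would normally be displayed through a format such as DATE9., which is why DATEPART() output usually needs a FORMAT statement before it reads as a calendar date.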


