A quick reference for date manipulation in PySpark:–

FunctionDescriptionWorks OnExample (Spark SQL)Example (DataFrame API)
to_dateConverts string to date.StringTO_DATE('2024-01-15', 'yyyy-MM-dd')to_date(col("date_str"), "yyyy-MM-dd")
to_timestampConverts string to timestamp.StringTO_TIMESTAMP('2024-01-15 12:34:56', 'yyyy-MM-dd HH:mm:ss')to_timestamp(col("timestamp_str"), "yyyy-MM-dd HH:mm:ss")
date_formatFormats date or timestamp as a string.Date, TimestampDATE_FORMAT(CURRENT_DATE, 'dd-MM-yyyy')date_format(col("date_col"), "dd-MM-yyyy")
current_dateReturns the current date.CURRENT_DATEcurrent_date()
current_timestampReturns the current timestamp.CURRENT_TIMESTAMPcurrent_timestamp()
date_addAdds days to a date.Date, TimestampDATE_ADD('2024-01-15', 10)date_add(col("date_col"), 10)
date_subSubtracts days from a date.Date, TimestampDATE_SUB('2024-01-15', 10)date_sub(col("date_col"), 10)
months_betweenReturns months between two dates or timestamps.Date, TimestampMONTHS_BETWEEN('2024-01-15', '2023-12-15')months_between(col("date1"), col("date2"))
datediffReturns difference in days between two dates/timestamps.Date, TimestampDATEDIFF('2024-01-15', '2024-01-10')datediff(col("date1"), col("date2"))
yearExtracts year from a date or timestamp.Date, TimestampYEAR('2024-01-15')year(col("date_col"))
monthExtracts month from a date or timestamp.Date, TimestampMONTH('2024-01-15')month(col("date_col"))
dayExtracts day from a date or timestamp.Date, TimestampDAY('2024-01-15')day(col("date_col"))
dayofweekReturns the day of the week (1 = Sunday, 7 = Saturday).Date, TimestampDAYOFWEEK('2024-01-15')dayofweek(col("date_col"))
dayofmonthReturns the day of the month (1-31).Date, TimestampDAYOFMONTH('2024-01-15')dayofmonth(col("date_col"))
dayofyearReturns the day of the year (1-366).Date, TimestampDAYOFYEAR('2024-01-15')dayofyear(col("date_col"))
weekofyearReturns the week number of the year.Date, TimestampWEEKOFYEAR('2024-01-15')weekofyear(col("date_col"))
quarterExtracts quarter of the year.Date, TimestampQUARTER('2024-01-15')quarter(col("date_col"))
hourExtracts hour from a timestamp.TimestampHOUR('2024-01-15 12:34:56')hour(col("timestamp_col"))
minuteExtracts minute from a timestamp.TimestampMINUTE('2024-01-15 12:34:56')minute(col("timestamp_col"))
secondExtracts second from a timestamp.TimestampSECOND('2024-01-15 12:34:56')second(col("timestamp_col"))
last_dayReturns the last day of the month for a date.Date, TimestampLAST_DAY('2024-01-15')last_day(col("date_col"))
next_dayReturns the next day of the week after a date.Date, TimestampNEXT_DAY('2024-01-15', 'Sunday')next_day(col("date_col"), "Sunday")
truncTruncates a date to the specified part (e.g., month).Date, TimestampTRUNC('2024-01-15', 'MM')trunc(col("date_col"), "MM")
add_monthsAdds months to a date.Date, TimestampADD_MONTHS('2024-01-15', 2)add_months(col("date_col"), 2)
unix_timestampConverts a string to UNIX timestamp.StringUNIX_TIMESTAMP('2024-01-15', 'yyyy-MM-dd')unix_timestamp(col("date_str"), "yyyy-MM-dd")
from_unixtimeConverts UNIX timestamp to a formatted string.Integer (UNIX timestamp)FROM_UNIXTIME(1673827200, 'yyyy-MM-dd')from_unixtime(col("unix_col"), "yyyy-MM-dd")

Now starts our Post:-


Here’s a comprehensive list of some common PySpark date functions along with detailed explanations and examples on Dataframes(We will again discuss thess basis Pyspark sql Queries):

1. current_date()

Returns the current date.

2. current_timestamp()

Returns the current timestamp.

3. date_format()

Formats a date using the specified format.

4. year(), month(), dayofmonth()

Extracts the year, month, and day from a date.

5. date_add(), date_sub()

Adds or subtracts a specified number of days to/from a date.

6. datediff()

Returns the difference in days between two dates.

7. add_months()

Adds a specified number of months to a date.

8. months_between()

Returns the number of months between two dates.

9. next_day()

Returns the first date which is the specified day of the week after the given date.

10. last_day()

Returns the last day of the month which the given date belongs to.

11. trunc()

Truncates a date to the specified unit (‘year’, ‘month’).

12. date_trunc()

Truncates a timestamp to the specified unit (‘year’, ‘month’, ‘day’, ‘hour’, ‘minute’).

13. from_unixtime()

Converts the number of seconds from the Unix epoch to a string representing the timestamp.

14. unix_timestamp()

Converts a string timestamp to the number of seconds from the Unix epoch.

15. to_date()

Converts a string to a date.

16. to_timestamp()

Converts a string to a timestamp.

17. weekofyear()

Returns the week of the year for a date.

18. quarter()

Returns the quarter of the year for a date.

19. dayofweek()

Returns the day of the week for a date.

Example Project: Combining Multiple Date Functions


Pages: 1 2 3 4 5

Pages ( 1 of 5 ): 1 23 5Next »

Discover more from AI HitsToday

Subscribe to get the latest posts sent to your email.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

About the HintsToday

AI HintsToday is One Stop Adda to learn All about AI, Data, ML, Stat Learning, SAS, SQL, Python, Pyspark. AHT is Future!

Explore the Posts

Latest Comments

Latest posts

Discover more from AI HitsToday

Subscribe now to keep reading and get access to the full archive.

Continue reading