Time series data is omnipresent in fields ranging from finance to engineering, often necessitating a change in the frequency of data points to suit analysis needs. Pandas, a powerful Python data manipulation library, provides a suite of functions ideal for this task. In this article, we’ll delve into resampling methods that condense or expand our dataset to different time frames and aggregate data accordingly. Let’s explore some commands that transform 5-minute interval data into different, more insightful forms.
Resampling to Daily OHLC:
The following command converts granular 5-minute interval data into a daily OHLC (Open, High, Low, Close) format:
df.resample('D').agg({'open':'first', 'high':'max', 'low':'min', 'close':'last'}).dropna()
Breaking Down the Command:
df.resample('D')
: Theresample
method is akin to a groupby operation but for date/time data. Here, ‘D’ stands for ‘day,’ meaning we’re grouping our 5-minute intervals into daily buckets..agg({})
: Short for ‘aggregate,’ theagg
function lets us specify different aggregation operations for each column. We’re passing a dictionary where keys are our column names, and values are the aggregation functions we want to apply.'open':'first'
: For the ‘open’ price of the day, we take the ‘first’ value in each daily bucket, representing the opening price.'high':'max'
: The ‘high’ is the maximum value found in the daily bucket, representing the highest price of the day.'low':'min'
: Conversely, the ‘low’ is the minimum value, depicting the lowest price.'close':'last'
: The ‘close’ price is the ‘last’ value in the bucket, which is the closing price..dropna()
: After resampling and aggregating, we may have days with no data (like weekends for some markets).dropna()
removes these rows, leaving us with complete data sets.
Resampling to Weekly Data:
To gain insights on a weekly basis, consider the following command:
df.resample('W-MON').agg({'close':'mean','volume':'mean'})
Dissecting the Weekly Resample:
df.resample('W-MON')
: This time, ‘W-MON’ resamples the data into weekly chunks starting on Monday.{'close':'mean','volume':'mean'}
: For each week, we calculate the ‘mean’ or average of the ‘close’ prices and the traded ‘volume’. This can provide a sense of the overall trend and activity for the week.
Custom Business Day Resampling:
Business days vary across regions and industries. Here’s how you can create a custom frequency:
cbd = CustomBusinessDay(weekmask='Mon Tue Wed Thu Fri')
df.resample(cbd).mean().head(7)
Custom Business Day Explained:
CustomBusinessDay(weekmask='Mon Tue Wed Thu Fri')
: By creating aCustomBusinessDay
object with a specificweekmask
, we’re defining our own business week. This is especially useful when standard business days don’t apply to the data’s context.df.resample(cbd).mean()
: We apply the resample method using our custom business day frequency and compute the mean for each column..head(7)
: Thehead
method simply shows us the first 7 rows of the resulting DataFrame for a quick glance.
Complete Code For Resampling:
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay
df = pd.read_csv("BTC_USDT-5m.csv",parse_dates=['date'])
df.set_index('date',inplace=True)
# Resample to daily OHLC from the 5-minute data
df.resample('D').agg({'open':'first','high':'max','low':'min','close':'last'}).dropna()
# Resample to weekly data and calculate the mean for 'Close' and 'Volume'
df.resample('W-MON').agg({'close':'mean','volume':'mean'})
cbd = CustomBusinessDay(weekmask='Mon Tue Wed Thu Fri')
df.resample(cbd).mean()
Each of these resampling techniques transforms the data to match the analytical frame needed. Whether you’re looking to identify daily trends, summarize weekly movements, or adhere to a custom business schedule, Pandas provides the tools necessary to reshape time series data for insightful analysis.