Pandas measure elapsed time since a condition
I have the following dataframe:
Time Work
2018-12-01 10:00:00 Off
2018-12-01 10:00:02 On
2018-12-01 10:00:05 On
2018-12-01 10:00:06 On
2018-12-01 10:00:07 On
2018-12-01 10:00:09 Off
2018-12-01 10:00:11 Off
2018-12-01 10:00:14 On
2018-12-01 10:00:16 On
2018-12-01 10:00:18 On
2018-12-01 10:00:20 Off
I would like to create a new column with the elapsed time since the device started working.
Time Work Elapsed Time
2018-12-01 10:00:00 Off 0
2018-12-01 10:00:02 On 2
2018-12-01 10:00:05 On 5
2018-12-01 10:00:06 On 6
2018-12-01 10:00:07 On 7
2018-12-01 10:00:09 Off 0
2018-12-01 10:00:11 Off 0
2018-12-01 10:00:14 On 3
2018-12-01 10:00:16 On 5
2018-12-01 10:00:18 On 7
2018-12-01 10:00:20 Off 0
How can I do it?
python pandas time timedelta
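For reference, a frame like the one above can be rebuilt with something along these lines (the values are copied from the table; the exact construction is an assumption, not part of the original question):
import pandas as pd
df = pd.DataFrame({
    'Time': pd.to_datetime([
        '2018-12-01 10:00:00', '2018-12-01 10:00:02', '2018-12-01 10:00:05',
        '2018-12-01 10:00:06', '2018-12-01 10:00:07', '2018-12-01 10:00:09',
        '2018-12-01 10:00:11', '2018-12-01 10:00:14', '2018-12-01 10:00:16',
        '2018-12-01 10:00:18', '2018-12-01 10:00:20']),
    'Work': ['Off', 'On', 'On', 'On', 'On', 'Off', 'Off', 'On', 'On', 'On', 'Off'],
})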
asked yesterday
Rafael
Welcome to Stack Overflow, Rafael! I definitely came here just because the title seemed amusing, but left learning what Pandas actually means in this context.
– zarose
yesterday
5 Answers
You can use groupby:
# df['Time'] = pd.to_datetime(df['Time'], errors='coerce') # Uncomment if needed.
sec = df['Time'].dt.second
df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0
1 2018-12-01 10:00:02 On 2
2 2018-12-01 10:00:05 On 5
3 2018-12-01 10:00:06 On 6
4 2018-12-01 10:00:07 On 7
5 2018-12-01 10:00:09 Off 0
6 2018-12-01 10:00:11 Off 0
7 2018-12-01 10:00:14 On 3
8 2018-12-01 10:00:16 On 5
9 2018-12-01 10:00:18 On 7
10 2018-12-01 10:00:20 Off 0
The idea is to extract the seconds portion and subtract from it the seconds value at the start of each group, i.e. the last "Off" reading before the device turns "On". This is done using transform and first. cumsum is used to identify the groups:
df.Work.eq('Off').cumsum()
0 1
1 1
2 1
3 1
4 1
5 2
6 3
7 3
8 3
9 3
10 4
Name: Work, dtype: int64
If there's a possibility your device can stay "On" across minute boundaries, initialise sec as:
sec = pd.Series(df['Time'].values.astype(np.int64) // 10e8)  # epoch seconds; wrapped in a Series so .groupby is available
df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0.0
1 2018-12-01 10:00:02 On 2.0
2 2018-12-01 10:00:05 On 5.0
3 2018-12-01 10:00:06 On 6.0
4 2018-12-01 10:00:07 On 7.0
5 2018-12-01 10:00:09 Off 0.0
6 2018-12-01 10:00:11 Off 0.0
7 2018-12-01 10:00:14 On 3.0
8 2018-12-01 10:00:16 On 5.0
9 2018-12-01 10:00:18 On 7.0
10 2018-12-01 10:00:20 Off 0.0
edited yesterday
answered yesterday
coldspeed
@Rafael Yeah, the assumption here is that your row starts in the "Off" condition. Can you append a row at the beginning of your frame?
– coldspeed
yesterday
@Rafael Okay, and regarding your second point, does df['Time'].values.astype(np.int64) // 10e8 work?
– coldspeed
yesterday
The code worked fine for seconds. However, when the first cell of the column Work was 'On', the elapsed time did not begin at zero. Besides, when the time changed to the next minute, the elapsed time was negative. I tried using sec = df['Time'].astype(int) but I got the error: cannot astype a datetimelike from [datetime64[ns]] to [int32];
– Rafael
yesterday
@Rafael Can you read my comments just above yours again, please?
– coldspeed
yesterday
I deleted the comment and posted it again so I could edit it. Regarding your answers: I receive the data every day, and it begins 'On' and ends 'On', so I am not sure if I can append a row, but I will try using the date change as a condition. The code df['Time'].values.astype(np.int64) // 10e8 did work.
– Rafael
yesterday
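Following up on the append-a-row suggestion from the comments, a minimal sketch of prepending a synthetic 'Off' row when the data starts in the 'On' state (this assumes a default integer index and that the first timestamp should count as the start of the initial 'On' run; it is not part of the original answer):
if df['Work'].iloc[0] == 'On':
    first_row = pd.DataFrame({'Time': [df['Time'].iloc[0]], 'Work': ['Off']})
    # Prepend the synthetic row so the leading 'On' run gets its own group starting at elapsed time 0.
    df = pd.concat([first_row, df], ignore_index=True)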
IIUC, first with transform:
(df.Time-df.Time.groupby(df.Work.eq('Off').cumsum()).transform('first')).dt.seconds
Out[1090]:
0 0
1 2
2 5
3 6
4 7
5 0
6 0
7 3
8 5
9 7
10 0
Name: Time, dtype: int64
answered yesterday
W-B
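To store this as the integer column shown in the question, one might assign it back; a sketch (not part of the original answer) that uses .dt.total_seconds() rather than .dt.seconds, since .dt.seconds is only the seconds component and total_seconds() also counts days for longer gaps:
df['Elapsed Time'] = (df.Time - df.Time.groupby(df.Work.eq('Off').cumsum())
                      .transform('first')).dt.total_seconds().astype(int)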
You could use two groupbys. The first calculates the time difference within each group. The second then takes the cumulative sum of those differences within each group.
s = (df.Work=='Off').cumsum()
df['Elapsed Time'] = df.groupby(s).Time.diff().dt.total_seconds().fillna(0).groupby(s).cumsum()
Output
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0.0
1 2018-12-01 10:00:02 On 2.0
2 2018-12-01 10:00:05 On 5.0
3 2018-12-01 10:00:06 On 6.0
4 2018-12-01 10:00:07 On 7.0
5 2018-12-01 10:00:09 Off 0.0
6 2018-12-01 10:00:11 Off 0.0
7 2018-12-01 10:00:14 On 3.0
8 2018-12-01 10:00:16 On 5.0
9 2018-12-01 10:00:18 On 7.0
10 2018-12-01 10:00:20 Off 0.0
answered yesterday
ALollz
The code worked fine. However, when the first work cell of the dataframe was 'On', the elapsed time was not zero.
– Rafael
yesterday
@Rafael Good point. There might be a neat way to fix it in the calculation, but you can just fix it after the fact with df.loc[df.index < s[s==1].idxmax(), 'Elapsed Time'] = 0. I guess there's still an issue if the machine never turns on, but that can be fixed or handled too.
– ALollz
yesterday
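Putting the answer and the fix from the comment together, a sketch: rows before the first 'Off' carry group label 0, so masking on s == 0 is equivalent to the df.index < s[s==1].idxmax() expression above. Zeroing those rows out is an assumption carried over from the comment, not something stated in the question:
s = (df.Work == 'Off').cumsum()
df['Elapsed Time'] = df.groupby(s).Time.diff().dt.total_seconds().fillna(0).groupby(s).cumsum()
# Zero out any leading 'On' run that precedes the first 'Off' row.
df.loc[s == 0, 'Elapsed Time'] = 0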
Using a groupby, you can do this:
df['Elapsed Time'] = (df.groupby(df.Work.eq('Off').cumsum()).Time
                        .transform(lambda x: x.diff()
                                              .dt.total_seconds()
                                              .cumsum())
                        .fillna(0))
>>> df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0.0
1 2018-12-01 10:00:02 On 2.0
2 2018-12-01 10:00:05 On 5.0
3 2018-12-01 10:00:06 On 6.0
4 2018-12-01 10:00:07 On 7.0
5 2018-12-01 10:00:09 Off 0.0
6 2018-12-01 10:00:11 Off 0.0
7 2018-12-01 10:00:14 On 3.0
8 2018-12-01 10:00:16 On 5.0
9 2018-12-01 10:00:18 On 7.0
10 2018-12-01 10:00:20 Off 0.0
answered yesterday
sacul
A numpy slicy approach
# f holds the index of the first row in each 'Off'-delimited group; i maps every row back to its group.
u, f, i = np.unique(df.Work.eq('Off').values.cumsum(), True, True)
t = df.Time.values
df['Elapsed Time'] = t - t[f[i]]  # subtract each group's first timestamp
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 00:00:00
1 2018-12-01 10:00:02 On 00:00:02
2 2018-12-01 10:00:05 On 00:00:05
3 2018-12-01 10:00:06 On 00:00:06
4 2018-12-01 10:00:07 On 00:00:07
5 2018-12-01 10:00:09 Off 00:00:00
6 2018-12-01 10:00:11 Off 00:00:00
7 2018-12-01 10:00:14 On 00:00:03
8 2018-12-01 10:00:16 On 00:00:05
9 2018-12-01 10:00:18 On 00:00:07
10 2018-12-01 10:00:20 Off 00:00:00
We can nail down the integer bit with
df['Elapsed Time'] = (t - t[f[i]]).astype('timedelta64[s]').astype(int)
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0
1 2018-12-01 10:00:02 On 2
2 2018-12-01 10:00:05 On 5
3 2018-12-01 10:00:06 On 6
4 2018-12-01 10:00:07 On 7
5 2018-12-01 10:00:09 Off 0
6 2018-12-01 10:00:11 Off 0
7 2018-12-01 10:00:14 On 3
8 2018-12-01 10:00:16 On 5
9 2018-12-01 10:00:18 On 7
10 2018-12-01 10:00:20 Off 0
answered yesterday
piRSquared
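For readability, the two positional True arguments in that np.unique call are return_index and return_inverse; a sketch spelling them out as keywords, which should behave identically:
u, f, i = np.unique(df.Work.eq('Off').values.cumsum(),
                    return_index=True, return_inverse=True)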