Over-ride one TimeSeries with another
I'd like to over-ride the values of one time series with another. The
input series has values at all points. The over-ride series has the same
index (i.e. dates), but I only want to over-ride the values at some dates.
The way I have thought of specifying this is a second time series that
holds the over-ride value at each date where I want one applied and NaN
where I don't. Perhaps best illustrated with a quick example:
    index       ints  orts  outts
    2013-04-01  1     NaN   1
    2013-05-01  2     11    11
    2013-06-01  3     NaN   3
    2013-07-01  4     9     9
    2013-08-01  2     97    97
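To make the example reproducible, here is a minimal sketch that reads the table above into a pandas DataFrame and checks the rule it encodes. The names `table`, `df` and `has_override` are illustrative only, not part of the original question.

```python
import io

import pandas as pd

# The example table, pasted as whitespace-separated text
table = """\
index       ints  orts  outts
2013-04-01  1     NaN   1
2013-05-01  2     11    11
2013-06-01  3     NaN   3
2013-07-01  4     9     9
2013-08-01  2     97    97
"""

# First column becomes a DatetimeIndex; "NaN" is read as a missing value
df = pd.read_csv(io.StringIO(table), sep=r"\s+", index_col=0, parse_dates=True)

# The rule the table encodes: outts equals orts wherever orts has a value,
# and falls back to ints wherever orts is NaN
has_override = df["orts"].notnull()
assert (df.loc[has_override, "outts"] == df.loc[has_override, "orts"]).all()
assert (df.loc[~has_override, "outts"] == df.loc[~has_override, "ints"]).all()
print(df)
```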
As you can see from the example, I don't think the replace or where
methods would work, as the replacement values depend on index location
rather than on the input values. Because I want to do this more than once,
I've put it in a function, and I do have a solution that works, shown
below:
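import pandas as pd

def overridets(ts, orts):
    # Align the two series side by side as columns 0 (input) and 1 (over-ride)
    tmp = pd.concat([ts, orts], join='outer', axis=1)
    # Row by row: keep the input value where the over-ride is NaN
    out = tmp.apply(lambda x: x.iloc[0] if pd.isnull(x.iloc[1]) else x.iloc[1], axis=1)
    return out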
The issue is that this runs relatively slowly: 20-30 ms for a 500-point
series in my environment. Multiplying two 500-point series takes ~200 µs,
so we're talking about roughly 100 times slower. Any suggestions on how to
pick up the pace?
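One possible direction, sketched under the assumption that `ts` and `orts` share the same DatetimeIndex as in the example: replace the row-wise `apply` with a vectorised call. `Series.combine_first` and `Series.where` are real pandas methods; the wrapper name `overridets_fast`, the 500-point test data, and the every-third-date over-ride pattern are hypothetical. When the condition and `other` passed to `where` are Series, pandas aligns them on the index, so the replacement is chosen by date rather than by input value. The harness checks that both versions agree before timing them with `timeit`; actual timings will depend on the environment.

```python
import timeit

import numpy as np
import pandas as pd


def overridets(ts, orts):
    # Row-wise version from the question (iloc for positional access)
    tmp = pd.concat([ts, orts], join="outer", axis=1)
    return tmp.apply(lambda x: x.iloc[0] if pd.isnull(x.iloc[1]) else x.iloc[1], axis=1)


def overridets_fast(ts, orts):
    # Hypothetical vectorised replacement: take orts where it has a value,
    # fall back to ts where orts is NaN. combine_first aligns on the index
    # and returns the union of both indexes, like join='outer'.
    return orts.combine_first(ts)


# The example from the question (month-start dates, April to August 2013)
idx = pd.date_range("2013-04-01", periods=5, freq="MS")
ints = pd.Series([1, 2, 3, 4, 2], index=idx)
orts = pd.Series([np.nan, 11, np.nan, 9, 97], index=idx)
print(overridets_fast(ints, orts).tolist())  # [1.0, 11.0, 3.0, 9.0, 97.0]

# Same result with where: keep ints where orts is NaN, else use orts
print(ints.where(orts.isnull(), orts).tolist())  # [1.0, 11.0, 3.0, 9.0, 97.0]

# Hypothetical 500-point series with an over-ride on every third date
n = 500
big_idx = pd.date_range("2000-01-01", periods=n, freq="D")
big_ts = pd.Series(np.arange(n, dtype=float), index=big_idx)
big_orts = pd.Series(np.nan, index=big_idx)
big_orts.iloc[::3] = -1.0

# Both versions must agree before comparing their speed
assert overridets(big_ts, big_orts).equals(overridets_fast(big_ts, big_orts))

for func in (overridets, overridets_fast):
    secs = timeit.timeit(lambda: func(big_ts, big_orts), number=20) / 20
    print(f"{func.__name__}: {secs * 1e3:.3f} ms per call")
```

`combine_first` keeps the outer-join behaviour of the original `pd.concat`, while `ts.where(orts.isnull(), orts)` returns a result on `ts`'s index only; with identical indexes the two are interchangeable.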