Python – Get the count of activity based on relative delta from today


I have a dataframe like the one below, with about a million rows and many distinct person_id values:

+-----------+------------+----------+
| person_id |    date    | activity |
+-----------+------------+----------+
| A         | 31/03/2022 | Sell     |
| A         | 02/03/2023 | Buy      |
| A         | 29/08/2023 | Buy      |
| A         | 13/05/2023 | Buy      |
| A         | 28/02/2023 | Sell     |
| A         | 02/01/2024 | Sell     |
+-----------+------------+----------+

I want to count each activity within windows that run from the activity date up to “today”, grouped by person_id. The new columns will cover the past 12, 9, 6 and 3 months, separately for buy and sell.

For example, if someone asks how many “buy” activities person A had in the past 3 months (from “today”), the answer should be zero; asked the same about “sell”, the answer should be one.

Here, each window is measured back from the current date, i.e. the system date on the day the script runs.
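
To make the window definition concrete, here is a minimal sketch of how the lookback cutoffs might be derived with pandas. The helper name `window_cutoffs` is hypothetical, and the pinned date 2024-01-15 is only an assumed "today" for illustration; in a real run the default (the system date) would apply.

```python
import pandas as pd


def window_cutoffs(today=None, months=(12, 9, 6, 3)):
    """Return {months: earliest date inside that lookback window}.

    `today` defaults to the system date; passing it explicitly keeps
    the result reproducible (hypothetical helper, not from the post).
    """
    if today is None:
        today = pd.Timestamp.today().normalize()  # midnight of the run date
    else:
        today = pd.Timestamp(today)
    # DateOffset(months=m) steps back by calendar months, like relativedelta
    return {m: today - pd.DateOffset(months=m) for m in months}


# Assumed "today" for illustration only
cutoffs = window_cutoffs("2024-01-15")
```

Calendar-month offsets clamp to the last valid day of the month, so stepping back 3 months from 31 May lands on the last day of February rather than raising an error.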

The output should look like below.

+-----------+------------+----------+---------+--------+--------+--------+----------+---------+---------+---------+
| person_id |    date    | activity | buy_12m | buy_9m | buy_6m | buy_3m | sell_12m | sell_9m | sell_6m | sell_3m |
+-----------+------------+----------+---------+--------+--------+--------+----------+---------+---------+---------+
| A         | 31/03/2022 | Sell     |       3 |      2 |      1 |      0 |        2 |       1 |       1 |       1 |
| A         | 02/03/2023 | Buy      |       3 |      2 |      1 |      0 |        2 |       1 |       1 |       1 |
| A         | 29/08/2023 | Buy      |       3 |      2 |      1 |      0 |        2 |       1 |       1 |       1 |
| A         | 13/05/2023 | Buy      |       3 |      2 |      1 |      0 |        2 |       1 |       1 |       1 |
| A         | 28/02/2023 | Sell     |       3 |      2 |      1 |      0 |        2 |       1 |       1 |       1 |
| A         | 02/01/2024 | Sell     |       3 |      2 |      1 |      0 |        2 |       1 |       1 |       1 |
+-----------+------------+----------+---------+--------+--------+--------+----------+---------+---------+---------+

The output will have the same number of rows as the input; I am effectively “exploding” the group_by results onto every row. I am fine with this duplication because the output feeds another system that expects the same number of rows.
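
One possible way to get per-person counts repeated on every row without joins is `groupby(...).transform("sum")` over a boolean mask. The sketch below is an assumption about how this could be done, not a tested production solution: the function name `add_window_counts` is hypothetical, dates are assumed to be day-first strings as in the sample, window boundaries are assumed to be inclusive, and "today" is pinned to 2024-01-15 only because that date reproduces the sample output.

```python
import pandas as pd


def add_window_counts(df, today=None, months=(12, 9, 6, 3)):
    """Add buy_/sell_ count columns per person, broadcast to every row."""
    if today is None:
        today = pd.Timestamp.today().normalize()  # system date at run time
    else:
        today = pd.Timestamp(today)
    out = df.copy()
    # Assumption: dates are stored as dd/mm/yyyy strings
    dates = pd.to_datetime(out["date"], format="%d/%m/%Y")
    for activity in ("Buy", "Sell"):
        is_activity = out["activity"].eq(activity)
        for m in months:
            cutoff = today - pd.DateOffset(months=m)
            # True where the row is this activity and falls inside the window
            in_window = is_activity & dates.ge(cutoff) & dates.le(today)
            # transform keeps the original row count, so no join is needed
            out[f"{activity.lower()}_{m}m"] = (
                in_window.astype(int).groupby(out["person_id"]).transform("sum")
            )
    return out


df = pd.DataFrame(
    {
        "person_id": ["A"] * 6,
        "date": ["31/03/2022", "02/03/2023", "29/08/2023",
                 "13/05/2023", "28/02/2023", "02/01/2024"],
        "activity": ["Sell", "Buy", "Buy", "Buy", "Sell", "Sell"],
    }
)
# Assumed "today"; omit the argument to use the system date instead
result = add_window_counts(df, today="2024-01-15")
```

This makes eight vectorized passes over the data (two activities times four windows), each a single groupby, which should scale to a million rows far better than row-wise relativedelta arithmetic, though that has not been measured here.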

I have a working version, but it relies on relativedelta plus a lot of joins and group_by calls, so the code is very long and takes a long time to execute.

Thank you for your help 🙂


