I need to compute some Hospital Readmission Variables using Python.
The input columns needed to compute them are:
- ID: Patient ID
- Date Admission: Date of entrance to hospital
- Date Discharge: Date of exit from hospital
- Diagnosis: type of disease the person has
- Hospital: ID of the hospital
I would need to compute the following metrics:
- Simple Readmission:
Compute variables for different periods: 3, 7, 14, 30 and 45 days after discharge from the index admission.
Rule: readmission is possible the day after discharge from the index admission at the earliest. This means that if a patient is discharged and admitted on the same day, it is considered a transfer and not a readmission.
I was able to calculate this by using:
import pandas as pd

# Parse the day-first dates before sorting, so stays are ordered chronologically rather than as strings
date_cols = ['Date_Admission', 'Date_Discharge']
df[date_cols] = df[date_cols].apply(pd.to_datetime, dayfirst=True)
df = df.sort_values(['ID', 'Date_Admission'])
# Days between each admission and the same patient's previous discharge (NaN for a first stay)
gap = (df['Date_Admission'] - df.groupby('ID')['Date_Discharge'].shift()).dt.days
for n in (3, 7, 14, 30, 45):
    # A gap of at least 1 day is required, so same-day transfers are not counted
    df[f'Readmit{n}'] = gap.between(1, n).astype(int)
However, I am facing difficulties with:
- Hospital Readmission:
Variable indicating whether the readmission is to the same hospital or not (yes/no), within 3, 7, 14, 30 and 45 days from discharge from the index admission.
- Variable indicating whether the readmission is to one of a list of specific hospitals. List Hospital: [3, 34]. These hospitals are not in the snippet; they are just a toy example.
- Variable indicating whether the readmission is with the same diagnosis or not (yes/no), within 3, 7, 14, 30 and 45 days from discharge from the index admission.
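One possible reading of these three variables, as a rough sketch only and on made-up rows rather than the data below. It assumes each flag is stored on the readmission row (not on the index admission), that "readmission to a list of specific hospitals" means the readmitting hospital is in the list, and that a gap of 0 days is a transfer. The frame `toy` and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows, NOT the real data: two patients, three stays each.
toy = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'Date_Admission': ['01/01/20', '05/01/20', '20/01/20', '01/02/20', '03/02/20', '01/04/20'],
    'Date_Discharge': ['03/01/20', '08/01/20', '25/01/20', '03/02/20', '10/02/20', '05/04/20'],
    'Diagnosis_Primary': [650, 650, 431, 4848, 4848, 4848],
    'Hospital_ID': [1, 3, 3, 34, 34, 2],
})
for col in ['Date_Admission', 'Date_Discharge']:
    toy[col] = pd.to_datetime(toy[col], format='%d/%m/%y')
toy = toy.sort_values(['ID', 'Date_Admission']).reset_index(drop=True)

# Values from the previous stay of the same patient (missing on a first stay).
prev = toy.groupby('ID')[['Date_Discharge', 'Hospital_ID', 'Diagnosis_Primary']].shift()
gap = (toy['Date_Admission'] - prev['Date_Discharge']).dt.days

hospital_list = [3, 34]  # "List Hospital" from the description
same_hosp = toy['Hospital_ID'].eq(prev['Hospital_ID'])
same_diag = toy['Diagnosis_Primary'].eq(prev['Diagnosis_Primary'])
in_list = toy['Hospital_ID'].isin(hospital_list)

for n in (3, 7, 14, 30, 45):
    readmit = gap.between(1, n)  # gap of 0 days = transfer, not readmission
    toy[f'Readmit{n}_SameHosp'] = (readmit & same_hosp).astype(int)
    toy[f'Readmit{n}_ListHosp'] = (readmit & in_list).astype(int)
    toy[f'Readmit{n}_SameDiag'] = (readmit & same_diag).astype(int)
```

In the toy rows, patient 2's second stay starts on the day of the first discharge, so it is treated as a transfer and gets 0 in every column; the third stay starts more than 45 days after the previous discharge, so it is also 0 throughout.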
Here is the data snippet:
data = {'Date_Admission': ['19/04/20', '20/02/20', '06/04/20', '11/03/20', '11/04/20', '13/05/20', '10/01/20', '16/04/20', '08/02/20', '21/05/20', '06/04/20', '03/01/20', '15/05/20', '04/04/20', '13/01/20', '11/05/20', '19/02/20', '25/02/20', '14/05/20', '07/02/20', '14/03/20', '03/01/20', '14/02/20', '12/02/20', '09/05/20', '19/01/20', '07/04/20', '27/04/20', '14/05/20', '09/02/20', '23/03/20', '22/04/20', '14/02/20', '10/01/20', '05/03/20', '14/01/20', '04/04/20', '04/05/20', '22/05/20', '24/01/20', '11/02/20', '28/03/20', '03/05/20', '15/05/20', '02/01/20', '20/02/20', '13/01/20', '31/03/20', '16/04/20', '27/02/20', '10/02/20', '22/03/20', '15/05/20', '06/02/20', '05/04/20', '26/01/20', '28/05/20', '11/05/20', '29/04/20', '21/04/20', '13/01/20', '10/01/20', '27/05/20', '28/03/20', '27/01/20', '15/01/20', '16/03/20', '20/04/20', '10/03/20', '26/04/20', '28/01/20', '27/01/20', '26/04/20', '07/01/20', '28/04/20', '01/02/20', '18/02/20', '06/02/20', '18/03/20', '21/02/20', '01/04/20', '20/05/20', '03/02/20', '25/01/20', '23/03/20', '06/04/20', '13/05/20', '15/02/20', '20/02/20', '27/04/20', '02/03/20', '10/03/20', '19/05/20', '01/02/20', '26/05/20', '12/03/20', '17/02/20', '15/04/20'],
'Date_Discharge': ['23/04/20', '25/02/20', '15/04/20', '04/04/20', '24/04/20', '15/05/20', '15/01/20', '25/04/20', '27/02/20', '24/05/20', '11/04/20', '05/01/20', '20/10/20', '08/04/20', '18/01/20', '11/05/20', '04/03/20', '03/03/20', '22/05/20', '29/02/20', '16/03/20', '07/01/20', '17/02/20', '13/03/20', '22/05/20', '22/01/20', '27/04/20', '19/05/20', '20/05/20', '15/02/20', '04/06/20', '30/04/20', '19/02/20', '16/01/20', '10/03/20', '20/01/20', '16/04/20', '18/05/20', '08/06/20', '29/01/20', '16/02/20', '01/04/20', '22/05/20', '23/05/20', '08/01/20', '20/02/20', '20/01/20', '10/04/20', '27/04/20', '85406', '13/02/20', '25/03/20', '28/05/20', '12/02/20', '20/04/20', '4848', '04/02/20', '19/06/20', '13/05/20', '581', '29/04/20', '03/05/20', '29532', '17/01/20', '01/02/20', '5849', '11/04/20', '42979', '22/01/20', '17/03/20', '4280', '11/03/20', '01/05/20', '40211', '06/02/20', '5400', '29/04/20', '29663', '06/05/20', '78039', '17/03/20', '51881', '24/05/20', '42781', '19/03/20', '10/04/20', '9962', '29/05/20', '18/02/20', '24/02/20', '29/04/20', '06/03/20', '17/03/20', '500', '30/05/20', '05/02/20', '27/05/20', '25/03/20', '22/02/20', '05/05/20'],
'Diagnosis_Primary': [65421, 51881, 1889, 431, 431, 85400, 56081, 56211, 1912, 650, 1911, 5409, 51882, 650, 78609, 49301, 82321, 5119, 4111, 82020, 650, 30183, 41071, 9962, 28860, 650, 4848, 1398, 51881, 5111, 4848, 5789, 29690, 485, 5852, 419, 8244, 7994, 29020, 51881, 51881, 650, 43401, 4373, 80841, 5856, 1628, 1961, 1961, 85406, 4289, 40211, 82020, 46611, 4848, 81200, 1890, 591, 66981, 29532, 30502, 82001, 5849, 5168, 42979, 5609, 632, 4280, 60820, 5609, 40211, 5400, 650, 29663, 29642, 78039, 431, 51881, 42781, 51881, 650, 51881, 56089, 5118, 85220, 8832, 4848],
'ID': [4, 16, 25, 42, 42, 50, 60, 64, 65, 67, 72, 77, 96, 101, 112, 116, 124, 146, 146, 154, 160, 161, 184, 185, 189, 192, 201, 201, 215, 234, 240, 248, 267, 286, 286, 292, 299, 309, 309, 318, 318, 340, 346, 354, 365, 367, 368, 368, 368, 385, 404, 420, 423, 431, 487, 492, 493, 519, 581, 598, 607, 620, 637, 646, 661, 664, 666, 672, 685, 723, 740, 744, 745, 751, 751, 753, 759, 760, 764, 774, 777, 779, 795, 807, 807, 817, 854, 862, 862, 865, 868, 868, 868, 873, 874, 877, 907],
'Hospital_ID': [1] * 92}
df = pd.DataFrame(data)
I do not know what to try, as I am still a beginner.