I am tasked with addressing extreme class imbalance in a binary classification problem where the positive class makes up only 0.05% of the samples, i.e. about 1 positive per 2,000. Assume a dataset with N samples and D features.
Mathematics (50 points): Derive the mathematical formulation of a cost function that explicitly accounts for the class imbalance. Incorporate a regularization term and explain how it helps prevent overfitting. Provide a step-by-step derivation and the rationale for each term in the cost function.
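One standard starting point, sketched here as an assumption rather than the required answer, is class-weighted binary cross-entropy with an L2 penalty, written for a logistic model. The symbols θ, b, λ, w_+, w_-, N_+ and N_- are introduced for illustration.

```latex
% Assumed model: logistic regression, x_i \in \mathbb{R}^D, y_i \in \{0, 1\}
\hat{p}_i = \sigma(\theta^\top x_i + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Step 1: unweighted Bernoulli negative log-likelihood, averaged over N samples
\mathcal{L}(\theta, b) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \Big]

% Step 2: inverse-frequency class weights; each class now carries total weight N/2
N_+ = \sum_{i=1}^{N} y_i, \quad N_- = N - N_+, \quad w_+ = \frac{N}{2 N_+}, \quad w_- = \frac{N}{2 N_-}
% With N_+ / N = 0.0005: w_+ = 1000, w_- = 1 / 1.999 \approx 0.50025

% Step 3: weighted loss plus an L2 penalty on \theta (bias b left unpenalized)
J(\theta, b) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ w_+ y_i \log \hat{p}_i + w_- (1 - y_i) \log (1 - \hat{p}_i) \Big] + \frac{\lambda}{2} \lVert \theta \rVert_2^2

% Step 4: gradients, with per-sample weight c_i = w_+ y_i + w_- (1 - y_i)
\nabla_\theta J = \frac{1}{N} \sum_{i=1}^{N} c_i (\hat{p}_i - y_i) x_i + \lambda \theta, \qquad
\frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} c_i (\hat{p}_i - y_i)
```

With these weights a missed positive costs w_+/w_- = N_-/N_+ ≈ 1,999 times as much as a false alarm. Because a few hundred (or fewer) positives now carry half of the loss, an unregularized model can drive their loss toward zero simply by growing ‖θ‖ (on linearly separable data the unregularized optimum does not even exist). The L2 term, equivalent to a zero-mean Gaussian prior on θ in a MAP view, shrinks θ toward zero and trades a little bias for lower variance; λ would normally be chosen on a validation split.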
Algorithm Design (30 points): Propose a novel algorithm, or a modification of an existing one, that is specifically tailored to extreme class imbalance. Provide pseudocode or a detailed algorithmic description of how your approach works, and highlight the key hyperparameters and their significance.
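As one illustration (not a novel method, and only an assumption about what a suitable modification might look like), the sketch below combines random undersampling of the majority class with bagging, in the spirit of balanced bagging and EasyEnsemble-style undersampling ensembles. Every member is an L2-regularized logistic regression trained on all positives plus a fresh random subset of negatives, and the members' probabilities are averaged. All function names and defaults (`n_estimators`, `neg_ratio`, `lam`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function via the tanh identity
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_logreg(X, y, lam=1e-3, lr=0.5, n_iter=500, sample_weight=None):
    """Full-batch gradient descent on (optionally weighted) L2-regularized cross-entropy."""
    n, d = X.shape
    c = np.ones(n) if sample_weight is None else sample_weight
    theta, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        r = c * (sigmoid(X @ theta + b) - y)        # c_i * (p_i - y_i)
        theta -= lr * (X.T @ r / n + lam * theta)    # bias is not penalized
        b -= lr * r.mean()
    return theta, b


def fit_balanced_bagging(X, y, n_estimators=10, neg_ratio=5.0, lam=1e-3, seed=0):
    """Hypothetical ensemble: each member sees all positives + neg_ratio * N_+ negatives."""
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    n_neg = min(len(neg), int(neg_ratio * len(pos)))
    members = []
    for _ in range(n_estimators):
        # A different random subset of negatives for every member
        idx = np.concatenate([pos, rng.choice(neg, size=n_neg, replace=False)])
        members.append(fit_logreg(X[idx], y[idx], lam=lam))
    beta = n_neg / len(neg)                          # fraction of negatives kept per member
    return members, beta


def predict_proba_bagged(members, X):
    """Average member probabilities (still on the undersampled scale, not the true prior)."""
    return np.mean([sigmoid(X @ th + b) for th, b in members], axis=0)
```

Here `neg_ratio` sets how many negatives each member sees per positive (lower is more balanced but discards more information per member), `n_estimators` controls how much of the majority class the ensemble covers overall and how much variance averaging removes, and `lam` is the regularization strength λ. Because every member trains on a much higher positive rate than 0.05%, the averaged scores are inflated, so the decision threshold is effectively another hyperparameter to tune.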
Coding Implementation (20 points): Implement your proposed algorithm in a programming language of your choice (e.g., Python). Apply it to the provided dataset and make sure the code is well documented. Evaluate model performance using appropriate metrics and visualization techniques, and discuss the results and any observed trade-offs.
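Plain accuracy is uninformative here, since predicting "negative" for every sample already scores 99.95%. Below is a minimal NumPy sketch of metrics better suited to rare positives: the precision-recall curve, average precision (AP), and the F1-maximizing threshold. The helper names are hypothetical; if scikit-learn is available, `sklearn.metrics.precision_recall_curve` and `average_precision_score` compute closely related quantities.

```python
import numpy as np


def pr_curve(y_true, scores):
    """Precision and recall at every distinct score threshold, highest score first."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y, s = y_true[order], scores[order]
    # Keep the last index of each run of tied scores so ties share one threshold
    last = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tp = np.cumsum(y)[last]
    fp = (last + 1) - tp
    precision = tp / (tp + fp)
    recall = tp / y.sum()
    return precision, recall, s[last]


def average_precision(y_true, scores):
    """Step-wise area under the PR curve: sum over k of (R_k - R_{k-1}) * P_k."""
    precision, recall, _ = pr_curve(y_true, scores)
    return float(np.sum(np.diff(np.r_[0.0, recall]) * precision))


def best_f1_threshold(y_true, scores):
    """Score threshold that maximizes F1 on the data it is given."""
    precision, recall, thresholds = pr_curve(y_true, scores)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    k = int(np.argmax(f1))
    return float(thresholds[k]), float(f1[k])
```

A random ranking has an expected AP of roughly the positive rate (about 0.0005 here), so AP should be judged against that baseline rather than against 0.5 as with ROC AUC. The returned precision and recall arrays can be plotted as a step function of precision against recall (for example with matplotlib's `plt.step`). Choosing the threshold on the test set itself would bias the estimate, so in practice it would be picked on a validation split.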
Mathematics:
I attempted to derive a cost function that accounts for extreme class imbalance and includes a regularization term to prevent overfitting. I aimed to give a comprehensive, step-by-step explanation of each term in the cost function, demonstrating a deep understanding of the mathematical foundations of machine learning.
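A derivation like this can be checked numerically. The sketch below assumes the class-weighted, L2-regularized cross-entropy with inverse-frequency weights w_+ = N/(2N_+), w_- = N/(2N_-) (an assumption, not necessarily the form the assignment expects) and compares its analytic gradient with central finite differences on small random data. The function names are hypothetical.

```python
import numpy as np


def class_weights(y):
    """Inverse-frequency weights: each class contributes N/2 in total."""
    n, n_pos = len(y), float(y.sum())
    return n / (2.0 * n_pos), n / (2.0 * (n - n_pos))


def weighted_loss_and_grad(theta, b, X, y, lam):
    """J(theta, b) = weighted mean BCE + (lam / 2) * ||theta||^2, and its gradient."""
    w_pos, w_neg = class_weights(y)
    c = w_pos * y + w_neg * (1.0 - y)                  # per-sample weight c_i
    z = X @ theta + b
    # -log(sigmoid(z)) = logaddexp(0, -z); -log(1 - sigmoid(z)) = logaddexp(0, z)
    nll = c * (y * np.logaddexp(0.0, -z) + (1.0 - y) * np.logaddexp(0.0, z))
    loss = nll.mean() + 0.5 * lam * np.dot(theta, theta)
    p = np.exp(-np.logaddexp(0.0, -z))                 # sigmoid(z), overflow-safe
    r = c * (p - y)                                    # c_i * (p_i - y_i)
    return loss, X.T @ r / len(y) + lam * theta, r.mean()


def max_gradient_error(seed=0, eps=1e-6):
    """Largest gap between analytic and central-difference gradients on toy data."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 4))
    y = (rng.random(200) < 0.05).astype(float)
    y[0] = 1.0                                         # guarantee at least one positive
    theta, b, lam = rng.normal(size=4), 0.3, 0.1

    def f(t, bb):
        return weighted_loss_and_grad(t, bb, X, y, lam)[0]

    _, g_theta, g_b = weighted_loss_and_grad(theta, b, X, y, lam)
    num_theta = np.array([(f(theta + eps * e, b) - f(theta - eps * e, b)) / (2 * eps)
                          for e in np.eye(4)])
    num_b = (f(theta, b + eps) - f(theta, b - eps)) / (2 * eps)
    return max(float(np.max(np.abs(num_theta - g_theta))), abs(num_b - g_b))


if __name__ == "__main__":
    print("max |analytic - numeric| =", max_gradient_error())
```

Writing the log-likelihood with `np.logaddexp` avoids `log(0)` when the model becomes very confident, which happens easily when one class is weighted about 2,000 times more than the other.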
Algorithm Design:
I proposed a novel algorithm, or a modification of an existing one, that specifically addresses extreme class imbalance. I aimed to present a clear and detailed algorithmic description, including pseudocode, to highlight the significance of the key hyperparameters, and to show creative thinking and an in-depth understanding of algorithm design principles.
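If the proposed algorithm trains on undersampled negatives, as many methods for extreme imbalance do, two details tied to its hyperparameters affect the final decision: the sampling rate β distorts predicted probabilities, and the decision threshold should reflect the relative cost of the two error types. The sketch below assumes every positive is kept, each negative is kept independently with probability β, and correct decisions cost nothing; the function names are hypothetical.

```python
import numpy as np


def correct_undersampled_proba(p_s, beta):
    """Map a probability learned on undersampled data back to the original prior.

    Assumes every positive was kept and each negative was kept independently
    with probability beta (0 < beta <= 1). Bayes' rule then gives
    p_s = p / (p + beta * (1 - p)), which inverts to the expression returned.
    """
    p_s = np.asarray(p_s, dtype=float)
    return beta * p_s / (beta * p_s - p_s + 1.0)


def cost_sensitive_threshold(cost_fp, cost_fn):
    """Bayes-optimal cut-off on a calibrated probability p.

    Predicting positive is cheaper in expectation when (1 - p) * cost_fp < p * cost_fn,
    i.e. when p > cost_fp / (cost_fp + cost_fn). Correct decisions are assumed free.
    """
    return cost_fp / (cost_fp + cost_fn)


if __name__ == "__main__":
    beta = 0.01                                  # hypothetical: keep 1% of negatives
    p_s = np.array([0.05, 0.5, 0.95])            # scores from a model trained on the subsample
    p = correct_undersampled_proba(p_s, beta)
    print(p, p > cost_sensitive_threshold(cost_fp=1.0, cost_fn=100.0))
```

Because the correction is strictly increasing in p_s, it leaves rankings (and therefore PR and ROC curves) unchanged; it matters only when calibrated probabilities feed a cost-based threshold like the one above.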
Coding Implementation:
I implemented the proposed algorithm in a programming language of my choice with well-documented code, applied it to the provided dataset, and evaluated model performance using appropriate metrics and visualization techniques. I aimed to demonstrate strong coding skills, show the proposed algorithm in practice, and discuss the observed results and any trade-offs.
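Since the provided dataset is not included in the post, the sketch below uses a hypothetical synthetic stand-in (Gaussian features, positives shifted by `shift`, 0.05% positive rate) to show one way of structuring the evaluation: an unweighted and a class-weighted logistic regression are compared on AP and on precision and recall at a 0.5 threshold. The data generator, parameter values, and function names are all assumptions, and no results are claimed here.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def make_data(n=100_000, d=4, pos_rate=0.0005, shift=1.5, seed=0):
    """Hypothetical synthetic stand-in: positives are Gaussians shifted by `shift`."""
    rng = np.random.default_rng(seed)
    y = (rng.random(n) < pos_rate).astype(float)
    X = rng.normal(size=(n, d)) + shift * y[:, None]
    return X, y


def fit_logreg(X, y, class_weighted=True, lam=1e-3, lr=0.5, n_iter=500):
    """Gradient descent on L2-regularized cross-entropy, optionally class-weighted."""
    n, n_pos = len(y), y.sum()
    if class_weighted:
        # w_+ = N / (2 N_+), w_- = N / (2 N_-): both classes carry total weight N/2
        c = np.where(y == 1, n / (2.0 * n_pos), n / (2.0 * (n - n_pos)))
    else:
        c = np.ones(n)
    theta, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        r = c * (sigmoid(X @ theta + b) - y)
        theta -= lr * (X.T @ r / n + lam * theta)
        b -= lr * r.mean()
    return theta, b


def average_precision(y, s):
    """Step-wise AP; assumes no tied scores."""
    y = y[np.argsort(-s)]
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return float(np.sum(precision_at_k * y) / y.sum())


def evaluate(theta, b, X, y, threshold=0.5):
    """AP (threshold-free) plus precision and recall at a fixed threshold."""
    s = sigmoid(X @ theta + b)
    pred = s >= threshold
    tp = float(np.sum(pred & (y == 1)))
    return {
        "AP": average_precision(y, s),
        "precision": tp / max(int(pred.sum()), 1),
        "recall": tp / float(y.sum()),
    }


if __name__ == "__main__":
    X_tr, y_tr = make_data(seed=0)
    X_te, y_te = make_data(seed=1)
    for weighted in (False, True):
        theta, b = fit_logreg(X_tr, y_tr, class_weighted=weighted)
        label = "class-weighted" if weighted else "unweighted"
        print(label, evaluate(theta, b, X_te, y_te))
```

Under these assumptions, class weighting mainly shifts the decision boundary: it should raise recall at a 0.5 threshold at the cost of many more false positives (lower precision), while AP, which depends only on the ranking, may change much less. Rerunning with a different `shift` or `pos_rate` is a quick way to see how this trade-off moves.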