Advanced Heart Disease Detection: A Machine Learning Approach

Introduction

Heart disease remains one of the leading causes of death globally, with 17.9 million people succumbing to cardiovascular diseases in 2019 alone. Early detection and intervention are crucial in mitigating the risks associated with heart disease. However, the healthcare sector often lacks the necessary resources to adequately address the large-scale challenges posed by this condition.

Our team has developed an advanced machine learning project aimed at improving the detection of patients at risk of heart disease. This project addresses a critical health issue and demonstrates the potential of AI in medical diagnostics, potentially reducing the burden on healthcare providers and improving patient outcomes.

Objective

The primary goals of our project were to:

Improve the rate of success for detecting patients at risk of heart disease
Reduce the number of at-risk patients who go undetected (false negatives)
Develop machine learning models with greater efficacy than prior models
Study and compare various models for heart disease classification

Research and Literature Review

Our project began with extensive research into heart disease and its risk factors. We conducted a comprehensive literature review to understand the current state of heart disease detection and the potential for machine learning in this field.

We used the heart disease data from the University of California, Irvine (UCI) repository, which includes patient records from the Cleveland Clinic Foundation. The dataset comprises 13 key risk factors, including age, sex, chest pain type, resting blood pressure, and cholesterol level.
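As a rough illustration of the data layout, the sketch below parses one comma-separated record into the 13 features plus a label. The column names follow the attribute abbreviations documented for the processed Cleveland file in the UCI repository. The helper name `parse_record`, the handling of `?` as a missing value, and the sample row are assumptions made for illustration and are not taken from the project code.

```python
# Attribute names as documented for the processed Cleveland data (13 features + label).
FEATURES = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
            "thalach", "exang", "oldpeak", "slope", "ca", "thal"]
LABEL = "num"  # 0 = no disease, 1-4 = disease present


def parse_record(line):
    """Split one CSV line into (features dict, label); '?' becomes None."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FEATURES) + 1:
        raise ValueError(f"expected {len(FEATURES) + 1} fields, got {len(values)}")
    # zip stops after the 13 feature columns, leaving the label for separate handling.
    features = {
        name: (None if raw == "?" else float(raw))
        for name, raw in zip(FEATURES, values)
    }
    label = int(float(values[-1]))
    return features, label


# Illustrative row only; real rows come from the downloaded data file.
sample = "63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0"
features, label = parse_record(sample)
print(features["age"], features["chol"], label)
```

Rows containing `None` would need to be dropped or imputed before training, since the networks described later expect 13 numeric inputs.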

Experiments and Results

We conducted experiments with different optimizers and learning rates (LR). The results are summarized in the following figures:

Figure 7.1: Performance measurement results for optimizers with LR = 0.0001

Figure 7.2: Performance measurement results for optimizers with LR = 0.001

From the results above, we can see that the best optimizer to use is Adam.
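A comparison like this can be organized as a small grid over optimizers and learning rates. The sketch below is a hedged outline rather than the actual experiment script: `run_grid` and the `train_and_score` callback are hypothetical names, and the callback is assumed to build, train, and evaluate one model for a given optimizer name and learning rate, returning its accuracy.

```python
from itertools import product


def run_grid(train_and_score, optimizers, learning_rates):
    """Score every (optimizer, learning rate) pair and return results, best first."""
    results = []
    for name, lr in product(optimizers, learning_rates):
        # Hypothetical callback: trains one model and returns its accuracy.
        score = train_and_score(name, lr)
        results.append((score, name, lr))
    # Sort on the score only, highest accuracy first.
    return sorted(results, key=lambda r: r[0], reverse=True)
```

In practice, `train_and_score` would wrap model construction with the chosen optimizer, a call to `fit`, and an evaluation on held-out data. The first entry of the returned list is then the best-performing configuration.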

Approach

Our approach involved testing various machine learning models, including:

Recurrent Neural Networks (RNN)
K-Nearest Neighbors (KNN)
Support Vector Machine (SVM)
Decision Tree Classification
Random Forest Classification

These models were evaluated based on metrics derived from a confusion matrix, including accuracy, precision, recall, and F1-score, to determine their effectiveness in detecting heart disease.
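To make these metrics concrete, here is a minimal sketch of how they follow from the four confusion-matrix counts for a binary task. The function names and the sample labels are made up for illustration. Recall is the metric most directly tied to the goal of not missing at-risk patients, because it falls whenever a true case is predicted as healthy.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = disease, 0 = no disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
```

scikit-learn offers equivalent functions in `sklearn.metrics`; the plain-Python version is used here only to show the arithmetic.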

Implementation

We implemented our models in Python using TensorFlow, Keras, and scikit-learn. The snippets below show how we created the categorical (multi-class) and binary neural network models:

Categorical Model

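from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def create_categorical_model():
    # 13 input features -> hidden layers of 10 and 6 neurons -> 5 output classes
    model = Sequential()
    model.add(Dense(10, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(5, activation='softmax'))

    optimizer = Adam(learning_rate=0.001)
    model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# Y_train_classification must be one-hot encoded (5 columns) for categorical_crossentropy
classification_model = create_categorical_model()
classification_model.fit(X_train, Y_train_classification, epochs=100, batch_size=10, verbose=1)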

Binary Model

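from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def create_binary_model():
    # 13 input features -> hidden layers of 10 and 6 neurons -> 1 sigmoid output
    model = Sequential()
    model.add(Dense(10, input_dim=13, kernel_initializer='normal', activation='relu'))
    model.add(Dense(6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    optimizer = Adam(learning_rate=0.001)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

# Y_train_binary holds 0/1 labels (no disease / disease present)
binary_model = create_binary_model()
binary_model.fit(X_train, Y_train_binary, epochs=100, batch_size=10, verbose=1)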
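The two training calls rely on label arrays whose preparation is not shown. The sketch below illustrates one way such labels could be derived, assuming the raw label is the dataset's 0 to 4 diagnosis value. `to_binary` and `to_one_hot` are hypothetical helper names, not functions from the project. Keras also provides `to_categorical` in `tensorflow.keras.utils` for the one-hot step; plain Python is used here to keep the example dependency-free.

```python
def to_binary(labels):
    """Collapse 0-4 diagnosis labels to 0 (no disease) or 1 (disease present)."""
    return [1 if label > 0 else 0 for label in labels]


def to_one_hot(labels, num_classes=5):
    """Encode integer labels as one-hot rows, the format categorical_crossentropy expects."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


# Made-up raw labels for illustration only.
raw_labels = [0, 2, 4, 1, 0]
print(to_binary(raw_labels))
print(to_one_hot(raw_labels))
```

The resulting lists can be converted to NumPy arrays before being passed to `fit`. The five-column one-hot rows match the five softmax outputs of the categorical model, and the 0/1 labels match the single sigmoid output of the binary model.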

Results and Analysis

After rigorous testing and optimization, our best model achieved an impressive accuracy of 90% in detecting heart disease. We experimented with various parameters such as learning rates, optimizers, and neuron counts to achieve this result.

Categorical Model Evaluation

The categorical model evaluation graph compares the categorical model at two learning rates (0.001 and 0.0001) across various neuron counts.

Binary Model Evaluation

The binary model evaluation graph shows the performance of the binary model, again comparing learning rates of 0.001 and 0.0001 across different neuron configurations.

We found that the binary model generally performed better than the categorical model, especially with a learning rate of 0.001. The binary model achieved higher accuracy across the neuron counts we tested, making it the more reliable choice for heart disease prediction in this case.

Conclusion

By selecting the Recurrent Neural Network model for the UCI repository dataset and fine-tuning its parameters, we were able to identify true positives more accurately than with the other models, reaching 90% accuracy.

Improving the accuracy of machine learning models for heart disease detection has significant implications for how medical professionals approach future diagnoses. Early detection can greatly improve patients' chances of survival, and we are humbled to play a part in advancing this critical area of healthcare.