PORTFOLIO
Sentiment Analysis on Mental Health
The "Sentiment Analysis for Mental Health" project leverages natural language processing and machine learning to assess individuals' mental health statuses through their textual expressions. By analyzing language patterns, the project aims to identify emotional indicators and predict mental health conditions, offering valuable insights for timely support and intervention. Key features include data preprocessing techniques such as tokenization and stopword removal, sentiment polarity classification, and predictive modeling to discern potential mental health issues. This approach underscores the potential of technology in enhancing mental health awareness and support systems.
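The preprocessing steps mentioned above (tokenization and stopword removal) can be sketched as follows. This is a minimal illustration in plain Python; the project itself would typically rely on a full NLP library such as NLTK, and the stopword list and sample sentence here are invented for demonstration.

```python
import re

# Small illustrative stopword list; real projects usually use NLTK's full list.
STOPWORDS = {"i", "am", "the", "a", "an", "and", "of", "to", "is", "in", "it", "so"}

def preprocess(text):
    """Lowercase the text, tokenize on alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

statement = "I am feeling so anxious and overwhelmed, it is hard to sleep."
tokens = preprocess(statement)
```

The surviving tokens ("feeling", "anxious", "overwhelmed", ...) are the kind of emotional indicators a downstream classifier would learn from, typically after vectorization (e.g. TF-IDF).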
LLM Prompt Engineering
The "LLM Prompt Engineering" project explores the use of prompt engineering to improve dialogue summarization with large language models (LLMs). It evaluates the impact of different inference methods—zero-shot, one-shot, and few-shot—on the quality of model-generated summaries. By experimenting with various prompt strategies, the project aims to enhance the accuracy and relevance of outputs from LLMs.
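The difference between the inference methods comes down to how many worked examples are included in the prompt. A minimal sketch of a prompt builder, with an invented example dialogue, might look like this:

```python
def build_prompt(dialogue, examples=()):
    """Build a summarization prompt: zero-shot when `examples` is empty,
    one-shot with a single (dialogue, summary) pair, few-shot with several."""
    parts = []
    for ex_dialogue, ex_summary in examples:
        parts.append(f"Dialogue:\n{ex_dialogue}\nSummary:\n{ex_summary}\n")
    parts.append(f"Dialogue:\n{dialogue}\nSummary:\n")
    return "\n".join(parts)

# Hypothetical worked example used for one-shot / few-shot prompting.
example = ("A: Lunch at noon? B: Sure, see you then.",
           "They agree to meet for lunch at noon.")

zero_shot = build_prompt("A: Can you send the report? B: Yes, by Friday.")
one_shot = build_prompt("A: Can you send the report? B: Yes, by Friday.", [example])
```

Passing several pairs in `examples` gives the few-shot variant; the resulting string is what gets sent to the LLM, and summary quality is then compared across the three settings.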
Credit Card Fraud Detection with ML
The "Credit Card Fraud Detection Using Machine Learning" project applies various classification algorithms to identify fraudulent credit card transactions. The dataset comprises anonymized features representing financial transactions, aiming to distinguish between genuine and fraudulent activities. Key aspects of the project include data preprocessing—such as handling missing values, addressing class imbalance, and feature scaling—and the implementation of machine learning models like Logistic Regression, Random Forest, Support Vector Machine (SVM), and Decision Tree. Model performance is evaluated using metrics like accuracy, precision, recall, and F1 score, with the best-performing models identified based on F1 score and test accuracy.
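The overall workflow (scaling, imbalance handling, training several classifiers, ranking by F1) can be sketched with scikit-learn. Synthetic imbalanced data stands in for the anonymized transaction features here, and `class_weight="balanced"` is shown as one common way to address class imbalance; the project's actual dataset and imbalance strategy may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, accuracy_score

# Synthetic stand-in: ~5% positive (fraud) class to mimic the imbalance.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Feature scaling, fit on the training split only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "Random Forest": RandomForestClassifier(class_weight="balanced", random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (f1_score(y_test, pred), accuracy_score(y_test, pred))

# Rank by F1 score, as in the project's model selection.
best = max(scores, key=lambda name: scores[name][0])
```

SVM and Decision Tree models would slot into the same `models` dictionary; F1 is preferred over raw accuracy for ranking because a model that predicts "genuine" for everything still scores high accuracy on imbalanced data.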
Fine-tuning LLMs
The "Fine-Tuning OpenAI's GPT-3.5 Turbo" project demonstrates the process of customizing the GPT-3.5 Turbo model to enhance its performance on specific tasks. This involves preparing a dataset tailored to the desired application, configuring the model's parameters, and executing the fine-tuning process to adapt the model's outputs to align more closely with the intended use case. The project provides a comprehensive, step-by-step guide, enabling practitioners to effectively fine-tune GPT-3.5 Turbo for specialized applications, thereby improving the model's relevance and accuracy in targeted scenarios.
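Dataset preparation is the first step of that process: OpenAI's chat fine-tuning expects a JSONL file where each line is a conversation with system, user, and assistant messages. The example conversation below is invented for illustration, and the upload/job-creation calls are shown only as comments since they require an API key.

```python
import json

# One training example per JSON line, in OpenAI's chat fine-tuning format.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant",
         "content": "Open Settings > Security and choose Reset Password."},
    ]},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The file is then uploaded and a fine-tuning job started, roughly:
#   file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=file.id, model="gpt-3.5-turbo")
```

A real training set needs many such examples (OpenAI requires at least ten); the assistant messages are what teach the model the desired output style for the target use case.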
Building a Data Pipeline with Airflow & AWS
The "Reddit Data Pipeline with Airflow and AWS" project showcases the development of an automated data pipeline that extracts information from Reddit using its API and the PRAW (Python Reddit API Wrapper) module. The extracted data is then uploaded to Amazon S3 storage, with the entire workflow orchestrated by Apache Airflow running within a Docker environment. Subsequent data processing is performed using AWS Glue, and analysis is conducted through Amazon Athena. This project exemplifies the integration of various technologies—Docker, Airflow, and multiple AWS services—to create a cohesive and efficient data engineering solution.
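The extract-and-upload portion of such a pipeline can be sketched as plain Python functions. The function and field names below are assumptions for illustration: `submissions_to_records` flattens PRAW-style `Submission` objects (whose attributes mirror PRAW's real fields) into serializable rows, and in the project these functions would be wired into an Airflow DAG as `PythonOperator` tasks, with the upload step calling boto3.

```python
import json
from datetime import datetime, timezone

def submissions_to_records(submissions):
    """Flatten PRAW-style Submission objects into JSON-serializable rows."""
    return [
        {
            "id": s.id,
            "title": s.title,
            "score": s.score,
            "created_utc": datetime.fromtimestamp(
                s.created_utc, tz=timezone.utc
            ).isoformat(),
        }
        for s in submissions
    ]

def records_to_jsonl(records):
    """Serialize records as JSON lines; the real upload task would then call
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=body)."""
    return "\n".join(json.dumps(r) for r in records)
```

Storing the data as JSON lines (or CSV) in S3 keeps it directly queryable: AWS Glue can crawl the bucket to infer a table schema, and Athena can then run SQL over it without any further loading step.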