EXPERIENCE
Data Scientist
July 2023 - Present
Hartford Financial Services Group
Developed LLM-powered pipelines using GPT-4 via the OpenAI API to automate summarization of financial documents, processing high volumes of 10-K filings at 99.9% accuracy (see the summarization sketch below).
Designed and ran A/B tests comparing automated summarization model variants, driving a 10% improvement in overall model accuracy.
Implemented a scalable ETL pipeline on Databricks, integrating Azure Functions and pandas to extract, transform, and load Excel data into Azure SQL Database, improving data-processing efficiency by over 20% (see the load sketch below).
Built a predictive model in Python with scikit-learn to score funding-application viability, giving reviewers actionable, data-driven input for approval decisions.
Created interactive Power BI dashboards backed by optimized Azure SQL Database queries, visualizing $13.6 million in monthly cash flow and strengthening financial oversight.
Used Bash scripting and command-line utilities in Linux environments to automate workflows, manage system resources, and troubleshoot issues, integrating with tools such as Docker, Kubernetes, and Airflow.
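Illustrative sketch of the GPT-4 summarization step referenced above, assuming the openai Python client (v1+) with OPENAI_API_KEY set in the environment; the prompt wording, model name, and word limit are placeholder choices rather than the production configuration:

# Minimal 10-K section summarization helper (assumed openai>=1.0 client).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_filing_section(section_text: str, max_words: int = 200) -> str:
    """Summarize one section of a 10-K filing with GPT-4."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a financial analyst. Summarize SEC filings factually and concisely."},
            {"role": "user",
             "content": f"Summarize the following 10-K section in at most {max_words} words:\n\n{section_text}"},
        ],
        temperature=0.2,  # low temperature keeps summaries stable across runs
    )
    return response.choices[0].message.content.strip()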
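Simplified sketch of the Excel-to-Azure-SQL load step from the Databricks pipeline, assuming pandas with SQLAlchemy over pyodbc; the connection string, sheet layout, and table name are placeholders:

# Excel -> Azure SQL load step (pandas + SQLAlchemy); connection details are placeholders.
import pandas as pd
from sqlalchemy import create_engine

AZURE_SQL_URL = (
    "mssql+pyodbc://<user>:<password>@<server>.database.windows.net/<database>"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)

def load_excel_to_azure_sql(xlsx_path: str, table: str = "finance_staging") -> int:
    """Extract an Excel sheet, apply light cleaning, and append it to Azure SQL."""
    df = pd.read_excel(xlsx_path, sheet_name=0)                       # extract
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.dropna(how="all")                                         # transform
    engine = create_engine(AZURE_SQL_URL, fast_executemany=True)
    df.to_sql(table, engine, if_exists="append", index=False)         # load
    return len(df)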
Data Science Research Assistant
Jan 2023 - May 2023
UMBC
Improved data-model efficiency by 40% by developing and optimizing ETL workflows with Python, Airflow, and SQL on AWS, raising data accuracy and processing speed for event-analysis tasks.
Designed and implemented NLP data pipelines in Python and R, using NLTK and pandas to preprocess and classify university announcements with high precision, improving data organization and accessibility for stakeholders (see the classification sketch below).
Developed machine learning models in TensorFlow and PySpark to predict event categories, applying feature engineering to improve accuracy and scale to large datasets.
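Minimal sketch of the announcement-classification pipeline, with NLTK handling preprocessing and a scikit-learn classifier on top as an assumed model choice; the DataFrame columns ("text", "category") are hypothetical:

# NLTK preprocessing + TF-IDF/logistic-regression classifier sketch.
import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

_stop = set(stopwords.words("english"))
_lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    """Lowercase, tokenize, drop stopwords and non-alphabetic tokens, lemmatize."""
    tokens = nltk.word_tokenize(text.lower())
    return " ".join(_lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in _stop)

def train_classifier(announcements: pd.DataFrame) -> Pipeline:
    """Fit a TF-IDF + logistic-regression pipeline on (text, category) rows."""
    model = Pipeline([
        ("tfidf", TfidfVectorizer(preprocessor=preprocess)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(announcements["text"], announcements["category"])
    return model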
Data Scientist
June 2019 - August 2020
Coforge
Designed and optimized data pipelines with AWS S3, Glue, and PySpark to process terabytes of data, cutting access times and operational costs while improving data transfer and availability (see the PySpark sketch below).
Automated ETL workflows with Apache Airflow, achieving a 30% increase in operational efficiency by streamlining task scheduling, execution, and dependency management.
Applied time-series models (ARIMA, LSTM) to forecast behavioral-health trends, improving patient-care planning and reducing manual data processing by 60% through Python/pandas pipelines (see the ARIMA sketch below).
Improved data integration and warehousing efficiency with Apache Kafka, AWS Lambda, and Amazon Redshift, achieving a 40% gain in ingestion performance and enforcing access controls with IAM.
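Simplified PySpark sketch of the S3 read/transform/write pattern behind the Glue pipelines; bucket names, column names, and the partitioning scheme are placeholders:

# PySpark S3 ETL pattern; paths and columns are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("s3-etl-sketch").getOrCreate()

raw = spark.read.parquet("s3://<raw-bucket>/records/")          # extract

cleaned = (
    raw.dropDuplicates(["record_id"])                           # transform
       .withColumn("event_date", F.to_date("event_ts"))
       .filter(F.col("amount") > 0)
)

(cleaned.write                                                  # load
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("s3://<curated-bucket>/records/"))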
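Minimal ARIMA forecasting sketch using statsmodels on a monthly series; the (p, d, q) order, frequency, and forecast horizon are illustrative choices, not the tuned production values:

# ARIMA forecast for a monthly behavioral-health metric (illustrative order and horizon).
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def forecast_monthly(series: pd.Series, steps: int = 6) -> pd.Series:
    """Fit ARIMA(1, 1, 1) on a DatetimeIndex-ed monthly series and forecast `steps` ahead."""
    series = series.asfreq("MS")             # enforce month-start frequency
    fitted = ARIMA(series, order=(1, 1, 1)).fit()
    return fitted.forecast(steps=steps)

# Usage with hypothetical data:
# visits = pd.read_csv("monthly_visits.csv", index_col="month", parse_dates=True)["visits"]
# print(forecast_monthly(visits))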