JeanneReppert.com

Programming Project Highlights

The projects listed below include work done for classes at the University of North Carolina at Greensboro for the Master of Science in Informatics and Analytics degree, completed in May 2021. All listed projects, including code and reports, along with additional projects, may be viewed in detail on my GitHub page.

Capstone Project

In 2020, with the onset of the COVID-19 pandemic, misinformation became pervasive, particularly on social media. This text and social media analytics project focused on sources of misinformation on three popular alternative health websites.

Using R packages in RStudio (RSelenium and rvest), the 2020 corpus of health-related articles on each website was scraped. Massmine, a command-line Linux program, was used to access the Twitter API for relevant tweets; Facebook posts were collected with CrowdTangle; and website traffic data was obtained from Semrush. More than a dozen datasets containing several thousand observations were analyzed. (view project report)

Other packages used: textclean, tidyverse packages, stringr, utf8, lubridate (for cleaning text and parsing dates), SnowballC (word stems), quanteda (natural language processing), tm (text mining), syuzhet (sentiment analysis), plotly, ggplot2, igraph, wordcloud (view code)

Internet of Things and Power BI

Two Arduinos connected to six sensors were programmed to collect and send data every 15 minutes to two IFTTT applets, which sent each observation to two Google Sheets. After three weeks, the data was downloaded to two CSV files. The files were merged, cleaned, sorted, and transformed using packages in RStudio. (view code) A Power BI dashboard was used to visualize the results. (view dashboard)

World Happiness Datasets - Shiny Dashboard

This project showcases three datasets from the United Nations' World Happiness Report (from Kaggle). Included in the data are scores assigned to 155 countries based on variables such as GDP per capita, family support, life expectancy, freedom, trust of government, and generosity. Data was cleaned, formatted, and visualized in a Shiny Dashboard. (view dashboard)

Packages used: shiny, shinydashboard, dplyr, tidyr, DT, rworldmap, plotly, ggplot2, shinyWidgets, pairsD3 (view code)

Predicting Heart Disease with a KNeighbors Classifier

A KNeighbors classifier model was used to predict heart disease risk with data from the Framingham study. Sklearn metrics were used to measure the effectiveness of the model. The data used (obtained from Kaggle) is included, and a report of results is found in the PDF file.

Python packages used: itertools, numpy, seaborn, matplotlib, pandas, sklearn (metrics, neighbors, model_selection) (view code)
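For readers unfamiliar with the technique, the following is a minimal sketch of what a k-nearest-neighbors classifier does. It is not the project's code: the project used sklearn's implementation, while this sketch uses only the Python standard library. The function names (`knn_predict`, `accuracy`) and the toy points are hypothetical and made up for illustration; they are not Framingham data.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k neighbors is the prediction.
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up toy data: two well-separated clusters labeled 0 and 1.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]

test_X = [(1.1, 1.0), (5.1, 5.0)]
test_y = [0, 1]
predictions = [knn_predict(train_X, train_y, point, k=3) for point in test_X]
```

Using an odd `k` with two classes avoids tied votes. A real analysis would also scale the features and hold out a test set, which this sketch omits.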

Predicting Hospital Readmission Rates with a Diabetes Dataset

A collection of 70,000 patient records from a clinical database was used to predict hospital readmissions for diabetic patients using logistic regression and random forest models. Python was used to clean the data, and modeling was performed in RStudio.

Python packages used: pandas, numpy; R packages used: reticulate, dplyr, psych, magrittr, randomForest, caTools, caret, arsenal, mosaic, sjPlot, stargazer, summarytools, visdat (view code)

MySQL Project - Animal Rescue Database

For this group project, we used Lucidchart to create an ER diagram of our database plan. The plan involved creating a database that could potentially serve as a template for an animal rescue organization. Using PHP and MySQL, data pertaining to adoptable animals, including adoption fees, animal descriptions, medical procedures, costs, ownership, and other relevant descriptors, were linked for searching and querying. The diagram and sample queries of the database are included in the report. (view project)

World Happiness Project - Regression Analysis With SQLite3 and Spark

This group project used six datasets containing environmental, fiscal, and demographic data on countries found in the United Nations' 2017 World Happiness dataset. All datasets were cleaned, explored, and visualized, and sqlite3 was used to create a database linking the six datasets and to perform queries. Spark regression analysis was used to determine which variables within the database were most predictive of overall World Happiness Scores. A PDF report of findings is included in the GitHub file, as well as the six datasets used. Additionally, an ER diagram is included to visualize the relational database created.

Python packages used: pandas, numpy, sklearn, matplotlib, scaler, seaborn, sqlite3, plotly, pyspark (view code)
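As a rough illustration of how sqlite3 can link country-level datasets for querying, here is a minimal sketch. It is not the project's code: the table names, column names, country names, and values are all hypothetical placeholders, and only two tables are shown instead of six.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical country-level tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE happiness (country TEXT PRIMARY KEY, score REAL)")
    conn.execute(
        "CREATE TABLE environment (country TEXT PRIMARY KEY, air_quality_index REAL)"
    )
    # Placeholder rows with invented countries and values, for illustration only.
    conn.executemany(
        "INSERT INTO happiness VALUES (?, ?)",
        [("Aland", 7.0), ("Borduria", 5.5), ("Cascadia", 6.2)],
    )
    conn.executemany(
        "INSERT INTO environment VALUES (?, ?)",
        [("Aland", 80.0), ("Borduria", 60.0)],
    )
    return conn


def joined_rows(conn):
    """Link the two tables on the shared country key, highest score first."""
    query = """
        SELECT h.country, h.score, e.air_quality_index
        FROM happiness AS h
        JOIN environment AS e ON e.country = h.country
        ORDER BY h.score DESC
    """
    return conn.execute(query).fetchall()


rows = joined_rows(build_demo_db())
```

An inner join keeps only countries present in both tables, so the placeholder country that has no environment row is dropped from `rows`. A `LEFT JOIN` would keep it with a missing value instead.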