Jessica Schwartz, Boren Li, Hao Hu Zhao
About 9.3% of the adult population in the US has been diagnosed with diabetes and about
35.7% of adults in the US are considered obese. A 2017 study by Cleveland Clinic found that
obesity is the leading cause of early preventable deaths in the US, exceeding even tobacco.
This project aims to explore different
factors that may be associated with obesity and diabetes, their distribution across the US,
and how these variables correlate with each other. Finding patterns in these relationships
may improve our understanding of public health issues and help inform policy.
It is important to note that the relationships found between variables are solely correlational
and have not been proven to be causal. Nonetheless, we can learn a lot by observing these relationships.
The "Learn" section contains descriptions of a few interesting relationships we have found in the
data.
In the "Explore" section, you may explore the data for yourself, comparing different variables and
viewing the relationships between them.
Our data set consists of several categories of information about each county in the United States, taken from the USDA’s Food Environment Atlas. One of the goals of the Food Environment Atlas is to “provide a spatial overview of a community's ability to access healthy food and its success in doing so.” However, the atlas is slow to load and does not let users compare variables or look for patterns across them. Therefore, we decided to use the data to dig deeper. These data describe counties, not individual people; nonetheless, given the large sample size, we can still find meaningful relationships. Furthermore, the Food Environment Atlas combines information from different studies conducted in different years. All the variables we used come from studies conducted between 2010 and 2014. Assuming that no large changes occurred within those four years, we decided that the richness of the data was more important than the exact year of each study.
Our page is organized into four sections: the title section, the “Learn” section, the “Explore” section, and the “About” section. The title section introduces our project, puts it in context, and explains why the viewer should care. The “Learn” section is a slideshow that gives the user an overview of what is coming and prepares them for the interaction. The first slide is an introduction, and the second slide gives an overview of the page they will see and interact with. Each remaining slide presents a pair of variables with a statistically significant correlation, shows a static image of what the interaction page looks like for those two variables, and describes what the correlation means and its possible implications. The goal is to get the viewer’s mind “warmed up” and ready to make their own discoveries!
The “Explore” section is the most vital part of the project. It allows the user to interact with the data and find interesting patterns. We hope the users will find even more interesting patterns than we did. We created a choropleth map to meaningfully represent the distribution of data throughout the US. Because our goal is to compare different variables, we decided to overlay a second choropleth map, allowing the user to view the distributions of two variables at once and find patterns. We also provided a slider bar allowing the user to slide to each side and view each variable separately. The two dropdown menus allow the user to select one variable at a time for each layer. There is a legend for each dropdown underneath the map, each corresponding to the dropdown directly above it. The dropdown menus allow for selection of a variety of options without cluttering the screen with buttons.
We chose a linear lightness color scale, from white to red or from white to blue, because percentage data is continuous. Each variable's scale is customized to show contrast across its own value range. Some variables have narrower ranges and some have wider ones. For example, adult diabetes rate ranges from 0 to 20%, while poverty rate ranges from 0 to 50%. All ranges start at 0. The variety of ranges is necessary so that the colors of the map can display meaningful change and patterns for the different variables.
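To make the color mapping concrete, here is a minimal sketch of a white-to-color linear scale with a per-variable maximum. It is an illustration only, not the code used on our page: the names `scaleMax`, `colorFor`, `RED`, and `BLUE` are hypothetical, and only the two maxima (20% and 50%) come from the description above.

```typescript
// Hypothetical per-variable maxima, in percent. All scales start at 0.
const scaleMax = { diabetesRate: 20, povertyRate: 50 };

type RGB = [number, number, number];

// End colors for the two map layers (assumed pure red and pure blue).
const RED: RGB = [255, 0, 0];
const BLUE: RGB = [0, 0, 255];

// Map a value in [0, max] to a color between white and the end color.
function colorFor(value: number, max: number, end: RGB): string {
  // Clamp so that out-of-range values still produce a valid color.
  const t = Math.min(Math.max(value / max, 0), 1);
  // Interpolate each channel linearly from 255 (white) to the end color.
  const channel = (c: number): number => Math.round(255 + (c - 255) * t);
  return `rgb(${channel(end[0])}, ${channel(end[1])}, ${channel(end[2])})`;
}

// A 10% diabetes rate sits halfway along the 0-20% scale.
const halfway = colorFor(10, scaleMax.diabetesRate, BLUE);
```

Because each variable is divided by its own maximum, a county at 10% diabetes and a county at 25% poverty receive the same halfway shade, which is what lets narrow-range and wide-range variables both show visible contrast.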
In addition to the map, we added a scatterplot showing the relationship between the two selected variables, with one point per county. We also included a linear regression line to indicate the overall trend in the data. While we would have liked to include p-values and R-squared values for the linear regression, we were unable to find a suitable tool for calculating these values on our webpage.
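For readers curious about the regression itself, the following is a minimal sketch of an ordinary least-squares fit that also returns R-squared, written without any statistics library. It is not the code behind our trendline; the function name `linearFit` and the `Fit` type are hypothetical. A p-value is not covered here, since it would additionally require the t-distribution.

```typescript
interface Fit {
  slope: number;
  intercept: number;
  rSquared: number;
}

// Ordinary least-squares fit of y = slope * x + intercept.
function linearFit(xs: number[], ys: number[]): Fit {
  const n = xs.length;
  if (n !== ys.length || n < 2) {
    throw new Error("need at least two paired points");
  }
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;

  // Sums of squared deviations and cross-deviations from the means.
  let sxx = 0;
  let sxy = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }
  if (sxx === 0) {
    throw new Error("x values are all identical");
  }

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  // With one predictor, R-squared equals the squared Pearson correlation.
  // It is NaN when every y value is the same (no variance to explain).
  const rSquared = (sxy * sxy) / (sxx * syy);
  return { slope, intercept, rSquared };
}
```

For example, calling `linearFit` with one array of county obesity rates and a matching array of diabetes rates would give the slope and intercept needed to draw the line, plus the share of variance the line explains.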
Finally, we included two textboxes: one containing the exact definitions of the currently selected variables, and one that displays details about a county when the user hovers over it on the map. The details shown are the county name, state, diabetes rate, obesity rate, and the values of the currently selected variables (if they differ from diabetes and obesity rate). This lets the user find more specific information about each county and better understand the underlying distribution of the data. We chose to place this information in a textbox on the side so the user can read it without disturbing their overall view of the display.
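The rule for which details appear on hover can be sketched as follows. This is an illustration under assumed names, not our page's code: the `County` shape, the keys `diabetesRate` and `obesityRate`, and the function `detailLines` are all hypothetical.

```typescript
interface County {
  name: string;
  state: string;
  // Variable key to percentage value; keys here are assumed names.
  values: Record<string, number>;
}

// Build the text lines shown for a hovered county.
function detailLines(county: County, selected: string[]): string[] {
  // Diabetes and obesity rates always come first. The Set drops a selected
  // variable that repeats one of them, while keeping insertion order.
  const keys = Array.from(
    new Set<string>(["diabetesRate", "obesityRate", ...selected])
  );
  const lines = [`${county.name}, ${county.state}`];
  keys.forEach((key) => {
    const v = county.values[key];
    // Show "n/a" when a county has no value for a variable.
    lines.push(`${key}: ${v === undefined ? "n/a" : v.toFixed(1) + "%"}`);
  });
  return lines;
}
```

If the user has selected obesity rate on one layer and poverty rate on the other, obesity rate is listed once rather than twice, and poverty rate is appended after the two default rates.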
The “About” section (this section) gives users a deeper understanding of our data, design, and process.
Our team has three members. We worked together to choose the dataset and decide which parts of it to use. Together, we decided to visualize the data on a map. We split the development among the three of us. For the prototype, Jessica created the initial map visualization and added the visualization of one variable. Boren added the remaining variables and a dropdown menu so the user can visualize different variables on the map. Hao worked on getting our data into JSON, polishing our design, and helping to fix a few bugs that arose, which took about 10 hours in total. Since then, we have worked together to improve our design. Hao took on the feat of adding a second layer to our map and a slider bar, importing new variables, editing the legends, and restyling our entire page. Jessica added the county details shown on hover, information to guide the user through the visualization, and the “Learn” section. Boren added the trendline and linear regression line. This project has been labor-intensive and highly rewarding. We hope it helps improve people’s understanding of diabetes rates, obesity rates, and other factors.