Diabetes and Obesity Factors

Jessica Schwartz, Boren Li, Hao Hu Zhao

About 9.3% of the adult population in the US has been diagnosed with diabetes and about 35.7% of adults in the US are considered obese. A 2017 study by Cleveland Clinic found that obesity is the leading cause of early preventable deaths in the US, exceeding even tobacco. This project aims to explore different factors that may be associated with obesity and diabetes, their distribution across the US, and how these variables correlate with each other. Finding patterns in these relationships may improve our understanding of public health issues and help inform policy. It is important to note that the relationships found between variables are solely correlational and have not been proven to be causal. Nonetheless, we can learn a lot by observing these relationships.
The "Learn" section contains descriptions of a few interesting relationships we have found in the data.
In the "Explore" section, you may explore the data for yourself, comparing different variables and viewing the relationships between them.

Learn


The following slides demonstrate various interesting trends in the data. They are intended to inspire your own explorations in the interactive section that follows. Although the interactive section does not currently compute any statistics, we used outside tools such as Tableau to analyze these relationships statistically.
In statistics, the p-value is the probability of observing a relationship at least as strong as the one in the data if, in fact, there were no relationship between the two variables. If the p-value is less than 0.05, a relationship this strong would be unlikely to arise by chance alone, so we call the relationship between the two variables statistically significant. The relationships on the following slides all have a p-value less than 0.05, so all are statistically significant. The R-Squared value describes the proportion of the variation in the dependent variable that is explained by its linear relationship with the independent variable. That is, if R-Squared = 0.5, then 50% of the variation in the dependent variable is explained by variation in the independent variable.
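To make the R-Squared definition concrete, here is a minimal Python sketch that fits an ordinary least-squares line and computes R-Squared as one minus the share of variation the line leaves unexplained. The function name and the five "counties" are hypothetical toy values for illustration, not the project's data or Tableau's computation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Total variation of y around its mean, and the part the fitted line leaves unexplained.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical toy data: x = poverty rate (%), y = adult diabetes rate (%) for five counties.
xs = [10, 15, 20, 25, 30]
ys = [8, 11, 9, 13, 12]
slope, intercept, r2 = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
# prints: slope=0.200, intercept=6.600, R^2=0.581
```

Here R-Squared of about 0.58 means the line accounts for roughly 58% of the variation in the toy diabetes rates; the remaining 42% is left unexplained.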
The data explored come from the USDA's Food Environment Atlas. It is a collection of data from a variety of sources including the USDA Economic Research Service (ERS), the US Census Bureau County Business Patterns (CBP), the USDA Food and Nutrition Service (FNS), and the US Centers for Disease Control and Prevention (CDC). Click the right arrow to view the next slide or use the dots below to navigate.

Let's start our exploration by familiarizing ourselves with the layout. Element 1 contains two dropdown menus for selecting the variables to compare. Element 2 layers two choropleth maps: the left variable in blue and the right variable in red. A legend describing the color values appears below the map in element 3. Element 4 is a slider bar; sliding it to either side shows the distribution of that variable alone. By default the two maps are overlaid so you can look for patterns. Many purple regions suggest that the variables are positively correlated, separate patches of red and blue suggest a negative correlation, and a mix of all three colors suggests no correlation. When you hover over a county, element 5 shows the county name, state, diabetes and obesity rates, and the values of the variables you have selected. Element 6 contains a scatterplot of the counties for the selected variables along with a linear regression line, and element 7 contains the definitions of the selected variables.
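One way to see why purple signals a positive relationship is to model the overlay with standard alpha ("over") compositing. The sketch below is a hypothetical model, not the page's code: it assumes pure blue and red end colors, a red layer drawn at 50% opacity over the blue layer, and each variable already rescaled to a 0-to-1 position within its own range.

```python
WHITE, BLUE, RED = (255, 255, 255), (0, 0, 255), (255, 0, 0)  # hypothetical colors


def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t runs from 0 (start) to 1 (end)."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def over(top, bottom, alpha):
    """Composite an opaque `bottom` color under `top` drawn with opacity `alpha`."""
    return tuple(round(alpha * t + (1 - alpha) * b) for t, b in zip(top, bottom))


def county_color(left_share, right_share, alpha=0.5):
    """left_share / right_share: each variable's position in its own scale, 0 (low) to 1 (high)."""
    blue_layer = lerp_color(WHITE, BLUE, left_share)
    red_layer = lerp_color(WHITE, RED, right_share)
    return over(red_layer, blue_layer, alpha)


print(county_color(1.0, 1.0))  # both high: (128, 0, 128), purple
print(county_color(1.0, 0.0))  # only left high: (128, 128, 255), pale blue
print(county_color(0.0, 1.0))  # only right high: (255, 128, 128), pale red
print(county_color(0.0, 0.0))  # both low: (255, 255, 255), white
```

Under these assumptions, counties where both variables are high come out purple, while counties where only one is high stay pale blue or pale red, which is the pattern the overlay reading relies on.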
Here, we can explore the percentage of households receiving SNAP benefits and its relationship to the percentage of adults with diabetes. The percentage of adults with diabetes increases as the percentage of households receiving SNAP benefits increases, with R-Squared = 0.0125. This means that about 1.3% of the variation in adult diabetes across counties can be explained by variation in the percentage of households receiving SNAP benefits. This could mean many things: households receiving SNAP benefits may be more likely to eat less nutritious food because of its price, or the two may have nothing to do with each other and something else may be driving variation in both SNAP participation and adult diabetes rates.
Here, we can explore the percentage of students eligible for free lunch and its relationship to the percentage of adults with diabetes. The percentage of adults with diabetes increases as the percentage of students eligible for free lunch increases, with R-Squared = 0.0229. This means that about 2.3% of the variation in adult diabetes can be explained by variation in the percentage of students eligible for free lunch. Perhaps lower-income families whose children qualify for free lunch cannot afford food with enough nutritional variety, which could contribute to higher diabetes rates among adults. Again, these data cannot tell us what causes this relationship, but it exists and is therefore important to consider.
Here, we can explore the relationship between the poverty rate and the obesity rate in a county. The adult obesity rate increases with the poverty rate, with R-Squared = 0.0122. From this we learn that about 1.2% of the variation in adult obesity can be explained by variation in the poverty rate. Is there another factor common to these areas that drives both rates? Or do people living in poverty simply have less access to nutritious food, so that their diet contributes to obesity?
Here, we can explore the relationship between the poverty rate and the diabetes rate in a county. The adult diabetes rate increases with the poverty rate, with R-Squared = 0.0169. From this we learn that about 1.7% of the variation in adult diabetes can be explained by variation in the poverty rate. Is there another factor common to these areas that drives both rates? Or do people living in poverty simply have less access to nutritious food, so that their diet contributes to the development of diabetes?
Here, we can explore the relationship between the percentage of adults with diabetes and the price of soda in each county. The diabetes rate decreases as the price of soda increases, with P-value < 0.0001 and R-Squared of about 0.0094. This means that about 1% of the variation in county diabetes rates can be explained by variation in the price of soda. One possible explanation is that higher soda prices discourage people from drinking soda, reducing their sugar consumption and potentially lowering the diabetes rate. However, there are many other possibilities.
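An R-Squared of about 0.0094 can sit alongside a P-value below 0.0001 because significance also depends on how many counties are in the sample. The sketch below uses the standard t-statistic for a correlation, t = |r| * sqrt((n - 2) / (1 - r^2)), with a normal approximation to the t distribution that is reasonable when n is large. The sample sizes are hypothetical, since the slide does not state how many counties had soda-price data.

```python
from math import sqrt
from statistics import NormalDist


def approx_p_value(r_squared, n):
    """Two-sided p-value for a simple linear correlation with the given R-Squared and n points.

    Uses |r| = sqrt(R-Squared) and approximates the t distribution (n - 2 degrees of
    freedom) with a standard normal, which is only reasonable for large n.
    """
    t = sqrt(r_squared * (n - 2) / (1 - r_squared))
    return 2 * (1 - NormalDist().cdf(t))


for n in (100, 500, 3000):  # hypothetical numbers of counties with data
    print(n, f"{approx_p_value(0.0094, n):.2g}")
```

With 100 counties the same weak relationship would not reach significance, but with a few thousand the p-value falls far below 0.0001, so a tiny p-value says the relationship is unlikely to be chance, not that it is strong.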
Here we explore the relationship between the adult obesity rate and the number of recreation and fitness facilities available. An increased number of recreation and fitness facilities correlates with decreased obesity rates, with R-Squared = 0.0332. This means that about 3.3% of the variation in obesity rates can be explained by variation in the availability of fitness facilities. Counties with many recreation facilities may also have other attributes that lead to lower obesity rates. Or, higher availability of fitness centers may mean that more people have access to exercise, and thus the obesity rate is lower. Counties with more fitness facilities likely have higher demand for them, and this demand could come from a wealthier population or simply from a population more aware of the health benefits of exercise. Note also that data are missing for many counties. Regardless, this relationship could be important to consider when policy is discussed.
Here we explore the relationship between the diabetes rate and the number of recreation and fitness facilities available. Adult diabetes rates decrease as the number of recreation and fitness facilities increases, with R-Squared = 0.0761. This means that about 7.6% of the variation in diabetes rates can be explained by variation in the availability of fitness facilities. Counties with many recreation facilities may also have other attributes that lead to lower diabetes rates. Or, higher availability of fitness centers may mean that more people exercise, and thus the diabetes rate is lower. This trend is worth noticing because it is the strongest relationship shown in these slides, although it still explains less than a tenth of the variation. Note also that data are missing for many counties. This relationship could be important to consider when policy is discussed.
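For a simple linear regression with one independent variable, R-Squared equals the square of the Pearson correlation coefficient r, so the slides' figures can be restated as correlations. This short check uses only the R-Squared values and trend directions reported on the slides; the labels are shorthand, and the correlations are derived from R-Squared rather than separately measured.

```python
from math import sqrt

# R-Squared and trend direction (+1 increasing, -1 decreasing) as reported on the slides.
slides = {
    "SNAP households vs. adult diabetes": (0.0125, +1),
    "Free-lunch eligibility vs. adult diabetes": (0.0229, +1),
    "Poverty rate vs. adult obesity": (0.0122, +1),
    "Poverty rate vs. adult diabetes": (0.0169, +1),
    "Soda price vs. adult diabetes": (0.0094, -1),
    "Fitness facilities vs. adult obesity": (0.0332, -1),
    "Fitness facilities vs. adult diabetes": (0.0761, -1),
}

summary = []
for name, (r2, sign) in slides.items():
    r = sign * sqrt(r2)  # correlation coefficient; its sign follows the trend direction
    summary.append((name, r2 * 100, r))
    print(f"{name}: {r2 * 100:.2f}% of variance explained, r = {r:+.2f}")
```

All seven correlations are below 0.3 in magnitude: statistically significant, but weak, as the small percentages of variance explained also indicate.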

Explore


Data

Our data set consists of different categories of information about each county in the United States, taken from the USDA’s Food Environment Atlas. One of the goals of the Food Environment Atlas is to “provide a spatial overview of a community's ability to access healthy food and its success in doing so.” However, the atlas is slow to load and does not make it easy to compare variables or find patterns between them, so we decided to use its data to dig deeper. These data describe counties rather than individual people, so the relationships we find hold at the county level and may not apply to individuals; nonetheless, given the large number of counties, we can still find meaningful relationships. Furthermore, the Food Environment Atlas draws on different studies from different years. All the variables we used come from studies conducted between 2010 and 2014. Assuming that no major changes occurred over those four years, we decided that the richness of the data matters more than the exact year of each study.

Design Rationale

Our page is organized into four sections: the title section, the “Learn” section, the “Explore” section, and the “About” section. The title section introduces our project, puts it in context, and gives the viewer a reason to care. The “Learn” section is a slideshow that gives the user an overview of what is coming and prepares them for the interaction. The first slide is an introduction, the second describes the layout of the page they will see and interact with, and each remaining slide gives an example of two variables with a statistically significant correlation, shows a static image of the interactive page for those two variables, and describes what the correlation means and its possible implications. The goal is to get the viewer’s mind “warmed up” and ready to make their own discoveries!

The “Explore” section is the most vital part of the project. It allows the user to interact with the data and find interesting patterns, and we hope users will find even more interesting patterns than we did. We created a choropleth map to meaningfully represent the distribution of data throughout the US. Because our goal is to compare different variables, we overlaid a second choropleth map, allowing the user to view the distributions of two variables at once and find patterns. We also provided a slider bar that lets the user slide to either side to view each variable separately. The two dropdown menus allow the user to select one variable at a time for each layer. Underneath the map is one legend per dropdown, each corresponding to the dropdown directly above it. The dropdown menus allow for selection among a variety of options without cluttering the screen with buttons.

The color scale we chose is a linear lightness scale, from white to red or from white to blue, because percentage data are continuous. Each variable's scale is customized to show contrast within its own value range; some variables have narrower ranges and some have wider ones. For example, the adult diabetes rate scale ranges from 0 to 20%, while the poverty rate scale ranges from 0 to 50%. All ranges start at 0. This variety of ranges is necessary so that the map colors display meaningful changes and patterns for each variable.
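The per-variable scales can be sketched as a mapping from each variable's own domain, starting at 0, onto a white-to-color ramp. This is a minimal sketch, not the page's code: it assumes pure blue and red end colors and interpolates linearly in RGB, a simple stand-in for a lightness scale (a perceptually uniform ramp would interpolate in a space such as CIELAB). Only the 0-20% and 0-50% ranges come from the text above; `make_scale` is a hypothetical helper.

```python
WHITE = (255, 255, 255)
BLUE = (0, 0, 255)  # hypothetical end colors; the page's exact hues are not given
RED = (255, 0, 0)


def make_scale(domain_max, end_color):
    """Return a function mapping a percentage in [0, domain_max] to an RGB color
    interpolated linearly from white (at 0) to end_color (at domain_max)."""
    def scale(value):
        t = min(max(value / domain_max, 0.0), 1.0)  # clamp values outside the domain
        return tuple(round(w + (e - w) * t) for w, e in zip(WHITE, end_color))
    return scale


diabetes_scale = make_scale(20, BLUE)  # adult diabetes rate: 0-20%
poverty_scale = make_scale(50, RED)    # poverty rate: 0-50%

print(diabetes_scale(10))  # (128, 128, 255): halfway along the diabetes scale
print(poverty_scale(10))   # (255, 204, 204): only a fifth of the way along the poverty scale
```

The same 10% value lands halfway along the diabetes scale but only a fifth of the way along the poverty scale, which is why each variable needs its own range for its colors to show contrast.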

In addition to the map, we added a scatterplot that plots each county by its values for the two selected variables, along with a linear regression line to indicate the overall trend in the data. We would have liked to include p-values and R-Squared values for the regression, but we were unable to find a suitable tool for calculating these values on our webpage.
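One dependency-free way a page could report a p-value is a permutation test: shuffle one variable many times and count how often the shuffled data show a correlation at least as strong as the real one. This is a swapped-in technique for illustration, not the parametric test that tools such as Tableau report, and the function names and toy data are hypothetical. It is written in Python for consistency with the other sketches; the same loop could be ported to whatever language the page uses.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient; for a simple linear regression, R-Squared = r ** 2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided permutation test: the fraction of shuffles of ys whose |r| is at least
    the observed |r|, with +1 in numerator and denominator so it is never exactly 0."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # breaks any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Hypothetical data with a clear upward trend plus alternating noise.
xs = list(range(20))
ys = [2 * x + ((-1) ** x) * 3 for x in xs]
r = pearson_r(xs, ys)
print(f"R^2 = {r * r:.3f}, p ~ {permutation_p_value(xs, ys):.4f}")
```

Because the estimate comes from random shuffles, more trials give a more precise p-value at the cost of more computation, which matters for an interactive page that recomputes on every dropdown change.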

Finally, we included two textboxes: one containing the exact definitions of the variables currently selected, and one that displays details about a county when the user hovers over it on the map. The details include the county name, state, diabetes rate, obesity rate, and the variables currently selected (if they differ from diabetes and obesity). This allows the user to discover more specific information about each county and better understand the underlying distribution of the data. We placed this information in a textbox on the side so the user can view it without disturbing their overall view of the display.

The “About” section (this section) provides users with a deeper understanding of our data, design, and process.

Development Process

Our team has three members. We worked together to choose the dataset and decide which parts of it to use, and together we decided to visualize the data on a map. We then split the development among the three of us. For the prototype, Jessica created the initial map visualization and added visualization of one variable. Boren added the remaining variables and a dropdown menu so the user can visualize different variables on the map. Hao converted our data into JSON, polished the design, and helped fix a few bugs that arose, work that took about 10 hours in total. Since then, we have worked together to improve our design. Hao took on the substantial task of adding a second layer and a slider bar to our map, importing new variables, editing the legends, and restyling the entire page. Jessica added the county details hover data, information to guide the user through the visualization, and the “Learn” section. Boren added the linear regression trendline. This project has been labor intensive and highly rewarding. We hope it helps improve people’s understanding of diabetes rates, obesity rates, and other related factors.