The following slides demonstrate several notable trends in the data.
They are intended to inspire your own explorations in the interactive
section that follows. Our interactive page does not yet compute statistics
itself, so we used outside tools such as Tableau to analyze these
relationships statistically.
In statistics, the p-value is the probability of observing a relationship at least as strong as the one in the data if the two variables were in fact unrelated. A p-value below 0.05 means the observed relationship is unlikely to have arisen by chance alone, and by convention we treat it as statistically significant. The relationships on the following slides all have p-values below 0.05, so all are statistically significant. The R-Squared value describes the proportion of the variation in the dependent variable that is explained by the independent variable. That is, if R-Squared = 0.5, the independent variable explains 50% of the variation in the dependent variable.
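To make the R-Squared definition concrete, here is a minimal sketch in Python of how it can be computed for a straight-line fit. The function name and the sample numbers are illustrative assumptions; they are not taken from the Food Environment Atlas or from our Tableau analysis.

```python
def r_squared(xs, ys):
    """Share of the variation in ys explained by a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For a one-variable linear fit, R-Squared equals the squared correlation
    return (sxy * sxy) / (sxx * syy)


# Made-up values for four hypothetical counties (not real data)
x_values = [1, 2, 3, 4]
y_values = [1, 3, 2, 4]
print(r_squared(x_values, y_values))
```

A result near 1 means the points lie close to a straight line, while a result near 0 means a straight line explains very little of the variation.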
The data explored come from the USDA's Food Environment Atlas. It is a collection of data from a variety of sources including the USDA Economic Research Service (ERS), the US Census Bureau County Business Patterns (CBP), the USDA Food and Nutrition Service (FNS), and the US Centers for Disease Control and Prevention (CDC). Click the right arrow to view the next slide or use the dots below to navigate.
Our data set consists of different categories of information about each county in the United States taken from the USDA’s Food Environment Atlas. One of the goals of the Food Environment Atlas is to “provide a spatial overview of a community's ability to access healthy food and its success in doing so.” However, the atlas is slow to load and does not let users compare variables or find patterns across them. Therefore, we decided to use the data to dig deeper. These data describe counties, not individual people; nonetheless, given the large sample size, we can still find meaningful relationships. Furthermore, the Food Environment Atlas contains information from different studies conducted in different years. All the variables we used are from studies conducted between 2010 and 2014. Assuming that no huge changes occurred over those four years, we decided that the richness of the data was more important than the exact year of each study.
Our page is organized into four sections: the title section, the “Learn” section, the “Explore” section, and the “About” section. The title section introduces our project, puts it in context, and explains why the viewer should care. The “Learn” section is a slideshow that gives the user an overview of what is coming and prepares them for the interaction. The first slide is an introduction, the second slide gives an overview of the page they will see and interact with, and the remaining slides give examples of variables with statistically significant correlations, show a static image of what the interaction page looks like for those two variables, and describe what the correlation means and its possible implications. The goal is to get the viewer’s mind “warmed up” and ready to make their own discoveries!
The “Explore” section is the most vital part of the project. It allows the user to interact with the data and find interesting patterns. We hope the users will find even more interesting patterns than we did. We created a choropleth map to meaningfully represent the distribution of data throughout the US. Because our goal is to compare different variables, we decided to overlay a second choropleth map, allowing the user to view the distributions of two variables at once and find patterns. We also provided a slider bar allowing the user to slide to each side and view each variable separately. The two dropdown menus allow the user to select one variable at a time for each layer. There is a legend for each dropdown underneath the map, each corresponding to the dropdown directly above it. The dropdown menus allow for selection of a variety of options without cluttering the screen with buttons.
We chose a linear lightness color scale, running from white to red or from white to blue, because percentage data are continuous. Each variable's scale is customized to show contrast within its own value range. Some variables have narrower ranges and some have wider ones. For example, adult diabetes rate ranges from 0 to 20%, while poverty rate ranges from 0 to 50%. All ranges start at 0. The variety of ranges is necessary so that the colors of the map can display meaningful change and patterns for the different variables.
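The idea of a per-variable linear scale can be sketched as follows. This is a minimal illustration in Python, not the code used on our page: the function name is hypothetical, and it assumes simple linear interpolation in RGB between white and the target color, which may differ from the color space the page actually uses.

```python
def lightness_color(value, max_value, hue=(255, 0, 0)):
    """Map a value in [0, max_value] to a hex color between white and hue."""
    # Fraction of the way along the scale, clamped to the range [0, 1]
    t = min(max(value / max_value, 0.0), 1.0)
    # Interpolate each RGB channel linearly from white (255) toward the hue
    r, g, b = (round(255 + (channel - 255) * t) for channel in hue)
    return f"#{r:02x}{g:02x}{b:02x}"


# Diabetes rate on a 0-20% scale (white to red)
print(lightness_color(10, 20))
# Poverty rate on a 0-50% scale (white to blue)
print(lightness_color(10, 50, hue=(0, 0, 255)))
```

The same 10% value lands halfway along the diabetes scale but only a fifth of the way along the poverty scale, which is why each variable needs its own range to show contrast.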
In addition to the map, we added a scatterplot showing the relationship between the two variables for each county. We also included a linear regression line to indicate the overall trend in the data. While we would have liked to include p-values and R-Squared values for the linear regression, we were unable to find the proper tool with which to calculate these values on our webpage.
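For reference, the regression line and an approximate p-value for its slope can be computed with basic arithmetic. The sketch below is in Python and is not part of our page; the function name is hypothetical, and it assumes a normal approximation to the t distribution, which is only reasonable when the number of points is large, as it is with county-level data.

```python
import math


def linear_regression(xs, ys):
    """Least-squares line through (xs, ys) plus an approximate slope p-value."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope (n - 2 degrees of freedom)
    std_err = math.sqrt(sse / (n - 2) / sxx)
    if std_err == 0:
        p_value = 0.0  # perfect fit: no residual scatter at all
    else:
        z = abs(slope) / std_err
        # Two-sided tail probability under the normal approximation (assumption)
        p_value = math.erfc(z / math.sqrt(2))
    return slope, intercept, p_value


# Made-up values for illustration only (not real county data)
slope, intercept, p_value = linear_regression([1, 2, 3, 4], [1, 3, 2, 4])
print(slope, intercept, p_value)
```

With only a handful of points, as in this toy example, the normal approximation understates the p-value, so an exact t distribution from a statistics library would be the safer choice for small samples.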
Finally, we included two textboxes: one containing the exact definitions of the variables currently selected, and one that displays details about a county when the user hovers over it on the map. The details included are the county name, state, diabetes rate, obesity rate, and the variables currently selected (if different from the above). This allows the user to discover more specific information about each county and better understand the underlying distribution of the data. We chose to put this information in a textbox on the side so the user can read it without disturbing their overall view of the display.
The “About” section (this section) provides users with a deeper understanding of our data, design, and process.
Our team has 3 members. We worked together to choose the dataset and specify which parts of it we wanted to use. Together, we decided we wanted to visualize the data on a map. We split the development among the three of us. For the prototype, Jessica created the initial map visualization and added visualization of one variable. Boren added the remaining variables and a dropdown menu for interaction so the user can visualize different variables on the map. Hao worked on getting our data into a JSON file, beautifying our design, and fixing a few bugs that arose, which took about 10 hours in total. Since then, we have worked together to improve our design. Hao took on the feat of adding a second layer to our map and a slider bar, importing new variables, editing the legends, and re-styling our entire page. Jessica added the county details hover data, information to guide the user through the visualization, and the “Learn” section. Boren added the trendline and linear regression line. This project has been labor intensive and highly rewarding. We hope it helps improve people’s understanding of diabetes rates, obesity rates, and other factors.