Weather and Bike Sharing in San Francisco

Team Members: Matthew Chun, Felix Ouk, Boyan Li, Qianying Chen

Click here to see our visualization

Design Rationale

To understand why we chose our visual encodings and interaction techniques, we must first explain our visualization goals and target audience. The goal of our visualization is to help bring bike sharing, a more environmentally friendly transportation method, to big cities. We use San Francisco as a case study to encourage politicians, local officials, and businesses in other cities to implement safe and well-designed bike sharing programs of their own. Our visualization comes at a good time, as some large cities like Seattle are looking to scale back their bike sharing programs due to lack of use.

An important metric politicians and businesses track before implementing such a program is the number of people that use an existing bike sharing program. Our visualization provides a number of filters that let viewers explore bike sharing usage at varying times and temperatures. Through this exploration, our government and business users can learn how to make better use of public tax dollars spent on bike sharing systems and boost their political ratings as a result. Businesses could use our visualization to estimate how much money they could make at a given time and location. Police could use it as well to explore when and where to patrol the city to keep bikers safe.

Our visualization depicts the relationship between weather and bike sharing usage patterns in San Francisco in 2014. In our prototype we focus on the relationship between the outside temperature and time versus bike sharing usage. To display this relationship our visualization consists of two correlated components: a weather plot visualization and a detailed map.

Our D3 plot is a dual line graph that simultaneously visualizes the change of temperature and number of trips over time. We chose a line encoding with time on the x-axis because line encodings encourage viewers to think about rate of change over time. We allow users to plot temperature and trip count data simultaneously in order to expose any correlation between the two variables. Users can choose to focus on correlation or a specific data line by toggling the temperature or trip count line on and off. When a specific day is selected by the user, we use a vertical today/current bar to partition the chart into a historical section and a future section to help our users contextualize the selected day's data. We help guide our viewers from the current day to the day they pick by animating an 800ms transition that slides from the original date to the selected date. We asked random people in the labs whether they preferred the transition or an erase and immediate redraw of the bar and found that the sliding transition was preferred. One interviewee said that the transition made it feel like they were "moving in time". We made sure to maintain the same time scale and kept the bar present throughout the entire transition to adhere to congruence and apprehension principles for animation. The today bar disappears when the user filters the data by temperature and is replaced by a highlight of applicable data.
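The sliding motion of the current bar can be illustrated with a small sketch. It assumes a linear time scale over the plot width and a cubic in-out easing; the easing choice and all function names here are assumptions for illustration, since the text only states the 800ms duration.

```javascript
// Map a date to a horizontal pixel position on a linear time scale.
// domainStart and domainEnd are the first and last dates shown on the x-axis.
function xForDate(date, domainStart, domainEnd, width) {
  const fraction = (date - domainStart) / (domainEnd - domainStart);
  return fraction * width;
}

// Cubic in-out easing: slow start, fast middle, slow end (assumed, not confirmed).
function easeCubicInOut(t) {
  return t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2;
}

// Position of the bar after elapsedMs of a transition from fromX to toX.
// The bar stays on screen for the whole transition and the scale never changes.
function barPositionAt(elapsedMs, fromX, toX, durationMs = 800) {
  const t = Math.min(Math.max(elapsedMs / durationMs, 0), 1);
  return fromX + (toX - fromX) * easeCubicInOut(t);
}

// With D3 loaded, the same effect is usually written declaratively, e.g.:
// bar.transition().duration(800).attr("x1", newX).attr("x2", newX);
```

Keeping the scale fixed and moving only the bar is what lets the viewer read the motion as a change of date rather than a change of chart.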

The map below the weather plot shows bike sharing usage. It does this by drawing an undirected graph where the nodes are the bike sharing stations and the edges are weighted by the usage of bikes between two stations. We use blue markers for the stations and red edges for the paths, following color conventions similar to those Google Maps uses for maps. Each edge encodes trip count, which has one of two meanings depending on whether a temperature filter is applied. When no temperature filter is applied, the edge weight represents the number of trips between two stations in both directions for one day. When the filter is applied, the edge weight represents the average number of trips per day between two stations over all the days where the temperature was in the specified range. We encode trip count (or average trip count) with edge stroke width because length is the second best encoding according to Mackinlay's effectiveness ranking for quantitative fields.
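The two meanings of the edge weight can be sketched as follows. The trip record shape ({ date, start, end }), the tempByDate lookup, and the function names are assumptions made for illustration; the actual data format is not described here.

```javascript
// Build an order-independent key so A->B and B->A land on the same undirected edge.
function edgeKey(a, b) {
  return a < b ? `${a}|${b}` : `${b}|${a}`;
}

// No temperature filter: count trips per station pair for a single day.
function edgeCountsForDay(trips, date) {
  const counts = new Map();
  for (const t of trips) {
    if (t.date !== date) continue;
    const key = edgeKey(t.start, t.end);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Temperature filter applied: average trips per day per station pair over
// all days whose temperature falls inside [minTemp, maxTemp].
function averageEdgeCounts(trips, tempByDate, minTemp, maxTemp) {
  const days = Object.keys(tempByDate).filter(
    (d) => tempByDate[d] >= minTemp && tempByDate[d] <= maxTemp
  );
  const daySet = new Set(days);
  const totals = new Map();
  for (const t of trips) {
    if (!daySet.has(t.date)) continue;
    const key = edgeKey(t.start, t.end);
    totals.set(key, (totals.get(key) || 0) + 1);
  }
  const averages = new Map();
  for (const [key, total] of totals) {
    averages.set(key, total / days.length);
  }
  return averages;
}
```

Dividing by the number of matching days, rather than by the number of days on which a pair was actually used, keeps rarely used pairs from looking busier than they are.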

Our visualization lets users explore further by binning the trip counts into low, medium, and high. We found these ranges by sampling the days in the dataset and choosing cut points that distribute the trip counts evenly across the three ranges. We use color value (lightness or darkness) to encode the three ranges because it is the fourth best encoding technique for ordinal fields according to Mackinlay. We used a darker shade of red for larger trip counts, following the convention presented in the color lecture that larger values are encoded with darker color values.

We decided on straight-line edges for three reasons. First, after trying various shortest path APIs such as Google Maps and OSRM, we found that the computation took too long or that we would exceed the number of free API requests. Second, our dataset does not give us the paths, just the start and end stations, so even if we used the shortest path we would be making an assumption about the route the rider took. Lastly, shortest paths would produce a lot of occlusion over commonly used city streets and bike paths. Since we use stroke width to encode the trip count, overlapping edges could lead to confusing trip count values.
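The low, medium, and high binning can be sketched with tercile cut points. This is one plausible way to get an even three-way split, not necessarily the procedure we followed; the function names and the hex colors are placeholders.

```javascript
// Find two cut points that split the sampled counts into three roughly equal groups.
// Assumes sampleCounts is a non-empty array of numbers.
function tercileThresholds(sampleCounts) {
  const sorted = [...sampleCounts].sort((a, b) => a - b);
  const lowMax = sorted[Math.floor(sorted.length / 3)];
  const mediumMax = sorted[Math.floor((2 * sorted.length) / 3)];
  return { lowMax, mediumMax };
}

// Assign a trip count to one of the three ordinal bins.
function binTripCount(count, thresholds) {
  if (count < thresholds.lowMax) return "low";
  if (count < thresholds.mediumMax) return "medium";
  return "high";
}

// Darker red for larger counts (placeholder color values).
const BIN_COLORS = { low: "#fcae91", medium: "#fb6a4a", high: "#cb181d" };
```

For a sample of 1 through 9, the cut points fall at 4 and 7, so each bin receives three values.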

Each edge and marker on the map provides details on demand in the form of a tooltip. The edge tooltip shows the actual trip count value and the marker tooltip shows the station name. An important design decision here was that, since users would use the edge tooltip more than the station tooltip, we show the edge tooltip when the user hovers over an edge and the station tooltip only when the user clicks on a marker.

We made a number of design decisions when choosing the display options of the map. First, instead of using a map as detailed as the default Google map view, we chose to use what Mapbox calls their light style. This removes nonessential details, allowing viewers to focus on the bike sharing data. We are confident in this choice since our target users are politicians and businesses looking to improve their own cities, and it is a fair assumption that such users have a good understanding of their own city's geography. Therefore, we provide a map with just enough detail for our target users to easily orient themselves. We also restricted the viewable area of the map. Although free dragging and zooming are easy to use, they make it easy for users to lose track of where the data points are. Therefore we chose to constrain the view to a box around the data points and limit how far a user is able to zoom out.
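A minimal sketch of how such a bounding box could be computed from the station coordinates. The station shape ({ lat, lng }), the padding value, and the zoom level in the comment are assumptions for illustration.

```javascript
// Compute a padded bounding box around all stations.
// Returns [[south, west], [north, east]] in degrees.
function paddedBounds(stations, padding = 0.01) {
  const lats = stations.map((s) => s.lat);
  const lngs = stations.map((s) => s.lng);
  const southWest = [Math.min(...lats) - padding, Math.min(...lngs) - padding];
  const northEast = [Math.max(...lats) + padding, Math.max(...lngs) + padding];
  return [southWest, northEast];
}

// With Leaflet loaded, the box could be supplied through the map options
// (the minZoom value here is a placeholder):
// L.map("map", { maxBounds: paddedBounds(stations), minZoom: 12 });
```

The padding leaves a small margin so that markers on the edge of the data are not clipped by the constrained view.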

The map gives us a number of implicit encodings. One is the map's positional encoding, which is the most effective encoding for quantitative coordinate values according to Mackinlay. The map also implicitly provides the common interaction techniques of panning and zooming. On top of these basic interaction techniques, we provide a number of filters to explore the data further. Some of these filters allow for incremental changes that let the user see, for example, how bike usage changes as temperature changes over time.

We currently support three types of filters over temperature, time, and trip count. The first filter that we provide is one over temperature. We use a slider to allow users to filter to ranges of temperatures. We chose steel blue for the slider to represent a color commonly used to indicate temperature and to be consistent with the line that encodes temperature in the weather plot.

The next filter we provide is over time. We support time filtering by providing a calendar to jump to specific days, along with plus and minus buttons to step by one day. There were a number of design decisions for the calendar. First, when a temperature filter is applied we lower the opacity of the calendar, because a temperature range can refer to multiple days and the calendar no longer makes sense. Second, we limited the days the user can select in the calendar to the days in the dataset, and we alert the user if they try to step to an invalid day. The plus and minus buttons make it easy to see incremental changes in bike usage over time. The animated current line also moves as the user changes the day, which acts as visual reinforcement for the change of date.
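The plus and minus buttons with range checking can be sketched as below. Dates are handled as "YYYY-MM-DD" strings in UTC, and the first and last valid days are passed in as parameters; both choices are assumptions for illustration.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse "YYYY-MM-DD" as UTC midnight to avoid daylight-saving surprises.
function parseDay(s) {
  return new Date(`${s}T00:00:00Z`);
}

// Format a Date back to "YYYY-MM-DD" (UTC).
function formatDay(d) {
  return d.toISOString().slice(0, 10);
}

// Step the current day by delta days (+1 or -1 for the buttons).
// Returns the new date string, or null when the step would leave the dataset,
// in which case the caller can alert the user and keep the current day.
function stepDay(current, delta, firstDay, lastDay) {
  const next = new Date(parseDay(current).getTime() + delta * MS_PER_DAY);
  if (next < parseDay(firstDay) || next > parseDay(lastDay)) return null;
  return formatDay(next);
}
```

Returning null instead of clamping makes the invalid step explicit, so the interface can show an alert rather than silently ignoring the click.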

The last filter we support is one over trip count. The number of possible edges grows quadratically with the number of stations (n(n-1)/2 station pairs for n stations), so the map can quickly become cluttered. On top of encoding the low, medium, and high trip counts with color value, we therefore also let users show just the requested ranges with a series of checkboxes. This allows users to see, for example, how the locations of high bike sharing usage change over temperature or time.
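For an undirected graph the number of distinct station pairs is n(n-1)/2, which is already large enough to clutter the map for a few dozen stations. A short sketch of that count and of the checkbox filter follows; the edge shape ({ from, to, count, bin }) and the function names are assumptions.

```javascript
// Number of distinct undirected station pairs for n stations.
function possibleEdgeCount(n) {
  return (n * (n - 1)) / 2;
}

// Keep only the edges whose bin is currently checked.
// selectedBins is a Set containing any of "low", "medium", "high".
function filterEdgesByBin(edges, selectedBins) {
  return edges.filter((e) => selectedBins.has(e.bin));
}
```

For example, a hypothetical system of 70 stations would have 2415 possible pairs, while showing only the "high" bin reduces the drawn edges to the busiest ones.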

The three filters are great for exploring the data through our visualization. It is also helpful to see the important actual values encoded in the map and weather plots, so we show the most critical values in the summary section: the current date, temperature, total number of trips, and annual average temperature and trip counts. We display values like the total trip count because obtaining it from the map would require hovering over all edges and summing the values. We predict that the user will want to know this value because it provides a quick summary of how active a specific day was. Due to the large number of filters, we also provide a reset filters button which resets all the filters and plots to their default values.
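The summary values can be sketched as a single function over per-day records. The record shape ({ date, temperature, trips }) and the field names of the result are assumptions for illustration.

```javascript
// Arithmetic mean of a non-empty array of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Build the values shown in the summary section for the selected day.
// dailyRecords holds one record per day in the dataset.
function buildSummary(dailyRecords, selectedDate) {
  const selected = dailyRecords.find((r) => r.date === selectedDate);
  return {
    date: selectedDate,
    temperature: selected ? selected.temperature : null,
    totalTrips: selected ? selected.trips : null,
    averageTemperature: mean(dailyRecords.map((r) => r.temperature)),
    averageTrips: mean(dailyRecords.map((r) => r.trips)),
  };
}
```

Showing the selected day next to the annual averages lets a viewer judge at a glance whether the day was unusually warm or unusually busy.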

We also ran a couple of usability tests. One issue we discovered had to do with the weather plot toggle buttons. These were originally just text, so they did not afford being pressed. We fixed that by changing the text into buttons. Another person in our usability pilot found the today line unintuitive. They didn't understand what it represented until they started changing the date with the calendar or the minus and plus buttons. To remedy this we added the text "current" to the line to better show what it represents.

Our visualization consists of two main plots, a map plot and a weather plot. This correlated visualization shows the relationship between time, temperature, and bike sharing usage. To allow our target users to explore the data further and easily achieve their goals, our visualization provides a rich set of filters over time, temperature, and bike sharing usage. All of our design decisions were made with our target users and their goals in mind.

Development Process

The work was split evenly. We structured our team responsibilities similarly to the structure recommended in lecture. Matt took on a project manager role, which included coordinating, design and evaluation, and building various filter widgets. Qianying worked as the data lead, writing the queries and scripts to get the data we needed after filtering, and also handled the CSS styling. Boyan developed all the components of the map and helped glue all the team members' code together. Felix built the weather plot component of the visualization and helped with design and evaluation.

We spent roughly 140 person-hours (35 hours per person). The three aspects that took the longest were learning the D3 and Leaflet libraries, gluing together the various components of the visualization, and dealing with our large dataset.