CSE442 | Your Corner of the News Ecosystem

Team Members: Conor Kelly, Brendan King, Stephen Fleischman, Jingwei Zhang

Guiding Question : How might we allow the user to explore the interconnectivity of the online news ecosystem and allow the user to compare their personal filter bubble to the "global graph" network?

Design Rationale

Network Graph

After an initial discussion about project goals and framing, and inspired by GDELT's visualization, we decided to use a network graph to represent connections between various sources of news media content. We are using nodes to represent user curated urls scrapped from reddit pages, binned by domain for the sake of scale. Edges are still being being defined but for this proof of concept they represent hyperlinks between sources.

Tooltip

The tooltip is an important feature in this type of visualization as it provides the current primary means of drilling down into the data. Here we can include additional information and content relating to the node in focus. A challenge this method brings to the table is occlusion, when showing the tooltip the div container and content overlay other nodes, making them difficult to get to.

Spatial/Structural Encoding

Perhaps the most important ability of any exploratory visualization is the capacity to view a given dataset at different cross sections, promoting insights for the user. We chose to encode this cross sectioning via the spatial positioning of nodes to form an overall graph structure. Reshaping the graph according to various attribute affinities will allow users to quickly infer deeper relational patterns among nodes.

Color Encoding

In this prototype we are using color encoding to indicate selection and node distance independently. When Step Mode is enabled, selecting a node will update the colors of connected nodes to encode their distance from the selected node. Currently this color scheme uses several standard css colors but can easily be converted to use a gradient scale, representing closer connections with more saturated color.

Development Process

In the early stages of the process our team identified key components to research, test, and build and diverged in independent exploration. We then converged on a subset of critical features to test in our prototyping assignment.

Web Scraping

While the Initial DataSet through the GDELT project was decent, we wanted to try and supplement that dataset with an additional dataset to analyze. As our visualization uses a simple graph model of nodes / edges, there were many possibilities. We decided to scrape reddit for information, as we felt the site was large enough to find some “links” between news sites. Politically-Oriented subreddits were chosen to be scraped, and the top 1000 or so posts were collected and processed. For the top news sites in each subreddit, we established an association between those sites. In the end, we are left with a graph of news sites linked by this associativity.

Front End Setup

Our initial concept was to initialize a single page app which handled and manipulated D3 instances throughout the DOM. We initialize a new “GlobalGraph” object to render and maintain the main visualization while external data sources and user interactions are handled by the SPA. We attempted to use DRY (Don’t Repeat Yourself) coding resulting in somewhat fragmented code. I’m not too sure how to strike a balance between not repeating code and not dispersing functionality throughout the file. The GlobalGraph object grew way too big and will either have to be broken down or dissolved altogether. The initialization of the GlobalGraph chould be moved inside of the SPA to enable multiple graphs on a page.

D3 Forces / Structuring

A significant challenge in getting any graph to visualize well is position nodes and edges in a meaningful way that allows users to easily interpret structure in the graph. With force layouts, it is challenging to make an arbitrary graph present well, so we focused on a concrete example. Future work will allow adjustments to forces and their parameters as an element of user interaction with the visualization, allowing users to have sub-graphs cluster, push apart, etc.

In short, positioning is a vital part of a good network visualization but also one of the most challenging. In our prototype, we aimed to have all nodes visually separated and meaningfully dispersed.We also wanted the graph centered in the display, and for link structure to have some distinguishable impact on graph layout (i.e. nodes that are closely connected in the graph should appear closer together). We were able to accomplish these with simulated electrical charge, centering forces, and link based forces.

We also support highlighting of subgraphs, provided to the visualization as a list of nodes (this could be a list of a user’s visited web sites, for example). Currently, our highlights primarily distinguish matching nodes by color. We made attempts at separating and clustering sub graphs with forces, but had mixed results. Next steps will involve meaningful force responses when selecting a sub graph, as well as highlighting edges (with shading) which participate in these sub graphs. Color and shading of edges will allow a user to quickly distinguish a component, and this can be enhanced with force interactions with the rest of the graph, as well as other labelled components.

Djikstra algorithm

One the biggest difficulties when working with a connected graph is producing a visualization where insights can be gathered. Graphs provide great insight into the overall relation between nodes and groups of nodes, but they fall flat when trying to look at how a specific node relates to other nodes. To address this, we wanted a way to perceive the distance from a selected node to every other node. Luckily, a specific algorithm had already been invented for this very task; Dijkstra’s Algorithm! We implemented this so when a node is clicked, the rest of the nodes change their color based on their distance from the selected node. The color scale goes from Green to Red as the distance increases. In the future this will be a linear scale where transitions are more smooth. For now, this works fine as a proof of concept.

Next Steps

The hardest part so far has been finding a decent system and setup of visualization. Questions alike “How do we organize the data Schema” and “How do we handle Events” took a large time to both decide on, and to actually implement. Now that we’ve got these details mostly ironed out and implemented, our focus will be shifted towards using this framework to power a visualization where meaningful insights can be gathered.

One of the next steps for us is to further organize and Drill down our data. The datasets work well as prototypes, but they are very much unrefined. The question to “What defines a ‘link’ between sites?” is a big question that we will need to invest serious time towards. Scraping the web gives us many possibilities, but it’s up to us on figuring out a meaningful relation between the each site, and furthermore displaying that.

More so, our prototype needs serious refinement in the UI side. Buttons and Interactions are not very user-friendly, with unclear purposes. The design overall is very rough and messy, and tools alike our tooltip need serious refinement.

We might need a relational database in the future to organize user’s profile with history data for a larger dataset. For the relational database, it need to read data from the JSON file and we can get more information about users from that database easily.