NORTHERN ILLINOIS UNIVERSITY
Information Dashboard to Visualize Large Datasets
A Capstone Submitted to the
University Honors Program
In Partial Fulfillment of the
Requirements of the Baccalaureate Degree
With Honors
Department Of
Computer Science
By
May-Myo Khine
DeKalb, Illinois
May 11, 2019
University Honors Program
Capstone Approval Page
Capstone Title (print or type)
Information Dashboard to Visualize Large Datasets
Student Name (print or type) _____May-Myo Khine______________
Faculty Supervisor (print or type) ___Dr. Michael Papka___________
Faculty Approval Signature _________________________________
Department of (print or type) ___Computer Science______________
Date of Approval (print or type) ______________________________
Completed Honors Capstone projects may be used for student reference
purposes, both electronically and in the Honors Capstone Library (CLB 110).
If you would like to opt out and not have this student’s completed capstone used
for reference purposes, please initial here: _______ (Faculty Supervisor)
HONORS CAPSTONE ABSTRACT
Data visualization is the representation of data and information using visual
elements such as graphs, charts and maps to communicate information clearly
and efficiently. Data visualization exposes data patterns, trends and correlations
better than text-based data. It allows users to comprehend and analyze
information quickly. It makes complex data more understandable and
usable.
This capstone project is a study of how to create a data visualization
dashboard that presents data in a meaningful way and improves understanding of
large datasets through a simple format. To achieve the project purpose, I use
datasets from the Array of Things (AoT) project. AoT is a network of nodes
installed in cities to collect real-time, location-based data on the
environment, infrastructure and activity for research and public use. There are
AoT nodes in Chicago, Denver, Detroit, Portland, Seattle, Stanford, and
Syracuse.
The result of this capstone project is an interactive dashboard that visualizes
AoT data for the city of Chicago. Using the dashboard, users can select a node
ID, and the tool presents the data of the selected node using graphs and
charts. Though my work creates a visualization tool for the AoT data, the
techniques and ideas can be applied to other similar data sources as well.
Project Description
This project report includes the project abstract, project description, project
overview, methodology and project achievements. The objective of this project is
to fulfil the requirements of the Baccalaureate degree with Honors.
My project title is Information Dashboard to Visualize Large Datasets. The
project creates an interactive dashboard that presents large datasets in a clear
visual representation. To achieve this objective, I wrote a Node.js server and a
client. The Node.js server queries data from a PostgreSQL database and from
live-streaming data sources. The client connects to the server and gets the data
to visualize. I mainly used HTML, CSS, JavaScript (Node.js, Express.js and
D3.js), a PostgreSQL database, GitHub, and the Chicago datasets from the Array of
Things (AoT) urban sensing project.
Project Overview
Array of Things Dataset
The Array of Things (AoT) is an urban sensing research project among scientists,
universities, local governments and communities. It is led by researchers from
the Urban Center for Computation and Data, a joint initiative of Argonne
National Laboratory and the University of Chicago. AoT is a network of nodes
installed in cities to collect real-time, location-based data on the
environment, infrastructure and activity for research and public use. There are
AoT nodes in Chicago, Denver, Detroit, Portland, Seattle, Stanford, and
Syracuse.
The AoT nodes have multiple sensors to measure environmental data such as
temperature, barometric pressure, light, vibration, carbon monoxide, nitrogen
dioxide, sulfur dioxide, ozone, ambient sound intensity, physical shock,
acceleration and orientation, and sunlight intensity. The AoT data is published
openly and free of charge so that the public can study urban environments and
develop new data analysis tools and applications.
The sensors in AoT nodes collect data every ~30 seconds. For this project, I
used the AoT dataset for the city of Chicago. There are over 100 nodes with
sensors around Chicago. The current AoT dataset for Chicago is over 300 GB and
is growing rapidly.
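As a rough illustration of why the dataset grows so quickly, the sampling interval can be turned into a count of readings. This is only a back-of-the-envelope sketch: it assumes exactly 30 seconds between readings and exactly 100 nodes, while the figures above are approximate, and the function name is hypothetical.

```javascript
// Rough estimate only: assumes one reading every 30 s and exactly 100 nodes.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

function readingsPerDay(intervalSeconds, nodeCount) {
  // Readings that one sensor produces per day, multiplied across all nodes.
  return (SECONDS_PER_DAY / intervalSeconds) * nodeCount;
}

console.log(readingsPerDay(30, 1));   // 2880 readings per sensor on one node
console.log(readingsPerDay(30, 100)); // 288000 readings per sensor across 100 nodes
```

Each node carries many sensors, so the real daily total is a multiple of these numbers.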
HTML
HTML is the abbreviation of Hypertext Markup Language. It is the standard
markup language for creating web pages. HTML is one of the three core
technologies of the World Wide Web (WWW).
HTML is used to build the structure of a web page, which the browser renders.
An HTML page can embed programs written in JavaScript, and the presentation of
its content can be defined by CSS. An HTML file has the extension .html.
CSS
Cascading Style Sheets (CSS) is a stylesheet language. It is used to describe
how HTML documents are rendered on screen. CSS is also a core technology of the
World Wide Web. Markup languages such as XHTML, XML, SVG and HTML support the
use of CSS. A CSS file has the extension .css.
JavaScript
JavaScript is a high-level, lightweight programming language that is
interpreted or just-in-time compiled. Alongside HTML and CSS, JavaScript is one
of the core technologies of the Web. JavaScript is also known as JS. JS supports
event-driven, functional and object-oriented programming styles. Initially, JS
was used only on the client side, in web browsers. These days, JS is used on
both the client side and the server side of the web. A JS file has the
extension .js.
Node.js
Node.js is an open-source, cross-platform JavaScript run-time environment. It
executes JavaScript code outside of a web browser and has a unique advantage:
developers can write JavaScript on the server side in addition to the client
side, without having to learn a different language. Node.js uses an
event-driven, non-blocking I/O model. It is used for server-side programming to
build fast and scalable network applications. The npm registry hosts around
500,000 packages for Node.js, and thousands of libraries have been built upon
Node.js.
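The non-blocking model can be seen in a small sketch. Here setTimeout stands in for a slow I/O operation such as a database query; the step labels are made up for illustration and are not taken from the project code.

```javascript
// Records the order in which the steps run.
const order = [];

order.push('request started');

// setTimeout stands in for a slow I/O operation (a database query, a file read).
// Node.js registers the callback and moves on instead of waiting for it.
setTimeout(() => {
  order.push('result received');
  console.log(order.join(' -> '));
}, 10);

// This line runs before the callback above, because the timer does not block.
order.push('server free for other work');
```

When the timer fires, the script prints "request started -> server free for other work -> result received", which shows that the server was free to do other work while it waited.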
npm
npm is the world’s largest software registry. It is also a software package
manager and installer. Open-source developers use npm to share software. npm
has a command line interface (CLI) to download and install packages. npm is
distributed with Node.js, so installing Node.js also installs npm.
Express.js
Express.js is one of the most popular Node.js libraries. Express is a web
application framework that provides a robust set of features for web
applications. It is a simple and powerful way to create a web server.
D3.js
D3.js is a JavaScript library for creating dynamic, interactive data
visualizations. D3.js uses HTML, CSS and SVG to create powerful visual
representations of data in modern browsers. To use D3.js, the HTML file must
load the library with a script tag such as
<script src='https://d3js.org/d3.v4.min.js'></script>.
PostgreSQL
PostgreSQL (Postgres) is an open-source relational database management
system (RDBMS). Postgres is well known for its architecture, reliability, data
integrity, robust feature set and extensibility. Postgres uses and extends the SQL
language, and it can run on various platforms such as UNIX, Mac OS X,
Windows and so on.
GitHub
GitHub is a web-based hosting service. It provides version control, access
control and collaboration features such as wikis and basic task management
tools. GitHub uses Git, an open-source project started by Linus Torvalds.
GitHub is mostly used to host source code.
Raspberry Pi with Touch Screen
Raspberry Pi is a series of small and affordable single-board computers
developed in the United Kingdom by the Raspberry Pi Foundation to encourage and
promote the teaching of basic programming in schools. I used a Raspberry Pi 3
Model B+ and a Raspberry Pi Touch Display for this project. The interactive
data visualization dashboard is designed specifically for the Raspberry Pi
Touch Display. It is a 7-inch touchscreen monitor with an 800 x 480 display for
the Raspberry Pi. It turns the Raspberry Pi into an interactive, standalone
touchscreen tablet.
Methodology
The very first step of working on this project was learning Node.js. I needed
it for the server side of the dashboard, which pulls the node status, the
recent data and the last seven days of data from the live AoT sources and the
PostgreSQL database.
My first attempt at the server was able to retrieve data from the AoT links
successfully. All the code was written in a single server.js file. However, it
could not handle multiple requests over time. I learned the lesson and rewrote
my server from scratch. The second server uses not only standalone Node.js but
also the Express library.
The new Node.js server contains multiple files with different purposes. There
is a main index file that calls the api file to start the server, a
routes/index.js file that receives requests from the client, and a
routes/query.js file that retrieves data. There is also db/index.js, which
connects to the Postgres database.
On the client side, there are project.html, Chicago.json, barGraph.js,
heatMap.js, lineGraph.js, map.js, nodeid.js and project.css. The file
project.html contains the HTML code for the webpage structure and loads the
required files. The file project.css is the style sheet for the HTML file. The
file js/map.js is the JavaScript file that draws the Chicago map using
Chicago.json, which contains the spatial data. This JS file also draws the
nodes on the map based on each node’s latitude and longitude. The file
js/barGraph.js contains the function that creates the bar chart showing how
many nodes are online and how many are offline.
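The counting step behind the bar chart can be sketched in plain JavaScript. This is a minimal illustration, not the code in barGraph.js: the node records, their field names and the function name are assumptions.

```javascript
// Hypothetical node records; the field names are assumptions for illustration.
const nodes = [
  { id: 'node-a', online: true },
  { id: 'node-b', online: false },
  { id: 'node-c', online: true },
];

// Tallies how many nodes are online and how many are offline.
function countByStatus(nodeList) {
  const counts = { online: 0, offline: 0 };
  for (const node of nodeList) {
    if (node.online) counts.online += 1;
    else counts.offline += 1;
  }
  return counts;
}

// Each entry becomes one bar: its label and its height.
const bars = Object.entries(countByStatus(nodes))
  .map(([status, total]) => ({ status, total }));

console.log(bars);
```

An array of this shape, with one object per bar, is a convenient input for a D3 data join that draws one rectangle per entry.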
The file js/nodeid.js contains the function that prints the selected node ID on
the right side of the dashboard. The file js/lineGraph.js has the function that
draws the line graph and its features. The file js/heatMap.js contains the
function that creates the heat map. The project starts by starting the server
using this command:
    PGUSER=postgreUserName \
    PGHOST=IPaddress_of_serverMachine \
    PGPASSWORD= \
    PGDATABASE=database_Name \
    PGPORT=port_Number \
    node index.js
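Variables set on the command line this way are visible to the server through process.env. The sketch below shows one way db/index.js might turn them into a connection configuration. The function name and the fallback port are assumptions. The node-postgres (pg) package reads these same variable names by default, but this report does not state which client library the project uses.

```javascript
// Builds a connection config from the PG* environment variables.
// Hypothetical helper; the project's db/index.js may be organized differently.
function configFromEnv(env) {
  return {
    user: env.PGUSER,
    host: env.PGHOST,
    password: env.PGPASSWORD,
    database: env.PGDATABASE,
    // Environment variables are strings, so the port is converted to a number.
    // 5432 is the default PostgreSQL port, used here as an assumed fallback.
    port: env.PGPORT ? Number(env.PGPORT) : 5432,
  };
}

const config = configFromEnv(process.env);
console.log(`Connecting to ${config.host} on port ${config.port}`);
```

Keeping the credentials in environment variables means they do not have to be written into the source files that are pushed to GitHub.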
Then, open the HTML file. When the HTML file is opened, it connects to the
server, and the server calls the required files and pulls all the node IDs and
their statuses for the Chicago area to serve the client side. The node status
is retrieved from this link:
https://www.mcs.anl.gov/research/projects/waggle/downloads/beehive1/live-nodes.txt
map.js will draw the Chicago map and the nodes, and then it will call
barGraph.js to present the total number of nodes by status.
Clicking on a node will call nodeId.js, lineGraph.js and heatMap.js.
nodeId.js will print the selected node ID on the dashboard. lineGraph.js will
send a request for recent data to the server side, and the server will pull the
recent data from this link:
https://www.mcs.anl.gov/research/projects/waggle/downloads/datasets/AoT_Chicago.complete.recent.csv
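Once the recent CSV has been downloaded, the rows for one node and one sensor have to be picked out before they can be drawn. The sketch below assumes a column layout of timestamp, node_id, subsystem, sensor, parameter, value_raw and value_hrf, and simple comma-separated rows without quoted fields. The sample rows are made up, and the function name is hypothetical.

```javascript
// Assumed columns: timestamp,node_id,subsystem,sensor,parameter,value_raw,value_hrf
function temperatureSeries(csvText, nodeId, sensor) {
  const [headerLine, ...rows] = csvText.trim().split('\n');
  const header = headerLine.split(',');
  const col = (name) => header.indexOf(name);

  return rows
    .map((row) => row.split(','))
    // Keep only temperature readings of the requested node and sensor.
    .filter((f) =>
      f[col('node_id')] === nodeId &&
      f[col('sensor')] === sensor &&
      f[col('parameter')] === 'temperature')
    // Reduce each row to the two fields a line graph needs.
    .map((f) => ({
      time: f[col('timestamp')],
      value: Number(f[col('value_hrf')]),
    }));
}

// Made-up sample rows, for illustration only.
const sample = [
  'timestamp,node_id,subsystem,sensor,parameter,value_raw,value_hrf',
  '2019/04/01 00:00:00,node-a,lightsense,tsys01,temperature,100,9.5',
  '2019/04/01 00:00:00,node-a,metsense,pr103j2,temperature,800,10.2',
  '2019/04/01 00:00:30,node-b,metsense,pr103j2,temperature,790,11.0',
].join('\n');

console.log(temperatureSeries(sample, 'node-a', 'pr103j2'));
```

Only the second sample row matches both the node and the sensor, so the result holds a single point with the value 10.2.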
heatMap.js will send a request for the last seven days of data to the server
side. Once the server side receives the client request, it will connect to the
PostgreSQL database and pull the data using the credentials, node ID and date
provided by the client.
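A query for the last seven days might be built as in the sketch below. The table name readings and the columns ts, value and node_id are hypothetical, because this report does not describe the database schema. The object with text and values has the shape that the node-postgres (pg) client accepts for parameterized queries, which is assumed here rather than confirmed by the report.

```javascript
// Hypothetical table and column names; the report does not give the schema.
function lastSevenDaysQuery(nodeId, endDate) {
  const end = new Date(endDate);
  const start = new Date(end.getTime() - 7 * 24 * 60 * 60 * 1000);
  return {
    // $1..$3 are placeholders, so client input is never spliced into the SQL text.
    text: 'SELECT ts, value FROM readings ' +
          'WHERE node_id = $1 AND ts >= $2 AND ts < $3 ' +
          'ORDER BY ts',
    values: [nodeId, start.toISOString(), end.toISOString()],
  };
}

const query = lastSevenDaysQuery('node-a', '2019-05-08T00:00:00Z');
console.log(query.values);
// [ 'node-a', '2019-05-01T00:00:00.000Z', '2019-05-08T00:00:00.000Z' ]
```

Passing the node ID and dates as parameters rather than concatenating them into the SQL string protects the server from malformed or malicious client input.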
This dashboard is intended for the Raspberry Pi Touch Display, and the
temperature values for this project come from the “pr103j2” AoT sensor.
Conclusion
The accomplishment of this project is an interactive dashboard that presents the
AoT data in a meaningful way. By default, the dashboard presents the Chicago map
with the AoT nodes on the left side of the dashboard. Red nodes are offline, and
green nodes are online. The bar chart under the Chicago map shows the total
number of nodes by status.
The user can hover over a node to see its node ID and its address. If the user
clicks on a node, the selected node ID is printed on the top right side of the
dashboard. If recent data is available for the selected node ID, the line graph
is presented. If there is data from the last seven days for that node ID, a heat
map is presented. If there is no data, a message is printed instead of the
graph. Since the graphs are interactive, the user can hover over the line graph
and the heat map to get the temperature and time at the hover point.
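Showing the value under the pointer requires finding the data point closest to the hover position. The sketch below does this with a binary search in plain JavaScript. It is an illustration under assumptions, not the project's code: the report does not say how hovering is implemented, times are simplified to plain numbers, and the function name is hypothetical. D3 offers bisector helpers for the same kind of lookup.

```javascript
// Returns the data point whose time is closest to the hovered time.
// Assumes `points` is sorted by time in ascending order and is not empty.
function nearestPoint(points, hoverTime) {
  let lo = 0;
  let hi = points.length - 1;
  // Binary search for the first point at or after hoverTime.
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (points[mid].time < hoverTime) lo = mid + 1;
    else hi = mid;
  }
  const after = points[lo];
  const before = points[Math.max(lo - 1, 0)];
  // Pick whichever neighbour is closer to the hovered time.
  return hoverTime - before.time < after.time - hoverTime ? before : after;
}

// Made-up readings: time in seconds, temperature in degrees Celsius.
const readings = [
  { time: 0, temperature: 10.2 },
  { time: 30, temperature: 10.4 },
  { time: 60, temperature: 10.1 },
];

console.log(nearestPoint(readings, 40)); // the reading at time 30
```

The hovered time itself would come from converting the pointer's x coordinate back into the time scale of the graph.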
The outcome of this project can be used to see the status of the AoT nodes, the
recent temperature and the temperature changes over the last week in the
Chicago area. This project can also be used for other urban sensing projects or
as a part of Internet of Things (IoT) projects.
To sum up, this has been a wonderful experience. It was my first time writing a
server using Node.js. I learned new web development skills that I will carry
with me throughout my life, and I developed my JavaScript skills and D3
knowledge. Throughout the project, there were many obstacles. The biggest
challenge was to build a stable server that can handle multiple dynamic
requests. I wrote the D3 graphs separately using local data because I had
trouble with my server, and there were some complications when I put all the D3
graphs into one web page using server data. However, these difficulties pushed
me to work harder, think outside the box and be a better learner. Overall, I am
delighted with the project, its accomplishments and the experience of working
under the supervision of Dr. Papka.
Dashboard
The start of the dashboard
When a node ID is selected, the graphs are presented on the right side.
Hover on the line graph
Hover on the heat map
Hover on a node Id
No data message