Summary
Transformation of complex data into a visual form, possibly interactive, so that trends, outliers, and significant patterns in large data sets are easier to comprehend.
Jots
also: InfoVis, DataVis, or even just viz
Key attributes to ponder
- scalability to large volumes of data
- interactivity for multiple views
Why?
- Involve lizard brain in data analysis process. Lizard brain quickly notice patterns and changes in display of data to support ape-brain understanding and decision-making.
- Interactive data visualization can also engage as a form of play – I know we are describing very important things with that data, but “what if we change that value or reassign this axis” is also make fun colors move go pretty sparkly oh no that bit look wrong please fix this, ape-brain.
When?
- when there’s a common underlying structure
- when users don’t know the details of a collection
- when users don’t understand how the collection is organized (show vs tell for learning)
- when info is easier to recognize than to describe
Subfields of Data Visualization
- scientific visualization
- model real-world phenomena
- information visualization
- present abstract concepts for decision making and analysis
- visual analytics
- combining interactive visualization with data analysis and transformation to enable high-level activities like sense-making and decision-making.
Chart Types
- Bar Chart
- for grouping by category or illustrating discrete variables
- Histogram
- groups data into bins to show distribution over a continuous interval or time period
- helps identify trends, outliers, patterns, and anomalies
- Line Chart
- connects points plotted on Cartesian coordinates with line segments, usually to show change over an ordered variable such as time
- Stacked Chart
- compare different categories and total sizes at the same time
- Scatter Plot
- uses Cartesian coordinates to show values over two variables
- Heatmap
- represents matrix values as colors, with differing shades or intensities showing the degree or strength of an occurrence
- useful for pattern recognition and outlier detection
- Funnel Chart
- Displays values in progressively diminishing stages, to better understand those stages and how they contribute to a system’s output
- Pie Chart
- a circle divided into slices to visualize simple categorization and grouping of variables
- best for a small number of categories with obvious proportions.
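To make the histogram entry above a bit more concrete, here is a minimal sketch of fixed-width binning in plain Python. The function names (`bin_values`, `render_text_histogram`) and the sample numbers are invented for illustration; a real plotting library would handle bin edges and rendering for you.

```python
from collections import Counter


def bin_values(values, bin_width, start=0.0):
    """Count how many values fall into each fixed-width bin.

    Returns a dict mapping each bin's lower edge to its count.
    Bins that contain no values are simply absent from the result.
    """
    counts = Counter()
    for v in values:
        # Index of the bin containing v, converted back to its lower edge.
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))


def render_text_histogram(counts, bin_width):
    """Draw one row of '#' marks per non-empty bin."""
    lines = []
    for lower, count in counts.items():
        lines.append(f"{lower:>5.1f}-{lower + bin_width:<5.1f} | {'#' * count}")
    return "\n".join(lines)


# Hypothetical sample: response times in seconds, with one obvious outlier.
samples = [0.4, 0.7, 1.1, 1.3, 1.4, 1.8, 2.2, 2.5, 9.6]
counts = bin_values(samples, bin_width=1.0)
print(render_text_histogram(counts, bin_width=1.0))
```

Even in this crude text form, the lone value in the 9.0 bin stands apart from the cluster between 0 and 3, which is the "easier to recognize than to describe" effect the notes mention.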
Data Visualization Tools
- Pandas and other DataFrame toolkits make frequent appearances alongside data visualization tools. Many of the tools listed here are Python libraries, partly because Python is a popular choice and partly because that’s mostly what I know. I hope to fill in some gaps later.
- Bokeh
- Python library for Web-based data visualization
- Bokeh documentation
- D3.js
- Popular data visualization library for JavaScript
- D3 by Observable | The JavaScript library for bespoke data visualization
- ggplot2
- A data visualization tool for the R programming language with an API based on The Grammar of Graphics
- plotnine - A Grammar of Graphics for Python #Python
- HoloViz
- A whole suite of high-level tools and libraries for more powerful data visualization applications. Many of these tools show off their best side from inside a Jupyter notebook.
- Panel
- for creating and maintaining DataViz tools, dashboards, and applications in Python
- Overview — Panel
- hvPlot
- API for data visualization and exploration
- hvPlot
- HoloViews
- pretty handy for live exploration of data
- HoloViews
- GeoViews
- visualization and exploration of geographical, meteorological, and oceanographic data
- GeoViews
- Datashader
- Pipeline for visualization of large datasets
- Datashader
- Lumen
- declarative framework for data dashboards
- Welcome to Lumen!
- Param
- Parameter declaration in Python
- Loads of overlap with Pydantic and attrs, but it’s a tight enough API to still be worthwhile as a concise means of expressing type information and validation requirements.
- Welcome to Param!
- Colorcet
- High-level tools to simplify visualization in Python — HoloViz
- matplotlib
- Possibly the most widely known data visualization library for Python?
- Matplotlib — Visualization with Python
- seaborn
- High-level interface to matplotlib
- seaborn: statistical data visualization — seaborn 0.13.2 documentation
- Plotly
- Company producing commercial and open source tools for data visualization. I’ll focus on the open source stuff until I need to concern myself with their commercial offerings.
- Plotly: Low-Code Data App Development
- Plotly Graphing Libraries
- API for interactive data visualization available in several languages
- Plotly Open Source Graphing Libraries
- Dash
- Plotly’s low-code framework for data applications
- https://dash.plotly.com/
- Power BI
- data visualization tool from Microsoft
- Power BI - Data Visualization | Microsoft Power Platform
- Tableau
- No-code tool for transforming, understanding, and visualizing raw data
- Business Intelligence and Analytics Software | Tableau
- Vega-Altair
- Python library for data visualization using the Vega-Lite grammar. I enjoyed using this one because of its consistent language, which I could understand even without a strong background in data visualization.
- Vega-Altair: Declarative Visualization in Python — Vega-Altair 5.3.0 documentation
- A High-Level Grammar of Interactive Graphics | Vega-Lite
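To give a feel for the "consistent language" mentioned in the Vega-Altair entry, here is a rough sketch of a Vega-Lite bar chart specification written out by hand as a Python dict. This is an assumption-laden illustration, not how Vega-Altair is normally used: the library generates JSON of this general shape for you. The helper name `bar_chart_spec` and the fruit data are hypothetical.

```python
import json


def bar_chart_spec(rows, category_field, value_field):
    """Build a Vega-Lite bar chart specification as a plain dict.

    Hypothetical helper for illustration only; Vega-Altair would
    normally produce an equivalent specification from its own API.
    """
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        # The mark says what to draw...
        "mark": "bar",
        # ...and the encoding says which field drives which visual channel.
        "encoding": {
            "x": {"field": category_field, "type": "nominal"},
            "y": {"field": value_field, "type": "quantitative"},
        },
    }


# Made-up sample data.
rows = [
    {"fruit": "apple", "count": 4},
    {"fruit": "banana", "count": 7},
    {"fruit": "cherry", "count": 2},
]

spec = bar_chart_spec(rows, category_field="fruit", value_field="count")
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The same data, mark, and encoding pattern carries over to other chart types: swapping `"bar"` for `"point"` or `"line"` changes what is drawn without changing how fields are mapped to channels.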
Related
- Data Analysis
- Data Science
- Key Concepts of Data Visualization #Media/BlogPost
Added to vault 2024-05-06. Updated on 2024-05-06