Collecting my attempts to improve at tech, art, and life

# Data Analysis

## Summary

Examining and summarizing large data sets to identify trends and help organizations make strategic decisions.

## Jots

Related to, and often tagged as, Data Science. Data analysis is more about using past data to inform current decisions, while data science tries to predict future outcomes from available data.

### Relevant Categorizations

Should probably break these out into their own pages eventually.

Kinds of data: categorical or numerical, discrete or continuous

| Kind | Examples |
| --- | --- |
| categorical | gender, location |
| numerical | number of customers, active users |
| discrete | number of applicants to a job |
| continuous | infinite possible outcomes |
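
A rough sketch of how these kinds might show up in code. The records and field names are made up for illustration, and labeling a column by its Python type is only a heuristic of mine, not a rule: a whole number can still stand for a category, like a zip code.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample records; the field names are invented for this sketch.
records = [
    {"location": "Seattle", "applicants": 12, "hours_online": 3.5},
    {"location": "Tacoma", "applicants": 7, "hours_online": 1.25},
    {"location": "Seattle", "applicants": 9, "hours_online": 4.0},
]

def kind_of(values):
    """Heuristic only: label a column by the Python type of its values."""
    if all(isinstance(v, (str, bool)) for v in values):
        return "categorical"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "numerical, discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numerical, continuous"
    return "mixed"

kinds = {
    field: kind_of([row[field] for row in records])
    for field in records[0]
}

# Categorical data is summarized by counting; numerical data by arithmetic.
location_counts = Counter(row["location"] for row in records)
average_applicants = mean(row["applicants"] for row in records)
```

The kind matters because it decides which summaries make sense: counting locations is meaningful, averaging them is not.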

Data characteristics:

| Characteristic | Explanation |
| --- | --- |
| cross-sectional | snapshot of a pattern or trend at a single point in time |
| time series | values tracked over time, such as test scores or wages |
| panel data | multiple subjects observed at multiple points in time |
| dispersion | how spread out the data is; high dispersion means a large range |
| confidence interval | a range of values likely to include the population value, with a stated degree of confidence |
| sampling population | a subset of individuals selected from within a statistical population, used to estimate characteristics of the whole population |
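
Dispersion and confidence intervals are easier to remember with numbers attached. A minimal sketch using only the standard library; the scores are invented, and the normal approximation is my assumption (a t-distribution is the usual choice for a sample this small).

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample: test scores drawn from a larger population.
sample = [72, 85, 90, 66, 78, 95, 81, 70]

# Dispersion: how spread out the values are.
value_range = max(sample) - min(sample)
spread = stdev(sample)  # sample standard deviation

# 95% confidence interval for the population mean, using a normal
# approximation (an assumption; small samples usually call for a t-distribution).
confidence = 0.95
z = NormalDist().inv_cdf(0.5 + confidence / 2)
margin = z * spread / len(sample) ** 0.5
interval = (mean(sample) - margin, mean(sample) + margin)
```

A wider spread or a smaller sample both widen the interval, which is the link between dispersion, sampling, and confidence.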

#### Basic Data Analysis #process

An iterative loop:

• Disassemble the problems and data into smaller pieces
• Evaluate the problems and data to draw conclusions about what you’ve learned
• Decide on a course of action that solves the problem

#### Main Concepts for Data Analysis Workflow

Data Collection
: systematically gathering and measuring information about specific variables for later processing

Data Cleanup
: inspecting and processing raw data to improve its quality, integrity, and relevance to the problem at hand

Data Exploration
: identifying patterns and anomalies in data while examining its structure, and testing hypotheses to confirm or clarify your understanding

Data Visualization

Statistical Analysis
: transforming data sets into information that can be used for understanding and decision-making

Machine Learning
: a subset of Artificial Intelligence that allows an application to discern patterns and automatically improve its analysis of extremely large datasets over time
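
The first few stages of the workflow can be sketched as a tiny pipeline. Everything here is hypothetical: the rows, the field names, and the "three times the median" anomaly rule are stand-ins I picked for illustration, not a recommended method.

```python
from statistics import mean, median

# Data collection: hypothetical raw rows, as they might arrive from a CSV export.
raw_rows = [
    {"region": " north ", "sales": "120"},
    {"region": "South", "sales": "95"},
    {"region": "north", "sales": ""},      # missing value
    {"region": "SOUTH", "sales": "110"},
    {"region": "North", "sales": "900"},   # suspiciously large
]

def clean(rows):
    """Data cleanup: normalize labels, convert types, drop unusable rows."""
    cleaned = []
    for row in rows:
        if not row["sales"].strip():
            continue  # skip rows with no sales figure
        cleaned.append({
            "region": row["region"].strip().lower(),
            "sales": int(row["sales"]),
        })
    return cleaned

def explore(rows, factor=3):
    """Data exploration: flag values far above the median as possible anomalies."""
    typical = median(row["sales"] for row in rows)
    return [row for row in rows if row["sales"] > factor * typical]

def summarize(rows):
    """Statistical analysis: average sales per region."""
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(row["sales"])
    return {region: mean(values) for region, values in by_region.items()}

rows = clean(raw_rows)
anomalies = explore(rows)
summary = summarize(rows)
```

Note that the flagged row still feeds into `summary`; deciding whether an anomaly is an error or a real signal is a judgment call that sends you back around the loop.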

#### Types of Data Analytics

Descriptive Analytics
: transforms data into more easily understood forms through summarization, organization, and simplification

Diagnostic Analytics
: examines historical data and connections within its chronology to understand the root cause of observed changes

Predictive Analytics
: identifies patterns in existing data which may be used to forecast future outcomes and trends

Prescriptive Analytics
: builds on Predictive Analytics to recommend actions based on known parameters, anticipating future outcomes and explaining why those outcomes will take place
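
To make the difference between descriptive and predictive concrete, here is a small sketch. The monthly user counts are invented and perfectly linear on purpose, and a straight-line fit is just the simplest forecasting model I could pick, not the only option.

```python
from statistics import mean

# Hypothetical monthly active-user counts for months 1 through 6.
months = [1, 2, 3, 4, 5, 6]
users = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: summarize what already happened.
description = {"mean": mean(users), "low": min(users), "high": max(users)}

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

# Predictive analytics: extend the fitted trend one month ahead.
slope, intercept = fit_line(months, users)
forecast_month_7 = slope * 7 + intercept
```

Prescriptive analytics would take the next step and turn the forecast into a recommendation, for example comparing `forecast_month_7` against a capacity limit.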

### Specializations

streamline IT processes, organizational structures, or staff development

financial analyst
: guide investment opportunities, identify revenue opportunities, and mitigate financial risk

health care analyst
: use data from health records, cost reports, and patient surveys to help providers improve their quality of care

market research analyst
: analyze market trends to help determine product and service offerings, price points, and target customers

operations analyst
: collaborative role working with teams to identify and solve technical, structural, and procedural issues in order to optimize org performance

systems analyst
: use cost-benefit analysis to help match technological solutions to company needs

## Terms

infrastructure to support collection and analysis of business ops data

turning an org’s raw data into useful information to identify trends, predict outcomes, etc.

data warehouse
: central repository of data integrated from one or more sources

ETL
: Extract, Transform, Load; munging data from multiple sources into a single set for further processing by multiple processes

ELT
: Extract, Load, Transform; grab it and store it, letting consumers transform it as needed when they grab it

data blending
: munging data from multiple sources into a single set or warehouse for a specific use case
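
The ETL versus ELT distinction comes down to when the transform happens. A minimal sketch with an in-memory SQLite database standing in for the warehouse; the table names, the order rows, and the cents-to-dollars transform are all hypothetical.

```python
import sqlite3

# Hypothetical raw source rows: amounts arrive as strings in cents.
source_rows = [("a-1", "1250"), ("a-2", "300"), ("a-3", "4999")]

def to_dollars(cents_text):
    """The transform step: convert a cents string into a dollar amount."""
    return int(cents_text) / 100

# An in-memory database stands in for the data warehouse.
warehouse = sqlite3.connect(":memory:")

# ETL: transform first, then load the finished values.
warehouse.execute("CREATE TABLE orders_etl (order_id TEXT, dollars REAL)")
warehouse.executemany(
    "INSERT INTO orders_etl VALUES (?, ?)",
    [(order_id, to_dollars(cents)) for order_id, cents in source_rows],
)

# ELT: load the raw values untouched; consumers transform when they read.
warehouse.execute("CREATE TABLE orders_raw (order_id TEXT, cents TEXT)")
warehouse.executemany("INSERT INTO orders_raw VALUES (?, ?)", source_rows)

etl_total = warehouse.execute("SELECT SUM(dollars) FROM orders_etl").fetchone()[0]
elt_total = sum(
    to_dollars(cents)
    for (cents,) in warehouse.execute("SELECT cents FROM orders_raw")
)
```

Both routes reach the same total. The tradeoff is that ETL fixes one interpretation of the data at load time, while ELT keeps the raw values around so each consumer can transform them differently.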

## Tools

Added to vault 2024-03-15. Updated on 2024-05-06