떼닝로그
Data Science Methodology - From Problem to Approach and From Requirements to Collection (1)
떼닝 2023. 12. 27. 07:38 Data Science Methodology
From Problem to Approach
Data Science Methodology Overview
Addressing data science challenges
- data science combines statistics, technology, and domain expertise to extract insights from vast data
- adopting a methodology can help address these issues
Challenges:
- misunderstanding the business question
- not knowing how to apply the data to resolve the business problem correctly
What is a methodology?
- a system of methods
- a guideline for decision-making during the scientific process
- guides data scientists in solving complex problems with data
Applying data science methodology
- perform data collection
- creation of measurement strategies
- comparisons of data analysis methods
Addressing data science challenges
- apply practical guidance
- avoid the mistakes that can happen by jumping to solutions before the analysis
Data methodology stages
Data science methodology questions
- Define the issue
- Determine your approach
1. What is the problem that you are trying to solve?
2. How can you use data to answer the business question?
- Get organized around the data
3. What data do you need to answer the question?
4. Where is the data sourced from, and how will you receive it?
5. Does the data you collected represent the problem to be solved?
6. What additional work is required to manipulate and work with the data?
- Validate your approach and final design of the data
7. When you apply data visualizations, do you see answers that address the business problem?
8. Does the data model answer the initial business question, or must you adjust the data?
9. Can you put the model into practice?
10. Can you get constructive feedback from the data and the stakeholder to answer the business question?
Business Understanding
From Understanding to approach
- Business understanding : What is the problem that you are trying to solve?
- Analytic approach : How can you use data to answer the question?
Case Study : Goals & Objectives
- Define the GOALS : to provide quality care without increasing costs
- Define the OBJECTIVES : to review the process to identify inefficiencies
Case Study : What's the sponsor's involvement?
1. Set overall direction
2. Remain engaged and provide guidance
3. Ensure necessary support when needed
Case Study : Identifying the business requirements
1. Predict the CHF readmission outcome (Y or N) for each patient (CHF : congestive heart failure)
2. Predict the readmission risk for each patient
3. Understand explicitly what combination of events led to the predicted outcome for each patient (explicitly : stated clearly, leaving nothing implied)
4. Easy to understand and apply to new patients to predict their readmission risk
Analytic Approach
Pick Analytic Approach based on the question type
- Descriptive : Current status
- Diagnostic (Statistical Analysis) : What happened? / Why is this happening?
- Predictive (Forecasting) : What if these trends continue? / What will happen next?
- Prescriptive : How do we solve it?
What are types of questions?
- If the question is to determine the probabilities of an action : use a predictive model
- If the question is to show relationships : use a descriptive model
- If the question requires a yes/no answer : use a classification model
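The three rules above amount to a lookup from question type to model type. A minimal Python sketch of that lookup follows; the key names and the `pick_model` helper are labels invented for this illustration, not terms from the course.

```python
# Hypothetical lookup: question type -> model type, following the notes above.
# The key names are labels made up for this sketch.
QUESTION_TYPE_TO_MODEL = {
    "probability_of_action": "predictive",
    "show_relationships": "descriptive",
    "yes_no_answer": "classification",
}


def pick_model(question_type: str) -> str:
    """Return the model type suggested for the given question type."""
    try:
        return QUESTION_TYPE_TO_MODEL[question_type]
    except KeyError:
        raise ValueError(f"unknown question type: {question_type!r}") from None


print(pick_model("yes_no_answer"))
```

In practice the question type is decided by talking to stakeholders, so a table like this only records the outcome of that conversation.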
Analytic approach:
- How can you use data to answer the question?
- the correct approach depends on business requirements for the model
Which machine learning will be utilized?
machine learning:
- learning without being explicitly programmed
- identifies relationships and trends in data that might otherwise not be accessible or identified
- uses clustering and association approaches
Case Study : Decision tree classification selected
Predictive model :
- to predict an outcome
Decision tree classification:
- categorical outcome
- explicit "decision path" showing conditions leading to high risk
- likelihood of classified outcome
- easy to understand and apply
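To make the four properties above concrete, here is a small hand-built decision tree in plain Python. It is only a sketch: the features (`prior_admissions`, `age`), the thresholds, and the leaf counts are all invented for illustration and do not come from the case study. It returns a categorical outcome (Y or N), the likelihood of that outcome, and the explicit decision path that led to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Internal node: tests `feature <= threshold`. Leaf node: feature is None.
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when value <= threshold
    right: Optional["Node"] = None  # branch taken when value > threshold
    yes: int = 0  # leaf only: training patients who were readmitted
    no: int = 0   # leaf only: training patients who were not readmitted


# Toy tree; every feature, threshold, and count here is made up.
TREE = Node(
    feature="prior_admissions", threshold=1,
    left=Node(yes=1, no=9),
    right=Node(
        feature="age", threshold=65,
        left=Node(yes=4, no=6),
        right=Node(yes=8, no=2),
    ),
)


def predict(patient: dict, node: Node = TREE):
    """Return (outcome, likelihood of that outcome, decision path)."""
    path = []
    while node.feature is not None:
        value = patient[node.feature]
        if value <= node.threshold:
            path.append(f"{node.feature} <= {node.threshold}")
            node = node.left
        else:
            path.append(f"{node.feature} > {node.threshold}")
            node = node.right
    # At a leaf: the share of readmitted training patients is the likelihood.
    p_yes = node.yes / (node.yes + node.no)
    outcome = "Y" if p_yes >= 0.5 else "N"
    likelihood = p_yes if outcome == "Y" else 1 - p_yes
    return outcome, likelihood, path


outcome, likelihood, path = predict({"prior_admissions": 3, "age": 72})
print(outcome, likelihood, path)
```

Applying the tree to a new patient is just a walk from the root to a leaf, which is why the approach counts as easy to understand and apply. A real tree would be learned from data rather than written by hand.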
Business Understanding : Asking Questions
Relevant Questions to Business Goal | Not so Relevant Questions to Business Goal |
---|---|
How do customer purchase behaviors change during specific promotional periods? | What is the company's organizational structure? |
How do customer demographics influence their price sensitivity? | How many employees work in the marketing department? |
How do product ratings and reviews influence customer purchase decisions? | How much does the company spend on office supplies? |
What are the profit margins for different products? | What are the customer's preferred payment methods? |
Which products have experienced the highest sales volumes in the past? | What is the historical website traffic data for the e-commerce site? |
Analytical Approach : Identifying the Pattern to Address the Question
Predictive Model | Descriptive Model | Classification Model |
---|---|---|
How can we determine the most suitable delivery routes for perishable goods, ensuring timely deliveries without explicitly using past data to make predictions? | What are the most frequently used routes and their respective delivery time variations during peak and off-peak hours? | How can we group delivery regions based on customer density and order frequency to optimize delivery route planning? |
How can we forecast the optimal number of delivery vehicles required for a specific day based on the expected order volume? | What are the average delivery costs for different delivery routes, and how do they vary during different times of the day? | How can we classify delivery routes into different categories based on the average delivery time and order volume? |
What is the expected delivery time for each route considering historical traffic patterns and anticipated weather conditions? | What historical data highlights the busiest delivery days and time intervals during the week based on past order data? | How can we cluster customer locations to create distinct groups for efficient delivery route planning, without explicitly making predictions based on past data? |
How can we anticipate the potential impact of traffic incidents or road closures on delivery times to proactively adjust routes? | What insights can be gathered on the average delivery times for different vehicle types, and how do these times vary based on the complexity of the delivery route? | What are the various time slots in which delivery schedules can be classified to balance workload and minimize delivery delays? |