떼닝로그

Data Science Methodology - From Problem to Approach and From Requirements to Collection (1)

Coursera/IBM Data Science

떼닝 2023. 12. 27. 07:38

Data Science Methodology

From Problem to Approach

Data Science Methodology Overview

Addressing data science challenges

- data science combines statistics, technology, and domain expertise to extract insights from vast data

- adopting a methodology can help address the challenges that arise

Challenges:

- misunderstanding the business question

- not knowing how to apply the data correctly to resolve the business problem

 

What is a methodology?

- a system of methods

- a guideline for decision-making during the scientific process

- guides data scientists in solving complex problems with data

 

Applying data science methodology

- collect data

- create measurement strategies

- compare data analysis methods

 

Addressing data science challenges

- apply practical guidance

- avoid the mistakes that come from jumping to solutions before analyzing the problem

 

Data methodology stages

 

Data science methodology questions

- Define the issue and determine your approach

  1. What is the problem that you are trying to solve?

  2. How can you use data to answer the business question?

- Get organized around the data

  3. What data do you need to answer the question?

  4. Where is the data sourced from, and how will you receive it?

  5. Does the data you collected represent the problem to be solved?

  6. What additional work is required to manipulate and work with the data?

- Validate your approach and final design of the data

  7. When you apply data visualizations, do you see answers that address the business problem?

  8. Does the data model answer the initial business question, or must you adjust the data?

  9. Can you put the model into practice?

  10. Can you get constructive feedback from the data and the stakeholder to answer the business question?

 

Business Understanding

From understanding to approach

- Business understanding : What is the problem that you are trying to solve?

- Analytic approach : How can you use data to answer the question?

 

Case Study : Goals & Objectives

- Define the GOALS : to provide quality care without increasing costs

- Define the OBJECTIVES : to review the process to identify inefficiencies

 

Case Study : What's the sponsor's involvement?

1. Set overall direction

2. Remain engaged and provide guidance

3. Ensure necessary support when needed

 

Case Study : Identifying the business requirements

1. Predict CHF readmission outcome (Y or N) for each patient (CHF : Congestive Heart Failure)

2. Predict the readmission risk for each patient

3. Understand explicitly what combination of events led to the predicted outcome for each patient

4. Easy to understand and apply to new patients to predict their readmission risk

 

Analytic Approach

Pick Analytic Approach based on the question type

- Descriptive : Current status

- Diagnostic (Statistical Analysis) : What happened? / Why is this happening?

- Predictive (Forecasting) : What if these trends continue? / What will happen next?

- Prescriptive : How do we solve it?

 

What are types of questions?

- If the question is to determine the probabilities of an action : use a predictive model

- If the question is to show relationships : use a descriptive model

- If the question requires a yes/no answer : use a classification model
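The three rules above can be written down as a simple lookup. This is only a sketch for remembering the pairing: the dictionary keys and the `pick_approach` helper are hypothetical names made up for this note, not something from the course.

```python
# Hypothetical lookup table restating the three rules above.
APPROACH_BY_QUESTION = {
    "probability of an action": "predictive model",
    "show relationships": "descriptive model",
    "yes/no answer": "classification model",
}


def pick_approach(question_type: str) -> str:
    """Return the approach these notes pair with a question type."""
    key = question_type.strip().lower()
    if key not in APPROACH_BY_QUESTION:
        raise ValueError(f"unknown question type: {question_type!r}")
    return APPROACH_BY_QUESTION[key]


print(pick_approach("yes/no answer"))  # classification model
```

In practice a business question rarely arrives with its type labelled, so the real work is deciding which of the three keys it matches.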

Analytic approach:

- How can you use data to answer the question?

- the correct approach depends on business requirements for the model

 

Which machine learning will be utilized?

machine learning:

- learning without being explicitly programmed

- identifies relationships and trends in data that might otherwise not be accessible or identified

- uses clustering and association approaches

 

Case Study : Decision tree classification selected

 

Predictive model :

- to predict an outcome

Decision tree classification:

- categorical outcome

- explicit "decision path" showing conditions leading to high risk

- likelihood of classified outcome

- easy to understand and apply
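To make the "decision path" and "likelihood" points concrete, here is a minimal sketch of a toy tree written by hand. Everything in it is an assumption for illustration: the feature names (`prior_admissions`, `age`), the thresholds, and the leaf probabilities are made up and are not taken from the case study, and a real tree would be learned from patient data rather than typed in.

```python
# Toy decision tree, written by hand for illustration only.
# Feature names, thresholds, and probabilities are made up;
# they are not taken from the case study.
TREE = {
    "feature": "prior_admissions",
    "threshold": 2,
    "low": {"leaf": "N", "probability": 0.9},
    "high": {
        "feature": "age",
        "threshold": 65,
        "low": {"leaf": "N", "probability": 0.6},
        "high": {"leaf": "Y", "probability": 0.8},
    },
}


def classify(patient, node=TREE):
    """Walk the tree; return (label, likelihood of that label, decision path)."""
    path = []
    while "leaf" not in node:
        feature, threshold = node["feature"], node["threshold"]
        if patient[feature] >= threshold:
            path.append(f"{feature} >= {threshold}")
            node = node["high"]
        else:
            path.append(f"{feature} < {threshold}")
            node = node["low"]
    return node["leaf"], node["probability"], path


label, likelihood, path = classify({"prior_admissions": 3, "age": 72})
print(label, likelihood)      # Y 0.8
print(" AND ".join(path))     # prior_admissions >= 2 AND age >= 65
```

The returned path is what makes this kind of model easy to explain: each prediction comes with the exact conditions that led to it, and applying it to a new patient is just another walk down the tree.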

 

Business Understanding : Asking Questions

Relevant Questions to Business Goal

- How do customer purchase behaviors change during specific promotional periods?

- How do customer demographics influence their price sensitivity?

- How do product ratings and reviews influence customer purchase decisions?

- What are the profit margins for different products?

- Which products have experienced the highest sales volumes in the past?

Not so Relevant Questions to Business Goal

- What is the company's organizational structure?

- How many employees work in the marketing department?

- How much does the company spend on office supplies?

- What are the customer's preferred payment methods?

- What is the historical website traffic data for the e-commerce site?

 

 

Analytical Approach : Identifying the Pattern to Address the Question

Predictive Model

- How can we determine the most suitable delivery routes for perishable goods, ensuring timely deliveries without explicitly using past data to make predictions?

- How can we forecast the optimal number of delivery vehicles required for a specific day based on the expected order volume?

- What is the expected delivery time for each route considering historical traffic patterns and anticipated weather conditions?

- How can we anticipate the potential impact of traffic incidents or road closures on delivery times to proactively adjust routes?

Descriptive Model

- What are the most frequently used routes and their respective delivery time variations during peak and off-peak hours?

- What are the average delivery costs for different delivery routes, and how do they vary during different times of the day?

- What historical data highlights the busiest delivery days and time intervals during the week based on past order data?

- What insights can be gathered on the average delivery times for different vehicle types, and how do these times vary based on the complexity of the delivery route?

Classification Model

- How can we group delivery regions based on customer density and order frequency to optimize delivery route planning?

- How can we classify delivery routes into different categories based on the average delivery time and order volume?

- How can we cluster customer locations to create distinct groups for efficient delivery route planning, without explicitly making predictions based on past data?

- What are the various time slots in which delivery schedules can be classified to balance workload and minimize delivery delays?
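As a small illustration of a descriptive question, such as how delivery times on each route differ between peak and off-peak hours, the sketch below averages a handful of records by route and period. The `deliveries` list is hypothetical data invented for this example; only the Python standard library is used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical delivery log: (route, period, delivery time in minutes).
deliveries = [
    ("A", "peak", 42), ("A", "peak", 38), ("A", "off-peak", 25),
    ("B", "peak", 55), ("B", "off-peak", 30), ("B", "off-peak", 34),
]

# Group delivery times by (route, period).
times = defaultdict(list)
for route, period, minutes in deliveries:
    times[(route, period)].append(minutes)

# Descriptive summary: average delivery time for each group.
averages = {key: mean(values) for key, values in times.items()}
for (route, period), avg in sorted(averages.items()):
    print(route, period, avg)
```

Nothing here predicts or classifies anything. It only summarizes what has already happened, which is what separates a descriptive question from the other two kinds.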

 

 

 
