떼닝로그

Data Science Methodology - Final Project and Assessment

떼닝 2024. 1. 18. 07:52

Data Science Methodology

Final Project

Introduction to CRISP-DM

What is CRISP-DM?

- an acronym for Cross-Industry Standard Process for Data Mining (acronym: a word formed from the initial letters of a phrase)

- a structured approach to guide data-driven decision-making

 

CRISP-DM as a data methodology

the CRISP-DM model includes:

- data mining stages

- data mining stage descriptions

- explanations of the relationships between tasks and stages

 

CRISP-DM : A high-level process model

- Provides high-level insights into the data mining life cycle

 

Flexibility and communication using CRISP-DM

Data scientists might need to:

- communicate with peers, management, and stakeholders to keep the project on track

- revisit earlier stages

 

The Business Understanding stage

- sets and outlines the project's data analysis intentions and goals

- requires communication and clarity to overcome stakeholders' differing objectives, biases, and information modalities

- is necessary to avoid wasted time and resources

 

The Data Understanding stage

- CRISP-DM combines the stages of Data Requirements, Data Collection, and Data Understanding into this single stage

- Data scientists decide on data sources and acquire data

 

The Data Preparation stage

Data scientists perform the following tasks:

- transform data

- determine if more data is needed

- address questionable, missing, and ambiguous data values
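The preparation tasks above can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the `MISSING_MARKERS` set, the median fill rule, the min-max transform, and the made-up `ages` column are not part of CRISP-DM or the course material.

```python
from statistics import median

# Hypothetical markers treated as missing or ambiguous in this sketch.
MISSING_MARKERS = {None, "", "N/A", "unknown"}


def clean_column(values):
    """Replace missing or ambiguous entries with the median of the valid ones."""
    valid = [float(v) for v in values if v not in MISSING_MARKERS]
    if not valid:
        # Nothing usable: this is the "more data is needed" case.
        raise ValueError("no usable values; more data is needed")
    fill = median(valid)
    return [fill if v in MISSING_MARKERS else float(v) for v in values]


def min_max_scale(values):
    """Transform values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up example column with two unusable entries.
ages = [20, "N/A", 40, None, 30]
cleaned = clean_column(ages)
scaled = min_max_scale(cleaned)
```

Filling with the median is only one possible choice; dropping the affected records or going back to collect more data are equally valid outcomes of this stage.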

 

The Modeling stage

Data mining:

- reveals patterns and structure within the data

- provides knowledge and insights that address the stated business problem and goals

Data scientists perform the following tasks:

- select data models

- adjust the models

 

The Evaluation stage

Data scientists perform the following tasks:

- test the selected model

- assess the models' effectiveness

- use the results to determine the model's efficacy (efficacy: how well the model produces the intended result)
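As a rough illustration of how the Modeling and Evaluation stages fit together, the sketch below fits a candidate model on training data and then tests it on held-out data against a simple baseline. The data is made up, and the choices of a least-squares line, a mean baseline, and mean absolute error are assumptions for this example, not something the notes prescribe.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


# Made-up data that follows y = 2x + 1 exactly.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]
test_x, test_y = [5, 6], [11, 13]

# Modeling: fit the candidate model on the training data.
slope, intercept = fit_line(train_x, train_y)

# Evaluation: compare the candidate with a baseline on held-out data.
line_pred = [slope * x + intercept for x in test_x]
baseline_pred = [mean(train_y)] * len(test_x)

line_mae = mean_absolute_error(test_y, line_pred)
baseline_mae = mean_absolute_error(test_y, baseline_pred)
```

A candidate that does not clearly beat the baseline on held-out data would be a signal to adjust the model or revisit an earlier stage.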

 

The Deployment stage

Data scientists and stakeholders perform the following tasks:

- use the data model on new data outside of the data set

- analyze the results to determine the need for new variables, a new dataset, or a new model

 

CRISP-DM : Iterative and cyclical

Deployment results might initiate revisions to the following data analysis items:

- the business needs (question)

- the necessary business actions

- the data model

- the data

- or any combination of these items
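The iterative, cyclical flow can be sketched as a small state machine over the six stages. The `Stage` enum and the `next_stage` helper are hypothetical names introduced only for this illustration; CRISP-DM itself does not define any code-level interface.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_stage(current: Stage, revisit: Optional[Stage] = None) -> Stage:
    """Return the stage to work on next.

    If `revisit` names an earlier stage, go back to it. Otherwise move
    forward, wrapping from Deployment back to Business Understanding.
    """
    if revisit is not None:
        if revisit >= current:
            raise ValueError("revisit must be an earlier stage")
        return revisit
    if current is Stage.DEPLOYMENT:
        return Stage.BUSINESS_UNDERSTANDING
    return Stage(current + 1)
```

For example, `next_stage(Stage.EVALUATION, revisit=Stage.DATA_PREPARATION)` models a team returning to data preparation after a disappointing evaluation, while `next_stage(Stage.DEPLOYMENT)` starts a new cycle.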

 

Discussing the results

After completing all six stages:

- meet with the stakeholders to discuss the results

- this stage is unnamed in CRISP-DM

- this stage is the Feedback stage in John Rollins' Foundational Data Science model

 

 

 
