떼닝로그

Tools for Data Science - Libraries, APIs, Datasets and Models

Coursera/IBM Data Science


떼닝 2023. 12. 14. 06:35

Tools for Data Science

Libraries, APIs, Datasets and Models

Libraries for Data Science

Introduction

- Libraries are collections of functions and methods that let you perform many actions without writing the code yourself
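As a small illustration of that idea, the sketch below computes an average twice: once by hand and once with Python's built-in `statistics` library. The `scores` list and the `mean_by_hand` helper are made up for this example and are not from the course.

```python
import statistics

# Made-up sample data, for illustration only.
scores = [72, 85, 90, 66, 85]


def mean_by_hand(values):
    """Compute the average without a library: write the arithmetic yourself."""
    total = 0
    for v in values:
        total += v
    return total / len(values)


# Without a library.
print(mean_by_hand(scores))

# With a library: call functions that someone else already wrote and tested.
print(statistics.mean(scores))
print(statistics.median(scores))
```

Both approaches give the same mean; the library version is shorter and also offers related functions such as `median` at no extra cost.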

Python Libraries:

- Scientific Computing Libraries in Python

- Visualization Libraries in Python

- High-Level Machine Learning and Deep Learning

- Deep Learning Libraries in Python

- Libraries used in other languages

 

Scientific Computing Libraries in Python

- Libraries contain built-in modules providing different functionalities

- Pandas : Data structures & tools; provides easy indexing to work with the data

- Numpy : Arrays & matrices

 

Visualization libraries in Python

- use data visualization libraries to communicate with others and display meaningful results of an analysis

- Matplotlib : plots & graphs; the most popular

- Seaborn : plots (heat maps, time series, violin plots)

 

Machine Learning and Deep Learning Libraries in Python

- Scikit-learn : machine learning (regression, classification, clustering)

- Keras : deep learning neural networks

 

Deep Learning Libraries in Python

- TensorFlow : Deep learning (production and deployment)

- PyTorch : Deep learning (regression, classification)

 

Recap

- Libraries usually contain built-in modules providing different functionalities that can be used directly

- can use data visualization methods to communicate with others and display meaningful results of an analysis

- Scikit-learn library contains tools for statistical modeling including regression, classification, clustering, ...

- TensorFlow is a low-level framework used in large-scale production of deep learning models

- Apache Spark is a general-purpose cluster-computing framework allowing you to process data using compute clusters

 

Application Programming Interfaces (API)

What is an API?

- Application Programming Interface (API) allows communication between two pieces of software

 

API library

 

Other Languages API

 

More Languages API

 

REST APIs

- REpresentational State Transfer APIs

- allow you to communicate over the internet

- let you use resources like storage, data, and artificially intelligent algorithms

 

Working of REST APIs

- used to interact with web services

- follow a set of rules regarding communication, input (request), and output (response)
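To make the request/response idea concrete, here is a minimal sketch that builds a request and reads a response using only the Python standard library. The endpoint URL, the query parameters, and the JSON response body are all hypothetical; nothing is actually sent over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration; no request is sent.
BASE_URL = "https://api.example.com/v1/transcribe"

# Input (request): an HTTP method, a URL with query parameters, and headers.
params = {"lang": "en", "format": "text"}
request = Request(
    url=f"{BASE_URL}?{urlencode(params)}",
    method="GET",
    headers={"Accept": "application/json"},
)
print(request.get_method())
print(request.full_url)

# Output (response): a made-up JSON body of the kind a web service might return.
sample_response_body = '{"status": "ok", "text": "hello world"}'
response = json.loads(sample_response_body)
print(response["status"], response["text"])
```

With a real service, the request object would be passed to `urllib.request.urlopen`, and the JSON body would come from the reply instead of a hard-coded string.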

 

Common terms

 

HTTP

 

Recap

- An application programming interface (API) allows communication between two pieces of software

- The API is the part of the library you see, while the library contains all the components of the program

- REST APIs allow you to communicate through the internet and take advantage of resources like storage, data, artificially intelligent algorithms, and much more

 

Data Sets - Powering Data Science

What's a data set?

- collection of data

- data structures : tabular data (eg. CSV), hierarchical data, network data, raw files
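The first three structures can be sketched with the Python standard library as follows. All sample values are invented for illustration, and the adjacency-list dictionary is just one common way to hold network data, not something the course prescribes.

```python
import csv
import io
import json

# Tabular data: rows and columns, here as CSV text (made-up values).
csv_text = "name,age\nAda,36\nLinus,28\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hierarchical data: records nested inside other records, here as JSON.
json_text = '{"company": {"name": "Acme", "teams": [{"name": "data", "size": 3}]}}'
tree = json.loads(json_text)

# Network data: nodes and the connections between them,
# here as an adjacency list (node -> list of neighbours).
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

print(rows[0]["name"], rows[0]["age"])      # CSV values are read as strings
print(tree["company"]["teams"][0]["size"])  # walk down the hierarchy
print(sorted(graph["A"]))                   # neighbours of node "A"
```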

 

Data Ownership

- Private data : confidential, private or personal information, commercially sensitive

- Open data : publicly available; published by companies, scientific institutions, governments, and other organizations

 

Where to find open data

- open data portal list from around the world : datacatalogs.org

- governmental, intergovernmental and organization websites : data.un.org (UN), data.gov (USA), ...

- Kaggle : kaggle.com/datasets

- Google data set search : datasetsearch.research.google.com

 

**

A lot of this part was just lists, so I skipped most of it...

I figured I'd pick it up later as I go...

**

Additional Sources of Datasets

Open Datasets and Sources

Government Data:

Financial Data Sources:

Crime Data:

Health Data:

Academic and Business Data:

Other General Data:

Proprietary datasets and sources

Health Care:

https://www.sgim.org/communities/research/dataset-compendium/proprietary-datasets

Financial Market data:

https://datarade.ai/data-categories/proprietary-market-data

Google Cloud based datasets:

https://cloud.google.com/datasets

 

Machine Learning Models - Learning from Models to Make Predictions

What is a machine learning model?

- data contains a wealth of information

- Machine learning uses algorithms, also known as models, to identify patterns in data

- Model training is the process by which the model learns the data patterns

- after a model is trained it can be used to make predictions

- types of ML are Supervised, Unsupervised, and Reinforcement

 

Supervised Learning

- identifies relationships and dependencies between the input data and the correct output

- Regression : to predict real numerical values. (eg. home sales prices, stock market prices)

- Classification : to classify data into categories (eg. email spam filters, fraud detection, image classification)
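As a rough sketch of the regression case, the code below trains a straight-line model on labeled examples and then uses it to predict a new value. It uses ordinary least squares written by hand in plain Python; the `fit_line` and `predict` helpers and the data are made up for this example, and in practice a library such as Scikit-learn would usually do this work.

```python
def fit_line(xs, ys):
    """Training: find the slope and intercept that best fit the labeled data
    (ordinary least squares for a single input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up labeled data: inputs with their correct outputs (here y = 2x + 1).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(sizes, prices)


def predict(x):
    """Prediction: apply the trained model to an input it has not seen."""
    return slope * x + intercept


print(slope, intercept)
print(predict(5.0))
```

The correct outputs (`prices`) are what make this supervised: the model learns the relationship between input and output from examples where the answer is already known.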

 

Other learning types

Unsupervised Learning:

- data is not labeled

- model tries to identify patterns without external help (eg. clustering assigns each record of a dataset to one of several groups of similar records)
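Clustering can be sketched with a tiny one-dimensional k-means, shown below in plain Python. The course notes do not name a specific clustering algorithm, so k-means is an assumption here, and the `kmeans_1d` function, the data points, and the starting centers are all invented for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group unlabeled numbers around the given starting centers."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Made-up unlabeled data: no "correct answer" is supplied.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])

print(centers)
print(clusters)
```

No labels are given anywhere; the two groups emerge only from how close the values are to each other.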

Reinforcement Learning:

- conceptually similar to human learning processes

- eg. Mouse and maze, robot learning to walk, chess, Go, and other board games of skill

 

Deep Learning

- tries to loosely emulate how the human brain works

- applications : Natural Language Processing, Image, audio and video analysis, Time series forecasting

- requires large datasets of labeled data and is compute intensive

- requires special purpose hardware

 

Deep Learning Models

- build from scratch or download from public model repositories

- built using frameworks, such as TensorFlow, PyTorch, Keras

- provide Python API and support C++ and JavaScript

- popular model repositories : most frameworks provide a "model zoo" (TensorFlow, PyTorch, Keras, ...)

 

Using models to solve a problem

 

Practice Quiz - Libraries, APIs, Data Sets, Models

Q. Which library offers data structures and tools for effective data cleaning, manipulation, and analysis?

A. Pandas

 

Q. What is an API?

A. Interface

 

Q. What is the best way to represent the network data?

A. As a graph

 

Q. What is the primary purpose of the Data Asset eXchange (DAX)? Select all that apply

A. To keep datasets that have clearly defined license and usage terms

    To collect data sets that are high quality

 

Q. Which of the following are machine learning models? Select all that apply.

A. Reinforcement Learning, Unsupervised Learning, Supervised Learning

 

Q. Which of the following elements are used to make a model? Select all that apply.

A. Compute resources, Domain Expertise
