떼닝로그

Tools for Data Science - Languages of Data Science

떼닝 2023. 12. 14. 06:29

Tools for Data Science

Languages of Data Science

Which language should I learn?

- a wide range of technical options is available

- different programming languages have their own strengths and weaknesses

- the right language depends on your needs, the problems you need to solve, and who you're solving them for

 

What problems do you need to solve?

- can depend on the company, your role, and the age of the existing application

Roles in Data Science

- business analyst, database engineer, data analyst, data engineer, data scientist, research scientist, software engineer, statistician, product manager, project manager

 

Introduction to Python

Who is Python for?

- people who already know how to program

- people who want to learn to program, thanks to its huge global community and wealth of documentation

- used by over 80% of data professionals worldwide

- used in areas like data science, AI and machine learning, web development, and IoT with devices like the Raspberry Pi

- used by large organizations like IBM, Wikipedia, Google, Yahoo!, CERN, NASA, Facebook, Amazon, Instagram, ...

 

What makes Python Great

- is a general-purpose language (a language that can be applied broadly across application domains)

- has a large standard library

- for data science, it has scientific computing libraries like pandas, NumPy, SciPy, and Matplotlib

- for AI, it has libraries like TensorFlow, PyTorch, Keras, and scikit-learn

- can be used for Natural Language Processing (NLP) with the Natural Language Toolkit (NLTK)
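
To illustrate the "large standard library" point, the sketch below computes summary statistics using only built-in modules (`statistics` and `collections.Counter`), with nothing extra to install. The sample data is made up for illustration; for real datasets, pandas or NumPy would be the more typical choice.

```python
import statistics
from collections import Counter

# Hypothetical sample data, made up for illustration
scores = [72, 85, 90, 66, 85, 78, 94]
languages = ["Python", "R", "SQL", "Python", "Python", "SQL", "R"]

# Summary statistics from the built-in statistics module
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
stdev_score = statistics.stdev(scores)  # sample standard deviation

# Frequency count with collections.Counter
language_counts = Counter(languages)

print(f"mean={mean_score:.2f}, median={median_score}, stdev={stdev_score:.2f}")
print(language_counts.most_common(1))  # the most frequent language and its count
```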

 

Diversity and Inclusion Efforts

- the Python community has a well-documented history of paving the way for diversity and inclusion efforts in the tech industry as a whole (pave: to cover a road with a hard surface)

 

Introduction to R Language

Open Source (e.g., Python) vs. Free Software (e.g., R)

Similarities:

- both are free to use

- both commonly refer to the same set of licenses

- both support collaboration

- in many (but not all) cases, these terms can be used interchangeably

Differences:

- the Open Source Initiative (OSI) champions open source, while the Free Software Foundation (FSF) defines free software (champion: to fight for, to advocate)

- open source is more business focused while free software is more focused on a set of values

 

Why R?

- it's better for everyone if the tools used for data science are free and open

- supports coding as the most powerful way to tackle data science

- allows for private use, commercial use, and public collaboration

 

Who is R for?

- statisticians, mathematicians, and data miners for developing statistical software, graphing, and data analysis

- people with little or no programming background

- learners pursuing a data science career

- R is popular in academia

- used by companies like IBM, Google, Facebook, Microsoft, Bank of America, Uber, ...

 

What makes R great?

- has the world's largest repository of statistical knowledge

- has more than 15,000 publicly released packages for conducting complex exploratory data analysis

- integrates well with other computer languages like C++, Java, C, .NET, and Python

- common mathematical operations like matrix multiplication work out of the box

- has stronger object-oriented programming facilities than most statistical computing languages
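
To see what "out of the box" saves you: in R, `A %*% B` multiplies two matrices with a built-in operator. Plain Python lists have no matrix operator (NumPy adds one for arrays via `@`), so the minimal sketch below spells out the computation by hand. `matmul` is a hypothetical helper name used only for this illustration.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    # Entry (i, j) is the dot product of row i of a and column j of b
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

C = matmul(A, B)
print(C)  # [[19, 22], [43, 50]]
```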

 

Introduction to SQL

What is SQL?

- SQL: Structured Query Language (a non-procedural language)

- older than Python and R by about 20 years

- first appeared in 1974 and was developed at IBM

- useful in handling structured data
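
To make "non-procedural" concrete: a SQL query states which rows you want, and the database decides how to retrieve them. The minimal sketch below runs SQL against SQLite through Python's built-in `sqlite3` module; the `languages` table and its rows are hypothetical sample data (the years are each language's first appearance).

```python
import sqlite3

# In-memory SQLite database; the table and rows are hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (name TEXT, first_appeared INTEGER)")
conn.executemany(
    "INSERT INTO languages VALUES (?, ?)",
    [("SQL", 1974), ("Python", 1991), ("R", 1993), ("Julia", 2012)],
)

# Declarative query: describe WHAT rows you want, not HOW to find them
rows = conn.execute(
    "SELECT name, first_appeared FROM languages "
    "WHERE first_appeared < 2000 ORDER BY first_appeared"
).fetchall()
print(rows)  # [('SQL', 1974), ('Python', 1991), ('R', 1993)]
conn.close()
```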

 

Relational Databases
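
A relational database stores data in tables whose rows can be linked through shared keys. As a hedged sketch (hypothetical `languages` and `roles` tables in SQLite, with an illustrative role-to-language mapping that is not from the course), the query below uses a JOIN to combine the two tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two related tables: each row in roles points to a language via language_id
conn.executescript("""
    CREATE TABLE languages (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE roles (
        title       TEXT NOT NULL,
        language_id INTEGER REFERENCES languages(id)
    );
    INSERT INTO languages (id, name) VALUES (1, 'Python'), (2, 'SQL'), (3, 'R');
    INSERT INTO roles (title, language_id) VALUES
        ('data scientist', 1),
        ('database engineer', 2),
        ('statistician', 3),
        ('data engineer', 2);
""")

# JOIN links the two tables through the shared key (roles.language_id = languages.id)
rows = conn.execute("""
    SELECT r.title, l.name
    FROM roles AS r
    JOIN languages AS l ON r.language_id = l.id
    WHERE l.name = 'SQL'
    ORDER BY r.title
""").fetchall()
print(rows)  # [('data engineer', 'SQL'), ('database engineer', 'SQL')]
conn.close()
```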

 

What makes SQL great

- helps you get jobs in data science and data engineering

- speeds up workflow execution

- acts as an interpreter between you and the database

- is an American National Standards Institute (ANSI) standard

- enables you to apply your SQL knowledge to other databases

 

Other Languages for Data Science

Java

- A general-purpose object-oriented programming language

- has huge adoption in the enterprise space and is designed to be fast and scalable

- applications are compiled to bytecode and run on the Java Virtual Machine (JVM)

 

Scala

- general-purpose programming language that provides support for functional programming

- designed as an extension of Java; it is interoperable with Java because it also runs on the JVM

- the name Scala comes from "Scalable Language"

- Apache Spark, which is written in Scala, provides APIs that make parallel jobs easy to write

 

C++

- general-purpose language, an extension of C

- improves processing speed, enables system programming, and gives you broader control over the application

- used to develop programs that feed data to customers in real time

 

JavaScript

- a core technology for the World Wide Web

- a general-purpose language that has extended beyond the browser with Node.js and other server-side approaches

- NOT related to the Java language

- TensorFlow.js makes machine learning and deep learning possible in Node.js and in the browser

 

Julia

- designed for high-performance numerical analysis and computational science

- provides speedy development and fast programs

- compiles to native code that executes directly on the processor

- can call C, Go, Java, MATLAB, R, Fortran, and Python libraries, and has refined parallelism

 

Practice Quiz - Languages

Q. Which of the following options determines your choice of language to learn in data science?

A. The problem to be solved and who you are solving it for

 

Q. In Python, which library is used for Artificial Intelligence?

A. TensorFlow

 

Q. What are the differences between Python and R languages?

A. Python is open source, and R is free software

 

Q. What is the primary purpose of SQL?

A. To query and handle structured data

 

Q. Which of the following languages is an object-oriented programming language?

A. Java

 

 
