떼닝로그
Tools for Data Science - RStudio & GitHub
Tools for Data Science
RStudio IDE
Introduction to R and RStudio
What is R?
- Statistical Programming Language
- used for data processing and manipulation
- used for statistical inference, data analysis, and machine learning
- R is used mostly by academics, healthcare, and the government
- R supports importing data from different sources : flat files, databases, the web, statistical software (e.g. SPSS)
R Capabilities
- easy to use compared to other data science tools
- great tool for visualization
- basic data analysis doesn't require installing packages
What is RStudio
- is an Integrated Development Environment (IDE)
- increases productivity when working with the R programming language
Plotting in RStudio
Using data visualization in R
- to install packages, use the command : install.packages("<package name>")
Commonly used visualization packages:
- ggplot : histograms, bar charts, scatterplots
- Plotly : web-based data visualizations
- Lattice : complex, multi-variable data sets
- Leaflet : interactive plots
Using ggplot
- ggplot adds layers of functions and arguments
- geom_point is used to create scatterplots
- ggtitle helps you to show the title of the graph
- can define the x-axis and y-axis via labs
library(ggplot2)
ggplot(mtcars, aes(x=mpg, y=wt))+geom_point() + ggtitle("Miles per gallon vs weight") + labs(y="weight", x="Miles per gallon")

E.g. using ggplot

GitHub
Overview of Git/GitHub
Git
- free and open source software
- distributed version control system
- accessible anywhere in the world
- one of the most common version control systems available
- can also version control images, documents, etc.
Short Glossary of Terms
- SSH protocol : a method for secure remote login from one computer to another
- Repository : the folders of your project that are set up for version control
- Fork : a copy of a repository
- Pull Request : the process you use to request that someone reviews and approves your changes before they become final
- Working directory : a directory on your file system, including its files and subdirectories, that is associated with a git repository
Introduction to GitHub
Background of Git
- large software projects need a way to track and control source code updates
- Linux development needed automated source-version control
Key characteristics include:
- strong support for non-linear development
- distributed development
- compatibility with existing systems and protocols
- efficient handling of large projects
- Cryptographic authentication of history
- pluggable merge strategies
Git Repository Model
What is special about the Git Repository Model? :
- distributed version-control system
- tracks source code
- coordinates among programmers (tracks changes, supports non-linear workflows)
- created in 2005 by Linus Torvalds
What is Git?
- is a distributed version-control system
- tracks changes to content
- provides a central point for collaboration
- allows for centralized administration
- teams have controlled access scope
- the main branch should always correspond to deployable code
What is GitHub?
- online hosting service for Git repositories
- hosted by a subsidiary of Microsoft (subsidiary : secondary, or belonging to a subsidiary company)
- offers free, professional and enterprise accounts
- as of August 2019, GitHub had over 100M repositories
What is a Repository?
- a data structure for storing documents including application source code
- a repository can track and maintain version-control
What is GitLab?
- a DevOps platform, delivered as a single application
- provides access to Git repositories
- provides source code management
GitLab enables developers to:
- Collaborate
- work from a local copy
- branch and merge code
- streamline testing and delivery with CI/CD
GitHub - Working with Branches
What is a branch?
- a branch is a snapshot of your repository
- master branch is the official version of the project
- the child branch creates a copy of the master branch
Why create a branch?
- edits and changes are made in the child branch
- tests are done to ensure quality before merging with the master branch

Merging multiple branches
- branches allow for simultaneous development and testing by multiple team members
Pull Request (PR)
- pull requests are a way of proposing changes to the main branch
- other team members review the changes and approve the merge into the master branch
Getting Started with Branches using Git Commands
- git init : create a new local repository
- git add : stage a new or changed file for the next commit
- git commit : commit changes
- git branch <name> : create a branch
- git checkout : switch to a branch
- git status : check the status of files changed
- git log : review recent commits
- git revert : undo a commit by adding a new commit that reverses it
- git branch : get a list of branches and the active branch
- git merge <branch> : merge changes from another branch into your active branch
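The commands listed above can be tried end to end in a throwaway directory. The following is only a minimal sketch, assuming Git is installed; the repository name demo-repo, the file notes.txt, the branch name feature, and the user identity are placeholders chosen for illustration.

```shell
# Work in a temporary directory so no real project is touched.
workdir=$(mktemp -d)
cd "$workdir"

git init -q demo-repo                      # git init: create a new local repository
cd demo-repo
git config user.name "Example User"        # placeholder identity, needed to commit
git config user.email "user@example.com"

echo "first line" > notes.txt
git add notes.txt                          # git add: stage the file
git commit -q -m "Add notes"               # git commit: record the staged change

# Name of the initial branch (master or main, depending on the Git setup).
base=$(git rev-parse --abbrev-ref HEAD)

git branch feature                         # git branch <name>: create a branch
git checkout -q feature                    # git checkout: switch to it
echo "second line" >> notes.txt
git add notes.txt
git commit -q -m "Extend notes"

git status --short                         # git status: prints nothing, the tree is clean
git log --oneline                          # git log: shows the two commits on feature

git checkout -q "$base"
git merge -q feature                       # git merge <branch>: bring feature into the active branch
git branch                                 # git branch: list branches, * marks the active one

git revert --no-edit HEAD                  # git revert: add a commit that undoes "Extend notes"
```

After the last step the history holds three commits and notes.txt is back to a single line, which shows that git revert adds a new commit rather than deleting an old one.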
GitHub Branches
What are Branches?

- branches store all files in GitHub
- the master branch stores the deployable code
- create a new branch for planned changes
Merging Branches
1. Start with a common base
2. The code is branched while new features are developed
3. Both branches are undergoing changes
4. When the two streams of work are ready to merge, each branch's code is identified as a tip and the two tips are merged into a third, combined branch
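The four steps above can be reproduced locally. This is a rough sketch under stated assumptions: Git is installed, and the names merge-demo, feature, and the three text files are hypothetical. In Git the combined result appears as a merge commit with two parents on the active branch.

```shell
workdir=$(mktemp -d)
cd "$workdir"
git init -q merge-demo
cd merge-demo
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

# 1. Start with a common base.
echo "base" > base.txt
git add base.txt
git commit -q -m "Common base"
base=$(git rev-parse --abbrev-ref HEAD)    # initial branch name (master or main)

# 2. Branch the code while a new feature is developed.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# 3. Meanwhile, the original branch changes as well.
git checkout -q "$base"
echo "mainline work" > mainline.txt
git add mainline.txt
git commit -q -m "Mainline work"

# 4. Merge the two tips; Git records a merge commit with two parents.
git merge -q --no-edit feature
git log --oneline --graph

# One commit id followed by its parent ids: three words for a two-parent merge.
parents=$(git rev-list --parents -n 1 HEAD | wc -w)
echo "ids on the merge commit line: $parents"
```

The graph printed by git log shows the two lines of work splitting after the common base and joining again at the merge commit.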

Make a Commit
- saved changes are called commits
- to change the contents of a file : select the file, click the pencil icon, make changes, then scroll down to Commit changes
- in the Commit changes box, add a comment that describes the changes
- choose to commit directly to the current branch or to create a new branch
What is a Pull Request?
- a pull request makes the proposed (committed) changes available for others to review and use
- a pull request can follow any commit, even if the code is unfinished
- pull requests can target specific users
- GitHub automatically makes a pull request if you make a change on a branch you do not own
- log files record the approval of the merge
Merging into the Master Branch
- the master branch should be the only deployed code
- developers can change source files in a branch, but the changes are not released until they are committed, a pull request is issued, the code is reviewed and approved, and the approved code is merged back into the master branch
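The release flow described above can be approximated on one machine. This is a hedged sketch, not GitHub's actual mechanism: a local bare repository (hosted.git) stands in for the GitHub-hosted one, and the pull request and review step, which happen in the GitHub web interface, appear only as a comment. The names dev-copy, fix-typo, and app.txt are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Assumption for the demo: a local bare repository plays the role of GitHub.
git init -q --bare hosted.git

git clone -q hosted.git dev-copy           # git clone: copy an existing repository
cd dev-copy
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial version"
base=$(git rev-parse --abbrev-ref HEAD)    # initial branch name (master or main)
git push -q origin "$base"

# Change source files in a branch, commit, and publish the branch.
git checkout -q -b fix-typo
echo "v2" > app.txt
git commit -q -am "Update app"
git push -q origin fix-typo

# On GitHub, a pull request would be opened and reviewed at this point.
# Once approved, the branch is merged back and the result is published.
git checkout -q "$base"
git merge -q --no-ff --no-edit fix-typo
git push -q origin "$base"
```

Using --no-ff keeps a merge commit in the history even when a fast-forward would be possible, so the merge stays visible as a separate commit in the log.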
Practice Quiz - RStudio
Q. Where can you type R commands in RStudio?
A. Console
Q. Which R library will you use for data visualizations such as histograms, bar charts, and scatterplots? Select all that apply.
A. ggplot, Plotly
Q. Which is the command used to install packages in R?
A. install.packages()
Practice Quiz - GitHub
Q. Which term describes the folders set up for version control?
A. Repository
Q. Which tab in your Repository enables a review of changes made before being merged into the main branch?
A. Pull requests
Q. Which command is used to clone an existing repository?
A. git clone