Scott's Blog

Learning keeps one from rigidity; knowledge keeps one from confusion.


Bash for Data Manipulation

Do complex things with just a few keystrokes!

Bash is a Unix shell and command language. It has survived and thrived for almost 50 years because it lets people do complex things with just a few keystrokes. Sometimes called "the universal glue of programming," it helps users combine existing programs in new ways, automate repetitive tasks, and run programs on clusters and clouds that may be halfway around the world.

Basic

How to move around in the shell, and how to create, modify, and delete files and folders.

  • pwd: print working directory
  • ls: list files and directories
  • cd: change directory
  • cp: copy
  • mv: move or rename
  • rm: remove
  • $1, $2, etc. receive ARGV in bash scripts; $@ and $* expand to the full ARGV list, and $# gives the ARGV length.
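A minimal sketch of those positional parameters in action (the script name args_demo.sh and its arguments are invented for the demo):

```shell
# Write a tiny script that echoes its positional parameters
cat > args_demo.sh <<'EOF'
#!/bin/bash
echo "first argument: $1"
echo "second argument: $2"
echo "all arguments: $@"
echo "argument count: $#"
EOF

bash args_demo.sh apple banana
```

Running it prints each parameter on its own line; $# here reports 2.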

Find files

by name

  • find ., list all files and folders below the current directory
  • find folder
  • find . -type d, directories only, no files
  • find . -type f, files only, no directories
  • find . -type f -name "test.txt", match the name test.txt exactly
  • find . -type f -name "text*", all files whose names start with "text"
  • find . -type f -iname "text*", the same, but case-insensitive
  • find . -type f -name "*.py"

by time

  • find . -type f -mmin -10, files modified within the last 10 minutes
  • find . -type f -mmin +10, files modified more than 10 minutes ago
  • find . -type f -mmin +1 -mmin -5, files modified between 1 and 5 minutes ago
  • find . -type f -mtime -20, files modified within the last 20 days

amin, atime: access time in minutes and days; cmin, ctime: change time; mmin, mtime: modification time.
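A self-contained check of -mmin, assuming GNU find (the demo folder and file name are invented):

```shell
# A file created just now is matched by -mmin -10
# (modified less than ten minutes ago)
mkdir -p demo && touch demo/fresh.txt
find demo -type f -mmin -10
```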

by size

  • find . -size +5M, files larger than 5 MB; k and G work too
  • ls -lah ./folders, info about subfolders and files, including size
  • find . -empty, empty files and folders

by permission

  • find . -perm 777, files readable, writable, and executable by everyone
  • find folder -exec chown coreschafer:www-data {} +
  • find folder returns all results; -exec runs the given command on those results, {} is the placeholder for each result, and + ends the command.
  • find folder -type f -exec chmod 664 {} +
  • find folder -perm 664
  • find . -type f -name "*.jpg"
  • find . -maxdepth 1 -type f -name "*.jpg", search only 1 level down
  • find . -maxdepth 1 -type f -name "*.jpg" -exec rm {} +, delete the matched files

Grep

Grep single file

Search for text

  • grep "text_you_want_search" filename.txt
  • grep -w "text_you_want_search" filename.txt, match whole words only
  • grep -wi "text_you_want_search" filename.txt, ignore upper/lower case
  • grep -win "text_you_want_search" filename.txt, also show line numbers
  • grep -win -B 4 "text_you_want_search" filename.txt, show 4 lines of context before each match
  • grep -win -A 4 "text_you_want_search" filename.txt, show 4 lines of context after each match
  • grep -win -C 4 "text_you_want_search" filename.txt, show 4 lines of context both before and after each match
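A minimal demonstration of the context flags (the ctx.txt file is invented for the demo):

```shell
# Build a five-line file, then grep with one line of context
printf 'one\ntwo\nTARGET\nfour\nfive\n' > ctx.txt
grep -B 1 "TARGET" ctx.txt   # prints: two, TARGET
grep -A 1 "TARGET" ctx.txt   # prints: TARGET, four
grep -C 1 "TARGET" ctx.txt   # prints: two, TARGET, four
```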

Grep multi file

  • grep -win "text_" ./*, all files
  • grep -win "text_" ./*.txt, only .txt files
  • grep -winr "text" ./ , search all subdirectories recursively
  • grep -wirl "text" ./ , list only the files that match, without the match details
  • grep -wirc "text" ./ , show the number of matches in each file

Grep command history

  • history | grep "git commit"
  • history | grep "git commit" | grep "dotfile"

Grep rgx

  • grep -P "--" file.txt, Perl-compatible regex; works out of the box on Linux, but macOS needs extra configuration (I have set mine up)

cURL

Requests

  • curl url
  • curl http://localhost:5000
  • curl http://www.wittyfans.com/json_file
  • curl -i http://www.wittyfans.com/json_file, include the response headers for detailed info
  • curl http://www.wittyfans.com/method
  • curl -d "first=name&last=lastname" http://www.wittyfans.com/method, -d for data; a POST request
  • curl -X PUT -d "first=name&last=lastname" http://www.wittyfans.com/method, a PUT request
  • curl -X DELETE http://www.wittyfans.com/method, a DELETE request

Verify

Could not verify your access? Authenticate with -u:

curl -u username:password http://wittyfans.com

Download

  • curl http://wittyfans.com/folder, dumps binary to the terminal; an error
  • curl -o filename.jpg http://wittyfans.com/folder, succeeds
  • curl -o file_name.json http://api.wittyfans.com, saving a large JSON file

rsync

Install

Available by default on macOS; on Linux it may need installing:

  • apt-get install rsync
  • yum install rsync

Use

  • rsync folder1/* backup/ , sync files into the backup folder; subdirectories and the files inside them are skipped
  • rsync -r folder1/* backup/ , recursive: include subdirectories and their files
  • rsync -r folder1 backup/ , sync the folder itself (it appears inside backup/), not just its contents

Check changes before running

  • rsync -a --dry-run folder1/* backup/ , preview the command without running it; no output is shown
  • rsync -av --dry-run folder1/* backup/ , -v adds verbose output so you can see what would change

Mirroring deletions to the backup folder

  • rsync -av --delete --dry-run original/ backup/ , --delete removes files from backup/ that no longer exist in original/; preview first and be careful!

Sync between local machine and remote host

  • rsync -zaP local_folder username@ip:~/public/ , -z compresses, -a is archive mode (preserves almost everything), -P shows progress and keeps partial transfers over the network
  • rsync -zaP username@ip:~/public/file ~/Downloads/ , the reverse direction: remote to local

Manipulating data

How to work with the data in those files

cat: view a file's contents; the name means concatenate

less & more: view contents piece by piece; more is superseded by less now. In less:

  • :n, Move to next file
  • :p, Go back to previous file
  • :q, quit

head: look at the start of a text file; head -3 displays only the first three lines

tail: look at the end of a text file; tail -n +7 displays content from line 7 to the end

ls: list everything below a directory; with ls -R -F, -R is recursive and -F prints a / after the name of every directory and a * after the name of every runnable program.

man: manual, automatically invokes less

cut -f 2-5,8 -d , values.csv: select columns 2 through 5 and column 8, using comma as the separator. -d sets the delimiter, -f the fields (columns) to select

!command: re-run the most recent use of that command matched

grep bicuspid seasonal/winter.csv: prints lines from winter.csv that contain "bicuspid"

cat two_cities.txt | egrep 'Sydney Carton|Charles Darnay' | wc -l : Count the number of lines in the book that contain either the character 'Sydney Carton' or 'Charles Darnay'.

-c: print a count of matching lines rather than the lines themselves
-h: do not print the names of files when searching multiple files
-i: ignore case (e.g., treat "Regression" and "regression" as matches)
-l: print the names of files that contain matches, not the matches
-n: print line numbers for matching lines
-v: invert the match, i.e., only show lines that don't match
egrep: grep -E
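A quick demonstration of -i, -c, and -v together (the models.txt file is invented for the demo):

```shell
printf 'Regression\nregression\nclassification\n' > models.txt
grep -ic "regression" models.txt   # case-insensitive count: 2
grep -vi "regression" models.txt   # invert match, ignoring case: classification
```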

Combining tools

How to use this power to select the data you want, and introduce commands for sorting values and removing duplicates.

head -n 5 seasonal/summer.csv > top.csv: take the first 5 rows of summer.csv and write them to top.csv

cut -d , -f 2 seasonal/summer.csv | grep -v Tooth: select all of the tooth names from column 2 of the comma-delimited file, dropping the header line "Tooth".

wc, word count; wc -l counts lines instead, e.g. grep 2017-07 seasonal/spring.csv | wc -l counts the records from July 2017.

head -n 3 seasonal/s*: show the first 3 rows of every file matching seasonal/s*.

  • *, matches any characters
  • ?, matches a single character
  • [...] matches any one of the characters inside the square brackets; 201[78].txt matches 2017.txt or 2018.txt, but not 2016.txt
  • {...} matches any of the comma-separated patterns inside the curly brackets, so {*.txt,*.csv} matches any file whose name ends with .txt or .csv, but not files whose names end with .pdf.
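The bracket pattern can be tried directly (the globdemo folder and file names are invented; brace expansion is a bash feature):

```shell
mkdir -p globdemo
touch globdemo/2016.txt globdemo/2017.txt globdemo/2018.txt
ls globdemo/201[78].txt   # matches 2017.txt and 2018.txt, not 2016.txt
# In bash, braces also work: ls globdemo/*.{txt,csv}
```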

sort, -n: sort numerically, -r: reverse, -b: ignore leading blanks, -f: be case-insensitive

uniq, remove adjacent duplicate lines.

wc -l seasonal/*.csv, print the line count of each file in the seasonal folder

wc -l seasonal/*.csv | grep -v 'total' | sort -n | head -n 1, drop the 'total' row, sort numerically, and keep the first row (the shortest file).
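The same pipeline can be reproduced on two toy files (the file names are invented for the demo):

```shell
# Two toy CSVs: a header row plus data rows
printf 'header\na\nb\n' > demo_a.csv
printf 'header\na\n'    > demo_b.csv
# Count lines, drop the total row, sort numerically, keep the shortest
wc -l demo_a.csv demo_b.csv | grep -v total | sort -n | head -n 1
```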

Batch processing

How to make your own pipelines do that. Along the way, you will see how the shell uses variables to store information.

set: show environment and shell variables

echo: print

  • $USER, user name
  • $OSTYPE, name of the kind of operating system you are using

training=seasonal/summer.csv then echo $training, define a variable and print it (no spaces around =).

For loop:

# Example 1
for filetype in gif jpg png; do echo $filetype; done
# Example 2
for filename in seasonal/*.csv; do echo $filename; done
# Example 3
datasets=seasonal/*.csv
for filename in $datasets; do echo $filename; done
# Example 4
for file in seasonal/*.csv; do head -n 2 $file | tail -n 1; done

Do not use spaces in file names; they cause issues in bash.
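If a space in a file name is unavoidable, quoting the loop variable keeps the name intact (a sketch; the spaced folder is invented):

```shell
mkdir -p spaced && touch "spaced/my file.txt"
for file in spaced/*.txt; do
    echo "found: $file"   # unquoted $file would split on the space
done
```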

Creating new tools

How to go one step further and create new commands of your own.

Edit file using nano:

nano filename.txt, edit a file

  • Ctrl + K: delete a line.
  • Ctrl + U: un-delete a line.
  • Ctrl + O: save the file ('O' stands for 'output'). You will also need to press Enter to confirm the filename!
  • Ctrl + X: exit the editor.

grep -h -v Tooth spring.csv summer.csv > temp.csv, -h stops grep printing filenames, -v keeps every row except those containing Tooth

history | tail -n 3, show the 3 most recent commands

$@, pass all command-line filenames into a script, e.g. tail -q -n +2 $@ | wc -l counts all data rows, skipping each file's header
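A sketch of that pattern as a script (the script and CSV names are invented for the demo):

```shell
# count_records.sh: count data rows across all files given,
# skipping each file's header line
cat > count_records.sh <<'EOF'
#!/bin/bash
tail -q -n +2 "$@" | wc -l
EOF

printf 'header\nrow1\nrow2\n' > rec_a.csv
printf 'header\nrow1\n'       > rec_b.csv
bash count_records.sh rec_a.csv rec_b.csv   # 3 data rows in total
```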

Downloading data

How to download data files from web servers via the command line.

curl, short for Client for URLs. man curl checks whether curl is installed.

  • curl -O url, save the file under its original name
  • curl -o newname.txt url, save under a new file name
  • Download all 100 data files: curl -O https://s3.amazonaws.com/datafile[001-100].txt

Wget, from World Wide Web and "get"; better than curl when downloading multiple files recursively. which wget checks whether wget is installed.

wget -c -b https://wittyfans.com/201812SpotifyData.zip

  • -c, resume broken download
  • -b, go to background
  • wget --wait=1 -i url_list.txt, enforce a 1-second pause between downloading each file in url_list.txt
# Use curl to download and rename a single file from a URL
curl -o Spotify201812.zip -L https://assets.datacamp.com/production/repositories/4180/datasets/eb1d6a36fa3039e4e00064797e1a1600d267b135/201812SpotifyData.zip

# Unzip, delete, then re-name to Spotify201812.csv
unzip Spotify201812.zip && rm Spotify201812.zip
mv 201812SpotifyData.csv Spotify201812.csv

# View url_list.txt to verify content
cat url_list.txt

# Use Wget, limit the download rate to 2500 KB/s, download all files in url_list.txt
wget --limit-rate=2500k -i url_list.txt

# Take a look at all files downloaded
ls

CSV Kit

Using csvkit to convert, preview, filter and manipulate files to prepare our data for further analyses.

pip install csvkit to install; see the docs

in2csv

  • in2csv -h, help for converting files to CSV.
  • in2csv SpotifyData.xlsx > SpotifyData.csv
  • in2csv SpotifyData.xlsx --sheet "Worksheet1_Popularity" > Spotify_Popularity.csv, convert only one sheet

csvlook

  • csvlook -h, help for previewing data on the command line.
  • csvlook SpotifyData.csv

csvsort

  • csvsort -c 2 Spotify_Popularity.csv | csvlook

csvstat

  • csvstat Spotify_Popularity.csv, summary statistics

csvcut

  • csvcut -n Spotify_MusicAttributes.csv, print a list of column headers in the data file
  • csvcut -c 1,3,5 Spotify_MusicAttributes.csv, print columns 1, 3, and 5, by position
  • csvcut -c "track_id","duration_ms","loudness" Spotify_MusicAttributes.csv, print the track id, song duration, and loudness columns, by name

csvgrep

  • csvgrep -c "danceability" -m 0.812 Spotify_MusicAttributes.csv, filter rows where danceability equals 0.812; the column name must be quoted.

csvstack

  • csvstack, merge files.
  • csvstack Spotify_Rank6.csv Spotify_Rank7.csv > Spotify.csv, merge two files into one
  • csvstack -g "Rank6","Rank7" Spotify_Rank6.csv Spotify_Rank7.csv > Spotify_Al, merge two files into one and add a source column.

chain commands

  • ;, links commands together and runs them sequentially
  • &&, links commands together, but only runs the 2nd command if the 1st succeeds
  • >, redirects the 1st command's output to a file
  • |, uses the output of the 1st command as input to the 2nd

sql2csv

  • sql2csv -v or sql2csv --verbose, print more tracebacks and logs
# Pull the entire Spotify_Popularity table and print in log
sql2csv --db "sqlite:///SpotifyDatabase.db" \
--query "SELECT * FROM Spotify_Popularity"

csvsql

Manipulating data using SQL syntax (Small to medium files only) :

# Reformat the output using csvlook
csvsql --query "SELECT * FROM Spotify_MusicAttributes ORDER BY duration_ms LIMIT 1" \
Spotify_MusicAttributes.csv | csvlook

using bash variable:

# Store SQL query as shell variable
sqlquery="SELECT * FROM Spotify_MusicAttributes ORDER BY duration_ms LIMIT 1"

# Apply SQL query to Spotify_MusicAttributes.csv
csvsql --query "$sqlquery" Spotify_MusicAttributes.csv

join two file:

# Store SQL query as shell variable
sql_query="SELECT ma.*, p.popularity FROM Spotify_MusicAttributes ma INNER JOIN Spotify_Popularity p ON ma.track_id = p.track_id"

# Join 2 local csvs into a new csv using the saved SQL
csvsql --query "$sql_query" Spotify_MusicAttributes.csv Spotify_Popularity.csv > Spotify_FullData.csv

# Preview newly created file
csvstat Spotify_FullData.csv

Pushing data back to database:

# Store SQL for querying from SQLite database
sqlquery_pull="SELECT * FROM SpotifyMostRecentData"

# Apply SQL to save table as local file
sql2csv --db "sqlite:///SpotifyDatabase.db" --query "$sqlquery_pull" > SpotifyMostRecentData.csv

# Store SQL for UNION of the two local CSV files
sqlquery_union="SELECT * FROM SpotifyMostRecentData UNION ALL SELECT * FROM Spotify201812"

# Apply SQL to union the two local CSV files and save as local file
csvsql --query "$sqlquery_union" SpotifyMostRecentData.csv Spotify201812.csv > UnionedSpotifyData.csv

# Push UnionedSpotifyData.csv to database as a new table
csvsql --db "sqlite:///SpotifyDatabase.db" --insert UnionedSpotifyData.csv

Bash Script

Stream editor

sed: stream editor.

cat soccer_scores.csv | sed 's/Cherno/Cherno City/g' > soccer_scores_edited.csv : replace the word Cherno with Cherno City, then save to a new file; for more, check this.

Argument

$1, $2, etc. receive ARGV in bash scripts; $@ and $* expand to the ARGV list, and $# gives its length. cat hire_data/*.csv | grep "$1" > "$1".csv: take a city as an argument, filter all the files by that city, and output to a new CSV named after the city.

Quotes

Single, double, and backticks.

  • Single quotes ('sometext') = the shell interprets what is between them literally
  • Double quotes ("sometext") = interpreted literally except for $ expansions and backticks
  • Backticks (`sometext`) = the shell runs the command and captures its STDOUT into a variable
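The three quoting styles side by side:

```shell
name="world"
echo 'Hello $name'   # single quotes: prints Hello $name literally
echo "Hello $name"   # double quotes: prints Hello world
year=`date +%Y`      # backticks run date and capture its output
echo "This year is $year"
```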

Numeric variables

In bash, typing 1 + 5 gives an error; instead, you type expr 1 + 5. expr is a utility program just like cat and grep, but expr cannot natively handle decimal places: expr 1 + 2.5 gives a "not a decimal number" error.

Enter bc (basic calculator), a useful command-line program. Using bc without opening the interactive calculator:

echo "5+7.5" | bc . bc has a scale variable for how many decimal places to keep: echo "scale=3; 10/3" | bc ; the ; separates statements within the bc expression.
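Putting expr and bc side by side (assumes bc is installed):

```shell
echo "5 + 7.5" | bc          # 12.5
echo "scale=3; 10/3" | bc    # 3.333
expr 1 + 5                   # 6 -- integers only
```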

Array

Normal array

Create array: capital_cities=("Sydney" "New York" "Paris")

Add element:

# Create a normal array with the mentioned elements using the declare method
declare -a capital_cities

# Add (append) the elements
capital_cities+=("Sydney")
capital_cities+=("New York")
capital_cities+=("Paris")

Get all element and length of array:

# The array has been created for you
capital_cities=("Sydney" "New York" "Paris")

# Print out the entire array
echo ${capital_cities[@]}

# Print out the array length
echo ${#capital_cities[@]}

Associative arrays

Like dictionary in python.

Create:

# Create empty associative array
declare -A model_metrics

# Add the key-value pairs
model_metrics[model_accuracy]=98
model_metrics[model_name]="knn"
model_metrics[model_f1]=0.82

Create in one line:

# An associative array has been created for you
declare -A model_metrics=([model_accuracy]=98 [model_name]="knn" [model_f1]=0.82)

# Print out just the keys
echo ${!model_metrics[@]}

Example:

# Create variables from the temperature data files
temp_b="$(cat temps/region_B)"
temp_c="$(cat temps/region_C)"

# Create an array with these variables as elements
region_temps=($temp_b $temp_c)

# Call an external program to get average temperature
average_temp=$(echo "scale=2; (${region_temps[0]} + ${region_temps[1]}) / 2" | bc)

# Append to array
region_temps+=($average_temp)

# Print out the whole array
echo ${region_temps[@]}

Control Statements

IF statements

if [ condition1 ] && [ condition2 ]; then
    # some code
else
    # some code
fi

Move files based on content:

# Extract Accuracy from first ARGV element
accuracy=$(grep Accuracy $1 | sed 's/.* //')

# Conditionally move into good_models folder
if [ $accuracy -ge 90 ]; then
    mv $1 ./good_models
fi

# Conditionally move into bad_models folder
if [ $accuracy -lt 90 ]; then
    mv $1 ./bad_models
fi
# Normal flags
-eq =
-ne !=
-lt <
-le <=
-gt >
-ge >=

# File related flags
-e if the file exists
-s if the file exists and has size greater than zero
-r if the file exists and is readable
-w if the file exists and is writable

## And and OR
&& for and
|| for or
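A quick sketch combining the numeric and file flags above (the file and variable names are invented):

```shell
touch results.txt
if [ -e results.txt ] && [ -r results.txt ]; then
    echo "results.txt exists and is readable"
fi

score=91
if [ "$score" -ge 90 ] || [ "$score" -eq 0 ]; then
    echo "high score (or unscored)"
fi
```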

For loops

# Use a FOR loop on files in directory
for file in inherited_folder/*.R
do
    # Echo out each file
    echo $file
done

# Move python files to to_keep/ if they use RandomForestClassifier
for file in robs_files/*.py
do
    # Create IF statement using grep
    if grep -q 'RandomForestClassifier' $file ; then
        # Move wanted files to to_keep/ folder
        mv $file to_keep/
    fi
done

Case

# Create a CASE statement matching the first ARGV element
case $1 in
    # Match on all weekdays
    Monday|Tuesday|Wednesday|Thursday|Friday)
        echo "It is a Weekday!";;
    # Match on all weekend days
    Saturday|Sunday)
        echo "It is a Weekend!";;
    # Create a default
    *)
        echo "Not a day!";;
esac

Function

function function_name {
    # function_code
    return # something
}

function print_hello () {
    echo "Hello world!"
}

print_hello

Scope: all variables in Bash are global by default! Use local var to restrict a variable's scope.

To get data out from function:

  1. Assign to a global variable
  2. echo what we want back in the last line and capture it using a shell-within-a-shell
function convert {
    echo $(echo $1)
}
converted=$(convert 30)
# Create a function with a local base variable
function sum_array () {
    local sum=0
    # Loop through, adding to base variable
    for number in "$@"
    do
        sum=$(echo "$sum + $number" | bc)
    done
    # Echo back the result
    echo $sum
}

# Call function with array
test_array=(14 12 23.5 16 19.34)
total=$(sum_array "${test_array[@]}")
echo "The total sum of the test array is $total"

Python script on bash

Scheduling a job with crontab

# Preview both Python script and requirements text file
cat create_model.py
cat requirements.txt

# Pip install Python dependencies in requirements file
pip install -r requirements.txt

# Run Python script on command line
python create_model.py

# Add CRON job that runs create_model.py every minute
echo "* * * * * python create_model.py" | crontab

# Verify that the CRON job has been scheduled via CRONTAB
crontab -l

Cron

  • crontab -l, list the scheduled cron jobs

Set your editor to nano (the default is vim):

  • export EDITOR=/usr/bin/nano
  • crontab -e, open the crontab in your editor
  • in vim, press i to start typing

The syntax for scheduling your task by time:

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 6) (Sunday to Saturday;
# │ │ │ │ │ 7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
# * * * * * command_to_execute
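A few illustrative entries under that syntax (the scripts named here are hypothetical; step values like */6 are supported by most cron implementations):

```
30 2 * * *    python backup.py    # every day at 02:30
0 */6 * * *   bash sync.sh        # every six hours, on the hour
15 9 * * 1-5  python report.py    # weekdays at 09:15
```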