CBIO (CSCI) 4835/6835: Introduction to Computational Biology
In this lecture, we'll eschew all things Python and Biology, and focus entirely on the step before either of these: becoming familiar with the command line (or command prompt). By the end of this lecture, you should be able to:
If you've never used a command-line before... Don't be intimidated!
Other command prompts include
csh (some would say the original: the "C-shell")
bash ("bourne-again" shell; tends to be default on most Linux and macOS systems)
ksh (Korn shell)
zsh (Z shell)
If you're on a Windows machine, you can either:
I have a macOS laptop, an Ubuntu workstation, a bunch of RedHat servers, and a Windows 10 home desktop.
I'm most at home with either macOS or Ubuntu.
It's like learning another language: you'll only get better at it if you immerse yourself in it, even when you don't want to.
You've fired up the command prompt (or Terminal in macOS). How do you see what's in the current folder?
Last login: Mon Jan 9 18:36:07 on ttys006
example1:~ squinn$ ls
Applications    Dropbox         Music           SpiderOak Hive
Desktop         Google Drive    Pictures        metastore_db
Documents       Library         Programming     nltk_data
Downloads       Movies          Public          rodeo.log
example1:~ squinn$
ls
Allows you to view the contents of the current directory--folders and files.
But how do we tell the difference between the two? Use the optional -l flag.
example1:~ squinn$ ls -l
total 264
drwx------   7 squinn staff    238 Oct 23  2015 Applications
drwx------+ 59 squinn staff   2006 Jan  9 17:49 Desktop
drwx------+ 20 squinn staff    680 Dec 23 09:35 Documents
drwx------+  5 squinn staff    170 Jan  9 18:27 Downloads
drwx------@ 17 squinn staff    578 Jan  8 18:03 Dropbox
drwx------@ 49 squinn staff   1666 Jan  4 15:47 Google Drive
drwx------+ 74 squinn staff   2516 Nov 17 15:06 Library
drwx------+  6 squinn staff    204 May 20  2015 Movies
drwx------+  5 squinn staff    170 Oct 22  2014 Music
drwx------+ 18 squinn staff    612 Jul 29 11:31 Pictures
drwxr-xr-x  37 squinn staff   1258 Jan  4 15:57 Programming
drwxr-xr-x+  5 squinn staff    170 Oct 21  2014 Public
drwx------@  8 squinn staff    272 Jun 30  2015 SpiderOak Hive
drwxr-xr-x   9 squinn staff    306 Sep 17  2015 metastore_db
drwxr-xr-x   4 squinn staff    136 Apr 27  2016 nltk_data
-rw-r--r--   1 squinn staff 131269 Jan  9 18:32 rodeo.log
example1:~ squinn$
Anything that starts with a d on the left is a folder (or directory); otherwise it's a file.
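As a minimal sketch of the same check, the snippet below builds a throwaway folder and reads the first character of each ls -l line. The names notes.txt and data are made up for illustration, and mktemp -d is assumed to be available (it is on typical Linux and macOS systems).

```shell
# Work in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
touch "$scratch/notes.txt"     # hypothetical file
mkdir "$scratch/data"          # hypothetical folder

# -l prints one entry per line; the first character is the type.
ls -l "$scratch"

# Pull out just that first character for each entry.
# -d lists the directory itself rather than its contents.
file_type=$(ls -ld "$scratch/notes.txt" | cut -c 1)
dir_type=$(ls -ld "$scratch/data" | cut -c 1)
echo "notes.txt starts with: $file_type"
echo "data starts with: $dir_type"

rm -r "$scratch"
```

The file line begins with a dash and the folder line begins with d, matching the listing above.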
Ok, that's cool. I can tell what is what where I currently am. ...but wait, how do I even know where I am?
example1:~ squinn$ pwd
/home/squinn
example1:~ squinn$
pwd
Pretty straightforward--stands for Print Working Directory. Gives you the full path to where you are currently working. It doesn't really need any optional flags.
Great! Now I know where I am, and what is what where I am. How do I move somewhere else?
example1:~ squinn$ cd Music/
example1:Music squinn$ ls
iTunes
example1:Music squinn$
You'll notice the output of the ls command has now changed, which hopefully isn't surprising.
Since we've Changed Directories with the cd command--you essentially double-clicked the "Music" folder--now we're in a different folder with different contents; in this case, a lone "iTunes" folder.
Folders within folders represent a recursive hierarchy. We won't delve too much into this concept, except to say that, unless you're in the root directory (/ on Linux and macOS, C:\ on Windows), there is always a parent directory--the enclosing folder around the folder you are currently in.
Therefore, while you can always change to a very specific directory by supplying the full path--
example1:~ squinn$ cd /home/squinn/Dropbox
example1:Dropbox squinn$ ls
Cilia_Papers    Imaging_Papers    OdorAnalysis  Public
Computer Case   LandUseChange     OrNet         cilia movies
Icon?           NSF_BigData_2015  OrNet Videos
example1:Dropbox squinn$
--I can also navigate to the parent folder of my current location, irrespective of my specific location, using the special .. notation.
cd ..
Takes you up one level to the parent directory of where you currently are.
example1:Dropbox squinn$ pwd
/home/squinn/Dropbox
example1:Dropbox squinn$ cd ..
example1:~ squinn$ pwd
/home/squinn
example1:~ squinn$
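The same round trip can be sketched in a scratch folder, so it works on any machine. The folder names project and raw are hypothetical, and mktemp -d is assumed to be available.

```shell
# Build a small hypothetical tree: <scratch>/project/raw
scratch=$(mktemp -d)
mkdir -p "$scratch/project/raw"

start_dir=$(pwd)                  # remember where we began
cd "$scratch/project/raw"         # jump in using a full path
before=$(basename "$(pwd)")       # last component of the current path
cd ..                             # up one level, to the parent
after=$(basename "$(pwd)")
echo "was in: $before, now in: $after"

cd "$start_dir"                   # go back, then clean up
rm -r "$scratch"
```

Because .. is relative to wherever you are, the same cd .. works no matter how deep the starting folder is.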
Let's see some other examples!
example1: squinn$ ls
Lecture1.ipynb
example1: squinn$ ls -l
total 40
-rw-r--r-- 1 squinn staff 18620 Jan 5 19:54 Lecture1.ipynb
example1: squinn$ pwd
/home/squinn/teaching/4835/lectures
example1: squinn$ cd ..
example1: squinn$ pwd
What prints out?
~/
/home/squinn
/home/squinn/teaching
/home/squinn/teaching/4835
$ ls -l
total 8
-rw-rw-r-- 1 squinn staff   19 Sep 3 09:08 hello.txt
drwxrwxr-x 2 squinn staff 4096 Sep 3 09:08 lecture
$ ls *.txt
What prints out?
hello.txt
*.txt
hello.txt lecture
du - disk usage of files/directories
[squinn tmp]$ du -s
146564 .
[squinn tmp]$ du -sh
144M .
[squinn tmp]$ du -sh intro
4.0K intro
df - usage of full disk
[squinn tmp]$ df -h .
Filesystem    Size  Used Avail Use% Mounted on
pulsar:/home   37T   28T  9.3T  75% /net/pulsar/home
locate - find a file system-wide
find - search a directory tree
which - print the location of a command
man - print the manual page of a command
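As a rough sketch of the find entry in this list, the snippet below searches a scratch directory tree by file name. The file and folder names are invented for the example; -name with a quoted pattern is a standard find option.

```shell
# Hypothetical tree with one .pdb file buried two levels down.
scratch=$(mktemp -d)
mkdir -p "$scratch/a/b"
touch "$scratch/a/b/target.pdb" "$scratch/a/other.txt"

# find walks everything below its first argument.
# The pattern is quoted so find, not the shell, interprets the *.
found=$(find "$scratch" -name '*.pdb')
echo "find located: $found"

rm -r "$scratch"
```

Unlike locate, which consults a prebuilt index of the whole system, find inspects the directory tree at the moment you run it.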
NAME=value - set NAME equal to value (no spaces around the equals sign)
export NAME=value - set NAME equal to value and make it visible to programs started from this shell
$ - dereference a variable (e.g. $NAME)
$ X=3
$ echo $X
3
$ X=hello
$ echo $X
hello
$ echo X
X
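The difference between a plain assignment and export can be sketched as follows. The variable names CBIO_LOCAL and CBIO_SHARED are hypothetical, and printenv (a standard utility that prints one environment variable) stands in for any program started from the shell.

```shell
# A plain assignment is visible only inside the current shell.
CBIO_LOCAL=hello
# export also hands the variable to programs started from this shell.
export CBIO_SHARED=world

# printenv runs as a separate program, so it only sees exported variables.
# It prints nothing and fails if the variable is missing, hence the fallback.
child_local=$(printenv CBIO_LOCAL || echo unset)
child_shared=$(printenv CBIO_SHARED)
echo "a child program sees CBIO_LOCAL as: $child_local"
echo "a child program sees CBIO_SHARED as: $child_shared"
```

This is why settings such as PATH are usually exported: the programs you launch need to see them too.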
Which does not print the value of X?
echo $X
echo ${X}
echo '$X'
echo "$X"
env - list all set environment variables
PATH - where the shell searches for commands
LD_LIBRARY_PATH - library search path
PYTHONPATH - where python searches for modules
.bashrc - initialization file for bash; set PATH etc. here
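To illustrate how PATH is used, here is a small sketch that adds a scratch directory to PATH and then runs a script from it by name alone. The script name greet is made up, and the example assumes the temporary directory allows executables.

```shell
scratch=$(mktemp -d)

# A tiny hypothetical program named "greet".
printf '#!/bin/sh\necho hi from greet\n' > "$scratch/greet"
chmod +x "$scratch/greet"

# Prepend the directory to PATH, keeping the existing entries.
# Putting a line like this in .bashrc would apply it to every new shell.
export PATH="$scratch:$PATH"

# The shell now finds greet by searching PATH; no full path is needed.
greet_output=$(greet)
echo "$greet_output"

rm -r "$scratch"
```

Directories are searched in order from left to right, so prepending gives the new directory priority over the existing ones.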
history - show commands previously issued
up arrow - cycle through previous commands
Ctrl-R - search through history for a command (AWESOME)
.bash_history - file that stores the history
HISTCONTROL - environment variable that sets history options, e.g. ignoredups
HISTSIZE - size of the history buffer
alias - make a nickname for a command
$ alias l='ls -l'
$ alias
$ l
The first word you type is the program you want to run. bash will search PATH for an appropriately named executable and run it with the specified arguments.
ls - list files
cd - change directory
pwd - print working (current) directory
.. - special file that refers to parent directory
. - the current directory
cat file - print out contents of file
more file - print contents of file with pagination
> send standard output to file
$ echo Hello > h.txt
>> append to file
$ echo World >> h.txt
< send file to standard input of command
2> send standard error to file
>& send output and error to file
$ echo Hello > h.txt
$ echo World >> h.txt
$ cat h.txt
What prints out?
$ echo Hello > h.txt
$ echo World > h.txt
$ cat h.txt
What prints out?
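The two redirection operators not exercised so far, 2> and <, can be sketched like this. The file names out.txt and err.txt are hypothetical, and echo ... >&2 is used only as a convenient way to produce something on standard error.

```shell
scratch=$(mktemp -d)

# The braces group two commands: one writes to standard output,
# the other (via >&2) writes to standard error.
# > captures the first stream, 2> captures the second.
{ echo "result"; echo "warning" >&2; } > "$scratch/out.txt" 2> "$scratch/err.txt"

out_text=$(cat "$scratch/out.txt")
err_text=$(cat "$scratch/err.txt")
echo "out.txt holds: $out_text"
echo "err.txt holds: $err_text"

# < feeds a file to a command's standard input.
# tr strips the padding some versions of wc add around the number.
line_count=$(wc -l < "$scratch/out.txt" | tr -d ' ')
echo "out.txt has $line_count line(s)"

rm -r "$scratch"
```

Keeping the two streams in separate files is handy when a long-running program mixes real results with warnings.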
A pipe (|) redirects the standard output of one program to the standard input of another. It's like you typed the output of the first program into the second. This allows us to chain several simple programs together to do something more complicated.
$ echo Hello World | wc
cat - dump file to stdout
more - paginated output
head - show first 10 lines
tail - show last 10 lines
wc - count lines/words/characters
sort - sort file by line and print out (-n for numerical sort)
uniq - remove adjacent duplicates (-c to count occurrences)
cut - extract fixed-width columns from file
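A short sketch of chaining a few of these tools with pipes, using made-up numbers, shows why the -n flag to sort matters.

```shell
# Hypothetical measurements, one per line.
values=$(printf '10\n9\n100\n25\n')

# Plain sort compares text character by character,
# so "10" and "100" come before "25" and "9".
text_first=$(echo "$values" | sort | head -n 1)

# -n compares the lines as numbers instead.
num_first=$(echo "$values" | sort -n | head -n 1)
num_last=$(echo "$values" | sort -n | tail -n 1)

echo "first by text order:   $text_first"
echo "smallest numerically:  $num_first"
echo "largest numerically:   $num_last"
```

Each command does one small job; the pipe is what turns them into "find the largest value".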
$ cat text
a
b
a
b
b
$ cat text | uniq | wc
What is the first number to print out?
$ cat text
a
b
a
b
b
$ cat text | sort | uniq | wc
What is the first number to print out?
grep - search contents of file for expression
sed - stream editor; perform substitutions
awk - pattern scanning and processing, great for dealing with data in columns
Search file contents for a pattern.
grep pattern file(s)
$ grep a text | wc
What is the first number to print out?
Search and replace
sed 's/pattern/replacement/' file
$ sed 's/a/b/' text | uniq | wc
What is the first number to print out?
Pattern scanning and processing language. We'll mostly use it to extract columns/fields. It processes a file line-by-line and, if a condition holds, runs a simple program on the line.
awk 'optional condition {awk program}' file
$ cat names
id last,first
1 Smith,Alice
2 Jones,Bob
3 Smith,Charlie
Try these:
$ awk '{print $1}' names
$ awk -F, '{print $2}' names
$ awk 'NR > 1 {print $2}' names
$ awk '$1 > 1 {print $0}' names
$ awk 'NR > 1 {print $2}' names | awk -F, '{print $1}' | sort | uniq -c
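The condition-plus-program pattern can also be sketched on comma-separated data. The file expr.csv and its gene names and values are invented for illustration; they are not taken from any real dataset.

```shell
scratch=$(mktemp -d)

# A tiny hypothetical table: a header row, then one gene per line.
cat > "$scratch/expr.csv" <<'EOF'
gene,t1,t2
YAL001C,0.5,1.5
YBR002W,-0.2,0.1
YAL003W,1.0,0.4
EOF

# -F, splits each line on commas.
# NR > 1 skips the header; $3 > $2 keeps rows where t2 exceeds t1.
rising=$(awk -F, 'NR > 1 && $3 > $2 {print $1}' "$scratch/expr.csv")
echo "genes that went up:"
echo "$rising"

rm -r "$scratch"
```

The part before the braces decides which lines are processed, and the part inside the braces decides what is printed for each of them.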
mkdir intro
cd intro
wget https://eds-uga.github.io/cbio4835-sp17/files/Spellman.csv
wget https://eds-uga.github.io/cbio4835-sp17/files/1shs.pdb
wc Spellman.csv (gives number of lines; because of the header this is off by one)
grep YA Spellman.csv | wc
grep ^YA Spellman.csv | wc (this is a bit better, ^ matches beginning of line)
grep ^YA -c Spellman.csv (grep can provide the count itself)
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-2 | sort | uniq -c
awk -F, 'NR > 1 {print $1}' Spellman.csv | cut -b 1-3 | sort | uniq -c
awk -F, 'NR > 1 && $2 > 0 {print $0}' Spellman.csv | wc
awk -F, 'NR > 1 {print $1,$2}' Spellman.csv | sort -k2,2 -n | tail
awk -F, 'NR > 1 {print $1,$2}' Spellman.csv | sort -k2,2 -n -r | tail
awk -F, 'NR > 1 && $3 > $2 && $4 > $3 {print $0}' Spellman.csv | wc
awk -F, 'NR > 1 && $3 > $2 && $4 > $3 {print $4-$2,$0}' Spellman.csv | sort -n -k1,1
grep ^ATOM 1shs.pdb > newpdb.pdb (^ matches beginning of line)
grep ^ATOM 1shs.pdb | awk '$5 == "A" {print $0}'
#this is UNSAFE with pdb files since there is no guarantee that fields
#will be whitespace separated, safer is:
grep ^ATOM 1shs.pdb | awk ' substr($0,22,1) == "A" {print $0}' > newpdb.pdb
grep ^ATOM 1shs.pdb | awk ' substr($0,22,1) == "A" {print $0}' | cut -b 78- | sort | uniq -c
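To see why the field-based test is unsafe, the sketch below builds a single synthetic ATOM record (not taken from 1shs.pdb) using the standard PDB column widths, with a four-digit residue number so that it runs straight into the chain ID in column 22.

```shell
# One synthetic ATOM record laid out in fixed PDB columns:
# chain ID in column 22, residue number right-justified in columns 23-26.
# Residue number 1000 fills all four columns, so no space separates it from "A".
line=$(printf 'ATOM  %5s %-4s%1s%3s %1s%4s%1s   %8s%8s%8s%6s%6s          %2s' \
  1 ' CA ' '' ALA A 1000 '' 1.000 2.000 3.000 1.00 20.00 C)
echo "$line"

# Whitespace splitting sees the chain and residue number as one field.
by_field=$(echo "$line" | awk '{print $5}')
# Reading column 22 directly still recovers the chain ID.
by_column=$(echo "$line" | awk '{print substr($0, 22, 1)}')

echo "field 5:   $by_field"
echo "column 22: $by_column"
```

On a record like this, the $5 == "A" test would silently drop the atom, while the substr test keeps it.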
Did everyone finish the pre-test? It was due today before lecture. https://docs.google.com/forms/d/1ka9yH5G3bOCfdJUTaeZXV2BdtvqqsiPaxnvKI2f4YK4/
Office hours: Tuesdays (today!) at 11:00 - 12:30. Boyd GSRC 638A.