Filter file2 by column 1 of file1: keep the lines of file2 whose first column does not appear in the first column of file1.

awk 'FNR==NR{a[$1];next};!($1 in a)' file1 file2 > file3


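FNR == NR: FNR is the record (line) number within the current file, and NR is the total number of records read so far across all files. The two are equal only while awk reads the first file; for the second file NR equals the number of lines of file1 plus FNR. (If file1 is empty, the test is also true for file2, so file1 must not be empty.)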

a[$1]: create an array element indexed by the first field of file1 (just referencing a[$1] creates the key).

next: skip to the next record, so the rest of the program is never applied to the lines of file1.

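!($1 in a): true when the first field ($1) of the current file2 line is not an index of a, i.e. it does not appear in column 1 of file1. A pattern without an action prints the whole line, which is redirected to file3 here.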
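A small self-contained run of the idiom, using two made-up files in a temporary directory (the file names and contents are illustrative only). It captures the output of the filter and, for comparison, of the same program without the !:

```bash
# Two tiny made-up inputs: file1 holds the keys, file2 the lines to filter.
tmp=$(mktemp -d)
printf 'a\nc\n' > "$tmp/file1"
printf 'a 1\nb 2\nc 3\nd 4\n' > "$tmp/file2"

# Lines of file2 whose first field is NOT in column 1 of file1.
absent=$(awk 'FNR==NR{a[$1];next};!($1 in a)' "$tmp/file1" "$tmp/file2")

# Lines of file2 whose first field IS in column 1 of file1.
present=$(awk 'FNR==NR{a[$1];next};($1 in a)' "$tmp/file1" "$tmp/file2")

printf 'absent:\n%s\npresent:\n%s\n' "$absent" "$present"
rm -rf "$tmp"
```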

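Summarize alignment tags from a SAM file: take the value (third ':'-separated part) of the tags in columns 13, 16, 17 and 18, and print edit distance, mismatches + gap length, mismatches, gaps and gap length, tab-separated.

awk -v OFS='\t' '{split($13, editDist, ":"); split($16, mismatch, ":"); split($17, gap, ":"); split($18, gapLen, ":"); print editDist[3], mismatch[3]+gapLen[3], mismatch[3], gap[3], gapLen[3]}' n3yesn2no.sam > n3yesn2no_summary.sam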
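To check which tags land in columns 13, 16, 17 and 18, the sketch below runs the same extraction on one made-up record. The tag layout (XT, NM, X0, X1, XM, XO, XG, MD after the 11 mandatory SAM fields) is an assumption, modelled on what BWA aln/samse output commonly looks like; other aligners order their tags differently, so check your own file before relying on fixed column numbers.

```bash
# One made-up SAM record (tab-separated). Assumed tag layout in columns 12-19:
# XT, NM, X0, X1, XM, XO, XG, MD, so $13=NM, $16=XM, $17=XO, $18=XG here.
rec=$(printf 'r1\t0\tchr1\t100\t37\t4M2D6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tXT:A:U\tNM:i:3\tX0:i:1\tX1:i:0\tXM:i:1\tXO:i:1\tXG:i:2\tMD:Z:4^AC1A4')

# Each tag is TAG:TYPE:VALUE, so element [3] of the split is the value.
summary=$(printf '%s\n' "$rec" | awk -v OFS='\t' '{
  split($13, editDist, ":"); split($16, mismatch, ":")
  split($17, gap, ":");      split($18, gapLen, ":")
  print editDist[3], mismatch[3] + gapLen[3], mismatch[3], gap[3], gapLen[3]
}')
printf '%s\n' "$summary"
```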

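Keep only the lines of file2 whose first column does appear in column 1 of file1 (the same idiom without the !):

awk 'FNR==NR{a[$1];next};($1 in a)' file1 file2 > file3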

Group by the first and second columns and sum the third column.

awk '{a[$1" "$2]+=$3}END{for (i in a) print i,a[i]}' aaa.txt | sort
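A quick check on made-up input. The for (i in a) loop visits keys in no fixed order, which is why the output is piped through sort:

```bash
# Made-up input: chromosome, feature, count.
input=$(printf 'chr1 a 2\nchr1 a 3\nchr2 b 5\nchr1 b 1\n')

# a["chr1 a"] accumulates 2+3; the key is the two columns joined by a space.
sums=$(printf '%s\n' "$input" | awk '{a[$1" "$2]+=$3} END{for (i in a) print i, a[i]}' | sort)
printf '%s\n' "$sums"
```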

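Insert a column after a given column.

e.g. insert a column of 0 after the second column. Assigning to $2 makes awk rebuild the line with OFS, and the trailing 1 is an always-true pattern that prints every line:

awk '{$2 = $2 OFS "0"} 1' file > outfile

For a tab-separated file, set both separators: awk 'BEGIN{FS=OFS="\t"} {$2 = $2 OFS "0"} 1' file > outfile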
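A sketch of the same insertion written as an action followed by the always-true pattern 1, using OFS as the separator, on one made-up line in both space- and tab-separated form:

```bash
# Space-separated input: assigning to $2 rebuilds the line with OFS (" ").
spaced=$(printf 'a b c\n' | awk '{$2 = $2 OFS "0"} 1')

# Tab-separated input: FS and OFS are both a tab, so the new field is tab-delimited too.
tabbed=$(printf 'a\tb\tc\n' | awk 'BEGIN{FS=OFS="\t"} {$2 = $2 OFS "0"} 1')

printf '%s\n%s\n' "$spaced" "$tabbed"
```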

Sort by chromosome, then numerically by position (the V, version-sort, key puts chr2 before chr10).

e.g. chr1 1005; chr1 105 -> chr1 105; chr1 1005

sort -k1,1V -k2,2n infile >outfile

sort -g: general numeric sort, which understands scientific notation such as 1e-5 (-n does not).
sort -r: reverse (decreasing) order.
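A small demonstration on made-up values; -V and -g are GNU sort options, so this assumes GNU coreutils:

```bash
# Version sort on column 1 (chr2 before chr10), numeric sort on column 2 within a chromosome.
byPos=$(printf 'chr10 5\nchr1 1005\nchr2 50\nchr1 105\n' | sort -k1,1V -k2,2n)

# -g reads 2e-5 as 0.00002; -n would stop at the "e" and read it as 2.
byG=$(printf '1e-3\n0.5\n2e-5\n' | sort -g)

printf '%s\n---\n%s\n' "$byPos" "$byG"
```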

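Strip the file path from saved wc -l output, keeping only the counts:

sed 's/^ *\([0-9]*\).*/\1/' input > outfile

(The ^ * also drops the leading spaces that some wc output has before the count; awk '{print $1}' input > outfile does the same job.)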

Input:
284 /p/keles/ENCODE-TE/volume13/SRR881996_Large/defaultDir/fithicDir/chr1/
281 /p/keles/ENCODE-TE/volume13/SRR881996_Large/defaultDir/fithicDir/chr2/

Output:
284
281
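A sketch of three ways to get bare counts, on a throwaway file and made-up "count path" lines (dirA/ and dirB/ are placeholders). Redirecting the file into wc means it never has a file name to print:

```bash
tmp=$(mktemp -d)
printf 'x\ny\nz\n' > "$tmp/a.txt"

# 1) Read from stdin: wc prints only the count.
n=$(wc -l < "$tmp/a.txt")

# 2) For already-saved "count path" lines, keep only the first field.
counts=$(printf '284 dirA/\n281 dirB/\n' | awk '{print $1}')

# 3) The sed form, which also copes with leading spaces before the count.
padded=$(printf '  284 dirA/\n  281 dirB/\n' | sed 's/^ *\([0-9]*\).*/\1/')

printf '%s\n%s\n%s\n' "$n" "$counts" "$padded"
rm -rf "$tmp"
```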

Split a large file into parts with split:

-l N: N lines in each part
-b SIZE: SIZE bytes in each part
-d: numeric suffixes such as prefix00, prefix01 (GNU split)
-a N: suffix length N; with -d -a 1 the suffixes are single digits (prefix0, prefix1, ...)
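A minimal example combining these options on a generated 10-line file (the input and the "part" prefix are illustrative). Note that with -a 1 and -d at most 10 parts fit, so pick -a to match the expected number of parts:

```bash
tmp=$(mktemp -d)
seq 1 10 > "$tmp/big.txt"

# 4 lines per part, numeric (-d) single-digit (-a 1) suffixes: part0, part1, part2.
split -l 4 -d -a 1 "$tmp/big.txt" "$tmp/part"

parts=$(ls "$tmp" | grep '^part')
lastLines=$(wc -l < "$tmp/part2")   # the remainder: lines 9 and 10
printf '%s\n' "$parts"
rm -rf "$tmp"
```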

Show the size of several folders (-s: one line per argument, -c: add a grand total, -h: human-readable sizes).

du -sch *
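To list folders from smallest to largest, du's human-readable output can be piped into sort -h (a GNU sort option that understands the K/M/G suffixes du -h prints); -c is left out so the total line does not end up in the sorted list. A sketch on two throwaway folders named small and big:

```bash
tmp=$(mktemp -d)
mkdir "$tmp/small" "$tmp/big"
printf 'x' > "$tmp/small/f"
head -c 200000 /dev/urandom > "$tmp/big/f"

# du -sh prints "SIZE<TAB>NAME" per argument; -- protects names starting with "-".
sizes=$(cd "$tmp" && du -sh -- * | sort -h)
printf '%s\n' "$sizes"
rm -rf "$tmp"
```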

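awk: filter by the value of a shell variable.

A shell variable is not visible inside the single-quoted awk program. Pass it in as an awk variable and compare it with the column of interest:

awk -v thre="$thre" '$1 < thre' input

or as an assignment operand placed before the input file:

awk '$1 < thre {print $0}' thre=$thre input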
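Both ways of passing the threshold give the same result; a check on a made-up file with thre=3:

```bash
tmp=$(mktemp -d)
printf '1\n5\n2\n' > "$tmp/input"
thre=3

# -v sets the awk variable before the program starts.
viaV=$(awk -v thre="$thre" '$1 < thre' "$tmp/input")

# An assignment operand takes effect just before the file that follows it is read.
viaOperand=$(awk '$1 < thre {print $0}' thre="$thre" "$tmp/input")

printf '%s\n' "$viaV"
rm -rf "$tmp"
```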

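awk: replace the first column with its first two dot-separated parts (e.g. A.B.C -> A.B) and keep all other columns unchanged:

awk 'BEGIN{FS=OFS="\t"} {split($1, id, "."); $1 = id[1] "." id[2]; print}' SRR881997_2_01_noheader.sam > /mnt/gluster/yzheng74/HiC/HiCPro/data/SRR881997_2_01_noheader

Note: setting $1="" and then printing id[1]"."id[2], $0 leaves an extra empty column, because the rebuilt $0 still starts with the emptied $1 followed by OFS.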
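Comparing the two approaches on one made-up tab-separated line (the read name SRR1.10.1 is invented) shows the difference: blanking $1 and printing id[1]"."id[2], $0 leaves an empty second column, while assigning to $1 directly does not.

```bash
line=$(printf 'SRR1.10.1\tX\tY')

# Assign the shortened name back to $1; awk rebuilds the line with OFS.
fixed=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS="\t"} {split($1, id, "."); $1 = id[1] "." id[2]; print}')

# Blank $1, then print the new name plus $0: $0 now begins with an empty field.
blanked=$(printf '%s\n' "$line" | awk -v OFS="\t" '{split($1, id, "."); $1=""; {print id[1]"."id[2], $0}}')

printf '%s\n%s\n' "$fixed" "$blanked"
```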

Remove rows with a duplicate key, keeping the first row for each key. Keyed on column 1:

awk '{if (!a[$1]) {print; a[$1]++}}' file

A more condensed form:

awk '!a[$1]++' file

To use several columns as the key (for example, columns 2-5):

awk '!a[$2,$3,$4,$5]++' file
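Why the condensed form works: a[k]++ returns the old value, which is 0 (false) the first time k is seen, so !a[k]++ is true only for the first row with that key. A check on made-up rows, keyed on one column and on two columns:

```bash
data=$(printf 'a 1 x\nb 2 y\na 3 x\nc 2 y\n')

# Key = column 1: the second "a" row is dropped.
byCol1=$(printf '%s\n' "$data" | awk '!a[$1]++')

# Key = columns 2 and 3 (joined with SUBSEP): the second "2 y" row is dropped.
byCols23=$(printf '%s\n' "$data" | awk '!a[$2,$3]++')

printf '%s\n---\n%s\n' "$byCol1" "$byCols23"
```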

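Sum the values in one column.

awk '{s+=$1} END {printf "%.0f\n", s}' mydatafile
awk '{s+=$1} END {print s}' mydatafile

The printf form always prints the sum as a whole number in plain digits; print formats a non-integer result with OFMT (%.6g by default), so a large sum can come out in scientific notation.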
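The difference between the two forms shows up with a large non-integer sum; a check on two made-up values that add up to 1234567.7:

```bash
data=$(printf '1000000.3\n234567.4\n')

# printf "%.0f": rounded, plain digits.
withPrintf=$(printf '%s\n' "$data" | awk '{s+=$1} END {printf "%.0f\n", s}')

# print: non-integer numbers go through OFMT ("%.6g"), which switches to e-notation here.
withPrint=$(printf '%s\n' "$data" | awk '{s+=$1} END {print s}')

printf '%s\n%s\n' "$withPrintf" "$withPrint"
```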

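Read a comma-separated line from a file into a bash array.

IFS=',' read -r -a test < infile

This reads only the first line of infile; the elements are ${test[0]}, ${test[1]}, and so on.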
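A bash-only sketch (read -a is a bash feature) on a made-up two-line file. The IFS=',' prefix applies only to that read command, and a while loop is one way to process every line rather than just the first:

```bash
tmp=$(mktemp -d)
printf 'a,b,c\nd,e\n' > "$tmp/infile"

# Only the first line is read; -r keeps backslashes literal.
IFS=',' read -r -a test < "$tmp/infile"
printf '%s\n' "${#test[@]}" "${test[@]}"

# To handle every line, read in a loop (redirect, not a pipe, so rows persists).
rows=0
while IFS=',' read -r -a row; do
  rows=$((rows + 1))
done < "$tmp/infile"

rm -rf "$tmp"
```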

When running sort or Picard, keep temporary files in a locally created tmp folder instead of /tmp or $TMPDIR, so large jobs do not run into the space quota there.

mkdir -p tmp
sort -T tmp ...
java -jar picard.jar <Tool> ... TMP_DIR=tmp
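A minimal sketch of the sort side, on a made-up file in a throwaway working directory. On input this small sort never needs to spill to disk, so it only demonstrates the invocation; on large inputs, -T is where sort writes its temporary files:

```bash
work=$(mktemp -d)
mkdir -p "$work/tmp"
printf '3\n1\n2\n' > "$work/unsorted.txt"

# -T: directory for sort's temporary files, instead of $TMPDIR or /tmp.
sorted=$(sort -T "$work/tmp" -n "$work/unsorted.txt")
printf '%s\n' "$sorted"
rm -rf "$work"
```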