This lesson expands text and file tooling. It shows how to search the filesystem with `find`, chain work with `xargs`, edit text streams and files with `sed`, and report on structured data with `awk`. The focus is safe patterns that work on real projects.
- `find`: locate files or directories by name, size, time, or type
- `xargs`: convert a stream of names into argument lists for a command
- `sed`: non-interactive editing and search or replace
- `awk`: field-based processing and quick reports
Prerequisites
- Day 12 completed
- A project folder with sample files, or the provided playground
find basics
# by name and type
find . -type f -name "*.log"
# case insensitive name search
find src -type f -iname "*.md"
# by size and modification time
find /var/log -type f -size +50M -mtime -7 -printf '%p %s bytes\n'
# limit depth
find . -maxdepth 2 -type d -name build
# exclude directories
find . -type f -name "*.py" -not -path "*/venv/*"
Useful tests and actions:
- `-type f|d|l`: file, directory, symlink
- `-name` and `-iname`: the latter for case-insensitive matches
- `-mtime -7`: modified in the last 7 days; `-mmin -60`: modified in the last hour
- `-size +100M`: larger than 100 MiB
- `-print0`: NUL separators for safe piping
- `-delete`: remove matches, used only after testing with `-print`

Use `-xdev` to avoid descending into mounted filesystems like backups or network shares when running cleanup jobs.
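As a minimal sketch (the `/tmp` path and 30-day threshold are illustrative), a preview-only cleanup pass that stays on one filesystem could look like:

```shell
# list old files without crossing mount points; -print only previews, nothing is deleted
find /tmp -xdev -type f -mtime +30 -print
```

Only after reviewing the list would you replace `-print` with `-delete`.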
xargs for safe bulk actions
`xargs` builds command lines from standard input. Combine it with `-print0` to handle spaces and special characters.
# remove editor swap files safely
find . -type f -name "*.swp" -print0 | xargs -0 -r rm -v
# compress large logs in place
find /var/log -type f -name "*.log" -size +50M -print0 | xargs -0 -r gzip -v
# run a command once per file using {}
find images -type f -name "*.png" -print0 | xargs -0 -I{} -r echo "Converting {}"
Flags used:
- `-0`: read NUL-separated input
- `-r`: do nothing on empty input
- `-I{}`: replace occurrences of `{}` in the command template

Replace the final command with `printf '%s\n'` to print file names before running destructive actions.
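For example, the swap-file cleanup above can be dry-run first by swapping `rm -v` for `printf`:

```shell
# print the candidates one per line; nothing is removed
find . -type f -name "*.swp" -print0 | xargs -0 -r printf '%s\n'
```

Once the list looks right, rerun with the real command.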
sed for search and replace
`sed` edits streams and files. The most common task is search and replace.
# replace a word in a file and print to stdout
sed 's/error/warning/g' app.log | head
# in place with backup
sed -i.bak 's/http:\/\//https:\/\//g' site/config.ini
# replace only on lines that match a filter
sed '/^#.*TODO/ s/TODO/DONE/' notes.txt
# remove blank lines
sed '/^$/d' README.md
Use a different delimiter when slashes are noisy.
sed -i 's|/var/www|/srv/www|g' nginx.conf
`sed` is line-oriented and byte-based. For binary files or complex encodings, choose a specialized tool. Always keep backups when running in-place edits on important data.
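A quick sketch of the backup round trip, assuming a throwaway `config.ini`: the `-i.bak` edit leaves the original beside the result, so a bad edit is one `mv` away from undone.

```shell
printf 'level=error\n' > config.ini
sed -i.bak 's/error/warning/' config.ini   # writes config.ini.bak before editing
cat config.ini                             # now reads: level=warning
mv config.ini.bak config.ini               # restore the original if the edit was wrong
```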
awk for fields and quick reports
`awk` reads lines, splits them into fields, and runs small programs written as patterns and actions.
# print the first and third fields of a CSV
echo 'user,action,duration' | awk -F, '{print $1, $3}'
# sum a numeric field grouped by a key
awk -F, 'NR>1 {sum[$2]+=$3} END {for (k in sum) printf "%s,%d\n", k, sum[k]}' data.csv | sort
# filter by status and compute average duration
awk '$9 ~ /^5../ {n++; s+=$NF} END {if (n) printf "avg=%0.2f\n", s/n}' access.tsv
Notes:
- `-F` sets the field delimiter
- `$1`, `$2`, `$3` refer to fields; `$0` is the whole line
- `NR` is the current line number; `NF` is the number of fields
- `BEGIN` and `END` blocks run before and after the records
Pretty table from delimited input:
awk -F, 'BEGIN{printf "%-10s %-10s %-10s\n", "user","action","ms"} NR>1{printf "%-10s %-10s %-10s\n", $1,$2,$3}' data.csv
When printing floats in `awk`, set a format string with `printf` to control decimal places, for example `printf "%0.2f"`.
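A one-liner sketch of that format string, dividing two fields and rounding to two decimals:

```shell
echo '3 7' | awk '{printf "%0.2f\n", $1/$2}'   # prints 0.43
```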
Project wide search and replace
# change a config key across many files, preview first
rg -n "^api_url=" 2>/dev/null || grep -R -n "^api_url=" .
# safe in place edit with backups
find . -type f -name "*.env" -print0 \
| xargs -0 -r sed -i.bak 's|^API_URL=.*|API_URL=https://api.example.com|'
Mass rename with `rename` when available, or with a small `bash` loop.
# rename JPG to jpg using perl rename
rename 's/\.JPG$/.jpg/' *.JPG 2>/dev/null || true
# portable loop fallback
for f in *.JPG; do mv -v -- "$f" "${f%.JPG}.jpg"; done
Use version control or make `.bak` files before bulk edits. Test on a small subset first.
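One way to limit a trial run to a few files, assuming GNU `head` (its `-z` flag keeps the NUL separators intact); the `DEBUG=` key is only an illustration:

```shell
# edit only the first three matches, then inspect the .bak diffs before going wider
find . -type f -name "*.env" -print0 \
  | head -z -n 3 \
  | xargs -0 -r sed -i.bak 's/^DEBUG=.*/DEBUG=false/'
```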
CSV and log summaries
# CSV: total and average by action
awk -F, 'NR>1{sum[$2]+=$3; cnt[$2]++} END{for(k in sum) printf "%s,%.2f\n", k, sum[k]/cnt[k]}' data.csv | sort
# Nginx access log: count status codes
awk '{c[$9]++} END{for(k in c) printf "%s %d\n", k, c[k]}' /var/log/nginx/access.log 2>/dev/null | sort -nr
# Auth log: top source IPs for failures
grep -E "Failed password|authentication failure" /var/log/auth.log 2>/dev/null \
| grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" \
| sort | uniq -c | sort -nr | head
Practical lab
- Prepare a playground with mixed files.
mkdir -p ~/playground/day13/{logs,src,conf}
printf "alpha\n\nbeta\n" > ~/playground/day13/src/readme.md
printf "2025-10-03 user=alice time_ms=120\n2025-10-03 user=bob time_ms=80\n" > ~/playground/day13/logs/app.log
cp /etc/hosts ~/playground/day13/conf/hosts.sample 2>/dev/null || true
- Find all `.sample` files and copy them without the suffix.
cd ~/playground/day13
find conf -type f -name "*.sample" -print0 | xargs -0 -I{} sh -c 'cp -v "$1" "${1%.sample}"' _ {}
- Change a config key across `.sample` files.
find conf -type f -name "*.sample" -print0 | xargs -0 sed -i.bak 's/^HOSTNAME=.*/HOSTNAME=demo.local/'
- Summarize log timing by user with `awk`.
awk '{for(i=1;i<=NF;i++){if($i~/^user=/)u=substr($i,6); if($i~/^time_ms=/)t=substr($i,9)}} {sum[u]+=t; cnt[u]++} END{for(k in sum) printf "%s %.2f\n", k, sum[k]/cnt[k]}' logs/app.log | sort
- Remove empty lines from markdown under `src/`.
find src -type f -name "*.md" -print0 | xargs -0 sed -i.bak '/^$/d'
Troubleshooting
- `Argument list too long` when passing many files. Use `find ... -print0 | xargs -0` or `-exec ... +` to batch arguments.
- Files with spaces break commands. Always prefer the NUL-safe pair: `-print0` with `xargs -0`.
- `sed -i` behaves differently on macOS and GNU sed. Use a backup suffix for portability, for example `-i.bak`.
- `awk` field splits look wrong. Set the correct delimiter with `-F` and check that lines do not contain embedded commas or spaces in quoted fields.
- Bulk edits modify generated or vendor files. Exclude directories with `-not -path` in `find` or `--exclude-dir` in `grep -R`.
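The `-exec ... +` form mentioned above sidesteps `xargs` entirely, since `find` batches the file names itself. A sketch of the earlier blank-line cleanup rewritten this way:

```shell
# the + terminator packs as many file names per sed invocation as fit
find src -type f -name "*.md" -exec sed -i.bak '/^$/d' {} +
```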
Next steps
Day 14 covers shell scripting basics. It introduces writing reusable scripts with arguments, exit codes, functions, strict mode with `set -euo pipefail`, and logging. It ends with a small script that wraps a multi-step workflow from today.