<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/rss.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>GeekCoding101 - Make your way to geek</title><description>GeekCoding101: Explore AI tools, LLMs, and machine learning with expert tutorials, insights, and resources to boost your coding skills and stay ahead in tech.</description><link>https://geekcoding101.com</link><item><title>Git Notes</title><link>https://geekcoding101.com/posts/git-notes</link><guid isPermaLink="true">https://geekcoding101.com/posts/git-notes</guid><pubDate>Sun, 07 Jan 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Hi there!&lt;/p&gt;
&lt;p&gt;Recently, I&apos;ve spent some time organizing my notes on git commands. I know you can find these commands online easily, but I&apos;d like to share the ones I find useful and keep them together here for my own reference. Let&apos;s take a look!&lt;/p&gt;
&lt;h1&gt;General Settings&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;Setting the editor for git. This is useful when writing &lt;strong&gt;Commit Messages&lt;/strong&gt;. When you commit without specifying a message inline (i.e. without &lt;code&gt;git commit -m &quot;message&quot;&lt;/code&gt;), git opens the default editor so you can write the commit message. Without a predefined editor, git may not know which text editor to open, especially if you have your own preference for editing messages.&lt;br /&gt;
In my case, I like vim, so here is how I set it. Simple.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git config --global core.editor &quot;vim&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Setting Committer Name &amp;amp; Email Globally. Of course, this is a must.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git config --global user.name &quot;your name&quot;
git config --global user.email &quot;your_email@email.com&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;
&lt;p&gt;If you want to set the committer name &amp;amp; email per repository, simply omit the &lt;code&gt;--global&lt;/code&gt; flag from the commands above and run them inside your repository folder.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Changing the Author Information Just for the Next Commit&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git commit --author=&quot;your name &amp;lt;your_email@email.com&amp;gt;&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;5&quot;&gt;
&lt;li&gt;Show all committers/authors in the log&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git log --pretty=&quot;%an %ae%n%cn %ce&quot; | sort | uniq
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;%an author name
%ae author email
%n  new line
%cn committer name
%ce committer email
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will notice that each commit has both an author name and a committer name.&lt;/p&gt;
&lt;h1&gt;Branches&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operations&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Get current working branch&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git branch&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Clone a specific branch only&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git clone -b specific_branch --single-branch http://username@192.168.99.100:8080/scm/your-repo.git&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create a branch&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git checkout -b new_branch&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Push to remote branch&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git push origin remote_branch&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delete local branch&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git branch -d &amp;lt;branch&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Delete remote branch&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git push origin :&amp;lt;branch&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&quot;git branch &amp;lt;branch_name&amp;gt;&quot; : The repository history remains unchanged. All you get is a new pointer to the current commit.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The &quot;git branch &amp;lt;branch_name&amp;gt;&quot; only creates the new branch. To start adding commits to it, you need to select it with git checkout, and then use the standard git add and git commit commands.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Deleting branches: &quot;-d&quot; is a &quot;safe&quot; operation, in that git prevents you from deleting the branch if it has unmerged changes. For the remote delete, the &quot;:&quot; in front of the branch name is what says &quot;delete&quot;; you can also remove a branch through the GitHub interface: &lt;a href=&quot;https://help.github.com/articles/deleting-unused-branches&quot;&gt;https://help.github.com/articles/deleting-unused-branches&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
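The safety note above can be seen in action. Below is a throwaway sketch (all repo paths and branch names are examples) showing that -d refuses to delete a branch with unmerged commits, while -D force-deletes it:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"
git commit -q --allow-empty -m "initial"
main=$(git symbolic-ref --short HEAD)   # default branch name varies by git version
git checkout -q -b feature
git commit -q --allow-empty -m "unmerged work"
git checkout -q "$main"
if git branch -d feature 2>/dev/null; then
  echo "unexpected: -d deleted an unmerged branch"
else
  echo "-d refused: feature has unmerged commits"
fi
git branch -D feature   # force delete, discarding the unmerged commit
```

Everything happens inside the mktemp scratch folder, so nothing else is touched.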
&lt;h1&gt;Fast-forward merge&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;Start a new feature&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git checkout -b new-feature master
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Edit some files&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git add &amp;lt;file&amp;gt;
git commit -m &quot;Start a feature&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Edit some more files&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git add &amp;lt;file&amp;gt;
git commit -m &quot;Finish a feature&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Merge in the new-feature branch&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;git checkout master   (switch to the master branch)
git merge new-feature (merge your changes from new-feature into master)
git branch -d new-feature
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Git Clone&lt;/h1&gt;
&lt;h2&gt;Clone into current directory&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;git init .
git remote add origin &amp;lt;repository-url&amp;gt;
git pull origin master
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Git Checkout&lt;/h1&gt;
&lt;h2&gt;Check out files deleted locally&lt;/h2&gt;
&lt;p&gt;Sometimes you might accidentally delete some files in your local repository. You can then use the command below to restore them from the last commit (HEAD):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git checkout HEAD &amp;lt;path&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
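A throwaway demonstration of that recovery (the file name is an example); on git 2.23 and newer, `git restore --source=HEAD -- path` does the same thing:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"
echo "hello" > notes.txt
git add notes.txt
git commit -qm "add notes"
rm notes.txt                     # the accidental local deletion
git checkout HEAD -- notes.txt   # bring it back from the last commit
cat notes.txt
```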
&lt;h2&gt;Clone a subdirectory only of a Git repository&lt;/h2&gt;
&lt;p&gt;What you are trying to do is called a &lt;strong&gt;sparse checkout&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir &amp;lt;repo&amp;gt;
cd &amp;lt;repo&amp;gt;
git init
git remote add -f origin &amp;lt;https://product.git&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates an empty repository with your remote, and fetches all objects but doesn&apos;t check them out. Then do:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git config core.sparseCheckout true
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you need to define which files/folders you actually want to check out. This is done by listing them in &lt;code&gt;.git/info/sparse-checkout&lt;/code&gt;, e.g.:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo &quot;temp&quot; &amp;gt;&amp;gt; .git/info/sparse-checkout
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Last but not least, update your empty repo with the state from the remote:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git pull origin master
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will now have the files under the &quot;temp&quot; folder checked out on your file system, with no other paths present.&lt;/p&gt;
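The sparse-checkout steps above can be exercised end-to-end against a throwaway local repository standing in for the remote (all paths and folder names below are examples):

```shell
set -e
work=$(mktemp -d)
# build a throwaway "remote" with two top-level folders
src="$work/origin"
git init -q "$src"
cd "$src"
git config user.email "you@example.com"
git config user.name "You"
mkdir temp other
echo a > temp/a.txt
echo b > other/b.txt
git add .
git commit -qm "initial"
branch=$(git symbolic-ref --short HEAD)   # default branch name varies

# sparse checkout of only temp/
dst="$work/sparse"
mkdir "$dst"
cd "$dst"
git init -q
git config user.email "you@example.com"
git config user.name "You"
git remote add -f origin "$src"
git config core.sparseCheckout true
echo "temp" >> .git/info/sparse-checkout
git pull -q origin "$branch"
ls    # only temp/ is present
```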
&lt;h2&gt;Clone specific branch of a Git repository&lt;/h2&gt;
&lt;p&gt;Just use the &lt;strong&gt;--single-branch&lt;/strong&gt; option:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone -b &amp;lt;branch or tag&amp;gt; --single-branch https://user.name@product.git
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Git Remote&lt;/h1&gt;
&lt;p&gt;I was wondering what a &lt;code&gt;git remote&lt;/code&gt; actually is. Here is one explanation:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;A remote in git is basically a bookmark for a different repository from which you may wish to pull or push code. The bookmarked repository may be on your local computer in a different folder, on a remote server, or it may even be the repository itself (I haven&apos;t tried this), but the simplest analogy is a bookmark. The repository doesn&apos;t even have to be a version of your repository; it may be a completely unrelated one.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Other explanations:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As you probably know, git is a distributed version control system. Most operations are done locally. To communicate with the outside world, git uses what are called remotes. These are repositories other than the one on your local disk which you can push your changes into (so that other people can see them) or pull from (so that you can get others&apos; changes). The command &lt;code&gt;git remote add origin git@github.com:peter/first_app.git&lt;/code&gt; creates a new remote called origin located at git@github.com:peter/first_app.git. Once you do this, in your push commands, you can push to origin instead of typing out the whole URL. Is the word &apos;origin&apos; arbitrary? Yes.&lt;/p&gt;
&lt;/blockquote&gt;
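As a sketch of the bookmark idea, remotes can be listed, added, and renamed freely. The URL below is just a placeholder path, and renaming origin shows the name is only a convention:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# the URL is never contacted until you fetch/push; it is only a bookmark
git remote add origin /tmp/some/other/repo.git
git remote -v
git remote rename origin upstream   # "origin" is just the conventional name
git remote
```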
&lt;h1&gt;Git Log&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;git log -1 --graph   --name-only  feature/your-feature
* commit d1f669674305665f9e6b8914511ed709aa8f09xb (HEAD -&amp;gt; feature/your-feature, origin/feature/your-feature)
| Author: xx.author &amp;lt;xx.author@yourdomain.com&amp;gt;
| Date:   Wed Jul 26 15:14:48 2017 -0700
|
|     Your comments.
|
| your-source-code/file.sh
| your-source-code/file02.sh

git log -1 --graph   --name-only --pretty=oneline feature/your-feature
* d1f669674305665f9e6b8914511ed709aa8f0x2b (HEAD -&amp;gt; feature/your-feature, origin/feature/your-feature) Your comments.
| your-source-code/file.sh
| your-source-code/file02.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The log command takes a &lt;code&gt;--follow&lt;/code&gt; argument that continues history from before a rename operation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git log --follow ./renamed_path/to/file
&lt;/code&gt;&lt;/pre&gt;
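A throwaway demonstration (file names are examples): without --follow the log for the new path stops at the rename commit; with --follow it continues back to the original creation:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"
echo x > old_name.txt
git add old_name.txt
git commit -qm "create file"
git mv old_name.txt new_name.txt
git commit -qm "rename file"
git log --oneline -- new_name.txt            # only the rename commit
git log --follow --oneline -- new_name.txt   # full history across the rename
```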
&lt;h1&gt;Git Diff&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;git diff HEAD your_file
git diff HEAD@{1} your_file    --&amp;gt; The @{1} means &quot;the previous position of the ref I&apos;ve specified&quot;, so that evaluates to what you had checked out previously - just before the pull.
git diff HEAD^                 --&amp;gt; This will diff all files which have been changed in previous commit.
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Diff that changed between two commits&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;git diff --word-diff SHA1 SHA2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you only need the diff of the last commit against the previous one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git show
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you only need the file names and the commit message:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git show --name-only
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Git Stash&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operations&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Stash&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git stash&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bring back your local changes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git stash pop&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;git stash pop&lt;/code&gt; throws away the (topmost, by default) stash after applying it, whereas &lt;code&gt;git stash apply&lt;/code&gt; leaves it in the stash list for possible later reuse (or you can then git stash drop it).&lt;/p&gt;
&lt;p&gt;The exception is when there are conflicts after &lt;code&gt;git stash pop&lt;/code&gt;: in that case it will not remove the stash, behaving exactly like &lt;code&gt;git stash apply&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Another way to look at it: &lt;code&gt;git stash pop&lt;/code&gt; is &lt;code&gt;git stash apply&lt;/code&gt; &amp;amp;&amp;amp; &lt;code&gt;git stash drop&lt;/code&gt;.&lt;/p&gt;
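That difference can be sketched in a scratch repository (file names are examples): apply leaves the entry in the stash list, pop removes it:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"
echo one > f.txt
git add f.txt
git commit -qm "init"
echo two >> f.txt          # a local change to stash
git stash push -q
git stash apply -q         # change is back, entry is kept
git stash list             # still shows one entry
git checkout -q -- f.txt   # throw the applied change away again
git stash pop -q           # applies AND drops the entry
git stash list             # now empty
```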
&lt;h1&gt;Git Squash&lt;/h1&gt;
&lt;h2&gt;Why do we need git squash, and how does it help?&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://softwareengineering.stackexchange.com/questions/263164/why-squash-git-commits-for-pull-requests&quot;&gt;https://softwareengineering.stackexchange.com/questions/263164/why-squash-git-commits-for-pull-requests&lt;/a&gt; has collected a lot of great explanations. Among those answers, the point I favor most is this one:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Because often the person pulling a PR cares about the net effect of the commits &quot;added feature X&quot;, not about the &quot;base templates, bugfix function X, add function Y, fixed typos in comments, adjusted data scaling parameters, hashmap performs better than list&quot;... level of detail&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Steps:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Set your current branch: &lt;code&gt;export curr_branch=&quot;&amp;lt;your_current_branch&amp;gt;&quot;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;git log --oneline origin/master..$curr_branch&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;git reset --soft `git merge-base origin/master $curr_branch`&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;git commit -c &amp;lt;hash string of one of your previous msg&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;git diff HEAD &amp;lt;your_file&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;git push --force&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
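The steps above can be exercised end-to-end with a throwaway local origin. Branch and file names are examples, and whatever the default branch is called here stands in for the post's origin/master:

```shell
set -e
work=$(mktemp -d)
# a throwaway "origin" plus a clone of it, so origin/... refs exist locally
git init -q "$work/origin"
cd "$work/origin"
git config user.email "you@example.com"
git config user.name "You"
git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)    # default branch name varies
cd "$work"
git clone -q "$work/origin" clone
cd clone
git config user.email "you@example.com"
git config user.name "You"
git checkout -q -b feature
for i in 1 2 3; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt"
  git commit -qm "wip $i"
done
git log --oneline "origin/$main..feature"                   # three wip commits
git reset --soft "$(git merge-base "origin/$main" feature)"
git commit -qm "added feature X"                            # one squashed commit
git log --oneline "origin/$main..feature"
```

After the soft reset, all the work is still staged, so a single commit captures the net effect of the three wip commits.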
&lt;h1&gt;Revert-Reset&lt;/h1&gt;
&lt;h2&gt;Undo a commit and Redo&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;$ git commit -m &quot;Something comments&quot;
$ git reset HEAD~
&amp;lt;&amp;lt; edit files as necessary &amp;gt;&amp;gt;
$ git add ...
$ git commit -c ORIG_HEAD
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the last command, if you do not need to edit the message, you can use the &lt;code&gt;-C&lt;/code&gt; option instead.&lt;/p&gt;
&lt;h2&gt;Revert a commit already pushed to a remote repository&lt;/h2&gt;
&lt;h3&gt;Revert with log history for tracing the rollback operation&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;$ git revert &amp;lt;commit hash&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It creates a new commit that undoes a previous commit (e.g. ab12cd15); the change is removed from the local and remote branch, but you keep a trace of the rollback in the log.&lt;/p&gt;
&lt;h3&gt;Revert without leaving any log trace of the rollback&lt;/h3&gt;
&lt;p&gt;You just committed a change to your local branch and immediately pushed it to the remote branch. Then you suddenly realize: oh no, I don&apos;t need this change! Now what can you do?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git reset --hard HEAD~1        (delete that commit from the local branch)
git push origin HEAD --force   (force the remote branch to match)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Revise commit log&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;git commit --amend
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Ignore local files instead of updating .gitignore&lt;/h2&gt;
&lt;p&gt;Source: &lt;a href=&quot;https://practicalgit.com/blog/make-git-ignore-local-changes-to-tracked-files.html&quot;&gt;https://practicalgit.com/blog/make-git-ignore-local-changes-to-tracked-files.html&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git update-index --assume-unchanged &amp;lt;file-to-ignore&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now you can do whatever you want in that file and it will not show up as changed in git.&lt;/p&gt;
&lt;p&gt;This works unless the file is changed on the remote branch; in that case, a pull will fail with an error.&lt;/p&gt;
&lt;p&gt;When that happens you need to tell Git to start caring about the file again, stash it, pull, apply your stashed changes, and tell Git to start ignoring the file again:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# tell Git to stop ignoring this file
$ git update-index --no-assume-unchanged &amp;lt;file-to-ignore&amp;gt;

# stash your local changes to the file
$ git stash push -- &amp;lt;file-to-ignore&amp;gt;

# Pull from remote
$ git pull

# Apply your stashed changes and resolve the possible conflict
$ git stash apply

# Now tell Git to ignore this file again
$ git update-index --assume-unchanged &amp;lt;file-to-ignore&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Pull latest changes from another branch to current branch&lt;/h2&gt;
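A minimal sketch for this section (branch names are examples): to bring the latest commits from another branch into the branch you are on, merge it; for a remote branch, git pull origin that-branch does the fetch and merge in one step:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "You"
git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)
git checkout -q -b other-feature
echo change > new_work.txt
git add new_work.txt
git commit -qm "work on other-feature"
git checkout -q "$main"
git merge -q other-feature   # for a remote branch: git pull origin other-feature
ls                           # new_work.txt is now on the current branch
```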
&lt;h1&gt;My Git Commands Alias&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;alias gs=&quot;git status &quot;
alias ga=&quot;git add &quot;
alias gb=&quot;git branch &quot;
alias gba=&quot;git branch -a &quot;
alias gbd=&quot;git branch -d &quot;
alias gbr=&quot;git branch -r &quot;
alias gc=&quot;git commit &quot;
alias gd=&quot;git diff &quot;
alias gco=&quot;git checkout &quot;
alias glg=&quot;git log --graph --name-only &quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Q/A&lt;/h1&gt;
&lt;h2&gt;Why does git keep asking for credentials?&lt;/h2&gt;
&lt;p&gt;You just need to store the credentials with a credential helper, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git config user.name &quot;geekcoding&quot;
git config user.email &quot;geekcoding@users.noreply.github.com&quot;
git pull  &amp;lt;- It might ask for credentials at this moment
git config --global credential.helper store   (saves credentials on disk)
git config --global credential.helper cache   (or: keeps them in memory; pick one)

&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Seeing &lt;code&gt;HEAD detached at xxxxxx&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Warning:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The command below will discard all the changes made in the detached state, so be careful when using it.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git checkout -f master
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Okay, that&apos;s all I have so far. Enjoy!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Docker Notes</title><link>https://geekcoding101.com/posts/docker-notes</link><guid isPermaLink="true">https://geekcoding101.com/posts/docker-notes</guid><pubDate>Fri, 09 Mar 2018 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Hi there!&lt;/p&gt;
&lt;p&gt;This is yet another note from me ^^&lt;/p&gt;
&lt;p&gt;This one is for my notes about Docker. I&apos;ve been dealing with container technologies for years, and it&apos;s a good habit to dump all of my notes here.&lt;/p&gt;
&lt;p&gt;I hope you find this useful as well.&lt;/p&gt;
&lt;h1&gt;Build Docker Image&lt;/h1&gt;
&lt;h2&gt;Method 1: Docker build&lt;/h2&gt;
&lt;p&gt;Using a Dockerfile is the standard way to build a docker image.&lt;/p&gt;
&lt;p&gt;We can define the base image to pull from, copy files into it, run configuration steps, and specify which process to start.&lt;/p&gt;
&lt;p&gt;You know I like using Django for projects, so here is a Dockerfile from Cookiecutter:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# define an alias for the specific python version used in this file.
FROM python:3.11.6-slim-bullseye as python

# Python build stage
FROM python as python-build-stage

ARG BUILD_ENVIRONMENT=local

# Install apt packages
RUN apt-get update &amp;amp;&amp;amp; apt-get install --no-install-recommends -y \
  # dependencies for building Python packages
  build-essential \
  # psycopg2 dependencies
  libpq-dev

# Requirements are installed here to ensure they will be cached.
COPY ./requirements .

# Create Python Dependency and Sub-Dependency Wheels.
RUN pip wheel --wheel-dir /usr/src/app/wheels  \
  -r ${BUILD_ENVIRONMENT}.txt

# Python &apos;run&apos; stage
FROM python as python-run-stage

ARG BUILD_ENVIRONMENT=local
ARG APP_HOME=/app

ENV PYTHONUNBUFFERED 1
ENV PYTHONDONTWRITEBYTECODE 1
ENV BUILD_ENV ${BUILD_ENVIRONMENT}

WORKDIR ${APP_HOME}

# devcontainer dependencies and utils
RUN apt-get update &amp;amp;&amp;amp; apt-get install --no-install-recommends -y \
  sudo git bash-completion ssh vim
RUN echo &quot;alias ls=&apos;ls -G --color=auto&apos;&quot; &amp;gt;&amp;gt; ~/.bashrc
RUN echo &quot;alias ll=&apos;ls -lh --color=auto&apos;&quot; &amp;gt;&amp;gt; ~/.bashrc

# Create devcontainer user and add it to sudoers
RUN groupadd --gid 1000 dev-user \
  &amp;amp;&amp;amp; useradd --uid 1000 --gid dev-user --shell /bin/bash --create-home dev-user \
  &amp;amp;&amp;amp; echo dev-user ALL=\(root\) NOPASSWD:ALL &amp;gt; /etc/sudoers.d/dev-user \
  &amp;amp;&amp;amp; chmod 0440 /etc/sudoers.d/dev-user

# Install required system dependencies
RUN apt-get update &amp;amp;&amp;amp; apt-get install --no-install-recommends -y \
  # psycopg2 dependencies
  libpq-dev \
  # Translations dependencies
  gettext \
  # cleaning up unused files
  &amp;amp;&amp;amp; apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
  &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*

# All absolute dir copies ignore workdir instruction. All relative dir copies are wrt to the workdir instruction
# copy python dependency wheels from python-build-stage
COPY --from=python-build-stage /usr/src/app/wheels  /wheels/

# use wheels to install python dependencies
RUN pip install --no-cache-dir --no-index --find-links=/wheels/ /wheels/* \
  &amp;amp;&amp;amp; rm -rf /wheels/

COPY ./compose/production/django/entrypoint /entrypoint
RUN sed -i &apos;s/\r$//g&apos; /entrypoint
RUN chmod +x /entrypoint

COPY ./compose/local/django/start /start
RUN sed -i &apos;s/\r$//g&apos; /start
RUN chmod +x /start

COPY ./compose/local/django/celery/worker/start /start-celeryworker
RUN sed -i &apos;s/\r$//g&apos; /start-celeryworker
RUN chmod +x /start-celeryworker

COPY ./compose/local/django/celery/beat/start /start-celerybeat
RUN sed -i &apos;s/\r$//g&apos; /start-celerybeat
RUN chmod +x /start-celerybeat

COPY ./compose/local/django/celery/flower/start /start-flower
RUN sed -i &apos;s/\r$//g&apos; /start-flower
RUN chmod +x /start-flower

# copy application code to WORKDIR
COPY . ${APP_HOME}

ENTRYPOINT [&quot;/entrypoint&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Method 2: Docker commit&lt;/h2&gt;
&lt;p&gt;Another way is to use &lt;code&gt;docker commit &amp;lt;container_id&amp;gt; &amp;lt;new_image_name&amp;gt;&lt;/code&gt;; it creates a new image from the current state of a container in your local Docker storage.&lt;/p&gt;
&lt;h2&gt;Save/Load&lt;/h2&gt;
&lt;p&gt;Once we have docker images, we usually want to share them with others or transfer them to another machine; that&apos;s where &lt;code&gt;docker save&lt;/code&gt;/&lt;code&gt;docker load&lt;/code&gt; come in (note that &lt;code&gt;docker export&lt;/code&gt;/&lt;code&gt;docker import&lt;/code&gt; operate on containers instead):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker save &amp;lt;image_name:version&amp;gt; &amp;gt; exported_file.tar
docker load &amp;lt; exported_file.tar
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Docker Registry&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;Environment:&lt;/strong&gt; CentOS 7.2&lt;/p&gt;
&lt;h2&gt;Setup Docker repository&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;sudo tee /etc/yum.repos.d/docker.repo &amp;lt;&amp;lt;-&apos;EOF&apos;
[dockerrepo]
name=Docker Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7/
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
EOF
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Install and enable docker-registry&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;yum install docker-registry
systemctl enable docker-registry.service
service docker-registry start
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Verify docker-registry service&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Use curl to check: &lt;code&gt;curl localhost:5000&lt;/code&gt;. You should get the result: &lt;code&gt;&quot;\&quot;docker-registry server\&quot;&quot;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;systemctl status docker-registry&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Configure storage_path&lt;/h2&gt;
&lt;p&gt;Update local storage path to your specific location in &lt;code&gt;/etc/docker-registry.yml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;local: &amp;amp;local
     &amp;lt;&amp;lt;: *common
     storage: local
     storage_path: _env:STORAGE_PATH:/data/docker/docker-registry
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then restart: &lt;code&gt;systemctl restart docker-registry.service&lt;/code&gt;&lt;/p&gt;
&lt;h2&gt;Setup client to use the registry&lt;/h2&gt;
&lt;p&gt;Update &lt;code&gt;/etc/sysconfig/docker&lt;/code&gt; to add &lt;code&gt;--insecure-registry your_ip_or_hostname:5000&lt;/code&gt; as below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# /etc/sysconfig/docker

# Modify these options if you want to change the way the docker daemon runs
OPTIONS=&apos;--insecure-registry your_ip_or_hostname:5000 --selinux-enabled --log-driver=journald&apos;
DOCKER_CERT_PATH=/etc/docker
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Push to the registry&lt;/h2&gt;
&lt;p&gt;In order to have an image to push to the registry, let&apos;s pull one from docker.io first: &lt;code&gt;docker pull centos&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Please write down the &lt;code&gt;IMAGE ID&lt;/code&gt; for the centos image&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you push it to your own registry now, you will get an error like the one below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# docker push your_ip_or_hostname:5000/ci
The push refers to a repository [your_ip_or_hostname:5000/ci]
An image does not exist locally with the tag: your_ip_or_hostname:5000/ci
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So you need a matching repo name on your private registry before pushing.&lt;/p&gt;
&lt;p&gt;To do that, tag the image with your registry&apos;s address and push again:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# docker tag the_centos_image_id_you_wrote_down your_ip_or_hostname:5000/centos
[root@geekcoding101 ~]# docker push your_ip_or_hostname:5000/centos
The push refers to a repository [your_ip_or_hostname:5000/centos]
97ca462ad9cc: Image successfully pushed
Pushing tag for rev [the_centos_image_id_you_wrote_down] on {http://your_ip_or_hostname:5000/v1/repositories/centos/tags/latest}
[root@geekcoding101 ~]#
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Docker Storage&lt;/h1&gt;
&lt;h2&gt;Where does docker store images?&lt;/h2&gt;
&lt;p&gt;Usually it is &lt;code&gt;/var/lib/docker/&lt;/code&gt;, but this varies depending on the storage driver Docker is using.&lt;/p&gt;
&lt;p&gt;You can manually set the storage driver with the &lt;code&gt;-s&lt;/code&gt; or &lt;code&gt;--storage-driver=&lt;/code&gt; option to the Docker daemon.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;/var/lib/docker/{driver-name}&lt;/code&gt; will contain the driver specific storage for contents of the images.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;/var/lib/docker/graph/&amp;lt;id&amp;gt;&lt;/code&gt; now only contains metadata about the image, in the json and layersize files.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the case of aufs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;/var/lib/docker/aufs/diff/&amp;lt;id&amp;gt;&lt;/code&gt; has the file contents of the images.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;/var/lib/docker/repositories-aufs&lt;/code&gt; is a JSON file containing local image information. This can be viewed with the command docker images&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Cheat Sheet&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker version&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker info&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker images&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker rmi &amp;lt;image name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker run -t -i centos&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker run -d centos /bin/sh -c &quot;while true; do echo hello world; sleep 1; done&quot;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker stop &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker inspect &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker tag &amp;lt;image&amp;gt; &amp;lt;new_tag&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker logs &amp;lt;container name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Okay, that&apos;s all from me. Thank you for reading!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Tmux Notes</title><link>https://geekcoding101.com/posts/tmux-notes</link><guid isPermaLink="true">https://geekcoding101.com/posts/tmux-notes</guid><pubDate>Fri, 24 Jan 2020 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Hi there!&lt;/p&gt;
&lt;p&gt;Today I&apos;d like to share my notes about tmux with you!&lt;/p&gt;
&lt;p&gt;Tmux is my favorite terminal multiplexer! Several years ago I didn&apos;t care at all about the people using it, because I thought customizing it would consume too much of my time. However, one free day I finally tested the water, and now I feel like I couldn&apos;t live without it in my coding environment!&lt;/p&gt;
&lt;p&gt;It&apos;s like Vim: the learning curve is steep, but once you&apos;re comfortable with it, you&apos;ll be addicted!&lt;/p&gt;
&lt;p&gt;No more talking, let&apos;s dive into it!&lt;/p&gt;
&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;It’s tmux, a so-called terminal multiplexer. Simply speaking, tmux acts as a window manager within your terminal and allows you to create multiple windows and panes within a single terminal window.&lt;/p&gt;
&lt;h1&gt;Pane&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shortcut&lt;/th&gt;
&lt;th&gt;Comment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre %&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Splitting panes in left and right&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre &quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Splitting panes in top and bottom&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre &amp;lt;arrow key&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Navigating in panes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;C-d&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Close panes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre: swap-pane -s &amp;lt;sid&amp;gt; -t &amp;lt;tid&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Swap sid pane to tid pane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre z&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Toggle full screen for the current pane&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre C-&amp;lt;arrow key&amp;gt;&lt;/code&gt;   &lt;code&gt;Pre ⌥-&amp;lt;arrow key&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Resize pane in direction of&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;Windows&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shortcut&lt;/th&gt;
&lt;th&gt;Comment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create a new window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre ,&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rename current window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre x&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kill the current pane after a confirmation prompt (closing the last pane closes the window)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;Sessions&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Shortcut&lt;/th&gt;
&lt;th&gt;Comment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre :new -s &amp;lt;name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create a new session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre C-c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create a new session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre $&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rename current session&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Pre s, then x on the session&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete the selected session&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;Configuration&lt;/h1&gt;
&lt;p&gt;This is the configuration folder: &lt;code&gt;~/.tmux&lt;/code&gt;.&lt;br /&gt;
This is the configuration file: &lt;code&gt;~/.tmux.conf&lt;/code&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Check all configuration: &lt;code&gt;tmux show-options -g&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reload conf from inside a session (at the tmux command prompt, &lt;code&gt;Pre :&lt;/code&gt;): &lt;code&gt;source-file &amp;lt;tmux.conf&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reload conf from outside a session: &lt;code&gt;tmux source-file &amp;lt;.tmux.conf&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
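If you reload the conf often, you can bind a key for it. This is a common convenience binding, not a tmux default; the choice of `r` and the message text are my own:

```
# in ~/.tmux.conf: press Prefix + r to reload the config
bind r source-file ~/.tmux.conf \; display-message "~/.tmux.conf reloaded"
```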
&lt;h2&gt;Session Handling&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;tmux ls                             (Same as: tmux list-sessions)
tmux kill-server                    (Kills the tmux server and all sessions on it)
tmux attach -t 0
tmux rename-session -t 0 &amp;lt;new session name&amp;gt;
tmux new -s &amp;lt;session name&amp;gt;
tmux attach -t &amp;lt;session name&amp;gt;
tmux rename-session -t &amp;lt;old session name&amp;gt; &amp;lt;new session name&amp;gt;
tmux kill-session -t targetSession  (Kills only the specified session)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Search&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Pre &lt;code&gt;[&lt;/code&gt; to enter &lt;code&gt;copy mode&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you&apos;re using &lt;code&gt;vi&lt;/code&gt; key bindings ( &lt;code&gt;Ctrl-b:set-window-option -g mode-keys vi&lt;/code&gt; ), press &lt;code&gt;/&lt;/code&gt;, type the string to search for, and press &lt;code&gt;Enter&lt;/code&gt;. Press &lt;code&gt;n&lt;/code&gt; to repeat the search and &lt;code&gt;N&lt;/code&gt; to search in the opposite direction. You can also start a reverse search directly with &lt;code&gt;?&lt;/code&gt;. Press &lt;code&gt;q&lt;/code&gt; to exit &lt;code&gt;copy mode&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
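To make the `vi` key bindings permanent instead of setting them in each session, you can put the option in `~/.tmux.conf` (a small sketch; adjust to taste):

```
# in ~/.tmux.conf: always use vi keys in copy mode
set-window-option -g mode-keys vi
```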
&lt;h1&gt;Plugins&lt;/h1&gt;
&lt;p&gt;I haven&apos;t explored plugins much, but these are the ones I have used; you can give them a try:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;tmux-plugins/tpm&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;tmux-plugins/tmux-sensible&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;tmux-plugins/tmux-resurrect&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Save an entire tmux session: &lt;code&gt;prefix + Control + s&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Restore an entire tmux session: &lt;code&gt;prefix + Control + r&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Use a conf from GitHub&lt;/h1&gt;
&lt;p&gt;I used &lt;a href=&quot;https://github.com/gpakosz/.tmux.git&quot;&gt;https://github.com/gpakosz/.tmux.git&lt;/a&gt;. It&apos;s well customized.&lt;br /&gt;
My &lt;code&gt;.tmux.conf&lt;/code&gt; and &lt;code&gt;.tmux.conf.local&lt;/code&gt; are based on it.&lt;/p&gt;
&lt;h2&gt;Integrate with iTerm2&lt;/h2&gt;
&lt;p&gt;I have iTerm2 installed on my Mac, so I want to integrate tmux with it.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;In iTerm2, &lt;code&gt;General -&amp;gt; Command&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;  tmux attach -t base || tmux new -s base
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;In tmux, &lt;code&gt;Prefix (Control + B) + w&lt;/code&gt; lists all windows. Each entry has a shortcut. You might notice that after &lt;code&gt;0~9&lt;/code&gt;, the shortcuts start with &lt;code&gt;M&lt;/code&gt;, like &lt;code&gt;M-i&lt;/code&gt;. The &lt;code&gt;M&lt;/code&gt; means the &lt;code&gt;Meta Key&lt;/code&gt;, which you need to map in iTerm2. I&apos;ve set it as below:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;./iTerm-map-meta-key.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;My customization&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;Moving windows by adding below settings into &lt;code&gt;~/.tmux.conf.local&lt;/code&gt; (&lt;code&gt;C&lt;/code&gt; means Control, &lt;code&gt;S&lt;/code&gt; means Shift, Left/Right means arrow keys):&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;bind-key -n C-S-Left swap-window -t -1 \; select-window -t -1
bind-key -n C-S-Right swap-window -t +1 \; select-window -t +1
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Navigating between panes (NOT windows): &lt;code&gt;Prefix + (→ | ← | ↑ | ↓)&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Find the mouse mode setting in &lt;code&gt;~/.tmux.conf.local&lt;/code&gt; and uncomment it as below:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;# start with mouse mode enabled
set -g mouse on
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;A script to attach or create a tmux session, which you can configure to run at iTerm2 startup. The script refuses to attach to the tmux session again if it is already attached in another iTerm2 tab. Without this guard, every new iTerm2 tab would show the exact same tmux session, which defeats the purpose of opening a new tab.&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/zsh

tmux ls|grep kongfu|grep -q attached

if [[ $? != 0 ]] ; then
  tmux attach -t kongfu  ||  tmux new-session -s kongfu
else
  echo &quot;********************************************************************************&quot;
  echo &quot;* Ignore attaching tmux kongfu session as it has been attached already.        *&quot;
  echo &quot;********************************************************************************&quot;
fi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./tmux_script_iterm2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Misc&lt;/h1&gt;
&lt;h2&gt;Resolution problem in multiple monitors&lt;/h2&gt;
&lt;p&gt;Let’s say you’re connecting to a remote server over ssh with Terminal.app. If you start tmux on a smaller monitor and later &lt;code&gt;tmux attach&lt;/code&gt; from a bigger one, tmux keeps the smaller size and draws dots around the console instead of filling the new window.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Detach the smaller client from the session. Pressing &lt;code&gt;C-b D&lt;/code&gt; lets you choose which client to detach.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Okay, that&apos;s all for us today!&lt;br /&gt;
Hope you love it!&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Build and Sign RPM package and repo</title><link>https://geekcoding101.com/posts/build-and-sign-rpm-package-and-repo</link><guid isPermaLink="true">https://geekcoding101.com/posts/build-and-sign-rpm-package-and-repo</guid><pubDate>Fri, 22 Jan 2021 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Hi there!&lt;/p&gt;
&lt;p&gt;Welcome to geekcoding101.com!&lt;/p&gt;
&lt;p&gt;I have two decades of working experience with Linux. There are many things I have come across, but building packages for Linux is something you can&apos;t avoid in your work or study!&lt;/p&gt;
&lt;p&gt;I have summarized the steps/tricks in this article, hope you will find it useful!&lt;/p&gt;
&lt;p&gt;Enjoy!&lt;/p&gt;
&lt;h1&gt;Create unsigned rpm&lt;/h1&gt;
&lt;p&gt;I will first demonstrate how to create unsigned rpm.&lt;/p&gt;
&lt;h2&gt;Create Folder Structure&lt;/h2&gt;
&lt;p&gt;First step is creating folder structure.&lt;/p&gt;
&lt;p&gt;If you don&apos;t specify &lt;code&gt;%_topdir&lt;/code&gt; in &lt;code&gt;~/.rpmmacros&lt;/code&gt; (rpm&apos;s per-user config file), rpmbuild uses &lt;code&gt;~/rpmbuild&lt;/code&gt; by default.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd ~ 
mkdir rpmbuild 
cd rpmbuild 
mkdir BUILD BUILDROOT SOURCES SRPMS RPMS SPECS
&lt;/code&gt;&lt;/pre&gt;
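The four commands above can be collapsed into a single `mkdir -p` using shell brace expansion. The sketch below runs against a `mktemp` scratch directory so it is safe to try anywhere; for a real build, use `~/rpmbuild` instead:

```shell
# demo against a scratch directory; use ~/rpmbuild for real builds
topdir="$(mktemp -d)/rpmbuild"
mkdir -p "$topdir"/{BUILD,BUILDROOT,SOURCES,SRPMS,RPMS,SPECS}
ls "$topdir"
```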
&lt;h2&gt;Create SPEC file for unsigned rpm&lt;/h2&gt;
&lt;p&gt;Now we can work on the spec file &lt;code&gt;SPECS/rpm-no-sig.spec&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# cat SPECS/rpm-no-sig.spec
Name:       rpm-no-sig
Version:    1.0
Release:    1
Summary:    This is an unsigned rpm.

Vendor:     Geekcoding101
License:    Copyright (c) 2021

BuildArch:  noarch
BuildRoot:  %{_tmppath}/%{name}-%{version}

Packager:   Geekcoding101

Source0:    %{name}-%{version}.tar.gz
%define INSTALL_DIR /usr/lib/unsigned-rpm/
%define INSTALL_FILE rpm-helper-unsigned.py

%description
%{Summary}
This package provides rpm gpgcheck demo.

%prep
%setup -q

%install
rm -rf %{buildroot}
install --directory %{buildroot}/%{INSTALL_DIR}
install -m 0755 %{INSTALL_FILE} %{buildroot}/%{INSTALL_DIR}

%clean
rm -rf %{buildroot}

%files
%defattr(-,root,root,-)
%{INSTALL_DIR}
%{INSTALL_DIR}/%{INSTALL_FILE}
%exclude %{INSTALL_DIR}/*.pyc
%exclude %{INSTALL_DIR}/*.pyo

%doc

%changelog
* Sun Jan 21 2021 - Geekcoding101
- Initial commit.

%post
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Create a dummy source file for unsigned rpm&lt;/h2&gt;
&lt;p&gt;Use a dummy py file to be packed into the rpm: &lt;code&gt;rpm-helper-unsigned.py&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/usr/bin/env python

import sys
import os
import re

def main():
    print(&quot;This is from unsigned rpm.&quot;)

if __name__ == &apos;__main__&apos;:
    main()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Create a folder: &lt;code&gt;mkdir &amp;lt;rpm-name&amp;gt;-&amp;lt;version&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd ~
mkdir rpm-no-sig-1.0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then put &lt;code&gt;rpm-helper-unsigned.py&lt;/code&gt; under it.&lt;/p&gt;
&lt;p&gt;Then make gz file for the folder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tar cf rpm-no-sig-1.0.tar rpm-no-sig-1.0
gzip rpm-no-sig-1.0.tar

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will get file &lt;code&gt;rpm-no-sig-1.0.tar.gz&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Move it to &lt;code&gt;SOURCES&lt;/code&gt; folder.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When building the rpm, rpmbuild will recognize this gz file and extract it automatically.&lt;/strong&gt;&lt;/p&gt;
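The separate `tar` and `gzip` steps can also be done in one go with `tar czf`. The sketch below uses a scratch directory and an empty placeholder file so it can be run anywhere; substitute your real source folder:

```shell
# demo in a scratch directory; substitute your real source folder
cd "$(mktemp -d)"
mkdir rpm-no-sig-1.0
touch rpm-no-sig-1.0/rpm-helper-unsigned.py   # empty placeholder for the demo
tar czf rpm-no-sig-1.0.tar.gz rpm-no-sig-1.0  # tar + gzip in a single step
tar tzf rpm-no-sig-1.0.tar.gz                 # list the archive contents
```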
&lt;h2&gt;Build rpm-no-sig.rpm&lt;/h2&gt;
&lt;p&gt;Run command: &lt;code&gt;rpmbuild -ba SPECS/rpm-no-sig.spec&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# rpmbuild -ba SPECS/rpm-no-sig.spec
warning: bogus date in %changelog: Sun Jan 21 2021 - Geekcoding101
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.ut6Q5z
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd /root/rpmbuild/BUILD
+ rm -rf rpm-no-sig-1.0
+ /usr/bin/gzip -dc /root/rpmbuild/SOURCES/rpm-no-sig-1.0.tar.gz
+ /usr/bin/tar -xf -
+ STATUS=0
+ &apos;[&apos; 0 -ne 0 &apos;]&apos;
+ cd rpm-no-sig-1.0
+ /usr/bin/chmod -Rf a+rX,u+w,g-w,o-w .
+ exit 0
Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.UOxKuK
+ umask 022
+ cd /root/rpmbuild/BUILD
+ &apos;[&apos; /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64 &apos;!=&apos; / &apos;]&apos;
+ rm -rf /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64
++ dirname /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64
+ mkdir -p /root/rpmbuild/BUILDROOT
+ mkdir /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64
+ cd rpm-no-sig-1.0
+ rm -rf /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64
+ install --directory /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64//usr/lib/unsigned-rpm/
+ install -m 0755 rpm-helper-unsigned.py /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64//usr/lib/unsigned-rpm/
+ &apos;[&apos; noarch = noarch &apos;]&apos;
+ case &quot;${QA_CHECK_RPATHS:-}&quot; in
+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-compress
+ /usr/lib/rpm/redhat/brp-strip /usr/bin/strip
+ /usr/lib/rpm/redhat/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/redhat/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/brp-python-bytecompile /usr/bin/python 1
+ /usr/lib/rpm/redhat/brp-python-hardlink
+ /usr/lib/rpm/redhat/brp-java-repack-jars
Processing files: rpm-no-sig-1.0-1.noarch
warning: File listed twice: /usr/lib/unsigned-rpm/rpm-helper-unsigned.py
Provides: rpm-no-sig = 1.0-1
Requires(rpmlib): rpmlib(CompressedFileNames) &amp;lt;= 3.0.4-1 rpmlib(FileDigests) &amp;lt;= 4.6.0-1 rpmlib(PartialHardlinkSets) &amp;lt;= 4.0.4-1 rpmlib(PayloadFilesHavePrefix) &amp;lt;= 4.0-1
Requires: /usr/bin/env
Checking for unpackaged file(s): /usr/lib/rpm/check-files /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64
Wrote: /root/rpmbuild/SRPMS/rpm-no-sig-1.0-1.src.rpm
Wrote: /root/rpmbuild/RPMS/noarch/rpm-no-sig-1.0-1.noarch.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.w4KHSg
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd rpm-no-sig-1.0
+ rm -rf /root/rpmbuild/BUILDROOT/rpm-no-sig-1.0-1.x86_64
+ exit 0
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will get &lt;code&gt;RPMS/noarch/rpm-no-sig-1.0-1.noarch.rpm&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Check the digests and signature: &lt;code&gt;rpm -Kv &amp;lt;rpm file&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# rpm -Kv RPMS/noarch/rpm-no-sig-1.0-1.noarch.rpm
RPMS/noarch/rpm-no-sig-1.0-1.noarch.rpm:
    Header SHA1 digest: OK (84bd6662874a27ccd5cd3247ef7a4107c1919f54)
    MD5 digest: OK (198ab02bd5765c383c57dbe113551af0)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Back up this rpm somewhere else.&lt;/p&gt;
&lt;h1&gt;Create signed rpm&lt;/h1&gt;
&lt;h2&gt;Create SPEC file for signed rpm&lt;/h2&gt;
&lt;p&gt;The spec file &lt;code&gt;SPECS/rpm-with-sig.spec&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# cat SPECS/rpm-with-sig.spec
Name:       rpm-with-sig
Version:    1.0
Release:    1
Summary:    This is a signed rpm.

Vendor:     Geekcoding101
License:    Copyright (c) 2021

BuildArch:  noarch
BuildRoot:  %{_tmppath}/%{name}-%{version}

Packager:   Geekcoding101

Source0:    %{name}-%{version}.tar.gz
%define INSTALL_DIR /usr/lib/signed-rpm/
%define INSTALL_FILE rpm-helper-signed.py

%description
%{Summary}
This package provides rpm gpgcheck demo.

%prep
%setup -q

%install
rm -rf %{buildroot}
install --directory %{buildroot}/%{INSTALL_DIR}
install -m 0755 %{INSTALL_FILE} %{buildroot}/%{INSTALL_DIR}

%clean
rm -rf %{buildroot}

%files
%defattr(-,root,root,-)
%{INSTALL_DIR}
%{INSTALL_DIR}/%{INSTALL_FILE}
%exclude %{INSTALL_DIR}/*.pyc
%exclude %{INSTALL_DIR}/*.pyo

%doc

%changelog
* Sun Jan 21 2021 - Geekcoding101
- Initial commit.

%post
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Create a dummy source file for signed rpm&lt;/h2&gt;
&lt;p&gt;Use a dummy py file to be packed into the rpm: &lt;code&gt;rpm-helper-signed.py&lt;/code&gt;. You can reuse the one above and change the print message accordingly.&lt;/p&gt;
&lt;p&gt;Create a folder: &lt;code&gt;cd ~ &amp;amp;&amp;amp; mkdir &amp;lt;rpm-name&amp;gt;-&amp;lt;version&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Example: &lt;code&gt;cd ~ &amp;amp;&amp;amp; mkdir rpm-with-sig-1.0&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Move &lt;code&gt;rpm-helper-signed.py&lt;/code&gt; into the folder.&lt;/p&gt;
&lt;p&gt;Also create the gz file with the same process.&lt;/p&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tar cf rpm-with-sig-1.0.tar rpm-with-sig-1.0
gzip rpm-with-sig-1.0.tar
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will get file &lt;code&gt;rpm-with-sig-1.0.tar.gz&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Remove &lt;code&gt;rpm-no-sig-1.0.tar.gz&lt;/code&gt; from &lt;code&gt;SOURCES&lt;/code&gt; folder. Move &lt;code&gt;rpm-with-sig-1.0.tar.gz&lt;/code&gt; into &lt;code&gt;SOURCES&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Generate GPG key and Build rpm-with-sig.rpm&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Generate a gpg key for signing the rpm: &lt;code&gt;gpg --gen-key&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check the new keys on your system:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;gpg --fingerprint
gpg --list-keys
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Update ~/.rpmmacros to specify which key to be used for signing: &lt;code&gt;%_gpg_name &amp;lt;secret key&apos;s last 8 digits&amp;gt;&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build the rpm: &lt;code&gt;rpmbuild -ba SPECS/rpm-with-sig.spec&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You will get &lt;code&gt;RPMS/noarch/rpm-with-sig-1.0-1.noarch.rpm&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sign the rpm: &lt;code&gt;rpm --addsign RPMS/noarch/rpm-with-sig-1.0-1.noarch.rpm&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Now you can check the signature: &lt;code&gt;rpm -Kv &amp;lt;rpm file&amp;gt;&lt;/code&gt; Example:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;# rpm -Kv ../rpm-with-sig-1.0-1.noarch.rpm
../rpm-with-sig-1.0-1.noarch.rpm:
    Header V4 RSA/SHA1 Signature, key ID 1ddb39c6: NOKEY
    Header SHA1 digest: OK (64fc89cf3eb3054e6316a77f4b22c183221ab13d)
    V4 RSA/SHA1 Signature, key ID 1ddb39c6: NOKEY
    MD5 digest: OK (fadae5f41bb0c939edbe865a974fce4c)
&lt;/code&gt;&lt;/pre&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;You might see &quot;NOKEY&quot; in the above output, because we haven&apos;t imported the public key into RPM&apos;s database yet.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Export your key: &lt;code&gt;gpg --export -a &amp;lt;last_8_dig_of_your_pub_key&amp;gt; &amp;gt; PUB_KEY_SIGNING_RPM&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Import it into RPM&apos;s database: &lt;code&gt;rpm --import PUB_KEY_SIGNING_RPM&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Now you can check the signature again: &lt;code&gt;rpm -Kv rpm-with-sig-1.0-1.noarch.rpm&lt;/code&gt; Example:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;# rpm -Kv ../rpm-with-sig-1.0-1.noarch.rpm
../rpm-with-sig-1.0-1.noarch.rpm:
    Header V4 RSA/SHA1 Signature, key ID 1ddb39c6: OK
    Header SHA1 digest: OK (64fc89cf3eb3054e6316a77f4b22c183221ab13d)
    V4 RSA/SHA1 Signature, key ID 1ddb39c6: OK
    MD5 digest: OK (fadae5f41bb0c939edbe865a974fce4c)
&lt;/code&gt;&lt;/pre&gt;
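Putting the pieces together, a minimal `~/.rpmmacros` for this walkthrough could look like the sketch below. The key ID matches the example `rpm -Kv` output above; substitute the last 8 digits of your own key:

```
# ~/.rpmmacros (sketch; values are placeholders)
%_topdir    %(echo $HOME)/rpmbuild
%_gpg_name  1DDB39C6
```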
&lt;h1&gt;Creating repo database/conf for unsigned rpm&lt;/h1&gt;
&lt;h2&gt;Create repo database for the rpm-no-sig rpm&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;cd ~
mkdir unsigned_repo_with_rpm_no_sig
cp rpm-no-sig-1.0-1.noarch.rpm unsigned_repo_with_rpm_no_sig
createrepo --database unsigned_repo_with_rpm_no_sig/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# createrepo --database unsigned_repo_with_rpm_no_sig/
Spawning worker 0 with 1 pkgs
Spawning worker 1 with 0 pkgs
Spawning worker 2 with 0 pkgs
Spawning worker 3 with 0 pkgs
Workers Finished
Saving Primary metadata
Saving file lists metadata
Saving other metadata
Generating sqlite DBs
Sqlite DBs complete
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Create repo conf file for unsigned repo&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;# cat /etc/yum.repos.d/rpm-no-sig.repo
[RPM-NO-SIG]
name=rpm no sig repository
baseurl=file:///root/unsigned_repo_with_rpm_no_sig
enabled=1
gpgcheck=0
localpkg_gpgcheck=0
repo_gpgcheck=0
skip_if_unavailable=1
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Creating signed repo database/conf/gpg for signed rpm&lt;/h1&gt;
&lt;h2&gt;Generate GPG key and Create repo for the rpm-with-sig rpm&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;cd ~
mkdir signed_repo_with_rpm_with_sig
cp rpm-with-sig-1.0-1.noarch.rpm signed_repo_with_rpm_with_sig
createrepo --database signed_repo_with_rpm_with_sig/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Generate a new gpg key for signing the repo: &lt;code&gt;gpg --gen-key&lt;/code&gt;. Then create the asc file (note &lt;code&gt;-u&lt;/code&gt;/&lt;code&gt;--local-user&lt;/code&gt; selects the signing key; &lt;code&gt;-r&lt;/code&gt; is for encryption recipients):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;gpg --detach-sign --armor -u 0x&amp;lt;secret key&apos;s last 8 digits fingerprint&amp;gt; signed_repo_with_rpm_with_sig/repodata/repomd.xml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It will generate &lt;code&gt;signed_repo_with_rpm_with_sig/repodata/repomd.xml.asc&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;Create repo conf file for signed repo&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;# cat /etc/yum.repos.d/rpm-with-sig.repo
[RPM-WITH-SIG]
name=rpm with sig repository
baseurl=file:///root/signed_repo_with_rpm_with_sig
enabled=1
gpgcheck=1
localpkg_gpgcheck=1
repo_gpgcheck=1
skip_if_unavailable=1
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;RPM/YUM relevant GPG knowledge&lt;/h1&gt;
&lt;p&gt;There are two types of GPG keyrings used on RPM-based systems:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;RPM&apos;s GPG keyring. This keyring is used for verifying signatures on RPM packages. Imported keys are stored in the RPM database as &lt;code&gt;gpg-pubkey-*&lt;/code&gt; packages and can be listed with &lt;code&gt;rpm -qa gpg-pubkey*&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;YUM&apos;s GPG keyring. This keyring is used for verifying signatures on repository metadata. There is one keyring per repository on the system. Once the repo has been scanned by &lt;code&gt;yum repolist&lt;/code&gt;, you can find the gpg folder like this: &lt;code&gt;/var/lib/yum/repos/x86_64/7/&amp;lt;your_repo_name&amp;gt;/gpgdir&lt;/code&gt;. You can use the &lt;code&gt;gpg&lt;/code&gt; command with &lt;code&gt;--homedir&lt;/code&gt; to add, list, or delete keys, as below:&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code&gt;sudo gpg --homedir /var/lib/yum/repos/x86_64/7/&amp;lt;your_repo_name&amp;gt;/gpgdir --delete-key &amp;lt;keyid&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;YUM clean up&lt;/h1&gt;
&lt;p&gt;&lt;code&gt;yum clean all&lt;/code&gt; will not remove everything. In order to do a real clean, you could try this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;yum clean all
rm -fr /var/lib/yum/repos/x86_64/7/&amp;lt;your_repo_name&amp;gt;
rm -fr /var/cache/yum/x86_64/7/&amp;lt;your_repo_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Yum commands references&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;yum install &amp;lt;rpm name&amp;gt;
yum install &amp;lt;path of the rpm&amp;gt;
yum clean all
yum clean metadata
yum-config-manager
yum-config-manager &amp;lt;repo id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Q/A&lt;/h1&gt;
&lt;h2&gt;Can&apos;t remove keys from RPM due to duplicate entries&lt;/h2&gt;
&lt;p&gt;You might hit a problem where there are duplicate entries in the &lt;code&gt;rpm -qa gpg-pub*&lt;/code&gt; output with the same fingerprints.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rpm -e gpg-pubkey-xxxx&lt;/code&gt; can&apos;t remove any of them.&lt;/p&gt;
&lt;p&gt;You should use &lt;code&gt;rpm -e --all-matches gpg-pubkey-xxxx&lt;/code&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;All right!&lt;br /&gt;
That&apos;s all for my sharing today!&lt;br /&gt;
Hope you find it useful!&lt;br /&gt;
Bye!&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>A Tutorial of Angular, Karma and Jasmine</title><link>https://geekcoding101.com/posts/a-tutorial-of-angular-karma-and-jasmine</link><guid isPermaLink="true">https://geekcoding101.com/posts/a-tutorial-of-angular-karma-and-jasmine</guid><pubDate>Fri, 08 Apr 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Hey!&lt;/p&gt;
&lt;p&gt;In my career, I haven&apos;t spent much time on front-end programming. However, now I have!&lt;br /&gt;
It&apos;s been a really exciting journey learning Angular/Karma/Jasmine, and I will probably spend more time on it to gain deeper insights!&lt;/p&gt;
&lt;p&gt;Today&apos;s article is my learning journey on this; I hope you will find it a useful tutorial ^^&lt;/p&gt;
&lt;h1&gt;Introductions&lt;/h1&gt;
&lt;h2&gt;Angular Testing Utilities&lt;/h2&gt;
&lt;p&gt;Angular is a TypeScript-based free and open-source web application framework led by the Angular Team at Google and by a community of individuals and corporations. Angular is a complete rewrite from the same team that built AngularJS.&lt;/p&gt;
&lt;p&gt;Angular testing utilities provide a library for creating a test environment for your application.&lt;/p&gt;
&lt;p&gt;Classes such as TestBed and ComponentFixture and helper functions such as async and fakeAsync are part of the @angular/core/testing package.&lt;/p&gt;
&lt;p&gt;Getting acquainted with these utilities is necessary if you want to write tests that reveal how your components interact with their own template, services, and other components.&lt;/p&gt;
&lt;h3&gt;Ref Links&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://angular.io/guide/testing&quot;&gt;Angular Testing Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Karma&lt;/h2&gt;
&lt;p&gt;Karma is a tool that lets you test your application on multiple browsers.&lt;br /&gt;
Karma has plugins for browsers like Chrome, Firefox, Safari, and many others.&lt;br /&gt;
But I prefer using a headless browser for testing.&lt;br /&gt;
A headless browser lacks a GUI, and that way, you can keep the test results inside your terminal.&lt;/p&gt;
&lt;h3&gt;Ref Links&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://karma-runner.github.io/latest/index.html&quot;&gt;Karma homepage&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.npmjs.com/package/karma&quot;&gt;Karma package on npmjs&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Jasmine&lt;/h2&gt;
&lt;p&gt;Jasmine is a popular behavior-driven testing framework for JavaScript. With Jasmine, you can write tests that are more expressive and straightforward.&lt;/p&gt;
&lt;p&gt;Here is an example to get started:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;it(&apos;should have a defined component&apos;, () =&amp;gt; {
        expect(component).toBeDefined();
});
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Ref Links&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Jasmine_(JavaScript_testing_framework)&quot;&gt;Wiki&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://jasmine.github.io/&quot;&gt;Official website&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Steps&lt;/h1&gt;
&lt;h2&gt;Environment&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;❯ nvm ls
       v12.13.1
-&amp;gt;     v16.14.0
        v17.6.0
default -&amp;gt; 16.14 (-&amp;gt; v16.14.0)
iojs -&amp;gt; N/A (default)
unstable -&amp;gt; N/A (default)
node -&amp;gt; stable (-&amp;gt; v17.6.0) (default)
stable -&amp;gt; 17.6 (-&amp;gt; v17.6.0) (default)
lts/* -&amp;gt; lts/gallium (-&amp;gt; v16.14.0)
lts/argon -&amp;gt; v4.9.1 (-&amp;gt; N/A)
lts/boron -&amp;gt; v6.17.1 (-&amp;gt; N/A)
lts/carbon -&amp;gt; v8.17.0 (-&amp;gt; N/A)
lts/dubnium -&amp;gt; v10.24.1 (-&amp;gt; N/A)
lts/erbium -&amp;gt; v12.22.10 (-&amp;gt; N/A)
lts/fermium -&amp;gt; v14.19.0 (-&amp;gt; N/A)
lts/gallium -&amp;gt; v16.14.0
❯ npm -v
8.3.1
❯ node -v
v16.14.0
❯ ng version

     _                      _                 ____ _     ___
    / \   _ __   __ _ _   _| | __ _ _ __     / ___| |   |_ _|
   / △ \ | &apos;_ \ / _` | | | | |/ _` | &apos;__|   | |   | |    | |
  / ___ \| | | | (_| | |_| | | (_| | |      | |___| |___ | |
 /_/   \_\_| |_|\__, |\__,_|_|\__,_|_|       \____|_____|___|
                |___/

Angular CLI: 13.2.6
Node: 16.14.0
Package Manager: npm 8.3.1
OS: darwin x64

Angular: 13.2.7
... animations, common, compiler, compiler-cli, core, forms
... platform-browser, platform-browser-dynamic, router

Package                         Version
---------------------------------------------------------
@angular-devkit/architect       0.1302.6
@angular-devkit/build-angular   13.2.6
@angular-devkit/core            13.2.6
@angular-devkit/schematics      13.2.6
@angular/cli                    13.2.6
@schematics/angular             13.2.6
rxjs                            7.5.5
typescript                      4.5.5
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Create a New Angular Project&lt;/h2&gt;
&lt;p&gt;The developers at Angular have made it easy for us to set up our test environment. To get started, we need to install Angular first.&lt;/p&gt;
&lt;p&gt;I prefer using the Angular-CLI. It&apos;s an all-in-one solution that takes care of creating, generating, building and testing your Angular project.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ng new Pastebin
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Answer &lt;code&gt;yes&lt;/code&gt; to &lt;code&gt;Would you like to add Angular routing?&lt;/code&gt; and &lt;code&gt;CSS&lt;/code&gt; to &lt;code&gt;Which stylesheet format would you like to use?&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Directory structure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ ls -l
total 1568
-rw-r--r-- 1 geekcoding101  staff    1054 Apr  8 11:51 README.md
-rw-r--r-- 1 geekcoding101  staff    3051 Apr  8 11:51 angular.json
-rw-r--r-- 1 geekcoding101  staff    1425 Apr  8 11:51 karma.conf.js
drwxr-xr-x  600 geekcoding101  staff   19200 Apr  8 11:53 node_modules
-rw-r--r-- 1 geekcoding101  staff  773285 Apr  8 11:53 package-lock.json
-rw-r--r-- 1 geekcoding101  staff    1071 Apr  8 11:51 package.json
drwxr-xr-x   11 geekcoding101  staff     352 Apr  8 11:51 src
-rw-r--r-- 1 geekcoding101  staff     287 Apr  8 11:51 tsconfig.app.json
-rw-r--r-- 1 geekcoding101  staff     863 Apr  8 11:51 tsconfig.json
-rw-r--r-- 1 geekcoding101  staff     333 Apr  8 11:51 tsconfig.spec.json
❯ tree src
src
├── app
│   ├── app-routing.module.ts
│   ├── app.component.css
│   ├── app.component.html
│   ├── app.component.spec.ts
│   ├── app.component.ts
│   └── app.module.ts
├── assets
├── environments
│   ├── environment.prod.ts
│   └── environment.ts
├── favicon.ico
├── index.html
├── main.ts
├── polyfills.ts
├── styles.css
└── test.ts

3 directories, 14 files
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Launch Angular project:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./default-angular-projecdt-launched-1024x539.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Run karma:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./karma-launched-1024x307.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can define a headless browser in your &lt;code&gt;karma.conf.js&lt;/code&gt; as below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;...
browsers: [&apos;Chrome&apos;,&apos;ChromeNoSandboxHeadless&apos;],

customLaunchers: {
 ChromeNoSandboxHeadless: {
    base: &apos;Chrome&apos;,
    flags: [
      &apos;--no-sandbox&apos;,
      // See https://chromium.googlesource.com/chromium/src/+/lkgr/headless/README.md
      &apos;--headless&apos;,
      &apos;--disable-gpu&apos;,
      // Without a remote debugging port, Google Chrome exits immediately.
      &apos;--remote-debugging-port=9222&apos;,
    ],
  },
},
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can refer to &lt;a href=&quot;#toc_Cheatsheet&quot;&gt;Cheatsheet&lt;/a&gt; about how to run unit test and specify which browser to run your test.&lt;/p&gt;
&lt;h2&gt;Add Class&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;ng generate class Pastebin
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;Pastebin.ts&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export class Pastebin {
 
    // &quot;!&quot; (definite assignment) keeps strict-mode TypeScript happy:
    // the fields are populated via Object.assign in the constructor
    id!: number;
    title!: string;
    language!: string;
    paste!: string;
 
    constructor(values: Object = {}) {
        Object.assign(this, values);
    }
 
}
 
export const Languages = [&quot;Ruby&quot;, &quot;Java&quot;, &quot;JavaScript&quot;, &quot;C&quot;, &quot;Cpp&quot;];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;pastebin.spec.ts&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { Pastebin } from &apos;./pastebin&apos;;

describe(&apos;Pastebin&apos;, () =&amp;gt; {
  it(&apos;should create an instance of Pastebin&apos;, () =&amp;gt; {
    expect(new Pastebin()).toBeTruthy();
  });
  it(&apos;should accept values&apos;, () =&amp;gt; {
    let pastebin = new Pastebin();
    pastebin = {
      id: 111,
      title: &quot;Hello world&quot;,
      language: &quot;Ruby&quot;,
      paste: &apos;print &quot;Hello&quot;&apos;,
    }
    expect(pastebin.id).toEqual(111);
    expect(pastebin.language).toEqual(&quot;Ruby&quot;);
    expect(pastebin.paste).toEqual(&apos;print &quot;Hello&quot;&apos;);
  });
});
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Setting Up Angular-in-Memory-Web-API&lt;/h2&gt;
&lt;p&gt;We don&apos;t have a server API for the application we are building. Therefore, we are going to simulate the server communication using a module known as &lt;a href=&quot;https://github.com/angular/in-memory-web-api&quot;&gt;InMemoryWebApiModule&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;npm install angular-in-memory-web-api --save
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Add Services&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;ng generate service pastebin
ng generate service in-memory-data
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;PastebinService will host the logic for sending HTTP requests to the server.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pastebin.service.ts&lt;/code&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { Injectable } from &apos;@angular/core&apos;;
import { Pastebin } from &apos;./pastebin&apos;;
import { HttpClient, HttpHeaders } from &apos;@angular/common/http&apos;;
// rxjs 7: Observable.toPromise() still exists (deprecated), so no extra
// import is needed; the modern alternative is lastValueFrom from &apos;rxjs&apos;

@Injectable()
export class PastebinService {
  // The project uses InMemoryWebApi to handle the Server API. 
  // Here &quot;api/pastebin&quot; simulates a Server API url 
  private pastebinUrl = &quot;api/pastebin&quot;;
  private headers = new HttpHeaders({ &apos;Content-Type&apos;: &quot;application/json&quot; });
  constructor(private http: HttpClient) { }

  // getPastebin() performs http.get() and returns a promise.
  // HttpClient already parses the JSON body, so no response.json() call is needed.
  public getPastebin(): Promise&amp;lt;any&amp;gt; {
    return this.http.get&amp;lt;Pastebin[]&amp;gt;(this.pastebinUrl)
      .toPromise()
      .catch(this.handleError);
  }

  private handleError(error: any): Promise&amp;lt;any&amp;gt; {
    console.error(&apos;An error occurred&apos;, error);
    return Promise.reject(error.message || error);
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;in-memory-data.service.ts&lt;/code&gt; will implement &lt;code&gt;InMemoryDbService&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { InMemoryDbService } from &apos;angular-in-memory-web-api&apos;;
import { Pastebin } from &apos;./pastebin&apos;;

export class InMemoryDataService implements InMemoryDbService {
  createDb() {
    const pastebin:Pastebin[] = [
      { id: 0,  title: &quot;Hello world Ruby&quot;, language: &quot;Ruby&quot;, paste: &apos;puts &quot;Hello World&quot;&apos; },
      {id: 1, title: &quot;Hello world C&quot;, language: &quot;C&quot;, paste: &apos;printf(&quot;Hello world&quot;);&apos;},
      {id: 2, title: &quot;Hello world CPP&quot;, language: &quot;C++&quot;, paste: &apos;cout&amp;lt;&amp;lt;&quot;Hello world&quot;;&apos;},
      {id: 3, title: &quot;Hello world Javascript&quot;, language: &quot;JavaScript&quot;, paste: &apos;console.log(&quot;Hello world&quot;)&apos;}
       
    ];
    return {pastebin};
  }
}
&lt;/code&gt;&lt;/pre&gt;
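&lt;p&gt;To make the shape of this fake database concrete, here is a framework-free sketch of the same &lt;code&gt;createDb&lt;/code&gt; idea. The &lt;code&gt;Pastebin&lt;/code&gt; interface is restated inline as an assumption so the snippet stands alone (in the project it lives in &lt;code&gt;pastebin.ts&lt;/code&gt;), and the list is trimmed to two entries:&lt;/p&gt;

```typescript
// Standalone sketch: the Pastebin shape is assumed here, mirroring the tutorial's class.
interface Pastebin {
  id: number;
  title: string;
  language: string;
  paste: string;
}

// Mirrors InMemoryDataService.createDb(): the key of the returned object
// ("pastebin") becomes the collection name behind the simulated URL "api/pastebin".
function createDb(): { pastebin: Pastebin[] } {
  const pastebin: Pastebin[] = [
    { id: 0, title: 'Hello world Ruby', language: 'Ruby', paste: 'puts "Hello World"' },
    { id: 1, title: 'Hello world C', language: 'C', paste: 'printf("Hello world");' },
  ];
  return { pastebin };
}

const db = createDb();
console.log(db.pastebin.length); // 2 entries in this trimmed sketch
```

&lt;p&gt;A GET against &lt;code&gt;api/pastebin&lt;/code&gt; then simply serves the &lt;code&gt;pastebin&lt;/code&gt; array from this object.&lt;/p&gt;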
&lt;h1&gt;Update &lt;code&gt;app.module.ts&lt;/code&gt;&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;import { NgModule } from &apos;@angular/core&apos;;
import { BrowserModule } from &apos;@angular/platform-browser&apos;;
import { HttpClientModule }    from &apos;@angular/common/http&apos;;

import { AppRoutingModule } from &apos;./app-routing.module&apos;;
import { AppComponent } from &apos;./app.component&apos;;

//In memory Web api to simulate an http server
import { InMemoryWebApiModule } from &apos;angular-in-memory-web-api&apos;;
import { InMemoryDataService }  from &apos;./in-memory-data.service&apos;;

import { PastebinService } from &quot;./pastebin.service&quot;;

@NgModule({
  declarations: [
    AppComponent
  ],
  imports: [
    BrowserModule,
    HttpClientModule,
    InMemoryWebApiModule.forRoot(InMemoryDataService),
    AppRoutingModule
  ],
  providers: [PastebinService],
  bootstrap: [AppComponent]
})
export class AppModule { }

&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Cheatsheet&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operations&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Comments&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Create a new Angular project&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng new &amp;lt;project_name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generate a new class&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng generate class &amp;lt;class name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generate a new service&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng generate service &amp;lt;service name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Launch Angular project&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng serve&lt;/code&gt; or &lt;code&gt;npm start&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Launch unit test&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng test&lt;/code&gt; or &lt;code&gt;npm test&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Launch unit test in specific browser&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npm test -- --browsers ChromeNoSandboxHeadless&lt;/code&gt; or &lt;code&gt;ng test --browsers ChromeNoSandboxHeadless&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;: You need to have &lt;code&gt;ChromeNoSandboxHeadless&lt;/code&gt; defined in your &lt;code&gt;karma.conf.js&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Create component without specs&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng g component --skip-tests=true &amp;lt;component name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;You can refer to &lt;a href=&quot;https://stackoverflow.com/questions/40990280/generating-component-without-spec-ts-file-in-angular-2&quot;&gt;Stackoverflow&lt;/a&gt; for more solutions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run specific unit test&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng t -- --include &quot;src/**/your_file_name.component.spec.ts&quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Run specific unit test with relative path&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ng test -- --include &quot;relative_path_of_the_spec.ts&quot;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Starting the relative path with &lt;code&gt;./&lt;/code&gt; didn&apos;t work for me, so use a path beginning with &lt;code&gt;src/&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;References&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://blog.logrocket.com/angular-unit-testing-tutorial-examples/&quot;&gt;Angular unit testing tutorial with examples&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Password Authentication in Node.js: A Step-by-Step Guide</title><link>https://geekcoding101.com/posts/password-authentication-in-node-js-a-step-by-step-guide</link><guid isPermaLink="true">https://geekcoding101.com/posts/password-authentication-in-node-js-a-step-by-step-guide</guid><pubDate>Sun, 23 Jul 2023 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Password-based authentication remains one of the most common and widely used methods to verify user identity in various online systems. It involves users providing a unique combination of a username and password to gain access to their accounts. Despite its prevalence, password-based authentication comes with security challenges, as weak or compromised passwords can lead to unauthorized access and data breaches.&lt;/p&gt;
&lt;p&gt;In this blog, I will guide you through password-based authentication, from the basics up to an intermediate level, by implementing password hashing in a Node.js and TypeScript environment. By the end of this hands-on tutorial, you will have a better understanding of how password-based authentication works in your applications.&lt;/p&gt;
&lt;h1&gt;Step 1: Setting Up the Node.js and TypeScript Environment&lt;/h1&gt;
&lt;p&gt;To get started, ensure you have Node.js installed on your machine. Create a new project folder and initialize it with a package.json file.&lt;/p&gt;
&lt;p&gt;Here are the steps I ran on macOS:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew install npm httpie
mkdir password-auth
cd password-auth
npm init -y
npm install -g ts-node
npm install body-parser bcryptjs express --save
npm install @types/bcryptjs @types/express @types/body-parser --save
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Setting up the programming environment is no doubt crucial, but let’s be honest, it can be a bit daunting. In my tutorials, I will try to make sure not to leave you hanging. I love providing comprehensive explanations, even for the simple tasks or commands. Let’s make this setup process a breeze together! I genuinely hope you find it helpful and that it keeps you smoothly sailing through the tutorial 🤓&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Let’s walk through the commands above.&lt;/p&gt;
&lt;p&gt;▹ 1. &lt;code&gt;brew&lt;/code&gt; is the package manager for macOS and Linux. You can find the installation guide &lt;a href=&quot;https://brew.sh/&quot;&gt;at their website&lt;/a&gt;. Here we used it to install &lt;code&gt;npm&lt;/code&gt; and &lt;code&gt;httpie&lt;/code&gt;. &lt;code&gt;npm&lt;/code&gt; is &lt;a href=&quot;https://docs.npmjs.com/about-npm&quot;&gt;the JavaScript package manager for Node.js&lt;/a&gt;. We will test the server using the &lt;code&gt;http&lt;/code&gt; command provided by &lt;code&gt;httpie&lt;/code&gt;, &lt;a href=&quot;https://httpie.io/&quot;&gt;a command-line HTTP client&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;▹ 2. Then we created our project folder &lt;code&gt;password-auth&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;▹ 3. &lt;code&gt;npm init -y&lt;/code&gt; initializes a project instantly; the &lt;code&gt;-y&lt;/code&gt; flag skips the interactive questions.&lt;/p&gt;
&lt;p&gt;▹ 4. When you want to use the commands provided by a package from your shell, use &lt;code&gt;npm&lt;/code&gt; to install it globally with &lt;code&gt;-g&lt;/code&gt;, so that its binaries end up on your PATH. In our case, we need &lt;code&gt;ts-node&lt;/code&gt; on the command line.&lt;/p&gt;
&lt;p&gt;▹ 5. &lt;code&gt;ts-node&lt;/code&gt; is a &lt;a href=&quot;https://www.typescriptlang.org/&quot;&gt;TypeScript&lt;/a&gt; execution engine for Node.js. It allows you to run your TypeScript code directly without precompiling it to JavaScript. Typically &lt;code&gt;ts-node&lt;/code&gt; transforms TypeScript to JavaScript in memory without writing it to disk. You can find more details &lt;a href=&quot;https://typestrong.org/ts-node/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;▹ 6. &lt;code&gt;express&lt;/code&gt; is a web framework for Node.js for building web applications and APIs. Building a backend from scratch in Node.js can be tedious and time consuming. With &lt;code&gt;express&lt;/code&gt;, you can save time and focus on other important tasks.&lt;/p&gt;
&lt;p&gt;▹ 7. If you don’t install &lt;code&gt;@types/bcryptjs&lt;/code&gt;, &lt;code&gt;@types/express&lt;/code&gt; and &lt;code&gt;@types/body-parser&lt;/code&gt;, you will hit the error below when running your application:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;img src=&quot;./pwd-auth-01.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;npx stands for Node Package eXecute. It is simply an npm package runner. npx is installed automatically with npm version 5.2.0 and above.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;To enable TypeScript support in a Node.js backend API project, you need to set up TypeScript to compile your TypeScript code into JavaScript. Since TypeScript requires type information for each package, we need to provide it. These @types packages offer type definitions for external modules that lack them. If you’re using an external package that already ships TypeScript definitions, you won’t need to install the corresponding @types package.&lt;/p&gt;
&lt;p&gt;▹ 8. I don’t want to overcrowd this article with setup instructions, but this is the last point. You might have seen &lt;code&gt;--save&lt;/code&gt; and &lt;code&gt;--save-dev&lt;/code&gt; when using &lt;code&gt;npm&lt;/code&gt;. With &lt;code&gt;--save&lt;/code&gt;, the dependency goes into the core dependency section of package.json, the &lt;code&gt;dependencies&lt;/code&gt; section. The other puts dependencies into the &lt;code&gt;devDependencies&lt;/code&gt; section. A core dependency is any package without which the application cannot perform its intended work, e.g. express, body-parser.&lt;/p&gt;
&lt;p&gt;Now, here is the output of &lt;code&gt;npm list&lt;/code&gt; and &lt;code&gt;npm list -g&lt;/code&gt; FYI:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./pwd-auth-02.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Step 2: Creating the Server&lt;/h1&gt;
&lt;p&gt;Create an &lt;code&gt;app.ts&lt;/code&gt; file and set up a basic Express server with routes for user registration and login.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import express from &apos;express&apos;;
import bodyParser from &apos;body-parser&apos;;
import bcrypt from &apos;bcryptjs&apos;;

const app = express();
const PORT = 3000;

app.use(bodyParser.json());

interface User {
  id: number;
  username: string;
  password: string;
}

let users: User[] = [];

app.post(&apos;/register&apos;, async (req, res) =&amp;gt; {
  try {
    const { username, password } = req.body;
    const salt = await bcrypt.genSalt(10);
    const hashedPassword = await bcrypt.hash(password, salt);
    const newUser: User = {
      id: users.length + 1,
      username,
      password: hashedPassword,
    };
    users.push(newUser);
    res.status(201).json({ message: &apos;User registered successfully!&apos; });
  } catch (error) {
    res.status(500).json({ error: &apos;Internal server error&apos; });
  }
});

app.post(&apos;/login&apos;, async (req, res) =&amp;gt; {
  try {
  const { username, password } = req.body;
  const user = users.find((user) =&amp;gt; user.username === username);
  if (!user) {
    return res.status(404).json({ error: &apos;User not found&apos; });
  }
  const isPasswordValid = await bcrypt.compare(password, user.password);
  if (!isPasswordValid) {
    return res.status(401).json({ error: &apos;Invalid password&apos; });
  }
  res.json({ message: &apos;Login successful!&apos; });
  } catch (error) {
    res.status(500).json({ error: &apos;Internal server error&apos; });
  }
});

app.listen(PORT, () =&amp;gt; {
  console.log(`Server is running on http://localhost:${PORT}`);
});
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Step 3: Explaining the Code&lt;/h1&gt;
&lt;p&gt;In this example, we are using an in-memory array to store registered users. In a real-world scenario, you would typically use a database for this purpose.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/register&lt;/code&gt; route handles user registration. When a user sends a POST request with their desired &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;password&lt;/code&gt;, the server will hash the password using bcrypt and store the new user in the &lt;code&gt;users&lt;/code&gt; array.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/login&lt;/code&gt; route handles user login. When a user sends a POST request with their &lt;code&gt;username&lt;/code&gt; and &lt;code&gt;password&lt;/code&gt;, the server will find the corresponding user in the &lt;code&gt;users&lt;/code&gt; array, and then use bcrypt to compare the hashed password with the provided password.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;⚠ You might have wondered why &lt;code&gt;bcrypt.compare&lt;/code&gt; above can compare the password without a salt.&lt;br /&gt;
The reason is the salt has been stored as part of the hashed password. When you hash a password using bcrypt, the resulting hash contains both the salt and the password’s cryptographic hash.&lt;br /&gt;
For example, given plain password &lt;code&gt;testpassword&lt;/code&gt;, the hashed password would be &lt;code&gt;$2a$10$34rHf5RmJx1TZmZ7FM5BYe0BPXuw1bs6rYzzqyM7IXgN/VGcQmVMu&lt;/code&gt; .&lt;br /&gt;
So in the above hashed password, there are three fields delimited by &lt;strong&gt;$&lt;/strong&gt; symbol.&lt;/p&gt;
&lt;p&gt;I) The first part &lt;strong&gt;$2a$&lt;/strong&gt; identifies the bcrypt algorithm version used. Bcrypt was designed by the OpenBSD project to hash passwords for storage in the OpenBSD password file, and hashed passwords are stored with a prefix identifying the algorithm used. Bcrypt got the prefix &lt;code&gt;$2$&lt;/code&gt;, so besides &lt;code&gt;$2a$&lt;/code&gt; there are also &lt;code&gt;$2x$&lt;/code&gt;, &lt;code&gt;$2y$&lt;/code&gt; and &lt;code&gt;$2b$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;II) The second part &lt;code&gt;$10$&lt;/code&gt; is the cost factor (the number of salt rounds used while creating the salt string). If we do 15 rounds, the value will be &lt;code&gt;$15$&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;III) The third part begins with the first &lt;strong&gt;22&lt;/strong&gt; characters, which are the salt string. In this case it is &lt;em&gt;34rHf5RmJx1TZmZ7FM5BYe&lt;/em&gt;. The remaining 31 characters are the hashed password.&lt;/p&gt;
&lt;p&gt;In short, wikipedia gives this formula of bcrypt hashed password:&lt;br /&gt;
&lt;code&gt;$2&amp;lt;a/b/x/y&amp;gt;$[cost]$[22 character salt][31 character hash]&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In our example, the remaining string is the hashed password: &lt;strong&gt;&lt;code&gt;0BPXuw1bs6rYzzqyM7IXgN/VGcQmVMu&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;So basically, the stored value is the salt string plus the hashed password, which protects against rainbow table attacks.&lt;/p&gt;
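&lt;p&gt;The &lt;strong&gt;$&lt;/strong&gt;-delimited fields can be pulled apart with plain string operations. Below is a minimal sketch; the &lt;code&gt;parseBcryptHash&lt;/code&gt; helper is my own name for illustration, not part of bcryptjs:&lt;/p&gt;

```typescript
// Hypothetical helper (not part of bcryptjs): splits a bcrypt hash of the form
// $2<a/b/x/y>$[cost]$[22-char salt][31-char hash] into its fields.
function parseBcryptHash(stored: string) {
  // split('$') on "$2a$10$..." yields ['', '2a', '10', '<salt+hash>'].
  const [, version, cost, saltAndHash] = stored.split('$');
  return {
    version,                         // e.g. "2a"
    cost: parseInt(cost, 10),        // e.g. 10 rounds
    salt: saltAndHash.slice(0, 22),  // first 22 characters are the salt
    hash: saltAndHash.slice(22),     // remaining 31 characters are the hash
  };
}

const parts = parseBcryptHash('$2a$10$34rHf5RmJx1TZmZ7FM5BYe0BPXuw1bs6rYzzqyM7IXgN/VGcQmVMu');
console.log(parts.salt); // 34rHf5RmJx1TZmZ7FM5BYe
```

&lt;p&gt;This is exactly how &lt;code&gt;bcrypt.compare&lt;/code&gt; recovers the salt before re-hashing the candidate password.&lt;/p&gt;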
&lt;h1&gt;Step 4: Testing the Server&lt;/h1&gt;
&lt;p&gt;Now that the server is set up, I am going to test it using a tool called &lt;code&gt;httpie&lt;/code&gt;. Make sure you have installed it in the previous steps.&lt;/p&gt;
&lt;p&gt;First we need to launch the server from the command line (make sure you’re already in the &lt;code&gt;password-auth&lt;/code&gt; folder):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;npx ts-node ./app.ts
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./pwd-auth-03.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Open another terminal and test registration with &lt;code&gt;http&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo &apos;{&quot;username&quot;: &quot;testuser&quot;, &quot;password&quot;: &quot;testpassword&quot;}&apos; | http POST http://localhost:3000/register
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./pwd-auth-04.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This creates a user &lt;code&gt;testuser&lt;/code&gt; on the server with password &lt;code&gt;testpassword&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next, we will test login:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo &apos;{&quot;username&quot;: &quot;testuser&quot;, &quot;password&quot;: &quot;testpassword&quot;}&apos; | http POST http://localhost:3000/login
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./pwd-auth-05.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We can also test with a wrong password or a wrong username:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./pwd-auth-06-1.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./pwd-auth-07.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;✎ Make sure to send requests with the appropriate JSON data to the appropriate endpoints (&lt;code&gt;/register&lt;/code&gt; and &lt;code&gt;/login&lt;/code&gt;).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;In this blog, we explored password-based authentication in Node.js and TypeScript using bcrypt for secure password hashing. By understanding the fundamentals of bcrypt and its automatic management of salts, we created a simple authentication mechanism that stores and compares hashed passwords securely.&lt;/p&gt;
&lt;p&gt;We learned how to set up a Node.js and TypeScript environment on macOS, implemented an Express server with routes for user registration and login, and utilized bcrypt to securely hash and compare passwords.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The source code of this tutorial has been uploaded to &lt;a href=&quot;https://github.com/geekcoding101/Authentication101&quot;&gt;GeekCoding101 github repo&lt;/a&gt; as well, feel free to take a look.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In the next blogs, we will continue building upon this foundation of secure authentication. We will explore more authentication methods, such as Basic Authentication, Two-Factor Authentication (2FA), and token-based authentication using JSON Web Tokens (JWT) and so on.&lt;/p&gt;
&lt;p&gt;As we proceed, feel free to ask questions and provide feedback. I am here to support your journey towards building secure and reliable authentication solutions. So, stay tuned for the upcoming blogs 🎉🎉🎉&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>A Deep Dive into HTTP Basic Authentication</title><link>https://geekcoding101.com/posts/a-deep-dive-into-http-basic-authentication</link><guid isPermaLink="true">https://geekcoding101.com/posts/a-deep-dive-into-http-basic-authentication</guid><pubDate>Sun, 01 Oct 2023 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;In this blog post, we will dive into HTTP Basic Authentication, a method rooted in the principles outlined in RFC 7617.&lt;/p&gt;
&lt;p&gt;It’s worth noting that the RFC defines the use of the “Authorization” header in HTTP requests to transmit the credentials. The credentials are typically sent as a Base64-encoded string of the form &lt;code&gt;username:password&lt;/code&gt;. It also describes how servers should respond with appropriate status codes (e.g., 401 Unauthorized) when authentication fails.&lt;/p&gt;
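&lt;p&gt;In other words, a client builds the header by Base64-encoding &lt;code&gt;username:password&lt;/code&gt; and prefixing it with the &lt;code&gt;Basic&lt;/code&gt; scheme. Here is a minimal Node.js sketch of both directions; the helper names are my own, chosen for illustration:&lt;/p&gt;

```typescript
// Illustrative helpers (names are mine, not from any library).
// Per RFC 7617, the credentials are "username:password", Base64-encoded,
// sent with the "Basic" scheme in the Authorization header.
function buildBasicAuthHeader(username: string, password: string): string {
  const encoded = Buffer.from(`${username}:${password}`, 'utf-8').toString('base64');
  return `Basic ${encoded}`;
}

function decodeBasicAuthHeader(header: string): { username: string; password: string } {
  const encoded = header.split(' ')[1];
  const decoded = Buffer.from(encoded, 'base64').toString('utf-8');
  // Split on the FIRST colon only: usernames must not contain ":", passwords may.
  const sep = decoded.indexOf(':');
  return { username: decoded.slice(0, sep), password: decoded.slice(sep + 1) };
}

const header = buildBasicAuthHeader('testuser', 'testpassword');
console.log(decodeBasicAuthHeader(header)); // round-trips to the original pair
```

&lt;p&gt;Note that this is encoding, not encryption, which is why Basic Authentication must always ride over HTTPS.&lt;/p&gt;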
&lt;h1&gt;Step 1: Setting Up the Node.js and TypeScript Environment&lt;/h1&gt;
&lt;p&gt;Please refer to the steps explained in our previous blog post &lt;a href=&quot;/posts/password-authentication-in-node-js-a-step-by-step-guide&quot;&gt;Password Authentication In Node.Js: A Step-By-Step Guide&lt;/a&gt; at &lt;a href=&quot;/posts/password-authentication-in-node-js-a-step-by-step-guide#b71a&quot;&gt;Step 1: Setting Up the Node.js and TypeScript Environment&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Step 2: Creating the Server&lt;/h1&gt;
&lt;h2&gt;usersData.ts&lt;/h2&gt;
&lt;p&gt;In this file, we define a simulated user database. It starts empty; users registered at runtime are stored here with their bcrypt-hashed passwords. Each user has a &lt;code&gt;username&lt;/code&gt; and a &lt;code&gt;password&lt;/code&gt; field.&lt;/p&gt;
&lt;p&gt;This file acts as our database for the sake of this example.&lt;/p&gt;
&lt;p&gt;The usage of bcrypt also has been explained in &lt;a href=&quot;/posts/password-authentication-in-node-js-a-step-by-step-guide&quot;&gt;Password Authentication In Node.Js: A Step-By-Step Guide&lt;/a&gt; already.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;interface User {
    username: string;
    password: string;
}
  
const users: User[] = [];
  
export default users;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;code&gt;basicAuthMiddleware.ts&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;This file contains the basic authentication middleware. The middleware is responsible for authenticating users based on the credentials provided in the &lt;code&gt;Authorization&lt;/code&gt; header. It uses bcrypt to compare the provided password with the hashed password stored in the &lt;code&gt;usersData.ts&lt;/code&gt; file.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { Request, Response, NextFunction } from &apos;express&apos;;
import { Buffer } from &apos;buffer&apos;;
import bcrypt from &apos;bcryptjs&apos;;

interface User {
    username: string;
    password: string;
}

const basicAuthMiddleware = (users: User[]) =&amp;gt; async (req: Request, res: Response, next: NextFunction) =&amp;gt; {
    try {
        const authHeader = req.headers.authorization;
        if (!authHeader) {
            // If no authorization header is provided, send a 401 response with the WWW-Authenticate header
            // so that browser will pop up username/password dialog
            res.setHeader(&apos;WWW-Authenticate&apos;, &apos;Basic&apos;);
            return res.status(401).json({ error: &apos;Authorization header missing&apos; });
        }

        const credentials = Buffer.from(authHeader.split(&apos; &apos;)[1], &apos;base64&apos;).toString(&apos;utf-8&apos;);
        const [username, password] = credentials.split(&apos;:&apos;);

        const user = users.find((user) =&amp;gt; user.username === username);
        if (!user) {
            return res.status(401).json({ error: &apos;Invalid username&apos; });
        }

        const isPasswordValid = await bcrypt.compare(password, user.password);
        if (!isPasswordValid) {
            return res.status(401).json({ error: &apos;Invalid password&apos; });
        }

        next();
    } catch (error) {
        res.status(500).json({ error: &apos;Internal server error&apos; });
    }
};

export default basicAuthMiddleware;
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;In the interest of security, a production-ready authentication system should not provide explicit feedback on whether the username or password is invalid. However, the code examples provided in this article aim to illustrate the principles of Basic Authentication based on RFC 7617 and are intended for educational purposes. They demonstrate the basic mechanics of authentication but may not fully address all security concerns.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;&lt;code&gt;app.ts&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;app.ts&lt;/code&gt; file sets up the Express server, handles user registration, and protects a route using the basic authentication middleware.&lt;/p&gt;
&lt;p&gt;By implementing authentication in middleware, when the middleware detects invalid credentials, it directly sends the appropriate error response, and the route handler will not be executed.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;app.ts&lt;/code&gt; imports the users&apos; data from &lt;code&gt;usersData.ts&lt;/code&gt; and creates the middleware by passing the users&apos; data as an argument to &lt;code&gt;basicAuthMiddleware&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/register&lt;/code&gt; endpoint takes &lt;code&gt;{&quot;username&quot;: &quot;your_name&quot;, &quot;password&quot;: &quot;your_password&quot;}&lt;/code&gt; as input.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;/protected&lt;/code&gt; endpoint is used to verify account credentials.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import express from &apos;express&apos;;
import bodyParser from &apos;body-parser&apos;;
import bcrypt from &apos;bcryptjs&apos;;
import basicAuthMiddleware from &apos;./basicAuthMiddleware&apos;; // Import the basicAuthMiddleware
import users from &apos;./usersData&apos;; // Import the users data

const app = express();
const PORT = 3001;

app.use(bodyParser.json());

// User registration route
app.post(&apos;/register&apos;, async (req, res) =&amp;gt; {
    try {
        const { username, password } = req.body;

        // Check if the user already exists
        if (users.some((user) =&amp;gt; user.username === username)) {
            return res.status(400).json({ error: &apos;Username already exists&apos; });
        }

        // Hash the password using bcrypt
        const salt = await bcrypt.genSalt(10);
        const hashedPassword = await bcrypt.hash(password, salt);

        // Save the user in the database (in this example, we&apos;re using an in-memory array)
        const newUser = { username, password: hashedPassword };
        users.push(newUser);

        res.status(201).json({ message: &apos;User registered successfully!&apos; });
    } catch (error) {
        res.status(500).json({ error: &apos;Internal server error&apos; });
    }
});

// Create the basicAuthMiddleware with the users array as an argument
const authMiddleware = basicAuthMiddleware(users);

// Use the authMiddleware to protect a route
app.get(&apos;/protected&apos;, authMiddleware, (req, res) =&amp;gt; {
    res.json({ message: &apos;You have successfully accessed the protected route!&apos; });
});

app.listen(PORT, () =&amp;gt; {
    console.log(`Server is running on http://localhost:${PORT}`);
});
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Step 3: Testing the Server&lt;/h1&gt;
&lt;p&gt;Launch the server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;npx ts-node ./app.ts
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Open another terminal and run the command below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo &apos;{&quot;username&quot;: &quot;testuser01&quot;, &quot;password&quot;: &quot;testpassword01&quot;}&apos; | http POST http://localhost:3001/register
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a user &lt;code&gt;testuser01&lt;/code&gt; on the server with password &lt;code&gt;testpassword01&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let’s try to access the protected URI &lt;code&gt;/protected&lt;/code&gt; :&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./http-auth-01.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Using an online Base64 decoder (&lt;a href=&quot;https://emn178.github.io/online-tools/base64_decode.html&quot;&gt;link here&lt;/a&gt;), we can see the decoded credentials:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./http-auth-02.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If you try to access it from a browser, the browser will automatically pop up the username/password dialog as shown below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./http-auth-03.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Pros and Cons&lt;/h1&gt;
&lt;h2&gt;Pros:&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Simplicity: HTTP Basic Authentication is easy to implement and understand. It requires minimal additional overhead for client and server implementations.&lt;/li&gt;
&lt;li&gt;Standardization: It is a standardized authentication method supported by most web browsers and server frameworks.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Cons:&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Security:&lt;/strong&gt; The credentials are Base64-encoded but not encrypted. This means they can be intercepted if transmitted over an insecure network. It’s crucial to use HTTPS to mitigate this issue.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No Built-in Password Hashing:&lt;/strong&gt; Basic Authentication does not provide built-in mechanisms for securely storing or hashing passwords. Implementing password hashing and salting is the responsibility of the application developer. Like in our article, we have to implement password hashing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited Features:&lt;/strong&gt; It lacks advanced features like multi-factor authentication (MFA) or token-based authentication, which are often needed for more robust security.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No Session Management:&lt;/strong&gt; Basic Authentication does not manage user sessions. If session management is required, it needs to be implemented separately.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User Experience:&lt;/strong&gt; While browsers handle the credential prompt, the user experience can be intrusive, especially for web applications.&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;HTTP Basic Authentication is a straightforward method for securing web resources. It serves well for simple use cases but may not be suitable for applications requiring more advanced security measures.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The source code of this tutorial has been uploaded to &lt;a href=&quot;https://github.com/geekcoding101/Authentication101&quot;&gt;GeekCoding101 github repo&lt;/a&gt; as well, feel free to take a look.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In the next blog, we will explore more advanced authentication methods, including token-based authentication using JSON Web Tokens (JWT).&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>OAuth 2.0 Grant Types</title><link>https://geekcoding101.com/posts/oauth-grant-types</link><guid isPermaLink="true">https://geekcoding101.com/posts/oauth-grant-types</guid><pubDate>Thu, 30 Nov 2023 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;List of Grant Types&lt;/h1&gt;
&lt;p&gt;Below is a table summarizing the different grant types in OAuth 2.0 along with brief descriptions and recommendations regarding their use:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Grant Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Authorization Code&lt;/td&gt;
&lt;td&gt;The most commonly used flow in OAuth 2.0. It involves the exchange of an authorization code for an access token. Suitable for server-side web applications and confidential clients.&lt;/td&gt;
&lt;td&gt;Recommended for web applications and confidential clients.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implicit&lt;/td&gt;
&lt;td&gt;Designed for user-agent-based clients (e.g., browser-based JavaScript applications). Access token is returned directly to the client without an authorization code exchange.&lt;/td&gt;
&lt;td&gt;Deprecated due to security concerns.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Owner Password Credentials&lt;/td&gt;
&lt;td&gt;Allows the client to exchange the user&apos;s username and password for an access token directly. Generally discouraged due to security implications and lack of federation support.&lt;/td&gt;
&lt;td&gt;Not recommended except in unavoidable legacy scenarios.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Client Credentials&lt;/td&gt;
&lt;td&gt;Enables clients to directly exchange client credentials (client ID and client secret) for an access token. Typically used for machine-to-machine communication.&lt;/td&gt;
&lt;td&gt;Recommended for machine-to-machine communication.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refresh Token&lt;/td&gt;
&lt;td&gt;Allows clients to request a new access token without requiring the user to re-authenticate. It&apos;s not a grant type but rather a mechanism for obtaining new access tokens.&lt;/td&gt;
&lt;td&gt;Recommended for long-lived sessions and offline access.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It&apos;s important to note that while some grant types are deprecated or discouraged because of security concerns or limited use cases, applicability can still vary with specific requirements. In general, prefer the authorization code flow whenever possible for enhanced security and flexibility.&lt;/p&gt;
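&lt;p&gt;As a quick illustration of the Client Credentials grant, here is a minimal TypeScript sketch of the token request a machine-to-machine client would send. The endpoint URL, client ID, and secret below are placeholders, not a real provider&apos;s values.&lt;/p&gt;

```typescript
// Hypothetical sketch of a Client Credentials token request. The endpoint
// URL and credentials are placeholders, not a real provider's values.
const tokenEndpoint = "https://auth.example.com/oauth/token";

function buildClientCredentialsRequest(clientId: string, clientSecret: string) {
  // The client credentials grant is a single POST to the token endpoint
  // with a form-encoded body; no user interaction is involved.
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: clientId,
    client_secret: clientSecret,
  });
  return {
    url: tokenEndpoint,
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  };
}

const request = buildClientCredentialsRequest("my-client", "my-secret");
console.log(request.body);
```

&lt;p&gt;The client would POST this body to the token endpoint and typically receive a JSON response containing the access token.&lt;/p&gt;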
&lt;h1&gt;Is PKCE A Grant Type?&lt;/h1&gt;
&lt;p&gt;No, PKCE (Proof Key for Code Exchange) is not a grant type in OAuth 2.0.&lt;/p&gt;
&lt;p&gt;Instead, PKCE is an extension to the OAuth 2.0 authorization code flow, designed to enhance security, particularly in scenarios where the client secret cannot be reliably stored, such as mobile or native applications.&lt;/p&gt;
&lt;p&gt;In the OAuth 2.0 authorization code flow with PKCE, the core grant type remains the &quot;Authorization Code&quot; grant type.&lt;/p&gt;
&lt;p&gt;PKCE introduces additional security measures during the authorization code exchange process by utilizing a dynamically generated secret (the code verifier) and a hash-based challenge (the code challenge). This mechanism helps mitigate certain security risks, such as authorization code interception attacks.&lt;/p&gt;
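&lt;p&gt;To make this concrete, below is a minimal sketch of the PKCE pieces described above: generating a random code verifier and deriving its S256 code challenge. It follows RFC 7636; the function names are my own.&lt;/p&gt;

```typescript
import * as crypto from "crypto";

// Sketch of the PKCE pieces per RFC 7636; function names are my own.
// base64url encoding without padding, as PKCE requires.
function base64Url(buf: Buffer): string {
  return buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=/g, "");
}

// The client generates a random code verifier before the redirect...
function createCodeVerifier(): string {
  return base64Url(crypto.randomBytes(32)); // 43-character URL-safe string
}

// ...and sends the SHA-256 hash of it (the S256 code challenge) to the
// authorization endpoint; the verifier itself goes only to the token endpoint.
function createCodeChallenge(verifier: string): string {
  return base64Url(crypto.createHash("sha256").update(verifier).digest());
}

const verifier = createCodeVerifier();
console.log(verifier, createCodeChallenge(verifier));
```

&lt;p&gt;Because only the hash travels through the browser in the first leg, an attacker who intercepts the authorization code still cannot redeem it without the original verifier.&lt;/p&gt;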
</content:encoded><author>GeekCoding101</author></item><item><title>OAuth 2.0 Authorization Code Flow</title><link>https://geekcoding101.com/posts/oauth-2-0-authorization-code-flow</link><guid isPermaLink="true">https://geekcoding101.com/posts/oauth-2-0-authorization-code-flow</guid><pubDate>Sun, 03 Dec 2023 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Brief Description&lt;/h1&gt;
&lt;p&gt;The OAuth 2.0 authorization code flow is a secure and widely adopted method for obtaining access tokens to access user resources on behalf of the user.&lt;/p&gt;
&lt;h1&gt;Steps&lt;/h1&gt;
&lt;p&gt;Here&apos;s a summary of the steps in the authorization code flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Client Initiation:&lt;/strong&gt; The client application initiates the authorization process by redirecting the user to the authorization server&apos;s authorization endpoint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Authentication and Consent:&lt;/strong&gt; The user is prompted to authenticate with the authorization server and grant permission to the client application to access their resources.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Authorization Code Generation:&lt;/strong&gt; Upon successful authentication and consent, the authorization server generates an authorization code and redirects the user back to the client application along with the authorization code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Access Token Exchange:&lt;/strong&gt; The client application exchanges the authorization code for an access token by making a request to the authorization server&apos;s token endpoint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Access Token Usage:&lt;/strong&gt; The client application uses the access token to access the user&apos;s protected resources, such as APIs or data endpoints.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
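&lt;p&gt;Steps 1 and 4 above can be sketched as follows. This is a hypothetical illustration: the authorization server URLs, client credentials, and redirect URI are all placeholder values.&lt;/p&gt;

```typescript
// Hypothetical sketch of steps 1 and 4. All URLs, client credentials, and
// the redirect URI are placeholder values.

// Step 1: the URL the client redirects the user to.
function buildAuthorizeUrl(clientId: string, redirectUri: string, state: string): string {
  const params = new URLSearchParams({
    response_type: "code",
    client_id: clientId,
    redirect_uri: redirectUri,
    scope: "profile",
    state, // random value the client re-checks on the redirect back
  });
  return "https://auth.example.com/authorize?" + params.toString();
}

// Step 4: the form body the client POSTs to the token endpoint,
// exchanging the authorization code for an access token.
function buildTokenExchangeBody(code: string, clientId: string, clientSecret: string, redirectUri: string): string {
  return new URLSearchParams({
    grant_type: "authorization_code",
    code,
    client_id: clientId,
    client_secret: clientSecret,
    redirect_uri: redirectUri,
  }).toString();
}

console.log(buildAuthorizeUrl("demo-client", "https://app.example.com/cb", "xyz123"));
```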
&lt;p&gt;To clarify, in the authorization code flow, the authorization endpoint issues an authorization code to the client application upon user consent, not an access token directly.&lt;/p&gt;
&lt;h1&gt;Why Doesn&apos;t the Authorization Code Flow Issue an Access Token Directly?&lt;/h1&gt;
&lt;p&gt;The OAuth 2.0 authorization code flow is designed to enhance security and minimize certain risks associated with transmitting sensitive information, such as access tokens, through the user&apos;s browser or mobile device.&lt;/p&gt;
&lt;p&gt;Here are some reasons why the authorization endpoint issues an authorization code instead of an access token directly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reduced Exposure of Access Tokens&lt;/strong&gt;: Access tokens are sensitive pieces of information that grant access to the user&apos;s protected resources. By issuing an authorization code instead of an access token directly, the authorization server reduces the exposure of access tokens to potentially compromised user agents (such as web browsers or mobile apps). Since the authorization code is short-lived and can only be exchanged for tokens by the client application with its credentials, the risk associated with the interception of the authorization code is lower than if an access token were transmitted directly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Separation of Concerns&lt;/strong&gt;: Separating the authorization process into two steps—obtaining the authorization code and exchanging it for an access token—helps clarify the roles and responsibilities of different components in the OAuth 2.0 flow. The authorization endpoint is responsible for handling user consent and authentication, while the token endpoint is responsible for issuing access tokens based on valid authorization codes and client credentials. This separation enhances the security and maintainability of the OAuth 2.0 protocol.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Support for Additional Security Measures&lt;/strong&gt;: The authorization code flow allows for the implementation of additional security measures, such as client authentication at the token endpoint using client credentials (client ID and client secret), which helps verify the identity of the client application before issuing access tokens. This adds an extra layer of security to the token issuance process and helps prevent unauthorized access to user resources.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Overall, by issuing an authorization code instead of an access token directly, the OAuth 2.0 authorization code flow aims to improve security, reduce exposure to sensitive information, and provide a clear separation of concerns in the authentication and authorization process.&lt;/p&gt;
&lt;h1&gt;Benefits of Authorization Code Flow&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Enhanced Security: By separating the authorization and token exchange steps, it reduces the risk of exposing sensitive information, such as access tokens, during the authorization process.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;User Consent: Users have control over which resources the client application can access, ensuring privacy and security.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scalability: The authorization code flow is well-suited for a wide range of client types, including web applications, mobile apps, and desktop applications.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Refresh Token Support: It supports the use of refresh tokens, allowing clients to obtain new access tokens without requiring user interaction.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Unlocking Web Security: Master JWT Authentication</title><link>https://geekcoding101.com/posts/unlocking-web-security-master-jwt-authentication</link><guid isPermaLink="true">https://geekcoding101.com/posts/unlocking-web-security-master-jwt-authentication</guid><pubDate>Mon, 15 Jan 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;JSON Web Tokens (JWTs) play a crucial role in web application security. In this blog, we walk through the concept of JWT, focusing on the different types of claims, the structure of a JWT, and the algorithms used in signatures; finally, I will implement JWT authentication from scratch in Node.js and Express.js.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This is my 4th article in Auth101! It’s 2024 now! Looking forward to a wonderful year filled with cool tech updates, new tricks in cyber security, and a bunch of fun coding adventures. I can’t wait to dive into more authentication topics with you all 😃&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;Understanding JWT&lt;/h1&gt;
&lt;p&gt;JSON Web Tokens (JWTs) originated as a compact and self-contained way for securely transmitting information between parties as a JSON object. Defined in RFC 7519, JWTs have become a widely adopted standard in the field of web security for their simplicity and versatility.&lt;/p&gt;
&lt;p&gt;A JWT is a string comprising three parts separated by dots (&lt;code&gt;.&lt;/code&gt;): Base64Url encoded header, Base64Url encoded payload, and signature.&lt;/p&gt;
&lt;p&gt;It typically looks like &lt;code&gt;xxxxx.yyyyy.zzzzz&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let’s deep dive into the three parts: Header, Payload, and Signature.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Header&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The header typically consists of the token type and the signing algorithm, such as HMAC SHA256 or RSA.&lt;/p&gt;
&lt;p&gt;For example:&lt;code&gt;{ &quot;alg&quot;: &quot;HS256&quot;, &quot;typ&quot;: &quot;JWT&quot; }&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Payload&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The payload contains claims, which are statements about an entity and additional metadata. Claims are categorized into &lt;strong&gt;registered&lt;/strong&gt;, &lt;strong&gt;public&lt;/strong&gt;, and &lt;strong&gt;private&lt;/strong&gt; claims. The latter two are for custom claims. Public claims are collision-resistant, while private claims are subject to possible collisions. In a JWT, a claim appears as a name/value pair where the name is always a string and the value can be any JSON value. For example, the following JSON object contains three claims (&lt;code&gt;sub&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;admin&lt;/code&gt;):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;sub&quot;: &quot;1234567890&quot;,
  &quot;name&quot;: &quot;Tom Green&quot;,
  &quot;admin&quot;: false
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;1. Registered Claims&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These are predefined claim names with specific meanings recommended for interoperability. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;iss&lt;/code&gt; (Issuer): Identifies the principal that issued the JWT.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sub&lt;/code&gt; (Subject): Identifies the principal that is the subject of the JWT.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;aud&lt;/code&gt; (Audience): Identifies the recipients that the JWT is intended for.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;exp&lt;/code&gt; (Expiration Time): Identifies the expiration time on or after which the JWT must not be accepted for processing.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;nbf&lt;/code&gt; (Not Before): Identifies the time before which the JWT must not be accepted for processing.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;iat&lt;/code&gt; (Issued At): Identifies the time at which the JWT was issued.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;jti&lt;/code&gt; (JWT ID): Unique identifier; can be used to prevent the JWT from being replayed (allows a token to be used only once).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can see a full list of registered claims at the &lt;a href=&quot;https://www.iana.org/assignments/jwt/jwt.xhtml#claims&quot;&gt;IANA JSON Web Token Claims Registry&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Public Claims&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These can be defined at will, but to avoid collisions they should be registered in the IANA JSON Web Token Claims Registry or use a collision-resistant name such as a URI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Private Claims&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;These are custom claims created to share information between parties that agree on using them.&lt;/p&gt;
&lt;p&gt;When creating custom claims for JWTs that are specific to your application, it’s often beneficial to use namespacing. This ensures that your claims are unique and do not conflict with other standard or custom claims. Here’s an example of how to implement namespaced custom claims:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;https://yourdomain.com/claims/user_type&quot;: &quot;admin&quot;,
  &quot;https://yourdomain.com/claims/access_level&quot;: &quot;5&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this example, custom claims are prefixed with a URL (&lt;code&gt;https://yourdomain.com/claims/&lt;/code&gt;) that is under your control. This URL acts as a namespace, reducing the likelihood of your claims conflicting with others. The claims &lt;code&gt;user_type&lt;/code&gt; and &lt;code&gt;access_level&lt;/code&gt; are specific to the application and are namespaced to ensure uniqueness.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Signature&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The signature is created by taking the encoded header, payload, and a secret, then signing it with the algorithm specified in the header. The signature verifies that the sender of the JWT is who it says it is and ensures that the message wasn’t changed along the way.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;An example JWT for a user &lt;code&gt;john.doe&lt;/code&gt; using HMAC SHA256 might look like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Header: &lt;code&gt;{ &quot;alg&quot;: &quot;HS256&quot;, &quot;typ&quot;: &quot;JWT&quot; }&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Payload: &lt;code&gt;{ &quot;sub&quot;: &quot;john.doe&quot;, &quot;name&quot;: &quot;John Doe&quot;, &quot;admin&quot;: false, &quot;iat&quot;: 1615070800 }&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Signature: Cryptographic signature generated from the header, payload, and secret key. We will see the implementation later.&lt;/li&gt;
&lt;/ul&gt;
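&lt;p&gt;Before moving on to the implementation, here is a small sketch of how a verifier might apply the time-based registered claims (&lt;code&gt;exp&lt;/code&gt; and &lt;code&gt;nbf&lt;/code&gt;) described above. The function name and the inclusive expiry cutoff are my own choices.&lt;/p&gt;

```typescript
// Sketch of applying the time-based registered claims; the function name
// and the inclusive expiry cutoff are my own choices.
// `payload` is assumed to be an already-decoded JWT payload object.
function checkTimeClaims(payload: { exp?: number; nbf?: number }, now: number = Math.floor(Date.now() / 1000)): boolean {
  if (payload.exp !== undefined) {
    if (now >= payload.exp) return false; // exp: must not accept on or after this time
  }
  if (payload.nbf !== undefined) {
    if (payload.nbf > now) return false; // nbf: must not accept before this time
  }
  return true;
}

console.log(checkTimeClaims({ exp: 1700000000 }, 1600000000)); // true at that earlier time
```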
&lt;h1&gt;Implementing JWT Authentication&lt;/h1&gt;
&lt;h1&gt;Step 1: Setting Up the Node.js and TypeScript Environment&lt;/h1&gt;
&lt;p&gt;Please refer to the steps explained in our previous blog post &lt;a href=&quot;/posts/password-authentication-in-node-js-a-step-by-step-guide&quot;&gt;Password Authentication In Node.Js: A Step-By-Step Guide&lt;/a&gt; at &lt;a href=&quot;/posts/password-authentication-in-node-js-a-step-by-step-guide#b71a&quot;&gt;Step 1: Setting Up the Node.js and TypeScript Environment&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Step 2: Creating the Server&lt;/h1&gt;
&lt;h2&gt;usersData.ts&lt;/h2&gt;
&lt;p&gt;This is the same as the file &lt;a href=&quot;/technologies/security/03-http-basic-authentication/#23f9&quot;&gt;usersData.ts&lt;/a&gt; we discussed in &lt;a href=&quot;/technologies/security/03-http-basic-authentication/&quot;&gt;A Deep Dive Into HTTP Basic Authentication&lt;/a&gt;, except that we added a new field called &lt;code&gt;refreshToken&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;interface User {
  username: string;
  password: string;
  refreshToken?: string;
}

const users: User[] = [];

export default users;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;jwt.ts&lt;/h2&gt;
&lt;p&gt;I decided to do a custom implementation for generating and verifying JWTs (JSON Web Tokens) without using external libraries like &lt;code&gt;jsonwebtoken&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;It provides the following functionality:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Base64 URL Encoding Function (&lt;code&gt;base64UrlEncode&lt;/code&gt;):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Converts a &lt;code&gt;Buffer&lt;/code&gt; object to a Base64 URL-encoded string. This is necessary because standard Base64 encoding includes characters (&lt;code&gt;+&lt;/code&gt;, &lt;code&gt;/&lt;/code&gt;, and &lt;code&gt;=&lt;/code&gt;) that are not URL-safe. The function replaces these characters to make the string URL-safe.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Signature Function (&lt;code&gt;sign&lt;/code&gt;):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Takes the encoded header, payload, and secret key, then generates a signature using HMAC SHA256.&lt;/li&gt;
&lt;li&gt;The resulting signature is then Base64 URL-encoded.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Generate Access Token Function (&lt;code&gt;generateAccessToken&lt;/code&gt;):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Creates a JWT with a header specifying the algorithm (&lt;code&gt;HS256&lt;/code&gt;) and token type (&lt;code&gt;JWT&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The payload includes the &lt;code&gt;username&lt;/code&gt; and an &lt;code&gt;exp&lt;/code&gt; (expiration time), set to 15 minutes from the current time.&lt;/li&gt;
&lt;li&gt;The header and payload are Base64 URL-encoded and concatenated with a period, and then signed to generate the JWT.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Generate Refresh Token Function (&lt;code&gt;generateRefreshToken&lt;/code&gt;):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Similar to the access token, but the payload includes a longer expiration time (7 days) and an additional &lt;code&gt;type&lt;/code&gt; field set to &lt;code&gt;&apos;refresh&apos;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;This token is used to obtain new access tokens without requiring the user to log in again.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Verify Token Function (&lt;code&gt;verifyToken&lt;/code&gt;):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Splits the JWT into its components (header, payload, signature).&lt;/li&gt;
&lt;li&gt;Regenerates the signature based on the header and payload from the token and compares it with the received signature.&lt;/li&gt;
&lt;li&gt;If the signatures match, the function returns the decoded payload; otherwise, it throws an error indicating an invalid token.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;secretKey:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;secretKey&lt;/code&gt; must be kept confidential and secure because it is essentially the &quot;key&quot; that locks and unlocks the JWTs. If an unauthorized party gains access to the &lt;code&gt;secretKey&lt;/code&gt;, they could potentially generate their own valid tokens or tamper with existing tokens, leading to security breaches.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import * as crypto from &apos;crypto&apos;;

const secretKey = &apos;your_secret_key&apos;; // Use a strong secret key, ideally loaded from an environment variable

const base64UrlEncode = (str: Buffer): string =&amp;gt; {
  return str.toString(&apos;base64&apos;)
    .replace(/\+/g, &apos;-&apos;)
    .replace(/\//g, &apos;_&apos;)
    .replace(/=/g, &apos;&apos;);
};

const sign = (header: string, payload: string, secret: string): string =&amp;gt; {
  // Note: digesting to a base64 string and then base64url-encoding that
  // string again is non-standard (a typical JWT library base64url-encodes
  // the raw digest bytes), but it is internally consistent in this tutorial.
  const signature = crypto.createHmac(&apos;SHA256&apos;, secret)
    .update(`${header}.${payload}`)
    .digest(&apos;base64&apos;);
  return base64UrlEncode(Buffer.from(signature));
};

export const generateAccessToken = (username: string): string =&amp;gt; {
  const header = { alg: &apos;HS256&apos;, typ: &apos;JWT&apos; };
  const payload = { username, exp: Math.floor(Date.now() / 1000) + (15 * 60) }; // 15 minutes expiry
  const encodedHeader = base64UrlEncode(Buffer.from(JSON.stringify(header)));
  const encodedPayload = base64UrlEncode(Buffer.from(JSON.stringify(payload)));
  const signature = sign(encodedHeader, encodedPayload, secretKey);
  return `${encodedHeader}.${encodedPayload}.${signature}`;
};

export const generateRefreshToken = (username: string): string =&amp;gt; {
  const header = { alg: &apos;HS256&apos;, typ: &apos;JWT&apos; };
  const payload = { username, type: &apos;refresh&apos;, exp: Math.floor(Date.now() / 1000) + (7 * 24 * 60 * 60) }; // 7 days expiry
  const encodedHeader = base64UrlEncode(Buffer.from(JSON.stringify(header)));
  const encodedPayload = base64UrlEncode(Buffer.from(JSON.stringify(payload)));
  const signature = sign(encodedHeader, encodedPayload, secretKey);
  return `${encodedHeader}.${encodedPayload}.${signature}`;
};

export const verifyToken = (token: string): any =&amp;gt; {
  const [encodedHeader, encodedPayload, signature] = token.split(&apos;.&apos;);
  const verifiedSignature = sign(encodedHeader, encodedPayload, secretKey);
  if (verifiedSignature !== signature) {
    throw new Error(&apos;Invalid token&apos;);
  }
  // &apos;base64url&apos; (Node 15.7+) decodes the URL-safe alphabet correctly; a
  // production implementation should also reject tokens whose exp has passed.
  return JSON.parse(Buffer.from(encodedPayload, &apos;base64url&apos;).toString());
};
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;jwtAuthMiddleware.ts&lt;/h2&gt;
&lt;p&gt;This middleware is designed to handle JWT authentication for incoming HTTP requests.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { Request, Response, NextFunction } from &apos;express&apos;;
import { verifyToken } from &apos;./jwt&apos;;

export const jwtAuthMiddleware = (req: Request, res: Response, next: NextFunction) =&amp;gt; {
    try {
        const authHeader = req.headers.authorization;
        if (!authHeader) {
            return res.status(401).json({ error: &apos;Authorization header missing&apos; });
        }

        const token = authHeader.split(&apos; &apos;)[1];
        const decodedUser = verifyToken(token);

        // Create a closure to pass the decoded user
        (req as any).getUser = () =&amp;gt; decodedUser;

        next();
    } catch (error) {
        res.status(401).json({ error: &apos;Invalid token&apos; });
    }
};
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;app.ts&lt;/h2&gt;
&lt;p&gt;Now let’s assemble all in app.ts.&lt;/p&gt;
&lt;p&gt;It includes routes for user registration, deletion, listing, login, token refresh, and accessing a protected route.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import express, { Request, Response } from &apos;express&apos;;
import bodyParser from &apos;body-parser&apos;;
import bcrypt from &apos;bcrypt&apos;;
import users from &apos;./usersData&apos;;
import { generateAccessToken, generateRefreshToken, verifyToken } from &apos;./jwt&apos;;
import { jwtAuthMiddleware } from &apos;./jwtAuthMiddleware&apos;;

const app = express();
const PORT = 3001;

app.use(bodyParser.json());

// User registration route
app.post(&apos;/register&apos;, async (req: Request, res: Response) =&amp;gt; {
    try {
        const { username, password } = req.body;

        // Check if the user already exists
        if (users.some((user) =&amp;gt; user.username === username)) {
            return res.status(400).json({ error: &apos;Username already exists&apos; });
        }

        // Hash the password using bcrypt
        const salt = await bcrypt.genSalt(10);
        const hashedPassword = await bcrypt.hash(password, salt);

        // Save the user
        const newUser = { username, password: hashedPassword };
        users.push(newUser);

        res.status(201).json({ message: &apos;User registered successfully!&apos; });
    } catch (error) {
        res.status(500).json({ error: &apos;Internal server error&apos; });
    }
});

app.delete(&apos;/user/:username&apos;, (req: Request, res: Response) =&amp;gt; {
  const { username } = req.params;
  const userIndex = users.findIndex(user =&amp;gt; user.username === username);

  if (userIndex === -1) {
      return res.status(404).json({ error: &apos;User not found&apos; });
  }

  users.splice(userIndex, 1);
  res.status(200).json({ message: `User ${username} deleted successfully` });
});

app.get(&apos;/users&apos;, (req: Request, res: Response) =&amp;gt; {
  const usersWithoutPasswords = users.map(({ password, ...userWithoutPassword }) =&amp;gt; userWithoutPassword);
  res.json(usersWithoutPasswords);
});

// User login route
app.post(&apos;/login&apos;, async (req: Request, res: Response) =&amp;gt; {
    try {
        const { username, password } = req.body;
        const user = users.find((user) =&amp;gt; user.username === username);

        if (!user) {
            return res.status(401).json({ error: &apos;Invalid username&apos; });
        }

        const isPasswordValid = await bcrypt.compare(password, user.password);
        if (!isPasswordValid) {
            return res.status(401).json({ error: &apos;Invalid password&apos; });
        }

        const accessToken = generateAccessToken(username);
        const refreshToken = generateRefreshToken(username);
        res.json({ accessToken, refreshToken });
    } catch (error) {
        res.status(500).json({ error: &apos;Internal server error&apos; });
    }
});

// Refresh token route
app.post(&apos;/refresh&apos;, (req: Request, res: Response) =&amp;gt; {
    const { refreshToken } = req.body;
    try {
        const decoded = verifyToken(refreshToken);
        if (decoded.type !== &apos;refresh&apos;) {
            return res.status(401).json({ error: &apos;Invalid refresh token&apos; });
        }
        const newAccessToken = generateAccessToken(decoded.username);
        res.json({ accessToken: newAccessToken });
    } catch (error) {
        res.status(401).json({ error: &apos;Invalid refresh token&apos; });
    }
});

// Protected route
app.get(&apos;/protected&apos;, jwtAuthMiddleware, (req: Request, res: Response) =&amp;gt; {
  const user = (req as any).getUser();
  res.json({ message: &apos;Protected route accessed&apos;, user });
});

app.listen(PORT, () =&amp;gt; {
    console.log(`Server is running on http://localhost:${PORT}`);
});
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Step 3: Testing the Server&lt;/h1&gt;
&lt;p&gt;Launch the server:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;npx ts-node ./app.ts
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Open another terminal and run the command below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo &apos;{&quot;username&quot;: &quot;testuser01&quot;, &quot;password&quot;: &quot;testpassword01&quot;}&apos; | http POST http://localhost:3001/register
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This creates a user &lt;code&gt;testuser01&lt;/code&gt; on the server with password &lt;code&gt;testpassword01&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now log in with this user to get an &lt;code&gt;accessToken&lt;/code&gt; and &lt;code&gt;refreshToken&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# echo &apos;{&quot;username&quot;: &quot;testuser01&quot;, &quot;password&quot;: &quot;testpassword01&quot;}&apos; | http POST http://localhost:3001/login

HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 365
Content-Type: application/json; charset=utf-8
Date: Mon, 15 Jan 2024 00:53:22 GMT
ETag: W/&quot;16d-IsPcbeAuThyiqhEWd7jZTpqMHlQ&quot;
Keep-Alive: timeout=5
X-Powered-By: Express

{
  &quot;accessToken&quot;: &quot;eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3R1c2VyMDEiLCJleHAiOjE3MDUyODA5MDJ9.Q0MrUXA4RGI1Smc4SUFmYjV6UFlFRmhWL2NsV20rTHppSlpHemZjSWdsZz0&quot;,
  &quot;refreshToken&quot;: &quot;eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3R1c2VyMDEiLCJ0eXBlIjoicmVmcmVzaCIsImV4cCI6MTcwNTg4NDgwMn0.TWozZlNvVnhBODJuUjFLc2JVcDRZT2hxZmFSNU9nR01MK3gvNTRnSlNWRT0&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s try to access the protected URI &lt;code&gt;/protected&lt;/code&gt; with the &lt;code&gt;accessToken&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ http GET http://localhost:3001/protected &quot;Authorization:Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3R1c2VyMDEiLCJleHAiOjE3MDUyODA1NjN9.TXlFR0NUMFZKOXJRVTgvYzFaaGZ5R0JMSTAwdVF3YkNRN1dUa1FQbG9NVT0&quot;
HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 88
Content-Type: application/json; charset=utf-8
Date: Mon, 15 Jan 2024 00:50:57 GMT
ETag: W/&quot;58-CkXXzga6an0r8ICmEq1Q9VAps9I&quot;
Keep-Alive: timeout=5
X-Powered-By: Express

{
  &quot;message&quot;: &quot;Protected route accessed&quot;,
  &quot;user&quot;: {
    &quot;exp&quot;: 1705280563,
    &quot;username&quot;: &quot;testuser01&quot;
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So far so good!&lt;/p&gt;
&lt;p&gt;Now let’s use the &lt;code&gt;refreshToken&lt;/code&gt; to request a new &lt;code&gt;accessToken&lt;/code&gt; :&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ http POST http://localhost:3001/refresh &apos;Content-Type:application/json&apos; &amp;lt;&amp;lt;&amp;lt; &apos;{&quot;refreshToken&quot;:&quot;eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3R1c2VyMDEiLCJ0eXBlIjoicmVmcmVzaCIsImV4cCI6MTcwNTg4NDQ2M30.MDNHMzI0MEd5SXJCQXRZVCtxVEdCWVVOeDd5Z2F4cXlyaU9xYzB0dTFBWT0&quot;}&apos;

HTTP/1.1 200 OK
Connection: keep-alive
Content-Length: 171
Content-Type: application/json; charset=utf-8
Date: Mon, 15 Jan 2024 00:53:09 GMT
ETag: W/&quot;ab-TBXJ1UxTtbjzvvrwtTfhDXlSc1Q&quot;
Keep-Alive: timeout=5
X-Powered-By: Express

{
  &quot;accessToken&quot;: &quot;eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3R1c2VyMDEiLCJleHAiOjE3MDUyODA4ODl9.WkQzeWJTMU80R0dycFhNc1ZLWTVTZjBjbGZwTEpwMi96RFI1Mnh6ZkNIWT0&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By the way, there are online JWT encoders/decoders you can use, for example &lt;a href=&quot;https://www.jstoolset.com/jwt&quot;&gt;https://www.jstoolset.com/jwt&lt;/a&gt;. Just paste the JWT string and it will decode the &lt;code&gt;header&lt;/code&gt; and &lt;code&gt;payload&lt;/code&gt; for you.&lt;/p&gt;
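&lt;p&gt;If you prefer not to paste tokens into an online tool (a real token is a credential, after all), a few lines of local code can do the same decoding. This is a sketch; note that decoding a JWT requires no secret, only verifying the signature does.&lt;/p&gt;

```typescript
// Sketch of decoding a JWT locally; no secret is needed to decode,
// only to verify the signature.
function decodeJwt(token: string): { header: any; payload: any } {
  const parts = token.split(".");
  // Convert base64url to standard base64, then parse the JSON inside.
  const fromB64Url = (s: string) =>
    JSON.parse(Buffer.from(s.replace(/-/g, "+").replace(/_/g, "/"), "base64").toString());
  return { header: fromB64Url(parts[0]), payload: fromB64Url(parts[1]) };
}

// The access token returned by the /login example above:
const token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VybmFtZSI6InRlc3R1c2VyMDEiLCJleHAiOjE3MDUyODA5MDJ9.Q0MrUXA4RGI1Smc4SUFmYjV6UFlFRmhWL2NsV20rTHppSlpHemZjSWdsZz0";
console.log(decodeJwt(token).payload.username); // testuser01
```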
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;And there we have it — our exploration of JWTs is at a pause. I hope this journey has shed some light on the inner workings of JWT authentication and its role in securing web applications. But as they say, every end is a new beginning.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The source code of this tutorial has been uploaded to &lt;a href=&quot;https://github.com/geekcoding101/Authentication101&quot;&gt;GeekCoding101 github repo&lt;/a&gt; as well, feel free to star my repo and explore.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Now, let’s ponder a common scenario: You log into a website and stay there, browsing around. Ever wondered how the server keeps recognizing you as you navigate from page to page? How does it ensure you still have access to all those protected areas without asking you to log in again and again? This isn’t just a matter of convenience; it’s a crucial aspect of user experience and security.&lt;/p&gt;
&lt;p&gt;Is it something to do with sessions or cookies, perhaps? Well, that’s precisely the topic we’ll delve into in our next blog. We’ll unravel the mysteries of session management, cookies, and how they work together to maintain your authenticated state in a web application. It’s an essential piece of the puzzle for understanding comprehensive web security.&lt;/p&gt;
&lt;p&gt;So stay tuned for our next discussion where we decode the secrets behind seamless and secure browsing experiences. Until then, happy coding, and keep those applications secure!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Mastering Openssl Command and NSS Database Management</title><link>https://geekcoding101.com/posts/mastering-openssl-command-and-nss-database-management</link><guid isPermaLink="true">https://geekcoding101.com/posts/mastering-openssl-command-and-nss-database-management</guid><pubDate>Fri, 05 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Greetings to all you geeks out there!&lt;/p&gt;
&lt;p&gt;It&apos;s a pleasure to have you here at geekcoding101.com!&lt;/p&gt;
&lt;p&gt;With almost 20 years immersed in the vibrant world of Linux and the security domain, I&apos;ve encountered a myriad of tools and technologies that have shaped my journey. Today, I&apos;m excited to introduce you to OpenSSL and certutil, two indispensable utilities that play pivotal roles in managing digital certificates and encryption. Whether you&apos;re safeguarding your web servers or securing communications, understanding these tools is crucial. I&apos;ve distilled my insights and tips into this post, aiming to arm you with the knowledge to leverage these powerful utilities effectively.&lt;/p&gt;
&lt;p&gt;Enjoy!&lt;/p&gt;
&lt;h1&gt;Openssl&lt;/h1&gt;
&lt;p&gt;OpenSSL is an open-source software library that provides a robust, commercial-grade, and full-featured toolkit for SSL and TLS protocols, as well as a general-purpose cryptography library. It is widely used by internet servers, including the majority that implement secure web (HTTPS) connections, as well as in countless other security-sensitive applications. Here are some key aspects of OpenSSL:&lt;/p&gt;
&lt;h2&gt;Core Features&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Encryption&lt;/strong&gt;: Offers cryptographic algorithms for encrypting data, ensuring that information can be transmitted or stored securely. This includes algorithms like AES, DES, RC4, and more.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SSL/TLS Protocols&lt;/strong&gt;: Facilitates secure communications over computer networks against eavesdropping, tampering, and message forgery. OpenSSL includes implementations of the SSL and TLS protocols to secure network communications.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cryptographic Hash Functions&lt;/strong&gt;: Supports hash functions like SHA-1, SHA-256, and MD5, used for creating message digests that ensure the integrity of data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Digital Certificates&lt;/strong&gt;: Manages X.509 certificates which are essential for establishing SSL/TLS connections. OpenSSL can generate certificate signing requests (CSRs), create certificates, and manage certificate chains.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Public Key Infrastructure (PKI)&lt;/strong&gt;: Supports PKI essentials for managing public and private keys, including generating key pairs, signing certificates, and more.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Query Information&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Query on Private Key&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl rsa -in privatekey.pem -check&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Query All Information&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl x509 -in certificate.pem -text -noout&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Query Subject&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl x509 -in certificate.pem -subject -noout&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Query Validity&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl x509 -in certificate.pem -dates -noout&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Query Purpose&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl x509 -in certificate.pem -purpose -noout&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Certificate purposes:
SSL client : No
SSL client CA : Yes
SSL server : No
SSL server CA : Yes
Netscape SSL server : No
Netscape SSL server CA : Yes
S/MIME signing : No
S/MIME signing CA : Yes
S/MIME encryption : No
S/MIME encryption CA : Yes
CRL signing : No
CRL signing CA : Yes
Any Purpose : Yes
Any Purpose CA : Yes
OCSP helper : Yes
OCSP helper CA : Yes
Time Stamp signing : No
Time Stamp signing CA : Yes
&lt;/code&gt;&lt;/pre&gt;
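&lt;p&gt;One more query I reach for constantly: checking whether a certificate and a private key actually belong together. A cert matches a key exactly when their public keys are identical, so comparing digests of the two public keys settles it. A minimal sketch (the demo pair is generated on the spot; file names are placeholders):&lt;/p&gt;

```shell
# Generate a throwaway cert/key pair just for the demo.
openssl req -x509 -newkey rsa:2048 -nodes -keyout privatekey.pem \
  -out certificate.pem -days 1 -subj "/CN=match-demo"
# Extract the public key from each side and hash it.
cert_pub=$(openssl x509 -in certificate.pem -noout -pubkey | openssl sha256)
key_pub=$(openssl pkey -in privatekey.pem -pubout | openssl sha256)
# Identical digests mean the pair belongs together.
if [ "$cert_pub" = "$key_pub" ]; then echo "cert and key match"; fi
```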
&lt;p&gt;&lt;strong&gt;Download Cert from Remote Server&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;openssl s_client -showcerts -debug -connect ldap.XXXX.com:636 &amp;lt; /dev/null &amp;gt; /tmp/ldap.out 2&amp;gt;&amp;amp;1
sed -n &apos;/BEGIN CERTIFICATE/,/END CERTIFICATE/p&apos; /tmp/ldap.out  &amp;gt; /tmp/ldap.pem
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;PKCS#12 (PFX) File Management&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Convert PFX to PEM&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl pkcs12 -in filename.pfx -out certificate.pem -nodes&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Print Some Info About a PKCS#12 File&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl pkcs12 -info -in filename.pfx&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Print Some Info About a PKCS#12 File in Legacy Mode&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl pkcs12 -info -in filename.pfx -legacy&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extract Only Client Certificates + Key&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl pkcs12 -in filename.pfx -clcerts -out clientcert.pem&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extract Only Client Cert&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl pkcs12 -in filename.pfx -clcerts -nokeys -out clientcert.pem&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extract Unencrypted Key File from PFX&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl pkcs12 -in filename.pfx -nocerts -nodes -out privatekey.pem&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extract CA Cert from PFX&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;openssl pkcs12 -in filename.pfx -cacerts -nokeys -out cacert.pem&lt;/code&gt;&lt;/p&gt;
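&lt;p&gt;For completeness, the reverse direction: bundling a PEM certificate and key back into a PFX with &lt;code&gt;-export&lt;/code&gt;. A sketch with a throwaway pair (file names and the password are placeholders):&lt;/p&gt;

```shell
# Throwaway cert/key pair for the demo.
openssl req -x509 -newkey rsa:2048 -nodes -keyout privatekey.pem \
  -out certificate.pem -days 1 -subj "/CN=pfx-demo"
# Bundle them into a PKCS#12 file protected by a password.
openssl pkcs12 -export -in certificate.pem -inkey privatekey.pem \
  -out filename.pfx -passout pass:changeit
# Sanity check: print bundle info without dumping keys.
openssl pkcs12 -in filename.pfx -passin pass:changeit -info -nokeys -noout
```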
&lt;h1&gt;NSS Database Management&lt;/h1&gt;
&lt;p&gt;The NSS (Network Security Services) Database is a set of libraries designed to support cross-platform development of security-enabled client and server applications. Applications can use NSS for SSL/TLS, PKI (Public Key Infrastructure) certificate management, cryptographic operations, and other security standards. The NSS Database, specifically, is a critical component for managing certificates, keys, and other security assets.&lt;/p&gt;
&lt;h2&gt;Key Features of the NSS Database&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Certificate and Key Storage&lt;/strong&gt;: It stores and manages SSL/TLS certificates, private keys, and trust settings in a secure, encrypted database format. This storage is essential for applications needing to establish secure connections, authenticate themselves or their users, and ensure data integrity and confidentiality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cross-Platform Support&lt;/strong&gt;: NSS provides a platform-independent way to manage security assets, making it suitable for a wide range of operating systems and environments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;: The database is designed with a strong focus on security, including support for various encryption algorithms and mechanisms to protect sensitive information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;PKI Support&lt;/strong&gt;: It supports a comprehensive range of PKI standards, allowing applications to perform tasks such as certificate signing, issuance, and revocation checking.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Components of the NSS Database&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;CertDB&lt;/strong&gt;: A database for storing certificates, including user, server, and CA (Certificate Authority) certificates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;KeyDB&lt;/strong&gt;: A database for storing private keys associated with the certificates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SecmodDB&lt;/strong&gt;: A database for managing PKCS#11 module configurations. PKCS#11 modules are used to interface with cryptographic tokens like smart cards or hardware security modules (HSMs).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Management Tools&lt;/h2&gt;
&lt;p&gt;NSS comes with several command-line tools for managing the NSS Database, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;certutil&lt;/strong&gt;: For managing certificates and keys within the database.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;pk12util&lt;/strong&gt;: For importing and exporting certificates and keys in PKCS#12 format.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;modutil&lt;/strong&gt;: For managing PKCS#11 modules.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Usage&lt;/h2&gt;
&lt;p&gt;NSS Databases are often used in web browsers (like Mozilla Firefox), email clients, and other networked applications requiring secure communication. By managing cryptographic keys and certificates, the NSS Database plays a crucial role in enabling secure internet communications and data protection efforts across various applications.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Import Cert/Key (PEM) into NSS&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;certutil -A -n &quot;certificate name&quot; -t &quot;TCu,Cu,Tu&quot; -i certificate.pem -d sql:/path/to/nssdb&lt;/code&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;-t trustargs&lt;br /&gt;
Specify the trust attributes to modify in an existing certificate or to apply to a certificate when creating it or adding it to a database. There are three&lt;br /&gt;
available trust categories for each certificate, expressed in the order SSL, email, object signing for each trust setting. In each category position, use none,&lt;br /&gt;
any, or all of the attribute codes:&lt;/p&gt;
&lt;p&gt;· p - Valid peer&lt;/p&gt;
&lt;p&gt;· P - Trusted peer (implies p)&lt;/p&gt;
&lt;p&gt;· c - Valid CA&lt;/p&gt;
&lt;p&gt;· C - Trusted CA (implies c)&lt;/p&gt;
&lt;p&gt;· T - trusted CA for client authentication (ssl server only)&lt;/p&gt;
&lt;p&gt;The attribute codes for the categories are separated by commas, and the entire set of attributes enclosed by quotation marks. For example:&lt;/p&gt;
&lt;p&gt;-t &quot;TC,C,T&quot;&lt;/p&gt;
&lt;p&gt;Use the -L option to see a list of the current certificates and trust attributes in a certificate database.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note that the output of the -L option may include &quot;u&quot; flag, which means that there is a private key associated with the certificate. It is a dynamic flag and you cannot set it with certutil.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Import PFX into NSS DB&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pk12util -i filename.pfx -d sql:/path/to/nssdb&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Export PEM from NSS DB&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;certutil -L -n &quot;certificate name&quot; -d sql:/path/to/nssdb -a &amp;gt; certificate.pem&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;List Keys from NSS DB&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;certutil -K -d sql:/path/to/nssdb&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Remove Key from NSS DB&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;certutil -D -n &quot;certificate name&quot; -d sql:/path/to/nssdb&lt;/code&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Cool! I believe that&apos;s a lot for today&apos;s topic!&lt;br /&gt;
Let&apos;s wrap up and see you next time!&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Crafting A Bash Script with Tmux</title><link>https://geekcoding101.com/posts/crafting-a-bash-script-with-tmux</link><guid isPermaLink="true">https://geekcoding101.com/posts/crafting-a-bash-script-with-tmux</guid><pubDate>Sun, 07 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Background...&lt;/h1&gt;
&lt;p&gt;I have a Django/Vue development environment running locally.&lt;/p&gt;
&lt;p&gt;To streamline my Django development, I typically open six tmux windows 😎 :&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Celery window - It also checks and starts necessary local services, like mailpit and redis, then finally starts Celery.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flower window - Start Flower&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Django window - Start Django runserver&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Django manager shell window - For Django manager operations&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Heroku window - For checking Heroku status, committing, and other Heroku operations&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Vue window - Start npm run serve or build&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I use one tmux session to hold all of the above.&lt;/p&gt;
&lt;p&gt;However, my laptop sometimes needs a reboot, and afterwards all of my windows are gone 😓&lt;/p&gt;
&lt;p&gt;I configured tmux-resurrect and tmux-continuum to handle this scenario; they could restore the windows correctly, but they couldn&apos;t re-run the commands inside them.&lt;/p&gt;
&lt;p&gt;Let me show you the screenshots.&lt;/p&gt;
&lt;h1&gt;The problem...&lt;/h1&gt;
&lt;p&gt;Typically, my development windows look like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./craft-script-with-tmux-01.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As you see, the services are running within the respective windows.&lt;/p&gt;
&lt;p&gt;If I save them with tmux-resurrect, then after a reboot tmux-resurrect and tmux-continuum can of course restore the windows, but the services and all environment variables are gone.&lt;/p&gt;
&lt;p&gt;To simulate, let me kill all sessions in tmux, check the output:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./craft-script-with-tmux-02.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now start tmux again; as the status line shows, tmux restored the previously saved windows:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./craft-script-with-tmux-03-tmux-status-scaled.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s check the window now:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./craft-script-with-tmux-04-no-service-scaled.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;None of the services is running 🙉&lt;/p&gt;
&lt;h1&gt;The Complain...&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;As the supreme overlord of geekcoding101.com, I simply cannot let such imperfection slide.&lt;br /&gt;
Not on my watch.&lt;br /&gt;
Nope, not happening.&lt;br /&gt;
This ain&apos;t it, chief.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Okay, let&apos;s fix it!&lt;/p&gt;
&lt;h1&gt;The Fix...&lt;/h1&gt;
&lt;p&gt;&lt;img src=&quot;./jim-carrey-typing.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./8-hours-later.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./jim-carrey-typing.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;....&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./2-days-later.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./i-am-back-01.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Okay! I wrote a script.. oh no! Two scripts!&lt;/p&gt;
&lt;p&gt;One, start_tmux_dev_env.sh, creates all the windows; it invokes prepare_dev_env.sh, which exports functions that initialize environment variables and start services in specific windows.&lt;/p&gt;
&lt;p&gt;A snippet of start_tmux_dev_env.sh:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/bash

# Please note: don&apos;t use dot in session name.
SESSION_NAME=&quot;matrixlink_ai&quot;

# Check if the tmux session already exists
tmux has-session -t $SESSION_NAME 2&amp;gt;/dev/null

if [ $? != 0 ]; then
  # Create a new detached tmux session named $SESSION_NAME
  tmux new-session -d -s $SESSION_NAME

  # Set up the &apos;celery&apos; window
  tmux rename-window -t $SESSION_NAME &apos;celery&apos;
  echo &quot;Starting celery window...&quot;
  echo &quot;Sleeping 5s to wait window celery finish initialization.....&quot;
  sleep 5
  echo &quot;Checking psql in window celery...&quot;
  tmux send-keys -t $SESSION_NAME &apos;psql -h localhost -p 5432 -d matrixlink_ai&apos; C-m
  sleep 2
  tmux send-keys -t $SESSION_NAME &apos;\q&apos; C-m
  sleep 1

  # Can&apos;t use an environment variable for the path of the script being sourced.
  # Remember to put below command to background, otherwise it will wait here forever.
  tmux send-keys -t $SESSION_NAME &apos;. ${YOUR_PATH_TO_PROJECT}/matrixlink.ai/utils/prepare_dev_env.sh &amp;amp;&amp;amp; setup_celery_window&apos; C-m &amp;amp;

  echo &quot;Starting flower window...&quot;
  tmux new-window -t $SESSION_NAME -n &apos;flower&apos;
  echo &quot;Sleeping 5s to wait window flower finish initialization.....&quot;
  sleep 5
  # $SESSION_NAME:flower is to specify the window
  # $SESSION_NAME.flower is to specify the pane
  tmux send-keys -t $SESSION_NAME:flower &apos;. ${YOUR_PATH_TO_PROJECT}/matrixlink.ai/utils/prepare_dev_env.sh &amp;amp;&amp;amp; setup_flower_window&apos; C-m &amp;amp;

  echo &quot;Starting nvm window...&quot;
  ...

  echo &quot;Starting Django window...&quot;
  ...

  echo &quot;Starting Django manager window...&quot;
  ...

  echo &quot;Starting Heroku window...&quot;
  ...
fi

# Attach to the tmux session
tmux attach -t $SESSION_NAME
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The prepare_dev_env.sh looks like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/sh

WORKDIR=&quot;${YOUR_PATH_TO_PROJECT}/github/matrixlink.ai/&quot;
CONDA_ENV=&quot;matrixlinkai.django&quot;

setup_celery_window() {
  conda activate ${CONDA_ENV}
  cd $WORKDIR
  brew services start mailpit
  brew services start redis
  export REDIS_URL=redis://localhost:6379/0
  export USE_DOCKER=no
  export CELERY_CONFIG_TASK_ALWAYS_EAGER=yes
  celery -A config.celery_app worker --loglevel=info
}

setup_flower_window() {
  conda activate ${CONDA_ENV}
  cd $WORKDIR
  export REDIS_URL=redis://localhost:6379/0
  export CELERY_BROKER_URL=$REDIS_URL
  export CELERY_FLOWER_USER=debug
  export CELERY_FLOWER_PASSWORD=debug
  export USE_DOCKER=no
  export CELERY_CONFIG_TASK_ALWAYS_EAGER=yes
  celery -A config.celery_app -b ${CELERY_BROKER_URL} flower --basic_auth=&quot;${CELERY_FLOWER_USER}:${CELERY_FLOWER_PASSWORD}&quot;
}

setup_django_window() {
  conda activate ${CONDA_ENV}
  cd $WORKDIR
  export REDIS_URL=redis://localhost:6379/0
  export CELERY_BROKER_URL=$REDIS_URL
  export CELERY_FLOWER_USER=debug
  export CELERY_FLOWER_PASSWORD=debug
  export USE_DOCKER=no
  export EMAIL_HOST=localhost
  export CELERY_CONFIG_TASK_ALWAYS_EAGER=yes
  
  # Please export sensitive information manually, like OPENAI key

  # You need to manually replace ${DB_USERNAME} here if not yet set environment variable.
  export DATABASE_URL=postgres://${DB_USERNAME}@127.0.0.1:5432/matrixlink_ai
  python manage.py migrate
  echo &quot;Sleeping 10s to wait npm to start in another window...&quot;
  sleep 10
  python manage.py runserver 0.0.0.0:8000
}

setup_django_manager_window() {
  conda activate ${CONDA_ENV}
  cd $WORKDIR
  export USE_DOCKER=no
  # You need to manually replace ${DB_USERNAME} here if not yet set environment variable.
  export DATABASE_URL=postgres://${DB_USERNAME}@127.0.0.1:5432/matrixlink_ai
  export REDIS_URL=redis://localhost:6379/0
  export CELERY_CONFIG_TASK_ALWAYS_EAGER=yes
  python manage.py shell
}

setup_nvm_window() {
  conda activate ${CONDA_ENV}
  cd $WORKDIR/frontend
  npm run serve
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;The End...&lt;/h1&gt;
&lt;p&gt;Now, after reboot, I can just invoke script start_tmux_dev_env.sh and it will spin up all windows for me in seconds!&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I&apos;ve recorded a video of the script in action; please check out my post &lt;a href=&quot;/posts/terminal-mastery-crafting-a-productivity-environment-with-iterm-tmux-and-beyond&quot;&gt;Terminal Mastery: Crafting A Productivity Environment With ITerm, Tmux, And Beyond&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Thanks for watching!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Vue: Secrets to Resolving Empty index.html in WebHistory</title><link>https://geekcoding101.com/posts/vue-secrets-to-resolving-empty-index-html-in-webhistory</link><guid isPermaLink="true">https://geekcoding101.com/posts/vue-secrets-to-resolving-empty-index-html-in-webhistory</guid><pubDate>Mon, 08 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Greetings&lt;/h1&gt;
&lt;p&gt;Hi there!&lt;/p&gt;
&lt;p&gt;I was trying some new Vue stuff recently.&lt;/p&gt;
&lt;p&gt;I downloaded the free version of the Vue Argon Dashboard code and tried to build it locally.&lt;/p&gt;
&lt;p&gt;It&apos;s straightforward:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nvm use lts/iron
npm install
npm run build
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./Solving-Empty-index.html-Puzzle-01-scaled.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Then I got the dist folder:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./Solving-Empty-index.html-Puzzle-02.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Interesting...&lt;/h1&gt;
&lt;p&gt;Then I double-clicked the index.html, expecting it to display the beautiful landing page, but that didn&apos;t happen...&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./Solving-Empty-index.html-Puzzle-03.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This is strange... What went wrong?&lt;/p&gt;
&lt;p&gt;I tried &lt;code&gt;npm run serve&lt;/code&gt;; it worked well, and I could see the portal and navigate between pages without issues.&lt;/p&gt;
&lt;h1&gt;I must fix this! Should be quick!&lt;/h1&gt;
&lt;p&gt;&lt;img src=&quot;./jim-carrey-typing.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./8-hours-later.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./i-am-back-01.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Bingo!&lt;/h1&gt;
&lt;p&gt;The root cause is that the Vue project&apos;s router used &lt;code&gt;createWebHistory&lt;/code&gt; instead of &lt;code&gt;createWebHashHistory&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;This results in different handling of static assets and routing:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Absolute Paths for Static Assets&lt;/strong&gt;: By default, Vue CLI configures the build to use absolute paths for assets (JS, CSS, images, etc.). When I open the &lt;code&gt;index.html&lt;/code&gt; file directly in a browser (using the &lt;code&gt;file://&lt;/code&gt; protocol), these paths may not resolve correctly, because they expect to be served from a web server&apos;s root.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Single Page Application (SPA) Routing&lt;/strong&gt;: Vue applications, especially those built with Vue Router in &lt;code&gt;history&lt;/code&gt; mode, rely on the web server to correctly handle URLs. Directly opening &lt;code&gt;index.html&lt;/code&gt; doesn&apos;t allow Vue Router to intercept and handle the routing, leading to routes possibly not resolving as intended. &lt;code&gt;npm run serve&lt;/code&gt; starts a development server that correctly handles SPA routing, serving &lt;code&gt;index.html&lt;/code&gt; for all routes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Choosing between &lt;code&gt;createWebHistory&lt;/code&gt; and &lt;code&gt;createWebHashHistory&lt;/code&gt; comes down to a few trade-offs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clean URLs&lt;/strong&gt;: If having clean, professional-looking URLs is important for your application&apos;s user experience or branding, &lt;code&gt;createWebHistory&lt;/code&gt; is the preferred choice. This is often the case for public-facing production websites.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SEO Considerations&lt;/strong&gt;: For SEO purposes, clean URLs (without hashes) are generally better. However, modern SEO practices and improved search engine capabilities have mitigated these concerns significantly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ease of Deployment&lt;/strong&gt;: &lt;code&gt;createWebHashHistory&lt;/code&gt; is simpler to deploy because it doesn&apos;t require specific server configurations to handle SPA routing. If your hosting environment or knowledge of server configurations is limited, this might be a more straightforward option.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Refresh Behavior&lt;/strong&gt;: With &lt;code&gt;createWebHistory&lt;/code&gt;, directly refreshing or entering URLs can lead to 404 errors if the server isn&apos;t correctly configured to redirect all such requests to &lt;code&gt;index.html&lt;/code&gt;. With &lt;code&gt;createWebHashHistory&lt;/code&gt;, this issue doesn&apos;t arise, making it a more foolproof solution for environments where server control is limited.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I just want to use &lt;code&gt;createWebHashHistory&lt;/code&gt; in my local development environment.&lt;/p&gt;
&lt;h1&gt;The fix&lt;/h1&gt;
&lt;p&gt;Now, the fix is easy.&lt;/p&gt;
&lt;p&gt;First, modify the &lt;code&gt;scripts&lt;/code&gt; section in package.json to specify the mode for serve and build; I also added two new entries, &lt;code&gt;serve_prod&lt;/code&gt; and &lt;code&gt;build_dev&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&quot;scripts&quot;: {
    &quot;serve&quot;: &quot;vue-cli-service serve --mode development&quot;,
    &quot;serve_prod&quot;: &quot;vue-cli-service serve --mode production&quot;,
    &quot;build&quot;: &quot;vue-cli-service build --mode production&quot;,
    &quot;build_dev&quot;: &quot;vue-cli-service build --mode development&quot;
  },
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Second, create or edit vue.config.js as below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;module.exports = {
  publicPath: process.env.NODE_ENV === &apos;production&apos;
    ? &apos;/&apos;
    : &apos;&apos;,
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Lastly, update src/router/index.js to handle the mode accordingly:&lt;/p&gt;
&lt;p&gt;The original code was:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { createRouter, createWebHistory } from &quot;vue-router&quot;;

// other existing code

const router = createRouter({
  history: createWebHistory(),
  routes,
  linkActiveClass: &quot;active&quot;
});
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now it looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import { createRouter, createWebHistory, createWebHashHistory } from &quot;vue-router&quot;;

// other existing code

// Determine the history mode based on the environment
const history = process.env.NODE_ENV === &apos;production&apos;
  ? createWebHistory()
  : createWebHashHistory();

const router = createRouter({
  history: history,
  routes,
  linkActiveClass: &quot;active&quot;
});

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, after running &lt;code&gt;npm run build_dev&lt;/code&gt; again, I can see the portal 😎&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Thanks for reading!&lt;br /&gt;
Have a good day!&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>An Adventurer&apos;s Guide to Base64, Base64URL, and Base32 Encoding</title><link>https://geekcoding101.com/posts/an-adventurers-guide-to-base64-base64url-and-base32-encoding</link><guid isPermaLink="true">https://geekcoding101.com/posts/an-adventurers-guide-to-base64-base64url-and-base32-encoding</guid><pubDate>Wed, 10 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Hey there!&lt;/p&gt;
&lt;p&gt;Recently, I encountered some encoding issues. Then I realized that I hadn&apos;t seen any article give a crisp yet interesting explanation of Base64/Base64URL/Base32 encoding! Ah! I should write one!&lt;/p&gt;
&lt;p&gt;So, grab your gear, and let&apos;s decode these fascinating encoding schemes together!&lt;/p&gt;
&lt;h1&gt;The Enigma of Base64 Encoding&lt;/h1&gt;
&lt;p&gt;Why do we need Base64?&lt;/p&gt;
&lt;p&gt;Imagine you&apos;re sending a beautiful picture postcard through the digital world, but the postal service (the internet, in this case) only handles plain text.&lt;/p&gt;
&lt;p&gt;How do you do it?&lt;/p&gt;
&lt;p&gt;Enter Base64 encoding – it&apos;s like magic that transforms binary data (like images) into a text format that can easily travel through the internet without getting corrupted.&lt;/p&gt;
&lt;p&gt;Base64 takes your binary data and represents it as text using 64 different characters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;10 digits: 0, 1, 2, ..., 9&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;26 uppercase letters: A, B, C, ..., Z&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;26 lowercase letters: a, b, c, ..., z&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Two special characters: &lt;code&gt;+&lt;/code&gt; and &lt;code&gt;/&lt;/code&gt; (typically)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In more details, it will:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Grouping Bytes:&lt;/strong&gt; It groups the input bytes into sets of three, providing 24 bits in total.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dividing Bits:&lt;/strong&gt; These 24 bits are then divided into four sets of 6 bits each.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mapping to Characters:&lt;/strong&gt; Each set of 6 bits is mapped to one of 64 characters in the Base64 alphabet (A-Z, a-z, 0-9, +, and /). Since each set is 6 bits, they can represent values from 0 to 63, perfectly matching the 64 characters in the Base64 alphabet.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Padding:&lt;/strong&gt; If the total number of input bytes is not divisible by three, padding characters (typically &lt;code&gt;=&lt;/code&gt;) are added to make the final encoded output length a multiple of four. This ensures that the encoded data can be evenly divided back into its original byte format during decoding.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;It&apos;s widely used in email attachments, data URLs in web pages, and anywhere you need to squeeze binary data into text-only zones.&lt;/p&gt;
&lt;p&gt;A simple text like &quot;Hello!&quot;, when encoded in Base64, turns into &quot;SGVsbG8h&quot;.&lt;/p&gt;
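&lt;p&gt;You can watch both the mapping and the padding rule in action from the command line (using the coreutils &lt;code&gt;base64&lt;/code&gt; tool):&lt;/p&gt;

```shell
# "Hello!" is 6 bytes (two full 3-byte groups), so no '=' padding is needed.
printf 'Hello!' | base64    # SGVsbG8h
# "Hi" is 2 bytes, one short of a full group, so one '=' pads the output.
printf 'Hi' | base64        # SGk=
# Decoding reverses the mapping.
printf 'SGVsbG8h' | base64 -d    # Hello!
```

&lt;p&gt;Note that &lt;code&gt;printf&lt;/code&gt; is used instead of &lt;code&gt;echo&lt;/code&gt; so that no trailing newline sneaks into the encoded bytes.&lt;/p&gt;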
&lt;h2&gt;Usage of Base64 in Data URIs&lt;/h2&gt;
&lt;p&gt;Data URIs (Uniform Resource Identifiers) offer a powerful way to embed binary data, such as images, directly into HTML or CSS files, using Base64 encoding. This method eliminates the need for external file references, resulting in fewer HTTP requests and potentially faster page loads. Here&apos;s how it works in practice:&lt;/p&gt;
&lt;h3&gt;Embedding an Image in HTML Using Data URI&lt;/h3&gt;
&lt;p&gt;Let&apos;s say you have a small logo or icon that you want to include directly in your HTML page without linking to an external file. You can use Base64 to encode the image and then incorporate it directly into an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag&apos;s &lt;code&gt;src&lt;/code&gt; attribute.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Original Image&lt;/strong&gt;: An image file, &lt;code&gt;logo.png&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Base64 Encoding&lt;/strong&gt;: Convert &lt;code&gt;logo.png&lt;/code&gt; into a Base64-encoded string. The result will be a long text string.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Embed in HTML&lt;/strong&gt;: Use the encoded string within an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;img src=&quot;data:image/png;base64,Base64EncodedStringHere&quot; alt=&quot;Logo&quot;&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Replace &lt;code&gt;Base64EncodedStringHere&lt;/code&gt; in above with the actual Base64-encoded string of your image. The &lt;code&gt;data:image/png;base64,&lt;/code&gt; part tells the browser that what follows is a Base64-encoded PNG image.&lt;/p&gt;
&lt;p&gt;Embedding images directly with Data URIs can reduce the number of HTTP requests, speeding up page loads for small resources.&lt;/p&gt;
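&lt;p&gt;The whole pipeline fits in a couple of shell lines. A minimal sketch, where a placeholder file stands in for a real PNG and the file name is illustrative (&lt;code&gt;base64 -w0&lt;/code&gt; is the GNU coreutils flag that disables line wrapping):&lt;/p&gt;

```shell
# Create a throwaway file standing in for logo.png (a real PNG works the same).
printf 'PNG bytes would go here' > logo.png
# Encode it without line wrapping and prepend the data URI header.
uri="data:image/png;base64,$(base64 -w0 logo.png)"
echo "$uri"
```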
&lt;h3&gt;Navigating the Waters of Base64URL&lt;/h3&gt;
&lt;p&gt;But, oh, the plot thickens with Base64URL. It&apos;s a close cousin of Base64, tailored for the web.&lt;/p&gt;
&lt;p&gt;The twist? It replaces the &lt;code&gt;+&lt;/code&gt; and &lt;code&gt;/&lt;/code&gt; characters with &lt;code&gt;-&lt;/code&gt; and &lt;code&gt;_&lt;/code&gt; to make it URL and filename safe. No more worrying about those characters being misinterpreted as special URL characters or directory paths!&lt;/p&gt;
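&lt;p&gt;Since that character swap is the whole difference, you can emulate Base64URL with a plain &lt;code&gt;tr&lt;/code&gt; when no dedicated tool is at hand (GNU coreutils&apos; &lt;code&gt;basenc --base64url&lt;/code&gt; does it natively). The two bytes below are chosen to hit the &lt;code&gt;+&lt;/code&gt; code point:&lt;/p&gt;

```shell
# Bytes 0xfb 0xef (octal \373 \357) map to '+' characters in plain Base64...
printf '\373\357' | base64                  # ++8=
# ...while the URL-safe alphabet swaps '+' for '-' and '/' for '_'.
printf '\373\357' | base64 | tr '+/' '-_'   # --8=
```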
&lt;h1&gt;The Expedition to Base32&lt;/h1&gt;
&lt;p&gt;Then, there&apos;s Base32, another encoding scheme in our adventure.&lt;/p&gt;
&lt;p&gt;It&apos;s less compact than Base64 but has its charm, especially when you need to ensure readability and avoid confusion.&lt;/p&gt;
&lt;p&gt;Base32 uses a set of 32 characters, &lt;strong&gt;making it more resilient against errors like misreading or miswriting.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Base32 shines in specific scenarios, such as encoding shared secrets for TOTP authenticator apps or hashed owner names in DNSSEC NSEC3 records, and anywhere you want to avoid characters that could be altered or misinterpreted.&lt;/p&gt;
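&lt;p&gt;A quick comparison makes the readability point concrete. The RFC 4648 Base32 alphabet is just A-Z and 2-7; the easily confused 0 and 1 are left out entirely, and there is only one letter case. Coreutils ships a &lt;code&gt;base32&lt;/code&gt; tool alongside &lt;code&gt;base64&lt;/code&gt;:&lt;/p&gt;

```shell
# The same input in both encodings: Base32 output is longer but single-case.
printf 'Hello!' | base64
printf 'Hello!' | base32
# Round-trip to confirm decoding recovers the original bytes.
printf 'Hello!' | base32 | base32 -d
```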
&lt;h1&gt;Why These Encoding Schemes Matter&lt;/h1&gt;
&lt;p&gt;Why do we bother with all these encoding shenanigans? It&apos;s all about compatibility and safety.&lt;/p&gt;
&lt;p&gt;These encoding schemes allow us to safely transmit binary data over mediums that only support text, ensuring that our data arrives intact and unaltered at its destination.&lt;/p&gt;
&lt;h1&gt;Choosing Your Path&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use &lt;strong&gt;Base64&lt;/strong&gt; when you need a compact, text-based representation of binary data for emails, data URIs, and when integrating with APIs that expect data in this format.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Opt for &lt;strong&gt;Base64URL&lt;/strong&gt; when your data needs to be part of URLs or file names, ensuring a smooth and safe journey through the web.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Choose &lt;strong&gt;Base32&lt;/strong&gt; for maximum readability and error resilience, perfect for transmitting data that might be entered manually or when you want to avoid certain problematic characters.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;Alternatives and Mysteries Beyond&lt;/h1&gt;
&lt;p&gt;Our adventure doesn’t end here. There are other encoding schemes like Base58, popularized by Bitcoin, which further reduces the chance of misinterpretation by excluding similar-looking characters. And let&apos;s not forget hexadecimal encoding, a simpler form often used in programming and debugging.&lt;/p&gt;
&lt;p&gt;In conclusion, whether you’re encoding treasure maps to share with your fellow digital pirates or simply ensuring that your data travels safely across the vast internet, understanding when and how to use these encoding schemes is an essential skill in the digital world.&lt;/p&gt;
&lt;p&gt;Remember, the right encoding at the right time can be the difference between smooth sailing and getting lost in the digital sea.&lt;/p&gt;
&lt;p&gt;So, choose wisely!&lt;/p&gt;
&lt;p&gt;Until our next digital odyssey, keep exploring and encoding.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Terminal Mastery: Crafting a Productivity Environment with iTerm, tmux, and Beyond</title><link>https://geekcoding101.com/posts/terminal-mastery-crafting-a-productivity-environment-with-iterm-tmux-and-beyond</link><guid isPermaLink="true">https://geekcoding101.com/posts/terminal-mastery-crafting-a-productivity-environment-with-iterm-tmux-and-beyond</guid><pubDate>Thu, 11 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;I love working on Linux terminals&lt;/h1&gt;
&lt;p&gt;Rewind a decade or so, and you&apos;d find me ensconced within the embrace of a Linux terminal for the duration of my day. Here, amidst the digital ebb and flow, I thrived—maneuvering files and folders with finesse, weaving code in Vim, orchestrating services maintenance, decoding kernel dumps, and seamlessly transitioning across a mosaic of tmux sessions.&lt;/p&gt;
&lt;p&gt;The graphical user interface? A distant thought, unnecessary for the tapestry of tasks at hand.&lt;/p&gt;
&lt;p&gt;Every tech enthusiast harbors a unique sanctuary of productivity—a bespoke digital workshop where code flows like poetry and ideas ignite with the spark of creativity. It’s a realm where custom tools and secret utilities interlace, forming the backbone of unparalleled efficiency and innovation.&lt;/p&gt;
&lt;p&gt;Today, I&apos;m pulling back the curtain to reveal the intricacies of my personal setup on Mac.&lt;/p&gt;
&lt;p&gt;I invite you on this meticulous journey through the configuration of my Mac-based development sanctuary.&lt;/p&gt;
&lt;p&gt;Together, let&apos;s traverse this path, transforming the mundane into the magnificent, one command, one tool, one revelation at a time.&lt;/p&gt;
&lt;h1&gt;iTerm2&lt;/h1&gt;
&lt;p&gt;After setting up my account on the Mac, the initial terminal looked like this when I logged in:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-1024x357.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./boss-kid-boring01.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s equip it with iTerm2!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-3-1024x344.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;h4&gt;What is iTerm2?&lt;/h4&gt;
&lt;p&gt;iTerm2 is a replacement for Terminal and the successor to iTerm. It works on Macs with macOS 10.14 or newer. iTerm2 brings the terminal into the modern age with features you never knew you always wanted.&lt;/p&gt;
&lt;h4&gt;Why Do I Want It?&lt;/h4&gt;
&lt;p&gt;Check out the impressive &lt;a href=&quot;https://iterm2.com/features.html&quot;&gt;features and screenshots&lt;/a&gt;. If you spend a lot of time in a terminal, then you&apos;ll appreciate all the little things that add up to a lot. It is free software and you can find the source code on &lt;a href=&quot;https://github.com/gnachman/iTerm2&quot;&gt;Github&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;How Do I Use It?&lt;/h4&gt;
&lt;p&gt;Try the &lt;a href=&quot;https://iterm2.com/faq.html&quot;&gt;FAQ&lt;/a&gt; or the &lt;a href=&quot;https://iterm2.com/documentation.html&quot;&gt;documentation&lt;/a&gt;. Got problems or ideas? Report them in the &lt;a href=&quot;https://iterm2.com/bugs&quot;&gt;bug tracker&lt;/a&gt;, take it to the &lt;a href=&quot;https://groups.google.com/group/iterm2-discuss&quot;&gt;forum&lt;/a&gt;, or send me email (gnachman at gmail dot com).&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Head over to https://iterm2.com/, download it, and finish the installation. Open it:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-1-1024x432.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Well, still not impressive. But now you have all of iTerm2&apos;s advanced features!&lt;/p&gt;
&lt;p&gt;This blog post is not focused on iTerm2 itself, so I won&apos;t go through those fancy features right now; please explore them on the official website.&lt;/p&gt;
&lt;p&gt;Let&apos;s start to customize on it.&lt;/p&gt;
&lt;h1&gt;Wait, Nerd ... Font&lt;/h1&gt;
&lt;p&gt;Before we jump into customizing iTerm2, I want to introduce you to Nerd Fonts.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It will be required and installed by https://github.com/romkatv/powerlevel10k, which we will talk about later.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/ryanoasis/nerd-fonts&quot;&gt;&lt;img src=&quot;./image-2.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;This project aims to enhance the usability and aesthetic appeal of the development environment without sacrificing the functionality or readability of the text. The added icons can represent common actions or tools in the development workflow, allowing for a more intuitive and visually engaging interface.&lt;/p&gt;
&lt;p&gt;By incorporating icons directly into the fonts, Nerd Fonts allows developers to use these icons across different applications and tools seamlessly, without needing to rely on external libraries or tool-specific extensions. This can simplify setup and configuration across tools and platforms, providing a consistent and enriched visual experience.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;Customize iTerm2&lt;/h1&gt;
&lt;p&gt;Just follow me:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-4-1024x414.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-5-1024x522.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Set your favorite character as background image if you like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-6-1024x642.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s compare previous and now:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img src=&quot;./image-1024x357.png&quot; alt=&quot;&quot; /&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src=&quot;./image-8-1024x600.png&quot; alt=&quot;&quot; /&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;Oh-my-zsh and powerlevel10k&lt;/h1&gt;
&lt;p&gt;Install &lt;a href=&quot;https://github.com/ohmyzsh/ohmyzsh&quot;&gt;https://github.com/ohmyzsh/ohmyzsh&lt;/a&gt; (to manage zsh):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sh -c &quot;$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./image-9-1024x600.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Install &lt;a href=&quot;https://github.com/romkatv/powerlevel10k&quot;&gt;https://github.com/romkatv/powerlevel10k&lt;/a&gt; for ohmyzsh and configure it (powerlevel10k is a theme of zsh).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone --depth=1 https://github.com/romkatv/powerlevel10k.git ${ZSH_CUSTOM:-$HOME/.oh-my-zsh/custom}/themes/powerlevel10k
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Set environment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;echo &quot;source ~/.oh-my-zsh/custom/themes/powerlevel10k/powerlevel10k.zsh-theme&quot; &amp;gt;&amp;gt; ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Restart Zsh with &lt;code&gt;exec zsh&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You should see:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-10-1024x761.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s install the fonts and follow the wizard, choose whatever you like!&lt;/p&gt;
&lt;p&gt;Now it looks much better!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-12-1024x594.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Much better, but it still doesn&apos;t meet my expectations!&lt;/p&gt;
&lt;h1&gt;Let&apos;s begin working with our adaptable TMUX expert now!&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/tmux/tmux/wiki&quot;&gt;&lt;img src=&quot;./image-13.png&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;tmux is a terminal multiplexer. It lets you switch easily between several programs in one terminal, detach them (they keep running in the background) and reattach them to a different terminal.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;tmux should already be installed by default. Type &lt;code&gt;tmux&lt;/code&gt; and you should see:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-14-1024x597.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./boss-kid-boring01.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Can&apos;t wait to customize it!&lt;/p&gt;
&lt;p&gt;I am using &lt;a href=&quot;https://github.com/gpakosz/.tmux.git&quot;&gt;https://github.com/gpakosz/.tmux.git&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;It&apos;s a self-contained, pretty and versatile &lt;code&gt;.tmux.conf&lt;/code&gt; configuration file.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code&gt;cd ~
rm -fr .tmux
git clone https://github.com/gpakosz/.tmux.git
ln -s -f .tmux/.tmux.conf
cp .tmux/.tmux.conf.local .
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Append the lines below to &lt;code&gt;.tmux.conf.local&lt;/code&gt;, just before the line &quot;&lt;code&gt;# -- custom variables&lt;/code&gt;&quot;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# increase history size
set -g history-limit 9999999
# start with mouse mode enabled
set -g mouse on

bind-key -n C-S-Left swap-window -t -1\; select-window -t -1
bind-key -n C-S-Right swap-window -t +1\; select-window -t +1

# -- custom variables ----------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I am a fan of Vi/Vim, so I must enable Vi mode in &quot;&lt;code&gt;~/.tmux.conf.local&lt;/code&gt;&quot;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-16-1024x101.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Customize status bar:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux_conf_theme_status_right_fg=&quot;$tmux_conf_theme_colour_12,$tmux_conf_theme_colour_14,$tmux_conf_theme_colour_6&quot;
tmux_conf_theme_status_right_bg=&quot;$tmux_conf_theme_colour_15,$tmux_conf_theme_colour_17,$tmux_conf_theme_colour_9&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./image-17-1024x131.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux_conf_theme_left_separator_main=&apos;\uE0B0&apos;
tmux_conf_theme_left_separator_sub=&apos;\uE0B1&apos;
tmux_conf_theme_right_separator_main=&apos;\uE0B2&apos;
tmux_conf_theme_right_separator_sub=&apos;\uE0B3&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./image-18-1024x212.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Find below lines in &quot;&lt;code&gt;~/.tmux.conf.local&lt;/code&gt;&quot; and uncomment them to enable:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-19-1024x81.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Set an icon for the left status:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-20-1024x17.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now reload configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux source ~/.tmux.conf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Check it now!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-22-1024x557.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I have a script that launches tmux when iTerm2 starts; here you go:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/zsh

# Check whether the &quot;kongfu&quot; session already has a client attached.
tmux ls 2&amp;gt;/dev/null | grep kongfu | grep -q attached

if [[ $? != 0 ]]; then
  # Not attached yet: attach to the session, or create it if it doesn&apos;t exist.
  tmux attach -t kongfu || tmux new-session -s kongfu
else
  echo &quot;********************************************************************************&quot;
  echo &quot;* Ignore attaching tmux kongfu session as it has been attached already.        *&quot;
  echo &quot;********************************************************************************&quot;
fi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Save it as &quot;&lt;code&gt;~/bin/tmux_init.sh&lt;/code&gt;&quot;, run &quot;&lt;code&gt;chmod 755 ~/bin/tmux_init.sh&lt;/code&gt;&quot;, then configure it in the iTerm2 default profile:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-23-1024x692.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Customize P10K Status Bar for Anaconda and Node.js&lt;/h1&gt;
&lt;p&gt;I have Anaconda and Node.js environments.&lt;/p&gt;
&lt;p&gt;I am not satisfied with the default color settings for Anaconda and Node.js. They&apos;re ugly.&lt;/p&gt;
&lt;p&gt;Open your &lt;code&gt;~/.p10k.zsh&lt;/code&gt; and make the changes below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-24-1024x99.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-25-1024x94.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-26-1024x316.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now source the file again:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;source ~/.p10k.zsh
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Show time!&lt;/h1&gt;
&lt;p&gt;I recorded this to show you what it looks like in my environment:&lt;/p&gt;
&lt;p&gt;https://www.youtube.com/watch?v=TBfvoSeyP4U&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Fix Font in VSCode Terminal</title><link>https://geekcoding101.com/posts/fix-font-in-vscode-terminal</link><guid isPermaLink="true">https://geekcoding101.com/posts/fix-font-in-vscode-terminal</guid><pubDate>Sat, 13 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Font Problem in VSCode&lt;/h1&gt;
&lt;p&gt;After completing the configuration in &lt;a href=&quot;/posts/terminal-mastery-crafting-a-productivity-environment-with-iterm-tmux-and-beyond&quot;&gt;Terminal Mastery: Crafting A Productivity Environment With ITerm, Tmux, And Beyond&lt;/a&gt;, we got a nice terminal:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-27-1024x485.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;However, after I installed VSCode, its integrated terminal couldn&apos;t display certain glyphs. It looked like this:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(screenshot: VSCode terminal with broken Nerd Font glyphs)&lt;/em&gt;&lt;/p&gt;
&lt;h1&gt;The Fix&lt;/h1&gt;
&lt;p&gt;We can fix it by updating the terminal font family in VSCode.&lt;/p&gt;
&lt;p&gt;1. Identify the name of the font family. Open Font Book on the Mac, and we can see:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-28-1024x413.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The font that supports those glyphs is &quot;MesloLGM Nerd Font Mono&quot;, which is also what I configured for iTerm2.&lt;/p&gt;
&lt;p&gt;2. In VSCode, press Command + comma to open Settings, search for &quot;&lt;code&gt;terminal.integrated.fontFamily&lt;/code&gt;&quot;, and set the font name as below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-30-1024x343.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
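&lt;p&gt;Equivalently, you can put the setting straight into &lt;code&gt;settings.json&lt;/code&gt;; the value must match the family name exactly as Font Book reports it:&lt;/p&gt;

```json
{
  "terminal.integrated.fontFamily": "MesloLGM Nerd Font Mono"
}
```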
&lt;p&gt;3. Now we can see it displays correctly:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-31-1024x298.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Well done!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Supervised Machine Learning - Day 1</title><link>https://geekcoding101.com/posts/supervised-machine-learning-day-1</link><guid isPermaLink="true">https://geekcoding101.com/posts/supervised-machine-learning-day-1</guid><pubDate>Sun, 14 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;The Beginning&lt;/h1&gt;
&lt;p&gt;As I&apos;ve been advancing my AI-powered product knowledge-base chatbot, built on Django/LangChain/OpenAI/Chroma/Gradio and sitting at the AI application/framework layer, I&apos;ve also kept an eye on how to build a pipeline for assessing the accuracy of machine learning models, which belongs to the AI DevOps/infra layer.&lt;/p&gt;
&lt;p&gt;But I realized that I have no idea how to measure a model&apos;s accuracy. This made me upset.&lt;/p&gt;
&lt;p&gt;Then I started looking for answers.&lt;/p&gt;
&lt;p&gt;My first Google search was &quot;how to measure llm accuracy&quot;, which brought me to &lt;a href=&quot;https://www.linkedin.com/pulse/evaluating-large-language-models-llms-standard-set-metrics-biswas-ecjlc/&quot;&gt;Evaluating Large Language Models (LLMs): A Standard Set of Metrics for Accurate Assessment&lt;/a&gt;. It&apos;s informative, and not a lengthy article, so I read it through. It opened a new world to me.&lt;/p&gt;
&lt;p&gt;There is a standard set of metrics for evaluating LLMs, including:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Perplexity&lt;/strong&gt; - A measure of how well a language model predicts a sample of text. It is calculated as the inverse probability of the test set normalized by the number of words.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Accuracy&lt;/strong&gt; - It is a measure of how well a language model makes correct predictions. It is calculated as the number of correct predictions divided by the total number of predictions.&lt;br /&gt;
Accuracy can be calculated using the following formula: accuracy = (number of correct predictions) / (total number of predictions).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;F1-score&lt;/strong&gt; - It is a measure of a language model&apos;s balance between precision and recall. It is calculated as the harmonic mean of precision and recall.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ROUGE score&lt;/strong&gt; - It is a measure of how well a language model generates text that is similar to reference texts. It is commonly used for text generation tasks such as summarization and paraphrasing.&lt;br /&gt;
There are different ways to calculate ROUGE score, including ROUGE-N, ROUGE-L, and ROUGE-W.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;BLEU score&lt;/strong&gt; - This is to measure how well a language model generates text that is fluent and coherent. It is commonly used for text generation tasks such as machine translation and image captioning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;METEOR score&lt;/strong&gt; - It is about how well a language model generates text that is accurate and relevant. It combines both precision and recall to evaluate the quality of the generated text.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Question answering metrics&lt;/strong&gt; - Question answering metrics are used to evaluate the ability of a language model to provide correct answers to questions. Common metrics include accuracy, F1-score, and Macro F1-score.&lt;br /&gt;
Question answering metrics can be calculated by comparing the generated answers to one or more reference answers and calculating a score based on the overlap between them.&lt;br /&gt;
Let&apos;s say we have a language model that is trained to answer questions about a given text. We test the model on a set of 100 questions, and the generated answers are compared to the actual answers. The accuracy, F1-score, and Macro F1-score of the model are calculated based on the overlap between the generated answers and the actual answers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Sentiment analysis metrics&lt;/strong&gt; - Sentiment analysis metrics are used to evaluate the ability of a language model to classify sentiments correctly. Common metrics include accuracy, weighted accuracy, and macro F1-score.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Named entity recognition metrics&lt;/strong&gt; - It is used to evaluate the ability of a language model to identify entities correctly. Common metrics include accuracy, precision, recall, and F1-score.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Contextualized word embeddings&lt;/strong&gt; - It is used to evaluate the ability of a language model to capture context and meaning in word representations. They are generated by training the language model to predict the next word in a sentence given the previous words.&lt;br /&gt;
Let&apos;s say we have a language model that is trained to generate word embeddings for a given text. We test the model on a set of 100 texts, and the generated embeddings are compared to the actual embeddings. The evaluation can be done using various methods, such as cosine similarity and Euclidean distance.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
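&lt;p&gt;As a concrete instance of the F1-score above: with precision $0.8$ and recall $0.6$,&lt;/p&gt;
&lt;p&gt;$$F_1 = 2 \cdot \frac{0.8 \times 0.6}{0.8 + 0.6} = \frac{0.96}{1.4} \approx 0.686$$&lt;/p&gt;
&lt;p&gt;The harmonic mean punishes imbalance: a model with precision $1.0$ but recall $0.1$ only scores $F_1 \approx 0.18$, even though the plain average of the two would be $0.55$.&lt;/p&gt;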
&lt;p&gt;I don&apos;t know all of them or where to start!&lt;/p&gt;
&lt;p&gt;I had to tell myself, &quot;Man, you don&apos;t know machine learning...&quot; So my next search was &quot;machine learning course&quot;, and Andrew Ng&apos;s &lt;a href=&quot;https://www.coursera.org/learn/machine-learning&quot;&gt;Supervised Machine Learning: Regression and Classification&lt;/a&gt; came out on top of the Google search results! It&apos;s famous, and I knew of it before!&lt;/p&gt;
&lt;p&gt;Then I made a decision: take action now and finish it thoroughly!&lt;/p&gt;
&lt;p&gt;I immediately enrolled in the course. Now let&apos;s start the journey!&lt;/p&gt;
&lt;h1&gt;Day 1 Started&lt;/h1&gt;
&lt;h2&gt;Basics&lt;/h2&gt;
&lt;p&gt;1. What is ML?&lt;/p&gt;
&lt;p&gt;Defined by Arthur Samuel back in 1959 😯&lt;/p&gt;
&lt;p&gt;&quot;&lt;em&gt;Field of study that gives computers the ability to learn &lt;strong&gt;without being explicitly programmed&lt;/strong&gt;.&lt;/em&gt;&quot;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The definition above gives the key point (the highlighted part), which answers a question from one of my colleagues: &quot;What&apos;s the difference between a programmed system triggering alerts on events and an AI-powered system that also triggers alerts on events?&quot; He couldn&apos;t tell the difference. On that day, I tried to explain but couldn&apos;t find the right words. Now I have found them.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;2. What are major ML algorithms?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Supervised Learning&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unsupervised Learning&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reinforcement Learning&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;3. Supervised Learning&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Examples: visual inspection, self-driving car, online advertising, machine translation, speech recognition, spam filtering.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3.1. Regression&lt;/strong&gt; (Example, house price prediction)&lt;/p&gt;
&lt;p&gt;By regression, we mean predicting a number from infinitely many possible outputs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3.2. Classification&lt;/strong&gt; (Example, Breast Cancer Detection)&lt;/p&gt;
&lt;p&gt;Malignant, Benign&lt;/p&gt;
&lt;p&gt;By classification, we mean predicting categories; categories don&apos;t have to be numbers and can be non-numeric.&lt;/p&gt;
&lt;p&gt;It can predict whether a picture is that of a cat or a dog. And it can predict if a tumor is benign or malignant. Categories can also be numbers like 0, 1 or 0, 1, 2. But what makes classification different from regression when you&apos;re interpreting the numbers is that classification predicts a small finite limited set of possible output categories such as 0, 1 and 2 but not all possible numbers in between like 0.5 or 1.7.&lt;/p&gt;
&lt;p&gt;The first example, predicting cancer, has only one input: the size of the tumor. Classification also works with more inputs, like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./classification-example2.png&quot; alt=&quot;&quot; title=&quot;classification-example2&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The key is to find out the boundary of benign and malignant.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Unsupervised Learning&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Unsupervised learning is about finding something interesting in unlabeled data, and it involves algorithms such as clustering.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.1 Clustering algorithm&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Google News uses a clustering algorithm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In other words, a clustering algorithm takes data without labels and tries to automatically group similar examples into clusters by finding some structure, pattern, or something interesting in the data.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.2. Anomaly Detection&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Find unusual data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4.3. Dimensionality Reduction&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Compress data using fewer numbers.&lt;/p&gt;
&lt;h2&gt;Regression Model&lt;/h2&gt;
&lt;p&gt;Selling-house example: building a linear model to predict how much the house could sell for.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Any supervised learning model that predicts a number such as 220,000 or 1.5 or negative 33.2 is addressing what&apos;s called a regression problem. &lt;br /&gt;
Linear regression is one example of a regression model.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Linear Regression&lt;/h3&gt;
&lt;p&gt;$$f_{w,b}(x) = wx + b$$&lt;/p&gt;
&lt;p&gt;Or simpler format:&lt;/p&gt;
&lt;p&gt;$$f(x) = wx + b$$&lt;/p&gt;
&lt;p&gt;Now the question is, how do you find values for &lt;em&gt;$w$&lt;/em&gt; and &lt;em&gt;$b$&lt;/em&gt; so that the prediction $\hat{y}^{(i)}$ is close to the true target $y^{(i)}$ for many or maybe all training examples $(x^{(i)}, y^{(i)})$?&lt;/p&gt;
&lt;p&gt;How to measure how well a line fits the training data?&lt;/p&gt;
&lt;p&gt;$$ (\hat{y}^{(i)} - y^{(i)})^2 $$&lt;/p&gt;
&lt;p&gt;When measuring the error &lt;code&gt;(ŷ - y)&lt;/code&gt; for example $i$, we compute this squared error term.&lt;/p&gt;
&lt;p&gt;Square error cost function:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./square-error-cost-function.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Well, I spent almost 20mins to figure out how to write math formula on Wordpress, look! 🥳&lt;/p&gt;
&lt;p&gt;$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m}(\hat{y}^{(i)}-y^{(i)})^{2}$$&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The extra division by 2 is just meant to make some of our later calculations look neater, but the cost function still works whether you include this division by 2 or not.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The final one:&lt;/p&gt;
&lt;p&gt;$$J(w, b) = \frac{1}{2m} \sum_{i=1}^{m}(f_{w,b}(x^{(i)})-y^{(i)})^{2}$$&lt;/p&gt;
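&lt;p&gt;As a quick sanity check of the formula, take a toy training set $\{(1, 1), (2, 2)\}$ (so $m = 2$) and try $w = 0.5$, $b = 0$:&lt;/p&gt;
&lt;p&gt;$$J(0.5, 0) = \frac{1}{2 \cdot 2}\left[(0.5 \cdot 1 - 1)^2 + (0.5 \cdot 2 - 2)^2\right] = \frac{0.25 + 1}{4} = 0.3125$$&lt;/p&gt;
&lt;p&gt;With $w = 1$, $b = 0$, both predictions are exact and $J(1, 0) = 0$: the line fits the data perfectly.&lt;/p&gt;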
&lt;p&gt;This graph comparison (when $b = 0$) is important!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-33-1024x511.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;The Challenge Part in Today&apos;s Learning Journey&lt;/h2&gt;
&lt;p&gt;&quot;Visualizing the cost function&quot; was the most challenging part...&lt;/p&gt;
&lt;p&gt;I watched it at least twice then got the idea.&lt;/p&gt;
&lt;h2&gt;Compare Regression/Classification Models&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Items&lt;/th&gt;
&lt;th&gt;Regression&lt;/th&gt;
&lt;th&gt;Classification&lt;/th&gt;
&lt;th&gt;Comments&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;Infinitely many values&lt;/td&gt;
&lt;td&gt;A finite set of categories&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;Terminology&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Comments&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Discrete Category&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training Set&lt;/td&gt;
&lt;td&gt;The data set we use to train the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;input variable or feature or input feature&lt;/td&gt;
&lt;td&gt;Denote as $x$&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;target variable&lt;/td&gt;
&lt;td&gt;Denote as $y$&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$(x, y)$&lt;/td&gt;
&lt;td&gt;Denotes a single training example&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$(x^{(i)}, y^{(i)})$&lt;/td&gt;
&lt;td&gt;$i$-th training example&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;hypothesis&lt;/td&gt;
&lt;td&gt;&quot;$f$&quot; means function, historically, it&apos;s called hypothesis.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$\hat{y}$&lt;/td&gt;
&lt;td&gt;y hat (On Mac press Option + i then followed by the letter y, then you can get ŷ). In machine learning, the convention is that y-hat is the estimate or the prediction for y.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$y$&lt;/td&gt;
&lt;td&gt;When the symbol is just the letter y, then that refers to the target, which is the actual true value in the training set.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$\hat{y} - y$&lt;/td&gt;
&lt;td&gt;This difference is called the &quot;error&quot;: it measures how far off the prediction is from the target&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;parabola curve&lt;/td&gt;
&lt;td&gt;You know, quadratic function&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Univariate&lt;/td&gt;
&lt;td&gt;Uni means one in Latin. Univariate is just a fancy way of saying one variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;parameters/coefficients/weights&lt;/td&gt;
&lt;td&gt;In machine learning parameters of the model are the variables you can adjust during training in order to improve the model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;downward-sloping line&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;hammock&lt;/td&gt;
&lt;td&gt;Have some fun...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;contour plot&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;topographical&lt;/td&gt;
&lt;td&gt;Learning some geography when learning ML ...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gradient descent&lt;/td&gt;
&lt;td&gt;This algorithm is one of the most important algorithms in machine learning. Gradient descent and variations on gradient descent are used to train, not just linear regression, but some of the biggest and most complex models in all of AI.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Supervised Machine Learning – Day 2 &amp;amp; 3 - On My Way To Becoming A Machine Learning Person</title><link>https://geekcoding101.com/posts/supervised-machine-learning-day-2-3-on-my-way-to-becoming-a-machine-learning-person</link><guid isPermaLink="true">https://geekcoding101.com/posts/supervised-machine-learning-day-2-3-on-my-way-to-becoming-a-machine-learning-person</guid><pubDate>Tue, 16 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;A brief introduction&lt;/h1&gt;
&lt;p&gt;Day 2 I was busy and managed only 15 mins for &quot;Supervised Machine Learning&quot; video + 15 mins watched &lt;a href=&quot;https://www.youtube.com/watch?v=aircAruvnKk&amp;amp;list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&quot;&gt;But what is a neural network? | Chapter 1, Deep learning&lt;/a&gt; from &lt;a href=&quot;https://www.youtube.com/@3blue1brown&quot;&gt;3blue1brown&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Day 3 I managed 30+ mins on &quot;Supervised Machine Learning&quot;, and spent some time reading articles, like Parul Pandey&apos;s &lt;a href=&quot;https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e&quot;&gt;Understanding the Mathematics behind Gradient Descent&lt;/a&gt; - it&apos;s really good. I like math 😂&lt;/p&gt;
&lt;p&gt;So these notes are mixed.&lt;/p&gt;
&lt;h1&gt;Notes&lt;/h1&gt;
&lt;h2&gt;Implementing gradient descent&lt;/h2&gt;
&lt;h3&gt;Notation&lt;/h3&gt;
&lt;p&gt;I was struggling to write the LaTeX for the formulas, then found this table useful (&lt;a href=&quot;https://ctan.math.utah.edu/ctan/tex-archive/macros/latex/contrib/mlmath/mlmath.pdf&quot;&gt;source is here&lt;/a&gt;):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-32-1024x1024.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Andrew said I don&apos;t need to worry about derivatives and calculus at all. I trust him, but I still dug into my bookcase, found the advanced mathematics books I used in college, and spent 15 minutes reviewing them. Yes, I don&apos;t need to worry.&lt;/p&gt;
&lt;p&gt;Snapped two epic shots of my &quot;Advanced Mathematics&quot; book used in my college time to show off my killer skills in derivative and calculus - pretty sure I&apos;ve unlocked Math Wizard status!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./adv-math-01-768x1024.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./adv-math-02-768x1024.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Reading online&lt;/h2&gt;
&lt;p&gt;Okay. Reading some online articles.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If we are able to compute the derivative of a function, we know in which direction to proceed to minimize it (for the cost function).&lt;/p&gt;
&lt;p&gt;From Parul Pandey, &lt;a href=&quot;https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e&quot;&gt;Understanding the Mathematics behind Gradient Descent&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Parul briefly introduced the power rule and the chain rule; fortunately, I still remember them from college. I am so proud.&lt;/p&gt;
&lt;p&gt;After reviewing various explanations of gradient descent, I truly appreciate Andrew&apos;s straightforward and precise approach!&lt;/p&gt;
&lt;p&gt;He was a bit playful at times, drawing a stick man walking down a hill step by step and suggesting that one might imagine flowers in the valley and clouds in the sky. This comfortable and engaging method made me forget I was in learning mode!&lt;/p&gt;
&lt;h2&gt;The hardest part still comes&lt;/h2&gt;
&lt;p&gt;But anyway, the hardest part still comes, I need to master this at the end:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-35-1024x372.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;$$
\text{repeat until convergence: } \left\{
\begin{aligned}
w &amp;amp;= w - \alpha \frac{\partial J(w,b)}{\partial w} \\
b &amp;amp;= b - \alpha \frac{\partial J(w,b)}{\partial b}
\end{aligned}
\right\}
$$&lt;/p&gt;
&lt;h2&gt;Learning Rate&lt;/h2&gt;
&lt;p&gt;If ⍺ is too small -&amp;gt; baby steps, taking a long time to reach the minimum.&lt;/p&gt;
&lt;p&gt;If ⍺ is too big -&amp;gt; strides that may overshoot and never reach the minimum: gradient descent can fail to converge, or even diverge.&lt;/p&gt;
&lt;p&gt;What if w is already near a local minimum?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The derivative will become smaller.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The updated step will also become smaller&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a result, gradient descent can reach the minimum with a fixed learning rate, without needing to decrease it.&lt;/p&gt;
&lt;p&gt;At here, the conclusion came:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;So that&apos;s the gradient descent algorithm, you can use it to try to minimize any cost function J. &lt;br /&gt;
Not just the mean squared error cost function that we&apos;re using for the new regression.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Tips:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Used derivative (partial derivative)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Needs to update w and b simultaneously. This requires temp variables (of course, a standard practice in programming)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
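&lt;p&gt;The update rule and tip 2 above (the simultaneous update via temp variables) can be sketched in NumPy like this (my own code; variable names are my assumptions, not from the course):&lt;/p&gt;

```python
import numpy as np

def gradient_descent_step(x, y, w, b, alpha):
    """One gradient descent step for univariate linear regression."""
    m = x.shape[0]
    error = (w * x + b) - y               # f_wb(x_i) - y_i for every example
    dj_dw = np.sum(error * x) / m         # partial derivative of J w.r.t. w
    dj_db = np.sum(error) / m             # partial derivative of J w.r.t. b
    # Simultaneous update: compute both new values before assigning either
    tmp_w = w - alpha * dj_dw
    tmp_b = b - alpha * dj_db
    return tmp_w, tmp_b

x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([2.0, 4.0, 6.0])       # true relationship: y = 2x
w, b = 0.0, 0.0
for _ in range(1000):
    w, b = gradient_descent_step(x_train, y_train, w, b, alpha=0.1)
print(w, b)                               # w approaches 2, b approaches 0
```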
&lt;h2&gt;Finished Week 1&apos;s course!&lt;/h2&gt;
&lt;p&gt;It&apos;s a milestone to me!&lt;/p&gt;
&lt;p&gt;I used 3 days (technically two days) to finish the first week&apos;s course!&lt;/p&gt;
&lt;p&gt;I love ML! Let&apos;s keep the momentum!&lt;/p&gt;
&lt;h1&gt;Terminology&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Comments&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Squared error cost function&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local minimal&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tangent line&lt;/td&gt;
&lt;td&gt;Andrew introduced this when trying to show how derivative impact the cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Converge/Diverge&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convex function&lt;/td&gt;
&lt;td&gt;It has a single global minimum because of this bowl-shape.    The technical term for this is that this cost function is a convex function.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Batch Gradient Descent&lt;/td&gt;
&lt;td&gt;&quot;Batch&quot;: each step of gradient descent uses all the training examples.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Mastering Multiple Features &amp; Vectorization: Supervised Machine Learning – Day 4 and 5</title><link>https://geekcoding101.com/posts/mastering-multiple-features-vectorization-supervised-machine-learning-day-4-and-5</link><guid isPermaLink="true">https://geekcoding101.com/posts/mastering-multiple-features-vectorization-supervised-machine-learning-day-4-and-5</guid><pubDate>Thu, 18 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;So difficult to manage some time on this&lt;/h1&gt;
&lt;p&gt;Day 4 was a long day for me; I just got 15 mins before bed to quickly skim through the videos &quot;multiple features&quot; and &quot;vectorization part 1&quot;.&lt;/p&gt;
&lt;p&gt;Day 5, an even longer day than yesterday... went to urgent care in the morning... then back-to-back meetings after coming back... lunch... back-to-back meetings again... need to step out again...&lt;/p&gt;
&lt;p&gt;Anyway, that&apos;s life.&lt;/p&gt;
&lt;h1&gt;Multiple features (variables) and Vectorization&lt;/h1&gt;
&lt;p&gt;In &quot;multiple features&quot;, Andrew crisply explained how to simplify the multiple-feature formula using vectors and the dot product.&lt;/p&gt;
&lt;p&gt;In &quot;Part 1&quot;, Andrew introduced how to use NumPy to compute the dot product and said GPUs are good at this type of calculation. NumPy functions can use parallel hardware to make the dot product fast.&lt;/p&gt;
&lt;p&gt;In &quot;Part 2&quot;, Andrew further explained why computers can compute dot products so fast, using gradient descent as an example.&lt;/p&gt;
&lt;p&gt;The labs were informative; I walked through all of them, though I already knew most of the content.&lt;/p&gt;
&lt;p&gt;More links about &lt;a href=&quot;https://www.geeksforgeeks.org/vectorization-techniques-in-nlp/&quot;&gt;vectorization can be found here&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Questions for helping myself learning&lt;/h1&gt;
&lt;p&gt;I created the following questions to test my knowledge later.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Why is there an arrow over the variable?&lt;br /&gt;
A: It&apos;s optional, but nice to have to indicate it&apos;s a vector, not a single number.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What does the dot product do?&lt;br /&gt;
A: The dot product of two vectors (two lists of numbers) W and X is computed by multiplying corresponding pairs of numbers and summing the products.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is multiple linear regression?&lt;br /&gt;
A: It&apos;s the name for the type of linear regression model with multiple input features, in contrast to &quot;univariate regression&quot;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Is multivariate regression the same as &quot;multiple linear regression&quot;?&lt;br /&gt;
A: No. Multivariate regression is something else.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Why do we need vectorization?&lt;br /&gt;
A: Vectorized code can perform calculations in much less time than unvectorized code on specialized hardware. &lt;br /&gt;
This matters more when you&apos;re running algorithms on large data sets or trying to train large models, which is often the case in machine learning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
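&lt;p&gt;The dot product answer above, in code - a small NumPy sketch of my own (the numbers are made up):&lt;/p&gt;

```python
import numpy as np

w = np.array([1.0, 2.5, -3.3])
x = np.array([10.0, 20.0, 30.0])
b = 4.0

# Unvectorized: multiply corresponding pairs of numbers, then sum
f = 0.0
for j in range(w.shape[0]):
    f = f + w[j] * x[j]
f = f + b

# Vectorized: NumPy computes the same dot product in one call,
# which can run on fast parallel hardware under the hood
f_vec = np.dot(w, x) + b
print(f, f_vec)   # both give the same prediction
```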
&lt;p&gt;&lt;img src=&quot;./AYbduL58LYRXAAAAAElFTkSuQmCC&quot; alt=&quot;vectorization and multiple feature&quot; title=&quot;vectorization and multiple feature&quot; /&gt;&lt;/p&gt;
&lt;p&gt;What is $x_1^{(4)}$ in the graph above?&lt;/p&gt;
&lt;p&gt;Ps. feel free to check out the series of my &lt;a href=&quot;/tags/machine-learning&quot;&gt;Supervised Machine Learning journey&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Supervised Machine Learning – Day 6</title><link>https://geekcoding101.com/posts/supervised-machine-learning-day-6-7</link><guid isPermaLink="true">https://geekcoding101.com/posts/supervised-machine-learning-day-6-7</guid><pubDate>Fri, 19 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today I spent 10 30 &lt;strong&gt;60&lt;/strong&gt; mins reviewing previous notes, just realized that&apos;s a lot.&lt;/p&gt;
&lt;p&gt;I am amazing 🤩&lt;/p&gt;
&lt;p&gt;Today started with &quot;Gradient descent for multiple linear regression&quot;.&lt;/p&gt;
&lt;h1&gt;Gradient descent for multiple linear regression&lt;/h1&gt;
&lt;p&gt;Holy... At the beginning, Andrew threw out the formula below and said he hoped we still remembered it. I didn&apos;t:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-36-1024x299.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Why couldn&apos;t I recognize it... Where does this come from?&lt;/p&gt;
&lt;p&gt;I spent 30 minutes reviewing several previous videos, then found it... The important videos are:&lt;/p&gt;
&lt;p&gt;1. Week 1 &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/TXDBu/implementing-gradient-descent&quot;&gt;Implementing gradient descent&lt;/a&gt;, where Andrew just wrote down the formula below without explaining it (he explained it later)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-37.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;2. &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/lgSMj/gradient-descent-for-linear-regression&quot;&gt;Gradient descent for linear regression&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-38-1024x504.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-40-1024x297.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Holy! Found a mistake in Andrew&apos;s course!&lt;br /&gt;
On the screenshot above, Andrew dropped the x(i) at the end of the first line!&lt;/p&gt;
&lt;p&gt;WOW! I ROCK! Spent almost 60mins!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I am done for today!&lt;/p&gt;
&lt;h1&gt;The situation reversed an hour later&lt;/h1&gt;
&lt;p&gt;But I felt upset. I was NOT convinced that I had really found such a simple mistake, especially in Andrew&apos;s most popular machine learning course!&lt;/p&gt;
&lt;p&gt;I started trying to derive the formula myself.&lt;/p&gt;
&lt;p&gt;And.... I found out I was indeed too young too naive... Andrew was right...&lt;/p&gt;
&lt;p&gt;I got help from my college classmate who has been dealing with calculus everyday for more than 20 years...&lt;/p&gt;
&lt;p&gt;This is his derivation of the formula:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./derivation-process-1024x500.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;He said this to me like my math teacher in college:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Chain rule, deriving step by step.&lt;/p&gt;
&lt;/blockquote&gt;
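&lt;p&gt;To convince myself without redoing the calculus, here is a quick numerical check (my own sketch) that the analytic gradient from the chain rule - including that trailing x(i) - matches a finite-difference approximation of the cost:&lt;/p&gt;

```python
import numpy as np

def cost(x, y, w, b):
    """Squared error cost J(w, b) with the 1/(2m) convention."""
    return np.sum((w * x + b - y) ** 2) / (2 * x.shape[0])

def analytic_dj_dw(x, y, w, b):
    # Chain rule result: the inner derivative contributes the trailing x_i
    return np.sum((w * x + b - y) * x) / x.shape[0]

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.5, 5.5])
w, b, h = 0.7, 0.3, 1e-6

# Central finite difference: (J(w+h) - J(w-h)) / (2h)
numeric = (cost(x, y, w + h, b) - cost(x, y, w - h, b)) / (2 * h)
print(analytic_dj_dw(x, y, w, b), numeric)   # nearly identical
```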
&lt;p&gt;If you still remember, I have mentioned Parul Pandey’s &lt;a href=&quot;https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e&quot;&gt;Understanding the Mathematics behind Gradient Descent&lt;/a&gt; in my previous post &lt;a href=&quot;/posts/supervised-machine-learning-day-2-3-on-my-way-to-becoming-a-machine-learning-person&quot;&gt;Supervised Machine Learning – Day 2 &amp;amp; 3 – On My Way To Becoming A Machine Learning Person&lt;/a&gt;, and in her post she did mention:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Primarily we shall be dealing with two concepts from calculus :&lt;/p&gt;
&lt;p&gt;Power rule and chain rule.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Well, she is right as well 😁&lt;/p&gt;
&lt;p&gt;So happy I learnt a lot today!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Finished Machine Learning for Absolute Beginners - Level 1</title><link>https://geekcoding101.com/posts/finished-machine-learning-for-absolute-beginners-level-1</link><guid isPermaLink="true">https://geekcoding101.com/posts/finished-machine-learning-for-absolute-beginners-level-1</guid><pubDate>Thu, 25 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;./ml-absolute-beginner-level1-cert-1024x746.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As you know I was in progress learning Andrew Ng&apos;s &lt;a href=&quot;https://www.coursera.org/learn/machine-learning&quot;&gt;Supervised Machine Learning: Regression and Classification&lt;/a&gt;, it&apos;s so dry!&lt;/p&gt;
&lt;p&gt;So I also spared some time to pick up some easy ML courses to help me understand.&lt;/p&gt;
&lt;p&gt;Today I came across &lt;a href=&quot;https://www.udemy.com/course/machine-learning-for-absolute-beginners-level-1&quot;&gt;Machine Learning for Absolute Beginners - Level 1&lt;/a&gt;, and it&apos;s really easy and beginner-friendly.&lt;/p&gt;
&lt;p&gt;Finished in 2.5 hours - maybe because I&apos;ve made some good progress in &lt;a href=&quot;https://www.coursera.org/learn/machine-learning&quot;&gt;Supervised Machine Learning: Regression and Classification&lt;/a&gt;, so it felt easy.&lt;/p&gt;
&lt;p&gt;I want to share my notes in this blog post.&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;Applied AI&lt;/strong&gt; or &lt;strong&gt;Shallow AI&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Industrial robots can handle the specific, narrow tasks they have been programmed for; this is called &lt;strong&gt;Applied AI&lt;/strong&gt; or &lt;strong&gt;Shallow AI&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Under-fitting and over-fitting are challenges for Generalization.&lt;/p&gt;
&lt;h1&gt;Under-fitting&lt;/h1&gt;
&lt;p&gt;The trained model is not working well on the training data and can’t generalize to new data.&lt;/p&gt;
&lt;p&gt;Reasons may be:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The model was too simple&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Data set is not good enough&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;An ideal training process would look like:&lt;/p&gt;
&lt;p&gt;Under-fitting… better fitting… good fit&lt;/p&gt;
&lt;h1&gt;Over-fitting&lt;/h1&gt;
&lt;p&gt;The trained model works well on the training data but can’t generalize well to new data.&lt;/p&gt;
&lt;p&gt;Reasons may be:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Training dataset is not a true distribution of the data (mitigation: use a much larger training dataset and a test dataset)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A model that is too complex (fit the data as simply as possible)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Small training dataset&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Training dataset (labeled) -&amp;gt; ML &lt;strong&gt;Training phase&lt;/strong&gt; -&amp;gt; Trained Model&lt;/p&gt;
&lt;p&gt;The input (unlabeled dataset) -&amp;gt; processed by Trained model (&lt;strong&gt;inference phase&lt;/strong&gt;) -&amp;gt; output (labeled dataset)&lt;/p&gt;
&lt;p&gt;Approaches or learning algorithms of ML systems can be categorized into:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Supervised Learning&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unsupervised Learning&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Reinforcement Learning&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;&lt;strong&gt;Supervised Learning&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;There are two very typical tasks that are performed using supervised learning:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;classification (this is not clustering which belongs to Unsupervised learning!)&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;One of the algorithm: Support Vector Machines (SVM)&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;regression -&amp;gt; statistical methods for estimating the strength of the relationship between a dependent variable and one or more independent variables.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Linear regression&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Logistic regression&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Polynomial regression&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Shallow Learning&lt;/h1&gt;
&lt;p&gt;One of the common classification algorithms under the shallow learning category is called &lt;strong&gt;Support Vector Machines (SVM)&lt;/strong&gt;.&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;Unsupervised Learning&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The goal is to automatically identify meaningful patterns in unlabeled data.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Clustering&lt;/strong&gt; -&amp;gt; the task of identifying similar instances with shared attributes in a data set and grouping them together into clusters: grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups.&lt;br /&gt;
The output of the algorithm will be &lt;strong&gt;a set of labels assigning each data point to one of the identified clusters.&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Customer segmentation, like demographic information&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Anomaly/Outlier detection&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dimension reduction&lt;/strong&gt; - why we have this: one of the biggest challenges of supervised learning is having too many input features.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Large dataset -&amp;gt; pre-processing (Dimension Reduction) -&amp;gt; produce smaller dataset -&amp;gt; use the dataset for training Supervised learning model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Image segmentation&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ol&gt;
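&lt;p&gt;As a toy illustration of the clustering idea above, here is a tiny k-means sketch in NumPy (my own code, not part of the course; the data and the naive initialization are made up):&lt;/p&gt;

```python
import numpy as np

def kmeans(points, k, iterations=10):
    """Tiny k-means: assigns each data point to one of k clusters."""
    centers = points[:k].copy()              # naive init: first k points
    for _ in range(iterations):
        # Assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each center to the mean of the points assigned to it
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels

# Two obvious groups of 2-D points
data = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels = kmeans(data, k=2)
print(labels)   # first three points share one label, the last three the other
```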
&lt;h1&gt;&lt;strong&gt;Semi-supervised Learning&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Sitting between supervised learning and unsupervised learning&lt;/p&gt;
&lt;p&gt;It works like this way:&lt;/p&gt;
&lt;p&gt;Unlabeled dataset -&amp;gt; clustering -&amp;gt; clusters 1, 2, 3… -&amp;gt; label the dataset -&amp;gt; the labeled dataset can be used for training a supervised learning model.&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;Reinforcement Learning&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Completely different from the above (supervised and unsupervised).&lt;/p&gt;
&lt;p&gt;It’s not using a group of labeled or unlabeled examples.&lt;/p&gt;
&lt;p&gt;Used as a framework for decision-making tasks based on goals.&lt;/p&gt;
&lt;p&gt;It pursues a complex objective by performing multiple sequences of actions.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Usage&lt;/strong&gt;&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Chess game (all computer games typically) to achieve superhuman performance&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The feedback on selected strategies is delayed!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Try things and get feedback. In other words, based on the feedback or interaction, the model is learning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Training robot to perform tasks in dynamic environment or &lt;strong&gt;building real time recommendations&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;RL is a method used to let machines learn how to behave based on interaction with the environment while focusing on some end goal.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Decision Making Agent (Under RL)&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Because there are some end goals, so there must be a reward mechanism to move forward to the right direction while in the learning process.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RL builds a prediction model by gaining feedback from random trial and error and leveraging insight from previous interactions.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The cumulative knowledge of how to achieve a specific goal is reinforced again and again by experience.&lt;/p&gt;
&lt;h1&gt;Comments on the course&lt;/h1&gt;
&lt;p&gt;This course was indeed designed for &quot;Absolute Beginners&quot;.&lt;/p&gt;
&lt;p&gt;One suggestion is that the quizzes are too easy, as most simply reiterate the content explained in the course without introducing any variations or ambiguities to challenge the learner.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Master Feature Scaling &amp; Gradient Descent: Supervised Machine Learning – Day 7</title><link>https://geekcoding101.com/posts/master-feature-scaling-gradient-descent-supervised-machine-learning-day-7</link><guid isPermaLink="true">https://geekcoding101.com/posts/master-feature-scaling-gradient-descent-supervised-machine-learning-day-7</guid><pubDate>Thu, 25 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Welcome back&lt;/h1&gt;
&lt;p&gt;I didn&apos;t get much time working on the course in past 5 days!!!&lt;/p&gt;
&lt;p&gt;Finally resuming today!&lt;/p&gt;
&lt;p&gt;Today I reviewed &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/KMDV3/feature-scaling-part-1&quot;&gt;Feature scaling part 1&lt;/a&gt; and learned &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/akapu/feature-scaling-part-2&quot;&gt;Feature scaling part 2&lt;/a&gt; and &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/rOTkB/checking-gradient-descent-for-convergence&quot;&gt;Checking gradient descent for convergence&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The course is getting harder: for a 20-minute video, I spent double the time and needed to check external articles to get a better understanding.&lt;/p&gt;
&lt;h1&gt;Feature Scaling&lt;/h1&gt;
&lt;p&gt;Trying to understand what &quot;Feature Scaling&quot; is...&lt;/p&gt;
&lt;p&gt;What are features and parameters in below formula?&lt;/p&gt;
&lt;p&gt;$\hat{\text{Price}} = w_1x_1 + w_2x_2 + b$&lt;/p&gt;
&lt;p&gt;x1 and x2 are features: the former represents the size of the house, the latter the number of bedrooms.&lt;/p&gt;
&lt;p&gt;w1 and w2 are parameters.&lt;/p&gt;
&lt;p&gt;When a possible range of values of a feature is large, it&apos;s more likely that a good model will learn to choose a relatively small parameter value.&lt;/p&gt;
&lt;p&gt;Likewise, when the possible values of the feature are small, like the number of bedrooms, then a reasonable value for its parameters will be relatively large like 50.&lt;/p&gt;
&lt;p&gt;So how does this relate to gradient descent?&lt;/p&gt;
&lt;p&gt;At the end of this video, Andrew explained that the features need to be re-scaled or transformed so that the cost function J on the transformed data has a better shape, and gradient descent can find a much more direct path to the global minimum.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-41-1024x478.png&quot; alt=&quot;Feature Scaling &amp;amp; Gradient Descent&quot; title=&quot;Feature Scaling &amp;amp; Gradient Descent&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;When you have different features that take on very different ranges of values, it can cause gradient descent to run slowly, but rescaling the different features so they all take on comparable ranges of values can speed up gradient descent significantly.&lt;/p&gt;
&lt;p&gt;Andrew Ng&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One key aspect of feature engineering is scaling, normalization, and standardization, which involves transforming the data to make it more suitable for modeling. These techniques can help to improve model performance, reduce the impact of outliers, and ensure that the data is on the same scale.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/akapu/feature-scaling-part-2&quot;&gt;Feature scaling part 2&lt;/a&gt; video mentioned why we need to scale. I did some Google searching and found that &lt;a href=&quot;https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/&quot;&gt;Feature Scaling: Engineering, Normalization, and Standardization (Updated 2024)&lt;/a&gt; is really good.&lt;/p&gt;
&lt;p&gt;As a summary of the video, we know a few methods to do scaling:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Divide by the maximum value of the feature.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mean Normalization&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Z-score normalization&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
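&lt;p&gt;The three scaling methods above, sketched in NumPy (my own code; the formulas follow the lecture, the toy numbers are made up):&lt;/p&gt;

```python
import numpy as np

x = np.array([300.0, 1000.0, 2000.0, 5000.0])   # e.g. house sizes

# 1. Divide by the maximum value of the feature
x_max_scaled = x / np.max(x)

# 2. Mean normalization: (x - mean) / (max - min)
x_mean_norm = (x - np.mean(x)) / (np.max(x) - np.min(x))

# 3. Z-score normalization: (x - mean) / standard deviation
x_zscore = (x - np.mean(x)) / np.std(x)

print(x_max_scaled)   # all values now in (0, 1]
print(x_mean_norm)    # centered around 0
print(x_zscore)       # mean 0, standard deviation 1
```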
&lt;blockquote&gt;
&lt;p&gt;As a rule of thumb, when performing feature scaling, you might want to aim for getting the features to range from maybe anywhere around negative one to somewhere around plus one for each feature x. &lt;br /&gt;
But these values, negative one and plus one can be a little bit loose. &lt;br /&gt;
If the features range from negative three to plus three or negative 0.3 to plus 0.3, all of these are completely okay. &lt;br /&gt;
If you have a feature x_1 that winds up being between zero and three, that&apos;s not a problem. &lt;br /&gt;
You can re-scale it if you want, but if you don&apos;t re-scale it, it should work okay too.&lt;/p&gt;
&lt;p&gt;Andrew Ng&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If the range is too large or too small, you should rescale.&lt;/p&gt;
&lt;h1&gt;Checking gradient descent for convergence&lt;/h1&gt;
&lt;p&gt;I learnt to use the learning curve of the cost function J to see whether gradient descent is converging.&lt;/p&gt;
&lt;p&gt;If the graph of J ever increases after an iteration, that means either Alpha was chosen poorly (usually too large), or there could be a bug in the code.&lt;/p&gt;
&lt;p&gt;J of (vector w, b) should decrease as the number of iterations increases.&lt;/p&gt;
&lt;p&gt;If the curve has flattened out, this means that gradient descent has more or less converged because the curve is no longer decreasing.&lt;/p&gt;
&lt;p&gt;Andrew said he usually found choosing the right threshold epsilon pretty difficult, so he tended to look at the learning curve rather than rely on automatic convergence tests.&lt;/p&gt;
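&lt;p&gt;The automatic convergence test Andrew described could be sketched like this (my own code; using math.isclose to compare the last two costs, and the epsilon value is an arbitrary assumption):&lt;/p&gt;

```python
import math

def converged(cost_history, epsilon=1e-3):
    """Convergence test: the last two recorded costs differ by less than
    epsilon, i.e. the learning curve has flattened out.  Assumes the cost
    is non-increasing, as it should be with a well-chosen learning rate."""
    if len(cost_history) in (0, 1):
        return False
    return math.isclose(cost_history[-1], cost_history[-2], abs_tol=epsilon)

history = [10.0, 4.0, 2.0, 1.5, 1.4995]
print(converged(history[:3]))   # False: J is still dropping fast
print(converged(history))       # True: the last drop was only 0.0005
```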
&lt;h1&gt;References&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/&quot;&gt;Feature Scaling: Engineering, Normalization, and Standardization (Updated 2024)&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Terminology&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Comments&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Normalization&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean Normalization&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Z-score normalization&lt;/td&gt;
&lt;td&gt;Involved to calculate &quot;standard deviation σ&quot;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;standard deviation σ&lt;/td&gt;
&lt;td&gt;It is &lt;strong&gt;a measure of how dispersed the data is in relation to the mean&lt;/strong&gt;. Low, or small, standard deviation indicates data are clustered tightly around the mean, and high, or large, standard deviation indicates data are more spread out.   Or if you&apos;ve heard of the normal distribution or the bell-shaped curve, sometimes also called the Gaussian distribution, this is what the standard deviation for the normal distribution looks like.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;It is difficult to tell in advance how many iterations gradient descent needs to converge, which is why you can create a learning curve.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Automatic Convergence Test&lt;/td&gt;
&lt;td&gt;Another way to decide when your model is done training is with an automatic convergence test.   If the cost J decreases by less than this number epsilon on one iteration, then you&apos;re likely on this flattened part of the curve that you see on the left and you can declare convergence.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
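&lt;p&gt;A minimal NumPy sketch of two ideas from the table above, z-score normalization and the automatic convergence test (the epsilon value is illustrative, not from the course):&lt;/p&gt;

```python
import numpy as np

def zscore_normalize(X):
    """Z-score normalization: subtract the mean, divide by the standard deviation σ."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

def converged(cost_history, epsilon=1e-3):
    """Automatic convergence test: stop when cost J drops by less than epsilon."""
    if len(cost_history) >= 2:
        return epsilon > cost_history[-2] - cost_history[-1]
    return False
```

&lt;p&gt;After normalization, every feature has mean 0 and standard deviation 1, which is what makes gradient descent converge faster on features of very different scales.&lt;/p&gt;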
</content:encoded><author>GeekCoding101</author></item><item><title>Master Learning Rate and Feature Engineering: Supervised Machine Learning – Day 8</title><link>https://geekcoding101.com/posts/master-learning-rate-and-feature-engineering-supervised-machine-learning-day-8</link><guid isPermaLink="true">https://geekcoding101.com/posts/master-learning-rate-and-feature-engineering-supervised-machine-learning-day-8</guid><pubDate>Sat, 27 Apr 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today I started with &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/10ZVv/choosing-the-learning-rate&quot;&gt;Choosing the learning rate&lt;/a&gt;, reviewed the Jupyter lab and learnt what is &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/dgZYR/feature-engineering&quot;&gt;feature engineering&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Choosing the learning rate&lt;/h1&gt;
&lt;p&gt;The graph taught in &lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/10ZVv/choosing-the-learning-rate&quot;&gt;Choosing the learning rate&lt;/a&gt; is helpful when developing models:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-42-1024x508.png&quot; alt=&quot;feature engineering and gradient descent&quot; title=&quot;feature engineering and gradient descent&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Feature Engineering&lt;/h2&gt;
&lt;p&gt;When I first started Andrew Ng’s Supervised Machine Learning course, I didn’t really realize how much of an impact feature engineering could have on a model’s performance. But boy, was I in for a surprise! As I worked through the course, I quickly realized that the raw data we start with is rarely good enough for building a great model. Instead, it needs to be transformed, scaled, and cleaned up — that’s where feature engineering comes into play.&lt;/p&gt;
&lt;p&gt;Feature engineering is all about making your data more useful for a machine learning algorithm. Think of it like preparing ingredients for a recipe — the better the quality of your ingredients, the better the final dish will be. Similarly, in machine learning, the features (the input variables) need to be well-prepared to help the algorithm understand patterns more easily. Without this step, even the most powerful algorithms might not perform at their best.&lt;/p&gt;
&lt;p&gt;In the course, Andrew Ng really breaks it down and explains how important feature scaling and transformation are. In one of the early lessons, he used the example of linear regression — a simple algorithm that relies on understanding the relationship between input features and the output. If the features are on vastly different scales, it can throw off the whole process and make training the model take much longer. This was something I had never considered before, but it made so much sense once he explained it.&lt;/p&gt;
&lt;p&gt;I remember struggling a bit with the idea of scaling features at first. Some of the variables in the data might be on completely different scales, like the size of a house in square feet versus the number of bedrooms. Features like the size of a house might have values in the thousands, while the number of bedrooms might only range from 1 to 5. Without scaling, the larger feature would dominate the learning process, and the model would be biased toward it. Learning how to scale these features properly made a huge difference in getting the algorithm to work more efficiently.&lt;/p&gt;
&lt;p&gt;One of the key takeaways from this part of the course was the importance of understanding your data. Andrew Ng emphasizes that feature engineering is not just about applying transformations, but also about using your knowledge of the problem to make the data more relevant. For example, I learned that I could create new features from existing ones. If I had data about the size of a house and the number of bedrooms, I could create a new feature like &quot;bedrooms per square foot,&quot; which might give the model more useful information to work with.&lt;/p&gt;
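&lt;p&gt;The scaling and derived-feature ideas above can be sketched in a few lines. The housing numbers and the &lt;code&gt;scale&lt;/code&gt; helper here are illustrative, not from the course labs:&lt;/p&gt;

```python
import numpy as np

# Hypothetical housing data: size in square feet, number of bedrooms.
size_sqft = np.array([2104.0, 1416.0, 852.0])
bedrooms = np.array([5.0, 3.0, 2.0])

# A derived feature, as mentioned above: bedrooms per square foot.
bedrooms_per_sqft = bedrooms / size_sqft

# Scale each feature so the large one (size) doesn't dominate training.
def scale(x):
    return (x - x.mean()) / x.std()

X = np.column_stack([scale(size_sqft), scale(bedrooms), scale(bedrooms_per_sqft)])
```

&lt;p&gt;After this step every column of &lt;code&gt;X&lt;/code&gt; is on a comparable scale, so gradient descent treats house size and bedroom count evenly.&lt;/p&gt;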
&lt;p&gt;As the course progressed, I saw how important feature engineering is not just for scaling but also for improving model performance in more complex algorithms like logistic regression and neural networks. It was all about creating features that made it easier for the algorithm to find patterns and make predictions.&lt;/p&gt;
&lt;p&gt;To be honest, at first, I didn’t realize how much of a difference feature engineering could make. But after practicing it through the course, I saw firsthand how tweaking and transforming features could lead to much better results. It felt like I had unlocked a new level in the course, where I wasn’t just feeding data into the algorithm, but actually working with it to make the algorithm smarter. Feature engineering really turned out to be one of the most rewarding parts of learning machine learning, and it’s something I’ll definitely keep honing as I continue on this journey.&lt;/p&gt;
&lt;h1&gt;It&apos;s the end of this learning week?!&lt;/h1&gt;
&lt;p&gt;Didn&apos;t expect that I reached the end of this learning week!&lt;/p&gt;
&lt;p&gt;I didn&apos;t spend much time on the labs. There are three labs in this learning week, and I&apos;m afraid I might not be ready for this week&apos;s practice lab.&lt;/p&gt;
&lt;p&gt;Will come back to update more details after my exercises.&lt;/p&gt;
&lt;p&gt;Ps. feel free to check out &lt;a href=&quot;https://geekcoding101.com/tags/machine-learning&quot;&gt;my other posts in Supervised Machine Learning Journey&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Master Gradient Descent and Binary Classification: Supervised Machine Learning – Day 9</title><link>https://geekcoding101.com/posts/master-gradient-descent-and-binary-classification-supervised-machine-learning-day-9</link><guid isPermaLink="true">https://geekcoding101.com/posts/master-gradient-descent-and-binary-classification-supervised-machine-learning-day-9</guid><pubDate>Thu, 09 May 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;A break due to sickness&lt;/h1&gt;
&lt;p&gt;Oh boy... I was sick for almost two weeks 🤒 After a brief break, I’m back to dive deep into machine learning, and today, we’ll revisit one of the core concepts in training models—&lt;a href=&quot;https://en.wikipedia.org/wiki/Gradient_descent&quot;&gt;&lt;strong&gt;gradient descent&lt;/strong&gt;&lt;/a&gt;. This optimization technique is essential for minimizing the cost function and finding the optimal parameters for our machine learning models. Whether you&apos;re working with linear regression or more complex algorithms, understanding how gradient descent guides the learning process is key to achieving accurate predictions and efficient model training.&lt;/p&gt;
&lt;p&gt;Let&apos;s dive back into the data-drenched depths where we left off, shall we? 🚀&lt;/p&gt;
&lt;h1&gt;The first coding assessment&lt;/h1&gt;
&lt;p&gt;I couldn&apos;t recall all of the material, actually. The assessment tests the implementation of gradient descent for one-variable linear regression.&lt;/p&gt;
&lt;p&gt;I walked through the previous lessons and found this summary really helpful:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-1024x760.png&quot; alt=&quot;gradient descent&quot; title=&quot;gradient descent&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This exercise reinforced what I&apos;ve learnt about &quot;gradient descent&quot; this week.&lt;/p&gt;
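&lt;p&gt;For my own reference, here is a minimal sketch of what the assessment asks for: batch gradient descent for one-variable linear regression with a squared-error cost. The learning rate and iteration count are illustrative defaults, not the assignment&apos;s:&lt;/p&gt;

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, num_iters=1000):
    """Batch gradient descent for f(x) = w*x + b, minimizing squared-error cost."""
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(num_iters):
        f = w * x + b                      # current predictions
        dj_dw = ((f - y) * x).sum() / m    # partial derivative of J w.r.t. w
        dj_db = (f - y).sum() / m          # partial derivative of J w.r.t. b
        w -= alpha * dj_dw                 # simultaneous update of w and b
        b -= alpha * dj_db
    return w, b
```

&lt;p&gt;Running this on data generated from a known line recovers the original slope and intercept, which is a quick sanity check for the implementation.&lt;/p&gt;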
&lt;h1&gt;Getting into Classification&lt;/h1&gt;
&lt;p&gt;I started the third week of the course. Looks like it will be more interesting.&lt;/p&gt;
&lt;p&gt;I made a few notes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;binary classification&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The negative class doesn&apos;t mean &quot;bad&quot;; it means absence.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The positive class doesn&apos;t mean &quot;good&quot;; it means presence.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;New English words: benign, malignant&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Logistic regression: even though it has &quot;regression&quot; in the name, it&apos;s used for classification.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;threshold&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;sigmoid function or logistic function&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;decision boundary&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;./image-1-1024x521.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-2.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Probability that y is 1,&lt;br /&gt;
given input vector x and parameters vector w, b.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I couldn&apos;t focus too long on this. Need to pause after watching a few videos.&lt;/p&gt;
&lt;p&gt;Bye now.&lt;/p&gt;
&lt;p&gt;Ps. feel free to check out &lt;a href=&quot;https://geekcoding101.com/tags/machine-learning&quot;&gt;my other posts in Supervised Machine Learning Journey&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Grinding Through Logistic regression: Exploring Supervised Machine Learning – Day 10</title><link>https://geekcoding101.com/posts/grinding-through-logistic-regression-exploring-supervised-machine-learning-day-10</link><guid isPermaLink="true">https://geekcoding101.com/posts/grinding-through-logistic-regression-exploring-supervised-machine-learning-day-10</guid><pubDate>Sat, 11 May 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Let&apos;s continue!&lt;/h1&gt;
&lt;p&gt;Today is mainly learning about &quot;Decision boundary&quot;, &quot;Cost function of logistic regression&quot;, &quot;Logistic loss&quot; and &quot;Gradient Descent Implementation for logistic regression&quot;.&lt;/p&gt;
&lt;p&gt;We found out the &quot;decision boundary&quot; is where z equals 0 in the sigmoid function.&lt;/p&gt;
&lt;p&gt;At that point the sigmoid outputs 0.5, its neutral position.&lt;/p&gt;
&lt;p&gt;Andrew gave an example with two variables, z = x1 + x2 - 3 (w1 = w2 = 1); the decision boundary is the line x1 + x2 = 3.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-3-1024x524.png&quot; alt=&quot;decision boundary formula and graph&quot; title=&quot;decision boundary formula and graph&quot; /&gt;&lt;/p&gt;
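&lt;p&gt;Andrew&apos;s example can be sketched in a few lines (the 0.5 threshold is the usual default for logistic regression):&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    """The sigmoid (logistic) function, mapping any z to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Andrew's example: w1 = w2 = 1, b = -3, so z = x1 + x2 - 3.
def predict(x1, x2, threshold=0.5):
    z = x1 + x2 - 3.0
    return 1 if sigmoid(z) >= threshold else 0

# On the decision boundary x1 + x2 = 3 we get z = 0, and sigmoid(0) = 0.5:
# points above the line predict 1, points below predict 0.
```

&lt;p&gt;For example, &lt;code&gt;predict(2, 2)&lt;/code&gt; returns 1 (above the line) while &lt;code&gt;predict(1, 1)&lt;/code&gt; returns 0 (below it).&lt;/p&gt;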
&lt;h2&gt;I want to say &quot;&lt;a href=&quot;https://www.coursera.org/learn/machine-learning/lecture/0hpr8/cost-function-for-logistic-regression#&quot;&gt;Cost function for logistic regression&lt;/a&gt;&quot; is the hardest part of week 3 I&apos;ve seen so far.&lt;/h2&gt;
&lt;p&gt;I haven&apos;t quite figured out why the squared error cost function isn&apos;t applicable and where the loss function comes from.&lt;/p&gt;
&lt;p&gt;I have to re-watch the videos again.&lt;/p&gt;
&lt;p&gt;The lab is also useful.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-4-1024x495.png&quot; alt=&quot;Logistic regression graph&quot; title=&quot;Logistic regression graph&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-5-1024x310.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-13-1024x472.png&quot; alt=&quot;simplified cost function&quot; title=&quot;simplified cost function&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This particular cost function is derived from statistics using a principle called &lt;a href=&quot;https://en.wikipedia.org/wiki/Maximum_likelihood_estimation&quot;&gt;&lt;strong&gt;maximum likelihood estimation (MLE)&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Questions and Answers&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Why do we need a loss function?&lt;br /&gt;
A: Logistic regression requires a cost function more suitable to its non-linear nature. This starts with a loss function.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Why is the square error cost function not applicable to logistic regression?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is maximum likelihood?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In logistic regression, &quot;cost&quot; and &quot;loss&quot; have distinct meanings. Which one applies to a single training example?&lt;br /&gt;
A: The term &quot;loss&quot; typically refers to the measure applied to a single training example, while &quot;cost&quot; refers to the average of the loss across the entire dataset or a batch of training examples.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
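&lt;p&gt;The loss/cost distinction from question 4 can be sketched directly: the loss applies to one training example, and the cost averages the loss over all examples (a sketch of the simplified formulas above, not the lab&apos;s exact code):&lt;/p&gt;

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(f, y):
    """Loss for a single example: -y*log(f) - (1-y)*log(1-f)."""
    return -y * np.log(f) - (1.0 - y) * np.log(1.0 - f)

def cost(X, y, w, b):
    """Cost: the average of the loss over all m training examples."""
    f = sigmoid(X @ w + b)
    return logistic_loss(f, y).mean()
```

&lt;p&gt;A handy sanity check: with w = 0 and b = 0 the model predicts 0.5 for every example, so the cost is log(2) regardless of the labels.&lt;/p&gt;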
&lt;h1&gt;Some thoughts of today&lt;/h1&gt;
&lt;p&gt;Honestly, it feels like it&apos;s getting tougher and tougher.&lt;/p&gt;
&lt;p&gt;I can still get through the equations and derivations alright, it’s just that as I age, I feel like my brain is just not keeping up.&lt;/p&gt;
&lt;p&gt;At the end of each video, Andrew always congratulates me with a big smile, saying I’ve mastered the content of the session.&lt;/p&gt;
&lt;p&gt;But deep down, I really think what he&apos;s actually thinking is, &quot;Ha, got you stumped again!&quot;&lt;/p&gt;
&lt;p&gt;However, to be fair, Andrew really does explain things superbly well.&lt;/p&gt;
&lt;p&gt;I hope someday I can truly master this knowledge and use it effortlessly.&lt;/p&gt;
&lt;p&gt;Fighting!&lt;/p&gt;
&lt;p&gt;Ps. Feel free to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/machine-learning&quot;&gt;AI Machine Learning Journal blog posts at here&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Overfitting! Unlocking the Last Key Concept in Supervised Machine Learning – Day 11, 12</title><link>https://geekcoding101.com/posts/overfitting-unlocking-the-last-key-concept-in-supervised-machine-learning-day-11-12</link><guid isPermaLink="true">https://geekcoding101.com/posts/overfitting-unlocking-the-last-key-concept-in-supervised-machine-learning-day-11-12</guid><pubDate>Mon, 13 May 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;./minion-woohoo.gif&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I finished the course!&lt;/p&gt;
&lt;p&gt;I really enjoyed the learning experiences in Andrew&apos;s course so far.&lt;/p&gt;
&lt;p&gt;Let&apos;s see what I&apos;ve learned over these two days!&lt;/p&gt;
&lt;h1&gt;Overfitting - The Last Topic of this Course!&lt;/h1&gt;
&lt;p&gt;&lt;img src=&quot;./data-fitting-1024x420.webp&quot; alt=&quot;overfitting&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;&lt;a href=&quot;https://developers.google.com/machine-learning/crash-course/overfitting/overfitting&quot;&gt;&lt;strong&gt;Overfitting&lt;/strong&gt;&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;It occurs when a machine learning model learns the details and noise in the training data to an extent that it negatively impacts the performance of the model on new data. This means the model is great at predicting or fitting the training data but performs poorly on unseen data, due to its inability to generalize from the training set to the broader population of data.&lt;/p&gt;
&lt;p&gt;The course explains that overfitting can be addressed by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reducing model complexity&lt;/strong&gt;: Simplifying the model by selecting one with fewer parameters.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Regularization&lt;/strong&gt;: Adding a regularization term to the loss function, which penalizes large coefficients.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Using more training data&lt;/strong&gt;: More data can help the model learn more generalizable patterns.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
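&lt;p&gt;The regularization remedy above can be sketched by adding an L2 penalty to a squared-error cost. The scaling convention follows the course&apos;s usual form, but treat this as an illustrative sketch rather than the lab&apos;s exact code:&lt;/p&gt;

```python
import numpy as np

def regularized_cost(X, y, w, b, lam=1.0):
    """Squared-error cost plus an L2 penalty that shrinks the weights w.

    The penalty term (lam / (2m)) * sum(w_j**2) discourages large parameters;
    b is conventionally left unregularized.
    """
    m = X.shape[0]
    err = X @ w + b - y
    base = (err ** 2).sum() / (2 * m)
    penalty = lam * (w ** 2).sum() / (2 * m)
    return base + penalty
```

&lt;p&gt;With lam = 0 this reduces to the plain squared-error cost; a larger lam pushes the optimizer toward smaller weights, trading some training fit for better generalization.&lt;/p&gt;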
&lt;p&gt;We can&apos;t skip &lt;strong&gt;underfitting&lt;/strong&gt; either.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Overfitting and underfitting&lt;/strong&gt; both are undesirable effects that suggest a model is not well-tuned to the task at hand, but they stem from opposite causes and have different solutions.&lt;/p&gt;
&lt;p&gt;Below two screenshots captured from course for my notes:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-15-1024x498.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-14-1024x514.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Questions help me to master the content&lt;/h1&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;What is overfitting (aka. high variance)? How to address it?&lt;br /&gt;
Solutions:&lt;br /&gt;
a. Get more training data.&lt;br /&gt;
b. Select features to include/exclude (more features plus insufficient data will cause overfitting). However, useful features could be lost.&lt;br /&gt;
c. Regularization: reduce the size of the parameters. Encourage the learning algorithm to shrink the parameter values without necessarily demanding that any parameter be set to exactly 0.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is underfitting?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is λ? How would it impact the learning algorithm if you choose a very large or very small value?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In practice, does regularizing b make much difference or not?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Give an example explaining what is preconception (aka. bias or underfit)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What is generalization?&lt;br /&gt;
You want your learning algorithm to generalize well, which means making good predictions even on brand new examples it has never seen before.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;(The hardest one) Write down all the formulas taught in videos and explain how they could be implemented in python!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Words From Andrew At The End!&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;I want to say congratulations on how far you&apos;ve come and I want to say great job for getting through all the way to the end of this video.&lt;/p&gt;
&lt;p&gt;I hope you also work through the practice labs and quizzes.&lt;/p&gt;
&lt;p&gt;Having said that, there are still many more exciting things to learn.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Awesome!&lt;/p&gt;
&lt;p&gt;I am ready for my next machine learning journey!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Install Azure-Cli on Mac</title><link>https://geekcoding101.com/posts/install-azure-cli-on-mac</link><guid isPermaLink="true">https://geekcoding101.com/posts/install-azure-cli-on-mac</guid><pubDate>Wed, 12 Jun 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Are you ready to delve into the exciting realm of Azure AI?&lt;/p&gt;
&lt;p&gt;Whether you&apos;re a seasoned developer or just starting your journey in the world of artificial intelligence, Microsoft Build offers a transformative opportunity to harness the power of AI.&lt;/p&gt;
&lt;p&gt;Recently I came across several good tutorials on Microsoft website, e.g. &quot;&lt;a href=&quot;https://learn.microsoft.com/en-us/training/challenges?id=d1db6d81-f56e-4032-8779-b00a75aa762f&amp;amp;WT.mc_id=cloudskillschallenge_d1db6d81-f56e-4032-8779-b00a75aa762f&amp;amp;ocid=build24_csc_learnpromo_T1_cnl&quot;&gt;CLOUD SKILLS CHALLENGE: Microsoft Build: Build multimodal Generative AI experiences&lt;/a&gt;&quot;.&lt;/p&gt;
&lt;p&gt;I enjoyed learning from it. But I found that the very first step may seem like a challenge to many people: getting the az command to work on a Mac!&lt;/p&gt;
&lt;p&gt;So I decided to write down all my fixes.&lt;/p&gt;
&lt;p&gt;Let&apos;s go!&lt;/p&gt;
&lt;h1&gt;Resolution&lt;/h1&gt;
&lt;p&gt;I am following &quot;&lt;a href=&quot;https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-macos&quot;&gt;Install Azure CLI on Mac&lt;/a&gt;&quot;.&lt;/p&gt;
&lt;p&gt;Run command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew update &amp;amp;&amp;amp; brew install azure-cli
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But it failed with permission issue on openssl package:&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;./image-1024x373.png&quot; alt=&quot;Figure 1: install Azure Cli HIT Openssl Permission Issue&quot; title=&quot;Figure 1: install Azure Cli HIT Openssl Permission Issue&quot; /&gt;
&lt;figcaption&gt;Figure 1: OpenSSL Permission Issue&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;I fixed it by changing the permission of &lt;code&gt;/usr/local/lib&lt;/code&gt; to current user, but it&apos;s not enough.&lt;/p&gt;
&lt;p&gt;I hit Python permission issue at a different location:&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;./image-1-1024x192.png&quot; alt=&quot;Figure 2: Python Permission Issue&quot; /&gt;
&lt;figcaption&gt;Figure 2: Python Permission Issue&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;So I had to apply the permission change to &lt;code&gt;/usr/local&lt;/code&gt;. The commands are:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew unlink python@3.11
sudo chown -R &quot;$USER&quot;:admin /usr/local/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The screenshot of &lt;code&gt;brew unlink&lt;/code&gt; command:&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;./image-2-1024x69.png&quot; alt=&quot;Figure 3: Brew Unlink Existing Python&quot; /&gt;
&lt;figcaption&gt;Figure 3: Brew Unlink Existing Python&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Finally it finished installation successfully!&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;./image-3-1024x103.png&quot; alt=&quot;Figure 4: azure-cli Installation Success&quot; /&gt;
&lt;figcaption&gt;Figure 4: Installation Success&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Well done!&lt;/p&gt;
&lt;p&gt;Ps. You&apos;re welcome to access my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;AI Insights blog posts here&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Honored to Pass AI-102!</title><link>https://geekcoding101.com/posts/honored-to-pass-ai-102</link><guid isPermaLink="true">https://geekcoding101.com/posts/honored-to-pass-ai-102</guid><pubDate>Thu, 27 Jun 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;So, I did a thing. I earned my first Microsoft certificate: &lt;strong&gt;Azure AI Engineer Associate&lt;/strong&gt;! 🎉&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./az_102_cert-1024x721.png&quot; alt=&quot;AI-102 certificate&quot; title=&quot;AI-102 certificate&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Here is my story from training to passing the AI-102 exam.&lt;/p&gt;
&lt;h1&gt;The Learning Journey of AI-102&lt;/h1&gt;
&lt;p&gt;The journey began with a four-and-a-half-day company-provided AI-102 training session.&lt;/p&gt;
&lt;p&gt;It was a mix of online classes and labs. I was actually on vacation during this period, so I only managed to focus for about three days, probably.&lt;/p&gt;
&lt;p&gt;The labs provided during the training were very useful.&lt;/p&gt;
&lt;p&gt;There were about 10 labs, each lab could be done up to 10 times, with each session lasting 1 to 3 hours.&lt;/p&gt;
&lt;p&gt;So I didn’t need to pay Microsoft to get familiar with the Azure AI environment.&lt;/p&gt;
&lt;p&gt;By rough calculation, the training provided 100 to 200 hours of lab time, but I only used about 20 hours before taking the exam.&lt;/p&gt;
&lt;p&gt;After the AI-102 training, I mainly stuck to &lt;a href=&quot;https://learn.microsoft.com/en-us/training/courses/ai-102t00&quot;&gt;Microsoft Learn: Designing and Implementing a Microsoft Azure AI Solution&lt;/a&gt; to fill gaps.&lt;/p&gt;
&lt;p&gt;Trust me, that&apos;s really helpful! The MS Learn modules helped me understand the concepts better.&lt;/p&gt;
&lt;h2&gt;Cramming and Building Knowledge for AI-102&lt;/h2&gt;
&lt;p&gt;As the exam date got closer, I quickly skimmed &lt;a href=&quot;https://www.youtube.com/watch?v=I7fdWafTcPY&amp;amp;t=71s&quot;&gt;John Savill&apos;s Technical Training videos&lt;/a&gt; on YouTube once.&lt;/p&gt;
&lt;p&gt;His videos helped me build a complete knowledge framework in my head. One time is enough for me.&lt;/p&gt;
&lt;p&gt;Last but not least, please do read &lt;a href=&quot;https://areebpasha.notion.site/AI-102-Notes-dd32c9f349bb4e64a0d26ea661ba789c&quot;&gt;Areeb Pasha&apos;s AI-102 notes on Notion&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;Thanks to &lt;a href=&quot;https://www.reddit.com/user/_areebpasha/&quot;&gt;Areeb Pasha&lt;/a&gt;! It&apos;s so useful.&lt;/p&gt;
&lt;p&gt;These notes were like a concise version of MS Learn and made my studying very efficient.&lt;/p&gt;
&lt;p&gt;I managed to cover all the important points quickly.&lt;/p&gt;
&lt;h1&gt;The Exam Day Experience&lt;/h1&gt;
&lt;p&gt;The exam day finally arrived: June 26, 2024!&lt;/p&gt;
&lt;p&gt;My exam was scheduled at 8:15 AM, so I logged into OneVue 30 minutes early as advised.&lt;/p&gt;
&lt;p&gt;It turned out to be a good idea because there were many things to do before starting the exam (in my case tho).&lt;/p&gt;
&lt;p&gt;First, I had to tidy up my room...&lt;/p&gt;
&lt;p&gt;Then, there was an ID check and a room scan. I had to use my webcam to show every corner of my desk, which was quite tricky.&lt;/p&gt;
&lt;p&gt;Spoiler: all the photos were crooked.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; No paper and pen are allowed. My proctor made me remove even the white sketch paper and pencil.&lt;/p&gt;
&lt;p&gt;So, it was just my computer, and a bottle of water on the table.&lt;/p&gt;
&lt;h2&gt;Wait! Scheduling Challenges&lt;/h2&gt;
&lt;p&gt;Scheduling the AI-102 exam was a challenge in itself. Booking an in-person exam at a nearby exam center was nearly impossible within three weeks.&lt;/p&gt;
&lt;p&gt;I checked all the exam centers online. No luck!&lt;/p&gt;
&lt;p&gt;I just checked again, you can feel it:&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;./no-slot-1024x532.png&quot; alt=&quot;Figure-1: No exam slot available&quot; /&gt;
&lt;figcaption&gt;Figure-1: No exam slot available&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;There were no available slots until October!&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;./october-slot-1024x532.png&quot; alt=&quot;Figure 2: Earliest available exam slot found in October 2024&quot; /&gt;
&lt;figcaption&gt;Figure 2: Earliest available exam slot found in October 2024&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;So I switched to an online exam, but the earliest slots were at 5 AM, with only one slot per day.&lt;/p&gt;
&lt;p&gt;After much searching, I found a slot 10 days later. I would be out of town for a couple of days, remember I was still on vacation?!&lt;/p&gt;
&lt;p&gt;So the timeline was: I joined the four-and-a-half-day training, scheduled the exam, went on vacation for several days, then spent three days cramming after I came back.&lt;/p&gt;
&lt;h1&gt;The Moment of Truth&lt;/h1&gt;
&lt;p&gt;When I started the exam, the first question was a massive use case. It was like getting hit with a wall of text.&lt;/p&gt;
&lt;p&gt;I managed to get through it, and then there were some easy questions that helped me relax 😌.&lt;/p&gt;
&lt;p&gt;Time management was a big issue, because I spent too much time on MS Learn double-checking and searching for answers...&lt;/p&gt;
&lt;p&gt;Did I mention that you can use MS Learn during the AI-102 exam? Yes! But it&apos;s not that helpful; MS Learn is not ChatGPT, and you will get overwhelmed.&lt;/p&gt;
&lt;p&gt;It&apos;s only helpful for checking facts if you already know where to find the information the question requires.&lt;/p&gt;
&lt;p&gt;I was in a panic with 6 minutes left and 4 questions to go.&lt;/p&gt;
&lt;p&gt;I quickly selected answers for those questions.&lt;/p&gt;
&lt;p&gt;With a mix of excitement and nerves, I hit that &quot;submit&quot; button like it was the launch of a rocket.&lt;/p&gt;
&lt;h1&gt;The Sweet Victory&lt;/h1&gt;
&lt;p&gt;Drumroll, please… 768! I passed!&lt;/p&gt;
&lt;p&gt;It was a close call, but I did it. The journey was a mix of hard work, a lot of stress, and a bit of fun.&lt;/p&gt;
&lt;p&gt;I’m glad I did it.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;So there you have it—a glimpse into my adventure of becoming a &lt;strong&gt;Microsoft Certified: Azure AI Engineer Associate (AI-102)&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Looking back, I started my AI learning journey at last year&apos;s hackathon event in my company. I built a RAG chatbot, integrated it into the company&apos;s product portal, and garnered huge attention from the leadership team. Then I deep-dived into LLM/Agent/RAG technologies and kept advancing my hackathon project! Then I worked hard for around 30 hours and passed &lt;a href=&quot;/genai/machine-learning/supervised-machine-learning-day-11-12/&quot;&gt;Andrew Ng&apos;s Supervised Machine Learning&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I am eager to apply my newly acquired skills and knowledge to further innovate and contribute to the field of AI and cloud services.&lt;/p&gt;
&lt;p&gt;If you&apos;re on a similar path, keep learning, stay focused, and you will be there as well.&lt;/p&gt;
&lt;p&gt;Good luck!&lt;/p&gt;
&lt;p&gt;#MicrosoftCertified #AzureAI #ExamJourney #TechLife #NeverStopLearning&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Instantly Remove Duplicate Photos With A Handy Script</title><link>https://geekcoding101.com/posts/instantly-remove-duplicate-photos-with-a-handy-script</link><guid isPermaLink="true">https://geekcoding101.com/posts/instantly-remove-duplicate-photos-with-a-handy-script</guid><pubDate>Mon, 02 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;Thanksgiving usually brings memories of food, family, and laughter. For me, this year added an unexpected twist: cleaning up a &lt;em&gt;massive&lt;/em&gt; library of duplicate photos stored on my WD NAS. What started as a manual chore turned into a tech-fueled triumph, thanks to the power of large language models (LLMs) like ChatGPT.&lt;/p&gt;
&lt;p&gt;This is the story of how I turned a frustrating task (remove duplicate photos) into an automated solution—and how AI transformed me from a frustrated photo hoarder into a digital decluttering superhero.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;The Problem: Too Much Dust on Old Photos, I need &quot;Remove Duplicate Photos&quot; cleaner&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Imagine sifting through &lt;strong&gt;tens of thousands of photos&lt;/strong&gt;—manually. I mounted the NAS SMB partition on my MacBook, only to discover it was excruciatingly slow. After two days of copying files to my MacBook, my manual review session turned into a blur. My eyes hurt, my patience wore thin, and I knew there had to be a better way.&lt;/p&gt;
&lt;p&gt;When I turned to existing tools for &quot;remove duplicate photo&quot; task, I hit a wall. Most were paid, overly complex, or simply didn’t fit my needs. Even the so-called free solutions required learning arcane commands like &lt;code&gt;find&lt;/code&gt;. I needed something powerful, flexible, and fast. And when all else fails, what’s a tech enthusiast to do? Write their own solution—with a &quot;little&quot; help from ChatGPT.&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;The Power of ChatGPT&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;I’d dabbled with the same task scripting years ago but quickly gave up because of the time it required. Enter ChatGPT (no marketing here... I am a paid user though...), the real hero of this story. With its assistance, I wrote the majority of the script in less than a day before i gave up !&lt;/p&gt;
&lt;p&gt;:::warning&lt;/p&gt;
&lt;p&gt;I originally thought I could finish this script in just two hours with the help of ChatGPT, but... I ended up spending almost the entire day on it! So all those online posts claiming you can make an iOS app in two hours? They should be reported straight away!&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;p&gt;But anyway, I still have to thank the emergence of Large Language Models! Given the current volume and quality of the code, a single person working alone would have needed 10 to 15 days to achieve the same results! So I believe LLMs improved my efficiency by at least 10 times, and they helped me avoid all sorts of unnecessary detours!&lt;/p&gt;
&lt;p&gt;So now, I&apos;ve created &lt;a href=&quot;https://github.com/geekcoding101/get_rid_of_dup&quot;&gt;&lt;code&gt;get_rid_of_dup.py&lt;/code&gt;&lt;/a&gt;,&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/geekcoding101/get_rid_of_dup&quot;&gt;&lt;img src=&quot;./How-I-Automated-My-Photo-Cleanup.webp&quot; alt=&quot;remove duplicate photos github repo&quot; title=&quot;remove duplicate photos github repo&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;a Python-based command-line tool designed to find and remove duplicate files. The entire experience was a testament to how LLMs have redefined productivity for engineers and non-coders alike. Today, LLMs don&apos;t just help you write code; they make you feel like a superhero with a cape woven from AI-driven efficiency.&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;How the &quot;Remove Duplicate Photos&quot; Script Works&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The script operates in two modes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Single Directory Duplicate Detection&lt;/strong&gt; Quickly finds duplicates within the same folder, using a simple one-command setup. Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python get_rid_of_dup.py dedup --base-dir ./photos --max-width 50 --verbose
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./How-I-Automated-My-Photo-Cleanup-05.webp&quot; alt=&quot;remove duplicate photos runtime screenshot 01&quot; title=&quot;remove duplicate photos runtime screenshot 01&quot; /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cross-Directory Duplicate Detection&lt;/strong&gt; Compare files across two directories, using one as a base directory while cleaning the duplicates in the other. This mode ensures that your originals remain untouched. Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python get_rid_of_dup.py search --base-dir ./test ./others --max-width 50 --verbose --exclude &quot;*.DS_Store&quot;
python get_rid_of_dup.py checksum --base-dir ./originals ./backup
python get_rid_of_dup.py delete --base-dir ./originals ./backup
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./How-I-Automated-My-Photo-Cleanup-04.webp&quot; alt=&quot;remove duplicate photos runtime screenshot 02&quot; title=&quot;remove duplicate photos runtime screenshot 02&quot; /&gt; &lt;img src=&quot;./How-I-Automated-My-Photo-Cleanup-02-1024x175.webp&quot; alt=&quot;remove duplicate photos runtime screenshot 03&quot; title=&quot;remove duplicate photos runtime screenshot 03&quot; /&gt; &lt;img src=&quot;./How-I-Automated-My-Photo-Cleanup-03-1024x377.webp&quot; alt=&quot;remove duplicate photos runtime screenshot 04&quot; title=&quot;remove duplicate photos runtime screenshot 04&quot; /&gt; Under the hood, the script uses checksum comparisons (via the &lt;code&gt;xxhash&lt;/code&gt; library) to identify duplicates quickly. It can also save checksum data for reuse, making subsequent runs much faster.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
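&lt;p&gt;To make the checksum idea concrete, here&apos;s a minimal stdlib-only sketch of duplicate detection. It uses &lt;code&gt;hashlib&lt;/code&gt; where the real script uses &lt;code&gt;xxhash&lt;/code&gt;, and the function names here are illustrative, not the script&apos;s actual API:&lt;/p&gt;

```python
# Minimal sketch of checksum-based duplicate detection.
# Uses hashlib's blake2b in place of the (faster) xxhash library.
import hashlib
import os

def file_checksum(path, chunk_size=65536):
    # Hash the file in chunks so large photos never fully load into memory.
    h = hashlib.blake2b()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(base_dir):
    seen = {}   # checksum -&#62; first path seen, treated as the original
    dups = []   # (duplicate, original) pairs
    for root, _dirs, files in os.walk(base_dir):
        for name in sorted(files, key=len):   # shortest name first
            path = os.path.join(root, name)
            digest = file_checksum(path)
            if digest in seen:
                dups.append((path, seen[digest]))
            else:
                seen[digest] = path
    return dups
```

&lt;p&gt;Hashing file contents means two files compare as duplicates regardless of their names or timestamps, and sorting by name length mirrors the shortest-name-as-original heuristic the script uses in single-directory mode.&lt;/p&gt;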
&lt;h2&gt;&lt;strong&gt;A Few Things to Highlight about &quot;Remove Duplicate Photos&quot;&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Scanning 30,000+ files (including large images) took under a minute. That’s faster than it takes me to make coffee.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;: Features like &lt;code&gt;--skip-existing&lt;/code&gt; and &lt;code&gt;--verbose&lt;/code&gt; make the tool adaptable to different workflows.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Practical Design Choices&lt;/strong&gt;: For example, in single-directory mode, the script selects the file with the shortest name as the original, ensuring clean and logical results.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;&lt;strong&gt;Reflecting&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Reflecting on this experience, it’s clear that LLMs like ChatGPT are redefining productivity.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Empowering Coders and Non-Coders Alike&lt;/strong&gt; ChatGPT doesn’t just write code—it &lt;em&gt;teaches&lt;/em&gt;. For non-coders, it demystifies programming. For seasoned developers, it accelerates workflow and sparks new ideas.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Making the Impossible, Possible&lt;/strong&gt; Tasks I once considered “too complex” to script suddenly became doable. With ChatGPT’s guidance, I tackled nuanced logic, performance tuning, and error handling in record time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Turning Good Engineers Into Great Ones&lt;/strong&gt; LLMs are like an extension of your brain. They handle repetitive tasks, suggest improvements, and help you focus on the creative aspects of problem-solving.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As I watched this project come together, I couldn’t help but feel a deep sense of gratitude—not just for solving my duplicate photo problem, but for living in an era where tools like ChatGPT exist. From now on, removing duplicate photos is just a piece of cake.&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;Ready to Declutter Your Files&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The script is open-source and ready to use. Head over to my GitHub to get started: &lt;a href=&quot;https://github.com/geekcoding101/get_rid_of_dup&quot;&gt;&lt;code&gt;get_rid_of_dup.py&lt;/code&gt;&lt;/a&gt;. Here’s a quick summary of what it can do:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Search for duplicates&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python get_rid_of_dup.py search --base-dir ./photos ./comparison --max-width 100
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generate and save checksums&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python get_rid_of_dup.py checksum --base-dir ./photos ./backup
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Delete duplicates safely:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python get_rid_of_dup.py delete --base-dir ./photos ./backup
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;This Thanksgiving, I walked away with more than just turkey leftovers. I gained a clean photo library with my remove duplicate photos script, a newfound appreciation for automation, and a deeper respect for what AI can achieve.&lt;/p&gt;
&lt;p&gt;If you’re dealing with file clutter—or any repetitive task—let ChatGPT and Python be your allies. Trust me, they’ll turn a daunting chore into a satisfying win.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;And who knows? Your next big idea might just be an LLM-powered breakthrough waiting to happen.&lt;/p&gt;
&lt;p&gt;If you&apos;ve missed the link to my GitHub repository, here you go: &lt;a href=&quot;https://github.com/geekcoding101/get_rid_of_dup&quot;&gt;https://github.com/geekcoding101/get_rid_of_dup&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Why is the Transformer Model Called an &quot;AI Revolution&quot;?</title><link>https://geekcoding101.com/posts/why-is-the-transformer-model-called-an-ai-revolution</link><guid isPermaLink="true">https://geekcoding101.com/posts/why-is-the-transformer-model-called-an-ai-revolution</guid><pubDate>Tue, 03 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;Hello, and welcome to the first edition of &lt;strong&gt;Daily AI Insight&lt;/strong&gt;, a series dedicated to unraveling the fascinating world of artificial intelligence, one bite-sized topic at a time.&lt;/p&gt;
&lt;p&gt;AI is moving fast, and keeping up with it can feel overwhelming. &lt;strong&gt;Daily AI Insight&lt;/strong&gt; aims to bridge that gap. Every post will break down a key concept, a research trend, or a real-world application of AI into digestible, easy-to-understand insights. Whether you&apos;re an AI enthusiast, a professional looking to integrate AI into your work, or just curious about what all the fuss is about, this series is for you.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. What is the Transformer?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Transformer is a deep learning architecture introduced by Google Research in 2017 through the seminal paper &lt;em&gt;Attention is All You Need&lt;/em&gt;. Originally designed to tackle challenges in natural language processing (NLP), it has since transformed into the foundation for state-of-the-art AI models in multiple domains, such as computer vision, speech processing, and multimodal learning.&lt;/p&gt;
&lt;p&gt;Traditional NLP models like RNNs (Recurrent Neural Networks) and LSTMs (Long Short-Term Memory networks) had two significant shortcomings:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Sequential Processing&lt;/strong&gt;: These models processed text one token at a time, slowing down computations and making it hard to parallelize.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Difficulty Capturing Long-Range Dependencies&lt;/strong&gt;: For long sentences or documents, these models often lost crucial contextual information from earlier parts of the input.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The Transformer introduced a novel &lt;strong&gt;Self-Attention Mechanism&lt;/strong&gt;, enabling it to process entire input sequences simultaneously and focus dynamically on the most relevant parts of the sequence. Think of it like giving the model a panoramic lens, allowing it to view the entire context at once, rather than just focusing on one word at a time.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./The-Structure-of-the-Transformer.webp&quot; alt=&quot;The Structure of the Transformer&quot; title=&quot;The Structure of the Transformer&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. Why is the Transformer Important?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Transformer brought a paradigm shift to AI, fundamentally altering how models process, understand, and generate information. Here&apos;s why it’s considered revolutionary:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Parallel Processing&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Unlike RNNs that process data step by step, Transformers can analyze all parts of the input sequence simultaneously. This parallelism significantly speeds up training and inference, making it feasible to train models on massive datasets.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) Better Understanding of Context&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The Self-Attention Mechanism enables the Transformer to capture relationships between all tokens in a sequence. For example, in the sentence: “Although it was raining, she decided to go for a run,” the words &quot;although&quot; and &quot;decided&quot; are closely related, even though they’re separated by other words. Transformers excel at identifying and using such relationships.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(3) Scalability&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The modular architecture of the Transformer makes it easy to scale. This is why the largest AI models today, like OpenAI&apos;s GPT series, Google’s BERT, and other LLMs (Large Language Models), all stem from this architecture.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. How Does the Transformer Work?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Transformer is built around two core components: &lt;strong&gt;Encoders&lt;/strong&gt; and &lt;strong&gt;Decoders&lt;/strong&gt;. Here’s how they function:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) The Encoder&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The encoder processes the input sequence, such as a sentence, and transforms it into a series of rich, context-aware vector representations. For instance, in a translation task, the encoder might analyze the sentence “I love programming” and create numerical embeddings for each word, capturing their meaning and relationships.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) The Decoder&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The decoder takes the encoder’s output and generates the target sequence. For translation, it might turn the encoded representations into a sentence like “J’aime programmer” in French.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Self-Attention Mechanism in Action&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The heart of the Transformer lies in self-attention, which allows the model to compute the importance of each word relative to every other word in a sequence. For instance, when processing “I love programming,” the word “love” has strong ties to “programming,” which the attention mechanism identifies and weighs heavily during computations.&lt;/p&gt;
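&lt;p&gt;To see the mechanism in miniature, here&apos;s a toy scaled dot-product self-attention pass in NumPy, following the formula from &lt;em&gt;Attention is All You Need&lt;/em&gt;. All the numbers are random placeholders, not a trained model:&lt;/p&gt;

```python
# Toy scaled dot-product self-attention; every token attends to every token.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])       # relevance of every token pair
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V, weights                  # context vectors, attention map

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))       # 3 token embeddings, e.g. 'I', 'love', 'programming'
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
context, weights = self_attention(X, Wq, Wk, Wv)
print(context.shape)              # (3, 4): one context vector per token
```

&lt;p&gt;Each row of the attention map is a probability distribution over the whole sequence, which is exactly the &quot;panoramic lens&quot; described earlier: every token&apos;s new representation is a weighted blend of all the others.&lt;/p&gt;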
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Key Applications and Models&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The success of the Transformer architecture has led to the development of many groundbreaking AI models across different domains:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Natural Language Processing&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;BERT (Bidirectional Encoder Representations from Transformers)&lt;/strong&gt;: A Google model designed for understanding the meaning of text in context, widely used for search engines, question answering, and sentiment analysis.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPT Series (Generative Pre-trained Transformers)&lt;/strong&gt;: OpenAI’s series of models, from GPT-2 to GPT-4, excel in text generation, from creative writing to code completion.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Computer Vision&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vision Transformer (ViT)&lt;/strong&gt;: Adapts the Transformer architecture for image recognition tasks, segmenting an image into patches and applying self-attention to understand relationships between different parts of the image.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Multimodal AI&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Models like &lt;strong&gt;CLIP&lt;/strong&gt; and &lt;strong&gt;DALL-E&lt;/strong&gt; use Transformers to handle text and image inputs, enabling AI to generate art from text descriptions or describe images in natural language.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Advantages of the Transformer&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Efficiency&lt;/strong&gt;: Parallel processing dramatically reduces training time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Versatility&lt;/strong&gt;: Adaptable to various tasks beyond NLP, such as computer vision and multimodal applications.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Easy to scale up for training large models on massive datasets.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. Challenges and Limitations&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Despite its advantages, the Transformer is not without drawbacks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;High Computational Costs&lt;/strong&gt;: Training Transformers, especially large-scale ones like GPT-4, requires enormous computational resources and specialized hardware like GPUs or TPUs.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data-Hungry&lt;/strong&gt;: Transformers need vast amounts of labeled data for training, making them inaccessible for smaller organizations or domains with limited data availability.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lack of Interpretability&lt;/strong&gt;: While the self-attention mechanism provides flexibility, the inner workings of Transformers remain a “black box,” posing challenges for applications like healthcare and legal systems where decisions need to be transparent.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;7. Transformative Impact&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Transformer has reshaped AI research and applications, enabling breakthroughs in natural language understanding, image recognition, and generative AI. It’s the foundation for innovations like ChatGPT, automated translation, and content creation tools.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;8. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The Transformer revolutionized AI with its self-attention mechanism and scalability, making it the cornerstone of modern AI and driving advancements across multiple domains.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>7 Key Insights on the Self-Attention Mechanism in AI Magic</title><link>https://geekcoding101.com/posts/the-self-attention-in-ai-magic</link><guid isPermaLink="true">https://geekcoding101.com/posts/the-self-attention-in-ai-magic</guid><pubDate>Wed, 04 Dec 2024 00:00:00 GMT</pubDate><content:encoded>
&lt;p&gt;&quot;Self Attention&quot;, a pivotal advancement in deep learning, is at the core of the Transformer architecture, revolutionizing how models process and understand sequences. Unlike traditional Attention, which focuses on mapping relationships between separate input and output sequences, Self-Attention enables each element within a sequence to interact dynamically with every other element. This mechanism allows AI models to capture long-range dependencies more effectively than previous architectures like RNNs and LSTMs. By computing relevance scores between words in a sentence, Self-Attention ensures that key relationships—such as pronoun references or contextual meanings—are accurately identified, leading to more sophisticated language understanding and generation.&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;1. The Origin of the Attention Mechanism&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The Attention Mechanism is one of the most transformative innovations in deep learning. First introduced in the 2014 paper &lt;a href=&quot;https://arxiv.org/abs/1409.0473&quot;&gt;&lt;em&gt;Neural Machine Translation by Jointly Learning to Align and Translate&lt;/em&gt;&lt;/a&gt;, it was designed to address a critical challenge: how can a model effectively focus on the most relevant parts of input data, especially in tasks involving long sequences?&lt;/p&gt;
&lt;p&gt;Simply put, the Attention Mechanism allows models to “prioritize,” much like humans skip unimportant details when reading and focus on the key elements. This breakthrough marks a shift in AI from rote memorization to dynamic understanding.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;2. The Core Idea Behind the Attention Mechanism&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The Attention Mechanism’s main idea is simple yet powerful: it enables the model to assign different levels of importance to different parts of the input data. Each part of the sequence is assigned a weight, with higher weights indicating greater relevance to the task at hand.&lt;/p&gt;
&lt;p&gt;For example, when translating the sentence “I love cats,” the model needs to recognize that the relationship between &quot;love&quot; and &quot;cats&quot; is more critical than that between &quot;I&quot; and &quot;cats.&quot; The Attention Mechanism dynamically computes these relationships and helps the model focus accordingly.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;How It Works (Simplified)&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Here’s how the Attention Mechanism operates in three key steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Relevance Scoring&lt;/strong&gt; Each element of the input sequence is compared against the rest to compute a “relevance score.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weight Normalization&lt;/strong&gt; These scores are converted into probabilities using a Softmax function, ensuring all weights sum to 1.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Weighted Summation&lt;/strong&gt; The weights are then used to compute a new “context vector” that emphasizes the most relevant parts of the input.&lt;/li&gt;
&lt;/ol&gt;
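&lt;p&gt;The three steps above can be sketched in a few lines of plain Python. The 2-D word vectors below are invented purely for illustration:&lt;/p&gt;

```python
import math

# The three attention steps on tiny made-up 2-D word vectors.
def attend(query, keys, values):
    # 1. Relevance scoring: dot product of the query with every key
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # 2. Weight normalization: softmax turns scores into probabilities
    exps = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # 3. Weighted summation: blend the values into one context vector
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Invented vectors for 'book', 'yesterday', 'I'; the query plays the role of 'it'.
keys = values = [[1.0, 0.0], [0.1, 0.9], [0.0, 0.2]]
weights, context = attend([1.0, 0.0], keys, values)
print(max(range(3), key=weights.__getitem__))   # 0: 'it' attends most to 'book'
```

&lt;p&gt;Because the query vector for &quot;it&quot; is most similar to the key for &quot;book,&quot; step 1 gives that pair the highest score, and after the softmax in step 2 most of the weight lands on &quot;book&quot;—the same resolution of pronoun reference discussed in the example below.&lt;/p&gt;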
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;3. The Magic of Self-Attention&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Self-Attention, a variant of the Attention Mechanism, lies at the heart of the Transformer architecture. Unlike traditional Attention that focuses on external target sequences (e.g., translating between languages), Self-Attention allows every element in a sequence to interact with every other element within the same sequence. This enhances the model&apos;s ability to understand global relationships.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Example: Sentence Understanding With Self Attention&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Consider the sentence: &lt;em&gt;“I bought a book yesterday. It is fascinating.”&lt;/em&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A model with Self-Attention can identify that the word &quot;it&quot; refers to &quot;book,&quot; not &quot;yesterday&quot; or &quot;I.&quot;&lt;/li&gt;
&lt;li&gt;It does this by calculating the relevance between &quot;it&quot; and all other words, assigning the highest weight to &quot;book.&quot;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This ability to dynamically analyze relationships is a significant improvement over traditional RNNs and LSTMs, which struggle with such long-range dependencies.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;4. Real-World Applications of the Attention Mechanism&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The Attention Mechanism has proven invaluable across a wide range of AI tasks. Here are some of its most impactful applications:&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;(1) Machine Translation&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;In neural machine translation, the Attention Mechanism dynamically focuses on relevant parts of the source sentence, allowing for more accurate and fluent translations.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;(2) Large Language Models&lt;/strong&gt;&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Transformer Architecture&lt;/strong&gt;: Attention is the backbone of Transformer models, powering both the encoder and decoder components.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPT and BERT&lt;/strong&gt;: These models leverage multi-layer Self-Attention to significantly enhance natural language understanding and generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;&lt;strong&gt;(3) Computer Vision&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;In computer vision, Attention is utilized in Vision Transformers (ViT). These models divide an image into patches and use Self-Attention to identify relationships between different parts of the image, achieving performance that often surpasses traditional convolutional neural networks (CNNs).&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;(4) Multimodal Models&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Multimodal models like CLIP and DALL-E use Attention to process both text and image inputs simultaneously, enabling tasks such as generating artwork from text descriptions or captioning images.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;5. Why Is the Attention Mechanism So Powerful?&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The Attention Mechanism is often called “AI magic” because of its remarkable advantages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Global Understanding&lt;/strong&gt;: By analyzing relationships across the entire sequence, models can comprehend complex contexts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Handling Long Sequences&lt;/strong&gt;: Traditional models like RNNs struggle with long-distance dependencies, but Attention can relate any two elements directly, no matter how far apart they sit in the sequence.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Broad Applicability&lt;/strong&gt;: From text to images to multimodal tasks, the Attention Mechanism is versatile and widely adopted.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;6. Challenges and Limitations&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;While the Attention Mechanism is transformative, it isn’t without its drawbacks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Computational Cost&lt;/strong&gt;: Calculating relationships between all elements in a sequence requires significant computation, particularly for long sequences.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: The quadratic complexity (O(n²)) of Self-Attention poses challenges for tasks involving very large inputs, though ongoing research is addressing this issue.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;7. The Impact of Attention: From Focus to Revolution&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The Attention Mechanism represents a paradigm shift in AI, enabling models to focus dynamically on the most relevant information. By solving key challenges in sequence modeling and understanding, it has paved the way for groundbreaking architectures like the Transformer and applications across diverse domains.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;8. One-Line Summary&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;The Attention Mechanism empowers AI with the ability to “prioritize,” making it an indispensable tool for understanding, generating, and analyzing complex data.&lt;/p&gt;
&lt;p&gt;Thanks for being with me on this journey through the Self-Attention Mechanism!&lt;/p&gt;
&lt;p&gt;You&apos;re welcome to browse my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;AI Insights blog posts here&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>What Are Parameters? Why Are “Bigger” Models Often “Smarter”?</title><link>https://geekcoding101.com/posts/what-are-parameters-why-are-bigger-models-often-smarter</link><guid isPermaLink="true">https://geekcoding101.com/posts/what-are-parameters-why-are-bigger-models-often-smarter</guid><pubDate>Thu, 05 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;1. What Are Parameters?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In deep learning, &lt;strong&gt;parameters&lt;/strong&gt; are the trainable components of a model, such as weights and biases, which determine how the model responds to input data. These parameters adjust during training to minimize errors and optimize the model&apos;s performance. &lt;strong&gt;Parameter count&lt;/strong&gt; refers to the total number of such weights and biases in a model.&lt;/p&gt;
&lt;p&gt;Think of parameters as the “brain capacity” of an AI model. &lt;strong&gt;The more parameters it has, the more information it can store and process.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A simple linear regression model might have only a few parameters, such as a weight (&lt;em&gt;w&lt;/em&gt;) and a bias (&lt;em&gt;b&lt;/em&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;GPT-3, a massive language model, boasts &lt;strong&gt;175 billion parameters&lt;/strong&gt;, requiring immense computational resources and data to train.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
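&lt;p&gt;To see where such counts come from, here&apos;s a quick back-of-the-envelope calculation for a small fully connected network. The layer sizes are made up for illustration:&lt;/p&gt;

```python
# Each dense layer contributes in_features * out_features weights
# plus out_features biases.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

layer_sizes = [784, 128, 10]   # e.g. a small MNIST-style classifier
total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)   # 101770 parameters in total
```

&lt;p&gt;By the same formula, the linear regression above has just two parameters (one weight, one bias), while GPT-3&apos;s 175 billion come from stacking many very wide layers of exactly this kind.&lt;/p&gt;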
&lt;p&gt;&lt;img src=&quot;./issue-3-parameters.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. The Relationship Between Parameter Count and Model Performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In deep learning, there is often a positive correlation between a model&apos;s parameter count and its performance. This phenomenon is summarized by &lt;strong&gt;Scaling Laws&lt;/strong&gt;, which show that as parameters, data, and computational resources increase, so does the model&apos;s ability to perform complex tasks.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Why Are Bigger Models Often Smarter?&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Higher Expressive Power&lt;/strong&gt; Larger models can capture more complex patterns and features in data. For instance, they not only grasp basic grammar but also understand deep semantic and contextual nuances.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stronger Generalization&lt;/strong&gt; With sufficient training data, larger models generalize better to unseen scenarios, such as answering novel questions or reasoning about unfamiliar topics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Versatility&lt;/strong&gt; Bigger models can handle multiple tasks with minimal or no additional training. For example, OpenAI&apos;s GPT models excel in creative writing, code generation, translation, and logical reasoning.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;However, &lt;strong&gt;bigger isn’t always better.&lt;/strong&gt; If the parameter count exceeds the amount of data or the complexity of the task, the model may become overly complex and prone to overfitting.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. The Practical Significance of Parameter Count&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Language Models at Scale&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Here’s a comparison of parameter counts for well-known models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPT-2: 1.5 billion parameters&lt;/li&gt;
&lt;li&gt;GPT-3: 175 billion parameters&lt;/li&gt;
&lt;li&gt;GPT-4: reportedly on the order of 1.7 trillion parameters (OpenAI has not disclosed an official figure)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As parameter counts grow, these models have demonstrated remarkable improvements in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Text fluency&lt;/strong&gt;: Generating coherent and contextually appropriate responses.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reasoning&lt;/strong&gt;: Solving logical puzzles or providing detailed explanations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Creativity&lt;/strong&gt;: Writing essays, poetry, and even code snippets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;In Computer Vision&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Parameter count is equally significant in image recognition. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ResNet&lt;/strong&gt;: Early versions had on the order of ten million to several tens of millions of parameters.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Vision Transformers (ViT)&lt;/strong&gt;: These modern architectures often have hundreds of millions of parameters, enabling them to outperform traditional convolutional networks on complex tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Are Bigger Models Always Better?&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Advantages of Large Models&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Capture Complex Patterns&lt;/strong&gt;: They can model intricate relationships in data that smaller models might miss.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task Versatility&lt;/strong&gt;: One large model can handle diverse tasks without needing significant fine-tuning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Breakthroughs in Performance&lt;/strong&gt;: Larger models often lead to state-of-the-art results across many benchmarks.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;&lt;strong&gt;Drawbacks of Large Models&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;High Computational Cost&lt;/strong&gt;: Bigger models require immense resources for both training and inference. For example, training GPT-3 reportedly cost millions of dollars in compute time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Energy Consumption&lt;/strong&gt;: Training large models has a significant environmental impact, as it demands enormous amounts of energy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Efficiency Issues&lt;/strong&gt;: For certain tasks, smaller, task-specific models may achieve similar results with far fewer resources.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As a result, choosing the right model size involves balancing &lt;strong&gt;performance gains&lt;/strong&gt; against &lt;strong&gt;computational efficiency&lt;/strong&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Trends in Parameter Optimization: Big Models vs. Small Models&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Despite the success of large models, recent trends highlight the growing importance of &lt;strong&gt;efficient AI&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Parameter Compression&lt;/strong&gt;: Techniques like &lt;strong&gt;knowledge distillation&lt;/strong&gt; and &lt;strong&gt;model pruning&lt;/strong&gt; extract the most valuable knowledge from large models and condense it into smaller, faster models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Efficient Inference&lt;/strong&gt;: Lightweight models, such as &lt;strong&gt;DistilBERT&lt;/strong&gt;, are designed for mobile devices and embedded systems, making AI more accessible and sustainable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task-Specific Optimization&lt;/strong&gt;: Instead of using a massive model for every problem, fine-tuning smaller models for specific tasks often yields better cost-effectiveness.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The likely future of AI involves &lt;strong&gt;large-scale pretraining paired with smaller, fine-tuned deployments&lt;/strong&gt;, combining the strengths of both approaches.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Parameter count represents the &quot;brain capacity&quot; of an AI model. While larger models often excel at complex tasks, balancing size and efficiency is key to sustainable AI development.&lt;/strong&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Your Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Do you think the race for larger models is sustainable, or should the focus shift toward efficiency and accessibility? Share your perspective in the comments below!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>What Is Prompt Engineering and How to &quot;Train&quot; AI with a Single Sentence?</title><link>https://geekcoding101.com/posts/what-is-prompt-engineering-and-how-to-train-ai-with-a-single-sentence</link><guid isPermaLink="true">https://geekcoding101.com/posts/what-is-prompt-engineering-and-how-to-train-ai-with-a-single-sentence</guid><pubDate>Fri, 06 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;1. What is Prompt Engineering?&lt;/h3&gt;
&lt;p&gt;Prompt Engineering is a core technique in the field of generative AI. Simply put, it involves crafting effective input prompts to guide AI in producing the desired results.&lt;/p&gt;
&lt;p&gt;Generative AI models (like GPT-3 and GPT-4) are essentially predictive tools that generate outputs based on input prompts. The goal of Prompt Engineering is to optimize these inputs to ensure that the AI performs tasks according to user expectations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Here’s an example&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Input: &lt;em&gt;“Explain quantum mechanics in one sentence.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Output: &lt;em&gt;“Quantum mechanics is a branch of physics that studies the behavior of microscopic particles.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The quality of the prompt directly impacts AI performance. A clear and targeted prompt can significantly improve the results generated by the model.&lt;/p&gt;
&lt;h3&gt;2. Why is Prompt Engineering important?&lt;/h3&gt;
&lt;p&gt;The effectiveness of generative AI depends heavily on how users present their questions or tasks. The importance of Prompt Engineering can be seen in the following aspects:&lt;/p&gt;
&lt;h4&gt;(1) &lt;strong&gt;Improving output quality&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;A well-designed prompt reduces the risk of the AI generating incorrect or irrelevant responses. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ineffective Prompt: &lt;em&gt;“Write an article about climate change.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Optimized Prompt: &lt;em&gt;“Write a brief 200-word report on the impact of climate change on the Arctic ecosystem.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;(2) &lt;strong&gt;Saving time and cost&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;A clear prompt minimizes trial and error, improving efficiency, especially in scenarios requiring large-scale outputs (e.g., generating code or marketing content).&lt;/p&gt;
&lt;h4&gt;(3) &lt;strong&gt;Expanding AI’s use cases&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;With clever prompt design, users can leverage AI for diverse and complex tasks, from answering questions to crafting poetry, generating code, or even performing data analysis.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;3. Core techniques in Prompt Engineering&lt;/h3&gt;
&lt;p&gt;Designing an effective prompt involves several principles and strategies:&lt;/p&gt;
&lt;h4&gt;(1) &lt;strong&gt;Define clear goals&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Prompts should directly target the task at hand. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Vague Prompt: &lt;em&gt;“Talk about animals.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Clear Prompt: &lt;em&gt;“Describe the behavior of lions in three sentences, including one interesting fact.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;(2) &lt;strong&gt;Provide context&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Context helps AI better understand the task. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Isolated Prompt: &lt;em&gt;“Generate a paragraph about carbon dioxide.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Contextualized Prompt: &lt;em&gt;“Carbon dioxide is a major greenhouse gas contributing to global warming. Based on this, generate a 200-word article.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;(3) &lt;strong&gt;Control output style&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;By including descriptive language, users can adjust the AI&apos;s tone or style. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;General Prompt: &lt;em&gt;“Write a paragraph about cats.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Styled Prompt: &lt;em&gt;“Write a humorous paragraph about why cats are smarter than dogs.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;(4) &lt;strong&gt;Iterative refinement&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Prompts can be iteratively improved. Start with an initial output, then refine the prompt to address any shortcomings.&lt;/p&gt;
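&lt;p&gt;The four techniques above can be combined mechanically. Here’s a minimal Python sketch of a prompt builder; the function and field names are my own illustration, not any library’s API:&lt;/p&gt;

```python
# Compose a prompt from a clear goal, optional context, and style cues.
# All names here are illustrative, not from any prompt-engineering library.
def build_prompt(goal, context=None, style=None, length=None):
    parts = []
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"Task: {goal}")
    if style:
        parts.append(f"Style: {style}")
    if length:
        parts.append(f"Length: about {length} words")
    return "\n".join(parts)

prompt = build_prompt(
    goal="Describe the behavior of lions, including one interesting fact.",
    context="This is for a children's science newsletter.",
    style="friendly and vivid",
    length=100,
)
print(prompt)
```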
&lt;hr /&gt;
&lt;h3&gt;4. Limitations of Prompt Engineering&lt;/h3&gt;
&lt;p&gt;While Prompt Engineering is highly useful, it has its limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Experience required&lt;/strong&gt;: Designing effective prompts often requires users to understand how AI operates.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Model understanding constraints&lt;/strong&gt;: Even with well-crafted prompts, the AI may still produce errors or misunderstand the task.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dependence on model versions&lt;/strong&gt;: Responses to prompts can vary significantly between models (e.g., GPT-3 vs. GPT-4).&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;5. In one sentence&lt;/h3&gt;
&lt;p&gt;Prompt Engineering is a critical skill in generative AI, allowing users to efficiently and accurately accomplish tasks by optimizing input prompts – truly the “art of communication” with AI.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Now, as promised, here are some highly recommended books on Prompt Engineering. They&apos;re packed with practical insights to take your skills to the next level:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Title&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Author&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Published&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://www.amazon.com/Art-Prompt-Engineering-chatGPT-Hands/dp/1739296710&quot;&gt;&lt;em&gt;The Art of Prompt Engineering with ChatGPT&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Nathan Hunter&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;A hands-on guide exploring how to use ChatGPT effectively through prompt engineering, with practical techniques to master this art and science.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://www.amazon.com/Prompt-Engineering-Unlocking-Generative-Creative/dp/B0C1J9F65T&quot;&gt;&lt;em&gt;Prompt Engineering: Unlocking Generative AI&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Navveen Balani&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;Focuses on ethical and creative applications of prompt engineering, perfect for those looking to integrate this skill into AI development.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://www.amazon.com/Prompt-Engineering-Generative-AI-Future-Proof/dp/109815343X&quot;&gt;&lt;em&gt;Prompt Engineering for Generative AI&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;James Phoenix, Mike Taylor&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;Offers strategies and tips for designing reliable AI prompts, aimed at developers and engineers optimizing inputs for generative AI models.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://www.amazon.com/Demystifying-Prompt-Engineering-Step-Step/dp/B0C9S3HQXJ&quot;&gt;&lt;em&gt;Demystifying Prompt Engineering&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Harish Bhat&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;Simplifies the complexities of prompt engineering, with step-by-step guides for beginners and AI enthusiasts to create effective prompts.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://www.amazon.com/Unlocking-Secrets-Prompt-Engineering-generation/dp/1835083838&quot;&gt;&lt;em&gt;Unlocking the Secrets of Prompt Engineering&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Gilbert Mizrahi&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;Delves into the art of prompt engineering with practical techniques, helping readers quickly advance from novice to expert in AI-driven language tasks.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;Another good resource is the &lt;a href=&quot;https://www.promptingguide.ai/&quot;&gt;Prompt Engineering Guide&lt;/a&gt;!&lt;/p&gt;
&lt;p&gt;It introduces a wide range of techniques:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Reference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zero-Shot Prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Instructing the model to perform a task without providing examples, relying on its pre-existing knowledge.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://www.promptingguide.ai/techniques/zeroshot&quot;&gt;Zero-Shot Prompting&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Few-Shot Prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supplying a few examples within the prompt to guide the model&apos;s behavior and improve performance on specific tasks.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://www.promptingguide.ai/techniques/fewshot&quot;&gt;Few-Shot Prompting&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chain-of-Thought Prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Encouraging the model to articulate a step-by-step reasoning process, aiding in complex problem-solving.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://www.promptingguide.ai/techniques/cot&quot;&gt;Chain-of-Thought Prompting&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generating multiple reasoning paths and selecting the most consistent answer to enhance accuracy in complex reasoning tasks.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://www.promptingguide.ai/techniques/consistency&quot;&gt;Self-Consistency&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Generated Knowledge Prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prompting the model to produce relevant facts before addressing the main task, leveraging its internal knowledge base.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://www.promptingguide.ai/techniques/knowledge&quot;&gt;Generated Knowledge Prompting&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompt Chaining&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Breaking down complex tasks into a series of simpler prompts, allowing the model to tackle each step sequentially.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://www.promptingguide.ai/techniques/prompt_chaining&quot;&gt;Prompt Chaining&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tree of Thoughts (ToT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extending chain-of-thought by exploring multiple reasoning paths in a tree structure to improve problem-solving.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://www.promptingguide.ai/techniques/tot&quot;&gt;Tree of Thoughts&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Combining external knowledge retrieval with generation to provide up-to-date and accurate information.&lt;/td&gt;
&lt;td&gt;Retrieval-Augmented Generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Automatic Prompt Engineer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Utilizing models to automatically generate and optimize prompts, reducing manual effort.&lt;/td&gt;
&lt;td&gt;Automatic Prompt Engineer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active-Prompt&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Engaging the model in an interactive manner to iteratively refine prompts and improve responses.&lt;/td&gt;
&lt;td&gt;Active-Prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Directional Stimulus Prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Guiding the model&apos;s output by providing specific cues or directions within the prompt.&lt;/td&gt;
&lt;td&gt;Directional Stimulus Prompting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Program-Aided Language Models (PAL)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Integrating programming logic with language models to handle tasks requiring precise computations.&lt;/td&gt;
&lt;td&gt;Program-Aided Language Models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ReAct&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Combining reasoning and acting by prompting the model to perform actions based on its reasoning process.&lt;/td&gt;
&lt;td&gt;ReAct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reflexion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Encouraging the model to reflect on its responses and iteratively improve them.&lt;/td&gt;
&lt;td&gt;Reflexion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multimodal Chain-of-Thought (CoT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Applying chain-of-thought prompting across multiple modalities, such as text and images.&lt;/td&gt;
&lt;td&gt;Multimodal CoT&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graph Prompting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Utilizing graph structures within prompts to represent relationships and enhance understanding.&lt;/td&gt;
&lt;td&gt;Graph Prompting&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
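&lt;p&gt;To make the first two techniques in the table concrete, here’s a sketch of how zero-shot and few-shot prompts differ structurally. This is plain string assembly; no particular model API is assumed:&lt;/p&gt;

```python
# Zero-shot: just the instruction. Few-shot: prepend worked examples
# so the model can infer the expected format. The examples are made up.
def zero_shot(task):
    return task

def few_shot(task, examples):
    shots = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    return f"{shots}\nInput: {task}\nOutput:"

examples = [("I loved this movie!", "positive"),
            ("Terrible service.", "negative")]
print(few_shot("The food was amazing.", examples))
```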
&lt;h3&gt;Bonus: Is prompt engineering unnecessary with powerful AI models?&lt;/h3&gt;
&lt;p&gt;Even with advanced large language models (LLMs), Prompt Engineering remains crucial.&lt;/p&gt;
&lt;p&gt;The quality of prompt design directly impacts the model&apos;s performance on specific tasks. Well-crafted prompts significantly improve output accuracy and relevance. Prompt Engineering can also help:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Guide complex reasoning&lt;/strong&gt;: It enables the model to perform intricate tasks or solve layered problems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reduce hallucinations&lt;/strong&gt;: Proper prompts minimize the chances of the model generating false or irrelevant information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improve domain-specific adaptability&lt;/strong&gt;: Tailored prompts ensure better performance in specialized fields.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For further insights, check out the paper &lt;em&gt;“&lt;a href=&quot;https://arxiv.org/abs/2310.14735&quot;&gt;Unleashing the Potential of Prompt Engineering in Large Language Models: A Comprehensive Review&lt;/a&gt;”&lt;/em&gt; on arXiv.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Alright, that’s all for today! If you enjoyed this or found it helpful, don’t forget to follow me. Let’s keep growing and learning together!&lt;/p&gt;
&lt;p&gt;Goodnight! Dream big, folks…&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Parameters vs. Inference Speed: Why Is Your Phone’s AI Model ‘Slimmer’ Than GPT-4?</title><link>https://geekcoding101.com/posts/parameters-vs-inference-speed-why-is-your-phones-ai-model-slimmer-than-gpt-4</link><guid isPermaLink="true">https://geekcoding101.com/posts/parameters-vs-inference-speed-why-is-your-phones-ai-model-slimmer-than-gpt-4</guid><pubDate>Sat, 07 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;1. What Are Parameters?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;This was covered in a previous issue: &lt;a href=&quot;/posts/what-are-parameters-why-are-bigger-models-often-smarter&quot;&gt;What Are Parameters? Why Are “Bigger” Models Often “Smarter”?&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. The Relationship Between Parameter Count and Inference Speed&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;As the number of parameters in a model increases, it requires more computational resources to perform inference (i.e., generate results). This directly impacts inference speed. However, the relationship between parameters and speed is not a straightforward inverse correlation.&lt;/p&gt;
&lt;p&gt;Several factors influence inference speed:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Computational Load (FLOPs)&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The number of floating-point operations (FLOPs) required by a model directly impacts inference time. However, FLOPs are not the sole determinant since different types of operations may execute with varying efficiency on hardware.&lt;/p&gt;
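&lt;p&gt;As a back-of-the-envelope illustration (a common rule of thumb, not a benchmark): a dense layer with an m-by-n weight matrix costs about 2*m*n FLOPs per input, and a decoder-style transformer costs roughly 2 FLOPs per parameter per generated token:&lt;/p&gt;

```python
# Back-of-the-envelope FLOPs estimates (approximations, not measurements).
def linear_flops(m, n):
    # one multiply plus one add per weight
    return 2 * m * n

def transformer_flops_per_token(num_params):
    # common rule of thumb for a decoder forward pass:
    # about 2 FLOPs per parameter per generated token
    return 2 * num_params

print(linear_flops(4096, 4096))          # one dense layer
print(transformer_flops_per_token(7e9))  # a hypothetical 7B-parameter model
```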
&lt;h4&gt;&lt;strong&gt;(2) Memory Access Cost&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;During inference, the model frequently accesses memory. The volume of memory access (or memory bandwidth requirements) can affect speed. For instance, both the computational load and memory access demands of deep learning models significantly impact deployment and inference performance.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(3) Model Architecture&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The design of the model, including its parallelism and branching structure, influences efficiency. For example, branched architectures may introduce synchronization overhead, causing some compute units to idle and slowing inference.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(4) Hardware Architecture&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Different hardware setups handle models differently. A device’s computational power, memory bandwidth, and overall architecture all affect inference speed. Efficient neural network designs must balance computational load and memory demands for optimal performance across various hardware environments.&lt;/p&gt;
&lt;p&gt;Thus, while parameter count is one factor affecting inference time, it’s not a simple inverse relationship. Optimizing inference speed requires consideration of computational load, memory access patterns, model architecture, and hardware capabilities.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. Why Are AI Models on Phones ‘Slimmer’ Than GPT-4?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;AI models running on phones are heavily compressed and optimized to operate within the resource constraints of mobile devices. Common optimization techniques include:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Model Quantization&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Quantization reduces the precision of model parameters from high precision (e.g., 32-bit floating-point) to lower precision (e.g., 8-bit integers), thereby reducing memory usage and computational requirements. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A non-quantized model might require 100GB of memory.&lt;/li&gt;
&lt;li&gt;A quantized version could reduce this to 10GB or less.&lt;/li&gt;
&lt;/ul&gt;
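&lt;p&gt;The saving is easy to estimate: a parameter stored as a 32-bit float takes 4 bytes, while an 8-bit integer takes 1 byte. A quick sketch with illustrative numbers:&lt;/p&gt;

```python
# Estimate weight memory at different precisions (weights only;
# activations and the KV cache are ignored for simplicity).
def model_size_gb(num_params, bytes_per_param):
    return num_params * bytes_per_param / 1e9

params = 25e9  # a hypothetical 25B-parameter model
print(f"fp32: {model_size_gb(params, 4):.0f} GB")  # 100 GB
print(f"int8: {model_size_gb(params, 1):.0f} GB")  # 25 GB
# int8 alone gives 4x; 4-bit or mixed-precision schemes compress further.
```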
&lt;h4&gt;&lt;strong&gt;(2) Knowledge Distillation&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;In knowledge distillation, a &quot;large model&quot; teaches a &quot;small model.&quot; The smaller model retains reasonable performance by learning from the large model’s outputs, despite having significantly fewer parameters.&lt;/p&gt;
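&lt;p&gt;One standard formulation, sketched here in pure Python (real training would use a deep-learning framework), trains the student to match the teacher’s softened output distribution:&lt;/p&gt;

```python
import math

# Distillation sketch: the student is penalized by the cross-entropy
# between the teacher's and student's temperature-softened outputs.
# (Real recipes often add a hard-label loss and scale by T**2.)
def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

teacher = [3.0, 1.0, 0.2]   # confident teacher
student = [2.5, 1.2, 0.3]   # student roughly agrees, so the loss is small
print(distillation_loss(student, teacher))
```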
&lt;h4&gt;&lt;strong&gt;(3) Model Pruning&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Pruning removes redundant parameters in a model. For instance, neurons with minimal contribution to the output can be “pruned” to reduce the model size without significant performance loss.&lt;/p&gt;
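&lt;p&gt;Magnitude pruning, the simplest variant, zeroes out the weights closest to zero. A toy sketch (real pruning is usually followed by fine-tuning to recover accuracy):&lt;/p&gt;

```python
# Magnitude pruning sketch: keep only the largest-magnitude weights.
# Ties at the threshold may keep slightly more than the target ratio.
def prune(weights, keep_ratio=0.5):
    k = int(len(weights) * keep_ratio)
    ranked = sorted(weights, key=abs, reverse=True)
    threshold = abs(ranked[k - 1]) if k > 0 else float("inf")
    return [w if abs(w) >= threshold else 0.0 for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02]
print(prune(w, keep_ratio=0.5))  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```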
&lt;h4&gt;&lt;strong&gt;(4) Optimized Inference Frameworks&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Frameworks like TensorFlow Lite and ONNX are specifically designed for mobile and edge devices, offering performance optimizations to enhance inference efficiency.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Real-Life Examples: GPT-4 vs. Mobile AI&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;GPT-4&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;GPT-4 is a massive-scale model designed for cloud-based deployment. It relies on powerful GPU clusters and achieves exceptional performance on complex language tasks. However, this comes with high computational and infrastructure costs.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Mobile AI&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Take, for instance, the quantized version of &lt;strong&gt;LLaMA 2&lt;/strong&gt;, which has been optimized to run locally on high-end smartphones. While it doesn’t match the raw capabilities of cloud-based large models, it is efficient enough to handle common tasks effectively on-device.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Balancing Parameter Count and Inference Speed&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The relationship between parameter count and inference speed exemplifies a trade-off:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Large models deliver superior performance but are slower and more expensive to run.&lt;/li&gt;
&lt;li&gt;Smaller models are faster and more resource-efficient but lack the capabilities of larger counterparts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This trade-off depends on the application context:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cloud Services&lt;/strong&gt;: Prioritize performance by using large-scale models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mobile Devices&lt;/strong&gt;: Focus on speed and energy efficiency with lightweight models.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Edge Computing&lt;/strong&gt;: Strive for a balance between performance and efficiency.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The parameter count of a model defines its potential capabilities, while inference speed is constrained by computational resources and optimization techniques. Mobile AI models achieve “small but mighty” performance through compression and optimization, but the raw power of GPT-4 and similar models still relies on cloud infrastructure.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Discovering the Joy of Tokens: AI’s Language Magic Unveiled</title><link>https://geekcoding101.com/posts/discovering-the-joy-of-tokens-ais-language-magic-unveiled</link><guid isPermaLink="true">https://geekcoding101.com/posts/discovering-the-joy-of-tokens-ais-language-magic-unveiled</guid><pubDate>Sun, 08 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today’s topic might seem a bit technical, but don’t worry—we’re keeping it down-to-earth.&lt;/p&gt;
&lt;p&gt;Let’s uncover the secrets of &lt;strong&gt;tokens&lt;/strong&gt;, the building blocks of AI’s understanding of language.&lt;/p&gt;
&lt;p&gt;If you’ve ever used ChatGPT or similar AI tools, you might have noticed something: when you ask a long question, it takes a bit longer to answer. But short questions? Boom, instant response. That’s all thanks to tokens.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;1. What Are Tokens?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;A &lt;strong&gt;token&lt;/strong&gt; is the smallest unit of language that AI models “understand.” It could be a sentence, a word, a single character, or even part of a word. &lt;strong&gt;In short, AI doesn’t understand human language—but it understands tokens.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Take this sentence as an example:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;“AI is incredibly smart.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Depending on the tokenization method, this could be broken down into:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Word-level tokens&lt;/strong&gt;: &lt;code&gt;[&quot;AI&quot;, &quot;is&quot;, &quot;incredibly&quot;, &quot;smart&quot;]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Character-level tokens&lt;/strong&gt;: &lt;code&gt;[&quot;A&quot;, &quot;I&quot;, &quot; &quot;, &quot;i&quot;, &quot;s&quot;, &quot; &quot;, &quot;i&quot;, &quot;n&quot;, &quot;c&quot;, &quot;r&quot;, &quot;e&quot;, &quot;d&quot;, &quot;i&quot;, &quot;b&quot;, &quot;l&quot;, &quot;y&quot;, &quot; &quot;, &quot;s&quot;, &quot;m&quot;, &quot;a&quot;, &quot;r&quot;, &quot;t&quot;]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Subword-level tokens (the most common method)&lt;/strong&gt;: &lt;code&gt;[&quot;AI&quot;, &quot;is&quot;, &quot;incred&quot;, &quot;ibly&quot;, &quot;smart&quot;]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
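&lt;p&gt;The three granularities can be mimicked in a few lines of Python. Note that the subword split below is hard-coded purely for illustration; real models learn their subword vocabulary from data:&lt;/p&gt;

```python
sentence = "AI is incredibly smart."

# Word-level: split on whitespace (punctuation handling ignored here)
word_tokens = sentence.rstrip(".").split()

# Character-level: every character, including spaces, is a token
char_tokens = list(sentence)

# Subword-level: hard-coded for illustration; real tokenizers
# (BPE, WordPiece, ...) learn these splits from corpus statistics
subword_tokens = ["AI", "is", "incred", "ibly", "smart"]

print(word_tokens)       # ['AI', 'is', 'incredibly', 'smart']
print(len(char_tokens))  # far more tokens than the word-level split
```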
&lt;p&gt;In a nutshell, AI breaks down sentences into manageable pieces to understand our language. Without tokens, AI is like a brain without neurons—completely clueless.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. Why Are Tokens So Important?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;AI models aren’t magical—they rely on a logic of &lt;strong&gt;“predicting the next step.”&lt;/strong&gt; Here’s the simplified workflow: you feed in a token, and the model starts “guessing” what’s next. It’s like texting a friend, saying “I’m feeling,” and your friend immediately replies, “tired.” Is it empathy? Nope—it’s just a logical guess based on past interactions.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Why Does AI Need Tokens?&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Language is complex, and tokens help AI translate it into something math can handle. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input&lt;/strong&gt;: “AI is amazing!”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tokenized version (just an illustrative example)&lt;/strong&gt;: &lt;code&gt;[1234, 5678, 91011]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prediction&lt;/strong&gt;: Based on &lt;code&gt;[1234, 5678]&lt;/code&gt;, the model predicts the next token will be &lt;code&gt;91011&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
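&lt;p&gt;That “guess the next token” logic can be sketched with a toy bigram model. Real LLMs use neural networks over huge vocabularies, but the predict-the-next-token framing is the same:&lt;/p&gt;

```python
from collections import Counter, defaultdict

# Toy next-token "model": predict the most frequent follower
# of each token seen in the training text.
def train(tokens):
    followers = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        followers[a][b] += 1
    return followers

def predict_next(followers, token):
    return followers[token].most_common(1)[0][0]

text = "AI is amazing and AI is powerful and AI is everywhere".split()
model = train(text)
print(predict_next(model, "AI"))  # 'is'
```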
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. How Does AI Tokenize? It’s Not Just Random Chopping&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Tokenization isn’t just smashing sentences with a metaphorical hammer. There’s a method to the madness, and it’s pretty sophisticated:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Word-based Tokenization&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The simplest method: split the text by spaces. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input&lt;/strong&gt;: “AI is awesome.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tokens&lt;/strong&gt;: &lt;code&gt;[&quot;AI&quot;, &quot;is&quot;, &quot;awesome&quot;]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Fast and straightforward.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cons&lt;/strong&gt;: Fails with punctuation (&lt;code&gt;&quot;awesome!&quot;&lt;/code&gt;) or morphologically complex languages like German.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Subword-based Tokenization (Most Common Approach)&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;This is the go-to method for modern models like GPT or BERT. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input&lt;/strong&gt;: “awesome.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tokens&lt;/strong&gt;: &lt;code&gt;[&quot;awe&quot;, &quot;some&quot;]&lt;/code&gt; Why? It’s great for rare or unknown words. Even if the model hasn’t seen “awesomesauce,” it can still guess its meaning by breaking it into familiar parts like “awe” and “some.”&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Character-based Tokenization&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Every single character is treated as a token:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Input&lt;/strong&gt;: “GPT”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tokens&lt;/strong&gt;: &lt;code&gt;[&quot;G&quot;, &quot;P&quot;, &quot;T&quot;]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pros&lt;/strong&gt;: Works for unknown words or typos.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cons&lt;/strong&gt;: Increases the number of tokens drastically, making it computationally expensive.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(4) Byte Pair Encoding (BPE)&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Despite the fancy name, it’s just a frequency-based approach. The most common character pairs are merged into tokens. For example, the word “the” might appear so frequently that it gets treated as a single token.&lt;/p&gt;
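&lt;p&gt;A single BPE merge step fits in a few lines: count adjacent symbol pairs across the corpus and merge the most frequent one. This is a simplified sketch of the idea; real implementations repeat it thousands of times over large corpora:&lt;/p&gt;

```python
from collections import Counter

# One step of byte-pair encoding on a toy "corpus" of symbol lists.
def most_frequent_pair(corpus):
    pairs = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(corpus, pair):
    merged_corpus = []
    for word in corpus:
        out, skip = [], False
        for a, b in zip(word, word[1:] + [None]):
            if skip:
                skip = False
                continue
            if (a, b) == pair:
                out.append(a + b)  # fuse the pair into one symbol
                skip = True
            else:
                out.append(a)
        merged_corpus.append(out)
    return merged_corpus

corpus = [list("the"), list("then"), list("there"), list("thin")]
pair = most_frequent_pair(corpus)
print(pair)                           # ('t', 'h')
print(merge_pair(corpus, pair)[0])    # ['th', 'e']
```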
&lt;p&gt;In short: AI tokenization isn’t random; it’s a carefully designed process balancing precision and efficiency.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. The Real Impact of Tokens on AI&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Tokens aren’t just technical jargon—they directly affect how well an AI model performs. Here’s how:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Context Range&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;A model’s token limit determines how much “context” it can remember in one go.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GPT-3 can handle &lt;strong&gt;4096 tokens&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;GPT-4 extends this to &lt;strong&gt;32,000 tokens&lt;/strong&gt;. &lt;strong&gt;What does this mean?&lt;/strong&gt; With GPT-4, you can feed it a lengthy legal contract, and it can still keep the entire thing in memory while generating output. GPT-3? It’ll probably cut you off halfway, saying, “I forgot what you said earlier.”&lt;/li&gt;
&lt;/ul&gt;
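&lt;p&gt;You can get a feel for whether a document fits in a context window with a rough heuristic: for English text, one token is often around 4 characters. The exact count depends entirely on the model’s tokenizer (a library such as tiktoken gives real counts), so treat this strictly as an estimate:&lt;/p&gt;

```python
# Very rough token estimate: heuristic only, not a real tokenizer.
def estimate_tokens(text):
    return max(1, len(text) // 4)  # ~4 characters per English token

contract = "Whereas the parties agree to the following terms, " * 600
needed = estimate_tokens(contract)
print(needed, "tokens (rough estimate)")
# A contract this long exceeds a 4096-token window:
print("fits in 4096-token window:", 4096 > needed)
```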
&lt;h4&gt;&lt;strong&gt;(2) Generation Quality&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Tokenization affects how smoothly AI generates text. For instance, subword tokenization helps AI recognize that “amazingly” and “amazing” are related, improving its ability to generate coherent content. A less sophisticated tokenizer might not make the connection.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(3) Computational Cost&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Each token adds to the computational workload. This is why AI slows down with longer inputs—more tokens mean more processing, leading to what I like to call “computational fatigue.”&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. The Limitations of Tokenization&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;While tokenization is essential, it’s not without its quirks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Semantic Splitting&lt;/strong&gt;: Breaking “unbelievable” into &lt;code&gt;[&quot;un&quot;, &quot;believ&quot;, &quot;able&quot;]&lt;/code&gt; might make sense mathematically but could dilute the semantic meaning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Language Diversity&lt;/strong&gt;: Tokenization rules vary widely across languages. What works for English may fail spectacularly for Chinese or Arabic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Resource Consumption&lt;/strong&gt;: Tokenizing long texts adds overhead, slowing down inference times and increasing computational demand.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;./llm-token-limit.png&quot; alt=&quot;llm token limit&quot; title=&quot;llm token limit&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Tokens are the building blocks of AI’s language understanding, and tokenization is the bridge that translates human language into math. Without tokens, AI is just a heap of clueless parameters.&lt;/p&gt;
&lt;p&gt;More information: &lt;a href=&quot;https://thenewstack.io/the-building-blocks-of-llms-vectors-tokens-and-embeddings/&quot;&gt;The Building Blocks of LLMs: Vectors, Tokens and Embeddings&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;AI may seem like “magic” but it’s really all about the details. Next time you’re using ChatGPT, try guessing: how many tokens did my question use? Did it exceed the context window? These “hidden mechanics” play a big role in determining how accurate and useful the AI’s response will be.&lt;/p&gt;
&lt;p&gt;Alright, that’s it for today’s AI dissection! Follow me for more bite-sized insights, and let’s keep uncovering the nuts and bolts of AI together! See you tomorrow.&lt;/p&gt;
&lt;p&gt;P.S. Feel free to check out my other posts about &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;Daily AI Insights&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Fine-Tuning Models: Unlocking the Extraordinary Potential of AI</title><link>https://geekcoding101.com/posts/fine-tuning-models-unlocking-the-extraordinary-potential-of-ai</link><guid isPermaLink="true">https://geekcoding101.com/posts/fine-tuning-models-unlocking-the-extraordinary-potential-of-ai</guid><pubDate>Mon, 09 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;1. What Is Fine-Tuning?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;./fine-tuning.png&quot; alt=&quot;fine-tuning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Fine-tuning is a key process in AI training, where a pre-trained model is further trained on specific data to specialize in a particular task or domain.&lt;/p&gt;
&lt;p&gt;Think of it this way: It is like giving a generalist expert additional training to become a specialist. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pre-trained model&lt;/strong&gt;: Knows general knowledge (like basic reading comprehension or common language patterns).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-tuned model&lt;/strong&gt;: Gains expertise in a specific field, such as medical diagnostics, legal analysis, or poetry writing.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. Why Is Fine-Tuning Necessary?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Pre-trained models like GPT-4 and BERT are powerful, but they’re built for general-purpose use. Fine-tuning tailors these models for specialized applications. Here’s why it’s important:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Adapting to Specific Scenarios&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;General-purpose models are like encyclopedias—broad but not deep. Fine-tuning narrows their focus to master specific contexts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Medical AI&lt;/strong&gt;: Understands specialized terms like &quot;coronary artery disease.&quot;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Legal AI&lt;/strong&gt;: Deciphers complex legal jargon and formats.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Saving Computational Resources&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Training a model from scratch requires enormous resources. Fine-tuning leverages existing pre-trained knowledge, making the process faster and more cost-effective.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(3) Improving Performance&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;By focusing on domain-specific data, fine-tuned models outperform general models in specialized tasks. They can understand unique patterns and nuances within the target domain.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. How Does It Work?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;It typically involves the following steps:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Selecting a Pre-trained Model&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Choose a pre-trained model, such as GPT, BERT, or similar. These models have already been trained on massive datasets and understand the general structure of language.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) Preparing a Specialized Dataset&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Gather a high-quality dataset relevant to your specific task. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For legal document generation: A dataset of contracts and case law.&lt;/li&gt;
&lt;li&gt;For medical diagnosis: A dataset of clinical notes and research papers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Training the Model&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Train the pre-trained model on your domain-specific dataset, fine-tuning its parameters to optimize performance for your task. This process usually requires only a few training epochs.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(4) Validation and Adjustment&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Test the fine-tuned model on unseen data to evaluate its performance. If necessary, refine the dataset or training process to achieve better results.&lt;/p&gt;
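&lt;p&gt;The four steps above can be sketched in miniature. The snippet below is a deliberately tiny stand-in (a one-parameter linear model in plain Python, not a real LLM): we start from &quot;pretrained&quot; weights, run a few epochs of gradient descent on task data, and validate on a held-out point.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def mse(w, data):
    # Mean squared error of the one-parameter model y = w * x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def fine_tune(w, task_data, lr=0.01, epochs=20):
    # Step 3: a few epochs of gradient descent on the task dataset.
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in task_data) / len(task_data)
        w -= lr * grad
    return w

w_pretrained = 1.0                          # step 1: the &quot;pretrained&quot; model
task_data = [(1, 2.1), (2, 3.9), (3, 6.2)]  # step 2: specialized data (y is roughly 2x)
held_out = [(4, 8.0)]                       # step 4: unseen validation data

w = fine_tune(w_pretrained, task_data)
print(mse(w_pretrained, held_out), &quot;-&gt;&quot;, mse(w, held_out))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The held-out error drops after fine-tuning; if it had stayed high, step 4 would send us back to refine the dataset or the training setup.&lt;/p&gt;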
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Real-Life Applications&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Fine-tuning has revolutionized numerous fields. Here are some examples:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Medicine&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Develop AI models capable of interpreting medical images or summarizing clinical reports.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dataset&lt;/strong&gt;: Medical records, radiology images, and research articles.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcome&lt;/strong&gt;: A model that understands medical terminology and improves diagnostic accuracy.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Legal Industry&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Automate the generation of legal documents or analyze case law.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dataset&lt;/strong&gt;: Legal texts, contracts, and court rulings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcome&lt;/strong&gt;: An AI that produces professional, compliant legal outputs.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Financial Markets&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Goal&lt;/strong&gt;: Enable AI to analyze financial reports or make investment recommendations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dataset&lt;/strong&gt;: Historical stock data and financial statements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Outcome&lt;/strong&gt;: A system that provides insights tailored to financial decision-making.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Challenges&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;While fine-tuning is a powerful technique, it’s not without limitations:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Overfitting&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;If the dataset is too small or overly specific, the model may overfit, memorizing data patterns instead of generalizing knowledge.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) Cost Dependencies&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Fine-tuning is more efficient than training from scratch but still requires computational resources and time—especially for large models.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(3) Data Bias&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;If the fine-tuning dataset contains biases, the model can inherit or amplify those biases.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Fine-tuning customizes pre-trained AI models for specific tasks, making them specialists in chosen domains, provided high-quality data and robust training are applied.&lt;/p&gt;
&lt;p&gt;You can find some more details in the great paper &quot;&lt;a href=&quot;https://arxiv.org/html/2408.13296v1&quot;&gt;The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs&lt;/a&gt;&quot;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Want an AI that writes professional contracts, generates medical reports, or offers personalized insights? Fine-tuning is how you teach a generalist AI to become an expert. But remember, it’s only as good as the data and training it receives.&lt;/p&gt;
&lt;p&gt;That’s it for today’s AI deep dive! Follow for more, and let’s keep exploring the endless possibilities of AI together. See you tomorrow!&lt;/p&gt;
&lt;p&gt;P.S. Feel free to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;Daily AI Insights posts&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>What Is an Embedding? The Bridge From Text to the World of Numbers</title><link>https://geekcoding101.com/posts/what-is-an-embedding-the-bridge-from-text-to-the-world-of-numbers</link><guid isPermaLink="true">https://geekcoding101.com/posts/what-is-an-embedding-the-bridge-from-text-to-the-world-of-numbers</guid><pubDate>Mon, 09 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;1. What Is an Embedding?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;An &lt;strong&gt;embedding&lt;/strong&gt; is the “translator” that converts language into numbers, enabling AI models to understand and process human language. AI doesn’t comprehend words, sentences, or syntax—it only works with numbers. Embeddings assign a unique numerical representation (a vector) to words, phrases, or sentences.&lt;/p&gt;
&lt;p&gt;Think of an embedding as a &lt;strong&gt;language map&lt;/strong&gt;: each word is a point on the map, and its position reflects its relationship with other words. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;“cat” and “dog” might be close together on the map, while “cat” and “car” are far apart.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. Why Do We Need Embeddings?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Human language is rich and abstract, but AI models need to translate it into something mathematical to work with. Embeddings solve several key challenges:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Vectorizing Language&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Words are converted into vectors (lists of numbers). For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;“cat” → &lt;code&gt;[0.1, 0.3, 0.5]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;“dog” → &lt;code&gt;[0.1, 0.32, 0.51]&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These vectors make it possible for models to perform mathematical operations like comparing, clustering, or predicting relationships.&lt;/p&gt;
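&lt;p&gt;For example, the toy vectors above can be compared with cosine similarity (the dot product of the normalized vectors). The &quot;car&quot; vector below is made up purely for contrast:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

def cosine(u, v):
    # Dot product divided by the product of the vectors&apos; lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

cat = [0.1, 0.3, 0.5]
dog = [0.1, 0.32, 0.51]   # nearly parallel to &quot;cat&quot;
car = [0.9, 0.1, 0.05]    # made up to point in a different direction

print(cosine(cat, dog))   # close to 1.0: semantically near
print(cosine(cat, car))   # much smaller: semantically far
&lt;/code&gt;&lt;/pre&gt;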
&lt;h4&gt;&lt;strong&gt;(2) Capturing Semantic Relationships&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The true power of embeddings lies in capturing &lt;strong&gt;semantic relationships&lt;/strong&gt; between words. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;“king - man + woman ≈ queen” This demonstrates how embeddings encode complex relationships in a numerical format.&lt;/li&gt;
&lt;/ul&gt;
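&lt;p&gt;The famous analogy can be reproduced with made-up 2-D vectors whose dimensions we pretend mean [royalty, femaleness]. Real embeddings have hundreds of opaque dimensions, but the arithmetic is the same:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical 2-D embeddings: dimensions [royalty, femaleness].
vectors = {
    &quot;man&quot;:   [0.0, 0.0],
    &quot;woman&quot;: [0.0, 1.0],
    &quot;king&quot;:  [1.0, 0.0],
    &quot;queen&quot;: [1.0, 1.0],
}

# king - man + woman, computed component-wise
result = [k - m + w for k, m, w in
          zip(vectors[&quot;king&quot;], vectors[&quot;man&quot;], vectors[&quot;woman&quot;])]

# The nearest stored vector to the result is &quot;queen&quot;.
nearest = min(vectors, key=lambda word: sum(
    (a - b) ** 2 for a, b in zip(vectors[word], result)))
print(result, nearest)  # [1.0, 1.0] queen
&lt;/code&gt;&lt;/pre&gt;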
&lt;h4&gt;&lt;strong&gt;(3) Addressing Data Sparsity&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Instead of assigning a unique index to every word (which can lead to sparse data), embeddings compress language into a limited number of dimensions (e.g., 100 or 300), making computations much more efficient.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. How Are Embeddings Created?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Embeddings are generated through machine learning models trained on large datasets. Here are some popular methods:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Word2Vec&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;One of the earliest and most successful embedding methods, Word2Vec is based on the idea that &lt;strong&gt;similar words appear in similar contexts&lt;/strong&gt;. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sentences: “Cats love milk” and “Dogs love bones”&lt;/li&gt;
&lt;li&gt;Word2Vec places “cats” and “dogs” close together because they share similar linguistic surroundings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) GloVe&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;GloVe (Global Vectors for Word Representation) focuses on capturing statistical co-occurrence. For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Words like “apple” and “orange” often co-occur with “fruit,” and this relationship is encoded in their embeddings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Transformer Models (e.g., BERT, GPT)&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Modern models dynamically create embeddings based on context. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The word “bank” in “river bank” and “money bank” will have different embeddings, allowing the model to disambiguate meanings.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Applications of Embeddings&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Embeddings are foundational to many AI applications, including:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Search Engines&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;By converting queries and documents into embeddings, search engines calculate their similarity (e.g., using dot products) to deliver the most relevant results.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) Recommendation Systems&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Platforms like YouTube and Netflix use embeddings to represent user preferences and content. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Movies are embedded as vectors, and the system recommends content based on vector similarity.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Generative AI&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Embeddings enable models like ChatGPT or DALL-E to process and generate coherent text, images, and more.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. How Dot Products Relate to Embeddings&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Embeddings frequently involve &lt;strong&gt;dot product calculations&lt;/strong&gt;, a crucial mathematical operation for comparing vectors. Here’s where dot products come into play:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Similarity Measurement&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;In recommendation systems or search engines, the dot product measures the similarity between two vectors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If the dot product is high, the items (e.g., a query and a document) are similar.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Attention Mechanism&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;In Transformer models, dot products are used to compute attention scores, determining which parts of an input sequence are most relevant to the task.&lt;/p&gt;
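&lt;p&gt;Here is a minimal sketch of scaled dot-product attention for a single query, in plain Python. Real implementations are batched matrix operations, but the arithmetic is the same: dot products score each key against the query, softmax turns the scores into weights, and the weights mix the values.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Dot each key with the query, scale by sqrt(d), softmax into
    # weights, then take the weighted average of the values.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

query  = [1.0, 0.0]
keys   = [[1.0, 0.0], [0.0, 1.0]]       # the first key matches the query best
values = [[10.0, 0.0], [0.0, 10.0]]
print(attention(query, keys, values))   # output leans toward the first value
&lt;/code&gt;&lt;/pre&gt;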
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. Challenges of Embeddings&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Despite their power, embeddings face some limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data Dependency&lt;/strong&gt;: Embedding quality depends heavily on training data. Biased data can result in biased embeddings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dimensional Trade-Offs&lt;/strong&gt;: High-dimensional embeddings are computationally expensive, while low-dimensional ones may lose critical information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Semantic Ambiguity&lt;/strong&gt;: Even advanced embeddings struggle with capturing nuanced or metaphorical meanings.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;7. Visualization Resources&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;To better understand embeddings, here are some types of visualizations you can explore online:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Embedding Space&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Search&lt;/strong&gt;: &lt;code&gt;embedding space visualization&lt;/code&gt; or &lt;code&gt;word embedding map&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;These diagrams illustrate how words are distributed in a 2D or 3D space, showing semantic relationships.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Dot Product Similarity&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Search&lt;/strong&gt;: &lt;code&gt;dot product similarity visualization&lt;/code&gt; or &lt;code&gt;cosine similarity embedding&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Demonstrates how embeddings are compared mathematically.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Attention Mechanisms&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Search&lt;/strong&gt;: &lt;code&gt;transformer attention scores&lt;/code&gt; or &lt;code&gt;attention mechanism visualization&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Explains how embeddings and dot products work together in Transformers.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;8. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Embeddings are the bridge between human language and machine understanding, enabling AI models to map linguistic relationships into a mathematical space.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Actually, I feel like I didn&apos;t delve as deeply into embeddings this time. There&apos;s just so much math involved, especially the dot product calculation of vectors. For those who want to learn more, I recommend the article &lt;a href=&quot;https://ai.gopubby.com/an-intuitive-101-guide-to-vector-embeddings-ffde295c3558&quot;&gt;An Intuitive 101 Guide to Vector Embeddings&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Embeddings might seem like a dry technical concept, but they’re the unsung heroes behind AI’s ability to generate text, recommend content, and more. Next time you use ChatGPT, think about how every word you type has been transformed into a dense vector representation. Behind the magic is a lot of math!&lt;/p&gt;
&lt;p&gt;Let’s keep breaking down AI one piece at a time—follow for more insights, and see you tomorrow!&lt;/p&gt;
&lt;p&gt;Wow! Today marks the seventh issue of my &quot;&lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;Daily AI Insights Series&lt;/a&gt;&quot;—a full week of consistent daily posts! Through this journey, I&apos;ve received so much support and grown a lot.&lt;/p&gt;
&lt;p&gt;Thank you all for your encouragement!&lt;/p&gt;
&lt;p&gt;Let’s keep it up! It’s Sunday today, and I originally thought about skipping it... but let’s push forward! Keep going, keep going!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Pretraining vs. Fine-Tuning: What&apos;s the Difference?</title><link>https://geekcoding101.com/posts/pretraining-vs-fine-tuning-whats-the-difference</link><guid isPermaLink="true">https://geekcoding101.com/posts/pretraining-vs-fine-tuning-whats-the-difference</guid><pubDate>Wed, 11 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Let&apos;s deep dive into &lt;a href=&quot;https://www.reddit.com/r/learnmachinelearning/comments/19f04y3/what_is_the_difference_between_pretraining/&quot;&gt;pretraining and fine-tuning&lt;/a&gt; today!&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. What Is Pretraining?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;./pretraining.webp&quot; alt=&quot;pretraining&quot; title=&quot;pretraining&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pretraining&lt;/strong&gt; is the first step in building AI models. Its goal is to equip the model with general language knowledge. Think of pretraining as “elementary school” for AI, where it learns how to read, understand, and process language using &lt;strong&gt;large-scale general datasets&lt;/strong&gt; (like Wikipedia, books, and news articles). During this phase, the model learns sentence structure, grammar rules, common word relationships, and more.&lt;/p&gt;
&lt;p&gt;For example, pretraining tasks might include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Masked Language Modeling (MLM):&lt;/strong&gt; Input: “John loves ___ and basketball.” The model predicts: “football.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Causal Language Modeling (CLM):&lt;/strong&gt; Input: “The weather is great, I want to go to” The model predicts: “the park.”&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Through this process, the model develops a foundational understanding of language.&lt;/p&gt;
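&lt;p&gt;The causal (next-word) objective can be shown in miniature with bigram counts over a tiny corpus. This is nothing like a real pretraining run, but it has the same shape: predict what comes next from what came before.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from collections import Counter, defaultdict

corpus = &quot;the weather is great i want to go to the park&quot;.split()

# Count how often each word follows another (bigram statistics).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # CLM in miniature: emit the most frequent continuation.
    return following[word].most_common(1)[0][0]

print(predict_next(&quot;go&quot;))   # &apos;to&apos;
&lt;/code&gt;&lt;/pre&gt;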
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. What Is Fine-Tuning?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; builds on top of a pretrained model by training it on task-specific data to specialize in a particular area. Think of it as “college” for AI—it narrows the focus and develops expertise in specific domains. It uses &lt;strong&gt;smaller, targeted datasets&lt;/strong&gt; to optimize the model for specialized tasks (e.g., sentiment analysis, medical diagnosis, or legal document drafting).&lt;/p&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;To fine-tune a model for legal document generation, you would train it on a dataset of contracts and legal texts.&lt;/li&gt;
&lt;li&gt;To fine-tune a model for customer service, you would use your company’s FAQ logs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Fine-tuning enables AI to excel at specific tasks without needing to start from scratch.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. Key Differences Between Pretraining and Fine-Tuning&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;While both processes aim to improve AI’s capabilities, they differ fundamentally in purpose and execution:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Aspect&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Pretraining&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Fine-Tuning&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;To learn general language knowledge, including vocabulary, syntax, and semantic relationships.&lt;/td&gt;
&lt;td&gt;To adapt the model for specific tasks or domains.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large-scale, general datasets (e.g., Wikipedia, books, news).&lt;/td&gt;
&lt;td&gt;Domain-specific, smaller datasets (e.g., medical records, legal texts, customer FAQs).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time and Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Time-consuming and computationally expensive, requiring extensive GPU/TPU resources.&lt;/td&gt;
&lt;td&gt;Quicker and less resource-intensive.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides foundational language capabilities for a wide range of applications.&lt;/td&gt;
&lt;td&gt;Enables custom applications, like translation, sentiment analysis, or text summarization.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parameter Update&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Learns all model parameters from scratch.&lt;/td&gt;
&lt;td&gt;Makes targeted adjustments to the pretrained model’s parameters.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;In short:&lt;/strong&gt; Pretraining builds the “brain” of AI, while fine-tuning teaches it specific “skills.”&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Why Separate Pretraining and Fine-Tuning?&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;(1) Efficiency&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Pretraining requires vast amounts of data and computational power. For instance, GPT-3’s pretraining cost millions of dollars in GPU time. Fine-tuning, however, can achieve impressive results with a smaller dataset and less computational effort.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) General vs. Specific Knowledge&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Pretrained models are designed for general-purpose tasks, while fine-tuning tailors these models for specific use cases, expanding their utility.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(3) Reusability&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;A single pretrained model can be fine-tuned for various domains, such as legal, medical, or educational applications. This modularity reduces redundancy and speeds up AI development.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Real-Life Applications&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Case 1: Chatbots&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pretrained model:&lt;/strong&gt; Understands general conversational language, like greetings or small talk.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-tuned model:&lt;/strong&gt; Learns how to answer domain-specific questions, such as those related to product returns.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Case 2: Legal Document Generation&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pretrained model:&lt;/strong&gt; Recognizes general language patterns and logical structures.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-tuned model:&lt;/strong&gt; Can generate contracts and ensure legal compliance using domain-specific datasets.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Case 3: Medical Diagnosis&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pretrained model:&lt;/strong&gt; Understands basic language and context relationships.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-tuned model:&lt;/strong&gt; Analyzes medical records and generates insights specific to healthcare.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. The Future&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;As AI models grow in size and capability (e.g., GPT-4 and beyond), techniques like &lt;strong&gt;Few-Shot Learning&lt;/strong&gt; and &lt;strong&gt;Zero-Shot Learning&lt;/strong&gt; are reducing dependence on fine-tuning for some tasks. However, for highly specialized use cases, fine-tuning remains indispensable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Trends to watch:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stronger pretrained models:&lt;/strong&gt; Increasingly capable of handling a broader range of tasks out of the box.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simplified fine-tuning tools:&lt;/strong&gt; Making it easier for businesses and individuals to customize AI models.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;7. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Pretraining is the “basic education” that equips AI with foundational knowledge, while fine-tuning is the “advanced training” that makes it an expert in specific domains. Together, they are the backbone of modern AI capabilities.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Next time you interact with AI, remember the two phases behind its smarts: Pretraining to make it a “generalist” and fine-tuning to transform it into a “specialist.” Stay tuned for more AI insights tomorrow—follow me to explore the magic of AI!&lt;/p&gt;
&lt;p&gt;Finally, feel free to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;Daily AI Insights posts&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Empower Your AI Journey: Foundation Models Explained</title><link>https://geekcoding101.com/posts/empower-your-ai-journey-foundation-models-explained</link><guid isPermaLink="true">https://geekcoding101.com/posts/empower-your-ai-journey-foundation-models-explained</guid><pubDate>Thu, 12 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;&lt;strong&gt;Introduction: Why It Matters&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In the rapidly evolving field of AI, the distinction between &lt;strong&gt;foundation models&lt;/strong&gt; and &lt;strong&gt;task models&lt;/strong&gt; is critical for understanding how modern AI systems work. Foundation models, like GPT-4 or BERT, provide the backbone of AI development, offering general-purpose capabilities. Task models, on the other hand, are fine-tuned or custom-built for specific applications. Understanding their differences helps businesses and developers leverage the right model for the right task, optimizing both performance and cost. Let’s dive into how these two types of models differ and why both are essential.&lt;/p&gt;
&lt;p&gt;Today&apos;s topic is similar to &lt;a href=&quot;/posts/pretraining-vs-fine-tuning-whats-the-difference&quot;&gt;Pretraining vs. Fine-Tuning&lt;/a&gt;. While &lt;strong&gt;&quot;Foundation Models vs. Task Models&quot;&lt;/strong&gt; and &lt;strong&gt;&quot;Pretraining vs. Fine-Tuning&quot;&lt;/strong&gt; are closely related, they’re not exactly the same. &lt;strong&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Foundation_model&quot;&gt;Foundation Models&lt;/a&gt; and Pretraining&lt;/strong&gt;: Foundation models are &lt;strong&gt;products of pretraining&lt;/strong&gt;. Task models are often derived from foundation models through &lt;strong&gt;fine-tuning&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;I cover them separately because people sometimes confuse the two, and treating them on their own gives each a clearer focus.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;1. What Are Foundation Models?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;./Foundation_models.png&quot; alt=&quot;Foundation models&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Foundation models are &lt;strong&gt;general-purpose AI models&lt;/strong&gt; trained on vast amounts of data to understand and generate language across a wide range of contexts. Their primary goal is to act as a &lt;strong&gt;universal knowledge base&lt;/strong&gt;, capable of supporting a multitude of applications with minimal additional training.&lt;/p&gt;
&lt;p&gt;Examples of foundation models include GPT-4, BERT, and PaLM. These models are not designed for any one task but are built to be flexible, with a deep understanding of grammar, structure, and semantics.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Massive Scale&lt;/strong&gt;: Often involve billions or even trillions of parameters (What does parameters mean? You can refer to my previous blog &lt;a href=&quot;/posts/what-are-parameters-why-are-bigger-models-often-smarter&quot;&gt;What Are Parameters?&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-Purpose&lt;/strong&gt;: Can be adapted for numerous tasks through fine-tuning or prompt engineering (Please refer to my previous blog &lt;a href=&quot;/posts/what-is-prompt-engineering-and-how-to-train-ai-with-a-single-sentence&quot;&gt;What Is Prompt Engineering&lt;/a&gt; and &lt;a href=&quot;/daily-ai-insights/what-is-fine-tuning-how-to-teach-ai-specific-skills/&quot;&gt;What Is Fine-Tuning&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pretraining-Driven&lt;/strong&gt;: Trained on vast datasets (e.g., Wikipedia, news, books) to understand general language structures.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Think of a foundation model as a &lt;strong&gt;jack-of-all-trades&lt;/strong&gt;—broadly knowledgeable but not specialized in any one field.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. What Are Task Models?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Task models are &lt;strong&gt;specialized AI models&lt;/strong&gt; designed or fine-tuned to excel at a specific task, such as sentiment analysis, machine translation, or medical diagnostics. Unlike foundation models, task models are focused and purpose-built to meet particular goals.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Task-Specific&lt;/strong&gt;: Optimized for a narrow set of objectives.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Domain-Specific Data&lt;/strong&gt;: Trained on datasets tailored to the task, such as legal contracts, medical records, or customer reviews.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lightweight and Deployable&lt;/strong&gt;: Typically smaller and easier to deploy in production settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For instance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A sentiment analysis task model would determine whether a tweet is positive or negative.&lt;/li&gt;
&lt;li&gt;A medical diagnosis task model would analyze patient data and suggest potential conditions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Task models are like &lt;strong&gt;specialists&lt;/strong&gt; in a particular domain—less versatile than foundation models but highly effective in their area of expertise.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. Core Differences Between Foundation Models and Task Models&lt;/strong&gt;&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Aspect&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Foundation Models&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Task Models&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;General-purpose, suitable for multiple applications.&lt;/td&gt;
&lt;td&gt;Focused on a specific task or domain.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large-scale, general datasets (e.g., Wikipedia, news).&lt;/td&gt;
&lt;td&gt;Domain-specific datasets (e.g., legal texts, reviews).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training Process&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pretraining, requiring immense computational resources.&lt;/td&gt;
&lt;td&gt;Fine-tuning or custom training, requiring less computation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Billions or trillions of parameters.&lt;/td&gt;
&lt;td&gt;Smaller, optimized for production environments.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highly flexible, can adapt to various tasks.&lt;/td&gt;
&lt;td&gt;Limited to specific tasks, but highly accurate.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;In summary&lt;/strong&gt;: Foundation models are the &lt;strong&gt;base layer&lt;/strong&gt; of AI, while task models are tailored for specific applications.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Why Do We Need Both?&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;(1) Foundation Models: Broad Utility&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Foundation models provide a starting point for diverse applications, saving time and resources. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use GPT-4 for general-purpose language understanding.&lt;/li&gt;
&lt;li&gt;Use BERT for natural language processing tasks like question answering or summarization.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Task Models: Precision and Efficiency&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Task models optimize performance for specific objectives. They are essential when accuracy and domain knowledge are critical. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fine-tune a foundation model to generate legally compliant contracts.&lt;/li&gt;
&lt;li&gt;Train a model specifically for medical imaging analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By combining foundation models with task models, developers can achieve both adaptability and precision.&lt;/p&gt;
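&lt;p&gt;As a purely illustrative toy (none of these classes are a real library API), the division of labor can be sketched as a shared, general-purpose encoder with a thin task-specific head on top:&lt;/p&gt;

```python
class FoundationModel:
    """Stand-in for a large pretrained encoder: maps text to generic features."""
    POSITIVE_CUES = {"great", "good", "love", "excellent"}
    NEGATIVE_CUES = {"bad", "terrible", "hate", "poor"}

    def encode(self, text):
        words = text.lower().split()
        pos = sum(1 for w in words if w in self.POSITIVE_CUES)
        neg = sum(1 for w in words if w in self.NEGATIVE_CUES)
        return {"positive": pos, "negative": neg}


class SentimentTaskModel:
    """Thin task-specific head layered on top of the shared foundation encoder."""
    def __init__(self, foundation):
        self.foundation = foundation

    def predict(self, text):
        scores = self.foundation.encode(text)
        # ties default to "positive" because max keeps the first of equal keys
        return max(["positive", "negative"], key=lambda label: scores[label])


clf = SentimentTaskModel(FoundationModel())
print(clf.predict("I love this great phone"))        # positive
print(clf.predict("terrible battery and bad apps"))  # negative
```

&lt;p&gt;The point of the sketch: the heavy, reusable part lives in one place, while each task only adds a small, cheap layer on top of it.&lt;/p&gt;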
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Real-Life Examples: Foundation Models + Task Models in Action&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Example 1: Healthcare AI&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Foundation Model&lt;/strong&gt;: GPT-4 understands medical terminology.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task Model&lt;/strong&gt;: Fine-tuned on clinical records to generate accurate diagnostic reports.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Example 2: E-commerce Recommendations&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Foundation Model&lt;/strong&gt;: Analyzes general customer sentiment across reviews.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task Model&lt;/strong&gt;: Customized to recommend products based on specific purchase behaviors.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Example 3: Legal Document Automation&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Foundation Model&lt;/strong&gt;: Provides general language comprehension.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task Model&lt;/strong&gt;: Generates legally compliant contracts with domain-specific training.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. The Future of Foundation and Task Models&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;As AI continues to evolve, the line between foundation models and task models may blur:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Foundation Models Will Become Stronger&lt;/strong&gt;: With advancements in pretraining, these models might handle specific tasks with little or no fine-tuning (e.g., few-shot learning or zero-shot learning).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task Models Will Remain Relevant&lt;/strong&gt;: Despite stronger foundation models, specialized tasks requiring domain expertise and precision will still benefit from task-specific training.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The synergy between the two ensures that AI can adapt to both general and niche challenges.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;7. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Foundation models provide the broad, flexible foundation for AI, while task models deliver focused, specialized solutions tailored to specific needs.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Understanding the difference between foundation and task models is key to leveraging AI effectively. Whether building a general-purpose tool or solving a domain-specific problem, knowing when to rely on a foundation model and when to train a task model is critical. Stay tuned for more insights tomorrow—follow me for daily AI explorations!&lt;/p&gt;
&lt;p&gt;Finally, feel free to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;AI Insights blog posts&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Discover the Power of Zero-Shot and Few-Shot Learning</title><link>https://geekcoding101.com/posts/discover-the-power-of-zero-shot-and-few-shot-learning</link><guid isPermaLink="true">https://geekcoding101.com/posts/discover-the-power-of-zero-shot-and-few-shot-learning</guid><pubDate>Fri, 13 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Transfer learning has revolutionized the way AI models adapt to new tasks, enabling them to generalize knowledge across domains. At its core, transfer learning allows models trained on vast datasets to tackle entirely new challenges with minimal additional data or effort. Two groundbreaking techniques within this framework are &lt;a href=&quot;https://en.wikipedia.org/wiki/Zero-shot_learning&quot;&gt;&lt;strong&gt;Zero-Shot Learning (ZSL)&lt;/strong&gt;&lt;/a&gt; and &lt;a href=&quot;https://www.ibm.com/think/topics/few-shot-learning#:~:text=Few%2Dshot%20learning%20is%20a,suitable%20training%20data%20is%20scarce.&quot;&gt;&lt;strong&gt;Few-Shot Learning (FSL)&lt;/strong&gt;&lt;/a&gt;. ZSL empowers AI to perform tasks without ever seeing labeled examples, while FSL leverages just a handful of examples to quickly master new objectives. These approaches highlight the versatility and efficiency of transfer learning, making it a cornerstone of modern AI applications. Let’s dive deeper into how ZSL and FSL work and why they’re transforming the landscape of machine learning.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. What Is Zero-Shot Learning (ZSL)?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;./zero-shot-learning.webp&quot; alt=&quot;zero-shot learning&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Zero-Shot Learning&lt;/strong&gt; refers to an AI model&apos;s ability to perform a specific task without having seen any labeled examples for that task during training. In other words, the model relies on its general knowledge and contextual understanding rather than on task-specific training data.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Simple Example&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Imagine a model trained to recognize “cats” and “dogs,” but it has never seen a “tiger.” When you show it a tiger and ask, “Is this a tiger?” it can infer that it’s likely a tiger by reasoning based on the similarities and differences between cats, dogs, and tigers.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Semantic Embeddings&lt;/strong&gt; ZSL maps both task descriptions and data samples into a shared semantic space. For instance, the word “tiger” is embedded as a vector, and the model compares it with the image’s vector to infer their relationship.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pretrained Models&lt;/strong&gt; ZSL relies heavily on large foundation models like GPT-4 or CLIP, which have learned extensive general knowledge during pretraining. These models can interpret natural language prompts and infer the answer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Natural Language Descriptions&lt;/strong&gt; Clear, descriptive prompts like “Is this a tiger?” help the model understand the task through language, allowing it to respond appropriately without requiring task-specific examples.&lt;/li&gt;
&lt;/ol&gt;
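&lt;p&gt;The shared-semantic-space idea in step 1 can be sketched in a few lines: if labels and inputs live in the same vector space, classifying an unseen class reduces to picking the closest label vector. The 3-d vectors below are hand-made toys, not real embeddings:&lt;/p&gt;

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors in the shared semantic space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy label embeddings; "tiger" was never seen as a training image,
# but its word vector still sits near cat-like, striped things.
label_vecs = {"cat": (0.9, 0.1, 0.0), "dog": (0.1, 0.9, 0.0), "tiger": (0.8, 0.1, 0.6)}
image_vec = (0.85, 0.15, 0.55)  # embedding of a tiger photo

best = max(label_vecs, key=lambda name: cosine(label_vecs[name], image_vec))
print(best)  # tiger
```

&lt;p&gt;Models like CLIP apply exactly this pattern at scale: a text encoder and an image encoder trained to land in one space, so any label you can describe in words becomes a candidate class.&lt;/p&gt;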
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. What Is Few-Shot Learning (FSL)?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Few-Shot Learning&lt;/strong&gt; refers to an AI model’s ability to complete a task after being exposed to only a few labeled examples (typically 1 to 10). It is particularly useful in scenarios where data is scarce.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Simple Example&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Suppose you need to teach a model to distinguish between “apples” and “oranges.” By providing just five labeled images of each, the model can quickly learn how to classify new images into these two categories.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;How It Works&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;In-Context Learning&lt;/strong&gt; Few-Shot Learning leverages examples provided within the task context to help the model infer rules. For example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Examples of apples:
Image 1: Red, round.
Image 2: Green, round.
Examples of oranges:
Image 1: Orange, round.
Image 2: Orange, slightly rough.
Task: What category does this new image belong to?
Image 3: Orange, round.
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The model uses the context to deduce the classification.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Parameter Transfer&lt;/strong&gt; FSL often relies on transferring knowledge from a pretrained model to a new task. The model applies its prior understanding of related tasks to the new one.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Gradient-Based Fine-Tuning&lt;/strong&gt; A small amount of fine-tuning with limited labeled data allows the model to adjust its parameters for better task performance.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
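&lt;p&gt;One common few-shot recipe (the nearest-centroid idea behind prototypical networks) can be sketched with toy 2-d features; the feature values and labels below are invented for illustration:&lt;/p&gt;

```python
def classify_few_shot(query, support):
    """Nearest-centroid classification from a handful of labeled examples."""
    def centroid(points):
        dim = len(points[0])
        return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    # one "prototype" per class, averaged from its few support examples
    prototypes = {label: centroid(pts) for label, pts in support.items()}
    return min(prototypes, key=lambda label: sq_dist(prototypes[label], query))

# Toy 2-d features: (redness, roughness), two examples per class
support = {
    "apple":  [(0.9, 0.1), (0.2, 0.1)],   # red or green, smooth
    "orange": [(0.6, 0.7), (0.6, 0.9)],   # orange-colored, rough
}
print(classify_few_shot((0.6, 0.8), support))  # orange
```

&lt;p&gt;In practice the features would come from a pretrained encoder rather than hand-made tuples, which is exactly the parameter-transfer idea in step 2.&lt;/p&gt;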
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. Key Differences Between ZSL and FSL&lt;/strong&gt;&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Aspect&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Zero-Shot Learning (ZSL)&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Few-Shot Learning (FSL)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Requirement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No task-specific examples required.&lt;/td&gt;
&lt;td&gt;Requires a small number of labeled examples (1–10).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Approach&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Relies on general knowledge and natural language prompts.&lt;/td&gt;
&lt;td&gt;Combines task examples with prior model knowledge.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best for tasks with no available labeled data.&lt;/td&gt;
&lt;td&gt;Suitable for scenarios with limited labeled data.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Dependency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Heavily depends on strong pretrained models.&lt;/td&gt;
&lt;td&gt;Requires pretrained models and task-specific adaptation.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Real-World Applications&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Zero-Shot Learning Applications&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Text Classification&lt;/strong&gt; Using GPT-4 to classify text as positive or negative sentiment without training on labeled data, relying solely on the prompt: “Is this a positive or negative review?”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Image Recognition&lt;/strong&gt; CLIP can identify objects in images by answering natural language queries like “Is this a panda?” without having been trained on specific panda images.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New Task Inference&lt;/strong&gt; Models like GPT-4 can handle tasks like translation between languages it hasn’t explicitly been trained on, leveraging its general language understanding.&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;&lt;strong&gt;Few-Shot Learning Applications&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Medical Diagnosis&lt;/strong&gt; Fine-tune a model with a few labeled medical records to diagnose rare diseases more accurately.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Niche Classification&lt;/strong&gt; Train a model to classify reviews in a specific industry (e.g., luxury goods) using only a handful of labeled examples.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom AI for Businesses&lt;/strong&gt; Fine-tune a model with a small dataset of customer support tickets to create a tailored AI assistant for answering specific queries.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Challenges of ZSL and FSL&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Challenges of Zero-Shot Learning&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Understanding Task Descriptions&lt;/strong&gt; Models rely heavily on the clarity of natural language prompts, and vague instructions can lead to poor performance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Domain Adaptation&lt;/strong&gt; Pretrained models may lack domain-specific knowledge (e.g., medical or legal), limiting their effectiveness in specialized areas.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Challenges of Few-Shot Learning&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sample Bias&lt;/strong&gt; A small dataset may not represent the full complexity of the task, leading to overfitting.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High Data Quality Requirement&lt;/strong&gt; FSL demands clean, high-quality examples, as errors in the data can mislead the model.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Zero-shot learning enables models to infer tasks without any labeled data, while few-shot learning allows them to adapt quickly with just a few examples. Together, they make AI more flexible and efficient.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;ZSL and FSL represent AI’s shift toward greater adaptability and efficiency, enabling it to perform tasks with minimal data. Whether you’re marveling at GPT-4’s zero-shot conversational skills or fine-tuning a few-shot model for a specific use case, these techniques are revolutionizing AI applications. Stay tuned for tomorrow’s topic, and follow for more AI insights!&lt;/p&gt;
&lt;p&gt;Finally, feel free to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;AI Insights blog posts&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>The Hallucination Problem in Generative AI: Why Do Models “Make Things Up”?</title><link>https://geekcoding101.com/posts/the-hallucination-problem-in-generative-ai-why-do-models-make-things-up</link><guid isPermaLink="true">https://geekcoding101.com/posts/the-hallucination-problem-in-generative-ai-why-do-models-make-things-up</guid><pubDate>Sun, 15 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;./ai-hallucination.jpeg&quot; alt=&quot;AI hallucination&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Generative AI has taken the tech world by storm, revolutionizing how we interact with information and automation. But one pesky issue has left users both puzzled and amused—&lt;a href=&quot;https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)&quot;&gt;the “hallucination” problem&lt;/a&gt;. These hallucinations occur when AI models confidently produce incorrect or entirely fabricated content. Why does this happen, and how can we address it? Let’s explore.&lt;/p&gt;
&lt;h2&gt;What Is Hallucination in Generative AI?&lt;/h2&gt;
&lt;p&gt;In generative AI, hallucination refers to instances where the model outputs false or misleading information that may sound credible at first glance. These outputs often result from the limitations of the AI itself and the data it was trained on.&lt;/p&gt;
&lt;h3&gt;Common Examples of AI Hallucinations&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Fabricating facts&lt;/strong&gt;: AI models might confidently state that “Leonardo da Vinci invented the internet,” mixing plausible context with outright falsehoods.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Wrong quotes&lt;/strong&gt;: Ask &quot;Can you provide a source for the quote: &apos;The universe is under no obligation to make sense to you&apos;?&quot; and the AI may answer: &quot;This quote is from Albert Einstein in his book &lt;em&gt;The Theory of Relativity&lt;/em&gt;, published in 1921.&quot; The quote is actually from Neil deGrasse Tyson, not Einstein; the AI associates it with a famous physicist and invents a book to sound convincing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Incorrect technical explanations&lt;/strong&gt;: AI might produce an elegant but fundamentally flawed description of blockchain technology, misleading both novices and experts alike.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Hallucination highlights the gap between how AI &quot;understands&quot; data and how humans process information.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Why Do AI Models Hallucinate?&lt;/h2&gt;
&lt;p&gt;The hallucination problem isn’t a mere bug—it stems from inherent technical limitations and design choices in generative AI systems.&lt;/p&gt;
&lt;h3&gt;Biased and Noisy Training Data&lt;/h3&gt;
&lt;p&gt;Generative AI relies on massive datasets to learn patterns and relationships. However, these datasets often contain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Biased information&lt;/strong&gt;: Common errors or misinterpretations in the data propagate through the model.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Incomplete data&lt;/strong&gt;: Missing critical context or examples in the training corpus leads to incorrect generalizations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cultural idiosyncrasies&lt;/strong&gt;: Rare idiomatic expressions or language-specific nuances, like Chinese 成语, may be underrepresented in training data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Limitations of Model Architecture&lt;/h3&gt;
&lt;p&gt;Generative AI predicts outputs based on probability rather than factual accuracy. Its core mechanism aims to find the &quot;most likely&quot; next word or phrase rather than verify its correctness. This design inherently prioritizes fluency over precision.&lt;/p&gt;
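&lt;p&gt;A tiny sketch makes this concrete. Suppose the model, after a prompt like &quot;The quote is from&quot;, assigns the (invented, purely illustrative) next-token probabilities below. Decoding optimizes likelihood, so the most statistically plausible name wins whether or not it is the true source:&lt;/p&gt;

```python
import random

# Toy next-token distribution; probabilities reflect co-occurrence in
# training text, not factual correctness.
next_token_probs = {"Einstein": 0.55, "Tyson": 0.25, "Feynman": 0.15, "Hawking": 0.05}

def pick_next(probs):
    """Greedy decoding: the most probable continuation wins, true or not."""
    return max(probs, key=probs.get)

def sample_next(probs):
    """Sampling can surface any plausible-sounding token, weighted by probability."""
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(pick_next(next_token_probs))  # Einstein
```

&lt;p&gt;Nothing in either decoding rule checks a knowledge base; fluency and confidence come for free, accuracy does not.&lt;/p&gt;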
&lt;h3&gt;Influence of Prompts&lt;/h3&gt;
&lt;p&gt;The way users frame questions or inputs significantly affects AI responses. Ambiguity in prompts—common in languages like Chinese with complex grammar—can further exacerbate errors. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Asking “What are China’s five tallest mountains?” may prompt a mix of correct and fabricated peaks due to poorly structured data or vague phrasing.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h2&gt;How Does Hallucination Impact Users?&lt;/h2&gt;
&lt;p&gt;The hallucination problem isn’t just an academic curiosity—it has real-world consequences that impact trust, decision-making, and user experience.&lt;/p&gt;
&lt;h3&gt;Misleading Decisions&lt;/h3&gt;
&lt;p&gt;When users unknowingly rely on incorrect AI outputs, the results can be detrimental:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Academic Missteps&lt;/strong&gt;: Students may reference false information in essays or research papers.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Business Risks&lt;/strong&gt;: Companies using AI for market analysis might make poor strategic decisions based on fabricated trends.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Challenges in Chinese Language Contexts&lt;/h3&gt;
&lt;p&gt;Chinese presents unique difficulties for AI systems, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Idioms and cultural references&lt;/strong&gt;: Misinterpreting or misusing idiomatic expressions can lead to miscommunication.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ambiguity and polysemy&lt;/strong&gt;: Words with multiple meanings in Chinese can confuse AI and cause inaccurate translations or explanations.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Eroding Trust in AI&lt;/h3&gt;
&lt;p&gt;Frequent hallucinations can erode user confidence in generative AI, especially in high-stakes domains like healthcare, finance, or law. Once trust diminishes, adoption rates decline, stalling technological progress.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;How Can We Address the Hallucination Problem?&lt;/h2&gt;
&lt;p&gt;While hallucination cannot be entirely eliminated, there are practical steps to mitigate its effects.&lt;/p&gt;
&lt;h3&gt;Improve Training Data Quality&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Data cleaning&lt;/strong&gt;: Eliminate incorrect or low-quality information from training datasets.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Expand data diversity&lt;/strong&gt;: Incorporate underrepresented linguistic and cultural examples, such as idioms and colloquialisms.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Update for relevance&lt;/strong&gt;: Continuously supplement datasets with the latest verified information.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Implement Post-Processing Mechanisms&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Human review&lt;/strong&gt;: Deploy experts to validate AI-generated outputs in critical applications.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Algorithmic validation&lt;/strong&gt;: Use secondary AI models or rule-based systems to cross-check outputs for logical consistency.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Educate Users on AI Limitations&lt;/h3&gt;
&lt;p&gt;Empowering users with knowledge about AI&apos;s strengths and weaknesses fosters better usage. Teach users how to frame precise prompts and critically evaluate outputs rather than taking them at face value.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Future Outlook: Balancing Challenges and Opportunities&lt;/h2&gt;
&lt;p&gt;The hallucination problem underscores the limitations of even the most advanced generative AI systems. However, it also highlights areas for growth and innovation.&lt;/p&gt;
&lt;h3&gt;Can Hallucination Be Fully Eliminated?&lt;/h3&gt;
&lt;p&gt;Complete elimination of hallucinations seems unlikely due to the probabilistic nature of AI. However, ongoing improvements in training, validation, and architecture can significantly reduce the frequency and impact of hallucinations.&lt;/p&gt;
&lt;h3&gt;Best Practices for Coexisting with AI&lt;/h3&gt;
&lt;p&gt;The future lies in human-AI collaboration rather than blind reliance. By leveraging AI for what it excels at—pattern recognition, rapid response, and creativity—while compensating for its weaknesses, we can achieve a balanced coexistence.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Conclusion and Discussion&lt;/h2&gt;
&lt;p&gt;The hallucination problem in generative AI is a reminder that even cutting-edge technology is not infallible. What steps do you think are most effective for addressing this issue? Have you encountered amusing or frustrating examples of AI hallucinations? Share your thoughts and stories in the comments below!&lt;/p&gt;
&lt;p&gt;Finally, feel free to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;AI Insights blog posts&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Knowledge Distillation: How Big Models Train Smaller Ones</title><link>https://geekcoding101.com/posts/knowledge-distillation-how-big-models-train-smaller-ones</link><guid isPermaLink="true">https://geekcoding101.com/posts/knowledge-distillation-how-big-models-train-smaller-ones</guid><pubDate>Mon, 16 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;Knowledge Distillation in AI&lt;/strong&gt; is a powerful method where large models (teacher models) transfer their knowledge to smaller, efficient models (student models). This technique enables AI to retain high performance while reducing computational costs, speeding up inference, and facilitating deployment on resource-constrained devices like mobile phones and edge systems. By mimicking the outputs of teacher models, student models deliver lightweight, optimized solutions ideal for real-world applications. Let’s explore how knowledge distillation works and why it’s transforming modern AI.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. What Is Knowledge Distillation?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Knowledge distillation&lt;/strong&gt; is a technique where a &lt;strong&gt;large model (Teacher Model)&lt;/strong&gt; transfers its knowledge to a &lt;strong&gt;smaller model (Student Model)&lt;/strong&gt;. The goal is to compress the large model’s capabilities into a lightweight version that is faster, more efficient, and easier to deploy, while retaining high performance.&lt;/p&gt;
&lt;p&gt;Think of a teacher (large model) simplifying complex ideas for a student (small model). The teacher provides not just the answers but also insights into how the answers were derived, allowing the student to replicate the process efficiently.&lt;/p&gt;
&lt;p&gt;This illustration from &lt;a href=&quot;https://arxiv.org/abs/2006.05525&quot;&gt;Knowledge Distillation: A Survey&lt;/a&gt; explains it:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./knowledge-distillation-01.jpg&quot; alt=&quot;knowledge-distillation-01&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Another figure comes from &lt;a href=&quot;https://arxiv.org/html/2402.13116v1&quot;&gt;A Survey on Knowledge Distillation of Large Language Models&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./knowledge-distillation-02.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. Why Is Knowledge Distillation Important?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Large models (e.g., GPT-4) are powerful but have significant limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;High Computational Costs&lt;/strong&gt;: Require expensive hardware and energy to run.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deployment Challenges&lt;/strong&gt;: Difficult to use on mobile devices or edge systems.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slow Inference&lt;/strong&gt;: Unsuitable for real-time applications like voice assistants.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Knowledge distillation helps address these issues by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reducing Model Size&lt;/strong&gt;: Smaller models require fewer resources.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improving Speed&lt;/strong&gt;: Faster inference makes them ideal for resource-constrained environments.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Maintaining Accuracy&lt;/strong&gt;: By learning from large models, smaller models can achieve comparable performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. How Does Knowledge Distillation Work?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The process involves several key steps:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Train the Teacher Model&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;A large model is trained on a comprehensive dataset to achieve high accuracy and generalization.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) Generate Soft Targets&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;The teacher model produces outputs with detailed probability distributions.&lt;/li&gt;
&lt;li&gt;For example, when classifying an image, instead of just saying “cat,” the teacher might output:
&lt;ul&gt;
&lt;li&gt;Cat: 80%&lt;/li&gt;
&lt;li&gt;Dog: 15%&lt;/li&gt;
&lt;li&gt;Fox: 5%.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;These &lt;strong&gt;soft targets&lt;/strong&gt; provide rich information about how the teacher distinguishes between categories.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Train the Student Model&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;The smaller model learns from both the teacher’s soft targets and the original data.&lt;/li&gt;
&lt;li&gt;By mimicking the teacher’s outputs, the student absorbs the distilled knowledge without requiring as much capacity.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(4) Evaluate and Optimize&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The student model’s performance is validated and fine-tuned to ensure it meets the desired accuracy and efficiency.&lt;/p&gt;
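&lt;p&gt;Steps 2 and 3 can be sketched numerically. The loss below follows the common temperature-softened formulation (as in Hinton et al.&apos;s distillation paper); the logits are made-up example values:&lt;/p&gt;

```python
import math

def softmax(logits, T=1.0):
    """Soften logits with temperature T; higher T spreads probability mass."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's soft targets and the student's
    softened predictions, scaled by T*T to keep gradient magnitudes stable."""
    teacher_p = softmax(teacher_logits, T)
    student_p = softmax(student_logits, T)
    return -sum(t * math.log(s) for t, s in zip(teacher_p, student_p)) * T * T

# Teacher is confident it's a cat but still leaks "dark knowledge"
# about how cat-like the dog and fox classes are.
teacher_logits = [4.0, 2.0, 1.0]   # cat, dog, fox
student_logits = [3.5, 1.5, 0.5]
print(distillation_loss(student_logits, teacher_logits))
```

&lt;p&gt;The soft targets matter because the teacher&apos;s 80/15/5 split carries far more signal than a hard &quot;cat&quot; label: the student also learns which mistakes are near-misses.&lt;/p&gt;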
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. A Simple Example: The Classroom Analogy&lt;/strong&gt;&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Without Distillation&lt;/strong&gt;: A small model learns directly from raw data, like a student relying solely on a textbook without guidance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;With Distillation&lt;/strong&gt;: The teacher (large model) explains not only the answers but also why certain conclusions are drawn. The student absorbs these nuanced insights, leading to better understanding.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Perhaps the figure below, from &lt;a href=&quot;https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764&quot;&gt;Knowledge Distillation: Simplified&lt;/a&gt;, can help:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./knowledge-distillation-03.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Real-World Applications of Knowledge Distillation&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;(1) Lightweight AI on Edge Devices&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Small, distilled models are deployed on smartphones, IoT devices, and embedded systems.&lt;/li&gt;
&lt;li&gt;Example: A distilled CLIP model for image classification on mobile.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Real-Time Applications&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Faster inference is crucial for speech recognition or recommendation systems in real-time scenarios.&lt;/li&gt;
&lt;li&gt;Example: Voice assistants using distilled models for quick responses.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Multitask Learning&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Combine multiple teacher models into one small model capable of handling various tasks.&lt;/li&gt;
&lt;li&gt;Example: A single model for both translation and sentiment analysis.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. Challenges in Knowledge Distillation&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;(1) Knowledge Loss&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Small models may fail to replicate the full depth of understanding from large models, especially for complex tasks.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(2) Computational Overhead&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Generating soft targets from a teacher model can be resource-intensive when working with large datasets.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(3) Task-Specific Needs&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Different tasks require different knowledge. Adapting distilled models to specific tasks remains a research challenge.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;7. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Knowledge distillation compresses the “wisdom” of large models into smaller, efficient ones, enabling faster, cost-effective AI without sacrificing accuracy.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Knowledge distillation bridges the gap between large, powerful models and real-world deployment. By making AI both smarter and leaner, this technique is transforming applications from edge devices to real-time systems. Next time you use a quick AI assistant on your phone, think about the distilled knowledge powering it. Stay tuned for more insights tomorrow!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Weight Initialization: Unleashing AI Performance Excellence</title><link>https://geekcoding101.com/posts/weight-initialization-unleashing-ai-performance-excellence</link><guid isPermaLink="true">https://geekcoding101.com/posts/weight-initialization-unleashing-ai-performance-excellence</guid><pubDate>Mon, 16 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;./weight-initialization.jpeg&quot; alt=&quot;a diagram of a weight initialization&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Weight Initialization in AI&lt;/strong&gt; plays a crucial role in ensuring effective neural network training. It determines the starting values for connections (weights) in a model, significantly influencing training speed, stability, and overall performance. Proper weight initialization prevents issues like vanishing or exploding gradients, accelerates convergence, and helps models achieve better results. Whether you’re working with Xavier, He, or orthogonal initialization, understanding these methods is essential for building high-performance AI systems.&lt;/p&gt;
&lt;p&gt;Ugh, such a headache… sorry. Honestly, today’s chapter involves some formulas, and I feel like it’s tough to explain them clearly in such a limited space. But hey, it’s just a casual explainer piece, right? Hopefully, I can follow up with a deeper dive into the principles later on…&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;1. What Is Weight Initialization?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Weight initialization&lt;/strong&gt; is the process of assigning initial values to the weights of a neural network before training begins. These weights determine how neurons are connected and how much influence each connection has. While the values will be adjusted during training, their starting points can significantly impact the network’s ability to learn effectively.&lt;/p&gt;
&lt;p&gt;Think of weight initialization as choosing your &lt;strong&gt;starting point for a journey&lt;/strong&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A good starting point (proper initialization) puts you on the right path for a smooth trip.&lt;/li&gt;
&lt;li&gt;A bad starting point (poor initialization) may lead to delays, detours, or even getting lost altogether.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. Why Is Weight Initialization Important?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The quality of weight initialization directly affects several key aspects of model training:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Training Speed&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Poor initialization can slow down the model’s ability to learn by causing redundant or inefficient updates.&lt;/li&gt;
&lt;li&gt;Good initialization accelerates convergence, meaning the model learns faster.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Gradient Behavior&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Vanishing Gradients&lt;/strong&gt;: If weights are initialized too small, gradients shrink as they propagate backward, making it difficult for deeper layers to update.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Exploding Gradients&lt;/strong&gt;: If weights are initialized too large, gradients grow exponentially, leading to instability during training.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) Final Model Performance&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;A well-initialized network is more likely to reach a better final solution, while a poorly initialized one may get stuck in a suboptimal solution or fail to train altogether.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;3. Everyday Examples of Weight Initialization&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Example 1: The Zero Trap&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Imagine you’re training a neural network to distinguish between &quot;cats&quot; and &quot;dogs.&quot; If all weights are initialized to zero, every neuron in the network will compute the same value. The network will be incapable of learning diverse features like &quot;whiskers&quot; for cats or &quot;tail shapes&quot; for dogs. It’s like asking a group of people to vote, but everyone always gives the same answer—no progress can be made.&lt;/p&gt;
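&lt;p&gt;A tiny numerical sketch (using NumPy, with made-up values) shows the zero trap directly: with all-zero weights and a tanh hidden layer, one backpropagation step produces all-zero gradients, so the network never moves at all.&lt;/p&gt;

```python
import numpy as np

# Tiny 2-layer network; every weight starts at zero.
x = np.array([[1.0, 2.0, 3.0]])   # one input example
w1 = np.zeros((3, 4))             # hidden-layer weights
w2 = np.zeros((4, 1))             # output weights
y = np.array([[1.0]])             # target

h = np.tanh(x @ w1)               # hidden activations: all zero
out = h @ w2                      # prediction: zero
grad_out = out - y                # squared-error gradient at the output
grad_w2 = h.T @ grad_out          # zero, because h is zero
grad_h = grad_out @ w2.T          # zero, because w2 is zero
grad_w1 = x.T @ (grad_h * (1 - h ** 2))  # zero as well

# Every weight receives a zero update, so the neurons can never
# differentiate from one another -- the "zero trap".
```

&lt;p&gt;With other activation functions the gradients may be nonzero, but they are identical across neurons, which is just as bad: the symmetry never breaks, so no diverse features can be learned.&lt;/p&gt;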
&lt;h4&gt;&lt;strong&gt;Example 2: Random Chaos&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Suppose weights are initialized randomly but with values that are too large. The network becomes chaotic, like a classroom where everyone is shouting different answers at once. The gradients become uncontrollable, and learning collapses.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Example 3: The Sweet Spot&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;With proper initialization (e.g., scaled random values), the network starts off on a stable footing. It’s like giving each voter clear instructions—everyone brings unique but manageable inputs to the table, allowing the group to reach a consensus effectively.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;4. Common Weight Initialization Methods&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Here are the most widely used approaches, explained without diving into technical formulas:&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;(1) Random Initialization&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Assign random values to the weights.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pro&lt;/strong&gt;: Breaks symmetry and ensures neurons don’t learn identical features.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Con&lt;/strong&gt;: If the range of randomness is too wide or narrow, training becomes unstable or slow.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(2) Xavier Initialization&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Designed to maintain balance in gradient flow across layers.&lt;/li&gt;
&lt;li&gt;I found &lt;a href=&quot;https://365datascience.com/tutorials/machine-learning-tutorials/what-is-xavier-initialization/&quot;&gt;this article explaining Xavier initialization&lt;/a&gt; very helpful; feel free to check it out.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Best For&lt;/strong&gt;: Networks using smooth activation functions like Sigmoid or tanh.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Benefit&lt;/strong&gt;: Helps gradients propagate effectively without vanishing or exploding.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(3) He Initialization&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Specifically tailored for ReLU activation functions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why It Works&lt;/strong&gt;: ReLU zeroes out negative inputs, so roughly half the signal is lost at each layer; He initialization compensates with a larger initial variance so gradients keep flowing through the network.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Best For&lt;/strong&gt;: Deep networks with ReLU or its variants.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;(4) Orthogonal Initialization&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Starts with weights that form an orthogonal matrix.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pro&lt;/strong&gt;: Ensures independence between different directions in the weight space.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Best For&lt;/strong&gt;: Complex or very deep networks.&lt;/li&gt;
&lt;/ul&gt;
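&lt;p&gt;For readers who want to see the scales involved, here is a short NumPy sketch of the three named schemes. The layer sizes are arbitrary examples, and real frameworks offer these as built-in initializers.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot: variance 2 / (fan_in + fan_out), suited to tanh/sigmoid.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He: variance 2 / fan_in, compensating for ReLU zeroing half the inputs.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def orthogonal_init(fan_in, fan_out):
    # Orthogonal: QR decomposition of a random matrix yields orthonormal
    # columns (assumes fan_in is at least fan_out in this simple sketch).
    a = rng.normal(size=(fan_in, fan_out))
    q, _ = np.linalg.qr(a)
    return q
```

&lt;p&gt;Note how both Xavier and He shrink the initial standard deviation as layers get wider, which is exactly what keeps activations and gradients in a stable range.&lt;/p&gt;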
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;5. Practical Challenges and Optimizations&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Challenges&lt;/strong&gt;&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Needs&lt;/strong&gt;: Different network architectures and activation functions require tailored initialization methods. A one-size-fits-all approach rarely works.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deep Networks&lt;/strong&gt;: In extremely deep networks, even good initialization methods may struggle to maintain stable gradients.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Optimizations&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Activation Function Pairing&lt;/strong&gt;: Match initialization methods with the activation function. For example, He initialization works well with ReLU.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Normalization Layers&lt;/strong&gt;: Techniques like Batch Normalization or Layer Normalization can mitigate the effects of poor initialization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Manual Fine-Tuning&lt;/strong&gt;: In some cases, experimenting with the initialization range for specific layers can yield better results.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;6. One-Line Summary&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Weight initialization is the starting point for a neural network’s training journey, and proper initialization ensures the model learns efficiently, avoids gradient issues, and achieves better performance.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Weight initialization might seem like a small step in the deep learning pipeline, but it’s a critical factor for training success. The next time you train a neural network, pay close attention to your initialization strategy—it could make or break your model’s performance. Stay tuned for more AI insights, and let’s continue exploring together!&lt;/p&gt;
&lt;p&gt;At the end, please feel free to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;AI Insights blog posts here&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Quantization: How to Unlock Incredible Efficiency on AI Models</title><link>https://geekcoding101.com/posts/quantization-how-to-unlock-incredible-efficiency-on-ai-models</link><guid isPermaLink="true">https://geekcoding101.com/posts/quantization-how-to-unlock-incredible-efficiency-on-ai-models</guid><pubDate>Wed, 18 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;./genai-quantization01.png&quot; alt=&quot;Quantization illustration&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Quantization is a transformative AI optimization technique that compresses models by reducing precision from high-bit floating-point numbers (e.g., FP32) to low-bit integers (e.g., INT8). This process significantly decreases storage requirements, speeds up inference, and enables deployment on resource-constrained devices like mobile phones or IoT systems—all while retaining close-to-original performance. Let’s explore why it is essential, how it works, and its real-world applications.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Why Do AI Models Need to Be Slimmed Down?&lt;/h3&gt;
&lt;p&gt;AI models are growing exponentially in size, with models like GPT-4 containing hundreds of billions of parameters. While their performance is impressive, this scale brings challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;High Computational Costs&lt;/strong&gt;: Large models require expensive hardware like GPUs or TPUs, with significant power consumption.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slow Inference Speed&lt;/strong&gt;: Real-time applications, such as voice assistants or autonomous driving, demand fast responses that large models struggle to provide.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Deployment Constraints&lt;/strong&gt;: Limited memory and compute power on mobile or IoT devices make running large models impractical.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;The Problem&lt;/h4&gt;
&lt;p&gt;How can we preserve the capabilities of large models while making them lightweight and efficient?&lt;/p&gt;
&lt;h4&gt;The Solution&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Quantization.&lt;/strong&gt; This optimization method compresses models to improve efficiency without sacrificing much performance.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;What Is It?&lt;/h3&gt;
&lt;p&gt;It reduces the precision of AI model parameters (weights) and intermediate results (activations) from high-precision formats like FP32 to lower-precision formats like FP16 or INT8.&lt;/p&gt;
&lt;h4&gt;Simplified Analogy&lt;/h4&gt;
&lt;p&gt;It is like compressing an image:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Original Image (High Precision)&lt;/strong&gt;: High resolution, large file size, slow to load.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compressed Image (Low Precision)&lt;/strong&gt;: Smaller file size with slightly lower quality but faster and more efficient.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;How Does It Work?&lt;/h3&gt;
&lt;p&gt;The key is representing parameters and activations using fewer bits while minimizing performance loss. This involves two main steps:&lt;/p&gt;
&lt;h4&gt;1. Numerical Range Mapping&lt;/h4&gt;
&lt;p&gt;High-precision floating-point numbers are mapped to a smaller integer range.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For example, a floating-point parameter ranging from [-2.0, 2.0] is mapped to integers in [0, 255].&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;2. Float-to-Integer Conversion&lt;/h4&gt;
&lt;p&gt;Using a &lt;strong&gt;scale factor&lt;/strong&gt;, floating-point values are converted to integers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;-2.0 becomes 0.&lt;/li&gt;
&lt;li&gt;2.0 becomes 255.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Result&lt;/h4&gt;
&lt;p&gt;The model operates at a lower precision but retains the key information needed for accurate predictions.&lt;/p&gt;
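&lt;p&gt;The two steps above can be sketched in a few lines of Python. This is a simplified asymmetric (affine) scheme using the [-2.0, 2.0] to [0, 255] mapping from the example; real toolkits also calibrate the range per tensor or per channel.&lt;/p&gt;

```python
def quantize(values, lo=-2.0, hi=2.0, bits=8):
    # Map floats in [lo, hi] onto integers in [0, 2**bits - 1].
    levels = 2 ** bits - 1            # 255 for INT8
    scale = (hi - lo) / levels        # size of one quantization step
    clamped = [max(lo, min(hi, v)) for v in values]
    q = [round((v - lo) / scale) for v in clamped]
    return q, scale

def dequantize(q, scale, lo=-2.0):
    # Approximately recover the original floats; the small residual
    # difference is the quantization error.
    return [lo + qi * scale for qi in q]

q, scale = quantize([-2.0, 0.0, 2.0])   # q == [0, 128, 255]
restored = dequantize(q, scale)
```

&lt;p&gt;Round-tripping a value recovers it only to within one quantization step, which is exactly the precision traded away for the smaller representation.&lt;/p&gt;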
&lt;hr /&gt;
&lt;h3&gt;Core Processes and Methods&lt;/h3&gt;
&lt;h4&gt;1. Weight Quantization&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What It Does&lt;/strong&gt;: Converts model parameters from FP32 to INT8.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Effect&lt;/strong&gt;: Reduces storage requirements significantly but may introduce minor errors.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;2. Activation Quantization&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What It Does&lt;/strong&gt;: Quantizes intermediate computation results during inference.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Effect&lt;/strong&gt;: Further reduces compute demands but requires hardware support.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;3. Quantization-Aware Training (QAT)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What It Does&lt;/strong&gt;: Simulates quantization during training so the model can adapt to low-precision calculations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Effect&lt;/strong&gt;: Retains higher accuracy compared to post-training quantization.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;4. Dynamic Quantization&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;What It Does&lt;/strong&gt;: Dynamically quantizes activations during inference while keeping weights in high precision.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Effect&lt;/strong&gt;: Suitable for real-time applications, offering flexibility in deployment.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Learn more from this article:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://selek.tech/posts/static-vs-dynamic-quantization-in-machine-learning/&quot;&gt;Static vs Dynamic Quantization in Machine Learning&lt;/a&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Real-World Applications&lt;/h3&gt;
&lt;h4&gt;1. Voice Assistants on Mobile Devices&lt;/h4&gt;
&lt;p&gt;Voice assistants require fast responses, but large models consume too much power. By quantizing a speech recognition model, it can run locally on phones, doubling response speed and reducing power consumption by 40%.&lt;/p&gt;
&lt;h4&gt;2. Image Classification on Edge Devices&lt;/h4&gt;
&lt;p&gt;Edge devices like security cameras need to process large volumes of real-time video data. Quantizing a ResNet model from FP32 to INT8 increases inference speed by 3x while reducing memory usage by 70%.&lt;/p&gt;
&lt;h4&gt;3. Real-Time Object Detection in Autonomous Vehicles&lt;/h4&gt;
&lt;p&gt;Autonomous vehicles require high-accuracy, low-latency object detection. Using quantization-aware training, models maintain precision while accelerating processing speeds, enabling faster responses to sudden obstacles.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Limitations&lt;/h3&gt;
&lt;p&gt;Despite its benefits, quantization has some limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Accuracy Loss&lt;/strong&gt;: Low precision can introduce quantization errors, which affect performance in high-accuracy tasks like medical diagnostics.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hardware Dependency&lt;/strong&gt;: Efficient quantized operations require hardware that supports low-precision calculations, such as INT8-compatible devices.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited Scope&lt;/strong&gt;: Adapting quantized models to complex or multimodal tasks remains a challenge.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;The Future&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Mixed Precision Computing&lt;/strong&gt;: Combining low-precision (e.g., INT8) and high-precision (e.g., FP16/FP32) operations to balance performance and accuracy.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved Quantization-Aware Training&lt;/strong&gt;: Enhancing training methods to automatically optimize weight distributions during quantization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Specialized Hardware Support&lt;/strong&gt;: Designing chips optimized for ultra-low precision calculations (e.g., INT4, INT2) to further reduce energy consumption.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h3&gt;One-Line Summary&lt;/h3&gt;
&lt;p&gt;Quantization enables AI models to transition from “high precision” to “high efficiency,” making them lightweight yet powerful—an essential tool for modern AI.&lt;/p&gt;
&lt;p&gt;At the end, you&apos;re welcome to check out my other &lt;a href=&quot;https://geekcoding101.com/tags/daily-ai-insights&quot;&gt;AI Insights blog posts here&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Ray Serve: The Versatile Assistant for Model Serving</title><link>https://geekcoding101.com/posts/ray-serve-the-versatile-assistant-for-model-serving</link><guid isPermaLink="true">https://geekcoding101.com/posts/ray-serve-the-versatile-assistant-for-model-serving</guid><pubDate>Fri, 20 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;a href=&quot;https://docs.ray.io/en/latest/index.html&quot;&gt;Ray Serve&lt;/a&gt; is a cutting-edge model serving library built on the Ray framework, designed to simplify and scale AI model deployment. Whether you’re chaining models in sequence, running them in parallel, or dynamically routing requests, Ray Serve excels at handling complex, distributed inference pipelines. Unlike Ollama or FastAPI, it combines ease of use with powerful scaling, multi-model management, and Pythonic APIs. In this post, we’ll explore how Ray Serve compares to other solutions and why it stands out for large-scale, multi-node AI serving.&lt;/p&gt;
&lt;h3&gt;&lt;strong&gt;Before Introducing Ray Serve, We Need to Understand Ray&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;What is Ray?&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Ray is an open-source distributed computing framework that provides the core tools and components for building and running distributed applications. Its goal is to enable developers to easily scale single-machine programs to distributed environments, supporting high-performance tasks such as distributed model training, large-scale data processing, and distributed inference.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Core Modules of Ray&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Ray Core&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The foundation of Ray, providing distributed scheduling, task execution, and resource management.&lt;/li&gt;
&lt;li&gt;Allows Python functions to be seamlessly transformed into distributed tasks using the &lt;code&gt;@ray.remote&lt;/code&gt; decorator.&lt;/li&gt;
&lt;li&gt;Ideal for distributed data processing and computation-intensive workloads.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ray Libraries&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Built on top of Ray Core, these are specialized tools designed for specific tasks. Examples include:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Ray Tune&lt;/strong&gt;: For hyperparameter search and experiment optimization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ray Train&lt;/strong&gt;: For distributed model training.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ray Serve&lt;/strong&gt;: For distributed model serving.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ray Data&lt;/strong&gt;: For large-scale data and stream processing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In simpler terms, &lt;strong&gt;Ray Core&lt;/strong&gt; is the underlying engine, while the various tools (like Ray Serve) are specific modules built on top of it to handle specific functionalities.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Now Let’s Talk About Ray Serve...&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Many people ask: “Is Ray Serve just a backend service that routes user requests to an LLM (Large Language Model) and returns the results?”&lt;/p&gt;
&lt;p&gt;You’re half right! Ray Serve does exactly that, but it’s &lt;strong&gt;much more than just a “delivery boy.”&lt;/strong&gt; Compared to a basic FastAPI backend or a dedicated tool like Ollama, Ray Serve is a &lt;strong&gt;flexible, capable, and self-scaling assistant&lt;/strong&gt; that handles much more than just routing.&lt;/p&gt;
&lt;p&gt;Let’s dive in and break down what Ray Serve does, and how it compares to Ollama or a custom-built FastAPI solution.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Ray Serve: The Versatile Multi-Tasker of Model Serving&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;In short, &lt;strong&gt;Ray Serve’s mission is to:&lt;/strong&gt; “Handle user requests, route them to the right model for processing, optimize resources, and dynamically scale as needed.”&lt;/p&gt;
&lt;p&gt;It’s like a &lt;strong&gt;supercharged scheduler&lt;/strong&gt; that performs the following key tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Setting up model services&lt;/strong&gt;: You tell it where your model is (e.g., a GPT-4 instance), and it will automatically handle receiving requests, sending inference tasks, and even batching requests.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Managing traffic spikes&lt;/strong&gt;: When user requests flood in like a tidal wave, it dynamically scales instances to handle the pressure.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Supporting multiple models&lt;/strong&gt;: With Ray Serve, you can host multiple models in a single service (e.g., one for text generation and another for spam classification) without any issues.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So no, Ray Serve isn’t just “doing the grunt work”—it also adjusts the architecture, adds new resources, and patches itself when needed.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Application Patterns of Ray Serve: Adapting to Multi-Model and Multi-Step Inference&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The diagrams in this section are from &lt;a href=&quot;https://www.anyscale.com/glossary/what-is-ray-serve&quot;&gt;https://www.anyscale.com/glossary/what-is-ray-serve&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In modern AI systems, multi-model and multi-step inference has become a common requirement. Whether you’re processing images, text, or multi-modal inputs, model services need to support &lt;strong&gt;flexible inference patterns&lt;/strong&gt;. Ray Serve excels here by seamlessly adapting to the following three classic patterns, offering simple and efficient Pythonic APIs to minimize configuration complexity.&lt;/p&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Pattern 1: Sequential Model Inference (Chaining Models in Sequence)&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;./ray-serve-pattern1.jpg&quot; alt=&quot;Ray Serve Pattern 01&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; In this pattern, user input passes through multiple models sequentially, with each model’s output serving as the input for the next. This chained structure is common in tasks like image processing or data transformations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; For an image enhancement task, the input might go through a denoising model (Model_1), followed by a feature extraction model (Model_2), and finally a classification model (Model_3).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advantages of Ray Serve:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Efficient Communication&lt;/strong&gt;: Data is passed between models using shared memory, reducing overhead.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible Scheduling&lt;/strong&gt;: Resources are dynamically allocated to ensure stable and efficient inference pipelines.&lt;/li&gt;
&lt;/ul&gt;
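&lt;p&gt;The chaining logic itself is simple; in Ray Serve each stage would be wrapped in its own deployment, but the plain-Python sketch below (with hypothetical placeholder functions standing in for Model_1 through Model_3) shows the data flow.&lt;/p&gt;

```python
# Placeholder "models" standing in for Model_1..Model_3 in the diagram.
def denoise(image):
    return {"pixels": image["pixels"], "denoised": True}

def extract_features(image):
    return {"features": [len(image["pixels"])], "denoised": image["denoised"]}

def classify(features):
    # Trivial stand-in classifier for illustration.
    return "cat" if features["features"][0] % 2 == 0 else "dog"

def pipeline(image):
    # Each stage's output feeds the next stage, exactly as in the diagram.
    return classify(extract_features(denoise(image)))

result = pipeline({"pixels": [0.1, 0.2, 0.3, 0.4]})
```

&lt;p&gt;Ray Serve’s contribution is not the chaining itself but running each stage as an independently scaled deployment, with data passed efficiently between them.&lt;/p&gt;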
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Pattern 2: Parallel Model Inference (Ensembling Models)&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;./ray-serve-pattern2.jpg&quot; alt=&quot;Ray Serve Pattern 02&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; In this pattern, user input is sent to multiple models simultaneously, with each model processing the request independently. The results are then aggregated by an ensemble step to produce the final output. This pattern is often used in recommendation systems or ensemble learning, where outputs from multiple models are combined for decision-making.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A recommendation system might use collaborative filtering (Model_1), a deep learning model (Model_2), and a rule-based model (Model_3) to make predictions, then select the best recommendation based on business logic.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advantages of Ray Serve:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Flexible Routing Mechanism&lt;/strong&gt;: Easily configure multiple model endpoints for parallel processing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High-Concurrency Handling&lt;/strong&gt;: Ray’s distributed architecture efficiently manages high-load scenarios with multiple models.&lt;/li&gt;
&lt;/ul&gt;
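&lt;p&gt;The fan-out-and-aggregate logic can likewise be sketched in plain Python (the three recommenders and their scores below are hypothetical; Ray Serve would run them as parallel deployments rather than sequential calls):&lt;/p&gt;

```python
# Three placeholder recommenders standing in for Model_1..Model_3.
def collaborative_filtering(user):
    return {"item": "book", "score": 0.7}

def deep_model(user):
    return {"item": "movie", "score": 0.9}

def rule_based(user):
    return {"item": "book", "score": 0.6}

def ensemble(user):
    # Fan the request out to every model, then aggregate: here we simply
    # keep the highest-scoring prediction (the "business logic" step).
    models = (collaborative_filtering, deep_model, rule_based)
    predictions = [m(user) for m in models]
    return max(predictions, key=lambda p: p["score"])

best = ensemble({"id": 42})
```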
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Pattern 3: Dynamic Model Dispatching (Dynamic Dispatching to Models)&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;./ray-serve-pattern3.jpg&quot; alt=&quot;Ray Serve Pattern 03&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; Here, models are dynamically selected based on the input’s characteristics, ensuring that only the necessary models are triggered for inference. This is ideal for scenarios with complex classification tasks or diverse model types.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; In an image classification system, depending on the input image (e.g., a fruit, car, or plant), a specialized model is dynamically chosen for inference instead of invoking every model in the pipeline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advantages of Ray Serve:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Resource Efficiency&lt;/strong&gt;: Only the required models are triggered, avoiding unnecessary computation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flexible Business Logic&lt;/strong&gt;: Dynamic routing rules can be easily defined with simple Python code, eliminating the need for complex YAML configurations.&lt;/li&gt;
&lt;/ul&gt;
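&lt;p&gt;At its core, dynamic dispatching is just a routing table consulted per request. The hypothetical sketch below shows the idea in plain Python; in Ray Serve the routing code would live in an ingress deployment and each specialized model would scale independently.&lt;/p&gt;

```python
# Placeholder specialized models keyed by input category.
def fruit_model(image):
    return "apple"

def car_model(image):
    return "sedan"

def plant_model(image):
    return "fern"

ROUTES = {"fruit": fruit_model, "car": car_model, "plant": plant_model}

def classify_category(image):
    # Stand-in for a cheap upstream classifier that inspects the input.
    return image["category"]

def dispatch(image):
    # Only the one relevant model runs; the others are never invoked.
    model = ROUTES[classify_category(image)]
    return model(image)

label = dispatch({"category": "car", "pixels": []})
```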
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Unique Advantages of Ray Serve&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Compared to other model-serving frameworks like &lt;strong&gt;TensorFlow Serving&lt;/strong&gt; or &lt;strong&gt;NVIDIA Triton&lt;/strong&gt;, Ray Serve offers unique advantages for multi-step and multi-model inference scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Scheduling&lt;/strong&gt;: Adjust resources and routing strategies based on workload requirements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Efficient Communication&lt;/strong&gt;: Optimize data transfer between models using shared memory to reduce overhead.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Granular Resource Allocation&lt;/strong&gt;: Assign fractional CPU or GPU resources to model instances, improving utilization.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pythonic API&lt;/strong&gt;: Simplify implementation with intuitive Python interfaces, avoiding complex YAML setups.&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Ollama vs. Ray Serve vs. Custom FastAPI: A Comparison&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;1. Ollama: The Lightweight Assistant for LLMs&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Ollama is designed to quickly set up local LLM services like LLaMA or other open-source models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Plug-and-Play Simplicity&lt;/strong&gt;: Minimal configuration required.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLM-Focused&lt;/strong&gt;: Optimized for large language models with offline deployment support.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Weaknesses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Limited Flexibility&lt;/strong&gt;: Restricted to LLMs and lacks support for multi-model management.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability Concerns&lt;/strong&gt;: Not ideal for high-concurrency or distributed deployments.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;2. Custom FastAPI: The DIY Player for Enthusiasts&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;FastAPI is a flexible web framework for building lightweight APIs, including ones that interface with backend models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Full Customization&lt;/strong&gt;: You have complete control over the logic and routing.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lightweight&lt;/strong&gt;: Ideal for small-scale projects.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Weaknesses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Manual Scaling&lt;/strong&gt;: Requires hand-crafted solutions for scaling and multi-model management.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Complex Distributed Deployments&lt;/strong&gt;: Needs additional tools like Kubernetes for distributed setups.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;3. Ray Serve: The Smart Manager&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Ray Serve combines Ollama’s simplicity with FastAPI’s flexibility, adding powerful distributed capabilities.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multi-Model Support&lt;/strong&gt;: Host multiple models simultaneously.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic Scaling&lt;/strong&gt;: Automatically adjust resources based on traffic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Distributed Deployment&lt;/strong&gt;: Handles multi-node clusters effortlessly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Batching Optimization&lt;/strong&gt;: Combines multiple requests for efficient processing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Weaknesses:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Learning Curve&lt;/strong&gt;: Configuration is more complex than FastAPI.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ray Dependency&lt;/strong&gt;: May feel like overkill for single-node setups.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Choosing the Right Tool&lt;/strong&gt;&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature/Framework&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Ray Serve&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;FastAPI&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-Model Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Weak&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Distributed Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dynamic Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Learning Curve&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large-scale distributed projects, complex model serving&lt;/td&gt;
&lt;td&gt;Quick local LLM deployment&lt;/td&gt;
&lt;td&gt;Small projects, API customization&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;If you just want a quick, local LLM deployment, go with &lt;strong&gt;Ollama&lt;/strong&gt;. For flexible API development, &lt;strong&gt;FastAPI&lt;/strong&gt; is your best choice. If you need multi-model management, dynamic scaling, or distributed deployment, &lt;strong&gt;Ray Serve&lt;/strong&gt; is the ultimate solution.&lt;/p&gt;
&lt;p&gt;Ray Serve acts as the &quot;smart manager&quot; of backend services, effortlessly handling both single-node and multi-node deployments. Stay tuned for a deeper dive into how Ray Serve dynamically adjusts resources based on traffic!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Groundbreaking News: OpenAI Unveils o3 and o3 Mini with Stunning ARC-AGI Performance</title><link>https://geekcoding101.com/posts/groundbreaking-news-openai-unveils-o3-and-o3-mini-with-stunning-arc-agi-performance</link><guid isPermaLink="true">https://geekcoding101.com/posts/groundbreaking-news-openai-unveils-o3-and-o3-mini-with-stunning-arc-agi-performance</guid><pubDate>Sat, 21 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;On December 20, 2024, &lt;a href=&quot;https://openai.com/12-days/&quot;&gt;OpenAI concluded its 12-day &quot;OpenAI Christmas Gifts&quot; campaign&lt;/a&gt; by revealing two groundbreaking models: &lt;strong&gt;o3 and o3 mini&lt;/strong&gt;. At the same time, &lt;a href=&quot;https://arcprize.org/&quot;&gt;the ARC Prize organization&lt;/a&gt; announced OpenAI&apos;s remarkable performance on the ARC-AGI benchmark. The o3 system scored a &lt;strong&gt;breakthrough 75.7% on the Semi-Private Evaluation Set&lt;/strong&gt;, with a staggering &lt;strong&gt;87.5% in high-compute mode&lt;/strong&gt; (using 172x compute resources). This achievement marks an unprecedented leap in AI&apos;s ability to adapt to novel tasks, setting a new milestone in generative AI development.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./12days.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;The o3 Series: From Innovation to Breakthrough&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;./day12-live.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;OpenAI CEO Sam Altman had hinted that this release would feature “big updates” and some “stocking stuffers.” The o3 series clearly falls into the former category. Both &lt;strong&gt;o3&lt;/strong&gt; and &lt;strong&gt;o3 mini&lt;/strong&gt; represent a pioneering step towards 2025, showcasing exceptional reasoning capabilities and redefining the possibilities of AI systems.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;ARC-AGI Performance: A Milestone Achievement for o3&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;The o3 system demonstrated its capabilities on the ARC-AGI benchmark, achieving &lt;strong&gt;75.7% in efficient mode&lt;/strong&gt; and &lt;strong&gt;87.5% in high-compute mode&lt;/strong&gt;. These scores represent a major leap in AI&apos;s ability to generalize and adapt to novel tasks, far surpassing previous generative AI models.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./scores-arc-agi.jpg&quot; alt=&quot;oai-o3-arc-agi-score&quot; /&gt; From https://arcprize.org/blog/oai-o3-pub-breakthrough&lt;/p&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;What is ARC-AGI?&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;ARC-AGI&lt;/strong&gt; (Abstraction and Reasoning Corpus for Artificial General Intelligence) is a benchmark specifically designed to test AI&apos;s adaptability and generalization. Its tasks are uniquely crafted:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Simple for humans&lt;/strong&gt;: Tasks like logical reasoning and problem-solving.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Challenging for AI&lt;/strong&gt;: Especially when models haven’t been explicitly trained on similar data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;o3’s performance highlights a significant improvement in tackling new tasks, with its high-compute configuration setting a new standard at 87.5%.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;How o3 Outshines Traditional LLMs: From Memory to Program Synthesis&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Traditional GPT models rely on &quot;memorization&quot;: learning and executing predefined programs based on massive training data. However, this approach struggles with novel tasks due to its inability to dynamically recombine knowledge or generate new &quot;programs.&quot;&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;o3&apos;s Core Innovation: Dynamic Knowledge Recombination&lt;/strong&gt;&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Program Search and Execution&lt;/strong&gt; o3 generates natural language &quot;programs&quot; (such as &lt;strong&gt;Chains of Thought, CoT&lt;/strong&gt;) to solve tasks and executes them internally.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Evaluation and Refinement&lt;/strong&gt; Using techniques similar to &lt;strong&gt;Monte-Carlo Tree Search (MCTS)&lt;/strong&gt;, o3 dynamically evaluates program paths and selects optimal solutions.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;While this process is compute-intensive (requiring millions of tokens and significant costs per task), it dramatically enhances AI’s adaptability to new challenges.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Efficiency vs. Cost: Balancing o3’s Performance&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Despite its remarkable performance, o3’s high-compute mode comes with significant costs. According to ARC Prize data:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Efficient mode&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Cost per task: ~$20&lt;/li&gt;
&lt;li&gt;Semi-Private Eval score: 75.7%&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High-compute mode&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;Uses 172x the resources of the efficient mode.&lt;/li&gt;
&lt;li&gt;Achieves 87.5%, but with a much higher cost.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
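&lt;p&gt;As a back-of-the-envelope estimate, and assuming cost scales roughly linearly with compute (an assumption on my part, not an ARC Prize figure), the high-compute mode would run about 172 times the ~$20 efficient-mode price per task:&lt;/p&gt;

```python
# Rough cost estimate for o3's high-compute mode, assuming cost scales
# linearly with compute. ARC Prize reported ~$20 per task in efficient
# mode and a 172x compute multiplier for high-compute mode.
efficient_cost_per_task = 20      # USD, approximate
compute_multiplier = 172

high_compute_cost_per_task = efficient_cost_per_task * compute_multiplier
print(high_compute_cost_per_task)  # 3440
```

That is, on the order of thousands of dollars per task, which is why the cost-performance ratio is flagged as a challenge above.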
&lt;p&gt;While the current cost-performance ratio remains a challenge, advancements in optimization and hardware are expected to reduce costs in the coming months.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;What Makes o3 a Groundbreaking Leap?&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;1. Task Adaptability&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;o3 dynamically generates and executes task-specific natural language programs, moving beyond the static “memorization” paradigm of previous generative AI models.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;2. Generalization&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Compared to the GPT series, o3 demonstrates near-human generalization capabilities, especially on benchmarks like ARC-AGI.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;3. Architectural Innovation&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;o3’s success underscores the critical role of architecture in advancing AI capabilities. Simply scaling GPT-4 or similar models would not achieve comparable results.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Is o3 AGI?&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;While o3’s performance is extraordinary, it has not yet reached the level of Artificial General Intelligence (AGI). Key limitations include:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Failures on Simple Tasks&lt;/strong&gt; Even in high-compute mode, o3 struggles with some straightforward tasks, revealing gaps in fundamental reasoning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Challenges with ARC-AGI-2&lt;/strong&gt; Preliminary tests suggest that o3 might score below 30% on the upcoming ARC-AGI-2 benchmark, while average humans score over 95%.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;These challenges highlight that while o3 is a significant milestone, it remains a step on the path to true AGI.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Looking Ahead: The Future of o3 and AGI&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;1. Open-Source Collaboration&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The ARC Prize initiative plans to launch the more challenging ARC-AGI-2 benchmark in 2025, encouraging researchers to build on o3’s success through open-source analysis and optimization.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;2. Expanding Capabilities&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Further analysis of o3 will help identify its mechanisms, performance bottlenecks, and potential for future advancements.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;3. Advancing Benchmarks&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The ARC Prize Foundation is developing third-generation benchmarks to push the boundaries of AI systems’ adaptability and generalization.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Conclusion: The Significance of o3&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;OpenAI’s o3 model represents a groundbreaking leap in generative AI, pushing the boundaries of task adaptability and dynamic knowledge recombination. By overcoming the limitations of traditional LLMs, o3 opens new avenues for addressing novel challenges.&lt;/p&gt;
&lt;p&gt;This is only the beginning. With new benchmarks and collaborative research on the horizon, o3 sets the stage for further progress towards AGI. As we look ahead to 2025, the future of AI promises even greater possibilities.&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; The content above includes contributions generated with the assistance of AI tools.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Diving into &quot;Attention is All You Need&quot;: My Transformer Journey Begins!</title><link>https://geekcoding101.com/posts/diving-into-attention-is-all-you-need-my-transformer-journey-begins</link><guid isPermaLink="true">https://geekcoding101.com/posts/diving-into-attention-is-all-you-need-my-transformer-journey-begins</guid><pubDate>Sat, 28 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Today marks the beginning of my adventure into one of the most groundbreaking AI papers on the Transformer: &lt;strong&gt;&quot;Attention is All You Need&quot;&lt;/strong&gt; by Vaswani et al. If you’ve ever been curious about how modern language models like GPT or BERT work, this is where it all started. It’s like diving into the DNA of &lt;strong&gt;transformers&lt;/strong&gt; — the core architecture behind many AI marvels today.&lt;/p&gt;
&lt;p&gt;What I’ve learned so far has completely blown my mind, so let’s break it down step by step. I’ll keep it fun, insightful, and bite-sized so you can learn alongside me! From today, I plan to study one or two pages of this paper daily and share my learning highlights right here.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Day 1: The Abstract&lt;/h3&gt;
&lt;p&gt;The abstract of &lt;strong&gt;&quot;Attention is All You Need&quot;&lt;/strong&gt; sets the stage for the paper’s groundbreaking contributions. Here’s what I’ve uncovered today about the &lt;strong&gt;Transformer architecture&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The Problem with Traditional Models:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Most traditional sequence models rely on &lt;strong&gt;Recurrent Neural Networks (RNNs)&lt;/strong&gt; or &lt;strong&gt;Convolutional Neural Networks (CNNs)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;These models have limitations:
&lt;ul&gt;
&lt;li&gt;RNNs are slow due to sequential processing and lack parallelization.&lt;/li&gt;
&lt;li&gt;CNNs struggle to capture long-range dependencies effectively.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transformer’s Proposal:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The paper introduces the &lt;strong&gt;Transformer&lt;/strong&gt;, a new architecture that uses only &lt;strong&gt;Attention Mechanisms&lt;/strong&gt; while completely removing recurrence and convolution. This approach makes &lt;strong&gt;transformers&lt;/strong&gt; faster and more efficient.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Experimental Results:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;On &lt;strong&gt;WMT 2014 English-German translation&lt;/strong&gt;, the Transformer achieves a BLEU score of 28.4, surpassing previous models by over 2 BLEU points. WMT (Workshop on Machine Translation) is a benchmark competition for translation models, and this task involves translating English text into German.&lt;/li&gt;
&lt;li&gt;On &lt;strong&gt;WMT 2014 English-French translation&lt;/strong&gt;, it achieves a state-of-the-art BLEU score of 41.8 with significantly lower training costs. This task involves translating English text into French.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;What is BLEU?&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;BLEU (Bilingual Evaluation Understudy) is a metric used to evaluate the quality of machine translations. It measures how closely the machine-generated translation matches human reference translations. Scores range from 0 to 100, with higher scores indicating better performance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Generalization to Other Tasks:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Transformer model&lt;/strong&gt; is not just limited to translation. The paper demonstrates its effectiveness in &lt;strong&gt;English constituency parsing&lt;/strong&gt;, even with limited training data.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
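&lt;p&gt;To see what &quot;matching human reference translations&quot; means in practice, here is a toy sketch of BLEU&apos;s modified unigram precision in plain Python. Real BLEU also combines 2- to 4-gram precisions and applies a brevity penalty, so treat this as a deliberately simplified illustration:&lt;/p&gt;

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Toy version of BLEU's modified unigram precision.

    Counts how many candidate words also appear in the reference,
    clipping each word's count to its count in the reference. Real BLEU
    extends this to 2- to 4-grams and adds a brevity penalty.
    """
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    overlap = sum(min(count, ref_counts[word])
                  for word, count in cand_counts.items())
    return overlap / sum(cand_counts.values())

score = unigram_precision("the cat sat on the mat",
                          "the cat is on the mat")
print(round(score, 3))  # 0.833 -- 5 of 6 candidate words match
```

A perfect match scores 1.0; an unrelated sentence scores near 0, which mirrors how the 0-to-100 BLEU scale behaves.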
&lt;hr /&gt;
&lt;h3&gt;Why Transformers Matter&lt;/h3&gt;
&lt;p&gt;Transformers are everywhere now. From powering tools like &lt;strong&gt;Google Translate&lt;/strong&gt; to enabling cutting-edge models like &lt;strong&gt;GPT&lt;/strong&gt;, the ideas in this paper are the foundation of modern AI. Learning about &lt;strong&gt;transformers&lt;/strong&gt; feels like discovering the blueprint of an advanced technology that’s reshaping the world.&lt;/p&gt;
&lt;p&gt;What’s next for me? Tomorrow, I’ll dive into the introduction and explore why &lt;strong&gt;attention mechanisms&lt;/strong&gt; are such a powerful concept within the Transformer architecture.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Your Takeaway&lt;/h3&gt;
&lt;p&gt;If you’ve been putting off reading this paper, join me! It’s surprisingly approachable once you break it down into smaller concepts. Stay tuned for more updates on my journey, and let’s explore the world of &lt;strong&gt;transformers&lt;/strong&gt; together. Spoiler: it’s insanely cool!&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;I was struggling with when to use &quot;Transformers&quot; or &quot;Transformer&quot;; here is the explanation from ChatGPT:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Singular &lt;strong&gt;Transformer&lt;/strong&gt; is used correctly when talking about the architecture itself.&lt;/li&gt;
&lt;li&gt;Plural &lt;strong&gt;Transformers&lt;/strong&gt; is used correctly when discussing broader applications.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Stay curious, stay excited. Let the learning adventure begin! 🚀&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; The content above includes contributions generated with the assistance of AI tools.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Terms Used in &quot;Attention is All You Need&quot;</title><link>https://geekcoding101.com/posts/terms-used-in-attention-is-all-you-need</link><guid isPermaLink="true">https://geekcoding101.com/posts/terms-used-in-attention-is-all-you-need</guid><pubDate>Sat, 28 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;./attention-is-all-you-need-term.png&quot; alt=&quot;attention is all you need term&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Below is a comprehensive table of key terms used in the paper &quot;Attention is All You Need,&quot; along with their English and Chinese translations. Where applicable, links to external resources are provided for further reading.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;English Term&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Chinese Translation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Explanation&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Link&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Encoder&lt;/td&gt;
&lt;td&gt;编码器&lt;/td&gt;
&lt;td&gt;The component that processes input sequences.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Decoder&lt;/td&gt;
&lt;td&gt;解码器&lt;/td&gt;
&lt;td&gt;The component that generates output sequences.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attention Mechanism&lt;/td&gt;
&lt;td&gt;注意力机制&lt;/td&gt;
&lt;td&gt;Measures relationships between sequence elements.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Attention_mechanism&quot;&gt;Attention Mechanism Explained&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-Attention&lt;/td&gt;
&lt;td&gt;自注意力&lt;/td&gt;
&lt;td&gt;Focuses on dependencies within a single sequence.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Masked Self-Attention&lt;/td&gt;
&lt;td&gt;掩码自注意力&lt;/td&gt;
&lt;td&gt;Prevents the decoder from seeing future tokens.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-Head Attention&lt;/td&gt;
&lt;td&gt;多头注意力&lt;/td&gt;
&lt;td&gt;Combines multiple attention layers for better modeling.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Positional Encoding&lt;/td&gt;
&lt;td&gt;位置编码&lt;/td&gt;
&lt;td&gt;Adds positional information to embeddings.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Residual Connection&lt;/td&gt;
&lt;td&gt;残差连接&lt;/td&gt;
&lt;td&gt;Shortcut connections to improve gradient flow.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Layer Normalization&lt;/td&gt;
&lt;td&gt;层归一化&lt;/td&gt;
&lt;td&gt;Stabilizes training by normalizing inputs.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Layer_normalization&quot;&gt;Layer Normalization Details&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Feed-Forward Neural Network (FFNN)&lt;/td&gt;
&lt;td&gt;前馈神经网络&lt;/td&gt;
&lt;td&gt;Processes data independently of sequence order.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://web.stanford.edu/class/cs224n/&quot;&gt;Feed-Forward Networks in NLP&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recurrent Neural Network (RNN)&lt;/td&gt;
&lt;td&gt;循环神经网络&lt;/td&gt;
&lt;td&gt;Processes sequences step-by-step, maintaining state.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Recurrent_neural_network&quot;&gt;RNN Basics&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Convolutional Neural Network (CNN)&lt;/td&gt;
&lt;td&gt;卷积神经网络&lt;/td&gt;
&lt;td&gt;Uses convolutions to extract features from input data.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Convolutional_neural_network&quot;&gt;CNN Overview&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallelization&lt;/td&gt;
&lt;td&gt;并行化&lt;/td&gt;
&lt;td&gt;Performing multiple computations simultaneously.&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BLEU (Bilingual Evaluation Understudy)&lt;/td&gt;
&lt;td&gt;双语评估替代&lt;/td&gt;
&lt;td&gt;A metric for evaluating the accuracy of translations.&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/BLEU&quot;&gt;Understanding BLEU&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This table provides a solid foundation for understanding the technical terms used in the &quot;Attention is All You Need&quot; paper. If you have questions or want to dive deeper into any term, the linked resources are a great place to start!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Transformers Demystified - Day 2 - Unlocking the Genius of Self-Attention and AI&apos;s Greatest Breakthrough</title><link>https://geekcoding101.com/posts/transformers-demystified-day-2-unlocking-the-genius-of-self-attention-and-ais-greatest-breakthrough</link><guid isPermaLink="true">https://geekcoding101.com/posts/transformers-demystified-day-2-unlocking-the-genius-of-self-attention-and-ais-greatest-breakthrough</guid><pubDate>Mon, 30 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Transformers are changing the AI landscape, and it all began with the groundbreaking paper &quot;Attention is All You Need.&quot; Today, I explore the &lt;strong&gt;Introduction&lt;/strong&gt; and &lt;strong&gt;Background&lt;/strong&gt; sections of the paper, uncovering the limitations of traditional RNNs, the power of self-attention, and the importance of parallelization in modern AI models. Dive in to learn how Transformers revolutionized sequence modeling and transduction tasks!&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;I’ve embarked on an exciting journey to thoroughly understand the groundbreaking paper &lt;strong&gt;“Attention is All You Need.”&lt;/strong&gt; My approach is simple but thorough: each day, I focus on a specific section of the paper, breaking it down line by line to grasp every concept, idea, and nuance. Along the way, I simplify technical terms, explore references, and explain math concepts in an accessible manner. I also supplement my learning with further readings and analogies to make even the most complex topics easy to understand. This step-by-step method ensures that I not only learn but truly internalize the foundations of &lt;strong&gt;Transformers&lt;/strong&gt;, setting the stage for more advanced explorations. If you’re curious about Transformers or modern AI, join me as I unravel this revolutionary model one day at a time!&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;h1&gt;&lt;strong&gt;1. Introduction&lt;/strong&gt;&lt;/h1&gt;
&lt;h2&gt;&lt;strong&gt;Sentence 1:&lt;/strong&gt;&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state-of-the-art approaches in sequence modeling and transduction problems such as language modeling and machine translation [35, 2, 5].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation (like for an elementary school student):&lt;/strong&gt; There are special types of AI models called &lt;strong&gt;Recurrent Neural Networks (RNNs)&lt;/strong&gt; that are like people who can remember things from the past while working on something new.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Long Short-Term Memory (LSTM)&lt;/strong&gt; and &lt;strong&gt;Gated Recurrent Units (GRUs)&lt;/strong&gt; are improved versions of RNNs.&lt;/li&gt;
&lt;li&gt;These models are the &lt;strong&gt;best performers&lt;/strong&gt; (state-of-the-art) for tasks where you need to process sequences, like predicting the next word in a sentence (&lt;strong&gt;language modeling&lt;/strong&gt;) or translating text from one language to another (&lt;strong&gt;machine translation&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Key terms explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Recurrent Neural Networks (RNNs):&lt;/strong&gt; Models designed to handle sequential data (like sentences, time series).
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Analogy:&lt;/em&gt; Imagine reading a book where each sentence depends on the one before it. An RNN processes the book one sentence at a time, remembering earlier ones.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt; &lt;a href=&quot;https://en.wikipedia.org/wiki/Recurrent_neural_network&quot;&gt;RNNs on Wikipedia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Long Short-Term Memory (LSTM):&lt;/strong&gt; A type of RNN that solves the problem of forgetting important past information.
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Analogy:&lt;/em&gt; LSTMs are like a memory-keeper that knows what’s important to remember and what to forget.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt; &lt;a href=&quot;https://en.wikipedia.org/wiki/Long_short-term_memory&quot;&gt;LSTM on Wikipedia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gated Recurrent Units (GRUs):&lt;/strong&gt; A simpler version of LSTM, with fewer memory-related functions.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt; &lt;a href=&quot;https://en.wikipedia.org/wiki/Gated_recurrent_unit&quot;&gt;GRU Details&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sequence Modeling and Transduction:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Sequence Modeling:&lt;/strong&gt; Tasks like predicting the next word in a sentence.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sequence Transduction:&lt;/strong&gt; Tasks like translating sentences into another language or converting text to speech.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1511.06114&quot;&gt;Sequence Transduction Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;References explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[13]&lt;/strong&gt; Hochreiter &amp;amp; Schmidhuber (1997): Introduced LSTMs.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; LSTM Original Paper&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[7]&lt;/strong&gt; Chung et al. (2014): Evaluated GRUs.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1412.3555&quot;&gt;GRU Evaluation Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[35, 2, 5]:&lt;/strong&gt; Machine translation and language modeling using RNNs.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 2:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [38, 24, 15].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; Over time, researchers have been working hard to make RNNs even better. They focused on:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Recurrent language models:&lt;/strong&gt; Making RNNs predict words more accurately.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Encoder-Decoder architectures:&lt;/strong&gt; A setup where one model (encoder) processes the input, and another model (decoder) generates the output (like translation).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Key terms explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Encoder-Decoder Architecture:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The encoder compresses the input into a smaller representation (like summarizing).&lt;/li&gt;
&lt;li&gt;The decoder uses this compressed information to generate the output.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Analogy:&lt;/em&gt; Like translating English to French — first understanding the English text, then generating the French version.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt; Encoder-Decoder Overview&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;References explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[38] Wu et al. (2016):&lt;/strong&gt; Explored Google’s Neural Machine Translation (GNMT) using encoder-decoder architectures.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1609.08144&quot;&gt;GNMT Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[24] Luong et al. (2015):&lt;/strong&gt; Studied effective approaches to attention in neural machine translation.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1508.04025&quot;&gt;Luong Attention Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[15] Jozefowicz et al. (2016):&lt;/strong&gt; Studied language modeling limits.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1602.02410&quot;&gt;Language Model Study&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 3:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Recurrent models typically factor computation along the symbol positions of the input and output sequences.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; RNNs handle input/output one step at a time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Input symbols:&lt;/em&gt; Letters, words, or parts of words in a sentence.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Factor computation:&lt;/em&gt; RNNs calculate each part of the sequence (e.g., one word) in a fixed order.&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 4:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Aligning the positions to steps in computation time, they generate a sequence of hidden states $h_t$, as a function of the previous hidden state $h_{t-1}$ and the input for position $t$.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; RNNs have a hidden memory state ($h_t$) that stores what they have learned so far:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;For each position ($t$):
&lt;ul&gt;
&lt;li&gt;Use the previous memory ($h_{t-1}$).&lt;/li&gt;
&lt;li&gt;Add the new input information for position $t$.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Math Representation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;$$h_t = f(h_{t-1}, x_t)$$&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;$h_t$: Hidden state at time $t$.&lt;/li&gt;
&lt;li&gt;$h_{t-1}$: Previous hidden state.&lt;/li&gt;
&lt;li&gt;$x_t$: Input at time $t$.&lt;/li&gt;
&lt;li&gt;$f$: Function combining these.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Analogy:&lt;/em&gt; Think of $h_t$ as a diary where you write today’s experiences based on yesterday’s memories.&lt;/p&gt;
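&lt;p&gt;The recurrence above can be sketched in a few lines of Python. Here $f$ is a toy stand-in (a simple sum) for the learned transformation in a real RNN; the point is that each step needs the previous one&apos;s result, which is exactly the sequential bottleneck discussed next:&lt;/p&gt;

```python
def rnn_step(h_prev, x_t):
    """One recurrence step h_t = f(h_{t-1}, x_t).

    A toy f (plain addition) stands in for the learned transformation
    of a real RNN cell.
    """
    return h_prev + x_t

def run_rnn(inputs, h0=0):
    """Process a sequence one position at a time.

    Each iteration depends on the hidden state from the previous one,
    so this loop cannot be parallelized across positions.
    """
    h = h0
    states = []
    for x_t in inputs:
        h = rnn_step(h, x_t)
        states.append(h)
    return states

print(run_rnn([1, 2, 3]))  # [1, 3, 6] -- each state builds on the last
```

Swapping in a real cell (LSTM, GRU) changes `rnn_step` but not the shape of the loop, which is why the paper's sequential-computation critique applies to all of them.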
&lt;h4&gt;&lt;strong&gt;Sentence 5:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Since RNNs process sequences step-by-step (sequentially), they can’t do multiple steps at the same time (no parallelization).&lt;/li&gt;
&lt;li&gt;This is a problem for long sequences because:
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Memory limits&lt;/strong&gt;: You can’t train many sequences together (batching is limited).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time cost&lt;/strong&gt;: Processing each step one at a time is slow.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Analogy:&lt;/em&gt; Imagine reading a book one sentence at a time vs. scanning multiple pages in parallel. RNNs are like the first method: slow and memory-hungry for large books.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why this is a problem:&lt;/strong&gt; In real-world tasks like translation, sentences can be very long, making RNNs less efficient.&lt;/p&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 6:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Recent work has achieved significant improvements in computational efficiency through factorization tricks [21] and conditional computation [32], while also improving model performance in case of the latter.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; Some researchers found clever ways to make RNNs faster and better:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Factorization tricks:&lt;/strong&gt; These simplify calculations to save time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Conditional computation:&lt;/strong&gt; This focuses on only the important parts of the sequence, skipping unnecessary work.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;References explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[21] Factorization Tricks:&lt;/strong&gt; Simplifies computations in LSTMs for faster training.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1703.10722&quot;&gt;Factorization Tricks Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[32] Conditional Computation:&lt;/strong&gt; Introduced sparsely gated mixture-of-experts layers, improving efficiency.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1701.06538&quot;&gt;Conditional Computation Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 7:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;The fundamental constraint of sequential computation, however, remains.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; Even with improvements, RNNs still can’t avoid processing sequences step-by-step. This sequential nature is their biggest limitation.&lt;/p&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 8:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [2, 19].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Attention mechanisms&lt;/strong&gt; are like a smart highlight tool that helps models focus on the most important parts of the input.&lt;/li&gt;
&lt;li&gt;The big advantage? Attention doesn’t care how far apart the related elements are in a sequence (e.g., the first and last words of a long sentence).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;References explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[2] Bahdanau et al. (2014):&lt;/strong&gt; Introduced attention in neural machine translation.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1409.0473&quot;&gt;Bahdanau Attention Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[19] Kim et al. (2017):&lt;/strong&gt; Explored structured attention networks.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1702.00887&quot;&gt;Structured Attention Networks Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 9:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;In all but a few cases [27], however, such attention mechanisms are used in conjunction with a recurrent network.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; Most models use attention &lt;strong&gt;with&lt;/strong&gt; RNNs (as an extra feature) instead of replacing the RNN completely.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Reference explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[27] Parikh et al. (2016):&lt;/strong&gt; Proposed a decomposable attention model without recurrence.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1606.01933&quot;&gt;Decomposable Attention Model Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 10:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; The &lt;strong&gt;Transformer&lt;/strong&gt; is a new model that:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Removes recurrence:&lt;/strong&gt; No RNNs are used at all.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Uses only attention:&lt;/strong&gt; Attention mechanisms handle all the work of relating input and output sequences.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Why it’s exciting:&lt;/strong&gt; This design solves the problems of RNNs (sequential processing and memory issues) while keeping the ability to model relationships in long sequences.&lt;/p&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 11:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;The Transformer allows for significantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Transformer is &lt;strong&gt;fast&lt;/strong&gt; because it processes sequences in parallel.&lt;/li&gt;
&lt;li&gt;In experiments, it achieved top performance in translation tasks with just 12 hours of training on 8 GPUs (powerful processors).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; The Transformer is faster, more efficient, and achieves better results than traditional models.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;2. Background&lt;/strong&gt;&lt;/h3&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 1:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building block, computing hidden representations in parallel for all input and output positions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; Some models before the Transformer also tried to solve the problem of sequential processing:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Extended Neural GPU:&lt;/strong&gt; Uses convolutions for parallel sequence computation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ByteNet:&lt;/strong&gt; Uses convolutions to process sequences in parallel.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ConvS2S:&lt;/strong&gt; Combines convolutions with sequence modeling.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Why they matter:&lt;/strong&gt; These models inspired the Transformer by showing that parallelization could work.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;References explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[16] Extended Neural GPU:&lt;/strong&gt; Explored memory-efficient computations.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1607.00036&quot;&gt;Extended Neural GPU Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[18] ByteNet:&lt;/strong&gt; Introduced logarithmic efficiency for sequence processing.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1610.10099&quot;&gt;ByteNet Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[9] ConvS2S:&lt;/strong&gt; Used convolutions for sequence-to-sequence learning.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1705.03122&quot;&gt;ConvS2S Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 2:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For models like ByteNet and ConvS2S, the farther apart two elements in a sequence are, the more operations are needed to relate them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ConvS2S:&lt;/strong&gt; Operations increase &lt;strong&gt;linearly&lt;/strong&gt; (slow for long sequences).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ByteNet:&lt;/strong&gt; Operations increase &lt;strong&gt;logarithmically&lt;/strong&gt; (faster but still depends on distance).&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 3:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;This makes it more difficult to learn dependencies between distant positions [12].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In models like ConvS2S and ByteNet, the more operations needed to relate distant parts of a sequence, the harder it is for the model to learn meaningful relationships between those parts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; For tasks like translation, where the first and last words of a sentence may be closely connected, this limitation is a big problem.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Reference explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[12] Hochreiter et al. (2001):&lt;/strong&gt; This paper explains the challenges of learning long-term dependencies in sequences due to gradient-related issues in recurrent models.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; Gradient Flow Paper&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 4:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Transformer solves the dependency problem by requiring only a &lt;strong&gt;constant number of operations&lt;/strong&gt; to relate any two positions in a sequence.
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Analogy:&lt;/em&gt; Think of a Transformer as a direct highway between every pair of cities, instead of needing to stop at every town along the way like in RNNs or ConvS2S.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Averaging Attention-Weighted Positions:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Attention assigns a “weight” to each position in the sequence to decide how important it is.&lt;/li&gt;
&lt;li&gt;Averaging these weights reduces the ability to capture fine-grained details, like losing sharpness in a photo.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-Head Attention:&lt;/strong&gt; The Transformer fixes this by using multiple attention mechanisms (heads), which we’ll cover in section 3.2.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;&lt;strong&gt;Math Explanation for Operations&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Let’s break down the &lt;strong&gt;constant vs. linear vs. logarithmic growth&lt;/strong&gt; using simple terms and math.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;ConvS2S (Linear Growth):&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;To relate two distant elements, ConvS2S needs $O(d)$ operations, where $d$ is the distance between them.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Example:&lt;/em&gt; If $d=10$, ConvS2S needs 10 operations. If $d=100$, it needs 100 operations.&lt;/li&gt;
&lt;li&gt;Linear growth means: The cost increases directly with the distance.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ByteNet (Logarithmic Growth):&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;ByteNet improves this with $O(\log(d))$ operations.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Example:&lt;/em&gt; If $d=10$, it might need about 3 operations (since $\log_2(10) \approx 3$).&lt;/li&gt;
&lt;li&gt;Logarithmic growth means: The cost increases slowly as the distance grows.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Transformer (Constant Growth):&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;The Transformer needs only $O(1)$ operations, regardless of distance $d$.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Example:&lt;/em&gt; Whether $d=10$ or $d=1000$, the cost stays the same.&lt;/li&gt;
&lt;li&gt;This is because attention mechanisms compare all positions simultaneously.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt; Constant-time operations make the Transformer much faster and scalable for long sequences.&lt;/p&gt;
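&lt;p&gt;&lt;em&gt;Sketch:&lt;/em&gt; a toy cost model (my own illustration, not from the paper) makes the three growth rates concrete:&lt;/p&gt;

```python
import math

# Toy cost models: operations needed to relate two positions d apart
def ops_convs2s(d):
    return d  # linear: O(d)

def ops_bytenet(d):
    return max(1, round(math.log2(d)))  # logarithmic: O(log d)

def ops_transformer(d):
    return 1  # constant: O(1), attention compares all positions at once

for d in (10, 100, 1000):
    print(d, ops_convs2s(d), ops_bytenet(d), ops_transformer(d))
```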
&lt;h4&gt;&lt;strong&gt;Sentence 5:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Self-attention, sometimes called intra-attention, is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Self-Attention:&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;A mechanism where a model focuses on relationships within the same sequence (e.g., relating the subject of a sentence to its verb).&lt;/li&gt;
&lt;li&gt;It’s like looking at a single document and marking connections between sentences to summarize its meaning.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Representation of the Sequence:&lt;/strong&gt; The output of self-attention is a compact representation that captures all the important information about the sequence.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Further Reading:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://towardsdatascience.com/illustrated-self-attention-2d627e33b20a&quot;&gt;Understanding Self-Attention&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
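&lt;p&gt;&lt;em&gt;Sketch:&lt;/em&gt; a minimal numerical version of self-attention in numpy. This is my own simplification: it omits the learned query/key/value projections and the multi-head structure the paper defines in section 3.2:&lt;/p&gt;

```python
import numpy as np

def self_attention(X):
    # Each row of X is one position's embedding.
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)  # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over positions
    return weights @ X  # each output row is a weighted mix of ALL positions

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 positions, 2 dims
out = self_attention(X)
```

&lt;p&gt;Note how relating position 0 to position 2 costs the same matrix multiply as relating adjacent positions: distance simply does not appear.&lt;/p&gt;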
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 6:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 27, 28, 22].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; Self-attention is powerful and versatile. It has been used in:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Reading comprehension:&lt;/strong&gt; Understanding and answering questions about a passage.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Abstractive summarization:&lt;/strong&gt; Summarizing content by rewriting it in new words.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Textual entailment:&lt;/strong&gt; Determining if one sentence logically follows another.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task-independent sentence representations:&lt;/strong&gt; Creating general-purpose sentence embeddings for use in different tasks.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;References explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[4] Cheng et al. (2016):&lt;/strong&gt; Used LSTMs for machine reading.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1601.06733&quot;&gt;Machine Reading Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[27] Parikh et al. (2016):&lt;/strong&gt; Proposed attention-based models without recurrence.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1606.01933&quot;&gt;Decomposable Attention Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[28] Paulus et al. (2017):&lt;/strong&gt; Applied reinforcement learning for summarization.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1705.04304&quot;&gt;Summarization Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[22] Lin et al. (2017):&lt;/strong&gt; Explored structured self-attentive embeddings.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1703.03130&quot;&gt;Self-Attentive Embeddings Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 7:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;End-to-end memory networks are based on a recurrent attention mechanism instead of sequence-aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks [34].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;End-to-End Memory Networks:&lt;/strong&gt; A model that combines attention and memory for tasks like answering questions.&lt;/li&gt;
&lt;li&gt;Instead of processing sequences step-by-step like RNNs, these models use attention mechanisms to focus on relevant information in memory.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt; Simple question answering and language modeling (predicting sentences).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Reference explained:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;[34] Sukhbaatar et al. (2015):&lt;/strong&gt; Proposed memory networks for reasoning tasks.
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Link:&lt;/strong&gt; &lt;a href=&quot;https://arxiv.org/abs/1503.08895&quot;&gt;Memory Networks Paper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 8:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence-aligned RNNs or convolution.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; The Transformer is unique because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It’s the &lt;strong&gt;first model&lt;/strong&gt; to rely &lt;strong&gt;completely&lt;/strong&gt; on self-attention.&lt;/li&gt;
&lt;li&gt;It doesn’t use RNNs or convolution at all, unlike earlier models.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; This makes the Transformer faster, simpler, and more scalable than its predecessors.&lt;/p&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Sentence 9:&lt;/strong&gt;&lt;/h4&gt;
&lt;blockquote&gt;
&lt;p&gt;In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9].&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt; The next parts of the paper will cover:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;How the Transformer works&lt;/strong&gt; (architecture).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Why self-attention is important&lt;/strong&gt; (motivation).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Comparison with older models&lt;/strong&gt; (e.g., Neural GPU, ByteNet, ConvS2S).&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;p&gt;Today, we explored the &lt;strong&gt;Introduction&lt;/strong&gt; and &lt;strong&gt;Background&lt;/strong&gt; sections of the revolutionary paper &lt;strong&gt;“Attention is All You Need.”&lt;/strong&gt; From understanding the limitations of RNNs to discovering the power of self-attention and parallelization, it’s clear why Transformers are a game-changer in the world of AI. These foundational insights set the stage for the next step in our journey: diving into the &lt;strong&gt;Transformer Architecture&lt;/strong&gt; itself. Tomorrow, I’ll delve into the mechanics of self-attention, multi-head attention, and positional encoding.&lt;/p&gt;
&lt;p&gt;Stay tuned as we continue to uncover the brilliance behind this landmark model!&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Ultimate Kubernetes Tutorial Part 1: Setting Up a Thriving Multi-Node Cluster on Mac</title><link>https://geekcoding101.com/posts/kubernetes-tutorial-part1</link><guid isPermaLink="true">https://geekcoding101.com/posts/kubernetes-tutorial-part1</guid><pubDate>Sat, 01 Mar 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;Hey there! Welcome to this Kubernetes tutorial! Ever dreamed of running a real multi-node &lt;a href=&quot;https://kubernetes.io/&quot;&gt;Kubernetes (K8s) cluster&lt;/a&gt; on your laptop instead of settling for Minikube’s diet version? A proper multi-node Kubernetes environment requires virtual machines, and until last year, VMware Fusion was a paid product—an obstacle for many. I know there are alternatives, like &lt;a href=&quot;https://www.linux-kvm.org/page/Downloads&quot;&gt;KVM&lt;/a&gt;, &lt;a href=&quot;https://www.virtualbox.org/&quot;&gt;Oracle VirtualBox&lt;/a&gt;, and even Minikube’s so-called multi-node mode—but let’s be real: I’ve got a beast of a MacBook Pro, so why not flex its muscles and spin up a legit multi-node cluster? 🚀&lt;/p&gt;
&lt;p&gt;But great news! &lt;strong&gt;On November 11, 2024, &lt;a href=&quot;https://blogs.vmware.com/cloud-foundation/2024/11/11/vmware-fusion-and-workstation-are-now-free-for-all-users/&quot;&gt;VMware announced that Fusion and Workstation are now free for all users&lt;/a&gt;!&lt;/strong&gt; The moment I stumbled upon this announcement, I was thrilled. Time to roll up my sleeves, fire up some VMs, and make this cluster a reality. Kick off my Kubernetes tutorial! Let’s dive in! 🚀&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;Project Overview&lt;/strong&gt;&lt;/h1&gt;
&lt;h2&gt;&lt;strong&gt;My Goal&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;In this Kubernetes tutorial series, I want to set up a full Kubernetes cluster on my MacBook Pro using &lt;strong&gt;VMware Fusion&lt;/strong&gt;, creating multiple VMs to simulate real-world deployment and practice my DevOps and IaC (Infrastructure as Code) skills.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Planned Setup&lt;/strong&gt;&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create a VM as Base VM (Rocky Linux 9)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configure networking&lt;/li&gt;
&lt;li&gt;Update system packages&lt;/li&gt;
&lt;li&gt;Disable &lt;code&gt;firewalld&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Enable SSH passwordless login from local Mac to the base VM&lt;/li&gt;
&lt;li&gt;Set up &lt;code&gt;zsh&lt;/code&gt;, &lt;code&gt;tmux&lt;/code&gt;, &lt;code&gt;vim&lt;/code&gt; and common aliases&lt;/li&gt;
&lt;li&gt;Install &lt;strong&gt;Miniforge&lt;/strong&gt; for Python environment management&lt;/li&gt;
&lt;li&gt;Install and configure &lt;strong&gt;Ansible&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up a Local Server Node (&lt;code&gt;localserver&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clone from the above base VM image&lt;/li&gt;
&lt;li&gt;Create an Ansible script to customize the base VM image with a new hostname, SSH keys, and networking&lt;/li&gt;
&lt;li&gt;Set up &lt;strong&gt;DNS and NTP servers&lt;/strong&gt; for internal hostname resolution and local time synchronization&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create Kubernetes Nodes (&lt;code&gt;k8s-1&lt;/code&gt; to &lt;code&gt;k8s-4&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clone from the base image&lt;/li&gt;
&lt;li&gt;Use the same Ansible script to customize each new VM&apos;s hostname, SSH keys, and networking&lt;/li&gt;
&lt;li&gt;Install core Kubernetes packages (&lt;code&gt;containerd&lt;/code&gt;, &lt;code&gt;kubelet&lt;/code&gt;, &lt;code&gt;kubeadm&lt;/code&gt;, &lt;code&gt;kubectl&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Enable &lt;code&gt;firewalld&lt;/code&gt; and open the necessary ports (Yes! Many online tutorials simply disable firewalld, but I want to raise the bar and get it working with iptables, like a production environment!)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cluster Formation&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set up &lt;code&gt;k8s-1&lt;/code&gt; as the Master Node with Flannel as the CNI plugin&lt;/li&gt;
&lt;li&gt;Set up &lt;code&gt;k8s-2, k8s-3, k8s-4&lt;/code&gt; as Worker Nodes and join them to the cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Test/Deploy Nginx Service into Cluster via NodePort&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set up the Kubernetes Cluster Dashboard&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;More is coming!&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Networking&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;In the environment used in this Kubernetes tutorial, each VM will have &lt;strong&gt;two network interfaces&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ens160&lt;/code&gt; → Connected to &lt;code&gt;vmnet2&lt;/code&gt;, a private network (&lt;code&gt;172.16.211.0/24&lt;/code&gt;) created in VMFusion for Kubernetes, which I&apos;ll cover later&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ens224&lt;/code&gt; → Shared with Mac for Internet access.&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hostname&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;IP Address (ens160)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;localserver&lt;/td&gt;
&lt;td&gt;DNS Server, NTPServer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;172.16.211.100/24&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;k8s-1&lt;/td&gt;
&lt;td&gt;Master&lt;/td&gt;
&lt;td&gt;&lt;code&gt;172.16.211.11/24&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;k8s-2&lt;/td&gt;
&lt;td&gt;Worker&lt;/td&gt;
&lt;td&gt;&lt;code&gt;172.16.211.12/24&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;k8s-3&lt;/td&gt;
&lt;td&gt;Worker&lt;/td&gt;
&lt;td&gt;&lt;code&gt;172.16.211.13/24&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;k8s-4&lt;/td&gt;
&lt;td&gt;Worker&lt;/td&gt;
&lt;td&gt;&lt;code&gt;172.16.211.14/24&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
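&lt;p&gt;For example, the static addressing above can be applied on each clone with &lt;code&gt;nmcli&lt;/code&gt; (assuming the connection profile is named &lt;code&gt;ens160&lt;/code&gt;; change the address per node):&lt;/p&gt;

```shell
# Assign the private-network address for k8s-1 (adjust per node)
nmcli con mod ens160 ipv4.method manual ipv4.addresses 172.16.211.11/24
nmcli con up ens160
ip -4 addr show ens160   # verify the address is applied
```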
&lt;hr /&gt;
&lt;h1&gt;&lt;strong&gt;Creating the Rocky 9 Base VM&lt;/strong&gt;&lt;/h1&gt;
&lt;h2&gt;&lt;strong&gt;Configure a Custom Network in VMFusion&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;I hope you’ve already installed VMware Fusion—that part is straightforward.&lt;/p&gt;
&lt;p&gt;To create an isolated network among VMs for Kubernetes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open &lt;strong&gt;VMware Fusion&lt;/strong&gt; → &lt;strong&gt;Preferences&lt;/strong&gt; → &lt;strong&gt;Network&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Add a new network (&lt;code&gt;vmnet2&lt;/code&gt;) &lt;img src=&quot;./image.png&quot; alt=&quot;Kubernetes tutorial part 1: network for nodes, created in VMFusion&quot; title=&quot;Kubernetes tutorial part 1: network for nodes, created in VMFusion&quot; /&gt;&lt;/li&gt;
&lt;li&gt;Uncheck &lt;strong&gt;&quot;Provide addresses on this network via DHCP&quot;&lt;/strong&gt; (as we’ll use static IPs)&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h2&gt;Configure the Internet Network in VMFusion&lt;/h2&gt;
&lt;p&gt;This is straightforward; add it for each node as below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./internet-access-network-818x1024.webp&quot; alt=&quot;Kubernetes tutorial part 1: internet access network&quot; title=&quot;Kubernetes tutorial part 1: internet access network&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Create the Base VM and Install Rocky Linux 9&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;I used to work with CentOS and loved it. Since CentOS Linux 8 was discontinued at the end of 2021 and Rocky Linux was announced as its replacement, I will set up Kubernetes on Rocky Linux 9.&lt;/p&gt;
&lt;p&gt;You can download the ISO from &lt;a href=&quot;https://rockylinux.org/download&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;:::note&lt;/p&gt;
&lt;p&gt;Please bear with me. It&apos;s a long article, but it&apos;s fun! Hope you will like my Kubernetes tutorial soon!&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;p&gt;For me, in this Kubernetes tutorial, my MacBook is Intel-based, so I used the x86_64 ISO and downloaded the DVD ISO, NOT the minimal ISO or boot ISO:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./rocky-linux-9-iso-download.webp&quot; alt=&quot;Kubernetes tutorial part 1: rocky linux 9 iso download page screenshot&quot; title=&quot;Kubernetes tutorial part 1: rocky linux 9 iso download page screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Create a new VM from VMFusion and select the ISO to start installation.&lt;/p&gt;
&lt;p&gt;During the Rocky 9 installation, manually set:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;hostname: baseimage&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;password of &lt;code&gt;root&lt;/code&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Create a user account &lt;code&gt;admin&lt;/code&gt; and make it as the user administrator&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;IP Address&lt;/strong&gt;: &lt;code&gt;172.16.211.3/24&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DNS Server&lt;/strong&gt;:&lt;code&gt;172.16.211.100&lt;/code&gt; , &lt;code&gt;8.8.8.8&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Search Domain&lt;/strong&gt;: &lt;code&gt;dev.geekcoding101local.com&lt;/code&gt; (We will configure this domain later in localserver VM)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;NTP Server&lt;/strong&gt;: pointing to the NTP server running on localserver (&lt;code&gt;172.16.211.100&lt;/code&gt;), which we will set up later&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A few screenshots:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-1.png&quot; alt=&quot;Kubernetes tutorial part 1: create admin user during ISO installation&quot; title=&quot;Kubernetes tutorial part 1: create admin user during ISO installation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;(I added the &lt;code&gt;ens224&lt;/code&gt; network adapter after the ISO installation, which is why it isn&apos;t shown below)&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-2.png&quot; alt=&quot;Kubernetes tutorial part 1: Configure network during ISO installation&quot; title=&quot;Kubernetes tutorial part 1: Configure network during ISO installation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-3.png&quot; alt=&quot;Kubernetes tutorial part 1: pointing to NTP server during iso installation&quot; title=&quot;Kubernetes tutorial part 1: pointing to NTP server during iso installation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If you forgot to configure DNS during installation, update it via command line post installation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nmcli con mod ens160 ipv4.dns &quot;172.16.211.100 8.8.8.8 8.8.4.4&quot;
nmcli con mod ens160 ipv4.dns-search &quot;dev.geekcoding101local.com&quot;
nmcli con mod ens160 ipv4.ignore-auto-dns yes
nmcli con up ens160
nmcli dev show ens160

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once the &lt;strong&gt;DNS server (&lt;code&gt;172.16.211.100&lt;/code&gt;) on &lt;code&gt;localserver&lt;/code&gt;&lt;/strong&gt; is up, you should be able to resolve hostnames:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nslookup baseimage
nslookup baseimage.dev.geekcoding101local.com
hostname -f
hostname -s
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;strong&gt;Tips: Network Interface Names&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;The network adapter name &lt;code&gt;ens160&lt;/code&gt; in Rocky 9 is assigned based on &lt;strong&gt;Predictable Network Interface Names&lt;/strong&gt; (PNIN), a naming convention introduced in &lt;strong&gt;systemd v197&lt;/strong&gt; to ensure stable and predictable interface names across reboots and hardware changes. The name &lt;code&gt;ens160&lt;/code&gt; specifically follows the &lt;strong&gt;&quot;Firmware/BIOS Index-based Naming&quot;&lt;/strong&gt; scheme, where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;en&lt;/code&gt; stands for &lt;strong&gt;Ethernet&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;s160&lt;/code&gt; refers to the &lt;strong&gt;firmware (BIOS/UEFI) assigned slot index&lt;/strong&gt; (160), which is based on how the hypervisor or hardware presents the device.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Why is it &lt;code&gt;ens160&lt;/code&gt;?&lt;/h3&gt;
&lt;p&gt;On &lt;strong&gt;VMware&lt;/strong&gt;, the &lt;code&gt;ens160&lt;/code&gt; interface name is commonly assigned because VMware presents the &lt;strong&gt;first&lt;/strong&gt; virtual NIC with firmware index &lt;code&gt;160&lt;/code&gt;. This is specific to VMware’s implementation.&lt;/p&gt;
&lt;h3&gt;Is it Consistent Across All Rocky Linux 9 Installs?&lt;/h3&gt;
&lt;p&gt;Not necessarily. The naming depends on the hardware and hypervisor:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;VMware&lt;/strong&gt;: The first NIC is typically named &lt;code&gt;ens160&lt;/code&gt; because of VMware’s firmware enumeration.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Physical Machines&lt;/strong&gt;: The first NIC may be named &lt;code&gt;ens3&lt;/code&gt;, &lt;code&gt;ens5f0&lt;/code&gt;, &lt;code&gt;enp1s0&lt;/code&gt;, &lt;code&gt;eno1&lt;/code&gt;, etc., depending on:
&lt;ul&gt;
&lt;li&gt;PCI bus topology (&lt;code&gt;enpXYSZ&lt;/code&gt; for PCI enumeration).&lt;/li&gt;
&lt;li&gt;Onboard NICs (&lt;code&gt;enoX&lt;/code&gt; for motherboard NICs).&lt;/li&gt;
&lt;li&gt;BIOS/firmware-assigned index (&lt;code&gt;ensX&lt;/code&gt; for BIOS indexing).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Other Hypervisors&lt;/strong&gt;:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;KVM/QEMU&lt;/strong&gt;: Uses &lt;code&gt;ens3&lt;/code&gt; or &lt;code&gt;enp1s0&lt;/code&gt; (based on PCI bus mapping).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hyper-V&lt;/strong&gt;: Uses &lt;code&gt;eth0&lt;/code&gt; or &lt;code&gt;ensX&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
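&lt;p&gt;To make the convention concrete, here is a tiny sketch (a hypothetical helper, not an official tool) that decodes the common predictable-name prefixes described above:&lt;/p&gt;

```shell
# Hypothetical helper: map a predictable interface name to its naming scheme.
decode_ifname() {
  case "$1" in
    eno[0-9]*)        echo "Ethernet, onboard (firmware index ${1#eno})" ;;
    ens[0-9]*)        echo "Ethernet, hotplug/firmware slot ${1#ens}" ;;
    enp[0-9]*s[0-9]*) echo "Ethernet, PCI bus/slot location (${1#en})" ;;
    eth[0-9]*)        echo "Ethernet, classic kernel naming" ;;
    *)                echo "unknown scheme" ;;
  esac
}

decode_ifname ens160   # prints: Ethernet, hotplug/firmware slot 160
decode_ifname enp1s0   # PCI path-based name, common on KVM
```

&lt;p&gt;Pattern order matters here: the more specific onboard and slot patterns are checked before the PCI-path one.&lt;/p&gt;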
&lt;h3&gt;Can You Change It?&lt;/h3&gt;
&lt;p&gt;Yes, if you want to ensure consistent naming across environments, you can override it using:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;udev rules&lt;/strong&gt; (&lt;code&gt;/etc/udev/rules.d/70-persistent-net.rules&lt;/code&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;GRUB kernel parameters&lt;/strong&gt; (disable PNIN):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;grubby --update-kernel=ALL --args=&quot;net.ifnames=0 biosdevname=0&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will revert to &lt;code&gt;eth0&lt;/code&gt;, &lt;code&gt;eth1&lt;/code&gt;, etc.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is out of the scope of the current blog post; feel free to add a comment if you want to see a post about how to override it.&lt;/p&gt;
&lt;p&gt;[/infobox]&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;Do you like the above style of &lt;code&gt;Tips&lt;/code&gt;? Hope so! I will test out this format in this Kubernetes tutorial; let me know!&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;p&gt;Once the system is up, let&apos;s disable firewalld. Obviously I don&apos;t want to get stuck on firewall issues while preparing the base VM image (we will turn it back on when setting up the Kubernetes cluster):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;systemctl stop firewalld
systemctl disable firewalld
systemctl mask firewalld
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;disable&lt;/code&gt;&lt;/strong&gt;: Disables the service from starting automatically at boot but doesn&apos;t prevent manual starts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;mask&lt;/code&gt;&lt;/strong&gt;: Prevents the service from being started manually or automatically by creating a link to &lt;code&gt;/dev/null&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
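&lt;p&gt;Under the hood, masking works by symlinking the unit name to &lt;code&gt;/dev/null&lt;/code&gt; in &lt;code&gt;/etc/systemd/system&lt;/code&gt;, so systemd cannot load the unit at all. A minimal sketch of that mechanism, demonstrated in a throwaway directory instead of the real systemd tree:&lt;/p&gt;

```shell
# Reproduce what `systemctl mask firewalld` creates, in a temp dir:
tmpdir=$(mktemp -d)
ln -s /dev/null "$tmpdir/firewalld.service"

target=$(readlink "$tmpdir/firewalld.service")
echo "$target"   # prints: /dev/null
```

&lt;p&gt;That is also why &lt;code&gt;systemctl unmask&lt;/code&gt; is needed later: it simply removes this symlink.&lt;/p&gt;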
&lt;p&gt;Thanks for reading! I hope you are enjoying my Kubernetes tutorial so far!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Update Packages&lt;/h2&gt;
&lt;p&gt;Here I installed my favorite packages/tools, including vim, tmux, zsh, etc.&lt;/p&gt;
&lt;p&gt;You can add your own essential tools to the list below so that they are available on every new VM cloned from this base image:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dnf update -y
dnf install vim wget git tmux perl-Time-HiRes bind-utils util-linux-user zsh -y
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;perl-Time-HiRes&lt;/code&gt;: required by tmux to show the time.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bind-utils&lt;/code&gt;: provides nslookup and other DNS-related tools.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;util-linux-user&lt;/code&gt;: provides chsh.&lt;/li&gt;
&lt;/ul&gt;
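&lt;p&gt;If you want a quick sanity check that the packages actually provide their commands, a small helper like this works (a sketch of my own, not a dnf feature; it prints any command name not found on &lt;code&gt;PATH&lt;/code&gt;):&lt;/p&gt;

```shell
# Print any of the given command names that are missing from PATH.
check_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# On the freshly provisioned base image this should print nothing:
check_tools vim wget git tmux zsh nslookup chsh
```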
&lt;hr /&gt;
&lt;h2&gt;Setup password-less SSH authentication from local to VM&lt;/h2&gt;
&lt;p&gt;I love this! It&apos;s a must for any development environment settings!&lt;/p&gt;
&lt;p&gt;It&apos;s so annoying if you need to type password at every login!&lt;/p&gt;
&lt;p&gt;[infobox title=&quot;Tips&quot;]&lt;/p&gt;
&lt;p&gt;The Rocky Linux 9 DVD installs the SSHD server by default.&lt;/p&gt;
&lt;p&gt;[/infobox]&lt;/p&gt;
&lt;p&gt;Typically, we should use &lt;code&gt;ssh-agent&lt;/code&gt; for better key management and security, but since this is a base image and we just want password-less access from our local Mac to the new VMs, it&apos;s simpler to prepare the &lt;code&gt;authorized_keys&lt;/code&gt; file. This way, we can quickly enable password-less authentication without dealing with additional setup or dependencies! That&apos;s what I will use in this kubernetes tutorial!&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Perform these steps on your local machine (mine is a MacBook Pro):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh-keygen -t rsa
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just follow the prompts with the default settings and you will get your key pair at &lt;code&gt;~/.ssh/id_rsa.pub&lt;/code&gt; and &lt;code&gt;~/.ssh/id_rsa&lt;/code&gt;. Save the content of &lt;code&gt;~/.ssh/id_rsa.pub&lt;/code&gt; by printing it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat ~/.ssh/id_rsa.pub
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Log into baseimage as root to perform:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir -p /root/.ssh/
chmod 700 /root/.ssh/
touch /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
vi /root/.ssh/authorized_keys
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In the vi editor above, paste the content of &lt;code&gt;~/.ssh/id_rsa.pub&lt;/code&gt; and save it. Repeat the steps above to create the &lt;code&gt;.ssh&lt;/code&gt; folder and populate &lt;code&gt;/home/admin/.ssh/authorized_keys&lt;/code&gt; for the &lt;code&gt;admin&lt;/code&gt; account. Then restart sshd on the base VM:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;systemctl restart sshd
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Test login to the base VM from your local machine; you should not need to type a password:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh -vv root@172.16.211.3
ssh -vv admin@172.16.211.3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using &lt;code&gt;-vv&lt;/code&gt; here is useful, because your first attempt at setting up SSH password-less authentication will most likely fail due to some misconfiguration. With &lt;code&gt;-vv&lt;/code&gt; you can spot the error message. Good luck!&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
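&lt;p&gt;One of the most common reasons key authentication silently fails is wrong permissions: &lt;code&gt;~/.ssh&lt;/code&gt; should be 700 and &lt;code&gt;authorized_keys&lt;/code&gt; 600, as set above. Here is a small helper of my own (assuming GNU coreutils &lt;code&gt;stat -c&lt;/code&gt;, as shipped on Rocky Linux) to double-check the modes before reaching for &lt;code&gt;-vv&lt;/code&gt;:&lt;/p&gt;

```shell
# Compare a file's octal mode against the expected one.
check_mode() {
  actual=$(stat -c '%a' "$1" 2>/dev/null) || { echo "$1: missing"; return 1; }
  if [ "$actual" = "$2" ]; then
    echo "$1: OK ($actual)"
  else
    echo "$1: expected $2, got $actual"
  fi
}

# On the base VM you would run:
#   check_mode /root/.ssh 700
#   check_mode /root/.ssh/authorized_keys 600
```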
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Set Up Essential Tools&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Create a shared tools directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir -p /opt/share_tools/bin/
chmod -R 755 /opt/share_tools/
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Verify directory permissions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ls -l /opt | grep share
ls -l /opt/share_tools
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;h3&gt;Setup Zsh as the Shared Default Shell&lt;/h3&gt;
&lt;p&gt;Zsh is the first basic thing I want to cover in this Kubernetes tutorial!&lt;/p&gt;
&lt;p&gt;Zsh is the default shell on Mac. I want to have it on the Rocky Linux VM as well.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Zsh and Packages (Ensure Zsh is Installed)&lt;/strong&gt;&lt;br /&gt;
Zsh should already be installed from the &quot;&lt;a href=&quot;/?p=4857&amp;amp;preview=true#Update_Packages&quot;&gt;Update OS and Install Packages&lt;/a&gt;&quot; section. This guide is best viewed on &lt;a href=&quot;/&quot;&gt;GeekCoding101&lt;/a&gt;, where it was originally published.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Oh-My-Zsh&lt;/strong&gt;&lt;br /&gt;
Run the following command to install Oh-My-Zsh:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;wget https://github.com/robbyrussell/oh-my-zsh/raw/master/tools/install.sh -O - | zsh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;By default, Oh-My-Zsh is installed in your user’s home directory (&lt;code&gt;~/.oh-my-zsh&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Copy the Checked Out Folder to a Shared Path&lt;/strong&gt;&lt;br /&gt;
Copy the Oh-My-Zsh directory to a shared path:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cp -r ~/.oh-my-zsh /usr/share/oh-my-zsh
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Install Powerlevel10k to a Shared Path&lt;/strong&gt;&lt;br /&gt;
Clone the Powerlevel10k theme into a shared location:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone --depth=1 https://github.com/romkatv/powerlevel10k.git /usr/share/powerlevel10k
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You need a patched font to display the icons in the shell console. A recommended font can be downloaded here: &lt;a href=&quot;https://github.com/ryanoasis/nerd-fonts/tree/master/patched-fonts/Meslo/M/Regular&quot;&gt;Meslo Nerd Font patched for Powerlevel10k&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;I am using iTerm2, so I configured the font for the profile as below: &lt;img src=&quot;./iterm2-font-settings.webp&quot; alt=&quot;Kubernetes tutorial part 1: iterm2 font settings&quot; title=&quot;Kubernetes tutorial part 1: iterm2 font settings&quot; /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Update &lt;code&gt;~/.zshrc&lt;/code&gt;&lt;/strong&gt;&lt;br /&gt;
Modify your &lt;code&gt;.zshrc&lt;/code&gt; to use the shared paths (Here I used &lt;code&gt;~/.zshrc&lt;/code&gt;, but we don&apos;t need this for every user, because later we will dump the content of &lt;code&gt;~/.zshrc&lt;/code&gt; to &lt;code&gt;/etc/zshrc&lt;/code&gt; for all users):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export ZSH=&quot;/usr/share/oh-my-zsh&quot; 
ZSH_THEME=&quot;powerlevel10k/powerlevel10k&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Copy Configured Files to &lt;code&gt;/etc/skel&lt;/code&gt; for New Users&lt;/strong&gt;&lt;br /&gt;
Once the configuration is complete, copy the necessary files to the &lt;code&gt;/etc/skel&lt;/code&gt; directory for new users:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cp ~/.zshrc /etc/skel/ 
cp ~/.p10k.zsh /etc/skel/ 
chmod 644 /etc/skel/.zshrc /etc/skel/.p10k.zsh
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set Default Shell for New Users&lt;/strong&gt;&lt;br /&gt;
Update &lt;code&gt;/etc/default/useradd&lt;/code&gt; to set Zsh as the default shell for new users:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SHELL=/bin/zsh
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;Modify &lt;code&gt;/etc/zshrc&lt;/code&gt; for Shared Configuration&lt;/h4&gt;
&lt;p&gt;At the end of &lt;code&gt;/etc/zshrc&lt;/code&gt;, add the following to handle SSH sessions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Check if this is an SSH session 
# If not, launch bash because console fonts couldn&apos;t support oh-my-zsh 
if [[ -z &quot;$SSH_CONNECTION&quot; ]]; then 
  exec /bin/bash
fi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./launch-bash-if-not-ssh.webp&quot; alt=&quot;Kubernetes tutorial part 1: launch bash if not ssh&quot; title=&quot;Kubernetes tutorial part 1: launch bash if not ssh&quot; /&gt;&lt;/p&gt;
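&lt;p&gt;The check above hinges on &lt;code&gt;$SSH_CONNECTION&lt;/code&gt;, which sshd sets to &lt;code&gt;&quot;client_ip client_port server_ip server_port&quot;&lt;/code&gt; for SSH sessions and which is absent on the local console. A tiny sketch of the same test, runnable anywhere:&lt;/p&gt;

```shell
# Classify the current session the same way the /etc/zshrc snippet does.
session_type() {
  if [ -n "$SSH_CONNECTION" ]; then echo ssh; else echo console; fi
}

(SSH_CONNECTION="172.16.211.2 55100 172.16.211.3 22"; session_type)  # prints: ssh
(unset SSH_CONNECTION; session_type)                                 # prints: console
```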
&lt;p&gt;You might notice the screenshot above has a very nice status bar in Vim; let me know in the comments if you want to know how I customized my Vim ^^ Append the &lt;code&gt;.zshrc&lt;/code&gt; configuration into &lt;code&gt;/etc/zshrc&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cat .zshrc &amp;gt;&amp;gt; /etc/zshrc
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Set Up Global Aliases in &lt;code&gt;/etc/zshenv&lt;/code&gt;&lt;/h4&gt;
&lt;p&gt;Populate &lt;code&gt;/etc/zshenv&lt;/code&gt; with my favorite aliases:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# GNU coreutils (Rocky Linux) uses --color=auto; -G is the macOS/BSD flag
alias ls=&apos;ls --color=auto&apos;
alias ll=&apos;ls --color=auto -l&apos;
alias la=&apos;ls --color=auto -la&apos;
# Git
alias gs=&apos;git status &apos;
alias ga=&apos;git add &apos;
alias gb=&apos;git branch &apos;
alias gba=&apos;git branch -a&apos;
alias gbd=&apos;git branch -d&apos;
alias gbr=&apos;git branch -r&apos;
alias gc=&apos;git commit &apos;
alias gd=&apos;git diff &apos;
alias gdh=&apos;git diff HEAD &apos;
alias gco=&apos;git checkout &apos;
alias glg=&apos;git log  --graph   --name-only &apos;
# Get resources
alias k=kubectl
alias kg=&apos;kubectl get&apos;
alias kga=&apos;kubectl get all --all-namespaces&apos;
alias kgns=&quot;kubectl get ns --show-labels&quot;
alias kgp=&quot;kubectl get pods -o wide&quot;
alias kgpn=&quot;kubectl get pods -o wide -n &quot;
alias kgpa=&quot;kubectl get pods -A -o wide&quot;
alias kgpjson=&apos;kubectl get pods -o=json&apos;                 # options: -n &amp;lt;ns&amp;gt; &amp;lt;pn&amp;gt;
alias kgpsys=&apos;kubectl --namespace=kube-system get pods&apos;
alias kgs=&quot;kubectl get service -o wide&quot;
alias kgsn=&quot;kubectl get service -o wide -n&quot;
alias kgn=&quot;kubectl get nodes -o wide&quot;

# Describe
alias kdns=&apos;kubectl describe namespace&apos;
alias kdn=&apos;kubectl describe node&apos;
alias kdpn=&quot;kubectl describe pod -n&quot;            # options: -n &amp;lt;ns&amp;gt; &amp;lt;pn&amp;gt;

# Delete
alias krm=&apos;kubectl delete&apos;
alias krmf=&apos;kubectl delete -f&apos;
alias krming=&apos;kubectl delete ingress&apos;
alias krmingl=&apos;kubectl delete ingress -l&apos;
alias krmingall=&apos;kubectl delete ingress --all-namespaces&apos;
# Misc
alias ka=&apos;kubectl apply -f&apos;
alias klo=&apos;kubectl logs -f&apos;
alias kex=&apos;kubectl exec -i -t&apos;

export GPG_TTY=$(tty)
export SHARE_TOOLS=&quot;/opt/share_tools/bin/&quot;
export PATH=${SHARE_TOOLS}:$PATH

&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Update &lt;code&gt;zsh&lt;/code&gt; For Existing Users&lt;/h4&gt;
&lt;p&gt;I know this is our first VM, but just in case you want to configure an existing VM: for users created before setting up &lt;code&gt;Oh-My-Zsh&lt;/code&gt; and &lt;code&gt;Powerlevel10k&lt;/code&gt;, update their shell to Zsh (replace ${targetuser} with the real username):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;chsh -s /bin/zsh ${targetuser}
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Maintenance&lt;/h4&gt;
&lt;p&gt;For future maintenance, we only need to update the following files:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/etc/zshenv&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/etc/zshrc&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;Oh-My-Zsh&lt;/code&gt; will check for updates and prompt you at every login, so no need to worry there!&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;Thanks for reading! So far so good? I hope you are enjoying my Kubernetes tutorial! If you have any feedback, feel free to leave a comment!&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Configure Tmux for Multi-Session Management&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Ever had SSH sessions drop in the middle of a deployment? Or needed to juggle multiple terminals like a hacker in a sci-fi movie? &lt;code&gt;tmux&lt;/code&gt; solves it all. With persistent sessions, split panes, and the ability to detach and reattach at will, I can effortlessly manage multiple Kubernetes nodes, tail logs, and run long processes without worrying about losing my progress. It’s basically my &lt;strong&gt;command-line command center&lt;/strong&gt;, a friend of Kubernetes cluster administrator, and once you get hooked, there’s no going back. 🚀 This is another must for any development environment! Let me show you the tricks in this kubernetes tutorial!&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Tmux Launcher Script&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;I want to launch Tmux automatically when I SSH into the VM, so I need a script to launch it and hook it into the Zsh startup.&lt;/p&gt;
&lt;p&gt;Create a script for launching tmux:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;vim /opt/share_tools/bin/launch_tmux.sh
chmod +x /opt/share_tools/bin/launch_tmux.sh
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Script contents:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/zsh

SESSION_NAME=&quot;k8s&quot;

if [ -z &quot;$TMUX&quot; ]; then
  tmux has-session -t ${SESSION_NAME} 2&amp;gt;/dev/null

  if [[ $? != 0 ]]; then
    tmux new-session -s ${SESSION_NAME}
  else
    tmux ls | grep -q &quot;${SESSION_NAME}:.*(attached)&quot;
    if [[ $? == 0 ]]; then
      tmux new-session
    else
      tmux attach -t ${SESSION_NAME}
    fi
  fi
else
  echo &quot;Already inside a tmux session; not nesting.&quot;
fi

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Append below into &lt;code&gt;/etc/zshrc&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Launch tmux
# Check if the user is connected via SSH
if [[ -n &quot;$SSH_CONNECTION&quot; ]]; then
  # Launch the tmux script
  /opt/share_tools/bin/launch_tmux.sh
fi
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./launch-tmux-if-ssh.webp&quot; alt=&quot;Kubernetes tutorial part 1: launch tmux if it is ssh&quot; title=&quot;Kubernetes tutorial part 1: launch tmux if it is ssh&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now opening a new SSH session will show (default tmux + oh-my-zsh + powerlevel10k):&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./tmux-initial-show.png&quot; alt=&quot;Kubernetes tutorial part 1: tmux initial show screenshot&quot; title=&quot;Kubernetes tutorial part 1: tmux initial show screenshot&quot; /&gt;&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Install and Configure &lt;a href=&quot;https://github.com/gpakosz/.tmux.git&quot;&gt;gpakosz/.tmux.git&lt;/a&gt; for Tmux&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;The UI of the tmux above was too plain?! Not cool!&lt;/p&gt;
&lt;p&gt;Okay, let&apos;s customize it a little bit with &lt;code&gt;gpakosz/.tmux.git&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;Log into the base VM as root to perform:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;git clone https://github.com/gpakosz/.tmux.git /opt/gpakosz.tmux/
ln -s /opt/gpakosz.tmux/.tmux.conf /etc/tmux.conf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Set below in &lt;code&gt;/etc/zshenv&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# To make tmux work for all users on the VM, TMUX_CONF must be set to /etc/tmux.conf.
# It won&apos;t work if TMUX_CONF is set to another value, like &quot;/opt/gpakosz.tmux/.tmux.conf&quot;
export TMUX_CONF=&quot;/etc/tmux.conf&quot;
export TMUX_CONF_LOCAL=&quot;/opt/gpakosz.tmux/.tmux.conf.local&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ensure &lt;strong&gt;&lt;code&gt;perl-Time-HiRes&lt;/code&gt;&lt;/strong&gt; is installed at the &lt;a href=&quot;/?p=4857&amp;amp;preview=true#Update_Packages&quot;&gt;Update_Packages step&lt;/a&gt;.&lt;/p&gt;
&lt;h4&gt;Customize the TMUX theme&lt;/h4&gt;
&lt;p&gt;Append below into &lt;code&gt;/opt/gpakosz.tmux/.tmux.conf.local&lt;/code&gt; before the line &lt;code&gt;# -- custom variables&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# increase history size
set -g history-limit 9999999
# start with mouse mode enabled
set -g mouse on

bind-key -n C-S-Left swap-window -t -1\; select-window -t -1
bind-key -n C-S-Right swap-window -t +1\; select-window -t +1

# -- custom variables ----------------------------------------------------------
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Still in &lt;code&gt;/opt/gpakosz.tmux/.tmux.conf.local&lt;/code&gt;, find &lt;code&gt;mode-keys vi&lt;/code&gt; and uncomment it:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./enable-vi-in-tmux.webp&quot; alt=&quot;enable vi in tmux&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Continue customization in &lt;code&gt;/opt/gpakosz.tmux/.tmux.conf.local&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;I&apos;d like to change some colors; just follow me, find the settings below, and update them as shown:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;tmux_conf_theme_left_separator_main=&apos;\uE0B0&apos;  # /!\ you don&apos;t need to install Powerline
tmux_conf_theme_left_separator_sub=&apos;\uE0B1&apos;   # you only need fonts patched with
tmux_conf_theme_right_separator_main=&apos;\uE0B2&apos; # Powerline symbols or the standalone
tmux_conf_theme_right_separator_sub=&apos;\uE0B3&apos;  # PowerlineSymbols.otf font, see README.md

tmux_conf_theme_status_left=&quot; ☮️ #S | &quot;

# status right style
tmux_conf_theme_status_right_fg=&quot;$tmux_conf_theme_colour_12,$tmux_conf_theme_colour_14,$tmux_conf_theme_colour_6&quot;
tmux_conf_theme_status_right_bg=&quot;$tmux_conf_theme_colour_15,$tmux_conf_theme_colour_17,$tmux_conf_theme_colour_9&quot;


&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now take a look (I used the localserver VM we will create later to take this screenshot; the &quot;localserver&quot; label at the top right is set by &lt;a href=&quot;https://iterm2.com/&quot;&gt;iTerm2&lt;/a&gt;)!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./tmux-with-oh-my-zsh.webp&quot; alt=&quot;Kubernetes tutorial part 1: tmux with oh my zsh&quot; title=&quot;Kubernetes tutorial part 1: tmux with oh my zsh&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;Are you bored so far? I hope not! If you have any feedback about this Kubernetes tutorial, I am looking forward to seeing your comments!&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Install Miniforge&lt;/h3&gt;
&lt;p&gt;I haven&apos;t thought about the exact use case for Python in this Kubernetes environment, but I want a Python management toolkit ready on the base image so that it comes in handy in the future. Let&apos;s cover this in this Kubernetes tutorial as well!&lt;/p&gt;
&lt;p&gt;In my development environment, e.g. this Kubernetes cluster environment, I prefer &lt;strong&gt;Miniforge&lt;/strong&gt; over &lt;strong&gt;Anaconda&lt;/strong&gt; to manage Python. Why deal with the bloated, corporate-flavored Anaconda distribution when you can have a &lt;strong&gt;lightweight, community-driven alternative&lt;/strong&gt; that just works? 🚀 Miniforge gives you the &lt;strong&gt;same Conda package management power&lt;/strong&gt;, but &lt;strong&gt;without the unnecessary packages&lt;/strong&gt;, keeping it &lt;strong&gt;fast and minimal&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The installation is simple.&lt;/p&gt;
&lt;p&gt;Run &lt;code&gt;curl&lt;/code&gt; command to download from &lt;a href=&quot;https://conda-forge.org/download/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then install it to &lt;code&gt;/opt/miniforge3&lt;/code&gt; so that every user can use it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl -L -O &quot;https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh&quot;
chmod +x Miniforge3-Linux-x86_64.sh

❯ ./Miniforge3-Linux-x86_64.sh -h

usage: ./Miniforge3-Linux-x86_64.sh [options]

Installs Miniforge3 24.11.3-0
-b           run install in batch mode (without manual intervention),
             it is expected the license terms (if any) are agreed upon
-f           no error if install prefix already exists
-h           print this help message and exit
-p PREFIX    install prefix, defaults to /root/miniforge3, must not contain spaces.
-s           skip running pre/post-link/install scripts
-u           update an existing installation
-t           run package tests after installation (may install conda-build)

❯ ./Miniforge3-Linux-x86_64.sh -p /opt/miniforge3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;During the installation, it also asked whether to update my shell configuration; I answered &lt;code&gt;yes&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./miniforge-installation.webp&quot; alt=&quot;miniforge installation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After installation, I noticed that &lt;code&gt;/etc/zshrc&lt;/code&gt; got updated as below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# &amp;gt;&amp;gt;&amp;gt; conda initialize &amp;gt;&amp;gt;&amp;gt;
# !! Contents within this block are managed by &apos;conda init&apos; !!
__conda_setup=&quot;$(&apos;/opt/miniforge3/bin/conda&apos; &apos;shell.zsh&apos; &apos;hook&apos; 2&amp;gt; /dev/null)&quot;
if [ $? -eq 0 ]; then
    eval &quot;$__conda_setup&quot;
else
    if [ -f &quot;/opt/miniforge3/etc/profile.d/conda.sh&quot; ]; then
        . &quot;/opt/miniforge3/etc/profile.d/conda.sh&quot;
    else
        export PATH=&quot;/opt/miniforge3/bin:$PATH&quot;
    fi
fi
unset __conda_setup

if [ -f &quot;/opt/miniforge3/etc/profile.d/mamba.sh&quot; ]; then
    . &quot;/opt/miniforge3/etc/profile.d/mamba.sh&quot;
fi
# &amp;lt;&amp;lt;&amp;lt; conda initialize &amp;lt;&amp;lt;&amp;lt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You can see it sets &lt;code&gt;PATH&lt;/code&gt; above, but just to be safe, so that programs under &lt;code&gt;/opt/miniforge3/bin&lt;/code&gt; are always found, I also manually updated my &lt;code&gt;/etc/zshenv&lt;/code&gt; as below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export MINIFORGE=&quot;/opt/miniforge3/bin&quot;
export PATH=${MINIFORGE}:${SHARE_TOOLS}:$PATH
&lt;/code&gt;&lt;/pre&gt;
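&lt;p&gt;Why prepend rather than append? The first &lt;code&gt;PATH&lt;/code&gt; entry that contains a matching name wins, so tools in &lt;code&gt;/opt/miniforge3/bin&lt;/code&gt; shadow any same-named system binaries. A minimal sketch with a throwaway directory and a made-up &lt;code&gt;mytool&lt;/code&gt; binary:&lt;/p&gt;

```shell
# Create a fake tool in a temp dir and prepend that dir to PATH.
demo=$(mktemp -d)
printf '#!/bin/sh\necho from-demo-dir\n' > "$demo/mytool"
chmod +x "$demo/mytool"

PATH="$demo:$PATH"
resolved=$(command -v mytool)
echo "$resolved"    # the copy inside $demo wins
mytool              # prints: from-demo-dir
```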
&lt;p&gt;Let&apos;s run a test:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ conda env list

# conda environments:
#
base                 * /opt/miniforge3

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;Do you like my Kubernetes tutorial so far? Rate it 5 stars!&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;&lt;strong&gt;Install and Configure Ansible&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Okay, now it&apos;s time to install Ansible in this Kubernetes tutorial. Let&apos;s use Ansible to manage the operations in Kubernetes nodes.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dnf install epel-release -y
dnf install ansible -y
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Generate a default configuration file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ansible-config init --disabled &amp;gt; /etc/ansible/ansible.cfg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Update &lt;code&gt;/etc/zshenv&lt;/code&gt; to append below line:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export ANSIBLE_CONFIG=/etc/ansible/ansible.cfg
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Source &lt;code&gt;/etc/zshenv&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;source /etc/zshenv
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Update &lt;code&gt;/etc/ansible/ansible.cfg&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[defaults]
inventory = /etc/ansible/hosts
log_path = /var/log/ansible.log
host_key_checking = False
retry_files_enabled = False
timeout = 10
display_skipped_hosts = False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Verify installation:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-4.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[warningbox title=&quot;Tips:&quot;]&lt;/p&gt;
&lt;p&gt;If you do not add the export line to &lt;code&gt;/etc/zshenv&lt;/code&gt; and source it, &lt;code&gt;ansible --version&lt;/code&gt; will use &lt;code&gt;/root/ansible.cfg&lt;/code&gt; instead, like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-6.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[/warningbox]&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;Configure Ansible Hosts&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Edit &lt;code&gt;/etc/ansible/hosts&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[base]
baseimage ansible_host=172.16.211.3 ansible_user=root ansible_ssh_private_key_file=~/.ssh/ansible_ed25519

[application_servers]
localserver ansible_host=172.16.211.100 ansible_user=root ansible_ssh_private_key_file=~/.ssh/ansible_ed25519

[k8s_cluster]
k8s-1 ansible_host=172.16.211.11 ansible_user=root ansible_ssh_private_key_file=~/.ssh/ansible_ed25519
k8s-2 ansible_host=172.16.211.12 ansible_user=root ansible_ssh_private_key_file=~/.ssh/ansible_ed25519
k8s-3 ansible_host=172.16.211.13 ansible_user=root ansible_ssh_private_key_file=~/.ssh/ansible_ed25519
k8s-4 ansible_host=172.16.211.14 ansible_user=root ansible_ssh_private_key_file=~/.ssh/ansible_ed25519

&lt;/code&gt;&lt;/pre&gt;
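&lt;p&gt;The four k8s node entries above differ only by index and IP suffix, so if you dislike typing repetitive lines, they can be generated. A small sketch (same IPs and key path as the inventory above):&lt;/p&gt;

```shell
# Emit the [k8s_cluster] host lines for nodes 1..4.
gen_inventory() {
  key="$1"
  for i in 1 2 3 4; do
    printf 'k8s-%s ansible_host=172.16.211.1%s ansible_user=root ansible_ssh_private_key_file=%s\n' \
      "$i" "$i" "$key"
  done
}

gen_inventory "~/.ssh/ansible_ed25519"
```

&lt;p&gt;Redirect the output with &lt;code&gt;&amp;gt;&amp;gt; /etc/ansible/hosts&lt;/code&gt; after the &lt;code&gt;[k8s_cluster]&lt;/code&gt; header if you take this route.&lt;/p&gt;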
&lt;h4&gt;&lt;strong&gt;SSH Key Setup for Ansible&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Let&apos;s generate a new key pair just for Ansible, which also makes maintenance and isolation easier:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ssh-keygen -t ed25519 -C &quot;ansible-key&quot; -f ~/.ssh/ansible_ed25519
ssh-copy-id -i ~/.ssh/ansible_ed25519.pub root@172.16.211.3
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;Test SSH access&lt;/h4&gt;
&lt;pre&gt;&lt;code&gt;ssh -i ~/.ssh/ansible_ed25519 root@172.16.211.3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run a quick Ansible test:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ansible baseimage -m ping
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Example output:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-7.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[warningbox title=&quot;Tips&quot;]&lt;/p&gt;
&lt;h4&gt;Tips: Why do we need ansible_ssh_private_key_file in &lt;code&gt;/etc/ansible/hosts&lt;/code&gt;?&lt;/h4&gt;
&lt;p&gt;Without it, you might see the following output in the ping test:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./image-9.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;[/warningbox]&lt;/p&gt;
&lt;p&gt;[infobox title=&quot;Preview of next post&quot;]&lt;/p&gt;
&lt;p&gt;I have written another Ansible script to sync a specific account&apos;s SSH key to a target machine!&lt;/p&gt;
&lt;p&gt;You will see it soon in an upcoming Kubernetes tutorial post!&lt;/p&gt;
&lt;p&gt;[/infobox]&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Create configure_vm.yml Script&lt;/h2&gt;
&lt;p&gt;Think about it—cloning the base image is easy, but manually setting the &lt;strong&gt;hostname, network, and other configs&lt;/strong&gt; for every VM? &lt;strong&gt;No thanks!&lt;/strong&gt; That’s way too much repetitive work. 😵‍💫 I can&apos;t tolerate such tedium in my Kubernetes tutorial!&lt;/p&gt;
&lt;p&gt;So, being the efficiency-loving geek that I am, I wrote a script at:&lt;br /&gt;
📌 &lt;strong&gt;&lt;code&gt;/opt/share_tools/bin/configure_vm.yml&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With this, after cloning this base image for our Kubernetes cluster setup, I can just feed in an input file, run the script, and &lt;strong&gt;boom&lt;/strong&gt;—it automatically configures each VM with the right settings. Less typing, fewer mistakes, and more time for the fun stuff. Let’s put this script to work! 🚀&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;---
- hosts: localhost
  gather_facts: no
  vars:
    input_file: &quot;{{ input_file_path | default(&apos;input.json&apos;) }}&quot;
    config: &quot;{{ lookup(&apos;file&apos;, input_file) | from_json }}&quot;
    ansible_key_path: &quot;{{ config.ansible_key_path | default(&apos;~/.ssh/ansible_ed25519&apos;) }}&quot;
    ssh_key_path: &quot;{{ config.ssh_key_path | default(&apos;~/.ssh/ssh_ed25519&apos;) }}&quot;

  tasks:
    # Handle Ansible SSH Key
    - name: Check if Ansible SSH private key exists
      stat:
        path: &quot;{{ ansible_key_path }}&quot;
      register: ansible_key_exists

    - name: Remove existing Ansible SSH private key if present
      file:
        path: &quot;{{ ansible_key_path }}&quot;
        state: absent
      when: ansible_key_exists.stat.exists

    - name: Remove existing Ansible SSH public key if present
      file:
        path: &quot;{{ ansible_key_path }}.pub&quot;
        state: absent
      when: ansible_key_exists.stat.exists

    - name: Generate Ansible SSH key pair
      ansible.builtin.openssh_keypair:
        path: &quot;{{ ansible_key_path }}&quot;
        type: ed25519
        state: present
        comment: &quot;ansible@{{ config.hostname }}&quot;

    # Handle SSH Connection Key
    - name: Check if SSH private key exists
      stat:
        path: &quot;{{ ssh_key_path }}&quot;
      register: ssh_key_exists

    - name: Remove existing SSH private key if present
      file:
        path: &quot;{{ ssh_key_path }}&quot;
        state: absent
      when: ssh_key_exists.stat.exists

    - name: Remove existing SSH public key if present
      file:
        path: &quot;{{ ssh_key_path }}.pub&quot;
        state: absent
      when: ssh_key_exists.stat.exists

    - name: Generate SSH key pair for SSH connection
      ansible.builtin.openssh_keypair:
        path: &quot;{{ ssh_key_path }}&quot;
        type: ed25519
        state: present
        comment: &quot;ssh@{{ config.hostname }}&quot;

    - name: Debug the resolved SSH key paths for verification
      debug:
        msg: |
          The Ansible SSH key path is {{ ansible_key_path }}
          The SSH connection key path is {{ ssh_key_path }}

    # Network and Hostname Configuration
    - name: Set IP address and gateway using nmcli
      command: &quot;nmcli con mod ens160 ipv4.addresses {{ config.ip }}/{{ config.subnet }} ipv4.gateway {{ config.gateway }} ipv4.dns &apos;{{ config.dns1 }} {{ config.dns2 }}&apos; ipv4.method manual&quot;
      ignore_errors: yes

    - name: Bring up the connection
      command: nmcli con up ens160
      ignore_errors: yes

    - name: Set the hostname
      command: hostnamectl set-hostname &quot;{{ config.hostname }}&quot;

    - name: Update /etc/hosts - remove baseimage
      lineinfile:
        path: /etc/hosts
        regexp: &apos;baseimage&apos;
        state: absent

    - name: Update /etc/hosts - add new hostname
      lineinfile:
        path: /etc/hosts
        line: &quot;{{ config.ip }} {{ config.hostname }}.{{ config.domain }} {{ config.hostname }}&quot;
        state: present

    - name: Update /etc/zshenv to set ANSIBLE_CONFIG environment variable
      lineinfile:
        path: /etc/zshenv
        line: &quot;export ANSIBLE_CONFIG=/etc/ansible/ansible.cfg&quot;
        create: yes

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This must be the longest script in the Kubernetes tutorial so far!&lt;/p&gt;
&lt;p&gt;It&apos;s actually simple. By the way, I will always show the complete code in my Kubernetes tutorial, so no need to worry about missing anything. If you spot a gap, leave a comment to let me know!&lt;/p&gt;
&lt;p&gt;It configures the VM by setting up SSH keys, network settings, hostname, and environment variables. It reads configuration details from a JSON input file (&lt;code&gt;input.json&lt;/code&gt; by default) and applies the following steps:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. SSH Key Management&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ensures that both the &lt;strong&gt;Ansible SSH keys&lt;/strong&gt; (remember, I generated a separate key pair for Ansible) and the &lt;strong&gt;regular SSH keys&lt;/strong&gt; are properly configured:
&lt;ul&gt;
&lt;li&gt;Removes any existing keys, i.e. the ones inherited from the base image.&lt;/li&gt;
&lt;li&gt;Generates new &lt;strong&gt;Ed25519&lt;/strong&gt; SSH key pairs for &lt;strong&gt;Ansible automation&lt;/strong&gt; and &lt;strong&gt;regular SSH access&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
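&lt;p&gt;For intuition, the two &lt;code&gt;openssh_keypair&lt;/code&gt; tasks boil down to roughly the following &lt;code&gt;ssh-keygen&lt;/code&gt; invocation. This is a standalone sketch of mine (the temp directory and comment are placeholders, not from the playbook):&lt;/p&gt;

```shell
# Roughly what ansible.builtin.openssh_keypair does for each key:
# generate an ed25519 key pair non-interactively, with a comment.
tmp=$(mktemp -d)
ssh-keygen -t ed25519 -N "" -C "ansible@demo-host" -f "$tmp/ansible_ed25519" -q
ls "$tmp"
```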
&lt;p&gt;&lt;strong&gt;2. Network &amp;amp; Hostname Configuration&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configures the machine&apos;s &lt;strong&gt;IP address, gateway, and DNS&lt;/strong&gt; using &lt;code&gt;nmcli&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Brings up the modified network connection.&lt;/li&gt;
&lt;li&gt;Sets the machine&apos;s &lt;strong&gt;hostname&lt;/strong&gt; using &lt;code&gt;hostnamectl&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Updates &lt;code&gt;/etc/hosts&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;Removes any references to &lt;code&gt;baseimage&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Adds a new entry for the machine&apos;s IP and domain.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;3. Environment Variable Setup&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Ensures that the &lt;strong&gt;Ansible configuration path&lt;/strong&gt; is set in &lt;code&gt;/etc/zshenv&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
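&lt;p&gt;For reference, here is a sketch of the JSON input file the playbook consumes. The field names mirror the &lt;code&gt;config.*&lt;/code&gt; lookups in the playbook above; the values are illustrative placeholders:&lt;/p&gt;

```shell
# Write a sample input.json; every key below is read by configure_vm.yml
# (hostname, ip, subnet, gateway, dns1, dns2, domain, plus the two
# optional key paths). Values here are placeholders.
printf '%s\n' \
  '{' \
  '  "hostname": "myvm",' \
  '  "ip": "172.16.211.50",' \
  '  "subnet": "24",' \
  '  "gateway": "172.16.211.2",' \
  '  "dns1": "8.8.8.8",' \
  '  "dns2": "8.8.4.4",' \
  '  "domain": "dev.example.local",' \
  '  "ansible_key_path": "~/.ssh/ansible_ed25519",' \
  '  "ssh_key_path": "~/.ssh/ssh_ed25519"' \
  '}' > /tmp/sample_input.json

# Sanity-check that the file parses as JSON before feeding the playbook
python3 -m json.tool /tmp/sample_input.json
```

&lt;p&gt;The playbook would then be invoked with &lt;code&gt;-e &quot;input_file_path=/tmp/sample_input.json&quot;&lt;/code&gt;.&lt;/p&gt;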
&lt;p&gt;[warningbox title=&quot;Warning&quot;]&lt;/p&gt;
&lt;p&gt;Remember the &lt;a href=&quot;/devops/kubernetes/ultimate-setup-tutorial-part1/#Tips_Network_Interface_Names&quot;&gt;Network Interface Name&lt;/a&gt;? You need to update &lt;code&gt;ens160&lt;/code&gt; in the above script to your own network interface name!&lt;/p&gt;
&lt;p&gt;My bad! I should have parameterized it in the script!&lt;/p&gt;
&lt;p&gt;[/warningbox]&lt;/p&gt;
&lt;p&gt;This script is designed for our initial VM provisioning, ensuring SSH access, correct network configuration, and proper hostname resolution. It really makes our Kubernetes cluster setup easier!&lt;/p&gt;
&lt;p&gt;It&apos;s fantastic!&lt;/p&gt;
&lt;h2&gt;Base VM/Image of Kubernetes is done! Clean Up!&lt;/h2&gt;
&lt;p&gt;So now our base VM for Kubernetes is ready. You think that&apos;s the end of this Kubernetes tutorial?! No way! It&apos;s just the start!&lt;/p&gt;
&lt;p&gt;Since we will clone it to new VMs, let&apos;s clean up the logs and stale configuration.&lt;/p&gt;
&lt;p&gt;I created below script &lt;code&gt;/opt/share_tools/bin/clean_up.sh&lt;/code&gt; to do the clean up job!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;#!/bin/bash

echo &quot;Starting system cleanup...&quot;

# Remove all non-builtin users except &apos;admin&apos;, &apos;nobody&apos; and reserved users
USERS=$(awk -F: &apos;($3 &amp;gt;= 1000 &amp;amp;&amp;amp; $1 != &quot;admin&quot; &amp;amp;&amp;amp; $1 != &quot;nobody&quot;) {print $1}&apos; /etc/passwd)
for USER in $USERS; do
    echo &quot;Deleting user: $USER&quot;
    userdel -r $USER
done

# Clean up system logs and temporary files
log_dirs=(
    &quot;/var/log&quot;
    &quot;/var/tmp&quot;
    &quot;/tmp&quot;
)

# Find and delete log and temp files, and print deleted files
for dir in &quot;${log_dirs[@]}&quot;; do
    echo &quot;Cleaning directory: $dir&quot;
    find &quot;$dir&quot; -type f -name &quot;*.log&quot; -print -exec rm -f {} \;
    find &quot;$dir&quot; -type f -name &quot;*.tmp&quot; -print -exec rm -f {} \;
done

echo &quot;Cleaning up package manager cache...&quot;
dnf clean all

echo &quot;Rotating and cleaning journal logs...&quot;
journalctl --rotate
journalctl --vacuum-time=1s

# Remove all non-hidden files under /root except anaconda-ks.cfg
echo &quot;Keeping anaconda-ks.cfg and removing other non-hidden files under /root...&quot;
find /root/ -maxdepth 1 -type f ! -name &quot;anaconda-ks.cfg&quot; -not -name &quot;.*&quot; -print -exec rm -f {} \;
rm -frv /root/.cache 
echo &quot;&quot; &amp;gt; /root/.zsh_history

# Remove all non-hidden files under /home/admin/
echo &quot;Removing all non-hidden files under /home/admin/...&quot;
find /home/admin/ -maxdepth 1 -type f -not -path &apos;*/\.*&apos; -print -exec rm -f {} \;

# Clean up command history
&amp;gt; /home/admin/.bash_history
&amp;gt; /home/admin/.zsh_history
&amp;gt; /root/.bash_history
&amp;gt; /root/.zsh_history
echo &quot;System cleanup complete.&quot;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Just run it once before we shutdown this base VM:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/opt/share_tools/bin/clean_up.sh
&lt;/code&gt;&lt;/pre&gt;
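&lt;p&gt;If you want to double-check what the log cleanup would delete before running it for real, a small dry-run helper (my own addition, not part of &lt;code&gt;clean_up.sh&lt;/code&gt;) can list the matching files without removing anything:&lt;/p&gt;

```shell
# List the *.log / *.tmp files clean_up.sh would delete, without deleting.
dry_run_clean() {
    for dir in "$@"; do
        find "$dir" -type f \( -name "*.log" -o -name "*.tmp" \) -print 2>/dev/null
    done
}

# Demo against a throwaway directory instead of the real log dirs
mkdir -p /tmp/clean_demo
touch /tmp/clean_demo/app.log /tmp/clean_demo/keep.conf
dry_run_clean /tmp/clean_demo
```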
&lt;hr /&gt;
&lt;h1&gt;Hooray!&lt;/h1&gt;
&lt;p&gt;Spent several days crafting this &lt;strong&gt;Part 1&lt;/strong&gt; post for my kubernetes tutorial — because if I’m doing this, I’m doing it right. My mission? To deliver the &lt;strong&gt;best damn Kubernetes cluster setup tutorial&lt;/strong&gt; on the internet! 🚀&lt;/p&gt;
&lt;p&gt;Up next, in my Kubernetes tutorial &lt;strong&gt;Part 2&lt;/strong&gt;, I’ll walk you through setting up a &lt;strong&gt;localserver&lt;/strong&gt; to handle &lt;strong&gt;DNS and NTP services&lt;/strong&gt; within our Kubernetes cluster environment, laying the foundation for a &lt;strong&gt;fully functional Kubernetes cluster&lt;/strong&gt;. With some luck (and zero typos in config files), we’ll have our nodes talking to each other in no time. Stay tuned! 😎&lt;/p&gt;
&lt;p&gt;:::info
Love my Kubernetes tutorial? Rate it 5 stars!
:::&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, the current post.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, I walked through &lt;strong&gt;configuring a local DNS server and NTP server&lt;/strong&gt;, essential for stable name resolution and time synchronization across nodes. These foundational steps will make our Kubernetes setup smoother.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt;, I finished the Kubernetes cluster setup with Flannel, with one Kubernetes master and 4 worker nodes ready for real workloads.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt;, I explored NodePort and ClusterIP, covering the key differences, use cases, and when to choose each for internal and external service access! 🔥&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/externalname-loadbalancer-5&quot;&gt;Part 5&lt;/a&gt;&lt;/strong&gt;, I explored how to use ExternalName and LoadBalancer Services and how to run load testing with the &lt;code&gt;hey&lt;/code&gt; tool.
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Ultimate Kubernetes Tutorial Part 2: DNS server and NTP server Configuration</title><link>https://geekcoding101.com/posts/tutorial-part2-dns-server-ntp</link><guid isPermaLink="true">https://geekcoding101.com/posts/tutorial-part2-dns-server-ntp</guid><pubDate>Mon, 03 Mar 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Hey there! Ready to take this Kubernetes setup to the next level? 🚀 In &lt;a href=&quot;/devops/kubernetes/ultimate-setup-tutorial-part1/&quot;&gt;&lt;strong&gt;Part 1&lt;/strong&gt;&lt;/a&gt;, we got our &lt;strong&gt;base VM image&lt;/strong&gt; up and running—nice work! Now, in &lt;strong&gt;Part 2&lt;/strong&gt;, I am going to clone that image to set up a &lt;strong&gt;local server&lt;/strong&gt; as a &lt;strong&gt;&lt;a href=&quot;https://www.cloudflare.com/learning/dns/what-is-a-dns-server/&quot;&gt;DNS server&lt;/a&gt; and &lt;a href=&quot;https://tf.nist.gov/tf-cgi/servers.cgi&quot;&gt;NTP server&lt;/a&gt;&lt;/strong&gt;. I considered also covering the Kubernetes master and worker node setup here, but that felt like too much for one post. Anyway, a real cluster is coming soon! 😎&lt;/p&gt;
&lt;p&gt;Excited? Let’s dive in and make some magic happen. 🔥&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;Create &lt;code&gt;localserver&lt;/code&gt; VM&lt;/h1&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;This &lt;strong&gt;DNS server&lt;/strong&gt; isn’t meant to replace CoreDNS, which is used inside Kubernetes for service discovery. Instead, it’s a &lt;strong&gt;local DNS server&lt;/strong&gt; for VMs to resolve hostnames within the &lt;strong&gt;private network&lt;/strong&gt;. This ensures that all nodes (master and workers) can communicate using &lt;strong&gt;hostnames instead of IP addresses&lt;/strong&gt;, making cluster management smoother. 🚀&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;h2&gt;Clone from Base Image Rocky 9&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;vmrun clone /Users/geekcoding101.com/Virtual\ Machines.localized/baseimage-rocky9.vmwarevm/baseimage-rocky9.vmx  /Users/geekcoding101.com/Virtual\ Machines.localized/localserver.vmwarevm/localserver.vmx full
sed -i &apos;&apos; &apos;s/displayName = &quot;Clone of baseimage-rocky9&quot;/displayName = &quot;localserver&quot;/&apos; &quot;/Users/geekcoding101.com/Virtual Machines.localized/localserver.vmwarevm/localserver.vmx&quot;
cat &quot;/Users/geekcoding101.com/Virtual Machines.localized/localserver.vmwarevm/localserver.vmx&quot; | grep disp

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The above commands clone the base VM image (display name in VMware Fusion is &lt;code&gt;Clone of baseimage-rocky9&lt;/code&gt;) as a new one, then update the display name of the new VM to &lt;code&gt;localserver&lt;/code&gt; instead of &lt;code&gt;Clone of baseimage-rocky9&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now, you probably need to run a &lt;strong&gt;scan&lt;/strong&gt; in VMware Fusion to see the newly added VM:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./VMFusion-scan-new-vm.webp&quot; alt=&quot;VMFusion scan new vm&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Customize the Local Server VM&lt;/h2&gt;
&lt;p&gt;First, stop the &lt;code&gt;baseimage&lt;/code&gt; VM and start the &lt;code&gt;localserver&lt;/code&gt; VM to avoid network conflict.&lt;/p&gt;
&lt;p&gt;Now we can SSH as &lt;code&gt;root&lt;/code&gt; into the &lt;code&gt;localserver&lt;/code&gt; VM using the IP &lt;code&gt;172.16.211.3&lt;/code&gt; of the &lt;code&gt;base VM&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Remember the script &lt;code&gt;/opt/share_tools/bin/configure_vm.yml&lt;/code&gt; we created in &lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Ultimate Kubernetes Tutorial - Setting Up a Thriving Multi-Node Cluster on Mac: Part 1&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let&apos;s prepare the input file &lt;code&gt;/opt/share_tools/init_data/localserver_vm_input.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;hostname&quot;: &quot;localserver&quot;,
  &quot;ip&quot;: &quot;172.16.211.100&quot;,
  &quot;subnet&quot;: &quot;24&quot;,
  &quot;gateway&quot;: &quot;172.16.211.2&quot;,
  &quot;dns1&quot;: &quot;8.8.8.8&quot;,
  &quot;dns2&quot;: &quot;8.8.4.4&quot;,
  &quot;domain&quot;: &quot;dev.geekcoding101local.com&quot;,
  &quot;ansible_key_path&quot;: &quot;~/.ssh/ansible_ed25519&quot;,
  &quot;ssh_key_path&quot;: &quot;~/.ssh/ssh_ed25519&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I would suggest you now &lt;strong&gt;use the VMware Fusion console to run the following command&lt;/strong&gt; instead of an SSH terminal, because it will change the IP and might interrupt the SSH connection, causing the script to fail:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ansible-playbook /opt/share_tools/bin/configure_vm.yml -e &quot;input_file_path=/opt/share_tools/init_data/localserver_vm_input.json&quot;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As the input JSON file suggests, the script will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Update hostname to &lt;code&gt;localserver&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Configure the network interface defined in the script (in my case &lt;code&gt;ens160&lt;/code&gt;) with the IP &lt;code&gt;172.16.211.100&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Generate SSH keys for both Ansible and normal SSH&lt;/li&gt;
&lt;li&gt;Apply other necessary settings&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once done, you should be able to connect via SSH with the new IP address.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Setting Up DNS Server&lt;/h2&gt;
&lt;p&gt;Now that the localserver is up, let&apos;s install and configure the &lt;code&gt;DNS server&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dnf update -y
dnf install bind -y

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let&apos;s start updating the &lt;code&gt;BIND&lt;/code&gt; configuration.&lt;/p&gt;
&lt;p&gt;First one is &lt;code&gt;/etc/named.conf&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ cat /etc/named.conf
//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS
// server as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//

options {
        listen-on port 53 { 127.0.0.1; 172.16.211.100; };
        listen-on-v6 port 53 { ::1; };
        directory     &quot;/var/named&quot;;
        dump-file     &quot;/var/named/data/cache_dump.db&quot;;
        statistics-file &quot;/var/named/data/named_stats.txt&quot;;
        memstatistics-file &quot;/var/named/data/named_mem_stats.txt&quot;;
        secroots-file    &quot;/var/named/data/named.secroots&quot;;
        recursing-file    &quot;/var/named/data/named.recursing&quot;;
        allow-query     { any; };
        forwarders {
          8.8.8.8;  # Google&apos;s DNS as a fallback
        };

        /*
         - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
         - If you are building a RECURSIVE (caching) DNS server, you need to enable
           recursion.
         - If your recursive DNS server has a public IP address, you MUST enable access
           control to limit queries to your legitimate users. Failing to do so will
           cause your server to become part of large scale DNS amplification
           attacks. Implementing BCP38 within your network would greatly
           reduce such attack surface
        */
        recursion yes;

        dnssec-validation yes;

        managed-keys-directory &quot;/var/named/dynamic&quot;;
        geoip-directory &quot;/usr/share/GeoIP&quot;;

        pid-file &quot;/run/named/named.pid&quot;;
        session-keyfile &quot;/run/named/session.key&quot;;

        /* https://fedoraproject.org/wiki/Changes/CryptoPolicy */
        include &quot;/etc/crypto-policies/back-ends/bind.config&quot;;
};

logging {
        channel default_debug {
                file &quot;data/named.run&quot;;
                severity dynamic;
        };
};

zone &quot;.&quot; IN {
    type hint;
    file &quot;named.ca&quot;;
};
zone &quot;dev.geekcoding101local.com&quot; IN {
    type master;
    file &quot;/var/named/dev.geekcoding101local.com.zone&quot;;
    allow-update { none; };
};
zone &quot;211.16.172.in-addr.arpa&quot; IN {
    type master;
    file &quot;/var/named/211.16.172.in-addr.arpa.zone&quot;;
    allow-update { none; };
};

include &quot;/etc/named.rfc1912.zones&quot;;
include &quot;/etc/named.root.key&quot;;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Second file is DNS zone file &lt;code&gt;/var/named/dev.geekcoding101local.com.zone&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ cat /var/named/dev.geekcoding101local.com.zone
$TTL 86400
@   IN  SOA ns1.dev.geekcoding.com. root.dev.geekcoding101local.com. (
            2024010103  ; Serial
            3600        ; Refresh
            1800        ; Retry
            1209600     ; Expire
            86400 )     ; Minimum TTL

@   IN  NS  localserver.dev.geekcoding101local.com.
localserver IN A 172.16.211.100  ; IP of localserver DNS

k8s-1 IN A 172.16.211.11
k8s-2 IN A 172.16.211.12
k8s-3 IN A 172.16.211.13
k8s-4 IN A 172.16.211.14
k8s-5 IN A 172.16.211.15

devbox IN A 172.16.211.99

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Third file is &lt;code&gt;/var/named/211.16.172.in-addr.arpa.zone&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ cat /var/named/211.16.172.in-addr.arpa.zone
$TTL 86400
@   IN  SOA ns1.dev.geekcoding.com. root.dev.geekcoding101local.com. (
            2024010103  ; Serial
            3600        ; Refresh
            1800        ; Retry
            1209600     ; Expire
            86400 )     ; Minimum TTL

@   IN  NS  localserver.dev.geekcoding101local.com.

; PTR Records
11  IN  PTR k8s-1.dev.geekcoding101local.com.
12  IN  PTR k8s-2.dev.geekcoding101local.com.
13  IN  PTR k8s-3.dev.geekcoding101local.com.
14  IN  PTR k8s-4.dev.geekcoding101local.com.
15  IN  PTR k8s-5.dev.geekcoding101local.com.
99  IN  PTR devbox.dev.geekcoding101local.com.
100 IN  PTR localserver.dev.geekcoding101local.com.

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::warning&lt;/p&gt;
&lt;p&gt;Without the reverse zone, you will hit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ nslookup 172.16.211.100
** server can&apos;t find 100.211.16.172.in-addr.arpa: NXDOMAIN
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::&lt;/p&gt;
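&lt;p&gt;The reverse zone&apos;s name looks cryptic, but it is mechanical: take the network part of the subnet (&lt;code&gt;172.16.211&lt;/code&gt; for our &lt;code&gt;/24&lt;/code&gt;), reverse the octets, and append &lt;code&gt;in-addr.arpa&lt;/code&gt;. A tiny sketch of mine to illustrate:&lt;/p&gt;

```shell
# Derive the in-addr.arpa reverse-zone name for a /24 network:
# reverse the first three octets and append the in-addr.arpa suffix.
ip_to_ptr_zone() {
    echo "$1" | awk -F. '{print $3"."$2"."$1".in-addr.arpa"}'
}

ip_to_ptr_zone 172.16.211.0   # the zone name used in named.conf above
```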
&lt;p&gt;Let&apos;s run a test on the files for syntax check:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;named-checkzone dev.geekcoding101local.com /var/named/dev.geekcoding101local.com.zone
named-checkzone 211.16.172.in-addr.arpa /var/named/211.16.172.in-addr.arpa.zone
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./named-check-pass.webp&quot; alt=&quot;named-checkzone command pass on kubernetes cluster to test DNS server&quot; title=&quot;named-checkzone command pass on kubernetes cluster to test DNS server&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Restart DNS Service:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;systemctl restart named
systemctl status named
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./named-status.png&quot; alt=&quot;named service status for DNS server&quot; title=&quot;named service status for DNS server&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;You might see an &lt;code&gt;Unable to fetch DNSKEY&lt;/code&gt; error above; we can ignore it, as we don&apos;t need DNSKEY.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;p&gt;Let&apos;s run a &lt;code&gt;nslookup&lt;/code&gt; test:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./nslookup-test-localserver.webp&quot; alt=&quot;DNS server nslookup test localserver in kubernetes environment&quot; title=&quot;DNS server nslookup test localserver in kubernetes environment&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::warning&lt;/p&gt;
&lt;p&gt;Every time after modifying a DNS zone file, we need to increment the serial number.&lt;/p&gt;
&lt;p&gt;The serial number is in the format YYYYMMDD##.&lt;/p&gt;
&lt;p&gt;For example, if the current serial number is 2024010101, and you&apos;re making a second change on the same day, update it to 2024010102.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
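&lt;p&gt;To avoid fat-fingering the serial, a one-liner (my own convenience helper, not part of the setup) can generate it in the &lt;code&gt;YYYYMMDD##&lt;/code&gt; form:&lt;/p&gt;

```shell
# Build a zone serial from today's date plus a two-digit change counter.
new_serial() {
    printf '%s%02d\n' "$(date +%Y%m%d)" "$1"
}

new_serial 1   # first change of the day
new_serial 2   # second change of the day
```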
&lt;hr /&gt;
&lt;h2&gt;Setting Up NTP Server&lt;/h2&gt;
&lt;p&gt;To keep time synchronized across all nodes even if the internet connection drops (we&apos;re running all nodes on my laptop, after all), let&apos;s set up the NTP server &lt;code&gt;Chrony&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;:::info&lt;/p&gt;
&lt;p&gt;&lt;code&gt;Chrony&lt;/code&gt; is an implementation of the &lt;a href=&quot;https://en.wikipedia.org/wiki/Network_Time_Protocol&quot;&gt;Network Time Protocol (NTP)&lt;/a&gt;. It is an alternative to &lt;a href=&quot;https://linux.die.net/man/8/ntpd&quot;&gt;ntpd&lt;/a&gt;, a reference implementation of NTP.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dnf install chrony -y

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Modify the configuration; it&apos;s actually just a one-line change:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[root@localserver ~]# cat /etc/chrony.conf
...
allow 172.16.211.0/24
...

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Start the NTP Service:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;systemctl restart chronyd
systemctl status chronyd

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Verify with:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ chronyc sources -v

  .-- Source mode  &apos;^&apos; = server, &apos;=&apos; = peer, &apos;#&apos; = local clock.
 / .- Source state &apos;*&apos; = current best, &apos;+&apos; = combined, &apos;-&apos; = not combined,
| /             &apos;x&apos; = may be in error, &apos;~&apos; = too variable, &apos;?&apos; = unusable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? 65-100-46-164.dia.static&amp;gt;     1   6   377    16    -33ms[  -33ms] +/- 47ms
^? ntp3.radio-sunshine.org       2   6   377    15    -81ms[  -81ms] +/- 120ms
^? server.slakjd.com             3   6   377    16    -71ms[  -71ms] +/- 44ms
^? kjsl-fmt2-net.fmt2.kjsl.&amp;gt;     2   6   377    16    -65ms[  -65ms] +/- 8139us
^? localserver.dev.geekcodi&amp;gt;     0   6     0     - +0ns[   +0ns] +/- 0ns
[root@localserver ~]#
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;/p&gt;
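&lt;p&gt;On the cluster nodes (configured in later parts), the client side is presumably just a matter of pointing &lt;code&gt;chrony&lt;/code&gt; at this server instead of the default pool. A sketch of a minimal client-side config, written to a temp file here rather than the real &lt;code&gt;/etc/chrony.conf&lt;/code&gt;:&lt;/p&gt;

```shell
# Minimal client-side chrony.conf for a cluster node (sketch):
# use localserver as the only time source.
cfg=/tmp/chrony-client.conf
printf '%s\n' \
  'server 172.16.211.100 iburst' \
  'driftfile /var/lib/chrony/drift' \
  'makestep 1.0 3' > "$cfg"
cat "$cfg"
```

&lt;p&gt;On a real node you would put those lines in &lt;code&gt;/etc/chrony.conf&lt;/code&gt;, then &lt;code&gt;systemctl restart chronyd&lt;/code&gt; and verify with &lt;code&gt;chronyc sources&lt;/code&gt;.&lt;/p&gt;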
&lt;hr /&gt;
&lt;h1&gt;Wrapping Up&lt;/h1&gt;
&lt;p&gt;At this point, our &lt;code&gt;localserver&lt;/code&gt; is now running &lt;strong&gt;DNS and NTP services&lt;/strong&gt; 🚀&lt;/p&gt;
&lt;p&gt;In &lt;strong&gt;Part 3&lt;/strong&gt;, I will:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configure a &lt;strong&gt;Kubernetes base image&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Spin up the master node and 4 worker nodes with the Kubernetes base image&lt;/li&gt;
&lt;li&gt;Setup the &lt;strong&gt;K8s Master Node&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Join the worker nodes to the cluster&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stay tuned, and let’s keep this cluster rolling! 🚀🔥&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, I laid out the &lt;strong&gt;networking plan&lt;/strong&gt;, my &lt;strong&gt;goals for setting up Kubernetes&lt;/strong&gt;, and how to &lt;strong&gt;prepare a base VM image&lt;/strong&gt; for the cluster.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, current post.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt;, I finished the Kubernetes cluster setup with Flannel, with one Kubernetes master and 4 worker nodes ready for real workloads.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt;, I explored NodePort and ClusterIP, covering the key differences, use cases, and when to choose each for internal and external service access! 🔥&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/externalname-loadbalancer-5&quot;&gt;Part 5&lt;/a&gt;&lt;/strong&gt;, I explored how to use ExternalName and LoadBalancer Services and how to run load testing with the &lt;code&gt;hey&lt;/code&gt; tool.
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Ultimate Kubernetes Tutorial Part 3: A Streamlined Kubernetes cluster setup</title><link>https://geekcoding101.com/posts/part3-kubernetes-cluster-setup</link><guid isPermaLink="true">https://geekcoding101.com/posts/part3-kubernetes-cluster-setup</guid><pubDate>Sun, 09 Mar 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Welcome back to the &lt;a href=&quot;/tags/kubernetes&quot;&gt;&lt;strong&gt;Kubernetes tutorial&lt;/strong&gt;&lt;/a&gt; series! Now that our &lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;base image&lt;/a&gt; and &lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;local server&lt;/a&gt; are ready, it’s time for the real action—&lt;strong&gt;Kubernetes cluster setup with Flannel&lt;/strong&gt;. I&apos;ll spin up one &lt;strong&gt;Kubernetes master and 4 worker nodes&lt;/strong&gt;, forming a &lt;strong&gt;local Kubernetes cluster&lt;/strong&gt; that’s ready for real workloads. No more theory—let’s build something real! 🚀&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;Clone baseimage to k8s-1 as The Kubernetes VM Base Image&lt;/h1&gt;
&lt;p&gt;Before jumping into our Kubernetes cluster setup, let&apos;s start from my &lt;strong&gt;Mac&apos;s terminal&lt;/strong&gt; and clone the Rocky 9 base image as &lt;code&gt;k8s-1&lt;/code&gt;, which will serve as our Kubernetes base image:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ vmrun clone /Users/geekcoding101.com/Virtual\ Machines.localized/baseimage-rocky9.vmwarevm/baseimage-rocky9.vmx  /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-1.vmwarevm/k8s-1.vmx full
❯ sed -i &apos;&apos; &apos;s/displayName = &quot;Clone of baseimage-rocky9&quot;/displayName = &quot;k8s-1&quot;/&apos; &quot;/Users/geekcoding101.com/Virtual Machines.localized/k8s-1.vmwarevm/k8s-1.vmx&quot;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Make sure you&apos;ve stopped the baseimage VM, then start the &lt;code&gt;k8s-1&lt;/code&gt; VM.&lt;/p&gt;
&lt;p&gt;I covered these steps in detail in &lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp#Clone_from_Base_Image_Rocky_9&quot;&gt;Part 2&lt;/a&gt;. In short, after the above commands, rescan in VMware Fusion, SSH as &lt;code&gt;root&lt;/code&gt; into &lt;code&gt;k8s-1&lt;/code&gt; using the IP &lt;code&gt;172.16.211.3&lt;/code&gt; of the &lt;code&gt;base VM&lt;/code&gt;, and prepare the input file &lt;code&gt;/opt/share_tools/init_data/k8s-1_vm_input.json&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;{
  &quot;hostname&quot;: &quot;k8s-1&quot;,
  &quot;ip&quot;: &quot;172.16.211.11&quot;,
  &quot;subnet&quot;: &quot;24&quot;,
  &quot;gateway&quot;: &quot;172.16.211.2&quot;,
  &quot;dns1&quot;: &quot;172.16.211.100&quot;,
  &quot;dns2&quot;: &quot;8.8.8.8&quot;,
  &quot;domain&quot;: &quot;dev.geekcoding101local.com&quot;,
  &quot;ansible_key_path&quot;: &quot;~/.ssh/ansible_ed25519&quot;,
  &quot;ssh_key_path&quot;: &quot;~/.ssh/ssh_ed25519&quot;
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then log in to the VM via the VMware Fusion console and run the command below to generate SSH keys and set up networking:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ansible-playbook /opt/share_tools/bin/configure_vm.yml -e &quot;input_file_path=/opt/share_tools/init_data/k8s-1_vm_input.json&quot;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now I can connect via SSH passwordlessly using the new IP &lt;code&gt;172.16.211.11&lt;/code&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Test DNS&lt;/h2&gt;
&lt;p&gt;Note that we are testing our local DNS server here to make sure it works in our Kubernetes cluster setup; it is not going to replace CoreDNS.&lt;/p&gt;
&lt;p&gt;First, ensure the DNS server &lt;code&gt;localserver (172.16.211.100)&lt;/code&gt; we set up in &lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp#Clone_from_Base_Image_Rocky_9&quot;&gt;Part 2&lt;/a&gt; is running.&lt;/p&gt;
&lt;p&gt;Then ensure &lt;code&gt;172.16.211.100&lt;/code&gt; is at the top of &lt;code&gt;/etc/resolv.conf&lt;/code&gt;, the same as below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ cat /etc/resolv.conf
# Generated by NetworkManager
search localdomain dev.geekcoding101local.com
nameserver 172.16.211.100
nameserver 8.8.8.8
nameserver 172.16.68.2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;172.16.68.2&lt;/code&gt; is assigned via &lt;code&gt;ens224&lt;/code&gt;, the network adapter we added to the VM for internet access, because &lt;strong&gt;VMware Fusion&lt;/strong&gt; typically assigns &lt;strong&gt;&lt;code&gt;172.16.68.1&lt;/code&gt; and &lt;code&gt;172.16.68.2&lt;/code&gt; as DNS servers&lt;/strong&gt; for virtual machines when using &lt;strong&gt;NAT (Network Address Translation)&lt;/strong&gt; networking.&lt;/p&gt;
&lt;p&gt;Test DNS and hostname as below as &lt;code&gt;root&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nslookup k8s-1
nslookup k8s-1.dev.geekcoding101local.com
hostname -f
hostname -s

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./image-1.png&quot; alt=&quot;Kubernetes cluster setup: Test DNS&quot; title=&quot;Kubernetes cluster setup: Test DNS&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::warning
However, if you have &lt;code&gt;172.16.68.2&lt;/code&gt; at the top of &lt;code&gt;/etc/resolv.conf&lt;/code&gt;, you would hit:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./nslookup-fail.webp&quot; alt=&quot;nslookup fail on k8s-1&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I don&apos;t recommend to manually update &lt;code&gt;/etc/resolv.conf&lt;/code&gt;  to fix it as you&apos;ve seen &quot;Generated by NetworkManager&quot;.&lt;/p&gt;
&lt;p&gt;The reason &lt;code&gt;172.16.68.2&lt;/code&gt; ends up ahead of &lt;code&gt;172.16.211.100&lt;/code&gt; is the order in which the nmcli commands were run against the network adapters &lt;code&gt;ens160&lt;/code&gt; and &lt;code&gt;ens224&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Now if we shut down &lt;code&gt;ens224&lt;/code&gt; and bring it up again, &lt;code&gt;172.16.68.2&lt;/code&gt; will move to the bottom of &lt;code&gt;/etc/resolv.conf&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nmcli dev down ens224
nmcli dev up ens224

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./nslookup-start-work.webp&quot; alt=&quot;nslookup start work&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;Setup Docker&lt;/h1&gt;
&lt;p&gt;In the &lt;code&gt;k8s-1&lt;/code&gt; SSH session, we can now install the common packages required by the Kubernetes cluster setup.&lt;/p&gt;
&lt;p&gt;In this Kubernetes cluster setup, we will use &lt;code&gt;k8s-1&lt;/code&gt; as a base image, so we can easily clone it as &lt;code&gt;k8s-2&lt;/code&gt;, &lt;code&gt;k8s-3&lt;/code&gt;, &lt;code&gt;k8s-4&lt;/code&gt; and &lt;code&gt;k8s-5&lt;/code&gt; without repeating the common package installation!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dnf update -y

dnf config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
dnf install docker-ce docker-ce-cli containerd.io socat -y
systemctl enable --now docker

systemctl start docker
systemctl status docker

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./start-docker.webp&quot; alt=&quot;start docker&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::info
&lt;code&gt;socat&lt;/code&gt; is required by &lt;code&gt;kubeadm init&lt;/code&gt;, which we will run later; without it, you will hit a warning like this:&lt;img src=&quot;./socat-required.png&quot; alt=&quot;socat is required&quot; /&gt;
:::&lt;/p&gt;
&lt;p&gt;:::warning
&lt;a href=&quot;https://docs.docker.com/engine/install/&quot;&gt;&lt;code&gt;docker-ce&lt;/code&gt; and &lt;code&gt;docker-ce-cli&lt;/code&gt;&lt;/a&gt; are not strictly necessary: Docker support (via dockershim) was deprecated in Kubernetes 1.20 and later removed, as Kubernetes no longer relies on Docker as its container runtime. However, &lt;code&gt;Docker&lt;/code&gt; can still be useful in a development environment, offering a full containerization platform that includes tools for building images, managing networks and volumes, and simple orchestration.&lt;/p&gt;
&lt;p&gt;But installing it introduces a problem: the packaged &lt;code&gt;/etc/containerd/config.toml&lt;/code&gt; disables the CRI plugin like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;disabled_plugins = [&quot;cri&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This has an important impact on how containerd integrates with Kubernetes.&lt;/p&gt;
&lt;p&gt;The CRI plugin is specifically needed when we&apos;re using &lt;a href=&quot;https://containerd.io/&quot;&gt;&lt;strong&gt;containerd&lt;/strong&gt;&lt;/a&gt; as our container runtime for Kubernetes. It enables containerd to communicate with Kubernetes by implementing the &lt;strong&gt;Container Runtime Interface (CRI)&lt;/strong&gt;, which Kubernetes uses to manage containers. Docker itself does not need the CRI plugin, which is why the Docker installation above ships a config that disables it. But I&apos;ve already explained why I want Docker around, and I want &lt;a href=&quot;https://containerd.io/&quot;&gt;&lt;strong&gt;containerd&lt;/strong&gt;&lt;/a&gt; as our container runtime, so let&apos;s just fix the configuration issue.&lt;/p&gt;
&lt;p&gt;Just regenerate &lt;code&gt;config.toml&lt;/code&gt;, which enables the CRI plugin by default; &lt;code&gt;SystemdCgroup&lt;/code&gt; then needs to be manually set to &lt;code&gt;true&lt;/code&gt; with the commands below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i &apos;s/SystemdCgroup = false/SystemdCgroup = true/&apos; /etc/containerd/config.toml

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Kubernetes uses cgroups for managing resources like CPU, memory, and I/O for containers. If &lt;code&gt;SystemdCgroup = true&lt;/code&gt;, &lt;code&gt;containerd&lt;/code&gt; integrates with systemd to manage cgroups, which is the preferred method on most modern Linux distributions using systemd as their init system (e.g., Rocky Linux, etc.).&lt;/p&gt;
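&lt;p&gt;To double-check the regenerated file, here&apos;s a sketch that inspects a config.toml snippet for the two settings kubeadm cares about. The snippet below is a trimmed sample, not the full file; on the node, point the variable at the real &lt;code&gt;/etc/containerd/config.toml&lt;/code&gt;:&lt;/p&gt;

```shell
# Trimmed sample of a regenerated config; on the node use:
#   cfg=$(cat /etc/containerd/config.toml)
cfg='disabled_plugins = []
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true'

# containerd must use the systemd cgroup driver...
if printf '%s\n' "$cfg" | grep -q 'SystemdCgroup = true'; then
  cgroup_ok=yes
else
  cgroup_ok=no
fi

# ...and the cri plugin must not appear in disabled_plugins.
if printf '%s\n' "$cfg" | grep '^disabled_plugins' | grep -q cri; then
  cri_ok=no
else
  cri_ok=yes
fi
echo "systemd cgroups: $cgroup_ok, cri enabled: $cri_ok"
```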
&lt;p&gt;:::&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Execute &lt;code&gt;docker&lt;/code&gt; Commands Without sudo as a Non-root Account&lt;/h2&gt;
&lt;p&gt;If running as a non-root account, you will encounter this permission error:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[admin@k8s-01 ~]$ docker ps
permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get &quot;http://%2Fvar%2Frun%2Fdocker.sock/v1.47/containers/json&quot;: dial unix /var/run/docker.sock: connect: permission denied
[admin@k8s-01 ~]$
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So let&apos;s add the user to the &lt;code&gt;docker&lt;/code&gt; group:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[admin@k8s-1 ~]$ sudo usermod -aG docker $(whoami)

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Log out of the current session and log in again as &lt;code&gt;admin&lt;/code&gt;; &lt;code&gt;docker ps&lt;/code&gt; should work now.&lt;/p&gt;
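&lt;p&gt;To verify the group change took effect after re-login, here&apos;s a small sketch that checks for &lt;code&gt;docker&lt;/code&gt; in the account&apos;s group list. The group list is a sample; on the VM use &lt;code&gt;groups_out=$(id -nG)&lt;/code&gt;:&lt;/p&gt;

```shell
# Sample group list; on the VM use:
#   groups_out=$(id -nG)
groups_out='admin wheel docker'

# Group membership is only re-read at login, which is why a re-login is needed.
case " $groups_out " in
  *" docker "*) in_docker=yes ;;
  *) in_docker=no ;;
esac
echo "in docker group: $in_docker"
```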
&lt;p&gt;Let&apos;s test docker (or switch back to the &lt;code&gt;root&lt;/code&gt; account to run the test...):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker run hello-world

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./test-docker-command-as-admin.webp&quot; alt=&quot;test docker command as admin account&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;Configure OS To Support Kubernetes&lt;/h1&gt;
&lt;h2&gt;Disable swap for Kubernetes&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;swapoff -a
sed -i &apos;/swap/d&apos; /etc/fstab

&lt;/code&gt;&lt;/pre&gt;
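&lt;p&gt;To confirm the swap removal stuck, here&apos;s a small sketch that counts swap entries in fstab-style text. The lines below are a sample; on the node read the real &lt;code&gt;/etc/fstab&lt;/code&gt;, and also confirm that &lt;code&gt;swapon --show&lt;/code&gt; prints nothing:&lt;/p&gt;

```shell
# Sample fstab content after the sed edit; on the node use:
#   fstab=$(cat /etc/fstab)
fstab='/dev/mapper/rl-root / xfs defaults 0 0
UUID=0a1b2c3d /boot xfs defaults 0 0'

# Count any lines still mentioning swap; kubelet refuses to run with swap enabled.
swap_left=$(printf '%s\n' "$fstab" | grep -c swap)
echo "swap entries remaining: $swap_left"
```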
&lt;hr /&gt;
&lt;h2&gt;Configure Linux kernel&apos;s networking parameters&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;br_netfilter&lt;/code&gt;: Kubernetes uses network bridges to connect Pods, and the &lt;code&gt;br_netfilter&lt;/code&gt; module ensures that iptables can see and manipulate bridged traffic. This is essential for Kubernetes&apos; internal networking (such as inter-Pod communication and service routing).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;overlay&lt;/code&gt; module does not need to be added to &lt;code&gt;/etc/modules-load.d/k8s.conf&lt;/code&gt; on Rocky 9, because I found it already ships with the kernel:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ modinfo overlay
filename:       /lib/modules/5.14.0-427.37.1.el9_4.x86_64/kernel/fs/overlayfs/overlay.ko.xz
alias:          fs-overlay
license:        GPL
description:    Overlay filesystem
author:         Miklos Szeredi &amp;lt;miklos@szeredi.hu&amp;gt;
rhelversion:    9.4
srcversion:     6DB4565DD58AB453DBFAD2A
depends:
retpoline:      Y
intree:         Y
name:           overlay
vermagic:       5.14.0-427.37.1.el9_4.x86_64 SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         Rocky kernel signing key
sig_key:        52:A7:4C:F4:7A:B4:B1:12:D3:1E:72:33:0A:0D:49:8B:C3:34:88:DC
sig_hashalgo:   sha256
signature:      18:AF:F5:F2:12:80:5A:92:B3:5E:29:B2:A5:10:E8:27:90:73:B4:B2:
                25:B0:04:42:2B:28:FF:86:50:0D:82:CA:12:68:93:70:9F:04:C5:3C:
                19:B2:29:47:41:DD:7F:1D:33:18:33:B7:50:2C:30:A4:0D:CB:1E:53:
                4A:66:B8:BF:CB:41:F8:89:3E:5E:CA:63:8B:0C:2F:CD:42:AD:63:9D:
                C4:6A:31:FD:4B:46:0C:33:38:5A:BA:11:B0:66:76:BF:54:7B:B7:63:
                35:1B:76:52:D2:04:BF:83:65:A7:C6:0D:D1:CB:96:BF:60:37:54:37:
                3E:1B:76:69:9C:2F:8F:8D:81:21:88:33:96:EA:E6:C3:97:D1:1E:8F:
                BC:BD:70:82:27:2A:F3:8C:11:1D:AC:AC:13:00:F6:CD:00:BD:6C:3E:
                40:6F:F2:54:9C:E3:62:A7:17:78:4C:3C:43:A0:49:4D:61:FE:FD:A6:
                CD:51:5F:E6:F3:47:B7:70:D4:5E:55:3C:B8:8C:D5:45:81:6F:47:E4:
                80:39:E1:BA:0D:79:21:64:A6:7E:4D:ED:59:09:F1:26:D2:06:98:E5:
                EB:E5:B1:58:F5:AF:89:0B:0E:8B:65:EB:2A:83:30:48:FD:AC:48:AB:
                12:39:EF:3C:BB:DA:CC:26:F8:38:7F:C8:2D:15:7D:4D:3A:E6:8F:AA:
                AB:16:79:39:2D:2E:9D:5B:76:29:6F:BE:74:4E:65:F5:1F:01:43:58:
                DE:12:54:B5:C7:9E:A5:4C:B0:1D:5E:9B:05:AF:CF:B8:33:28:B4:8E:
                6E:A1:E1:58:7D:CC:F2:61:51:EA:B1:C0:BD:BE:02:56:43:6D:5A:67:
                D7:F0:25:02:91:70:74:AE:F4:6F:D3:E9:9A:1E:D0:DD:BA:C2:3C:B3:
                07:C4:F3:AD:37:63:6B:2B:B9:1D:FB:0B:CC:0B:B7:E3:14:EA:2E:28:
                D7:56:97:88:91:A5:3F:59:5D:21:7E:88:EA:AB:49:E3:3B:77:5B:F3:
                9F:56:EE:46
parm:           check_copy_up:Obsolete; does nothing
parm:           redirect_max:Maximum length of absolute redirect xattr value (ushort)
parm:           redirect_dir:Default to on or off for the redirect_dir feature (bool)
parm:           redirect_always_follow:Follow redirects even if redirect_dir feature is turned off (bool)
parm:           index:Default to on or off for the inodes index feature (bool)
parm:           nfs_export:Default to on or off for the NFS export feature (bool)
parm:           xino_auto:Auto enable xino feature (bool)
parm:           metacopy:Default to on or off for the metadata only copy up feature (bool)
❯ cd /lib/modules/$(uname -r)/kernel/fs/overlayfs
❯ ls
overlay.ko.xz

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;bridge-nf-call-iptables&lt;/code&gt;: Without this, if &lt;code&gt;iptables&lt;/code&gt; is not configured to handle bridged traffic, the network policies and traffic filtering between pods and services may not work correctly.&lt;/p&gt;
&lt;p&gt;Now apply the configuration described above:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ modprobe br_netfilter
❯ echo &apos;1&apos; &amp;gt; /proc/sys/net/bridge/bridge-nf-call-iptables
❯ tee /etc/modules-load.d/k8s.conf &amp;lt;&amp;lt;EOF
br_netfilter
EOF
br_netfilter
❯ tee /etc/sysctl.d/k8s.conf &amp;lt;&amp;lt;EOF
net.ipv4.ip_forward = 1 
net.bridge.bridge-nf-call-ip6tables = 1 
net.bridge.bridge-nf-call-iptables = 1
EOF
❯ sysctl --system
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You might see some articles configure &lt;code&gt;&quot;ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh&quot;&lt;/code&gt; in modules-load.d/k8s.conf; however, those modules are required by &lt;a href=&quot;https://kubernetes.io/blog/2018/07/09/ipvs-based-in-cluster-load-balancing-deep-dive/&quot;&gt;ipvs&lt;/a&gt;, while we&apos;re using iptables. iptables is easier to work with than ipvs in a development environment. At scale, though, iptables struggles to handle tens of thousands of Services because it is designed purely for firewalling purposes and is based on in-kernel rule lists.&lt;/p&gt;
&lt;p&gt;I&apos;ve seen several articles on Kubernetes cluster setup that disable firewalld; just a reminder that disabling firewalld does not remove the need for the kernel module and sysctl configuration shown above, which enables network filtering for bridged traffic. Those commands are critical for Kubernetes networking.&lt;/p&gt;
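&lt;p&gt;As a final check for this section, here&apos;s a sketch that parses sysctl-style output and flags any of the three parameters that is not 1. The output below is a sample; on the node, feed it the real output of &lt;code&gt;sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables&lt;/code&gt;:&lt;/p&gt;

```shell
# Sample sysctl output; on the node capture the real values with:
#   out=$(sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables)
out='net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1'

# Collect any parameter whose value is not 1.
bad=$(printf '%s\n' "$out" | awk -F' = ' '$2 != 1 {print $1}')
if [ -z "$bad" ]; then
  echo "all kernel parameters set"
else
  echo "not set: $bad"
fi
```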
&lt;hr /&gt;
&lt;h2&gt;Install Kubernetes Packages&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;cat &amp;lt;&amp;lt;EOF | tee /etc/yum.repos.d/k8s.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF

dnf makecache
# disableexcludes ensures that packages from the Kubernetes repository are not excluded during installation.
dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes    

systemctl enable kubelet &amp;amp;&amp;amp; systemctl start kubelet &amp;amp;&amp;amp; systemctl status kubelet
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./kubernetes-kubelet-status.png&quot; alt=&quot;kubernetes kubelet status&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::warning
Don&apos;t worry about any kubelet errors at this point (you might see some in the output of &lt;code&gt;systemctl status kubelet&lt;/code&gt;; I should have captured the full screenshot above). Once the worker nodes successfully join the Kubernetes cluster, the kubelet service will automatically activate and start communicating with the control plane.
:::&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Enable Firewalld&lt;/h2&gt;
&lt;p&gt;Remember that we disabled &lt;code&gt;firewalld&lt;/code&gt; in &lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1: Setting Up a Thriving Multi-Node Cluster on Mac&lt;/a&gt;? We need to enable it now in our Kubernetes cluster setup, as I want my environment to resemble production.&lt;/p&gt;
&lt;p&gt;:::info
Even if firewalld is disabled, Kubernetes still needs proper network configurations for bridged traffic.
:::&lt;/p&gt;
&lt;h3&gt;Open Required Ports&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Port(s)&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;6443&lt;/td&gt;
&lt;td&gt;Kubernetes API server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2379-2380&lt;/td&gt;
&lt;td&gt;etcd server client API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10250&lt;/td&gt;
&lt;td&gt;Kubelet API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10251&lt;/td&gt;
&lt;td&gt;kube-scheduler&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10252&lt;/td&gt;
&lt;td&gt;kube-controller-manager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10255&lt;/td&gt;
&lt;td&gt;Read-only Kubelet API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5473&lt;/td&gt;
&lt;td&gt;Cluster Control Plane Config API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;pre&gt;&lt;code&gt;systemctl unmask firewalld
systemctl start firewalld

firewall-cmd --zone=public --permanent --add-port=6443/tcp
firewall-cmd --zone=public --permanent --add-port=2379-2380/tcp
firewall-cmd --zone=public --permanent --add-port=10250/tcp
firewall-cmd --zone=public --permanent --add-port=10251/tcp
firewall-cmd --zone=public --permanent --add-port=10252/tcp
firewall-cmd --zone=public --permanent --add-port=10255/tcp
firewall-cmd --zone=public --permanent --add-port=5473/tcp

firewall-cmd --zone=public --permanent --list-ports

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::info
Docker manages its own iptables rules independently, even if firewalld is disabled. However, you still need to open specific Kubernetes-related ports to allow communication between control-plane and worker nodes.
:::&lt;/p&gt;
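&lt;p&gt;The seven firewall-cmd calls above can also be generated from a single port list; this sketch only prints the commands for review rather than running them. One note: &lt;code&gt;--permanent&lt;/code&gt; rules take effect in the running firewall only after &lt;code&gt;firewall-cmd --reload&lt;/code&gt; (or a firewalld restart):&lt;/p&gt;

```shell
# Ports from the table above.
ports='6443 2379-2380 10250 10251 10252 10255 5473'

# Generate (not execute) one firewall-cmd call per port range.
cmds=$(for p in $ports; do
  echo "firewall-cmd --zone=public --permanent --add-port=${p}/tcp"
done)
printf '%s\n' "$cmds"
# To apply permanent rules to the running firewall: firewall-cmd --reload
```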
&lt;p&gt;:::warning
If you see firewall warnings after reboot, like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./firewalld-start-warning.png&quot; alt=&quot;warning of firewalld when start after reboot system&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s likely because the Docker service was not yet up during startup. It&apos;s safe to ignore.
:::&lt;/p&gt;
&lt;h1&gt;Pull Images with crictl&lt;/h1&gt;
&lt;p&gt;As admin account, perform:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo kubeadm config images pull
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::warning
You might hit:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ sudo kubeadm config images pull
W0308 22:04:13.872981  109867 version.go:104] could not fetch a Kubernetes version from the internet: unable to get URL &quot;https://dl.k8s.io/release/stable-1.txt&quot;: Get &quot;https://cdn.dl.k8s.io/release/stable-1.txt&quot;: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W0308 22:04:13.873156  109867 version.go:105] falling back to the local client version: v1.29.14
failed to pull image &quot;registry.k8s.io/kube-apiserver:v1.29.14&quot;: output: time=&quot;2025-03-08T22:04:14-08:00&quot; level=fatal msg=&quot;validate service connection: validate CRI v1 image API for endpoint \&quot;unix:///var/run/containerd/containerd.sock\&quot;: rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService&quot;
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The solution is to manually create the file &lt;code&gt;/etc/crictl.yaml&lt;/code&gt; (e.g., &lt;code&gt;sudo vim /etc/crictl.yaml&lt;/code&gt;) with the content below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
debug: false

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then restart &lt;code&gt;containerd&lt;/code&gt; service:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo systemctl restart containerd
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./kubeadm-config-images-pull.webp&quot; alt=&quot;kubeadm config images pull&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So now we have Kubernetes images for our Kubernetes cluster setup.&lt;/p&gt;
&lt;p&gt;Hey, just a reminder here: we can only use &lt;code&gt;crictl&lt;/code&gt; to manage images for Kubernetes, because our Kubernetes uses containerd instead of Docker as the container runtime. And &lt;code&gt;crictl&lt;/code&gt; needs access to &lt;code&gt;/run/containerd/containerd.sock&lt;/code&gt;, which is owned by &lt;code&gt;root:root&lt;/code&gt;, so please remember to use &lt;code&gt;sudo&lt;/code&gt; if you&apos;re logged in as a non-root account:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./crictl-images-first-time.webp&quot; alt=&quot;run crictl images for the first time&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Checking Docker images (it only has the &lt;code&gt;hello-world&lt;/code&gt; image pulled earlier when we were testing Docker):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ sudo docker images
[sudo] password for admin:
REPOSITORY    TAG       IMAGE ID       CREATED         SIZE
hello-world   latest    d2c94e258dcb   17 months ago   13.3kB
&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;h1&gt;Create Kubernetes Worker Nodes&lt;/h1&gt;
&lt;p&gt;We now have our Kubernetes base image &lt;code&gt;k8s-1&lt;/code&gt; ready!&lt;/p&gt;
&lt;p&gt;Before we further configure &lt;code&gt;k8s-1&lt;/code&gt; as our master node, it&apos;s time to shut it down and clone it as &lt;code&gt;k8s-base-image&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ vmrun clone /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-1.vmwarevm/k8s-1.vmx  /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-base-image.vmwarevm/k8s-base-image.vmx full

❯ sed -i &apos;&apos; &apos;s/displayName = &quot;Clone of k8s-1&quot;/displayName = &quot;k8s-base-image&quot;/&apos; &quot;/Users/geekcoding101.com/Virtual Machines.localized/k8s-base-image.vmwarevm/k8s-base-image.vmx&quot;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./vmfusino-vm-list.webp&quot; alt=&quot;vmfusion vm list&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Then repeat the steps in &lt;a href=&quot;/?p=4903&amp;amp;preview=true#Clone_baseimage_to_k8s-1_as_The_Kubernetes_VM_Base_Image&quot;&gt;Clone baseimage to k8s-1 as The Kubernetes VM Base Image&lt;/a&gt; to clone &lt;code&gt;k8s-base-image&lt;/code&gt; to &lt;code&gt;k8s-2&lt;/code&gt;, &lt;code&gt;k8s-3&lt;/code&gt;, &lt;code&gt;k8s-4&lt;/code&gt; and &lt;code&gt;k8s-5&lt;/code&gt; in our Kubernetes cluster setup.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;vmrun clone /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-base-image.vmwarevm/k8s-base-image.vmx  /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-2.vmwarevm/k8s-2.vmx full
sed -i &apos;&apos; &apos;s/displayName = &quot;Clone of k8s-base-image&quot;/displayName = &quot;k8s-2&quot;/&apos; &quot;/Users/geekcoding101.com/Virtual Machines.localized/k8s-2.vmwarevm/k8s-2.vmx&quot;

vmrun clone /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-base-image.vmwarevm/k8s-base-image.vmx  /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-3.vmwarevm/k8s-3.vmx full
sed -i &apos;&apos; &apos;s/displayName = &quot;Clone of k8s-base-image&quot;/displayName = &quot;k8s-3&quot;/&apos; &quot;/Users/geekcoding101.com/Virtual Machines.localized/k8s-3.vmwarevm/k8s-3.vmx&quot;

vmrun clone /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-base-image.vmwarevm/k8s-base-image.vmx  /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-4.vmwarevm/k8s-4.vmx full
sed -i &apos;&apos; &apos;s/displayName = &quot;Clone of k8s-base-image&quot;/displayName = &quot;k8s-4&quot;/&apos; &quot;/Users/geekcoding101.com/Virtual Machines.localized/k8s-4.vmwarevm/k8s-4.vmx&quot;

vmrun clone /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-base-image.vmwarevm/k8s-base-image.vmx  /Users/geekcoding101.com/Virtual\ Machines.localized/k8s-5.vmwarevm/k8s-5.vmx full
sed -i &apos;&apos; &apos;s/displayName = &quot;Clone of k8s-base-image&quot;/displayName = &quot;k8s-5&quot;/&apos; &quot;/Users/geekcoding101.com/Virtual Machines.localized/k8s-5.vmwarevm/k8s-5.vmx&quot;

&lt;/code&gt;&lt;/pre&gt;
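&lt;p&gt;The per-node clone commands follow a fixed pattern, so they can be generated in a loop. This sketch only prints the &lt;code&gt;vmrun&lt;/code&gt; commands for review (they run on the host Mac; the matching &lt;code&gt;sed&lt;/code&gt; display-name fixes would follow the same pattern):&lt;/p&gt;

```shell
# Host-side path to the VM bundle directory (from the commands above).
base='/Users/geekcoding101.com/Virtual Machines.localized'

# Print one vmrun clone command per worker node, without executing anything.
gen_clone_cmds() {
  for n in 2 3 4 5; do
    printf 'vmrun clone "%s/k8s-base-image.vmwarevm/k8s-base-image.vmx" "%s/k8s-%s.vmwarevm/k8s-%s.vmx" full\n' "$base" "$base" "$n" "$n"
  done
}
gen_clone_cmds
```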
&lt;p&gt;Rescan in VMware Fusion and you will see:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./vmfusino-vm-list-with-kubernetes-all-nodes.webp&quot; alt=&quot;vmfusino vm list with kubernetes all nodes&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Remember to customize each node one by one with a different &lt;code&gt;input.json&lt;/code&gt; (a kind reminder: before the network is configured, &lt;code&gt;k8s-2&lt;/code&gt; to &lt;code&gt;k8s-5&lt;/code&gt; will use the same IP &lt;code&gt;172.16.211.11&lt;/code&gt; as &lt;code&gt;k8s-1&lt;/code&gt;, so please shut down &lt;code&gt;k8s-1&lt;/code&gt; before finishing the configuration on &lt;code&gt;k8s-2&lt;/code&gt; to &lt;code&gt;k8s-5&lt;/code&gt;, and &lt;code&gt;localserver&lt;/code&gt; should be running as well). I put all the &lt;code&gt;input.json&lt;/code&gt; files here so you can copy them (My bad! I should have prepared those files in the &lt;a href=&quot;/posts/kubernetes-tutorial-part1#Creating_the_Rocky_9_Base_VM&quot;&gt;baseimage&lt;/a&gt;!):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ ls /opt/share_tools/init_data
devbox_vm_input.json  k8s-1_vm_input.json  k8s-2_vm_input.json  k8s-3_vm_input.json  k8s-4_vm_input.json  k8s-5_vm_input.json  localserver_vm_input.json
❯ cat /opt/share_tools/init_data/k8s-2_vm_input.json
{
  &quot;hostname&quot;: &quot;k8s-2&quot;,
  &quot;ip&quot;: &quot;172.16.211.12&quot;,
  &quot;subnet&quot;: &quot;24&quot;,
  &quot;gateway&quot;: &quot;172.16.211.2&quot;,
  &quot;dns1&quot;: &quot;172.16.211.100&quot;,
  &quot;dns2&quot;: &quot;8.8.8.8&quot;,
  &quot;domain&quot;: &quot;dev.geekcoding101local.com&quot;,
  &quot;ansible_key_path&quot;: &quot;~/.ssh/ansible_ed25519&quot;,
  &quot;ssh_key_path&quot;: &quot;~/.ssh/ssh_ed25519&quot;
}

❯ cat /opt/share_tools/init_data/k8s-3_vm_input.json
{
  &quot;hostname&quot;: &quot;k8s-3&quot;,
  &quot;ip&quot;: &quot;172.16.211.13&quot;,
  &quot;subnet&quot;: &quot;24&quot;,
  &quot;gateway&quot;: &quot;172.16.211.2&quot;,
  &quot;dns1&quot;: &quot;172.16.211.100&quot;,
  &quot;dns2&quot;: &quot;8.8.8.8&quot;,
  &quot;domain&quot;: &quot;dev.geekcoding101local.com&quot;,
  &quot;ansible_key_path&quot;: &quot;~/.ssh/ansible_ed25519&quot;,
  &quot;ssh_key_path&quot;: &quot;~/.ssh/ssh_ed25519&quot;
}

❯ cat /opt/share_tools/init_data/k8s-4_vm_input.json
{
  &quot;hostname&quot;: &quot;k8s-4&quot;,
  &quot;ip&quot;: &quot;172.16.211.14&quot;,
  &quot;subnet&quot;: &quot;24&quot;,
  &quot;gateway&quot;: &quot;172.16.211.2&quot;,
  &quot;dns1&quot;: &quot;172.16.211.100&quot;,
  &quot;dns2&quot;: &quot;8.8.8.8&quot;,
  &quot;domain&quot;: &quot;dev.geekcoding101local.com&quot;,
  &quot;ansible_key_path&quot;: &quot;~/.ssh/ansible_ed25519&quot;,
  &quot;ssh_key_path&quot;: &quot;~/.ssh/ssh_ed25519&quot;
}

❯ cat /opt/share_tools/init_data/k8s-5_vm_input.json
{
  &quot;hostname&quot;: &quot;k8s-5&quot;,
  &quot;ip&quot;: &quot;172.16.211.15&quot;,
  &quot;subnet&quot;: &quot;24&quot;,
  &quot;gateway&quot;: &quot;172.16.211.2&quot;,
  &quot;dns1&quot;: &quot;172.16.211.100&quot;,
  &quot;dns2&quot;: &quot;8.8.8.8&quot;,
  &quot;domain&quot;: &quot;dev.geekcoding101local.com&quot;,
  &quot;ansible_key_path&quot;: &quot;~/.ssh/ansible_ed25519&quot;,
  &quot;ssh_key_path&quot;: &quot;~/.ssh/ssh_ed25519&quot;
}

&lt;/code&gt;&lt;/pre&gt;
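&lt;p&gt;Since the four files differ only in hostname and IP, they can be generated instead of written by hand. This is a sketch assuming node N gets IP &lt;code&gt;172.16.211.1N&lt;/code&gt; and that dns1 is the local DNS server from Part 2; it writes into a temporary directory, so adjust the output path for real use:&lt;/p&gt;

```shell
# Generate the k8s-2..k8s-5 input.json files; node N gets IP 172.16.211.1N.
# Written to a temp dir here; point outdir at /opt/share_tools/init_data for real use.
outdir=$(mktemp -d)
for n in 2 3 4 5; do
  printf '{
  "hostname": "k8s-%s",
  "ip": "172.16.211.1%s",
  "subnet": "24",
  "gateway": "172.16.211.2",
  "dns1": "172.16.211.100",
  "dns2": "8.8.8.8",
  "domain": "dev.geekcoding101local.com",
  "ansible_key_path": "~/.ssh/ansible_ed25519",
  "ssh_key_path": "~/.ssh/ssh_ed25519"
}\n' "$n" "$n" > "${outdir}/k8s-${n}_vm_input.json"
done
ls "$outdir"
```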
&lt;p&gt;Then run nslookup and ping to make sure the network has no problems in this Kubernetes cluster setup.&lt;/p&gt;
&lt;p&gt;For example, when you start &lt;code&gt;k8s-2&lt;/code&gt; for the first time, you will see it still using &lt;code&gt;k8s-1&lt;/code&gt;&apos;s hostname and IP:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./first-time-start-k8s-2.webp&quot; alt=&quot;first time start k8s 2&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After running the ansible script, log out and log in again as root:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./k8s-2-ready.webp&quot; alt=&quot;k8s-2 is ready&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;Setup Kubernetes Master Node k8s-1&lt;/h1&gt;
&lt;p&gt;I know it&apos;s kind of unbelievable that we have prepared so much for this Kubernetes cluster setup, yet the actual steps to form the master and join the worker nodes are just two or three commands...&lt;/p&gt;
&lt;h2&gt;Run sudo kubeadm init on Master Node&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::info
&lt;code&gt;10.244.0.0/16&lt;/code&gt; is required by &lt;a href=&quot;https://github.com/flannel-io/flannel&quot;&gt;Flannel&lt;/a&gt; which is the &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/&quot;&gt;CNI plugin&lt;/a&gt; I am going to use in this Kubernetes cluster setup.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;--service-cidr&lt;/code&gt; flag in &lt;code&gt;kubeadm init&lt;/code&gt; &lt;strong&gt;defines the virtual IP range&lt;/strong&gt; for &lt;strong&gt;Kubernetes services&lt;/strong&gt; (ClusterIP services). This CIDR block is &lt;strong&gt;used by kube-proxy and the cluster DNS for internal service discovery&lt;/strong&gt;. Typically, you can specify &lt;strong&gt;any private IP range&lt;/strong&gt; that &lt;strong&gt;does&lt;/strong&gt; &lt;strong&gt;not overlap&lt;/strong&gt; with:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;--pod-network-cidr&lt;/code&gt; (e.g., &lt;code&gt;10.244.0.0/16&lt;/code&gt; for Flannel)&lt;/li&gt;
&lt;li&gt;Any &lt;strong&gt;physical&lt;/strong&gt; or &lt;strong&gt;existing&lt;/strong&gt; network in your infrastructure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;:::&lt;/p&gt;
&lt;p&gt;:::warning
You might hit the following errors in Kubernetes cluster setup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;validate CRI v1 image API for endpoint &quot;unix:///var/run/containerd/containerd.sock&quot;: rpc error: code = Unimplemented desc = unknown service runtime.v1.ImageService

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[ERROR CRI]: container runtime is not running: output: time=&quot;2025-03-08T11:11:00-08:00&quot; level=fatal msg=&quot;validate service connection: validate CRI v1 runtime API for endpoint \&quot;unix:///var/run/containerd/containerd.sock\&quot;: rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService&quot;
, error: exit status 1

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Source: &lt;a href=&quot;https://forum.linuxfoundation.org/discussion/862825/kubeadm-init-error-cri-v1-runtime-api-is-not-implemented&quot;&gt;Linux Foundation Forum&lt;/a&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd ~
mkdir bak
sudo cp /etc/containerd/config.toml ./bak
sudo rm -fr /etc/containerd/config.toml
sudo systemctl restart containerd

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After this, it should start working!
:::&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12
[sudo] password for admin:
W0308 22:24:40.483800  123896 version.go:104] could not fetch a Kubernetes version from the internet: unable to get URL &quot;https://dl.k8s.io/release/stable-1.txt&quot;: Get &quot;https://cdn.dl.k8s.io/release/stable-1.txt&quot;: context deadline exceeded (Client.Timeout exceeded while awaiting headers)
W0308 22:24:40.483994  123896 version.go:105] falling back to the local client version: v1.29.14
[init] Using Kubernetes version: v1.29.14
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using &apos;kubeadm config images pull&apos;
W0308 22:24:40.834473  123896 checks.go:835] detected that the sandbox image &quot;registry.k8s.io/pause:3.8&quot; of the container runtime is inconsistent with that used by kubeadm. It is recommended that using &quot;registry.k8s.io/pause:3.9&quot; as the CRI sandbox image.
[certs] Using certificateDir folder &quot;/etc/kubernetes/pki&quot;
[certs] Generating &quot;ca&quot; certificate and key
[certs] Generating &quot;apiserver&quot; certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 172.16.211.11]
[certs] Generating &quot;apiserver-kubelet-client&quot; certificate and key
[certs] Generating &quot;front-proxy-ca&quot; certificate and key
[certs] Generating &quot;front-proxy-client&quot; certificate and key
[certs] Generating &quot;etcd/ca&quot; certificate and key
[certs] Generating &quot;etcd/server&quot; certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-1 localhost] and IPs [172.16.211.11 127.0.0.1 ::1]
[certs] Generating &quot;etcd/peer&quot; certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-1 localhost] and IPs [172.16.211.11 127.0.0.1 ::1]
[certs] Generating &quot;etcd/healthcheck-client&quot; certificate and key
[certs] Generating &quot;apiserver-etcd-client&quot; certificate and key
[certs] Generating &quot;sa&quot; key and public key
[kubeconfig] Using kubeconfig folder &quot;/etc/kubernetes&quot;
[kubeconfig] Writing &quot;admin.conf&quot; kubeconfig file
[kubeconfig] Writing &quot;super-admin.conf&quot; kubeconfig file
[kubeconfig] Writing &quot;kubelet.conf&quot; kubeconfig file
[kubeconfig] Writing &quot;controller-manager.conf&quot; kubeconfig file
[kubeconfig] Writing &quot;scheduler.conf&quot; kubeconfig file
[etcd] Creating static Pod manifest for local etcd in &quot;/etc/kubernetes/manifests&quot;
[control-plane] Using manifest folder &quot;/etc/kubernetes/manifests&quot;
[control-plane] Creating static Pod manifest for &quot;kube-apiserver&quot;
[control-plane] Creating static Pod manifest for &quot;kube-controller-manager&quot;
[control-plane] Creating static Pod manifest for &quot;kube-scheduler&quot;
[kubelet-start] Writing kubelet environment file with flags to file &quot;/var/lib/kubelet/kubeadm-flags.env&quot;
[kubelet-start] Writing kubelet configuration to file &quot;/var/lib/kubelet/config.yaml&quot;
[kubelet-start] Starting the kubelet
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory &quot;/etc/kubernetes/manifests&quot;. This can take up to 4m0s
[apiclient] All control plane components are healthy after 34.003550 seconds
[upload-config] Storing the configuration used in ConfigMap &quot;kubeadm-config&quot; in the &quot;kube-system&quot; Namespace
[kubelet] Creating a ConfigMap &quot;kubelet-config&quot; in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: yjfem7.na3i596dag4eogh9
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the &quot;cluster-info&quot; ConfigMap in the &quot;kube-public&quot; namespace
[kubelet-finalize] Updating &quot;/etc/kubernetes/kubelet.conf&quot; to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run &quot;kubectl apply -f [podnetwork].yaml&quot; with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.16.211.11:6443 --token yjfem7.na3i596dag4eogh9 \
        --discovery-token-ca-cert-hash sha256:23622f60b6274309294e1693439cd9a5e897c4037baaa62d5980a64745445cac

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since we&apos;re not root, perform the steps mentioned above in your Kubernetes cluster setup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
&lt;/code&gt;&lt;/pre&gt;
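&lt;p&gt;If you rebuild clusters often, the three commands above can be wrapped in a small helper. This is only a sketch (the &lt;code&gt;setup_kubeconfig&lt;/code&gt; name and the refuse-to-overwrite behavior are my own choices, not something kubeadm provides; reading &lt;code&gt;/etc/kubernetes/admin.conf&lt;/code&gt; itself still needs root, which is why the original commands use &lt;code&gt;sudo cp&lt;/code&gt; plus &lt;code&gt;chown&lt;/code&gt;):&lt;/p&gt;

```shell
#!/bin/sh
# Sketch: install a kubeconfig file for the current user.
# setup_kubeconfig is a hypothetical helper, not part of kubeadm.
setup_kubeconfig() {
  src="$1"                      # e.g. a readable copy of /etc/kubernetes/admin.conf
  dest="${HOME}/.kube/config"
  mkdir -p "${HOME}/.kube"
  if [ -e "${dest}" ]; then     # mimic cp -i: never clobber silently
    echo "refusing to overwrite ${dest}"
    return 1
  fi
  cp "${src}" "${dest}"
  chmod 600 "${dest}"           # the kubeconfig holds cluster credentials
}
```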
&lt;hr /&gt;
&lt;h2&gt;Verify Kubernetes Cluster Status&lt;/h2&gt;
&lt;p&gt;Check the cluster nodes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl get nodes -o wide
kubectl get pods -n kube-system

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You might need to wait a few minutes for all services to be running. Here you go:&lt;img src=&quot;./check-kubernetes-status-after-init.webp&quot; alt=&quot;check kubernetes status after init&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As you see, we have pods:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;kube-apiserver&lt;/li&gt;
&lt;li&gt;kube-controller-manager&lt;/li&gt;
&lt;li&gt;kube-scheduler&lt;/li&gt;
&lt;li&gt;etcd&lt;/li&gt;
&lt;li&gt;kube-proxy&lt;/li&gt;
&lt;li&gt;CoreDNS&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Test basic Kubernetes commands:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl cluster-info
kubectl get namespaces

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./cluster-info-1.webp&quot; alt=&quot;cluster info output&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Deploy Flannel for Pod Networking&lt;/h2&gt;
&lt;p&gt;Are you excited? We&apos;re almost there to get our Kubernetes cluster setup ready!&lt;/p&gt;
&lt;p&gt;Okay, &lt;a href=&quot;https://github.com/flannel-io/flannel&quot;&gt;Flannel&lt;/a&gt; must be installed for pod-to-pod communication:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl --insecure-skip-tls-verify apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are alternative CNIs we could choose for a Kubernetes cluster setup. I chose Flannel because it is simple and well suited for lightweight networking in small to medium Kubernetes clusters. It supports VXLAN, host-gw, and other simple encapsulation methods.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://docs.tigera.io/calico/latest/about/&quot;&gt;Calico&lt;/a&gt; has better performance and security policies support in cloud-native environments, but it&apos;s more complex to set up than Flannel.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/weaveworks/weave&quot;&gt;Weave Net&lt;/a&gt; occupies a similar position to Flannel. It also supports built-in network encryption, which Flannel doesn&apos;t offer.&lt;/p&gt;
&lt;p&gt;But anyway, let&apos;s focus on Flannel for now; we can explore other options later in this Kubernetes cluster setup blog series.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;Set Up Worker Nodes&lt;/h1&gt;
&lt;p&gt;On each worker node (&lt;code&gt;k8s-2&lt;/code&gt;, &lt;code&gt;k8s-3&lt;/code&gt;, &lt;code&gt;k8s-4&lt;/code&gt; and &lt;code&gt;k8s-5&lt;/code&gt;), run the following as a non-root account:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo kubeadm join 172.16.211.11:6443 --token yjfem7.na3i596dag4eogh9 \
        --discovery-token-ca-cert-hash sha256:23622f60b6274309294e1693439cd9a5e897c4037baaa62d5980a64745445cac
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If you need to regenerate the above join command for our Kubernetes cluster setup, run this on the master node as the admin account:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo kubeadm token create --print-join-command

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, the screenshot from &lt;code&gt;k8s-2&lt;/code&gt; in my Kubernetes cluster setup:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./k8s-2-joined-cluster.webp&quot; alt=&quot;k8s-2 joined cluster&quot; /&gt;&lt;/p&gt;
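&lt;p&gt;By the way, the &lt;code&gt;sha256:...&lt;/code&gt; value in the join command is simply the SHA-256 digest of the cluster CA&apos;s public key in DER encoding, so you can also recompute it straight from &lt;code&gt;/etc/kubernetes/pki/ca.crt&lt;/code&gt; with standard &lt;code&gt;openssl&lt;/code&gt; subcommands. A sketch (&lt;code&gt;ca_cert_hash&lt;/code&gt; is a made-up helper name):&lt;/p&gt;

```shell
# Recompute kubeadm's --discovery-token-ca-cert-hash from the CA cert:
# it is the SHA-256 of the CA public key in DER encoding.
ca_cert_hash() {
  cert="$1"                     # e.g. /etc/kubernetes/pki/ca.crt
  hash=$(openssl x509 -pubkey -noout -in "$cert" \
    | openssl pkey -pubin -outform der \
    | openssl dgst -sha256 -hex \
    | awk '{ print $NF }')
  echo "sha256:${hash}"
}

# Usage on the master node:
#   ca_cert_hash /etc/kubernetes/pki/ca.crt
```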
&lt;p&gt;Hold tight! This is our last step to finish the Kubernetes cluster setup!&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;Final Steps&lt;/h1&gt;
&lt;p&gt;Now, let&apos;s verify our Kubernetes cluster setup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl get nodes -o wide

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If everything is set up correctly, all nodes should be in a &lt;code&gt;Ready&lt;/code&gt; state. 🎉&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./kubernetes-cluster-formed-fully.webp&quot; alt=&quot;kubernetes cluster setup formed fully&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We did it!!! That’s it for this post!&lt;/p&gt;
&lt;p&gt;Remember the errors we observed in &lt;code&gt;systemctl status kubelet&lt;/code&gt; at the beginning of this post? Check it again:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./kubelet-service-status.webp&quot; alt=&quot;kubelet service status is good now&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In the next post, I will explore the &lt;code&gt;NodePort&lt;/code&gt; and &lt;code&gt;ClusterIP&lt;/code&gt; Kubernetes services with an Nginx pod in this Kubernetes cluster setup and deep dive into Flannel troubleshooting!&lt;/p&gt;
&lt;p&gt;Stay tuned! 🚀&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, I laid out the &lt;strong&gt;networking plan&lt;/strong&gt;, my &lt;strong&gt;goals for setting up Kubernetes&lt;/strong&gt;, and how to &lt;strong&gt;prepare a base VM image&lt;/strong&gt; for the cluster.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, I walked through &lt;strong&gt;configuring a local DNS server and NTP server&lt;/strong&gt;, essential for stable name resolution and time synchronization across nodes locally. These foundational steps will make our Kubernetes setup smoother.&lt;/p&gt;
&lt;p&gt;🚀 &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt; is the post you are reading now.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part-4-nodeport-vs-clusterip&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt;, I explored &lt;code&gt;NodePort&lt;/code&gt; and &lt;code&gt;ClusterIP&lt;/code&gt;, and explained the key differences, use cases, and when to choose each for internal and external service access! 🔥&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/externalname-loadbalancer-5&quot;&gt;Part 5&lt;/a&gt;&lt;/strong&gt;, I explored how to use &lt;code&gt;ExternalName&lt;/code&gt; and &lt;code&gt;LoadBalancer&lt;/code&gt; services and how to run load testing with the &lt;code&gt;hey&lt;/code&gt; tool.
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>NodePort vs ClusterIP - Ultimate Kubernetes Tutorial Part 4</title><link>https://geekcoding101.com/posts/part-4-nodeport-vs-clusterip</link><guid isPermaLink="true">https://geekcoding101.com/posts/part-4-nodeport-vs-clusterip</guid><pubDate>Sat, 15 Mar 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Hey, welcome back to my &lt;a href=&quot;https://geekcoding101.com/tags/kubernetes&quot;&gt;ultimate Kubernetes tutorials&lt;/a&gt;! Now that our 1 master + 4 worker node cluster is up and running, it’s time to dive into &lt;code&gt;NodePort&lt;/code&gt; vs. &lt;code&gt;ClusterIP&lt;/code&gt;—two key service types in Kubernetes. Services act as the traffic controllers of your cluster, making sure pods can communicate reliably. Without them, your pods would be like isolated islands, unable to connect in a structured way. Pods are ephemeral, constantly changing IPs. That’s where &lt;a href=&quot;https://kubernetes.io/docs/concepts/services-networking/service/&quot;&gt;Kubernetes services&lt;/a&gt; step in—ensuring stable access, whether for internal pod-to-pod networking or external exposure. Let’s break down how they work and when to use each! 🚀&lt;/p&gt;
&lt;p&gt;Before we start, here&apos;s a quick summary of the four common Kubernetes service types:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ClusterIP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exposes the service internally within the cluster. No external access.&lt;/td&gt;
&lt;td&gt;Internal microservices that only communicate within Kubernetes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NodePort&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Exposes the service on a static port on each node&apos;s IP, making it accessible externally.&lt;/td&gt;
&lt;td&gt;Basic external access without a LoadBalancer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;LoadBalancer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Creates an external load balancer that directs traffic to the service.&lt;/td&gt;
&lt;td&gt;Production environments requiring automated load balancing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ExternalName&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Maps a Kubernetes Service to an external DNS name instead of forwarding traffic.&lt;/td&gt;
&lt;td&gt;Redirecting traffic to external services outside the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;P.S.&lt;/strong&gt; Headless Service is also a Kubernetes Service type, but it behaves differently from the usual four.&lt;/p&gt;
&lt;p&gt;In this post, I will guide you to:&lt;/p&gt;
&lt;p&gt;✅ Create an &lt;strong&gt;&lt;a href=&quot;https://hub.docker.com/_/nginx&quot;&gt;Nginx&lt;/a&gt; deployment&lt;/strong&gt; running on a &lt;strong&gt;single node&lt;/strong&gt;&lt;br /&gt;
✅ Expose it using a &lt;strong&gt;NodePort Service&lt;/strong&gt;&lt;br /&gt;
✅ Verify accessibility inside and outside the cluster&lt;br /&gt;
✅ Expose it using a &lt;strong&gt;ClusterIP Service&lt;/strong&gt;&lt;br /&gt;
✅ Verify accessibility inside and outside the cluster&lt;br /&gt;
✅ Run a comparison between &lt;strong&gt;ClusterIP Service&lt;/strong&gt; and &lt;strong&gt;NodePort Service&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Let’s get started! 🚀&lt;/p&gt;
&lt;h1&gt;Deploying Nginx on a Single Node&lt;/h1&gt;
&lt;p&gt;Let&apos;s create a simple Kubernetes deployment with Nginx running on a single node.&lt;/p&gt;
&lt;p&gt;:::info
Unless stated otherwise, all commands are performed on &lt;code&gt;k8s-1&lt;/code&gt; as the &lt;code&gt;admin&lt;/code&gt; account.
:::&lt;/p&gt;
&lt;h2&gt;Create a Testing Namespace&lt;/h2&gt;
&lt;p&gt;Namespaces in Kubernetes are like &lt;strong&gt;virtual clusters within your cluster&lt;/strong&gt;, helping you organize and isolate resources. By creating a &lt;strong&gt;testing&lt;/strong&gt; namespace, we keep our deployment separate from the &lt;strong&gt;default namespace&lt;/strong&gt;, preventing conflicts with existing workloads and making cleanup easier. This way, when we&apos;re done experimenting, we can simply delete the namespace, wiping out everything inside it—no need to manually remove individual resources.&lt;/p&gt;
&lt;p&gt;:::info
If you don&apos;t create one, the new deployment will use the default namespace.
:::&lt;/p&gt;
&lt;p&gt;List our existing namespaces:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl get namespaces -o wide
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./default-namespaces-list.webp&quot; alt=&quot;default namespaces list&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Check the current default namespace:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl config get-contexts

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the &lt;code&gt;NAMESPACE&lt;/code&gt; column is empty, it means the namespace is set to &lt;strong&gt;default&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./current-namespace-check.webp&quot; alt=&quot;check current namespace&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s create our namespace &lt;code&gt;service-type-test&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl create namespace service-type-test

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./added-new-namespaces.webp&quot; alt=&quot;added new namespaces &amp;quot;service-type-test&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Set our new namespace &lt;code&gt;service-type-test&lt;/code&gt; as the default:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl config set-context --current --namespace=service-type-test

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, all &lt;code&gt;kubectl&lt;/code&gt; commands (&lt;strong&gt;under the current account session&lt;/strong&gt;) will default to this namespace unless another is explicitly specified. If you log in as &lt;code&gt;root&lt;/code&gt;, you need to perform the same step again to get the same convenience.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./set-new-namespace-as-default.webp&quot; alt=&quot;set new namespace as default&quot; /&gt;&lt;/p&gt;
&lt;p&gt;You can also verify the change with the following command:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl config view --minify --output &apos;jsonpath={..namespace}&apos;

&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Create a Deployment YAML&lt;/h2&gt;
&lt;p&gt;Now let&apos;s create a deployment file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;cd
mkdir nginx-deployment
vim nginx-deployment/nginx-single-node.yaml

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Paste the following YAML:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-single
  namespace: service-type-test
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        kubernetes.io/hostname: k8s-2
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Apply the Deployment&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f nginx-deployment/nginx-single-node.yaml

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Check if the pod is running:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl get pods -o wide

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Example Output:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE    IP           NODE    NOMINATED NODE   READINESS GATES
nginx-single-7dfff5577-2v25s   1/1     Running   0          117s   10.244.1.2   k8s-2   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;

&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Testing Nginx from Inside the Pod&lt;/h1&gt;
&lt;p&gt;At this moment, there is no external access. You must log into the pod or create a temporary test pod.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl exec -it nginx-single-7dfff5577-2v25s -- sh

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Inside the pod:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# curl http://10.244.1.2

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Example Output:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./check-nginx-in-pod.webp&quot; alt=&quot;check nginx in pod&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So this test method has obvious cons: it is only valid inside the Nginx pod (not cluster-wide), and it doesn&apos;t verify network policies, DNS, or service discovery for external access.&lt;/p&gt;
&lt;h1&gt;Testing Nginx By Creating a Temporary Pod&lt;/h1&gt;
&lt;p&gt;With this method, we can test networking from a different pod (simulating real application behavior). It ensures DNS resolution and service discovery work correctly, and it&apos;s stateless and temporary (deleted after exit).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl run testpod --rm -it --image=rockylinux:9 -- bash

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Once it&apos;s running, we need to install several tools, e.g. &lt;code&gt;ping&lt;/code&gt; and &lt;code&gt;nslookup&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;dnf install -y iputils net-tools nc traceroute bind-utils iproute
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Troubleshooting Pod&apos;s Internet Access Issue&lt;/h2&gt;
&lt;p&gt;You might hit an internet access issue when running the above &lt;code&gt;dnf install&lt;/code&gt; command in the testpod:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./internet-access-issue-in-pod.webp&quot; alt=&quot;internet access issue in pod&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The problem is in the cluster&apos;s DNS settings. Edit the CoreDNS ConfigMap:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl edit configmap -n kube-system coredns
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let&apos;s add &lt;code&gt;8.8.8.8&lt;/code&gt; and &lt;code&gt;1.1.1.1&lt;/code&gt; as upstream resolvers.&lt;/p&gt;
&lt;p&gt;:::info
&lt;code&gt;1.1.1.1&lt;/code&gt; is Cloudflare’s public DNS resolver. &lt;code&gt;8.8.8.8&lt;/code&gt; is the public IP address of Google&apos;s primary DNS server.
:::&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./add-dns.webp&quot; alt=&quot;Add google and cloudflare DNS into Kubernetes&quot; /&gt;&lt;/p&gt;
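&lt;p&gt;For reference, after the edit the &lt;code&gt;forward&lt;/code&gt; plugin line in the Corefile should look roughly like this (a sketch; the other plugins in your ConfigMap stay unchanged, and by default the line reads &lt;code&gt;forward . /etc/resolv.conf&lt;/code&gt;):&lt;/p&gt;

```
.:53 {
    # ...other plugins (errors, health, kubernetes, cache, ...) unchanged...
    forward . 8.8.8.8 1.1.1.1
}
```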
&lt;p&gt;Then delete the existing CoreDNS pods; they will be re-created with the latest settings:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl delete pod -n kube-system -l k8s-app=kube-dns
kubectl get pods -n kube-system -l k8s-app=kube-dns
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./delete-dns-pods-and-check-recreation.webp&quot; alt=&quot;delete CoreDNS pods and check recreation&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Then we can immediately check the &lt;code&gt;dnf&lt;/code&gt; command in &lt;code&gt;testpod&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./dnf-started-working.webp&quot; alt=&quot;dnf command started working&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Troubleshooting Pod&apos;s Communication Issue&lt;/h2&gt;
&lt;p&gt;Check the nodes running pods:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl get pods -n service-type-test -o wide

NAME                           READY   STATUS    RESTARTS      AGE     IP           NODE    NOMINATED NODE   READINESS GATES
nginx-single-7dfff5577-2v25s   1/1     Running   2 (46h ago)   3d23h   10.244.1.4   k8s-2   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
testpod                        1/1     Running   0             46h     10.244.2.5   k8s-3   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now that we have the &lt;code&gt;ping&lt;/code&gt; command in &lt;code&gt;testpod&lt;/code&gt;, test a ping from &lt;code&gt;testpod&lt;/code&gt; to the &lt;code&gt;nginx-single-7dfff5577-2v25s&lt;/code&gt; pod. You might see:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[root@testpod /]# ping 10.244.1.4
PING 10.244.1.4 (10.244.1.4) 56(84) bytes of data.
From 10.244.1.0 icmp_seq=1 Packet filtered
From 10.244.1.0 icmp_seq=2 Packet filtered
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Typically, seeing &lt;code&gt;Packet filtered&lt;/code&gt; is caused by &lt;code&gt;firewalld rules&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let&apos;s first understand the network on the &lt;code&gt;k8s-2&lt;/code&gt; worker node, which is running the &lt;code&gt;nginx-single-7dfff5577-2v25s&lt;/code&gt; pod. When troubleshooting Kubernetes networking, one of the first things I always check is the network interfaces on the worker node. Why? Because understanding the network layout is crucial: it tells me how traffic flows within the node and between nodes.&lt;/p&gt;
&lt;p&gt;Running &lt;code&gt;ip a&lt;/code&gt; gives a snapshot of all active network interfaces, and here’s what I see on my worker node:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ ip a
1: lo: &amp;lt;LOOPBACK,UP,LOWER_UP&amp;gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:56:d2:2d brd ff:ff:ff:ff:ff:ff
    altname enp3s0
    inet 172.16.211.12/24 brd 172.16.211.255 scope global noprefixroute ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe56:d22d/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: ens224: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:0c:29:56:d2:37 brd ff:ff:ff:ff:ff:ff
    altname enp19s0
    inet 172.16.68.135/24 brd 172.16.68.255 scope global dynamic noprefixroute ens224
       valid_lft 1419sec preferred_lft 1419sec
    inet6 fe80::2a79:5bce:ed76:fa4c/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
4: docker0: &amp;lt;NO-CARRIER,BROADCAST,MULTICAST,UP&amp;gt; mtu 1500 qdisc noqueue state DOWN group default
    link/ether 7e:e6:b8:4c:23:64 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: flannel.1: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 06:8c:32:9e:aa:fc brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::48c:32ff:fe9e:aafc/64 scope link
       valid_lft forever preferred_lft forever
6: cni0: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:f2:2f:f3:e1:ac brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.1/24 brd 10.244.1.255 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::28f2:2fff:fef3:e1ac/64 scope link
       valid_lft forever preferred_lft forever
7: vethf4df6ba9@if2: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 36:57:42:b1:e7:6f brd ff:ff:ff:ff:ff:ff link-netns cni-224ac32a-95b2-1ef2-b716-e1230f1e1296
    inet6 fe80::3457:42ff:feb1:e76f/64 scope link
       valid_lft forever preferred_lft forever

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To save you time, let&apos;s put this in a table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Interface&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;lo&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Loopback&quot;&gt;Loopback&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Used for internal communication within the node. Always present.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ens160&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Primary network interface&lt;/td&gt;
&lt;td&gt;Connected to &lt;code&gt;vmnet2&lt;/code&gt;, providing a private network for our VMs (&lt;code&gt;172.16.211.12/24&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ens224&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Secondary network interface&lt;/td&gt;
&lt;td&gt;Connected to another network (&lt;code&gt;172.16.68.135/24&lt;/code&gt;) managed by VMFusion, &lt;strong&gt;providing external internet access&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;docker0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Docker bridge (not used for Kubernetes)&lt;/td&gt;
&lt;td&gt;Created by Docker but &lt;strong&gt;not part of Kubernetes networking&lt;/strong&gt;. Leftover from local development.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;flannel.1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href=&quot;https://mvallim.github.io/kubernetes-under-the-hood/documentation/kube-flannel.html&quot;&gt;Flannel overlay network&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Handles &lt;strong&gt;pod-to-pod communication across nodes&lt;/strong&gt; (&lt;code&gt;10.244.1.0/32&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cni0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Main CNI bridge&lt;/td&gt;
&lt;td&gt;Connects pods on this node to the Flannel overlay network (&lt;code&gt;10.244.1.1/24&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;veth*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Virtual Ethernet interfaces&lt;/td&gt;
&lt;td&gt;Bridges individual pods to the &lt;code&gt;cni0&lt;/code&gt; bridge. Created dynamically as pods start.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Then check &lt;code&gt;firewall-cmd --list-all&lt;/code&gt; on &lt;code&gt;k8s-2&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ sudo firewall-cmd --list-all
[sudo] password for admin:
public (active)
  target: default
  icmp-block-inversion: no
  interfaces: ens160 ens224 flannel.1
  sources:
  services: cockpit dhcpv6-client ssh
  ports: 6443/tcp 2379-2380/tcp 10250/tcp 10251/tcp 10252/tcp 10255/tcp 5473/tcp 8472/udp 30000-32767/tcp
  protocols:
  forward: yes
  masquerade: yes
  forward-ports:
  source-ports:
  icmp-blocks:
  rich rules:

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Did you spot the problem?&lt;/p&gt;
&lt;p&gt;The &lt;strong&gt;&lt;code&gt;cni0&lt;/code&gt; is missing from the active firewalld zone&lt;/strong&gt;! This is a problem because &lt;strong&gt;&lt;code&gt;cni0&lt;/code&gt; is the main bridge interface for pod networking&lt;/strong&gt;—it connects all pods on this node to the Flannel overlay network. Without it being part of the &lt;strong&gt;public zone&lt;/strong&gt;, firewalld might be blocking traffic between pods on this worker node.&lt;/p&gt;
&lt;p&gt;We can verify it by enabling &lt;code&gt;firewalld&lt;/code&gt; logs:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo firewall-cmd --set-log-denied=all
sudo firewall-cmd --reload
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then &lt;code&gt;tail -f /var/log/messages&lt;/code&gt; on &lt;code&gt;k8s-2&lt;/code&gt;, you should see:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Mar 14 21:28:44 k8s-2 kernel: filter_FWD_public_REJECT: IN=flannel.1 OUT=cni0 MAC=06:8c:32:9e:aa:fc:aa:fc:a1:4d:b8:af:08:00 SRC=10.244.2.0 DST=10.244.1.4 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=21550 DF PROTO=ICMP TYPE=8 CODE=0 ID=44547 SEQ=1
Mar 14 21:28:45 k8s-2 kernel: filter_FWD_public_REJECT: IN=flannel.1 OUT=cni0 MAC=06:8c:32:9e:aa:fc:aa:fc:a1:4d:b8:af:08:00 SRC=10.244.2.0 DST=10.244.1.4 LEN=84 TOS=0x00 PREC=0x00 TTL=62 ID=22104 DF PROTO=ICMP TYPE=8 CODE=0 ID=44547 SEQ=2

&lt;/code&gt;&lt;/pre&gt;
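&lt;p&gt;When the log is noisy, it helps to condense it into unique flows. A throwaway &lt;code&gt;sed&lt;/code&gt; pipeline like this works (a sketch; &lt;code&gt;summarize_rejects&lt;/code&gt; is a made-up name, and the exact log prefix can vary by distro and kernel):&lt;/p&gt;

```shell
# Condense firewalld/kernel reject logs into unique
# in/out-interface and src/dst-address flows, with a count per flow.
summarize_rejects() {
  sed -nE 's/.*IN=([^ ]+) OUT=([^ ]+).*SRC=([^ ]+) DST=([^ ]+).*/\1-\2 \3-\4/p' \
    | sort | uniq -c
}

# Usage: grep REJECT /var/log/messages | summarize_rejects
```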
&lt;p&gt;Good! Now we can fix the problem, just add &lt;code&gt;cni0&lt;/code&gt; into the public zone!&lt;/p&gt;
&lt;p&gt;Perform this on every node as &lt;code&gt;admin&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo firewall-cmd --permanent --zone=public --add-interface=cni0
sudo firewall-cmd --reload
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then &lt;code&gt;ping&lt;/code&gt; should start working!&lt;/p&gt;
&lt;p&gt;:::warning
Worst case, you might need to enable &lt;code&gt;masquerade&lt;/code&gt; manually in &lt;code&gt;firewalld&lt;/code&gt; if it shows a &lt;code&gt;no&lt;/code&gt; value, and you might not even have &lt;code&gt;flannel.1&lt;/code&gt; in the public zone! &lt;strong&gt;MASQUERADE&lt;/strong&gt; is essential for &lt;strong&gt;proper packet routing&lt;/strong&gt; when using an overlay network like &lt;strong&gt;Flannel:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pods are assigned &lt;strong&gt;virtual IPs (e.g., &lt;code&gt;10.244.x.x&lt;/code&gt;)&lt;/strong&gt; that exist &lt;strong&gt;only inside the cluster&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;These IPs are &lt;strong&gt;not directly routable&lt;/strong&gt; on the physical network.&lt;/li&gt;
&lt;li&gt;MASQUERADE ensures that packets from &lt;strong&gt;one node’s pod network&lt;/strong&gt; (&lt;code&gt;10.244.x.x&lt;/code&gt;) get translated correctly when sent to another node.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, let&apos;s check firewalld configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;firewall-cmd --query-masquerade
firewall-cmd --get-active-zones

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./check-flannel-and-masquerade.webp&quot; alt=&quot;check flannel and masquerade&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Fix the above firewalld issues on every node as &lt;code&gt;admin&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;sudo firewall-cmd --permanent --add-masquerade
sudo firewall-cmd --permanent --zone=public --add-interface=flannel.1 
sudo systemctl reload firewalld

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, everything on the network side should be good!
:::&lt;/p&gt;
&lt;p&gt;As a summary of this networking troubleshooting, I prepared a diagram for you:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./icmp-flowchar.png&quot; alt=&quot;ICMP flowchart in Kubernetes cluster&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;h1&gt;Exploring NodePort Service&lt;/h1&gt;
&lt;p&gt;You might ask: can &lt;code&gt;testpod&lt;/code&gt; ping &lt;code&gt;nginx-single-7dfff5577-2v25s&lt;/code&gt; by name directly?&lt;/p&gt;
&lt;p&gt;No, it won&apos;t work by default.&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kubernetes does NOT create DNS records for individual pods.&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When you run &lt;code&gt;ping nginx-single-7dfff5577-2v25s&lt;/code&gt;, your shell tries to resolve the pod name to an IP.&lt;/li&gt;
&lt;li&gt;But there’s no built-in DNS entry for an individual pod unless a Service is created.&lt;/li&gt;
&lt;/ul&gt;
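&lt;p&gt;What Kubernetes DNS does create is one record per Service, following a fixed pattern: service name, then namespace, then &lt;code&gt;svc&lt;/code&gt; plus the cluster domain. A trivial sketch to make the pattern explicit (&lt;code&gt;svc_fqdn&lt;/code&gt; is a made-up helper; &lt;code&gt;cluster.local&lt;/code&gt; is only the default cluster domain):&lt;/p&gt;

```shell
# Build the in-cluster DNS name of a Service:
# SERVICE.NAMESPACE.svc.CLUSTER_DOMAIN
svc_fqdn() {
  svc="$1"; ns="$2"; domain="${3:-cluster.local}"
  echo "${svc}.${ns}.svc.${domain}"
}

# svc_fqdn nginx-service service-type-test
# prints: nginx-service.service-type-test.svc.cluster.local
```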
&lt;p&gt;So, let&apos;s create a &lt;code&gt;NodePort&lt;/code&gt; service for our &lt;code&gt;Nginx&lt;/code&gt; service!&lt;/p&gt;
&lt;p&gt;On the &lt;code&gt;k8s-1&lt;/code&gt; master node, create a service YAML file:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;vim nginx-deployment/nginx-nodeport-service.yaml

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Paste the following:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: service-type-test
spec:
  selector:
    app: nginx
  type: NodePort
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Apply the service:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f nginx-deployment/nginx-nodeport-service.yaml

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We should see:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./nodeport-running.webp&quot; alt=&quot;nodeport is running&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;NodePort: Nginx Access Methods&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Can Access &lt;code&gt;nginx-service&lt;/code&gt;?&lt;/th&gt;
&lt;th&gt;Method to Use&lt;/th&gt;
&lt;th&gt;Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, same namespace)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://nginx-service.service-type-test:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kubernetes DNS resolves it to the service.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, using Pod IP directly)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Works, but not recommended (Pod IPs change).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, different namespace)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No (default) ✅ Yes (if explicitly specified)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://nginx-service.service-type-test.svc.cluster.local:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cross-namespace access needs full DNS.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Worker nodes (&lt;code&gt;k8s-2&lt;/code&gt;, &lt;code&gt;k8s-3&lt;/code&gt;, etc.)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://localhost:30080&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;NodePort is open on all nodes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Worker nodes (&lt;code&gt;k8s-2&lt;/code&gt;, &lt;code&gt;k8s-3&lt;/code&gt;, etc.) using Pod IP directly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Because Flannel automatically configures routing between nodes. If you check routes on any node, you should see something like this:&lt;br /&gt;&lt;code&gt;❯ ip route | grep 10.244&lt;/code&gt;&lt;br /&gt;&lt;code&gt;10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink&lt;/code&gt;&lt;br /&gt;&lt;code&gt;10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1&lt;/code&gt;&lt;br /&gt;&lt;code&gt;10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink&lt;/code&gt;&lt;br /&gt;&lt;code&gt;10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink&lt;/code&gt;&lt;br /&gt;&lt;code&gt;10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Master node (&lt;code&gt;k8s-1&lt;/code&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://localhost:30080&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;NodePort is open on all nodes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Master node using Pod IP directly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Same as worker nodes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Laptop (VMFusion, same &lt;code&gt;vmnet2&lt;/code&gt; network as worker nodes)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://&amp;lt;any-worker-node-ip&amp;gt;:30080&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;NodePort is accessible externally.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Laptop using Pod IP directly (&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Flannel can&apos;t manage routes on my laptop ^^&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;:::info
The format &lt;strong&gt;&lt;code&gt;nginx-service.service-type-test&lt;/code&gt;&lt;/strong&gt; is a &lt;strong&gt;Kubernetes internal DNS name&lt;/strong&gt; that follows this structure:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&amp;lt;SERVICE_NAME&amp;gt;.&amp;lt;NAMESPACE&amp;gt;.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, when you run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://nginx-service.service-type-test:80
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It is equivalent to:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;curl http://nginx-service.service-type-test.svc.cluster.local:80
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Kubernetes automatically creates a DNS entry for every service, so any pod inside the cluster can resolve &lt;code&gt;nginx-service.service-type-test&lt;/code&gt; to its NodePort service, which forwards traffic to the appropriate pod.
:::&lt;/p&gt;
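&lt;p&gt;The structure above means a service FQDN can be assembled mechanically from its parts. A tiny shell sketch using the names from this tutorial (&lt;code&gt;cluster.local&lt;/code&gt; is the default cluster domain, configurable at cluster setup):&lt;/p&gt;

```shell
# Assemble <SERVICE_NAME>.<NAMESPACE>.svc.<CLUSTER_DOMAIN>
SERVICE_NAME="nginx-service"
NAMESPACE="service-type-test"
CLUSTER_DOMAIN="cluster.local"
fqdn="${SERVICE_NAME}.${NAMESPACE}.svc.${CLUSTER_DOMAIN}"
echo "$fqdn"   # nginx-service.service-type-test.svc.cluster.local
```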
&lt;h1&gt;Exploring ClusterIP Service&lt;/h1&gt;
&lt;p&gt;We can test ClusterIP as well, create &lt;code&gt;/home/admin/nginx-deployment/nginx-clusterip-service.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: service-type-test
spec:
  selector:
    app: nginx
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 80

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Delete &lt;code&gt;NodePort&lt;/code&gt; and apply &lt;code&gt;ClusterIP&lt;/code&gt; service:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl delete -f /home/admin/nginx-deployment/nginx-nodeport-service.yaml
kubectl apply -f /home/admin/nginx-deployment/nginx-clusterip-service.yaml
kubectl get service -n service-type-test -o wide
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The output would look like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl get service -n service-type-test -o wide

NAME            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE     SELECTOR
nginx-service   NodePort   10.101.98.109   &amp;lt;none&amp;gt;        80:30080/TCP   3h19m   app=nginx
❯ kubectl delete -f /home/admin/nginx-deployment/nginx-nodeport-service.yaml

service &quot;nginx-service&quot; deleted
❯ kubectl get service -n service-type-test -o wide

No resources found in service-type-test namespace.
❯ kubectl apply -f /home/admin/nginx-deployment/nginx-clusterip-service.yaml

service/nginx-service created
❯ kubectl get service -n service-type-test -o wide

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE   SELECTOR
nginx-service   ClusterIP   10.101.195.219   &amp;lt;none&amp;gt;        80/TCP    3s    app=nginx

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The access method table:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Can Access &lt;code&gt;nginx-service&lt;/code&gt;?&lt;/th&gt;
&lt;th&gt;Method to Use&lt;/th&gt;
&lt;th&gt;Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, same namespace)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://nginx-service.service-type-test:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Resolves to ClusterIP &lt;code&gt;10.101.195.219&lt;/code&gt;, accessible within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, using Pod IP directly)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Works as Pod IP is routable within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, different namespace)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://nginx-service.service-type-test.svc.cluster.local:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DNS resolves to ClusterIP, accessible within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Worker nodes (&lt;code&gt;k8s-2&lt;/code&gt;, &lt;code&gt;k8s-3&lt;/code&gt;, etc.)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.101.195.219:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Because Flannel automatically configures routing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Master node (&lt;code&gt;k8s-1&lt;/code&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.101.195.219:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ClusterIP is accessible from within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Laptop (VMFusion, same &lt;code&gt;vmnet2&lt;/code&gt; network as worker nodes)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;ClusterIP is internal and not exposed externally.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Laptop using Pod IP directly (&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Pod IPs (&lt;code&gt;10.244.x.x&lt;/code&gt;) are not reachable from outside the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;What is a ClusterIP? Why Can a Worker Node Access the ClusterIP?&lt;/h2&gt;
&lt;p&gt;I know you might ask this. Here comes the breakdown:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1.&lt;/strong&gt; ClusterIP is a Virtual IP Managed by kube-proxy.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;10.101.195.219&lt;/code&gt; is not tied to any single pod or node—it’s a virtual IP managed by &lt;code&gt;kube-proxy&lt;/code&gt;. When a request is made to &lt;code&gt;10.101.195.219:80&lt;/code&gt;, &lt;code&gt;kube-proxy&lt;/code&gt; redirects it to one of the matching pods (&lt;code&gt;10.244.1.4:80&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2.&lt;/strong&gt; Flannel Provides Pod-to-Pod Connectivity Across Nodes&lt;/p&gt;
&lt;p&gt;As I&apos;ve mentioned previously, Flannel creates an overlay network so all pods (&lt;code&gt;10.244.x.x&lt;/code&gt;) can communicate, even across different nodes. If the Nginx pod (&lt;code&gt;10.244.1.4&lt;/code&gt;) is on a different node (&lt;code&gt;k8s-2&lt;/code&gt;) from &lt;code&gt;testpod&lt;/code&gt; (which is on &lt;code&gt;k8s-3&lt;/code&gt;), Flannel encapsulates the traffic and routes it through the worker nodes&apos; &lt;code&gt;ens160&lt;/code&gt; (&lt;code&gt;172.16.211.x&lt;/code&gt;) interfaces.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3.&lt;/strong&gt; Iptables Rules Handle Traffic Routing&lt;/p&gt;
&lt;p&gt;&lt;code&gt;kube-proxy&lt;/code&gt; sets up iptables rules on each node to redirect ClusterIP traffic to the actual pod. Run &lt;code&gt;iptables-save | grep 10.101.195.219&lt;/code&gt; on any node and you should see rules like the ones below forwarding traffic to &lt;code&gt;10.244.1.4&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./check-clusterip-iptables.webp&quot; alt=&quot;display iptables added for clusterip&quot; /&gt;&lt;/p&gt;
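&lt;p&gt;The rules in that screenshot follow a recognizable DNAT pattern. As a sketch, here is the same filter run against sample text rather than a live cluster (the chain names &lt;code&gt;KUBE-SVC-XXXX&lt;/code&gt; and &lt;code&gt;KUBE-SEP-YYYY&lt;/code&gt; are placeholders; real chain names are hashed):&lt;/p&gt;

```shell
# Sample kube-proxy style rules (illustrative text, NOT from a real cluster)
cat > /tmp/iptables-sample.txt <<'EOF'
-A KUBE-SERVICES -d 10.101.195.219/32 -p tcp --dport 80 -j KUBE-SVC-XXXX
-A KUBE-SVC-XXXX -j KUBE-SEP-YYYY
-A KUBE-SEP-YYYY -p tcp -j DNAT --to-destination 10.244.1.4:80
EOF
# Filter for the ClusterIP and the backing pod IP, as done in the post:
grep -E '10\.101\.195\.219|10\.244\.1\.4' /tmp/iptables-sample.txt
```

&lt;p&gt;The chain hop is the key idea: &lt;code&gt;KUBE-SERVICES&lt;/code&gt; matches the ClusterIP, jumps to a per-service chain, which picks a per-endpoint chain that DNATs to the pod.&lt;/p&gt;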
&lt;h1&gt;NodePort vs ClusterIP&lt;/h1&gt;
&lt;p&gt;Before we wrap up, let&apos;s take a step back and compare &lt;strong&gt;ClusterIP&lt;/strong&gt; and &lt;strong&gt;NodePort&lt;/strong&gt;, two essential service types in Kubernetes. While both enable communication within a cluster, their accessibility and use cases differ significantly. Whether you&apos;re building internal microservices or exposing an application externally, choosing the right service type is crucial.&lt;/p&gt;
&lt;p&gt;It&apos;s always easier to compare two similar technologies with a table. The one below summarizes their key differences to help you decide which fits your needs best.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;ClusterIP&lt;/th&gt;
&lt;th&gt;NodePort&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accessibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only accessible within the cluster&lt;/td&gt;
&lt;td&gt;Accessible from outside the cluster via &lt;code&gt;NodeIP:NodePort&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Default Behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Assigned a private IP within the cluster&lt;/td&gt;
&lt;td&gt;Exposes a service on a high-numbered port (30000-32767) on each node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Use Case&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Internal communication between microservices&lt;/td&gt;
&lt;td&gt;External access without a LoadBalancer, typically for development/testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;How to Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Via &lt;code&gt;ClusterIP&lt;/code&gt; or service name inside the cluster&lt;/td&gt;
&lt;td&gt;Via &lt;code&gt;http://&amp;lt;NodeIP&amp;gt;:&amp;lt;NodePort&amp;gt;&lt;/code&gt; from external clients&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example Service YAML&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;apiVersion: v1&lt;/code&gt;&lt;br /&gt;&lt;code&gt;kind: Service&lt;/code&gt;&lt;br /&gt;&lt;code&gt;metadata:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;name: my-clusterip-service&lt;/code&gt;&lt;br /&gt;&lt;code&gt;spec:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;type: ClusterIP&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;selector:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;app: my-app&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;ports:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;- port: 80&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;targetPort: 80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;apiVersion: v1&lt;/code&gt;&lt;br /&gt;&lt;code&gt;kind: Service&lt;/code&gt;&lt;br /&gt;&lt;code&gt;metadata:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;name: my-nodeport-service&lt;/code&gt;&lt;br /&gt;&lt;code&gt;spec:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;type: NodePort&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;selector:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;app: my-app&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;ports:&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;- port: 80&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;targetPort: 80&lt;/code&gt;&lt;br /&gt;&lt;code&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;nodePort: 30080&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Requires External Networking?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No, works entirely within the cluster&lt;/td&gt;
&lt;td&gt;Yes, needs the node&apos;s IP to be reachable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security Considerations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;More secure since it&apos;s only accessible inside the cluster&lt;/td&gt;
&lt;td&gt;Less secure as it exposes a port on all nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;🎉 Congratulations!&lt;/h1&gt;
&lt;p&gt;If you’ve made it this far, congrats! Our Nginx instance is now fully accessible, we&apos;ve learned the NodePort and ClusterIP services, and honestly, this feels like a huge win!&lt;/p&gt;
&lt;p&gt;:::info
I originally thought about wrapping up the series here—it’s been an intense ride. I spent &lt;strong&gt;a full week&lt;/strong&gt;, squeezing every spare moment and working around the clock to tackle one of the most crucial (and driest) parts of Kubernetes. I was in that state of learning excitement, pushing through, and now that it’s finally done, I feel a &lt;strong&gt;huge sense of accomplishment… and total exhaustion&lt;/strong&gt;. But seeing everything come together is just &lt;strong&gt;too satisfying&lt;/strong&gt; to stop now. So… I’m keeping this journey going! 🚀
:::&lt;/p&gt;
&lt;p&gt;Next up, I’ll be diving into other Kubernetes services—ExternalName and LoadBalancer—because why stop when there’s so much more to explore?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stay tuned for the next post!&lt;/strong&gt; 😎🔥&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, I laid out the &lt;strong&gt;networking plan&lt;/strong&gt;, my &lt;strong&gt;goals for setting up Kubernetes&lt;/strong&gt;, and how to &lt;strong&gt;prepare a base VM image&lt;/strong&gt; for the cluster.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, I walked through &lt;strong&gt;configuring a local DNS server and NTP server&lt;/strong&gt;, essential for stable name resolution and time synchronization across nodes locally. These foundational steps will make our Kubernetes setup smoother.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt;, I finished the Kubernetes cluster setup with Flannel, ending up with one Kubernetes master and four worker nodes ready for real workloads 🔥&lt;/p&gt;
&lt;p&gt;🚀 &lt;strong&gt;&lt;a href=&quot;/posts/part-4-nodeport-vs-clusterip&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt; is the current one!&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/externalname-loadbalancer-5&quot;&gt;Part 5&lt;/a&gt;&lt;/strong&gt;, I explored how to use ExternalName and LoadBalancer services and how to run load testing with the &lt;code&gt;hey&lt;/code&gt; tool.&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>ExternalName and LoadBalancer - Ultimate Kubernetes Tutorial Part 5</title><link>https://geekcoding101.com/posts/externalname-loadbalancer-5</link><guid isPermaLink="true">https://geekcoding101.com/posts/externalname-loadbalancer-5</guid><pubDate>Tue, 18 Mar 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction&lt;/h1&gt;
&lt;p&gt;Hey, welcome back to my &lt;a href=&quot;/tags/kubernetes&quot;&gt;ultimate Kubernetes tutorials&lt;/a&gt;! So far, we&apos;ve explored &lt;a href=&quot;/posts/part-4-nodeport-vs-clusterip&quot;&gt;&lt;strong&gt;ClusterIP&lt;/strong&gt; and &lt;strong&gt;NodePort&lt;/strong&gt;&lt;/a&gt;, but what if you need to route traffic outside your cluster or expose your app with a real external IP? That’s where &lt;a href=&quot;https://www.kubecost.com/kubernetes-best-practices/kubernetes-external-service/&quot;&gt;&lt;strong&gt;ExternalName&lt;/strong&gt;&lt;/a&gt; and &lt;a href=&quot;https://www.okteto.com/blog/kubernetes-load-balancer-service/&quot;&gt;&lt;strong&gt;LoadBalancer&lt;/strong&gt;&lt;/a&gt; services come in. &lt;strong&gt;ExternalName&lt;/strong&gt; lets your pods seamlessly connect to external services using DNS, while &lt;strong&gt;LoadBalancer&lt;/strong&gt; provides a publicly accessible endpoint for your app. In this post, we’ll break down how they work, when to use them, and how to configure them in your Kubernetes cluster. Let’s dive in! 🚀&lt;/p&gt;
&lt;h1&gt;Exploring ExternalName Service&lt;/h1&gt;
&lt;p&gt;Okay, we&apos;re still in my nginx/testpod environment in namespace &lt;code&gt;service-type-test&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/posts/part-4-nodeport-vs-clusterip&quot;&gt;In our last post&lt;/a&gt;, we have ClusterIP running, let&apos;s delete it to get a clean environment to start:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl delete -f /home/admin/nginx-deployment/nginx-clusterip-service.yaml
kubectl get service -n service-type-test -o wide
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You should not see any services running in the output above.&lt;/p&gt;
&lt;p&gt;Now, let&apos;s work on  &lt;code&gt;ExternalName&lt;/code&gt;!&lt;/p&gt;
&lt;p&gt;Creating an &lt;code&gt;ExternalName&lt;/code&gt; service is a little simpler than creating a &lt;code&gt;NodePort&lt;/code&gt; or &lt;code&gt;ClusterIP&lt;/code&gt; service. Create a file &lt;code&gt;/home/admin/nginx-deployment/nginx-externalname-service.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: service-type-test
spec:
  type: ExternalName
  externalName: my-nginx.external.local

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Unlike &lt;strong&gt;ClusterIP&lt;/strong&gt;, &lt;strong&gt;NodePort&lt;/strong&gt;, &lt;strong&gt;LoadBalancer&lt;/strong&gt;, or &lt;strong&gt;Headless services&lt;/strong&gt;, this service does not select backend pods. Instead, it just creates a DNS alias that redirects traffic to an external hostname. So:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;No selector needed → It does not route traffic to Kubernetes pods.&lt;/li&gt;
&lt;li&gt;No labels needed → There’s no pod matching required since it’s just a DNS pointer.&lt;/li&gt;
&lt;li&gt;It simply returns the CNAME record when queried inside the cluster.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Simpler on the Kubernetes side, but more manual steps on your side ^^&lt;/p&gt;
&lt;p&gt;I must manually configure DNS resolution for &lt;code&gt;my-nginx.external.local&lt;/code&gt; so that Kubernetes can resolve it to the correct external IP or hostname.&lt;/p&gt;
&lt;p&gt;Then, how? Edit the &lt;code&gt;CoreDNS&lt;/code&gt; ConfigMap:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl edit configmap -n kube-system coredns
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then update it as below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hosts {
    172.16.211.12 my-nginx.external.local
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./update-coredns-config-with-hosts-for-externalname.webp&quot; alt=&quot;update coredns config with hosts for externalname&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Restart &lt;code&gt;CoreDNS&lt;/code&gt; pods and apply &lt;code&gt;ExternalName&lt;/code&gt; service:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl rollout restart deployment coredns -n kube-system 
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl apply -f ./nginx-deployment/nginx-externalname-service.yaml
kubectl get services -o wide
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You should see this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./externalname-service-running.webp&quot; alt=&quot;externalname service is running&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Tricks on Name Resolution&lt;/h2&gt;
&lt;p&gt;Now let&apos;s try resolving the name in &lt;code&gt;testpod&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[root@testpod /]# nslookup 172.16.211.12
12.211.16.172.in-addr.arpa      name = my-nginx.external.local.

[root@testpod /]# nslookup my-nginx.external.local.
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   my-nginx.external.local
Address: 172.16.211.12

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Did you notice that I put a trailing dot &lt;code&gt;.&lt;/code&gt; when running &lt;code&gt;nslookup&lt;/code&gt; on &lt;code&gt;my-nginx.external.local&lt;/code&gt; ?&lt;/p&gt;
&lt;p&gt;It&apos;s a must with the current configuration. Otherwise, you will hit this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[root@testpod /]# nslookup my-nginx.external.local
Server:         10.96.0.10
Address:        10.96.0.10#53

** server can&apos;t find my-nginx.external.local.service-type-test.svc.cluster.local: SERVFAIL

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The reason is that the DNS query gets the Kubernetes default search domain appended, so the command above is equivalent to:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;nslookup my-nginx.external.local.service-type-test.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This happens because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Inside a Kubernetes pod, &lt;code&gt;/etc/resolv.conf&lt;/code&gt; is set up with &lt;code&gt;ndots:5&lt;/code&gt; and search domains such as &lt;code&gt;service-type-test.svc.cluster.local&lt;/code&gt;, so any name with fewer than five dots is first tried with those suffixes appended.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;my-nginx.external.local&lt;/code&gt; has only two dots, so the resolver treats it as a relative name and appends the search suffix, producing the failing query above.&lt;/li&gt;
&lt;/ol&gt;
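&lt;p&gt;The expansion can be sketched in plain shell (illustrative only, not real resolver code; the search list and &lt;code&gt;ndots:5&lt;/code&gt; come from the pod&apos;s &lt;code&gt;/etc/resolv.conf&lt;/code&gt;):&lt;/p&gt;

```shell
# Mimic how the stub resolver expands a relative name with ndots:5
name="my-nginx.external.local"
search="service-type-test.svc.cluster.local svc.cluster.local cluster.local"
ndots=5
dots=$(printf '%s' "$name" | awk -F. '{print NF-1}')
if [ "$dots" -lt "$ndots" ]; then
  for s in $search; do
    echo "try: $name.$s"    # suffixed queries are tried first
  done
fi
echo "try: $name."          # trailing dot: absolute name, queried as-is
```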
&lt;p&gt;One option to force a fully qualified domain name (FQDN) query is to append a trailing dot &lt;code&gt;.&lt;/code&gt; to the DNS name ^^&lt;/p&gt;
&lt;p&gt;Then why bother with the trailing &lt;code&gt;.&lt;/code&gt;? Can we make life easier?&lt;/p&gt;
&lt;p&gt;Sure. Then just update the configmap of &lt;code&gt;CoreDNS&lt;/code&gt; to this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hosts {
    172.16.211.12 my-nginx.external.local my-nginx.external.local.service-type-test.svc.cluster.local
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now both &lt;code&gt;my-nginx.external.local&lt;/code&gt; and &lt;code&gt;my-nginx.external.local.&lt;/code&gt; work!&lt;/p&gt;
&lt;h2&gt;Integrate ExternalName And ClusterIP&lt;/h2&gt;
&lt;p&gt;And then you might ask: why did I use &lt;code&gt;172.16.211.12&lt;/code&gt;? Do I have to use the worker node IP where the pod is running to resolve the external name?&lt;/p&gt;
&lt;p&gt;Not necessarily! You don’t have to use the exact worker node IP where the pod is running. Instead, you should configure &lt;code&gt;my-nginx.external.local&lt;/code&gt; to resolve to an IP that can correctly route traffic to the Nginx pod.&lt;/p&gt;
&lt;p&gt;One solution, which I used here and also recommend, is to use a &lt;code&gt;ClusterIP&lt;/code&gt; service!&lt;/p&gt;
&lt;p&gt;:::tip
Yes, we can have both services for our nginx service!
:::&lt;/p&gt;
&lt;p&gt;Before that, we need to delete the existing service to get a clean start and ensure no service is running.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl delete -f ./nginx-deployment/nginx-externalname-service.yaml
kubectl get services -o wide

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then, we need to update our yaml files, because so far both &lt;code&gt;nginx-deployment/nginx-externalname-service.yaml&lt;/code&gt; and &lt;code&gt;nginx-deployment/nginx-clusterip-service.yaml&lt;/code&gt; use the same name &lt;code&gt;nginx-service&lt;/code&gt;! In Kubernetes, a Service is uniquely identified by its name and namespace. Let&apos;s update the names a bit.&lt;/p&gt;
&lt;p&gt;I know, it&apos;s just a one line change. But let&apos;s make sure you have it correctly!&lt;/p&gt;
&lt;p&gt;&lt;code&gt;nginx-externalname-service.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-external-service
  namespace: service-type-test
spec:
  type: ExternalName
  externalName: my-nginx.external.local

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;nginx-clusterip-service.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-clusterip-service
  namespace: service-type-test
spec:
  selector:
    app: nginx
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 80

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then perform the commands:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f nginx-deployment/nginx-clusterip-service.yaml
kubectl apply -f nginx-deployment/nginx-externalname-service.yaml
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then update &lt;code&gt;CoreDNS&lt;/code&gt; via &lt;code&gt;kubectl edit configmap -n kube-system coredns&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hosts {
    10.98.205.55 my-nginx.external.local my-nginx.external.local.service-type-test.svc.cluster.local
    fallthrough
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Please note, I added &lt;code&gt;fallthrough&lt;/code&gt; this time, because &lt;code&gt;ClusterIP&lt;/code&gt; names are resolved by the &lt;code&gt;kubernetes&lt;/code&gt; plugin in &lt;code&gt;CoreDNS&lt;/code&gt; instead of the &lt;code&gt;hosts&lt;/code&gt; plugin. The &lt;code&gt;fallthrough&lt;/code&gt; directive lets later plugins (like &lt;code&gt;kubernetes&lt;/code&gt;) &lt;strong&gt;continue processing&lt;/strong&gt; a query when the entry isn&apos;t found in &lt;code&gt;hosts&lt;/code&gt; first.&lt;/p&gt;
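&lt;p&gt;For context, here is a sketch of where the &lt;code&gt;hosts&lt;/code&gt; block sits in the Corefile relative to the &lt;code&gt;kubernetes&lt;/code&gt; plugin (abridged; your actual Corefile has more directives):&lt;/p&gt;

```
.:53 {
    hosts {
        10.98.205.55 my-nginx.external.local my-nginx.external.local.service-type-test.svc.cluster.local
        fallthrough
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
}
```

&lt;p&gt;Plugin order in CoreDNS is fixed at build time and &lt;code&gt;hosts&lt;/code&gt; runs before &lt;code&gt;kubernetes&lt;/code&gt;, which is exactly why &lt;code&gt;fallthrough&lt;/code&gt; is needed for ClusterIP names to reach the &lt;code&gt;kubernetes&lt;/code&gt; plugin.&lt;/p&gt;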
&lt;p&gt;Then run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl rollout restart deployment coredns -n kube-system
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, let&apos;s test name resolution in &lt;code&gt;testpod&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[root@testpod /]# nslookup 10.98.205.55
55.205.98.10.in-addr.arpa       name = my-nginx.external.local.
55.205.98.10.in-addr.arpa       name = my-nginx.external.local.service-type-test.svc.cluster.local.

[root@testpod /]# nslookup my-nginx.external.local
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   my-nginx.external.local.service-type-test.svc.cluster.local
Address: 10.98.205.55

[root@testpod /]# nslookup my-nginx.external.local.
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   my-nginx.external.local
Address: 10.98.205.55

[root@testpod /]# nslookup nginx-clusterip-service.service-type-test
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   nginx-clusterip-service.service-type-test.svc.cluster.local
Address: 10.98.205.55

[root@testpod /]#

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And then test access to our Nginx service!&lt;/p&gt;
&lt;h2&gt;The Comparison of Access Methods&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Can Access &lt;code&gt;nginx-service&lt;/code&gt;?&lt;/th&gt;
&lt;th&gt;Method to Use&lt;/th&gt;
&lt;th&gt;Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, same namespace)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://nginx-clusterip-service.service-type-test:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Resolves to ClusterIP &lt;code&gt;10.98.205.55&lt;/code&gt;, accessible within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, using ExternalName service)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://nginx-external-service.service-type-test:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DNS resolves &lt;code&gt;my-nginx.external.local&lt;/code&gt; to &lt;code&gt;10.98.205.55&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, using Pod IP directly)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Works as Pod IP is routable within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;testpod (inside cluster, different namespace)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://nginx-clusterip-service.service-type-test.svc.cluster.local:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DNS resolves to ClusterIP, accessible within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Worker nodes (&lt;code&gt;k8s-2&lt;/code&gt;, &lt;code&gt;k8s-3&lt;/code&gt;, etc.)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.98.205.55:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ClusterIP is accessible from within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Master node (&lt;code&gt;k8s-1&lt;/code&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl http://10.98.205.55:80&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;ClusterIP is accessible from within the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Laptop (VMFusion, same &lt;code&gt;vmnet2&lt;/code&gt; network as worker nodes)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;These names resolve only inside the Kubernetes cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Laptop using Pod IP directly (&lt;code&gt;curl http://10.244.1.4:80&lt;/code&gt;)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Pod IPs (&lt;code&gt;10.244.x.x&lt;/code&gt;) are not reachable from outside the cluster.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Laptop (VMFusion) using LoadBalancer (if configured)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;No LoadBalancer is configured for this service yet.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Amazing!&lt;/p&gt;
&lt;h1&gt;Exploring LoadBalancer Service&lt;/h1&gt;
&lt;p&gt;Thanks for staying with me so far! I hope you enjoy my step-by-step explanation!&lt;/p&gt;
&lt;p&gt;Now let&apos;s clean up the service and start learning &lt;code&gt;LoadBalancer&lt;/code&gt; service!&lt;/p&gt;
&lt;p&gt;Since we&apos;re running Kubernetes &lt;strong&gt;inside VMFusion&lt;/strong&gt;, there&apos;s no &lt;strong&gt;cloud provider&lt;/strong&gt; to automatically assign a LoadBalancer IP. We&apos;ll need to use &lt;a href=&quot;https://metallb.io/&quot;&gt;&lt;strong&gt;MetalLB&lt;/strong&gt;&lt;/a&gt; as a software-based LoadBalancer for our cluster.&lt;/p&gt;
&lt;h2&gt;Install MetalLB&lt;/h2&gt;
&lt;p&gt;Go to the &lt;a href=&quot;https://github.com/metallb/metallb/tags&quot;&gt;MetalLB tags page&lt;/a&gt; and get the latest version (currently 0.14.9). Then apply it on our master node as &lt;code&gt;admin&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.9/config/manifests/metallb-native.yaml
kubectl get pods -n metallb-system
kubectl get crds | grep metallb
kubectl get svc -n metallb-system
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./metalb-system-pods-running.webp&quot; alt=&quot;metalb system pods running&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./metalb-system-crds-svc-running.webp&quot; alt=&quot;metalb system crds svc running&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Define IPAddressPool&lt;/h2&gt;
&lt;p&gt;Create file &lt;code&gt;/home/admin/nginx-deployment/metalb-ipaddresspool.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: metallb.io/v1beta1  # Use v1beta1 for latest MetalLB versions
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
  - 172.16.211.200-172.16.211.210  # Define an IP range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-adv
  namespace: metallb-system

&lt;/code&gt;&lt;/pre&gt;
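&lt;p&gt;A note on the &lt;code&gt;L2Advertisement&lt;/code&gt; above: with an empty &lt;code&gt;spec&lt;/code&gt;, it announces every pool in the namespace. If you later add more pools and want this advertisement limited to one, you can reference the pool explicitly (a sketch using this tutorial&apos;s pool name):&lt;/p&gt;

```yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-adv
  namespace: metallb-system
spec:
  ipAddressPools:
  - default-pool
```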
&lt;pre&gt;&lt;code&gt;kubectl apply -f nginx-deployment/metalb-ipaddresspool.yaml
kubectl get ipaddresspools -n metallb-system
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./check-metalb-ipaddresspool.webp&quot; alt=&quot;check metalb ipaddresspool&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Create LoadBalancer Service&lt;/h2&gt;
&lt;p&gt;Create file &lt;code&gt;/home/admin/nginx-deployment/nginx-loadbalancer-service.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-loadbalancer
  namespace: service-type-test
spec:
  selector:
    app: nginx
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Apply and check:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f nginx-deployment/nginx-loadbalancer-service.yaml
kubectl get svc -n service-type-test
curl http://172.16.211.200:80
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./loadbalancer-is-working.webp&quot; alt=&quot;loadbalancer is working&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The above test was run on master node &lt;code&gt;k8s-1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let&apos;s also test from &lt;code&gt;testpod&lt;/code&gt; and from my laptop.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./loadbalancer-is-working-in-testpod.webp&quot; alt=&quot;loadbalancer is working in-testpod&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./loadbalancer-is-working-in-laptop.webp&quot; alt=&quot;loadbalancer is working in laptop&quot; /&gt;All work!&lt;/p&gt;
&lt;h2&gt;Wait! How Do We Know the LoadBalancer Is Balancing?!&lt;/h2&gt;
&lt;p&gt;That&apos;s a good question!&lt;/p&gt;
&lt;p&gt;Since &lt;strong&gt;MetalLB&lt;/strong&gt; operates at &lt;strong&gt;Layer 2 (default) or BGP&lt;/strong&gt;, it announces the service IP to our network, and traffic reaching that IP is then distributed &lt;strong&gt;across the pods&lt;/strong&gt; behind the service. Let’s simulate some traffic and test whether it is actually being balanced.&lt;/p&gt;
&lt;p&gt;Check how many pods your LoadBalancer service is distributing traffic to:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl get endpoints -n service-type-test nginx-loadbalancer

NAME                 ENDPOINTS       AGE
nginx-loadbalancer   10.244.1.4:80   38m

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We expect to see a set of endpoint IPs to balance across... What?! It only has one IP! Thinking...&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./8-hours-later.jpg&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Oh! I remember now! Our Nginx pod was set to run on just one node!&lt;/p&gt;
&lt;p&gt;To refresh your memory, here is the file we used to deploy the nginx pod:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ cat nginx-deployment/nginx-single-node.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-single
  namespace: service-type-test
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        kubernetes.io/hostname: k8s-2
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To truly test if MetalLB&apos;s LoadBalancer is distributing traffic, we need multiple pods running behind the service. If only one pod is available, all incoming requests will always hit that single pod, making it impossible to observe any load balancing in action. Kubernetes distributes traffic only among pods that match the service selector, so if there’s just one, there’s nothing to balance! To fix this, we should scale the deployment to at least two or three replicas and then send multiple requests to see how they get distributed. Let’s scale it up and test again! 🚀&lt;/p&gt;
&lt;h3&gt;Update Nginx Deployment Yaml&lt;/h3&gt;
&lt;p&gt;Typically we can use the &lt;code&gt;kubectl&lt;/code&gt; command to update &lt;code&gt;replicas&lt;/code&gt;, like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl scale deployment nginx-single -n service-type-test --replicas=3
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, as you can see, our previous Nginx yaml is named &lt;code&gt;nginx-single&lt;/code&gt;, which would be misleading now. Let&apos;s just delete it and recreate one named &lt;code&gt;nginx-multiple-nodes&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl delete -f nginx-deployment/nginx-single-node.yaml
deployment.apps &quot;nginx-single&quot; deleted
❯ kubectl get pods

NAME      READY   STATUS    RESTARTS   AGE
testpod   1/1     Running   0          2d20h
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The new nginx yaml file &lt;code&gt;/home/admin/nginx-deployment/nginx-multiple-nodes.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-multiple-nodes
  namespace: service-type-test
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f nginx-deployment/nginx-multiple-nodes.yaml
kubectl get pods -o wide
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./nginx-running-on-multiple-nodes.webp&quot; alt=&quot;nginx running on multiple nodes&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Check &lt;code&gt;kubectl get endpoints&lt;/code&gt; again:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./check-endpoints.webp&quot; alt=&quot;check endpoints again&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now we can see three IPs assigned!&lt;/p&gt;
&lt;h3&gt;A Quick Try for Testing?&lt;/h3&gt;
&lt;p&gt;I can send multiple requests from &lt;strong&gt;my laptop&lt;/strong&gt; to the LoadBalancer IP:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for i in {1..10}; do curl -s http://172.16.211.200 | grep &quot;Welcome&quot;; done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If load balancing is working correctly, some responses should come from different pods.&lt;/p&gt;
&lt;p&gt;Are you kidding? How do I know? This is what I got!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ for i in {1..10}; do curl -s http://172.16.211.200 | grep &quot;Welcome&quot;; done

&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;
&amp;lt;title&amp;gt;Welcome to nginx!&amp;lt;/title&amp;gt;
&amp;lt;h1&amp;gt;Welcome to nginx!&amp;lt;/h1&amp;gt;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I see.&lt;/p&gt;
&lt;h3&gt;Customize Nginx default.conf via ConfigMap&lt;/h3&gt;
&lt;p&gt;We can update Nginx &lt;code&gt;default.conf&lt;/code&gt; to add a header holding the server&apos;s &lt;code&gt;hostname&lt;/code&gt; to every response, so we should see different values in the response headers, telling us MetalLB is working well.&lt;/p&gt;
&lt;p&gt;We don&apos;t need to rebuild the Nginx image to use a custom &lt;code&gt;default.conf&lt;/code&gt;; we can just put it in a &lt;code&gt;configmap&lt;/code&gt; and mount it into our Nginx pods.&lt;/p&gt;
&lt;p&gt;You might ask: but what is a &lt;code&gt;ConfigMap&lt;/code&gt;, and why do it this way? If we want to update the Nginx config file, shouldn&apos;t we log into the pod and update the file manually?&lt;/p&gt;
&lt;p&gt;Ah ha! Gotcha!&lt;/p&gt;
&lt;p&gt;You&apos;re right—on a traditional Linux server, you&apos;d SSH in, modify &lt;code&gt;/etc/nginx/nginx.conf&lt;/code&gt;, and restart Nginx. But in Kubernetes, there&apos;s a more scalable and automated way to manage configurations - That is &lt;code&gt;ConfigMap&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;&lt;strong&gt;What Exactly Is a ConfigMap in Kubernetes?&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Alright, think of a &lt;strong&gt;ConfigMap&lt;/strong&gt; as &lt;strong&gt;a central place to store our application&apos;s configuration&lt;/strong&gt;—instead of hardcoding settings inside our container image, we define them separately and let Kubernetes inject them when needed.&lt;/p&gt;
&lt;h5&gt;&lt;strong&gt;Why Does This Matter?&lt;/strong&gt;&lt;/h5&gt;
&lt;p&gt;Imagine running the same application in different environments (development, testing, production). We wouldn’t want to rebuild the container image every time just to change a database URL, an API key, or a logging level. Instead, we store these settings in a ConfigMap, and our pods pull the configuration dynamically at runtime.&lt;/p&gt;
&lt;hr /&gt;
&lt;h5&gt;&lt;strong&gt;How Does a ConfigMap Work?&lt;/strong&gt;&lt;/h5&gt;
&lt;p&gt;A &lt;strong&gt;ConfigMap&lt;/strong&gt; in Kubernetes can store:&lt;br /&gt;
✅ Key-value pairs (like environment variables)&lt;br /&gt;
✅ Entire configuration files&lt;br /&gt;
✅ Command-line arguments&lt;/p&gt;
&lt;p&gt;Once created, a ConfigMap is &lt;strong&gt;stored in Kubernetes, which can inject it into pods&lt;/strong&gt; as:&lt;br /&gt;
🔹 &lt;strong&gt;Environment variables&lt;/strong&gt;&lt;br /&gt;
🔹 &lt;strong&gt;Mounted files (as volumes)&lt;/strong&gt;&lt;br /&gt;
🔹 &lt;strong&gt;Command-line arguments&lt;/strong&gt;&lt;/p&gt;
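&lt;p&gt;As a quick illustration of the first option, here is a minimal hypothetical pod fragment that pulls a single key from a ConfigMap into an environment variable. The names &lt;code&gt;app-config&lt;/code&gt; and &lt;code&gt;log_level&lt;/code&gt; are made up for this sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: expose the &apos;log_level&apos; key of ConfigMap &apos;app-config&apos;
# to the container as the LOG_LEVEL environment variable
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: busybox
    command: [&quot;sh&quot;, &quot;-c&quot;, &quot;echo $LOG_LEVEL; sleep 3600&quot;]
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: log_level
&lt;/code&gt;&lt;/pre&gt;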
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;Our case: Storing an Nginx Config in ConfigMap&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Instead of modifying the Nginx image&apos;s &lt;code&gt;default.conf&lt;/code&gt; manually inside a pod (which would get lost after a restart), we create a &lt;strong&gt;ConfigMap&lt;/strong&gt; at &lt;code&gt;/home/admin/nginx-deployment/nginx-config.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: service-type-test
data:
  default.conf: |
    server {
      listen 80;
      location / {
        add_header X-Served-By $hostname;
        root /usr/share/nginx/html;
        index index.html;
      }
    }

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here, &lt;code&gt;default.conf&lt;/code&gt; is a key, and its value is the actual Nginx configuration file.&lt;/p&gt;
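&lt;p&gt;By the way, instead of hand-writing this YAML, &lt;code&gt;kubectl&lt;/code&gt; can also generate an equivalent manifest from the file itself. A sketch, assuming &lt;code&gt;default.conf&lt;/code&gt; is saved in the current directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Sketch: generate (without actually creating) the same ConfigMap from a local file
kubectl create configmap nginx-config \
  --from-file=default.conf \
  --namespace=service-type-test \
  --dry-run=client -o yaml
&lt;/code&gt;&lt;/pre&gt;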
&lt;p&gt;Let&apos;s apply it into Kubernetes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl apply -f nginx-deployment/nginx-config.yaml
kubectl get configmaps
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You should see:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl get configmaps

NAME               DATA   AGE
kube-root-ca.crt   1      5d13h
nginx-config       1      101s

&lt;/code&gt;&lt;/pre&gt;
&lt;hr /&gt;
&lt;h4&gt;&lt;strong&gt;How Do We Use This ConfigMap in a Pod?&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;We need to mount the above ConfigMap inside our &lt;code&gt;nginx-multiple-nodes&lt;/code&gt; deployment so that every pod automatically loads the config on startup. To do this, let&apos;s just create a new nginx deployment yaml at &lt;code&gt;/home/admin/nginx-deployment/nginx-multiple-nodes-with-custom.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-multiple-nodes
  namespace: service-type-test
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config-volume
          mountPath: /etc/nginx/conf.d/default.conf  # This is where we inject the file
          subPath: default.conf                      # subPath to tell K8S to only use the value of key &apos;default.conf&apos; from the volume which is a configMap
      volumes:
      - name: nginx-config-volume
        configMap:
          name: nginx-config
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;What’s Happening Here?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;We created a volume backed by the configMap named &lt;code&gt;nginx-config&lt;/code&gt;, which was applied in the previous step.&lt;/li&gt;
&lt;li&gt;We then mount the &lt;code&gt;nginx-config&lt;/code&gt; ConfigMap &lt;strong&gt;as a file&lt;/strong&gt; at &lt;code&gt;/etc/nginx/conf.d/default.conf&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The original &lt;code&gt;default.conf&lt;/code&gt; that ships with the Nginx image is overridden, so Nginx uses this file instead of the original default one.&lt;/li&gt;
&lt;li&gt;If we update the ConfigMap in Kubernetes in the future, we can just restart the Nginx pods—&lt;strong&gt;no need to rebuild the container at all&lt;/strong&gt;!&lt;/li&gt;
&lt;/ul&gt;
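&lt;p&gt;For example, once this deployment is running, a future config change is just a two-command cycle, with no image rebuild involved:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Push the updated ConfigMap, then restart the pods so they pick it up
kubectl apply -f nginx-deployment/nginx-config.yaml
kubectl rollout restart deployment nginx-multiple-nodes -n service-type-test
&lt;/code&gt;&lt;/pre&gt;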
&lt;p&gt;So now you see why: &lt;strong&gt;Kubernetes treats containers as immutable&lt;/strong&gt;—any manual changes inside a running pod are lost when it restarts. ConfigMaps solve this by separating configuration from the application, making it:&lt;br /&gt;
✅ &lt;strong&gt;Easier to update&lt;/strong&gt; (without modifying the container image)&lt;br /&gt;
✅ &lt;strong&gt;More flexible&lt;/strong&gt; (different configs for different environments)&lt;br /&gt;
✅ &lt;strong&gt;More scalable&lt;/strong&gt; (all pods pull the latest config automatically)&lt;/p&gt;
&lt;p&gt;Let&apos;s delete the existing Nginx deployment and apply the new one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl delete -f nginx-deployment/nginx-multiple-nodes.yaml
kubectl apply -f nginx-deployment/nginx-multiple-nodes-with-custom.yaml
kubectl get pods -o wide

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./nginx-pods-running-in-new-deployment.webp&quot; alt=&quot;nginx pods running in new deployment&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Test Again!&lt;/h3&gt;
&lt;p&gt;Go back to my laptop:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for i in {1..10}; do curl -i -s http://172.16.211.200 | grep &quot;X-Served-By&quot;; done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Please note, I added &lt;code&gt;-i&lt;/code&gt; to include response headers in the output.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./curl-can-see-headers-added-by-different-workernodes.webp&quot; alt=&quot;curl can see headers added by different worker nodes&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Any Specialized Tools for Load Testing?&lt;/h3&gt;
&lt;p&gt;Okay, since you asked, let&apos;s use &lt;code&gt;hey&lt;/code&gt; (&lt;a href=&quot;https://github.com/rakyll/hey&quot;&gt;link here&lt;/a&gt;)!&lt;/p&gt;
&lt;p&gt;Install it on my mac:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew install hey
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In order to get a clean start on logs, let&apos;s restart our deployment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;kubectl rollout restart deployment nginx-multiple-nodes -n service-type-test
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is our new pods status:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl get pods -o wide

NAME                                    READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
nginx-multiple-nodes-668cdc96dd-4v8db   1/1     Running   0          2m46s   10.244.4.15   k8s-5   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
nginx-multiple-nodes-668cdc96dd-cjz2r   1/1     Running   0          2m59s   10.244.1.11   k8s-2   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
nginx-multiple-nodes-668cdc96dd-zlwqv   1/1     Running   0          2m34s   10.244.2.11   k8s-3   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;
testpod                                 1/1     Running   0          3d13h   10.244.2.5    k8s-3   &amp;lt;none&amp;gt;           &amp;lt;none&amp;gt;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;hey -n 1000 -c 50 http://172.16.211.200
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It means:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Send &lt;code&gt;1000&lt;/code&gt; HTTP requests to &lt;code&gt;http://172.16.211.200&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Send &lt;code&gt;50&lt;/code&gt; requests concurrently.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is our output:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ hey -n 1000 -c 50 http://172.16.211.200

Summary:
  Total:        0.4999 secs
  Slowest:      0.0759 secs
  Fastest:      0.0135 secs
  Average:      0.0242 secs
  Requests/sec: 2000.3376

  Total data:   599010 bytes
  Size/request: 615 bytes

Response time histogram:
  0.013 [1]     |
  0.020 [114]   |■■■■■■
  0.026 [749]   |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.032 [36]    |■■
  0.038 [14]    |■
  0.045 [10]    |■
  0.051 [0]     |
  0.057 [4]     |
  0.063 [1]     |
  0.070 [27]    |■
  0.076 [18]    |■

Latency distribution:
  10% in 0.0193 secs
  25% in 0.0203 secs
  50% in 0.0211 secs
  75% in 0.0232 secs
  90% in 0.0272 secs
  95% in 0.0549 secs
  99% in 0.0716 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0004 secs, 0.0135 secs, 0.0759 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0000 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0004 secs
  resp wait:    0.0222 secs, 0.0133 secs, 0.0553 secs
  resp read:    0.0003 secs, 0.0000 secs, 0.0047 secs

Status code distribution:
  [200] 974 responses

Error distribution:
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61257-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61258-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61260-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61263-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61264-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61265-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61266-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61267-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61268-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61269-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61270-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61271-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61272-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61273-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61274-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61275-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61276-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61277-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61278-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61279-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61280-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61281-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61282-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61283-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61284-&amp;gt;172.16.211.200:80: read: connection reset by peer
  [1]   Get &quot;http://172.16.211.200&quot;: read tcp 172.16.211.1:61286-&amp;gt;172.16.211.200:80: read: connection reset by peer

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But do we really trust the data from &lt;code&gt;hey&lt;/code&gt;?&lt;/p&gt;
&lt;p&gt;No worries—since we have a clean start, we can check the Nginx pod logs in Kubernetes to see which client each request came from!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ kubectl logs -l app=nginx -n service-type-test --tail=1000 | awk &apos;{print $1}&apos; | grep -E &apos;^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$&apos; | sort | uniq -c

    334 10.244.1.1
    235 10.244.2.1
    405 10.244.4.1

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Add up &lt;code&gt;334 + 235 + 405&lt;/code&gt;: it&apos;s &lt;code&gt;974&lt;/code&gt;, matching the &lt;code&gt;hey&lt;/code&gt; output &lt;code&gt;[200] 974 responses&lt;/code&gt;!!!&lt;/p&gt;
&lt;p&gt;I feel so satisfied!&lt;/p&gt;
&lt;p&gt;Wait! You might have noticed that the IPs in &lt;code&gt;kubectl logs&lt;/code&gt; are not the IPs in &lt;code&gt;kubectl get pods&lt;/code&gt;, and they aren&apos;t our laptop&apos;s IP either... so how can we use this data to say &quot;&lt;strong&gt;it matches&lt;/strong&gt;&quot;??&lt;/p&gt;
&lt;p&gt;Good observation!&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;.1 address&lt;/code&gt; in each subnet (e.g., &lt;code&gt;10.244.1.1&lt;/code&gt; and &lt;code&gt;10.244.2.1&lt;/code&gt;) is assigned to &lt;code&gt;cni0&lt;/code&gt;, the bridge interface created by CNI (Flannel in our case). When traffic arrives at a pod, if it comes from another node, it first passes through Flannel&apos;s virtual network (&lt;code&gt;cni0&lt;/code&gt;). By default, Nginx logs the IP of the last network hop—which in this case is the Flannel bridge (cni0) instead of the original client.&lt;/p&gt;
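&lt;p&gt;As a side note, if we wanted Nginx to log the real client IP instead, Kubernetes supports setting &lt;code&gt;externalTrafficPolicy: Local&lt;/code&gt; on the service. It preserves the original source IP, at the cost of only routing traffic to pods on the node that receives it. A sketch of the relevant service fields:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: nginx-loadbalancer
  namespace: service-type-test
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local  # preserve the original client source IP
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
&lt;/code&gt;&lt;/pre&gt;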
&lt;h1&gt;Comparison between ClusterIP, NodePort, ExternalName and LoadBalancer&lt;/h1&gt;
&lt;p&gt;I know, a comparison table is always helpful at the end of a post!&lt;/p&gt;
&lt;p&gt;Here you go:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service Type&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;How It Works&lt;/th&gt;
&lt;th&gt;When to Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ClusterIP&lt;/strong&gt; (Default)&lt;/td&gt;
&lt;td&gt;Internal communication within the cluster&lt;/td&gt;
&lt;td&gt;Creates a stable internal IP that other pods can use&lt;/td&gt;
&lt;td&gt;Use when exposing a service only to other pods (e.g., backend services, databases)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;NodePort&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Expose services externally via a node&apos;s IP and a high-numbered port&lt;/td&gt;
&lt;td&gt;Maps a fixed port (30000-32767) on each node to the service&lt;/td&gt;
&lt;td&gt;Use when external access is needed without a LoadBalancer, mainly for development &amp;amp; testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LoadBalancer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Expose services externally with a dedicated external IP&lt;/td&gt;
&lt;td&gt;Allocates an external IP via cloud provider or MetalLB&lt;/td&gt;
&lt;td&gt;Use when running on a cloud provider or using MetalLB in bare-metal environments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ExternalName&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Map a service name to an external DNS name&lt;/td&gt;
&lt;td&gt;DNS lookup redirects traffic to an external domain&lt;/td&gt;
&lt;td&gt;Use when integrating Kubernetes services with external systems (e.g., external databases or APIs)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;🎉 Congratulations!&lt;/h1&gt;
&lt;p&gt;And that’s a wrap for our Part 5: &lt;code&gt;ExternalName&lt;/code&gt; and &lt;code&gt;LoadBalancer&lt;/code&gt;! 🎉  This one was a deep dive, but seeing everything come together feels amazing. We&apos;ve tackled how Kubernetes handles external services and dynamic traffic distribution—powerful stuff! But we’re not stopping here.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stay tuned for the next post!&lt;/strong&gt; 😎🔥&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, I laid out the &lt;strong&gt;networking plan&lt;/strong&gt;, my &lt;strong&gt;goals for setting up Kubernetes&lt;/strong&gt;, and how to &lt;strong&gt;prepare a base VM image&lt;/strong&gt; for the cluster.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, I walked through &lt;strong&gt;configuring a local DNS server and NTP server&lt;/strong&gt;, essential for stable name resolution and time synchronization across nodes locally. These foundational steps will make our Kubernetes setup smoother.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt;, I finished the Kubernetes cluster setup with Flannel, ending up with one Kubernetes master and four worker nodes ready for real workloads.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt;, I explored NodePort and ClusterIP, and covered the key differences, use cases, and when to choose each for internal and external service access! 🔥&lt;/p&gt;
&lt;p&gt;🚀 In Part 5, the current one, I dived into &lt;code&gt;ExternalName&lt;/code&gt; and &lt;code&gt;LoadBalancer&lt;/code&gt; services, uncovering how they handle external access, DNS resolution, and dynamic traffic distribution!
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Mastering Terraform with AWS Guide Part 1: Launch Real AWS Infrastructure with VPC, IAM and EC2</title><link>https://geekcoding101.com/posts/1-terraform-with-aws-iam-ec2</link><guid isPermaLink="true">https://geekcoding101.com/posts/1-terraform-with-aws-iam-ec2</guid><pubDate>Tue, 15 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;So… you’ve heard about Terraform. Maybe your team is using it, maybe your cloud dreams demand it — or maybe, like me, you’ve been deep in the Kubernetes jungle (&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;My blog posts about K8S&lt;/a&gt;) and now want a declarative friend for AWS too! Either way, welcome aboard. In this post, I’ll walk you through setting up your &lt;a href=&quot;https://www.terraform.io/&quot;&gt;Terraform&lt;/a&gt; with &lt;a href=&quot;https://aws.amazon.com/&quot;&gt;AWS&lt;/a&gt; environment from scratch, on a Mac.&lt;/p&gt;
&lt;p&gt;We’ll start simple and go all the way to managing VPC, Security Groups, IAM users and EC2 infrastructure using best practices, all built with Terraform on AWS. By the end, you’ll not only run Terraform with AWS — you’ll be able to &lt;em&gt;answer questions such as&lt;/em&gt; how to run Terraform with AWS, how to create an AWS EC2 instance using Terraform, and how to create a security group in AWS using Terraform... fantastic!&lt;/p&gt;
&lt;h1&gt;What is Terraform?&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://www.terraform.io/&quot;&gt;Terraform&lt;/a&gt; is an &lt;a href=&quot;https://developer.hashicorp.com/terraform/tutorials/aws-get-started/infrastructure-as-code&quot;&gt;&lt;strong&gt;open-source infrastructure as code (IaC)&lt;/strong&gt;&lt;/a&gt; tool created by &lt;a href=&quot;https://www.hashicorp.com/en&quot;&gt;HashiCorp&lt;/a&gt;. It lets you define, provision, and manage cloud infrastructure using human-readable config files written in &lt;a href=&quot;https://github.com/hashicorp/hcl&quot;&gt;HCL (HashiCorp Configuration Language)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Think of it as Git for your cloud — but with superpowers.&lt;/p&gt;
&lt;p&gt;I know that&apos;s a rather short introduction, so let&apos;s look at a real-life scenario to understand what it is and why we need it!&lt;/p&gt;
&lt;h1&gt;Why Do We Actually Need Terraform? A Real-Life Scenario&lt;/h1&gt;
&lt;p&gt;Let’s say you’re an ambitious DevOps engineer named Alice. One day your boss comes in hot:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Hey Alice! We need 3 EC2 instances on AWS, 2 on Azure, and an S3 bucket for backups. Oh — and don’t forget a VPC, IAM roles, a database, some tags, and make it all &lt;em&gt;repeatable&lt;/em&gt;. By lunch.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;No pressure, right?&lt;/p&gt;
&lt;p&gt;Without Terraform with AWS, you&apos;d be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Clicking through &lt;strong&gt;three&lt;/strong&gt; different cloud consoles 🖱️&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Copy-pasting IPs into random docs 📋&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Forgetting what you named stuff by the third resource 😵‍💫&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Swearing at yourself during the teardown: “Wait, which region was that bucket in?”&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now imagine doing this &lt;strong&gt;again&lt;/strong&gt; next week — for dev, staging, and prod. Nightmare fuel.&lt;/p&gt;
&lt;h2&gt;Enter Terraform: Your Cloud Wizard&lt;/h2&gt;
&lt;p&gt;Now, we have Terraform with AWS:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You write the infrastructure &lt;em&gt;once&lt;/em&gt; in &lt;code&gt;.tf&lt;/code&gt; files&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Want 10 EC2s instead of 3? Change &lt;code&gt;count = 10&lt;/code&gt;, re-run&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Need to deploy the same setup on Azure? Change the provider&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Broke something? &lt;code&gt;terraform destroy&lt;/code&gt; to the rescue&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&apos;s like having a universal &lt;strong&gt;remote control for cloud resources&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;You got this! Terraform with AWS makes managing AWS cloud infrastructure not only repeatable, but also version-controlled — just like your code.&lt;/p&gt;
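&lt;p&gt;To make the &quot;change &lt;code&gt;count&lt;/code&gt;, re-run&quot; idea concrete, here is a minimal hypothetical HCL sketch. The resource name &lt;code&gt;web&lt;/code&gt; and the variable &lt;code&gt;var.ami_id&lt;/code&gt; are assumptions for illustration only:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: scaling the fleet is a one-line change
resource &quot;aws_instance&quot; &quot;web&quot; {
  count         = 3           # change to 10 and re-run &apos;terraform apply&apos;
  ami           = var.ami_id  # assumed to be defined elsewhere
  instance_type = &quot;t2.micro&quot;
}
&lt;/code&gt;&lt;/pre&gt;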
&lt;hr /&gt;
&lt;h2&gt;In Short&lt;/h2&gt;
&lt;p&gt;Terraform keeps your cloud clean, consistent, and version-controlled — no more &lt;em&gt;“what did I click last Tuesday?”&lt;/em&gt; mysteries. It helps you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Reuse configs like code&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Version control your infrastructure&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Avoid human errors from clicking the wrong dropdown&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Automate across multiple environments (dev, staging, prod)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sleep better knowing you can recreate your stack in seconds&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So next time someone says &lt;em&gt;“spin up a new environment”&lt;/em&gt;, you won’t sweat it — you’ll &lt;code&gt;terraform apply&lt;/code&gt; and sip your coffee like a boss. ☕ And of course, it&apos;s not just Terraform with AWS: you can work with different providers and easily keep them consistent!&lt;/p&gt;
&lt;h1&gt;Step-by-Step Guide to Learn/Practice Terraform&lt;/h1&gt;
&lt;h2&gt;Install Terraform on macOS&lt;/h2&gt;
&lt;p&gt;Let’s install Terraform using Homebrew:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;brew tap hashicorp/tap 
brew install hashicorp/tap/terraform
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Confirm installation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform version
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./terraform-version.jpg&quot; alt=&quot;terraform version&quot; title=&quot;terraform version&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Setup Terraform Aliases (Optional but Awesome)&lt;/h2&gt;
&lt;p&gt;If you&apos;re lazy (like all great engineers), add these aliases to your &lt;code&gt;~/.zshenv&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Terraform
alias f=&apos;terraform&apos;
alias finit=&apos;terraform init&apos;
alias fv=&apos;terraform validate&apos;
alias fp=&apos;terraform plan&apos;
alias fpo=&apos;terraform plan -out &apos;
alias fa=&apos;terraform apply&apos;
alias faa=&apos;terraform apply --auto-approve&apos;
alias fcon=&apos;terraform console&apos;
alias fgra=&apos;terraform graph&apos;
alias fo=&apos;terraform output &apos;
alias fs=&apos;terraform show &apos;
alias fsj=&apos;terraform show -json &apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;source ~/.zshenv
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Boom. Productivity unlocked. 🚀&lt;/p&gt;
&lt;p&gt;:::info
You might ask... hey, why choose &lt;code&gt;f&lt;/code&gt; as the alias for &lt;code&gt;terraform&lt;/code&gt; instead of something like &lt;code&gt;tf&lt;/code&gt;? Good question! Why type two characters when one will do? And &lt;code&gt;f&lt;/code&gt; sits right under your finger, more convenient than reaching for &lt;code&gt;t&lt;/code&gt;!
:::&lt;/p&gt;
&lt;h2&gt;Beginner Script to Explore Terraform Language&lt;/h2&gt;
&lt;p&gt;Before we jump into Terraform with AWS, we need to make sure we understand how Terraform works without involving AWS.&lt;/p&gt;
&lt;p&gt;Let’s create some &lt;code&gt;.tf&lt;/code&gt; files to practice variables, data sources, and conditionals.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir -p tutorial/basic
cd tutorial/basic
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Save this as &lt;code&gt;test-vars.tf&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;variable &quot;my-test&quot; {
  type    = number
  default = 123
}

variable &quot;my-map&quot; {
  type = map(any)
  default = {
    &quot;key1&quot; = &quot;value1&quot;
    &quot;key2&quot; = &quot;value2&quot;
  }
}

variable &quot;my-list&quot; {
  type = list(any)
  default = [
    &quot;value1&quot;,
    &quot;value2&quot;
  ]
}

output &quot;my-test&quot; {
  value = {
    value1 = var.my-map[&quot;key1&quot;]
    value2 = var.my-list[0]
  }
}

variable &quot;environment&quot; {
  type    = string
  default = &quot;dev&quot;
}

output &quot;conditional-test-output&quot; {
  value = var.environment == &quot;dev&quot; ? &quot;Development Environment&quot; : &quot;Production Environment&quot;
}

data &quot;local_file&quot; &quot;local_file_example&quot; {
  filename = &quot;${path.module}/test-vars.tf&quot;
}

output &quot;file-content&quot; {
  value = data.local_file.local_file_example.content
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Terraform script above gives us a nice little playground for several core features: variables with different &lt;strong&gt;data types&lt;/strong&gt;, &lt;strong&gt;outputs&lt;/strong&gt;, &lt;strong&gt;conditional expressions&lt;/strong&gt;, indexing into a &lt;strong&gt;list&lt;/strong&gt; or &lt;strong&gt;map&lt;/strong&gt;, and a &lt;strong&gt;data source&lt;/strong&gt; that reads a local file.&lt;/p&gt;
&lt;p&gt;It starts by defining three variables:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;my-test&lt;/code&gt; is a &lt;strong&gt;number&lt;/strong&gt; type with a default of &lt;code&gt;123&lt;/code&gt;,&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;my-map&lt;/code&gt; is a &lt;strong&gt;map&lt;/strong&gt; with arbitrary values (using &lt;code&gt;any&lt;/code&gt;),&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;my-list&lt;/code&gt; is a &lt;strong&gt;list&lt;/strong&gt; also holding values of any type.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Data Types&lt;/h3&gt;
&lt;p&gt;There is always a question that comes up in every programming language, and Terraform HCL is no different:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;How to declare variables in terraform?&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I pulled together a table to illustrate the &lt;strong&gt;data types&lt;/strong&gt;:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Data Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;string&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A sequence of Unicode characters (text)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;env&quot; {      type = string      default = &quot;dev&quot;  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;number&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A numeric value (int or float)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;count&quot; {      type = number      default = 3  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;bool&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Boolean (true or false)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;enabled&quot; {      type = bool      default = true  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list(type)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ordered sequence of values of same type&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;names&quot; {      type = list(string)      default = [&quot;a&quot;, &quot;b&quot;]  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;map(type)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Key-value pair object with same type values&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;tags&quot; {      type = map(string)      default = { &quot;env&quot; = &quot;prod&quot; }  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;set(type)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Like a list, but unordered and unique&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;unique_ids&quot; {      type = set(string)      default = [&quot;a&quot;, &quot;b&quot;, &quot;a&quot;]  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tuple([types])&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ordered collection of mixed types&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;example&quot; {      type = tuple([string, number])      default = [&quot;x&quot;, 10]  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;object({ ... })&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structured object with named attributes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;config&quot; {      type = object({ name = string, count = number })      default = { name = &quot;x&quot;, count = 1 }  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;any&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wildcard for any type (use sparingly)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;variable &quot;dynamic_input&quot; {      type = any      default = &quot;maybe anything&quot;  }&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
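&lt;p&gt;As a quick sketch of how these declarations behave at runtime: a variable&apos;s default can be overridden on the command line or via a &lt;code&gt;TF_VAR_&lt;/code&gt; environment variable (the variable name here matches the &lt;code&gt;environment&lt;/code&gt; variable in &lt;code&gt;test-vars.tf&lt;/code&gt; above):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Override the &quot;environment&quot; variable for a single plan
terraform plan -var=&apos;environment=prod&apos;

# Or export it so every terraform command picks it up
export TF_VAR_environment=prod
terraform plan
&lt;/code&gt;&lt;/pre&gt;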
&lt;h3&gt;Output Block&lt;/h3&gt;
&lt;p&gt;Then we have an &lt;code&gt;output &quot;my-test&quot;&lt;/code&gt; block that shows how to extract values from these structures: it pulls &lt;code&gt;key1&lt;/code&gt; from the map and the first element of the list, showcasing &lt;strong&gt;interpolation&lt;/strong&gt; and &lt;strong&gt;indexing&lt;/strong&gt;. After running &lt;code&gt;terraform apply&lt;/code&gt;, this output displays &lt;code&gt;value1&lt;/code&gt; from the map and the first item from &lt;code&gt;my-list&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;:::info
&lt;strong&gt;Output&lt;/strong&gt; is Terraform’s way of surfacing results: data to feed into other modules, or simply whatever values we&apos;re interested in.
:::&lt;/p&gt;
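&lt;p&gt;For example, after an apply you can read outputs individually or as JSON (handy for feeding scripts); the output names below match the ones defined earlier:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform output my-test
terraform output -json conditional-test-output
&lt;/code&gt;&lt;/pre&gt;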
&lt;h3&gt;Conditional Expression&lt;/h3&gt;
&lt;p&gt;We also introduce a &lt;code&gt;variable &quot;environment&quot;&lt;/code&gt; set to &lt;code&gt;&quot;dev&quot;&lt;/code&gt;, and use a &lt;strong&gt;conditional expression&lt;/strong&gt; in &lt;code&gt;output &quot;conditional-test-output&quot;&lt;/code&gt; to return a string based on its value—mimicking basic logic without needing an &lt;code&gt;if&lt;/code&gt; block.&lt;/p&gt;
&lt;p&gt;:::warning
In &lt;strong&gt;Terraform&lt;/strong&gt;, there’s no traditional &lt;code&gt;if-else&lt;/code&gt; block like in many programming languages, but &lt;strong&gt;conditional expressions&lt;/strong&gt; serve a similar purpose.
:::&lt;/p&gt;
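&lt;p&gt;A common pattern is using a conditional expression to pick a value per environment. Here is a minimal sketch (the &lt;code&gt;instance_size&lt;/code&gt; local is hypothetical, purely for illustration):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;locals {
  # &quot;if dev then small, else large&quot; without an if-else block
  instance_size = var.environment == &quot;dev&quot; ? &quot;t2.micro&quot; : &quot;t3.large&quot;
}
&lt;/code&gt;&lt;/pre&gt;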
&lt;h3&gt;Data Source Block&lt;/h3&gt;
&lt;p&gt;Finally, there&apos;s a &lt;strong&gt;data source&lt;/strong&gt;: &lt;code&gt;data &quot;local_file&quot;&lt;/code&gt;, which loads the content of a file named &lt;code&gt;test-vars.tf&lt;/code&gt; in the same module directory and outputs its content. This is a powerful feature when your Terraform config needs to reference external data—like existing files, templates, or config artifacts.&lt;/p&gt;
&lt;h3&gt;Terraform Commands&lt;/h3&gt;
&lt;p&gt;To manage infrastructure effectively with Terraform, there’s a standard lifecycle of commands that help you maintain control and visibility over changes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform init&lt;/code&gt;&lt;/strong&gt; initializes the working directory containing the &lt;code&gt;.tf&lt;/code&gt; files. It downloads the necessary provider plugins (like AWS) and prepares the backend if we&apos;re using one. This step is required before any other command.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform validate&lt;/code&gt;&lt;/strong&gt; performs a syntax check on your configuration files to ensure everything is well-formed. It catches structural issues early but doesn&apos;t check the actual resource existence or cloud-level constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform plan&lt;/code&gt;&lt;/strong&gt; creates an execution plan showing what actions Terraform will take. You might ask, &quot;How do I read terraform plan output?&quot; It&apos;s simple! The plan compares the desired state (as defined in the code) with the current state and highlights what will be created, changed, or destroyed—without actually applying any changes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform apply&lt;/code&gt;&lt;/strong&gt; executes the actions proposed by the plan, provisioning or modifying infrastructure to match your configuration. This is when Terraform interacts with AWS (or other providers) to make things real.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;terraform output&lt;/code&gt;&lt;/strong&gt; displays the values defined in your &lt;code&gt;output&lt;/code&gt; blocks after a successful apply. It’s commonly used to retrieve resource attributes (like instance IPs or ARNs) needed for further automation or verification.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
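&lt;p&gt;Put together, a typical run of this lifecycle looks like the following (saving the plan with &lt;code&gt;-out&lt;/code&gt; guarantees &lt;code&gt;apply&lt;/code&gt; executes exactly what you reviewed):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform init
terraform validate
terraform plan -out=tfplan
terraform apply tfplan
terraform output
&lt;/code&gt;&lt;/pre&gt;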
&lt;h3&gt;Run &lt;code&gt;test-vars.tf&lt;/code&gt;&lt;/h3&gt;
&lt;p&gt;Let&apos;s use the &lt;code&gt;test-vars.tf&lt;/code&gt; file above to practice these commands:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./test-var-terraform-init.jpg&quot; alt=&quot;perform terraform init on test-var.tf&quot; title=&quot;perform terraform init on test-var.tf&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./test-var-terraform-validate.jpg&quot; alt=&quot;perform terraform validate on test-var.tf&quot; title=&quot;perform terraform validate on test-var.tf&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./test-var-terraform-plan.jpg&quot; alt=&quot;perform terraform plan on test-var.tf&quot; title=&quot;perform terraform plan on test-var.tf&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./test-var-terraform-output.jpg&quot; alt=&quot;perform terraform output on test-var.tf&quot; title=&quot;perform terraform output on test-var.tf&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Add AWS Capability into Terraform&lt;/h1&gt;
&lt;p&gt;One of the best parts about using Terraform with AWS is how easily you can spin up and tear down entire environments with a single command. Let’s get our local environment ready for cloud magic.&lt;/p&gt;
&lt;h2&gt;Setup AWS Root Account and Create IAM User&lt;/h2&gt;
&lt;p&gt;First, head over to &lt;a href=&quot;https://aws.amazon.com&quot;&gt;aws.amazon.com&lt;/a&gt; and create a root account if you haven’t already.&lt;/p&gt;
&lt;p&gt;Then, inside the AWS Console:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Navigate to &lt;strong&gt;IAM &amp;gt; Users&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create a user named &lt;strong&gt;&lt;code&gt;terraform-admin&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Uncheck &quot;Users must create a new password at next sign-in&quot;; there&apos;s no need for it for testing purposes&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Grant it the &lt;strong&gt;&lt;code&gt;AdministratorAccess&lt;/code&gt;&lt;/strong&gt; (AWS Managed Policy)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enable programmatic access (you&apos;ll need the access key + secret)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Console access is optional, I granted it anyway&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This IAM user will act as our Terraform operator.&lt;/p&gt;
&lt;p&gt;A few screenshots:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./IAM-user-creation-step1.jpg&quot; alt=&quot;terraform with aws to create AWS IAM user - grant console and set password&quot; title=&quot;terraform with aws to create AWS IAM user - grant console and set password&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./IAM-user-creation-step2.jpg&quot; alt=&quot;terraform with aws to create AWS IAM user - grant console and set password&quot; title=&quot;terraform with aws to create AWS IAM user - grant console and set password&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After creating the user, we need to create an access key ID and secret access key for Terraform, since it needs to authenticate.&lt;/p&gt;
&lt;p&gt;Click the user in IAM, go to &quot;Security Credentials&quot; tab, scroll down to find &quot;Access Key&quot; section, create it as below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./IAM-user-creation-step3.jpg&quot; alt=&quot;create AWS IAM user access id and secret key&quot; title=&quot;create AWS IAM user access id and secret key&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./IAM-user-creation-step4.jpg&quot; alt=&quot;download AWS IAM user access Id and secret key&quot; title=&quot;download AWS IAM user access Id and secret key&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We will use this IAM user and its credentials later when our Terraform &lt;code&gt;.tf&lt;/code&gt; files interact with AWS.&lt;/p&gt;
&lt;h2&gt;Create IAM Users in Terraform With AWS&lt;/h2&gt;
&lt;p&gt;Finally, we&apos;re here to answer &quot;how to create an IAM user in AWS using Terraform&quot;... No worries.&lt;/p&gt;
&lt;p&gt;Honestly, Terraform doesn’t have much cryptic or hard-to-read syntax—it’s pretty clean. But there are two features I want to highlight: &lt;code&gt;count&lt;/code&gt; and &lt;code&gt;for_each&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You might not need either of them on day one, but once you start managing resources with repeatable nested blocks—like multiple tags, multiple ingress rules, or custom configurations per item—they quickly become favorites.&lt;/p&gt;
&lt;p&gt;Here I want to demo each of them for this first Terraform with AWS post with a practical, AWS-focused script that provisions multiple IAM users.&lt;/p&gt;
&lt;p&gt;Let&apos;s focus on &lt;code&gt;count&lt;/code&gt; first.&lt;/p&gt;
&lt;h3&gt;AWS Authentication Setup in Terraform&lt;/h3&gt;
&lt;p&gt;Before the &lt;code&gt;.tf&lt;/code&gt; files can work with AWS, Terraform needs the IAM user&apos;s credentials so it can authenticate successfully. There are several ways to authenticate Terraform; here I am using environment variables.&lt;/p&gt;
&lt;p&gt;:::warning
Please replace &lt;code&gt;your_access_key_id&lt;/code&gt; and &lt;code&gt;your_secret_access_key&lt;/code&gt; below with the credentials of the IAM user created earlier.
:::&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;export AWS_ACCESS_KEY_ID=&apos;your_access_key_id&apos;
export AWS_SECRET_ACCESS_KEY=&apos;your_secret_access_key&apos;
export AWS_DEFAULT_REGION=&apos;us-west-1&apos;
export AWS_PROFILE=&quot;default&quot;
export AWS_CONFIG_FILE=&quot;$HOME/.aws/config&quot;
export AWS_SHARED_CREDENTIALS_FILE=&quot;$HOME/.aws/credentials&quot;
&lt;/code&gt;&lt;/pre&gt;
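&lt;p&gt;If you have the AWS CLI installed, a quick sanity check that the exported credentials actually work (before letting Terraform use them) is:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Should print the account ID and the terraform-admin user ARN
aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;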
&lt;p&gt;Let&apos;s create the folder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir -p tutorial/aws-iam
cd tutorial/aws-iam
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Count Usage&lt;/h3&gt;
&lt;p&gt;Without &lt;code&gt;count&lt;/code&gt;, the initial version of &lt;code&gt;aws_iam.tf&lt;/code&gt; looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = &quot;hashicorp/aws&quot;
      version = &quot;~&amp;gt; 5.0&quot;
    }
  }
}

resource &quot;aws_iam_user&quot; &quot;terraform_user_0&quot; {
  name = &quot;terraform-user-0&quot;
}

resource &quot;aws_iam_user&quot; &quot;terraform_user_1&quot; {
  name = &quot;terraform-user-1&quot;
}

resource &quot;aws_iam_user&quot; &quot;terraform_user_2&quot; {
  name = &quot;terraform-user-2&quot;
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::warning
IAM users are global, so no need to specify &lt;code&gt;region&lt;/code&gt; for them.
:::&lt;/p&gt;
&lt;p&gt;Hmm, how can we use &lt;code&gt;count&lt;/code&gt; in Terraform to simplify the code above?&lt;/p&gt;
&lt;p&gt;You got this! Here we improve scalability with &lt;code&gt;count&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = &quot;hashicorp/aws&quot;
      version = &quot;~&amp;gt; 5.0&quot;
    }
  }
}

variable &quot;user_prefix&quot; {
  type    = string
  default = &quot;terraform-user&quot;
}

variable &quot;user_count&quot; {
  type    = number
  default = 3
}

resource &quot;aws_iam_user&quot; &quot;terraform_user&quot; {
  count = var.user_count

  name = &quot;${var.user_prefix}-${count.index}&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So now the benefit is obvious: no hard-coded user names, and it&apos;s easier to scale!&lt;/p&gt;
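&lt;p&gt;To see all the generated names, you could add an &lt;code&gt;output&lt;/code&gt; using a splat expression over the counted resource; a small sketch:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;output &quot;user_names&quot; {
  # One entry per count instance: terraform-user-0, -1, -2
  value = aws_iam_user.terraform_user[*].name
}
&lt;/code&gt;&lt;/pre&gt;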
&lt;h3&gt;for_each Usage&lt;/h3&gt;
&lt;p&gt;Now, how do we use &lt;code&gt;for_each&lt;/code&gt; in Terraform? Let&apos;s take a look.&lt;/p&gt;
&lt;p&gt;While both approaches (&lt;code&gt;for_each&lt;/code&gt; and &lt;code&gt;count&lt;/code&gt;) are valid in this Terraform with AWS code, they serve different purposes depending on the use case.&lt;/p&gt;
&lt;p&gt;Here’s how to create multiple IAM users using &lt;code&gt;for_each&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = &quot;hashicorp/aws&quot;
      version = &quot;~&amp;gt; 5.0&quot;
    }
  }
}

variable &quot;user_prefix&quot; {
  type    = string
  default = &quot;terraform-user&quot;
}

variable &quot;user_count&quot; {
  type    = number
  default = 3
}

locals {
  user_names = [for i in range(var.user_count) : &quot;${var.user_prefix}-${i}&quot;]
}

resource &quot;aws_iam_user&quot; &quot;terraform_user&quot; {
  for_each = toset(local.user_names)

  name = each.value
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This approach uses a local variable to generate a set of user names and then iterates over each unique name using &lt;code&gt;for_each&lt;/code&gt;. Each item in the set becomes a resource instance with its own lifecycle, based on the value of &lt;code&gt;each.value&lt;/code&gt;.&lt;/p&gt;
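&lt;p&gt;Since &lt;code&gt;for_each&lt;/code&gt; instances are keyed by value rather than by index, referencing them looks slightly different from &lt;code&gt;count&lt;/code&gt;. A sketch of collecting each user&apos;s ARN into a map:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;output &quot;user_arns&quot; {
  # Keys are the user names from the set, values are the ARNs
  value = { for name, user in aws_iam_user.terraform_user : name =&gt; user.arn }
}
&lt;/code&gt;&lt;/pre&gt;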
&lt;h3&gt;Count vs. for_each: When to Use Which&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;&lt;code&gt;count&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;for_each&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input type&lt;/td&gt;
&lt;td&gt;Integer&lt;/td&gt;
&lt;td&gt;Set, map, or other collection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index reference&lt;/td&gt;
&lt;td&gt;&lt;code&gt;count.index&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;each.key&lt;/code&gt; / &lt;code&gt;each.value&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource tracking&lt;/td&gt;
&lt;td&gt;Index-based&lt;/td&gt;
&lt;td&gt;Value/key-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reordering impact&lt;/td&gt;
&lt;td&gt;Can recreate resources on list changes&lt;/td&gt;
&lt;td&gt;More stable; avoids recreation if values remain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best suited for&lt;/td&gt;
&lt;td&gt;Identical resources with predictable count&lt;/td&gt;
&lt;td&gt;Resources that need to be uniquely identified&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Use &lt;code&gt;count&lt;/code&gt; when you need a fixed number of uniform resources and the specific identity of each resource doesn’t matter. Use &lt;code&gt;for_each&lt;/code&gt; when you&apos;re dealing with uniquely named resources or working with sets/maps — especially in scenarios where identity and lifecycle tracking are important.&lt;/p&gt;
&lt;p&gt;Both approaches are fully supported, and the choice should be guided by the structure of your data and the operational needs of your infrastructure.&lt;/p&gt;
&lt;h1&gt;Advanced: Create AWS VPC/Network/EC2 With Security Groups&lt;/h1&gt;
&lt;p&gt;Whether you&apos;re building a simple EC2 instance or managing complex networking, Terraform with AWS keeps everything declarative and under control.&lt;/p&gt;
&lt;p&gt;Now, let’s build a practical example that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Create a VPC and subnet in AWS&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set up internet access&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add a &lt;strong&gt;security group&lt;/strong&gt; that allows &lt;strong&gt;SSH and HTTP&lt;/strong&gt; inbound, and &lt;strong&gt;all traffic outbound&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Create an EC2 instance and attach security group&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Directory Structure&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;tutorial/aws-vpc-ec2-demo/
├── main.tf
├── network.tf
├── internet_gateway.tf
├── route_table.tf
├── security_group.tf
├── ec2.tf
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Create and enter the folder:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mkdir -p tutorial/aws-vpc-ec2-demo
cd tutorial/aws-vpc-ec2-demo

&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;main.tf&lt;/h2&gt;
&lt;p&gt;This section is always required when working on Terraform with AWS.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = &quot;hashicorp/aws&quot;
      version = &quot;~&amp;gt; 5.0&quot;
    }
  }
}

provider &quot;aws&quot; {
  region = &quot;us-west-1&quot;
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The block of code in above is the essential handshake between Terraform and AWS. The &lt;code&gt;terraform&lt;/code&gt; block specifies that your configuration requires the &lt;strong&gt;AWS provider&lt;/strong&gt;, sourced from &lt;strong&gt;HashiCorp’s registry&lt;/strong&gt;, and locked to version &lt;code&gt;~&amp;gt; 5.0&lt;/code&gt;, which means any non-breaking updates in the 5.x series are acceptable. This ensures compatibility and stability across Terraform runs.&lt;/p&gt;
&lt;p&gt;About the version match, I pulled this table for your quick references:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version Constraint&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Example Allowed Versions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~&amp;gt; 3.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allow patch-level updates within 3.x (&amp;gt;=3.5.0, &amp;lt;4.0.0)&lt;/td&gt;
&lt;td&gt;3.5.0, 3.5.1, 3.6.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;~&amp;gt; 3&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allow any version within major version 3 (&amp;gt;=3.0.0, &amp;lt;4.0.0)&lt;/td&gt;
&lt;td&gt;3.0.0, 3.5.2, 3.99.99&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;gt;= 3.5, &amp;lt; 3.8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allow only versions in a specific minor range&lt;/td&gt;
&lt;td&gt;3.5.0, 3.6.1, 3.7.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;= 3.5.2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pin to a specific version only&lt;/td&gt;
&lt;td&gt;Only 3.5.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;gt; 3.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allow any version greater than 3.5 (but not 3.5)&lt;/td&gt;
&lt;td&gt;3.6.0, 4.0.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;= 3.5&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Allow versions less than or equal to 3.5&lt;/td&gt;
&lt;td&gt;3.0.0, 3.4.9, 3.5.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The &lt;code&gt;provider &quot;aws&quot;&lt;/code&gt; block then sets the context for how Terraform interacts with your AWS environment — in this case, targeting the &lt;code&gt;us-west-1&lt;/code&gt; region. This tells Terraform, “Hey, deploy all the resources in the California region.” By declaring the provider and version this way, we&apos;re building a reproducible, consistent infrastructure-as-code setup that won’t break unexpectedly if a newer major version of the provider is released.&lt;/p&gt;
&lt;p&gt;:::info
Since we&apos;ve exported the credentials as environment variables, there&apos;s no need to specify credentials in the &lt;code&gt;provider &quot;aws&quot;&lt;/code&gt; block.
:::&lt;/p&gt;
&lt;h2&gt;network.tf&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_vpc&quot; &quot;main&quot; {
  cidr_block = &quot;10.0.0.0/16&quot;
}

resource &quot;aws_subnet&quot; &quot;main&quot; {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = &quot;10.0.1.0/24&quot;
  map_public_ip_on_launch = true
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In AWS, every resource lives inside a Virtual Private Cloud (VPC). This Terraform with AWS block creates a custom VPC with a &lt;code&gt;/16&lt;/code&gt; CIDR block, which allows for 65,536 private IP addresses — a large range that gives you plenty of room to grow. By the way, AWS automatically creates a default VPC in every region.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;aws_subnet&lt;/code&gt; slices a &lt;code&gt;/24&lt;/code&gt; block from the VPC — allowing 256 IPs (minus AWS reservations). The critical flag here is &lt;code&gt;map_public_ip_on_launch = true&lt;/code&gt;. Without this, your EC2 instances won’t get a public IP, and you&apos;ll be stuck trying to SSH into a black hole. With this setting enabled, instances launched into this subnet will be publicly addressable.&lt;/p&gt;
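&lt;p&gt;You can sanity-check this CIDR math in &lt;code&gt;terraform console&lt;/code&gt;: the built-in &lt;code&gt;cidrsubnet&lt;/code&gt; function carves our &lt;code&gt;/24&lt;/code&gt; out of the VPC&apos;s &lt;code&gt;/16&lt;/code&gt; by adding 8 bits to the prefix and taking subnet number 1:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&gt; cidrsubnet(&quot;10.0.0.0/16&quot;, 8, 1)
&quot;10.0.1.0/24&quot;
&lt;/code&gt;&lt;/pre&gt;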
&lt;h2&gt;internet_gateway.tf&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_internet_gateway&quot; &quot;igw&quot; {
  vpc_id = aws_vpc.main.id
}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;route_table.tf&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_route_table&quot; &quot;public&quot; {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = &quot;0.0.0.0/0&quot;
    gateway_id = aws_internet_gateway.igw.id
  }

  tags = {
    Name = &quot;PublicRouteTable&quot;
  }
}

resource &quot;aws_route_table_association&quot; &quot;public_subnet&quot; {
  subnet_id      = aws_subnet.main.id
  route_table_id = aws_route_table.public.id
}

&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;security_group.tf&lt;/h2&gt;
&lt;p&gt;Security groups in AWS are like bouncers for your EC2 instances — they control what traffic is allowed in or out of your virtual machines. Whether you&apos;re allowing SSH for remote access or HTTP for your website, security groups are your first line of defense.&lt;/p&gt;
&lt;p&gt;When working on Terraform with AWS, you &lt;em&gt;can&lt;/em&gt; define security rules inline within the &lt;code&gt;aws_security_group&lt;/code&gt; resource. However, HashiCorp now recommends using &lt;a href=&quot;https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_security_group_ingress_rule&quot;&gt;&lt;strong&gt;dedicated resources&lt;/strong&gt; for ingress/egress rules&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;aws_vpc_security_group_ingress_rule&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;aws_vpc_security_group_egress_rule&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Previously, HashiCorp provided &lt;code&gt;ingress&lt;/code&gt; and &lt;code&gt;egress&lt;/code&gt; arguments on the &lt;a href=&quot;https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/security_group&quot;&gt;&lt;code&gt;aws_security_group&lt;/code&gt;&lt;/a&gt; resource for configuring inline rules. But these struggle with managing multiple CIDR blocks, tags, and descriptions due to the historical lack of unique rule IDs. Using the &lt;a href=&quot;https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_security_group_egress_rule&quot;&gt;&lt;code&gt;aws_vpc_security_group_egress_rule&lt;/code&gt;&lt;/a&gt; and &lt;code&gt;aws_vpc_security_group_ingress_rule&lt;/code&gt; resources is now the best practice.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_security_group&quot; &quot;web_sg&quot; {
  name        = &quot;web_sg&quot;
  description = &quot;Allow HTTP and SSH&quot;
  vpc_id      = aws_vpc.main.id
}

resource &quot;aws_vpc_security_group_ingress_rule&quot; &quot;http_in&quot; {
  security_group_id = aws_security_group.web_sg.id
  cidr_ipv4         = &quot;0.0.0.0/0&quot;
  from_port         = 80
  to_port           = 80
  ip_protocol       = &quot;tcp&quot;
}

resource &quot;aws_vpc_security_group_ingress_rule&quot; &quot;ssh_in&quot; {
  security_group_id = aws_security_group.web_sg.id
  cidr_ipv4         = &quot;0.0.0.0/0&quot;
  from_port         = 22
  to_port           = 22
  ip_protocol       = &quot;tcp&quot;
}

resource &quot;aws_vpc_security_group_egress_rule&quot; &quot;all_out&quot; {
  security_group_id = aws_security_group.web_sg.id
  cidr_ipv4         = &quot;0.0.0.0/0&quot;
  ip_protocol       = &quot;-1&quot; # all protocols; ports are omitted with -1
}

&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;ec2.tf&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;data &quot;aws_ami&quot; &quot;amazon_linux_2&quot; {
  most_recent = true
  owners      = [&quot;amazon&quot;]

  filter {
    name   = &quot;name&quot;
    values = [&quot;amzn2-ami-hvm-*-x86_64-gp2&quot;]
  }
}

resource &quot;aws_instance&quot; &quot;web&quot; {
  ami                         = data.aws_ami.amazon_linux_2.id
  instance_type               = &quot;t2.micro&quot;
  subnet_id                   = aws_subnet.main.id
  vpc_security_group_ids      = [aws_security_group.web_sg.id]
  associate_public_ip_address = true

  tags = {
    Name = &quot;terraform&quot;
  }

  user_data = &amp;lt;&amp;lt;-EOF
              #!/bin/bash
              sudo amazon-linux-extras enable nginx1
              sudo yum clean metadata
              sudo yum install -y nginx
              sudo systemctl start nginx
              EOF
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here we&apos;re launching a micro-sized instance with Terraform on AWS, placing it in our subnet, and attaching the security group that allows HTTP traffic.&lt;/p&gt;
&lt;p&gt;It uses the latest Amazon Linux 2 AMI without hardcoding image IDs, because the &lt;code&gt;aws_ami&lt;/code&gt; data source filters for the AMIs we need. There may be multiple matching AMIs; &lt;code&gt;most_recent&lt;/code&gt; ensures Terraform picks the latest one. Specifying &lt;code&gt;t2.micro&lt;/code&gt; matters because it&apos;s covered by the AWS free tier, and I don&apos;t want the AWS bill to surprise me...&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;user_data&lt;/code&gt; block runs a bash script to install and start NGINX right after the instance boots — voilà, instant web server! 🎉&lt;/p&gt;
&lt;h2&gt;Terraform Graph&lt;/h2&gt;
&lt;p&gt;When working with even a moderately sized Terraform with AWS project—like our &lt;code&gt;aws-vpc-ec2-demo&lt;/code&gt; that stitches together VPCs, subnets, security groups, EC2 instances, internet gateways, and more—keeping track of how all the resources relate to each other can get a bit overwhelming. That’s where the magic of &lt;code&gt;terraform graph&lt;/code&gt; comes in.&lt;/p&gt;
&lt;p&gt;Terraform automatically analyzes all your &lt;code&gt;.tf&lt;/code&gt; files and maps out the dependencies between resources, so it knows exactly what needs to be created first, what depends on what, and how to destroy them safely in reverse order. It builds a dependency graph internally—and you can view this visually by piping the output of &lt;code&gt;terraform graph&lt;/code&gt; into a tool like Graphviz.  It&apos;s an eye-opener for understanding Terraform’s internal logic and a fantastic way to document and debug your setup.&lt;/p&gt;
&lt;p&gt;Just run:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform graph
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You will get:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ terraform graph
digraph G {
  rankdir = &quot;RL&quot;;
  node [shape = rect, fontname = &quot;sans-serif&quot;];
  &quot;data.aws_ami.amazon_linux_2&quot; [label=&quot;data.aws_ami.amazon_linux_2&quot;];
  &quot;aws_instance.web&quot; [label=&quot;aws_instance.web&quot;];
  &quot;aws_internet_gateway.igw&quot; [label=&quot;aws_internet_gateway.igw&quot;];
  &quot;aws_route_table.public&quot; [label=&quot;aws_route_table.public&quot;];
  &quot;aws_route_table_association.public_subnet&quot; [label=&quot;aws_route_table_association.public_subnet&quot;];
  &quot;aws_security_group.web_sg&quot; [label=&quot;aws_security_group.web_sg&quot;];
  &quot;aws_subnet.main&quot; [label=&quot;aws_subnet.main&quot;];
  &quot;aws_vpc.main&quot; [label=&quot;aws_vpc.main&quot;];
  &quot;aws_vpc_security_group_egress_rule.all_out&quot; [label=&quot;aws_vpc_security_group_egress_rule.all_out&quot;];
  &quot;aws_vpc_security_group_ingress_rule.http_in&quot; [label=&quot;aws_vpc_security_group_ingress_rule.http_in&quot;];
  &quot;aws_vpc_security_group_ingress_rule.ssh_in&quot; [label=&quot;aws_vpc_security_group_ingress_rule.ssh_in&quot;];
  &quot;aws_instance.web&quot; -&amp;gt; &quot;data.aws_ami.amazon_linux_2&quot;;
  &quot;aws_instance.web&quot; -&amp;gt; &quot;aws_security_group.web_sg&quot;;
  &quot;aws_instance.web&quot; -&amp;gt; &quot;aws_subnet.main&quot;;
  &quot;aws_internet_gateway.igw&quot; -&amp;gt; &quot;aws_vpc.main&quot;;
  &quot;aws_route_table.public&quot; -&amp;gt; &quot;aws_internet_gateway.igw&quot;;
  &quot;aws_route_table_association.public_subnet&quot; -&amp;gt; &quot;aws_route_table.public&quot;;
  &quot;aws_route_table_association.public_subnet&quot; -&amp;gt; &quot;aws_subnet.main&quot;;
  &quot;aws_security_group.web_sg&quot; -&amp;gt; &quot;aws_vpc.main&quot;;
  &quot;aws_subnet.main&quot; -&amp;gt; &quot;aws_vpc.main&quot;;
  &quot;aws_vpc_security_group_egress_rule.all_out&quot; -&amp;gt; &quot;aws_security_group.web_sg&quot;;
  &quot;aws_vpc_security_group_ingress_rule.http_in&quot; -&amp;gt; &quot;aws_security_group.web_sg&quot;;
  &quot;aws_vpc_security_group_ingress_rule.ssh_in&quot; -&amp;gt; &quot;aws_security_group.web_sg&quot;;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Paste the output above into https://dreampuf.github.io/GraphvizOnline/ and you&apos;ll get the graph below:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./aws-vpc-ec2-demo-dependency-graph.png&quot; alt=&quot;The dependency graph of aws-vpc-ec2-demo&quot; /&gt;&lt;/p&gt;
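&lt;p&gt;If you prefer to render the graph locally instead of using the online tool, you can pipe the output straight into the Graphviz &lt;code&gt;dot&lt;/code&gt; command (assuming Graphviz is installed):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform graph | dot -Tpng -o graph.png
&lt;/code&gt;&lt;/pre&gt;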
&lt;p&gt;Helpful, right? Once we understand how Terraform with AWS handles dependencies and state, our infrastructure starts to feel like elegant code, not chaos.&lt;/p&gt;
&lt;h2&gt;Let&apos;s give it a try!&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;./aws-vpc-ec2-demo-terraform-apply-1.jpg&quot; alt=&quot;terraform with aws vpc and ec2 demo&apos;s output of terraform apply&quot; title=&quot;terraform with aws vpc and ec2 demo&apos;s output of terraform apply&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Let&apos;s get the public IP and test Nginx:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./aws-vpc-ec2-demo-test-nginx-1.jpg&quot; alt=&quot;aws-vpc-ec2-demo test nginx&quot; title=&quot;aws-vpc-ec2-demo test nginx&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;Hooray!&lt;/h1&gt;
&lt;p&gt;I spent nearly a week putting together this first post in the &quot;Terraform with AWS&quot; guide. Not because it&apos;s that complicated, but because I wanted every command, every config, and every explanation to &lt;em&gt;click&lt;/em&gt; for anyone following along. From VPC, IAM users, and Security Groups to EC2 and best practices, I&apos;ve covered the real things you&apos;d face when building infrastructure from scratch using Terraform with AWS. 💻☁️&lt;/p&gt;
&lt;p&gt;This blog series is all about mastering &lt;strong&gt;Terraform with AWS&lt;/strong&gt; from the ground up — no shortcuts, just clean, scalable infrastructure-as-code.&lt;/p&gt;
&lt;p&gt;Up next? We&apos;re going beyond the basics — deploying a fully working EKS (Elastic Kubernetes Service) cluster with Terraform. If you thought this post was useful, wait until you see what’s coming. Buckle up, cloud wranglers. 🚀&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stay tuned for the next post!&lt;/strong&gt; 😎🔥&lt;/p&gt;
&lt;p&gt;:::info
I’ve pushed everything to GitHub for you! You can find all the Terraform scripts from this blog post right here: 👉 &lt;a href=&quot;https://github.com/geekcoding101/iac/tree/main/terraform/tutorial&quot;&gt;https://github.com/geekcoding101/iac/tree/main/terraform/tutorial&lt;/a&gt;  🚀
:::&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series of Kubernetes and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, I laid out the &lt;strong&gt;networking plan&lt;/strong&gt;, my &lt;strong&gt;goals for setting up Kubernetes&lt;/strong&gt;, and how to &lt;strong&gt;prepare a base VM image&lt;/strong&gt; for the cluster.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, I walked through &lt;strong&gt;configuring a local DNS server and NTP server&lt;/strong&gt;, essential for stable name resolution and time synchronization across nodes. These foundational steps will make our Kubernetes setup smoother.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt;, I finished the Kubernetes cluster setup with &lt;strong&gt;Flannel&lt;/strong&gt;, ending up with one Kubernetes master and 4 worker nodes ready for real workloads.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt;, I explored &lt;strong&gt;NodePort&lt;/strong&gt; and &lt;strong&gt;ClusterIP&lt;/strong&gt;, and understood the key differences, use cases, and when to choose each for internal and external service access! 🔥&lt;/p&gt;
&lt;p&gt;🚀 In &lt;a href=&quot;/posts/externalname-loadbalancer-5&quot;&gt;&lt;strong&gt;Part 5&lt;/strong&gt;&lt;/a&gt;, I dived into &lt;strong&gt;&lt;code&gt;ExternalName&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;LoadBalancer&lt;/code&gt;&lt;/strong&gt; services, uncovering how they handle external access, DNS resolution, and dynamic traffic distribution!
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Terraform Meta Arguments Unlocked: Practical Patterns for Clean Infrastructure Code</title><link>https://geekcoding101.com/posts/terraform-meta-arguments</link><guid isPermaLink="true">https://geekcoding101.com/posts/terraform-meta-arguments</guid><pubDate>Mon, 21 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I’ve always found &lt;a href=&quot;https://www.terraform.io/&quot;&gt;Terraform&lt;/a&gt; &lt;em&gt;meta arguments&lt;/em&gt; a bit confusing at first glance—not &lt;code&gt;count&lt;/code&gt;, &lt;code&gt;for_each&lt;/code&gt;, but things like &lt;code&gt;connection&lt;/code&gt;, &lt;code&gt;provisioner&lt;/code&gt;, &lt;code&gt;depends_on&lt;/code&gt;, &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;lifecycle&lt;/code&gt; often seem straightforward but can behave unexpectedly in different contexts. That’s why I decided to write this blog post: to break them down clearly, explain what each one does, and show practical examples of how and when to use them effectively.&lt;/p&gt;
&lt;h1&gt;Terraform Meta Arguments Table&lt;/h1&gt;
&lt;p&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Meta-Argument&lt;/th&gt;
&lt;th&gt;Applicable To&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;count&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;resource&lt;/code&gt;, &lt;code&gt;module&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create multiple instances of a resource or module using a number.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;for_each&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;resource&lt;/code&gt;, &lt;code&gt;module&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create multiple instances using a map or set of strings. More flexible than count.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;provider&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;resource&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Specify which provider configuration to use if multiple are defined.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;depends_on&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;resource&lt;/code&gt;, &lt;code&gt;module&lt;/code&gt;, &lt;code&gt;data&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Explicitly define dependencies between resources or modules.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;lifecycle&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;resource&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Control resource creation and destruction behavior (e.g., prevent_destroy, ignore_changes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;provisioner&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;resource&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Run scripts or commands after a resource is created or destroyed.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;connection&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;resource&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Define how to connect to a remote resource (used with provisioners).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;source&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;module&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Specify the location of a module (registry, Git, local path, etc.).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;Usage&lt;/h1&gt;
&lt;h2&gt;🧮 Terraform meta arguments: count&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_instance&quot; &quot;web&quot; { 
  count = 3 
  ami = &quot;ami-0c55b159cbfafe1f0&quot; 
  instance_type = &quot;t2.micro&quot; 
  tags = { 
    Name = &quot;Web-${count.index}&quot; 
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ Creates multiple instances using a simple integer.&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;count.index&lt;/code&gt; starts from &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Not ideal for working with named collections (use &lt;code&gt;for_each&lt;/code&gt; instead).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Not supported in &lt;code&gt;provider&lt;/code&gt; blocks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
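&lt;p&gt;Instances created with &lt;code&gt;count&lt;/code&gt; are addressed by index. A minimal sketch, building on the example above, of referencing one instance or all of them with a splat expression:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Address a single instance by index
output &quot;first_web_ip&quot; {
  value = aws_instance.web[0].public_ip
}

# Splat expression: collect an attribute from every instance
output &quot;all_web_ips&quot; {
  value = aws_instance.web[*].public_ip
}
&lt;/code&gt;&lt;/pre&gt;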
&lt;h2&gt;🔁 Terraform meta arguments: for_each&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_s3_bucket&quot; &quot;example&quot; { 
  for_each = toset([&quot;logs&quot;, &quot;media&quot;, &quot;backups&quot;]) 
  bucket = &quot;my-bucket-${each.key}&quot; 
  acl = &quot;private&quot; 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ More flexible than &lt;code&gt;count&lt;/code&gt;, supports map and set types.&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;each.key&lt;/code&gt; and &lt;code&gt;each.value&lt;/code&gt; are used depending on the collection type (for a set of strings, they are identical).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Keys must be unique.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Best for managing multiple named resources (e.g., per environment).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
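&lt;p&gt;With a map, &lt;code&gt;each.key&lt;/code&gt; and &lt;code&gt;each.value&lt;/code&gt; differ, which is handy for per-environment settings. A sketch, with illustrative instance sizes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_instance&quot; &quot;app&quot; {
  for_each      = { dev = &quot;t2.micro&quot;, prod = &quot;t3.small&quot; }
  ami           = &quot;ami-0c55b159cbfafe1f0&quot;
  instance_type = each.value   # the map value

  tags = {
    Name = &quot;app-${each.key}&quot;   # the map key
  }
}
&lt;/code&gt;&lt;/pre&gt;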
&lt;h2&gt;🧩 Terraform meta arguments: provider&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;provider &quot;aws&quot; { 
  region = &quot;us-east-1&quot; 
  alias = &quot;east&quot; 
} 

provider &quot;aws&quot; { 
  region = &quot;us-west-2&quot; 
  alias = &quot;west&quot; 
} 

resource &quot;aws_instance&quot; &quot;example&quot; { 
  provider = aws.west 
  ami = &quot;ami-0c55b159cbfafe1f0&quot; 
  instance_type = &quot;t2.micro&quot; 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ Specifies a particular provider config when multiple are defined.&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Only works in &lt;code&gt;resource&lt;/code&gt;, not &lt;code&gt;module&lt;/code&gt; blocks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Must use &lt;code&gt;alias&lt;/code&gt; to differentiate multiple configurations of the same provider.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
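&lt;p&gt;For modules, the equivalent is the &lt;code&gt;providers&lt;/code&gt; map, which passes an aliased configuration down into the module. A minimal sketch, assuming the &lt;code&gt;aws.west&lt;/code&gt; alias defined above and a hypothetical local module:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;module &quot;west_app&quot; {
  source = &quot;./modules/app&quot;   # hypothetical module path

  providers = {
    aws = aws.west   # the module&apos;s &quot;aws&quot; provider resolves to the west alias
  }
}
&lt;/code&gt;&lt;/pre&gt;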
&lt;h2&gt;🔗 Terraform meta arguments: depends_on&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_iam_role&quot; &quot;role&quot; { 
  name = &quot;example-role&quot; 
  assume_role_policy = jsonencode({ 
    Version = &quot;2012-10-17&quot;, 
    Statement = [{ 
      Effect = &quot;Allow&quot;, 
      Principal = { Service = &quot;ec2.amazonaws.com&quot; }, 
      Action = &quot;sts:AssumeRole&quot; 
    }] 
  }) 
} 

resource &quot;aws_iam_role_policy_attachment&quot; &quot;attachment&quot; { 
  role = aws_iam_role.role.name 
  policy_arn = &quot;arn:aws:iam::aws:policy/ReadOnlyAccess&quot; 
  depends_on = [aws_iam_role.role] 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ Ensures ordering when Terraform can&apos;t automatically infer it.&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use when a dependency is implicit (e.g., &lt;code&gt;local-exec&lt;/code&gt;, provisioners).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Can be used in &lt;code&gt;resource&lt;/code&gt;, &lt;code&gt;module&lt;/code&gt;, and &lt;code&gt;data&lt;/code&gt; blocks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
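&lt;p&gt;&lt;code&gt;depends_on&lt;/code&gt; works on &lt;code&gt;module&lt;/code&gt; blocks too (Terraform 0.13 and later). A sketch, with a hypothetical module path:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;module &quot;app&quot; {
  source     = &quot;./modules/app&quot;       # hypothetical module path
  depends_on = [aws_iam_role.role]   # nothing in the module is created before the role
}
&lt;/code&gt;&lt;/pre&gt;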
&lt;h2&gt;♻️ Terraform meta arguments: lifecycle&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_instance&quot; &quot;db&quot; { 
  ami = &quot;ami-0c55b159cbfafe1f0&quot; 
  instance_type = &quot;t3.micro&quot; 

  lifecycle { 
    prevent_destroy = true 
    create_before_destroy = true 
    ignore_changes = [tags[&quot;Owner&quot;]] 
  } 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Lifecycle Argument&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;create_before_destroy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ensures a new resource is created before the old one is destroyed to avoid downtime. Common in zero-downtime deployments.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prevent_destroy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Prevents a resource from being destroyed. Terraform will produce an error if a destroy is attempted on this resource.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ignore_changes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ignores changes to specific attributes in future plans. Useful for fields updated externally or during auto-scaling.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;replace_triggered_by&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;[]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Forces resource replacement when another referenced resource or attribute changes. Introduced in Terraform 1.2.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;✅ Fine-tunes how Terraform handles changes and destruction.&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;prevent_destroy&lt;/code&gt; helps protect critical infra.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;ignore_changes&lt;/code&gt; avoids re-creating resources when certain fields change.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Only supported in &lt;code&gt;resource&lt;/code&gt; blocks.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
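&lt;p&gt;&lt;code&gt;replace_triggered_by&lt;/code&gt; deserves its own sketch: it forces replacement of a resource when something it references changes. A hypothetical example (the security group name here is illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;aws_instance&quot; &quot;app&quot; {
  ami           = &quot;ami-0c55b159cbfafe1f0&quot;
  instance_type = &quot;t2.micro&quot;

  lifecycle {
    # Replace this instance whenever the referenced security group is replaced
    replace_triggered_by = [aws_security_group.web_sg.id]
  }
}
&lt;/code&gt;&lt;/pre&gt;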
&lt;h2&gt;💻 Terraform meta arguments: provisioner&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;null_resource&quot; &quot;example&quot; { 
  provisioner &quot;local-exec&quot; { 
    command = &quot;echo Hello, Terraform!&quot; 
  } 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ Executes a script or command after resource creation.&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Best for ad-hoc automation or external configuration steps.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Two types: &lt;code&gt;local-exec&lt;/code&gt; and &lt;code&gt;remote-exec&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Not idempotent—Terraform can&apos;t track what was done.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;🔐 Terraform meta arguments: connection&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;resource &quot;null_resource&quot; &quot;remote&quot; { 
  provisioner &quot;remote-exec&quot; { 
    inline = [&quot;echo Connected!&quot;] 
  } 

  connection { 
    type = &quot;ssh&quot; 
    user = &quot;ubuntu&quot; 
    host = &quot;1.2.3.4&quot; 
    private_key = file(&quot;~/.ssh/id_rsa&quot;) 
  } 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ Used with &lt;code&gt;remote-exec&lt;/code&gt; provisioners to connect to VMs or servers.&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Only works inside a &lt;code&gt;resource&lt;/code&gt; block.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Requires credentials and reachable IP.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Supports SSH and &lt;a href=&quot;https://learn.microsoft.com/en-us/windows/win32/winrm/portal&quot;&gt;WinRM&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mostly used in VM provisioning, not cloud-native workflows.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;📦 Terraform meta arguments: source (for modules)&lt;/h2&gt;
&lt;p&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;module &quot;vpc&quot; { 
  source = &quot;terraform-aws-modules/vpc/aws&quot; 
  version = &quot;~&amp;gt; 4.0&quot; 
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ Tells Terraform where to find the module (registry, Git, local, etc.).&lt;/p&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Attention Notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Only valid in &lt;code&gt;module&lt;/code&gt; blocks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the &lt;code&gt;source&lt;/code&gt; argument in a Terraform &lt;code&gt;module&lt;/code&gt; block &lt;strong&gt;does not support dynamic expressions like variables&lt;/strong&gt;. It must be a &lt;strong&gt;static, known-at-plan-time string&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When using the registry, you can also set a &lt;code&gt;version&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
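&lt;p&gt;Besides the registry, &lt;code&gt;source&lt;/code&gt; accepts local paths and Git URLs, among others. A sketch with hypothetical paths and repository URL:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;module &quot;local_example&quot; {
  source = &quot;./modules/network&quot;   # local path (no version argument allowed)
}

module &quot;git_example&quot; {
  # Git source: // selects a subdirectory, ?ref= pins a tag or branch
  source = &quot;git::https://example.com/infra.git//modules/network?ref=v1.2.0&quot;
}
&lt;/code&gt;&lt;/pre&gt;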
&lt;h1&gt;Advanced Usage&lt;/h1&gt;
&lt;p&gt;This example helped me understand what a module is, how variables pass between the root module and child modules, advanced usage of &lt;code&gt;count&lt;/code&gt;, how to work around the limitation that &lt;code&gt;source&lt;/code&gt; cannot use variables, and how &lt;code&gt;null_resource&lt;/code&gt; and &lt;code&gt;local-exec&lt;/code&gt; are used.&lt;/p&gt;
&lt;h2&gt;Directory Structure&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;❯ tree
.
├── main.tf
├── modules
│   ├── aws_module
│   │   └── main.tf
│   ├── azure_module
│   │   └── main.tf
│   └── wrapper
│       ├── main.tf
│       └── variables.tf
└── variables.tf
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Code Walkthrough&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt;❯ cat main.tf
module &quot;cloud_infra&quot; {
  source        = &quot;./modules/wrapper&quot;
  provider_type = var.provider_type
}

❯ cat variables.tf
variable &quot;provider_type&quot; {
  description = &quot;Which cloud provider to use: &apos;aws&apos; or &apos;azure&apos;&quot;
  type        = string
  default     = &quot;aws&quot;
}
❯ cat modules/wrapper/main.tf
module &quot;aws&quot; {
  source = &quot;../aws_module&quot;
  count  = var.provider_type == &quot;aws&quot; ? 1 : 0
}

module &quot;azure&quot; {
  source = &quot;../azure_module&quot;
  count  = var.provider_type == &quot;azure&quot; ? 1 : 0
}
❯ cat modules/wrapper/variables.tf
variable &quot;provider_type&quot; {
  description = &quot;Cloud provider type: aws or azure&quot;
  type        = string
}
❯ cat modules/aws_module/main.tf
resource &quot;null_resource&quot; &quot;aws_example&quot; {
  provisioner &quot;local-exec&quot; {
    command = &quot;echo Deploying AWS Infrastructure&quot;
  }
}
❯ cat modules/azure_module/main.tf
resource &quot;null_resource&quot; &quot;azure_example&quot; {
  provisioner &quot;local-exec&quot; {
    command = &quot;echo Deploying Azure Infrastructure&quot;
  }
}
&lt;/code&gt;&lt;/pre&gt;
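&lt;p&gt;Because &lt;code&gt;provider_type&lt;/code&gt; defaults to &lt;code&gt;aws&lt;/code&gt;, switching clouds is just a variable override at plan/apply time:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform plan -var=&quot;provider_type=azure&quot;
terraform apply -var=&quot;provider_type=azure&quot;
&lt;/code&gt;&lt;/pre&gt;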
&lt;p&gt;&lt;img src=&quot;./terraform-plan.jpg&quot; alt=&quot;output of terraform plan&quot; title=&quot;output of terraform plan&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./terraform-apply.jpg&quot; alt=&quot;output of terraform apply&quot; title=&quot;output of terraform apply&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Takeaways&lt;/h2&gt;
&lt;p&gt;This is just an example to &lt;strong&gt;illustrate the usage of a wrapper module&lt;/strong&gt; in Terraform, but it’s also grounded in &lt;strong&gt;practical value&lt;/strong&gt; you’d encounter in real-world scenarios.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Modular Abstraction&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The root module only needs to know &lt;strong&gt;what&lt;/strong&gt; to deploy, not &lt;strong&gt;how&lt;/strong&gt; it&apos;s deployed for each provider.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You use a variable (e.g. &lt;code&gt;provider_type&lt;/code&gt;) to switch between providers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Wrapper Logic in Its Own Module&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The wrapper module decides &lt;strong&gt;which provider-specific module&lt;/strong&gt; to load based on the input value.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This avoids conditional logic spread across the root module.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Directory Structure Reflects Cloud Providers&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Each cloud (AWS, Azure, etc.) has its own submodule with isolated logic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This keeps code clean and avoids mixing different cloud resources in the same files.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Module Source Selection&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Since &lt;code&gt;source&lt;/code&gt; must be static, you instead drive a &lt;code&gt;count&lt;/code&gt; condition with a variable so that only the desired submodule (&lt;code&gt;aws&lt;/code&gt;, &lt;code&gt;azure&lt;/code&gt;, etc.) gets instantiated.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This is static at plan/apply time, but flexible from a design perspective.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Encapsulation of Variables&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Variables like &lt;code&gt;provider_type&lt;/code&gt; are &lt;strong&gt;declared at every module level&lt;/strong&gt; that needs them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;This ensures smooth flow of configuration down from root module → wrapper → cloud module.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Easy to Extend&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Adding support for a new cloud provider (e.g. GCP) is as simple as adding a new folder and extending the wrapper logic — no changes needed in the root module.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;Terraform meta arguments, like &lt;code&gt;lifecycle&lt;/code&gt; and &lt;code&gt;provisioner&lt;/code&gt;, can seem straightforward until you hit real-world use cases. In this post, I broke down each of these Terraform meta arguments with clear explanations, practical examples, and some gotchas to watch out for. I also shared a working demo using a wrapper module pattern to dynamically deploy AWS or Azure modules based on input variables. Whether you&apos;re new to Terraform modules or just want to sharpen your understanding of Terraform meta argument behaviors, this guide aims to bring clarity to the chaos.&lt;/p&gt;
&lt;p&gt;:::info
I’ve pushed everything to GitHub for you! You can find all the Terraform scripts from this blog post right here: 👉 &lt;a href=&quot;https://github.com/geekcoding101/iac/tree/main/terraform/tutorial&quot;&gt;https://github.com/geekcoding101/iac/tree/main/terraform/tutorial&lt;/a&gt;  🚀
:::&lt;/p&gt;
&lt;p&gt;:::tip
Feel free to check out my other Terraform blog posts:&lt;a href=&quot;/posts/1-terraform-with-aws-iam-ec2&quot;&gt;Mastering Terraform with AWS Guide Part 1: Launch Real AWS Infrastructure with VPC, IAM and EC2&lt;/a&gt;
:::&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series of Kubernetes and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, I laid out the &lt;strong&gt;networking plan&lt;/strong&gt;, my &lt;strong&gt;goals for setting up Kubernetes&lt;/strong&gt;, and how to &lt;strong&gt;prepare a base VM image&lt;/strong&gt; for the cluster.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, I walked through &lt;strong&gt;configuring a local DNS server and NTP server&lt;/strong&gt;, essential for stable name resolution and time synchronization across nodes. These foundational steps will make our Kubernetes setup smoother.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt;, I finished the Kubernetes cluster setup with &lt;strong&gt;Flannel&lt;/strong&gt;, ending up with one Kubernetes master and 4 worker nodes ready for real workloads.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt;, I explored &lt;strong&gt;NodePort&lt;/strong&gt; and &lt;strong&gt;ClusterIP&lt;/strong&gt;, and understood the key differences, use cases, and when to choose each for internal and external service access! 🔥&lt;/p&gt;
&lt;p&gt;🚀 In &lt;a href=&quot;/posts/externalname-loadbalancer-5&quot;&gt;&lt;strong&gt;Part 5&lt;/strong&gt;&lt;/a&gt;, I dived into &lt;strong&gt;&lt;code&gt;ExternalName&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;LoadBalancer&lt;/code&gt;&lt;/strong&gt; services, uncovering how they handle external access, DNS resolution, and dynamic traffic distribution!
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Terraform Associate Exam: A Powerful Guide about How to Prepare It</title><link>https://geekcoding101.com/posts/howto-prepare-terraform-associate</link><guid isPermaLink="true">https://geekcoding101.com/posts/howto-prepare-terraform-associate</guid><pubDate>Sun, 27 Apr 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;./passed-result.jpg&quot; alt=&quot;passed Terraform associate exam screenshot&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Today, I officially passed the &lt;a href=&quot;https://developer.hashicorp.com/terraform/tutorials/certification-003/associate-review-003&quot;&gt;&lt;strong&gt;HashiCorp Certified: Terraform Associate (003)&lt;/strong&gt;&lt;/a&gt; exam! 🚀&lt;/p&gt;
&lt;p&gt;It wasn’t hard. It&apos;s a one-hour exam and I finished in about &lt;strong&gt;40 minutes&lt;/strong&gt;, reviewed a few flagged questions, and then confidently submitted.&lt;/p&gt;
&lt;p&gt;Now, while I&apos;m parking the more advanced &lt;a href=&quot;https://developer.hashicorp.com/terraform/tutorials/pro-cert/pro-study&quot;&gt;&lt;strong&gt;HashiCorp Certified: Terraform Authoring &amp;amp; Operations Professional with AWS (HCTOP-002-AWS)&lt;/strong&gt;&lt;/a&gt; for the moment, my next mission is to tackle &lt;a href=&quot;https://training.linuxfoundation.org/certification/certified-kubernetes-administrator-cka/&quot;&gt;&lt;strong&gt;Certified Kubernetes Administrator (CKA)&lt;/strong&gt;&lt;/a&gt;.&lt;br /&gt;
After that, we’ll see whether I circle back to the Terraform Professional Level exam.&lt;/p&gt;
&lt;h1&gt;🔁 A Quick Rewind: The Journey&lt;/h1&gt;
&lt;p&gt;If you’ve seen my last two blog posts about Terraform, you might have guessed it —&lt;br /&gt;
👉 I actually booked the Terraform associate exam upfront, way before I even started the real preparation.&lt;/p&gt;
&lt;p&gt;I booked it on purpose to push myself — to create that real, no-turning-back deadline pressure.&lt;br /&gt;
Since then, I&apos;ve been squeezing in study time almost every day, balancing learning from Udemy courses, doing hands-on practice, and posting my Terraform learning journey right here on this blog.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lesson?&lt;/strong&gt;&lt;br /&gt;
Setting a real goal date works. It forces you to move!&lt;/p&gt;
&lt;h2&gt;Terraform associate certification cost?&lt;/h2&gt;
&lt;p&gt;It&apos;s $70.50 as of today.&lt;/p&gt;
&lt;h2&gt;How long does terraform associate exam take?&lt;/h2&gt;
&lt;p&gt;One hour.&lt;/p&gt;
&lt;h2&gt;How many questions in terraform associate exam?&lt;/h2&gt;
&lt;p&gt;Total 57 questions.&lt;/p&gt;
&lt;h2&gt;What type of questions in Terraform associate exam?&lt;/h2&gt;
&lt;p&gt;Good question! The Terraform associate exam includes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Multiple Choice (Single Answer)&lt;/li&gt;
&lt;li&gt;Multiple Choice (Multiple Answer)&lt;/li&gt;
&lt;li&gt;True/False&lt;/li&gt;
&lt;li&gt;Fill in the blank&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can find the sample questions at &lt;a href=&quot;https://developer.hashicorp.com/terraform/tutorials/certification-003/associate-questions&quot;&gt;HashiCorp official site here&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Is the &lt;strong&gt;Terraform Associate Certification&lt;/strong&gt; worth it?&lt;/h2&gt;
&lt;p&gt;In short: &lt;strong&gt;absolutely, yes — if you work with cloud infrastructure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Terraform has become the industry standard for Infrastructure as Code (IaC), and getting certified shows you understand not just the basic commands, but the &lt;em&gt;right&lt;/em&gt; way to manage infrastructure at scale — including modules, state management, workspaces, and advanced features like remote backends. The certification isn’t just a checkbox; it validates that you can design clean, reusable, and reliable Terraform configurations — a huge plus for DevOps, cloud engineers, and even platform architects.&lt;/p&gt;
&lt;p&gt;The Professional-level exams, by comparison, are entirely hands-on, take three hours to finish, and are quite challenging!&lt;/p&gt;
&lt;p&gt;If you&apos;re serious about leveling up in the cloud/DevOps world, I recommend taking it.&lt;/p&gt;
&lt;h1&gt;📚 Resources I Used to Prepare&lt;/h1&gt;
&lt;p&gt;Big shoutout to the two courses that shaped my Terraform associate exam preparation journey:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;🎯 &lt;strong&gt;The official exam review guide on the HashiCorp site, &lt;a href=&quot;https://developer.hashicorp.com/terraform/tutorials/certification-003/associate-review-003&quot;&gt;available here&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🎯 &lt;strong&gt;All in One course for learning Terraform and gaining the official Terraform Associate Certification (003)&lt;/strong&gt; by &lt;em&gt;Zeal Vora -&lt;/em&gt; 👉 &lt;a href=&quot;https://cohesity.udemy.com/course/terraform-beginner-to-advanced&quot;&gt;Check it out here&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;🎯 &lt;strong&gt;The Original Terraform Associate 003 Prep: Pass your Terraform cert with 300+ Questions with Explanations and Resources&lt;/strong&gt; by &lt;em&gt;Bryan Krausen -&lt;/em&gt; 👉 &lt;a href=&quot;https://cohesity.udemy.com/course/terraform-associate-practice-exam/&quot;&gt;Check it out here&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I also created my own error notebook and &lt;strong&gt;learning notes&lt;/strong&gt; during my preparation, which you&apos;ll see in the next sections.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;✍️ My Terraform Associate Notes and Takeaways&lt;/h1&gt;
&lt;p&gt;Here&apos;s a brain-dump of everything I noted down during my preparation — real, practical, exam-focused.&lt;/p&gt;
&lt;p&gt;Even if you feel the Terraform Associate is a beginner-level exam, I still highly recommend reading through every point noted here! You&apos;ll find that these come in handy well beyond the Terraform Associate exam!&lt;/p&gt;
&lt;h2&gt;Key Terraform Concepts and Nuggets&lt;/h2&gt;
&lt;h3&gt;Main Notes&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Terraform Community CLI &lt;strong&gt;does not natively support VCS&lt;/strong&gt; (Version Control System) connections. You have to manually pull/push. But Terraform Cloud / HCP Terraform support it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There is a special &lt;code&gt;moved&lt;/code&gt; block. It tells Terraform that a resource has changed address, without touching the actual infrastructure. Use it to safely rename or restructure resources and modules. Example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;moved {
  from = aws_instance.old_name
  to   = aws_instance.new_name
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It tells Terraform:&lt;/p&gt;
&lt;p&gt;:::info
This old resource is now considered to be at this new address — &lt;strong&gt;DON&apos;T destroy and recreate it&lt;/strong&gt;.
:::&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Since Terraform 1.5, there is an &lt;code&gt;import&lt;/code&gt; &lt;strong&gt;block&lt;/strong&gt;! &lt;strong&gt;Old way:&lt;/strong&gt; run &lt;code&gt;terraform import&lt;/code&gt; manually for each resource:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform import &amp;lt;resource_type&amp;gt;.&amp;lt;resource_name&amp;gt; &amp;lt;real-world-ID&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;New way:&lt;/strong&gt; Declare imports alongside code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import {
  id = &quot;i-1234567890abcdef0&quot;
  to = aws_instance.example
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It tells Terraform:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Terraform will import the AWS EC2 instance with ID &lt;code&gt;i-1234567890abcdef0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;And map it to the resource &lt;code&gt;aws_instance.example&lt;/code&gt; declared in your &lt;code&gt;.tf&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;terraform console&lt;/code&gt; also locks the state file!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;True or false: In Terraform Community, workspaces generally use the same code repository while workspaces in Terraform Enterprise/Cloud are often mapped to different code repositories. &lt;strong&gt;Answer: True.&lt;/strong&gt; In Terraform Community, workspaces typically share the same code repository, allowing multiple environments or configurations to be managed within the same repository. On the other hand, in Terraform Enterprise/Cloud, workspaces are often mapped to different code repositories to provide better isolation and organization for different projects or teams.&lt;/p&gt;
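&lt;p&gt;For reference, the Community-edition workspace commands look like this (the workspace names are just examples):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform workspace new dev      # create and switch to a new workspace
terraform workspace list         # list workspaces; * marks the current one
terraform workspace select prod  # switch to an existing workspace
terraform workspace show         # print the current workspace name
&lt;/code&gt;&lt;/pre&gt;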
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An object type can specify a data type for &lt;strong&gt;each field&lt;/strong&gt;, but all values in a map type must be the &lt;strong&gt;same type&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;variable &quot;example_map&quot; {
  type = map(string)
}
&lt;/code&gt;&lt;/pre&gt;
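&lt;p&gt;By contrast, an &lt;code&gt;object&lt;/code&gt; type can give each field its own type. A minimal sketch (the field names are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;variable &quot;example_object&quot; {
  type = object({
    name   = string   # each field declares its own type
    cpus   = number
    public = bool
  })
}
&lt;/code&gt;&lt;/pre&gt;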
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;True or false:&lt;/strong&gt; Infrastructure as code (IaC) tools allow you to manage infrastructure with configuration files rather than through a graphical user interface. &lt;strong&gt;Answer: True&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;provider&lt;/code&gt; block is not a must! Terraform can automatically detect and use providers based on the resource configurations defined in the code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;terraform plan&lt;/code&gt; is not a must before &lt;code&gt;terraform apply&lt;/code&gt;!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run &lt;code&gt;terraform init&lt;/code&gt; successfully, then directly run &lt;code&gt;terraform apply&lt;/code&gt;. What happens? It will scan the target infrastructure, create a new state file, and then deploy.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run &lt;code&gt;terraform init&lt;/code&gt;, then remove the version line from the module block and run &lt;code&gt;terraform init -upgrade&lt;/code&gt;. What happens? Terraform &lt;strong&gt;WILL NOT&lt;/strong&gt; download the latest version of the module! &lt;strong&gt;Terraform keeps using the already-downloaded version because it is cached locally!&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run &lt;code&gt;terraform init -upgrade&lt;/code&gt;. What happens? It checks for and downloads the latest versions of plugins/modules that comply with the configuration&apos;s version constraints.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If the backend hosting the state does not support state locking, two &lt;code&gt;terraform apply&lt;/code&gt; runs at the same time might corrupt the state file!&lt;/p&gt;
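&lt;p&gt;If a crashed run leaves a stale lock behind, it can be cleared manually. Use this with care, and only when you are sure no other run is active:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform force-unlock &amp;lt;LOCK_ID&amp;gt;    # release a stale state lock
terraform apply -lock-timeout=60s    # or wait up to 60s for the lock instead
&lt;/code&gt;&lt;/pre&gt;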
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;One purpose of the state file is to improve performance!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What do Terraform agents do? They execute plans and apply changes in your infrastructure!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;True or False:&lt;/strong&gt; Any sensitive values referenced in the Terraform code, even as variables, will end up in plain text in the state file. &lt;strong&gt;That’s TRUE!!!!&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Resources of the same type need a provider alias to&lt;/strong&gt; use different provider configurations! For example, one AWS resource can live in the east region while another, using an aliased provider, can live in the west region!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The primary use of Infrastructure as Code (IaC)? &lt;strong&gt;The ability to programmatically deploy and configure resources&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;True or False:&lt;/strong&gt; In both Terraform Community and HCP Terraform, workspaces provide similar functionality of &lt;strong&gt;using a separate state file for each workspace&lt;/strong&gt;. &lt;strong&gt;Answer: True&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;For_each Value Referencing Table&lt;/h3&gt;
&lt;p&gt;Example code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;variable &quot;env&quot; {
  type = map(any)
  default = {
    prod = {
      ip = &quot;10.0.150.0/24&quot;
      az = &quot;us-east-1a&quot;
    }
    dev = {
      ip = &quot;10.0.250.0/24&quot;
      az = &quot;us-east-1e&quot;
    }
  }
}
resource &quot;aws_subnet&quot; &quot;example&quot; {
  for_each          = var.env
  cidr_block        = each.value.ip
  availability_zone = each.value.az
  tags = {
    Name = &quot;subnet-${each.key}&quot;
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Keyword&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;each.key&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The map key (prod or dev)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;each.value&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The full object for that key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;each.value.ip&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The IP address for that environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;each.value.az&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The availability zone for that environment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;📚 &lt;strong&gt;Terraform Golden Rule Mismatch Table&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;Well, I gave the name &quot;Golden Rule&quot; ^^&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mismatch&lt;/th&gt;
&lt;th&gt;Terraform Reaction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Missing in state but exists in config?&lt;/td&gt;
&lt;td&gt;Terraform plans to create it.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Exists in state but missing in config?&lt;/td&gt;
&lt;td&gt;Terraform plans to destroy it.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2&gt;More Terraform Pro Tips&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;The credentials for &lt;code&gt;aws&lt;/code&gt; are defined in the &lt;code&gt;provider&lt;/code&gt; block! (Hardcoding them like this works, but in practice prefer environment variables or shared credentials files.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;provider &quot;aws&quot; {
  region     = &quot;us-east-1&quot;
  access_key = &quot;YOUR_ACCESS_KEY&quot;
  secret_key = &quot;YOUR_SECRET_KEY&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There is no &lt;code&gt;name&lt;/code&gt; argument in a &lt;code&gt;provider&lt;/code&gt; block; it uses &lt;code&gt;alias&lt;/code&gt; instead. How do you use the alias in a resource?&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;provider &quot;aws&quot; {
  region = &quot;us-east-1&quot;
}
provider &quot;aws&quot; {
  alias  = &quot;west&quot;
  region = &quot;us-west-2&quot;
}
resource &quot;aws_instance&quot; &quot;example&quot; {
  provider      = aws.west  # use the aliased provider
  ami           = &quot;ami-0c55b159cbfafe1f0&quot;
  instance_type = &quot;t2.micro&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Inside terraform block: &lt;code&gt;required_version&lt;/code&gt; - to constrain Terraform CLI version.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Backend&lt;/strong&gt; configuration must be defined &lt;strong&gt;inside the terraform block&lt;/strong&gt;. You cannot define a backend inside a provider block or outside the terraform block:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform {
  backend &quot;remote&quot; {
    hostname = &quot;app.terraform.io&quot;
    organization = &quot;btk&quot;

    workspaces {
      name = &quot;bryan-prod&quot;
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Constrain one or multiple provider versions in the &lt;code&gt;terraform&lt;/code&gt; block. &lt;strong&gt;None&lt;/strong&gt; of this &lt;strong&gt;can go in a &lt;code&gt;provider&lt;/code&gt; block!&lt;/strong&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;terraform {
  required_providers {
    aws = {
      source  = &quot;hashicorp/aws&quot;
      version = &quot;~&amp;gt; 5.0&quot;
    }
    azurerm = {
      source = &quot;hashicorp/azurerm&quot;
      version = &quot;2.90.0&quot;
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The source path pattern when &lt;strong&gt;using a private registry&lt;/strong&gt; is &lt;code&gt;&amp;lt;hostname&amp;gt;/&amp;lt;namespace&amp;gt;/&amp;lt;name&amp;gt;/&amp;lt;provider&amp;gt;&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;module &quot;vpc&quot; {
  source  = &quot;registry.example.com/devops-team/vpc-module/aws&quot;
  version = &quot;1.0.3&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The module below comes from the Terraform public registry!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;module &quot;consul&quot; {
  source = &quot;hashicorp/consul/aws&quot;
  version = &quot;0.1.0&quot;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;References Value&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Reference Format&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;td&gt;&lt;code&gt;var.&amp;lt;variable_name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;var.instance_type&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;&lt;code&gt;local.&amp;lt;local_name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;local.default_tags&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Source&lt;/td&gt;
&lt;td&gt;&lt;code&gt;data.&amp;lt;provider&amp;gt;_&amp;lt;data_type&amp;gt;.&amp;lt;name&amp;gt;.&amp;lt;attribute&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;data.aws_ami.ubuntu.id&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Module Output&lt;/td&gt;
&lt;td&gt;&lt;code&gt;module.&amp;lt;module_name&amp;gt;.&amp;lt;output_name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;module.network.vpc_id&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Attribute&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;provider&amp;gt;_&amp;lt;resource_type&amp;gt;.&amp;lt;resource_name&amp;gt;.&amp;lt;attribute&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;aws_instance.web.public_ip&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terraform Built-in Functions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;&amp;lt;function&amp;gt;(&amp;lt;arguments&amp;gt;)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cidrsubnet(var.vpc_cidr, 8, 1)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Terraform Meta-Arguments (special cases)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;self.&amp;lt;attribute&amp;gt;&lt;/code&gt; (within resource)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;self.public_ip&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;:::info
Locals are defined in a &lt;code&gt;locals&lt;/code&gt; block (plural) but referenced as &lt;code&gt;local.&amp;lt;name&amp;gt;&lt;/code&gt; (singular)!
:::&lt;/p&gt;
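&lt;p&gt;A quick sketch of locals in use: defined in a &lt;code&gt;locals&lt;/code&gt; block (plural), referenced as &lt;code&gt;local.&amp;lt;name&amp;gt;&lt;/code&gt; (singular). The names below are illustrative:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;locals {
  default_tags = {
    Project = &quot;demo&quot;
    Owner   = &quot;geekcoding101&quot;
  }
}

resource &quot;aws_s3_bucket&quot; &quot;example&quot; {
  bucket = &quot;my-demo-bucket&quot;
  tags   = local.default_tags   # singular &quot;local.&quot; when referencing
}
&lt;/code&gt;&lt;/pre&gt;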
&lt;h1&gt;✨ Closing Thoughts&lt;/h1&gt;
&lt;p&gt;Passing the &lt;strong&gt;Terraform Associate (003)&lt;/strong&gt; cert wasn&apos;t brutal — it just needed &lt;strong&gt;focused practice&lt;/strong&gt;, &lt;strong&gt;real deadlines&lt;/strong&gt;, and &lt;strong&gt;hands-on experience&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Next stop: &lt;strong&gt;CKA (Certified Kubernetes Administrator)&lt;/strong&gt;!&lt;br /&gt;
Maybe afterward, I&apos;ll resume the Terraform Pro-level certs — but for now, it’s Kubernetes grind time. 🚀&lt;/p&gt;
&lt;p&gt;:::info
Feel free to check out my previous posts about Terraform:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/posts/1-terraform-with-aws-iam-ec2&quot;&gt;Part 1: Mastering Terraform with AWS Guide Part 1: Launch Real AWS Infrastructure with VPC, IAM and EC2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/posts/terraform-meta-arguments&quot;&gt;Part 2: Terraform Meta Arguments Unlocked: Practical Patterns for Clean Infrastructure Code&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Golang Range Loop Reference - Why Your Loop Keeps Giving You the Same Pointer (and How to Fix It)</title><link>https://geekcoding101.com/posts/golang-range-loop-reference</link><guid isPermaLink="true">https://geekcoding101.com/posts/golang-range-loop-reference</guid><pubDate>Mon, 05 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;When I first started learning Go, I thought I was doing everything right, until I ran into a weird bug with Go&apos;s range loop references. I was iterating over a list of &lt;code&gt;Book&lt;/code&gt; structs (of course, I can&apos;t share the real structs and code used here... everything below is for tutorial purposes), taking the pointer to each one, and storing the pointers in a slice. But at the end of the loop, all the pointers pointed to... the same book?! 🤯&lt;/p&gt;
&lt;p&gt;Let’s walk through this classic Go beginner mistake together — and fix it the right way.&lt;/p&gt;
&lt;h1&gt;📚 The Use Case: A Slice of Books in a Library&lt;/h1&gt;
&lt;p&gt;Suppose we have a list of books, and we want to collect pointers to each one so we can modify them later.&lt;/p&gt;
&lt;p&gt;Here’s the code I &lt;em&gt;thought&lt;/em&gt; would work:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for _, book := range books {
    bookPointers = append(bookPointers, &amp;amp;book) // Oops...
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But when I printed out the pointers, they all pointed to the &lt;em&gt;last&lt;/em&gt; book in the list. This bug stumped me for a while until I understood one critical Go behavior.&lt;/p&gt;
&lt;h1&gt;The File Structure To Run The Code&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;learning-golang/
├── 01-loop-reference-pitfall/
│   ├── main.go
│   └── README.md
├── Makefile
├── bin/
└── go.mod

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is the complete buggy code:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;package main

import (
    &quot;fmt&quot;
)

type Book struct {
    Title  string
    Author string
}

func main() {
    originalBooks := []Book{
        {&quot;Go in Action&quot;, &quot;William Kennedy&quot;},
        {&quot;The Go Programming Language&quot;, &quot;Alan Donovan&quot;},
        {&quot;Introducing Go&quot;, &quot;Caleb Doxsey&quot;},
    }

    fmt.Println(&quot;❌ Buggy Version:&quot;)
    var buggyPointers []*Book
    for _, book := range originalBooks {
        buggyPointers = append(buggyPointers, &amp;amp;book)
    }
    for _, bp := range buggyPointers {
        fmt.Printf(&quot;Title: %-30s | Address: %p\n&quot;, bp.Title, bp)
    }
}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Makefile:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Usage:
#   make run DIR=01-loop-reference-pitfall
#   make build DIR=01-loop-reference-pitfall
#   make clean

GO=go

run:
    @echo &quot;👉 Running $(DIR)/main.go...&quot;
    cd $(DIR) &amp;amp;&amp;amp; $(GO) run main.go

build:
    @echo &quot;🔧 Building binary from $(DIR)/main.go...&quot;
    cd $(DIR) &amp;amp;&amp;amp; $(GO) build -o ../bin/$(notdir $(DIR))

clean:
    @echo &quot;🧹 Cleaning up built binaries...&quot;
    rm -rf bin/

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Build and Run The Code&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ go mod init github.com/geekcoding101/learning-golang
❯ make run DIR=01-loop-reference-pitfall

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&quot;./first-time-build-and-run.jpg&quot; alt=&quot;Golang Range Loop Reference - First time build and run code&quot; title=&quot;Golang Range Loop Reference - First time build and run code&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As you can see, the address didn&apos;t change at all!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🐛 The Problem: Golang Reuses the Range Loop Variable&lt;/h2&gt;
&lt;p&gt;In Go, when you do:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;for _, book := range books
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The &lt;code&gt;book&lt;/code&gt; variable is reused in every iteration; it&apos;s not a new instance each time. So taking &lt;code&gt;&amp;amp;book&lt;/code&gt; gives you the &lt;em&gt;same memory address&lt;/em&gt; over and over.&lt;/p&gt;
&lt;p&gt;This means every pointer in the slice points to the same memory location, which at the end of the loop holds the value of the &lt;strong&gt;last book&lt;/strong&gt;. (Note: Go 1.22 changed this behavior so that each iteration gets a fresh loop variable; the pitfall applies to earlier Go versions, or to modules whose &lt;code&gt;go&lt;/code&gt; directive targets a version before 1.22.)&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;✅ The Fix: Indexing Directly&lt;/h2&gt;
&lt;p&gt;The correct way is to use an index:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;fmt.Println(&quot;\n✅ Fixed Version:&quot;)
var fixedPointers []*Book
for i := range originalBooks {
    fixedPointers = append(fixedPointers, &amp;amp;originalBooks[i])
}
for _, bp := range fixedPointers {
    fmt.Printf(&quot;Title: %-30s | Address: %p\n&quot;, bp.Title, bp)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now, each pointer actually refers to the corresponding element in the original slice. Problem solved!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./second-time-with-fix-build-and-run.jpg&quot; alt=&quot;second time with fix build and run&quot; title=&quot;second time with fix build and run&quot; /&gt;&lt;/p&gt;
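&lt;p&gt;An alternative fix, common before Go 1.22, is to shadow the loop variable so each iteration works on its own copy (the slice name below is illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;var shadowPointers []*Book
for _, book := range originalBooks {
    book := book // shadow: a fresh copy per iteration
    shadowPointers = append(shadowPointers, &amp;amp;book)
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Note that these pointers refer to copies, not to the elements of &lt;code&gt;originalBooks&lt;/code&gt;, so prefer indexing when you need to modify the originals.&lt;/p&gt;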
&lt;hr /&gt;
&lt;h1&gt;💻 Real Code Example on GitHub&lt;/h1&gt;
&lt;p&gt;I&apos;ve documented this bug and fix in my GitHub repository:&lt;/p&gt;
&lt;p&gt;👉 &lt;a href=&quot;https://github.com/geekcoding101/learning-golang&quot;&gt;github.com/geekcoding101/learning-golang&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Here&apos;s what you’ll find:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The buggy version (with all pointers pointing to the same book)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The fixed version (each pointer is correct)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;code&gt;Makefile&lt;/code&gt; to help you run and build each learning topic&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;hr /&gt;
&lt;h1&gt;✍️ Follow My Golang Tutorials&lt;/h1&gt;
&lt;p&gt;I’ll continue sharing these hands-on lessons as I deepen my understanding of Go.&lt;/p&gt;
&lt;p&gt;Check out my blog 👉 &lt;a href=&quot;https://www.geekcoding101.com&quot;&gt;www.geekcoding101.com&lt;/a&gt; — where I share practical posts, breakdowns, and real-world insights from my coding journey.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;📌 Quick Hashtag&lt;/h1&gt;
&lt;p&gt;#Golang Range Loop Reference, #for loop golang range, #for loop range golang, #golang for range loop, #for loop in golang with range, #go slice reference, #go for loop pointer trap, #why does my go loop store the same pointer, #golang how to correctly get pointer from loop, #go for loop pointer always same&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>A 12 Factor Crash Course in Python: Build Clean, Scalable FastAPI Apps the Right Way</title><link>https://geekcoding101.com/posts/12-factor-crash-course</link><guid isPermaLink="true">https://geekcoding101.com/posts/12-factor-crash-course</guid><pubDate>Mon, 12 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;&lt;img src=&quot;./feature-image-2.jpg&quot; alt=&quot;a 12-factor app crash course&quot; title=&quot;a 12-factor app crash course&quot; /&gt;&lt;/h1&gt;
&lt;h1&gt;Intro: Building Apps That Don’t Suck in Production&lt;/h1&gt;
&lt;p&gt;Let’s be honest—plenty of apps “work on my machine” but self-destruct the moment they meet the real world. Configs hardcoded, logs missing, environments confused, and deployments that feel like an escape room puzzle.&lt;/p&gt;
&lt;p&gt;If you want your service to thrive in production (and not become an ops horror story), you need a design philosophy that enforces &lt;strong&gt;clean separation, modularity, and resilience&lt;/strong&gt;. That&apos;s where the &lt;strong&gt;12 Factor App&lt;/strong&gt; methodology comes in.&lt;/p&gt;
&lt;p&gt;In this post, we’re going to break down &lt;strong&gt;each of the 12 factors&lt;/strong&gt; using a Python/FastAPI-based stack and walk through how to get them right.&lt;/p&gt;
&lt;h1&gt;🧱 The Twelve Factors — Python Style&lt;/h1&gt;
&lt;p&gt;Let’s take each principle, one by one. Think of it as a devops dojo, with Python as your katana.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Codebase: One codebase tracked in revision control, many deploys&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Single source of truth, version-controlled, no Franken-repos.&lt;/p&gt;
&lt;p&gt;📌 &lt;strong&gt;In Python:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;One Git repo per service.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Don&apos;t share code across projects via copy-paste. Use internal packages or shared libraries (published to private PyPI or via Git submodules).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practice:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/fastapi-12factor-app
├── app/
│   ├── api/
│   ├── core/
│   ├── models/
│   └── main.py
├── tests/
├── Dockerfile
├── pyproject.toml
├── README.md
└── .env

&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;strong&gt;Dependencies: Explicitly declare and isolate dependencies&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: No implicit magic. Use virtualenvs and lock your deps.&lt;/p&gt;
&lt;p&gt;📌 &lt;strong&gt;In Python:&lt;/strong&gt; Use &lt;a href=&quot;https://peps.python.org/pep-0621/&quot;&gt;&lt;code&gt;pyproject.toml&lt;/code&gt;&lt;/a&gt; and a tool like &lt;strong&gt;Poetry&lt;/strong&gt; or &lt;strong&gt;pip-tools&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;✅ &lt;strong&gt;Example &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[tool.poetry.dependencies]
python = &quot;^3.12&quot;
fastapi = &quot;^0.110.0&quot;
uvicorn = &quot;^0.29.0&quot;
sqlalchemy = &quot;^2.0&quot;
pydantic = &quot;^2.6&quot;
python-dotenv = &quot;^1.0&quot;

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;🔒 Lock it down:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;poetry lock

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;And run your app in a containerized environment, so your coworker’s Python 3.6 setup doesn’t eat your soul.&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;Config: Store config in the environment&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;Configs aren’t code. Environment variables FTW.&lt;/p&gt;
&lt;p&gt;📌 &lt;strong&gt;In Python with Pydantic v2:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Pydantic v2 style: configure via model_config instead of a nested Config class
    model_config = SettingsConfigDict(env_file=&quot;.env&quot;)

    database_url: str
    debug: bool = False

settings = Settings()

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;✅ &lt;code&gt;.env&lt;/code&gt; for local:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;DATABASE_URL=postgresql+asyncpg://user:pass@db:5432/app
DEBUG=true

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;🚀 Let Kubernetes inject real env vars in prod. No secrets in code, please.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Backing Services: Treat backing services as attached resources&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Databases, queues, and blobs should be replaceable.&lt;/p&gt;
&lt;p&gt;📌 &lt;strong&gt;In FastAPI:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Define your database URL in &lt;code&gt;settings.database_url&lt;/code&gt;, not hardcoded. SQLAlchemy supports this beautifully.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(settings.database_url, echo=settings.debug)

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;🧪 In test, you can override &lt;code&gt;DATABASE_URL&lt;/code&gt; with a SQLite memory DB. That’s the power of this separation.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Build, Release, Run: Strictly separate build and run stages&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Immutable images. Don’t change code/configs post-build.&lt;/p&gt;
&lt;p&gt;📦 &lt;strong&gt;Dockerfile&lt;/strong&gt; example:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;FROM python:3.12-slim

WORKDIR /app
COPY pyproject.toml .
RUN pip install poetry &amp;amp;&amp;amp; poetry install --without dev

COPY . .

CMD [&quot;uvicorn&quot;, &quot;app.main:app&quot;, &quot;--host&quot;, &quot;0.0.0.0&quot;, &quot;--port&quot;, &quot;8000&quot;]

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;👊 Don’t inject secrets during build—use &lt;code&gt;env&lt;/code&gt; at runtime.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Processes: Execute the app as one or more stateless processes&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Stateless, share-nothing services.&lt;/p&gt;
&lt;p&gt;📌 &lt;strong&gt;In FastAPI:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Keep state (like DB sessions) outside the app object.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use dependency injection for scoped connections.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker

async_session = async_sessionmaker(engine, expire_on_commit=False)

async def get_session() -&amp;gt; AsyncSession:
    async with async_session() as session:
        yield session

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This plays nice with Kubernetes autoscaling and kills zombie state.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Port Binding: Export services via port binding&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Your app should be self-contained and listen on a port.&lt;/p&gt;
&lt;p&gt;✅ FastAPI does this naturally:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;uvicorn app.main:app --host 0.0.0.0 --port 8000

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;K8s service can bind this to external ports as needed. No Apache/Nginx glue required.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Concurrency: Scale out via the process model&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Scale horizontally, not by making megathreads.&lt;/p&gt;
&lt;p&gt;📌 Use &lt;strong&gt;Uvicorn workers&lt;/strong&gt; via &lt;code&gt;gunicorn&lt;/code&gt; if needed, or just scale pods in K8s:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;gunicorn -k uvicorn.workers.UvicornWorker app.main:app -w 4
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or define a HorizontalPodAutoscaler in K8s—clean separation.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Disposability: Fast startup and graceful shutdown&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Apps should start/stop fast and cleanly.&lt;/p&gt;
&lt;p&gt;✅ In FastAPI, use startup/shutdown events:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from fastapi import FastAPI

app = FastAPI()

@app.on_event(&quot;startup&quot;)
async def on_startup():
    print(&quot;Ready to go!&quot;)

@app.on_event(&quot;shutdown&quot;)
async def on_shutdown():
    print(&quot;Shutting down gracefully...&quot;)

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Kubernetes will send SIGTERM—be ready for it. (In current FastAPI versions, the &lt;code&gt;lifespan&lt;/code&gt; context manager is the recommended replacement for the deprecated &lt;code&gt;on_event&lt;/code&gt; hooks.)&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Dev/Prod Parity: Keep development, staging, and production as similar as possible&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;📌 Use &lt;code&gt;.env&lt;/code&gt; for local, ConfigMaps/Secrets for prod, but same app code.&lt;/p&gt;
&lt;p&gt;Also—use &lt;strong&gt;Docker&lt;/strong&gt; for dev, same as prod. Don’t “just run it on the host.”&lt;/p&gt;
&lt;p&gt;✅ Use &lt;code&gt;docker-compose&lt;/code&gt; in dev (or Tilt/Skaffold) to mirror the prod infra.&lt;/p&gt;
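&lt;p&gt;A minimal &lt;code&gt;docker-compose.yml&lt;/code&gt; sketch for dev parity (the image tag and service names are illustrative):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;services:
  app:
    build: .                  # same Dockerfile as prod
    env_file: .env            # local config via environment variables
    ports:
      - &quot;8000:8000&quot;
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: app
&lt;/code&gt;&lt;/pre&gt;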
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Logs: Treat logs as event streams&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Don’t write to files. Stream to stdout/stderr.&lt;/p&gt;
&lt;p&gt;✅ FastAPI + &lt;code&gt;logging&lt;/code&gt; setup:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.get(&quot;/health&quot;)
async def health():
    logger.info(&quot;Health check called&quot;)
    return {&quot;status&quot;: &quot;ok&quot;}

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;🎯 Let Kubernetes + Fluentd/ELK/Grafana Loki deal with aggregation.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;&lt;strong&gt;Admin Processes: Run admin/one-off tasks as one-off processes&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;12 Factor App: Run admin and one-off tasks as separate one-off processes. ✅ Create a separate &lt;code&gt;scripts/&lt;/code&gt; dir with admin tasks (DB migrations, data cleaning, etc.)&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;/scripts/
  └── migrate.py

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Run it as:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;python scripts/migrate.py

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Or use K8s Jobs for one-offs in production.&lt;/p&gt;
&lt;h1&gt;Cheatsheet&lt;/h1&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Applies To&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Codebase&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;All apps&lt;/td&gt;
&lt;td&gt;One codebase per app, tracked in version control, with many deploys.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Dependencies&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Language/runtime&lt;/td&gt;
&lt;td&gt;Explicitly declare and isolate dependencies via a manifest (e.g., &lt;code&gt;pyproject.toml&lt;/code&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Config&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Environment management&lt;/td&gt;
&lt;td&gt;Store config in environment variables; never in code.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Backing Services&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Databases, queues, caches&lt;/td&gt;
&lt;td&gt;Treat services like resources; attach/detach them via config, not code changes.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Build, Release, Run&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;CI/CD pipelines&lt;/td&gt;
&lt;td&gt;Separate build, release, and run stages. Never change code/config after release.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Processes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Application execution&lt;/td&gt;
&lt;td&gt;Execute apps as stateless processes; share nothing, scale horizontally.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Port Binding&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Web services&lt;/td&gt;
&lt;td&gt;Export services via port binding; don’t depend on external web servers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Concurrency&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Scale out via process model; use multiple instances or pods, not threads.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Disposability&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lifecycle management&lt;/td&gt;
&lt;td&gt;Fast startup and graceful shutdown improve robustness and scalability.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Dev/Prod Parity&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Dev environments&lt;/td&gt;
&lt;td&gt;Keep development, staging, and production as similar as possible.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Logs&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Treat logs as event streams; write to stdout/stderr and let the platform handle aggregation.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Admin Processes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;One-off tasks&lt;/td&gt;
&lt;td&gt;Run one-off admin tasks (e.g., migrations) as isolated processes, not part of the main app.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h1&gt;🔚 Wrapping It All Up&lt;/h1&gt;
&lt;p&gt;The 12 Factor App methodology isn’t just a checklist—it’s a &lt;strong&gt;survivability manual&lt;/strong&gt; for cloud-native apps. And FastAPI, paired with Pydantic v2 and SQLAlchemy, makes following these principles refreshingly clean.&lt;/p&gt;
&lt;p&gt;A few takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Treat config like royalty—never hardcode it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Keep your app stateless and dumb—let Kubernetes do the smart scaling.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stream your logs, don&apos;t hoard them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Build once, deploy often, break never (hopefully).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;:::info
If you want to read more engineering notes and system design posts, feel free to browse the tags page &lt;a href=&quot;/tags&quot;&gt;here&lt;/a&gt;.
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Kubernetes Control Plane Components Explained</title><link>https://geekcoding101.com/posts/kubernetes-control-plane-components</link><guid isPermaLink="true">https://geekcoding101.com/posts/kubernetes-control-plane-components</guid><pubDate>Sun, 18 May 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Intro: So… What Powers the Kubernetes Brain?&lt;/h1&gt;
&lt;p&gt;Everyone loves to show off their YAML-fu and talk about Pods and Deployments, but what’s actually &lt;em&gt;running&lt;/em&gt; behind the scenes? What&apos;s keeping track of all your services, secrets, and scheduled chaos? Today I&apos;d like to give you a quick introduction to the &lt;strong&gt;Kubernetes control plane components&lt;/strong&gt;—the brains of the operation. It’s made up of several server-side components that work together like an orchestra of background daemons with trust issues and strict roles.&lt;/p&gt;
&lt;p&gt;In this post, we’ll demystify &lt;strong&gt;each core server&lt;/strong&gt; running in a Kubernetes control plane:&lt;br /&gt;
✅ &lt;code&gt;etcd&lt;/code&gt;&lt;br /&gt;
✅ &lt;code&gt;kube-apiserver&lt;/code&gt;&lt;br /&gt;
✅ &lt;code&gt;kube-scheduler&lt;/code&gt;&lt;br /&gt;
✅ &lt;code&gt;kube-controller-manager&lt;/code&gt;&lt;br /&gt;
✅ &lt;code&gt;cloud-controller-manager&lt;/code&gt;&lt;br /&gt;
✅ &lt;code&gt;kubelet&lt;/code&gt;&lt;br /&gt;
✅ &lt;code&gt;kube-proxy&lt;/code&gt;&lt;br /&gt;
✅ &lt;code&gt;coredns&lt;/code&gt;&lt;br /&gt;
✅ Optional players (metrics-server, CRI, CSI, etc.)&lt;/p&gt;
&lt;p&gt;Let’s start with the foundation: how these pieces talk.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1&gt;🧩 Kubernetes Architecture in a Nutshell&lt;/h1&gt;
&lt;p&gt;Kubernetes has two main types of nodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Control Plane Nodes&lt;/strong&gt; (aka “masters”): run the brain (scheduler, API, etc.)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Worker Nodes&lt;/strong&gt;: run your actual workloads (pods)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Think of the control plane as &lt;strong&gt;mission control&lt;/strong&gt; and the worker nodes as &lt;strong&gt;spacecraft&lt;/strong&gt;. One issues orders, the other executes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;They talk over HTTP/gRPC and communicate securely via TLS.&lt;/p&gt;
&lt;p&gt;Now let’s break down the core components—what they are, what they do, and how they fit together.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./ascii-art-diagram.jpg&quot; alt=&quot;Kubernetes Control Plane Components ascii art diagram&quot; title=&quot;Kubernetes Control Plane Components ascii art diagram&quot; /&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🔐 1. Kubernetes Control Plane Components - etcd&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
A distributed key-value store used to store &lt;em&gt;all&lt;/em&gt; cluster data: objects, state, configs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
The &lt;strong&gt;brain’s hard disk&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What lives inside etcd?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Pod definitions&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ConfigMaps&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Secrets&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Cluster state&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Node info&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RoleBindings, CRDs, everything&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Backed by:&lt;/strong&gt;&lt;br /&gt;
&lt;a href=&quot;https://etcd.io/&quot;&gt;etcd&lt;/a&gt; (from CoreOS), written in Go, uses the Raft consensus algorithm for HA and consistency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why it matters:&lt;/strong&gt;&lt;br /&gt;
You &lt;em&gt;never&lt;/em&gt; talk to etcd directly. Only the &lt;strong&gt;API server&lt;/strong&gt; does.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🧭 2. Kubernetes Control Plane Components - kube-apiserver&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
The main RESTful API that all clients (kubectl, controllers, kubelet) talk to.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
The &lt;strong&gt;gatekeeper&lt;/strong&gt; and &lt;strong&gt;translator&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Responsibilities:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Validates incoming requests&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Authenticates + authorizes them&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Talks to etcd to read/write state&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Notifies other components via the Watch API&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;It’s stateless&lt;/strong&gt;, and often run behind a load balancer for HA.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;⏱ 3. Kubernetes Control Plane Components - kube-scheduler&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
Assigns unscheduled pods to nodes, based on constraints.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
The &lt;strong&gt;Tinder for workloads&lt;/strong&gt;. It matches pods to nodes based on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;CPU/memory availability&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Node taints/tolerations&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Affinity rules&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pod priority&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Custom scoring plugins&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;The flow:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A pod is created with no &lt;code&gt;nodeName&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;API server stores it in etcd.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scheduler sees it, scores possible nodes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It picks a winner and updates the pod with &lt;code&gt;nodeName&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
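&lt;p&gt;The filter-then-score loop above can be sketched in a few lines of Python (a toy illustration of the shape of the decision, nothing like the real scheduler&apos;s plugin framework):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def pick_node(pod, nodes):
    # Filter: drop nodes that cannot fit the pod&apos;s CPU request.
    feasible = [n for n in nodes if n[&quot;free_cpu&quot;] &amp;gt;= pod[&quot;cpu&quot;]]
    if not feasible:
        return None  # pod stays Pending
    # Score: prefer the node with the most free CPU (one toy heuristic).
    return max(feasible, key=lambda n: n[&quot;free_cpu&quot;])[&quot;name&quot;]

nodes = [
    {&quot;name&quot;: &quot;node-a&quot;, &quot;free_cpu&quot;: 2},
    {&quot;name&quot;: &quot;node-b&quot;, &quot;free_cpu&quot;: 4},
]
print(pick_node({&quot;cpu&quot;: 3}, nodes))  # node-b
&lt;/code&gt;&lt;/pre&gt;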
&lt;hr /&gt;
&lt;h2&gt;🧙 4. Kubernetes Control Plane Components - kube-controller-manager&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
A daemon that runs &lt;strong&gt;many controllers&lt;/strong&gt; in one binary.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
The &lt;strong&gt;cluster babysitter&lt;/strong&gt;—constantly checking for drift and correcting it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Runs controllers for:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Deployments&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Replicasets&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Nodes (watching for crashes)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Endpoints&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Persistent Volumes&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Certificates&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;br /&gt;
Each controller watches a specific object type via the API server and ensures the desired state matches reality. If not, it acts.&lt;/p&gt;
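&lt;p&gt;That watch-and-reconcile pattern boils down to something like this (a deliberately simplified sketch; real controllers use informers, work queues, and the Watch API):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def reconcile(desired_replicas, actual_pods):
    # Compare desired state with reality, return the corrective action.
    diff = desired_replicas - len(actual_pods)
    if diff &amp;gt; 0:
        return (&quot;create&quot;, diff)
    if diff &amp;lt; 0:
        return (&quot;delete&quot;, -diff)
    return (&quot;noop&quot;, 0)

print(reconcile(3, [&quot;pod-1&quot;]))  # (&apos;create&apos;, 2)
&lt;/code&gt;&lt;/pre&gt;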
&lt;hr /&gt;
&lt;h2&gt;☁️ 5. Kubernetes Control Plane Components - cloud-controller-manager&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
An optional component for clusters running on public clouds (AWS, GCP, Azure).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
The &lt;strong&gt;bridge between Kubernetes and your cloud infrastructure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Responsibilities:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Creating LoadBalancers&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Managing cloud-based node info&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Attaching volumes (via CSI)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Updating routes&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You won’t see this in bare-metal clusters unless you set up external integrations manually.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🧍 6. Kubernetes Control Plane Components - kubelet&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
A daemon that runs on &lt;em&gt;every&lt;/em&gt; worker node.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
The &lt;strong&gt;node’s supervisor&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Watches for pods assigned to its node&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pulls container images via the CRI&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Starts/stops containers using container runtimes (containerd, CRI-O)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sends status updates back to API server&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt;&lt;br /&gt;
kubelet does &lt;em&gt;not&lt;/em&gt; manage containers you started outside of Kubernetes.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🌐 7. Kubernetes Control Plane Components - kube-proxy&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
Runs on each node to implement &lt;strong&gt;service networking&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
A &lt;strong&gt;port-forwarding ninja&lt;/strong&gt; that handles cluster IP routing rules.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Modes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;iptables (legacy, still common)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;ipvs (newer, faster, uses Linux’s IPVS)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;eBPF (if you’re fancy and using Cilium)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Maintains network rules to forward traffic to the correct pod behind a Service&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Enables internal DNS (via CoreDNS)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Helps implement ClusterIP, NodePort, LoadBalancer behavior&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
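&lt;p&gt;Conceptually, the Service-to-pod forwarding it maintains is a rotating lookup table (a toy Python sketch with made-up IPs; the real rules live in iptables/IPVS/eBPF, not in user space):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import itertools

# A Service&apos;s ClusterIP fronts a rotating set of pod endpoints.
endpoints = {&quot;10.96.0.10&quot;: [&quot;10.244.1.5&quot;, &quot;10.244.2.7&quot;]}
cursors = {ip: itertools.cycle(pods) for ip, pods in endpoints.items()}

def forward(cluster_ip):
    # Pick the next backend pod for this Service (round-robin).
    return next(cursors[cluster_ip])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Calling &lt;code&gt;forward(&quot;10.96.0.10&quot;)&lt;/code&gt; repeatedly alternates between the two pod IPs.&lt;/p&gt;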
&lt;hr /&gt;
&lt;h2&gt;🧠 8. Kubernetes Control Plane Components - CoreDNS&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;What it is:&lt;/strong&gt;&lt;br /&gt;
Default DNS service in Kubernetes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Think of it as:&lt;/strong&gt;&lt;br /&gt;
Your cluster’s internal &lt;strong&gt;phone book&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Resolves pod/service names to cluster IPs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Works with kube-dns-compatible tools&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Responds to all &lt;code&gt;*.svc.cluster.local&lt;/code&gt; domain lookups&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Runs as a Deployment + Service&lt;/strong&gt;, just like any other app—because Kubernetes eats its own dog food.&lt;/p&gt;
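&lt;p&gt;The naming scheme is mechanical enough to sketch: a Service&apos;s in-cluster FQDN is just the service name, its namespace, and the cluster suffix glued together.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def service_fqdn(service, namespace, cluster_domain=&quot;cluster.local&quot;):
    # How a Service is addressed inside the cluster.
    return f&quot;{service}.{namespace}.svc.{cluster_domain}&quot;

print(service_fqdn(&quot;my-api&quot;, &quot;default&quot;))
# my-api.default.svc.cluster.local
&lt;/code&gt;&lt;/pre&gt;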
&lt;hr /&gt;
&lt;h2&gt;🛠 Supporting Components&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;metrics-server&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Collects CPU/mem stats for autoscaling (HPA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CRI runtimes&lt;/strong&gt; (containerd, CRI-O)&lt;/td&gt;
&lt;td&gt;Interface between kubelet and container engines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSI drivers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Handle volume mounting for storage backends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CNI plugins&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provide networking (Calico, Flannel, Cilium, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Admission controllers&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API gatekeepers that enforce rules (e.g., resource limits, policies)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Aggregation layer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports extending the Kubernetes API (like metrics.k8s.io)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;h1&gt;📦 Kubernetes Control Plane Components Diagram&lt;/h1&gt;
&lt;pre&gt;&lt;code&gt;+--------------------------------------------------+
|                  Control Plane                   |
| +-----------------+      +--------------------+  |
| | kube-apiserver  | &amp;lt;--&amp;gt; |  etcd              |  |
| +-----------------+      +--------------------+  |
|        ^                                         |
|        |                                         |
| +----------------------------+                   |
| |  kube-controller-manager   |                   |
| +----------------------------+                   |
| +----------------------------+                   |
| |       kube-scheduler       |                   |
| +----------------------------+                   |
| +----------------------------+                   |
| | cloud-controller-manager   |                   |
| |       (optional)           |                   |
| +----------------------------+                   |
+--------------------------------------------------+

+-------------------------------------------------+
|                Worker Node                      |
| +---------+  +-----------+  +-----------------+ |
| | kubelet |  | kube-proxy|  | containerd/CRI  | |
| +---------+  +-----------+  +-----------------+ |
| +---------------------------------------------+ |
| |                Pods (your app)              | |
| +---------------------------------------------+ |
+-------------------------------------------------+

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I really like this simplified diagram!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;🧠 A Quick Reference&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;etcd&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Key-value store (source of truth)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kube-apiserver&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cluster API gateway&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kube-scheduler&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Assigns pods to nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kube-controller-manager&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Maintains cluster state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cloud-controller-manager&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Connects to cloud provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kubelet&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Manages node and runs pods&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kube-proxy&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Handles networking and routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CoreDNS&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Resolves internal service names&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the full series and level up your Kubernetes skills. Each post builds on the last, so make sure you haven’t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/kubernetes-tutorial-part1&quot;&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, I laid out the &lt;strong&gt;networking plan&lt;/strong&gt;, my &lt;strong&gt;goals for setting up Kubernetes&lt;/strong&gt;, and how to &lt;strong&gt;prepare a base VM image&lt;/strong&gt; for the cluster.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/tutorial-part2-dns-server-ntp&quot;&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, I walked through &lt;strong&gt;configuring a local DNS server and NTP server&lt;/strong&gt;, essential for stable name resolution and time synchronization across nodes locally. These foundational steps will make our Kubernetes setup smoother.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 3&lt;/a&gt;&lt;/strong&gt;, I finished the Kubernetes cluster setup with Flannel, ending up with one Kubernetes master and 4 worker nodes ready for real workloads.&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;&lt;a href=&quot;/posts/part3-kubernetes-cluster-setup&quot;&gt;Part 4&lt;/a&gt;&lt;/strong&gt;, I explored NodePort and ClusterIP, covering the key differences, use cases, and when to choose each for internal and external service access! 🔥&lt;/p&gt;
&lt;p&gt;🚀 In &lt;strong&gt;Part 5&lt;/strong&gt;, the current one, I dive into &lt;code&gt;ExternalName&lt;/code&gt; and &lt;code&gt;LoadBalancer&lt;/code&gt; services, uncovering how they handle external access, DNS resolution, and dynamic traffic distribution!
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Secure by Design Part 1: STRIDE Threat Modeling Explained</title><link>https://geekcoding101.com/posts/stride-threat-modeling-explained</link><guid isPermaLink="true">https://geekcoding101.com/posts/stride-threat-modeling-explained</guid><pubDate>Mon, 02 Jun 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Intro: Why Every App Needs Threat Modeling And Why STRIDE&lt;/h2&gt;
&lt;p&gt;I’ve been meaning to write this post for a long time. Not because &lt;strong&gt;STRIDE Threat Modeling&lt;/strong&gt; is the hottest buzzword in cybersecurity—it isn’t. And not because threat modeling is some shiny new technique—it’s not. But because &lt;strong&gt;if you’re building or defending any system—especially something as deceptively simple as a chat app—threat modeling is non-negotiable&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;:::info
Check out this cool one-page &lt;a href=&quot;https://www.threatmodelingmanifesto.org/&quot;&gt;Threat Modeling Manifesto&lt;/a&gt;.
:::&lt;/p&gt;
&lt;p&gt;Whether you&apos;re knee-deep in &lt;strong&gt;SecOps&lt;/strong&gt;, defining &lt;strong&gt;IAM policies&lt;/strong&gt;, tuning your &lt;strong&gt;SIEM&lt;/strong&gt;, or crafting &lt;strong&gt;detection logic&lt;/strong&gt;, you’ve got one mission: protect the stuff that matters. That means user data, privacy, service uptime, reputation, and more. And if we don&apos;t design with threats in mind, we&apos;re just building breach bait with good intentions.&lt;/p&gt;
&lt;p&gt;So why STRIDE?&lt;/p&gt;
&lt;p&gt;Because &lt;strong&gt;STRIDE gives us a practical lens to view risk before the attacker does&lt;/strong&gt;. Instead of reacting to CVEs or chasing zero-days, STRIDE helps you think like a malicious actor while you’re still sketching your architecture in a whiteboard session or writing that controller code.&lt;/p&gt;
&lt;p&gt;In this post, I am going to use STRIDE threat modeling to walk through a seemingly simple application—a &lt;strong&gt;chat app&lt;/strong&gt;—and uncover the kinds of security holes that quietly turn into breach reports. You’ll see just how quickly things go sideways when we forget to ask, &lt;em&gt;“What could go wrong here?”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;But first, let&apos;s talk about the app we&apos;re modeling.&lt;/p&gt;
&lt;h2&gt;Our Target: A Chat App&lt;/h2&gt;
&lt;p&gt;Let’s keep it humble. No machine learning, no blockchain, no AI buzzwords glued onto CRUD. Just a &lt;strong&gt;straightforward web-based chat application&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Here’s what it does:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Registration:&lt;/strong&gt; Email + password&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Login System:&lt;/strong&gt; Username/password auth, session cookies&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Directory:&lt;/strong&gt; Displays online users&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1:1 Messaging:&lt;/strong&gt; Users can send and receive messages&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Message History:&lt;/strong&gt; Stored and retrievable&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Admin Panel:&lt;/strong&gt; Hidden route, unknown to regular users&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now, this setup probably feels familiar. It’s the backbone of a thousand hackathons and product MVPs. But here’s the truth: &lt;strong&gt;simple apps are hacker candy&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Why? Because developers often make the same assumptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;“It’s just a prototype.”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;“Who would even try to attack this?”&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;“We’ll add security later.”&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Later never comes&lt;/strong&gt;. And these &quot;low-risk&quot; features? They can become pivot points for privilege escalation, data leaks, or full compromise. One misconfigured route or weak endpoint can become your next Incident Report ticket.&lt;/p&gt;
&lt;p&gt;So before we start breaking things (in Part 2), let’s apply &lt;strong&gt;STRIDE threat modeling&lt;/strong&gt;—a time-tested threat modeling framework from Microsoft—to map out &lt;strong&gt;what could go wrong&lt;/strong&gt; across this app’s lifecycle.&lt;/p&gt;
&lt;p&gt;Next stop: breaking down each of the six STRIDE categories and how they apply to this seemingly innocent app.&lt;/p&gt;
&lt;h2&gt;STRIDE: A Bit of History, Tools, and Fun Facts&lt;/h2&gt;
&lt;p&gt;Before we tear our chat app apart threat by threat, it’s worth pausing to talk about where STRIDE came from—and why it’s still standing strong in today’s security architecture playbook.&lt;/p&gt;
&lt;h3&gt;Where Did STRIDE Threat Modeling Come From?&lt;/h3&gt;
&lt;p&gt;STRIDE was developed by &lt;strong&gt;Microsoft&lt;/strong&gt; in the early 2000s, as part of their &lt;strong&gt;Trustworthy Computing Initiative&lt;/strong&gt;—yes, that era when Windows XP was everyone’s favorite backdoor. 😅&lt;/p&gt;
&lt;p&gt;The goal? Give developers and architects a lightweight, repeatable way to ask:&lt;br /&gt;
&lt;strong&gt;“What could go wrong here?”&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Rather than just tossing in a firewall and calling it a day, STRIDE forced teams to &lt;strong&gt;think in terms of threat categories&lt;/strong&gt;—not just patches and alerts. It came bundled into the &lt;strong&gt;Microsoft SDL (Security Development Lifecycle)&lt;/strong&gt; and has been a part of secure-by-design processes ever since.&lt;/p&gt;
&lt;p&gt;And you know what? It still holds up, especially in a world dominated by microservices, APIs, cloud, and third-party integrations.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;Tools That Support STRIDE Threat Modeling&lt;/h3&gt;
&lt;p&gt;You don’t have to scribble on whiteboards or use napkins (though, respect if you do). Here are a few tools to actually &lt;strong&gt;implement STRIDE modeling&lt;/strong&gt; in your workflows:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Good For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OWASP Threat Dragon&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open-source threat modeling tool with STRIDE templates&lt;/td&gt;
&lt;td&gt;Visual modeling, diagrams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Microsoft Threat Modeling Tool&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free tool from Microsoft for STRIDE-based modeling&lt;/td&gt;
&lt;td&gt;Deep STRIDE threat modeling templates, flow diagrams&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;IriusRisk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Paid tool for automated threat modeling and compliance mapping&lt;/td&gt;
&lt;td&gt;Enterprise threat modeling at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Draw.io + STRIDE Cards&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DIY visual modeling using STRIDE cards&lt;/td&gt;
&lt;td&gt;Lightweight teams, whiteboard replacements&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;blockquote&gt;
&lt;p&gt;Pro Tip: If you already use architecture diagrams in tools like Lucidchart or Miro, &lt;strong&gt;just layer STRIDE annotations on top&lt;/strong&gt;. It’s easier than reinventing the wheel with a new platform.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;STRIDE Breakdown – Mapping Threats to Chat App Features&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;STRIDE&lt;/strong&gt; stands for &lt;strong&gt;Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./STRIDE.jpg&quot; alt=&quot;STRIDE threat modeling matrix&quot; title=&quot;STRIDE threat modeling matrix&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It&apos;s the OG threat modeling framework for secure-by-design thinking.&lt;/p&gt;
&lt;p&gt;For each category, we’ll look at:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;What it means&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Real-world relevance&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How it applies to our chat app&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How to detect it&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;How to prevent or mitigate&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let’s dive in.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3&gt;S – Spoofing Identity&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What It Means&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Spoofing is about pretending to be someone you’re not. Usually, that’s about faking &lt;strong&gt;identity&lt;/strong&gt;—think unauthorized login attempts, session impersonation, or token theft.&lt;/p&gt;
&lt;p&gt;It doesn’t have to be high-tech. A weak password policy or default admin credentials are all it takes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-World Relevance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Credential stuffing from leaked password dumps&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Session hijacking via stolen cookies&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Social engineering leading to unauthorized access&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Chat App Example&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Attacker tries logging in as &lt;code&gt;admin&lt;/code&gt; with common passwords like &lt;code&gt;admin123&lt;/code&gt; or &lt;code&gt;password&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Registration page reveals whether a username/email is already taken (“User already exists”) → helps confirm valid accounts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Session cookie doesn’t use &lt;code&gt;HttpOnly&lt;/code&gt; or &lt;code&gt;Secure&lt;/code&gt; flags → attacker injects JS and steals session.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Detect&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Unusual login attempts across many usernames&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Brute-force behavior from single IPs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Session reuse across different IPs/devices&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Prevent&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Enforce strong passwords and rate-limiting&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use MFA (seriously, just do it)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Harden sessions: &lt;code&gt;HttpOnly&lt;/code&gt;, &lt;code&gt;Secure&lt;/code&gt;, &lt;code&gt;SameSite=Strict&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Generic error messages (“Login failed”) to prevent enumeration&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alert on login anomalies (e.g., geolocation or timing mismatches)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
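&lt;p&gt;The session-hardening item above, sketched with the Python standard library (the cookie name and token are illustrative; in a web framework you would pass the same flags to its set-cookie helper):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from http.cookies import SimpleCookie

def session_cookie_header(token):
    # HttpOnly: not readable by JS; Secure: HTTPS only;
    # SameSite=Strict: never sent on cross-site requests.
    cookie = SimpleCookie()
    cookie[&quot;session&quot;] = token
    cookie[&quot;session&quot;][&quot;httponly&quot;] = True
    cookie[&quot;session&quot;][&quot;secure&quot;] = True
    cookie[&quot;session&quot;][&quot;samesite&quot;] = &quot;Strict&quot;
    return cookie[&quot;session&quot;].OutputString()

print(session_cookie_header(&quot;abc123&quot;))
&lt;/code&gt;&lt;/pre&gt;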
&lt;hr /&gt;
&lt;h3&gt;T – Tampering with Data&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What It Means&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tampering is about &lt;strong&gt;unauthorized modification of data&lt;/strong&gt;—altering messages, modifying user roles, or injecting parameters to mess with system behavior.&lt;/p&gt;
&lt;p&gt;This could be at-rest (modifying DB records), in-transit (MITM), or even through insecure APIs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-World Relevance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Changing prices on e-commerce sites&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Modifying permissions via API injection&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Overwriting user data via insecure endpoints&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Chat App Example&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;User crafts a &lt;code&gt;PUT /api/messages/1234&lt;/code&gt; call to edit &lt;strong&gt;someone else’s&lt;/strong&gt; message&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sends chat message with embedded &lt;code&gt;&amp;lt;script&amp;gt;&lt;/code&gt; tag to execute JS on recipient’s browser&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Manually edits session data in localStorage to escalate role from &lt;code&gt;user&lt;/code&gt; to &lt;code&gt;admin&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Detect&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Unexpected mutations in data logs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sudden role changes or message edits by unauthorized users&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Parameter tampering attempts in logs (via WAF or API gateway)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Prevent&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Use digital signatures or hash checks for message integrity (e.g., HMAC)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Implement strict authorization checks at the &lt;strong&gt;server&lt;/strong&gt;, not just UI&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Sanitize inputs (yes, again—this never gets old)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Disable client-side trust for roles or permissions&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
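&lt;p&gt;As a sketch of the HMAC idea: the server tags each message with a keyed hash, and any modified body fails verification. The secret below is a placeholder; a real system loads it from a secret store.&lt;/p&gt;

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # placeholder key, never hardcode in practice

def sign_message(body):
    """Return a hex HMAC tag for the message body."""
    return hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()

def verify_message(body, tag):
    """Constant-time check that the body was not tampered with."""
    return hmac.compare_digest(sign_message(body), tag)

tag = sign_message("hello from alice")
assert verify_message("hello from alice", tag)
assert not verify_message("hello from mallory", tag)  # tampered body fails
```

&lt;p&gt;Note the use of compare_digest rather than ==, which avoids leaking information through timing differences.&lt;/p&gt;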
&lt;hr /&gt;
&lt;h3&gt;R – Repudiation&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What It Means&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Repudiation is when an attacker performs actions &lt;strong&gt;without accountability&lt;/strong&gt;—then denies them. If the system doesn’t log properly, they get away clean.&lt;/p&gt;
&lt;p&gt;It’s like someone deleting all your Slack messages and saying, “Wasn’t me.”&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-World Relevance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Lack of logs in cloud misconfigurations&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Insider threats covering their tracks&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Attackers disabling or deleting logs post-compromise&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Chat App Example&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;User deletes messages with no audit trail—no record of who said what&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Admin bans a user but there’s no timestamp or log of that action&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A rogue employee reads DMs and no one knows because access wasn&apos;t logged&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Detect&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;You can’t… unless you already had good logging in place&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Look for missing data in activity logs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use external systems (like SIEM) to detect deletions or gaps&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Prevent&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Immutable logging (e.g., append-only logs with checksum verification)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Log all sensitive actions: logins, deletions, permission changes&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Store logs off-host (e.g., centralized logging with ELK or Loki)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use timestamping + user context in every log event&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
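&lt;p&gt;One way to approximate append-only logging without special infrastructure is a hash chain, where each entry commits to the previous one, so deleting or editing any entry breaks verification. A minimal sketch (the AuditLog class is hypothetical, not a real library):&lt;/p&gt;

```python
import hashlib
import json

class AuditLog:
    """Append-only log: each entry hashes the previous entry's hash,
    so tampering anywhere in the chain is detectable."""
    def __init__(self):
        self.entries = []
        self.prev_hash = "0" * 64

    def append(self, actor, action):
        entry = {"actor": actor, "action": action, "prev": self.prev_hash}
        digest = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self.prev_hash = digest
        self.entries.append(entry)

    def verify(self):
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev or digest != entry["hash"]:
                return False
            prev = digest
        return True

log = AuditLog()
log.append("alice", "deleted message 42")
log.append("admin", "banned user bob")
assert log.verify()
log.entries[0]["action"] = "nothing happened"  # tampering breaks the chain
assert not log.verify()
```

&lt;p&gt;In production you would still ship these entries off-host, but the chain gives you cheap tamper evidence on top.&lt;/p&gt;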
&lt;hr /&gt;
&lt;h3&gt;I – Information Disclosure&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What It Means&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is about &lt;strong&gt;leaking data&lt;/strong&gt; to unauthorized users. Could be PII, secrets, internal APIs, or even error messages that give away the goods.&lt;/p&gt;
&lt;p&gt;It doesn’t need to be a SQL injection. Sometimes it’s just poorly scoped permissions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-World Relevance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Exposed S3 buckets&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Leaky APIs showing internal user info&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stack traces returned in production&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Chat App Example&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;User accesses &lt;code&gt;/api/messages?id=4001&lt;/code&gt; and gets &lt;strong&gt;another user’s message&lt;/strong&gt; because there&apos;s no ownership check&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;API returns full user records including email and IPs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Error page reveals server paths or tech stack via verbose stack trace&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Detect&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Data leak detection in outbound logs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;DLP (Data Loss Prevention) tools for sensitive data patterns&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Review of access control on all endpoints and APIs&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Prevent&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Apply object-level access controls (don’t trust “just the route”)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Strip metadata from responses&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mask sensitive data (e.g., show part of an email, not all)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Disable detailed errors in production&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
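&lt;p&gt;Masking sensitive data can be as simple as a helper like this sketch; the exact policy (how many characters to reveal) is up to you:&lt;/p&gt;

```python
def mask_email(email):
    """Show just enough of an email to be recognizable, hide the rest."""
    local, _, domain = email.partition("@")
    visible = local[:1]
    hidden = "*" * max(len(local) - 1, 1)
    return visible + hidden + "@" + domain

print(mask_email("alice@example.com"))  # prints a****@example.com
```
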
&lt;hr /&gt;
&lt;h3&gt;D – Denial of Service (DoS)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What It Means&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;DoS means making a system &lt;strong&gt;unavailable or unusable&lt;/strong&gt;—intentionally or accidentally—usually by overwhelming it.&lt;/p&gt;
&lt;p&gt;This isn’t just about traffic floods. It includes logic bombs, resource exhaustion, and malformed input that crashes the app.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-World Relevance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Spamming forms or chat endpoints&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flooding chat with emojis or large payloads&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Abuse of nested JSON to crash parsers&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Chat App Example&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Bot sends thousands of messages per second → server CPU maxes out&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Large message payloads (10MB+ text blobs) crash DB or front-end&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Abuse of emoji reactions to spam notifications&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Detect&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Rate spikes on endpoints (monitor RPS/latency)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alerts for memory, CPU, or queue overflows&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;App crashes tied to malformed input&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Prevent&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Rate limiting per IP/token/user (e.g., using Redis buckets)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Set max body size on requests&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Queue-based processing (isolate spikes from core logic)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;CAPTCHAs on forms and registration&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
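&lt;p&gt;The rate-limiting idea maps naturally to a token bucket. This in-memory sketch stands in for the Redis-backed version mentioned above; the names are illustrative:&lt;/p&gt;

```python
import time

class TokenBucket:
    """Per-client bucket: holds up to `capacity` tokens, refilled at
    `rate` tokens per second; each request consumes one token."""
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if int(self.tokens) == 0:
            return False  # no whole token available: reject the request
        self.tokens -= 1.0
        return True

bucket = TokenBucket(capacity=3, rate=1.0)
results = [bucket.allow() for _ in range(5)]
print(results)
```

&lt;p&gt;In a real deployment, keep the bucket state in shared storage (e.g., Redis) so every app instance enforces the same limit per IP, token, or user.&lt;/p&gt;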
&lt;hr /&gt;
&lt;h3&gt;E – Elevation of Privilege&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What It Means&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This one’s the crown jewel of attacks. EoP is when a normal user gains &lt;strong&gt;higher privileges&lt;/strong&gt;—like becoming an admin, impersonating other users, or accessing restricted areas.&lt;/p&gt;
&lt;p&gt;This often comes from &lt;strong&gt;missing authorization checks&lt;/strong&gt; or client-side trust.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-World Relevance&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;IDOR (Insecure Direct Object Reference)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hidden admin features discovered by poking URLs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;JWT manipulation (changing &lt;code&gt;role: user&lt;/code&gt; → &lt;code&gt;role: admin&lt;/code&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Chat App Example&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Regular user discovers &lt;code&gt;/admin/users&lt;/code&gt; route and sees admin dashboard&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;API lets any authenticated user call &lt;code&gt;DELETE /users/{id}&lt;/code&gt; without role check&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;JWT token is unsigned or uses symmetric secret → attacker creates valid “admin” token&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Detect&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Auth bypass attempts in logs&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use of admin-only routes by regular users&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Role mismatches between session claims and observed behavior&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to Prevent&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Enforce role-based access on the backend (never rely on frontend auth)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Use signed JWTs with asymmetric signing algorithms (prefer RS256 over HS256)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Scope tokens tightly (expiration, audience, permissions)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Always fail securely—deny access by default&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
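&lt;p&gt;A deny-by-default role check on the backend can be as small as a decorator. This sketch is illustrative only; the ROLES store and Forbidden error are hypothetical stand-ins for your session layer:&lt;/p&gt;

```python
import functools

ROLES = {"alice": "admin", "bob": "user"}  # stand-in for a server-side session store

class Forbidden(Exception):
    pass

def require_role(*allowed):
    """Server-side check: deny by default unless the caller's role is allowed."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(caller, *args, **kwargs):
            role = ROLES.get(caller)  # unknown users get role None and are denied
            if role not in allowed:
                raise Forbidden(f"{caller!r} lacks any of {allowed}")
            return fn(caller, *args, **kwargs)
        return wrapper
    return decorator

@require_role("admin")
def delete_user(caller, target_id):
    return f"deleted user {target_id}"

print(delete_user("alice", 42))  # prints deleted user 42
```

&lt;p&gt;The key property is that an unknown or unlisted role falls through to denial; there is no branch that grants access by accident.&lt;/p&gt;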
&lt;hr /&gt;
&lt;h2&gt;Interesting STRIDE Facts (Because Nerding Out Is Fun)&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mnemonic Origins:&lt;/strong&gt; STRIDE threat modeling is a &lt;strong&gt;backronym&lt;/strong&gt;—it was created to map common threat types to the core properties of secure systems (authentication, integrity, non-repudiation, confidentiality, availability, and authorization).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&quot;D&quot; Is Sneaky:&lt;/strong&gt; Denial of Service in STRIDE isn’t always massive traffic floods. It includes logic bombs and resource starvation too. Your app doesn’t have to go down in flames to be considered under DoS threat.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;STRIDE Is Not a Checklist:&lt;/strong&gt; It’s a &lt;strong&gt;thinking framework&lt;/strong&gt;, not a compliance sheet. The real power is in using it to uncover flaws in the design &lt;em&gt;before&lt;/em&gt; they hit production.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;STRIDE + DFD = 🔥&lt;/strong&gt;: It’s most effective when paired with &lt;strong&gt;data flow diagrams&lt;/strong&gt; (DFDs). You model how data flows through your app, then apply STRIDE to each element (data store, process, external entity, etc.).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;STRIDE’s Secret Superpower&lt;/h2&gt;
&lt;p&gt;Most threat modeling frameworks require heavy lifting or lots of training. STRIDE hits that sweet spot: &lt;strong&gt;easy enough for a dev to use, powerful enough for a security architect to trust&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The beauty? You can use it on &lt;strong&gt;anything&lt;/strong&gt;—from a serverless app to a Kubernetes cluster to, yep, our friendly chat app.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;If you’re building features faster than you’re threat modeling, you’re building features that might become attack surfaces. STRIDE slows you down just enough to build wisely.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Final Thoughts on STRIDE Threat Modeling&lt;/h2&gt;
&lt;p&gt;STRIDE threat modeling isn’t just academic. It’s a conversation starter. A design reviewer. A build-time bodyguard.&lt;/p&gt;
&lt;p&gt;Every time you launch a new feature or review a pull request, ask:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;S&lt;/strong&gt; – Could someone fake an identity here?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;T&lt;/strong&gt; – Can they change something they shouldn’t?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;R&lt;/strong&gt; – Will we know who did what?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;I&lt;/strong&gt; – Are we leaking anything useful?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;D&lt;/strong&gt; – Can someone take this down with a hammer?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;E&lt;/strong&gt; – What happens if a normal user pushes the limits?&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Apply this mindset to each part of your system—&lt;strong&gt;auth&lt;/strong&gt;, &lt;strong&gt;storage&lt;/strong&gt;, &lt;strong&gt;API&lt;/strong&gt;, &lt;strong&gt;UI&lt;/strong&gt;, &lt;strong&gt;admin tools&lt;/strong&gt;, and even &lt;strong&gt;logs&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Your future self (and your customers) will thank you.&lt;/p&gt;
&lt;p&gt;More references:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://owasp.org/www-community/Threat_Modeling_Process&quot;&gt;OWASP Threat Modeling Process&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;:::info
Like the post? You&apos;re welcome to check out my other posts:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;/posts/a-deep-dive-into-http-basic-authentication&quot;&gt;A Deep Dive into HTTP Basic Authentication&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;/posts/password-authentication-in-node-js-a-step-by-step-guide&quot;&gt;Password Authentication in Node.js: A Step-by-Step Guide&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;/posts/unlocking-web-security-master-jwt-authentication&quot;&gt;Unlocking Web Security: Master JWT Authentication&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;/tags/cybersec&quot;&gt;CyberSecurity&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Agentic Frameworks: A Quick Guide to the 2025 Agent War</title><link>https://geekcoding101.com/posts/a-quick-guide</link><guid isPermaLink="true">https://geekcoding101.com/posts/a-quick-guide</guid><pubDate>Fri, 14 Nov 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Introduction to the Agentic Framework Series&lt;/h1&gt;
&lt;p&gt;Lately it feels like the world of AI moves so fast that if you blink, you miss an entire generation of breakthroughs. As someone who &lt;em&gt;loves&lt;/em&gt; digging into emerging technologies, I finally gathered the courage to steal a little time from a busy schedule and kick off a series I’ve been wanting to write for ages. And what better place to start than the booming world of &lt;strong&gt;agentic frameworks&lt;/strong&gt;? With LangGraph, LlamaIndex Agents, OpenAI’s Agents SDK, Google’s ADK, and Microsoft’s Agent Framework all evolving at lightning speed, the &lt;strong&gt;agentic AI&lt;/strong&gt; ecosystem is turning into a full-on 2025 “Agent War.” Since I’m constantly tracking these updates anyway, I figured—why not share the journey and explore this rapidly shifting landscape together?&lt;/p&gt;
&lt;h2&gt;&lt;strong&gt;What Is an Agentic Framework?&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;An &lt;strong&gt;agentic framework&lt;/strong&gt; is a software toolkit designed to help developers build AI agents—systems that can reason, take actions, use tools, and complete multi-step tasks with a level of autonomy. Instead of treating an LLM as a “single prompt in, single answer out” model, an agentic framework gives it structure: memory, tools, workflows, decision loops, and the ability to interact with data or external systems. Frameworks like LangGraph, LlamaIndex Agents, OpenAI’s Agent SDK, and Google’s ADK make it possible to create agents that can research, retrieve information, write code, operate APIs, and even coordinate with other agents. In short, an agentic framework transforms a passive model into an active problem-solver.&lt;/p&gt;
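&lt;p&gt;To make the decision-loop idea concrete, here is a toy sketch of the reason-act cycle. It is not any framework&apos;s API; the decide() function stands in for the LLM call that real frameworks orchestrate, and the calculator tool is hypothetical:&lt;/p&gt;

```python
def calculator(expression):
    # Demo tool only; never eval untrusted input in real code.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def decide(task, observation):
    """Stand-in for the LLM: pick a tool, or finish once we have a result."""
    if observation is None:
        return ("call", "calculator", task)
    return ("finish", "The answer is " + observation)

def run_agent(task):
    observation = None
    for _ in range(5):  # decision loop with a step budget
        step = decide(task, observation)
        if step[0] == "finish":
            return step[1]
        _, tool_name, tool_input = step
        observation = TOOLS[tool_name](tool_input)

print(run_agent("6 * 7"))  # prints The answer is 42
```

&lt;p&gt;Everything the frameworks below add (memory, handoffs, guardrails, persistence) is layered around a loop shaped roughly like this one.&lt;/p&gt;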
&lt;p&gt;BTW, &lt;strong&gt;agentic frameworks&lt;/strong&gt; are often called &lt;em&gt;AI agent frameworks&lt;/em&gt; as well.&lt;/p&gt;
&lt;p&gt;Enough talk — let’s get to it!&lt;/p&gt;
&lt;h1&gt;Let&apos;s Summon Popular Agentic Frameworks&lt;/h1&gt;
&lt;h2&gt;LangChain + LangGraph&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
Probably still the most widely recognized OSS agent stack; LangChain for the agent loop and tools, LangGraph for durable, stateful workflows. Link: &lt;a href=&quot;https://blog.langchain.com/langchain-langgraph-1dot0/?utm_source=chatgpt.com&quot;&gt;LangChain Blog&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status (late 2025):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;LangChain and LangGraph both hit v1.0 with a tightened “core agent loop” and a middleware system for flexible control.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;LangGraph adds graph-structured workflows, persistence, debugging, and visual tools for complex agent interactions and long-running processes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
General-purpose agent apps: RAG copilots, workflow agents, multi-step tool use, where you want a large ecosystem and lots of examples.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;LlamaIndex (LlamaAgents &amp;amp; Workflows)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
Started as a data/RAG library, now a full “data-centric agent framework” with strong connectors, parsing (LlamaParse), and high-quality RAG pipelines. Link: &lt;a href=&quot;https://www.llamaindex.ai/workflows?utm_source=chatgpt.com&quot;&gt;LlamaIndex Workflows&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;LlamaAgents early-access: full-stack templates for building agents, including TypeScript workflows and CopilotKit integrations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;2025 comparisons show improved retrieval quality and strong performance for document-heavy workloads.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
Knowledge-intensive agents: enterprise search copilots, contract analysis, tech docs assistants, any system where the “data plane” is the hard part.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;OpenAI Agents SDK (successor to Swarm)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
OpenAI’s first-party agentic framework for building agents over their Responses API: a minimal set of primitives (agents, handoffs, guardrails, sessions) with tight GPT integration. Link: &lt;a href=&quot;https://openai.github.io/openai-agents-python/?utm_source=chatgpt.com&quot;&gt;OpenAI Agents SDK&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Designed as the production-ready successor to the earlier “Swarm” multi-agent experiment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Integrated with OpenAI’s new Responses API (web search, computer use, document search), replacing the older Assistants API over 2025-26.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
Teams already standardized on OpenAI: quick path to agents with web search, tools, and multi-agent delegation without heavy orchestration code.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Microsoft Agent Framework (AutoGen + Semantic Kernel)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
Microsoft is merging AutoGen’s multi-agent orchestration with Semantic Kernel’s enterprise integration into a single “Microsoft Agent Framework” for Python and .NET. Link: &lt;a href=&quot;https://github.com/microsoft/agent-framework?utm_source=chatgpt.com&quot;&gt;Microsoft Agent Framework&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;AutoGen is now in maintenance mode; new feature development is happening in Microsoft Agent Framework.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Provides agents, planners, and orchestration with hooks into Azure/OpenAI, Office 365, and other Microsoft services.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
Enterprise .NET/Python shops on Azure that want multi-agent workflows tied into existing Microsoft infra, identity, and DevOps.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Google Agent Development Kit (ADK) / Vertex AI Agent Builder&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
Google’s open-source Agent Development Kit plus Vertex AI Agent Builder: ADK for local/dev usage, Agent Builder for managed, scalable deployment. Link: &lt;a href=&quot;https://cloud.google.com/products/agent-builder?utm_source=chatgpt.com&quot;&gt;Google Agent Builder&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;ADK announced in 2025 as a modular, model-agnostic agentic framework (though optimized for Gemini and Google Cloud).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Recent updates add prebuilt plugins (including “self-heal”), more language support (Go, Python, Java), and richer observability &amp;amp; security for production agents.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
GCP-centric teams: data &amp;amp; MLOps agents (BigQuery, Dataflow, etc.), multi-system enterprise agents, and workloads that need Vertex AI governance features.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;CrewAI&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
A popular OSS “multi-agent crew” agentic framework: define specialized agents (roles), share context, and let them collaborate on tasks. Link: &lt;a href=&quot;https://www.crewai.com/?utm_source=chatgpt.com&quot;&gt;CrewAI&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;:::info
Sneak peek — the next episode will likely be a hands-on dive into CrewAI!
:::&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Active development and strong marketing as a “multi-agent platform” with business-oriented workflows.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Frequently cited (alongside LangChain &amp;amp; AutoGen) as a top choice in 2025 industry roundups because of its clean Python API and real-world focus.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
Multi-agent experiments and startup-style stacks: e.g., “researcher + planner + executor” teams for content, lead-gen, or coding tasks.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Haystack&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
RAG + agentic framework aimed squarely at production: modular pipelines and “agents” that can call tools, retrieve data, and generate answers. Link: &lt;a href=&quot;https://docs.haystack.deepset.ai/docs/intro?utm_source=chatgpt.com&quot;&gt;Haystack&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Marketed specifically for “agentic, compound AI systems” with end-to-end observability and debugging.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Supports agents that choose between web search, vector stores, and other tools to resolve complex queries.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
Teams that want a transparent, production-grade RAG/agent stack with strong search roots (Elastic/OpenSearch, vector DBs) and clear pipelines.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;OpenHands Software Agent SDK (software-dev agents)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
A toolkit spun out of the popular OpenHands code-assistant framework, specifically for reliable software development agents (coding, debugging, PRs). Link: &lt;a href=&quot;https://arxiv.org/abs/2511.03690?utm_source=chatgpt.com&quot;&gt;OpenHands Software Agent SDK&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Latest status:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2025 paper describes a redesigned SDK for flexible, secure software agents: sandboxed execution, multi-LLM routing, and integration with editors (VS Code, browser, CLI).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
Engineering teams wanting &lt;em&gt;production&lt;/em&gt; code agents (e.g., SWE-Bench style tasks) with strong execution sandboxes and lifecycle controls.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Research / Training-oriented frameworks (Agent Lightning, etc.)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Positioning:&lt;/strong&gt;&lt;br /&gt;
These are less “app frameworks” and more &lt;em&gt;training&lt;/em&gt; stacks, but relevant for anyone planning RL-fine-tuned agents. The paper link is &lt;a href=&quot;https://arxiv.org/abs/2511.03690?utm_source=chatgpt.com&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Agent Lightning&lt;/strong&gt;: RL training framework that decouples training from agent execution, plugging into existing agent stacks like LangChain, AutoGen, OpenAI Agents SDK with minimal changes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;br /&gt;
R&amp;amp;D teams working on agent RL, evaluation, and fine-tuning rather than pure orchestration.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Dominant Use Cases&lt;/h2&gt;
&lt;p&gt;Across vendors and OSS, you see a few consistent patterns:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;RAG copilots / knowledge assistants&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Internal “Chat with docs” for policies, support docs, codebases (LangChain, LlamaIndex, Haystack).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Industry: customer support, legal analysis, technical documentation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Agentic Process Automation (APA) / workflow agents&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Multi-step agents that call APIs, write files, trigger workflows, etc. (LangGraph, CrewAI, OpenAI Agents SDK, MS Agent Framework, Google ADK). Link: &lt;a href=&quot;https://www.ampcome.com/post/top-7-ai-agent-frameworks-in-2025&quot;&gt;Top 7 AI Agentic Frameworks in 2025: The Ultimate Guide.&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Example: end-to-end lead processing, back-office ops, report generation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Software engineering agents&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Coding, refactoring, test-generation, PR review (OpenHands SDK; OpenAI computer-use agents; some AutoGen/CrewAI patterns).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data &amp;amp; analytics agents&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Data engineering/data science agents in the cloud (&lt;a href=&quot;https://www.androidcentral.com/apps-software/ai/google-cloud-is-adding-six-new-ai-agents-for-devs-scientists-and-power-users?utm_source=chatgpt.com&quot;&gt;Google’s Data Engineering Agent, Data Science Agent; ADK/Vertex AI&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SQL-query agents, interactive dashboards, ETL automation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ops / infra &amp;amp; enterprise workflows&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cloud management, monitoring, and remediation (&lt;a href=&quot;https://www.androidcentral.com/apps-software/ai/google-cloud-is-adding-six-new-ai-agents-for-devs-scientists-and-power-users?utm_source=chatgpt.com&quot;&gt;GCP’s new ops agents, Azure/AKS workflows, security &amp;amp; governance hooks&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;hr /&gt;
&lt;h1&gt;What Is the Best AI Agent Framework?!&lt;/h1&gt;
&lt;p&gt;Haha! I know you will ask this question!&lt;/p&gt;
&lt;p&gt;Choosing the best AI agent framework isn’t as simple as crowning a single winner—because the “best” depends entirely on what you’re building. &lt;strong&gt;LangGraph&lt;/strong&gt; excels at complex, stateful workflows with fine-grained control. &lt;strong&gt;LlamaIndex&lt;/strong&gt; dominates when your agent needs powerful data retrieval and document intelligence. &lt;strong&gt;OpenAI’s Agents SDK&lt;/strong&gt; is unbeatable for rapid development with built-in web search, tool use, and multi-agent orchestration. &lt;strong&gt;Google’s ADK&lt;/strong&gt; shines in enterprise environments that lean heavily on GCP data pipelines. And &lt;strong&gt;Microsoft’s Agent Framework&lt;/strong&gt; integrates seamlessly with Azure and the broader Microsoft ecosystem. Instead of looking for a universal champion, the smarter question is: &lt;em&gt;Which agentic framework aligns with your stack, your data, and your use case?&lt;/em&gt; Because in 2025’s Agent War, &quot;context&quot;—not hype—decides the winner.&lt;/p&gt;
&lt;h1&gt;Where Can We Go Deeper Next?&lt;/h1&gt;
&lt;p&gt;We’ve only scratched the surface of what the agentic ecosystem is becoming. From production-ready orchestration with LangGraph, to data-centric workflows in LlamaIndex, to OpenAI’s emerging Agent SDK and Google’s ADK reshaping enterprise automation—the real excitement starts when we dig into how these agentic frameworks actually behave in the wild.&lt;/p&gt;
&lt;p&gt;In the next episodes, I&apos;d like to break down architectures, explore real-world use cases, and walk through hands-on builds you can follow step by step. If the 2025 “Agent War” (or “Agentic Framework War”, whatever we end up calling it) has sparked your curiosity, stick around—this is just the beginning, and the most interesting battles are still ahead.&lt;/p&gt;
&lt;p&gt;:::info
You&apos;re on a roll! Don&apos;t stop now—check out the other series and level up your AI skills. Make sure you haven&apos;t missed anything! 👇&lt;/p&gt;
&lt;p&gt;🚀 &lt;a href=&quot;/tags/daily-ai-insights&quot;&gt;Daily AI Insights&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;🚀 &lt;a href=&quot;/tags/machine-learning&quot;&gt;Machine Learning&lt;/a&gt;
:::&lt;/p&gt;
</content:encoded><author>GeekCoding101</author></item><item><title>Understanding OpenDAL Storage in Dify: A New Year&apos;s Journey</title><link>https://geekcoding101.com/posts/dify-01-opendal-storage-configuration</link><guid isPermaLink="true">https://geekcoding101.com/posts/dify-01-opendal-storage-configuration</guid><pubDate>Thu, 01 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;Understanding OpenDAL Storage in Dify: A New Year&apos;s Journey&lt;/h1&gt;
&lt;h2&gt;My Story&lt;/h2&gt;
&lt;p&gt;Happy New Year! 🎉&lt;/p&gt;
&lt;p&gt;As 2026 kicks off, I continue to dive deeper into the AI ecosystem and experiment with various tools. My focus? Building practical AI workflows with platforms like &lt;a href=&quot;https://dify.ai&quot;&gt;Dify&lt;/a&gt;, &lt;a href=&quot;https://n8n.io&quot;&gt;n8n&lt;/a&gt;, &lt;a href=&quot;https://openrouter.ai&quot;&gt;OpenRouter&lt;/a&gt; and so on. This post documents one of my first deep dives with Dify - figuring out how its file storage actually works.&lt;/p&gt;
&lt;p&gt;If you&apos;re like me and got confused about where &lt;a href=&quot;https://dify.ai&quot;&gt;Dify&lt;/a&gt; stores files, why containers keep restarting with cryptic permission errors, or what the heck &lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt; actually does, this guide is for you.&lt;/p&gt;
&lt;h2&gt;What is This About?&lt;/h2&gt;
&lt;p&gt;I run self-hosted &lt;a href=&quot;https://dify.ai&quot;&gt;Dify&lt;/a&gt; with Docker. I found that its file storage configuration is not well documented, so I decided to write this post to help others understand how it works.&lt;/p&gt;
&lt;p&gt;When you&apos;re running &lt;a href=&quot;https://dify.ai&quot;&gt;Dify&lt;/a&gt; with Docker, understanding how file storage works is crucial. Files uploaded to &lt;a href=&quot;https://dify.ai&quot;&gt;Dify&lt;/a&gt; (documents, images, etc.) need to be stored somewhere, and that &quot;somewhere&quot; involves a dance between environment variables, Docker volume mounts, and a library called &lt;a href=&quot;https://github.com/apache/opendal&quot;&gt;OpenDAL&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let me break down what I learned the hard way.&lt;/p&gt;
&lt;h2&gt;The Key Players&lt;/h2&gt;
&lt;h3&gt;1. OpenDAL - The Unsung Hero&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;a href=&quot;https://github.com/apache/opendal&quot;&gt;OpenDAL&lt;/a&gt;&lt;/strong&gt; (Apache Open Data Access Layer) is basically a Swiss Army knife for storage. It gives you one consistent API to talk to different storage backends - local filesystem, AWS S3, Azure Blob, you name it. Think of it as a translator that speaks &quot;storage&quot; in many dialects.&lt;/p&gt;
&lt;h3&gt;2. Environment Variables&lt;/h3&gt;
&lt;p&gt;Your &lt;code&gt;.env&lt;/code&gt; file is where the magic configuration happens. This is where you tell Dify how and where to store files.&lt;/p&gt;
&lt;h3&gt;3. Docker Volume Mounts&lt;/h3&gt;
&lt;p&gt;This is the bridge between your Mac (or whatever host you&apos;re on) and the Docker container&apos;s internal filesystem. Get this wrong, and you&apos;ll be scratching your head for hours. Trust me, I know.&lt;/p&gt;
&lt;h2&gt;How The Pieces Fit Together&lt;/h2&gt;
&lt;h3&gt;Step 1: The Environment Variable Magic&lt;/h3&gt;
&lt;p&gt;When you set &lt;code&gt;OPENDAL_SCHEME=fs&lt;/code&gt;, the system starts looking for variables that match this pattern:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;OPENDAL_&amp;lt;SCHEME_NAME&amp;gt;_&amp;lt;CONFIG_NAME&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So for filesystem storage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;OPENDAL_SCHEME=fs&lt;/code&gt; → tells it to use local filesystem&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OPENDAL_FS_ROOT=&amp;lt;path&amp;gt;&lt;/code&gt; → tells it where to put files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Simple enough, right? Well, here&apos;s where it gets interesting...&lt;/p&gt;
&lt;h3&gt;Step 2: What Happens Inside the Container&lt;/h3&gt;
&lt;p&gt;I dove into the source code (&lt;code&gt;api/extensions/storage/opendal_storage.py&lt;/code&gt;) and found this gem:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def _get_opendal_kwargs(*, scheme: str, env_file_path: str = &quot;.env&quot;, prefix: str = &quot;OPENDAL_&quot;):
    kwargs = {}
    config_prefix = prefix + scheme.upper() + &quot;_&quot;  # Creates &quot;OPENDAL_FS_&quot;
    
    # Scans environment variables
    for key, value in os.environ.items():
        if key.startswith(config_prefix):
            kwargs[key[len(config_prefix):].lower()] = value
    # OPENDAL_FS_ROOT becomes kwargs[&apos;root&apos;]
    return kwargs
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then in the OpenDALStorage constructor:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def __init__(self, scheme: str, **kwargs):
    kwargs = kwargs or _get_opendal_kwargs(scheme=scheme)
    
    if scheme == &quot;fs&quot;:
        root = kwargs.get(&quot;root&quot;, &quot;storage&quot;)  # Gets OPENDAL_FS_ROOT value
        Path(root).mkdir(parents=True, exist_ok=True)  # Creates directory inside container
&lt;/code&gt;&lt;/pre&gt;
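&lt;p&gt;Putting those two snippets together, the whole mechanism fits in a few lines. Here&apos;s a minimal, runnable sketch (my own simplification, not the exact Dify code) showing how &lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt; ends up as the &lt;code&gt;root&lt;/code&gt; kwarg:&lt;/p&gt;

```python
import os

def get_opendal_kwargs(scheme, prefix="OPENDAL_"):
    # Simplified re-creation of Dify's _get_opendal_kwargs:
    # collect every OPENDAL_{SCHEME}_* variable and strip the prefix
    config_prefix = prefix + scheme.upper() + "_"   # e.g. "OPENDAL_FS_"
    return {
        key[len(config_prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(config_prefix)
    }

# Simulate the container environment from the default .env
os.environ["OPENDAL_SCHEME"] = "fs"
os.environ["OPENDAL_FS_ROOT"] = "/app/api/storage"

kwargs = get_opendal_kwargs(scheme=os.environ["OPENDAL_SCHEME"])
print(kwargs)  # {'root': '/app/api/storage'}
```

&lt;p&gt;Note that &lt;code&gt;OPENDAL_SCHEME&lt;/code&gt; itself doesn&apos;t match the &lt;code&gt;OPENDAL_FS_&lt;/code&gt; prefix, so only &lt;code&gt;root&lt;/code&gt; lands in the kwargs.&lt;/p&gt;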
&lt;h3&gt;Step 3: The Critical Connection - Volume Mounts&lt;/h3&gt;
&lt;p&gt;Here&apos;s what tripped me up initially. In &lt;code&gt;docker-compose.yaml&lt;/code&gt;, you&apos;ll see:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;volumes:
  - ./volumes/app/storage:/app/api/storage
    # ^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^
    # Your Mac              Inside container
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The &quot;Aha!&quot; Moment:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Left side&lt;/strong&gt; (&lt;code&gt;./volumes/app/storage&lt;/code&gt;): This is a folder on &lt;strong&gt;your actual Mac&lt;/strong&gt; (relative to where docker-compose.yaml lives)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Right side&lt;/strong&gt; (&lt;code&gt;/app/api/storage&lt;/code&gt;): This is a folder &lt;strong&gt;inside the Docker container&lt;/strong&gt; (a completely separate filesystem)&lt;/li&gt;
&lt;li&gt;Docker magically keeps these two folders in sync&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So when Dify writes a file to &lt;code&gt;/app/api/storage&lt;/code&gt; inside the container, it appears in &lt;code&gt;./volumes/app/storage&lt;/code&gt; on your Mac. Mind = blown. 🤯&lt;/p&gt;
&lt;h2&gt;Real-World Examples&lt;/h2&gt;
&lt;h3&gt;The Default Setup (What Works Out of the Box)&lt;/h3&gt;
&lt;p&gt;This is what Dify gives you by default, and honestly, it works great:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# .env
STORAGE_TYPE=opendal
OPENDAL_SCHEME=fs
OPENDAL_FS_ROOT=/app/api/storage  # Container path - must match volume mount
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;⚠️ Deprecated Configuration (Do Not Use)&lt;/h3&gt;
&lt;p&gt;The following configuration is &lt;strong&gt;deprecated&lt;/strong&gt; and should be migrated to OpenDAL:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# DEPRECATED - Old approach
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=storage
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Why it&apos;s deprecated:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;STORAGE_TYPE=local&lt;/code&gt; is marked as deprecated in the codebase&lt;/li&gt;
&lt;li&gt;&lt;code&gt;STORAGE_LOCAL_PATH&lt;/code&gt; is deprecated in favor of OpenDAL&apos;s configuration&lt;/li&gt;
&lt;li&gt;OpenDAL provides a unified interface that supports multiple storage backends&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Migration path:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Before (Deprecated)
STORAGE_TYPE=local
STORAGE_LOCAL_PATH=storage

# After (Current)
STORAGE_TYPE=opendal
OPENDAL_SCHEME=fs
OPENDAL_FS_ROOT=/app/api/storage
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The OpenDAL approach offers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unified configuration pattern across all storage types&lt;/li&gt;
&lt;li&gt;Better extensibility (easy to switch to S3, Azure Blob, etc.)&lt;/li&gt;
&lt;li&gt;Improved error handling and retry mechanisms&lt;/li&gt;
&lt;li&gt;Active maintenance and support&lt;/li&gt;
&lt;/ul&gt;
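&lt;p&gt;To see what that unified pattern buys you, here&apos;s a sketch (my own illustration, with made-up S3 variable names that I haven&apos;t verified against Dify&apos;s S3 docs) applying the same prefix-stripping rule to a second backend:&lt;/p&gt;

```python
import os

def opendal_kwargs(scheme):
    # One rule for every backend: strip "OPENDAL_{SCHEME}_" and lowercase
    prefix = "OPENDAL_" + scheme.upper() + "_"
    return {k[len(prefix):].lower(): v
            for k, v in os.environ.items() if k.startswith(prefix)}

# Local filesystem (the default setup)
os.environ["OPENDAL_FS_ROOT"] = "/app/api/storage"
# Hypothetical S3 setup: only the variable names change, not the mechanism
os.environ["OPENDAL_S3_BUCKET"] = "my-dify-bucket"
os.environ["OPENDAL_S3_REGION"] = "us-east-1"

print(opendal_kwargs("fs"))  # {'root': '/app/api/storage'}
print(opendal_kwargs("s3"))  # {'bucket': 'my-dify-bucket', 'region': 'us-east-1'}
```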
&lt;p&gt;Back to the default setup: the matching volume mount in &lt;code&gt;docker-compose.yaml&lt;/code&gt; is&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# docker-compose.yaml
volumes:
  - ./volumes/app/storage:/app/api/storage
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Files stored in &lt;code&gt;./volumes/app/storage/&lt;/code&gt; on your Mac.&lt;/p&gt;
&lt;h3&gt;What If I Want Files Somewhere Else?&lt;/h3&gt;
&lt;p&gt;Maybe you&apos;re like me and want all your AI project files in a specific folder. Here&apos;s how:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# .env (NO CHANGE NEEDED)
STORAGE_TYPE=opendal
OPENDAL_SCHEME=fs
OPENDAL_FS_ROOT=/app/api/storage  # Keep this as container path
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;# docker-compose.yaml (ONLY change the left side)
volumes:
  - ~/Documents/models/dify_data/files:/app/api/storage
    # ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^
    # Custom host path                    Same container path
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; All your Dify files now live in &lt;code&gt;~/Documents/models/dify_data/files/&lt;/code&gt; on your Mac. Perfect for keeping your AI experiments organized!&lt;/p&gt;
&lt;h2&gt;Mistakes I Made (So You Don&apos;t Have To)&lt;/h2&gt;
&lt;h3&gt;🤦 Mistake #1: The $HOME Trap&lt;/h3&gt;
&lt;p&gt;I thought &quot;Hey, I&apos;ll just use &lt;code&gt;$HOME&lt;/code&gt; to point to my Documents folder!&quot;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# DON&apos;T DO THIS - I learned the hard way
OPENDAL_FS_ROOT=$HOME/Documents/models/dify_data/files
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;My containers started crash-looping&lt;/li&gt;
&lt;li&gt;Error logs screamed: &lt;code&gt;PermissionError: [Errno 13] Permission denied: &apos;/Users&apos;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Spent 2 hours debugging 😅&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why it failed:&lt;/strong&gt;
Inside the Docker container, &lt;code&gt;$HOME&lt;/code&gt; is &lt;code&gt;/root&lt;/code&gt;, not &lt;code&gt;/Users/yourusername&lt;/code&gt;. The container tried to create &lt;code&gt;/Users/yourusername/Documents/...&lt;/code&gt; and failed spectacularly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Always use the container path
OPENDAL_FS_ROOT=/app/api/storage
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;🤦 Mistake #2: Trying to Be Too Clever&lt;/h3&gt;
&lt;p&gt;When I wanted a custom storage location, my first instinct was to change &lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# NOPE - This breaks everything
OPENDAL_FS_ROOT=/my/custom/path
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; This path must match the &lt;strong&gt;right side&lt;/strong&gt; of your volume mount. If they don&apos;t match, chaos ensues.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The right way:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep &lt;code&gt;OPENDAL_FS_ROOT=/app/api/storage&lt;/code&gt; (container path)&lt;/li&gt;
&lt;li&gt;Only change the &lt;strong&gt;left side&lt;/strong&gt; of the volume mount (your Mac path)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;🤦 Mistake #3: Relative Path Confusion&lt;/h3&gt;
&lt;p&gt;Using relative paths seemed harmless:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Avoid this
OPENDAL_FS_ROOT=storage  # Where even is this?
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;The issue:&lt;/strong&gt; Inside a container, &quot;relative to what?&quot; becomes a real question. Is it relative to &lt;code&gt;/app&lt;/code&gt;? &lt;code&gt;/app/api&lt;/code&gt;? Who knows!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Better approach:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Crystal clear - no ambiguity
OPENDAL_FS_ROOT=/app/api/storage
&lt;/code&gt;&lt;/pre&gt;
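&lt;p&gt;A quick Python check makes the difference concrete: a relative root silently resolves against whatever the process&apos;s working directory happens to be, while an absolute root is unambiguous.&lt;/p&gt;

```python
from pathlib import Path

rel = Path("storage")                 # relative to... whatever the cwd is
print(rel.is_absolute())              # False
print(rel.resolve())                  # cwd + "/storage"; moves if the cwd moves

abs_root = Path("/app/api/storage")   # unambiguous, independent of the cwd
print(abs_root.is_absolute())         # True
```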
&lt;h2&gt;When Things Go Wrong (Debugging Tips)&lt;/h2&gt;
&lt;h3&gt;Symptom: Containers Keep Restarting&lt;/h3&gt;
&lt;p&gt;This was my first encounter with Dify. Everything would start, then crash, start again, crash again. Fun times.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;First, check the logs:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker-compose logs --tail=50 api
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;If you see this:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PermissionError: [Errno 13] Permission denied: &apos;/Users&apos;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;You&apos;ve likely hit the &lt;code&gt;$HOME&lt;/code&gt; trap or a path mismatch. Fix &lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt; to use the container path.&lt;/p&gt;
&lt;h3&gt;My Debugging Checklist&lt;/h3&gt;
&lt;p&gt;When something&apos;s off, I run through these commands:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Check environment variable inside container:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker exec docker-api-1 env | grep OPENDAL
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Check mounted directory:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;docker exec docker-api-1 ls -la /app/api/storage
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Verify sync with host:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;ls -la ./volumes/app/storage/
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
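&lt;p&gt;If you prefer a single command, the checks above can be bundled into a tiny preflight script (my own convenience, not part of Dify), assuming Python is available inside the api container:&lt;/p&gt;

```python
import os
from pathlib import Path

# Read the storage root the same way Dify will
# (Dify's fallback default is "storage", as seen in the constructor above)
root = os.environ.get("OPENDAL_FS_ROOT", "storage")
path = Path(root)

print("OPENDAL_FS_ROOT :", root)
print("absolute path?  :", path.is_absolute())   # should be True
print("exists?         :", path.exists())        # should be True when the mount works
print("writable?       :", path.exists() and os.access(root, os.W_OK))
```

&lt;p&gt;Copy it into the container and run it with &lt;code&gt;docker exec&lt;/code&gt;; for the default setup, all three answers should come back &lt;code&gt;True&lt;/code&gt;.&lt;/p&gt;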
&lt;h2&gt;Quick Reference (The TL;DR)&lt;/h2&gt;
&lt;p&gt;Here&apos;s everything in one place:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;STORAGE_TYPE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Which storage system to use&lt;/td&gt;
&lt;td&gt;&lt;code&gt;opendal&lt;/code&gt; ✅ (&lt;s&gt;&lt;code&gt;local&lt;/code&gt; is old news&lt;/s&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENDAL_SCHEME&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;What kind of storage&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fs&lt;/code&gt; for local files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Where files go &lt;strong&gt;in the container&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/app/api/storage&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;s&gt;&lt;code&gt;STORAGE_LOCAL_PATH&lt;/code&gt;&lt;/s&gt;&lt;/td&gt;
&lt;td&gt;&lt;s&gt;Old way of doing things&lt;/s&gt;&lt;/td&gt;
&lt;td&gt;&lt;s&gt;Use OpenDAL instead&lt;/s&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Volume mount (left)&lt;/td&gt;
&lt;td&gt;Where files appear &lt;strong&gt;on your Mac&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;./volumes/app/storage&lt;/code&gt; or custom path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Volume mount (right)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Must match&lt;/strong&gt; &lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/app/api/storage&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;The Golden Rules:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt; always points to a container path (right side of volume mount)&lt;/li&gt;
&lt;li&gt;Want files elsewhere on your Mac? Change only the left side of the volume mount&lt;/li&gt;
&lt;li&gt;Never use &lt;code&gt;$HOME&lt;/code&gt; or Mac paths in &lt;code&gt;OPENDAL_FS_ROOT&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;When in doubt, use absolute paths&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;This was just one piece of my AI experimentation journey. As I continue exploring Dify, n8n, and the broader AI ecosystem in 2026, I&apos;m sure I&apos;ll encounter more quirks and learning moments. That&apos;s the fun part, right?&lt;/p&gt;
&lt;p&gt;If you found this helpful or have your own Dify war stories, I&apos;d love to hear them! This AI revolution is moving fast, and we&apos;re all learning together.&lt;/p&gt;
&lt;p&gt;Happy building! 🚀&lt;/p&gt;
&lt;h2&gt;Useful Links&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/apache/opendal&quot;&gt;Apache OpenDAL Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://github.com/apache/opendal/tree/main/core/src/services&quot;&gt;OpenDAL Service Configurations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dify code: &lt;code&gt;api/extensions/storage/opendal_storage.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://geekcoding101.com/posts/a-quick-guide&quot;&gt;Agentic Frameworks: A Quick Guide to the 2025 Agent War&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><author>GeekCoding101</author></item></channel></rss>