Tag: self-hosted

  • Forget ChatGPT, I Built My Own Local AI with AMD RX 7900 XTX, Ollama & DeepSeek-R1

    Forget ChatGPT, I Built My Own Local AI with AMD RX 7900 XTX, Ollama & DeepSeek-R1

    The journey to bringing you this guide was paved with rage and hardship. Before we go any further, let me be clear: local AI is nowhere near as good as ChatGPT or similar online tools. Without solid prompt engineering, you’ll mostly get weird, useless responses.

    That said, DeepSeek-R1 (32B) is hands down the best local model I’ve ever used—but even then, it’s nowhere near the level of ChatGPT-4o in the cloud. To match that, you’d need the DeepSeek-R1 671B model, which is a mind-blowing 404GB. Running that locally? Yeah, that would be absolute madness.

Disclaimer: This post has some strong opinions about Linux distributions and hardware that some people may find disturbing or hurtful. Please don’t take it too seriously.

    Rant about AMD

    Skip it, or read my raw unfiltered anger.

    The image of this post perfectly reflects my mood.

    A while ago, I decided to build an AI server at home to run models locally. My plan was to get an NVIDIA 4090, which at the time cost around 2000€. But then, my friend—who runs Arch as his daily driver (I should’ve seen the red flag)—was using an AMD RX 7900 XTX, which was only 900€ at the time. He hyped it up, saying, “Oh yeah, get this one! Same VRAM, super easy to set up, everything works flawlessly!”

    I was intrigued.

    As fate would have it, another friend echoed the same thing, insisting that for 24GB of VRAM, I wouldn’t find anything cheaper. And, well, that was actually true.

    However, everything I read online told me that AMD GPUs lag far behind NVIDIA in every way, and worst of all, you’d always have to hack things together just to make them work. Still, on Black Friday, I caved and bought the AMD GPU.

    I regret it every single day since putting it in. I hate it. It absolutely sucks.

    So far, it has worked on Windows 11—but even there, it was a pain. And seriously, how do you even mess up Windows 11 support??

    Then I switched to Ubuntu as my main OS (☹️). After two days of struggle (and reinstalling the entire OS three times), I somehow got it to work. I still don’t know what I did. Every guide on the internet gives different commands, different settings, and different advice. Most are for older AMD GPUs, almost none work for the newer models, and—just for fun—most of the essential tools don’t support the “new” AMD cards either.

    I hate it. I hate it so much.

    My mood

    I will never buy an AMD GPU ever again. Even if they came with 100GB of VRAM and cost just 5€, I do not care.

    Looking back, I would rather pay 2000€ for a GPU that just works than spend endless hours hacking together the most basic functionality. The sheer frustration of dealing with this mess infuriates me beyond words.

    This post serves as both a rant and a personal reminder: Never. Ever. Ever. Buy. AMD. Hardware. Again.

    To be honest, I’m just as disappointed in AMD CPUs. Their hardware transcoding is absolute trash.

    From now on, it’s Intel and NVIDIA, forever and always.

Prerequisites

    • 32GB RAM (with ComfyUI, bump that up to 40GB)
    • 250GB SSD Storage
    • Debian 12 LXC

If you are curious about my exact setup, you’ll find a detailed list here where you can check whether yours is similar: My Home Server: “PrettyLittleKitten” – A Personal Tech Haven. At the very least, your GPU should match (AMD RX 7900 XTX) if you want to follow the tutorial step by step. If it doesn’t, chances are it’ll fail.

    You need to install the kernel drivers on the host for passthrough to an LXC:

    apt update
    apt install pve-headers dkms
    apt install amdgpu-dkms

    You need to reboot Proxmox after that.

    reboot

    Setting up LXC

    It’s important to note that the LXC must be privileged. I know there are guides for setting up an unprivileged one, but literally none of them worked—which only fueled my rage to unbearable levels.

    So yeah… I just went with privileged.

    Easy mode Proxmox VE Helper-Scripts:

    bash -c "$(wget -qLO - https://github.com/community-scripts/ProxmoxVE/raw/main/ct/debian.sh)"

    Or, just manually add a Debian 12 LXC in the Proxmox GUI. Make sure to enable nesting=1, as this is required for Docker to work later.
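If you prefer the CLI over the GUI, you can also flip nesting on after the container exists; a minimal sketch, assuming your container ID is 101 (a placeholder):

pct set 101 --features nesting=1
pct config 101 | grep features   # should now show: features: nesting=1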

    I did try to get this working on Ubuntu 24 for hours… and failed miserably. Don’t ask me why—I have no idea.

    Now, let’s install some dependencies that we’ll need later—or at the very least, ones that will make our lives easier:

    apt update && apt upgrade -y
    apt install sudo curl jq -y
    cd /tmp
    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh ./get-docker.sh 

    You are free to install docker without using the convenience script.

    GPU Passthrough

    This heavily depends on your hardware and software.

    If, like me, you have an AMD RX 7900 XTX and Proxmox 8.3.3, then you can just follow along. Otherwise—based on my own painful experience—you’ll likely need to find another guide.

    Inside the LXC, run:

cat /etc/group | grep -w 'render\|video'

    This will display the GIDs you need for passthrough in a second—so make sure to note them down:

    video:x:44:root # <- /dev/kfd
    render:x:993:root # <- /dev/dri/render*

    The 44 will be used for “video” and the 993 for “render”. Yours will, of course, be different.

    On the Proxmox host run the following command:

    lspci | grep VGA
    ls -l /sys/class/drm/renderD*/device

    The output should be something like this:

    lspci | grep VGA
    03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX] (rev c8)
    11:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raphael (rev cb)
    
    ls -l /sys/class/drm/renderD*/device
    lrwxrwxrwx 1 root root 0 Feb  2 13:33 /sys/class/drm/renderD128/device -> ../../../0000:03:00.0
    lrwxrwxrwx 1 root root 0 Feb  2 13:33 /sys/class/drm/renderD129/device -> ../../../0000:11:00.0

    In my case, “renderD128” is the part I need. To find yours, match the ID from the first command (e.g., 03:00.0) with the ID from the second command (e.g., 0000:03:00.0). Once they match, you’ll know which renderD* device corresponds to your GPU (the other one is the iGPU of the CPU, don’t use that).

In the Proxmox GUI, go to your LXC container’s “Resources” tab and click “Add” → “Device Passthrough”. Now, add the “video” and “render” devices using the GIDs you noted earlier:

    • Render device: Use the path for your graphics card and the GID from the LXC output.
    • Video device: Use /dev/kfd and the GID for “video” from the LXC output.

    This is what your settings should look like (you may need to restart the LXC first).
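For reference, the GUI entries end up as plain dev lines in the container config under /etc/pve/lxc/. A rough sketch of what mine looked like, assuming the Proxmox 8.2+ device passthrough syntax; the render path and both GIDs are from my system and will differ on yours:

# /etc/pve/lxc/<CTID>.conf (CTID is a placeholder)
dev0: /dev/dri/renderD128,gid=993
dev1: /dev/kfd,gid=44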

After a restart of the container, check whether the permissions are correct:

    ls -lah /dev/kfd /dev/dri/renderD128 # <- edit the ID

    The output should look like this:

    crw-rw---- 1 root render 226, 128 Feb  2 18:01 /dev/dri/renderD128
    crw-rw---- 1 root video  236,   0 Feb  2 18:01 /dev/kfd

    Make sure that “root render” is the GPU and “root video” the Kernel Fusion Driver (kfd).

    Kernel Fusion Driver

The Kernel Fusion Driver (/dev/kfd) is what you need if you want to run ROCm-based GPU compute workloads, such as machine learning, OpenCL, or scientific computing, on your AMD GPU within Proxmox. It acts as the interface between the AMD GPU driver and user-space applications, enabling GPU acceleration for parallel computing tasks.

    – ChatGPT-4o

    Install AMD Software

    We need to install some tools inside of our Debian LXC:

    apt update && apt upgrade -y
    cd /tmp
    wget https://repo.radeon.com/amdgpu-install/6.2.4/ubuntu/noble/amdgpu-install_6.2.60204-1_all.deb
    sudo apt install ./amdgpu-install_6.2.60204-1_all.deb
    amdgpu-install --usecase=rocm --no-dkms

    You can also refer to the official guide: Quick Start Installation Guide – ROCm (at the very least, check if the links are still valid by the time you’re reading this).

    The download might take a while. Since I have a new AMD RX 7900 XTX, I need to use:

    export HSA_OVERRIDE_GFX_VERSION=11.0.0
    echo 'export HSA_OVERRIDE_GFX_VERSION=11.0.0' >> ~/.bashrc
    source ~/.bashrc

    The next step is to test if everything is working. We’ll use amdgpu_top for this (you can skip this, but I wouldn’t):

    sudo apt install libdrm-dev
    cd /tmp
    wget https://github.com/Umio-Yasuno/amdgpu_top/releases/download/v0.10.1/amdgpu-top_without_gui_0.10.1-1_amd64.deb
    dpkg -i amdgpu-top_without_gui_0.10.1-1_amd64.deb 
    amdgpu_top

You should see the name of your GPU along with some live values below it.

    Run this command inside of the LXC to see if everything worked correctly:

    /opt/rocm/bin/rocminfo

    Somewhere in that “rocminfo” output you should see your GPU (and a bunch of other nerd stuff):

      Marketing Name:          Radeon RX 7900 XTX
      Vendor Name:             AMD

    🤩 Sweet!

    Installing Ollama

This will also take a while. Just follow the guide on the Ollama website—here’s all you need to do:

    apt install curl 
    cd /tmp
    curl -fsSL https://ollama.com/install.sh | sh

Since curl is not installed by default in the Debian LXC we’re using, we’ll need to install it first (if you followed this guide, you already have it). Then, we’ll run the install script from the Ollama website. Be patient—the download takes a while since it pulls about 30GB of data.
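One caveat: the HSA_OVERRIDE_GFX_VERSION export in ~/.bashrc only applies to interactive shells, while the install script sets Ollama up as a systemd service. If the GPU is not picked up by the service, you may need to hand the variable to systemd as well; a minimal sketch:

systemctl edit ollama.service
# add to the override file:
# [Service]
# Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
systemctl daemon-reload
systemctl restart ollama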

    By the way, I love the Ollama website. The simple black-and-white design with rounded borders? 🤌 I’m a sucker for minimalistic aesthetics. (I hope you like my blog’s design too! 🕺)

    Next Step: Testing with a Smaller Model

    Before downloading the DeepSeek 32B model, we’ll first test with a smaller DeepSeek version. If your GPU matches mine, the larger model should work fine.

    You can check the available models on the Ollama Models website.

    Now, run this command—it’s going to download a 1.5GB file, so the wait time depends on your internet speed:

    ollama pull deepseek-r1:1.5b

    You can then test:

    curl -X POST http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:1.5b",
      "prompt": "Tell me a funny story about my best friend Karl. 300 characters maximum.",
      "stream": false
    }' | jq .

    Once upon a time, in the quiet town of Pawsley, there was Karl, a beloved kind cat who adored his three feline friends: Sam, Max, and Lily. Karl always had a warm smile and a habit of aiding others, often finding humor in unexpected places.

    One crisp autumn afternoon, Karl spotted a bag of marbles on the park’s bench. Curious, he decided to play with them. To everyone’s surprise, a man walked by, holding a jar full of coins instead of marbles. “Hey, it’s you!” Karl exclaimed. The man, initially unimpressed, asked for his change. Karl suggested taking one marble in exchange and gave him the coins.

    “Thank you,” the man thanked. Karl offered him a glass of water, knowing the jar was empty. “That’ll keep me happy,” he said. But as the water filled the glass, a strange ripple appeared on the floor. The man ran over to check but found nothing—no coins, just a tiny, pink spot.

    “Wait, is it my pet?!” Karl pointed. The man laughed, looking suspiciously at Karl. “You saw that marble in there.” He handed Karl a piece of paper and said,”I need your help.”

    Karl rolled on his back and rolled into the street. From behind him, he saw Sam, Max, and Lily with their usual spotlights on. Karl placed a paw on Sam’s paw, suggesting they might be related.

    “Hey!” Max excitedly exclaimed. “My little sister is getting sick! Can you help me find her?”

    Karl hesitated but followed Max to the neighbor’s yard. His friend found Lily, who was hiding somewhere. Karl helped Lily bring her back home, while Sam and Max watched in laughter. Karl’s kind words were met with cheers.

    In the end, Karl became a beloved companion among Pawsley’s residents, not just for his love of marbles but also for his warmth and kindness.

    Never interrupt an AI artist in the middle of their masterpiece. 🎨🤖 But yeah, the 300-Character-Limit was ignored.
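If you would rather poke at the model interactively than through the raw API, the Ollama CLI also gives you a simple chat prompt:

ollama run deepseek-r1:1.5b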

    Cleanup

    Let’s get rid of the old files. But first, double-check before running these commands—they’ll wipe out all .sh and .deb files in /tmp.

    ls -lh /tmp/*.sh /tmp/*.deb

    The command above shows you what the command below will delete. Proceed if the displayed files are trash. The deleted files are irreversibly gone after this command:

    rm -f /tmp/*.sh /tmp/*.deb
    apt autoremove

    Testing OpenWebUI with DeepSeek-R1 32B

    The prompt I gave DeepSeek was “Give me a 10 word super short story about a cat named Paul.” – which is the name of my actual cat.

    Well, not quite 10 words, but close enough. This was a cold start, meaning the model wasn’t loaded into VRAM yet—so yeah, it took a bit longer.

    DeepSeek really put in some serious effort here, like it was deep in thought, sweating over every word. 😂 Considering all that, the result is absolutely hilarious:

    Okay, so I need to come up with a 10-word super short story about a cat named Paul. Hmm, let’s see… The user gave an example: “Paul the cat climbed the fence, chasing a bird, then napped in the sun.” That’s pretty straightforward but conveys a little action and a bit of character.

    I should think about what elements make a good 10-word story. It needs to have a clear subject (Paul), some action or activity, maybe an obstacle or goal, and perhaps a resolution or consequence. Since it’s so short, each word counts, so I need to be concise but still vivid.

    Let me brainstorm some ideas about Paul the cat. Maybe he’s curious, playful, adventurous, or lazy. The example has him chasing a bird and then napping. Perhaps another activity? Maybe exploring somewhere, finding something, or having an interaction with another animal or person.

    I should think of a simple structure: subject (Paul), action verb, some context, maybe an obstacle or additional detail, and a resolution. For example, “Paul pounced on the mouse but it escaped.” That’s 8 words. Maybe add more details to make it 10.

    Or perhaps something like “Paul sat by the window watching birds chirp.” That’s also 8 words. Maybe include movement: “Paul raced through the garden chasing a butterfly.”

    Wait, the user provided an example that starts with Paul doing an action (climbed), then another action (chasing), and ends with resting (napped). So perhaps following a similar structure would work well.

    Let me try: “Paul explored the attic, found a toy, and took a nap.” That’s 9 words. Maybe add more detail or action. Or “Paul climbed the tree, met another cat, then curled up to sleep.”

    Wait, that’s 10 words. Hmm, but it feels a bit flat. Maybe make the interaction more engaging: “Paul chased a squirrel around the yard and finally rested.” That’s 9 words. Could add something like “in the shade” at the end for 10.

    Alternatively, maybe include some emotion or consequence: “Paul pounced on the ball, rolled over, and fell asleep.”

    Wait, let me check word count. Each idea needs to be exactly 10 words. Let’s see:

    • Paul the cat climbed the fence, chasing a bird, then napped in the sun. (10 words)

    So that’s the example given. I can try something similar but different.

    Maybe focus on a different activity: “Paul discovered a sunny spot, curled up, and purred softly.” That’s 9 words. Maybe add more action or detail to make it 10.

    Alternatively: “Paul leaped onto the counter, knocked over a bowl, then ate the leftovers.” That’s 10 words but maybe a bit long in description.

    Another angle: “Paul wandered through the park, met a dog, and they played together.” That’s 9 words. Add something like “all day” at the end for 10.

    Wait, perhaps include some obstacle or challenge: “Paul tried to catch a fish in the pond but got wet.” That’s 10 words. It includes an attempt, action, consequence.

    Or maybe something more peaceful: “Paul stretched lazily on the rug, yawned, and drifted off.” 9 words. Add “to sleep” for 10.

    I think I need to ensure each word adds value without being redundant. Let me try another approach:

    • Paul chased a butterfly in the garden until it flew away. (That’s 8 words. Maybe add “then rested” at the end: 10 words.)

    Wait, but that might be too similar to the example.

    Alternatively, focus on something unique about Paul: “Paul, the curious cat, explored every nook and cranny.” That’s 9 words. Maybe add “all day long” for 12; that’s too many.

    Perhaps include an interaction with another animal or a person: “Paul meowed at the door, demanding attention, then curled up.” 9 words. Add “by the fire” for 10.

    Wait, perhaps combine actions and consequences more effectively. Let me think of a scenario where Paul does something that leads to a natural conclusion in just 10 words.

    How about: “Paul sneaked into the kitchen, stole a treat, then hid under the bed.” That’s 9 words. Maybe add “to digest” for 10.

    Or maybe something more adventurous: “Paul climbed a tall tree, got stuck, and miaowed for help.” That’s 9 words. Add “from his owner” for 11; that’s too many.

    Alternatively: “Paul caught a moth in the dark, played with it, then slept soundly.” 9 words. Maybe add “through the night” for 10.

    I think I’m overcomplicating it. Let me go back to the example and see if I can structure something similar but different. The example has two actions (climbed, chasing) followed by a rest. So maybe another set of actions leading to a consequence or resolution.

    Perhaps: “Paul chased his tail in circles until he panted.” That’s 8 words. Maybe add “then stopped to rest” for 10.

    Alternatively: “Paul jumped on the sofa, knocked over a lamp, then slept.” But that’s 9 words; maybe add “peacefully” at the end for 10.

    Wait, perhaps focus on Paul’s interaction with an object or another animal. Let me think of something unique but still within the 10-word limit.

    Another idea: “Paul found a cozy spot under the stairs and napped.” That’s 8 words. Maybe add “after exploring all day” for 10.

    Wait, that would be “Paul found a cozy spot under the stairs after exploring all day.” That’s 9 words. Hmm.

    Alternatively: “Paul purred contentedly while basking in sunlight.” 7 words. Add some activity before that to make it 10.

    I think I need to stick to a structure where Paul does something, then another action or consequence, and maybe ends with a state or rest.

    Let me try again: “Paul climbed the fence, chased a butterfly, then napped in shade.” That’s 9 words. Maybe change “then” to “and” for flow: “climbed, chased, and napped” but that might not add up correctly.

    Wait, let’s count the example given: “Paul the cat climbed the fence, chasing a bird, then napped in the sun.” That’s 10 words. So it includes his name, his action (climbed), context (the fence), another action (chasing), and the consequence (napped in the sun). I need to mirror that structure but with different actions.

    So maybe: “Paul the cat explored the garden, found a mouse, then relaxed by the pond.” That’s 10 words. It follows a similar flow—action, context, additional action, consequence.

    Alternatively: “Paul the cat pounced on a toy, rolled around, and fell asleep.” 9 words. Add “on the rug” for 10.

    Wait, maybe make it more dynamic: “Paul the cat raced through the park, met a friend, then rested under a tree.” That’s 10 words. It shows activity, interaction, and rest.

    I think that’s a good structure—action, context, another action or interaction, consequence. So putting it all together, I can create a concise story within the word limit.

    – DeepSeek-R1:32B thoughts

Here you can see the electricity usage—a solid 450W for two minutes until things calmed down. The server usually “idles” at around 50-53W; since a local SIEM, this website, and a lot of other stuff run on it, it is never truly idle.

    Both hardware and software went all in, pushing their limits… all for my short, 10-word cat story:

This doesn’t prove that DeepSeek-R1 is a waste of electricity that produces subpar results; rather, it shows that you probably don’t need an advanced reasoning model for such a simple task.

Check out dolphin-mistral: even on a cold start it only took 4 seconds, and the second message didn’t even take a second:

    Bonus: ComfyUI

The official installation guide: How to Install ComfyUI on Linux. Alright, OpenWebUI now supports ComfyUI for image generation (still in beta at the time of writing). Here is the super quick start for ComfyUI; run these commands in your LXC:

apt update && apt upgrade -y && apt install git python3.11-venv -y
    mkdir ~/comfy && cd ~/comfy
    git clone https://github.com/comfyanonymous/ComfyUI.git
    cd ~/comfy/ComfyUI
    python3 -m venv venv
    source venv/bin/activate
    pip install --upgrade pip
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
    pip install -r requirements.txt

    Make sure to double-check the versions and links against your system and the official guide I linked at the top.

    Once everything is set up, give it a test run to confirm it’s working as expected:

    python main.py

    If your output confirms that everything is running smoothly, go ahead and turn it into a system service:

    nano /etc/systemd/system/comfyui.service

    Paste the following into the service file:

    [Unit]
    Description=ComfyUI Service
    After=network.target
    
    [Service]
    User=root
    Group=root
    WorkingDirectory=/root/comfy/ComfyUI
    ExecStart=/root/comfy/ComfyUI/venv/bin/python /root/comfy/ComfyUI/main.py --listen 0.0.0.0
    Restart=always
    
    [Install]
    WantedBy=multi-user.target

    Now reload and start the comfyui.service:

    sudo systemctl daemon-reload
    sudo systemctl enable comfyui.service
    sudo systemctl start comfyui.service
    sudo systemctl status comfyui.service

    ComfyUI Manager

    ComfyUI Manager is the ultimate quality-of-life add-on for ComfyUI, making model and plugin installation a breeze. Just get it—you’ll thank me later:

    cd ~/comfy/ComfyUI/custom_nodes
    git clone https://github.com/ltdrdata/ComfyUI-Manager comfyui-manager
    sudo systemctl restart comfyui.service

    Restart takes a little while since it downloads some dependencies, but no worries—give it a minute or two, and everything should run smoothly.

    Conclusion

You should now have a working Debian LXC for local AI tasks, equipped with ROCm, Ollama, and ComfyUI.

    Setting this up, especially with newer AMD GPUs, used to be a bit of a headache. I’m honestly glad I waited a bit—getting this to work wasn’t exactly painless, and I had to do a lot of digging through forums. But hey, it’s running now!

    Next up, get some models from the Ollama page. If you have an AMD RX 7900 XTX, you should be able to run the 32B version of DeepSeek-R1 effortlessly. Technically, you can load models larger than your VRAM, but be warned—it’ll slow things down.
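Pulling the bigger model works exactly like the small test model from earlier:

ollama pull deepseek-r1:32b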

Also, don’t forget to secure your AI server and add valid SSL certificates; check out my post about it:

    I set up firewall rules centrally on my Dream Machine, so my AI server can only communicate with the reverse proxy.

    If your setup is different, you might want to handle this locally using UFW.
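If you go the UFW route, a minimal sketch could look like this; the proxy IP and port are placeholders, so adjust them to whatever your reverse proxy actually needs to reach (e.g. 11434 for the Ollama API):

apt install ufw -y
ufw allow ssh                                        # keep your own access first
ufw default deny incoming
ufw default allow outgoing
ufw allow from 192.168.1.10 to any port 11434 proto tcp
ufw enable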

These Goodbye Messages Are Brought to You by AI

    Aww, yaaaay! I totally loooove all your amazing readers <3 Wishing you guys the biiiggest luck with everything you dooove, okay? Dayyyyyybeeee~ 💕✨

    – qwen2.5:32b

    Love y’all, keep slaying in everything you do <3 Can’t wait to see what awesome things you have coming up. Bye for now! 👋🏼😉

    – dolphin-mistral
    Goodbye image I made with Flux Schnell and ComfyUI

EDIT 1 (04.02.2025)

I have downloaded and tested almost all of the popular models now, and so far the only one that is actually usable for daily business, like rewriting German emails or asking for expertise in German, is qwen2.5.

    The uncensored Dolphin models are a lot of fun, but also kind of stink with German, which is of course because their underlying models aren’t good at German either.

• Squidward: Continuous Observation and Monitoring

Squidward: Continuous Observation and Monitoring

    The name Squidward comes from TAD → Threat Modelling, Attack Surface and Data. “Tadl” is the German nickname for Squidward from SpongeBob, so I figured—since it’s kind of a data kraken—why not use that name?

    It’s a continuous observation and monitoring script that notifies you about changes in your internet-facing infrastructure. Think Shodan Monitor, but self-hosted.

    Technology Stack

    • certspotter: Keeps an eye on targets for new certificates and sneaky subdomains.
    • Discord: The command center—control the bot, add targets, and get real-time alerts.
    • dnsx: Grabs DNS records.
    • subfinder: The initial scout, hunting down subdomains.
    • rustscan: Blazing-fast port scanner for newly found endpoints.
    • httpx: Checks ports for web UI and detects underlying technologies.
    • nuclei: Runs a quick vulnerability scan to spot weak spots.
    • anew: Really handy deduplication tool.

    At this point, I gotta give a massive shoutout to ProjectDiscovery for open-sourcing some of the best recon tools out there—completely free! Seriously, a huge chunk of my projects rely on these tools. Go check them out, contribute, and support them. They deserve it!

    (Not getting paid to say this—just genuinely impressed.)
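To give you a feel for how these tools get glued together in the script further down, this is the anew dedupe pattern it uses everywhere; a minimal sketch with example file names:

./subfinder -d karl.fail -silent | tee -a allfound.txt | ./anew alldedupe.txt | tee onlynew.txt
# tee -a  -> append everything to the running "all found" log
# anew    -> print (and remember) only lines not already in alldedupe.txt
# tee     -> overwrite onlynew.txt with just the fresh findings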

    How it works

I had to rewrite certspotter a little bit in order to accommodate a different input and output scheme; the rest is fairly simple.

    Setting Up Directories

    The script ensures required directories exist before running:

    • $HOME/squidward/data for storing results.
    • Subdirectories for logs: onlynew, allfound, alldedupe, backlog.

    Running Subdomain Enumeration

    • squidward (certspotter) fetches SSL certificates to discover new subdomains.
    • subfinder further identifies subdomains from multiple sources.
    • Results are stored in logs and sent as notifications (to a Discord webhook).

    DNS Resolution

    dnsx takes the discovered subdomains and resolves:

    • A/AAAA (IPv4/IPv6 records)
    • CNAME (Canonical names)
    • NS (Name servers)
    • TXT, PTR, MX, SOA records

    HTTP Probing

    httpx analyzes the discovered subdomains by sending HTTP requests, extracting:

    • Status codes, content lengths, content types.
    • Hash values (SHA256).
    • Headers like server, title, location, etc.
    • Probing for WebSocket, CDN, and methods.

    Vulnerability Scanning

    • nuclei scans for known vulnerabilities on discovered targets.
    • The scan focuses on high, critical, and unknown severity issues.

    Port Scanning

    • rustscan finds open ports for each discovered subdomain.
    • If open ports exist, additional HTTP probing and vulnerability scanning are performed.

    Automation and Notifications

    • Discord notifications are sent after each stage.
    • The script prevents multiple simultaneous runs by checking if another instance is active (ps -ef | grep “squiddy.sh”).
    • Randomization (shuf) is used to shuffle the scan order.

    Main Execution

    If another squiddy.sh instance is running, the script waits instead of starting.

    • If no duplicate instance exists:
    • Squidward (certspotter) runs first.
• The main scanning pipeline (what_i_want_what_i_really_really_want()) executes in a structured sequence.

    The Code

    I wrote this about six years ago and just laid eyes on it again for the first time. I have absolutely no clue what past me was thinking 😂, but hey—here you go:

    #!/bin/bash
    
    #############################################
    #
    # Single script usage:
    # echo "test.karl.fail" | ./httpx -sc -cl -ct -location -hash sha256 -rt -lc -wc -title -server -td -method -websocket -ip -cname -cdn -probe -x GET -silent
    # echo "test.karl.fail" | ./dnsx -a -aaaa -cname -ns -txt -ptr -mx -soa -resp -silent
    # echo "test.karl.fail" | ./subfinder -silent
    # echo "test.karl.fail" | ./nuclei -ni
    #
    #
    #
    #
    #############################################
    
    # -----> globals <-----
    workdir="squidward"
    script_path=$HOME/$workdir
    data_path=$HOME/$workdir/data
    
    only_new=$data_path/onlynew
    all_found=$data_path/allfound
    all_dedupe=$data_path/alldedupe
    backlog=$data_path/backlog
    # -----------------------
    
    # -----> dir-setup <-----
    setup() {
        if [ ! -d $backlog ]; then
            mkdir $backlog
        fi
        if [ ! -d $only_new ]; then
            mkdir $only_new
        fi
        if [ ! -d $all_found ]; then
            mkdir $all_found
        fi
        if [ ! -d $all_dedupe ]; then
            mkdir $all_dedupe
        fi
        if [ ! -d $script_path ]; then
            mkdir $script_path
        fi
        if [ ! -d $data_path ]; then
            mkdir $data_path
        fi
    }
    # -----------------------
    
    # -----> subfinder <-----
    write_subfinder_log() {
        tee -a $all_found/subfinder.txt | $script_path/anew $all_dedupe/subfinder.txt | tee $only_new/subfinder.txt
    }
    run_subfinder() {
        $script_path/subfinder -dL $only_new/certspotter.txt -silent | write_subfinder_log;
        $script_path/notify -data $only_new/subfinder.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    # -----------------------
    
    # -----> dnsx <-----
    write_dnsx_log() {
        tee -a $all_found/dnsx.txt | $script_path/anew $all_dedupe/dnsx.txt | tee $only_new/dnsx.txt
    }
    run_dnsx() {
        $script_path/dnsx -l $only_new/subfinder.txt -a -aaaa -cname -ns -txt -ptr -mx -soa -resp -silent | write_dnsx_log;
        $script_path/notify -data $only_new/dnsx.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    # -----------------------
    
    # -----> httpx <-----
    write_httpx_log() {
        tee -a $all_found/httpx.txt | $script_path/anew $all_dedupe/httpx.txt | tee $only_new/httpx.txt
    }
    run_httpx() {
    $script_path/httpx -l $only_new/subfinder.txt -sc -cl -ct -location -hash sha256 -rt -lc -wc -title \
    -server -td -method -websocket -ip -cname -cdn -probe -x GET -silent | write_httpx_log;
        $script_path/notify -data $only_new/httpx.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    # -----------------------
    
    # -----> nuclei <-----
    write_nuclei_log() {
        tee -a $all_found/nuclei.txt | $script_path/anew $all_dedupe/nuclei.txt | tee $only_new/nuclei.txt
    }
    run_nuclei() {
    $script_path/nuclei -ni -l $only_new/httpx.txt -s high,critical,unknown -rl 5 -silent \
        | write_nuclei_log | $script_path/notify -provider discord -id vuln -silent
    }
    # -----------------------
    
    # -----> squidward <-----
    write_squidward_log() {
        tee -a $all_found/certspotter.txt | $script_path/anew $all_dedupe/certspotter.txt | tee -a $only_new/forscans.txt
    }
    run_squidward() {
        rm $script_path/config/certspotter/lock
        $script_path/squidward | write_squidward_log | $script_path/notify -provider discord -id cert -silent
        sleep 3
    }
    # -----------------------
    
    send_certspotted() {
        $script_path/notify -data $only_new/certspotter.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    
    send_starting() {
        echo "Hi! I am Squiddy!" | $script_path/notify  -provider discord -id crawl -silent
        echo "I am gonna start searching for new targets now :)" | $script_path/notify  -provider discord -id crawl -silent
    }
    
    dns_to_ip() {
        # TODO: give txt file of subdomains to get IPs from file 
        $script_path/dnsx -a -l $1 -resp -silent \
        | grep -oE "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" \
        | sort --unique 
    }
    
    run_rustcan() {
        local input=""
    
        if [[ -p /dev/stdin ]]; then
            input="$(cat -)"
        else
            input="${@}"
        fi
    
        if [[ -z "${input}" ]]; then
            return 1
        fi
    
        # ${input/ /,} -> join space to comma
        # -> loop because otherwise rustscan will take forever to scan all IPs and only save results at the end
        # we could do this to scan all at once instead: $script_path/rustscan -b 100 -g --scan-order random -a ${input/ /,}
        for ip in ${input}
        do
            $script_path/rustscan -b 500 -g --scan-order random -a $ip
        done
    
    }
    
    write_rustscan_log() {
        tee -a $all_found/rustscan.txt | $script_path/anew $all_dedupe/rustscan.txt | tee $only_new/rustscan.txt
    }
    what_i_want_what_i_really_really_want() {
        # shuffle certspotter file cause why not
        cat $only_new/forscans.txt | shuf -o $only_new/forscans.txt 
    
        $script_path/subfinder -silent -dL $only_new/forscans.txt | write_subfinder_log
        $script_path/notify -silent -data $only_new/subfinder.txt -bulk -provider discord -id subfinder
    
        # -> empty forscans.txt
        > $only_new/forscans.txt
    
        # shuffle subfinder file cause why not
        cat $only_new/subfinder.txt | shuf -o $only_new/subfinder.txt
    
        $script_path/dnsx -l $only_new/subfinder.txt -silent -a -aaaa -cname -ns -txt -ptr -mx -soa -resp | write_dnsx_log
        $script_path/notify -data $only_new/dnsx.txt -bulk -provider discord -id dnsx -silent
        
        # shuffle dns file before iter to randomize scans a little bit
        cat $only_new/dnsx.txt | shuf -o $only_new/dnsx.txt
        sleep 1
        cat $only_new/dnsx.txt | shuf -o $only_new/dnsx.txt
    
        while IFS= read -r line
        do
            dns_name=$(echo $line | cut -d ' ' -f1)
            ip=$(echo ${line} \
            | grep -E "\[(\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)\]" \
            | grep -oE "(\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)")
            match=$(echo $ip | run_rustcan)
    
            if [ ! -z "$match" ]
            then
                ports_unformat=$(echo ${match} | grep -Po '\[\K[^]]*')
                ports=${ports_unformat//,/ }
    
                echo "$dns_name - $ip - $ports" | write_rustscan_log
                $script_path/notify -silent -data $only_new/rustscan.txt -bulk -provider discord -id portscan
            
                for port in ${ports}
                do
                    echo "$dns_name:$port" | $script_path/httpx -silent -sc -cl -ct -location \
                    -hash sha256 -rt -lc -wc -title -server -td -method -websocket \
                    -ip -cname -cdn -probe -x GET | write_httpx_log | grep "\[SUCCESS\]" | cut -d ' ' -f1 \
                | $script_path/nuclei -silent -ni -s high,critical,unknown -rl 10 \
                    | write_nuclei_log | $script_path/notify -provider discord -id nuclei -silent
    
                    $script_path/notify -silent -data $only_new/httpx.txt -bulk -provider discord -id httpx
                done
            fi 
        done < "$only_new/dnsx.txt"
    }
    
    main() {
        dupe_script=$(ps -ef | grep "squiddy.sh" | grep -v grep | wc -l | xargs)
    
        if [ ${dupe_script} -gt 2 ]; then
            echo "Hey friends! Squiddy is already running, I am gonna try again later." | $script_path/notify  -provider discord -id crawl -silent
        else 
            send_starting
    
            echo "Running Squidward"
            run_squidward
    
            echo "Running the entire rest"
            what_i_want_what_i_really_really_want
    
            # -> leaving it in for now but replace with above function
            #echo "Running Subfinder"
            #run_subfinder
    
            #echo "Running DNSX"
            #run_dnsx
    
            #echo "Running HTTPX"
            #run_httpx
    
            #echo "Running Nuclei"
            #run_nuclei
        fi
    }
    
    setup
    
    dupe_script=$(ps -ef | grep "squiddy.sh" | grep -v grep | wc -l | xargs)
    if [ ${dupe_script} -gt 2 ]; then
        echo "Hey friends! Squiddy is already running, I am gonna try again later." | $script_path/notify  -provider discord -id crawl -silent
    else 
        #send_starting
        echo "Running Squidward"
        run_squidward
    fi

    There’s also a Python-based Discord bot that goes with this, but I’ll spare you that code—it did work back in the day 😬.

    Conclusion

    Back when I was a Red Teamer, this setup was a game-changer—not just during engagements, but even before them. Sometimes, during client sales calls, they’d expect you to be some kind of all-knowing security wizard who already understands their infrastructure better than they do.

    So, I’d sit in these calls, quietly feeding their possible targets into Squidward and within seconds, I’d have real-time recon data. Then, I’d casually drop something like, “Well, how about I start with server XYZ? I can already see it’s vulnerable to CVE-Blah.” Most customers loved that level of preparedness.

    I haven’t touched this setup in ages, and honestly, I have no clue how I’d even get it running again. I would probably go about it using Node-RED like in this post.

    These days, I work for big corporate, using commercial tools for the same tasks. But writing about this definitely brought back some good memories.

    Anyway, time for bed! It’s late, and you’ve got work tomorrow. Sweet dreams! 🥰😴

    Have another scary squid man monster that didn’t make featured, buh-byeee 👋

  • From Typos to Treason: The Dangerous Fun of Government Domain Squatting

    From Typos to Treason: The Dangerous Fun of Government Domain Squatting

    Hey there 👋 Since you’re reading this, chances are you’ve got some chaos brewing in your brain—I love it.

    For legal reasons I must kindly ask you to read and actually understand my disclaimer.

    Disclaimer:

    The information provided on this blog is for educational purposes only. The use of hacking tools discussed here is at your own risk.

    For the full disclaimer, please click here.

    Full full disclosure: I did have written permission to do this. And anything I didn’t have written permission for is wildly exaggerated fiction—pure imagination, no receipts, no logs, nothing but brain static.

    Now, another fair warning: this post is about to get particularly hairy. So seriously, do not try this without proper written consent, unless you have an unshakable desire to land yourself in a world of trouble.

    Intro

    I get bored really easily 😪. And when boredom strikes, I usually start a new project. Honestly, the fact that I’m still sticking with this blog is nothing short of a miracle. Could this be my forever project? Who knows—place your bets.

    Anyway, purely by accident, I stumbled across a tool that I immediately recognized as easy mode for typo squatting and bit squatting. The tool itself was kinda trash, but it did spark a deliciously questionable thought in my brain:

    “Can I intercept sensitive emails from government organizations and snatch session tokens and API keys?”

    To keep you on the edge of your seat (and slightly concerned), the answer is: Yes. Yes, I can. And trust me, it’s way worse than you think.

    It’s always the stupidly simple ideas that end up working the best.

    Typosquatting

    Typosquatting, also called URL hijacking, a sting site, a cousin domain, or a fake URL, is a form of cybersquatting, and possibly brandjacking which relies on mistakes such as typos made by Internet users when inputting a website address into a web browser. A user accidentally entering an incorrect website address may be led to any URL, including an alternative website owned by a cybersquatter.

    Wikipedia

    Basically, you register kark.fail, kick back, and wait for people to fat-finger karl.fail — and trust me, they will. Congratulations, you just hijacked some of my traffic without lifting a finger. It’s like phishing, but lazier.

    Bitsquatting

Bitsquatting is a form of cybersquatting which relies on bit-flip errors that occur during the process of making a DNS request. These bit-flips may occur due to factors such as faulty hardware or cosmic rays. When such an error occurs, the user requesting the domain may be directed to a website registered under a domain name similar to a legitimate domain, except with one bit flipped in their respective binary representations.

    Wikipedia

You register a domain that is a single bit off from your target; in my case, you could register “oarl.fail”:

    • ASCII of “k” = 01101011
    • Flipping the third-to-last bit:
    • 01101111 → This corresponds to “o”
• This changes “karl” → “oarl”

Personally, I have had zero success with this, but apparently it still works.

    The Setup

    Now that you know the basics, you’re officially armed with enough knowledge to cause some mild chaos 🎉.

    Here’s what we need to get started:

    • Money – Because sadly, domains don’t buy themselves.
    • A domain registrar account – I use Namecheap
    • Cloudflare account (optional, but much recommended)
    • A server connected to the internet – I use Hetzner (optional but also recommended)

    Getting a Domain

    You should probably know this if you’re planning to hack the government (or, you know, just theoretically explore some questionable cyberspace).

    Step one:

    Follow all the steps on Namecheap—or whichever registrar you fancy. You can probably find one that takes Bitcoin or Monero, if you want.

    For generating typo domains effortlessly, I use ChatGPT:

    Give me the top 5 most common typos english speaking people make for the domain "karl.fail" on a qwerty keyboard.

    ChatGPT does not know .fail is a valid TLD, but you get the point.

    Step two

    Add your domain to Cloudflare—unless, of course, you’re feeling extra ambitious and want to host your own Mailserver and Nameserver. But let’s be real, why suffer?

    Edit the “Nameservers” setting on Namecheap

    Mailserver

    I highly recommend Mailcow, though it might be complete overkill for this—unless your job involves hacking governments. In that case, totally worth it.

    Nameserver

    This is the best tutorial I could find for you—he’s using CoreDNS.

    In my tests, I used Certainly, which built a small authoritative DNS server with this Go library.

    The big perk of running your own nameserver is that you get to log every DNS query to your domain. As many pentesters know, DNS is passive recon—it doesn’t hit the target directly. That’s why you can get away with otherwise noisy tasks, like brute-forcing subdomains via DNS. But if your target runs their own nameserver, they’ll see you poking around.

    I went with a different setup because DNS logs are a mess—super noisy and, honestly, boring. Everyone and their mom ends up enumerating your domain until kingdom come.

    Beware! Different top-level domain organizations have different expectations for name servers. I ran into some trouble with the .de registry, DENIC—they insisted I set up two separate nameservers on two different IPs in two different networks. Oh, and they also wanted pretty SOA records before they’d even consider my .de domains.

    Save yourself the headache—double-check the requirements before you spend hours wrecking yourself.

    Hetzner Server

    Any server, anywhere, will do—the goal is to host a web server of your choice and capture all the weblogs. I’ll be using Debian and Caddy for this.

    The cheapest server on Hetzner

    We’ll be building our own Caddy with the Cloudflare plugin because I couldn’t get wildcard certificates to work without it. Plus, I always use Cloudflare (❤️ you guys).

    Installation of Go (current guide):

    sudo apt update && sudo apt upgrade -y
    wget https://go.dev/dl/go1.23.5.linux-amd64.tar.gz
    rm -rf /usr/local/go && tar -C /usr/local -xzf go1.23.5.linux-amd64.tar.gz
    export PATH=$PATH:/usr/local/go/bin
    echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.profile
    source ~/.profile

    Build Caddy with Cloudflare-DNS

    The official guide is here.

    go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
    sudo mv ~/go/bin/xcaddy /usr/local/bin/
    xcaddy build --with github.com/caddy-dns/cloudflare
    sudo mv caddy /usr/local/bin/
    caddy version

    Getting a Cloudflare API Key

To get the API key, just follow the Cloudflare docs. I set mine with these permissions:

    All zones - Zone:Read, SSL and Certificates:Edit, DNS:Edit

    Here is also the official page for the Cloudflare-DNS Plugin.

# the variable name must match {env.CLOUDFLARE_API_TOKEN} in the Caddyfile below
export CLOUDFLARE_API_TOKEN="your_cloudflare_api_token"
echo 'CLOUDFLARE_API_TOKEN="your_cloudflare_api_token"' | sudo tee /etc/default/caddy > /dev/null

    Caddyfile

    I am using example domains!

    (log_requests) {
    	log {
    		output file /var/log/caddy/access.log
    		format json
    	}
    }
    
    karlkarlkarl.de, *.karlkarlkarl.de {
    	import log_requests
    
    	tls {
    		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    	}
    
    	header Content-Type "text/html"
    	respond "Wrong!" 200
    }
    
    karlkarl.de, *.karlkarl.de {
    	import log_requests
    
    	tls {
    		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    	}
    
    	header Content-Type "text/html"
    	respond "Wrong!" 200
    }
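Before wiring this into systemd, it does not hurt to let Caddy check the file itself; a quick sketch, assuming the Caddyfile lives at /etc/caddy/Caddyfile:

caddy validate --config /etc/caddy/Caddyfile --adapter caddyfile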
    

    Running Caddy as a service

    nano /etc/systemd/system/caddy.service
    [Unit]
    Description=Caddy Web Server
    After=network.target
    
    [Service]
    User=caddy
    Group=caddy
ExecStart=/usr/local/bin/caddy run --config /etc/caddy/Caddyfile --adapter caddyfile
    EnvironmentFile=/etc/default/caddy
    AmbientCapabilities=CAP_NET_BIND_SERVICE
    Restart=always
    RestartSec=5s
    LimitNOFILE=1048576
    
    [Install]
    WantedBy=multi-user.target
systemctl daemon-reload
systemctl start caddy
systemctl enable caddy
systemctl status caddy

Everything should work if you closely followed the steps up until now. If not, check the caddy.service file and the Caddyfile. To check the logs, use:

    journalctl -u caddy --no-pager -n 50 -f

    Just a heads-up—Caddy automatically redacts credentials in its logs, and getting it to not do that is kind of a pain.

    {"level":"info","ts":1738162687.1416154,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_ip":"1.0.0.1","remote_port":"62128","client_ip":"1.0.0.1","proto":"HTTP/1.1","method":"GET","host":"api.karlkarlkarl.de","uri":"/api/resource","headers":{"User-Agent":["curl/8.7.1"],"Authorization":["REDACTED"],"Accept":["application/json"]}},"bytes_read":0,"user_id":"","duration":0.000052096,"size":0,"status":308,"resp_headers":{"Connection":["close"],"Location":["https://api.karlkarlkarl.de/login"],"Content-Type":[],"Server":["Caddy"]}}
    "Authorization":["REDACTED"]

Lame for us 😒. If you want more control over logging, you can use any other server or even build your own. One day I might add this as a feature to my Node-RED-Team stack, including automatic Cloudflare settings via the API: just add a domain and go.

    As I mentioned earlier, I had permission for this, and my scope didn’t allow me to grab actual credentials since they belonged to third parties using the service.

    The most interesting things in these logs:

    • Credentials
    • IP addresses
    • Paths
    • Subdomains
    • Cookies and tokens

    That should be more than enough to hijack a session and dig up even more data—or at the very least, get some freebies.

    Cloudflare – DNS & Mail

    DNS

    We’ll add some wildcard DNS records so that all subdomains get routed to our server—because let’s be real, we don’t know all the subdomains of our target.

Example of wildcard DNS; best to set both a normal A record and a wildcard A record. Point them to your IP.

    It’s almost as good as having your own nameserver. Plus, Cloudflare gives you a ton of DNS logs. Sure, you won’t get all of them like you would with your own setup, but honestly… I don’t really care that much about DNS logs anyway.
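A quick way to verify the wildcard record is to resolve a made-up subdomain of one of the example domains; both queries should answer (with your server’s IP, or with Cloudflare edge IPs if the records are proxied):

dig +short A karlkarlkarl.de
dig +short A doesnotexist.karlkarlkarl.de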

SSL/TLS Settings in Cloudflare

Make sure your SSL/TLS setting in Cloudflare is set to “Full (strict)”, otherwise Caddy and Cloudflare will get stuck in a redirect loop, and it will take you forever to figure out that this is the issue, which will annoy you quite a bit.
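Since I mentioned automating Cloudflare settings via the API earlier: the SSL mode can be flipped with a single call. A sketch; the zone ID is a placeholder and your token needs the matching zone permissions:

curl -X PATCH "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/settings/ssl" \
  -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"value":"strict"}'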

    Email

    Set up email routing through Cloudflare—it’s easy, just two clicks. Then, you’ll need a catch-all email rule and a destination address.

    This will forward all emails sent to the typo domain straight to your chosen domain.

    Catch-All Email rule in Cloudflare Email Settings

    You could set up your own mail server to do the same thing, which gives you more control over how emails are handled. But for my POC, I didn’t need the extra hassle.

I should mention that, using n8n, I set up an email flow to notify people that they had sent their mail to the wrong address and that it was not delivered:

    This post is already getting pretty long, so I might do a separate one about n8n another time. For now, just know that people were notified when they sent mail to the wrong address, and their important messages were delivered into the void.

    Profit

    By “profit,” I’m, of course, making a joke about the classic Step 1 → Step 2 → Step 3 → Profit meme—not actual profit. That would be illegal under American law, so let’s keep things legal and fun. Just thought I’d clarify 🫡.

    Now, you wait. Check the logs now and then, peek at the emails occasionally. Like a fisherman (or fisherwoman), you sit back and see what bites.

    How long does it take? Well, that depends on how good your typo is and how popular your target is—could be minutes, could be days.

    For me, I was getting around 10-15 emails per day. The weblogs are mostly just people scanning the crap out of my server.

    Email stats of the first 2 days for one of the domains (I hold 14)

    Conclusion

I bought 14 domains with the most common typos for my target and ended up catching around 400 emails in a month—containing some of the most devastating info you could imagine.

    I’m talking government documents, filled-out contracts, filed reports. I got people’s birth certificates, death certificates, addresses, signatures—you name it.

    Think about it—when you email a government office, they already know everything about you, so you don’t think twice about sending them paperwork, right? Well… better triple-check that email address before you hit send, or guess what? It’s mine now.

    As for weblogs, their real value comes in when a developer is testing a tool and mistypes a public domain. I didn’t manage to snag any API keys, but I guarantee that if your target has public APIs or a sprawling IT infrastructure, credentials will slip through eventually.

    Defense

    The only real defense is to buy all the typo domains before the bad guys do. There are services that specialize in this—if you’ve got the budget, use them.

    If you can’t buy them, monitor them. Plenty of commercial tools can do this, or you can build your own. The easiest DIY approach would be to use dnstwist to generate typo variations and check WHOIS records or dig to see if anyone has registered them.
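A minimal DIY monitoring sketch along those lines; dnstwist flag names as I remember them, so double-check against its help output:

# list typo/bitflip candidates for your domain that are already registered
dnstwist --registered --format csv karl.fail
# or dig into a single suspicious candidate by hand
whois kark.fail
dig +short NS kark.fail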

    Typo domains aren’t just used for passive logging—people also host malicious content and phishing campaigns on them. That said, those methods get caught pretty fast. The approach I showed you is much more silent and in my opinion, dangerous. It doesn’t set off alarms right away.

    Also, don’t bother scanning for typo domains with MX records—most registrars have catch-all rules, so that’s a dead end.

Domains are dirt cheap compared to the damage I could do if I decided to leak this to the press, extort people, or trick them into giving me money. You instantly gain trust because the emails you receive usually say things like “As we just discussed over the phone…” or contain entire ongoing conversations.

    This whole setup takes about an hour and costs maybe 50 bucks for some domains.

    Anyway, thanks for reading. Good night, sleep tight, and don’t let the bed bugs bite.

    Love you 😘

  • The Privacy-Friendly Mail Parser You’ve Been Waiting For

    The Privacy-Friendly Mail Parser You’ve Been Waiting For

    As you may or may not know (but now totally do), I have another beloved website, Exploit.to. It’s where I let my inner coder run wild and build all sorts of web-only tools. I’ll save those goodies for another project post, but today, we’re talking about my Mail Parser—a little labor of love born from frustration and an overdose of caffeine.

    See, as a Security Analyst and incident responder, emails are my bread and butter. Or maybe my curse. Parsing email headers manually? It’s a one-way ticket to losing your sanity. And if you’ve ever dealt with email headers, you know they’re basically the Wild West—nobody follows the rules, everyone’s just slapping on whatever they feel like, and chaos reigns supreme.

    The real kicker? Every single EML parser out there at the time was server-side. Let me paint you a picture: you, in good faith, upload that super-sensitive email from your mom (the one where she tells you your laundry’s done and ready for pick-up) to some rando’s sketchy server. Who knows what they’re doing with your mom’s loving words? Selling them? Training an AI to perfect the art of passive-aggressive reminders? The horror!

    So, I thought, “Hey, wouldn’t it be nice if we had a front-end-only EML parser? One that doesn’t send your personal business to anyone else’s server?” Easy peasy, right? Wrong. Oh, how wrong I was. But I did it anyway.

    You can find the Mail Parser here and finally parse those rogue headers in peace. You’re welcome.

    Technologies

    • React: Handles the user interface and dynamic interactions.
    • Astro.js: Used to generate the static website efficiently. (technically not needed for this project)
    • TailwindCSS: For modern and responsive design.
    • ProtonMail’s jsmimeparser: The core library for parsing email headers.

    When I first approached this project, I tried handling email header parsing manually with regular expressions. It didn’t take long to realize how complex email headers have become, with an almost infinite variety of formats, edge cases, and inconsistencies. Regex simply wasn’t cutting it.

    That’s when I discovered ProtonMail’s jsmimeparser, a library purpose-built for handling email parsing. It saved me from drowning in parsing logic and ensured the project met its functional goals.

    Sharing the output of this tool without accidentally spilling personal info all over the place is kinda tricky. But hey, I gave it a shot with a simple empty email I sent to myself:

    The Code

    As tradition dictates, the code isn’t on GitHub but shared right here in a blog post 😁.

    Kidding (sort of). The repo is private, but no gatekeeping here—here’s the code:

    mailparse.tsx
    import React, { useState } from "react";
    import { parseMail } from "@protontech/jsmimeparser";
    
    type Headers = {
      [key: string]: string[];
    };
    
    const MailParse: React.FC = () => {
      const [headerData, setHeaderData] = useState<Headers>({});
      const [ioc, setIoc] = useState<any>({});
    
      function extractEntitiesFromEml(emlContent: string) {
        const ipRegex =
          /\b(?:\d{1,3}\.){3}\d{1,3}\b|\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b/g;
        const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
        const urlRegex = /(?:https?|ftp):\/\/[^\s/$.?#].[^\s]*\b/g;
        const htmlTagsRegex = /<[^>]*>/g; // Regex to match HTML tags
    
        // Match IPs, emails, and URLs
        const ips = Array.from(new Set(emlContent.match(ipRegex) || []));
        const emails = Array.from(new Set(emlContent.match(emailRegex) || []));
        const urls = Array.from(new Set(emlContent.match(urlRegex) || []));
    
        // Remove HTML tags from emails and URLs
        const cleanEmails = emails.map((email) => email.replace(htmlTagsRegex, ""));
        const cleanUrls = urls.map((url) => url.replace(htmlTagsRegex, ""));
    
        return {
          ips,
          emails: cleanEmails,
          urls: cleanUrls,
        };
      }
    
      function parseDKIMSignature(signature: string): Record<string, string> {
        const signatureParts = signature.split(";").map((part) => part.trim());
        const parsedSignature: Record<string, string> = {};
    
    for (const part of signatureParts) {
      if (!part) continue; // skip empty segments left by a trailing semicolon
      // split only on the first "=" so base64 values (e.g. the b= tag) keep their "=" padding
      const separatorIndex = part.indexOf("=");
      if (separatorIndex === -1) continue;
      const key = part.slice(0, separatorIndex).trim();
      parsedSignature[key] = part.slice(separatorIndex + 1).trim();
    }
    
        return parsedSignature;
      }
    
      const handleFileChange = async (
        event: React.ChangeEvent<HTMLInputElement>
      ) => {
        const file = event.target.files?.[0];
        if (!file) return;
    
        const reader = new FileReader();
        reader.onload = async (e) => {
          const buffer = e.target?.result as ArrayBuffer;
    
          // Convert the buffer to a string
          const bufferArray = Array.from(new Uint8Array(buffer)); // Convert Uint8Array to number[]
          const bufferString = String.fromCharCode.apply(null, bufferArray);
    
          const { attachments, body, subject, from, to, date, headers, ...rest } =
            parseMail(bufferString);
    
          setIoc(extractEntitiesFromEml(bufferString));
          setHeaderData(headers);
        };
    
        reader.readAsArrayBuffer(file);
      };
    
      return (
        <>
          <div className="p-4">
            <h1>Front End Only Mailparser</h1>
            <p className="my-6">
              Have you ever felt uneasy about uploading your emails to a server you
              don't fully trust? I sure did. It's like handing over your private
              correspondence to a stranger. That's why I decided to take matters
              into my own hands.
            </p>
            <p className="mb-8">
              With this frontend-only mail parser, there's no need to worry about
              your privacy. Thanks to{" "}
              <a
                href="https://proton.me/"
                className="text-pink-500 underline dark:visited:text-gray-400 visited:text-gray-500 hover:font-bold after:content-['_↗']"
              >
                ProtonMail's
              </a>{" "}
              <a
                className="text-pink-500 underline dark:visited:text-gray-400 visited:text-gray-500 hover:font-bold after:content-['_↗']"
                href="https://github.com/ProtonMail/jsmimeparser"
              >
                jsmimeparser
              </a>
              , you can enjoy the same email parsing experience right in your
              browser. No more sending your sensitive data to external servers.
              Everything stays safe and secure, right on your own system.
            </p>
    
            <input
              type="file"
              onChange={handleFileChange}
              className="block w-full text-sm text-slate-500
          file:mr-4 file:py-2 file:px-4
          file:rounded-full file:border-0
          file:text-sm file:font-semibold
          file:bg-violet-50 file:text-violet-700
          hover:file:bg-violet-100
        "
            />
    
            {Object.keys(headerData).length !== 0 && (
              <table className="mt-8">
                <thead>
                  <tr className="border dark:border-white border-black">
                    <th>Header</th>
                    <th>Value</th>
                  </tr>
                </thead>
                <tbody>
                  {Object.entries(headerData).map(([key, value]) => (
                    <tr key={key} className="border dark:border-white border-black">
                      <td>{key}</td>
                      <td>{value}</td>
                    </tr>
                  ))}
                </tbody>
              </table>
            )}
          </div>
    
          {Object.keys(ioc).length > 0 && (
            <div className="mt-8">
              <h2>IPs:</h2>
              <ul>
                {ioc.ips && ioc.ips.map((ip, index) => <li key={index}>{ip}</li>)}
              </ul>
              <h2>Emails:</h2>
              <ul>
                {ioc.emails &&
                  ioc.emails.map((email, index) => <li key={index}>{email}</li>)}
              </ul>
              <h2>URLs:</h2>
              <ul>
                {ioc.urls &&
                  ioc.urls.map((url, index) => <li key={index}>{url}</li>)}
              </ul>
            </div>
          )}
        </>
      );
    };
    
    export default MailParse;

    Yeah, I know, it looks kinda ugly as-is—but hey, slap it into VSCode and let the prettifier work its magic.

    Most of the heavy lifting here is courtesy of the library I used. The rest is just some plain ol’ regex doing its thing—filtering for indicators in the email header and body to make life easier for further investigation.

    Conclusion

    Short and sweet—that’s the vibe here. Sometimes, less is more, right? Feel free to use this tool wherever you like—internally, on the internet, or even on a spaceship. You can also try it out anytime directly on my website.

    Don’t trust me? Totally fair. Open the website, yank out your internet connection, and voilà—it still works offline. No sneaky data sent to my servers, pinky promise.

    As for my Astro.js setup, I include the “mailparse.tsx” like this:

    ---
    import BaseLayout from "../../layouts/BaseLayout.astro";
    import Mailparse from "../../components/mailparse";
    ---
    
    <BaseLayout>
      <Mailparse client:only="react" />
    </BaseLayout>

    See you on the next one. Love you, byeeeee ✌️😘

  • KarlGPT – My Push to Freedom

    KarlGPT – My Push to Freedom

    KarlGPT represents my pursuit of true freedom, through AI. I’ve realized that my ultimate life goal is to do absolutely nothing. Unfortunately, my strong work ethic prevents me from simply slacking off or quietly quitting.

    This led me to the conclusion that I need to maintain, or even surpass, my current level of productivity while still achieving my dream of doing nothing. Given the advancements in artificial intelligence, this seemed like a solvable problem.

    I began by developing APIs to gather all the necessary data from my work accounts and tools. Then, I started working on a local AI model and server to ensure a secure environment for my data.

    Now, I just need to fine-tune the entire system, and soon, I’ll be able to automate my work life entirely, allowing me to finally live my dream: doing absolutely nothing.

This is gonna be a highly censored post, as it involves certain details about my work that I cannot legally disclose.

    Technologies

    Django and Django REST Framework (DRF)

    Django served as the backbone for the server-side logic, offering a robust, scalable, and secure foundation for building web applications. The Django REST Framework (DRF) made it simple to expose APIs with fine-grained control over permissions, serialization, and views. DRF’s ability to handle both function-based and class-based views allowed for a clean, modular design, ensuring the APIs could scale as the project evolved.

    Celery Task Queue

    To handle asynchronous tasks such as sending emails, performing background computations, and integrating external services (AI APIs), I implemented Celery. Celery provided a reliable and efficient way to manage long-running tasks without blocking the main application. This was critical for tasks like scheduling periodic jobs and processing user-intensive data without interrupting the API’s responsiveness.

    React with TypeScript and TailwindCSS

    For the frontend, I utilized React with TypeScript for type safety and scalability. TypeScript ensured the codebase remained maintainable as the project grew. Meanwhile, TailwindCSS enabled rapid UI development with its utility-first approach, significantly reducing the need for writing custom CSS. Tailwind’s integration with React made it seamless to create responsive and accessible components.

This is my usual front-end stack, typically paired with Astro.js. I use plain React, no extra framework.

    Vanilla Python

    Due to restrictions that prohibited the use of external libraries in local API wrappers, I had to rely on pure Python to implement APIs and related tools. This presented unique challenges, such as managing HTTP requests, data serialization, and error handling manually. Below is an example of a minimal API written without external dependencies:

    import re
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    
    
# In-memory example data served by the endpoint below
items = {"test": "mewo"}


class ControlKarlGPT(BaseHTTPRequestHandler):
    def do_GET(self):
        # Simple regex-based routing instead of a web framework
        if re.search("/api/helloworld", self.path):
            self.send_response(200)
            self.send_header("Content-type", "application/json")
            self.end_headers()
            response = json.dumps(items).encode()
            self.wfile.write(response)
        else:
            self.send_response(404)
            self.end_headers()


def run(server_class=HTTPServer, handler_class=ControlKarlGPT, port=8000):
        server_address = ("", port)
        httpd = server_class(server_address, handler_class)
        print(f"Starting server on port http://127.0.0.1:{port}")
        httpd.serve_forever()
    
    
    if __name__ == "__main__":
        run()

    By weaving these technologies together, I was able to build a robust, scalable system that adhered to the project’s constraints while still delivering a polished user experience. Each tool played a crucial role in overcoming specific challenges, from frontend performance to backend scalability and compliance with restrictions.

    File based Cache

    To minimize system load, I developed a lightweight caching framework based on a simple JSON file-based cache. Essentially, this required creating a “mini-framework” akin to Flask but with built-in caching capabilities tailored to the project’s needs. While a pull-based architecture—where workers continuously poll the server for new tasks—was an option, it wasn’t suitable here. The local APIs were designed as standalone programs, independent of a central server.

This approach was crucial because some of the tools we integrate lack native APIs or straightforward automation options. By building these custom APIs, I not only solved the immediate challenges of this project (e.g., powering KarlGPT) but also created reusable components for other tasks. These standalone APIs provide a solid foundation for automation and flexibility beyond the scope of this specific system.

    How it works

The first step was to identify the tasks I perform daily and the tools I use for each of them. To automate anything effectively, I needed to abstract these tasks into programmable actions. For example:

    • Read Emails
    • Respond to Invitations
    • Check Tickets

    Next, I broke these actions down further to understand the decision-making process behind each. For instance, when do I respond to certain emails, and how do I determine which ones fall under my responsibilities? This analysis led to a detailed matrix that mapped out every task, decision point, and tool I use.

    The result? A comprehensive, structured overview of my workflow. Not only did this help me build the automation framework, but it also provided a handy reference for explaining my role. If my boss ever asks, “What exactly do you do here?” I can present the matrix and confidently say, “This is everything.”

    As you can see, automating work can be a lot of work upfront—an investment in reducing effort in the future. Ironically, not working requires quite a bit of work to set up! 😂

    The payoff is a system where tasks are handled automatically, and I have a dashboard to monitor, test, and intervene as needed. It provides a clear overview of all ongoing processes and ensures everything runs smoothly:

    AI Magic: Behind the Scenes

    The AI processing happens locally using Llama 3, which plays a critical role in removing all personally identifiable information (PII) from emails and text. This is achieved using a carefully crafted system prompt fine-tuned for my specific job and company needs. Ensuring sensitive information stays private is paramount, and by keeping AI processing local, we maintain control over data security.
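
I can't share the actual system prompt, but the mechanics are simple: the text is sent to the local Ollama API together with a scrubbing instruction, and only the sanitized result moves on. Here is a minimal sketch of that call, assuming a default Ollama install serving Llama 3 on localhost (the prompt and sample text are placeholders, not my real ones):

Bash
# Ask the local model to strip PII before anything leaves the machine
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "system": "Remove all personally identifiable information from the user text. Replace names, emails and phone numbers with placeholders. Return only the sanitized text.",
  "prompt": "Hi, this is Karl Example (karl@example.com), please call me at +49 170 1234567.",
  "stream": false
}'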

    In most cases, the local AI is fully capable of handling the workload. However, for edge cases where additional computational power or advanced language understanding is required, Claude or ChatGPT serve as backup systems. When using cloud-based AI, it is absolutely mandatory to ensure that no sensitive company information is disclosed. For this reason, the system does not operate in full-auto mode. Every prompt is reviewed and can be edited before being sent to the cloud, adding an essential layer of human oversight.

    To manage memory and task tracking, I use mem0 in conjunction with a PostgreSQL database, which acts as the system’s primary “brain” 🧠. This database, structured using Django REST Framework, handles everything from polling for new tasks to storing results. This robust architecture ensures that all tasks are processed efficiently while maintaining data integrity and security.

    Conclusion

    Unfortunately, I had to skip over many of the intricate details and creative solutions that went into making this system work. One of the biggest challenges was building APIs around legacy tools that lack native automation capabilities. Bringing these tools into the AI age required innovative thinking and a lot of trial and error.

    The preparation phase was equally demanding. Breaking down my daily work into a finely detailed matrix took time and effort. If you have a demanding role, such as being a CEO, it’s crucial to take a step back and ask yourself: What exactly do I do? A vague answer like “represent the company” won’t cut it. To truly understand and automate your role, you need to break it down into detailed, actionable components.

    Crafting advanced prompts tailored to specific tasks and scenarios was another key part of the process. To structure these workflows, I relied heavily on frameworks like CO-START and AUTOMAT (stay tuned for an upcoming blog post about these).

    I even created AI personas for the people I interact with regularly and designed test loops to ensure the responses generated by the AI were accurate and contextually appropriate. While I drew inspiration from CrewAI, I ultimately chose LangChain for most of the complex workflows because its extensive documentation made development easier. For simpler tasks, I used lightweight local AI calls via Ollama.

    This project has been an incredible journey of challenges, learning, and innovation. It is currently in an early alpha stage, requiring significant manual intervention. Full automation will only proceed once I receive explicit legal approval from my employer to ensure compliance with all applicable laws, company policies, and data protection regulations.

    Legal Disclaimer: The implementation of any automation or AI-based system in a workplace must comply with applicable laws, organizational policies, and industry standards. Before deploying such systems, consult with legal counsel, relevant regulatory bodies, and your employer to confirm that all requirements are met. Unauthorized use of automation or AI may result in legal consequences or breach of employment contracts. Always prioritize transparency, data security, and ethical considerations when working with sensitive information.

  • Locking Down the Web: How I Secure My WordPress and Other Self-Hosted Public Sites

    Locking Down the Web: How I Secure My WordPress and Other Self-Hosted Public Sites

Securing a WordPress hosting setup requires more than just the basics—it’s about creating a layered defense to protect your server and adapt to emerging threats. Today, I am going to show you what I do to keep Karlcom-hosted systems secure from outside attackers.

    Firewall Restriction

    To minimize exposure, my server only accepts traffic from Cloudflare’s IP ranges and only on port 443. This ensures that attackers cannot directly access my server’s IP address, significantly reducing the attack surface.

On my firewall, it looks like this:

• One rule to allow Cloudflare’s IP ranges
• One rule to let my server’s outbound traffic come back in from the internet
• One block-all rule for anything else

    This works pretty well so far.
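
My firewall rules live in a web GUI, so there is nothing to copy-paste from there, but the same idea can be scripted. Here is a rough sketch with iptables, assuming a plain Linux host terminating HTTPS directly (Cloudflare publishes its current ranges at www.cloudflare.com/ips-v4 and ips-v6):

Bash
#!/bin/bash
# Allow HTTPS only from Cloudflare's published IPv4 ranges
for range in $(curl -fsS https://www.cloudflare.com/ips-v4); do
    iptables -A INPUT -p tcp --dport 443 -s "$range" -j ACCEPT
done

# Let replies to my server's own outbound connections back in
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Block everything else
iptables -P INPUT DROP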

    Cloudflare’s Web Application Firewall (WAF)

    I leverage Cloudflare’s free WAF to filter out malicious traffic before it reaches my server. It’s an effective first line of defense that helps block known attack patterns and suspicious behavior.

    Here you can find some more Information about it.

I felt kind of weird sharing my WAF rules here, since people reading this could use them to craft scans that get around them. But I figured I’m up for the challenge, so let’s go:

    (starts_with(http.request.full_uri, "https://karl.fail//xmlrpc.php")) or (starts_with(http.request.full_uri, "https://karl.fail/xmlrpc.php")) or (ends_with(http.request.uri, "xmlrpc.php")) or (http.request.full_uri contains "setup-config.php") or (http.request.full_uri contains "wp-admin/install.php") or (http.request.uri.path wildcard r"//*")

This is pretty WordPress-specific. I know you could set these rules on your reverse proxy or on the WordPress server itself, but I figured letting Cloudflare handle them with their admittedly much more powerful infrastructure, and taking some steam off my own server, would be a good thing to do.
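
If you want to check that a rule actually bites, a quick curl against one of the blocked paths should come back with a Cloudflare block (usually a 403) instead of ever reaching your origin. A tiny sanity check, using my domain as the example:

Bash
# These should be stopped at Cloudflare's edge, not by my server
curl -s -o /dev/null -w "%{http_code}\n" https://karl.fail/xmlrpc.php
curl -s -o /dev/null -w "%{http_code}\n" https://karl.fail/wp-admin/install.php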

    EDIT:

While writing this post, the attacks changed a little, and I got some really annoying scans from IP ranges that all came from Russia. So I ended up Rick Rolling all Russian IPs trying to get through to my home network. Nothing personal.

    Continuous Monitoring with Grafana Labs Loki

    Despite these measures, some scanners and attackers still manage to slip through. To address this, I use Grafana Labs Loki to analyze server logs. By identifying suspicious activity or unusual access paths, I can create new Cloudflare WAF rules to block emerging threats proactively.
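
Most of this happens in the Grafana UI, but the queries themselves are plain LogQL. Here is the kind of thing I look for, sketched with Loki’s logcli CLI and a made-up job label (yours will differ):

Bash
# Requests for typical WordPress scan targets over the last 24 hours
logcli query --since=24h '{job="nginx-proxy-manager"} |~ "xmlrpc.php|wp-login.php|setup-config.php"'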

    Here you can see some scans from the outside that made it through. I have since updated the WAF rules to block them as well.

    Updates

    As I mentioned in my post about backing up data, I automate the updates for all my LXCs, VMs, and container images. While this approach does carry the risk of introducing breaking changes, the time and effort saved by automating these updates outweigh the potential downsides for me at this stage. Manual maintenance just isn’t practical for my setup right now.

    Since I do daily backups I can recover real fast.
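
For reference, the automation boils down to something like this on a Debian LXC and on container hosts; treat it as a sketch rather than my exact setup:

Bash
# Debian/Ubuntu: enable unattended security updates
apt install -y unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades

# Container hosts: let Watchtower pull new images and restart containers
docker run -d --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower --cleanup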

    The Cycle of Security

    This process of monitoring, analyzing, and refining creates an ongoing cycle of security improvements. It’s a proactive and dynamic approach that keeps my server well-protected against evolving threats.

    If you’re using a similar setup or have additional tips for securing WordPress hosting, I’d love to hear your thoughts. Sharing strategies and experiences is one of the best ways to stay ahead of attackers.

    That said, I’m genuinely curious if any attackers reading this will now take it as a challenge to get around my defenses. For that very reason, I stay vigilant, regularly auditing my Grafana logs at home. Security is a constant effort, and in my case, we have SIEM at home, son!

  • My Home Server: “PrettyLittleKitten” – A Personal Tech Haven

    My Home Server: “PrettyLittleKitten” – A Personal Tech Haven

    Hardware

    RAM

    My server is equipped with a total of 128GB of DDR5 RAM, made up of two Kingston FURY Beast kits (each consisting of 2 x 32GB, 6000 MHz, DDR5-RAM, DIMM).

The RAM operates at around 3600 MHz, and roughly 32GB of it is in active use at any given time:

    Cooling

I kept the case fans as they were and opted for an all-in-one liquid cooling solution for the CPU: the Arctic Liquid Freezer III – 280. No particular reason, really—I just thought it was a cool choice (pun intended).

    PC Case

    This setup was originally intended to be a gaming-only PC, so I chose a sleek and clean-looking case: the Fractal Design North XL. While it’s an aesthetically pleasing choice, the one downside for use as a server is its limited storage capacity.

    CPU

    I chose the AMD Ryzen 7 7800X3D (AM5, 4.20 GHz, 8-Core), which is fantastic for gaming. However, as a server for my needs, I regret that it doesn’t have a better built-in GPU. Intel’s iGPUs are far superior for media transcoding, and using an integrated GPU instead of an external one would save a significant amount of energy.

    GPU

    I do have a dedicated GPU, the ASUS TUF Gaming AMD Radeon RX 7900 XTX OC Edition 24GB, which I chose primarily for its massive VRAM. This allows me to run larger models locally without any issues. However, when it comes to media transcoding, AMD GPUs fall short compared to other options, as highlighted in the Jellyfin – Selecting Appropriate Hardware guide.

    Mainboard

    I chose the MSI MAG B650 TOMAHAWK WIFI (AM5, AMD B650, ATX) as it seemed like a great match for the CPU. However, my GPU is quite large and ends up covering the only other PCI-E x16 slot. This limits my ability to install a decent hardware RAID card or other large expansion cards.

    Storage

    For the main OS, I selected the WD_BLACK SN770 NVMe SSD 2 TB, providing fast and reliable performance. To handle backups and media storage, I added a Seagate IronWolf 12 TB (3.5”, CMR) drive.

    For fast and redundant storage, I set up a ZFS mirror using two Intenso Internal 2.5” SSD SATA III Top, 1 TB drives. This setup ensures that critical data remains safe and accessible.

    Additionally, I included an external Samsung Portable SSD T7, 1 TB, USB 3.2 Gen.2 for extra media storage, rounding out the setup.

    Software

    For my main OS, I stick to what I know best—Proxmox. It’s absolutely perfect for home or small business servers, offering flexibility and reliability in a single package.

    I run a variety of services on it, and the list tends to evolve weekly. Here’s what I’m currently hosting:

    • Nginx Proxy Manager: For managing reverse proxies.
    • n8n: Automation tool for workflows.
    • Bearbot: A production-grade Django app.
    • Vaultwarden: A lightweight password manager alternative.
    • MySpeed: Network speed monitoring.
    • Another Nginx Proxy Manager: Dedicated to managing public-facing apps.
    • Code-Server: A browser-based IDE for developing smaller scripts.
    • Authentik: Single-Sign-On (SSO) solution for all local apps.
    • WordPress: This blog is hosted here.
    • Logs: A comprehensive logging stack including Grafana, Loki, Rsyslog, Promtail, and InfluxDB.
    • Home Assistant OS: Smart home management made easy.
    • Windows 11 Gaming VM: For gaming and other desktop needs.
    • Karlflix: A Jellyfin media server paired with additional tools to keep my media library organized.

    And this list is far from complete—there’s always something new to add or improve!

    Performance

    The core allocation may be displayed incorrectly, but otherwise, this is how my setup looks:

    Here’s the 8-core CPU usage over the last 7 days. As you can see, there’s plenty of headroom, ensuring the system runs smoothly even with all the services I have running:

    Energy costs for my server typically range between 20-25€ per month, but during the summer months, I can run it at 100% capacity during the day using the solar energy generated by my panels. My battery also helps offset some of the power usage during this time.

    Here’s a solid representation of the server’s power consumption:

    I track everything in my home using Home Assistant, which allows me to precisely calculate the energy consumption of each device, including my server. This level of monitoring ensures I have a clear understanding of where my energy is going and helps me optimize usage effectively.

    Conclusion

Hosting a server locally is a significant investment—both in terms of hardware and energy costs. My setup cost €2405, and I spend about €40 per month on running it, including energy, domains, and backup services. While my solar panels make running the server almost free during summer, winter energy costs can be a challenge.

    That said, hosting locally has its advantages. It provides complete control over my data, excellent performance, and the flexibility to upgrade or downgrade hardware as needed. These benefits outweigh the trade-offs for me, even though the energy consumption is higher compared to a Raspberry Pi or Mini-PC.

    I could have gone a different route. A cloud server, or even an alternative like the Apple Mac Mini M4, might have been more efficient in terms of cost and power usage. However, I value upgradability and privacy too much to make those sacrifices.

    This setup wasn’t meticulously planned as a server from the start—it evolved from a gaming PC that was sitting unused. Instead of building a dedicated server from scratch or relying on a Mini-PC and NAS combination, I decided to repurpose what I already had.

    Sure, there are drawbacks. The fans are loud, energy costs add up, and it’s far from the most efficient setup. But for me, the flexibility, control, and performance make it worthwhile. While hosting locally might not be the perfect solution for everyone, it’s the right choice for my needs—and I think that’s what really matters.

  • Inception-Level Data Safety: Backing Up Your Proxmox Backups with Borg on Hetzner

    Inception-Level Data Safety: Backing Up Your Proxmox Backups with Borg on Hetzner

    Today, I want to walk you through how I handle backups for my home server. My primary method is using Proxmox’s built-in backup functionality, which I then sync to a Hetzner Storage Box for added security.

    When it comes to updates, I like to live on the edge. I enable automatic (security) updates on nearly all of my systems at home using UnattendedUpgrades. For containers, I usually deploy a Watchtower instance to keep them updated automatically. While this approach might make some people nervous—fearing a broken system after an update—I don’t sweat it. I back up daily and don’t run any mission-critical systems at home (except for this blog, of course 😉).

    For specific files or directories, like Vaultwarden, I take an extra layer of precaution by creating additional backups within the LXC container itself. These backups are synced to a Nextcloud instance I also manage through Hetzner, but in a different datacenter. Hetzner’s “Storage Shares” offer a great deal—€5 gets you 1TB of managed Nextcloud storage. While not the fastest, they’re reliable enough for my needs.

    I won’t dive into the details here, but my approach for these backups is pretty straightforward: I use ZIP files and rclone to upload everything to Nextcloud.

    Here is my script, maybe it helps you in some way:

    #!/bin/bash
    
    # Variables
    BITWARDEN_DIR="/root/bitwarden"
    BACKUP_DIR="/root/bitwarden-backup"
    NEXTCLOUD_REMOTE="nextcloud:Vaultwarden"
    TIMESTAMP=$(date '+%Y%m%d-%H%M')
    
    # Ensure backup directory exists
    mkdir -p $BACKUP_DIR
    
    # Create a single tarball of the entire Vaultwarden directory
    echo "Creating a full backup of the Vaultwarden directory..."
    tar -czvf $BACKUP_DIR/vaultwarden_full_backup-${TIMESTAMP}.tar.gz -C $BITWARDEN_DIR .
    
    # Sync the backup to Nextcloud
    echo "Uploading backup to Nextcloud..."
    rclone copy $BACKUP_DIR $NEXTCLOUD_REMOTE
    
    # Clean up local backup directory
    echo "Cleaning up local backups..."
    rm -rf $BACKUP_DIR
    
    echo "Backup completed successfully!"

    Basically, all you need to do is create an App Password and follow the Rclone guide for setting up with WebDAV. It’s straightforward and works seamlessly for this kind of setup.
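
For the curious, the non-interactive version of that setup looks roughly like this; the URL and names are placeholders, so point them at your own Nextcloud:

Bash
# Create a WebDAV remote called "nextcloud" (matches NEXTCLOUD_REMOTE in the script above)
rclone config create nextcloud webdav \
    url https://cloud.example.com/remote.php/dav/files/YOUR_USER \
    vendor nextcloud \
    user YOUR_USER \
    pass 'your-app-password' --obscure

# Quick sanity check: list the top-level folders
rclone lsd nextcloud: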

    Backups in Proxmox

Proxmox makes backups incredibly simple with its intuitive functionality. I back up pretty much everything—except for my Gaming VM. It’s a Windows 11 experiment where I’ve passed through my AMD RX 7900 XTX for gaming. Ironically, instead of gaming, I end up spending more time tweaking backups and writing about them. Let’s just say that gaming setup hasn’t exactly gone as planned.

    I rely on Snapshot mode for my backups, and you can explore all its features and settings right here. As I mentioned earlier, I tend to restore backups more frequently than most people, and I’ve never faced any issues with inconsistencies. It’s been consistently reliable for me!

    For retention, I keep it straightforward by saving only the last two backups. Since I also back up my backups (as you’ll see later), this minimalist approach is more than sufficient for my needs and saves me some space.

    I left the rest of the settings as they are. The note templates are useful if you’re managing a large or multiple instances, but for my setup, I don’t use them.

    Trigger warning: For now, I’m storing these backups on a single internal Seagate IronWolf (12 TB). I know, not ideal. These drives are pretty pricey, but one day I plan to add another and set up a ZFS mirror or RAID for better redundancy. For now, I’m relying on this one drive—fingers crossed, it’s been rock solid so far!

    Borg(Backup)

    The first thing I heard when I proudly told my friends that I was finally taking the golden 3-2-1 backup rule seriously was: “Why not restic?”

    The simple answer? I Googled “backups to Hetzner Storage Box,” and the first result was an article explaining exactly what I wanted to do—using Borg 🤷‍♂️. Before I even considered trying restic, I had already set up encrypted incremental backups with Borg. Feel free to share what you use and why you might have switched, but for now, this setup works perfectly for me!

    Hetzner Storage Box

    Just to clarify, I’m not talking about Hetzner Storage Share 😁. I’m using their 5TB Storage Box and opted for Finland 🇫🇮 as the location since I already have other Karlcom-related stuff in their German datacenter. It helps keep things spread out a bit!

    Essentially, it’s a big, affordable storage backend with multiple options for uploading data. You could mount it using the “Samba/CIFS” option, but I decided against that. Instead, I went with a more secure SSH connection to send my backups there.

    Setup

    First, you’ll need to upload your SSH key to the Hetzner Storage Box. You can follow this step by step guide.

    Once that’s done, the next step is to install and Configure BorgBackup, which you can also follow the simple guide I linked to.
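
In short, once the key is on the Storage Box, the Borg side is only a couple of commands. A sketch using the same placeholder user and repo path as my script below:

Bash
# Install BorgBackup on the Proxmox host (Debian-based)
apt install -y borgbackup

# Initialise an encrypted repository on the Storage Box (SSH listens on port 23 there)
borg init --encryption=repokey "ssh://u123456@u123456.your-storagebox.de:23/home/backups/central"

# Export the repo key somewhere safe; without it and the passphrase, restores are impossible
borg key export "ssh://u123456@u123456.your-storagebox.de:23/home/backups/central" /root/borg-key-backup.txt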

    I know, it seems like you came here just to find links to set this up somewhere else. But don’t worry—I’ve got some cool stuff to share with you next. Here’s my backup script:

    /usr/local/bin/proxmox_borg_backup.sh
    #!/bin/bash
    
    # Variables
BORG_REPO="ssh://u123456@u123456.your-storagebox.de:23/home/backups/central"
    
    BORG_PASSPHRASE=''
    BACKUP_SOURCE="/mnt/pve/wd_hdd_internal/dump"                               
    LOG_FILE="/var/log/proxmox_borg_backup.log"                                 
    MAX_LOG_SIZE=10485760
    RID=`uuidgen`
    CHECK_ID="ggshfo8-9ca6-1234-1234-326571681"
    
    # start
    curl -fsS -m 10 --retry 5 "https://ping.yourdomain.de/ping/$CHECK_ID/start?rid=$RID"
    
    # Export Borg passphrase
    export BORG_PASSPHRASE
    
    # Rotate log file if it exceeds MAX_LOG_SIZE
    if [ -f "$LOG_FILE" ] && [ $(stat -c%s "$LOG_FILE") -gt $MAX_LOG_SIZE ]; then
        mv "$LOG_FILE" "${LOG_FILE}_$(date +"%Y-%m-%d_%H-%M-%S")"
        touch "$LOG_FILE"
    fi
    
    # Check for BorgBackup installation
    if ! command -v borg &> /dev/null; then
        echo "ERROR: BorgBackup is not installed or not in PATH." >> "$LOG_FILE"
        exit 1
    fi
    
    # Check for SSH connection
if ! ssh -q -o BatchMode=yes -o ConnectTimeout=5 -p 23 -i ~/.ssh/backup u123456@u123456.your-storagebox.de exit; then
        echo "ERROR: Unable to connect to Borg repository." >> "$LOG_FILE"
        exit 1
    fi
    
    # Logging start time
    {
      echo "==== $(date +"%Y-%m-%d %H:%M:%S") Starting Proxmox Backup ===="
    
      # Check if the backup source exists
      if [ ! -d "$BACKUP_SOURCE" ]; then
          echo "ERROR: Backup source directory $BACKUP_SOURCE does not exist!"
          exit 1
      fi
    
      # Create a new Borg backup
      echo "Creating Borg backup..."
      borg create --stats --compression zstd \
          "$BORG_REPO::backup-{now:%Y-%m-%d}" \
          "$BACKUP_SOURCE" >> "$LOG_FILE" 2>&1
    
    
      if [ $? -ne 0 ]; then
          echo "ERROR: Borg backup failed!"
          exit 1
      fi
    
      # Prune old backups to save space
      echo "Pruning old backups..."
      borg prune --stats \
          --keep-daily=7 \
          --keep-weekly=4 \
          --keep-monthly=6 \
          "$BORG_REPO"
    
      if [ $? -ne 0 ]; then
          echo "ERROR: Borg prune failed!"
          exit 1
      fi
    
      echo "==== $(date +"%Y-%m-%d %H:%M:%S") Proxmox Backup Completed ===="
    } >> "$LOG_FILE" 2>&1
    
    # finished
    curl -fsS -m 10 --retry 5 "https://ping.yourdomain.de/ping/$CHECK_ID?rid=$RID"

    The curl requests at the top and bottom of the script are for my Healthchecks.io instance—I even wrote a blog post about it here.

    Before moving on, you should definitely test this script. Depending on the size of your setup, the initial backup could take several hours. However, if it doesn’t fail within the first 10 seconds, that’s usually a good sign. To be sure it’s running smoothly, check the log file to confirm it started correctly:

    /var/log/proxmox_borg_backup.log
    ==== 2025-01-10 01:39:07 Starting Proxmox Backup ====
    Creating Borg backup...
    ------------------------------------------------------------------------------
Repository: ssh://u123456@u123456.your-storagebox.de:23/home/backups/central
    Archive name: backup-2025-01-10
    Archive fingerprint: z724gf2789hgf972hf9uh...
    Time (start): Fri, 2025-01-10 01:39:08
    Time (end):   Fri, 2025-01-10 05:36:41
    Duration: 3 hours 57 minutes 32.92 seconds
    Number of files: 72
    Utilization of max. archive size: 0%
    ------------------------------------------------------------------------------
                           Original size      Compressed size    Deduplicated size
    This archive:               62.03 GB             61.98 GB             61.60 GB
    All archives:               62.03 GB             61.98 GB             61.60 GB
    
                           Unique chunks         Total chunks
    Chunk index:                   24030                40955
    ------------------------------------------------------------------------------
    Pruning old backups...
    ------------------------------------------------------------------------------
                           Original size      Compressed size    Deduplicated size
    Deleted data:                    0 B                  0 B                  0 B
    All archives:               62.03 GB             61.98 GB             61.60 GB
    
                           Unique chunks         Total chunks
    Chunk index:                   24030                40955
    ------------------------------------------------------------------------------
    ==== 2025-01-10 05:36:42 Proxmox Backup Completed ====

    Security of BORG_PASSPHRASE

    I decided to include the passphrase for encryption and decryption directly in the script because it fits within my threat model. My primary concern isn’t someone gaining access to my local Proxmox server and restoring or deleting my backups—my focus is on protecting against snooping by cloud providers or malicious admins.

    Having the passphrase in the script works for me. Sure, there are other ways to handle this, but for the script to run automatically, you’ll always need to store the passphrase somewhere on your system. At the very least, it has to be accessible by root. This setup strikes the right balance for my needs.
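
If you would rather not have the passphrase inside the script itself, Borg can also read it from a command via the BORG_PASSCOMMAND environment variable, so a root-only file does the job just as well. A small sketch:

Bash
# Store the passphrase in a file only root can read
echo 'my-borg-passphrase' > /root/.borg-passphrase
chmod 600 /root/.borg-passphrase

# In the backup script, let Borg fetch it on demand instead of exporting BORG_PASSPHRASE
export BORG_PASSCOMMAND="cat /root/.borg-passphrase"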

    Systemd timers

I created a systemd service to handle this backup process. For long-running jobs, it’s generally better to use systemd timers instead of cron, as they’re less prone to timeouts. I found this post particularly helpful when setting it up.

    Here’s the service that actually runs my bash script:

    /etc/systemd/system/proxmox_borg_backup.service
    [Unit]
    Description=Proxmox BorgBackup Service
    After=network.target
    
    [Service]
    Type=oneshot
    ExecStart=/usr/local/bin/proxmox_borg_backup.sh

    And here’s the systemd timer that handles scheduling the service:

    /etc/systemd/system/proxmox_borg_backup.timer
    [Unit]
    Description=Run Proxmox BorgBackup Daily at 3 AM
    
    [Timer]
    OnCalendar=*-*-* 03:00:00
    Persistent=true
    
    [Install]
    WantedBy=timers.target

    Now, instead of enabling the service directly, you enable and start the timer. The timer will take care of starting the service according to the schedule you’ve defined. This setup ensures everything runs smoothly and on time!

    Bash
    systemctl enable proxmox_borg_backup.timer
    systemctl start proxmox_borg_backup.timer 
    systemctl status proxmox_borg_backup.timer

    That’s it! You’re all set. You can check the log file we created or use the journalctl command to review any errors or confirm successful runs. Happy backing up! 🎉

    Bash
    journalctl -xeu proxmox_borg_backup.timer
    
    # or 
    
    tail -n 50 /var/log/proxmox_borg_backup.log

    Conclusion

    You should now have an easy and efficient solution to back up your Proxmox backups to a Hetzner Storage Box using Borg Backup. Both Borg and Restic support a variety of storage targets, so you can adapt this approach to suit your needs. In my setup, Borg performs incremental backups, uploading only new data, which helps keep storage costs low while maintaining security.

    A word of caution: don’t lose your secrets—your encryption key or passphrase—because without them, you won’t be able to restore your data. Trust me, I’ve been there before! Thankfully, I had local backups to fall back on.

    On Hetzner, I schedule daily backups at noon, after all my backup jobs have completed. I retain only the last three days, which works perfectly for me, though your needs might differ. Just remember that snapshot storage counts toward your total storage capacity—so if you have 1TB, the space used by snapshots will reduce the available storage for new data.

    Thank you for reading! May your backups always be safe, your disks last long, and your systems run smoothly. Wishing you all the best—love you, byeeeeee! ❤️🚀

  • Before “PrettyLittleKitten”: A Brief Affair with the Mac Mini M4

    Before “PrettyLittleKitten”: A Brief Affair with the Mac Mini M4

    Before I built my beloved server, affectionately named “PrettyLittleKitten“, I had a brief fling with the brand-new Mac Mini M4. Spoiler alert: it was a short-lived relationship.

    Let me start with the good stuff: processing power-to-power usage ratio. It’s absolutely unmatched. The Mac Mini M4 is a beast in terms of efficiency—an essential factor for me. I wanted hardware that could handle Jellyfin with smooth hardware acceleration while still hosting all my containers.

    The Hardware

    On paper (and in practice as a desktop), the Mac Mini M4 shines. It offers:

    • 4 Thunderbolt USB-C ports, making storage expansion a breeze. Pair it with an external NVMe enclosure, and you can achieve speeds close to that of internal storage.
    • Hardware that punches way above its price point, making it a reasonable investment for many use cases.

    The Disappointment

    Here’s where the romance fell apart. While the Mac Mini M4 is brilliant as a desktop, using it as a server is a whole different ball game—and not a fun one.

    The iCloud Conundrum

    First up: the dreaded iCloud account requirement. This wasn’t a total shock (it’s Apple, after all), but it made me long for the simplicity of Debian and Proxmox, where everything is blissfully offline.

    I went ahead and set it up with my personal iCloud account—big mistake. To run the Mac Mini as I wanted, it needed to stay logged in indefinitely. And here’s the kicker: to achieve that, I had to disable authentication entirely. Translation? If anyone got their hands on my Mini, they’d have full access to my iCloud account. Yikes.

    Pro tip: Use a burner iCloud account if you’re planning to go down this route. (Is this what you want, Apple?!)

Dummy HDMI

    Then there’s the issue of fooling the Mac into thinking it’s doing desktop work. Without a connected display, macOS doesn’t fully utilize the GPU or cores, which impacts performance. Enter the Dummy HDMI Plug—a little device to trick the system into thinking a monitor is attached. At ~€40, it’s not a dealbreaker, but definitely annoying.

    Power Saving Woes

    You’ll also need to disable power-saving features. While the Mac Mini M4 consumes very little power in idle, turning off power-saving negates some of its efficiency benefits.
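
For reference, those tweaks mostly come down to pmset. This is a sketch of the kind of settings I mean, so check the pmset man page before copying it blindly:

Bash
# Never let the machine or its disks sleep while it plays server
sudo pmset -a sleep 0 disksleep 0 displaysleep 0

# Come back up on its own after a power failure
sudo pmset -a autorestart 1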

    Recap of Mac Mini Server Challenges

    If you’re still tempted to use the Mac Mini M4 as a server, here’s your checklist:

    • Dummy HDMI Plug: €40 (because macOS needs to “see” a monitor).
    • Burner iCloud Account: Necessary to avoid risking your real account.
    • Disable Authentication: Say goodbye to security.
    • Disable Power Saving: Because macOS doesn’t believe in idle servers.

    Final Thoughts

If you’re determined, Evan Bartlett has written an excellent guide on setting up the Mac Mini as a server. However, as someone coming from the Linux world—where operating systems are designed for server use—it just didn’t feel right. Forcing macOS, an OS that clearly does not want to be a server, into that role felt morally and ethically wrong.

    Here’s hoping Big Siri AI will be kind to me when it inevitably takes over. 🙇‍♂️🍏

    Bonus: Check this website’s response headers to see that it runs on PrettyLittleKitten

  • Why HedgeDoc Reigns as the King of Self-Hosted Note-Taking Apps

    Why HedgeDoc Reigns as the King of Self-Hosted Note-Taking Apps

    This is going to be a bold, highly opinionated take on how note-taking apps should be. For the non-technical folks, discussing text editors and note-taking apps with IT people is like walking straight into a heated geopolitical debate at the family Thanksgiving table—it’s passionate, intense, and probably never-ending. Gobble Gobble.

I have tested a lot of note-taking apps:

    There are probably even more apps I have used in the past, but these are the ones that left a lasting impression on me. First off, let me just say—I love taking notes in Markdown. Any app that doesn’t support Markdown is pretty much useless to me. I’m so much faster at writing styled notes this way, without the hassle of clicking around or memorizing weird shortcut commands.

    For me, HedgeDoc hit the sweet spot. It’s got just the right features and just the right amount of organization. I’m not looking for an app to micromanage my entire life—I just want to take some damn notes!

    Live editing has also become a game-changer for me. I often have multiple screens open, sometimes even on different networks, and being instantly up-to-date while copy-pasting seamlessly between them is invaluable. Before HedgeDoc, I was using Obsidian synced via Nextcloud, but that was neither instant nor reliable on many networks.

    And let’s talk about security. With HedgeDoc, it’s a breeze. Their authorization system is refreshingly simple, and backing up your notes is as easy as clicking a button. You get a ZIP file with all your Markdown documents, which you could technically use with other editors—but why would you? HedgeDoc feels like it was made for you, and honestly, you’ll feel the love right back.

    I run HedgeDoc inside a container on my server, and it’s rock-solid. It just works. No excessive resource use, no drama—just a tool that quietly does its job.

    Now, let’s dive in! I’m going to show you how to host HedgeDoc yourself. Let’s get started!

    Prerequisites

    Here’s what you’ll need to get started:

    • A Linux distribution: Any modern Linux distro that supports Docker will work, but for today, we’ll go with Alpine.
    • A server with a public IP address: While not strictly mandatory, this is highly recommended if you want to access your note-taking app from anywhere.
    • A reverse proxy: Something like Caddy or Nginx to handle HTTPS and make your setup accessible and secure.

    Got all that? Great—let’s get started!

    Setup

    Here’s a handy script to install Docker on a fresh Alpine setup:

    init.sh
    #!/bin/sh
    
    # Exit on any error
    set -e
    
    echo "Updating repositories and installing prerequisites..."
    cat <<EOF > /etc/apk/repositories
    http://dl-cdn.alpinelinux.org/alpine/latest-stable/main
    http://dl-cdn.alpinelinux.org/alpine/latest-stable/community
    EOF
    
    apk update
    apk add --no-cache curl openrc docker docker-compose
    
    echo "Configuring Docker to start at boot..."
    rc-update add docker boot
    service docker start
    
    echo "Verifying Docker installation..."
    docker --version
    if [ $? -ne 0 ]; then
        echo "Docker installation failed!"
        exit 1
    fi
    
    echo "Verifying Docker Compose installation..."
    docker-compose --version
    if [ $? -ne 0 ]; then
        echo "Docker Compose installation failed!"
        exit 1
    fi
    
    echo "Docker and Docker Compose installed successfully!"

    To make the script executable and run it, follow these steps:

    Bash
    chmod +x init.sh
    ./init.sh

    If everything runs without errors, Docker should now be installed and ready to go. 🎉

To install HedgeDoc, we’ll follow the steps from their official documentation. It’s straightforward and easy.

    I prefer to keep all my environment variables and secrets neatly stored in .env files, separate from the actual Compose file.

    .env
    POSTGRES_USER=hedgedoctor
    POSTGRES_PASSWORD=super_secure_password
    POSTGRES_DB=hedgedoc
    
    CMD_DB_URL=postgres://hedgedoctor:super_secure_password@database:5432/hedgedoc
    CMD_ALLOW_FREEURL=true
    CMD_DOMAIN=docs.yourdomain.de
    CMD_PROTOCOL_USESSL=true
    CMD_ALLOW_ANONYMOUS=false
    CMD_ALLOW_EMAIL_REGISTER=true # <- remove after you registered

    To keep things secure, it’s a good idea to set CMD_ALLOW_ANONYMOUS to false, so anonymous users can’t edit your documents. For added security, you can create your own account and then disable CMD_ALLOW_EMAIL_REGISTER to prevent outsiders from signing up, effectively locking down HedgeDoc.

    One great benefit of using the env_file directive in your Docker Compose setup is that it keeps your Compose files clean and tidy:

    docker-compose.yml
    services:
      database:
        image: postgres:13.4-alpine
        env_file:
          - .env
        volumes:
          - database:/var/lib/postgresql/data
        restart: always
    
      app:
        image: quay.io/hedgedoc/hedgedoc:latest
        env_file:
          - .env
        volumes:
          - uploads:/hedgedoc/public/uploads
        ports:
          - "3000:3000"
        restart: always
        depends_on:
          - database
    
    volumes:
      database:
      uploads:

    After running docker compose up -d, you should be all set! This setup assumes you already have a reverse proxy configured and pointing to the public domain where you’re hosting your HedgeDoc. If you need help setting that up, I’ve written a guide on it in another blog post.

    Keep in mind, with the settings in the .env file above, HedgeDoc won’t work unless it’s served via HTTPS through the reverse proxy using the domain you specified.
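
If you just want to try it out before wiring up a full reverse proxy, Caddy’s one-liner mode is enough to put HTTPS in front of the container. A sketch, assuming the domain from the .env file already points at this host:

Bash
# Terminate HTTPS for docs.yourdomain.de and forward to the HedgeDoc container
caddy reverse-proxy --from docs.yourdomain.de --to localhost:3000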

    Once everything’s in place, you should see the HedgeDoc login screen and be able to “Register” your account:

    Don’t forget to head back to your .env file and comment out that specific line once you’re done:

    .env
    ...
    # CMD_ALLOW_EMAIL_REGISTER=true # <- remove after you registered

    This ensures that no one else can create accounts on your HedgeDoc instance.

    Personally, I always set my notes to “Private” (you can do this in the top right). That way, even if I decide to let others use the instance later, I don’t have to worry about any old notes where I might have called them a stinky doodoo face (as one does):

    You can still share your documents with others, but you’ll need to change the setting to “Locked.” Anything more restrictive will prevent people from viewing your notes.

    Imagine sending your crush a beautifully crafted, markdown-styled love letter, only for them to get blocked because of your overly strict settings. Yeah… couldn’t be me.

    Conclusion

To conclude: our notes are ready, and there’s no need for more WordPress blog posts. Now it’s time to hit the gym because it’s chest day, and let’s be honest, chest day is the best day! 💪