Tag: recon

  • Squidward: Continuous Observation and Monitoring

    Squidward: Continuous Observation and Monitoring

    The name Squidward comes from TAD → Threat Modelling, Attack Surface and Data. “Tadl” is the German nickname for Squidward from SpongeBob, so I figured—since it’s kind of a data kraken—why not use that name?

    It’s a continuous observation and monitoring script that notifies you about changes in your internet-facing infrastructure. Think Shodan Monitor, but self-hosted.

    Technology Stack

    • certspotter: Keeps an eye on targets for new certificates and sneaky subdomains.
    • Discord: The command center—control the bot, add targets, and get real-time alerts.
    • dnsx: Grabs DNS records.
    • subfinder: The initial scout, hunting down subdomains.
    • rustscan: Blazing-fast port scanner for newly found endpoints.
    • httpx: Checks ports for web UI and detects underlying technologies.
    • nuclei: Runs a quick vulnerability scan to spot weak spots.
    • anew: Really handy deduplication tool.

    At this point, I gotta give a massive shoutout to ProjectDiscovery for open-sourcing some of the best recon tools out there—completely free! Seriously, a huge chunk of my projects rely on these tools. Go check them out, contribute, and support them. They deserve it!

    (Not getting paid to say this—just genuinely impressed.)

    How it works

    I had to rewrite certspotter a little to accommodate a different input and output scheme; the rest is fairly simple.

    Setting Up Directories

    The script ensures required directories exist before running:

    • $HOME/squidward/data for storing results.
    • Subdirectories for logs: onlynew, allfound, alldedupe, backlog.

    Running Subdomain Enumeration

    • squidward (certspotter) fetches SSL certificates to discover new subdomains.
    • subfinder further identifies subdomains from multiple sources.
    • Results are stored in logs and sent as notifications (to a Discord webhook).

    DNS Resolution

    dnsx takes the discovered subdomains and resolves:

    • A/AAAA (IPv4/IPv6 records)
    • CNAME (Canonical names)
    • NS (Name servers)
    • TXT, PTR, MX, SOA records

    HTTP Probing

    httpx analyzes the discovered subdomains by sending HTTP requests, extracting:

    • Status codes, content lengths, content types.
    • Hash values (SHA256).
    • Headers like server, title, location, etc.
    • Probing for WebSocket, CDN, and methods.

    Vulnerability Scanning

    • nuclei scans for known vulnerabilities on discovered targets.
    • The scan focuses on high, critical, and unknown severity issues.

    Port Scanning

    • rustscan finds open ports for each discovered subdomain.
    • If open ports exist, additional HTTP probing and vulnerability scanning are performed.

    Automation and Notifications

    • Discord notifications are sent after each stage.
    • The script prevents multiple simultaneous runs by checking whether another instance is active (ps -ef | grep "squiddy.sh").
    • Randomization (shuf) is used to shuffle the scan order.

    Main Execution

    If another squiddy.sh instance is running, the script notifies you and skips the run instead of starting a second one.

    • If no duplicate instance exists:
      • Squidward (certspotter) runs first.
      • The main scanning pipeline (what_i_want_what_i_really_really_want()) executes in a structured sequence: subfinder → dnsx → rustscan → httpx → nuclei.

    The Code

    I wrote this about six years ago and just laid eyes on it again for the first time. I have absolutely no clue what past me was thinking 😂, but hey—here you go:

    #!/bin/bash
    
    #############################################
    #
    # Single script usage:
    # echo "test.karl.fail" | ./httpx -sc -cl -ct -location -hash sha256 -rt -lc -wc -title -server -td -method -websocket -ip -cname -cdn -probe -x GET -silent
    # echo "test.karl.fail" | ./dnsx -a -aaaa -cname -ns -txt -ptr -mx -soa -resp -silent
    # echo "test.karl.fail" | ./subfinder -silent
    # echo "test.karl.fail" | ./nuclei -ni
    #
    #
    #
    #
    #############################################
    
    # -----> globals <-----
    workdir="squidward"
    script_path=$HOME/$workdir
    data_path=$HOME/$workdir/data
    
    only_new=$data_path/onlynew
    all_found=$data_path/allfound
    all_dedupe=$data_path/alldedupe
    backlog=$data_path/backlog
    # -----------------------
    
    # -----> dir-setup <-----
    setup() {
        # create parent dirs first; -p also keeps this idempotent
        if [ ! -d $script_path ]; then
            mkdir -p $script_path
        fi
        if [ ! -d $data_path ]; then
            mkdir -p $data_path
        fi
        if [ ! -d $backlog ]; then
            mkdir -p $backlog
        fi
        if [ ! -d $only_new ]; then
            mkdir -p $only_new
        fi
        if [ ! -d $all_found ]; then
            mkdir -p $all_found
        fi
        if [ ! -d $all_dedupe ]; then
            mkdir -p $all_dedupe
        fi
    }
    # -----------------------
    
    # -----> subfinder <-----
    write_subfinder_log() {
        tee -a $all_found/subfinder.txt | $script_path/anew $all_dedupe/subfinder.txt | tee $only_new/subfinder.txt
    }
    run_subfinder() {
        $script_path/subfinder -dL $only_new/certspotter.txt -silent | write_subfinder_log;
        $script_path/notify -data $only_new/subfinder.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    # -----------------------
    
    # -----> dnsx <-----
    write_dnsx_log() {
        tee -a $all_found/dnsx.txt | $script_path/anew $all_dedupe/dnsx.txt | tee $only_new/dnsx.txt
    }
    run_dnsx() {
        $script_path/dnsx -l $only_new/subfinder.txt -a -aaaa -cname -ns -txt -ptr -mx -soa -resp -silent | write_dnsx_log;
        $script_path/notify -data $only_new/dnsx.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    # -----------------------
    
    # -----> httpx <-----
    write_httpx_log() {
        tee -a $all_found/httpx.txt | $script_path/anew $all_dedupe/httpx.txt | tee $only_new/httpx.txt
    }
    run_httpx() {
        $script_path/httpx -l $only_new/subfinder.txt -sc -cl -ct -location -hash sha256 -rt -lc -wc -title \
        -server -td -method -websocket -ip -cname -cdn -probe -x GET -silent | write_httpx_log;
        $script_path/notify -data $only_new/httpx.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    # -----------------------
    
    # -----> nuclei <-----
    write_nuclei_log() {
        tee -a $all_found/nuclei.txt | $script_path/anew $all_dedupe/nuclei.txt | tee $only_new/nuclei.txt
    }
    run_nuclei() {
        $script_path/nuclei -ni -l $only_new/httpx.txt -s high,critical,unknown -rl 5 -silent \
        | write_nuclei_log | $script_path/notify -provider discord -id vuln -silent
    }
    # -----------------------
    
    # -----> squidward <-----
    write_squidward_log() {
        tee -a $all_found/certspotter.txt | $script_path/anew $all_dedupe/certspotter.txt | tee -a $only_new/forscans.txt
    }
    run_squidward() {
        rm $script_path/config/certspotter/lock
        $script_path/squidward | write_squidward_log | $script_path/notify -provider discord -id cert -silent
        sleep 3
    }
    # -----------------------
    
    send_certspotted() {
        $script_path/notify -data $only_new/certspotter.txt -bulk -provider discord -id crawl -silent
        sleep 5
    }
    
    send_starting() {
        echo "Hi! I am Squiddy!" | $script_path/notify  -provider discord -id crawl -silent
        echo "I am gonna start searching for new targets now :)" | $script_path/notify  -provider discord -id crawl -silent
    }
    
    dns_to_ip() {
        # TODO: give txt file of subdomains to get IPs from file 
        $script_path/dnsx -a -l $1 -resp -silent \
        | grep -oE "\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" \
        | sort --unique 
    }
    
    run_rustcan() {
        local input=""
    
        if [[ -p /dev/stdin ]]; then
            input="$(cat -)"
        else
            input="${@}"
        fi
    
        if [[ -z "${input}" ]]; then
            return 1
        fi
    
        # ${input/ /,} -> join space to comma
        # -> loop because otherwise rustscan will take forever to scan all IPs and only save results at the end
        # we could do this to scan all at once instead: $script_path/rustscan -b 100 -g --scan-order random -a ${input/ /,}
        for ip in ${input}
        do
            $script_path/rustscan -b 500 -g --scan-order random -a $ip
        done
    
    }
    
    write_rustscan_log() {
        tee -a $all_found/rustscan.txt | $script_path/anew $all_dedupe/rustscan.txt | tee $only_new/rustscan.txt
    }
    what_i_want_what_i_really_really_want() {
        # shuffle certspotter file cause why not
        cat $only_new/forscans.txt | shuf -o $only_new/forscans.txt 
    
        $script_path/subfinder -silent -dL $only_new/forscans.txt | write_subfinder_log
        $script_path/notify -silent -data $only_new/subfinder.txt -bulk -provider discord -id subfinder
    
        # -> empty forscans.txt
        > $only_new/forscans.txt
    
        # shuffle subfinder file cause why not
        cat $only_new/subfinder.txt | shuf -o $only_new/subfinder.txt
    
        $script_path/dnsx -l $only_new/subfinder.txt -silent -a -aaaa -cname -ns -txt -ptr -mx -soa -resp | write_dnsx_log
        $script_path/notify -data $only_new/dnsx.txt -bulk -provider discord -id dnsx -silent
        
        # shuffle dns file before iter to randomize scans a little bit
        cat $only_new/dnsx.txt | shuf -o $only_new/dnsx.txt
        sleep 1
        cat $only_new/dnsx.txt | shuf -o $only_new/dnsx.txt
    
        while IFS= read -r line
        do
            dns_name=$(echo $line | cut -d ' ' -f1)
            ip=$(echo ${line} \
            | grep -E "\[(\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)\]" \
            | grep -oE "(\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b)")
            match=$(echo $ip | run_rustcan)
    
            if [ ! -z "$match" ]
            then
                ports_unformat=$(echo ${match} | grep -Po '\[\K[^]]*')
                ports=${ports_unformat//,/ }
    
                echo "$dns_name - $ip - $ports" | write_rustscan_log
                $script_path/notify -silent -data $only_new/rustscan.txt -bulk -provider discord -id portscan
            
                for port in ${ports}
                do
                    echo "$dns_name:$port" | $script_path/httpx -silent -sc -cl -ct -location \
                    -hash sha256 -rt -lc -wc -title -server -td -method -websocket \
                    -ip -cname -cdn -probe -x GET | write_httpx_log | grep "\[SUCCESS\]" | cut -d ' ' -f1 \
                    | $script_path/nuclei -silent -ni -s high,critical,unknown -rl 10 \
                    | write_nuclei_log | $script_path/notify -provider discord -id nuclei -silent
    
                    $script_path/notify -silent -data $only_new/httpx.txt -bulk -provider discord -id httpx
                done
            fi 
        done < "$only_new/dnsx.txt"
    }
    
    main() {
        dupe_script=$(ps -ef | grep "squiddy.sh" | grep -v grep | wc -l | xargs)
    
        if [ ${dupe_script} -gt 2 ]; then
            echo "Hey friends! Squiddy is already running, I am gonna try again later." | $script_path/notify  -provider discord -id crawl -silent
        else 
            send_starting
    
            echo "Running Squidward"
            run_squidward
    
            echo "Running the entire rest"
            what_i_want_what_i_really_really_want
    
            # -> leaving it in for now but replace with above function
            #echo "Running Subfinder"
            #run_subfinder
    
            #echo "Running DNSX"
            #run_dnsx
    
            #echo "Running HTTPX"
            #run_httpx
    
            #echo "Running Nuclei"
            #run_nuclei
        fi
    }
    
    setup
    
    dupe_script=$(ps -ef | grep "squiddy.sh" | grep -v grep | wc -l | xargs)
    if [ ${dupe_script} -gt 2 ]; then
        echo "Hey friends! Squiddy is already running, I am gonna try again later." | $script_path/notify  -provider discord -id crawl -silent
    else 
        #send_starting
        echo "Running Squidward"
        run_squidward
    fi

    There’s also a Python-based Discord bot that goes with this, but I’ll spare you that code—it did work back in the day 😬.

    Conclusion

    Back when I was a Red Teamer, this setup was a game-changer—not just during engagements, but even before them. Sometimes, during client sales calls, they’d expect you to be some kind of all-knowing security wizard who already understands their infrastructure better than they do.

    So, I’d sit in these calls, quietly feeding their possible targets into Squidward, and within seconds I’d have real-time recon data. Then, I’d casually drop something like, “Well, how about I start with server XYZ? I can already see it’s vulnerable to CVE-Blah.” Most customers loved that level of preparedness.

    I haven’t touched this setup in ages, and honestly, I have no clue how I’d even get it running again. I would probably go about it using Node-RED like in this post.

    These days, I work for big corporate, using commercial tools for the same tasks. But writing about this definitely brought back some good memories.

    Anyway, time for bed! It’s late, and you’ve got work tomorrow. Sweet dreams! 🥰😴

    Have another scary squid man monster that didn’t make featured, buh-byeee 👋

  • From Typos to Treason: The Dangerous Fun of Government Domain Squatting

    From Typos to Treason: The Dangerous Fun of Government Domain Squatting

    Hey there 👋 Since you’re reading this, chances are you’ve got some chaos brewing in your brain—I love it.

    For legal reasons I must kindly ask you to read and actually understand my disclaimer.

    Disclaimer:

    The information provided on this blog is for educational purposes only. The use of hacking tools discussed here is at your own risk.

    For the full disclaimer, please click here.

    Full full disclosure: I did have written permission to do this. And anything I didn’t have written permission for is wildly exaggerated fiction—pure imagination, no receipts, no logs, nothing but brain static.

    Now, another fair warning: this post is about to get particularly hairy. So seriously, do not try this without proper written consent, unless you have an unshakable desire to land yourself in a world of trouble.

    Intro

    I get bored really easily 😪. And when boredom strikes, I usually start a new project. Honestly, the fact that I’m still sticking with this blog is nothing short of a miracle. Could this be my forever project? Who knows—place your bets.

    Anyway, purely by accident, I stumbled across a tool that I immediately recognized as easy mode for typo squatting and bit squatting. The tool itself was kinda trash, but it did spark a deliciously questionable thought in my brain:

    “Can I intercept sensitive emails from government organizations and snatch session tokens and API keys?”

    To keep you on the edge of your seat (and slightly concerned), the answer is: Yes. Yes, I can. And trust me, it’s way worse than you think.

    It’s always the stupidly simple ideas that end up working the best.

    Typosquatting

    Typosquatting, also called URL hijacking, a sting site, a cousin domain, or a fake URL, is a form of cybersquatting, and possibly brandjacking which relies on mistakes such as typos made by Internet users when inputting a website address into a web browser. A user accidentally entering an incorrect website address may be led to any URL, including an alternative website owned by a cybersquatter.

    Wikipedia

    Basically, you register kark.fail, kick back, and wait for people to fat-finger karl.fail — and trust me, they will. Congratulations, you just hijacked some of my traffic without lifting a finger. It’s like phishing, but lazier.

    Bitsquatting

    Bitsquatting is a form of cybersquatting which relies on bit-flip errors that occur during the process of making a DNS request. These bit-flips may occur due to factors such as faulty hardware or cosmic rays. When such an error occurs, the user requesting the domain may be directed to a website registered under a domain name similar to a legitimate domain, except with one bit flipped in their respective binary representations.

    Wikipedia

    You register a domain that is a single bit off from your target. For my site, you could register “oarl.fail”:

    • ASCII of “k” = 01101011
    • Flipping the third-to-last bit gives 01101111, which corresponds to “o”
    • This changes “karl” → “oarl”

    Personally, I have had zero success with this, but apparently it still works.
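
    If you want to sanity-check that flip yourself, here is a minimal shell sketch (my own illustration, not part of any tool mentioned here; the 0x04 mask is what flips that third-to-last bit):

    char="k"
    code=$(printf '%d' "'$char")   # ASCII 107 = 01101011
    flipped=$(( code ^ 0x04 ))     # flip the third-to-last bit: 01101111 = 111
    printf '%s -> ' "$char"
    printf "\\$(printf '%03o' "$flipped")\n"   # prints the flipped character

    Running it prints “k -> o”, which is exactly the “oarl.fail” case from above.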

    The Setup

    Now that you know the basics, you’re officially armed with enough knowledge to cause some mild chaos 🎉.

    Here’s what we need to get started:

    • Money – Because sadly, domains don’t buy themselves.
    • A domain registrar account – I use Namecheap
    • Cloudflare account (optional, but highly recommended)
    • A server connected to the internet – I use Hetzner (optional but also recommended)

    Getting a Domain

    You should probably know this if you’re planning to hack the government (or, you know, just theoretically explore some questionable cyberspace).

    Step one:

    Follow all the steps on Namecheap—or whichever registrar you fancy. You can probably find one that takes Bitcoin or Monero, if you want.

    For generating typo domains effortlessly, I use ChatGPT:

    Give me the top 5 most common typos english speaking people make for the domain "karl.fail" on a qwerty keyboard.

    ChatGPT does not know .fail is a valid TLD, but you get the point.

    Step two

    Add your domain to Cloudflare—unless, of course, you’re feeling extra ambitious and want to host your own Mailserver and Nameserver. But let’s be real, why suffer?

    Edit the “Nameservers” setting on Namecheap

    Mailserver

    I highly recommend Mailcow, though it might be complete overkill for this—unless your job involves hacking governments. In that case, totally worth it.

    Nameserver

    This is the best tutorial I could find for you—he’s using CoreDNS.

    In my tests, I used Certainly, which built a small authoritative DNS server with this Go library.

    The big perk of running your own nameserver is that you get to log every DNS query to your domain. As many pentesters know, DNS is passive recon—it doesn’t hit the target directly. That’s why you can get away with otherwise noisy tasks, like brute-forcing subdomains via DNS. But if your target runs their own nameserver, they’ll see you poking around.

    I went with a different setup because DNS logs are a mess—super noisy and, honestly, boring. Everyone and their mom ends up enumerating your domain until kingdom come.

    Beware! Different top-level domain organizations have different expectations for name servers. I ran into some trouble with the .de registry, DENIC—they insisted I set up two separate nameservers on two different IPs in two different networks. Oh, and they also wanted pretty SOA records before they’d even consider my .de domains.

    Save yourself the headache—double-check the requirements before you spend hours wrecking yourself.

    Hetzner Server

    Any server, anywhere, will do—the goal is to host a web server of your choice and capture all the weblogs. I’ll be using Debian and Caddy for this.

    The cheapest server on Hetzner

    We’ll be building our own Caddy with the Cloudflare plugin because I couldn’t get wildcard certificates to work without it. Plus, I always use Cloudflare (❤️ you guys).

    Installation of Go (current guide):

    sudo apt update && sudo apt upgrade -y
    wget https://go.dev/dl/go1.23.5.linux-amd64.tar.gz
    sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.23.5.linux-amd64.tar.gz
    export PATH=$PATH:/usr/local/go/bin
    echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.profile
    source ~/.profile

    Build Caddy with Cloudflare-DNS

    The official guide is here.

    go install github.com/caddyserver/xcaddy/cmd/xcaddy@latest
    sudo mv ~/go/bin/xcaddy /usr/local/bin/
    xcaddy build --with github.com/caddy-dns/cloudflare
    sudo mv caddy /usr/local/bin/
    caddy version

    Getting a Cloudflare API Key

    To get the API key just follow the Cloudflare docs, I set mine with these permissions:

    All zones - Zone:Read, SSL and Certificates:Edit, DNS:Edit

    Here is also the official page for the Cloudflare-DNS Plugin.

    # the variable name must match the {env.*} placeholder used in the Caddyfile below
    export CLOUDFLARE_API_TOKEN="your_cloudflare_api_token"
    echo 'CLOUDFLARE_API_TOKEN="your_cloudflare_api_token"' | sudo tee /etc/default/caddy > /dev/null

    Caddyfile

    I am using example domains!

    (log_requests) {
    	log {
    		output file /var/log/caddy/access.log
    		format json
    	}
    }
    
    karlkarlkarl.de, *.karlkarlkarl.de {
    	import log_requests
    
    	tls {
    		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    	}
    
    	header Content-Type "text/html"
    	respond "Wrong!" 200
    }
    
    karlkarl.de, *.karlkarl.de {
    	import log_requests
    
    	tls {
    		dns cloudflare {env.CLOUDFLARE_API_TOKEN}
    	}
    
    	header Content-Type "text/html"
    	respond "Wrong!" 200
    }
    

    Running Caddy as a service

    nano /etc/systemd/system/caddy.service
    [Unit]
    Description=Caddy Web Server
    After=network.target
    
    [Service]
    User=caddy
    Group=caddy
    ExecStart=/usr/local/bin/caddy run --config /etc/caddy/Caddyfile --adapter caddyfile
    EnvironmentFile=/etc/default/caddy
    AmbientCapabilities=CAP_NET_BIND_SERVICE
    Restart=always
    RestartSec=5s
    LimitNOFILE=1048576
    
    [Install]
    WantedBy=multi-user.target
    systemctl start caddy
    systemctl enable caddy
    systemctl status caddy

    Everything should work if you closely followed the steps up until now. If not, check the caddy.service and the Caddyfile. To check the logs, use:

    journalctl -u caddy --no-pager -n 50 -f

    Just a heads-up—Caddy automatically redacts credentials in its logs, and getting it to not do that is kind of a pain.

    {"level":"info","ts":1738162687.1416154,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_ip":"1.0.0.1","remote_port":"62128","client_ip":"1.0.0.1","proto":"HTTP/1.1","method":"GET","host":"api.karlkarlkarl.de","uri":"/api/resource","headers":{"User-Agent":["curl/8.7.1"],"Authorization":["REDACTED"],"Accept":["application/json"]}},"bytes_read":0,"user_id":"","duration":0.000052096,"size":0,"status":308,"resp_headers":{"Connection":["close"],"Location":["https://api.karlkarlkarl.de/login"],"Content-Type":[],"Server":["Caddy"]}}
    "Authorization":["REDACTED"]

    Lame for us 😒. If you want more control over logging, you can use any other web server or even build your own. One day I might add this as a feature to my Node-RED-Team stack, including automatic Cloudflare setup via the API—just add a domain and go.

    As I mentioned earlier, I had permission for this, and my scope didn’t allow me to grab actual credentials since they belonged to third parties using the service.

    The most interesting things in these logs:

    • Credentials
    • IP addresses
    • Paths
    • Subdomains
    • Cookies and tokens

    That should be more than enough to hijack a session and dig up even more data—or at the very least, get some freebies.

    Cloudflare – DNS & Mail

    DNS

    We’ll add some wildcard DNS records so that all subdomains get routed to our server—because let’s be real, we don’t know all the subdomains of our target.

    Example of Wildcard DNS, best to set both, a normal A and Wildcard A. Point it to your IP.

    It’s almost as good as having your own nameserver. Plus, Cloudflare gives you a ton of DNS logs. Sure, you won’t get all of them like you would with your own setup, but honestly… I don’t really care that much about DNS logs anyway.

    SSL/TLS Settings in Cloudflare

    Make sure your SSL/TLS setting in Cloudflare is set to “Full (strict)”; otherwise, Caddy and Cloudflare will get stuck in a redirect loop, and it will take you forever to figure out that this is the issue, which will annoy you quite a bit.

    Email

    Set up email routing through Cloudflare—it’s easy, just two clicks. Then, you’ll need a catch-all email rule and a destination address.

    This will forward all emails sent to the typo domain straight to your chosen destination address.

    Catch-All Email rule in Cloudflare Email Settings

    You could set up your own mail server to do the same thing, which gives you more control over how emails are handled. But for my POC, I didn’t need the extra hassle.

    I should mention that, using n8n, I set up an email flow to notify people that they sent their mail to the wrong address and that it was not delivered:

    This post is already getting pretty long, so I might do a separate one about n8n another time. For now, just know that people were notified when they sent mail to the wrong address, and their important messages were delivered into the void.

    Profit

    By “profit,” I’m, of course, making a joke about the classic Step 1 → Step 2 → Step 3 → Profit meme—not actual profit. That would be illegal under American law, so let’s keep things legal and fun. Just thought I’d clarify 🫡.

    Now, you wait. Check the logs now and then, peek at the emails occasionally. Like a fisherman (or fisherwoman), you sit back and see what bites.

    How long does it take? Well, that depends on how good your typo is and how popular your target is—could be minutes, could be days.

    For me, I was getting around 10-15 emails per day. The weblogs are mostly just people scanning the crap out of my server.

    Email stats of the first 2 days for one of the domains (I hold 14)

    Conclusion

    I bought 14 domains with the most common typos for my target and ended up catching around 400 emails in a month—containing some of the most devastating info you could imagine.

    I’m talking government documents, filled-out contracts, filed reports. I got people’s birth certificates, death certificates, addresses, signatures—you name it.

    Think about it—when you email a government office, they already know everything about you, so you don’t think twice about sending them paperwork, right? Well… better triple-check that email address before you hit send, or guess what? It’s mine now.

    As for weblogs, their real value comes in when a developer is testing a tool and mistypes a public domain. I didn’t manage to snag any API keys, but I guarantee that if your target has public APIs or a sprawling IT infrastructure, credentials will slip through eventually.

    Defense

    The only real defense is to buy all the typo domains before the bad guys do. There are services that specialize in this—if you’ve got the budget, use them.

    If you can’t buy them, monitor them. Plenty of commercial tools can do this, or you can build your own. The easiest DIY approach would be to use dnstwist to generate typo variations and check WHOIS records or dig to see if anyone has registered them.
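
    As a rough DIY sketch of that monitoring idea (assuming dnstwist and dig are installed; karl.fail is just the example target), something like this flags lookalikes that already have nameservers:

    # generate typo/bitflip candidates and keep the ones that resolve to an NS record
    dnstwist --format list karl.fail | while read -r candidate; do
        if dig +short NS "$candidate" | grep -q .; then
            echo "registered: $candidate"
        fi
    done

    dnstwist also ships a --registered flag that does roughly the same filtering for you.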

    Typo domains aren’t just used for passive logging—people also host malicious content and phishing campaigns on them. That said, those methods get caught pretty fast. The approach I showed you is much more silent and, in my opinion, more dangerous—it doesn’t set off alarms right away.

    Also, don’t bother scanning for typo domains with MX records—most registrars have catch-all rules, so that’s a dead end.

    Domains are dirt cheap compared to the damage I could do if I decided to leak this to the press, extort people, or trick them into giving me money. You instantly gain trust because the emails you receive usually say things like “As we just discussed over the phone…” or contain entire ongoing conversations.

    This whole setup takes about an hour and costs maybe 50 bucks for some domains.

    Anyway, thanks for reading. Good night, sleep tight, and don’t let the bed bugs bite.

    Love you 😘

  • The Privacy-Friendly Mail Parser You’ve Been Waiting For

    The Privacy-Friendly Mail Parser You’ve Been Waiting For

    As you may or may not know (but now totally do), I have another beloved website, Exploit.to. It’s where I let my inner coder run wild and build all sorts of web-only tools. I’ll save those goodies for another project post, but today, we’re talking about my Mail Parser—a little labor of love born from frustration and an overdose of caffeine.

    See, as a Security Analyst and incident responder, emails are my bread and butter. Or maybe my curse. Parsing email headers manually? It’s a one-way ticket to losing your sanity. And if you’ve ever dealt with email headers, you know they’re basically the Wild West—nobody follows the rules, everyone’s just slapping on whatever they feel like, and chaos reigns supreme.

    The real kicker? Every single EML parser out there at the time was server-side. Let me paint you a picture: you, in good faith, upload that super-sensitive email from your mom (the one where she tells you your laundry’s done and ready for pick-up) to some rando’s sketchy server. Who knows what they’re doing with your mom’s loving words? Selling them? Training an AI to perfect the art of passive-aggressive reminders? The horror!

    So, I thought, “Hey, wouldn’t it be nice if we had a front-end-only EML parser? One that doesn’t send your personal business to anyone else’s server?” Easy peasy, right? Wrong. Oh, how wrong I was. But I did it anyway.

    You can find the Mail Parser here and finally parse those rogue headers in peace. You’re welcome.

    Technologies

    • React: Handles the user interface and dynamic interactions.
    • Astro.js: Used to generate the static website efficiently. (technically not needed for this project)
    • TailwindCSS: For modern and responsive design.
    • ProtonMail’s jsmimeparser: The core library for parsing email headers.

    When I first approached this project, I tried handling email header parsing manually with regular expressions. It didn’t take long to realize how complex email headers have become, with an almost infinite variety of formats, edge cases, and inconsistencies. Regex simply wasn’t cutting it.

    That’s when I discovered ProtonMail’s jsmimeparser, a library purpose-built for handling email parsing. It saved me from drowning in parsing logic and ensured the project met its functional goals.

    Sharing the output of this tool without accidentally spilling personal info all over the place is kinda tricky. But hey, I gave it a shot with a simple empty email I sent to myself:

    The Code

    As tradition dictates, the code isn’t on GitHub but shared right here in a blog post 😁.

    Kidding (sort of). The repo is private, but no gatekeeping here—here’s the code:

    mailparse.tsx
    import React, { useState } from "react";
    import { parseMail } from "@protontech/jsmimeparser";
    
    type Headers = {
      [key: string]: string[];
    };
    
    const MailParse: React.FC = () => {
      const [headerData, setHeaderData] = useState<Headers>({});
      const [ioc, setIoc] = useState<any>({});
    
      function extractEntitiesFromEml(emlContent: string) {
        const ipRegex =
          /\b(?:\d{1,3}\.){3}\d{1,3}\b|\b(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}\b/g;
        const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g;
        const urlRegex = /(?:https?|ftp):\/\/[^\s/$.?#].[^\s]*\b/g;
        const htmlTagsRegex = /<[^>]*>/g; // Regex to match HTML tags
    
        // Match IPs, emails, and URLs
        const ips = Array.from(new Set(emlContent.match(ipRegex) || []));
        const emails = Array.from(new Set(emlContent.match(emailRegex) || []));
        const urls = Array.from(new Set(emlContent.match(urlRegex) || []));
    
        // Remove HTML tags from emails and URLs
        const cleanEmails = emails.map((email) => email.replace(htmlTagsRegex, ""));
        const cleanUrls = urls.map((url) => url.replace(htmlTagsRegex, ""));
    
        return {
          ips,
          emails: cleanEmails,
          urls: cleanUrls,
        };
      }
    
      function parseDKIMSignature(signature: string): Record<string, string> {
        const signatureParts = signature.split(";").map((part) => part.trim());
        const parsedSignature: Record<string, string> = {};
    
        for (const part of signatureParts) {
          const [key, value] = part.split("=");
          parsedSignature[key.trim()] = value.trim();
        }
    
        return parsedSignature;
      }
    
      const handleFileChange = async (
        event: React.ChangeEvent<HTMLInputElement>
      ) => {
        const file = event.target.files?.[0];
        if (!file) return;
    
        const reader = new FileReader();
        reader.onload = async (e) => {
          const buffer = e.target?.result as ArrayBuffer;
    
          // Convert the buffer to a string
          const bufferArray = Array.from(new Uint8Array(buffer)); // Convert Uint8Array to number[]
          const bufferString = String.fromCharCode.apply(null, bufferArray);
    
          const { attachments, body, subject, from, to, date, headers, ...rest } =
            parseMail(bufferString);
    
          setIoc(extractEntitiesFromEml(bufferString));
          setHeaderData(headers);
        };
    
        reader.readAsArrayBuffer(file);
      };
    
      return (
        <>
          <div className="p-4">
            <h1>Front End Only Mailparser</h1>
            <p className="my-6">
              Have you ever felt uneasy about uploading your emails to a server you
              don't fully trust? I sure did. It's like handing over your private
              correspondence to a stranger. That's why I decided to take matters
              into my own hands.
            </p>
            <p className="mb-8">
              With this frontend-only mail parser, there's no need to worry about
              your privacy. Thanks to{" "}
              <a
                href="https://proton.me/"
                className="text-pink-500 underline dark:visited:text-gray-400 visited:text-gray-500 hover:font-bold after:content-['_↗']"
              >
                ProtonMail's
              </a>{" "}
              <a
                className="text-pink-500 underline dark:visited:text-gray-400 visited:text-gray-500 hover:font-bold after:content-['_↗']"
                href="https://github.com/ProtonMail/jsmimeparser"
              >
                jsmimeparser
              </a>
              , you can enjoy the same email parsing experience right in your
              browser. No more sending your sensitive data to external servers.
              Everything stays safe and secure, right on your own system.
            </p>
    
            <input
              type="file"
              onChange={handleFileChange}
              className="block w-full text-sm text-slate-500
          file:mr-4 file:py-2 file:px-4
          file:rounded-full file:border-0
          file:text-sm file:font-semibold
          file:bg-violet-50 file:text-violet-700
          hover:file:bg-violet-100
        "
            />
    
            {Object.keys(headerData).length !== 0 && (
              <table className="mt-8">
                <thead>
                  <tr className="border dark:border-white border-black">
                    <th>Header</th>
                    <th>Value</th>
                  </tr>
                </thead>
                <tbody>
                  {Object.entries(headerData).map(([key, value]) => (
                    <tr key={key} className="border dark:border-white border-black">
                      <td>{key}</td>
                      <td>{value}</td>
                    </tr>
                  ))}
                </tbody>
              </table>
            )}
          </div>
    
          {Object.keys(ioc).length > 0 && (
            <div className="mt-8">
              <h2>IPs:</h2>
              <ul>
                {ioc.ips && ioc.ips.map((ip, index) => <li key={index}>{ip}</li>)}
              </ul>
              <h2>Emails:</h2>
              <ul>
                {ioc.emails &&
                  ioc.emails.map((email, index) => <li key={index}>{email}</li>)}
              </ul>
              <h2>URLs:</h2>
              <ul>
                {ioc.urls &&
                  ioc.urls.map((url, index) => <li key={index}>{url}</li>)}
              </ul>
            </div>
          )}
        </>
      );
    };
    
    export default MailParse;

    Yeah, I know, it looks kinda ugly as-is—but hey, slap it into VSCode and let the prettifier work its magic.

    Most of the heavy lifting here is courtesy of the library I used. The rest is just some plain ol’ regex doing its thing—filtering for indicators in the email header and body to make life easier for further investigation.

    Conclusion

    Short and sweet—that’s the vibe here. Sometimes, less is more, right? Feel free to use this tool wherever you like—internally, on the internet, or even on a spaceship. You can also try it out anytime directly on my website.

    Don’t trust me? Totally fair. Open the website, yank out your internet connection, and voilà—it still works offline. No sneaky data sent to my servers, pinky promise.

    As for my Astro.js setup, I include the “mailparse.tsx” like this:

    ---
    import BaseLayout from "../../layouts/BaseLayout.astro";
    import Mailparse from "../../components/mailparse";
    ---
    
    <BaseLayout>
      <Mailparse client:only="react" />
    </BaseLayout>

    See you on the next one. Love you, byeeeee ✌️😘

  • Scraproxy: A High-Performance Web Scraping API

    Scraproxy: A High-Performance Web Scraping API

    After building countless web scrapers over the past 15 years, I decided it was time to create something truly versatile—a tool I could use for all my projects, hosted anywhere I needed it. That’s how Scraproxy was born: a high-performance web scraping API that leverages the power of Playwright and is built with FastAPI.

    Scraproxy streamlines web scraping and automation by enabling browsing automation, content extraction, and advanced tasks like capturing screenshots, recording videos, minimizing HTML, and tracking network requests and responses. It even handles challenges like cookie banners, making it a comprehensive solution for any scraping or automation project.

    Best of all, it’s free and open-source. Get started today and see what it can do for you. 🔥

    👉 https://github.com/StasonJatham/scraproxy

    Features

    • Browse Web Pages: Gather detailed information such as network data, logs, redirects, cookies, and performance metrics.
    • Screenshots: Capture live screenshots or retrieve them from cache, with support for full-page screenshots and thumbnails.
    • Minify HTML: Minimize HTML content by removing unnecessary elements like comments and whitespace.
    • Extract Text: Extract clean, plain text from HTML content.
    • Video Recording: Record a browsing session and retrieve the video as a webm file.
    • Reader Mode: Extract the main readable content and title from an HTML page, similar to “reader mode” in browsers.
    • Markdown Conversion: Convert HTML content into Markdown format.
    • Authentication: Optional Bearer token authentication using API_KEY.

    Technology Stack

    • FastAPI: For building high-performance, modern APIs.
    • Playwright: For automating web browser interactions and scraping.
    • Docker: Containerized for consistent environments and easy deployment.
    • Diskcache: Efficient caching to reduce redundant scraping requests.
    • Pillow: For image processing, optimization, and thumbnail creation.

    Working with Scraproxy

    Thanks to FastAPI, it has full API documentation via Redoc.

    After deploying it as described on my GitHub page, you can use it like so:

    #!/bin/bash
    
    # Fetch the JSON response from the API
    json_response=$(curl -s "http://127.0.0.1:5001/screenshot?url=https://karl.fail")
    
    # Extract the Base64 string using jq
    base64_image=$(echo "$json_response" | jq -r '.screenshot')
    
    # Decode the Base64 string and save it as an image
    echo "$base64_image" | base64 --decode > screenshot.png
    
    echo "Image saved as screenshot.png"

    Make sure jq is installed

    The API provides images in base64 format, so we use the native base64 command to decode it and save it as a PNG file. If everything went smoothly, you should now have a file named “screenshot.png”.

    Keep in mind, this isn’t a full-page screenshot. For that, you’ll want to use this script:

    #!/bin/bash
    
    # Fetch the JSON response from the API
    json_response=$(curl -s "http://127.0.0.1:5001/screenshot?url=https://karl.fail&full_page=true")
    
    # Extract the Base64 string using jq
    base64_image=$(echo "$json_response" | jq -r '.screenshot')
    
    # Decode the Base64 string and save it as an image
    echo "$base64_image" | base64 --decode > screenshot.png
    
    echo "Image saved as screenshot.png"

    Just add &full_page=true, and voilà! You’ll get a clean, full-page screenshot of the website.

    The best part? You can run this multiple times since the responses are cached, which helps you avoid getting blocked too quickly.

    Conclusion

    I’ll be honest with you—I didn’t go all out on the documentation for this. But don’t worry, the code is thoroughly commented, and you can easily figure things out by taking a look at the app.py file.

    That said, I’ve used this in plenty of my own projects as my go-to tool for fetching web data, and it’s been a lifesaver. Feel free to jump in, contribute, and help make this even better!

  • Sandkiste.io – A Smarter Sandbox for the Web

    Sandkiste.io – A Smarter Sandbox for the Web

    As a principal incident responder, my team and I often face the challenge of analyzing potentially malicious websites quickly and safely. This work is crucial, but it can also be tricky, especially when it risks compromising our test environments. Burning through test VMs every time we need to inspect a suspicious URL is far from efficient.

    There are some great tools out there to handle this, many of which are free and widely used, such as:

    • urlscan.io – A tool for visualizing and understanding web requests.
    • VirusTotal – Renowned for its file and URL scanning capabilities.
    • Joe Sandbox – A powerful tool for detailed malware analysis.
    • Web-Check – Another useful resource for URL scanning.

    While these tools are fantastic for general purposes, I found myself needing something more tailored to my team’s specific needs. We needed a solution that was straightforward, efficient, and customizable—something that fit seamlessly into our workflows.

    So, I decided to create it myself: Sandkiste.io. My goal was to build a smarter, more accessible sandbox for the web that not only matches the functionality of existing tools but offers the simplicity and flexibility we required for our day-to-day incident response tasks with advanced features (and a beautiful UI 🤩!).

    Sandkiste.io is part of a larger vision I’ve been working on through my Exploit.to platform, where I’ve built a collection of security-focused tools designed to make life easier for incident responders, analysts, and cybersecurity enthusiasts. This project wasn’t just a standalone idea—it was branded under the Exploit.to umbrella, aligning with my goal of creating practical and accessible solutions for security challenges.

    The Exploit.to logo

    If you haven’t explored Exploit.to, it’s worth checking out. The website hosts a range of open-source intelligence (OSINT) tools that are not only free but also incredibly handy for tasks like gathering public information, analyzing potential threats, and streamlining security workflows. You can find these tools here: https://exploit.to/tools/osint/.

    Technologies Behind Sandkiste.io: Building a Robust and Scalable Solution

    Sandkiste.io has been, and continues to be, an ambitious project that combines a variety of technologies to deliver speed, reliability, and flexibility. Like many big ideas, it started small—initially leveraging RabbitMQ, custom Golang scripts, and chromedp to handle tasks like web analysis. However, as the project evolved and my vision grew clearer, I transitioned to my favorite tech stack, which offers the perfect blend of power and simplicity.

    Here’s the current stack powering Sandkiste.io:

    Django & Django REST Framework

    At the heart of the application is Django, a Python-based web framework known for its scalability, security, and developer-friendly features. Coupled with Django REST Framework (DRF), it provides a solid foundation for building robust APIs, ensuring smooth communication between the backend and frontend.

    Celery

    For task management, Celery comes into play. It handles asynchronous and scheduled tasks, ensuring the system can process complex workloads—like analyzing multiple URLs—without slowing down the user experience. It integrates easily with Django, and the developer experience and ecosystem around it are amazing.

    Redis

    Redis acts as the message broker for Celery and provides caching support. Its lightning-fast performance ensures tasks are queued and processed efficiently. Redis is and has been my go-to, although I did enjoy RabbitMQ a lot.

    PostgreSQL

    For the database, I chose PostgreSQL, a reliable and feature-rich relational database system. Its advanced capabilities, like full-text search and JSONB support, make it ideal for handling complex data queries. The full-text search works perfectly with Django; here is a very detailed post about it.

    FastAPI

    FastAPI adds speed and flexibility to certain parts of the system, particularly where high-performance APIs are needed. Its modern Python syntax and automatic OpenAPI documentation make it a joy to work with. It is used to decouple the Scraper logic, since I wanted this to be a standalone project called “Scraproxy“.

    Playwright

    For web scraping and analysis, Playwright is the backbone. It’s a modern alternative to Selenium, offering cross-browser support and powerful features for interacting with websites in a headless (or visible) manner. This ensures that even complex, JavaScript-heavy sites can be accurately analyzed. The killer feature is how easy it is to capture a video and record network activity, which are basically the two main features needed here.

    React with Tailwind CSS and shadcn/ui

    On the frontend, I use React for building dynamic user interfaces. Paired with TailwindCSS, it enables rapid UI development with a clean, responsive design. shadcn/ui (a component library based on Radix) further enhances the frontend by providing pre-styled, accessible components that align with modern design principles.

    This combination of technologies allows Sandkiste.io to be fast, scalable, and user-friendly, handling everything from backend processing to an intuitive frontend experience. Whether you’re inspecting URLs, performing in-depth analysis, or simply navigating the site, this stack ensures a seamless experience. I also have the most experience with React and Tailwind 😁.

    Features of Sandkiste.io: What It Can Do

    Now that you know the technologies behind Sandkiste.io, let me walk you through what this platform is capable of. Here are the key features that make Sandkiste.io a powerful tool for analyzing and inspecting websites safely and effectively:

    Certificate Lookups

    One of the fundamental features is the ability to perform certificate lookups. This lets you quickly fetch and review SSL/TLS certificates for a given domain. It’s an essential tool for verifying the authenticity of websites, identifying misconfigurations, or detecting expired or suspicious certificates. We use it a lot to find possibly generated subdomains and to get a better picture of the adversary’s infrastructure; it helps with recon in general. I get the info from crt.sh, which offers an exposed SQL database for these lookups.
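
    As a rough illustration of that kind of lookup (not Sandkiste’s actual code; the domain is just a placeholder), crt.sh also exposes a JSON endpoint you can query straight from the shell:

    # list certificate names logged for *.karl.fail (crt.sh rate-limits, so be gentle)
    curl -s "https://crt.sh/?q=%25.karl.fail&output=json" \
      | jq -r '.[].name_value' \
      | sort -u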

    DNS Records

    Another key feature of Sandkiste.io is the ability to perform DNS records lookups. By analyzing a domain’s DNS records, you can uncover valuable insights about the infrastructure behind it, which can often reveal patterns or tools used by adversaries.

    DNS records provide critical information about how a domain is set up and where it points. For cybersecurity professionals, this can offer clues about:

    • Hosting Services: Identifying the hosting provider or server locations used by the adversary.
    • Mail Servers: Spotting potentially malicious email setups through MX (Mail Exchange) records.
    • Subdomains: Finding hidden or exposed subdomains that may indicate a larger infrastructure or staging areas.
    • IP Addresses: Tracing A and AAAA records to uncover the IP addresses linked to a domain, which can sometimes reveal clusters of malicious activity.
    • DNS Security Practices: Observing whether DNSSEC is implemented, which might highlight the sophistication (or lack thereof) of the adversary’s setup.

    By checking DNS records, you not only gain insights into the domain itself but also start piecing together the tools and services the adversary relies on. This can be invaluable for identifying common patterns in malicious campaigns or for spotting weak points in their setup that you can exploit to mitigate threats.
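
    To get a feel for what this stage collects, here is a minimal sketch of the same lookups done by hand with dig (the domain is a placeholder):

    domain="karl.fail"
    for rtype in A AAAA CNAME NS MX TXT SOA; do
        printf '== %s ==\n' "$rtype"
        dig +short "$domain" "$rtype"
    done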

    HTTP Requests and Responses Analysis

    One of the core features of Sandkiste.io is the ability to analyze HTTP requests and responses. This functionality is a critical part of the platform, as it allows you to dive deep into what’s happening behind the scenes when a webpage is loaded. It reveals the files, scripts, and external resources that the website requests—many of which users never notice.

    When you visit a webpage, the browser makes numerous background requests to load additional resources like:

    • JavaScript files
    • CSS stylesheets
    • Images
    • APIs
    • Third-party scripts or trackers

    These requests often tell a hidden story about the behavior of the website. Sandkiste captures and logs every request: every HTTP request made by the website is logged, along with its corresponding response. (Yup, we store the raw data as well.) For security professionals, monitoring and understanding these requests is essential because:

    • Malicious Payloads: Background scripts may contain harmful code or trigger the download of malware.
    • Unauthorized Data Exfiltration: The site might be sending user data to untrusted or unexpected endpoints.
    • Suspicious Third-Party Connections: You can spot connections to suspicious domains, which might indicate phishing attempts, tracking, or other malicious activities.
    • Alerts for Security Teams: Many alerts in security monitoring tools stem from these unnoticed, automatic requests that trigger red flags.

    Security Blocklist Check

    The Security Blocklist Check is another standout feature of Sandkiste.io, inspired by the great work at web-check.xyz. The concept revolves around leveraging malware-blocking DNS servers to verify if a domain is blacklisted. But I took it a step further to make it even more powerful and insightful.

    Instead of simply checking whether a domain is blocked, Sandkiste.io enhances the process by using a self-hosted AdGuard DNS server. This server doesn’t just flag blocked domains—it captures detailed logs to provide deeper insights. By capturing these logs, Sandkiste.io doesn’t just say “this domain is blacklisted”—it identifies why it’s flagged and where the block originated, which enables me to assign categories to the domains. The overall score tells you very quickly whether the page is safe or not.
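
    The underlying idea can be reproduced by hand: ask a filtering resolver and an unfiltered one, then compare the answers. A minimal sketch (assuming AdGuard’s public resolver at 94.140.14.14; the domain is a placeholder):

    target="suspicious-domain.example"
    dig +short @94.140.14.14 "$target"   # filtering resolver: an empty or 0.0.0.0 answer suggests a blocklist hit
    dig +short @1.1.1.1 "$target"        # unfiltered resolver for comparison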

    Video of the Session

    One of the most practical features of Sandkiste.io is the ability to create a video recording of the session. This feature was the primary reason I built the platform—because a single screenshot often falls short of telling the full story. With a video, you gain a complete, dynamic view of what happens during a browsing session.

    Static screenshots capture a single moment in time, but they don’t show the sequence of events that can provide critical insights, such as:

    • Pop-ups and Redirects: Videos reveal if and when pop-ups appear or redirects occur, helping analysts trace how users might be funneled into malicious websites or phishing pages.
    • Timing of Requests: Understanding when specific requests are triggered can pinpoint what actions caused them, such as loading an iframe, clicking a link, or executing a script.
    • Visualized Responses: By seeing the full process—what loads, how it behaves, and the result—you get a better grasp of the website’s functionality and intent.
    • Recreating the User Journey: Videos enable you to recreate the experience of a user who might have interacted with the target website, helping you diagnose what happened step by step.

    A video provides a much clearer picture of the target website’s behavior than static tools alone.

    How Sandkiste.io Works: From Start to Insight

    Using Sandkiste.io is designed to be intuitive and efficient, guiding you through the analysis process step by step while delivering detailed, actionable insights.

    You kick things off by simply starting a scan. Once initiated, you’re directed to a loading page, where you can see which tasks (or “workers”) are still running in the background.

    This page keeps you informed without overwhelming you with unnecessary technical details.

    The Results Page

    Once the scan is complete, you’re automatically redirected to the results page, where the real analysis begins. Let’s break down what you’ll see here:

    Video Playback

    At the top, you’ll find a video recording of the session, showing everything that happened after the target webpage was loaded. This includes:

    • Pop-ups and redirects.
    • The sequence of loaded resources (scripts, images, etc.).
    • Any suspicious behavior, such as unexpected downloads or external connections.

    This video gives you a visual recap of the session, making it easier to understand how the website behaves and identify potential threats.

    Detected Technologies

    Below the video, you’ll see a section listing the technologies detected. These are inferred from response headers and other site metadata, and they can include:

    • Web frameworks (e.g., Django, WordPress).
    • Server information (e.g., Nginx, Apache).

    This data is invaluable for understanding the website’s infrastructure and spotting patterns that could hint at malicious setups.

    Statistics Panel

    On the right side of the results page, there’s a statistics panel with several semi-technical but insightful metrics. Here’s what you can learn:

    • Size Percentile:
      • Indicates how the size of the page compares to other pages.
      • Why it matters: Unusually large pages can be suspicious, as they might contain obfuscated code or hidden malware.
    • Number of Responses:
      • Shows how many requests and responses were exchanged with the server.
      • Why it matters: A high number of responses could indicate excessive tracking, unnecessary redirects, or hidden third-party connections.
    • Duration to “Network Idle”:
      • Measures how long it took for the page to fully load and stop making network requests.
      • Why it matters: Some pages continue running scripts in the background even after appearing fully loaded, which can signal malicious or resource-intensive behavior.
    • Redirect Chain Analysis:
      • A list of all redirects encountered during the session.
      • Why it matters: A long chain of redirects is a common tactic in phishing, ad fraud, or malware distribution campaigns.

    By combining these insights—visual evidence from the video, infrastructure details from detected technologies, and behavioral stats from the metrics—you get a comprehensive view of the website’s behavior. This layered approach helps security analysts identify potential threats with greater accuracy and confidence.

    At the top of the page, you’ll see the starting URL and the final URL you were redirected to.

    • “Public” means that others can view the scan.
    • The German flag indicates that the page is hosted in Germany.
    • The IP address shows the final server we landed on.

    The party emoji signifies that the page is safe; if it weren’t, you’d see a red skull (spooky!). Earlier, I explained the criteria for flagging a page as good or bad.

    On the “Responses” page I mentioned earlier, you can take a closer look at the individual responses. Here, you can see exactly where the redirects are coming from and going to. I’ve added a red shield icon to clearly indicate when HTTP is used instead of HTTPS.

    As an analyst, it’s pretty common to review potentially malicious scripts. Clicking on one of the results will display the raw response safely. In the image below, I clicked on that long JavaScript URL (normally a risky move, but in Sandkiste, every link is completely safe!).

    Conclusion

    And that’s the story of Sandkiste.io, a project I built over the course of a month in my spare time. While the concept itself was exciting, the execution came with its own set of challenges. For me, the toughest part was achieving a real-time feel for the user experience while ensuring the asynchronous jobs running in the background were seamlessly synced back together. It required a deep dive into task coordination and real-time updates, but it taught me lessons I still draw on today.

    Currently, Sandkiste.io is still in beta and runs locally within our company’s network. It’s used internally by my team to streamline our work and enhance our incident response capabilities. Though it’s not yet available to the public, it has already proven its value in simplifying complex tasks and delivering insights that traditional tools couldn’t match.

    Future Possibilities

    While it’s an internal tool for now, I can’t help but imagine where this could go.

    For now, Sandkiste.io remains a testament to what can be built with focus, creativity, and a drive to solve real-world problems. Whether it ever goes public or not, this project has been a milestone in my journey, and I’m proud of what it has already achieved. Who knows—maybe the best is yet to come!

  • Certsplotting: Elevating Intelligence – Part 2

    Certsplotting: Elevating Intelligence – Part 2

    Disclaimer:

    The information provided on this blog is for educational purposes only. The use of hacking tools discussed here is at your own risk.

    For the full disclaimer, please click here.

    Introduction

    Welcome back to the second installment of our exploration into Certspotter and the world of passive reconnaissance. In Part 1, we laid the groundwork for understanding the significance of Certspotter as a vital tool in monitoring certificate transparency logs. We delved into the nuances of passive reconnaissance, highlighting the importance of discreet operations in gathering intelligence without alerting targets.

    Now, in Part 2, we’re ready to dive even deeper. Building upon the foundation established in Part 1, we’ll explore advanced techniques for leveraging Certspotter’s capabilities to their fullest potential. Our focus will be on enriching the data obtained from Certspotter and enhancing our reconnaissance efforts through the integration of additional tools and methodologies.

    Join me as I uncover the untapped potential of Certspotter and embark on a journey to uncover valuable insights that will inform and empower your hacking strategies. Let’s dive in and elevate our reconnaissance game to new heights.

    Data Enrichment

    So, you’ve already gathered a wealth of information about your target. But let’s take it a step further.

    Here’s what you want to know:

    • What’s running on the new subdomain?
    • Any interesting paths?
    • Open ports?
    • Can we capture a screenshot?
    • Are there any potential vulnerabilities?
    • Perhaps you have a custom target, like specifically testing for WordPress.

    Now, there might be a tool out there that handles all these tasks, but I haven’t found it yet. (Feel free to shoot me a message on Signal if you know one). Instead, I’ve decided to build a tool together with you, right here, right now, leveraging ProjectDiscovery’s Tools, which are awesome open-source projects written in one of my favorite languages: Go.

    However, as we transition from passive to active reconnaissance, I must reiterate the importance of reading my disclaimer.

    Web Technology:

    For this task, we’ll use a tool called Webanalyze.

    Bash
    # Installation
    go install -v github.com/rverton/webanalyze/cmd/webanalyze@latest
    
    # Update
    $HOME/go/bin/webanalyze -update

    Now, a quick note: I’m not authorized to recon sandbox.google.com. If, by chance, any of my tools cause a denial of service state on the endpoint, I might be held liable for damages.

    To demonstrate, I whitelisted my IP and scanned my own website:

    Bash
    $HOME/go/bin/webanalyze -host exploit.to -crawl 2
     :: webanalyze        : v0.3.9
     :: workers           : 4
     :: technologies      : technologies.json
     :: crawl count       : 2
     :: search subdomains : true
     :: follow redirects  : false
    
    http://exploit.to (0.6s):
        HSTS,  (Security)
        HTTP/3,  (Miscellaneous)
        Cloudflare,  (CDN)
        Astro, 4.5.2 (Static site generator, JavaScript frameworks)

    For further consumption, I suggest using -output json and storing the results locally or sending them to your central system.
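
    Something along these lines works (the collector endpoint below is just a placeholder for whatever central system you use):

    Bash
    # Store the JSON results per host...
    $HOME/go/bin/webanalyze -host exploit.to -crawl 2 -output json > webanalyze_exploit.to.json
    
    # ...or ship them straight to your central system (placeholder URL)
    $HOME/go/bin/webanalyze -host exploit.to -crawl 2 -output json | \
        curl -s -X POST -H "Content-Type: application/json" -d @- "http://10.102.0.11:8080/api/v1/webanalyze/in"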

    Screenshot

    For this task, we’ll utilize playwright. While some might argue that this is overkill, I have some future plans in mind. You can learn more about playwright here.

    Bash
    npm init playwright@latest

    Simply respond with “yes” to all the prompts, as having a positive attitude is always beneficial.

    Below is a script that captures a full-page screenshot and lists all the network calls made by a loaded page:

    JavaScript
    const { chromium } = require("playwright");
    
    (async () => {
      // Launch browser
      const browser = await chromium.launch();
    
      // Create a new page
      const page = await browser.newPage();
    
      // Enable request interception
      await page.route("**", (route) => {
        console.log(route.request().url());
        route.continue();
      });
    
      // Navigate to the desired page
      await page.goto("https://exploit.to");
    
      // Take a full-page screenshot
      await page.screenshot({ path: "exploit.png", fullPage: true });
    
      // Close the browser
      await browser.close();
    })();

    Here’s how you can run the script and check its output:

    Bash
    sudo node screenshot.js
    
    https://exploit.to/
    https://exploit.to/_astro/styles.DS6QQjAg.css
    https://exploit.to/_astro/hoisted.DfX8MIxs.js
    https://exploit.to/_astro/page.BZ5QGxwt.js
    https://exploit.to/_astro/ViewTransitions.astro_astro_type_script_index_0_lang.D0ayWLBG.js
    https://exploit.to/_astro/index.Vl7qCdEu.js
    https://exploit.to/_astro/CryptoBackground.c9l8WxZ_.js
    https://exploit.to/_astro/client.B60e5CTm.js
    https://exploit.to/cdn-cgi/challenge-platform/scripts/jsd/main.js
    https://exploit.to/_astro/index.LHP-L4Pl.js
    https://exploit.to/_astro/index.C3GvvkrT.js
    https://exploit.to/_astro/jsx-runtime.BoiYzbTN.js
    https://exploit.to/_astro/utils.xgzLAuTe.js

    Open Ports

    Understanding the open ports on a target system can provide valuable insights into its network architecture and potential vulnerabilities. To accomplish this, we’ll conduct a quick scan using nmap, a powerful network scanning tool.

    Bash
    sudo nmap -sS -Pn -T4 exploit.to

    This command initiates a SYN scan (-sS) without host discovery (-Pn) at an aggressive timing level (-T4) against the target exploit.to.

    Here’s a breakdown of the scan results:

    Bash
    Starting Nmap 7.94SVN ( https://nmap.org ) at 2024-03-25 15:14 CET
    Nmap scan report for exploit.to (IP)
    Host is up (0.10s latency).
    Other addresses for exploit.to (not scanned): IP
    Not shown: 996 filtered tcp ports (no-response)
    PORT     STATE SERVICE
    80/tcp   open  http
    443/tcp  open  https
    8080/tcp open  http-proxy
    8443/tcp open  https-alt
    
    Nmap done: 1 IP address (1 host up) scanned in 8.17 seconds

    The scan reveals the following open ports:

    • Port 80/tcp: Open for HTTP.
    • Port 443/tcp: Open for HTTPS.
    • Port 8080/tcp: Open for HTTP proxy.
    • Port 8443/tcp: Open for alternate HTTPS.
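
    If you want to keep these results for later enrichment or feed them into a central system, nmap can also write its findings to machine-readable files. A minimal example:

    Bash
    # Same scan, additionally saved as normal, greppable, and XML output
    mkdir -p scans
    sudo nmap -sS -Pn -T4 -oA scans/exploit.to exploit.to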

    Subdomains

    Exploring subdomains can uncover hidden entry points and potential vulnerabilities within a target’s infrastructure. Let’s leverage Subfinder for passive subdomain enumeration and HTTPX for validation.

    Bash
    go install -v github.com/projectdiscovery/httpx/cmd/httpx@latest
    
    go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest

    You can easily pipe the output of subfinder to httpx for further analysis:

    Bash
    $HOME/go/bin/subfinder -d sandbox.google.com | $HOME/go/bin/httpx -status-code -title -tech-detect

    Here’s a basic setup, but you can fine-tune these flags extensively. Additionally, I recommend integrating free API Keys to enhance subdomain discovery.
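
    For reference, here is where those keys typically live. The path and provider names below reflect subfinder’s documentation at the time of writing, so double-check against the current docs:

    Bash
    # Assumed default location of subfinder's provider config; verify against the docs
    cat $HOME/.config/subfinder/provider-config.yaml
    # Keys are listed per provider, roughly like this (values are placeholders):
    # shodan:
    #   - YOUR_SHODAN_API_KEY
    # virustotal:
    #   - YOUR_VIRUSTOTAL_API_KEY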

    In our hypothetical Google case, here are some findings:

    Bash
    https://ecc-test.sandbox.google.com [200] [ECC-capable Certificate Success] [HTTP/3]
    https://dry.sandbox.google.com [404] [Error 404 (Not Found)!!1] [HTTP/3]
    https://during.sandbox.google.com [404] [Error 404 (Not Found)!!1] [HTTP/3]
    https://earth.sandbox.google.com [404] [Error 404 (Not Found)!!1] [HTTP/3]
    https://cert-test.sandbox.google.com [200] [Test Success] [HTTP/3]
    https://dynamite-preprod.sandbox.google.com [302] [] [HSTS,HTTP/3]

    The tech detection capabilities are surprisingly robust. In my earlier site example, the results were as follows:

    Bash
    https://exploit.to [200] [Karl Machleidt | Cyber Security Expert] [Astro:4.5.2,Cloudflare,HSTS,HTTP/3]

    Paths

    Now, let’s delve into fuzzing some paths. While tools like Gobuster can handle both subdomain enumeration and directory enumeration, I’d like to showcase some different tools for this task.

    For the wordlist, we’ll use Daniel Miessler’s SecLists common.txt.
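
    If you don’t have SecLists on disk yet, grabbing just that one wordlist is enough (the raw GitHub path below reflects the repo layout at the time of writing and may change):

    Bash
    wget https://raw.githubusercontent.com/danielmiessler/SecLists/master/Discovery/Web-Content/common.txt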

    Bash
    gobuster dir --useragent "EXPLOIT.TO" --wordlist "common.txt" --url https://exploit.to

    Here’s a breakdown of the Gobuster scan results:

    Bash
    ===============================================================
    Gobuster v3.6
    by OJ Reeves (@TheColonial) & Christian Mehlmauer (@firefart)
    ===============================================================
    [+] Url:                     https://exploit.to
    [+] Method:                  GET
    [+] Threads:                 10
    [+] Wordlist:                common.txt
    [+] Negative Status codes:   404
    [+] User Agent:              EXPLOIT.TO
    [+] Timeout:                 10s
    ===============================================================
    Starting gobuster in directory enumeration mode
    ===============================================================
    /.git/index           (Status: 308) [Size: 0] [--> /.git/]
    /.well-known/http-opportunistic (Status: 200) [Size: 21]
    /404                  (Status: 200) [Size: 17047]
    /about                (Status: 308) [Size: 0] [--> /about/]
    /blog                 (Status: 308) [Size: 0] [--> /blog/]
    /contact              (Status: 308) [Size: 0] [--> /contact/]
    /disclaimer           (Status: 308) [Size: 0] [--> /disclaimer/]
    /feed                 (Status: 301) [Size: 0] [--> https://exploit.to/rss.xml]
    /index                (Status: 308) [Size: 0] [--> /]
    /index.html           (Status: 308) [Size: 0] [--> /]
    /robots.txt           (Status: 200) [Size: 57]
    /rss                  (Status: 301) [Size: 0] [--> https://exploit.to/rss.xml]
    /search               (Status: 308) [Size: 0] [--> /search/]
    /tags                 (Status: 308) [Size: 0] [--> /tags/]
    /tools                (Status: 308) [Size: 0] [--> /tools/]
    Progress: 4727 / 4727 (100.00%)
    ===============================================================
    Finished
    ===============================================================

    These results provide insights into various paths on the target site, facilitating potential avenues for further exploration and potential vulnerabilities.

    Vulnerabilities

    Vulnerability scanners are notorious for their loud presence, and we have several options at our disposal.

    For this demonstration, I’ll opt for Nuclei, which simplifies custom discovery tasks significantly.

    To install Nuclei, execute the following command:

    Bash
    go install -v github.com/projectdiscovery/nuclei/v3/cmd/nuclei@latest

    Using Nuclei without specifying templates can generate excessive traffic. Here’s an example command specifying templates:

    Bash
    $HOME/go/bin/nuclei -target exploit.to -t http/cves/ -t ssl

    Running Nuclei with all available templates can uncover a plethora of issues. However, be cautious, as this scan can be aggressive. Here’s an example scan of my website. Note that running such a scan on unauthorized targets is not recommended:

    YAML
    [nameserver-fingerprint] [dns] [info] exploit.to [jonah.ns.cloudflare.com.,uma.ns.cloudflare.com.]
    [caa-fingerprint] [dns] [info] exploit.to
    [dmarc-detect] [dns] [info] _dmarc.exploit.to ["v=DMARC1; p=reject; sp=reject; adkim=s; aspf=s; rua=mailto:[email protected];"]
    [mx-fingerprint] [dns] [info] exploit.to [54 route1.mx.cloudflare.net.,84 route2.mx.cloudflare.net.,98 route3.mx.cloudflare.net.]
    [txt-fingerprint] [dns] [info] exploit.to ["v=spf1 include:_spf.mx.cloudflare.net ~all"]
    [spf-record-detect] [dns] [info] exploit.to [v=spf1 include:_spf.mx.cloudflare.net ~all"]
    [dns-waf-detect:cloudflare] [dns] [info] exploit.to
    [INF] Using Interactsh Server: oast.fun
    [addeventlistener-detect] [http] [info] https://exploit.to
    [xss-deprecated-header] [http] [info] https://exploit.to [1; mode=block]
    [metatag-cms] [http] [info] https://exploit.to [Astro v4.5.2]
    [tech-detect:cloudflare] [http] [info] https://exploit.to
    [http-missing-security-headers:content-security-policy] [http] [info] https://exploit.to
    [http-missing-security-headers:permissions-policy] [http] [info] https://exploit.to
    [http-missing-security-headers:x-permitted-cross-domain-policies] [http] [info] https://exploit.to
    [http-missing-security-headers:clear-site-data] [http] [info] https://exploit.to
    [http-missing-security-headers:cross-origin-embedder-policy] [http] [info] https://exploit.to
    [http-missing-security-headers:cross-origin-opener-policy] [http] [info] https://exploit.to
    [http-missing-security-headers:cross-origin-resource-policy] [http] [info] https://exploit.to
    [robots-txt-endpoint] [http] [info] https://exploit.to/robots.txt
    [waf-detect:cloudflare] [http] [info] https://exploit.to/
    [ssl-issuer] [ssl] [info] exploit.to:443 [Google Trust Services LLC]
    [ssl-dns-names] [ssl] [info] exploit.to:443 [exploit.to]
    

    Let’s break down some of the identified vulnerabilities from the Nuclei scan results:

    1. nameserver-fingerprint [dns]: This vulnerability detection identifies the nameservers associated with the domain exploit.to, revealing that it is using Cloudflare’s nameservers (jonah.ns.cloudflare.com and uma.ns.cloudflare.com). While not necessarily a vulnerability, this information can be useful for reconnaissance purposes.
    2. caa-fingerprint [dns]: This indicates the absence of CAA (Certificate Authority Authorization) records for the domain exploit.to. CAA records specify which certificate authorities are allowed to issue certificates for a domain. Lack of CAA records might imply less control over certificate issuance, potentially leaving the domain vulnerable to unauthorized certificate issuance.
    3. dmarc-detect [dns]: This detection reveals the DMARC (Domain-based Message Authentication, Reporting, and Conformance) policy for the domain _dmarc.exploit.to. The policy specifies how a receiving mail server should handle emails that fail SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail) checks. In this case, the policy is set to “reject,” indicating strict handling of failed authentication, which is generally considered good practice.
    4. mx-fingerprint [dns]: This vulnerability detection identifies the mail servers (MX records) associated with the domain exploit.to, which are provided by Cloudflare. While not necessarily a vulnerability, this information can be useful for understanding the email infrastructure associated with the domain.
    5. txt-fingerprint [dns]: This reveals the SPF (Sender Policy Framework) record for the domain exploit.to, specifying which servers are allowed to send emails on behalf of the domain. The record indicates that emails should be sent only from servers included in the _spf.mx.cloudflare.net include mechanism.
    6. waf-detect:cloudflare [http]: This detection indicates the presence of a WAF (Web Application Firewall) provided by Cloudflare for the domain exploit.to. WAFs help protect web applications from common security threats such as SQL injection, cross-site scripting (XSS), and DDoS attacks.
    7. ssl-issuer [ssl]: This reveals information about the SSL certificate issuer for the domain exploit.to, which is Google Trust Services LLC. SSL certificates issued by reputable authorities help establish secure HTTPS connections, ensuring data transmitted between the user’s browser and the web server remains encrypted and secure.

    These are just a few examples of the vulnerabilities and configurations identified in the Nuclei scan results. Each of these findings provides valuable insights into potential security risks and areas for improvement in the domain’s infrastructure and configuration.

    Custom

    Now, let’s illustrate a simple example:

    Bash
    $HOME/go/bin/nuclei -target exploit.to -t http/honeypot/elasticpot-honeypot-detect.yaml

    Imagine you’re interested in scanning for ElasticPot, an Elasticsearch honeypot. Identifying these honeypots beforehand can be crucial before launching any new zero-day attack on open Elasticsearch instances. Creating custom templates for such detections isn’t overly complicated, and it lets you tailor detection to your specific needs. Alternatively, you can employ Gobuster, as mentioned earlier, to test for specific paths.
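
    To give you a feel for what a custom detection can look like, here is a rough sketch that writes a minimal template and runs it. The field names follow the nuclei templating format as I understand it, so treat it as an approximation and check the official templating docs before relying on it:

    Bash
    # Rough sketch: a minimal custom template (schema approximate, see the nuclei templating docs)
    mkdir -p custom-templates
    cat > custom-templates/wordpress-login-detect.yaml <<'EOF'
    id: wordpress-login-detect
    
    info:
      name: WordPress login page detect
      author: you
      severity: info
    
    http:
      - method: GET
        path:
          - "{{BaseURL}}/wp-login.php"
        matchers:
          - type: word
            words:
              - "wp-submit"
    EOF
    
    # Run only this template against the target
    $HOME/go/bin/nuclei -target exploit.to -t custom-templates/wordpress-login-detect.yaml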

    Recon Data

    We’ve successfully gathered all the desired data:

    •  Identification of services running on new subdomains.
    •  Open ports analysis.
    •  Screenshot capture.
    •  Discovery of interesting paths.
    •  Identification of possible vulnerabilities.
    •  Custom targeting, such as explicit testing for WordPress.

    We now know that our target is developing a new project, which technologies it uses, which vulnerabilities it might have, which paths look interesting, and we even have a screenshot.

    Summary

    We explored various reconnaissance techniques, from subdomain enumeration and directory scanning to vulnerability assessments and customized detections. Leveraging tools like Certspotter, Gobuster, Nuclei, and others, we gained profound insights into our target’s infrastructure and potential security vulnerabilities.

    Our adventure began with an introduction to Certspotter, the pioneer in certificate transparency log monitoring. We dissected the significance of passive reconnaissance, emphasizing its discreet nature compared to active methods. With Certspotter, we learned how to continuously monitor for new subdomains and certificate registrations, all at minimal cost.

    From envisioning scenarios of seizing control over freshly set up WordPress sites to stealthily infiltrating default credentials in Grafana or Jenkins installations, the possibilities for mischief are boundless. Armed with our newfound knowledge and toolkit, the next logical step involves automating these processes and integrating them into a centralized system for ongoing monitoring and analysis.

    I am working on a Part 3. In the next part, I want to combine all the tools into one final script that is triggered whenever certspotter finds a new certificate:

    • run dnsx
    • run subfinder and httpx if wildcard else run httpx
    • use playwright for screenshot and network traffic
    • port scan
    • maybe use httpx for paths? Otherwise gobuster. The script should also be able to run on its own, given a domain, wildcard, or subdomain as input.

    I want the output to be one final JSON I can then render on my website.
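
    As a teaser, the combined script could look roughly like the skeleton below. This is a sketch only, not the final Part 3 script; the tool flags and the final JSON merge still need work:

    Bash
    #!/bin/bash
    # Sketch only: rough skeleton of the planned Part 3 pipeline.
    # Input: a domain, wildcard, or subdomain reported by certspotter.
    TARGET="$1"
    
    # 1. DNS records
    echo "$TARGET" | $HOME/go/bin/dnsx -silent -a -aaaa -cname -resp -json > dnsx.json
    
    # 2. Wildcards get subfinder first, everything else goes straight to httpx
    if [[ "$TARGET" == \*.* ]]; then
        $HOME/go/bin/subfinder -silent -d "${TARGET#\*.}" | $HOME/go/bin/httpx -silent -json > httpx.json
    else
        echo "$TARGET" | $HOME/go/bin/httpx -silent -json > httpx.json
    fi
    # (for wildcards, steps 3-5 would loop over each subdomain found in httpx.json)
    
    # 3. Screenshot and network traffic (screenshot.js from above would need to accept the URL as an argument)
    node screenshot.js "https://$TARGET"
    
    # 4. Port scan, saved as XML for later parsing
    sudo nmap -sS -Pn -T4 -oX nmap.xml "$TARGET"
    
    # 5. Paths via gobuster
    gobuster dir --useragent "EXPLOIT.TO" --wordlist common.txt --url "https://$TARGET" -o gobuster.txt
    
    # 6. TODO: merge everything into one final JSON for the website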

  • Certsplotting: Exploiting Certificate Transparency for Mischief – Part 1

    Certsplotting: Exploiting Certificate Transparency for Mischief – Part 1

    Disclaimer:

    The information provided on this blog is for educational purposes only. The use of hacking tools discussed here is at your own risk.

    For the full disclaimer, please click here.

    Introduction

    Certspotter stands as the original authority in certificate transparency log monitoring—a mouthful, indeed. Let’s dissect why you, as a hacker, should pay attention to it.

    One of your primary maneuvers when targeting a system is reconnaissance, particularly passive reconnaissance. Unlike active reconnaissance, which directly engages the target, passive recon operates discreetly.

    Passive recon involves employing tactics that evade triggering any alerts from the target. For instance, conducting a Google search about your target doesn’t tip them off. While technically they might detect someone from your area or country searching for them via Search Console, using a VPN and a private browser can easily circumvent this.

    You can even explore their entire website using Google cache (just search for cache:your-target.com) or archive.org without exposing your IP or intentions to them. On the other hand, active recon tends to be more assertive, such as port scanning, which leaves traces in the target’s logs. Depending on their security measures and level of vigilance, they might notice and decide to block you.

    If you were to scan my public IP, I’d promptly block you 😃.

    But I digress. What if you could continuously and passively monitor your target for new subdomains, project developments, systems, or any other endeavors that require a certificate? Imagine being alerted right as they register it.

    Now, you might wonder, “How much will that cost me?” Surprisingly, nothing but the electricity to power your server or whatever charges your cloud provider levies. With Certspotter, you can scrutinize every certificate issued to your target’s domains and subdomains.

    What mischief can I stir?

    Your mind is probably already concocting schemes, so here’s a scenario to fuel your imagination:

    Imagine your target sets up a WordPress site requiring an admin password upon the first visit. You could swoop in ahead of them, seizing control of their server. (Sure, they might reinstall, but it’ll definitely ruffle their feathers 😏).

    A bit sneakier? How about adding a covert admin account to a fresh Grafana or Jenkins installation, which might still be sitting on default credentials right after setup. Truly, you never know what you might uncover.

    Setting up Certspotter

    To begin, you’ll need a fresh Debian-based Linux distro. I’ll opt for Kali to simplify later use of other hacking tools. Alternatively, you can choose any Linux distribution to keep your image size compact.

    Certspotter

    Start by visiting their Certspotter GitHub. I strongly advise thoroughly reading their documentation to acquaint yourself with the tool.

    Installation:

    Bash
    go install software.sslmate.com/src/certspotter/cmd/certspotter@latest

    Next, create directories:

    Bash
    mkdir $HOME/.certspotter
    mkdir $HOME/.certspotter/hooks.d # scripts
    touch $HOME/.certspotter/watchlist # targets

    The watchlist file is straightforward:

    Bash
    exploit.to
    virus.malware.to
    .bmw.de

    Prefixing a domain with a . signifies monitoring the domain and all its subdomains. Without the prefix, Certspotter will monitor certificates matching the exact domain/subdomain.

    I can anticipate your next thought—you want all the logs, don’t you? Since 2013, there have been 7,485,653,605 of them (Source), requiring substantial storage. If you’re undeterred, you’d need to modify this code here and rebuild Certspotter to bypass the watchlist and retrieve everything.

    Now, let’s set up the systemd service. Here’s how mine looks:

    Bash
    sudo nano /etc/systemd/system/certspotter.service

    You’ll need to adjust the paths unless your username is also karl:

    Bash
    [Unit]
    Description=Certspotter Service
    After=network.target
    
    [Service]
    Environment=HOME=/home/karl
    Environment=CERTSPOTTER_CONFIG_DIR=/home/karl/.certspotter
    Type=simple
    ExecStart=/home/karl/go/bin/certspotter -verbose
    Restart=always
    RestartSec=3
    
    [Install]
    WantedBy=multi-user.target

    Note: I’m currently not utilizing the -start_at_end flag. As a result, my instance starts at the beginning of the logs and can take a considerable amount of time before it reaches recently issued certificates. By adding the -start_at_end parameter to the certspotter command in the line that begins with ExecStart=, you instruct it to skip previously issued certificates and start monitoring from the current point in time.
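
    Concretely, the ExecStart line would then look like this:

    Bash
    ExecStart=/home/karl/go/bin/certspotter -verbose -start_at_end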

    To activate and check if it’s running, run this:

    Bash
    sudo systemctl daemon-reload
    sudo systemctl start certspotter
    sudo systemctl status certspotter

    Now let us add a script in hooks.d:

    Bash
    touch $HOME/.certspotter/hooks.d/certspotter.sh
    sudo chmod u+x $HOME/.certspotter/hooks.d/certspotter.sh

    If the hook has trouble reading the environment variables Certspotter passes in, you might have to experiment with the file permissions.

    In certspotter.sh:

    Bash
    #!/bin/bash
    
    if [ -z "$EVENT" ] || [ "$EVENT" != 'discovered_cert' ]; then
        # no event
        exit 0
    fi
    
    DNS=$(cut -d "=" -f2 <<< "$SUBJECT_DN")
    IP="$(dig "$DNS" A +short | grep -v '\.$' | head -n 1 | tr -d '\n')"
    IP6="$(dig "$DNS" AAAA +short | grep -v '\.$' | head -n 1 | tr -d '\n')"
    
    JSON_FILE_DATA=$(cat "$JSON_FILENAME")
    dns_names=$(echo "$JSON_FILE_DATA" | jq -r '.dns_names | join("\n")')
    
    JSON_DATA=$(cat <<EOF
    {
        "pubkey": "$PUBKEY_SHA256",
        "watch_item": "$WATCH_ITEM",
        "not_before": "$NOT_BEFORE_RFC3339",
        "not_after": "$NOT_AFTER_RFC3339",
        "dns_names": "$dns_names",
        "issuer": "$ISSUER_DN",
        "asn": "$ASN",
        "ipv4": "$IP",
        "ipv6": "$IP6",
        "cn": "$SUBJECT_DN",
        "crt.sh": "https://crt.sh/?sha256=$CERT_SHA256"
    }
    EOF
    )
    
    # post data to br... might do something with the answer later
    response=$(curl -s -X POST \
        -H "Content-Type: application/json" \
        -d "$JSON_DATA" \
        "http://10.102.0.11:8080/api/v1/certspotter/in")

    You could edit this to your liking. The data should look like this:

    JSON
    {
      "pubkey": "ca4567a91cfe51a2771c14f1462040a71d9b978ded9366fe56bcb990ae25b73d",
      "watch_item": ".google.com",
      "not_before": "2023-11-28T14:30:55Z",
      "not_after": "2024-01-09T14:30:54Z",
      "dns_names": ["*.sandbox.google.com"],
      "isssuer": "C=US, O=Google Trust Services LLC, CN=GTS CA 1C3",
      "asn": "GOOGLE,US",
      "ipv4": "142.250.102.81",
      "ipv6": "2a00:1450:4013:c00::451",
      "cn": "CN=*.sandbox.google.com",
      "crt.sh": "https://crt.sh/?sha256=cb657858d9fb6475f20ed5413d06da261be20951f6f379cbd30fe6f1e2558f01"
    }

    Depending on your target, it will take a while until you see results. Maybe even days.
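
    While you wait, you can watch what the service is doing through its logs:

    Bash
    sudo journalctl -u certspotter.service -f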

    Summary

    In this first part of our exploration into Certspotter, we’ve laid the groundwork for understanding its significance in passive reconnaissance. Certspotter emerges as a pivotal tool in monitoring certificate transparency logs, enabling hackers to gather crucial intelligence without alerting their targets.

    We’ve delved into the distinction between passive and active reconnaissance, emphasizing the importance of discreet operations in avoiding detection. Through Certspotter, hackers gain the ability to monitor target domains and subdomains continuously, staying informed about new developments and potential vulnerabilities.

    As we conclude Part 1, we’ve only scratched the surface of what Certspotter has to offer. In Part 2, we’ll dive deeper into advanced techniques for leveraging Certspotter’s capabilities, exploring tools to enrich our data and enhance our reconnaissance efforts. Stay tuned for an in-depth exploration of Certspotter’s potential in uncovering valuable insights for hackers.

    For Part 2 go this way -> Here

  • Master Google Dorking: A Guide for Beginners – Part 1

    Master Google Dorking: A Guide for Beginners – Part 1

    Disclaimer:

    The information provided on this blog is for educational purposes only. The use of hacking tools discussed here is at your own risk.

    For the full disclaimer, please click here.

    Introduction

    Ever found yourself deep in the abyss of the internet, wishing you could uncover more than what’s on the surface? If so, Google Hacking, also known as Google Dorking, might just be your next favorite hobby. This amusing and surprisingly potent skill will turn you into an internet sleuth, uncovering secrets like a digital Sherlock Holmes. By the end of this article, you’ll be ready to create your own Google dorks and impress (or mildly concern) your friends with your newfound abilities.

    If you’re interested in OSINT in general, you can also check out my other articles.

    What is Google Hacking?

    Google Hacking, which is also called Google Dorking, is the playful art of using Google’s search engine to uncover sensitive information that wasn’t meant for public eyes. From personal data and financial info to website security flaws, Google Hacking can reveal it all. But don’t panic—it’s perfectly legal as long as you don’t misuse the info you stumble upon.

    To break it down a bit, Google Hacking isn’t some kind of sorcery. It’s about finding anything that’s been indexed by Google or other major search engines. With the right search queries, you can dig up info that’s not ranking high on Google—often the kind of stuff that wasn’t meant to be easily found. So go ahead, have fun, and happy Googling (responsibly)!

    Why the Term “Dorking”?

    “Dork” in this context refers to a set of search parameters that expose unprotected information. Think of it as a key that unlocks hidden doors on the internet. The term “dorking” might sound silly, but the results can be pretty serious.

    Tools of the Trade

    Before we dive into the nitty-gritty, let’s talk about the essential tools and resources you’ll need:

    1. Google Advanced Search Operators: These are special commands you can use in Google’s search bar to filter results more precisely. You can find a comprehensive list of these operators on Ahrefs’ blog.
    2. Google Hacking Database (GHDB): A treasure trove of pre-made Google dorks. Check out the database on Exploit-DB to see what others have discovered.
    3. Alternative Search Engines: Bing, DuckDuckGo, and Startpage also offer advanced search capabilities. Explore their documentation on Bing (and this), DuckDuckGo, and Startpage.
    4. OSINT Tools: Tools like Pentest-Tools and IntelTechniques can enhance your search capabilities.

    Pro tip: You can take dorks from Exploit-DB, play around with them, and create new dorks focused on your target or niche.

    The Basics of Google Dorking

    I will focus on Google here as it is the biggest search engine and will usually give you some solid results. Let’s start with some simple Google search operators:

    1. site: – Restrict results to a specific website.
      • Example: site:example.com
    2. filetype: – Search for specific file types.
      • Example: filetype:pdf
    3. inurl: – Find URLs containing specific text.
      • Example: inurl:login
    4. intitle: – Search for page titles containing specific text.
      • Example: intitle:index.of

    Combining these operators can yield powerful results. For instance, to find login pages on example.com, you could use: site:example.com inurl:login

    Let’s do another example searching a website for a contact email address (or to send them phishing mails (pls don’t)): "@cancom.de" site:"cancom.de"

    Useful dorks to get started

    Now for some fun! Here are a few beginner-friendly dorks; please feel free to copy and modify them to your liking:

    1. Finding Open Directories:
      • intitle:"index of" "parent directory"
    2. Discovering Public Cameras:
      • intitle:"Live View / - AXIS"
    3. Uncovering Interesting PDFs:
      • filetype:pdf "confidential"
    4. Locating Forgotten Passwords:
      • filetype:log inurl:"password"

    Creating Your Own Dorks

    Creating your own Google dorks is like cooking a new dish—start with the basics and experiment. Here’s a step-by-step guide:

    1. Identify Your Target: Decide what type of information you’re seeking. Is it emails, passwords, or hidden directories?
    2. Choose the Right Operators: Based on your target, select appropriate search operators.
      • Example: To find Excel files with passwords, you might use filetype:xls inurl:password.
    3. Test and Refine: Enter your dork into Google and see what comes up. Refine your search terms to get more relevant results.
    4. Document Your Findings: Keep a record of effective dorks for future reference. You never know when you might need them again!

    You can combine many operators to refine your results.

    Final Thoughts

    Hooray! You’ve officially unlocked the secrets of Google Dorking. Get ready to dive deeper in the next part, where I’ll dish out more details and examples about other search engines and why they’re worth your time too. But before we move on, here are a few ways to flex your new skills:

    • Become a Digital Bounty Hunter: Track down elusive individuals like a pro (in the U.S. you can check your state’s State Trooper website for active bounties).
    • Debt Detective: Find those who owe you money faster than a speeding algorithm.
    • Hack the Planet: Discover websites with vulnerable software.
    • Doxing: Beyond the usual “it’s illegal” disclaimer, doxing can irreversibly ruin someone’s life. Trust me, no matter how much you dislike someone, you do not want to go down this path.
    • Find pirated software.

    Stay tuned, because your Google Dorking journey is just getting started!