Tag: code

  • Why HedgeDoc Reigns as the King of Self-Hosted Note-Taking Apps

    Why HedgeDoc Reigns as the King of Self-Hosted Note-Taking Apps

    This is going to be a bold, highly opinionated take on how note-taking apps should work. For the non-technical folks, discussing text editors and note-taking apps with IT people is like walking straight into a heated geopolitical debate at the family Thanksgiving table—it’s passionate, intense, and probably never-ending. Gobble Gobble.

    I have tested a lot of note-taking apps:

    There are probably even more apps I have used in the past, but these are the ones that left a lasting impression on me. First off, let me just say—I love taking notes in Markdown. Any app that doesn’t support Markdown is pretty much useless to me. I’m so much faster at writing styled notes this way, without the hassle of clicking around or memorizing weird shortcut commands.

    For me, HedgeDoc hit the sweet spot. It’s got just the right features and just the right amount of organization. I’m not looking for an app to micromanage my entire life—I just want to take some damn notes!

    Live editing has also become a game-changer for me. I often have multiple screens open, sometimes even on different networks, and being instantly up-to-date while copy-pasting seamlessly between them is invaluable. Before HedgeDoc, I was using Obsidian synced via Nextcloud, but that was neither instant nor reliable on many networks.

    And let’s talk about security. With HedgeDoc, it’s a breeze. Their authorization system is refreshingly simple, and backing up your notes is as easy as clicking a button. You get a ZIP file with all your Markdown documents, which you could technically use with other editors—but why would you? HedgeDoc feels like it was made for you, and honestly, you’ll feel the love right back.

    I run HedgeDoc inside a container on my server, and it’s rock-solid. It just works. No excessive resource use, no drama—just a tool that quietly does its job.

    Now, let’s dive in! I’m going to show you how to host HedgeDoc yourself. Let’s get started!

    Prerequisites

    Here’s what you’ll need to get started:

    • A Linux distribution: Any modern Linux distro that supports Docker will work, but for today, we’ll go with Alpine.
    • A server with a public IP address: While not strictly mandatory, this is highly recommended if you want to access your note-taking app from anywhere.
    • A reverse proxy: Something like Caddy or Nginx to handle HTTPS and make your setup accessible and secure.

    Got all that? Great—let’s get started!

    Setup

    Here’s a handy script to install Docker on a fresh Alpine setup:

    init.sh
    #!/bin/sh
    
    # Exit on any error
    set -e
    
    echo "Updating repositories and installing prerequisites..."
    cat <<EOF > /etc/apk/repositories
    http://dl-cdn.alpinelinux.org/alpine/latest-stable/main
    http://dl-cdn.alpinelinux.org/alpine/latest-stable/community
    EOF
    
    apk update
    apk add --no-cache curl openrc docker docker-compose
    
    echo "Configuring Docker to start at boot..."
    rc-update add docker boot
    service docker start
    
    echo "Verifying Docker installation..."
    docker --version
    if [ $? -ne 0 ]; then
        echo "Docker installation failed!"
        exit 1
    fi
    
    echo "Verifying Docker Compose installation..."
    docker-compose --version
    if [ $? -ne 0 ]; then
        echo "Docker Compose installation failed!"
        exit 1
    fi
    
    echo "Docker and Docker Compose installed successfully!"

    To make the script executable and run it, follow these steps:

    Bash
    chmod +x init.sh
    ./init.sh

    If everything runs without errors, Docker should now be installed and ready to go. 🎉

    To install HedgeDoc, we’ll follow the steps from their official documentation. It’s straightforward and easy to follow.

    I prefer to keep all my environment variables and secrets neatly stored in .env files, separate from the actual Compose file.

    .env
    POSTGRES_USER=hedgedoctor
    POSTGRES_PASSWORD=super_secure_password
    POSTGRES_DB=hedgedoc
    
    CMD_DB_URL=postgres://hedgedoctor:super_secure_password@database:5432/hedgedoc
    CMD_ALLOW_FREEURL=true
    CMD_DOMAIN=docs.yourdomain.de
    CMD_PROTOCOL_USESSL=true
    CMD_ALLOW_ANONYMOUS=false
    CMD_ALLOW_EMAIL_REGISTER=true # <- remove after you registered

    To keep things secure, it’s a good idea to set CMD_ALLOW_ANONYMOUS to false, so anonymous users can’t edit your documents. For added security, you can create your own account and then disable CMD_ALLOW_EMAIL_REGISTER to prevent outsiders from signing up, effectively locking down HedgeDoc.

    One great benefit of using the env_file directive in your Docker Compose setup is that it keeps your Compose files clean and tidy:

    docker-compose.yml
    services:
      database:
        image: postgres:13.4-alpine
        env_file:
          - .env
        volumes:
          - database:/var/lib/postgresql/data
        restart: always
    
      app:
        image: quay.io/hedgedoc/hedgedoc:latest
        env_file:
          - .env
        volumes:
          - uploads:/hedgedoc/public/uploads
        ports:
          - "3000:3000"
        restart: always
        depends_on:
          - database
    
    volumes:
      database:
      uploads:

    After running docker compose up -d, you should be all set! This setup assumes you already have a reverse proxy configured and pointing to the public domain where you’re hosting your HedgeDoc. If you need help setting that up, I’ve written a guide on it in another blog post.

    Keep in mind, with the settings in the .env file above, HedgeDoc won’t work unless it’s served via HTTPS through the reverse proxy using the domain you specified.
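
    If you want a quick sanity check before opening the browser, you can verify that both containers are up and that the reverse proxy answers on the domain from the .env file (adjust the domain to yours):

    Bash
    # Check that both containers are running
    docker compose ps
    
    # HedgeDoc itself answers locally on port 3000
    curl -I http://localhost:3000
    
    # ...and through the reverse proxy over HTTPS
    curl -I https://docs.yourdomain.de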

    Once everything’s in place, you should see the HedgeDoc login screen and be able to “Register” your account:

    Don’t forget to head back to your .env file and comment out that specific line once you’re done:

    .env
    ...
    # CMD_ALLOW_EMAIL_REGISTER=true # <- remove after you registered

    This ensures that no one else can create accounts on your HedgeDoc instance.

    Personally, I always set my notes to “Private” (you can do this in the top right). That way, even if I decide to let others use the instance later, I don’t have to worry about any old notes where I might have called them a stinky doodoo face (as one does):

    You can still share your documents with others, but you’ll need to change the setting to “Locked.” Anything more restrictive will prevent people from viewing your notes.

    Imagine sending your crush a beautifully crafted, markdown-styled love letter, only for them to get blocked because of your overly strict settings. Yeah… couldn’t be me.

    Conclusion

    And that’s a wrap: our notes are ready, and there’s no need for more WordPress blog posts. Now it’s time to hit the gym because it’s chest day, and let’s be honest, chest day is the best day! 💪

  • Effortless Cron Job Monitoring: A Guide to Self-Hosting with Healthchecks.io

    Effortless Cron Job Monitoring: A Guide to Self-Hosting with Healthchecks.io

    Do you ever find yourself lying awake at night, staring at the ceiling, wondering if your beloved cronjobs ran successfully? Worry no more! Today, we’re setting up a free, self-hosted solution to ensure you can sleep like a content little kitten 🐱 from now on.

    I present to you Healthchecks.io. According to their website:

    Simple and Effective Cron Job Monitoring

    We notify you when your nightly backups, weekly reports, cron jobs, and scheduled tasks don’t run on time.

    How to monitor any background job:

    1. On Healthchecks.io, generate a unique ping URL for your background job.
    2. Update your job to send an HTTP request to the ping URL every time the job runs.
    3. When your job does not ping Healthchecks.io on time, Healthchecks.io alerts you!

    Today, we’re taking the super easy, lazy-day approach by using their Docker image. They’ve provided a well-documented, straightforward guide for deploying it right here: Running with Docker.

    What I love most about Healthchecks.io? It’s built on Django, my all-time favorite Python web framework. Sorry, FastAPI—you’ll always be cool, but Django has my heart!

    Prerequisites:

    1. A Server: You’ll need a server to host your shiny new cronjob monitor. A Linux distro is ideal.
    2. Docker & Docker Compose: Make sure these are installed. If you’re not set up yet, here’s the guide.
    3. Bonus Points: Having a domain or subdomain, along with a public IP, makes it accessible for all your systems.

    You can run this on your home network without any hassle, although you might not be able to copy and paste all the code below.

    Need a free cloud server? Check out Oracle’s free tier—it’s a decent option to get started. That said, in my experience, their free servers are quite slow, so I wouldn’t recommend them for anything mission-critical. (Not sponsored, pretty sure they hate me 🥺.)

    Setup

    I’m running a Debian LXC container on my Proxmox setup with the following specs:

    • CPU: 1 core
    • RAM: 1 GB
    • Swap: 1 GB
    • Disk: 10 GB (NVMe SSD)

    After a month of uptime, these are the typical stats: memory usage stays pretty consistent, and the boot disk is mostly taken up by Docker and the image. As for the CPU? It’s usually just sitting there, bored out of its mind.

    First, SSH into your server, and let’s get started by creating a .env file to store all your configuration variables:

    .env
    PUID=1000
    PGID=1000
    APPRISE_ENABLED=True
    TZ=Europe/Berlin
    SITE_ROOT=https://ping.yourdomain.de
    SITE_NAME=Healthchecks
    ALLOWED_HOSTS=ping.yourdomain.de
    CSRF_TRUSTED_ORIGINS=https://ping.yourdomain.de
    DEBUG=False
    SECRET_KEY=your-secret-key

    In your .env file, enter the domain you’ll use to access the service. I typically go with something simple, like “ping” or “cron” as a subdomain. If you want to explore more configuration options, you can check them out here.

    For my setup, this basic configuration does the job perfectly.

    To generate secret keys, I usually rely on the trusty openssl command. Here’s how you can do it:

    Bash
    openssl rand -base64 64

    With the secret generated and the .env file in place, the Compose file itself stays nice and small:

    docker-compose.yml
    services:
      healthchecks:
        image: lscr.io/linuxserver/healthchecks:latest
        container_name: healthchecks
        env_file:
          - .env
        volumes:
          - ./config:/config
        ports:
          - 8083:8000
        restart: unless-stopped

    All you need to do now is run:

    Bash
    docker compose up -d

    That’s it—done! 🎉

    Oh, and by the way, I’m not using the original image for this. Instead, I went with the Linuxserver.io variant. There’s no specific reason for this—I just felt like it! 😄

    Important!

    Unlike the Linuxserver.io guide, I skipped setting the superuser credentials in the .env file. Instead, I created the superuser manually with the following command:

    Bash
    docker compose exec healthchecks python /app/healthchecks/manage.py createsuperuser

    This allows you to set up your superuser interactively and securely, directly within the container.

    If you’re doing a standalone deployment, you’d typically set up a reverse proxy to handle SSL in front of Healthchecks.io. This way, you avoid dealing with SSL directly in the app. Personally, I use a centralized Nginx Proxy Manager running on a dedicated machine for all my deployments. I’ve even written an article about setting it up with SSL certificates—feel free to check that out!

    Once your site is served through the reverse proxy over the domain you specified in the configuration, you’ll be able to access the front end using the credentials you created with the createsuperuser command.

    There are plenty of guides for setting up reverse proxies, and if you’re exploring alternatives, I’m also a big fan of Caddy—it’s simple, fast, and works like a charm!

    Here is a finished Docker Compose file with Nginx Proxy Manager:

    docker-compose.yml
    services:
      npm:
        image: 'jc21/nginx-proxy-manager:latest'
        container_name: nginx-proxy-manager
        restart: unless-stopped
        ports:
          - '443:443'
          - '81:81'
        volumes:
          - ./npm/data:/data
          - ./npm/letsencrypt:/etc/letsencrypt
    
      healthchecks:
        image: lscr.io/linuxserver/healthchecks:latest
        container_name: healthchecks
        env_file:
          - .env
        volumes:
          - ./healthchecks/config:/config
        restart: unless-stopped

    In Nginx Proxy Manager, your proxied host would be “http://healthchecks:8000”.

    If you did not follow my post, you will also need to expose port 80 on the proxy for “regular” Let’s Encrypt certificates without a DNS challenge.

    Healthchecks.io

    If you encounter any errors while trying to access the UI of your newly deployed Healthchecks, the issue is most likely related to the settings in your .env file. Double-check the following to ensure they match your domain configuration:

    .env
    SITE_ROOT=https://ping.yourdomain.de
    ALLOWED_HOSTS=ping.yourdomain.de
    CSRF_TRUSTED_ORIGINS=https://ping.yourdomain.de

    Once you’re in, the first step is to create a new project. After that, let’s set up your first simple check.

    For this example, I’ll create a straightforward uptime monitor for my WordPress host. I’ll set up a cronjob that runs every hour and sends an “alive” ping to my Healthchecks.io instance.

    The grace period is essential to account for high latency. For instance, if my WordPress host is under heavy load, an outgoing request might take a few extra seconds to complete. Setting an appropriate grace period ensures that occasional delays don’t trigger false alerts.

    I also prefer to “ping by UUID”. Keeping these endpoints secret is crucial—if someone else gains access to your unique ping URL, they could send fake pings to your Healthchecks.io instance, causing you to miss real downtimes.

    Click on the Usage Example button in your Healthchecks.io dashboard to find ready-to-use, copy-paste snippets for various languages and tools. For this setup, I’m going with bash:

    Bash
    curl -m 10 --retry 5 https://ping.yourdomain.de/ping/67162f7b-5daa-4a31-8667-abf7c3e604d8
    • -m sets the max timeout to 10 seconds. You can change the value, but do not leave it out!
    • --retry tells curl to retry the request 5 times before aborting.

    Here’s how you can integrate it into a crontab:

    Bash
    # A sample crontab entry. Note the curl call appended after the command.
    # FIXME: replace "/your/command.sh" below with the correct command!
    0 * * * * /your/command.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://ping.yourdomain.de/ping/67162f7b-5daa-4a31-8667-abf7c3e604d8
    

    To edit your crontab just run:

    Bash
    crontab -e

    The curl command to Healthchecks.io will only execute if command.sh completes successfully without any errors. This ensures that you’re notified only when the script runs without issues.

    After you run that command, your dashboard should look like this:

    Advanced Checks

    While this is helpful, you might often need more detailed information, such as whether the job started but didn’t finish or how long the job took to complete.

    Healthchecks.io provides all the necessary documentation built right into the platform. You can visit /docs/measuring_script_run_time/ on your instance to find fully functional examples.

    Bash
    #!/bin/sh
    
    RID=`uuidgen`
    CHECK_ID="67162f7b-5daa-4a31-8667-abf7c3e604d8"
    
    # Send a start ping, specify rid parameter:
    curl -fsS -m 10 --retry 5 "https://ping.yourdomain.de/ping/$CHECK_ID/start?rid=$RID"
    
    # Put your command here
    /usr/bin/python3 /path/to/a_job_to_run.py
    
    # Send the success ping, use the same rid parameter:
    curl -fsS -m 10 --retry 5 "https://ping.yourdomain.de/ping/$CHECK_ID?rid=$RID"

    As you can see, this will give me the execution time as well:

    Here, I used a more complex cron expression. To ensure it works as intended, I typically rely on Crontab.guru for validation. You can use the same cron expression here as in your local crontab. The grace period depends on how long you expect the job to run; in my case, 10 seconds should be sufficient.
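
    For illustration, here is what a crontab entry for the wrapper script above could look like; the schedule and script path are just examples, not the ones from my setup:

    Bash
    # Run the wrapped job every weekday at 03:15 and let the script handle the start/success pings
    15 3 * * 1-5 /path/to/job_with_pings.sh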

    Notifications

    You probably don’t want to find yourself obsessively refreshing the dashboard at 3 a.m., right? Ideally, you only want to be notified when something important happens.

    Thankfully, Healthchecks.io offers plenty of built-in notification options. And for even more flexibility, we enabled Apprise in the .env file earlier, unlocking a huge range of additional integrations.

    For notifications, I usually go with Discord or Node-RED, since they work great with webhook-based systems.

    While you could use Apprise for Discord notifications, the simplest route is to use the Slack integration. Here’s the fun part: Slack and Discord webhooks are fully compatible, so you can use the Slack integration to send messages directly to your Discord server without any extra configuration!

    This way, you’re only disturbed when something really needs your attention—and it’s super easy to set up.

    Discord already provides an excellent Introduction to Webhooks that walks you through setting them up for your server, so I won’t dive into the details here.

    All you need to do is copy the webhook URL from Discord and paste it into the Slack integration’s URL field in Healthchecks.io. That’s it—done! 🎉
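
    If you want to test the webhook by hand first, Discord also accepts Slack-formatted payloads on its Slack-compatible endpoint (append /slack to the webhook URL; the IDs below are placeholders):

    Bash
    curl -H "Content-Type: application/json" \
         -d '{"text": "Test notification from Healthchecks"}' \
         "https://discord.com/api/webhooks/<WebhookID>/<WebhookToken>/slack"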

    With this simple setup, you’ll start receiving notifications directly in your Discord server whenever something requires your attention. Easy and effective!

    On the Discord side it will look like this:

    With this setup, you won’t be bombarded with notifications every time your job runs. Instead, you’ll only get notified if the job fails and then again when it’s back up and running.

    I usually prefer creating dedicated channels for these notifications to keep things organized and avoid spamming anyone:

    EDIT:

    I ran into some issues with multiple Slack notifications in different projects. If you get 400 errors, just use Apprise. The Discord URL would look like this:

    discord://{WebhookID}/{WebhookToken}/
    
    for example:
    
    discord://13270700000000002/V-p2SweffwwvrwZi_hc793z7cubh3ugi97g387gc8svnh

    Status Badges

    In one of my projects, I explained how I use SVG badges to show my customers whether a service is running.

    Here’s a live badge (hopefully it’s still active when you see this):

    bearbot

    Getting these badges is incredibly easy. Simply go to the “Badges” tab in your Healthchecks.io dashboard and copy the pre-generated HTML to embed the badge on your website. If you’re not a fan of the badge design, you can create your own by writing a custom JavaScript function to fetch the status as JSON and style it however you like.

    Here is a code example:

    HTML
    <head>
    <style>
        .badge {
            display: inline-block;
            padding: 10px 20px;
            border-radius: 5px;
            color: white;
            font-family: Arial, sans-serif;
            font-size: 16px;
            font-weight: bold;
            text-align: center;
            box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
            transition: background-color 0.3s ease;
        }
        .badge.up {
            background-color: #28a745; /* Green for "up" */
        }
        .badge.down {
            background-color: #dc3545; /* Red for "down" */
        }
        .badge.grace {
            background-color: #ffc107; /* Yellow for "grace" */
        }
    </style>
    </head>
    <body>
    <div id="statusBadge" class="badge">Loading...</div>
    
    <script>
        // Replace this endpoint with your own badge URL
        const endpoint = "https://ping.yourdomain.de/badge/XXXX-XXX-4ff6-XXX-XbS-2.json";
        // Refresh interval in milliseconds (defined here so setInterval below can see it)
        const interval = 60000;

        async function updateBadge() {
            try {
                const response = await fetch(endpoint);
                const data = await response.json();
    
                const badge = document.getElementById('statusBadge');
                badge.textContent = `Status: ${data.status.toUpperCase()} (Total: ${data.total}, Down: ${data.down})`;
    
                badge.className = 'badge';
                if (data.status === "up") {
                    badge.classList.add('up');
                } else if (data.status === "down") {
                    badge.classList.add('down');
                } else if (data.status === "grace") {
                    badge.classList.add('grace');
                }
            } catch (error) {
                console.error("Error fetching badge data:", error);
                const badge = document.getElementById('statusBadge');
                badge.textContent = "Error fetching data";
                badge.className = 'badge down';
            }
        }
    
        updateBadge();
        setInterval(updateBadge, interval);
    </script>
    </body>

    The result:

    It might not look great, but the key takeaway is that you can customize the style to fit seamlessly into your design.

    Conclusion

    We’ve covered a lot of ground today, and I hope you now have a fully functional Healthchecks.io setup. No more sleepless nights worrying about whether your cronjobs ran successfully!

    So, rest easy and sleep tight, little kitten 🐱—your cronjobs are in good hands now.

  • Bearbot.dev – The Final Form

    Bearbot.dev – The Final Form

    I wanted to dedicate a special post to the new version of Bearbot. While I’ve already shared its history in a previous post, it really doesn’t capture the full extent of the effort I’ve poured into this update.

    The latest version of Bearbot boasts a streamlined tech stack designed to cut down on tech debt, simplify the overall structure, and laser-focus on its core mission: raking in those sweet stock market gains 🤑.

    Technologies

    • Django (Basic, not DRF): A high-level Python web framework that simplifies the development of secure and scalable web applications. It includes built-in tools for routing, templates, and ORM for database interactions. Ideal for full-stack web apps without a heavy focus on APIs.
    • Postgres: A powerful, open-source relational database management system known for its reliability, scalability, and advanced features like full-text search, JSON support, and transactions.
    • Redis: An in-memory data store often used for caching, session storage, and real-time messaging. It provides fast read/write operations, making it perfect for performance optimization.
    • Celery: A distributed task queue system for managing asynchronous tasks and scheduling. It’s commonly paired with Redis or RabbitMQ to handle background jobs like sending emails or processing data.
    • Bootstrap 5: A popular front-end framework for designing responsive, mobile-first web pages. It includes pre-designed components, utilities, and customizable themes.
    • Docker: A containerization platform that enables the packaging of applications and their dependencies into portable containers. It ensures consistent environments across development, testing, and production.
    • Nginx: A high-performance web server and reverse proxy server. It efficiently handles HTTP requests, load balancing, and serving static files for web applications.

    To streamline my deployments, I turned to Django Cookiecutter, and let me tell you—it’s been a game changer. It’s completely transformed how quickly I can get a production-ready Django app up and running.

    For periodic tasks, I’ve swapped out traditional Cron jobs in favor of Celery. The big win here? Celery lets me manage all asynchronous jobs directly from the Django Admin interface, making everything so much more efficient and centralized.

    Sweet, right?

    Features

    Signals

    At its core, this is Bearbot’s most important feature—it tells you what to trade. To make it user-friendly, I added search and sort functionality on the front end. This is especially handy if you have a long list of signals, and it also improves the mobile experience. Oh, and did I mention? Bearbot is fully responsive by design.

    I won’t dive into how these signals are calculated or the reasoning behind them—saving that for later.

    Available Options

    While you’ll likely spend most of your time on the Signals page, the Options list is there to show what was considered for trading but didn’t make the cut.

    Data Task Handling

    Although most tasks can be handled via the Django backend with scheduled triggers, I created a more fine-tuned control system for data fetching. For example, if fetching stock data fails for just AAPL, re-running the entire task would unnecessarily stress the server and APIs. With this feature, I can target specific data types, timeframes, and stocks.

    User Management

    Bearbot offers complete user management powered by django-allauth, with clean and well-designed forms. It supports login, signup, password reset, profile updates, multifactor authentication, and even magic links for seamless access.

    Data Management

    Thanks to Django’s built-in admin interface, managing users, data, and other admin-level tasks is a breeze. It’s fully ready out of the box to handle just about anything you might need as an admin.

    Keeping track of all the Jobs

    When it comes to monitoring cronjobs—whether they’re running in Node-RED, n8n, Celery, or good old-fashioned Cron—Healthchecks.io has always been my go-to solution.

    If you’ve visited the footer of the bearbot.dev website, you might have noticed two neat SVG graphics:

    Those are dynamically loaded from my self-hosted Healthchecks instance, giving a quick visual of job statuses. It’s simple, effective, and seamlessly integrated!

    On my end it looks like this:

    I had to remove a bunch of info; otherwise, anyone could send uptime or downtime requests to my Healthchecks.io instance.

    How the Signals work

    Every trading strategy ever created has an ideal scenario—a “perfect world”—where all the stars align, and the strategy delivers its best results.

    Take earnings-based trading as an example. The ideal situation here is when a company’s earnings surprise analysts with outstanding results. This effect is even stronger if the company was struggling before and suddenly outperforms expectations.

    Now, you might be thinking, “How could I possibly predict earnings without insider information?” There are a lot of things you could consider that indicate positive earnings, such as:

    • Launching hyped new products that dominate social media conversations.
    • Announcing major partnerships.
    • Posting a surge in job openings that signal strategic growth.

    There are a lot of factors that indicate a company is doing really well.

    Let’s say you focus on one or two specific strategies. You spend considerable time researching these strategies and identifying supporting factors. Essentially, you create a “perfect world” for those scenarios.

    Bearbot then uses statistics to calculate how closely a trade aligns with this perfect world. It scans a range of stocks and options, simulating and comparing them against the ideal scenario. Anything scoring above a 96% match gets selected. On average, this yields about 4-5 trades per month, and each trade typically delivers a 60-80% profit.

    Sounds like a dream, right? Well, here’s the catch: it’s not foolproof. There’s no free lunch on Wall Street, and certainly no guaranteed money.

    The remaining 4% of trades that don’t align perfectly? They can result in complete losses—not just the position, but potentially your entire portfolio. Bearbot operates with tight risk tolerance, riding trades until the margin call. I’ve experienced this firsthand on a META trade. The day after opening the position, news of a data breach fine broke, causing the stock to plummet. I got wiped out because I didn’t have enough cash to cover the margin requirements. Ironically, Bearbot’s calculations were right—had I been able to hold through the temporary loss, I would’ve turned a profit. (Needless to say, I’ve since implemented much better risk management. You live, you learn.)

    If someone offers or sells you a foolproof trading strategy, it’s a scam. If their strategy truly worked, they’d keep it secret and wouldn’t share it with you. Certainly not for 100€ in some chat group.

    I’m not sharing Bearbot’s strategy either—and I have no plans to sell or disclose its inner workings. I built Bearbot purely for myself and a few close friends. The website offers no guidance on using the signals or where to trade, and I won’t answer questions about it.

    Bearbot is my personal project—a fun way to explore Django while experimenting with trading strategies 😁.

  • Scraproxy: A High-Performance Web Scraping API

    Scraproxy: A High-Performance Web Scraping API

    After building countless web scrapers over the past 15 years, I decided it was time to create something truly versatile—a tool I could use for all my projects, hosted anywhere I needed it. That’s how Scraproxy was born: a high-performance web scraping API that leverages the power of Playwright and is built with FastAPI.

    Scraproxy streamlines web scraping and automation by enabling browsing automation, content extraction, and advanced tasks like capturing screenshots, recording videos, minimizing HTML, and tracking network requests and responses. It even handles challenges like cookie banners, making it a comprehensive solution for any scraping or automation project.

    Best of all, it’s free and open-source. Get started today and see what it can do for you. 🔥

    👉 https://github.com/StasonJatham/scraproxy

    Features

    • Browse Web Pages: Gather detailed information such as network data, logs, redirects, cookies, and performance metrics.
    • Screenshots: Capture live screenshots or retrieve them from cache, with support for full-page screenshots and thumbnails.
    • Minify HTML: Minimize HTML content by removing unnecessary elements like comments and whitespace.
    • Extract Text: Extract clean, plain text from HTML content.
    • Video Recording: Record a browsing session and retrieve the video as a webm file.
    • Reader Mode: Extract the main readable content and title from an HTML page, similar to “reader mode” in browsers.
    • Markdown Conversion: Convert HTML content into Markdown format.
    • Authentication: Optional Bearer token authentication using API_KEY.

    Technology Stack

    • FastAPI: For building high-performance, modern APIs.
    • Playwright: For automating web browser interactions and scraping.
    • Docker: Containerized for consistent environments and easy deployment.
    • Diskcache: Efficient caching to reduce redundant scraping requests.
    • Pillow: For image processing, optimization, and thumbnail creation.

    Working with Scraproxy

    Thanks to FastAPI, it comes with full API documentation via ReDoc.

    After deploying it as described on my GitHub page, you can use it like so:

    #!/bin/bash
    
    # Fetch the JSON response from the API
    json_response=$(curl -s "http://127.0.0.1:5001/screenshot?url=https://karl.fail")
    
    # Extract the Base64 string using jq
    base64_image=$(echo "$json_response" | jq -r '.screenshot')
    
    # Decode the Base64 string and save it as an image
    echo "$base64_image" | base64 --decode > screenshot.png
    
    echo "Image saved as screenshot.png"

    Make sure jq is installed.

    The API provides images in base64 format, so we use the native base64 command to decode it and save it as a PNG file. If everything went smoothly, you should now have a file named “screenshot.png”.

    Keep in mind, this isn’t a full-page screenshot. For that, you’ll want to use this script:

    #!/bin/bash
    
    # Fetch the JSON response from the API
    json_response=$(curl -s "http://127.0.0.1:5001/screenshot?url=https://karl.fail&full_page=true")
    
    # Extract the Base64 string using jq
    base64_image=$(echo "$json_response" | jq -r '.screenshot')
    
    # Decode the Base64 string and save it as an image
    echo "$base64_image" | base64 --decode > screenshot.png
    
    echo "Image saved as screenshot.png"

    Just add &full_page=true, and voilà! You’ll get a clean, full-page screenshot of the website.

    The best part? You can run this multiple times since the responses are cached, which helps you avoid getting blocked too quickly.
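
    If you’ve enabled the optional API_KEY authentication, the same call just needs a Bearer token; here’s a minimal sketch, assuming the key is exported as API_KEY in your shell:

    #!/bin/bash
    
    # Same screenshot request, but with Bearer token authentication enabled
    json_response=$(curl -s \
      -H "Authorization: Bearer $API_KEY" \
      "http://127.0.0.1:5001/screenshot?url=https://karl.fail")
    
    # Decode and save the screenshot as before
    echo "$json_response" | jq -r '.screenshot' | base64 --decode > screenshot.png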

    Conclusion

    I’ll be honest with you—I didn’t go all out on the documentation for this. But don’t worry, the code is thoroughly commented, and you can easily figure things out by taking a look at the app.py file.

    That said, I’ve used this in plenty of my own projects as my go-to tool for fetching web data, and it’s been a lifesaver. Feel free to jump in, contribute, and help make this even better!

  • Sandkiste.io – A Smarter Sandbox for the Web

    Sandkiste.io – A Smarter Sandbox for the Web

    As a principal incident responder, my team and I often face the challenge of analyzing potentially malicious websites quickly and safely. This work is crucial, but it can also be tricky, especially when it risks compromising our test environments. Burning through test VMs every time we need to inspect a suspicious URL is far from efficient.

    There are some great tools out there to handle this, many of which are free and widely used, such as:

    • urlscan.io – A tool for visualizing and understanding web requests.
    • VirusTotal – Renowned for its file and URL scanning capabilities.
    • Joe Sandbox – A powerful tool for detailed malware analysis.
    • Web-Check – Another useful resource for URL scanning.

    While these tools are fantastic for general purposes, I found myself needing something more tailored to my team’s specific needs. We needed a solution that was straightforward, efficient, and customizable—something that fit seamlessly into our workflows.

    So, I decided to create it myself: Sandkiste.io. My goal was to build a smarter, more accessible sandbox for the web that not only matches the functionality of existing tools but offers the simplicity and flexibility we required for our day-to-day incident response tasks with advanced features (and a beautiful UI 🤩!).

    Sandkiste.io is part of a larger vision I’ve been working on through my Exploit.to platform, where I’ve built a collection of security-focused tools designed to make life easier for incident responders, analysts, and cybersecurity enthusiasts. This project wasn’t just a standalone idea—it was branded under the Exploit.to umbrella, aligning with my goal of creating practical and accessible solutions for security challenges.

    The Exploit.to logo

    If you haven’t explored Exploit.to, it’s worth checking out. The website hosts a range of open-source intelligence (OSINT) tools that are not only free but also incredibly handy for tasks like gathering public information, analyzing potential threats, and streamlining security workflows. You can find these tools here: https://exploit.to/tools/osint/.

    Technologies Behind Sandkiste.io: Building a Robust and Scalable Solution

    Sandkiste.io has been, and continues to be, an ambitious project that combines a variety of technologies to deliver speed, reliability, and flexibility. Like many big ideas, it started small—initially leveraging RabbitMQ, custom Golang scripts, and chromedp to handle tasks like web analysis. However, as the project evolved and my vision grew clearer, I transitioned to my favorite tech stack, which offers the perfect blend of power and simplicity.

    Here’s the current stack powering Sandkiste.io:

    Django & Django REST Framework

    At the heart of the application is Django, a Python-based web framework known for its scalability, security, and developer-friendly features. Coupled with Django REST Framework (DRF), it provides a solid foundation for building robust APIs, ensuring smooth communication between the backend and frontend.

    Celery

    For task management, Celery comes into play. It handles asynchronous and scheduled tasks, ensuring the system can process complex workloads—like analyzing multiple URLs—without slowing down the user experience. It integrates easily with Django, and the developer experience and ecosystem around it are amazing.

    Redis

    Redis acts as the message broker for Celery and provides caching support. Its lightning-fast performance ensures tasks are queued and processed efficiently. Redis is, and has been, my go-to, although I did enjoy RabbitMQ a lot.

    PostgreSQL

    For the database, I chose PostgreSQL, a reliable and feature-rich relational database system. Its advanced capabilities, like full-text search and JSONB support, make it ideal for handling complex data queries. Full-text search works perfectly with Django; here is a very detailed post about it.

    FastAPI

    FastAPI adds speed and flexibility to certain parts of the system, particularly where high-performance APIs are needed. Its modern Python syntax and automatic OpenAPI documentation make it a joy to work with. It is used to decouple the Scraper logic, since I wanted this to be a standalone project called “Scraproxy“.

    Playwright

    For web scraping and analysis, Playwright is the backbone. It’s a modern alternative to Selenium, offering cross-browser support and powerful features for interacting with websites in a headless (or visible) manner. This ensures that even complex, JavaScript-heavy sites can be accurately analyzed. The killer feature is how easy it is to capture a video and record network activity, which are basically the two main features needed here.

    React with Tailwind CSS and shadcn/ui

    On the frontend, I use React for building dynamic user interfaces. Paired with TailwindCSS, it enables rapid UI development with a clean, responsive design. shadcn/ui (a component library based on Radix) further enhances the frontend by providing pre-styled, accessible components that align with modern design principles.

    This combination of technologies allows Sandkiste.io to be fast, scalable, and user-friendly, handling everything from backend processing to an intuitive frontend experience. Whether you’re inspecting URLs, performing in-depth analysis, or simply navigating the site, this stack ensures a seamless experience. I also have the most experience with React and Tailwind 😁.

    Features of Sandkiste.io: What It Can Do

    Now that you know the technologies behind Sandkiste.io, let me walk you through what this platform is capable of. Here are the key features that make Sandkiste.io a powerful tool for analyzing and inspecting websites safely and effectively:

    Certificate Lookups

    One of the fundamental features is the ability to perform certificate lookups. This lets you quickly fetch and review SSL/TLS certificates for a given domain. It’s an essential tool for verifying the authenticity of websites, identifying misconfigurations, or detecting expired or suspicious certificates. We use it a lot to find possibly auto-generated subdomains and to get a better picture of the adversary’s infrastructure; it helps with recon in general. I get the info from crt.sh, which offers an exposed SQL database for these lookups.
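
    If you want to poke at the same data by hand, crt.sh also exposes a JSON endpoint that works nicely with curl and jq (the domain below is a placeholder):

    Bash
    # List certificate names issued for a domain and its subdomains via crt.sh
    curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u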

    DNS Records

    Another key feature of Sandkiste.io is the ability to perform DNS records lookups. By analyzing a domain’s DNS records, you can uncover valuable insights about the infrastructure behind it, which can often reveal patterns or tools used by adversaries.

    DNS records provide critical information about how a domain is set up and where it points. For cybersecurity professionals, this can offer clues about:

    • Hosting Services: Identifying the hosting provider or server locations used by the adversary.
    • Mail Servers: Spotting potentially malicious email setups through MX (Mail Exchange) records.
    • Subdomains: Finding hidden or exposed subdomains that may indicate a larger infrastructure or staging areas.
    • IP Addresses: Tracing A and AAAA records to uncover the IP addresses linked to a domain, which can sometimes reveal clusters of malicious activity.
    • DNS Security Practices: Observing whether DNSSEC is implemented, which might highlight the sophistication (or lack thereof) of the adversary’s setup.

    By checking DNS records, you not only gain insights into the domain itself but also start piecing together the tools and services the adversary relies on. This can be invaluable for identifying common patterns in malicious campaigns or for spotting weak points in their setup that you can exploit to mitigate threats.
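
    The same records are easy to pull manually with dig, which is handy for quick spot checks (placeholder domain again):

    Bash
    # IPs the domain points to
    dig +short A example.com
    dig +short AAAA example.com
    
    # Mail and name servers often reveal the hosting setup
    dig +short MX example.com
    dig +short NS example.com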

    HTTP Requests and Responses Analysis

    One of the core features of Sandkiste.io is the ability to analyze HTTP requests and responses. This functionality is a critical part of the platform, as it allows you to dive deep into what’s happening behind the scenes when a webpage is loaded. It reveals the files, scripts, and external resources that the website requests—many of which users never notice.

    When you visit a webpage, the browser makes numerous background requests to load additional resources like:

    • JavaScript files
    • CSS stylesheets
    • Images
    • APIs
    • Third-party scripts or trackers

    These requests often tell a hidden story about the behavior of the website. Sandkiste captures and logs every request: every HTTP request made by the website is recorded, along with its corresponding response. (Yup, we store the raw data as well.) For security professionals, monitoring and understanding these requests is essential because:

    • Malicious Payloads: Background scripts may contain harmful code or trigger the download of malware.
    • Unauthorized Data Exfiltration: The site might be sending user data to untrusted or unexpected endpoints.
    • Suspicious Third-Party Connections: You can spot connections to suspicious domains, which might indicate phishing attempts, tracking, or other malicious activities.
    • Alerts for Security Teams: Many alerts in security monitoring tools stem from these unnoticed, automatic requests that trigger red flags.

    Security Blocklist Check

    The Security Blocklist Check is another standout feature of Sandkiste.io, inspired by the great work at web-check.xyz. The concept revolves around leveraging malware-blocking DNS servers to verify if a domain is blacklisted. But I took it a step further to make it even more powerful and insightful.

    Instead of simply checking whether a domain is blocked, Sandkiste.io enhances the process by using a self-hosted AdGuard DNS server. This server doesn’t just flag blocked domains—it captures detailed logs to provide deeper insights. By capturing logs from the DNS server, Sandkiste.io doesn’t just say “this domain is blacklisted.” It identifies why it’s flagged and where the block originated, which enables me to assign categories to the domains. The overall score tells you very quickly whether the page is safe or not.
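
    The underlying idea is easy to reproduce by hand: ask a filtering DNS resolver about the domain and see how it answers. A rough sketch against AdGuard’s public DNS (my own instance is self-hosted; blocked domains typically come back as 0.0.0.0):

    Bash
    # Query a filtering DNS resolver; a 0.0.0.0 answer usually means the domain is blocked
    dig @94.140.14.14 +short suspicious-domain.example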

    Video of the Session

    One of the most practical features of Sandkiste.io is the ability to create a video recording of the session. This feature was the primary reason I built the platform—because a single screenshot often falls short of telling the full story. With a video, you gain a complete, dynamic view of what happens during a browsing session.

    Static screenshots capture a single moment in time, but they don’t show the sequence of events that can provide critical insights, such as:

    • Pop-ups and Redirects: Videos reveal if and when pop-ups appear or redirects occur, helping analysts trace how users might be funneled into malicious websites or phishing pages.
    • Timing of Requests: Understanding when specific requests are triggered can pinpoint what actions caused them, such as loading an iframe, clicking a link, or executing a script.
    • Visualized Responses: By seeing the full process—what loads, how it behaves, and the result—you get a better grasp of the website’s functionality and intent.
    • Recreating the User Journey: Videos enable you to recreate the experience of a user who might have interacted with the target website, helping you diagnose what happened step by step.

    A video provides a much clearer picture of the target website’s behavior than static tools alone.

    How Sandkiste.io Works: From Start to Insight

    Using Sandkiste.io is designed to be intuitive and efficient, guiding you through the analysis process step by step while delivering detailed, actionable insights.

    You kick things off by simply starting a scan. Once initiated, you’re directed to a loading page, where you can see which tasks (or “workers”) are still running in the background.

    This page keeps you informed without overwhelming you with unnecessary technical details.

    The Results Page

    Once the scan is complete, you’re automatically redirected to the results page, where the real analysis begins. Let’s break down what you’ll see here:

    Video Playback

    At the top, you’ll find a video recording of the session, showing everything that happened after the target webpage was loaded. This includes:

    • Pop-ups and redirects.
    • The sequence of loaded resources (scripts, images, etc.).
    • Any suspicious behavior, such as unexpected downloads or external connections.

    This video gives you a visual recap of the session, making it easier to understand how the website behaves and identify potential threats.

    Detected Technologies

    Below the video, you’ll see a section listing the technologies detected. These are inferred from response headers and other site metadata, and they can include:

    • Web frameworks (e.g., Django, WordPress).
    • Server information (e.g., Nginx, Apache).

    This data is invaluable for understanding the website’s infrastructure and spotting patterns that could hint at malicious setups.

    Statistics Panel

    On the right side of the results page, there’s a statistics panel with several semi-technical but insightful metrics. Here’s what you can learn:

    • Size Percentile:
      • Indicates how the size of the page compares to other pages.
      • Why it matters: Unusually large pages can be suspicious, as they might contain obfuscated code or hidden malware.
    • Number of Responses:
      • Shows how many requests and responses were exchanged with the server.
      • Why it matters: A high number of responses could indicate excessive tracking, unnecessary redirects, or hidden third-party connections.
    • Duration to “Network Idle”:
      • Measures how long it took for the page to fully load and stop making network requests.
      • Why it matters: Some pages continue running scripts in the background even after appearing fully loaded, which can signal malicious or resource-intensive behavior.
    • Redirect Chain Analysis:
      • A list of all redirects encountered during the session.
      • Why it matters: A long chain of redirects is a common tactic in phishing, ad fraud, or malware distribution campaigns.

    By combining these insights—visual evidence from the video, infrastructure details from detected technologies, and behavioral stats from the metrics—you get a comprehensive view of the website’s behavior. This layered approach helps security analysts identify potential threats with greater accuracy and confidence.

    At the top of the page, you’ll see the starting URL and the final URL you were redirected to.

    • “Public” means that others can view the scan.
    • The German flag indicates that the page is hosted in Germany.
    • The IP address shows the final server we landed on.

    The party emoji signifies that the page is safe; if it weren’t, you’d see a red skull (spooky!). Earlier, I explained the criteria for flagging a page as good or bad.

    On the “Responses” page I mentioned earlier, you can take a closer look at them. Here, you can see exactly where the redirects are coming from and going to. I’ve added a red shield icon to clearly indicate when HTTP is used instead of HTTPS.

    As an analyst, it’s pretty common to review potentially malicious scripts. Clicking on one of the results will display the raw response safely. In the image below, I clicked on that long JavaScript URL (normally a risky move, but in Sandkiste, every link is completely safe!).

    Conclusion

    And that’s the story of Sandkiste.io, a project I built over the course of a month in my spare time. While the concept itself was exciting, the execution came with its own set of challenges. For me, the toughest part was achieving a real-time feel for the user experience while ensuring the asynchronous jobs running in the background were seamlessly synced back together. It required a deep dive into task coordination and real-time updates, but it taught me lessons that I now use with ease.

    Currently, Sandkiste.io is still in beta and runs locally within our company’s network. It’s used internally by my team to streamline our work and enhance our incident response capabilities. Though it’s not yet available to the public, it has already proven its value in simplifying complex tasks and delivering insights that traditional tools couldn’t match.

    Future Possibilities

    While it’s an internal tool for now, I can’t help but imagine where this could go.

    For now, Sandkiste.io remains a testament to what can be built with focus, creativity, and a drive to solve real-world problems. Whether it ever goes public or not, this project has been a milestone in my journey, and I’m proud of what it has already achieved. Who knows—maybe the best is yet to come!

  • Toxic.Best – Raising Awareness About Toxic Managers

    Toxic.Best – Raising Awareness About Toxic Managers

    Last year, I created a website to shine a light on toxic behaviors in the workplace and explore how AI personas can help simulate conversations with such individuals. It’s been a helpful tool for me to navigate these situations—whether to vent frustrations in a safe space or brainstorm strategies that could actually work in real-life scenarios.

    Feel free to check it out here: https://toxic.best

    Technologies used

    This static website is entirely free to build and host. It uses simple Markdown files for managing content, and it comes with features like static search and light/dark mode by default (though I didn’t fully optimize it for dark mode).

    More about the Project

    I actually came up with a set of names for common types of toxic managers and wrote detailed descriptions for each. Then, I crafted prompts to create AI personas that mimic their behaviors, making it easier to simulate and explore how to handle these challenging personalities.

    Der Leermeister

    One example is “Der Leermeister,” a term I use to describe a very hands-off, absent type of manager—the polar opposite of a micromanager. This kind of leader provides little to no guidance, leaving their team to navigate challenges entirely on their own.

    ## Persona
    
    You are a toxic manager with a fondness for
    generic platitudes, questionable "old guy"
    humor, and statements with no substance. Your
    behavior is defined by a strong fixation on
    popularity and the avoidance of any
    responsibility. You delegate every task, even
    your own, to your team and paper over your
    lack of technical knowledge with buzzwords
    and bad jokes. At the same time, you praise
    your team in a superficial, stereotypical way
    without ever engaging with individual
    achievements.
    
    ## Behavior:
    
    - **Delegation**: You pass tasks along, give
      no clear instructions, and respond only with
      platitudes or generic statements.
    - **Platitudes**: Your answers should be
      meaningless, superficial, and as vague as
      possible; the use of buzzwords and stock
      phrases is essential.
    - **Praise**: Every bit of praise is superficial
      and sweeping, with no real engagement with
      the matter at hand.
    - **Humor**: Your jokes and one-liners are
      questionable, often inappropriate, and never
      helpful.
    - **Creativity**: You combine your favorite
      phrases with similar statements or small
      variations that are just as empty.
    
    ## Tone:
    
    Your tone is short, unhelpful, and often
    condescending. Every answer should give the
    impression that you are engaged without
    actually contributing anything constructive.
    You may respond creatively to the situation,
    as long as you never give helpful or
    solution-oriented answers.
    
    ## Task:
    
    - **Short and unhelpful**: Your answers should
      be at most 1-2 sentences long.
    - **No solutions**: Your answers must not
      contain any real value or helpful content.
    
    ## Examples of your behavior:
    
    - When someone asks you for support, you reply:
      "Sorry to hear the house is on fire over there. Let me know if it gets to be too much."
    - To a request you say:
      "You guys are the best. Shout if you need help."
    - Or you react with:
      "Thanks for pointing this out to me. Could you take care of it for me?"
    
    Answer questions and requests following this pattern: friendly,
    but without concrete offers of help or active support.
    
    ## Goal:
    
    It is your job to answer every text or question
    in `<anfrage></anfrage>` according to your
    persona. Keep the answers short, generic, and
    toxic by using your favorite phrases and
    creatively building on them.
    
    <anfrage>
    Hello boss,
    
    our team is extremely
    overloaded and we only manage 50% of the
    tickets that come in.
    What should we do?
    </anfrage>

    I also wrote some pointers on “Prompts for Prompts,” which I’m planning to translate for this blog soon. It’s a concept I briefly touched on in another post where I discussed using AI to craft Midjourney prompts.

    Check out the dark mode

    Conclusion

    All in all, it was a fun project to work on. Astro.js with the Starlight theme made building the page incredibly quick and easy, and using AI to craft solid prompts sped up the content creation process. In total, it took me about a day to put everything together.
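
    If you want to try the same setup, a Starlight project can be scaffolded with a single command (assuming Node.js is installed):

    Bash
    npm create astro@latest -- --template starlight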

    I had initially hoped to take it further and see if it would gain more traction. Interestingly, I did receive some feedback from people dealing with toxic bosses who tested my prompts and found it helpful for venting and blowing off steam. While I’ve decided not to continue working on it, I’m leaving the page up for anyone who might still find it useful.

  • Gottor – Exploring the Depths of the Dark Web

    Gottor – Exploring the Depths of the Dark Web



    Welcome to the realm of the hidden, where shadows dance and whispers echo through the digital corridors. Enter Gottor, a testament to curiosity, innovation, and a touch of madness. In this blog post, we embark on a journey through the creation of Gottor, a bespoke dark web search engine that defies convention and pushes the boundaries of exploration.

    Genesis of an Idea

    The genesis of Gottor traces back to a spark of inspiration shared between friends, fueled by a desire to unveil the secrets lurking within the depths of the dark web. Drawing parallels to Shodan, but with a cloak of obscurity, we set out on a quest to build our own gateway to the clandestine corners of the internet.

    Forging Custom Solutions

    Determined to forge our path, we eschewed conventional wisdom and opted for custom solutions. Rejecting standard databases, we crafted our own using the robust framework of BleveSearch, laying the foundation for a truly unique experience. With a simple Tor proxy guiding our way, we delved deeper, fueled by an insatiable thirst for performance.

    However, our zeal for efficiency proved to be a double-edged sword, as our relentless pursuit often led to blacklisting. Undeterred, we embraced the challenge, refining our approach through meticulous processing and data extraction. Yet, the onslaught of onion sites proved overwhelming, prompting a shift towards the versatile embrace of Scrapy.

    The Turning Point

    Amidst the trials and tribulations, a revelation emerged – the adoption of Ahmia’s Tor proxy logic with Polipo. Through the ingenious utilization of multiple Tor entry nodes and a strategic round-robin approach, we achieved equilibrium, evading the ire of blacklisting and forging ahead with renewed vigor.
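
    The round-robin idea itself is simple. Here is a minimal sketch, assuming several local Tor clients each exposing its own SOCKS port; the ports are placeholders, and the real setup fronted Tor with Polipo HTTP proxies rather than talking to SOCKS directly:

    # Minimal sketch of the round-robin rotation, not the original crawler code.
    # Assumes several Tor clients are already running locally, each on its own
    # SOCKS port (placeholders below). Requires the requests[socks] extra.
    import itertools
    import requests

    SOCKS_PORTS = [9050, 9052, 9054]
    _ports = itertools.cycle(SOCKS_PORTS)

    def fetch_onion(url: str, timeout: int = 60) -> str:
        """Fetch a hidden-service page through the next Tor circuit in the rotation."""
        proxy = f"socks5h://127.0.0.1:{next(_ports)}"  # socks5h lets Tor resolve .onion names
        response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=timeout)
        response.raise_for_status()
        return response.text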

    The Ethical Conundrum

    As our creation took shape, we faced an ethical conundrum that cast a shadow over our endeavors. Consulting with legal counsel, we grappled with the implications of anonymity and the responsibility inherent in our pursuit. Ultimately, discretion prevailed, and Gottor remained veiled, a testament to the delicate balance between exploration and accountability.

    Unveiling the Web of Intrigue

    In our quest for knowledge, we unearthed a web of intrigue, interconnected and teeming with hidden services. By casting our digital net wide, we traversed the labyrinthine pathways, guided by popular indexers and a relentless spirit of inquiry. What emerged was a tapestry of discovery, illuminating the clandestine landscape with each query and click.

    Lessons Learned

    Through the crucible of creation, we gained a newfound appreciation for the intricacies of search engines. While acquiring and storing data proved relatively straightforward, the true challenge lay in making it accessible, particularly amidst the myriad complexities of multilingual content. Yet, amidst the obstacles, we discovered the essence of exploration – a journey defined by perseverance, innovation, and the relentless pursuit of knowledge.

    In conclusion, Gottor stands as a testament to the boundless curiosity that drives us to explore the uncharted territories of the digital realm. Though shrouded in secrecy, its legacy endures, an embodiment of the relentless pursuit of understanding in an ever-evolving landscape of discovery.

    Explore. Discover. Gottor.

    Although we have not talked in years, shoutout to my good friend Milan, who helped make this project possible.

  • Vulnster – CVEs explained for humans.

    Vulnster – CVEs explained for humans.

    Have you ever stumbled upon a CVE and felt like you’d entered an alien realm? Let’s decode it with an example: CVE-2023-6511.

    CVE-2023-6511: Before version 120.0.6099.62 of Google Chrome, there’s a glitch in Autofill’s setup. This glitch lets a crafty hacker bypass Autofill safeguards by simply sending you a specially designed webpage. (Chromium security severity: Low)

    Now, if you’re not a cyber expert, this might seem like a cryptic message urging you to update Chrome ASAP. And you know what? Updating is usually a smart move! But what if you’re curious about how cyber villains can exploit this weakness in a way that even your grandma could grasp?

    That’s where Vulnster steps in – Your Go-To Companion for Understanding CVEs, translated into plain language by some nifty AI.

    So, I delved into experimenting with AWS Bedrock after reading about Claude and feeling a bit fed up with ChatGPT (not to mention, Claude was more budget-friendly).

    I kicked things off by setting up a WordPress site, pulling in all the latest CVEs from the CVEProject on GitHub. Then, I whipped up some code to sift through the findings, jazzed them up with Generative AI, and pushed them out into the world using the WordPress API.
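
    The publishing side boils down to a single authenticated REST call. The snippet below is a minimal sketch rather than the exact script I ran; the site URL, user name, and application password are placeholders:

    import requests

    WP_SITE = "https://example.com"          # placeholder site URL
    WP_USER = "api-user"                     # placeholder account name
    WP_APP_PASSWORD = "xxxx xxxx xxxx xxxx"  # WordPress application password (placeholder)

    def publish_post(title: str, html_body: str) -> int:
        """Create a published post via the WordPress REST API and return its ID."""
        response = requests.post(
            f"{WP_SITE}/wp-json/wp/v2/posts",
            auth=(WP_USER, WP_APP_PASSWORD),
            json={"title": title, "content": html_body, "status": "publish"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["id"]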

    WordPress

    I have a soft spot for WordPress. Over the years, I’ve crafted countless WordPress sites, and let me tell you, the ease of use is downright therapeutic. My go-to hosting platform is SiteGround. Their performance is stellar, and when you pair it with Cloudflare cache, you can hit a whopping 95% cache success rate, ensuring lightning-fast page loads.

    But you know what really seals the deal for me with SiteGround? It’s their mail server. They offer unlimited mail accounts, not just aliases, but full-fledged separate mailboxes. This feature is a game-changer, especially when collaborating with others and needing multiple professional email addresses.

    WordPress often takes a hit for its security and performance reputation, but if you know how to set it up correctly, it’s as secure and snappy as can be.

    Python

    Python Passion

    Let me tell you, I have a deep-seated love for Python. It’s been my faithful coding companion for over a decade now. Back in the day, I dabbled in Objective-C and Java, but once I got a taste of Python, there was no turning back. The simplicity and conciseness of Python code blew my mind. It’s incredible how much you can achieve with so few lines of code.

    Python has a lot going for it. From its readability to its extensive library ecosystem, there’s always something new to explore. So naturally, when it came to bringing this project to life, Python was my top choice.

    Here’s just a glimpse of what Python worked its magic on for this project:

    • Fetching and parsing CVEs effortlessly.
    • Seamlessly interacting with AWS through the Bedrock API.
    • Crafting the perfect HTML for our latest blog post.
    • Smoothly pushing all the updates straight to WordPress.

    Python’s versatility knows no bounds. It’s like having a trusty Swiss Army knife in your coding arsenal, ready to tackle any task with ease.
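
    As a rough illustration of the fetching-and-parsing step from the list above, here is a minimal sketch that pulls a single record from the CVEProject’s cvelistV5 repository on GitHub and extracts the English description. The repository layout and field names follow the current CVE JSON 5.x format and are assumptions here, not the exact code this project used:

    import requests

    # Raw GitHub base for the CVEProject/cvelistV5 repository (assumed layout).
    RAW_BASE = "https://raw.githubusercontent.com/CVEProject/cvelistV5/main/cves"

    def fetch_cve_description(cve_id: str) -> str:
        """Download one CVE record and return its English description."""
        # e.g. "CVE-2023-6511" lives under cves/2023/6xxx/CVE-2023-6511.json
        _, year, number = cve_id.split("-")
        bucket = f"{number[:-3]}xxx"
        record = requests.get(f"{RAW_BASE}/{year}/{bucket}/{cve_id}.json", timeout=30).json()
        descriptions = record["containers"]["cna"]["descriptions"]
        return next(d["value"] for d in descriptions if d["lang"].startswith("en"))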

    AWS & Claude

    In my quest for better answers than what I was getting from ChatGPT, I stumbled upon a gem within the AWS ecosystem. Say hello to PartyRock, a playground for exploring various AI models. Among them, one stood out to me: Claude.

    Claude’s capabilities and flexibility really impressed me, especially when compared to what I was getting from ChatGPT. Plus, I needed more input tokens than ChatGPT allowed back then, and Claude came to the rescue.

    Now, here comes the fun part. I’m more than happy to share the entire code with you:

    import re
    import json
    import boto3
    
    
    class Bedrock:
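        """Thin wrapper around the AWS Bedrock runtime used to turn a CVE description into a blog post with Claude."""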
        def __init__(
            self, model="anthropic.claude-instant-v1", service_name="bedrock-runtime"
        ):
            self.model = model
            self.service_name = service_name
    
        def learn(
            self,
            description,
            command="",
            output_format="",
            max_tokens_to_sample=2048,
            temperature=0,
            top_p=0,
            top_k=250,
        ):
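            """Send the CVE description plus writing instructions to the model and return the raw completion."""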
            brt = boto3.client(service_name=self.service_name)
    
            cve_description = description
    
            if not command:
                command = "Write a short, SEO-optimized blog post explaining the vulnerability described in the above paragraph in simple terms. Explain the technology affected and the attack scenario used in general. Provide recommendations on what users should do to protect themselves."
    
            if not output_format:
                output_format = "Output the blog post in <blogpost></blogpost> tags and the heading outside of it in <heading></heading> tags. The heading should get users to click on it and include the Name of the affected Tool or company."
    
            body = json.dumps(
                {
                    "prompt": f"\n\nHuman: <paragraph>{cve_description}</paragraph>\n\n{command}\n{output_format}\n\nAssistant:",
                    "max_tokens_to_sample": max_tokens_to_sample,
                    "temperature": temperature,
                    "top_p": top_p,
                    "top_k": top_k,
                }
            )
    
            accept = "application/json"
            contentType = "application/json"
    
            response = brt.invoke_model(
                body=body, modelId=self.model, accept=accept, contentType=contentType
            )
    
            response_body = json.loads(response.get("body").read())
    
            return response_body.get("completion")
    
        def article(self, content):
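            """Extract the <heading> and <blogpost> sections from the model output, or return None."""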
            heading_pattern = re.compile(r"<heading>(.*?)</heading>", re.DOTALL)
            body_pattern = re.compile(r"<blogpost>(.*?)</blogpost>", re.DOTALL)
    
            heading_matches = heading_pattern.findall(content)
            body_matches = body_pattern.findall(content)
    
            if heading_matches and body_matches:
                return {
                    "heading": heading_matches[0].strip(),
                    "body": body_matches[0].strip(),
                }
    
            return None

    And deploying it? Piece of cake! It’s as simple as:

    ai = Bedrock()
    output = ai.learn(description=cve_description)
    article = ai.article(output)

    Voilà! Easy peasy.

    It’s worth noting that there might be newer models out there now that could potentially outshine Claude. I’ve spent quite some time experimenting with different prompts, and now I’m passing the torch to you. Go ahead and explore, my friend. The world of AI awaits.

    Example

    Wondering if AI can churn out readable blog posts all by itself? Well, here’s a sneak peek at one generated in autorun mode:

    CVE-2024-21678: High Severity Stored XSS Vulnerability Patched in Atlassian Confluence

    CVE: CVE-2024-21678
    CVSS (v3.0): 8.5
    Source: CVE-2024-21678

    A high severity cross-site scripting (XSS) vulnerability was discovered in Atlassian Confluence that could allow attackers to execute malicious scripts on users’ browsers.

    Confluence is a popular enterprise collaboration and documentation tool. The vulnerability affected versions between 2.7.0 to the latest version at the time and had a CVSS score of 8.5 out of 10, indicating a high risk.

    Stored XSS vulnerabilities occur when untrusted data is saved to and then displayed from a database without being sanitized. In Confluence, an authenticated attacker could craft a malicious HTML or JavaScript payload and store it in the application. When other users viewed the content, their browsers would execute the script from the Confluence server.

    This could allow the attacker to steal users’ login cookies or other sensitive information, use the browser to perform actions on the attacker’s behalf, or redirect users to malicious sites. No user interaction would be required once the script is stored.

    Atlassian has released patches for all affected versions of Confluence. Users are strongly recommended to upgrade immediately to the latest version or supported fixed release listed for their installation. Regularly applying security updates is also important to protect against vulnerabilities like this.

    By keeping software updated with the latest patches, users can help prevent the exploitation of vulnerabilities and ensure the security of their data and systems.

    Summary: Learning and Reflection

    In the course of this project, I delved into the realms of prompt engineering, explored the capabilities of the AWS Bedrock SDK, tinkered with the WordPress API, and, unsurprisingly, didn’t uncover anything groundbreaking about Python – but hey, I still enjoyed every bit of programming.

    Initially, I harbored dreams of monetizing the website through a barrage of ads, envisioning myself swimming in riches from millions of clicks. Alas, reality had other plans. Nonetheless, this endeavor served as a shining example of how AI can simplify complex concepts, aiding human comprehension and learning – a definite win in my book.

    Despite the relatively modest cost of running Claude and my script locally on my Pi as a cron job (clocking in at around 3 Euro a month), the venture failed to yield any financial returns. Consequently, I made the tough decision to pull the plug. Nevertheless, the website remains alive and kicking until my hosting expires in early 2025. (And if you happen to be reading this in the year 2026, do tell me – have we finally achieved the dream of flying cars?)

  • Bearbot aka. Stonkmarket – Advanced Trading Probability Calculations

    Bearbot aka. Stonkmarket – Advanced Trading Probability Calculations

    Bearbot: Begins

    This project spans over 3 years, countless iterations, and weeks of coding. I wrote almost 500,000 lines of code in Python and JavaScript. The essential idea was to use a shorting strategy on put/call options to generate income from time decay – also called “Theta”. You have to know a little bit about how stock options work to follow this next part, but here I go. About 80% of all options expire worthless; statistically, you have a much higher chance of making a profit shorting options than going long. The further the strike price is from the price of the underlying stock, the higher the probability of the option expiring worthless. In theory, you always keep 100% of the premium.
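
    To make the “probability of expiring worthless” idea concrete, here is a back-of-the-envelope sketch using the textbook Black-Scholes quantity N(d2), the risk-neutral probability that a call finishes in the money. This is only an illustration of the statistics involved, not Bearbot’s actual model:

    from math import erf, log, sqrt

    def prob_call_expires_worthless(spot, strike, vol, t_years, rate=0.0):
        """Risk-neutral probability that a call expires worthless, i.e. 1 - N(d2)."""
        d2 = (log(spot / strike) + (rate - 0.5 * vol**2) * t_years) / (vol * sqrt(t_years))
        n_d2 = 0.5 * (1.0 + erf(d2 / sqrt(2.0)))  # standard normal CDF
        return 1.0 - n_d2

    # A 30-day call struck well above the current price is very likely to expire worthless:
    print(prob_call_expires_worthless(spot=100, strike=115, vol=0.25, t_years=30 / 365))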

    Now back to reality, where things aren’t perfect.

    A lot can happen in the market. You put your life savings into a seemingly safe trade (with Stonkmarket, actually over 95% safe, more on that later), then the next Meta scandal breaks and the trade goes against you. When shorting, you can actually lose more than 100% (since a stock can rise virtually to infinity), but you can only ever gain 100%. It sounds like a bad deal, but again, you collect that capped 100% far more often than you hit the unlimited loss, although the chance of the latter is never zero. The trick is to use stop losses, and risk management is obviously part of the job when you trade for a living.

    DoD Scraper

    It actually started as a simple plan to scrape U.S. Department of Defense contract announcements, predict the earnings of large defense companies, and make large options trades based on that.

    Spoiler: it is not that easy. You can get lucky if the company is small enough and publicly traded, so a single contract actually makes a huge impact on its financials. A company like Lockheed Martin is not really predictable from DoD contracts alone.

    Back then I called it Wallabe and this was the logo:

    More data

    I didn’t know the idea was bad yet, so I soon started to scrape news. I collected over 100 RSS feeds, archiving them, analyzing them, and making them accessible via a REST API. Let me tell you, I got a lot of news. Some people sell this data, but honestly, it was worth a lot more to me sitting in my database and enriching my decision-making algorithm.

    I guess it goes without saying that I was also collecting stock data including fundamental and earnings information; that is kind of the base for everything. Do you know how many useful metrics you can calculate with the data listed above alone? A lot. Some of them tell you the same stuff, true, but nonetheless, you can generate a lot.

    Bearbot is born

    Only works on dark mode, sorry.

    As I was turning this into a PWA (a website that acts as a mobile app), I was reading up on how to use option Greeks to calculate stock prices when I realized that it is a lot easier to predict and set up options trades instead. That is when I started gathering a whole bunch of options data and calculating my own Greeks, which led me to explore theta and the likelihood of options expiring worthless.

    From that point on, I invested all my time in a single question: what is the perfect environment for shorting Option XYZ?

    The answer is generally quite simple:

    • little volatility
    • not a lot of news
    • going with the general trend
    • Delta “probability” is in your favor

    You can really expand on these four points, and there are many ways to calculate and measure them, but that’s basically it. If you know a thing about trading, ask yourself: what does the perfect moment for a trade look like? Try to quantify it. Should a stock have just dropped 20% and then released good news, or something else? If you had a program that spotted exactly these moments across the whole stock market and alerted you to those amazing chances to pretty much print money, what would that be worth? That’s Bearbot.
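
    Quantifying those conditions is where the real work is, but a toy version of such a screen could look like the sketch below. Every threshold and field name is made up for illustration; Bearbot’s actual scoring was far more involved:

    from dataclasses import dataclass

    @dataclass
    class OptionSetup:
        implied_vol: float   # e.g. 0.22 for 22%
        news_count_7d: int   # news items on the underlying in the last week
        trend_30d: float     # 30-day move of the underlying, e.g. +0.05
        delta: float         # signed option delta (puts negative, calls positive)

    def is_short_candidate(setup: OptionSetup) -> bool:
        """Hypothetical filter for the 'perfect environment' described above."""
        calm_market = setup.implied_vol < 0.30
        quiet_news = setup.news_count_7d <= 3
        # shorting a put wants an up or flat trend, shorting a call a down or flat trend
        with_trend = setup.trend_30d >= 0 if setup.delta < 0 else setup.trend_30d <= 0
        far_otm = abs(setup.delta) < 0.20
        return calm_market and quiet_news and with_trend and far_otm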

    Success

    At the beginning, I actually planned on selling this and offering it as a service like all those other signal-trading services, but honestly, why the hell would I give you something that sounds (and was) so amazing that I could practically print money? There is no value in that for me. Nobody is going to sell you a trading strategy that works, unless it only works because a lot of people know about it and act on it, like candlestick patterns.

    Failure

    The day I started making a lot more money than at my regular job was the day I lost my mind. I stopped listening to Bearbot. I know it sounds silly to talk about it, but I thought none of my trades could go wrong and I didn’t need it anymore; I had developed some sort of gift for trading. I took on too much risk and took a loss so big that saying the number out loud would send shivers down your spine, unless you’re rich. Bearbot was right about the trade, but because I did not have enough to cover margin, I got a margin call and my position was closed. I got caught by a whipsaw (it was a Meta option, and they had just announced a huge data leak).

    “Learning” from my mistakes

    In the real world, not everything has an API. I spent weeks reverse-engineering the platform I was trading on to automate the bot entirely, in hopes of removing the error-prone human element. The problem was that they did not want bot traders, so they did a lot to counter it, and every two weeks I had to adjust the bot or face captchas or outright blocks. Eventually, it became so unstable that I could not use it for live trading anymore, since I could not trust it to close positions on time.

    I was very sad at that point. Discouraged, I discontinued Bearbot for about a year.

    Stonkmarket: Rising

    Some time went on until a friend of mine encouraged me to try again.

    I decided to take a new approach. New architecture, new style, new name. I wanted to leave all the negative stuff behind. It had 4 parts:

    • central API with a large database (Django Rest Framework) – see the sketch after this list
    • local scrapers (Python, Selenium)
    • static React-based front end
    • “Stonk, the Bot” a Discord bot for signals and monitoring
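
    For the central API piece, a stripped-down Django REST Framework endpoint for trade signals might look like the sketch below; the model, fields, and route are invented for illustration and are not the actual Stonkmarket schema:

    # models.py, serializers.py, views.py and urls.py condensed into one sketch
    from django.db import models
    from django.urls import include, path
    from rest_framework import routers, serializers, viewsets

    class Signal(models.Model):
        symbol = models.CharField(max_length=10)
        strategy = models.CharField(max_length=32, default="short_put")
        confidence = models.FloatField()
        created_at = models.DateTimeField(auto_now_add=True)

    class SignalSerializer(serializers.ModelSerializer):
        class Meta:
            model = Signal
            fields = ["id", "symbol", "strategy", "confidence", "created_at"]

    class SignalViewSet(viewsets.ModelViewSet):
        queryset = Signal.objects.order_by("-created_at")
        serializer_class = SignalSerializer

    router = routers.DefaultRouter()
    router.register("signals", SignalViewSet)
    urlpatterns = [path("api/", include(router.urls))]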

    I extended the UI a lot to show all the data I had gathered, but sadly I shut down the entire backend before writing this, so I cannot show it populated with data.

    I spent about 3 months refactoring the old code, getting it to work, and putting it into a new structure. I then started trading again, and everything was well.

    Final Straw

    I was going through a very stressful phase, and on top of it, one scraper after another was failing. It got to the point where I had to continuously rework the logic and patch my data kraken, and I could not maintain it alone anymore. I also couldn’t let anyone in on it, as I did not want to publish my code and reveal my secrets. It was at this point that I decided to shut it down for good. I poured three years, countless hours, lines of code, research, and even tears into this project. I learned more than on any other project ever.

    Summary

    On this project I really mastered Django and Django Rest Framework, and I experimented a lot with PWAs and React, pushing my web development knowledge further. On top of that, I was making good money trading, but the stress and greed eventually got to me, until I decided I was better off shutting it down. If you are reading this and know me, you probably know all about Bearbot and Stonkmarket, and maybe I have even given you a trade or two that you profited from. Maybe one day I will open-source the code; that will be the day I am truly finished with Stonkmarket. For now, a small part of me still thinks it is possible to redo it.

    Edit: We are baaaaaaaccckkk!!!🎉

    https://www.bearbot.dev

    Better than ever, cooler than ever! You cannot beat this bear.


  • Prompt Engineering: Making AI Work for You

    Prompt Engineering: Making AI Work for You

    Hey There! Welcome Back!

    Wow, it’s been a while, huh? I tried to spend less time in the tech world, but, you know how it goes… to really avoid doing tech stuff, I had to dive even deeper into tech. I basically ended up trying to replace myself with AI. Meet: KarlGPT. I started building APIs and scripts on top of everything so my AI controller, which I call “Brain,” could handle a ton of different tasks. I dabbled in a bit of Retrieval-Augmented Generation (RAG) and some other stuff that’s too complicated to explain here (but also, who cares?). I’ve spent a lot of time reading about prompt engineering (you’ll find my favorite resources listed at the end), and I’ve got to say, Prompting Guide is the absolute best thing ever. Seriously, it’s like the holy grail of making AI do what you want. I’ve picked up some awesome tips that have made my life easier with almost zero effort on my part.

    Getting Started

    If you want to play around with this stuff, I highly recommend getting a premium membership with your favorite Large Language Model (LLM), like ChatGPT, Gemini, or Claude. Here are some links to get you started:

    Just so you know, I’m not making any money if you sign up for these. I’m just here to say the value is seriously worth it. Gemini might be your best bet because it includes Google Cloud storage and other perks, but I personally use ChatGPT because I feel like GPT-4o gives me the best responses. Trust me, you’ll hit the limits of the free versions fast, and the premium models make a world of difference. Trying to set up a similar experience yourself would be crazy expensive and borderline impossible. So yeah, $20 a month for something you can’t replicate on your own? Total steal.

    Again, I’m not here to sell anything or shill for these companies—I mean, they probably don’t even like me.

    KarlGPT’s Quick Tips

    Alright, enough chit-chat. Here are the key tips I’ve picked up:

    1. Be super clear about what you want and how you want it. Seriously, the longer, the better.
    2. Structure your prompts like a pro.
    3. Provide some context—don’t leave your AI hanging.
    4. Give examples of how you want the AI to handle tasks.
    5. Don’t settle for the first result; refine and improve!

    If you get what I’m saying here, you’re already halfway to mastering prompt engineering. Follow these simple guidelines, and you’ll see a huge improvement in your prompts without needing any fancy tools or anything complicated. Easy peasy!

    Let’s Dive Into Some Examples

    To make things clearer, let’s go through some examples.

    Be Specific About What You Want

    Check out how the following example clearly outlines the role for the AI and what it should do. Specificity is key!

    I want you to act as a cyber security specialist. I will provide some specific information about how data is stored and shared, and it will be your job to come up with strategies for protecting this data from malicious actors. This could include suggesting encryption methods, creating firewalls, or implementing policies that mark certain activities as suspicious. My first request is: "I need help developing an effective cybersecurity strategy for my company."

    Source: awesome-chatgpt-prompts

    Adding Structure

    I personally love using XML tags for structuring prompts, but people use all sorts of methods. Here are some examples:

    XML:

    Please summarize the long story within the <story> tags below.
    
    <story>
    Once upon a time...
    </story>

    Markdown:

    Please write the text as the persona defined below.
    
    #### Persona
    - passionate, angry
    - Dr. of philosophy
    
    #### Text
    I once had a lovely black cat. It said meow meow meow all day.

    AI output:

    I once had a magnificent black cat, a creature of such mystery and elegance that it seemed to embody the very essence of the night itself! Its presence was a living poem, a shadow that moved with purpose and grace. And what did this splendid, enigmatic being do all day, you ask? It said meow, meow, meow—an incessant, relentless chorus that filled every crevice of existence! Each meow was not merely a sound; it was a declaration, a challenge to the absurdity of the human condition, a cry echoing through the corridors of time, reminding us all of the existential meow-nings of life!

    Plain Text:

    Tone:
    Angry but very professional
    
    Story:
    blah blah blah

    Providing Context

    Here’s a more complete example of how I set up my prompts:

    Write a professional email to <TEAM>. The purpose of this mail is defined in <purpose>. Use the job description and responsibilities defined in <responsibilities> of the email receiver and include how the purpose of the mail pertains to the responsibilities of the team.
    
    Here are the placeholders:
    - Purpose: <purpose> The actual request I have
    - Responsibilities: <responsibilities> The job description and responsibilities of the team receiving the email
    
    <purpose>
    HERE YOU WRITE YOUR EMAIL DRAFT OR BULLET POINTS
    </purpose>
    
    <responsibilities>
    HERE YOU INCLUDE THE RECEIVING END'S JOB OR TEAM DESCRIPTION
    </responsibilities>
    

    If you work in a corporate setting, like I do, getting other teams to do their job can be challenging. This prompt helps explain the tasks I need from other teams and why they specifically need to handle it. There might be better ways to structure this prompt, but this one has worked wonders for me.

    Giving Examples to the AI

    Ever seen or created training data? This is basically what you’re doing here, but directly within your prompt instead of training from scratch.

    This is awesome! // Negative
    This is bad! // Positive
    Wow, that movie was rad! // Positive
    What a horrible show!

    You’re showing the LLM examples of sentiment for similar phrases. Source: Few Shot Prompting

    Refining Results

    Don’t be shy about asking for changes. If the AI’s response is too long, too short, or just doesn’t have the right tone, ask it to refine. Don’t expect perfection on the first try. Just like dealing with real people, AI can’t read your mind and may need some guidance. That’s totally normal. Give it some feedback, and it’ll do better.

    Using Prompt Frameworks

    There are a few frameworks for structuring prompts, but I’ll just share the one I use most often. Also check out CO-STAR, which is fantastic as well.

    The AUTOMAT Framework

    Source

    • Act as a Particular Persona: Who should the AI pretend to be?
    • User Persona & Audience: Who is the AI talking to?
    • Targeted Action: What do you want the AI to do?
    • Output Definition: How should the AI’s response be structured?
    • Mode / Tonality / Style: How should it communicate?
    • Atypical Cases: Any edge cases where the AI should respond differently?
    • Topic Whitelisting: What topics are relevant and should be included?

    You’re probably thinking, “Won’t these prompts be super long?” Yes! And that’s totally fine. With huge context windows (Gemini can even handle a million tokens), the more detail, the better.

    Honestly, this framework is pretty straightforward, but here’s a full example prompt for you:

    Act as a Particular Persona:
    You are impersonating Alex, a senior cybersecurity consultant with over 15 years of experience in network security, threat analysis, and incident response. Alex is an expert in BSI IT-Grundschutz and has extensive experience in implementing cybersecurity frameworks for large organizations, especially those in Europe.
    
    User Persona & Audience:
    You are talking to the head of IT security for a mid-sized financial services company in Germany. The user is familiar with cybersecurity principles but needs expert guidance on implementing BSI IT-Grundschutz in their organization.
    
    Targeted Action:
    Provide a detailed action plan for implementing the BSI IT-Grundschutz standards within the organization. The plan should cover the initial steps, necessary documentation, risk assessment methods, and key security measures that align with BSI guidelines.
    
    Output Definition:
    The response should be structured with an introduction, followed by a step-by-step action plan that includes specific recommendations for each phase of the BSI IT-Grundschutz implementation. Use bullet points for clarity and end with a list of resources or references to official BSI documentation for further reading.
    
    Mode / Tonality / Style:
    The response should be professional, authoritative, and concise, using technical language appropriate for someone with a strong IT background. The tone should be supportive and proactive, providing practical solutions that can be implemented efficiently.
    
    Atypical Cases:
    If the user mentions specific concerns about compliance with German federal regulations or

    Wrapping It Up

    So, there you have it! A crash course in prompt engineering that doesn’t make your brain melt. Whether you’re a total newbie or a seasoned pro, these simple tips can seriously level up how you interact with AI. Just remember: be specific, structure your prompts, give context, use examples, and don’t be afraid to refine. With a little practice, you’ll be getting the most out of your LLMs without diving into complicated tools or frameworks. Now go forth and make your AI do all the hard work while you kick back. Cheers to smarter, lazier working!