You can test it here: av.sandkiste.io

Introduction
If you’re anything like me, you’ve probably had one of those random late-night thoughts:
What if I built a scalable cluster of ClamAV instances, loaded it up with 35,000 YARA rules, and used it to really figure out what a file is capable of, whether it’s actually a virus or just acting suspicious?
It’s the kind of idea that starts as a “wouldn’t it be cool” moment and then slowly turns into “well… now I have to build it.”
And if that thought has never crossed your mind, that’s fine – because I’m going to walk you through it anyway.
How It Started
Like many of my projects, this one was born out of pure anger.
I was told, with a straight face, that scaling our ClamAV cluster into something actually usable would take multiple people, several days, extra resources, and probably outside help.
I told them I could do it in an afternoon, fully working, with a REST API and a frontend.
They laughed.
That same afternoon, I shipped the app.
How It’s Going
Step one: You upload a file.

The scanner gets to work and you wait for it to finish:

Once it’s done, you can dive straight into the results:

That first result was pretty boring.
So, I decided to spice things up by testing the Windows 11 Download Helper tool, straight from Microsoft’s own website.


You can see it’s clean, but it does have a few “invasive” features.
Most of these are perfectly normal for installer tools.
This isn’t a sandbox in the traditional sense. YARA rules statically match strings and byte patterns inside a file, alone or in combination, and the scanner infers possible capabilities from whatever matches. A lot of the time, that’s enough to give you interesting insights, but it’s not a replacement for a full sandbox if you really want to see what the file does when it actually runs.
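To make the "patterns in, capabilities out" idea concrete, here is a deliberately tiny sketch. It is not YARA and not how Malcontent works internally; the rule names and byte patterns in `TOY_RULES` are made up for illustration. It only shows the principle: if certain byte sequences all show up in a file, you guess at a capability.

```python
# Hypothetical toy rules: capability name -> byte patterns that must ALL appear.
# Real YARA rules are far richer (hex patterns, wildcards, conditions, modules).
TOY_RULES: dict[str, list[bytes]] = {
    "net/download": [b"URLDownloadToFile"],
    "process/create": [b"CreateProcess"],
    "registry/persistence": [b"RegSetValue", b"CurrentVersion\\Run"],
}


def infer_capabilities(data: bytes) -> list[str]:
    """Return the sorted names of all toy rules whose patterns all occur in data."""
    return sorted(
        name
        for name, patterns in TOY_RULES.items()
        if all(pattern in data for pattern in patterns)
    )
```

Nothing gets executed here, which is exactly why an installer can look "invasive" without doing anything bad: the strings are present, but whether the code path ever runs is something only a real sandbox can tell you.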
The Setup
Here’s what you need to get this running:
- HAProxy: for TCP load balancing across the ClamAV instances
- 2 ClamAV instances: plus a third dedicated to updating definitions
- Malcontent: YARA Scanner
- Database: to store scan results
You’ll also need a frontend and an API… but we’ll get to that part soon.
services:
  haproxy:
    image: haproxy:latest
    restart: unless-stopped
    ports:
      - "127.0.0.1:3310:3310"
    volumes:
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
    networks:
      - clam-net
    depends_on:
      - clamd1
      - clamd2
  clamd1:
    image: clamav/clamav-debian:latest
    restart: unless-stopped
    networks:
      - clam-net
    volumes:
      - ./tmp/uploads:/scandir
      - clamav-db:/var/lib/clamav
    command: ["clamd", "--foreground=true"]
  clamd2:
    image: clamav/clamav-debian:latest
    restart: unless-stopped
    networks:
      - clam-net
    volumes:
      - ./tmp/uploads:/scandir
      - clamav-db:/var/lib/clamav
    command: ["clamd", "--foreground=true"]
  freshclam:
    image: clamav/clamav-debian:latest
    restart: unless-stopped
    networks:
      - clam-net
    volumes:
      - clamav-db:/var/lib/clamav
    command: ["freshclam", "-d", "--foreground=true", "--checks=24"]
  mariadb:
    image: mariadb:latest
    restart: unless-stopped
    environment:
      MARIADB_ROOT_PASSWORD: SECREEEEEEEET
      MARIADB_DATABASE: avscanner
      MARIADB_USER: avuser
      MARIADB_PASSWORD: SECREEEEEEEET2
    volumes:
      - mariadb-data:/var/lib/mysql
    ports:
      - "127.0.0.1:3306:3306"
volumes:
  mariadb-data:
  clamav-db:
networks:
  clam-net:
Here’s my haproxy.cfg:
global
    daemon
    maxconn 256

defaults
    mode tcp
    timeout connect 5s
    timeout client 50s
    timeout server 50s

frontend clamscan
    bind *:3310
    default_backend clamd_pool

backend clamd_pool
    balance roundrobin
    server clamd1 clamd1:3310 check
    server clamd2 clamd2:3310 check
Now you’ve got yourself a fully functioning ClamAV cluster, yay 🦄🎉!
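If you want to check that the cluster actually answers before wiring up an API, a quick sanity test is clamd's `PING` command, which should come back as `PONG`. The sketch below talks to the HAProxy port from the compose file with nothing but the standard library. The function names `is_pong` and `clamd_ping` are my own, and the host and port defaults assume you kept the `127.0.0.1:3310` binding.

```python
import socket


def is_pong(reply: bytes) -> bool:
    """True if a raw clamd reply is PONG, ignoring the trailing terminator."""
    return reply.rstrip(b"\0\n") == b"PONG"


def clamd_ping(host: str = "127.0.0.1", port: int = 3310, timeout: float = 5.0) -> bool:
    """Send clamd's PING command through the load balancer and check for PONG."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The 'z' prefix asks clamd for null-terminated commands and replies.
        sock.sendall(b"zPING\0")
        reply = b""
        while not reply.endswith(b"\0"):
            chunk = sock.recv(64)
            if not chunk:  # connection closed before a full reply arrived
                break
            reply += chunk
    return is_pong(reply)
```

With `balance roundrobin` and both backends healthy, calling `clamd_ping()` twice in a row should land on clamd1 and clamd2 in turn, so two successful pings are a decent hint that both workers are up.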
FastAPI
I’m not going to dive deep into setting up an API with FastAPI (their docs cover that really well), but here’s the code I use:
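import shutil
import uuid
from pathlib import Path
from typing import List

from fastapi import File, UploadFile

# app, UPLOAD_DIR and scan_and_store_file are defined elsewhere in the project.
@app.post("/upload")
async def upload_and_scan(files: List[UploadFile] = File(...)):
    results = []
    for file in files:
        upload_id = str(uuid.uuid4())
        # Keep only the final path component so a client-supplied name
        # containing slashes cannot point outside UPLOAD_DIR.
        safe_name = Path(file.filename or "upload").name
        temp_path = UPLOAD_DIR / f"{upload_id}_{safe_name}"
        with temp_path.open("wb") as f_out:
            # Stream the upload to disk instead of reading it into memory.
            shutil.copyfileobj(file.file, f_out)
        try:
            result = scan_and_store_file(
                file_path=temp_path,
                original_filename=file.filename,
            )
            results.append(result)
        finally:
            # Always remove the temporary file, even if the scan failed.
            temp_path.unlink(missing_ok=True)
    return {"success": True, "data": {"result": results}}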
There’s a lot more functionality in other functions, but here’s the core flow:
- Save the uploaded file to a temporary path
- Check if the file’s hash is already in the database (if yes, return cached results)
- Use pyclamd to submit the file to our ClamAV cluster
- Run Malcontent as the YARA scanner
- Store the results in the database
- Delete the file
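The steps above can be sketched roughly like this. It is a simplified stand-in, not the actual `scan_and_store_file`: the in-memory `_CACHE` dict plays the role of the database, SHA-256 is my assumption for the hash, and `av_scan` and `yara_scan` are placeholder callables for the ClamAV and Malcontent steps.

```python
import hashlib
from pathlib import Path
from typing import Any, Callable

# Stand-in for the results table; the real app stores results in MariaDB.
_CACHE: dict[str, dict[str, Any]] = {}


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large uploads never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_with_cache(
    path: Path,
    av_scan: Callable[[Path], Any],
    yara_scan: Callable[[Path], Any],
) -> dict[str, Any]:
    """Return cached results for a known hash, otherwise scan and store."""
    file_hash = sha256_of(path)
    if file_hash in _CACHE:
        return {**_CACHE[file_hash], "cached": True}
    result = {
        "sha256": file_hash,
        "clamav": av_scan(path),
        "capabilities": yara_scan(path),
    }
    _CACHE[file_hash] = result
    return {**result, "cached": False}
```

Keying the cache on the content hash rather than the filename means a renamed copy of an already scanned file comes back instantly, and neither scanner has to run again.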
Here’s how I use Malcontent in my MVP:
import json
import subprocess
from pathlib import Path
from typing import Any


def analyze_capabilities(filepath: Path) -> dict[str, Any]:
    path = Path(filepath).resolve()
    if not path.exists() or not path.is_file():
        raise FileNotFoundError(f"File not found: {filepath}")
    # Mount the file's directory into a throwaway container and analyze it there.
    cmd = [
        "docker",
        "run",
        "--rm",
        "-v",
        f"{path.parent}:/scan",
        "cgr.dev/chainguard/malcontent:latest",
        "--format=json",
        "analyze",
        f"/scan/{path.name}",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return json.loads(result.stdout)
    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"malcontent failed: {e.stderr.strip()}") from e
    except json.JSONDecodeError as e:
        raise ValueError(f"Invalid JSON output from malcontent: {e}") from e
I’m not going to get into the whole frontend. It just talks to the API and makes things look nice.
For status updates, I use long polling instead of WebSockets. Other than that, it’s all pretty straightforward.
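In case long polling is new to you: the client asks for the status, and the server holds the request open until the status changes or a timeout hits, then the client asks again. My actual implementation isn't shown here, so treat the following as a minimal sketch of the idea with asyncio. The `ScanStatus` class, its state names and the 25 second default are all hypothetical.

```python
import asyncio


class ScanStatus:
    """Holds one scan's state and lets pollers wait for the next change."""

    def __init__(self) -> None:
        self.state = "queued"
        self._changed = asyncio.Condition()

    async def set(self, state: str) -> None:
        """Update the state and wake every request that is currently waiting."""
        async with self._changed:
            self.state = state
            self._changed.notify_all()

    async def wait_for_change(self, last_seen: str, timeout: float = 25.0) -> str:
        """Return once the state differs from last_seen, or when timeout expires."""

        async def _wait() -> None:
            async with self._changed:
                await self._changed.wait_for(lambda: self.state != last_seen)

        try:
            await asyncio.wait_for(_wait(), timeout)
        except asyncio.TimeoutError:
            pass  # nothing changed; the client simply polls again
        return self.state
```

An API handler would call `wait_for_change` with the state the client last saw and return whatever comes back. Compared to WebSockets, this is plain request and response, so it passes through proxies without any special handling.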
Final Thoughts
I wanted something that could handle large files too, and so far this setup delivers, since uploads are written to local disk instead of being held in memory. For a production deployment, I’d recommend using something like Kata Containers, which is my go-to for running sketchy, untrusted workloads safely.
Always handle malicious files with caution. In this setup, you’re not executing anything, so you should mostly be safe, but remember, AV systems themselves can be exploited, so stay careful.
As for detection, I don’t think ClamAV alone is enough for solid malware protection. It’s better than nothing, but its signatures aren’t updated as frequently as I’d like. For a truly production-grade solution, I’d probably buy a personal AV product, build my own cluster and CLI tool for it, and plug that in. Most licenses let you use multiple devices, so you could easily scale to 10 workers for about €1.50 a month (just grab a license from your preferred software key site).
Of course, this probably violates license terms. I’m not a lawyer 😬
Anyway, I just wanted to show you something I built, so I built it, and now I’m showing it.
One day, this will be part of my Sandkiste tool suite. I’m also working on a post about another piece of Sandkiste I call “Data Loss Containment”, but that one’s long and technical, so it might take a while.
Love ya, thanks for reading, byeeeeeeee ❤️