LinnemanLabs Infrastructure
This site runs on the full LinnemanLabs platform - the same multi-account AWS organization, observability stack, supply chain security, and hardened images behind every project. Everything described here applies to what’s serving this page.
I build and manage all of this myself. No Terraform modules, no managed platforms, no abstraction layers I don’t own. Every CloudFormation template, every Ansible role, every pipeline is hand-written because I want to understand exactly what’s running and why.
I stand on plenty of established tools - Ansible, Cosign, Hugo, dozens of others just in producing this site. What I do myself is the architecture and everything that connects it: how accounts are isolated, how signing flows through the pipeline, how the observability stack is distributed, how hardening is applied. The configuration layer is where security-relevant decisions actually live, and that’s where I want full understanding. Building it all from the ground up means every layer knows about every other layer - cross-cutting concerns like signing, telemetry, and hardening are woven through the whole stack instead of bolted on per-service.
Static Generation
- Hugo generates static HTML with no server-side rendering
- Minimal JavaScript
- Content-first design with Tailwind CSS
Application Server (linnemanlabs-web)
The site is served by a custom Go binary built with observability, security, and performance as first-class concerns:
- Prometheus metrics - request latency, error rates, and custom business metrics
- Pyroscope profiling - continuous CPU and memory profiling
- Secure CI/CD - binary deployed from signed release artifacts with SBOM, vulnerability, license, and attestation gates
Infrastructure
All infrastructure is defined as code using AWS CloudFormation and deployed across a multi-account AWS Organization. Currently running 200+ nodes across 10+ accounts.
Infrastructure as Code
Everything is defined declaratively, version-controlled, and written from scratch:
- CloudFormation - all AWS infrastructure managed through hand-crafted templates
- Packer - automated AMI builds with security hardening baked in for Ubuntu 24.04 and RHEL 9
- Ansible - system management, application deployment, and configuration management across all EC2 instances
- Bash - deployment automation, bootstrap processes, build pipelines, glue across systems
- Git-based workflows - all changes reviewed and auditable
- Parameterized templates with extensive SSM Parameter Store integration
- Multi-environment support (prod, dev, qa, staging)
AWS Architecture
Multi-account Organizations structure with proper isolation:
- Organizations - separate accounts for networking/DNS, security, CI builds, observability, and application workloads (separate account per company/project/concern)
- Transit Gateway - hub-and-spoke network topology with dedicated route tables per account
- Ingress/egress isolation - separate ingress and egress VPCs in the networking account, all traffic routes through central networking
- Private subnets - no direct internet access for application workloads
- DNS automation - Route53 with automated CNAME management
- Cross-account resource sharing - RAM with SSM parameters, cross-account security group references, KMS key policies
- EC2 instances with least-privileged IAM roles, ECR with KMS encryption and immutable tags, S3 for artifacts and TUF metadata, Secrets Manager for credential storage and rotation
Network Security
- Network isolation via Transit Gateway with per-account route tables
- Security groups with least-privilege access patterns
- VPC flow logs for network visibility
- Private endpoints for AWS service access
- HTTPS/TLS everywhere, including internal traffic
Golden AMI Pipeline
Automated, security-hardened base images:
- Packer builds for Ubuntu (x86_64, arm64) and RHEL 9
- CIS Level 2 Benchmark hardened using hand-crafted configuration
- Shared across organizational accounts
- Automated SSM parameter updates for latest stable AMIs
- Scheduled automated rebuilds
- Vulnerability scanning and AWS Patch Manager compliance before promotion
- Immutable infrastructure patterns
Security Hardening
All instances are built from hardened golden AMIs and further configured at deployment for their specific role. Controls include:
- CIS Level 2 benchmark compliance validated at AMI build time
- SSH hardening (key-only auth, no root login, restricted groups)
- Filesystem hardening (noexec on /tmp, separate partitions)
- auditd with comprehensive rules for file access, privilege escalation, and kernel module events
- AppArmor profiles
- AIDE file integrity monitoring with SHA-512 checksums
- UFW firewall with default-deny policies
- pam_faillock for brute force protection, pam_pwquality for password complexity
- ASLR, core dump restrictions, automatic security updates, minimal installed packages
- Kernel module blacklisting (USB storage, DCCP, TIPC, RDS, SCTP)
- IPv6 disabled, TCP SYN cookies enabled
- Centralized audit log collection
Detection and SIEM
Multi-signal detection across the network, kernel, and host layers, all flowing into a single SIEM. The detection stack is exercised continuously by purple-team work in the lab - I run my own offensive tooling against it, find the gaps, and tune.
- Wazuh SIEM with OpenSearch as the backend - agents deployed across all accounts and roles, centralized rule management, and integrated alerting
- Suricata for network intrusion detection, integrated with Wazuh for unified alerting
- Tetragon for eBPF-based runtime detection - kprobe and tracepoint policies covering syscall execution, socket allocation, raw socket and AF_PACKET use, and the kernel’s TCP/UDP send paths below
tcp_sendmsg - YARA for binary detection (process memory and on-disk artifacts), running through the Wazuh pipeline
- osquery for endpoint introspection
- auditd rules (file access, privilege escalation, kernel module events) ship to the SIEM rather than staying local
- Detection rules and Tetragon policies live in version control alongside the offensive tooling that exercises them
Supply Chain Security
Securing the full path from source to deployment to operation.
Self-Hosted Transparency Infrastructure
The lab runs its own Sigstore stack rather than depending on the public instance. This is the trust root that everything else’s cosign signing flow chains to. Trust material is published at trust.linnemanlabs.com.
- Fulcio issues short-lived code-signing certificates from GitHub OIDC identities
- Rekor records every signed artifact in a tamper-evident transparency log
- TesseraCT records every certificate Fulcio issues in a CT log
- Timestamp Authority provides RFC 3161 signed timestamps so signatures remain verifiable after certificates expire
- Root CA private key on an offline YubiKey, 10-year root certificate lifetime
- Fulcio CA, TSA, Rekor, and TesseraCT signing keys are non-exportable in AWS KMS, held in a dedicated KMS-only account, all signing flows are cross-account KMS API calls
Artifact Signing and Attestation
All release artifacts are cryptographically signed and attested:
- Cosign - container images signed with both keyless (OIDC via Sigstore) and AWS KMS-backed keys
- SLSA Level 3 provenance attestations
- SBOM generation - Syft (SPDX) and cyclonedx-gomod
- Vulnerability scanning - Trivy, Grype, and govulncheck
- Signatures and attestations stored as OCI referrers alongside release artifacts
- Separate signing keys per application and environment (canary/stable)
- Cross-account key access via KMS key policies and IAM roles
- SSM parameters store signer URIs for build systems
Deploy-Time Verification
- Deployment playbooks verify signatures before extracting binaries
- Public keys baked into golden AMIs as the initial trust root anchor
- Every deployment is a verification event, not just a file copy
TUF (The Update Framework)
Content delivery uses TUF for verified updates:
- S3-based TUF repository for secure update distribution
- Signed metadata with role separation and threshold signatures for critical roles
- Snapshot and timestamp for freshness guarantees
- Automatic verification before content display
- Build roles with write access scoped to specific prefixes only
- Separate TUF paths per application and channel (canary/stable)
- Cross-account read access with bucket policies
Container Security
- ECR repositories with immutable image tags
- KMS encryption for images at rest
- Lifecycle policies for image retention
- Organization-scoped pull access via service control policies
Observability Stack
The observability platform serves the full LinnemanLabs environment - 118 nodes dedicated to observability alone, supporting the larger multi-account infrastructure across all projects.
This is a full Grafana stack running as distributed microservices across multiple availability zones. Service discovery and per-app configurations drive collection across multiple exporters including custom eBPF collectors and instrumented applications. All communication uses OTLP to stay standards-based and avoid lock-in to any specific vendor implementation.
Metrics (Prometheus + Mimir)
- Prometheus - metric collection in HA configuration
- node_exporter and ebpf_exporter across all nodes with tailored eBPF collectors
- blackbox_exporter for custom checks
- Instrumented applications with full RED metrics and database query metrics
- Remote write to Mimir with exemplars enabled
- Service discovery via file_sd and static configs
- Mimir - long-term metrics storage with multi-tenancy
- Distributed architecture (distributor, ingester, querier, query-frontend, query-scheduler, store-gateway, compactor)
- S3 backend storage with memberlist for cluster coordination
- External labels for cluster, region, environment, and company
- Memcached caching layer for query results, chunks, metadata, and index
Logging (Loki)
- Loki - log aggregation with label-based indexing
- Distributed architecture (distributor, ingester, querier, query-frontend, query-scheduler, index-gateway, compactor, ruler)
- Structured logging with trace correlation
- Memberlist for cluster coordination, OTLP support for structured metadata
- S3 backend storage, retention and rate limiting per-tenant
- Memcached caching layer for query results, chunks, deduplication, and index
Tracing (Tempo)
- Tempo - distributed tracing backend
- vParquet4 block format with S3 storage
- OTLP receivers on gRPC (4317) and HTTP (4318)
- Metrics generator producing service-graphs and span-metrics
- Zone-aware replication (factor of 3)
- Memcached caching for bloom filters and parquet pages
Profiling (Pyroscope)
- Pyroscope - continuous profiling with S3 backend storage
- Grafana Alloy - agent running eBPF-based profiling on all hosts
- System-wide CPU profiling with automatic process discovery
- Production-safe with minimal overhead
- Instrumented applications for continuous CPU, memory, and goroutine profiling
Telemetry Collection
- OpenTelemetry Collector - unified telemetry pipeline for log and trace shipping
- Protocol translation (OTLP, Prometheus)
- Sampling and filtering at collection time
- journald receiver for system logs, file log receiver for direct logs
- OTLP receiver for instrumented application telemetry
- Resource detection (EC2 metadata) and attribute enrichment
- Alloy - profile pipeline for OTLP profile schema translation to Pyroscope
Visualization and Alerting
- Grafana - unified dashboards for all telemetry
- HA with PostgreSQL backend for dashboard and session storage
- OAuth/OIDC integration with Okta
- High-level environmental ops dashboards plus custom dashboards per app and service
- Dedicated memcached clusters per service
- Alertmanager - HA cluster with de-duplication
- Slack and PagerDuty integration for routing and on-call
- Comprehensive alert rules for all LGTM components
- Node-level alerts (CPU, disk, memory, network, time sync) and per-application conditions
- CloudFormation stack event notifications via Slack
AI-Augmented Operations
- Vigil - a Go service I built that closes the loop between alerts and investigation. Alertmanager posts to Vigil’s webhook, Vigil dispatches an async triage to Claude with tool access to instant and range PromQL queries against Mimir and LogQL queries against Loki. Claude iterates - typically 7-10 tool calls - until it has enough context to produce a root-cause analysis that lands in Slack
- Full conversations (every tool call and response) persisted to PostgreSQL for replay and prompt evaluation
- Every triage instrumented with OpenTelemetry GenAI semantic attributes - LLM calls, tool executions, and database writes are spans visible as a single trace in Tempo
- Pyroscope continuous profiling correlated to traces, Prometheus histograms track triage duration, token usage, tool call counts, and per-tool latency
- Vigil consumes the same observability signals as everything else and sits inside the platform it monitors
Deployment and Automation
CI/CD
- Hand-crafted pipelines from build to deploy
- Blue-green deployments for zero-downtime releases
- Automated rollback on health check failures
Bootstrap System
- S3-hosted bootstrap scripts pulled at instance launch
- Git-based configuration with deploy keys stored in Secrets Manager
- CloudFormation signal-resource for deployment status
- Autoscaling group health reporting
- Post-install hooks for service-specific configuration
Notifications
- SNS topics for autoscaling events
- Lambda functions for CloudFormation event processing
- Slack webhooks for real-time deployment notifications with AWS Organizations account name resolution for context