Production server setup checklist

Production checklist by phase—before deployment, while deploying, after deployment, and after an incident.

Work through tasks in order by phase.

Before deployment

Environment variables

    • The project lead is responsible for correct and complete values.

    • DevOps confirms variables are present and wired in the deployment target (containers, systemd, platform config, etc.).

    • Set APP_DEBUG=false (or equivalent) and APP_ENV=production; never ship with debug enabled on public traffic.

    • Block or remove developer tools (Telescope, debugbar, route dumps) from production builds or protect them like internal admin surfaces.
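
The debug and environment settings above lend themselves to an automated pre-deploy gate. The following is a minimal sketch, assuming the deployment target can hand its variables to a script as a plain mapping; the function name `check_production_env` is hypothetical and not part of any framework.

```python
def check_production_env(env):
    """Return a list of problems found in a mapping of environment variables."""
    problems = []

    # A missing APP_DEBUG counts as a problem: the value should be set explicitly.
    debug = str(env.get("APP_DEBUG", "")).strip().lower()
    if debug not in ("false", "0"):
        problems.append("APP_DEBUG must be false, got %r" % env.get("APP_DEBUG"))

    app_env = str(env.get("APP_ENV", "")).strip().lower()
    if app_env != "production":
        problems.append("APP_ENV must be production, got %r" % env.get("APP_ENV"))

    return problems


# Example with an inline mapping; a pipeline step could pass os.environ instead.
for problem in check_production_env({"APP_DEBUG": "true", "APP_ENV": "production"}):
    print(problem)
```

A pipeline step could fail the deploy whenever the returned list is non-empty.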

Secrets and version control

    • Do not commit API keys, database passwords, APP_KEY, or provider tokens—use environment injection, a secret manager (for example AWS Secrets Manager, SSM Parameter Store, or your host’s vault), or encrypted CI secrets.

    • Rotate any credential that was ever committed or leaked; scan history if needed.
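
As a rough illustration of what scanning for committed secrets involves, the sketch below searches text for a few well-known shapes. The three patterns are illustrative assumptions only; a dedicated scanner with a maintained rule set is the better choice for real history scans, and any hit still means the credential should be rotated.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "laravel_app_key": re.compile(r"^APP_KEY=base64:[A-Za-z0-9+/=]{20,}", re.MULTILINE),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(text):
    """Return (rule_name, line_number) pairs for each suspicious match, in line order."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            # Line numbers are 1-based: count the newlines before the match.
            line_number = text.count("\n", 0, match.start()) + 1
            hits.append((name, line_number))
    return sorted(hits, key=lambda hit: hit[1])
```

Running `find_secrets` over each file in a commit (or over the output of a history dump) gives a quick first pass before a full scan.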

Runtime parity and lockfiles

    • Use the same PHP version (and extensions where it matters) in every environment.

    • Same database engine and major version.

    • Same Redis, Node (front-end builds), and other stack pieces the app depends on.

    • Document any intentional differences in the project README or runbook.

    • Commit composer.lock and your JS lockfile (package-lock.json, pnpm-lock.yaml, or bun.lock).

    • Local, CI, and production must resolve the same dependency versions.

    • Regenerate and merge lockfiles when PHP/Node requirements or package versions change.

    • Prefer latest stable or your org’s LTS for PHP, database, Redis, Node, and other primary runtimes.

    • Plan upgrades so every environment moves together and parity is preserved.

    • Record any exceptions in the project docs.
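
One way to make runtime parity checkable is to record each environment's versions and compare them. This is a minimal sketch under the assumption that versions are collected into a dictionary per environment; the environment names, component names, and version numbers in the example are hypothetical.

```python
def parity_mismatches(environments):
    """Compare {environment: {component: version}} and return components that differ.

    A component missing from an environment is reported as None for that environment.
    """
    components = set()
    for versions in environments.values():
        components.update(versions)

    mismatches = {}
    for component in sorted(components):
        seen = {name: versions.get(component) for name, versions in environments.items()}
        if len(set(seen.values())) > 1:
            mismatches[component] = seen
    return mismatches


# Hypothetical inventory: only the Redis version differs.
report = parity_mismatches({
    "local": {"php": "8.3", "redis": "7.2"},
    "production": {"php": "8.3", "redis": "7.0"},
})
print(report)
```

Intentional differences found this way are the ones to record in the README or runbook.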

Dependency and vulnerability scanning

    • Enable a team-standard tool such as Snyk and/or GitHub Dependabot on repositories that ship to production.

    • Cover application dependencies (for example Composer, npm) and, where relevant, container images and infrastructure definitions.

    • Route findings into your normal triage workflow (issues, Slack/Discord, or sprint) so critical alerts are not ignored.

Cost and budgets

    • Create budgets and forecast or threshold alerts for production (for example AWS Budgets on AWS, or your cloud’s equivalent).

    • Send notifications to email, Discord, or your finance/ops channel so unexpected spend is caught early.

    • Align budgets with tags or accounts (environment, service, owner) where your provider supports it.

Network access and firewalls

    • SSH: allow only known IP ranges (office egress, VPN, bastion); do not expose port 22 to the open internet except as a rare, documented exception.

    • Databases: disable public access where the platform allows; restrict inbound to app subnets, bastions, or explicitly allowlisted IPs (security groups / firewall rules).

    • Document the allowlist; review when people, offices, or networks change.

    • Allow only inbound/outbound paths the service needs (for example HTTP/HTTPS from the load balancer; app → database/Redis on private networks).

    • Apply these rules on host firewalls (where used) and in cloud controls (security groups, network ACLs, managed firewalls).

    • Document rules; revisit after architecture or vendor changes.
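
Reviews of the allowlist can be partly automated. The sketch below flags inbound rules that expose sensitive ports to the whole internet. It assumes rules have already been exported into simple dictionaries with `cidr`, `from_port`, and `to_port` keys (a hypothetical shape, not any provider's API format), and the port list is an illustrative assumption.

```python
import ipaddress

# Ports that should not be reachable from the whole internet (illustrative list).
SENSITIVE_PORTS = {22: "SSH", 3306: "MySQL", 5432: "PostgreSQL", 6379: "Redis"}


def open_to_world(rules):
    """Return a description for every rule exposing a sensitive port to 0.0.0.0/0 or ::/0."""
    findings = []
    for rule in rules:
        network = ipaddress.ip_network(rule["cidr"], strict=False)
        if network.prefixlen != 0:
            continue  # Not a whole-internet range, so it is an explicit allowlist entry.
        for port, label in sorted(SENSITIVE_PORTS.items()):
            if rule["from_port"] <= port <= rule["to_port"]:
                findings.append("%s (port %d) is open to %s" % (label, port, rule["cidr"]))
    return findings
```

Public HTTP/HTTPS rules pass through untouched, since ports 80 and 443 are not in the sensitive list.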

Security headers and CORS

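    • Set security headers (for example Strict-Transport-Security, Content-Security-Policy, X-Content-Type-Options, and Referrer-Policy) at Nginx, the load balancer, or the application, following current best practice.

    • Scan the live site after deployment to confirm the headers are actually served (see Security verification under After deployment).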

    • Define CORS on APIs and browser-facing origins with an allowlist of schemes, hosts, and ports—avoid * with credentials.

    • Allow only the HTTP methods and headers the front end actually needs; review when adding new clients or subdomains.
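
The allowlist approach to CORS can be sketched as follows. The origins in `ALLOWED_ORIGINS` are hypothetical placeholders, and the function is a simplified illustration rather than a replacement for the framework's CORS middleware. It echoes back only an exact scheme, host, and port match, and never returns `*`, so it stays safe to combine with credentials.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of (scheme, host, port) tuples.
ALLOWED_ORIGINS = {
    ("https", "app.example.com", 443),
    ("https", "admin.example.com", 443),
}
DEFAULT_PORTS = {"http": 80, "https": 443}


def cors_headers(origin, allow_credentials=True):
    """Return CORS response headers for an allowed Origin, or {} when it is not allowed."""
    parts = urlsplit(origin)
    try:
        port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    except ValueError:
        return {}  # Malformed port in the Origin header.

    if (parts.scheme, parts.hostname, port) not in ALLOWED_ORIGINS:
        return {}

    # Echo the specific origin and tell caches the response varies by Origin.
    headers = {"Access-Control-Allow-Origin": origin, "Vary": "Origin"}
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```

Matching on the parsed tuple rather than on a string prefix avoids accepting look-alike hosts such as `app.example.com.attacker.test`.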

TLS and HTTPS

    • Terminate TLS at the load balancer, reverse proxy, or platform (for example ACM on AWS) with a valid certificate for every public hostname.

    • Redirect HTTP to HTTPS; set renewal or expiry alerts if certificates are not fully automatic.
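
Where certificates are not renewed automatically, an expiry alert can be as simple as the sketch below, which uses only the Python standard library. The 5-second timeout is an assumption, and any alert threshold (for example 21 days) is for the team to choose.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Whole days left before a certificate's notAfter timestamp.

    not_after uses the format returned by getpeercert(), e.g. "Jan  5 09:34:43 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return int((expires_at - now) // 86400)


def fetch_not_after(hostname, port=443, timeout=5.0):
    """Connect to the host, validate its certificate, and return the notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `days_until_expiry(fetch_not_after(host))` for every public hostname and alert when the result drops below the chosen threshold. A failed handshake is itself a signal that the certificate is invalid for that hostname.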

CDN and static assets

    • Put static assets (for example JS, CSS, images, fonts, and public uploads served as files) behind a CDN such as Amazon CloudFront or Cloudflare (or another provider your stack standardizes on).

    • Configure origins and cache behavior so immutable build assets cache aggressively while HTML or API responses follow the correct policy.

    • Serve the CDN over HTTPS with a valid certificate; use your DNS or provider’s docs for custom domains and propagation.
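
The split between aggressively cached build assets and everything else can be expressed as a small policy function. This is a sketch that assumes the build emits content-hashed filenames such as `app.3f9a1c2b.js` and that API routes live under `/api/`; both conventions and the specific max-age values are assumptions to adapt.

```python
import re

# Assumes build tooling emits content-hashed filenames such as app.3f9a1c2b.js.
HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(js|css|woff2?|png|jpg|svg)$")


def cache_control_for(path):
    """Pick a Cache-Control value for a request path."""
    if path.startswith("/api/"):
        return "no-store"  # API responses are not cached at the edge.
    if HASHED_ASSET.search(path):
        # The filename changes whenever the content changes, so cache for a year.
        return "public, max-age=31536000, immutable"
    if path.endswith(".html") or path.endswith("/"):
        return "no-cache"  # Revalidate HTML so new releases are picked up.
    return "public, max-age=3600"  # Modest default for unhashed static files.
```

The same rules would normally live in the CDN's cache behaviors or the origin's response headers; the function only makes the intended policy explicit and testable.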

DNS

    • Schedule DNS updates when there is time to verify—avoid urgent changes during rush or peak hours when mistakes are costly and rollback is harder.

    • Remember that DNS can take time to propagate (TTL, caching resolvers, and provider delays vary); plan overlap or low-traffic windows when cutting over records.

    • Host DNS on a managed provider when possible—prefer Amazon Route 53 (AWS) or Cloudflare DNS unless the project already standardizes elsewhere.

Database backups

    • Enable daily automated backups for the production database.

    • Retain backups for at least 7 days unless policy explicitly allows a shorter window.

    • On a schedule (for example quarterly) or after changing backup tooling, restore to a non-production instance and confirm data and app connectivity.

    • Document who runs the drill and how long it took; fix gaps in backup scope or runbooks.
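
Between restore drills, a lightweight check can confirm that a backup exists for every day in the retention window. The sketch below assumes backup completion times have already been listed from the provider as `datetime` objects in the same timezone as `today`; how they are listed is left out on purpose. It does not replace a restore test.

```python
from datetime import date, datetime, timedelta


def missing_backup_days(backup_times, today, retention_days=7):
    """Return the dates within the retention window that have no backup.

    The window covers today and the previous retention_days - 1 days,
    so run this after the daily backup window has finished.
    """
    covered = {moment.date() for moment in backup_times}
    expected = [today - timedelta(days=offset) for offset in range(retention_days)]
    return [day for day in expected if day not in covered]


# Example: a week of 02:00 backups with one day missing.
today = date(2024, 5, 10)
times = [datetime(2024, 5, 10, 2, 0) - timedelta(days=d) for d in range(7) if d != 3]
print(missing_backup_days(times, today))
```

Any date in the result is a gap in backup scope worth investigating before the next drill.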

Transactional email

    • Send production mail through a proper transactional provider—examples include Amazon SES, SendGrid, or Resend—not an ad-hoc SMTP server unless policy allows.

    • Configure SPF, DKIM, and DMARC for the sending domain so deliverability and spoofing protection are in place; follow your provider’s DNS guidance.

    • Apply rate limits at the provider and in the application (per user, per IP, or per tenant) to control abuse and protect reputation.
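
Application-side rate limiting per user, IP, or tenant can be sketched with a sliding window. The class below is an in-memory illustration with a hypothetical name; a multi-server deployment would need shared storage such as Redis, and the limit and window values are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` events per key within the last `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # Injectable so the limiter can be tested without sleeping.
        self._events = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


# Example: at most 5 emails per user per hour.
limiter = SlidingWindowLimiter(limit=5, window_seconds=3600)
can_send = limiter.allow("user:42")
```

The key can encode whichever dimension matters (`user:42`, `ip:203.0.113.9`, `tenant:acme`), and several limiters can be combined.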

Laravel production defaults

    • Use Redis for queue, session, and other drivers Redis supports (for example cache, broadcasting).

    • Avoid file or database drivers for these when Redis is available.

    • Run Horizon for queue workers and the Horizon dashboard.

    • Restrict the Horizon UI to trusted admins (authentication and authorization).

    • Provide a production log viewer using the team’s standard package or tool.

    • Enforce admin-only access to the log viewer: authentication, authorization, and/or network restrictions. This is mandatory, not optional.

Load testing

    • Run load tests before major launches, big marketing events, or large architectural changes so you know limits and failure modes.

    • Use Grafana k6 (or an equivalent tool the team agrees on); script realistic scenarios and watch app, database, and queue metrics during the run.

    • Document baseline results and revisit after meaningful scale or code-path changes.

Application monitoring

Choose Sentry and/or Laravel Nightwatch per project; tick the option (or both) your app uses.

    • Enable Sentry in production with your framework’s Sentry SDK (works for Laravel and many other stacks).

        • Error logging: exceptions and failures.

        • Request logging: HTTP transactions, tracing, or performance monitoring.

        • Query logging: database query spans or slow-query visibility (per SDK and integrations).

    • For Laravel, enable Laravel Nightwatch in production per the official docs.

        • Error logging: exceptions and failures.

        • Request logging: HTTP transactions, tracing, or performance monitoring.

        • Query logging: database query visibility (per Nightwatch configuration and docs).

Infrastructure monitoring and Discord

    • Monitor CPU and memory usage.

    • Alert when usage crosses agreed thresholds.

    • Send alerts to email and/or Discord (or your team’s standard channel).

    • Monitor disk space (and volume growth where you use attached storage) so full disks do not take the app down silently.

    • Include databases, job servers, and log partitions; tune thresholds below 100% so you have time to react.

    • Monitor production URLs and critical health checks.

    • Use a dedicated service (for example Better Uptime) or equivalent.

    • Route alerts so the team is notified on incidents.

    • Configure Discord as a destination for operational alerts—use incoming webhooks or each product’s native Discord integration where available.

    • Connect Sentry (or your chosen error/monitoring tool) so new issues, regressions, or alert rules can post to the right channel.

    • Connect uptime monitoring (for example Better Uptime) so incident start, resolve, and escalation notices reach Discord.

    • Extend the same pattern to other services your stack uses (CI/CD failures, cloud or billing alerts, database or queue monitors)—keep channels scoped (for example per environment or per severity) if that helps noise control.
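
For hosts without an agent-based monitor, the disk threshold and Discord webhook items above can be combined in a small script. This is a sketch using only the standard library: the 80% threshold, the `ops-alerts/1.0` User-Agent string, and the function names are assumptions. The payload uses the `content` field of a Discord incoming webhook, truncated to Discord's 2000-character message limit.

```python
import json
import shutil
import urllib.request


def disk_alert(path, threshold_percent=80.0):
    """Return an alert message when disk usage at `path` meets the threshold, else None."""
    usage = shutil.disk_usage(path)
    used_percent = usage.used / usage.total * 100
    if used_percent < threshold_percent:
        return None
    return "Disk usage on %s is %.1f%% (threshold %.0f%%)" % (path, used_percent, threshold_percent)


def build_discord_request(webhook_url, message):
    """Build (but do not send) a POST request for a Discord incoming webhook."""
    payload = json.dumps({"content": message[:2000]}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: an explicit User-Agent, since some services reject urllib's default.
        "User-Agent": "ops-alerts/1.0",
    }
    return urllib.request.Request(webhook_url, data=payload, headers=headers, method="POST")


def send(request, timeout=5.0):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A cron job could call `disk_alert("/")` and, when it returns a message, pass it through `build_discord_request` and `send` with the webhook URL read from a secret rather than hard-coded.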

Incident runbook

    • Maintain a short runbook for production: how to reach on-call, where to check first (uptime, Sentry, logs, metrics), how to scale or roll back, and who approves customer comms.

    • Link to dashboards, log views, and deploy docs so responders are not guessing under pressure.

While deploying

Reverse proxy

    • If the server uses Nginx, configure access and error logs, log format, and rotation or forwarding as required.

    • Tune limits: client body size, timeouts, worker connections, rate limiting (where policy applies).

Release execution

    • Follow the project’s deploy pipeline (CI/CD, container rollout, or host deploy)—use protected branches, approvals, and change records if your org requires them.

    • Run database migrations and other release steps in the agreed order; avoid manual drift from what is documented.

    • Before or during deploy, confirm the rollback path (previous image or release, redeploy tag, migration downgrade policy, or feature flag)—especially when migrations are destructive or irreversible.

    • Document who can approve rollback and how to communicate status to stakeholders.

    • If this deployment includes DNS changes, apply them in the planned window and monitor propagation.

    • Confirm TTL and rollback steps before switching traffic.

After deployment

Smoke checks

    • Hit health checks and a short set of critical user flows (login, core API, checkout, or your app’s equivalent).

    • Confirm the deployed version or build matches what you intended (release tag, image digest, or deploy artifact).
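
The health and version checks above can be scripted so they run identically after every deploy. The sketch below assumes a hypothetical `/health` endpoint that returns JSON with a `version` field; adjust the path and field to whatever the application actually exposes.

```python
import json
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a URL and return (status_code, parsed_json_body)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


def smoke_check(base_url, expected_version, fetch=fetch_json):
    """Return a list of failures; an empty list means the smoke check passed."""
    url = base_url.rstrip("/") + "/health"  # Hypothetical health endpoint.
    try:
        status, body = fetch(url)
    except (OSError, ValueError) as error:
        # Covers connection errors, HTTP error statuses, and invalid JSON.
        return ["health check failed: %s" % error]

    failures = []
    if status != 200:
        failures.append("unexpected status %s from %s" % (status, url))
    if body.get("version") != expected_version:
        failures.append("expected version %s, got %s" % (expected_version, body.get("version")))
    return failures
```

The `fetch` parameter exists so the check can be exercised without network access; in a pipeline the default is enough, with the expected version taken from the release tag or image digest.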

Security verification

    • Scan the live site and confirm that the security headers and CORS policy configured before deployment are actually served.

Monitoring verification

    • Trigger or review a test alert path where safe (for example Sentry test event, uptime check recovery, or non-prod channel) so Discord or email routing works.

    • Glance at dashboards for error rate, latency, and resource use right after deploy.

After an incident

Postmortem

    • For significant outages or data-impacting events, run a postmortem (blameless): timeline, root cause, what went well, what to improve, and tracked action items.

    • Share learnings with the team and update the runbook or checklist when gaps surface.
