Cloud Platform Engineer
Consensus
About Consensus:
Consensus is an AI search engine for scientific research. We use LLMs to help millions of users easily find and analyze research papers. Our Series A was led by USV, with major participation from top AI investors like Nat Friedman and Daniel Gross. Consensus has been featured in The Wall Street Journal, The Atlantic, The New York Times, Nature, and a16z as one of the world's most exciting new AI search engines.
Our mission is to make expert knowledge discoverable and consumable for all. Help us build the future of research.
Role: We’re looking for a Cloud Platform Engineer to own, improve, and scale our cloud infrastructure, CI/CD workflows, and observability, while occasionally supporting backend web feature work for our high-traffic AI search engine. You’ll have broad experience across the cloud stack and work with autonomy, initiative, and a DevOps mindset. You want our infrastructure to accelerate engineers rather than be protected from them, and you design systems that enable teammates to self-serve, move quickly, and build with confidence.
Responsibilities:
- Own & improve cloud infrastructure. Keep our platform secure, performant, and highly available using Infrastructure as Code (Terraform), Kubernetes, GCP, and other cloud-native tooling.
- Build robust observability. Implement end-to-end monitoring, alerting, and clear runbooks leveraging tools like Datadog so the entire engineering team can respond confidently.
- Evolve CI/CD. Maintain and optimize GitHub-based pipelines, speed up builds through caching, improve deployment automation, and own the release process, closely aligning with QA and engineering workflows to maximize developer efficiency and experience.
- Be the deployment systems expert. Act as a proactive leader in detecting, diagnosing, resolving, and implementing long-term remediation for outages and infrastructure issues. Become the go-to expert who deeply understands system reliability and operational excellence, modeling best practices so others can self-serve and contribute confidently.
- Stress-test the stack. Design and run comprehensive load tests, implement automated QA, and enable engineers to build robust automated unit and integration tests, ensuring safe deployments under realistic traffic conditions.
- Enable engineers. Develop empathy for the pain points that slow down product, backend, data, and ML development. Proactively debug performance bottlenecks and infrastructure-related issues in our FastAPI and Python backend, and support infrastructure integrations for ML inference pipelines.
Must Haves:
- 5+ years building and debugging high-throughput cloud infrastructure systems.
- Deep experience with Kubernetes, Terraform, Docker, and modern CI/CD workflows, with strong familiarity with major cloud providers such as GCP, AWS, or Azure.
- Experience setting up comprehensive monitoring, alerting, and load testing.
- Understanding of cloud security best practices.
- Proven ability to work autonomously across a broad scope of infrastructure responsibilities.
Nice-to-haves:
- Proficiency with modern web backend technologies such as Python, FastAPI, Postgres, and Redis.
- Experience with ML infrastructure, search infrastructure, distributed data pipelines, or Elasticsearch.
- Experience optimizing performance of LLM-powered apps.
Why You’ll Succeed:
- High ownership and autonomy.
- Deliberate urgency and a track record of high-velocity delivery.
- Sharp prioritization and a focus on high-leverage problems without overengineering: building for today with a path to scale.
- Interest in science, research, and LLMs.
Compensation:
- $160k-$230k cash
- Competitive Series A equity
Final offers are determined by multiple factors and may vary from the amounts listed above.