Principal Software Engineer, DevOps Job at Utilidata, Ann Arbor, MI

  • Utilidata
  • Ann Arbor, MI

Job Description

Utilidata is a fast-growing NVIDIA-backed AI company enabling AI data centers to dynamically orchestrate power and unlock more compute capacity from existing energy infrastructure. For over a decade, we have applied AI to the electric grid — bringing real-time visibility and power-flow control to complex energy infrastructure. Our Karman platform, built on a custom NVIDIA module, brings that same capability to AI data centers, giving operators a way to better use the power already available to them.

We are seeking a DevOps Engineer to help design, build, and operate Utilidata’s off-device platform, which ingests, processes, and serves data flowing from edge AI devices. The role will build and maintain infrastructure across on-premises and cloud environments, bridging edge deployments with cloud-based data processing to support analytics, operations, and ML workloads at scale. This is a hands-on development role with technical leadership responsibilities and company-wide impact. This engineer will architect and maintain the systems that keep our platform running, set technical direction for infrastructure and deployment practices, and mentor engineers. This engineer will also partner closely with on-device and ML teams to ensure our off-device platform is resilient, well-instrumented, and ready to scale. This is a remote position based in the United States, working with distributed teams across the country.

Responsibilities
  • Oversee the deployment and management of containerized applications using Kubernetes, ensuring optimal performance and availability
  • Contribute to strategic planning regarding how the infrastructure solutions evolve to match the requirements of Data Center partners
  • Lead the design, implementation, and maintenance of scalable and reliable systems on AWS and/or on-premises
  • Utilize Terraform for infrastructure as code to automate the provisioning and management of cloud resources
  • Monitor system performance and uptime, ensuring systems meet established service level objectives (SLOs)
  • Support SOC2 security compliance requirements for data handling
  • Mentor and guide team members in DevOps practices, promoting a culture of reliability and excellence
  • Advocate for automation of operational tasks to enhance efficiency and reduce manual intervention
  • Collaborate with cross-functional teams to build and maintain CI/CD pipelines
  • Troubleshoot and resolve complex production issues, conducting root cause analysis and implementing corrective actions
  • Participate in on-call rotations and incident response teams
  • Assist in capacity planning, performance tuning, and technical decision-making
  • Drive continuous improvement initiatives for processes and infrastructure
Minimum Qualifications 
  • 8+ years of development experience including extensive experience in platform engineering, SRE, or distributed systems, with clear senior or principal-level impact
  • Experience designing and operating infrastructure across on-premises and cloud environments
  • Strong proficiency in container orchestration, particularly Kubernetes
  • Strong proficiency with AWS services and architecture
  • Hands-on experience with Terraform for infrastructure automation
  • Familiarity with monitoring tools (Prometheus, Grafana, or similar) and observability best practices
  • Excellent problem-solving skills, leadership abilities, and attention to detail
  • Strong communication and collaboration skills, with experience in driving technical outcomes
  • Willingness to travel up to 20% of the time
Enhanced Qualifications (Nice to Have) 
  • Bachelor's degree in Computer Science, Engineering, or a related field
  • Experience supporting or enabling MLOps platforms, model deployment pipelines, or ML-adjacent infrastructure
  • Experience with AI workload scheduling using Kubernetes
  • Knowledge of Apache Spark for large-scale data processing
  • Knowledge of database technologies (SQL, NoSQL)
  • Understanding of networking concepts and security best practices
Salary Range: $180,000 to $210,000 base compensation, depending on experience, plus stock options. Salary will be commensurate with an individual's skills, training, and years of experience, and in line with internal compensation bands.

Location: This position can be performed remotely from anywhere in the United States.

Our Commitments:
Utilidata values the diversity of our team. We provide equal employment opportunities without regard to race, color, religion, creed, sex, gender, sexual orientation, gender identity or expression, national origin, age, physical disability, mental disability, medical condition, pregnancy or childbirth, genetics, genetic information, marital status, status as a covered veteran, or any other basis protected by applicable federal, state, and local laws.

We are committed to:
  • Creating a diverse and inclusive workplace that is welcoming, supportive, affirming and respectful
  • Empowering employees to solve problems and work together to make a difference
  • Providing mentorship and growth opportunities as part of a collaborative team
  • A flexible work environment with flexible paid time off
  • Competitive compensation and benefits, including health, dental, vision, and employer-match 401k

Job Tags

Full time, Local area, Remote work, Flexible hours
