AI Operations Specialist
Company: Northeastern University
Location: Boston
Posted on: April 1, 2026
Job Description:
About the Opportunity
This job description is intended to
describe the general nature and level of work being performed by
people assigned to this classification. It is not intended to be
construed as an exhaustive list of all responsibilities, duties and
skills required of personnel so classified.

JOB SUMMARY
The AI
Operations Specialist will be responsible for the day-to-day
management, monitoring, and operational support of the university's
AI systems and data pipelines across various departments. This role
is vital in ensuring AI solutions and their supporting data
infrastructure function reliably, meet performance expectations,
and continuously improve to deliver maximum value. The position
requires expertise in MLOps practices, data pipeline operations,
system monitoring, incident management, and continuous improvement
of AI systems in production environments. This role is hybrid, with
a minimum of three days a week in the office to facilitate
collaboration and teamwork. In-office presence is an essential part
of our on-campus culture and allows for engaging directly with
staff and students, sharing ideas, and contributing to a dynamic
work environment. Being on-site allows for stronger connections,
more effective problem-solving, and enhanced team synergy, all of
which are key to achieving our collective goals and driving
success.
* Applicants must be authorized to work in the United
States. The University is unable to provide work sponsorship for
this role, now or in the future.

MINIMUM QUALIFICATIONS
Knowledge and skills
required for this position are normally obtained through a
Bachelor's degree in Computer Science, Information Technology, or
related field (technical certifications in relevant areas, e.g.,
cloud platforms, MLOps, or data engineering, preferred) and a minimum
of 3 years of experience in IT operations, with at least 1 year
focused on AI/ML systems and data pipeline support. Experience with
cloud platforms (AWS, Azure, or GCP) and their AI/ML and data
engineering service offerings.
Other necessary skills:
MLOps Experience: Demonstrated experience in operationalizing and maintaining machine learning models in production environments, including deployment, monitoring, and lifecycle management.
Data Pipeline Operations: Extensive experience maintaining and troubleshooting data pipelines built with tools like Apache Airflow, Prefect, cloud data services (AWS, Azure, GCP), and data processing frameworks (Spark, Kafka), ensuring reliable data flow for AI systems.
System Monitoring: Proficiency in monitoring AI system and data pipeline performance, detecting anomalies, and implementing proactive measures to ensure system reliability and availability. Experience in troubleshooting, diagnosing, and resolving AI system and data infrastructure issues, with the ability to prioritize incidents based on business impact.
Performance Optimization: Knowledge of techniques to optimize AI system and data pipeline performance, including resource allocation, scaling strategies, and performance tuning.
Change Management: Experience implementing changes to production AI systems and data pipelines with minimal disruption, including testing, validation, and rollback procedures.
Data Quality Management: Understanding of data quality principles and their impact on AI system performance, with the ability to identify and address data-related issues in processing pipelines.
Documentation and Knowledge Management: Excellence in creating and maintaining operational documentation, runbooks, and knowledge articles for AI systems and data pipelines.
Automation Skills: Ability to create and implement automation scripts and workflows to streamline routine operational tasks for both AI systems and data flows, enhancing overall system reliability.
DevOps Practices: Familiarity with DevOps and CI/CD principles as applied to AI systems and data pipelines, including containerization, orchestration, and infrastructure as code.
Security Awareness: Understanding of security best practices for AI operations and data handling, including access control, data protection, and vulnerability
management.

KEY RESPONSIBILITIES & ACCOUNTABILITIES

System Monitoring and Incident Management
Monitor AI system and data pipeline health, performance, and availability using established monitoring tools and dashboards. Detect, triage, and resolve incidents affecting AI systems and their data infrastructure, coordinating with technical teams as needed. Implement proactive measures to prevent recurring issues and minimize service disruptions.

Operational Support and Maintenance
Perform routine operational tasks to maintain AI systems and data pipelines, including model updates, data refreshes, pipeline maintenance, and system patches. Implement scheduled maintenance activities with minimal service disruption. Manage user access and permissions for AI platforms according to security policies.

Performance Analysis and Optimization
Analyze AI system and data pipeline performance metrics, identify bottlenecks and inefficiencies, and implement optimizations to improve response times, data flow, accuracy, and resource utilization. Monitor for model drift and data quality issues, coordinating retraining or pipeline adjustments when necessary.

Documentation and Knowledge Management
Create and maintain comprehensive operational documentation, including runbooks, standard operating procedures, and knowledge base articles. Document system configurations, data pipeline dependencies, and recovery procedures to ensure operational continuity.

Continuous Improvement and Automation
Identify opportunities for process improvement and automation in AI operations. Develop and implement scripts and workflows to automate routine tasks, reducing manual effort and minimizing human error. Contribute to the evolution of MLOps practices based on operational experience and emerging best practices.

Position Type
Information Technology

Additional Information
Northeastern University considers
factors such as candidate work experience, education and skills
when extending an offer. Northeastern has a comprehensive benefits
package for benefit-eligible employees. This includes medical,
vision, dental, paid time off, tuition assistance, wellness & life,
and retirement, as well as commuting & transportation. Visit
https://hr.northeastern.edu/benefits/ for more information. All
qualified applicants are encouraged to apply and will receive
consideration for employment without regard to race, religion,
color, national origin, age, sex, sexual orientation, disability
status, or any other characteristic protected by applicable law.

Compensation Grade/Pay Type: 111S
Expected Hiring Range: $87,785.00 - $123,998.75
With the pay range(s) shown above, the starting
salary will depend on several factors, which may include your
education, experience, location, knowledge and expertise, and
skills as well as a pay comparison to similarly-situated employees
already in the role. Salary ranges are reviewed regularly and are
subject to change.