IT Data Platform Engineer: Job Details
KEY RESPONSIBILITIES
ML Data Platform Engineering & Administration
- Install, configure, and maintain ML data platforms on top of Kubernetes, Object Storage, Cassandra, Postgres and related technologies
- Monitor platform performance and optimize as needed for reliability and efficiency
Platform Configuration and Maintenance
- Implement and manage platform configurations, ensuring adherence to best practices and security standards
- Regularly update and patch systems to maintain security and stability
Collaborate with Cross-Functional Teams
- Work closely with ML and data engineers and other roles to align IT needs and strategies
- Provide expert guidance on ML data platform best practices and optimization
Troubleshoot and Resolve Technical Issues
- Identify, diagnose, and resolve data platform problems in a timely manner
- Escalate complex issues to upper-level support when necessary
Backup and Recovery Management
- Implement and maintain backup and recovery strategies for data platforms, ensuring data integrity and availability
Maintain and Update Documentation
- Create and maintain documentation related to data platform administration, configuration, and maintenance
- Share knowledge with team members and contribute to a culture of continuous learning
Enhance Data Security and Compliance
- Ensure data platforms adhere to security best practices and comply with relevant regulations
- Stay up to date on industry trends and evolving security standards
Drive Continuous Improvement
- Evaluate and implement new technologies and techniques to enhance data platform performance and administration
- Proactively identify areas for improvement, prepare implementation plans, and secure support from management and development teams
- Prefer automation and a code-first approach over manual tasks that are hard to reproduce
IT Data Platform Engineer: Requirements
REQUIREMENTS
- A degree in computer science, software engineering, information technology or related fields is preferred
- Production experience with AI agent frameworks (e.g., LangChain, Agno) and the LLM stack (e.g., KServe, vLLM, pgvector)
- At least 3 years of experience installing, administering, patching, and automating data platforms (Kafka, Spark, Airflow, Flink, KServe, MLflow, Lakekeeper) and the related infrastructure (Linux, Ansible, Helm, Terraform, Kubernetes, Ceph)
- Continuous improvement mindset covering daily operations, stability, reliability and performance of ML data platforms
- Knowledge of key concepts such as infrastructure-as-code, templates, playbooks, code versioning with Git, CI/CD automation, high availability, and disaster recovery
- Proficient in Linux shell and Python scripting, and in configuration using JSON and YAML files
- Ability to troubleshoot infrastructure, analyze logs, set up and update monitoring dashboards and metrics, describe and document root causes, and participate in retrospective meetings
- Understand IT systems documentation, data flow diagrams, integration diagrams and related terminology (UML)
- Understand major ML data platform concepts: LLM, RAG, agents, quantization, fine-tuning, Data Lake, ACID, Distributed Query, Feature Store, Data Governance, Data Catalogue, Streaming, Batch, Parquet
- Proficient in using documentation (Markdown, Visio, Office 365) and communication tools (MS Teams)
- Proficient in using change management and support ticket/service desk tools (JIRA SD)
COMPENSATION & BENEFITS
- Fixed 13th-Month Salary and KPI Bonus
- Premium Health Care program
- 24/7 Accident Insurance
- 100% Social Insurance
- Meal + Phone Allowance
- 15 Days of Annual Leave
- Yearly Medical Checkup
- Professional and Transparent Working Environment
- Work with the World's Latest Financial Technology
If you are referred for this position by our Employee/Recruitment Collaborator, please apply via this LINK.
If you are an internal candidate, please apply via this LINK.
Otherwise, please click the Apply button below to submit your application.