Location: Serbia
Employment: Full-time, remote work
Experience: 3 to 6 years
We are looking for an LLM/ML Infrastructure Engineer experienced with Rust/C++ and CUDA for remote work.
Our client is building a decentralized AI infrastructure focused on running and serving ML models directly on user-owned hardware (on-prem / edge environments).
A core component of the product is a proprietary “capsule” runtime for deploying and running ML models. Some components currently rely on popular open-source solutions (e.g., llama.cpp), but the strategic goal is to replace community-driven components with in-house ML infrastructure to gain complete control over performance, optimization, and long-term evolution.
In parallel, the company is developing:
- its own network for generating high-quality, domain-specific datasets,
- fine-tuned compact models for specialized use cases,
- a research track focused on ranking, aggregation, accuracy improvements, and latency reduction.
The primary target audience is B2B IT companies.
The long-term product vision is to move beyond generic code generation and focus on high-performance, hardware-aware, and efficiency-optimized code generation.
ML Direction
1. Applied ML Track (Primary focus for this role)
- Development of ML inference infrastructure
- Building and evolving proprietary runtime capsules
- Porting and implementing ML algorithms on a custom architecture
- Low-level performance optimization across hardware platforms
2. Research Track
- ML research with published papers
- Improvements in answer quality and inference efficiency
- Experiments with aggregation, ranking, and latency reduction
This position is primarily focused on the applied ML / engineering track.
Role
This is a strongly engineering-oriented ML role focused on inference, performance, and systems-level implementation rather than model experimentation.
Approximately 90% of the work is hands-on coding and optimization.
You will:
- Implement ML algorithms from research papers into production-ready code
- Port existing ML inference algorithms to the company’s proprietary architecture
- Develop and optimize the inference engine
- Optimize performance, memory usage, and latency
- Integrate and adapt open-source ML solutions (LLaMA, VLMs, llama.cpp, etc.)
- Contribute to the foundational architecture of the ML platform
Key Responsibilities
Inference Infrastructure Development:
- Design and implementation of a cross-platform engine for ML model inference
- Development of low-level components in Rust and C++ with a focus on maximum performance
- Creation and integration of APIs for interaction with the inference engine
Performance Optimization:
- Implementation of modern optimization algorithms: Flash Attention, PagedAttention, continuous batching
- Development and optimization of CUDA kernels for GPU-accelerated computations (see the sketch after this list)
- Profiling and performance tuning across various GPU architectures
- Optimization of memory usage and model throughput
Model Operations:
- Implementation of efficient model quantization methods (GPTQ, AWQ, GGUF)
- Development of a memory management system for working with large language models
- Integration of support for various model architectures (LLaMA, Mistral, Qwen, and others)
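Purely as an illustration of the kind of low-level work listed above (a hypothetical sketch, not code from the client’s runtime), below is a minimal CUDA kernel that fuses the SwiGLU gating step used in LLaMA-style feed-forward layers, out = silu(gate) * up, into a single element-wise pass. All names, sizes, and the launch configuration are illustrative assumptions.

```cuda
// Illustrative sketch only: fused SwiGLU gating (out = silu(gate) * up)
// as used in LLaMA-style feed-forward blocks. Names and sizes are hypothetical.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void swiglu_kernel(const float* gate, const float* up,
                              float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float g = gate[i];
        // silu(x) = x * sigmoid(x)
        float silu = g / (1.0f + expf(-g));
        out[i] = silu * up[i];
    }
}

int main() {
    const int n = 1 << 20;                 // arbitrary test size: 1M elements
    size_t bytes = n * sizeof(float);

    float *gate, *up, *out;
    cudaMallocManaged(&gate, bytes);
    cudaMallocManaged(&up, bytes);
    cudaMallocManaged(&out, bytes);
    for (int i = 0; i < n; ++i) { gate[i] = 0.5f; up[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    swiglu_kernel<<<blocks, threads>>>(gate, up, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);       // expected: silu(0.5) * 2 ≈ 0.622
    cudaFree(gate); cudaFree(up); cudaFree(out);
    return 0;
}
```

In production, a kernel like this would typically operate on fp16/bf16 tensors, use vectorized loads, be tuned per GPU architecture with a profiler, and be exposed to the Rust/C++ host code through a thin FFI layer.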
What we expect from you
- Strong proficiency in Rust or C++
- Hands-on experience with GPU / hardware acceleration, including:
  - CUDA, AMD, or Metal (Apple Silicon)
- Solid understanding of:
  - LLM principles
  - core ML algorithms
  - modern ML approaches used in production systems
- Ability to read ML research papers and implement them in code
- Ability to write clean, efficient, highly optimized code
- Interest in systems-level ML and low-level performance optimization
- High level of autonomy: the ability to take existing algorithms from research or open source, understand them deeply, and adapt and integrate them into a new architecture
- Fluent English
What The Company Offers
- Remote-first setup (work from anywhere)
- Dubai working hours
- High level of ownership and autonomy
- Flat structure
- Salary in cryptocurrency
- An opportunity to create a great product that will disrupt the AI market