Location: Serbia
Employment: Full-time, remote work
Experience: 3 to 6 years
We are looking for an LLM/ML Infrastructure Engineer experienced with Rust/C++ and CUDA for remote work.
Our client is building a decentralized AI infrastructure focused on running and serving ML models directly on user-owned hardware (on-prem / edge environments).
A core component of the product is a proprietary “capsule” runtime for deploying and running ML models. Some components currently rely on popular open-source solutions (e.g., llama.cpp), but the strategic goal is to replace community-driven components with in-house ML infrastructure to gain complete control over performance, optimization, and long-term evolution.
In parallel, the company is developing:
- its own network for generating high-quality, domain-specific datasets,
- fine-tuned compact models for specialized use cases,
- a research track focused on ranking, aggregation, accuracy improvements, and latency reduction.
The primary target audience is B2B IT companies.
The long-term product vision is to move beyond generic code generation and focus on high-performance, hardware-aware, and efficiency-optimized code generation.
ML Direction
1. Applied ML Track (Primary focus for this role)
- Development of ML inference infrastructure
- Building and evolving proprietary runtime capsules
- Porting and implementing ML algorithms on a custom architecture
- Low-level performance optimization across hardware platforms
2. Research Track
- ML research with published papers
- Improvements in answer quality and inference efficiency
- Experiments with aggregation, ranking, and latency reduction
This position is primarily focused on the applied ML / engineering track.
Role
This is a strongly engineering-oriented ML role focused on inference, performance, and systems-level implementation rather than model experimentation.
Approximately 90% of the work is hands-on coding and optimization.
You will:
- Implement ML algorithms from research papers into production-ready code
- Port existing ML inference algorithms to the company’s proprietary architecture
- Develop and optimize the inference engine
- Optimize performance, memory usage, and latency
- Integrate and adapt open-source ML solutions (LLaMA, VLMs, llama.cpp, etc.)
- Contribute to the foundational architecture of the ML platform
Key Responsibilities
Inference Infrastructure Development:
- Design and implementation of a cross-platform engine for ML model inference
- Development of low-level components in Rust and C++ with a focus on maximum performance
- Creation and integration of APIs for interaction with the inference engine
Performance Optimization:
- Implementation of modern optimization algorithms: Flash Attention, PagedAttention, continuous batching
- Development and optimization of CUDA kernels for GPU-accelerated computations (see the sketch after this list)
- Profiling and performance tuning across various GPU architectures
- Optimization of memory usage and model throughput
Model Operations:
- Implementation of efficient model quantization methods (GPTQ, AWQ, GGUF)
- Development of a memory management system for working with large language models
- Integration of support for various model architectures (LLaMA, Mistral, Qwen, and others)
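Purely as an illustration of the kind of low-level work listed above (a hypothetical sketch, not code from the client’s runtime), below is a minimal CUDA kernel that fuses the SwiGLU gating step used in LLaMA-style feed-forward layers, out = silu(gate) * up, into a single element-wise pass. All names, sizes, and the launch configuration are illustrative assumptions.

```cuda
// Illustrative sketch only: fused SwiGLU gating (out = silu(gate) * up)
// as used in LLaMA-style feed-forward blocks. Names and sizes are hypothetical.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void swiglu_kernel(const float* gate, const float* up,
                              float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float g = gate[i];
        // silu(x) = x * sigmoid(x)
        float silu = g / (1.0f + expf(-g));
        out[i] = silu * up[i];
    }
}

int main() {
    const int n = 1 << 20;                 // arbitrary test size: 1M elements
    size_t bytes = n * sizeof(float);

    float *gate, *up, *out;
    cudaMallocManaged(&gate, bytes);
    cudaMallocManaged(&up, bytes);
    cudaMallocManaged(&out, bytes);
    for (int i = 0; i < n; ++i) { gate[i] = 0.5f; up[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    swiglu_kernel<<<blocks, threads>>>(gate, up, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);       // expected: silu(0.5) * 2 ≈ 0.622
    cudaFree(gate); cudaFree(up); cudaFree(out);
    return 0;
}
```

In production, a kernel like this would typically operate on fp16/bf16 tensors, use vectorized loads, be tuned per GPU architecture with a profiler, and be exposed to the Rust/C++ host code through a thin FFI layer.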
What we expect from you
- Strong proficiency in Rust or C++
- Hands-on experience with GPU / hardware acceleration, including:
  - CUDA, AMD, or Metal (Apple Silicon)
- Solid understanding of:
  - LLM principles
  - core ML algorithms
  - modern ML approaches used in production systems
- Ability to read ML research papers and implement them in code
- Ability to write clean, efficient, highly optimized code
- Interest in systems-level ML and low-level performance optimization
- High level of autonomy: the ability to take existing algorithms from research or open source, understand them deeply, and adapt and integrate them into a new architecture
- Fluent English
What The Company Offers
- Remote-first setup (work from anywhere)
- Dubai working hours
- High level of ownership and autonomy
- Flat structure
- Salary in cryptocurrency
- An opportunity to create a great product that will disrupt the AI market