Inference / Serving / Evaluation

Med Toujani

Inference Engineer

Building the infrastructure that makes frontier AI run at scale.

Current Focus: LLM inference on 4×H100 clusters
Grad Date: ENSICAEN · August 2026
Base: Roubaix, France

About

A bit about me.

I'm a final-year engineering student at ENSICAEN specializing in AI & Software Engineering. I'm currently interning with OVHcloud's R&D Hardware team in Croix, where I build and benchmark inference infrastructure for frontier LLMs on 4×H100 GPU clusters.

I work across the full serving stack: TensorRT-LLM engine compilation, Triton backends, and monitoring with DCGM, Prometheus, and Grafana.

Outside of work: video games, basketball, and concerts.

Experience

Where I've worked and what I built there.

AI/ML Software Engineering Intern

February 2026 – August 2026

On-site · Croix, France

  • Developing an SRE Assistant for the Baremetal hardware team using RAG with a regenerated Knowledge Base and MCP.
  • Built the full inference pipeline end to end: TensorRT-LLM engine compilation, Triton backend, and monitoring with DCGM, Prometheus, and Grafana.
  • Serving LLMs and VLMs on HPC nodes (4×H100 and 4×L40S) with tensor and pipeline parallelism (TP/PP), MPI orchestration, and fixes for hard caps on model configs and VRAM constraints.
  • Discovered and fixed a bug in TensorRT-LLM (GitHub issue #12805).
  • Designing and building an end-to-end evaluation framework for long-context LLMs at 1M+ tokens, covering retrieval, reconstruction, and regeneration over large-scale corpora.

AI/ML Software Engineering Intern

September 2025 – February 2026

On-site · Caen, France

  • Developed an AI assistant for automated exploration and querying of enterprise metadata, improving information retrieval.
  • Took full responsibility for the team's AI inference and serving stack based on vLLM.
  • Implemented RAG pipelines to query database schemas and columns.
  • Ensured a reproducible, microservices-based environment by integrating OpenMetadata, MariaDB, and FastAPI in a containerized Docker Compose setup.

AI/ML Software Engineering Intern

April 2025 – August 2025

Hybrid

  • Built an AI-powered threat intelligence aggregation platform (SOC Assistant) for SOC teams: collects and analyzes CTI data from MISP, AlienVault, VirusTotal, and X, extracts IOCs and CVEs with NLP, and drives real-time alerting dashboards.
  • Built NER pipelines with spaCy to extract and enrich IOCs and CVEs.
  • Developed multi-source data pipelines from MISP, AlienVault, and VirusTotal via Python APIs.

AI/ML Developer (Hackathon)

September 2024

On-site · Paris, France

  • Co-developed a conversational RAG-based chatbot handling user inquiries about card benefits, pricing, and insurance plans.
  • Built on Google Cloud Platform and the Dialogflow API.

Projects

Selected projects.

DERRAL

Long-Context LLM Evaluation Framework

End-to-end benchmark for evaluating frontier LLMs on domain expertise at 1M+ token context windows. Covers retrieval, reconstruction, and regeneration across large corpora, with token budget optimization.

  • TensorRT-LLM
  • Triton Inference Server
  • DCGM
  • vLLM
  • Llama 4
  • Nemotron
  • Python

HaiChat

Hybrid AI Chat Platform

My first concrete inference work: a hybrid LLM inference pipeline that combines a locally served model (through vLLM) with a cloud model (through API calls), backed by a Docker and FastAPI microservices architecture for routing and serving.

  • vLLM
  • FastAPI
  • Docker
  • Next.js
  • Python
  • LLM Serving

Epidemic Simulation

Systems Programming

Low-level C application modeling viral spread across a 7×7 city grid with multiple citizen archetypes, a multi-process architecture, and explicit inter-process communication.

  • C
  • Systems Programming
  • Multi-process
  • IPC

Mini Project SSI: Load Balancer

Systems / Infrastructure

HAProxy-based load balancer distributing HTTP traffic across two Apache web servers with round-robin scheduling. Deployed on a virtualized 3-VM network (VirtualBox, Xubuntu), with static IP configuration via netplan, health checks, and a real-time HAProxy stats dashboard.

  • HAProxy
  • Networking
  • Linux
  • VirtualBox
  • Apache
  • Sysadmin

Game Price Comparator

API Integration

Application integrating Steam and IsThereAnyDeal APIs to centralize and compare video game prices across storefronts without juggling tabs.

  • Python
  • REST APIs
  • Steam API

Education

Academic background.

ENSICAEN

Engineering Degree: AI & Software Engineering

2023 – 2026

Caen, France

  • Specialization in Artificial Intelligence, Machine Learning, Deep Learning, and Software Engineering.
  • Software Engineering: OOP, Design Patterns, C/C++, Python, Algorithms, Git Workflow.
  • Exposure to Cybersecurity: network security, cryptography, secure system design.
  • Coursework: Linear Algebra, Probability, Advanced Algorithms, Computer Architecture, Networks, Parallel Computing.

IPEST

CPGE (Preparatory Classes for Engineering Schools): Mathematics & Physics

2021 – 2023

Tunis, Tunisia

  • Ranked #1 preparatory school in Tunisia.
  • Highly selective admission: admits only the top 153 students nationally in the Baccalauréat.

Lycée Pilote Bourguiba

Baccalauréat: Mathematics Track

2021

Tunis, Tunisia

  • Highest Honors: 18.40/20.
  • Perfect grade in Mathematics: 20/20.

Skills

What I work with.

Inference & Serving

  • TensorRT-LLM
  • vLLM
  • Triton Inference Server
  • NVIDIA Dynamo
  • TGI
  • ONNX Runtime
  • TP/PP Parallelism
  • MPI

ML & AI Systems

  • PyTorch
  • Transformers
  • RAG
  • Weaviate
  • ChromaDB
  • Qdrant
  • LLM Evaluation
  • Embeddings
  • FAISS

Infrastructure & Monitoring

  • Docker
  • Linux
  • Prometheus
  • Grafana
  • DCGM
  • Git
  • YAML

Programming

  • Python
  • C
  • C++
  • Bash

Contact

Open to full-time roles from October 2026.

Always open to talking inference infrastructure, LLM systems, or whatever you're building.