Med Toujani | Inference Engineer

About

A bit about me.

I'm a final-year engineering student at ENSICAEN specializing in AI & Software Engineering, currently interning at OVHcloud's R&D Hardware team in Croix where I build and benchmark inference infrastructure for frontier LLMs on 4×H100 GPU clusters.

I work across the full serving stack: TensorRT-LLM engine compilation, Triton backends, and monitoring with DCGM, Prometheus, and Grafana.

Outside of work: video games, basketball, and concerts.

Experience

Where I've worked and what I built there.

AI/ML Software Engineering Intern

February 2026 – August 2026

On-site · Croix, France

Developing an SRE Assistant for the Baremetal hardware team using RAG with a regenerated Knowledge Base and MCP.
Built the full inference pipeline end to end: TRT-LLM engine compilation, Triton Backend, DCGM, Prometheus, Grafana.
Serving LLMs and VLMs on HPC nodes (4×H100, 4×L40S) with tensor and pipeline parallelism (TP/PP), MPI orchestration, and fixes for hard caps on model configs and VRAM constraints.
Discovered and fixed a bug in TensorRT-LLM. Issue link on GitHub: #12805.
Designing and building an end-to-end evaluation framework for long-context LLMs at 1M+ tokens, covering retrieval, reconstruction, and regeneration over large-scale corpora.

AI/ML Software Engineering Intern

September 2025 – February 2026

On-site · Caen, France

Developed an AI assistant for automated exploration and querying of enterprise metadata, improving information retrieval.
Took full responsibility for the team's AI inference and serving stack based on vLLM.
Implemented RAG pipelines to query database schemas and columns.
Ensured a reproducible, microservices-based environment by integrating OpenMetadata, MariaDB, and FastAPI in a containerized Docker Compose setup.

AI/ML Software Engineering Intern

April 2025 – August 2025

Hybrid

Built an AI-powered threat intelligence aggregation platform (SOC Assistant) for SOC teams: collects and analyzes CTI data from MISP, AlienVault, VirusTotal, and X, extracts IOCs and CVEs with NLP, and drives real-time alerting dashboards.
Built NER pipelines with spaCy to extract and enrich IOCs and CVEs.
Developed multi-source data pipelines from MISP, AlienVault, and VirusTotal via Python APIs.

AI/ML Developer (Hackathon)

September 2024

On-site · Paris, France

Co-developed a conversational RAG-based chatbot handling user inquiries about card benefits, pricing, and insurance plans.
Built on Google Cloud Platform and the Dialogflow API.

Projects

Selected projects.

DERRAL

Long-Context LLM Evaluation Framework

End-to-end benchmark for evaluating frontier LLMs across areas of expertise at 1M+ token context windows. Covers retrieval, reconstruction, and regeneration across large corpora, with token budget optimization.

HaiChat

Hybrid AI Chat Platform

My first concrete inference work: a hybrid LLM inference pipeline combining a locally served model through vLLM with a cloud model accessed through API calls, backed by a Docker and FastAPI microservices architecture for routing and serving.

GitHub Demo

Epidemic Simulation

Systems Programming

Low-level C application modeling viral spread across a 7×7 city grid with multiple citizen archetypes, a multi-process architecture, and explicit inter-process communication.

GitHub Demo

Mini Project SSI: Load Balancer

Systems / Infrastructure

HAProxy-based load balancer distributing HTTP traffic across two Apache web servers using Round Robin. Deployed on a virtualized 3-VM network (VirtualBox, Xubuntu), with static IP configuration via netplan, health checks, and a real-time HAProxy stats dashboard.

Demo

Game Price Comparator

API Integration

Application integrating Steam and IsThereAnyDeal APIs to centralize and compare video game prices across storefronts without juggling tabs.

GitHub Demo

Education

Academic background.

ENSICAEN

Engineering Degree: AI & Software Engineering

2023 – 2026

Caen, France

Specialization in Artificial Intelligence, Machine Learning, Deep Learning, and Software Engineering.
Software Engineering: OOP, Design Patterns, C/C++, Python, Algorithms, Git Workflow.
Exposure to Cybersecurity: network security, cryptography, secure system design.
Coursework: Linear Algebra, Probability, Advanced Algorithms, Computer Architecture, Networks, Parallel Computing.

IPEST

CPGE (Preparatory Classes for Engineering Schools): Mathematics & Physics

2021 – 2023

Tunis, Tunisia

Ranked #1 preparatory school in Tunisia.
Highly selective admission: accepts only the top 153 students nationally at the Baccalauréat.

Lycée Pilote Bourguiba

Baccalauréat: Mathematics Track

2021

Tunis, Tunisia

Highest Honors: 18.40/20.
Perfect grade in Mathematics: 20/20.

Skills

What I work with.

Inference & Serving

ML & AI Systems

Infrastructure & Monitoring

Programming

Contact

Open to full-time roles from October 2026.

Always open to talking inference infrastructure, LLM systems, or whatever you're building.