Inference / Serving / Evaluation
Med Toujani
Inference Engineer
Building the infrastructure that makes frontier AI run at scale.
About
A bit about me.
I'm a final-year engineering student at ENSICAEN specializing in AI & Software Engineering, currently interning at OVHcloud's R&D Hardware team in Croix where I build and benchmark inference infrastructure for frontier LLMs on 4×H100 GPU clusters.
I work across the full serving stack: TensorRT-LLM engine compilation, Triton backends, and monitoring with DCGM, Prometheus, and Grafana.
Outside of work: video games, basketball, and concerts.
Experience
Where I've worked and what I built there.
- Developing an SRE Assistant for the Baremetal hardware team using RAG with a regenerated Knowledge Base and MCP.
- Built the full inference pipeline end to end: TRT-LLM engine compilation, Triton backend, DCGM, Prometheus, Grafana.
- Serving LLMs and VLMs on HPC: 4×H100, 4×L40S, with TP/PP parallelism, MPI orchestration, and fixes for hard caps on model configs and VRAM constraints.
- Discovered and fixed a bug in TensorRT-LLM (GitHub issue #12805).
- Designing and building an end-to-end evaluation framework for long-context LLMs at 1M+ tokens, covering retrieval, reconstruction, and regeneration over large-scale corpora.
- Developed an AI assistant for automated exploration and querying of enterprise metadata, improving information retrieval.
- Took full responsibility for the team's AI inference and serving stack based on vLLM.
- Implemented RAG pipelines to query database schemas and columns.
- Ensured a reproducible microservices-based environment: integrated OpenMetadata, MariaDB, and FastAPI in a containerized Docker Compose environment.
- Built an AI-powered threat intelligence aggregation platform (SOC Assistant) for SOC teams: collects and analyzes CTI data from MISP, AlienVault, VirusTotal, and X, extracts IOCs and CVEs with NLP, and drives real-time alerting dashboards.
- Built NER pipelines for extraction and enrichment of IOCs and CVEs with SpaCy.
- Developed multi-source data pipelines from MISP, AlienVault, and VirusTotal via Python APIs.
- Co-developed a conversational RAG-based chatbot handling user inquiries about card benefits, pricing, and insurance plans.
- Built on Google Cloud Platform and the Dialogflow API.
Projects
Selected projects.
DERRAL
Long-Context LLM Evaluation Framework
End-to-end benchmark for evaluating frontier LLMs on areas of expertise at 1M+ token context windows. Covers retrieval, reconstruction, and regeneration across large corpora with token budget optimization.
HaiChat
Hybrid AI Chat Platform
My first concrete inference work: a hybrid LLM inference pipeline combining a locally served model through vLLM and a cloud model through API calls, backed by a Docker and FastAPI microservices architecture for routing and serving.
Epidemic Simulation
Systems Programming
Low-level C application modeling viral spread across a 7×7 city grid with multiple citizen archetypes, a multi-process architecture, and explicit inter-process communication.
Mini Project SSI: Load Balancer
Systems / Infrastructure
HAProxy-based load balancer distributing HTTP traffic across two Apache web servers using Round Robin. Deployed on a virtualized 3-VM network (VirtualBox, Xubuntu), with static IP configuration via netplan, health checks, and a real-time HAProxy stats dashboard.
Game Price Comparator
API Integration
Application integrating Steam and IsThereAnyDeal APIs to centralize and compare video game prices across storefronts without juggling tabs.
Education
Academic background.
- Specialization in Artificial Intelligence, Machine Learning, Deep Learning, and Software Engineering.
- Software Engineering: OOP, Design Patterns, C/C++, Python, Algorithms, Git Workflow.
- Exposure to Cybersecurity: network security, cryptography, secure system design.
- Coursework: Linear Algebra, Probability, Advanced Algorithms, Computer Architecture, Networks, Parallel Computing.
- Ranked #1 preparatory school in Tunisia.
- Highly selective admission: accepts only the top 153 students nationally at the Baccalauréat.
- Highest Honors: 18.40/20.
- Perfect grade in Mathematics: 20/20.
Skills
What I work with.
Inference & Serving
ML & AI Systems
Infrastructure & Monitoring
Programming
Contact
Open to full-time roles from October 2026.
Always open to talking inference infrastructure, LLM systems, or whatever you're building.
- Email: contact.med.toujani@gmail.com
- LinkedIn: linkedin.com/in/med-toujani
- GitHub: github.com/1MrazorT1