◉ Moradabad, Uttar Pradesh, India · AVAILABLE FOR WORK

▸ AI / ML & LLM Engineer · Researcher

Aayush
Kumar.

I design memory-efficient training systems, build LLM-powered products, and ship production ML — from CUDA-level activation offloading to RAG pipelines and CV inference at the edge.


CGPA: 8.75
GATE AIR: 8076
Projects: 9+
Papers: 01

NEUROCACHE · LLM TRAINING · RAG PIPELINES · LANGCHAIN · PYTORCH · COMPUTER VISION · FASTAPI · MONGODB · PROMPT ENGINEERING · CUDA · MCP SERVERS · PRODUCTION ML

§ 01

About
the Engineer.

I'm Aayush Kumar, an undergrad researcher at the intersection of systems-level ML and applied LLM engineering. My work spans memory-efficient transformer training, retrieval-augmented agents, and shipping production-grade AI products end-to-end.

I'm currently pursuing a B.Tech in AI & ML at Moradabad Institute of Technology. I publish research, write CUDA-aware PyTorch, and obsess over the tradeoffs between throughput, memory, and latency.

Focus: LLM Systems
Stack: PyTorch · FastAPI
Hardware: RTX 2050 / CUDA
Status: Open to roles
Languages: Python · SQL
Mindset: Ship & measure

§ 02 — Research Paper

Featured
Publication.

An academic contribution to memory-efficient LLM training, published on ResearchGate.

NeuroCache: Budget-Constrained Activation Offloading for Memory-Efficient LLM Training

Aayush Kumar

Abstract

NeuroCache proposes a budget-controlled activation offloading scheme for large language model training. A single tunable parameter k sets how many transformer layers keep their activations on the GPU; activations from the remaining layers are offloaded to pinned CPU memory via PyTorch saved_tensors_hooks. The scheme delivers ~15% GPU memory reduction with negligible throughput impact. Experiments on an RTX 2050 show the best tradeoff at k ≈ 5.
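The mechanism in the abstract can be sketched in a few lines of PyTorch. This is an illustration under stated assumptions, not the paper's implementation: BudgetedStack, pack_to_cpu, and unpack_from_cpu are hypothetical names, the Linear + GELU blocks stand in for transformer layers, and keeping the first k layers on-GPU (rather than some other subset) is an assumption. Only torch.autograd.graph.saved_tensors_hooks itself is the real PyTorch API.

```python
import torch
import torch.nn as nn
from torch.autograd.graph import saved_tensors_hooks


def pack_to_cpu(tensor):
    """Called when autograd saves an activation: offload GPU tensors to pinned CPU memory."""
    if not tensor.is_cuda:
        return tensor.device, tensor  # already on CPU, nothing to offload
    buf = torch.empty(tensor.size(), dtype=tensor.dtype, device="cpu", pin_memory=True)
    buf.copy_(tensor)
    return tensor.device, buf


def unpack_from_cpu(packed):
    """Called during backward: copy the activation back to its original device."""
    device, tensor = packed
    return tensor.to(device, non_blocking=True)


class BudgetedStack(nn.Module):
    """Hypothetical wrapper: the first k layers keep activations on-device,
    the remaining layers run under the offloading hooks (assumed policy)."""

    def __init__(self, layers, k):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.k = k

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            if i < self.k:
                x = layer(x)  # within budget: activations stay where they are
            else:
                with saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
                    x = layer(x)  # over budget: saved tensors go through the hooks
        return x


def make_stack(k, num_layers=8, width=16):
    """Build a toy stack; Linear + GELU blocks stand in for transformer layers."""
    blocks = [nn.Sequential(nn.Linear(width, width), nn.GELU()) for _ in range(num_layers)]
    return BudgetedStack(blocks, k)
```

Calling make_stack(k=5) and running a normal forward and backward pass should produce the same gradients as k=8 (no offloading), since the hooks change only where saved activations live. On a CPU-only machine the hooks pass tensors through unchanged, so the memory effect is visible only on a CUDA device.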

GPU Memory Reduction: ~15%
Optimal k: ≈ 5
Hardware: RTX 2050 / CUDA
Throughput Impact: Negligible

LLM Training · Activation Offloading · PyTorch · CUDA · Memory-Efficient ML · Pinned Memory

§ 03

Selected
Projects.

01 / 09 · 15% GPU mem ↓

NeuroCache

Budget-Constrained Activation Offloading

Memory-efficient LLM training via tunable activation offloading. Achieves ~15% GPU memory reduction with negligible throughput change using PyTorch saved_tensors_hooks + pinned CPU memory.

PyTorch · CUDA · Systems
#Research #LLM
Paper / DOI

02 / 09 · Live in production

IntervuAI

Autonomous AI Technical Interview Platform

Live, two-way AI technical interviewer. GPT-4o-mini drives dynamic questions, Deepgram Nova-2 transcribes candidates in real time, and ElevenLabs streams natural TTS replies. Razorpay-powered tiers and a full evaluation report. Deployed and accepting interviews now.
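One interview turn in a pipeline like this can be pictured as three stages chained together: speech-to-text, question generation, and text-to-speech. The sketch below is a hypothetical outline, not the platform's code. InterviewSession and the three callables are invented names, the stages are plain functions rather than real Deepgram, OpenAI, or ElevenLabs clients, and streaming is not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


@dataclass
class InterviewSession:
    """Hypothetical turn loop: each stage is injected so real services can be swapped in."""
    transcribe: Callable[[bytes], str]         # speech-to-text stage
    next_question: Callable[[List[Turn]], str]  # LLM stage, sees the full history
    speak: Callable[[str], bytes]              # text-to-speech stage
    history: List[Turn] = field(default_factory=list)

    def turn(self, candidate_audio: bytes) -> bytes:
        """Process one candidate answer and return audio for the next question."""
        answer = self.transcribe(candidate_audio)
        self.history.append(("candidate", answer))
        question = self.next_question(self.history)
        self.history.append(("interviewer", question))
        return self.speak(question)


# Stand-in stages for local experimentation (assumptions, not real service calls).
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_next_question(history: List[Turn]) -> str:
    answered = sum(1 for role, _ in history if role == "candidate")
    return f"Follow-up {answered}: can you expand on '{history[-1][1]}'?"


def fake_speak(text: str) -> bytes:
    return text.encode("utf-8")
```

Keeping the history on the session is what lets the question stage adapt to earlier answers; the evaluation report could be generated from the same list once the interview ends.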

React · Node.js · MongoDB · GPT-4o-mini · Deepgram · ElevenLabs · Razorpay
#LLM #Full-stack #Production
Live Demo

03 / 09 · Risk-managed

Trading Swarm 2.0

FALCON v1 crypto trading bot

Production-ready Binance trading bot using EMA / MACD / RSI / volume signals. ATR-based SL/TP, drawdown circuit breakers, modular position manager. Deployed on Railway.
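The two risk controls named above, ATR-based stop-loss/take-profit and a drawdown circuit breaker, can be illustrated with plain Python. This is a sketch, not the bot's code: all names are hypothetical, ATR is computed as a simple mean of recent true ranges (Wilder's smoothing is a common alternative), only long positions are handled, and the 1.5x / 3.0x multipliers and 10% drawdown limit are illustrative values.

```python
from dataclasses import dataclass


def true_ranges(highs, lows, closes):
    """True range per bar, starting at the second bar (the first has no previous close)."""
    trs = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        trs.append(max(highs[i] - lows[i],
                       abs(highs[i] - prev_close),
                       abs(lows[i] - prev_close)))
    return trs


def atr(highs, lows, closes, period=14):
    """Average true range over the last `period` bars (simple mean, an assumption)."""
    trs = true_ranges(highs, lows, closes)
    if len(trs) < period:
        raise ValueError("need at least period + 1 bars")
    return sum(trs[-period:]) / period


def long_brackets(entry, atr_value, sl_mult=1.5, tp_mult=3.0):
    """Stop-loss and take-profit prices for a long entry, scaled by volatility."""
    return entry - sl_mult * atr_value, entry + tp_mult * atr_value


@dataclass
class DrawdownBreaker:
    """Latches `halted` once equity falls max_drawdown below its running peak."""
    max_drawdown: float = 0.10
    peak: float = 0.0
    halted: bool = False

    def update(self, equity):
        self.peak = max(self.peak, equity)
        if self.peak > 0 and (self.peak - equity) / self.peak >= self.max_drawdown:
            self.halted = True
        return self.halted
```

Scaling the brackets by ATR widens stops in volatile markets and tightens them in quiet ones. The breaker stays tripped even if equity recovers, so resuming trading is a deliberate decision rather than an automatic one.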

Python · Binance API · Railway
#Quant #Production
LinkedIn

04 / 09 · 90%+ accuracy

AI Object & Finger Counter

Real-time CV with YOLOv8 + MediaPipe

Real-time object detection and gesture recognition pipeline with 90%+ accuracy across varied lighting. Reduced inference latency by 25% via image preprocessing.

YOLOv8 · MediaPipe · OpenCV
#ComputerVision
LinkedIn

05 / 09 · 92% accuracy

Empathy AI

Speech emotion recognition assistant

Multilingual speech emotion classifier with LangChain orchestration and real-time audio streaming. 92% classification accuracy.

LangChain · Speech · Python
#AI #Audio
LinkedIn

06 / 09 · 85% auto-fix

AI Code Analyzer

Auto-fix syntax & logical errors

FastAPI + Gemini LLM tool that detects and repairs code issues, correcting 85% of syntax & logical bugs with optional GitHub CI integration.

FastAPI · Gemini · GitHub API
#LLM #DevTools
LinkedIn

07 / 09 · 90%+ extraction

OCR + LLM Document Chatbot

Query any document in plain English

FastAPI + Streamlit app combining Tesseract OCR with LLMs for high-fidelity document Q&A. 90%+ extraction accuracy across mixed document types.

FastAPI · Streamlit · Tesseract
#LLM #OCR
LinkedIn

08 / 09 · 98% accuracy

Smart Crop Recommender

Precision agriculture ML system

FastAPI + Streamlit ML service that recommends crops from soil + weather inputs in real time. SVC-based classifier reaching 98% accuracy.

Scikit-learn · FastAPI · Streamlit
#ML #AgriTech
LinkedIn

09 / 09 · 70% time saved

AI Form Assistant

Conversational data entry automation

LLM-powered conversational assistant cutting manual form-entry time by 70%. Integrated with Google Sheets API for automatic record creation.

LLM · Google Sheets API · Python
#Automation
LinkedIn

§ 04

Tooling &
Capabilities.

// The toolkit I reach for when building, training, and shipping.

/ 01

Programming

  • Python
  • SQL

/ 02

AI / ML

  • PyTorch
  • TensorFlow
  • Scikit-learn
  • NLP

/ 03

LLM

  • LangChain
  • LangGraph
  • RAG
  • Prompt Engineering

/ 04

Tools

  • FastAPI
  • FastMCP / MCP Servers
  • Docker
  • MongoDB
  • Git

§ 05

Education
& Credentials.

Academic record and verified certifications.

Education

B.Tech, Artificial Intelligence & Machine Learning

Moradabad Institute of Technology

2023 — 2027

CGPA: 8.75

GATE — Data Science & AI

Graduate Aptitude Test in Engineering

2026

Score: 373 · AIR: 8076

Class XII — UP Board

Uttar Pradesh Board of High School and Intermediate Education

2022 — 2023

Score: 90%

Certifications

NPTEL Elite + Gold (Top 1%)

AI: Concepts and Techniques

2025

Oracle OCI AI Foundations Associate

Oracle

2025

NPTEL Elite + Silver

Developing Soft Skills and Personality

2024

Blockchain Basics — Cyfrin Updraft

ID: JT46H0BKWVXJ

2025

§ 06

Let's
build something.

Open to research collaborations, full-time AI/ML & LLM engineering roles, and freelance ML projects. Drop a note — I read every message.

// Encrypted in transit. Stored privately. Never shared.