◉ Moradabad, Uttar Pradesh, India // AI · ML · LLM · RESEARCH // AVAILABLE FOR WORK

▸ AI / ML & LLM Engineer · Researcher

Aayush
Kumar.

I design memory-efficient training systems, build LLM-powered products, and ship production ML — from CUDA-level activation offloading to RAG pipelines and CV inference at the edge.

CGPA: 8.75

GATE AIR: 8076

NEUROCACHE ★ LLM TRAINING ★ RAG PIPELINES ★ LANGCHAIN ★ PYTORCH ★ COMPUTER VISION ★ FASTAPI ★ MONGODB ★ PROMPT ENGINEERING ★ CUDA ★ MCP SERVERS ★ PRODUCTION ML

§ 01

About
the Engineer.

I'm Aayush Kumar, an undergrad researcher at the intersection of systems-level ML and applied LLM engineering. My work spans memory-efficient transformer training, retrieval-augmented agents, and shipping production-grade AI products end-to-end.

I'm currently pursuing a B.Tech in AI & ML at Moradabad Institute of Technology. I publish research, write CUDA-aware PyTorch, and obsess over the tradeoffs between throughput, memory, and latency.

Focus: LLM Systems

Stack: PyTorch · FastAPI

Hardware: RTX 2050 / CUDA

Status: Open to roles

Languages: Python · SQL

Mindset: Ship & measure

§ 02 — Research Paper

Featured
Publication.

An academic contribution to memory-efficient LLM training, published on ResearchGate.

Preprint · 2025 · DOI: 10.13140/RG.2.2.11793.39526

NeuroCache: Budget-Constrained Activation Offloading for Memory-Efficient LLM Training

Aayush Kumar

Abstract

NeuroCache proposes a budget-controlled activation offloading scheme for large language model training. A single tunable parameter k sets how many transformer layers keep their activations on the GPU; activations from the remaining layers are offloaded to pinned CPU memory via PyTorch saved_tensors_hooks. The scheme delivers ~15% GPU memory reduction with negligible throughput impact. Experiments on an RTX 2050 show the best tradeoff at k ≈ 5.
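The offloading mechanism described in the abstract can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the paper's implementation: the `BudgetedStack` wrapper, the hook function names, and the choice to keep the first k layers on the GPU are assumptions made for the example. Only `saved_tensors_hooks`, pinned CPU memory, and the budget parameter k come from the abstract. The hooks follow the same pattern as PyTorch's built-in `torch.autograd.graph.save_on_cpu`.

```python
import torch
import torch.nn as nn
from torch.autograd.graph import saved_tensors_hooks


def pack_to_cpu(tensor):
    """Pack hook: copy a tensor saved for backward into pinned host memory."""
    if not tensor.is_cuda:
        # Already on the host (e.g. a CPU-only run): nothing to offload.
        return tensor.device, tensor
    host = torch.empty(
        tensor.size(), dtype=tensor.dtype, device="cpu", pin_memory=True
    )
    host.copy_(tensor)
    return tensor.device, host


def unpack_from_cpu(packed):
    """Unpack hook: bring the tensor back to its original device for backward."""
    device, tensor = packed
    return tensor.to(device, non_blocking=True)


class BudgetedStack(nn.Module):
    """Hypothetical wrapper: the first `k` layers keep activations on-device,
    the remaining layers route everything saved for backward through the hooks
    (this includes parameters the autograd graph saves, not only activations).
    """

    def __init__(self, layers, k):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.k = k  # assumed meaning of the budget parameter k

    def forward(self, x):
        for index, layer in enumerate(self.layers):
            if index < self.k:
                x = layer(x)
            else:
                with saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
                    x = layer(x)
        return x


# Example: four small blocks, two of them offloaded.
device = "cuda" if torch.cuda.is_available() else "cpu"
blocks = [nn.Sequential(nn.Linear(8, 8), nn.GELU()) for _ in range(4)]
model = BudgetedStack(blocks, k=2).to(device)
loss = model(torch.randn(3, 8, device=device)).sum()
loss.backward()
```

Lowering k offloads more layers and saves more GPU memory at the cost of extra host-device transfers, which is the tradeoff the paper tunes.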

GPU Memory Reduction: ~15%

Optimal k: ≈ 5

Hardware: RTX 2050 / CUDA

Throughput Impact: Negligible

LLM Training · Activation Offloading · PyTorch · CUDA · Memory-Efficient ML · Pinned Memory

Read on ResearchGate · Download PDF

§ 03

Selected
Projects.

01 / 09 · 15% GPU mem ↓

NeuroCache

Budget-Constrained Activation Offloading

Memory-efficient LLM training via tunable activation offloading. Achieves ~15% GPU memory reduction with negligible throughput change using PyTorch saved_tensors_hooks + pinned CPU memory.

PyTorch · CUDA · Systems

#Research #LLM

Paper / DOI

02 / 09 · Live in production

IntervuAI

Autonomous AI Technical Interview Platform

Live, two-way AI technical interviewer. GPT-4o-mini drives dynamic questions, Deepgram Nova-2 transcribes candidates in real time, and ElevenLabs streams natural TTS replies. It offers Razorpay-powered pricing tiers and a full evaluation report, and it is deployed and accepting interviews now.

React · Node.js · MongoDB · GPT-4o-mini · Deepgram · ElevenLabs · Razorpay

#LLM #Full-stack #Production

Live Demo

03 / 09 · Risk-managed

Trading Swarm 2.0

FALCON v1 crypto trading bot

Production-ready Binance trading bot driven by EMA, MACD, RSI, and volume signals, with ATR-based stop-loss/take-profit (SL/TP) levels, drawdown circuit breakers, and a modular position manager. Deployed on Railway.
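To make the risk controls concrete, here is a minimal sketch of ATR-based SL/TP levels and a drawdown circuit breaker. It is not the bot's actual code: the function names, the simple-average ATR (Wilder's original uses a smoothed average), the 14-bar period, the 1.5x and 3x multipliers, and the `DrawdownBreaker` class are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


def true_ranges(highs, lows, closes):
    """True range per bar, starting at the second bar (needs a previous close)."""
    return [
        max(high - low, abs(high - prev_close), abs(low - prev_close))
        for high, low, prev_close in zip(highs[1:], lows[1:], closes[:-1])
    ]


def atr(highs, lows, closes, period=14):
    """Simple-average ATR over the most recent `period` true ranges."""
    ranges = true_ranges(highs, lows, closes)
    if len(ranges) < period:
        raise ValueError("not enough bars for the requested ATR period")
    return sum(ranges[-period:]) / period


def long_brackets(entry, atr_value, sl_mult=1.5, tp_mult=3.0):
    """Stop-loss and take-profit prices for a long position."""
    return entry - sl_mult * atr_value, entry + tp_mult * atr_value


@dataclass
class DrawdownBreaker:
    """Trips once equity falls `max_drawdown` (a fraction) below its peak."""

    max_drawdown: float
    peak: float = 0.0
    tripped: bool = False

    def update(self, equity):
        self.peak = max(self.peak, equity)
        if self.peak > 0 and (self.peak - equity) / self.peak >= self.max_drawdown:
            self.tripped = True  # stays tripped until reset manually
        return self.tripped
```

Scaling the stop distance by ATR widens the brackets in volatile markets and tightens them in quiet ones, while the breaker caps total losses regardless of how individual trades are sized.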

Python · Binance API · Railway

#Quant #Production

04 / 09 · 90%+ accuracy

AI Object & Finger Counter

Real-time CV with YOLOv8 + MediaPipe

Real-time object detection and gesture recognition pipeline with 90%+ accuracy across varied lighting. Reduced inference latency by 25% via image preprocessing.

YOLOv8 · MediaPipe · OpenCV

#ComputerVision

05 / 09 · 92% accuracy

Empathy AI

Speech emotion recognition assistant

Multilingual speech emotion classifier with LangChain orchestration and real-time audio streaming. 92% classification accuracy.

LangChain · Speech · Python

#AI #Audio

06 / 09 · 85% auto-fix

AI Code Analyzer

Auto-fix syntax & logical errors

FastAPI + Gemini LLM tool that detects and repairs code issues, correcting 85% of syntax & logical bugs with optional GitHub CI integration.

FastAPI · Gemini · GitHub API

#LLM #DevTools

07 / 09 · 90%+ extraction

OCR + LLM Document Chatbot

Query any document in plain English

FastAPI + Streamlit app combining Tesseract OCR with LLMs for high-fidelity document Q&A. 90%+ extraction accuracy across mixed document types.

FastAPI · Streamlit · Tesseract

#LLM #OCR

08 / 09 · 98% accuracy

Smart Crop Recommender

Precision agriculture ML system

FastAPI + Streamlit ML service that recommends crops from soil + weather inputs in real time. SVC-based classifier reaching 98% accuracy.

Scikit-learn · FastAPI · Streamlit

#ML #AgriTech

09 / 09 · 70% time saved

AI Form Assistant

Conversational data entry automation

LLM-powered conversational assistant cutting manual form-entry time by 70%. Integrated with Google Sheets API for automatic record creation.

LLM · Google Sheets API · Python

#Automation

§ 04

Tooling &
Capabilities.

// The toolkit I reach for when building, training, and shipping.

/ 01

Programming

▸ Python
▸ SQL

/ 02

AI / ML

▸ PyTorch
▸ TensorFlow
▸ Scikit-learn
▸ NLP

/ 03

LLM

▸ LangChain
▸ LangGraph
▸ RAG
▸ Prompt Engineering

/ 04

Tools

▸ FastAPI
▸ FastMCP / MCP Servers
▸ Docker
▸ MongoDB
▸ Git

§ 05

Education
& Credentials.

Academic record and verified certifications.

Education

B.Tech, Artificial Intelligence & Machine Learning

Moradabad Institute of Technology

2023 — 2027

CGPA: 8.75

GATE — Data Science & AI

Graduate Aptitude Test in Engineering

2026

Score: 373 · AIR: 8076

Class XII — UP Board

Uttar Pradesh Board of High School and Intermediate Education

2022 — 2023

Score: 90%

Certifications

NPTEL Elite + Gold (Top 1%)

AI: Concepts and Techniques

2025

Oracle OCI AI Foundations Associate

Oracle

2025

NPTEL Elite + Silver

Developing Soft Skills and Personality

2024

Blockchain Basics — Cyfrin Updraft

ID: JT46H0BKWVXJ

2025

§ 06

Let's
build something.

Open to research collaborations, full-time AI/ML & LLM engineering roles, and freelance ML projects. Drop a note — I read every message.

ayushkumarshivaliya@gmail.com · github.com/ABL4Z3 · linkedin.com/in/aayush-kumar
