Most students don't struggle because they lack intelligence. They struggle because textbooks are static — they never respond, never check if you're following, and never slow down when you're lost. A book can't ask you "does that make sense?" and a YouTube video can't pause when your face goes blank.
That's the gap AI-Tutor was built to close. The goal was simple on paper but hard in practice: take any textbook chapter or educational video — already on the platform — and turn it into a live, two-way conversation between the student and an AI that actually teaches.
The Challenge: Turning Static Content into a Live Teacher
The first question we had to answer wasn't technical — it was pedagogical. How does a good teacher actually teach? They don't read the chapter aloud. They break it into digestible pieces, check comprehension at each step, adjust based on the student's response, and loop back when something isn't landing.
Replicating that digitally meant solving three hard problems:
- Content needs to be structured before it can be taught — a 40-page chapter is too dense to hand to an AI and say "teach this"
- The AI needs context continuity — it must remember what it already taught and where the student currently is in the lesson
- Assessment can't be multiple choice — real understanding shows up in handwritten working, verbal explanation, and how a student handles an unseen problem
The System at a Glance
Before diving into each layer, here's the shape of the full flow from a piece of content to a student completing a lesson: content is uploaded and segmented once, a student picks a topic, the teaching engine runs a paced two-way session, and practice mode closes the loop with open-ended assessment.
Every step is designed so the student never has to think about the infrastructure. They just learn. The complexity lives in the system — not in the experience.
Layer 1: Content Ingestion & Intelligent Segmentation
Before a student ever opens a chapter, the platform has already done the hard work. Admins upload the master content — a full NCERT chapter or any educational PDF — and the system automatically prepares it for teaching.
A raw chapter file isn't useful on its own: it has no awareness of which parts are concepts, which are examples, and which are exercises. We solved this with an AI-powered segmentation layer. At upload time, the system reads the entire document and automatically identifies the logical boundaries: where one topic ends and another begins, where worked examples sit, and where practice questions are.
The result is a set of focused, self-contained topic segments — each one small enough to be taught in a single session, but complete enough to stand alone. This is what makes the teaching engine work: it only ever has to deal with one focused piece of content at a time.
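To make the target granularity concrete, here is a minimal sketch with a crude keyword heuristic standing in for the AI classifier. All names are illustrative, and the real system infers boundaries with a model rather than keywords; the point is the shape of the output, small same-kind segments capped at a teachable size:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str   # "concept", "example", or "exercise"
    text: str

def classify(paragraph: str) -> str:
    # Crude keyword heuristic standing in for the AI classifier.
    head = paragraph.lower()
    if head.startswith("example"):
        return "example"
    if head.startswith(("exercise", "question")):
        return "exercise"
    return "concept"

def segment_chapter(raw_text: str, max_chars: int = 1200) -> list[Segment]:
    """Split a chapter into focused, self-contained segments.

    Adjacent paragraphs of the same kind are merged until a segment
    would exceed max_chars, keeping each one small enough to teach
    in a single session.
    """
    segments: list[Segment] = []
    for para in filter(None, (p.strip() for p in raw_text.split("\n\n"))):
        kind = classify(para)
        if (segments
                and segments[-1].kind == kind
                and len(segments[-1].text) + len(para) < max_chars):
            segments[-1].text += "\n\n" + para
        else:
            segments.append(Segment(kind, para))
    return segments
```

The `max_chars` cap is the tuning knob discussed later: too large and the AI overwhelms the student, too small and it loses context.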
Layer 2: The Teaching Engine
This is the heart of the platform. Once a student selects a topic, the teaching engine takes over. It receives the segmented content and the student's conversation history, and begins teaching — one concept at a time.
What makes this different from just asking an AI "explain this topic" is the pacing and confirmation loop. The AI doesn't dump everything at once. It introduces a concept, checks for understanding, waits for the student to respond, and only moves forward when the student signals they've got it.
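That confirmation loop is essentially a small state machine. A minimal sketch, with the model and student interactions injected as functions (`explain` and `check_understanding` are illustrative names, not the platform's actual API):

```python
def teach(concepts, explain, check_understanding):
    """Pacing loop for one topic segment: introduce a concept, confirm
    understanding, and only then advance; loop back with a fresh
    explanation when the student isn't following.

    `explain(concept, attempt)` returns an explanation, re-phrased on
    retries; `check_understanding(concept)` returns True once the
    student signals they've got it. Both are injected so the loop
    stays model- and UI-agnostic.
    """
    transcript = []
    for concept in concepts:
        attempt = 0
        while True:
            transcript.append(explain(concept, attempt))
            if check_understanding(concept):
                break          # student confirmed: move to the next concept
            attempt += 1       # not landing: re-explain differently
    return transcript
```

The key property is that forward progress is gated on the student's signal, not on the model finishing its output.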
Layer 3: Practice & Multimodal Assessment
After the teaching session, the student moves into practice mode. This is where the real test of understanding happens — and where we made our most deliberate design choices.
Most edtech platforms test with multiple choice. We believed that was a shortcut that doesn't reflect how students actually learn. Real understanding shows up in how a student works through a problem — not just whether they pick the right answer.
A student can submit a typed answer, a spoken explanation, or a photo of handwritten working. The AI evaluates whatever was submitted, gives specific feedback on what's right and what's wrong, and offers a hint rather than the answer outright. The goal is always to bring the student to the answer themselves, not to give it to them.
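One way to keep hints from collapsing into answers is a hint ladder that escalates with each failed attempt. A minimal sketch, assuming a structured feedback object; the field names and hint text are illustrative, not the platform's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    correct_steps: list[str]   # what the student got right
    wrong_steps: list[str]     # where the working went off track
    hint: str                  # a nudge toward the fix, never the answer

# Illustrative hint ladder: each retry earns a more pointed nudge,
# but even the final rung stops short of the answer itself.
HINT_LADDER = [
    "Re-read the question: what quantity are you actually solving for?",
    "Look again at the step where you substituted values.",
    "Check that the units on both sides of your equation agree.",
]

def next_hint(attempt: int) -> str:
    """Escalate gradually; past the last rung, repeat rather than reveal."""
    return HINT_LADDER[min(attempt, len(HINT_LADDER) - 1)]
```

Clamping at the last rung is the design choice that matters: no number of retries converts a hint into the answer.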
Layer 4: YouTube Study Mode
One of the most interesting design decisions we made was treating YouTube as a content source — not just a video player. The insight was simple: students already learn from YouTube. The problem is it's passive. You watch, you don't interact.
We built a mode where a student pastes any educational YouTube link, the system extracts the full spoken content from the video, and then teaches it interactively — exactly as it would with a PDF chapter.
The student never watches the video passively — they're taught its content interactively, can ask questions about it, and are assessed on it. Any YouTube video becomes a fully interactive lesson.
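For caption-bearing videos, the extraction step can be done with the third-party youtube-transcript-api package. A minimal sketch under that assumption; the platform may well use a different path, such as speech-to-text for uncaptioned videos:

```python
def join_transcript(entries: list[dict]) -> str:
    """Flatten caption entries ({'text', 'start', 'duration'}) into prose."""
    return " ".join(e["text"].replace("\n", " ") for e in entries)

def fetch_transcript_text(video_id: str) -> str:
    """Pull a video's spoken content as plain text via its caption track."""
    # pip install youtube-transcript-api; classic classmethod API shown
    # (newer releases also offer YouTubeTranscriptApi().fetch(video_id)).
    from youtube_transcript_api import YouTubeTranscriptApi
    return join_transcript(YouTubeTranscriptApi.get_transcript(video_id))
```

Once the transcript is plain text, it enters the same segmentation and teaching pipeline as an uploaded PDF chapter.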
How the System Is Organized
The platform is broken into focused modules — each owning one part of the lifecycle. This separation made the system easier to build, test, and extend.
What We Learned Building This
- Segmentation quality determines teaching quality. If the content chunks are too large or too small, the AI either overwhelms the student or loses important context. Getting the right granularity was one of the hardest problems to tune.
- Voice is not just a convenience — it changes the dynamic. Students who use voice interact more naturally and longer than those who type. The barrier to responding drops significantly.
- The doubt panel was a late addition that became essential. Without it, side questions would derail the main lesson. With it, students feel free to ask anything without losing their place.
- Handwritten answer evaluation opened up the platform to real exam preparation. Students preparing for board exams need to practice writing solutions, not clicking options. Supporting image input made the assessment genuinely useful.
- Multi-language support isn't just a feature — it's equity. A student who thinks in Hindi but learns from English textbooks is operating at a disadvantage. Letting them interact in their own language removes that friction entirely.