📚 Welcome to SQL with AI
SQL (Structured Query Language) is the universal language for interacting with relational databases. It powers every enterprise application — SAP, Oracle, Salesforce, banking systems, e-commerce platforms, HR systems and more. SQL is the #1 skill requested in data-related job descriptions worldwide.
This course adds a powerful twist — you will also learn how to use AI tools like ChatGPT to generate, explain, debug and optimise SQL queries in seconds, making you dramatically more productive as a data professional.
💡 How to use this page: Click any topic in the left sidebar. All notes load instantly. Every section has working SQL examples. Use Prev / Next to go in order.
📈 SQL Basics — DDL, DML, DCL, TCL
BeginnerSQL Command Categories
| Category | Full Form | Commands | Purpose |
|---|---|---|---|
| DDL | Data Definition Language | CREATE, ALTER, DROP, TRUNCATE | Define and modify table structure |
| DML | Data Manipulation Language | SELECT, INSERT, UPDATE, DELETE | Manipulate data inside tables |
| DCL | Data Control Language | GRANT, REVOKE | Control user permissions |
| TCL | Transaction Control Language | COMMIT, ROLLBACK, SAVEPOINT | Manage database transactions |
CREATE TABLE — Build Your Database
-- Create a training courses table
CREATE TABLE courses (
course_id INT PRIMARY KEY AUTO_INCREMENT,
course_name VARCHAR(100) NOT NULL,
category VARCHAR(50),
duration_days INT DEFAULT 3,
fee DECIMAL(10,2),
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Create students table with foreign key
CREATE TABLE enrollments (
enroll_id INT PRIMARY KEY AUTO_INCREMENT,
student_name VARCHAR(100) NOT NULL,
course_id INT,
enroll_date DATE,
score INT CHECK (score BETWEEN 0 AND 100),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);INSERT, UPDATE, DELETE
-- INSERT — add rows
INSERT INTO courses (course_name, category, duration_days, fee)
VALUES
('SAP Security & GRC', 'SAP', 5, 25000),
('Python with AI', 'IT', 4, 18000),
('Leadership Skills', 'Soft', 2, 12000);
-- UPDATE — always use WHERE! Without WHERE = updates ALL rows
UPDATE courses
SET fee = 22000, duration_days = 6
WHERE course_name = 'SAP Security & GRC';
-- DELETE — always use WHERE! Without WHERE = deletes ALL rows
DELETE FROM enrollments
WHERE enroll_date < '2020-01-01';
-- TRUNCATE — faster than DELETE for clearing entire table
TRUNCATE TABLE temp_data; -- Cannot rollback! Use carefully🚫 Critical Rule: NEVER run UPDATE or DELETE without a WHERE clause in production. Always test with SELECT first using the same WHERE condition to verify which rows will be affected.
Data Types — Choosing Correctly
| Category | Type | Use For | Example |
|---|---|---|---|
| Numeric | INT / BIGINT | Whole numbers — IDs, counts, ages | course_id INT, age INT |
| Numeric | DECIMAL(p,s) | Exact decimals — money, percentages | fee DECIMAL(10,2) |
| Numeric | FLOAT / DOUBLE | Approximate decimals — scientific data | measurement FLOAT |
| String | VARCHAR(n) | Variable-length text — names, emails | name VARCHAR(100) |
| String | CHAR(n) | Fixed-length text — codes, status | status CHAR(1) |
| String | TEXT | Long content — descriptions, notes | description TEXT |
| Date/Time | DATE | Date only — birthdays, deadlines | dob DATE |
| Date/Time | DATETIME | Date + time — transactions, logs | created_at DATETIME |
| Boolean | TINYINT(1) / BOOL | True/False flags | is_active BOOL |
🔍 SELECT & Filtering — Complete Guide
Beginner → IntermediateThe SELECT statement is the most used SQL command. Mastering every clause of SELECT is the foundation of all data work.
SELECT — Full Syntax & Clause Order
-- Clauses must appear in THIS order:
SELECT column1, column2, COUNT(*) -- 1. What columns
FROM table_name -- 2. Which table
JOIN other_table ON condition -- 3. Join tables
WHERE condition -- 4. Filter ROWS (before grouping)
GROUP BY column1 -- 5. Group rows
HAVING COUNT(*) > 5 -- 6. Filter GROUPS (after grouping)
ORDER BY column1 DESC -- 7. Sort results
LIMIT 10; -- 8. Limit rows returned💡 Interview Gold: Execution order is different from write order: FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT. This is why you cannot use a SELECT alias in a WHERE clause — WHERE runs before SELECT!
WHERE — All Filtering Operators
-- Comparison operators
WHERE score = 85 -- Equal
WHERE score != 85 -- Not equal (also: <>)
WHERE score > 80 -- Greater than
WHERE score BETWEEN 70 AND 90 -- Inclusive range
-- Pattern matching
WHERE name LIKE 'Hari%' -- Starts with Hari
WHERE name LIKE '%Krishna' -- Ends with Krishna
WHERE name LIKE '%ari%' -- Contains 'ari'
-- List membership
WHERE category IN ('SAP', 'IT', 'Soft')
WHERE category NOT IN ('Old')
-- NULL checks — must use IS NULL, not = NULL
WHERE email IS NULL
WHERE email IS NOT NULL
-- Combine conditions
WHERE category = 'SAP' AND fee > 15000
WHERE category = 'SAP' OR category = 'IT'Useful String & Date Functions
-- String functions
SELECT UPPER(name), LOWER(name), LENGTH(name) FROM students;
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM students;
SELECT TRIM(name), SUBSTRING(name, 1, 3) FROM students;
SELECT REPLACE(phone, '-', '') FROM students;
-- Date functions
SELECT NOW(), CURDATE(), CURTIME();
SELECT YEAR(enroll_date), MONTH(enroll_date), DAY(enroll_date);
SELECT DATEDIFF(NOW(), enroll_date) AS days_enrolled;
SELECT DATE_FORMAT(enroll_date, '%d %M %Y') AS formatted_date;🔗 JOINs — Complete Guide with Diagrams
IntermediateJOINs combine rows from two or more tables based on a related column. This is the most important and most tested SQL concept in interviews. Master all 5 types.
Visual Guide to All JOIN Types
JOIN Examples — Using Cuesys Database
-- INNER JOIN — only students who enrolled in a course
SELECT e.student_name, c.course_name, e.score
FROM enrollments e
INNER JOIN courses c ON e.course_id = c.course_id;
-- LEFT JOIN — ALL students, even those NOT enrolled
SELECT s.name, e.course_id, e.score
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id;
-- Students with no enrollment will have NULL in course_id
-- Find students NOT enrolled in any course (LEFT JOIN trick)
SELECT s.name
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
WHERE e.student_id IS NULL;
-- Multi-table JOIN — 3 tables at once
SELECT
s.name AS student,
c.course_name,
t.name AS trainer,
e.score
FROM enrollments e
JOIN students s ON e.student_id = s.student_id
JOIN courses c ON e.course_id = c.course_id
JOIN trainers t ON c.trainer_id = t.trainer_id
ORDER BY e.score DESC;📊 Functions & Aggregations
IntermediateAggregate Functions — GROUP BY & HAVING
-- Aggregate functions overview
SELECT
COUNT(*) AS total_enrollments,
COUNT(DISTINCT course_id) AS unique_courses,
SUM(fee) AS total_revenue,
AVG(score) AS average_score,
MAX(score) AS highest_score,
MIN(score) AS lowest_score
FROM enrollments;
-- GROUP BY — aggregate per category
SELECT
c.category,
COUNT(e.enroll_id) AS enrollments,
AVG(e.score) AS avg_score,
SUM(c.fee) AS total_revenue
FROM enrollments e
JOIN courses c ON e.course_id = c.course_id
GROUP BY c.category
HAVING COUNT(e.enroll_id) > 10 -- Filter GROUPS not rows
ORDER BY total_revenue DESC;Window Functions — Advanced Analytics
-- RANK students within each course by score
SELECT
student_name,
course_id,
score,
RANK() OVER (PARTITION BY course_id ORDER BY score DESC) AS course_rank,
ROW_NUMBER() OVER (ORDER BY score DESC) AS overall_rank,
AVG(score) OVER (PARTITION BY course_id) AS course_avg,
score - AVG(score) OVER (PARTITION BY course_id) AS diff_from_avg
FROM enrollments;
-- Running total (cumulative sum)
SELECT
enroll_date,
fee,
SUM(fee) OVER (ORDER BY enroll_date) AS running_total
FROM enrollments
JOIN courses USING(course_id);🏆 Interview Tip: RANK() gives equal rank to ties and skips the next number (1,1,3). DENSE_RANK() gives equal rank but does NOT skip (1,1,2). ROW_NUMBER() always gives unique numbers even on ties. Know when to use each!
💥 Subqueries & CTEs
AdvancedSubqueries — Queries Inside Queries
-- Subquery in WHERE — students above average score
SELECT student_name, score
FROM enrollments
WHERE score > (SELECT AVG(score) FROM enrollments);
-- Subquery in FROM (derived table)
SELECT course_id, AVG(score) AS avg_score
FROM (
SELECT * FROM enrollments WHERE score >= 70
) AS passing_students
GROUP BY course_id;
-- Correlated subquery — runs once per outer row
SELECT student_name, score
FROM enrollments e1
WHERE score > (
SELECT AVG(score)
FROM enrollments e2
WHERE e2.course_id = e1.course_id -- references outer query
);CTEs — Common Table Expressions (WITH clause)
-- CTE makes complex queries readable and reusable
WITH course_stats AS (
SELECT
c.course_name,
c.category,
COUNT(e.enroll_id) AS total_students,
AVG(e.score) AS avg_score,
SUM(c.fee) AS revenue
FROM courses c
LEFT JOIN enrollments e ON c.course_id = e.course_id
GROUP BY c.course_id, c.course_name, c.category
),
top_courses AS (
SELECT * FROM course_stats
WHERE total_students > 20
)
SELECT course_name, total_students, avg_score, revenue
FROM top_courses
ORDER BY revenue DESC;⚙️ Stored Procedures, Functions & Views
AdvancedStored Procedures
-- Create a reusable procedure to enroll a student
DELIMITER //
CREATE PROCEDURE EnrollStudent(
IN p_student_name VARCHAR(100),
IN p_course_id INT,
OUT p_message VARCHAR(200)
)
BEGIN
DECLARE v_seats INT;
-- Check available seats
SELECT available_seats INTO v_seats
FROM courses
WHERE course_id = p_course_id;
IF v_seats > 0 THEN
INSERT INTO enrollments (student_name, course_id, enroll_date)
VALUES (p_student_name, p_course_id, CURDATE());
UPDATE courses SET available_seats = available_seats - 1
WHERE course_id = p_course_id;
SET p_message = CONCAT('Enrolled: ', p_student_name);
ELSE
SET p_message = 'Error: No seats available';
END IF;
END//
DELIMITER ;
-- Call the procedure
CALL EnrollStudent('Ravi Kumar', 1, @msg);
SELECT @msg;Views — Virtual Tables
-- Create a view for frequently used complex query
CREATE VIEW vw_enrollment_summary AS
SELECT
e.student_name,
c.course_name,
c.category,
e.score,
CASE
WHEN e.score >= 90 THEN 'Distinction'
WHEN e.score >= 75 THEN 'Merit'
WHEN e.score >= 60 THEN 'Pass'
ELSE 'Fail'
END AS grade
FROM enrollments e
JOIN courses c ON e.course_id = c.course_id;
-- Use view like a normal table
SELECT * FROM vw_enrollment_summary WHERE grade = 'Distinction';⚡ Indexes & Query Optimisation
AdvancedA slow query can take minutes; an optimised query takes milliseconds. Query optimisation is one of the highest-impact skills a database professional can have — and it is always asked in senior-level interviews.
Indexes — How They Work
-- Create indexes
CREATE INDEX idx_course_name ON courses(course_name);
CREATE INDEX idx_enroll_date ON enrollments(enroll_date);
CREATE UNIQUE INDEX idx_email ON students(email);
CREATE INDEX idx_composite ON enrollments(course_id, score);
-- EXPLAIN — see how MySQL executes your query
EXPLAIN SELECT * FROM enrollments WHERE course_id = 5;
-- Look for: type=ref (good), type=ALL (bad — full table scan)
-- Check which indexes exist
SHOW INDEXES FROM enrollments;Query Optimisation Rules
🧠 AI-Powered SQL — Use ChatGPT to Write & Debug Queries
All LevelsThis is the game-changer skill of 2026. Instead of spending hours writing complex queries from scratch, you can use ChatGPT, GitHub Copilot and other AI tools to generate, explain and optimise SQL queries in seconds.
10 Powerful ChatGPT Prompts for SQL
Python + ChatGPT API to Auto-Generate SQL
"""Use natural language to generate SQL queries automatically"""
from openai import OpenAI
import mysql.connector
client = OpenAI()
def natural_language_to_sql(question, schema):
"""Convert plain English question to SQL query"""
prompt = f"""
Database schema:
{schema}
Convert this question to a MySQL SELECT query:
"{question}"
Return ONLY the SQL query, nothing else. No explanation.
"""
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role":"user","content":prompt}],
temperature=0
)
return response.choices[0].message.content.strip()
# Your database schema
schema = """
courses(course_id, course_name, category, fee, duration_days)
enrollments(enroll_id, student_name, course_id, score, enroll_date)
"""
# Ask in plain English!
query = natural_language_to_sql(
"Which SAP courses have more than 20 enrollments and average score above 80?",
schema
)
print(f"Generated SQL:\n{query}")✅ Real Result from AI: The above prompt will generate: SELECT c.course_name, COUNT(*) as enrollments, AVG(e.score) as avg_score FROM courses c JOIN enrollments e ON c.course_id = e.course_id WHERE c.category = 'SAP' GROUP BY c.course_id, c.course_name HAVING COUNT(*) > 20 AND AVG(e.score) > 80;
💻 5 Real-World SQL Projects for Your Resume
All LevelsBeginner
Design and query a complete HR system — employees, departments, salaries, leave records, performance reviews.
Intermediate
Full database for Cuesys Infotech — courses, trainers, enrollments, certificates, invoices.
Intermediate
Analyse sales data — monthly trends, top products, regional performance, customer cohorts.
Advanced
Orders, products, inventory, payments, shipping — complex multi-table database.
Advanced
Python app that takes business questions in plain English → generates SQL → executes → returns results.
❓ SQL Interview Q&A — 70+ Real Questions
Most frequently asked SQL questions at MNCs, IT companies, banking, e-commerce and data roles. Click to reveal model answers.
Model Answer:
WHERE filters individual rows BEFORE grouping. HAVING filters groups AFTER the GROUP BY clause. WHERE cannot use aggregate functions (COUNT, SUM, AVG) because aggregation hasn't happened yet. HAVING can use aggregate functions. Example: WHERE score > 80 filters individual rows. HAVING COUNT(*) > 5 filters groups that have more than 5 rows. Execution order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. This is why you cannot reference a SELECT alias in WHERE (WHERE runs before SELECT), but you can use it in ORDER BY (ORDER BY runs after SELECT).
Model Answer:
DELETE removes specific rows using a WHERE clause. It is logged (can be rolled back with ROLLBACK), fires triggers and is slower for large datasets. DELETE FROM students WHERE grade = "F". TRUNCATE removes ALL rows from a table instantly. It is not fully logged (cannot be rolled back in most databases), does not fire triggers, resets AUTO_INCREMENT counter and is much faster than DELETE for clearing entire tables. DROP removes the entire table structure along with all data, indexes, triggers and constraints — it is permanent and cannot be rolled back. Memory aid: DELETE = selective surgeon. TRUNCATE = table wiper. DROP = table destroyer.
Model Answer:
INNER JOIN returns only rows where the join condition is satisfied in BOTH tables — rows with no match in either table are excluded. LEFT JOIN returns ALL rows from the LEFT table and matching rows from the RIGHT table — unmatched right-side rows appear as NULL. RIGHT JOIN is the mirror — ALL rows from RIGHT, matched rows from LEFT. FULL OUTER JOIN returns ALL rows from BOTH tables — unmatched rows appear as NULL on the missing side (not all databases support FULL OUTER JOIN). CROSS JOIN returns the Cartesian product — every row in table A paired with every row in table B (A=5 rows, B=3 rows → 15 rows). SELF JOIN joins a table with itself — used for hierarchical data like employee-manager relationships.
Model Answer:
An index is a separate data structure (usually a B-Tree) that stores a sorted copy of one or more columns with pointers to the full row data. When you query with WHERE on an indexed column, the database uses the index instead of scanning every row, reducing query time from O(n) to O(log n). Create indexes on: (1) Primary keys and foreign keys (automatically indexed in most databases). (2) Columns frequently used in WHERE clauses. (3) Columns used in JOIN conditions. (4) Columns used in ORDER BY when sorting large result sets. Avoid too many indexes: indexes speed up reads but slow down writes (INSERT, UPDATE, DELETE must update all indexes). Rule of thumb: tables with mostly reads benefit heavily from indexes; high-write tables need fewer, carefully chosen indexes.
Model Answer:
A stored procedure is a precompiled set of SQL statements stored in the database and executed as a single unit. Advantages: (1) Performance — compiled once and cached; faster than sending individual queries. (2) Security — grant EXECUTE permission without giving direct table access; users cannot see underlying logic. (3) Code reuse — write once, call from any application or other procedures. (4) Reduced network traffic — one call executes many statements instead of sending multiple queries over the network. (5) Business logic in database — ensures consistent business rules regardless of which application calls the data. Disadvantages: harder to version control than application code, database-vendor specific syntax (not portable), can be harder to debug and test than application code.
Model Answer:
All three are window functions that assign numbers to rows. ROW_NUMBER() assigns a unique sequential number to every row — no ties are possible (1, 2, 3, 4, 5). RANK() assigns the same rank to tied rows but leaves gaps — two rows tied for rank 2 both get rank 2, and the next row gets rank 4 (not 3). DENSE_RANK() assigns the same rank to tied rows but WITHOUT gaps — two rows tied for rank 2 both get rank 2, and the next row gets rank 3. Real example: scores 90, 85, 85, 80 → ROW_NUMBER: 1,2,3,4 → RANK: 1,2,2,4 → DENSE_RANK: 1,2,2,3. Use RANK when rank gaps matter (competitions). Use DENSE_RANK when you want continuous ranking (categories). Use ROW_NUMBER when you just need unique sequential ordering.
Model Answer:
A CTE (Common Table Expression) is a named temporary result set defined with the WITH keyword that exists only for the duration of the query. Advantages over subqueries: (1) Readability — complex queries become self-documenting; each CTE has a descriptive name. (2) Reusability — a CTE can be referenced multiple times in the same query (a subquery must be repeated). (3) Recursion — CTEs support recursive queries for hierarchical data (org charts, bill of materials) — subqueries cannot. (4) Debugging — easier to test CTEs individually. Use a CTE when: the query has multiple logical steps, you need to reference the same result set more than once, or you need recursive queries for tree/hierarchy data. Use a subquery for simple, single-use filtering where readability is not a concern.
Model Answer:
Classic approach using GROUP BY and HAVING: SELECT email, COUNT(*) as occurrences FROM students GROUP BY email HAVING COUNT(*) > 1; This finds all email values that appear more than once. To see the full duplicate rows: SELECT * FROM students WHERE email IN (SELECT email FROM students GROUP BY email HAVING COUNT(*) > 1) ORDER BY email; To delete duplicates keeping only one (the one with lowest ID): DELETE FROM students WHERE id NOT IN (SELECT MIN(id) FROM students GROUP BY email); Always verify your DELETE query with a SELECT first! Use ROW_NUMBER() in more complex scenarios where you need to identify duplicates across multiple columns.
🏅 Certification Tip: For Oracle SQL Certified Associate (1Z0-071): focus on SELECT, JOINs, GROUP BY/HAVING, subqueries, DML statements and single-row functions. For MySQL Developer (1Z0-147): add stored procedures, triggers, views and transactions. These Q&As cover the core of both exams.