Data analyst interviews are deceptive. On the surface, they look technical — SQL, statistics, Python. But most candidates who fail don't fail on the technical questions. They fail because they can't explain their thinking, they've never practised translating analysis into a business recommendation, or they freeze when given an open-ended problem.

Here's what's actually being tested, broken down by category — with 15 questions and honest guidance on what a strong answer looks like.

SQL — What They Really Want to See

SQL is tested in almost every data analyst interview. Most interviewers aren't expecting perfection — they want to see that you can write a clean query, explain your logic, and handle joins without guessing.

Q1: "Write a query to find the top 5 customers by total revenue in the last 30 days."

What a strong answer looks like

Approach it like this: SELECT customer_id, SUM(order_value) AS total_revenue FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '30 days' GROUP BY customer_id ORDER BY total_revenue DESC LIMIT 5; The date arithmetic and LIMIT here are PostgreSQL-style, so say which dialect you're writing in; SQL Server, for example, uses DATEADD and TOP instead. Then add: "I'd also want to check whether customer_id really identifies one customer, or whether the same customer can appear under several IDs that I'd need to merge first. That depends on the schema." That last point signals real-world thinking and will impress most interviewers.
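If you want to rehearse this query without a database server, the sketch below runs the same logic against an in-memory SQLite database from Python. The table layout, the sample rows and the fixed reference date are all invented for illustration, and the cutoff is passed in as a parameter because SQLite does not support the INTERVAL syntax used above.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema and rows, invented purely for practice.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_value REAL, order_date TEXT)"
)

today = date(2026, 1, 31)  # fixed reference date so the result is reproducible
sample_orders = [
    (1, 120.0, today - timedelta(days=2)),
    (1, 80.0, today - timedelta(days=10)),
    (2, 500.0, today - timedelta(days=5)),
    (3, 50.0, today - timedelta(days=45)),  # falls outside the 30-day window
    (4, 75.0, today - timedelta(days=1)),
]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(cid, value, day.isoformat()) for cid, value, day in sample_orders],
)

# ISO-formatted dates compare correctly as text, so a string cutoff works here.
cutoff = (today - timedelta(days=30)).isoformat()
top_customers = conn.execute(
    """
    SELECT customer_id, SUM(order_value) AS total_revenue
    FROM orders
    WHERE order_date >= ?
    GROUP BY customer_id
    ORDER BY total_revenue DESC
    LIMIT 5
    """,
    (cutoff,),
).fetchall()

print(top_customers)  # [(2, 500.0), (1, 200.0), (4, 75.0)]
```

Customer 3 drops out because their only order is older than the cutoff, which is a useful edge case to mention out loud when you walk through the query.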

Q2: "Explain the difference between a LEFT JOIN and an INNER JOIN. When would you use each?"

Model answer

"An INNER JOIN returns only rows where there's a match in both tables — so if a customer has no orders, they won't appear. A LEFT JOIN keeps all rows from the left table regardless of whether there's a match on the right. I use INNER JOIN when I only want complete records — for example, analysing purchasing behaviour. I use LEFT JOIN when I need to identify records that DON'T have a match — for example, finding customers who haven't placed an order yet. That retention analysis use case is one of the most common places a LEFT JOIN is the right tool."

Q3: "How would you identify duplicate records in a dataset?"

Model answer

"I'd use GROUP BY on the columns that should be unique and filter for groups with a COUNT > 1: SELECT email, COUNT(*) as count FROM users GROUP BY email HAVING COUNT(*) > 1; Depending on the context, I might also use ROW_NUMBER() with PARTITION BY to identify and then remove the duplicates while keeping the most recent record. I'd also ask why the duplicates exist — sometimes they're a data pipeline issue that should be fixed upstream."

Statistics — They're Not Expecting a PhD

Most data analyst roles don't require advanced statistics. They do require a solid grasp of the fundamentals — and more importantly, knowing WHEN to use which concept.

Q4: "When would you use the mean vs. the median?"

Model answer

"The mean is sensitive to outliers — one very large or very small value drags it significantly. The median isn't affected by outliers at all, it's just the midpoint. In practice: if I'm analysing salary data or house prices or transaction values, I'd almost always use the median because a few extreme values (a £10M property in a residential dataset, a CEO salary) would make the mean misleading. If I'm working with something like test scores where the distribution is roughly normal and there are no extreme outliers, mean and median are close and either works. The key question is: does this distribution have outliers, and should they influence my measure of centre?"

Q5: "What is statistical significance and why does it matter?"

Model answer

"Statistical significance tells us whether a result is likely due to a real effect or just random chance. A p-value below 0.05 is the conventional threshold — it means there's less than a 5% probability the result occurred by chance, assuming the null hypothesis is true. In practice, this comes up most in A/B testing — if I change a button colour and conversions go up 3%, I need to know whether that difference is statistically significant before recommending a permanent change. Where I'd push back on the concept: statistical significance doesn't tell you if the effect is practically meaningful. A 0.1% conversion lift might be significant with a huge sample but not worth the engineering effort. Practical significance matters as much as statistical significance."

Business Analysis — The Questions That Separate Good from Great

Q6: "How would you design a metric to measure the health of a customer support team?"

Model answer

"I'd start by asking what 'health' means to the business. Is the priority efficiency (speed and cost), or quality (customer satisfaction), or both? Then I'd think about leading vs. lagging metrics. First response time and tickets-per-agent are leading indicators you can act on quickly. CSAT scores and churn attributed to support issues are lagging outcomes. A healthy dashboard would include both. I'd also want to understand what's NOT being measured — for example, if you only measure speed, agents might rush through tickets and hurt quality. Metrics shape behaviour, so I'd be explicit about the trade-offs of each choice." This kind of structured thinking — what are the trade-offs? — is exactly what senior analysts do.

Q7: "You pull a report and find a sudden 30% drop in conversions last Tuesday. Walk me through how you'd investigate it."

Strong approach

"I'd start by checking if the data itself is reliable — is this a tracking issue, a reporting bug, or a real drop? I'd cross-reference the conversion numbers against raw traffic. Then I'd segment: is the drop across all devices, channels, and geographies, or is it isolated to one? A global drop points to something structural (site outage, payment processor issue, pricing change). A segmented drop suggests something specific (a campaign that ended, a broken checkout flow on mobile). I'd also check what changed on Tuesday — was there a deployment, a marketing campaign change, an external event? Most analytics investigations are really debugging sessions, and the goal is to isolate the variable."

Q8: "What's the difference between correlation and causation? Give a real example."

Model answer

"Correlation means two variables move together. Causation means one causes the other. Classic example: ice cream sales and drowning rates both go up in summer. They're correlated, but ice cream doesn't cause drowning — both are caused by a third variable (hot weather). In data analysis, conflating the two can lead to bad decisions. I've seen companies conclude that users who use a feature are more likely to retain, so they push the feature harder — when actually the retained users were already more engaged and would have retained anyway. To establish causation you usually need a controlled experiment (A/B test) or careful causal inference methods. As an analyst, one of my main jobs is to flag when a stakeholder is inferring causation from correlation."

Communication — The Underrated Half

Q9: "How do you explain a complex analysis to a non-technical stakeholder?"

Model answer

"I lead with the conclusion, not the method. Stakeholders want to know what to DO, not how the analysis works. So instead of 'I ran a logistic regression on 12 months of data and the coefficient for variable X was...', I'd say 'Customers who do X in their first week are three times more likely to still be with us after 90 days — here's what that means for our onboarding flow.' If they want to understand the methodology, I explain it in analogy — I've found the analogy that works for the specific person is more valuable than the technically accurate explanation. I also always include a 'so what' — the analysis is only valuable when it changes a decision."

Q10: "Tell me about a time your analysis changed a business decision."

This is a behavioural question, so use the STAR format: Situation, Task, Action, Result. The most impressive answers show that you didn't just present numbers, but pushed for action and followed up on the outcome.

Tools and Technical Depth

Q11: "How comfortable are you with Python for data analysis?"

Honest answer that works

"I use pandas and matplotlib regularly for exploratory analysis and visualisation. I'm comfortable with data manipulation, merging DataFrames, handling missing values, and building basic charts. I'm not a software engineer, so I wouldn't build a production ML pipeline from scratch — but for analytical work, which is what this role needs, I'm confident in Python."

💡Tip

Be specific about what you can and can't do. "I'm comfortable with Python" means nothing. "I use pandas for X and matplotlib for Y" tells them exactly where you are.

Prepare These Additional Questions

Beyond the 11 above, be ready for:

Window functions (RANK, ROW_NUMBER, LAG/LEAD) — increasingly common in SQL tests
"What's the difference between a data analyst and a data scientist?" — they want to know you understand the scope of the role
"How do you handle missing data?" — mention multiple strategies and when you'd choose each
"What's the biggest dataset you've worked with?" — if you have large-scale experience, quantify it
"Walk me through a project you're proud of" — prepare a specific, impact-driven story

What Actually Gets You Hired

The data analysts who get offers aren't always the ones with the strongest SQL. They're the ones who can clearly articulate what problem they solved, what data they used, and what the outcome was. They ask smart questions. They push back when something doesn't make sense. They understand that data analysis exists to inform decisions, not just to produce tables.

Practise the technical questions until you're comfortable. But invest equally in your ability to communicate what you found and why it mattered.


Data Analyst Interview Questions: What You'll Actually Be Asked in 2026
