有一张 Person 表和一张 Address 表,Person 表标识人的姓名,Address 表标识人的住址,它们的定义如下——
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Person +-------------+---------+ | Column Name | Type | +-------------+---------+ | personId | int | | lastName | varchar | | firstName | varchar | +-------------+---------+
Address +-------------+---------+ | Column Name | Type | +-------------+---------+ | addressId | int | | personId | int | | city | varchar | | state | varchar | +-------------+---------+
Input: Person table: +----------+----------+-----------+ | personId | lastName | firstName | +----------+----------+-----------+ | 1 | Wang | Allen | | 2 | Alice | Bob | +----------+----------+-----------+ Address table: +-----------+----------+---------------+------------+ | addressId | personId | city | state | +-----------+----------+---------------+------------+ | 1 | 2 | New York City | New York | | 2 | 3 | Leetcode | California | +-----------+----------+---------------+------------+ Output: +-----------+----------+---------------+----------+ | firstName | lastName | city | state | +-----------+----------+---------------+----------+ | Allen | Wang | Null | Null | | Bob | Alice | New York City | New York | +-----------+----------+---------------+----------+
这道题目明显地过了头,可以看到,Wang Allen 这个人是没有住址的,因此对应字段为空,这是一个非常显然的外连接——
1 2 3 4 5 6
SELECT firstName, lastName, city, state FROM Person LEFTJOIN Address ON Person.personId = Address.personId;
现在有一张雇员表,其中 id 字段代表当前雇员的 id,managerId 代表它的上级的 id(也是雇员),现在要求查询工资比他的上级高的雇员的名称。
1 2 3 4 5 6 7 8
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | name | varchar | | salary | int | | managerId | int | +-------------+---------+
SELECT today.id AS id FROM Weather today JOIN Weather yesterday ON today.recordDate = date_add(yesterday.recordDate, INTERVAL1DAY) WHERE today.temperature > yesterday.temperature;
+-------------+---------+ | Column Name | Type | +-------------+---------+ | id | int | | revenue | int | | month | varchar | +-------------+---------+
但是我们做不到,对分组后的数据进行操作,只能使用聚集函数,没法进一步筛选。因此这里只能使用特定的聚集函数来进行该操作,这里符合要求的只有 SUM 了。我们可以在 SUM 中使用 IF 或 CASE 表达式,仅统计所需月份的收入,否则就使用 NULL。
1 2 3 4 5 6
SELECT id, SUM(IF(month='Jan', revenue, NULL)) AS Jan_Revenue, SUM(IF(month='Feb', revenue, NULL)) AS Feb_Revenue, SUM(IF(month='Mar', revenue, NULL)) AS Mar_Revenue -- ... FROM Department GROUPBY id;
SELECT score, (SELECT (COUNT(*) +1) FROM Scores AS s1 WHERE s1.score > s0.score) AS `rank` FROM Scores AS s0 ORDERBY s0.score DESC;
-- 等价于 SELECT score, RANK() OVER (ORDERBY score DESC) FROM Scores;
ROW_NUMBER 就有一些麻烦,需要妥善处理相等的情况。
1 2 3 4 5 6 7 8 9
SELECT score, (SELECT (COUNT(*) +1) FROM Scores AS s1 WHERE s1.score > s0.score OR (s1.score = s0.score AND s1.id > s0.id)) AS `rank` FROM Scores AS s0 ORDERBY s0.score DESC, s0.id DESC;
-- 等价于 SELECT score, ROW_NUMBER() OVER (ORDERBY score DESC) FROM Scores;
CREATEFUNCTION getNthHighestSalary(N INT) RETURNSINT BEGIN SELECT (N -1) INTO N; RETURN ( # Write your MySQL query statement below. SELECTDISTINCT salary FROM Employee ORDERBY salary DESC LIMIT N, 1 ); END
第一想法是,直接拿上一题的代码过来不就好了,这里需要注意 N 需要减 1(偏移量从 0 开始计数),LIMIT 子句里不允许对偏移量和列数进行运算,所以直接修改变量。
CREATEFUNCTION getNthHighestSalary(N INT) RETURNSINT BEGIN SELECT (N -1) INTO N; RETURN ( SELECT t.salary FROM (SELECT salary, DENSE_RANK() OVER (ORDERBY salary DESC) AS `rank` FROM Employee) AS t WHERE t.rank = N +1 LIMIT 1 ); END
SELECT d.name AS Department, e.name AS Employee, salary FROM Employee AS e JOIN Department AS d ON e.departmentId = d.id WHERE salary = (SELECTMAX(salary) FROM Employee AS e0 WHERE e0.departmentId = e.departmentId);
SELECT d.name AS Department, e.name AS Employee, salary FROM Employee AS e JOIN Department AS d ON e.departmentId = d.id WHERE (d.id, e.salary) IN ( SELECT departmentId, MAX(salary) FROM Employee GROUPBY departmentId );
SELECT d.name AS Department, e.name AS Employee, salary FROM (SELECT*, RANK() OVER (PARTITIONBY departmentId ORDERBY salary DESC) AS rk FROM Employee) AS e JOIN Department AS d ON e.departmentId = d.id WHERE e.rk =1;
这题,直接使用特定窗口函数吧……有一个 LEAD 函数和 LAG 函数,LEAD 函数获取向前看 n 行的数据,LAG 函数获取向后看 n 行的数据,这里往前后分别看一行,如果前后一行和当前行相等,说明三行相等,符合要求。
1 2 3 4 5 6 7 8
SELECTDISTINCT num AS ConsecutiveNums FROM ( SELECT num, LEAD(num) OVER (ORDERBY id) AS `lead`, LAG(num) OVER (ORDERBY id) AS `lag` FROM Logs ) AS t WHERE num=`lead` and num=`lag`;
SELECT d.name AS Department, e.name AS Employee, salary FROM (SELECT*, DENSE_RANK() OVER (PARTITIONBY departmentId ORDERBY salary DESC) AS rk FROM Employee) AS e JOIN Department AS d ON e.departmentId = d.id WHERE e.rk <=3;
SELECT d.name AS Department, e.name AS Employee, salary FROM Employee AS e JOIN Department AS d ON e.departmentId = d.id WHERE3> (SELECTCOUNT(DISTINCT salary) FROM Employee e1 WHERE e1.departmentId = e.departmentId AND e1.salary > e.salary);
可见,这种求第 N 个排名的题基本都可以用关联子查询或窗口函数来解决,其中窗口函数显然是更加通俗易懂的。
现在有一张出租车订单表和一张用户表,出租车订单表中包含用户 id 和驾驶员 id,两 id 都引用用户表,还包含订单的状态——完成或取消;用户表包含用户的角色以及是否被 ban。
现在,要求获取 2013-10-01 到 2013-10-03 日期中的每天的,用户和驾驶员均未被 ban 的,订单的取消率。
问题其实很简单——筛选出驾驶员和用户未被 ban 的订单,按天分组,统计状态为取消的订单数除以总订单数,格式化即可。
1 2 3 4 5 6 7
SELECT request_at ASDay, ROUND(1- (COUNT(IF(status = "completed", 1, NULL)) /COUNT(*)), 2) AS `Cancellation Rate` FROM Trips WHERE request_at BETWEEN "2013-10-01" AND "2013-10-03" AND client_id NOTIN (SELECT users_id FROM Users WHERE banned ='Yes') AND driver_id NOTIN (SELECT users_id FROM Users WHERE banned ='Yes') GROUPBY request_at;
这里使用 ROW 来取得前后的数据,需注意如果前后没有数据,它就干脆不取了,所以需要保证传递给 MIN 的数据为 3 个。
1 2 3 4 5 6 7 8 9 10
-- 好工整啊 -- 因为 id 本来就是 ASC 的,这里就不用加了 SELECT id, visit_date, people FROM (SELECT id, visit_date, people, MIN(people) OVER (ROWSBETWEEN1 PRECEDING AND1 FOLLOWING) >=100 ANDCOUNT(*) OVER (ROWSBETWEEN1 PRECEDING AND1 FOLLOWING) =3 ORMIN(people) OVER (ROWSBETWEEN2 PRECEDING ANDCURRENTROW) >=100 ANDCOUNT(*) OVER (ROWSBETWEEN2 PRECEDING ANDCURRENTROW) =3 ORMIN(people) OVER (ROWSBETWEENCURRENTROWAND2 FOLLOWING) >=100 ANDCOUNT(*) OVER (ROWSBETWEENCURRENTROWAND2 FOLLOWING) =3AS flag FROM Stadium) AS t WHERE flag;