Why Does MySQL Use B+ Trees Instead of Skip Lists for Indexing?

===============================================================

Today I saw a very useful and clear video about the explanation of B+ tree and skip lists in Database, so I want to write a blog to record my understanding.

When we think about MySQL tables, they seem to be simply storing rows of data, similar to a spreadsheet.

Directly traversing these rows of data gives us a performance of O(n), which is quite slow. To accelerate queries, B+ trees are used for indexing, optimizing the query performance to O(log n).

But this raises a question: there are many data structures with log(n) level query performance, such as skip lists used in Redis’s zset, which are also log(n) and even simpler to implement.

So why doesn’t MySQL use skip lists for indexing?

The Structure of B+ Trees

Generally, B+ trees are a multi-level structure composed of multiple pages, each being 16Kb. For primary key indexes, the leaf nodes at the lowest level hold row data, while the internal nodes hold index information (primary key id and page number) to speed up queries.

For instance, if we want to find row data with id=5, we start from the top-level page records. Each record contains a primary key id and a page number (page address). Follow the yellow arrow; to the left, the smallest id is 1, to the right, the smallest id is 7. Therefore, if the data with id=5 exists, it must be on the left side. Following the record’s page address, we arrive at page 6, then determine id=5>4, so it must be on the right side in page 105.

In page 105, although there are multiple rows of data, they are not traversed one by one. The page also has a page directory, which can speed up the query for row data through binary search, hence finding the row data with id=5 and completing the query.

B+ Tree Diagram

Database View

Introduction

A View in SQL Server can be considered as a virtual table defined on top of it. True to its name, a view offers another entrance to look at data. A conventional view does not store actual data but merely contains a SELECT statement and metadata about the tables involved. Through views, clients no longer need to understand the underlying table structures and their relationships, as views provide a unified interface to access data.

Why Use Views?

  • Views abstract the underlying table structures, simplifying data access operations.
  • By hiding the underlying table structures, security is significantly enhanced, allowing users to see only the data provided by the views.
  • Views facilitate permission management by granting users access to views instead of the underlying tables, further strengthening security.
  • Views provide an interface for users to access data. When the underlying tables change, altering the view’s statement to adapt ensures that client programs built on this view remain unaffected.

Classification of Views in SQL

Views in SQL can be categorized into three types:

  1. Regular View
  2. Indexed View
  3. Partitioned View

Let’s discuss these view types in detail.

1) Regular View

A Regular View is defined by a SELECT statement and includes only its definition and the metadata of the referenced tables, without actually storing data. The template for creating a view as per MSDN is as follows:

1
2
3
4
5
6
7
8
9
10
11
CREATE VIEW [schema_name.]view_name [(column [, ...n])]
[WITH <view_attribute> [, ...n]]
AS select_statement
[WITH CHECK OPTION] [;]

<view_attribute> ::=
{
[ENCRYPTION]
[SCHEMABINDING]
[VIEW_METADATA]
}

Database Indexing

Yesterday I had an interview with the question about indexing, I didn’t answer very well so I’d write a blog to record my study and understanding for future useage.

Understanding the Need for Database Indexes: A Simple Example

Through a straightforward example, this tutorial aims to elucidate the necessity of database indexes. Let’s consider we possess a database table named Employee, encompassing three columns: Employee_Name, Employee_Age, and Employee_Address. Imagine this Employee table hosts thousands of rows of data.

Suppose we wish to retrieve information on all employees named ‘Jesus’ from this table. We opt to utilize the following query statement:

1
SELECT * FROM Employee WHERE Employee_Name = 'Jesus'

What transpires if the table lacks an index?

Upon executing this query, what unfolds as the database endeavors to locate employees named Jesus? The database is compelled to sift through each row within the Employee table to ascertain whether the employee’s name (Employee_Name) matches ‘Jesus’. Since our goal is to acquire information on every employee named Jesus, halting the search after finding the first matching row is not an option, as there could be additional rows meeting the criteria. Thus, the search must continue row by row until the final one is examined - indicating the database must inspect thousands of rows to identify all employees named Jesus. This process is known as a full table scan.

How Do Database Indexes Enhance Performance?

One might ponder if conducting a full table scan for such a rudimentary task seems inefficient - shouldn’t the database be more astute? This is akin to manually scanning an entire table from start to finish - slow and far from elegant (“not at all sleek”). However, as you might infer from the title, this is precisely where indexes come into play. The essence of using indexes is to expedite search operations by reducing the number of records/rows in a table that need to be examined.


:D 一言句子获取中...