2024-03-17

Merge Intervals

A coding challenge about merging intervals

I finished an interesting coding problem and I want to share it here.

Interval merging means combining overlapping intervals into single intervals, so that the result is a list of non-overlapping intervals covering the same ranges. This is useful in many applications, such as consolidating time slots or ranges of numbers.

Python Function to Merge Intervals

Below is a Python function that merges overlapping intervals:

def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals and return a new list."""
    # Sort by start point so that overlapping intervals become adjacent
    intervals = sorted(intervals, key=lambda x: x[0])
    merged = []
    for start, end in intervals:
        # Overlap: the current start falls inside the last merged interval, so extend it
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new interval (a copy, so the caller's lists are not mutated)
            merged.append([start, end])
    return merged

# Example usage
intervals = [[1, 3], [2, 6], [8, 10], [15, 18]]
merged_intervals = merge_intervals(intervals)
print(merged_intervals)
# Output: [[1, 6], [8, 10], [15, 18]]

Technical skills

Algorithms

2024-03-17

2024-02-21

Why Does MySQL Use B+ Trees Instead of Skip Lists for Indexing?

Today I watched a clear and useful video explaining B+ trees and skip lists in databases, so I’m writing this post to record my understanding.

When we think about MySQL tables, they seem to be simply storing rows of data, similar to a spreadsheet.

Scanning these rows one by one costs O(n), which is slow for large tables. To speed up queries, MySQL indexes them with B+ trees, bringing lookup cost down to O(log n).

But this raises a question: many data structures offer O(log n) lookups. The skip list used by Redis’s sorted set (zset), for example, is also O(log n) and is even simpler to implement.

So why doesn’t MySQL use skip lists for indexing?

The Structure of B+ Trees

Generally, a B+ tree is a multi-level structure made up of pages, each 16 KB in size. In a primary key index, the leaf nodes at the lowest level hold the row data, while the internal nodes hold index records (a primary key id and a page number) that speed up lookups.

For instance, suppose we want to find the row with id=5. We start from the records in the top-level page, each of which contains a primary key id and a page number (page address). Following the yellow arrow in the diagram, the record on the left has a smallest id of 1 and the record on the right has a smallest id of 7, so if a row with id=5 exists, it must be on the left side. Following that record’s page address takes us to page 6; there, id=5 > 4, so the row must be on the right side, in page 105.

Page 105 holds multiple rows, but they are not traversed one by one. Each page also has a page directory that lets the row be located by binary search, so we find the row with id=5 and the query is complete.

B+ Tree Diagram
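To make the lookup path concrete, here is a minimal Python sketch under simplifying assumptions: pages are in-memory objects, each internal page stores the smallest id of each child, and Python’s bisect module stands in for both the binary search over index records and the leaf page directory. The page contents and ids below are hypothetical and only loosely mirror the walkthrough (page 6 and page 105); this is not InnoDB’s real on-disk format.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Page:
    """A simplified page: internal pages have children, leaf pages have rows."""
    keys: list                                    # sorted; for internal pages, the smallest id in each child
    children: list = field(default_factory=list)  # internal pages only
    rows: list = field(default_factory=list)      # leaf pages only, aligned with keys

    def is_leaf(self):
        return not self.children


def search(page, key):
    """Walk from the root down to a leaf, then binary-search the leaf for key."""
    while not page.is_leaf():
        # Pick the last record whose smallest id is <= key
        i = bisect_right(page.keys, key) - 1
        if i < 0:
            return None  # key is smaller than every id in the tree
        page = page.children[i]
    # Leaf page: binary search stands in for the page directory
    i = bisect_right(page.keys, key) - 1
    if i >= 0 and page.keys[i] == key:
        return page.rows[i]
    return None


# Hypothetical tree loosely mirroring the walkthrough above
leaf_low = Page(keys=[1, 2, 3], rows=["row1", "row2", "row3"])
page_105 = Page(keys=[4, 5, 6], rows=["row4", "row5", "row6"])
leaf_high = Page(keys=[7, 8], rows=["row7", "row8"])
page_6 = Page(keys=[1, 4], children=[leaf_low, page_105])
page_right = Page(keys=[7], children=[leaf_high])
root = Page(keys=[1, 7], children=[page_6, page_right])

print(search(root, 5))  # row5
```

Each level costs one binary search inside a single page, so the number of pages visited equals the height of the tree rather than the number of rows.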

Technical skills

Database

2024-03-16

2024-01-20

RESTful API

Understanding RESTful in Web Development

Web development has evolved significantly with the advent of Web 2.0, enabling more dynamic interactions between users and websites. This evolution has led to a shift in development methodologies, particularly the adoption of the front-end and back-end separation model. In this post, we will look at the concept of RESTful architecture, its relevance in the current web development landscape, and how it supports the development of both web and mobile applications.

The Current State of Web Development

To fully comprehend RESTful architecture, it’s essential to understand the current state of web development. The separation of the front-end and back-end is a fundamental paradigm in modern web development. This approach involves creating two distinct servers:

Front-end Server: Handles the user interface and interaction.
Back-end Server: Manages the database and server-side logic.

Technical skills

RESTful API

2024-03-16

2023-12-17

Database View

Introduction

A View in SQL Server can be considered a virtual table defined on top of one or more base tables. True to its name, a view offers another way to look at the data. A conventional view does not store actual data; it contains only a SELECT statement and metadata about the tables involved. Through views, clients no longer need to understand the underlying table structures and their relationships, because views provide a unified interface to the data.

Why Use Views?

Views abstract the underlying table structures, simplifying data access.
Hiding the underlying table structures improves security, since users see only the data the view exposes.
Views simplify permission management: users can be granted access to views instead of the underlying tables, which further strengthens security.
Views give users a stable interface to the data. When the underlying tables change, the view’s definition can be altered to match, so client programs built on the view remain unaffected.

Classification of Views in SQL Server

Views in SQL Server can be categorized into three types:

Regular View
Indexed View
Partitioned View

Let’s discuss these view types in detail.

1) Regular View

A Regular View is defined by a SELECT statement and stores only its definition and the metadata of the referenced tables, not the data itself. The syntax for creating a view, as given on MSDN, is:

CREATE VIEW [schema_name.]view_name [(column [, ...n])]
[WITH <view_attribute> [, ...n]]
AS select_statement
[WITH CHECK OPTION] [;]

<view_attribute> ::=
{
    [ENCRYPTION]
    [SCHEMABINDING]
    [VIEW_METADATA]
}
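As a hedged illustration of a regular view, the sketch below uses Python’s built-in sqlite3 module rather than SQL Server. SQLite’s CREATE VIEW lacks the WITH options shown above, but a regular view behaves the same way: it stores only its SELECT, can hide columns, and reflects changes to the base table immediately. The Employee table and the EmployeeDirectory view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table, including a sensitive salary column
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Alice", "IT", 9000), ("Bob", "HR", 7000)],
)

# The view stores only this SELECT; clients querying it never see the salary column
conn.execute("CREATE VIEW EmployeeDirectory AS SELECT name, dept FROM Employee")
print(conn.execute("SELECT * FROM EmployeeDirectory ORDER BY name").fetchall())
# [('Alice', 'IT'), ('Bob', 'HR')]

# A new row in the base table is visible through the view right away
conn.execute("INSERT INTO Employee (name, dept, salary) VALUES ('Carol', 'IT', 8000)")
print(conn.execute("SELECT * FROM EmployeeDirectory ORDER BY name").fetchall())
# [('Alice', 'IT'), ('Bob', 'HR'), ('Carol', 'IT')]
```

In SQL Server, permission management would then grant SELECT on EmployeeDirectory instead of on Employee.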

Technical skills

Database

2024-03-17

2023-12-05

Database Indexing

Yesterday I was asked about indexing in an interview and didn’t answer very well, so I’m writing this post to record what I’ve studied for future reference.

Understanding the Need for Database Indexes: A Simple Example

This tutorial uses a simple example to show why database indexes are necessary. Suppose we have a table named `Employee` with three columns, `Employee_Name`, `Employee_Age`, and `Employee_Address`, and that it holds thousands of rows.

Suppose we want to retrieve all employees named ‘Jesus’ from this table, using the following query:

SELECT * FROM Employee WHERE Employee_Name = 'Jesus'

What transpires if the table lacks an index?

What happens when the database runs this query? It has to check every row in the Employee table to see whether the employee’s name (Employee_Name) is ‘Jesus’. Because we want every employee named Jesus, it cannot stop after the first match, since more matching rows may follow. The search must therefore continue row by row until the last one has been examined, which means inspecting thousands of rows to find all employees named Jesus. This process is called a full table scan.
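The full table scan described above can be sketched in a few lines of Python. This is a simplified in-memory model, not how a database engine is implemented; the rows and the function name full_table_scan are hypothetical. The point is the counter: every row is examined, even after matches have been found.

```python
def full_table_scan(rows, name):
    """Check every row; return the matches and how many rows were examined."""
    matches = []
    examined = 0
    for row in rows:
        examined += 1  # cannot stop early: later rows may also match
        if row["Employee_Name"] == name:
            matches.append(row)
    return matches, examined


# Hypothetical sample rows; a real table would hold thousands
employees = [
    {"Employee_Name": "Jesus", "Employee_Age": 30, "Employee_Address": "12 Oak St"},
    {"Employee_Name": "Maria", "Employee_Age": 25, "Employee_Address": "3 Elm St"},
    {"Employee_Name": "Jesus", "Employee_Age": 45, "Employee_Address": "8 Pine St"},
    {"Employee_Name": "Tom", "Employee_Age": 38, "Employee_Address": "21 Ash St"},
]

matches, examined = full_table_scan(employees, "Jesus")
print(len(matches), examined)  # 2 4
```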

How Do Database Indexes Enhance Performance?

You might wonder whether a full table scan for such a simple task is inefficient, and whether the database could be smarter. It is like reading an entire table by hand from start to finish: slow and far from elegant. As the title suggests, this is exactly where indexes come into play. An index speeds up searches by reducing the number of rows in a table that have to be examined.
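To show how an index cuts down the rows examined, here is a minimal Python sketch of an index on Employee_Name. Real databases typically use B-tree or B+ tree indexes; as a simplifying assumption, this sketch uses a sorted list of (name, row position) entries searched with the bisect module, which gives the same ordered, O(log n) lookup behavior. The names build_name_index and indexed_lookup and the sample rows are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_name_index(rows):
    """Sorted name -> row position entries; a simplified stand-in for a B-tree index."""
    pairs = sorted((row["Employee_Name"], pos) for pos, row in enumerate(rows))
    names = [name for name, _ in pairs]
    positions = [pos for _, pos in pairs]
    return names, positions


def indexed_lookup(rows, index, name):
    names, positions = index
    lo = bisect_left(names, name)   # binary search: O(log n) to find the first match
    hi = bisect_right(names, name)  # entries lo .. hi-1 all share this name
    # Only the matching rows are fetched; no other row is examined
    return [rows[positions[i]] for i in range(lo, hi)]


# Hypothetical sample rows
employees = [
    {"Employee_Name": "Jesus", "Employee_Age": 30, "Employee_Address": "12 Oak St"},
    {"Employee_Name": "Maria", "Employee_Age": 25, "Employee_Address": "3 Elm St"},
    {"Employee_Name": "Jesus", "Employee_Age": 45, "Employee_Address": "8 Pine St"},
    {"Employee_Name": "Tom", "Employee_Age": 38, "Employee_Address": "21 Ash St"},
]

index = build_name_index(employees)
print([row["Employee_Age"] for row in indexed_lookup(employees, index, "Jesus")])  # [30, 45]
```

The trade-off is that the index must be built and kept up to date whenever rows are inserted, updated, or deleted.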

Technical skills

Database

2024-03-14

2023-08-10

Theoretical Basics of Greedy Algorithms

A greedy algorithm chooses the local optimum at every stage, aiming to reach the global optimum. The key condition for using greedy is that the global optimum can be derived from the local optima.

How to verify if you can use greedy:

Look for a counterexample: if you can’t come up with one, try the greedy approach.
Prove it rigorously, e.g. by mathematical induction.

Steps for a greedy algorithm:

Decompose the problem into several sub-problems.
Identify an appropriate greedy strategy.
Find the optimal solution for each sub-problem.
Combine the local optima into a global optimum.

Example: LeetCode 455: Assign Cookies

Local Optimum: Feed the least greedy child with the smallest cookie that satisfies it (equivalently, give the largest cookie to the child with the biggest appetite).
Global Optimum: Feed as many children as possible.

class Solution:
    def findContentChildren(self, g, s):
        # Sort appetites (g) and cookie sizes (s) in ascending order
        g.sort()
        s.sort()
        count = 0
        i = 0  # pointer into g (children)
        j = 0  # pointer into s (cookies)

        while i < len(g) and j < len(s):
            if s[j] >= g[i]:
                # Cookie j satisfies child i: feed the child, move to the next child and cookie
                count += 1
                i += 1
                j += 1
            else:
                # Cookie j is too small for every remaining child; try a bigger cookie
                j += 1

        return count
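The strategy of giving the largest cookie to the child with the biggest appetite can also be coded directly. In the sketch below (find_content_children is my own name, not LeetCode’s required method signature), both sorted lists are walked from the large end: if the largest remaining cookie cannot satisfy the greediest remaining child, no cookie can, so that child is skipped.

```python
def find_content_children(g, s):
    """Greedy: offer the largest remaining cookie to the greediest remaining child."""
    g = sorted(g)
    s = sorted(s)
    count = 0
    j = len(s) - 1  # index of the largest cookie not yet given out
    for i in range(len(g) - 1, -1, -1):  # children from biggest appetite down
        if j >= 0 and s[j] >= g[i]:
            count += 1  # this child is fed with the largest remaining cookie
            j -= 1
        # otherwise no cookie can satisfy this child; move on to a less greedy child
    return count


print(find_content_children([1, 2, 3], [1, 1]))  # 1
print(find_content_children([1, 2], [1, 2, 3]))  # 2
```

Unlike the Solution class, this version sorts copies, so the caller’s lists are left unchanged.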

Technical skills

Algorithms

2024-03-14

2023-07-20

More CAP

Diving Deeper into the CAP Theorem

As a fresh graduate exploring the vast landscape of Computer Science, I find myself particularly intrigued by principles that serve as the backbone of the digital world. The CAP Theorem, also known as Brewer’s Theorem, is one such principle: it provides fundamental guidelines for designing distributed systems. In this blog post, I will try to shed some light on this intricate topic in an accessible way.

What is CAP Theorem?

The CAP Theorem is a concept in distributed computing that states a distributed data store cannot simultaneously provide all three of the following guarantees:

Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a response, without guarantee that it contains the most recent write.
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

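In simpler terms, a distributed system can guarantee at most two of these three properties at the same time. Because network partitions cannot be ruled out in practice, the real choice during a partition is between consistency and availability. This trade-off has significant implications for the design and usability of distributed systems.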
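A toy simulation can make the trade-off concrete. The sketch below is a hypothetical two-replica register, not a real database: writes go to replica a and are copied synchronously to replica b, which serves reads. When the link between them is cut, a "CP" cluster rejects requests it cannot keep consistent (an error, as the Consistency definition allows), while an "AP" cluster keeps answering but may return stale data.

```python
class PartitionError(Exception):
    """Raised when a CP cluster refuses a request during a partition."""


class Cluster:
    """Toy model: one value stored on two replicas, a and b (hypothetical)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None             # value on replica a (receives writes)
        self.b = None             # value on replica b (serves reads)
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, value):
        if self.partitioned and self.mode == "CP":
            # Cannot replicate to b, so reject rather than let the replicas diverge
            raise PartitionError("write rejected: partition")
        self.a = value
        if not self.partitioned:
            self.b = value        # synchronous replication while the link is up

    def read(self):
        if self.partitioned and self.mode == "CP":
            # b cannot confirm it holds the latest write, so it returns an error
            raise PartitionError("read rejected: partition")
        return self.b             # AP: always answers, possibly with stale data


for mode in ("CP", "AP"):
    cluster = Cluster(mode)
    cluster.write(1)
    cluster.partitioned = True
    try:
        cluster.write(2)
        print(mode, "read ->", cluster.read())
    except PartitionError as err:
        print(mode, "error ->", err)
# CP error -> write rejected: partition
# AP read -> 1
```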

Technical skills

System Design

2024-03-17

2023-07-10

System Design

Demystifying Web Infrastructure and Technologies: A Comprehensive Guide

Introduction

From our everyday interactions with web applications to the seamless flow of data across platforms, the internet depends on many technologies working in harmony. As a technology enthusiast, I set out to better understand how these interconnected technologies fit together. In this blog, I share my findings, covering a wide range of topics from the basic to the advanced. These notes are based on the following tutorial video.

Technical skills

System Design

2024-03-14

2023-07-05