支持基因片段与序列溢出的雪花算法
本文最后更新于 66 天前,其中的信息可能已经有所发展或是发生改变。

传统的雪花算法由41位时间戳、10位机器码以及12位序列号组成的,第一位为保留的符号位。

在分布式集群中,通常使用雪花 ID 作为主键,但是这里就会出现一些问题:

  1. 若 A表 与 B表 需要分表且 A 与 B 之间有一定的关联,例如 B表 同时储存 bid 与 aid,并且根据 bid 分片。而此时有两个需求,第一是根据 aid 获取对应的 bid 所在的分表,第二是根据 bid 获取对应的 aid 所在的分表,如果 aid 与 bid 之间没有任何联系,则两种需求都需要查询所有 A 或 B 的分表才能确定。
  2. 若在同一毫秒时间内产生的雪花 ID 超过最大限制,则需要向后借用一毫秒时间,以此类推,否则将抛出异常。

雪花算法实现

先来看看传统的雪花算法是如何实现的:

/**
 * 常规雪花算法
 * 01-42 位: 时间戳
 * 43-47 位: 数据中心 ID
 * 47-52 位: 工作 ID
 * 52-64 位: 序列 ID
 */
public synchronized long nextId() {
    long timestamp = timeGen();
    if (timestamp < lastTimestamp) {
        throw new RuntimeException(
            String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
    }
    if (lastTimestamp == timestamp) {
        sequence = (sequence + 1) & sequenceMask;
        if (sequence == 0) {
            timestamp = tilNextMillis(lastTimestamp);
        }
    }
    else {
        sequence = 0L;
    }
    lastTimestamp = timestamp;
    return ((timestamp - twepoch) << 22)
    | (datacenterId << 17)
    | (workerId << 12)
    | sequence;
}

这里的时间戳左移22位是通过 (63 – 时间戳位数) 得到的,以此类推,在此基础上减去数据中心位数得到下一个左移量。

以下均使用 Kotlin 实现

现在有下面几个常量:

private val startPoint: Long
private val timestampLength: Int
private val dataCenterIdLength: Int
private val workerIdLength: Int
private val sequenceIdLength: Int

根据上面的常量可以计算出:

val shlBitsForTimestamp = 63 - timestampLength
val timestamp = (currentTimestamp - startPoint) shl shlBitsForTimestamp

val shlBitsForDataCenter = shlBitsForTimestamp - dataCenterIdLength
val datacenterId = dataCenterId shl shlBitsForDataCenter

val shlBitsForWorker = shlBitsForDataCenter - workerIdLength
val workerId = workerId shl shlBitsForWorker

然后得到最终的雪花 ID:

timestamp or datacenterId or workerId or sequence

引入基因片段

顾名思义,基因片段可以与另一个值联系,因为这个基因片段就来自于另一个值。

例如 191 % 16 = 15,如果 aid 为 191,那么对应的表应该是 a_15。

如果要通过 aid 191 得到对应的 bid 在哪一个分片,那么就可以通过相同的基因算法,得出对应 bid 的分片编号。

现在 bid 根据 aid 生成,也就是说,只要 bid 包含 aid 的基因片段,就能通过 bid 知道对应的 aid 在哪个分片了。

说到这里,基因片段实际上是某个二进制数的最后几位,例如 191 的二进制 1011 1111,取最后四位 1111 作为基因片段,如果把这个片段拼接到 bid 的末尾,那么也同样可以通过 bid 得到这个基因片段。

如果仅仅根据 aid 取模来决定 bid 的分片编号,那么 A 与 B 两张表的数量是相同的,引入基因片段之后,则 B表的数量取决于基因片段的长度。

接下来引入基因片段:

class SnowIdGenerator(
    private val startPoint: Long,
    private val timestampLength: Int,
    private val dataCenterIdLength: Int,
    private val workerIdLength: Int,
    private val sequenceIdLength: Int,
    private val geneIdLength: Int,
    private val dataCenterId: Long,
    private val workerId: Long,
    private val actualGeneLength: Int
)

初始化

/**
 * flag                  timestamp                dataCenter worker sequence gene
 *  0   10000000000000000000000000000000000000001   10001    10001  1000001 10001
 *  1   --------------------41-------------------   --5--    --5--  ---7--- --5--
 * @author lovelycat
 * @since 2024-08-24 19:40
 * @version 1.0
 */
class SnowIdGenerator(
    private val startPoint: Long,
    private val timestampLength: Int,
    private val dataCenterIdLength: Int,
    private val workerIdLength: Int,
    private val sequenceIdLength: Int,
    private val geneIdLength: Int,
    private val dataCenterId: Long,
    private val workerId: Long,
    private val actualGeneLength: Int
) {
    private val maxSequence: Long
    private val maxDataCenters: Long
    private val maxWorkers: Long

    init {
        if (timestampLength + dataCenterIdLength + workerIdLength + sequenceIdLength + geneIdLength != 63) {
            throw IllegalArgumentException("Id length must be 64 in total.")
        }

        maxSequence = (1 shl sequenceIdLength).toLong()
        maxDataCenters = (1 shl dataCenterIdLength).toLong()
        maxWorkers = (1 shl workerIdLength).toLong()

        if (dataCenterId >= maxDataCenters) {
            throw IllegalArgumentException("Data center id out of range, max $maxDataCenters but current $dataCenterId")
        }

        if (workerId >= maxWorkers) {
            throw IllegalArgumentException("Worker id out of range, max $maxWorkers but current $workerId")
        }
    }
}

获取基因片段

/**
 * Get gene piece from original sequence,
 * 1011 1001 and 0000 1111 = 0000 1001,
 * 1111 = (1 << 4) - 1
 *
 * @param origin
 * @param geneLength
 * @return
 */
fun getGene(origin: Long, geneLength: Int = actualGeneLength): Long {
    return origin and ((1 shl geneLength) - 1).toLong()
}

雪花 ID 生成

private var lastTimestamp = 0L
private val sequenceMap = mutableMapOf<Long, Long>()
private var borrowedTimestamp = 0L

@Synchronized
fun nextId(gene: Long): Long {
    var currentTimestamp = System.currentTimeMillis()
    val shlBitsForTimestamp = 63 - timestampLength
    if (currentTimestamp <= borrowedTimestamp) {
        currentTimestamp = borrowedTimestamp
    } else if (currentTimestamp < lastTimestamp) {
        throw IllegalStateException("Clock moved backwards. Refusing to generate id for timestamp $currentTimestamp. Latest generated at $lastTimestamp")
    }
    if (currentTimestamp == lastTimestamp) {
        with(gene) {
            val original = (sequenceMap[this] ?: 0)
            sequenceMap[this] = original + 1
            if (original + 1 == maxSequence) {
                borrowedTimestamp = currentTimestamp + 1
                currentTimestamp = borrowedTimestamp
                sequenceMap.clear()
            }
        }
    } else {
        sequenceMap.clear()
    }
    lastTimestamp = currentTimestamp
    val timestamp = (currentTimestamp - startPoint) shl shlBitsForTimestamp
    val shlBitsForDataCenter = shlBitsForTimestamp - dataCenterIdLength
    val datacenterId = dataCenterId shl shlBitsForDataCenter
    val shlBitsForWorker = shlBitsForDataCenter - workerIdLength
    val workerId = workerId shl shlBitsForWorker
    val realSequence = sequenceMap[gene] ?: 0
    return if (geneIdLength == 0)
        timestamp or datacenterId or workerId or realSequence
    else {
        val shlBitsForSequence = shlBitsForWorker - sequenceIdLength
        val seq = realSequence shl shlBitsForSequence
        timestamp or datacenterId or workerId or seq or gene
    }
}

最终代码

/**
 * flag                  timestamp                dataCenter worker sequence gene
 *  0   10000000000000000000000000000000000000001   10001    10001  1000001 10001
 *  1   --------------------41-------------------   --5--    --5--  ---7--- --5--
 * @author lovelycat
 * @since 2024-08-24 19:40
 * @version 1.0
 */
class SnowIdGenerator(
    private val startPoint: Long,
    private val timestampLength: Int,
    private val dataCenterIdLength: Int,
    private val workerIdLength: Int,
    private val sequenceIdLength: Int,
    private val geneIdLength: Int,
    private val dataCenterId: Long,
    private val workerId: Long,
    private val actualGeneLength: Int
) {
    private val maxSequence: Long
    private val maxDataCenters: Long
    private val maxWorkers: Long

    init {
        if (timestampLength + dataCenterIdLength + workerIdLength + sequenceIdLength + geneIdLength != 63) {
            throw IllegalArgumentException("Id length must be 64 in total.")
        }

        maxSequence = (1 shl sequenceIdLength).toLong()
        maxDataCenters = (1 shl dataCenterIdLength).toLong()
        maxWorkers = (1 shl workerIdLength).toLong()

        if (dataCenterId >= maxDataCenters) {
            throw IllegalArgumentException("Data center id out of range, max $maxDataCenters but current $dataCenterId")
        }

        if (workerId >= maxWorkers) {
            throw IllegalArgumentException("Worker id out of range, max $maxWorkers but current $workerId")
        }
    }

    private var lastTimestamp = 0L

    private val sequenceMap = mutableMapOf<Long, Long>()

    private var borrowedTimestamp = 0L

    @Synchronized
    fun nextId(gene: Long): Long {
        var currentTimestamp = System.currentTimeMillis()

        val shlBitsForTimestamp = 63 - timestampLength

        if (currentTimestamp <= borrowedTimestamp) {
            currentTimestamp = borrowedTimestamp
        } else if (currentTimestamp < lastTimestamp) {
            throw IllegalStateException("Clock moved backwards. Refusing to generate id for timestamp $currentTimestamp. Latest generated at $lastTimestamp")
        }

        if (currentTimestamp == lastTimestamp) {
            with(gene) {
                val original = (sequenceMap[this] ?: 0)
                sequenceMap[this] = original + 1

                if (original + 1 == maxSequence) {
                    borrowedTimestamp = currentTimestamp + 1
                    currentTimestamp = borrowedTimestamp
                    sequenceMap.clear()
                }
            }
        } else {
            sequenceMap.clear()
        }

        lastTimestamp = currentTimestamp

        val timestamp = (currentTimestamp - startPoint) shl shlBitsForTimestamp

        val shlBitsForDataCenter = shlBitsForTimestamp - dataCenterIdLength
        val datacenterId = dataCenterId shl shlBitsForDataCenter

        val shlBitsForWorker = shlBitsForDataCenter - workerIdLength
        val workerId = workerId shl shlBitsForWorker

        val realSequence = sequenceMap[gene] ?: 0

        return if (geneIdLength == 0)
            timestamp or datacenterId or workerId or realSequence
        else {
            val shlBitsForSequence = shlBitsForWorker - sequenceIdLength
            val seq = realSequence shl shlBitsForSequence
            timestamp or datacenterId or workerId or seq or gene
        }
    }

    /**
     * Get gene piece from original sequence,
     * 1011 1001 and 0000 1111 = 0000 1001,
     * 1111 = (1 << 4) - 1
     *
     * @param origin
     * @param geneLength
     * @return
     */
    fun getGene(origin: Long, geneLength: Int = actualGeneLength): Long {
        return origin and ((1 shl geneLength) - 1).toLong()
    }
}

使用示例

无基因片段:

由于基因片段长度为0,则 nextId() 参数可以为任意值。

fun main() {
    val generator = SnowIdGenerator(1724505364000, 41, 5, 5, 12, 0, 9, 17, 0)
    val id = generator.nextId(0L)

    generator.debug(id)
}

包含基因片段:

fun main() {
    val generator = SnowIdGenerator(1724505364000, 41, 5, 5, 7, 5, 9, 17, 4)
    val id = generator.nextId(generator.getGene(189L))

    generator.debug(id)
}

调试:

通过此函数可以具体查看不同区段的二进制字符串。

fun debug(id: Long) {
    val binString = id.toBinaryString().run {
        "0".repeat(63 - this.length) + this
    }
    println("Binary:\n$binString")
    println("Timestamp:\n" + binString.substring(0, timestampLength))
    val a = timestampLength + dataCenterIdLength
    println("DataCenter:\n" + binString.substring(timestampLength, a))
    val b = a + workerIdLength
    println("Worker:\n" + binString.substring(a, b))
    val c = b + sequenceIdLength
    println("Sequence:\n" + binString.substring(b, c))
    val d = c + geneIdLength
    println("Gene:\n" + binString.substring(c, d))
}

fun Long.toBinaryString() = java.lang.Long.toBinaryString(this)

结语

如果不需要自定义各区段的长度,可以自行简化为常量。如果错误欢迎在下方指出。

未经允许禁止转载本站内容,经允许转载后请严格遵守CC-BY-NC-ND知识共享协议4.0,代码部分则采用GPL v3.0协议
暂无评论

发送评论 编辑评论


				
|´・ω・)ノ
ヾ(≧∇≦*)ゝ
(☆ω☆)
(╯‵□′)╯︵┴─┴
 ̄﹃ ̄
(/ω\)
∠( ᐛ 」∠)_
(๑•̀ㅁ•́ฅ)
→_→
୧(๑•̀⌄•́๑)૭
٩(ˊᗜˋ*)و
(ノ°ο°)ノ
(´இ皿இ`)
⌇●﹏●⌇
(ฅ´ω`ฅ)
(╯°A°)╯︵○○○
φ( ̄∇ ̄o)
ヾ(´・ ・`。)ノ"
( ง ᵒ̌皿ᵒ̌)ง⁼³₌₃
(ó﹏ò。)
Σ(っ °Д °;)っ
( ,,´・ω・)ノ"(´っω・`。)
╮(╯▽╰)╭
o(*////▽////*)q
>﹏<
( ๑´•ω•) "(ㆆᴗㆆ)
😂
😀
😅
😊
🙂
🙃
😌
😍
😘
😜
😝
😏
😒
🙄
😳
😡
😔
😫
😱
😭
💩
👻
🙌
🖕
👍
👫
👬
👭
🌚
🌝
🙈
💊
😶
🙏
🍦
🍉
😣
Source: github.com/k4yt3x/flowerhd
颜文字
Emoji
小恐龙
花!
上一篇
下一篇