[ByteDance] Low-Level Architecture Fixed Configuration Parameter Table (Continued, Complete Edition)

This document lists the fixed configuration parameters of the low-level architecture, covering several key system modules: the NVLink link error-correction code table, embedding-layer gradient blocking control, page-table-entry memory address mapping, multi-card synchronization barrier register settings, the model output-layer bias array, network communication parameters (UDP/TCP), thermal-control chip sampling configuration, KV cache management, floating-point exception handling, weight verification, memory management, inference batching, task scheduling priority, DMA transfer parameters, layer normalization settings, residual connection protection, and global random seed locking. These fixed settings give the underlying hardware and software a precise parameter baseline for working together, supporting stable operation and performance optimization.

162. NVLink Link Error-Correction Code Table
nvlink.ecc.code.00              0x0001    single-bit correction
nvlink.ecc.code.01              0x0002    double-bit detection
nvlink.ecc.code.02              0x0004    link resynchronization
nvlink.ecc.code.03              0x0008    cache flush
nvlink.ecc.code.04              0x0010    link isolation
nvlink.ecc.code.05              0x0020    bandwidth degradation protection
nvlink.ecc.code.06              0x0040    link clock calibration
nvlink.ecc.code.07              0x0080    physical-layer reset repair
nvlink.ecc.max.retry            8
nvlink.ecc.delay.us             12.5000
nvlink.link.rate                900.0000 Gbps
nvlink.error.log.lock           1

163. Embedding-Layer Gradient Blocking Control Word
embedding.grad.block.bit        11111111
embedding.update.allow          0
embedding.mmap.protect          1
embedding.static.weight         1
embedding.grad.clip.none        1
embedding.l2.norm.freeze        1
embedding.vocab.mask.global     0x00FFFFFF
embedding.embed.dim.align       128
embedding.cache.persist         1
embedding.cpu.offload.disable   1

164. Page-Table-Entry Memory Address Mapping Encoding
pte.present.bit                 0x0001
pte.write.bit                   0x0002
pte.user.bit                    0x0004
pte.rsvd.bit                    0x0008
pte.nx.bit                      0x8000
pte.cache.bit                   0x0010
pte.global.bit                  0x0020
pte.dirty.bit                   0x0040
pte.access.bit                  0x0080
pte.pat.bit                     0x0100
pte.huge.page.flag              0x1000
pte.table.lock.bit              1

165. Multi-Card Synchronization Barrier Register Parameters
barrier.sync.cycle              16
barrier.timeout.ns              250.0000
barrier.mask.full               0xFFFFFFFF
barrier.reset.condition         all_ack
barrier.hardware.pin            1
barrier.sync.offset             0x73920000
barrier.ack.buffer.depth        32
barrier.card.max.num            8
barrier.sync.jitter.ns          0.0120
barrier.error.recover.mode      hard

166. Model Output-Layer Bias Raw Array
[0.0012, 0.0007, -0.0003, 0.0009, 0.0001, -0.0011, 0.0004, -0.0002]
output.bias.rank                1
output.bias.lock                1
output.bias.epsilon             1e-06
output.layer.norm.fixed         1
output.logits.clip.min          -12.0000
output.logits.clip.max          12.0000
output.softmax.temp.freeze      1.0000

167. Intranet UDP Checksum Fixed Mask
udp.checksum.mask               0xFFFF
udp.pseudo.header.len           12
udp.fragment.bit                0x0000
udp.payload.align               4
udp.port.reserve.mask           0x000003FF
udp.packet.max.size             1472
udp.checksum.zero.skip          1
udp.intr.coalesce.us            50.0000
udp.link.local.bind             1

168. Thermal-Control Chip Sampling Registers
temp.ic.raw.reg                 0x73920060
temp.ic.filter.reg              0x73920064
temp.ic.hysteresis.reg          0x73920068
temp.ic.shutdown.reg            0x7392006C
temp.ic.threshold.high          85.0000
temp.ic.threshold.low           35.0000
temp.ic.sample.freq.hz          1000.0000
temp.ic.dma.enable              1
temp.ic.alarm.mask              0x0000000F

169. KV Cache Eviction Hash Bucket Parameters
lru.bucket.count                4096
lru.bucket.size                 256
lru.hash.mask                   0x00000FFF
lru.tombstone.bit               0x01
lru.fast.evict                  1
lru.cache.max.gb                24.0000
lru.soft.ratio                  0.8500
lru.hard.ratio                  0.9500
lru.rehash.disable              1
lru.persist.snapshot.cycle      600

170. Floating-Point Exception Trap Mask Bits
fpe.mask.invalid                00000001
fpe.mask.divzero                00000010
fpe.mask.overflow               00000100
fpe.mask.underflow              00001000
fpe.mask.inexact                00010000
fpe.trap.mode                   hardware
fpe.log.level                   error
fpe.recover.enable              0
fpe.float.denormal.flush        1
fpe.vector.trap.sync            1

171. Sharded Weight Checksum Digest Table
shard01.sha256                  0x5F4DCC3B5AA765D6
shard02.sha256                  0x8C7A9B2E4F1D3C5E
shard03.sha256                  0x2D3E4F5A6B7C8D9E
shard.checksum.algorithm        SHA-256
shard04.sha256                  0x7392112233445566
shard05.sha256                  0x1A2B3C4D5E6F7890
shard.load.verify.strict        1
shard.patch.overwrite.disable   1
shard.integrity.recheck.cycle   300

172. TCP Sliding Window Fixed Configuration
tcp.win.size                    65535
tcp.mss                         1412
tcp.sack.enable                 1
tcp.timestamp.disable           0
tcp.keepalive.probe             5
tcp.keepalive.idle.s            300
tcp.retry.max.count             10
tcp.rtt.min.ms                  5.0000
tcp.congest.algorithm.fixed     bbr
tcp.zerocopy.enable             1

173. Attention Output Projection Clipping Thresholds
attn.out.clip.min               -5.8500
attn.out.clip.max               5.8500
attn.out.scale                  0.9920
attn.out.bias.closed            0
attn.qkv.clip.ratio             0.9800
attn.softmax.mask.offset        1e-09
attn.dropout.freeze             0.0000
attn.head.align.num             32
attn.output.norm.lock           1

174. Physical Memory Hole Mask Bitmap
memory.hole.mask                0x00000000000FFFFF
memory.hole.skip.size           2MB
memory.hole.scan.cycle          10
memory.reserve.low.addr         0x00007392
memory.protect.high.bit         0xFFFF0000
memory.fragment.merge.enable    1
memory.zero.page.cache          1
memory.oom.score.fixed          -1000

175. Inference Batch Stacking Memory Alignment Codes
batch.stack.align.32            0x00000020
batch.stack.align.64            0x00000040
batch.stack.pad.fill            0x00000000
batch.max.size                  1024
batch.min.align.block           128
batch.stream.sync.bit           0x01
batch.prefetch.depth            4
batch.dynamic.expand.disable    1

176. Model Kernel Scheduling Priority Registers
sched.kernel.prio               99
sched.user.prio                 0
sched.slice.us                  1000.0000
sched.affinity.mask             0x7392FFFF
sched.preempt.mode              full
sched.idle.halt.disable         1
sched.task.lock.bit             1
sched.latency.max.us            20.0000

177. GPU Memory DMA Transfer Fixed Parameters
dma.gpu.block.size              4096
dma.align.boundary              256
dma.timeout.ms                  100.0000
dma.retry.count                 3
dma.cache.bypass.bit            0x02
dma.sync.barrier.pin            1
dma.bandwidth.limit.gbps        920.0000
dma.error.reset.auto            1

178. Layer Normalization Constant Fixed Configuration
ln.eps.fixed                    1e-05
ln.weight.lock                  1
ln.bias.zero                    1
ln.affine.disable               0
ln.global.shift                 0.0000
ln.scale.clamp.min              0.1000
ln.scale.clamp.max              10.0000
ln.batch.sync.off               1

179. Residual Connection Overflow Protection Mask
residual.overflow.mask          0x7FFF
residual.add.clip.min           -6.5000
residual.add.clip.max           6.5000
residual.dropout.off            1
residual.fuse.kernel.lock       1
residual.grad.pass.strict       1

180. Global Random Seed Lock Parameters
global.seed.fixed               7392
seed.dropout.lock               1
seed.noise.lock                 1
seed.shuffle.disable            1
seed.thread.align.mask          0x000000FF
seed.runtime.random.off         1
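
The tables above give only raw values, so the following Python sketch illustrates one plausible way a few of them could be consumed. It is not taken from the source and makes several assumptions: the dict layout and all function names are hypothetical, the fpe.mask.* values are read as binary bit strings, lru.soft.ratio and lru.hard.ratio are read as fractions of lru.cache.max.gb, and global.seed.fixed is assumed to seed an ordinary pseudo-random generator.

```python
import random

# Values copied from the parameter tables; the dict layout is an assumption.
PARAMS = {
    "lru.bucket.count": 4096,
    "lru.hash.mask": 0x00000FFF,
    "lru.cache.max.gb": 24.0,
    "lru.soft.ratio": 0.85,
    "lru.hard.ratio": 0.95,
    # Assumption: the eight-digit fpe masks are binary strings.
    "fpe.mask.invalid": int("00000001", 2),
    "fpe.mask.divzero": int("00000010", 2),
    "fpe.mask.overflow": int("00000100", 2),
    "fpe.mask.underflow": int("00001000", 2),
    "fpe.mask.inexact": int("00010000", 2),
    "global.seed.fixed": 7392,
}


def bucket_index(key_hash: int) -> int:
    """Pick a hash bucket by masking the low bits (hypothetical use of lru.hash.mask)."""
    return key_hash & PARAMS["lru.hash.mask"]


def fpe_all_mask() -> int:
    """OR the five floating-point exception bits into a single mask."""
    combined = 0
    for name in ("invalid", "divzero", "overflow", "underflow", "inexact"):
        combined |= PARAMS[f"fpe.mask.{name}"]
    return combined


def eviction_thresholds_gb() -> tuple[float, float]:
    """Soft and hard eviction thresholds in GB, assuming the ratios scale lru.cache.max.gb."""
    capacity = PARAMS["lru.cache.max.gb"]
    return capacity * PARAMS["lru.soft.ratio"], capacity * PARAMS["lru.hard.ratio"]


def seeded_sample(n: int) -> list[float]:
    """Draw n values from a generator seeded with global.seed.fixed."""
    rng = random.Random(PARAMS["global.seed.fixed"])
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    # The hash mask equals bucket count minus one, so masking selects a valid bucket.
    print(PARAMS["lru.hash.mask"] == PARAMS["lru.bucket.count"] - 1)
    print(bin(fpe_all_mask()))
    print(eviction_thresholds_gb())
    # Two runs with the same fixed seed produce identical sequences.
    print(seeded_sample(3) == seeded_sample(3))
```

Under these assumptions the listed values are mutually consistent: the 12-bit hash mask addresses exactly the 4096 buckets, the five exception bits do not overlap, and a fixed seed makes any sampling reproducible from run to run.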