camel-cdr

camel-cdr.bsky.social


🐘 @camelcdr@tech.lgbt

Joined November 2024


camel-cdr camel-cdr.bsky.social · Oct 23, 2025
"How NOT To Program an Out-of-order Vector Processor" slides are public. static.sched.com/hosted_files...

camel-cdr camel-cdr.bsky.social · Oct 23, 2025
I only slightly disagree with using a segmented load/store transpose. If you need to transpose from memory, fine, but if you need a register-to-register transpose, going through memory isn't the best option. I'd use vslide1up/down or, in the future, vpaire/vpairo: github.com/ved-rivos/ri...
riscv-isa-manual/src/zvzip.adoc at zvzip · ved-rivos/riscv-isa-manual

RISC-V Instruction Set Manual. Contribute to ved-rivos/riscv-isa-manual development by creating an account on GitHub.

github.com
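A register-to-register 2x2 transpose with slides can be sketched as follows. This is a scalar C model, not RVV code: the helpers `slide1up`, `slide1down`, and `merge_odd` are hypothetical stand-ins for vslide1up.vx, vslide1down.vx, and vmerge.vvm with a mask selecting odd elements, and `VL` is an assumed fixed vector length. Real code would use intrinsics such as `__riscv_vslide1up_vx_i32m1` and `__riscv_vmerge_vvm_i32m1`.

```c
#include <stdint.h>

/* Hypothetical fixed vector length (in elements) for this scalar model. */
#define VL 8

/* Model of vslide1up.vx: dst[0] = x, dst[i] = src[i - 1]. */
static void slide1up(int32_t dst[VL], const int32_t src[VL], int32_t x)
{
    for (int i = VL - 1; i > 0; i--)
        dst[i] = src[i - 1];
    dst[0] = x;
}

/* Model of vslide1down.vx: dst[i] = src[i + 1], dst[VL - 1] = x. */
static void slide1down(int32_t dst[VL], const int32_t src[VL], int32_t x)
{
    for (int i = 0; i < VL - 1; i++)
        dst[i] = src[i + 1];
    dst[VL - 1] = x;
}

/* Model of vmerge.vvm with an "odd index" mask: odd lanes from t, even lanes from f. */
static void merge_odd(int32_t dst[VL], const int32_t f[VL], const int32_t t[VL])
{
    for (int i = 0; i < VL; i++)
        dst[i] = (i & 1) ? t[i] : f[i];
}

/* 2x2 transposes across two registers without touching memory:
   a  = a0 a1 a2 a3 ...   b  = b0 b1 b2 b3 ...
   lo = a0 b0 a2 b2 ...   hi = a1 b1 a3 b3 ... */
static void transpose2x2(int32_t lo[VL], int32_t hi[VL],
                         const int32_t a[VL], const int32_t b[VL])
{
    int32_t tmp[VL];
    slide1up(tmp, b, 0);   /* tmp = 0  b0 b1 b2 ... */
    merge_odd(lo, a, tmp); /* lo  = a0 b0 a2 b2 ... */
    slide1down(tmp, a, 0); /* tmp = a1 a2 a3 ... 0  */
    merge_odd(hi, tmp, b); /* hi  = a1 b1 a3 b3 ... */
}
```

The scalar inserted by the slide always lands in a lane that the merge discards, so its value does not matter.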


camel-cdr camel-cdr.bsky.social · Oct 12, 2025
Fuzzing tip: use VLAs instead of fixed-size buffers or malloc. 1. With fixed-size buffers, ASan won't catch everything. 2. VLAs are faster than malloc; in my case, fuzzing got 15% faster. If VLAs aren't portable enough, just check __STDC_NO_VLA__ and select between the other options.
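A minimal libFuzzer-style harness following this tip might look like the sketch below. The scratch buffer is sized exactly to the input, so an access one past the end is out of bounds and visible to ASan, whereas a larger fixed-size array would silently absorb it. `checksum` is a hypothetical stand-in for the code under test, and the malloc fallback is one of the "other options" for compilers that define `__STDC_NO_VLA__`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical code under test: sums a buffer of exactly `len` bytes. */
static unsigned checksum(const uint8_t *buf, size_t len)
{
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* libFuzzer-style entry point. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size == 0)
        return 0; /* a zero-length VLA is undefined behavior */
#ifndef __STDC_NO_VLA__
    uint8_t scratch[size]; /* exactly `size` bytes on the stack */
#else
    uint8_t *scratch = malloc(size); /* fallback when VLAs are unavailable */
    if (scratch == NULL)
        return 0;
#endif
    memcpy(scratch, data, size);
    (void)checksum(scratch, size);
#ifdef __STDC_NO_VLA__
    free(scratch);
#endif
    return 0;
}
```

Because the buffer lives on the stack, very large inputs could overflow it; bounding the input size (for example with libFuzzer's `-max_len` flag) keeps that in check.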


camel-cdr camel-cdr.bsky.social · Sep 25, 2025
Tenstorrent decided to publish the first benchmark data for Ascalon's RVV implementation using the instruction throughput benchmark of my rvv-bench benchmark suite. <3 camel-cdr.github.io/rvv-bench-re... Overall, the results look really good so far:

camel-cdr camel-cdr.bsky.social · Sep 25, 2025
* Most instructions have an inverse throughput of 0.5/1/2/4 for LMUL=1/2/4/8, even vslide1up/down, 64-bit vmulh, viota, vpopc, and integer reductions
* 0.5/0.5/1/2 for vector-scalar/immediate compares and 0.5/1/2/- for narrowing instructions (see the "Microarchitecture speculations" section)

camel-cdr camel-cdr.bsky.social · Sep 25, 2025
*correction: 0.5/0.5/2/4 for vector-scalar/immediate compares (0.5/2/4/8 for vector-vector)
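One way to read the 0.5/1/2/4 numbers, assuming inverse throughput is measured in cycles per instruction and an instruction at a given LMUL processes LMUL vector registers of VLEN bits: the linear scaling means the number of elements processed per cycle is independent of LMUL.

$$
t(\mathrm{LMUL}) = 0.5 \cdot \mathrm{LMUL}\ \text{cycles/instr},
\qquad
\frac{\text{elements}}{\text{cycle}}
= \frac{\mathrm{LMUL} \cdot \mathrm{VLEN}/\mathrm{SEW}}{0.5 \cdot \mathrm{LMUL}}
= 2 \cdot \frac{\mathrm{VLEN}}{\mathrm{SEW}}
$$

In other words, at LMUL=1 two instructions complete per cycle, and larger LMUL only trades instruction count for latency per instruction.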


Reposted by camel-cdr
Claire Xen 🏳️‍⚧️ 🧙🏻‍♀️ 💖💛💙 clairexen.bsky.social · Jul 25, 2025
Replying to Claire Xen 🏳️‍⚧️ 🧙🏻‍♀️ 💖💛💙
So if you are currently involved with ISA-level decisions about inclusion of any pext/pdep-like instructions: Please consider including SAG/inverse-SAG with bit-reversal of the goats. No matter which of the two implementation methods you are using: All you need to do is not mask the goat bits.
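For reference, a bit-by-bit sketch of the two SAG variants might look like this. It assumes the common convention that "sheep" are the bits of x where the mask m is 1, packed in order at the low end (which equals pext(x, m)), and "goats" are the bits where m is 0, packed at the high end. In `sag_rev_goats` the goats fill from bit 31 downward, so their order is reversed. These loops are illustrative models only, not either of the hardware implementation methods the post refers to.

```c
#include <stdint.h>

/* Plain SAG: sheep (m = 1) packed at the low end in original order,
   goats (m = 0) packed at the high end, also in original order. */
static uint32_t sag(uint32_t x, uint32_t m)
{
    int goats = 0;
    for (int i = 0; i < 32; i++)
        goats += !((m >> i) & 1u);
    uint32_t r = 0;
    int lo = 0, hi = 32 - goats; /* goats start right above the sheep */
    for (int i = 0; i < 32; i++) {
        uint32_t bit = (x >> i) & 1u;
        if ((m >> i) & 1u)
            r |= bit << lo++;
        else
            r |= bit << hi++;
    }
    return r;
}

/* SAG with bit-reversed goats: goats are placed from bit 31 downward. */
static uint32_t sag_rev_goats(uint32_t x, uint32_t m)
{
    uint32_t r = 0;
    int lo = 0, hi = 31;
    for (int i = 0; i < 32; i++) {
        uint32_t bit = (x >> i) & 1u;
        if ((m >> i) & 1u)
            r |= bit << lo++;
        else
            r |= bit << hi--;
    }
    return r;
}
```

Note that the reversed variant no longer needs the goat count up front: the goats' positions depend only on how many goats came before them.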


camel-cdr camel-cdr.bsky.social · Jul 11, 2025
TIL about trace caches: www.realworldtech.com/forum/?threa... (thread on Apple's trace cache). Ventana's Veyron V2/V3 also seem to use something like a trace cache.
RWT Forums - Real World Tech


realworldtech.com

camel-cdr camel-cdr.bsky.social · Jul 11, 2025
Their V2 slides say that they have a macro-op cache equivalent in size to a regular 32 KiB icache. It can store variable-length entries of up to 48 macro-ops, which can be fused from non-sequential instruction runs by collapsing taken branches.

camel-cdr camel-cdr.bsky.social · Jul 11, 2025
www.youtube.com/watch?v=OPgj...
Ventana’s Second Gen RISC V Processor for Data Center and Other High Performance | Greg Favor

YouTube video by Ventana Micro

youtube.com


camel-cdr camel-cdr.bsky.social · Jun 28, 2025
The sixth Championship of Branch Prediction (CBP2025) happened a week ago: ericrotenberg.wordpress.ncsu.edu/cbp2025-work...

camel-cdr camel-cdr.bsky.social · Jun 28, 2025
Ohh, the talk recordings are on YouTube: www.youtube.com/watch?v=1lwz...
CBP2025 - Opening Remarks - Rami Sheikh

YouTube video by Rami Sheikh

youtube.com


Reposted by camel-cdr
Claire Xen 🏳️‍⚧️ 🧙🏻‍♀️ 💖💛💙 clairexen.bsky.social · Jun 20, 2025
Replying to camel-cdr
I wrote a reference implementation for a SAG without bit reflection: github.com/clairexen/ed..., and I wrote a parametric SAG core for any bit width: github.com/clairexen/ed...
edu-sag/param.v at main · clairexen/edu-sag

Educational 8-Bit Sheep-And-Goats (SAG) Verilog Reference IP - clairexen/edu-sag

github.com


camel-cdr camel-cdr.bsky.social · Jun 6, 2025
SiFive X280 RVV benchmarks: camel-cdr.github.io/rvv-bench-re... Civil was so nice as to run my RVV benchmark on the SiFive X280 cores on the Tenstorrent Blackhole.
RVV benchmark SiFive X280

camel-cdr.github.io


camel-cdr camel-cdr.bsky.social · Jun 6, 2025
TIL you can't do forward-compatible syscalls with inline assembly, because the kernel can decide to clobber architectural state that was added after you wrote the code. If you use svc with inline assembly, you have to explicitly clobber the SVE registers. Good luck doing this back in 2015 when you wrote

camel-cdr camel-cdr.bsky.social · Jun 6, 2025
I just had this problem on RISC-V: I didn't clobber the vector registers, and some surrounding autovectorized code broke on a newer kernel version.
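On RISC-V Linux, a raw syscall in inline assembly therefore needs the vector registers in its clobber list whenever the compiler may keep live values in them. The sketch below is a hypothetical `raw_write` helper, not code from the post: it issues write(2) via `ecall` and, when the V extension is enabled at compile time, declares v0 to v31 clobbered. On other targets it falls back to libc's `syscall()`, which returns -1 and sets errno instead of returning a negative errno. Whether vl/vtype also need declaring is compiler-specific and not covered here; the AArch64 svc/SVE case from the previous post is analogous but not shown.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write(2) through a raw ecall on RISC-V Linux. */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__riscv) && defined(__linux__)
    register long a0 __asm__("a0") = fd;
    register long a1 __asm__("a1") = (long)buf;
    register long a2 __asm__("a2") = (long)len;
    register long a7 __asm__("a7") = SYS_write;
    __asm__ volatile("ecall"
                     : "+r"(a0)
                     : "r"(a1), "r"(a2), "r"(a7)
                     : "memory"
#if defined(__riscv_vector)
                     /* Linux does not preserve vector registers across
                        syscalls, so tell the compiler they are clobbered. */
                     , "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
                       "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
                       "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
                       "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31"
#endif
                     );
    return a0; /* raw kernel result: bytes written or -errno */
#else
    return syscall(SYS_write, fd, buf, len); /* -1 and errno on failure */
#endif
}
```

The `#if defined(__riscv_vector)` guard reflects that naming v registers in a clobber list is only meaningful when the compiler targets the V extension; without it, the compiler never places values there in the first place.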


camel-cdr camel-cdr.bsky.social · Jun 3, 2025
@clairexen.bsky.social Hi Claire, we are trying to propose some of the dropped bitmanip instructions for RVV: lists.riscv.org/g/sig-vector... Since you were deeply involved in the development of the bitmanip spec, I was wondering if you could answer some questions about your bextdep implementation.
[Proposal] Bit Compress & Bit Decompress Instructions for RVV

For Spark/Flink workloads in data centers, reading large-scale Parquet files is often a performance bottleneck. Therefore, adding support for these instructions can effectively fill this gap, ensuring RISC-V's competitiveness with other ISAs.

lists.riscv.org

camel-cdr camel-cdr.bsky.social · Jun 3, 2025
Sidenote: my pseudocode for the LEB128 decoder using RVV pext/pdep instructions isn't completely correct. I'll revisit it properly, with a Spike/QEMU implementation, once I finish my project.
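The basic reason bit compress helps with LEB128 can be shown with a scalar sketch. This is not the RVV pseudocode referenced above: it decodes a single value of at most 8 bytes by loading the bytes up to the terminator and applying one pext with mask 0x7f7f...7f to squeeze out the continuation bits. `pext64` is a slow portable emulation (with BMI2 on x86, `_pext_u64` would do the same job), and `leb128_decode_u56` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for pext (bit compress): gathers the bits of x selected
   by mask m into the low bits of the result, preserving their order. */
static uint64_t pext64(uint64_t x, uint64_t m)
{
    uint64_t r = 0;
    for (uint64_t out = 1; m != 0; out <<= 1) {
        if (x & m & -m) /* test the lowest remaining mask bit */
            r |= out;
        m &= m - 1;     /* drop that mask bit */
    }
    return r;
}

/* Hypothetical helper: decodes one unsigned LEB128 value of at most 8 bytes
   (56 payload bits). Returns the number of bytes consumed, or 0 if no
   terminating byte (high bit clear) appears within min(len, 8) bytes. */
static size_t leb128_decode_u56(const uint8_t *p, size_t len, uint64_t *out)
{
    size_t n = len < 8 ? len : 8;
    size_t k = 0;
    while (k < n && (p[k] & 0x80)) /* find the terminating byte */
        k++;
    if (k == n)
        return 0;
    uint64_t word = 0;
    for (size_t i = 0; i <= k; i++) /* little-endian load of bytes 0..k */
        word |= (uint64_t)p[i] << (8 * i);
    /* Keep the low 7 bits of every byte; pext packs them contiguously. */
    *out = pext64(word, UINT64_C(0x7f7f7f7f7f7f7f7f));
    return k + 1;
}
```

For example, the bytes E5 8E 26 decode to 624485 and consume 3 bytes.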


camel-cdr camel-cdr.bsky.social · May 26, 2025
oh no > When source and destination registers overlap and have different EEW, the instruction is mask- and tail-agnostic, regardless of the setting of the vta and vma bits in vtype.

camel-cdr camel-cdr.bsky.social · May 26, 2025
Looks like GCC generates wrong code, and Clang is too conservative with overlaps and generates redundant moves: godbolt.org/z/1czr8oGab. I just created a bug report: gcc.gnu.org/bugzilla/sho... I'll have to check all the RVV assembly I've written.

camel-cdr camel-cdr.bsky.social · May 26, 2025
Edit: I thought I had found a dav1d bug (vwadd.wx v0, v0, v8), but I didn't notice the .wx, so it wasn't a bug. I'll have to check the rest of the code later.

camel-cdr camel-cdr.bsky.social · May 26, 2025
Ok, there don't seem to be any bugs related to this in dav1d.
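Checking assembly for the quoted rule boils down to two questions per source operand: do the register groups overlap, and do their EEWs differ? The sketch below models only that sentence; it does not check whether a given overlap is legal in the first place, and `struct vgroup`, `groups_overlap`, and `forced_agnostic` are hypothetical names. Register groups are described by their first register, their size in whole registers (fractional EMUL counted as one), and their EEW in bits.

```c
#include <stdbool.h>

/* Hypothetical description of a vector register group. */
struct vgroup {
    int base;  /* first register number, e.g. 0 for v0 */
    int nregs; /* registers spanned (EMUL, at least 1) */
    int eew;   /* effective element width in bits */
};

/* Two register groups overlap if their register ranges intersect. */
static bool groups_overlap(struct vgroup a, struct vgroup b)
{
    return a.base < b.base + b.nregs && b.base < a.base + a.nregs;
}

/* True when the quoted rule applies: the destination overlaps a source with
   a different EEW, so vta/vma are ignored and the result is agnostic. */
static bool forced_agnostic(struct vgroup vd, struct vgroup vs)
{
    return groups_overlap(vd, vs) && vd.eew != vs.eew;
}
```

For instance, with SEW=16 and LMUL=1, `vwadd.vv v0, v1, v8` has vd = {0, 2, 32} and vs2 = {1, 1, 16}, so the rule applies. For a `.w` form, the wide source operand has the same EEW as the destination, so overlapping it with vd does not trigger the rule.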


camel-cdr camel-cdr.bsky.social · May 13, 2025
"Efficient Implementation of RISC-V Vector Permutation Instructions" -- arxiv.org/abs/2505.07112 "Efficient Architecture for RISC-V Vector Memory Access" -- arxiv.org/abs/2504.08334 I love how these two were released so close to each other.


camel-cdr camel-cdr.bsky.social · May 13, 2025

[image]
camel-cdr camel-cdr.bsky.social · May 13, 2025

[4 images]
