·
AI & ML interests
datasets, research papers, experimentation, vision, classification, text encoders, tokenization, llms, diffusion, distillation, and more.
Recent Activity
posted
an
update
about 1 hour ago
pytorch-parallel-compiler v0.5.0 upgrades:
*Complex benchmarking for wide primitive objects is supported now. This includes multiple presets for quick tests on hardware.
* All supported primitive either have validity checks or will have them.
* 6 new wide layers supported directly, and will be a key part to the autotuner before v1.0
* WideTracedModel is a preliminary auto-builder so the user doesn't need to build them manually by gathering layers.
https://github.com/AbstractEyes/pytorch-parallel-compiler
New Layers for 0.5.0:
WideGRU, WideLSTM, WideGroupNorm, WideMultiheadedAttention, WideInstancenorm1/2d, WideConv3d,
Upcoming for 1.0:
* WideTracedModel fully building any supported layer patterns with multiple autotune potentials for autoselection.
* Module cherry-picking for use-case only; E.G. WideLinear replace only benefits your case 35% while Attention reduces by 10% no attn.
* All (roughly 32 more) commonly used pytorch layer systems supported in one form or another with wide-batched kernels to benefit both eager and compiled, many of which require reworks or completely remaking them.
* Autotuning wide formats based on hardware response to the kernels. Kernel chunking for big slow processes such as LSTM, kernel fusion for small process with excess overhead, expanding kernels with masking to fit specific use-case paradigms with hardwares, and a series of smaller and more important optimizations along the way.
* Full transformer and rope support with wide-batched optimizations throughout the structures to allow more robust autoregression throughput.
* Additional Conv1d, Conv2d, and Conv3d optimizations.
>version 1.0 :
* Entire diffusion structures specifically kernelized for high-efficiency utilization with eager and compilation.
* Video diffusion specific targets meant to heavily reduce computation costs on the gpu and increase computation throughput on the gpu.
View all activity
Organizations
-
-
-
-
-
-
-
-
-
-
-
upvoted
a
paper
3 months ago
view article
Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders