safetensors is a new, simple, fast, and safe file format for storing tensors. The design of the file format and its original implementation are led by Hugging Face, and it is becoming widely adopted in their popular ‘transformers’ framework. The safetensors R package is a pure-R implementation that lets you both read and write safetensors files.
The initial version (0.1.0) of safetensors is now on CRAN.
Motivation
The main motivation for safetensors in the Python community is security. As noted in the official documentation:
The main rationale for this crate is to remove the need to use pickle on PyTorch which is used by default.
Pickle is considered an unsafe format, since loading a Pickle file can trigger the execution of arbitrary code. This has never been a concern for R users of torch, because the Pickle parser included in LibTorch only supports a subset of the Pickle format, which does not include executing code.
However, the file format has additional advantages over other commonly used formats, including:
- Support for lazy loading: You can choose to read a subset of the tensors stored in the file.
- Zero copy: Reading the file does not require more memory than the file itself. (Technically, the current R implementation does make a single copy, but that can be optimized out if we really need it at some point.)
- Simple: Implementing the file format is simple and does not require complex dependencies. This makes it a good format for exchanging tensors between ML frameworks and between different programming languages. For instance, you can write a safetensors file in R and load it in Python, and vice versa.
There are additional advantages compared to other file formats common in this space; a comparison table is available in the safetensors repository.
Format
The safetensors format is described in the figure below. It is basically a file with a header containing some metadata, followed by raw tensor buffers.
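As a rough illustration of this layout, the sketch below writes a tiny file by hand and reads it back using only base R. It assumes the layout documented for safetensors (an 8-byte little-endian header length, a JSON header, then the raw buffers). The tensor names, values, and the hand-written JSON string are made up for the example; real files should be written with the package functions.

```r
# Sketch of the safetensors layout with base R only (illustrative, not the package code).
# Assumed layout: [8-byte little-endian header length][JSON header][raw tensor bytes]

x_vals <- c(1, 2, 3, 4)   # hypothetical 2 x 2 float32 tensor
y_vals <- c(10, 20)       # hypothetical length-2 float32 tensor

# Raw buffers: 4 bytes per float32 element
x_bytes <- writeBin(x_vals, raw(), size = 4, endian = "little")  # 16 bytes
y_bytes <- writeBin(y_vals, raw(), size = 4, endian = "little")  # 8 bytes

# Hand-written JSON header; offsets are relative to the start of the data section
header <- paste0(
  '{"x":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]},',
  '"y":{"dtype":"F32","shape":[2],"data_offsets":[16,24]}}'
)
header_bytes <- charToRaw(header)

path <- tempfile(fileext = ".safetensors")
con <- file(path, "wb")
# The header length is a 64-bit unsigned integer; written here as two 32-bit words
writeBin(c(length(header_bytes), 0L), con, size = 4, endian = "little")
writeBin(header_bytes, con)
writeBin(c(x_bytes, y_bytes), con)
close(con)

# Read the header back without touching the tensor data
con <- file(path, "rb")
# Assumes the header is smaller than 2^31 bytes, so the low word is enough
header_len <- readBin(con, "integer", n = 2, size = 4, endian = "little")[1]
header_read <- rawToChar(readBin(con, "raw", n = header_len))

# Lazy loading: jump straight to the bytes of "y" using its data_offsets
data_start <- 8 + header_len
seek(con, where = data_start + 16)
y_read <- readBin(con, "numeric", n = 2, size = 4, endian = "little")
close(con)

print(header_read)
print(y_read)
```

Because the header records where each tensor starts and ends, a reader can skip the tensors it does not need, which is what makes lazy loading cheap.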
Basic usage
safetensors can be installed from CRAN using:
install.packages("safetensors")
We can then write any named list of torch tensors:
library(torch)
library(safetensors)
tensors <- list(
  x = torch_randn(10, 10),
  y = torch_ones(10, 10)
)
str(tensors)
#> List of 2
#>  $ x:Float [1:10, 1:10]
#>  $ y:Float [1:10, 1:10]
tmp <- tempfile()
safe_save_file(tensors, tmp)
Additional metadata can be passed to the saved file by providing a metadata
parameter containing a named list.
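For example, a call along the following lines should attach a couple of entries to the file. This is only a sketch: the metadata keys and values are made up, and it assumes metadata values are supplied as character strings, which is what the safetensors format expects for free-form metadata.

```r
library(torch)
library(safetensors)

tensors <- list(
  x = torch_randn(10, 10),
  y = torch_ones(10, 10)
)

# Hypothetical metadata entries; names and values are illustrative only
meta <- list(
  author = "Jane Doe",
  description = "example tensors"
)

path <- tempfile(fileext = ".safetensors")
safe_save_file(tensors, path, metadata = meta)
```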
Reading safetensors files is handled by safe_load_file,
which returns the named list of tensors along with a metadata
attribute containing the parsed file header.
tensors <- safe_load_file(tmp)
str(tensors)
#> List of 2
#>  $ x:Float [1:10, 1:10]
#>  $ y:Float [1:10, 1:10]
#>  - attr(*, "metadata")=List of 2
#>   ..$ x:List of 3
#>   .. ..$ shape       : int [1:2] 10 10
#>   .. ..$ dtype       : chr "F32"
#>   .. ..$ data_offsets: int [1:2] 0 400
#>   ..$ y:List of 3
#>   .. ..$ shape       : int [1:2] 10 10
#>   .. ..$ dtype       : chr "F32"
#>   .. ..$ data_offsets: int [1:2] 400 800
#>  - attr(*, "max_offset")= int 929
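The numbers in this output can be checked by hand. The sketch below assumes 4 bytes per F32 element and a layout of an 8-byte length prefix, a JSON header, and then the data. The compact header string is a guess at how the header could be serialized, not something taken from the package, although its length is consistent with the max_offset printed above.

```r
# Worked arithmetic for the offsets printed above (base R only).
bytes_per_f32 <- 4
tensor_bytes <- 10 * 10 * bytes_per_f32         # bytes in one 10 x 10 F32 tensor

x_offsets <- c(0, tensor_bytes)                 # compare with data_offsets of x
y_offsets <- c(tensor_bytes, 2 * tensor_bytes)  # compare with data_offsets of y

# Assumed compact JSON header (hypothetical serialization, keys in printed order)
header <- paste0(
  '{"x":{"shape":[10,10],"dtype":"F32","data_offsets":[0,400]},',
  '"y":{"shape":[10,10],"dtype":"F32","data_offsets":[400,800]}}'
)
header_len <- nchar(header, type = "bytes")

# 8-byte length prefix + header + data section
total <- 8 + header_len + y_offsets[2]
print(c(x_offsets, y_offsets))
print(total)
```

Under these assumptions each tensor occupies 400 bytes, y starts right where x ends, and the total comes to 929 bytes.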
Currently, safetensors only supports writing torch tensors, but we plan to add support for writing plain R arrays and tensorflow tensors in the future.
Future directions
The next version of torch will use safetensors
as its serialization format, meaning that when calling torch_save()
on a model, a list of tensors, or any other type of object supported by torch_save,
you will get a valid safetensors file.
This is an improvement over the previous implementation because:
- It is much faster: more than 10x for medium-sized models, and possibly even more for large files. This also improves the performance of parallel dataloaders by roughly 30%.
- It enhances cross-language and cross-framework compatibility. You can train your model in R and use it in Python (and vice versa), or train your model in tensorflow and run it with torch.
If you want to try it out, you can install the development version of torch with:
remotes::install_github("mlverse/torch")
Photo by Nick Fewings on Unsplash
Reuse
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Figures that have been reused from other sources are not covered by this license and can be recognized by a note in their caption: “Figure from …”.
Citation
For attribution, please cite this work as
Falbel (2023, June 15). Posit AI Blog: safetensors 0.1.0. Retrieved from https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/
BibTeX citation
@misc{safetensors, author = {Falbel, Daniel}, title = {Posit AI Blog: safetensors 0.1.0}, url = {https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/}, year = {2023} }