Why does B2 require a SHA-1 hash to be provided with an upload?

When you upload a file to B2, you must provide a SHA-1 hash of the contents of the file in the HTTP request header. The SHA-1 is required to ensure that the file content uploaded from the client matches the file content persisted on the B2 cloud storage.
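As a minimal sketch of the client side, the SHA-1 can be computed with Python's standard `hashlib` and placed in the request headers. The header name `X-Bz-Content-Sha1` is taken from the B2 upload API rather than from this article, and the helper names are hypothetical, so check them against the current API documentation before relying on them.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 lowercase hex characters."""
    return hashlib.sha1(data).hexdigest()


def build_upload_headers(data: bytes) -> dict:
    """Build the hash-related headers for an upload request (illustrative only)."""
    return {
        # Assumed header name, from the B2 upload API; not stated in this article.
        "X-Bz-Content-Sha1": sha1_hex(data),
        "Content-Length": str(len(data)),
    }


payload = b"example file contents"
headers = build_upload_headers(payload)
print(headers)
```

A real upload also needs an authorization token and an upload URL, which are outside the scope of this sketch.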

In addition, the SHA-1 is saved for future use. When the file is requested for download, the stored SHA-1 is checked against the file that's been reassembled from the Backblaze Vault. If they don't match, the file is recreated.
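The same kind of check can be run on the client after a download. The sketch below is only an illustration of comparing a stored hash against freshly hashed content. It does not describe how Backblaze implements the check internally, and `matches_stored_sha1` is a hypothetical helper.

```python
import hashlib


def matches_stored_sha1(content: bytes, stored_sha1_hex: str) -> bool:
    """Return True if content hashes to the SHA-1 recorded at upload time."""
    actual = hashlib.sha1(content).hexdigest()
    return actual == stored_sha1_hex.lower()


original = b"contents as uploaded"
stored = hashlib.sha1(original).hexdigest()  # what was saved at upload time

# Simulate a single flipped bit in the first byte of the downloaded data.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

print(matches_stored_sha1(original, stored))   # intact copy
print(matches_stored_sha1(corrupted, stored))  # corrupted copy
```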

Can the SHA-1 be optional?

No. Backblaze will always require the SHA-1 to absolutely guarantee that the file you are uploading matches what gets saved in B2. In some use cases (video, audio, photographs), a flipped bit isn't fatal. However, imagine you are streaming an encrypted file from one provider to another. If one bit flips in this file, its data is meaningless.

But in practice, do bits flip randomly in TCP/IP transmissions? Yes. We have over 200PB of data under management at Backblaze, and we've seen it a lot during transmission. The reasons are summarized very nicely in this Stack Overflow article. (TL;DR: TCP/IP packets regularly fail checksum tests, and the 16-bit TCP checksum is not sufficient to find every corruption.)
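To illustrate why a 16-bit checksum can miss corruption, the sketch below implements a ones' complement sum in the style of the Internet checksum (RFC 1071) and compares it with SHA-1. It is a simplified model: it checksums a bare payload, not a real TCP segment with its pseudo-header, and reordering two 16-bit words is only one of the error patterns that such a sum cannot detect.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data, in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = b"\x12\x34\xab\xcd"
swapped = b"\xab\xcd\x12\x34"  # same 16-bit words, different order

# Addition is commutative, so the 16-bit checksum cannot see the reordering.
print(internet_checksum(original) == internet_checksum(swapped))

# SHA-1 depends on the position of every byte, so the digests differ.
print(hashlib.sha1(original).hexdigest() == hashlib.sha1(swapped).hexdigest())
```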

OK, but can you allow the SHA-1 to be added at the end of the request rather than the beginning?

Yes, we are considering this for a future release. We understand that if you are streaming data into B2, you may not want to buffer the entire stream to compute the SHA-1, and would rather compute it as the data goes by and submit it at the end.
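Computing the hash "as it goes by" could look roughly like the following sketch, which wraps a readable stream and updates a SHA-1 object on every read. The `Sha1Reader` class is hypothetical, and the sketch covers only the client-side computation. How the final hash would be submitted to B2 is not specified here.

```python
import hashlib
import io


class Sha1Reader:
    """Wrap a binary stream and hash every chunk that is read through it."""

    def __init__(self, source):
        self._source = source
        self._sha1 = hashlib.sha1()

    def read(self, size: int = -1) -> bytes:
        chunk = self._source.read(size)
        self._sha1.update(chunk)  # incremental update, no full buffering
        return chunk

    def hexdigest(self) -> str:
        return self._sha1.hexdigest()


payload = b"streamed data " * 100_000
reader = Sha1Reader(io.BytesIO(payload))

bytes_sent = 0
while chunk := reader.read(64 * 1024):
    bytes_sent += len(chunk)  # a real client would send the chunk here

trailing_sha1 = reader.hexdigest()  # available only after the last chunk
print(bytes_sent, trailing_sha1)
```

Only one chunk is held in memory at a time, which is the point of sending the hash after the data rather than before it.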
