Google Cloud Storage - Transfer file stalling - google-cloud-storage

Google Cloud Storage - Transfer (both immediate and scheduled) via URL stalls with the status "calculating" on a 100 MB file. The file can be manually uploaded to the Google Cloud Storage bucket. The file is one of five; the other four files transfer successfully. No errors are generated, and the transfer remains in "calculating" indefinitely. Are there any suggestions on how to troubleshoot this problem?

I understand that you are using the Storage Transfer Service to store files in a Google Cloud Storage bucket. As per the GCP documentation [1], it is currently covered by no SLA and some performance fluctuations may occur.
You may want to try re-uploading your file later, or check the source you are transferring from (Amazon S3, an HTTP/HTTPS location, or another bucket).
Clicking on the affected transfer also displays information about the status of the operation.
As the above comment states, this is a specific issue and you may want to submit it to the GCP support team.
[1] https://cloud.google.com/storage-transfer/docs/overview#service_level_agreement
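If you prefer to inspect the transfer programmatically rather than in the console, a rough sketch using the Storage Transfer API's transferOperations.list method follows (assuming the google-api-python-client library and Application Default Credentials; the project ID and job name below are placeholders):

    import json
    from googleapiclient.discovery import build

    # Build a client for the Storage Transfer API (uses Application Default Credentials).
    storagetransfer = build("storagetransfer", "v1")

    # Filter the operations down to the stalled job; both values are placeholders.
    op_filter = json.dumps({
        "project_id": "my-project-id",
        "job_names": ["transferJobs/1234567890"],
    })

    response = storagetransfer.transferOperations().list(
        name="transferOperations", filter=op_filter
    ).execute()

    # Print the status and counters of each operation to see where it is stuck.
    for op in response.get("operations", []):
        meta = op.get("metadata", {})
        print(meta.get("name"), meta.get("status"), meta.get("counters"))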

Related

GCS and Blobstore virus scans and limiting file types

We currently use the Blobstore to handle user uploads (and will likely shift to GCS). Our solution allows users to upload files, but I've recently found that users could potentially upload a virus (knowingly or unknowingly). To mitigate this risk I'm considering limiting file types to images and/or PDFs (this would be checked server-side). Would this prevent a virus from being uploaded, or should I also perform a virus scan on the files once they're uploaded?
If running a virus scan, is there a simple solution for doing this with GAE, or do I need a separate Compute Engine instance running its own virus scan?
Thanks
Rob
Any time you delegate authority to upload an object to an untrusted client, there is risk that the client or malicious code posing as the client can upload malicious content. As far as I am aware, neither Google App Engine's Blobstore service nor Google Cloud Storage provide virus scanning as a service, so you'd have to bring your own. Limiting file types doesn't actually inhibit bad content being uploaded, as some browsers will ignore the stated file type after sniffing file content and render or execute the malicious object.
If you want to do this yourself for a Google Cloud Storage upload, the best practice would be to restrict the upload to have a private ACL, perform whatever sanitization you want, and when determined to be valid, change the ACL to allow broader permissions.
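A minimal sketch of that upload-private-then-widen flow with the google-cloud-storage Python client (the bucket name, object name, and the scan_is_clean check below are placeholders for your own setup) could look like this:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-uploads-bucket")        # placeholder bucket name
    blob = bucket.blob("user-uploads/file-123.pdf")    # placeholder object name

    # Keep the object private until it has been checked.
    blob.upload_from_filename("/tmp/file-123.pdf", predefined_acl="private")

    def scan_is_clean(gcs_blob):
        # Placeholder: call your own virus-scanning tool or service here.
        return True

    # Only widen the ACL once the content passes the scan.
    if scan_is_clean(blob):
        blob.make_public()
    else:
        blob.delete()  # or move it to a quarantine location instead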
/via Vinny P:
There are online virus-scanning tools you can use programmatically, or you can run an anti-virus engine on Compute Engine or in an App Engine Flexible Environment. Alternatively, if these are supposed to be user-owned files under 25 MB, you could upload the files to Google Drive which will provide virus scanning, and retrieve the files via the Drive API.

Google Cloud Platform - Data Distribution

I am trying to figure out a proper solution for the following:
We have a client from whom we want to receive data, for instance a binary of roughly 200 MB that is updated daily. We want them to deposit the data file(s) onto a local server near them (in Europe).
We then want to do one of the following:
We retrieve the data from a local server where we are (China/HK), or
We log into their European server where they have deposited the files and pull the files directly ourselves.
QUESTIONS:
Can Google's cloud platform serve as a secure, easy way to provide a cloud drive on which to store and from which to pull the data files?
Does Google's cloud platform distribute data such that files pushed onto a server in Europe will be mirrored on a server in East Asia? (That is, where and how would this distribution model work with regard to my example?)
For storing binary data, Google Cloud Storage is a fine solution. To answer your questions:
Secure: yes. Easy: yes, in that you don't need to write different code depending on your location, but there is a caveat on performance.
Google Cloud Storage replicates files for durability and availability, but it doesn't mirror files across all bucket locations. So for the best performance, you should store the data in a bucket located where you will access it the most frequently. For example, if you create the bucket and choose its location to be Europe, transfers to your European server will be fast but transfers to your HK server will be slow. See the Google Cloud Storage bucket locations documentation for details.
If you need frequent access from both locations, you could create one bucket in each location and keep them in sync with a tool like gsutil rsync.
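As a rough illustration of the two-bucket setup using the google-cloud-storage Python client (bucket and object names are placeholders; the commented-out create_bucket calls show how the locations would be chosen):

    from google.cloud import storage

    client = storage.Client()

    # One bucket close to each access location; names are placeholders.
    eu_bucket = client.bucket("my-data-eu")
    asia_bucket = client.bucket("my-data-asia")
    # client.create_bucket(eu_bucket, location="EUROPE-WEST1")
    # client.create_bucket(asia_bucket, location="ASIA-EAST2")

    # Copy the daily drop from the European bucket to the Asian one.
    source_blob = eu_bucket.blob("daily/drop.bin")
    eu_bucket.copy_blob(source_blob, asia_bucket, "daily/drop.bin")

For keeping whole prefixes in sync rather than copying individual objects, gsutil rsync as mentioned above is the simpler tool.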

Google cloud storage bucket is getting unmounted

I am running an application on Google Compute Engine that reads data from Google Cloud Storage and writes data to a persistent disk. The bucket is mounted using gcsfuse.
But partway through, the bucket gets unmounted, my application goes to sleep, and it stalls.
When I try to see the contents of the mounted directory I get the following error:
cannot access /home/santhosh/MountPoint/: Transport endpoint is not connected
Is there any time limit on how long the mount stays alive? How can we make sure the bucket stays mounted all the time?
Can someone please help me resolve this? I want the program to run without any breaks in the middle.
I experience the same problems: random I/O errors and unmounting. Do not use gcsfuse in production; from the docs: "Please treat gcsfuse as beta-quality software." We use it for maintenance only.
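One workaround is to skip the mount entirely and read the objects through the Cloud Storage client library, so a dropped mount cannot stall the application. A rough Python sketch, with placeholder bucket and path names:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-input-bucket")   # placeholder bucket name

    # Download each input object to the persistent disk and process it locally,
    # instead of reading it through a gcsfuse mount point.
    for blob in client.list_blobs(bucket, prefix="input/"):
        local_path = "/mnt/disks/data/" + blob.name.split("/")[-1]
        blob.download_to_filename(local_path)
        # ... process local_path ...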

Anyone using “Google Cloud Storage Fuse” in production?

How have its performance and stability been for you?
I'm looking to maybe implement this on a cluster to avoid a network or clustered file system, and it should fit well with auto-scaling the cluster of servers. But what are the cons involved in doing this, such as price?
Google Cloud Storage Fuse access is ultimately Google Cloud Storage access. All data transfer and operations performed by Google Cloud Storage Fuse map to Google Cloud Storage transfers and operations, and are charged accordingly. See the pricing section for more details.
There are several caveats you should consider when using Google Cloud Storage Fuse for your application:
Individual I/O streams run approximately as fast as gsutil.
Small random reads are slow due to latency to first byte (don't run a database over Google Cloud Storage Fuse!)
Random writes are done by reading in the whole blob, editing it locally, and writing the whole modified blob back to Google Cloud Storage. Small writes to large files work as expected, but are slow and expensive (see the sketch after this list).
Note: One not-so-obvious place to consider this is when benchmarking Google Cloud Storage Fuse. Many benchmarking tools use a mix of random and sequential writes as default settings. Make sure to tune any benchmarking tools to sequential I/O when running against a bucket mounted by Google Cloud Storage Fuse.
There is no concurrency control for multiple writers to a file. When multiple writers try to replace a file the last write wins and all previous writes are lost - there is no merging, version control, or user notification of the subsequent overwrite.
Hard links do not work.
Some semantics are not exactly what they would be in a traditional file system. The list of exceptions is here. For example, metadata like last access time are not supported, and some metadata operations like directory rename are not atomic.
Authorization for files is governed by Google Cloud Storage permissions, not Linux file permissions, which are not applicable.
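To make the random-write caveat above concrete, a small in-place write through Google Cloud Storage Fuse roughly amounts to the following. gcsfuse does this internally; the Python client calls below, with placeholder names, only illustrate the cost:

    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket("my-bucket").blob("large-file.bin")  # placeholder names

    data = bytearray(blob.download_as_bytes())   # read the whole object
    data[42] = 0x00                              # tiny local edit
    blob.upload_from_string(bytes(data))         # write the whole object back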

Configure GCS bucket to allow public write but not overwrite

On Google Cloud Storage, I want PUBLIC (allUsers) to be able to upload new files and to download existing files, but I don't want PUBLIC to be able to overwrite an existing file.
Background: Upload and download URLs are typically determined by my own app. So under normal conditions there is no problem because the app guarantees that URLs are always unique when writing. But a malicious user could hack my app and would then potentially be able to upload files (bad) to my cloud storage and overwrite existing files (very bad).
I know I could solve this problem by proxying through App Engine or by using signed URLs, which I am trying to avoid due to timing constraints. Timely processing is essential as my app processes files (almost) in realtime and an extra delay of just 1,000 msec for processing two consecutive requests would be too long.
Would it be possible to configure Cloud Storage in such a way that an error is returned when an upload hits an already existing file, for example:
Bucket: PUBLIC has WRITE access
Individual file: PUBLIC has READ access
Would that work? What happens in GCS if bucket and file ACLs are contradictory? In the above example the bucket would allow write access, but if the upload hits an already existing file with read-only access, would such a request be honored by GCS, or would GCS consider the existing file as nonexistent at that point and replace it with the new content?
Any other approach that might work would be very appreciated.
You want to set these IAM roles on the bucket:
roles/storage.objectCreator
roles/storage.objectViewer
https://cloud.google.com/storage/docs/access-control/iam-roles states:
"objectCreator allows users to create objects. Does not give permission to view, delete, or overwrite objects."
