Prevent "FailedMount" on pod creation
Kubernetes does not automatically retry mounts when they fail. This means that if Cosmos returns an error when we try to mount a disk (for example, because we try to mount too many disks simultaneously), the mount will not be retried later.
The assumption above has since been debunked. Something still goes wrong, however; this is the error that usually occurs:
4m24s Normal Pulling pod/maarten-wordpress-1603377900-nqlhl Pulling image "curlimages/curl:latest"
4m22s Normal Started pod/maarten-wordpress-1603377900-nqlhl Started container wordpress-cron-caller
4m22s Normal Pulled pod/maarten-wordpress-1603377900-nqlhl Successfully pulled image "curlimages/curl:latest"
4m22s Normal Created pod/maarten-wordpress-1603377900-nqlhl Created container wordpress-cron-caller
2m10s Normal Scheduled pod/maarten-wordpress-0 Successfully assigned dev-namespace/maarten-wordpress-0 to skipr-worker-2
2m2s Warning FailedMount pod/maarten-wordpress-0 MountVolume.SetUp failed for volume "pvc-2628e1df-146c-11eb-81f2-00164e1a1700" : mount command failed, status: Failure, reason: The Cosmos2 api call returned an error: "Another process is already performing an operation on this disk or VPS, please try again.; "
2m1s Warning BackoffLimitExceeded job/maarten-wordpress-1603377900 Job has reached the specified backoff limit
82s Normal Pulled pod/maarten-wordpress-0 Successfully pulled image "open.greenhost.net:4567/openappstack/wordpress-helm/wordpress-cli-ansible:master"
82s Normal Pulling pod/maarten-wordpress-0 Pulling image "open.greenhost.net:4567/openappstack/wordpress-helm/wordpress-cli-ansible:master"
82s Normal Created pod/maarten-wordpress-0 Created container init-wordpress
81s Normal Started pod/maarten-wordpress-0 Started container init-wordpress
65s Warning FailedMount pod/maarten-wordpress-0 Unable to mount volumes for pod "maarten-wordpress-0_dev-namespace(10fa72ee-1475-11eb-81f2-00164e1a1700)": timeout expired waiting for volumes to attach or mount for pod "dev-namespace"/"maarten-wordpress-0". list of unmounted volumes=[wordpress-wp-storage wordpress-wp-content wordpress-wp-uploads ansible-secrets ssh-private-key ssh-known-hosts ansible-vars htuploads default-token-fxd5b]. list of unattached volumes=[wordpress-wp-storage wordpress-wp-content wordpress-wp-uploads ansible-secrets ssh-private-key ssh-known-hosts ansible-vars htuploads default-token-fxd5b]
I think the driver should retry mounting if Cosmos returns "Another process is already performing an operation on this disk or VPS, please try again.; ", but not necessarily on other errors.
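To make the proposal concrete, here is a minimal sketch of such a selective retry. It is not the driver's actual code: `CosmosError`, `mount_with_retry`, the number of attempts and the backoff delay are all hypothetical, and the only thing taken from the events above is the error text that is matched.

```python
import time

# Error text from the FailedMount event; matching on it is an assumption
# about how the driver could tell a "busy" error from other failures.
BUSY_MARKER = "Another process is already performing an operation on this disk or VPS"


class CosmosError(Exception):
    """Hypothetical exception carrying the error message returned by Cosmos."""


def mount_with_retry(mount, attempts=5, delay=2.0, sleep=time.sleep):
    """Call mount(), retrying only while Cosmos reports the disk or VPS as busy.

    Any other error, or a busy error on the last attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return mount()
        except CosmosError as err:
            retryable = BUSY_MARKER in str(err)
            if not retryable or attempt == attempts:
                raise
            # Linear backoff: wait a little longer after every busy response.
            sleep(delay * attempt)
```

The driver would wrap its existing mount call, for example `mount_with_retry(lambda: do_mount(volume))`, where `do_mount` stands in for whatever currently calls the Cosmos API. Errors other than the "busy" one still fail immediately, so real problems are not hidden behind retries.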
Edited by Maarten de Waard