Commit 606cbaf2 authored by Maarten de Waard 🤘🏻

Merge branch '18-make-disk-mapping-store-resilient-to-reboots' into 'master'

Resolve "Make disk mapping store resilient to reboots"

Closes #18

See merge request !12
parents 9d965157 ca4fa842
@@ -50,12 +50,37 @@ Because of this, the driver needs to keep some state so that it knows which
disks are already attached, and can connect the ID of the PV as used by
Kubernetes to the Cosmos ID of the underlying disk.
-This state is kept on the node as a set of files and symlinks in
-`/var/lib/ghost`:
-* for every disk, a file is kept in `/var/lib/ghost/disks/$cosmosID` containing
+This state is stored on the node in `/run/ghost`. That is assumed to be a tmpfs
+mount, cleared on every reboot. We pass a special `detach_on_shutdown` flag to
+Cosmos2 when attaching ghost disks, so Cosmos2 will detach these disks when the
+node shuts down, so it starts without any ghost disks attached, matching the
+empty state in `/run/ghost`. Details of the state recorded:
+* for every disk, a file is kept in `/run/ghost/disks/$cosmosID` containing
a log of activity pertaining to that disk;
* for every PV that is attached, a symlink is created in
-`/var/lib/ghost/pvs/$pvID`, that points to the file in `disks` that
+`/run/ghost/pvs/$pvID`, that points to the file in `disks` that
corresponds to the disk backing the PV;
-* for every PV that is attached, a file is kept in `/var/lib/ghost/mounts/$pvID`
+* for every PV that is attached, a file is kept in `/run/ghost/mounts/$pvID`
with as contents the device path.
+## Upgrading
+### Upgrading from 0.4.0 to 0.6.0
+To upgrade a running system with existing disks from 0.4.0 to 0.6.0, some
+manual actions are required:
+1. give all existing ghost disks the `detach_on_shutdown = true` field in the
+database;
+2. make a symlink from `/run/ghost` to `/var/lib/ghost` on every worker;
+3. upgrade the ghost helm release to 0.6.0;
+4. reboot all workers.
+5. If all is well at this point, you can remove the `/var/lib/ghost` directory
+from all workers.
+Explanation: this is necessary because we need to upgrade the ghost helm
+release while sites are up, and the new ghost version will look for the disk
+administration files in `/run`. Also 1. is necessary because these disks were
+created with the older ghost version that didn't set the flag, and we need
+cosmos to detach them because after the reboot ghost has an empty disk
+administration and assumes that there are no disks present from boot, so that
+better be true!
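A short helper can inspect the state layout described in the README above. The sketch below is hypothetical: `listGhostState` is not part of the driver. It assumes exactly the `disks/`, `pvs/` and `mounts/` layout listed above, and it takes the state root as an argument that defaults to `/run/ghost`. On a freshly rebooted node it should print nothing, because `/run/ghost` starts out empty.

```bash
# Hypothetical helper: list every PV recorded under a ghost state root.
# Assumes the layout described in the README: disks/<cosmos ID>,
# pvs/<PV ID> -> ../disks/<cosmos ID>, mounts/<PV ID> (device path).
listGhostState() {
  local root=${1:-/run/ghost}
  local link pvID target cosmosDiskId device
  # No pvs/ directory means no attached ghost disks (e.g. right after boot).
  [[ -d "$root/pvs" ]] || return 0
  for link in "$root"/pvs/*; do
    # Skip the unexpanded glob when pvs/ is empty, and anything not a symlink.
    [[ -L "$link" ]] || continue
    pvID=${link##*/}
    target=$(readlink "$link")
    # Same prefix strip the driver uses to recover the Cosmos disk ID.
    cosmosDiskId=${target#../disks/}
    # Fall back to "?" if the mounts/ entry is missing.
    device=$(cat "$root/mounts/$pvID" 2>/dev/null || echo "?")
    printf '%s -> %s (%s)\n' "$pvID" "$cosmosDiskId" "$device"
  done
}
```

Running `listGhostState` on a worker prints one `<PV ID> -> <cosmos ID> (<device>)` line per attached PV.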
@@ -4,7 +4,7 @@ set -o errexit
# Configuration.
logFile=${logFile:-"/var/log/ghost-storage-driver.txt"}
-mappingStoragePath="/var/lib/ghost"
+mappingStoragePath="/run/ghost"
# Write program result to stdout.
output() {
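With `mappingStoragePath` now set to `/run/ghost`, step 2 of the upgrade procedure keeps the pre-0.6.0 state visible at the new path. Below is a minimal sketch of that step. `linkLegacyState` and its two optional path arguments are hypothetical; they exist so the step can be rehearsed in a scratch directory first. Because `/run` is assumed to be a tmpfs, the symlink itself is gone after the reboot in step 4. The driver then starts from an empty `/run/ghost`, which `storeMapping` recreates with `mkdir -p`.

```bash
# Hypothetical sketch of upgrade step 2: expose the pre-0.6.0 state
# (/var/lib/ghost) at the path the 0.6.0 driver reads (/run/ghost).
# The optional arguments exist only so the step can be tried in a scratch dir.
linkLegacyState() {
  local legacy=${1:-/var/lib/ghost}
  local current=${2:-/run/ghost}
  if [[ -L "$current" ]]; then
    # Already done on this worker: report where it points and do nothing.
    echo "already linked: $current -> $(readlink "$current")"
    return 0
  fi
  if [[ -e "$current" ]]; then
    # A real directory here may already hold 0.6.0 state; never clobber it.
    echo "refusing to replace existing $current" >&2
    return 1
  fi
  ln -s "$legacy" "$current"
}
```

Because the driver's `pvs/` links are relative (`../disks/...`), they keep resolving when the whole tree is reached through the new symlink.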
@@ -80,25 +80,23 @@ ghostAction() {
local apiToken=$2
# Numerical ID of disk image.
local diskImage=$3
-# Numerical ID of VPS.
+# Numerical ID of VPS. Only used for "attach".
local vps=$4
# URL to the Cosmos2 instance to talk to.
local server="$cosmosUrl"
+local url="${server}/api/v2/disks/$diskImage/actions"
+debug "curling cosmos: $url"
case $action in
-attach) ;&
-detach) ;;
+attach)
+local data="{\"type\": \"attach\", \"droplet\": $vps, \"detach_on_shutdown\": true}"
+;;
+detach)
+local data="{\"type\": \"detach\"}"
+;;
*)
output "Unsupported action"
exit 0
esac
-local url="${server}/api/v2/disks/$diskImage/actions"
-debug "curling cosmos: $url"
-if [[ -z "$vps" ]]
-then
-local data="{\"type\": \"$action\"}"
-else
-local data="{\"type\": \"$action\", \"droplet\": $vps}"
-fi
if ! response=$(curl -sS -X POST -H "Authorization: Bearer $apiToken" "$url" -d "$data")
then
exitWithFailure "$response"
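The new `case` statement in `ghostAction` builds a different request body for each action. Only `attach` carries the VPS ID and the `detach_on_shutdown` flag. The sketch below reproduces those two bodies in a standalone function. `buildDiskActionData` is a hypothetical name, and the numeric check on the VPS ID is an extra safeguard that the driver itself does not perform. The driver prints "Unsupported action" and exits 0 for an unknown action; the sketch simply returns 1.

```bash
# Hypothetical helper mirroring the request bodies built by ghostAction.
buildDiskActionData() {
  local action=$1
  local vps=${2:-}
  case $action in
    attach)
      # Extra safeguard (not in the driver): the VPS ID is spliced into the
      # JSON unquoted, so insist that it is a plain number.
      [[ "$vps" =~ ^[0-9]+$ ]] || return 1
      printf '{"type": "attach", "droplet": %s, "detach_on_shutdown": true}\n' "$vps"
      ;;
    detach)
      # Detach needs no VPS ID.
      printf '{"type": "detach"}\n'
      ;;
    *)
      return 1
      ;;
  esac
}
```

The output could be passed as `-d` to the same `curl -sS -X POST -H "Authorization: Bearer $apiToken" "$url"` call that `ghostAction` makes.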
@@ -231,10 +229,10 @@ extractPV() {
# mapping to learn the cosmos ID of the disk we need to detach.
storeMapping() {
local pvID=$1
-local cosmosID=$2
+local cosmosDiskId=$2
local device=$3
local result
-debug "storing mapping from PV ID ${pvID} to cosmos ID ${cosmosID}"
+debug "storing mapping from PV ID ${pvID} to cosmos ID ${cosmosDiskId}"
mkdir -p "$mappingStoragePath"
pushd "$mappingStoragePath" > /dev/null
@@ -244,10 +242,10 @@ storeMapping() {
mkdir -p "disks" "pvs" "mounts"
# Create an empty file at the symlink target if necessary.
-touch "disks/${cosmosID}"
+touch "disks/${cosmosDiskId}"
# Create the mapping symlink, pointing from the PV ID to the cosmos ID.
-if ! result=$(ln -s "../disks/${cosmosID}" "pvs/${pvID}" 2>&1)
+if ! result=$(ln -s "../disks/${cosmosDiskId}" "pvs/${pvID}" 2>&1)
then
exitWithFailure "failed to create mapping symlink: ${result}"
fi
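`storeMapping` writes one entry per attached PV. Below is a minimal sketch of that write side. The hunk above shows only the `disks/` and `pvs/` steps; writing the device path to `mounts/<PV ID>` is assumed from the README description. `writeMappingSketch` is hypothetical, and it takes the state root as its first argument instead of using `pushd`.

```bash
# Hypothetical sketch of storeMapping's write side, rooted at an explicit
# directory instead of pushd "$mappingStoragePath".
writeMappingSketch() {
  local root=$1 pvID=$2 cosmosDiskId=$3 device=$4
  mkdir -p "$root/disks" "$root/pvs" "$root/mounts"
  # Empty file as the symlink target (it also serves as the disk's log).
  touch "$root/disks/$cosmosDiskId"
  # Relative target, as in the driver; fails if the PV is already mapped.
  ln -s "../disks/$cosmosDiskId" "$root/pvs/$pvID" || return 1
  # Assumed from the README: mounts/<PV ID> holds the device path.
  printf '%s\n' "$device" > "$root/mounts/$pvID"
}
```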
@@ -289,8 +287,8 @@ readMapping() {
errorMessage="failed to read link pvs/${pvID}"
exitWithFailure "$errorMessage"
fi
-debug "found mapping with target ${cosmosID}"
-cosmosID=${result#../disks/}
+cosmosDiskId=${result#../disks/}
+debug "found mapping with target ${cosmosDiskId}"
# Read file to determine device path.
if ! [ -e "mounts/${pvID}" ]
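`readMapping` reverses the mapping. It reads the `pvs/<PV ID>` symlink and strips the `../disks/` prefix with `${result#../disks/}`. The diff also moves the `debug` call after that assignment, so the log line shows the freshly resolved ID instead of whatever `cosmosID` held before (possibly nothing). The hypothetical `readMappingSketch` below performs the same lookup and also reads the device path from `mounts/<PV ID>`. Its output format, `<cosmos ID> <device>`, is an assumption for illustration.

```bash
# Hypothetical sketch of readMapping's lookup, rooted at an explicit directory.
readMappingSketch() {
  local root=$1 pvID=$2 result cosmosDiskId device
  # readlink fails (non-zero, no output) if the PV has no mapping.
  result=$(readlink "$root/pvs/$pvID") || return 1
  # Same prefix strip as the driver: "../disks/1234" -> "1234".
  cosmosDiskId=${result#../disks/}
  [[ -e "$root/mounts/$pvID" ]] || return 1
  device=$(<"$root/mounts/$pvID")
  printf '%s %s\n' "$cosmosDiskId" "$device"
}
```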
@@ -422,7 +420,7 @@ unmountAction() {
# Read the api token from the kernel command line.
getKernelParams
-if ! ghostAction "detach" "$apiToken" "$cosmosID"
+if ! ghostAction "detach" "$apiToken" "$cosmosDiskId"
then
exitWithFailure "$errorMessage"
fi