Darek's thoughts on IT solutions

How to remove backups of the same size using a Python 2.7 script

14.10.2019

At work we have two QNAP machines that serve as additional backup servers. They store database files in directories whose names are made up of the date and the hour and minutes of backup creation. The disk of one of these servers was almost full. Both servers are accessible via ssh using the 'admin' account. Neither Node.js nor Perl is installed by default, but Python 2.7 is. So my task was to reduce the used disk space by deleting some of these directories. Each backup directory contains two files, *.log and *.db. I wrote a Python 2.7 script that uses only standard modules (those that ship with Python). In short: run with no command line arguments, the script looks for cases where at least two subsequent backups have .db files of the same size and lets you delete the earlier backup directory. You can also do this for all such cases at once with the "--all" command line argument, or just list the backups and check whether such cases exist with the "--show" switch.
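The core idea can be sketched in a few lines, separately from the full script. This is only an illustration under some assumptions: the function name `find_redundant_backups` and the sample sizes are made up, and it assumes directory names are zero-padded like `2018-10-18_23-00`, so that sorting them as plain strings also sorts them by time.

```python
def find_redundant_backups(backups):
    """Return names of backups whose .db size equals that of the next newer backup.

    `backups` is a list of (directory_name, db_size) pairs in any order.
    Assumes zero-padded names such as "2018-10-18_23-00", which sort
    chronologically when compared as strings.
    """
    ordered = sorted(backups)  # oldest first
    redundant = []
    # Walk over neighbouring pairs: (oldest, second oldest), (second, third), ...
    for (older_name, older_size), (newer_name, newer_size) in zip(ordered, ordered[1:]):
        if older_size == newer_size:
            # Same .db size in two subsequent backups: the older one is a candidate.
            redundant.append(older_name)
    return redundant


# Made-up sample data, not taken from the real servers.
sample = [
    ("2018-10-20_23-00", 1500),
    ("2018-10-18_23-00", 1000),
    ("2018-10-19_23-00", 1000),
    ("2018-10-22_23-00", 1500),
    ("2018-10-21_23-00", 1500),
]
result = find_redundant_backups(sample)
```

With this sample, `result` holds the three older directories of each equal-size pair, and the newest backup of every run of identical sizes is kept.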

A working version (still not fully tested, so use at your own risk and be careful!) was ready after 3 days, but I was doing other things too, so I estimate I spent about 1.5 days on it.

BTW, I hadn’t used Python for many years (8 or so).

#!/usr/local/bin/python
# ver: 0.6

import glob
import os
import sys
# Backup directories are named like 2018-10-18_23-00;
# collect every file inside each of them.
files = glob.glob("*-*-*_*-*/*")

items = {}

for file in files:
    statinfo = os.stat(file)
    #print statinfo.st_size, file
    items[file] = statinfo.st_size

#print files


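def SortByNumPart(key):
    # Sort key that puts the newest backup first, e.g.
    # "2018-10-18_23-00/x.db" -> [-201810182300, "x.db"]
    parts = key.split("/")
    return [-int(parts[0].replace("-", "").replace("_", "")), parts[1]]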


# list of file paths, newest backup first
itemsSorted = sorted(items, key=SortByNumPart)

logItemSorted = filter(lambda item: item.split(".")[-1] == "log", itemsSorted)
dbItemSorted = filter(lambda item: item.split(".")[-1] == "db", itemsSorted)
aDirs = map(lambda item: item.split("/")[0], logItemSorted)

# for sItem in logItemSorted:
#	print sItem, "\t", items[sItem]

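# Each row compares one backup with the next newer one:
# [older dir, older .log size, older .db size,
#  newer dir, newer .log size, newer .db size,
#  .log size difference, .db size difference]
aOutput = []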
for index in range(len(aDirs)):
    sDate = logItemSorted[index].split("/")[0]
    if (len(aDirs) > index + 1):
        sDateNext = logItemSorted[index + 1].split("/")[0]
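        # Compare sizes only when both .log/.db pairs come from the same directories.
        if (sDate == dbItemSorted[index].split("/")[0] and
                sDateNext == dbItemSorted[index + 1].split("/")[0]):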
            aOutput.append([
                sDateNext,
                items[logItemSorted[index + 1]],
                items[dbItemSorted[index + 1]],
                sDate,
                items[logItemSorted[index]],
                items[dbItemSorted[index]],
                items[logItemSorted[index + 1]] - items[logItemSorted[index]],
                items[dbItemSorted[index + 1]] - items[dbItemSorted[index]]
            ])


def findLastZero(mylist, pos):
    ret = -1
    for index in range(len(mylist)):
        if (mylist[index][pos] == 0):
            ret = index
    return ret


if (len(sys.argv) == 1):
    sDirItem = None

    last = findLastZero(aOutput, 7)
    last = last + 1

    for row in aOutput[0:last]:
        print '%-15s %15i %15i %-15s %15i %15i %15i %15i' % (
            row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7])
        if (row[5] - row[2] == 0):
            sDirItem = row[0]

    if (sDirItem):
        print "rm -rf", sDirItem
        answer = raw_input('run [Y|y]es/[N|n]o ? ').lower()

        if (answer == "y"):
            os.system("rm -rf " + sDirItem)
        else:
            print "You didn't choose 'Y|y'"

elif (sys.argv[1] == '--all'):
    aDirItems = []
    for row in aOutput:
        if (row[5] - row[2] == 0):
            print '%-15s %15i %15i %-15s %15i %15i %15i %15i' % (
                row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7])
            aDirItems.append(row[0])

    if (len(aDirItems)):
        print "rm -rf", " ".join(aDirItems)
        answer = raw_input('run [Y|y]es/[N|n]o ? ').lower()

        if (answer == "y"):
            os.system("rm -rf " + " ".join(aDirItems))
        else:
            print "You didn't choose 'Y|y'"

elif (sys.argv[1] == '--show'):
    # list reversed, latest first
    for row in aOutput[::-1]:
        print '%-15s %15i %15i %-15s %15i %15i %15i %15i' % (
            row[0], row[1], row[2], row[3], row[4], row[5], row[6], row[7])
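
The script above deletes directories by passing "rm -rf" to the shell. A possible alternative, shown here only as a sketch, is to delete them from Python with `shutil.rmtree` and to refuse any name that does not look like a backup directory. The function `remove_backup_dirs` and the pattern `BACKUP_DIR_PATTERN` are hypothetical names introduced for this example, and the pattern assumes names of the form `2018-10-18_23-00`. The demonstration runs in a temporary directory, so it touches no real backups.

```python
import os
import re
import shutil
import tempfile

# Hypothetical guard: accept only names like 2018-10-18_23-00 (assumed format).
BACKUP_DIR_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}_\d{2}-\d{2}$")


def remove_backup_dirs(base_dir, dir_names):
    """Delete the named backup directories under base_dir; return those removed."""
    removed = []
    for name in dir_names:
        path = os.path.join(base_dir, name)
        # Skip anything that is not an existing, backup-looking directory.
        if BACKUP_DIR_PATTERN.match(name) and os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(name)
    return removed


# Demonstration in a throwaway temporary directory.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "2018-10-18_23-00"))
os.mkdir(os.path.join(demo_root, "not-a-backup"))
removed = remove_backup_dirs(
    demo_root, ["2018-10-18_23-00", "not-a-backup", "2018-10-19_23-00"])
remaining = sorted(os.listdir(demo_root))
shutil.rmtree(demo_root)  # clean up the demonstration directory
```

Here only the first name is removed: "not-a-backup" fails the pattern check and "2018-10-19_23-00" does not exist. Avoiding the shell also means a directory name can never be interpreted as part of a command.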