Create CSV Reports of Your Commits from Git Repositories

A few months ago, I needed to go through several Git repositories and collect the work I had done on each day. The plan was to gather all the data and split it into different CSV files.
Since I wasn’t able to find a ready-made script for this task, I guess it is a good candidate for a quick blog post :-).

The first part is a file folders.txt with a list of all Git repositories that we want to analyse. (All folders are subfolders of a root directory /Users/user/GIT. This root folder can be changed later on via GIT_ROOT in the script.)

cat folders.txt

tools
utils
customer1/project2
customer2/project1
customer2/project2
customer3/project1
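
If you don't want to write folders.txt by hand, the list can also be generated. The following is a minimal sketch and not part of the original script: list_repos is a helper name I made up, and it assumes that every repository has a regular .git directory (worktrees and submodules, which use a .git file, are not found). The demo runs against a throwaway directory so that nothing real is touched.

```shell
#!/bin/bash

# Hypothetical helper: print every Git repository below a root directory
# as a path relative to that root (the format folders.txt uses).
list_repos() {
    local root="$1"
    # A directory counts as a repository if it contains a .git directory.
    # -prune keeps find from descending into the .git directory itself.
    (cd "$root" && find . -type d -name .git -prune) \
        | sed -e 's|^\./||' -e 's|/\.git$||' \
        | sort
}

# Demo on a temporary directory tree with two fake repositories.
DEMO_ROOT=$(mktemp -d)
mkdir -p "$DEMO_ROOT/tools/.git" "$DEMO_ROOT/customer1/project2/.git" "$DEMO_ROOT/notes"
list_repos "$DEMO_ROOT"
```

Pointed at the real root, something like list_repos "$HOME/GIT" > folders.txt should give you a list in the format shown above.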

The script does several things:

  1. It goes through every repository and collects Project, Date, Commit, Name, Email and Comment for each commit.
  2. Before that data ends up in a CSV file, it filters out characters in the commit messages that might break the CSV later on.
  3. The last step is to split the complete log file into one file per month.
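
The filtering in step 2 is easier to follow on a single line. Here is a small sketch of the idea, assuming every field is first wrapped in __ as a placeholder: quotes inside the data are doubled (the usual CSV escaping), and only then are the placeholders turned into real quotes. The function name to_csv_line and the sample commit are made up for illustration. One caveat I'm aware of: a commit message that itself contains two or more underscores in a row (like __init__) would be turned into quotes as well.

```shell
#!/bin/bash

# Hypothetical helper: convert placeholder-wrapped fields into quoted CSV fields.
to_csv_line() {
    # 1st expression: double every quote that is part of the data
    # 2nd expression: replace each run of two or more underscores with a quote
    # -E is understood by both GNU sed and BSD (macOS) sed
    sed -E -e 's/"/""/g' -e 's/__+/"/g'
}

# Made-up sample line in the same field order as the CSV header
printf '%s\n' '__tools__,__2022-03-01__,__abc1234__,__Jane Doe__,__jane@example.com__,__Fix "quoted", comma__' \
    | to_csv_line
# prints: "tools","2022-03-01","abc1234","Jane Doe","jane@example.com","Fix ""quoted"", comma"
```

The order matters: if the placeholders were replaced first, the second step could no longer tell the field delimiters apart from quotes in the commit message.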

At the moment, the script only runs for one specific year, but that can be changed by adding another loop that runs it for several years.

The source of the script is:

#!/bin/bash

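# YEAR=$(date +"%Y")   # use this line instead to report on the current year
HEADER=Project,Date,Commit,Name,Email,Comment
YEAR=2022
# pwd in lower case: an upper-case PWD command does not exist on Linux
ROOT=$(pwd)
GIT_ROOT=$HOME/GIT
PROJECTS=$(cat folders.txt)
TMP_DIR=/tmp/csv
CREATOR="Philipp Haussleiter"
# Create the folders first, then start with an empty temp folder and log file
mkdir -p csv/$YEAR $TMP_DIR
rm -Rf $TMP_DIR/*
: > $TMP_DIR/all.csv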

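for PROJECT in ${PROJECTS}; do
    echo "creating log of ${PROJECT}"
    DIR=${GIT_ROOT}/${PROJECT}
    BASENAME=$(basename "$DIR")
    # Skip missing folders instead of logging whatever directory we are in
    cd "${DIR}" || continue
    # Wrap every field in __ as a placeholder for the CSV quotes
    # (%cs, the short committer date, requires a reasonably recent Git)
    git log --pretty=format:__${BASENAME}__,__%cs__,__%h__,__%an__,__%ae__,__%s__ > /tmp/csv/${BASENAME}.a.log
    # Escape quotes in the commit data by doubling each one
    # (-E works with both GNU and BSD sed)
    sed -E 's/"/""/g' /tmp/csv/${BASENAME}.a.log > /tmp/csv/${BASENAME}.b.log
    # Turn the __ placeholders into real quotes
    sed -E 's/__+/"/g' /tmp/csv/${BASENAME}.b.log > /tmp/csv/${BASENAME}.log
    # git log does not end its output with a newline
    echo "" >> /tmp/csv/${BASENAME}.log
    cat /tmp/csv/${BASENAME}.log >> /tmp/csv/all.csv
    rm /tmp/csv/${BASENAME}.a.* /tmp/csv/${BASENAME}.b.*
    cd "${ROOT}"
done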

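for MONTH in $(seq -f "%02g" 1 12); do
    FILE=csv/$YEAR/${YEAR}-${MONTH}.csv
    # Match only the quoted date field (the second column), so that a
    # commit message mentioning e.g. 2022-01 does not end up in the wrong month
    FILTER="^\"[^\"]*\",\"${YEAR}-${MONTH}-"
    echo $HEADER > $FILE
    grep "$CREATOR" /tmp/csv/all.csv | grep "$FILTER" >> $FILE
    echo $FILE
done

# Written once, after the loop: all commits of the creator for the whole year
echo $HEADER > csv/$YEAR/all.csv
grep "$CREATOR" /tmp/csv/all.csv | grep "^\"[^\"]*\",\"${YEAR}-" >> csv/$YEAR/all.csv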
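
The single-year limitation mentioned earlier could be lifted with an outer loop over the years. The following is only a sketch of that idea under a few assumptions: split_years is a name I made up, the combined log is passed in as a file, and the filter on the creator is left out to keep it short. The demo data is invented.

```shell
#!/bin/bash

HEADER=Project,Date,Commit,Name,Email,Comment

# Hypothetical helper: split a combined CSV log into one file per month,
# for every year given on the command line.
# Usage: split_years <combined.csv> <output dir> <year>...
split_years() {
    local all_csv="$1" out_dir="$2"
    shift 2
    local year month file
    for year in "$@"; do
        mkdir -p "$out_dir/$year"
        for month in $(seq -f "%02g" 1 12); do
            file="$out_dir/$year/$year-$month.csv"
            echo "$HEADER" > "$file"
            # Match the opening quote of a date field, e.g. "2022-03-
            # grep exits with 1 when a month has no commits; that is not an error here
            grep "\"$year-$month-" "$all_csv" >> "$file" || true
        done
    done
}

# Demo with two invented commits from different years
DEMO=$(mktemp -d)
printf '%s\n' \
    '"tools","2021-12-30","aaa1111","Jane Doe","jane@example.com","Old work"' \
    '"tools","2022-03-17","bbb2222","Jane Doe","jane@example.com","New work"' \
    > "$DEMO/all.csv"
split_years "$DEMO/all.csv" "$DEMO/csv" 2021 2022
ls "$DEMO/csv"
```

Since the log of each repository contains the commits of all years anyway, it is enough to collect it once and only repeat the split per year.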

After running the script for the years 2021 and 2022, you get a folder structure like this:

csv
├── 2021
│   ├── 2021-01.csv
│   ├── 2021-02.csv
│   ├── 2021-03.csv
│   ├── 2021-04.csv
│   ├── 2021-05.csv
│   ├── 2021-06.csv
│   ├── 2021-07.csv
│   ├── 2021-08.csv
│   ├── 2021-09.csv
│   ├── 2021-10.csv
│   ├── 2021-11.csv
│   └── 2021-12.csv
└── 2022
    ├── 2022-01.csv
    ├── 2022-02.csv
    ├── 2022-03.csv
    ├── 2022-04.csv
    ├── 2022-05.csv
    ├── 2022-06.csv
    ├── 2022-07.csv
    ├── 2022-08.csv
    ├── 2022-09.csv
    ├── 2022-10.csv
    ├── 2022-11.csv
    └── 2022-12.csv
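
As a quick sanity check of the result, the number of commits per month can be counted from the generated files. This is again just a sketch: count_commits is a made-up helper, and it assumes one commit per line plus one header line per file, which holds for the format used here because only the subject line of each commit is logged. The demo files are invented.

```shell
#!/bin/bash

# Hypothetical helper: print "<month> <number of commits>" for every
# monthly CSV file (named YYYY-MM.csv) in the given folder.
count_commits() {
    local dir="$1" file
    for file in "$dir"/????-??.csv; do
        # Skip the unexpanded pattern if the folder has no monthly files
        [ -e "$file" ] || continue
        # Subtract 1 for the header line
        printf '%s %d\n' "$(basename "$file" .csv)" "$(( $(wc -l < "$file") - 1 ))"
    done
}

# Demo with two invented monthly files
DEMO_DIR=$(mktemp -d)
printf '%s\n' 'Project,Date,Commit,Name,Email,Comment' \
    '"tools","2022-01-10","aaa1111","Jane Doe","jane@example.com","First"' \
    '"tools","2022-01-11","bbb2222","Jane Doe","jane@example.com","Second"' \
    > "$DEMO_DIR/2022-01.csv"
printf '%s\n' 'Project,Date,Commit,Name,Email,Comment' > "$DEMO_DIR/2022-02.csv"
count_commits "$DEMO_DIR"
```

For the real data, the call would be something like count_commits csv/2022.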