Generally when programming, you can ask yourself; is this the most convenient way of doing this? Sometimes you don’t know the answer and often you won’t know it until you find the more suitable coding trick or language. I will here present my insights from learning how to code in R and in which way it simplified my coding.

The purpose of the program

I will here continue on coding for nuclear cross section analysis. Generally when you work in the area of experimental and theoretical data you want to know, in quantifiable terms, what kind of errors there are between the experimental EXFOR data and your own theoretical calculations. My goal with the program in question was to quantify the correctness of my theoretical calculations.

The statistical method I chose for this purpose was Chi-squared per degrees of freedom. Why? In part due to the suitable statistics in the method and in part due to the wide application of the method in the area of nuclear physics. The experimental data is discrete, and this also applies to the data from the nuclear reaction codes EMPIRE and TALYS. The only key requirement in comparing two data sets is that the theoretically calculated cross sections should have an energy set with the relevant experimental energy as a subset.

The codes chosen

So how do you program this? There are several solutions for this problem, but the solutions I will present here is a program in C, and a script in Bash and R. The main difference in the two methods are that C lacks all inbuilt statistical functions, while R is designed for the purpose of statistical computing.

The code: C

/* Chi-squared calculation program
Implementation of the Chi-Square Distribution & Incomplete Gamma Function in C
Written and modified by Alex Björkholm, 2015*/

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

#include <stdbool.h>
#include <unistd.h>
#include <dirent.h>
#include <sys/stat.h>

static long double log_igf (long double K, long double X);
static long double KM (long double K, long double X);
long double log_gamma (long double K);
static double chisq_prob (int DOF, double chisq);
static char *findpath(char *cwddir,char *find);
bool match(char *first, char * second);


int main()
{
printf("\n\t################### CHI-SQUARED, DATA #####################");
printf("\n\tProgram for calculation of chi-squared of theoretical data");
printf("\n\tfiles in comparison with (multiple) experimental data files\n");


/* Defining working directory */
char cwddir[1024];
if (getcwd(cwddir, sizeof(cwddir)) != NULL) {
printf("\tCurrent working dir: %s\n", cwddir);
}

/* Parameter definition */
int i,j,f;
int n,m;
char end[1024];
char prminp[20][20];
double prmbase[2][800];      //infoheader, energyrows
double prmref[20][10][800];  //file, infoheader, energyrows
double prmcomp[20][800];     //infoheader, energyrows

char find[20];
char *file_path;
char filebase_path[20][1024];
char fileref_path[20][1024];
int fileref_no;




/* Reading file names from terminal */
file_path = NULL;
find[0]='\0';
do {
printf("\tEnter the theoretical data search string with or without search keys '*'/'?': ");
scanf("%s",find);
file_path = findpath(cwddir,find);

}while ( file_path == NULL );

strcpy(filebase_path[0], file_path);
printf("\tFile path set: %s\n", filebase_path[0]);


printf("\n\tEnter the number of experimental files to reference your theoretical data to: ");
scanf("%i",&fileref_no);

i = 0;
while ( i < fileref_no ) {
file_path = NULL;
find[0]='\0';
do {
printf("\n\tEnter the experimental data search string with or without search keys '*'/'?': ");
scanf("%s",find);
file_path = findpath(cwddir,find);

}while ( file_path == NULL );

strcpy(fileref_path[i],file_path);
i++;
}


printf("\n\tFile base path:\t\t %s", filebase_path[0]);
for ( j = 0; j < fileref_no ; j++ ) {
printf("\n\tFile reference path 0%i:\t %s",j,fileref_path[j]);
}



/* Getting theoretical data from file into array */
/* Clearing arrays */
for ( i = 0; i < 20; i++) {
prminp[i][0]='\0';
}

for ( i = 0; i < 2; i++) {
for ( j = 0; j < 800; j++) {
prmbase[i][j]=0;
}
}

/* Opening filebase_name */
FILE *fpbase;
fpbase = fopen(filebase_path[0],"r"); // read mode
if( fpbase == NULL ) {
perror("\n\tError while opening the file.\n\n");
exit(1);
}

j=0;
/* Extracting the relevant data (first and second column) from filebase_name and storing it into arrays */
while( fgets(end, 1024, fpbase) != NULL ) {
fscanf( fpbase, "%s %s", prminp[0], prminp[1]);
if (( strlen(prminp[0]) != 0 ) && ( prminp[0][0] != '#' )) {
prmbase[0][j] = atof(prminp[0]);
prmbase[1][j] = atof(prminp[1]);
j++;
}
for ( i = 0; i < 20; i++) {
prminp[i][0]='\0';
}
}
fclose(fpbase);


/* Printing base array */
printf("\n\tPrinting relevant info from file: %s\n",filebase_path[0]);
for ( j = 0; j < 800; j++) {
if ( prmbase[0][j] != 0 ) {
for ( i = 0; i < 2; i++) {
printf("\t%0.6E, %i %i\t", prmbase[i][j], i ,j);
}
printf("\n");
}
}



/* Getting experimental data from files into array */
/* Clearing arrays */
for ( f = 0; f < 20; f++) {
for ( i = 0; i < 10; i++) {
for ( j = 0; j < 800; j++) {
prmref[f][i][j]='\0';
}
}
}

/* Opening fileref_name */
for ( n = 0; n < fileref_no; n++) {

FILE *fpref;
fpref = fopen(fileref_path[n],"r"); // read mode
if( fpref == NULL ) {
perror("\tError while opening the file.\n\n");
exit(1);
}

/* Extracting the relevant data (first and second column) from fileref_name and storing it into arrays */
for ( i = 0; i < 20; i++) {
prminp[i][0]='\0';
}

j = 0;
while( fgets(end, 1024, fpref) != NULL ) {
fscanf( fpref, "%s %s %s %s", prminp[0], prminp[1], prminp[2], prminp[3]);
if (( strlen(prminp[0]) != 0 ) && ( prminp[0][0] != '#' )) {
prmref[n][0][j] = atof(prminp[0]); // Energy data
prmref[n][1][j] = atof(prminp[1]); // Cross section data
j += 1;
}
for ( i = 0; i < 20; i++) {
prminp[i][0]='\0';
}

}

fclose(fpref);
}


/* Printing reference array */
for ( f = 0; f < fileref_no; f++) {
printf("\n\tPrinting relevant info from file: %s\n",fileref_path[f]);
for ( j = 0; j < 80; j++) {
if ( prmref[f][0][j] != 0 ) {
for ( i = 0; i < 2; i++) {
printf("\t%0.6E\t", prmref[f][i][j]);
}
printf("\n");
}
}
}



/* Setting data from parameter arrays with corresponding energy into same array */
for ( i = 0; i < 20; i++) {
for ( j = 0; j < 800; j++) {
prmcomp[i][j] = '\0';
}
}

n = 0;
for ( m = 0 ; m < 800; m++) {
if (( prmbase[0][m] != 0 ) && ( prmbase[0][m] != '\0')) { // ignoring null energies in theoretical file
for ( f = 0 ; f < fileref_no; f++) {
for ( j = 0 ; j < 80; j++) {
if ( prmbase[0][m] == prmref[f][0][j] ) {
prmcomp[0][n] = prmbase[0][m];
prmcomp[2][n] = prmbase[1][m];
prmcomp[f+4][n] = prmref[f][1][j];
}
}
}
}
n++;
}



/* Calculating expected value of all experimental values for each element */
for ( j = 0; j < 800; j++) {
double N = 0;
for ( i = 4; i < 20; i++) {
if (( prmcomp[i][j] != '\0' ) && ( prmcomp[i][j] != 0 )){
prmcomp[3][j] = prmcomp[3][j] + prmcomp[i][j];
N = N + 1;
}
}
prmcomp[3][j] = prmcomp[3][j]/N;
}



/* Calculating chi-squared for each element */
double N = 0;
for ( i = 0; i < 800; i++) {
if (( prmcomp[0][i] != '\0' ) && ( prmcomp[3][i] != 0 )) {
prmcomp[1][i] = pow((prmcomp[2][i] - prmcomp[3][i]),2) / prmcomp[3][i];
N++;
}
}

/* Calculating sum of chi-squared per DOF for elements */
double chisq = 0;
for ( i = 0; i < 800; i++) {
chisq = chisq + prmcomp[1][i];
}

double DOF = N - 1.0;
double chisqdof = chisq / DOF;

printf("\n\tThe correlating, relevant, parameters:");
printf("\n\tchisq %0.4E\t chisqdof %0.4E\t DOF %0.2f\n\n", chisq, chisqdof, DOF);



/* Opening filecomp_name */
FILE *fpcomp;
fpcomp = fopen("chisq-calculate-output.tot","wb+"); // write mode, creating new file if needed
if( fpcomp == NULL ) {
perror("\tError while opening the file.\n\n");
exit(1);
}

/* Printing data from comparison parameter array */
fprintf(fpcomp, "# Theoretical input file:\n# %s\n#\n", filebase_path[0]);
fprintf(fpcomp, "# Experimental input file(s):\n");
printf("\tTheoretical input file:\n\t%s\n\n", filebase_path[0]);
printf("\tExperimental input file(s):\n");
for ( i = 0 ; i < fileref_no ; i++) {
fprintf(fpcomp, "# %s\n", fileref_path[i]);
printf("\t%s\n", fileref_path[i]);
}

fprintf(fpcomp, "#\n# Energy [MeV]\tchi-squared\txs(theor.) [mb]\txs(exp. aver.) [mb]\txs(exp. individual) [mb]\n");
printf("\n\tEnergy [MeV]\t\tchi-squared\t\txs(theor.) [mb]\t\txs(exp. aver.) [mb]\txs(exp. ind.) [mb]\n");
for ( j = 0 ; j < 800 ; j++) {
if (( prmcomp[0][j] != '\0' ) && ( prmcomp[0][j] != 0 )){
for ( i = 0 ; i < fileref_no + 4 ; i++) {
printf("\t%0.6E\t", prmcomp[i][j]);
fprintf(fpcomp, "%0.6E\t", prmcomp[i][j]);
}

printf("\n");
fprintf(fpcomp,"\n");
}
j++;
}


printf("\n\tDegrees of freedome:  %0.2f\n", DOF);
printf("\tChi-squared value:  %0.2f\n", chisq);
printf("\tProbability value:  %0.15f\n\n", chisq_prob(DOF, chisq));
fprintf(fpcomp,"# Degrees of freedome:  %0.2f\n", DOF);
fprintf(fpcomp,"# Chi-squared value:  %0.2f\n", chisq);
fprintf(fpcomp,"# Probability value:  %0.15f\n\n", chisq_prob(DOF, chisq));

fclose(fpcomp);

return 0;

}


static char *findpath(char *cwddir,char *find)
{
char *pathout;
char *finddir;

char temp[50];
char temp01[50];
char temp02[50];

char path[1024];
char wdir[1024];

strcpy(temp, find);
strcpy(wdir, cwddir);
char target = '/';
int n = 0;


DIR *dp;
struct dirent *entry;
struct stat statbuf;

if( (dp = opendir(".")) == NULL ) {
fprintf(stderr,"\tCannot open directory: %s\n", wdir);
return 0;
}

/* checking if 'find' contains '/' */
while ( strchr(find, target) != NULL ) {

/* Splitting 'find' in two parts dependent on '/'.
Second string modified by removing first char with +1. */
finddir = strtok(find, "/");
find = (strrchr(temp, '/') + 1);

while( (entry = readdir(dp)) != NULL ) {
lstat(entry->d_name,&statbuf);

if (( S_ISDIR(statbuf.st_mode) ) && ( match(finddir, entry->d_name) == true )) {
strcpy(temp01, entry->d_name);
n++;
}
}
closedir(dp);

if ( n == 1 ) {
sprintf(wdir,"%s%s%s",wdir,"/",temp01);
chdir(wdir);
n = 0;
}
else {
printf("\n\tERROR: %i directories found with the string '%s'\n\n",n,finddir);
}

}


if( (dp = opendir(".")) == NULL ) {
fprintf(stderr,"\tCannot open directory: %s\n", wdir);
return 0;
}

while( (entry = readdir(dp)) != NULL ) {
lstat(entry->d_name,&statbuf);
sprintf(path,"%s%s%s",wdir,"/",entry->d_name);

/* If regular file and contains "find" string */
if(( S_ISREG(statbuf.st_mode) ) && ( match(find, entry->d_name) == true )) {
printf("\tAvailable file: %s\n",path);
strcpy(temp02, entry->d_name);
n++;
}
}
closedir(dp);
chdir(cwddir);

if ( n == 1 ) {
sprintf(path,"%s%s%s",wdir,"/",temp02);
pathout = path;
return pathout;
}
else {
printf("\n\tERROR: %i files found with the string '%s'\n\n",n,find);
return NULL;
}

}


/* The main function that checks if two given strings match.
The first string may contain wildcard characters */
bool match(char *first, char *second)
{
// If we reach at the end of both strings, we are done
if (*first == '\0' && *second == '\0')
return true;

// Make sure that the characters after '*' are present in second string.
// This function assumes that the first string will not contain two
// consecutive '*'
if (*first == '*' && *(first+1) != '\0' && *second == '\0')
return false;

// If the first string contains '?', or current characters of both
// strings match
if (*first == '?' || *first == *second)
return match(first+1, second+1);

// If there is *, then there are two possibilities
// a) We consider current character of second string
// b) We ignore current character of second string.
if (*first == '*')
return match(first+1, second) || match(first, second+1);
return false;
}



/* Wikipedia - Chi-squared Distribution -> Cumulative distribution function
http://en.wikipedia.org/wiki/Chi-squared_distribution#Cumulative_distribution_function */
static double chisq_prob(int DOF, double chisq)
{
if (chisq < 0 || DOF < 1)
{
return 0.0;
}

double K = ((double)DOF) * 0.5;
double X = chisq * 0.5;
if(DOF == 2) // special case for Probability function
{
return exp(-1.0 * X);
}

long double PValue;
long double ln_CDF;
ln_CDF = log_igf(K, X)- log_gamma(K);
PValue = 1.0 - expl(ln_CDF);

return (double)PValue;

}


/* Returns the Natural Logarithm of the Incomplete Gamma Function
The logaritmic expression allows more accurate calculations, but the number of iterations are larger.
Wikipedia - Incomplete Gamma Function -> Evaluation formulae -> Connection with Kummer's confluent hypergeometric function
https://en.wikipedia.org/wiki/Incomplete_gamma_function#Connection_with_Kummer.27s_confluent_hypergeometric_function
*/
static long double log_igf(long double K, long double X)
{
if(X < 0.0)
{
return 0.0;
}

long double Sc;
long double C;
Sc = (K * logl(X)) - X - logl(K);
C = KM(K, X);

return logl(C) + Sc;
}


/* Kummer's confluent hypergeometric function for Incomplete Gamma function
Wikipedia - Incomplete Gamma Function -> Evaluation formulae -> Connection with Kummer's confluent hypergeometric function
https://en.wikipedia.org/wiki/Incomplete_gamma_function#Connection_with_Kummer.27s_confluent_hypergeometric_function
*/
static long double KM(long double K, long double X)
{
long double Sum = 1.0;
long double Nom = 1.0;
long double Denom = 1.0;
int i = 0;

for(i = 0; i < 1000; i++) // Loops for 1000 iterations
{
Nom *= X;
K++;
Denom *= K;
Sum += (Nom / Denom);
}

return Sum;
}


/*
Returns the Natural Logarithm of the Gamma Function, based on Spouge's approximation (modification of Stirling).
The logaritmic expression allows more accurate calculations, but the number of iterations are larger.
Wikipedia - Spouge's approximation: https://en.wikipedia.org/wiki/Spouge's_approximation
*/
long double log_gamma(long double K)
{
/* A is the level of accuracy of the calculation. Spouge's approximation: the desired level of precision can only be reached with
an extra precision available so that the desired level can be built. */
double A = 15;
int k = 1;
const long double SQRT2PI = sqrtl(atanl(1.0) * 8.0);

long double Sc;

Sc = ((K + 0.5)*logl(K + A)) - (K + A) - logl(K);

long double Sum = SQRT2PI;
long double Ck = 0.0;
long double F = 1.0;

for(k = 1; k < A; k++)
{
Ck = powl(-1,k-1) * powl(A - k, k - 0.5) * expl(A - k);
Sum += (Ck / (F * (K + k)));
F *= k;
}

return logl(Sum) + Sc;
}





As a purely C code this program somewhat lacks the dynamic nature that I find convenient in Bash. It still functions and does the job, but the programming isn’t really versatile in its nature. You couldn’t really use this code for any other purpose that similar to the current application. Anything else would have to be coded into place.

The code: Bash & R

#!/bin/bash
#
# This script calculates the chi-squared value of a theoretical file compared to experimental data
# by using R statistical software
# Author: Alex Björkholm 2016


# Manual of input output
if [ "$1" == "man" ]; then
echo -e "This script calculates the chi-squared value of a theoretical file compared to experimental data by using R statistical software.
Input options:
1: input file search string, theoretical data (single file)
2: input file search string, experimental data (single/multiple files)
3: (no)/energy
experimental energy sort (optional) active if input 'energy'\n"

else

input1_theoretical=$1
input1_experimental=$2
energy=$3

declare -a files_theoretical
declare -a files_experimental
declare -a input_theoretical
declare -a input_experimental

IFS=$'\n'

#Finding input file, theoretical
files_theoretical+=($(find . -maxdepth 1 -type f -iwholename \*$input1_theoretical\* | sort -u))

#Finding input file, experimental
files_experimental+=($(find . -maxdepth 1 -type f -iwholename \*$input1_experimental\* | sort -u))


#Declare files
echo -e "\nInput files, theoretical"
for each in ${files_theoretical[@]}; do echo -e "\t$each"; done

echo -e "\nInput files, experimental"
for each in ${files_experimental[@]}; do echo -e "\t$each"; done

if [ ! ${#files_theoretical[@]} == 1 ]; then
echo -e "\nNumber of theoretical files is not singular (no. of files = ${#files_theoretical[@]})\n"
exit 2
fi



for each in ${files_theoretical[@]}; do
# Remove irrelevant headers, leaving title
sed -e '1{/\#.*/d}' -e '2{/\#.*/d}' -e 's/\#//g' -e 's/type//g' $each > temp.tot
mv temp.tot $each
done

echo -e "\nNumber of experimental files: ${#files_experimental[@]}"
for each in ${files_experimental[@]}; do
# Check files for theoretical data and problems
if [ ! $(grep -i "empire\|talys\|tendl" $each | wc -l) == 0 ]; then
echo -e "\nWarning: file $each contains string empire\|talys\|tendl"
# Exit possibility for script
echo -n "Exit script? Yes/No? >"
read input
if [[ "${input,,}" = y* ]]; then
echo "\nExiting script\n\n"; exit 2
fi
fi

# Remove irrelevant headers, leaving title
sed -e '1{/\#.*/d}' -e '2{/\#.*/d}' -e 's/\#//g' -e 's/type//g' $each > temp.tot
mv temp.tot $each
done



# Setting input files for R
for each in ${files_theoretical[@]}; do
each=$(basename $each)
input_theoretical+=($(echo "\"$each\"",))
done
theoretical=$(echo ${input_theoretical[@]} | sed s'/.$//')

# Setting input files for R
for each in ${files_experimental[@]}; do
each=$(basename $each)
input_experimental+=($(echo "\"$each\"",))
done
experimental=$(echo ${input_experimental[@]} | sed s'/.$//')



cat <<EOF > R-script.R
writeLines("\n\nExecuting R statistical software code")


experimental <- c(${experimental[@]})
noexperimental <- ${#files_experimental[@]}
theoretical <- c(${theoretical[@]})
notheoretical <- ${#files_theoretical[@]}


# Importing theoretical data from file
Rtheoretical <- read.table(theoretical, header=TRUE)
# Setting data labels for theoretical data
colnames(Rtheoretical)[1]<-paste("Energy")
colnames(Rtheoretical)[2]<-paste("xstheor")


Rexperimental<-0
# Importing experimental data from file
for (val in experimental) {
temp <- read.table(val, header=TRUE)
temp <- temp[,c("x", "y")]
# Setting data labels for experimental data
colnames(temp)[1]<-paste("Energy")
colnames(temp)[2]<-paste("xsexper")
if( ! is.data.frame(Rexperimental) ){
#Storing first parameters in Rexperimental when no data present
Rexperimental<-temp }
else {
# Merging the two data frames of experimental data
# Note: duplicate values will dissapear. This need to be resolved
# if one wants a weighted data analysis
Rexperimental<-merge(Rexperimental, temp, all = T)}
}

# Producing data table with single energy input
# Taking means of cross section for identical energy values
Rexperimental<-aggregate( xsexper ~ Energy, Rexperimental, mean )


# Merging the matching values from Rtheoretical and Rexperimental
# Getting matching values from Rtheoretical
t1 <- Rtheoretical[ Rtheoretical\$Energy %in% Rexperimental\$Energy,]
# Getting matching values from Rexperimental
t2 <- Rexperimental[ Rexperimental\$Energy %in% Rtheoretical\$Energy,]


# Merging energies and cross sections into single data table
xs.data.table <- cbind(t1, t2)
# Writing new row numbering for data set
rownames(xs.data.table)<-NULL


# Merging cross sections into single data table for chisq.test
chisq.data.table <- cbind(t1\$xstheor, t2\$xsexper)
# Setting header names for chisq.data.table
colnames(chisq.data.table)<-c("xstheor", "xsexper")
# Writing new row numbering for data set
rownames(chisq.data.table)<-c(1:nrow(chisq.data.table))


# Printing resulting table
writeLines("\nCross section data")
print(xs.data.table)
writeLines("\nChi-squared input data")
print(chisq.data.table)

# Running chi squared test
writeLines("\n\nChi-squared data")
chisq.test(chisq.data.table)

q()
EOF

# Executing R-script
Rscript R-script.R

unset IFS
fi

echo -e "\nScript finished\n\n"

Note that the integration of Bash in the script isn’t essential and the reasoning behind this solution was mainly to enable the strong file search capabilities of find in Bash. The script could have been written entirely in R, but I chose not to at the moment of coding. The script therefore also displays the capability to execute scripting via Bash. The execution is however somewhat clumsy when it involves the manufacturing of a R script followed by the script’s execution.

The difference?

As seen above both programs do more or less the same thing. The main difference is that R comes with already inbuilt functions for handling the statistics, while the C program has the statistics coded into existence.

One can of course ask which program is better. I prefer the Bash & R solution, mainly due to the versatility of the code. Adding new features isn’t difficult, if I would need any, and there is inbuilt support in R that makes the coding simpler.

How does it work?

Yeah, so how does the code work? Mainly you need the experimental data sets as well as the theoretical data set. This can be acquired via the data plot script I have previously written about, with the data output set to the table option.  We need to remember that our focus are finding how good the theoretical calculations are compared to the experimental values. So before the program is run a theoretical calculation should be preformed, whereby we need to get theoretical values (here nuclear reaction cross sections) for all the existing experimental reference points (in this case energies). After that the scripts/programs would need to be executed in the directory of the experimental and theoretical files.

The C program, when executed from the terminal command line, will ask for the theoretical file name (actually it searches for the name based on a search string, since I’m to lazy to always copy paste the full name) and then ask for the number of experimental files to compare to, where after one needs to to provide the file search strings for those files. After that the script will make the necessary checks when it executes itself where after the result is printed out in the terminal.

The Bash and R program works in a similar way, but here we can enter all the necessary data straight into the terminal when running the program. So everything is a lot smoother in execution.

Does the code work?

Well, off course. I hope that I have commented my code to the degree that it is comprehensible for anyone, but that is usually ofter quite hard to acheive. For my needs this works perfectly. It might need some readthrough before implementation so that you understand its working principles in order to get the correct results. But it should be very straight forward! Anyway, I hope this inspires.

A final note

Usually your files aren’t that clean form extra formatting. Especially if they have been produced via the table option in gnuplot. My easy fix for this is a simple bash script – no surprise there, due to my love for Bash – which cleans the table file from all the unnecessary formatting and formats it in the correct way for the Chi squared scripts (C or Bash). This code can be found here below.

#!/bin/bash
#
# This script splits and refines output table from gnuplot for further analysis
# Author: Alex Björkholm 2016


# Manual of input output
if [ "$1" == "man" ]; then
echo -e "This script splits and refines output table from gnuplot for further analysis
Input options:
1: input file
2: output name string
3: (no)/energy
experimental energy sort (optional) active if input 'energy'\n"

else

inputfile=$1
outputname=$2
energy=$3

declare -a files
IFS=$'\n'

# Using sed to remove every 'i' or 'o' at lines without '#'
# Using sed to remove lines with 'NaN'
# Using sed to remove the first line if it is empty
sed -e '/^\#/!s/i//' -e '/^\#/!s/o//' -e '/NaN.*/d' -e '1{/^$/d}' $inputfile | cat -s > temp.out

# Counting number of data sets based on the numer of titles
#N=$(($(grep -o '\#' temp.out | wc -l)/3 - 2))
N=$(($(grep -o 'title' temp.out | wc -l) - 2))

# Splitting file at every empty row
csplit -sf $outputname -b '%d.tot' temp.out '/^$/' {$N}
rm temp.out


# Removing first empty row in every file
files+=($(find . -maxdepth 1 -type f -iwholename \*$outputname\*.tot | sort -u))
for each in ${files[@]}; do
# Removing first blank row from file
sed '1{/^$/d}' $each > temp.out
mv temp.out $each
# Removing first blank space at line
sed -i 's/ //' $each
done


if [[ ! -z ${energy// } ]]; then
if [ $energy == "energy" ]; then
for each in ${files[@]}; do
# Getting all energy data points not matching empire, talys or tendl
if [ $(grep -i "empire\|talys\|tendl" $each | wc -l) == 0 ]; then
# Extracting the first number/word
cat $each | cut -d " " -f1 >> temp.out
fi
done

# Removing comment lines and empty lines
sed -e '/\#.*/d' -e '/^$/d' temp.out > energy.tot
# Sorting data file
sort -n -u energy.tot -o energy.tot
rm temp.out
fi
fi

unset IFS
fi

Advertisements