First (serious) commit

luiandresgonzalez 2022-09-28 16:20:24 -04:00
parent 00c7d4c819
commit 7a2360f680
13 changed files with 3349 additions and 0 deletions

4
.gitignore vendored Normal file

@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata

BIN
data/farmB.RData Normal file

Binary file not shown.

BIN
data/farm_A_demo.xlsx Normal file

Binary file not shown.

239
demo-notebook.Rmd Normal file

@@ -0,0 +1,239 @@
---
title: "Demo for the presentation 'Getting insights from automatic feeder data'"
author:
- Luis Gonzalez-Gracia (lagog6@ulaval.ca)
- Twitter - @gonzaluisandres
date: "`r format(Sys.time(), '%d %B %Y')`"
tags: [social network analysis, animal science]
abstract:
This short R Notebook shows the workflow from my presentation today.
We will grab two different sources of data and do some wrangling, visualization,
and analysis.
The script can also be accessed in the `demo.R` file, also in this repository.
output:
pdf_document: default
html_notebook: default
editor_options:
chunk_output_type: inline
---
# How to get and run these files on your own computer:
![The files are IN the computer](img/zoolanderfilesincomputer.jpg)
You can run and tweak this code by cloning my repository on Gitea (git.disroot.org):
https://git.disroot.org/luangonz/feeder-analytics-demo.git
# Setup
Since the idea of this project is to be as modular and automated as possible,
I created a set of custom functions that do the heavy lifting in the background
without cluttering the script file too much. This has advantages and disadvantages:
the script is easier to read and the workflow is easy to follow along, but if
issues appear, debugging and tracing the function that caused them is more
time-consuming.
The project is organized into the main project files and two folders (see the
quick check right after this list):
- The `data` folder holds the data files we will be loading into the environment.
- The `setup` folder (used in this section) holds two key files:
  - `functions.R`, which holds all the custom functions.
  - `loadlibraries.R`, which loads all the libraries needed by the scripts in
  the project (so you do not need to copy and paste the `library()` calls in
  every file).
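As a quick check (a small sketch, assuming you cloned the repository and your
working directory is the project root), you can confirm this layout from R:
```{r, eval=FALSE}
# The data/ folder should contain the two source files, and setup/ the helpers.
list.files("data")   # farm_A_demo.xlsx and farmB.RData
list.files("setup")  # functions.R and loadlibraries.R
```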
When we run the `source()` function at this step, we are loading both the libraries
and the functions that we are going to use throughout the demo.
```{r setup, echo=TRUE, message=FALSE, warning=FALSE}
source("setup/functions.R")
source("setup/loadlibraries.R")
```
# Importing data
The first step in the process is to load the Excel file into the environment.
There are packages that can import Excel files natively and are very straightforward,
but some of them do not handle the date information properly. For this I have a
custom function that takes some common issues into account, and with the
`method =` parameter I can fine-tune the import according to where the data comes from.
```{r}
farmA <- xlsx_to_dataframe(filename = "data/farm_A_demo.xlsx", # selects the file
method = "farm_A_11rows") # selects the farm A method
```
Luckily for us, farm B provides its data directly in the RData format, which
helps a lot with importing. A simple `load()` call and the data is there.
```{r}
load("data/farmB.RData") # load the file into the environment
farmB <- data_alim # renaming to a more understandable object name
rm(data_alim) # removing the original imported dataframe
```
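As a side note, if you prefer not to hard-code the `data_alim` object name, an
alternative sketch (assuming the `.RData` file stores a single data frame) is to
load it into a temporary environment and grab whatever object it contains:
```{r, eval=FALSE}
# Load into a scratch environment and retrieve the single stored object.
tmp_env <- new.env()
load("data/farmB.RData", envir = tmp_env)
farmB <- get(ls(tmp_env)[1], envir = tmp_env)
```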
# Checking structure of data
We will look at the raw data exactly as it comes out of the import procedure.
```{r}
str(farmA)
```
We see some issues of concern; for example, the visit start time is not stored
in a proper date/time type but only as a character. Let's check farm B:
```{r}
str(farmB)
```
Farm B has similar issues with the time, and the variable names differ between
the two farms, which makes it hard to work with both using the same piece of code.
The strategy is therefore to take this (or any) kind of data frame and standardize
it to a format that all of the following functions can work with. The next step,
then, is standardization.
# Standardization
The `harmonize_feeder_data()` function is a custom function that allows us to
funnel any kind of source file into a single, homogeneous data structure so it
can be fed into the following functions in the workflow. Besides the data frame,
it takes four parameters:
- `groupstations`: if TRUE, the station number becomes a group in the dataframe
(useful for summarizations).
- `method`: a selector for the harmonization method to use, according to the
source the data frame comes from.
- `remove_filling`: if TRUE, it removes the FILLING events of the feeder (rows
logged when the feeder is refilled).
- `remove_na`: if TRUE, it removes unavailable data that might interfere with
some of the calculations.
```{r}
farmB_standard <- harmonize_feeder_data(farmB,
groupstations = TRUE,
method = "deschambault")
farmA_standard <- harmonize_feeder_data(farmA,
groupstations = TRUE,
method = "farm_A_raw",
remove_filling = TRUE,
remove_na = TRUE)
```
We will check the structure again to see if everything is in order:
```{r echo=TRUE}
str(farmA_standard)
```
```{r echo=TRUE}
str(farmB_standard)
```
With this function we've managed to homogenize the data structure, so we can
now move on to our next step.
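As an extra sanity check (not part of the original demo script), we can confirm
that the two standardized data frames now share the same column names:
```{r}
# After harmonization, both farms should expose identical columns.
identical(names(farmA_standard), names(farmB_standard))
```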
# Inspecting data integrity
We'll be running some more custom functions to plot valuable data.
## Population plot
Farm A has 22 different pens. It would be valuable to see if there are any issues
regarding the population of these pens, for example a sudden reduction or increase
in size, or a sudden drop due to data loss from the hardware.
### Population plot of farm A
```{r message=FALSE, warning=FALSE}
populationPlot(farmA_standard)
```
### Population plot of farm B
```{r}
populationPlot(farmB_standard)
```
These plots show that there are some pen-size fluctuations and some data loss
in certain periods. These losses will need to be taken into account during the
analyses.
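To flag the affected days programmatically, one possible approach (a sketch, not
part of the demo workflow; the cut-off of five animals is an arbitrary illustration)
is to count the unique animals per station and day in the standardized data:
```{r}
# List station/day combinations with suspiciously few animals, which usually
# points to hardware or data-transfer problems.
farmA_standard %>%
  ungroup() %>%
  group_by(station, Date) %>%
  summarise(n_animals = n_distinct(id), .groups = "drop") %>%
  filter(n_animals < 5)  # arbitrary cut-off; adjust per farm
```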
## Visualizing visits to the feeder
We can visualize a timeline of visits to the feeder for a given station and day
with the custom function `visitPlotsDay()`:
```{r}
visitPlotsDay(farmA_standard,
thedate = "2021-06-03",
thestation = 11,
singlestrip = FALSE)
```
In the last plot we have one line per pig, but sometimes seeing all the visits
in a single strip is useful. This is what the `singlestrip` parameter is for.
```{r}
visitPlotsDay(farmA_standard,
thedate = "2021-06-03",
thestation = 11,
singlestrip = TRUE)
```
## A bird's-eye view of all the data for a station
The `inspectDay()` function shows the visits to a feeder over the whole period
in a single plot, together with a population plot similar to the one in the
previous section.
```{r}
inspectDay(farmA_standard, thestation = 11)
```
# Building network visualizations and analysis
## Building the igraph objects and plotting
The following steps successively convert the data into network objects of the
`igraph` package.
```{r}
farmA_list <- makeAllStationsPerdate(farmA_standard, domerge = "F")
farmA_pairs <- makePairsPerStation(farmA_list, mythreshold = 5) # TAKES A LONG TIME
farmA_network <- makeIGraphObjects(farmA_pairs, directed = T)
plot(farmA_network[["12"]][["2021-05-20"]])
```
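Once the igraph objects exist, any standard network metric can be computed on
them. As a quick illustration (not part of the demo script), the node degree tells
us how many distinct partners each pig had in the station/day network plotted above:
```{r}
# Degree of each pig in the station 12, 2021-05-20 network built above.
g <- farmA_network[["12"]][["2021-05-20"]]
sort(igraph::degree(g), decreasing = TRUE)
```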
## Making summarizations based on the network data
The following steps analyze how a whole-network parameter, the [network density](https://methods.sagepub.com/reference/the-sage-encyclopedia-of-educational-research-measurement-and-evaluation/i14550.xml), progresses through time. It looks like there is a downward
trend in the group we are studying.
```{r}
getmetheplot_pliz(site = "Farm A",
df = farmA_standard,
thestation = 12)
```
The reason for this trend is not clear, but it could be that these animals are
learning to avoid each other. Another possible explanation is that the animals
visit the feeder less often as they grow, so there is less of a chance for them
to interact with each other. The
`getmetheplot_pliz_but_corrected_this_time()` function corrects the network
density by the number of times the animals visit the feeder.
```{r}
getmetheplot_pliz_but_corrected_this_time(site = "Farm A",
df = farmA_standard,
thestation = 5)
```
Even with this correction, we still see a downward trend.
Thanks for reading all of this. You can reach me on Microsoft Teams or by e-mail at
*lagog6@ulaval.ca*.

47
demo-notebook.aux Normal file

@@ -0,0 +1,47 @@
\relax
\providecommand\hyper@newdestlabel[2]{}
\providecommand\HyperFirstAtBeginDocument{\AtBeginDocument}
\HyperFirstAtBeginDocument{\ifx\hyper@anchor\@undefined
\global\let\oldcontentsline\contentsline
\gdef\contentsline#1#2#3#4{\oldcontentsline{#1}{#2}{#3}}
\global\let\oldnewlabel\newlabel
\gdef\newlabel#1#2{\newlabelxx{#1}#2}
\gdef\newlabelxx#1#2#3#4#5#6{\oldnewlabel{#1}{{#2}{#3}}}
\AtEndDocument{\ifx\hyper@anchor\@undefined
\let\contentsline\oldcontentsline
\let\newlabel\oldnewlabel
\fi}
\fi}
\global\let\hyper@last\relax
\gdef\HyperFirstAtBeginDocument#1{#1}
\providecommand\HyField@AuxAddToFields[1]{}
\providecommand\HyField@AuxAddToCoFields[2]{}
\@writefile{toc}{\contentsline {section}{How to get and run this files on your own computer:}{1}{section*.1}\protected@file@percent }
\newlabel{how-to-get-and-run-this-files-on-your-own-computer}{{}{1}{How to get and run this files on your own computer:}{section*.1}{}}
\@writefile{toc}{\contentsline {section}{Setup}{1}{section*.2}\protected@file@percent }
\newlabel{setup}{{}{1}{Setup}{section*.2}{}}
\@writefile{toc}{\contentsline {section}{Importing data}{1}{section*.3}\protected@file@percent }
\newlabel{importing-data}{{}{1}{Importing data}{section*.3}{}}
\@writefile{toc}{\contentsline {section}{Checking structure of data}{2}{section*.4}\protected@file@percent }
\newlabel{checking-structure-of-data}{{}{2}{Checking structure of data}{section*.4}{}}
\@writefile{toc}{\contentsline {section}{Standardization}{3}{section*.5}\protected@file@percent }
\newlabel{standardization}{{}{3}{Standardization}{section*.5}{}}
\@writefile{toc}{\contentsline {section}{Inspecting data integrity}{4}{section*.6}\protected@file@percent }
\newlabel{inspecting-data-integrity}{{}{4}{Inspecting data integrity}{section*.6}{}}
\@writefile{toc}{\contentsline {subsection}{Population plot}{4}{section*.7}\protected@file@percent }
\newlabel{population-plot}{{}{4}{Population plot}{section*.7}{}}
\@writefile{toc}{\contentsline {subsubsection}{Population plot of farm A}{4}{section*.8}\protected@file@percent }
\newlabel{population-plot-of-farm-a}{{}{4}{Population plot of farm A}{section*.8}{}}
\@writefile{toc}{\contentsline {subsubsection}{Population plot of farm B}{4}{section*.9}\protected@file@percent }
\newlabel{population-plot-of-farm-b}{{}{4}{Population plot of farm B}{section*.9}{}}
\@writefile{toc}{\contentsline {subsection}{Visualizing visits to the feeder}{5}{section*.10}\protected@file@percent }
\newlabel{visualizing-visits-to-the-feeder}{{}{5}{Visualizing visits to the feeder}{section*.10}{}}
\@writefile{toc}{\contentsline {subsection}{A birdseye view of all the data for a station}{7}{section*.11}\protected@file@percent }
\newlabel{a-birdseye-view-of-all-the-data-for-a-station}{{}{7}{A birdseye view of all the data for a station}{section*.11}{}}
\@writefile{toc}{\contentsline {section}{Building network visualizations and analyisis}{8}{section*.12}\protected@file@percent }
\newlabel{building-network-visualizations-and-analyisis}{{}{8}{Building network visualizations and analyisis}{section*.12}{}}
\@writefile{toc}{\contentsline {subsection}{Building the igraph objects and plotting}{8}{section*.13}\protected@file@percent }
\newlabel{building-the-igraph-objects-and-plotting}{{}{8}{Building the igraph objects and plotting}{section*.13}{}}
\@writefile{toc}{\contentsline {subsection}{Making summarizations based on the network data}{9}{section*.14}\protected@file@percent }
\newlabel{making-summarizations-based-on-the-network-data}{{}{9}{Making summarizations based on the network data}{section*.14}{}}
\gdef \@abspage@last{11}

2188
demo-notebook.nb.html Normal file

File diff suppressed because one or more lines are too long

14
demo-notebook.out Normal file

@@ -0,0 +1,14 @@
\BOOKMARK [1][-]{section*.1}{\376\377\000H\000o\000w\000\040\000t\000o\000\040\000g\000e\000t\000\040\000a\000n\000d\000\040\000r\000u\000n\000\040\000t\000h\000i\000s\000\040\000f\000i\000l\000e\000s\000\040\000o\000n\000\040\000y\000o\000u\000r\000\040\000o\000w\000n\000\040\000c\000o\000m\000p\000u\000t\000e\000r\000:}{}% 1
\BOOKMARK [1][-]{section*.2}{\376\377\000S\000e\000t\000u\000p}{}% 2
\BOOKMARK [1][-]{section*.3}{\376\377\000I\000m\000p\000o\000r\000t\000i\000n\000g\000\040\000d\000a\000t\000a}{}% 3
\BOOKMARK [1][-]{section*.4}{\376\377\000C\000h\000e\000c\000k\000i\000n\000g\000\040\000s\000t\000r\000u\000c\000t\000u\000r\000e\000\040\000o\000f\000\040\000d\000a\000t\000a}{}% 4
\BOOKMARK [1][-]{section*.5}{\376\377\000S\000t\000a\000n\000d\000a\000r\000d\000i\000z\000a\000t\000i\000o\000n}{}% 5
\BOOKMARK [1][-]{section*.6}{\376\377\000I\000n\000s\000p\000e\000c\000t\000i\000n\000g\000\040\000d\000a\000t\000a\000\040\000i\000n\000t\000e\000g\000r\000i\000t\000y}{}% 6
\BOOKMARK [2][-]{section*.7}{\376\377\000P\000o\000p\000u\000l\000a\000t\000i\000o\000n\000\040\000p\000l\000o\000t}{section*.6}% 7
\BOOKMARK [3][-]{section*.8}{\376\377\000P\000o\000p\000u\000l\000a\000t\000i\000o\000n\000\040\000p\000l\000o\000t\000\040\000o\000f\000\040\000f\000a\000r\000m\000\040\000A}{section*.7}% 8
\BOOKMARK [3][-]{section*.9}{\376\377\000P\000o\000p\000u\000l\000a\000t\000i\000o\000n\000\040\000p\000l\000o\000t\000\040\000o\000f\000\040\000f\000a\000r\000m\000\040\000B}{section*.7}% 9
\BOOKMARK [2][-]{section*.10}{\376\377\000V\000i\000s\000u\000a\000l\000i\000z\000i\000n\000g\000\040\000v\000i\000s\000i\000t\000s\000\040\000t\000o\000\040\000t\000h\000e\000\040\000f\000e\000e\000d\000e\000r}{section*.6}% 10
\BOOKMARK [2][-]{section*.11}{\376\377\000A\000\040\000b\000i\000r\000d\000s\000e\000y\000e\000\040\000v\000i\000e\000w\000\040\000o\000f\000\040\000a\000l\000l\000\040\000t\000h\000e\000\040\000d\000a\000t\000a\000\040\000f\000o\000r\000\040\000a\000\040\000s\000t\000a\000t\000i\000o\000n}{section*.6}% 11
\BOOKMARK [1][-]{section*.12}{\376\377\000B\000u\000i\000l\000d\000i\000n\000g\000\040\000n\000e\000t\000w\000o\000r\000k\000\040\000v\000i\000s\000u\000a\000l\000i\000z\000a\000t\000i\000o\000n\000s\000\040\000a\000n\000d\000\040\000a\000n\000a\000l\000y\000i\000s\000i\000s}{}% 12
\BOOKMARK [2][-]{section*.13}{\376\377\000B\000u\000i\000l\000d\000i\000n\000g\000\040\000t\000h\000e\000\040\000i\000g\000r\000a\000p\000h\000\040\000o\000b\000j\000e\000c\000t\000s\000\040\000a\000n\000d\000\040\000p\000l\000o\000t\000t\000i\000n\000g}{section*.12}% 13
\BOOKMARK [2][-]{section*.14}{\376\377\000M\000a\000k\000i\000n\000g\000\040\000s\000u\000m\000m\000a\000r\000i\000z\000a\000t\000i\000o\000n\000s\000\040\000b\000a\000s\000e\000d\000\040\000o\000n\000\040\000t\000h\000e\000\040\000n\000e\000t\000w\000o\000r\000k\000\040\000d\000a\000t\000a}{section*.12}% 14

BIN
demo-notebook.pdf Normal file

Binary file not shown.

74
demo.R Normal file

@@ -0,0 +1,74 @@
# Setup
source("setup/functions.R")
source("setup/loadlibraries.R")
# Import data ----
# A farm that has one type of feeder ----
farmA <- xlsx_to_dataframe(filename = "data/farm_A_demo.xlsx",
method = "farm_A_11rows")
str(farmA)
# From CDPQ group ----
load("data/farmB.RData")
farmB <- data_alim # renaming
rm(data_alim)
str(farmB)
# Standardizing ----
farmB_standard <- harmonize_feeder_data(farmB,
groupstations = TRUE,
method = "deschambault")
farmA_standard <- harmonize_feeder_data(farmA,
groupstations = TRUE,
method = "farm_A_raw",
remove_filling = TRUE,
remove_na = TRUE)
str(farmA_standard)
str(farmB_standard)
# Inspecting data integrity
populationPlot(farmA_standard)
populationPlot(farmB_standard)
visitPlotsDay(farmA_standard,
thedate = "2021-06-03",
thestation = 11,
singlestrip = FALSE)
inspectDay(farmA_standard, 11)
# Make network plots ----
farmA_list <- makeAllStationsPerdate(farmA_standard, domerge = "F")
farmA_pairs <- makePairsPerStation(farmA_list, mythreshold = 5) # TAKES A LONG TIME
farmA_network <- makeIGraphObjects(farmA_pairs, directed = T)
plot(farmA_network[["12"]][["2021-05-20"]])
# Edge density plot ----
getmetheplot_pliz(site = "Farm A",
df = farmA_standard,
thestation = 12)
getmetheplot_pliz_but_corrected_this_time(site = "Farm A",
df = farmA_standard,
thestation = 5)


@@ -0,0 +1,13 @@
Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8
RnwWeave: Sweave
LaTeX: pdfLaTeX

Binary file not shown.


756
setup/functions.R Normal file

@@ -0,0 +1,756 @@
# A. For application in unlisted dataframe (or each element of a list of df)----
calcGapDuration <- function(hourIn, HourOutOfPrevious, units = 'secs'){
difftime(hourIn, dplyr::lag(HourOutOfPrevious), units=units)
}
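# Example: calcGapDuration(df$hour.in, df$hour.out) returns, for each row, the
# time elapsed since the previous row's hour.out (NA for the first row).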
get_pairs <- function(df,threshold){
output <- df %>%
mutate(displacer = id,
displaced = lag(id)
) %>%
filter(gap_duration < threshold)
return(output)
}
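# Example: with threshold = 5 and gap_duration in seconds, a row is kept only if
# the pig started eating less than 5 s after the previous pig left; the entering
# pig is labelled the displacer and the pig that had just left, the displaced.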
graph_by_list_element <- function(df, isdirected){
thegraph <- df %>% select(displacer, displaced) %>%
graph_from_data_frame(., directed = isdirected)
return(thegraph)
}
gapdurationlist <- function(thelist) {
mylist <- thelist
for (item in 1:length(mylist)) {
unlisted <- mylist[[item]]
unlisted$gap_duration <- calcGapDuration(unlisted$hour.in, unlisted$hour.out)
mylist[[item]] <- unlisted
}
return(mylist)
}
createConsData <- function(df, Mode) {
consumption.ind.daily <- df %>%
ungroup() %>%
filter(id != "FILLING") %>%
group_by(Date, id) %>%
summarise(
daily.cons.ind = sum(cons),
daily.bouts = n(),
station = unique(station)
) %>%
mutate(avg.intake.per.bout = daily.cons.ind / daily.bouts) %>%
ungroup() %>% group_by(station)
# per station
consumption.station.daily <- consumption.ind.daily %>%
ungroup() %>%
group_by(station, Date) %>%
summarise(daily.cons.group = sum(daily.cons.ind), n = length(unique(id))) %>%
mutate(mean.daily.cons = daily.cons.group/ n)
if (Mode == "perstation") {
return(consumption.station.daily)
}
if (Mode == "all") {
# Adding the group consumption to the individual
consumption.all <-
left_join(consumption.ind.daily,
consumption.station.daily,
by = c('station', 'Date')) %>%
mutate(proportion = daily.cons.ind / daily.cons.group) %>%
arrange(station)
return(consumption.all)
}
}
# B. For application in lists of dataframes ----
get_pairs_list <- function(thelist, mythreshold) {
for (item in 1:length(thelist)) {
temp <- thelist[[item]]
temp <- get_pairs(temp,threshold = mythreshold)
thelist[[item]] <- temp
}
return(thelist)
}
createdegreeslist <- function(thelist) {
mylist <- thelist
for (item in 1:length(mylist)) {
unlisted <- createdegrees(mylist[[item]])
mylist[[item]] <- unlisted
}
return(mylist)
}
savetheplot_pliz <- function(thepath, plot, station){
ggsave(file = paste0(thepath, "Edge density - Station ",
unique(dataset_raw$station)[station],
".png"),
plot = plot,
device = "png")
}
makePairsPerStation <- function(df, mythreshold) {
# outer loop: stations
for (station in 1:length(df)) {
# inner loop: dates
#message("This is station", station)
# the -1 in the length is because there is a weird last element for each station
for (date in 1:(length(df[[station]])-1)) {
#message("This is date", date)
temp <- df[[station]][[date]]
temp <- mergevisits(temp)
temp <- get_pairs(temp, threshold = mythreshold)
df[[station]][[date]] <- temp
}
}
return(df)
}
makeIGraphObjects <- function(pairDF, directed) {
# outer loop: stations
for (station in 1:length(pairDF)) {
# inner loop: dates
#message("This is station", station)
# the -1 in the length is because there is a weird last element for each station
for (date in 1:(length(pairDF[[station]])-1)) {
#message("This is date", date)
temp <- pairDF[[station]][[date]]
temp <- graph_by_list_element(temp, isdirected = directed)
pairDF[[station]][[date]] <- temp
}
}
return(pairDF)
}
# TODO: solve artifact when pigs visit feeder across midnight
visitPlotsDay <- function(df, thestation, thedate, singlestrip) {
thedata <- df %>%
filter(Date == ymd(thedate)) %>%
filter(station == thestation)
if(singlestrip == TRUE){
thedata <- mutate(thedata, visit = 0)
ggplot(transform(thedata, y=order(id, hour.in)), aes(
x = hour.in,
xend = hour.out,
y = visit,
yend = visit,
col = id
)) + geom_segment(size=20)+
scale_y_discrete(breaks=NULL)
} else {
ggplot(transform(thedata, y=order(id, hour.in)), aes(
x = hour.in,
xend = hour.out,
y = id,
yend = id,
col = id
)) + geom_segment(size=5)+
scale_y_discrete(breaks=NULL)
}
}
inspectDay <- function(df, thestation){
s_x <- df[df$station == thestation,] # filtering station x
s_x$s_time <- hour(s_x$hour.in) + minute(s_x$hour.in)/60
p1 <- ggplot(s_x, aes(x = Date, y = s_time))+
geom_point(alpha = 0.5) +geom_jitter()
p2 <- populationPlot(s_x)
grid.arrange(p1, p2, nrow = 2)
}
# C. For application in a graph object ----
createdegrees <- function(thegraph) {
vector_length <- length(V(thegraph))
if(vector_length > 0) {
V(thegraph)$label <- V(thegraph)$name
V(thegraph)$degree <- degree(thegraph)}
return(thegraph)
}
makeEdgeDensities <- function(df_net) {
# outer loop: stations
for (station in 1:length(df_net)) {
# inner loop: dates
#message("This is station", station)
# the -1 in the length is because there is a weird last element for each station
for (date in 1:(length(df_net[[station]])-1)) {
#message("This is date", date)
temp <- df_net[[station]][[date]]
temp <- edge_density(temp)
df_net[[station]][[date]] <- temp
}
}
return(unlist2d(df_net))
}
#used in Redemption site
savetheplot_pliz.2 <- function(thepath, myplot, station){
ggsave(file = paste0(thepath, "Edge density - Station ",
station,
".png"),
plot = myplot,
device = "png")
}
# D. Super functions to rule them all ----
### Getmetheplot, a quick method to output degree density
getmetheplot_pliz <- function(site, df, thestation) {
df_s_n <- df[df$station == thestation,] %>%
arrange(Date, hour.in)
dataset_perday <- df_s_n %>%
ungroup() %>% # remove station grouping
group_by(Date) %>%
mutate(date_ymd = Date) %>%
group_map(~ mutate(.x,
gap_duration = calcGapDuration(hour.in, hour.out))) %>%
setNames(unique(sort(df_s_n$Date))) # names each daily data frame by its date
alldates <- names(dataset_perday)
dataset_perday.merged <- lapply(dataset_perday, function(i) mergevisits(i))
dataset_perday.merged.gd <- gapdurationlist(dataset_perday.merged)
pairvisits_list <- get_pairs_list(dataset_perday.merged.gd, mythreshold = 20)
graph_list <- lapply(pairvisits_list, function(i) graph_by_list_element(i,T))
ggg_simple <- lapply(graph_list, function(i) simplify(i))
ggg_simple <- createdegreeslist(ggg_simple)
ed_bydate <- sapply(ggg_simple, function(i) edge_density(i)) %>%
rbind(names(dataset_perday)) %>%
as.data.frame() %>%
transpose() %>%
rename(date = V2,
edgeDensity = V1)
plot <- ggplot(ed_bydate, aes(x = as.Date(date), y = as.numeric(edgeDensity))) + geom_point() +
scale_x_date(date_breaks = "2 weeks") +
ylab("Edge density") +
xlab("Date") +
ylim(0,1)+
ggtitle(label = "Edge density across time",
subtitle = paste(site,", Station", thestation))
return(plot)
}
# F.Importing from excel file ----
xlsx_to_dataframe <- function(filename, method = "farm_A_15rows"){
if(method == "farm_A_11rows"){
df <- read_excel(filename,
col_types = c("date", "text", "skip",
"numeric", "numeric", "numeric",
"numeric", "numeric", "text", "text",
"text"))
}
if(method == "farm_A_15rows"){
df <- read_excel(filename,
col_types = c("date", "text", "numeric",
"numeric", "numeric", "numeric",
"numeric", "date", "date", "date",
"numeric", "numeric", "text", "text",
"date"))
}
return(df)
}
# New version of harmonize 2022-07-12
# handles hours in and out and strings first, then converts to posix
# catches station or Station
harmonize_feeder_data <- function(df,
groupstations = T,
method = "farm_A_clean",
remove_filling = T,
remove_na = T) {
if(method == "farm_A_clean"){
if(names(df)[3] == "Station"){
names(df)[3] = "station"
}
dataset_raw <- df %>%
mutate(
id = as.factor(Tatouage),
station = as.factor(station), # assumes station is column 4
tempdate = as.character(as.Date(Date, origin = "1899-12-30")),
hour.in = paste(tempdate,as.character(hms::as_hms(hredeb))) %>%
as.POSIXct(),
hour.out = paste(tempdate,as.character(hms::as_hms(hrefin)))%>%
as.POSIXct(),
visit_dur_secs = duree1 %>%
substr(12, 19) %>%
as.difftime()) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(method == "farm_A_raw"){
# catches station in uppercase
if(names(df)[3] == "Station"){
names(df)[3] = "station"
}
dataset_raw <- df %>%
mutate(
id = as.factor(Tatouage),
station = as.factor(station),
tempdate = as.character(as.Date(Date, origin = "1899-12-30")),
hour.in = hredeb %>%
paste(tempdate,.) %>%
as.POSIXct(tz = ""),
hour.out = hrefin %>%
paste(tempdate,.) %>%
as.POSIXct(tz = "")
)
dataset_raw <- dataset_raw %>%
mutate(visit_dur_secs = as.difftime(hour.out - hour.in)) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(method == "deschambault"){
dataset_raw <- df %>%
mutate(
Date = Date_fin,
id = as.factor(Animal),
station = as.factor(Parc),
pdsdeb = Poids_aliment_debut,
pdsfin = Poids_aliment_fin,
cons = Qte_aliment,
tempdate = as.character(as.Date(Date_fin)),
hour.in = as.character(chron::times(Tdebut)),
hour.in = as.POSIXct(paste(tempdate,hour.in), tz=""),
hour.out = as.character(chron::times(Tfin)),
hour.out = as.POSIXct(paste(tempdate,hour.out), tz=""),
visit_dur_secs = as.difftime(Duree_s, units = "secs")
) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(remove_filling == T){
dataset_raw <- filter(dataset_raw, id != "FILLING")
}
if(remove_na == T){
dataset_raw <- drop_na(dataset_raw)
}
if(groupstations == T){
dataset_raw <- dataset_raw %>%
group_by(station) # groups by station so downstream summaries are per station
}
return(dataset_raw)
} # end harmonize()
# old version of harmonize. does not work well with dates
harmonize_feeder_data_ <- function(df,
groupstations = T,
method = "farm_A_clean",
remove_filling = T,
remove_na = T) {
if(method == "farm_A_clean"){
dataset_raw <- df %>%
mutate(
id = as.factor(Tatouage),
station = as.factor(station),
tempdate = as.character(as.Date(Date, origin = "1899-12-30")),
hour.in = as.character(chron::times(hredeb)),
hour.in = as.POSIXct(paste(tempdate,hour.in), tz=""),
hour.out = as.character(chron::times(hrefin)),
hour.out = as.POSIXct(paste(tempdate,hour.out), tz=""),
visit_dur_secs = as.difftime(duree1)) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(method == "farm_A_raw"){
dataset_raw <- df %>%
mutate(
id = as.factor(Tatouage),
station = as.factor(station),
tempdate = as.character(as.Date(Date)),
hour.in = as.character(chron::times(hredeb)),
hour.in = as.POSIXct(paste(tempdate,hour.in), tz=""),
hour.out = as.character(chron::times(hrefin)),
hour.out = as.POSIXct(paste(tempdate,hour.out), tz=""))
dataset_raw <- dataset_raw %>%
mutate(visit_dur_secs = as.difftime(hour.out - hour.in)) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(method == "deschambault"){
dataset_raw <- df %>%
mutate(
Date = Date_fin,
id = as.factor(Animal),
station = as.factor(Parc),
pdsdeb = Poids_aliment_debut,
pdsfin = Poids_aliment_fin,
cons = Qte_aliment,
tempdate = as.character(as.Date(Date_fin)),
hour.in = as.character(chron::times(Tdebut)),
hour.in = as.POSIXct(paste(tempdate,hour.in), tz=""),
hour.out = as.character(chron::times(Tfin)),
hour.out = as.POSIXct(paste(tempdate,hour.out), tz=""),
visit_dur_secs = as.difftime(Duree_s, units = "secs")
) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(remove_filling == T){
dataset_raw <- filter(dataset_raw, id != "FILLING")
}
if(remove_na == T){
dataset_raw <- drop_na(dataset_raw)
}
if(groupstations == T){
dataset_raw %>% # dropping NA values
group_by(station) # test 2022-02-26
}
return(dataset_raw)
} # end harmonize()
# deprecated
standardize <- function(df,
groupstations = T,
method = "farm_A_clean",
remove_filling = T,
remove_na = T) {
if(method == "farm_A_clean"){
dataset_raw <- df %>%
mutate(
id = as.factor(Tatouage),
station = as.factor(Station),
tempdate = as.character(as.Date(Date)),
hour.in = as.character(chron::times(hredeb)),
hour.in = as.POSIXct(paste(tempdate,hour.in), tz=""),
hour.out = as.character(chron::times(hrefin)),
hour.out = as.POSIXct(paste(tempdate,hour.out), tz=""),
visit_dur_secs = as.difftime(duree1)) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(method == "farm_A_raw"){
dataset_raw <- df %>%
mutate(
id = as.factor(Tatouage),
station = as.factor(station),
tempdate = as.character(as.Date(Date)),
hour.in = as.character(chron::times(hredeb)),
hour.in = as.POSIXct(paste(tempdate,hour.in), tz=""),
hour.out = as.character(chron::times(hrefin)),
hour.out = as.POSIXct(paste(tempdate,hour.out), tz=""))
dataset_raw <- dataset_raw %>%
mutate(visit_dur_secs = as.difftime(hour.out - hour.in)) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(method == "deschambault"){
dataset_raw <- df %>%
mutate(
Date = Date_fin,
id = as.factor(Animal),
station = as.factor(Parc),
pdsdeb = Poids_aliment_debut,
pdsfin = Poids_aliment_fin,
cons = Qte_aliment,
tempdate = as.character(as.Date(Date_fin)),
hour.in = as.character(chron::times(Tdebut)),
hour.in = as.POSIXct(paste(tempdate,hour.in), tz=""),
hour.out = as.character(chron::times(Tfin)),
hour.out = as.POSIXct(paste(tempdate,hour.out), tz=""),
visit_dur_secs = as.difftime(Duree_s, units = "secs")
) %>%
select(Date, id, station, pdsdeb, pdsfin, cons, hour.in, hour.out, visit_dur_secs)
}
if(remove_filling == T){
dataset_raw <- filter(dataset_raw, id != "FILLING")
}
if(remove_na == T){
dataset_raw <- drop_na(dataset_raw)
}
if(groupstations == T){
dataset_raw %>% # dropping NA values
group_by(station) # test 2022-02-26
}
return(dataset_raw)
}
# Same as previous but gives a nice dataframe ----
makeListPerStation <- function(df, thestation, doMerge = T) {
df_s_n <- df %>%
filter(station == thestation) %>%
arrange(Date, hour.in)
dataset_perday <- df_s_n %>%
ungroup() %>% # remove station grouping
group_by(Date) %>%
mutate(date_ymd = Date) %>%
group_map(~ mutate(.x,
gap_duration = calcGapDuration(hour.in, hour.out))) %>%
setNames(unique(sort(df_s_n$Date))) # names each daily data frame by its date
# if doMerge is true, then successive visits for the same pig are merged.
if(doMerge == "T"){
dataset_perday.merged <- lapply(dataset_perday, function(i) mergevisits(i))
dataset_perday.merged.gd <- gapdurationlist(dataset_perday.merged)
return(dataset_perday.merged.gd)
} else {
dataset_perday$id <- as.character(dataset_perday$id)
return(dataset_perday)
}
}
makeAllStationsPerdate <- function(df, domerge){
output <- list()
stations <- as.numeric(as.character(unique(df$station)))
for (i in seq_along(stations)) {
#message(c("Creating item ", i))
output[[i]] <- makeListPerStation(df, stations[i], doMerge = domerge)
}
names(output) <- stations
return(output)
}
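# Example: the result is a nested list indexed by station and then by date, e.g.
# output[["12"]][["2021-05-20"]] holds the visits for station 12 on that day,
# ready to be passed to makePairsPerStation().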
mergevisits <- function(df) {
output <- df %>%
filter(!is.na(gap_duration)) %>%
arrange(hour.in) %>%
mutate(id2 = rleid(id)) %>%
group_by(id2) %>%
summarise(id = first(id),
hour.in = first(hour.in),
hour.out = last(hour.out),
myDate = min(date_ymd)
)%>%
ungroup() %>%
select(id, myDate, hour.in, hour.out)
output$gap_duration <- calcGapDuration(output$hour.in, output$hour.out)
return(output)
}
# Variant that corrects per feeding bout ----
getmetheplot_pliz_but_corrected_this_time <- function(site, df, thestation) {
df_s_n <- df[df$station == thestation,] %>%
arrange(Date, hour.in)
dataset_perday <- df_s_n %>%
ungroup() %>% # remove station grouping
group_by(Date) %>%
mutate(date_ymd = Date) %>%
group_map(~ mutate(.x,
gap_duration = calcGapDuration(hour.in, hour.out))) %>%
setNames(unique(sort(df_s_n$Date))) # names each daily data frame by its date
alldates <- names(dataset_perday)
# Getting bouts per day to correct it later
boutsperday <- sapply(dataset_perday, function(i) nrow(i))
dataset_perday.merged <- lapply(dataset_perday, function(i) mergevisits(i))
dataset_perday.merged.gd <- gapdurationlist(dataset_perday.merged)
pairvisits_list <- get_pairs_list(dataset_perday.merged.gd, mythreshold = 20)
graph_list <- lapply(pairvisits_list, function(i) graph_by_list_element(i,T))
ggg_simple <- lapply(graph_list, function(i) simplify(i))
ggg_simple <- createdegreeslist(ggg_simple)
ed_bydate <- sapply(ggg_simple, function(i) edge_density(i)) %>%
rbind(names(dataset_perday)) %>%
as.data.frame() %>%
transpose() %>%
rename(date = V2,
edgeDensity = V1) %>%
mutate(corrected_ed = as.numeric(edgeDensity) / boutsperday)
plot <- ggplot(ed_bydate, aes(x = as.Date(date), y = as.numeric(corrected_ed))) + geom_point() +
scale_x_date(date_breaks = "2 weeks") +
ylab("Edge density/bouts per day") +
xlab("Date") +
#ylim(0,1)+
ggtitle(label = "Edge density across time",
subtitle = paste("Corrected daily amount of feeding bouts.",site,", Station", thestation))
return(plot)
}
# used in progressreportmarch ----
# DEPRECATED: it did not handle the lists very well, so I remade it in the
# function below
# populationPlot <- function(df_list){
# k <- data.frame()
#
# for (station_n in seq_along(df_list)) {
# f <- sapply(df_list[[station_n]], function(i) summarize(i, n = length(unique(id)))) %>%
# collapse::unlist2d() %>%
# reshape2::melt()
#
# l <- as.data.frame(cbind(thedate = names(df_list[[station_n]]), n = f$value))
# l$station_n <- names(df_list)[station_n]
# k <- rbind(k,l)
# }
#
#
# k$thedate <- as.Date(k$thedate)
# k$n <- as.numeric(k$n)
# k$station_n <- as.factor(k$station_n)
#
#
# ggplot(k, aes(x = thedate, y = n)) + geom_point(stat = "identity") +
# scale_x_date(date_breaks = "1 month") + facet_wrap(~station_n)
#
# }
populationPlot <- function(df_raw) {
populationdaily <- df_raw %>%
ungroup() %>%
filter(id != "FILLING") %>%
group_by(station, Date) %>%
summarise(n = length(unique(id)))
populationdaily$Date <- as.Date(populationdaily$Date)
# fills missing dates with 0
populationdaily <- populationdaily %>% pad %>% fill_by_value(value = 0)
ggplot(populationdaily, aes(x = Date, y = n)) + geom_line(stat = "identity", color ="red") +
scale_x_date(date_breaks = "1 month") + facet_wrap(~station) +
theme(axis.text.x = element_text(angle = 90,
vjust = 0.5,
hjust = 1)) +
ylim(0, max(populationdaily$n))
}
plotConsData <- function(df, method, thefarm) {
if(method == "group"){
theplot <- ggplot(data = df, aes(x = as.Date(Date))) +
geom_line(colour = "red", aes(y = daily.cons.group)) +
scale_x_date(date_breaks = "1 month") +
facet_wrap(~station) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
ylab("total daily consumption (kg)") +
ggtitle(paste0("Whole group daily total consumption, ", thefarm," farm"))
return(theplot)
}
if(method == "mean"){
theplot <- ggplot(data = df, aes(x = as.Date(Date))) +
geom_line(colour = "red", aes(y = mean.daily.cons)) +
scale_x_date(date_breaks = "1 month") +
facet_wrap(~station) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
ylab("mean daily consumption/pig (kg)") +
ggtitle(paste0("Mean group consumption, ", thefarm," farm"))
return(theplot)
}
}

14
setup/loadlibraries.R Normal file

@@ -0,0 +1,14 @@
library(openxlsx)
library(tidyverse)
library(lubridate)
library(data.table)
library(igraph)
library(ggplot2)
library(stringr)
library(collapse)
library(readxl)
library(ggthemr)
library(padr)
library(egg)
library(chron)