---
title: "Demo for the presentation 'Getting insights from automatic feeder data'"
author:
  - Luis Gonzalez-Gracia (lagog6@ulaval.ca)
  - Twitter - @gonzaluisandres
date: "`r format(Sys.time(), '%d %B %Y')`"
tags: [social network analysis, animal science]
abstract: |
  This short R Notebook walks through the workflow shown in my presentation today.

  We will load data from two different sources and then do some wrangling,
  visualization, and analysis.

  The script is also available as the `demo.R` file in this repository.
output:
  pdf_document: default
  html_notebook: default
editor_options:
  chunk_output_type: inline
---

# How to get and run these files on your own computer

![Some of you might be too young to get the reference](img/zoolanderfilesincomputer.jpg)

You can run and tweak this code by cloning my repository. If you do not know what
a repository is, I recommend reading about it, especially if you want to work
collaboratively with other people.

1. Install Git: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
2. Choose the folder where you want the project folder to be created.
3. Open that folder in a terminal and type:

```{bash, eval=FALSE}
git clone https://git.disroot.org/luangonz/feeder-analytics-demo.git
```

Git will automatically create a `feeder-analytics-demo` folder and download all
the files required to run this code.

# Setup

Since the goal of this project is to be as modular and automated as possible,
I created a set of custom functions that do the heavy lifting in the background
without cluttering the script file. This has advantages and disadvantages: the
script is easier to read and the workflow is easy to follow, but when issues
appear, tracing them back to the function that caused them takes more time.

The project is organized into the main project files and two folders:

- The `data` folder holds the data files we will load into the environment.
- The `setup` folder (used in this section) holds two key files:
    - `functions.R`, which holds all the custom functions.
    - `loadlibraries.R`, which loads all the libraries needed by the project's
      script files (so you do not need to copy and paste them into every file
      of the project).

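The contents of `loadlibraries.R` are not shown here. As a rough sketch only, a file like it could contain a helper similar to the one below. The helper name `load_packages()` is hypothetical, and the only package this demo names explicitly is `igraph`.

```r
# Hypothetical sketch of a helper that a file like loadlibraries.R could
# contain; the real file in this repository may look different.
load_packages <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() quietly returns FALSE if the package is not installed
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # attach the package so its functions can be used without the pkg:: prefix
    library(pkg, character.only = TRUE)
  }
  # TRUE for every package that is now attached to the search path
  invisible(pkgs %in% .packages())
}

# Example call (igraph is the one package this demo names explicitly):
# load_packages(c("igraph"))
```

Keeping this list in one file means each script only needs a single `source()` call to get the same set of packages.
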
When we run the `source()` function at this step, we are loading both the libraries
and the functions that we are going to use throughout the demo.

```{r setup, echo=TRUE, message=FALSE, warning=FALSE}
source("setup/functions.R")
source("setup/loadlibraries.R")
```

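One way to keep the custom functions out of the global environment is base R's `sys.source()`, which evaluates a file into an environment you choose. This is an alternative sketch and not how this demo loads them. The temporary file below stands in for `setup/functions.R`, and `double_it()` and `square_it()` are made-up examples.

```r
# Sketch: evaluate the custom-function file into its own environment rather
# than the global one. The temporary file stands in for setup/functions.R.
fun_file <- tempfile(fileext = ".R")
writeLines(c(
  "double_it <- function(x) x * 2",  # made-up example function
  "square_it <- function(x) x^2"     # made-up example function
), fun_file)

custom_fns <- new.env()
sys.source(fun_file, envir = custom_fns)

ls(custom_fns)            # lists every custom function that was loaded
custom_fns$double_it(21)  # calls one of them explicitly
```

When something fails inside one of these functions, calling `traceback()` right after the error shows the chain of calls that led to it. This helps with the debugging drawback of the modular setup.
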
# Importing data

The first step is to load the Excel file into the environment. Some packages
import Excel files natively and are very straightforward to use, but some of
them do not handle date information properly. For this I have a custom function
that takes care of some common issues. Its `method` parameter lets me fine-tune
the import according to where the data comes from.

```{r}
farmA <- xlsx_to_dataframe(filename = "data/farm_A_demo.xlsx", # selects the file
                           method = "farm_A_11rows")           # selects the farm A method
```

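The internals of `xlsx_to_dataframe()` are not shown here, so the following is an illustration only. A common way Excel dates go wrong is that date-time cells arrive as serial numbers: days since 1899-12-30, with the time of day stored as a fraction. A minimal base-R conversion might look like this (the helper name is hypothetical):

```r
# Illustrative sketch (hypothetical helper): convert Excel serial date-times,
# i.e. days since 1899-12-30 with the time of day as a fraction, to POSIXct.
# tz = "UTC" keeps the wall-clock digits exactly as they appear in the sheet.
excel_serial_to_posixct <- function(serial, tz = "UTC") {
  # 86400 seconds per day, counted from Excel's (Windows) day zero
  as.POSIXct(serial * 86400, origin = "1899-12-30", tz = tz)
}

excel_serial_to_posixct(c(44350, 44350.5))
```

Workbooks saved with the 1904 date system (older Mac versions of Excel) would need `origin = "1904-01-01"` instead.
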
Luckily for us, farm B's data is already stored in the RData format, which makes
importing much easier: a simple call to `load()` and the data is there.

```{r}
load("data/farmB.RData") # load the file into the environment

farmB <- data_alim       # rename the object to a more descriptive name
rm(data_alim)            # remove the original imported data frame
```

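The rename-and-remove step relies on knowing that the file contains an object called `data_alim`. A small alternative sketch avoids that assumption by using the fact that `load()` returns the names of the objects it restores. It loads into a separate environment so nothing leaks into the workspace. The helper name is hypothetical.

```r
# Sketch (hypothetical helper): load an .RData file into a temporary
# environment and return the single object it contains, whatever its name.
load_single_object <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the restored names
  if (length(obj_names) != 1) {
    stop("Expected exactly one object in ", path, ", found: ",
         paste(obj_names, collapse = ", "))
  }
  env[[obj_names]]
}

# Usage with the demo file (not run here):
# farmB <- load_single_object("data/farmB.RData")
```
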
# Checking structure of data

We will look at the raw data exactly as it comes out of the import step.

```{r}
str(farmA)
```

We see some issues of concern. For example, the start time of each visit is not
stored as a proper date/time type but only as a character string. Let's check
farm B:

```{r}
str(farmB)
```

Farm B has similar issues with the time, and the variable names also differ
between the two data sets, which makes it hard to handle both with a single
piece of code. The strategy, then, is to take this (or any) kind of data frame
and standardize it into a format that all of the following functions can work
with. The next step is therefore standardization.

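Fixing the character timestamps mostly comes down to parsing them with an explicit format. Base R's `as.POSIXct()` with a `format` argument converts the strings and returns `NA` for any that do not match, which makes malformed rows easy to find. The column name `start_time`, the format string, and the sample values below are assumptions for illustration, not the actual farm A or farm B columns.

```r
# Sketch with a hypothetical column name, format and values
visits <- data.frame(
  start_time = c("2021-06-03 08:15:02", "2021-06-03 08:17:45", "03/06/2021 08:20"),
  stringsAsFactors = FALSE
)

# Parse with an explicit format; strings that do not match become NA
visits$start_parsed <- as.POSIXct(visits$start_time,
                                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Show the raw strings that failed to parse
visits$start_time[is.na(visits$start_parsed)]
```
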
# Standardization

The `harmonize_feeder_data()` function is a custom function that funnels any
kind of source file into a single, homogeneous data structure, so it can be fed
into the following functions in the workflow. Besides the data frame, it takes
these parameters:

- `groupstations`: if TRUE, the station number becomes a group in the data frame
  (useful for summaries).
- `method`: selects the method to use, according to the source the data frame
  comes from.
- `remove_filling`: if TRUE, removes the FILLING events of the feeder (when the
  feeder is filled up).
- `remove_na`: if TRUE, removes missing data that might interfere with some of
  the calculations.

```{r}
farmB_standard <- harmonize_feeder_data(farmB,
                                        groupstations = TRUE,
                                        method = "deschambault")

farmA_standard <- harmonize_feeder_data(farmA,
                                        groupstations = TRUE,
                                        method = "farm_A_raw",
                                        remove_filling = TRUE,
                                        remove_na = TRUE)
```

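The code of `harmonize_feeder_data()` is not reproduced here. One minimal way to sketch the idea behind its `method` selector is a named lookup table that maps each source's column names to a single standard set. The method names `farm_x` and `farm_y` and all column names below are invented for illustration.

```r
# Hypothetical sketch: one map per data source, from source column names
# (left) to the standard names (right). All names here are invented.
column_maps <- list(
  farm_x = c(Animal = "animal_id", Feeder = "station", Start = "start"),
  farm_y = c(tag = "animal_id", feeder_nb = "station", visit_begin = "start")
)

standardize_columns <- function(df, method) {
  map <- column_maps[[method]]
  if (is.null(map)) stop("Unknown method: ", method)
  missing_cols <- setdiff(names(map), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns for method '", method, "': ",
         paste(missing_cols, collapse = ", "))
  }
  out <- df[, names(map), drop = FALSE]  # keep only the mapped columns
  names(out) <- unname(map)              # rename them to the standard names
  out
}
```

Once two data frames share the same column names, they can be stacked with `rbind()`. Later functions in a workflow like this one can rely on that kind of homogeneity.
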
We will check the structure again to see if everything is in order:

```{r echo=TRUE}
str(farmA_standard)
```

```{r echo=TRUE}
str(farmB_standard)
```

With this function we've managed to homogenize the data structure, so we can now
move on to the next step.

# Inspecting data integrity

We'll run some more custom functions to plot the data and check its integrity.

## Population plot

Farm A has 22 different pens. It would be valuable to see whether there are any
issues with the population of these pens, for example a sudden reduction or
increase in size, or a sudden drop caused by data loss from the hardware.

### Population plot of farm A

```{r message=FALSE, warning=FALSE}
populationPlot(farmA_standard)
```

### Population plot of farm B

```{r}
populationPlot(farmB_standard)
```

These plots show some pen size fluctuations and some data loss in certain
periods. These losses will need to be taken into account during the analyses.

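The summary behind a population plot can be sketched as the number of distinct animals seen per station and day. The version below also flags days when that count falls below half of the previous recorded day's count. The column names (`station`, `date`, `animal_id`) and the 50% threshold are assumptions, not the internals of `populationPlot()`.

```r
# Hypothetical sketch: distinct animals per station and day, plus a flag for
# days whose count falls below `drop_ratio` times the previous recorded day's.
daily_population <- function(visits, drop_ratio = 0.5) {
  pop <- aggregate(animal_id ~ station + date, data = visits,
                   FUN = function(ids) length(unique(ids)))
  names(pop)[names(pop) == "animal_id"] <- "n_animals"
  pop <- pop[order(pop$station, pop$date), ]
  # previous recorded day's count within the same station (NA for the first)
  prev <- ave(pop$n_animals, pop$station,
              FUN = function(n) c(NA, head(n, -1)))
  pop$sudden_drop <- !is.na(prev) & pop$n_animals < drop_ratio * prev
  pop
}
```
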
## Visualizing visits to the feeder

With the custom function `visitPlotsDay()`, we can visualize a timeline of
visits to the feeder for any station and day:

```{r}
visitPlotsDay(farmA_standard,
              thedate = "2021-06-03",
              thestation = 11,
              singlestrip = FALSE)
```

The previous plot has one line per pig, but sometimes it is useful to see all the
visits on a single line. This is what the `singlestrip` parameter is for.

```{r}
visitPlotsDay(farmA_standard,
              thedate = "2021-06-03",
              thestation = 11,
              singlestrip = TRUE)
```

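A timeline like this can be drawn in base graphics. The sketch below draws one thick segment per visit, placed either on one row per animal or all on a single strip, which mirrors the idea of `singlestrip`. The columns `animal_id`, `start` and `end` (POSIXct) are assumptions, and this is not the code of `visitPlotsDay()`.

```r
# Sketch (assumed columns: animal_id, start, end as POSIXct): draw one thick
# segment per visit, either one row per animal or all on a single strip.
plot_visit_timeline <- function(visits, single_strip = FALSE) {
  animals <- sort(unique(visits$animal_id))
  # row of each visit: the index of its animal, or 1 for a single strip
  y <- if (single_strip) rep(1L, nrow(visits)) else match(visits$animal_id, animals)
  x_range <- range(c(visits$start, visits$end))
  # empty canvas with time (in seconds) on x; the axis is labelled as date-times
  plot(as.numeric(x_range), c(0.5, max(y) + 0.5), type = "n",
       xaxt = "n", yaxt = "n", xlab = "Time",
       ylab = if (single_strip) "" else "Animal")
  axis.POSIXct(1, x = x_range)
  if (!single_strip) axis(2, at = seq_along(animals), labels = animals, las = 1)
  segments(as.numeric(visits$start), y, as.numeric(visits$end), y, lwd = 6)
  invisible(y)
}
```
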
## A bird's-eye view of all the data for a station

The `inspectDay()` function shows the visits to a feeder for the whole period in
a single plot. It can also show a population plot similar to the one in the
previous section.

```{r}
inspectDay(farmA_standard, thestation = 11)
```

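A numeric counterpart to this bird's-eye plot is a count of visits per day at one station, which makes gaps in the record easy to spot. The following is a minimal sketch that assumes hypothetical `station` and `start` (POSIXct) columns.

```r
# Sketch (assumed columns: station, start as POSIXct): visits per day at one
# station, using the start time's own time zone to decide the day.
daily_visit_counts <- function(visits, station_id) {
  at_station <- visits[visits$station == station_id, ]
  table(day = format(at_station$start, "%Y-%m-%d"))
}
```

Days with no recorded visits simply do not appear in the table, so missing dates are the thing to look for.
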
# Building network visualizations and analysis

## Building the igraph objects and plotting

The following steps successively convert the data into network objects from the
`igraph` package.

```{r}
farmA_list <- makeAllStationsPerdate(farmA_standard, domerge = "F")

farmA_pairs <- makePairsPerStation(farmA_list, mythreshold = 5) # this step takes a long time

farmA_network <- makeIGraphObjects(farmA_pairs, directed = TRUE)

plot(farmA_network[["12"]][["2021-05-20"]]) # network for station 12 on 2021-05-20
```

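This section does not describe how `makePairsPerStation()` defines a pair or what unit `mythreshold` uses. As a hypothetical sketch of one common approach in feeder studies, a directed edge A -> B can be recorded whenever animal B starts a visit within `threshold` seconds after animal A's visit ended. The threshold unit and the column names are assumptions.

```r
# Hypothetical sketch, NOT necessarily what makePairsPerStation() does:
# record an edge A -> B when animal B starts a visit at the same feeder within
# `threshold` seconds after animal A's visit ended. Assumed columns:
# animal_id, start, end (POSIXct), all for a single station and day.
make_pairs <- function(visits, threshold = 5) {
  v <- visits[order(visits$start), ]
  n <- nrow(v)
  if (n < 2) {
    return(data.frame(from = character(0), to = character(0)))
  }
  prev <- seq_len(n - 1)  # each visit...
  nxt <- prev + 1         # ...and the one that follows it
  gap <- as.numeric(difftime(v$start[nxt], v$end[prev], units = "secs"))
  keep <- gap >= 0 & gap <= threshold &
    v$animal_id[nxt] != v$animal_id[prev]  # ignore an animal following itself
  data.frame(from = as.character(v$animal_id[prev][keep]),
             to = as.character(v$animal_id[nxt][keep]),
             stringsAsFactors = FALSE)
}
```

If `igraph` is installed, `igraph::graph_from_data_frame(pairs, directed = TRUE)` turns an edge list like this into a graph object.
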
## Making summarizations based on the network data

The following steps analyze how a whole-network parameter, the
[Network Density](https://methods.sagepub.com/reference/the-sage-encyclopedia-of-educational-research-measurement-and-evaluation/i14550.xml),
progresses through time. There appears to be a downward trend in the group we
are studying.

```{r}
getmetheplot_pliz(site = "Farm A",
                  df = farmA_standard,
                  thestation = 12)
```

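For a directed network without self-loops, density is the number of distinct directed edges $m$ divided by the number of possible ones, $D = m / (n(n-1))$, where $n$ is the number of animals. The sketch below computes it from an edge list and then estimates the daily trend as the slope of a simple linear model. The helper names and the `day` and `density` columns are hypothetical. For an `igraph` object, `igraph::edge_density()` gives the same quantity for simple graphs.

```r
# Sketch with hypothetical helper names.
# Density of a directed network without self-loops: m / (n * (n - 1)),
# with m = distinct directed edges and n = number of animals (nodes).
directed_density <- function(edges, n_nodes) {
  edges <- unique(edges[edges$from != edges$to, c("from", "to")])
  nrow(edges) / (n_nodes * (n_nodes - 1))
}

# Daily trend: slope of density against day from a simple linear model;
# `daily` is assumed to have one row per day with columns day and density.
density_trend <- function(daily) {
  unname(coef(lm(density ~ day, data = daily))["day"])
}
```

A negative slope is what a downward trend like the one in the plot would look like numerically.
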
The reason for this trend is not clear. It could be that these animals are
learning to avoid each other. Another possible explanation is that the animals
visit the feeder less often as they grow, so there are fewer chances for them to
interact with each other. The `getmetheplot_pliz_but_corrected_this_time()`
function corrects the network density by the number of times the animals visit
the feeder.

```{r}
getmetheplot_pliz_but_corrected_this_time(site = "Farm A",
                                          df = farmA_standard,
                                          thestation = 5)
```

Even with this correction, we still see a downward trend.

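The correction used by `getmetheplot_pliz_but_corrected_this_time()` is not shown here. A different way to ask the same question is covariate adjustment, which is swapped in here for illustration only. The idea is to fit density against day while also including the daily number of feeder visits, then check whether the day effect stays negative. The `day`, `visits` and `density` columns are assumptions.

```r
# Illustration of covariate adjustment, swapped in for this example; it is
# NOT the correction used by getmetheplot_pliz_but_corrected_this_time().
# `daily` is assumed to have one row per day with columns day, visits, density.
adjusted_trend <- function(daily) {
  fit <- lm(density ~ day + visits, data = daily)
  # estimate and p-value of the day effect after accounting for visits
  coef(summary(fit))["day", c("Estimate", "Pr(>|t|)")]
}
```

A negative day estimate would be consistent with the trend not being explained by visit frequency alone. With daily data, however, the residuals are likely autocorrelated, so the p-value should be read cautiously.
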
Thanks for reading all of this! You can reach me on Microsoft Teams or by e-mail
at *lagog6@ulaval.ca*.