Final edit, fixed pagebreaks.

shelldweller 2021-11-13 21:38:06 -07:00
parent f6c46cd79d
commit 5f6f857031
2 changed files with 2 additions and 9 deletions


@@ -188,8 +188,6 @@ The original research team downloaded and parsed the entire Bitcoin transaction
5. Visualize clustering to analyze results further.
6. Generate confusion matrix to quantify results.
---
## Data Analysis
### Hardware Specification
@@ -621,8 +619,6 @@ It appears that, although the `r selected_features[1]` distribution for ransomw
After visually and numerically exploring the data, it becomes clear what the challenge is. Ransomware-related addresses are very sparse, comprising `r ransomprop*100`% of all addresses. This small percentage is also further classified into 28 groups. Perhaps the original paper was overly ambitious in trying to categorize all the addresses into 29 categories, including the vastly prevalent *white* addresses. To simplify our approach, we will categorize the addresses in a binary way: as either *white* or *black*, where *black* signifies an association with ransomware transactions. Asking this as a "ransomware or not-ransomware" question allows for application of methods that have been shown to be impractical otherwise.
---
## Modeling approach
Akcora, et al. applied a Random Forest approach to the data; however "Despite improving data scarcity, [...] tree based methods (i.e., Random Forest and XGBoost) fail to predict any ransomware family".$^{[3]}$ Considering all ransomware addresses as belonging to a single group may help to improve the predictive power of such methods, making Random Forest worth another try.
@@ -1133,8 +1129,6 @@ add.cluster.boundaries(som_model2, som.cluster$cluster)
```
---
## Results & Performance
### Results
@@ -1163,9 +1157,8 @@ add.cluster.boundaries(som_model2, som.cluster$cluster)
- RAM: DDR4 8080MB (8 GB)
This is a single board computer / development board, which runs the same software as the others (ported to `aarch64`), except for RStudio. It is of personal interest to benchmark a modern 64-bit ARM processor in addition to the two Intel CPUs. The script runs in about 860 seconds on this platform, nearly half of that for the Atom processor above. Still not fast enough to analyze each block in real time, but a significant improvement given the low power usage of such processors.
---
## Summary
### Comparison to results from original paper

Binary file not shown.