diff --git a/Detecting_Bitcoin_Ransomware.Rmd b/Detecting_Bitcoin_Ransomware.Rmd
index 60091ab..3f67712 100644
--- a/Detecting_Bitcoin_Ransomware.Rmd
+++ b/Detecting_Bitcoin_Ransomware.Rmd
@@ -188,8 +188,6 @@ The original research team downloaded and parsed the entire Bitcoin transaction
 5. Visualize clustering to analyze results further.
 6. Generate confusion matrix to quantify results.
 
----
-
 ## Data Analysis
 
 ### Hardware Specification
@@ -621,8 +619,6 @@ It appears that, although the `r selected_features[1]` distribution for ransomw
 
 After visually and numerically exploring the data, it becomes clear what the challenge is. Ransomware-related addresses are very sparse, comprising `r ransomprop*100`% of all addresses. This small percentage is also further classified into 28 groups. Perhaps the original paper was a overly ambitious in trying to categorize all the addresses into 29 categories, including the vastly prevalent *white* addresses. To simplify our approach, we will categorize the addresses in a binary way: as either *white* or *black*, where *black* signifies an association with ransomware transactions. Asking this as a "ransomware or not-ransomware" question allows for application of methods that have been shown to be impractical otherwise.
 
----
-
 ## Modeling approach
 
 Akcora, et al. applied a Random Forest approach to the data; however "Despite improving data scarcity, [...] tree based methods (i.e., Random Forest and XGBoost) fail to predict any ransomware family".$^{[3]}$ Considering all ransomware addresses as belonging to a single group may help to improve the predictive power of such methods, making Random Forest worth another try.
@@ -1133,8 +1129,6 @@ add.cluster.boundaries(som_model2, som.cluster$cluster)
 
 ```
 
----
-
 ## Results & Performance
 
 ### Results
@@ -1163,9 +1157,8 @@ add.cluster.boundaries(som_model2, som.cluster$cluster)
 - RAM: DDR4 8080MB (8 GB)
 
 This is a single board computer / development board, which runs the same software as the others (ported to `aarch64`), except for Rstudio. It is of personal interest to benchmark a modern 64-bit ARM processor in addition to the two Intel CPUs. The script runs in about 860 seconds on this platform, nearly half of that for the Atom processor above. Still not fast enough to analyze each block in real time, but a significant improvement given the low power usage of such processors.
-
----
-
+
+
 ## Summary
 
 ### Comparison to results from original paper
diff --git a/Detecting_Bitcoin_Ransomware.pdf b/Detecting_Bitcoin_Ransomware.pdf
index f7c713c..8a7af4f 100644
Binary files a/Detecting_Bitcoin_Ransomware.pdf and b/Detecting_Bitcoin_Ransomware.pdf differ