# Craplog CLI
Parse Apache2 logs to create statistics
<br/>

![logo](https://git.disroot.org/elB4RTO/screenshots/raw/branch/main/Craplog/craplogo.png)

<br/>

## Table of contents
- [Overview](#overview)
- [Installation and execution](#installation-and-execution)
  - [Dependencies](#dependencies)
  - [Run without installation](#run-without-installation)
  - [Run with installation](#run-with-installation)
- [Usage](#usage)
  - [Arguments](#arguments)
  - [Examples](#examples)
  - [Tools examples](#tools-examples)
  - [Output control](#output-control)
- [How to configure](#how-to-configure)
  - [Crapset](#crapset)
  - [Configuration files](#configuration-files)
  - [Hardcoded values](#hardcoded-values)
- [How to update](#how-to-update)
  - [Crapup](#crapup)
  - [Self-service](#self-service)
- [Logs](#logs)
  - [Usage control](#usage-control)
  - [Log files](#log-files)
  - [Logs path](#logs-path)
  - [Logs structure](#logs-structure)
- [Statistics](#statistics)
  - [Storage](#storage)
  - [Examined fields](#examined-fields)
  - [Sessions statistics](#sessions-statistics)
  - [Global statistics](#global-statistics)
  - [Whitelist](#whitelist)
- [View statistics](#view-statistics)
  - [Crapview](#crapview)
- [Extra features](#extra-features)
  - [Crapstats converter](#crapstats-converter)
- [Final considerations](#final-considerations)
  - [Estimated working speed](#estimated-working-speed)
  - [Backups](#backups)
- [Contributions](#contributions)

<br/><br/>
## Overview
Craplog is a tool that takes Apache2 logs in their default form, parses them and creates simple statistics.
<br/>

Welcome to the **command line** version

![screenshot](https://git.disroot.org/elB4RTO/screenshots/raw/branch/main/Craplog/CLI/craplog.png)

<br/>

Searching for something different? Try the [other versions of CRAPLOG](https://git.disroot.org/elB4RTO/CRAPLOG#official-versions).

<br/><br/>

## Installation and execution
### Dependencies
*None*
<br/>

### Run without installation
- Download and un-archive this repo
<br/>*or*<br/>
```
git clone https://git.disroot.org/elB4RTO/craplog-CLI.git
```
<br/>

- Open a terminal inside "*craplog-CLI-main/*"
<br/>*or*<br/>
```
cd craplog-CLI/
```
<br/>

- Run Craplog with the Python 3 interpreter:
<br/>
```
python3 craplog/craplog.py --help
```
<br/>

<br/>

### Run with installation
- Download and un-archive this repo
<br/>*or*<br/>
```
git clone https://git.disroot.org/elB4RTO/craplog-CLI
```
<br/>

- Open a terminal inside "*craplog-CLI-main*"
<br/>*or*<br/>
```
cd craplog-CLI/
```
<br/>

- Run the installation script:
<br/>
```
chmod +x ./install.sh && ./install.sh
```
<br/>

- You can now run Craplog from the terminal like any other application (you don't need to be in Craplog's folder):
<br/>
```
craplog --help
```
<br/>

<br/><br/>

## Usage
### Syntax
`craplog [TOOL] {[OPTION] [ARGUMENT]}`

<br/>

### Tools

| Tool | Description |
| -----: | :---------- |
| *log* | Craplog: make statistics from the logs<br/>*Implicit, can be omitted* |
| view | Crapview: view your statistics |
| setup | Crapset: configure these tools |
| update | Crapup: check for updates |

<br/><br/>

### Arguments

| Abbr. | Option | Additional | Description |
| :-: | -----------------: | :------------- | :-- |
| -h | --help | | prints a help screen |
| | --examples | | prints usage examples |
| -l | --less | | less output on screen |
| -m | --more | | more output on screen |
| -p | --performance | | prints performance data |
| | --no-colors | | prints text without using colors |
| | --auto-delete | | auto-chooses to delete files/folders |
| | --auto-merge | | auto-chooses to merge sessions having the same date |
| -e | --errors | | makes statistics of error logs too |
| -eO | --only-errors | | only uses error logs (doesn't parse access logs) |
| -gO | --only-globals | | only updates globals (doesn't store sessions) |
| -gA | --avoid-globals | | does not update global statistics |
| -b | --backup | | stores a backup copy of the original log files |
| -bT | --backup-tar | | stores the backup as a compressed tar.gz archive |
| -bZ | --backup-zip | | stores the backup as a compressed zip archive |
| -dO | --delete-originals | | deletes the original log files when done |
| | --trash | *&lt;path&gt;* | moves files to Trash instead of removing them<br/>*&lt;path&gt;* is optional: if omitted, the default will be used |
| | --shred | | uses shred on files instead of removing them |
| -P | --logs-path | *&lt;path&gt;* | path of the directory where the logs are located |
| -F | --log-files | *&lt;list&gt;* | list of log files to use (names, NOT paths)<br/>*&lt;list&gt;*: whitespace-separated file names |
| -A | --access-fields | *&lt;list&gt;* | list of fields to use while parsing access logs<br/>*&lt;list&gt;*: whitespace-separated fields |
| -W | --ip-whitelist | *&lt;list&gt;* | doesn't parse log lines from these IPs<br/>*&lt;list&gt;*: whitespace-separated IPs |

<br/><br/>

### Examples
<br/>

- Uses the default log files as input, including errors (access logs are used by default). Stores a backup copy of the original files as a *tar.gz* compressed archive, without deleting them. Moves files to trash if needed (instead of complete deletion). Global statistics will be updated by default.
```
craplog -e -bT --trash
```
<br/>

- Like the one above, but only parses errors (not access logs). Stores a backup copy of the original files as a *zip* compressed archive, without deleting them. Shreds files if needed (instead of normal deletion). Global statistics will be updated by default.
```
craplog -eO -bZ --shred
```
<br/>

- Uses user-defined access and/or error log files from an alternative logs path. Automatically merges sessions having the same date if needed.
```
craplog -e -P /your/logs/path -F file.log.2 file.log.3.gz --auto-merge
```
<br/>

- Uses the default log files for both access and error logs. Uses a whitelist for IPs and a selection of which access fields to parse.
```
craplog -e -W ::1 192.168. -A REQ RES
```
<br/>

- Prints more information on screen, including performance details. Uses the default access log file but only updates globals, not sessions.
```
craplog -m -p -gO
```
<br/>

- Prints less information on screen, including performance details but without using colors. Uses the default access and error log files, but does not update globals. Makes a backup copy of the original files used and deletes them when done.
```
craplog -l -p --no-colors -e -gA -b -dO
```

<br/>

### Tools examples
***Warning***: *the following syntax only works **with installation**. If you're using Craplog without installing it, you'll have to run the tools individually. Further information can be found in the relevant sections.*

<br/>

- Make a *version check* query.
```
craplog update
```
<br/>

- View your statistics.
```
craplog view
```
<br/>

- Set up Craplog's tools.
```
craplog setup
```
<br/>

<br/><br/>

### Output control
You can control the output on screen: the quantity of information printed, the performance details and the use of colors.<br/><br/>

![output diffs](https://git.disroot.org/elB4RTO/screenshots/raw/branch/main/Craplog/CLI/output_diff.png)

*Normal output vs Less output*

<br/><br/><br/>

## How to configure
Sometimes it's annoying to keep remembering and passing arguments, I know. This is why Craplog gives you the possibility to customize the way it gets ready to do its job.
There's actually more than one way you can customize Craplog's settings: using the [configuration tool](#crapset), editing the [configuration files](#configuration-files) or the [hardcoded values](#hardcoded-values).

<br/>The configuration files override the hardcoded values and are in turn overridden by the command-line arguments, so the configuration **hierarchy**, from lowest to highest priority, is as follows:
- Hardcoded values
- Configuration files
- Command-line arguments

<br/>You can also **lock** a configuration method to disable it, for example discarding any command-line argument or not reading the configuration files. Further information can be found while following one of the procedures listed above.
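
The hierarchy can be pictured as successive dictionary updates. This is only a sketch under assumptions: the setting names (`logs_path`, `use_errors`) and the `resolve_settings` helper are hypothetical and are not taken from Craplog's code. Passing `None` for a source stands in for a *locked* method.

```python
# Hypothetical illustration of the precedence order; not Craplog's real code.

# Lowest-priority source: values written directly in the script.
HARDCODED = {"logs_path": "/var/log/apache2/", "use_errors": False}


def resolve_settings(hardcoded, config_file=None, cli_args=None):
    """Merge the three sources; later sources override earlier ones.

    A source passed as None is skipped, which mimics a locked method.
    """
    settings = dict(hardcoded)           # lowest priority
    settings.update(config_file or {})   # overrides the hardcoded values
    settings.update(cli_args or {})      # highest priority
    return settings


if __name__ == "__main__":
    from_file = {"use_errors": True}
    from_cli = {"logs_path": "/your/logs/path"}
    print(resolve_settings(HARDCODED, from_file, from_cli))
```

Only the keys that a higher-priority source actually sets are replaced, so a configuration file can change one value while leaving every other hardcoded default in place.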

<br/>

### Crapset
**Crapset** is a utility to easily and safely customize Craplog.<br/><br/>
With Craplog installed:
```
craplog setup
```
<br/>

Without Craplog installed *(from the main folder)*:
```
python3 crapset/crapset.py
```

<br/>

### Configuration files
You can manually edit the [configuration files](https://git.disroot.org/elB4RTO/craplog-CLI/src/branch/main/crapconf) you need.<br/>

Files can be found inside **craplog-CLI/crapconfs/**

<br/>

### Hardcoded values
You can directly modify the scripts' hardcoded variables to set pre-defined initialization values:
- **Craplog** -> line [**117**](https://git.disroot.org/elB4RTO/craplog-CLI/src/branch/main/craplog/craplog.py#L117) **@** *craplog-CLI/craplog/craplog.py*
- **Crapview** -> line [**21**](https://git.disroot.org/elB4RTO/craplog-CLI/src/branch/main/crapview/crapview.py#L21) **@** *craplog-CLI/crapview/crapview.py*
- **Crapup** -> line [**66**](https://git.disroot.org/elB4RTO/craplog-CLI/src/branch/main/crapup/crapup.py#L66) **@** *craplog-CLI/crapup/crapup.py*
- **Crapset** -> line [**32**](https://git.disroot.org/elB4RTO/craplog-CLI/src/branch/main/crapset/crapset.py#L32) **@** *craplog-CLI/crapset/crapset.py*

<br/><br/><br/>

## How to update
You can check for updates with the [updater tool](#crapup) or, alternatively, you can always do a [manual update](#self-service).

<br/>

### Crapup
**Crapup** allows you to query the updates in two different ways: a simple [version check](#version-check) query, or an actual update through [git pull](#git-pull).<br/><br/>
With Craplog installed:
```
craplog update
```
<br/>

Without Craplog installed *(from the main folder)*:
```
python3 crapup/crapup.py
```
<br/>

#### Version check
This is the **default** method.
**Crapup** will check for a version update through a simple GET request to the [version check file](https://git.disroot.org/elB4RTO/craplog-CLI/blob/main/version_check) in this repository.

Nothing will be downloaded or updated: it only queries this repo's version and reports the result. You'll have to [manually download and apply the changes](#self-service).<br/><br/>

#### Git pull
This is the **suggested** method, since it's fast, reliable and easy.
**Crapup** will update your local version by pulling directly from this repo.
You can also perform this procedure manually if you want, by following the [update with git](#git-update) guide.

<br/><br/>

### Self-service
A self-served update of Craplog can be done in two well-known ways.<br/><br/>

#### Manual update
To manually update Craplog, please download the new version of this repo and run the [update script](https://git.disroot.org/elB4RTO/craplog-CLI/blob/main/update.sh).

Alternatively, manually *copy-paste* this list of files/folders into your Craplog installation directory:<br/>
`craplog/`, `crapset/`, `crapup/`, `crapview/`, `README.md`, `LICENSE`.<br/>
If you opted for the manual *copy-paste*, please make sure the operation fully replaces the old content, meaning that you have to check that no old entry (maybe with a different, old name) is left there.<br/><br/>

#### Git update
To update Craplog with **git** you'll need to have a local clone of this repo.<br/>
If you downloaded Craplog using the `git clone` method, you should be ready to go.
Follow these steps:
- Make sure you're in Craplog's main folder with your terminal<br/>
  *You should see "craplog" as output*<br/>
  ```
  ls | grep craplog
  ```
  <br/>
- Make sure you have *git* installed in your system<br/>
  *This should output the path of your git executable*<br/>
  ```
  which git
  ```
  <br/>
- Test if a git repository is already initialized in the current directory<br/>
  *No error message should be shown*<br/>
  ```
  git status
  ```
  If you get an error message, follow these steps to initialize a git repository:<br/>

  - Initialize the git repo, using `main` as local branch name<br/>
    ```
    git init -b main
    ```
  - Configure it<br/>
    ```
    git config core.filemode false
    git config remote.origin.url https://git.disroot.org/elB4RTO/craplog-CLI.git
    git config remote.origin.fetch +refs/heads/*:refs/remotes/origin/*
    git config remote.origin.prune true
    git config branch.main.remote origin
    git config branch.main.merge refs/heads/main
    git config pull.rebase false
    ```
  - Add Craplog's files to the git index<br/>
    ```
    git add craplib/ craplog/ crapview/ crapup/ crapset/ README.md LICENSE
    ```
  - Make a `.gitignore` file to ignore the local *configurations* and *statistics*<br/>
    ```
    echo "/crapconfs/" >> .gitignore
    echo "*.crapconf" >> .gitignore
    echo "/crapstats/" >> .gitignore
    echo "*.crapstat" >> .gitignore
    ```
  <br/>

- Your local repo is ready to pull the updates from the remote:<br/>

  - You can directly download and apply any modification with just one command:<br/>
    ```
    git pull origin main
    ```
    <br/>
  - Or you can split the process in steps:<br/>

    - Download the information about the new version's changes<br/>
      ```
      git fetch origin
      ```
    - Inspect any modification<br/>
      ```
      git diff origin/main
      ```
    - Finally apply the changes (if you want to)<br/>
      ```
      git merge origin/main
      ```
  <br/>
- If you have trouble updating because of refs/code mismatches, follow these steps:<br/>
  - Make a backup copy of the `crapstats` and `crapconfs` folders (and whatever else you care about).<br/>
    *Nothing should happen to the non-indexed files/folders, but who knows, right?*
    <br/>
  - Reset your local git repository, discarding any local changes to the tracked files
    ```
    git reset --hard
    ```
  - Pull a fresh copy of this repository
    ```
    git pull origin main
    ```
  - Restore your backups if required<br/>
    *Hopefully you shouldn't need to*

<br/><br/><br/><br/>

## Logs
At the moment, Craplog only supports **Apache2** log files in their **default** form.
Archived (**gzipped**) log files can be used as well as normal files.

<br/>

### Usage control
This version of Craplog keeps track of the log files which have been used.
When a file is parsed successfully, its **sha256** checksum is stored.<br/>
The stored checksums will be checked every time a file is given as input, to help prevent parsing the same files twice.
Hashes will be stored in **craplog/crapstats/.hashes**
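
A checksum test of this kind can be sketched with Python's standard `hashlib`. The helper names (`sha256_of_file`, `already_used`) and the idea of keeping the known hashes in a set are assumptions made for illustration; the actual layout of the *.hashes* file is not described here.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading in chunks keeps memory usage flat on large log files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_used(path, known_hashes):
    """True if this file's checksum was stored by a previous run."""
    return sha256_of_file(path) in known_hashes
```

Since the checksum depends only on the content, a renamed or moved copy of an already parsed file would still be recognized.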
<br/>

### Log files
If not specified, the files to be used will be **access.log.1** *and/or* **error.log.1**<br/>
Different files can be used by passing their names with `-F <names>` / `--log-files <names>`<br/>
Please note that only **file names** have to be specified, NOT full paths.

<br/>

### Logs path
If not specified, the default path will be **/var/log/apache2/**<br/>
A different path can be used by passing it with `-P <path>` / `--logs-path <path>`
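
As a rough sketch of how a file name and the logs path combine, and of how plain and gzipped files can be read through the same interface: the helpers `resolve_log_file` and `open_log` are hypothetical names, and detecting archives by the `.gz` suffix is an assumption, not necessarily what Craplog does.

```python
import gzip
import os

DEFAULT_LOGS_PATH = "/var/log/apache2/"


def resolve_log_file(name, logs_path=DEFAULT_LOGS_PATH):
    """Join the logs path and a file name, rejecting anything that looks like a path."""
    if os.path.basename(name) != name:
        raise ValueError(f"expected a file name, not a path: {name!r}")
    return os.path.join(logs_path, name)


def open_log(path):
    """Open a plain or gzipped log file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")
```

With these helpers, `resolve_log_file("access.log.1")` points inside the default directory, while a custom directory is passed as the second argument.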

<br/>

### Logs structure
At the time of writing, these are the only supported log structures.<br/><br/>

#### access.log.*
IP - - [DATE:TIME] "REQUEST URI" RESPONSE "FROM URI" "USER AGENT"<br/>
*123.123.123.123 - - [01/01/2000:00:10:20 +0000] "GET /style.css HTTP/1.1" 200 321 "/index.php" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Firefox/86.0"*
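
To make the structure concrete, here is a minimal regular-expression sketch that splits the sample line above into its fields. The pattern, the group names and the `parse_access_line` helper are assumptions written for this example; they are not Craplog's parser.

```python
import re

# Hypothetical pattern for the access log structure shown above.
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^:\]]+):(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

SAMPLE = (
    '123.123.123.123 - - [01/01/2000:00:10:20 +0000] '
    '"GET /style.css HTTP/1.1" 200 321 "/index.php" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Firefox/86.0"'
)


def parse_access_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = ACCESS_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_access_line(SAMPLE)
    # ip, user_agent, request and response correspond to the
    # IP, UA, REQ and RES fields described under "Examined fields".
    print(fields["ip"], fields["date"], fields["request"], fields["response"])
```

Lines that do not follow the default structure return `None`, which is why only default-form logs can be handled this way.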
<br/>

#### error.log.*
[DATE TIME] [LOG LEVEL] [PID] ERROR REPORT<br/>
*[Mon Jan 01 10:20:30.456789 2000] [headers:trace2] [pid 12345] [client 123.123.123.123:45678] AH00128: File does not exist: /var/www/html/domain/readme.txt*
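
The error log structure can be sketched the same way. Again, the pattern and the `parse_error_line` helper are hypothetical; the optional `:tid` part and the optional `[client ...]` block are assumptions about lines that may appear in practice.

```python
import re

# Hypothetical pattern for the error log structure shown above.
ERROR_RE = re.compile(
    r'^\[(?P<datetime>[^\]]+)\] '
    r'\[(?P<level>[^\]]+)\] '
    r'\[pid (?P<pid>\d+)(?::tid \d+)?\] '
    r'(?:\[client (?P<client>[^\]]+)\] )?'
    r'(?P<report>.*)$'
)

SAMPLE = (
    '[Mon Jan 01 10:20:30.456789 2000] [headers:trace2] [pid 12345] '
    '[client 123.123.123.123:45678] AH00128: File does not exist: '
    '/var/www/html/domain/readme.txt'
)


def parse_error_line(line):
    """Return a dict of named fields, or None if the line doesn't match."""
    match = ERROR_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    fields = parse_error_line(SAMPLE)
    print(fields["level"], "|", fields["report"])
```

Here `level` and `report` would be the two values of interest for error statistics, while `client` holds the address (with port) that a whitelist check would look at.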

<br/><br/><br/>

## Statistics
### Storage
Statistics will be stored in Craplog's main folder: **craplog-CLI/crapstats/**<br/>
Please refer to the [statistics viewer tool](#crapview) to view your crapstats.

<br/>

### Examined fields
#### Access logs
Four fields can be examined while parsing **access** logs:
- IP address of the client
- User-agent of the client
- Requested page/URL
- Response code from the server

<br/>You can select which fields to use by passing them with `-A <fields>` / `--access-fields <fields>`<br/>
Available field choices are: **IP**, **UA**, **REQ**, **RES**<br/>
You can avoid parsing access logs by passing `-eO` / `--only-errors`

<br/>

#### Error logs
While parsing **error** logs, only two fields will be used:
- Log level
- Error report

<br/>By default error logs won't be used, but you can parse them by passing `-e` / `--errors`

<br/>

### Sessions statistics
**Sessions** are made by grouping statistics depending on the **date** of the single lines and will be stored accordingly: a new session is created if that date is not present in the *crapstats* yet, otherwise the new data is merged into the existing session.
Only '**\*.log.\***' files will be considered valid as input. This is because these files (usually) contain the full log of an entire (*past*) day.<br/>
Running it against *today*'s file (which is not complete yet) may lead to re-running it in the future on the same file, parsing the same lines twice.
You can avoid storing sessions by passing `-gO` / `--only-globals`

<br/>

### Global statistics
Additionally, **global statistics** will be created and/or updated *accordingly*.<br/>
These statistics are identical to the session ones; in fact they're just merged sessions, for a larger view.
You can avoid updating globals by passing `-gA` / `--avoid-globals`
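
The relationship between sessions and globals can be sketched with `collections.Counter`. This assumes statistics are simple occurrence counts per value; the helper names and the in-memory layout are hypothetical and say nothing about how Craplog stores its files.

```python
from collections import Counter, defaultdict


def build_sessions(entries):
    """Group (date, value) pairs into one Counter per date."""
    sessions = defaultdict(Counter)
    for date, value in entries:
        sessions[date][value] += 1
    return sessions


def merge_into_globals(globals_stats, sessions):
    """Add the counts of every session to the global Counter."""
    for counter in sessions.values():
        globals_stats.update(counter)  # Counter.update adds counts
    return globals_stats


if __name__ == "__main__":
    entries = [
        ("01/01/2000", "200"),
        ("01/01/2000", "404"),
        ("02/01/2000", "200"),
    ]
    sessions = build_sessions(entries)
    print(dict(sessions))
    print(merge_into_globals(Counter(), sessions))
```

Merging a session whose date already exists works the same way: its counts are added to the stored ones instead of replacing them.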

<br/>

### Whitelist
You can add IP addresses to this list (whether full *IPs*, only the *net-ID* part, or just a portion of your choice), in order to skip the related lines by whitelisting (or blacklisting..?) them, in both **access** and **error** logs.
Please note that the given sequence must be the **starting part**: it's not possible (at the moment, and most likely also in future versions) to skip IPs ending with or just containing that sequence.
As an example, if you insert "123", then only IP addresses starting with that sequence will be skipped.<br/>
If you insert ".1", then nothing will be skipped, since no IP will ever start with a dot.<br/>
But the shortcut "::1" is used by Apache2 for internal connections and will therefore be valid to skip those lines.
The **default** is to only skip logs from **::1**, but different sequences can be passed with `-W <IPs>` / `--ip-whitelist <IPs>`
Please note that using a custom list will overwrite the default one, not append to it. When passing a custom list as argument, you should include the default *::1* in order to keep whitelisting the related lines.
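
The matching rule described above boils down to a prefix test. A minimal sketch, where `is_whitelisted` and `DEFAULT_WHITELIST` are hypothetical names used only for this example:

```python
# Default described above: only skip lines coming from ::1
DEFAULT_WHITELIST = ("::1",)


def is_whitelisted(ip, whitelist=DEFAULT_WHITELIST):
    """True if the IP starts with any of the whitelisted sequences."""
    return any(ip.startswith(prefix) for prefix in whitelist)


if __name__ == "__main__":
    custom = ("::1", "192.168.")
    for ip in ("::1", "192.168.1.5", "10.0.0.1"):
        print(ip, is_whitelisted(ip, custom))
```

Because a custom list replaces the default instead of extending it, `is_whitelisted("::1", ("192.168.",))` is false: the `::1` entry has to be passed again explicitly.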

<br/><br/>

## View statistics

Craplog saves statistics as plain-text files, so you can directly view them, but you will agree that this ain't the best way to do that.<br/><br/>

### Crapview
**Crapview** is a cursed application that lets you easily view your crapstats.<br/><br/>

![crapview](https://git.disroot.org/elB4RTO/screenshots/raw/branch/main/Craplog/CLI/crapview.png)<br/>
*Viewing statistics*<br/><br/>

#### Run crapview
With Craplog installed:
```
craplog view
```
<br/>

Without Craplog installed *(from the main folder)*:
```
python3 crapview/crapview.py
```
<br/>

#### Use crapview
It is pretty straightforward: use `TAB` to switch between windows, `ENTER` to interact, the arrow keys `←` `↑` `→` `↓` to move around and the letters to write in the **cli** or jump in the index of the **tree**.<br/><br/>
On the left side you can see the **tree** of your *statistics*, as it is in your system.
On the right side you can **view** the selected *statistics file*. You can see the elements and their counts. The bars will show the percentage compared to the other elements in the same file.
Last but not least, at the bottom you can find the **cli**, which is not a real cli, but more like a search box.<br/>
Available **keywords** are the following:
- `quit` : quits the program
- `clear <element>` : clears an element of the window. If no element is supplied, it will take effect on each one.<br/>
Available elements are: `cli`, `tree`, `view`
- `<element>` : directly jumps to the given element
- `<tree path>` : directly jumps to the given position in the tree.<br/>
The path must be composed of whitespace-separated words, as they are in the tree.<br/>
*Example: see the Requests statistics of a particular day:* `sessions access <year> <month> <day> REQ`
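
As for the bars shown in the **view** window, the idea of a percentage relative to the other elements of the same file can be sketched as follows. This assumes each bar represents the element's share of the file's total count; the `percentage_bars` helper and the `#` rendering are purely illustrative and not how Crapview draws its bars.

```python
def percentage_bars(counts, width=20):
    """Return (item, count, percent, bar) rows, highest count first."""
    total = sum(counts.values())
    rows = []
    for item, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        share = count / total if total else 0.0
        bar = "#" * round(share * width)
        rows.append((item, count, round(share * 100, 1), bar))
    return rows


if __name__ == "__main__":
    for item, count, percent, bar in percentage_bars({"200": 3, "404": 1}):
        print(f"{item:>5} {count:>4} {percent:>5}% {bar}")
```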

<br/><br/><br/>

## Extra features
### Crapstats converter

***COMING SOON***

<br/><br/>

## Final considerations
### Estimated working speed
1~10 MB/s

This may be higher or lower depending on the complexity of the logs, the complexity of the stored statistics (in case of merge), your hardware and the workload of your system during the execution.<br/>
Usually, if it takes more than 10 seconds to parse 10 MB of data, it means your server has probably been probed or scanned in some way (better to check).<br/><br/>

![performance diffs](https://git.disroot.org/elB4RTO/screenshots/raw/branch/main/Craplog/CLI/perf_diff.png)<br/>
*Normal vs Scanned*<br/><br/>

The above image shows the difference in performance between two different sessions, having the same number of lines and very similar data sizes.<br/>
On the left side, the parsed logs came from a webserver with normal activity.<br/>
On the right side, the parsed logs came from a webserver which has been scanned with tools like **sqlmap** and **nikto** (not nmap)

<br/>

### Backups
Craplog will automatically make backups of the **global statistics** files (in case of fire).<br/>
If something goes wrong and you lose your current globals, you can recover them (at least from the last backup taken).
Move inside the folder you chose to store statistics in, open the "**globals**" folder, show hidden files and open the folder named "**.backups**".<br/>
The complete path should look like `/<your_path>/craplog/crapstats/globals/.backups/`<br/>
Here you will find the last 3 backups taken. The folder named '3' is always the oldest and '1' the newest.
A new backup is made every time you run Craplog *successfully* over globals.
Please note that *session* statistics will **not** be backed up
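
A rotation scheme like the one described (three numbered folders, '1' the newest and '3' the oldest) can be sketched as below. The `rotate_backups` helper is hypothetical and only copies the regular files found directly in the globals folder, which is an assumption about its content.

```python
import os
import shutil


def rotate_backups(globals_dir, keep=3):
    """Shift the numbered backups by one and store a fresh copy in '1'."""
    backups_dir = os.path.join(globals_dir, ".backups")
    os.makedirs(backups_dir, exist_ok=True)

    # Drop the oldest backup, if present.
    oldest = os.path.join(backups_dir, str(keep))
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)

    # Shift the remaining ones: 2 -> 3, then 1 -> 2.
    for n in range(keep - 1, 0, -1):
        src = os.path.join(backups_dir, str(n))
        if os.path.isdir(src):
            os.rename(src, os.path.join(backups_dir, str(n + 1)))

    # Copy the current globals files into the newest slot.
    newest = os.path.join(backups_dir, "1")
    os.makedirs(newest)
    for name in os.listdir(globals_dir):
        path = os.path.join(globals_dir, name)
        if os.path.isfile(path):  # skips the .backups folder itself
            shutil.copy2(path, os.path.join(newest, name))
    return newest
```

To recover from a backup, the files inside the chosen numbered folder would simply be copied back into the **globals** folder.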

<br/><br/>

## Contributions
Craplog is under development

If you have suggestions about how to improve it please open an [issue](https://git.disroot.org/elB4RTO/craplog-CLI/issues) or make a [pull request](https://git.disroot.org/elB4RTO/craplog-CLI/pulls)

If you're not running Apache, but you like this tool: same as the above (bring a sample of a log file)