## disclaimer
i wrote this cli wrapper for the `markovify` python module because i wanted its features to be available as a cli tool.

i only published it to share with friends, and in case a programmer felt like picking it up and improving on it. so if you are interested in fixing amateur code, then by all means!

maybe this functionality already exists somewhere, but i couldn't find it. if it does, pls let me know!

## mkv-this
`mkv-this` is a little script that outputs a bunch of bot-like sentences based on a bank of text that you feed it. the results are saved to a text file. if you run it again with the same output file, the new results are appended after the old ones.

a second command, `mkv-this-dir` (see below), takes a directory and reads all the text files in it as input.

`mkv-this` simply makes some of the features of the excellent `markovify` module available as a command line tool. it was written by a total novice, so you probably shouldn't download it. i only learned about `argparse` yesterday, and pypi.org today, no matter what day it is. tomorrow i might learn about `os` and `sys`. and then maybe even `cookiecutter`!

### installing
install it with `pip`, the python package manager:

`python3 -m pip install mkv-this`

or

`pip install mkv-this`

to do this you need `python3` and `pip`. install them through your system's package manager. on debian (+ derivatives), for example, you'd run:

`sudo apt install python3 python3-pip`

`markovify` is a dependency, but should install along with `mkv-this`.

if you get sth like `ModuleNotFoundError: No module named '$modulename'`, just run `pip install $modulename` to get the missing module.

### options
the script implements a number of the basic `markovify` options, so you can specify:

* how many sentences to output (default = 5).
* the state size, i.e. the number of preceding words used to choose the next word (default = 2).
* a maximum sentence length, in characters.
* the amount of (verbatim) overlap allowed between input and output.
* whether your text's sentences end with newlines rather than full stops.
* an additional file to use for text input. you can add only one. if you want to feed a stack of files into your bank, use `mkv-this-dir`.
* the relative weight to give to the second file if it is used.

as of 0.1.29 you can also specify:

* a URL to a text file online. (you can input something that isn't a text file, but the results will be mush or the programme will crash.)
* an additional URL to use as text input.

run `mkv-this -h` to see how to use these options.
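
to get a feel for what the state size option does, here is a toy sketch in plain python. it is not how `markovify` or `mkv-this` is implemented, and `build_chain` and `generate` are made-up names for illustration. each run of `state_size` words is mapped to the words that followed it in the input, and the output is built by repeatedly picking one of those followers.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return dict(chain)


def generate(chain, start, max_words=20, seed=None):
    """Walk the chain from `start` until a dead end or `max_words` words."""
    rng = random.Random(seed)
    state_size = len(start)
    output = list(start)
    while len(output) < max_words:
        # look up the last `state_size` words to find possible next words
        followers = chain.get(tuple(output[-state_size:]))
        if not followers:
            break
        output.append(rng.choice(followers))
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran to the door"
chain = build_chain(corpus, state_size=2)
print(chain[("the", "cat")])  # ['sat', 'ran']
print(generate(chain, ("the", "cat"), max_words=8, seed=1))
```

with a bigger state size the output sticks closer to the input, since fewer word runs have more than one possible follower.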
### mkv-this-dir: markovify a directory of text files
`mkv-this` can only take two files as input. if you want to input a stack of files, use `mkv-this-dir`. specify a directory and all text files in it will be used as input.

if for some reason you want similar functionality with `mkv-this`, you can easily concatenate the files yourself from the command line, then process them:

* copy all your text files into a directory
* cd into the directory
* run `cat * > outputfile.txt`
* run `mkv-this` on your newly created file: `mkv-this outputfile.txt`

if `mkv-this-dir` returns lots of chars that don't display because it can't read the encoding, try this out instead.
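
if you'd rather do the concatenation in python, something like the sketch below might work. it is only a guess at a workaround, not the code `mkv-this-dir` actually uses. `read_text_dir` is a hypothetical helper, it assumes your files are mostly utf-8, and it swaps any bytes it can't decode for a replacement character instead of crashing.

```python
from pathlib import Path

# the extensions this readme lists as accepted
TEXT_EXTENSIONS = {".txt", ".org", ".md"}


def read_text_dir(directory, extensions=TEXT_EXTENSIONS):
    """Return the contents of every matching file in `directory`, joined by newlines."""
    chunks = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            # errors="replace" turns undecodable bytes into U+FFFD instead of raising
            chunks.append(path.read_text(encoding="utf-8", errors="replace"))
    return "\n".join(chunks)


# usage (assumes a directory called my-texts exists):
# Path("outputfile.txt").write_text(read_text_dir("my-texts"), encoding="utf-8")
```

you could then run `mkv-this outputfile.txt` on the result, as in the steps above.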
### file types
you need to input plain text files. currently accepted file extensions are `.txt`, `.org` and `.md`. it is trivial to add others, so if you want one included just ask.
### for best results
feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, HTML tags, metadata, stars, bullets, lines, etc.), unless you want mashed versions of those things in your output.
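
a rough cleanup pass could look like this. it is a sketch with crude regexes, not part of `mkv-this`, and `clean_text` is a made-up name. it drops html tags, URLs, horizontal rules and leading bullets, then tidies whitespace. adjust the patterns to whatever mess your own text contains.

```python
import re


def clean_text(raw):
    """Strip some common mess from text before feeding it to a markov model."""
    text = re.sub(r"<[^>]+>", " ", raw)        # html tags (crude: anything between < and >)
    text = re.sub(r"https?://\S+", " ", text)  # urls, up to the next whitespace
    # lines made only of ---, ===, ___ or ***
    text = re.sub(r"^[ \t]*[-=_*]{3,}[ \t]*$", "", text, flags=re.MULTILINE)
    # a single bullet or star at the start of a line
    text = re.sub(r"^[ \t]*[-*+•][ \t]+", "", text, flags=re.MULTILINE)
    text = re.sub(r"[ \t]+", " ", text)        # collapse runs of spaces and tabs
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(lines).strip()
```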

you'll probably want to edit or select things from the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself.

for a few further tips, see https://github.com/jsvine/markovify#basic-usage.

happy zaning.

### macos
it seems to run on macos too.
you may already have python installed. if not, you first need to install [homebrew](https://brew.sh/#install), edit your PATH so that it works, then install `python3` with `brew install python3`. if you are already running an old version of `homebrew` you might need to run `brew install python3 && brew postinstall python3` to get `python3` and `pip` running right.
i know nothing about macs so if you ask me for help i'll just send you random copypasta from the interwebs.
### todo
* option to also append input model to a saved JSON file. (i.e. `text_model.to_json()`, `markovify.Text.from_json()`)
* maybe copy in some basic webscraping boilerplate code.
* learn how to programme.