fns for -dir too + readme

mousebot 2020-04-24 13:37:16 -03:00
parent 9f397a6f9b
commit d5ebb2906a
4 changed files with 123 additions and 102 deletions


@@ -3,9 +3,9 @@
i wrote this cli wrapper for the `markovify` python module because i wanted its features to be available as a cli tool.
i only published it in case someone who actually knows what they're doing felt like picking it up and improving on it. (and to share it with friends.)
i only published it to share with friends.
and if you are interested in fixing my amateur code, then by all means!
& in case a programmer felt like picking it up and improving on it. so if you are interested in fixing amateur code, then by all means!
maybe this functionality already exists somewhere, but i couldn't find it. if it does, pls let me know!
@@ -27,54 +27,44 @@ or
`pip install mkv-this`
to do this you need `python3` and `pip`. if you don't have them, install them through your system's package manager. on debian (+ derivatives), for example, you'd run:
to do this you need `python3` and `pip`. install them through your system's package manager. on debian (+ derivatives), for example, you'd run:
`sudo apt install python3 python3-pip`
`markovify` is also a dependency, but it should install along with `mkv-this`.
`markovify` is a dependency, but should install along with `mkv-this`.
if you get sth like `ModuleNotFoundError: No module named 'modulename'`, just run `pip install modulename` to get the missing module.
### macos
it seems to run on macos too.
you may already have python installed. if not, you first need to install [homebrew](https://brew.sh/#install), edit your PATH so that it works, then install `python3` with `brew install python3`. if you are already running an old version of `homebrew` you might need to run `brew install python3 && brew postinstall python3` to get `python3` and `pip` running right.
you can check if `pip` is installed with `pip --version`, or `pip3 --version`.
i know nothing about macs so if you ask me for help i'll just send you random copypasta from the interwebs.
if you get sth like `ModuleNotFoundError: No module named '$modulename'`, just run `pip install $modulename` to get the missing module.
### options
the script implements a few of the basic `markovify` options, so you can specify:
the script implements a number of the basic `markovify` options, so you can specify:
* how many sentences to output (default = 5)
* the state size, i.e. the number of preceeding words to be used in calculating the probability of the next word (default = 2).
* how many sentences to output (default = 5).
* the state size, i.e. the number of preceding words to be used in calculating the choice of the next word (default = 2).
* a maximum sentence length, in characters.
* the amount of (verbatim) overlap allowed between your input text and your output text.
* that your text's sentences end with newlines rather than full-stops.
* the amount of (verbatim) overlap allowed between input and output.
* if your text's sentences end with newlines rather than full-stops.
* an additional file to use for text input. you can add only one. if you want to feed a stack of files into your bank, use `mkv-this-dir`.
* the relative weight to give to the second file if it is used.
as of 0.1.29 you can also specify:
* a URL to a text file online. (you can input something that isn't a text file but the results will be mush.)
* a URL to a text file online. (you can input something that isn't a text file but the results will be mush or the programme will crash.)
* an additional URL to use as text input.
run `mkv-this -h` to see how to use these options.
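
to make "state size" a bit more concrete, here is a toy sketch in plain python. it is not how `markovify` is implemented, only an illustration of the idea: each run of `state_size` words is mapped to the words that were seen right after it. `build_chain` and `generate` are made-up names for this example.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return dict(chain)


def generate(chain, max_words=20, seed=None):
    """Walk the chain from a random starting state until it dead-ends."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < max_words and state in chain:
        nxt = rng.choice(chain[state])
        out.append(nxt)
        state = state[1:] + (nxt,)  # slide the window along by one word
    return " ".join(out)


sample = "the cat sat on the mat and the cat ran off the mat"
chain = build_chain(sample, state_size=2)
print(chain[("the", "cat")])  # → ['sat', 'ran']
print(generate(chain, seed=1))
```

a bigger state size means each choice depends on more preceding words, so the output sticks closer to the input; a smaller one gives looser, stranger results.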
### mkv-this-dir: markovify a directory of text files
`mkv-this` can only take two files as input material each time. if you want to input a stack of files, use `mkv-this-dir`. it allows you to specify a directory and all text files in it will be used as input material.
`mkv-this` can only take two files as input. if you want to input a stack of files, use `mkv-this-dir`. specify a directory and all text files in it will be used as input.
if for some reason you want to get a similar functionality with `mkv-this`, you can easily concatenate some files yourself from the command line, then process them:
if for some reason you want to get a similar functionality with `mkv-this`, you can easily concatenate files yourself from the command line, then process them:
* copy all your text files into a directory
* cd into the directory
* run `cat * > outputfile.txt`
* run mkv-this on your newly created file: `mkv-this outputfile.txt`
* this approach has the benefit of creating a file with encoding that mkv-this can certainly handle.
* if `mkv-this-dir` returns lots of chars that don't display because it can't read the encoding, try this out instead.
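
if you'd rather do the concatenating from python than with `cat`, the steps above could be sketched roughly like this. it is only a sketch under assumptions: `read_text` and `concat_dir` are made-up names, it only looks at `.txt` files directly inside the directory, and unlike `cat` it puts a newline between files.

```python
from pathlib import Path


def read_text(path):
    """Try utf-8 first, then fall back to latin-1 (which accepts any byte)."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return Path(path).read_text(encoding="latin-1")


def concat_dir(indir, outfile, suffix=".txt"):
    """Join every `suffix` file directly inside `indir` into `outfile`."""
    out_path = Path(outfile).resolve()
    parts = [
        read_text(p)
        for p in sorted(Path(indir).iterdir())
        # skip the output file itself in case it lives in the same directory
        if p.is_file() and p.suffix == suffix and p.resolve() != out_path
    ]
    out_path.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)  # number of files that were joined
```

you would then call something like `concat_dir("my_texts", "outputfile.txt")` and run `mkv-this outputfile.txt` on the result. the output is always written as utf-8, whatever the inputs were.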
### file types
@@ -82,10 +72,24 @@ you need to input plain text files. currently accepted file extensions are `.txt
### for best results
feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, tags, metadata, stars, bullets, etc.), unless you want mashed versions of those things in your output.
feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, HTML tags, metadata, stars, bullets, lines, etc.), unless you want mashed versions of those things in your output.
you'll probably want to edit the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself.
you'll probably want to edit or select things from the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself.
for a few further tips, see https://github.com/jsvine/markovify#basic-usage.
happy zaning.
### macos
it seems to run on macos too.
you may already have python installed. if not, you first need to install [homebrew](https://brew.sh/#install), edit your PATH so that it works, then install `python3` with `brew install python3`. if you are already running an old version of `homebrew` you might need to run `brew install python3 && brew postinstall python3` to get `python3` and `pip` running right.
i know nothing about macs so if you ask me for help i'll just send you random copypasta from the interwebs.
### todo
* option to also append input model to a saved JSON file. (i.e. `text_model.to_json()`, `markovify.Text.from_json()`)
* maybe copy in some basic webscraping boilerplate code.
* learn how to programme.


@@ -29,7 +29,6 @@ import markovify
import sys
import argparse
# argparse
def parse_the_args():
parser = argparse.ArgumentParser(prog="mkv-this", description="markovify one or two local or remote text files and output the results to a local text file.",
@@ -66,8 +65,10 @@ def parse_the_args():
# store_true = default to False, become True if flagged.
return parser.parse_args()
# read/build/write fns:
def URL(insert):
try:
req = requests.get(insert)
@@ -79,6 +80,7 @@ def URL(insert):
print('text fetched from URL.')
return req.text
def read(infile):
try:
with open(infile, encoding="utf-8") as f:
@@ -89,31 +91,35 @@ def read(infile):
except FileNotFoundError:
print(fnf)
sys.exit()
def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)
def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)
def writesentence(tmodel):
for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short:
if args.length:
if args.length:
output.write(str(tmodel.make_short_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
# normal:
else:
output.write(str(tmodel.make_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
output.write(str('*\n\n'))
output.close()
# make args + fnf avail to all:
args = parse_the_args()
fnf = 'error: file not found. please provide a path to a really-existing \
@@ -121,27 +127,27 @@ fnf = 'error: file not found. please provide a path to a really-existing \
def main():
# if a -c/-C, combine it w infile/URL:
# if a -c/-C, combine it w infile/URL:
if args.combine or args.combine_URL:
if args.combine:
# get raw text as a string for both:
# try:
# infile is URL:
if args.URL:
text = URL(args.infile)
# or normal:
else:
text = read(args.infile)
# read -c file:
ctext = read(args.combine)
# try:
# infile is URL:
if args.URL:
text = URL(args.infile)
# or normal:
else:
text = read(args.infile)
# read -c file:
ctext = read(args.combine)
# except FileNotFoundError:
# print(fnf)
# sys.exit()
# if -C, combine it w infile/URL:
elif args.combine_URL:
# try:
# infile is URL:
# try:
# infile is URL:
if args.URL:
text = URL(args.infile)
# or normal:
@@ -167,7 +173,7 @@ def main():
[text_model, ctext_model], [1, args.weight])
writesentence(combo_model)
# if no -c/-C, do normal:
else:
# Get raw text as string.
@@ -176,7 +182,7 @@ def main():
text = URL(args.infile)
# or local:
else:
# try:
# try:
text = read(args.infile)
# except FileNotFoundError:
# print(fnf)
@@ -188,8 +194,8 @@ def main():
text_model = mkbnewline(text)
# no --newline:
else:
text_model = mkbtext(text)
text_model = mkbtext(text)
writesentence(text_model)
print('\n: The options you used are as follows:\n')


@@ -1,10 +1,4 @@
#! /usr/bin/env python3
import os
import markovify
import sys
import argparse
"""
mkv-this-dir: input a directory, output markovified text based on all its text files.
Copyright (C) 2020 mousebot@riseup.net.
@@ -27,11 +21,16 @@ import argparse
a (very basic) script to collect all text files in a directory, markovify them and output a user-specified number of sentences to a text file.
"""
import os
import markovify
import sys
import argparse
def main():
# argparse for cmd line args
parser = argparse.ArgumentParser()
# argparse
def parse_the_args():
parser = argparse.ArgumentParser(prog="mkv-this-dir", description="markovify all text files in a directory and output the results to a text file.",
epilog="may you find many prophetic énoncés in your virtual bird guts! Here, this is not at all the becomings that are connected... so if you want to edit it like a bot yourself, it is trivial.")
# positional args:
parser.add_argument('indir', help="the directory to extract the text of all text files from, with path.")
@@ -51,32 +50,49 @@ def main():
parser.add_argument('--newline', help="sentences in input file end with newlines rather than with full stops.", action='store_true')
# store_true = default to False, become True if flagged.
args = parser.parse_args()
return parser.parse_args()
# read, build, write fns:
def read(infile):
# read, build, write fns:
def read(infile):
try:
with open(infile, encoding="utf-8") as f:
return f.read()
except UnicodeDecodeError:
with open(infile, encoding="latin-1") as f:
return f.read()
def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)
def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)
def writesent(tmodel):
return output.write(str(tmodel.make_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
def writeshortsent(tmodel):
return output.write(str(tmodel.make_short_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
except FileNotFoundError:
print(fnf)
sys.exit()
def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)
def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)
def writesentence(tmodel):
for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short:
if args.length:
output.write(str(tmodel.make_short_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
# normal:
else:
output.write(str(tmodel.make_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n\n')
output.write(str('*\n\n'))
output.close()
# make args avail to all:
args = parse_the_args()
def main():
#create a list of files to concatenate:
matches = []
if os.path.isdir(args.indir) is True:
@@ -90,17 +106,21 @@ def main():
# place batchfile.txt in user-given directory:
batchfile = os.path.dirname(args.indir) + os.path.sep + 'batchfile.txt'
# concatenate the files into batchfile.txt:
# concatenate files into batchfile.txt:
with open(batchfile, 'w') as outfile:
for fname in matches:
with open(fname, encoding="latin-1") as infile:
outfile.write(infile.read())
try:
with open(fname, encoding="utf-8") as infile:
outfile.write(infile.read())
except UnicodeDecodeError:
with open(fname, encoding="latin-1") as infile:
outfile.write(infile.read())
outfile.close()
# Get raw text from batchfile as string.
text = read(batchfile)
# Build the model:
# Build model:
# if --newline:
if args.newline:
text_model = mkbnewline(text)
@@ -108,20 +128,9 @@ def main():
else:
text_model = mkbtext(text)
# Print -n number of randomly-generated sentences
for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short sentence:
if args.length:
writeshortsent(text_model)
# normal sentence:
else:
writesent(text_model)
output.write(str('*\n\n'))
# add a star between each appended set.
output.close()
writesentence(text_model)
os.unlink(batchfile)
print('\n: The options you used are as follows:\n')
for key, value in vars(args).items():
print(': ' + key.ljust(15, ' ') + ': ' + str(value).ljust(10))
@@ -132,5 +141,7 @@ def main():
print('mkv-this ran but did NOT create an output file as requested. this is a very regrettable and dangerous situation. contact the package maintainer asap. soz!')
sys.exit()
# enable for testing:
# main()
# for testing:
if __name__ == '__main__':
main()
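
a possible tidy-up of the shared `writesentence` function, sketched under assumptions rather than taken from the commit: the output file is opened once with `with` instead of once per loop pass, the options are passed in as parameters instead of read from a global `args`, and a failed generation (a `None` result) is skipped instead of being written out as the string `None`. `write_sentences` and `FakeModel` are made-up names; `FakeModel` is only a stand-in so the sketch runs without `markovify` installed.

```python
import os
import tempfile


class FakeModel:
    """Hypothetical stand-in for a markovify model, for demonstration only."""

    def make_sentence(self, tries=10, max_overlap_ratio=0.7):
        return "a normal sentence."

    def make_short_sentence(self, max_chars, tries=10, max_overlap_ratio=0.7):
        return "short."[:max_chars]


def write_sentences(model, outfile, sentences=5, length=None, overlap=0.5):
    """Append `sentences` generated sentences to `outfile`, then a '*' divider."""
    # the file is opened once and closed automatically when the block ends
    with open(outfile, "a", encoding="utf-8") as output:
        for _ in range(sentences):
            if length:
                sentence = model.make_short_sentence(
                    max_chars=length, tries=2000, max_overlap_ratio=overlap)
            else:
                sentence = model.make_sentence(
                    tries=2000, max_overlap_ratio=overlap)
            if sentence is not None:  # skip failed attempts
                output.write(sentence + "\n\n")
        output.write("*\n\n")  # divider between appended sets


path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_sentences(FakeModel(), path, sentences=2)
with open(path, encoding="utf-8") as f:
    print(f.read())
```

passing the options in explicitly would also let both scripts import one copy of the function instead of each carrying its own.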


@@ -7,7 +7,7 @@ with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
long_description = f.read()
setup(name='mkv-this',
version='0.1.30',
version='0.1.31',
description='cli wrapper for markovify: take a text file, markovify, output the results to a text file.',
long_description=long_description,
long_description_content_type='text/markdown',