fns for -dir too + readme

mousebot 2020-04-24 13:37:16 -03:00
parent 9f397a6f9b
commit d5ebb2906a
4 changed files with 123 additions and 102 deletions


@ -3,9 +3,9 @@
i wrote this cli rapper for the `markovify` python module because i wanted its features to be available as a cli tool. i wrote this cli rapper for the `markovify` python module because i wanted its features to be available as a cli tool.
i only published it in case someone who actually knows what they're doing felt like picking it up and improving on it. (and to share it with friends.) i only published it to share with friends.
and if you are interested in fixing my amateur code, then by all means! & in case a programmer felt like picking it up and improving on it. so if you are interested in fixing amateur code, then by all means!
maybe this functionality already exists somewhere, but i couldn't find it. if it does, pls let me know! maybe this functionality already exists somewhere, but i couldn't find it. if it does, pls let me know!
@ -27,54 +27,44 @@ or
`pip install mkv-this` `pip install mkv-this`
to do this you need `python3` and `pip`. if you don't have them, install them through your system's package manager. on debian (+ derivatives), for example, you'd run: to do this you need `python3` and `pip`. install them through your system's package manager. on debian (+ derivatives), for example, you'd run:
`sudo apt install python3 python3-pip` `sudo apt install python3 python3-pip`
`markovify` is also a dependency, but it should install along with `mkv-this`. `markovify` is a dependency, but should install along with `mkv-this`.
if you get sth like `ModuleNotFound error: No module named 'modulename'`, just run `pip install modulename` to get the missing module. if you get sth like `ModuleNotFound error: No module named '$modulename'`, just run `pip install $modulename` to get the missing module.
### macos
it seems to run on macos too.
you may already have python installed. if not, you first need to install [homebrew](https://brew.sh/#install), edit your PATH so that it works, then install `python3` with `brew install python3`. if you are already running an old version of `homebrew` you might need to run `brew install python3 && brew postinstall python3` to get `python3` and `pip` running right.
you can check if `pip` is installed with `pip --version`, or `pip3 --version`.
i know nothing about macs so if you ask me for help i'll just send you random copypasta from the interwebs.
### options ### options
the script implements a few of the basic `markovify` options, so you can specify: the script implements a number of the basic `markovify` options, so you can specify:
* how many sentences to output (default = 5) * how many sentences to output (default = 5).
* the state size, i.e. the number of preceeding words to be used in calculating the probability of the next word (default = 2). * the state size, i.e. the number of preceding words to be used in calculating the choice of the next word (default = 2).
* a maximum sentence length, in characters. * a maximum sentence length, in characters.
* the amount of (verbatim) overlap allowed between your input text and your output text. * the amount of (verbatim) overlap allowed between input and output.
* that your text's sentences end with newlines rather than full-stops. * if your text's sentences end with newlines rather than full-stops.
* an additional file to use for text input. you can add only one. if you want to feed a stack of files into your bank, use `mkv-this-dir`. * an additional file to use for text input. you can add only one. if you want to feed a stack of files into your bank, use `mkv-this-dir`.
* the relative weight to give to the second file if it is used. * the relative weight to give to the second file if it is used.
as of 0.1.29 you can also specify: as of 0.1.29 you can also specify:
* a URL to a text file online. (you can input something that isn't a text file but the results will be mush.) * a URL to a text file online. (you can input something that isn't a text file but the results will be mush or the programme will crash.)
* an additional URL to use as text input. * an additional URL to use as text input.
run `mkv-this -h` to see how to use these options. run `mkv-this -h` to see how to use these options.
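
to make the state size option a bit more concrete, here is a toy sketch of a word-level markov chain in plain python. it is not how `markovify` is implemented, just an illustration of the idea under simple assumptions: `build_chain` and `walk` are made-up names, words are split on whitespace, and there is no sentence handling or overlap checking.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """map each run of `state_size` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def walk(chain, start, max_words=20, seed=None):
    """follow the chain from `start` until a dead end or `max_words` is hit."""
    rng = random.Random(seed)
    out = list(start)
    state = tuple(start)
    while len(out) < max_words and state in chain:
        # pick one of the words that followed this state in the input
        out.append(rng.choice(chain[state]))
        state = tuple(out[-len(state):])
    return " ".join(out)


sample = "the cat sat on the mat and the cat ran off the mat"
chain = build_chain(sample, state_size=2)
print(walk(chain, ("the", "cat"), seed=1))
```

a larger state size means each choice depends on more preceding words, so the output tends to stay closer to the input. a smaller one gives looser, mushier results.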
### mkv-this-dir: markovify a directory of text files ### mkv-this-dir: markovify a directory of text files
`mkv-this` can only take two files as input material each time. if you want to input a stack of files, use `mkv-this-dir`. it allows you to specify a directory and all text files in it will be used as input material. `mkv-this` can only take two files as input. if you want to input a stack of files, use `mkv-this-dir`. specify a directory and all text files in it will be used as input.
if for some reason you want to get similar functionality with `mkv-this`, you can easily concatenate some files yourself from the command line, then process them: if for some reason you want to get similar functionality with `mkv-this`, you can easily concatenate files yourself from the command line, then process them:
* copy all your text files into a directory * copy all your text files into a directory
* cd into the directory * cd into the directory
* run `cat * > outputfile.txt` * run `cat * > outputfile.txt`
* run mkv-this on your newly created file: `mkv-this outputfile.txt` * run mkv-this on your newly created file: `mkv-this outputfile.txt`
* this approach has the benefit of creating a file with encoding that mkv-this can certainly handle. * if `mkv-this-dir` returns lots of chars that don't display because it can't read the encoding, try this out instead.
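
if you would rather do the concatenating in python, something like the sketch below could work. it is only an illustration: `read_with_fallback` and `concat_dir` are hypothetical helper names, the `.txt` filter is an assumption, and the utf-8 then latin-1 fallback simply mirrors the approach `mkv-this-dir` takes when reading files.

```python
import os


def read_with_fallback(path):
    """read a file as utf-8, falling back to latin-1 if decoding fails."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, encoding="latin-1") as f:
            return f.read()


def concat_dir(indir, outfile, extensions=(".txt",)):
    """write every matching file in `indir` into `outfile`, as utf-8."""
    # list the inputs before opening the output, in a stable order
    names = sorted(n for n in os.listdir(indir)
                   if n.lower().endswith(extensions))
    with open(outfile, "w", encoding="utf-8") as out:
        for name in names:
            out.write(read_with_fallback(os.path.join(indir, name)))
            out.write("\n")  # keep files from running into each other
    return names
```

call it with e.g. `concat_dir("texts", "outputfile.txt")`, then run `mkv-this outputfile.txt` as above. keep the output file outside the input directory so a later run doesn't swallow it as input.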
### file types ### file types
@ -82,10 +72,24 @@ you need to input plain text files. currently accepted file extensions are `.txt
### for best results ### for best results
feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, tags, metadata, stars, bullets, etc.), unless you want mashed versions of those things in your output. feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, HTML tags, metadata, stars, bullets, lines, etc.), unless you want mashed versions of those things in your output.
you'll probably want to edit the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself. you'll probably want to edit or select things from the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself.
for a few further tips, see https://github.com/jsvine/markovify#basic-usage. for a few further tips, see https://github.com/jsvine/markovify#basic-usage.
happy zaning. happy zaning.
### macos
it seems to run on macos too.
you may already have python installed. if not, you first need to install [homebrew](https://brew.sh/#install), edit your PATH so that it works, then install `python3` with `brew install python3`. if you are already running an old version of `homebrew` you might need to run `brew install python3 && brew postinstall python3` to get `python3` and `pip` running right.
i know nothing about macs so if you ask me for help i'll just send you random copypasta from the interwebs.
### todo
* option to also append input model to a saved JSON file. (i.e. `text_model.to_json()`, `markovify.Text.from_json()`)
* maybe some copy in some basic webscraping boilerplate code.
* learn how to programme.


@ -29,7 +29,6 @@ import markovify
import sys import sys
import argparse import argparse
# argparse # argparse
def parse_the_args(): def parse_the_args():
parser = argparse.ArgumentParser(prog="mkv-this", description="markovify one or two local or remote text files and output the results to a local text file.", parser = argparse.ArgumentParser(prog="mkv-this", description="markovify one or two local or remote text files and output the results to a local text file.",
@ -68,6 +67,8 @@ def parse_the_args():
return parser.parse_args() return parser.parse_args()
# read/build/write fns: # read/build/write fns:
def URL(insert): def URL(insert):
try: try:
req = requests.get(insert) req = requests.get(insert)
@ -79,6 +80,7 @@ def URL(insert):
print('text fetched from URL.') print('text fetched from URL.')
return req.text return req.text
def read(infile): def read(infile):
try: try:
with open(infile, encoding="utf-8") as f: with open(infile, encoding="utf-8") as f:
@ -90,14 +92,17 @@ def read(infile):
print(fnf) print(fnf)
sys.exit() sys.exit()
def mkbtext(texttype): def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size, return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed) well_formed=args.no_well_formed)
def mkbnewline(texttype): def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size, return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed) well_formed=args.no_well_formed)
def writesentence(tmodel): def writesentence(tmodel):
for i in range(args.sentences): for i in range(args.sentences):
output = open(args.outfile, 'a') # append output = open(args.outfile, 'a') # append
@ -114,6 +119,7 @@ def writesentence(tmodel):
output.write(str('*\n\n')) output.write(str('*\n\n'))
output.close() output.close()
# make args + fnf avail to all: # make args + fnf avail to all:
args = parse_the_args() args = parse_the_args()
fnf = 'error: file not found. please provide a path to a really-existing \ fnf = 'error: file not found. please provide a path to a really-existing \
@ -121,11 +127,11 @@ fnf = 'error: file not found. please provide a path to a really-existing \
def main(): def main():
# if a -c/-C, combine it w infile/URL: # if a -c/-C, combine it w infile/URL:
if args.combine or args.combine_URL: if args.combine or args.combine_URL:
if args.combine: if args.combine:
# get raw text as a string for both: # get raw text as a string for both:
# try: # try:
# infile is URL: # infile is URL:
if args.URL: if args.URL:
text = URL(args.infile) text = URL(args.infile)
@ -140,7 +146,7 @@ def main():
# if -C, combine it w infile/URL: # if -C, combine it w infile/URL:
elif args.combine_URL: elif args.combine_URL:
# try: # try:
# infile is URL: # infile is URL:
if args.URL: if args.URL:
text = URL(args.infile) text = URL(args.infile)
@ -176,7 +182,7 @@ def main():
text = URL(args.infile) text = URL(args.infile)
# or local: # or local:
else: else:
# try: # try:
text = read(args.infile) text = read(args.infile)
# except FileNotFoundError: # except FileNotFoundError:
# print(fnf) # print(fnf)


@ -1,10 +1,4 @@
#! /usr/bin/env python3 #! /usr/bin/env python3
import os
import markovify
import sys
import argparse
""" """
mkv-this-dir: input a directory, output markovified text based on all its text files. mkv-this-dir: input a directory, output markovified text based on all its text files.
Copyright (C) 2020 mousebot@riseup.net. Copyright (C) 2020 mousebot@riseup.net.
@ -27,11 +21,16 @@ import argparse
a (very basic) script to collect all text files in a directory, markovify them and output a user-specified number of sentences to a text file. a (very basic) script to collect all text files in a directory, markovify them and output a user-specified number of sentences to a text file.
""" """
import os
import markovify
import sys
import argparse
def main():
# argparse for cmd line args # argparse
parser = argparse.ArgumentParser() def parse_the_args():
parser = argparse.ArgumentParser(prog="mkv-this-dir", description="markovify all text files in a directory and output the results to a text file.",
epilog="may you find many prophetic énoncés in your virtual bird guts! Here, this is not at all the becomings that are connected... so if you want to edit it like a bot yourself, it is trivial.")
# positional args: # positional args:
parser.add_argument('indir', help="the directory to extract the text of all text files from, with path.") parser.add_argument('indir', help="the directory to extract the text of all text files from, with path.")
@ -51,32 +50,49 @@ def main():
parser.add_argument('--newline', help="sentences in input file end with newlines rather than with full stops.", action='store_true') parser.add_argument('--newline', help="sentences in input file end with newlines rather than with full stops.", action='store_true')
# store_true = default to False, become True if flagged. # store_true = default to False, become True if flagged.
args = parser.parse_args() return parser.parse_args()
# read, build, write fns: # read, build, write fns:
def read(infile): def read(infile):
try:
with open(infile, encoding="utf-8") as f:
return f.read()
except UnicodeDecodeError:
with open(infile, encoding="latin-1") as f: with open(infile, encoding="latin-1") as f:
return f.read() return f.read()
except FileNotFoundError:
print(fnf)
sys.exit()
def mkbtext(texttype): def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size, return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed) well_formed=args.no_well_formed)
def mkbnewline(texttype): def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size, return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed) well_formed=args.no_well_formed)
def writesent(tmodel): def writesentence(tmodel):
return output.write(str(tmodel.make_sentence( for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short:
if args.length:
output.write(str(tmodel.make_short_sentence(
tries=2000, max_overlap_ratio=args.overlap, tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n') max_chars=args.length)) + '\n\n')
# normal:
def writeshortsent(tmodel): else:
return output.write(str(tmodel.make_short_sentence( output.write(str(tmodel.make_sentence(
tries=2000, max_overlap_ratio=args.overlap, tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n') max_chars=args.length)) + '\n\n\n')
sys.exit() output.write(str('*\n\n'))
output.close()
# make args avail to all:
args = parse_the_args()
def main():
#create a list of files to concatenate: #create a list of files to concatenate:
matches = [] matches = []
if os.path.isdir(args.indir) is True: if os.path.isdir(args.indir) is True:
@ -90,9 +106,13 @@ def main():
# place batchfile.txt in user-given directory: # place batchfile.txt in user-given directory:
batchfile = os.path.dirname(args.indir) + os.path.sep + 'batchfile.txt' batchfile = os.path.dirname(args.indir) + os.path.sep + 'batchfile.txt'
# concatenate the files into batchfile.txt: # concatenate files into batchfile.txt:
with open(batchfile, 'w') as outfile: with open(batchfile, 'w') as outfile:
for fname in matches: for fname in matches:
try:
with open(fname, encoding="utf-8") as infile:
outfile.write(infile.read())
except UnicodeDecodeError:
with open(fname, encoding="latin-1") as infile: with open(fname, encoding="latin-1") as infile:
outfile.write(infile.read()) outfile.write(infile.read())
outfile.close() outfile.close()
@ -100,7 +120,7 @@ def main():
# Get raw text from batchfile as string. # Get raw text from batchfile as string.
text = read(batchfile) text = read(batchfile)
# Build the model: # Build model:
# if --newline: # if --newline:
if args.newline: if args.newline:
text_model = mkbnewline(text) text_model = mkbnewline(text)
@ -108,18 +128,7 @@ def main():
else: else:
text_model = mkbtext(text) text_model = mkbtext(text)
# Print -n number of randomly-generated sentences writesentence(text_model)
for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short sentence:
if args.length:
writeshortsent(text_model)
# normal sentence:
else:
writesent(text_model)
output.write(str('*\n\n'))
# add a star between each appended set.
output.close()
os.unlink(batchfile) os.unlink(batchfile)
print('\n: The options you used are as follows:\n') print('\n: The options you used are as follows:\n')
@ -132,5 +141,7 @@ def main():
print('mkv-this ran but did NOT create an output file as requested. this is a very regrettable and dangerous situation. contact the package maintainer asap. soz!') print('mkv-this ran but did NOT create an output file as requested. this is a very regrettable and dangerous situation. contact the package maintainer asap. soz!')
sys.exit() sys.exit()
# enable for testing:
# main() # for testing:
if __name__ == '__main__':
main()


@ -7,7 +7,7 @@ with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
long_description = f.read() long_description = f.read()
setup(name='mkv-this', setup(name='mkv-this',
version='0.1.30', version='0.1.31',
description='cli wrapper for markovify: take a text file, markovify, output the results to a text file.', description='cli wrapper for markovify: take a text file, markovify, output the results to a text file.',
long_description=long_description, long_description=long_description,
long_description_content_type='text/markdown', long_description_content_type='text/markdown',