infile can be URL, added combine_URL, URL as fn

mousebot 2020-04-22 21:05:16 -03:00
parent 2152bdb8b0
commit 003365c3ec
3 changed files with 91 additions and 67 deletions


@ -5,9 +5,9 @@ i wrote this cli wrapper for the `markovify` python module because i wanted its f
i only published it in case someone who actually knows what they're doing felt like picking it up and improving on it. (and to share it with friends.)
maybe this functionality already exists somewhere, but i couldn't find it.
and if you are interested in fixing my amateur code, then by all means!
so if unlike me you are actually a programmer and are interested in expanding/correcting/fixing my non-programmer's code, then by all means!
maybe this functionality already exists somewhere, but i couldn't find it. if it does, pls let me know!
## mkv-this
@ -15,7 +15,7 @@ so if unlike me you are actually a programmer and are interested in expanding/co
a second command, `mkv-this-dir` (see below) allows you to input a directory and it will read all text files within it as the input.
`mkv-this` simply makes some of the features of the excellent `markovify` python module available as a command line tool. it was written by a total novice, so you probably shouldn't download it. i only learned about `argparse` yesterday, and pypi.org today, no matter what day it is. tomorrow i might learn about `os` and `sys`. and then maybe even `cookiecutter`!
`mkv-this` simply makes some of the features of the excellent `markovify` module available as a command line tool. it was written by a total novice, so you probably shouldn't download it. i only learned about `argparse` yesterday, and pypi.org today, no matter what day it is. tomorrow i might learn about `os` and `sys`. and then maybe even `cookiecutter`!
### installing
@ -33,12 +33,12 @@ to do this you need `python3` and `pip`. if you don't have them, install them th
`markovify` is also a dependency, but it should install along with `mkv-this`.
if you get sth like `ModuleNotFoundError: No module named 'modulename'`, just run `pip install modulename` to get it.
if you get sth like `ModuleNotFoundError: No module named 'modulename'`, just run `pip install modulename` to get the missing module.
### repository
if you are reading this on pypi.org, the repo is here:
https://git.disroot.org/mousebot/mkv-this
https://git.disroot.org/mousebot/mkv-this.
### macos
@ -52,16 +52,20 @@ i know nothing about macs so if you ask me for help i'll just send you random co
### options
the script implements a few of the basic `markovify` options, so you can:
the script implements a few of the basic `markovify` options, so you can specify:
* specify output file (default = "./mkv-output.txt")
* specify a maximum sentence length, in characters.
* specify how many sentences to output (default = 5)
* specify state size, i.e. the number of preceding words to be used in calculating the probability of the next word (default = 2).
* specify the amount of (verbatim) overlap allowed between your input text and your output text.
* specify that your text's sentences end with newlines rather than full-stops.
* specify an additional file to use for text input. you can add only one. if you want to feed a stack of files into your bank, use `mkv-this-dir`.
* if a second file is added, you can also specify the relative weight to give to the two files.
* how many sentences to output (default = 5)
* the state size, i.e. the number of preceding words to be used in calculating the probability of the next word (default = 2).
* a maximum sentence length, in characters.
* the amount of (verbatim) overlap allowed between your input text and your output text.
* that your text's sentences end with newlines rather than full-stops.
* an additional file to use for text input. you can add only one. if you want to feed a stack of files into your bank, use `mkv-this-dir`.
* the relative weight to give to the second file if it is used.
as of 0.1.29 you can also specify:
* a URL to a text file online. (you can input something that isn't a text file but the results will be mush.)
* an additional URL to use as text input.
run `mkv-this -h` to see how to use these options.
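
to get a feel for what state size does, here is a toy word-level markov chain in plain python. it is only a sketch and not how `markovify` works internally: `build_chain` and `generate` are made-up names, and there is no sentence splitting, overlap checking or weighting.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return dict(chain)


def generate(chain, length=12, seed=None):
    """Walk the chain from a random starting state, up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    # stop at the requested length, or when the current state has no follower
    while len(output) < length and state in chain:
        output.append(rng.choice(chain[state]))
        state = tuple(output[-len(state):])
    return " ".join(output)


sample = "the cat sat on the mat and the cat ate the rat on the mat"
loose = build_chain(sample, state_size=1)
tight = build_chain(sample, state_size=2)

print(sorted(set(loose[("the",)])))  # every word that ever follows "the"
print(tight[("the", "cat")])         # every word that ever follows "the cat"
print(generate(tight, length=8, seed=1))
```

with a state size of 1, `the` can be followed by `cat`, `mat` or `rat`. with 2, `the cat` only ever leads to `sat` or `ate`, so the output sticks closer to the input.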
@ -69,7 +73,7 @@ run `mkv-this -h` to see how to use these options.
`mkv-this` can only take two files as input material each time. if you want to input a stack of files, use `mkv-this-dir`. it allows you to specify a directory and all text files in it will be used as input material.
if for some reason you want to get similar functionality with `mkv-this`, you can easily concatenate some files yourself in bash, then process them:
if for some reason you want to get similar functionality with `mkv-this`, you can easily concatenate some files yourself from the command line, then process them:
* copy all your text files into a directory
* cd into the directory
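
if you would rather do the joining from python, the same idea can be sketched like this. `concatenate_texts`, the `combined.txt` default and the `*.txt` pattern are all assumptions made for the example, and `latin-1` is used only because that is the encoding the script reads its input with.

```python
from pathlib import Path


def concatenate_texts(directory, outfile="combined.txt", pattern="*.txt"):
    """Join every file matching `pattern` in `directory` into one file."""
    directory = Path(directory)
    target = directory / outfile
    parts = []
    for path in sorted(directory.glob(pattern)):
        if path.resolve() == target.resolve():
            continue  # don't feed an earlier combined file back into itself
        parts.append(path.read_text(encoding="latin-1"))
    target.write_text("\n".join(parts), encoding="latin-1")
    return target
```

the returned path can then be passed to `mkv-this` as the input file.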
@ -83,9 +87,7 @@ you need to input plain text files. currently accepted file extensions are `.txt
### for best results
feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, metadata, stars, bullets, etc.), unless you want mashed versions of those things in your output.
if your input text doesn't use full-stops to mark the ends of sentences, try putting each 'sentence' on a newline, and then write to the maintainer of this package to complain about how that option isn't yet implemented. then the parser won't read your entire file as one big sentence and output nothing.
feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, tags, metadata, stars, bullets, etc.), unless you want mashed versions of those things in your output.
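
a rough sketch of that kind of bulk clean-up, using only the standard `re` module. the patterns and the `tidy` name are assumptions for illustration; they catch plain `http(s)` links, simple html-style tags and leading list bullets, and will miss plenty of other mess.

```python
import re

URL_RE = re.compile(r"https?://\S+")
TAG_RE = re.compile(r"<[^>]+>")
BULLET_RE = re.compile(r"^\s*[*\-•]+\s+", flags=re.MULTILINE)


def tidy(text):
    """Strip URLs, simple HTML-style tags and leading list bullets."""
    text = URL_RE.sub("", text)
    text = TAG_RE.sub("", text)
    text = BULLET_RE.sub("", text)
    # collapse runs of spaces/tabs left behind by the removals
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()


messy = "* see <b>this</b> page: https://example.com/x for more.\n- second point."
print(tidy(messy))
```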
you'll probably want to edit the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself.


@ -27,11 +27,21 @@ import requests
import markovify
import sys
import argparse
import fns
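def URL(insert):
    """Fetch `insert` (a URL) and return the response body as text."""
    try:
        req = requests.get(insert)
        req.raise_for_status()
    except Exception as exc:
        print(f'There was a problem: {exc}')
        sys.exit(1)  # non-zero status so the shell can tell the fetch failed
    else:
        print('text fetched from URL.')
        return req.text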
def main():
# argparse for cmd line args
parser = argparse.ArgumentParser(prog="mkv-this", description="markovify a local or remote text file and output the results to local text file.", epilog="may you have many delightful éoncés!")
parser = argparse.ArgumentParser(prog="mkv-this", description="markovify a local or remote text file and output the results to local text file.", epilog="may you find many prophetic énoncés in your virtual bird guts! Here, this is not at all the becomings that are connected... so if you want to edit it like a bot yourself, it is trivial.")
# positional args:
parser.add_argument('infile', help="the text file to process, with path. NB: file cannot be empty.")
@ -39,61 +49,67 @@ def main():
# optional args:
parser.add_argument('-s', '--state-size', help="the number of preceding words used to calculate the probability of the next word. defaults to 2, 1 makes it more random, 3 less so. must be an integer. anything more than 4 will likely have little effect.", type=int, default=2)
# if i use --state-size (w a dash), type=int doesn't work.
parser.add_argument('-u', '--URL', help="infile is a URL. NB: for this to work best it should be the location of a text file.", action='store_true')
parser.add_argument('-n', '--sentences', help="the number of 'sentences' to output. defaults to 5. must be an integer.", type=int, default=5)
parser.add_argument('-l', '--length', help="set maximum number of characters per sentence. must be an integer.", type=int)
parser.add_argument('-o', '--overlap', help="the amount of overlap allowed between original text and the output, expressed as a ratio between 0 and 1. defaults to 0.5", type=float, default=0.5)
parser.add_argument('-c', '--combine', help="provide another input text file with path to be combined with the first item.")
parser.add_argument('-C', '--combine-URL', help="provide an additional URL to be combined with the first item")
parser.add_argument('-w', '--weight', help="specify the weight to be given to the second text provided with --combine. defaults to 1, and the weight of the initial text is also 1. setting this to 1.5 will place 50 percent more weight on the second text, while setting it to 0.5 will place less.", type=float, default=1)
#switches
parser.add_argument('-f', '--no-well-formed', help="don't enforce 'well_formed', ie allow the inclusion of sentences with []{}()\"\"'' in them in the markov model. this might filth up your text, especially if it contains 'smart' quotes.", action='store_false') # store_false = default to True.
parser.add_argument('--newline', help="sentences in input file end with newlines rather than with full stops.", action='store_true')
# store_true = default to False, become True if flagged.
parser.add_argument('-c', '--combine', help="provide another input text file with path to be combined with the first.")
parser.add_argument('-C', '--combine-URL', help="provide an additional URL to be combined with the first")
parser.add_argument('-w', '--weight', help="specify the weight to be given to the second text provided with --combine. defaults to 1, and the weight of the initial text is also 1. setting this to 1.5 will place 50 percent more weight on the second text, while setting it to 0.5 will place less.", type=float, default=1)
args = parser.parse_args()
fnf = 'error: file not found. please provide a path to a really-existing file!'
# if a combine file is provided, combine it w infile/URL:
if args.combine :
# if a combine file is provided, we will combine it w infile/URL:
if args.combine or args.combine_URL:
if args.combine:
# get raw text as a string for both files:
try:
# infile can be a URL:
if args.URL:
text = fns.URL(args.infile)
# or normal file:
else:
with open(args.infile, encoding="latin-1") as f:
text = f.read()
except FileNotFoundError:
print(fnf)
sys.exit()
with open(args.combine, encoding="latin-1") as cf:
ctext = cf.read()
# if combine URL is provided, combine it w infile/URL:
elif args.combine_URL :
try:
# infile can still be a URL:
if args.URL:
text = fns.URL(args.infile)
# or normal file:
else:
with open(args.infile, encoding="latin-1") as f:
text = f.read()
except FileNotFoundError:
print(fnf)
sys.exit()
# now combine as URL:
ctext = fns.URL(args.combine_URL)
try:
# infile can be a URL:
if args.URL:
text = URL(args.infile)
# or normal file:
else:
with open(args.infile, encoding="latin-1") as f:
text = f.read()
# read combine file:
with open(args.combine, encoding="latin-1") as cf:
ctext = cf.read()
except FileNotFoundError:
print(fnf)
sys.exit()
# if combine_URL is provided, we will combine it w infile/URL:
elif args.combine_URL:
try:
# infile can still be a URL:
if args.URL:
text = URL(args.infile)
# or normal file:
else:
with open(args.infile, encoding="latin-1") as f:
text = f.read()
except FileNotFoundError:
print(fnf)
sys.exit()
# now combine_URL:
ctext = URL(args.combine_URL)
# build the models and build a combined model:
# NB: attempting to implement Newline option here (and below):
# with newline flagged:
if args.newline :
text_model = markovify.NewlineText(
text, state_size=args.state_size, well_formed=args.no_well_formed)
ctext_model = markovify.NewlineText(
ctext, state_size=args.state_size, well_formed=args.no_well_formed)
# no newline flag:
else:
text_model = markovify.Text(text,
state_size=args.state_size, well_formed=args.no_well_formed)
@ -105,9 +121,11 @@ def main():
# Print -n number of randomly-generated sentences
for i in range(args.sentences):
output = open(args.outfile, 'a') # appending
# short sentence:
if args.length :
output.write(str(combo_model.make_short_sentence(
args.length, tries=2000, max_overlap_ratio=args.overlap)) + '\n \n')
# normal sentence:
else:
output.write(str(combo_model.make_sentence(
tries=2000, max_overlap_ratio=args.overlap)) + '\n \n')
@ -116,14 +134,12 @@ def main():
# add a star between each appended set.
output.close()
# if no combo file, just do normal:
else:
# Get raw text as string.
# either from a URL:
if args.URL:
text = fns.URL(args.infile)
# text is what fn returns, ie = req.text.
text = URL(args.infile)
# or a normal local file:
else:
try:
@ -136,10 +152,11 @@ def main():
# Build the model:
# NB: this errors if infile is EMPTY:
## newline option:
## newline flagged:
if args.newline :
text_model = markovify.NewlineText(text,
state_size=args.state_size, well_formed=args.no_well_formed)
# no newline flag:
else:
text_model = markovify.Text(text,
state_size=args.state_size, well_formed=args.no_well_formed)
@ -147,9 +164,11 @@ def main():
# Print -n number of randomly-generated sentences
for i in range(args.sentences):
output = open(args.outfile, 'a') # append to file
# short sentence:
if args.length :
output.write(str(text_model.make_short_sentence(
args.length, tries=2000, max_overlap_ratio=args.overlap)) + '\n \n')
# normal sentence:
else:
output.write(str(text_model.make_sentence(
tries=2000, max_overlap_ratio=args.overlap)) + '\n \n')
@ -158,9 +177,12 @@ def main():
# add a star between each appended set.
output.close()
print('\n The options you used were as follows:\n')
print(vars(args))
print('\n: literary genius has been written to the file ' + args.outfile + '. thanks for playing!')
print('\n: The options you used are as follows:\n')
for key, value in vars(args).items():
print(key, ': ', value)
print('\n: literary genius has been written to the file ' + args.outfile + '. thanks for playing! \n\n: Here, this is not at all the becomings that are connected... so if you want to edit it like a bot yourself, it is trivial. Yes, although your very smile suggests that this Armenian enclave is not at all the becomings that are connected...')
# sys.exit()
main()
sys.exit()
# enable this for testing the file:
# main()
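
The two combine branches above repeat the same "URL or local file" read for `infile`. One possible follow-up is to pull that into a helper. This is a sketch under assumptions, not part of the commit: `read_source` and `load_texts` are hypothetical names, and `urllib.request` stands in for the `requests`-based `URL()` function so the example needs only the standard library.

```python
import sys
import urllib.request


def read_source(location, is_url=False, encoding="latin-1"):
    """Return the text at `location`, which is a URL or a local path."""
    if is_url:
        # stand-in for the requests-based URL() helper in the diff
        with urllib.request.urlopen(location) as response:
            return response.read().decode(encoding)
    with open(location, encoding=encoding) as f:
        return f.read()


def load_texts(infile, infile_is_url=False, combine=None, combine_url=None):
    """Read the main text plus an optional second text (file or URL)."""
    try:
        text = read_source(infile, is_url=infile_is_url)
        if combine:
            ctext = read_source(combine)
        elif combine_url:
            ctext = read_source(combine_url, is_url=True)
        else:
            ctext = None
    except FileNotFoundError:
        print('error: file not found. please provide a path to a really-existing file!')
        sys.exit(1)
    return text, ctext
```

With something like this, `main()` could call `load_texts(args.infile, args.URL, args.combine, args.combine_URL)` once and only branch on whether the second text is `None`.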


@ -7,7 +7,7 @@ with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
long_description = f.read()
setup(name='mkv-this',
version='0.1.28',
version='0.1.29',
description='cli wrapper for markovify: take a text file, markovify, output the results to a text file.',
long_description=long_description,
long_description_content_type='text/markdown',