fns for -dir too + readme

parent 9f397a6f9b
commit d5ebb2906a

56 README.md

@@ -3,9 +3,9 @@

i wrote this cli rapper for the `markovify` python module because i wanted its features to be available as a cli tool.

i only published it in case someone who actually knows what they're doing felt like picking it up and improving on it. (and to share it with friends.)
i only published it to share with friends.

and if you are interested in fixing my amateur code, then by all means!
& in case a programmer felt like picking it up and improving on it. so if you are interested in fixing amateur code, then by all means!

maybe this functionality already exists somewhere, but i couldn't find it. if it does, pls let me know!

@@ -27,54 +27,44 @@ or

`pip install mkv-this`

to do this you need `python3` and `pip`. if you don't have them, install them through your system's package manager. on debian (+ derivatives), for example, you'd run:
to do this you need `python3` and `pip`. install them through your system's package manager. on debian (+ derivatives), for example, you'd run:

`sudo apt install python3 python3-pip`

`markovify` is also a dependency, but it should install along with `mkv-this`.
`markovify` is a dependency, but should install along with `mkv-this`.

if you get sth like `ModuleNotFound error: No module named 'modulename'`, just run `pip install modulename` to get the missing module.

### macos

it seems to run on macos too.

you may already have python installed. if not, you first need to install [homebrew](https://brew.sh/#install), edit your PATH so that it works, then install `python3` with `brew install python3`. if you are already running an old version of `homebrew` you might need to run `brew install python3 && brew postinstall python3` to get `python3` and `pip` running right.

you can check if `pip` is installed with `pip --version`, or `pip3 --version`.

i know nothing about macs so if you ask me for help i'll just send you random copypasta from the interwebs.
if you get sth like `ModuleNotFound error: No module named '$modulename'`, just run `pip install $modulename` to get the missing module.

### options

the script implements a few of the basic `markovify` options, so you can specify:
the script implements a number of the basic `markovify` options, so you can specify:

* how many sentences to output (default = 5)
* the state size, i.e. the number of preceeding words to be used in calculating the probability of the next word (default = 2).
* how many sentences to output (default = 5).
* the state size, i.e. the number of preceding words to be used in calculating the choice of the next word (default = 2).
* a maximum sentence length, in characters.
* the amount of (verbatim) overlap allowed between your input text and your output text.
* that your text's sentences end with newlines rather than full-stops.
* the amount of (verbatim) overlap allowed between input and output.
* if your text's sentences end with newlines rather than full-stops.
* an additional file to use for text input. you can add only one. if you want to feed a stack of files into your bank, use `mkv-this-dir`.
* the relative weight to give to the second file if it is used.
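
here is a minimal sketch of what the state size does, assuming a plain word-level chain. every run of `state_size` preceding words becomes a key, and the words that followed it are counted. generation then picks each next word in proportion to those counts. this is not `markovify`'s own code, which also splits sentences, runs `well_formed` checks and rejects output that overlaps the input too much. `build_chain`, `generate` and the `__BEGIN__`/`__END__` markers are hypothetical names.

```python
import random
from collections import Counter, defaultdict

# hypothetical sentinels marking where a sentence starts and ends
BEGIN = "__BEGIN__"
END = "__END__"


def build_chain(sentences, state_size=2):
    """Map each tuple of `state_size` preceding words to a Counter of next words."""
    chain = defaultdict(Counter)
    for sentence in sentences:
        # pad the start so the first real words also have a full-size state
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state][words[i + state_size]] += 1
    return chain


def generate(chain, state_size=2, rng=random):
    """Walk the chain from the start state, picking each next word by its count."""
    state = (BEGIN,) * state_size
    output = []
    while True:
        followers = chain[state]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == END:
            return " ".join(output)
        output.append(word)
        # slide the window: drop the oldest word, remember the new one
        state = state[1:] + (word,)


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the cat ate the rat"]
    print(generate(build_chain(corpus, state_size=2), state_size=2))
    print(generate(build_chain(corpus, state_size=1), state_size=1))
```

with `state_size=2` this corpus can only give back its two input sentences, because every two-word state has a single possible follower except `("the", "cat")`. with `state_size=1` only "the" is remembered after "on", so it can also produce "the cat sat on the rat". bigger state sizes stay closer to the input, and smaller ones mash it up more.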

as of 0.1.29 you can also specify:

* a URL to a text file online. (you can input something that isn't a text file but the results will be mush.)
* a URL to a text file online. (you can input something that isn't a text file but the results will be mush or the programme will crash.)
* an additional URL to use as text input.

run `mkv-this -h` to see how to use these options.
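
the weight option listed above controls how much the second input counts. markovify provides `markovify.combine(models, weights)` for this, and the script diff further down passes `[1, args.weight]` when merging the two models. the sketch below only illustrates the idea on plain `{state: Counter}` chains: it multiplies each chain's counts by its weight before adding them. `combine_chains` is a hypothetical helper, not markovify's implementation.

```python
from collections import Counter


def combine_chains(chains, weights=None):
    """Merge several {state: Counter(next_word -> count)} chains.

    Hypothetical helper: each chain's counts are multiplied by its weight
    before summing, so a chain with weight 3 counts three times as much as
    identical counts from a chain with weight 1.
    """
    if weights is None:
        weights = [1] * len(chains)
    if len(weights) != len(chains):
        raise ValueError("need exactly one weight per chain")
    combined = {}
    for chain, weight in zip(chains, weights):
        for state, followers in chain.items():
            # fresh Counter per state, so the input chains are left untouched
            bucket = combined.setdefault(state, Counter())
            for word, count in followers.items():
                bucket[word] += count * weight
    return combined


if __name__ == "__main__":
    first = {("the",): Counter({"cat": 1})}
    second = {("the",): Counter({"dog": 1})}
    print(combine_chains([first, second], [1, 3]))
```

with weights `[1, 3]`, a follower seen once in each input ends up three times as likely to come from the second one.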

### mkv-this-dir: markovify a directory of text files

`mkv-this` can only take two files as input material each time. if you want to input a stack of files, use `mkv-this-dir`. it allows you to specify a directory and all text files in it will be used as input material.
`mkv-this` can only take two files as input. if you want to input a stack of files, use `mkv-this-dir`. specify a directory and all text files in it will be used as input.

if for some reason you want to get a similar funtionality with `mkv-this`, you can easily concatenate some files yourself from the command line, then process them:
if for some reason you want to get a similar funtionality with `mkv-this`, you can easily concatenate files yourself from the command line, then process them:

* copy all your text files into a directory
* cd into the directory
* run `cat * > outputfile.txt`
* run mkv-this on your newly created file: `mkv-this outputfile.txt`
* this approach has the benefit of creating a file with encoding that mkv-this can certainly handle.
* if `mkv-this-dir` returns lots of chars that don't display because they it can't read the encoding, try this out instead.
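
the concatenate-then-run steps above can also be done in python. this sketch reads every matching file in a directory and decodes it as UTF-8, falling back to Latin-1 (the same order the new `mkv-this-dir` code in this commit tries). it then joins the texts. some details are assumptions: the `.txt`-only default, the sorted order, and joining with a newline so the last sentence of one file isn't glued to the first of the next. `read_text_lenient` and `concatenate_dir` are hypothetical names.

```python
import os


def read_text_lenient(path):
    """Decode a file as UTF-8, falling back to Latin-1 if that fails.

    Latin-1 maps every byte to a character, so the fallback never raises,
    although text in some other encoding may come out garbled.
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def concatenate_dir(directory, extensions=(".txt",)):
    """Return the text of every matching file in `directory`, in sorted name order.

    `extensions` is an assumed default; adjust it to the file types you use.
    """
    parts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name.lower().endswith(extensions):
            parts.append(read_text_lenient(path))
    # newline between files so sentences from different files don't merge
    return "\n".join(parts)
```

unlike `cat * > outputfile.txt`, this never writes a temporary file. the combined string can go straight into the model builder.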

### file types

@@ -82,10 +72,24 @@ you need to input plain text files. currently accepted file extensions are `.txt

### for best results

feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, tags, metadata, stars, bullets, etc.), unless you want mashed versions of those things in your output.
feed `mkv-this` large-ish amounts of well punctuated text. it works best if you bulk replace/remove as much mess as possible (URLs, code, HTML tags, metadata, stars, bullets, lines, etc.), unless you want mashed versions of those things in your output.

you’ll probably want to edit the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself.
you’ll probably want to edit or select things from the output. it is very much supposed to be a kind of raw material rather than print-ready boilerplate bosh, although many bots are happily publishing such output directly. you might find that it prompts you to edit it like a bot yourself.

for a few further tips, see https://github.com/jsvine/markovify#basic-usage.

happy zaning.

### macos

it seems to run on macos too.

you may already have python installed. if not, you first need to install [homebrew](https://brew.sh/#install), edit your PATH so that it works, then install `python3` with `brew install python3`. if you are already running an old version of `homebrew` you might need to run `brew install python3 && brew postinstall python3` to get `python3` and `pip` running right.

i know nothing about macs so if you ask me for help i'll just send you random copypasta from the interwebs.

### todo

* option to also append input model to a saved JSON file. (i.e. `text_model.to_json()`, `markovify.Text.from_json()`)
* maybe some copy in some basic webscraping boilerplate code.
* learn how to programme.
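
for the JSON todo item, markovify's `to_json()` and `from_json()` handle real models. as a stdlib-only sketch of the idea: a chain keyed by word tuples can't be dumped as a JSON object directly, because JSON object keys must be strings. so it is stored as a list of `[state, followers]` pairs. "appending" a new input then means loading the saved counts and adding the new ones. `chain_to_json`, `chain_from_json` and `append_to_saved` are hypothetical helpers that work on plain `{tuple: Counter}` chains, not on markovify objects.

```python
import json
from collections import Counter


def chain_to_json(chain):
    """Serialise {tuple_state: Counter} as a JSON list of [state, followers] pairs.

    JSON object keys must be strings, so each tuple state is stored as a list.
    """
    return json.dumps(
        [[list(state), dict(followers)] for state, followers in chain.items()]
    )


def chain_from_json(data):
    """Rebuild the chain, turning each stored state list back into a tuple."""
    return {tuple(state): Counter(followers) for state, followers in json.loads(data)}


def append_to_saved(path, new_chain):
    """Add new_chain's counts to the chain saved at `path` (created if missing)."""
    try:
        with open(path, encoding="utf-8") as f:
            saved = chain_from_json(f.read())
    except FileNotFoundError:
        saved = {}
    for state, followers in new_chain.items():
        # Counter.update adds counts rather than replacing them
        saved.setdefault(state, Counter()).update(followers)
    with open(path, "w", encoding="utf-8") as f:
        f.write(chain_to_json(saved))
    return saved
```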

@@ -29,7 +29,6 @@ import markovify
import sys
import argparse


# argparse
def parse_the_args():
parser = argparse.ArgumentParser(prog="mkv-this", description="markovify one or two local or remote text files and output the results to a local text file.",

@@ -66,8 +65,10 @@ def parse_the_args():
# store_true = default to False, become True if flagged.

return parser.parse_args()


# read/build/write fns:


def URL(insert):
try:
req = requests.get(insert)

@@ -79,6 +80,7 @@ def URL(insert):
print('text fetched from URL.')
return req.text


def read(infile):
try:
with open(infile, encoding="utf-8") as f:

@@ -89,31 +91,35 @@ def read(infile):
except FileNotFoundError:
print(fnf)
sys.exit()



def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)


def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)



def writesentence(tmodel):
for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short:
if args.length:
if args.length:
output.write(str(tmodel.make_short_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
# normal:
else:
output.write(str(tmodel.make_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
output.write(str('*\n\n'))
output.close()


# make args + fnf avail to all:
args = parse_the_args()
fnf = 'error: file not found. please provide a path to a really-existing \

@@ -121,27 +127,27 @@ fnf = 'error: file not found. please provide a path to a really-existing \


def main():
# if a -c/-C, combine it w infile/URL:
# if a -c/-C, combine it w infile/URL:
if args.combine or args.combine_URL:
if args.combine:
# get raw text as a string for both:
# try:
# infile is URL:
if args.URL:
text = URL(args.infile)
# or normal:
else:
text = read(args.infile)
# read -c file:
ctext = read(args.combine)
# try:
# infile is URL:
if args.URL:
text = URL(args.infile)
# or normal:
else:
text = read(args.infile)
# read -c file:
ctext = read(args.combine)
# except FileNotFoundError:
# print(fnf)
# sys.exit()

# if -C, combine it w infile/URL:
elif args.combine_URL:
# try:
# infile is URL:
# try:
# infile is URL:
if args.URL:
text = URL(args.infile)
# or normal:

@@ -167,7 +173,7 @@ def main():
[text_model, ctext_model], [1, args.weight])

writesentence(combo_model)


# if no -c/-C, do normal:
else:
# Get raw text as string.

@@ -176,7 +182,7 @@ def main():
text = URL(args.infile)
# or local:
else:
# try:
# try:
text = read(args.infile)
# except FileNotFoundError:
# print(fnf)

@@ -188,8 +194,8 @@ def main():
text_model = mkbnewline(text)
# no --newline:
else:
text_model = mkbtext(text)

text_model = mkbtext(text)

writesentence(text_model)

print('\n: The options you used are as follows:\n')
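
the `writesentence` function in this diff has two problems. it opens `args.outfile` on every pass through the loop but only explicitly closes the last handle. and `str(...)` writes the literal text `None` whenever markovify can't build a sentence within `tries`, because `make_sentence` and `make_short_sentence` return `None` in that case. below is a possible rewrite. it assumes the settings are passed in as parameters instead of being read from the global `args`, opens the file once with `with`, and skips failed sentences. `write_sentences` and its parameter names are hypothetical.

```python
def write_sentences(model, outfile, count, overlap, max_chars=None, tries=2000):
    """Append up to `count` generated sentences to `outfile`, then a '*' divider.

    Hypothetical replacement for writesentence(): settings are passed in rather
    than read from the global `args`, the file is opened once, and sentences the
    model fails to build (None) are skipped instead of written as 'None'.
    Returns the number of sentences actually written.
    """
    written = 0
    with open(outfile, "a", encoding="utf-8") as output:  # append, closed automatically
        for _ in range(count):
            if max_chars:
                # markovify's make_short_sentence takes the character limit first
                sentence = model.make_short_sentence(
                    max_chars, tries=tries, max_overlap_ratio=overlap)
            else:
                sentence = model.make_sentence(
                    tries=tries, max_overlap_ratio=overlap)
            if sentence is None:
                continue
            output.write(sentence + "\n\n")
            written += 1
        output.write("*\n\n")  # divider between appended runs, as in the original
    return written
```

in the script this would be called as something like `write_sentences(text_model, args.outfile, args.sentences, args.overlap, args.length)`. the same function would also cover the `writesent`/`writeshortsent` pair in `mkv-this-dir`.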

@@ -1,10 +1,4 @@
#! /usr/bin/env python3

import os
import markovify
import sys
import argparse

"""
mkv-this-dir: input a directory, output markovified text based on all its text files.
Copyright (C) 2020 mousebot@riseup.net.

@@ -27,11 +21,16 @@ import argparse
a (very basic) script to collect all text files in a directory, markovify them and output a user-specified number of sentences to a text file.
"""

import os
import markovify
import sys
import argparse

def main():

# argparse for cmd line args
parser = argparse.ArgumentParser()

# argparse
def parse_the_args():
parser = argparse.ArgumentParser(prog="mkv-this-dir", description="markovify all text files in a director and output the results to a text file.",
epilog="may you find many prophetic énoncés in your virtual bird guts! Here, this is not at all the becomings that are connected... so if you want to edit it like a bot yourself, it is trivial.")

# positional args:
parser.add_argument('indir', help="the directory to extract the text of all text files from, with path.")

@@ -51,32 +50,49 @@ def main():
parser.add_argument('--newline', help="sentences in input file end with newlines rather than with full stops.", action='store_true')
# store_true = default to False, become True if flagged.

args = parser.parse_args()
return parser.parse_args()

# read, build, write fns:
def read(infile):
# read, build, write fns:
def read(infile):
try:
with open(infile, encoding="utf-8") as f:
return f.read()
except UnicodeDecodeError:
with open(infile, encoding="latin-1") as f:
return f.read()

def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)

def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)

def writesent(tmodel):
return output.write(str(tmodel.make_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')

def writeshortsent(tmodel):
return output.write(str(tmodel.make_short_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
except FileNotFoundError:
print(fnf)
sys.exit()

def mkbtext(texttype):
return markovify.Text(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)

def mkbnewline(texttype):
return markovify.NewlineText(texttype, state_size=args.state_size,
well_formed=args.no_well_formed)

def writesentence(tmodel):
for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short:
if args.length:
output.write(str(tmodel.make_short_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n')
# normal:
else:
output.write(str(tmodel.make_sentence(
tries=2000, max_overlap_ratio=args.overlap,
max_chars=args.length)) + '\n\n\n')
output.write(str('*\n\n'))
output.close()


# make args avail to all:
args = parse_the_args()

def main():
#create a list of files to concatenate:
matches = []
if os.path.isdir(args.indir) is True:

@@ -90,17 +106,21 @@ def main():
# place batchfile.txt in user-given directory:
batchfile = os.path.dirname(args.indir) + os.path.sep + 'batchfile.txt'

# concatenate the files into batchfile.txt:
# concatenate files into batchfile.txt:
with open(batchfile, 'w') as outfile:
for fname in matches:
with open(fname, encoding="latin-1") as infile:
outfile.write(infile.read())
try:
with open(fname, encoding="utf-8") as infile:
outfile.write(infile.read())
except UnicodeDecodeError:
with open(fname, encoding="latin-1") as infile:
outfile.write(infile.read())
outfile.close()

# Get raw text from batchfile as string.
text = read(batchfile)

# Build the model:
# Build model:
# if --newline:
if args.newline:
text_model = mkbnewline(text)

@@ -108,20 +128,9 @@ def main():
else:
text_model = mkbtext(text)

# Print -n number of randomly-generated sentences
for i in range(args.sentences):
output = open(args.outfile, 'a') # append
# short sentence:
if args.length:
writeshortsent(text_model)
# normal sentence:
else:
writesent(text_model)
output.write(str('*\n\n'))
# add a star between each appended set.
output.close()
writesentence(text_model)
os.unlink(batchfile)


print('\n: The options you used are as follows:\n')
for key, value in vars(args).items():
print(': ' + key.ljust(15, ' ') + ': ' + str(value).ljust(10))

@@ -132,5 +141,7 @@ def main():
print('mkv-this ran but did NOT create an output file as requested. this is a very regrettable and dangerous situation. contact the package maintainer asap. soz!')

sys.exit()
# enable for testing:
# main()

# for testing:
if __name__ == '__main__':
main()
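
this commit builds the batch file path as `os.path.dirname(args.indir) + os.path.sep + 'batchfile.txt'`. `dirname` drops the last path component, so the file only lands inside the given directory when the path ends with a slash. for `texts` it becomes `/batchfile.txt` (the filesystem root), and for `/home/me/texts` it goes into `/home/me`. the sketch below compares that expression with a join. it uses `posixpath` so the output doesn't depend on the OS, and the example paths are made up.

```python
import posixpath  # POSIX flavour of os.path, so results match on every OS


def batchfile_original(indir):
    """The expression used in this commit, with POSIX separators."""
    return posixpath.dirname(indir) + posixpath.sep + "batchfile.txt"


def batchfile_joined(indir):
    """Join onto the directory itself, with or without a trailing slash."""
    return posixpath.join(indir, "batchfile.txt")


# hypothetical example paths
for indir in ["texts/", "texts", "/home/me/texts"]:
    print(f"{indir!r:18} original: {batchfile_original(indir):30} joined: {batchfile_joined(indir)}")
```

in the script itself, `os.path.join(args.indir, 'batchfile.txt')` would do the same with the platform's own separator.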

2 setup.py

@@ -7,7 +7,7 @@ with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
long_description = f.read()

setup(name='mkv-this',
version='0.1.30',
version='0.1.31',
description='cli wrapper for markovify: take a text file, markovify, output the results to a text file.',
long_description=long_description,
long_description_content_type='text/markdown',