Christopher Lane 319f6d3abb Add gif to readme 2023-12-09 20:18:00 -05:00

csschooser

Demo

[Animated demo: csschooser_demo_2023-12-09 20-12.gif]

Description:

An interactive CLI tool for choosing CSS selectors for a web page. Designed for use as a library with BeautifulSoup and Scrapy.

This project uses the BeautifulSoup and rich libraries to create an interactive element-selecting experience. It can be run as a program or used as a library.

Created as a final project for the CS50P course.

Prerequisites

This project was made using Python 3.10.12 and pip 22.0.2. See requirements.txt for module information.

Installation

Using Git:

git clone https://github.com/Makaze/csschooser.git
cd csschooser
pip install -r requirements.txt

Usage

On the Command Line:

$ python3 csschooser.py

As a Library:

Example using the BeautifulSoup library to print the text from all matching elements:

import csschooser

soup = csschooser.get_soup("http://github.com/Makaze/csschooser")  # Example URL

selector = csschooser.interactive_select(soup)

for tag in soup.select(selector):
    print(tag.get_text().strip())

API / Documentation

get_soup(name):

Takes in a string name and returns a BeautifulSoup instance based on the contents of the file or URL named name. Raises a FileNotFoundError if name is neither a valid URL nor a valid file name.
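
The URL-or-file contract described above can be illustrated with a small standard-library sketch. The helper name classify_source and the rule that only http(s) strings with a host count as URLs are assumptions made for illustration; csschooser's own checks may differ.

```python
import os
from urllib.parse import urlparse


def classify_source(name: str) -> str:
    """Return "url" or "file" for `name`, or raise FileNotFoundError.

    Hypothetical helper, not part of csschooser: it only mirrors the
    documented behaviour of get_soup(name) for invalid names.
    """
    parsed = urlparse(name)
    # Assumption: a "valid URL" means an http(s) scheme plus a host.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Otherwise the name must point at an existing regular file.
    if os.path.isfile(name):
        return "file"
    raise FileNotFoundError(
        f"{name!r} is neither a valid URL nor an existing file"
    )
```

A caller of get_soup can wrap the call in the same try/except FileNotFoundError pattern to report a bad path or URL instead of crashing.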

get_regex(s):

Takes in a string s and returns a Regular Expression pattern as a string for matching the outermost element in s. Returns s unchanged if it contains no elements.

interactive_select(soup):

Takes in soup as a BeautifulSoup instance and prompts the user to enter a CSS selector. Matching elements are highlighted in an auto-scrolling output window. Clears the terminal screen and returns the last chosen selector when the user follows the prompt to exit.
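
The "return the last chosen selector on exit" behaviour can be pictured as a simple loop. This is only a sketch: the function select_loop, its count_matches callback, and the use of the word "exit" to quit are all assumptions, and the real function also handles highlighting and scrolling, which are omitted here.

```python
from typing import Callable, Iterable, Optional


def select_loop(
    entries: Iterable[str],
    count_matches: Callable[[str], int],
    exit_word: str = "exit",  # assumed exit command, not confirmed
) -> Optional[str]:
    """Return the last selector entered before `exit_word` is seen."""
    last = None
    for entry in entries:
        entry = entry.strip()
        if entry == exit_word:
            break  # the user asked to leave the prompt
        if not entry:
            continue  # ignore blank input and prompt again
        # Stand-in for highlighting the matching elements.
        print(f"{entry!r} matched {count_matches(entry)} element(s)")
        last = entry
    return last
```

Passing the inputs as an iterable keeps the loop testable; an interactive caller could supply iter(input, None) style input instead of a fixed list.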

clear(lines):

Takes in an int lines. If lines is at least 1, moves the cursor up one line and to the end of that line, repeated lines times, and returns the resulting backtrack sequence as a string. Otherwise, runs the system's clear command to clear the terminal screen and returns False.
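
A rough sketch of the two branches follows. The exact escape codes csschooser emits are not stated here, so this version assumes the common ANSI pair "cursor up one line" plus "erase line", which is a substitute for the cursor movement described above; the name clear_sketch is hypothetical.

```python
import os
import sys

CURSOR_UP = "\x1b[1A"   # ANSI CSI sequence: move the cursor up one line
ERASE_LINE = "\x1b[2K"  # ANSI CSI sequence: erase the whole current line


def clear_sketch(lines: int):
    """Backtrack over `lines` lines, or clear the whole screen.

    Assumed escape codes; not csschooser's actual implementation.
    """
    if lines >= 1:
        sequence = (CURSOR_UP + ERASE_LINE) * lines
        sys.stdout.write(sequence)
        sys.stdout.flush()
        return sequence
    # Fall back to the platform's clear command ("cls" on Windows).
    os.system("cls" if os.name == "nt" else "clear")
    return False
```

Returning the sequence as a string lets the caller embed it in later output, while the False return signals that the full screen was cleared instead.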

paginate(console, pretty):

Takes in console as a rich.Console instance and pretty as a string, then passes pretty to the console and sends the resulting rich string to the system's pager utility (less on Linux systems).