Merge pull request #1 from guiraldelli/appscipt

From Python to Google Apps Script
R. H. Gracini Guiraldelli 2015-08-30 13:45:56 +02:00
commit e38c924f74
11 changed files with 1435 additions and 138 deletions

20
LICENSE.txt Normal file

@ -0,0 +1,20 @@
The MIT License (MIT)
Copyright (c) 2015 Ricardo H. Gracini Guiraldelli
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

5
README

@ -1,5 +0,0 @@
cfp-bot is an app that processes "Call For Papers" e-mails and automatically adds the submission deadline dates to Google Calendar.
This app uses some third-party libraries, such as:
- parsedatetime [ http://code.google.com/p/parsedatetime/ ]
- Google Data APIs [ http://code.google.com/p/gdata-python-client/downloads/list ]

52
README.md Normal file

@ -0,0 +1,52 @@
# "Call For Papers" Bot
This *bot* has been developed as an automatic processor of *"call for papers"*
(CFP) emails received in academic circles.
It was developed with the needs of the
[Adaptive Technologies Lab](http://lta.poli.usp.br/) in mind by one of its
(former) students (me), not as a final product for widespread use. However,
anyone is free to try it at their own risk and, even better, to contribute
improvements to the existing code.
## License
So far, it is still licensed under
[MIT License](https://www.tldrlegal.com/l/mit). Maybe, someday, it will be
even more open.
## Acknowledgments
We use the [Datejs](http://www.datejs.com/) library to parse the many ways
humans write dates. The library is licensed under
the [MIT License](https://www.tldrlegal.com/l/mit).
## The Old Version
This repository was originally "the home" of the old version, written in Python.
If you are used to GitHub and git, feel free to browse its history and learn from it.
Nonetheless...
## The New Version
The current (new) version of this bot is a port of the old code from Python to
JavaScript (or, more precisely,
[Google Apps Script](https://developers.google.com/apps-script/)).
Why? Very simple: the bot relies entirely on Google products, so we can use
Google Apps Script to trigger it automatically, handing Google the
responsibility of maintaining our `cron` jobs. Simple as that.
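A minimal sketch of how that scheduling could look, assuming an hourly time-driven trigger on `main()`. The helper name `install_hourly_trigger` and the hourly interval are hypothetical choices, not part of this repository. It relies on the Apps Script calls `ScriptApp.getProjectTriggers()`, `Trigger.getHandlerFunction()` and `ScriptApp.newTrigger(...).timeBased().everyHours(n).create()`; `ScriptApp` is passed in as a parameter only so the helper can be tried with a stub outside Apps Script.

```javascript
// Hypothetical helper: schedules main() to run every hour via a time-driven
// trigger. Inside Apps Script, call it once as install_hourly_trigger(ScriptApp);
// the service is a parameter only so a stub can stand in for it in tests.
function install_hourly_trigger(scriptApp) {
  // Reuse an existing trigger for main() instead of installing a duplicate.
  var existing = scriptApp.getProjectTriggers().filter(function (trigger) {
    return trigger.getHandlerFunction() === "main";
  });
  if (existing.length > 0) {
    return existing[0];
  }
  // Time-driven trigger: Apps Script calls main() roughly once per hour.
  return scriptApp.newTrigger("main").timeBased().everyHours(1).create();
}
```

Running it twice leaves a single trigger in place. The same schedule can also be created by hand from the triggers page of the Apps Script editor.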
### It is **not** perfect
I know it is not a perfect bot, but it does an amazing job in a very simple way.
Could I have used machine learning? Yes.
Could I have used neural networks? Yes.
Could I have used Bayesian networks? Yes.
Could I have used [mechanical turks](https://en.wikipedia.org/wiki/The_Turk)?
Yes, I could.
But, believe me, it was a one-night (or two-night) project to solve a problem
that was driving us crazy: inboxes full of CFP emails.
I am very interested in using more intelligent approaches to solve this problem,
but the probability that it will happen tends to **zero**.
Anyhow, your collaboration, forks or anything else are welcome! :smile:

5
calendar_processor.js Normal file

@ -0,0 +1,5 @@
// connects to Google Calendar and creates an all-day event in the default
// calendar of the account
function create_event(title, date){
  return CalendarApp.getDefaultCalendar().createAllDayEvent(title, date);
}
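`process_email()` checks the result of `create_event()` against `null`. Assuming that `createAllDayEvent()` reports failures by throwing rather than by returning `null`, and that Datejs's `Date.parse()` yields `null` for text it cannot parse, that check never fires. A hedged sketch of a variant that makes it meaningful; the name `create_event_or_null` and its `calendar` parameter are hypothetical (in Apps Script one would pass `CalendarApp.getDefaultCalendar()`). The description text mirrors the one the old Python version attached to its events.

```javascript
// Hypothetical variant of create_event() that returns null on failure, so the
// `calendar_event == null` check in process_email() can actually trigger.
// Assumes createAllDayEvent() throws on error and that an unparseable date
// arrives as null (what Datejs's Date.parse() is assumed to return then).
function create_event_or_null(calendar, title, date) {
  // Reject null, non-Date values and invalid dates (NaN time value).
  if (!(date instanceof Date) || isNaN(date.getTime())) {
    return null;
  }
  try {
    // createAllDayEvent(title, date, options) with an optional description.
    return calendar.createAllDayEvent(title, date, {
      description: "Deadline for submission @ " + title
    });
  } catch (error) {
    return null;
  }
}
```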

View file

@ -1,133 +0,0 @@
#!/usr/bin/env python
# Copyright (c) 2011, R. H. Gracini Guiraldelli <rguira@acm.org>
# All rights reserved.
# Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
# Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
# Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
# Neither the name of the R. H. Gracini Guiraldelli nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import sys
import re
import parsedatetime.parsedatetime as pdt
import parsedatetime.parsedatetime_consts as pdc
import gdata.calendar
import gdata.calendar.service
import gdata.service
import atom
import time
import imaplib
import array
def load_file(filename):
    fp = None
    try:
        fp = open(filename, "r")
    except:
        sys.stderr.write("File not found!\n")
    return fp
def find_key_line(line):
    pattern = r"((call\s+for\s+(paper|papers))|submission|deadline)"
    regex = re.compile(pattern, re.IGNORECASE | re.UNICODE)
    found = regex.search(line)
    return found
def find_date(line):
    # FIXME: make these regex components global
    # FIXME: define a config file where these regexes could be stored
    pattern = r"((\d{1,2}/\d{1,2}/(\d{4}|\d{2}))|(\d{4}-\d{2}-\d{2})|(\d{1,2}(st|nd|rd|th)*\s*(Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)(,)?\s*\d{1,2}(st|nd|rd|th)*\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*\d{1,2}(st|nd|rd|th)*\s*(,)?\s*(\d{4}|\d{2})))"
    regex = re.compile(pattern, re.IGNORECASE | re.UNICODE)
    found = regex.search(line)
    return found
def connect_google_calendar(username, password):
    calendar_service = gdata.calendar.service.CalendarService()
    calendar_service.email = username
    calendar_service.password = password
    calendar_service.source = "Call For Papers App - Beta Version"
    calendar_service.ProgrammaticLogin()
    return calendar_service
def create_calendar_event(calendar_service, title, date):
    event = gdata.calendar.CalendarEventEntry()
    event.title = atom.Title(text=title)
    content = "Deadline for submission @ " + title
    event.content = atom.Content(text=content)
    start_time = time.strftime('%Y-%m-%d', date)
    event.when.append(gdata.calendar.When(start_time = start_time, end_time = start_time))
    new_event = calendar_service.InsertEvent(event, '/calendar/feeds/default/private/full')
    return new_event
def connect_imap_server(username, password):
    imap_connection = imaplib.IMAP4_SSL('imap.gmail.com', 993)
    imap_connection.login(username,password)
    return imap_connection
def disconnect_imap_server(imap_connection):
    imap_connection.close()
    imap_connection.logout()
def processing_emails(imap_connection):
    print ">> Processing e-mails..."
    status, count = imap_connection.select('[Gmail]/All Mail')
    print "\tYou have %s in the '[Gmail]/All Mail' folder." % (count)
    status, found = imap_connection.search(None, '(UNSEEN)')
    print "\tAnd you have %d unseen e-mail in that folder." % (len(found[0].strip()))
    regex = re.compile('(?<=(Subject:\s))(.*)')
    i = 0
    mails = []
    for mail_number in found[0].strip():
        try:
            status, data = imap_connection.fetch(mail_number, '(BODY[HEADER])')
            mail_header = regex.search(data[0][1])
            single_mail = []
            single_mail.append(mail_header.group(0).strip())
            status, data = imap_connection.fetch(mail_number, '(BODY[TEXT])')
            single_mail.append(data[0][1])
            if ( (single_mail[0] == '') or (single_mail[0] == None) or (single_mail[1] == '') or (single_mail[1] == None) ):
                print ">> WARNING: Message could not be processed. Flagging it! <<"
                imap_connection.store(mail_number, '+FLAGS', '\\Flagged')
            mails.append(single_mail)
        except:
            print ">> WARNING: Could not fetch message of number %s <<" % (mail_number)
    return mails
def process_dates(mails):
    # TODO: I must improve it: what if it does not parse the date? The array index will go wrong!
    dates = []
    c = pdc.Constants()
    date_parser = pdt.Calendar(c)
    for single_mail in mails:
        parsed_date = None
        matched_line = find_key_line(single_mail[1])
        matched_date = find_date(single_mail[1])
        if ( (matched_line != None) and (matched_date != None) ):
            parsed_date = date_parser.parseDateText(matched_date.group(0))
        else:
            pass
        dates.append(parsed_date)
    return dates
def main():
    username = raw_input("E-mail address: ")
    password = raw_input("Password: ")
    imap_connection = connect_imap_server(username, password)
    mails = processing_emails(imap_connection)
    disconnect_imap_server(imap_connection)
    dates = process_dates(mails)
    calendar_service = connect_google_calendar(username, password)
    for i in range(0,len(dates)):
        single_mail = mails[i]
        date = dates[i]
        print "\t Subject: '%s' and Date: '%s'" % (single_mail[0], date)
        try:
            new_event = create_calendar_event(calendar_service, single_mail[0], date)
        except:
            print ">> ERROR: Event could not be added to Google Calendar! <<"
if __name__ == "__main__":
    main()

33
date_processor.js Normal file

@ -0,0 +1,33 @@
// regex pattern for finding the dates
var REGEX_DATE = /((\d{1,2}\/\d{1,2}\/(\d{4}|\d{2}))|(\d{4}-\d{2}-\d{2})|(\d{1,2}(st|nd|rd|th)*\s*(Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)(,)?\s*\d{1,2}(st|nd|rd|th)*\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*\d{1,2}(st|nd|rd|th)*\s*(,)?\s*(\d{4}|\d{2})))/i;
// ISO format for dates
var DATE_ISO_FORMAT = "yyyy-MM-dd";
// given a date represented as a string, returns it as a JavaScript Date
// object using the Datejs library
function get_date(string_date){
  return Date.parse(string_date);
}
// takes a date string matched by the regex and converts it to a string in
// ISO format (yyyy-MM-dd) using the Datejs library
function get_iso_date(matched_date){
  return Date.parse(matched_date).toString(DATE_ISO_FORMAT);
}
// returns the first date found in the line, or null if there is none
function get_literal_date(line){
  var match = line.match(REGEX_DATE);
  // String.prototype.match returns null (not an empty array) when nothing
  // matches, so check for null before indexing
  if (match !== null){
    return match[0];
  }
  else{
    return null;
  }
}
// returns true if the line contains a date
function has_date(line){
  return REGEX_DATE.test(line);
}
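`get_iso_date()` depends on Datejs for the `"yyyy-MM-dd"` format string; native JavaScript's `Date.prototype.toString()` ignores its arguments. For illustration only, a dependency-free sketch of the same formatting for an already-parsed `Date` (`to_iso_date` is a hypothetical name). It reads the local date fields on purpose: `toISOString()` converts to UTC first, which can move an all-day deadline to the previous or next day depending on the script's time zone.

```javascript
// Hypothetical, dependency-free equivalent of
// Date.parse(text).toString("yyyy-MM-dd") for an already-parsed Date.
// Uses local date fields (not toISOString(), which converts to UTC).
function to_iso_date(date) {
  // Left-pad month and day to two digits.
  function pad(n) {
    return (n < 10 ? "0" : "") + n;
  }
  return date.getFullYear() + "-" + pad(date.getMonth() + 1) + "-" + pad(date.getDate());
}
```

For example, `to_iso_date(new Date(2015, 8, 5))` returns `"2015-09-05"`, since month index 8 is September.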

1230
datejs.js Normal file

File diff suppressed because it is too large

39
email_connector.js Normal file

@ -0,0 +1,39 @@
var UNREAD_THREADS_QUERY = "is:unread";
// returns an array of unread threads from Gmail
function get_unread_threads(){
  return GmailApp.search(UNREAD_THREADS_QUERY);
}
// given an array of unread Gmail threads, composes an array of unread messages
function get_unread_messages(unread_threads){
  return unread_threads.reduce(reduce_unread_messages, []);
}
// reduce function in which all unread messages are merged into a single array
function reduce_unread_messages(previousValue, currentValue){
  // var messages = currentValue.getMessages();
  // var unread = messages.filter(is_message_unread);
  // return previousValue.concat(unread);
  return previousValue.concat(currentValue.getMessages().filter(is_message_unread));
}
// filter function which says whether a Gmail message is unread
function is_message_unread(gmail_message){
  return gmail_message.isUnread();
}
// returns the plain-text body of a Gmail message
function get_message_text(gmail_message){
  return gmail_message.getPlainBody();
}
// returns the subject of a Gmail message
function get_subject_text(gmail_message){
  return gmail_message.getSubject();
}
// main function, which returns the plain-text body of all unread Gmail messages
function get_body_all_unread_messages(){
  return get_unread_messages(get_unread_threads()).map(get_message_text);
}
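The Apps Script documentation warns that the single-argument `GmailApp.search(query)` can fail when the matching threads are too large in total, and recommends the paged overload `search(query, start, max)` in that case. A sketch of a paged replacement for `get_unread_threads()` using the same `is:unread` query; `get_unread_threads_paged` and its `gmailApp` parameter are hypothetical, the latter standing in for `GmailApp` so the loop can be exercised with a stub.

```javascript
// Hypothetical paged variant of get_unread_threads(). `gmailApp` is GmailApp
// inside Apps Script (or a stub elsewhere); page_size must be positive.
function get_unread_threads_paged(gmailApp, page_size) {
  if (!(page_size > 0)) {
    throw new Error("page_size must be a positive number");
  }
  var threads = [];
  for (var start = 0; ; start += page_size) {
    // search(query, start, max): at most page_size threads, beginning at start.
    var page = gmailApp.search("is:unread", start, page_size);
    threads = threads.concat(page);
    // A short (or empty) page means there are no more results.
    if (page.length < page_size) {
      break;
    }
  }
  return threads;
}
```

All pages are collected before any message is processed, so marking messages as read later does not shift the search window mid-loop.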

42
email_processor.js Normal file

@ -0,0 +1,42 @@
// regex pattern for finding the "key line" that classifies the email as "call for papers"
var REGEX_KEY_LINE = /((call\s+for\s+(paper|papers))|submission|deadline)/i; //ignore case
// regex for the line which contains paper submission deadline information
var REGEX_PAPER_DEADLINE = /(paper(s)?)*(submission|deadline)(paper(s)?)*/i;
// regex pattern for forward prefixes in email subjects ("Fw:", "Fwd:", "En:", "Enc:")
var REGEX_FORWARD = /^\s*(fw(d)?|en(c)?):\s*/i;
// returns true if the line contains one of the call-for-papers keywords
function has_key_line(line){
  return REGEX_KEY_LINE.test(line);
}
// returns true if the line contains the keywords for the paper submission
// deadline
function is_paper_deadline(line){
  return REGEX_PAPER_DEADLINE.test(line);
}
// removes the forward prefix from the subject
function remove_forward(subject_text){
  return subject_text.replace(REGEX_FORWARD, EMPTY_STRING);
}
// takes a GmailMessage object and processes it: finds the first line that has
// both a date and a deadline keyword, creates a calendar event for that date,
// then marks the message as read (or stars it if no event was created)
function process_email(gmail_message){
  var subject = remove_forward(get_subject_text(gmail_message));
  var lines_of_interest = break_lines(get_message_text(gmail_message)).filter(has_date).filter(is_paper_deadline);
  // only the first line of interest is used
  if (lines_of_interest.length > 0){
    var date = get_date(get_literal_date(lines_of_interest[0]));
    var calendar_event = create_event(subject, date);
    if (calendar_event == null){
      Logger.log("It was not possible to create an event with the following details:\n\tSubject: %s\n\tDate: %s", subject, date);
      gmail_message.star();
    }
    else{
      gmail_message.markRead();
    }
  }
}
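The line-selection and subject-cleaning steps of `process_email()` can be tried outside Gmail with plain strings. A hedged sketch: `REGEX_PAPER_DEADLINE` and `REGEX_FORWARD` are copied from this file, while `REGEX_ISO_DATE` is a simplified, hypothetical stand-in for the full `REGEX_DATE` (ISO dates only), and `pick_deadline_line` and `strip_forward` are hypothetical helpers. Since the `(paper(s)?)*` groups can match zero times, they never constrain a match, so in practice any line containing "submission" or "deadline" qualifies.

```javascript
// Copied from email_processor.js.
var REGEX_PAPER_DEADLINE = /(paper(s)?)*(submission|deadline)(paper(s)?)*/i;
var REGEX_FORWARD = /^\s*(fw(d)?|en(c)?):\s*/i;
// Simplified stand-in for REGEX_DATE: ISO dates only (illustration).
var REGEX_ISO_DATE = /\d{4}-\d{2}-\d{2}/;

// Hypothetical helper mirroring the filter chain in process_email():
// keep lines that contain a date AND a deadline keyword, use the first one.
function pick_deadline_line(lines) {
  var lines_of_interest = lines.filter(function (line) {
    return REGEX_ISO_DATE.test(line) && REGEX_PAPER_DEADLINE.test(line);
  });
  return lines_of_interest.length > 0 ? lines_of_interest[0] : null;
}

// Same replacement remove_forward() performs on the subject.
function strip_forward(subject) {
  return subject.replace(REGEX_FORWARD, "");
}
```

Because only the first matching line is used, a message that announces an abstract submission date (e.g. "Abstract submission: 2015-11-01") before the paper deadline would yield the abstract date.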

6
main.js Normal file

@ -0,0 +1,6 @@
// main function, which gets all unread Gmail messages and processes them all
function main(){
  Logger.log("Initiating the process of the call for papers emails...");
  // forEach rather than map: process_email() is called only for its effects
  get_unread_messages(get_unread_threads()).forEach(process_email);
  Logger.log("Execution is over.");
}
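`main()` handles every unread message in a single pass, so one uncaught exception (for example, from the Calendar service) stops the remaining messages from being processed in that run. A sketch of a more defensive loop, assuming one failure should not block the rest; `process_each` is a hypothetical helper and the error handling is illustrative.

```javascript
// Hypothetical helper: applies `process` to every message (results are not
// needed, hence forEach) and keeps going when one message throws.
// Returns the messages that failed so the caller can log or star them.
function process_each(messages, process) {
  var failed = [];
  messages.forEach(function (message) {
    try {
      process(message);
    } catch (error) {
      failed.push(message);
    }
  });
  return failed;
}
```

In Apps Script it could be called as `process_each(get_unread_messages(get_unread_threads()), process_email)`, with the returned array's length reported through `Logger.log`.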

8
utils.js Normal file

@ -0,0 +1,8 @@
// empty string
var EMPTY_STRING = "";
// modified from http://beckism.com/2010/09/splitting-lines-javascript/
var LINE_BREAKS = /^.*((\r\n|\n|\r)|$)/gm;
// splits a string into an array of lines, each keeping its line terminator
function break_lines(string){
  return string.match(LINE_BREAKS);
}
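`break_lines()` keeps each line's terminator, and because the pattern can always match at the start of the string it never returns `null`: an empty string yields `[""]`, and a trailing newline produces a final empty element. The terminators are harmless for the `.test()` filters in `process_email()`. For comparison, a hypothetical `split_lines` helper built on `String.prototype.split` drops them instead.

```javascript
// Same pattern as utils.js: each match is one line plus its terminator.
var LINE_BREAKS = /^.*((\r\n|\n|\r)|$)/gm;

function break_lines(string) {
  return string.match(LINE_BREAKS);
}

// Hypothetical alternative: split on any line terminator, dropping it.
function split_lines(string) {
  return string.split(/\r\n|\n|\r/);
}
```

For `"a\r\nb\nc"`, `break_lines` gives `["a\r\n", "b\n", "c"]` while `split_lines` gives `["a", "b", "c"]`.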