initial commit of Apps Script version

Deletion of the former Python code and addition of the initial files of
the Google Apps Script version.
Retrieval of unread emails is complete, but email processing has only just
started and no calendar handling is done yet.
Ricardo Henrique Gracini Guiraldelli 2015-08-27 00:52:40 +03:00
parent b59e205320
commit 8c7d23ef7b
5 changed files with 102 additions and 138 deletions

README

@@ -1,5 +0,0 @@
cfp-bot is an app that processes "Call For Papers" e-mails and automatically adds the submission deadline dates to Google Calendar.
This app uses some third-party libraries, such as:
- parsedatetime [ http://code.google.com/p/parsedatetime/ ]
- Google Data APIs [ http://code.google.com/p/gdata-python-client/downloads/list ]

README.md Normal file

@@ -0,0 +1,47 @@
# "Call For Papers" Bot
This *bot* has been developed as an automatic processor of *"call for papers"*
(CFP) emails received in the academic circles.
It was developed with the needs of the
[Adaptive Technologies Lab](http://lta.poli.usp.br/) in mind by one of its
(former) students (me), not as a final product for widespread use. However,
anyone is free to try it at their own risk and, even better, to contribute
improvements to the existing code.
## License
So far, it is still licensed under
[BSD 3-Clause](https://www.tldrlegal.com/l/bsd3). Maybe, someday, it will be
even more open.
## The Old Version
This repository was originally the home of the old version, written in Python.
If you are familiar with GitHub and git, feel free to browse its history and
learn from it.
Nonetheless...
## The New Version
The current (new) version of this bot is a port of the old code from Python to
JavaScript or, more precisely,
[Google Apps Script](https://developers.google.com/apps-script/).
Why? Very simple: the bot is built entirely on Google products, so Google Apps
Script can trigger it automatically, which hands Google the responsibility of
maintaining our `cron` jobs. Simple as that.
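In Apps Script, the role of a `cron` job is played by a time-driven trigger.
A minimal sketch of installing one from code, assuming a top-level entry point
named `run_bot` (a hypothetical name; no such function exists in this commit)
and an hourly schedule:

```javascript
// Hypothetical entry point name; not defined anywhere in this commit.
var BOT_HANDLER = "run_bot";

// Installs a time-driven trigger that calls BOT_HANDLER every hour.
// Existing triggers for the same handler are deleted first, so running
// this function twice does not create duplicate schedules.
function install_hourly_trigger() {
  ScriptApp.getProjectTriggers()
    .filter(function (trigger) {
      return trigger.getHandlerFunction() === BOT_HANDLER;
    })
    .forEach(function (trigger) {
      ScriptApp.deleteTrigger(trigger);
    });
  return ScriptApp.newTrigger(BOT_HANDLER)
    .timeBased()
    .everyHours(1)
    .create();
}
```

Run it once from the script editor; the same kind of trigger can also be
created by hand from the editor's Triggers page.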
### It is **not** perfect
I know it is not a perfect bot, but it does an amazing job in a very simple way.
Could I have used machine learning? Yes.
Could I have used neural networks? Yes.
Could I have used Bayesian networks? Yes.
Could I have used [mechanical turks](https://en.wikipedia.org/wiki/The_Turk)?
Yes, I could.
But, believe me, it was a one- (or two-) night project to solve a problem that
was driving us crazy: inboxes full of CFP emails.
I am very interested in using more intelligent approaches to solve this problem,
but the probability that it will happen is close to **zero**.
Anyhow, your collaboration, fork or whatever is welcome! :smile:


@@ -1,133 +0,0 @@
#!/usr/bin/env python
# Copyright (c) 2011, R. H. Gracini Guiraldelli <rguira@acm.org>
# All rights reserved.
# Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
# Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
# Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
# Neither the name of the R. H. Gracini Guiraldelli nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import sys
import re
import parsedatetime.parsedatetime as pdt
import parsedatetime.parsedatetime_consts as pdc
import gdata.calendar
import gdata.calendar.service
import gdata.service
import atom
import time
import imaplib
import array
def load_file(filename):
    fp = None
    try:
        fp = open(filename, "r")
    except:
        sys.stderr.write("File not found!\n")
    return fp
def find_key_line(line):
    pattern = r"((call\s+for\s+(paper|papers))|submission|deadline)"
    regex = re.compile(pattern, re.IGNORECASE | re.UNICODE)
    found = regex.search(line)
    return found
def find_date(line):
    # FIXME: make these regex components global
    # FIXME: define a config file where these regexes could live
    pattern = r"((\d{1,2}/\d{1,2}/(\d{4}|\d{2}))|(\d{4}-\d{2}-\d{2})|(\d{1,2}(st|nd|rd|th)*\s*(Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)(,)?\s*\d{1,2}(st|nd|rd|th)*\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*\d{1,2}(st|nd|rd|th)*\s*(,)?\s*(\d{4}|\d{2})))"
    regex = re.compile(pattern, re.IGNORECASE | re.UNICODE)
    found = regex.search(line)
    return found
def connect_google_calendar(username, password):
    calendar_service = gdata.calendar.service.CalendarService()
    calendar_service.email = username
    calendar_service.password = password
    calendar_service.source = "Call For Papers App - Beta Version"
    calendar_service.ProgrammaticLogin()
    return calendar_service
def create_calendar_event(calendar_service, title, date):
    event = gdata.calendar.CalendarEventEntry()
    event.title = atom.Title(text=title)
    content = "Deadline for submission @ " + title
    event.content = atom.Content(text=content)
    start_time = time.strftime('%Y-%m-%d', date)
    event.when.append(gdata.calendar.When(start_time = start_time, end_time = start_time))
    new_event = calendar_service.InsertEvent(event, '/calendar/feeds/default/private/full')
    return new_event
def connect_imap_server(username, password):
    imap_connection = imaplib.IMAP4_SSL('imap.gmail.com', 993)
    imap_connection.login(username,password)
    return imap_connection
def disconnect_imap_server(imap_connection):
    imap_connection.close()
    imap_connection.logout()
def processing_emails(imap_connection):
    print ">> Processing e-mails..."
    status, count = imap_connection.select('[Gmail]/All Mail')
    print "\tYou have %s messages in the '[Gmail]/All Mail' folder." % (count[0])
    status, found = imap_connection.search(None, '(UNSEEN)')
    # found[0] is a space-separated string of message numbers ("1 2 3"), so it
    # must be split: iterating over the string itself yields single characters
    unseen = found[0].split()
    print "\tAnd you have %d unseen e-mails in that folder." % (len(unseen))
    regex = re.compile('(?<=(Subject:\s))(.*)')
    i = 0
    mails = []
    for mail_number in unseen:
        try:
            status, data = imap_connection.fetch(mail_number, '(BODY[HEADER])')
            mail_header = regex.search(data[0][1])
            single_mail = []
            single_mail.append(mail_header.group(0).strip())
            status, data = imap_connection.fetch(mail_number, '(BODY[TEXT])')
            single_mail.append(data[0][1])
            if ( (single_mail[0] == '') or (single_mail[0] == None) or (single_mail[1] == '') or (single_mail[1] == None) ):
                print ">> WARNING: Message could not be processed. Flagging it! <<"
                imap_connection.store(mail_number, '+FLAGS', '\\Flagged')
            mails.append(single_mail)
        except:
            print ">> WARNING: Could not fetch message number %s <<" % (mail_number)
    return mails
def process_dates(mails):
    # TODO: I must improve it: what if it does not parse the date? The array index will go wrong!
    dates = []
    c = pdc.Constants()
    date_parser = pdt.Calendar(c)
    for single_mail in mails:
        parsed_date = None
        matched_line = find_key_line(single_mail[1])
        matched_date = find_date(single_mail[1])
        if ( (matched_line != None) and (matched_date != None) ):
            parsed_date = date_parser.parseDateText(matched_date.group(0))
        else:
            pass
        dates.append(parsed_date)
    return dates
def main():
    username = raw_input("E-mail address: ")
    password = raw_input("Password: ")
    imap_connection = connect_imap_server(username, password)
    mails = processing_emails(imap_connection)
    disconnect_imap_server(imap_connection)
    dates = process_dates(mails)
    calendar_service = connect_google_calendar(username, password)
    for i in range(0,len(dates)):
        single_mail = mails[i]
        date = dates[i]
        print "\t Subject: '%s' and Date: '%s'" % (single_mail[0], date)
        try:
            new_event = create_calendar_event(calendar_service, single_mail[0], date)
        except:
            print ">> ERROR: Event could not be added to Google Calendar! <<"
if __name__ == "__main__":
    main()

email_connector.js Normal file

@@ -0,0 +1,39 @@
var UNREAD_THREADS_QUERY = "is:unread";
// get an array of unread threads from Gmail
function get_unread_threads(){
  return GmailApp.search(UNREAD_THREADS_QUERY);
}
// given an array of unread Gmail threads, composes an array of unread messages
function get_unread_messages(unread_threads){
  return unread_threads.reduce(reduce_unread_messages, []);
}
// reduce function in which all unread messages are merged into a single array
function reduce_unread_messages(previousValue, currentValue){
  // var messages = currentValue.getMessages();
  // var unread = messages.filter(is_message_unread);
  // return previousValue.concat(unread);
  return previousValue.concat(currentValue.getMessages().filter(is_message_unread));
}
// filter function which says whether a Gmail message is unread
function is_message_unread(gmail_message){
  return gmail_message.isUnread();
}
// returns the plain body text of a Gmail message
function get_message_text(gmail_message){
  return gmail_message.getPlainBody();
}
// main function, which returns the plain text body of all unread Gmail messages
function get_body_all_unread_messages(){
  return get_unread_messages(get_unread_threads()).map(get_message_text);
}
// debug function
function debug(){
  Logger.log("Unread messages:\n%s", get_body_all_unread_messages());
}
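The Python version kept each message's subject next to its body (the subject
later became the calendar event title), whereas `get_body_all_unread_messages`
returns bodies only. A possible extension, sketched under the assumption that
the same pairing is wanted in the port; `to_subject_body_pairs` is a
hypothetical name, and it relies only on the `GmailMessage.getSubject()` and
`getPlainBody()` methods:

```javascript
// Hypothetical helper (not part of this commit): pairs the subject and the
// plain-text body of each message, like the [subject, body] lists built by
// the old Python code. `messages` is an array of GmailMessage objects, e.g.
// the result of get_unread_messages(get_unread_threads()).
function to_subject_body_pairs(messages) {
  return messages
    .map(function (message) {
      return { subject: message.getSubject(), body: message.getPlainBody() };
    })
    // Skip messages whose subject or body is empty
    // (the Python version flagged such messages in Gmail instead).
    .filter(function (mail) {
      return Boolean(mail.subject) && Boolean(mail.body);
    });
}
```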

email_processor.js Normal file

@@ -0,0 +1,16 @@
// regex pattern for finding the "key line" that classifies the email as "call for papers"
var regex_key_line = /((call\s+for\s+(paper|papers))|submission|deadline)/i; // ignore case
// regex pattern for finding the dates
// (the "/" date separators must be escaped as "\/" inside a regex literal,
// otherwise the literal ends at the first "/")
var regex_date = /((\d{1,2}\/\d{1,2}\/(\d{4}|\d{2}))|(\d{4}-\d{2}-\d{2})|(\d{1,2}(st|nd|rd|th)*\s*(Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)(,)?\s*\d{1,2}(st|nd|rd|th)*\s*(\d{4}|\d{2}))|((Jan|January|Feb|February|Mar|March|Apr|April|May|Jun|June|Jul|July|Aug|August|Sep|September|Oct|October|Nov|November|Dec|December)\s*\d{1,2}(st|nd|rd|th)*\s*(,)?\s*(\d{4}|\d{2})))/i;
// returns true if the line contains one of the key expressions
function find_key_line(line){
  return regex_key_line.test(line);
}
// returns -1 if no match is found;
// otherwise, returns the index of the match
// see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/search
// or https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions?redirectlocale=en-US&redirectslug=JavaScript%2FGuide%2FRegular_Expressions#Working_with_regular_expressions
function find_date(line){
  return line.search(regex_date);
}
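`find_date` only reports *where* a date starts. To recover the date text
itself (the Python `find_date` returned a match object), `String.prototype.match`
can be used instead, and the Python `create_calendar_event` maps fairly directly
onto `CalendarApp.getDefaultCalendar().createAllDayEvent(title, date, options)`.
A minimal sketch; `extract_deadline` and `add_deadline_event` are hypothetical
names, and the regexes are passed in as parameters so the first function runs on
its own:

```javascript
// Hypothetical helper (not part of this commit). Returns the first
// date-looking substring of a message body, or null when the body has no
// CFP key expression or no date. key_regex and date_regex play the roles of
// regex_key_line and regex_date above.
function extract_deadline(body, key_regex, date_regex) {
  if (!key_regex.test(body)) {
    return null;
  }
  // Unlike String.prototype.search, match returns the matched text itself
  // (or null when there is no match).
  var match = body.match(date_regex);
  return match ? match[0] : null;
}

// Hypothetical counterpart of the Python create_calendar_event: adds an
// all-day event with the same description text to the default calendar.
function add_deadline_event(title, date) {
  return CalendarApp.getDefaultCalendar().createAllDayEvent(title, date, {
    description: "Deadline for submission @ " + title
  });
}
```

Turning the matched text into a `Date` is left open here: `Date.parse` only
guarantees support for the ISO date format, so strings such as "15th March 2015"
would need a dedicated parser, the role parsedatetime played in the Python version.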