From: Rick van Rein
Date: Thu, 13 Feb 2014 14:37:34 +0000 (+0000)
Subject: First working version of midget/midput; for now, configure IMAP server manually
X-Git-Url: http://git.arpa2.org/?p=midget;a=commitdiff_plain

First working version of midget/midput; for now, configure IMAP server manually
---
f06f59fa85d3b7551e9bda08c78a59037ac38bcc

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8351ce3
--- /dev/null
+++ b/README.md
@@ -0,0 +1,149 @@
+# Midget: Commandline interface to IMAP documents
+
+> *The midget and midput commandline utilities let you download and
+> upload email attachments from/to your IMAP mailbox, almost as if you
+> are using SCP or a similar cross-platform copying command.*
+
+You know about the shepherd boy who slayed a giant with his sling shot?
+Keep reading...
+
+## What is this?
+
+Midget solves a problem that is so common that you might not have
+noticed it. But it can really aid the automation of our daily tasks.
+
+You probably recognise this workflow:
+
+1. You receive a document or package over email
+
+2. You save the attachment
+
+3. You move it to another system
+
+4. On that system, you process it further (unpackaging, saving,
+    printing, …)
+
+This flow can be greatly simplified. On the shell of the target machine,
+you can simply run:
+
+    midget mid:opq cid:stu mid:uvw/xyz...
+
+This will retrieve documents with `mid:` or `cid:` URIs from your IMAP
+account. These URIs are used to mark documents with a MIME-Type, and are
+defined as pragmatically unique identities generated on the sending
+site. They may occur in multiple places, but since these ought to be the
+same only one will be retrieved by midget. MIME markings such as
+filename proposals are taken into account; transport encoding is undone
+but content encoding is not.
+
+There is a counterpart to this command:
+
+    midput file1 file2 file3
+
+This will construct a new draft email in your IMAP account.
+The new draft will have the given files as attachments. You can access the draft
+in your graphical browser to enter what gave you the idea to send these
+files.
+
+## What is required to make this work?
+
+These commands rely on Kerberos Single Sign-On. That means you are in a
+shell account that shows a non-expired principal ticket when you run:
+
+    klist
+
+If not, log on to your REALM first, using:
+
+    kinit
+
+Your principal name is of the form `user@REALM` and midget and midput
+interpret the REALM part as a DNS domain name. Under this domain name,
+they look for a `_kerberos` TXT record to confirm the case-sensitive
+REALM value. Note that letters in this record are translated to
+uppercase, unless they are escaped with a single '=' character prefix.
+
+There is some upheaval about the reliability of DNS for this sort of
+lookup; midget and midput will mostly be used on locally hosted domains,
+and if not then it is nowadays a good practice to use DNSSEC to overcome
+such problems.
+
+Given DNS confirmation, the derived domain name is queried for an IMAP
+server as declared in SRV records. This server is then approached and
+Kerberos-based authentication is performed. During authentication, a
+login name is sent to the IMAP server. For `user/detail@REALM` names,
+this will be detail; for `user@REALM` names, this will be user. For any
+other forms (if they exist at all), this will be the local account name
+under which you are logged in.
+
+**TODO:** For now, you need to manually configure the IMAP server name
+in the midget and midput scripts. It is set in the variable
+remote\_hostname.
+
+## Where do the filenames come from?
+
+Each of the retrieved files is stored in the filesystem.
+The tool will create a file named after the `Message-` or `Content-ID`; if it
+already exists it will fail and complain; you probably downloaded the same
+content twice, and if not then this may mean that the sender is
+trying to overwrite local files on your system, which is a warning sign.
+If a filename is found in the descriptive information sent along with the
+attachment, then midget will try to create a link with that filename as
+well; this will fail with a non-fatal warning if the filename exists.
+
+Note that the unique value of the `Message-` or `Content-ID` is the
+reason to not be cautious when writing a file by that name. It would be
+incredibly unlikely that you had a file on your system with the same
+name. And yes, some thought is necessary to avoid local files being
+overwritten by devious submitted content. So perhaps it is better to
+raise a fatal error if the name already exists?
+
+## Desktop tool support
+
+Mailers may hide the `mid:` and `cid:` URIs from you, the silly mouse
+operator. It would be advisable to use these URIs as the copy/paste format,
+as well as for what gets pasted into a shell when an attachment is dragged
+into it.
+
+Maybe this will be a battle, maybe not. As so often, the commandline is
+leading the way forward, and welcoming the GUI to follow suit ;-)
+
+It is certainly useful to be able to use `mid:` and `cid:` references in
+chat sessions, to refer back to a previously exchanged email attachment.
+
+## Generating these URIs
+
+If you are building a tool that handles email messages, you should
+consider offering a clickable `mid:` URI to access the message body and
+any attachments.
+
+Operating systems and browsers generally support new URI schemes with a
+registry, and applications can be set up to register against those. It is
+easily imaginable that desktop versions of midget are created to handle
+such downloads.
+
+Another advantage of visible `mid:` URIs is that they can be copy/pasted
+or dragged into shells, chat sessions, and so on. This makes them usable
+to cross over to remote locations, where the files can then be used. You
+may not want to offer this as the only option, but it certainly is a
+useful option to offer to end users.
+
+You should not generate `cid:` URIs if you can avoid it. The `mid:`
+format includes the `Message-ID`, and that is a bit more work but it
+saves a lot of search time when downloading it. This is due to the IMAP
+protocol, which has special constructs for matches with header names
+such as the `Message-ID`, but not for MIME-headers such as `Content-ID`.
+The latter must therefore be resolved with full-text search, and that is
+not open to optimisations like the `Message-ID` is.
+
+Read [RFC 2392][] for details; but in short, you will be removing
+angle brackets and applying percent-escaping to the remainder. Be sure
+to also escape any slash that occurs in the string. If you are setting
+up a `mid:` for a message body, then this is all; for an attachment, the
+same procedure should be applied to the `Content-ID` header and it
+should be attached with a separating slash.
+
+Finally, application environments (so, operating systems and browsers)
+usually are capable of doing something useful with a MIME-type. This is
+indeed taken into account.
+
+  [RFC 2392]: https://tools.ietf.org/html/rfc2392
diff --git a/TODO.md b/TODO.md
new file mode 100644
index 0000000..83cdf05
--- /dev/null
+++ b/TODO.md
@@ -0,0 +1,14 @@
+TODO for Midget
+===============
+
+This was a useful idea for a utility that came up when I was getting to
+learn to use Python's imaplib. The learning path is somewhat visible in
+the current code, which could therefore use a bit of a cleanup.
+
+I suppose I will leave it like this, until I feel confident with imaplib
+and don't mind erasing the history to... well, the historic traces in git.
+
+What should still be done, is locating the IMAP server by way of SRV
+records, based on Kerberos-to-DNS changes. For now, set the remote_hostname
+in the script to your local IMAP server address.
+
diff --git a/midget.py b/midget.py
new file mode 100755
index 0000000..21b63ca
--- /dev/null
+++ b/midget.py
@@ -0,0 +1,279 @@
+#!/usr/bin/env python
+#
+# Kerberos login to IMAP server and extraction of provided cid: and mid: URIs.
+#
+# From: Rick van Rein
+
+
+import os
+import sys
+from base64 import b64encode, b64decode
+import imaplib
+import urllib
+
+import kerberos
+
+
+class SASLTongue:
+
+    def __init__ (self):
+        self.ctx = None
+        self.complete = False
+
+    def wrap (self, plaintext):
+        """Once a GSSAPI Context is complete, it can wrap plaintext
+        into ciphertext. This function operates on binary strings.
+        """
+        kerberos.authGSSClientWrap (self.ctx, b64encode (plaintext))
+        cipherdata = kerberos.authGSSClientResponse (self.ctx)
+        return (b64decode (cipherdata) if cipherdata else "")
+
+    def unwrap (self, ciphertext):
+        """Once a GSSAPI Context is complete, it can unwrap ciphertext
+        into plaintext. This function operates on binary strings.
+        """
+        kerberos.authGSSClientUnwrap (self.ctx, b64encode (ciphertext))
+        return b64decode (kerberos.authGSSClientResponse (self.ctx))
+
+    def processor (self, hostname):
+        # Currying function (needed to bind 'self')
+        def step (rcv):
+            #DEBUG# print 'New Call with Complete:', self.complete
+            #DEBUG# print 'Received:', '"' + b64encode (rcv) + '"'
+            if not self.complete:
+                # Initiate the GSSAPI Client
+                #ALT# rc, self.ctx = kerberos.authGSSClientInit ('imap@' + hostname, gssflags=kerberos.GSS_C_SEQUENCE_FLAG)
+                #STD# rc, self.ctx = kerberos.authGSSClientInit ('imap@' + hostname)
+                if not self.ctx:
+                    rc, self.ctx = kerberos.authGSSClientInit ('imap@' + hostname)
+                rc = kerberos.authGSSClientStep (self.ctx, b64encode (rcv))
+                #DEBUG# print 'ClientStep Result Code:', ['CONTINUE', 'COMPLETE'] [rc]
+                if rc == kerberos.AUTH_GSS_COMPLETE:
+                    self.complete = True
+                # if rc != 0:
+                #     print 'Error making a step'
+                #     return None
+                snd = kerberos.authGSSClientResponse (self.ctx)
+                return (b64decode (snd) if snd else "")
+            else:
+                # Unwrap and interpret the information token
+                rc = kerberos.authGSSClientUnwrap (self.ctx, b64encode (rcv))
+                # if rc != 0:
+                #     print 'Error unwrapping'
+                #     return None
+                token = b64decode (kerberos.authGSSClientResponse (self.ctx))
+                if len (token) != 4:
+                    #DEBUG# print 'Error unwrapping token after GSSAPI handshake'
+                    return None
+                flags = ord (token [0])
+                #DEBUG# print 'Flags:', '0x%02x' % flags
+                if flags & kerberos.GSS_C_INTEG_FLAG:
+                    pass #DEBUG# print 'Integrity Supported'
+                if flags & kerberos.GSS_C_CONF_FLAG:
+                    pass #DEBUG# print 'Confidentiality Supported'
+                maxlen = (ord (token [1]) << 16) | (ord (token [2]) << 8) | (ord (token [3]))
+                #DEBUG# print 'Maxlen:', maxlen
+                rettok = (chr (0) * 4) + 'ofo'
+                return self.wrap (rettok)
+                # kerberos.authGSSClientWrap (self.ctx, b64encode (rettok))
+                # snd = kerberos.authGSSClientResponse (self.ctx)
+                # return (b64decode (snd) if snd else "")
+
+        # The Currying surroundings return the internal
+        # function
+        # This is a strange necessity due to the IMAP assumption
+        # that it can call a closure, or a stateless function.
+        # What a lot of work to evade global variables... and it's
+        # all due to an ill-designed API, I think.
+        return step
+
+    def clientname (self):
+        return kerberos.authGSSClientUserName (self.ctx)
+
+
+
+#
+# Check the commandline
+#
+if len (sys.argv) < 2:
+    sys.stderr.write ('Usage: ' + sys.argv [0] + ' mid:... cid:...\n\tTo retrieve the mid: and/or cid: URIs from your IMAP mailbox\nAuthentication and mailbox identities use your current Kerberos ticket\n')
+    sys.exit (1)
+
+#
+# Turn the commandline into (messageid,contentid) pairs
+#
+todo = [ ]
+def alsodo (todo, mid=None, cid=None):
+    if mid:
+        mid = '<' + urllib.unquote (mid) + '>'
+    if cid:
+        cid = '<' + urllib.unquote (cid) + '>'
+    todo.append ( (mid,cid) )
+
+for arg in sys.argv [1:]:
+    if arg [:4].lower () == 'mid:':
+        slashpos = arg.find ('/')
+        if slashpos > 0:
+            alsodo (todo, mid=arg [4:slashpos], cid=arg [slashpos+1:])
+        else:
+            alsodo (todo, mid=arg [4:])
+    elif arg [:4].lower () == 'cid:':
+        alsodo (todo, cid=arg [4:])
+    else:
+        sys.stderr.write ('You should only use mid:... and cid:... arguments, see RFC 2392\n')
+        sys.exit (1)
+    #DEBUG# print 'Searching for', todo [-1]
+
+remote_hostname = 'popmini.opera'
+
+im = imaplib.IMAP4 (remote_hostname, 143)
+authctx = SASLTongue ()
+authcpu = authctx.processor (remote_hostname)
+#DEBUG# print 'AuthCPU:', authcpu, '::', type (authcpu)
+im.authenticate ('GSSAPI', authcpu)
+
+print 'Accessing IMAP as', authctx.clientname ()
+
+ok,msgs = im.select ()
+if ok != 'OK':
+    sys.stderr.write ('Failed to select INBOX\n')
+    sys.exit (1)
+
+for (mid,cid) in todo:
+    #DEBUG# print 'Retrieving', (mid,cid)
+    if mid:
+        # This is relatively quick, Content-ID is much slower, even
+        # as an _added_ condition (huh... Dovecot?!?)
+        cc = '(HEADER Message-ID "' + mid + '")'
+    else:
+        # Strange... no MIME-header search facilities in IMAP4rev1?!?
+        cc = '(TEXT "' + cid + '")'
+    #DEBUG# print 'Search criteria:', cc
+    ok,findings = im.uid ('search', None, cc)
+    if ok != 'OK':
+        sys.stderr.write ('Failed to search\n')
+        sys.exit (1)
+    #DEBUG# print 'Found the following:', findings
+    for uid in (findings [0] or '').split ():  # imaplib returns one space-separated string of UIDs
+        #DEBUG# print 'Looking up UID', uid
+        ok,data = im.uid ('fetch', uid, 'BODYSTRUCTURE')
+        if ok != 'OK':
+            sys.stderr.write ('Error fetching body structure\n')
+            sys.exit (1)
+        #DEBUG# print 'Found', data
+        stack = [ ]
+        parsed = [ ]
+        if not data [0]:
+            sys.stderr.write ('Failed to locate content\n')
+            sys.exit (1)
+        unquoted = data [0].split ('"')
+        for i in range (len (unquoted)):
+            if i & 0x0001 == 0:
+                # Even entries are unquoted
+                w = unquoted [i]
+                modulus = len (w) + 3
+                while w != '':
+                    brapos = min (w.find ('(') % modulus, w.find (')') % modulus, w.find (' ') % modulus)
+                    if brapos > 0:
+                        if w [:brapos] == 'NIL':
+                            parsed.append (None)
+                        else:
+                            parsed.append (w [:brapos])
+                    if w [brapos] == '(':
+                        # Push on stack
+                        stack.append (parsed)
+                        parsed = [ ]
+                    if w [brapos] == ')':
+                        # Pop from stack
+                        tail = parsed
+                        parsed = stack.pop ()
+                        parsed.append (tail)
+                    w = w [brapos+1:]
+            else:
+                # Quoted word -- pass literally
+                parsed.append (unquoted [i])
+        # print 'Parsed it into', parsed
+        bodystructure = parsed [1] [3]
+        #DEBUG# print 'Body structure:', bodystructure
+        def printbody (bs, indents=0):
+            subs = True
+            for i in range (len (bs)):
+                if type (bs [i]) == type ([]):
+                    if subs:
+                        printbody (bs [i], indents=indents+1)
+                    else:
+                        print ' ' * indents + '{%02d}' % i
+                else:
+                    # subs = False
+                    print ' ' * indents + '[%02d]' % i, bs [i]
+        #DEBUG# printbody (bodystructure)
+
+        def matchcid (bs, cid, accupar, path=[]):
+            subs = True
+            for i in range (len (bs)):
+                if type (bs [i]) == type ([]):
+                    if subs:
+                        matchcid (bs [i], cid, accupar, path=path+[i])
+                else:
+                    if i == 3:
+                        pass #DEBUG# print 'Comparing', cid, 'with', bs [i]
+                    if i == 3 and bs [i] == cid:
+                        #DEBUG# print 'CID found on:', path
+                        accupar.append (path)
+                        subs = False
+        if cid:
+            accu = []
+            matchcid (bodystructure, cid, accu, path=[1,3])
+            #DEBUG# print 'Result is:', accu
+            absname = cid [1:-1]  # strip only the surrounding angle brackets
+        else:
+            accu = [[1,3,1]]
+            absname = mid [1:-1]  # strip only the surrounding angle brackets
+        for result in accu:
+            here = parsed
+            for i in result:
+                here = here [i]
+            print 'MIME-Type =', here [0] + '/' + here [1]
+            print '[attr,value,...] =', here [2]
+            name = None
+            for i in range (0, len (here [2]), 2):
+                print 'Looking for name in', here [2][i]
+                if here [2][i].lower () == 'name':
+                    name = here [2][i+1]
+                    print 'Filename:', name
+            print 'Content-ID =', here [3] if len (here) > 3 else ''
+            print 'Description =', here [4] if len (here) > 4 else ''
+            print 'Transfer-Encoding =', here [5] if len (here) > 5 else ''
+            encoding = here [5] if len (here) > 5 else ''
+            print 'Size =', here [6] if len (here) > 6 else '?'
+            bodyspec = 'BODY'
+            dot = '['
+            for r in result [2:]:
+                bodyspec = bodyspec + dot + str (r+1)
+                dot = '.'
+            bodyspec = bodyspec + ']'
+            if bodyspec == 'BODY]':
+                bodyspec = 'BODY[1]'
+            print 'Fetchable bodyspec', bodyspec, 'for UID', uid
+            ok,data = im.uid ('fetch', uid+':'+uid, '('+bodyspec+')')
+            if ok != 'OK':
+                sys.stderr.write ('Error fetching content\n')
+                sys.exit (1)
+            #TODO# Be more subtle about encoding lists
+            if os.path.exists (absname):
+                sys.stderr.write ('Fatal: file ' + absname + ' already exists\nYou probably ran the command twice; or else the sender may attempt overwriting\n')
+                sys.exit (1)
+            fh = open (absname, 'wb')
+            if encoding == 'base64':
+                fh.write (b64decode (data [0][1]))
+            else:
+                fh.write (data [0][1])
+            fh.close ()
+            print 'Written to:', absname
+            if name:
+                if not os.path.exists (name):
+                    os.link (absname, name)
+                    print 'Created a link from:', name
+                else:
+                    sys.stderr.write ('Warning: file ' + name + ' already exists, not linking\n')
+
+
diff --git a/midput.py b/midput.py
new file mode 100755
index 0000000..a940698
--- /dev/null
+++ b/midput.py
@@ -0,0 +1,165 @@
+#!/usr/bin/env python
+#
+# Kerberos login to IMAP server and upload of provided files as a draft email.
+#
+# From: Rick van Rein
+
+
+import os
+import sys
+from base64 import b64encode, b64decode
+import imaplib
+import urllib
+
+import email.mime.base as mime
+import email.mime.text as text
+import email.mime.multipart as multipart
+
+import kerberos
+
+
+class SASLTongue:
+
+    def __init__ (self):
+        self.ctx = None
+        self.complete = False
+
+    def wrap (self, plaintext):
+        """Once a GSSAPI Context is complete, it can wrap plaintext
+        into ciphertext. This function operates on binary strings.
+        """
+        kerberos.authGSSClientWrap (self.ctx, b64encode (plaintext))
+        cipherdata = kerberos.authGSSClientResponse (self.ctx)
+        return (b64decode (cipherdata) if cipherdata else "")
+
+    def unwrap (self, ciphertext):
+        """Once a GSSAPI Context is complete, it can unwrap ciphertext
+        into plaintext. This function operates on binary strings.
+        """
+        kerberos.authGSSClientUnwrap (self.ctx, b64encode (ciphertext))
+        return b64decode (kerberos.authGSSClientResponse (self.ctx))
+
+    def processor (self, hostname):
+        # Currying function (needed to bind 'self')
+        def step (rcv):
+            #DEBUG# print 'New Call with Complete:', self.complete
+            #DEBUG# print 'Received:', '"' + b64encode (rcv) + '"'
+            if not self.complete:
+                # Initiate the GSSAPI Client
+                #ALT# rc, self.ctx = kerberos.authGSSClientInit ('imap@' + hostname, gssflags=kerberos.GSS_C_SEQUENCE_FLAG)
+                #STD# rc, self.ctx = kerberos.authGSSClientInit ('imap@' + hostname)
+                if not self.ctx:
+                    rc, self.ctx = kerberos.authGSSClientInit ('imap@' + hostname)
+                rc = kerberos.authGSSClientStep (self.ctx, b64encode (rcv))
+                #DEBUG# print 'ClientStep Result Code:', ['CONTINUE', 'COMPLETE'] [rc]
+                if rc == kerberos.AUTH_GSS_COMPLETE:
+                    self.complete = True
+                # if rc != 0:
+                #     print 'Error making a step'
+                #     return None
+                snd = kerberos.authGSSClientResponse (self.ctx)
+                return (b64decode (snd) if snd else "")
+            else:
+                # Unwrap and interpret the information token
+                rc = \
+                    kerberos.authGSSClientUnwrap (self.ctx, b64encode (rcv))
+                # if rc != 0:
+                #     print 'Error unwrapping'
+                #     return None
+                token = b64decode (kerberos.authGSSClientResponse (self.ctx))
+                if len (token) != 4:
+                    #DEBUG# print 'Error unwrapping token after GSSAPI handshake'
+                    return None
+                flags = ord (token [0])
+                #DEBUG# print 'Flags:', '0x%02x' % flags
+                if flags & kerberos.GSS_C_INTEG_FLAG:
+                    pass #DEBUG# print 'Integrity Supported'
+                if flags & kerberos.GSS_C_CONF_FLAG:
+                    pass #DEBUG# print 'Confidentiality Supported'
+                maxlen = (ord (token [1]) << 16) | (ord (token [2]) << 8) | (ord (token [3]))
+                #DEBUG# print 'Maxlen:', maxlen
+                rettok = (chr (0) * 4) + 'ofo'
+                return self.wrap (rettok)
+                # kerberos.authGSSClientWrap (self.ctx, b64encode (rettok))
+                # snd = kerberos.authGSSClientResponse (self.ctx)
+                # return (b64decode (snd) if snd else "")
+
+        # The Currying surroundings return the internal function
+        # This is a strange necessity due to the IMAP assumption
+        # that it can call a closure, or a stateless function.
+        # What a lot of work to evade global variables... and it's
+        # all due to an ill-designed API, I think.
+        return step
+
+
+#
+# Process the commandline
+#
+if len (sys.argv) < 2:
+    sys.stderr.write ('Usage: ' + sys.argv [0] + ' attachment...\n\tThis command will create a draft email with the given files attached.\n')
+    sys.exit (1)
+
+attachments = [ ]
+for arg in sys.argv [1:]:
+    ana = os.popen ('file --mime-type "' + arg + '"').read ()
+    (filenm,mimetp) = ana.split (': ', 1)
+    (major,minor) = mimetp.strip ().split ('/', 1)
+    filenm = arg.split (os.sep) [-1]
+    content = mime.MIMEBase (major, minor)
+    content.set_param ('name', filenm)
+    content.add_header ('Content-disposition', 'attachment', filename=filenm)
+    attachments.append (content)
+
+
+remote_hostname = 'popmini.opera'
+
+#
+# Login to IMAP
+#
+im = imaplib.IMAP4 (remote_hostname, 143)
+authctx = SASLTongue ()
+authcpu = authctx.processor (remote_hostname)
+#DEBUG# print 'AuthCPU:', authcpu, '::', type (authcpu)
+im.authenticate ('GSSAPI', authcpu)
+
+#
+# Select a mailbox for uploading to
+#
+draftbox = 'Drafts'
+ok,msgs = im.select (draftbox)
+if ok != 'OK':
+    ok,msgs = im.select ()
+    if ok == 'OK':
+        sys.stderr.write ('Warning: No ' + draftbox + ' folder found, posting to INBOX\n')
+        draftbox = 'INBOX'
+    else:
+        sys.stderr.write ('Failed to select either the Drafts folder or INBOX\n')
+        sys.exit (1)
+
+
+#
+# Insert the content into the attachments
+#
+for (av,at) in zip (sys.argv [1:], attachments):
+    # Use each attachment's own type, not 'major' left over from the loop above
+    if at.get_content_maintype () == 'text':
+        at.set_payload ( open (av, 'r').read () )
+    else:
+        at.set_payload (b64encode (open (av, 'rb').read ()))
+
+#
+# Construct the email message to upload
+#
+introtxt = """Hello,
+
+Attached, you will find
+"""
+intro = text.MIMEText (introtxt)
+attachments.insert (0, intro)
+
+msg = multipart.MIMEMultipart ()
+for at in attachments:
+    msg.attach (at)
+
+ok,data = im.append (draftbox, '(\\Flagged \\Draft)', None, msg.as_string ())
+if ok != 'OK':
+    sys.stderr.write ('Problem appending the file\n')
+    sys.exit (1)
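
The escaping rules described in the README section "Generating these URIs" can be sketched in a few lines. This is only an illustration and not part of the patch: the helper names `make_mid_uri` and `parse_mid_uri` are hypothetical, the sketch targets Python 3 (`urllib.parse`) whereas the scripts in this commit are Python 2, and leaving `@` unescaped is an assumption based on the examples in RFC 2392.

```python
from urllib.parse import quote, unquote


def make_mid_uri(message_id, content_id=None):
    """Build a mid: URI from a Message-ID and an optional Content-ID.

    Hypothetical helper, not taken from midget or midput.
    """
    def escape(header_value):
        # Remove the surrounding angle brackets, then percent-escape the
        # remainder. Only '@' is kept literal (an assumption, following the
        # RFC 2392 examples), so any slash in the identifier is escaped too.
        return quote(header_value.strip().strip("<>"), safe="@")

    uri = "mid:" + escape(message_id)
    if content_id is not None:
        # An attachment is addressed by appending its escaped Content-ID
        # after a separating slash.
        uri += "/" + escape(content_id)
    return uri


def parse_mid_uri(uri):
    """Split a mid: URI into bracketed (Message-ID, Content-ID) values,
    mirroring what midget.py does with its commandline arguments."""
    if uri[:4].lower() != "mid:":
        raise ValueError("not a mid: URI: " + uri)
    mid, slash, cid = uri[4:].partition("/")
    message_id = "<" + unquote(mid) + ">"
    content_id = "<" + unquote(cid) + ">" if slash else None
    return message_id, content_id


uri = make_mid_uri("<1234@example.org>", "<part/1@example.org>")
print(uri)
print(parse_mid_uri(uri))
```

Because the slash inside the Content-ID is escaped before the two parts are joined, the first literal `/` in the URI is always the separator, which is what lets a parser split on it safely.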