Rebol Talk Forum  |  Getting Started  |  Ask the Guru! (Moderator: Carl)  |  Topic: Parsing a Thunderbird Mailbox
Author Topic: Parsing a Thunderbird Mailbox  (Read 2098 times)
bmoore
Newbie
Posts: 9

Parsing a Thunderbird Mailbox
« on: August 08, 2006, 12:33:12 AM »

I am trying to extract the messages from a Thunderbird Mailbox, which is more or less a standard UNIX format mailbox with all the messages strung together.

messagetag: {From - *
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000}

find/any mailboxdata messagetag

finds the first tag, but I don't know how to make find locate multiple instances. I need to be reasonably efficient, since the mailbox file is over 3MB. So far I have wasted a day searching for ways to make find continue, but because "find" is such a common word, it is hard to search for effectively. Can anyone offer some suggestions, or point me to some relevant documentation?
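A minimal sketch of the usual idiom, assuming REBOL 2: find returns the string positioned at the match, or none when there is no further match, so you can loop on its result and restart each search one character past the previous hit with next. The loop never copies the big string; it only moves a position through it. To keep the sketch predictable it searches for the plain string "From - " instead of using find/any with the wildcard messagetag, and the sample maildata and the positions block are made up for illustration.

```rebol
REBOL [Title: "Find every occurrence of a delimiter (sketch)"]

; Hypothetical two-message sample; in practice: maildata: read mailfile
maildata: {From - Mon Aug 07 10:00:00 2006
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000

first body
From - Tue Aug 08 11:00:00 2006
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000

second body
}

positions: copy []   ; index of each "From - " found (illustrative name)
pos: maildata
; WHILE stops as soon as FIND returns NONE (no further match)
while [pos: find pos "From - "] [
    append positions index? pos
    pos: next pos    ; step one character past this hit so FIND moves on
]
probe positions
```

A plain "From - " would also match that text if it appeared inside a message body, so a real mailbox may need a tighter pattern.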

Out of desperation I tried peeling messages off the bottom of the file, and the following seemed to work:

mailfile: %/path-to-mailbox/mailboxfile
maildata: read mailfile
messagetag: {From - *
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000}
count: 0
messages: copy []
r: maildata
while [length? r] [
   r: find/last/any r messagetag
   insert messages copy r
   remove/part r length? r
   r: head r
   count: count + 1
   print count
]

but it only pulled out the last 27 messages in the mailbox before causing REBOL to puke up the following error:

** Script Error: copy expected value argument of type: series port bitset
** Where: halt-view
** Near: insert messages copy r

My guess is that shuffling that huge string around (just over 2,518,810 bytes at the time of failure) was too much and caused something to get overwritten.

print r returned none

Any help would be much appreciated.
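The error message points to a simpler cause than memory trouble. Because print r shows none, the last find/last/any call found no further match for messagetag in the remaining text (one possibility, not confirmed here, is that older messages carry a different X-Mozilla-Status value), and copy none then raised the error. The loop would not have stopped by itself anyway: length? returns an integer, and in REBOL every integer, including 0, counts as true; only none and false are false. A minimal corrected sketch, assuming REBOL 2 and using the plain string "From - " in place of the wildcard tag, tests find's result directly; the sample data is hypothetical.

```rebol
REBOL [Title: "Peel messages off the end of a mailbox (sketch)"]

; Hypothetical sample; in practice: maildata: read mailfile
maildata: {From - Mon Aug 07 10:00:00 2006
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000

first body
From - Tue Aug 08 11:00:00 2006
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000

second body
}

messagetag: "From - "    ; simplified delimiter (assumption, not the wildcard tag)
messages: copy []
r: copy maildata         ; work on a copy so maildata stays intact
; Loop only while FIND/LAST still finds a delimiter; NONE ends the loop
while [r: find/last r messagetag] [
    insert messages copy r   ; r sits at the last delimiter: copy that message
    clear r                  ; truncate the working string at that point
    r: head r                ; search what is left on the next pass
]
print ["Extracted" length? messages "messages"]
```

clear r does the same job as the original remove/part r length? r, dropping everything from r to the tail, and inserting at the head of messages keeps them in file order.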
Graham
Full Member
Posts: 113

Re: Parsing a Thunderbird Mailbox
« Reply #1 on: August 08, 2006, 01:41:21 AM »

I think Brett Handley has done some work on scripts that parse a unix mailbox.

I think the delimiter that separates one mail from the next is two newlines.

So,

mailbox: copy []

parse/all unix-file [ some [ copy mail thru {^/^/} ( append mailbox mail ) ]]

might work for you... untested.
bmoore
Newbie
Posts: 9

Re: Parsing a Thunderbird Mailbox
« Reply #2 on: August 10, 2006, 09:36:09 AM »

Thanks for the reply.  Unfortunately I said "more or less" a standard UNIX mailbox.  The delimiter is the complex pattern mentioned above - i.e.:

From - Date of Storage (Don't care... prefer to match as a wildcard)
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000

I've continued to slug through the various articles on parse, and noticed that parse can execute code after a match, but the examples shown are naturally very simplistic.

I'm wondering whether it would be possible to use a single parse that builds a series of all the starting points of these delimiters.
Gabriele
Full Member
Posts: 182

Re: Parsing a Thunderbird Mailbox
« Reply #3 on: August 11, 2006, 04:05:27 AM »

Something like:

Code:
parse/all mailbox-string [
    some [
        mail: "From " thru newline ; just ignore
        "X-Mozilla-Status: 0001" newline
        "X-Mozilla-Status2: 00000000" newline
        (print index? mail)
        ; skip to next mail for next iteration
        to "^/From " skip
        |
        ; didn't match, some junk?
        to "^/From " skip
    ]
]

(not tested)
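To build the series of starting points asked about earlier, the same rule can record each matching position in a block instead of printing it; copy/part between successive positions then yields the individual messages. This is a sketch under the same assumptions as the rule above (REBOL 2 parse/all, and every message carrying exactly this X-Mozilla-Status pair): a message whose status lines differ is not recorded as a start, so it stays attached to the message before it. The sample data and the names starts and messages are illustrative.

```rebol
REBOL [Title: "Record message starts with PARSE, then split (sketch)"]

; Hypothetical sample; in practice: mailbox-string: read mailfile
mailbox-string: {From - Mon Aug 07 10:00:00 2006
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000

first body
From - Tue Aug 08 11:00:00 2006
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000

second body
}

starts: copy []   ; series positions where a recognised message begins
parse/all mailbox-string [
    some [
        mail: "From " thru newline
        "X-Mozilla-Status: 0001" newline
        "X-Mozilla-Status2: 00000000" newline
        (append/only starts mail)   ; remember where this message starts
        to "^/From " skip           ; move to the next "From " line
        |
        to "^/From " skip           ; header did not match: skip ahead
    ]
]

messages: copy []
forall starts [
    either tail? next starts [
        append messages copy first starts                     ; last one: to end
    ][
        append messages copy/part first starts second starts  ; up to next start
    ]
]
print ["Found" length? messages "messages"]
```

Only positions are stored during the parse, so the big string is copied once per message at the end rather than shuffled around repeatedly.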
Graham
Full Member
Posts: 113

Re: Parsing a Thunderbird Mailbox
« Reply #4 on: August 11, 2006, 05:06:33 AM »

There's lots of documentation on parse, but it's all over the place.

Here's something we started in the wikibook but never finished http://en.wikibooks.org/wiki/REBOL_Programming/Language_Features/Parse
Sunanda
Full Member
Posts: 110

Re: Parsing a Thunderbird Mailbox
« Reply #5 on: August 11, 2006, 09:13:44 AM »

And one of the places it is all over is the mailing list.

This may help in finding relevant mailing list postings:

http://www.rebol.org/cgi-bin/cgiwrap/rebol/ml-topic-index.r?i=parse