Topic: Parsing a Thunderbird Mailbox
bmoore
I am trying to extract the messages from a Thunderbird Mailbox, which is more or less a standard UNIX format mailbox with all the messages strung together.
messagetag: {From - *
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000}
find/any mailboxdata messagetag
finds the first tag, but I don't know how to make find locate the subsequent instances. I need to be reasonably efficient, since the mailbox file is over 3 MB. So far I have wasted a day searching for ways to make find continue, but given that "find" is such a common word, it is not easy to search effectively. Can anyone offer some suggestions, or point me to some relevant documentation?
Out of desperation I tried peeling messages off the bottom of the file, and the following seemed to work:
mailfile: %/path-to-mailbox/mailboxfile
maildata: read mailfile
messagetag: {From - *
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000}
count: 0
messages: copy []
r: maildata
while [length? r] [
    r: find/last/any r messagetag
    insert messages copy r
    remove/part r length? r
    r: head r
    count: count + 1
    print count
]
but it only pulled out the last 27 messages in the mailbox before causing REBOL to puke up the following error:
** Script Error: copy expected value argument of type: series port bitset
** Where: halt-view
** Near: insert messages copy r
Likely trying to shuffle around that huge string (just over 2518810 bytes at the time of failure) was too much and caused something to get overwritten.
print r returned none
Any help would be much appreciated.
Graham
I think Brett Handley has done some work on scripts that parse a unix mailbox.
The delimiter that separates one mail from the next is two newlines, I think.
So,
mailbox: []
parse unix-file [ some [ copy mail thru {^/^/} ( append mailbox mail ) ]]
might work for you .. untested.
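On the original question of getting find to continue past the first match, here is a minimal sketch. It assumes every delimiter line begins with the literal text "From - " and leaves the wildcard out entirely; the sample data and the word `starts` are made up for illustration. The idea is to restart find just after the previous hit and to stop when find returns none. A none result would also be consistent with the copy error and the "print r returned none" reported above.

```rebol
REBOL [Title: "Find every message start (sketch)"]

; Hypothetical sample data; a real script would use: maildata: read mailfile
maildata: rejoin [
    "From - Mon Jan 07 10:00:00 2008" newline
    "X-Mozilla-Status: 0001" newline
    "Subject: first" newline newline
    "He wrote: From - here on it gets easier" newline
    "From - Tue Jan 08 11:30:00 2008" newline
    "X-Mozilla-Status: 0001" newline
    "Subject: second" newline newline
    "Second body" newline
]

starts: copy []     ; indexes where a delimiter line begins
pos: maildata

; find returns none once nothing is left to match, which ends the loop
while [pos: find/case pos "From - "] [
    ; only count matches that sit at the very start of a line
    if any [head? pos  newline = first back pos] [
        append starts index? pos
    ]
    pos: next pos   ; step past this match so the next find moves on
]

print ["delimiters found at:" mold starts]
```

Nothing is removed from the big string here, so there is no shuffling of data. The messages themselves can be taken afterwards with copy/part between consecutive entries of `starts`.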
bmoore
Thanks for the reply. Unfortunately I said "more or less" a standard UNIX mailbox. The delimiter is the complex pattern mentioned above - i.e.:
From - Date of Storage (don't care; I'd prefer to match it as a wildcard)
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
I've continued to slog through the various articles on Parse, and notice that parse can execute code after a match, but the examples shown are naturally very simplistic.
I'm wondering whether it would be possible to use a single parse that could build a series of all the starting points of these delimiters?
Gabriele
Something like:
parse/all mailbox-string [
    some [
        mail: "From " thru newline ; just ignore
        "X-Mozilla-Status: 0001" newline
        "X-Mozilla-Status2: 00000000" newline
        (print index? mail)
        ; skip to next mail for next iteration
        to "^/From " skip
        |
        ; didn't match, some junk?
        to "^/From " skip
    ]
]
(not tested)
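Building on the rule above, a rough sketch of collecting the start indexes and then cutting the mailbox into separate messages might look like this. The sample data, the words `starts`, `messages`, `begin` and `finish`, and the extra `to end` alternative (so that the last message ends the rule cleanly) are additions for illustration, not part of the suggestion above. It also assumes every message carries exactly the two status values shown; messages with other status values would be skipped as junk.

```rebol
REBOL [Title: "Split a Thunderbird mailbox with parse (sketch)"]

; Hypothetical sample data; a real script would use: mailbox-string: read mailfile
mailbox-string: rejoin [
    "From - Mon Jan 07 10:00:00 2008" newline
    "X-Mozilla-Status: 0001" newline
    "X-Mozilla-Status2: 00000000" newline
    "Subject: first" newline newline
    "Body of the first message" newline
    "From - Tue Jan 08 11:30:00 2008" newline
    "X-Mozilla-Status: 0001" newline
    "X-Mozilla-Status2: 00000000" newline
    "Subject: second" newline newline
    "Body of the second message" newline
]

starts: copy []   ; index of the first character of each message

parse/all mailbox-string [
    some [
        mail: "From - " thru newline          ; date is ignored
        "X-Mozilla-Status: 0001" newline
        "X-Mozilla-Status2: 00000000" newline
        (append starts index? mail)
        ; move to the next delimiter, or to the end after the last message
        [to "^/From - " skip | to end]
        |
        ; no match here: skip ahead to the next candidate line
        to "^/From - " skip
    ]
]

; cut the string between consecutive start indexes
messages: copy []
repeat i length? starts [
    begin: at mailbox-string pick starts i
    finish: either i < length? starts [
        at mailbox-string pick starts i + 1
    ] [
        tail mailbox-string
    ]
    append messages copy/part begin finish
]

print [length? messages "messages found"]
```

Since the mailbox string is only read and never modified, the single pass plus one copy per message should stay manageable for a file of a few megabytes, though that is untested on a real mailbox.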