Last May, at the Mark Logic User Conference, Jason Hunter showed off MarkMail. It's finally available to the public at http://markmail.org/
One part that wowed me is the ability to view attachments within the interface. Maybe I'm easily impressed. For an example, go here for an email with a PDF attached. Click the link for the attachment. You can search for other attachments with a query like 'extension:ppt' or 'extension:doc'.
Thursday, November 08, 2007
Thursday, May 17, 2007
san francisco -- day #4 -- "You like extreme?"
Final conference sessions today. OK, but nothing fantastic.
Great lunch at House of Nanking. The waiter arrived, said "Your first time here?" When we said yes, he took our menus. No idea what the dishes were, but they were all super-excellent.
Then got a taxi to the airport. After a few minutes of silence, the driver asks, "You like extreme?"
I'm not sure I hear him right. He asks again, "You like extreme?"
Yep. I had heard him right.
Pause....... I'm not sure if I'm going to get invited to a cock fight, or what. I guess that maybe he's asking if it's OK to play his favorite thrash metal CD. "Uhhh.... extreme music?" I ask.
He explains, "No! Extreme! Like X! .... Games! Extreme. Sports."
I respond, "umm. A little?"
The rest of the ride was enclosed in blissful silence.
Wednesday, May 16, 2007
san francisco -- day #3
Noisy street. Some sleepy sessions. A couple of awesome ones -- demonstrating debugging and profiling support for XQuery.
Then the final session of the day. At first, I'm thinking... blah, a mailing list archive viewer? Oh, how wrong I was. Sweet zombie Jesus, it was cool. Or maybe Jason Hunter is a great speaker. Or maybe both?
It was also very applicable to the work we're doing -- transforming and enhancing XML and non-XML assets into a normalized XML form and storing the results in MarkLogic.
Tuesday, May 15, 2007
san francisco -- day #2
Noisy street + hotel room close to street = no sleep.
As the opening session begins, I realize I can't focus. Then I notice the strobing migraine precursor.... Today is going to be awesome!
Interesting sessions. One of the sessions on the technical track seemed like a waste of time, but most of the others were interesting. As an added plus, presentations by the vendor and one of our competitors seemed to validate our current architectural approach.
Monday, May 14, 2007
san francisco -- day #1
Flew to San Francisco for a technical conference. Flight uneventful, and in a teeny CRJ200. Only 12 and a half rows of seats.
Googled up restaurants near the hotel. I've been craving Vietnamese lately, so I decided on a hole-in-the-wall place with good reviews -- Golden Flower Vietnamese. Just a couple minutes' walk from the hotel.
Pretty good, but not the best I've had. The imperial rolls were greasy, and more mystery-meaty than I'd have preferred. The bun with grilled pork and shrimp was tasty.
In order of worst to first:
Pho Cali -> Golden Flower Vietnamese -> Pho Hoa -> Shanghai Cafe -> The deeee-licious versions Dien made for us over at Mark's house. Mmmmm.
Also, it's less fun to travel without coworkers. Wandering aimlessly on your own feels more like getting lost than having an adventure.
Wednesday, March 21, 2007
a programming post, odd.
At work, I've been working on a data processing pipeline. The meta-data for the pipeline (e.g. status of pipeline phases, data locations, etc.) is held at runtime in an LRU cache and flushed to persistent storage. Until a couple weeks ago, the storage was just a bunch of files on the filesystem. It worked, but it was pretty slow to collect aggregate information from it -- e.g. if you wanted a global view of all status within the pipeline, you had to traverse the meta-data graph, loading each individual file as the traversal progressed.
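That cache-plus-flush pattern can be sketched roughly like this. This is a minimal illustration with made-up names, not our actual code; `store` stands in for whatever persistence layer you use (files, a database, etc.):

```python
from collections import OrderedDict

class WriteBackLRUCache:
    """Tiny LRU cache that writes evicted entries back to a backing store.

    `store` is any dict-like object standing in for persistent storage.
    """
    def __init__(self, capacity, store):
        self.capacity = capacity
        self.store = store
        self.entries = OrderedDict()   # oldest entry first

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            old_key, old_value = self.entries.popitem(last=False)
            self.store[old_key] = old_value   # write-back on eviction

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as recently used
            return self.entries[key]
        if key in self.store:                 # cache miss: fall back to storage
            value = self.store[key]
            self.put(key, value)
            return value
        raise KeyError(key)
```

With storage like this, a "global status report" still means loading every flushed entry one at a time, which is exactly the slow part.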
I'm allergic to relational databases, and we're already using MarkLogic/XQuery to store and transform our data, so we decided to try a new implementation of the LRU cache with the storage in MarkLogic. Performance wasn't impacted much (maybe an extra couple minutes for a 1500+ file pipeline that takes 10+ hours), and now status reports take a couple seconds rather than a few minutes. Hooray!
Since that was less painful than expected, I decided I'd also try getting some performance statistics out of the meta-data in the database. There are a few helper functions to do sum/count/average, but I wanted to avoid iterating over the list multiple times. I found some pseudocode here, and implemented an XQuery version:
(:
  This is an XQuery implementation of the variance algorithm on this page:
  http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
  credited to Donald Knuth, The Art of Computer Programming, vol 2:
  Seminumerical Algorithms, 3rd edn., p. 232. Boston: Addison-Wesley.
  It relies on MarkLogic's extensions to XQuery (xdmp:set), so it
  probably won't work on other XQuery implementations.
:)
define function putil:getSumMeanVariance($srcvals as xs:double*) as node()* {
  let $n := 0
  let $mean := 0
  let $S := 0
  let $sum := 0
  let $throwAway :=
    for $x in $srcvals
    let $delta := $x - $mean
    return (
      xdmp:set($n, $n + 1),
      xdmp:set($mean, $mean + ($delta div $n)),
      xdmp:set($S, $S + $delta * ($x - $mean)),
      xdmp:set($sum, $sum + $x)
    )
  let $variance :=
    if ($n gt 1) then
      $S div ($n - 1)
    else 0
  return (
    <count>{$n}</count>,
    <sum>{$sum}</sum>,
    <mean>{$mean}</mean>,
    <variance>{$variance}</variance>,
    <stdev>{math:sqrt($variance)}</stdev>
  )
}

let $elapsedTimes :=
  for $pnode in collection($collName)[ .... uninteresting XPath predicate stuff here .... ]
  return $pnode/stats/time/end - $pnode/stats/time/start
return <stats>{putil:getSumMeanVariance($elapsedTimes)}</stats>
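If you want to sanity-check the numbers outside of MarkLogic, the same single-pass Welford update translates directly to plain Python (the function name is mine, not from the original pseudocode):

```python
import math

def sum_mean_variance(values):
    """Single-pass Welford algorithm. Returns (count, sum, mean,
    sample variance, standard deviation) for an iterable of numbers."""
    n = 0
    mean = 0.0
    s = 0.0        # running sum of squared deviations from the mean
    total = 0.0
    for x in values:
        n += 1
        delta = x - mean          # deviation from the *old* mean
        mean += delta / n
        s += delta * (x - mean)   # uses the *updated* mean, as in the XQuery
        total += x
    variance = s / (n - 1) if n > 1 else 0.0
    return n, total, mean, variance, math.sqrt(variance)

# example: for [2, 4, 4, 4, 5, 5, 7, 9] this gives
# count=8, sum=40, mean=5, sample variance=32/7
```

Same shape as the XQuery version, just without the `xdmp:set` trickery, since Python has ordinary mutable variables.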
Wednesday, July 19, 2006
more IronPython + MarkLogic experimenting
I've been playing more with IronPython. Specifically, experimenting with using it along with MarkLogic's XCC API.
Below is a pretty ugly but working re-implementation of the sample code to run queries.
Figuring out more ways I might infiltrate my workplace with Python makes me happy happy. Beer also makes me happy. Mmm... Murphy's Stout.
# Re-implementation of SimpleQueryRunner.cs XCC example
# using IronPython
import clr
clr.AddReferenceToFile("MarklogicXcc.dll")
import Marklogic.Xcc
import System
import os
import sys

def execute(session, query):
    """Generator that yields result strings from
    execution of the query"""
    req = session.NewAdhocQuery(query)
    for res in session.SubmitRequest(req).AsStrings():
        yield res
    return

def main():
    if len(sys.argv) != 3:
        print usage()
        sys.exit(2)
    query = "'Hello world'"
    try:
        f = open(sys.argv[2])
        query = f.read()
        f.close()
        cs = Marklogic.Xcc.ContentSourceFactory.NewContentSource(
            System.Uri(sys.argv[1]))
        session = cs.NewSession()
        for result in execute(session, query):
            print result
    except EnvironmentError, e:
        sys.stderr.write("*** Error: %s\n" % e)
        sys.exit(1)
    except Exception, e:
        # ugh, looks like exceptions from XCC are Exception
        sys.stderr.write("*** Error: %s\n" % e)
        sys.exit(1)

def usage():
    return "usage: \n SimpleQueryRunner.py xcc://USER:PASSWORD@HOST:PORT/MLDATABASE <queryfile>"

if __name__ == "__main__":
    main()
Figuring out more ways I might infiltrate my workplace with Python makes me happy happy. Beer also makes me happy. Mmm... Murphy's Stout.
Monday, July 17, 2006
MarkLogic XCC + IronPython = Sweet!
At work we're using MarkLogic to store and transform our XML content. You use XQuery to access the data, which has been pretty fun to learn. While looking at the release notes for MarkLogic 3.1-2, I saw that they'd released the .NET version of their new XCC API for connecting with the server.
I've meant to start playing with IronPython for a while. And I've also meant to start learning C#/.NET stuff. So, I thought I'd try running the MarkLogic XCC.Net examples via IronPython.
And, it works! Woohoo! Not that it does much yet...
Our collection of XSLT/XQuery/Java applications is a pain to deploy and do quick interactive testing against. Hopefully being able to script it with IronPython -- and maybe provide a nice interface with Windows Forms -- will allow quicker turnaround times.
import clr
clr.AddReferenceToFile("MarklogicXcc.dll")
import Marklogic.Xcc
import System
import os

doc = 'hamlet.xml'
# replace connection info and marklogic db name
contentSource = 'xcc://USER:PASSWORD@HOST:PORT/MLDBNAME'

print "Loading document ..."
Marklogic.Xcc.Examples.ContentLoader.Main(
    System.Array[System.String](
        [contentSource, doc]
    )
)

print "Fetching document ..."
Marklogic.Xcc.Examples.ContentFetcher.Main(
    System.Array[System.String](
        [contentSource, os.path.abspath(doc).replace('\\', '/'),
         '-o', 'hamlet_fetched.xml']
    )
)

print "Done."