At work, we have our telephony applications. We send SIP calls here and there, bouncing between different applications. Sometimes when I get logs of a call scenario that someone had a problem with, it can be a bit hard to understand exactly how the calls have interacted with each other.

Then I remembered that Wireshark has the ability to show SIP traffic as a flow-diagram. Quite nifty really.

I had an idea that I should take our own logs, extract the SIP messages and generate a pcap-file from them. Then I could use Wireshark to produce such a flow graph.
Perhaps a bit of a roundabout way to get a call graph. But still, sounds simple, right?

To start with, I found the format specification for the pcap file. It looked simple enough, just a couple of C-structs written straight to disk.
http://wiki.wireshark.org/Development/LibpcapFileFormat#File_Format
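To make the "couple of C-structs" concrete, here is a minimal sketch of how the global header and a record header could be written with BinaryWriter, which always writes little-endian. The class and method names are my own for illustration, and the field values (version 2.4, snaplen 65535, link type 101 for raw IP) are the ones I would assume from the format page. Note that this only produces the container; the packet bytes themselves still need valid IP and UDP headers.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any library.
public static class PcapSketch
{
    // LINKTYPE_RAW: each record starts directly with an IP header.
    public const uint LinkTypeRaw = 101;

    // Writes the 24-byte global header that starts every pcap file.
    public static void WriteGlobalHeader(BinaryWriter w, uint linkType)
    {
        w.Write(0xa1b2c3d4u);   // magic number, also tells readers the byte order
        w.Write((ushort)2);     // major version
        w.Write((ushort)4);     // minor version
        w.Write(0);             // thiszone: GMT offset (int32), normally 0
        w.Write(0u);            // sigfigs: timestamp accuracy, normally 0
        w.Write(65535u);        // snaplen: max captured bytes per packet
        w.Write(linkType);      // network: link-layer type
    }

    // Writes one 16-byte record header followed by the packet bytes.
    public static void WriteRecord(BinaryWriter w, uint seconds, uint microseconds, byte[] packet)
    {
        w.Write(seconds);               // ts_sec
        w.Write(microseconds);          // ts_usec
        w.Write((uint)packet.Length);   // incl_len: bytes stored in the file
        w.Write((uint)packet.Length);   // orig_len: bytes on the wire
        w.Write(packet);
    }
}
```

Writing field by field like this is the closest thing to a "C-like" serializer I know of in plain .NET.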

So in an hour I wrote a program that parsed our logs and filled structs matching the C-structs. One unexpected hardship I encountered was converting the structs to binary format. I could not find a "C-like" serializer! But writing byte-by-byte manually still worked.
However, the pcap-files created this way didn't work. Wireshark complained about malformed IP-packets, segmented IP-packets with invalid checksums and so on.
To summarize, it didn't work!

Sometimes when I have run my efforts straight into the wall I think "This shouldn't be hard. Someone has to have done this before." But I didn't find anything. (However, I never managed to phrase a good Google query. All I found was people trying to get tcpdump to log and similar.)

Ok, the Internet couldn't help me. So I tried the next thing, Stack Overflow. "Is there a .NET library available to write arbitrary data to a pcap-file?"

And yes, there was.
SharpPcap, and the sibling Packet.Net.
To quote the project: SharpPcap is a libpcap/winpcap wrapper. Packet.Net is a networking packet parser.

After some more poking around I found out how to use the libraries. Open a CaptureFileWriterDevice, and indicate that the file-format should be raw. (Rather than Ethernet, Token Ring or something similar.)

Then create a UdpPacket and wrap that in an IpPacket. Get the bytes from the IpPacket and write them to the capture device.
Done.

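using System;
using System.IO;
using System.Net;
using PacketDotNet;
using SharpPcap.LibPcap;

public void WritePCap(
    string filename, DateTime dt, IPAddress srcIp, IPAddress dstIp,
    UInt16 srcPort, UInt16 dstPort, byte[] data)
{
    // Build the IP layer first; the UDP packet becomes its payload.
    IpPacket ip = new IPv4Packet(srcIp, dstIp);
    ip.TimeToLive = 70;

    // The constructor already sets the source and destination ports.
    UdpPacket payload = new UdpPacket(srcPort, dstPort) {
        PayloadData = data,
        ParentPacket = ip
    };

    ip.PayloadPacket = payload;

    // Let the library recompute derived header fields (such as lengths)
    // now that both layers are linked.
    payload.UpdateCalculatedValues();
    ip.UpdateCalculatedValues();

    byte[] ipData = ip.Bytes;

    // LinkLayers.Raw: every record in the file starts directly with an IP header.
    CaptureFileWriterDevice storeDevice = new CaptureFileWriterDevice(
        PacketDotNet.LinkLayers.Raw, null, filename, FileMode.OpenOrCreate);

    // ToUnixTime is a helper that is not shown here; it supplies the
    // seconds part of the timestamp. Millisecond * 1000 gives microseconds.
    PcapHeader hdr = new PcapHeader(
        ToUnixTime(dt), (uint)(dt.Millisecond * 1000),
        (uint)ipData.Length, (uint)ipData.Length);
    storeDevice.Write(ipData, hdr);

    // Close the device so the record is flushed to disk.
    storeDevice.Close();
}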
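The listing calls a ToUnixTime helper that is not shown in the post. A minimal sketch of what such a helper might look like follows; this is my assumption of its behavior (whole seconds since 1970-01-01 UTC, returned as uint), and the class name TimeSketch is made up for the example.

```csharp
using System;

// Hypothetical stand-in for the helper used by WritePCap.
public static class TimeSketch
{
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Returns whole seconds since the Unix epoch. A DateTime with
    // Kind == Unspecified is treated as local time by ToUniversalTime().
    public static uint ToUnixTime(DateTime dt)
    {
        return (uint)(dt.ToUniversalTime() - Epoch).TotalSeconds;
    }
}
```

The sub-second part is dropped here on purpose, since the pcap record header carries it in a separate microseconds field.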

I recently got an idea: make a program that calculates checksums of files and stores those in a database. Then a customer can generate a footprint, a set of checksums, of what is actually installed.

One important thing needed to complete this is knowing what we actually install: which files a product version contains. Many of our installers are MSIs. How do you know which files an arbitrary MSI contains?

Well, an MSI can contain basically anything, including custom actions that make arbitrary changes, adding or removing files. So I'll start by reducing the problem and assume that it is a "regular" MSI-file: the files contained in the MSI-file are the actual files the MSI will install.

Also, one big problem that I will ignore is that many installers aren't MSI-files. They are EXE-files which contain an MSI-file, hidden inside their gut. Hopefully I will get to that issue later, in another post.

So I did some digging around: how do you actually open and look at an MSI file? I have previously used Orca frequently to look at the contents of MSI and MSM (merge module) files. That is a start.

File table

I think a sample is needed. TortoiseSVN is delivered as an MSI. Version 1.7.2.
If we open the MSI in Orca and look at the File table we can see that some files seem to be there. (One idea is also to sneak a peek at the MSDN specification for the table.) The File column seems to contain some sort of ID, Component_ probably also. FileName sure sounds like it is right on the money.

However, scrolling down, some of the filenames are gibberish. Being human, looking at ccp7flfk.exe|TortoiseMerge.exe I would guess that the second part, after the pipe character, is the proper filename. But how would a program know?

Further investigation shows that Filename in MSI is a type of its own. MSDN says: To include a long file name with the short file name, separate it from the short file name with a vertical bar (|). For my purposes that either means "use the data after the pipe character", or perhaps "use both". But since my target platform isn't MS-DOS I can rely on long names working.
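The "use the data after the pipe character" rule is small enough to sketch. The class and method names below are made up for the example; the only thing taken from the File table is the short|long convention quoted above.

```csharp
using System;

// Hypothetical helper for values from the FileName column.
public static class MsiFileName
{
    // Returns the long name if the value has the form "short|long",
    // otherwise the value unchanged (it then only has one name).
    public static string LongName(string fileNameColumn)
    {
        int pipe = fileNameColumn.IndexOf('|');
        return pipe < 0 ? fileNameColumn : fileNameColumn.Substring(pipe + 1);
    }
}
```

For example, MsiFileName.LongName("ccp7flfk.exe|TortoiseMerge.exe") gives "TortoiseMerge.exe".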

Directory table

So the File table says what files really should be named. But not where they should be. To solve this we have to look at the Directory and Component tables in combination. The File table contained an external reference to the Component table.

A quick peek in the Component table tells us that there is a reference to the Directory table. If we know the component of a file (which the File table tells us) we know the directory.
(Also, since each component only has one directory referenced, this limits a component to one folder on disk.)

Continuing, if we look at the Directory table we can see that the first directory is INSTALLDIR.

It should probably be mentioned that INSTALLDIR is an arbitrary name. The only thing telling me that it is the main folder of TortoiseSVN is that its parent is D__ProgramFilesFolder, whose parent in turn is TARGETDIR.
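Walking the parent chain can be sketched like this. It is a simplified illustration, not a full implementation: I assume the Directory table has already been read into a dictionary keyed by directory ID, the type and member names are my own, and I only handle the short|long form and the "." placeholder in DefaultDir (not the target:source form). Standard folders such as the Program Files folder are resolved from properties at install time, so this only gives a relative layout.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory copy of one Directory table row.
public class DirectoryRow
{
    public string Parent;      // Directory_Parent column, null or empty for the root
    public string DefaultDir;  // DefaultDir column, possibly "short|long"
}

public static class MsiDirectories
{
    // Builds a relative path by following Directory_Parent up to the root.
    public static string ResolvePath(IDictionary<string, DirectoryRow> table, string directoryId)
    {
        var parts = new List<string>();
        string current = directoryId;
        while (!string.IsNullOrEmpty(current))
        {
            DirectoryRow row = table[current];
            string name = row.DefaultDir;
            int pipe = name.IndexOf('|');
            if (pipe >= 0) name = name.Substring(pipe + 1);   // prefer the long name
            if (name != ".") parts.Insert(0, name);           // "." means "same as parent"
            if (row.Parent == current) break;                 // a root may reference itself
            current = row.Parent;
        }
        return string.Join("\\", parts);
    }
}
```

With rows for TARGETDIR, D__ProgramFilesFolder and INSTALLDIR, the result for INSTALLDIR is the root name followed by the long name of the install folder.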

So now we know where a file should go, if only we had the file. Let's start looking for where the actual bytes are hidden.

Media table

After some looking around I found that I should look in the Media table. That is where all attached media files are listed.

Well, that is a bit of a disappointment: the files are not here either! Just a cab-file. Also, cab-files are not the easiest thing in the world to extract. If we can believe Wikipedia and MSDN, the contents might be compressed by any of four algorithms. A file too large for a single CAB-file might continue in the next one. (But no more than 15 files may span over to the next cab-file at a time. Thanks.)

Well, extracting the CAB-file is not too hard. Fetch the record, then open a stream from column four, it seems. Then we only have to worry about reading a CAB-file...

The CAB file

Ok. We now have a CAB file on disk. But how do we examine it? MSDN might be a start. There is also a project on CodeProject that claims to be the road to success.

Ohh. This seems hard.

We have to create a Cabinet instance and assign it callback functions to handle all possible events that might happen. This is possible to do, even in a .NET environment as I am envisioning. However, it sure seems like a hassle.

Expand.exe

Microsoft has a standard utility for handling cabinet files, expand.exe.

   Microsoft (R) File Expansion Utility Version 6.1.7600.16385
   Expands one or more compressed files.

   EXPAND [-R] Source Destination
   EXPAND -R Source [Destination]
   EXPAND -I Source [Destination]
   EXPAND -D Source.cab [-F:Files]
   EXPAND Source.cab -F:Files Destination

     -R            Rename expanded files.
     -I            Rename expanded files but ignore directory structure.
     -D            Display list of files in source.
     Source        Source file specification. Wildcards may be used.
     -F:Files      Name of files to expand from a .CAB.
     Destination   Destination file | path specification.
                   Destination may be a directory.
                   If Source is multiple files and -r is not specified,
                   Destination must be a directory.

So -D, to view the contents eh?
   C:\temp>expand /d #TSVN.cab
   Microsoft (R) File Expansion Utility Version 6.1.7600.16385
   Copyright (c) Microsoft Corporation. All rights reserved.

   #tsvn.cab: F__aprlicense
   #tsvn.cab: F__BlipAddedIcon
   #tsvn.cab: F__BlipCleanAddedIcon
   #tsvn.cab: F__BlipCleanConflictIcon
   #tsvn.cab: F__BlipCleanDeletedIcon
   ...

Oh, bugger. We don't have the original filename, only the ID from the File table. Well, it should be possible to make the lookup, even though it's a hassle.
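The lookup starts with getting the IDs out of the listing. Here is a rough sketch that parses output shaped like the sample above ("cabname: entry" per line); the class name is made up, and I am assuming the listing format stays as shown for other cab-files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for the text printed by "expand /d".
public static class ExpandListing
{
    // Extracts the entry names from lines such as "#tsvn.cab: F__aprlicense".
    // Banner lines and blank lines do not start with the prefix and are skipped.
    public static List<string> ParseEntries(string output, string cabName)
    {
        var entries = new List<string>();
        string prefix = cabName + ": ";
        foreach (string raw in output.Split('\n'))
        {
            string line = raw.Trim();   // also removes a trailing '\r'
            if (line.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                entries.Add(line.Substring(prefix.Length));
        }
        return entries;
    }
}
```

Each returned entry is a key in the File column, so the real file name can be found by looking it up in a dictionary built from the File table.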

Well, at least expand -F:F__FileID #TSVN.cab C:\Temp\TSV_cab_extracted will extract a single file for me. Not ideal since there are a lot of temp-files to juggle, but it should work unless I find something better.
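Driving expand.exe from a program could look something like the sketch below. It only builds the ProcessStartInfo; the helper name is my own, and I put the arguments in the order given by the help text (Source.cab -F:Files Destination). One caveat of this naive quoting: a destination that ends with a backslash would escape the closing quote, so pass the folder without a trailing backslash.

```csharp
using System.Diagnostics;

// Hypothetical helper that prepares an expand.exe call for one file.
public static class ExpandCommand
{
    public static ProcessStartInfo ForSingleFile(string cabPath, string fileId, string destinationDir)
    {
        return new ProcessStartInfo
        {
            FileName = "expand.exe",
            // Quote the paths so that spaces survive; fileId is a File table key.
            Arguments = string.Format("\"{0}\" -F:{1} \"{2}\"", cabPath, fileId, destinationDir),
            UseShellExecute = false,          // required for output redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
    }
}
```

The result can be handed to Process.Start, followed by WaitForExit and a check of ExitCode before trusting the extracted file.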

Cabinet SDK?

Come on, there has to be some sort of SDK to manipulate Cabinet files. I cannot be the first one on earth to attempt this!

The WiX-toolkit contains some DLLs to manipulate MSI files and related formats, including the DLL Microsoft.Deployment.Compression.dll. I found a website which attempts to introduce these.

Using the WiX-toolkit I can get a System.IO.Stream to the CAB-file included inside the MSI-file. If I can find some way to operate on CAB-files using a stream-interface (or byte-arrays) I should be home safe.

Maybe I'll revisit this method later, the expand.exe-solution will have to do for now.

Loose files, then?

It is not mandatory to have your files in a cab-file; loose files are also allowed. If I configure my installer to place loose files, these are the files in my installer build folder:

   P:\Output>dir /b /S
   P:\Output\UncompressedProject.msi
   P:\Output\UncompressedProject_TargetDir\file1.exe
   P:\Output\UncompressedProject_TargetDir\subfolder\file2.exe

So I have my MSI-file as usual. I also have my two files, in some subfolder next to it. But how do I know that the folder is named UncompressedProject_TargetDir, as it is in my case? Well, since I looked at my MSI I feel I might have an idea. The long name of the directory INSTALLLOCATION matches the folder on disk. Let's hope it's not a coincidence. Let's get that record from the Directory table and use everything after the | (pipe) character.

In my test-application I simply used the folder <location of MSI>\<INSTALLLOCATION> and searched for *.* in that folder. That gave me all loose files.
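Roughly, that search can be sketched as follows. The names are made up for the example, and I am assuming a recursive search, since one of the files above sits in a subfolder. I use the pattern "*" rather than "*.*" so that files without an extension are matched on every runtime.

```csharp
using System;
using System.IO;

// Hypothetical helper for installers built with loose (uncompressed) files.
public static class LooseFiles
{
    // msiPath: path to the MSI-file.
    // directoryLongName: the long name taken from the Directory table,
    // for example the part after '|' for INSTALLLOCATION.
    public static string[] Find(string msiPath, string directoryLongName)
    {
        string msiFolder = Path.GetDirectoryName(Path.GetFullPath(msiPath));
        string root = Path.Combine(msiFolder, directoryLongName);
        return Directory.GetFiles(root, "*", SearchOption.AllDirectories);
    }
}
```

Calling it with the MSI path and the folder name from my example would return file1.exe and subfolder\file2.exe with their full paths.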

So there you have it. The File table contains lookups of what files actually should be called. Using Component and Directory in combination we know where the files should be. The Media table usually contains a CAB-file which in turn contains the files. (Named as their IDs.)

Learning in bits and pieces

Jun 19, 2012

Writing your own pcap-files.

Mar 11, 2012

MSI databases, extracting the files