cabextract

cabextract is Free Software for extracting Microsoft cabinet files, also called .CAB files, on UNIX or UNIX-like systems. cabextract is distributed under the GPL license. It is based on the portable LGPL libmspack library. cabextract supports all special features and all compression formats of Microsoft cabinet files.

Quick links

Microsoft cabinets vs InstallShield cabinets

Bundled extras
Using cabextract
Changes since 0.6
FAQs
CAB History
Credits

Microsoft cabinet files are used by Microsoft and others to distribute all kinds of data and software: core Web fonts, Longhorn videos, operating system updates and video codecs, to give some examples. Microsoft cabinets are also used as the installation format for Windows CE software. Some people would pay $10.95 to extract this format; you can have it for free.

Downloading and installing `cabextract`

The latest version of cabextract is version 1.1, released 18 October 2004. It is a minor release which fixes a security flaw. See the changes section for more information.

In most cases, cabextract has already been packaged for your operating system. You should consult your OS documentation on how to obtain and install packages. However, you may have to download the source code and compile it yourself to get the latest version.

Platform	Version	Download
All: `cabextract` source code	1.1	cabextract-1.1.tar.gz
RPM-based Linux distributions (Red Hat, SuSE, etc.)	1.1	cabextract-1.1-1.i386.rpm cabextract-1.1-1.src.rpm
Debian GNU/Linux	1.0	`cabextract` package
Gentoo Linux	1.0	`cabextract` package
Slackware Linux	1.0	`cabextract` package
NetBSD	1.0	`cabextract` package
FreeBSD	1.0	`cabextract` package
OpenBSD	0.6	`cabextract` package
Amiga (GeekGadgets)	0.6	`cabextract` [readme] package
BeOS	0.6	M$CAB Extract package
Mac OS X	0.6	`cabextract` package for Fink
Microsoft Windows™	0.6	`cabextract` package for Cygwin
HP/UX, Solaris, IRIX, ...	0.6	binary tarballs from wwexptools

Please let me know of any additions or changes to the list above. Old distributions of cabextract are still available: 1.0 [RPMS: i386, src], 0.6, 0.5, 0.4, 0.3, 0.2, 0.1.

To install an existing cabextract package, consult your operating system's documentation. To install the RPM, use the command rpm -i cabextract-1.1-1.i386.rpm. To install from the source code tarball (the final step, shown with a # prompt, is normally run as root):

$ gzip -cd < cabextract-1.1.tar.gz | tar xf -
$ cd cabextract-1.1
$ ./configure
$ make
# make install

More detailed instructions are included in the INSTALL file found in the cabextract-1.1 directory.

Bundled extras

The cabextract source tarball contains extra documentation and software that is not installed by default. If you use a distribution-specific package, these extras may or may not be installed.

doc/ja/cabextract.1: the Japanese manual page for cabextract.
doc/magic: Some magic entries for the file command, describing Microsoft cabinet files, InstallShield cabinet files and Windows CE install cabinet header files.
doc/wince_cab_format.html: a technical specification of the Windows CE install cabinet file format. This is also available online.
src/cabinfo: A program for dumping the raw data fields of Microsoft cabinet files.
src/wince_info: A Perl script for dumping the raw data fields of a Windows CE install cabinet header file.
src/wince_rename: A Perl script for renaming the files of an extracted Windows CE install cabinet to their true installed names, and extracting the registry entries made by the cabinet.

Using `cabextract`

Enter man cabextract to read the cabextract manual page. Also, running the cabextract command with the --help option gives a brief summary of usage.

In regular usage, just enter cabextract and the name of the cabinet or executable file you want to extract. cabextract will extract all files in all cabinets to the current directory, preserving any internal directory structure, file permissions and file dates. To list files rather than extract them, use the --list option.

cabextract automatically searches files for embedded cabinets, and extracts all of them. If any multi-part cabinets are present, cabextract automatically searches for those parts and links them in. To suppress this behaviour, use the --single option.
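
The idea behind the embedded-cabinet search can be illustrated with a naive scan for the "MSCF" signature that every Microsoft cabinet begins with. This is a minimal sketch assuming a `grep` that supports `-a`, `-b` and `-o` (GNU and BSD grep do); the function name `find_cab_offsets` is hypothetical. Unlike cabextract, it does not check that the bytes after each match form a valid cabinet header, so it can report false positives.

```sh
#!/bin/sh
# find_cab_offsets FILE
# Hypothetical helper (not part of cabextract): print the byte offset of every
# "MSCF" signature in FILE, one per line. A real extractor would also validate
# the cabinet header found at each offset before trusting it.
find_cab_offsets() {
    # -a: treat binary data as text
    # -b: prefix each output line with its byte offset
    # -o: print only the matched part, so the offset is that of the match itself
    LC_ALL=C grep -aob 'MSCF' "$1" | cut -d: -f1
}
```

For example, find_cab_offsets setup.exe lists candidate positions of cabinets embedded in a self-extracting executable.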

cabextract can repair some kinds of corrupt cabinet files. Perhaps a better word for this is "salvage", as the corrupted data is lost forever. Using the --fix option, lost data will be replaced with zeroes, and cabextract will attempt to continue to later data blocks, which are hopefully not corrupt.

You can make cabextract extract files into a specific directory with the --directory option, and you can force extracted filenames to lowercase with the --lowercase option. You can control which files are extracted using the --filter option. For example, cabextract --filter '*.wav' music.cab will extract only '.wav' files from music.cab.
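
The options described above can be combined in one command. The sketch below wraps them in a small shell function; the function name `extract_matching` and its argument order are hypothetical conveniences, and only the --directory, --lowercase and --filter options come from the text above.

```sh
#!/bin/sh
# extract_matching CABINET DEST PATTERN
# Hypothetical wrapper (not part of cabextract): extract only the files in
# CABINET whose names match the wildcard PATTERN into directory DEST,
# forcing the extracted filenames to lowercase.
extract_matching() {
    cab=$1 dest=$2 pattern=$3
    # Create the target directory first; stop if that fails
    mkdir -p "$dest" || return 1
    # Quote the pattern so the shell passes it to --filter unexpanded
    cabextract --directory "$dest" --lowercase --filter "$pattern" "$cab"
}
```

Running extract_matching music.cab sounds '*.wav' is equivalent to cabextract --directory sounds --lowercase --filter '*.wav' music.cab. Running cabextract --list music.cab first shows which names the filter will match.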

Changes since `cabextract` 1.0

A security vulnerability has been fixed. If the files within a cabinet file include "../" in their filenames, this will be changed to "xx/", so a cabinet cannot write files outside the directory you extract it into.
cabextract should now compile cleanly on AIX and Cygwin.
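
The filename rule from the security fix can be illustrated in a couple of lines of shell. This is an illustrative sketch of the "../" to "xx/" rewrite only, not cabextract's own code; the function name `sanitize_name` is hypothetical, and it deliberately handles nothing beyond the rule stated above.

```sh
#!/bin/sh
# sanitize_name NAME
# Illustrative only: rewrite every "../" in a stored filename to "xx/",
# so the name can no longer climb out of the extraction directory.
sanitize_name() {
    # The dots are escaped so sed matches literal "..", not any two characters
    printf '%s\n' "$1" | sed 's|\.\./|xx/|g'
}
```

For example, a cabinet entry named ../../etc/passwd would be written as xx/xx/etc/passwd under the extraction directory.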

Changes since `cabextract` 0.6

The cabextract source has been refactored and rewritten into a portable, extensible, robust library called libmspack. Now cabextract is just a UNIX-specialised command line application using the OS-agnostic libmspack CAB decompressor. Any developers who were considering using cabextract in their own software should now look at libmspack first.
Many bugs in the decompressors were squashed after refactoring. More than three gigabytes of real CAB files from the wild were used in testing. Corrupt cabinets that crashed cabextract 0.6 are now correctly reported as corrupt, without crashing.
cabextract now alerts you if you try to use it to unpack InstallShield cabinet files. This is the number one FAQ saver.
cabextract no longer gets "/" and "\" mixed up.
cabextract now ignores cabinet files listed on the command line that have already been used as part of a multi-part set. You can now type cabextract *.cab on a big multi-part set, and cabextract will not extract all the files several times over.
cabextract now lists files with the same filenames it would use if it were extracting them. This includes always showing UNIX directory separators in the listing.
cabextract now correctly lowercases Unicode filenames in cabinet files.
cabextract has the new --filter, --single and --pipe options.
The cabextract package now includes experimental wince_info and wince_rename scripts.
The definition of Microsoft cabinet files in doc/magic has been fixed, and a definition of Windows CE install cabinet header files has also been added.
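
The listing changes above can be pictured as a simple name transformation: a stored name using "\" separators becomes a UNIX path, optionally lowercased. This is a hypothetical sketch (the function `unix_name` is not part of cabextract); it uses `tr`, so it only lowercases ASCII letters, whereas cabextract itself handles Unicode filenames.

```sh
#!/bin/sh
# unix_name NAME [lower]
# Hypothetical sketch: convert a cabinet's stored filename, which uses "\" as
# the directory separator, into a UNIX-style name. With a second argument of
# "lower", ASCII letters are also lowercased.
unix_name() {
    # '\\' reaches tr as an escaped backslash, i.e. a single "\" character
    name=$(printf '%s\n' "$1" | tr '\\' '/')
    if [ "$2" = "lower" ]; then
        # LC_ALL=C keeps the A-Z range strictly ASCII
        name=$(printf '%s\n' "$name" | LC_ALL=C tr 'A-Z' 'a-z')
    fi
    printf '%s\n' "$name"
}
```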

Frequently Asked Questions

Q: I can't extract this DATA1.CAB file...
A: There are two different "cabinet" file formats in popular use. Some are Microsoft cabinets, which can be unpacked with cabextract. Others are InstallShield cabinets, which can be unpacked with unshield. You can tell the two formats apart as follows:

InstallShield cabinets are normally called data1.cab and have a matching data1.hdr file.
InstallShield cabinets begin with the magic ID "ISc(". Microsoft cabinets begin with the magic ID "MSCF".
Unpacking an InstallShield cabinet with cabextract 0.6 gives the error message "not a Microsoft cabinet file". cabextract 1.0 gives the warning "WARNING; found InstallShield header. This is probably an InstallShield file. Use UNSHIELD (http://synce.sf.net) to unpack it." and the error message "no valid cabinets found".
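
The magic-ID check described above is easy to script. This minimal sketch reads the first four bytes with the standard `dd` utility and compares them against "MSCF" and "ISc("; the function name `cab_kind` is hypothetical. A self-extracting executable with an embedded cabinet will report "unknown", because its signature is not at the start of the file.

```sh
#!/bin/sh
# cab_kind FILE
# Sketch: classify FILE by its first four bytes, using the magic IDs above.
# Prints "microsoft", "installshield" or "unknown".
cab_kind() {
    # Read exactly the first 4 bytes; suppress dd's transfer statistics
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null)
    case $magic in
        MSCF)   echo microsoft ;;      # unpack with cabextract
        'ISc(') echo installshield ;;  # unpack with unshield
        *)      echo unknown ;;
    esac
}
```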

Q: Can I license cabextract for use in my non-GPL software?
A: Yes, you can. Contact me for further details. However, you may prefer to use libmspack, as it has been explicitly designed for reuse.

Q: Where can I get software to create Microsoft cabinet files?
A: There are several options:

You can use Microsoft's own CABARC.EXE. If this is not already provided as part of your Microsoft Windows™ operating system, you can get it by downloading the Microsoft Cabinet SDK.
You can use Rien Croonenborghs' LCAB.
Future releases of libmspack will include a cabinet file creator. It is currently being designed.

Q: Is cabextract a circumvention device, as defined in the DMCA?
A: Perhaps it is, according to Microsoft. The linked article shows Microsoft citing WinRAR and WinZip as circumvention devices, as they allow people to extract this executable cabinet file, which contains a document describing how Microsoft have embraced and extended the Kerberos protocol to prevent interoperability with Unix-based Kerberos servers. The executable gives you a click license to agree to, which includes a Non Disclosure Agreement. Obviously, if you don't run the executable, you will never see the NDA, and will not be bound by it. The irony is that WinRAR and WinZip rely on Microsoft's own CABINET.DLL to do the extraction, so really it is Microsoft's own software acting against them!

Q: Do you hate Microsoft?
A: No. There is nothing wrong with being a big software corporation. What I dislike is Microsoft's illegal abuse of its monopoly to harass its competitors. I would like to see Microsoft pressured into continuous technical innovation in a competitive marketplace, rather than engaging in product dumping to financially cripple its competitors, locking in its users with incompletely documented and constantly changing file formats, and then letting software stagnate once it achieves dominance.

Q: Is reverse engineering illegal?
A: Reverse engineering for interoperability is protected under international copyright law. You do not have the explicit right to copy software, which is why you need to agree to a license that gives you those rights, but you do have the explicit right to reverse engineer software for interoperability purposes. Your right to reverse engineer could only be stopped if you signed a contract agreeing not to reverse engineer the software. In the UK, this must be a fair contract, which is a contract that both parties have the opportunity to amend and agree to. A shrinkwrap license or EULA is not a fair contract.

CAB History

In 1977, Abraham Lempel and Jacob Ziv devised a new compression method, LZ77, and published a paper describing it. In 1982, James Storer and Thomas Szymanski released their LZSS variant. In the early 1980s, Microsoft required some form of data compression for their installation media to cut down on the number of disks needed to install MS-DOS and Microsoft Windows, so they took Haruhiko Okumura's implementation of LZSS. Their compressed files had a SZDD signature.

In 1989, Phil Katz put the deflate method in the public domain. Microsoft started using the algorithm to compress their installation media. The signature changed to KWAJ.

In the early 1990s, various people invented new forms of disk formatting for the IBM PC, increasing the amount of space on a disk despite the PC's inflexible floppy disk controller. Once again, Microsoft products were getting bigger and bigger, so Microsoft took one of these disk formats and called it DMF, or Windows formatted disks.

For most of the early 1990s, Jonathan Forbes had been writing fast versions of LZH archivers on the Amiga. In 1995, he and Tomi Poutanen devised an LZH adaptation known as LZX. Its main benefits over deflate were a compact way of encoding large match offsets, and a larger LZ sliding window. Furthermore, their Amiga implementation included file merging (known as solid archiving in RAR), where file data was grouped into large blocks, instead of files being individually compressed. This file merging technique also appeared in other new archivers around that time. By coincidence, Microsoft devised a new installation media format which used file merging! This time, they were cabinet files or CABs. They included two compression methods: MSZIP (aka deflate) and Quantum, a large-window LZ compressor using arithmetic coding, licensed from its author David Stafford.

In 1997, Jonathan Forbes went to work for Microsoft. Soon enough, cabinet files started supporting a modified form of LZX. Finally, Microsoft published an official specification for cabinet files, MSZIP and LZX. However, it did not describe Quantum, and the LZX specification contained so many errors that it was not possible to create a working compressor or decompressor from it.

In 2000, Stuart Caie embarked on writing a CAB unpacker for Dirk Stöcker's XAD system. He discovered all of the above, including the LZX specification errors, but eventually came up with a working LZX extractor. Being a generous devil, and wanting help with the remaining Quantum extractor, he converted his XAD client into a command-line CAB decompressor. In 2002, Matthew Russotto kindly researched and wrote the Quantum extractor.

In 2003, Stuart Caie launched libmspack, a new library designed to support all major Microsoft compression formats.

Credits

cabextract is written primarily by Stuart Caie. The Quantum decompressor was researched and implemented by Matthew Russotto. The original adaptation of InfoZip's inflate to MS-ZIP was done by Dirk Stöcker, who also provided lots of support, testing, and cabinet files. The fast Huffman table generator is taken from unlzx by Dave Tritscher.

Thanks to Eric Sharkey for Debian packaging and the original manual page. Thanks to Ben Collver for NetBSD packaging, and some useful patches. Thanks to Maxim Sovolev for the FreeBSD packaging. Thanks to Siarzhuk Zharski for BeOS packaging. Thanks to Pawel Chwalowski for the Amiga packaging. Thanks to Stefan Dirsch for using cabextract in SuSE. Thanks to Katsumi Saito for the Japanese manual page. Thanks to Soos Peter for the RPM spec file. Thanks to Jae Jung for LZX decompressor fixes. Thanks to Markus Nullmeier for native IRIX compiler support. Thanks to Jonathan Forbes for creating LZX and other Amiga compression tools. Finally, thanks to the many other people who have sent in email, suggestions and code.

カイザー