“If E-mail is your life and you use Entourage then the Database file is the single most important file on your computer.”
One month from this week Microsoft MVPs will be attending the MVP Summit held at Microsoft every 12-18 months. One month from today Mac MVPs will get their opportunity to give feedback directly to the folks who make Office for Mac. This is face-to-face feedback and discussion covering everything from bugs to feature requests.
I’ll be presenting topics for Entourage, including topics related to Exchange. But one of my personal non-Exchange topics will be to ditch the Database. The Database is unwieldy to back up, it’s no longer needed for fast searches and it prevents recovery of any data outside of using the proprietary Microsoft Database Utility.
If you care about protecting your Entourage data then you should care about the Database.
What is the Database?
That’s Database with a capital D.
The Database is the single file that Entourage uses to store your mail, attachments, calendar, contacts, tasks, notes, projects, categories, links and other data that you see in Entourage. If E-mail is your life and you use Entourage then the Database file is the single most important file on your computer.
You can find the Database in its default location, ~/Documents/Microsoft User Data/Office 200x Identities/Main Identity/, where “~” is your home folder. If you don’t see the Main Identity folder then you’ve probably renamed it to something more familiar to you. Inside the Main Identity folder is the Database file.
This file can be huge for some folks. I’ve seen reports of up to 25GB in size, but for the typical Entourage user it is often 500MB or more. Select your Database file and choose the Get Info command from the File menu to see the size of your Database.
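If you'd rather check the size from a script than from Get Info, something like the following minimal Python sketch would do it. The function name `database_size` is made up for illustration, and the path in the usage comment is a placeholder: substitute the location of your own identity folder. Sizes are reported in binary units (1 KB = 1024 bytes), which is an assumption on my part and may not match the figure the Finder shows.

```python
import os

def database_size(path):
    """Return the size of the file at `path` as (bytes, human-readable string)."""
    size = os.path.getsize(os.path.expanduser(path))
    value = float(size)
    # Step up through the units until the number is small enough to read.
    for unit in ("bytes", "KB", "MB", "GB"):
        if value < 1024 or unit == "GB":
            return size, f"{value:.1f} {unit}"
        value /= 1024

# Hypothetical usage; the path below is a placeholder, not a real location:
# size, label = database_size("~/path/to/Main Identity/Database")
# print(label)
```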
Why are my E-mail and everything else stored in one file?
The simple answer is that it’s a database (lowercase d). Databases are optimized for quick access and fast searching.
Think of a database like a book. A book can have a table of contents so that you can quickly skip to the part of the book you want to read. A book can also have an index so that you can look up specific words that may not have a topic in the table of contents.
Organizing data on your computer is done the same way. The Finder provides you a table of contents in the form of folders such as the Applications folder and the Users folder as well as files such as the Read Me files that come with most applications. It also offers an index in the form of Spotlight so that you can find specific data anywhere on your computer by searching for a key word.
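To make the index idea concrete, here is a minimal Python sketch of the back-of-the-book kind of index. It is only an illustration of the general technique: I am not claiming this is how the Entourage Database or Spotlight builds its index, and the sample messages are invented.

```python
from collections import defaultdict

def build_index(messages):
    """Map each lowercase word to the set of message ids that contain it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for word in text.lower().split():
            # Strip trailing punctuation so "Friday?" and "Friday" match.
            index[word.strip(".,!?;:")].add(msg_id)
    return index

# Invented sample data for illustration only.
messages = {
    1: "Lunch on Friday?",
    2: "Budget report for Friday",
    3: "Your invoice is attached.",
}
index = build_index(messages)
print(sorted(index["friday"]))  # [1, 2]
```

Looking up a word is now a single step into the index instead of a read through every message, which is the whole point of having one.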
Microsoft created the Database back when hard drives were increasing in size, the amount of data that people were accumulating was growing exponentially and Spotlight didn’t yet exist. It was a solution to a problem.
So what’s wrong with still using the Database?
It doesn’t scale well.
Huh?
That means it doesn’t necessarily work well at all sizes. This is one of my favorite examples:
Consider the sizes of a flea, an elephant and a planet. A flea can jump 600 times its own height. But if you enlarge it to the size of an elephant then it will be crushed under its own weight.
An elephant is much sturdier and can carry its weight on four massive legs. But if you enlarge it to the size of a planet then it will collapse into a sphere.
A planet can orbit a star but when reduced to the size of a flea it can create a black hole and absorb the star.
Whereas the Entourage Database was once like a flea, it’s now growing to elephantine sizes. It’s not scaling well and this is a risk to your Entourage data.
What can go wrong?
While the Entourage Database is a robust database, it can become corrupted. Microsoft includes the Microsoft Database Utility to verify and repair problems but it’s not perfect and never will be. It cannot fix problems caused by the Mac OS, hardware or other applications.
Even if the Microsoft Database Utility could fix all the problems of the Database, the time to fix it is directly proportional to the size of the Database. That means the larger your Database, the longer it takes to fix the problems. For many folks this could mean hours. And if you’re running low on hard disk space then you may not be able to repair the Database at all.
The only way to ensure your data is safe is to maintain backups—copies of your data before it was corrupted. Backups are an afterthought to many computer users because they are themselves time-consuming, require additional software, additional hardware and require management. The advantage of backups is never realized until they are actually needed, which is why many folks live on the edge and never back up regularly.
Time Machine is a huge boon to Mac OS X users because it makes backups simple, but it cannot back up the Entourage Database properly. Time Machine and other backup applications must copy each file to a different location. To do this, the file must be closed. However, the Entourage Database is always open because of the Microsoft Database Daemon.
A daemon is a process on your computer that runs in the background. The Microsoft Database Daemon is constantly using the Entourage Database so that it can remind you of events, create Spotlight caches for searching and monitor the health of the Database itself. Even if Entourage is not running, the daemon is still using it.
If Time Machine or any other backup utility were to copy the Database file while it’s in use then the backup would be corrupted (although your original is still just fine).
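The risk of copying a file that is still in use can be sketched in a few lines of Python. This is not how Time Machine or any particular backup utility works. It is a hypothetical helper, `copy_if_stable`, that copies a file and then checks whether the original's size or modification time changed while the copy was running. It assumes those two values are good enough signals, which may not hold for every application.

```python
import os
import shutil

def copy_if_stable(src, dst):
    """Copy src to dst; return True only if src looked unchanged during the copy."""
    before = os.stat(src)
    shutil.copy2(src, dst)
    after = os.stat(src)
    unchanged = (before.st_size == after.st_size
                 and before.st_mtime_ns == after.st_mtime_ns)
    if not unchanged:
        # The copy may mix old and new contents, so discard it.
        os.remove(dst)
    return unchanged
```

A file that is constantly being written to, as the Database is while the daemon runs, would rarely pass a check like this. That is why the file needs to be closed first.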
What else is possible?
This topic was discussed in the TidBITS Talk mailing list in early January and a lot of pros and cons were raised about whether or not to use a database.
One list member noted that Entourage 2008 contains an important new AppleScript addition to export archives from Entourage. Exporting to a smaller file and then deleting items from the database would be more manageable. However, the Entourage dictionary does not include a Compact command to reduce the size of the Database file. It also requires some knowledge of how to either use AppleScript or implement someone else’s script.
Another idea discussed was to completely eliminate the Database and store all messages as single files like Apple’s Mail application. Mail stores its messages in .mbox files in ~/Library/Application Support/Mail/account name/.
These .mbox files are really folders that can be opened by simply deleting the file extension and double-clicking the folder. Individual messages are stored in a Messages sub-folder. These can be backed up or retrieved one message at a time instead of having to deal with an entire database. While this is easy to implement and improves the speed and reliability of backups, searching can get slower when iterating through more individual files. Additionally, the meta data about your messages (the categories and links) is not preserved unless you store that information in a different file.
Maybe Entourage could store its data in multiple databases, each with its own index. This would reduce the risk of losing all data if the Database gets corrupt, decrease backup/restore times and decrease the amount of time to repair mail stores. While this looks promising, it still means a proprietary set of database files that only Entourage could use.
What’s the best solution?
Ditch the Database!
Again, that’s Database with a capital D.
I’m not a software engineer but the most logical solution I’ve seen so far is to ditch the Database and instead create a hybrid database/file solution. The data (the actual mail messages, contacts, calendar events, etc.) can be stored as individual items with the meta data (categories, links, etc.) stored in a smaller database file. The database can keep track of the location of the data without actually storing the data within itself.
Files can be maintained individually.
- One bad sector on a disk would not be allowed to compromise more than one file (one mail message).
- A full 2GB database file would not need a backup just to preserve a new 4k mail message or the deletion of a mail message or any item.
- The time to incrementally back up and restore would be proportional to the amount of data that has actually changed.
Meta data can be preserved in a smaller, external database.
- Categories, tags, links and other data that I myself associate with my data can be preserved.
- If any meta data is corrupted then my data (mail messages) are not adversely affected.
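As I said, I'm not a software engineer, so take the following as a rough Python sketch of the hybrid idea rather than a proposal for how Microsoft should build it. Every name here (`HybridStore`, the `messages` folder, `metadata.sqlite`, the `.eml` extension) is hypothetical. Each message is written to its own file, and a small SQLite database records only where the file lives and which category it belongs to.

```python
import sqlite3
from pathlib import Path

class HybridStore:
    """Hypothetical store: one file per message, metadata in a small SQLite file."""

    def __init__(self, root):
        self.root = Path(root)
        (self.root / "messages").mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(str(self.root / "metadata.sqlite"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "id INTEGER PRIMARY KEY, path TEXT NOT NULL, category TEXT)"
        )

    def add(self, raw_message, category=None):
        """Write the message to its own file and record where it lives."""
        cur = self.db.execute(
            "INSERT INTO items (path, category) VALUES ('', ?)", (category,)
        )
        item_id = cur.lastrowid
        rel_path = f"messages/{item_id}.eml"
        (self.root / rel_path).write_bytes(raw_message)
        self.db.execute(
            "UPDATE items SET path = ? WHERE id = ?", (rel_path, item_id)
        )
        self.db.commit()
        return item_id

    def read(self, item_id):
        """Look up the file location in the database, then read the file."""
        row = self.db.execute(
            "SELECT path FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        return (self.root / row[0]).read_bytes()

    def in_category(self, category):
        """Return the ids of items tagged with `category`."""
        rows = self.db.execute(
            "SELECT id FROM items WHERE category = ? ORDER BY id", (category,)
        )
        return [r[0] for r in rows]
```

If the metadata file were lost in a layout like this, the messages themselves would still be sitting on disk as ordinary files.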
Security of my data must be top priority.
- The less that changes, the less opportunity there is for corruption.
- Less data that has to be copied for backups means faster backups and that means my data is secured in minutes and not potentially hours.
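Here is a minimal Python sketch of why individual files make incremental backups cheap. The function `incremental_backup` is hypothetical and far simpler than a real backup tool: it treats a file as changed if its size differs from the backed-up copy or its modification time is newer, and it never removes files that were deleted from the source.

```python
import shutil
from pathlib import Path

def incremental_backup(source_dir, backup_dir):
    """Copy only files that are new or changed; return the relative paths copied."""
    source, backup = Path(source_dir), Path(backup_dir)
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        if dst.exists():
            s, d = src.stat(), dst.stat()
            if s.st_size == d.st_size and s.st_mtime_ns <= d.st_mtime_ns:
                continue  # looks unchanged since the last backup
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 preserves the modification time
        copied.append(str(rel))
    return copied
```

With one file per message, a second run after receiving a single new message copies that one small file. With a single monolithic Database, the same check would flag the whole multi-gigabyte file every time.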
Hard drives are getting bigger and people are storing more data. Simple backups are still the most economical means of protecting that data for most users. Microsoft must change its method of storing your Entourage data to better scale with the increasing amount of data you need to maintain. If they don’t change how they store your data then you run the risk of losing all of it, either because backups are too unwieldy or because you cannot back up your data in a timely manner.
Do you find backups for your Entourage Database complicated or do you avoid them? Let Microsoft know your concerns about the security of your data by sending them feedback through their website http://www.microsoft.com/mac/suggestions.mspx?product=entourage.
For additional information about Time Machine and using it to back up your Entourage Database read Diane’s post about Entourage and Time Machine as well as her post for an Alternative method to use Entourage and Time Machine.

Comments (20)
Posted on March 16, 2008 19:17
blm:Hear hear! I've always disliked the monolithic database, for all the reasons listed. And in my experience, the database utility is basically worthless. The only thing it's ever recovered for me is the account settings. I only use Entourage with Exchange, so I could resync all the data, but that takes hours, and none of my other settings, like the columns visible or the sort order, were preserved. I hope someone at Microsoft reads this and takes it to heart.
Posted by blm | March 16, 2008 7:17 PM
Posted on March 17, 2008 09:22
Jochem:About the .mbox format: if you look at the package contents of The Database (the .rge), you'll find that email is already stored in a .mbox file per folder.
But I agree to every point you made.
Posted by Jochem | March 17, 2008 9:22 AM
Posted on March 17, 2008 11:35
William Smith:Hi Jochem!
You raise a good point about .rge files being .mbox files. This is a great way to store mail content because it's an open format.
But the Database itself is not a .rge file. Converting it to a format that uses .mbox would be an acceptable solution.
Posted by William Smith | March 17, 2008 11:35 AM
Posted on March 17, 2008 11:37
T-Enterprise:Agreed - excellent article. Saved and forwarded.
Posted by T-Enterprise | March 17, 2008 11:37 AM
Posted on March 18, 2008 00:33
Alberto:Easy solution: ditch Entourage altogether and use Mail & iCal & Address Book ;-)
Posted by Alberto | March 18, 2008 12:33 AM
Posted on March 26, 2008 19:33
Colette:Hi there,
By the looks of things, I could really use your help. Last Thursday, only days after you posted this blog ironically, my Entourage program just... disappeared. I booted up to find everything; my received, sent, "folders on my computer" - which is where I stored EVERYTHING - gone. Opening the program all I got was a prompt asking if I'd like to make Entourage my default email program.
Admittedly, I'm technically illiterate - I've been searching desperately for a way to recover my files. Is there any way you might help steer me in the right direction? You can reach me at cgonzalez@ellex.com.
Please help.
Many thanks,
C
[This is usually error. See Lost My Identity ] Diane Ross
Posted by Colette | March 26, 2008 7:33 PM
Posted on April 11, 2008 03:34
Aaron Marks:I partly agree with the commenter that said to ditch Entourage all together and use Mail, iCal, and Address Book.
I think though that Entourage should truly only be used on a Mac with an Exchange account. I think that the only reason that Microsoft even includes POP/IMAP support in Entourage is just so that people feel like they are getting one more bonus out of their $400 investment into Office 2008.
The whole database backup thing is a moot point if you have an Exchange server, and other than the familiarity of Outlook, there should be no reason why someone should need to use Entourage over Apple's counterparts.
And btw, I'm aware that most of the people that use Entourage are in fact home users who use POP3. I just simply think that those people have made uninformed selections in their PIM software.
Posted by Aaron Marks | April 11, 2008 3:34 AM
Posted on April 11, 2008 05:43
William Smith:Hi Aaron!
Actually, Exchange connectivity was added last in the second version of Entourage, version X, with increased support starting with Entourage 2004.
Have a look at this:
http://blog.entourage.mvps.org/2007/05/why_did_microsoft_replace_outl.html
Posted by William Smith | April 11, 2008 5:43 AM
Posted on April 11, 2008 19:48
TechnoLawyer:If you use Entourage, you should use IMAP email (or Exchange) to minimize the risk of a corrupt database. Also, syncing with Address Book and iCal can also provide some redundancy.
Posted by TechnoLawyer | April 11, 2008 7:48 PM
Posted on April 30, 2008 20:28
typhoid:Here's a thought. If you want to improve the reliability and stability of Entourage databases when connecting to Exchange, how about NOT CACHING EVERYTHING INTO A HUGE DATABASE IN THE FIRST PLACE? And when an Entourage user accesses someone else's shared mailbox, why does Entourage feel the need to copy the other person's mailbox onto the user's local computer?
Outlook for Windows has the ability to work in online mode, directly off the server. Why can't Entourage work the same way? That would save space on the user's hard drive, eliminate long sync times, and ensure that the user's Exchange client always has the most up to date information.
Entourage needs the ability to work in online-only mode when connecting to Exchange.
Posted by typhoid | April 30, 2008 8:28 PM
Posted on May 19, 2008 13:28
Nik Nesbitt:I can't start Entourage and i think my Database must be buggered somehow. Whenever i try to open it i get the message, "Entourage cannot access your data. To attempt to fix the problem, rebuild your database." When i try to rebuild the database, it takes hours to run through the routine and then it crashes 75% of the way through Step 4 (of 5) saying it "found a error." Then i go back to Step 1 and can't open it. My database is 7.8GB. I am 10,000 miles from home in Kenya and am stuck... Help someone..??
Posted by Nik Nesbitt | May 19, 2008 1:28 PM
Posted on May 20, 2008 15:43
Diane Ross:This sounds like you recently upgraded to Entourage 2008 and imported your database. It's possible if you had more space on your drive the Identity would rebuild, but it could just be problems with the imported data. Unless you can get Entourage to open there is no way to salvage your data. Do you have a backup or do you have your Entourage 2004 Identity still available?
If you can get an Identity that will open you can manually move over your data. See this page for how to manually move over your data.
Manually move your data when Import Identity or Rebuild Fails
If you continue to have problems post on the Entourage newsgroup for the quickest help.
How to subscribe to the MSFT newsgroups
Posted by Diane Ross | May 20, 2008 3:43 PM
Posted on May 22, 2008 07:51
Jack Kasarjian:I just discovered this the hard way. When trying to fit my home folder onto my new Mac book Air, I moved the database out of the home folder. When I put it back it would not open (13GB) but told me to rebuild it. After 6 hours of rebuilding, I found no data after April 1- about 6 weeks worth. I had Time Machine backups and though they were all different sizes from different dates they all were corrupted and needed to be rebuilt. After spending 3 days rebuilding different ones, I found they all took me back to the same date. Does anyone know any way to recover this data? I am desperate. This is my entire business.
Posted by Jack Kasarjian | May 22, 2008 7:51 AM
Posted on May 22, 2008 11:17
William M. Smith:This is the main reason why the Database idea in Entourage needs revisiting. You can't just simply rely on your Time Machine backups because the Database must be closed (Entourage must be quit) and the Database Daemon needs to be stopped before you can get a reliable backup.
If your backups are corrupt and the Database Daemon can not fix them then you might be able to take the sledgehammer approach and open the files with Microsoft Word. It won't be pretty and you'll have to search for what you're needing but you can at least get at some irreplaceable data. Not sure how well this will work with a 13GB database.
Be sure to read Diane's blog post about Entourage and Time Machine so that you have an idea for how to do this safely. I have a different method where I used SuperDuper! from ShirtPocket Software to back up my Database on a daily basis to a local disk image file. That disk image file then gets copied to my Time Machine backups.
Good luck!
Posted by William M. Smith | May 22, 2008 11:17 AM
Posted on May 22, 2008 11:41
TechnoLawyer:I'm sorry to hear about Jack's predicament. Once again, now that Google Apps and probably others offer free IMAP accounts at your own domain name I encourage all small businesses to use IMAP email.
Posted by TechnoLawyer | May 22, 2008 11:41 AM
Posted on May 22, 2008 20:49
Michael Fox:Stupid question!
What exactly is IMAP email?
Can somebody explain in simple terms.
I am a home user who runs his business from Entourage and have to send numerous (pre-formed under signature) estimates to clients.
At this moment in time I am unable to use Office 2008 following the 12.0.1 update and anything that would prevent this in the future would be most welcome.
Thanks
Posted by Michael Fox | May 22, 2008 8:49 PM
Posted on May 23, 2008 05:54
William M. Smith:Entourage supports three E-mail protocols: POP, IMAP and Exchange.
POP, by default, will download messages into Entourage (or any other mail program) and then delete them from your mail server. This is done when you have limited server space. It prevents mail from being returned to senders if your account mailbox is full.
IMAP is a synchronization protocol. Rather than delete messages from the server it will leave them there until you delete them from Entourage. Furthermore, if you create mail folders in Entourage then they will appear on the server. If you move mail into those folders then they will be moved on the server. The beauty of IMAP is that you can go to another computer and access the same account and you will see exactly what you have in your IMAP account on your first computer. And as TechnoLawyer pointed out already it's also a backup of your mail that's stored on the server outside of your home or office.
Exchange is similar to IMAP when dealing with mail—it synchronizes mail. Exchange also supports synchronizing your calendar and Contacts as well. This is typically found in corporate or educational environments, however, anyone can have an Exchange account through one of hundreds of hosted Exchange services on the Internet. These are not free but if you need to, for example, use a shared computer in a coffee shop to check E-mail and your schedule then this is possible.
Posted by William M. Smith | May 23, 2008 5:54 AM
Posted on May 23, 2008 13:47
Diane Ross:Simply put: IMAP works by keeping mail on the server. POP works by downloading your mail to your computer. See this page for a comparison of the two protocols. POP/IMAP Accounts
In the future if you are unsure about a term, see the Glossary Page for terms used with Entourage and the Mac OS.
To troubleshoot your problem, it's best to ask on the Entourage newsgroup where it's easier to discuss your problem and more people can contribute to a solution. You have not given enough info here to know what to suggest.
How to subscribe to the MSFT newsgroups
Be sure to give details. How to Report a Problem
See you on the newsgroup!
Posted by Diane Ross | May 23, 2008 1:47 PM
Posted on May 24, 2008 05:09
Jack Kasarjian:Thanks, Diane. My problem is not being able to access Entourage so I'm using Mac Mail. I can't use the newsgroup yet because I can't get Entourage to open. Each time I rebuilt the database in the past I only got my e-mails from before March 31. I trashed those databases because they were not complete. Now I rebuild and when I re-open, Entourage asks me to rebuild again so I can't even use the incomplete Entourage. I have the updated version of Mac OS 10.5.2 and Office 2008 with the latest update installed. I think from reading the posts that Time Machine did not back up the database properly so my database may not be recoverable at all. I'm still holding out hope.
Posted by Jack Kasarjian | May 24, 2008 5:09 AM
Posted on May 24, 2008 06:54
William M. Smith:Hi Jack!
You don't need Entourage to view the newsgroups for help. You can use Google Groups or Microsoft's Office for Mac interface. These and other websites all tie into the same newsgroups so you'll find the same messages and answers.
As much as possible, we'd prefer not to turn the comments section of the blog into a forum for asking and answering help questions. Thanks for your understanding. :-)
Posted by William M. Smith | May 24, 2008 6:54 AM