Open Source Product Management: How do features get into Zimbra?

Posted in Community, Open Source by Ross Dargahi on the February 28th, 2007

One of the great advantages of being an Open Source company is having a community and customer base that can help guide and shape the evolution of the product. Our community and customers use Zimbra in a wide variety of deployment scenarios, from single individuals, to small and medium businesses, to hosted services and large Internet service providers. Such diversity naturally breeds some excellent ideas around product features and improvements. In many ways we think of our community and customers as product managers whose ideas and experiences can really help improve the Zimbra Collaboration Suite (ZCS). To fully leverage such a wonderful resource, it is important to provide the right tools and processes so that people can easily communicate their ideas and suggestions to us. It is equally important that folks can track the progress of their suggestions.

We have a product management process and website that we have been using internally for quite some time; today, we are making the process and a version of the website public. Let’s first understand how enhancements make it into the ZCS.

As far as new requests for enhancement (RFEs) are concerned, the center of our world is our issue tracking system (currently a modified version of Bugzilla). This system is open to the world and folks are free to add, edit, and browse issues at will. All RFEs are filed in it, and just about all of them (with the exception of those containing customer-specific or sensitive information) are visible to the community. Typically, an RFE consists of a description, input from the community, customers, and Zimbrians, votes, and code “putback” comments from the engineers working on the feature. We also have an internal website, built around the data in the issue tracking system, that we use to make product decisions - much more on that a little later.

How we decide what ends up in a release:

There are several criteria that we look at when deciding on which enhancements to include in a release. These criteria include:

  • The number of votes an enhancement has received from the community;
  • The number of support cases (i.e. requests from paying customers);
  • The number of potential sales blocked by the RFE;
  • The innovation/strategic impact the RFE may have.

As part of release planning, we gather a list of the RFEs that satisfy the most of the above criteria and then make a quick level-of-effort estimate (a.k.a. a guess) of how long it would take to get each feature implemented. Based on this we adjust the feature set for the release. At the end of the exercise, we end up with a set of features falling into three buckets:

  • P1 - RFEs that are hard requirements, i.e. those for which we are willing to slip the release date in order to get them completed;
  • P2 - RFEs that are lower priority enhancements that we will attempt to get completed;
  • P3 - RFEs that are the lowest priority.

Generally, P2 and P3 enhancements are either low-hanging fruit or features that may get completed for the release but may also slip out of it. I think an important point to keep in mind is that our planning process is both iterative and dynamic. That is, we don’t pretend that we can lock the feature set for a release; we realize that there are many factors that can change, sometimes frequently, causing us to regularly re-evaluate what is in the release. Such factors include technical discovery that affects the time cost of a feature and changing market conditions (i.e. input from the community and from existing and new customers). The bottom line is that we try our best to hold firm to what we consider the “defining” features for a release, while at the same time realizing that as a company we have to be nimble and flexible.
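To make the selection step a little more concrete, here is a minimal sketch in Python of how the four criteria and the effort guess could be combined. It is an illustration only: the post does not give a formula, so the `Rfe` fields, the thresholds, and the `criteria_met` and `plan_release` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rfe:
    title: str
    votes: int            # community votes
    support_cases: int    # requests from paying customers
    blocked_sales: int    # potential sales blocked by the RFE
    strategic: bool       # innovation/strategic impact
    effort_weeks: float   # quick level-of-effort guess


def criteria_met(rfe, min_votes=10, min_cases=3):
    """Count how many of the four criteria an RFE satisfies.

    The thresholds are invented for this sketch.
    """
    return sum([
        rfe.votes >= min_votes,
        rfe.support_cases >= min_cases,
        rfe.blocked_sales > 0,
        rfe.strategic,
    ])


def plan_release(rfes, capacity_weeks):
    """Pick RFEs in order of criteria met (ties broken by votes)
    for as long as the estimated effort still fits the capacity."""
    ranked = sorted(rfes, key=lambda r: (criteria_met(r), r.votes), reverse=True)
    planned, used = [], 0.0
    for rfe in ranked:
        if used + rfe.effort_weeks <= capacity_weeks:
            planned.append(rfe)
            used += rfe.effort_weeks
    return planned
```

In practice the buckets are decided by people looking at this data rather than by a formula; the sketch only shows the "rank by criteria, then trim by effort" shape of the exercise.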

One of the tools that is essential to our product planning and management is an internal website that we call PMWeb. PMWeb provides key information about past, current, and future releases, as well as important views into the issue tracking system. Almost all of PMWeb is driven by the data in the issue tracking system. The home page of the site consists of a dashboard that shows:

  • The list of top 20 most voted upon RFEs
  • The list of top 20 RFEs with the most support cases
  • The list of RFEs that are blocking sales
  • The list of top 20 most voted on bugs
  • The list of top 20 bugs with the most support cases
  • The list of bugs that are blocking sales
  • A list of bugs with critical and above severity
  • A summary of past, current and future releases
  • A summary of the number of bugs across priorities and severities
  • Key statistical graphs such as find-to-fix ratio and find-to-fix rates

Every week we review the above data and make sure that key RFEs and bugs are assigned appropriately; we also review new inbound RFEs and execute an initial triage by slotting them in the current or one of the future releases. In addition, PMWeb provides detailed information on each product release and patches to released products such as:

  • The RFEs slated for the release and their status (i.e. whether the RFE is uncommitted, targeted, completed, at risk, or at high risk)
  • The bugs slated for the release
  • The bugs fixed in the release
  • The dates for the release
  • Links to patches and future patches slated for the release (once the release has been… released)

We thought it would be useful for our community and customers to have access to this information to help them in their role as product managers, so we have made a public version of PMWeb available here.

There is obviously a lot more that goes into building a product release (and even patches to existing releases), enough to fill many written pages; however, this hopefully gives you some insight into how we do things at Zimbra.




4.5, admins, and backup/restore

Posted in PowerTips - Admins, Zimbra Web Client by Kevin Kluge on the February 21st, 2007

ZCS 4.5 helps make admins’ jobs easier — a lot easier in some cases. This post discusses advanced search in the admin console, backup and restore in the admin console, backup performance improvements, and good policies for creating a recoverable system.

Advanced Search for Users, Domains, Servers
We’ve added an advanced search capability to the admin console. It includes a search builder similar to what has been available in the end user client. We had heard from many customers that they wanted to create complex searches, such as “show me all domains with xyz in them” or “show me all users on server 3” or “show me all users with last name xyz on server 5”. You can construct all these searches and quite a few more now, and since the search uses LDAP indexes it’s fast. This feature is AJAX helping admins.
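For readers who have not written LDAP searches by hand, here is a rough sketch in Python of what searches like the ones quoted above look like as LDAP filter strings. This is not the admin console's code. The helper names are hypothetical, and the attribute names used in the example (`sn` for last name, `zimbraMailHost` for the server, `zimbraDomainName` for the domain) are assumptions about the directory schema that you should check against your own installation.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP filter value (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def equals(attr, value):
    """Exact match on one attribute, e.g. (sn=xyz)."""
    return f"({attr}={escape_filter_value(value)})"


def contains(attr, fragment):
    """Substring match on one attribute, e.g. (zimbraDomainName=*xyz*)."""
    return f"({attr}=*{escape_filter_value(fragment)}*)"


def all_of(*filters):
    """AND several filters together; a single filter is returned unchanged."""
    return filters[0] if len(filters) == 1 else "(&" + "".join(filters) + ")"


# "show me all users with last name xyz on server 5" (attribute names assumed)
user_filter = all_of(equals("sn", "xyz"), equals("zimbraMailHost", "server5.example.com"))

# "show me all domains with xyz in them" (attribute name assumed)
domain_filter = contains("zimbraDomainName", "xyz")
```

Equality and substring matches like these are the kind of query that directory indexes are built to serve, which is why searches of this shape can come back quickly.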

Backup and Restore in the Admin Console
For the ZCS 4.5 Network Edition release we’ve also done a lot to help you succeed with backup and restore. First, we’ve extended the admin console’s ability to manage the backup and restore process. From the admin console you can now review all the backups, both fulls and incrementals, that exist on your system, and whether they ran successfully or not. You can also initiate an immediate backup. As in previous versions, you can restore a single account, either to itself (the same account name) or to a new account. You can also choose whether you want to restore to a particular full or to “now”, applying all the data that is available for that account.

For the next major release we’ll add point in time recovery in the admin console. This will enable you to restore any mailbox (or set of mailboxes) to any point in time for which you have backups. For example, you could restore Sally’s mailbox to Sally_restored using data from 3PM last Tuesday, when she knows she had the key message she needs. Note that point in time recovery is already available with the CLI (zmrestore).

Backup Performance Improvements
We’ve also made some big improvements to backup performance on the server in 4.5. The main source of the improvements is an increased ability to make multiple concurrent I/O requests of the system when copying data. After some experimentation we decided to copy with a pool of worker threads, each of which takes responsibility for the serial copy of a particular file. There are, of course, enough files that need to be touched that this provides for as much parallelism as the disk system can handle. If you have spare I/O bandwidth during 4.0’s backups, you will see a decrease in backup time. Our tests showed solid improvements. We’ve seen initial full backups on systems with 1000 users (3 U320 10K RPM disks) take 40% of the time they took with 4.0. Subsequent full backups are even faster, taking only 18% of the time required by 4.0 code. Your mileage may vary of course, but we think you’ll like it.
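As an illustration of the general technique (not the ZCS backup code itself), here is a minimal Python sketch of copying a directory tree with a pool of worker threads, where each task is the serial copy of one file. The `parallel_copy` name and the default of 8 workers are assumptions made for the example.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy(src_root, dst_root, workers=8):
    """Copy every file under src_root into dst_root using a thread pool.

    Each worker copies one file at a time from start to finish, so the
    parallelism comes from having many files in flight at once.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # serial copy of a single file
        return dst

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all copies to finish and re-raises any worker error
        return list(pool.map(copy_one, files))
```

The right pool size depends on how many concurrent requests your disks can absorb; beyond that point extra workers just queue up behind the hardware.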

Backup Management and Risks
Speaking of backups, you are running them, aren’t you? I’ve been surprised at the support cases we see where customers have either never set up backups or are backing up onto the same disk (even the same partition) as their data. Remember that there are bugs in every piece of software (filesystem, drivers, firmware, even ZCS), and you need to protect against them as well as against disk failure. Keeping your backup on the same partition as your data leaves you vulnerable, even if you have a RAID backing store.

To check what backup schedule you have, run the following (Note: all commands that follow should be run as the zimbra user):

zmschedulebackup -q

You’ll (hopefully) see something like:

Current Schedule:
f 0 1 * * 0
i 0 1 * * 1-6
d 1m 0 0 * * *

The results show when fulls (f), incrementals (i), and deletions (d) will run using standard crontab syntax. (In fact, this information is pulled from crontab for the zimbra user; cron invokes zmbackup for all these operations.) If you don’t get any output back, you don’t have backups running! The schedule above shows that fulls run on Sundays at 1 AM and incrementals run on all other days (Monday through Saturday) at 1 AM. It also says that every day at midnight any backups older than 1 month will be purged.
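If the crontab fields are hard to read at a glance, a small script can spell them out. The sketch below is Python and assumes the output format shown above (a letter, an optional retention age for `d`, then five cron fields). The helper names are hypothetical, and it only understands plain numbers, ranges, comma lists and `*` in the day-of-week field.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
KINDS = {"f": "full", "i": "incremental", "d": "delete"}


def parse_schedule_line(line):
    """Split one schedule line into its kind, retention age and cron fields."""
    tokens = line.split()
    minute, hour, day_of_month, month, day_of_week = tokens[-5:]
    return {
        "kind": KINDS[tokens[0]],
        # only deletion lines carry an age such as "1m" before the cron fields
        "keep": tokens[1] if tokens[0] == "d" else None,
        "minute": minute,
        "hour": hour,
        "day_of_month": day_of_month,
        "month": month,
        "day_of_week": day_of_week,
    }


def expand_days(day_of_week):
    """Turn a cron day-of-week field like '0', '1-6' or '*' into day names."""
    if day_of_week == "*":
        return list(DAYS)
    names = []
    for part in day_of_week.split(","):
        first, _, last = part.partition("-")
        for n in range(int(first), int(last or first) + 1):
            names.append(DAYS[n % 7])  # cron accepts both 0 and 7 for Sunday
    return names


for line in ["f 0 1 * * 0", "i 0 1 * * 1-6", "d 1m 0 0 * * *"]:
    entry = parse_schedule_line(line)
    print(entry["kind"], entry["keep"], expand_days(entry["day_of_week"]))
```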

If you don’t have backups configured, or don’t understand what you do have, there is a simple fix:

zmschedulebackup -D

This one command will set the backup schedule and deletion schedule to the default, which is what is shown above. That is all you have to do to make sure you have a reasonable backup schedule! The default schedule should be fine for smaller sites. It will put backups into /opt/zimbra/backup. It’s your job to make sure that is on a different disk and partition than your data.

Custom Schedules
You can use zmschedulebackup if you do need to set up a different schedule. Putting 3 cron lines into a single command line can be a little messy, so you may want to dump the schedule to a file, edit the file, and then copy-paste the desired schedule into the command.

zmschedulebackup -s > /tmp/sched.txt
vi /tmp/sched.txt

sched.txt will have something like:

f "0 1 * * 6" i "0 1 * * 0-5" d 1m "0 0 * * *"

The cron timing follows the f/i/d letter. Deletion (d) is a little different — it has the age of backups to preserve between the d and the cron-style time to run. Once in the editor make your desired change. For example, to keep only backups younger than 8 days, change the “1m” to “8d”. Then, copy-paste your file’s contents into a zmschedulebackup -R (R for replace existing schedule):

zmschedulebackup -R f "0 1 * * 6" i "0 1 * * 0-5" d 8d "0 0 * * *"

You can use the -A (append) option to add more timings to the backup schedule, building up as complex a ruleset as you need. There’s a wiki page describing zmschedulebackup if you’d like to learn more.
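If you would rather not edit the schedule by hand, the dump-edit-replace step can be scripted. The Python sketch below assumes the saved schedule looks like the sched.txt example above and uses only the -R option already described. The helper names are hypothetical, and the script just prints the command for you to review instead of running it.

```python
import shlex


def replace_retention(schedule_text, new_age):
    """Return the schedule tokens with the deletion age (the value after 'd') replaced."""
    tokens = shlex.split(schedule_text)   # keeps each quoted cron timing as one token
    tokens[tokens.index("d") + 1] = new_age
    return tokens


def replace_command(tokens):
    """Build a shell-safe 'zmschedulebackup -R ...' command line from the tokens."""
    return shlex.join(["zmschedulebackup", "-R", *tokens])


saved = 'f "0 1 * * 6" i "0 1 * * 0-5" d 1m "0 0 * * *"'
print(replace_command(replace_retention(saved, "8d")))
```

`shlex.join` quotes the cron timings with single quotes rather than double quotes, which means the same thing to the shell. It requires Python 3.8 or later.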

Disk Layout and Filesystems
As I mentioned earlier, we sometimes see systems with all the data, redologs, and backups on one disk or one partition. A full discussion of disk layout would take more than a blog post, but in the context of reliability here are a few quick tips:

  • Put your redologs (/opt/zimbra/redolog) on a different disk and partition than your live data (mail store, indexes, and MySQL data). If you don’t do this and you lose both the live data and the redolog, the latest time you can restore to is the time of your last backup (full or incremental) that wasn’t also lost. That means data loss. Now consider the cases where the redologs and the live data are separate. If you lose the live data, then by using backups and redologs you will be able to restore to the point in time of the crash. If you lose just the redologs, the server will halt immediately. In that case you will probably need to call support when the server comes back up to check for any MySQL/filesystem inconsistencies, but you should not lose any data.

  • You can put your redologs and backups on the same disk and partition, provided you also move the full backups to some other place. This is not the best for performance, but it’s OK for reliability.

  • We run ext3 in data=ordered mode for the vast majority of our testing. There’s a good article (based on the 2.4 kernel, and courtesy of IBM) on ext3 here. Going forward I would like to see us do more testing in this area, but for now this is a safe path with reasonable performance.
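The failure cases in the first tip can be summarized as a small decision function. This Python sketch only restates the reasoning above; the function name and its arguments are hypothetical, and it is not a tool that inspects a real system.

```python
from datetime import datetime


def latest_restore_point(lost_live_data: bool, lost_redologs: bool,
                         last_backup: datetime, crash_time: datetime):
    """Return (restore_point, note) for the failure cases described in the tip."""
    if lost_live_data and lost_redologs:
        # Nothing newer than the last surviving backup can be replayed.
        return last_backup, "changes made after the last surviving backup are lost"
    if lost_live_data:
        # Backups plus intact redologs bring the data forward to the crash.
        return crash_time, "restore from backup, then replay redologs up to the crash"
    if lost_redologs:
        # Live data is intact; the server halts and should be checked afterwards.
        return crash_time, "no data loss expected; check for MySQL/filesystem inconsistencies"
    return crash_time, "nothing lost"


backup = datetime(2007, 2, 20, 1, 0)
crash = datetime(2007, 2, 20, 15, 0)
print(latest_restore_point(True, True, backup, crash))
print(latest_restore_point(True, False, backup, crash))
```

The first call models redologs sharing a disk with the live data; the second shows what keeping them separate buys you.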

We’d like to make backup and restore as easy as possible for admins. If you have an idea for an improvement please either drop us a note or file an enhancement request.




Declared Javascript Functions - Odd Parsing Behavior in IE

Posted in Open Source, Zimbra Web Client by Conrad Daemon on the February 2nd, 2007

A function declared inside a conditional block whose condition evaluates to false still gets defined in IE, but not in Firefox. That is a problem if you are trying to use an ‘ifdef’ or ‘require_once’ mechanism for defining functions. However, if the function is defined via assignment, the reference is not created, which is what you’d expect.




