Enterprise data integration needs are growing exponentially over
time, as is the interest in open source technologies and the adoption
of open source solutions.
With this in mind, Talend conducted a survey to define the usage
landscape of open source data integration and to profile users of
this technology. The data used in this analysis was collected from
1013 survey participants. Responses came primarily from the U.S.
(56.5%), followed by Europe (35.2%), with the remaining responses
(8.3%) originating in the rest of the world.



3 October 2011


Table of Contents
Background
Diverse Data Integration Projects
Data Integration Needs and Tools
Open Source Data Integration vs. Proprietary Solutions
Enterprise Requirements
Community Support
Community Involvement
Conclusion

Background

As companies merge, acquire new applications, and build their IT platforms by incorporating disparate applications with legacy systems, information systems are becoming more and more heterogeneous. As a result, data integration tools are now indispensable if enterprise IT departments are to properly manage the flows of data across the information system.

In addition, alternative models of software deployment, such as Software as a Service (SaaS), and the need for interoperability with partners, customers, providers, etc., all have an important impact on data integration requirements.

Data Integration: The process of combining data residing at different sources and providing the user with a unified view of these data.

The global economy is imposing cost controls on IT managers, both in terms of staff and software, at a time when data integration represents an increasingly large percentage of the enterprise IT budget. Asked to do more with less, IT personnel would be better off spending cycles on tasks other than the time-consuming manual scripting needed to meet custom requirements. In fact, software resources with lower acquisition and operation costs would allow IT managers to more easily deploy enterprise-grade solutions.

In this context, open source solutions offer a very compelling argument. Open source tools can automate and maintain tasks formerly requiring manual scripts, and the existing skills of the IT implementation team easily transfer to an open source offering. In addition, IT departments don't have to justify significant up-front fees.

Diverse Data Integration Projects

Data integration is the collective term for technologies that include ETL (Extract-Transform-Load) for business intelligence and data warehousing, and operational data integration, that is, the flows of data across operational applications and systems. These needs can range from high-throughput batch transfers of data to near-real-time, trickle-feed data flows.

Project Type

Consistent with the global data integration market distribution, whether open source or proprietary, most of the survey participants (61.5%) use open source solutions for their ETL projects, in particular for BI, data warehousing, and analytics. This can be attributed to the fact that ETL is the most mature segment of the entire data integration market.

[Figure: Types of projects for which open source data integration is used. Categories: ETL; Data Loading; Operational Data Integration: Batch; Migration; Operational Data Integration: Real Time; Database Synchronization. Axis from 0% to 70%.]

Data Loading

Data loading (41.9%) and data migration (26.5%) are the second and fourth most popular types of project. Both of these are good candidates for open source solutions, as they are typically one-offs, with no ongoing purpose that would justify a long-term investment in an expensive proprietary tool.

Data Loading: The process of loading data in an application or database, for example prior to its deployment.

Data Migration: The process of transferring data between databases, applications or other systems, with the purpose of replacing a system with another.

Data synchronization (19.1%) is also a popular type of project conducted by open source data integration users.

Data Synchronization: The process of establishing data consistency on remote sources and continually harmonizing the data over time.

Batch vs. Real-Time

Operational data integration, whether batch or real-time, is also a good fit for open source solutions. As business tempos speed up, real-time and nearly real-time operational data integration projects will prevail over bulk transfer projects. As of the date of the survey, 40% of participants used open source tools to manage their batch operational data integration tasks, compared to only 22.9% for real-time projects, but the latter is a much faster growing segment.

ETL vs. Operational Data Integration

Taken together, batch and real-time operational data integration projects (62.9%) are slightly better represented than ETL usage (61.5%), even though the former market segment is less mature. And, if we also add in data synchronization, the operational project share reaches 82%. The reason for this over-representation is simply that open source tools are particularly appropriate for operational projects because they meet a number of data integration requirements, whereas proprietary tools traditionally focus on ETL. In addition, enterprises that want to diversify their data integration tools are often discouraged by the licensing costs of proprietary applications. Open source solutions offer a greater breadth of connectivity and more flexibility in terms of adoption, deployment, and maintenance.

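The combined shares quoted above can be reproduced with a short calculation. This is only a sketch: it assumes the report simply adds the per-category percentages given in the text, and the dictionary keys are illustrative names, not survey field names. Because the published percentages across all project types sum to well over 100%, respondents could evidently select several project types, so the totals are best read as summed shares rather than counts of distinct respondents.

```python
from decimal import Decimal

# Percentages as quoted in the survey text (share of respondents per project type).
# Key names are hypothetical labels chosen for this sketch.
shares = {
    "etl": Decimal("61.5"),
    "operational_batch": Decimal("40.0"),
    "operational_real_time": Decimal("22.9"),
    "data_synchronization": Decimal("19.1"),
}

# Batch plus real-time operational data integration.
operational = shares["operational_batch"] + shares["operational_real_time"]

# Operational share once data synchronization is added in.
operational_with_sync = operational + shares["data_synchronization"]

print(f"Operational (batch + real-time): {operational}%")
print(f"Operational incl. synchronization: {operational_with_sync}%")
print(f"Lead over ETL: {operational - shares['etl']} points")
```

Decimal arithmetic is used here only so that the one-decimal survey figures add up without binary floating-point rounding noise.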
Data Integration Needs and Tools

Although software companies are trying to provide unified integration solution packages, the data integration needs for most enterprises are so complex that they often need to multiply the number and nature of the integration software products they use.

[Figure: Data integration technologies used in conjunction with open source. Categories: Manual scripting, Database utilities, Commercial software. Axis from 0% to 60%.]

Survey participants reported using a combination of commercial applications, open source solutions, and database utilities to meet their data integration needs.

The statistics show that using open source and commercial solutions in combination is very common (31.2%), and that the two can, and do, coexist on the same platform. In fact, open source solutions are often complementary to an existing proprietary solution that, for whatever reason, cannot address a specific need. In some cases it may be that it's not worth the expense of investing in a proprietary solution extension.

The high incidence of database utilities shown in the survey results (53.9%) is as expected: these utilities are a no-cost solution and are usually included with the databases. Their usefulness, however, is limited to dedicated database usage.

Applications are often stacked as needs arise, which increases connectivity issues, whether enterprises want their CRM system to communicate with their ERP module, or to have their disparate databases exchange information with their home-grown platform. Faced with multiple connectivity issues, enterprises often have no option other than manual scripting to keep data flowing across their heterogeneous enterprise systems. This is why the survey results rank manual scripting as one of the technologies most frequently invoked (54.7%) by enterprises to meet their integration needs. Although this is much higher than commercial packaged technologies (31.2%), it is not surprising that manual scripting is the solution of choice, as it carries the lowest initial cost.

Although manual scripting is often intended to be a short-term fix for interchange issues, once in production it often becomes a permanent solution. And, in the end, this simple stop-gap can become an entire home-grown platform. The drawback of hand coding or home-grown platforms surfaces over time in the inevitable maintenance problems that increase the TCO. The advantage, however, is that it fits a particular need that none of the available commercial or open source solutions can meet.

Open Source Data Integration vs. Proprietary Solutions

In an ongoing effort to lower their data integration software TCO, many enterprises are now considering open source solutions, not just for one-time projects, but also for their ongoing mission-critical processes, to replace or complement their expensive CPU-dependent solutions.

[Figure: Decision criteria. Categories: Ease of use, Performance, Avoid lock-in, No licensing costs, Source code access. Ratings: Very important, Important, Neutral, Not important. Axis from 0% to 100%.]

Open source solutions are a real alternative to the proprietary world. Key players have made major strides toward improving the usability and friendliness of open source technologies, traditionally a weak spot for these applications.

In just a few short years, open source has evolved from something "geeky" into an enterprise-ready solution. Today, open source solutions are sufficiently feature-rich to meet complex user requirements. The survey results reflect these expectations.

Respondents rated ease of use (59%) and performance (53.9%) as the most important aspects of an open source data integration solution.

Surprisingly, licensing cost is not the gating criterion for enterprises turning to open source solutions. It actually comes fourth, after ease of use, performance, and no lock-in (42.5%), with only 42.1% of respondents considering it very important.

Access to the source code comes last on most priority lists when enterprises are choosing open source tools. It is a common misconception that control of the source code is important for users of open source software. Most users today understand that open source solutions are as mature as their proprietary counterparts and, therefore, don't feel the need to enhance the code themselves.

Today, open source solutions are advantageously replacing the source code escrow of proprietary software. However, few enterprises want to allocate in-house resources (or even have the expertise) to edit, enhance, and maintain their data integration application code.

Enterprise Requirements

An analysis of the survey data indicates that users expect the same performance and enterprise-scale features from open source solutions that they previously found only in proprietary products. In order of importance, these features include centralized scheduling and execution, a dashboard, a shared repository, and administration tools.

[Figure: Enterprise open source data integration requirements. Categories: Scheduling tool, Dashboard, Shared repository, Administration tool. Axis from 0% to 70%.]

First, 60.5% of respondents want a scheduling tool that lets them consolidate and centralize their technical processes. Second, 57.8% of users need a dashboard to centrally monitor processes as they execute. Because enterprise users often work in teams and need to share data on large-scale projects, 54.9% consider a shared repository essential. Finally, 38.4% of enterprise users want an administration tool to centrally manage users and projects.

However, not all companies have enterprise-scale requirements. Single users and SMBs might not need that sort of enterprise-grade feature. What emerges is that open source solutions address diverse needs for a variety of user profiles, whether large or small.

Community Support

As shown, enterprises want the same support with open source solutions that commercial applications provide. The major difference lies in the fact that a significant number of open source users (84.9%) would rather call on the community for help addressing issues than get support from a dedicated service. This lets them reduce the cost of support and decrease their data integration budget; the return they get from the community is comparable in quality to traditional support from a proprietary vendor.

[Figure: Community vs. commercial support expectations. Categories: Community support (forums, etc.), Email-based or Web-based support, Guaranteed response times, Phone support. Axis from 0% to 100%.]

Open source users value the forum and the other community tools at their disposal, as well as the peace of mind that comes from knowing that there is no pressure to upgrade or to buy new tools. The community also tends to be more responsive than traditional support services, and community tools are no-cost to the enterprise.

However, enterprise users working on mission-critical projects do need (and demand) vendor-provided, enterprise-grade technical support. This still represents a minority of the total number of users of open source data integration (20.9%), but it is a fast-growing proportion.

Community Involvement

Two-thirds of the respondents say that they are willing to actively participate in the community, and nearly half are ready to help beta-test open source products. Open source communities have a real, live QA lab of thousands at their disposal. Open source users appreciate getting support from the community and feel at ease in sharing their experiences and helping other users solve problems. Getting involved in the community ensures the sustainability of the
