TRANSWARP DATA HUB (TDH) is the most widely deployed one-stop Hadoop distribution in China and offers leading performance for big data analytics applications. It delivers 10X to 100X faster performance than open source Apache Hadoop 2. TDH suits enterprises with data at very different scales, from 10GB to 100PB. Its speed comes from 100% in-memory distributed computing, highly efficient indexing, execution optimization and highly fault-tolerant technology. Enterprise customers no longer need to worry about integrating different products. TDH grows with our enterprise customers as their data scales up: customers can expand capacity dynamically without shutting down their systems, and avoid the common problems caused by migrating from MPP or other mixed architectures.

Transwarp Data Hub Contains Four Main Products:

Transwarp Hadoop, Enterprise Edition
Transwarp Inceptor, Distributed In-memory Analysis Engine
Transwarp Hyperbase, Distributed Real-time Processing Engine
Transwarp Stream, Streaming Processing Engine

Core Competencies

1. Transwarp Inceptor

The Transwarp Inceptor in-memory analysis engine provides high-speed interactive SQL statistics and R data mining for big data.

Better Performance: 10 to 100 times faster than Hadoop and 2 to 10 times faster than MPP
Greater Support for SQL: Compatible with Oracle PL/SQL and HiveQL syntax
Stronger Analysis Ability: Supports the R language, provides more parallel algorithms
BI and Reporting Tools: Supports Tableau, SAP BO, Oracle OBIEE
Highly Expandable: Linear expansion, supports fast expansion from GB to PB
Outstanding Stability: Tested stable version, can run 7×24

2. Transwarp Hyperbase

Transwarp Hyperbase, a real-time online data processing engine based on Apache HBase, is one of the best choices for enterprises building highly concurrent businesses.

Support Different Data Structures: Supports structured, semi-structured and unstructured data
High Speed Processing Ability: Latency ranges from milliseconds to hundreds of milliseconds, with an extremely high concurrency level
OLAP and Batch Statistics: Supports high speed OLAP statistics and offline SQL batch processing
Efficient Graph Computation: Provides a graph API and unique, efficient graph algorithms

3. Transwarp Stream

Transwarp Stream is a real-time computing engine based on Spark Streaming that provides strong real-time computational ability.

Greater Expressive Ability: Supports the DAG computational model
Abundant Outputs: HBase, warning page, real-time display
Various Applications: Sensor network processing, service monitoring, anti-cheating

4. Transwarp Hadoop

Transwarp Hadoop Enterprise Edition has five layers. Different applications are supported by different combinations of components and efficient collaborations within those combinations.

Data Storage Layer: Based on HDFS 2.2, supports erasure coding
Resource Management Layer: Based on YARN, supports multiple computing frameworks running concurrently
Computing Layer: Adopts MapReduce 2 to handle offline computational tasks
Analysis and Mining Layer: Supports batch SQL statistics, R and Mahout
Data Integration Layer: Uses Sqoop and Flume to support data migration and collection
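
To illustrate why erasure coding matters in the Data Storage Layer, the following minimal Python sketch compares raw storage overhead for plain replication and for an erasure code. The (6 data, 3 parity) layout is only an assumed example; this document does not state which scheme TDH uses, and the function names are hypothetical.

```python
def replication_overhead(replicas: int) -> float:
    """Raw bytes stored per byte of user data when keeping full copies."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return float(replicas)


def erasure_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data for a (data, parity) erasure code."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need data_units >= 1 and parity_units >= 0")
    return (data_units + parity_units) / data_units


if __name__ == "__main__":
    # Three full copies (the usual HDFS default) versus an assumed
    # 6 data + 3 parity layout, used here purely for illustration.
    print("3x replication:", replication_overhead(3))
    print("6+3 erasure code:", erasure_overhead(6, 3))
```

Under these assumptions the erasure-coded layout stores half as many raw bytes as triple replication while still tolerating the loss of any three units in a stripe.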


1  Interactive Data Analysis & Mining

Integrating data from multiple sources is becoming increasingly common. As the volume of data of different types and from different sources grows, companies need not only to collect more data, but also to integrate it and make data processing and analysis part of decision making. For some companies, the interval between data integration and decision making needs to be very short. The development of in-memory data analysis makes it possible to process high volumes of data at high speed. For this kind of application, most of the data is structured and ranges from GB to TB scale, such as government data, stock transactions, banking and insurance records, and retail trading data.

The Transwarp Inceptor in-memory analysis engine is well suited to high-speed OLAP workloads such as statistics, aggregation, and prediction based on historical data.

2  Real-time NewSQL Database

The driving force of real-time data processing is to make better decisions and take meaningful actions at the right time. “It’s about detecting fraud while someone is swiping a credit card, or triggering an offer while a shopper is standing on a checkout line, or placing an ad on a website while someone is reading a specific article. It’s about combining and analyzing data so you can take the right action, at the right time, and at the right place”. There are many data sources for real-time data processing, including streaming data from the web, social media, sensors and operational systems. Data arrives quickly and with high concurrency. The challenge is to store data online and process it as it arrives, rather than storing it and retrieving it at some point in the future. Processing needs to be “pretty fast” and decisions need to be made “very fast”, within seconds or even milliseconds. This is beyond the capability of batch processing.

Powered by its 100% in-memory distributed computing engine and high-dimensional indexing technology, TRANSWARP DATA HUB handles massive amounts of data at high speed and meets the needs of real-time analytics.

3  Real-time Stream Processing

A stream computing system processes data that is generated continuously by front ends and other data sources. It is called a real-time computing system because it processes data as it arrives, without the latency of accumulating it first, so that the whole process completes within an extremely short interval.

The Transwarp Stream real-time streaming engine is based on Spark Streaming, which provides powerful streaming expressiveness and supports the DAG model. This is an improvement over Hadoop-based batch processing systems, which chain multiple processing stages together and make the whole system complicated and inefficient. Transwarp Stream supports Kafka and Flume and is compatible with the existing Hadoop ecosystem.
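
The DAG model mentioned above can be sketched in plain Python. This is an illustration only, not the Transwarp Stream or Spark Streaming API: the stage names, the "sensor_id,value" input format and the alert threshold are all assumptions. The stream is modeled as a sequence of small batches, and each batch flows through one parse stage that feeds two independent branches.

```python
from collections import Counter
from typing import Dict, List, Tuple

Reading = Tuple[str, float]


def parse(batch: List[str]) -> List[Reading]:
    """Source stage: turn 'sensor_id,value' lines into (sensor_id, value) pairs."""
    readings = []
    for line in batch:
        sensor_id, value = line.split(",")
        readings.append((sensor_id, float(value)))
    return readings


def alerts(readings: List[Reading], threshold: float = 100.0) -> List[Reading]:
    """Branch 1: keep readings above the (assumed) alert threshold."""
    return [(s, v) for s, v in readings if v > threshold]


def counts(readings: List[Reading]) -> Counter:
    """Branch 2: count readings per sensor, e.g. for a live display."""
    return Counter(s for s, _ in readings)


def run_dag(batch: List[str]) -> Dict[str, object]:
    """Run one small batch through the graph: parse once, fan out to both branches."""
    readings = parse(batch)
    return {"alerts": alerts(readings), "counts": counts(readings)}


if __name__ == "__main__":
    # Each inner list stands for the data that arrived in one short interval.
    stream = [["a,10", "b,150"], ["a,120", "a,5"]]
    for batch in stream:
        print(run_dag(batch))
```

The point of the sketch is the fan-out: both branches reuse the parsed readings within a single job, whereas a chain of separate batch jobs would typically write intermediate results out and read them back for each stage.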

4  Offline Data Analysis & Mining

Offline batch data processing techniques are driven by the need to analyze big data sets at terabyte or even petabyte scale. Typical data sources are semi-structured logs from web access, CDNs, application systems and device usage. TRANSWARP DATA HUB helps you analyze access logs to discover new trends or patterns, make business decisions, or create new business opportunities with data mining technologies. Examples include targeting web advertising at the audience that best matches the ad, and combining the analysis of access logs with purchase history to expand the business.
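
As a small-scale illustration of combining access logs with purchase history, the Python sketch below finds product categories that a user viewed repeatedly but never bought, which could then drive ad targeting. The "user_id path" log format, the "/products/<category>" URL layout and the function names are assumptions made for this example, not TDH interfaces; a real deployment would run the equivalent logic as a distributed batch job.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def views_by_user(log_lines: List[str]) -> Dict[str, Dict[str, int]]:
    """Count category page views per user from 'user_id path' log lines."""
    views: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for line in log_lines:
        user_id, path = line.split()
        parts = path.strip("/").split("/")
        # Only paths of the assumed form /products/<category> are counted.
        if len(parts) == 2 and parts[0] == "products":
            views[user_id][parts[1]] += 1
    return views


def ad_targets(
    views: Dict[str, Dict[str, int]],
    purchases: Dict[str, List[str]],
    min_views: int = 2,
) -> List[Tuple[str, str]]:
    """Return (user, category) pairs viewed at least min_views times but never bought."""
    targets = []
    for user_id in sorted(views):
        bought = set(purchases.get(user_id, []))
        for category in sorted(views[user_id]):
            if views[user_id][category] >= min_views and category not in bought:
                targets.append((user_id, category))
    return targets


if __name__ == "__main__":
    log = [
        "u1 /products/shoes",
        "u1 /products/shoes",
        "u1 /products/hats",
        "u2 /products/hats",
        "u2 /products/hats",
        "u2 /home",
    ]
    purchase_history = {"u2": ["hats"]}
    print(ad_targets(views_by_user(log), purchase_history))
```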

TRANSWARP DATA HUB integrates a wide variety of statistical techniques (classification, clustering, association, probability analysis, …) into the platform. Together with the lightning-fast execution engine, this makes interactive and iterative query and analysis possible and easy.