Relational DB In ONE slide! * ¾More than 30 Y (1970) : maturity! ¾Theoretical & Practical aspects (DBMS) st ¾Domains, R⊆D1 x D2 x .... Dn, Algebra,Predicate Logic1 Order ¾Languages : SQL (wins), QUEL, QBE (see MS Access)R x S R∪S ¾DBMS Prototypes (1975), Products (1980) R S ¾A MAJOR improvement in DB : provide data independence R[α] & a simple, tabular VIEW of data. R :ϕ ¾Relations are implemented as Tables, Views & Snapshots (NO “relation” in SQL) R * S
Integration Data Documents Services
Active Spatial Temporal Multimedia Object & OR Deductive
¾Mid80s : a new approach ? ¾Basic ideas : PL + DB, ObjectOriented Languages, Type extensibility, dynamic ¾Two “clans” : relational DB vs. OO DB ¾Now : integration and ObjectRelational DB ¾The best of the two worlds ;)
add_town
¾Complex Objects 9“relational was too flat” ¾Operations : ADT (UDT, UDF) 9“relational was too static” ¾SemiStructured Data Models 9Conciliate Documents and Data 9Deal with the Web & the XML Galaxy 9Schema less data
2000
WEB –based DBMS XML
2007
2005
* Not possible for any other proposed data model ! M.Adiba, DBTA, Berne, 03/03
Relational (T.Codd)
OLTP OLCP
Files
3
Complex Objects, Object Identity, Encapsulation, Types or Classes, Inheritance, Late Binding, Extensibility
Persistence, Storage managt, Security, Transactions, Liability, Concurrency Ad hocQuery.
MOV
T
Y > 1990
5! Possible Joins Access Paths Intra vs. Inter // ================= MOV & MD H.partitioned / 3 servers MOV=M1∪M2∪M3 M3 replicated M31, M32
¾What vs. How : Data Independence ¾Non procedural, declarative languagesThe Web ¾SQL, QUEL, QBE SEQUEL (76), ANSI/ISOSQL (86) , SQL (89), SQL2 (92), SQL3 (99, 03),…? ¾SQL is not only a query language Structured ¾SQL is not a programming language Databases ¾Object Query Language OQL & OODB Pull vs. Push ? ¾XQueryThe WEB & Continuous Query http://www.w3.org/XML/Query Location dep. Query
Distributed Query Optimization Towns where movies made after 1990 staring S.Stone ? MOV(T,C,Y,D) MD(T,AC) PR(T,TN,RN,W,NB) MT(TN,TOWN,PN) RMT(TN,RN,NB)
Active DBMS
ACID Transactions Transaction execution ¾Atomicity : all or nothing Case 2 ¾Consistency Case 1 ¾Isolation EifailureEf ¾Durability OLTP & OLCP, TP Monitors Benchmarks :www.tpc.org &Standards
S2
User view: writing transactions is a software engineering problem System view: deal with Integrity, Concurrency & Recovery
Systems & Architectures partition and/or replication
S1
time
Generated vs. Handcrafted Triggers (New programming paradigm for application development or system tool)
¾Very large spectrum of DBMS from smartcard to (parallell) DB Machine ¾HW Influences: machine architecture (P, M, D), networks, data storage units, shared disk, shared nothing ¾SW Influences: O.S. & Middleware
¾DB : locally stored data about remote physical objects ¾Dataspace Data is inherently dispersed & connected Data lives on the physical object Data is stored with the physical object Data is another characteristic of the object (like weight, color) ¾Objects in dataspace produce and store their own data
Virtual organisation
Data Integration Systems
Web Search
¾Mobility & location Mgt for millions of mobile terminals ¾Wireless networks & small devices ¾Sensors : road, environmental, pollution, vending machines, home sensor, light detectors, body sensors ¾Sensors produce and/or store data ¾Sensors can also be mobile
Dealing with TIME & Calendard Time : Discrete vs Continuous Time Line : Left bound (Big Bang), Right bound (Big Crunch) ? Querying historical data SQL extensions: SQL2, TempSQL, TQUEL, TSQL2 A lot of work in the past
History = {(ti,vi)} Interpretation ? At time ti, the object value is vi Separate history : construction (write) & exploitation (read) « appendonly » DB semantic: adding tuples (ti, vi) without changing existing ones (what about correcting the past?) Reading semantics is not obvious.
¾SQL extensionsSpatial Objects and ¾Spatial Operations, Relationships ¾Spatial Querying : designate an area on the screen ¾Graphical output ¾Spatial Criteria
Location Dependent Queries (GPS) 1. Find the cheapest hotel in Paris 2. Find the cheapest and nearest hotel 3. Find the cheapest and nearest hotel / where I’ll be in one hour
1 : DB classical query 2 : necessary to locate the user (mobile or not mobile) and different results when time goes on… 3 : mobile user : where he/she is? What is the trajectory? Querying the future…
Remarks : Ti granularity ? Interpretation : in [ti, ti+1[ value is vi (between 30 & 39, value is 65) Same values for different timestamps : meaning what? Here we note a regular increase of ti But time goes on….. Data streams ?
LDQ : Localisation of Mobile Objects Mobile Units : vehicles, mobile phone, PC, PDA, … Mobile Users: localisation? Independently of the unit he/she used Mobile Code : agent Localisation, two extremes: – Look everywhere – Store the location every where Several solutions in this spectrum
CQ Execution FOREVER DO Execute query Q Return results to user Sleep for some period of time ENLOOP
¾Attention : result depends when Q is executed (eg, every hour)
¾At each execution, new results are produced and added to the previous one.
¾Inefficient execution if Q is re executed completely each time.
Continuous & Monotone
τ ∞ Set := FOREVER DO Set t := current time τ Execute queries QM(t) and QM( ) τ Return QM(t) QM( ) to user τ Set := t Sleep for some period of time ENLOOP
Some are naturally monotone, others can be converted and some are not Attention « appendonly » DB Select * From T where T.A= 3 Selections, joins, are in general Monotone
Monotone Query (2)
Select * From T where T.BαGetdate() Ifα< : monotone True
False T.B Select * From T where T.BαGetdate() IfαNON monotone> : False
Applications generating data streams –Network monitoring and traffic management –Call detail records in telecommunications –Transactions in retail chains, ATM operations in banks –Log records generated by Web Servers –Sensor network data,RFID tags –Financial applications –Manufacturing processes Characteristics of these applications – Large volume of data (+ terabytes) – Records coming with a high rate (with or without timestamp) – Data may come from mobile or ambient units
Problems : patterns findings, queries, real time statistical analysis on data streams
Unlimited sequence of data or events Very often associated to a temporal attribute –Timestamp (generated + or automatically) –Logical time (application dependent) Continuously generated events –“pushed” or “pulled” from producers to consumers CQ execution on one or several data streams
Window SELECT S.city, AVG(temp) Clause FROM SOME_STREAM S [range by ‘5 seconds’ slide by ‘5 seconds’] WHERE S.state = ‘California’ GROUP BY S.city Window “I want to look at 5 “I want a result tuple seconds worth of data” every 5 seconds” Data Stream … Result Tuple(s)Result Tuple(s)
Idea: A small random sample S of the data often well-represents all the data – For a fast approx answer, apply “modified” query to S – Example: select agg from R where R.e is odd (n=12) Data stream: 9 3 5 2 7 1 6 5 8 4 9 1
Sample S: 9 5 1 8
answer: 5 – If agg is avg, return average of odd elements in S – If agg is count, return average over all elements e in S of n if e is odd answer: 12*3/4 =9 0 if e is even
Unbiased:For expressions involving count, sum, avg: the estimator is unbiased, i.e., the expected value of the answer is the actual answer Garofalakis, Gehrke,Rastogi,SIGMOD’02 #57
DSMS : problems
Relations : set of tuples or sequences? DB updates? Append only ? Continuous and Snapshot Queries ? Exact or approximate result ? One pass query evaluation ? Access plan fixed or adaptive ? Limited resources (ex. memory) Real time processing of data streams
L.Golab, T.Ozsu, Issues in Data Stream Management, Sigmod Record, June 2003
– STREAM(Stanford) – Aurora& Borealis (Brown, Brandeis, MIT) – TelegraphCQ & TinyDB(Berkeley) Cooperation for a “stream system benchmark” Others : Hancock, Gigascope (AT&T) , NILE (Purdue), StreamMill (UCLA), etc. A large number of papers, tutorials, some books, A very active area Impossible to have a detailed view: choice
Mobile DB & Transactions Transactions :new models (ACID?) – Mobile Transaction : a transaction where at least one MU is involved in the execution Products: PointBase, Navajo de Poet, Oracle Lite, DB2 Every Place, Sybase iAnywhere, SQL Server CE Research: Clustering, Twotier replication, Promotion, Reporting, Semanticsbased, Kangaroo transactions, MDSTP, Moflex transactions… P.SERRANO, C.RONCANCIO, M.ADIBA, “A survey of Mobile Transactions” International Journal on Distributed &Parallel Databases 16 (2): 193230, September 2004 ,
Recovery – Network partitioning is frequent – Disconnection is not (always) a failure – More logging
¾Frequent disconnections ¾Variable bandwidth (55kbs or 100Mbs) ¾Communication cost maybe high ¾MU have “limited” capabilities Batteries Computing power Secondary storage
Course Content
¾Moving Objects DB & Location Dependent Queries ¾Stream Data Management: general introduction & OLAP/Data Mining reminder Sensor DB: tinyDBMS & Tiny SQL STREAM Project (Stanford) AURORA, BOREALIS & StreamBase (Product) Stream Management : other issues ¾Mobile Transactions