#schema-on-read (Hadoop)
1. Load the data
``` Hadoop
hdfs dfs -copyFromLocal bleh/name.txt \
    /user/hadoop/customer
```
2. Query the data
``` Hadoop
hadoop jar hadoop-streaming.jar \
    -files customer-mapper.py,customer-reducer.py \
    -mapper customer-mapper.py \
    -reducer customer-reducer.py \
    -input /user/hadoop/customer \
    -output /user/hadoop/output/query1
```
In Hadoop (non-SQL), the data's structure is interpreted as it's read, in this case by a #python script.
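The customer-mapper.py and customer-reducer.py scripts referenced above are where that schema actually lives. A minimal sketch of what such a streaming pair might look like, assuming each line of name.txt is a comma-separated key,name record and the query counts customers per name (both the record layout and the counting query are assumptions, not part of the original note):

``` python
#!/usr/bin/env python
# customer-mapper.py -- sketch: the "schema" is applied here, at read time,
# by splitting the raw line. Assumed layout: key,name
import sys

for line in sys.stdin:
    fields = line.strip().split(',')
    if len(fields) >= 2:
        print('%s\t1' % fields[1])   # emit name<TAB>1
```

``` python
#!/usr/bin/env python
# customer-reducer.py -- sketch: streaming delivers input sorted by key,
# so equal names arrive adjacent and can be summed in a single pass.
import sys

current, count = None, 0
for line in sys.stdin:
    if not line.strip():
        continue                     # skip blank lines
    name, _, value = line.strip().partition('\t')
    if name != current and current is not None:
        print('%s\t%d' % (current, count))
        count = 0
    current = name
    count += int(value)
if current is not None:
    print('%s\t%d' % (current, count))
```

Hadoop streaming just pipes raw text through these scripts via stdin/stdout, so the structure is entirely in the parsing code, not in the store.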
#schema-on-write (SQL)
1. Create Schema
``` SQL
CREATE TABLE Customers (
    Key int,
    Name varchar(40),
    ...
);
```
2. Add Data
``` SQL
BULK INSERT Customers
FROM '.../name.txt'
WITH (FIELDTERMINATOR = '","');
```
3. Query Data
``` SQL
SELECT Key, Name FROM Customers;
```
In SQL, you can't add data until the table's schema has been declared.
If the data changes and the schema needs to be redefined, what are the implications of dropping and re-loading 500TB of data?
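Schema-on-read sidesteps that reload: since the structure lives in the script, a layout change can often be absorbed by editing the mapper alone while the files in HDFS stay put. A hypothetical sketch, assuming the old layout was key,name and a new region column was prepended:

``` python
#!/usr/bin/env python
# Hypothetical revised customer-mapper.py: absorb a layout change at read
# time instead of reloading the store. Both layouts are assumptions.
import sys

for line in sys.stdin:
    fields = line.strip().split(',')
    if len(fields) == 2:       # old layout: key,name
        name = fields[1]
    elif len(fields) == 3:     # assumed new layout: region,key,name
        name = fields[2]
    else:
        continue               # skip blank or malformed rows
    print('%s\t1' % name)
```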