公眾號展示程式碼會自動折行,建議橫屏閱讀
1. 引言
資料字典(Data Dictionary)中儲存了諸多資料庫的元資料資訊如圖1所示,包括基本Database, table, index, column, function, trigger, procedure,privilege等;以及與儲存引擎相關的元資料,如InnoDB的tablespace, table_id, index_id等。MySQL-8.0在資料字典上進行了諸多最佳化,本文將對其進行逐一介紹。
圖1
2. MySQL-8.0之前的資料字典
俗話說知己知彼,方能百戰不殆。在介紹MySQL-8.0的資料字典前,我們先一起回顧一下MySQL-8.0之前的資料字典。
2.1 Data Dictionary 分佈位置
圖2
如圖2所示,舊的資料字典資訊分佈在server層,mysql庫下的系統表和InnoDB內部系統表三個地方,其中儲存的資訊分別如下所示:
- server層檔案
- .frm files: Table metadata files.
- .par files: Partition definition files. InnoDB stopped using partition definition files in MySQL 5.7 with the introduction of native partitioning support for InnoDB tables.
- .TRN files: Trigger namespace files.
- .TRG files: Trigger parameter files.
- .isl files: InnoDB Symbolic Link files containing the location of file-per-table tablespace files created outside of the data directory.
- .db.opt files: Database configuration files. These files, one per database directory, contained database default character set attributes.
- mysql庫下的系統表
mysql.user mysql.db mysql.proc mysql.event等
show tables in mysql; - InnoDB內部系統表
- SYS_DATAFILES
- SYS_FOREIGN
- SYS_FOREIGN_COLS
- SYS_TABLESPACES
- SYS_VIRTUAL
2.2 存在的問題
- 資料字典分散儲存,維護管理沒有統一介面
- MyISAM系統表易損壞
- DDL沒有原子性,server層與innodb層資料字典容易不一致
- 檔案儲存資料字典擴充套件性不好
- 透過information_schema查詢資料字典時生成臨時表不友好
3. MySQL-8.0的資料字典
鑒於舊資料字典的種種缺點,MySQL-8.0對資料字典進行了較大的改動:把所有的元資料資訊都儲存在InnoDB dictionary table中,並且儲存在單獨的表空間mysql.ibd裡,其架構如圖3所示。下麵逐一介紹各項改變的細節。
圖3
3.1 儲存結構
MySQL下的原有系統表由MyISAM轉為了InnoDB表,沒有了proc和event表,直接改存到了dictionary table中。在debug樣式下,可用如下指令檢視dictionary tables:
SET SESSION debug='+d,skip_dd_table_access_check';select name from mysql.tables where hidden='System' and type='BASE TABLE';
+------------------------------+
| name |
+------------------------------+
| dd_properties |
| innodb_dynamic_metadata |
| innodb_ddl_log |
| catalogs |
| character_sets |
| collations |
| column_statistics |
| column_type_elements |
| columns |
| events |
| foreign_key_column_usage |
| foreign_keys |
| index_column_usage |
| index_partitions |
| index_stats |
| indexes |
| parameter_type_elements |
| parameters |
| resource_groups |
| routines |
| schemata |
| st_spatial_reference_systems |
| table_partition_values |
| table_partitions |
| table_stats |
| tables |
| tablespace_files |
| tablespaces |
| triggers |
| view_routine_usage |
| view_table_usage |
+------------------------------+
31 rows in set (0.01 sec)
3.2 Dictionary Object Cache
資料字典表資訊可以透過全域性的cache進行快取。
show variables like '%definition%';
+---------------------------------+-------+
| Variable_name | Value |
+---------------------------------+-------+
| schema_definition_cache | 256 |
| stored_program_definition_cache | 256 |
| table_definition_cache | 2000 |
| tablespace_definition_cache | 256 |
+---------------------------------+-------+
- table_definition_cache:儲存表定義
- schema_definition_cache:儲存schema定義
- stored_program_definition_cache:儲存proc和func定義
- tablespace_definition_cache:儲存tablespace定義
另外還有character,collation,event,column_statistics也有cache,不過其大小硬編碼不可配置:
class Shared_dictionary_cache
{
...
Shared_multi_map<Abstract_table> m_abstract_table_map;
Shared_multi_map<Charset> m_charset_map;
Shared_multi_map<Collation> m_collation_map;
Shared_multi_map<Column_statistics> m_column_stat_map;
Shared_multi_map<Event> m_event_map;
Shared_multi_map<Resource_group> m_resource_group_map;
Shared_multi_map<Routine> m_routine_map;
Shared_multi_map<Schema> m_schema_map;
Shared_multi_map<Spatial_reference_system> m_spatial_reference_system_map;
Shared_multi_map<Tablespace> m_tablespace_map;
...
}
3.3 Information_schema
圖4
information_schema的變化如圖4所示,主要包括以下幾個方面:
1. information_schema部分表名變化
Old Name | New Name |
---|---|
INNODB_SYS_COLUMNS | INNODB_COLUMNS |
INNODB_SYS_DATAFILES | INNODB_DATAFILES |
INNODB_SYS_FIELDS | INNODB_FIELDS |
INNODB_SYS_FOREIGN | INNODB_FOREIGN |
INNODB_SYS_FOREIGN_COLS | INNODB_FOREIGN_COLS |
INNODB_SYS_INDEXES | INNODB_INDEXES |
INNODB_SYS_TABLES | INNODB_TABLES |
INNODB_SYS_TABLESPACES | INNODB_TABLESPACES |
INNODB_SYS_TABLESTATS | INNODB_TABLESTATS |
INNODB_SYS_VIRTUAL | INNODB_VIRTUAL |
2. 透過information_schema查詢時不再需要生成臨時表獲取,而是直接從資料字典表獲取
show create table SCHEMATA\G
*************************** 1. row ***************************
View: SCHEMATA
Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`mysql.infoschema`@`localhost` SQL SECURITY DEFINER VIEW `SCHEMATA` AS select `cat`.`name` AS `CATALOG_NAME`,`sch`.`name` AS `SCHEMA_NAME`,`cs`.`name` AS `DEFAULT_CHARACTER_SET_NAME`,`col`.`name` AS `DEFAULT_COLLATION_NAME`,NULL AS `SQL_PATH` from (((`mysql`.`schemata` `sch` join `mysql`.`catalogs` `cat` on((`cat`.`id` = `sch`.`catalog_id`))) join `mysql`.`collations` `col` on((`sch`.`default_collation_id` = `col`.`id`))) join `mysql`.`character_sets` `cs` on((`col`.`character_set_id` = `cs`.`id`))) where can_access_database(`sch`.`name`)
character_set_client: utf8
collation_connection: utf8_general_ci
1 row in set (0.00 sec)
3. 不需要像以前一樣掃描檔案夾獲取資料庫串列,不需要開啟frm檔案獲取表資訊,而是直接從資料字典表獲取
4. information_schema查詢以view的形式展現,更利於最佳化器最佳化查詢
EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';
+----+-------------+-------+------------+--------+--------------------+------------+---------+-------------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------+------------+---------+-------------------------+------+----------+-------------+
| 1 | SIMPLE | cat | NULL | index | PRIMARY | name | 194 | NULL | 1 | 100.00 | Using index |
| 1 | SIMPLE | sch | NULL | eq_ref | PRIMARY,catalog_id | catalog_id | 202 | mysql.cat.id,const | 1 | 100.00 | Using index |
| 1 | SIMPLE | tbl | NULL | eq_ref | schema_id | schema_id | 202 | mysql.sch.id,const | 1 | 100.00 | Using where |
| 1 | SIMPLE | stat | NULL | const | PRIMARY | PRIMARY | 388 | const,const | 1 | 100.00 | Using index |
| 1 | SIMPLE | ts | NULL | eq_ref | PRIMARY | PRIMARY | 8 | mysql.tbl.tablespace_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | col | NULL | eq_ref | PRIMARY | PRIMARY | 8 | mysql.tbl.collation_id | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+--------------------+------------+---------+-------------------------+------+----------+-------------+
6 rows in set, 1 warning (0.00 sec)
3.4 優點
- 去掉server層的元資料檔案,元資料統一儲存到InnoDB資料字典表,易於管理且crash-safe
- 支援原子DDL
- information_schema查詢更高效
4. Serialized Dictionary Information (SDI)
MySQL8.0不僅將元資料資訊儲存在資料字典表中,同時也冗餘儲存了一份在SDI中。對於非InnoDB表,SDI資料在字尾為.sdi的檔案中,而對於innodb,SDI資料則直接儲存與ibd中,如以下例子所示:
create table t1(c1 int) engine=InnoDB;
create table t2(c1 int) engine=MyISAM;
ll test/
-rw-r----- 1 root root 114688 2月 22 17:47 t1.ibd
-rw-r----- 1 root root 1495 2月 22 17:47 t2_337.sdi
-rw-r----- 1 root root 0 2月 22 17:47 t2.MYD
-rw-r----- 1 root root 1024 2月 22 17:47 t2.MYI
select id from mysql.tables where name='t2';
+-----+
| id |
+-----+
| 337 |
+-----+
4.1 非事務表
上述例子中MyISAM表t2的SDI為test/t2_337.sdi,其中337為table_id, t2_337.sdi可以直接開啟,資料是json格式(cat test/t2_337.sdi):
{
"mysqld_version_id": 80015,
"dd_version": 80014,
"sdi_version": 1,
"dd_object_type": "Table",
"dd_object": {
"name": "t2",
"mysql_version_id": 80015,
"created": 20190222094723,
"last_altered": 20190222094723,
"hidden": 1,
"options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=0;stats_auto_recalc=0;stats_sample_pages=0;",
"columns": [{
"name": "c1",
"type": 4,
"is_nullable": true,
"is_zerofill": false,
"is_unsigned": false,
"is_auto_increment": false,
"is_virtual": false,
"hidden": 1,
"ordinal_position": 1,
"char_length": 11,
"numeric_precision": 10,
"numeric_scale": 0,
"numeric_scale_null": false,
"datetime_precision": 0,
"datetime_precision_null": 1,
"has_no_default": false,
"default_value_null": true,
"srs_id_null": true,
"srs_id": 0,
"default_value": "",
"default_value_utf8_null": true,
"default_value_utf8": "",
"default_option": "",
"update_option": "",
"comment": "",
"generation_expression": "",
"generation_expression_utf8": "",
"options": "interval_count=0;",
"se_private_data": "",
"column_key": 1,
"column_type_utf8": "int(11)",
"elements": [],
"collation_id": 255,
"is_explicit_collation": false
}],
"schema_ref": "test",
"se_private_id": 18446744073709551615,
"engine": "MyISAM",
"last_checked_for_upgrade_version_id": 0,
"comment": "",
"se_private_data": "",
"row_format": 1,
"partition_type": 0,
"partition_expression": "",
"partition_expression_utf8": "",
"default_partitioning": 0,
"subpartition_type": 0,
"subpartition_expression": "",
"subpartition_expression_utf8": "",
"default_subpartitioning": 0,
"indexes": [],
"foreign_keys": [],
"partitions": [],
"collation_id": 255
}
}
4.2 InnoDB事務表
上述例子中的InnoDB表t1的SDI則可以透過工具ibd2sdi可以解析出來(ibd2sdi test/t1.ibd):
["ibd2sdi"
,
{
"type": 1,
"id": 336,
"object":
{
"mysqld_version_id": 80015,
"dd_version": 80014,
"sdi_version": 1,
"dd_object_type": "Table",
"dd_object": {
"name": "t1",
"mysql_version_id": 80015,
"created": 20190222094723,
"last_altered": 20190222094723,
"hidden": 1,
"options": "avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=0;stats_auto_recalc=0;stats_sample_pages=0;",
"columns": [
{
"name": "c1",
"type": 4,
"is_nullable": true,
"is_zerofill": false,
"is_unsigned": false,
"is_auto_increment": false,
"is_virtual": false,
"hidden": 1,
"ordinal_position": 1,
"char_length": 11,
"numeric_precision": 10,
"numeric_scale": 0,
"numeric_scale_null": false,
"datetime_precision": 0,
"datetime_precision_null": 1,
"has_no_default": false,
"default_value_null": true,
"srs_id_null": true,
"srs_id": 0,
"default_value": "",
"default_value_utf8_null": true,
"default_value_utf8": "",
"default_option": "",
"update_option": "",
"comment": "",
"generation_expression": "",
"generation_expression_utf8": "",
"options": "interval_count=0;",
"se_private_data": "table_id=1059;",
"column_key": 1,
"column_type_utf8": "int(11)",
"elements": [],
"collation_id": 255,
"is_explicit_collation": false
},
{
"name": "DB_ROW_ID",
"type": 10,
"is_nullable": false,
"is_zerofill": false,
"is_unsigned": false,
"is_auto_increment": false,
"is_virtual": false,
"hidden": 2,
"ordinal_position": 2,
"char_length": 6,
"numeric_precision": 0,
"numeric_scale": 0,
"numeric_scale_null": true,
"datetime_precision": 0,
"datetime_precision_null": 1,
"has_no_default": false,
"default_value_null": true,
"srs_id_null": true,
"srs_id": 0,
"default_value": "",
"default_value_utf8_null": true,
"default_value_utf8": "",
"default_option": "",
"update_option": "",
"comment": "",
"generation_expression": "",
"generation_expression_utf8": "",
"options": "",
"se_private_data": "table_id=1059;",
"column_key": 1,
"column_type_utf8": "",
"elements": [],
"collation_id": 63,
"is_explicit_collation": false
},
{
"name": "DB_TRX_ID",
"type": 10,
"is_nullable": false,
"is_zerofill": false,
"is_unsigned": false,
"is_auto_increment": false,
"is_virtual": false,
"hidden": 2,
"ordinal_position": 3,
"char_length": 6,
"numeric_precision": 0,
"numeric_scale": 0,
"numeric_scale_null": true,
"datetime_precision": 0,
"datetime_precision_null": 1,
"has_no_default": false,
"default_value_null": true,
"srs_id_null": true,
"srs_id": 0,
"default_value": "",
"default_value_utf8_null": true,
"default_value_utf8": "",
"default_option": "",
"update_option": "",
"comment": "",
"generation_expression": "",
"generation_expression_utf8": "",
"options": "",
"se_private_data": "table_id=1059;",
"column_key": 1,
"column_type_utf8": "",
"elements": [],
"collation_id": 63,
"is_explicit_collation": false
},
{
"name": "DB_ROLL_PTR",
"type": 9,
"is_nullable": false,
"is_zerofill": false,
"is_unsigned": false,
"is_auto_increment": false,
"is_virtual": false,
"hidden": 2,
"ordinal_position": 4,
"char_length": 7,
"numeric_precision": 0,
"numeric_scale": 0,
"numeric_scale_null": true,
"datetime_precision": 0,
"datetime_precision_null": 1,
"has_no_default": false,
"default_value_null": true,
"srs_id_null": true,
"srs_id": 0,
"default_value": "",
"default_value_utf8_null": true,
"default_value_utf8": "",
"default_option": "",
"update_option": "",
"comment": "",
"generation_expression": "",
"generation_expression_utf8": "",
"options": "",
"se_private_data": "table_id=1059;",
"column_key": 1,
"column_type_utf8": "",
"elements": [],
"collation_id": 63,
"is_explicit_collation": false
}
],
"schema_ref": "test",
"se_private_id": 1059,
"engine": "InnoDB",
"last_checked_for_upgrade_version_id": 0,
"comment": "",
"se_private_data": "",
"row_format": 2,
"partition_type": 0,
"partition_expression": "",
"partition_expression_utf8": "",
"default_partitioning": 0,
"subpartition_type": 0,
"subpartition_expression": "",
"subpartition_expression_utf8": "",
"default_subpartitioning": 0,
"indexes": [
{
"name": "PRIMARY",
"hidden": true,
"is_generated": false,
"ordinal_position": 1,
"comment": "",
"options": "",
"se_private_data": "id=140;root=4;space_id=2;table_id=1059;trx_id=2569;",
"type": 2,
"algorithm": 2,
"is_algorithm_explicit": false,
"is_visible": true,
"engine": "InnoDB",
"elements": [
{
"ordinal_position": 1,
"length": 4294967295,
"order": 2,
"column_opx": 1
},
{
"ordinal_position": 2,
"length": 4294967295,
"order": 2,
"column_opx": 2
},
{
"ordinal_position": 3,
"length": 4294967295,
"order": 2,
"column_opx": 3
},
{
"ordinal_position": 4,
"length": 4294967295,
"order": 2,
"column_opx": 0
}
],
"tablespace_ref": "test/t1"
}
],
"foreign_keys": [],
"partitions": [],
"collation_id": 255
}
}
}
,
{
"type": 2,
"id": 7,
"object":
{
"mysqld_version_id": 80015,
"dd_version": 80014,
"sdi_version": 1,
"dd_object_type": "Tablespace",
"dd_object": {
"name": "test/t1",
"comment": "",
"options": "",
"se_private_data": "flags=16417;id=2;server_version=80015;space_version=1;state=normal;",
"engine": "InnoDB",
"files": [
{
"ordinal_position": 1,
"filename": "./test/t1.ibd",
"se_private_data": "id=2;"
}
]
}
}
}
]
SDI在ibd中實際是以表(BTree)的形式儲存的。建表時會透過btr_sdi_create_index建立SDI的BTree,同時會向BTree插入table和tablespace的SDI資訊,表的結構如下:
create table SDI_$TABLESPACEID(type int,
id int,
compressed_len int,
uncompressed_len int,
data blob not null,
primary key(type,id));
dd::sdi::store
-->dd::sdi_tablespace::store_tsp_sdi // store tablespace SDI
-->dict_sdi_set
-->ib_sdi_set
-->ib_cursor_insert_row
-->dd::sdi_tablespace::store_tbl_sdi // store table SDI
-->ib_sdi_set
-->ib_sdi_set
-->ib_cursor_insert_row
4.3 其他表空間的SDI
ibd2sdi mysql.ibd,可以檢視所以mysql下的表,包括new dictionary和mysql下的普通表。需要註意的是ibdata1中不存放SDI資訊,使用ibd2sdi解析它會出現以下提示:
- [INFO] ibd2sdi: SDI is empty.
4.4 import
import (import table *.sdi)只支援MyISAM表,InnoDB不支援。由於SDI不包含trigger資訊,所以import也不會匯入trigger資訊,trigger需額外處理。
5. Data Dictionary存取實現
class Storage_adapter {
/**
Drop a dictionary object from persistent storage.
*/
template <typename T>
static bool drop(THD *thd, const T *object);
/**
Store a dictionary object to persistent storage.
*/
template <typename T>
static bool store(THD *thd, T *object);
}
例如create table 會涉及到mysql.tablespaces,mysql.tablespace_files, mysql.tables, mysql.indexes, mysql.columns,mysql.index_column_usage等。create table的過程如圖5所示:
圖5
mysql_create_table
-->mysql_create_table_no_lock
-->create_table_impl
-->rea_create_base_table
-->dd::cache::Dictionary_client::store<:table> // mysql.tables</:table>
-->dd::cache::Storage_adapter::store<:table/>
-->dd::Weak_object_impl::store
-->dd::Raw_new_record::insert // store mysql.tables
-->handler::ha_write_row
-->ha_innobase::write_row
-->dd::Table_impl::store_children
-->dd::Abstract_table_impl::store_children // store mysql.columns
-->dd::Collection<:column>::store_items</:column>
-->dd::Collection<:index>::store_items // store mysql.indexes</:index>
-->dd::Weak_object_impl::store
-->dd::Index_impl::store_children
-->dd::Collection<:index_element>::store_items // store mysql.index_column_usage</:index_element>
-->dd::sdi::store //store table SDI
-->ha_create_table
-->handler::ha_create
-->ha_innobase::create
-->innobase_basic_ddl::create_impl
-->create_table_info_t::create_table_update_global_dd
-->dd_set_autoinck
-->dd_create_implicit_tablespace
-->dd_create_tablespace
-->dd::cache::Dictionary_client::store<:tablespace/>
-->dd::cache::Storage_adapter::store<:tablespace/>
-->dd::Weak_object_impl::store // store mysql.tablespace
-->dd::Tablespace_impl::store_children
-->dd::Collection<:tablespace_file>::store_items</:tablespace_file>
-->dd::Tablespace_file_impl::store // store tablespace_files
-->dd::sdi::store // store tablespace SDI
-->dd::cache::Dictionary_client::update<:table> // 更新innodb引擎相關元資料</:table>
-->dd::cache::Storage_adapter::store<:table/>
-->dd::Weak_object_impl::store
下麵以表t1為例,演示create table在DD中的資料分佈:
create table t1(c1 int primary key, c2 int) engine=innodb;
select s.name as schema_name, t.* from mysql.tables t, mysql.schemata s where s.id=t.schema_id and s.name='test' and t.name='t1'\G
*************************** 1. row ***************************
schema_name: test
id: 374
schema_id: 5
name: t1
type: BASE TABLE
engine: InnoDB
mysql_version_id: 80015
row_format: Dynamic
collation_id: 255
comment:
hidden: Visible
options: avg_row_length=0;key_block_size=0;keys_disabled=0;pack_record=0;stats_auto_recalc=0;stats_sample_pages=0;
se_private_data: NULL
se_private_id: 1096
tablespace_id: NULL
partition_type: NULL
partition_expression: NULL
partition_expression_utf8: NULL
default_partitioning: NULL
subpartition_type: NULL
subpartition_expression: NULL
subpartition_expression_utf8: NULL
default_subpartitioning: NULL
created: 2019-03-12 19:30:46
last_altered: 2019-03-12 19:30:46
view_definition: NULL
view_definition_utf8: NULL
view_check_option: NULL
view_is_updatable: NULL
view_algorithm: NULL
view_security_type: NULL
view_definer: NULL
view_client_collation_id: NULL
view_connection_collation_id: NULL
view_column_names: NULL
last_checked_for_upgrade_version_id: 0
select * from mysql.indexes where table_id= 374\G
*************************** 1. row ***************************
id: 299
table_id: 374
name: PRIMARY
type: PRIMARY
algorithm: BTREE
is_algorithm_explicit: 0
is_visible: 1
is_generated: 0
hidden: 0
ordinal_position: 1
comment:
options: flags=0;
se_private_data: id=177;root=4;space_id=39;table_id=1096;trx_id=9996;
tablespace_id: 43
engine: InnoDB
select id, name, type from mysql.columns where table_id= 374;
+------+-------------+---------------------+
| id | name | type |
+------+-------------+---------------------+
| 4025 | c1 | MYSQL_TYPE_LONG |
| 4026 | c2 | MYSQL_TYPE_LONG |
| 4028 | DB_ROLL_PTR | MYSQL_TYPE_LONGLONG |
| 4027 | DB_TRX_ID | MYSQL_TYPE_INT24 |
+------+-------------+---------------------+
select * from mysql.index_column_usage where index_id=299;
+----------+------------------+-----------+--------+-------+--------+
| index_id | ordinal_position | column_id | length | order | hidden |
+----------+------------------+-----------+--------+-------+--------+
| 299 | 1 | 4025 | 4 | ASC | 0 |
| 299 | 2 | 4027 | NULL | ASC | 1 |
| 299 | 3 | 4028 | NULL | ASC | 1 |
| 299 | 4 | 4026 | NULL | ASC | 1 |
+----------+------------------+-----------+--------+-------+--------+
select * from mysql.tablespaces where name='test/t1';
+----+---------+---------+----------------------------------------------------------------------+---------+--------+
| id | name | options | se_private_data | comment | engine |
+----+---------+---------+----------------------------------------------------------------------+---------+--------+
| 43 | test/t1 | NULL | flags=16417;id=39;server_version=80015;space_version=1;state=normal; | | InnoDB |
+----+---------+---------+----------------------------------------------------------------------+---------+--------+
select * from mysql.tablespace_files where tablespace_id=43;
+---------------+------------------+---------------+-----------------+
| tablespace_id | ordinal_position | file_name | se_private_data |
+---------------+------------------+---------------+-----------------+
| 43 | 1 | ./test/t1.ibd | id=39; |
+---------------+------------------+---------------+-----------------+
select * from mysql.tablespaces a, mysql.tablespace_files b where a.id=b.tablespace_id and a.name='test/t1';
+----+---------+---------+----------------------------------------------------------------------+---------+--------+---------------+------------------+---------------+-----------------+
| id | name | options | se_private_data | comment | engine | tablespace_id | ordinal_position | file_name | se_private_data |
+----+---------+---------+----------------------------------------------------------------------+---------+--------+---------------+------------------+---------------+-----------------+
| 43 | test/t1 | NULL | flags=16417;id=39;server_version=80015;space_version=1;state=normal; | | InnoDB | 43 | 1 | ./test/t1.ibd | id=39; |
+----+---------+---------+------------------------------------------------------
drop table是create table的逆過程,不再具體分析。
6. Initialize
圖6
mysqld –initialize的原始碼流程如圖6所示。具體過程為:
dd::Dictionary_impl::init
-->bootstrap::initialize
-->DDSE_dict_init
-->innobase_ddse_dict_init
-->innobase_init_files //建立mysql.ibd
-->initialize_dictionary
-->create_dd_schema/initialize_dd_properties/create_tables
-->DDSE_dict_recover // 建立mysql/innodb_system tablespace
-->flush_meta_data
-->dd::cache::Storage_adapter::store() //
-->dd::Weak_object_impl::store
-->dd::Table_impl::store_attributes
-->dd::cache::Storage_adapter::core_store // 儲存到Object_registry m_core_registry;
-->dd::sdi::store
-->populate_tables
-->get_dml
-->update_properties
-->update_versions
7. Atomic DDL
7.1 Atomic DDL
定義:DDL所涉及的以下更改操作是原子的,這些更改操作要麼都提交,要麼都回滾。
- data dictionary
- storage engine
- binary log
只有InnoDB engine支援Atomic DDL,以下操作不支援:
- Table-related DDL statements that involve a storage engine other than InnoDB.
- INSTALL PLUGIN and UNINSTALL PLUGIN statements.
- INSTALL COMPONENT and UNINSTALL COMPONENT statements.
- CREATE SERVER, ALTER SERVER, and DROP SERVER statements.
7.2 DDL log
DDL過程中操作DD事物表是原子的,而DDL過程中也會操作檔案,建立和釋放BTree以及修改DD cache,這些操作不是原子的。為了實現atomic DDL, DDL過程中對檔案操作和Btree操作等記錄日誌,這些日誌會記錄到DD表mysql.innodb_ddl_log中。日誌有以下幾個型別:
enum class Log_Type : uint32_t {
/** Smallest log type */
SMALLEST_LOG = 1,
/** Drop an index tree */
FREE_TREE_LOG = 1,
/** Delete a file */
DELETE_SPACE_LOG,
/** Rename a file */
RENAME_SPACE_LOG,
/** Drop the entry in innodb_dynamic_metadata */
DROP_LOG,
/** Rename table in dict cache. */
RENAME_TABLE_LOG,
/** Remove a table from dict cache */
REMOVE_CACHE_LOG,
/** Alter Encrypt a tablespace */
ALTER_ENCRYPT_TABLESPACE_LOG,
/** Biggest log type */
BIGGEST_LOG = ALTER_ENCRYPT_TABLESPACE_LOG
};
mysql.innodb_ddl_log 表結構如下:
CREATE TABLE `innodb_ddl_log` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`thread_id` bigint(20) unsigned NOT NULL,
`type` int(10) unsigned NOT NULL,
`space_id` int(10) unsigned DEFAULT NULL,
`page_no` int(10) unsigned DEFAULT NULL,
`index_id` bigint(20) unsigned DEFAULT NULL,
`table_id` bigint(20) unsigned DEFAULT NULL,
`old_file_path` varchar(512) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`new_file_path` varchar(512) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `thread_id` (`thread_id`);
將DDL分為以下幾個階段, Prepare記錄DDL log,Post-DDL會replay log來提交或回滾DDL操作,同時也並清理DDL log。
Prepare: Create the required objects and write the DDL logs to the mysql.innodb_ddl_log table. The DDL logs define how to roll forward and roll back the DDL operation.
Perform: Perform the DDL operation. For example, perform a create routine for a CREATE TABLE operation.
Commit: Update the data dictionary and commit the data dictionary transaction.
Post-DDL: Replay and remove DDL logs from the mysql.innodb_ddl_log table. To ensure that rollback can be performed safely without introducing inconsistencies, file operations such as renaming or removing data files are performed in this final phase. This phase also removes dynamic metadata from the mysql.innodb_dynamic_metadata data dictionary table for DROP TABLE, TRUNCATE TABLE, and other DDL operations that rebuild the table.
7.3 Atomic DDL Examples
7.3.1 drop table
以drop table為例,drop 過程中會刪除ibd檔案,也會從mysql.innodb_dynamic_metadata 中刪除相應記錄。
在preppare階段只是記錄日誌,沒有真正刪除。如果drop過程成功, innobase_post_ddl才從mysql.innodb_ddl_log中讀取記錄去replay,replay_delete_space_log/replay_drop_log會真正執行刪除, replay完也會清理ddl log;如果drop過程失敗,rollback時mysql.innodb_ddl_log的記錄也回滾了,innobase_post_ddl時沒有記錄需要replay。
mysql_rm_table
-->mysql_rm_table_no_locks
-->drop_base_table
-->ha_delete_table
-—>handler::ha_delete_table
-->ha_innobase::delete_table
-->innobase_basic_ddl::delete_impl
-->row_drop_table_for_mysql
-->Log_DDL::write_drop_log // 記錄刪innodb_dynamic_metadata日誌
-—>Log_DDL::write_delete_space_log // 記錄刪ibd日誌
-->dd::drop_table
-->dd::cache::Dictionary_client::drop<:table/>
-->dd::cache::Storage_adapter::drop<:table/>
-->dd::sdi::drop
-->innobase_post_ddl
-->Log_DDL::post_ddl
-->Log_DDL::replay_by_thread_id
-->Log_DDL::replay
—>Log_DDL::replay_delete_space_log // post-ddl 真正刪除innodb_dynamic_metadata
—>Log_DDL::replay_drop_log // post-ddl 真正刪除ibd
-->delete_by_ids
-->DDL_Log_Table::remove
7.3.2 create table
drop table post_ddl階段執行的redo操作,而create table post ddl執行的是rollback操作。create table prepare階段會真正的建立ibd,BTree,修改DD share cache, 同時記錄相應的log到mysql.innodb_ddl_log中。
create_table
-->Log_DDL::write_delete_space_log
-->Log_DDL::write_remove_cache_log
-->Log_DDL::write_free_tree_log
如果DDL成功commit,在post-DDL階段,DDL log記錄被清理了,不需要replay。如果DDL失敗rollback,在post-DDL階段,DDL log清理操作也回滾了,需要replay, relay會rollback前面的建立ibd,BTree,以及修改DD share cache。
如果create table過程中發生crash, 重啟後會讀取ddl log完成ddl的回滾。
write_delete_space_log
{
// 插入記錄時在內部開啟trx執行
insert_delete_space_log(nullptr, id, thread_id, space_id, file_path,
dict_locked);
// 清理記錄用DDL自身的trx
delete_by_id(trx, id, dict_locked);
}
如果create table過程中發生crash, 重啟後會讀取ddl log完成ddl的回滾。
init_server_components
-->ha_post_recover
-->post_recover_handlerton
-->innobase_post_recover
-->Log_DDL::recover
-->Log_DDL::replay_all
-->Log_DDL::replay
-->replay_delete_space_log/replay_remove_cache_log/replay_free_tree_log
-->delete_by_ids
-->DDL_Log_Table::remove
7.3.3 truncate table
truncate 先rename 為臨時ibd,然後drop臨時ibd,再重建表。rename會記錄ddl log, 參考write_rename_space_log函式,刪除重建也會記錄ddl log, 同前面介紹的create/drop table, 可以做到原子。rollback時透過日誌將臨時ibd重新命名為原ibd,參考replay_rename_space_log函式。
::truncate Sql_cmd_truncate_table::truncate_table
-->ha_create_table
-->handler::ha_create
-->ha_innobase::create
-->ha_innobase::truncate_impl
-->innobase_truncate<:table>::exec</:table>
-->innobase_truncate
-->rename_tablespace . // t1.ibd rename to #sql-ib1084-513656514.ibd
-->fil_rename_tablespace
-->Fil_shard::space_rename
-->Log_DDL::write_rename_space_log
-->innobase_basic_ddl::delete_impl
-->row_drop_table_for_mysql
-->Log_DDL::write_delete_space_log// drop #sql-ib1084-513656514.ibd
-->innobase_basic_ddl::create_impl
-->create_table_info_t::create_table
-->create_table_info_t::create_table_def
-->row_create_table_for_mysql
-->dict_build_table_def
-->dict_build_tablespace_for_table
-->Log_DDL::write_delete_space_log // drop t1.ibd
-->Log_DDL::write_remove_cache_log
-->dict_create_index_tree_in_mem
-->Log_DDL::write_free_tree_log
-->innobase_post_ddl
-->Log_DDL::post_ddl
-->Log_DDL::replay_by_thread_id
-->Log_DDL::replay
-->Log_DDL::replay_delete_space_log // drop #sql-ib1084-513656514.ibd
--> Log_DDL::replay_drop_log
-->delete_by_ids
-->DDL_Log_Table::remove
7.4 Atomic DDL帶來的變化
drop 多表或多使用者時,如果個別失敗,整個DDL都會回滾,且不會記錄binlog;而在MySQL8.0以前, 部分DDL會成功且整個DDL會記錄binlog。
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1 |
+----------------+
mysql> drop table t1,tt;
ERROR 1051 (42S02): Unknown table 'test.tt'
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1 |
+----------------+
show binlog events;
+------------------+-----+----------------+-----------+-------------+-----------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----+----------------+-----------+-------------+-----------------------------------------+
| mysql-bin.000001 | 4 | Format_desc | 1 | 124 | Server ver: 8.0.15-debug, Binlog ver: 4 |
| mysql-bin.000001 | 124 | Previous_gtids | 1 | 155 | |
+------------------+-----+----------------+-----------+-------------+-----------------------------------------+
2 rows in set (0.00 sec)
8. Persistent Autoinc
MySQL8.0以前自增值沒有持久化,重啟時透過select MAX(id)的方式獲取當前自增值,這種方式自增值會重覆利用。MySQL8.0開始支援自增值持久化,透過增加redo日誌和Data Dictonary 表mysql.innodb_dynamic_metadata來實現持久化。
每次insert/update更新自增值時會將自增值寫到redo日誌中,參考dict_table_autoinc_log函式,日誌格式如下:
MLOG_TABLE_DYNAMIC_META
id,
version,
PM_TABLE_AUTO_INC,
autoinc
同時dict_table_t增加了新的變數autoinc_persisted, 在每次checkpoint時會將autoinc_persisted儲存到表mysql.innodb_dynamic_metadata中。
dict_table從dictionary cache淘汰時也會將autoinc_persisted持久化到mysql.innodb_dynamic_metadata中。
log_checkpointer
-->log_consider_checkpoint
-->dict_persist_to_dd_table_buffer
-->dict_table_persist_to_dd_table_buffer_low
-->Persisters::write
-->AutoIncPersister::write
-->DDTableBuffer::replace
dict_table從dictionary cache淘汰時也會將autoinc_persisted持久化到mysql.innodb_dynamic_metadata中。
crash重啟時,先從mysql.innodb_dynamic_metadata獲取持久化的自增值,再從redo日誌中讀取最新的自增值, 參考MetadataRecover::parseMetadataLog,並透過MetadataRecover::apply更新到table->autoinc。
dict_table_remove_from_cache_low
-->dict_table_persist_to_dd_table_buffer_low
-->Persisters::write
-->AutoIncPersister::write
-->DDTableBuffer::replace
crash重啟時,先從mysql.innodb_dynamic_metadata獲取持久化的自增值,再從redo日誌中讀取最新的自增值, 參考MetadataRecover::parseMetadataLog,並透過MetadataRecover::apply更新到table->autoinc。
9. Upgrade
MySQL-8.0不支援跨版本升級,只能從5.7升級到8.0,不支援5.5,5.6直接升級到8.0。升級需要註意的問題:
- 原mysql5.7 mysql庫下不能存在dictinary table同名的表
- 不支援老版本(5.6之前)的資料型別decimal,varchar, data/datetime/timestamp, 透過check table xxx for upgrade可以檢測
- non-native 分割槽表不支援
- 不支援5.0之前的trigger,5.0之前的trigger沒有definer
- foreign key constraint name 不能超過64位元組
- view的column不能超過255 chars
- enum 型別不能超過255 chars.
- frm需與InnoDB系統表一致
- 一些空間函式如PointFromText需修改為ST_POINTFROMTEXT
10. 參考資訊
- https://github.com/mysql/mysql-server
- https://dev.mysql.com/doc/refman/8.0/en/data-dictionary.html
- https://dev.mysql.com/doc/refman/8.0/en/system-schema.html
- https://mysqlserverteam.com/upgrading-to-mysql-8-0-here-is-what-you-need-to-know/
- http://mysqlserverteam.com/mysql-8-0-improvements-to-information_schema/
- https://dev.mysql.com/doc/refman/8.0/en/atomic-ddl.html
- https://mysqlserverteam.com/bootstrapping-the-transactional-data-dictionary/
- https://www.slideshare.net/StleDeraas/dd-and-atomic-ddl-pl17-dublin
朋友會在“發現-看一看”看到你“在看”的內容