Skip to content

HIVE-29241: Distinguish the default location of the database by catalog#6267

Open
zhangbutao wants to merge 2 commits intoapache:masterfrom
zhangbutao:HIVE-29241_LOCATION
Open

HIVE-29241: Distinguish the default location of the database by catalog#6267
zhangbutao wants to merge 2 commits intoapache:masterfrom
zhangbutao:HIVE-29241_LOCATION

Conversation

@zhangbutao
Copy link
Copy Markdown
Contributor

@zhangbutao zhangbutao commented Jan 14, 2026

What changes were proposed in this pull request?

The main objectives of this PR are:

  1. Allow location to be omitted when creating a catalog. At the same time, it is necessary to determine catalogs that do not require a location, such as JDBC catalogs.

  2. For newly created catalogs, the location of their databases will be determined by the two newly added parameters: metastore.warehouse.catalog.dir and metastore.warehouse.catalog.external.dir. This helps better ensure that the location of the default Hive catalog's databases and tables remains unaffected.

For example, if metastore.warehouse.catalog.dir is hdfs://ns1/testdir, then the location for a newly created catalog named testcat would be hdfs://ns1/testdir/testcat. Consequently, the default path for a database like testdb created under this catalog would be hdfs://ns1/testdir/testcat/testdb.

  1. Modify the semantics of catalog creation. The type parameter must be specified when creating a catalog to distinguish its type. Based on this type, it will be determined whether the catalog's databases and tables require a location, or whether the catalog type supports creating databases and tables.
    CREATE CATALOG test_cat COMMENT 'Hive test catalog' PROPERTIES('type'='hive');

BTW, While working on this PR, some tasks need to be carried forward. For these follow-up tasks, I have created sub-tickets. You can also search for the TODO catalog comments in the catalog for more details on these follow-up tasks.
HIVE-29564
HIVE-29562
HIVE-29561
HIVE-29278

Why are the changes needed?

The paths of databases and tables created under a non-default catalog may interfere with those under the default Hive catalog. We need to introduce parameters to distinguish the paths for databases and tables under non-default catalogs, ensuring that the paths of the default Hive catalog remain unaffected.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces catalog-aware warehouse paths for Hive metastore, allowing databases and tables under non-default catalogs to have distinct storage locations. The changes ensure that catalog locations are properly isolated while maintaining backward compatibility for the default Hive catalog.

Changes:

  • Added new configuration parameters (WAREHOUSE_CATALOG and WAREHOUSE_CATALOG_EXTERNAL) for catalog-specific warehouse directories
  • Modified catalog creation to require a 'type' parameter and made location optional with automatic defaults
  • Updated database path resolution throughout the codebase to use catalog-aware logic via new Warehouse methods that accept catalog names

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
MetastoreConf.java Added WAREHOUSE_CATALOG and WAREHOUSE_CATALOG_EXTERNAL configuration variables
HiveConf.java Added corresponding Hive configuration variables for catalog warehouses
Warehouse.java Enhanced path resolution methods to accept catalog names; added deprecated markers for old methods
CatalogUtil.java New utility class for catalog type validation (NATIVE, ICEBERG)
CreateCatalogAnalyzer.java Updated to validate catalog type property and make location optional
CreateCatalogOperation.java Modified to generate default location from WAREHOUSE_CATALOG config
CreateDatabaseAnalyzer.java Added validation to check if catalog supports database creation
CreateDatabaseOperation.java Updated path building logic to include catalog names for non-default catalogs
DDLUtils.java Added helper method to check catalog support for database creation
HiveParser.g Modified grammar to make catalog location optional and properties required
EximUtil.java Updated to use catalog-aware warehouse paths for replication
Multiple test files Updated test queries to include required 'type' property in CREATE CATALOG statements
HMSHandler.java Updated default catalog and database creation to use catalog-aware paths
MetastoreDefaultTransformer.java Updated to pass Database object instead of just name
Migration/Authorization files Updated to use new Database-based path methods

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 28 changed files in this pull request and generated 9 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@zhangbutao
Copy link
Copy Markdown
Contributor Author

One thing to note: Currently, we have two entry points for creating catalogs—one is the SQL-based catalog creation entry involved in this PR, and the other is the HMS API-based catalog creation entry. In this PR, I intend to enforce the inclusion of the type attribute when users create catalogs via SQL, so that users can clearly distinguish between different types of catalogs. However, the HMS API side does not enforce the type attribute when creating catalogs, which would lead to inconsistency between the two entry points.
Therefore, I have two approaches to resolve this issue and ensure consistency between the two entry points:

  1. On the HMS server side, specifically in the HMSHandler, enforce a check during catalog creation to verify whether the user has provided the type attribute. If the type is missing, return an error. This approach would introduce backward compatibility issues for catalog creation.
  2. Modify the current PR to make the type attribute optional for SQL-based catalog creation. For both SQL-based catalog creation and HMS API-based catalog creation, if the user does not provide the type attribute, assign a default value (e.g., native) on the HMSHandler server side. This approach would not cause compatibility issues with the catalog creation syntax. However, other open-source components enforce the type attribute requirement, and the type helps users better understand the parameter requirements for the catalog, among other benefits.

What are your suggestions regarding these two approaches? Thanks in advance. cc @deniskuzZ @difin

@zhangbutao i would vote for option 2, fallback to default - native

Changes have been made to keep the type optional when creating a catalog. Thanks for the advice.

@zhangbutao
Copy link
Copy Markdown
Contributor Author

hi @zhangbutao, is the PR ready for 2nd round of review and includes the changes we discussed here?

Let's go :)

// Create a fake fs root for local fs
Path localFsRoot = new Path(path, "localfs");
warehousePath = new Path(localFsRoot, "warehouse");
warehouseCatPath = new Path(localFsRoot, "catalog");
Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the catalog be under the warehousePath?

Warehouse Root
  └── Catalog (default or named)
        └── Database (.db)
              └── Table

/warehouse/<catalog_name>/<database_name>.db/<table_name>/   # non default catalog
/warehouse/hive/<database_name>.db/<table_name>/             # default catalog
/warehouse/hive/default.db/<table_name>/                     # default catalog & database

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the catalog be under the warehousePath?

No. The warehousePath is the path of the default catalog, under which the paths of Hive databases reside. If the catalog path is placed under the warehousePath, it will cause confusion with the path information of Hive databases under the default catalog.

The path information for the default catalog's databases is controlled by the property metastore.warehouse.dir, which defaults to /user/hive/warehouse. The path for non-default catalogs is controlled by the new added propertymetastore.warehouse.catalog.dir, which defaults to /user/hive/catalog/warehouse.

/user/hive/catalog/warehouse/<catalog_name>/<database_name>.db/<table_name>/     # non default catalog
/user/hive/warehouse/<database_name>.db/<table_name>/            				 # default catalog
/user/hive/warehouse/default.db/<table_name>/                     				 # default catalog & database


// Initialize props if null, and set default type to native if not specified
if (props == null) {
props = new HashMap<>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we do this in the var initialization?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Changed.

}

private static void checkCatalogType(Map<String, String> props) throws SemanticException {
String catalogType = props.get("type");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we please create constant for that

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Create constan TYPE in CatalogUtil

}
// Normalize and validate catalog type (fail fast on invalid input)
try {
props.put("type", CatalogUtil.normalizeCatalogType(catalogType));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we simplify by handling isNullOrEmpty in normalizeCatalogType

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. fixed.

@Override
public int execute() throws Exception {
Catalog catalog = new Catalog(desc.getName(), desc.getLocationUri());
String catLocationUri = Optional.ofNullable(desc.getLocationUri())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor. i think cat prefix in local var is redundant here since we are inside CreateCatalog class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i didn't get you here. Could you please describe it in detail? Thank you.

@@ -51,10 +52,9 @@ public CreateDatabaseAnalyzer(QueryState queryState) throws SemanticException {
@Override
public void analyzeInternal(ASTNode root) throws SemanticException {
Pair<String, String> catDbNamePair = DDLUtils.getCatDbNamePair((ASTNode) root.getChild(0));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about record CatalogDb(String catalog, String database) {}
please see #6379 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using a Pair here is already simple and clear enough, there's no need to use the record CatalogDb.

}

private void makeLocationQualified(Database database) throws HiveException {
String catalogName = database.getCatalogName().toLowerCase();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the names be normalized here: special chars/encoding lowercase?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this stage, the database is being created, and the prerequisite is that a valid catalog already exists — there is no need to double validate the catalog here.

private void makeLocationQualified(Database database) throws HiveException {
String catalogName = database.getCatalogName().toLowerCase();
String dbName = database.getName().toLowerCase();
boolean isDefaultCatalog = Warehouse.DEFAULT_CATALOG_NAME.equalsIgnoreCase(catalogName);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider moving to utils

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added a util method:
HiveUtils.isDefaultCatalog(String catName, Configuration conf)


// managed warehouse root
String whManagedLocatoion = MetastoreConf.getVar(conf,
isDefaultCatalog ? MetastoreConf.ConfVars.WAREHOUSE
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see #6267 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replied in #6267 (comment)

if (db == null) {
LOG.warn("Database ({}) for query history table hasn't been found, auto-creating one", QUERY_HISTORY_DB_NAME);
String location = getDatabaseLocation(QUERY_HISTORY_DB_NAME);
db = new Database();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a bit weird creating 2 Database objects in same method

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Path root = null;
try {
initWh();
// TODO catalog. Need to determine auth root path based on catalog name.
Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please add the ticket to the EPIC
cc @saihemanth-cloudera, as he is the most experienced in this area

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create ticket HIVE-29562

set hive.support.concurrency=true;

CREATE CATALOG testcat LOCATION '/tmp/testcat' COMMENT 'Hive test catalog';
CREATE CATALOG testcat LOCATION '/tmp/testcat' COMMENT 'Hive test catalog' PROPERTIES('type'='native');
Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is redundant

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

-- CREATE with comment
CREATE CATALOG test_cat LOCATION 'hdfs:///tmp/test_cat' COMMENT 'Hive test catalog';
-- CREATE with comment and default location
CREATE CATALOG test_cat COMMENT 'Hive test catalog' PROPERTIES('type'='NATIVE');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same 'type'='NATIVE' is redundant

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.
I also left a few DDL statements with 'type' = 'hive' to indicate that this parameter can be omitted for the Hive internal catalog.


-- CREATE INE already exists
CREATE CATALOG IF NOT EXISTS test_cat LOCATION 'hdfs:///tmp/test_cat';
CREATE CATALOG IF NOT EXISTS test_cat LOCATION 'hdfs:///tmp/test_cat' PROPERTIES('type'='native');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have tests for external catalog?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this PR is specifically targeting the internal/native catalog. If we were to implement a new external catalog, such as a JDBC catalog, I believe it would still require a lot of additional preliminary work(such as #6252), so we can consider doing it later.


public class CatalogUtil {
// Catalog type constants
public static final String NATIVE = "native";
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for Strings, please use enum values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

FileSystem cmFs = cmroot.getFileSystem(conf);
cmFs.setPermission(cmroot, new FsPermission("770"));
try {
// TODO catalog. The Repl function is currently only available for the default catalog path ConfVars.WAREHOUSE.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add the ticket to the EPIC?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. See HIVE-29278

}

whRootString = getRequiredVar(conf, ConfVars.WAREHOUSE);
whCatRootString = getRequiredVar(conf, ConfVars.WAREHOUSE_CATALOG);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see the comment on default FS structure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already repied above.

return getWhRoot();
}
return new Path(getWhRoot(), dbName.toLowerCase() + DATABASE_WAREHOUSE_SUFFIX);
Database db = new Database();
Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you using Database only as a DTO object? instead you could use a record CatalogDb as mentioned earlier

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should work well. Regarding the changes made here, could you help me point out the more prominent advantages of using record CatalogDb?
Thank you.

if (configUri != null && !configUri.isEmpty()) {
properties.put("uri", configUri);
}
// TODO catalog. Consider adding new created native catalog warehouse(MetastoreConf.ConfVars.WAREHOUSE_CATALOG) later.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please elaborate, i don't quite get what you are suggesting here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. This just maybe a question. I created a ticket HIVE-28879 for this.

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants